Welcome to the Movie Recommender Demonstration!

Recommender systems significantly improve our online experience. Streaming services, social media and online shopping are all shaped by recommender systems. By analyzing the content as well as user preferences, behavior and interactions with that content, these systems provide personalized suggestions that help users discover items, products, or information aligned with their interests. They let users navigate large amounts of data efficiently, which increases user engagement and satisfaction as well as sales and user retention. Recommender systems can be split into three categories:

  1. Content-Based Filtering System: Recommendations are made purely on similarity to the preferred content.
  2. Collaborative Filtering System: Recommendations are made based on user interactions with the content, drawing on the profiles of other users.
  3. Hybrid System: Recommendations take into account both the content and user interactions.

For this project, we compare different systems. Additional details on these systems can be found in these Jupyter Notebooks.

Content-Based Filtering

Collaborative-Filtering

The first system we look at is a Content-Based Filtering System. For this purpose, we use this dataset (include link) that contains data on the top 1000 movies based on IMDb ratings. We use the description of the movie as well as the name of the director, the genre and the most prominent actors in the movie as a basis for the system. Building the recommender system involves three steps:

  1. Natural Language Processing: We combine the text data from the different features (overview, director, etc.) into a single text per movie. Note that you can apply different weights depending on how important each feature is to you. The resulting text then undergoes natural language processing (lemmatization and TF-IDF vectorization), which translates it into numerical features.
  2. Similarity measures: The resulting numerical features of the input movies are compared to the similarly processed features of the remaining movies. The comparison can be based on different measures. For this project, we used the so-called cosine similarity. There is an explanation of this metric below. Other possible metrics are the Euclidean distance or the Manhattan distance.
  3. Selection: Based on the similarity measure, the five movies that are most closely related to the input movies are selected.
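
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the demo: the toy catalogue, the feature weights and all function names are hypothetical, lemmatization is skipped, and TF-IDF is computed by hand with the standard library rather than with a dedicated NLP package. How the three input movies are combined is also an assumption; here the similarities to each liked movie are simply summed. For two vectors u and v, cosine similarity is (u · v) / (|u| |v|), which is 1 when the vectors point in the same direction and 0 when they share no terms.

```python
import math
from collections import Counter

# Hypothetical toy catalogue; the real dataset holds about 1000 movies.
MOVIES = {
    "Space Saga": {"overview": "rebels fight an empire in space",
                   "director": "Lee", "genre": "scifi adventure"},
    "Space Saga II": {"overview": "rebels regroup after the empire strikes in space",
                      "director": "Lee", "genre": "scifi adventure"},
    "Quiet Town": {"overview": "a detective solves a murder in a small town",
                   "director": "Moreau", "genre": "crime drama"},
    "Night Case": {"overview": "a detective hunts a killer at night",
                   "director": "Kim", "genre": "crime thriller"},
}
# Assumed feature weights: a feature with weight 2 has its tokens repeated twice.
WEIGHTS = {"overview": 1, "director": 2, "genre": 2}


def build_document(features):
    """Step 1a: merge the weighted text features into one token list."""
    tokens = []
    for name, weight in WEIGHTS.items():
        tokens += features[name].lower().split() * weight
    return tokens


def tfidf_vectors(docs):
    """Step 1b: turn token lists into sparse TF-IDF vectors (term -> weight)."""
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in docs.values() for term in set(tokens))
    vectors = {}
    for title, tokens in docs.items():
        counts = Counter(tokens)
        vectors[title] = {
            term: count * (math.log(n_docs / doc_freq[term]) + 1.0)
            for term, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Step 2: cosine similarity of two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(liked, vectors, k=5):
    """Step 3: rank the remaining movies by summed similarity to the liked ones."""
    scores = {
        title: sum(cosine(vectors[fav], vec) for fav in liked)
        for title, vec in vectors.items()
        if title not in liked
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


DOCS = {title: build_document(features) for title, features in MOVIES.items()}
VECTORS = tfidf_vectors(DOCS)
print(recommend(["Space Saga"], VECTORS, k=2))
```

In this toy catalogue the sequel shares its director, genre and most of its overview with the first film, so it ranks first for anyone who likes "Space Saga".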

We note here that this system has a clear weakness: if you like a movie from a franchise, recommendations are very likely to come from the same franchise. Additionally, the dataset is fairly small, but that issue can easily be fixed by using a larger dataset. That is a consideration for future work in this direction.

Here, you can try out the content-based recommender system for yourself. Enter three movies you like and the system will recommend five similar movies.

Select three movies

Preferred Movies

Recommendations

While content-based systems rely (as the name suggests) on the content of the movies, collaborative systems rely on human interactions with the movies. Collaborative-filtering systems can use user information to compare a user to other users with similarity metrics. Recommendations are then made based on what similar users like. The system that was developed for this project, however, is item-based. Here, items are compared to each other based on user feedback (e.g. reviews) with similarity metrics. Then, the items that are most closely related to the items the user likes are recommended.
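
For contrast with the item-based system used in this project, the user-based variant described above can be sketched as follows. This is only an illustration under assumptions: the users, movies and scores are invented, and weighting each candidate movie by the similarity of the user who rated it is one common choice, not something taken from the project.

```python
import math

# Hypothetical scores (user -> {movie: score}); unrated movies are simply absent.
USER_RATINGS = {
    "me": {"Movie A": 5, "Movie B": 4},
    "alice": {"Movie A": 5, "Movie B": 5, "Movie C": 4},
    "bob": {"Movie D": 5, "Movie E": 4},
}


def cosine(u, v):
    """Cosine similarity of two sparse rating vectors."""
    dot = sum(score * v.get(movie, 0.0) for movie, score in u.items())
    norm_u = math.sqrt(sum(s * s for s in u.values()))
    norm_v = math.sqrt(sum(s * s for s in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend_user_based(target, ratings, k=5):
    """Score unseen movies by the ratings of other users, weighted by user similarity."""
    seen = ratings[target]
    scores = {}
    for other, other_ratings in ratings.items():
        if other == target:
            continue
        similarity = cosine(seen, other_ratings)
        for movie, score in other_ratings.items():
            if movie not in seen:
                scores[movie] = scores.get(movie, 0.0) + similarity * score
    return sorted(scores, key=scores.get, reverse=True)[:k]


print(recommend_user_based("me", USER_RATINGS, k=1))
```

Here "alice" rates the same movies as "me" in a similar way, so her other favorite is recommended, while "bob" shares no rated movies and contributes nothing.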

Let's discuss in more detail what such a system entails. To build our system, we used a dataset of critic reviews on Rotten Tomatoes (include link to data set). We proceeded in the following steps:

  1. Reviewer-Movie-Matrix: The columns are the reviewers and the rows are the movies. Each entry tells us what score the reviewer gave the movie (if any). If the reviewer did not write a review of a specific movie, we set the value to 0. (Because most entries are zero, we say the matrix is sparse.)
  2. Movie-Similarity-Matrix: The Reviewer-Movie-Matrix gives us a vector of scores for each movie. We compare these vectors pairwise (here we use cosine similarity, but other metrics are possible) and create a new matrix that consists of these pairwise comparisons. The entries of this matrix tell us how similar each movie is to the other movies.
  3. Recommendations: When the user provides their favorite movies, the system looks up the Movie-Similarity-Matrix, scores every other movie by adding up its cosine similarities to each of the favorite movies, and recommends the movies with the highest totals.
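
The steps above can be sketched as follows. This is a minimal illustration under assumptions, not the project's code: the movies, critics and scores are invented, the function names are hypothetical, and the sparse Reviewer-Movie-Matrix is stored as nested dictionaries that keep only the non-zero entries, which is equivalent to filling missing reviews with 0.

```python
import math

# Hypothetical critic scores (movie -> {reviewer: score}).
# A missing reviewer corresponds to a 0 in the Reviewer-Movie-Matrix.
RATINGS = {
    "Movie A": {"critic1": 5, "critic2": 4},
    "Movie B": {"critic1": 4, "critic2": 5},
    "Movie C": {"critic3": 5, "critic4": 4},
    "Movie D": {"critic2": 1, "critic3": 4, "critic4": 5},
}


def cosine(u, v):
    """Cosine similarity of two sparse score vectors."""
    dot = sum(score * v.get(reviewer, 0.0) for reviewer, score in u.items())
    norm_u = math.sqrt(sum(s * s for s in u.values()))
    norm_v = math.sqrt(sum(s * s for s in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(ratings):
    """Step 2: pairwise cosine similarity between all movie score vectors."""
    titles = list(ratings)
    return {a: {b: cosine(ratings[a], ratings[b]) for b in titles} for a in titles}


def recommend(favorites, sim, k=5):
    """Step 3: add up each movie's similarity to the favorites and keep the top k."""
    scores = {
        title: sum(sim[title][fav] for fav in favorites)
        for title in sim
        if title not in favorites
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


SIM = similarity_matrix(RATINGS)
print(recommend(["Movie A"], SIM, k=2))
```

Two movies end up close to each other when the same critics scored them in a similar way, regardless of what the movies are about. With around 10,000 movies, a dense similarity matrix has about 100 million entries, so a sparse matrix representation would be the more practical choice at that scale.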

Here you can try out the recommender system for yourself. Note that this system works better than the content-based system. Part of the reason might be that the dataset is significantly larger than the one used for the content-based system (10000 movies compared to 1000).

Select three movies

Preferred Movies

Recommendations