What is a content-based recommendation system, and how do you build one?

December 24, 2022
Content-based recommendations

Every time you tell a friend to check out a new TV show, recommend a book you recently read, or even tell them to visit a restaurant, you're recommending content to them! If you know your friend loves South-East Asian cuisine and there's a hot new Vietnamese restaurant in town, you’ll tell them about it. You're using the information you know about your friend, connecting things they might like or dislike to suggest things that might make them happy.

This is also the basic idea behind a content-based recommendation system: making suggestions based on what you know about an individual's preferences and past behavior.

Recommender systems are used all the time, especially by streaming services like Netflix and Amazon Prime. They're also used by social media sites like Twitter to recommend new followers or groups to users, and by online retailers to suggest products that their customers might be interested in.

Broadly speaking, there are five types of recommendation systems:

1. Collaborative:

This system relies on grouping similar users to recommend new items to them. For example, if you've watched a lot of romantic comedies on Netflix, the platform will suggest other movies that were enjoyed by people who watched those same romantic comedies.

2. Content:

This system relies on the information extracted from the content itself. So, a music streaming platform like Spotify might take into account factors like the genre, artist, beats-per-minute, etc. associated with the songs you've listened to in the past.

3. Demographic:

This type of system makes recommendations based on demographic information, such as the user's age, gender, location, etc. A demographic recommender system might recommend different songs to you based on what people in your city and age group have been listening to recently.

4. Contextual:

Recommendations are made based on the context in which the user is consuming content. For example, if you're listening to music on your commute to work, the recommender system might suggest different songs to you than it would if you were sitting at home on a Saturday night.

5. Hybrid:

A combination of any of the above systems. The most common hybrid combines collaborative and content-based filtering.

How does it work?

How do content-based recommendation systems work? The fundamental concept is simple: you take the content (e.g. a movie, song, book, etc.) and extract information about that content (e.g. the genre, plot keywords, author, etc.). This information is then used to find other pieces of content that are similar, and those are recommended to the user.

The simplest way to understand this is through an example. Suppose you open Amazon Prime and watch the movie "Top Gun: Maverick". 

Prime's recommendation system will extract information about "Top Gun: Maverick" like the genres (action, drama), plot keywords (war, fighter jet, epic heroes, etc.), and main actor (Tom Cruise). It will then use this information to find other movies with similar characteristics, like "Mission: Impossible" or "Dunkirk", and recommend them to you.

In short: the content is tagged with various attributes, commonalities are identified, and then suggestions are made based on how similar a candidate item is to the content the user has engaged with in the past.

Let's explore the technicalities of how exactly this works, and then how to go about building a content-based recommendation system. 

We need to first understand the user's preferences by analyzing their past behaviors. 

So, a user profile and item profiles are needed. These profiles can be described by a feature vector. A user profile is generated from their past behaviors, while an item profile is made up of features that describe the items.

Then we look at what items they have interacted with in the past and measure how similar those items are to each other. 

A Utility Matrix is constructed which contains the ratings given by different users to different items. This might be in the form of ratings out of 10, or simply a binary system of ‘0’ and ‘1’. 
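As a rough illustration, a utility matrix can be held as a table with one row per user and one column per item. The user names, ratings, and the "liked" threshold below are made up for this sketch; `None` marks an item the user has not rated.

```python
# Hypothetical ratings out of 10; None means "not rated yet".
users = ["user_1", "user_2"]
items = ["A", "B", "C"]
utility = [
    [8, None, 9],     # user_1 rated A and C
    [None, 6, None],  # user_2 rated only B
]

def rated_items(user):
    """Return {item: rating} for the items this user has rated."""
    row = utility[users.index(user)]
    return {item: r for item, r in zip(items, row) if r is not None}

def to_binary(matrix, threshold=7):
    """Collapse ratings to 1 (liked) or 0 (not liked, or not rated)."""
    return [
        [1 if r is not None and r >= threshold else 0 for r in row]
        for row in matrix
    ]

print(rated_items("user_1"))  # {'A': 8, 'C': 9}
print(to_binary(utility))     # [[1, 0, 1], [0, 0, 0]]
```

In practice most cells are empty, so real systems tend to use a sparse representation rather than a dense list of lists.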

Once we have a good understanding of the user's preferences, we can then use a similarity metric to recommend new items to them.

To generate recommendations, the system calculates the similarity between the user profile and each item profile. It can then rank the items and recommend the most similar ones to the user.

The description above is a simplified view of how a content-based recommender system works. In reality, many different factors go into making recommendations, and the system keeps learning and evolving as more data is collected.

But at its core, a content-based recommender system relies on understanding the user's preferences in order to make better suggestions. And that's why they can be so useful - they make recommendations specifically for you.

How to build one?

Here’s a step-by-step guide:

To make this easier, we'll take a simple example. You have three movies: 'A', 'B', and 'C'. Our user watches movie A, and we need to decide what movie to recommend to them.

1. Assign attributes to your data

In order to build a content-based recommender system, you'll first need to assign attributes to your data. For example, if you're working with a dataset of movies, each movie will have several attributes like the genre, plot keywords, director, etc.

This isn't an easy undertaking, and many companies hire experts to do it. Netflix, for instance, uses Hollywood screenwriters to rate shows on various aspects.

For our example, let’s say we choose 5 attributes: action, adventure, comedy, horror, and thriller.

2. Encode data and weigh attributes

Once you have your data attributes, you'll need to convert them into a common format. A simple way to encode data is by using a "1" or "0" to indicate whether or not an attribute is present. 

After you've encoded the data, you'll need to weigh the importance of each attribute. This matters because not all attributes are created equal. For example, if you're building a recommender system for movies, you might decide that the genre should count for more than the plot keywords.

Back to our example, we create vectors for each movie. [Action, Adventure, Comedy, Horror, Thriller]

Movie A is encoded as [1, 1, 0, 0, 1] to indicate it has action, adventure, and thriller elements. Similarly, B is encoded as [0, 1, 1, 0, 0], and C is encoded as [0, 1, 0, 1, 1]
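The encoding step above can be sketched in a few lines of Python. The tag sets mirror the example vectors; the `encode` and `apply_weights` helpers and the weight values are assumptions made for illustration, since the article does not fix any particular weights.

```python
ATTRIBUTES = ["action", "adventure", "comedy", "horror", "thriller"]

# Tags per movie, matching the example vectors in the text.
movie_tags = {
    "A": {"action", "adventure", "thriller"},
    "B": {"adventure", "comedy"},
    "C": {"adventure", "horror", "thriller"},
}

def encode(tags):
    """Binary-encode a set of tags in the fixed ATTRIBUTES order."""
    return [1 if attr in tags else 0 for attr in ATTRIBUTES]

def apply_weights(vector, weights):
    """Scale each attribute by its weight (element-wise product)."""
    return [v * w for v, w in zip(vector, weights)]

vectors = {name: encode(tags) for name, tags in movie_tags.items()}
print(vectors["A"])  # [1, 1, 0, 0, 1]
print(vectors["B"])  # [0, 1, 1, 0, 0]
print(vectors["C"])  # [0, 1, 0, 1, 1]

# Hypothetical weights: comedy and horror count half as much here.
weights = [1.0, 1.0, 0.5, 0.5, 1.0]
print(apply_weights(vectors["C"], weights))  # [0.0, 1.0, 0.0, 0.5, 1.0]
```

The rest of the walkthrough uses the unweighted vectors, which is the same as giving every attribute a weight of 1.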

3. Choose a method

To generate recommendations, the system needs to calculate the similarity between the item profiles and/or the user profile.

We will be using a similarity-based method. Other popular methods include one-class SVMs, matrix factorisation, and deep learning.

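The most common similarity-based method is "cosine similarity", which is the cosine of the angle between two vectors. The smaller the angle between them, the more similar the vectors are.

In general, cosine similarity ranges from -1 to 1. For vectors with no negative entries, like ours, it is a number between 0 and 1, with 0 indicating that the two movies share no attributes, and 1 indicating that their vectors point in exactly the same direction.

For two vectors, A and B, the cosine similarity is given by:

cos(A, B) = (A · B) / (|A| × |B|)

Here A · B is the dot product of the two vectors, and |A| and |B| are their lengths (Euclidean norms).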

Using our example, you have three movies, A, B, and C and each movie has 5 attributes.

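Using the above formula, the cosine similarity between A and B is 1 / (√3 × √2) ≈ 0.408248, because the two movies share only one attribute (adventure).

Between A and C, which share two attributes (adventure and thriller), it is 2 / (√3 × √3) ≈ 0.666667.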

Clearly, it’s higher between A and C, and so our user who watched movie A would be recommended movie C.
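The numbers above can be reproduced with a short, dependency-free sketch. The `cosine_similarity` helper is written here for illustration, and returning 0.0 for an all-zero vector is a convention chosen for this example rather than part of the definition.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: treat an all-zero vector as "nothing in common"
    return dot / (norm_u * norm_v)

# [Action, Adventure, Comedy, Horror, Thriller]
A = [1, 1, 0, 0, 1]
B = [0, 1, 1, 0, 0]
C = [0, 1, 0, 1, 1]

print(round(cosine_similarity(A, B), 6))  # 0.408248
print(round(cosine_similarity(A, C), 6))  # 0.666667

# Recommend whichever candidate is most similar to the movie just watched.
candidates = {"B": B, "C": C}
best = max(candidates, key=lambda name: cosine_similarity(A, candidates[name]))
print(best)  # C
```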

4. Generate user preference profiles

To make even better recommendations, we can aggregate the item profiles of content a user has interacted with positively. This is commonly done by creating "user preference profiles".

User profiles are critical for recommending content. They typically include everything the user has interacted with – what they purchase, what they browse, watch, etc. along with the assigned attributes of that content.

A simple way to do so is to take a weighted mean of the vectors of all the movies the user rates highly or gives a 'thumbs up' to.

So for our example, if our user watches and positively rates movie C after movie A, then a user profile could be constructed using a weighted mean of the vector representations of A and C.
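A minimal sketch of this step, assuming the user gave a thumbs up to both A and C so that each gets a weight of 1 (ratings could be used as weights instead). The `build_profile` helper and movie D are hypothetical; D is added only so there is more than one unseen movie to rank.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: an all-zero vector matches nothing
    return dot / (norm_u * norm_v)

def build_profile(item_vectors, weights):
    """Weighted mean of the item vectors the user responded to positively."""
    total = sum(weights)
    size = len(item_vectors[0])
    return [
        sum(w * vec[i] for vec, w in zip(item_vectors, weights)) / total
        for i in range(size)
    ]

# [Action, Adventure, Comedy, Horror, Thriller]
A = [1, 1, 0, 0, 1]
C = [0, 1, 0, 1, 1]

profile = build_profile([A, C], weights=[1, 1])
print(profile)  # [0.5, 1.0, 0.0, 0.5, 1.0]

# Unseen movies to rank against the profile. D is a made-up action thriller.
unseen = {
    "B": [0, 1, 1, 0, 0],
    "D": [1, 0, 0, 0, 1],
}
ranked = sorted(
    unseen,
    key=lambda name: cosine_similarity(profile, unseen[name]),
    reverse=True,
)
print(ranked)  # ['D', 'B']
```

Because the profile leans towards adventure and thriller, with some action and horror, the hypothetical movie D ends up ranked ahead of the adventure comedy B.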

Wrapping Up

A recommendation system is a great way to keep your users happy while also helping to improve the quality of the content on your application. Content-based recommender systems are a powerful tool that can be used to provide personalized recommendations to users.

By analyzing the attributes of your offerings, you get a better sense of what users are actually interested in and make sure that they're seeing the best possible content at all times. 

Ultimately, everyone wins with content-based filtering: users get more relevant, interesting content, and platforms get better conversions and build brand loyalty with their users.