How Using a 5-Point Scale Costs Netflix Sales
Posted by Jeffrey Henning on Mon, Dec 01, 2008
To celebrate Cyber Monday, let’s look at product ratings at online retailers. The New York Times has a great article on precisely this subject, “If You Liked This, You’re Sure to Love That”:
When Netflix customers log into their accounts, they can rate any movie from one to five stars, to help “teach” the Netflix system what their preferences are; the average customer has rated around 200 movies, so Netflix has a lot of information about what its customers like and don’t like….
Cinematch is the bit of software embedded in the Netflix Web site that analyzes each customer’s movie-viewing habits and recommends other movies that the customer might enjoy. (Did you like the legal thriller “The Firm”? Well, maybe you’d like “Michael Clayton.” Or perhaps “A Few Good Men.”) The Netflix Prize goes to anyone who can make Cinematch’s predictions 10 percent more accurate. One million dollars might sound like an awfully big prize for such a small improvement. But in fact, Netflix’s founders tried for years to improve Cinematch, with only incremental results, and they knew that a 10 percent bump would be a challenge for even the most deft programmer. They also knew that, as Reed Hastings, the chief executive of Netflix, told me recently, “getting to 10 percent would certainly be worth well in excess of $1 million” to the company. The competition was announced in October 2006, and no one has won yet.
The choice of a 1-5 scale is hurting Netflix here. A rating scale of 1 to 5 stars is ubiquitous in online retailing, but with so few choices Cinematch has less information to predict from. In practice the five-point scale is a three-point scale: ratings of 3, 4 and 5 stars account for 85% of all ratings (source of ratings: Ilya Grigorik).
Distribution of Netflix 1-5 Ratings
Most likely this reflects selection bias in which movies Netflix users watch: most users only watch movies they expect to like, and their ratings reflect this. It even appears to be a positive feedback loop: the average rating has been increasing over time.
With most of the ratings bunched at 3, 4 and 5 stars, there is too little distinction between them; the average is 3.6 stars. Commenters on the blog Hacking Netflix have asked for half-star ratings, which would nearly double the resolution of the scale (nine points from 1 to 5 instead of five).
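One way to put a number on "practically a three-point scale" is to compute how many scale points are effectively in use, taking the exponential of the Shannon entropy of the rating distribution. This is only a sketch of one possible measure, not something Netflix or the data source uses; the function name and the two example distributions are hypothetical illustrations, not the actual Netflix shares.

```python
import math


def effective_scale_points(shares):
    """Return the effective number of scale points in use.

    `shares` holds the count (or share) of ratings at each scale point.
    The result is exp(Shannon entropy): it equals the number of points
    when all are used equally, and shrinks as ratings bunch together.
    """
    if any(s < 0 for s in shares):
        raise ValueError("shares must be non-negative")
    total = sum(shares)
    if total <= 0:
        raise ValueError("at least one share must be positive")
    # Unused scale points contribute nothing to the entropy.
    probs = [s / total for s in shares if s > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)


# Hypothetical distributions, for illustration only (not Netflix data).
evenly_used = [1, 1, 1, 1, 1]  # every star level used equally
top_heavy = [0, 0, 1, 1, 1]    # only 3, 4 and 5 stars ever used

print(round(effective_scale_points(evenly_used), 2))  # 5.0
print(round(effective_scale_points(top_heavy), 2))    # 3.0
```

Feeding in the real share of ratings at each star level would give a value between these two extremes; the closer it falls to 3, the less the 1- and 2-star options are contributing.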
For purposes of comparison, let’s look at the ratings of board games at BoardGameGeek, which is not an online retailer but a community site. (Hat-tip: Ratings export provided by Joe Grundy.) Board games are rated on a 1-10 scale, shown in the outer circle below, with the Netflix ratings distribution shown in the inner circle:
Distribution of Netflix 1-5 Ratings vs. BoardGameGeek 1-10 Ratings
Where 23% of Netflix ratings are 5 stars, only 13% of BGG ratings are ratings of 9 to 10. Interestingly, the bottom half of the BGG scale (ratings of 1-5) accounts for barely a fifth of the ratings (22%). It is the distributions of ratings that differ dramatically, not the averages: Netflix movies have an average rating of 3.6, while BGG board games have an average rating of 3.5 when converted to a 5-point scale (6.7 on the original 10-point scale). [Note: BGG permits users to enter fractional ratings; those ratings are not included in this analysis.]
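The post does not say how the 10-point average was converted to a 5-point one. A minimal sketch, assuming a linear mapping that sends 1 to 1 and 10 to 5, reproduces the quoted figure (simply halving 6.7 would give 3.35 instead). The function name and its parameters are illustrative assumptions.

```python
def rescale(rating, old_min=1.0, old_max=10.0, new_min=1.0, new_max=5.0):
    """Linearly map a rating from one scale to another, keeping endpoints fixed."""
    if old_max == old_min:
        raise ValueError("old scale must span more than one point")
    # Position of the rating within the old scale, from 0.0 to 1.0.
    fraction = (rating - old_min) / (old_max - old_min)
    return new_min + fraction * (new_max - new_min)


bgg_average_on_ten = 6.7  # BGG average quoted in the post
print(round(rescale(bgg_average_on_ten), 1))  # 3.5
```

Under this assumed mapping the two averages, 3.6 and 3.5, sit on the same footing, which is what makes the comparison of the distributions meaningful.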
A 1-10 scale, by providing finer shades of gradation at the low and high ends of the scale, might have increased the predictive validity of Cinematch. Since Netflix believes a 10% improvement in Cinematch “would certainly be worth well in excess of $1 million”, the choice of the wrong rating scale may well be costing Netflix millions in potential sales.