The Napoleon Dynamite Problem Stymies Netflix Prize Competitors
from the love-it-or-hate-it dept
We’ve been covering the ongoing race to claim the $1 million Netflix Prize for a while now, highlighting some of the surprising and unique methods teams have used to attack the problem. Every time we write about it, the lead teams have inched slightly closer to that 10% improvement hurdle, but progress has certainly been slow. Clive Thompson’s latest NY Times piece looks at the current standings, noting that the big remaining obstacle is “the Napoleon Dynamite problem.”
Apparently, the algorithms cooked up by the various teams work great for typical mainstream movies, but they run into trouble on quirky films like Napoleon Dynamite, Lost in Translation or I Heart Huckabees, which people tend to either love or hate, with very little in between. No one seems quite sure what drives such a polarized reaction, and no algorithm can yet figure out how a given person will react to these films, which is where all of the various approaches seem to hit a dead end.
Some folks believe that’s just the nature of taste. It can’t simply be computed by an algorithm, because it depends on a variety of external factors: what your friends think of something, or even whether you happened to see the movie with certain friends. Basically, there are factors outside the ratings data that play into taste, so the fact that you liked some other set of quirky movies doesn’t mean you must love Napoleon Dynamite. In some ways, it makes you wonder if we’re all putting too much emphasis on an algorithmic approach, and whether other recommendation systems, such as surfacing what specific friends think of a movie, might be more effective. Of course, Netflix is hedging its bets: it’s been pushing social networking “friend recommendation” features for a while as well.
Filed Under: movies, napoleon dynamite, netflix prize, ranking, recommendation engine
Companies: netflix
Comments on “The Napoleon Dynamite Problem Stymies Netflix Prize Competitors”
Taste is in the mind of the beholder
The problem with quantifying taste is that a lot of people are not really sure about their own taste. Only after they have tried something new, be it wine, food, movies, or music, do they know whether they like it.
Perhaps these movies are the far outliers and could be used to calibrate the system better than movies that bring on weaker reactions.
humans
Go figure, human beings can’t totally be dissected and analyzed with an algorithm! Maybe mankind isn’t as predictable as previously thought.
Re: humans
While it’s true that people will always surprise you, the surprise comes from them not doing the “normal” thing.
That being said, it follows that such surprises are the rarity; q.e.d., people are predictable.
Re: Re: humans
Kudos for proper use of q.e.d. in an online post, to say nothing of actually putting together an argument that follows the rules of logic…
Re: Re: Re: humans
The burden of having an education. 😛
Re: humans
Mmmmmm. Dissected human.
Approaching true noise
If you ask the same set of people to rate the same set of movies multiple times (and assuming they forget the ratings they gave last time), the ratings are going to change. Any algorithm that beats this “noise” threshold is just overfitting.
In this case it seems the contestants have reached that threshold.
Re: Approaching true noise
They can account for some of that already. From the linked article, “For example, the teams are grappling with the problem that over time, people can change how sternly or leniently they rate movies. Psychological studies show that if you ask someone to rate a movie and then, a month later, ask him to do so again, the rating varies by an average of 0.4 stars. “
Re: Re: Approaching true noise
But it doesn’t account for the fact that people do remember how they rated last time and are most likely to stick with it.
There is going to be a difference between the actual rating and the declared rating. It would be easier to predict the actual rating (exactly what the viewer thinks) than the declared one. Some factors that push the declared rating away from the actual one: alcohol, company, time…
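The 0.4-star re-rating drift quoted above implies a noise floor that no predictor of declared ratings can beat, no matter how clever. A toy simulation (my own, not from the article; it loosely treats the 0.4-star figure as the half-width of a symmetric drift, which is an assumption):

```python
import random

random.seed(0)

# Hypothetical model: a viewer's declared rating drifts around a stable
# "true" opinion. Even a perfect predictor of the true opinion then has
# a nonzero RMSE against the declared ratings -- the noise floor.
N = 100_000
DRIFT = 0.4  # the re-rating change cited in the article, used loosely
             # here as the scale of the drift

squared_errors = []
for _ in range(N):
    true_opinion = random.uniform(1, 5)
    # declared rating = true opinion plus symmetric drift
    declared = true_opinion + random.uniform(-DRIFT, DRIFT)
    # the "perfect" algorithm predicts the true opinion exactly
    squared_errors.append((declared - true_opinion) ** 2)

rmse_floor = (sum(squared_errors) / N) ** 0.5
print(rmse_floor)  # ≈ DRIFT / sqrt(3) ≈ 0.23 under this toy model
```

Under this (made-up) uniform-drift model the floor is about 0.23 stars; the real noise structure is unknown, but the point stands that past some accuracy, further “improvement” is just fitting noise.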
I don’t see the problem really..
you track which people like which movies (and with what percentage), then you tie those together and you have a tree system that tracks distance of likability, how many people liked it, and the amount they liked it by. so for example:
most people who liked old yeller liked homeward bound.
a few people who liked old yeller liked airbud
most people who liked homeward bound liked Beethoven.
a few people who liked homeward bound liked Dunstan checks in
so it would suggest them in roughly this order:
homeward bound
Beethoven/airbud
Dunstan checks in
recurse until your algorithm would give a movie a 50% or lower probability, weighting links more where the movies you watch share a lot of common fans. the problem would be making it fast on anything other than a beast of a machine. but as they specify accuracy, not performance…
(feel free to poke holes in my plan, I really only thought about it for a few minutes before typing it up)
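Taking up that invitation, the commenter’s scheme can be sketched as item-to-item co-occurrence counting. Here’s a minimal one-hop version (the recursion is omitted; all the people and likes are made-up data, with the movies from the comment above):

```python
from collections import defaultdict
from itertools import combinations

# Made-up viewers and binary "liked" sets, for illustration only.
liked = {
    "ann":  {"old yeller", "homeward bound", "beethoven"},
    "bob":  {"old yeller", "homeward bound"},
    "carl": {"old yeller", "air bud"},
    "dana": {"homeward bound", "beethoven"},
    "erin": {"homeward bound", "dunstan checks in"},
}

# Count how often each movie, and each pair of movies, is liked together.
movie_counts = defaultdict(int)
pair_counts = defaultdict(int)
for movies in liked.values():
    for m in movies:
        movie_counts[m] += 1
    for a, b in combinations(sorted(movies), 2):
        pair_counts[(a, b)] += 1
        pair_counts[(b, a)] += 1

def suggest(seed, threshold=0.5):
    """Movies liked by at least `threshold` of the people who liked
    `seed`, ranked by that fraction -- the 'most people who liked X
    liked Y' rule from the comment."""
    scores = {
        other: pair_counts[(seed, other)] / movie_counts[seed]
        for other in movie_counts if other != seed
    }
    return sorted(
        (m for m, s in scores.items() if s >= threshold),
        key=lambda m: scores[m], reverse=True,
    )

print(suggest("old yeller"))  # only "homeward bound" clears 50% here
```

With this tiny data set, 2 of the 3 Old Yeller fans liked Homeward Bound (67%), while Air Bud and Beethoven each score 33% and fall below the 50% cutoff. The commenter’s full proposal would then recurse from each suggestion, multiplying probabilities along the chain.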
Re: algorithm
That’s not too far off from the baseline algorithm, at least in general principle. (Actually, it’s quite different in practice, but I’m not here to pick nits.) The problem is, you have to do 10% better than the baseline algorithm in order to win the prize.
The problem isn’t one that’s hard to solve to a first approximation. The hard part is improving significantly from a reasonable baseline algorithm.
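To make that concrete: one common first approximation in the recommendation literature (not necessarily what Cinematch itself does — this is an illustrative sketch with made-up ratings) is a global mean plus per-movie and per-user offsets. It’s easy to build and surprisingly hard to beat by much:

```python
# Made-up (user, movie, stars) triples for illustration.
ratings = [
    ("u1", "m1", 5), ("u1", "m2", 3),
    ("u2", "m1", 4), ("u2", "m2", 2),
    ("u3", "m1", 4),
]

global_mean = sum(r for _, _, r in ratings) / len(ratings)  # 3.6

def offsets(key_index):
    """Average deviation from the global mean, keyed on user (0) or movie (1)."""
    sums, counts = {}, {}
    for row in ratings:
        k, r = row[key_index], row[2]
        sums[k] = sums.get(k, 0.0) + (r - global_mean)
        counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}

user_offset = offsets(0)   # how generous each user is
movie_offset = offsets(1)  # how well-liked each movie is

def predict(user, movie):
    # global mean, adjusted for the movie's quality and the user's generosity
    return (global_mean
            + movie_offset.get(movie, 0.0)
            + user_offset.get(user, 0.0))

print(predict("u3", "m2"))  # 3.6 - 1.1 + 0.4 = 2.9 stars
```

The point of the comment holds: a few lines like these capture most of the easy signal, and the prize requires clawing out the last stubborn 10% beyond a tuned baseline.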
sense
This movie strikes me as a word-of-mouth movie that’s either a letdown or a surprise, and I think that explains the ratings, and also what makes it hard to predict.
No matter what aspect of the movie, its actors, or its characters you look at, you don’t find anything compelling that would make more than a small subset of the population want to see it on that basis alone. It was promoted purely by word of mouth.
That means most people who saw it did so ONLY because they heard it was really good. So most either agreed and gave it five stars or were sorely disappointed and gave it one or two.
That differs from movies that appeal to and attract a large portion of the audience before word of mouth gets into the mix. Those folks provide a lot of the middle ratings.
can't predict my movies!
I have watched movies that were awesome because of the friends I had around and/or the alcohol on hand. There are movies I’ve seen that I know I would not have liked if I’d been in the wrong mood. Brainy, thought-provoking movies could be what I want to watch one day, but the next I’ll be popping in a brainless action movie with over-the-top explosions and one-liners.
Newton's got nothing on intuition
The weak point of any A.I. is that we don’t even fully understand human intelligence, let alone how to mimic it. Human understanding works from three sources: logic, emotion, and intuition. We’ve got the logic one down pretty well, and we’re even making strides in understanding emotion, but intuition is still a shot in the dark. Taste comes from the realm of intuition, so if you want even a shot at the answer, drop the Newtonian particle physics and enter the wild world of wave mechanics. Hey, why not base it on resonant frequencies of sympathetic circuits? Who knows, you might actually be in the ballpark then.
Tastes change, even for the same person; the audience with whom you’re watching, the time you watch the movie, and the environment can all affect what you like or don’t like. Further, it’s possible to watch a movie once, love it, and then the second time around, despise it. So good luck on the algorithm, boyz… 🙂
Well, I thought about doing this competition, but realized that my quantum difference engine algorithm was actually worth much more than a million bucks…
Actually, it’s a failed contest premise. The data in NF’s current rating system is inadequate, so achieving even a 10% improvement won’t meet the needs of their customers. They need to revisit what data is considered relevant to recommendation. Huge room for new work here.
based on personal experience...
I had a fairly lukewarm reaction to Napoleon Dynamite…
until one of my teenage sons started quoting incessantly from that movie. Now I *HATE* it….
Idiots…
🙂
Obvious solution
Is it just me or does anyone else see the obvious solution as being provided by Network Theory?
Ratings by different family members
Another problem for an algorithm is that in a family that shares a NF account, there will be different tastes, so you’ll get inconsistent ratings.
do the chickens have large talons?