« Fighting the dominant paradigm | Main | Hierarchy in networks »

October 07, 2006

The Netflix Prize

The big news in Computer Science this week is the announcement of the Netflix Prize competition, which is to design a whole new way of generating recommendations. NPR has an interview with James Bennett (of Netflix) about it, and the NYTimes broke the story.The game here is that Netflix provides you with a sample of its database of movie ratings by users (anonymized). Your job is to come up with a new recommendation algorithm that beats their in-house approach by at least 10%. If you win, they'll pay out a cool million USD.

There are a couple of exciting things about this competition. First, the field of collaborative filtering (CF) has apparently been squeezed dry of fresh ideas -- ideas that Amazon and Netflix have already implemented, tweaked and adjusted to make their respective recommendation systems -- Netflix wants to use the competition as a way to get the creative juices flowing again. Although I'm certainly not an expert on CF, nor am I aware of the latest techniques in this area, I sure hope that the ML community (who will probably be most interested in this competition) does more than simply try to solve yet another problem with Bayes nets, SVMs and their ilk. For a fairly narrow problem like this, it seems that some careful thinking might produce something significantly more compelling than these techniques. Lance Fortnow and John Langford both seem to also feel this way. John, who works in machine learning, seems to think that the "progress prize" of $50k (what you get if you're the best so-far, but not above the 10% threshold) is achievable with some careful thought.

The second thing is the data set itself, which is huge. For the collaborative filtering people, it's easily 100x bigger than the previous best public data set. Closer to home for me, the Netflix data can easily be turned into a bipartite network, with people on one side and movies on the other. Bipartite networks are old news themselves, but what's cool about this one is that the data is given as a series of tuples {userID, movieID, date, rating}, so it should be possible to watch how the network has grown over time. The temporal information alone is a big enough addition to the complex networks data store to produce a slew of new papers from applying the old tools as a function of time. But I think that we can ask more interesting questions than that. For instance, can our network analysis techniques be extended to explicitly account for the temporal information? By that, I mean, to account for the temporal behavior of nodes in time (velocity, acceleration, etc.). And, what would it mean to do community structure inference on a bipartite network in a meaningful way?

posted October 7, 2006 12:01 AM in Computer Science | permalink

Comments

Aha,it's your time. Community structure is closer to the problem in essence than others and may provide abundant new ideas.
ps:Thank you for your introduction of "Visualizing biology". My younger sisiter love it very much and share it with her college students actively.

Posted by: succc at October 10, 2006 06:28 AM