Today's New York Times has a story about Netflix, the movie rental company, and how it awarded a $1 million prize for a statistical model. The company decided its million-dollar competition was such a good investment that it is planning another one.
The company’s challenge, begun in October 2006, was to come up with recommendation software that could predict which movies customers would like more accurately than Netflix’s in-house software, Cinematch. To qualify for the prize, entries had to be at least 10 percent better than Cinematch.
The data set for the first contest was 100 million movie ratings, with the personally identifying information stripped off. Contestants worked with the data to try to predict which movies particular customers would prefer, and their predictions were then compared with the ratings those customers later gave the movies, on a scale of one to five stars.
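In the first contest, accuracy was scored as root mean squared error (RMSE) between predicted and actual star ratings, so "10 percent better than Cinematch" meant an RMSE at least 10 percent lower than Cinematch's. The short Python sketch below shows that arithmetic. The function names and the five ratings are made up for illustration and are not Netflix data.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual star ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def improvement(candidate_rmse, baseline_rmse):
    """Fractional reduction in RMSE relative to the baseline model."""
    return (baseline_rmse - candidate_rmse) / baseline_rmse


# Hypothetical toy ratings on the 1-to-5-star scale (not real Netflix data).
actual = [4, 3, 5, 2, 1]
baseline = [3.5, 3.5, 4.0, 3.0, 2.0]   # stands in for the incumbent system
candidate = [4.0, 3.0, 4.5, 2.5, 1.5]  # stands in for a contest entry

b = rmse(baseline, actual)
c = rmse(candidate, actual)
print(f"baseline RMSE {b:.3f}, candidate RMSE {c:.3f}")
# An entry qualifies only if it cuts RMSE by at least 10 percent.
print("qualifies:", improvement(c, b) >= 0.10)
```

Because RMSE squares each error before averaging, a few badly wrong predictions cost more than many slightly wrong ones.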
The new contest is going to present the contestants with demographic and behavioral data, and they will be asked to model individuals’ “taste profiles,” the company said. The data set of more than 100 million entries will include information about renters’ ages, gender, ZIP codes, genre ratings and previously chosen movies. Unlike the first challenge, the contest will have no specific accuracy target. Instead, $500,000 will be awarded to the team in the lead after six months, and $500,000 to the leader after 18 months.
You might need Stata SE for that dataset.