Thursday, May 14, 2009

Pearson Correlation of Netflix Prize

This week I started learning about Pearson correlation and KNN for the Netflix Prize, and I have summarized the related information below. My biggest question is about running time (it comes up again; in my previous SVD algorithm, running time was a big issue). Somebody said that all movie-to-movie Pearson correlations (17770*17770, or about 316 million correlations) can be computed within 2 minutes (I know you only need to calculate the upper triangle of the correlation matrix). Two minutes :( Is that just to catch people's eyes? My stupid brute-force program needs 7 hours to calculate the correlations for only one movie (Movie 30, Something's Gotta Give; file size 2.3 MB; 118413 total ratings), so honestly, I don't 100% believe you can calculate them all within 2 minutes.
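
To make the calculation concrete, here is a minimal Python sketch of the brute-force idea: the Pearson correlation of two movies over the users who rated both, and a loop over the upper triangle only (for 17770 movies that is 17770*17769/2, about 158 million unique pairs). The dict-of-dicts layout and the names `pearson` and `upper_triangle` are my own assumptions for illustration; this is not the code from the linked posts, and it is nowhere near the 2-minute version.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson correlation of two movies over the users who rated both.

    ratings_a, ratings_b: dicts mapping user id -> rating (assumed layout).
    Returns 0.0 when fewer than two users overlap or a variance is zero.
    """
    # Walk the smaller dict and look users up in the larger one.
    # Pearson is symmetric, so swapping the arguments is harmless.
    if len(ratings_a) > len(ratings_b):
        ratings_a, ratings_b = ratings_b, ratings_a

    n = 0
    sum_x = sum_y = sum_xx = sum_yy = sum_xy = 0.0
    for user, x in ratings_a.items():
        y = ratings_b.get(user)
        if y is None:
            continue  # this user did not rate both movies
        n += 1
        sum_x += x
        sum_y += y
        sum_xx += x * x
        sum_yy += y * y
        sum_xy += x * y

    if n < 2:
        return 0.0
    cov = n * sum_xy - sum_x * sum_y
    var_x = n * sum_xx - sum_x * sum_x
    var_y = n * sum_yy - sum_y * sum_y
    if var_x <= 0 or var_y <= 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def upper_triangle(movies):
    """Correlate every pair (a, b) with a < b; the matrix is symmetric."""
    ids = sorted(movies)
    result = {}
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            result[(a, b)] = pearson(movies[a], movies[b])
    return result


# Tiny made-up example: movie id -> {user id: rating}
movies = {
    1: {"u1": 1, "u2": 2, "u3": 3},
    2: {"u1": 2, "u2": 4, "u3": 6, "u4": 5},
    3: {"u1": 3, "u2": 2, "u3": 1},
}
correlations = upper_triangle(movies)
```

Each pair needs only six running sums over the common users, so the cost is dominated by finding the overlap between the two rating lists, which is probably where a brute-force program like mine loses most of its time.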

Calculating 316 million movie correlations in 2 minutes (down from 2.5 hours)
Calculating 316 million movie correlations in 2 minutes (down from 2.5 hours), Updated on June/11/2009

Fast way to compute correlations?
My kNN source code for Icefox's framework

5 comments:

  1. Kadence's code for Icefox's framework can finish all the Pearson coefficients in around 10 hours, amazing :)

  2. This comment has been removed by the author.

  3. This comment has been removed by the author.

  4. Yes I do have it running under 2 minutes, and that's not even on a fast PC by today's standards.

    But it's only the correlation calculation time, and does not include prediction time.

  5. Wow, Newman, thanks for your comment.
    I think you updated your blog with more detail, right?
    I will definitely read it again. Great share.
