Saturday, April 18, 2009

Netflix contest Numbers

Total Movie Number = 17,770 (MovieIDs range from 1 to 17,770 sequentially) ~ 18 thousand
Total Customer Number = 480,189 users (CustomerIDs range from 1 to 2,649,429, with gaps)
Rating Range = Ratings are on a five star (integral) scale from 1 to 5.

Total Rating Number = 17770*480189 = 8,532,958,530 ~ 8.5 billion entries
Total Training set Number = 100,480,507 ~ 100 million
[Netflix README: the movie rating files contain over 100 million ratings]
We have 100M entries and 8.4B empty cells. 100 million / 8.5 billion ~ 0.012, so only about 1.2% of this ginormous 8.5 billion cell matrix is filled in.
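
The arithmetic above can be checked with a few lines of Python. This is only a sanity-check sketch; the variable names are my own, and the three input numbers are the ones quoted in this post.

```python
# Figures quoted in this post (variable names are illustrative only).
n_movies = 17_770
n_users = 480_189
n_ratings = 100_480_507

# Size of the full movie x user matrix, and how much of it is filled.
total_cells = n_movies * n_users
empty_cells = total_cells - n_ratings
density = n_ratings / total_cells

print(f"total cells: {total_cells:,}")
print(f"empty cells: {empty_cells:,}")
print(f"density:     {density:.2%}")
```

The density comes out slightly above 1%, which is why the training data is usually treated as a very sparse matrix rather than stored as a dense 17,770 x 480,189 array.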

How many entries need to be predicted in "qualifying.txt"?
The total line count of "qualifying.txt" is 2,834,601 ~ 2.8 million (the MovieID header lines still need to be subtracted to get the number of predictions).
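
One way to do that subtraction is to count the two kinds of lines separately. The sketch below assumes each movie block in "qualifying.txt" starts with a header line of the form `MovieID:` and is followed by `CustomerID,Date` lines; the function name and the sample rows are made up for illustration and are not taken from the real file.

```python
def count_qualifying(lines):
    """Count movie header lines and prediction lines separately.

    Assumes header lines end with ':' and every other non-empty
    line is one (customer, date) pair to predict.
    """
    movies = 0
    predictions = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.endswith(":"):
            movies += 1
        else:
            predictions += 1
    return movies, predictions


# Made-up sample in the assumed format, not real contest data.
sample = [
    "1:",
    "1046323,2005-12-19",
    "1080030,2005-12-23",
    "10:",
    "12868,2004-07-21",
]
sample_counts = count_qualifying(sample)
print(sample_counts)

# On the real file this would be something like:
# with open("qualifying.txt") as f:
#     movies, predictions = count_qualifying(f)
```

The two counts should add up to the total line count of the file, so the number of predictions is the line count minus the number of header lines.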
