Wednesday, April 29, 2009
NAPOLEON DYNAMITE for Netflix Prize
If You Liked This, You’re Sure to Love That
By CLIVE THOMPSON
Published: November 23, 2008, New York Times
The reason, Bertoni says, is that “Napoleon Dynamite” is very weird and very polarizing. It contains a lot of arch, ironic humor, including a famously kooky dance performed by the titular teenage character to help his hapless friend win a student-council election. It’s the type of quirky entertainment that tends to be either loved or despised. The movie has been rated more than two million times in the Netflix database, and the ratings are disproportionately one or five stars.
Mathematically speaking, “Napoleon Dynamite” is a very significant problem for the Netflix Prize.
I draw a matlab graph for this "Napoleon Dynamite" base on my 3 prediction files, it did is a polarizing movie. ha ha, amazing.By the way, in "qualificaiton.txt" file, we have 10551 prediction entries of "Napoleon Dynamite" need to be guessed. In other words, we have 10551 weird behavior kids need to be controlled well, Wow..
If we can’t understand our own preferences, can computers really be any better at it?
Matlab Command
plot(Small_x,svdMix3151(1:100),'g-o',Small_x,userAVG3151(1:100),'r-x',Small_x,movie3151AVG(1:100),'k:');
legend('svd','useraverage','movieaverage');
title('Movie3151(2004,Napoleon Dynamite) first 100 prediction comparison');
xlabel('User Id(not sequential)');
ylabel('Movie Rating - predict value');
Tuesday, April 28, 2009
Tweak SVD V1
Thursday, April 23, 2009
Movie 1(2003,Dinosaur Planet) Predict comparisons
MATLAB:
plot(x,movie1_svd,'g',x,useraverage,'r--',x,movieaverage,'k:');
legend('svd','useraverage','movieaverage');
title('Movie1(2003,Dinosaur Planet) prediction comparison');
xlabel('User Id(not sequential)');
ylabel('Movie Rating - predict value');
I don't know why I have this idea to plot this diagram, just want to know the relation of SVD, useraverage, movieaverage.
Python vs C++
After 10 more days experiment, python's poor performance is the reason for my to change the language to C++/C, I found some information about how to write a C extension for python, but it is not easy as the description. But the advantage of python is the concise code style, tons of module to use, it is a double side sword, easy coding make poor running speed. If you need performance, use C, if you need easy coding, use python. It should be my wrong choice at beginning, at least I learn a new language and some experience.
By the way, thanks Turnbeutel and TD's sharing, I ran their code for 22 hours, time consuming project
120 = MIN_EPOCHS
200 = MAX_EPOCHS
64 = FEATURES
3000 counts RMSE = 0.783169(y value)
6000 counts RMSE = 0.7491227
Final RMSE = 0.735715
1 count make 7.5 seconds(By Simon Funk's blog, yes, it real cost around 7.5 seconds, amazing)
min epochs = 120, at least, it will cost 120*7.5 = 900 seconds = 15 minutes for 1 feature
features = 64, so we total need 64*15=960 minutes = 16 hours, but it is the minimal, my wee ibmx60s has 3G ram, 1.6 dual core cpu, spent almost 22 hours to finish the running.
Another method to calculate the total time
total count = 7680, each count spends 7.5s, 7680*7.5=57,600 second = 960 minutes = 16 hours
By the way, thanks Turnbeutel and TD's sharing, I ran their code for 22 hours, time consuming project
120 = MIN_EPOCHS
200 = MAX_EPOCHS
64 = FEATURES
3000 counts RMSE = 0.783169(y value)
6000 counts RMSE = 0.7491227
Final RMSE = 0.735715
1 count make 7.5 seconds(By Simon Funk's blog, yes, it real cost around 7.5 seconds, amazing)
min epochs = 120, at least, it will cost 120*7.5 = 900 seconds = 15 minutes for 1 feature
features = 64, so we total need 64*15=960 minutes = 16 hours, but it is the minimal, my wee ibmx60s has 3G ram, 1.6 dual core cpu, spent almost 22 hours to finish the running.
Another method to calculate the total time
total count = 7680, each count spends 7.5s, 7680*7.5=57,600 second = 960 minutes = 16 hours
Tuesday, April 21, 2009
Install Boost at Ubuntu
3.1 Basic Procedure
1.Get Boost; see sections 1 and 2 [Unix/Linux, Windows] of the Boost Getting Started Guide.
2.Get the bjam build driver. See section 5 [Unix/Linux, Windows] of the Boost Getting Started Guide.
3.cd into the libs/python/example/quickstart/ directory of your Boost installation, which contains a small example project.
4.Invoke bjam. Replace the “stage“ argument from the example invocation from section 5 of the Getting Started Guide with “test,“ to build all the test targets. Also add the argument “--verbose-test” to see the output generated by the tests when they are run.
On Windows, your bjam invocation might look something like:
C:\boost_1_34_0\…\quickstart> bjam toolset=msvc --verbose-test test
and on Unix variants, perhaps,
~/boost_1_34_0/…/quickstart$ bjam toolset=gcc --verbose-test test
1.Get Boost; see sections 1 and 2 [Unix/Linux, Windows] of the Boost Getting Started Guide.
2.Get the bjam build driver. See section 5 [Unix/Linux, Windows] of the Boost Getting Started Guide.
3.cd into the libs/python/example/quickstart/ directory of your Boost installation, which contains a small example project.
4.Invoke bjam. Replace the “stage“ argument from the example invocation from section 5 of the Getting Started Guide with “test,“ to build all the test targets. Also add the argument “--verbose-test” to see the output generated by the tests when they are run.
On Windows, your bjam invocation might look something like:
C:\boost_1_34_0\…\quickstart> bjam toolset=msvc --verbose-test test
and on Unix variants, perhaps,
~/boost_1_34_0/…/quickstart$ bjam toolset=gcc --verbose-test test
Monday, April 20, 2009
what is the "steroids" of python?
Psyco Home
Make Python run as fast as C with Psyco
Recently, I try to increase the python running speed, found Psyco module, but don't have the "steroids" effect, or should make a C extension to Python?
Extending and Embedding the Python Interpreter
The Python Tutorial
Make Python run as fast as C with Psyco
Recently, I try to increase the python running speed, found Psyco module, but don't have the "steroids" effect, or should make a C extension to Python?
Extending and Embedding the Python Interpreter
The Python Tutorial
Sunday, April 19, 2009
Learning about top wine variety
1.Chardonnay
Chardonnay is a green-skinned grape variety used to make white wine. It is believed to have originated in the Burgundy wine region of eastern France but is now grown wherever wine is produced, from England to New Zealand. For new and developing wine regions, growing Chardonnay is seen as a "rite of passage" and an easy segue into the international wine market.
2.Pinot noir
Pinot noir (IPA: [pi.no.'nwaʁ]) is a red wine grape variety of the species Vitis vinifera. The name may also refer to wines produced predominantly from Pinot noir grapes. The name is derived from the French words for "pine" and "black" alluding to the varietals' tightly clustered dark purple pine cone-shaped bunches of fruit.
3.Merlot
Merlot is a red wine grape that is used as both a blending grape and for varietal wines. Merlot-based wines usually have medium body with hints of berry, plum, and currant. Its softness and "fleshiness", combined with its earlier ripening, makes Merlot a popular grape for blending with the sterner, later-ripening Cabernet Sauvignon, which tends to be higher in tannin.
4.Riesling
Riesling is a white grape variety which originates in the Rhine region of Germany. Riesling is an aromatic grape variety displaying flowery, almost perfumed, aromas as well as high acidity. It is used to make dry, semi-sweet, sweet and sparkling white wines.
5.Shiraz
Syrah is a dark-skinned grape grown throughout the world and used primarily to produce powerful red wines. Syrahs enjoy great popularity in the marketplace, relatively often under the name Shiraz.
6.Cabernet Sauvignon
Cabernet Sauvignon is one of the world's most widely recognized red wine grape varieties. It is grown in nearly every major wine producing country among a diverse spectrum of climates from Canada's Okanagan Valley to Lebanon's Beqaa Valley. Cabernet Sauvignon became internationally recognized through its prominence in Bordeaux wines where it is often blended with Merlot and Cabernet franc.
Chardonnay is a green-skinned grape variety used to make white wine. It is believed to have originated in the Burgundy wine region of eastern France but is now grown wherever wine is produced, from England to New Zealand. For new and developing wine regions, growing Chardonnay is seen as a "rite of passage" and an easy segue into the international wine market.
2.Pinot noir
Pinot noir (IPA: [pi.no.'nwaʁ]) is a red wine grape variety of the species Vitis vinifera. The name may also refer to wines produced predominantly from Pinot noir grapes. The name is derived from the French words for "pine" and "black" alluding to the varietals' tightly clustered dark purple pine cone-shaped bunches of fruit.
3.Merlot
Merlot is a red wine grape that is used as both a blending grape and for varietal wines. Merlot-based wines usually have medium body with hints of berry, plum, and currant. Its softness and "fleshiness", combined with its earlier ripening, makes Merlot a popular grape for blending with the sterner, later-ripening Cabernet Sauvignon, which tends to be higher in tannin.
4.Riesling
Riesling is a white grape variety which originates in the Rhine region of Germany. Riesling is an aromatic grape variety displaying flowery, almost perfumed, aromas as well as high acidity. It is used to make dry, semi-sweet, sweet and sparkling white wines.
5.Shiraz
Syrah is a dark-skinned grape grown throughout the world and used primarily to produce powerful red wines. Syrahs enjoy great popularity in the marketplace, relatively often under the name Shiraz.
6.Cabernet Sauvignon
Cabernet Sauvignon is one of the world's most widely recognized red wine grape varieties. It is grown in nearly every major wine producing country among a diverse spectrum of climates from Canada's Okanagan Valley to Lebanon's Beqaa Valley. Cabernet Sauvignon became internationally recognized through its prominence in Bordeaux wines where it is often blended with Merlot and Cabernet franc.
Saturday, April 18, 2009
Netflix contest Numbers
Total Movie Number = MovieIDs range from 1 to 17,770 sequentially ~ 17 thousands
Total Customer Number = CustomerIDs range from 1 to 2649429, with gaps. There are 480,189 users.
Rating Range = Ratings are on a five star (integral) scale from 1 to 5.
Total Rating Number = 17770*480189 = 8,532,958,530 ~ 8.5 billion entries
Total Training set Number = 100,480,507 ~ 100 million
[The movie rating files contain over 100 million ratings from Netlfix README]
we have 100M entries and 8.4B empty cells, 100millsion/8.5billion ~ 0.01, only have 1% entries of this ginormous 8.5 billion entries matrix
how many entries need to predict in "qualifying.txt"??
the total line of "qualifying.txt" is 2,834,601(needs to minus movieID) ~ 2.8 million
Total Customer Number = CustomerIDs range from 1 to 2649429, with gaps. There are 480,189 users.
Rating Range = Ratings are on a five star (integral) scale from 1 to 5.
Total Rating Number = 17770*480189 = 8,532,958,530 ~ 8.5 billion entries
Total Training set Number = 100,480,507 ~ 100 million
[The movie rating files contain over 100 million ratings from Netlfix README]
we have 100M entries and 8.4B empty cells, 100millsion/8.5billion ~ 0.01, only have 1% entries of this ginormous 8.5 billion entries matrix
how many entries need to predict in "qualifying.txt"??
the total line of "qualifying.txt" is 2,834,601(needs to minus movieID) ~ 2.8 million
Thursday, April 16, 2009
AI Season 8 - Adam Lambert
Adam Lambert, a name you wouldn't forget for American Idol Season 8, honestly, from my personal perspective, he will be champion of this season, he is in different level comparing with other contester, the biggest difference between he and other competitor is he can sing any style of song, and sing perfectly to suit any type, unmatchable with others. He must be the winner this AI season.
Wednesday, April 15, 2009
SVD - singular value decomposition
thanks Simon Funk's blog, he makes SVD looks like so simple, awesome!!
you can combine pyflix and nullity's SVD code together to make a netflix prize prediction of SVD algorithm, cool, honestly, I think this SVD algorithms can beat a lot of recommendation system in the world.
nullity's SVD on pyflix
you can combine pyflix and nullity's SVD code together to make a netflix prize prediction of SVD algorithm, cool, honestly, I think this SVD algorithms can beat a lot of recommendation system in the world.
nullity's SVD on pyflix
Tuesday, April 14, 2009
Netflix Prize Blogs
I summary some very interesting blogs about this recommendation contest..
The rank of Netflix Prize (04/14/2009)
Pragmatic Theory - No.1
We are two simple guys with no academic background in either machine learning or mathematics. We both have bachelors degrees in engineering (software and electrical). We work full time as computer engineers in a totally unrelated field (telecommunications).
Just a guy in a garage - No.19
not an academic or a mathematican. However, being an unemployed psychologist I do have a bit of time.
Simon Funk,Try This at Home - No.193
Simon Funk is a pen name, having obtained a BA from UCSD at the age of 18.
timelydevelopment - No.652
Timely Development is an Atlanta-based provider of software and services.
The rank of Netflix Prize (04/14/2009)
Pragmatic Theory - No.1
We are two simple guys with no academic background in either machine learning or mathematics. We both have bachelors degrees in engineering (software and electrical). We work full time as computer engineers in a totally unrelated field (telecommunications).
Just a guy in a garage - No.19
not an academic or a mathematican. However, being an unemployed psychologist I do have a bit of time.
Simon Funk,Try This at Home - No.193
Simon Funk is a pen name, having obtained a BA from UCSD at the age of 18.
timelydevelopment - No.652
Timely Development is an Atlanta-based provider of software and services.
Monday, April 13, 2009
Pyflix - 04/13/2009
Basic algorithms
====================================================================================
By Bold Raved Tithe
The best I achieved on the probe with predictions removed is:
RMSE 0.9897
For the following equation:
R = -2.58797372 + 0.95682133*M + 0.7606549*U
Where:
R = predicted rating
M = movie average
U = user average
====================================================================================
pyflix
class Mix(MovieAverage,UserAverage):
'''Returns the average of L{MovieAverage} and L{UserAverage}.
This algorithm returns an RMSE score of ?? on the scrubbed dataset.
'''
def __call__(self, movie_id, user_id):
return (
-2.58797372 + 0.95682133*MovieAverage.__call__(self,movie_id,user_id) +
0.7606549*UserAverage.__call__(self,movie_id,user_id)
)
====================================================================================
Check this interesting article
This Psychologist Might Outsmart the Math Brains Competing for the Netflix Prize
====================================================================================
By Bold Raved Tithe
The best I achieved on the probe with predictions removed is:
RMSE 0.9897
For the following equation:
R = -2.58797372 + 0.95682133*M + 0.7606549*U
Where:
R = predicted rating
M = movie average
U = user average
====================================================================================
pyflix
class Mix(MovieAverage,UserAverage):
'''Returns the average of L{MovieAverage} and L{UserAverage}.
This algorithm returns an RMSE score of ?? on the scrubbed dataset.
'''
def __call__(self, movie_id, user_id):
return (
-2.58797372 + 0.95682133*MovieAverage.__call__(self,movie_id,user_id) +
0.7606549*UserAverage.__call__(self,movie_id,user_id)
)
====================================================================================
Check this interesting article
This Psychologist Might Outsmart the Math Brains Competing for the Netflix Prize
Sunday, April 12, 2009
Pyflix - 04/12/2009
Pyflix
If you put your algorithm in a module under the algorithms package (e.g. pyflix.algorithms.myalg.py), you can test it on the probe set and get its RMSE by running:
python pyflix/algorithms/run.py myalg.MyFancyAlgorithm path/to/database
===========================================================================
Under my environment
E:\pyflix-0.1>python pyflix/algorithms/run.py average.DoubleAverage e:\pyflix-0.
1\database
E:\pyflix-0.1\pyflix\_path.py:37: DeprecationWarning: the md5 module is deprecat
ed; use hashlib instead
import sys, warnings, os, fnmatch, glob, shutil, codecs, md5
[********************************************************************] 100%
Computed RMSE in 41 minutes and 56.5 seconds
average.DoubleAverage: RMSE=1.0158
===========================================================================
how to generate a prediction file by DoubleAverage algorithm
modify run.py
(1) beginning
from pyflix.algorithms import Algorithm
from pyflix.algorithms.average import DoubleAverage
(2) main function
rmse = timeCall('Computed RMSE', probe_set.rmse, model, progressbar=True)
print '%s.%s: RMSE=%.4f' % (algorithm.__module__, algorithm.__name__, rmse)
a = DoubleAverage(RatedDataset(basedir/'training_set'))
a.writePredictions('E:/np/download/qualifying.txt','DoubleAverage_pre.txt')
(3) winxp cmd
E:\pyflix-0.1>python pyflix/algorithms/run.py average.DoubleAverage e:\pyflix-0.
1\database
E:\pyflix-0.1\pyflix\_path.py:37: DeprecationWarning: the md5 module is deprecat
ed; use hashlib instead
import sys, warnings, os, fnmatch, glob, shutil, codecs, md5
[********************************************************************] 100%
Computed RMSE in 3 minutes and 59.2 seconds
average.DoubleAverage: RMSE=1.0158
[********************************************************************] 100%
==================================================================================
Submit your prediction file
(1) gzip or zip your file
(2) check the MD5
(3) submit your result (my firefox can't submit, only IE can, weird??)
If you put your algorithm in a module under the algorithms package (e.g. pyflix.algorithms.myalg.py), you can test it on the probe set and get its RMSE by running:
python pyflix/algorithms/run.py myalg.MyFancyAlgorithm path/to/database
===========================================================================
Under my environment
E:\pyflix-0.1>python pyflix/algorithms/run.py average.DoubleAverage e:\pyflix-0.
1\database
E:\pyflix-0.1\pyflix\_path.py:37: DeprecationWarning: the md5 module is deprecat
ed; use hashlib instead
import sys, warnings, os, fnmatch, glob, shutil, codecs, md5
[********************************************************************] 100%
Computed RMSE in 41 minutes and 56.5 seconds
average.DoubleAverage: RMSE=1.0158
===========================================================================
how to generate a prediction file by DoubleAverage algorithm
modify run.py
(1) beginning
from pyflix.algorithms import Algorithm
from pyflix.algorithms.average import DoubleAverage
(2) main function
rmse = timeCall('Computed RMSE', probe_set.rmse, model, progressbar=True)
print '%s.%s: RMSE=%.4f' % (algorithm.__module__, algorithm.__name__, rmse)
a = DoubleAverage(RatedDataset(basedir/'training_set'))
a.writePredictions('E:/np/download/qualifying.txt','DoubleAverage_pre.txt')
(3) winxp cmd
E:\pyflix-0.1>python pyflix/algorithms/run.py average.DoubleAverage e:\pyflix-0.
1\database
E:\pyflix-0.1\pyflix\_path.py:37: DeprecationWarning: the md5 module is deprecat
ed; use hashlib instead
import sys, warnings, os, fnmatch, glob, shutil, codecs, md5
[********************************************************************] 100%
Computed RMSE in 3 minutes and 59.2 seconds
average.DoubleAverage: RMSE=1.0158
[********************************************************************] 100%
==================================================================================
Submit your prediction file
(1) gzip or zip your file
(2) check the MD5
(3) submit your result (my firefox can't submit, only IE can, weird??)
Saturday, April 11, 2009
Pyflix - 04/11/2009
(1) from pyflix.datasets import RatedDataset
you need to be the father directory (in my case, I am at /media/80G/np folder)
(2) then goto python command line
(3) the this "from pyflix.datasets import RatedDataset" will works
=> 奇怪的python directory convention
Basic usage
Pyflix comes with full documentation under docs/api/index.html, created with the excellent Epydoc package. Below is a sample interactive session to demonstrate the dataset API:
======================================================================================
Python 2.4.2 (#1, Mar 8 2006, 13:24:00)
[GCC 3.4.4 20050721 (Red Hat 3.4.4-2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> # the main module you'll need is pyflix.datasets
... from pyflix.datasets import RatedDataset
you need to be the father directory (in my case, I am at /media/80G/np folder)
(2) then goto python command line
(3) the this "from pyflix.datasets import RatedDataset" will works
=> 奇怪的python directory convention
Basic usage
Pyflix comes with full documentation under docs/api/index.html, created with the excellent Epydoc package. Below is a sample interactive session to demonstrate the dataset API:
======================================================================================
Python 2.4.2 (#1, Mar 8 2006, 13:24:00)
[GCC 3.4.4 20050721 (Red Hat 3.4.4-2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> # the main module you'll need is pyflix.datasets
... from pyflix.datasets import RatedDataset
Pyflix - 04/10/2009
Pyflix
Error Message of pyflix
==================================================
Traceback (most recent call last):
File "pyflix/setup.py", line 13, in
from pyflix import timeCall
ImportError: No module named pyflix
==================================================
the better solution should understand the module loading method of python, then change the module name of pyflix
the module name needs to change, my temp solution is
(1) add a new file init.py (which was copied from __init__.py)
(2) change setup.py code
#Old
from pyflix import timeCall
from pyflix._path import path
from pyflix.records import *
from pyflix.progressbar import ProgressBar, progress
#New
from init import timeCall
from _path import path
from records import *
from progressbar import ProgressBar, progress
===========================================================
michael@michael-laptop:/media/80G/np/pyflix$ python setup.py -i /media/80G/np/download/ -o database
* Reading /media/80G/np/download/probe.txt...
[********************************************************************] 100%
* Building rating files...
[********************************************************************] 100%
4544779 training set entries rated 1
9995998 training set entries rated 2
28458811 training set entries rated 3
33288865 training set entries rated 4
22783659 training set entries rated 5
73211 probe set entries rated 1
136082 probe set entries rated 2
352436 probe set entries rated 3
462093 probe set entries rated 4
384573 probe set entries rated 5
Split files in 19 minutes and 49.5 seconds
* Sorting database/training_set/1.tmp, database/training_set/2.tmp, database/training_set/3.tmp, database/training_set/4.tmp, database/training_set/5.tmp...
Sorted files in 4 minutes and 21.5 seconds
* Merging sorted files from database/training_set/sorted_by_movie...
[********************************************************************] 100%
Merged movies in 16 minutes and 40.4 seconds
* Merging sorted files from database/training_set/sorted_by_user...
[********************************************************************] 100%
Merged users in 20 minutes and 18.1 seconds
* Sorting database/probe_set/1.tmp, database/probe_set/2.tmp, database/probe_set/3.tmp, database/probe_set/4.tmp, database/probe_set/5.tmp...
Sorted files in 0 minutes and 3.3 seconds
* Merging sorted files from database/probe_set/sorted_by_movie...
[********************************************************************] 100%
Merged movies in 0 minutes and 19.7 seconds
* Merging sorted files from database/probe_set/sorted_by_user...
[********************************************************************] 100%
Merged users in 3 minutes and 30.0 seconds
* Completed successfully in 65 minutes and 2.5 seconds
Error Message of pyflix
==================================================
Traceback (most recent call last):
File "pyflix/setup.py", line 13, in
from pyflix import timeCall
ImportError: No module named pyflix
==================================================
the better solution should understand the module loading method of python, then change the module name of pyflix
the module name needs to change, my temp solution is
(1) add a new file init.py (which was copied from __init__.py)
(2) change setup.py code
#Old
from pyflix import timeCall
from pyflix._path import path
from pyflix.records import *
from pyflix.progressbar import ProgressBar, progress
#New
from init import timeCall
from _path import path
from records import *
from progressbar import ProgressBar, progress
===========================================================
michael@michael-laptop:/media/80G/np/pyflix$ python setup.py -i /media/80G/np/download/ -o database
* Reading /media/80G/np/download/probe.txt...
[********************************************************************] 100%
* Building rating files...
[********************************************************************] 100%
4544779 training set entries rated 1
9995998 training set entries rated 2
28458811 training set entries rated 3
33288865 training set entries rated 4
22783659 training set entries rated 5
73211 probe set entries rated 1
136082 probe set entries rated 2
352436 probe set entries rated 3
462093 probe set entries rated 4
384573 probe set entries rated 5
Split files in 19 minutes and 49.5 seconds
* Sorting database/training_set/1.tmp, database/training_set/2.tmp, database/training_set/3.tmp, database/training_set/4.tmp, database/training_set/5.tmp...
Sorted files in 4 minutes and 21.5 seconds
* Merging sorted files from database/training_set/sorted_by_movie...
[********************************************************************] 100%
Merged movies in 16 minutes and 40.4 seconds
* Merging sorted files from database/training_set/sorted_by_user...
[********************************************************************] 100%
Merged users in 20 minutes and 18.1 seconds
* Sorting database/probe_set/1.tmp, database/probe_set/2.tmp, database/probe_set/3.tmp, database/probe_set/4.tmp, database/probe_set/5.tmp...
Sorted files in 0 minutes and 3.3 seconds
* Merging sorted files from database/probe_set/sorted_by_movie...
[********************************************************************] 100%
Merged movies in 0 minutes and 19.7 seconds
* Merging sorted files from database/probe_set/sorted_by_user...
[********************************************************************] 100%
Merged users in 3 minutes and 30.0 seconds
* Completed successfully in 65 minutes and 2.5 seconds
Thursday, April 9, 2009
City Data Website
City Data
Stats about all US cities - real estate, relocation info, house prices, home value estimator, recent sales, cost of living, crime, race, income, photos, education, maps, weather, houses, schools, neighborhoods, and more...
Stats about all US cities - real estate, relocation info, house prices, home value estimator, recent sales, cost of living, crime, race, income, photos, education, maps, weather, houses, schools, neighborhoods, and more...
Netflix Prize Forum's Summary(4/25/07)
1. Netflix Recommender Framework
Icefox,免費提供c++ source code(我有試過,可是不能RUN,我ㄉ電腦有些它需要軟體好像沒有),而且最近一版ㄉCODE 裡,有提供SVD algorithm (singular value decomposition (SVD))
http://www.netflixprize.com/community/viewtopic.php?id=352
http://sifter.org/~simon/journal/20061211.html (這個是寫SVD 那個人ㄉ網頁)
2. Pyflix - The pythonic Netflix Prize
Deus ex silicon,免費提供python source code
http://www.netflixprize.com/community/viewtopic.php?id=432
3. Useful public RMSEs
一些基本ㄉ猜測ㄉRMSE(像是全猜1或2)
下面這個怪公式ㄉRMSE 可以低於1
For the following equation:
R = -2.58797372 + 0.95682133*M + 0.7606549*U
Where:
R = predicted rating
M = movie average
U = user average
http://www.netflixprize.com/community/viewtopic.php?id=435
4.Not sure how to calculate RMSE
如何計算RMSE
http://www.netflixprize.com/community/viewtopic.php?id=625
5.A better spreadsheet than Excel ?
這個人用EXCEL
http://www.netflixprize.com/community/viewtopic.php?id=483
6.Anyone help with Collaborative
算兩個userㄉ相關
http://www.netflixprize.com/community/viewtopic.php?pid=4924#p4924
7. Ramblings on the lower bound on the RMSE
http://www.netflixprize.com/community/viewtopic.php?id=78
RMSE-------perfect%----random%
0.2887-----100%--------0%
0.8563-----48%---------52%
0.9514-----34%---------66%
0.98--------30%---------70%
1.0540-----18%---------82%
1.1547-----0%---------100%
8.Why RMSE=1.79 for random ratings?
亂猜也有RMSE=1.79
http://www.netflixprize.com/community/viewtopic.php?id=224
Icefox,免費提供c++ source code(我有試過,可是不能RUN,我ㄉ電腦有些它需要軟體好像沒有),而且最近一版ㄉCODE 裡,有提供SVD algorithm (singular value decomposition (SVD))
http://www.netflixprize.com/community/viewtopic.php?id=352
http://sifter.org/~simon/journal/20061211.html (這個是寫SVD 那個人ㄉ網頁)
2. Pyflix - The pythonic Netflix Prize
Deus ex silicon,免費提供python source code
http://www.netflixprize.com/community/viewtopic.php?id=432
3. Useful public RMSEs
一些基本ㄉ猜測ㄉRMSE(像是全猜1或2)
下面這個怪公式ㄉRMSE 可以低於1
For the following equation:
R = -2.58797372 + 0.95682133*M + 0.7606549*U
Where:
R = predicted rating
M = movie average
U = user average
http://www.netflixprize.com/community/viewtopic.php?id=435
4.Not sure how to calculate RMSE
如何計算RMSE
http://www.netflixprize.com/community/viewtopic.php?id=625
5.A better spreadsheet than Excel ?
這個人用EXCEL
http://www.netflixprize.com/community/viewtopic.php?id=483
6.Anyone help with Collaborative
算兩個userㄉ相關
http://www.netflixprize.com/community/viewtopic.php?pid=4924#p4924
7. Ramblings on the lower bound on the RMSE
http://www.netflixprize.com/community/viewtopic.php?id=78
RMSE-------perfect%----random%
0.2887-----100%--------0%
0.8563-----48%---------52%
0.9514-----34%---------66%
0.98--------30%---------70%
1.0540-----18%---------82%
1.1547-----0%---------100%
8.Why RMSE=1.79 for random ratings?
亂猜也有RMSE=1.79
http://www.netflixprize.com/community/viewtopic.php?id=224
Subscribe to:
Posts (Atom)