[mlpack] GSOC-2013 : Working on Collaborative Filtering

Ryan Curtin gth671b at mail.gatech.edu
Mon Apr 22 14:46:38 EDT 2013


On Sun, Apr 21, 2013 at 09:29:03PM +0530, Srijan Kumar wrote:
> Hi,
> 
> My full profile can be found at
> https://sites.google.com/site/srijankedia/home.

Interesting aside -- I see that you like to solve sudoku puzzles.  Are
you familiar with the literature on sudoku?  Some years ago it was shown
that sudoku is NP-complete:

http://www-imai.is.s.u-tokyo.ac.jp/~yato/data2/SIGAL87-2.pdf

> I do have a few questions that would help to decide the CF algorithm to be
> finally implemented.
> It would be great if the mentors could please answer the following
> questions -
> 1. What is the kind and size of data that we would require to handle? Do
> you have anything in mind or is it general at the moment?

mlpack is generally used on single systems, not clusters, so datasets of
up to about 16GB are the target.  I suppose you could OpenMP-ize the
code and then run it on a cluster on a huge amount of data, but I don't
imagine many people are planning to use mlpack that way.

> 2. What are the other factors that we would need to consider while choosing
> the algorithm?

Extensibility is helpful.  If we can provide an algorithm that is highly
modular, this opens up possibilities for other researchers to try
modifying the algorithm slightly with ease.  For instance, take a look
at the k-means code in src/mlpack/methods/kmeans/ and note that one of
the template parameters is a class which defines how to find the
starting points.  If a researcher wanted to play around with different
initialization methods for k-means (which actually I have been doing
this past week), then they can implement their own without having to
deal with the k-means algorithm at all.
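
As a rough illustration of that pattern, here is a minimal sketch of a
clustering class that takes its initialization strategy as a template
parameter.  This is not mlpack's actual k-means API: SimpleKMeans,
FirstKInit, EvenlySpacedInit and the Point typedef are all hypothetical
names made up for this example, and plain std::vector is used instead of
mlpack's matrix types to keep it self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical point type for this sketch: one vector of coordinates.
using Point = std::vector<double>;

// Hypothetical initialization policy: the first k points become centroids.
struct FirstKInit
{
  static std::vector<Point> Initialize(const std::vector<Point>& data,
                                       std::size_t k)
  {
    assert(k > 0 && k <= data.size());
    return std::vector<Point>(data.begin(),
                              data.begin() + static_cast<std::ptrdiff_t>(k));
  }
};

// Another hypothetical policy: pick k points at a regular stride.
struct EvenlySpacedInit
{
  static std::vector<Point> Initialize(const std::vector<Point>& data,
                                       std::size_t k)
  {
    assert(k > 0 && k <= data.size());
    const std::size_t stride = data.size() / k;
    std::vector<Point> centroids;
    for (std::size_t i = 0; i < k; ++i)
      centroids.push_back(data[i * stride]);
    return centroids;
  }
};

// Plain Lloyd iterations.  The only thing that varies with InitPolicy is
// how the starting centroids are chosen.
template<typename InitPolicy = FirstKInit>
class SimpleKMeans
{
 public:
  std::vector<Point> Cluster(const std::vector<Point>& data,
                             std::size_t k,
                             std::size_t maxIterations = 100) const
  {
    std::vector<Point> centroids = InitPolicy::Initialize(data, k);
    const std::size_t dim = data[0].size();

    for (std::size_t iter = 0; iter < maxIterations; ++iter)
    {
      std::vector<Point> sums(k, Point(dim, 0.0));
      std::vector<std::size_t> counts(k, 0);

      // Assign each point to its nearest centroid.
      for (const Point& p : data)
      {
        std::size_t best = 0;
        double bestDist = SquaredDistance(p, centroids[0]);
        for (std::size_t c = 1; c < k; ++c)
        {
          const double dist = SquaredDistance(p, centroids[c]);
          if (dist < bestDist) { bestDist = dist; best = c; }
        }
        for (std::size_t d = 0; d < dim; ++d)
          sums[best][d] += p[d];
        ++counts[best];
      }

      // Move each centroid to the mean of its points; an empty cluster
      // keeps its old centroid.
      bool changed = false;
      for (std::size_t c = 0; c < k; ++c)
      {
        if (counts[c] == 0)
          continue;
        for (std::size_t d = 0; d < dim; ++d)
        {
          const double updated = sums[c][d] / static_cast<double>(counts[c]);
          if (updated != centroids[c][d])
            changed = true;
          centroids[c][d] = updated;
        }
      }
      if (!changed)
        break;
    }
    return centroids;
  }

 private:
  static double SquaredDistance(const Point& a, const Point& b)
  {
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d)
      sum += (a[d] - b[d]) * (a[d] - b[d]);
    return sum;
  }
};
```

Someone experimenting with initialization would then only write a new
struct with an Initialize() function and use it as
SimpleKMeans<MyInit>().Cluster(data, k), without touching the loop in
Cluster() at all.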

Make sure to take a look at the list archives to find the previous
discussion on the collaborative filtering projects.  You may find useful
information there.

https://mailman.cc.gatech.edu/pipermail/mlpack/2013-April/thread.html

If you have more questions, feel free to ask.

Thanks,

Ryan

-- 
Ryan Curtin       | "I love it when a plan comes together."
ryan at igglybob.com |   - Hannibal Smith

