[mlpack] GSoC - 2013 Collaborative Filtering - Introduction and Initial thoughts

Mudit Gupta mudit.raaj.gupta at gmail.com
Tue Apr 16 19:23:02 EDT 2013


Hi Ryan, Ajinkya

Thank you for taking the time to answer my questions. I am sorry for the
late reply; I was travelling.

As you both pointed out earlier, this is a research + implementation
project. I have been going through some of the literature available on
collaborative filtering and also some open source implementations. It
looks like the three best implementations available are:

1. GraphLab[1] as suggested by Ajinkya
2. GraphChi[2], also by the same author as 1
3. Apache Mahout[3]

I have also been going through the algorithms implemented in these
libraries. One algorithm that is implemented in most collaborative
filtering packages is Alternating Least Squares (ALS) with weighted lambda
regularization [4]. It seems like a good algorithm to start coding with. I
think it is a definite choice simply because it is implemented in all of
the libraries above and can therefore be used for benchmarking. The paper
pointed out by Ryan [5] takes an SVD-based approach, and I think a similar
implementation is in GraphLab. I also came across some collaborative
filtering algorithms that use HMMs, kNN and other similar techniques,
which are not always generic. It would be great to know your views on
these algorithms. I will also try to post a review soon. As far as the
mapping from text to numeric values in the data is concerned, it looks
like a smaller issue than the selection of algorithms.
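
For reference, here is a minimal sketch of the ALS update with weighted
lambda regularization as I understand it from [4]. It is plain Python rather
than mlpack's C++, and it is not taken from any of the libraries above. The
function names (`solve`, `update_side`, `als_wr`, `objective`, `predict`),
the toy ratings and the parameter values are all assumptions made for
illustration only.

```python
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def solve(a, b):
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def update_side(ratings_by_row, fixed, rank, lam):
    """Re-solve every row's factor vector while the other side stays fixed.

    For a row with n observed ratings this solves
    (sum_j f_j f_j^T + lam * n * I) x = sum_j r_j f_j,
    i.e. the regularization weight grows with the number of ratings.
    """
    updated = {}
    for row, rated in ratings_by_row.items():
        n = len(rated)
        a = [[lam * n if p == q else 0.0 for q in range(rank)] for p in range(rank)]
        b = [0.0] * rank
        for col, r in rated.items():
            f = fixed[col]
            for p in range(rank):
                b[p] += r * f[p]
                for q in range(rank):
                    a[p][q] += f[p] * f[q]
        updated[row] = solve(a, b)
    return updated


def als_wr(ratings, rank=2, lam=0.05, iters=20, seed=0):
    """ratings maps (user, item) -> rating; returns (user factors, item factors)."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for (u, i), r in ratings.items():
        by_user.setdefault(u, {})[i] = r
        by_item.setdefault(i, {})[u] = r
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}
    users = {}
    for _ in range(iters):
        users = update_side(by_user, items, rank, lam)  # fix items, solve users
        items = update_side(by_item, users, rank, lam)  # fix users, solve items
    return users, items


def objective(ratings, users, items, lam):
    """Squared error plus the weighted penalty.

    Adding ||u||^2 + ||m||^2 once per observed rating is the same as weighting
    each vector's penalty by its number of ratings.
    """
    total = 0.0
    for (u, i), r in ratings.items():
        total += (r - dot(users[u], items[i])) ** 2
        total += lam * (dot(users[u], users[u]) + dot(items[i], items[i]))
    return total


def predict(users, items, u, i):
    return dot(users[u], items[i])


# Toy data with made-up names and ratings, just to exercise the code.
toy = {
    ("alice", "a"): 5.0, ("alice", "b"): 3.0,
    ("bob", "a"): 4.0, ("bob", "c"): 1.0,
    ("carol", "b"): 2.0, ("carol", "c"): 5.0,
}
user_f, item_f = als_wr(toy)
print(predict(user_f, item_f, "alice", "c"))
```

Since users and items are dictionary keys here, text identifiers work
directly. A matrix-based implementation would need the text-to-numeric
mapping mentioned above.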

It would be good to know roughly how many algorithm implementations are
desired during the summer. It is too early to estimate, but from what I
can see, 2 thoroughly tested and well-documented algorithms would take
around 6-7 weeks, plus 1-2 weeks of buffer, plus 3-4 weeks for designing
the system, getting it verified, iterating on corrections from the mentor
and the community, and implementing basic features like ratings or
input/output formats. That comes to about 10-13 weeks in total. Maybe 1
more algorithm could go in an "If time permits" section. (I am asking
because I want my proposal to be neither over-ambitious nor insufficient
work for the summer.)

Best Regards,

Mudit Raj Gupta

P.S.: I got mlpack running and I am trying some examples.

[1] http://docs.graphlab.org/collaborative_filtering.html
[2] http://graphlab.org/graphchi/
[3] https://cwiki.apache.org/confluence/display/MAHOUT/Collaborative+Filtering+with+ALS-WR
[4] www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf
[5] http://fodava.gatech.edu/files/reports/FODAVA-09-11.pdf