[mlpack] GSoC - 2013 Collaborative Filtering - Introduction and Initial thoughts

Wed Apr 17 12:42:17 EDT 2013

On Wed, Apr 17, 2013 at 04:53:02AM +0530, Mudit Gupta wrote:
> Hi Ryan, Ajinkya
> 
> Thank you for taking the time to answer my questions. I am sorry for the
> late reply; I was travelling.
> 
> As you both pointed out earlier, this is a research + implementation
> project. I have been going through some of the literature available on
> collaborative filtering and also some open source implementations. It
> looks like the three best implementations available are:
> 
> 1. GraphLab[1] as suggested by Ajinkya
> 2. GraphChi[2] is also by the same author as 1
> 3. Apache Mahout[3]
> 
> I was also going through the algorithms implemented in these libraries. One
> algorithm that is implemented in most of the collaborative filtering
> packages is Alternating Least Squares (ALS) with weighted-lambda
> regularization [4]. It seems like a good algorithm to start coding. I think
> it is a definite choice simply because it is implemented in all the
> libraries and can be used for benchmarking. The paper pointed out by
> Ryan[5] has an SVD-based approach and I think a similar implementation is
> in GraphLab. I also came across some collaborative filtering algorithms
> that use HMMs, kNN and other similar techniques, which are not always
> generic. It would be great to know your views on these algorithms.
> Moreover, I will try to post a review soon. As far as the text-to-numeric
> value data mapping is concerned, it looks like a smaller issue than the
> selection of algorithms.

Yeah; I have a script that does text->numeric value mapping already;
it's not a difficult challenge.
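
A minimal sketch of such a mapping in C++, for illustration only.  The
LabelIndexer class and its method names are hypothetical (this is not the
script mentioned above and not part of mlpack); it just assigns each
distinct text label a dense index in order of first appearance, which is
the form a ratings matrix needs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not an mlpack class): maps each distinct text label
// to a dense index 0, 1, 2, ... in order of first appearance.
class LabelIndexer
{
 public:
  // Return the index for `label`, assigning a new one if it is unseen.
  std::size_t Index(const std::string& label)
  {
    const auto it = toIndex.find(label);
    if (it != toIndex.end())
      return it->second;

    const std::size_t id = labels.size();
    toIndex.emplace(label, id);
    labels.push_back(label);
    return id;
  }

  // Reverse lookup: the original text for a previously assigned index.
  const std::string& Label(const std::size_t index) const
  {
    assert(index < labels.size());
    return labels[index];
  }

  // Number of distinct labels seen so far.
  std::size_t Size() const { return labels.size(); }

 private:
  std::unordered_map<std::string, std::size_t> toIndex;
  std::vector<std::string> labels;
};
```

One indexer would be kept for users and a separate one for items, so that
each (user, item, rating) line of text becomes a pair of integer
coordinates plus a value.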

The nice thing about implementing QUIC-SVD would be that mlpack already
has a robust tree framework, so we just have to adapt it to cosine trees
and from there it shouldn't be hard.

The three packages you suggested are the standard packages that people
will go to for algorithms like this.  So for us to implement this, we
should make sure that we have something that those libraries don't -- a
flexible API.  Using templates we can write a modular ALS-WR
implementation which allows researchers to plug in different components.
One example of this is our NMF implementation, which allows a developer
to write their own simple update rules.

> It would be good to know roughly how many algorithm implementations are
> desired during the summer. It is too early to estimate, but from what I
> can see, 2 thoroughly tested and well documented algorithms would take
> around 6-7 weeks + 1-2 weeks buffer + 3-4 weeks for designing the system,
> getting it verified, iterating on corrections from the mentor and the
> community, and implementing basic features like ratings or input/output
> formats. Maybe have 1 algorithm in an "If time permits" section. (I am
> just asking this because I want my proposal to be neither over-ambitious
> nor insufficient work for the summer.)

I think that is reasonable.  The students I've worked with in the past
have worked on about that timeframe.

Ryan

-- 
Ryan Curtin       | "I am the luckiest man alive!"
ryan at igglybob.com |   - General Borzov