[mlpack] GSoC - 2013 Collaborative Filtering - Introduction and Initial thoughts

Fri Apr 12 15:49:01 EDT 2013

On Fri, Apr 12, 2013 at 01:32:41AM +0530, Mudit Gupta wrote:
> Hello Everyone,
> 
> I am interested in the mlpack project on *developing a collaborative
> filtering package* for GSoC 2013. I am *very interested in Machine
> Learning* and have done my thesis on the same. I have *relevant C++
> experience* through my contribution to ns-3.
> 
> I would like to start by introducing myself. I am Mudit Raj Gupta, a
> final-year student of B.E.(H), M.Sc(H) at Birla Institute of Technology
> and Science - Pilani (BITS-Pilani). I have been *selected for Google
> Summer of Code twice, in 2011 and 2012*, and worked for the University of
> Michigan (USA) and the Network Simulator 3 (ns-3) project respectively.
> My contributions to Repast Simphony and ns-3 can be checked here
> http://code.google.com/p/cscs-repast-demos/wiki/Mudit and here
> http://www.nsnam.org/wiki/index.php/GSOC2012HLA respectively. I am
> presently working on my *thesis in the field of Machine Learning*. In my
> thesis, I worked on developing a mathematical model of cognition and
> biases in individuals with the help of Machine Learning. The thesis
> report can be found here
> http://code.google.com/p/multiagent-reinforcement-learning/downloads/list.
> I am presently working on the use of probabilistic algorithms for
> hand-gesture recognition. My *coding profile* can be found here:
> http://code.google.com/u/110675325175605367090/ . My *LinkedIn profile*
> can be found here:
> http://www.linkedin.com/profile/view?id=79832898&trk=tab_pro .
> 
> I think a good starting point would be *defining the core modules* of
> the package. This would include the data format/model, similarity
> measures, the recommender system as a higher-level abstraction, and the
> output format (similar nodes, predicted degree of likeness, etc.). The
> algorithms considered could start with the ones implemented in other
> leading libraries, like:
> 1. *Item Based Collaborative Filtering*
> 2. Maybe look into something like this (listed and implemented in some
> other software):
> http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf
> Finally, completing *testing and documentation* along with high-quality
> *examples* could be looked into.
> 
> I am looking into the coding standards and practices used in mlpack and
> will try out some of its features. I would like to request the mentors
> and people from the community to please provide any details/pointers to
> resources which could be helpful for the project. Moreover, it would be
> great if the mentors could provide details about the project and their
> views on the design or choice of algorithms.

Hello Mudit,

It sounds like you have put a lot of thought into this.  I think you are
right that a good point to start would be to (in part) define how the
user will interact with the model -- so, the data model format that will
be saved, the API of the CF implementation, and so forth.  Up until this
point mlpack has used XML-format files to save models that aren't just
matrices (see the GMM code in src/mlpack/methods/gmm/ for more details,
and also src/mlpack/core/util/save_restore_model.hpp).

There is a lot of literature on CF and to my understanding (note that I
am not a CF expert) it is a broad collection of different algorithms
instead of just one.  So the choice of which CF algorithms to implement
is going to be something that will require some research.

Another thing to think about is that up to this point, mlpack deals with
only numeric-valued data matrices.  So if you are thinking of a
recommender system built on top of mlpack, it will need to have some
sort of code to map text labels (such as usernames, perhaps) to numeric
IDs so that the underlying mlpack algorithms can handle them.  A tool
like that would also be useful for preprocessing other datasets.
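
To make the label-mapping idea concrete, here is a minimal sketch of what
such a tool might look like.  Note that LabelMapper and its methods are
hypothetical names invented for illustration, not an existing mlpack API,
and the sketch assumes a C++11 compiler.  It hands out dense numeric IDs in
order of first appearance and keeps a reverse table so that results can be
reported using the original labels.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not part of mlpack): maps each distinct text label,
// such as a username, to a dense numeric ID starting at 0.
class LabelMapper
{
 public:
  // Return the ID for a label, assigning the next free ID if it is new.
  std::size_t Id(const std::string& label)
  {
    const auto it = ids_.find(label);
    if (it != ids_.end())
      return it->second;

    const std::size_t id = labels_.size();
    ids_.emplace(label, id);
    labels_.push_back(label);
    return id;
  }

  // Reverse lookup: recover the original label for an already-assigned ID.
  const std::string& Label(const std::size_t id) const
  {
    assert(id < labels_.size());
    return labels_[id];
  }

  // Number of distinct labels seen so far.
  std::size_t Size() const { return labels_.size(); }

 private:
  std::unordered_map<std::string, std::size_t> ids_;
  std::vector<std::string> labels_;
};
```

A preprocessing step could call Id() on each username and item name while
reading the raw data, store the resulting IDs in a numeric matrix, and call
Label() afterwards to turn numeric results back into readable names.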

I'm not too sure what I think about a full-blown recommender system in
mlpack; mlpack tends to aim at low-level algorithms which can then be
used in more polished, user-friendly systems.  I would need to see the
proposed API for something like that before making a decision.  Even so,
a standalone CF recommender system built on top of mlpack would surely
be interesting to the machine learning community.

The CF project in particular is as much a research project ("which
algorithms are best to implement?") as it is an implementation project.

Hopefully this clears up some questions, but if you have more, feel free
to ask.  :)

Thanks,

Ryan

-- 
Ryan Curtin       | "Kill them, Machine... kill them all."
ryan at igglybob.com |   - Dino Velvet