[mlpack] [GSoc2013] Interested in developing collaborative filtering package

Nilesh Chakraborty nilesh at nileshc.com
Sat Apr 20 05:01:01 EDT 2013


Hi Ajinkya,

Yes, it's at matchfreak.com. The frontend is at a very early stage and some
things are breaking on the site. I'm currently working on fixing them and
improving the load time. I'm afraid that since it's not open source, I
won't be able to share its exact code with you, but I can describe how it
works.

In short, I fetch a user's status messages and Likes, and those of his/her
friends, and store them in a DB. The database has hundreds of thousands of
such records, from many users. The Likes and status messages are then
scanned against a modified Wikipedia data dump, and a lot of article IDs
are extracted. Those are stored in the DB as user-item combinations. These
are fed to Myrrix, and I call its API to find similar items. We used to use
Mahout before this, calculating user similarity with the log-likelihood
metric. Myrrix fared better in terms of speed and usability.
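
To make the log-likelihood metric mentioned above concrete, here is a
minimal sketch of a log-likelihood ratio between two users' item sets. It
is not the matchfreak.com code and not Mahout's source. The function names,
and the final squashing of the score into [0, 1), are assumptions chosen
for illustration only.

```python
import math


def x_log_x(x):
    """Return x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy (N * H) of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11: items both users have, k12: only the first user,
    k21: only the second user, k22: neither user.
    """
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    mat_entropy = entropy(k11, k12, k21, k22)
    # Clamp at zero to absorb tiny negative values from rounding.
    return max(0.0, 2.0 * (row_entropy + col_entropy - mat_entropy))


def user_similarity(items_a, items_b, num_items):
    """Hypothetical helper: similarity of two users from their item sets.

    num_items is the total number of distinct items in the catalogue.
    The 1 - 1 / (1 + llr) mapping is an assumed way to squash the score.
    """
    k11 = len(items_a & items_b)
    k12 = len(items_a - items_b)
    k21 = len(items_b - items_a)
    k22 = num_items - k11 - k12 - k21
    llr = log_likelihood_ratio(k11, k12, k21, k22)
    return 1.0 - 1.0 / (1.0 + llr)
```

A table with no association, such as log_likelihood_ratio(1, 1, 1, 1),
scores 0, while a perfectly associated one such as
log_likelihood_ratio(10, 0, 0, 10) scores 40 * ln(2), about 27.7.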

The best algorithms to venture into first would be Alternating Least
Squares (ALS), and maybe SVD. ALS provides a lot of added perks that we
normally wouldn't get from SVD, such as handling incomplete user-item
pairs. This paper is a good description of a parallel ALS algorithm:
http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf
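
As a rough illustration of why ALS copes with incomplete user-item pairs,
here is a small single-threaded sketch that fits factors using only the
observed ratings. It uses plain L2 regularization, which is a
simplification and not necessarily the exact variant in the paper above.
All names (als, update_side, loss, predict) and default parameters are
hypothetical.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def update_side(observed_by_key, fixed, rank, lam):
    """Recompute one factor vector per key, holding the other side fixed.

    Each vector solves (lam * I + sum f f^T) x = sum r * f, where the sums
    run only over the ratings that were actually observed for that key.
    """
    out = {}
    for key, observed in observed_by_key.items():
        a = [[lam if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for other, r in observed.items():
            f = fixed[other]
            for i in range(rank):
                b[i] += r * f[i]
                for j in range(rank):
                    a[i][j] += f[i] * f[j]
        out[key] = solve(a, b)
    return out


def als(ratings, rank=2, lam=0.1, iters=20, seed=0):
    """Factorize {(user, item): rating} into user and item factor dicts."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for (u, i), r in ratings.items():
        by_user.setdefault(u, {})[i] = r
        by_item.setdefault(i, {})[u] = r
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}
    users = {}
    for _ in range(iters):
        users = update_side(by_user, items, rank, lam)  # fix items
        items = update_side(by_item, users, rank, lam)  # fix users
    return users, items


def predict(users, items, u, i):
    """Predicted rating: dot product of the two factor vectors."""
    return sum(a * b for a, b in zip(users[u], items[i]))


def loss(ratings, users, items, lam):
    """Regularized squared error over the observed ratings only."""
    err = sum((r - predict(users, items, u, i)) ** 2
              for (u, i), r in ratings.items())
    reg = sum(x * x for vec in users.values() for x in vec)
    reg += sum(x * x for vec in items.values() for x in vec)
    return err + lam * reg
```

Because each half-step solves its least-squares subproblem exactly, the
regularized loss cannot increase from one iteration to the next. After
fitting, predict can also be called for a (user, item) pair that was never
observed, as long as both the user and the item appear somewhere in the
ratings.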

No, I have not implemented CF algorithms from scratch. I have mostly used
APIs and studied research papers on different kinds of algorithm
implementations.

Suppose we want some kind of SVD implementation. Do you think it is a
good idea to add a dependency on a mature library like Eigen
(http://eigen.tuxfamily.org/index.php?title=Main_Page; all templates, so
only a compile-time dependency) or redsvd
(https://code.google.com/p/redsvd/), or is it better to implement it
ourselves, reusing existing code from mlpack and Armadillo where possible,
and why?

Cheers,
Nilesh


On Fri, Apr 19, 2013 at 11:32 PM, Ajinkya Kale <kaleajinkya at gmail.com> wrote:

> Hi Nilesh,
>
> Do you have a link to the site where you deployed the recommendation
> engine?
> It would be good if you could point us to any code you might have written
> for it.
> What algorithms did you use, and did you just use the APIs or implement
> any of the CF algorithms from scratch?
>
> --ajinkya
>
>
> On Fri, Apr 19, 2013 at 10:03 AM, Nilesh Chakraborty <nilesh at nileshc.com> wrote:
>
>> Hi,
>>
>> I am a 3rd year undergraduate student of computer science, pursuing my
>> B.Tech degree at RCC Institute of Information Technology. I am fairly good
>> at C++, which I am brushing up for this project, and proficient in
>> Java, PHP and C#.
>>
>> Among the project ideas on the GSoC 2013 ideas page, the one that
>> seemed really interesting to me is developing a collaborative
>> filtering package for mlpack
>> (http://www.mlpack.org/trac/wiki/SummerOfCodeIdeas#Collaborativefilteringpackage).
>> I want to work on it.
>>
>> I am passionate about data mining, big data and recommendation engines,
>> so this idea naturally appeals to me a lot. I have experience with
>> building music and people recommendation systems, and have worked with
>> Myrrix and Apache Mahout. I recently designed and implemented such a
>> recommendation system and deployed it on a live production site, where I'm
>> interning, to recommend Facebook users to each other based on
>> their interests.
>>
>> I am familiar with a few collaborative filtering algorithms and
>> with the Mahout APIs. Mahout contains a whole bunch of collaborative
>> filtering algorithm implementations in org.apache.mahout.cf.taste (here is
>> a quick overview:
>> https://cwiki.apache.org/MAHOUT/recommender-documentation.html). Myrrix (
>> https://code.google.com/p/myrrix-recommender/) focuses on matrix
>> factorization through Alternating Least Squares. It's fast, and it
>> mitigates the cold start problem, where the recommender has too little data
>> to provide any useful recommendations.
>>
>> I have long searched for good C++ libraries for collaborative
>> filtering, but to no avail. Having something like this in mlpack would be
>> fabulous. I can use Mahout and Myrrix code, among other things, as
>> implementation references, since Mahout is easily the most "complete" CF
>> library around.
>>
>> I browsed around the source and checked out the mlpack API for available
>> methods. Please let me know what my next course of action should be and
>> what I can do to dig in and get myself acquainted.
>>
>> Please share your views and do ask me if you have any questions. :-)
>>
>> Cheers,
>> Nilesh
>>
>> --
>> A quest eternal, a life so small! So don't just play the guitar, build
>> one.
>> You can also email me at contact at nileshc.com or visit my website<http://www.nileshc.com/>
>>
>
>
> --
>
> Sincerely,
> Ajinkya
> http://ajinkya.info
>



-- 
A quest eternal, a life so small! So don't just play the guitar, build one.
You can also email me at contact at nileshc.com or visit my
website<http://www.nileshc.com/>