[mlpack] Google Summer of Code

Ryan Curtin gth671b at mail.gatech.edu
Mon Apr 15 11:02:46 EDT 2013


On Sun, Apr 14, 2013 at 01:46:04PM +0100, lynnette ng wrote:
> Hello there,
> 
> I am interested in taking part in the Google Summer of Code in
> collaboration with mlpack on the collaborative filtering package. I am a
> first year MEng computer science undergraduate student at University
> College London. Prior to entering university, I did an internship with a
> research company in Singapore, implementing collaborative filtering
> techniques used in the BellKor solution of the Netflix prize on a 100m data
> set.
> 
> I am interested in the project posed and would like to know more about it.
> I understand this is going to be coded in C++. I coded for my internship
> project in Python. I do know a little C++, and am more than willing to
> learn to code these collaborative filtering techniques.
> 
> First, what collaborative filtering techniques are going to be used in this
> package? Will it range from simple clustering to the complex neural
> networks? Are there specific techniques that the team is looking at? Can
> you point me to papers or articles detailing the techniques required?

Hello Lynnette,

I think the discussion in the following thread may be useful as answers
to your first questions (there are multiple responses):

https://mailman.cc.gatech.edu/pipermail/mlpack/2013-April/000034.html

> Also, would this project include visualisation of the results? Say, a
> pictorial representation of the clusters formed?

mlpack doesn't offer visualizations itself; the better approach is for a
user to take the results mlpack has produced and visualize them with an
existing plotting tool (e.g. gnuplot or similar).  For the collaborative
filtering package, it would be a good idea to think about how the
resulting models could be visualized, see whether packages already exist
that do this visualization, and then figure out how to make our model
format compatible with theirs.  Then a user can use mlpack to build
their model and the CF visualization package to visualize it.

> Where could I find documentation on how to implement these algorithms in
> mlpack and also the coding style and conventions for mlpack?

The tutorials can be helpful for learning how algorithms are implemented
in mlpack (http://www.mlpack.org/tutorial.html) and you can also look at
the implementation of existing algorithms to see how it is done.  I
would suggest looking at algorithms like NMF or SGD (in
src/mlpack/core/optimizers/sgd/) because they are more related to the CF
problem.
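
To make the connection between SGD and CF a little more concrete, here
is a minimal sketch of SGD applied to factorizing a ratings matrix as
R ~ U * V^T using only the observed entries.  It does not use mlpack's
SGD implementation or any mlpack API: the Rating and Factors types, the
FactorizeSGD() and Predict() functions, the random initialization, and
the L2-regularized squared-error objective are all assumptions chosen
for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// One observed rating (hypothetical type, not part of mlpack).
struct Rating
{
  std::size_t user;
  std::size_t item;
  double value;
};

// Low-rank factors: U is numUsers x rank, V is numItems x rank.
struct Factors
{
  std::vector<std::vector<double>> U;
  std::vector<std::vector<double>> V;
};

// Predicted rating: dot product of the user's and the item's factor rows.
double Predict(const Factors& f, std::size_t user, std::size_t item)
{
  double sum = 0.0;
  for (std::size_t d = 0; d < f.U[user].size(); ++d)
    sum += f.U[user][d] * f.V[item][d];
  return sum;
}

// Fits the factors by SGD, visiting one observed rating at a time.
Factors FactorizeSGD(const std::vector<Rating>& ratings,
                     std::size_t numUsers, std::size_t numItems,
                     std::size_t rank, double stepSize, double lambda,
                     std::size_t epochs)
{
  assert(rank > 0);

  // Small random positive initialization with a fixed seed.
  std::mt19937 gen(42);
  std::uniform_real_distribution<double> dist(0.1, 0.5);
  Factors f;
  f.U.assign(numUsers, std::vector<double>(rank));
  f.V.assign(numItems, std::vector<double>(rank));
  for (auto& row : f.U)
    for (double& x : row)
      x = dist(gen);
  for (auto& row : f.V)
    for (double& x : row)
      x = dist(gen);

  for (std::size_t e = 0; e < epochs; ++e)
  {
    for (const Rating& r : ratings)
    {
      assert(r.user < numUsers && r.item < numItems);
      const double err = r.value - Predict(f, r.user, r.item);
      for (std::size_t d = 0; d < rank; ++d)
      {
        // Use the old values of both factors for this step.
        const double u = f.U[r.user][d];
        const double v = f.V[r.item][d];
        f.U[r.user][d] += stepSize * (err * v - lambda * u);
        f.V[r.item][d] += stepSize * (err * u - lambda * v);
      }
    }
  }
  return f;
}
```

After fitting, Predict() can be called on a (user, item) pair that was
never observed, which is the basic CF prediction task.  The existing
NMF and SGD code in mlpack is the place to look for how this kind of
thing is structured in practice.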

The coding standards can be found at
http://trac.research.cc.gatech.edu/fastlab/wiki/NewStyleGuidelines and
they should cover nearly all possible cases.

> If you could provide me with more information on this project, I would
> greatly appreciate it as well!

If what I've written above leaves some questions unanswered, feel free
to ask. :)

> Additionally, I am a Singaporean student studying in London. I will be
> returning to Singapore during the summer. This means I will be in London
> time zone for June and October, but Singapore timezone from July to
> September. Will this be alright with the project? I am more than happy to
> work during my day times in the respective countries.

I can't speak for the other mentors, but I tend to find myself working
at what is around lunchtime and early afternoon in Singapore, so it
shouldn't be a problem.

-- 
Ryan Curtin       | "Aw, Brian's doing it again, dude.  Brian, you ain't
ryan at igglybob.com | no pimp, dude."   "Where's my money?"

