[mlpack] GSoC '16: Introduction and Project ideas discussion

Ryan Curtin ryan at ratml.org
Sun Mar 6 13:28:37 EST 2016


On Sat, Mar 05, 2016 at 09:11:59PM +0530, Vivek Pal wrote:
> Hello,
> 
> I hope your day is going well! Allow me to introduce myself. My name is
> Vivek Pal.  ...
>
> ...
>
> I have a moderate level of experience working on recommender systems: I
> built one for my minor project in college, taking cues from this research
> paper: "A Graph-based model for context-aware recommendation using implicit
> feedback data". Here's the link to the paper:
> http://link.springer.com/article/10.1007/s11280-014-0307-z
> 
> After checking out the proposed project ideas, I am quite interested in
> working on these projects:
> 
> 1. Alternatives to neighborhood-based collaborative filtering
> 
> 2. Approximate Nearest Neighbor Search
> 
> Regarding the first project idea, I'm currently reading the paper
> referenced as an alternative to k-NN for recommender selection. I'll get
> back to you if I need more clarification.

I hope you find the paper interesting.  I would encourage you to spend
some time thinking about how the graph-based model you mentioned earlier
might fit into the CF framework.  I think it would be very cool
to be able to use other techniques, but that technique is so different
from matrix factorization that we would need to work through a couple of
important design details:

 * In what format does the user provide their implicit feedback data?
   Will this be the same as the rest of the mlpack algorithms (i.e.
   arma::mat or arma::sp_mat)?

 * How will the API of the CF class need to change to be able to support
   this?

 * How will a user use CF with this technique?

If that doesn't interest you too much, that's okay; I wanted to mention
this as a possibility.  Don't feel constrained by what is written in the
Ideas list description; if you have another algorithm you would like to
implement that will give good performance for CF, that would definitely
be interesting.

> Also, if issue #406 is taken up by someone else, what else can I do to
> get more deeply involved in the project?

Don't worry---contributing is not a requirement for an application.  So
if you don't find anything that you think you can do, that's not
necessarily a problem.

Still, there are lots of things you could do outside of the existing
list of bugs.  If you can think of improvements to the CF code, or if
you find a way to improve the speed of CF, these would be valuable
contributions too.

> As for the second project, no relevant issues have been listed yet. How
> should I proceed with this project? I could use some direction to help me
> get seriously involved in the projects.

Familiarity with the literature is an important part of that project.
Here's a link to a mailing list post with more information on this
project:

https://mailman.cc.gatech.edu/pipermail/mlpack/2016-March/000765.html

> I feel a bit lost: I'm trying to take up open issues on GitHub, but most
> of them are already taken, and it wouldn't make sense to duplicate work
> that has already been done.

I'll see if I can add some more "easy" issues in the next couple of
days, but often things like that get taken care of quickly so it's hard
to keep a good backlog for people who want to contribute.  Like I said
earlier, you are always welcome to just poke around the library and try
to fix any problems you find, or improve the speed of various parts of
it.

> I apologize for starting a little late, but I was quite busy with college
> coursework last week. Next week looks much more relaxed according to the
> planned schedule, so expect me to be available all the time.

The application period hasn't even opened yet; I would not say that you
are starting "a little late". :)

> Also, I was wondering: is it possible that these projects have already
> been assigned to other students on account of their prior contributions?

No decisions will be made on which projects are accepted until
applications are all submitted (the deadline is March 25).  Project
acceptance announcements will be made on April 22nd.

> One thing I want to point out is the EXTENSIVE and VERY HELPFUL
> guidelines for anyone who wants to get involved in this project. I was
> able to build and install mlpack easily on my system. Thanks so much for
> putting together this documentation. I'm basically done with all the
> guidelines and starting points given on this page:
> http://www.mlpack.org/involved.html.

Great, I am glad you found those resources helpful! :)

Thanks,

Ryan

-- 
Ryan Curtin    | "For more enjoyment and greater efficiency,
ryan at ratml.org | consumption is being standardized."

