[mlpack] google summer of code [GSoC]

Ryan Curtin gth671b at mail.gatech.edu
Tue Apr 9 22:59:47 EDT 2013


On Tue, Apr 09, 2013 at 08:16:43PM -0600, Daniel Bell wrote:
> I was intrigued by your suggestion of improving the tree traversers. I
> am currently in a course that has a component on trees, so I have done
> some work with them.  I am interested in learning about machine
> learning; however, I have never programmed in C++.  Do you think
> this is within the skill set I have laid out?

The tree traversal project is a particularly difficult one that expects
rigorous C++ knowledge.  That does not mean you could not do it, but here
is the situation in slightly more detail:

Many of the algorithms in mlpack are fast because they are tree-based
branch-and-bound algorithms.  If you don't know about those, you can read
about them online; search for something like 'nearest neighbor' and
'kd-trees'.
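For anyone new to the idea, here is a minimal single-tree branch-and-bound nearest-neighbor search on a 2-D kd-tree. It is written from scratch for illustration and is not mlpack's implementation; `KDNode`, `Build`, `Search`, and `NearestDistance` are hypothetical names. The "bound" is the distance from the query to the splitting plane: no point on the far side of the split can be closer than that, so the far subtree is skipped whenever that bound cannot beat the best distance found so far.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <memory>
#include <vector>

using Point = std::array<double, 2>;  // 2-D points, for brevity

// Hypothetical kd-tree node: internal nodes split on one coordinate,
// leaves store a handful of points.
struct KDNode {
  std::vector<Point> points;  // only non-empty in leaves
  int splitDim = -1;          // -1 marks a leaf
  double splitVal = 0.0;
  std::unique_ptr<KDNode> left, right;
};

double Dist(const Point& a, const Point& b) {
  return std::hypot(a[0] - b[0], a[1] - b[1]);
}

// Build by splitting at the median of alternating dimensions.
std::unique_ptr<KDNode> Build(std::vector<Point> pts, int depth,
                              std::size_t leafSize = 2) {
  assert(!pts.empty() && leafSize >= 1);
  auto node = std::make_unique<KDNode>();
  if (pts.size() <= leafSize) {
    node->points = std::move(pts);
    return node;
  }
  const int d = depth % 2;
  const std::size_t mid = pts.size() / 2;
  std::nth_element(pts.begin(), pts.begin() + mid, pts.end(),
                   [d](const Point& a, const Point& b) { return a[d] < b[d]; });
  node->splitDim = d;
  node->splitVal = pts[mid][d];  // left: coord <= splitVal, right: coord >= splitVal
  node->left = Build(std::vector<Point>(pts.begin(), pts.begin() + mid), depth + 1, leafSize);
  node->right = Build(std::vector<Point>(pts.begin() + mid, pts.end()), depth + 1, leafSize);
  return node;
}

// Branch and bound: descend toward q first, then visit the far child only
// if its lower-bound distance could beat the best distance found so far.
void Search(const KDNode* node, const Point& q, double& best, int& visited) {
  ++visited;
  if (node->splitDim < 0) {  // leaf: base case against every stored point
    for (const Point& p : node->points) best = std::min(best, Dist(q, p));
    return;
  }
  const double diff = q[node->splitDim] - node->splitVal;
  const KDNode* nearChild = diff < 0 ? node->left.get() : node->right.get();
  const KDNode* farChild = diff < 0 ? node->right.get() : node->left.get();
  Search(nearChild, q, best, visited);
  // Every point in farChild is at least |diff| away from q along splitDim.
  if (std::abs(diff) < best) Search(farChild, q, best, visited);
}

// Returns the distance from q to its nearest neighbor; optionally reports
// how many nodes were visited (fewer visits means more pruning).
double NearestDistance(const KDNode& root, const Point& q, int* visited = nullptr) {
  double best = std::numeric_limits<double>::infinity();
  int count = 0;
  Search(&root, q, best, count);
  if (visited != nullptr) *visited = count;
  return best;
}
```

The pruning test never changes the answer, only the amount of work: the result always matches a brute-force scan over all points.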

In addition, there are many different possible types of trees and many
different machine learning algorithms.  mlpack uses C++ templates to
provide abstract traversals and algorithms which can work with any type
of tree (assuming the user implements the tree with the proper methods).
This is actually very neat and to my knowledge there is no other library
which implements anything even remotely similar.
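To make the template idea concrete, here is a small sketch of a traversal written once against a template `TreeType` and a template `RuleType`, so the same traversal code serves any tree and any algorithm that supply the expected methods. This is loosely in the spirit of the design described above but is not mlpack's actual API: `SingleTreeTraverser`, `IntervalTree`, `NearestNeighborRule`, and the method names `NumPoints`, `Point`, `NumChildren`, `Child`, `BaseCase`, and `Score` are assumptions chosen for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <memory>
#include <vector>

// Generic single-tree traversal. It compiles against any TreeType offering
// NumPoints(), Point(i), NumChildren(), Child(i), and any RuleType offering
// BaseCase(query, ref) and Score(query, node). These names are assumptions.
template <typename TreeType, typename RuleType>
class SingleTreeTraverser {
 public:
  explicit SingleTreeTraverser(RuleType& r) : rule(r) {}

  void Traverse(std::size_t query, const TreeType& node) {
    // By convention in this sketch, Score() returns +infinity to mean "prune".
    if (std::isinf(rule.Score(query, node))) return;
    for (std::size_t i = 0; i < node.NumPoints(); ++i)
      rule.BaseCase(query, node.Point(i));
    for (std::size_t c = 0; c < node.NumChildren(); ++c)
      Traverse(query, node.Child(c));
  }

 private:
  RuleType& rule;
};

// One possible tree type: a 1-D binary tree over sorted data, where each
// node's bound is the interval [lo, hi] of the values beneath it.
struct IntervalTree {
  double lo = 0.0, hi = 0.0;
  std::vector<std::size_t> indices;  // reference indices (leaves only)
  std::vector<std::unique_ptr<IntervalTree>> children;

  std::size_t NumPoints() const { return indices.size(); }
  std::size_t Point(std::size_t i) const { return indices[i]; }
  std::size_t NumChildren() const { return children.size(); }
  const IntervalTree& Child(std::size_t i) const { return *children[i]; }
};

// Builds over data[begin, end); data must be sorted in ascending order.
std::unique_ptr<IntervalTree> BuildTree(const std::vector<double>& data,
                                        std::size_t begin, std::size_t end) {
  assert(begin < end);
  auto node = std::make_unique<IntervalTree>();
  node->lo = data[begin];
  node->hi = data[end - 1];
  if (end - begin <= 2) {
    for (std::size_t i = begin; i < end; ++i) node->indices.push_back(i);
  } else {
    const std::size_t mid = begin + (end - begin) / 2;
    node->children.push_back(BuildTree(data, begin, mid));
    node->children.push_back(BuildTree(data, mid, end));
  }
  return node;
}

// One possible rule: 1-D nearest-neighbor search.
struct NearestNeighborRule {
  const std::vector<double>& queries;
  const std::vector<double>& refs;
  std::vector<double> bestDist;  // best distance found so far, per query

  NearestNeighborRule(const std::vector<double>& q, const std::vector<double>& r)
      : queries(q), refs(r),
        bestDist(q.size(), std::numeric_limits<double>::infinity()) {}

  void BaseCase(std::size_t q, std::size_t r) {
    bestDist[q] = std::min(bestDist[q], std::abs(queries[q] - refs[r]));
  }

  // Lower bound on the distance from the query to any point in the node;
  // returns +infinity (prune) when it cannot improve the current best.
  double Score(std::size_t q, const IntervalTree& node) const {
    const double x = queries[q];
    const double minDist = x < node.lo ? node.lo - x : (x > node.hi ? x - node.hi : 0.0);
    return minDist < bestDist[q] ? minDist : std::numeric_limits<double>::infinity();
  }
};
```

Usage would look like `SingleTreeTraverser<IntervalTree, NearestNeighborRule> t(rule); t.Traverse(q, *root);`. Swapping in a different tree (say, one with ball bounds) or a different rule (say, range search) would not require touching the traverser, which is the appeal of the template approach; the rule's `Score()` is where the bound, and therefore the pruning power, lives.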

However, the functionality is still somewhat experimental, which is why
the tree abstractions haven't been advertised as a main selling point.
The branch-and-bound algorithms I referred to earlier depend heavily on
tight bounds.  So improving the effectiveness of these tree traversals
will depend on the implementation of the bounds in the algorithms and
the efficiency of those bound calculations (as well as other
calculations).
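As a hedged illustration of the "tight bounds versus cheap bounds" trade-off, the sketch below compares two valid lower bounds on the distance from a query point to a node described by an axis-aligned box (hyperrectangle): a cheap O(1) bound that looks at a single dimension, and the exact minimum Euclidean distance to the box, which costs O(d). `HRect`, `MinDistOneDim`, `MinDistFull`, and `CanPrune` are hypothetical names, not mlpack functions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Axis-aligned bounding box (hyperrectangle) of a tree node.
struct HRect {
  std::vector<double> lo, hi;  // per-dimension bounds
};

// Cheap bound: gap between q and the box along a single dimension (O(1)).
double MinDistOneDim(const HRect& box, const std::vector<double>& q, std::size_t dim) {
  if (q[dim] < box.lo[dim]) return box.lo[dim] - q[dim];
  if (q[dim] > box.hi[dim]) return q[dim] - box.hi[dim];
  return 0.0;
}

// Tight bound: exact minimum Euclidean distance from q to the box (O(d)).
double MinDistFull(const HRect& box, const std::vector<double>& q) {
  assert(q.size() == box.lo.size() && q.size() == box.hi.size());
  double sum = 0.0;
  for (std::size_t d = 0; d < q.size(); ++d) {
    const double gap = MinDistOneDim(box, q, d);
    sum += gap * gap;
  }
  return std::sqrt(sum);
}

// A node can be pruned when its lower bound is no better than the current best.
bool CanPrune(double lowerBound, double bestSoFar) { return lowerBound >= bestSoFar; }
```

For a unit box at the origin and the query (4, 5) with a best-so-far distance of 4, the one-dimension bound gives 3 and cannot prune, while the full bound gives 5 and can. Both bounds are safe, since neither ever overestimates the true distance; the tighter one prunes at least as often but costs more per node, which is exactly the balance between bound quality and bound-computation cost described above.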

So the project would involve both analyzing the existing traversals for
algorithmic or implementation-level improvements and examining the
existing algorithms.

I could talk about this all day, given that it is my primary research
interest, but I'll spare the gory details for now.

tl;dr: if you're a second-year undergraduate without heavy algorithmic
knowledge and you're interested in this project, you would probably be
best served by putting in 30 hours a week reading papers on dual-tree
algorithms and learning C++ inside and out (metaprogramming does happen
in the mlpack world of trees).  So I'm not going to say it's impossible,
but there's a reason its difficulty is rated 9/10...

-- 
Ryan Curtin       | "...I still don't know what it means."
ryan at igglybob.com |   - Rigby Reardon