[mlpack] #251

Ryan Curtin gth671b at mail.gatech.edu
Sat Jan 11 23:43:01 EST 2014


On Sat, Jan 11, 2014 at 07:39:55PM +0530, Abhinav Pathak wrote:
> Hey,
> 
> I'm Abhinav from NITK Surathkal, India and i have recently started looking
> into the source of mlpack and i'm greatly interested to contribute to it in
> any way i can.
> 
> Please let me know where and how should i start doing so?
> 
> 
> And also, can i be given some more info on ticket no #251 ? As in what
> exactly needs to be done there? I saw it mentioned in the kmeans.hpp code
> file also. I would like to fix it, if i can be given more info.

Hello Abhinav,

Ticket #251 references the Pelleg-Moore k-means algorithm, detailed in
this paper:

http://ra.adm.cs.cmu.edu/anon/usr/ftp/2000/CMU-CS-00-105.pdf

It's a neat algorithm that accelerates k-means by considering the points
in batch using a tree structure.  Anyway, the ticket is a bit in-depth,
but if you are interested, I am happy to help out...

There are a lot of algorithms in mlpack that use tree-based algorithms;
nearest neighbor, emst (fast Euclidean minimum spanning tree), FastMKS,
range search, approximate nearest neighbor.  Basically, these algorithms
work by building a tree on the data, and then traversing the tree (or
multiple trees, depending on the problem) and pruning different subtrees
to accelerate computation.  These concepts are detailed a little bit
better in this paper:

http://arxiv.org/pdf/1304.4327

Anyway, mlpack implements trees and traversals in a pretty abstract
form, so to implement a dual-tree algorithm (or a single-tree algorithm,
which is what Pelleg-Moore k-means is), you simply need to implement the
Score() and BaseCase() functions.  Unfortunately it's not quite that
simple, though, because someone needs to take the Pelleg-Moore algorithm
and figure out what the BaseCase() and Score() functions need to be.

Once the Pelleg-Moore algorithm can be expressed as a BaseCase() and
Score() function that could be paired with a pruning single-tree
traversal, then it should be straightforward to implement both that and
tests for it.

Anyway, like I said, it's a bit complex and may be time-consuming, but
you will learn lots of interesting things about tree-based algorithms...
:)

An easier place to start might be ticket #235:

http://www.mlpack.org/trac/ticket/235

Basically, the BinarySpaceTree class in
src/mlpack/core/tree/binary_space_tree/ needs to be refactored in such a
way that the SplitNode() function is part of a template class that is an
argument to BinarySpaceTree.  I can clarify more on that if you like.

Thanks,

Ryan

-- 
Ryan Curtin    | "We need some time for some things to happen!"
ryan at ratml.org |   - Bells


More information about the mlpack mailing list