[mlpack] #251

Rafi Kamal rafikamal93 at yahoo.com
Sun Jan 12 02:33:40 EST 2014


Hi
I'm Rafi from Bangladesh. I'm an undergraduate student from Bangladesh University of Engineering & Technology. Like Abhinav, I've recently started looking at the ML pack codebase. I might participate in upcoming GSoC 2014, and as I've interest in machine learning, I've chosen MLPack. I was following your discussion and I think it's a good time to join :)

I've looked into the BSP tree codes. To answer Abhinav's query: I think what is needed here is to pull out the splitNode function from BinarySpaceTree class and wrap it in a separate class (say, BSPNodeSplitterMaxWIdth). Then this BSPNodeSplitterMaxWIdth class will be passed as a template argument to the BinarySpaceTree class. 

The reason for why it is needed: if you look at the splitNode class, you can see that the node is being split in it's maximum width position. According to ticket #235, there might be other ways to split the node. So, we can create another class which split the node based on another criteria; say for example, BSPNodeSplitterMid, which splits the node in the middle point. And then we can pass this class as a template argument to BinarySpaceTree class. So
BinarySpaceTree<SomeBound, SomeStateType, SomeMatType, BSPNodeSplitterMaxWIdth> tree; // nodes of this class will be split in maximum width position

BinarySpaceTree<SomeBound, SomeStateType, SomeMatType, BSPNodeSplitterMid> tree; // nodes of this class will be split in the middle position


That's what I've understood by looking at the source and the ticket, Ryan can correct me if I'm wrong.

Regards
Rafi Kamal



On Sunday, January 12, 2014 10:43 AM, Ryan Curtin <gth671b at mail.gatech.edu> wrote:
 
On Sat, Jan 11, 2014 at 07:39:55PM +0530, Abhinav Pathak wrote:
> Hey,
> 
> I'm Abhinav from NITK Surathkal, India and i have recently started looking
> into the source of mlpack and i'm greatly interested to contribute to it in
> any way i can.
> 
> Please let me know where and how should i start doing so?
> 
> 
> And also, can i be given some more info on ticket no #251 ? As in what
> exactly needs to be done there? I saw it mentioned in the kmeans.hpp code
> file also. I would like to fix it, if i can be given more info.

Hello Abhinav,

Ticket #251 references the Pelleg-Moore k-means algorithm, detailed in
this paper:

http://ra.adm.cs.cmu.edu/anon/usr/ftp/2000/CMU-CS-00-105.pdf

It's a neat algorithm that accelerates k-means by considering the points
in batch using a tree structure.  Anyway, the ticket is a bit in-depth,
but if you are interested, I am happy to help out...

There are a lot of algorithms in mlpack that use tree-based algorithms;
nearest neighbor, emst (fast Euclidean minimum spanning tree), FastMKS,
range search, approximate nearest neighbor.  Basically, these algorithms
work by building a tree on the data, and then traversing the tree (or
multiple trees, depending on the problem) and pruning different subtrees
to accelerate computation.  These concepts are detailed a little bit
better in this paper:

http://arxiv.org/pdf/1304.4327

Anyway, mlpack implements trees and traversals in a pretty abstract
form, so to implement a dual-tree algorithm (or a single-tree algorithm,
which is what Pelleg-Moore k-means is), you simply need to implement the
Score() and BaseCase() functions.  Unfortunately it's not quite that
simple, though, because someone needs to take the Pelleg-Moore algorithm
and figure out what the BaseCase() and Score() functions need to be.

Once the Pelleg-Moore algorithm can be expressed as a BaseCase() and
Score() function that could be paired with a pruning single-tree
traversal, then it should be straightforward to implement both that and
tests for it.

Anyway, like I said, it's a bit complex and may be time-consuming, but
you will learn lots of interesting things about tree-based algorithms...
:)

An easier place to start might be ticket #235:

http://www.mlpack.org/trac/ticket/235

Basically, the BinarySpaceTree class in
src/mlpack/core/tree/binary_space_tree/ needs to be refactored in such a
way that the SplitNode() function is part of a template class that is an
argument to BinarySpaceTree.  I can clarify more on that if you like.

Thanks,

Ryan

-- 
Ryan Curtin    | "We need some time for some things to happen!"
ryan at ratml.org |   - Bells

_______________________________________________
mlpack mailing list
mlpack at cc.gatech.edu
https://mailman.cc.gatech.edu/mailman/listinfo/mlpack
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mailman.cc.gatech.edu/pipermail/mlpack/attachments/20140111/227c512d/attachment-0001.html>


More information about the mlpack mailing list