[mlpack] Introduction to the Community

Ryan Curtin ryan at ratml.org
Mon Mar 2 10:02:02 EST 2015


On Sun, Mar 01, 2015 at 01:25:08PM +0530, prabhdeep singh wrote:
> Hi,
> 
> I am Prabhdeep Singh and I would like to introduce myself to the
> mlpack developer community. I am an undergraduate student at Birla
> Institute of Technology and Science, Pilani, India. I am extremely
> proficient in C, C++ and the UNIX environment. I have also finished
> some machine learning courses and am very interested in contributing to
> their application. I hope for the opportunity to contribute to the
> development of mlpack through GSoC.
> 
> I have already built mlpack, set up the development environment, and
> tried some examples to get familiar with using the library. I was
> browsing the mlpack GSoC 2015 ideas page
> <https://github.com/mlpack/mlpack/wiki/SummerOfCodeIdeas> and I was
> interested in these ideas based on dual-tree algorithms:
> 
> 1. Implement tree types
> <https://github.com/mlpack/mlpack/wiki/SummerOfCodeIdeas#implement-tree-types>
> 2. Improvement of tree traversers
> <https://github.com/mlpack/mlpack/wiki/SummerOfCodeIdeas#improvement-of-tree-traversers>
> 
> I would love to learn more about dual-tree algorithms and their
> implementation, and to study how implementing different tree types [1],
> or tinkering with the existing traversers [2], would improve mlpack's
> performance.
> 
> It would be great if you could point me to some resources where I could
> study the time and space complexity analysis and the implementation of
> dual-tree algorithms. I would also love to work on any existing bugs and
> start contributing to mlpack.

Hi Prabhdeep,

Many of the papers to read here are the same ones linked in this
email:

https://mailman.cc.gatech.edu/pipermail/mlpack/2015-February/000610.html

You can also follow the references of those papers.  That should give a
good introduction to dual-tree algorithms.

If you want to tinker with mlpack's current implementations, once you
understand the abstractions used for dual-tree algorithms (described in
"Tree-Independent Dual-Tree Algorithms"), you might take a look at the
traversers, tree types, and rule sets for the dual-tree algorithms
mlpack has implemented.  Rules can be found in
src/mlpack/methods/neighbor_search/, src/mlpack/methods/emst/,
src/mlpack/methods/range_search/, src/mlpack/methods/rann/, and
src/mlpack/methods/fastmks/.
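To make the tree type / traverser / rules split a little more concrete, here is a toy sketch in C++. It is not mlpack's code: `Node`, `Build`, `NearestNeighborRules`, `DualTreeTraverse`, and `DualTreeNearestNeighbors` are hypothetical names, the tree is a minimal 1-D binary space tree, and the rules only do 1-nearest-neighbor search. It is loosely modeled on the paper's idea that a rule set supplies `BaseCase()` (point-to-point work) and `Score()` (node-to-node pruning), while a generic traverser decides the visit order without knowing which problem is being solved.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <initializer_list>
#include <limits>
#include <memory>
#include <numeric>
#include <vector>

constexpr double kInf = std::numeric_limits<double>::infinity();

// Hypothetical toy tree type: a 1-D binary space tree. Each node keeps the
// interval [lo, hi] bounding its points; only leaves store point indices.
struct Node {
  double lo = 0.0, hi = 0.0;
  std::vector<std::size_t> points;
  std::unique_ptr<Node> left, right;
  bool IsLeaf() const { return !left; }
};

std::unique_ptr<Node> Build(const std::vector<double>& data,
                            std::vector<std::size_t> idx,
                            std::size_t leafSize = 2) {
  assert(!idx.empty() && leafSize >= 1);
  std::sort(idx.begin(), idx.end(),
            [&](std::size_t a, std::size_t b) { return data[a] < data[b]; });
  auto node = std::make_unique<Node>();
  node->lo = data[idx.front()];
  node->hi = data[idx.back()];
  if (idx.size() <= leafSize) {
    node->points = std::move(idx);
    return node;
  }
  // Split at the median; both halves are non-empty since size >= 2 here.
  const auto mid = idx.begin() + static_cast<std::ptrdiff_t>(idx.size() / 2);
  node->left = Build(data, std::vector<std::size_t>(idx.begin(), mid), leafSize);
  node->right = Build(data, std::vector<std::size_t>(mid, idx.end()), leafSize);
  return node;
}

// Hypothetical rule set for 1-nearest-neighbor search.
class NearestNeighborRules {
 public:
  NearestNeighborRules(const std::vector<double>& query,
                       const std::vector<double>& reference)
      : query_(query), reference_(reference),
        best_(query.size(), kInf), neighbor_(query.size(), 0) {}

  // Point-to-point work: try reference point r as a neighbor of query q.
  void BaseCase(std::size_t q, std::size_t r) {
    const double d = std::abs(query_[q] - reference_[r]);
    if (d < best_[q]) { best_[q] = d; neighbor_[q] = r; }
  }

  // Node-to-node work: kInf means "prune this pair"; otherwise return the
  // minimum possible distance between the two nodes.
  double Score(const Node& q, const Node& r) const {
    const double minDist = std::max({0.0, r.lo - q.hi, q.lo - r.hi});
    return minDist > WorstBound(q) ? kInf : minDist;
  }

  const std::vector<std::size_t>& Neighbors() const { return neighbor_; }

 private:
  // Largest current nearest-neighbor distance among query points under n.
  // A reference node farther than this from n cannot improve any of them.
  // (Recomputed on every call here; a real implementation would cache it.)
  double WorstBound(const Node& n) const {
    if (!n.IsLeaf()) return std::max(WorstBound(*n.left), WorstBound(*n.right));
    double worst = 0.0;
    for (std::size_t q : n.points) worst = std::max(worst, best_[q]);
    return worst;
  }

  const std::vector<double>& query_;
  const std::vector<double>& reference_;
  std::vector<double> best_;
  std::vector<std::size_t> neighbor_;
};

// Hypothetical depth-first dual-tree traverser: it only calls Score() and
// BaseCase(), so the same code works for any rule set with those methods.
template <typename RuleType>
void DualTreeTraverse(const Node& q, const Node& r, RuleType& rules) {
  if (rules.Score(q, r) == kInf) return;  // prune this node pair
  if (q.IsLeaf() && r.IsLeaf()) {
    for (std::size_t qi : q.points)
      for (std::size_t ri : r.points) rules.BaseCase(qi, ri);
  } else if (q.IsLeaf()) {
    DualTreeTraverse(q, *r.left, rules);
    DualTreeTraverse(q, *r.right, rules);
  } else if (r.IsLeaf()) {
    DualTreeTraverse(*q.left, r, rules);
    DualTreeTraverse(*q.right, r, rules);
  } else {
    for (const Node* qc : {q.left.get(), q.right.get()})
      for (const Node* rc : {r.left.get(), r.right.get()})
        DualTreeTraverse(*qc, *rc, rules);
  }
}

// Returns, for each query point, the index of its nearest reference point.
std::vector<std::size_t> DualTreeNearestNeighbors(
    const std::vector<double>& query, const std::vector<double>& reference) {
  auto allIndices = [](std::size_t n) {
    std::vector<std::size_t> v(n);
    std::iota(v.begin(), v.end(), std::size_t{0});
    return v;
  };
  const auto qTree = Build(query, allIndices(query.size()));
  const auto rTree = Build(reference, allIndices(reference.size()));
  NearestNeighborRules rules(query, reference);
  DualTreeTraverse(*qTree, *rTree, rules);
  return rules.Neighbors();
}
```

For example, `DualTreeNearestNeighbors({1.0, 8.0}, {0.5, 3.0, 7.25})` should return the indices `{0, 2}`. Because the traverser never looks inside the rules, you could swap in a different tree type or a different rule set (say, range search) without touching the traversal logic, which is the kind of separation the directories listed above are organized around.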

You can build the corresponding program (e.g. 'allknn') and run it with
the '-v' option on some datasets to see how long it takes.  Then, after
you make some changes in your tinkering, you can rebuild, run it again,
and see whether it yields any speedup.

I hope this provides a decent picture of how to start modifying and
improving mlpack's dual-tree algorithms.  Please let me know if you have
any questions.

Thanks,

Ryan

-- 
Ryan Curtin    | "Open the pig!"
ryan at ratml.org |   - Frank Moses

