[mlpack] GSoC 2015: Fast k-centers algorithm & implementation

Ryan Curtin ryan at ratml.org
Wed Mar 4 10:00:04 EST 2015


On Wed, Mar 04, 2015 at 03:23:55AM -0300, Lukas Zorich wrote:
> Hi Ryan,
> 
> First of all, it's too bad that mlpack hasn't been accepted into GSoC'15 :(

Indeed -- once I learn from Google on Friday why mlpack wasn't accepted,
I'll make a more detailed post to the mailing list.

> I spent a lot of time reading about dual-tree algorithms and diving into
> mlpack code. The thing is that I'm still interested in dual-tree
> algorithms, and I also want to sharpen my C++ skills, so I still want to
> contribute to mlpack :)

Great! :)

> I thought that I could implement KDE (
> https://github.com/mlpack/mlpack/issues/150) to start. I will use the
> various mlpack dual-tree algorithms already implemented as a guide. Do you
> think this is achivable for me?

Yes, KDE should be pretty straightforward.  You can use the rules in
``Tree-Independent Dual-Tree Algorithms'', Algorithms 9 and 10 as a
simple implementation.  That's relative-value approximate kernel density
estimation, which ensures that the estimate f*(x) is such that

  | f*(x) - f(x) | < \epsilon

for every point x in the dataset.

There are more complicated algorithms in other papers, so if you want to
try those, feel free, but if they're not in a tree-independent form
already you may have to do some thinking.  The one in ``Nonparametric
density estimation: towards computational tractability'' by Alex Gray
and Andrew Moore is especially crazy.  I don't know how well it works in
practice.

> I will let you know if I have questions :)

Sure, I'm more than happy to help.

-- 
Ryan Curtin    | "You can think about it... but don't do it."
ryan at ratml.org |   - Sheriff Justice


More information about the mlpack mailing list