[mlpack-git] [mlpack] Bug fix for Elkan? (#592)

Ryan Curtin notifications at github.com
Fri Mar 25 14:35:50 EDT 2016


Yep, thanks for pointing this out.  The dualtree code does not have the same bug that the Elkan code does.  Your fix here appears to solve one of these issues.  But there are some other things to do to that relate to your initial discussion, which I will work on as I have time:

 * A better initialization strategy.  You are right that what is in mlpack now is not good.  But this will require some amount of refactoring.
 * There is some bug with the dual-tree algorithm.  I have traced it back to a commit where I know it works, but for now it is doing something really weird and not converging for anything.
 * Better handling for what to do with empty clusters.  There are many options: leave them dead and set to DBL_MAX or some other invalid value; leave them dead in-place; re-initialize them (in any number of different ways).  The right way to handle this will be to allow any template policy to be used here; so instead of setting them to DBL_MAX in kmeans_impl.hpp, that should be separated out into some other file.

Unfortunately the changes I'm proposing here will end up meaning that the fix here will be overwritten.  But I'm still happy to merge it as a just-for-now workaround, if you make the fix the simple `newCentroids.col(i).fill(DBL_MAX)` and not worry about what to do with empty clusters.  (That change will come later.)

Thanks again for pointing all this out, it is always nice to have another good set of eyes on the implementation.

---
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/pull/592#issuecomment-201409766
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mailman.cc.gatech.edu/pipermail/mlpack-git/attachments/20160325/78a0d613/attachment.html>


More information about the mlpack-git mailing list