[mlpack-git] [mlpack] Mean shift clustering (#388)

Fri Mar 27 14:57:25 EDT 2015

Okay, I finally had a chance to look through this more thoroughly (my schedule has cleared up a bit, so I can now focus on getting this done).  I made several changes, but it was unclear to me how to commit to your branch, so instead I posted my patches at http://www.ratml.org/misc/patches.zip.  You can unzip them and then run 'git am *.patch' to apply them to your branch.  They are mostly formatting/style cleanups, but there are a few speedups in there too.

However, what kept me from saying "okay, we're good to go" is that the scikit implementation of mean shift is at least an order of magnitude faster.  I used the following test script for scikit:

```
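import numpy
from sklearn.cluster import MeanShift, estimate_bandwidth

# Load the dataset as a dense array, one point per row.
d = numpy.genfromtxt('/path/to/dataset.csv', delimiter=',')
# Estimate a bandwidth from a subsample of at most 500 points.
bw = estimate_bandwidth(d, quantile=0.2, n_samples=500)

# This is the value to reuse for the mlpack run.
print(bw)

# bin_seeding=True seeds from a coarse grid of bins, not from every point.
ms = MeanShift(bandwidth=bw, bin_seeding=True)
ms.fit(d)

print(ms.cluster_centers_)
# Number of distinct clusters found.
print(len(numpy.unique(ms.labels_)))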
```

I then ran the `mean_shift` program with the printed bandwidth.  Could you please look into why the mlpack implementation is so much slower by comparison?
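
One difference that may be worth ruling out first: the script passes `bin_seeding=True`, which makes scikit start its iterations from a coarse grid of occupied bins (bin width equal to the bandwidth) rather than from every data point. If the mlpack code starts a trajectory from every point, that alone could plausibly explain part of the gap, though I have not verified this. Below is a rough pure-Python sketch of the binning idea. It is a paraphrase, not scikit's actual code; the function name `bin_seeds` and the toy data are made up for illustration.

```python
from collections import Counter


def bin_seeds(points, bin_size, min_bin_freq=1):
    """Return one seed per occupied grid cell of side `bin_size`.

    points: iterable of equal-length coordinate sequences.
    A cell is kept only if at least `min_bin_freq` points round into it.
    """
    # Snap every point to the index of its nearest grid cell and count
    # how many points land in each cell.
    counts = Counter(
        tuple(round(x / bin_size) for x in p) for p in points
    )
    # Convert the surviving cell indices back to coordinates.
    return [
        tuple(c * bin_size for c in cell)
        for cell, n in counts.items()
        if n >= min_bin_freq
    ]


# Hypothetical toy data: two tight groups in 2-D.
data = [(0.1, 0.0), (0.0, 0.2), (-0.1, 0.1),
        (5.1, 5.0), (4.9, 5.2), (5.0, 4.9)]
seeds = bin_seeds(data, bin_size=1.0)
print(len(data), "points ->", len(seeds), "seeds")  # 6 points -> 2 seeds
```

On real data the reduction depends on the bandwidth and on how spread out the points are, so rerunning the scikit script with `bin_seeding=False` would show how much of the difference comes from seeding alone.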

---
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/pull/388#issuecomment-87055142