[mlpack-git] [mlpack] Mean shift clustering (#388)

Mon Mar 30 04:05:08 EDT 2015

Thanks for your patches. I feel guilty for leaving trivial work like code-style fixes to you. However, I couldn't apply 0002-Remove-spaces-remove-unnecessary-semicolons.patch; git failed every time I tried. So I just removed the two unnecessary semicolons manually.
I ran tests on iris.csv. scikit takes about 30 ms and mlpack takes 200 ms. After switching to unsafe_col, mlpack takes 140 ms. I think the main cause is the range search.
In my implementation I do the range search by brute force, whereas scikit uses a tree.
To check this, I made the following change in mean_shift_.py at line 131 in scikit:
            #i_nbrs = nbrs.radius_neighbors([my_mean], bandwidth,
            #                               return_distance=False)[0]
            i_nbrs = []
            for i in range(X.shape[0]):
                if extmath.norm(my_mean - X[i]) < bandwidth:
                    i_nbrs.append(i)
With this brute-force approach, scikit takes 400 ms.
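
To show why the range search dominates the runtime, here is a minimal, standalone sketch of shifting a single seed with a brute-force range search. It is not the scikit or mlpack code: the function names, the flat (uniform) kernel, and the `max_iter` and `tol` defaults are assumptions made for illustration. The point is only that every iteration rescans all points.

```python
import math


def brute_range_search(points, query, bandwidth):
    """Return indices of all points strictly closer than `bandwidth` to `query`.

    Hypothetical helper: a linear scan, so each call costs O(n).
    """
    return [i for i, p in enumerate(points) if math.dist(p, query) < bandwidth]


def mean_shift_seed(points, seed, bandwidth, max_iter=300, tol=1e-3):
    """Shift one seed toward a mode using a flat kernel (assumed here)."""
    mean = tuple(seed)
    for _ in range(max_iter):
        # The range search runs once per iteration, for every seed.
        nbrs = brute_range_search(points, mean, bandwidth)
        if not nbrs:
            break
        new_mean = tuple(
            sum(points[i][d] for i in nbrs) / len(nbrs)
            for d in range(len(mean))
        )
        shift = math.dist(new_mean, mean)
        mean = new_mean
        if shift < tol:
            break
    return mean
```

With n points used as seeds and k iterations per seed, this does on the order of n * n * k distance computations, which would be consistent with the slowdown I measured after replacing the tree in scikit.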
I went through RangeSearch in mlpack, but it seems I can only search with a querySet that is fixed in advance.
How can I run a search when the referenceSet is fixed in advance but the querySet is not known ahead of time?
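
To make the question concrete, this is the usage pattern I am after: build an index once over the fixed reference set, then query it with points that only become known later (the shifting means). The sketch below is not the mlpack RangeSearch API; `GridRangeIndex` is a hypothetical class, and it uses a uniform grid instead of a tree only to keep the example short.

```python
import math
from collections import defaultdict
from itertools import product


class GridRangeIndex:
    """Hypothetical index: buckets reference points into cells of side `radius`."""

    def __init__(self, points, radius):
        # Built once from the reference set; no query points are needed here.
        self.points = [tuple(p) for p in points]
        self.radius = radius
        self.cells = defaultdict(list)
        for i, p in enumerate(self.points):
            self.cells[self._cell(p)].append(i)

    def _cell(self, p):
        return tuple(math.floor(c / self.radius) for c in p)

    def query(self, q):
        """Return sorted indices of reference points strictly within `radius` of `q`."""
        base = self._cell(q)
        found = []
        # A point closer than `radius` differs by at most one cell per axis,
        # so only the 3^d surrounding cells have to be scanned.
        for offset in product((-1, 0, 1), repeat=len(base)):
            cell = tuple(b + o for b, o in zip(base, offset))
            for i in self.cells.get(cell, ()):
                if math.dist(self.points[i], q) < self.radius:
                    found.append(i)
        return sorted(found)
```

The number of scanned cells grows as 3^d with the dimension d, so a tree is likely the better choice in general; the sketch only illustrates the interface of building on the reference set first and supplying one query point at a time.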

---
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/pull/388#issuecomment-87584301