[mlpack-git] [mlpack] Mean shift clustering (#388)

Shangtong Zhang notifications at github.com
Thu Apr 23 03:58:04 EDT 2015


I tried corel.csv.
My implementation takes about 44s:
```
[INFO ] Program timers:
[INFO ]   clustering: 44.613388s
[INFO ]   computing_neighbors: 0.114881s
[INFO ]   loading_data: 7.664288s
[INFO ]   range_search/computing_neighbors: 37.602321s
[INFO ]   range_search/tree_building: 1.618025s
[INFO ]   saving_data: 0.639427s
[INFO ]   total_time: 52.928002s
[INFO ]   tree_building: 1.622805s
```
scikit-learn takes 10.5s on the same data.
The results are the same.
I think the bottleneck is range_search, which accounts for 37.6s of the 44.6s spent clustering.
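To illustrate why range search would dominate the clustering time, here is a minimal pure-Python sketch of mean shift for a single seed with a flat kernel. It is not the code from this PR: `range_search` and `mean_shift_seed` are hypothetical names, the search is brute force, and the toy data is made up. The point is only that every shift step issues one range query.

```python
import math

def range_search(data, query, radius):
    """Brute-force range search: indices of all points within radius of query."""
    return [i for i, p in enumerate(data) if math.dist(p, query) <= radius]

def mean_shift_seed(data, seed, radius, max_iter=100, tol=1e-6):
    """Shift one seed toward a mode using a flat kernel.

    Returns the final mean and the number of range searches performed.
    """
    mean = list(seed)
    searches = 0
    for _ in range(max_iter):
        # One range search per shift step: this is the hot path.
        idx = range_search(data, mean, radius)
        searches += 1
        if not idx:
            break
        # Flat kernel: the new mean is the plain average of the neighbors.
        new_mean = [sum(data[i][d] for i in idx) / len(idx)
                    for d in range(len(mean))]
        shift = math.dist(new_mean, mean)
        mean = new_mean
        if shift < tol:
            break
    return mean, searches

# Toy data (an assumption for illustration): two well-separated groups.
data = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (5.0, 5.0), (5.2, 5.0)]
mode, searches = mean_shift_seed(data, seed=(0.2, 0.0), radius=1.0)
print(mode, searches)
```

With many seeds and several steps per seed, the number of range queries grows quickly, so the cost of a single query largely determines the total clustering time.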
I compared mlpack's range_search with scikit-learn's NearestNeighbors.
I saved every vector queried during mean shift to means.csv and replayed those queries:
```
  range::RangeSearch<> rangeSearcher(data, false, true);
  // Save every queried mean so the same queries can be replayed in scikit.
  arma::mat tAllMean = allMean.t();
  tAllMean.save("means.csv", arma::csv_ascii);
  // Time one range search per saved mean.
  Timer::Start("search_test");
  for (size_t i = 0; i < allMean.n_cols; ++i) {
    rangeSearcher.Search(allMean.unsafe_col(i), validRadius,
                         neighbors, distances);
  }
  Timer::Stop("search_test");
  timeval t = Timer::Get("search_test");
  std::cout << t.tv_sec << std::endl;
```
This takes 38s,
while in scikit-learn,
```
import time

import numpy
from sklearn.cluster import estimate_bandwidth
from sklearn.neighbors import NearestNeighbors

d = numpy.genfromtxt('/Users/HurricaneTong/GitHub/mlpack/build_MS_nondebug/bin/Debug/corel.csv', delimiter=',')
bw = estimate_bandwidth(d, quantile=0.2, n_samples=500)

# Replay the means saved by the mlpack run, one radius query per mean.
means = numpy.genfromtxt('/Users/HurricaneTong/GitHub/mlpack/build_MS_nondebug/bin/Debug/means.csv', delimiter=',')
nbrs = NearestNeighbors(radius=bw).fit(d)
t1 = time.time()
for i in range(0, means.shape[0]):
    nbrs.radius_neighbors([means[i,:]], bw, return_distance=True)
t2 = time.time()
print(t2 - t1)
it takes 3.4s.
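The replay idea used in both snippets above can be sketched independently of either library. This is only an illustration: `replay` and `brute_force_search` are hypothetical helpers, and the brute-force search stands in for the real mlpack or scikit-learn call. Collecting the per-query results alongside the elapsed time makes it possible to confirm that two implementations return the same neighbors before comparing their speed.

```python
import math
import time

def brute_force_search(data, query, radius):
    """Stand-in search: sorted indices of points within radius of query."""
    return sorted(i for i, p in enumerate(data)
                  if math.dist(p, query) <= radius)

def replay(search_fn, queries):
    """Call search_fn once per query; return (elapsed seconds, results)."""
    start = time.perf_counter()
    results = [search_fn(q) for q in queries]
    return time.perf_counter() - start, results

# Toy reference set and saved queries (assumptions for illustration).
data = [(0.0, 0.0), (1.0, 0.0), (4.0, 4.0)]
queries = [(0.5, 0.0), (4.0, 3.5)]

elapsed, results = replay(lambda q: brute_force_search(data, q, 1.0), queries)
print(elapsed, results)
```

Passing each library's search as `search_fn` over the same saved queries keeps the comparison like for like: same reference set, same radius, one query at a time.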

---
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/pull/388#issuecomment-95480678