[mlpack-git] [mlpack] Mean shift clustering (#388)
Shangtong Zhang
notifications at github.com
Thu Apr 23 03:58:04 EDT 2015
I tried corel.csv.
My implementation takes about 44.6s for clustering (52.9s in total):
```
[INFO ] Program timers:
[INFO ]   clustering: 44.613388s
[INFO ]   computing_neighbors: 0.114881s
[INFO ]   loading_data: 7.664288s
[INFO ]   range_search/computing_neighbors: 37.602321s
[INFO ]   range_search/tree_building: 1.618025s
[INFO ]   saving_data: 0.639427s
[INFO ]   total_time: 52.928002s
[INFO ]   tree_building: 1.622805s
```
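scikit-learn, by comparison, takes 10.5s.
The results are the same.
I think the bottleneck is range_search: the range_search/computing_neighbors timer alone accounts for 37.6s.
To check this, I compared mlpack's RangeSearch with scikit-learn's NearestNeighbors.
I saved every vector queried during mean shift into means.csv and reran the queries on their own: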
```
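// Excerpt: data, allMean, validRadius, neighbors and distances are
// declared elsewhere in the mean shift code.
range::RangeSearch<> rangeSearcher(data, false, true);

// Save the queried means (one per row) so scikit-learn can reuse them.
arma::mat tAllMean = allMean.t();
tAllMean.save("means.csv", arma::csv_ascii);

// Time one range query per saved mean.
Timer::Start("search_test");
for (size_t i = 0; i < allMean.n_cols; ++i) {
  rangeSearcher.Search(allMean.unsafe_col(i), validRadius,
                       neighbors, distances);
}
Timer::Stop("search_test");

// Prints whole seconds only (tv_usec is ignored).
timeval t = Timer::Get("search_test");
std::cout << t.tv_sec << std::endl;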
```
This takes 38s,
while in scikit-learn,
```
import time
import numpy
from sklearn.cluster import estimate_bandwidth
from sklearn.neighbors import NearestNeighbors

d = numpy.genfromtxt('/Users/HurricaneTong/GitHub/mlpack/build_MS_nondebug/bin/Debug/corel.csv', delimiter=',')
bw = estimate_bandwidth(d, quantile=0.2, n_samples=500)
means = numpy.genfromtxt('/Users/HurricaneTong/GitHub/mlpack/build_MS_nondebug/bin/Debug/means.csv', delimiter=',')
nbrs = NearestNeighbors(radius=bw).fit(d)
# Time one radius query per saved mean, mirroring the mlpack loop.
t1 = time.time()
for i in range(0, means.shape[0]):
    nbrs.radius_neighbors([means[i,:]], bw, return_distance=True)
t2 = time.time()
print(t2 - t1)
```
this takes 3.4s.
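Since the timing comparison only means something if both range searches return the same neighbors, a brute-force reference can help to spot-check a few queries from either library. This is only a minimal sketch in plain Python: the function name `radius_neighbors_bruteforce` and the toy data are made up for illustration and are not part of mlpack or scikit-learn, and it assumes Euclidean distance with points on the boundary counted as neighbors.

```python
import math


def radius_neighbors_bruteforce(data, query, radius):
    """Return (indices, distances) of all rows of `data` within `radius` of `query`.

    Hypothetical helper: Euclidean distance, boundary points included.
    """
    indices, distances = [], []
    for i, point in enumerate(data):
        # Euclidean distance between this data point and the query.
        dist_i = math.sqrt(sum((p - q) ** 2 for p, q in zip(point, query)))
        if dist_i <= radius:
            indices.append(i)
            distances.append(dist_i)
    return indices, distances


# Toy data, made up for illustration (not taken from corel.csv).
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
idx, dist = radius_neighbors_bruteforce(data, (0.0, 0.0), 2.0)
print(idx, dist)  # prints [0, 1, 2] [0.0, 1.0, 2.0]
```

It is O(n) per query, so it is only meant for checking a handful of means, not for timing.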
---
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/pull/388#issuecomment-95480678