<p>Are you sure you compiled mlpack without debugging symbols? Here is what I get when compiling mlpack with <code>-DDEBUG=OFF</code> and <code>-DPROFILE=OFF</code>. I used this test program for scikit-learn:</p>
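<p>For reference, a typical out-of-tree build with those flags looks like this (the build directory name is just a placeholder):</p>
<pre><code># from the top of the mlpack source tree
mkdir build && cd build
cmake -DDEBUG=OFF -DPROFILE=OFF ..
make
</code></pre>
<p>If <code>DEBUG</code> was left on, the binaries carry debugging symbols and no optimization, which can easily account for a large slowdown.</p>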
<pre><code>#!/usr/bin/python
import numpy
from sklearn.cluster import MeanShift
from sklearn.cluster import estimate_bandwidth
import time
d = numpy.genfromtxt('/home/ryan/datasets/corel.csv', delimiter=',')
bw = estimate_bandwidth(d, quantile=0.2, n_samples=500)
print(bw)
ms = MeanShift(bandwidth=bw, bin_seeding=True)
t1 = time.time()
ms.fit(d)
t2 = time.time()
print(t2 - t1)
print(len(numpy.unique(ms.labels_)))
</code></pre>
<p>This gave me the following output:</p>
<pre><code>0.430335887828
7.03606009483
1
</code></pre>
<p>So, a bandwidth of 0.430336, it took 7.036 seconds, and we got 1 cluster as a result. Then I ran your mlpack implementation:</p>
<pre><code>$ mean_shift -i ~/datasets/corel.csv -r 0.430335887828 -v -C centers.csv
[INFO ] Loading '/home/ryan/datasets/corel.csv' as CSV data. Size is 32 x 37749.
[INFO ] Performing mean shift clustering...
[INFO ] 46511 node combinations were scored.
[INFO ] 37749 base cases were calculated.
[INFO ] Found 1 centroids.
[WARN ] No extension given with filename ''; type unknown. Save failed.
[INFO ] Saving CSV data to 'centers.csv'.
[INFO ]
[INFO ] Execution parameters:
[INFO ] bandwidth: (Unknown data type - )
[INFO ] centroid_file: centers.csv
[INFO ] help: false
[INFO ] in_place: false
[INFO ] info: ""
[INFO ] inputFile: /home/ryan/datasets/corel.csv
[INFO ] max_iterations: 1000
[INFO ] output_file: ""
[INFO ] radius: 0.430336
[INFO ] verbose: true
[INFO ] version: false
[INFO ]
[INFO ] Program timers:
[INFO ] clustering: 3.681358s
[INFO ] computing_neighbors: 0.009845s
[INFO ] loading_data: 0.459559s
[INFO ] range_search/computing_neighbors: 2.075936s
[INFO ] range_search/tree_building: 0.440392s
[INFO ] saving_data: 0.000118s
[INFO ] total_time: 4.143638s
[INFO ] tree_building: 0.487500s
</code></pre>
<p>So, the mlpack implementation appears to be about twice as fast as the scikit implementation. (I'm using Python 2.7.9 with Debian's <code>python-sklearn</code> 0.15.2-3 package.) I wouldn't be surprised if newer versions of scikit are faster, but either way, the timings I'm getting are drastically different from yours, so maybe there is a configuration issue on your end?</p>
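<p>For what it's worth, both tools are computing the same flat-kernel mean shift, so with equal bandwidth/radius they should find comparable centroids. Here is a minimal NumPy sketch of that procedure; the seeding, convergence tolerance, and merge step are simplified assumptions, not either library's exact algorithm:</p>
<pre><code>import numpy as np

def mean_shift(data, radius, max_iterations=1000, tol=1e-3):
    """Flat-kernel mean shift: repeatedly move each seed to the mean
    of the points within `radius`, then merge converged seeds."""
    centers = data.copy()  # seed one center per point (no bin seeding)
    for _ in range(max_iterations):
        new_centers = np.array([
            data[np.linalg.norm(data - c, axis=1) < radius].mean(axis=0)
            for c in centers
        ])
        done = np.abs(new_centers - centers).max() < tol
        centers = new_centers
        if done:
            break
    # merge centers that converged within `radius` of each other
    unique = []
    for c in centers:
        if all(np.linalg.norm(c - u) >= radius for u in unique):
            unique.append(c)
    return np.array(unique)

# two well-separated Gaussian blobs should yield two centroids
rng = np.random.RandomState(0)
data = np.vstack([rng.randn(50, 2), rng.randn(50, 2) + 8.0])
centers = mean_shift(data, radius=3.0)
print(len(centers))  # -> 2
</code></pre>
<p>This is O(n&#178;) per iteration; the tree-based range search in the mlpack log above is exactly what avoids that quadratic neighbor scan.</p>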
<p>With the covertype dataset and a bandwidth of 1524.6535, scikit takes 157.3325s while mlpack takes 42.159s.</p>
<p style="font-size:small;-webkit-text-size-adjust:none;color:#666;">—<br>Reply to this email directly or <a href="https://github.com/mlpack/mlpack/pull/388#issuecomment-95606901">view it on GitHub</a>.</p>