<p>I think my issue was that I had to specify <code>HAS_OPENMP=YES</code>.  So now I have it running in parallel.</p>
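<p>(For reference, a minimal sketch of the configuration step, assuming a fresh build directory with the mlpack source one level up; <code>mlpack_lsh</code> as a make target is my assumption based on the binary name:)</p>

<pre><code># Enable OpenMP at configure time, then build the LSH program.
$ cmake -DHAS_OPENMP=YES ../
$ make mlpack_lsh
</code></pre>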

<p>Some scaling tests on my machine (i7-3770; 4 cores, 8 hardware threads), for a few datasets.  Here I tested with the corel, phy, and miniboone datasets, looking only at the <code>computing_neighbors</code> timer.  The outside-of-mlpack load average of the system was about 1.8, so you can assume that roughly two cores were already busy.  I tested with a couple of underlying LAPACK/BLAS variants.</p>
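<p>(For readability, here is the one-liner used in each run below, unrolled; it runs <code>mlpack_lsh</code> with 1 through 8 OpenMP threads and extracts the <code>computing_neighbors</code> timer from the verbose output:)</p>

<pre><code># Time k=3 LSH nearest neighbor search at each thread count.
for t in 1 2 3 4 5 6 7 8; do
  OMP_NUM_THREADS=$t bin/mlpack_lsh \
      -q ~/datasets/corel.csv -r ~/datasets/corel.csv \
      -v -k 3 -d d.csv -n n.csv \
    | grep computing_neighbors \
    | awk -F':' '{ print $2 }' \
    | sed "s/^/$t threads:/"
done
</code></pre>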

<p>With ATLAS (3.10.2-9+b1):</p>

<pre><code>-&lt; ryan@zax &gt;&lt; ~/src/mlpack-mentekid/build-nodebug &gt;&lt; 24 &gt;-
-&lt; 14:16:51 &gt;- $ for t in 1 2 3 4 5 6 7 8; do OMP_NUM_THREADS=$t bin/mlpack_lsh -q ~/datasets/corel.csv -r ~/datasets/corel.csv -v -k 3 -d d.csv -n n.csv | grep computing_neighbors | awk -F':' '{ print $2 }' | sed "s/^/$t threads:/" ; done
1 threads: 7.910973s
2 threads: 4.195096s
3 threads: 3.297027s
4 threads: 2.180582s
5 threads: 2.102062s
6 threads: 1.936919s
7 threads: 1.958664s
8 threads: 1.713584s
</code></pre>

<pre><code>-&lt; ryan@zax &gt;&lt; ~/src/mlpack-mentekid/build-nodebug &gt;&lt; 26 &gt;-
-&lt; 14:23:39 &gt;- $ for t in 1 2 3 4 5 6 7 8; do OMP_NUM_THREADS=$t bin/mlpack_lsh -q ~/datasets/phy.csv -r ~/datasets/phy.csv -v -k 3 -d d.csv -n n.csv | grep computing_neighbors | awk -F':' '{ print $2 }' | sed "s/^/$t threads:/" ; done
1 threads: 83.253656s (1 mins, 23.2 secs)
2 threads: 42.035172s
3 threads: 29.575169s
4 threads: 23.218118s
5 threads: 20.773536s
6 threads: 18.055156s
7 threads: 17.681238s
8 threads: 17.888020s
</code></pre>

<pre><code>-&lt; ryan@zax &gt;&lt; ~/src/mlpack-mentekid/build-nodebug &gt;&lt; 27 &gt;-
-&lt; 14:29:09 &gt;- $ for t in 1 2 3 4 5 6 7 8; do OMP_NUM_THREADS=$t bin/mlpack_lsh -q ~/datasets/miniboone.csv -r ~/datasets/miniboone.csv -v -k 3 -d d.csv -n n.csv | grep computing_neighbors | awk -F':' '{ print $2 }' | sed "s/^/$t threads:/" ; done
1 threads: 19.044967s
2 threads: 9.594015s
3 threads: 6.481856s
4 threads: 3.683142s
5 threads: 4.161889s
6 threads: 3.129131s
7 threads: 4.141623s
8 threads: 4.190556s
</code></pre>

<p>With OpenBLAS (0.2.18-1):</p>

<pre><code>-&lt; ryan@zax &gt;&lt; ~/src/mlpack-mentekid/build-nodebug &gt;&lt; 45 &gt;-
-&lt; 14:38:21 &gt;- $ for t in 1 2 3 4 5 6 7 8; do OMP_NUM_THREADS=$t bin/mlpack_lsh -q ~/datasets/corel.csv -r ~/datasets/corel.csv -v -k 3 -d d.csv -n n.csv | grep computing_neighbors | awk -F':' '{ print $2 }' | sed "s/^/$t threads:/" ; done
1 threads: 7.706135s
2 threads: 4.041237s
3 threads: 2.777471s
4 threads: 2.023099s
5 threads: 2.031080s
6 threads: 2.088373s
7 threads: 1.623153s
8 threads: 1.656306s
</code></pre>

<pre><code>-&lt; ryan@zax &gt;&lt; ~/src/mlpack-mentekid/build-nodebug &gt;&lt; 46 &gt;-
-&lt; 14:43:01 &gt;- $ for t in 1 2 3 4 5 6 7 8; do OMP_NUM_THREADS=$t bin/mlpack_lsh -q ~/datasets/phy.csv -r ~/datasets/phy.csv -v -k 3 -d d.csv -n n.csv | grep computing_neighbors | awk -F':' '{ print $2 }' | sed "s/^/$t threads:/" ; done
1 threads: 85.958403s (1 mins, 25.9 secs)
2 threads: 43.510783s
3 threads: 27.753276s
4 threads: 22.183104s
5 threads: 19.071099s
6 threads: 17.832296s
7 threads: 16.625094s
8 threads: 15.502252s
</code></pre>

<pre><code>-&lt; ryan@zax &gt;&lt; ~/src/mlpack-mentekid/build-nodebug &gt;&lt; 47 &gt;-
-&lt; 14:50:13 &gt;- $ for t in 1 2 3 4 5 6 7 8; do OMP_NUM_THREADS=$t bin/mlpack_lsh -q ~/datasets/miniboone.csv -r ~/datasets/miniboone.csv -v -k 3 -d d.csv -n n.csv | grep computing_neighbors | awk -F':' '{ print $2 }' | sed "s/^/$t threads:/" ; done
1 threads: 16.803330s
2 threads: 9.012268s
3 threads: 6.607606s
4 threads: 4.308089s
5 threads: 4.446321s
6 threads: 3.457773s
7 threads: 3.839490s
8 threads: 3.354377s
</code></pre>

<p>Maybe I could have made a nicer graph, but I did not want to put in the effort. :)</p>
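<p>(If anyone does want a plot, here is a rough sketch with awk and gnuplot; <code>timings.txt</code> is a hypothetical file holding the "N threads: X.XXs" lines from one of the runs above:)</p>

<pre><code># Strip the trailing 's' from the times, then plot threads vs. runtime.
awk '{ gsub(/s$/, "", $3); print $1, $3 }' timings.txt &gt; scaling.dat
gnuplot -e "set terminal png size 640,480; set output 'scaling.png'; \
            set xlabel 'threads'; set ylabel 'time (s)'; \
            plot 'scaling.dat' using 1:2 with linespoints title 'runtime'"
</code></pre>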

<p>Next I ran on a much more powerful system, with Xeon E5-2630 v3 processors (32 hardware threads).</p>

<p>With standard LAPACK/BLAS:</p>

<pre><code>◈ ryan@humungus ☃ build-nodebug ◈ $ for t in `seq 1 32`; do OMP_NUM_THREADS=$t bin/mlpack_lsh -q ~/datasets/corel.csv -r ~/datasets/corel.csv -k 3 -v -d d.csv -n n.csv | grep computing_neighbors | awk -F':' '{ print $2 }' | sed "s/^/$t threads:/" ; done            
1 threads: 8.404026s
2 threads: 5.168623s
3 threads: 3.672489s
4 threads: 2.822260s
5 threads: 2.223130s
6 threads: 2.099296s
7 threads: 1.787155s
8 threads: 1.557677s
9 threads: 1.445799s
10 threads: 1.218698s
11 threads: 1.283903s
12 threads: 1.261723s
13 threads: 1.354944s
14 threads: 1.013850s
15 threads: 1.053046s
16 threads: 1.122099s
17 threads: 0.957182s
18 threads: 0.888229s
19 threads: 0.911108s
20 threads: 0.924035s
21 threads: 0.920874s
22 threads: 0.859121s
23 threads: 0.836497s
24 threads: 0.823132s
25 threads: 0.809634s
26 threads: 0.737277s
27 threads: 0.804975s
28 threads: 0.805146s
29 threads: 0.762401s
30 threads: 0.729818s
31 threads: 0.724836s
32 threads: 0.799443s
</code></pre>

<pre><code>◈ ryan@humungus ☃ build-nodebug ◈ $ for t in `seq 1 32`; do OMP_NUM_THREADS=$t bin/mlpack_lsh -q ~/datasets/phy.csv -r ~/datasets/phy.csv -k 3 -v -d d.csv -n n.csv | grep computing_neighbors | awk -F':' '{ print $2 }' | sed "s/^/$t threads:/" ; done
1 threads: 101.918782s (1 mins, 41.9 secs)
2 threads: 53.510828s
3 threads: 36.325209s
4 threads: 30.153590s
5 threads: 24.189268s
6 threads: 19.784561s
7 threads: 17.123877s
8 threads: 15.772134s
9 threads: 14.109838s
10 threads: 14.002757s
11 threads: 12.633819s
12 threads: 11.876447s
13 threads: 11.657881s
14 threads: 11.784745s
15 threads: 10.936825s
16 threads: 9.407911s
17 threads: 10.028609s
18 threads: 9.399953s
19 threads: 9.154050s
20 threads: 8.479986s
21 threads: 7.621993s
22 threads: 8.136546s
23 threads: 7.710549s
24 threads: 7.581741s
25 threads: 7.403005s
26 threads: 6.827410s
27 threads: 6.997940s
28 threads: 7.297680s
29 threads: 6.643068s
30 threads: 6.553058s
31 threads: 6.937021s
32 threads: 6.724141s
</code></pre>

<pre><code>◈ ryan@humungus ☃ build-nodebug ◈ $ for t in `seq 1 32`; do OMP_NUM_THREADS=$t bin/mlpack_lsh -q ~/datasets/miniboone.csv -r ~/datasets/miniboone.csv -k 3 -v -d d.csv -n n.csv | grep computing_neighbors | awk -F':' '{ print $2 }' | sed "s/^/$t threads:/" ; done
1 threads: 24.463162s               
2 threads: 14.399175s               
3 threads: 6.809879s                
4 threads: 6.064711s                
5 threads: 5.320080s                
6 threads: 3.913631s                
7 threads: 3.571118s                
8 threads: 2.730636s                
9 threads: 2.855679s                
10 threads: 2.648417s
11 threads: 3.071749s
12 threads: 2.562618s
13 threads: 2.517803s
14 threads: 2.085122s
15 threads: 2.079082s
16 threads: 2.138712s
17 threads: 2.142987s
18 threads: 1.836003s
19 threads: 1.576602s
20 threads: 1.795865s
21 threads: 1.637288s
22 threads: 1.889029s
23 threads: 1.258768s
24 threads: 1.474051s
25 threads: 1.658719s
26 threads: 1.444587s
27 threads: 1.327272s
28 threads: 1.342775s
29 threads: 1.756671s
30 threads: 1.317495s
31 threads: 1.431359s
32 threads: 1.595325s
</code></pre>

<p>With OpenBLAS (0.2.14-1ubuntu1):</p>

<pre><code>◈ ryan@humungus ☃ build-nodebug ◈ $ for t in `seq 1 32`; do OMP_NUM_THREADS=$t bin/mlpack_lsh -q ~/datasets/corel.csv -r ~/datasets/corel.csv -k 3 -v -d d.csv -n n.csv | grep computing_neighbors | awk -F':' '{ print $2 }' | sed "s/^/$t threads:/" ; done
1 threads: 8.336585s
2 threads: 4.911874s
3 threads: 3.842410s
4 threads: 2.971360s
5 threads: 2.358011s
6 threads: 1.924199s
7 threads: 1.716306s
8 threads: 1.568955s
9 threads: 1.541698s
10 threads: 1.256109s
11 threads: 1.356592s
12 threads: 1.159481s
13 threads: 1.290556s
14 threads: 1.227934s
15 threads: 1.227318s
16 threads: 1.109251s
17 threads: 1.082635s
18 threads: 0.902164s
19 threads: 0.908723s
20 threads: 0.905903s
21 threads: 0.905672s
22 threads: 0.887319s
23 threads: 0.877363s
24 threads: 0.802047s
25 threads: 0.762360s
26 threads: 0.835936s
27 threads: 0.823067s
28 threads: 0.748453s
29 threads: 0.758463s
30 threads: 0.834105s
31 threads: 0.810029s
32 threads: 0.830186s
</code></pre>

<pre><code>◈ ryan@humungus ☃ build-nodebug ◈ $ for t in `seq 1 32`; do OMP_NUM_THREADS=$t bin/mlpack_lsh -q ~/datasets/phy.csv -r ~/datasets/phy.csv -k 3 -v -d d.csv -n n.csv | grep computing_neighbors | awk -F':' '{ print $2 }' | sed "s/^/$t threads:/" ; done
1 threads: 101.596847s (1 mins, 41.5 secs)
2 threads: 53.533078s
3 threads: 36.128960s
4 threads: 27.706339s
5 threads: 23.167973s
6 threads: 19.631714s
7 threads: 16.814206s
8 threads: 15.843670s
9 threads: 15.085720s
10 threads: 13.145210s
11 threads: 13.119659s
12 threads: 11.100898s
13 threads: 11.431071s
14 threads: 11.277082s
15 threads: 10.915975s
16 threads: 9.818397s
17 threads: 9.682370s
18 threads: 9.183385s
19 threads: 8.878544s
20 threads: 8.670723s
21 threads: 8.163627s
22 threads: 8.209054s
23 threads: 7.823726s
24 threads: 7.691504s
25 threads: 7.547275s
26 threads: 7.398752s
27 threads: 7.768681s
28 threads: 6.944279s
29 threads: 7.044016s
30 threads: 7.291548s
31 threads: 6.655275s
32 threads: 6.863990s
</code></pre>

<pre><code>◈ ryan@humungus ☃ build-nodebug ◈ $ for t in `seq 1 32`; do OMP_NUM_THREADS=$t bin/mlpack_lsh -q ~/datasets/miniboone.csv -r ~/datasets/miniboone.csv -k 3 -v -d d.csv -n n.csv | grep computing_neighbors | awk -F':' '{ print $2 }' | sed "s/^/$t threads:/" ; done
1 threads: 22.185537s
2 threads: 10.286940s
3 threads: 8.115816s
4 threads: 5.231896s
5 threads: 4.176525s
6 threads: 3.875466s
7 threads: 3.264623s
8 threads: 3.284492s
9 threads: 3.308781s
10 threads: 2.977220s
11 threads: 2.649113s
12 threads: 2.250442s
13 threads: 2.180048s
14 threads: 1.871922s
15 threads: 1.915900s
16 threads: 2.084260s
17 threads: 1.952864s
18 threads: 1.906784s
19 threads: 2.026871s
20 threads: 1.933004s
21 threads: 1.643694s
22 threads: 1.599656s
23 threads: 1.547806s
24 threads: 1.600161s
25 threads: 1.788019s
26 threads: 1.630448s
27 threads: 1.785991s
28 threads: 1.744910s
29 threads: 1.849506s
30 threads: 2.061815s
31 threads: 2.133709s
32 threads: 1.773207s
</code></pre>

<p>Lastly, I ran on a humble i5 650 (2 cores, 4 hardware threads).  This system was completely idle, at essentially zero load, when I ran these simulations.  (So it differs from your typical desktop/laptop, in which one or two cores will probably be saturated at any given time because someone is actively using the system.)</p>

<p>With ATLAS (3.10.2-6):</p>

<pre><code>(( ryan @ dambala )) ~/src/mlpack-mentekid/build-nodebug $ for t in 1 2 3 4; do OMP_NUM_THREADS=$t bin/mlpack_lsh -q ~/datasets/corel.csv -r ~/datasets/corel.csv -v -k 3 -d d.csv -n n.csv | grep computing_neighbors | awk -F':' '{ print $2 }' | sed "s/^/$t threads:/" ; done
1 threads: 16.780850s
2 threads: 10.141014s
3 threads: 8.087287s
4 threads: 7.063538s
</code></pre>

<pre><code>(( ryan @ dambala )) ~/src/mlpack-mentekid/build-nodebug $ for t in 1 2 3 4; do OMP_NUM_THREADS=$t bin/mlpack_lsh -q ~/datasets/phy.csv -r ~/datasets/phy.csv -v -k 3 -d d.csv -n n.csv | grep computing_neighbors | awk -F':' '{ print $2 }' | sed "s/^/$t threads:/" ; done
1 threads: 132.373939s (2 mins, 12.3 secs)
2 threads: 71.849087s (1 mins, 11.8 secs)
3 threads: 54.420773s
4 threads: 46.805077s
</code></pre>

<pre><code>(( ryan @ dambala )) ~/src/mlpack-mentekid/build-nodebug $ for t in 1 2 3 4; do OMP_NUM_THREADS=$t bin/mlpack_lsh -q ~/datasets/miniboone.csv -r ~/datasets/miniboone.csv -v -k 3 -d d.csv -n n.csv | grep computing_neighbors | awk -F':' '{ print $2 }' | sed "s/^/$t threads:/" ; done
1 threads: 21.639796s
2 threads: 12.144827s
3 threads: 13.050360s
4 threads: 9.987645s
</code></pre>

<p>With OpenBLAS (0.2.12-1):</p>

<pre><code>(( ryan @ dambala )) ~/src/mlpack-mentekid/build-nodebug $ for t in 1 2 3 4; do OMP_NUM_THREADS=$t bin/mlpack_lsh -q ~/datasets/corel.csv -r ~/datasets/corel.csv -v -k 3 -d d.csv -n n.csv | grep computing_neighbors | awk -F':' '{ print $2 }' | sed "s/^/$t threads:/" ; done
1 threads: 17.990595s
2 threads: 9.546705s
3 threads: 6.832160s
4 threads: 7.639326s
</code></pre>

<pre><code>(( ryan @ dambala )) ~/src/mlpack-mentekid/build-nodebug $ for t in 1 2 3 4; do OMP_NUM_THREADS=$t bin/mlpack_lsh -q ~/datasets/phy.csv -r ~/datasets/phy.csv -v -k 3 -d d.csv -n n.csv | grep computing_neighbors | awk -F':' '{ print $2 }' | sed "s/^/$t threads:/" ; done
1 threads: 127.936811s (2 mins, 7.9 secs)
2 threads: 72.647918s (1 mins, 12.6 secs)
3 threads: 53.592957s
4 threads: 45.845096s
</code></pre>

<pre><code>(( ryan @ dambala )) ~/src/mlpack-mentekid/build-nodebug $ for t in 1 2 3 4; do OMP_NUM_THREADS=$t bin/mlpack_lsh -q ~/datasets/miniboone.csv -r ~/datasets/miniboone.csv -v -k 3 -d d.csv -n n.csv | grep computing_neighbors | awk -F':' '{ print $2 }' | sed "s/^/$t threads:/" ; done
1 threads: 28.859252s
2 threads: 17.077041s
3 threads: 12.146553s
4 threads: 8.597622s
</code></pre>

<p>So overall I am definitely seeing some non-negligible speedup, although with only a few cores the speedup is limited.  I guess I am a bit confused, though: I thought you were saying that you were seeing no useful speedup at all?</p>
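<p>(To put a rough number on it: taking the 1-thread and 8-thread <code>computing_neighbors</code> times from the phy.csv OpenBLAS run on my i7 above, the speedup works out to about 5.5x:)</p>

<pre><code># speedup = T(1) / T(8); efficiency = speedup / 8
echo "85.958403 15.502252" | \
  awk '{ s = $1 / $2; printf "speedup: %.2fx, efficiency: %.0f%%\n", s, 100 * s / 8 }'
# speedup: 5.54x, efficiency: 69%
</code></pre>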
