[mlpack] CMakeList adjustments for compiling Mlpack with Armadillo using openblas

Steenwijk, Martijn m.steenwijk at vumc.nl
Sat Dec 28 15:06:32 EST 2013


Thanks again for your response :)

@armadillo: exactly, that's what I did. 

@benchmarks: thanks, that looks pretty impressive. Marcus did a pretty damn cool job on the benchmarking system. :-) It would be really helpful to have widely used libraries such as ANN and FLANN in there, but I'm not sure whether he still has time after this... The problem (or, stated differently, the "challenge") with my data is always the number of points. There is no comparable standard dataset of this size...

Oh before I forget, I use ANN with "exact" precision. 

@allkrann: that's another possibility, although my application normally requires exact (to very low error) accuracy. Anyway, thanks again, I'll try some things and let you know how they worked out. 
 

-----Original Message-----
From: Ryan Curtin [mailto:gth671b at mail.gatech.edu] 
Sent: Saturday 28 December 2013 20:07
To: Steenwijk, Martijn
Cc: 'mlpack at cc.gatech.edu'
Subject: Re: [mlpack] CMakeList adjustments for compiling Mlpack with Armadillo using openblas

On Sat, Dec 28, 2013 at 06:31:49PM +0000, Steenwijk, Martijn wrote:
> Hi Ryan,
> 
> Thanks for your reply. I'm not sure where things went wrong, as I'm not
> a true expert on this topic. OpenBLAS is not available on my system by
> default, and as I'm not an admin, I had to explicitly point to the
> library while compiling Armadillo. This is probably why Armadillo did
> not find OpenBLAS automatically - and why I had to change Armadillo's
> configuration file. 

By default the Armadillo ./configure script doesn't allow options, but
you can just call CMake directly and tell it where OpenBLAS is:

$ cd armadillo-X.Y.Z/
$ cmake -D OpenBLAS_LIBRARY=/path/to/libopenblas.so .

> It was just a suggestion; I would appreciate it if you could make a ticket out of it. :-)

http://www.mlpack.org/trac/ticket/312

> Using mlpack over the last few days has led to another question: is
> there any paper or performance data on the kNN implementation used? It
> will of course depend on the number of dimensions and training points,
> but in my case (~5-10 dimensions and several million training points)
> it appeared to be slightly slower than ANN (which is quite impressive,
> as other alternatives such as FLANN and libnabo appear to be much, much
> slower in this situation). I'm wondering how mlpack would respond to
> extra dimensions (ANN slows down tremendously) or intercorrelated
> dimensions (ANN also slows down).

There are benchmarks for AllkNN in the following papers:

  http://jmlr.org/papers/volume14/curtin13a/curtin13a.pdf
  http://www.biglearn.org/2011/files/papers/biglearn2011_submission_38.pdf

Also, our Google Summer of Code student Marcus made a very cool
benchmarking system that I still haven't linked to the frontpage of
mlpack.org:

  http://www.mlpack.org/benchmark.html

You can open the "ALLKNN" tab and take a look at how mlpack compares
with respect to other libraries.  There are some further performance
improvements I am working on for kNN but they are not done yet.

I am glad to hear that our exact kNN implementation is nearly as fast as
ANN (which is approximate).  You may want to try rank-approximate
nearest neighbors (allkrann).  In our benchmarks it outperforms allknn
on the MNIST dataset (784 dimensions, 70k points) by a factor of 10.
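
As a rough intuition for why a rank-approximate search can be so much
faster, here is a sketch of the sampling idea only. It is not mlpack's
allkrann implementation, which combines sampling with tree traversal, and
the names samples_needed, tau and alpha are illustrative. If any result
among the closest fraction tau of the reference points is acceptable with
probability alpha, then n points drawn independently at random (with
replacement) all miss that fraction with probability (1 - tau)^n, so the
required n does not depend on the size of the reference set:

```python
import math


def samples_needed(tau, alpha):
    """Smallest n such that 1 - (1 - tau)**n >= alpha.

    tau:   acceptable rank error, as a fraction of the reference set (0 < tau < 1)
    alpha: desired probability of landing inside that fraction (0 < alpha < 1)
    """
    if not (0.0 < tau < 1.0 and 0.0 < alpha < 1.0):
        raise ValueError("tau and alpha must both lie strictly between 0 and 1")
    # Solve (1 - tau)**n <= 1 - alpha for n and round up to a whole sample.
    return math.ceil(math.log(1.0 - alpha) / math.log(1.0 - tau))
```

For instance, accepting a neighbor within the closest 1% of the reference
set with 95% probability needs a few hundred samples, whether the
reference set holds 70k points or several million.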

Performance for nearest neighbor search with trees is highly dependent
on the dataset.  In general, kd-trees don't scale very well with
dimension.  A better choice there would be something like ball trees,
but mlpack support for ball trees is not yet stable.  I'm not sure what
the effect of intercorrelated dimensions would be, but I do know that
ANN does not use kd-trees and instead uses the BBD-tree which is a
related structure that I don't have much intuition for.

Your best bet is to try it on a small subset of your higher
dimensionality data and see what trends you see.
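
One way to set up such an experiment is sketched below. All function
names are hypothetical, and brute_force_nn is only a stand-in so that the
sketch runs on its own; it says nothing about how a tree-based search
behaves. The idea is to draw random subsets of several sizes, keep only
the first few dimensions of each point, and time the search on every
combination:

```python
import random
import time


def subsample(points, n_points, n_dims, seed=0):
    """Pick n_points rows at random and keep only the first n_dims columns."""
    rng = random.Random(seed)
    rows = rng.sample(points, min(n_points, len(points)))
    return [row[:n_dims] for row in rows]


def brute_force_nn(points):
    """Stand-in search: index of the exact nearest neighbor of every point."""
    result = []
    for i, p in enumerate(points):
        best, best_dist = -1, float("inf")
        for j, q in enumerate(points):
            if i == j:
                continue  # a point is not its own neighbor
            dist = sum((a - b) ** 2 for a, b in zip(p, q))
            if dist < best_dist:
                best, best_dist = j, dist
        result.append(best)
    return result


def timing_trend(points, sizes, dims, search=brute_force_nn):
    """Return {(n_points, n_dims): seconds} for each subset configuration."""
    timings = {}
    for n in sizes:
        for d in dims:
            subset = subsample(points, n, d)
            start = time.perf_counter()
            search(subset)
            timings[(n, d)] = time.perf_counter() - start
    return timings
```

To measure a real library, pass a different callable as search, for
example one that writes the subset to a file and runs the library's own
command-line program on it.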

-- 
Ryan Curtin    | "She fell..."
ryan at ratml.org |   - Ludvig

