[mlpack-git] [mlpack] mlpack KMeans class much slower than armadillo kmeans() (#514)
jciminogh
notifications at github.com
Tue Feb 2 02:53:48 EST 2016
I seem to be getting much faster clustering from Armadillo's [kmeans()](http://arma.sourceforge.net/docs.html#kmeans) function than from mlpack's kmeans::KMeans<> class, which is about 2x to 6x slower. I'm using the latest mlpack code from the git repo. Am I doing something wrong?
Using the code below, I get the following timings on my machine (Intel i5, 64-bit, g++ version 5.3).
Compiled without openmp:
`g++ kmeans_test.cpp -o kmeans_test -O3 -std=c++11 -larmadillo -lmlpack`
mlpack_kmeans time: 17.3024
arma::kmeans time: 9.16399
Compiled with openmp:
`g++ kmeans_test.cpp -o kmeans_test -O3 -std=c++11 -larmadillo -lmlpack -fopenmp`
mlpack_kmeans time: 17.7575
arma::kmeans time: 2.87675
```
#include <iostream>
#include <mlpack/methods/kmeans/kmeans.hpp>
#include <armadillo>

int main() {
  arma::uword dims = 20; // number of dimensions
  arma::uword samples = 5000000;
  arma::uword max_iterations = 10;
  arma::uword k = 10; // number of clusters

  arma::arma_rng::set_seed_random(); // random start

  std::cout << "Generating some synthetic data " << std::endl;
  arma::mat data(dims, samples, arma::fill::zeros);

  // generate data with unique centroids, added with a small amount of noise
  for (arma::uword i = 0; i < samples; i++) {
    // pick one of the k centroids uniformly at random
    arma::uword c = arma::as_scalar(arma::randi<arma::uvec>(1, arma::distr_param(0, k - 1)));
    data.col(i) = arma::linspace<arma::vec>(c, c + dims - 1, dims) + 0.25 * arma::randn<arma::vec>(dims);
  }

  arma::wall_clock timer; // toc() reports elapsed seconds since tic()

  std::cout << "mlpack_kmeans start " << std::endl;
  arma::Row<size_t> mlpack_assignments;
  arma::mat mlpack_centroids;
  mlpack::kmeans::KMeans<> mlpack_kmeans(max_iterations);
  timer.tic();
  mlpack_kmeans.Cluster(data, k, mlpack_assignments, mlpack_centroids);
  std::cout << "mlpack_kmeans time: " << timer.toc() << std::endl;
  std::cout << "---" << std::endl;

  std::cout << "arma::kmeans start " << std::endl;
  arma::mat arma_centroids;
  timer.tic();
  arma::kmeans(arma_centroids, data, k, arma::random_subset, max_iterations, false);
  std::cout << "arma::kmeans time: " << timer.toc() << std::endl;
  std::cout << "---" << std::endl;

  mlpack_centroids.print("mlpack_centroids:");
  arma_centroids.print("arma_centroids:");
  return 0;
}
```
---
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/issues/514