[mlpack-git] [mlpack/mlpack] mlpack KMeans class much slower than armadillo kmeans() (#514)

Tue Jun 28 04:46:36 EDT 2016

In fact, the dual-tree and Pelleg-Moore (same as X-means) algorithms are already fast enough!
The current naive k-means implementation is not well suited to parallelization with OpenMP; it's "textbook style", though still fine for small datasets :)
**I changed the body of the NaiveKMeans::Iterate method to call arma::kmeans directly:**

```cpp
template<typename MetricType, typename MatType>
double NaiveKMeans<MetricType, MatType>::Iterate(const arma::mat& centroids, arma::mat& newCentroids, arma::Col<size_t>& counts) {
    counts.zeros(centroids.n_cols); // never used, in fact
    newCentroids = centroids;

    arma::kmeans(newCentroids, dataset, centroids.n_cols, arma::keep_existing, 10, false);
    Log::Assert(newCentroids.n_cols == centroids.n_cols);

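    // Rough estimate of the distance calculations done inside arma::kmeans
    // (it may run up to 10 iterations here, so this likely undercounts).
    distanceCalculations += centroids.n_cols * dataset.n_cols;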

    // Calculate cluster distortion for this iteration.
    double cNorm = 0.0;
    for (size_t i = 0; i < centroids.n_cols; ++i) {
        cNorm += std::pow(metric.Evaluate(centroids.col(i), newCentroids.col(i)), 2.0);
    }
    distanceCalculations += centroids.n_cols;

    return std::sqrt(cNorm);
}
```

Now it's as fast as Armadillo's.
But this is not a correct, good, or final solution *_*
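
For comparison, here is a rough sketch of how a naive Lloyd iteration could be restructured so that OpenMP can parallelize the assignment step: each thread accumulates into its own sums and counts, and the partial results are merged once at the end. This is only an illustration under simplifying assumptions, not mlpack's or Armadillo's code. It uses plain `std::vector` instead of `arma::mat`, assumes squared Euclidean distance, and the names `Point` and `LloydIterationSketch` are hypothetical. Without OpenMP enabled, the pragmas are ignored and the code runs serially.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical point type for this sketch: one vector of coordinates per point.
using Point = std::vector<double>;

// Hypothetical helper, not part of mlpack: runs one Lloyd iteration and
// returns the Euclidean norm of the centroid movement, like Iterate().
double LloydIterationSketch(const std::vector<Point>& points,
                            const std::vector<Point>& centroids,
                            std::vector<Point>& newCentroids,
                            std::vector<std::size_t>& counts)
{
  const std::size_t k = centroids.size();
  if (k == 0)
    return 0.0;
  const std::size_t dim = centroids[0].size();
  const long n = static_cast<long>(points.size());

  newCentroids.assign(k, Point(dim, 0.0));
  counts.assign(k, 0);

  #pragma omp parallel
  {
    // Thread-local accumulators avoid contention on the shared sums.
    std::vector<Point> localSums(k, Point(dim, 0.0));
    std::vector<std::size_t> localCounts(k, 0);

    #pragma omp for nowait
    for (long i = 0; i < n; ++i)
    {
      // Find the closest centroid by squared Euclidean distance.
      std::size_t best = 0;
      double bestDist = std::numeric_limits<double>::max();
      for (std::size_t c = 0; c < k; ++c)
      {
        double dist = 0.0;
        for (std::size_t j = 0; j < dim; ++j)
        {
          const double diff = points[i][j] - centroids[c][j];
          dist += diff * diff;
        }
        if (dist < bestDist)
        {
          bestDist = dist;
          best = c;
        }
      }

      for (std::size_t j = 0; j < dim; ++j)
        localSums[best][j] += points[i][j];
      ++localCounts[best];
    }

    // Merge each thread's partial results into the shared output.
    #pragma omp critical
    {
      for (std::size_t c = 0; c < k; ++c)
      {
        for (std::size_t j = 0; j < dim; ++j)
          newCentroids[c][j] += localSums[c][j];
        counts[c] += localCounts[c];
      }
    }
  }

  // Turn sums into means (an empty cluster keeps its old centroid) and
  // accumulate the squared movement of each centroid.
  double cNorm = 0.0;
  for (std::size_t c = 0; c < k; ++c)
  {
    for (std::size_t j = 0; j < dim; ++j)
    {
      if (counts[c] > 0)
        newCentroids[c][j] /= static_cast<double>(counts[c]);
      else
        newCentroids[c][j] = centroids[c][j];
      const double diff = newCentroids[c][j] - centroids[c][j];
      cNorm += diff * diff;
    }
  }
  return std::sqrt(cNorm);
}
```

Unlike the `arma::kmeans` shortcut, this performs exactly one iteration per call and fills `counts`, so the outer convergence loop and the distance-calculation counter would keep their meaning. Whether it would actually match Armadillo's speed is untested.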

---
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/issues/514#issuecomment-228989205