# [mlpack-git] [mlpack/mlpack] Modeling LSH For Performance Tuning (#749)

Tue Aug 30 05:34:46 EDT 2016

```
> +  Timer::Start("pairwise_distances");
> +  arma::vec distances(numSamples * (numSamples - 1) / 2);
> +  size_t d = 0; // Index of where to store next.
> +  for (size_t i = 0; i < numSamples; ++i)
> +    for (size_t j = i + 1; j < numSamples; ++j)
> +      distances(d++) = metric::EuclideanDistance::Evaluate(
> +          sampleSet.unsafe_col(i), sampleSet.unsafe_col(j));
> +  Log::Info << "Computed " << d << " pointwise distances." << std::endl;
> +  Timer::Stop("pairwise_distances");
> +
> +  // Step 3. Estimate statistics of these distances: log(mean(d)), mean(log(d)),
> +  // mean(d).
> +  distances = arma::pow(distances, 2);
> +  this->meanDist = arma::mean(distances);
> +  this->logMeanDist = std::log(meanDist);
> +  this->meanLogDist = arma::mean(arma::log(distances));

This works. I'll have to test how it affects correctness once the other details are ironed out.

I wonder whether the way I calculate the geometric mean is what causes the problem:

I use exp(log( prod(...) )) to compute the geometric mean. That is correct as long as the factors (i.e. the distances) are strictly positive, since the logarithm is undefined for values <= 0. The geometric mean itself is still defined when a factor is 0; it is simply equal to 0.
The authors don't specify this case, and I would expect a gamma distribution with geometric mean equal to 0 to be undefined, but do you think simply setting the geometric mean to 0 in this case would be sufficient?

--
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/pull/749/files/a0626a8c41fedc60ec255ef2939a519dfac5b83a#r76761769
```
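A minimal sketch of the zero-handling asked about above, written against the standard library only rather than Armadillo. `GeometricMean` is a hypothetical helper, not part of mlpack or of this PR, and returning 0 when any distance is 0 is the assumption under discussion, not something the paper specifies.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not mlpack API): geometric mean of non-negative
// values, computed in log space as exp(mean(log(x))) so that a long product
// of distances cannot overflow or underflow.
double GeometricMean(const std::vector<double>& values)
{
  assert(!values.empty());

  double logSum = 0.0;
  for (const double v : values)
  {
    assert(v >= 0.0); // Distances are never negative.

    // Assumption: a single zero factor makes the geometric mean 0, so return
    // early instead of evaluating log(0) = -inf.
    if (v == 0.0)
      return 0.0;

    logSum += std::log(v);
  }

  return std::exp(logSum / static_cast<double>(values.size()));
}
```

Note that a geometric mean of 0 still leaves mean(log(d)) at minus infinity, so whatever consumes `meanLogDist` would presumably need its own guard for that case.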