[mlpack-git] [mlpack/mlpack] Modeling LSH For Performance Tuning (#749)
Yannis Mentekidis
notifications at github.com
Tue Aug 30 05:34:46 EDT 2016
> + Timer::Start("pairwise_distances");
> + arma::vec distances(numSamples * (numSamples - 1) / 2);
> + size_t d = 0; // Index of where to store next.
> + for (size_t i = 0; i < numSamples; ++i)
> + for (size_t j = i + 1; j < numSamples; ++j)
> + distances(d++) = metric::EuclideanDistance::Evaluate(
> + sampleSet.unsafe_col(i), sampleSet.unsafe_col(j));
> + Log::Info << "Computed " << d << " pointwise distances." << std::endl;
> + Timer::Stop("pairwise_distances");
> +
> + // Step 3. Estimate statistics of these distances: log(mean(d)), mean(log(d)),
> + // mean(d).
> + distances = arma::pow(distances, 2);
> + this->meanDist = arma::mean(distances);
> + this->logMeanDist = std::log(meanDist);
> + this->meanLogDist = arma::mean(arma::log(distances));
This works. I'll have to test how it affects correctness once the other details are ironed out.
I wonder whether the way I compute the geometric mean is what causes the problem:
I compute it as exp(log(prod(...))). That is correct as long as the factors of the product (i.e. the distances) are strictly positive, since the logarithm is undefined for values <= 0. The geometric mean itself, however, is still defined when some of the values are 0: it is simply equal to 0.
The authors don't specify what to do in this case, and I would expect a gamma distribution whose geometric mean is 0 to be undefined. Do you think simply setting the geometric mean to 0 in this case would be sufficient?
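
To make the question concrete, here is a minimal sketch of the zero-handling I have in mind. It is not the code in this PR: GeometricMean is a hypothetical standalone helper, it uses std::vector instead of arma::vec so it compiles without Armadillo, and it assumes all inputs are non-negative (as distances are). It computes exp(mean(log(x))) and short-circuits to 0 as soon as it sees a zero, instead of passing 0 to the logarithm.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not part of mlpack): geometric mean of non-negative
// values, computed in log space as exp(mean(log(x))).
// Returns 0 if any value is exactly 0, because the product is then 0.
// Returns NaN for an empty input, where the mean is undefined.
double GeometricMean(const std::vector<double>& values)
{
  if (values.empty())
    return std::numeric_limits<double>::quiet_NaN();

  double logSum = 0.0;
  for (const double v : values)
  {
    // Assumption: v >= 0. A zero factor makes the whole product zero, so
    // return early rather than evaluating log(0).
    if (v == 0.0)
      return 0.0;
    logSum += std::log(v);
  }

  return std::exp(logSum / static_cast<double>(values.size()));
}
```

This only settles what the geometric mean itself should be. Whether a gamma fit is still meaningful when that value is 0 is a separate question that this sketch does not answer.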
--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/pull/749/files/a0626a8c41fedc60ec255ef2939a519dfac5b83a#r76761769