[mlpack-git] [mlpack/mlpack] Modeling LSH For Performance Tuning (#749)

Ryan Curtin notifications at github.com
Tue Aug 30 09:08:16 EDT 2016


> +  Timer::Start("pairwise_distances");
> +  arma::vec distances(numSamples * (numSamples - 1) / 2);
> +  size_t d = 0; // Index of where to store next.
> +  for (size_t i = 0; i < numSamples; ++i)
> +    for (size_t j = i + 1; j < numSamples; ++j)
> +      distances(d++) = metric::EuclideanDistance::Evaluate(
> +          sampleSet.unsafe_col(i), sampleSet.unsafe_col(j));
> +  Log::Info << "Computed " << d << " pointwise distances." << std::endl;
> +  Timer::Stop("pairwise_distances");
> +
> +  // Step 3. Estimate statistics of these distances: log(mean(d)), mean(log(d)),
> +  // mean(d).
> +  distances = arma::pow(distances, 2);
> +  this->meanDist = arma::mean(distances);
> +  this->logMeanDist = std::log(meanDist);
> +  this->meanLogDist = arma::mean(arma::log(distances));

I think for the geometric mean calculation, you should use the same hack I suggested for the full distance distribution: "fake" a small nonzero value in place of any zero distance.  The gamma distribution has no probability mass at 0, so strictly speaking it isn't the right distribution to use when there are zero distances.  But I'm not sure what the alternatives would be, so I think using this hack to make it work for now should be okay.

If you have a better idea for how to move the distances away from 0, go ahead and use it; the idea I wrote down was just the first thing I could think of that seemed to work.
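
For illustration, here is a minimal sketch of one way the "small nonzero value" hack could look.  It is not the code from the pull request: the helper names `ClampAwayFromZero` and `MeanLog` and the default `epsilon` of 1e-10 are assumptions made up for this example, and it uses `std::vector` instead of Armadillo only to stay dependency-free.  The point it shows is that a single zero distance makes the mean of the logs equal to negative infinity, and clamping removes that.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of mlpack): raise every distance that is
// smaller than `epsilon` up to `epsilon`, so std::log() never sees a zero.
// The default epsilon is an arbitrary illustrative choice.
inline void ClampAwayFromZero(std::vector<double>& distances,
                              const double epsilon = 1e-10)
{
  for (double& d : distances)
    d = std::max(d, epsilon);
}

// Mean of log(d) over all distances; this is the kind of quantity the quoted
// code stores in meanLogDist.  Assumes a non-empty input.
inline double MeanLog(const std::vector<double>& distances)
{
  assert(!distances.empty());
  double sum = 0.0;
  for (const double d : distances)
    sum += std::log(d);
  return sum / static_cast<double>(distances.size());
}
```

Calling `ClampAwayFromZero()` on the distances before taking logs keeps `MeanLog()` finite.  The right epsilon depends on the scale of the data, so treat the default here as a placeholder.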

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/pull/749/files/a0626a8c41fedc60ec255ef2939a519dfac5b83a#r76790331