[mlpack-git] [mlpack/mlpack] Modeling LSH For Performance Tuning (#749)

Ryan Curtin notifications at github.com
Tue Aug 23 14:40:00 EDT 2016

> +  maxKValue = k;
> +
> +  // Save pointer to training set.
> +  this->referenceSet = &referenceSet;
> +
> +  // Step 1. Select a random sample of the dataset. We will work with only that
> +  // sample.
> +  arma::vec sampleHelper(referenceSet.n_cols, arma::fill::randu);
> +
> +  // Keep a sample of the dataset: We have uniformly random numbers in [0, 1],
> +  // so we expect about N*sampleRate of them to be in [0, sampleRate).
> +  arma::mat sampleSet = referenceSet.cols(
> +        arma::find(sampleHelper < sampleRate));
> +  // Shuffle to be impartial (in case dataset is sorted in some way).
> +  sampleSet = arma::shuffle(sampleSet);
> +  const size_t numSamples = sampleSet.n_cols; // Points in sampled set.

Are you sampling with or without replacement?  If you're sampling without replacement (I don't think that's the case based on the code here), you can use `math::ObtainDistinctSamples()` from somewhere in `core/math/`.  Otherwise, it might be better to simply keep a list of the sample indices rather than extracting the sampled columns from the original matrix.  Then later you can use that vector of indices to create a non-contiguous matrix subview, like this:

extern arma::uvec indices; // This has already been filled with stuff.
extern arma::mat dataset; // This is our dataset.
dataset.cols(indices); // Returns all the columns we're interested in.

This is a pretty low-priority comment, though, so don't worry too much about it; only do it if you want to.  I'd say testing is higher priority. :)
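
To make the index-only idea concrete, here is a rough sketch of what I mean. It uses only the standard library so it stands alone, and `SampleColumnIndices` is a hypothetical name, not an existing mlpack function. It keeps each column index independently with probability `sampleRate` (the same scheme as the `randu` comparison in the diff) and shuffles the indices instead of the data. With Armadillo, the result would presumably be copied into an `arma::uvec` and passed to `dataset.cols(...)` as above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper (not part of mlpack): keep each column index in
// [0, numCols) independently with probability sampleRate. Each index is
// considered exactly once, so no column can be selected twice.
std::vector<std::size_t> SampleColumnIndices(const std::size_t numCols,
                                             const double sampleRate,
                                             std::mt19937& rng)
{
  assert(sampleRate >= 0.0 && sampleRate <= 1.0);
  std::uniform_real_distribution<double> uniform(0.0, 1.0);

  std::vector<std::size_t> indices;
  for (std::size_t i = 0; i < numCols; ++i)
  {
    // Draw one uniform number per column and keep the column if it falls
    // below the sampling rate.
    if (uniform(rng) < sampleRate)
      indices.push_back(i);
  }

  // Shuffle the indices rather than the columns themselves, so no data is
  // copied or moved when the dataset happens to be sorted.
  std::shuffle(indices.begin(), indices.end(), rng);
  return indices;
}
```

The number of indices returned is random (about `numCols * sampleRate` on average), so anything downstream should read the count from the returned vector rather than assume it.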
