[mlpack-git] [mlpack/mlpack] Modeling LSH For Performance Tuning (#749)
notifications at github.com
Wed Aug 24 11:18:18 EDT 2016
> + maxKValue = k;
> + // Save pointer to training set.
> + this->referenceSet = &referenceSet;
> + // Step 1. Select a random sample of the dataset. We will work with only that
> + // sample.
> + arma::vec sampleHelper(referenceSet.n_cols, arma::fill::randu);
> + // Keep a sample of the dataset: We have uniformly random numbers in [0, 1],
> + // so we expect about N*sampleRate of them to be in [0, sampleRate).
> + arma::mat sampleSet = referenceSet.cols(
> + arma::find(sampleHelper < sampleRate));
> + // Shuffle to be impartial (in case dataset is sorted in some way).
> + sampleSet = arma::shuffle(sampleSet);
> + const size_t numSamples = sampleSet.n_cols; // Points in sampled set.
I think it's without replacement: I generate uniform numbers in [0, 1] and then threshold them at the sample rate, which gives a vector of booleans. I keep only the columns (so, points) that have "true" in the corresponding vector position.
In MATLAB-style pseudocode it would be:
sampleRate = 0.3;
referenceSet = [
1 3 5 7;
2 4 6 8;
];
sampleHelper = [0.1 0.3 0.7 0.05];
% Keep entries below the sample rate, matching `sampleHelper < sampleRate` in the C++ code.
sampleHelper = sampleHelper < sampleRate;
% So here sampleHelper = [1 0 0 1]
sampleSet = referenceSet(:, sampleHelper);
% and therefore sampleSet = [1 7; 2 8] - only columns 1 and 4
Is there something I don't see here?
I didn't know about `ObtainDistinctSamples()`, I think that will make the code cleaner so I'll refactor it to use that instead.
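To make the without-replacement argument concrete, here is a minimal sketch of the thresholding idea in plain standard C++ (no Armadillo, so it compiles on its own). `ThresholdSample` is a hypothetical helper written for illustration; it is not part of mlpack and it is not how `ObtainDistinctSamples()` works.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper (not an mlpack function): returns the indices of the
// columns that survive when each column is kept independently with
// probability sampleRate.
std::vector<std::size_t> ThresholdSample(const std::size_t numPoints,
                                         const double sampleRate,
                                         std::mt19937& rng)
{
  assert(sampleRate >= 0.0 && sampleRate <= 1.0);
  std::uniform_real_distribution<double> uniform(0.0, 1.0);

  std::vector<std::size_t> kept;
  for (std::size_t i = 0; i < numPoints; ++i)
  {
    // One draw per column: index i is visited exactly once, so it can appear
    // in the result at most once (i.e. sampling without replacement).
    if (uniform(rng) < sampleRate)
      kept.push_back(i);
  }
  return kept;
}
```

Because the loop visits every index once, the returned indices are strictly increasing and therefore distinct. The trade-off is that the sample size is random: it is about numPoints * sampleRate on average, not a fixed count.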