[mlpack-git] [mlpack/mlpack] Random projection trees (#726)

Ryan Curtin notifications at github.com
Tue Aug 16 17:04:55 EDT 2016

> +template<typename BoundType, typename MatType>
> +bool RPTreeMeanSplit<BoundType, MatType>::SplitNode(const BoundType&  bound,
> +                                                  MatType& data,
> +                                                  const size_t begin,
> +                                                  const size_t count,
> +                                                  SplitInfo& splitInfo)
> +{
> +  const size_t maxNumSamples = 100;
> +  const size_t numSamples = std::min(maxNumSamples, count);
> +  arma::uvec samples;
> +
> +  // Get no more than numSamples distinct samples.
> +  math::ObtainDistinctSamples(begin, begin + count, numSamples, samples);
> +
> +  // Find the average distance between points.
> +  ElemType averageDistanceSq = GetAveragePointDistance(data, samples);

Continuing with the discussion about whether the theory is still valid with your modifications, I think that you are right to do sampling to estimate statistics of the dataset.  An exact approach would be quadratic time!  This would yield a total tree building time of something like O(N^2 log N).  So I think that implementing the tree exactly as given in the paper is not really feasible.

It should be possible to re-derive the theorems, taking into account the sampling you are doing here, without affecting the results significantly (only the success probability).  Don't feel obligated to do that---I am mostly just thinking out loud here about the implications. :)

You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mailman.cc.gatech.edu/pipermail/mlpack-git/attachments/20160816/1347b336/attachment.html>

More information about the mlpack-git mailing list