[mlpack-git] [mlpack/mlpack] Random projection trees (#726)
Ryan Curtin
notifications at github.com
Tue Aug 16 17:04:55 EDT 2016
> +template<typename BoundType, typename MatType>
> +bool RPTreeMeanSplit<BoundType, MatType>::SplitNode(const BoundType& bound,
> + MatType& data,
> + const size_t begin,
> + const size_t count,
> + SplitInfo& splitInfo)
> +{
> + const size_t maxNumSamples = 100;
> + const size_t numSamples = std::min(maxNumSamples, count);
> + arma::uvec samples;
> +
> + // Get no more than numSamples distinct samples.
> + math::ObtainDistinctSamples(begin, begin + count, numSamples, samples);
> +
> + // Find the average distance between points.
> + ElemType averageDistanceSq = GetAveragePointDistance(data, samples);
Continuing with the discussion about whether the theory is still valid with your modifications, I think that you are right to do sampling to estimate statistics of the dataset. An exact approach would be quadratic time! This would yield a total tree building time of something like O(N^2 log N). So I think that implementing the tree exactly as given in the paper is not really feasible.
It should be possible to re-derive the theorems, taking into account the sampling you are doing here, without affecting the results significantly (only the success probability). Don't feel obligated to do that---I am mostly just thinking out loud here about the implications. :)
--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/pull/726/files/ea66e5d17914460289974cc61f0669941edc2524#r75019892
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mailman.cc.gatech.edu/pipermail/mlpack-git/attachments/20160816/1347b336/attachment.html>
More information about the mlpack-git
mailing list