<p>In <a href="https://github.com/mlpack/mlpack/pull/726#discussion_r75019892">src/mlpack/core/tree/binary_space_tree/rp_tree_mean_split_impl.hpp</a>:</p>
<pre style='color:#555'>> +template<typename BoundType, typename MatType>
> +bool RPTreeMeanSplit<BoundType, MatType>::SplitNode(const BoundType& bound,
> +                                                    MatType& data,
> +                                                    const size_t begin,
> +                                                    const size_t count,
> +                                                    SplitInfo& splitInfo)
> +{
> +  const size_t maxNumSamples = 100;
> +  const size_t numSamples = std::min(maxNumSamples, count);
> +  arma::uvec samples;
> +
> +  // Get no more than numSamples distinct samples.
> +  math::ObtainDistinctSamples(begin, begin + count, numSamples, samples);
> +
> +  // Find the average distance between points.
> +  ElemType averageDistanceSq = GetAveragePointDistance(data, samples);
</pre>
<p>Continuing with the discussion about whether the theory is still valid with your modifications, I think that you are right to do sampling to estimate statistics of the dataset. An exact approach would take quadratic time per split! That would yield a total tree-building time of something like O(N^2 log N). So I think that implementing the tree exactly as given in the paper is not really feasible.</p>
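<p>For illustration, here is a rough standalone sketch of the sampling idea (this is <em>not</em> the mlpack implementation: it uses plain <code>std::vector</code> instead of Armadillo, and the function name and signature are hypothetical). It estimates the average squared pairwise distance from a bounded random sample, so the cost is O(numSamples^2 * d) rather than O(n^2 * d):</p>

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical sketch: estimate the average squared pairwise distance of a
// point set using at most maxNumSamples randomly chosen points, analogous in
// spirit to ObtainDistinctSamples() + GetAveragePointDistance() in the PR.
double AverageDistanceSqEstimate(const std::vector<std::vector<double>>& data,
                                 const std::size_t maxNumSamples,
                                 std::mt19937& rng)
{
  const std::size_t n = data.size();
  const std::size_t numSamples = std::min(maxNumSamples, n);

  // Draw numSamples distinct indices by shuffling and truncating.
  std::vector<std::size_t> indices(n);
  for (std::size_t i = 0; i < n; ++i)
    indices[i] = i;
  std::shuffle(indices.begin(), indices.end(), rng);
  indices.resize(numSamples);

  // Average the squared distance over all sampled pairs.
  double total = 0.0;
  std::size_t pairs = 0;
  for (std::size_t i = 0; i < numSamples; ++i)
  {
    for (std::size_t j = i + 1; j < numSamples; ++j)
    {
      double distSq = 0.0;
      for (std::size_t k = 0; k < data[indices[i]].size(); ++k)
      {
        const double diff = data[indices[i]][k] - data[indices[j]][k];
        distSq += diff * diff;
      }
      total += distSq;
      ++pairs;
    }
  }
  return (pairs == 0) ? 0.0 : (total / pairs);
}
```

<p>Since the estimate is an average over O(numSamples^2) sampled pairs, standard concentration arguments should keep it close to the true average with high probability, which is what makes the capped sample size defensible.</p>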
<p>It should be possible to re-derive the theorems, taking into account the sampling you are doing here, without affecting the results significantly (only the success probability). Don't feel obligated to do that---I am mostly just thinking out loud here about the implications. :)</p>