<p>In <a href="https://github.com/mlpack/mlpack/pull/747#discussion_r73954733">src/mlpack/core/tree/spill_tree/spill_tree_impl.hpp</a>:</p>
<pre style='color:#555'>> + }
> +
> + std::vector<size_t> leftPoints, rightPoints;
> + // Split the node.
> + overlappingNode = SplitPoints(tau, rho, points, leftPoints, rightPoints);
> +
> + // We don't need the information in points, so lets clean it.
> + std::vector<size_t>().swap(points);
> +
> + // Now we will recursively split the children by calling their constructors
> + // (which perform this splitting process).
> + left = new SpillTree(this, leftPoints, tau, maxLeafSize, rho);
> + right = new SpillTree(this, rightPoints, tau, maxLeafSize, rho);
> +
> + // Update count number, to represent the number of descendant points.
> + count = left->NumDescendants() + right->NumDescendants();
</pre>
<p>Sometimes you want to sample descendant points from a node; rank-approximate nearest neighbor search (<code>src/mlpack/methods/rann/</code>) does this. To do that, you draw <code>i</code> uniformly from [0, <code>NumDescendants()</code>) and take <code>Descendant(i)</code> as your random point. But if descendants are not unique (that is, if points in the overlap are counted once in each child), the random sample is biased: in this case, points in the spill region are twice as likely to be sampled as the other points. Let me know if I can clarify further.</p>
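<p>To make the bias concrete, here is a minimal sketch. It does not use the mlpack API: <code>ConcatDescendants</code>, <code>UniqueDescendants</code> and <code>SampleProbability</code> are hypothetical helpers, and the descendant list is assumed to be a plain vector of point indices built by concatenating both children. It computes the exact probability that a uniform index draw returns a given point.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not mlpack API): build a parent's descendant list by
// concatenating both children, so points in the overlap appear twice.
std::vector<std::size_t> ConcatDescendants(
    const std::vector<std::size_t>& left,
    const std::vector<std::size_t>& right)
{
  std::vector<std::size_t> all(left);
  all.insert(all.end(), right.begin(), right.end());
  return all;
}

// Hypothetical helper: the same list with duplicate indices removed.
std::vector<std::size_t> UniqueDescendants(std::vector<std::size_t> all)
{
  std::sort(all.begin(), all.end());
  all.erase(std::unique(all.begin(), all.end()), all.end());
  return all;
}

// Probability that drawing i uniformly from [0, descendants.size()) and
// returning descendants[i] yields `point`: occurrences / list length.
double SampleProbability(const std::vector<std::size_t>& descendants,
                         std::size_t point)
{
  assert(!descendants.empty());
  const auto hits = std::count(descendants.begin(), descendants.end(), point);
  return static_cast<double>(hits) / static_cast<double>(descendants.size());
}
```

<p>With left children <code>{0, 1, 2, 3}</code> and right children <code>{2, 3, 4, 5}</code>, the concatenated list has 8 entries, so point <code>0</code> is drawn with probability 0.125 while the overlapping point <code>2</code> is drawn with probability 0.25. After deduplication, each of the 6 points is equally likely.</p>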