<p>In <a href="https://github.com/mlpack/mlpack/pull/747#discussion_r73720820">src/mlpack/core/tree/spill_tree/spill_tree_impl.hpp</a>:</p>

<pre style='color:#555'>&gt; +  left-&gt;ParentDistance() = leftParentDistance;

&gt; +  right-&gt;ParentDistance() = rightParentDistance;

&gt; +}

&gt; +

&gt; +template&lt;typename MetricType,

&gt; +         typename StatisticType,

&gt; +         typename MatType,

&gt; +         template&lt;typename HyperplaneMetricType&gt; class HyperplaneType,

&gt; +         template&lt;typename SplitMetricType, typename SplitMatType&gt;

&gt; +             class SplitType&gt;

&gt; +bool SpillTree&lt;MetricType, StatisticType, MatType, HyperplaneType, SplitType&gt;::

&gt; +    SplitPoints(const double tau,

&gt; +                const double rho,

&gt; +                const std::vector&lt;size_t&gt;&amp; points,

&gt; +                std::vector&lt;size_t&gt;&amp; leftPoints,

&gt; +                std::vector&lt;size_t&gt;&amp; rightPoints)

</pre>

<p>I have a comment about this function.  I think that this function can be pretty slow because of the large number of allocations that may be incurred as <code>std::vector</code> resizes itself when <code>push_back()</code> is called.  Preventing many allocations was a big concern when I implemented sparse matrices for Armadillo, and the typical strategy looked like this:</p>


<ul>

<li>take a first pass over the data, and determine how large the results need to be</li>

<li>take a second pass over the data and fill the results</li>

</ul>


<p>So in this case, in the first pass you could calculate <code>leftSize</code>, <code>leftFrontierSize</code>, <code>rightSize</code>, and <code>rightFrontierSize</code>, and cache the values of the projections for each point.  In the second pass, you could just use those cached values to assign points to the correct side.  Let me know what you think.  I think for big datasets, it will make a difference.</p>


<p style="font-size:small;-webkit-text-size-adjust:none;color:#666;">&mdash;<br />You are receiving this because you are subscribed to this thread.<br />Reply to this email directly, <a href="https://github.com/mlpack/mlpack/pull/747/files/a71b57caa90311f5542180bc0553449c3691395d#r73720820">view it on GitHub</a>, or <a href="https://github.com/notifications/unsubscribe-auth/AJ4bFFRY0e1STKzlmSb8bEv-DL_L5cpVks5qc2SkgaJpZM4JZzLU">mute the thread</a>.<img alt="" height="1" src="https://github.com/notifications/beacon/AJ4bFO88sWeKH11ItKgkb9IJn0MHiSJ5ks5qc2SkgaJpZM4JZzLU.gif" width="1" /></p>

<div itemscope itemtype="http://schema.org/EmailMessage">

<div itemprop="action" itemscope itemtype="http://schema.org/ViewAction">

  <link itemprop="url" href="https://github.com/mlpack/mlpack/pull/747/files/a71b57caa90311f5542180bc0553449c3691395d#r73720820"></link>

  <meta itemprop="name" content="View Pull Request"></meta>

</div>

<meta itemprop="description" content="View this Pull Request on GitHub"></meta>

</div>


<script type="application/json" data-scope="inboxmarkup">{"api_version":"1.0","publisher":{"api_key":"05dde50f1d1a384dd78767c55493e4bb","name":"GitHub"},"entity":{"external_key":"github/mlpack/mlpack","title":"mlpack/mlpack","subtitle":"GitHub repository","main_image_url":"https://assets-cdn.github.com/images/modules/aws/aws-bg.jpg","avatar_image_url":"https://cloud.githubusercontent.com/assets/143418/15842166/7c72db34-2c0b-11e6-9aed-b52498112777.png","action":{"name":"Open in GitHub","url":"https://github.com/mlpack/mlpack"}},"updates":{"snippets":[{"icon":"PERSON","message":"@rcurtin in #747: I have a comment about this function.  I think that this function can be pretty slow because of the large number of allocations that may be incurred as `std::vector` resizes itself when `push_back()` is called.  Preventing many allocations was a big concern when I implemented sparse matrices for Armadillo, and the typical strategy looked like this:\r\n\r\n - take a first pass over the data, and determine how large the results need to be\r\n - take a second pass over the data and fill the results\r\n\r\nSo in this case, in the first pass you could calculate `leftSize`, `leftFrontierSize`, `rightSize`, and `rightFrontierSize`, and cache the values of the projections for each point.  In the second pass, you could just use those cached values to assign points to the correct side.  Let me know what you think.  I think for big datasets, it will make a difference."}],"action":{"name":"View Pull Request","url":"https://github.com/mlpack/mlpack/pull/747/files/a71b57caa90311f5542180bc0553449c3691395d#r73720820"}}}</script>