<blockquote>
<p>Did you try any other parallelization strategies? i.e. the for loops inside of SplitNode(), or something like that.</p>
</blockquote>
<p>If you mean FindSplit(), yes, I tried parallelizing across dimensions, but it was slower. I suspect this is because FindSplit(), while called many times, is relatively quick (even with the sort), so the overhead of setting up the parallel for loop on every call outweighed any gain.</p>
<p>I also tried parallelizing the ComputeValue loops in Trainer. Again, it was slower. I suspect this is because ComputeValue is already fast and the pointer chasing may have thrashed the cache (that's very speculative, though).</p>
<p>Of course, these might be very parallelizable operations that I bungled.</p>
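<p>For reference, here is a minimal sketch of what "parallelizing across dimensions" can look like with OpenMP. It is not the actual mlpack FindSplit(): SplitCandidate, BestGapInDimension() and FindSplitAcrossDimensions() are hypothetical names, and the "widest gap" criterion is only a stand-in for the real split score. It does show the shape of the problem: each iteration is a short sort plus a scan, so the cost of starting the parallel region on every call can easily exceed the work being shared out.</p>

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical result type; not part of mlpack.
struct SplitCandidate
{
  std::size_t dim;  // Dimension the split is on.
  double value;     // Split position within that dimension.
  double gap;       // Stand-in score: width of the gap being split.
};

// Hypothetical per-dimension search: sort one dimension's values and
// return the midpoint of the widest gap between neighbouring values.
SplitCandidate BestGapInDimension(const std::vector<std::vector<double>>& data,
                                  const std::size_t dim)
{
  std::vector<double> column(data[dim]);
  std::sort(column.begin(), column.end());

  SplitCandidate best{dim, 0.0, -1.0};
  for (std::size_t i = 1; i < column.size(); ++i)
  {
    const double gap = column[i] - column[i - 1];
    if (gap > best.gap)
    {
      best.gap = gap;
      best.value = 0.5 * (column[i] + column[i - 1]);
    }
  }
  return best;
}

// data[d] holds all values of dimension d.
SplitCandidate FindSplitAcrossDimensions(
    const std::vector<std::vector<double>>& data)
{
  const long dims = static_cast<long>(data.size());
  std::vector<SplitCandidate> perDim(data.size());

  // One iteration per dimension; each thread writes only its own slot.
  // Without OpenMP enabled the pragma is ignored and the loop runs serially.
  #pragma omp parallel for
  for (long d = 0; d < dims; ++d)
    perDim[d] = BestGapInDimension(data, static_cast<std::size_t>(d));

  // Serial reduction; ties go to the lowest dimension index.
  SplitCandidate best{0, 0.0, -1.0};
  for (const SplitCandidate& c : perDim)
    if (c.gap > best.gap)
      best = c;
  return best;
}
```

<p>When the number of dimensions is small, one option (untested here) is to add an OpenMP if clause to the pragma so the loop only goes parallel above some size threshold.</p>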
<blockquote>
<p>What's the overhead for the OpenMP-ized code using only one core (do you happen to know)?</p>
</blockquote>
<p>I compared the serial code with the parallel code restricted to one thread, and there was no significant difference (the one-thread parallel version was 1% faster in total runtime).</p>
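<p>A rough sketch of how such a comparison can be set up is below. This is not the benchmark I ran: SumSquaresSerial(), SumSquaresParallel() and TimeSeconds() are made-up names, and the loop body is a placeholder for the real training code. The idea is just to time the same work once without the pragma and once with the thread count pinned to one.</p>

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Placeholder workload, serial version.
double SumSquaresSerial(const std::vector<double>& x)
{
  double sum = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i)
    sum += x[i] * x[i];
  return sum;
}

// Same workload with an OpenMP loop; numThreads = 1 exercises the
// OpenMP code path on a single thread, which is what measures overhead.
double SumSquaresParallel(const std::vector<double>& x, const int numThreads)
{
  const long n = static_cast<long>(x.size());
  double sum = 0.0;
  #pragma omp parallel for num_threads(numThreads) reduction(+:sum)
  for (long i = 0; i < n; ++i)
    sum += x[i] * x[i];
  return sum;
}

// Hypothetical helper: wall-clock seconds taken by one call to f().
template<typename F>
double TimeSeconds(F f)
{
  const auto start = std::chrono::steady_clock::now();
  f();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}
```

<p>Calling TimeSeconds() on each version several times and comparing total runtimes gives the kind of number quoted above. The code has to be built with OpenMP enabled (for example -fopenmp with GCC), otherwise both versions are the same serial loop.</p>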