<p>Oh, right, I did not think about the fact that different buckets have different numbers of points in them!  Now that I think of that, I do think that perhaps <code>std::vector&lt;size_t&gt;*</code> is the right way to go (or actually maybe <code>std::vector&lt;arma::Col&lt;size_t&gt;&gt;</code>).</p>


<p>I think that we can have the best of both worlds if we do it like this:</p>


<ul>

<li><p>Use <code>std::vector&lt;arma::Col&lt;size_t&gt;&gt;</code> for representing <code>secondHashTable</code> (this also avoids memory allocation, which is good---I am pretty sure your code had a subtle bug where the user could initialize the <code>LSHSearch</code> object without training, but then the destructor would still try to delete the <code>std::vector&lt;size_t&gt;*</code> object which would cause a crash).</p></li>

<li><p>Before filling <code>secondHashTable</code>, calculate the sizes of each bin (the code I wrote does this), truncating the length to <code>bucketSize</code>.  Then we can allocate the exact correct size for each <code>arma::Col&lt;size_t&gt;</code> (and also allocate exactly the right number of <code>arma::Col&lt;size_t&gt;</code>s), and then fill them like your code does.</p></li>

<li><p>When the object is constructed, if <code>bucketSize = 0</code>, set <code>bucketSize = referenceSet.n_cols</code>.</p></li>

</ul>


<p>What do you think, do you think this would work?  We would have to modify the serialization again, but I don't think we need to increment the version from 1 to 2 because we did not release mlpack with the serialization change we did before (which was the change from <code>std::vector&lt;arma::mat&gt;</code> to <code>arma::cube</code>).  I was going to try and release mlpack 2.0.2 today, but, if we are going to change serialization again I will wait on this otherwise we will end up with more-complex-than-necessary legacy code to handle. :)</p>


<blockquote>

<p>I can't see your changes any more because there's something wrong with the commits</p>

</blockquote>


<p>Yes, there was a force push to the repository to the state it was in about 20 days ago, but I restored the current state earlier today.  It seems like the PR interface has not been updated though, so it still shows way more commits.</p>


<p style="font-size:small;-webkit-text-size-adjust:none;color:#666;">&mdash;<br />You are receiving this because you are subscribed to this thread.<br />Reply to this email directly, <a href="https://github.com/mlpack/mlpack/pull/675#issuecomment-223828754">view it on GitHub</a>, or <a href="https://github.com/notifications/unsubscribe/AJ4bFJ3YUOHC3H26h374SjDgLUVqU_92ks5qIxS9gaJpZM4IuAcT">mute the thread</a>.<img alt="" height="1" src="https://github.com/notifications/beacon/AJ4bFB-vHZOo1WyFK-G-tIOJIHv1Wpa8ks5qIxS9gaJpZM4IuAcT.gif" width="1" /></p>

<div itemscope itemtype="http://schema.org/EmailMessage">

<div itemprop="action" itemscope itemtype="http://schema.org/ViewAction">

  <link itemprop="url" href="https://github.com/mlpack/mlpack/pull/675#issuecomment-223828754"></link>

  <meta itemprop="name" content="View Pull Request"></meta>

</div>

<meta itemprop="description" content="View this Pull Request on GitHub"></meta>

</div>