<p>Oh, right, I did not think about the fact that different buckets have different numbers of points in them! Now that I think of that, I do think that perhaps <code>std::vector<size_t>*</code> is the right way to go (or actually maybe <code>std::vector<arma::Col<size_t>></code>).</p>
<p>I think that we can have the best of both worlds if we do it like this:</p>
<ul>
<li><p>Use <code>std::vector<arma::Col<size_t>></code> for representing <code>secondHashTable</code> (this also avoids memory allocation, which is good---I am pretty sure your code had a subtle bug where the user could initialize the <code>LSHSearch</code> object without training, but then the destructor would still try to delete the <code>std::vector<size_t>*</code> object which would cause a crash).</p></li>
<li><p>Before filling <code>secondHashTable</code>, calculate the sizes of each bin (the code I wrote does this), truncating the length to <code>bucketSize</code>. Then we can allocate the exact correct size for each <code>arma::Col<size_t></code> (and also allocate exactly the right number of <code>arma::Col<size_t></code>s), and then fill them like your code does.</p></li>
<li><p>When the object is constructed, if <code>bucketSize = 0</code>, set <code>bucketSize = referenceSet.n_cols</code>.</p></li>
</ul>
<p>What do you think, do you think this would work? We would have to modify the serialization again, but I don't think we need to increment the version from 1 to 2 because we did not release mlpack with the serialization change we did before (which was the change from <code>std::vector<arma::mat></code> to <code>arma::cube</code>). I was going to try and release mlpack 2.0.2 today, but, if we are going to change serialization again I will wait on this otherwise we will end up with more-complex-than-necessary legacy code to handle. :)</p>
<blockquote>
<p>I can't see your changes any more because there's something wrong with the commits</p>
</blockquote>
<p>Yes, there was a force push to the repository to the state it was in about 20 days ago, but I restored the current state earlier today. It seems like the PR interface has not been updated though, so it still shows way more commits.</p>
<p style="font-size:small;-webkit-text-size-adjust:none;color:#666;">—<br />You are receiving this because you are subscribed to this thread.<br />Reply to this email directly, <a href="https://github.com/mlpack/mlpack/pull/675#issuecomment-223828754">view it on GitHub</a>, or <a href="https://github.com/notifications/unsubscribe/AJ4bFJ3YUOHC3H26h374SjDgLUVqU_92ks5qIxS9gaJpZM4IuAcT">mute the thread</a>.<img alt="" height="1" src="https://github.com/notifications/beacon/AJ4bFB-vHZOo1WyFK-G-tIOJIHv1Wpa8ks5qIxS9gaJpZM4IuAcT.gif" width="1" /></p>
<div itemscope itemtype="http://schema.org/EmailMessage">
<div itemprop="action" itemscope itemtype="http://schema.org/ViewAction">
<link itemprop="url" href="https://github.com/mlpack/mlpack/pull/675#issuecomment-223828754"></link>
<meta itemprop="name" content="View Pull Request"></meta>
</div>
<meta itemprop="description" content="View this Pull Request on GitHub"></meta>
</div>