<p>In <a href="https://github.com/mlpack/mlpack/pull/749#discussion_r76360737">src/mlpack/methods/lsh/lshmodel_impl.hpp</a>:</p>
<pre style='color:#555'>> +
> + // Reference set for kNN
> + arma::mat refMat = sampleSet.cols(refSetStart, refSetEnd);
> + referenceSizes(i) = refMat.n_cols;
> +
> + arma::Mat<size_t> neighbors; // Not going to be used but required.
> + arma::mat kNNDistances; // What we need.
> + KNN naive(refMat, true); // true: train and use naive kNN.
> + naive.Search(queryMat, k, neighbors, kNNDistances);
> +
> + // Store the squared distances (what we need).
> + kNNDistances = arma::pow(kNNDistances, 2);
> +
> + // Compute Arithmetic and Geometric mean of the distances.
> + Ek.row(i) = arma::mean(kNNDistances.t());
> + Gk.row(i) = arma::exp(arma::mean(arma::log(kNNDistances.t()), 0));
</pre>
<p>I took a look at the latest code, and I see you are doing <code>find(kNNDistances > 0)</code>, but I don't think this will adequately filter duplicates. Tomorrow I'll try to think of a good way to filter duplicate points; probably the best time to do that is during the calculation of the kNN distances matrix (i.e., if we encounter a zero distance, clear the row/column of the matrix and skip to the next point). I need to think about it a little more...</p>
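<p>To make the idea concrete, here is a rough sketch of one way the filtering could happen while the distances are being computed. It is only an illustration under several assumptions: it uses plain <code>std::vector</code> instead of Armadillo and mlpack's <code>KNN</code>, the names <code>Point</code>, <code>MeanStats</code>, <code>SquaredDistance</code> and <code>DuplicateFreeMeans</code> are hypothetical, and "skip to the next point" is interpreted as dropping any query point whose nearest reference point is at distance exactly zero. Filtering individual zero entries afterwards would shift the remaining distances out of their k-th neighbor column, which is why the whole point is skipped here.</p>

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a column of the data matrix.
using Point = std::vector<double>;

// Squared Euclidean distance between two points of equal dimension.
double SquaredDistance(const Point& a, const Point& b)
{
  double sum = 0.0;
  for (std::size_t d = 0; d < a.size(); ++d)
    sum += (a[d] - b[d]) * (a[d] - b[d]);
  return sum;
}

// Per-neighbor means of the squared kNN distances (hypothetical container).
struct MeanStats
{
  std::vector<double> arithmetic; // Arithmetic mean for neighbors 1..k.
  std::vector<double> geometric;  // Geometric mean for neighbors 1..k.
  std::size_t used = 0;           // Number of query points that were kept.
};

// Brute-force kNN that skips any query point with a zero-distance neighbor
// (assumed here to mean the query duplicates a reference point).
MeanStats DuplicateFreeMeans(const std::vector<Point>& reference,
                             const std::vector<Point>& queries,
                             const std::size_t k)
{
  MeanStats stats;
  stats.arithmetic.assign(k, 0.0);
  stats.geometric.assign(k, 0.0);

  for (const Point& q : queries)
  {
    std::vector<double> dists;
    dists.reserve(reference.size());
    for (const Point& r : reference)
      dists.push_back(SquaredDistance(q, r));

    if (k == 0 || dists.size() < k)
      continue; // Not enough reference points for k neighbors.

    // Only the k smallest squared distances need to be in order.
    std::partial_sort(dists.begin(), dists.begin() + k, dists.end());

    // A zero nearest distance marks a duplicate: drop the whole query point
    // so the remaining columns keep their "j-th neighbor" meaning.
    if (dists[0] == 0.0)
      continue;

    for (std::size_t j = 0; j < k; ++j)
    {
      stats.arithmetic[j] += dists[j];
      stats.geometric[j] += std::log(dists[j]); // Accumulate log for now.
    }
    ++stats.used;
  }

  // Turn the accumulated sums into means; if nothing was kept, both
  // vectors are left at zero.
  if (stats.used > 0)
  {
    for (std::size_t j = 0; j < k; ++j)
    {
      stats.arithmetic[j] /= stats.used;
      stats.geometric[j] = std::exp(stats.geometric[j] / stats.used);
    }
  }
  return stats;
}
```

<p>Because every retained distance is strictly positive, the logarithm in the geometric mean is always finite. The exact comparison against zero only catches exact duplicates; near-duplicates would need a tolerance, which is not addressed in this sketch.</p>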