<p>@mentekid commented in <a href="https://github.com/mlpack/mlpack/pull/749#discussion_r76081866">src/mlpack/methods/lsh/lshmodel_impl.hpp</a>:</p>
<pre style='color:#555'>&gt; +
&gt; +    // Reference set for kNN
&gt; +    arma::mat refMat = sampleSet.cols(refSetStart, refSetEnd);
&gt; +    referenceSizes(i) = refMat.n_cols;
&gt; +
&gt; +    arma::Mat&lt;size_t&gt; neighbors; // Not going to be used but required.
&gt; +    arma::mat kNNDistances; // What we need.
&gt; +    KNN naive(refMat, true); // true: train and use naive kNN.
&gt; +    naive.Search(queryMat, k, neighbors, kNNDistances);
&gt; +
&gt; +    // Store the squared distances (what we need).
&gt; +    kNNDistances = arma::pow(kNNDistances, 2);
&gt; +
&gt; +    // Compute Arithmetic and Geometric mean of the distances.
&gt; +    Ek.row(i) = arma::mean(kNNDistances.t());
&gt; +    Gk.row(i) = arma::exp(arma::mean(arma::log(kNNDistances.t()), 0));
</pre>
<p>Here's the cause of the L_BFGS -NaN values:<br>
I compute the logarithm of the kNN distances, implicitly assuming that none of those distances is 0. With duplicate points that assumption does not hold, and the logarithm of 0 is -inf.<br>
The iris.csv dataset that's included in mlpack has some duplicates:</p>

<pre><code>5.8,2.7,5.1,1.9 # repeated twice
4.9,3.1,1.5,0.1 # repeated three times
</code></pre>

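<p>Running <code>sort iris.csv | uniq -c | awk '{print $1}' | sort | uniq</code> prints the distinct occurrence counts <code>1</code>, <code>2</code> and <code>3</code>, meaning that no row appears more than three times; <code>sort iris.csv | uniq -d</code> lists each duplicated row once, which is how to check that the two rows above are all the duplicates.</p>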
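<p>A minimal sketch of the failure, written with plain C++ and <code>std::vector</code> instead of Armadillo so it is self-contained; <code>SquaredDistance</code> and <code>GeometricMeanViaLog</code> are hypothetical helpers, not mlpack API. It mirrors the <code>exp(mean(log(d)))</code> formula the snippet uses for <code>Gk</code>: a duplicated iris row gives a squared distance of exactly 0, <code>std::log(0.0)</code> is <code>-inf</code>, and a single <code>-inf</code> drives the whole geometric mean to 0.</p>

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Squared Euclidean distance between two points of the same dimension.
double SquaredDistance(const std::vector<double>& a,
                       const std::vector<double>& b)
{
  assert(a.size() == b.size());
  double sum = 0.0;
  for (std::size_t j = 0; j < a.size(); ++j)
  {
    const double diff = a[j] - b[j];
    sum += diff * diff;
  }
  return sum;
}

// Geometric mean computed as exp(mean(log(d))), the same formula the
// snippet uses for Gk. If any entry is 0, std::log returns -inf, the mean
// of the logs is -inf, and exp(-inf) == 0, so the result collapses to 0
// no matter how large the other distances are.
double GeometricMeanViaLog(const std::vector<double>& d)
{
  double logSum = 0.0;
  for (double x : d)
    logSum += std::log(x);
  return std::exp(logSum / static_cast<double>(d.size()));
}
```

<p>How that 0 then turns into NaN inside L_BFGS depends on the objective, which is not part of this excerpt. One plausible path, assuming the objective takes the logarithm of <code>Gk</code> again, is that <code>log(0)</code> yields <code>-inf</code> and an expression such as <code>-inf - (-inf)</code> evaluates to NaN.</p>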

<p>I think the correct approach here is to simply disregard 0-distances completely, by resizing the kNNDistances matrix to only hold positive entries.</p>
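<p>One way the proposed fix could look, sketched with <code>std::vector</code> instead of <code>arma::mat</code>. Because a matrix cannot keep a different number of entries per row, this version skips the zero entries while accumulating, which has the same effect as dropping them. It assumes the k-by-nQueries layout that mlpack's <code>KNN::Search</code> produces for <code>kNNDistances</code>, with the squaring already applied; <code>PositiveMeans</code>, <code>MeansOfPositive</code> and <code>PerRankMeans</code> are hypothetical names.</p>

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Arithmetic and geometric mean over the positive entries of one row,
// plus how many entries were actually used.
struct PositiveMeans
{
  double arithmetic;
  double geometric;
  std::size_t used;
};

// Hypothetical helper: skips exact-zero squared distances so std::log never
// sees 0. If no entry is positive, both means are NaN so the caller has to
// handle that case explicitly instead of passing it on silently.
PositiveMeans MeansOfPositive(const std::vector<double>& row)
{
  double sum = 0.0;
  double logSum = 0.0;
  std::size_t used = 0;
  for (double d : row)
  {
    assert(d >= 0.0);  // squared distances can never be negative
    if (d > 0.0)
    {
      sum += d;
      logSum += std::log(d);
      ++used;
    }
  }
  if (used == 0)
  {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    return {nan, nan, 0};
  }
  const double n = static_cast<double>(used);
  return {sum / n, std::exp(logSum / n), used};
}

// Applies the filter per neighbor rank: distances[k][q] is the squared
// distance from query q to its (k + 1)-th nearest neighbor, i.e. the
// k-by-nQueries layout of kNNDistances in the snippet above.
std::vector<PositiveMeans> PerRankMeans(
    const std::vector<std::vector<double>>& distances)
{
  std::vector<PositiveMeans> result;
  result.reserve(distances.size());
  for (const std::vector<double>& row : distances)
    result.push_back(MeansOfPositive(row));
  return result;
}
```

<p>Skipping zeros means each rank is averaged over a possibly different number of queries (reported in <code>used</code>). The sketch leaves open what to do when a rank has no positive distances at all; returning NaN at least makes that case visible before it reaches L_BFGS.</p>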
