[mlpack-git] [mlpack/mlpack] Approximate Neighbor Search for Dual tree algorithms. (#684)

Ryan Curtin notifications at github.com
Fri Jun 10 16:29:41 EDT 2016


> @@ -150,6 +150,21 @@ class NearestNeighborSort
>        return DBL_MAX;
>      return a + b;
>    }
> +
> +  /**
> +   * Return the given value relaxed.
> +   *
> +   * @param value Value to relax.
> +   * @param epsilon Relative error (non-negative).
> +   *
> +   * @return double Value relaxed.
> +   */
> +  static inline double Relax(const double value, const double epsilon)
> +  {
> +    if (value == DBL_MAX)
> +      return DBL_MAX;
> +    return (1 / (1 + epsilon)) * value;

You are right; there are multiple ways to consider the error here.  I can see the advantages and disadvantages of each approach, and with all of those in mind I can't hold a strong opinion.  Looking at both options: if we go with true-relative-approximate (which is what I'll call what you are suggesting), then for KFN we can only have 0 <= epsilon < 1, but for KNN we can have 0 <= epsilon < DBL_MAX.  If we go with approximate-relative-approximate for KFN, then epsilon can take the same range as for KNN.  Since both techniques work fine, the question we should be asking is: exactly how are most users going to want to specify their problem?  (I've sketched the two conventions below.)
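
To make the two conventions concrete, here is a rough sketch of what the relaxation would look like under each one (the function names are mine, for illustration only; the first variant matches the code in the patch above):

#include <cfloat>  // for DBL_MAX

// Approximate-relative-approximate (the convention in the patch above):
// the reported distance d satisfies d >= d_true / (1 + epsilon), so any
// epsilon in [0, DBL_MAX) is valid.
static inline double RelaxApproxRelative(const double value, const double epsilon)
{
  if (value == DBL_MAX)
    return DBL_MAX;
  return (1 / (1 + epsilon)) * value;
}

// True-relative-approximate: the reported distance d satisfies
// d >= (1 - epsilon) * d_true, which restricts epsilon to [0, 1) for KFN.
static inline double RelaxTrueRelative(const double value, const double epsilon)
{
  if (value == DBL_MAX)
    return DBL_MAX;
  return (1 - epsilon) * value;
}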

Given the options we have, if the user wants an approximate furthest neighbor whose distance is at least 90% of the true furthest neighbor's distance, they would specify e = 0.1 with true-relative-approximate and e = 0.11111... with approximate-relative-approximate.  Or, to generalize: if the user wants an approximate furthest neighbor whose distance is at least x% of the true furthest neighbor's, they would specify e = (1 - x/100) for true-relative-approximate and e = ((1 - x/100) / (x/100)) for approximate-relative-approximate.  It would be hard to argue that the second approach is more intuitive for a user. :)
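
Spelled out as code, the conversion from "at least x% of the true distance" to epsilon under each convention (hypothetical helpers, just to show the arithmetic):

// Epsilon needed so the guarantee gives at least x% of the true distance.
double EpsilonTrueRelative(const double x)    // e = 1 - x/100
{
  return 1.0 - (x / 100.0);
}

double EpsilonApproxRelative(const double x)  // e = (1 - x/100) / (x/100)
{
  return (1.0 - (x / 100.0)) / (x / 100.0);
}

// For x = 90: EpsilonTrueRelative(90) gives 0.1, while
// EpsilonApproxRelative(90) gives 0.111... (i.e., 1/9), as above.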

But I'm going to go one step further and suggest that maybe, for KFN, what we should take in is not epsilon but instead (1 - epsilon): if the user wants furthest neighbors whose distances are at least x% of the true furthest neighbor's, they just pass in "x/100".  Valid settings of the approximation parameter would then lie between 0 (non-inclusive) and 1 (no approximation).  Maybe we should consider this for KNN too?  There the user would pass in a value not between 0 and DBL_MAX but between 1 and DBL_MAX, where 1 indicates no approximation.  What do you think?
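
A minimal sketch of how that parameter could be validated, assuming a single user-facing factor f with the meaning described above (a hypothetical interface, not a committed design):

#include <cfloat>
#include <stdexcept>

// KFN: f in (0, 1]; the guarantee is d >= f * d_true; f = 1 is exact search.
// KNN: f in [1, DBL_MAX); the guarantee is d <= f * d_true; f = 1 is exact.
void CheckApproximationFactor(const double f, const bool furthest)
{
  if (furthest && (f <= 0.0 || f > 1.0))
    throw std::invalid_argument("KFN: factor must be in (0, 1].");
  if (!furthest && (f < 1.0 || f == DBL_MAX))
    throw std::invalid_argument("KNN: factor must be in [1, DBL_MAX).");
}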

---
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/pull/684/files/07879a2cc79b35b10d7fae687d6e27ad90a9f2d7#r66676262