[mlpack-git] [mlpack/mlpack] Approximate Neighbor Search for Dual tree algorithms. (#684)

MarcosPividori notifications at github.com
Sat Jun 11 15:27:46 EDT 2016


> @@ -150,6 +150,21 @@ class NearestNeighborSort
>        return DBL_MAX;
>      return a + b;
>    }
> +
> +  /**
> +   * Return the given value relaxed.
> +   *
> +   * @param value Value to relax.
> +   * @param epsilon Relative error (non-negative).
> +   *
> +   * @return double Value relaxed.
> +   */
> +  static inline double Relax(const double value, const double epsilon)
> +  {
> +    if (value == DBL_MAX)
> +      return DBL_MAX;
> +    return (1 / (1 + epsilon)) * value;

@rcurtin, thanks for your clarification! :)

Yeah, I definitely agree that we should focus on what most users find more intuitive.

I have been reading the documentation of other libraries available for approximate nearest neighbor search. FLANN and ANN accept an epsilon value in the range [0, DBL_MAX), using it as the true relative error. Maybe we should take a similar approach for KNN? Wouldn't it be *more intuitive* for new users?
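
Just to make that convention concrete: with a true relative error epsilon, a reported nearest neighbor at distance d satisfies d <= (1 + epsilon) * d_true, which (if I read the intent correctly) is what the Relax() function quoted above supports by shrinking a pruning bound B to B / (1 + epsilon). A minimal standalone sketch (not mlpack code; values are illustrative):

```c++
#include <cassert>
#include <cfloat>

// Same relaxation as the Relax() in the diff above: shrink a pruning bound
// B to B / (1 + epsilon), so more subtrees can be pruned during the search.
static double Relax(const double value, const double epsilon)
{
  if (value == DBL_MAX)
    return DBL_MAX;
  return (1 / (1 + epsilon)) * value;
}

int main()
{
  const double epsilon = 0.5; // Allow up to 50% relative error.
  const double bound = 2.0;   // Hypothetical pruning bound B.

  // Any true neighbor we miss lies beyond B / (1 + epsilon), so a reported
  // neighbor at distance d <= B satisfies d <= (1 + epsilon) * dTrue.
  const double relaxed = Relax(bound, epsilon);
  assert(relaxed < bound);
  assert((1 + epsilon) * relaxed >= bound - 1e-12);
  return 0;
}
```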

I couldn't find any library for approximate furthest neighbor search, so I don't think new users will have a preconceived idea of what the epsilon value means for KFN. Probably, if we make it clear in the documentation, everyone will manage to specify their problems properly, no matter which approach we adopt.

So, the possible approaches are:

+ **Option 1:**
    - For KNN: consider epsilon in [0, DBL_MAX) (as FLANN/ANN) for true relative error.
    - For KFN: consider epsilon in [0, DBL_MAX) for true relative error. All values of epsilon >= 1 make the algorithm behave the same way, since the lower bound (1 - epsilon) * d_true becomes non-positive and any point qualifies.

  Advantage: epsilon represents the same concept for KNN and KFN.

+ **Option 2:**
    - KNN: consider epsilon in [0, DBL_MAX) (as FLANN/ANN) for true relative error.
    - KFN: consider epsilon in (0, 1]: "if the user wants furthest neighbors that are at least x% of the distance of the true furthest neighbor, they just pass in x/100".

   Disadvantage: it may be confusing that in one case epsilon = 0 means no approximation, while in the other, epsilon = 1 means no approximation.

+ **Option 3:**
    - KNN: consider epsilon in [1, DBL_MAX).
    - KFN: consider epsilon in (0, 1].

  Advantage: in both cases epsilon = 1 means no approximation.

  Disadvantage: we would take a different approach from existing libraries for approximate nearest neighbor search, such as ANN and FLANN, and from existing papers. Also, epsilon would represent different concepts for KNN and KFN.

+ **Option 4:** Use Option 1 for the interface of the classes provided by mlpack, so it is clear what epsilon represents. For example, the epsilon parameter of the NeighborSearch class's Search() method would always represent the same concept, the true relative error, no matter which sort policy is considered.
But, unlike Option 1, we would modify the command-line tool for KFN to provide a more intuitive option, with a name other than "epsilon" to avoid confusion. For example, we could provide a different option "-p" that specifies the approximation as a percentage of the true value (between 0% and 100%). It could be translated in kfn_main before using the NeighborSearch class, as in the sketch below.
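
  For instance, a hypothetical translation in kfn_main (the "-p" option and the helper name are assumptions for illustration, and I'm assuming the KFN true-relative-error guarantee d >= (1 - epsilon) * d_true):

```c++
#include <stdexcept>

// Hypothetical helper for Option 4: convert a user-facing percentage p in
// (0, 100] ("results at least p% of the true furthest distance") into the
// true relative error epsilon passed to NeighborSearch::Search().
// From d >= (p / 100) * dTrue == (1 - epsilon) * dTrue, epsilon = 1 - p / 100.
double PercentageToEpsilon(const double p)
{
  if (p <= 0.0 || p > 100.0)
    throw std::invalid_argument("p must be in (0, 100]");
  return 1.0 - (p / 100.0);
}

// e.g. p = 100 -> epsilon = 0 (exact search);
//      p = 70  -> epsilon = 0.3 (results at least 70% of the true distance).
```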

What do you think, @sumedhghaisas? I don't have a strong preference on this.

Maybe I could ask on the mailing list to get more feedback from other users.

---
View it on GitHub:
https://github.com/mlpack/mlpack/pull/684/files/07879a2cc79b35b10d7fae687d6e27ad90a9f2d7#r66710500