[mlpack-git] [mlpack/mlpack] Approximate Neighbor Search for Dual tree algorithms. (#684)

MarcosPividori notifications at github.com
Fri Jun 10 15:25:02 EDT 2016


> @@ -150,6 +150,21 @@ class NearestNeighborSort
>        return DBL_MAX;
>      return a + b;
>    }
> +
> +  /**
> +   * Return the given value relaxed.
> +   *
> +   * @param value Value to relax.
> +   * @param epsilon Relative error (non-negative).
> +   *
> +   * @return double Value relaxed.
> +   */
> +  static inline double Relax(const double value, const double epsilon)
> +  {
> +    if (value == DBL_MAX)
> +      return DBL_MAX;
> +    return (1 / (1 + epsilon)) * value;

@rcurtin
Thanks for your reply. I am not sure about this...

I think this could be confusing for future users, because we would be adopting different approaches for KNN and KFN.

For KNN we consider the relative error of the *approximate value* against the *true value*. So, we assert that:
```
|  d(p_q, p_r)   -   d(p_q, p_r^*) |     /     |  d(p_q, p_r^*) |     <=   e
```
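Note that for KNN the approximation can only overestimate, i.e. `d(p_q, p_r) >= d(p_q, p_r^*)`, so this assertion is equivalent to:
```
d(p_q, p_r)   <=   (1 + e)  *  d(p_q, p_r^*)
```
which matches the quoted `Relax()`, which scales the pruning bound by `1 / (1 + e)`.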
So, I suggest taking exactly the same approach for KFN: we make an assertion about the relative error of our *approximation* against the *true value*:
```
|  d(p_q, p_r)   -   d(p_q, p_r^*) |     /     |  d(p_q, p_r^*) |     <=   e
```

However, you propose to consider a different approach for KFN:
```
|  d(p_q, p_r)  - d(p_q, p_r^*) |     /     |  d(p_q, p_r) |     <=   e
```
This measures the error relative to the *approximate value* instead of the *true value*.
Instead of asserting:
```
|  d(p_q, p_r)  - d(p_q, p_r^*) |    <=    e  *  |  d(p_q, p_r^*) |
```
you are asserting that:
```
|  d(p_q, p_r)  - d(p_q, p_r^*) |    <=    e  *  |  d(p_q, p_r) |
```
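
To make the difference concrete: with `e = 0.1` and a true furthest distance `d(p_q, p_r^*) = 100`, the first assertion accepts any result with `d(p_q, p_r) >= 90`, while the second accepts any result with `d(p_q, p_r) >= 100 / 1.1 ~= 90.9`.

The two definitions also lead to different relaxations of the pruning bound. As a minimal sketch (not part of this PR; the class and function names are hypothetical, just mirroring the quoted `NearestNeighborSort::Relax()`):

```c++
#include <cfloat>

// Hypothetical sketch of a furthest neighbor sort policy, where a node is
// pruned when its maximum possible distance cannot beat the current bound,
// so the bound is relaxed *upward*.
class FurthestNeighborSortSketch
{
 public:
  // First definition (error relative to the *true value*):
  //   |d - d*| / |d*| <= e   =>   d >= (1 - e) * d*,
  // so the bound B can be relaxed to B / (1 - e) (requires e < 1).
  static inline double RelaxTrue(const double value, const double epsilon)
  {
    if (value == DBL_MAX)
      return DBL_MAX;
    return (1 / (1 - epsilon)) * value;
  }

  // Second definition (error relative to the *approximate value*):
  //   |d - d*| / |d| <= e   =>   d >= d* / (1 + e),
  // so the bound B is relaxed to (1 + e) * B instead.
  static inline double RelaxApprox(const double value, const double epsilon)
  {
    if (value == DBL_MAX)
      return DBL_MAX;
    return (1 + epsilon) * value;
  }
};
```

Both versions relax the bound in the direction that prunes more; the difference is only which guarantee is reported to the user.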

So, these are different approaches.
I have been looking through the literature on furthest neighbor search, and both approaches appear there:

Defining the error relative to the *true value*, as I propose:

+ Farthest neighbors, maximum spanning trees and related problems in higher dimensions. [[ref]](http://www.sciencedirect.com/science/article/pii/0925772192900019)

+ Approximate Weighted Farthest Neighbors and Minimum Dilation Stars. [[ref]](https://arxiv.org/pdf/cs/0602029.pdf)

Defining the error relative to the *approximate value*, as you propose:

+ Reductions Among High Dimensional Proximity Problems. [[ref]](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.31.6492&rep=rep1&type=pdf)
 Defines the c-Furthest Neighbor Search Problem (c-FNS).

+ Approximate Furthest Neighbor in High Dimensions. [[ref]](http://www.itu.dk/people/jovt/main.pdf)
 Defines c-approximate furthest neighbor (c-AFN).

Personally, I would prefer the first approach, because it measures the relative error of the *approximation* against the *true value*. This matches the formal definition of relative error: [[ref]](https://en.wikipedia.org/wiki/Approximation_error#Formal_Definition)
Also, it is the same convention as for KNN...

Of course, if you prefer I can change it.

---
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/pull/684/files/07879a2cc79b35b10d7fae687d6e27ad90a9f2d7#r66668125