[mlpack-git] (blog) master: Blog post week 10. (8250d76)

gitdub at mlpack.org gitdub at mlpack.org
Wed Aug 3 09:29:26 EDT 2016

Repository : https://github.com/mlpack/blog
On branch  : master
Link       : https://github.com/mlpack/blog/compare/bc591099531846e25421e201ebbc0e164897fbc4...8250d76a65e3c9cc8847b2427be2ecbf466d52e9


commit 8250d76a65e3c9cc8847b2427be2ecbf466d52e9
Author: MarcosPividori <marcos.pividori at gmail.com>
Date:   Wed Aug 3 10:29:08 2016 -0300

    Blog post week 10.


 content/blog/MarcosWeekTen.md | 44 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/content/blog/MarcosWeekTen.md b/content/blog/MarcosWeekTen.md
new file mode 100644
index 0000000..4415e8c
--- /dev/null
+++ b/content/blog/MarcosWeekTen.md
@@ -0,0 +1,44 @@
+Title: Approximate Nearest Neighbor Search - Week 10
+Date: 2016-08-03 09:00:00
+Tags: gsoc, knn, spill-tree
+Author: Marcos Pividori
+Last week, I have implemented Spill trees with axis-orthogonal splitting hyperplanes [[1]](https://github.com/MarcosPividori/mlpack/tree/spill-trees/src/mlpack/core/tree/spill_tree).
+I created a new class `SpillSearch` that provides an interface similar to `NeighborSearch` but with an extension to properly set the tau parameter. It encapsulates an instance of `NeighborSearch`.
+Also, I have implemented a new version of `NeighborSearchRules` specialized for Spill Trees, because I needed to modify the methods:
+ + `Score()` to consider splitting hyperplanes when calculating the score for overlapping nodes.
+ + `CalculateBounds()` to ignore $B_2$ bound (we can not use $B_2$ bound for Spill Trees).
+## Single Tree Search:
+The `SingleTreeTraverser` is similar to the implementation for `BinarySpaceTree`.
+The difference is in the implementation of `NeighborSearchRules` for SpillTrees.
+When calculating the score of a query point and a reference node I consider 2 cases:
+  + If the reference node is non-overlapping, I calculate the score the same than before.
+  + If the reference node is overlapping, I analyze the reference node's half space. If it contains the given query point, I return 0 (best score). Else, I return DBL_MAX (prune). As we consider axis-orthogonal splitting hyperplanes, we can decide which child node to traverse in O(1), analising the appropiate dimension.
+## Dual Tree Search:
+The Query tree is built without overlapping.
+When calculating the score of a query node and a reference node, I consider 2 cases:
+  + If the reference node is a non-overlapping node, I calculate the score the same as before.
+  + If the reference node is a overlapping node, I analyze query node's bounding box. If it intersects the reference node's half space, I return 0 (best score). Else, I return DBL_MAX (prune). As the reference spill tree is built with axis-orthogonal hyperplanes and the query tree considers hrect bounds, we can make this decision in O(1) time.
+The `DualTreeTraverser` is slightly different to the implementation for `BinarySpaceTree`. When referenceNode is a overlapping node and we can't decide which child node to traverse, this means that queryNode is at both sides of the splitting hyperplane, we analyse the queryNode:
+ + If queryNode is a non-leaf node, I recurse down the query node.
+ + If queryNode is a leaf node, I do single tree search for each point in the query node.
+The `DualTreeTraverser` is faster than the `SingleTreeTraverser`, specially when the value of tau increases.
+The extension was incorporated to existing *mlpack_knn*. With actual implementation, we can use *"-t spill"* to consider spill trees and *"--tau 0.1"* to set different values for the overlapping size (default value is tau=0).
+I have made a pull request with this implementation in: [[2]](https://github.com/mlpack/mlpack/pull/747).
+I plan to work next days in these topics:
+ + Implement another version of SpillTrees to consider general hyperplanes (not necessarily axis-orthogonal), using `BallBound` instead of `HrectBound`, and holding a projection vector in each node.
+ + Add a command line flag *"--get_real_error"* to compare the approximate neighbor search against the naive method and print the real relative error, so we can test with differents values of tau and see the difference.
+Follow the progress in: [[1]](https://github.com/MarcosPividori/mlpack/tree/spill-trees/src/mlpack/core/tree/spill_tree).

More information about the mlpack-git mailing list