[mlpack-git] [mlpack/mlpack] Spill Trees implementation. (#728)

Mon Aug 1 13:16:32 EDT 2016

Hi @sumedhghaisas @rcurtin,

I have implemented Spill trees with axis-parallel splitting hyperplanes. I have made an effort to avoid duplicating existing code for neighbor search.

I created a new class `SpillSearch` that provides an interface similar to `NeighborSearch` but with an extension to properly set the tau parameter. It encapsulates an instance of `NeighborSearch`.

Also, I have implemented a new version of `NeighborSearchRules` specialized for Spill Trees, because I needed to modify the methods:
 + `Score()` to consider splitting hyperplanes for overlapping nodes.
 + `CalculateBounds()` to ignore B_2 bound (we can not use B_2 bound for Spill Trees).

## Single Tree Search:

The `SingleTreeTraverser` is similar to the implementation for `BinarySpaceTree`.
The difference is in the implementation of `NeighborSearchRules` for SpillTrees.
When calculating the score of a query point and a reference node I consider 2 cases:
  + If the reference node is non-overlapping, I calculate the score the same than before.
  + If the reference node is overlapping, I analyze the reference node's half space. If it contains the given query point, I return 0 (best score). Else, I return DBL_MAX (prune).

## Dual Tree Search:

The Query tree is built without overlapping.

When calculating the score of a query node and a reference node, I consider 2 cases:
  + If the reference node is a non-overlapping node, I calculate the score the same as before.
  + If the reference node is a overlapping node, I analyze query node's bounding box. If it intersects the reference node's half space, I return 0 (best score). Else, I return DBL_MAX (prune).

The `DualTreeTraverser` is slightly different to the implementation for `BinarySpaceTree`. When referenceNode is a overlapping node and we can't decide which child node to traverse, this means that queryNode is at both sides of the splitting hyperplane, we analyse the queryNode:
 + If queryNode is a non-leaf node, I recurse down the query node.
 + If queryNode is a leaf node, I do single tree search for each point in the query node.

The `DualTreeTraverser` is faster than the `SingleTreeTraverser`. Specially when the value of tau increases, because we will have more non-overlapping nodes which implies more time involved in backtracking.

The extension was incorporated to existing *mlpack_knn*. With actual implementation, we can use *"-t spill"* to consider spill trees and *"--tau 0.1"* to set different values for the overlapping size (default value is tau=0).

I have made a pull request with this implementation in: https://github.com/mlpack/mlpack/pull/747.

If you agree, I plan to work next days in these topics:
 + Implement another version of SpillTrees to consider non-axis-parallel hyperplanes, using `BallBound` instead of `HrectBound`, and holding a projection vector in each node.
 + Add a command line flag *"--get_real_error"* to compare the approximate neighbor search against the naive method and print the real relative error, so we can test with differents values of tau and see the difference.

Every feedback is welcome! :)

Thanks

---
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/issues/728#issuecomment-236645481
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mailman.cc.gatech.edu/pipermail/mlpack-git/attachments/20160801/6b119387/attachment-0001.html>