[mlpack-git] [mlpack/mlpack] Spill trees (#747)

MarcosPividori notifications at github.com
Mon Aug 1 13:15:43 EDT 2016


Hi @sumedhghaisas @rcurtin,

I have implemented Spill trees with axis-parallel splitting hyperplanes. I have made an effort to avoid duplicating existing code for neighbor search.

I created a new class `SpillSearch` that provides an interface similar to `NeighborSearch` but with an extension to properly set the tau parameter. It encapsulates an instance of `NeighborSearch`.

Also, I have implemented a new version of `NeighborSearchRules` specialized for Spill Trees, because I needed to modify the methods:
 + `Score()` to consider splitting hyperplanes for overlapping nodes.
 + `CalculateBounds()` to ignore B_2 bound (we can not use B_2 bound for Spill Trees).

## Single Tree Search:

The `SingleTreeTraverser` is similar to the implementation for `BinarySpaceTree`.
The difference is in the implementation of `NeighborSearchRules` for SpillTrees.
When calculating the score of a query point and a reference node I consider 2 cases:
  + If the reference node is non-overlapping, I calculate the score the same than before.
  + If the reference node is overlapping, I analyze the reference node's half space. If it contains the given query point, I return 0 (best score). Else, I return DBL_MAX (prune).


## Dual Tree Search:

The Query tree is built without overlapping.

When calculating the score of a query node and a reference node, I consider 2 cases:
  + If the reference node is a non-overlapping node, I calculate the score the same as before.
  + If the reference node is a overlapping node, I analyze query node's bounding box. If it intersects the reference node's half space, I return 0 (best score). Else, I return DBL_MAX (prune).

The `DualTreeTraverser` is slightly different to the implementation for `BinarySpaceTree`. When referenceNode is a overlapping node and we can't decide which child node to traverse, this means that queryNode is at both sides of the splitting hyperplane, we analyse the queryNode:
 + If queryNode is a non-leaf node, I recurse down the query node.
 + If queryNode is a leaf node, I do single tree search for each point in the query node.

The `DualTreeTraverser` is faster than the `SingleTreeTraverser`. Specially when the value of tau increases, because we will have more non-overlapping nodes which implies more time involved in backtracking.

The extension was incorporated to existing *mlpack_knn*. With actual implementation, we can use *"-t spill"* to consider spill trees and *"--tau 0.1"* to set different values for the overlapping size (default value is tau=0).

Every feedback is welcome! :)

Thanks
You can view, comment on, or merge this pull request online at:

  https://github.com/mlpack/mlpack/pull/747

-- Commit Summary --

  * Add Spill Trees, based on binary space trees.
  * Fix details.
  * Improve expansion of node's bound. Do not include the overlapping buffer in the bounds of both nodes.
  * Fix simple errors.
  * Add support for Hybrid SP-Tree Search (Single tree traverser).
  * Add support for spill trees in knn search.
  * Syntax details.
  * Fix error in the order of parameters.
  * Add tests for spill tree.
  * Add approximate knn search tests for hybrid spill trees.
  * Add exact knn search tests for hybrid spill trees.
  * Include missing headers.
  * Use a simple tree traverser for spill tree (Remove defeatist seach included in single tree traverser).
  * Implement defeatist search in the Rescore() method, with a specialization for Spill Trees.
  * Remove unnecessary BoundType template parameter.
  * Record splitDimension and splitValue.
  * Set Overlapping node when percentage greater than rho but not because of
  * Add NeighborSearchRules specialization for Spill Trees.
  * Add dual tree traverser for Spill Trees.
  * Set non-overlapping for spill query tree.
  * Avoid calculating bounds when oldScore is the best possible.
  * Remove unnecessary copying of dataset in SpillTrees.
  * Create a new class SpillSearch that encapsulates an instance of NeighborSearch class, and adds the functionality to deal with spill trees.
  * Update NSModel to use SpillSearch class.
  * Log Num of BaseCases/Score.
  * Add more tests for SpillSearch
  * Update documentation.
  * Remove B_2 bound for neighbor search with spill trees.
  * Change spill rules's filename.
  * Fix SpillTree's move constructor

-- File Changes --

    M src/mlpack/core/tree/CMakeLists.txt (9)
    M src/mlpack/core/tree/binary_space_tree/mean_split.hpp (16)
    M src/mlpack/core/tree/binary_space_tree/mean_split_impl.hpp (71)
    M src/mlpack/core/tree/binary_space_tree/midpoint_split.hpp (17)
    M src/mlpack/core/tree/binary_space_tree/midpoint_split_impl.hpp (65)
    A src/mlpack/core/tree/spill_tree.hpp (20)
    A src/mlpack/core/tree/spill_tree/dual_tree_traverser.hpp (94)
    A src/mlpack/core/tree/spill_tree/dual_tree_traverser_impl.hpp (384)
    A src/mlpack/core/tree/spill_tree/single_tree_traverser.hpp (63)
    A src/mlpack/core/tree/spill_tree/single_tree_traverser_impl.hpp (108)
    A src/mlpack/core/tree/spill_tree/spill_tree.hpp (440)
    A src/mlpack/core/tree/spill_tree/spill_tree_impl.hpp (683)
    A src/mlpack/core/tree/spill_tree/traits.hpp (60)
    A src/mlpack/core/tree/spill_tree/typedef.hpp (80)
    M src/mlpack/methods/neighbor_search/CMakeLists.txt (4)
    M src/mlpack/methods/neighbor_search/knn_main.cpp (26)
    M src/mlpack/methods/neighbor_search/neighbor_search.hpp (19)
    M src/mlpack/methods/neighbor_search/neighbor_search_impl.hpp (8)
    M src/mlpack/methods/neighbor_search/neighbor_search_rules.hpp (3)
    M src/mlpack/methods/neighbor_search/neighbor_search_rules_impl.hpp (3)
    M src/mlpack/methods/neighbor_search/ns_model.hpp (41)
    M src/mlpack/methods/neighbor_search/ns_model_impl.hpp (92)
    A src/mlpack/methods/neighbor_search/spill_search.hpp (289)
    A src/mlpack/methods/neighbor_search/spill_search_impl.hpp (213)
    A src/mlpack/methods/neighbor_search/spill_search_rules.hpp (228)
    A src/mlpack/methods/neighbor_search/spill_search_rules_impl.hpp (440)
    M src/mlpack/tests/CMakeLists.txt (1)
    M src/mlpack/tests/aknn_test.cpp (41)
    M src/mlpack/tests/knn_test.cpp (117)
    A src/mlpack/tests/spill_tree_test.cpp (122)

-- Patch Links --

https://github.com/mlpack/mlpack/pull/747.patch
https://github.com/mlpack/mlpack/pull/747.diff

---
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/pull/747
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mailman.cc.gatech.edu/pipermail/mlpack-git/attachments/20160801/e7584c12/attachment.html>


More information about the mlpack-git mailing list