[mlpack-git] [mlpack/mlpack] Spill trees (#747)
MarcosPividori
notifications at github.com
Mon Aug 1 13:15:43 EDT 2016
Hi @sumedhghaisas @rcurtin,
I have implemented Spill trees with axis-parallel splitting hyperplanes. I have made an effort to avoid duplicating existing code for neighbor search.
I created a new class `SpillSearch` that provides an interface similar to `NeighborSearch` but with an extension to properly set the tau parameter. It encapsulates an instance of `NeighborSearch`.
Also, I have implemented a new version of `NeighborSearchRules` specialized for Spill Trees, because I needed to modify the methods:
+ `Score()` to consider splitting hyperplanes for overlapping nodes.
+ `CalculateBounds()` to ignore B_2 bound (we can not use B_2 bound for Spill Trees).
## Single Tree Search:
The `SingleTreeTraverser` is similar to the implementation for `BinarySpaceTree`.
The difference is in the implementation of `NeighborSearchRules` for SpillTrees.
When calculating the score of a query point and a reference node I consider 2 cases:
+ If the reference node is non-overlapping, I calculate the score the same than before.
+ If the reference node is overlapping, I analyze the reference node's half space. If it contains the given query point, I return 0 (best score). Else, I return DBL_MAX (prune).
## Dual Tree Search:
The Query tree is built without overlapping.
When calculating the score of a query node and a reference node, I consider 2 cases:
+ If the reference node is a non-overlapping node, I calculate the score the same as before.
+ If the reference node is a overlapping node, I analyze query node's bounding box. If it intersects the reference node's half space, I return 0 (best score). Else, I return DBL_MAX (prune).
The `DualTreeTraverser` is slightly different to the implementation for `BinarySpaceTree`. When referenceNode is a overlapping node and we can't decide which child node to traverse, this means that queryNode is at both sides of the splitting hyperplane, we analyse the queryNode:
+ If queryNode is a non-leaf node, I recurse down the query node.
+ If queryNode is a leaf node, I do single tree search for each point in the query node.
The `DualTreeTraverser` is faster than the `SingleTreeTraverser`. Specially when the value of tau increases, because we will have more non-overlapping nodes which implies more time involved in backtracking.
The extension was incorporated to existing *mlpack_knn*. With actual implementation, we can use *"-t spill"* to consider spill trees and *"--tau 0.1"* to set different values for the overlapping size (default value is tau=0).
Every feedback is welcome! :)
Thanks
You can view, comment on, or merge this pull request online at:
https://github.com/mlpack/mlpack/pull/747
-- Commit Summary --
* Add Spill Trees, based on binary space trees.
* Fix details.
* Improve expansion of node's bound. Do not include the overlapping buffer in the bounds of both nodes.
* Fix simple errors.
* Add support for Hybrid SP-Tree Search (Single tree traverser).
* Add support for spill trees in knn search.
* Syntax details.
* Fix error in the order of parameters.
* Add tests for spill tree.
* Add approximate knn search tests for hybrid spill trees.
* Add exact knn search tests for hybrid spill trees.
* Include missing headers.
* Use a simple tree traverser for spill tree (Remove defeatist seach included in single tree traverser).
* Implement defeatist search in the Rescore() method, with a specialization for Spill Trees.
* Remove unnecessary BoundType template parameter.
* Record splitDimension and splitValue.
* Set Overlapping node when percentage greater than rho but not because of
* Add NeighborSearchRules specialization for Spill Trees.
* Add dual tree traverser for Spill Trees.
* Set non-overlapping for spill query tree.
* Avoid calculating bounds when oldScore is the best possible.
* Remove unnecessary copying of dataset in SpillTrees.
* Create a new class SpillSearch that encapsulates an instance of NeighborSearch class, and adds the functionality to deal with spill trees.
* Update NSModel to use SpillSearch class.
* Log Num of BaseCases/Score.
* Add more tests for SpillSearch
* Update documentation.
* Remove B_2 bound for neighbor search with spill trees.
* Change spill rules's filename.
* Fix SpillTree's move constructor
-- File Changes --
M src/mlpack/core/tree/CMakeLists.txt (9)
M src/mlpack/core/tree/binary_space_tree/mean_split.hpp (16)
M src/mlpack/core/tree/binary_space_tree/mean_split_impl.hpp (71)
M src/mlpack/core/tree/binary_space_tree/midpoint_split.hpp (17)
M src/mlpack/core/tree/binary_space_tree/midpoint_split_impl.hpp (65)
A src/mlpack/core/tree/spill_tree.hpp (20)
A src/mlpack/core/tree/spill_tree/dual_tree_traverser.hpp (94)
A src/mlpack/core/tree/spill_tree/dual_tree_traverser_impl.hpp (384)
A src/mlpack/core/tree/spill_tree/single_tree_traverser.hpp (63)
A src/mlpack/core/tree/spill_tree/single_tree_traverser_impl.hpp (108)
A src/mlpack/core/tree/spill_tree/spill_tree.hpp (440)
A src/mlpack/core/tree/spill_tree/spill_tree_impl.hpp (683)
A src/mlpack/core/tree/spill_tree/traits.hpp (60)
A src/mlpack/core/tree/spill_tree/typedef.hpp (80)
M src/mlpack/methods/neighbor_search/CMakeLists.txt (4)
M src/mlpack/methods/neighbor_search/knn_main.cpp (26)
M src/mlpack/methods/neighbor_search/neighbor_search.hpp (19)
M src/mlpack/methods/neighbor_search/neighbor_search_impl.hpp (8)
M src/mlpack/methods/neighbor_search/neighbor_search_rules.hpp (3)
M src/mlpack/methods/neighbor_search/neighbor_search_rules_impl.hpp (3)
M src/mlpack/methods/neighbor_search/ns_model.hpp (41)
M src/mlpack/methods/neighbor_search/ns_model_impl.hpp (92)
A src/mlpack/methods/neighbor_search/spill_search.hpp (289)
A src/mlpack/methods/neighbor_search/spill_search_impl.hpp (213)
A src/mlpack/methods/neighbor_search/spill_search_rules.hpp (228)
A src/mlpack/methods/neighbor_search/spill_search_rules_impl.hpp (440)
M src/mlpack/tests/CMakeLists.txt (1)
M src/mlpack/tests/aknn_test.cpp (41)
M src/mlpack/tests/knn_test.cpp (117)
A src/mlpack/tests/spill_tree_test.cpp (122)
-- Patch Links --
https://github.com/mlpack/mlpack/pull/747.patch
https://github.com/mlpack/mlpack/pull/747.diff
---
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/pull/747
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mailman.cc.gatech.edu/pipermail/mlpack-git/attachments/20160801/e7584c12/attachment.html>
More information about the mlpack-git
mailing list