[mlpack-git] [mlpack/mlpack] Spill Trees implementation. (#728)

Fri Jul 22 20:03:08 EDT 2016

Hi @rcurtin,
Thanks for your response! I agree with what was proposed. So I will continue implementing the original approach with splitting hyperplanes.

I have been thinking about this. I list some ideas that I need to refine:

+ We can decide which child node to traverse in O(1) if we consider axis-parallel splitting hyperplanes.
As I mentioned in the pdf file, another option is to consider splitting hyperplanes that are not necessarily axis-parallel. For example, as suggested in Ting Liu's paper, we can approximate the pair of furthest points p1, p2, in the current set of points and split the set of points according to the projection over the vector defined by p1 and p2. We could record that vector in each node, so we can project the query node and decide which child node to traverse when doing dfs.
The advantage is that we have more flexibility, I think it would improve the classification of points. However the disadvantage is that it takes O(d) to calculate the projection of the query point when deciding which node to traverse.
Also, depending on this, I can see on 2 different approaches for dual tree search:
 - If the reference spill tree is built with axis-parallel hyperplanes, we can build a query tree with hrects bounds (KDTree) and use the information of the hrects bounds to know which reference node to traverse. Given a hrect bound and an axis-parallel hyperplane, we can know in O(1) time which side to traverse (or both).
 - If the reference spill tree is built with non-axis-parallel hyperplanes, we can build a query tree with ball bounds (BallTree), project the ball bound's center and use the radius to know which side to traverse in O(d) time.

+ I think I could record the splitting dimension and value (or the projection vector in case of non-axis-parallel hyperplanes) in each node instead of using the StatisticsType, because this information won't change after building the tree.

+ I don't think it is possible to rearrange the points in the dataset instead of holding a set of indices in the leaf nodes.

Thanks

---
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/issues/728#issuecomment-234684817
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mailman.cc.gatech.edu/pipermail/mlpack-git/attachments/20160722/9bc181ac/attachment-0001.html>