[mlpack-git] [mlpack/mlpack] Bugfix #350 and some fixes in RStarTreeSplit and XTreeSplit (#556)

Ryan Curtin notifications at github.com
Mon Apr 4 23:18:50 EDT 2016

I'm really sorry this took so long.  The end of the GSoC application period was pretty overwhelming and then I went on vacation for a week, so only now am I able to take a look at this in depth.

Here is some background based on what I remember of this during GSoC 2014.  The rectangle tree and its variants are made for dynamic point sets where insertions and deletions are common, but this is a little bit at odds with the other trees in mlpack, which tend to be "build it once and don't insert or delete points from the dataset".  To support dynamic point sets, the original idea was to have each node hold its own `arma::mat` (called `localDataset`), so that points could easily be inserted or deleted by calling `insert_cols()` or `shed_cols()` on the local datasets.  But I am not sure this was ever really finished, so I think maybe the `Insert()` and `Delete()` functions do not work correctly.  This is something I would eventually like to look into, but I have not yet had time.  I don't think the local dataset is even used; instead I think each node holds a `std::vector<size_t>` containing the indices of the points in that node.  If you are interested in solving that problem and making the `Insert()`/`Delete()` support meaningful, I can discuss it more if you like.
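
To make the index-based alternative concrete, here is a minimal sketch of a leaf that stores indices into a shared dataset instead of its own matrix.  `ToyLeaf`, `InsertPoint()`, and `DeletePoint()` are hypothetical names for illustration only; this is not the actual mlpack code, and it uses only the standard library rather than Armadillo.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical leaf node (not mlpack's RectangleTree): it keeps the indices
// of its points in a shared dataset rather than a local copy of the points.
struct ToyLeaf
{
  std::vector<std::size_t> points;

  // Record that the dataset column `index` now belongs to this leaf.
  void InsertPoint(const std::size_t index) { points.push_back(index); }

  // Remove `index` from this leaf; returns false if it was not held here.
  bool DeletePoint(const std::size_t index)
  {
    const auto it = std::find(points.begin(), points.end(), index);
    if (it == points.end())
      return false;
    points.erase(it);
    return true;
  }
};
```

With this layout the dataset itself is never resized, so a real `Insert()`/`Delete()` would still have to decide what happens to the underlying column; the sketch leaves that question open.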

The other issue was the necessity for a shallow copy constructor, because sometimes the insertion procedure necessitates creating a new root node.  But if we are constructing the tree like

RTreeType* tree = new RTreeType(dataset);

we can't have `tree` point to an intermediate level of the tree; it must point to the root.  So we need to make a shallow copy of the current `tree` node, then make that shallow copy a child of the current `tree` node and update all the data in the `tree` node.
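
A rough sketch of that root push-down, under loud assumptions: `ToyNode` and `PushRootDown()` are made-up names, the node holds only child pointers and point indices, and the `deepCopy` flag merely imitates the one discussed in this thread.  It is meant to show the pointer bookkeeping, not the real implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy node for illustration; this is not mlpack's RectangleTree.
struct ToyNode
{
  ToyNode* parent = nullptr;
  std::vector<ToyNode*> children;   // Owned by this node.
  std::vector<std::size_t> points;  // Indices into a shared dataset.

  ToyNode() = default;

  // Deep copy by default.  With deepCopy == false only the child pointers are
  // copied, so the caller must ensure exactly one node ends up owning them.
  ToyNode(const ToyNode& other, const bool deepCopy = true) :
      points(other.points)
  {
    for (ToyNode* child : other.children)
    {
      ToyNode* c = deepCopy ? new ToyNode(*child, true) : child;
      if (deepCopy)
        c->parent = this;
      children.push_back(c);
    }
  }

  ToyNode& operator=(const ToyNode&) = delete;

  ~ToyNode()
  {
    for (ToyNode* child : children)
      delete child;
  }
};

// Push the root's contents one level down while keeping the caller's pointer
// to `root` valid as the root.  Returns the new intermediate node.
ToyNode* PushRootDown(ToyNode& root)
{
  assert(root.parent == nullptr);

  ToyNode* copy = new ToyNode(root, false);  // Shallow: shares the children.
  for (ToyNode* child : copy->children)
    child->parent = copy;
  copy->parent = &root;

  // Hand ownership to the copy, then make the copy the root's only child.
  root.children.clear();
  root.points.clear();
  root.children.push_back(copy);
  return copy;
}
```

The shallow copy is only safe here because the root drops its child pointers immediately afterwards; if both nodes kept them, the destructor would free each child twice.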

I think you are right that you uncovered a problem there, and I think that your solution in 121bbaf is the right solution.  The situation where `deepCopy = true` is the default copy constructor behavior, and would be used if you wanted to copy the tree entirely:

RTreeType myTree(dataset);
RTreeType myTreeCopy(myTree /* defaults to a deep copy */);

For the second issue, the `const MatType* dataset` only makes sense if the tree should not be modifying the dataset.  We should just remove the non-const function `Dataset()`, to be honest.
