[mlpack-git] [mlpack/mlpack] Spill trees (#747)

Ryan Curtin notifications at github.com
Wed Aug 10 10:24:38 EDT 2016

> +  }
> +
> +  std::vector<size_t> leftPoints, rightPoints;
> +  // Split the node.
> +  overlappingNode = SplitPoints(tau, rho, points, leftPoints, rightPoints);
> +
> +  // We don't need the information in points, so lets clean it.
> +  std::vector<size_t>().swap(points);
> +
> +  // Now we will recursively split the children by calling their constructors
> +  // (which perform this splitting process).
> +  left = new SpillTree(this, leftPoints, tau, maxLeafSize, rho);
> +  right = new SpillTree(this, rightPoints, tau, maxLeafSize, rho);
> +
> +  // Update count number, to represent the number of descendant points.
> +  count = left->NumDescendants() + right->NumDescendants();

I think there is an easier way to do this.  We only hold a list of point indices in the leaves, so we only have to worry about the arrangement of points at the leaf level.  Each leaf node only has one sibling, so given two sibling leaf nodes N_l and N_r, let's define a couple sets.  I've just chosen arbitrary set names:

 - S: all the points held in N_l and N_r
 - T: points held in N_r but _not_ in N_l
 - U: points held in both N_l and N_r
 - V: points held in N_l

Now we can arrange the point sets for each child like this:

 - N_l: [V] (order does not matter here)
 - N_r: [T U] (order does matter: all points in T come before all points in U)

If we have set the number of descendants in the parent properly to be |S| (this should be easy), then we can do the following:

size_t Descendant(const size_t i)
  // Are we a leaf?
  if (IsLeaf())
    return (*pointsIndex)[index];

  if (i > left->NumDescendants())
    return left->Descendant(i);
    return right->Descendant(i - left->NumDescendants());

The key to this approach is determining the sets T and U, but I think that is straightforward to do either before you recurse into building the children (you can "pre-organize" the set you pass to the right child), or afterwards (you can access the sets of the left and right child and organize them).

When you set `numDescendants` when building a node, that would simply be equal to the number of points held in that node.  This means for a given node, it is possible (and okay) if `node->NumDescendants() < node->Left()->NumDescendants() + node->Right()->NumDescendants()`.

Let me know if I have overlooked anything here.

You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mailman.cc.gatech.edu/pipermail/mlpack-git/attachments/20160810/7701dc74/attachment-0001.html>

More information about the mlpack-git mailing list