[mlpack-git] [mlpack/mlpack] Spill trees (#747)

Ryan Curtin notifications at github.com
Wed Aug 17 14:47:56 EDT 2016

> @@ -209,12 +245,6 @@ int main(int argc, char *argv[])
>      Log::Info << "Loaded kNN model from '" << inputModelFile << "' (trained on "
>          << knn.Dataset().n_rows << "x" << knn.Dataset().n_cols << " dataset)."
>          << endl;
> -
> -    // Adjust singleMode and naive if necessary.
> -    knn.SingleMode() = CLI::HasParam("single_mode");
> -    knn.Naive() = CLI::HasParam("naive");
> -    knn.LeafSize() = size_t(lsInt);
> -    knn.Epsilon() = epsilon;

I see what you mean.  There are really two separate concerns in the comments you made; I'll address them in reverse order:

I agree that a `-m` option would be a nice idea, since the number of options for our neighbor search programs is exploding as we add so many types of trees and types of search!  Unfortunately we will have some backward compatibility to contend with, but it should not be a huge issue, just annoying.

On to the other issue: allowing the user to specify some options at query time provides functionality we don't otherwise have.  For instance, the user can specify a query tree with a different leaf size, and more importantly, a user can train a model (i.e. build a reference tree) once, and then use that saved model to do both single-tree and dual-tree search.  This can save a lot of time if the tree took a long time to build (which is often the case with cover trees).  The same is true of the epsilon parameter: I can do nearest neighbor search at many different epsilon levels with the same saved model.

So, I guess, the documentation was wrong about what was actually happening before.  A better way to put it would be this:  If `--single_mode`, `--naive`, `--leaf_size`, or `--epsilon` are specified along with `--input_model_file`, then these options will be taken into account for the search.  For `--leaf_size`, the option will only apply to the query tree, since the reference tree in the loaded model is already built.  The new model preferences will not be saved unless `--output_model_file` is specified.
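
As a rough sketch of those semantics (this is not the actual mlpack code; `SearchSettings`, `QueryOverrides`, and `ApplyOverrides` are hypothetical names, and C++17 is assumed for `std::optional`), options given at query time replace the values stored in the loaded model, and anything left unspecified keeps the saved value:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical stand-in for the search settings stored in a saved model.
// This is NOT mlpack's real model class; the fields are illustrative only.
struct SearchSettings
{
  bool singleMode = false;
  bool naive = false;
  double epsilon = 0.0;
  std::size_t leafSize = 20; // Leaf size for any tree built at query time.
};

// Hypothetical holder for what the user passed on the command line.
// An empty optional means the option was not specified.
struct QueryOverrides
{
  std::optional<bool> singleMode;
  std::optional<bool> naive;
  std::optional<double> epsilon;
  std::optional<std::size_t> leafSize;
};

// Return the settings to use for this search: start from the saved settings
// and replace only the fields the user explicitly specified.  The saved
// settings are taken by value, so the caller's copy is left unchanged.
SearchSettings ApplyOverrides(SearchSettings settings,
                              const QueryOverrides& overrides)
{
  // A leaf size of zero would be meaningless, so reject it early.
  assert(!overrides.leafSize || *overrides.leafSize > 0);

  if (overrides.singleMode)
    settings.singleMode = *overrides.singleMode;
  if (overrides.naive)
    settings.naive = *overrides.naive;
  if (overrides.epsilon)
    settings.epsilon = *overrides.epsilon;
  if (overrides.leafSize)
    settings.leafSize = *overrides.leafSize;

  return settings;
}
```

Because the result is a separate copy, writing it back to disk remains an explicit step, which mirrors the "not saved unless `--output_model_file` is specified" behavior.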

Really there are two different types of options there; one can be encapsulated by the `--mode` parameter you suggested; the other (`--leaf_size`) is a specific parameter for the query tree.  I am willing to believe that the use-case for different-leaf-size query trees is so niche that nobody is going to do it.  In that case we can simply require that any query tree is built with the same options as the reference tree, and users who don't like that will need to write C++ and build their trees themselves.  But we can't remove the option to allow the user to specify a different search mode.
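
To make the `--mode` idea concrete, here is a minimal sketch of how a single mode string could map onto the existing naive and single-tree settings while still honoring the old flags.  Everything in it is an assumption for illustration: the mode names `naive`, `single_tree`, and `dual_tree`, the `ModeFlags` struct, and the `ResolveMode` function are hypothetical and not part of mlpack.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical pair of flags that the search code would consume.
struct ModeFlags
{
  bool naive;
  bool singleMode;
};

// Resolve the search mode from a hypothetical --mode value plus the legacy
// --naive and --single_mode flags.  An empty mode string means --mode was
// not given, in which case the legacy flags are used unchanged.
ModeFlags ResolveMode(const std::string& mode,
                      const bool legacyNaive,
                      const bool legacySingleMode)
{
  if (mode.empty())
    return { legacyNaive, legacySingleMode };

  // Refuse ambiguous input instead of silently picking one of the two.
  if (legacyNaive || legacySingleMode)
    throw std::invalid_argument(
        "specify either --mode or the legacy flags, not both");

  if (mode == "naive")
    return { true, false };
  if (mode == "single_tree")
    return { false, true };
  if (mode == "dual_tree")
    return { false, false };

  throw std::invalid_argument("unknown mode: " + mode);
}
```

Rejecting a mix of `--mode` and the legacy flags is only one possible policy; letting `--mode` win and printing a warning would also keep old command lines working.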

If you prefer, we can leave the second part for a separate issue: keep the code that sets `LeafSize()` and the other parameters in for now, then open another GitHub issue and work that out there.
