[mlpack-git] [mlpack] add train test split (#523)

Marcus Edel notifications at github.com
Thu Feb 25 09:18:34 EST 2016


hanks for the pull request I think, this is the first step to get the GSOC 'Dataset and experimentation tools' project started.

I took a quick look over the pull request and have some comments:

- Instead of returning a tuple that contains the train and test data, I would go with references, to be more consistent with the other mlpack code that has functions like this.

``TrainTestSplit(const MatType& input, MatType& train, MatType& test)``

- You can create a random vector of indexes using armadillo instead of using an std::vector.

`'arma::Col<size_t> order = arma::shuffle(arma::linspace<arma::Col<size_t> >(0, (details::extractSize(input) - 1), details::extractSize(input)));``

 I guess this is just a preference of mine, because your code is absolutely fine. I've seen this code fragment a couple of times, so I guess I'm biased.

 - Please use camel casing for all names. Capitalize class and method names.

Another thing that crossed my mind is, can we modify the code in a why that it also works with data matrices with more than one dimension. Imagine a dataset of images with more than one channel per image.

---
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/pull/523#issuecomment-188802055
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mailman.cc.gatech.edu/pipermail/mlpack-git/attachments/20160225/7329f326/attachment-0001.html>


More information about the mlpack-git mailing list