[mlpack-git] [mlpack] add train test split (#523)

Ryan Curtin notifications at github.com
Mon Feb 29 12:11:17 EST 2016


The way `arma::Cube` holds data is "slice-major": the data in an individual slice is all contiguous.  Each slice is column-major.  So if an `arma::Cube` is going to be used to hold data, then it would probably be best if each slice represented an individual point in the dataset.
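To make the layout concrete, here is a minimal sketch of the index arithmetic that "slice-major, with column-major slices" implies. `CubeOffset` and `SliceStart` are hypothetical helpers written only for illustration; they are not part of Armadillo, and the sketch uses only the standard library.

```cpp
#include <cstddef>

// Hypothetical helper (not an Armadillo function): flat offset of element
// (row, col, slice) in a cube that is stored one slice after another, with
// each slice laid out column-major.
std::size_t CubeOffset(std::size_t row, std::size_t col, std::size_t slice,
                       std::size_t nRows, std::size_t nCols)
{
  return row + col * nRows + slice * nRows * nCols;
}

// Hypothetical helper: flat offset at which a given slice begins.  Every
// element of that slice lies in [SliceStart, SliceStart + nRows * nCols),
// which is why a whole slice is one contiguous block.
std::size_t SliceStart(std::size_t slice, std::size_t nRows, std::size_t nCols)
{
  return slice * nRows * nCols;
}
```

For a cube with 2 rows and 3 columns per slice, slice 1 starts at offset 6, immediately after the 6 elements of slice 0.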

But a problem with this approach is that each point is now accessed with `.slice(i)` instead of `.col(i)`.  This means that a user can't store points as slices of an `arma::Cube` and then pass it as a `MatType` parameter to, say, logistic regression, since that code expects one point per column.  That is, with each point as a slice, `LogisticRegression<arma::Cube<double>>` is not possible.

In previous situations involving images or other multi-dimensional structures, the data has basically been "vectorized": a 256x256 image is just treated as a 65536-dimensional vector.  The same thing could be done with the channels of an image: a 256x256x3 image becomes a 196608-dimensional vector.  In that case a column is a single point, like in the rest of mlpack.  For input into a neural network, I think any image is vectorized in this way anyway (maybe I am incorrect here?), so personally I think it makes the most sense to just work with a vectorized representation the whole time.
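As a rough sketch of that vectorized representation: each image is flattened to a single vector, and the vectors become the columns of a column-major data matrix.  `VectorizedDim` and `PackAsColumns` are hypothetical names used only for illustration, and the sketch assumes each image is already available as a flat buffer; it is not how mlpack itself loads images.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: dimensionality of an image once it is vectorized.
std::size_t VectorizedDim(std::size_t rows, std::size_t cols,
                          std::size_t channels)
{
  return rows * cols * channels;
}

// Hypothetical helper: pack flat image buffers as the columns of a
// column-major matrix with `dim` rows and images.size() columns.  Column i
// (that is, point i) occupies offsets [i * dim, (i + 1) * dim).
std::vector<double> PackAsColumns(
    const std::vector<std::vector<double>>& images, std::size_t dim)
{
  std::vector<double> data;
  data.reserve(dim * images.size());
  for (const std::vector<double>& image : images)
  {
    if (image.size() != dim)
      throw std::invalid_argument("every image must have exactly dim values");
    // In column-major storage, appending a full column is a contiguous copy.
    data.insert(data.end(), image.begin(), image.end());
  }
  return data;
}
```

With this layout, `VectorizedDim(256, 256, 3)` gives 196608, and each packed column is one point, which matches the "one column is one point" convention.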

If we decide to work with arbitrary types of data in higher-dimensional interpretations instead of vectorizing it, we'll have to rethink some of the core abstractions of mlpack, because "one column is one point" will no longer apply.  I'm not saying that we shouldn't make that change, just that we'll have to think through it pretty clearly so we can continue to present a unified interface.

---
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/pull/523#issuecomment-190294718