[mlpack-git] [mlpack/mlpack] add cli executable for data_split (#650)

Ryan Curtin notifications at github.com
Fri May 27 11:07:21 EDT 2016


Another consideration here: if space is really a concern, csv/tsv/txt is about the most inefficient format possible for representing doubles.  (The situation is a little different for small integers...)  Using `csv.gz`, HDF5, or any of the packed binary formats is a much better option in that case, and can also speed up load times, sometimes by an order of magnitude or more.
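
To make the size argument concrete, here is a rough sketch that compares the bytes needed for a set of doubles written as CSV text at round-trip precision against the same values stored as raw packed doubles.  The function names `CsvTextBytes()` and `PackedBinaryBytes()` are made up for illustration; they are not part of mlpack or Armadillo, and real binary formats also carry a small header that is ignored here.

```cpp
#include <cstddef>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not an mlpack/Armadillo function): number of bytes
// needed to store the values as one comma-separated line of text, using
// max_digits10 significant digits so every double round-trips exactly.
std::size_t CsvTextBytes(const std::vector<double>& values)
{
  std::ostringstream out;
  out << std::setprecision(std::numeric_limits<double>::max_digits10);
  for (std::size_t i = 0; i < values.size(); ++i)
  {
    if (i > 0)
      out << ',';
    out << values[i];
  }
  out << '\n';
  return out.str().size();
}

// Hypothetical helper: number of bytes for the same values stored as raw
// packed doubles, ignoring any file header.
std::size_t PackedBinaryBytes(const std::vector<double>& values)
{
  return values.size() * sizeof(double);
}
```

For values like 0.1 or 1.0 / 3.0, each text entry takes roughly 19 characters plus a separator, against 8 bytes packed.  For small integers such as 1, 2, 3, the text form is shorter than the packed form, which is the exception mentioned above.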

Reading the precision of the file when we load it, so that we can save with the same precision, is definitely one way to solve our problem.  But it's not clear to me how much extra overhead that adds when we load files, or what the API would look like: do we then need to modify `data::Load()` to return a `size_t` (or take a `size_t&` as a parameter) which can then be passed to `data::Save()`?  Even then, this `size_t` wouldn't be useful for packed binary formats.
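
As a rough illustration of the "remember the precision" idea, something like the sketch below could work for plain text files.  Everything here is hypothetical: `DetectCsvPrecision()` and `WriteCsvRow()` are invented names, not existing mlpack or Armadillo functions, and the sketch assumes plain decimal tokens (no scientific notation).  It also shows where the overhead comes from, since detection means an extra scan over the text.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: scan CSV text and return the largest number of
// digits that follows a decimal point in any token.  Assumes plain decimal
// notation; exponents such as "1e-5" are not handled.
std::size_t DetectCsvPrecision(const std::string& text)
{
  std::size_t maxDigits = 0;
  std::size_t i = 0;
  while (i < text.size())
  {
    if (text[i] == '.')
    {
      ++i;
      std::size_t digits = 0;
      while (i < text.size() &&
             std::isdigit(static_cast<unsigned char>(text[i])))
      {
        ++digits;
        ++i;
      }
      maxDigits = std::max(maxDigits, digits);
    }
    else
    {
      ++i;
    }
  }
  return maxDigits;
}

// Hypothetical helper: write one row as CSV using a fixed number of digits
// after the decimal point, e.g. the value returned by DetectCsvPrecision().
std::string WriteCsvRow(const std::vector<double>& row, std::size_t precision)
{
  std::ostringstream out;
  out << std::fixed << std::setprecision(static_cast<int>(precision));
  for (std::size_t i = 0; i < row.size(); ++i)
  {
    if (i > 0)
      out << ',';
    out << row[i];
  }
  out << '\n';
  return out.str();
}
```

The detected `size_t` would have to be carried from the load call to the save call somehow, which is exactly the API question raised above, and it has no meaning for a packed binary file.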

So I am definitely not opposed to a change in how we load things to improve this behavior, but I'm not quite sure what the best way to do it is.  Contributions to Armadillo upstream to change its behavior are definitely something we can do (though Conrad may or may not like the ideas we propose).

---
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/pull/650#issuecomment-222171387