[mlpack-git] [mlpack/mlpack] add cli executable for data_split (#650)

Ryan Curtin notifications at github.com
Fri May 27 10:07:48 EDT 2016


The issue is that in `armadillo_bits/diskio_meat.hpp`, we have this bit of code:

```cpp
if( (is_float<eT>::value == true) || (is_double<eT>::value == true) )
  {
  f.setf(ios::scientific);
  f.precision(12);
  cell_width = 20;
  }
```

and this gives us no way to configure the stream ourselves.  Reimplementing `save_raw_ascii()` and calling our own overload is probably not a great idea, because `ios::scientific` is the best way to represent arbitrary numbers anyway...
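To make the effect concrete, here is a small sketch that applies the same settings as the quoted snippet (`ios::scientific`, precision 12, a cell width of 20) to a plain `std::ostringstream`. The function name `FormatLikeArmadillo` is hypothetical, and the use of `width(20)` for each value is an assumption about how `cell_width` is applied; this is not Armadillo's actual code.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: format one value using the stream settings from the
// Armadillo snippet quoted above (scientific notation, precision 12), padded
// to a field width of 20 as `cell_width = 20` suggests.
std::string FormatLikeArmadillo(double value)
{
  std::ostringstream f;
  f.setf(std::ios::scientific);
  f.precision(12);   // 12 digits after the decimal point in scientific mode
  f.width(20);       // assumed per-value padding, right-justified by default
  f << value;
  return f.str();
}
```

With these settings, a label of `1` is written as `  1.000000000000e+00` (18 characters plus two spaces of padding), which is why integer labels no longer look like integers in the saved file.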

One thing that we can do is to check whether every value in the labels is an integer (or very, very close to one), and if so, cast the matrix to a `Mat<size_t>` or `Mat<int>` and save that instead.  However, we should only do this for text-file matrices (csv, tsv, txt); otherwise, we might load an `arma_binary` matrix of packed doubles and then save it as packed ints, causing disaster later on in the user's pipeline.
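A minimal sketch of that decision logic, assuming the labels are available as a flat `std::vector<double>` and that the output format is chosen by file extension. The helper names (`AllNearInteger`, `IsTextFormat`, `ShouldSaveAsIntegers`) and the absolute tolerance of `1e-8` are hypothetical choices for illustration, not existing mlpack or Armadillo functions.

```cpp
#include <algorithm>
#include <cctype>
#include <cmath>
#include <string>
#include <vector>

// Hypothetical helper: true if every value is within `tol` of an integer,
// so the labels could be stored as integers without losing information.
// NaN and infinity fail the test, since the difference evaluates to NaN.
bool AllNearInteger(const std::vector<double>& values, double tol = 1e-8)
{
  return std::all_of(values.begin(), values.end(), [tol](double v)
  {
    return std::abs(v - std::round(v)) <= tol;
  });
}

// Hypothetical helper: only plain-text formats (csv, tsv, txt) are candidates
// for the integer cast; binary formats such as arma_binary keep their type.
bool IsTextFormat(const std::string& filename)
{
  const std::string::size_type dot = filename.rfind('.');
  if (dot == std::string::npos)
    return false;

  std::string ext = filename.substr(dot + 1);
  std::transform(ext.begin(), ext.end(), ext.begin(),
      [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return ext == "csv" || ext == "tsv" || ext == "txt";
}

// Convert to an integer matrix before saving only when both conditions hold.
bool ShouldSaveAsIntegers(const std::vector<double>& labels,
                          const std::string& filename)
{
  return IsTextFormat(filename) && AllNearInteger(labels);
}
```

If the check passes, the cast itself could presumably be done with Armadillo's `conv_to` (for example `arma::conv_to<arma::Mat<int>>::from(labels)`) before handing the result to the normal save path. `Mat<int>` is the safer target if labels can be negative, since converting a negative value to `size_t` would wrap around.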

If this is too hard, I don't think it is the end of the world to turn the user's '1's and '0's into '1.000000000000e+00' and '0.000000000000e+00', since those are still valid number representations and will work later in the user's machine learning pipeline.  But whatever we choose to do, we should try to avoid reimplementing Armadillo's save functionality.
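As a quick sanity check on the claim that these are still valid numbers, the sketch below reads a whitespace-separated row back with standard stream extraction, roughly as a downstream text loader might. `ParseRow` is a hypothetical helper; for small integer-valued labels like these the parsed doubles are exactly `1.0` and `0.0`.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: parse one whitespace-separated row of numbers using
// ordinary stream extraction, which accepts scientific notation.
std::vector<double> ParseRow(const std::string& line)
{
  std::istringstream in(line);
  std::vector<double> row;
  double x;
  while (in >> x)
    row.push_back(x);
  return row;
}
```

For example, `ParseRow("  1.000000000000e+00  0.000000000000e+00")` yields the same two values as `ParseRow("1 0")`.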

---
https://github.com/mlpack/mlpack/pull/650#issuecomment-222155636