[mlpack-git] [mlpack/mlpack] DatasetMapper & Imputer (#694)
Tham
notifications at github.com
Tue Jun 28 18:03:15 EDT 2016
Hi keon, what does your code look like? Do you do the transform? I think your code looks something like this:
```
#include <mlpack/core.hpp>
#include <boost/test/unit_test.hpp>
#include <cstdio>
#include <fstream>
#include <iostream>

BOOST_AUTO_TEST_CASE(loadFile)
{
  using namespace mlpack;
  using namespace mlpack::data;
  using namespace std;

  // Write a small CSV whose first row contains a non-numeric token.
  fstream f;
  f.open("test.csv", fstream::out);
  //f << "3, a, 2, a" << endl;
  //f << "5, 6, 0, 6" << endl;
  //f << "9, 8, 4, 8" << endl;
  f << "3, 0, a, 0" << endl;
  f << "5, 6, 0, 6" << endl;
  f << "9, 8, 4, 8" << endl;
  f.close();

  // Load without transposing and print the resulting matrix.
  arma::mat dataIn;
  data::DatasetInfo info;
  bool const transpose = false;
  data::Load("test.csv", dataIn, info, true, transpose);
  std::cout << dataIn << std::endl;

  // Report how many categorical mappings each dimension has.
  Log::Info << "dataset info: " << endl;
  for (size_t i = 0; i < dataIn.n_rows; ++i)
  {
    std::cout << info.NumMappings(i) << " mappings in dimension "
        << i << "." << endl;
  }

  remove("test.csv");
}
```
I get the same results as you, and I think they are the expected results.
```
3.0, a, 2.0, a
5.0, 6.0, 0.0, 6.0
9.0, 8.0, 4.0, 8.0
is translated to:
[INFO ] 3.0000 5.0000 9.0000
[INFO ] 0 1.0000 2.0000 <-- should be 0, 1, 2, not 0, 6, 8
[INFO ] 2.0000 0 4.0000
[INFO ] 0 1.0000 2.0000 <-- should be 0, 1, 2, not 0, 6, 8
```
Why should they be 0, 1, 2? Because after the transform every column is a dimension, and if any element of a column is not numeric, we should treat every element of that column as categorical. Otherwise, how could we tell the values apart in data like the following?
```
3.0, a, 2.0,a
5.0, 0, 1.0, 0
```
If we first map a to 0, does the literal 0 then also map to 0? That does not make sense. The easiest reasonable solution is to treat the whole column as categorical data.
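To make the rule concrete, here is a minimal standalone sketch in plain C++. It is not the actual DatasetMapper code: the helper names `IsNumeric` and `MapDimension` are made up for illustration, and the tokens are assumed to be already split and trimmed. If every token in a dimension is numeric, the values are kept as they are; as soon as one token is not numeric, every token in that dimension gets a category index in order of first appearance.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: true if the whole token parses as a number.
bool IsNumeric(const std::string& token)
{
  std::istringstream in(token);
  double value;
  in >> value;
  // Succeeds only if a number was read and nothing but whitespace remains.
  return !in.fail() && (in >> std::ws).eof();
}

// Hypothetical helper: maps the tokens of one dimension to doubles.
// All numeric: parse each value unchanged.
// Any non-numeric token: treat the whole dimension as categorical.
std::vector<double> MapDimension(const std::vector<std::string>& tokens)
{
  bool allNumeric = true;
  for (const std::string& t : tokens)
    allNumeric = allNumeric && IsNumeric(t);

  std::vector<double> out;
  out.reserve(tokens.size());

  if (allNumeric)
  {
    for (const std::string& t : tokens)
      out.push_back(std::stod(t));
    return out;
  }

  // Categorical: the first distinct token gets 0, the next gets 1, and so on.
  std::unordered_map<std::string, std::size_t> mapping;
  for (const std::string& t : tokens)
  {
    // emplace() inserts only if the token is new; otherwise it returns the
    // existing entry, so repeated tokens share one index.
    auto it = mapping.emplace(t, mapping.size()).first;
    out.push_back(static_cast<double>(it->second));
  }
  return out;
}
```

With this sketch, the dimension `a, 6, 8` becomes 0, 1, 2, and the dimension `a, 0` becomes 0, 1, so the literal 0 no longer collides with the index assigned to a.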
---
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/pull/694#issuecomment-229198175