[mlpack-git] [mlpack/mlpack] DatasetMapper & Imputer (#694)

Tham notifications at github.com
Tue Jun 28 18:03:15 EDT 2016


Hi keon, what does your code look like? Do you do the transform? I think your code looks something like this:

```
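#include <fstream>
#include <iostream>
#include <mlpack/core.hpp>
#include <boost/test/unit_test.hpp>

BOOST_AUTO_TEST_CASE(loadFile)
{
    using namespace mlpack;
    using namespace mlpack::data;
    using namespace std;

    // Write a small CSV file that contains one non-numeric token.
    fstream f;
    f.open("test.csv", fstream::out);
    // Alternative input with the non-numeric token in two columns:
    //f << "3, a, 2, a" << endl;
    //f << "5, 6, 0, 6" << endl;
    //f << "9, 8, 4, 8" << endl;
    f << "3, 0, a, 0" << endl;
    f << "5, 6, 0, 6" << endl;
    f << "9, 8, 4, 8" << endl;
    f.close();

    // Load the file together with its DatasetInfo (fatal = true).
    arma::mat dataIn;
    data::DatasetInfo info;
    bool const transpose = false;
    data::Load("test.csv", dataIn, info, true, transpose);
    std::cout << dataIn << std::endl;

    // Print how many categorical mappings each dimension received.
    Log::Info << "dataset info: " << endl;
    for (size_t i = 0; i < dataIn.n_rows; ++i)
    {
        std::cout << info.NumMappings(i) << " mappings in dimension "
                  << i << "." << endl;
    }

    remove("test.csv");
}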

```
I get the same results as you, and I think they are the expected results.


```
3.0, a, 2.0, a
5.0, 6.0, 0.0, 6.0
9.0, 8.0, 4.0, 8.0

is translated to:
[INFO ]    3.0000   5.0000   9.0000
[INFO ]         0   1.0000   2.0000 <-- should be 0, 1, 2, not 0, 6, 8
[INFO ]    2.0000        0   4.0000
[INFO ]         0   1.0000   2.0000 <-- should be 0, 1, 2, not 0, 6, 8
```

Why should they be 0, 1, 2? Because after the transform, every column is a dimension. If any element in a column is not numeric, we should treat all of the elements in that column as categorical. Otherwise, how could we tell the values apart in data like the following?

```
3.0, a, 2.0,a
5.0, 0, 1.0, 0
```

First we map a to 0, and then the numeric 0 also ends up as 0? That does not make sense. The easiest reasonable solution is to treat the whole column as categorical data.
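
To make the rule concrete, here is a minimal standalone sketch of it. It is not the mlpack implementation: `IsNumeric` and `MapDimension` are hypothetical helpers written only for illustration, and the sketch assumes the tokens of one dimension have already been split and trimmed of whitespace. If any token in the dimension fails to parse as a number, every token is replaced by a category index assigned in order of first appearance.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not part of mlpack): true if the whole token parses
// as a floating-point number.
bool IsNumeric(const std::string& token)
{
  if (token.empty())
    return false;
  char* end = nullptr;
  std::strtod(token.c_str(), &end);
  return end == token.c_str() + token.size();
}

// Hypothetical helper (not part of mlpack): convert the raw tokens of one
// dimension to doubles.  A single non-numeric token makes the whole
// dimension categorical.
std::vector<double> MapDimension(const std::vector<std::string>& tokens)
{
  bool categorical = false;
  for (const std::string& t : tokens)
  {
    if (!IsNumeric(t))
    {
      categorical = true;
      break;
    }
  }

  std::vector<double> out;
  out.reserve(tokens.size());

  if (!categorical)
  {
    // Purely numeric dimension: keep the parsed values.
    for (const std::string& t : tokens)
      out.push_back(std::strtod(t.c_str(), nullptr));
    return out;
  }

  // Categorical dimension: each distinct token gets the next free index,
  // including tokens that happen to look like numbers.
  std::unordered_map<std::string, std::size_t> mapping;
  for (const std::string& t : tokens)
  {
    auto it = mapping.emplace(t, mapping.size()).first;
    out.push_back(static_cast<double>(it->second));
  }
  return out;
}
```

With this sketch, the dimension `a, 6, 8` becomes `0, 1, 2`, and the dimension `a, 0` becomes `0, 1`, so the token `a` and the numeric `0` stay distinguishable. A purely numeric dimension such as `3, 5, 9` is left unchanged.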

---
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/pull/694#issuecomment-229198175