[mlpack-git] [mlpack/mlpack] add overload that imputes every dimensions (#740)

Tham notifications at github.com
Wed Jul 27 16:40:45 EDT 2016


Some suggestions after more study

1 : This may be a little late, but if I am correct, IncrementPolicy makes no sense with the Imputer class. Maybe we can use a static_assert to forbid users from specifying DatasetMapper<IncrementPolicy> as the mapper type?

```
static_assert(!std::is_same<MapperType, DatasetMapper<IncrementPolicy>>::value,
              "Imputer does not support DatasetMapper<IncrementPolicy>");
```

2 : I think moving the impute-all test into a separate test case would make it easier to read.

3 : I found a "bug" (it took me a while to track down) when trying to impute all data with ColumnMajor() set to false:

```
BOOST_AUTO_TEST_CASE(DatasetMapperImputeAllTest)
{
    fstream f;
    f.open("test_file.csv", fstream::out);
    f << "a,  2,  3,  4"  << endl;
    f << "5,  6,  a,  8"  << endl;
    f << "9, 10, 11, 12" << endl;
    f.close();

    arma::mat allInputColumnWise;
    MissingPolicy allPolicy({"a"});
    DatasetMapper<MissingPolicy> allInfo(allPolicy);
    BOOST_REQUIRE(data::Load("test_file.csv", allInputColumnWise, allInfo) == true);

    // convert missing vals to 99.
    CustomImputation<double> allCustomStrategy(99);
    Imputer<double,
            DatasetMapper<MissingPolicy>,
            CustomImputation<double>> allImputer(allInfo, allCustomStrategy);
    // convert a or nan to 99 for all dimensions.
    arma::mat allInputRowWise = allInputColumnWise;
    allImputer.Impute(allInputColumnWise, "a");
    allInputColumnWise.print();
    std::cout<<allInfo.NumMappings(0)<<", "<<allInfo.NumMappings(1)<<", "
            <<allInfo.NumMappings(2)<<", "<<allInfo.NumMappings(3)<<std::endl;

    // Custom imputation result check
    auto requireClose = [](arma::mat const &input)
    {
        BOOST_REQUIRE_CLOSE(input(0, 0), 99.0, 1e-5);
        BOOST_REQUIRE_CLOSE(input(0, 1), 5.0, 1e-5);
        BOOST_REQUIRE_CLOSE(input(0, 2), 9.0, 1e-5);
        BOOST_REQUIRE_CLOSE(input(1, 0), 2.0, 1e-5);
        BOOST_REQUIRE_CLOSE(input(1, 1), 6.0, 1e-5);
        BOOST_REQUIRE_CLOSE(input(1, 2), 10.0, 1e-5);
        BOOST_REQUIRE_CLOSE(input(2, 0), 3.0, 1e-5);
        BOOST_REQUIRE_CLOSE(input(2, 1), 99.0, 1e-5);
        BOOST_REQUIRE_CLOSE(input(2, 2), 11.0, 1e-5);
        BOOST_REQUIRE_CLOSE(input(3, 0), 4.0, 1e-5);
        BOOST_REQUIRE_CLOSE(input(3, 1), 8.0, 1e-5);
        BOOST_REQUIRE_CLOSE(input(3, 2), 12.0, 1e-5);
    };
    requireClose(allInputColumnWise);


    allImputer.ColumnMajor() = false;
    allImputer.Impute(allInputRowWise, "a");
    allInputRowWise.print("\n");
    // Does not work as expected, because NumMappings() returns 1, 0, 1, 0 for the four dimensions.
    std::cout<<allInfo.NumMappings(0)<<", "<<allInfo.NumMappings(1)<<", "
            <<allInfo.NumMappings(2)<<", "<<allInfo.NumMappings(3)<<std::endl;
    requireClose(allInputRowWise);

    // Remove the file.
    remove("test_file.csv");
}

```

Maybe we should change the code to:

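```
void Impute(arma::Mat<T>& input, const std::string& missingValue)
{
  // Each row of input is one dimension, so the mapper tracks exactly
  // input.n_rows dimensions, independent of ColumnMajor().
  for (size_t i = 0; i < input.n_rows; ++i)
  {
    // Only impute dimensions in which the missing value was actually mapped.
    if (mapper.NumMappings(i) > 0)
    {
      const T mappedValue = static_cast<T>(mapper.UnmapValue(missingValue, i));
      // Row i is dimension i, so always impute with columnMajor = true.
      strategy.Impute(input, mappedValue, i, true);
    }
  }
}
```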

This should be fine, because the number of dimensions tracked by the mapper (the valid indices for NumMappings()) always equals input.n_rows.

4 : What if users want to impute more than one missing value?

---
View it on GitHub: https://github.com/mlpack/mlpack/pull/740#issuecomment-235713804

