[mlpack-git] [mlpack/mlpack] add overload that imputes every dimensions (#740)

Tham notifications at github.com
Wed Jul 27 16:40:45 EDT 2016

Some suggestions after studying the code further:

1 : This may be a little late, but if I am correct, IncrementPolicy makes no sense with the Imputer class. Maybe we can use a static_assert to forbid users from specifying IncrementPolicy as the mapper policy?

static_assert(!std::is_same<MapperType, DatasetMapper<IncrementPolicy>>::value,
              "Imputer does not support DatasetMapper<IncrementPolicy>");
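
To show where such a check could live, here is a minimal, self-contained sketch. The types below are empty stand-ins that only borrow the mlpack names (they are assumptions for illustration, not the real classes), and the two-parameter Imputer is a stripped-down placeholder that contains nothing but the compile-time check.

```cpp
#include <type_traits>

// Hypothetical stand-ins for the mlpack policy and mapper types.
struct IncrementPolicy {};
struct MissingPolicy {};

template<typename PolicyType>
struct DatasetMapper {};

// Hypothetical, stripped-down Imputer: only the compile-time check is shown.
template<typename T, typename MapperType>
class Imputer
{
  // Reject DatasetMapper<IncrementPolicy> when the class is instantiated.
  static_assert(
      !std::is_same<MapperType, DatasetMapper<IncrementPolicy>>::value,
      "Imputer does not support DatasetMapper<IncrementPolicy>");
};

// Compiles:
//   Imputer<double, DatasetMapper<MissingPolicy>> ok;
// Is rejected at compile time with the message above:
//   Imputer<double, DatasetMapper<IncrementPolicy>> bad;
```

Placing the static_assert in the class body means the error is reported as soon as the offending type is instantiated, rather than when a member function is first called.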

2 : I think moving the impute-all test into a separate test case would make it easier to read.

3 : I found a "bug" (this one took me a while to track down) when I tried to impute all data with columnMajor set to false:

    fstream f;
    f.open("test_file.csv", fstream::out);
    f << "a,  2,  3,  4"  << endl;
    f << "5,  6,  a,  8"  << endl;
    f << "9, 10, 11, 12" << endl;
    f.close();

    arma::mat allInputColumnWise;
    MissingPolicy allPolicy({"a"});
    DatasetMapper<MissingPolicy> allInfo(allPolicy);
    BOOST_REQUIRE(data::Load("test_file.csv", allInputColumnWise, allInfo) == true);

    // Convert missing values to 99.
    CustomImputation<double> allCustomStrategy(99);
    Imputer<double,
            DatasetMapper<MissingPolicy>,
            CustomImputation<double>> allImputer(allInfo, allCustomStrategy);
    // Convert "a" or NaN to 99 for all dimensions.
    arma::mat allInputRowWise = allInputColumnWise;
    allImputer.Impute(allInputColumnWise, "a");
    std::cout << allInfo.NumMappings(0) << ", " << allInfo.NumMappings(1) << ", "
              << allInfo.NumMappings(2) << ", " << allInfo.NumMappings(3) << std::endl;

    // Custom imputation result check.
    auto requireClose = [](arma::mat const &input)
    {
        BOOST_REQUIRE_CLOSE(input(0, 0), 99.0, 1e-5);
        BOOST_REQUIRE_CLOSE(input(0, 1), 5.0, 1e-5);
        BOOST_REQUIRE_CLOSE(input(0, 2), 9.0, 1e-5);
        BOOST_REQUIRE_CLOSE(input(1, 0), 2.0, 1e-5);
        BOOST_REQUIRE_CLOSE(input(1, 1), 6.0, 1e-5);
        BOOST_REQUIRE_CLOSE(input(1, 2), 10.0, 1e-5);
        BOOST_REQUIRE_CLOSE(input(2, 0), 3.0, 1e-5);
        BOOST_REQUIRE_CLOSE(input(2, 1), 99.0, 1e-5);
        BOOST_REQUIRE_CLOSE(input(2, 2), 11.0, 1e-5);
        BOOST_REQUIRE_CLOSE(input(3, 0), 4.0, 1e-5);
        BOOST_REQUIRE_CLOSE(input(3, 1), 8.0, 1e-5);
        BOOST_REQUIRE_CLOSE(input(3, 2), 12.0, 1e-5);
    };

    allImputer.ColumnMajor() = false;
    allImputer.Impute(allInputRowWise, "a");
    // Does not work as expected, because NumMappings() returns 1, 0, 1, 0
    // for dimensions 0 to 3.
    std::cout << allInfo.NumMappings(0) << ", " << allInfo.NumMappings(1) << ", "
              << allInfo.NumMappings(2) << ", " << allInfo.NumMappings(3) << std::endl;

    // Remove the file.
    remove("test_file.csv");


Maybe we should change the code to:

void Impute(arma::Mat<T>& input,
            const std::string& missingValue)
{
  for (size_t i = 0; i < input.n_rows; ++i)
  {
    if (mapper.NumMappings(i) > 0)
    {
      T mappedValue = static_cast<T>(mapper.UnmapValue(missingValue, i));
      strategy.Impute(input, mappedValue, i, true);
    }
  }
}

This should be OK, because the number of dimensions tracked by NumMappings will always equal the n_rows of input.
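
The idea behind that loop can be sketched without mlpack or Armadillo. Everything below is a hypothetical stand-in: ToyMapper only stores one mapping count per dimension, ToyMatrix is a plain nested vector with one inner vector per dimension, and missing entries are assumed to be stored as NaN. It is meant only to illustrate looping over the rows and skipping dimensions that have no mappings.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for DatasetMapper: one mapping count per dimension.
struct ToyMapper
{
  std::vector<std::size_t> counts;

  std::size_t NumMappings(const std::size_t dimension) const
  {
    return counts[dimension];
  }
};

// One inner vector per dimension (row), one entry per point (column).
using ToyMatrix = std::vector<std::vector<double>>;

// Replace NaN with `value` in every dimension that has at least one mapping.
// The loop bound is the number of rows, mirroring the proposal above.
inline void ImputeAll(ToyMatrix& input,
                      const ToyMapper& mapper,
                      const double value)
{
  for (std::size_t i = 0; i < input.size(); ++i)
  {
    if (mapper.NumMappings(i) == 0)
      continue; // Nothing was mapped in this dimension.

    for (double& entry : input[i])
    {
      if (std::isnan(entry))
        entry = value;
    }
  }
}
```

With the 4 x 3 data from the test above and mapping counts 1, 0, 1, 0, only dimensions 0 and 2 are visited, and the loop index never runs past the number of dimensions the mapper knows about.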

4 : What if users want to impute more than one missing value?
