[mlpack-git] [mlpack/mlpack] add overload that imputes every dimensions (#740)
Tham
notifications at github.com
Wed Jul 27 16:40:45 EDT 2016
Some suggestions after more study:
1 : This may be a little late, but if I am correct, IncrementPolicy makes no sense with the Imputer class. Maybe we can use a static_assert to forbid users from specifying IncrementPolicy as the mapper policy?
```
static_assert(!std::is_same<MapperType, DatasetMapper<IncrementPolicy>>::value,
    "Imputer does not support DatasetMapper<IncrementPolicy> as the mapper type");
```
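To make the idea concrete, here is a minimal self-contained sketch of that compile-time check. The policy and mapper types below are empty stand-ins rather than the real mlpack classes, and `IsImputableMapper` and `ImputerSketch` are hypothetical names used only for illustration.

```cpp
#include <type_traits>

// Empty stand-ins for the mlpack types (assumptions, for illustration only).
struct IncrementPolicy {};
struct MissingPolicy {};
template<typename PolicyType>
struct DatasetMapper {};

// Hypothetical trait: true for every mapper type except
// DatasetMapper<IncrementPolicy>.
template<typename MapperType>
struct IsImputableMapper : std::integral_constant<bool,
    !std::is_same<MapperType, DatasetMapper<IncrementPolicy>>::value> {};

// Hypothetical, stripped-down imputer. Instantiating it with
// DatasetMapper<IncrementPolicy> triggers the static_assert.
template<typename T, typename MapperType>
class ImputerSketch
{
  static_assert(IsImputableMapper<MapperType>::value,
      "Imputer does not support DatasetMapper<IncrementPolicy>");

 public:
  typedef MapperType Mapper;
};

// Allowed combination: MissingPolicy passes the check once instantiated.
typedef ImputerSketch<double, DatasetMapper<MissingPolicy>> OkImputer;

// Rejected combination: creating an object of this type would fail to compile.
// ImputerSketch<double, DatasetMapper<IncrementPolicy>> badImputer;
```

Putting the check in a small trait keeps the error message in one place if more unsupported policies show up later, but a plain static_assert inside the class would work just as well.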
2 : I think moving the impute-all test into a separate test case would make it easier to read.
3 : I found a "bug" (this one took me a while to track down) when I try to impute all data with columnMajor set to false:
```
BOOST_AUTO_TEST_CASE(DatasetMapperImputeAllTest)
{
  fstream f;
  f.open("test_file.csv", fstream::out);
  f << "a, 2, 3, 4" << endl;
  f << "5, 6, a, 8" << endl;
  f << "9, 10, 11, 12" << endl;
  f.close();

  arma::mat allInputColumnWise;
  MissingPolicy allPolicy({"a"});
  DatasetMapper<MissingPolicy> allInfo(allPolicy);
  BOOST_REQUIRE(data::Load("test_file.csv", allInputColumnWise, allInfo) == true);

  // Convert missing values to 99.
  CustomImputation<double> allCustomStrategy(99);
  Imputer<double,
          DatasetMapper<MissingPolicy>,
          CustomImputation<double>> allImputer(allInfo, allCustomStrategy);

  // Convert "a" or NaN to 99 for all dimensions.
  arma::mat allInputRowWise = allInputColumnWise;
  allImputer.Impute(allInputColumnWise, "a");
  allInputColumnWise.print();
  std::cout << allInfo.NumMappings(0) << ", " << allInfo.NumMappings(1) << ", "
            << allInfo.NumMappings(2) << ", " << allInfo.NumMappings(3) << std::endl;

  // Check the result of the custom imputation.
  auto requireClose = [](arma::mat const &input)
  {
    BOOST_REQUIRE_CLOSE(input(0, 0), 99.0, 1e-5);
    BOOST_REQUIRE_CLOSE(input(0, 1), 5.0, 1e-5);
    BOOST_REQUIRE_CLOSE(input(0, 2), 9.0, 1e-5);
    BOOST_REQUIRE_CLOSE(input(1, 0), 2.0, 1e-5);
    BOOST_REQUIRE_CLOSE(input(1, 1), 6.0, 1e-5);
    BOOST_REQUIRE_CLOSE(input(1, 2), 10.0, 1e-5);
    BOOST_REQUIRE_CLOSE(input(2, 0), 3.0, 1e-5);
    BOOST_REQUIRE_CLOSE(input(2, 1), 99.0, 1e-5);
    BOOST_REQUIRE_CLOSE(input(2, 2), 11.0, 1e-5);
    BOOST_REQUIRE_CLOSE(input(3, 0), 4.0, 1e-5);
    BOOST_REQUIRE_CLOSE(input(3, 1), 8.0, 1e-5);
    BOOST_REQUIRE_CLOSE(input(3, 2), 12.0, 1e-5);
  };
  requireClose(allInputColumnWise);

  allImputer.ColumnMajor() = false;
  allImputer.Impute(allInputRowWise, "a");
  allInputRowWise.print("\n");
  // Does not work as expected, because NumMappings returns 1, 0, 1, 0.
  std::cout << allInfo.NumMappings(0) << ", " << allInfo.NumMappings(1) << ", "
            << allInfo.NumMappings(2) << ", " << allInfo.NumMappings(3) << std::endl;
  requireClose(allInputRowWise);

  // Remove the file.
  remove("test_file.csv");
}
```
Maybe we should change the code to:
```
void Impute(arma::Mat<T>& input,
            const std::string& missingValue)
{
  for (size_t i = 0; i < input.n_rows; ++i)
  {
    if (mapper.NumMappings(i) > 0)
    {
      T mappedValue = static_cast<T>(mapper.UnmapValue(missingValue, i));
      strategy.Impute(input, mappedValue, i, true);
    }
  }
}
```
This should be OK, because the number of dimensions tracked by NumMappings will always equal the n_rows of the input.
4 : What if users want to impute more than one missing value?
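The reasoning can be sketched without mlpack or Armadillo. In the stand-alone example below, a vector of rows stands in for arma::mat, `numMappings[i]` plays the role of mapper.NumMappings(i), and NaN is assumed to be the marker for a missing entry; `Matrix` and `ImputeAllSketch` are hypothetical names, not part of the pull request.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Stand-in for arma::mat: input[i][j] is dimension (row) i of point (column) j.
typedef std::vector<std::vector<double>> Matrix;

// Replace every NaN with customValue in each dimension that has at least one
// mapping. numMappings[i] stands in for mapper.NumMappings(i) (assumption).
void ImputeAllSketch(Matrix& input,
                     const std::vector<std::size_t>& numMappings,
                     const double customValue)
{
  // There is one mapping count per row, so indexing by row is always valid.
  assert(numMappings.size() == input.size());

  for (std::size_t i = 0; i < input.size(); ++i)
  {
    if (numMappings[i] == 0)
      continue; // Nothing was mapped in this dimension.

    for (std::size_t j = 0; j < input[i].size(); ++j)
    {
      if (std::isnan(input[i][j]))
        input[i][j] = customValue;
    }
  }
}
```

With the 4 x 3 matrix from the test above and mapping counts of 1, 0, 1, 0, only rows 0 and 2 are scanned, and the loop never indexes past the last dimension.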
---
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/pull/740#issuecomment-235713804