[mlpack-git] [mlpack/mlpack] DatasetMapper & Imputer (#694)

Keon Kim notifications at github.com
Mon Jun 13 23:23:24 EDT 2016


I have finished implementing the basic structure of the Imputer class and would like your opinion on it.

I renamed DatasetInfo to DatasetMapper, which now takes a MapPolicy template parameter so that it can store different kinds of maps.
DatasetMapper remains backward compatible through a type alias:
`using DatasetInfo = DatasetMapper<IncrementPolicy>`. IncrementPolicy is the original mapping policy, which assigns increasing integers to distinct categories, starting from 0.
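To make the policy idea concrete, here is a minimal sketch of a mapper parameterized on a mapping policy. It is not the code in this pull request: the member names (`MapString`, `NumMappings`) and the single-map layout are assumptions chosen for illustration, and the real classes also track dimensions and data types.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Illustrative stand-in for IncrementPolicy (hypothetical interface): every
// previously unseen category string gets the next unused integer, from 0.
class IncrementPolicy
{
 public:
  std::size_t MapString(const std::string& category)
  {
    const auto it = forward.find(category);
    if (it != forward.end())
      return it->second;  // Already mapped: reuse the existing value.

    const std::size_t id = forward.size();
    forward.emplace(category, id);
    return id;
  }

  std::size_t NumMappings() const { return forward.size(); }

 private:
  std::unordered_map<std::string, std::size_t> forward;
};

// The mapper only forwards to its policy, so a different policy (for example
// one that records missing values) can be swapped in without other changes.
template<typename PolicyType>
class DatasetMapper
{
 public:
  std::size_t MapString(const std::string& category)
  {
    return policy.MapString(category);
  }

  std::size_t NumMappings() const { return policy.NumMappings(); }

 private:
  PolicyType policy;
};

// Backward-compatible alias, as described above.
using DatasetInfo = DatasetMapper<IncrementPolicy>;
```

With this sketch, code that declares a `DatasetInfo` keeps compiling unchanged, while new code can name `DatasetMapper` with another policy explicitly.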

I believe what is left is to connect it with the data::Load function so that it creates maps for missing values.

The Imputer class is also added in this pull request. It also accepts template parameters, so that different imputation strategies can be applied.
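As a rough illustration of the strategy-as-template-parameter design, the sketch below imputes a single row of values with its mean. The names `MeanStrategy::Apply` and `Imputer::Impute` and the use of `std::vector<double>` are assumptions made for this example; the classes in the pull request may use a different interface. The missing value is passed in by the caller rather than fixed to NaN, in line with the commit summary.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical strategy: replace each missing entry with the mean of the
// entries that are present.
struct MeanStrategy
{
  // True when v matches the user-specified missing value (NaN included).
  static bool IsMissing(const double v, const double missingValue)
  {
    return v == missingValue || (std::isnan(v) && std::isnan(missingValue));
  }

  static void Apply(std::vector<double>& values, const double missingValue)
  {
    double sum = 0.0;
    std::size_t count = 0;
    for (const double v : values)
    {
      if (!IsMissing(v, missingValue))
      {
        sum += v;
        ++count;
      }
    }

    if (count == 0)
      return;  // Nothing to average; leave the values untouched.

    const double mean = sum / static_cast<double>(count);
    for (double& v : values)
    {
      if (IsMissing(v, missingValue))
        v = mean;
    }
  }
};

// The imputer delegates to whichever strategy it was instantiated with.
template<typename StrategyType>
class Imputer
{
 public:
  void Impute(std::vector<double>& values, const double missingValue) const
  {
    StrategyType::Apply(values, missingValue);
  }
};
```

For example, `Imputer<MeanStrategy>().Impute(row, -1.0)` on a row holding 1.0, -1.0 and 3.0 would replace the middle entry with 2.0. A median or mode strategy would only need to provide its own `Apply`.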

Strategies and tests are yet to be fully implemented.

The overall flow of DatasetMapper and Imputer can be seen in [preprocess_imputer_main.cpp](https://github.com/keonkim/mlpack/blob/5a517c25ef55de1f4814dc3605190d17f868ff82/src/mlpack/methods/preprocess/preprocess_imputer_main.cpp).

You can view, comment on, or merge this pull request online at:

  https://github.com/mlpack/mlpack/pull/694

-- Commit Summary --

  * concept work for imputer
  * Merge branch 'master' of github.com:keonkim/mlpack into imputer
  * do not to use NaN by default, let the user specify
  * Merge branch 'master' of github.com:keonkim/mlpack into imputer
  * add template to datasetinfo and add imputer class
  * clean datasetinfo class and rename files
  * implement basic imputation strategies
  * modify imputer_main and clean logs
  * add parameter verification for imputer_main
  * add custom strategy to impute_main
  * add datatype change in IncrementPolicy

-- File Changes --

    M src/mlpack/core/data/CMakeLists.txt (1)
    M src/mlpack/core/data/dataset_info.hpp (78)
    M src/mlpack/core/data/dataset_info_impl.hpp (77)
    A src/mlpack/core/data/impute_strategies/CMakeLists.txt (17)
    A src/mlpack/core/data/impute_strategies/custom_strategy.hpp (26)
    A src/mlpack/core/data/impute_strategies/mean_strategy.hpp (62)
    A src/mlpack/core/data/impute_strategies/median_strategy.hpp (46)
    A src/mlpack/core/data/impute_strategies/mode_strategy.hpp (38)
    A src/mlpack/core/data/imputer.hpp (125)
    A src/mlpack/core/data/map_policies/CMakeLists.txt (15)
    A src/mlpack/core/data/map_policies/increment_policy.hpp (67)
    A src/mlpack/core/data/map_policies/missing_policy.hpp (67)
    M src/mlpack/methods/preprocess/CMakeLists.txt (2)
    A src/mlpack/methods/preprocess/preprocess_imputer_main.cpp (130)

-- Patch Links --

https://github.com/mlpack/mlpack/pull/694.patch
https://github.com/mlpack/mlpack/pull/694.diff
