[mlpack-git] [mlpack/mlpack] DatasetMapper & Imputer (#694)

Keon Kim notifications at github.com
Sat Jun 18 08:13:25 EDT 2016


@stereomatchingkiss sorry for the delay. I rethought the structure, and it took longer than I expected :(. Here are the updates I've made.

* rename imputate_strategies (folder) -> imputation_methods
* rename mean_strategies.hpp -> mean_imputation.hpp (applied similar changes to other files). I will call them Imputation classes for simplicity.
* imputation classes are now named MeanImputation<T> instead of MeanStrategy. They can be used independently to impute numeric variables (please refer to the tests). Both the search for mapped values and their replacement now happen in the Imputation classes rather than in Imputer. The Imputer class now just looks up the mapped value for the string and passes it to the imputation classes.
* re-implement MeanImputation class to ignore the mapped values of missing variables.
* add listwise_deletion class (simply deletes the whole row or col if a missing value exists)
* add tests for the imputation classes and imputer class
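
As a rough illustration of the two behaviours listed above (not the code in this PR), the sketch below computes a mean that skips the mapped missing value, and drops every point that carries that value in a given dimension. The `Matrix` alias and the free functions `MeanImpute` and `ListwiseDelete` are hypothetical names chosen for this example; the layout assumes one inner vector per dimension, and the exact `==` / `!=` comparison is kept only for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dataset: data[d][i] is dimension d of point i.
using Matrix = std::vector<std::vector<double>>;

// Replace every entry of one dimension that equals mappedValue with the mean
// of the remaining (non-missing) entries of that dimension.
void MeanImpute(Matrix& data, std::size_t dimension, double mappedValue)
{
  assert(dimension < data.size());
  std::vector<double>& row = data[dimension];

  double sum = 0.0;
  std::size_t count = 0;
  for (double v : row)
  {
    if (v != mappedValue) // skip the mapped missing value
    {
      sum += v;
      ++count;
    }
  }
  if (count == 0)
    return; // every entry is missing, so there is no mean to impute with

  const double mean = sum / static_cast<double>(count);
  for (double& v : row)
  {
    if (v == mappedValue)
      v = mean;
  }
}

// Remove every point whose entry in the given dimension equals mappedValue.
void ListwiseDelete(Matrix& data, std::size_t dimension, double mappedValue)
{
  assert(dimension < data.size());
  // Copy the marker dimension first, because every row is compacted below.
  const std::vector<double> marker = data[dimension];

  for (std::vector<double>& row : data)
  {
    assert(row.size() == marker.size());
    std::size_t out = 0;
    for (std::size_t i = 0; i < row.size(); ++i)
    {
      if (marker[i] != mappedValue)
        row[out++] = row[i];
    }
    row.resize(out);
  }
}
```

For example, with `-1.0` assumed as the mapped missing value, calling `MeanImpute` on dimension 0 of `{{1.0, -1.0, 3.0}, {4.0, 5.0, 6.0}}` fills the gap with the mean of the two remaining entries, while `ListwiseDelete` on the same input removes the second point from both dimensions.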

TODOs:
- [ ] make an overload of the data::Load function so that it maps using a different policy for missing variables.
- [ ] fix comparison of floating type variables with ==
- [ ] write documentation and comments
- [ ] more sophisticated imputation classes?

concerns:
* doubles are currently compared with ==, which is not reliable for floating-point values. I am looking for the most efficient solution.
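
One common alternative to `==`, shown here only as a sketch and not as the fix chosen for this PR, is a comparison with a combined relative and absolute tolerance. The function name `ApproxEqual` and the default tolerances are assumptions made for this example and would need tuning for real data.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: true if a and b are equal within a relative tolerance
// (scaled by the larger magnitude) or an absolute tolerance (for values near 0).
bool ApproxEqual(double a, double b, double relTol = 1e-9, double absTol = 1e-12)
{
  assert(relTol >= 0.0 && absTol >= 0.0);

  if (a == b)
    return true; // exact match, including two infinities of the same sign

  const double diff = std::fabs(a - b);
  if (!std::isfinite(diff))
    return false; // NaN operands, or an infinity compared with anything else

  const double scale = std::max(std::fabs(a), std::fabs(b));
  return diff <= std::max(absTol, relTol * scale);
}
```

A tolerance check costs a few extra floating-point operations per element. If the mapped value is only ever assigned and never computed, another option might be to use NaN as the missing marker and test it with `std::isnan`, which avoids tolerances entirely.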

---
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/pull/694#issuecomment-226938307