[mlpack-git] (blog) master: keon week four (7b57790)

Mon Jun 20 12:35:23 EDT 2016

Repository : https://github.com/mlpack/blog
On branch  : master
Link       : https://github.com/mlpack/blog/compare/ffa7049aa28a2c2b4aac7dd98f6e2c4bd453a0fa...13e13adba487ce4fa3a63137577b574493f21028

>---------------------------------------------------------------

commit 7b57790a6dd8d5d0a1bbb252eae9a929ec159c3f
Author: Keon Kim <kwk236 at gmail.com>
Date:   Tue Jun 21 01:29:59 2016 +0900

    keon week four


>---------------------------------------------------------------

7b57790a6dd8d5d0a1bbb252eae9a929ec159c3f
 content/blog/KeonWeekFour.md | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/content/blog/KeonWeekFour.md b/content/blog/KeonWeekFour.md
new file mode 100644
index 0000000..5833108
--- /dev/null
+++ b/content/blog/KeonWeekFour.md
@@ -0,0 +1,37 @@
+Title: Dataset and Experimentation Tools : Week-4 Highlights
+Date: 2016-06-20 24:00:00
+Tags: gsoc, dataset, data
+Author: Keon Kim
+
+This week, I worked on restructuring the imputer and the imputation methods.
+Here is a brief summary of what I did.
+
+1) Added tests for the imputer and the imputation methods.
+
+2) Restructured the imputer and imputation classes.
+In this new implementation, the imputer works like a wrapper that
+provides a convenient interface to the imputation classes.
+The imputation classes can also be used independently if a user wants to replace
+one numeric value with another. This work took longer than I expected.
+
+I have not made pull requests for the standardization and normalization classes yet,
+since they are structured the same way as the imputer class.
+I will be able to make similar changes after getting comments on the imputer class,
+and then make the pull requests accordingly. (This should be quick.)
+
+I also dropped the one-hot-encoding class that I was working on
+because I did not see a clear use for it in other methods in mlpack.
+
+todo list:
+
+1) apply changes to the imputer, the imputation classes, and the scalers after getting comments
+
+2) make an overload of the data::Load function so that it can map missing variables using a different policy.
+
+3) optimize using OpenMP
+
+4) start working on preprocess_scan, a CLI executable that scans through the dataset and finds
+missing variables or abrupt gaps.
+
+
+Notice: I have already told my mentors about this, but I have mandatory military training on June 21, 22, and 23.