[mlpack-git] (blog) master: keon week six (51238e9)

gitdub at mlpack.org gitdub at mlpack.org
Tue Jul 5 17:43:27 EDT 2016


Repository : https://github.com/mlpack/blog
On branch  : master
Link       : https://github.com/mlpack/blog/compare/2ca010c0c24ac83e283baf38841fb716d8d70bd7...1592519d9df365ee00d8034cc020a288667611f8

>---------------------------------------------------------------

commit 51238e9b5b4d29a0b1769ccaaa7a645c4d3a0291
Author: Keon Kim <kwk236 at gmail.com>
Date:   Wed Jul 6 06:43:27 2016 +0900

    keon week six


>---------------------------------------------------------------

51238e9b5b4d29a0b1769ccaaa7a645c4d3a0291
 content/blog/KeonWeekSix.md | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/content/blog/KeonWeekSix.md b/content/blog/KeonWeekSix.md
new file mode 100644
index 0000000..7df22ec
--- /dev/null
+++ b/content/blog/KeonWeekSix.md
@@ -0,0 +1,29 @@
+Title: Dataset and Experimentation Tools : Week-6 Highlights
+Date: 2016-07-05 16:00:00
+Tags: gsoc, dataset, data
+Author: Keon Kim
+
+I continued working on DatasetMapper & Imputer last week to finalize the pull request.
+All DatasetMapper, Imputer, Policy, and Imputation classes and their tests are ready for the final review.
+
+The executable is also ready for the final review.
+
+The changes I made are:
+
+1) The Load function now works with any type of DatasetMapper class, and the user can choose the policy.
+
+2) MissingPolicy now maps user-defined missing variables to NaN. 
+
+3) We had a problem with how data::Load maps tokens through the MapToNumerical function.
+For MissingPolicy to work, the mapping should be done only for the missing variables,
+not for every variable in the dimension. IncrementPolicy, in contrast, requires every variable in a dimension
+to be mapped if at least one variable turns out to be categorical (string).
+I solved this by moving MapToNumerical from data::Load to the Policy classes, so that
+each policy can decide how to map the tokens. I also renamed this function to MapTokens for clarity.
+
+4) Completed the tests and cleaned up the APIs so that they are more consistent.
+
+This week, I am going to work on the statistics module.
+The statistics module will be a simple executable application to start with;
+the features we want to add are somewhat similar to [this application](http://personality-project.org/r/basics.t.html).
+
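The policy-based mapping described in item 3 of the post can be sketched in C++ roughly as follows. This is only an illustration under stated assumptions, not the mlpack implementation: the names `MissingPolicySketch`, `IncrementPolicySketch`, `MapDimension`, and `TryParseNumber` are hypothetical, as are the signatures and the choice to throw on an unparsable token. Only the idea is taken from the post: the loader hands a dimension's tokens to the policy, and each policy's `MapTokens` decides what gets mapped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <set>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: true only if the whole token parses as a number.
bool TryParseNumber(const std::string& token, double& value)
{
  if (token.empty())
    return false;
  char* end = nullptr;
  value = std::strtod(token.c_str(), &end);
  return end == token.c_str() + token.size();
}

// Sketch of a missing-value policy: only the user-listed tokens are mapped
// (to NaN); every other token is kept as the number it represents.
class MissingPolicySketch
{
 public:
  explicit MissingPolicySketch(std::set<std::string> missing) :
      missingSet(std::move(missing)) { }

  std::vector<double> MapTokens(const std::vector<std::string>& tokens) const
  {
    std::vector<double> out;
    out.reserve(tokens.size());
    for (const std::string& token : tokens)
    {
      double value = 0.0;
      if (missingSet.count(token) > 0)
        out.push_back(std::numeric_limits<double>::quiet_NaN());
      else if (TryParseNumber(token, value))
        out.push_back(value);
      else  // Assumption of this sketch: anything else is an error.
        throw std::invalid_argument("unmapped non-numeric token: " + token);
    }
    return out;
  }

 private:
  std::set<std::string> missingSet;
};

// Sketch of an increment policy: if at least one token is non-numeric, every
// token in the dimension is mapped to an index assigned on first appearance.
class IncrementPolicySketch
{
 public:
  std::vector<double> MapTokens(const std::vector<std::string>& tokens) const
  {
    std::vector<double> out(tokens.size());
    bool allNumeric = true;
    for (std::size_t i = 0; i < tokens.size(); ++i)
      allNumeric = TryParseNumber(tokens[i], out[i]) && allNumeric;
    if (allNumeric)
      return out;  // Purely numeric dimension: nothing to map.

    std::unordered_map<std::string, double> mapping;
    for (std::size_t i = 0; i < tokens.size(); ++i)
    {
      auto it = mapping.find(tokens[i]);
      if (it == mapping.end())
      {
        const double next = static_cast<double>(mapping.size());
        it = mapping.emplace(tokens[i], next).first;
      }
      out[i] = it->second;
    }
    return out;
  }
};

// Stand-in for the loader: it no longer maps tokens itself and simply
// delegates the decision to whichever policy it is given.
template<typename Policy>
std::vector<double> MapDimension(const Policy& policy,
                                 const std::vector<std::string>& tokens)
{
  std::vector<double> mapped = policy.MapTokens(tokens);
  assert(mapped.size() == tokens.size());  // One value per token.
  return mapped;
}
```

With this layout, a dimension holding `1.5`, `?`, `3` keeps its two numbers under the missing-value sketch when `?` is listed as missing, while a dimension holding `1`, `cat`, `1`, `dog` is remapped in full under the increment sketch because it contains strings.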



