[mlpack-git] (blog) master: keonkim week2 highlights (003d656)

Mon Jun 6 09:04:16 EDT 2016

Repository : https://github.com/mlpack/blog
On branch  : master
Link       : https://github.com/mlpack/blog/compare/2947c46dcc18958741661b7d146d4832cbffa631...2ff2a4f558415375d40ab8795a316dd6c0a30fb9

>---------------------------------------------------------------

commit 003d656f0e6b06eab663ee57a5334e73f90e6b66
Author: Keon Kim <kwk236 at gmail.com>
Date:   Mon Jun 6 22:01:05 2016 +0900

    keonkim week2 highlights


>---------------------------------------------------------------

003d656f0e6b06eab663ee57a5334e73f90e6b66
 content/blog/KeonWeekTwo.md | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/content/blog/KeonWeekTwo.md b/content/blog/KeonWeekTwo.md
new file mode 100644
index 0000000..4473332
--- /dev/null
+++ b/content/blog/KeonWeekTwo.md
@@ -0,0 +1,28 @@
+Title: Dataset and Experimentation Tools : Week-2 Highlights
+Date: 2016-06-05 21:00:00
+Tags: gsoc, dataset, data
+Author: Keon Kim
+
+Here are some things I've done in week 2.
+
+1) Fixed the [default output problem](https://github.com/mlpack/mlpack/issues/667) with [this pull request](https://github.com/mlpack/mlpack/pull/680).
+Previously, when output parameters were not specified by the user, the program saved the results in a file with an arbitrary name. This could overwrite the user's data without warning.
+I changed the default outputs to required parameters.
+In cases where output is not necessary, the program now warns the user that the result will not be saved if no output is specified, instead of saving to, or overwriting existing data with, a default file name.
+
+2) Implemented [binarize](https://github.com/mlpack/mlpack/pull/666) functions, which transform matrix values to 0 or 1 according to a given threshold.
+This provides an easy-to-use way to preprocess a dataset. Previously, the user had to learn how to work with Armadillo matrices directly.
+It also provides an overload that applies binarize to selected dimensions.
+
+3) I experimented with the proof-of-concept I built last week.
+I thought of a way to change missing variables to NaNs while mapping the categorical (including missing) data
+and apply various imputation strategies by reverse-mapping the values,
+but after a few discussions, implementing this at load time seems to be a better idea, since it lets users specify which values are invalid or missing.
+
+4) Wrote a [How to install mlpack on Windows 10 tutorial](http://keon.io/mlpack-on-windows.html).
+
+5) I discussed and implemented basic one-hot-encoding and min-max-scale functions.
+These preprocessing features can be used in other methods or projects.
+
+Next week, I am going to (really) finalize the missing variables and imputation features, one-hot-encoding, and min-max-scale.
+Along the way, I also hope to solve [this issue](https://github.com/mlpack/mlpack/issues/671), which I was unable to resolve this week because of segmentation faults.