[mlpack-git] [mlpack/mlpack] DatasetMapper & Imputer (#694)

Tham notifications at github.com
Sun Jul 17 09:16:04 EDT 2016


> +  }
> +
> +  /**
> +   * Impute function searches through the input looking for mappedValue and
> +   * replaces it with the median of the given dimension. The result is
> +   * overwritten to the input matrix.
> +   *
> +   * @param input Matrix that contains mappedValue.
> +   * @param mappedValue Value that the user wants to get rid of.
> +   * @param dimension Index of the dimension of the mappedValue.
> +   * @param columnMajor State of whether the input matrix is columnMajor or not.
> +   */
> +  void Impute(arma::Mat<T>& input,
> +              const T& mappedValue,
> +              const size_t dimension,
> +              const bool columnMajor = true)

> I might want to load a training set, followed by loading a test set. I
> need to be assured that when I load the test set, the mappings will be the
> same as for the training set

I think this is inevitable for DatasetMapper if you load the training set
and test set separately.
The DatasetMapper built for the training set may or may not work out of the
box on the test set.
The problem does not arise if users load the training set and test set into
the same matrix and split them later on.
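
To illustrate the concern, here is a minimal sketch, not the DatasetMapper code from this pull request. `TinyMapper` and `MapAll` are hypothetical names made up for the example, and the assumption is that indices are handed out in first-seen order. It shows that one mapper reused for both sets keeps the mappings consistent, while a fresh mapper built on the test set alone can assign different indices.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a single-dimension string-to-index mapper.
// Assumption: indices are assigned in the order strings are first seen.
class TinyMapper
{
 public:
  // Return the existing index for s, or assign the next free index.
  std::size_t MapString(const std::string& s)
  {
    auto it = forward.find(s);
    if (it != forward.end())
      return it->second;

    const std::size_t id = forward.size();
    forward.emplace(s, id);
    return id;
  }

 private:
  std::unordered_map<std::string, std::size_t> forward;
};

// Map every value of one categorical dimension with the given mapper.
std::vector<std::size_t> MapAll(TinyMapper& mapper,
                                const std::vector<std::string>& values)
{
  std::vector<std::size_t> out;
  out.reserve(values.size());
  for (const std::string& v : values)
    out.push_back(mapper.MapString(v));
  return out;
}
```

Passing the same mapper to `MapAll` for the training values and then for the test values gives matching indices. Building a second mapper for the test values does not guarantee that.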

> The overload of Impute() that gives a separate output matrix should not
> copy the input matrix to the output matrix,
> but instead impute directly into the output matrix, and copy elements as
> needed.

Nice idea, I did not think of this solution.

I think changing

output = input;

to

output.set_size(input.n_rows, input.n_cols);

should do the trick, with the elements then written into the output as
needed. Alternatively, we could ask users to size the output matrix
themselves before they pass it into the Impute function.
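
A rough sketch of what that could look like, using only the standard library so it stands alone. The `Matrix` struct is a hypothetical stand-in for `arma::Mat<double>` (rows are dimensions, columns are points), `MedianOfDimension` is a made-up helper, and the choice to take the median only over entries that differ from mappedValue is my assumption, not necessarily what the pull request does. It also assumes mappedValue is an ordinary comparable value, not NaN.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for arma::Mat<double>, stored column-major.
struct Matrix
{
  std::size_t n_rows = 0;
  std::size_t n_cols = 0;
  std::vector<double> mem;

  // Resize without copying from any other matrix.
  void set_size(const std::size_t rows, const std::size_t cols)
  {
    n_rows = rows;
    n_cols = cols;
    mem.resize(rows * cols);
  }

  double& operator()(const std::size_t r, const std::size_t c)
  {
    return mem[c * n_rows + r];
  }

  double operator()(const std::size_t r, const std::size_t c) const
  {
    return mem[c * n_rows + r];
  }
};

// Median of one dimension (row), skipping entries equal to mappedValue.
// Assumption: mappedValue entries are excluded from the median.
double MedianOfDimension(const Matrix& input,
                         const double mappedValue,
                         const std::size_t dimension)
{
  std::vector<double> vals;
  for (std::size_t c = 0; c < input.n_cols; ++c)
    if (input(dimension, c) != mappedValue)
      vals.push_back(input(dimension, c));

  if (vals.empty())
    return mappedValue; // Nothing to compute a median from.

  std::sort(vals.begin(), vals.end());
  const std::size_t mid = vals.size() / 2;
  return (vals.size() % 2 == 1) ? vals[mid]
                                : (vals[mid - 1] + vals[mid]) / 2.0;
}

// Impute directly into output: size it, then write each element once.
void Impute(const Matrix& input,
            Matrix& output,
            const double mappedValue,
            const std::size_t dimension)
{
  output.set_size(input.n_rows, input.n_cols);
  const double median = MedianOfDimension(input, mappedValue, dimension);

  for (std::size_t c = 0; c < input.n_cols; ++c)
  {
    for (std::size_t r = 0; r < input.n_rows; ++r)
    {
      const double v = input(r, c);
      output(r, c) = (r == dimension && v == mappedValue) ? median : v;
    }
  }
}
```

Each output element is written exactly once, either as the imputed median or as the value copied from the input, so there is no full copy followed by a second pass.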

> Impute() should allow imputation in all dimensions

I agree with this one and suggest we do it in another pull request.
There are already too many comments and changes in this pull request.



2016-07-13 22:33 GMT+08:00 Ryan Curtin <notifications at github.com>:

> In src/mlpack/core/data/imputation_methods/median_imputation.hpp
> <https://github.com/mlpack/mlpack/pull/694#discussion_r70636674>:
>
> > +  }
> > +
> > +  /**
> > +   * Impute function searches through the input looking for mappedValue and
> > +   * replaces it with the median of the given dimension. The result is
> > +   * overwritten to the input matrix.
> > +   *
> > +   * @param input Matrix that contains mappedValue.
> > +   * @param mappedValue Value that the user wants to get rid of.
> > +   * @param dimension Index of the dimension of the mappedValue.
> > +   * @param columnMajor State of whether the input matrix is columnMajor or not.
> > +   */
> > +  void Impute(arma::Mat<T>& input,
> > +              const T& mappedValue,
> > +              const size_t dimension,
> > +              const bool columnMajor = true)
>
> When I was looking over this, I thought, maybe this method could be static?
> This probably applies to some other imputation strategies too.
>


---
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/pull/694/files/e5d591e511ae449eae1523a80346357b93b968d1#r71082194