[mlpack-git] [mlpack/mlpack] DatasetMapper & Imputer (#694)

Thu Jul 7 10:42:29 EDT 2016

> +              const bool transpose = true)
> +  {
> +    //initiate output
> +    output = input;
> +
> +    if (transpose)
> +    {
> +      arma::Mat<T> medianMat = arma::median(input, 1);
> +      for (size_t i = 0; i < input.n_cols; ++i)
> +      {
> +        if (input(dimension, i) == mappedValue ||
> +            std::isnan(input(dimension, i)))
> +        {
> +          output(dimension, i) = medianMat(dimension, 0);
> +        }
> +      }

This pattern seems to exist in all of the imputation strategies, so my comment here applies to all of them. Here you are reading a single value out of each column, and Armadillo stores each column contiguously in memory (it is column-major). So consecutive elements you access are probably not in the same cache line, and as a result the loop will probably be slower. If I were to call the imputer on every dimension in turn, it would be a lot slower than taking one pass over the matrix and imputing each element as we went. (It would also be slower in this case because we would be recalculating the median for every dimension each time we called the method.)
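To make the suggestion concrete, here is a rough sketch of the one-pass approach. It is not code from this PR: `ColMajorMatrix` and `ImputeOnePass` are hypothetical names, and plain `std::vector` storage stands in for `arma::Mat` so the example has no dependencies. It assumes the per-dimension fill values (for example, the medians) have already been computed once by the caller.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a column-major matrix such as arma::Mat:
// element (row, col) lives at data[col * nRows + row], so each column
// (one point) is contiguous in memory.
struct ColMajorMatrix
{
  std::size_t nRows;
  std::size_t nCols;
  std::vector<double> data;

  double& at(std::size_t row, std::size_t col)
  {
    return data[col * nRows + row];
  }
};

// Impute every dimension in a single pass over the matrix.
// fill[d] is the replacement value for dimension (row) d, assumed to be
// precomputed once by the caller. Returns the number of replaced elements.
std::size_t ImputeOnePass(ColMajorMatrix& m,
                          const std::vector<double>& fill,
                          const double mappedValue)
{
  std::size_t replaced = 0;
  // Outer loop over columns, inner loop over rows: memory is walked in
  // storage order, so consecutive accesses share cache lines.
  for (std::size_t col = 0; col < m.nCols; ++col)
  {
    for (std::size_t row = 0; row < m.nRows; ++row)
    {
      double& value = m.at(row, col);
      if (value == mappedValue || std::isnan(value))
      {
        value = fill[row];
        ++replaced;
      }
    }
  }
  return replaced;
}
```

The point of the design is that the fill statistics are computed once and the matrix is traversed once in storage order, rather than once per dimension with a stride of `n_rows` between accesses.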

---
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/pull/694/files/a8818316a04506530e2269a2e0a32ba2f6a1c83b#r69919289