I will try to fix the loading issue this weekend and open a pull<br>

request.<br>

<br>

2016-07-18 14:34 GMT+08:00 Keon Kim &lt;notifications@github.com&gt;:<br>

<br>

&gt; @rcurtin &lt;https://github.com/rcurtin&gt; @stereomatchingkiss<br>

&gt; &lt;https://github.com/stereomatchingkiss&gt;<br>

&gt; I think the overloads that produce an output matrix are now a little more<br>

&gt; optimized.<br>

&gt; The previous method went through the whole matrix again and again when<br>

&gt; imputing each dimension.<br>

&gt;<br>

&gt; Now the matrix is copied at the same time as the mean (or median, or any<br>

&gt; other statistic) is calculated. The target vector is still kept, so the<br>

&gt; dimension does not have to be traversed again.<br>

&gt; So the cost becomes (1m + 1t) (copy and calculate + replace) instead of the<br>

&gt; previous (1m + 1d + 1t) (copy + calculate + replace), where m is the whole<br>

&gt; matrix, d is the dimension, and t is the target vector. This showed a slight<br>

&gt; improvement in performance.<br>
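
If I read this right, the fused pass looks roughly like the sketch below. It is only an illustration under my own assumptions: Matrix and ImputeMeanFused are names I made up (not mlpack's API), the matrix is a plain vector of rows with one row per dimension, and a missing entry is marked by a sentinel value.<br>

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dense matrix: one row per dimension.
using Matrix = std::vector<std::vector<double>>;

// Copies `input` into `output` and replaces every `missing` entry in row
// `dim` with the mean of the non-missing entries of that row.
// The copy and the mean accumulation share one pass over the matrix (1m);
// only the recorded target positions are revisited afterwards (1t).
void ImputeMeanFused(const Matrix& input,
                     Matrix& output,
                     std::size_t dim,
                     double missing)
{
  output.assign(input.size(), std::vector<double>());
  std::vector<std::size_t> targets; // columns of row `dim` to overwrite
  double sum = 0.0;
  std::size_t count = 0;

  for (std::size_t r = 0; r < input.size(); ++r)
  {
    output[r].reserve(input[r].size());
    for (std::size_t c = 0; c < input[r].size(); ++c)
    {
      const double v = input[r][c];
      output[r].push_back(v); // the copy
      if (r != dim)
        continue;
      if (v == missing)
        targets.push_back(c); // remember where to write later
      else
      {
        sum += v; // the calculation, done during the copy
        ++count;
      }
    }
  }

  if (count == 0)
    return; // nothing to average; leave the copy untouched

  const double mean = sum / static_cast<double>(count);
  for (std::size_t c : targets)
    output[dim][c] = mean; // the replace step
}
```

The single nested loop does the copy and the sum together, so the separate pass over the dimension (1d) disappears.<br>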

&gt;<br>

&gt; For the executable, however, I made it go through every dimension and<br>

&gt; first check whether any mappings exist in that dimension. Dimensions that<br>

&gt; have mappings are put in a list of dirtyDimensions, and the imputation<br>

&gt; methods are applied only to those dimensions. When applying the changes<br>

&gt; with Impute(), the executable uses the overload that does not produce an<br>

&gt; output matrix. This results in (1d + 1t) for every dimension that has<br>

&gt; missing value mappings.<br>
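
A rough sketch of the dirtyDimensions idea, again under my own assumptions: mappingCounts holds the number of missing-value mappings per dimension (one entry per row of the matrix), a missing entry is marked by a sentinel value, and FindDirtyDimensions, ImputeMeanInPlace and ImputeDirty are hypothetical helper names, not the functions in the PR.<br>

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dense matrix: one row per dimension.
using Matrix = std::vector<std::vector<double>>;

// Collect the dimensions that have at least one missing-value mapping.
std::vector<std::size_t> FindDirtyDimensions(
    const std::vector<std::size_t>& mappingCounts)
{
  std::vector<std::size_t> dirty;
  for (std::size_t d = 0; d < mappingCounts.size(); ++d)
    if (mappingCounts[d] > 0)
      dirty.push_back(d);
  return dirty;
}

// In-place mean imputation of one dimension: one pass over the row to
// accumulate the mean and record targets (1d), one pass over targets (1t).
void ImputeMeanInPlace(std::vector<double>& row, double missing)
{
  std::vector<std::size_t> targets;
  double sum = 0.0;
  std::size_t count = 0;
  for (std::size_t c = 0; c < row.size(); ++c)
  {
    if (row[c] == missing)
      targets.push_back(c);
    else
    {
      sum += row[c];
      ++count;
    }
  }
  if (count == 0)
    return; // no valid values to average
  const double mean = sum / static_cast<double>(count);
  for (std::size_t c : targets)
    row[c] = mean;
}

// Impute only the dirty dimensions; clean dimensions are never touched.
// Assumes mappingCounts.size() == data.size().
void ImputeDirty(Matrix& data,
                 const std::vector<std::size_t>& mappingCounts,
                 double missing)
{
  for (std::size_t d : FindDirtyDimensions(mappingCounts))
    ImputeMeanInPlace(data[d], missing);
}
```

Because nothing is copied, the whole-matrix term (1m) drops out and only dimensions with mappings cost anything.<br>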

&gt;<br>

&gt; Benchmarks:<br>

&gt; data: &#39;imputer.csv&#39; as CSV data. Size is 400850 x 4.<br>

&gt;<br>

&gt; [INFO ] 15970 mappings in dimension 0.<br>

&gt; [INFO ] 2646 mappings in dimension 1.<br>

&gt; [INFO ] 2646 mappings in dimension 2.<br>

&gt; [INFO ] 2661 mappings in dimension 3.<br>

&gt; mlpack_preprocess_imputer -i imputer.csv -d 0 -m a -s mean -v<br>

&gt;<br>

&gt; Impute one dimension<br>

&gt;<br>

&gt;    - overload producing output (1m + 1d + 1t) for every dimension:<br>

&gt;    0.058182s<br>

&gt;    - overload producing output (1m + 1t) for every dimension: 0.056293s<br>

&gt;    - overload without producing output (1d + 1t) for every dimension:<br>

&gt;    0.047528s<br>

&gt;<br>

&gt; And FYI, imputing all dimensions:<br>

&gt; Same data,<br>

&gt; mlpack_preprocess_imputer -i imputer.csv -m a -s mean -v<br>

&gt;<br>

&gt;    - overload without producing output (1d + 1t) for every dimension:<br>

&gt;<br>

&gt; [INFO ]   imputation: 0.197194s<br>

&gt; [INFO ]   loading_data: 18.417980s<br>

&gt; [INFO ]   total_time: 18.616683s<br>

&gt;<br>

&gt; I know this is being fixed, but most of the overhead comes from<br>

&gt; loading_data right now.<br>


&gt;<br>


<p style="font-size:small;-webkit-text-size-adjust:none;color:#666;">&mdash;<br />You are receiving this because you are subscribed to this thread.<br />Reply to this email directly, <a href="https://github.com/mlpack/mlpack/pull/694#issuecomment-233748087">view it on GitHub</a>, or <a href="https://github.com/notifications/unsubscribe-auth/AJ4bFLqW9cnRPahCFnq8B0M10aRhxw9Kks5qXSzbgaJpZM4I07W-">mute the thread</a>.</p>
