I will try to fix the loading issue this weekend and open a pull<br>
request.<br>
<br>
2016-07-18 14:34 GMT+08:00 Keon Kim <notifications@github.com>:<br>
<br>
> @rcurtin <https://github.com/rcurtin> @stereomatchingkiss<br>
> <https://github.com/stereomatchingkiss><br>
> I think the overloads that produce an output matrix are now a little bit more<br>
> optimized.<br>
> The previous method went through the whole matrix again and again when<br>
> imputing each dimension.<br>
><br>
> Now the matrix is copied in the same pass that calculates the mean<br>
> (or median, or any other statistic). The target vector is still kept, to avoid<br>
> going through the dimension again.<br>
> So now the cost is (1m + 1t) (copy and calculate + replace) instead of the<br>
> previous (1m + 1d + 1t) (copy + calculate + replace), where m is the whole<br>
> matrix, d is one dimension, and t is the target vector. This showed a slight<br>
> improvement in performance.<br>
><br>
> For the executable, however, I made it first check each dimension for<br>
> mappings, put the dimensions that have any in a list of dirtyDimensions, and<br>
> apply the imputation method only to those dimensions. When applying the<br>
> changes using Impute(), the executable uses the overload that does not<br>
> produce an output matrix. This results in (1d + 1t) for every dimension<br>
> that has missing value mappings.<br>
><br>
> Benchmarks:<br>
> data: 'imputer.csv' as CSV data. Size is 400850 x 4.<br>
><br>
> [INFO ] 15970 mappings in dimension 0.<br>
> [INFO ] 2646 mappings in dimension 1.<br>
> [INFO ] 2646 mappings in dimension 2.<br>
> [INFO ] 2661 mappings in dimension 3.<br>
> mlpack_preprocess_imputer -i imputer.csv -d 0 -m a -s mean -v<br>
><br>
> Impute one dimension<br>
><br>
> - overload producing output (1m + 1d + 1t) for every dimension:<br>
> 0.058182s<br>
> - overload producing output (1m + 1t) for every dimension: 0.056293s<br>
> - overload not producing output (1d + 1t) for every dimension:<br>
> 0.047528s<br>
><br>
> And FYI, imputing all dimensions:<br>
> Same data,<br>
> mlpack_preprocess_imputer -i imputer.csv -m a -s mean -v<br>
><br>
> - overload not producing output (1d + 1t) for every dimension.<br>
><br>
> [INFO ] imputation: 0.197194s<br>
> [INFO ] loading_data: 18.417980s<br>
> [INFO ] total_time: 18.616683s<br>
><br>
> I know this is being fixed, but most of the overhead comes from<br>
> loading_data right now.<br>
><br>
> —<br>
> You are receiving this because you were mentioned.<br>
> Reply to this email directly, view it on GitHub<br>
> <https://github.com/mlpack/mlpack/pull/694#issuecomment-233244428>, or mute<br>
> the thread<br>
> <https://github.com/notifications/unsubscribe-auth/ABt-unjAQz3afitJLBrdVouI7fYzUHAlks5qWx6QgaJpZM4I07W-><br>
> .<br>
><br>
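<br>
To check that I read the (1m + 1t) version correctly, here is a minimal sketch of it.<br>
It is not the actual mlpack code: `Matrix`, `ImputeMeanSinglePass`, and the use of NaN<br>
to mark missing values are assumptions for illustration, and it uses plain std::vector<br>
instead of mlpack's matrix types.<br>

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the data matrix: data[d][i] is the value of
// dimension d for point i. Missing entries are assumed to be NaN here.
using Matrix = std::vector<std::vector<double>>;

// Copy `input` and mean-impute one dimension. The mean is accumulated while
// copying, so the chosen dimension is not scanned a second time.
Matrix ImputeMeanSinglePass(const Matrix& input, const std::size_t dimension)
{
  assert(dimension < input.size());

  Matrix output(input.size());
  std::vector<std::size_t> targets; // positions of missing values ("t")
  double sum = 0.0;
  std::size_t count = 0;

  // Copy and calculate in the same pass ("1m").
  for (std::size_t d = 0; d < input.size(); ++d)
  {
    output[d].reserve(input[d].size());
    for (std::size_t i = 0; i < input[d].size(); ++i)
    {
      const double value = input[d][i];
      output[d].push_back(value);

      if (d != dimension)
        continue;

      if (std::isnan(value))
      {
        targets.push_back(i);
      }
      else
      {
        sum += value;
        ++count;
      }
    }
  }

  // Replace only the recorded positions ("1t"). If every value is missing,
  // this sketch falls back to 0.
  const double mean = (count > 0) ? sum / count : 0.0;
  for (const std::size_t i : targets)
    output[dimension][i] = mean;

  return output;
}
```

The missing positions are collected during the copy, so the replace step only walks<br>
the target vector instead of rescanning the dimension.<br>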