<p>Hi, I used boost::spirit to implement the CSV parser; it is faster and more memory-efficient than the old one.</p>
<p>Parsing a file with 1 million lines (39,796 KB):</p>
<table>
<thead>
<tr><th>Version</th><th>Transpose (ms)</th><th>Non-transpose (ms)</th></tr>
</thead>
<tbody>
<tr><td>Spirit</td><td>2151</td><td>4073</td></tr>
<tr><td>Old</td><td>9616</td><td>10131</td></tr>
</tbody>
</table>
<p>The non-transpose version is slower; I suspect this is because arma::Mat is stored in column-major order.</p>
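<p>A minimal sketch of why the load direction could matter, assuming a dense column-major buffer like the one behind arma::Mat, where element (r, c) lives at offset r + c * n_rows. This is plain std::vector-style index arithmetic, not Armadillo code, and the helper names are hypothetical. When each CSV line becomes a column (the transposed layout), consecutive fields are written to consecutive memory; when each line becomes a row, consecutive fields are n_rows elements apart.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: offset of element (row, col) in a column-major buffer
// with nRows rows (the storage order Armadillo uses for arma::Mat).
inline std::size_t ColMajorOffset(std::size_t row, std::size_t col,
                                  std::size_t nRows)
{
  return row + col * nRows;
}

// Hypothetical helper: offsets touched while storing one CSV line of nFields
// values.
//   transpose == true:  the line becomes column lineIdx (matrix has nFields rows).
//   transpose == false: the line becomes row lineIdx (matrix has nLines rows).
std::vector<std::size_t> OffsetsForLine(std::size_t lineIdx,
                                        std::size_t nFields,
                                        std::size_t nLines,
                                        bool transpose)
{
  std::vector<std::size_t> offsets;
  offsets.reserve(nFields);
  for (std::size_t f = 0; f < nFields; ++f)
  {
    if (transpose)
      offsets.push_back(ColMajorOffset(f, lineIdx, nFields));  // stride 1
    else
      offsets.push_back(ColMajorOffset(lineIdx, f, nLines));   // stride nLines
  }
  return offsets;
}
```

<p>For example, with 1,000,000 lines, OffsetsForLine(2, 3, 1000000, false) yields offsets 2, 1000002 and 2000002, so the fields of one line are about 8 MB apart for doubles and nearly every store hits a different cache line. That is consistent with the timings above, though it is not a measurement of the actual parser.</p>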
<p>Uploading for code review; I haven't integrated it into the load function or run the test cases yet.</p>
<p>PS: This is single-threaded only. I don't know whether multithreading would make performance better or worse, since DataSetInfo is not a lock-free data structure. If we want to take advantage of multiple threads, I think we could read a batch of lines into a vector, create a thread pool with one DataSetInfo per thread, and merge the DataSetInfo objects at the end.</p>
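<p>A rough sketch of the multithreaded scheme described above, using only the standard library. The lines are assumed to be already in a std::vector; each worker thread fills its own per-column table of distinct tokens (a hypothetical ColumnCategories type standing in for DataSetInfo, which is not used here), and the per-thread tables are merged after the threads join, so no locking is needed. Tokenizing is deliberately simplified (split on commas, no quoting or whitespace handling).</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for DataSetInfo: distinct tokens seen in each column.
// Each thread owns one instance, so no synchronization is required.
using ColumnCategories = std::vector<std::set<std::string>>;

// Collect the distinct comma-separated tokens of lines [begin, end).
void CollectCategories(const std::vector<std::string>& lines,
                       std::size_t begin, std::size_t end,
                       ColumnCategories& out)
{
  for (std::size_t i = begin; i < end; ++i)
  {
    std::istringstream stream(lines[i]);
    std::string token;
    std::size_t col = 0;
    while (std::getline(stream, token, ','))
    {
      if (out.size() <= col)
        out.resize(col + 1);
      out[col].insert(token);
      ++col;
    }
  }
}

// Split the lines among nThreads workers, merge their private results, then
// give each column's tokens ids 0, 1, 2, ... in sorted order.
std::vector<std::map<std::string, std::size_t>>
BuildMappings(const std::vector<std::string>& lines, std::size_t nThreads)
{
  nThreads = std::max<std::size_t>(1, nThreads);
  std::vector<ColumnCategories> partial(nThreads);
  std::vector<std::thread> workers;
  const std::size_t chunk = (lines.size() + nThreads - 1) / nThreads;
  for (std::size_t t = 0; t < nThreads; ++t)
  {
    const std::size_t begin = std::min(lines.size(), t * chunk);
    const std::size_t end = std::min(lines.size(), begin + chunk);
    workers.emplace_back(CollectCategories, std::cref(lines), begin, end,
                         std::ref(partial[t]));
  }
  for (std::thread& w : workers)
    w.join();

  // Merge step: union the per-thread sets column by column.
  ColumnCategories merged;
  for (const ColumnCategories& p : partial)
  {
    if (merged.size() < p.size())
      merged.resize(p.size());
    for (std::size_t c = 0; c < p.size(); ++c)
      merged[c].insert(p[c].begin(), p[c].end());
  }

  // Assign ids only after merging so they do not depend on the thread split.
  std::vector<std::map<std::string, std::size_t>> mappings(merged.size());
  for (std::size_t c = 0; c < merged.size(); ++c)
  {
    std::size_t id = 0;
    for (const std::string& token : merged[c])
      mappings[c][token] = id++;
  }
  return mappings;
}
```

<p>Assigning ids after the merge keeps the result identical for any thread count. An alternative is to let each thread assign local ids and remap them during the merge, which trades the second pass over the tokens for a remapping table per thread.</p>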
<hr>
<h4>You can view, comment on, or merge this pull request online at:</h4>
<p> <a href='https://github.com/mlpack/mlpack/pull/678'>https://github.com/mlpack/mlpack/pull/678</a></p>
<h4>Commit Summary</h4>
<ul>
<li>add overload, able to move string</li>
<li>fix bug--infinite recursive call</li>
<li>first commit</li>
<li>1 : fix bug, did not consider case like "210DM, 1~200"</li>
<li>fix bug--category conversion should based on columns but not rows</li>
</ul>
<h4>File Changes</h4>
<ul>
<li>
<strong>M</strong>
<a href="https://github.com/mlpack/mlpack/pull/678/files#diff-0">src/mlpack/core/data/dataset_info.hpp</a>
(16)
</li>
<li>
<strong>M</strong>
<a href="https://github.com/mlpack/mlpack/pull/678/files#diff-1">src/mlpack/core/data/dataset_info_impl.hpp</a>
(9)
</li>
<li>
<strong>A</strong>
<a href="https://github.com/mlpack/mlpack/pull/678/files#diff-2">src/mlpack/core/data/load_csv.hpp</a>
(313)
</li>
</ul>
<h4>Patch Links:</h4>
<ul>
<li><a href='https://github.com/mlpack/mlpack/pull/678.patch'>https://github.com/mlpack/mlpack/pull/678.patch</a></li>
<li><a href='https://github.com/mlpack/mlpack/pull/678.diff'>https://github.com/mlpack/mlpack/pull/678.diff</a></li>
</ul>