<p>When I run the mlpack_kmeans command-line tool on a large dataset (a CSV file of about 300 MB), I get:</p>


<p>[INFO ] Loading 'train.csv' as CSV data.  Size is <strong>11 x 10000000.</strong><br>

[INFO ] Program timers:<br>

[INFO ]   <strong>clustering: 21.957208s</strong><br>

[INFO ]   computing_neighbors: 0.000669s<br>

[INFO ]   knn: 0.000710s<br>

[INFO ]   <strong>loading_data: 28.429786s</strong><br>

[INFO ]   saving_data: 0.577348s<br>

[INFO ]   total_time: 51.004174s</p>


<p>As the timers show, loading the data takes a long time, even longer than clustering. So I wrote a simple implementation of my own that reads and splits the CSV file and initializes an Armadillo matrix. In fact, loading should take less than 5 seconds.</p>
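<p>To illustrate the kind of simple reader meant here, the following is a minimal sketch in plain standard C++. It is not the code from mlpack and not my exact implementation. The names <code>DenseMatrix</code>, <code>ParseCsv</code>, <code>LoadCsv</code> and <code>At</code> are hypothetical. It assumes a purely numeric, comma-separated file with no header and no empty fields, and stores each CSV line as one column (column-major), which matches the 11 x 10000000 shape in the log above.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container: column-major storage, one CSV line per column.
struct DenseMatrix {
  std::size_t n_rows = 0;    // number of dimensions (fields per CSV line)
  std::size_t n_cols = 0;    // number of points (CSV lines)
  std::vector<double> data;  // element (r, c) lives at data[c * n_rows + r]
};

// Bounds-checked element access (checks are active in debug builds only).
inline double At(const DenseMatrix& m, std::size_t r, std::size_t c) {
  assert(r < m.n_rows && c < m.n_cols);
  return m.data[c * m.n_rows + r];
}

// Parse an in-memory CSV buffer in a single pass using strtod.
// Assumes numeric fields only; empty fields are not supported.
DenseMatrix ParseCsv(const std::string& text) {
  DenseMatrix m;
  const char* p = text.c_str();  // null-terminated, so strtod cannot overrun
  const char* end = p + text.size();
  while (p < end) {
    if (*p == '\n' || *p == '\r') { ++p; continue; }  // skip line breaks
    std::size_t fields = 0;
    while (true) {
      char* next = nullptr;
      const double v = std::strtod(p, &next);
      if (next == p) throw std::runtime_error("non-numeric field");
      m.data.push_back(v);
      ++fields;
      p = next;
      while (p < end && (*p == ' ' || *p == '\t')) ++p;  // trailing blanks
      if (p < end && *p == ',') { ++p; continue; }       // next field
      break;                                             // end of line
    }
    if (p < end && *p != '\n' && *p != '\r')
      throw std::runtime_error("unexpected character after field");
    if (m.n_cols == 0) m.n_rows = fields;
    else if (fields != m.n_rows)
      throw std::runtime_error("inconsistent number of fields");
    ++m.n_cols;
  }
  return m;
}

// Read the whole file into memory, then parse it.
DenseMatrix LoadCsv(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  if (!in) throw std::runtime_error("cannot open " + path);
  std::ostringstream buf;
  buf << in.rdbuf();
  return ParseCsv(buf.str());
}
```

<p>Because the buffer is already column-major, it could presumably be handed to Armadillo through the auxiliary-memory constructor <code>arma::mat(ptr, n_rows, n_cols)</code>. I have not benchmarked this sketch, so the 5-second figure above is not a claim about it.</p>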


<p>The code in core/data/load_impl.hpp has a lot of room for optimization. As you know, this routine sometimes needs to be executed many times, so it would help if loading became faster. :)</p>


