[mlpack-git] [mlpack/mlpack] data::load may becomes an io bottleneck? (#707)
cypro666
notifications at github.com
Mon Jun 27 06:00:45 EDT 2016
When I use mlpack_kmeans client tool for a big dataset about 300MB csv file:
[INFO ] Loading 'train.csv' as CSV data. Size is **11 x 10000000.**
[INFO ] Program timers:
[INFO ] **clustering: 21.957208s**
[INFO ] computing_neighbors: 0.000669s
[INFO ] knn: 0.000710s
[INFO ] **loading_data: 28.429786s**
[INFO ] saving_data: 0.577348s
[INFO ] total_time: 51.004174s
As we can see, loading the data takes a long time, even longer than the clustering itself. So I used a simple implementation of my own to read and split the CSV file and initialize the Armadillo matrix; in fact, this should take less than 5 seconds.
The source code of core/data/load_impl.hpp leaves a lot of room for optimization. You know, sometimes this routine needs to be executed many times, so it would help a lot if loading became faster... :)
---
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/issues/707