[mlpack-git] [mlpack/mlpack] Optimize load csv (#678)
stereomatchingkiss
notifications at github.com
Sat Jun 4 21:35:54 EDT 2016
Hi, I used boost::spirit to implement the CSV parser; it is faster and more memory-efficient.
Parsing a file with 1 million lines (39796 KB):
spirit version :
transpose : 2151 msec
non transpose : 4073 msec
old version :
transpose : 9616 msec
non transpose : 10131 msec
The non-transpose version is slower; I guess this is because arma::Mat is stored column-wise (column-major).
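To illustrate the column-major guess, here is a minimal sketch in plain C++ that does not use Armadillo. It assumes that "transpose" means each CSV line is stored as one column of the matrix. The helper names (ColMajorIndex, StoreLineAsColumn, StoreLineAsRow) are hypothetical and are not part of the pull request.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Linear offset of element (row, col) in a column-major buffer that has
// n_rows rows. This is the layout arma::Mat uses.
inline std::size_t ColMajorIndex(std::size_t row, std::size_t col,
                                 std::size_t n_rows)
{
  return row + col * n_rows;
}

// Hypothetical helper: store one parsed CSV line as column `line`.
// Consecutive values land at consecutive offsets (contiguous writes).
void StoreLineAsColumn(std::vector<double>& buf, std::size_t n_rows,
                       std::size_t line, const std::vector<double>& values)
{
  for (std::size_t i = 0; i < values.size(); ++i)
  {
    const std::size_t idx = ColMajorIndex(i, line, n_rows);
    assert(idx < buf.size());
    buf[idx] = values[i];
  }
}

// Hypothetical helper: store one parsed CSV line as row `line`.
// Consecutive values land n_rows elements apart (strided writes).
void StoreLineAsRow(std::vector<double>& buf, std::size_t n_rows,
                    std::size_t line, const std::vector<double>& values)
{
  for (std::size_t j = 0; j < values.size(); ++j)
  {
    const std::size_t idx = ColMajorIndex(line, j, n_rows);
    assert(idx < buf.size());
    buf[idx] = values[j];
  }
}
```

With 1 million lines as rows, each value of a line would be written roughly 1 million elements away from the previous one, which is unfriendly to the cache. That may explain part of the gap, although it has not been measured here.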
I am uploading this for code review; I have not yet integrated it into the load function or run the test cases.
P.S.: This is single-threaded only. I do not know whether multi-threading would make performance better or worse, since DataSetInfo is not a lock-free data structure. If we want to use multiple threads, I think we could read a batch of lines into a vector, create a thread pool with one DataSetInfo per thread, and merge the DataSetInfo objects at the end.
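The multi-threaded idea could look roughly like the sketch below. It is only an illustration under stated assumptions: a plain vector of distinct tokens stands in for the per-thread DataSetInfo, std::async stands in for a thread pool, and each line is treated as a single categorical token. The names LocalCategories, CollectChunk and MapCategoriesParallel are hypothetical and do not exist in mlpack.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a per-thread DataSetInfo: the distinct tokens
// of one chunk, in first-seen order.
using LocalCategories = std::vector<std::string>;

// Collect the distinct tokens in lines[begin, end). No shared state is
// written, so no locking is needed.
LocalCategories CollectChunk(const std::vector<std::string>& lines,
                             std::size_t begin, std::size_t end)
{
  LocalCategories seen;
  std::unordered_set<std::string> known;
  for (std::size_t i = begin; i < end; ++i)
    if (known.insert(lines[i]).second)
      seen.push_back(lines[i]);
  return seen;
}

// Split the lines into chunks, process each chunk in its own task, then
// merge the results in chunk order on the calling thread. Merging in chunk
// order gives the same ids as a single-threaded first-seen pass.
std::unordered_map<std::string, std::size_t> MapCategoriesParallel(
    const std::vector<std::string>& lines, std::size_t numChunks)
{
  assert(numChunks > 0);
  const std::size_t chunkSize = (lines.size() + numChunks - 1) / numChunks;

  std::vector<std::future<LocalCategories>> tasks;
  for (std::size_t begin = 0; begin < lines.size(); begin += chunkSize)
  {
    const std::size_t end = std::min(begin + chunkSize, lines.size());
    tasks.push_back(std::async(std::launch::async,
        [&lines, begin, end] { return CollectChunk(lines, begin, end); }));
  }

  std::unordered_map<std::string, std::size_t> ids;
  for (auto& task : tasks)
  {
    const LocalCategories part = task.get();
    for (const std::string& token : part)
      ids.emplace(token, ids.size());  // no-op if the token already has an id
  }
  return ids;
}
```

The merge step is sequential, so whether this beats the single-threaded parser depends on how much time goes into parsing versus merging; that would need to be benchmarked.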
You can view, comment on, or merge this pull request online at:
https://github.com/mlpack/mlpack/pull/678
-- Commit Summary --
* add overload, able to move string
* fix bug--infinite recursive call
* first commit
* 1 : fix bug, did not consider case like "210DM, 1~200"
* fix bug--category conversion should based on columns but not rows
-- File Changes --
M src/mlpack/core/data/dataset_info.hpp (16)
M src/mlpack/core/data/dataset_info_impl.hpp (9)
A src/mlpack/core/data/load_csv.hpp (313)
-- Patch Links --
https://github.com/mlpack/mlpack/pull/678.patch
https://github.com/mlpack/mlpack/pull/678.diff