[mlpack-git] [mlpack/mlpack] Optimize load csv (#678)

stereomatchingkiss notifications at github.com
Sat Jun 4 21:35:54 EDT 2016


Hi, I used boost::spirit to implement the CSV parser; it is faster and more memory-efficient than the current one.

Benchmark: parsing a file with 1 million lines (39796 KB)

spirit version :

transpose : 2151 msec
non transpose : 4073 msec

old version :

transpose : 9616 msec
non transpose : 10131 msec

The non-transpose version is slower; I guess this is because arma::Mat stores its elements in column-major order.
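
To illustrate the guess above, here is a minimal sketch of column-major indexing. ColMajorMatrix, StoreLineAsColumn and StoreLineAsRow are hypothetical names made up for this example; they are not part of Armadillo, mlpack, or this pull request. The only fact taken from Armadillo is the layout: element (r, c) sits at offset r + c * n_rows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a column-major matrix (the layout arma::Mat uses).
struct ColMajorMatrix
{
  std::size_t n_rows, n_cols;
  std::vector<double> mem;

  ColMajorMatrix(std::size_t rows, std::size_t cols)
      : n_rows(rows), n_cols(cols), mem(rows * cols, 0.0) {}

  // Element (r, c) lives at offset r + c * n_rows.
  std::size_t Offset(std::size_t r, std::size_t c) const
  {
    return r + c * n_rows;
  }

  double& operator()(std::size_t r, std::size_t c)
  {
    return mem[Offset(r, c)];
  }
};

// Transposed load: CSV line number `line` becomes column `line`.
// Consecutive values land in adjacent memory slots.
void StoreLineAsColumn(ColMajorMatrix& m, std::size_t line,
                       const std::vector<double>& values)
{
  for (std::size_t i = 0; i < values.size(); ++i)
    m(i, line) = values[i];
}

// Non-transposed load: CSV line number `line` becomes row `line`.
// Consecutive values land n_rows slots apart, which is less cache-friendly.
void StoreLineAsRow(ColMajorMatrix& m, std::size_t line,
                    const std::vector<double>& values)
{
  for (std::size_t i = 0; i < values.size(); ++i)
    m(line, i) = values[i];
}
```

With 1 million lines, the non-transposed case writes each value of a line roughly 1 million elements away from the previous one, which would be consistent with the slower timing, although the measurements above do not isolate this effect.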

I am uploading this for code review; I haven't integrated it into the load function or run the test cases yet.

P.S.: This is single-threaded only. I do not know whether multi-threading would make performance better or worse, since DataSetInfo is not a lock-free data structure. If we want to use multiple threads, I think we could read a batch of lines into a vector, create a thread pool with one DataSetInfo per thread, and merge the DataSetInfo objects at the end.
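
The multi-threaded idea above could look roughly like the sketch below. It is not code from this pull request, and it makes several simplifying assumptions: CategoryMap is a hypothetical stand-in for DataSetInfo (a plain string-to-id map), each input line is treated as a single categorical token, and plain std::thread workers are used in place of a real thread pool.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a per-thread DataSetInfo: category string -> id.
using CategoryMap = std::unordered_map<std::string, std::size_t>;

// Map the lines in [begin, end) into a thread-private map; no locking needed.
void MapChunk(const std::vector<std::string>& lines, std::size_t begin,
              std::size_t end, CategoryMap& local)
{
  for (std::size_t i = begin; i < end; ++i)
    local.emplace(lines[i], local.size());  // No-op if already present.
}

// Merge the per-thread maps into one map, reassigning ids so they are unique.
// The ids depend on iteration order, so they are not guaranteed to match the
// ids a single-threaded pass would produce.
CategoryMap MergeMaps(const std::vector<CategoryMap>& locals)
{
  CategoryMap merged;
  for (const CategoryMap& local : locals)
    for (const auto& entry : local)
      merged.emplace(entry.first, merged.size());
  return merged;
}

// Split the lines into contiguous chunks, map each chunk on its own thread,
// then merge the results on the calling thread.
CategoryMap ParallelMap(const std::vector<std::string>& lines,
                        std::size_t numThreads)
{
  numThreads = std::max<std::size_t>(1, numThreads);
  std::vector<CategoryMap> locals(numThreads);
  std::vector<std::thread> workers;
  const std::size_t chunk = (lines.size() + numThreads - 1) / numThreads;

  for (std::size_t t = 0; t < numThreads; ++t)
  {
    const std::size_t begin = std::min(lines.size(), t * chunk);
    const std::size_t end = std::min(lines.size(), begin + chunk);
    workers.emplace_back(MapChunk, std::cref(lines), begin, end,
                         std::ref(locals[t]));
  }
  for (std::thread& w : workers)
    w.join();

  return MergeMaps(locals);
}
```

Because the merge reassigns ids, any numeric values already written into the matrix by the worker threads would have to be remapped afterwards; whether that extra pass still leaves a net speedup would need to be measured.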

You can view, comment on, or merge this pull request online at:

  https://github.com/mlpack/mlpack/pull/678

-- Commit Summary --

  * add overload, able to move string
  * fix bug--infinite recursive call
  * first commit
  * 1 : fix bug, did not consider case like "210DM, 1~200"
  * fix bug--category conversion should based on columns but not rows

-- File Changes --

    M src/mlpack/core/data/dataset_info.hpp (16)
    M src/mlpack/core/data/dataset_info_impl.hpp (9)
    A src/mlpack/core/data/load_csv.hpp (313)

-- Patch Links --

https://github.com/mlpack/mlpack/pull/678.patch
https://github.com/mlpack/mlpack/pull/678.diff
