[mlpack-git] [mlpack/mlpack] Optimize load csv 00 (#681)
Tham
notifications at github.com
Sun Jul 17 04:56:30 EDT 2016
Some performance measurements from loading a 38.8 MB file:

- Fast csv: 60 ms
- IoStream: 490 ms

Fast csv is about 8 times faster than IoStream.

That sounds good, but the bottleneck of `Load` is not reading the file; it is the parser.

Original mapper performance:

- transpose: 9616 ms
- non-transpose: 10131 ms

The parser and the mapper dominate the total time.

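This sketch shows why a mapper can be expensive. When a dimension holds non-numeric tokens, each token has to be translated into a number, which costs at least one hash lookup per field on top of the parsing itself. It assumes a simple policy in which ids are assigned in order of first appearance. The class `TokenMapper` and its methods are hypothetical; this is not mlpack's DatasetMapper interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical sketch (not mlpack's DatasetMapper API): maps each distinct
// string token of one dimension to a consecutive id 0, 1, 2, ... in order of
// first appearance. Every call performs one hash lookup.
class TokenMapper
{
 public:
  std::size_t MapToken(const std::string& token)
  {
    // try_emplace (C++17) inserts only if the token is new; ids.size() is
    // evaluated before the insertion, so a new token gets the next free id.
    auto result = ids.try_emplace(token, ids.size());
    return result.first->second;
  }

  std::size_t NumMappings() const { return ids.size(); }

 private:
  std::unordered_map<std::string, std::size_t> ids;
};
```

Repeated tokens reuse their existing id, so mapping `"red"`, `"blue"`, `"red"` yields 0, 1, 0.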
String-to-int conversion speed of different solutions: http://www.kumobius.com/2013/08/c-string-to-int/

I assume it will be quite hard to beat Boost.Spirit.

I guess the fastest solution at run time is to reuse part of the fast CSV reader (I could extract part of its code) to read the file, and then use boost::spirit or a hand-written converter to parse it.

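This is a minimal sketch of the hand-written converter path, not Boost.Spirit. It works on a null-terminated line, such as the pointer that `next_line()` returns in the benchmark below. It assumes every field is a plain decimal integer with an optional leading `-`, with no whitespace and no overflow checks. The names `ParseInt` and `ParseIntLine` are hypothetical. Real data loading would also need floating-point fields, which this sketch does not handle.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical hand-written converter: parses an optionally negative decimal
// integer starting at `p` and returns a pointer to the first non-digit.
// Assumes well-formed input (no whitespace, no '+', no overflow checking).
inline const char* ParseInt(const char* p, std::int64_t& out)
{
  bool negative = false;
  if (*p == '-')
  {
    negative = true;
    ++p;
  }
  std::int64_t value = 0;
  while (*p >= '0' && *p <= '9')
  {
    value = value * 10 + (*p - '0');
    ++p;
  }
  out = negative ? -value : value;
  return p;
}

// Parses one null-terminated line of comma-separated integers into `row`,
// clearing it first so its storage can be reused across lines.
inline void ParseIntLine(const char* line, std::vector<std::int64_t>& row)
{
  assert(line != nullptr);
  row.clear();
  if (*line == '\0')
    return;  // Empty line: no fields.
  while (true)
  {
    std::int64_t value;
    line = ParseInt(line, value);
    row.push_back(value);
    if (*line != ',')
      break;  // End of line (or an unexpected character): stop.
    ++line;   // Skip the delimiter.
  }
}
```

Reusing one `row` vector across lines avoids a fresh allocation for every line. This is part of the reason a hand-written path can beat stream-based parsing.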
Code for the performance measurements:
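```cpp
// io::LineReader comes from the single-header fast-cpp-csv-parser ("csv.h").
// Inside mlpack's test suite BOOST_TEST_MODULE is defined elsewhere; a
// standalone build must define it before including Boost.Test.
#include <boost/test/unit_test.hpp>
#include "csv.h"

#include <chrono>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <string>

BOOST_AUTO_TEST_CASE(FastCSVSpeed)
{
  io::LineReader reader("big_file.csv");
  std::size_t line_num = 0;
  auto const t1 = std::chrono::high_resolution_clock::now();
  // next_line() returns nullptr once the end of the file is reached.
  while (reader.next_line() != nullptr)
    ++line_num;
  auto const t2 = std::chrono::high_resolution_clock::now();
  auto const duration =
      std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count();
  std::cout << "line num " << line_num << std::endl;
  std::cout << "duration " << duration << " ms" << std::endl;
}

BOOST_AUTO_TEST_CASE(IoStream)
{
  std::ifstream in("big_file.csv");
  // sync_with_stdio(false) only affects the standard streams (std::cin,
  // std::cout, ...), and an std::ifstream is not tied to any stream by
  // default, so neither of these two calls changes how fast `in` reads.
  std::ios_base::sync_with_stdio(false);
  in.tie(nullptr);
  std::string line;
  std::size_t line_num = 0;
  auto const t1 = std::chrono::high_resolution_clock::now();
  while (std::getline(in, line))
    ++line_num;
  auto const t2 = std::chrono::high_resolution_clock::now();
  auto const duration =
      std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count();
  std::cout << "line num " << line_num << std::endl;
  std::cout << "duration " << duration << " ms" << std::endl;
}
```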
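Both test cases repeat the same timing boilerplate. This is a minimal sketch of a shared helper, assuming C++11 or later. `TimeMs` and `CountLines` are hypothetical names. The helper uses `std::chrono::steady_clock` because it is guaranteed to be monotonic, whereas `high_resolution_clock` may be an alias of `system_clock` and can jump when the wall clock is adjusted.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <istream>
#include <sstream>  // For timing in-memory streams (e.g. in tests).
#include <string>

// Hypothetical helper: runs `f` once and returns the elapsed time in
// milliseconds, measured with the monotonic steady_clock.
template<typename F>
long long TimeMs(F&& f)
{
  auto const t1 = std::chrono::steady_clock::now();
  f();
  auto const t2 = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count();
}

// Counts lines with std::getline. It accepts any std::istream, so the same
// function can be timed on an std::ifstream or on an in-memory stream.
inline std::size_t CountLines(std::istream& in)
{
  std::string line;
  std::size_t count = 0;
  while (std::getline(in, line))
    ++count;
  return count;
}
```

Inside the `IoStream` test case, the timed section could then shrink to `auto const duration = TimeMs([&] { line_num = CountLines(in); });`.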
---
https://github.com/mlpack/mlpack/pull/681#issuecomment-233172179