[mlpack-git] [mlpack/mlpack] Optimize load csv 00 (#681)

Tham notifications at github.com
Sun Jul 17 04:56:30 EDT 2016


Some performance measurements from loading a 38.8 MB file:

Fast csv : 60ms
IoStream : 490ms

Fast csv is about 8 times faster than IoStream.
That sounds good, but the bottleneck of Load is not reading the file, it is the parser.

Original mapper performance : 
transpose : 9616 msec
non transpose : 10131 msec

The parsers and the mapper dominate the run time.

Measurements of different string-to-int solutions: http://www.kumobius.com/2013/08/c-string-to-int/

I assume it is quite hard to beat boost::spirit.

I guess the fastest solution at run time is to reuse part of the fast csv reader (I could extract part of the code) to read the file, and to use boost::spirit or a manual converter to parse it.
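
To make the "manual converter" option concrete, here is a minimal sketch of what such a converter could look like. `ParseNumericLine` is a hypothetical name, not something from mlpack or the fast csv reader, and it uses `std::strtod` rather than boost::spirit, so it says nothing about how the two compare in speed. It assumes purely numeric fields separated by commas and the default "C" locale.

```cpp
#include <cstdlib>
#include <vector>

// Hypothetical helper (not part of mlpack or the fast csv reader):
// converts one comma-separated line of numeric fields into doubles.
// Returns false as soon as a field is empty or not a number.
bool ParseNumericLine(const char* line, std::vector<double>& out)
{
  out.clear();
  const char* p = line;
  while (true)
  {
    char* end = nullptr;
    const double value = std::strtod(p, &end);
    if (end == p)
      return false;  // nothing consumed: empty or non-numeric field
    out.push_back(value);

    // Tolerate trailing blanks or a carriage return after the number.
    while (*end == ' ' || *end == '\t' || *end == '\r')
      ++end;

    if (*end == '\0')
      return true;   // reached the end of the line
    if (*end != ',')
      return false;  // unexpected character after the number
    p = end + 1;     // move past the comma to the next field
  }
}
```

Since `next_line()` in the measurement code yields a `char*`, each line could be handed to such a function directly, without building a `std::string` first.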

Code used for the performance measurements:

```
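// Standard headers the two test cases rely on.
#include <chrono>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <string>
// Also needed: the Boost.Test header and the fast csv reader header that
// provides io::LineReader (assumed to be csv.h from fast-cpp-csv-parser).

BOOST_AUTO_TEST_CASE(FastCSVSpeed)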
{
    io::LineReader reader("big_file.csv");
    size_t line_num = 0;
    auto const t1 =
            std::chrono::high_resolution_clock::now();
    while(auto *line = reader.next_line()){
        ++line_num;
    }
    auto const t2 =
            std::chrono::high_resolution_clock::now();
    auto const duration = std::chrono::duration_cast<std::chrono::milliseconds>(t2-t1).count();
    std::cout<<"line num "<<line_num<<std::endl;
    std::cout<<"duration "<<duration<<std::endl;
}

BOOST_AUTO_TEST_CASE(IoStream)
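{
    std::ifstream in("big_file.csv");
    // These two calls do not affect the measurement: sync_with_stdio only
    // concerns the standard streams, and a file stream is untied by default.
    std::ios_base::sync_with_stdio(false);
    in.tie(nullptr);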
    std::string line;
    size_t line_num = 0;
    auto const t1 =
            std::chrono::high_resolution_clock::now();
    while(std::getline(in, line)){
        ++line_num;
    }
    auto const t2 =
            std::chrono::high_resolution_clock::now();
    auto const duration = std::chrono::duration_cast<std::chrono::milliseconds>(t2-t1).count();
    std::cout<<"line num "<<line_num<<std::endl;
    std::cout<<"duration "<<duration<<std::endl;
}
```
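
Both test cases repeat the same timing boilerplate. A small helper along these lines could factor it out and make it easier to time the reading, parsing and mapping stages separately. `ElapsedMs` is a hypothetical name, and the sketch uses `std::chrono::steady_clock` instead of `high_resolution_clock`, because the former is guaranteed to be monotonic.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: runs `work` once and returns the elapsed
// wall-clock time in milliseconds.
template<typename Work>
long long ElapsedMs(Work&& work)
{
  const auto t1 = std::chrono::steady_clock::now();
  std::forward<Work>(work)();
  const auto t2 = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count();
}
```

With it, the first measurement would reduce to something like `ElapsedMs([&] { while (reader.next_line()) ++line_num; })`.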

---
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/pull/681#issuecomment-233172179