[mlpack-git] [mlpack/mlpack] Optimize load csv 00 (#681)

Tham notifications at github.com
Sat Jul 23 04:05:20 EDT 2016


>does the fast CSV parser outperform boost::spirit?

```
BOOST_AUTO_TEST_CASE(FastCSVSpeed)
{
    io::CSVReader<> reader(7, "big_file.csv");
    size_t line_num = 0;
    auto const t1 =
            std::chrono::high_resolution_clock::now();
    std::vector<std::string> val(7);
    // I will add an API that accepts a vector
    while(reader.ReadRow(val[0], val[1], val[2],
                         val[3], val[4], val[5],
                         val[6])){
        ++line_num; // count rows so the total printed below is meaningful
    }
    auto const t2 =
            std::chrono::high_resolution_clock::now();
    auto const duration = std::chrono::duration_cast<std::chrono::milliseconds>(t2-t1).count();
    std::cout<<"line num "<<line_num<<std::endl;
    std::cout<<"duration "<<duration<<std::endl;
}

BOOST_AUTO_TEST_CASE(boostSpiritSpeed)
{
    using namespace boost::spirit;

    using iter_type = boost::iterator_range<char*>;

    io::LineReader reader("big_file.csv");
    auto const t1 =
            std::chrono::high_resolution_clock::now();
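    int line = 0;
    // Note: parse_str is never attached to the rule below, so this run
    // only parses the fields; it does not store or print them.
    auto parse_str = [&](iter_type const &)
    {
        std::cout<<line++<<",";
    };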
    qi::rule<char*, iter_type(), ascii::space_type> charRule =
            qi::raw[*~qi::char_(",")];
    while(auto *line = reader.next_line()){
        qi::phrase_parse(line, line + std::strlen(line),
                         charRule % ",", ascii::space);       
    }
    auto const t2 =
            std::chrono::high_resolution_clock::now();
    auto const duration = std::chrono::duration_cast<std::chrono::milliseconds>(t2-t1).count();
    std::cout<<"duration "<<duration<<std::endl;
}
```

fast csv : 192 ms
boost spirit : 224 ms

In this test, fast csv already stores each field in a std::string, but the spirit version does not.
If I also push each field into a std::string, spirit takes 312 ms to finish the task.
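
To make the difference concrete, here is a minimal standard-library sketch of the two modes being compared: recording only where each field begins and ends, versus copying each field into an owning std::string. It is not the fast csv or spirit code; `FieldRange`, `SplitRanges` and `CopyFields` are hypothetical names, and the splitter assumes plain comma-separated fields with no quoting.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// A field is a half-open [begin, end) range into the line buffer.
// No characters are copied when a range is recorded.
using FieldRange = std::pair<const char*, const char*>;

// Split one NUL-terminated line on ',' and record only field boundaries.
// Hypothetical helper: no quoting or escaping is handled.
std::vector<FieldRange> SplitRanges(const char* line)
{
    std::vector<FieldRange> fields;
    const char* begin = line;
    const char* const end = line + std::strlen(line);
    for (const char* p = line; ; ++p) {
        if (p == end || *p == ',') {
            fields.emplace_back(begin, p);
            if (p == end)
                break;
            begin = p + 1;
        }
    }
    return fields;
}

// Materialize every range as an owning std::string. This is the extra
// allocation and copy that the range-only version skips.
std::vector<std::string> CopyFields(const std::vector<FieldRange>& ranges)
{
    std::vector<std::string> out;
    out.reserve(ranges.size());
    for (const FieldRange& r : ranges) {
        assert(r.first <= r.second);
        out.emplace_back(r.first, r.second);
    }
    return out;
}
```

Timing a loop that calls only `SplitRanges` against one that also calls `CopyFields` should show the same kind of gap as the 224 ms and 312 ms spirit runs, although the actual numbers will depend on the file and the allocator.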

This is just a small test. To implement the Load functions we will need more sophisticated code, but the test shows that the fast csv parser is fast enough.

---
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/pull/681#issuecomment-234706170