[mlpack-git] [mlpack/mlpack] Optimize load csv 01 (#735)

Mon Jul 25 09:25:41 EDT 2016

Great, this is a nice prototype.  One thing to think about is, how adaptable will the fast CSV loader be to other formats, like ARFF, txt, matrix market, and anything else we might choose to support in the future?  Is there a way we can adapt the code and make it flexible?

Another important thing to consider is this: right now, we depend on Armadillo for reading CSVs, and as a result don't really test the CSV parser in great detail.  i.e. we assume that the Armadillo CSV reader doesn't have bugs.  If we switch to boost::spirit we need to test that our usage of boost::spirit is correct (but we can assume the internals of boost::spirit are tested); if we switch to the fast CSV reader here and modify it by including the source, we can no longer assume that, so we either need to write our own tests for everything, since the fast CSV library doesn't have any tests I can see.

What do you think?

I know we are trying many different things here but I think we can figure something out in the end and it will be a big improvement.  Thanks again for the time you are putting into this adventure. :)

---
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/pull/735#issuecomment-234951985
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mailman.cc.gatech.edu/pipermail/mlpack-git/attachments/20160725/eab4508b/attachment.html>