[mlpack-git] [mlpack/mlpack] Fix mapping issue (#660)

Ryan Curtin notifications at github.com
Tue May 31 15:20:25 EDT 2016


I agree, this is a nicer approach to the loading problem.  But I am a little concerned about speed: loading the entire CSV into a `vector<vector<string>>` could mean a lot of memory allocation if the dataset is of any significant size.  It's possible to do better on memory by not holding the whole dataset in memory as strings: we read a row at a time, and if one of the dimensions turns out to need mapping, we just read through the file again, picking out the right columns, applying the mappings, and updating the matrix we already have.  (If the matrix is not to be transposed, we only need to read through the row we are looking at.)
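
To make the row-at-a-time idea concrete, here is a rough sketch of what I have in mind.  It is not mlpack code: `LoadedData`, `SplitLine()`, `ParseDouble()`, and `LoadCsvStreaming()` are hypothetical names, the matrix is a plain row-major `std::vector<double>` with no transposition, and the CSV handling assumes no quoting or escaping.  It also assumes the stream can be rewound with `seekg()`.

```cpp
#include <cstddef>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical result type: a row-major numeric matrix plus one
// string-to-id mapping per column (empty for purely numeric columns).
struct LoadedData
{
  std::size_t rows = 0, cols = 0;
  std::vector<double> values; // values[r * cols + c]
  std::vector<std::unordered_map<std::string, double>> maps;
};

// Split one CSV line on commas (assumption: no quoted fields).
inline std::vector<std::string> SplitLine(const std::string& line)
{
  std::vector<std::string> tokens;
  std::stringstream ss(line);
  std::string tok;
  while (std::getline(ss, tok, ','))
    tokens.push_back(tok);
  return tokens;
}

// Returns true only if the whole token parses as a double.
inline bool ParseDouble(const std::string& tok, double& out)
{
  if (tok.empty())
    return false;
  char* end = nullptr;
  out = std::strtod(tok.c_str(), &end);
  return end == tok.c_str() + tok.size();
}

inline LoadedData LoadCsvStreaming(std::istream& in)
{
  LoadedData data;
  std::vector<bool> needsMap;
  std::string line;
  bool anyMapped = false;

  // Pass 1: only the current row is ever held as strings.  Column count
  // is taken from the first non-empty row.
  while (std::getline(in, line))
  {
    if (line.empty())
      continue;
    const std::vector<std::string> tokens = SplitLine(line);
    if (data.cols == 0)
    {
      data.cols = tokens.size();
      needsMap.assign(data.cols, false);
    }
    for (std::size_t c = 0; c < data.cols; ++c)
    {
      double v = 0.0;
      if (c >= tokens.size() || !ParseDouble(tokens[c], v))
      {
        needsMap[c] = true; // Non-numeric token: map this whole column.
        anyMapped = true;
      }
      data.values.push_back(v); // Placeholder if the column gets mapped.
    }
    ++data.rows;
  }

  data.maps.resize(data.cols);
  if (!anyMapped)
    return data;

  // Pass 2: rewind and re-read, overwriting only the mapped columns.
  in.clear();
  in.seekg(0);
  std::size_t r = 0;
  while (std::getline(in, line))
  {
    if (line.empty())
      continue;
    const std::vector<std::string> tokens = SplitLine(line);
    for (std::size_t c = 0; c < data.cols; ++c)
    {
      if (!needsMap[c])
        continue;
      const std::string key = (c < tokens.size()) ? tokens[c] : std::string();
      auto& map = data.maps[c];
      // New strings get the next unused id; known strings keep theirs.
      const auto it = map.emplace(key, static_cast<double>(map.size())).first;
      data.values[r * data.cols + c] = it->second;
    }
    ++r;
  }
  return data;
}
```

The cost is a second read of the file whenever any column needs mapping, but peak memory stays at the numeric matrix plus one line of text and the per-column maps, rather than a string copy of every cell.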

So it might be worth checking how long this takes on datasets of, say, 100MB, to make sure it does not take insanely long.  (Although realistically, if the user wants fast loading, they should not be using CSV!)

---
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/pull/660#issuecomment-222792249