[mlpack-git] [mlpack/mlpack] Fix mapping issue (#660)

Wed Jun 1 00:18:08 EDT 2016

>this could mean lots of memory allocation if the dataset is of any significant size

I agree; this solution is not memory-efficient.

>we read a row at a time, and if one of the dimensions has to be mapped, then we need to just read through the file again, getting the right columns, and applying the mappings and updating the matrix we have

That approach would save much more memory. The drawbacks are slower loading (file I/O is expensive) and more complicated file handling.
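
To make the trade-off concrete, here is a rough sketch of the two-pass idea from the quote. It is not the code in this PR. `LoadedData`, `SplitLine`, `ParseNumber` and `LoadTwoPass` are hypothetical names, the comma split is naive (no quoting), and I am assuming that a dimension with any non-numeric token gets all of its tokens mapped to category ids.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical result type: numeric rows plus one string->id map per column.
struct LoadedData
{
  std::vector<std::vector<double>> rows;
  std::vector<std::unordered_map<std::string, double>> mappings;
};

// Naive comma split (assumption: no quoted fields, no escaped commas).
inline std::vector<std::string> SplitLine(const std::string& line)
{
  std::vector<std::string> tokens;
  std::istringstream ss(line);
  std::string token;
  while (std::getline(ss, token, ','))
    tokens.push_back(token);
  return tokens;
}

// True only if the whole token parses as a floating-point number.
inline bool ParseNumber(const std::string& token, double& value)
{
  char* end = nullptr;
  value = std::strtod(token.c_str(), &end);
  return end != token.c_str() && *end == '\0';
}

inline LoadedData LoadTwoPass(std::istream& in)
{
  LoadedData out;
  std::vector<bool> needsMapping;
  std::string line;

  // Pass 1: parse numbers a row at a time; remember which columns failed.
  while (std::getline(in, line))
  {
    if (line.empty())
      continue;
    const std::vector<std::string> tokens = SplitLine(line);
    if (needsMapping.size() < tokens.size())
      needsMapping.resize(tokens.size(), false);
    std::vector<double> row(tokens.size(), 0.0);
    for (std::size_t c = 0; c < tokens.size(); ++c)
      if (!ParseNumber(tokens[c], row[c]))
        needsMapping[c] = true;
    out.rows.push_back(row);
  }

  out.mappings.resize(needsMapping.size());
  if (std::find(needsMapping.begin(), needsMapping.end(), true) ==
      needsMapping.end())
    return out; // Everything was numeric, so no second pass is needed.

  // Pass 2: rewind and re-read, touching only the columns that need mapping.
  in.clear();
  in.seekg(0);
  std::size_t r = 0;
  while (std::getline(in, line))
  {
    if (line.empty())
      continue;
    assert(r < out.rows.size());
    const std::vector<std::string> tokens = SplitLine(line);
    for (std::size_t c = 0; c < tokens.size(); ++c)
    {
      if (!needsMapping[c])
        continue;
      auto& map = out.mappings[c];
      // New strings get the next free id; known strings keep their id.
      out.rows[r][c] =
          map.emplace(tokens[c], static_cast<double>(map.size())).first->second;
    }
    ++r;
  }
  return out;
}
```

Only one line of text is held in memory at a time, but the stream has to be seekable and is read twice when any column needs mapping, which is the speed cost mentioned above.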

I think the question is whether we should apply this optimization. Realistically, if the files are big, users should not be using CSV in the first place.

>Although realistically, if the user wants fast loading, they should not be using CSV

Agreed.

>If the matrix is not to be transposed, we only need to read through the row we are looking at

This part is already done: I only store all the strings in a vector of vectors when the user wants to transpose the file.
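
A simplified sketch of that distinction, not the actual code in the PR. `SplitLine`, `MapDimension` and `LoadDimensions` are hypothetical names, and I am assuming that without transposing each line of the file is one dimension, and that a dimension with any non-numeric token is mapped as a whole.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Naive comma split (assumption: no quoted fields).
inline std::vector<std::string> SplitLine(const std::string& line)
{
  std::vector<std::string> tokens;
  std::istringstream ss(line);
  std::string token;
  while (std::getline(ss, token, ','))
    tokens.push_back(token);
  return tokens;
}

// Convert one dimension. If every token is numeric, keep the numbers;
// otherwise map every token in the dimension to a category id.
inline std::vector<double> MapDimension(const std::vector<std::string>& tokens)
{
  std::vector<double> values(tokens.size(), 0.0);
  bool allNumeric = true;
  for (std::size_t i = 0; i < tokens.size(); ++i)
  {
    char* end = nullptr;
    values[i] = std::strtod(tokens[i].c_str(), &end);
    if (end == tokens[i].c_str() || *end != '\0')
      allNumeric = false;
  }
  if (allNumeric)
    return values;

  std::unordered_map<std::string, double> ids;
  for (std::size_t i = 0; i < tokens.size(); ++i)
    values[i] =
        ids.emplace(tokens[i], static_cast<double>(ids.size())).first->second;
  return values;
}

inline std::vector<std::vector<double>> LoadDimensions(std::istream& in,
                                                       bool transpose)
{
  std::vector<std::vector<double>> dims;
  std::vector<std::vector<std::string>> buffered; // Used only if transposing.
  std::string line;
  while (std::getline(in, line))
  {
    if (line.empty())
      continue;
    if (!transpose)
      dims.push_back(MapDimension(SplitLine(line))); // One line is enough.
    else
      buffered.push_back(SplitLine(line)); // A dimension spans every line.
  }

  if (transpose && !buffered.empty())
  {
    for (std::size_t c = 0; c < buffered[0].size(); ++c)
    {
      std::vector<std::string> column;
      for (const auto& row : buffered)
      {
        // This sketch assumes a rectangular file.
        assert(row.size() == buffered[0].size());
        column.push_back(row[c]);
      }
      dims.push_back(MapDimension(column));
    }
  }
  return dims;
}
```

In the non-transposed branch nothing outlives the current line, so the extra memory is only paid when transposing.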

---
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/pull/660#issuecomment-222888565