[mlpack-git] [mlpack/mlpack] Density Estimation Tree made sparse-enabled (#802)

Ivan Georgiev notifications at github.com
Tue Nov 1 11:56:42 EDT 2016


Yes, I'm done with the PR. Feel free to merge it whenever it is OK for you.

I wish I could agree on the `ExtractSort` issue, but there are sparse-specific problems. In short, `row_col_iterator` does not report the zero values, but the split decision algorithm needs them in order to attempt a split between a zero and the adjacent non-zero element. A universal implementation that uses a sparse-enabled `arma::sort` followed by index-based iteration would be correct, but terribly slow because of all the zeroes that would be reported.
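To illustrate the point about implicit zeroes, here is a minimal sketch using only the standard library. It is not the PR's `ExtractSort`; the helper name `SplitCandidates` and the choice of midpoints between adjacent distinct values as split candidates are assumptions made for this example. The zeroes that a sparse iterator skips are treated as one run in the sorted order, so the work depends on the number of stored non-zeros rather than on the number of points.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not mlpack's ExtractSort): given only the stored
// non-zero values of one dimension and the total number of points, return
// the midpoints between adjacent distinct values, zeroes included.
std::vector<double> SplitCandidates(std::vector<double> nonZeros,
                                    const std::size_t numPoints)
{
  assert(nonZeros.size() <= numPoints);
  std::sort(nonZeros.begin(), nonZeros.end());

  std::vector<double> distinct;
  auto pushDistinct = [&distinct](const double v)
  {
    if (distinct.empty() || distinct.back() != v)
      distinct.push_back(v);
  };

  // All implicit zeroes are equal, so they occupy a single slot in the
  // sorted sequence of distinct values.
  bool zeroPlaced = (nonZeros.size() == numPoints);
  for (const double v : nonZeros)
  {
    if (!zeroPlaced && v > 0.0)
    {
      pushDistinct(0.0);
      zeroPlaced = true;
    }
    pushDistinct(v);
  }
  if (!zeroPlaced)
    pushDistinct(0.0);

  // One candidate between each pair of neighbouring distinct values,
  // which covers the boundary between zero and its non-zero neighbours.
  std::vector<double> candidates;
  for (std::size_t i = 1; i < distinct.size(); ++i)
    candidates.push_back((distinct[i - 1] + distinct[i]) / 2.0);
  return candidates;
}
```

For a dimension with five points of which only `-1` and `3` are stored, the sketch yields candidates on both sides of zero, which a loop over the stored values alone would miss.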

My implementation of `arma::sort` is part of mlpack's Armadillo extensions. It is less than a page of source code, which I could even paste into a comment, I guess :-) However, if it's really better to go the route of a PR to Armadillo, I can do it; I just don't know exactly when.

Regarding the coordinate list sparse matrix implementation: isn't CSC storage faster when it comes to column-based iteration? Anything that is fast for both row-wise and column-wise iteration will be fine.
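The CSC question can be made concrete with a small sketch. The `CscMatrix` struct and both functions below are hypothetical and written only for this illustration; they are not Armadillo's `SpMat`, although they follow the same compressed sparse column idea of column pointers, row indices, and values. Reading one column touches a contiguous range of stored entries, while reading one row in this naive form has to scan all of them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal CSC container, for illustration only.
struct CscMatrix
{
  std::size_t numRows;
  std::size_t numCols;
  std::vector<std::size_t> colPtr;  // numCols + 1 offsets into rowIdx/values
  std::vector<std::size_t> rowIdx;  // row index of each stored entry
  std::vector<double> values;       // stored (non-zero) entries
};

// Column access: the entries of column `col` are stored contiguously
// in [colPtr[col], colPtr[col + 1]).
double ColumnSum(const CscMatrix& m, const std::size_t col)
{
  assert(col < m.numCols);
  double sum = 0.0;
  for (std::size_t k = m.colPtr[col]; k < m.colPtr[col + 1]; ++k)
    sum += m.values[k];
  return sum;
}

// Row access: without an extra index, every stored entry has to be checked.
double RowSum(const CscMatrix& m, const std::size_t row)
{
  assert(row < m.numRows);
  double sum = 0.0;
  for (std::size_t k = 0; k < m.values.size(); ++k)
    if (m.rowIdx[k] == row)
      sum += m.values[k];
  return sum;
}
```

A plain coordinate list has neither access pattern for free unless it is sorted, which is why the choice of layout matters for an algorithm that walks one dimension at a time.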

Regarding format loading, I'd say that the format to support is Matrix Market, which is a coordinate list with a header. Such a format would also save one pass through the input file, which currently happens in order to obtain the matrix size, and which technically is not even quite correct: if the last columns/rows contain only zeroes, the coordinate list has no entries for them and a false matrix size will be deduced.
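The size problem can be shown with a short sketch. It is not mlpack's loader; `SizeReport` and `CompareSizes` are hypothetical names, and the parsing assumes the Matrix Market coordinate layout: comment lines starting with `%`, then a `rows cols nnz` size line, then one-based `row col value` entries. The function reports both the declared size and the size one would deduce from the largest indices seen.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical result type: size from the header vs. size deduced from entries.
struct SizeReport
{
  std::size_t declaredRows, declaredCols;
  std::size_t inferredRows, inferredCols;
};

SizeReport CompareSizes(std::istream& in)
{
  SizeReport report{0, 0, 0, 0};

  // Skip the banner and comment lines; the first other line is the size line.
  std::string line;
  bool haveSize = false;
  while (!haveSize && std::getline(in, line))
  {
    if (line.empty() || line[0] == '%')
      continue;
    std::istringstream fields(line);
    std::size_t nnz = 0;
    if (!(fields >> report.declaredRows >> report.declaredCols >> nnz))
      throw std::runtime_error("malformed size line");
    haveSize = true;
  }
  if (!haveSize)
    throw std::runtime_error("missing size line");

  // Entries use one-based indices, so the largest index seen is the
  // smallest size consistent with the entries alone.
  std::size_t r = 0, c = 0;
  double v = 0.0;
  while (in >> r >> c >> v)
  {
    assert(r >= 1 && c >= 1);
    report.inferredRows = std::max(report.inferredRows, r);
    report.inferredCols = std::max(report.inferredCols, c);
  }
  return report;
}
```

With a header declaring a 4 x 5 matrix whose only entries sit at (1, 1) and (2, 3), the deduced size is 2 x 3, so the trailing all-zero rows and columns are lost unless the header is read.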


-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/pull/802#issuecomment-257605495