[mlpack-git] [mlpack/mlpack] Density Estimation Tree made sparse-enabled (#802)

Ivan Georgiev notifications at github.com
Wed Oct 26 04:49:14 EDT 2016


The problem with CSC in this task is that we want in-row iteration, and we _need_ each step of it to be O(1); with CSC it is not. At the same time we need the `swap_cols` operation, which (I guess) would be much slower if we worked on the transposed matrix. If we had dual indices, so that we could "walk" quickly in both directions, that would be fine. And in general, the matrix I'm testing with is very sparse as well: approx. 99.995% of it is empty.
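To illustrate the row-iteration cost mentioned above, here is a minimal sketch using a hand-rolled CSC struct. The names `CscMatrix` and `RowNonzeros` are hypothetical and are not Armadillo's internals; the only assumption carried over is that `arma::SpMat` stores data in compressed sparse column form, and that mlpack keeps one point per column, so a "row" is one dimension across all points. Because entries are grouped by column, collecting one row means searching every column separately, so reaching the next entry in a row is not O(1).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Minimal compressed sparse column (CSC) storage (hypothetical, for illustration).
struct CscMatrix {
  std::size_t n_rows = 0, n_cols = 0;
  std::vector<double> values;        // nonzero values, stored column by column
  std::vector<std::size_t> row_idx;  // row of each nonzero, sorted within a column
  std::vector<std::size_t> col_ptr;  // column c spans [col_ptr[c], col_ptr[c + 1]); size n_cols + 1
};

// Collect the nonzeros of one row as (column, value) pairs.
// Every column has to be binary-searched on its own, so the total cost is
// O(n_cols * log(nnz per column)) rather than O(1) per step along the row.
std::vector<std::pair<std::size_t, double>> RowNonzeros(const CscMatrix& m,
                                                        std::size_t row) {
  assert(row < m.n_rows);
  std::vector<std::pair<std::size_t, double>> out;
  for (std::size_t c = 0; c < m.n_cols; ++c) {
    auto first = m.row_idx.begin() + m.col_ptr[c];
    auto last = m.row_idx.begin() + m.col_ptr[c + 1];
    auto it = std::lower_bound(first, last, row);
    if (it != last && *it == row) {
      const auto pos = static_cast<std::size_t>(it - m.row_idx.begin());
      out.emplace_back(c, m.values[pos]);
    }
  }
  return out;
}
```

A row-oriented (CSR) copy would make this walk cheap but would make column swaps expensive instead. Keeping both index structures ("dual indices") is what would allow fast movement in both directions.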

The benefit of `ExtractSplit` is that for dense matrices it sorts in place, which `arma::sort` does not. Another point is that even with `row_col_iterator` you'll need to know you're in a sparse matrix: if the first entry you get is not at the beginning, you need to _fake_ a preceding zero value in order to get a split between them. The same goes for jumping over indices (with sorted sparse rows/columns, all the zeros end up in a single slot, so there is at most one jump), etc. So the split attempts can't be generic. That's why I've chosen to make the extraction of split points custom, and the iteration over them generic. I'm a big enemy of duplicated code too! :-)

I've also tried avoiding the `submat` call and iterating directly over the given region to extract splits; it turned out to be _way_ slower for sparse matrices. So now the most expensive step is `submat`.
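The "custom extraction, generic iteration" split described above can be sketched as follows. This is a hypothetical illustration, not the code in the PR: `ValueCount`, `ExtractDense`, `ExtractSparse` and `CandidateSplits` are invented names, and taking midpoints between neighbouring distinct values as split candidates is an assumption of the sketch. The dense extractor sorts in place; the sparse one sorts only the stored nonzeros and then inserts all the implicit zeros as one faked slot, so the generic loop never needs to know which kind of matrix it came from.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One distinct value in a dimension and how many points take that value.
struct ValueCount {
  double value;
  std::size_t count;
};

// Group an already sorted sequence into (value, count) runs.
static std::vector<ValueCount> GroupSorted(const std::vector<double>& sorted) {
  std::vector<ValueCount> out;
  for (double v : sorted) {
    if (!out.empty() && out.back().value == v)
      ++out.back().count;
    else
      out.push_back({v, 1});
  }
  return out;
}

// Dense case: every value is stored, so sort in place and group.
std::vector<ValueCount> ExtractDense(std::vector<double>& values) {
  std::sort(values.begin(), values.end());
  return GroupSorted(values);
}

// Sparse case: only nonzeros are stored. All implicit zeros form a single
// run, which is "faked" in at its sorted position (at most one jump).
std::vector<ValueCount> ExtractSparse(std::vector<double> nonzeros,
                                      std::size_t n_points) {
  assert(nonzeros.size() <= n_points);
  const std::size_t n_zeros = n_points - nonzeros.size();
  std::sort(nonzeros.begin(), nonzeros.end());
  std::vector<ValueCount> out = GroupSorted(nonzeros);
  if (n_zeros > 0) {
    auto pos = std::lower_bound(out.begin(), out.end(), 0.0,
        [](const ValueCount& vc, double x) { return vc.value < x; });
    if (pos != out.end() && pos->value == 0.0)
      pos->count += n_zeros;  // an explicitly stored zero already has a slot
    else
      out.insert(pos, ValueCount{0.0, n_zeros});
  }
  return out;
}

// Generic part: walk the runs and emit each candidate split (midpoint between
// neighbouring distinct values) with the number of points to its left.
std::vector<std::pair<double, std::size_t>> CandidateSplits(
    const std::vector<ValueCount>& runs) {
  std::vector<std::pair<double, std::size_t>> splits;
  std::size_t left = 0;
  for (std::size_t i = 0; i + 1 < runs.size(); ++i) {
    left += runs[i].count;
    splits.emplace_back((runs[i].value + runs[i + 1].value) / 2.0, left);
  }
  return splits;
}
```

For a dimension holding {3, 0, -1, 0, 3, 0}, both extractors yield the runs (-1, 1), (0, 3), (3, 2), and `CandidateSplits` produces the same two candidates either way: -0.5 with 1 point to the left and 1.5 with 4 points to the left.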

For sparse sorting: wouldn't it be best if you made another branch here, in the main repo, and I made a pull request for `SpMat` sort to that branch rather than to `master`? Or you can get it directly from my repo; it is in the `feature/sparse_sort` branch. Whichever is easier for you.

And finally: Travis CI is failing on some other tests, which are not DET-related :-)


-- 
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/pull/802#issuecomment-256286777