[mlpack-git] [mlpack] CF cannot properly handle 0s in the input matrix (#379)

Ryan Curtin notifications at github.com
Mon Jan 12 17:27:56 EST 2015


(For background, see #376.)  In collaborative filtering or other matrix completion tasks, the user may have an incomplete rating matrix which may contain 0s as legitimate values.  Currently, mlpack won't handle this properly; nor will the `arma::sp_mat` sparse matrix class, which does not store zeros.  Although sparse matrices can easily be made to hold explicit 0s, the Armadillo documentation seems to discourage this (that said, I wrote a large part of the Armadillo sparse matrix code and its documentation, so both are mutable).
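
To make the problem concrete, here is a toy sketch using only the standard library.  It is not Armadillo's implementation; `ZeroDroppingSparse` and its methods are hypothetical names that only mimic the usual sparse convention that a 0 is simply not stored:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>

// Hypothetical toy container (not arma::SpMat): like most sparse formats, it
// treats 0 as "not stored", so an explicit 0 and a missing entry look alike.
class ZeroDroppingSparse
{
 private:
  typedef std::pair<std::size_t, std::size_t> Key;
  std::size_t nRows, nCols;
  std::map<Key, double> entries;

 public:
  ZeroDroppingSparse(const std::size_t rows, const std::size_t cols) :
      nRows(rows), nCols(cols) { }

  void Set(const std::size_t row, const std::size_t col, const double value)
  {
    assert(row < nRows && col < nCols);
    const Key key(row, col);
    if (value == 0.0)
      entries.erase(key); // A rating of 0 is silently discarded.
    else
      entries[key] = value;
  }

  // Entries that were never stored read back as 0.
  double Get(const std::size_t row, const std::size_t col) const
  {
    assert(row < nRows && col < nCols);
    const auto it = entries.find(Key(row, col));
    return (it == entries.end()) ? 0.0 : it->second;
  }

  std::size_t StoredCount() const { return entries.size(); }
};
```

After `Set(0, 1, 0.0)`, nothing distinguishes element (0, 1) from an element that was never rated: both read back as 0 and neither counts as stored.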

@stephentu suggested masked arrays in numpy: http://docs.scipy.org/doc/numpy/reference/maskedarray.generic.html
but personally I think any structure like this will incur significant extra overhead in both runtime and memory (if it turns out I'm wrong, then I'm happy to use something like masked arrays).

In addition, a bunch of the existing factorizers that CF can use have checks for zero values.  Those checks should probably go away regardless of the decision here, especially when `MatType::row_col_iterator` is used, since that will iterate only over nonzero values when `MatType = sp_mat` and all values when `MatType = mat`.
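
A small sketch of why those checks are a problem, again with standard-library types only.  The `Entry` struct and both functions are hypothetical and are not taken from any mlpack factorizer; the vector of entries stands in for whatever the iterator visits:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical triplet describing one entry visited by an iterator.
struct Entry
{
  std::size_t row;
  std::size_t col;
  double value;
};

typedef std::vector<std::vector<double>> Dense;

// With a zero check: the factorizer itself decides that 0 means "missing".
double SquaredErrorSkippingZeros(const std::vector<Entry>& visited,
                                 const Dense& predicted)
{
  double sum = 0.0;
  for (const Entry& e : visited)
  {
    if (e.value == 0.0)
      continue; // A genuine rating of 0 is ignored here.
    const double diff = e.value - predicted[e.row][e.col];
    sum += diff * diff;
  }
  return sum;
}

// Without the check: whatever the iterator visits is treated as observed, so
// the container alone decides which entries exist.
double SquaredError(const std::vector<Entry>& visited, const Dense& predicted)
{
  double sum = 0.0;
  for (const Entry& e : visited)
  {
    const double diff = e.value - predicted[e.row][e.col];
    sum += diff * diff;
  }
  return sum;
}
```

If the visited entries include a real rating of 0, only the second function accounts for its residual; the first one drops it regardless of what the container stored.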

I'm leaning towards a first step in this direction being a significant modification of the `arma::SpMat<eT>` class so that it takes an extra parameter specifying the value of a "missing value".  For a default `sp_mat` it should be 0; for the CF applications it might be `NaN` or some other sentinel.  The question in my mind right now is how to provide that support cleanly, in such a way that it'll be accepted upstream in Armadillo...
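
Roughly what I have in mind, as a standalone toy rather than a patch to `arma::SpMat<eT>`.  The class name, the constructor parameter, and the choice to pass the missing value at run time are all assumptions made for illustration:

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <map>
#include <utility>

// Hypothetical toy container: unstored entries read back as a configurable
// "missing value" instead of a hard-coded 0.
class SparseWithMissing
{
 private:
  typedef std::pair<std::size_t, std::size_t> Key;
  double missing;
  std::map<Key, double> entries;

  // NaN never compares equal to itself, so it needs its own test.
  bool IsMissing(const double value) const
  {
    return std::isnan(missing) ? std::isnan(value) : (value == missing);
  }

 public:
  // A missing value of 0 reproduces the usual sparse behaviour.
  explicit SparseWithMissing(const double missingValue = 0.0) :
      missing(missingValue) { }

  void Set(const std::size_t row, const std::size_t col, const double value)
  {
    const Key key(row, col);
    if (IsMissing(value))
      entries.erase(key); // Only the missing value is left unstored.
    else
      entries[key] = value;
  }

  double Get(const std::size_t row, const std::size_t col) const
  {
    const auto it = entries.find(Key(row, col));
    return (it == entries.end()) ? missing : it->second;
  }

  std::size_t StoredCount() const { return entries.size(); }
};
```

Constructed with `std::numeric_limits<double>::quiet_NaN()`, a rating of 0 is stored like any other value and unrated entries read back as `NaN`; constructed with the default, it behaves like an ordinary sparse matrix.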

---
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/issues/379