<p>(For background, see <a href="https://github.com/mlpack/mlpack/pull/376" class="issue-link" title="Warn if ratings of 0 are found">#376</a>.) In collaborative filtering and other matrix completion tasks, the user may have an incomplete rating matrix in which some of the observed ratings are 0. Currently, mlpack won't handle this properly, and neither will the <code>arma::sp_mat</code> sparse matrix class, because a stored 0 can't be told apart from a missing entry. Although sparse matrices can easily be made to store explicit 0s, the Armadillo documentation seems to discourage this (that said, since I wrote a large part of the Armadillo sparse matrix code, both it and its documentation are mutable).</p>
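<p>To make the problem concrete, here is a minimal sketch using only the standard library. <code>ZeroDroppingRatings</code> is a hypothetical toy class that mimics a zero-suppressing sparse format; it is not <code>arma::sp_mat</code> and makes no claim about how Armadillo is implemented internally.</p>

```cpp
#include <cstddef>
#include <map>
#include <utility>

// Hypothetical toy stand-in for a sparse matrix that never stores zeros.
// This is NOT arma::sp_mat; it only illustrates the ambiguity.
class ZeroDroppingRatings
{
 public:
  // Record a rating; a value of 0 is silently treated as "no entry".
  void Set(std::size_t user, std::size_t item, double rating)
  {
    const auto key = std::make_pair(user, item);
    if (rating == 0.0)
      stored.erase(key);
    else
      stored[key] = rating;
  }

  // Unstored entries read back as 0, so a real rating of 0 and a missing
  // rating look exactly the same to the caller.
  double Get(std::size_t user, std::size_t item) const
  {
    const auto it = stored.find(std::make_pair(user, item));
    return (it == stored.end()) ? 0.0 : it->second;
  }

  // Number of entries an algorithm iterating over stored values would see.
  std::size_t NumStored() const { return stored.size(); }

 private:
  std::map<std::pair<std::size_t, std::size_t>, double> stored;
};
```

<p>With this toy class, calling <code>Set(0, 1, 0.0)</code> leaves nothing behind: the observed rating of 0 reads back the same as an entry that was never rated, and it is never visited when iterating over stored values.</p>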
<p><a href="https://github.com/stephentu" class="user-mention">@stephentu</a> suggested masked arrays in numpy: <a href="http://docs.scipy.org/doc/numpy/reference/maskedarray.generic.html">http://docs.scipy.org/doc/numpy/reference/maskedarray.generic.html</a><br>
but personally I think any structure like this will incur significant extra overhead in both runtime and memory (if it turns out I'm wrong, I'm happy to use something like masked arrays).</p>
<p>In addition, a bunch of the existing factorizers that CF can use have checks for zero values. Those checks should probably go away regardless of the decision here, especially when <code>MatType::row_col_iterator</code> is used, since that will iterate only over nonzero values when <code>MatType = sp_mat</code> and all values when <code>MatType = mat</code>.</p>
<p>I'm leaning towards a first step in this direction being a significant modification of the <code>arma::SpMat&lt;eT&gt;</code> class so that it takes an extra parameter specifying which value represents a "missing" entry. For a default <code>sp_mat</code> it should be 0; for the CF applications it could be something like <code>NaN</code>. The question in my mind right now is how to provide that support cleanly, in such a way that it'll be accepted upstream in Armadillo...</p>
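<p>A rough sketch of the idea, again using only the standard library. <code>SparseRatings</code>, its constructor argument, and its method names are all hypothetical and chosen for illustration; this is not the <code>arma::SpMat</code> interface and not a proposal for its actual signature.</p>

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <map>
#include <utility>

// Hypothetical sparse rating store with a configurable "missing" value.
// Only a sketch of the idea; not the arma::SpMat interface.
class SparseRatings
{
 public:
  using Key = std::pair<std::size_t, std::size_t>;

  // The default of 0 reproduces ordinary zero-suppressing behaviour.
  explicit SparseRatings(double missingValue = 0.0) : missing(missingValue) { }

  // Store any value, including 0, unless it equals the missing marker.
  void Set(std::size_t user, std::size_t item, double rating)
  {
    if (IsMissing(rating))
      stored.erase(Key(user, item));
    else
      stored[Key(user, item)] = rating;
  }

  // Entries that were never stored read back as the missing marker.
  double Get(std::size_t user, std::size_t item) const
  {
    const auto it = stored.find(Key(user, item));
    return (it == stored.end()) ? missing : it->second;
  }

  // Iterating over this map visits only the observed ratings.
  const std::map<Key, double>& Stored() const { return stored; }

 private:
  bool IsMissing(double v) const
  {
    // NaN never compares equal to itself, so it needs an explicit check.
    return std::isnan(missing) ? std::isnan(v) : (v == missing);
  }

  double missing;
  std::map<Key, double> stored;
};
```

<p>Constructed with <code>std::numeric_limits&lt;double&gt;::quiet_NaN()</code> as the marker, this toy keeps a rating of 0 as a stored entry and returns <code>NaN</code> for anything unobserved, while the default-constructed version behaves like a regular zero-suppressing sparse matrix.</p>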