[mlpack-git] [mlpack] CF should avoid calculating full matrix when providing recommendations (#406)

Ryan Curtin notifications at github.com
Thu Feb 19 17:13:02 EST 2015

Right now, the first line of `CF::GetRecommendations()` reads

rating = w * h

which has the issue that the system needs enough RAM to hold the full dense rating matrix, i.e. (number of items) x (number of users) entries.  Then, we run tree-based kNN on the rating matrix, which is of high dimension, so it will be slow:

// Calculate the neighborhood of the queried users.
// This should be a templatized option.
neighbor::AllkNN a(rating, query);
arma::mat resultingDistances; // Temporary storage.
a.Search(numUsersForSimilarity, neighborhood, resultingDistances);

But this isn't necessary.  Note that what we are trying to do is find the most similar users (columns), but we have decomposed the input matrix X = W * H.  (H is the matrix that holds the user preferences, depending on how you look at it.)

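Now, some quick linear algebra gives us that X.col(i) = W * H.col(i).  But remember, we are looking for the nearest neighbors of X.col(i), and the distance between X.col(i) and X.col(j) is the norm of W * (H.col(i) - H.col(j)), which depends only on W and the low-dimensional columns of H.  So the search can be run on H instead: directly if W has orthonormal columns, and otherwise after multiplying H by L^T, where W^T * W = L * L^T is a Cholesky decomposition, so that the distances match exactly.  Why aren't we searching for the nearest neighbors in the H matrix, then?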
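
To make the distance argument concrete, here is a minimal standalone sketch.  It does not use mlpack or Armadillo; the helper names (Multiply, Transpose, Cholesky, ColumnDistance, StretchH) are made up for illustration and are not part of any library.  It assumes W has full column rank, so that W^T * W is positive definite.  The idea is that norm(W * d)^2 = d^T * W^T * W * d = norm(L^T * d)^2 when W^T * W = L * L^T, so column distances in L^T * H equal column distances in W * H.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Dense matrix stored as a vector of rows; element (r, c) is m[r][c].
// Hypothetical helper type for this sketch only.
using Matrix = std::vector<std::vector<double>>;

// Plain triple-loop product a * b (assumes non-empty, conforming shapes).
Matrix Multiply(const Matrix& a, const Matrix& b)
{
  const std::size_t n = a.size(), k = b.size(), m = b[0].size();
  Matrix c(n, std::vector<double>(m, 0.0));
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t p = 0; p < k; ++p)
      for (std::size_t j = 0; j < m; ++j)
        c[i][j] += a[i][p] * b[p][j];
  return c;
}

Matrix Transpose(const Matrix& a)
{
  Matrix t(a[0].size(), std::vector<double>(a.size(), 0.0));
  for (std::size_t i = 0; i < a.size(); ++i)
    for (std::size_t j = 0; j < a[0].size(); ++j)
      t[j][i] = a[i][j];
  return t;
}

// Lower-triangular Cholesky factor l with l * l^T = a.
// Assumes a is symmetric positive definite.
Matrix Cholesky(const Matrix& a)
{
  const std::size_t n = a.size();
  Matrix l(n, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < n; ++i)
  {
    for (std::size_t j = 0; j <= i; ++j)
    {
      double sum = a[i][j];
      for (std::size_t k = 0; k < j; ++k)
        sum -= l[i][k] * l[j][k];
      if (i == j)
      {
        assert(sum > 0.0); // Fails if a is not positive definite.
        l[i][j] = std::sqrt(sum);
      }
      else
      {
        l[i][j] = sum / l[j][j];
      }
    }
  }
  return l;
}

// Euclidean distance between columns i and j of m.
double ColumnDistance(const Matrix& m, std::size_t i, std::size_t j)
{
  double sum = 0.0;
  for (const auto& row : m)
  {
    const double d = row[i] - row[j];
    sum += d * d;
  }
  return std::sqrt(sum);
}

// Returns L^T * H, where W^T * W = L * L^T.  Column distances of the result
// equal column distances of W * H, but the result has only rank-many rows.
Matrix StretchH(const Matrix& w, const Matrix& h)
{
  const Matrix l = Cholesky(Multiply(Transpose(w), w));
  return Multiply(Transpose(l), h);
}
```

With W of size (items x rank) and H of size (rank x users), StretchH(W, H) has only rank rows, so a neighbor search over its columns never needs the full (items x users) product.  In Armadillo the same steps would presumably be a chol() call and one small matrix product, but that mapping is my assumption and not something I have checked against the CF code.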

A patch for this ticket should also include some information on the speedup obtained (in either a test program or the `cf` executable), and verification that the module provides the same results (perhaps through the already written tests).
