[mlpack] Alternatives to neighborhood-based collaborative filtering for GSOC 2016

Fri Mar 11 19:16:12 EST 2016

On Fri, Mar 11, 2016 at 11:17:05PM +0530, Divyam Khandelwal wrote:
> Hello Ryan Sir,
> After successfully testing on the *MovieLens* dataset (1 million ratings
> from 6000 users on 4000 movies), I think improvements can be made by:
> *1. Adding time dependency*
> TimeSVD++
> • Parameterize explicit user factor vectors by time
> a_u(t) = a_u + α_u · dev(t) + א_{ut}
> • a_u is a static baseline vector
> • α_u · dev(t) is a static vector multiplied by the deviation from the user’s
> average rating time
> • Captures linear changes in time
> • א_{ut} is a vector learned for a specific point in time
> 
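As a rough illustration of the time-dependent user factors described above, here is a minimal Python sketch. The function and parameter names are hypothetical, and the exact form of dev(t) is an assumption: it uses the signed, damped deviation sign(t - mean) * |t - mean|^beta from Koren's timeSVD++ formulation, which the quoted summary does not spell out.

```python
def dev(t, mean_t, beta=0.4):
    """Signed, damped deviation of time t from the user's mean rating time.

    Assumed form: sign(t - mean_t) * |t - mean_t| ** beta.
    """
    diff = t - mean_t
    sign = 1 if diff > 0 else -1 if diff < 0 else 0
    return sign * abs(diff) ** beta


def user_factors_at(t, static, drift, mean_t, per_time=None, beta=0.4):
    """Return a_u(t) = a_u + alpha_u * dev(t) + (time-specific vector).

    static   -- a_u, the static baseline vector
    drift    -- alpha_u, the static vector scaled by dev(t)
    per_time -- optional dict mapping a time point to its learned vector;
                time points without an entry contribute zeros
    """
    d = dev(t, mean_t, beta)
    spike = (per_time or {}).get(t, [0.0] * len(static))
    return [a + alpha * d + s for a, alpha, s in zip(static, drift, spike)]
```

At the user's mean rating time the drift term vanishes and the factors reduce to the static baseline. In a real model all three pieces would be learned jointly (for example by stochastic gradient descent); this sketch only shows how they combine at prediction time.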
> *2. Stacked Ridge Regression*
> • Diminishing returns from optimizing a single algorithm
> • Different models capture different aspects of the data
> • Moral: Errors of different algorithms can cancel out
> • Treat the prediction errors of one algorithm as the input “preferences”
> of a second algorithm
> • The second algorithm can learn to predict, and hence offset, the errors
> of the first
> • Improved accuracy
> 
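The residual idea in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the first "algorithm" is a plain global-mean predictor, the second is a per-item offset fit to the first model's errors with a ridge penalty, and every name here is hypothetical rather than part of mlpack.

```python
from collections import defaultdict
from math import sqrt


def fit_global_mean(ratings):
    """First model: predict the mean of all ratings (user, item, rating)."""
    return sum(r for _, _, r in ratings) / len(ratings)


def fit_item_offsets(ratings, base_prediction, lam=1.0):
    """Second model: ridge-regularized per-item offsets fit to the residuals.

    For each item, minimizing sum((residual - b)^2) + lam * b^2 gives
    b = sum(residuals) / (lam + count).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _, item, r in ratings:
        sums[item] += r - base_prediction  # error left by the first model
        counts[item] += 1
    return {item: sums[item] / (lam + counts[item]) for item in sums}


def predict(item, base_prediction, offsets):
    """Stacked prediction: first model plus the learned correction."""
    return base_prediction + offsets.get(item, 0.0)


def rmse(ratings, predict_fn):
    """Root-mean-square error of predict_fn(item) over the ratings."""
    return sqrt(sum((r - predict_fn(i)) ** 2 for _, i, r in ratings) / len(ratings))
```

On training data the stacked predictor cannot do worse than the global mean alone, since an offset of zero is always available; whether the gain carries over to held-out ratings depends on the penalty lam and on how much data each item has.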
> *3. KNN with User-Optimized Weights*
> 
> I am trying to implement a CF algorithm with optimized weights and time
> dependency, and will update you.
> Am I going in the right direction?
> Can we make it more efficient by using stacked linear regression?

I think someone else said that weighted KNN did not perform very well,
but I am not sure if they were referring to the same algorithm that you
are thinking of.

For a time-based model, please spend some time thinking about what the
right abstraction for handling the data is.  There is a little more
discussion about that in this thread:

https://mailman.cc.gatech.edu/pipermail/mlpack/2016-March/000858.html

But overall the idea of adding some new algorithms to do collaborative
filtering is just fine.

Thanks,

Ryan

-- 
Ryan Curtin    | "Leave the gun.  Take the cannoli."
ryan at ratml.org |   - Clemenza