[mlpack] Alternatives to neighborhood-based collaborative filtering for GSOC 2016
Ryan Curtin
ryan at ratml.org
Fri Mar 11 19:16:12 EST 2016
On Fri, Mar 11, 2016 at 11:17:05PM +0530, Divyam Khandelwal wrote:
> Hello Ryan Sir,
> After successfully testing on the *MovieLens* 1M dataset (1 million ratings from 6,000
> users on 4,000 movies), I think improvements can be made by:
> *1. Adding time dependency*
> TimeSVD++
> • Parameterize explicit user factor vectors by time
> $a_u(t) = a_u + \alpha_u \, \mathrm{dev}_u(t) + a_{u,t}$
> • $a_u$ is a static baseline vector
> • $\alpha_u \, \mathrm{dev}_u(t)$ is a static vector multiplied by the deviation from the user’s
> average rating time
> • Captures linear changes in time
> • $a_{u,t}$ is a vector learned for a specific point in time
>
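The time-dependent user factor above can be sketched in a few lines of Python. This is a simplified illustration, not an mlpack implementation. The form of dev_u(t) is an assumption: it uses the shape commonly paired with timeSVD++, sign(t - t_u) * |t - t_u|^beta, with beta = 0.4 as an assumed default. The hypothetical `predict` helper keeps only the global mean, the two biases and the factor dot product; it omits the implicit-feedback term and the time-dependent biases of the full model.

```python
import math
from typing import Dict, List, Sequence


def dev(t: float, t_mean: float, beta: float = 0.4) -> float:
    """Assumed form: dev_u(t) = sign(t - t_u) * |t - t_u| ** beta."""
    d = t - t_mean
    return math.copysign(abs(d) ** beta, d)


def user_factor_at(
    a_u: Sequence[float],
    alpha_u: Sequence[float],
    a_ut_by_day: Dict[int, Sequence[float]],
    t: int,
    t_mean: float,
    beta: float = 0.4,
) -> List[float]:
    """Time-dependent user factor a_u(t) = a_u + alpha_u * dev_u(t) + a_{u,t}."""
    shift = dev(t, t_mean, beta)
    # a_{u,t} is only learned for days the user actually rated; default to zero otherwise.
    a_ut = a_ut_by_day.get(t, [0.0] * len(a_u))
    return [base + slope * shift + day for base, slope, day in zip(a_u, alpha_u, a_ut)]


def predict(mu: float, b_u: float, b_i: float,
            q_i: Sequence[float], a_u_t: Sequence[float]) -> float:
    """Simplified rating estimate: baseline plus q_i . a_u(t) (hypothetical helper)."""
    return mu + b_u + b_i + sum(q * a for q, a in zip(q_i, a_u_t))
```

Passing `beta=1.0` makes the drift term exactly linear in t - t_u, matching the "captures linear changes in time" bullet; a day with no learned a_{u,t} contributes nothing beyond the baseline and drift.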
> *2. Stacked ridge regression*
> • Diminishing returns from optimizing a single algorithm
> • Different models capture different aspects of the data
> • Moral: Errors of different algorithms can cancel out
> • Treat the prediction errors of one algorithm as input “preferences” of a
> second algorithm
> • The second algorithm can learn to predict, and hence offset, the errors of
> the first
> • Improved accuracy
>
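A minimal sketch of the stacking idea, assuming the first-stage models have already produced predictions on a held-out set. Ridge regression then learns one blending weight per model via the closed form w = (X^T X + lambda I)^-1 X^T y. All function names are hypothetical, there is no intercept term (center the ratings or add a constant column if one is needed), and lambda > 0 keeps the system well conditioned. The residual variant described in the bullets would instead train the second model on the targets r - r_hat_1.

```python
from typing import List, Sequence


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def fit_ridge_blend(preds: Sequence[Sequence[float]],
                    ratings: Sequence[float], lam: float) -> List[float]:
    """preds[j][k] is base model k's prediction for held-out rating j."""
    k = len(preds[0])
    # Normal equations with the ridge penalty added to the diagonal.
    xtx = [[sum(p[a] * p[b] for p in preds) + (lam if a == b else 0.0)
            for b in range(k)] for a in range(k)]
    xty = [sum(p[a] * r for p, r in zip(preds, ratings)) for a in range(k)]
    return solve(xtx, xty)


def blend(weights: Sequence[float], base_preds: Sequence[float]) -> float:
    """Final prediction: weighted sum of the base models' predictions."""
    return sum(w * p for w, p in zip(weights, base_preds))
```

Larger `lam` shrinks the weights toward zero, which guards against overfitting the blend when the base models' predictions are highly correlated.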
> *3. KNN with user-optimized weights*
>
> I am trying to implement a CF algorithm with optimized weights and time
> dependency, and will update you.
> Am I going in the right direction?
> Can we make it more efficient by using stacked linear regression?
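For comparison, a plain similarity-weighted, user-based KNN prediction can be sketched as follows. The weights here are assumed to be cosine similarities over co-rated items, a hypothetical choice for illustration only. An "optimized weights" variant would instead learn the neighbor weights (for example by least squares on the user's known ratings) rather than fixing them from a similarity measure.

```python
import math
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def cosine(r_u: Dict[str, float], r_v: Dict[str, float]) -> float:
    """Cosine similarity restricted to the items both users rated."""
    common = set(r_u) & set(r_v)
    if not common:
        return 0.0
    dot = sum(r_u[i] * r_v[i] for i in common)
    nu = math.sqrt(sum(r_u[i] ** 2 for i in common))
    nv = math.sqrt(sum(r_v[i] ** 2 for i in common))
    return dot / (nu * nv) if nu and nv else 0.0


def predict_knn(ratings: Ratings, user: str, item: str, k: int = 2) -> Optional[float]:
    """Similarity-weighted average over the k most similar users who rated `item`."""
    candidates = [
        (cosine(ratings[user], r_v), r_v[item])
        for v, r_v in ratings.items()
        if v != user and item in r_v
    ]
    neighbors = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
    norm = sum(abs(w) for w, _ in neighbors)
    if norm == 0.0:
        return None  # no usable neighbors for this item
    return sum(w * r for w, r in neighbors) / norm
```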
I think someone else said that weighted KNN did not perform very well,
but I am not sure if they were referring to the same algorithm that you
are thinking of.
For a time-based model, please spend some time thinking about the right
abstraction for handling the data. There is a little more discussion
about that in this thread:
https://mailman.cc.gatech.edu/pipermail/mlpack/2016-March/000858.html
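Purely as an illustration of what a time-aware data abstraction might minimally need (hypothetical names, not mlpack's API): each rating carries a timestamp, and per-user quantities such as the mean rating day t_u, which dev_u(t) in TimeSVD++ depends on, can be derived from the records.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class TimedRating:
    user: int
    item: int
    rating: float
    day: int  # e.g. days since some fixed epoch


def mean_rating_day(data: Iterable[TimedRating]) -> Dict[int, float]:
    """Per-user mean rating day t_u."""
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for r in data:
        totals[r.user] += r.day
        counts[r.user] += 1
    return {u: totals[u] / counts[u] for u in totals}


def history_by_user(data: Iterable[TimedRating]) -> Dict[int, List[TimedRating]]:
    """Each user's ratings in chronological order."""
    out: Dict[int, List[TimedRating]] = defaultdict(list)
    for r in data:
        out[r.user].append(r)
    for hist in out.values():
        hist.sort(key=lambda r: r.day)
    return dict(out)
```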
But overall the idea of adding some new algorithms to do collaborative
filtering is just fine.
Thanks,
Ryan
--
Ryan Curtin | "Leave the gun. Take the cannoli."
ryan at ratml.org | - Clemenza