[mlpack-svn] [MLPACK] #363: use Armadillo diagonal GMM implementation when DiagonalConstraint is used with GMMs
MLPACK Trac
trac at coffeetalk-1.cc.gatech.edu
Mon Aug 18 13:16:07 EDT 2014
#363: use Armadillo diagonal GMM implementation when DiagonalConstraint is used
with GMMs
-----------------------------------------------------------+----------------
Reporter: rcurtin | Owner:
Type: wishlist | Status: new
Priority: trivial | Milestone:
Component: mlpack | Resolution:
Keywords: gmm, diagonal gmm, emfit, diagonalconstraint | Blocking:
Blocked By: |
-----------------------------------------------------------+----------------
Description changed by rcurtin:
Old description:
> The mlpack implementation of GMMs is hilariously inefficient. A user
> will write this code:
>
> {{{
> GMM<EMFit<KMeans, DiagonalConstraint> > gmm(dataset);
> }}}
>
> but what this code will actually do is run the regular full-covariance EM
> algorithm, and then set all non-diagonal elements of each covariance
> matrix to 0. This is obviously not the right way to do it.
>
> So the right thing to do is probably to specialize EMFit<...,
> DiagonalConstraint> so that it uses an algorithm specifically for
> diagonal covariances and not the entire covariance matrix.
>
> Armadillo now contains an implementation of diagonal GMMs, which might be
> useful for this specialization. If it will be used, one should ensure
> that the amount of memory being copied is minimal.
New description:
The mlpack implementation of GMMs is hilariously inefficient when a
diagonal covariance constraint is requested. A user will write this code:
{{{
GMM<EMFit<KMeans, DiagonalConstraint> > gmm(dataset);
}}}
but what this code will actually do is run the regular full-covariance EM
algorithm, and then set all non-diagonal elements of each covariance
matrix to 0 at the end of each EM iteration. This is obviously not the
right way to do it: every iteration computes and stores the off-diagonal
entries of each covariance matrix, only to discard them.
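To make the wasted work concrete, here is a minimal sketch of the covariance
update for a single mixture component, written against plain std::vector
rather than mlpack or Armadillo types. FullCovariance and DiagonalVariance
are hypothetical helper names, not part of mlpack. Given the same
responsibilities and mean, the diagonal of the full estimate equals the
directly computed per-dimension variance, so the O(d^2) version only adds
entries that are zeroed afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type for this sketch: one inner vector per observation.
using Matrix = std::vector<std::vector<double>>;

// Weighted full covariance of one component: O(n * d^2) time, d * d storage.
// points[i] is a d-dimensional observation and resp[i] its responsibility.
Matrix FullCovariance(const Matrix& points,
                      const std::vector<double>& resp,
                      const std::vector<double>& mean)
{
  assert(points.size() == resp.size());
  const std::size_t d = mean.size();
  Matrix cov(d, std::vector<double>(d, 0.0));
  double total = 0.0;
  for (std::size_t i = 0; i < points.size(); ++i)
  {
    total += resp[i];
    for (std::size_t r = 0; r < d; ++r)
      for (std::size_t c = 0; c < d; ++c)
        cov[r][c] += resp[i] * (points[i][r] - mean[r]) *
            (points[i][c] - mean[c]);
  }
  for (auto& row : cov)
    for (double& value : row)
      value /= total;
  return cov;
}

// Weighted per-dimension variance of one component: O(n * d) time,
// d storage. This is all a diagonal-covariance model needs.
std::vector<double> DiagonalVariance(const Matrix& points,
                                     const std::vector<double>& resp,
                                     const std::vector<double>& mean)
{
  assert(points.size() == resp.size());
  const std::size_t d = mean.size();
  std::vector<double> variances(d, 0.0);
  double total = 0.0;
  for (std::size_t i = 0; i < points.size(); ++i)
  {
    total += resp[i];
    for (std::size_t j = 0; j < d; ++j)
    {
      const double diff = points[i][j] - mean[j];
      variances[j] += resp[i] * diff * diff;
    }
  }
  for (double& value : variances)
    value /= total;
  return variances;
}
```
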
So the right thing to do is probably to specialize EMFit<...,
DiagonalConstraint> so that it uses an algorithm specifically for diagonal
covariances and not the entire covariance matrix.
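As an illustration of what such a specialization could exploit, the sketch
below evaluates the log-density of a Gaussian with diagonal covariance in
O(d) time, with no matrix inverse or determinant. It assumes the variances
are stored as a plain vector; DiagonalLogDensity is a hypothetical name, not
an existing mlpack or Armadillo function.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Log-density of N(x | mean, diag(variances)). With a diagonal covariance
// the density factorizes over dimensions, so the cost is O(d).
// Assumes every variance is strictly positive.
double DiagonalLogDensity(const std::vector<double>& x,
                          const std::vector<double>& mean,
                          const std::vector<double>& variances)
{
  assert(x.size() == mean.size() && x.size() == variances.size());
  const double log2Pi = std::log(2.0 * std::acos(-1.0));
  double logDensity = 0.0;
  for (std::size_t j = 0; j < x.size(); ++j)
  {
    const double diff = x[j] - mean[j];
    logDensity -= 0.5 * (log2Pi + std::log(variances[j]) +
        diff * diff / variances[j]);
  }
  return logDensity;
}
```

In an E-step, a value like this would be computed per point and per
component, then combined with the log mixture weights to obtain the
responsibilities.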
Armadillo now contains an implementation of diagonal GMMs, which might be
useful for this specialization. If it is used, one should ensure that as
little memory as possible is copied between the mlpack and Armadillo
representations of the model.
--
Ticket URL: <http://trac.research.cc.gatech.edu/fastlab/ticket/363#comment:1>
MLPACK <www.fast-lab.org>
MLPACK is an intuitive, fast, and scalable C++ machine learning library developed at Georgia Tech.