[mlpack-svn] [MLPACK] #221: GMM::Classify() is slow
MLPACK Trac
trac at coffeetalk-1.cc.gatech.edu
Tue Mar 27 12:39:25 EDT 2012
#221: GMM::Classify() is slow
-------------------------+--------------------------------------------------
Reporter: rcurtin | Owner:
Type: enhancement | Status: new
Priority: trivial | Milestone: mlpack 1.1.0
Component: mlpack | Keywords: gmm classify slow probability covariance inversion
Blocking: | Blocked By:
-------------------------+--------------------------------------------------
Currently, in gmm.cpp, `GMM::Classify()` is defined like so:
{{{
void GMM::Classify(const arma::mat& observations,
                   arma::Col<size_t>& labels) const
{
  // This is not the best way to do this!
  // We should not have to fill this with values, because each one should be
  // overwritten.
  labels.set_size(observations.n_cols);
  for (size_t i = 0; i < observations.n_cols; ++i)
  {
    // Find maximum probability component.
    double probability = 0;
    for (size_t j = 0; j < gaussians; ++j)
    {
      double newProb = Probability(observations.unsafe_col(i), j);
      if (newProb >= probability)
      {
        probability = newProb;
        labels[i] = j;
      }
    }
  }
}
}}}
This is an O(n*m) operation, where m is the number of Gaussians and n is
the number of points. It could be sped up greatly with the use of trees
-- using the means of the Gaussians as "reference points" and the phi()
function as the distance metric.
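To make the "phi() as the distance metric" idea concrete, here is a rough derivation rather than a statement about the mlpack code. It assumes the score for component j is w_j φ(x; μ_j, Σ_j); the symbols w_j (mixture weight), μ_j (mean) and Σ_j (covariance) are introduced here for illustration only. Under that assumption, picking the most probable component is the same as minimizing a Mahalanobis-type quantity:

```latex
\begin{aligned}
\arg\max_j \; w_j \, \phi(x; \mu_j, \Sigma_j)
  &= \arg\max_j \left[ \log w_j - \tfrac{1}{2} \log\det\Sigma_j
     - \tfrac{1}{2} (x - \mu_j)^{\top} \Sigma_j^{-1} (x - \mu_j) \right] \\
  &= \arg\min_j \left[ (x - \mu_j)^{\top} \Sigma_j^{-1} (x - \mu_j)
     + \log\det\Sigma_j - 2 \log w_j \right]
\end{aligned}
```

Because each Gaussian has its own Σ_j and its own additive offset, this quantity is not a true metric, so a tree-based search would presumably need pruning bounds that account for that.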
In addition, at the very least, calling `GMM::Probability()` once per point
and per Gaussian is slow, because each call inverts the covariance matrix
again: every covariance ends up being inverted n times instead of once. It
could be replaced with a better call, perhaps to one of the overloads of
`phi()` which takes many observations at once.
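As a minimal sketch of the "do the covariance work once" idea, the following standard-library-only C++ factors each covariance a single time (Cholesky) and then reuses the factor for every point. It is not the mlpack implementation: `Vec`, `Mat`, `Component`, `Precompute()`, `LogScore()` and the free function `Classify()` are hypothetical names standing in for the Armadillo types and GMM members, and it assumes each covariance is symmetric positive definite and that the per-component score includes the mixture weight.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-ins; the real code uses arma::vec / arma::mat.
using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // Square matrix stored as rows.

// Everything that depends only on the model, computed once per Gaussian.
struct Component
{
  Vec mean;
  Mat chol;        // Lower-triangular L with covariance = L * L^T.
  double logNorm;  // log(weight) - 0.5 * (d * log(2 pi) + log det(cov)).
};

// Factor the covariance once, instead of inverting it for every point.
// Assumes cov is symmetric positive definite (no error checking here).
Component Precompute(const Vec& mean, const Mat& cov, double weight)
{
  const std::size_t d = mean.size();
  Mat L(d, Vec(d, 0.0));
  double logDet = 0.0;
  for (std::size_t r = 0; r < d; ++r)
  {
    for (std::size_t c = 0; c <= r; ++c)
    {
      double sum = cov[r][c];
      for (std::size_t k = 0; k < c; ++k)
        sum -= L[r][k] * L[c][k];
      L[r][c] = (r == c) ? std::sqrt(sum) : sum / L[c][c];
    }
    logDet += 2.0 * std::log(L[r][r]);
  }
  const double twoPi = 2.0 * std::acos(-1.0);
  return { mean, L, std::log(weight) - 0.5 * (d * std::log(twoPi) + logDet) };
}

// Log of weight * Gaussian density at x, using the cached factor.
double LogScore(const Component& g, const Vec& x)
{
  const std::size_t d = x.size();
  Vec z(d);
  double maha = 0.0;
  // Forward substitution solves L z = x - mean; then z^T z is the
  // squared Mahalanobis distance (x - mean)^T cov^{-1} (x - mean).
  for (std::size_t r = 0; r < d; ++r)
  {
    double sum = x[r] - g.mean[r];
    for (std::size_t k = 0; k < r; ++k)
      sum -= g.chol[r][k] * z[k];
    z[r] = sum / g.chol[r][r];
    maha += z[r] * z[r];
  }
  return g.logNorm - 0.5 * maha;
}

// Label each point with the index of its highest-scoring component.
std::vector<std::size_t> Classify(const std::vector<Component>& model,
                                  const std::vector<Vec>& points)
{
  std::vector<std::size_t> labels(points.size(), 0);
  for (std::size_t i = 0; i < points.size(); ++i)
  {
    double best = -std::numeric_limits<double>::infinity();
    for (std::size_t j = 0; j < model.size(); ++j)
    {
      const double score = LogScore(model[j], points[i]);
      if (score > best)
      {
        best = score;
        labels[i] = j;
      }
    }
  }
  return labels;
}
```

This is still O(n*m) score evaluations, but the cubic-cost factorization happens m times in total rather than n*m times. Comparing log scores also avoids underflow for points far from every mean.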
--
Ticket URL: <http://trac.research.cc.gatech.edu/fastlab/ticket/221>
MLPACK <www.fast-lab.org>
MLPACK is an intuitive, fast, and scalable C++ machine learning library developed by the FASTLAB at Georgia Tech under Dr. Alex Gray.