[mlpack-svn] [MLPACK] #221: GMM::Classify() is slow

MLPACK Trac trac at coffeetalk-1.cc.gatech.edu
Tue Mar 27 12:39:25 EDT 2012


#221: GMM::Classify() is slow
-------------------------+--------------------------------------------------
 Reporter:  rcurtin      |        Owner:                                                    
     Type:  enhancement  |       Status:  new                                               
 Priority:  trivial      |    Milestone:  mlpack 1.1.0                                      
Component:  mlpack       |     Keywords:  gmm classify slow probability covariance inversion
 Blocking:               |   Blocked By:                                                    
-------------------------+--------------------------------------------------
 Currently, in gmm.cpp, `GMM::Classify()` is defined like so:

 {{{

 void GMM::Classify(const arma::mat& observations,
                    arma::Col<size_t>& labels) const
 {
   // This is not the best way to do this!

   // We should not have to fill this with values, because each one should be
   // overwritten.
   labels.set_size(observations.n_cols);
   for (size_t i = 0; i < observations.n_cols; ++i)
   {
     // Find maximum probability component.
     double probability = 0;
     for (size_t j = 0; j < gaussians; ++j)
     {
       double newProb = Probability(observations.unsafe_col(i), j);
       if (newProb >= probability)
       {
         probability = newProb;
         labels[i] = j;
       }
     }
   }
 }
 }}}

 This is an O(n*m) operation, where n is the number of points and m is the
 number of Gaussians.  It could be sped up greatly with the use of trees,
 using the means of the Gaussians as "reference points" and the `phi()`
 function as the distance metric.

 Short of that, a simpler change would already help.  Calling
 `GMM::Probability()` separately for every point and every component is
 slow, because each call inverts that component's covariance matrix again,
 so every matrix ends up being inverted n times.  The loop could instead
 call one of the overloads of `phi()` that takes many observations at once.
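
 As a rough illustration of the second point, the sketch below does the
 per-component work once and reuses it for every observation.  It is not
 mlpack code: it uses plain `std::vector` instead of Armadillo so that it
 compiles on its own, and `PreparedGaussian`, `Prepare()`,
 `LogWeightedDensity()`, and the free function `Classify()` are hypothetical
 names invented for this example.  It also swaps in two techniques the ticket
 does not mention: a Cholesky factor with forward substitution in place of an
 explicit inverse, and log-densities in place of raw probabilities.  It
 assumes each covariance is symmetric positive definite and that each
 component's score is its mixture weight times its Gaussian density.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // Row-major dense matrix.

// Hypothetical helper (not an mlpack type): quantities that depend only on
// the model, so they can be computed once instead of once per observation.
struct PreparedGaussian
{
  Vec mean;
  Mat chol;         // Lower-triangular L with covariance = L * L^T.
  double logConst;  // log(weight) - 0.5 * (d * log(2 * pi) + log|covariance|).
};

// Factor the covariance once.  Assumes it is symmetric positive definite.
PreparedGaussian Prepare(const Vec& mean, const Mat& cov, double weight)
{
  const std::size_t d = mean.size();
  Mat L(d, Vec(d, 0.0));
  double logDet = 0.0;
  for (std::size_t r = 0; r < d; ++r)
  {
    for (std::size_t c = 0; c <= r; ++c)
    {
      double sum = cov[r][c];
      for (std::size_t k = 0; k < c; ++k)
        sum -= L[r][k] * L[c][k];
      if (r == c)
      {
        assert(sum > 0.0);  // Fails if the covariance is not positive definite.
        L[r][r] = std::sqrt(sum);
      }
      else
      {
        L[r][c] = sum / L[c][c];
      }
    }
    logDet += 2.0 * std::log(L[r][r]);
  }
  const double pi = std::acos(-1.0);
  return { mean, L,
           std::log(weight) - 0.5 * (d * std::log(2.0 * pi) + logDet) };
}

// log(weight * N(x; mean, covariance)).  Solving L z = x - mean by forward
// substitution costs O(d^2) and gives the Mahalanobis term as z^T z.
double LogWeightedDensity(const PreparedGaussian& g, const Vec& x)
{
  const std::size_t d = x.size();
  Vec z(d);
  double mahalanobis = 0.0;
  for (std::size_t r = 0; r < d; ++r)
  {
    double sum = x[r] - g.mean[r];
    for (std::size_t k = 0; k < r; ++k)
      sum -= g.chol[r][k] * z[k];
    z[r] = sum / g.chol[r][r];
    mahalanobis += z[r] * z[r];
  }
  return g.logConst - 0.5 * mahalanobis;
}

// Label each observation with the component that scores highest.  Ties go to
// the later component, matching the >= comparison in the ticket's loop.
std::vector<std::size_t> Classify(const std::vector<PreparedGaussian>& model,
                                  const std::vector<Vec>& observations)
{
  std::vector<std::size_t> labels(observations.size(), 0);
  for (std::size_t i = 0; i < observations.size(); ++i)
  {
    double best = -std::numeric_limits<double>::infinity();
    for (std::size_t j = 0; j < model.size(); ++j)
    {
      const double logProb = LogWeightedDensity(model[j], observations[i]);
      if (logProb >= best)
      {
        best = logProb;
        labels[i] = j;
      }
    }
  }
  return labels;
}
```

 This is still O(n*m) density evaluations, so it does not replace the tree
 idea above; it only removes the repeated O(d^3) factorization from the
 inner loop.  Working in log space also avoids the case where every
 probability underflows to zero for a far-away point.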

-- 
Ticket URL: <http://trac.research.cc.gatech.edu/fastlab/ticket/221>
MLPACK <www.fast-lab.org>
MLPACK is an intuitive, fast, and scalable C++ machine learning library developed by the FASTLAB at Georgia Tech under Dr. Alex Gray.

