[mlpack-svn] [MLPACK] #299: Enhancement of PCA library

MLPACK Trac trac at coffeetalk-1.cc.gatech.edu
Wed Sep 11 16:26:59 EDT 2013


#299: Enhancement of PCA library
----------------------------+-----------------------------------------------
  Reporter:  sumedhghaisas  |        Owner:              
      Type:  enhancement    |       Status:  new         
  Priority:  major          |    Milestone:  mlpack 1.0.7
 Component:  mlpack         |   Resolution:              
  Keywords:                 |     Blocking:              
Blocked By:                 |  
----------------------------+-----------------------------------------------

Comment (by rcurtin):

 Hello Sumedh,

 Marcus Edel and I looked into using SVD for faster PCA some time ago.
 What we found was that there are three ways to do PCA:

  * SVD on the input data
  * eigendecomposition on the input data
  * eigendecomposition on the covariance matrix

 We found that none of these three approaches was consistently better.
 Sometimes SVD gave far better performance, and sometimes far worse, with
 no discernible pattern.  So we chose to leave it using eigendecomposition.
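
 For reference, here is a short sketch of why SVD on the data and
 eigendecomposition on the covariance matrix recover the same components.
 It assumes X is a d x n matrix holding n points as columns, already
 centered so that each row has zero mean; the symbols are mine and are not
 taken from the ticket.

```latex
\begin{aligned}
C &= \frac{1}{n-1} X X^{\top}, \qquad X = U \Sigma V^{\top} \\
C &= \frac{1}{n-1} U \Sigma V^{\top} V \Sigma^{\top} U^{\top}
   = U \left( \frac{\Sigma \Sigma^{\top}}{n-1} \right) U^{\top} \\
\lambda_i &= \frac{\sigma_i^{2}}{n-1}
\end{aligned}
```

 So the left singular vectors of X are the eigenvectors of C, and the
 eigenvalues are the squared singular values scaled by 1 / (n - 1).  The
 results agree up to sign and rounding; only the cost of getting there
 differs.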

 If you can show that for numerous datasets, SVD is faster, then I'll
 happily make the change.  :)

 Variance retention is a good idea.  I don't think function (2) makes sense
 though; the variance retained by each component is just its normalized
 eigenvalue, which is super easy to calculate if you call the overload of
 Apply() that returns eigVal.  Then just do:

 {{{
 // Multiply by 100 if you want a percentage.
 arma::vec varRetained = eigVal / sum(eigVal);
 }}}
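
 The one-liner above gives the fraction contributed by each component.  When
 deciding how many dimensions to keep, the running total is usually the
 number of interest.  Below is a minimal sketch of that, written against
 std::vector so it has no Armadillo dependency.  CumulativeVarRetained() is a
 hypothetical helper, not an mlpack function, and it assumes the eigenvalues
 are non-negative and sorted in descending order.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of mlpack): element k holds the fraction of
// total variance retained by the first k + 1 principal components.
// Assumes eigVal is non-negative and sorted in descending order.
std::vector<double> CumulativeVarRetained(const std::vector<double>& eigVal)
{
  const double total = std::accumulate(eigVal.begin(), eigVal.end(), 0.0);
  std::vector<double> cumulative(eigVal.size(), 0.0);
  if (total <= 0.0)
    return cumulative; // No variance at all; avoid dividing by zero.

  double running = 0.0;
  for (std::size_t i = 0; i < eigVal.size(); ++i)
  {
    assert(eigVal[i] >= 0.0); // Precondition: eigenvalues are non-negative.
    running += eigVal[i];
    cumulative[i] = running / total;
  }
  return cumulative;
}
```

 For eigenvalues 4, 3, 2, 1 this yields roughly 0.4, 0.7, 0.9, 1.0.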

 The second two functions are useful though.  I will get to work
 incorporating them into the codebase.

 We just need two tests.  The first should check that the Apply() overload
 that takes varRetained as a parameter functions correctly.  (Also, what
 should happen when varRetained is set to 0?  Do we return 1 dimension or
 0?  0 dimensions doesn't make sense...)  The second should check that the
 overload of Apply() that returns varRetained functions properly.  Both of
 these are easy tests to write.
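
 To make the edge case concrete, here is a rough sketch of how the number of
 dimensions could be picked for a given varRetained.  DimensionForVariance()
 is a hypothetical name, not the Apply() overload itself, and returning at
 least 1 dimension when varRetained is 0 is only one possible answer to the
 question above.  It also assumes the eigenvalues are sorted in descending
 order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not mlpack's API): smallest number of leading
// components whose eigenvalues account for at least varRetained of the total
// variance.  Assumes eigVal is non-empty, non-negative, and sorted in
// descending order.  Never returns 0; that choice is an assumption here.
std::size_t DimensionForVariance(const std::vector<double>& eigVal,
                                 const double varRetained)
{
  assert(!eigVal.empty());
  assert(varRetained >= 0.0 && varRetained <= 1.0);

  double total = 0.0;
  for (const double v : eigVal)
    total += v;

  double running = 0.0;
  std::size_t dim = 0;
  // Always take the first component, then continue until the target is met
  // or every component has been used.
  do
  {
    running += eigVal[dim];
    ++dim;
  } while (dim < eigVal.size() && running < varRetained * total);

  return dim;
}
```

 A test along these lines would check a mid-range value, then the two
 extremes: varRetained = 0 should give 1 dimension under this assumption,
 and varRetained = 1 should give all of them.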

-- 
Ticket URL: <http://trac.research.cc.gatech.edu/fastlab/ticket/299#comment:1>
MLPACK <www.fast-lab.org>
MLPACK is an intuitive, fast, and scalable C++ machine learning library developed at Georgia Tech.

