[mlpack-svn] [MLPACK] #327: PCA Module Improvement

MLPACK Trac trac at coffeetalk-1.cc.gatech.edu
Wed Mar 12 16:21:08 EDT 2014


#327: PCA Module Improvement
----------------------------+-----------------------------------------------
  Reporter:  siddharth.950  |        Owner:  birm    
      Type:  enhancement    |       Status:  accepted
  Priority:  major          |    Milestone:          
 Component:  mlpack         |   Resolution:          
  Keywords:                 |     Blocking:          
Blocked By:                 |  
----------------------------+-----------------------------------------------

Comment (by rcurtin):

 The MNIST dataset is a particularly hard one because the dimensionality is
 784, but many of the dimensions have very little power (that is, a lot of
 the eigenvalues will be quite small).  When the eigenvalues are small, it
 is more likely that the recovered eigenvector will point in a different
 direction than the true eigenvector.
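
 As a rough illustration of that sensitivity (a toy sketch, not mlpack
 code), take the 2x2 matrix diag(lam, 0) and add a small symmetric
 off-diagonal perturbation eps.  The function name and the choice of a 2x2
 example are assumptions made only for this illustration.  The smaller lam
 is relative to eps, the further the leading eigenvector rotates away from
 the first axis.

```python
import math


def rotation_angle(lam, eps):
    """Angle in radians between the leading eigenvector of [[lam, 0], [0, 0]]
    and the leading eigenvector of the perturbed matrix [[lam, eps], [eps, 0]].
    """
    # For a symmetric 2x2 matrix [[a, b], [b, d]], the leading eigenvector
    # makes an angle of 0.5 * atan2(2b, a - d) with the first axis.  The
    # unperturbed leading eigenvector lies exactly on the first axis.
    return 0.5 * math.atan2(2.0 * eps, lam)


if __name__ == "__main__":
    eps = 1e-3  # same perturbation size for every eigenvalue
    for lam in (1.0, 1e-2, 1e-4):
        angle = math.degrees(rotation_angle(lam, eps))
        print(f"eigenvalue {lam:g}: rotated by {angle:.2f} degrees")
```

 With the same perturbation, the direction belonging to a large eigenvalue
 barely moves, while the direction belonging to a tiny one can rotate by
 tens of degrees.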

 In practice, someone running PCA on a very high-dimensional dataset will
 rarely keep all the dimensions.  More likely, they would run PCA on a
 dataset like MNIST and keep maybe 50 dimensions.  Keeping only 50
 dimensions (or some other small number) is the case where this algorithm
 should shine and outperform the regular PCA implementation, so I don't
 think there are really many problems here.

 Can you do some runtime comparisons with the MNIST dataset when the new
 dimension is only 50, and see if it passes the tests for the first 50
 eigenvalues/eigenvectors?
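
 One possible shape for such a check, restricted to the first k eigenpairs,
 is sketched below.  This is not the existing mlpack test: the function
 name, the tolerance, and the list-of-lists layout are all hypothetical.
 It compares eigenvalues by relative error and eigenvectors by the absolute
 value of their cosine, so that a recovered eigenvector pointing in the
 opposite direction still counts as a match.

```python
import math


def first_k_match(true_vals, true_vecs, est_vals, est_vecs, k, tol=1e-3):
    """Return True if the first k eigenpairs agree, ignoring eigenvector sign.

    Eigenvalues are lists sorted in decreasing order; eigenvectors are lists
    of equal-length lists, one eigenvector per eigenvalue.
    """
    for i in range(k):
        # Relative error on the eigenvalue (guarded against a zero reference).
        if abs(true_vals[i] - est_vals[i]) > tol * max(abs(true_vals[i]), 1e-12):
            return False
        u, v = true_vecs[i], est_vecs[i]
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        if norm == 0.0:
            return False
        # |cos(angle)| is 1 when v is parallel to u or to -u.
        if 1.0 - abs(dot) / norm > tol:
            return False
    return True


if __name__ == "__main__":
    vals = [4.0, 1.0, 1e-6]
    vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # The estimate flips the sign of the first eigenvector and gets the
    # third (tiny-eigenvalue) pair wrong.
    est_vals = [4.0, 1.0, 2e-6]
    est_vecs = [[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
    print(first_k_match(vals, vecs, est_vals, est_vecs, k=2))
    print(first_k_match(vals, vecs, est_vals, est_vecs, k=3))
```

 In this made-up example the check passes when only the two leading pairs
 are compared and fails once the pair with the tiny eigenvalue is included.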

-- 
Ticket URL: <http://trac.research.cc.gatech.edu/fastlab/ticket/327#comment:17>
MLPACK <www.fast-lab.org>
MLPACK is an intuitive, fast, and scalable C++ machine learning library developed at Georgia Tech.

