[mlpack-svn] [MLPACK] #327: PCA Module Improvement
MLPACK Trac
trac at coffeetalk-1.cc.gatech.edu
Wed Mar 12 16:21:08 EDT 2014
#327: PCA Module Improvement
----------------------------+-----------------------------------------------
Reporter: siddharth.950 | Owner: birm
Type: enhancement | Status: accepted
Priority: major | Milestone:
Component: mlpack | Resolution:
Keywords: | Blocking:
Blocked By: |
----------------------------+-----------------------------------------------
Comment (by rcurtin):
The MNIST dataset is a particularly hard one because the dimensionality is
784, but many of the dimensions have very little power (that is, there
will be a lot of eigenvalues that are quite small). When the eigenvalues
are small, it is more likely that the recovered eigenvector will point in
a different direction than the true eigenvector.
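One way to see the effect is the 2x2 symmetric case, where the direction of
the leading eigenvector has a closed form. The sketch below is only an
illustration, not mlpack code: the function name leading_axis_angle and the
numbers are made up, and the off-diagonal entry b stands in for whatever
error the approximate method introduces.

```python
import math

def leading_axis_angle(a, b, d):
    """Angle (radians) between the x-axis and the leading eigenvector
    of the symmetric matrix [[a, b], [b, d]].

    Uses the standard identity tan(2 * theta) = 2b / (a - d).
    """
    return 0.5 * math.atan2(2.0 * b, a - d)

# Hypothetical perturbation of size 1e-3 on the off-diagonal entry.
noise = 1e-3

# Large, well-separated eigenvalues: the eigenvector barely moves
# (roughly 2e-5 rad away from the x-axis).
big = leading_axis_angle(100.0, noise, 50.0)

# Small eigenvalues of the same order as the perturbation: the
# eigenvector swings by about 0.55 rad (over 30 degrees).
small = leading_axis_angle(2e-3, noise, 1e-3)

print(big, small)
```

The same perturbation is harmless in the first case and dominant in the
second, which is roughly what happens in the low-power dimensions of MNIST.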
In practice, a practitioner rarely runs PCA on a very high-dimensional
dataset and keeps all the dimensions. More likely, they would run PCA on a
dataset like MNIST and keep maybe 50 dimensions. Keeping only 50
dimensions (or some other small number) is exactly the case where this
algorithm should shine and outperform the regular PCA implementation, so I
don't think there are really many problems here.
Can you do some runtime comparisons on the MNIST dataset when the new
dimensionality is only 50, and see if it passes the tests for the first 50
eigenvalues/eigenvectors?
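Since an eigenvector is only determined up to sign, a check on the first 50
eigenpairs probably has to tolerate sign flips. A minimal sketch of such a
check follows. It is plain Python, not the existing mlpack test; the name
top_k_match and the tolerance are hypothetical, and it assumes the
eigenvalues are sorted in descending order with unit-length eigenvectors.

```python
def top_k_match(ref_vals, ref_vecs, new_vals, new_vecs, k, tol=1e-5):
    """Return True if the first k eigenpairs agree to within tol.

    ref_vecs and new_vecs are lists of unit-length vectors, where
    vector i belongs to eigenvalue i.
    """
    for i in range(k):
        # Relative comparison of eigenvalues (absolute near zero).
        scale = max(1.0, abs(ref_vals[i]))
        if abs(ref_vals[i] - new_vals[i]) > tol * scale:
            return False
        # v and -v are the same eigenvector, so compare |cos(angle)|.
        dot = sum(a * b for a, b in zip(ref_vecs[i], new_vecs[i]))
        if abs(abs(dot) - 1.0) > tol:
            return False
    return True

# Toy data: the first vector is sign-flipped, the third one is off.
ref_vals = [3.0, 2.0, 1e-9]
ref_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
new_vals = [3.0, 2.0, 1e-9]
new_vecs = [[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.6, 0.8]]

print(top_k_match(ref_vals, ref_vecs, new_vals, new_vecs, k=2))
print(top_k_match(ref_vals, ref_vecs, new_vals, new_vecs, k=3))
```

With k=2 the toy data passes despite the sign flip; with k=3 it fails on the
eigenvector tied to the tiny eigenvalue, which is the situation described
above.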
--
Ticket URL: <http://trac.research.cc.gatech.edu/fastlab/ticket/327#comment:17>
MLPACK <www.fast-lab.org>
MLPACK is an intuitive, fast, and scalable C++ machine learning library developed at Georgia Tech.