[mlpack] Rotate new data with Kernel PCA

Ryan Curtin gth671b at mail.gatech.edu
Wed Apr 2 19:05:08 EDT 2014


On Wed, Apr 02, 2014 at 10:45:38PM +0000, dslate at speakeasy.net wrote:
> Hi,
> 
> I am quite experienced with predictive analytics and machine learning, but
> I'm a newbie to mlpack and this mailing list, so please forgive me if I'm
> not posting my question to the right place. 
> 
> I would like to call mlpack's kpca::KernelPCA facility from a C++ program
> to perform kernel PCA analysis on the feature matrix for some "training"
> data for the purposes of dimensionality reduction, and then apply the
> resulting rotation on some new "test" data.  I've done this kind of thing
> with regular (linear) PCA in R, OpenCV, etc., using a "predict" or
> "project" method, but I have been unable to figure out how to do the
> equivalent operation using mlpack, and I can't seem to find any examples
> of this.  The documentation of the various versions of the Apply method
> seems to always involve running the KernelPCA analysis on some data and
> then transforming that same data; I see no way to apply the results to
> new data.
> 
> Can anyone give an example of how to do this?

Hi Dave,

What you're trying to do is apply the nonlinear mappings of kernel PCA
to data other than what the kernel matrix was calculated on.  For
regular PCA, this is easily possible because the eigenvectors are
calculated in the input space that your points live in.  Then you just
use those eigenvectors and project your new data onto them.  This is
pretty straightforward in mlpack.
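
To make the linear case concrete, here is a minimal sketch in plain C++
(standard library only, not mlpack's API).  The names ProjectLinear,
trainMean, and components are made up for illustration; the sketch assumes
you already have the training mean and the principal directions from a PCA
fit on the training data.

```cpp
#include <cstddef>
#include <vector>

using Vector = std::vector<double>;
using Matrix = std::vector<Vector>;

// Project one new point onto principal directions learned from training data.
// Assumptions (hypothetical names, not an mlpack interface):
//   trainMean      - per-dimension mean of the training data
//   components[k]  - k-th principal direction (eigenvector) in input space
Vector ProjectLinear(const Vector& x,
                     const Vector& trainMean,
                     const Matrix& components)
{
  Vector projected(components.size(), 0.0);
  for (std::size_t k = 0; k < components.size(); ++k)
  {
    // Dot product of the k-th direction with the centered point.
    for (std::size_t d = 0; d < x.size(); ++d)
      projected[k] += components[k][d] * (x[d] - trainMean[d]);
  }
  return projected;
}
```

The important detail is that the new point is centered with the training
mean, not with its own mean, before it is multiplied by the eigenvectors.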

However, for kernel PCA, this is less simple.  Kernel PCA
eigendecomposes a matrix that is built in the kernel space, not the
input space.  Its eigenvectors have one entry per training point rather
than one per input dimension, so, in general, you can't take the
eigenvectors produced by that eigendecomposition and multiply them with
your new data to get nonlinearly mapped test data.
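
A small sketch of why the shapes don't line up, again in plain C++ rather
than mlpack code.  The Gaussian kernel, the bandwidth parameter, and the
function names are assumptions chosen for illustration.  For n training
points the kernel matrix is n x n no matter how many input dimensions the
points have, so its eigenvectors are indexed by training point.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Vector = std::vector<double>;
using Matrix = std::vector<Vector>;

// Gaussian (RBF) kernel between two points.  `bandwidth` is an illustrative
// parameter; any positive definite kernel could be used instead.
double GaussianKernel(const Vector& a, const Vector& b, double bandwidth)
{
  double sqDist = 0.0;
  for (std::size_t d = 0; d < a.size(); ++d)
  {
    const double diff = a[d] - b[d];
    sqDist += diff * diff;
  }
  return std::exp(-sqDist / (2.0 * bandwidth * bandwidth));
}

// Build the n x n kernel matrix for n training points (points[i] is one
// point).  Kernel PCA eigendecomposes a centered version of this matrix.
Matrix BuildKernelMatrix(const Matrix& points, double bandwidth)
{
  const std::size_t n = points.size();
  Matrix k(n, Vector(n, 0.0));
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      k[i][j] = GaussianKernel(points[i], points[j], bandwidth);
  return k;
}
```

Three points in two dimensions give a 3 x 3 matrix here, so each
eigenvector has three entries and cannot be multiplied directly against a
two-dimensional test point.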

I think that what you are trying to do is possible, and it is detailed in
this paper:

https://papers.nips.cc/paper/2461-out-of-sample-extensions-for-lle-isomap-mds-eigenmaps-and-spectral-clustering.pdf

Unfortunately, mlpack doesn't have that support implemented.  As far as
I can tell, kernlab in R does support this functionality.  I am not
sure exactly what they have implemented to map new points, but it's
probably the same thing as the paper above (or a variant thereof).  I'll
probably open a bug in the next few days to implement that support, but
it almost certainly won't be implemented in the short-term...
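
If you want to roll it yourself in the meantime, the projection step is
short once you have the eigendecomposition.  What follows is a sketch of
the standard kernel PCA projection of a new point, which I believe is in
the same spirit as the paper's out-of-sample formula; it is not mlpack
code and I have not checked it against kernlab.  All names are
hypothetical.  It assumes kTrain is the uncentered n x n training kernel
matrix, kNew[i] is the kernel evaluated between the new point and training
point i, and alpha[k] is the k-th eigenvector of the centered training
kernel matrix divided by the square root of its eigenvalue.

```cpp
#include <cstddef>
#include <vector>

using Vector = std::vector<double>;
using Matrix = std::vector<Vector>;

// Center the kernel values between one new point and the n training points
// using the statistics of the uncentered training kernel matrix:
//   centered[i] = kNew[i] - mean(kNew) - mean(kTrain column i) + mean(kTrain)
Vector CenterNewKernelRow(const Vector& kNew, const Matrix& kTrain)
{
  const std::size_t n = kTrain.size();
  const double nd = static_cast<double>(n);

  double newMean = 0.0;
  for (double v : kNew)
    newMean += v;
  newMean /= nd;

  Vector colMean(n, 0.0);
  double totalMean = 0.0;
  for (std::size_t i = 0; i < n; ++i)
  {
    for (std::size_t j = 0; j < n; ++j)
    {
      colMean[j] += kTrain[i][j];
      totalMean += kTrain[i][j];
    }
  }
  for (std::size_t j = 0; j < n; ++j)
    colMean[j] /= nd;
  totalMean /= nd * nd;

  Vector centered(n, 0.0);
  for (std::size_t i = 0; i < n; ++i)
    centered[i] = kNew[i] - newMean - colMean[i] + totalMean;
  return centered;
}

// Map one new point into the kernel PCA space.  alpha[k] is assumed to be
// the k-th eigenvector of the centered training kernel matrix, scaled by
// 1 / sqrt(eigenvalue).
Vector ProjectKernel(const Vector& kNew,
                     const Matrix& kTrain,
                     const Matrix& alpha)
{
  const Vector centered = CenterNewKernelRow(kNew, kTrain);
  Vector projected(alpha.size(), 0.0);
  for (std::size_t k = 0; k < alpha.size(); ++k)
    for (std::size_t i = 0; i < centered.size(); ++i)
      projected[k] += alpha[k][i] * centered[i];
  return projected;
}
```

A quick sanity check is to use the linear kernel, where the result should
agree with ordinary PCA: for one-dimensional training points 0 and 2 and
the new point 4, the sketch gives a projection of 3, which is the new
point minus the training mean.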

Sorry that I don't have a better answer or solution to your problem. :-\

Thanks,

Ryan

-- 
Ryan Curtin    | "Gentlemen, you can't fight in here!  This is the
ryan at ratml.org | War Room!" - President Muffley

