[mlpack-svn] [MLPACK] #280: Kernel PCA unexpected results
MLPACK Trac
trac at coffeetalk-1.cc.gatech.edu
Tue May 7 09:36:03 EDT 2013
#280: Kernel PCA unexpected results
---------------------+------------------------------------------------------
Reporter: marcus | Owner:
Type: defect | Status: new
Priority: trivial | Milestone:
Component: mlpack | Keywords:
Blocking: | Blocked By:
---------------------+------------------------------------------------------
Hello,
I've tested the Kernel PCA code and got some unexpected results, so I
looked into the code (kernel_pca_impl.hpp). I don't think it is correct to
compute the covariance matrix first (arma::mat transData = ccov(data);)
and then compute the Kernel Matrix from it. Maybe I got something wrong,
but consider the data points vec(x) and vec(y) in the input space
{{{I = R^n}}}. When we map the data non-linearly into a feature space F by
{{{
Phi: R^n -> F, x -> Phi(x)
}}}
the covariance matrix in F is never available explicitly, so we cannot
perform its eigendecomposition directly as we would in linear PCA.
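For reference, the standard way around this (the kernel centering trick
from Schölkopf, Smola, and Müller's kernel PCA formulation) expresses the
centering entirely in terms of the kernel matrix K, with
K_ij = k(x_i, x_j). With N points and 1_N denoting the N x N matrix whose
entries are all 1/N, the centered kernel matrix is
{{{
K~ = K - 1_N*K - K*1_N + 1_N*K*1_N
}}}
which is what the kernelMatCenter computation in the code below is meant
to implement.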
I've written some code which provides the expected results. Maybe someone
can explain if I made a wrong assumption.
{{{
#include <mlpack/core.hpp>
#include <mlpack/core/kernels/gaussian_kernel.hpp>
#include "kernel_pca.hpp"
using namespace mlpack;
using namespace mlpack::kpca;
using namespace mlpack::kernel;
int main(int argc, char** argv)
{
arma::mat data, transformedData, eigvec;
arma::vec eigVal;
// Load the data.
data::Load("circle.txt", data, true);
// Using the Gaussian Kernel to construct the Kernel Matrix.
GaussianKernel kernel;
arma::mat kernelMat = GetKernelMatrix(kernel, trans(data));
// For kernel PCA the data has to be centered in the feature space.
// Even if the input data is centered, there is no guarantee that the
// mapped data is also centered; and since we never actually work in
// the feature space, we cannot center the mapped data directly.
// Since centered data is required for an effective principal component
// analysis, we center implicitly by transforming the Kernel Matrix.
// oneMat is the N x N matrix whose entries are all 1/N.
arma::mat oneMat = arma::ones<arma::mat>(kernelMat.n_rows,
kernelMat.n_cols) / kernelMat.n_rows;
arma::mat kernelMatCenter = kernelMat - oneMat * kernelMat -
kernelMat * oneMat + oneMat * kernelMat * oneMat;
// Compute eigenvectors and the corresponding eigenvalues.
arma::eig_sym(eigVal, eigvec, kernelMatCenter);
// The eigenvectors and the corresponding eigenvalues are
// already sorted but in the wrong order.
// Since descend is required, we reverse the eigenvectors
// and the corresponding eigenvalues.
// To avoid temporary matrices we use swap.
int n_eigVal = eigVal.n_elem;
for(int i = 0; i < floor(n_eigVal / 2.0); i++)
eigVal.swap_rows(i, (n_eigVal - 1) - i);
eigvec = arma::fliplr(eigvec);
// Dimension of output data.
size_t dim = 2;
// Projecting the data in lower dimensions.
transformedData = eigvec.submat(0, 0, eigvec.n_rows - 1, dim - 1).t() *
kernelMatCenter.t();
std::cout << transformedData.t() << std::endl;
return 0;
}
}}}
I've attached the data set and the plots of the results.
Thanks and regards,
Marcus
--
Ticket URL: <http://trac.research.cc.gatech.edu/fastlab/ticket/280>
MLPACK <www.fast-lab.org>
MLPACK is an intuitive, fast, and scalable C++ machine learning library developed by the FASTLAB at Georgia Tech under Dr. Alex Gray.
More information about the mlpack-svn mailing list