[mlpack-svn] [MLPACK] #242: LARS produces NaNs when the data matrix has duplicate features
MLPACK Trac
trac at coffeetalk-1.cc.gatech.edu
Tue Aug 14 14:53:30 EDT 2012
#242: LARS produces NaNs when the data matrix has duplicate features
------------------------------+---------------------------------------------
Reporter: niche | Owner: niche
Type: defect | Status: new
Priority: major | Milestone:
Component: armadillo sparse | Keywords: lars, sparse coding
Blocking: | Blocked By:
------------------------------+---------------------------------------------
When the data matrix (points as rows) contains two columns that are nearly
identical (that is, their difference has a very small norm), and both of
these columns are used along the path to the optimal solution,
CholeskyInsert ends up trying to solve a rank-deficient linear system and
produces NaNs. I have not yet verified whether this problem also occurs
when the useCholesky option is set to false.
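To make the failure mode concrete, here is a minimal sketch in plain Python
(not mlpack code; the helper name cholesky_insert_pivot is hypothetical). It
assumes the usual way of extending a Cholesky factor of the Gram matrix when
a second column joins a one-column set, and shows that an exact duplicate
makes the new squared diagonal entry zero. Any later triangular solve would
then divide by zero, which in floating-point C++ surfaces as NaN or Inf.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cholesky_insert_pivot(active_col, new_col):
    """Squared diagonal entry that new_col would add to the Cholesky factor
    of the Gram matrix of [active_col, new_col] (illustrative helper)."""
    l11 = math.sqrt(dot(active_col, active_col))   # existing 1x1 factor
    l21 = dot(active_col, new_col) / l11           # new off-diagonal entry
    return dot(new_col, new_col) - l21 * l21       # new diagonal, squared

x1 = [3.0, 4.0]
duplicate = [3.0, 4.0]    # same feature appearing twice
distinct = [4.0, -3.0]    # orthogonal feature, for comparison

pivot_dup = cholesky_insert_pivot(x1, duplicate)
pivot_ok = cholesky_insert_pivot(x1, distinct)
print(pivot_dup, pivot_ok)
```

For the duplicate column the squared pivot is 0.0, so the factor is singular;
for the orthogonal column it is simply the column's squared norm.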
This bug also affects SparseCoding, which uses LARS to compute the sparse
codes.
Typically, the data matrix (or the dictionary, in the case of sparse
coding) should not have two columns that are very close together, since
the stability of the method is then in question. Nevertheless, it would be
nice for LARS to prefer one of the columns/features rather than going into
NaN hell.
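One possible shape for "prefer one of the columns" is sketched below in
plain Python. This is only an illustration under stated assumptions, not the
mlpack implementation or a proposed patch: the function try_insert, its
list-of-rows factor layout, and the relative tolerance tol are all
hypothetical. The idea is to check the new squared pivot before extending
the factor and to skip a column that is (numerically) in the span of the
columns already accepted.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def try_insert(active_cols, chol_rows, new_col, tol=1e-10):
    """Extend the lower-triangular factor chol_rows of the Gram matrix of
    active_cols with new_col. Returns True if the column was added, False
    if it was skipped as (nearly) linearly dependent. tol is an assumed
    relative threshold, not a value taken from mlpack."""
    # Forward substitution: solve L w = X_active^T new_col.
    w = []
    for i, col in enumerate(active_cols):
        s = dot(col, new_col) - sum(chol_rows[i][j] * w[j] for j in range(i))
        w.append(s / chol_rows[i][i])
    norm_sq = dot(new_col, new_col)
    pivot_sq = norm_sq - dot(w, w)
    if pivot_sq <= tol * norm_sq:
        return False                      # would make the factor singular
    active_cols.append(new_col)
    chol_rows.append(w + [math.sqrt(pivot_sq)])
    return True

active, factor = [], []
candidates = ([3.0, 4.0, 0.0], [3.0, 4.0, 0.0], [0.0, 0.0, 2.0])
results = [try_insert(active, factor, c) for c in candidates]
print(results, factor)
```

Here the second candidate duplicates the first and is rejected, while the
third is accepted, so the factor stays well defined. Whether skipping the
feature is the right behaviour for LARS itself is a separate design
question.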
--
Ticket URL: <https://trac.research.cc.gatech.edu/fastlab/ticket/242>
MLPACK <www.fast-lab.org>
MLPACK is an intuitive, fast, and scalable C++ machine learning library developed by the FASTLAB at Georgia Tech under Dr. Alex Gray.