[mlpack-svn] [MLPACK] #242: LARS produces NaNs when the data matrix has duplicate features

Tue Aug 14 14:53:30 EDT 2012

#242: LARS produces NaNs when the data matrix has duplicate features
------------------------------+---------------------------------------------
 Reporter:  niche             |        Owner:  niche              
     Type:  defect            |       Status:  new                
 Priority:  major             |    Milestone:                     
Component:  armadillo sparse  |     Keywords:  lars, sparse coding
 Blocking:                    |   Blocked By:                     
------------------------------+---------------------------------------------
 When the data matrix (points as rows) contains two columns that are nearly
 identical (the norm of their difference is very small), and both columns
 enter the active set along the path to the optimal solution, CholeskyInsert
 ends up solving a rank-deficient linear system and produces NaNs. I have not
 yet verified whether this problem also occurs when the useCholesky option is
 set to false.
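 The failure can be seen from the standard Cholesky-insert update, assuming
 CholeskyInsert follows it (this is a sketch of the textbook update, not a
 reading of the mlpack source). Let X_A hold the active columns, let R be upper
 triangular with R^T R = X_A^T X_A, and let x be the column being added:

```latex
R' = \begin{bmatrix} R & r \\ 0 & d \end{bmatrix}, \qquad
R^\top r = X_A^\top x, \qquad
d = \sqrt{x^\top x - r^\top r}.

\text{If } x = X_A e_j \text{ (a duplicate of active column } j\text{):}\quad
R^\top r = X_A^\top X_A e_j = R^\top R e_j \;\Rightarrow\; r = R e_j,

r^\top r = e_j^\top R^\top R e_j = (X_A e_j)^\top (X_A e_j) = x^\top x
\;\Rightarrow\; d^2 = 0.
```

 In floating point, d^2 comes out as a tiny number of either sign: a negative
 value makes the square root NaN, and a zero or near-zero d turns the next
 triangular solve (which divides by d) into Inf/NaN. Nearly identical columns
 behave the same way, with d^2 dominated by rounding error.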

 This bug also affects SparseCoding, which uses LARS for the sparse codes
 computation step.

 Typically, the data matrix (or dictionary, in the case of sparse coding)
 should not have two columns that are very close together, since that calls
 the numerical stability of the method into question. Nevertheless, it would
 be better for LARS to prefer one of the two columns/features rather than
 descend into NaN hell.
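 A minimal sketch of one way to prefer one column over its near-duplicate:
 when growing the Cholesky factor, check the squared new diagonal entry
 x^T x - r^T r and refuse the column if it is not clearly positive, instead of
 taking its square root. It uses plain std::vector rather than Armadillo so it
 is self-contained; the function name choleskyInsert, the relative tolerance
 tol, and the reject-by-returning-false policy are all hypothetical and are
 not mlpack's API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // R[i][j], upper triangular, row-major

double dot(const Vec& a, const Vec& b) {
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}

// Hypothetical helper, not mlpack's CholeskyInsert. Precondition:
// R^T R == X_A^T X_A, where X_A's columns are the vectors in `active`.
// Grows R to cover candidate column x and returns true, or returns false
// (leaving R untouched) when x is numerically in the span of `active`.
bool choleskyInsert(Mat& R, const std::vector<Vec>& active, const Vec& x,
                    double tol = 1e-10) {
  const std::size_t k = active.size();
  assert(R.size() == k);
  // Forward substitution for R^T r = X_A^T x (R^T is lower triangular).
  Vec r(k);
  for (std::size_t i = 0; i < k; ++i) {
    double s = dot(active[i], x);
    for (std::size_t j = 0; j < i; ++j) s -= R[j][i] * r[j];
    r[i] = s / R[i][i];
  }
  // Squared new diagonal entry. For a duplicate column this is zero in exact
  // arithmetic and may be slightly negative after rounding, so an unguarded
  // std::sqrt here is where the NaNs would appear.
  const double xx = dot(x, x);
  const double d2 = xx - dot(r, r);
  if (d2 <= tol * xx) return false;  // d2 / xx = sin^2(angle to the span)
  // Append the new column [r; sqrt(d2)] to R.
  for (std::size_t i = 0; i < k; ++i) R[i].push_back(r[i]);
  R.push_back(Vec(k + 1, 0.0));
  R[k][k] = std::sqrt(d2);
  return true;
}
```

 A caller in the LARS loop would then skip the rejected feature, so the column
 already in the active set is the one that is preferred. With a = (1, 2, 3),
 inserting a once succeeds, inserting it a second time is rejected and leaves
 R unchanged, and inserting (1, 0, 0) afterwards succeeds. Measuring d2
 relative to x^T x makes the threshold independent of the column's scale.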

-- 
Ticket URL: <https://trac.research.cc.gatech.edu/fastlab/ticket/242>
MLPACK <www.fast-lab.org>
MLPACK is an intuitive, fast, and scalable C++ machine learning library developed by the FASTLAB at Georgia Tech under Dr. Alex Gray.