[mlpack-svn] [MLPACK] #242: LARS produces NaNs when the data matrix has duplicate features

MLPACK Trac trac at coffeetalk-1.cc.gatech.edu
Wed Jan 22 11:40:40 EST 2014


#242: LARS produces NaNs when the data matrix has duplicate features
----------------------------------+-----------------------------------------
  Reporter:  niche                |        Owner:  niche       
      Type:  defect               |       Status:  closed      
  Priority:  major                |    Milestone:  mlpack 1.1.0
 Component:  mlpack               |   Resolution:  fixed       
  Keywords:  lars, sparse coding  |     Blocking:              
Blocked By:                       |  
----------------------------------+-----------------------------------------
Changes (by rcurtin):

  * status:  new => closed
  * resolution:  => fixed


Comment:

 A fix from Michael Fox in r16154 solves the issue.  In the Cholesky case,
 checking for rank deficiency turns out to be very simple; in the
 non-Cholesky case, one can call arma::solve() and check its return value.
 When a linearly dependent feature is discovered, it is added to a set of
 ignored features (which is empty when the data matrix has full rank) and
 is not added to the generated model (i.e., the beta parameter for that
 feature is 0).
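
 For illustration only (this is not the code from r16154): one common way
 to detect a linearly dependent feature while maintaining a Cholesky factor
 of the active-set Gram matrix is to check the new diagonal entry that the
 update would produce.  If it is (numerically) zero, the candidate lies in
 the span of the features already in the model.  The names ActiveSet,
 TryAdd, and tol below are hypothetical, and plain std::vector is used
 instead of Armadillo to keep the sketch self-contained.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Inner product of two equal-length vectors.
static double Dot(const Vec& a, const Vec& b)
{
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
    s += a[i] * b[i];
  return s;
}

// Hypothetical helper (not mlpack's API): tracks the active features, the
// lower-triangular Cholesky factor L of their Gram matrix (L L^T = X_A^T X_A),
// and the indices of features ignored because they are linearly dependent.
struct ActiveSet
{
  std::vector<Vec> columns;          // Features currently in the model.
  std::vector<Vec> chol;             // Row i holds L(i, 0..i).
  std::vector<std::size_t> ignored;  // Indices of rejected features.

  // Try to add feature x with the given index.  Returns false, and records
  // the index in `ignored`, if x is (numerically) in the span of `columns`.
  bool TryAdd(std::size_t index, const Vec& x, double tol = 1e-10)
  {
    assert(columns.empty() || columns[0].size() == x.size());

    // Forward substitution: solve L z = X_A^T x.
    const std::size_t k = columns.size();
    Vec z(k, 0.0);
    for (std::size_t i = 0; i < k; ++i)
    {
      double s = Dot(columns[i], x);
      for (std::size_t j = 0; j < i; ++j)
        s -= chol[i][j] * z[j];
      z[i] = s / chol[i][i];
    }

    // Squared new diagonal entry of L; it equals the squared norm of the
    // part of x that is orthogonal to the active features.
    const double xx = Dot(x, x);
    const double d2 = xx - Dot(z, z);
    if (d2 <= tol * xx)
    {
      ignored.push_back(index);  // Rank-deficient: leave beta at 0.
      return false;
    }

    // Full rank: append the new row [z^T, sqrt(d2)] to L.
    z.push_back(std::sqrt(d2));
    chol.push_back(z);
    columns.push_back(x);
    return true;
  }
};
```

 With this sketch, adding a duplicate of a feature that is already active
 makes TryAdd() return false and leaves the Cholesky factor unchanged, so
 later steps never divide by a zero diagonal entry.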

 In r16155 I've adapted the test executable he gave into an actual test
 that's integrated with mlpack_test, to help ensure that future
 modifications to LARS still work when rank-deficient data is encountered.

 With this fix, the mlpack LARS implementation is consistent with the
 Hastie implementation of LARS in R in this particular case.

-- 
Ticket URL: <http://trac.research.cc.gatech.edu/fastlab/ticket/242#comment:2>
MLPACK <www.fast-lab.org>
MLPACK is an intuitive, fast, and scalable C++ machine learning library developed at Georgia Tech.

