[mlpack-svn] [MLPACK] #256: SGD converges to NaN, but only when the coordinates aren't printed at each iteration

MLPACK Trac trac at coffeetalk-1.cc.gatech.edu
Wed Oct 31 17:54:35 EDT 2012


#256: SGD converges to NaN, but only when the coordinates aren't printed at each
iteration
---------------------+------------------------------------------------------
 Reporter:  rcurtin  |        Owner:  rcurtin                                      
     Type:  defect   |       Status:  new                                          
 Priority:  major    |    Milestone:  mlpack 1.0.4                                 
Component:  mlpack   |     Keywords:  armadillo, nan, weird, optimization, sgd, nca
 Blocking:           |   Blocked By:                                               
---------------------+------------------------------------------------------
 This is not the first time an issue like this has occurred.  I can think
 of several times I have either been told of behavior like this or come
 across an example myself.  The crux of the issue is this:  the algorithm
 fails for no apparent reason.  When it is run in a debugger, all the
 values are correct and it works!  But outside of a debugger it doesn't
 work at all; weird things happen.  Worse yet, the problem seems to go
 away when the values of the "weird" variable are printed!

 Here is an example, using NCA at r13809:

 {{{
 :[ ryan @ trevelyan ]: $ /home/ryan/work/fastlab/mlpack/trunk/build-nodebug/bin/nca \
     -i ionosphere.train.csv -l ionosphere.labels.train.csv \
     -o nca-metric-2.csv -v
 [INFO ] Loading 'ionosphere.train.csv' as CSV data.
 [INFO ] Loading 'ionosphere.labels.train.csv' as CSV data.
 [INFO ] SGD: iteration 1, objective -201.416.
 [WARN ] Denominator of p_31 is 0!
 [WARN ] Denominator of p_32 is 0!
 [WARN ] Denominator of p_32 is 0!
 [WARN ] Denominator of p_33 is 0!
 [WARN ] Denominator of p_33 is 0!
 [WARN ] Denominator of p_34 is 0!
 [WARN ] Denominator of p_34 is 0!
 [WARN ] Denominator of p_35 is 0!
 [WARN ] Denominator of p_35 is 0!
 [WARN ] Denominator of p_36 is 0!
 [WARN ] Denominator of p_36 is 0!
 [WARN ] Denominator of p_37 is 0!
 [WARN ] Denominator of p_37 is 0!
 [WARN ] Denominator of p_38 is 0!
 [WARN ] Denominator of p_38 is 0!
 [WARN ] Denominator of p_39 is 0!
 [WARN ] Denominator of p_39 is 0!
 [WARN ] Denominator of p_40 is 0!
 [INFO ] SGD: iteration 247, objective -nan.
 [WARN ] SGD: converged to -nan; terminating with failure.  Try a smaller step size?
 [INFO ] Saving CSV data to 'nca-metric-2.csv'.
 [INFO ]
 [INFO ] Execution parameters:
 [INFO ]   help: false
 [INFO ]   info: ""
 [INFO ]   input_file: ionosphere.train.csv
 [INFO ]   labels_file: ionosphere.labels.train.csv
 [INFO ]   max_iterations: 500000
 [INFO ]   normalize: false
 [INFO ]   output_file: nca-metric-2.csv
 [INFO ]   seed: 0
 [INFO ]   step_size: 0.01
 [INFO ]   tolerance: 1e-07
 [INFO ]   verbose: true
 [INFO ]
 [INFO ] Program timers:
 [INFO ]   loading_data: 0.009918s
 [INFO ]   nca_sgd_optimization: 0.502951s
 [INFO ]   saving_data: 0.000486s
 [INFO ]   total_time: 0.513591s
 }}}
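
 For context on the "Denominator of p_i is 0!" warnings, here is a minimal
 sketch of how a zero softmax denominator can turn into NaN.  It assumes the
 usual NCA form, where each weight is exp(-squared distance) divided by the
 sum of those terms.  The function names are hypothetical and this is not
 the mlpack implementation; it only shows the arithmetic, and it does not
 explain why printing the coordinates changes the outcome.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not mlpack code): softmax weight of neighbor j, given
// the squared distances from one point to each candidate neighbor.
double SoftmaxProbability(const std::vector<double>& sqDists, std::size_t j)
{
  assert(j < sqDists.size());

  double denominator = 0.0;
  for (double d : sqDists)
    denominator += std::exp(-d);  // Underflows to 0 once d exceeds roughly 745.

  // If every term underflowed, this is 0.0 / 0.0, which is NaN under IEEE 754.
  return std::exp(-sqDists[j]) / denominator;
}

// Same computation, but a zero denominator yields 0 instead of NaN.
double GuardedSoftmaxProbability(const std::vector<double>& sqDists,
                                 std::size_t j)
{
  assert(j < sqDists.size());

  double denominator = 0.0;
  for (double d : sqDists)
    denominator += std::exp(-d);

  if (denominator == 0.0)
    return 0.0;  // Assumption: treat "no reachable neighbor" as probability 0.

  return std::exp(-sqDists[j]) / denominator;
}
```

 With squared distances of 800, 900, and 1000, every exponential underflows,
 so the unguarded version returns NaN and the guarded one returns 0.  Once a
 single NaN enters the gradient, every later SGD step carries it forward,
 which would be consistent with the objective reading -nan by iteration 247.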

 But now I add one line, which prints the coordinates, somewhere in the
 code that is visited every iteration (I chose
 `SoftmaxErrorFunction<MetricType>::Gradient()`):

 {{{
   Log::Info << coordinates << std::endl;
 }}}

 and now when we recompile we see entirely different results!

 {{{
 :[ ryan @ trevelyan ]: $ /home/ryan/work/fastlab/mlpack/trunk/build-
 nodebug/bin/nca -i ionosphere.train.csv -l ionosphere.labels.train.csv -o
 nca-metric-2.csv -v
 [INFO ] Loading 'ionosphere.train.csv' as CSV data.
 [INFO ] Loading 'ionosphere.labels.train.csv' as CSV data.
 [INFO ] SGD: iteration 1, objective -201.416.
 [INFO ] SGD: iteration 247, objective -196.461.
 [INFO ] SGD: iteration 493, objective -196.21.
 [INFO ] SGD: iteration 739, objective -203.255.
 [INFO ] SGD: iteration 985, objective -209.301.
 [INFO ] SGD: iteration 1231, objective -212.904.
 [INFO ] SGD: iteration 1477, objective -215.316.
 [INFO ] SGD: iteration 1723, objective -216.942.
 [INFO ] SGD: iteration 1969, objective -217.943.
 [INFO ] SGD: iteration 2215, objective -218.531.
 [INFO ] SGD: iteration 2461, objective -218.878.
 [INFO ] SGD: iteration 2707, objective -219.093.
 [INFO ] SGD: iteration 2953, objective -219.235.
 [INFO ] SGD: iteration 3199, objective -219.337.
 [INFO ] SGD: iteration 3445, objective -219.416.
 [INFO ] SGD: iteration 3691, objective -219.479.
 [INFO ] SGD: iteration 3937, objective -219.531.
 ...
 }}}

 While this particular instance is specific to SGD/NCA and it's what I'm
 working on right now (I don't need a reminder to finish it), I'm
 documenting what I find here because this kind of problem has been
 stumbled upon before and certainly will be stumbled upon again.
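
 Because printing the whole coordinate matrix changes the behavior, a
 lighter probe may help pin down the iteration at which the first bad value
 appears.  The following is a minimal sketch, assuming the coordinates can
 be copied or viewed as a flat array of doubles.  `FirstNonFinite` is a
 hypothetical helper, not part of mlpack or Armadillo, and any added check
 may itself perturb the behavior, just as the print statement did.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical diagnostic: returns the index of the first NaN or infinite
// entry, or values.size() if every entry is finite.
std::size_t FirstNonFinite(const std::vector<double>& values)
{
  for (std::size_t i = 0; i < values.size(); ++i)
  {
    if (!std::isfinite(values[i]))
      return i;
  }

  return values.size();
}
```

 Calling this once per iteration and logging only when the result is less
 than the array size keeps the output small, so a single line identifies
 the iteration and element where things first go wrong.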

-- 
Ticket URL: <http://trac.research.cc.gatech.edu/fastlab/ticket/256>
MLPACK <www.fast-lab.org>
MLPACK is an intuitive, fast, and scalable C++ machine learning library developed by the FASTLAB at Georgia Tech under Dr. Alex Gray.

