[mlpack-svn] [MLPACK] #256: SGD converges to NaN, but only when the coordinates aren't printed at each iteration
MLPACK Trac
trac at coffeetalk-1.cc.gatech.edu
Wed Oct 31 17:54:35 EDT 2012
#256: SGD converges to NaN, but only when the coordinates aren't printed at each iteration
---------------------+------------------------------------------------------
Reporter: rcurtin | Owner: rcurtin
Type: defect | Status: new
Priority: major | Milestone: mlpack 1.0.4
Component: mlpack | Keywords: armadillo, nan, weird, optimization, sgd, nca
Blocking: | Blocked By:
---------------------+------------------------------------------------------
This is not the only time an issue like this has occurred. In fact, I can
think of several times I have either been told of behavior like this or
come across an example myself. The crux of the issue is this: the
algorithm doesn't work, and there is no apparent reason why. When it is run
in a debugger, all the values are correct and it works! But outside of a
debugger, it doesn't work at all; weird things happen. Worse yet, the
problem seems to be solved by printing the values of the "weird" variable!
Here is an example, using NCA at r13809:
{{{
:[ ryan @ trevelyan ]: $ /home/ryan/work/fastlab/mlpack/trunk/build-nodebug/bin/nca -i ionosphere.train.csv -l ionosphere.labels.train.csv -o nca-metric-2.csv -v
[INFO ] Loading 'ionosphere.train.csv' as CSV data.
[INFO ] Loading 'ionosphere.labels.train.csv' as CSV data.
[INFO ] SGD: iteration 1, objective -201.416.
[WARN ] Denominator of p_31 is 0!
[WARN ] Denominator of p_32 is 0!
[WARN ] Denominator of p_32 is 0!
[WARN ] Denominator of p_33 is 0!
[WARN ] Denominator of p_33 is 0!
[WARN ] Denominator of p_34 is 0!
[WARN ] Denominator of p_34 is 0!
[WARN ] Denominator of p_35 is 0!
[WARN ] Denominator of p_35 is 0!
[WARN ] Denominator of p_36 is 0!
[WARN ] Denominator of p_36 is 0!
[WARN ] Denominator of p_37 is 0!
[WARN ] Denominator of p_37 is 0!
[WARN ] Denominator of p_38 is 0!
[WARN ] Denominator of p_38 is 0!
[WARN ] Denominator of p_39 is 0!
[WARN ] Denominator of p_39 is 0!
[WARN ] Denominator of p_40 is 0!
[INFO ] SGD: iteration 247, objective -nan.
[WARN ] SGD: converged to -nan; terminating with failure. Try a smaller step size?
[INFO ] Saving CSV data to 'nca-metric-2.csv'.
[INFO ]
[INFO ] Execution parameters:
[INFO ] help: false
[INFO ] info: ""
[INFO ] input_file: ionosphere.train.csv
[INFO ] labels_file: ionosphere.labels.train.csv
[INFO ] max_iterations: 500000
[INFO ] normalize: false
[INFO ] output_file: nca-metric-2.csv
[INFO ] seed: 0
[INFO ] step_size: 0.01
[INFO ] tolerance: 1e-07
[INFO ] verbose: true
[INFO ]
[INFO ] Program timers:
[INFO ] loading_data: 0.009918s
[INFO ] nca_sgd_optimization: 0.502951s
[INFO ] saving_data: 0.000486s
[INFO ] total_time: 0.513591s
}}}
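The zero-denominator warnings are consistent with underflow in the NCA softmax, although this ticket does not show mlpack's code. Assuming the standard NCA definition p_ij = exp(-||A x_i - A x_j||^2) / sum_{k != i} exp(-||A x_i - A x_k||^2), every term of the denominator becomes exactly 0.0 in double precision once all squared distances from x_i exceed roughly 745, and 0.0 / 0.0 is NaN; a single NaN in the gradient then spreads into the coordinates and the objective. The sketch below is illustrative only: `NaiveSoftmax` and `ShiftedSoftmax` are hypothetical names, and `distances` stands for the squared distances from one point to its candidate neighbors. The shifted version applies the standard max-shift (log-sum-exp) trick, a swapped-in technique rather than a confirmed fix for this ticket.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Naive softmax probability for neighbor j. If every distance is larger than
// about 745, each std::exp(-d) underflows to exactly 0.0 in double precision,
// so both numerator and denominator are 0.0 and the result is NaN.
double NaiveSoftmax(const std::vector<double>& distances, std::size_t j)
{
  double denominator = 0.0;
  for (double d : distances)
    denominator += std::exp(-d);
  return std::exp(-distances[j]) / denominator;
}

// Shifted softmax: subtracting the smallest distance from every exponent
// leaves the ratio mathematically unchanged, but the closest neighbor now
// contributes exp(0) = 1 to the denominator, so it can never underflow to 0.
double ShiftedSoftmax(const std::vector<double>& distances, std::size_t j)
{
  const double dmin = *std::min_element(distances.begin(), distances.end());
  double denominator = 0.0;
  for (double d : distances)
    denominator += std::exp(-(d - dmin));
  return std::exp(-(distances[j] - dmin)) / denominator;
}
```

A related detail: every comparison involving NaN is false, so a convergence test of the form `std::abs(previous - current) < tolerance` never fires on a NaN objective. An optimizer has to check `std::isnan` explicitly to stop with a message like the `converged to -nan` warning above.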
But now I add one line, which prints the coordinates, somewhere in the
code that is visited every iteration (I chose
`SoftmaxErrorFunction<MetricType>::Gradient()`):
{{{
Log::Info << coordinates << std::endl;
}}}
and now when we recompile we see entirely different results!
{{{
:[ ryan @ trevelyan ]: $ /home/ryan/work/fastlab/mlpack/trunk/build-nodebug/bin/nca -i ionosphere.train.csv -l ionosphere.labels.train.csv -o nca-metric-2.csv -v
[INFO ] Loading 'ionosphere.train.csv' as CSV data.
[INFO ] Loading 'ionosphere.labels.train.csv' as CSV data.
[INFO ] SGD: iteration 1, objective -201.416.
[INFO ] SGD: iteration 247, objective -196.461.
[INFO ] SGD: iteration 493, objective -196.21.
[INFO ] SGD: iteration 739, objective -203.255.
[INFO ] SGD: iteration 985, objective -209.301.
[INFO ] SGD: iteration 1231, objective -212.904.
[INFO ] SGD: iteration 1477, objective -215.316.
[INFO ] SGD: iteration 1723, objective -216.942.
[INFO ] SGD: iteration 1969, objective -217.943.
[INFO ] SGD: iteration 2215, objective -218.531.
[INFO ] SGD: iteration 2461, objective -218.878.
[INFO ] SGD: iteration 2707, objective -219.093.
[INFO ] SGD: iteration 2953, objective -219.235.
[INFO ] SGD: iteration 3199, objective -219.337.
[INFO ] SGD: iteration 3445, objective -219.416.
[INFO ] SGD: iteration 3691, objective -219.479.
[INFO ] SGD: iteration 3937, objective -219.531.
...
}}}
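Why would a print statement change the numbers at all? The ticket does not say. One well-known mechanism, offered here only as an assumption, applies if the build used x87 floating point (typical of 32-bit x86; the architecture is not stated above): intermediates held in 80-bit x87 registers have a far wider exponent range than a 64-bit double, and adding a call such as the `Log::Info` line can change which values the compiler keeps in registers and which it spills to memory as doubles. The hypothetical sketch below shows the effect on a softmax-style denominator; `ExtendedDenominator` and `StoredAsDouble` are illustrative names, not mlpack code.

```cpp
#include <cmath>

// Hypothetical illustration (not mlpack code): a softmax-style denominator of
// three tiny exponentials, accumulated in long double. On x86 Linux, long
// double is the 80-bit x87 extended format, whose exponent range extends to
// roughly 1e-4951, so each term here is tiny but nonzero.
long double ExtendedDenominator()
{
  long double sum = 0.0L;
  for (int k = 0; k < 3; ++k)
    sum += std::exp(-800.0L - k);  // long double overload of std::exp
  return sum;
}

// The same value once it is stored as a 64-bit double, as happens when a
// register is spilled to memory. Anything smaller than about 2.5e-324 (half
// the smallest subnormal double) rounds to exactly 0.0.
double StoredAsDouble(long double value)
{
  volatile double stored = static_cast<double>(value);  // force the rounding
  return stored;
}
```

If this mechanism were involved, the same source expression could produce a nonzero denominator in one build and exactly 0.0 in another, depending only on register allocation. With GCC, options such as `-mfpmath=sse -msse2` or the blunter `-ffloat-store` are commonly used to make such builds round consistently.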
This particular instance is specific to SGD/NCA, which is what I'm working
on right now (I don't need a reminder to finish it), but I'm documenting
what I find here because this kind of problem has been stumbled upon
before and will certainly be stumbled upon again.
--
Ticket URL: <http://trac.research.cc.gatech.edu/fastlab/ticket/256>
MLPACK <www.fast-lab.org>
MLPACK is an intuitive, fast, and scalable C++ machine learning library developed by the FASTLAB at Georgia Tech under Dr. Alex Gray.