[mlpack-svn] [MLPACK] #344: Using Welford method to calculate variance in Naive Bayes Classifier
MLPACK Trac
trac at coffeetalk-1.cc.gatech.edu
Tue Apr 15 09:51:34 EDT 2014
#344: Using Welford method to calculate variance in Naive Bayes Classifier
-----------------------------------+----------------------------------------
Reporter: akvah | Owner: rcurtin
Type: enhancement | Status: accepted
Priority: minor | Milestone:
Component: mlpack | Resolution:
Keywords: variance calculation | Blocking:
Blocked By: |
-----------------------------------+----------------------------------------
Comment (by akvah):
Hi Ryan,
Yes, the new code looks fine.
You are right about the running time: because the Welford update does a
division at each iteration, it takes much longer.
So I looked around and found that the standard (two-pass) approach is the
one usually used. Although it makes two passes over the data, it performs
essentially the same operations as the squared method (only split across
two loops), so its running time should be close to that of the squared
method.
For example, I looked at the R source code
(src/library/stats/src/cov.c) and saw that it also uses the standard
algorithm.
Detecting when the algorithm is going to fail is not that easy. The point
is that when the mean is large relative to the variance, these methods
lose accuracy (the larger the gap, the more error they accumulate),
although the standard method fails much less often than the squared method
(http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/).
So I suggest making the method a parameter of the function, with the
default being either the standard or the Welford method.
Vahab
--
Ticket URL: <https://trac.research.cc.gatech.edu/fastlab/ticket/344#comment:2>
MLPACK <www.fast-lab.org>
MLPACK is an intuitive, fast, and scalable C++ machine learning library developed at Georgia Tech.