[mlpack-svn] [MLPACK] #344: Using Welford method to calculate variance in Naive Bayes Classifier

Mon Apr 14 16:37:38 EDT 2014

#344: Using Welford method to calculate variance in Naive Bayes Classifier
-----------------------------------+----------------------------------------
  Reporter:  akvah                 |        Owner:  rcurtin 
      Type:  enhancement           |       Status:  accepted
  Priority:  minor                 |    Milestone:          
 Component:  mlpack                |   Resolution:          
  Keywords:  variance calculation  |     Blocking:          
Blocked By:                        |  
-----------------------------------+----------------------------------------
Changes (by rcurtin):

  * owner:  => rcurtin
  * status:  new => accepted

Comment:

 Hi Vahab,

 Thank you for the contribution.  I refactored it slightly; the updated
 patch is attached.  If you can make sure I haven't broken anything, I'd
 appreciate it.

 Unfortunately, what I found was that the patched code, while more robust,
 takes approximately 3x as long for training.  Here are my results for two
 datasets:

  * isolet (617x7797): ~0.01s training pre-patch, ~0.03s training post-patch
  * randu (10x1000000): ~0.05s training pre-patch, ~0.15s training post-patch

 I ran both of those enough times to account for timing variance.

 So, can we find a way to speed up the Welford method so that it is closer
 to the previous implementation?  Alternatively, we could have the user
 specify which method should be used; or perhaps a numerical error can be
 detected when using the original calculation, in which case the Welford
 method could be used as a fallback.  What do you think?
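
 To make the fallback option concrete, here is a minimal sketch.  It is not
 the attached patch and not mlpack's actual code: it uses plain std::vector
 rather than Armadillo matrices, and it assumes the original calculation is
 the usual one-pass sum-of-squares formula.  The names NaiveVariance,
 WelfordVariance, and VarianceWithFallback, and the relTol threshold, are
 all hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the "original" calculation: one-pass sum of
// squares, sample variance (divides by n - 1).  Fast, but it subtracts two
// large, nearly equal numbers when the mean is large relative to the spread.
double NaiveVariance(const std::vector<double>& x)
{
  assert(x.size() > 1);
  double sum = 0.0, sumSq = 0.0;
  for (double v : x)
  {
    sum += v;
    sumSq += v * v;
  }
  const double n = static_cast<double>(x.size());
  return (sumSq - sum * sum / n) / (n - 1.0);
}

// Welford's one-pass update: keeps a running mean and a running sum of
// squared deviations (m2), so no large cancellation occurs at the end.
double WelfordVariance(const std::vector<double>& x)
{
  assert(x.size() > 1);
  double mean = 0.0, m2 = 0.0;
  std::size_t n = 0;
  for (double v : x)
  {
    ++n;
    const double delta = v - mean;
    mean += delta / static_cast<double>(n);  // One division per element.
    m2 += delta * (v - mean);
  }
  return m2 / static_cast<double>(n - 1);
}

// Hypothetical fallback: try the fast formula first, and recompute with
// Welford only when the centered sum is tiny (or negative) relative to the
// raw sum of squares, which signals heavy cancellation.  relTol is an
// assumed threshold, not a tuned value.
double VarianceWithFallback(const std::vector<double>& x,
                            const double relTol = 1e-6)
{
  assert(x.size() > 1);
  double sum = 0.0, sumSq = 0.0;
  for (double v : x)
  {
    sum += v;
    sumSq += v * v;
  }
  const double n = static_cast<double>(x.size());
  const double centered = sumSq - sum * sum / n;
  if (centered <= relTol * sumSq)
    return WelfordVariance(x);  // Slow path, only for suspicious features.
  return centered / (n - 1.0);
}
```

 With a check like this, the extra pass would only be paid for features
 where the fast formula looks unreliable, for example values such as
 1e9 + 1, ..., 1e9 + 5.  The per-element division in the Welford loop is one
 plausible contributor to the slowdown, though that is not measured here.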

 Thanks,

 Ryan

-- 
Ticket URL: <https://trac.research.cc.gatech.edu/fastlab/ticket/344#comment:1>
MLPACK <www.fast-lab.org>
MLPACK is an intuitive, fast, and scalable C++ machine learning library developed at Georgia Tech.