[mlpack-git] [mlpack] Add covariance factorization caching to gaussian distribution (#390)

Ryan Curtin notifications at github.com
Mon Jan 26 15:09:04 EST 2015


I spent far too long benchmarking this, trying to track down some anomalously slow results with the new code.  In the end I wasn't able to reproduce any slowdown, but hey, I did a good amount of benchmarking, so here are the results.  I used this gist:

https://gist.github.com/rcurtin/daf960aa6ad545f58402
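
For anyone who doesn't want to click through, the timed operations look roughly like this (an illustrative sketch, not the gist's actual code; it assumes the `GaussianDistribution` interface of the time, with `Estimate()`, single-point and batch `Probability()`, and `Random()`):

```cpp
#include <iostream>

#include <mlpack/core.hpp>
#include <mlpack/core/dists/gaussian_distribution.hpp>

using namespace mlpack::distribution;

int main()
{
  arma::mat data;
  data.load("dataset.csv"); // covertype, corel, or the 10x1000000 random set

  GaussianDistribution g(data.n_rows);
  arma::wall_clock timer;

  // estimate: fit the mean and covariance to the whole dataset once.
  timer.tic();
  g.Estimate(data);
  std::cout << "estimate: " << timer.toc() << "s\n";

  // probability_batch: a single call over the full matrix of observations.
  arma::vec probabilities;
  timer.tic();
  g.Probability(data, probabilities);
  std::cout << "probability_batch: " << timer.toc() << "s\n";

  // probability_individual: one call per point.
  timer.tic();
  double sum = 0.0;
  for (size_t i = 0; i < data.n_cols; ++i)
    sum += g.Probability(data.col(i));
  std::cout << "probability_individual: " << timer.toc() << "s\n";

  // random: draw one sample per point in the dataset.
  timer.tic();
  for (size_t i = 0; i < data.n_cols; ++i)
  {
    arma::vec sample = g.Random();
  }
  std::cout << "random: " << timer.toc() << "s\n";

  // (The gist also times gmm_training_imitation -- repeated Estimate() calls
  // imitating what GMM training does -- which isn't reproduced here.)
  return 0;
}
```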

Below are the numbers from three runs of each of the timers in that program (the `no chol` numbers are where I removed the call to `chol(..., "lower")` and just inverted the covariance matrix directly in `FactorCovariance()`):
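
For reference, the difference between the two variants is roughly the following (a sketch only; the cached member names `covLower`, `invCov`, and `logDetCov` are illustrative, not necessarily the PR's exact code):

```cpp
#include <armadillo>

// Factor the covariance once and cache everything the other methods need:
// covariance = covLower * covLower.t().
void FactorCovariance(const arma::mat& covariance,
                      arma::mat& covLower,  // cached Cholesky factor
                      arma::mat& invCov,    // cached inverse
                      double& logDetCov)    // cached log-determinant
{
  covLower = arma::chol(covariance, "lower");

  // Invert through the triangular factor.
  const arma::mat invLower = arma::inv(arma::trimatl(covLower));
  invCov = invLower.t() * invLower;

  // log|Sigma| from the diagonal of the factor.
  logDetCov = 2.0 * arma::accu(arma::log(covLower.diag()));
}

// The "no chol" variant: skip the factorization and invert directly.
void FactorCovarianceNoChol(const arma::mat& covariance,
                            arma::mat& invCov,
                            double& logDetCov)
{
  invCov = arma::inv(covariance);

  double sign;
  arma::log_det(logDetCov, sign, covariance);
}
```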

```
covertype (54x581012)

master

estimate                 2.027     2.027     2.031
gmm_training_imitation  43.095    43.014    42.926
probability_batch        0.838     0.835     0.835
probability_individual  64.599    66.545    66.608
random                   3.607     3.608     3.601

stephentu

estimate                 2.049     2.012     1.005
gmm_training_imitation  42.363    42.615    42.303
probability_batch        0.823     0.826     0.824
probability_individual   0.746     0.745     0.745
random                   0.437     0.436     0.438

stephentu, no chol

estimate                 1.976     1.995     1.984
gmm_training_imitation  42.377    42.306    42.323
probability_batch        0.825     0.825     0.823
probability_individual   0.744     0.748     0.747
random                   0.437     0.439     0.437
```

```
corel (32x37749)

master

estimate                0.053   0.053   0.053
gmm_training_imitation  1.522   1.548   1.547
probability_batch       0.031   0.031   0.031
probability_individual  1.647   1.656   1.603
random                  0.917   0.910   0.915

stephentu

estimate                0.052   0.052   0.051
gmm_training_imitation  1.572   1.559   1.573
probability_batch       0.031   0.031   0.031
probability_individual  0.047   0.047   0.046
random                  0.257   0.258   0.257

stephentu, no chol

estimate                0.051   0.052   0.051
gmm_training_imitation  1.575   1.570   1.583
probability_batch       0.031   0.031   0.031
probability_individual  0.047   0.046   0.047
random                  0.256   0.257   0.256
```

```
1000000-10-randu (10x1000000)

master

estimate                 0.159     0.160     0.159
gmm_training_imitation  16.528    16.606    16.936
probability_batch        0.332     0.333     0.339
probability_individual   3.477     3.483     3.489
random                   0.156     0.156     0.151

stephentu

estimate                 0.164     0.164     0.164
gmm_training_imitation  16.514    16.492    16.562
probability_batch        0.335     0.331     0.332
probability_individual   0.155     0.163     0.155
random                   0.071     0.071     0.071

stephentu, no chol

estimate                 0.158     0.159     0.159
gmm_training_imitation  16.892    16.576    16.844
probability_batch        0.341     0.335     0.337
probability_individual   0.165     0.162     0.169
random                   0.071     0.071     0.071
```

So, we get tons of speedup for calls to `Random()` and `Probability()` for a single point, but not much for the other cases.  One might see better speedup for the other methods in very high-dimensional settings.  Consequently, the `gmm` program doesn't see much speedup from this change, but it's certainly still an important and valuable contribution.  :+1: 
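
The mechanism is simple enough: without the cached factorization, every single-point `Probability()` call pays for an O(d^3) inversion and determinant of the covariance, whereas with it, the per-call work drops to the O(d^2) quadratic form.  A rough before/after sketch (illustrative names, not the PR's exact code):

```cpp
#include <cmath>

#include <armadillo>

// Before: each call recomputes the inverse and determinant, O(d^3) per point.
double LogProbabilityUncached(const arma::vec& x,
                              const arma::vec& mean,
                              const arma::mat& covariance)
{
  const arma::vec diff = x - mean;
  const arma::mat invCov = arma::inv(covariance);
  double logDetCov, sign;
  arma::log_det(logDetCov, sign, covariance);
  return -0.5 * (x.n_elem * std::log(2.0 * arma::datum::pi) + logDetCov
      + arma::as_scalar(diff.t() * invCov * diff));
}

// After: the inverse and log-determinant are cached by FactorCovariance(),
// so each call is only the quadratic form.
double LogProbabilityCached(const arma::vec& x,
                            const arma::vec& mean,
                            const arma::mat& invCov,
                            const double logDetCov)
{
  const arma::vec diff = x - mean;
  return -0.5 * (x.n_elem * std::log(2.0 * arma::datum::pi) + logDetCov
      + arma::as_scalar(diff.t() * invCov * diff));
}
```

`Random()` benefits for the same reason: with a cached lower-triangular factor, a draw is just `mean + covLower * arma::randn<arma::vec>(d)` rather than refactoring the covariance on every call.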

---
Reply to this email directly or view it on GitHub:
https://github.com/mlpack/mlpack/pull/390#issuecomment-71528581