[mlpack-git] [mlpack/mlpack] Adds a Train() function that only needs dataset statistics (#748)

Yannis Mentekidis notifications at github.com
Tue Aug 2 10:28:11 EDT 2016


This Train() function fits the distribution from the dataset statistics instead of the dataset itself. It works for 1-dimensional distributions only, since you can provide only one number for each statistic.

This is useful for my code in LSHModel, where instead of using data to fit the distribution, we create a regression function that predicts the arithmetic and geometric mean of squared distances given the size of the dataset. Since we don't have the actual distances, only an estimate of their statistics, we can't use the previous Train() function in GammaDistribution.

I modified my code so that the Train() that accepts the dataset simply calls this function for each row after computing the statistics, to avoid code duplication.
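
To illustrate the delegation described above, here is a minimal standalone sketch, not the code from this pull request. It assumes the fit uses the generalized Newton iteration from Minka's "Estimating a Gamma distribution", driven by three statistics: the log of the arithmetic mean, the mean of the logs (the log of the geometric mean), and the arithmetic mean. The names GammaParams, FitGammaFromStats, FitGammaFromData, Digamma and Trigamma are hypothetical and do not come from mlpack; the digamma and trigamma helpers are simple series approximations written only to keep the example self-contained.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Digamma psi(x) for x > 0: shift x upward with psi(x) = psi(x + 1) - 1/x,
// then apply the asymptotic series. (Hypothetical helper, not mlpack code.)
double Digamma(double x)
{
  assert(x > 0.0);
  double result = 0.0;
  while (x < 10.0) { result -= 1.0 / x; x += 1.0; }
  const double inv = 1.0 / x, inv2 = inv * inv;
  return result + std::log(x) - 0.5 * inv
      - inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0));
}

// Trigamma psi'(x) for x > 0, using psi'(x) = psi'(x + 1) + 1/x^2.
double Trigamma(double x)
{
  assert(x > 0.0);
  double result = 0.0;
  while (x < 10.0) { result += 1.0 / (x * x); x += 1.0; }
  const double inv = 1.0 / x, inv2 = inv * inv;
  return result + inv + 0.5 * inv2
      + inv * inv2 * (1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 / 42.0));
}

// Shape (alpha) and scale (beta) of a 1-dimensional gamma distribution.
struct GammaParams { double alpha; double beta; };

// Fit from statistics only (assumed approach: Minka's Newton iteration).
//   logMeanx = log(arithmetic mean), meanLogx = mean of log(x), meanx = mean.
GammaParams FitGammaFromStats(const double logMeanx,
                              const double meanLogx,
                              const double meanx,
                              const double tol = 1e-8)
{
  const double gap = logMeanx - meanLogx;
  assert(gap > 0.0); // Zero only if all points are equal; no fit then.

  double alpha = 0.5 / gap; // Starting guess.
  for (int i = 0; i < 100; ++i)
  {
    const double num = meanLogx - logMeanx + std::log(alpha) - Digamma(alpha);
    const double den = alpha * alpha * (1.0 / alpha - Trigamma(alpha));
    const double next = 1.0 / (1.0 / alpha + num / den);
    const bool converged = std::abs(next - alpha) / alpha < tol;
    alpha = next;
    if (converged)
      break;
  }
  return { alpha, meanx / alpha };
}

// Fit from data: compute the statistics, then delegate to the function above
// so the fitting logic lives in one place.
GammaParams FitGammaFromData(const std::vector<double>& x,
                             const double tol = 1e-8)
{
  assert(!x.empty());
  double sum = 0.0, sumLog = 0.0;
  for (const double v : x)
  {
    assert(v > 0.0); // The gamma distribution has positive support.
    sum += v;
    sumLog += std::log(v);
  }
  const double meanx = sum / x.size();
  const double meanLogx = sumLog / x.size();
  return FitGammaFromStats(std::log(meanx), meanLogx, meanx, tol);
}
```

With this layout, a caller that only has predicted statistics (as in the LSHModel case) can call FitGammaFromStats directly, and a multi-dimensional data-based fit would call it once per row.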

I also added a test to make sure this produces the same result as giving the Train() function the dataset.

You can view, comment on, or merge this pull request online at:

  https://github.com/mlpack/mlpack/pull/748

-- Commit Summary --

  * Adds a Train() function that only needs dataset statistics, not the dataset itself

-- File Changes --

    M src/mlpack/core/dists/gamma_distribution.cpp (81)
    M src/mlpack/core/dists/gamma_distribution.hpp (15)
    M src/mlpack/tests/distribution_test.cpp (22)

-- Patch Links --

https://github.com/mlpack/mlpack/pull/748.patch
https://github.com/mlpack/mlpack/pull/748.diff
