[mlpack] mlpack 2.0.0 released
Ryan Curtin
ryan at ratml.org
Thu Dec 24 11:20:46 EST 2015
Hello there,
This has been a long time coming...
Last night I tagged mlpack-2.0.0 and uploaded it to the mlpack website.
You can get it here:
http://www.mlpack.org/files/mlpack-2.0.0.tar.gz
There has been a significant amount of refactoring and hard work by lots
of people since the last release in January, and the changelog is fairly
long, so I'll put what I think are the most exciting bits below:
* Parallelization: the DET (density estimation trees) code is now
parallelized with OpenMP. As time goes on, parallelization will be
added to other algorithms, but note that you can also use Armadillo
with OpenBLAS, which will parallelize all the linear algebra calls.
* Model saving and loading: where appropriate, all of the command-line
programs now support loading and saving models. So you can train,
say, a logistic regression model, and save it for later use. This is
also possible with techniques like all-k-nearest-neighbor search,
which allow you to save the tree built on the points. Model
serialization support is available from C++ too, of course.
* Significant refactoring: most machine learning algorithms now follow
the same API, and documentation has been improved.
* Tree-based algorithms now support multiple types of trees in a far
easier manner.
* The k-means code now supports five different algorithms, many of them
far faster than the original implementation.
* Added streaming decision trees (Hoeffding trees) for fast classifiers
on huge datasets. This supports both categorical and numeric
features.
* No more dependence on libxml2; boost::serialization is used instead.
* The minimum required Armadillo version is now 4.100.0.
* All mlpack programs are now prefixed with 'mlpack_', so for instance
'allknn' is now 'mlpack_allknn'.
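To give a rough idea of what OpenMP parallelization (the first item above) looks like in C++, here is a minimal sketch. It is not taken from the mlpack DET code; the function name SumOfSquares is a hypothetical helper invented for this illustration. It only shows the general pattern of marking a loop so that OpenMP can split its iterations across threads.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only; not part of mlpack.
// Computes the sum of squared entries of a vector.
double SumOfSquares(const std::vector<double>& values)
{
  double total = 0.0;
  const long n = static_cast<long>(values.size());

  // Ask OpenMP to divide the iterations among threads and to combine the
  // per-thread partial sums into 'total'.  If the compiler is not invoked
  // with OpenMP enabled (e.g. -fopenmp for GCC), the pragma is ignored and
  // the loop simply runs serially.
  #pragma omp parallel for reduction(+:total)
  for (long i = 0; i < n; ++i)
  {
    const double v = values[static_cast<std::size_t>(i)];
    total += v * v;
  }

  return total;
}
```

A signed loop index is used because older OpenMP versions require one. The same source builds with or without OpenMP, which is one reason it is a convenient way to add parallelism to existing loops incrementally.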
Also exciting, in my opinion, is the community that has grown around
mlpack. Here are some neat and interesting statistics:
* mlpack has almost 40 contributors
* mlpack has now been downloaded at least 35k times (my logs
undercount)
* mlpack has been used in at least 40 academic papers (also a lowball
estimate), and this number is increasing faster and faster
* the mlpack codebase now contains about 60k source lines of code
(SLOC)
So I have to say, I'm very happy that we have built tools that people
are finding useful! I hope that this trend continues. :)
For the full changelog in mlpack-2.0.0, see
http://www.mlpack.org/history.html. Over the next few days/weeks,
updated mlpack packages will be pushed to the package repositories of
various distributions.
Lastly, some notes about the future. Upcoming releases will follow the
versioning guidelines now present in UPDATING.txt (semantic versioning):
https://github.com/mlpack/mlpack/blob/master/UPDATING.txt
Future goals include a flexible framework for artificial neural networks
(prototype code can currently be found in the master branch in
src/mlpack/methods/ann), generic bindings to other languages such as
Python, Java, MATLAB, and others, parallelization support for more
algorithms via OpenMP, a new implementation of random forests, and
dimensionality reduction or manifold learning techniques.
I'm also hopeful that we can have a much more frequent release cycle,
closer to once a month or even more often, following the versioning
guidelines I mentioned earlier.
So, I hope that you find this release useful! Please feel free to
report any bugs as GitHub issues at https://github.com/mlpack/mlpack, or
to this mailing list, or to the #mlpack channel on freenode.
--
Ryan Curtin | "Good Lord - I've heard about this - cat juggling!
ryan at ratml.org | Stop! Stop! Stop it!" - Navin R. Johnson