[mlpack] mlpack 2.0.0 released

Ryan Curtin ryan at ratml.org
Thu Dec 24 11:20:46 EST 2015


Hello there,

This has been a long time coming...

Last night I tagged mlpack-2.0.0 and uploaded it to the mlpack website.
You can get it here:

  http://www.mlpack.org/files/mlpack-2.0.0.tar.gz

There has been a significant amount of refactoring and hard work by lots
of people since the last release in January, and the changelog is fairly
long, so I'll put what I think are the most exciting bits below:

 * Parallelization: the DET (density estimation trees) code is now
   parallelized with OpenMP.  As time goes on, parallelization will be
   added to other algorithms, but note that you can also use Armadillo
   with OpenBLAS, which will parallelize all the linear algebra calls.

 * Model saving and loading: where appropriate, all of the command-line
   programs now support loading and saving models.  So you can train,
   say, a logistic regression model, and save it for later use.  This is
   also possible with techniques like all-k-nearest-neighbor search,
   which allow you to save the tree built on the points.  Model
   serialization is also available from C++, of course.

 * Significant refactoring: most machine learning algorithms now follow
   the same API, and documentation has been improved.

 * Tree-based algorithms now support multiple types of trees in a far
   easier manner.

 * The k-means code now supports five different algorithms, many of them
   far faster than the original implementation.

 * Streaming decision trees (Hoeffding trees) have been added for fast
   classification on huge datasets.  They support both categorical and
   numeric features.

 * No more dependence on libxml2; boost::serialization is used instead.

 * The minimum required Armadillo version is now 4.100.0.

 * All mlpack programs are now prefixed with 'mlpack_', so for instance
   'allknn' is now 'mlpack_allknn'.
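
To give a flavor of the OpenMP-style parallelization mentioned in the
first item, here is a generic sketch.  It is not the actual DET code, and
SumOfSquares() is a made-up helper, not an mlpack function; it only shows
how a single pragma can split a loop across threads.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only; not part of mlpack.
// Computes the sum of squares of the given values.
double SumOfSquares(const std::vector<double>& values)
{
  double total = 0.0;
  const long n = static_cast<long>(values.size());

  // Each thread accumulates a private partial sum, and the reduction
  // clause adds the partial sums together when the loop finishes.
  // If the compiler is run without OpenMP support, the pragma is
  // ignored and the loop simply runs serially.
  #pragma omp parallel for reduction(+:total)
  for (long i = 0; i < n; ++i)
    total += values[static_cast<std::size_t>(i)] *
             values[static_cast<std::size_t>(i)];

  return total;
}
```

With GCC or Clang, OpenMP is enabled with the -fopenmp flag; the code
gives the same answer either way, only the number of threads differs.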

Also exciting, in my opinion, is the community that has grown around
mlpack.  Here are some neat and interesting statistics:

 * mlpack has almost 40 contributors

 * mlpack has now been downloaded at least 35k times (my logs
   undercount)

 * mlpack has been used in at least 40 academic papers (also a lowball
   estimate), and this number is growing faster and faster

 * the mlpack codebase now contains about 60k source lines of code
   (SLOC)

So I have to say, I'm very happy that we have built tools that people
are finding useful!  I hope that this trend continues. :)

For the full changelog in mlpack-2.0.0, see
http://www.mlpack.org/history.html.  Over the next few days/weeks,
updated mlpack packages will be pushed to the package repositories of
various distributions.

Lastly, some notes about the future.  Upcoming releases will follow the
versioning guidelines now present in UPDATING.txt (semantic versioning):
https://github.com/mlpack/mlpack/blob/master/UPDATING.txt
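
As a rough illustration of what semantic versioning implies, here is a
small sketch of the general major.minor.patch convention.  It is not
taken from UPDATING.txt, and the Version struct and the ParseVersion(),
IsOlder(), and BreaksApi() helpers are made up for this example; the
only assumption is that a major version bump signals incompatible API
changes.

```cpp
#include <sstream>
#include <string>
#include <tuple>

// Hypothetical type for illustration; not part of mlpack.
struct Version
{
  int majorVersion;
  int minorVersion;
  int patchVersion;
};

// Parse a string of the form "major.minor.patch", e.g. "2.0.0".
// No error handling: malformed input leaves the remaining fields at 0.
Version ParseVersion(const std::string& text)
{
  Version v{0, 0, 0};
  char dot1 = 0, dot2 = 0;
  std::istringstream in(text);
  in >> v.majorVersion >> dot1 >> v.minorVersion >> dot2 >> v.patchVersion;
  return v;
}

// True if version a precedes version b (compared field by field).
bool IsOlder(const Version& a, const Version& b)
{
  return std::tie(a.majorVersion, a.minorVersion, a.patchVersion) <
         std::tie(b.majorVersion, b.minorVersion, b.patchVersion);
}

// Under semantic versioning, only a major version bump may break the API.
bool BreaksApi(const Version& from, const Version& to)
{
  return to.majorVersion > from.majorVersion;
}
```

So, under this convention, code written against one 2.x.y release
should keep compiling against later 2.x.y releases, while a jump in the
first number is the signal to check for API changes.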

Future goals include a flexible framework for artificial neural networks
(prototype code can currently be found in the master branch in
src/mlpack/methods/ann), generic bindings to other languages such as
Python, Java, MATLAB, and others, parallelization support for more
algorithms via OpenMP, a new implementation of random forests, and
dimensionality reduction or manifold learning techniques.

I'm also hopeful that we can have a much more frequent release cycle,
roughly once a month or more often, following the versioning guidelines
I mentioned earlier.

So, I hope that you find this release useful!  Please feel free to
report any bugs as GitHub issues at https://github.com/mlpack/mlpack, on
this mailing list, or in the #mlpack channel on freenode.

-- 
Ryan Curtin    | "Good Lord - I've heard about this - cat juggling!
ryan at ratml.org | Stop! Stop! Stop it!" - Navin R. Johnson

