[mlpack] GSoC 2013 Final Report - September 23, 2013

Marcus Edel marcus.edel at fu-berlin.de
Mon Sep 23 15:38:48 EDT 2013


This is the final report of the project 'Automatic benchmarking of mlpack methods' for the Google Summer of Code 2013.

Project Description

For widespread adoption of MLPACK to happen, it is very important that relevant and up-to-date benchmarks are available. This project entails writing support scripts which run MLPACK methods on a variety of datasets and produce runtime numbers. The benchmarking scripts also run the same machine learning methods from other machine learning libraries and then produce runtime graphs. This is integrated into Jenkins so that benchmarks are auto-generated, informing developers which of their changesets have caused speedups or slowdowns.

Deliverables

The project consists of three components:

* The scripts to benchmark the methods.

This part claimed most of the time, because we compared MLPACK against several other libraries. In the end there are scripts for various libraries, namely WEKA, Shogun, Scikit, MATLAB, MLPy and of course MLPACK.

The benchmark framework is modular: for each library and method a script needs to be written. The script specifies where the particular benchmarking suite is located, how to run it and how to interpret the results. The benchmark module runs the scripts with the different configurations and stores the results in a local database or simply displays them on the standard output.
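A minimal sketch of that runner loop might look as follows; the function name, table layout and example commands here are hypothetical and for illustration only (the real scripts interpret the timing output of each method rather than timing the whole process):

    import subprocess, sqlite3, time

    # Hypothetical sketch: time an external benchmark script and either
    # store the runtime in a local SQLite database or print it.
    def run_benchmark(command, database=None):
        start = time.time()
        subprocess.check_call(command, shell=True)
        runtime = time.time() - start

        if database:
            con = sqlite3.connect(database)
            con.execute("CREATE TABLE IF NOT EXISTS results "
                        "(command TEXT, runtime REAL)")
            con.execute("INSERT INTO results VALUES (?, ?)",
                        (command, runtime))
            con.commit()
            con.close()
        else:
            print("%s: %.4fs" % (command, runtime))

    # One entry per library/method/dataset combination (made-up paths).
    configurations = [
        "python methods/mlpack/pca.py datasets/iris.csv",
        "python methods/scikit/pca.py datasets/iris.csv",
    ]
    for command in configurations:
        run_benchmark(command, database="benchmark.db")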

* A small site to browse the results from the benchmark.

For the design of the reports we use the Twitter Bootstrap framework [1], which contains HTML- and CSS-based design templates for creating websites. Most of the templates are designed to be backward compatible, so the reports are viewable on almost all devices and browsers. Since Bootstrap is a package of individual templates, if you don't like the default style it is always possible to pull out individual pieces of the framework and customize them until the result looks the way you like. I've created an alternative style [2] to show how easy this customization is.
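As an example of how little is needed to get a presentable report out of Bootstrap, here is a small, hypothetical sketch that writes benchmark results into a Bootstrap-styled HTML table (the result values and file names are made up; only the stock 'table' CSS classes are used):

    # Hypothetical sketch: render benchmark results as a
    # Bootstrap-styled HTML table.
    results = [("PCA", "mlpack", 0.42), ("PCA", "scikit", 0.57)]

    rows = "\n".join("<tr><td>%s</td><td>%s</td><td>%.2fs</td></tr>" % r
                     for r in results)

    page = """<html><head>
    <link rel="stylesheet" href="bootstrap/css/bootstrap.min.css">
    </head><body>
    <table class="table table-striped">
    <tr><th>Method</th><th>Library</th><th>Runtime</th></tr>
    %s
    </table>
    </body></html>""" % rows

    with open("report.html", "w") as f:
        f.write(page)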

* The integration into Jenkins.

We've decided to create a single job for every task: for each library there is a job that runs the associated benchmark scripts. In addition, there is one job that checks out the current version and runs the unit tests, and a single job that creates the HTML page for the reports. To publish the benchmark results we use the 'HTML Publisher Plugin' by Michael Rooney [3], one of the 300+ plugins that make Jenkins so powerful.

Documentation

The page http://trac.research.cc.gatech.edu/fastlab/wiki/AutomaticBenchmark contains documentation on how to run the benchmark script, which packages are necessary to run the benchmarks, example configurations and more. The page also contains a section that should be useful for developers who want to write new scripts.
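To give a rough idea of what writing a new script involves, here is a simplified, hypothetical skeleton of a per-method script; the class name, executable name, flags and timing regex are illustrative, not the actual interface documented on the wiki:

    import re, subprocess, sys

    # Hypothetical skeleton: wrap one method of one library behind a
    # common interface. The benchmark module would instantiate this
    # class and call RunMethod() for each configured dataset.
    class PCA(object):
        def __init__(self, dataset):
            self.dataset = dataset

        def RunMethod(self):
            # Run the (illustrative) PCA executable in verbose mode
            # and capture its output.
            output = subprocess.check_output(
                ["pca", "-i", self.dataset, "-v"],
                stderr=subprocess.STDOUT).decode("utf-8")
            # Interpret the results: extract the runtime the method
            # itself reports, instead of timing the whole process.
            match = re.search(r"total_time: ([\d.]+)", output)
            return float(match.group(1)) if match else -1

    if __name__ == "__main__":
        print(PCA(sys.argv[1]).RunMethod())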

Results

The results page http://big.cc.gt.atl.ga.us/job/benchmark%20-%20reports/Benchmark_Results/? presents the results for all available scripts, plus the memory reports for the MLPACK methods. The current source code can be found at http://svn.cc.gatech.edu/fastlab/mlpack/conf/jenkins-conf/benchmark/.

Thanks

This project was a great experience for me and I would like to thank all involved participants.

First of all, I would like to thank Ryan, who has been a great mentor; he was always there to answer my questions and helped me almost daily during the whole summer. Thank you very much for organizing these months and making everything work so smoothly.

Next, I would like to thank everybody from the MLPACK community who helped me get the job done. On the organizational side, I want to thank Google for this incredible experience. Google's Open Source department, and especially Carol Smith, are doing a great job.

Finally, an incomplete and unordered list of things I've started to understand during the last three months: the MLPACK code base, templates, the Scikit API, the WEKA API, the Shogun API, the MLPy API, the MATLAB API, the Python standard library, bug tracking and fixing, Twitter Bootstrap and a lot of other things.

GSoC Conclusion

I can only recommend that every student out there consider applying for the Summer of Code! It has been an absolutely great time; meeting new people and discussing ideas with them has always been fun for me, and the last months were just great!

That’s all!
Best regards
Marcus

[1] http://getbootstrap.com/
[2] http://virtual-artz.de/gsoc/results/
[3] https://wiki.jenkins-ci.org/display/JENKINS/HTML+Publisher+Plugin