[mlpack] Automatic Benchmarking

Fri Feb 28 05:04:12 EST 2014

Hello, 
I am Praveen Venkateswaran, an undergraduate studying Computer Science
and Mathematics in India.
I have worked with various machine learning algorithms as well as on
information retrieval, and I would love to contribute to mlpack,
starting with GSoC 2014.

I am interested in working on improvements to the automatic
benchmarking system that was built last summer. I would like to start
by comparing the accuracy of the implementations in the various
libraries. I have been browsing resources to find a starting point for
this. [0] describes WiseRF's benchmarking of random forest
classification.

What strikes me most is that, whenever possible, they determined which
parameters yielded the best results for each individual library and
then compared the libraries using those per-library parameters instead
of just the defaults. I agree with this approach, as it should yield
less biased results. What do you think about this?
I have already spoken to Ryan to clarify the details of the project,
and the crux would be the comparison of parameters. If we follow the
approach above, we would have to find the best parameters for each
library across the dataset size range and then run the individual
methods with those parameters.
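
As a rough sketch of what I mean by tuning each library separately
before comparing them: the library names, the parameter grids, and the
evaluate callback below are hypothetical placeholders, not part of the
existing benchmark code. I am also assuming that evaluate scores a
configuration on held-out validation data, so that the final comparison
is not made on the data used for tuning.

```python
from itertools import product


def grid(param_grid):
    """Yield every combination of parameter values as a dict."""
    names = sorted(param_grid)
    for values in product(*(param_grid[n] for n in names)):
        yield dict(zip(names, values))


def tune_each_library(libraries, evaluate):
    """Pick the best-scoring parameters for each library separately.

    libraries: {library name: parameter grid for that library}
    evaluate:  callable(library name, params) -> score, higher is better
               (hypothetical hook that would wrap a real benchmark run)
    """
    best = {}
    for name, param_grid in libraries.items():
        scored = [(evaluate(name, p), p) for p in grid(param_grid)]
        best_score, best_params = max(scored, key=lambda sp: sp[0])
        best[name] = {"params": best_params, "score": best_score}
    return best
```

Each library would then be run on the test data with its own entry from
the returned dict, rather than with its defaults.
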
We could then score them on that basis: for classification algorithms,
the fraction of correctly classified samples; for regression
algorithms, the mean squared error; and for k-means, the inertia
criterion, or something along those lines (I am not too sure about
this, as I don't have experience with all the libraries being tested).
Please let me know what you think about this; any further suggestions
would be most welcome.
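
To make the three scores concrete, here is a minimal sketch in plain
Python. The function names are my own, and I am assuming that "inertia"
means the sum of squared distances from each point to its nearest
centroid; this is not taken from any of the libraries being tested.

```python
def accuracy(y_true, y_pred):
    """Fraction of correctly classified samples (higher is better)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions
    (lower is better)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def inertia(points, centroids):
    """Sum of squared Euclidean distances from each point to its
    nearest centroid (lower is better)."""
    def sq_dist(x, c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return sum(min(sq_dist(x, c) for c in centroids) for x in points)
```

Note that accuracy is better when higher, while the other two are
better when lower, and that inertia values are only comparable between
runs on the same dataset with the same number of clusters.
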

[0] http://about.wise.io/blog/2013/07/15/benchmarking-random-forest-part-1/

Regards,
Praveen Venkateswaran