[mlpack] GSoC 2014 : Introduction and Interests
Marcus Edel
marcus.edel at fu-berlin.de
Mon Mar 10 13:54:49 EDT 2014
Hello,
> I was studying benchmarking and performance analysis of machine
> learning algorithms and came across an interesting idea in a research
> paper.
Can you point us to the paper?
> So, one of the things that I propose for this project is that we
> implement, say, k metrics and perform a bootstrap analysis for the
> given algorithms over these k metrics. By this, we will have a good
> idea of how probable it is for an algorithm to perform "well" given
> various metrics.
Yes, that seems reasonable.
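To make that concrete, here is a minimal sketch of the kind of bootstrap estimate this could use, assuming we already have per-run scores for one metric. The numbers are made up and the numpy-only code is just an illustration, not part of the benchmark system:

import numpy as np

def bootstrap_ci(scores, n_resamples=1000, alpha=0.05):
    # Resample the per-run scores with replacement; the percentiles of the
    # resampled means give a confidence band around the observed mean.
    scores = np.asarray(scores, dtype=float)
    means = np.empty(n_resamples)
    for i in range(n_resamples):
        sample = np.random.choice(scores, size=len(scores), replace=True)
        means[i] = sample.mean()
    lower = np.percentile(means, 100 * (alpha / 2.0))
    upper = np.percentile(means, 100 * (1.0 - alpha / 2.0))
    return scores.mean(), (lower, upper)

# Hypothetical accuracy values from repeated runs of one classifier.
accuracy_runs = [0.91, 0.89, 0.93, 0.90, 0.88, 0.92]
mean, (low, high) = bootstrap_ci(accuracy_runs)
print("accuracy: %.3f, 95%% CI [%.3f, %.3f]" % (mean, low, high))

Repeating this once per metric and per algorithm would give exactly the kind of "how often does a method do well" picture you describe.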
> I have not yet decided on the metrics to use, but I am working on
> that.
I think we should offer some standard metrics, and the class should also be templatized in such a way that the user can easily implement their own metrics or choose among different ones.
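As a rough illustration of the interface I have in mind (the names here are hypothetical, not existing benchmark code), the Python side could look like this, with the C++ side expressing the same idea as a template parameter:

import numpy as np

class Metric(object):
    # A metric maps (true labels, predicted labels) to a single score.
    def score(self, y_true, y_pred):
        raise NotImplementedError()

class Accuracy(Metric):
    def score(self, y_true, y_pred):
        return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

class Precision(Metric):
    # Binary precision TP / (TP + FP), taking class 1 as the positive class.
    def score(self, y_true, y_pred):
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        tp = np.sum((y_pred == 1) & (y_true == 1))
        fp = np.sum((y_pred == 1) & (y_true == 0))
        return float(tp) / (tp + fp) if tp + fp > 0 else 0.0

# A user adds a metric by subclassing Metric and appending an instance here.
metrics = [Accuracy(), Precision()]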
> I would like to have comments and feedback on the idea. Also, it
> would be great if you can tell me the algorithms/tools that we will be
> comparing for performance in the project. I can give more rigorous
> details in the proposal.
Currently there are a few classifiers in the mlpack/benchmark system (linear regression, logistic regression, least angle regression, the naive Bayes classifier, etc.).
The following link lists the currently available methods in mlpack:
http://mlpack.org/doxygen.php
So maybe it's a good idea to include some additional classifiers from shogun, weka, scikit, etc.; a rough timing sketch follows the links below.
http://scikit-learn.org/stable/supervised_learning.html#supervised-learning
http://www.shogun-toolbox.org/page/features/
http://weka.sourceforge.net/doc.dev/weka/classifiers/Classifier.html
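The timing side of such a comparison could be as simple as the sketch below; benchmark() is a hypothetical helper running on made-up data, and the real benchmark scripts are of course more involved:

import time
import numpy as np
from sklearn.linear_model import LogisticRegression

def benchmark(clf, X_train, y_train, X_test):
    # Time training and prediction separately; both matter for a comparison.
    start = time.time()
    clf.fit(X_train, y_train)
    train_time = time.time() - start

    start = time.time()
    predictions = clf.predict(X_test)
    predict_time = time.time() - start
    return train_time, predict_time, predictions

# Toy data, only to make the sketch runnable.
X = np.random.rand(100, 5)
y = (X[:, 0] > 0.5).astype(int)
train_time, predict_time, _ = benchmark(LogisticRegression(), X[:80], y[:80], X[80:])
print("train: %.4fs, predict: %.4fs" % (train_time, predict_time))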
I hope that helps.
Thanks,
Marcus
On 10 Mar 2014, at 17:56, Anand Soni <anand.92.soni at gmail.com> wrote:
> Hi Marcus and Ryan,
>
> I was studying benchmarking and performance analysis of machine
> learning algorithms and came across an interesting idea in a research
> paper.
>
> Suppose we need to compare 'n' algorithms for performance. (I need
> more information about the algorithms that will be involved in this
> project.) Also, suppose I have 'k' performance metrics. Obviously we
> should not infer anything from an algorithm's performance on just one
> metric.
>
> For example, in one of my projects where I did sentiment analysis
> using ANNs (artificial neural networks), I got good accuracy while
> the precision and recall measures were poor. This shows there is no
> single "best algorithm"; it all depends on the metrics used.
>
> So, one of the things that I propose for this project is that we
> implement, say, k metrics and perform a bootstrap analysis for the
> given algorithms over these k metrics. By this, we will have a good
> idea of how probable it is for an algorithm to perform "well" given
> various metrics.
>
> I have not yet decided on the metrics to use, but I am working on
> that. I would like to have comments and feedback on the idea. Also, it
> would be great if you can tell me the algorithms/tools that we will be
> comparing for performance in the project. I can give more rigorous
> details in the proposal.
>
> Regards.
>
> Anand Soni
>
> On Thu, Mar 6, 2014 at 10:08 PM, Ryan Curtin <gth671b at mail.gatech.edu> wrote:
>> On Wed, Mar 05, 2014 at 08:39:10PM +0530, Anand Soni wrote:
>>> Thanks a lot Ryan!
>>>
>>> I too, would want to have a single and nice application submitted
>>> rather than many. It was just out of interest that I was reading up on
>>> dual trees and yes, most of the literature that I found was from
>>> gatech. I also came across your paper on dual trees
>>> (http://arxiv.org/pdf/1304.4327.pdf). Can you give me some more
>>> pointers to where I can get a better understanding of dual trees?
>>
>> There are lots of papers on dual-tree algorithms but the paper you
>> linked to is (to my knowledge) the only one that tries to describe
>> dual-tree algorithms in an abstract manner. Here are some links to
>> other papers, but keep in mind that they focus on particular algorithms
>> and often don't devote very much space to describing exactly what a
>> dual-tree algorithm is:
>>
>> A.G. Gray and A.W. Moore. "N-body problems in statistical learning."
>> Advances in Neural Information Processing Systems (2001): 521-527.
>>
>> A.W. Moore. "Nonparametric density estimation: toward computational
>> tractability." Proceedings of the Third SIAM International Conference
>> on Data Mining (2003).
>>
>> A. Beygelzimer, S. Kakade, and J.L. Langford. "Cover trees for nearest
>> neighbor." Proceedings of the 23rd International Conference on Machine
>> Learning (2006).
>>
>> P. Ram, D. Lee, W.B. March, A.G. Gray. "Linear-time algorithms for
>> pairwise statistical problems." Advances in Neural Information
>> Processing Systems (2009).
>>
>> W.B. March, P. Ram, A.G. Gray. "Fast Euclidean minimum spanning tree:
>> algorithm, analysis, and applications." Proceedings of the 16th ACM
>> SIGKDD International Conference on Knowledge Discovery and Data Mining
>> (2010).
>>
>> R.R. Curtin, P. Ram. "Dual-tree fast exact max-kernel search." (this
>> one hasn't been published yet...
>> http://www.ratml.org/pub/pdf/2013fastmks.pdf ).
>>
>> I know that's a lot of references and probably way more than you want to
>> read, so don't feel obligated to read anything, but it will probably
>> help explain exactly what a dual-tree algorithm is... I hope! I can
>> link to more papers too, if you want...
>>
>>> But, of course, I am more willing to work on automatic benchmarking,
>>> on which I had a little talk with Marcus and I am brewing ideas.
>>
>> Ok, sounds good.
>>
>> Thanks,
>>
>> Ryan
>>
>> --
>> Ryan Curtin | "Somebody dropped a bag on the sidewalk."
>> ryan at ratml.org | - Kit
>
>
>
> --
> Anand Soni | Junior Undergraduate | Department of Computer Science &
> Engineering | IIT Bombay | India