[mlpack] GSoC 2014 : Introduction and Interests

Marcus Edel marcus.edel at fu-berlin.de
Tue Mar 4 18:14:29 EST 2014


Hello Anand,

Thanks for your contribution!

> 1. Accuracy, Precision and recall, n-fold cross-validation. (Basic stuff)

That's correct. We should take these metrics into account.
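
For reference, here is a minimal sketch (in Python with NumPy; the function names and interface are hypothetical, not existing mlpack code) of how these basic metrics could be computed from a set of predictions:

    import numpy as np

    def basic_metrics(y_true, y_pred, positive=1):
        # Binary confusion-matrix counts.
        tp = np.sum((y_pred == positive) & (y_true == positive))
        fp = np.sum((y_pred == positive) & (y_true != positive))
        fn = np.sum((y_pred != positive) & (y_true == positive))
        tn = np.sum((y_pred != positive) & (y_true != positive))

        accuracy  = (tp + tn) / float(tp + tn + fp + fn)
        precision = tp / float(tp + fp) if (tp + fp) > 0 else 0.0
        recall    = tp / float(tp + fn) if (tp + fn) > 0 else 0.0
        return accuracy, precision, recall

    def k_fold_indices(n, k):
        # Shuffle indices 0..n-1 and split them into k disjoint folds;
        # each fold serves once as the test set in cross-validation.
        idx = np.random.permutation(n)
        return np.array_split(idx, k)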

> 2. Area under ROC Curves (Receiver Operating Characteristics)
> [Probability that classifier will rank a randomly chosen positive
> instance higher than a randomly chosen negative
> instance.]

The ROC curve works only for binary classification and only when the classifier produces a continuous output (a score or probability). The problem is that, in general, we don't have the data needed to plot ROC graphs.
However, if you like, you can add this quality metric, as an option.
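
If continuous scores are available, the AUC can be computed directly from the rank interpretation you quoted, without plotting the curve at all. A minimal sketch (Python/NumPy; "labels" and "scores" are assumed to be arrays of 0/1 labels and classifier scores):

    import numpy as np

    def auc(labels, scores):
        # AUC = probability that a randomly chosen positive instance
        # is scored higher than a randomly chosen negative instance;
        # ties count as 1/2.
        pos = scores[labels == 1]
        neg = scores[labels == 0]
        wins = 0.0
        for p in pos:
            wins += np.sum(p > neg) + 0.5 * np.sum(p == neg)
        return wins / (len(pos) * len(neg))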

> There are many other possibilities like Bayesian models and
> statistical confidence intervals which can be used for such purposes.
> I need more clarifications on the expectations from this project so
> that I can do my research in the correct direction before the
> proposal. I will be glad if someone can help.

Generally, it depends on what you want to know about the performance characteristics of the classifier/algorithm. I think it is best to report several measures of performance. Just to add some more metrics:

- F-measure
- Matthews correlation coefficient (MCC)
- Relative Classifier Information (RCI)
- Confusion Entropy (CEN)
- Cohen’s kappa

The last three metrics can also measure performance on multi-class problems; a sketch of MCC and Cohen's kappa follows below.
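
To illustrate two of these, here is a minimal sketch (Python/NumPy; hypothetical helper functions, not existing mlpack code) of MCC for the binary case and Cohen's kappa computed from a multi-class confusion matrix:

    import numpy as np

    def mcc(tp, tn, fp, fn):
        # Matthews correlation coefficient for binary classification.
        denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
        return (tp * tn - fp * fn) / denom if denom > 0 else 0.0

    def cohens_kappa(confusion):
        # confusion[i, j] = number of instances of true class i that
        # were predicted as class j; works for any number of classes.
        n = float(confusion.sum())
        p_observed = np.trace(confusion) / n
        p_expected = np.sum(confusion.sum(axis=0) * confusion.sum(axis=1)) / (n * n)
        return (p_observed - p_expected) / (1.0 - p_expected)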

My suggestion is to combine the base metrics with metrics suited to unbalanced multi-class problems. In the end, the results are stored in the database, so we can easily add more metrics later.

Is that helpful? If you have any questions, feel free to ask.

Thanks,

Marcus


On 04 Mar 2014, at 17:49, Anand Soni <anand.92.soni at gmail.com> wrote:

> Hi,
> 
> I built the mlpack environment and tried the all k nearest neighbour
> search for iris data. I am still exploring and analyzing the results.
> As mentioned in the project description, we need to implement methods
> to compare accuracies of algorithms. I have a few ideas. I don't know
> if they are useful here. I am exploring more.
> 
> 1. Accuracy, Precision and recall, n-fold cross-validation. (Basic stuff)
> 2. Area under ROC Curves (Receiver Operating Characteristics)
> [Probability that classifier will rank a randomly chosen positive
> instance higher than a randomly chosen negative
> instance.]
> 3. Information theoretic metrics [still exploring] like: Good's
> Information reward (for binary classification algorithms)
> 
> There are many other possibilities like Bayesian models and
> statistical confidence intervals which can be used for such purposes.
> I need more clarifications on the expectations from this project so
> that I can do my research in the correct direction before the
> proposal. I will be glad if someone can help.
> 
> Regards.
> 
> Anand
> 
> On Tue, Mar 4, 2014 at 12:26 AM, Ryan Curtin <gth671b at mail.gatech.edu> wrote:
>> On Tue, Mar 04, 2014 at 12:19:30AM +0530, Anand Soni wrote:
>>> Ryan,
>>> 
>>> I think that the gatech server is down or not responding. I am not
>>> even able to access www.gatech.edu . I will try a bit later and it
>>> should work. Thanks a lot, by the way.
>> 
>> Ok; let me know if you have continued issues.  I am able to access it,
>> but I'm right here on campus, so there's probably some issue between
>> here and where you are.  Hopefully it will be resolved soon...
>> 
>> --
>> Ryan Curtin    | "More like a nonja."
>> ryan at ratml.org |   - Pops
> 
> 
> 
> -- 
> Anand Soni | Junior Undergraduate | Department of Computer Science &
> Engineering | IIT Bombay | India
