[mlpack] GSoC 2014 : Introduction and Interests

Tue Mar 4 23:59:40 EST 2014

Marcus,
Thanks for the guidance! I will look into these metrics.

As far as visual performance analysis is concerned, I came across
d3.js and hexbins.js. Once we have the required data we can generate
good visualizations using these tools. I plan on using either one or
both of them.

Also, I was reading up on dual trees. Why exactly does mlpack need
dual-tree implementations? I am interested in working on k-d trees.

Is it alright to submit proposals for two projects?

Regards.

Anand

On Wed, Mar 5, 2014 at 4:44 AM, Marcus Edel <marcus.edel at fu-berlin.de> wrote:
> Hello Anand,
>
> Thanks for your contribution!
>
>> 1. Accuracy, Precision and recall, n-fold cross-validation. (Basic stuff)
>
> That's correct. We should take these metrics into account.
>
>> 2. Area under ROC Curves (Receiver Operating Characteristics)
>> [Probability that classifier will rank a randomly chosen positive
>> instance higher than a randomly chosen negative
>> instance.]
>
> ROC curves work only for binary classification, and only when the classifier produces a continuous output. The problem is that we generally don't have the data to plot ROC graphs.
> However, if you like, you can add this quality metric as an option.
>
>> There are many other possibilities like Bayesian models and
>> statistical confidence intervals which can be used for such purposes.
>> I need more clarifications on the expectations from this project so
>> that I can do my research in the correct direction before the
>> proposal. I will be glad if someone can help.
>
> Generally it depends on what you want to know about the performance characteristics of the classifier/algorithm. I think it is best to report several measures of the performance. Just to add some more metrics:
>
> - F-measure
> - Matthews correlation coefficient (MCC)
> - Relative Classifier Information (RCI)
> - Confusion Entropy (CEN)
> - Cohen's kappa
>
> The last three metrics can also measure performance on multi-class problems.
>
> My suggestion is to combine the base metrics with unbalanced multi-class metrics. In the end, the results are stored in the database, so we can easily add more metrics.
>
> Is that helpful? If you have any questions, feel free to ask.
>
> Thanks,
>
> Marcus
>
>
> On 04 Mar 2014, at 17:49, Anand Soni <anand.92.soni at gmail.com> wrote:
>
>> Hi,
>>
>> I built the mlpack environment and tried the all k nearest neighbour
>> search for iris data. I am still exploring and analyzing the results.
>> As mentioned in the project description, we need to implement methods
>> to compare accuracies of algorithms. I have a few ideas. I don't know
>> if they are useful here. I am exploring more.
>>
>> 1. Accuracy, Precision and recall, n-fold cross-validation. (Basic stuff)
>> 2. Area under ROC Curves (Receiver Operating Characteristics)
>> [Probability that classifier will rank a randomly chosen positive
>> instance higher than a randomly chosen negative
>> instance.]
>> 3. Information-theoretic metrics [still exploring], like Good's
>> information reward (for binary classification algorithms)
>>
>> There are many other possibilities like Bayesian models and
>> statistical confidence intervals which can be used for such purposes.
>> I need more clarifications on the expectations from this project so
>> that I can do my research in the correct direction before the
>> proposal. I will be glad if someone can help.
>>
>> Regards.
>>
>> Anand
>>
>> On Tue, Mar 4, 2014 at 12:26 AM, Ryan Curtin <gth671b at mail.gatech.edu> wrote:
>>> On Tue, Mar 04, 2014 at 12:19:30AM +0530, Anand Soni wrote:
>>>> Ryan,
>>>>
>>>> I think that the gatech server is down or not responding. I am not
>>>> even able to access www.gatech.edu . I will try a bit later and it
>>>> should work. Thanks a lot, by the way.
>>>
>>> Ok; let me know if you have continued issues.  I am able to access it,
>>> but I'm right here on campus, so there's probably some issue between
>>> here and where you are.  Hopefully it will be resolved soon...
>>>
>>> --
>>> Ryan Curtin    | "More like a nonja."
>>> ryan at ratml.org |   - Pops
>>
>>
>>
>> --
>> Anand Soni | Junior Undergraduate | Department of Computer Science &
>> Engineering | IIT Bombay | India
>

-- 
Anand Soni | Junior Undergraduate | Department of Computer Science &
Engineering | IIT Bombay | India
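
Several of the metrics named in this thread (accuracy, precision, recall, F-measure, MCC, Cohen's kappa, and the area under the ROC curve read as a ranking probability) can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names below are hypothetical and are not part of mlpack or its benchmarking scripts, and the labels and scores are made-up toy data. The first function assumes a binary problem with a designated positive label, and the AUC function assumes the classifier emits a continuous score, as noted in the thread.

```python
from itertools import product
from math import sqrt


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, F1 and MCC for a binary problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # MCC is conventionally reported as 0 when any marginal count is zero.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa; works for any number of classes."""
    y_true, y_pred = list(y_true), list(y_pred)
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Agreement expected by chance, from the two label distributions.
    expected = sum((y_true.count(c) / n) * (y_pred.count(c) / n)
                   for c in set(y_true) | set(y_pred))
    return (observed - expected) / (1 - expected) if expected != 1 else 0.0


def auc_by_ranking(y_true, scores, positive=1):
    """AUC as the probability that a random positive outranks a random
    negative (ties count as half). Quadratic in the sample size."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p, q in product(pos, neg))
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]

print(binary_metrics(y_true, y_pred))
print(cohen_kappa(y_true, y_pred))
print(auc_by_ranking(y_true, scores))
```

On this toy data the confusion counts are 3 true positives, 3 true negatives, 1 false positive and 1 false negative, so accuracy, precision, recall and F1 all come out to 0.75, MCC and kappa to 0.5, and the AUC to 15/16. The pairwise AUC is kept deliberately simple; a sort-based computation would be preferable on larger datasets.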