[mlpack] [GSoC] Implement Decision tree algorithms
Rajath Shashidhara
rajath.shashidhara at gmail.com
Thu Mar 3 05:48:00 EST 2016
Hello,
I surveyed the popular Decision Tree algorithms to identify the
differences and commonalities between them. I believe that this is
very important for the API design part.
Important observations -
[1] Most decision tree algorithms follow the same greedy strategy to
grow the trees.
[2] The input data is provided in the same format for all algorithms.
(the algorithms don't enforce any order on the input data).
[3] Considerable difference is observed in the greedy choice of
attribute to split the node.
This could differ in the following ways.
(a) Impurity measure - GINI / Entropy / etc.
(b) Some algorithms employ a two step process (significance test
+ Impurity measure).
(c) Significance tests used - chi squared tests, permutation tests, etc.
[4] Treatment of missing values differ in each algorithm.
[5] Most algorithms use a 2-way split of each node. But, there are
several exceptions.
[6] Strategy to split ordered and unordered attributes may be different.
[7] Tree pruning and stopping criterion vary across different algorithms.
[8] Validation and Error estimation methods can also be different.
[a] Cross-validation tests.
[b] Misclassification costs.
[c] Prior probabilities.
Although there are several differences between them they share a
common structure. A generic API for decision trees can be designed by
making use of function pointers (or functors in C++) for
specialization.
One level decision trees / decision stumps can be brought under the
same API (by specifying a stopping criterion of one level).
I am not familiar with Density Estimation Trees. I am going though
some literature to understand if they can be generalized under the
same API.
Thank you,
Rajath Shashidhara
References :
[1] http://www.stat.wisc.edu/~loh/treeprogs/guide/wires11.pdf
On Tue, Mar 1, 2016 at 6:58 PM, Rajath Shashidhara
<rajath.shashidhara at gmail.com> wrote:
> Hello,
>
> I am a GSoC' 2016 aspirant. I have perused your ideas page and I am
> interested in working on the implementation of Decision tree
> algorithms.
>
> I will keep in touch with my ideas and suggestions regarding the project.
>
> Thank you,
> Rajath Shashidhara
More information about the mlpack
mailing list