Hello,

My name is Natalia Teplyakova. I am a third-year student at Moscow State University, Russia, pursuing a degree in Applied Mathematics and Informatics. In addition to my university program, I am currently taking courses from Mail.ru Group (a Russian IT company) covering machine learning, information retrieval, advanced C++, and design patterns. My skills include C/C++, Python, and several data analysis libraries (numpy, pandas, scikit-learn).

I am interested in the "Decision trees" project. I have already looked through the decision stump and density estimation tree code in mlpack. I am not entirely sure, but I think it would be better to implement a new decision tree class parameterized by different fitness functions: Gini impurity and information gain (both already implemented in mlpack), misclassification impurity for classification, and mean squared error for regression. I put a rough sketch of the interface I have in mind at the end of this email. Would it be a good idea to implement some ensemble methods (random forests, for example) as part of this project?

I also have my own idea for a GSoC project: implementing different clustering methods. mlpack has efficient data structures for neighborhood queries, so they could be used in DBSCAN clustering. DBSCAN has several advantages over KMeans: it does not require specifying the number of clusters in advance, it can find arbitrarily shaped clusters, and it can detect outliers. There is also an open issue about hierarchical clustering (https://github.com/mlpack/mlpack/issues/356), so I could implement agglomerative clustering as well. What do you think about this idea? A sketch of how DBSCAN could use neighborhood queries is also below.

Regards,
Natalia.
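
P.S. Here is a minimal sketch of the fitness-function-as-policy design I mean, assuming mlpack-style template policies. The names GiniImpurity, MisclassificationImpurity, and DecisionTree are placeholders of mine, not existing mlpack classes:

#include <armadillo>
#include <cstddef>

// Hypothetical fitness function policies; each one scores the set of
// labels that falls into a node. Placeholder names, not mlpack API.
class GiniImpurity
{
 public:
  // counts[i] = number of training points of class i in the node.
  static double Evaluate(const arma::Col<size_t>& counts)
  {
    const double total = arma::accu(counts);
    if (total == 0) return 0.0;

    double impurity = 1.0;
    for (size_t i = 0; i < counts.n_elem; ++i)
    {
      const double p = counts[i] / total;
      impurity -= p * p; // Gini: 1 - sum_i p_i^2.
    }
    return impurity;
  }
};

class MisclassificationImpurity
{
 public:
  static double Evaluate(const arma::Col<size_t>& counts)
  {
    const double total = arma::accu(counts);
    if (total == 0) return 0.0;
    // 1 - p_max: the fraction of points not in the majority class.
    return 1.0 - counts.max() / total;
  }
};

// The tree itself would be parameterized by the fitness function, in the
// same spirit as other mlpack classes that take template policies:
template<typename FitnessFunction = GiniImpurity>
class DecisionTree
{
  // ... split selection would call FitnessFunction::Evaluate() on the
  // label counts of each candidate child and pick the best split.
};

A regression tree would then just be the same class instantiated with a mean-squared-error policy instead of a classification impurity.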
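
P.P.S. And a minimal sketch of DBSCAN built on an epsilon-neighborhood query. The RegionQuery helper below is a brute-force placeholder; in the actual project it would be replaced by mlpack's tree-based range search structures:

#include <armadillo>
#include <cstddef>
#include <vector>

const int UNVISITED = -2;
const int NOISE = -1;

// Brute-force epsilon-neighborhood query: all points within eps of the
// given point. This is the part mlpack's trees would accelerate.
std::vector<size_t> RegionQuery(const arma::mat& data,
                                const size_t point,
                                const double eps)
{
  std::vector<size_t> neighbors;
  for (size_t i = 0; i < data.n_cols; ++i)
    if (arma::norm(data.col(i) - data.col(point)) <= eps)
      neighbors.push_back(i);
  return neighbors;
}

// Classic DBSCAN: assignments[i] ends up as a cluster id, or NOISE.
void DBSCAN(const arma::mat& data,
            const double eps,
            const size_t minPoints,
            arma::Col<int>& assignments)
{
  assignments.set_size(data.n_cols);
  assignments.fill(UNVISITED);
  int cluster = -1;

  for (size_t i = 0; i < data.n_cols; ++i)
  {
    if (assignments[i] != UNVISITED) continue;

    std::vector<size_t> seeds = RegionQuery(data, i, eps);
    if (seeds.size() < minPoints)
    {
      assignments[i] = NOISE; // Not a core point; may become a border point later.
      continue;
    }

    // i is a core point: start a new cluster and expand the seed set.
    assignments[i] = ++cluster;
    for (size_t s = 0; s < seeds.size(); ++s)
    {
      const size_t j = seeds[s];
      if (assignments[j] == NOISE)
        assignments[j] = cluster; // Border point: claimed but not expanded.
      if (assignments[j] != UNVISITED) continue;

      assignments[j] = cluster;
      std::vector<size_t> jNeighbors = RegionQuery(data, j, eps);
      if (jNeighbors.size() >= minPoints) // j is also a core point: expand.
        seeds.insert(seeds.end(), jNeighbors.begin(), jNeighbors.end());
    }
  }
}

Note that the number of clusters falls out of eps and minPoints rather than being specified up front, which is the advantage over KMeans I mentioned.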