[mlpack] GSoC Project Idea

Ryan Curtin gth671b at mail.gatech.edu
Thu Apr 11 11:58:01 EDT 2013


On Thu, Apr 11, 2013 at 01:36:55PM +0530, Akshay wrote:
> Hello,
> 
> I want to discuss the idea of parallelizing some of the already implemented
> algorithms using OpenCL. I have prior experience with OpenCL and its C++
> bindings, and have developed a parallelized implementation of KMeans
> clustering.
> OpenCL is traditionally used for GPU computing, but in my experience it can
> also drive multi-core CPUs to full load for significant speedups. Algorithms
> that process large amounts of data simultaneously (like k-means) can benefit
> greatly.
> Also, the existing serial implementations could serve as ground truth,
> which should be useful when debugging the parallel versions.
> 
> I am also well versed in the standard ML techniques and statistics through
> MOOCs and course projects, and I'm willing to go further.

OpenCL is a difficult proposition.  One of the goals of mlpack is
parallelization, but not at the cost of code maintainability.  This is
why we've preferred OpenMP up to this point; OpenMP code can be
implemented with simple #pragma commands which can just as easily be
ignored by the person reading the code, if they do not understand
parallel code -- and most people in machine learning do not.
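To illustrate the point about #pragma commands, here is a minimal sketch
(not mlpack's actual k-means code) of a k-means assignment step in C++.
The function name `AssignPoints` and the flat row-major data layout are
hypothetical.  The single `#pragma omp parallel for` line is the only
OpenMP-specific part: a reader who skips it sees ordinary serial code, and
a compiler invoked without OpenMP support (e.g. GCC without `-fopenmp`)
ignores it, possibly with a warning.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not mlpack's API): the assignment step of k-means.
// `points` holds numPoints rows and `centroids` holds k rows, each row being
// `dim` consecutive doubles (row-major). Returns the index of the nearest
// centroid (squared Euclidean distance) for every point.
std::vector<std::size_t> AssignPoints(const std::vector<double>& points,
                                      const std::vector<double>& centroids,
                                      std::size_t dim)
{
  assert(dim > 0 && "dim must be positive");
  const std::ptrdiff_t numPoints =
      static_cast<std::ptrdiff_t>(points.size() / dim);
  const std::size_t numCentroids = centroids.size() / dim;
  std::vector<std::size_t> assignments(points.size() / dim, 0);

  // The only OpenMP-specific line. Each iteration reads shared, read-only
  // data and writes only assignments[i], so iterations are independent.
  // Compiled without OpenMP (e.g. no -fopenmp), the pragma is ignored and
  // the loop simply runs serially with identical results.
  #pragma omp parallel for
  for (std::ptrdiff_t i = 0; i < numPoints; ++i)
  {
    const std::size_t row = static_cast<std::size_t>(i);
    double bestDist = std::numeric_limits<double>::max();
    std::size_t best = 0;
    for (std::size_t c = 0; c < numCentroids; ++c)
    {
      double dist = 0.0;
      for (std::size_t d = 0; d < dim; ++d)
      {
        const double diff = points[row * dim + d] - centroids[c * dim + d];
        dist += diff * diff;
      }
      if (dist < bestDist)
      {
        bestDist = dist;
        best = c;
      }
    }
    assignments[row] = best;
  }
  return assignments;
}
```

Deleting the pragma line leaves a correct serial function, which is the
maintainability property described above; an OpenCL version of the same
step would instead need separate kernel source, buffer management and
host-side setup.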

We do currently have an OpenMP implementation of k-means that is being
merged into trunk.  Perhaps a more suitable project would be OpenMP
support for other machine learning methods.  Would this be interesting
to you?

-- 
Ryan Curtin       | "And they say there is no fate, but there is: it's
ryan at igglybob.com | what you create." - Minister

