[mlpack] mlpack GSOC idea - Parallel Stochastic Optimization Methods

Chen Qan kazenoyumechen at gmail.com
Sat Feb 28 09:48:41 EST 2015


Hi, I am Qiaoan Chen. I'm interested in the idea of implementing parallel
stochastic optimization methods in mlpack.

Before talking about the idea, let me briefly introduce myself. I am a new
graduate student at Tsinghua University. I think I am a good fit for this
idea because I have completed some relevant courses (Machine Learning and
Machine Learning Techniques on Coursera, and our school's Optimization
Method class) and have some experience in parallel programming
(multi-thread, multi-process, MR). I have fixed mlpack issue 344 to get
familiar with mlpack. I hope to have the chance to contribute more to
mlpack through GSOC.

After searching through the internet, I found three papers that may be
helpful:

[1] Parallelized Stochastic Gradient Descent, M. A. Zinkevich
[2] HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, Feng Niu
[3] An Asynchronous Parallel Stochastic Coordinate Descent Algorithm, Ji Liu

I have skimmed the first and second papers. The parallel versions seem
easy to implement, but the proofs seem much harder (perhaps I lack the
background in mathematical analysis).
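
To illustrate why the parallel version in [1] looks easy to implement, here
is a minimal sketch of the idea as I understand it: split the data into
shards, run ordinary SGD on each shard independently, and average the
resulting weight vectors. It is written in Python for brevity and is not
mlpack code. The function names, the toy least-squares problem, and the step
size are all my own assumptions, and Python threads only show the structure
here (they give no real speedup).

```python
import random
from concurrent.futures import ThreadPoolExecutor


def sgd_least_squares(shard, dim, step_size, epochs, seed):
    """Sequential SGD for the loss 0.5 * (w.x - y)^2 on a single shard."""
    rng = random.Random(seed)
    w = [0.0] * dim
    order = list(range(len(shard)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = shard[i]
            err = sum(wj * xj for wj, xj in zip(w, x)) - y
            w = [wj - step_size * err * xj for wj, xj in zip(w, x)]
    return w


def averaged_parallel_sgd(data, dim, workers=4, step_size=0.05, epochs=20):
    """Run SGD independently on disjoint shards, then average the weights."""
    shards = [data[k::workers] for k in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(sgd_least_squares, shard, dim, step_size, epochs, seed)
            for seed, shard in enumerate(shards)
        ]
        results = [f.result() for f in futures]
    # The only communication step: one average at the very end.
    return [sum(w[j] for w in results) / workers for j in range(dim)]


def make_toy_data(n=400, seed=0):
    """Hypothetical toy problem: y = 2*x0 - 3*x1 plus small Gaussian noise."""
    rng = random.Random(seed)
    true_w = [2.0, -3.0]
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        y = sum(wj * xj for wj, xj in zip(true_w, x)) + rng.gauss(0, 0.01)
        data.append((x, y))
    return data, true_w


if __name__ == "__main__":
    data, true_w = make_toy_data()
    print(averaged_parallel_sgd(data, dim=len(true_w)), "vs", true_w)
```

The workers never talk to each other until the final average, which is why
the implementation is short; the hard part in the paper is showing that the
averaged solution is actually good.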

I don't know whether I found the right papers. If they are the right ones,
I think it should be easy to implement those methods in mlpack and
integrate them with the existing methods that use SGD, like LR, NCA and
regularized SVD.
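
For comparison, the lock-free scheme in [2] can be sketched roughly as
follows: several threads share one weight vector and update it in place
without any locking, relying on sparse examples so that concurrent updates
rarely touch the same coordinates. Again this is Python for brevity, not
mlpack code. The function names, the sparse toy data, and the step size are
my own assumptions, and the result is not deterministic because the threads
race on purpose.

```python
import random
import threading


def hogwild_sgd(data, dim, workers=4, step_size=0.05, epochs=20):
    """Lock-free SGD: every thread updates the same weight list in place."""
    w = [0.0] * dim  # shared state, deliberately not protected by a lock
    steps_per_worker = epochs * len(data) // workers

    def worker(worker_id):
        rng = random.Random(worker_id)
        for _ in range(steps_per_worker):
            # Each example is sparse: a dict {coordinate index: value}.
            x, y = data[rng.randrange(len(data))]
            err = sum(w[j] * v for j, v in x.items()) - y
            for j, v in x.items():
                w[j] -= step_size * err * v  # unsynchronized write

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w


def make_sparse_data(n=1000, dim=10, nonzeros=2, seed=0):
    """Hypothetical noise-free sparse regression problem."""
    rng = random.Random(seed)
    true_w = [float(j + 1) for j in range(dim)]
    data = []
    for _ in range(n):
        coords = rng.sample(range(dim), nonzeros)
        x = {j: rng.uniform(-1, 1) for j in coords}
        y = sum(true_w[j] * v for j, v in x.items())
        data.append((x, y))
    return data, true_w


if __name__ == "__main__":
    data, true_w = make_sparse_data()
    print(hogwild_sgd(data, dim=len(true_w)))
```

In mlpack this would presumably need real threads over a shared weight
matrix, and the benefit depends on how sparse the gradients of each model
are.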

Although the task seems feasible, I have some questions about it:

1. Is parallel SGD/SCD effective?
    I found the following post,
scikit-learn-parallelize-stochastic-gradient-descent
<http://stackoverflow.com/questions/21052050/scikit-learn-parallelize-stochastic-gradient-descent>,
which says that instead of using a parallel SGD like the one in [1],
L-BFGS should be used with a warm start point obtained from SGD.
2. Does a general parallel SGD/SCD exist?
    AFAIK, many models use versions of those methods tailored to them, like
FPSG.
Any help would be appreciated, thanks.

-- 
Qiaoan Chen
School of Software, Tsinghua University
Addr: Room 11-419, East Main Building, Tsinghua University, 100084
Beijing, P.R.China