[mlpack] mlpack GSOC idea - Parallel Stochastic Optimization Methods

Stephen Tu tu.stephenl at gmail.com
Sun Mar 1 13:08:23 EST 2015


On Sat, Feb 28, 2015 at 6:48 AM, Chen Qan <kazenoyumechen at gmail.com> wrote:
>
> After searching through the internet, I found three papers that may be
> helpful:
>
> [1]. Parallelized Stochastic Gradient Descent, Martin Zinkevich et al.
> [2]. HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient
> Descent, Feng Niu et al.
> [3]. An Asynchronous Parallel Stochastic Coordinate Descent Algorithm, Ji
> Liu et al.
>
> I have skimmed the first and the second paper. The parallel versions seem
> easy to implement, but the proofs seem much harder (perhaps I lack the
> background in mathematical analysis).
>
> I don't know whether or not I found the right papers. If those are right,
> I think it should be easy to implement these methods in mlpack and
> integrate them with the existing methods that use SGD, like logistic
> regression, NCA, and regularized SVD.
>

You've found the right papers. For this project, I was thinking of starting
with [2] and [3]. I think [1] makes more sense in a distributed setting
(where communication is much more expensive than in a shared memory
setting).
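
Just to make the HOGWILD! idea concrete, here's a rough C++/OpenMP sketch.
To be clear, this is not mlpack API; I'm assuming a dense least-squares
objective purely for illustration (the paper's analysis is really about the
sparse case, where concurrent updates rarely touch the same coordinates):

#include <cstddef>
#include <random>

#include <armadillo>
#include <omp.h>

// Lock-free parallel SGD in the style of HOGWILD! [2], sketched for a
// least-squares objective f(w) = 0.5 * sum_i (x_i' w - y_i)^2.  Every
// thread samples examples at random and writes to the shared iterate
// without taking any locks.
void HogwildLeastSquares(const arma::mat& X,  // one column per example
                         const arma::vec& y,
                         arma::vec& w,        // shared iterate; racy writes
                         const double stepSize,
                         const size_t numUpdates)
{
  #pragma omp parallel
  {
    // Per-thread RNG so threads don't all sample the same sequence.
    std::mt19937 rng(omp_get_thread_num());
    std::uniform_int_distribution<size_t> pick(0, X.n_cols - 1);

    #pragma omp for
    for (long t = 0; t < (long) numUpdates; ++t)
    {
      const size_t i = pick(rng);

      // Stochastic gradient at example i: (x_i' w - y_i) * x_i.
      const double residual = arma::dot(X.col(i), w) - y(i);
      w -= stepSize * residual * X.col(i);  // no lock: races are tolerated
    }
  }
}

The thing to notice is that there is no lock anywhere: the races on w are
tolerated rather than prevented, and the paper shows why that's okay.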


> Although the task seems possible, I have some questions about it:
>
> 1. Is parallel SGD/SCD effective?
>     I found the following post:
> scikit-learn-parallelize-stochastic-gradient-descent
> <http://stackoverflow.com/questions/21052050/scikit-learn-parallelize-stochastic-gradient-descent>
> It says that instead of using a parallel SGD scheme like the one in [1],
> L-BFGS should be used with a warm start point obtained from SGD.
>

Great question, although the link you provide is not really a good
counterexample, for the reasons discussed within. Nevertheless, it's a
valid question. To be honest, this is one of the biggest downfalls of the
ML community: there are no good papers I can point to that really do an
extensive empirical study of this stuff. Hence, if you are interested in
running some benchmarks, you could help answer this question with your
implementation :)
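
If you get that far, the benchmark itself can start out very simple. Here's
a hypothetical harness (again, nothing here is mlpack API) that times the
HOGWILD! sketch from earlier in this mail at a few thread counts on
synthetic data:

#include <cstdio>

#include <armadillo>
#include <omp.h>

// Declaration of the HOGWILD! sketch from earlier in this mail.
void HogwildLeastSquares(const arma::mat& X, const arma::vec& y,
                         arma::vec& w, double stepSize, size_t numUpdates);

int main()
{
  // Synthetic well-conditioned problem; a real study should also use the
  // sparse datasets from the HOGWILD! paper.
  const size_t n = 100000, d = 100;
  const arma::mat X(d, n, arma::fill::randn);
  const arma::vec wTrue(d, arma::fill::randn);
  const arma::vec y = X.t() * wTrue;

  for (const int threads : { 1, 2, 4, 8 })
  {
    omp_set_num_threads(threads);
    arma::vec w(d, arma::fill::zeros);

    const double start = omp_get_wtime();
    HogwildLeastSquares(X, y, w, 1e-4, 5 * n);
    const double elapsed = omp_get_wtime() - start;

    std::printf("threads = %d: %.2fs, error = %g\n",
                threads, elapsed, arma::norm(w - wTrue, 2));
  }

  return 0;
}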


> 2. Does a general parallel SGD/SCD exist?
>     AFAIK, many models use versions of these methods tailored to
> themselves, like FPSG.
>

What do you mean by general? Anytime you can do SGD, you can do HOGWILD!
(although the performance is sensitive to the structure of the conflict
graph). Similarly, anytime you can do SCD, you can do the async parallel
version.
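
For completeness, here's the analogous sketch on the SCD side, in the
spirit of [3]. Same caveats as before: not mlpack API, and a least-squares
objective is assumed purely for illustration:

#include <cstddef>
#include <random>

#include <armadillo>
#include <omp.h>

// Asynchronous parallel stochastic coordinate descent, sketched for
// f(w) = 0.5 * sum_i (x_i' w - y_i)^2.  Each thread picks a random
// coordinate j, computes the partial derivative from a possibly stale
// view of w, and updates only w(j).
void AsyncSCDLeastSquares(const arma::mat& X,  // one column per example
                          const arma::vec& y,
                          arma::vec& w,
                          const double stepSize,
                          const size_t numUpdates)
{
  #pragma omp parallel
  {
    std::mt19937 rng(omp_get_thread_num());
    std::uniform_int_distribution<size_t> pick(0, w.n_elem - 1);

    #pragma omp for
    for (long t = 0; t < (long) numUpdates; ++t)
    {
      const size_t j = pick(rng);

      // df/dw_j = sum_i (x_i' w - y_i) * X(j, i), computed from an
      // inconsistent (unlocked) read of w.
      const double grad = arma::dot(w.t() * X - y.t(), X.row(j));

      // Only the single-coordinate write is made atomic.
      double* wj = w.memptr() + j;
      #pragma omp atomic
      *wj -= stepSize * grad;
    }
  }
}

A serious implementation would maintain the residuals incrementally rather
than paying O(nd) per update, but the structure (stale reads, atomic
single-coordinate writes) is the whole idea.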