<p>Several high-level points:</p>


<p>I think you should also provide the option of a Hogwild-style implementation, since that is generally what people have in mind when they think of parallel SGD. To do this correctly, though, you also need to support sparse gradients; in fact, the sparse case is exactly where you would expect parallel SGD to win. When gradients are fully dense, I think your current approach is probably the way to go, but its speedups will be inherently limited.</p>
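<p>To make the sparse case concrete, here is a rough sketch of what I mean by a Hogwild-style update, written in plain C++ rather than against mlpack's API. The names <code>SparseExample</code> and <code>HogwildSGD</code> are made up for illustration, the loss is assumed to be squared error on a linear model, and relaxed atomics stand in for the unsynchronized reads and writes so the sketch stays well-defined C++. Each thread only touches the coordinates that are nonzero in its current example, which is why collisions are rare when the data is sparse.</p>

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical container: nonzero (index, value) pairs plus a regression target.
struct SparseExample
{
  std::vector<std::pair<std::size_t, double>> features;
  double target;
};

// Hypothetical helper (not an mlpack function): lock-free SGD on squared loss.
// All threads share one weight vector and never take a lock.
std::vector<double> HogwildSGD(const std::vector<SparseExample>& data,
                               const std::size_t dimension,
                               const double stepSize,
                               const std::size_t epochs,
                               const std::size_t numThreads)
{
  assert(numThreads > 0);

  // Shared parameters; relaxed atomics avoid undefined behavior without
  // imposing any ordering or locking between threads.
  std::vector<std::atomic<double>> weights(dimension);
  for (auto& w : weights)
    w.store(0.0, std::memory_order_relaxed);

  auto worker = [&](const std::size_t tid)
  {
    for (std::size_t e = 0; e < epochs; ++e)
    {
      // Each thread visits its own strided slice of the examples.
      for (std::size_t i = tid; i < data.size(); i += numThreads)
      {
        const SparseExample& ex = data[i];

        // Prediction reads only the coordinates this example touches.
        double prediction = 0.0;
        for (const auto& f : ex.features)
          prediction += weights[f.first].load(std::memory_order_relaxed) * f.second;

        const double error = prediction - ex.target;

        // Sparse update: the load/store pair is deliberately not an atomic
        // read-modify-write, so a concurrent update may occasionally be lost.
        for (const auto& f : ex.features)
        {
          const double old = weights[f.first].load(std::memory_order_relaxed);
          weights[f.first].store(old - stepSize * error * f.second,
                                 std::memory_order_relaxed);
        }
      }
    }
  };

  std::vector<std::thread> threads;
  for (std::size_t t = 0; t < numThreads; ++t)
    threads.emplace_back(worker, t);
  for (auto& t : threads)
    t.join();

  // Copy the shared weights out into an ordinary vector.
  std::vector<double> result(dimension);
  for (std::size_t j = 0; j < dimension; ++j)
    result[j] = weights[j].load(std::memory_order_relaxed);
  return result;
}
```

<p>With dense gradients every thread would write every coordinate on every step, so the lost updates and cache traffic would eat most of the benefit; that is the regime where the current approach makes more sense.</p>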


<p>Also, echoing what Ryan mentioned, the parallel averaging case here can be implemented by reusing the existing optimizer(s).</p>
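<p>Roughly, the averaging case could look like the sketch below. This is not mlpack code: <code>ShardOptimizer</code> and <code>ParallelAverage</code> are hypothetical names, and the callable is just a stand-in for whichever existing serial optimizer gets run on each shard of the data.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

using Params = std::vector<double>;

// Hypothetical stand-in for an existing serial optimizer: given a shard
// index, it optimizes on that shard and returns the parameters it found.
using ShardOptimizer = std::function<Params(std::size_t)>;

// Run the serial optimizer on every shard in its own thread, then average
// the resulting parameter vectors coordinate by coordinate.
Params ParallelAverage(const ShardOptimizer& optimizeShard,
                       const std::size_t numShards,
                       const std::size_t dimension)
{
  assert(numShards > 0);

  std::vector<Params> results(numShards);
  std::vector<std::thread> threads;
  for (std::size_t s = 0; s < numShards; ++s)
  {
    // Each thread writes only its own slot, so no synchronization is needed.
    threads.emplace_back([&, s] { results[s] = optimizeShard(s); });
  }
  for (auto& t : threads)
    t.join();

  Params average(dimension, 0.0);
  for (const Params& r : results)
  {
    assert(r.size() == dimension);
    for (std::size_t j = 0; j < dimension; ++j)
      average[j] += r[j] / numShards;
  }
  return average;
}
```

<p>The point is that the parallel wrapper only needs to split the data and combine the results; the per-shard work is whatever optimizer already exists.</p>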


