<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">


<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    
    <title>BaseSGDOptimizer &mdash; PaddlePaddle  documentation</title>
    
    <link rel="stylesheet" href="../../../_static/classic.css" type="text/css" />
    <link rel="stylesheet" href="../../../_static/pygments.css" type="text/css" />
    
    <script type="text/javascript">
      var DOCUMENTATION_OPTIONS = {
        URL_ROOT:    '../../../',
        VERSION:     '',
        COLLAPSE_INDEX: false,
        FILE_SUFFIX: '.html',
        HAS_SOURCE:  true
      };
    </script>
    <script type="text/javascript" src="../../../_static/jquery.js"></script>
    <script type="text/javascript" src="../../../_static/underscore.js"></script>
    <script type="text/javascript" src="../../../_static/doctools.js"></script>
    <script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
    <link rel="top" title="PaddlePaddle  documentation" href="../../../index.html" />
    <link rel="up" title="Optimizers" href="optimizers_index.html" />
    <link rel="next" title="DataSources" href="data_sources.html" />
    <link rel="prev" title="Optimizers" href="optimizers_index.html" /> 
  </head>
  <body role="document">
    <div class="related" role="navigation" aria-label="related navigation">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="../../../genindex.html" title="General Index"
             accesskey="I">index</a></li>
        <li class="right" >
          <a href="../../../py-modindex.html" title="Python Module Index"
             >modules</a> |</li>
        <li class="right" >
          <a href="data_sources.html" title="DataSources"
             accesskey="N">next</a> |</li>
        <li class="right" >
          <a href="optimizers_index.html" title="Optimizers"
             accesskey="P">previous</a> |</li>
        <li class="nav-item nav-item-0"><a href="../../../index.html">PaddlePaddle  documentation</a> &raquo;</li>
          <li class="nav-item nav-item-1"><a href="../../index.html" >User Interface</a> &raquo;</li>
          <li class="nav-item nav-item-2"><a href="index.html" >Model Config Interface</a> &raquo;</li>
          <li class="nav-item nav-item-3"><a href="optimizers_index.html" accesskey="U">Optimizers</a> &raquo;</li> 
      </ul>
    </div>  

    <div class="document">
      <div class="documentwrapper">
        <div class="bodywrapper">
          <div class="body" role="main">
            
  <div class="section" id="basesgdoptimizer">
<h1>BaseSGDOptimizer<a class="headerlink" href="#basesgdoptimizer" title="Permalink to this headline"></a></h1>
<dl class="class">
<dt>
<em class="property">class </em><code class="descclassname">paddle.trainer_config_helpers.optimizers.</code><code class="descname">BaseSGDOptimizer</code></dt>
<dd><p>SGD Optimizer.</p>
<p>SGD is an optimization method that iteratively adjusts a neural network to
minimize its &#8220;cost/error&#8221;. In paddle&#8217;s implementation the SGD Optimizer is
synchronized: all gradients are computed and reduced into one gradient before
the optimization step is applied.</p>
<p>The neural network considers the learning problem of minimizing an objective
function that has the form of a sum</p>
<div class="math">
\[Q(w) = \sum_{i}^{n} Q_i(w)\]</div>
<p>The value of the function Q is typically the cost of the neural network (for
example, the Mean Square Error between prediction and label). The function Q is
parametrised by w, the weights/biases of the neural network, which are what is
learned. The index i denotes the i-th observation in the (training) data.</p>
<p>So, the SGD method optimizes the weights by</p>
<div class="math">
\[w = w - \eta \nabla Q(w) = w - \eta \sum_{i}^{n} \nabla Q_i(w)\]</div>
<p>where <span class="math">\(\eta\)</span> is the learning rate and <span class="math">\(n\)</span> is the batch size.</p>
</dd></dl>
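<p>As a concrete illustration of the update rule above, the following is a minimal NumPy
sketch of one synchronous SGD step on a toy squared-error objective. It is illustrative
only, not PaddlePaddle&#8217;s internal implementation; the toy gradient function and array
names are hypothetical.</p>
<div class="highlight-python"><div class="highlight"><pre>import numpy as np

# Toy objective: Q_i(w) = (w . x_i - y_i)^2, so grad Q_i(w) = 2 (w . x_i - y_i) x_i.
def grad_Q_i(w, x, y):
    return 2.0 * (np.dot(w, x) - y) * x

eta = 0.01                                   # learning rate
w = np.zeros(3)                              # weights to be learned
batch = [(np.array([1.0, 0.0, 2.0]), 1.0),
         (np.array([0.5, 1.0, 0.0]), 0.0)]   # n = 2 observations
# Synchronous step: sum the per-observation gradients, then update once.
w = w - eta * sum(grad_Q_i(w, x, y) for x, y in batch)
</pre></div></div>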

</div>
<div class="section" id="adamoptimizer">
<h1>AdamOptimizer<a class="headerlink" href="#adamoptimizer" title="Permalink to this headline"></a></h1>
<dl class="class">
<dt>
<em class="property">class </em><code class="descclassname">paddle.trainer_config_helpers.optimizers.</code><code class="descname">AdamOptimizer</code><span class="sig-paren">(</span><em>beta1=0.9</em>, <em>beta2=0.999</em>, <em>epsilon=1e-08</em><span class="sig-paren">)</span></dt>
<dd><p>Adam optimizer.
For details, please refer to <a class="reference external" href="https://arxiv.org/abs/1412.6980">Adam: A Method for Stochastic Optimization</a>.</p>
<div class="math">
\[\begin{split}m(w, t) &amp; = \beta_1 m(w, t-1) + (1 - \beta_1) \nabla Q_i(w) \\
v(w, t) &amp; = \beta_2 v(w, t-1) + (1 - \beta_2)(\nabla Q_i(w)) ^2 \\
w &amp; = w - \frac{\eta m(w, t)}{\sqrt{v(w,t) + \epsilon}}\end{split}\]</div>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
<li><strong>beta1</strong> (<em>float</em>) &#8211; the <span class="math">\(\beta_1\)</span> in the equation.</li>
<li><strong>beta2</strong> (<em>float</em>) &#8211; the <span class="math">\(\beta_2\)</span> in the equation.</li>
<li><strong>epsilon</strong> (<em>float</em>) &#8211; the <span class="math">\(\epsilon\)</span> in the equation. It is used to prevent
division by zero.</li>
</ul>
</td>
</tr>
</tbody>
</table>
</dd></dl>
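<p>As a rough, element-wise NumPy sketch of the equations above (illustrative only, not
PaddlePaddle&#8217;s internal implementation; the bias-correction terms from the paper are
omitted here to match the simplified equations):</p>
<div class="highlight-python"><div class="highlight"><pre>import numpy as np

def adam_step(w, grad, m, v, eta=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
    # First and second moment estimates, then the scaled update.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    w = w - eta * m / np.sqrt(v + epsilon)
    return w, m, v

w = np.zeros(4); m = np.zeros(4); v = np.zeros(4)
w, m, v = adam_step(w, np.array([0.1, -0.2, 0.3, 0.05]), m, v)
</pre></div></div>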

</div>
<div class="section" id="adamaxoptimizer">
<h1>AdamaxOptimizer<a class="headerlink" href="#adamaxoptimizer" title="Permalink to this headline"></a></h1>
<dl class="class">
<dt>
<em class="property">class </em><code class="descclassname">paddle.trainer_config_helpers.optimizers.</code><code class="descname">AdamaxOptimizer</code><span class="sig-paren">(</span><em>beta1</em>, <em>beta2</em><span class="sig-paren">)</span></dt>
<dd><p>Adamax optimizer.</p>
<p>For details, please refer to <a class="reference external" href="https://arxiv.org/abs/1412.6980">Adam: A Method for Stochastic Optimization</a>.</p>
<div class="math">
\[\begin{split}m_t &amp; = \beta_1 * m_{t-1} + (1-\beta_1)* \nabla Q_i(w) \\
u_t &amp; = max(\beta_2*u_{t-1}, abs(\nabla Q_i(w))) \\
w_t &amp; = w_{t-1} - (\eta/(1-\beta_1^t))*m_t/u_t\end{split}\]</div>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
<li><strong>beta1</strong> (<em>float</em>) &#8211; the <span class="math">\(\beta_1\)</span> in the equation.</li>
<li><strong>beta2</strong> (<em>float</em>) &#8211; the <span class="math">\(\beta_2\)</span> in the equation.</li>
</ul>
</td>
</tr>
</tbody>
</table>
</dd></dl>
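<p>A minimal NumPy sketch of the update equations above (illustrative only; a real
implementation would also guard against division by zero while the accumulator is still
zero):</p>
<div class="highlight-python"><div class="highlight"><pre>import numpy as np

def adamax_step(w, grad, m, u, t, eta=0.001, beta1=0.9, beta2=0.999):
    # Exponential moving average of gradients and infinity-norm accumulator.
    m = beta1 * m + (1 - beta1) * grad
    u = np.maximum(beta2 * u, np.abs(grad))
    w = w - (eta / (1 - beta1 ** t)) * m / u
    return w, m, u

w = np.zeros(3); m = np.zeros(3); u = np.zeros(3)
w, m, u = adamax_step(w, np.array([0.2, -0.1, 0.05]), m, u, t=1)
</pre></div></div>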

</div>
<div class="section" id="adagradoptimizer">
<h1>AdaGradOptimizer<a class="headerlink" href="#adagradoptimizer" title="Permalink to this headline"></a></h1>
<dl class="class">
<dt>
<em class="property">class </em><code class="descclassname">paddle.trainer_config_helpers.optimizers.</code><code class="descname">AdaGradOptimizer</code></dt>
<dd><p>AdaGrad (ADAptive GRAdient algorithm) optimizer.</p>
<p>For details, please refer to <a class="reference external" href="http://www.magicbroom.info/Papers/DuchiHaSi10.pdf">Adaptive Subgradient Methods for
Online Learning and Stochastic Optimization</a>.</p>
<div class="math">
\[\begin{split}G &amp;= \sum_{\tau=1}^{t} g_{\tau} g_{\tau}^T \\
w &amp; = w - \eta diag(G)^{-\frac{1}{2}} \circ g\end{split}\]</div>
</dd></dl>
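<p>In practice only the diagonal of G is kept, i.e. a running per-element sum of squared
gradients. A minimal NumPy sketch of that form (illustrative only; the small epsilon
term is a common numerical safeguard and is not part of the equation above):</p>
<div class="highlight-python"><div class="highlight"><pre>import numpy as np

def adagrad_step(w, grad, sum_sq, eta=0.01, epsilon=1e-6):
    # Accumulate diag(G) and scale the step by its inverse square root.
    sum_sq = sum_sq + grad ** 2
    w = w - eta * grad / (np.sqrt(sum_sq) + epsilon)
    return w, sum_sq
</pre></div></div>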

</div>
<div class="section" id="decayedadagradoptimizer">
<h1>DecayedAdaGradOptimizer<a class="headerlink" href="#decayedadagradoptimizer" title="Permalink to this headline"></a></h1>
<dl class="class">
<dt>
<em class="property">class </em><code class="descclassname">paddle.trainer_config_helpers.optimizers.</code><code class="descname">DecayedAdaGradOptimizer</code><span class="sig-paren">(</span><em>rho=0.95</em>, <em>epsilon=1e-06</em><span class="sig-paren">)</span></dt>
<dd><p>AdaGrad method with a decayed sum of squared gradients. The equations of
this method are as follows.</p>
<div class="math">
\[\begin{split}E(g_t^2) &amp;= \rho * E(g_{t-1}^2) + (1-\rho) * g^2 \\
learning\_rate &amp;= 1/\sqrt{E(g_t^2) + \epsilon}\end{split}\]</div>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
<li><strong>rho</strong> (<em>float</em>) &#8211; The <span class="math">\(\rho\)</span> parameter in the equation.</li>
<li><strong>epsilon</strong> (<em>float</em>) &#8211; The <span class="math">\(\epsilon\)</span> parameter in the equation.</li>
</ul>
</td>
</tr>
</tbody>
</table>
</dd></dl>
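<p>A minimal NumPy sketch of the decayed accumulator above. Applying the resulting
per-element scale to the gradient in the usual AdaGrad style is an assumption here,
since the equation above only defines the learning rate:</p>
<div class="highlight-python"><div class="highlight"><pre>import numpy as np

def decayed_adagrad_step(w, grad, e_g2, eta=0.01, rho=0.95, epsilon=1e-6):
    # Decayed running average of squared gradients.
    e_g2 = rho * e_g2 + (1 - rho) * grad ** 2
    # Assumed AdaGrad-style application of the per-element scale.
    w = w - eta * grad / np.sqrt(e_g2 + epsilon)
    return w, e_g2
</pre></div></div>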

</div>
<div class="section" id="adadeltaoptimizer">
<h1>AdaDeltaOptimizer<a class="headerlink" href="#adadeltaoptimizer" title="Permalink to this headline"></a></h1>
<dl class="class">
<dt>
<em class="property">class </em><code class="descclassname">paddle.trainer_config_helpers.optimizers.</code><code class="descname">AdaDeltaOptimizer</code><span class="sig-paren">(</span><em>rho=0.95</em>, <em>epsilon=1e-06</em><span class="sig-paren">)</span></dt>
<dd><p>AdaDelta method. For details of AdaDelta, please refer to
<a class="reference external" href="http://www.matthewzeiler.com/pubs/googleTR2012/googleTR2012.pdf">ADADELTA: AN ADAPTIVE LEARNING RATE METHOD</a>.</p>
<div class="math">
\[\begin{split}E(g_t^2) &amp;= \rho * E(g_{t-1}^2) + (1-\rho) * g^2 \\
learning\_rate &amp;= \sqrt{\frac{E(dx_{t-1}^2) + \epsilon}{E(g_t^2) + \epsilon}} \\
E(dx_t^2) &amp;= \rho * E(dx_{t-1}^2) + (1-\rho) * (-g*learning\_rate)^2\end{split}\]</div>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
<li><strong>rho</strong> (<em>float</em>) &#8211; <span class="math">\(\rho\)</span> in the equation.</li>
<li><strong>epsilon</strong> (<em>float</em>) &#8211; <span class="math">\(\epsilon\)</span> in the equation.</li>
</ul>
</td>
</tr>
</tbody>
</table>
</dd></dl>
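<p>A minimal NumPy sketch following the three equations above (illustrative only, not
PaddlePaddle&#8217;s internal implementation):</p>
<div class="highlight-python"><div class="highlight"><pre>import numpy as np

def adadelta_step(w, grad, e_g2, e_dx2, rho=0.95, epsilon=1e-6):
    # Running averages of squared gradients and squared updates.
    e_g2 = rho * e_g2 + (1 - rho) * grad ** 2
    rate = np.sqrt((e_dx2 + epsilon) / (e_g2 + epsilon))
    dx = -grad * rate
    e_dx2 = rho * e_dx2 + (1 - rho) * dx ** 2
    return w + dx, e_g2, e_dx2
</pre></div></div>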

</div>
<div class="section" id="rmspropoptimizer">
<h1>RMSPropOptimizer<a class="headerlink" href="#rmspropoptimizer" title="Permalink to this headline"></a></h1>
<dl class="class">
<dt>
<em class="property">class </em><code class="descclassname">paddle.trainer_config_helpers.optimizers.</code><code class="descname">RMSPropOptimizer</code><span class="sig-paren">(</span><em>rho=0.95</em>, <em>epsilon=1e-06</em><span class="sig-paren">)</span></dt>
<dd><p>RMSProp (Root Mean Square Propagation) optimizer. For details, please
refer to this <a class="reference external" href="http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf">slide</a>.</p>
<p>The equations of this method are as follows:</p>
<div class="math">
\[\begin{split}v(w, t) &amp; = \rho v(w, t-1) + (1 - \rho)(\nabla Q_{i}(w))^2 \\
w &amp; = w - \frac{\eta} {\sqrt{v(w,t) + \epsilon}} \nabla Q_{i}(w)\end{split}\]</div>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
<li><strong>rho</strong> (<em>float</em>) &#8211; the <span class="math">\(\rho\)</span> in the equation. The forgetting factor.</li>
<li><strong>epsilon</strong> (<em>float</em>) &#8211; the <span class="math">\(\epsilon\)</span> in the equation.</li>
</ul>
</td>
</tr>
</tbody>
</table>
</dd></dl>
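<p>A minimal NumPy sketch of the two equations above (illustrative only):</p>
<div class="highlight-python"><div class="highlight"><pre>import numpy as np

def rmsprop_step(w, grad, v, eta=0.01, rho=0.95, epsilon=1e-6):
    # Moving average of squared gradients, then a scaled gradient step.
    v = rho * v + (1 - rho) * grad ** 2
    w = w - eta * grad / np.sqrt(v + epsilon)
    return w, v
</pre></div></div>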

</div>
<div class="section" id="settings">
<h1>settings<a class="headerlink" href="#settings" title="Permalink to this headline"></a></h1>
<dl class="function">
<dt>
<code class="descclassname">paddle.trainer_config_helpers.optimizers.</code><code class="descname">settings</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
<dd><p>Set the optimization method, learning rate, batch size, and other training
settings. The currently supported algorithms are SGD and Async-SGD.</p>
<div class="admonition warning">
<p class="first admonition-title">Warning</p>
<p class="last">Note that &#8216;batch_size&#8217; in PaddlePaddle is not the global
training batch size. It is the batch size of a single training process.
If you use N processes to train one model, for example three
GPU machines, the global batch size is N * &#8216;batch_size&#8217;.</p>
</div>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first last simple">
<li><strong>batch_size</strong> (<em>int</em>) &#8211; batch size for one training process.</li>
<li><strong>learning_rate</strong> (<em>float</em>) &#8211; learning rate for SGD</li>
<li><strong>learning_method</strong> (<em>BaseSGDOptimizer</em>) &#8211; The optimization algorithm, an extension of gradient
descent such as momentum, adagrad, rmsprop, etc.
Note that it should be an instance of a subclass of
BaseSGDOptimizer.</li>
<li><strong>regularization</strong> (<em>BaseRegularization</em>) &#8211; The regularization method.</li>
<li><strong>is_async</strong> (<em>bool</em>) &#8211; Whether to use Async-SGD. Default value is False.</li>
<li><strong>model_average</strong> (<em>ModelAverage</em>) &#8211; Model Average settings.</li>
<li><strong>gradient_clipping_threshold</strong> (<em>float</em>) &#8211; gradient clipping threshold. Gradient
values larger than this threshold will be
clipped.</li>
</ul>
</td>
</tr>
</tbody>
</table>
</dd></dl>
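<p>A hedged sketch of how <code class="descname">settings</code> might be called together with one of the
optimizers documented on this page. The wildcard import is the conventional style for
PaddlePaddle trainer config files, and the specific values are illustrative only:</p>
<div class="highlight-python"><div class="highlight"><pre>from paddle.trainer_config_helpers import *

# Illustrative values only; see the parameter list above.
settings(
    batch_size=128,        # batch size of a single training process
    learning_rate=1e-3,
    learning_method=AdamOptimizer(beta1=0.9, beta2=0.999, epsilon=1e-8),
)
</pre></div></div>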

</div>


          </div>
        </div>
      </div>
      <div class="sphinxsidebar" role="navigation" aria-label="main navigation">
        <div class="sphinxsidebarwrapper">
  <h3><a href="../../../index.html">Table Of Contents</a></h3>
  <ul>
<li><a class="reference internal" href="#">BaseSGDOptimizer</a></li>
<li><a class="reference internal" href="#adamoptimizer">AdamOptimizer</a></li>
<li><a class="reference internal" href="#adamaxoptimizer">AdamaxOptimizer</a></li>
<li><a class="reference internal" href="#adagradoptimizer">AdaGradOptimizer</a></li>
<li><a class="reference internal" href="#decayedadagradoptimizer">DecayedAdaGradOptimizer</a></li>
<li><a class="reference internal" href="#adadeltaoptimizer">AdaDeltaOptimizer</a></li>
<li><a class="reference internal" href="#rmspropoptimizer">RMSPropOptimizer</a></li>
<li><a class="reference internal" href="#settings">settings</a></li>
</ul>

  <h4>Previous topic</h4>
  <p class="topless"><a href="optimizers_index.html"
                        title="previous chapter">Optimizers</a></p>
  <h4>Next topic</h4>
  <p class="topless"><a href="data_sources.html"
                        title="next chapter">DataSources</a></p>
  <div role="note" aria-label="source link">
    <h3>This Page</h3>
    <ul class="this-page-menu">
      <li><a href="../../../_sources/ui/api/trainer_config_helpers/optimizers.txt"
            rel="nofollow">Show Source</a></li>
    </ul>
   </div>
<div id="searchbox" style="display: none" role="search">
  <h3>Quick search</h3>
    <form class="search" action="../../../search.html" method="get">
      <input type="text" name="q" />
      <input type="submit" value="Go" />
      <input type="hidden" name="check_keywords" value="yes" />
      <input type="hidden" name="area" value="default" />
    </form>
    <p class="searchtip" style="font-size: 90%">
    Enter search terms or a module, class or function name.
    </p>
</div>
<script type="text/javascript">$('#searchbox').show(0);</script>
        </div>
      </div>
      <div class="clearer"></div>
    </div>
    <div class="related" role="navigation" aria-label="related navigation">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="../../../genindex.html" title="General Index"
             >index</a></li>
        <li class="right" >
          <a href="../../../py-modindex.html" title="Python Module Index"
             >modules</a> |</li>
        <li class="right" >
          <a href="data_sources.html" title="DataSources"
             >next</a> |</li>
        <li class="right" >
          <a href="optimizers_index.html" title="Optimizers"
             >previous</a> |</li>
        <li class="nav-item nav-item-0"><a href="../../../index.html">PaddlePaddle  documentation</a> &raquo;</li>
          <li class="nav-item nav-item-1"><a href="../../index.html" >User Interface</a> &raquo;</li>
          <li class="nav-item nav-item-2"><a href="index.html" >Model Config Interface</a> &raquo;</li>
          <li class="nav-item nav-item-3"><a href="optimizers_index.html" >Optimizers</a> &raquo;</li> 
      </ul>
    </div>
    <div class="footer" role="contentinfo">
        &copy; Copyright 2016, PaddlePaddle developers.
      Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.3.5.
    </div>
  </body>
</html>