Unverified commit 35420cdf authored by kavyasrinet, committed by GitHub

Updating the Latex equation for Adagrad (#6009)

* Updating the Latex equation for Adagrad

* Fixing Latex equations for adadelta, adam and adamax
Parent 4ff6bc17
@@ -92,12 +92,12 @@ for gradient descent.
 
 Adadelta updates are as follows:
 
-$$avgSquaredGradOut = \rho * avgSquaredGrad + (1 - \rho) * grad * grad \break
-paramUpdate = - $\sqrt{((avgSquaredUpdate + \epsilon) /
-    (avgSquaredGrad_out + \epsilon))}$ * grad \break
-avgSquaredUpdateOut = \rho * avgSquaredUpdate + (1 - \rho) *
-    {(paramUpdate)}^2 \break
-paramOut = param + paramUpdate$$
+$$
+avg\_squared\_grad\_out = \rho * avg\_squared\_grad + (1 - \rho) * grad * grad \\
+param\_update = - \sqrt{\frac{avg\_squared\_update + \epsilon}{avg\_squared\_grad\_out + \epsilon}} * grad \\
+avg\_squared\_update\_out = \rho * avg\_squared\_update + (1 - \rho) * {param\_update}^2 \\
+param\_out = param + param\_update
+$$
 
 )DOC");
 }
......
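As a quick check on the new Adadelta equations, here is a minimal NumPy sketch of one update step. It only illustrates the formulas above; the function name and the default values for rho and epsilon are made up for the example and are not part of this commit or of the operator itself.

import numpy as np

def adadelta_step(param, grad, avg_squared_grad, avg_squared_update,
                  rho=0.95, epsilon=1e-6):
    # avg_squared_grad_out = rho * avg_squared_grad + (1 - rho) * grad^2
    avg_squared_grad_out = rho * avg_squared_grad + (1 - rho) * grad * grad
    # param_update = -sqrt((avg_squared_update + eps) / (avg_squared_grad_out + eps)) * grad
    param_update = -np.sqrt((avg_squared_update + epsilon) /
                            (avg_squared_grad_out + epsilon)) * grad
    # avg_squared_update_out = rho * avg_squared_update + (1 - rho) * param_update^2
    avg_squared_update_out = (rho * avg_squared_update +
                              (1 - rho) * param_update ** 2)
    # param_out = param + param_update
    return param + param_update, avg_squared_grad_out, avg_squared_update_out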
@@ -80,8 +80,8 @@ Adaptive Gradient Algorithm (Adagrad).
 
 The update is done as follows:
 
-$$momentOut = moment + grad * grad \break
-paramOut = param - learningRate * grad / ($\sqrt{momentOut}$ + \epsilon) \break
-$$
+$$moment\_out = moment + grad * grad \\
+param\_out = param - \frac{learning\_rate * grad}{\sqrt{moment\_out} + \epsilon}
+$$
 
 The original paper(http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf)
......
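Similarly, a minimal NumPy sketch of the Adagrad update described above; the function name and default values are illustrative only, not taken from the operator.

import numpy as np

def adagrad_step(param, grad, moment, learning_rate=0.01, epsilon=1e-6):
    # moment_out = moment + grad^2
    moment_out = moment + grad * grad
    # param_out = param - learning_rate * grad / (sqrt(moment_out) + epsilon)
    param_out = param - learning_rate * grad / (np.sqrt(moment_out) + epsilon)
    return param_out, moment_out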
@@ -112,11 +112,13 @@ adaptive estimates of lower-order moments.
 
 Adam updates:
 
-$$moment_1_{out} = \beta_1 * moment_1 + (1 - \beta_1) * grad \break
-moment_2_{out} = \beta_2 * moment_2 + (1 - \beta_2) * grad * grad \break
-learningRate = learningRate *
-$\sqrt{(1 - \beta_2_{pow})}$ / (1 - \beta_1_{pow}) \break
-paramOut = param - learningRate * moment_1/ ($\sqrt{(moment_2)} + \epsilon)$$
+$$
+moment\_1\_out = \beta_1 * moment\_1 + (1 - \beta_1) * grad \\
+moment\_2\_out = \beta_2 * moment\_2 + (1 - \beta_2) * grad * grad \\
+learning\_rate = learning\_rate *
+                 \frac{\sqrt{1 - \beta_{2\_pow}}}{1 - \beta_{1\_pow}} \\
+param\_out = param - learning\_rate * \frac{moment\_1}{\sqrt{moment\_2} + \epsilon}
+$$
 
 )DOC");
 }
......
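A NumPy sketch of the Adam update above, reading moment_1 and moment_2 in the last equation as the freshly updated moments, as in standard Adam. Here beta_1_pow and beta_2_pow stand for beta_1^t and beta_2^t at step t, and all names and defaults are illustrative.

import numpy as np

def adam_step(param, grad, moment_1, moment_2, beta_1_pow, beta_2_pow,
              learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-8):
    # moment_1_out = beta_1 * moment_1 + (1 - beta_1) * grad
    moment_1_out = beta_1 * moment_1 + (1 - beta_1) * grad
    # moment_2_out = beta_2 * moment_2 + (1 - beta_2) * grad^2
    moment_2_out = beta_2 * moment_2 + (1 - beta_2) * grad * grad
    # bias-corrected step size: lr * sqrt(1 - beta_2^t) / (1 - beta_1^t)
    lr = learning_rate * np.sqrt(1 - beta_2_pow) / (1 - beta_1_pow)
    # param_out = param - lr * moment_1_out / (sqrt(moment_2_out) + epsilon)
    param_out = param - lr * moment_1_out / (np.sqrt(moment_2_out) + epsilon)
    return param_out, moment_1_out, moment_2_out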
@@ -108,10 +108,10 @@ Adam algorithm based on the infinity norm.
 
 Adamax updates:
 
 $$
-momentOut = \beta_{1} * moment + (1 - \beta_{1}) * grad \\
-infNormOut = max(\beta_{2} * infNorm + \epsilon, |grad|) \\
-learningRate = \frac{learningRate}{1 - \beta_{1}^{Beta1Pow}} \\
-paramOut = param - learningRate * \frac{momentOut}{infNormOut}
+moment\_out = \beta_1 * moment + (1 - \beta_1) * grad \\
+inf\_norm\_out = max(\beta_2 * inf\_norm + \epsilon, |grad|) \\
+learning\_rate = \frac{learning\_rate}{1 - \beta_{1\_pow}} \\
+param\_out = param - learning\_rate * \frac{moment\_out}{inf\_norm\_out}
 $$
 
 The original paper does not have an epsilon attribute.
......
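Finally, a NumPy sketch of the Adamax update above; beta_1_pow stands for beta_1^t at step t, and names and defaults are again illustrative rather than taken from the operator.

import numpy as np

def adamax_step(param, grad, moment, inf_norm, beta_1_pow,
                learning_rate=0.002, beta_1=0.9, beta_2=0.999, epsilon=1e-8):
    # moment_out = beta_1 * moment + (1 - beta_1) * grad
    moment_out = beta_1 * moment + (1 - beta_1) * grad
    # inf_norm_out = max(beta_2 * inf_norm + epsilon, |grad|)
    inf_norm_out = np.maximum(beta_2 * inf_norm + epsilon, np.abs(grad))
    # bias-corrected step size: learning_rate / (1 - beta_1^t)
    lr = learning_rate / (1 - beta_1_pow)
    # param_out = param - lr * moment_out / inf_norm_out
    return param - lr * moment_out / inf_norm_out, moment_out, inf_norm_out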