Greenplum / Annotated Deep Learning Paper Implementations
Commit 5388e807, authored on Feb 02, 2021 by Varuna Jayasiri

layer norm

Parent: d3790d70
Showing 10 changed files with 315 additions and 306 deletions (+315 −306)
- docs/activations/swish.html (+9 −7)
- docs/normalization/batch_norm/index.html (+64 −38)
- docs/normalization/layer_norm/index.html (+70 −108)
- docs/optimizers/mnist_experiment.html (+85 −84)
- docs/transformers/glu_variants/simple.html (+1 −1)
- labml_nn/activations/swish.py (+3 −1)
- labml_nn/normalization/batch_norm/__init__.py (+20 −2)
- labml_nn/normalization/layer_norm/__init__.py (+60 −63)
- labml_nn/optimizers/mnist_experiment.py (+2 −1)
- labml_nn/transformers/glu_variants/simple.py (+1 −1)
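The headline change is in labml_nn/normalization/layer_norm/__init__.py: LayerNorm drops its BatchNorm-style constructor (channels, momentum, affine, track_running_stats) in favor of arguments matching torch.nn.LayerNorm, and the documentation pages are regenerated to match. A minimal sketch of the new call site, assuming the package layout in this commit (the tensor shape is illustrative):

```python
import torch
from labml_nn.normalization.layer_norm import LayerNorm

x = torch.randn(20, 32, 512)  # e.g. [seq_len, batch_size, features]
# LayerNorm now takes the shape of the elements to normalize
# (like torch.nn.LayerNorm's `normalized_shape`), not a channel count.
ln = LayerNorm(x.shape[-1:], eps=1e-5, elementwise_affine=True)
y = ln(x)
assert y.shape == x.shape  # normalization preserves the input shape
```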
docs/activations/swish.html

Regenerated documentation page for labml_nn/activations/swish.py. Stripped of its HTML highlighting markup, the rendered diff reduces to:

```diff
@@ -75,7 +75,9 @@
 import torch
 from torch import nn
+
+from labml_helpers.module import Module
@@ -86,7 +88,7 @@
-class Swish(nn.Module):
+class Swish(Module):
@@ -97,9 +99,9 @@
     def __init__(self):
         super().__init__()
         self.sigmoid = nn.Sigmoid()
@@ -110,8 +112,8 @@
     def forward(self, x: torch.Tensor) -> torch.Tensor:
         return x * self.sigmoid(x)
```

The last two hunks only shift the rendered line numbers; the code itself is unchanged.
docs/normalization/batch_norm/index.html

Diff collapsed (regenerated documentation page).

docs/normalization/layer_norm/index.html

Diff collapsed (regenerated documentation page).

docs/optimizers/mnist_experiment.html

Diff collapsed (regenerated documentation page).
docs/transformers/glu_variants/simple.html

Regenerated documentation page for labml_nn/transformers/glu_variants/simple.py. Stripped of its HTML highlighting markup, the rendered diff reduces to:

```diff
@@ -118,7 +118,7 @@ We decided to write a simpler implementation to make it easier readers who are n
-    def __init__(self, src_embed: nn.Module, encoder: Encoder, generator: nn.Module):
+    def __init__(self, src_embed: Module, encoder: Encoder, generator: Module):
         super().__init__()
```
labml_nn/activations/swish.py

```diff
 import torch
 from torch import nn
+
+from labml_helpers.module import Module
+

-class Swish(nn.Module):
+class Swish(Module):
     def __init__(self):
         super().__init__()
         self.sigmoid = nn.Sigmoid()
```
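Swish is $x \cdot \sigma(x)$, so the module's behavior is easy to sanity-check. A quick sketch, assuming the package layout above:

```python
import torch
from labml_nn.activations.swish import Swish

act = Swish()
x = torch.tensor([-1.0, 0.0, 1.0])
# Swish(x) = x * sigmoid(x)
assert torch.allclose(act(x), x * torch.sigmoid(x))
```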
labml_nn/normalization/batch_norm/__init__.py

```diff
@@ -98,8 +98,10 @@ a CNN classifier that uses batch normalization for MNIST dataset.
 import torch
 from torch import nn
+
+from labml_helpers.module import Module


-class BatchNorm(nn.Module):
+class BatchNorm(Module):
     r"""
     ## Batch Normalization Layer
@@ -157,7 +159,7 @@ class BatchNorm(nn.Module):
     def forward(self, x: torch.Tensor):
         """
         `x` is a tensor of shape `[batch_size, channels, *]`.
-        `*` could be any (even *) dimensions.
+        `*` could be any number of (even 0) dimensions.
         For example, in an image (2D) convolution this will be
         `[batch_size, channels, height, width]`
         """
@@ -200,3 +202,19 @@ class BatchNorm(nn.Module):
         # Reshape to original and return
         return x_norm.view(x_shape)
+
+
+def _test():
+    from labml.logger import inspect
+
+    x = torch.zeros([2, 3, 2, 4])
+    inspect(x.shape)
+    bn = BatchNorm(3)
+
+    x = bn(x)
+    inspect(x.shape)
+    inspect(bn.exp_var.shape)
+
+
+if __name__ == '__main__':
+    _test()
```
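The added _test gives the class a smoke test via labml's inspect. Its expected behavior, written out as plain assertions (import path per this commit; a sketch, not from the diff):

```python
import torch
from labml_nn.normalization.batch_norm import BatchNorm

bn = BatchNorm(3)              # 3 channels
x = torch.zeros([2, 3, 2, 4])  # [batch_size, channels, height, width]
y = bn(x)
assert y.shape == x.shape        # normalization preserves shape
assert bn.exp_var.shape == (3,)  # one running variance per channel
```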
labml_nn/normalization/layer_norm/__init__.py

```diff
@@ -32,89 +32,86 @@ Layer normalization is generally used for NLP tasks.
 We have used layer normalization in most of the
 [transformer implementations](../../transformers/gpt/index.html).
 """
+from typing import Union, List
+
 import torch
-from torch import nn
+from torch import nn, Size

 from labml_helpers.module import Module


-class LayerNorm(nn.Module):
+class LayerNorm(Module):
     """
     ## Layer Normalization
     """

-    def __init__(self, channels: int, *,
-                 eps: float = 1e-5, momentum: float = 0.1,
-                 affine: bool = True,
-                 track_running_stats: bool = True):
+    def __init__(self, normalized_shape: Union[int, List[int], Size], *,
+                 eps: float = 1e-5,
+                 elementwise_affine: bool = True):
         """
-        * `channels` is the number of features in the input
-        * `eps` is $\epsilon$, used in $\sqrt{Var[x^{(k)}] + \epsilon}$ for numerical stability
-        * `momentum` is the momentum in taking the exponential moving average
-        * `affine` is whether to scale and shift the normalized value
-        * `track_running_stats` is whether to calculate the moving averages of mean and variance
+        * `normalized_shape` $S$ is the shape of the elements (except the batch).
+          The input should then be $X \in \mathbb{R}^{* \times S[0] \times S[1] \times ... \times S[n]}$
+        * `eps` is $\epsilon$, used in $\sqrt{Var[X] + \epsilon}$ for numerical stability
+        * `elementwise_affine` is whether to scale and shift the normalized value

-        We've tried to use the same names for arguments as PyTorch `BatchNorm` implementation.
+        We've tried to use the same names for arguments as PyTorch `LayerNorm` implementation.
         """
         super().__init__()

-        self.channels = channels
-
+        self.normalized_shape = normalized_shape
         self.eps = eps
-        self.momentum = momentum
-        self.affine = affine
-        self.track_running_stats = track_running_stats
-        # Create parameters for $\gamma$ and $\beta$ for scale and shift
-        if self.affine:
-            self.scale = nn.Parameter(torch.ones(channels))
-            self.shift = nn.Parameter(torch.zeros(channels))
-        # Create buffers to store exponential moving averages of
-        # mean $\mathbb{E}[x^{(k)}]$ and variance $Var[x^{(k)}]$
-        if self.track_running_stats:
-            self.register_buffer('exp_mean', torch.zeros(channels))
-            self.register_buffer('exp_var', torch.ones(channels))
+        self.elementwise_affine = elementwise_affine
+        # Create parameters for $\gamma$ and $\beta$ for gain and bias
+        if self.elementwise_affine:
+            self.gain = nn.Parameter(torch.ones(normalized_shape))
+            self.bias = nn.Parameter(torch.zeros(normalized_shape))

     def forward(self, x: torch.Tensor):
         """
-        `x` is a tensor of shape `[batch_size, channels, *]`.
-        `*` could be any number of (even 0) dimensions.
-        For example, in an image (2D) convolution this will be
-        `[batch_size, channels, height, width]`
+        `x` is a tensor of shape `[*, S[0], S[1], ..., S[n]]`.
+        `*` could be any number of dimensions.
+        For example, in an NLP task this will be
+        `[seq_len, batch_size, features]`
         """
         # Keep the original shape
         x_shape = x.shape
-        # Get the batch size
-        batch_size = x_shape[0]
-        # Sanity check to make sure the number of features is same
-        assert self.channels == x.shape[1]
-
-        # Reshape into `[batch_size, channels, n]`
-        x = x.view(batch_size, self.channels, -1)
-
-        # We will calculate the mini-batch mean and variance
-        # if we are in training mode or if we have not tracked exponential moving averages
-        if self.training or not self.track_running_stats:
-            # Calculate the mean across first and last dimension;
-            # i.e. the means for each feature $\mathbb{E}[x^{(k)}]$
-            mean = x.mean(dim=[0, 2])
-            # Calculate the squared mean across first and last dimension;
-            # i.e. the means for each feature $\mathbb{E}[(x^{(k)})^2]$
-            mean_x2 = (x ** 2).mean(dim=[0, 2])
-            # Variance for each feature $Var[x^{(k)}] = \mathbb{E}[(x^{(k)})^2] - \mathbb{E}[x^{(k)}]^2$
-            var = mean_x2 - mean ** 2
-
-            # Update exponential moving averages
-            if self.training and self.track_running_stats:
-                self.exp_mean = (1 - self.momentum) * self.exp_mean + self.momentum * mean
-                self.exp_var = (1 - self.momentum) * self.exp_var + self.momentum * var
-        # Use exponential moving averages as estimates
-        else:
-            mean = self.exp_mean
-            var = self.exp_var
-
-        # Normalize $$\hat{x}^{(k)} = \frac{x^{(k)} - \mathbb{E}[x^{(k)}]}{\sqrt{Var[x^{(k)}] + \epsilon}}$$
-        x_norm = (x - mean.view(1, -1, 1)) / torch.sqrt(var + self.eps).view(1, -1, 1)
-        # Scale and shift $$y^{(k)} = \gamma^{(k)} \hat{x}^{(k)} + \beta^{(k)}$$
-        if self.affine:
-            x_norm = self.scale.view(1, -1, 1) * x_norm + self.shift.view(1, -1, 1)
+        # Sanity check to make sure the shapes match
+        assert self.normalized_shape == x.shape[-len(self.normalized_shape):]
+
+        # Reshape into `[M, S[0], S[1], ..., S[n]]`
+        x = x.view(-1, *self.normalized_shape)
+
+        # Calculate the mean across the first dimension;
+        # i.e. the means for each element $\mathbb{E}[X]$
+        mean = x.mean(dim=0)
+        # Calculate the squared mean across the first dimension;
+        # i.e. the means for each element $\mathbb{E}[X^2]$
+        mean_x2 = (x ** 2).mean(dim=0)
+        # Variance for each element $Var[X] = \mathbb{E}[X^2] - \mathbb{E}[X]^2$
+        var = mean_x2 - mean ** 2
+
+        # Normalize $$\hat{X} = \frac{X - \mathbb{E}[X]}{\sqrt{Var[X] + \epsilon}}$$
+        x_norm = (x - mean) / torch.sqrt(var + self.eps)
+        # Scale and shift $$\text{LN}(x) = \gamma \hat{X} + \beta$$
+        if self.elementwise_affine:
+            x_norm = self.gain * x_norm + self.bias

         # Reshape to original and return
         return x_norm.view(x_shape)
+
+
+def _test():
+    from labml.logger import inspect
+
+    x = torch.zeros([2, 3, 2, 4])
+    inspect(x.shape)
+    ln = LayerNorm(x.shape[2:])
+
+    x = ln(x)
+    inspect(x.shape)
+    inspect(ln.gain.shape)
+
+
+if __name__ == '__main__':
+    _test()
```
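Note that after the reshape the statistics here are computed across `dim=0`, i.e. across the flattened leading dimensions. For reference, `torch.nn.LayerNorm` computes its statistics per sample, over the trailing `normalized_shape` dimensions; a minimal sketch of that convention (not from this commit):

```python
import torch
import torch.nn.functional as F

x = torch.randn(20, 32, 512)           # [seq_len, batch_size, features]
dims = [-1]                            # normalize over the feature dimension
mean = x.mean(dim=dims, keepdim=True)  # [20, 32, 1]: one mean per position
var = x.var(dim=dims, unbiased=False, keepdim=True)
x_norm = (x - mean) / torch.sqrt(var + 1e-5)
# Matches the built-in implementation (no affine parameters):
assert torch.allclose(x_norm, F.layer_norm(x, (512,), eps=1e-5), atol=1e-4)
```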
labml_nn/optimizers/mnist_experiment.py

```diff
@@ -8,6 +8,7 @@ summary: This is a simple MNIST example with a CNN model to test the optimizers.
 import torch.nn as nn
 import torch.utils.data
+from labml_helpers.module import Module

 from labml import experiment, tracker
 from labml.configs import option
@@ -19,7 +20,7 @@ from labml_helpers.train_valid import TrainValidConfigs, BatchIndex, hook_model_
 from labml_nn.optimizers.configs import OptimizerConfigs


-class Model(nn.Module):
+class Model(Module):
     """
     ## The model
     """
```
labml_nn/transformers/glu_variants/simple.py

```diff
@@ -40,7 +40,7 @@ class AutoregressiveModel(Module):
     ## Auto regressive model
     """

-    def __init__(self, src_embed: nn.Module, encoder: Encoder, generator: nn.Module):
+    def __init__(self, src_embed: Module, encoder: Encoder, generator: Module):
         super().__init__()
         # Token embedding module
         self.src_embed = src_embed
```
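Taken together with the LayerNorm rewrite, the recurring edit across these files is swapping torch.nn.Module for labml_helpers' Module as the base class and in type hints. A sketch of the pattern (Module is assumed, from its drop-in use in this commit, to behave like a thin nn.Module wrapper):

```python
import torch.nn as nn
from labml_helpers.module import Module  # drop-in replacement for nn.Module

class Model(Module):  # previously: class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(4, 4)
```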