Commit 66100995
Authored Feb 19, 2021 by Varuna Jayasiri
Parent: a1b15502

📚 compressive transformer experiment

Showing 2 changed files with 501 additions and 271 deletions (+501 −271):

docs/transformers/compressive/experiment.html    +447 −241
labml_nn/transformers/compressive/experiment.py  +54 −30
docs/transformers/compressive/experiment.html — view file @ 66100995
This diff is collapsed; click to expand.
labml_nn/transformers/compressive/experiment.py — view file @ 66100995
@@ -47,13 +47,14 @@ class AutoregressiveModel(Module):
         self.mask_mem = None
 
     def forward(self, x: torch.Tensor, mem: CompressedMemory):
-        # Length of the memory
+        # Get memory and compressed memory
+        if mem is not None:
+            mem, c_mem = mem.mem, mem.c_mem
+        else:
+            mem = []
+            c_mem = []
+
+        # Length of the memory (for masks)
         m_len = len(mem[0]) if mem else 0
+        if c_mem:
+            m_len += len(c_mem[0])
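The memory-length bookkeeping in `forward` can be checked in isolation. A minimal standalone sketch (`memory_length` is a hypothetical helper; `mem` and `c_mem` stand in for the per-layer lists of memory tensors used above):

```python
import torch

def memory_length(mem, c_mem) -> int:
    # Total length of uncompressed plus compressed memory,
    # i.e. how many past positions the attention mask must cover.
    m_len = len(mem[0]) if mem else 0
    if c_mem:
        m_len += len(c_mem[0])
    return m_len

# Two layers, each with 8 uncompressed and 3 compressed memory vectors
mem = [torch.zeros(8, 16) for _ in range(2)]
c_mem = [torch.zeros(3, 16) for _ in range(2)]
print(memory_length(mem, c_mem))  # 11
print(memory_length([], []))      # 0
```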
@@ -69,7 +70,7 @@ class AutoregressiveModel(Module):
         # Concatenate the masks if there is memory
         if m_len:
             mask = torch.cat((self.mask_mem[:len(x), :m_len], self.mask_x[:len(x), :len(x)]), dim=1)
-        # Use the subsequent mask otherwise
+        # Use only the subsequent mask otherwise
         else:
             mask = self.mask_x[:len(x), :len(x)]
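The mask concatenation above can be sketched with concrete shapes. This is a hypothetical standalone sketch, assuming boolean masks; the repo's masks carry an extra dimension, which is dropped here for clarity:

```python
import torch

seq_len, m_len = 4, 3
# Subsequent (causal) mask over the new tokens: position i may attend to j <= i
mask_x = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
# All memories lie strictly in the past, so every new token may attend to all of them
mask_mem = torch.ones(seq_len, m_len, dtype=torch.bool)
# Concatenate along the key dimension: [seq_len, m_len + seq_len]
mask = torch.cat((mask_mem, mask_x), dim=1)
print(mask.shape)  # torch.Size([4, 7])
```

The first token can therefore attend to all three memories plus itself, and each later token to one more position.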
@@ -87,7 +88,7 @@ class Configs(NLPAutoRegressionConfigs):
     """
     ## Configurations
 
-    The default configs can and will be over-ridden when we start the experiment
+    The default configs can and will be over-ridden when we start the experiment.
     """
 
     model: AutoregressiveModel
@@ -108,8 +109,8 @@ class Configs(NLPAutoRegressionConfigs):
     memory = SimpleStateModule()
     # Attention Reconstruction Loss
     attention_reconstruction_loss: AttentionReconstructionLoss
-    # Compression ratio
-    compression_ratio: int = 4
+    # Compression rate
+    compression_rate: int = 4
     # Compressed memory length
     c_mem_len: int = 128
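The `compression_rate` config feeds `Conv1dCompression` (used in `autoregressive_model` below), which squeezes every `compression_rate` memory vectors into one. A minimal sketch of that idea with a strided 1-d convolution (hypothetical `Conv1dCompressionSketch`, not the library's exact module):

```python
import torch
import torch.nn as nn

class Conv1dCompressionSketch(nn.Module):
    """Compress every `rate` memory vectors into one with a strided 1-d convolution."""
    def __init__(self, rate: int, d_model: int):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=rate, stride=rate)

    def forward(self, mem: torch.Tensor) -> torch.Tensor:
        # mem: [seq_len, batch, d_model]; convolve over the sequence dimension
        x = mem.permute(1, 2, 0)       # [batch, d_model, seq_len]
        x = self.conv(x)               # [batch, d_model, seq_len // rate]
        return x.permute(2, 0, 1)      # [seq_len // rate, batch, d_model]

compress = Conv1dCompressionSketch(rate=4, d_model=16)
mem = torch.randn(8, 2, 16)            # 8 memory vectors, batch of 2
print(compress(mem).shape)             # torch.Size([2, 2, 16])
```

With kernel size and stride both equal to the rate, the convolution windows tile the sequence without overlap, so 8 memories compress to 2.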
@@ -117,6 +118,7 @@ class Configs(NLPAutoRegressionConfigs):
         # Set tracker configurations
         tracker.set_scalar("accuracy.*", True)
         tracker.set_scalar("loss.*", True)
+        # Do not print the attention reconstruction loss in the terminal
+        tracker.set_scalar("ar_loss.*", False)
         # Add a hook to log module outputs
         hook_model_outputs(self.mode, self.model, 'model')
@@ -124,55 +126,73 @@ class Configs(NLPAutoRegressionConfigs):
         self.state_modules = [self.accuracy, self.memory]
 
     @torch.no_grad()
-    def merge_memory(self, mem: CompressedMemory, new_mem: List[torch.Tensor]) \
+    def merge_compress_memory(self, mem: CompressedMemory, new_mem: List[torch.Tensor]) \
             -> Tuple[CompressedMemory, List[torch.Tensor]]:
         """
-        Concatenate memories and remove old memories to keep a maximum of
-        `mem_len` memories.
+        Concatenate new memories and compress the oldest memories.
         """
 
         # If it's configured not to use memory
-        if self.mem_len == 0:
+        if self.mem_len == 0 and self.c_mem_len == 0:
             return CompressedMemory([], []), []
 
         # Get memory and compressed memory
         if mem is not None:
             mem, c_mem = mem.mem, mem.c_mem
         else:
             mem, c_mem = [], []
 
-        # Concatenate with old memory
+        # Concatenate new memories with old memory
         if mem:
             mem = [torch.cat((m, x), dim=0) for m, x in zip(mem, new_mem)]
         else:
             mem = new_mem
 
         # Compress the oldest memories if there are more memories than `mem_len`
         if len(mem[0]) > self.mem_len:
-            n_c_mem = (len(mem[0]) - self.mem_len + self.compression_ratio - 1) // self.compression_ratio
-            old_mem = []
-            trunc_mem = []
+            # Calculate the number of compressed memories to make $n_{cm} = \bigg\lceil\frac{n'_m - N_m}{c}\bigg\rceil$,
+            # where $n'_m$ is the number of memories we have
+            # and $N_m$ is the maximum number of memories we maintain (`mem_len`).
+            n_c_mem = (len(mem[0]) - self.mem_len + self.compression_rate - 1) // self.compression_rate
+            # Number of memories to compress $c n_{cm}$
+            n_old = n_c_mem * self.compression_rate
+            # A list to keep memories that need to be compressed for each layer.
+            mem_to_compress = []
+            # A list to keep the memories that do not get compressed for each layer.
+            uncompressed_mem = []
+            # Iterate through memories of each layer.
             for m in mem:
-                n_old = n_c_mem * self.compression_ratio
+                # Split the memories at $c n_{cm}$
                 cm, m = torch.split(m, [n_old, len(m) - n_old])
-                old_mem.append(cm)
-                trunc_mem.append(m)
-            mem = trunc_mem
+                # Collect memories to compress
+                mem_to_compress.append(cm)
+                # Collect remaining memories
+                uncompressed_mem.append(m)
+            # Update the memories
+            mem = uncompressed_mem
 
             # Compress the memories
             new_c_mem = []
             for i, layer in enumerate(self.model.transformer.layers):
-                new_c_mem.append(layer.compress(old_mem[i]))
+                new_c_mem.append(layer.compress(mem_to_compress[i]))
 
             # Concatenate newly compressed memories with old compressed memories
             if c_mem:
                 c_mem = [torch.cat((m, nm), dim=0) for m, nm in zip(c_mem, new_c_mem)]
             # If there are no old compressed memories
             else:
                 c_mem = new_c_mem
 
             # Truncate old memories
             if len(c_mem[0]) > self.c_mem_len:
                 c_mem = [m[-self.c_mem_len:] for m in c_mem]
         # No memories are compressed if the number of memories is less than `mem_len`
         else:
-            old_mem = []
+            mem_to_compress = []
 
-        #
-        return CompressedMemory(mem, c_mem), old_mem
+        # Return memories and the memories that were compressed.
+        # Memories that were compressed is needed for the reconstruction loss computation.
+        return CompressedMemory(mem, c_mem), mem_to_compress
 
     def step(self, batch: any, batch_idx: BatchIndex):
         """
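The counting in `merge_compress_memory` can be checked with small numbers. A standalone sketch of the same arithmetic (`split_counts` is a hypothetical helper; the names mirror the code above):

```python
def split_counts(n_mem: int, mem_len: int, compression_rate: int):
    # Number of compressed memories to create:
    #   n_c_mem = ceil((n_mem - mem_len) / compression_rate)
    n_c_mem = (n_mem - mem_len + compression_rate - 1) // compression_rate
    # Number of oldest memories consumed by the compression
    n_old = n_c_mem * compression_rate
    return n_c_mem, n_old

# 140 memories, mem_len=128, rate=4: 12 excess memories -> ceil(12/4) = 3
# compressed memories, consuming the oldest 3 * 4 = 12 memories.
print(split_counts(140, 128, 4))  # (3, 12)
# 130 memories: ceil(2/4) = 1 compressed memory from the oldest 4,
# leaving 126 uncompressed (the count may dip slightly below mem_len).
print(split_counts(130, 128, 4))  # (1, 4)
```

Rounding up means slightly more than the excess may be compressed, which keeps the compressed chunks aligned to whole multiples of the rate.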
@@ -192,8 +212,8 @@ class Configs(NLPAutoRegressionConfigs):
         mem = self.memory.get()
         # Run the model
         output, new_mem = self.model(data, mem)
-        # Merge memory
-        mem, old_mem = self.merge_memory(mem, new_mem)
+        # Merge and compress memory
+        mem, mem_to_compress = self.merge_compress_memory(mem, new_mem)
         # Update memories
         self.memory.set(mem)
@@ -201,9 +221,13 @@ class Configs(NLPAutoRegressionConfigs):
             loss = self.loss_func(output, target)
             tracker.add("loss.", loss)
 
-            if old_mem:
-                ar_loss = self.attention_reconstruction_loss(new_mem, old_mem)
+            # Calculate attention reconstruction loss if memories were compressed in this step
+            if mem_to_compress:
+                # Get attention reconstruction loss
+                ar_loss = self.attention_reconstruction_loss(new_mem, mem_to_compress)
+                # Track attention reconstruction loss
                 tracker.add("ar_loss.", ar_loss)
+                # Add attention reconstruction loss to loss
                 loss = loss + ar_loss
 
             # Calculate and log accuracy
@@ -254,8 +278,8 @@ class Configs(NLPAutoRegressionConfigs):
             prompt = prompt[-1:]
             # Add the prediction for logging
             log += [(self.prompt_separator + self.text.itos[output[-1]], Text.value)]
-            # Update memory
-            mem, _ = self.merge_memory(mem, new_mem)
+            # Update and compress memory
+            mem, _ = self.merge_compress_memory(mem, new_mem)
 
         # Print the sampled output
         logger.log(log)
@@ -273,14 +297,14 @@ def autoregressive_model(c: Configs):
                          self_attn=RelativeMultiHeadAttention(c.heads, c.d_model, c.dropout),
                          feed_forward=FeedForward(c.d_model, c.d_ff, c.dropout),
                          dropout_prob=c.dropout,
-                         compress=Conv1dCompression(c.compression_ratio, c.d_model)), c.n_layers))
+                         compress=Conv1dCompression(c.compression_rate, c.d_model)), c.n_layers))
     return m.to(c.device)
 
 
 @option(Configs.attention_reconstruction_loss)
 def attention_reconstruction_loss(c: Configs):
     """
-    ### Initialize the auto-regressive model
+    ### Initialize the attention reconstruction loss
     """
     return AttentionReconstructionLoss(c.model.transformer.layers)