Greenplum / Annotated Deep Learning Paper Implementations
Commit 58cda113
Authored on Sep 21, 2021 by Varuna Jayasiri

primer links

Parent: c7fb3f7f
Showing 7 changed files with 200 additions and 4 deletions (+200, -4)
docs/index.html (+1, -0)
docs/transformers/index.html (+7, -4)
docs/transformers/primer_ez/readme.html (+158, -0)
labml_nn/__init__.py (+1, -0)
labml_nn/transformers/__init__.py (+5, -0)
labml_nn/transformers/primer_ez/readme.md (+27, -0)
readme.md (+1, -0)
docs/index.html

@@ -96,6 +96,7 @@ implementations.</p>
 <li><a href="transformers/mlp_mixer/index.html">MLP-Mixer: An all-MLP Architecture for Vision</a></li>
 <li><a href="transformers/gmlp/index.html">Pay Attention to MLPs (gMLP)</a></li>
 <li><a href="transformers/vit/index.html">Vision Transformer (ViT)</a></li>
+<li><a href="transformers/primer_ez/index.html">Primer EZ</a></li>
 </ul>
 <h4>✨ <a href="recurrent_highway_networks/index.html">Recurrent Highway Networks</a></h4>
 <h4>✨ <a href="lstm/index.html">LSTM</a></h4>
docs/transformers/index.html

@@ -121,12 +121,15 @@ It does single GPU training but we implement the concept of switching as describ
 <h2><a href="vit/index.html">Vision Transformer (ViT)</a></h2>
 <p>This is an implementation of the paper
 <a href="https://papers.labml.ai/paper/2010.11929">An Image Is Worth 16x16 Words: Transformers For Image Recognition At Scale</a>.</p>
+<h2><a href="primer_ez/index.html">Primer EZ</a></h2>
+<p>This is an implementation of the paper
+<a href="https://papers.labml.ai/paper/2109.08668">Primer: Searching for Efficient Transformers for Language Modeling</a>.</p>

The highlighted import block in the adjacent code column is unchanged; the additions above only shift it from lines 93–96 to lines 98–101:

from .configs import TransformerConfigs
from .models import TransformerLayer, Encoder, Decoder, Generator, EncoderDecoder
from .mha import MultiHeadAttention
from labml_nn.transformers.xl.relative_mha import RelativeMultiHeadAttention
docs/transformers/primer_ez/readme.html (new file, mode 100644)

<!DOCTYPE html>
<html>
<head>
    <meta http-equiv="content-type" content="text/html;charset=utf-8"/>
    <meta name="viewport" content="width=device-width, initial-scale=1.0"/>
    <meta name="description" content=""/>
    <meta name="twitter:card" content="summary"/>
    <meta name="twitter:image:src" content="https://avatars1.githubusercontent.com/u/64068543?s=400&v=4"/>
    <meta name="twitter:title" content="Primer: Searching for Efficient Transformers for Language Modeling"/>
    <meta name="twitter:description" content=""/>
    <meta name="twitter:site" content="@labmlai"/>
    <meta name="twitter:creator" content="@labmlai"/>
    <meta property="og:url" content="https://nn.labml.ai/transformers/primer_ez/readme.html"/>
    <meta property="og:title" content="Primer: Searching for Efficient Transformers for Language Modeling"/>
    <meta property="og:image" content="https://avatars1.githubusercontent.com/u/64068543?s=400&v=4"/>
    <meta property="og:site_name" content="LabML Neural Networks"/>
    <meta property="og:type" content="object"/>
    <meta property="og:title" content="Primer: Searching for Efficient Transformers for Language Modeling"/>
    <meta property="og:description" content=""/>
    <title>Primer: Searching for Efficient Transformers for Language Modeling</title>
    <link rel="shortcut icon" href="/icon.png"/>
    <link rel="stylesheet" href="../../pylit.css">
    <link rel="canonical" href="https://nn.labml.ai/transformers/primer_ez/readme.html"/>
    <!-- Global site tag (gtag.js) - Google Analytics -->
    <script async src="https://www.googletagmanager.com/gtag/js?id=G-4V3HC8HBLH"></script>
    <script>
        window.dataLayer = window.dataLayer || [];
        function gtag() { dataLayer.push(arguments); }
        gtag('js', new Date());
        gtag('config', 'G-4V3HC8HBLH');
    </script>
</head>
<body>
<div id='container'>
    <div id="background"></div>
    <div class='section'>
        <div class='docs'>
            <p>
                <a class="parent" href="/">home</a>
                <a class="parent" href="../index.html">transformers</a>
                <a class="parent" href="index.html">primer_ez</a>
            </p>
            <p>
                <a href="https://github.com/labmlai/annotated_deep_learning_paper_implementations/tree/master/labml_nn/transformers/primer_ez/readme.md">
                    <img alt="Github" src="https://img.shields.io/github/stars/labmlai/annotated_deep_learning_paper_implementations?style=social" style="max-width:100%;"/></a>
                <a href="https://twitter.com/labmlai" rel="nofollow">
                    <img alt="Twitter" src="https://img.shields.io/twitter/follow/labmlai?style=social" style="max-width:100%;"/></a>
            </p>
        </div>
    </div>
    <div class='section' id='section-0'>
        <div class='docs'>
            <div class='section-link'>
                <a href='#section-0'>#</a>
            </div>
            <h1><a href="https://nn.labml.ai/transformers/primer_ez/index.html">Primer: Searching for Efficient Transformers for Language Modeling</a></h1>
            <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation of the paper
                <a href="https://papers.labml.ai/paper/2109.08668">Primer: Searching for Efficient Transformers for Language Modeling</a>.</p>
            <p>The authors do an evolutionary search for transformer architectures.
                They name the architecture found by the search Primer (PRIMitives searched transformER).
                <strong>Primer EZ</strong> is the architecture with the two most robust modifications in Primer
                compared to the original transformer.
                Primer EZ trains a lot faster than the vanilla transformer.</p>
            <h3>Squared ReLU</h3>
            <p>The most effective modification found by the search is using a squared ReLU instead of ReLU in
                the <a href="https://nn.labml.ai/transformers/feed_forward.html">position-wise feedforward module</a>.</p>
            <h3>Multi-DConv-Head Attention (MDHA)</h3>
            <p>The next most effective modification is a depth-wise 3×1 convolution after the multi-head projection
                for queries, keys, and values.
                The convolution is along the sequence dimension and per channel (depth-wise).
                To be clear, if the number of channels in each head is d_k, the convolution will have 1×3
                kernels for each of the d_k channels.</p>
            <p><a href="https://nn.labml.ai/transformers/primer_ez/experiment.html">Here is the experiment code</a> for Primer EZ.</p>
            <p><a href="https://app.labml.ai/run/30adb7aa1ab211eca7310f80a114e8a4"><img alt="View Run" src="https://img.shields.io/badge/labml-experiment-brightgreen"/></a></p>
        </div>
        <div class='code'></div>
    </div>
    <div class='footer'>
        <a href="https://papers.labml.ai">Trending Research Papers</a>
        <a href="https://labml.ai">labml.ai</a>
    </div>
</div>
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.4/MathJax.js?config=TeX-AMS_HTML"></script>
<!-- MathJax configuration -->
<script type="text/x-mathjax-config">
    MathJax.Hub.Config({
        tex2jax: {
            inlineMath: [['$', '$']],
            displayMath: [['$$', '$$']],
            processEscapes: true,
            processEnvironments: true
        },
        // Center justify equations in code and markdown cells. Elsewhere
        // we use CSS to left justify single line equations in code cells.
        displayAlign: 'center',
        "HTML-CSS": {fonts: ["TeX"]}
    });
</script>
<script>
    function handleImages() {
        var images = document.querySelectorAll('p>img')
        console.log(images);
        for (var i = 0; i < images.length; ++i) {
            handleImage(images[i])
        }
    }

    function handleImage(img) {
        img.parentElement.style.textAlign = 'center'

        var modal = document.createElement('div')
        modal.id = 'modal'

        var modalContent = document.createElement('div')
        modal.appendChild(modalContent)

        var modalImage = document.createElement('img')
        modalContent.appendChild(modalImage)

        var span = document.createElement('span')
        span.classList.add('close')
        span.textContent = 'x'
        modal.appendChild(span)

        img.onclick = function () {
            console.log('clicked')
            document.body.appendChild(modal)
            modalImage.src = img.src
        }

        span.onclick = function () {
            document.body.removeChild(modal)
        }
    }

    handleImages()
</script>
</body>
</html>
\ No newline at end of file
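The squared-ReLU modification described in the new readme is simple enough to sketch directly. The snippet below is an illustrative PyTorch sketch, not the repository's annotated implementation; the minimal `FeedForward` wrapper is my own stand-in for the linked position-wise feedforward module.

```python
import torch
import torch.nn as nn


class SquaredReLU(nn.Module):
    """Squared ReLU: relu(x) ** 2, the Primer replacement for ReLU
    in the position-wise feedforward module."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x) ** 2


class FeedForward(nn.Module):
    """Minimal position-wise FFN using the squared activation."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.layer1 = nn.Linear(d_model, d_ff)
        self.layer2 = nn.Linear(d_ff, d_model)
        self.activation = SquaredReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layer2(self.activation(self.layer1(x)))
```

Negative inputs still map to zero; positive inputs are squared, which sharpens the activation compared to plain ReLU.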
labml_nn/__init__.py

@@ -32,6 +32,7 @@ implementations.
 * [MLP-Mixer: An all-MLP Architecture for Vision](transformers/mlp_mixer/index.html)
 * [Pay Attention to MLPs (gMLP)](transformers/gmlp/index.html)
 * [Vision Transformer (ViT)](transformers/vit/index.html)
+* [Primer EZ](transformers/primer_ez/index.html)
 #### ✨ [Recurrent Highway Networks](recurrent_highway_networks/index.html)
labml_nn/transformers/__init__.py

@@ -88,6 +88,11 @@ This is an implementation of the paper
 This is an implementation of the paper
 [An Image Is Worth 16x16 Words: Transformers For Image Recognition At Scale](https://papers.labml.ai/paper/2010.11929).
+## [Primer EZ](primer_ez/index.html)
+This is an implementation of the paper
+[Primer: Searching for Efficient Transformers for Language Modeling](https://papers.labml.ai/paper/2109.08668).
 """
 from .configs import TransformerConfigs
labml_nn/transformers/primer_ez/readme.md (new file, mode 100644)

# [Primer: Searching for Efficient Transformers for Language Modeling](https://nn.labml.ai/transformers/primer_ez/index.html)

This is a [PyTorch](https://pytorch.org) implementation of the paper
[Primer: Searching for Efficient Transformers for Language Modeling](https://papers.labml.ai/paper/2109.08668).

The authors do an evolutionary search for transformer architectures.
They name the architecture found by the search Primer (PRIMitives searched transformER).
**Primer EZ** is the architecture with the two most robust modifications in Primer
compared to the original transformer.
Primer EZ trains a lot faster than the vanilla transformer.

### Squared ReLU

The most effective modification found by the search is using a squared ReLU instead of ReLU in the
[position-wise feedforward module](https://nn.labml.ai/transformers/feed_forward.html).

### Multi-DConv-Head Attention (MDHA)

The next most effective modification is a depth-wise 3×1 convolution after the multi-head projection
for queries, keys, and values.
The convolution is along the sequence dimension and per channel (depth-wise).
To be clear, if the number of channels in each head is d_k, the convolution will have 1×3 kernels
for each of the d_k channels.

[Here is the experiment code](https://nn.labml.ai/transformers/primer_ez/experiment.html) for Primer EZ.

[![View Run](https://img.shields.io/badge/labml-experiment-brightgreen)](https://app.labml.ai/run/30adb7aa1ab211eca7310f80a114e8a4)
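The MDHA depth-wise convolution described in the readme above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the repository's annotated implementation: the class name is my own, and the causal left-padding assumes autoregressive language modeling.

```python
import torch
import torch.nn as nn


class SpatialDepthWiseConv(nn.Module):
    """Depth-wise 3x1 convolution along the sequence dimension.

    Applied after the query/key/value projections in MDHA: each of the
    d_k channels in every head gets its own 1x3 kernel (groups=d_k
    makes the convolution depth-wise).
    """

    def __init__(self, d_k: int, kernel_size: int = 3):
        super().__init__()
        self.kernel_size = kernel_size
        # Symmetric padding of (kernel_size - 1), with the right side
        # trimmed in forward(), gives a causal convolution.
        self.conv = nn.Conv1d(d_k, d_k, kernel_size,
                              padding=kernel_size - 1, groups=d_k)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [seq_len, batch, heads, d_k]
        seq_len, batch, heads, d_k = x.shape
        # Fold batch and heads together; Conv1d expects [N, channels, length]
        x = x.permute(1, 2, 3, 0).reshape(batch * heads, d_k, seq_len)
        x = self.conv(x)
        # Trim the positions produced by the right-side padding,
        # so no position sees future tokens
        x = x[:, :, :-(self.kernel_size - 1)]
        return x.reshape(batch, heads, d_k, seq_len).permute(3, 0, 1, 2)
```

In MDHA this module would be applied to the projected queries, keys, and values before the attention scores are computed; the output shape matches the input shape, so it drops into a standard multi-head attention block.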
readme.md

@@ -37,6 +37,7 @@ implementations almost weekly.
 * [MLP-Mixer: An all-MLP Architecture for Vision](https://nn.labml.ai/transformers/mlp_mixer/index.html)
 * [Pay Attention to MLPs (gMLP)](https://nn.labml.ai/transformers/gmlp/index.html)
 * [Vision Transformer (ViT)](https://nn.labml.ai/transformers/vit/index.html)
+* [Primer EZ](https://nn.labml.ai/transformers/primer_ez/index.html)
 #### ✨ [Recurrent Highway Networks](https://nn.labml.ai/recurrent_highway_networks/index.html)