diff --git a/docs/index.html b/docs/index.html
index 8e17ac32124598e6efcbac03fd0e32acba97a6c4..cb6f2bfc6360eb907bf55375dfc00a0b19d6cae4 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -81,12 +81,10 @@ We believe these would help you understand these algorithms better.
implementations.
Modules
-Transformers module
-contains implementations for
-multi-headed attention
-and
-relative multi-headed attention.
+- Multi-headed attention
+- Transformer building blocks
+- Relative multi-headed attention
- GPT Architecture
- GLU Variants
- kNN-LM: Generalization through Memorization
diff --git a/docs/transformers/index.html b/docs/transformers/index.html
index fa8cd022707dbbaf627e9455770a27953f1ba30a..b97aea941610d5f1e46f3e0bd87fa78259992815 100644
--- a/docs/transformers/index.html
+++ b/docs/transformers/index.html
@@ -78,7 +78,7 @@ from paper Attention Is All You Need
and derivatives and enhancements of it.
diff --git a/labml_nn/__init__.py b/labml_nn/__init__.py
index f46c8fc08371304e445f4eda0ce602369cdab6c6..4a83e6d8e9dc12570592867fbd357de83a9c8ed3 100644
--- a/labml_nn/__init__.py
+++ b/labml_nn/__init__.py
@@ -15,12 +15,9 @@ implementations.
#### ✨ [Transformers](transformers/index.html)
-[Transformers module](transformers/index.html)
-contains implementations for
-[multi-headed attention](transformers/mha.html)
-and
-[relative multi-headed attention](transformers/relative_mha.html).
-
+* [Multi-headed attention](transformers/mha.html)
+* [Transformer building blocks](transformers/models.html)
+* [Relative multi-headed attention](transformers/xl/relative_mha.html)
* [GPT Architecture](transformers/gpt/index.html)
* [GLU Variants](transformers/glu_variants/simple.html)
* [kNN-LM: Generalization through Memorization](transformers/knn/index.html)
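The bullet list introduced in this hunk names the module's core pieces. For orientation, here is a minimal, self-contained sketch of multi-headed attention in PyTorch; the class name, the `[seq_len, batch_size, d_model]` layout, and all parameter names are illustrative assumptions, not the `labml_nn` API:

```python
import torch
import torch.nn as nn


class MiniMultiHeadAttention(nn.Module):
    """A bare-bones sketch of scaled dot-product attention split across heads."""

    def __init__(self, d_model: int, heads: int):
        super().__init__()
        assert d_model % heads == 0, "d_model must split evenly across heads"
        self.heads = heads
        self.d_k = d_model // heads
        # Separate projections for queries, keys, and values, plus an output projection
        self.query = nn.Linear(d_model, d_model)
        self.key = nn.Linear(d_model, d_model)
        self.value = nn.Linear(d_model, d_model)
        self.output = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [seq_len, batch_size, d_model]
        seq_len, batch_size, _ = x.shape
        heads_shape = (seq_len, batch_size, self.heads, self.d_k)
        q = self.query(x).view(heads_shape)
        k = self.key(x).view(heads_shape)
        v = self.value(x).view(heads_shape)
        # softmax(Q K^T / sqrt(d_k)) V, computed independently per head
        scores = torch.einsum('ibhd,jbhd->ijbh', q, k) / self.d_k ** 0.5
        attn = scores.softmax(dim=1)  # normalize over the key positions
        out = torch.einsum('ijbh,jbhd->ibhd', attn, v)
        # Recombine heads and project back to d_model
        return self.output(out.reshape(seq_len, batch_size, -1))


# Usage: eight heads over a 512-dimensional model, sequence of 10, batch of 2
mha = MiniMultiHeadAttention(d_model=512, heads=8)
out = mha(torch.randn(10, 2, 512))  # -> [10, 2, 512]
```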
diff --git a/labml_nn/transformers/__init__.py b/labml_nn/transformers/__init__.py
index e511dc276262fd7fc8c90e182fcc6ff64531bfb3..db9169875070bfd2843d1f7b87a8faaaca1c5c31 100644
--- a/labml_nn/transformers/__init__.py
+++ b/labml_nn/transformers/__init__.py
@@ -14,7 +14,7 @@ from paper [Attention Is All You Need](https://arxiv.org/abs/1706.03762),
and derivatives and enhancements of it.
* [Multi-head attention](mha.html)
-* [Relative multi-head attention](relative_mha.html)
+* [Relative multi-head attention](xl/relative_mha.html)
* [Transformer Encoder and Decoder Models](models.html)
* [Fixed positional encoding](positional_encoding.html)
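The path change in this hunk reflects that relative multi-head attention now lives under the Transformer-XL (`xl/`) directory. As background, Transformer-XL-style attention scores keys by their distance from the query rather than by absolute position, and implementations commonly use a pad-reshape-slice "relative shift" to realign scores computed against a stack of relative embeddings. A hedged sketch, simplified to a square 2-D score matrix; this is the general trick, not necessarily the exact code in `xl/relative_mha.py`:

```python
import torch


def relative_shift(scores: torch.Tensor) -> torch.Tensor:
    # Assumed input: scores[i][m] = q_i . r_m, with the relative embeddings r
    # ordered from the farthest key position to the nearest.
    q_len, k_len = scores.shape
    # Pad a zero column, reinterpret the buffer, and drop the first row.
    # Afterwards entry (i, j) with j <= i holds the score for relative
    # distance i - j; entries above the diagonal are invalid and are
    # masked out by the causal attention mask in practice.
    padded = torch.cat([scores.new_zeros(q_len, 1), scores], dim=1)
    return padded.view(k_len + 1, q_len)[1:].view(q_len, k_len)


# Example: each row's valid scores shift into place without an explicit loop
print(relative_shift(torch.arange(9.).view(3, 3)))
```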
diff --git a/readme.md b/readme.md
index 7bd188a401f719a3c9e97251ab168dfeacc74a7f..533b76ba710b623022bc88467630ed49bf26b5e8 100644
--- a/readme.md
+++ b/readme.md
@@ -21,12 +21,9 @@ implementations almost weekly.
#### ✨ [Transformers](https://nn.labml.ai/transformers/index.html)
-[Transformers module](https://nn.labml.ai/transformers/index.html)
-contains implementations for
-[multi-headed attention](https://nn.labml.ai/transformers/mha.html)
-and
-[relative multi-headed attention](https://nn.labml.ai/transformers/relative_mha.html).
-
+* [Multi-headed attention](https://nn.labml.ai/transformers/mha.html)
+* [Transformer building blocks](https://nn.labml.ai/transformers/models.html)
+* [Relative multi-headed attention](https://nn.labml.ai/transformers/xl/relative_mha.html)
* [GPT Architecture](https://nn.labml.ai/transformers/gpt/index.html)
* [GLU Variants](https://nn.labml.ai/transformers/glu_variants/simple.html)
* [kNN-LM: Generalization through Memorization](https://nn.labml.ai/transformers/knn)
diff --git a/setup.py b/setup.py
index f5568fe644d6bbfa7cf458f0cf32185e565f264f..320e7baaac09bed43aa1add6aa9270beb856f602 100644
--- a/setup.py
+++ b/setup.py
@@ -5,7 +5,7 @@ with open("readme.md", "r") as f:
setuptools.setup(
name='labml-nn',
- version='0.4.85',
+ version='0.4.86',
author="Varuna Jayasiri, Nipun Wijerathne",
author_email="vpjayasiri@gmail.com, hnipun@gmail.com",
description="A collection of PyTorch implementations of neural network architectures and layers.",