<p>This is an implementation of the paper <a href="https://papers.labml.ai/paper/2010.11929">An Image Is Worth 16x16 Words: Transformers For Image Recognition At Scale</a>.</p>
<p>This is an implementation of the paper <a href="https://papers.labml.ai/paper/2109.08668">Primer: Searching for Efficient Transformers for Language Modeling</a>.</p>
<p>This is an implementation of the paper <a href="https://papers.labml.ai/paper/2110.13711">Hierarchical Transformers Are More Efficient Language Models</a>.</p>