# [MLP-Mixer: An all-MLP Architecture for Vision](https://nn.labml.ai/transformers/mlp_mixer/index.html)

This is a [PyTorch](https://pytorch.org) implementation of the paper [MLP-Mixer: An all-MLP Architecture for Vision](https://papers.labml.ai/paper/2105.01601).

The paper applies the model to vision tasks. The model is similar to a transformer, with the attention layer replaced by an MLP that is applied across the patches (or tokens in the case of an NLP task).

Our implementation of MLP-Mixer is a drop-in replacement for the [self-attention layer](https://nn.labml.ai/transformers/mha.html) in [our transformer implementation](https://nn.labml.ai/transformers/models.html). It's just a couple of lines of code: the tensor is transposed so that the MLP is applied across the sequence dimension.

Although the paper applies MLP-Mixer to vision tasks, we tried it on a [masked language model](https://nn.labml.ai/transformers/mlm/index.html). [Here is the experiment code](https://nn.labml.ai/transformers/mlp_mixer/experiment.html).
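To illustrate the token-mixing step described above, here is a minimal sketch of an MLP applied across the sequence dimension by transposing the tensor. The class name `MixerMLP` and the hyper-parameters `seq_len` and `hidden_dim` are assumptions for this sketch, not names from the paper or the actual implementation (which reuses the transformer's feed-forward module); the fixed sequence length is an inherent requirement of mixing across tokens with a linear layer.

```python
import torch
import torch.nn as nn


class MixerMLP(nn.Module):
    """Token-mixing MLP used in place of self-attention (a minimal sketch).

    Note: `MixerMLP`, `seq_len`, and `hidden_dim` are illustrative names,
    not identifiers from the paper or the labml.ai code.
    """

    def __init__(self, seq_len: int, hidden_dim: int):
        super().__init__()
        # The MLP mixes along the sequence (patch/token) dimension,
        # so its input and output size is the sequence length.
        self.mlp = nn.Sequential(
            nn.Linear(seq_len, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, seq_len),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [seq_len, batch_size, d_model], the layout used by the
        # transformer implementation this replaces.
        # Move the sequence dimension to the last axis so the MLP mixes
        # across patches/tokens instead of across features.
        x = x.permute(2, 1, 0)   # [d_model, batch_size, seq_len]
        x = self.mlp(x)          # mix across the sequence dimension
        x = x.permute(2, 1, 0)   # back to [seq_len, batch_size, d_model]
        return x


# Usage: drop this module in where the self-attention layer would go;
# the surrounding transformer block still provides the residual
# connections and layer normalization.
mixer = MixerMLP(seq_len=64, hidden_dim=256)
out = mixer(torch.randn(64, 8, 512))  # [seq_len, batch_size, d_model]
```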