# UNIMO
Code for the ACL 2021 main-conference long paper [UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning](https://arxiv.org/pdf/2012.15409.pdf).

## Abstract

Existing pre-training methods focus either on single-modal tasks or on multi-modal tasks, and cannot effectively adapt to each other. They can only utilize single-modal data (i.e., text or image) or limited multi-modal data (i.e., image-text pairs). In this work, we propose a UNIfied-MOdal pre-training architecture, namely `UNIMO`, which can effectively adapt to both single-modal and multi-modal understanding and generation tasks. Large-scale free-text corpora and image collections are utilized to improve the capability of visual and textual understanding, and cross-modal contrastive learning (CMCL) is leveraged to align the textual and visual information into a unified semantic space over a corpus of image-text pairs augmented with related images and texts. With the help of rich non-paired single-modal data, our model is able to learn more generalizable representations by allowing textual knowledge and visual knowledge to enhance each other in the unified semantic space. The experimental results show that `UNIMO` greatly improves the performance of several single-modal and multi-modal downstream tasks.

![UNIMO](images/framework.png#pic_center)

## Performance

Results on multi-modal understanding and generation tasks:

![UNIMO](images/multiple.png#pic_center)

Results on single-modal understanding and generation tasks:

![UNIMO](images/single.png#pic_center)

---

## TODOs

- [ ] Add all downstream tasks
- [ ] Add the UNIMO large model

## Dependencies

python 3.7.4\
paddlepaddle-gpu==1.8.4.post107\
pyrouge==0.1.3

## Pre-trained Models

`UNIMO` adopts large-scale text corpora, image collections and image-text aligned datasets as its pre-training data.
We provide the `UNIMO` model in one scale setting:

[UNIMO base](https://unimo.bj.bcebos.com/model/unimo_base_en.tar.gz) (lowercased | 12 layers)

```
MODEL_SIZE=base
cd /path/to/model_files
wget --no-check-certificate -q https://unimo.bj.bcebos.com/model/unimo_${MODEL_SIZE}_en.tar.gz
tar -zxf unimo_${MODEL_SIZE}_en.tar.gz
```

## Experiments

Our fine-tuning experiments are carried out on V100 GPUs. Here are the results from the `UNIMO` model:
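As background for the cross-modal contrastive learning (CMCL) objective described in the abstract, here is a minimal, illustrative sketch of an InfoNCE-style image-text contrastive loss in NumPy. This is a generic sketch under the usual assumptions (paired rows are positives, in-batch rows are negatives), not the repository's actual implementation, which additionally augments pairs with related images and texts; the function name and shapes are hypothetical.

```python
import numpy as np

def cmcl_loss(image_emb, text_emb, temperature=0.1):
    """Illustrative image-text contrastive loss (InfoNCE-style).

    `image_emb` and `text_emb` are (N, D) arrays of paired embeddings;
    matched pairs share the same row index. Hypothetical sketch only,
    not UNIMO's implementation.
    """
    # L2-normalize so dot products become cosine similarities.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # (N, N) similarity matrix; diagonal entries are the true pairs.
    logits = image_emb @ text_emb.T / temperature

    def xent(l):
        # Cross-entropy with the diagonal as targets (stable log-softmax).
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (xent(logits) + xent(logits.T))
```

When the two modalities are well aligned (matched pairs are most similar), the loss is near zero; shuffling the pairing raises it, which is what drives the embeddings of paired images and texts toward a unified semantic space.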