## 1. Introduction to PLSC-SwinTransformer


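PLSC-SwinTransformer implements a visual classification model based on [Swin Transformer](https://github.com/microsoft/Swin-Transformer). Swin Transformer is a hierarchical Vision Transformer (ViT); the name Swin stands for Shifted Windows. Unlike ViT, Swin computes self-attention within non-overlapping local windows and adds cross-window connections so that information is shared between windows. This makes Swin Transformer more efficient than ViT, which computes attention globally. Swin Transformer can serve as a general-purpose backbone for computer vision. The model architecture is shown below.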

![Figure 1 from paper](https://github.com/microsoft/Swin-Transformer/blob/main/figures/teaser.png?raw=true)
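To make the window mechanism concrete, the following is a minimal NumPy sketch, not the PLSC implementation. It partitions a feature map into non-overlapping windows, applies the cyclic half-window shift that connects neighbouring windows, and compares the attention cost formulas given in the Swin Transformer paper. The function names are illustrative, and the attention mask that the real model applies to shifted windows is omitted. The shapes assume Swin-B at 224x224 input (56x56 tokens, 128 channels, window size 7).

```python
import numpy as np


def window_partition(x, window_size):
    """Split a (H, W, C) feature map into non-overlapping windows.

    Returns an array of shape (num_windows, window_size * window_size, C).
    """
    h, w, c = x.shape
    assert h % window_size == 0 and w % window_size == 0
    x = x.reshape(h // window_size, window_size, w // window_size, window_size, c)
    # Bring the two window-grid axes to the front, then flatten each window.
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, window_size * window_size, c)


def msa_flops(h, w, c):
    """Cost of global self-attention: 4hwC^2 + 2(hw)^2 C (Swin paper)."""
    return 4 * h * w * c**2 + 2 * (h * w) ** 2 * c


def wmsa_flops(h, w, c, window_size):
    """Cost of window attention: 4hwC^2 + 2 M^2 hw C (Swin paper)."""
    return 4 * h * w * c**2 + 2 * window_size**2 * h * w * c


# Assumed stage-1 feature map of Swin-B: 56x56 tokens, 128 channels.
feat = np.arange(56 * 56 * 128, dtype=np.float32).reshape(56, 56, 128)

# Regular windows: self-attention would run inside each 7x7 window.
windows = window_partition(feat, window_size=7)
print(windows.shape)  # (64, 49, 128)

# Shifted windows: roll by half a window so the new windows straddle
# the boundaries of the previous ones.
shifted = np.roll(feat, shift=(-3, -3), axis=(0, 1))
shifted_windows = window_partition(shifted, window_size=7)

ratio = msa_flops(56, 56, 128) / wmsa_flops(56, 56, 128, 7)
print(f"global / window attention cost: {ratio:.1f}x")
```

The window cost grows linearly with the number of tokens, while the global cost grows quadratically, which is where the efficiency gain comes from.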


## 2. Model Performance

| Model | DType | Phase | Dataset | GPU | Throughput (img/sec) | Top-1 Acc | Official Top-1 Acc |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Swin-B | FP16 O1 | pretrain | ImageNet2012 | A100*N1C8 | 2155 | 0.83362 | 0.835 |
| Swin-B | FP16 O2 | pretrain | ImageNet2012 | A100*N1C8 | 3006 | 0.83223 | 0.835 |
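As a quick reading aid, the short snippet below derives the throughput ratio between the two precision settings and the gap to the official accuracy. It only does arithmetic on the numbers in the table; the variable names are illustrative.

```python
# Values copied from the table above.
results = {
    "FP16 O1": {"img_per_sec": 2155, "top1": 0.83362},
    "FP16 O2": {"img_per_sec": 3006, "top1": 0.83223},
}
official_top1 = 0.835

# Relative training throughput of O2 compared with O1.
speedup = results["FP16 O2"]["img_per_sec"] / results["FP16 O1"]["img_per_sec"]
print(f"O2 throughput vs O1: {speedup:.2f}x")

# Accuracy gap to the official result, in percentage points.
for name, r in results.items():
    gap = (official_top1 - r["top1"]) * 100
    print(f"{name}: {gap:.2f} points below the official Top-1")
```

In these measurements O2 trains roughly 1.39x faster than O1, and both settings stay within about 0.3 percentage points of the official Top-1 accuracy.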


## 3. How to Use the Model

### 3.1 Install PLSC

```
git clone https://github.com/PaddlePaddle/PLSC.git
cd /path/to/PLSC/
# [optional] pip install -r requirements.txt
python setup.py develop
```

### 3.2 Model Training

1. Enter the task directory

```
cd task/classification/swin
```

2. Prepare the data

Organize the data in the following layout:
```text
dataset/
└── ILSVRC2012
 ├── train
 ├── val
 ├── train_list.txt
 └── val_list.txt
```
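Before launching a long training job, it can help to confirm that the layout above is in place. The following is a minimal sketch of such a check; `missing_entries` is a hypothetical helper, not part of PLSC, and the default root `dataset/ILSVRC2012` is taken from the tree above. It does not validate the contents of the list files.

```python
from pathlib import Path

# Entries expected under the dataset root, as shown in the tree above.
EXPECTED_DIRS = ["train", "val"]
EXPECTED_FILES = ["train_list.txt", "val_list.txt"]


def missing_entries(root="dataset/ILSVRC2012"):
    """Return the names of expected directories/files that are absent."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


missing = missing_entries()
if missing:
    print("Missing under dataset/ILSVRC2012:", ", ".join(missing))
else:
    print("Dataset layout looks complete.")
```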

3. Run the training command

```shell
export PADDLE_NNODES=1
export PADDLE_MASTER="127.0.0.1:12538"
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7

python -m paddle.distributed.launch \
 --nnodes=$PADDLE_NNODES \
 --master=$PADDLE_MASTER \
 --devices=$CUDA_VISIBLE_DEVICES \
 plsc-train \
 -c ./configs/swin_base_patch4_window7_224_in1k_1n8c_dp_fp16o1.yaml
```

For training tutorials covering more models, see the [Swin training documentation](https://github.com/PaddlePaddle/PLSC/blob/master/task/classification/swin/README.md).

### 3.3 Model Inference

1. Download the pretrained model and a test image

```shell
# download pretrained model
mkdir -p pretrained/swin/Swin_base/
wget -O ./pretrained/swin/Swin_base/swin_base_patch4_window7_224_fp16o1.pdparams \
  https://plsc.bj.bcebos.com/models/swin/v2.5/swin_base_patch4_window7_224_fp16o1.pdparams

# download image
mkdir -p images/
wget -O ./images/zebra.png https://plsc.bj.bcebos.com/dataset/test_images/zebra.png 
```

2. Export the inference model

```shell
plsc-export -c ./configs/swin_base_patch4_window7_224_in1k_1n8c_dp_fp16o1.yaml -o Global.pretrained_model=./pretrained/swin/Swin_base/swin_base_patch4_window7_224_fp16o1 -o Model.data_format=NCHW -o FP16.level=O0
```


3. Run prediction on an image

```python
import numpy as np

from plsc.data.dataset import default_loader
from plsc.data.preprocess import Resize
from plsc.engine.inference import Predictor


def preprocess(img):
    # Resize with bicubic interpolation using the PIL backend.
    resize = Resize(size=224, interpolation="bicubic", backend="pil")
    img = np.array(resize(img))
    # Scale to [0, 1], then normalize with the ImageNet mean and std.
    scale = 1.0 / 255.0
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    img = (img * scale - mean) / std
    # Add a batch axis and convert HWC to NCHW.
    img = img[np.newaxis, :, :, :]
    img = img.transpose((0, 3, 1, 2))
    return {'x': img.astype('float32')}


def postprocess(logits):

    def softmax(x, epsilon=1e-6):
        # Subtract the maximum first so that np.exp cannot overflow.
        exp_x = np.exp(x - np.max(x))
        sfm = (exp_x + epsilon) / (np.sum(exp_x) + epsilon)
        return sfm

    pred = np.array(logits).squeeze()
    pred = softmax(pred)
    # Index and score of the most probable class.
    pred_class_idx = pred.argmax()
    return pred_class_idx, pred[pred_class_idx]


# Files produced by the export step above.
infer_model = "./output/swin_base_patch4_window7_224/swin_base_patch4_window7_224.pdmodel"
infer_params = "./output/swin_base_patch4_window7_224/swin_base_patch4_window7_224.pdiparams"

predictor = Predictor(
    model_file=infer_model,
    params_file=infer_params,
    preprocess_fn=preprocess,
    postprocess_fn=postprocess)

image = default_loader("./images/zebra.png")
pred_class_idx, pred_score = predictor.predict(image)
print(pred_class_idx, pred_score)
```
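The `postprocess` function above returns only the single most probable class. When inspecting predictions it is often useful to see several candidates, so here is a minimal NumPy sketch of a top-k variant. `topk_postprocess` is a hypothetical helper, not part of PLSC, and the logits in the demo are made-up values rather than model output.

```python
import numpy as np


def topk_postprocess(logits, k=5):
    """Return the k most probable (class_index, probability) pairs."""
    scores = np.asarray(logits, dtype=np.float64).reshape(-1)
    # Numerically stable softmax: shift by the maximum before exponentiating.
    exp_scores = np.exp(scores - scores.max())
    probs = exp_scores / exp_scores.sum()
    k = min(k, probs.size)
    # Indices sorted by descending probability, truncated to k entries.
    top_idx = np.argsort(probs)[::-1][:k]
    return [(int(i), float(probs[i])) for i in top_idx]


# Made-up logits for a 5-class problem, with a leading batch axis.
fake_logits = np.array([[0.1, 2.0, -1.0, 3.5, 0.7]])
top3 = topk_postprocess(fake_logits, k=3)
for class_idx, prob in top3:
    print(class_idx, round(prob, 4))
```

Assuming `Predictor` forwards the raw logits to `postprocess_fn` as in the example above, a function like this could be passed in its place; the class indices would still need to be mapped to label names separately.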

## 4. Related Paper and Citation


```text
@inproceedings{liu2021Swin,
 title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
 author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
 booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
 year={2021}
}
```