{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "ae69ce68",
   "metadata": {},
   "source": [
    "## 1. PLSC-SwinTransformer Introduction\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "35485bc6",
   "metadata": {},
   "source": [
    "PLSC-SwinTransformer is a reimplementation of [Microsoft's repository for the Swin Transformer](https://github.com/microsoft/Swin-Transformer) model, which was released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/pdf/2103.14030.pdf).\n",
    "\n",
    "Swin Transformer (the name Swin stands for **S**hifted **win**dow) serves as a general-purpose backbone for computer vision. It is a hierarchical Transformer whose representation is computed with shifted windows. The shifted windowing scheme improves efficiency by limiting self-attention computation to non-overlapping local windows while still allowing cross-window connections.\n",
    "\n",
    "![Figure 1 from paper](https://github.com/microsoft/Swin-Transformer/blob/main/figures/teaser.png?raw=true)\n"
   ]
  },
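  {
   "cell_type": "markdown",
   "id": "c4a1d7e2",
   "metadata": {},
   "source": [
    "The window scheme described above can be illustrated with a minimal NumPy sketch. This is not the PLSC or Microsoft implementation: the helper names `window_partition` and `cyclic_shift` and the toy 8x8 feature map are assumptions chosen for illustration. It splits a feature map into non-overlapping windows, then rolls the map by half a window so that the next partition straddles the previous window borders.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def window_partition(x, window_size):\n",
    "    # x has shape (H, W, C); H and W must be divisible by window_size.\n",
    "    h, w, c = x.shape\n",
    "    x = x.reshape(h // window_size, window_size, w // window_size, window_size, c)\n",
    "    # Group the two window-grid axes together, then flatten them into one axis.\n",
    "    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window_size, window_size, c)\n",
    "\n",
    "\n",
    "def cyclic_shift(x, shift):\n",
    "    # Roll the map up and to the left by `shift` positions (hypothetical helper).\n",
    "    return np.roll(x, shift=(-shift, -shift), axis=(0, 1))\n",
    "\n",
    "\n",
    "# Toy feature map: 8x8 spatial positions, 1 channel, values 0..63.\n",
    "feature_map = np.arange(8 * 8, dtype=np.float32).reshape(8, 8, 1)\n",
    "\n",
    "regular = window_partition(feature_map, window_size=4)\n",
    "shifted = window_partition(cyclic_shift(feature_map, shift=2), window_size=4)\n",
    "\n",
    "print(regular.shape, shifted.shape)\n",
    "```\n",
    "\n",
    "Both partitions yield four 4x4 windows here. Attention would then be computed inside each window independently, and alternating the two partitions is what lets information cross window borders. The real model also masks attention between regions that the roll wraps around; that step is omitted from this sketch."
   ]
  },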
  {
   "cell_type": "markdown",
   "id": "97e174e6",
   "metadata": {
    "tags": []
   },
   "source": [
    "## 2. Model Performance"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "78137a72",
   "metadata": {},
   "source": [
    "| Model |DType | Phase | Dataset | gpu | img/sec | Top1 Acc | Official |\n",
    "| --- | --- | --- | --- | --- | --- | --- | --- |\n",
    "| Swin-B |FP16 O1|pretrain  |ImageNet2012  |A100*N1C8  |  2155| 0.83362 | 0.835 |\n",
    "| Swin-B |FP16 O2|pretrain  | ImageNet2012 | A100*N1C8 | 3006 | 0.83223\t | 0.835 |\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ace3c48d",
   "metadata": {},
   "source": [
    "## 3. How to use the Model"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a97a5f56",
   "metadata": {},
   "source": [
    "### 3.1 Install PLSC"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "492fa769-2fe0-4220-b6d9-bbc32f8cca10",
   "metadata": {},
   "source": [
    "```\n",
    "git clone https://github.com/PaddlePaddle/PLSC.git\n",
    "cd /path/to/PLSC/\n",
    "# [optional] pip install -r requirements.txt\n",
    "python setup.py develop\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6b22824d",
   "metadata": {},
   "source": [
    "### 3.2 Model Training"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d68ca5fb",
   "metadata": {},
   "source": [
    "1. Enter into the task directory\n",
    "\n",
    "```\n",
    "cd task/classification/swin\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9048df01",
   "metadata": {},
   "source": [
    "2. Prepare the data\n",
    "\n",
    "Organize the data into the following format:\n",
    "\n",
    "\n",
    "```text\n",
    "dataset/\n",
    "└── ILSVRC2012\n",
    "    ├── train\n",
    "    ├── val\n",
    "    ├── train_list.txt\n",
    "    └── val_list.txt\n",
    "```"
   ]
  },
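  {
   "cell_type": "markdown",
   "id": "5f8b2a90",
   "metadata": {},
   "source": [
    "Before launching training, it can help to confirm that the layout above is in place. The following is a small sketch using only the standard library. The helper name `missing_entries` is hypothetical, and the call at the bottom assumes the `dataset/` directory sits in the current working directory. It checks only that the four entries exist, not the contents of the list files.\n",
    "\n",
    "```python\n",
    "from pathlib import Path\n",
    "\n",
    "\n",
    "def missing_entries(root):\n",
    "    # Entries expected under <root>/ILSVRC2012, taken from the layout above.\n",
    "    expected = ['train', 'val', 'train_list.txt', 'val_list.txt']\n",
    "    base = Path(root) / 'ILSVRC2012'\n",
    "    return [name for name in expected if not (base / name).exists()]\n",
    "\n",
    "\n",
    "missing = missing_entries('dataset')\n",
    "print('missing entries:', missing if missing else 'none')\n",
    "```\n",
    "\n",
    "An empty result means all four entries were found; anything listed still needs to be created or moved into place."
   ]
  },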
  {
   "cell_type": "markdown",
   "id": "bea743ea",
   "metadata": {},
   "source": [
    "3. Run the command\n",
    "\n",
    "```shell\n",
    "export PADDLE_NNODES=1\n",
    "export PADDLE_MASTER=\"127.0.0.1:12538\"\n",
    "export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7\n",
    "\n",
    "python -m paddle.distributed.launch \\\n",
    "    --nnodes=$PADDLE_NNODES \\\n",
    "    --master=$PADDLE_MASTER \\\n",
    "    --devices=$CUDA_VISIBLE_DEVICES \\\n",
    "    plsc-train \\\n",
    "    -c ./configs/swin_base_patch4_window7_224_in1k_1n8c_dp_fp16o1.yaml\n",
    "```\n",
    "\n",
    "For more details on model training, see the [Swin README](https://github.com/PaddlePaddle/PLSC/blob/master/task/classification/swin/README.md)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "186a0c17",
   "metadata": {},
   "source": [
    "### 3.3 Model Inference"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e97c527c",
   "metadata": {},
   "source": [
    "1. Download pretrained model and image\n",
    "\n",
    "\n",
    "```shell\n",
    "# download pretrained model\n",
    "mkdir -p pretrained/swin/Swin_base/\n",
    "wget -O ./pretrained/swin/Swin_base/swin_base_patch4_window7_224_fp16o1.pdparams \n",
    "https://plsc.bj.bcebos.com/models/swin/v2.5/swin_base_patch4_window7_224_fp16o1.pdparams\n",
    "\n",
    "# download image\n",
    "mkdir -p images/\n",
    "wget -O ./images/zebra.png https://plsc.bj.bcebos.com/dataset/test_images/zebra.png\n",
    "```\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a07c6549",
   "metadata": {},
   "source": [
    "2. Export model for inference\n",
    "\n",
    "```shell\n",
    "plsc-export -c ./configs/swin_base_patch4_window7_224_in1k_1n8c_dp_fp16o1.yaml -o Global.pretrained_model=./pretrained/swin/Swin_base/swin_base_patch4_window7_224_fp16o1 -o Model.data_format=NCHW -o FP16.level=O0\n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e92efe35-ea6d-4aee-9a4d-a2c79f40f473",
   "metadata": {},
   "source": [
    "3. Image inference"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "22f4a080-ad97-4e00-a9fa-697601f579ef",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "from plsc.data.dataset import default_loader\n",
    "from plsc.data.preprocess import Resize\n",
    "from plsc.engine.inference import Predictor\n",
    "\n",
    "\n",
    "def preprocess(img):\n",
    "    resize = Resize(size=224, \n",
    "                    interpolation=\"bicubic\", \n",
    "                    backend=\"pil\")\n",
    "    img = np.array(resize(img))\n",
    "    scale = 1.0 / 255.0\n",
    "    mean = np.array([0.485, 0.456, 0.406])\n",
    "    std = np.array([0.229, 0.224, 0.225])\n",
    "    img = (img * scale - mean) / std\n",
    "    img = img[np.newaxis, :, :, :]\n",
    "    img = img.transpose((0, 3, 1, 2))\n",
    "    return {'x': img.astype('float32')}\n",
    "\n",
    "\n",
    "def postprocess(logits):\n",
    "    \n",
    "    def softmax(x, epsilon=1e-6):\n",
    "        exp_x = np.exp(x)\n",
    "        sfm = (exp_x + epsilon) / (np.sum(exp_x) + epsilon)\n",
    "        return sfm\n",
    "\n",
    "    pred = np.array(logits).squeeze()\n",
    "    pred = softmax(pred)\n",
    "    pred_class_idx = pred.argsort()[::-1][0]\n",
    "    return pred_class_idx, pred[pred_class_idx]\n",
    "\n",
    "\n",
    "infer_model = \"./output/swin_base_patch4_window7_224/swin_base_patch4_window7_224.pdmodel\"\n",
    "infer_params = \"./output/swin_base_patch4_window7_224/swin_base_patch4_window7_224.pdiparams\"\n",
    "\n",
    "predictor = Predictor(\n",
    "    model_file=infer_model,\n",
    "    params_file=infer_params,\n",
    "    preprocess_fn=preprocess,\n",
    "    postprocess_fn=postprocess)\n",
    "\n",
    "image = default_loader(\"./images/zebra.png\")\n",
    "pred_class_idx, pred_score = predictor.predict(image)"
   ]
  },
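  {
   "cell_type": "markdown",
   "id": "e3d61b7c",
   "metadata": {},
   "source": [
    "The `postprocess` function above returns only the best class. If the top few candidates are of interest, a variant along the following lines could be used. This is a sketch: `topk_postprocess` is a hypothetical helper, the demo logits are made up, and whether it can be passed directly as `postprocess_fn` depends on how `Predictor` hands over the model output, which is assumed here to match what `postprocess` receives.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def topk_postprocess(logits, k=5):\n",
    "    # Flatten to a 1-D vector of class logits for a single image.\n",
    "    logits = np.asarray(logits, dtype=np.float64).reshape(-1)\n",
    "    # Numerically stable softmax: shift by the max before exponentiating.\n",
    "    exp = np.exp(logits - logits.max())\n",
    "    probs = exp / exp.sum()\n",
    "    # Indices of the k largest probabilities, highest first.\n",
    "    top = np.argsort(probs)[::-1][:k]\n",
    "    return [(int(i), float(probs[i])) for i in top]\n",
    "\n",
    "\n",
    "# Made-up logits for a 4-class toy problem, shaped like a batch of one.\n",
    "demo_logits = [[0.1, 2.0, -1.0, 0.5]]\n",
    "print(topk_postprocess(demo_logits, k=2))\n",
    "```\n",
    "\n",
    "Each returned pair is a class index and its softmax probability, ordered from most to least likely."
   ]
  },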
  {
   "cell_type": "markdown",
   "id": "d375934d",
   "metadata": {},
   "source": [
    "## 4. Related papers and citations"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "29f05b07-d323-45e4-b00d-0728eafb5af7",
   "metadata": {},
   "source": [
    "```text\n",
    "@inproceedings{liu2021Swin,\n",
    "  title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},\n",
    "  author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},\n",
    "  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},\n",
    "  year={2021}\n",
    "}\n",
    "```"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}