Merge pull request #3798 from andyjpaddle/add_rec_sar

Add rec_sar

Merge pull request #3798 from andyjpaddle/add_rec_sar
Add rec_sar
9a44e279 · xiaoting · GitHub · 4bb1ce9c · 38fb2ba3 · 9a44e279
19 changed file
--- a/configs/rec/rec_r31_sar.yml
+++ b/configs/rec/rec_r31_sar.yml
+Global:
+  use_gpu: true
+  epoch_num: 5
+  log_smooth_window: 20
+  print_batch_step: 20
+  save_model_dir: ./sar_rec
+  save_epoch_step: 1
+  # evaluation is run every 2000 iterations
+  eval_batch_step: [0, 2000]
+  cal_metric_during_train: True
+  pretrained_model:
+  checkpoints: 
+  save_inference_dir:
+  use_visualdl: False
+  infer_img: 
+  # for data or label process
+  character_dict_path: ppocr/utils/dict90.txt
+  character_type: EN_symbol
+  max_text_length: 30
+  infer_mode: False
+  use_space_char: False
+  rm_symbol: True
+  save_res_path: ./output/rec/predicts_sar.txt
+Optimizer:
+  name: Adam
+  beta1: 0.9
+  beta2: 0.999
+  lr:
+    name: Piecewise
+    decay_epochs: [3, 4]
+    values: [0.001, 0.0001, 0.00001] 
+  regularizer:
+    name: 'L2'
+    factor: 0
+Architecture:
+  model_type: rec
+  algorithm: SAR
+  Transform:
+  Backbone:
+    name: ResNet31
+  Head:
+    name: SARHead
+Loss:
+  name: SARLoss
+PostProcess:
+  name: SARLabelDecode
+Metric:
+  name: RecMetric
+Train:
+  dataset:
+    name: SimpleDataSet
+    label_file_list: ['./train_data/train_list.txt']
+    data_dir: ./train_data/
+    ratio_list: 1.0
+    transforms:
+      - DecodeImage: # load image
+          img_mode: BGR
+          channel_first: False
+      - SARLabelEncode: # Class handling label
+      - SARRecResizeImg:
+          image_shape: [3, 48, 48, 160] # h:48 w:[48,160]
+          width_downsample_ratio: 0.25
+      - KeepKeys:
+          keep_keys: ['image', 'label', 'valid_ratio'] # dataloader will return list in this order
+  loader:
+    shuffle: True
+    batch_size_per_card: 64
+    drop_last: True
+    num_workers: 8
+    use_shared_memory: False
+Eval:
+  dataset:
+    name: LMDBDataSet
+    data_dir: ./eval_data/evaluation/
+    transforms:
+      - DecodeImage: # load image
+          img_mode: BGR
+          channel_first: False
+      - SARLabelEncode: # Class handling label
+      - SARRecResizeImg:
+          image_shape: [3, 48, 48, 160]
+          width_downsample_ratio: 0.25
+      - KeepKeys:
+          keep_keys: ['image', 'label', 'valid_ratio'] # dataloader will return list in this order
+  loader:
+    shuffle: False
+    drop_last: False
+    batch_size_per_card: 64
+    num_workers: 4
+    use_shared_memory: False
--- a/doc/doc_ch/algorithm_overview.md
+++ b/doc/doc_ch/algorithm_overview.md
@@ -45,6 +45,7 @@ PaddleOCR基于动态图开源的文本识别算法列表：
 - [x]  RARE([paper](https://arxiv.org/abs/1603.03915v1))[12]
 - [x]  SRN([paper](https://arxiv.org/abs/2003.12294))[5]
 - [x]  NRTR([paper](https://arxiv.org/abs/1806.00926v2))
+- [x]  SAR([paper](https://arxiv.org/abs/1811.00751v2))
 参考[DTRB][3](https://arxiv.org/abs/1904.01906)文字识别训练和评估流程，使用MJSynth和SynthText两个文字识别数据集训练，在IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE数据集上进行评估，算法效果如下：
@@ -60,6 +61,6 @@ PaddleOCR基于动态图开源的文本识别算法列表：
 |RARE|Resnet34_vd|83.6%|rec_r34_vd_tps_bilstm_att |[下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_tps_bilstm_att_v2.0_train.tar)|
 |SRN|Resnet50_vd_fpn| 88.52% | rec_r50fpn_vd_none_srn | [下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r50_vd_srn_train.tar) |
 |NRTR|NRTR_MTB| 84.3% | rec_mtb_nrtr | [下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mtb_nrtr_train.tar) |
+|SAR|Resnet31| 87.2% | rec_r31_sar | [下载链接](https://paddleocr.bj.bcebos.com/dygraph_v2.1/rec/rec_r31_sar_train.tar) |
 PaddleOCR文本识别算法的训练和使用请参考文档教程中[模型训练/评估中的文本识别部分](./recognition.md)。
--- a/doc/doc_ch/recognition.md
+++ b/doc/doc_ch/recognition.md
@@ -86,7 +86,10 @@ train_data/rec/train/word_002.jpg   用科技让复杂的世界更简单
 若您本地没有数据集，可以在官网下载 [ICDAR2015](http://rrc.cvc.uab.es/?ch=4&com=downloads) 数据，用于快速验证。也可以参考[DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here) ，下载 benchmark 所需的lmdb格式数据集。
+如果希望复现SAR的论文指标，需要下载[SynthAdd](https://pan.baidu.com/share/init?surl=uV0LtoNmcxbO-0YA7Ch4dg), 提取码：627x。此外，真实数据集icdar2013, icdar2015, cocotext, IIIT5也作为训练数据的一部分。具体数据细节可以参考论文SAR。
 如果你使用的是icdar2015的公开数据集，PaddleOCR 提供了一份用于训练 ICDAR2015 数据集的标签文件，通过以下方式下载：
 ```
 # 训练集标签
 wget -P ./train_data/ic15_data  https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt
@@ -230,6 +233,7 @@ PaddleOCR支持训练和评估交替进行, 可以在 `configs/rec/rec_icdar15_t
 | rec_r34_vd_tps_bilstm_att.yml |  CRNN |   Resnet34_vd |  TPS   |  BiLSTM |  att  |
 | rec_r50fpn_vd_none_srn.yml    | SRN | Resnet50_fpn_vd    | None    | rnn | srn |
 | rec_mtb_nrtr.yml    | NRTR | nrtr_mtb    | None    | transformer encoder | transformer decoder |
+| rec_r31_sar.yml               | SAR | ResNet31 | None | LSTM encoder | LSTM decoder |
 训练中文数据，推荐使用[rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml)，如您希望尝试其他算法在中文数据集上的效果，请参考下列说明修改配置文件：

--- a/doc/doc_en/algorithm_overview_en.md
+++ b/doc/doc_en/algorithm_overview_en.md
@@ -47,6 +47,7 @@ PaddleOCR open-source text recognition algorithms list:
 - [x]  RARE([paper](https://arxiv.org/abs/1603.03915v1))[12]
 - [x]  SRN([paper](https://arxiv.org/abs/2003.12294))[5]
 - [x]  NRTR([paper](https://arxiv.org/abs/1806.00926v2))
+- [x]  SAR([paper](https://arxiv.org/abs/1811.00751v2))
 Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation result of these above text recognition (using MJSynth and SynthText for training, evaluate on IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE) is as follow:
@@ -62,5 +63,6 @@ Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation r
 |RARE|Resnet34_vd|83.6%|rec_r34_vd_tps_bilstm_att |[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_tps_bilstm_att_v2.0_train.tar)|
 |SRN|Resnet50_vd_fpn| 88.52% | rec_r50fpn_vd_none_srn |[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r50_vd_srn_train.tar)|
 |NRTR|NRTR_MTB| 84.3% | rec_mtb_nrtr | [Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mtb_nrtr_train.tar) |
+|SAR|Resnet31| 87.2% | rec_r31_sar | [Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.1/rec/rec_r31_sar_train.tar) |
 Please refer to the document for training guide and use of PaddleOCR text recognition algorithms [Text recognition model training/evaluation/prediction](./recognition_en.md)
--- a/doc/doc_en/recognition_en.md
+++ b/doc/doc_en/recognition_en.md
@@ -92,6 +92,8 @@ Similar to the training set, the test set also needs to be provided a folder con
 If you do not have a dataset locally, you can download it on the official website [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads).
 Also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here) ，download the lmdb format dataset required for benchmark
+If you want to reproduce the paper SAR, you need to download extra dataset [SynthAdd](https://pan.baidu.com/share/init?surl=uV0LtoNmcxbO-0YA7Ch4dg), extraction code: 627x. Besides, icdar2013, icdar2015, cocotext, IIIT5k datasets are also used to train. For specific details, please refer to the paper SAR.
 PaddleOCR provides label files for training the icdar2015 dataset, which can be downloaded in the following ways:
 ```
@@ -236,6 +238,8 @@ If the evaluation set is large, the test will be time-consuming. It is recommend
 | rec_r34_vd_tps_bilstm_att.yml |  CRNN |   Resnet34_vd |  TPS   |  BiLSTM |  att  |
 | rec_r50fpn_vd_none_srn.yml    | SRN | Resnet50_fpn_vd    | None    | rnn | srn |
 | rec_mtb_nrtr.yml    | NRTR | nrtr_mtb    | None    | transformer encoder | transformer decoder |
+| rec_r31_sar.yml               | SAR | ResNet31 | None | LSTM encoder | LSTM decoder |
 For training Chinese data, it is recommended to use
 [rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml). If you want to try the result of other algorithms on the Chinese data set, please refer to the following instructions to modify the configuration file:

--- a/ppocr/data/imaug/__init__.py
+++ b/ppocr/data/imaug/__init__.py
@@ -21,7 +21,7 @@ from .make_border_map import MakeBorderMap
 from .make_shrink_map import MakeShrinkMap
 from .random_crop_data import EastRandomCropData, PSERandomCrop
-from .rec_img_aug import RecAug, RecResizeImg, ClsResizeImg, SRNRecResizeImg, NRTRRecResizeImg
+from .rec_img_aug import RecAug, RecResizeImg, ClsResizeImg, SRNRecResizeImg, NRTRRecResizeImg, SARRecResizeImg
 from .randaugment import RandAugment
 from .copy_paste import CopyPaste
 from .operators import *

--- a/ppocr/data/imaug/label_ops.py
+++ b/ppocr/data/imaug/label_ops.py
@@ -549,3 +549,49 @@ class TableLabelEncode(object):
            assert False, "Unsupport type %s in char_or_elem" \
                              % char_or_elem
        return idx
+class SARLabelEncode(BaseRecLabelEncode):
+    """ Convert between text-label and text-index """
+    def __init__(self,
+                 max_text_length,
+                 character_dict_path=None,
+                 character_type='ch',
+                 use_space_char=False,
+                 **kwargs):
+        super(SARLabelEncode,
+              self).__init__(max_text_length, character_dict_path,
+                             character_type, use_space_char)
+    def add_special_char(self, dict_character):
+        beg_end_str = "<BOS/EOS>"
+        unknown_str = "<UKN>"
+        padding_str = "<PAD>"
+        dict_character = dict_character + [unknown_str]
+        self.unknown_idx = len(dict_character) - 1
+        dict_character = dict_character + [beg_end_str]
+        self.start_idx = len(dict_character) - 1
+        self.end_idx = len(dict_character) - 1
+        dict_character = dict_character + [padding_str]
+        self.padding_idx = len(dict_character) - 1
+        return dict_character
+    def __call__(self, data):
+        text = data['label']
+        text = self.encode(text)
+        if text is None:
+            return None
+        if len(text) >= self.max_text_len - 1:
+            return None
+        data['length'] = np.array(len(text))
+        target = [self.start_idx] + text + [self.end_idx]
+        padded_text = [self.padding_idx for _ in range(self.max_text_len)]
+        padded_text[:len(target)] = target
+        data['label'] = np.array(padded_text)
+        return data
+    def get_ignored_tokens(self):
+        return [self.padding_idx]
--- a/ppocr/data/imaug/rec_img_aug.py
+++ b/ppocr/data/imaug/rec_img_aug.py
@@ -102,6 +102,56 @@ class SRNRecResizeImg(object):
        return data
+class SARRecResizeImg(object):
+    def __init__(self, image_shape, width_downsample_ratio=0.25, **kwargs):
+        self.image_shape = image_shape
+        self.width_downsample_ratio = width_downsample_ratio
+    def __call__(self, data):
+        img = data['image']
+        norm_img, resize_shape, pad_shape, valid_ratio = resize_norm_img_sar(img, self.image_shape, self.width_downsample_ratio)
+        data['image'] = norm_img
+        data['resized_shape'] = resize_shape
+        data['pad_shape'] = pad_shape
+        data['valid_ratio'] = valid_ratio
+        return data
+def resize_norm_img_sar(img, image_shape, width_downsample_ratio=0.25):
+    imgC, imgH, imgW_min, imgW_max = image_shape
+    h = img.shape[0]
+    w = img.shape[1]
+    valid_ratio = 1.0
+    # make sure new_width is an integral multiple of width_divisor.
+    width_divisor = int(1 / width_downsample_ratio)
+    # resize
+    ratio = w / float(h)
+    resize_w = math.ceil(imgH * ratio)
+    if resize_w % width_divisor != 0:
+        resize_w = round(resize_w / width_divisor) * width_divisor
+    if imgW_min is not None:
+        resize_w = max(imgW_min, resize_w)
+    if imgW_max is not None:
+        valid_ratio = min(1.0, 1.0 * resize_w / imgW_max)
+        resize_w = min(imgW_max, resize_w)
+    resized_image = cv2.resize(img, (resize_w, imgH))
+    resized_image = resized_image.astype('float32')
+    # norm 
+    if image_shape[0] == 1:
+        resized_image = resized_image / 255
+        resized_image = resized_image[np.newaxis, :]
+    else:
+        resized_image = resized_image.transpose((2, 0, 1)) / 255
+    resized_image -= 0.5
+    resized_image /= 0.5
+    resize_shape = resized_image.shape
+    padding_im = -1.0 * np.ones((imgC, imgH, imgW_max), dtype=np.float32)
+    padding_im[:, :, 0:resize_w] = resized_image
+    pad_shape = padding_im.shape
+    return padding_im, resize_shape, pad_shape, valid_ratio
 def resize_norm_img(img, image_shape):
    imgC, imgH, imgW = image_shape
    h = img.shape[0]

--- a/ppocr/losses/__init__.py
+++ b/ppocr/losses/__init__.py
@@ -26,6 +26,7 @@ from .rec_ctc_loss import CTCLoss
 from .rec_att_loss import AttentionLoss
 from .rec_srn_loss import SRNLoss
 from .rec_nrtr_loss import NRTRLoss
+from .rec_sar_loss import SARLoss
 # cls loss
 from .cls_loss import ClsLoss
@@ -44,7 +45,7 @@ from .table_att_loss import TableAttentionLoss
 def build_loss(config):
    support_dict = [
        'DBLoss', 'EASTLoss', 'SASTLoss', 'CTCLoss', 'ClsLoss', 'AttentionLoss',
-        'SRNLoss', 'PGLoss', 'CombinedLoss', 'NRTRLoss', 'TableAttentionLoss'
+        'SRNLoss', 'PGLoss', 'CombinedLoss', 'NRTRLoss', 'TableAttentionLoss', 'SARLoss'
    ]
    config = copy.deepcopy(config)

--- a/ppocr/losses/rec_sar_loss.py
+++ b/ppocr/losses/rec_sar_loss.py
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+import paddle
+from paddle import nn
+class SARLoss(nn.Layer):
+    def __init__(self, **kwargs):
+        super(SARLoss, self).__init__()
+        self.loss_func = paddle.nn.loss.CrossEntropyLoss(reduction="mean", ignore_index=96)
+    def forward(self, predicts, batch):
+        predict = predicts[:, :-1, :] # ignore last index of outputs to be in same seq_len with targets
+        label = batch[1].astype("int64")[:, 1:] # ignore first index of target in loss calculation
+        batch_size, num_steps, num_classes = predict.shape[0], predict.shape[
+            1], predict.shape[2]
+        assert len(label.shape) == len(list(predict.shape)) - 1, \
+            "The target's shape and inputs's shape is [N, d] and [N, num_steps]"
+        inputs = paddle.reshape(predict, [-1, num_classes])
+        targets = paddle.reshape(label, [-1])
+        loss = self.loss_func(inputs, targets)
+        return {'loss': loss}
--- a/ppocr/modeling/backbones/__init__.py
+++ b/ppocr/modeling/backbones/__init__.py
@@ -27,8 +27,9 @@ def build_backbone(config, model_type):
        from .rec_resnet_fpn import ResNetFPN
        from .rec_mv1_enhance import MobileNetV1Enhance
        from .rec_nrtr_mtb import MTB
+        from .rec_resnet_31 import ResNet31
        support_dict = [
-            'MobileNetV1Enhance', 'MobileNetV3', 'ResNet', 'ResNetFPN', 'MTB'
+            'MobileNetV1Enhance', 'MobileNetV3', 'ResNet', 'ResNetFPN', 'MTB', "ResNet31"
        ]
    elif model_type == "e2e":
        from .e2e_resnet_vd_pg import ResNet

--- a/ppocr/modeling/backbones/rec_resnet_31.py
+++ b/ppocr/modeling/backbones/rec_resnet_31.py
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+import paddle
+from paddle import ParamAttr
+import paddle.nn as nn
+import paddle.nn.functional as F
+import numpy as np
+__all__ = ["ResNet31"]
+def conv3x3(in_channel, out_channel, stride=1):
+    return nn.Conv2D(
+        in_channel,
+        out_channel,
+        kernel_size=3,
+        stride=stride,
+        padding=1,
+        bias_attr=False
+    )
+class BasicBlock(nn.Layer):
+    expansion = 1
+    def __init__(self, in_channels, channels, stride=1, downsample=False):
+        super().__init__()
+        self.conv1 = conv3x3(in_channels, channels, stride)
+        self.bn1 = nn.BatchNorm2D(channels)
+        self.relu = nn.ReLU()
+        self.conv2 = conv3x3(channels, channels)
+        self.bn2 = nn.BatchNorm2D(channels)
+        self.downsample = downsample
+        if downsample:
+            self.downsample = nn.Sequential(
+                nn.Conv2D(in_channels, channels * self.expansion, 1, stride, bias_attr=False),
+                nn.BatchNorm2D(channels * self.expansion),
+            )
+        else:
+            self.downsample = nn.Sequential()
+        self.stride = stride
+    def forward(self, x):
+        residual = x
+        out = self.conv1(x)
+        out = self.bn1(out)
+        out = self.relu(out)
+        out = self.conv2(out)
+        out = self.bn2(out)
+        if self.downsample:
+            residual = self.downsample(x)
+        out += residual
+        out = self.relu(out)
+        return out        
+class ResNet31(nn.Layer):
+    '''
+    Args:
+        in_channels (int): Number of channels of input image tensor.
+        layers (list[int]): List of BasicBlock number for each stage.
+        channels (list[int]): List of out_channels of Conv2d layer.
+        out_indices (None | Sequence[int]): Indices of output stages.
+        last_stage_pool (bool): If True, add `MaxPool2d` layer to last stage.
+    '''
+    def __init__(self, 
+                in_channels=3, 
+                layers=[1, 2, 5, 3],
+                channels=[64, 128, 256, 256, 512, 512, 512],
+                out_indices=None,
+                last_stage_pool=False):
+        super(ResNet31, self).__init__()
+        assert isinstance(in_channels, int)
+        assert isinstance(last_stage_pool, bool)
+        self.out_indices = out_indices
+        self.last_stage_pool = last_stage_pool
+        # conv 1 (Conv Conv)
+        self.conv1_1 = nn.Conv2D(in_channels, channels[0], kernel_size=3, stride=1, padding=1)
+        self.bn1_1 = nn.BatchNorm2D(channels[0])
+        self.relu1_1 = nn.ReLU()
+        self.conv1_2 = nn.Conv2D(channels[0], channels[1], kernel_size=3, stride=1, padding=1)
+        self.bn1_2 = nn.BatchNorm2D(channels[1])
+        self.relu1_2 = nn.ReLU()
+        # conv 2 (Max-pooling, Residual block, Conv)
+        self.pool2 = nn.MaxPool2D(kernel_size=2, stride=2, padding=0, ceil_mode=True)
+        self.block2 = self._make_layer(channels[1], channels[2], layers[0])
+        self.conv2 = nn.Conv2D(channels[2], channels[2], kernel_size=3, stride=1, padding=1)
+        self.bn2 = nn.BatchNorm2D(channels[2])
+        self.relu2 = nn.ReLU()
+        # conv 3 (Max-pooling, Residual block, Conv)
+        self.pool3 = nn.MaxPool2D(kernel_size=2, stride=2, padding=0, ceil_mode=True)
+        self.block3 = self._make_layer(channels[2], channels[3], layers[1])
+        self.conv3 = nn.Conv2D(channels[3], channels[3], kernel_size=3, stride=1, padding=1)
+        self.bn3 = nn.BatchNorm2D(channels[3])
+        self.relu3 = nn.ReLU()
+        # conv 4 (Max-pooling, Residual block, Conv)
+        self.pool4 = nn.MaxPool2D(kernel_size=(2, 1), stride=(2, 1), padding=0, ceil_mode=True)
+        self.block4 = self._make_layer(channels[3], channels[4], layers[2])
+        self.conv4 = nn.Conv2D(channels[4], channels[4], kernel_size=3, stride=1, padding=1)
+        self.bn4 = nn.BatchNorm2D(channels[4])
+        self.relu4 = nn.ReLU()
+        # conv 5 ((Max-pooling), Residual block, Conv)
+        self.pool5 = None
+        if self.last_stage_pool:
+            self.pool5 = nn.MaxPool2D(kernel_size=2, stride=2, padding=0, ceil_mode=True)
+        self.block5 = self._make_layer(channels[4], channels[5], layers[3])
+        self.conv5 = nn.Conv2D(channels[5], channels[5], kernel_size=3, stride=1, padding=1)
+        self.bn5 = nn.BatchNorm2D(channels[5])
+        self.relu5 = nn.ReLU()
+        self.out_channels = channels[-1]
+    def _make_layer(self, input_channels, output_channels, blocks):
+        layers = []
+        for _ in range(blocks):
+            downsample = None
+            if input_channels != output_channels:
+                downsample = nn.Sequential(
+                    nn.Conv2D(
+                        input_channels, 
+                        output_channels, 
+                        kernel_size=1, 
+                        stride=1, 
+                        bias_attr=False),
+                    nn.BatchNorm2D(output_channels),
+                )
+            layers.append(BasicBlock(input_channels, output_channels, downsample=downsample))
+            input_channels = output_channels
+        return nn.Sequential(*layers)
+    def forward(self, x):
+        x = self.conv1_1(x)
+        x = self.bn1_1(x)
+        x = self.relu1_1(x)
+        x = self.conv1_2(x)
+        x = self.bn1_2(x)
+        x = self.relu1_2(x)
+        outs = []
+        for i in range(4):
+            layer_index = i + 2
+            pool_layer = getattr(self, f'pool{layer_index}')
+            block_layer = getattr(self, f'block{layer_index}')
+            conv_layer = getattr(self, f'conv{layer_index}')
+            bn_layer = getattr(self, f'bn{layer_index}')
+            relu_layer = getattr(self, f'relu{layer_index}')
+            if pool_layer is not None:
+                x = pool_layer(x)
+            x = block_layer(x)
+            x = conv_layer(x)
+            x = bn_layer(x)
+            x= relu_layer(x)
+            outs.append(x)
+        if self.out_indices is not None:
+            return tuple([outs[i] for i in self.out_indices])
+        return x
--- a/ppocr/modeling/heads/__init__.py
+++ b/ppocr/modeling/heads/__init__.py
@@ -27,12 +27,13 @@ def build_head(config):
    from .rec_att_head import AttentionHead
    from .rec_srn_head import SRNHead
    from .rec_nrtr_head import Transformer
+    from .rec_sar_head import SARHead
    # cls head
    from .cls_head import ClsHead
    support_dict = [
        'DBHead', 'EASTHead', 'SASTHead', 'CTCHead', 'ClsHead', 'AttentionHead',
-        'SRNHead', 'PGHead', 'Transformer', 'TableAttentionHead'
+        'SRNHead', 'PGHead', 'Transformer', 'TableAttentionHead', 'SARHead'
    ]
    #table head

--- a/ppocr/modeling/heads/rec_sar_head.py
+++ b/ppocr/modeling/heads/rec_sar_head.py
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+import math
+import paddle
+from paddle import ParamAttr
+import paddle.nn as nn
+import paddle.nn.functional as F
+class SAREncoder(nn.Layer):
+    """
+    Args:
+        enc_bi_rnn (bool): If True, use bidirectional RNN in encoder.
+        enc_drop_rnn (float): Dropout probability of RNN layer in encoder.
+        enc_gru (bool): If True, use GRU, else LSTM in encoder.
+        d_model (int): Dim of channels from backbone.
+        d_enc (int): Dim of encoder RNN layer.
+        mask (bool): If True, mask padding in RNN sequence.
+    """
+    def __init__(self,
+                 enc_bi_rnn=False,
+                 enc_drop_rnn=0.1,
+                 enc_gru=False,
+                 d_model=512,
+                 d_enc=512,
+                 mask=True,
+                 **kwargs):
+        super().__init__()
+        assert isinstance(enc_bi_rnn, bool)
+        assert isinstance(enc_drop_rnn, (int, float))
+        assert 0 <= enc_drop_rnn < 1.0
+        assert isinstance(enc_gru, bool)
+        assert isinstance(d_model, int)
+        assert isinstance(d_enc, int)
+        assert isinstance(mask, bool)
+        self.enc_bi_rnn = enc_bi_rnn
+        self.enc_drop_rnn = enc_drop_rnn
+        self.mask = mask
+        # LSTM Encoder
+        if enc_bi_rnn:
+            direction = 'bidirectional'
+        else:
+            direction = 'forward'
+        kwargs = dict(
+            input_size=d_model,
+            hidden_size=d_enc,
+            num_layers=2,
+            time_major=False,
+            dropout=enc_drop_rnn,
+            direction=direction)
+        if enc_gru:
+            self.rnn_encoder = nn.GRU(**kwargs)
+        else:
+            self.rnn_encoder = nn.LSTM(**kwargs)
+        # global feature transformation
+        encoder_rnn_out_size = d_enc * (int(enc_bi_rnn) + 1)
+        self.linear = nn.Linear(encoder_rnn_out_size, encoder_rnn_out_size)
+    def forward(self, feat, img_metas=None):
+        if img_metas is not None:
+            assert len(img_metas[0]) == feat.shape[0]
+        valid_ratios = None
+        if img_metas is not None and self.mask:
+            valid_ratios = img_metas[-1]
+        h_feat = feat.shape[2]  # bsz c h w
+        feat_v = F.max_pool2d(
+            feat, kernel_size=(h_feat, 1), stride=1, padding=0)
+        feat_v = feat_v.squeeze(2)  # bsz * C * W
+        feat_v = paddle.transpose(feat_v, perm=[0, 2, 1])  # bsz * W * C
+        holistic_feat = self.rnn_encoder(feat_v)[0]  # bsz * T * C
+        if valid_ratios is not None:
+            valid_hf = []
+            T = holistic_feat.shape[1]
+            for i, valid_ratio in enumerate(valid_ratios):
+                valid_step = min(T, math.ceil(T * valid_ratio)) - 1
+                valid_hf.append(holistic_feat[i, valid_step, :])
+            valid_hf = paddle.stack(valid_hf, axis=0)
+        else:
+            valid_hf = holistic_feat[:, -1, :]  # bsz * C
+        holistic_feat = self.linear(valid_hf)  # bsz * C
+        return holistic_feat
+class BaseDecoder(nn.Layer):
+    def __init__(self, **kwargs):
+        super().__init__()
+    def forward_train(self, feat, out_enc, targets, img_metas):
+        raise NotImplementedError
+    def forward_test(self, feat, out_enc, img_metas):
+        raise NotImplementedError
+    def forward(self,
+                feat,
+                out_enc,
+                label=None,
+                img_metas=None,
+                train_mode=True):
+        self.train_mode = train_mode
+        if train_mode:
+            return self.forward_train(feat, out_enc, label, img_metas)
+        return self.forward_test(feat, out_enc, img_metas)
+class ParallelSARDecoder(BaseDecoder):
+    """
+    Args:
+        out_channels (int): Output class number.
+        enc_bi_rnn (bool): If True, use bidirectional RNN in encoder.
+        dec_bi_rnn (bool): If True, use bidirectional RNN in decoder.
+        dec_drop_rnn (float): Dropout of RNN layer in decoder.
+        dec_gru (bool): If True, use GRU, else LSTM in decoder.
+        d_model (int): Dim of channels from backbone.
+        d_enc (int): Dim of encoder RNN layer.
+        d_k (int): Dim of channels of attention module.
+        pred_dropout (float): Dropout probability of prediction layer.
+        max_seq_len (int): Maximum sequence length for decoding.
+        mask (bool): If True, mask padding in feature map.
+        start_idx (int): Index of start token.
+        padding_idx (int): Index of padding token.
+        pred_concat (bool): If True, concat glimpse feature from
+            attention with holistic feature and hidden state.
+    """
+    def __init__(
+            self,
+            out_channels,  # 90 + unknown + start + padding
+            enc_bi_rnn=False,
+            dec_bi_rnn=False,
+            dec_drop_rnn=0.0,
+            dec_gru=False,
+            d_model=512,
+            d_enc=512,
+            d_k=64,
+            pred_dropout=0.1,
+            max_text_length=30,
+            mask=True,
+            pred_concat=True,
+            **kwargs):
+        super().__init__()
+        self.num_classes = out_channels
+        self.enc_bi_rnn = enc_bi_rnn
+        self.d_k = d_k
+        self.start_idx = out_channels - 2
+        self.padding_idx = out_channels - 1
+        self.max_seq_len = max_text_length
+        self.mask = mask
+        self.pred_concat = pred_concat
+        encoder_rnn_out_size = d_enc * (int(enc_bi_rnn) + 1)
+        decoder_rnn_out_size = encoder_rnn_out_size * (int(dec_bi_rnn) + 1)
+        # 2D attention layer
+        self.conv1x1_1 = nn.Linear(decoder_rnn_out_size, d_k)
+        self.conv3x3_1 = nn.Conv2D(
+            d_model, d_k, kernel_size=3, stride=1, padding=1)
+        self.conv1x1_2 = nn.Linear(d_k, 1)
+        # Decoder RNN layer
+        if dec_bi_rnn:
+            direction = 'bidirectional'
+        else:
+            direction = 'forward'
+        kwargs = dict(
+            input_size=encoder_rnn_out_size,
+            hidden_size=encoder_rnn_out_size,
+            num_layers=2,
+            time_major=False,
+            dropout=dec_drop_rnn,
+            direction=direction)
+        if dec_gru:
+            self.rnn_decoder = nn.GRU(**kwargs)
+        else:
+            self.rnn_decoder = nn.LSTM(**kwargs)
+        # Decoder input embedding
+        self.embedding = nn.Embedding(
+            self.num_classes,
+            encoder_rnn_out_size,
+            padding_idx=self.padding_idx)
+        # Prediction layer
+        self.pred_dropout = nn.Dropout(pred_dropout)
+        pred_num_classes = self.num_classes - 1
+        if pred_concat:
+            fc_in_channel = decoder_rnn_out_size + d_model + d_enc
+        else:
+            fc_in_channel = d_model
+        self.prediction = nn.Linear(fc_in_channel, pred_num_classes)
+    def _2d_attention(self,
+                      decoder_input,
+                      feat,
+                      holistic_feat,
+                      valid_ratios=None):
+        y = self.rnn_decoder(decoder_input)[0]
+        # y: bsz * (seq_len + 1) * hidden_size
+        attn_query = self.conv1x1_1(y)  # bsz * (seq_len + 1) * attn_size
+        bsz, seq_len, attn_size = attn_query.shape
+        attn_query = paddle.unsqueeze(attn_query, axis=[3, 4])
+        # (bsz, seq_len + 1, attn_size, 1, 1)
+        attn_key = self.conv3x3_1(feat)
+        # bsz * attn_size * h * w
+        attn_key = attn_key.unsqueeze(1)
+        # bsz * 1 * attn_size * h * w
+        attn_weight = paddle.tanh(paddle.add(attn_key, attn_query))
+        # bsz * (seq_len + 1) * attn_size * h * w
+        attn_weight = paddle.transpose(attn_weight, perm=[0, 1, 3, 4, 2])
+        # bsz * (seq_len + 1) * h * w * attn_size
+        attn_weight = self.conv1x1_2(attn_weight)
+        # bsz * (seq_len + 1) * h * w * 1
+        bsz, T, h, w, c = attn_weight.shape
+        assert c == 1
+        if valid_ratios is not None:
+            # cal mask of attention weight
+            for i, valid_ratio in enumerate(valid_ratios):
+                valid_width = min(w, math.ceil(w * valid_ratio))
+                attn_weight[i, :, :, valid_width:, :] = float('-inf')
+        attn_weight = paddle.reshape(attn_weight, [bsz, T, -1])
+        attn_weight = F.softmax(attn_weight, axis=-1)
+        attn_weight = paddle.reshape(attn_weight, [bsz, T, h, w, c])
+        attn_weight = paddle.transpose(attn_weight, perm=[0, 1, 4, 2, 3])
+        # attn_weight: bsz * T * c * h * w
+        # feat: bsz * c * h * w
+        attn_feat = paddle.sum(paddle.multiply(feat.unsqueeze(1), attn_weight),
+                               (3, 4),
+                               keepdim=False)
+        # bsz * (seq_len + 1) * C
+        # Linear transformation
+        if self.pred_concat:
+            hf_c = holistic_feat.shape[-1]
+            holistic_feat = paddle.expand(
+                holistic_feat, shape=[bsz, seq_len, hf_c])
+            y = self.prediction(paddle.concat((y, attn_feat, holistic_feat), 2))
+        else:
+            y = self.prediction(attn_feat)
+        # bsz * (seq_len + 1) * num_classes
+        if self.train_mode:
+            y = self.pred_dropout(y)
+        return y
+    def forward_train(self, feat, out_enc, label, img_metas):
+        '''
+        img_metas: [label, valid_ratio]
+        '''
+        if img_metas is not None:
+            assert len(img_metas[0]) == feat.shape[0]
+        valid_ratios = None
+        if img_metas is not None and self.mask:
+            valid_ratios = img_metas[-1]
+        label = label.cuda()
+        lab_embedding = self.embedding(label)
+        # bsz * seq_len * emb_dim
+        out_enc = out_enc.unsqueeze(1)
+        # bsz * 1 * emb_dim
+        in_dec = paddle.concat((out_enc, lab_embedding), axis=1)
+        # bsz * (seq_len + 1) * C
+        out_dec = self._2d_attention(
+            in_dec, feat, out_enc, valid_ratios=valid_ratios)
+        # bsz * (seq_len + 1) * num_classes
+        return out_dec[:, 1:, :]  # bsz * seq_len * num_classes
+    def forward_test(self, feat, out_enc, img_metas):
+        if img_metas is not None:
+            assert len(img_metas[0]) == feat.shape[0]
+        valid_ratios = None
+        if img_metas is not None and self.mask:
+            valid_ratios = img_metas[-1]
+        seq_len = self.max_seq_len
+        bsz = feat.shape[0]
+        start_token = paddle.full(
+            (bsz, ), fill_value=self.start_idx, dtype='int64')
+        # bsz
+        start_token = self.embedding(start_token)
+        # bsz * emb_dim
+        emb_dim = start_token.shape[1]
+        start_token = start_token.unsqueeze(1)
+        start_token = paddle.expand(start_token, shape=[bsz, seq_len, emb_dim])
+        # bsz * seq_len * emb_dim
+        out_enc = out_enc.unsqueeze(1)
+        # bsz * 1 * emb_dim
+        decoder_input = paddle.concat((out_enc, start_token), axis=1)
+        # bsz * (seq_len + 1) * emb_dim
+        outputs = []
+        for i in range(1, seq_len + 1):
+            decoder_output = self._2d_attention(
+                decoder_input, feat, out_enc, valid_ratios=valid_ratios)
+            char_output = decoder_output[:, i, :]  # bsz * num_classes
+            char_output = F.softmax(char_output, -1)
+            outputs.append(char_output)
+            max_idx = paddle.argmax(char_output, axis=1, keepdim=False)
+            char_embedding = self.embedding(max_idx)  # bsz * emb_dim
+            if i < seq_len:
+                decoder_input[:, i + 1, :] = char_embedding
+        outputs = paddle.stack(outputs, 1)  # bsz * seq_len * num_classes
+        return outputs
+class SARHead(nn.Layer):
+    def __init__(self,
+                 out_channels,
+                 enc_bi_rnn=False,
+                 enc_drop_rnn=0.1,
+                 enc_gru=False,
+                 dec_bi_rnn=False,
+                 dec_drop_rnn=0.0,
+                 dec_gru=False,
+                 d_k=512,
+                 pred_dropout=0.1,
+                 max_text_length=30,
+                 pred_concat=True,
+                 **kwargs):
+        super(SARHead, self).__init__()
+        # encoder module
+        self.encoder = SAREncoder(
+            enc_bi_rnn=enc_bi_rnn, enc_drop_rnn=enc_drop_rnn, enc_gru=enc_gru)
+        # decoder module
+        self.decoder = ParallelSARDecoder(
+            out_channels=out_channels,
+            enc_bi_rnn=enc_bi_rnn,
+            dec_bi_rnn=dec_bi_rnn,
+            dec_drop_rnn=dec_drop_rnn,
+            dec_gru=dec_gru,
+            d_k=d_k,
+            pred_dropout=pred_dropout,
+            max_text_length=max_text_length,
+            pred_concat=pred_concat)
+    def forward(self, feat, targets=None):
+        '''
+        img_metas: [label, valid_ratio]
+        '''
+        holistic_feat = self.encoder(feat, targets)  # bsz c
+        if self.training:
+            label = targets[0]  # label
+            label = paddle.to_tensor(label, dtype='int64')
+            final_out = self.decoder(
+                feat, holistic_feat, label, img_metas=targets)
+        if not self.training:
+            final_out = self.decoder(
+                feat,
+                holistic_feat,
+                label=None,
+                img_metas=targets,
+                train_mode=False)
+            # (bsz, seq_len, num_classes)
+        return final_out
--- a/ppocr/postprocess/__init__.py
+++ b/ppocr/postprocess/__init__.py
@@ -25,7 +25,7 @@ from .db_postprocess import DBPostProcess, DistillationDBPostProcess
 from .east_postprocess import EASTPostProcess
 from .sast_postprocess import SASTPostProcess
 from .rec_postprocess import CTCLabelDecode, AttnLabelDecode, SRNLabelDecode, DistillationCTCLabelDecode, NRTRLabelDecode, \
-    TableLabelDecode
+    TableLabelDecode, SARLabelDecode
 from .cls_postprocess import ClsPostProcess
 from .pg_postprocess import PGPostProcess
@@ -33,7 +33,8 @@ def build_post_process(config, global_config=None):
    support_dict = [
        'DBPostProcess', 'EASTPostProcess', 'SASTPostProcess', 'CTCLabelDecode',
        'AttnLabelDecode', 'ClsPostProcess', 'SRNLabelDecode', 'PGPostProcess',
-        'DistillationCTCLabelDecode', 'NRTRLabelDecode', 'TableLabelDecode', 'DistillationDBPostProcess'
+        'DistillationCTCLabelDecode', 'TableLabelDecode',
+        'DistillationDBPostProcess', 'NRTRLabelDecode', 'SARLabelDecode'
    ]
    config = copy.deepcopy(config)

--- a/ppocr/postprocess/rec_postprocess.py
+++ b/ppocr/postprocess/rec_postprocess.py
@@ -15,6 +15,7 @@ import numpy as np
 import string
 import paddle
 from paddle.nn import functional as F
+import re
 class BaseRecLabelDecode(object):
@@ -171,15 +172,15 @@ class NRTRLabelDecode(BaseRecLabelDecode):
        if preds.dtype == paddle.int64:
            if isinstance(preds, paddle.Tensor):
                preds = preds.numpy()
-            if preds[0][0]==2:
+            if preds[0][0] == 2:
-                preds_idx = preds[:,1:]
+                preds_idx = preds[:, 1:]
            else:
                preds_idx = preds
            text = self.decode(preds_idx)
            if label is None:
                return text
-            label = self.decode(label[:,1:])
+            label = self.decode(label[:, 1:])
        else:
            if isinstance(preds, paddle.Tensor):
                preds = preds.numpy()
@@ -188,11 +189,11 @@ class NRTRLabelDecode(BaseRecLabelDecode):
            text = self.decode(preds_idx, preds_prob, is_remove_duplicate=False)
            if label is None:
                return text
-            label = self.decode(label[:,1:])
+            label = self.decode(label[:, 1:])
        return text, label
    def add_special_char(self, dict_character):
-        dict_character = ['blank','<unk>','<s>','</s>'] + dict_character
+        dict_character = ['blank', '<unk>', '<s>', '</s>'] + dict_character
        return dict_character
    def decode(self, text_index, text_prob=None, is_remove_duplicate=False):
@@ -206,7 +207,8 @@ class NRTRLabelDecode(BaseRecLabelDecode):
                if text_index[batch_idx][idx] == 3:  # end
                    break
                try:
-                    char_list.append(self.character[int(text_index[batch_idx][idx])])
+                    char_list.append(self.character[int(text_index[batch_idx][
+                        idx])])
                except:
                    continue
                if text_prob is not None:
@@ -218,7 +220,6 @@ class NRTRLabelDecode(BaseRecLabelDecode):
        return result_list
 class AttnLabelDecode(BaseRecLabelDecode):
    """ Convert between text-label and text-index """
@@ -256,7 +257,8 @@ class AttnLabelDecode(BaseRecLabelDecode):
                    if idx > 0 and text_index[batch_idx][idx - 1] == text_index[
                            batch_idx][idx]:
                        continue
-                char_list.append(self.character[int(text_index[batch_idx][idx])])
+                char_list.append(self.character[int(text_index[batch_idx][
+                    idx])])
                if text_prob is not None:
                    conf_list.append(text_prob[batch_idx][idx])
                else:
@@ -386,10 +388,9 @@ class SRNLabelDecode(BaseRecLabelDecode):
 class TableLabelDecode(object):
    """  """
-    def __init__(self,
+    def __init__(self, character_dict_path, **kwargs):
-                 character_dict_path,
+        list_character, list_elem = self.load_char_elem_dict(
-                 **kwargs):
+            character_dict_path)
-        list_character, list_elem = self.load_char_elem_dict(character_dict_path)
        list_character = self.add_special_char(list_character)
        list_elem = self.add_special_char(list_elem)
        self.dict_character = {}
@@ -408,7 +409,8 @@ class TableLabelDecode(object):
        list_elem = []
        with open(character_dict_path, "rb") as fin:
            lines = fin.readlines()
-            substr = lines[0].decode('utf-8').strip("\n").strip("\r\n").split("\t")
+            substr = lines[0].decode('utf-8').strip("\n").strip("\r\n").split(
+                "\t")
            character_num = int(substr[0])
            elem_num = int(substr[1])
            for cno in range(1, 1 + character_num):
@@ -428,14 +430,14 @@ class TableLabelDecode(object):
    def __call__(self, preds):
        structure_probs = preds['structure_probs']
        loc_preds = preds['loc_preds']
-        if isinstance(structure_probs,paddle.Tensor):
+        if isinstance(structure_probs, paddle.Tensor):
            structure_probs = structure_probs.numpy()
-        if isinstance(loc_preds,paddle.Tensor):
+        if isinstance(loc_preds, paddle.Tensor):
            loc_preds = loc_preds.numpy()
        structure_idx = structure_probs.argmax(axis=2)
        structure_probs = structure_probs.max(axis=2)
-        structure_str, structure_pos, result_score_list, result_elem_idx_list = self.decode(structure_idx,
+        structure_str, structure_pos, result_score_list, result_elem_idx_list = self.decode(
-                                                                                            structure_probs, 'elem')
+            structure_idx, structure_probs, 'elem')
        res_html_code_list = []
        res_loc_list = []
        batch_num = len(structure_str)
@@ -450,8 +452,13 @@ class TableLabelDecode(object):
            res_loc = np.array(res_loc)
            res_html_code_list.append(res_html_code)
            res_loc_list.append(res_loc)
-        return {'res_html_code': res_html_code_list, 'res_loc': res_loc_list, 'res_score_list': result_score_list,
+        return {
-                'res_elem_idx_list': result_elem_idx_list,'structure_str_list':structure_str}
+            'res_html_code': res_html_code_list,
+            'res_loc': res_loc_list,
+            'res_score_list': result_score_list,
+            'res_elem_idx_list': result_elem_idx_list,
+            'structure_str_list': structure_str
+        }
    def decode(self, text_index, structure_probs, char_or_elem):
        """convert text-label into text-index.
@@ -516,3 +523,82 @@ class TableLabelDecode(object):
            assert False, "Unsupport type %s in char_or_elem" \
                          % char_or_elem
        return idx
+class SARLabelDecode(BaseRecLabelDecode):
+    """ Convert between text-label and text-index """
+    def __init__(self,
+                 character_dict_path=None,
+                 character_type='ch',
+                 use_space_char=False,
+                 **kwargs):
+        super(SARLabelDecode, self).__init__(character_dict_path,
+                                             character_type, use_space_char)
+        self.rm_symbol = kwargs.get('rm_symbol', False)
+    def add_special_char(self, dict_character):
+        beg_end_str = "<BOS/EOS>"
+        unknown_str = "<UKN>"
+        padding_str = "<PAD>"
+        dict_character = dict_character + [unknown_str]
+        self.unknown_idx = len(dict_character) - 1
+        dict_character = dict_character + [beg_end_str]
+        self.start_idx = len(dict_character) - 1
+        self.end_idx = len(dict_character) - 1
+        dict_character = dict_character + [padding_str]
+        self.padding_idx = len(dict_character) - 1
+        return dict_character
+    def decode(self, text_index, text_prob=None, is_remove_duplicate=False):
+        """ convert text-index into text-label. """
+        result_list = []
+        ignored_tokens = self.get_ignored_tokens()
+        batch_size = len(text_index)
+        for batch_idx in range(batch_size):
+            char_list = []
+            conf_list = []
+            for idx in range(len(text_index[batch_idx])):
+                if text_index[batch_idx][idx] in ignored_tokens:
+                    continue
+                if int(text_index[batch_idx][idx]) == int(self.end_idx):
+                    if text_prob is None and idx == 0:
+                        continue
+                    else:
+                        break
+                if is_remove_duplicate:
+                    # only for predict
+                    if idx > 0 and text_index[batch_idx][idx - 1] == text_index[
+                            batch_idx][idx]:
+                        continue
+                char_list.append(self.character[int(text_index[batch_idx][
+                    idx])])
+                if text_prob is not None:
+                    conf_list.append(text_prob[batch_idx][idx])
+                else:
+                    conf_list.append(1)
+            text = ''.join(char_list)
+            if self.rm_symbol:
+                comp = re.compile('[^A-Z^a-z^0-9^\u4e00-\u9fa5]')
+                text = text.lower()
+                text = comp.sub('', text)
+            result_list.append((text, np.mean(conf_list)))
+        return result_list
+    def __call__(self, preds, label=None, *args, **kwargs):
+        if isinstance(preds, paddle.Tensor):
+            preds = preds.numpy()
+        preds_idx = preds.argmax(axis=2)
+        preds_prob = preds.max(axis=2)
+        text = self.decode(preds_idx, preds_prob, is_remove_duplicate=False)
+        if label is None:
+            return text
+        label = self.decode(label, is_remove_duplicate=False)
+        return text, label
+    def get_ignored_tokens(self):
+        return [self.padding_idx]
--- a/tools/eval.py
+++ b/tools/eval.py
@@ -55,6 +55,7 @@ def main():
    model = build_model(config['Architecture'])
    use_srn = config['Architecture']['algorithm'] == "SRN"
+    use_sar = config['Architecture']['algorithm'] == "SAR"
    if "model_type" in config['Architecture'].keys():
        model_type = config['Architecture']['model_type']
    else:
@@ -71,7 +72,7 @@ def main():
    # start eval
    metric = program.eval(model, valid_dataloader, post_process_class,
-                          eval_class, model_type, use_srn)
+                        eval_class, model_type, use_srn, use_sar)
    logger.info('metric eval ***************')
    for k, v in metric.items():
        logger.info('{}:{}'.format(k, v))

--- a/tools/infer_rec.py
+++ b/tools/infer_rec.py
@@ -74,6 +74,10 @@ def main():
                    'image', 'encoder_word_pos', 'gsrm_word_pos',
                    'gsrm_slf_attn_bias1', 'gsrm_slf_attn_bias2'
                ]
+            elif config['Architecture']['algorithm'] == "SAR":
+                op[op_name]['keep_keys'] = [
+                    'image', 'valid_ratio'
+                ]
            else:
                op[op_name]['keep_keys'] = ['image']
        transforms.append(op)
@@ -106,11 +110,16 @@ def main():
                    paddle.to_tensor(gsrm_slf_attn_bias1_list),
                    paddle.to_tensor(gsrm_slf_attn_bias2_list)
                ]
+            if config['Architecture']['algorithm'] == "SAR":
+                valid_ratio = np.expand_dims(batch[-1], axis=0)
+                img_metas = [paddle.to_tensor(valid_ratio)]
            images = np.expand_dims(batch[0], axis=0)
            images = paddle.to_tensor(images)
            if config['Architecture']['algorithm'] == "SRN":
                preds = model(images, others)
+            elif config['Architecture']['algorithm'] == "SAR":
+                preds = model(images, img_metas)
            else:
                preds = model(images)
            post_result = post_process_class(preds)

--- a/tools/program.py
+++ b/tools/program.py
@@ -187,7 +187,7 @@ def train(config,
    use_srn = config['Architecture']['algorithm'] == "SRN"
    use_nrtr = config['Architecture']['algorithm'] == "NRTR"
+    use_sar = config['Architecture']['algorithm'] == 'SAR'
    try:
        model_type = config['Architecture']['model_type']
    except:
@@ -215,7 +215,7 @@ def train(config,
            images = batch[0]
            if use_srn:
                model_average = True
-            if use_srn or model_type == 'table' or use_nrtr:
+            if use_srn or model_type == 'table' or use_nrtr or use_sar:
                preds = model(images, data=batch[1:])
            else:
                preds = model(images)
@@ -279,7 +279,8 @@ def train(config,
                    post_process_class,
                    eval_class,
                    model_type,
-                    use_srn=use_srn)
+                    use_srn=use_srn,
+                    use_sar=use_sar)
                cur_metric_str = 'cur metric, {}'.format(', '.join(
                    ['{}: {}'.format(k, v) for k, v in cur_metric.items()]))
                logger.info(cur_metric_str)
@@ -351,7 +352,8 @@ def eval(model,
         post_process_class,
         eval_class,
         model_type,
-         use_srn=False):
+         use_srn=False,
+         use_sar=False):
    model.eval()
    with paddle.no_grad():
        total_frame = 0.0
@@ -364,7 +366,7 @@ def eval(model,
                break
            images = batch[0]
            start = time.time()
-            if use_srn or model_type == 'table':
+            if use_srn or model_type == 'table' or use_sar:
                preds = model(images, data=batch[1:])
            else:
                preds = model(images)
@@ -400,7 +402,7 @@ def preprocess(is_train=False):
    alg = config['Architecture']['algorithm']
    assert alg in [
        'EAST', 'DB', 'SAST', 'Rosetta', 'CRNN', 'STARNet', 'RARE', 'SRN',
-        'CLS', 'PGNet', 'Distillation', 'NRTR', 'TableAttn'
+        'CLS', 'PGNet', 'Distillation', 'NRTR', 'TableAttn', 'SAR'
    ]
    device = 'gpu:{}'.format(dist.ParallelEnv().dev_id) if use_gpu else 'cpu'