Before Faster R-CNN, object detection networks such as SPPNet and Fast R-CNN relied on region proposal algorithms to hypothesize object locations. Advances had reduced the running time of these detection networks, exposing the computation of region proposals as a bottleneck.
Faster R-CNN proposed that the convolutional feature maps used by region-based detectors (such as Fast R-CNN) can also be used to generate region proposals. On top of these convolutional features, a Region Proposal Network (RPN) is constructed by adding a few additional convolutional layers, which share the full-image convolutional features with the detection network and thus make region proposals nearly cost-free. The RPN simultaneously outputs region bounds and an objectness score at each location. It is therefore a fully convolutional network (FCN) that can be trained end-to-end to generate high-quality region proposals, which are then fed into Fast R-CNN for detection.
[Paper](https://arxiv.org/abs/1506.01497): Ren S, He K, Girshick R, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 39(6).
# Model Architecture
FasterRcnn is a two-stage object detection network. It uses a region proposal network (RPN), which shares the convolutional features of the whole image with the detection network, so that the computation of region proposals is almost cost-free. The whole network further combines the RPN and FastRcnn into a single network by sharing the convolutional features.
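As a rough illustration of the idea, the sketch below shows a minimal RPN head: a shared convolution over the backbone feature map, followed by two sibling 1x1 convolutions that predict per-anchor objectness scores and box deltas. This is a simplified, illustrative sketch, not the repository's implementation (which lives in `src/FasterRcnn/rpn.py`); the class name and default values are assumptions.

```python
# Illustrative sketch of an RPN head (not the code in src/FasterRcnn/rpn.py).
# A shared 3x3 conv runs over the backbone feature map; two sibling 1x1 convs
# then predict objectness scores and box deltas for every anchor at each location.
import mindspore.nn as nn

class SimpleRPNHead(nn.Cell):
    def __init__(self, in_channels=256, num_anchors=3):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, in_channels, 3, pad_mode='same')
        self.relu = nn.ReLU()
        # 2 objectness scores and 4 box deltas per anchor at each location
        self.cls_conv = nn.Conv2d(in_channels, num_anchors * 2, 1)
        self.reg_conv = nn.Conv2d(in_channels, num_anchors * 4, 1)

    def construct(self, feature):
        x = self.relu(self.conv(feature))
        return self.cls_conv(x), self.reg_conv(x)
```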
- We use COCO2017 as the training dataset in this example by default, and you can also use your own datasets.
1. If the COCO dataset is used, **set dataset to coco when running the script.**
Install Cython and pycocotools; you can also install mmcv to process data.
...
...
pip install pycocotools
pip install mmcv==0.2.14
```
Then change `COCO_ROOT` and other settings you need in `config.py`. The directory structure is as follows:
...
...
Each row is an image annotation split by spaces: the first column is a relative image path, and the others are box and class information in the format [xmin,ymin,xmax,ymax,class]. We read each image from the path formed by joining `IMAGE_DIR` (the dataset directory) with the relative path in `ANNO_PATH` (the TXT file path); both `IMAGE_DIR` and `ANNO_PATH` are set in `config.py`.
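As a quick, illustrative sketch of this format (the directory and the sample line below are made up), one annotation line can be parsed like this:

```python
# Parse one annotation line of the format described above.
# IMAGE_DIR and the sample line are illustrative placeholders.
import os

IMAGE_DIR = "/path/to/dataset"  # dataset directory, set in config.py
line = "images/0001.jpg 15,20,200,240,1 30,40,100,120,2"

columns = line.strip().split(" ")
image_path = os.path.join(IMAGE_DIR, columns[0])  # relative path -> full path
annotations = []
for box in columns[1:]:
    xmin, ymin, xmax, ymax, cls = (int(v) for v in box.split(","))
    annotations.append((xmin, ymin, xmax, ymax, cls))
print(image_path, annotations)
```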
# Quick Start
After installing MindSpore via the official website, you can start training and evaluation as follows:
```
# standalone training
sh run_standalone_train_ascend.sh [PRETRAINED_MODEL]
# distributed training
sh run_distribute_train_ascend.sh [RANK_TABLE_FILE] [PRETRAINED_MODEL]
# eval
sh run_eval_ascend.sh [VALIDATION_JSON_FILE] [CHECKPOINT_PATH]
```
# Script Description
## Script and Sample Code
```shell
.
└─FasterRcnn
  ├─README.md // descriptions about fasterrcnn
  ├─scripts
    ├─run_standalone_train_ascend.sh // shell script for standalone on ascend
    ├─run_distribute_train_ascend.sh // shell script for distributed on ascend
    └─run_eval_ascend.sh // shell script for eval on ascend
  ├─src
    ├─FasterRcnn
      ├─__init__.py // init file
      ├─anchor_generator.py // anchor generator
      ├─bbox_assign_sample.py // first stage sampler
      ├─bbox_assign_sample_stage2.py // second stage sampler
      ├─faster_rcnn_r50.py // fasterrcnn network
      ├─fpn_neck.py // feature pyramid network
      ├─proposal_generator.py // proposal generator
      ├─rcnn.py // rcnn network
      ├─resnet50.py // backbone network
      ├─roi_align.py // roi align network
      └─rpn.py // region proposal network
    ├─config.py // total config
    ├─dataset.py // create dataset and process dataset
    ├─lr_schedule.py // learning rate generator
    ├─network_define.py // network definition for fasterrcnn
    └─util.py // routine operations
  ├─eval.py // eval script
  └─train.py // train script
```
## Training Process
### Usage
```
# standalone training on ascend
sh run_standalone_train_ascend.sh [PRETRAINED_MODEL]
# distributed training on ascend
sh run_distribute_train_ascend.sh [RANK_TABLE_FILE] [PRETRAINED_MODEL]
```
> Rank_table.json, which is specified by RANK_TABLE_FILE, is needed when you run a distributed task. You can generate it by using the [hccl_tools](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools).
> As for PRETRAINED_MODEL, it should be a ResNet-50 checkpoint trained on ImageNet2012. Ready-made pretrained models are not available now. Stay tuned.
### Result
Training results will be stored in the example path, in a folder whose name begins with "train" or "train_parallel". You can find checkpoint files together with results like the following in loss.log.
- [Description of random situation](#description-of-random-situation)
- [others](#others)
- [ModelZoo Homepage](#modelzoo-homepage)
...
...
Inspired by BERT, GPT and other language models, Microsoft addressed [MASS: Masked Sequence to Sequence Pre-training for Language Generation](https://www.microsoft.com/en-us/research/uploads/prod/2019/06/MASS-paper-updated-002.pdf).
[Paper](https://www.microsoft.com/en-us/research/uploads/prod/2019/06/MASS-paper-updated-002.pdf): Song, Kaitao, Xu Tan, Tao Qin, Jianfeng Lu and Tie-Yan Liu. “MASS: Masked Sequence to Sequence Pre-training for Language Generation.” ICML (2019).
# Model Architecture
The overall network architecture of MASS is shown below, which is the Transformer (Vaswani et al., 2017):
MASS consists of a 6-layer encoder and a 6-layer decoder with 1024 embedding/hidden size and a 4096 intermediate size in the feed-forward network, which has two fully connected layers.
The MASS network is implemented with the Transformer, which has multiple encoder layers and multiple decoder layers.
For pre-training, we use the Adam optimizer and loss scaling to get the pre-trained model.
During fine-tuning, we fine-tune this pre-trained model with different datasets according to different tasks.
During testing, we use the fine-tuned model to predict the results, and adopt a beam search algorithm to get the most probable predictions.
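The architecture hyperparameters described above can be summarized in a small sketch; the field names here are hypothetical, and the real settings live in the project's configuration files:

```python
# Illustrative summary of the MASS architecture hyperparameters described above.
# Field names are hypothetical; the actual configuration lives in the project's
# config files.
mass_architecture = {
    "num_encoder_layers": 6,    # 6-layer Transformer encoder
    "num_decoder_layers": 6,    # 6-layer Transformer decoder
    "hidden_size": 1024,        # embedding/hidden size
    "intermediate_size": 4096,  # feed-forward inner size (two fully connected layers)
}
```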
# Dataset
...
...
More details about the LR scheduler can be found in `src/utils/lr_scheduler.py`.
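For reference, a minimal sketch of an inverse square root schedule with linear warmup is shown below. It is illustrative only; the project's actual implementation in `src/utils/lr_scheduler.py` may differ in details such as warmup handling and minimum LR.

```python
# Minimal sketch of an inverse square root LR schedule with linear warmup.
# Illustrative only; see src/utils/lr_scheduler.py for the real implementation.
def inverse_square_root_lr(step, base_lr=1e-4, warmup_steps=4000):
    step = max(step, 1)
    if step < warmup_steps:
        return base_lr * step / warmup_steps       # linear warmup
    return base_lr * (warmup_steps / step) ** 0.5  # decay proportional to 1/sqrt(step)
```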
# Model description
## Performance
### Results
#### Fine-Tuning on Text Summarization
The comparison between MASS and two other pre-training methods in terms of ROUGE score on the text summarization task is as follows:
| Parameters          | MASS                                                                     |
| ------------------- | ------------------------------------------------------------------------ |
| Dataset             | Gigaword corpus, Cornell Movie Dialog corpus                             |
| batch_size          | ---                                                                      |
| outputs             | Sentence and probability                                                 |
| Accuracy            | ppl=23.52 for conversation response, RG-1=29.79 for text summarization.  |
| Speed               | ---- sentences/s                                                         |
| Total time          | --/--                                                                    |
| Model for inference | ---Mb, --, [A link]()                                                    |
# Environment Requirements
## Platform
- Hardware (Ascend/GPU)
    - Prepare hardware environment with Ascend or GPU processor. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources.
The Transformer was proposed in 2017 and designed to process sequential data. It is adopted mainly in the field of natural language processing (NLP), for tasks like machine translation or text summarization. Unlike traditional recurrent neural networks (RNNs), which process data in order, the Transformer adopts an attention mechanism and improves parallelism, which reduces training time and makes training on larger datasets possible. Since the Transformer model was introduced, it has been used to tackle many problems in NLP and has given rise to many derived models, such as BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer).
[Paper](https://arxiv.org/abs/1706.03762): Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In NIPS 2017, pages 5998–6008.