diff --git a/README.md b/README.md index 5842741437347c7aaf48f81ad7ee270c43ecb1dc..06159f8759948c7f77fa526e5a05ae36970210ee 100644 --- a/README.md +++ b/README.md @@ -9,7 +9,7 @@ 飞桨图像分类套件PaddleClas是飞桨为工业界和学术界所准备的一个图像分类任务的工具集,助力使用者训练出更好的视觉模型和应用落地。
- +
## 丰富的模型库 @@ -17,21 +17,20 @@ 基于ImageNet1k分类数据集,PaddleClas提供ResNet、ResNet_vd、Res2Net、HRNet、MobileNetV3等23种系列的分类网络结构的简单介绍、论文指标复现配置,以及在复现过程中的训练技巧。与此同时,也提供了对应的117个图像分类预训练模型,并且基于TensorRT评估了服务器端模型的GPU预测时间,以及在骁龙855(SD855)上评估了移动端模型的CPU预测时间和存储大小。支持的***预训练模型列表、下载地址以及更多信息***请见文档教程中的[**模型库章节**](https://paddleclas.readthedocs.io/zh_CN/latest/models/models_intro.html)。
- +
-上图对比了一些最新的面向服务器端应用场景的模型,在使用V100,FP32和TensorRT预测一张图像的时间和其准确率,图中准确率82.4%的ResNet50_vd_ssld和83.7%的ResNet101_vd_ssld,是采用PaddleClas提供的SSLD知识蒸馏方案训练的模型。图中相同颜色和符号的点代表同一系列不同规模的模型。不同模型的简介、FLOPS、Parameters以及详细的GPU预测时间请参考文档教程中的[**模型库章节**](https://paddleclas.readthedocs.io/zh_CN/latest/models/models_intro.html)。
+上图对比了一些最新的面向服务器端应用场景的模型,在使用V100、FP32和TensorRT、batch size为1的条件下预测一张图像的时间及其准确率,图中准确率82.4%的ResNet50_vd_ssld和83.7%的ResNet101_vd_ssld,是采用PaddleClas提供的SSLD知识蒸馏方案训练的模型。图中相同颜色和符号的点代表同一系列不同规模的模型。不同模型的简介、FLOPS、Parameters以及详细的GPU预测时间(包括不同batch size的T4卡预测速度)请参考文档教程中的[**模型库章节**](https://paddleclas.readthedocs.io/zh_CN/latest/models/models_intro.html)。
+src="./docs/images/models/mobile_arm_top1.png" width="700">
上图对比了一些最新的面向移动端应用场景的模型,在骁龙855(SD855)上预测一张图像的时间及其准确率,包括MobileNetV1系列、MobileNetV2系列、MobileNetV3系列和ShuffleNetV2系列。图中准确率79%的MV3_large_x1_0_ssld(M是MobileNet的简称)、71.3%的MV3_small_x1_0_ssld、76.74%的MV2_ssld和77.89%的MV1_ssld,是采用PaddleClas提供的SSLD蒸馏方法训练的模型。MV3_large_x1_0_ssld_int8是进一步进行INT8量化的模型。不同模型的简介、FLOPS、Parameters和模型存储大小请参考文档教程中的[**模型库章节**](https://paddleclas.readthedocs.io/zh_CN/latest/models/models_intro.html)。

- TODO
- [ ] EfficientLite、GhostNet、RegNet论文指标复现和性能评估
-- [ ] The speed benchmark on P4/T4

## 高阶优化支持
除了提供丰富的分类网络结构和预训练模型,PaddleClas也支持了一系列有助于图像分类任务效果和效率提升的算法或工具。
@@ -41,14 +40,14 @@ src="docs/images/models/mobile_arm_top1.png" width="700">
+src="./docs/images/distillation/distillation_perform_s.jpg" width="700">
以在ImageNet1K蒸馏模型为例,SSLD知识蒸馏方案框架图如下,该方案的核心关键点包括教师模型的选择、loss计算方式、迭代轮数、无标签数据的使用、以及ImageNet1k蒸馏finetune,每部分的详细介绍以及实验介绍请参考文档教程中的[**知识蒸馏章节**](https://paddleclas.readthedocs.io/zh_CN/latest/advanced_tutorials/distillation/index.html)。
+src="./docs/images/distillation/ppcls_distillation_s.jpg" width="700">
### 数据增广 @@ -57,14 +56,14 @@ src="docs/images/distillation/ppcls_distillation.png" width="700">
+src="./docs/images/image_aug/image_aug_samples_s.jpg" width="800">
PaddleClas提供了上述8种数据增广算法的复现和在统一实验环境下的效果评估。下图展示了不同数据增广方式在ResNet50上的表现,与标准变换相比,采用数据增广,识别准确率最高可以提升1%。每种数据增广方法的详细介绍、对比的实验环境请参考文档教程中的[**数据增广章节**](https://paddleclas.readthedocs.io/zh_CN/latest/advanced_tutorials/image_augmentation/index.html)。
+src="./docs/images/image_aug/main_image_aug_s.jpg" width="600">
## 30分钟玩转PaddleClas @@ -80,7 +79,7 @@ PaddleClas的安装说明、模型训练、预测、评估以及模型微调(f ### 10万类图像分类预训练模型 在实际应用中,由于训练数据匮乏,往往将ImageNet1K数据集训练的分类模型作为预训练模型,进行图像分类的迁移学习。然而ImageNet1K数据集的类别只有1000种,预训练模型的特征迁移能力有限。因此百度自研了一个有语义体系的、粒度有粗有细的10w级别的Tag体系,通过人工或半监督方式,至今收集到 5500w+图片训练数据;该系统是国内甚至世界范围内最大规模的图片分类体系和训练集合。PaddleClas提供了在该数据集上训练的ResNet50_vd的模型。下表显示了一些实际应用场景中,使用ImageNet预训练模型和上述10万类图像分类预训练模型的效果比对,使用10万类图像分类预训练模型,识别准确率最高可以提升30%。 - + | 数据集 | 数据统计 | ImageNet预训练模型 | 10万类图像分类预训练模型 | |:--:|:--:|:--:|:--:| | 花卉 | class_num:102
train/val:5789/2396 | 0.7779 | 0.9892 | @@ -100,7 +99,7 @@ PaddleClas的安装说明、模型训练、预测、评估以及模型微调(f
+src="./docs/images/det/pssdet.png" width="500">
- TODO diff --git a/configs/ResNet/ResNet50_fp16.yml b/configs/ResNet/ResNet50_fp16.yml new file mode 100644 index 0000000000000000000000000000000000000000..a952833221a193cefe32c004f40181eaa409188d --- /dev/null +++ b/configs/ResNet/ResNet50_fp16.yml @@ -0,0 +1,81 @@ +mode: 'train' +ARCHITECTURE: + name: 'ResNet50' + +pretrained_model: "" +model_save_dir: "./output/" +classes_num: 1000 +total_images: 1281167 +save_interval: 1 +validate: True +valid_interval: 1 +epochs: 120 +topk: 5 +image_shape: [3, 224, 224] + +# mixed precision training +use_fp16: True +amp_scale_loss: 128.0 +use_dynamic_loss_scaling: True + +use_mix: False +ls_epsilon: -1 + +LEARNING_RATE: + function: 'Piecewise' + params: + lr: 0.1 + decay_epochs: [30, 60, 90] + gamma: 0.1 + +OPTIMIZER: + function: 'Momentum' + params: + momentum: 0.9 + regularizer: + function: 'L2' + factor: 0.000100 + +TRAIN: + batch_size: 256 + num_workers: 4 + file_list: "./dataset/ILSVRC2012/train_list.txt" + data_dir: "./dataset/ILSVRC2012/" + shuffle_seed: 0 + transforms: + - DecodeImage: + to_rgb: True + to_np: False + channel_first: False + - RandCropImage: + size: 224 + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1./255. + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + +VALID: + batch_size: 64 + num_workers: 4 + file_list: "./dataset/ILSVRC2012/val_list.txt" + data_dir: "./dataset/ILSVRC2012/" + shuffle_seed: 0 + transforms: + - DecodeImage: + to_rgb: True + to_np: False + channel_first: False + - ResizeImage: + resize_short: 256 + - CropImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: diff --git a/docs/images/distillation/distillation_perform_s.jpg b/docs/images/distillation/distillation_perform_s.jpg new file mode 100644 index 0000000000000000000000000000000000000000..07b1bf6790d4748b625529f5669c5ff581dab58d Binary files /dev/null and b/docs/images/distillation/distillation_perform_s.jpg differ diff --git a/docs/images/distillation/ppcls_distillation_s.jpg b/docs/images/distillation/ppcls_distillation_s.jpg new file mode 100644 index 0000000000000000000000000000000000000000..95402d401d265a008c5710dfcc711f9a16e6d962 Binary files /dev/null and b/docs/images/distillation/ppcls_distillation_s.jpg differ diff --git a/docs/images/image_aug/image_aug_samples_s.jpg b/docs/images/image_aug/image_aug_samples_s.jpg new file mode 100644 index 0000000000000000000000000000000000000000..aad144ae635cc2c2e01c43e73496c279a96d7f09 Binary files /dev/null and b/docs/images/image_aug/image_aug_samples_s.jpg differ diff --git a/docs/images/image_aug/main_image_aug_s.jpg b/docs/images/image_aug/main_image_aug_s.jpg new file mode 100644 index 0000000000000000000000000000000000000000..efaa27c640be17465acb77b3311752ea944c0acd Binary files /dev/null and b/docs/images/image_aug/main_image_aug_s.jpg differ diff --git a/docs/images/main_features_s.png b/docs/images/main_features_s.png new file mode 100644 index 0000000000000000000000000000000000000000..d1e2acc515c6753193ce1b5f1a89a4435f905d74 Binary files /dev/null and b/docs/images/main_features_s.png differ diff --git a/docs/images/models/DPN.png.flops.png b/docs/images/models/DPN.png.flops.png deleted file mode 100644 index 72bb96f49812711035ec09ce0d8d44202d17cfcb..0000000000000000000000000000000000000000 Binary files a/docs/images/models/DPN.png.flops.png and /dev/null differ diff --git a/docs/images/models/DPN.png.fp32.png b/docs/images/models/DPN.png.fp32.png 
deleted file mode 100644 index bcd524c781d305fdacfd5a07886851fadbeeeddc..0000000000000000000000000000000000000000 Binary files a/docs/images/models/DPN.png.fp32.png and /dev/null differ diff --git a/docs/images/models/DPN.png.params.png b/docs/images/models/DPN.png.params.png deleted file mode 100644 index 818f1961bbaf98cd0aff0c5405113f107fa02e54..0000000000000000000000000000000000000000 Binary files a/docs/images/models/DPN.png.params.png and /dev/null differ diff --git a/docs/images/models/EfficientNet.png b/docs/images/models/EfficientNet.png deleted file mode 100644 index 5556481c960432cab6080c243644ab43783ceabb..0000000000000000000000000000000000000000 Binary files a/docs/images/models/EfficientNet.png and /dev/null differ diff --git a/docs/images/models/EfficientNet.png.flops.png b/docs/images/models/EfficientNet.png.flops.png deleted file mode 100644 index dd3c36ced2973133bdd0bb6b300125b25fdcefe4..0000000000000000000000000000000000000000 Binary files a/docs/images/models/EfficientNet.png.flops.png and /dev/null differ diff --git a/docs/images/models/EfficientNet.png.fp32.png b/docs/images/models/EfficientNet.png.fp32.png deleted file mode 100644 index eca753f7d84699cebee13d15484291d96f0a9b6f..0000000000000000000000000000000000000000 Binary files a/docs/images/models/EfficientNet.png.fp32.png and /dev/null differ diff --git a/docs/images/models/EfficientNet.png.params.png b/docs/images/models/EfficientNet.png.params.png deleted file mode 100644 index 2348c55013998bcb2ed2c3edff282493243ee37a..0000000000000000000000000000000000000000 Binary files a/docs/images/models/EfficientNet.png.params.png and /dev/null differ diff --git a/docs/images/models/HRNet.png.flops.png b/docs/images/models/HRNet.png.flops.png deleted file mode 100644 index 5f8ce9cd2c1c8ed9e8fb775bd89a8acd3a8e9402..0000000000000000000000000000000000000000 Binary files a/docs/images/models/HRNet.png.flops.png and /dev/null differ diff --git a/docs/images/models/HRNet.png.fp32.png b/docs/images/models/HRNet.png.fp32.png deleted file mode 100644 index 0e73fb4b57dc374560efa92429a2dd457c73369e..0000000000000000000000000000000000000000 Binary files a/docs/images/models/HRNet.png.fp32.png and /dev/null differ diff --git a/docs/images/models/HRNet.png.params.png b/docs/images/models/HRNet.png.params.png deleted file mode 100644 index e4443a770ac0ca910d1158fe9adaf6dd92e680aa..0000000000000000000000000000000000000000 Binary files a/docs/images/models/HRNet.png.params.png and /dev/null differ diff --git a/docs/images/models/Inception.png.flops.png b/docs/images/models/Inception.png.flops.png deleted file mode 100644 index 589f3931c1feef3e1c0245566cd0c7e0a22782d8..0000000000000000000000000000000000000000 Binary files a/docs/images/models/Inception.png.flops.png and /dev/null differ diff --git a/docs/images/models/Inception.png.fp32.png b/docs/images/models/Inception.png.fp32.png deleted file mode 100644 index b9245800a2d7ca6fad5ed7457e55356d579a81d0..0000000000000000000000000000000000000000 Binary files a/docs/images/models/Inception.png.fp32.png and /dev/null differ diff --git a/docs/images/models/Inception.png.params.png b/docs/images/models/Inception.png.params.png deleted file mode 100644 index 657c4360451b24905b9a1ce170e4da719d35d917..0000000000000000000000000000000000000000 Binary files a/docs/images/models/Inception.png.params.png and /dev/null differ diff --git a/docs/images/models/ResNet.png.flops.png b/docs/images/models/ResNet.png.flops.png deleted file mode 100644 index 
da1fd2eb359f57fbd545a5436b737fe23df6e891..0000000000000000000000000000000000000000 Binary files a/docs/images/models/ResNet.png.flops.png and /dev/null differ diff --git a/docs/images/models/ResNet.png.fp32.png b/docs/images/models/ResNet.png.fp32.png deleted file mode 100644 index 05020997f2b6eb2c926d2a8948ed69b393b6cd3b..0000000000000000000000000000000000000000 Binary files a/docs/images/models/ResNet.png.fp32.png and /dev/null differ diff --git a/docs/images/models/ResNet.png.params.png b/docs/images/models/ResNet.png.params.png deleted file mode 100644 index 6fcbb69cc1e1e9a3402f2849fe016a73312d9a79..0000000000000000000000000000000000000000 Binary files a/docs/images/models/ResNet.png.params.png and /dev/null differ diff --git a/docs/images/models/SeResNeXt.png.flops.png b/docs/images/models/SeResNeXt.png.flops.png deleted file mode 100644 index 51d6d6497e9cd582ad671a79b9e24bbdc1a9bdae..0000000000000000000000000000000000000000 Binary files a/docs/images/models/SeResNeXt.png.flops.png and /dev/null differ diff --git a/docs/images/models/SeResNeXt.png.fp32.png b/docs/images/models/SeResNeXt.png.fp32.png deleted file mode 100644 index 452488955096f896ecb9dafe07885f666c92d8ad..0000000000000000000000000000000000000000 Binary files a/docs/images/models/SeResNeXt.png.fp32.png and /dev/null differ diff --git a/docs/images/models/SeResNeXt.png.params.png b/docs/images/models/SeResNeXt.png.params.png deleted file mode 100644 index 9898f52fb0be6bdfb22d39a1f4a16c98e20ab510..0000000000000000000000000000000000000000 Binary files a/docs/images/models/SeResNeXt.png.params.png and /dev/null differ diff --git a/docs/images/models/T4_benchmark/t4.fp16.bs1.EfficientNet.png b/docs/images/models/T4_benchmark/t4.fp16.bs1.EfficientNet.png new file mode 100644 index 0000000000000000000000000000000000000000..1c87d711ea11f3d662cdf78e34959ddd1f355f76 Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp16.bs1.EfficientNet.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp16.bs4.DPN.png b/docs/images/models/T4_benchmark/t4.fp16.bs4.DPN.png new file mode 100644 index 0000000000000000000000000000000000000000..1eb393903bbf3b4e02b83962b33c0e0a3b4e341a Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp16.bs4.DPN.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp16.bs4.HRNet.png b/docs/images/models/T4_benchmark/t4.fp16.bs4.HRNet.png new file mode 100644 index 0000000000000000000000000000000000000000..f21d63cd1a1e24481947875db23f8447af1a65ca Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp16.bs4.HRNet.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp16.bs4.Inception.png b/docs/images/models/T4_benchmark/t4.fp16.bs4.Inception.png new file mode 100644 index 0000000000000000000000000000000000000000..8095a3c0253170c00d8ae74af4dec25c4f9544eb Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp16.bs4.Inception.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp16.bs4.ResNet.png b/docs/images/models/T4_benchmark/t4.fp16.bs4.ResNet.png new file mode 100644 index 0000000000000000000000000000000000000000..53a603ecad2e36df580167e134ef036df14d5596 Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp16.bs4.ResNet.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp16.bs4.SeResNeXt.png b/docs/images/models/T4_benchmark/t4.fp16.bs4.SeResNeXt.png new file mode 100644 index 0000000000000000000000000000000000000000..99b8a039e0fda22053e7d7cb971d5e83b208ec6b Binary files /dev/null and 
b/docs/images/models/T4_benchmark/t4.fp16.bs4.SeResNeXt.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs1.EfficientNet.png b/docs/images/models/T4_benchmark/t4.fp32.bs1.EfficientNet.png new file mode 100644 index 0000000000000000000000000000000000000000..395a32f5c7e28ed09ee2b7e12e4a3ea2e9094154 Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs1.EfficientNet.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.DPN.flops.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.DPN.flops.png new file mode 100644 index 0000000000000000000000000000000000000000..24aabf8c3fb6607e4bb17f4d4dcc72d733476c48 Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.DPN.flops.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.DPN.params.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.DPN.params.png new file mode 100644 index 0000000000000000000000000000000000000000..689e73d31d70cf8566c356142080d79a0f64d6e3 Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.DPN.params.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.DPN.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.DPN.png new file mode 100644 index 0000000000000000000000000000000000000000..dc3922d2e2347f11b193057de9bbf730489b9cc1 Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.DPN.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.EfficientNet.flops.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.EfficientNet.flops.png new file mode 100644 index 0000000000000000000000000000000000000000..deacfaca6a279fed852dfcbf0006dc497191a7a8 Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.EfficientNet.flops.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.EfficientNet.params.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.EfficientNet.params.png new file mode 100644 index 0000000000000000000000000000000000000000..7177bbc56b374dd71c66231132ead01c8b141732 Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.EfficientNet.params.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.HRNet.flops.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.HRNet.flops.png new file mode 100644 index 0000000000000000000000000000000000000000..062ecd79d2fc3ab788212a238b8ca4627b0dd14f Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.HRNet.flops.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.HRNet.params.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.HRNet.params.png new file mode 100644 index 0000000000000000000000000000000000000000..4bb3f76caecbdbd43f04ed62507985476f5cac40 Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.HRNet.params.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.HRNet.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.HRNet.png new file mode 100644 index 0000000000000000000000000000000000000000..3905f0b38b7cffa856d944d99ef0035cf6d4b489 Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.HRNet.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.Inception.flops.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.Inception.flops.png new file mode 100644 index 0000000000000000000000000000000000000000..6fdc94b27130924d0e71a7a6954f239e632a4215 Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.Inception.flops.png differ diff --git 
a/docs/images/models/T4_benchmark/t4.fp32.bs4.Inception.params.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.Inception.params.png new file mode 100644 index 0000000000000000000000000000000000000000..25a5d1648e1cf220a383377a8d9fd5ae3c9eceea Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.Inception.params.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.Inception.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.Inception.png new file mode 100644 index 0000000000000000000000000000000000000000..7ef4f339ae2b9414b4cf71a8772cc4c9a92a0ad1 Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.Inception.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.ResNet.flops.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.ResNet.flops.png new file mode 100644 index 0000000000000000000000000000000000000000..755adb5684562c341816ca98856490d270735ac5 Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.ResNet.flops.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.ResNet.params.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.ResNet.params.png new file mode 100644 index 0000000000000000000000000000000000000000..44e03fdd20df450573889e5c4eca85cf5b686d9b Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.ResNet.params.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.ResNet.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.ResNet.png new file mode 100644 index 0000000000000000000000000000000000000000..a461b9ec281c2340780da6d69e625a0947ffa42e Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.ResNet.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.flops.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.flops.png new file mode 100644 index 0000000000000000000000000000000000000000..197522d16401ef6430313d36ac235a7b37e1d7f9 Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.flops.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.params.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.params.png new file mode 100644 index 0000000000000000000000000000000000000000..6943fc056c1cd22b4b4d1767acf72fd94a209d0d Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.params.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.png new file mode 100644 index 0000000000000000000000000000000000000000..8476efb33b73acd836993a5fec3967122da319ab Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.main_fps_top1.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.main_fps_top1.png new file mode 100644 index 0000000000000000000000000000000000000000..965efff5c1a3bd32c9bc2da9c1b3034dc2cd55ef Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.main_fps_top1.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.mobile_trt.flops.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.mobile_trt.flops.png new file mode 100644 index 0000000000000000000000000000000000000000..8c1be3ae9b32773ca56adac9dc1c15d9532e8f1a Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.mobile_trt.flops.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.mobile_trt.params.png 
b/docs/images/models/T4_benchmark/t4.fp32.bs4.mobile_trt.params.png new file mode 100644 index 0000000000000000000000000000000000000000..a41a325b5f5e9e85b86abfae2e62c9e2132fc7ec Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.mobile_trt.params.png differ diff --git a/docs/images/models/V100_benchmark/v100.fp32.bs1.DPN.png b/docs/images/models/V100_benchmark/v100.fp32.bs1.DPN.png new file mode 100644 index 0000000000000000000000000000000000000000..10f542b7fc989da80af12244cd45662bccfe677b Binary files /dev/null and b/docs/images/models/V100_benchmark/v100.fp32.bs1.DPN.png differ diff --git a/docs/images/models/V100_benchmark/v100.fp32.bs1.EfficientNet.png b/docs/images/models/V100_benchmark/v100.fp32.bs1.EfficientNet.png new file mode 100644 index 0000000000000000000000000000000000000000..0491ff3e6f4795fec2e131a4df31b4a3b102900e Binary files /dev/null and b/docs/images/models/V100_benchmark/v100.fp32.bs1.EfficientNet.png differ diff --git a/docs/images/models/V100_benchmark/v100.fp32.bs1.HRNet.png b/docs/images/models/V100_benchmark/v100.fp32.bs1.HRNet.png new file mode 100644 index 0000000000000000000000000000000000000000..284088ec5555e2128eab564cf8331532cfe08370 Binary files /dev/null and b/docs/images/models/V100_benchmark/v100.fp32.bs1.HRNet.png differ diff --git a/docs/images/models/V100_benchmark/v100.fp32.bs1.Inception.png b/docs/images/models/V100_benchmark/v100.fp32.bs1.Inception.png new file mode 100644 index 0000000000000000000000000000000000000000..0e0e42e7ccc319bd2c9842c3d99c849b547ac603 Binary files /dev/null and b/docs/images/models/V100_benchmark/v100.fp32.bs1.Inception.png differ diff --git a/docs/images/models/V100_benchmark/v100.fp32.bs1.ResNet.png b/docs/images/models/V100_benchmark/v100.fp32.bs1.ResNet.png new file mode 100644 index 0000000000000000000000000000000000000000..2332791866c06b9cdcfc576947280de27ba3667c Binary files /dev/null and b/docs/images/models/V100_benchmark/v100.fp32.bs1.ResNet.png differ diff --git a/docs/images/models/V100_benchmark/v100.fp32.bs1.SeResNeXt.png b/docs/images/models/V100_benchmark/v100.fp32.bs1.SeResNeXt.png new file mode 100644 index 0000000000000000000000000000000000000000..610c846347f008216589ed080d834588d7fedfe0 Binary files /dev/null and b/docs/images/models/V100_benchmark/v100.fp32.bs1.SeResNeXt.png differ diff --git a/docs/images/models/V100_benchmark/v100.fp32.bs1.main_fps_top1_s.jpg b/docs/images/models/V100_benchmark/v100.fp32.bs1.main_fps_top1_s.jpg new file mode 100644 index 0000000000000000000000000000000000000000..407d4cff7d39b141dc210a523cd87a6d2f288bf2 Binary files /dev/null and b/docs/images/models/V100_benchmark/v100.fp32.bs1.main_fps_top1_s.jpg differ diff --git a/docs/images/models/main_fps_top1.png b/docs/images/models/main_fps_top1.png deleted file mode 100755 index a7149d7de16c71dbd3359efdbcee519a486a2278..0000000000000000000000000000000000000000 Binary files a/docs/images/models/main_fps_top1.png and /dev/null differ diff --git a/docs/images/models/mobile_arm_storage.png b/docs/images/models/mobile_arm_storage.png old mode 100755 new mode 100644 index 350fd3ced05e802250a461a747f60c2e5cf1be0c..07e1f4f3fe95ed7e9a358b212f4d5d939fb94a02 Binary files a/docs/images/models/mobile_arm_storage.png and b/docs/images/models/mobile_arm_storage.png differ diff --git a/docs/images/models/mobile_arm_top1.png b/docs/images/models/mobile_arm_top1.png old mode 100755 new mode 100644 index 37091dd2c91a334c2888d5facec7598ad3219e84..06add75fe630510e6ab9af62b3fa93dc166b7944 Binary files 
a/docs/images/models/mobile_arm_top1.png and b/docs/images/models/mobile_arm_top1.png differ diff --git a/docs/images/models/mobile_trt.png b/docs/images/models/mobile_trt.png deleted file mode 100644 index d722548bae56b9aeca081cffe3e7d34494864033..0000000000000000000000000000000000000000 Binary files a/docs/images/models/mobile_trt.png and /dev/null differ diff --git a/docs/images/models/mobile_trt.png.flops.png b/docs/images/models/mobile_trt.png.flops.png deleted file mode 100644 index 6e7010906c58ef9e1f4f286725442ca85e638c51..0000000000000000000000000000000000000000 Binary files a/docs/images/models/mobile_trt.png.flops.png and /dev/null differ diff --git a/docs/images/models/mobile_trt.png.params.png b/docs/images/models/mobile_trt.png.params.png deleted file mode 100644 index 35b78a65b390f9843def8870bbd63f656070cdcc..0000000000000000000000000000000000000000 Binary files a/docs/images/models/mobile_trt.png.params.png and /dev/null differ diff --git a/docs/zh_CN/extension/paddle_mobile_inference.md b/docs/zh_CN/extension/paddle_mobile_inference.md index e076fc7d471e8fd3ada051e3df171b23953884fe..833231f736ffc19cb3b705025d7511dc8eba6bc4 100644 --- a/docs/zh_CN/extension/paddle_mobile_inference.md +++ b/docs/zh_CN/extension/paddle_mobile_inference.md @@ -1,5 +1,119 @@ # Paddle-Lite

+## 一、简介
+
[Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) 是飞桨推出的一套功能完善、易用性强且性能卓越的轻量化推理引擎。
轻量化体现在使用较少比特数用于表示神经网络的权重和激活,能够大大降低模型的体积,解决终端设备存储空间有限的问题,推理性能也整体优于其他框架。

-[PaddleClas](https://github.com/PaddlePaddle/PaddleClas) 使用 Paddle-Lite 进行了[移动端模型的性能评估](../models/Mobile.md),具体流程参考 [Paddle-Lite 文档](https://paddle-lite.readthedocs.io/zh/latest/)。
+[PaddleClas](https://github.com/PaddlePaddle/PaddleClas) 使用 Paddle-Lite 进行了[移动端模型的性能评估](../models/Mobile.md),本部分以`ImageNet1k`数据集的`MobileNetV1`模型为例,介绍怎样使用`Paddle-Lite`,在移动端(基于骁龙855的安卓开发平台)对模型进行速度评估。
+
+
+## 二、评估步骤
+
+### 2.1 导出inference模型
+
+* 首先需要将训练过程中保存的模型存储为用于预测部署的固化模型,可以使用`tools/export_model.py`导出inference模型,具体使用方法如下。
+
+```shell
+python tools/export_model.py -m MobileNetV1 -p pretrained/MobileNetV1_pretrained/ -o inference/MobileNetV1
+```
+
+最终在`inference/MobileNetV1`文件夹下会保存得到`model`与`params`文件。
+
+
+### 2.2 benchmark二进制文件下载
+
+* 使用adb(Android Debug Bridge)工具可以连接Android手机与PC端,并进行开发调试等。安装好adb,并确保PC端和手机连接成功后,使用以下命令可以查看手机的ARM版本,并基于此选择合适的预编译库。
+
+```shell
+adb shell getprop ro.product.cpu.abi
+```
+
+* 下载benchmark_bin文件
+
+```shell
+wget -c https://paddle-inference-dist.bj.bcebos.com/PaddleLite/benchmark_0/benchmark_bin_v8
+```
+
+如果查看的ARM版本为v7,则需要下载v7版本的benchmark_bin文件,下载命令如下。
+
+```shell
+wget -c https://paddle-inference-dist.bj.bcebos.com/PaddleLite/benchmark_0/benchmark_bin_v7
+```
+
+### 2.3 模型速度benchmark
+
+PC端和手机连接成功后,使用下面的命令开始模型评估。
+
+```
+sh tools/lite/benchmark.sh ./benchmark_bin_v8 ./inference result_armv8.txt true
+```
+
+其中`./benchmark_bin_v8`为benchmark二进制文件路径,`./inference`为所有需要评测的模型的路径,`result_armv8.txt`为保存的结果文件,最后的参数`true`表示在评估之前会首先进行模型优化。最终在当前文件夹下会输出`result_armv8.txt`的评估结果文件,具体信息如下。
+
+```
+PaddleLite Benchmark
+Threads=1 Warmup=10 Repeats=30
+MobileNetV1 min = 30.89100 max = 30.73600 average = 30.79750
+
+Threads=2 Warmup=10 Repeats=30
+MobileNetV1 min = 18.26600 max = 18.14000 average = 18.21637
+
+Threads=4 Warmup=10 Repeats=30
+MobileNetV1 min = 10.03200 max = 9.94300 average = 9.97627
+```
+
+这里给出了不同线程数下的模型预测耗时,单位为ms。以线程数为1为例,MobileNetV1在骁龙855上的平均预测耗时为`30.79750ms`。
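+
+如果需要对`result_armv8.txt`做进一步的自动化处理,可以参考下面的解析脚本。该脚本只是一个示意实现,假设结果文件的格式与上文示例完全一致(`Threads=N`行之后跟随若干`模型名 min = ... max = ... average = ...`行):
+
+```python
+import re
+from collections import defaultdict
+
+
+def parse_benchmark(path):
+    """解析benchmark结果文件,返回 {模型名: {线程数: 平均耗时(ms)}}。"""
+    results = defaultdict(dict)
+    threads = None
+    with open(path) as f:
+        for line in f:
+            thread_match = re.search(r"Threads=(\d+)", line)
+            if thread_match:
+                # 记录当前小节对应的线程数
+                threads = int(thread_match.group(1))
+                continue
+            row_match = re.match(
+                r"(\S+)\s+min\s*=\s*[\d.]+\s+max\s*=\s*[\d.]+\s+average\s*=\s*([\d.]+)",
+                line)
+            if row_match and threads is not None:
+                results[row_match.group(1)][threads] = float(row_match.group(2))
+    return dict(results)
+
+
+print(parse_benchmark("result_armv8.txt"))
+```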
+### 2.4 模型优化与速度评估
+
+
+* 在2.3节中提到了在模型评估之前对其进行优化,在这里也可以首先对模型进行优化,再直接加载优化后的模型进行速度评估。
+
+* Paddle-Lite 提供了多种策略来自动优化原始的训练模型,其中包括量化、子图融合、混合调度、Kernel优选等方法。为了使优化过程更加方便易用,Paddle-Lite提供了opt工具来自动完成优化步骤,输出一个轻量的、最优的可执行模型。可以在[Paddle-Lite模型优化工具页面](https://paddle-lite.readthedocs.io/zh/latest/user_guides/model_optimize_tool.html)下载。在这里以`MacOS`开发环境为例,下载[opt_mac](https://paddlelite-data.bj.bcebos.com/model_optimize_tool/opt_mac)模型优化工具,并使用下面的命令对模型进行优化。
+
+
+
+```shell
+model_file="../MobileNetV1/model"
+param_file="../MobileNetV1/params"
+opt_models_dir="./opt_models"
+mkdir ${opt_models_dir}
+./opt_mac --model_file=${model_file} \
+    --param_file=${param_file} \
+    --valid_targets=arm \
+    --optimize_out_type=naive_buffer \
+    --prefer_int8_kernel=false \
+    --optimize_out=${opt_models_dir}/MobileNetV1
+```
+
+其中`model_file`与`param_file`分别是导出的inference模型结构文件与参数文件地址,转换成功后,会在`opt_models`文件夹下生成`MobileNetV1.nb`文件。
+
+
+
+
+使用benchmark_bin文件加载优化后的模型进行评估,具体的命令如下。
+
+```shell
+bash benchmark.sh ./benchmark_bin_v8 ./opt_models result_armv8.txt
+```
+
+最终`result_armv8.txt`中结果如下。
+
+```
+PaddleLite Benchmark
+Threads=1 Warmup=10 Repeats=30
+MobileNetV1_lite min = 30.89500 max = 30.78500 average = 30.84173
+
+Threads=2 Warmup=10 Repeats=30
+MobileNetV1_lite min = 18.25300 max = 18.11000 average = 18.18017
+
+Threads=4 Warmup=10 Repeats=30
+MobileNetV1_lite min = 10.00600 max = 9.90000 average = 9.96177
+```
+
+
+以线程数为1为例,MobileNetV1在骁龙855上的平均预测耗时为`30.84173ms`。
+
+
+更加具体的参数解释与Paddle-Lite使用方法可以参考 [Paddle-Lite 文档](https://paddle-lite.readthedocs.io/zh/latest/)。 diff --git a/docs/zh_CN/extension/paddle_serving.md b/docs/zh_CN/extension/paddle_serving.md index 62b8cda8b74283fea701a86f6f05e5194b0a6da8..7b9102042b5018ab972bbd2542526186b5b402df 100644 --- a/docs/zh_CN/extension/paddle_serving.md +++ b/docs/zh_CN/extension/paddle_serving.md @@ -1,4 +1,65 @@ # 模型服务化部署

-[Paddle Serving](https://github.com/PaddlePaddle/Serving) 旨在帮助深度学习开发者轻易部署在线预测服务,支持一键部署工业级的服务能力、客户端和服务端之间高并发和高效通信、并支持多种编程语言开发客户端等特点,详细使用请参考 [Paddle Serving 相关文档](https://github.com/PaddlePaddle/Serving)。
+## 一、简介
+[Paddle Serving](https://github.com/PaddlePaddle/Serving) 旨在帮助深度学习开发者轻易部署在线预测服务,支持一键部署工业级的服务能力、客户端和服务端之间高并发和高效通信、并支持多种编程语言开发客户端。
+该部分以HTTP预测服务部署为例,介绍怎样在PaddleClas中使用Paddle Serving部署模型服务。
+
+
+## 二、Serving安装
+
+Serving官网推荐使用docker安装并部署Serving环境。首先需要拉取docker环境并创建基于Serving的docker。
+
+```shell
+nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu
+nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu
+nvidia-docker exec -it test bash
+```
+
+进入docker后,需要安装Serving相关的python包。
+
+```shell
+pip install paddlepaddle-gpu
+pip install paddle-serving-client
+pip install paddle-serving-server-gpu
+```
+
+* 如果安装速度太慢,可以通过`-i https://pypi.tuna.tsinghua.edu.cn/simple`更换源,加速安装过程。
+
+* 如果希望部署CPU服务,可以安装serving-server的cpu版本,安装命令如下。
+
+```shell
+pip install paddle-serving-server
+```
+
+## 三、导出模型
+
+使用`tools/export_serving_model.py`脚本导出Serving模型,以`ResNet50_vd`为例,使用方法如下。
+
+```shell
+python tools/export_serving_model.py -m ResNet50_vd -p ./pretrained/ResNet50_vd_pretrained/ -o serving
+```
+
+最终在serving文件夹下会生成`ppcls_client_conf`与`ppcls_model`两个文件夹,分别存储了client配置、模型参数与结构文件。
+
+
+## 四、服务部署与请求
+
+* 使用下面的方式启动Serving服务。
+
+```shell
+python tools/serving/image_service_gpu.py serving/ppcls_model workdir 9292
+```
+
+其中`serving/ppcls_model`为刚才保存的Serving模型地址,`workdir`为工作目录,`9292`为服务的端口号。
+
+
+* 使用下面的脚本向Serving服务发送识别请求,并返回结果。
+
+```
+python tools/serving/image_http_client.py 9292 ./docs/images/logo.png
+```
+
+`9292`为发送请求的端口号,需要与服务启动时的端口号保持一致,`./docs/images/logo.png`为待识别的图像文件。最终返回Top1识别结果的类别ID以及概率值,客户端的大致实现逻辑可以参考下面的示意代码。
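+
+下面给出一个与`image_http_client.py`功能类似的示意客户端,仅作演示:其中的`/image/prediction`接口路径与`image`、`fetch`字段参照Serving的imagenet示例,实际请以`tools/serving`中的脚本和部署配置为准。
+
+```python
+import base64
+import json
+import sys
+
+import requests
+
+
+def predict(image_path, port=9292):
+    """读取图像并以base64编码后发送HTTP预测请求,返回服务端的JSON结果。"""
+    with open(image_path, "rb") as f:
+        image = base64.b64encode(f.read()).decode("utf-8")
+    # 请求体字段参照Serving imagenet示例,实际字段以部署脚本为准
+    req = {"image": image, "fetch": ["score"]}
+    resp = requests.post(
+        "http://127.0.0.1:{}/image/prediction".format(port),
+        data=json.dumps(req),
+        headers={"Content-Type": "application/json"})
+    return resp.json()
+
+
+if __name__ == "__main__":
+    # 用法:python image_http_client_demo.py 9292 ./docs/images/logo.png
+    print(predict(sys.argv[2], int(sys.argv[1])))
+```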
+* 更多的服务部署类型,如`RPC预测服务`等,可以参考Serving的github官网:[https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imagenet](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imagenet) diff --git a/docs/zh_CN/faq.md b/docs/zh_CN/faq.md index fe83d31ba03bf649aab1bff097d78874310fb4e6..36aff3ec2458c80802a9da2c2c9244efa3387dd9 100644 --- a/docs/zh_CN/faq.md +++ b/docs/zh_CN/faq.md @@ -1,10 +1,5 @@ # FAQ

->>
-* Q: 启动训练后,为什么当前终端中的输出信息一直没有更新?
-* A: 启动运行后,日志会实时输出到`mylog/workerlog.*`中,可以在这里查看实时的日志。
-
-
>>
* Q: 多卡评估时,为什么每张卡输出的精度指标不相同?
* A: 目前PaddleClas基于fleet api使用多卡,在多卡评估时,每张卡都是单独读取各自part的数据,不同卡中计算的图片是不同的,因此最终指标也会有微量差异,如果希望得到准确的评估指标,可以使用单卡评估。

@@ -18,3 +13,36 @@
>>
* Q: 评估和预测时,已经指定了预训练模型所在文件夹的地址,但是仍然无法导入参数,这是为什么呢?
* A: 加载预训练模型时,需要指定预训练模型的前缀,例如预训练模型参数所在的文件夹为`output/ResNet50_vd/19`,预训练模型参数的名称为`output/ResNet50_vd/19/ppcls.pdparams`,则`pretrained_model`参数需要指定为`output/ResNet50_vd/19/ppcls`,PaddleClas会自动补齐`.pdparams`的后缀。
+
+
+>>
+* Q: 在评测`EfficientNetB0_small`模型时,为什么最终的精度始终比官网的低0.3%左右?
+* A: `EfficientNet`系列的网络在进行resize的时候,是使用`cubic插值方式`(resize参数的interpolation值设置为2),而其他模型默认情况下为None,因此在训练和评估的时候需要显式地指定resize的interpolation值。具体地,可以参考以下配置中预处理过程中ResizeImage的参数。
+```
+VALID:
+    batch_size: 16
+    num_workers: 4
+    file_list: "./dataset/ILSVRC2012/val_list.txt"
+    data_dir: "./dataset/ILSVRC2012/"
+    shuffle_seed: 0
+    transforms:
+        - DecodeImage:
+            to_rgb: True
+            to_np: False
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+            interpolation: 2
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+        - ToCHWImage:
+```
+
+>>
+* Q: 如果想将保存的`pdparams`模型参数文件转换为早期版本(Paddle1.7.0之前)的零碎文件(每个文件均为一个单独的模型参数),该怎么实现呢?
+* A: 可以首先导入`pdparams`模型,之后使用`fluid.io.save_vars`函数将模型保存为零散的碎文件,可参考下面的示意代码。
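+
+下面是一个最小化的示意脚本,演示"加载合并保存的`pdparams`参数,再用`fluid.io.save_vars`保存为零碎文件"的流程。其中`build_model`是假设的组网函数,实际使用时必须替换为与训练时完全一致的PaddleClas组网,否则参数名对不上会导致加载失败:
+
+```python
+import paddle.fluid as fluid
+
+
+def build_model():
+    # 假设的组网函数:此处仅用一个简单的全连接层示意,
+    # 实际请替换为与pdparams对应的真实网络结构
+    image = fluid.data(name="image", shape=[None, 3, 224, 224], dtype="float32")
+    return fluid.layers.fc(input=image, size=1000)
+
+
+build_model()
+exe = fluid.Executor(fluid.CPUPlace())
+exe.run(fluid.default_startup_program())
+
+main_prog = fluid.default_main_program()
+# 加载合并保存的参数,路径前缀的写法与上文FAQ中的pretrained_model一致
+fluid.load(main_prog, "output/ResNet50_vd/19/ppcls", exe)
+
+# 将每个Parameter保存为一个单独的文件(早期版本的零碎参数格式)
+fluid.io.save_vars(
+    executor=exe,
+    dirname="./pretrained_separate",
+    main_program=main_prog,
+    predicate=lambda var: isinstance(var, fluid.framework.Parameter))
+```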
diff --git a/docs/zh_CN/models/DPN_DenseNet.md b/docs/zh_CN/models/DPN_DenseNet.md index 53092bd74014c42fae872cd20e0ea24ff858989c..25f61476e43d752f2426c000885166963de7ebc4 100644 --- a/docs/zh_CN/models/DPN_DenseNet.md +++ b/docs/zh_CN/models/DPN_DenseNet.md @@ -4,13 +4,15 @@
DenseNet是2017年CVPR best paper提出的一种新的网络结构,该网络设计了一种新的跨层连接的block,即dense-block。相比ResNet中的bottleneck,dense-block设计了一个更激进的密集连接机制,即互相连接所有的层,每个层都会接受其前面所有层作为其额外的输入。DenseNet将所有的dense-block堆叠,组合成了一个密集连接型网络。密集的连接方式使得DenseNet更容易进行梯度的反向传播,网络也因此更容易训练。

DPN的全称是Dual Path Networks,即双通道网络。该网络是由DenseNet和ResNeXt结合的一个网络,其证明了DenseNet能从靠前的层级中提取到新的特征,而ResNeXt本质上是对之前层级中已提取特征的复用。作者进一步分析发现,ResNeXt对特征有高复用率,但冗余度低,DenseNet能创造新特征,但冗余度高。结合二者结构的优势,作者设计了DPN网络。最终DPN网络在同样FLOPS和参数量下,取得了比ResNeXt与DenseNet更好的结果。

-该系列模型的FLOPS、参数量以及FP32预测耗时如下图所示。
+该系列模型的FLOPS、参数量以及T4 GPU上的预测耗时如下图所示。

-![](../../images/models/DPN.png.flops.png)
+![](../../images/models/T4_benchmark/t4.fp32.bs4.DPN.flops.png)

-![](../../images/models/DPN.png.params.png)
+![](../../images/models/T4_benchmark/t4.fp32.bs4.DPN.params.png)

-![](../../images/models/DPN.png.fp32.png)
+![](../../images/models/T4_benchmark/t4.fp32.bs4.DPN.png)
+
+![](../../images/models/T4_benchmark/t4.fp16.bs4.DPN.png)

目前PaddleClas开源的这两类模型的预训练模型一共有10个,其指标如上图所示,可以看到,在相同的FLOPS和参数量下,相比DenseNet,DPN拥有更高的精度。但是由于DPN有更多的分支,所以其推理速度要慢于DenseNet。由于DenseNet264的网络层数最深,所以该网络是DenseNet系列模型中参数量最大的网络,DenseNet161的网络宽度最大,导致其是该系列网络中计算量最大、精度最高的网络。从推理速度来看,计算量大且精度高的DenseNet161比DenseNet264具有更快的速度,所以其比DenseNet264具有更大的优势。

@@ -34,9 +36,9 @@ DPN的全称是Dual Path Networks,即双通道网络。该网络是由DenseNet

-## FP32预测速度
+## 基于V100 GPU的预测速度

-| Models | Crop Size | Resize Short Size | Batch Size=1<br>
(ms) |
+| Models | Crop Size | Resize Short Size | FP32<br>
Batch Size=1
(ms) | |-------------|-----------|-------------------|--------------------------| | DenseNet121 | 224 | 256 | 4.371 | | DenseNet161 | 224 | 256 | 8.863 | @@ -48,3 +50,20 @@ DPN的全称是Dual Path Networks,即双通道网络。该网络是由DenseNet | DPN98 | 224 | 256 | 21.057 | | DPN107 | 224 | 256 | 28.685 | | DPN131 | 224 | 256 | 28.083 | + + + +## 基于T4 GPU的预测速度 + +| Models | Crop Size | Resize Short Size | FP16
Batch Size=1
(ms) | FP16
Batch Size=4
(ms) | FP16
Batch Size=8
(ms) | FP32
Batch Size=1
(ms) | FP32
Batch Size=4
(ms) | FP32
Batch Size=8
(ms) | +|-------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------| +| DenseNet121 | 224 | 256 | 4.16436 | 7.2126 | 10.50221 | 4.40447 | 9.32623 | 15.25175 | +| DenseNet161 | 224 | 256 | 9.27249 | 14.25326 | 20.19849 | 10.39152 | 22.15555 | 35.78443 | +| DenseNet169 | 224 | 256 | 6.11395 | 10.28747 | 13.68717 | 6.43598 | 12.98832 | 20.41964 | +| DenseNet201 | 224 | 256 | 7.9617 | 13.4171 | 17.41949 | 8.20652 | 17.45838 | 27.06309 | +| DenseNet264 | 224 | 256 | 11.70074 | 19.69375 | 24.79545 | 12.14722 | 26.27707 | 40.01905 | +| DPN68 | 224 | 256 | 11.7827 | 13.12652 | 16.19213 | 11.64915 | 12.82807 | 18.57113 | +| DPN92 | 224 | 256 | 18.56026 | 20.35983 | 29.89544 | 18.15746 | 23.87545 | 38.68821 | +| DPN98 | 224 | 256 | 21.70508 | 24.7755 | 40.93595 | 21.18196 | 33.23925 | 62.77751 | +| DPN107 | 224 | 256 | 27.84462 | 34.83217 | 60.67903 | 27.62046 | 52.65353 | 100.11721 | +| DPN131 | 224 | 256 | 28.58941 | 33.01078 | 55.65146 | 28.33119 | 46.19439 | 89.24904 | diff --git a/docs/zh_CN/models/EfficientNet_and_ResNeXt101_wsl.md b/docs/zh_CN/models/EfficientNet_and_ResNeXt101_wsl.md index eadd17683d0219b4ee72d7aba71dc04743896ea5..3cbe4009dfd17cbc88ead5f3bb91d4ed9cc7470d 100644 --- a/docs/zh_CN/models/EfficientNet_and_ResNeXt101_wsl.md +++ b/docs/zh_CN/models/EfficientNet_and_ResNeXt101_wsl.md @@ -6,13 +6,15 @@ EfficientNet是Google于2019年发布的一个基于NAS的轻量级网络,其 ResNeXt是facebook于2016年提出的一种对ResNet的改进版网络。在2019年,facebook通过弱监督学习研究了该系列网络在ImageNet上的精度上限,为了区别之前的ResNeXt网络,该系列网络的后缀为wsl,其中wsl是弱监督学习(weakly-supervised-learning)的简称。为了能有更强的特征提取能力,研究者将其网络宽度进一步放大,其中最大的ResNeXt101_32x48d_wsl拥有8亿个参数,将其在9.4亿的弱标签图片下训练并在ImageNet-1k上做finetune,最终在ImageNet-1k的top-1达到了85.4%,这也是迄今为止在ImageNet-1k的数据集上以224x224的分辨率下精度最高的网络。Fix-ResNeXt中,作者使用了更大的图像分辨率,针对训练图片和验证图片数据预处理不一致的情况下做了专门的Fix策略,并使得ResNeXt101_32x48d_wsl拥有了更高的精度,由于其用到了Fix策略,故命名为Fix-ResNeXt101_32x48d_wsl。 -该系列模型的FLOPS、参数量以及FP32预测耗时如下图所示。 +该系列模型的FLOPS、参数量以及T4 GPU上的预测耗时如下图所示。 -![](../../images/models/EfficientNet.png.flops.png) +![](../../images/models/T4_benchmark/t4.fp32.bs4.EfficientNet.flops.png) -![](../../images/models/EfficientNet.png.params.png) +![](../../images/models/T4_benchmark/t4.fp32.bs4.EfficientNet.params.png) -![](../../images/models/EfficientNet.png.fp32.png) +![](../../images/models/T4_benchmark/t4.fp32.bs1.EfficientNet.png) + +![](../../images/models/T4_benchmark/t4.fp16.bs1.EfficientNet.png) 目前PaddleClas开源的这两类模型的预训练模型一共有14个。从上图中可以看出EfficientNet系列网络优势非常明显,ResNeXt101_wsl系列模型由于用到了更多的数据,最终的精度也更高。EfficientNet_B0_Small是去掉了SE_block的EfficientNet_B0,其具有更快的推理速度。 @@ -36,9 +38,9 @@ ResNeXt是facebook于2016年提出的一种对ResNet的改进版网络。在2019 | EfficientNetB0_
small | 0.758 | 0.926 | | | 0.720 | 4.650 | -## FP32预测速度 +## 基于V100 GPU的预测速度 -| Models | Crop Size | Resize Short Size | Batch Size=1
(ms) | +| Models | Crop Size | Resize Short Size | FP32
Batch Size=1
(ms) | |-------------------------------|-----------|-------------------|--------------------------| | ResNeXt101_
32x8d_wsl | 224 | 256 | 19.127 | | ResNeXt101_
32x16d_wsl | 224 | 256 | 23.629 | @@ -54,3 +56,24 @@ ResNeXt是facebook于2016年提出的一种对ResNet的改进版网络。在2019 | EfficientNetB6 | 528 | 560 | 18.381 | | EfficientNetB7 | 600 | 632 | 27.817 | | EfficientNetB0_
small | 224 | 256 | 1.692 | + + + +## 基于T4 GPU的预测速度 + +| Models | Crop Size | Resize Short Size | FP16
Batch Size=1
(ms) | FP16
Batch Size=4
(ms) | FP16
Batch Size=8
(ms) | FP32
Batch Size=1
(ms) | FP32
Batch Size=4
(ms) | FP32
Batch Size=8
(ms) | +|---------------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------| +| ResNeXt101_
32x8d_wsl | 224 | 256 | 18.19374 | 21.93529 | 34.67802 | 18.52528 | 34.25319 | 67.2283 | +| ResNeXt101_
32x16d_wsl | 224 | 256 | 18.52609 | 36.8288 | 62.79947 | 25.60395 | 71.88384 | 137.62327 | +| ResNeXt101_
32x32d_wsl | 224 | 256 | 33.51391 | 70.09682 | 125.81884 | 54.87396 | 160.04337 | 316.17718 | +| ResNeXt101_
32x48d_wsl | 224 | 256 | 50.97681 | 137.60926 | 190.82628 | 99.01698256 | 315.91261 | 551.83695 | +| Fix_ResNeXt101_
32x48d_wsl | 320 | 320 | 78.62869 | 191.76039 | 317.15436 | 160.0838242 | 595.99296 | 1151.47384 | +| EfficientNetB0 | 224 | 256 | 3.40122 | 5.95851 | 9.10801 | 3.442 | 6.11476 | 9.3304 | +| EfficientNetB1 | 240 | 272 | 5.25172 | 9.10233 | 14.11319 | 5.3322 | 9.41795 | 14.60388 | +| EfficientNetB2 | 260 | 292 | 5.91052 | 10.5898 | 17.38106 | 6.29351 | 10.95702 | 17.75308 | +| EfficientNetB3 | 300 | 332 | 7.69582 | 16.02548 | 27.4447 | 7.67749 | 16.53288 | 28.5939 | +| EfficientNetB4 | 380 | 412 | 11.55585 | 29.44261 | 53.97363 | 12.15894 | 30.94567 | 57.38511 | +| EfficientNetB5 | 456 | 488 | 19.63083 | 56.52299 | - | 20.48571 | 61.60252 | - | +| EfficientNetB6 | 528 | 560 | 30.05911 | - | - | 32.62402 | - | - | +| EfficientNetB7 | 600 | 632 | 47.86087 | - | - | 53.93823 | - | - | +| EfficientNetB0_small | 224 | 256 | 2.39166 | 4.36748 | 6.96002 | 2.3076 | 4.71886 | 7.21888 | diff --git a/docs/zh_CN/models/HRNet.md b/docs/zh_CN/models/HRNet.md index c33fb0fa025fd9b7f1993186d622c2acfdd22acb..f694f7b0c1d6d6c9b195fa61aa0cc9544564859d 100644 --- a/docs/zh_CN/models/HRNet.md +++ b/docs/zh_CN/models/HRNet.md @@ -3,13 +3,16 @@ ## 概述 HRNet是2019年由微软亚洲研究院提出的一种全新的神经网络,不同于以往的卷积神经网络,该网络在网络深层仍然可以保持高分辨率,因此预测的关键点热图更准确,在空间上也更精确。此外,该网络在对分辨率敏感的其他视觉任务中,如检测、分割等,表现尤为优异。 -该系列模型的FLOPS、参数量以及FP32预测耗时如下图所示。 +该系列模型的FLOPS、参数量以及T4 GPU上的预测耗时如下图所示。 -![](../../images/models/HRNet.png.flops.png) +![](../../images/models/T4_benchmark/t4.fp32.bs4.HRNet.flops.png) -![](../../images/models/HRNet.png.params.png) +![](../../images/models/T4_benchmark/t4.fp32.bs4.HRNet.params.png) + +![](../../images/models/T4_benchmark/t4.fp32.bs4.HRNet.png) + +![](../../images/models/T4_benchmark/t4.fp16.bs4.HRNet.png) -![](../../images/models/HRNet.png.fp32.png) 目前PaddleClas开源的这类模型的预训练模型一共有7个,其指标如图所示,其中HRNet_W48_C指标精度异常的原因可能是因为网络训练的正常波动。 @@ -26,9 +29,9 @@ HRNet是2019年由微软亚洲研究院提出的一种全新的神经网络, | HRNet_W64_C | 0.793 | 0.946 | 0.795 | 0.946 | 57.830 | 128.060 | -## FP32预测速度 +## 基于V100 GPU的预测速度 -| Models | Crop Size | Resize Short Size | Batch Size=1
(ms) | +| Models | Crop Size | Resize Short Size | FP32
Batch Size=1
(ms) | |-------------|-----------|-------------------|--------------------------| | HRNet_W18_C | 224 | 256 | 7.368 | | HRNet_W30_C | 224 | 256 | 9.402 | @@ -37,3 +40,18 @@ HRNet是2019年由微软亚洲研究院提出的一种全新的神经网络, | HRNet_W44_C | 224 | 256 | 11.497 | | HRNet_W48_C | 224 | 256 | 12.165 | | HRNet_W64_C | 224 | 256 | 15.003 | + + + + +## 基于T4 GPU的预测速度 + +| Models | Crop Size | Resize Short Size | FP16
Batch Size=1
(ms) | FP16
Batch Size=4
(ms) | FP16
Batch Size=8
(ms) | FP32
Batch Size=1
(ms) | FP32
Batch Size=4
(ms) | FP32
Batch Size=8
(ms) | +|-------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------| +| HRNet_W18_C | 224 | 256 | 6.79093 | 11.50986 | 17.67244 | 7.40636 | 13.29752 | 23.33445 | +| HRNet_W30_C | 224 | 256 | 8.98077 | 14.08082 | 21.23527 | 9.57594 | 17.35485 | 32.6933 | +| HRNet_W32_C | 224 | 256 | 8.82415 | 14.21462 | 21.19804 | 9.49807 | 17.72921 | 32.96305 | +| HRNet_W40_C | 224 | 256 | 11.4229 | 19.1595 | 30.47984 | 12.12202 | 25.68184 | 48.90623 | +| HRNet_W44_C | 224 | 256 | 12.25778 | 22.75456 | 32.61275 | 13.19858 | 32.25202 | 59.09871 | +| HRNet_W48_C | 224 | 256 | 12.65015 | 23.12886 | 33.37859 | 13.70761 | 34.43572 | 63.01219 | +| HRNet_W64_C | 224 | 256 | 15.10428 | 27.68901 | 40.4198 | 17.57527 | 47.9533 | 97.11228 | diff --git a/docs/zh_CN/models/Inception.md b/docs/zh_CN/models/Inception.md index 8c9333656b1e78199b1f29feff17ef5d15c593b8..b85c2bf1b5068936daa3091215540b866d4d31b3 100644 --- a/docs/zh_CN/models/Inception.md +++ b/docs/zh_CN/models/Inception.md @@ -9,13 +9,15 @@ Xception 是 Google 继 Inception 后提出的对 InceptionV3 的另一种改进 InceptionV4是2016年由Google设计的新的神经网络,当时残差结构风靡一时,但是作者认为仅使用Inception 结构也可以达到很高的性能。InceptionV4使用了更多的Inception module,在ImageNet上的精度再创新高。 -该系列模型的FLOPS、参数量以及FP32预测耗时如下图所示。 +该系列模型的FLOPS、参数量以及T4 GPU上的预测耗时如下图所示。 -![](../../images/models/Inception.png.flops.png) +![](../../images/models/T4_benchmark/t4.fp32.bs4.Inception.flops.png) -![](../../images/models/Inception.png.params.png) +![](../../images/models/T4_benchmark/t4.fp32.bs4.Inception.params.png) -![](../../images/models/Inception.png.fp32.png) +![](../../images/models/T4_benchmark/t4.fp32.bs4.Inception.png) + +![](../../images/models/T4_benchmark/t4.fp16.bs4.Inception.png) 上图反映了Xception系列和InceptionV4的精度和其他指标的关系。其中Xception_deeplab与论文结构保持一致,Xception是PaddleClas的改进模型,在预测速度基本不变的情况下,精度提升约0.6%。关于该改进模型的详细介绍正在持续更新中,敬请期待。 @@ -35,14 +37,28 @@ InceptionV4是2016年由Google设计的新的神经网络,当时残差结构 -## FP32预测速度 +## 基于V100 GPU的预测速度 -| Models | Crop Size | Resize Short Size | Batch Size=1
(ms) | +| Models | Crop Size | Resize Short Size | FP32
Batch Size=1
(ms) | |------------------------|-----------|-------------------|--------------------------| | GoogLeNet | 224 | 256 | 1.807 | | Xception41 | 299 | 320 | 3.972 | -| Xception41
_deeplab | 299 | 320 | 4.408 | +| Xception41_
deeplab | 299 | 320 | 4.408 | | Xception65 | 299 | 320 | 6.174 | -| Xception65
_deeplab | 299 | 320 | 6.464 | +| Xception65_
deeplab | 299 | 320 | 6.464 | | Xception71 | 299 | 320 | 6.782 | | InceptionV4 | 299 | 320 | 11.141 | + + + +## 基于T4 GPU的预测速度 + +| Models | Crop Size | Resize Short Size | FP16
Batch Size=1
(ms) | FP16
Batch Size=4
(ms) | FP16
Batch Size=8
(ms) | FP32
Batch Size=1
(ms) | FP32
Batch Size=4
(ms) | FP32
Batch Size=8
(ms) | +|--------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------| +| GoogLeNet | 299 | 320 | 1.75451 | 3.39931 | 4.71909 | 1.88038 | 4.48882 | 6.94035 | +| Xception41 | 299 | 320 | 2.91192 | 7.86878 | 15.53685 | 4.96939 | 17.01361 | 32.67831 | +| Xception41_
deeplab | 299 | 320 | 2.85934 | 7.2075 | 14.01406 | 5.33541 | 17.55938 | 33.76232 | +| Xception65 | 299 | 320 | 4.30126 | 11.58371 | 23.22213 | 7.26158 | 25.88778 | 53.45426 | +| Xception65_
deeplab | 299 | 320 | 4.06803 | 9.72694 | 19.477 | 7.60208 | 26.03699 | 54.74724 | +| Xception71 | 299 | 320 | 4.80889 | 13.5624 | 27.18822 | 8.72457 | 31.55549 | 69.31018 | +| InceptionV4 | 299 | 320 | 9.50821 | 13.72104 | 20.27447 | 12.99342 | 25.23416 | 43.56121 | diff --git a/docs/zh_CN/models/Mobile.md b/docs/zh_CN/models/Mobile.md index 6cf4ed90241ca959e7c5b66f9a14a8c415e7dd87..3c0ebe37693b5564fbc654d99948d69b04b97a85 100644 --- a/docs/zh_CN/models/Mobile.md +++ b/docs/zh_CN/models/Mobile.md @@ -8,10 +8,16 @@ MobileNetV2是Google继MobileNetV1提出的一种轻量级网络。相比MobileN ShuffleNet系列网络是旷视提出的轻量化网络结构,到目前为止,该系列网络一共有两种典型的结构,即ShuffleNetV1与ShuffleNetV2。ShuffleNet中的Channel Shuffle操作可以将组间的信息进行交换,并且可以实现端到端的训练。在ShuffleNetV2的论文中,作者提出了设计轻量级网络的四大准则,并且根据四大准则与ShuffleNetV1的不足,设计了ShuffleNetV2网络。 MobileNetV3是Google于2019年提出的一种基于NAS的新的轻量级网络,为了进一步提升效果,将relu和sigmoid激活函数分别替换为hard_swish与hard_sigmoid激活函数,同时引入了一些专门减小网络计算量的改进策略。 + ![](../../images/models/mobile_arm_top1.png) + ![](../../images/models/mobile_arm_storage.png) -![](../../images/models/mobile_trt.png.flops.png) -![](../../images/models/mobile_trt.png.params.png) + +![](../../images/models/T4_benchmark/t4.fp32.bs4.mobile_trt.flops.png) + +![](../../images/models/T4_benchmark/t4.fp32.bs4.mobile_trt.params.png) + + 目前PaddleClas开源的的移动端系列的预训练模型一共有32个,其指标如图所示。从图片可以看出,越新的轻量级模型往往有更优的表现,MobileNetV3代表了目前最新的轻量级神经网络结构。在MobileNetV3中,作者为了获得更高的精度,在global-avg-pooling后使用了1x1的卷积。该操作大幅提升了参数量但对计算量影响不大,所以如果从存储角度评价模型的优异程度,MobileNetV3优势不是很大,但由于其更小的计算量,使得其有更快的推理速度。此外,我们模型库中的ssld蒸馏模型表现优异,从各个考量角度下,都刷新了当前轻量级模型的精度。由于MobileNetV3模型结构复杂,分支较多,对GPU并不友好,GPU预测速度不如MobileNetV1。 @@ -53,9 +59,9 @@ MobileNetV3是Google于2019年提出的一种基于NAS的新的轻量级网络 | ShuffleNetV2_swish | 0.700 | 0.892 | | | 0.290 | 2.260 | -## CPU预测速度和存储大小 +## 基于SD855的预测速度和存储大小 -| Models | batch_size=1(ms) | Storage Size(M) | +| Models | Batch Size=1(ms) | Storage Size(M) | |:--:|:--:|:--:| | MobileNetV1_x0_25 | 3.220 | 1.900 | | MobileNetV1_x0_5 | 9.580 | 5.200 | @@ -89,3 +95,40 @@ MobileNetV3是Google于2019年提出的一种基于NAS的新的轻量级网络 | ShuffleNetV2_x1_5 | 19.352 | 14.000 | | ShuffleNetV2_x2_0 | 34.770 | 28.000 | | ShuffleNetV2_swish | 16.023 | 9.100 | + + +## 基于T4 GPU的预测速度 + +| Models | FP16
Batch Size=1
(ms) | FP16
Batch Size=4
(ms) | FP16
Batch Size=8
(ms) | FP32
Batch Size=1
(ms) | FP32
Batch Size=4
(ms) | FP32
Batch Size=8
(ms) | +|-----------------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------| +| MobileNetV1_x0_25 | 0.68422 | 1.13021 | 1.72095 | 0.67274 | 1.226 | 1.84096 | +| MobileNetV1_x0_5 | 0.69326 | 1.09027 | 1.84746 | 0.69947 | 1.43045 | 2.39353 | +| MobileNetV1_x0_75 | 0.6793 | 1.29524 | 2.15495 | 0.79844 | 1.86205 | 3.064 | +| MobileNetV1 | 0.71942 | 1.45018 | 2.47953 | 0.91164 | 2.26871 | 3.90797 | +| MobileNetV1_ssld | 0.71942 | 1.45018 | 2.47953 | 0.91164 | 2.26871 | 3.90797 | +| MobileNetV2_x0_25 | 2.85399 | 3.62405 | 4.29952 | 2.81989 | 3.52695 | 4.2432 | +| MobileNetV2_x0_5 | 2.84258 | 3.1511 | 4.10267 | 2.80264 | 3.65284 | 4.31737 | +| MobileNetV2_x0_75 | 2.82183 | 3.27622 | 4.98161 | 2.86538 | 3.55198 | 5.10678 | +| MobileNetV2 | 2.78603 | 3.71982 | 6.27879 | 2.62398 | 3.54429 | 6.41178 | +| MobileNetV2_x1_5 | 2.81852 | 4.87434 | 8.97934 | 2.79398 | 5.30149 | 9.30899 | +| MobileNetV2_x2_0 | 3.65197 | 6.32329 | 11.644 | 3.29788 | 7.08644 | 12.45375 | +| MobileNetV2_ssld | 2.78603 | 3.71982 | 6.27879 | 2.62398 | 3.54429 | 6.41178 | +| MobileNetV3_large_x1_25 | 2.34387 | 3.16103 | 4.79742 | 2.35117 | 3.44903 | 5.45658 | +| MobileNetV3_large_x1_0 | 2.20149 | 3.08423 | 4.07779 | 2.04296 | 2.9322 | 4.53184 | +| MobileNetV3_large_x0_75 | 2.1058 | 2.61426 | 3.61021 | 2.0006 | 2.56987 | 3.78005 | +| MobileNetV3_large_x0_5 | 2.06934 | 2.77341 | 3.35313 | 2.11199 | 2.88172 | 3.19029 | +| MobileNetV3_large_x0_35 | 2.14965 | 2.7868 | 3.36145 | 1.9041 | 2.62951 | 3.26036 | +| MobileNetV3_small_x1_25 | 2.06817 | 2.90193 | 3.5245 | 2.02916 | 2.91866 | 3.34528 | +| MobileNetV3_small_x1_0 | 1.73933 | 2.59478 | 3.40276 | 1.74527 | 2.63565 | 3.28124 | +| MobileNetV3_small_x0_75 | 1.80617 | 2.64646 | 3.24513 | 1.93697 | 2.64285 | 3.32797 | +| MobileNetV3_small_x0_5 | 1.95001 | 2.74014 | 3.39485 | 1.88406 | 2.99601 | 3.3908 | +| MobileNetV3_small_x0_35 | 2.10683 | 2.94267 | 3.44254 | 1.94427 | 2.94116 | 3.41082 | +| MobileNetV3_large_x1_0_ssld | 2.20149 | 3.08423 | 4.07779 | 2.04296 | 2.9322 | 4.53184 | +| MobileNetV3_small_x1_0_ssld | 1.73933 | 2.59478 | 3.40276 | 1.74527 | 2.63565 | 3.28124 | +| ShuffleNetV2 | 1.95064 | 2.15928 | 2.97169 | 1.89436 | 2.26339 | 3.17615 | +| ShuffleNetV2_x0_25 | 1.43242 | 2.38172 | 2.96768 | 1.48698 | 2.29085 | 2.90284 | +| ShuffleNetV2_x0_33 | 1.69008 | 2.65706 | 2.97373 | 1.75526 | 2.85557 | 3.09688 | +| ShuffleNetV2_x0_5 | 1.48073 | 2.28174 | 2.85436 | 1.59055 | 2.18708 | 3.09141 | +| ShuffleNetV2_x1_5 | 1.51054 | 2.4565 | 3.41738 | 1.45389 | 2.5203 | 3.99872 | +| ShuffleNetV2_x2_0 | 1.95616 | 2.44751 | 4.19173 | 2.15654 | 3.18247 | 5.46893 | +| ShuffleNetV2_swish | 2.50213 | 2.92881 | 3.474 | 2.5129 | 2.97422 | 3.69357 | diff --git a/docs/zh_CN/models/Others.md b/docs/zh_CN/models/Others.md index 35cabf68d5e2d05ec336d7e5cdfac3d363477971..c24f76652bc5e322df2533bbf8c59889bb420910 100644 --- a/docs/zh_CN/models/Others.md +++ b/docs/zh_CN/models/Others.md @@ -27,10 +27,10 @@ DarkNet53是YOLO作者在论文设计的用于目标检测的backbone,该网 -## FP32预测速度 +## 基于V100 GPU的预测速度 -| Models | Crop Size | Resize Short Size | Batch Size=1
(ms) | +| Models | Crop Size | Resize Short Size | FP32
Batch Size=1
(ms) | |---------------------------|-----------|-------------------|----------------------| | AlexNet | 224 | 256 | 1.176 | | SqueezeNet1_0 | 224 | 256 | 0.860 | @@ -41,3 +41,20 @@ DarkNet53是YOLO作者在论文设计的用于目标检测的backbone,该网 | VGG19 | 224 | 256 | 3.076 | | DarkNet53 | 256 | 256 | 3.139 | | ResNet50_ACNet
_deploy | 224 | 256 | 5.626 | + + + +## 基于T4 GPU的预测速度 + +| Models | Crop Size | Resize Short Size | FP16
Batch Size=1
(ms) | FP16
Batch Size=4
(ms) | FP16
Batch Size=8
(ms) | FP32
Batch Size=1
(ms) | FP32
Batch Size=4
(ms) | FP32
Batch Size=8
(ms) | +|-----------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------| +| AlexNet | 224 | 256 | 1.06447 | 1.70435 | 2.38402 | 1.44993 | 2.46696 | 3.72085 | +| SqueezeNet1_0 | 224 | 256 | 0.97162 | 2.06719 | 3.67499 | 0.96736 | 2.53221 | 4.54047 | +| SqueezeNet1_1 | 224 | 256 | 0.81378 | 1.62919 | 2.68044 | 0.76032 | 1.877 | 3.15298 | +| VGG11 | 224 | 256 | 2.24408 | 4.67794 | 7.6568 | 3.90412 | 9.51147 | 17.14168 | +| VGG13 | 224 | 256 | 2.58589 | 5.82708 | 10.03591 | 4.64684 | 12.61558 | 23.70015 | +| VGG16 | 224 | 256 | 3.13237 | 7.19257 | 12.50913 | 5.61769 | 16.40064 | 32.03939 | +| VGG19 | 224 | 256 | 3.69987 | 8.59168 | 15.07866 | 6.65221 | 20.4334 | 41.55902 | +| DarkNet53 | 256 | 256 | 3.18101 | 5.88419 | 10.14964 | 4.10829 | 12.1714 | 22.15266 | +| ResNet50_ACNet | 256 | 256 | 3.89002 | 4.58195 | 9.01095 | 5.33395 | 10.96843 | 18.70368 | +| ResNet50_ACNet_deploy | 224 | 256 | 2.6823 | 5.944 | 7.16655 | 3.49161 | 7.78374 | 13.94361 | diff --git a/docs/zh_CN/models/ResNet_and_vd.md b/docs/zh_CN/models/ResNet_and_vd.md index bc2946e99bc270c084c146808188da53c475ebae..ea045f12ca545ca0cea2229fcfc1993fd50ec77b 100644 --- a/docs/zh_CN/models/ResNet_and_vd.md +++ b/docs/zh_CN/models/ResNet_and_vd.md @@ -10,18 +10,19 @@ ResNet系列模型是在2015年提出的,一举在ILSVRC2015比赛中取得冠 其中,ResNet50_vd_v2与ResNet50_vd_ssld采用了知识蒸馏,保证模型结构不变的情况下,进一步提升了模型的精度,具体地,ResNet50_vd_v2的teacher模型是ResNet152_vd(top1准确率80.59%),数据选用的是ImageNet-1k的训练集,ResNet50_vd_ssld的teacher模型是ResNeXt101_32x16d_wsl(top1准确率84.2%),数据选用结合了ImageNet-1k的训练集和ImageNet-22k挖掘的400万数据。知识蒸馏的具体方法正在持续更新中。 +该系列模型的FLOPS、参数量以及T4 GPU上的预测耗时如下图所示。 -该系列模型的FLOPS、参数量以及FP32预测耗时如下图所示。 +![](../../images/models/T4_benchmark/t4.fp32.bs4.ResNet.flops.png) -![](../../images/models/ResNet.png.flops.png) +![](../../images/models/T4_benchmark/t4.fp32.bs4.ResNet.params.png) -![](../../images/models/ResNet.png.params.png) +![](../../images/models/T4_benchmark/t4.fp32.bs4.ResNet.png) + +![](../../images/models/T4_benchmark/t4.fp16.bs4.ResNet.png) -![](../../images/models/ResNet.png.fp32.png) 通过上述曲线可以看出,层数越多,准确率越高,但是相应的参数量、计算量和延时都会增加。ResNet50_vd_ssld通过用更强的teacher和更多的数据,将其在ImageNet-1k上的验证集top-1精度进一步提高,达到了82.39%,刷新了ResNet50系列模型的精度。 -**注意**:所有模型在预测时,图像的crop_size设置为224,resize_short_size设置为256。 ## 精度、FLOPS和参数量 @@ -46,9 +47,9 @@ ResNet系列模型是在2015年提出的,一举在ILSVRC2015比赛中取得冠 -## FP32预测速度 +## 基于V100 GPU的预测速度 -| Models | Crop Size | Resize Short Size | Batch Size=1
(ms) | +| Models | Crop Size | Resize Short Size | FP32
Batch Size=1
(ms) |
 |------------------|-----------|-------------------|--------------------------|
 | ResNet18 | 224 | 256 | 1.499 |
 | ResNet18_vd | 224 | 256 | 1.603 |
@@ -65,3 +66,24 @@ The ResNet series was proposed in 2015 and took first place in the ILSVRC2015 c
 | ResNet200_vd | 224 | 256 | 8.885 |
 | ResNet50_vd_ssld | 224 | 256 | 3.165 |
 | ResNet101_vd_ssld | 224 | 256 | 5.252 |
+
+
+## Inference speed on T4 GPU
+
+| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
+|-------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
+| ResNet18 | 224 | 256 | 1.3568 | 2.5225 | 3.61904 | 1.45606 | 3.56305 | 6.28798 |
+| ResNet18_vd | 224 | 256 | 1.39593 | 2.69063 | 3.88267 | 1.54557 | 3.85363 | 6.88121 |
+| ResNet34 | 224 | 256 | 2.23092 | 4.10205 | 5.54904 | 2.34957 | 5.89821 | 10.73451 |
+| ResNet34_vd | 224 | 256 | 2.23992 | 4.22246 | 5.79534 | 2.43427 | 6.22257 | 11.44906 |
+| ResNet50 | 224 | 256 | 2.63824 | 4.63802 | 7.02444 | 3.47712 | 7.84421 | 13.90633 |
+| ResNet50_vc | 224 | 256 | 2.67064 | 4.72372 | 7.17204 | 3.52346 | 8.10725 | 14.45577 |
+| ResNet50_vd | 224 | 256 | 2.65164 | 4.84109 | 7.46225 | 3.53131 | 8.09057 | 14.45965 |
+| ResNet50_vd_v2 | 224 | 256 | 2.65164 | 4.84109 | 7.46225 | 3.53131 | 8.09057 | 14.45965 |
+| ResNet101 | 224 | 256 | 5.04037 | 7.73673 | 10.8936 | 6.07125 | 13.40573 | 24.3597 |
+| ResNet101_vd | 224 | 256 | 5.05972 | 7.83685 | 11.34235 | 6.11704 | 13.76222 | 25.11071 |
+| ResNet152 | 224 | 256 | 7.28665 | 10.62001 | 14.90317 | 8.50198 | 19.17073 | 35.78384 |
+| ResNet152_vd | 224 | 256 | 7.29127 | 10.86137 | 15.32444 | 8.54376 | 19.52157 | 36.64445 |
+| ResNet200_vd | 224 | 256 | 9.36026 | 13.5474 | 19.0725 | 10.80619 | 25.01731 | 48.81399 |
+| ResNet50_vd_ssld | 224 | 256 | 2.65164 | 4.84109 | 7.46225 | 3.53131 | 8.09057 | 14.45965 |
+| ResNet101_vd_ssld | 224 | 256 | 5.05972 | 7.83685 | 11.34235 | 6.11704 | 13.76222 | 25.11071 |
diff --git a/docs/zh_CN/models/SEResNext_and_Res2Net.md b/docs/zh_CN/models/SEResNext_and_Res2Net.md
index 90955354aece52a3770c57c6d387c4bdf8238453..1a8c125ee931a2bc036834d122d529c26264bb66 100644
--- a/docs/zh_CN/models/SEResNext_and_Res2Net.md
+++ b/docs/zh_CN/models/SEResNext_and_Res2Net.md
@@ -7,18 +7,20 @@ SENet, the winning solution of the 2017 ImageNet classification competition, proposed a novel
 
 Res2Net, proposed in 2019, is a novel improvement on ResNet. It integrates easily with other strong building blocks and, without increasing the computational load, outperforms ResNet on datasets such as ImageNet and CIFAR-100. Res2Net is simple in structure and strong in performance, and further explores the multi-scale representation ability of CNNs at a finer granularity. It reveals a new dimension for improving model accuracy, namely scale, an essential and effective factor besides the existing dimensions of depth, width, and cardinality. The network also performs quite well on other vision tasks such as object detection and image segmentation.
 
-The FLOPS, parameter counts, and FP32 inference time of this model series are shown in the figures below.
+The FLOPS, parameter counts, and inference time on T4 GPU of this model series are shown in the figures below.
 
-![](../../images/models/SeResNeXt.png.flops.png)
-![](../../images/models/SeResNeXt.png.params.png)
+![](../../images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.flops.png)
 
-![](../../images/models/SeResNeXt.png.fp32.png)
+![](../../images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.params.png)
+
+![](../../images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.png)
+
+![](../../images/models/T4_benchmark/t4.fp16.bs4.SeResNeXt.png)
 
-PaddleClas currently open-sources 24 pretrained models across these three families; their metrics are shown in the figure. As the figure shows, at the same Flops and Params the improved variants usually reach higher accuracy, but their inference speed often lags behind the ResNet series. On the other hand, Res2Net also performs well: compared with the group operation in ResNeXt and the SE block in SEResNet, Res2Net usually achieves better accuracy at the same Flops, Params, and inference speed.
+PaddleClas currently open-sources 24 pretrained models across these three families; their metrics are shown in the figure. As the figure shows, at the same Flops and Params the improved variants usually reach higher accuracy, but their inference speed often lags behind the ResNet series. On the other hand, Res2Net also performs well: compared with the group operation in ResNeXt and the SE block in SEResNet, Res2Net usually achieves better accuracy at the same Flops, Params, and inference speed.
 
-**Note**: for inference, the image crop_size is set to 224 and resize_short_size to 256 for all models.
 
 ## Accuracy, FLOPS, and parameter counts
@@ -52,9 +54,9 @@ Res2Net, proposed in 2019, is a novel improvement on ResNet that can
 
-## FP32 inference speed
+## Inference speed on V100 GPU
 
-| Models | Crop Size | Resize Short Size | Batch Size=1<br>(ms) |
+| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
 |-----------------------|-----------|-------------------|--------------------------|
 | Res2Net50_26w_4s | 224 | 256 | 4.148 |
 | Res2Net50_vd_26w_4s | 224 | 256 | 4.172 |
@@ -80,3 +82,33 @@ Res2Net, proposed in 2019, is a novel improvement on ResNet that can
 | SE_ResNeXt50_vd_32x4d | 224 | 256 | 9.011 |
 | SE_ResNeXt101_32x4d | 224 | 256 | 19.204 |
 | SENet154_vd | 224 | 256 | 50.406 |
Batch Size=1
(ms) | FP16
Batch Size=4
(ms) | FP16
Batch Size=8
(ms) | FP32
Batch Size=1
(ms) | FP32
Batch Size=4
(ms) | FP32
Batch Size=8
(ms) | +|-----------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------| +| Res2Net50_26w_4s | 224 | 256 | 3.56067 | 6.61827 | 11.41566 | 4.47188 | 9.65722 | 17.54535 | +| Res2Net50_vd_26w_4s | 224 | 256 | 3.69221 | 6.94419 | 11.92441 | 4.52712 | 9.93247 | 18.16928 | +| Res2Net50_14w_8s | 224 | 256 | 4.45745 | 7.69847 | 12.30935 | 5.4026 | 10.60273 | 18.01234 | +| Res2Net101_vd_26w_4s | 224 | 256 | 6.53122 | 10.81895 | 18.94395 | 8.08729 | 17.31208 | 31.95762 | +| Res2Net200_vd_26w_4s | 224 | 256 | 11.66671 | 18.93953 | 33.19188 | 14.67806 | 32.35032 | 63.65899 | +| ResNeXt50_32x4d | 224 | 256 | 7.61087 | 8.88918 | 12.99674 | 7.56327 | 10.6134 | 18.46915 | +| ResNeXt50_vd_32x4d | 224 | 256 | 7.69065 | 8.94014 | 13.4088 | 7.62044 | 11.03385 | 19.15339 | +| ResNeXt50_64x4d | 224 | 256 | 13.78688 | 15.84655 | 21.79537 | 13.80962 | 18.4712 | 33.49843 | +| ResNeXt50_vd_64x4d | 224 | 256 | 13.79538 | 15.22201 | 22.27045 | 13.94449 | 18.88759 | 34.28889 | +| ResNeXt101_32x4d | 224 | 256 | 16.59777 | 17.93153 | 21.36541 | 16.21503 | 19.96568 | 33.76831 | +| ResNeXt101_vd_32x4d | 224 | 256 | 16.36909 | 17.45681 | 22.10216 | 16.28103 | 20.25611 | 34.37152 | +| ResNeXt101_64x4d | 224 | 256 | 30.12355 | 32.46823 | 38.41901 | 30.4788 | 36.29801 | 68.85559 | +| ResNeXt101_vd_64x4d | 224 | 256 | 30.34022 | 32.27869 | 38.72523 | 30.40456 | 36.77324 | 69.66021 | +| ResNeXt152_32x4d | 224 | 256 | 25.26417 | 26.57001 | 30.67834 | 24.86299 | 29.36764 | 52.09426 | +| ResNeXt152_vd_32x4d | 224 | 256 | 25.11196 | 26.70515 | 31.72636 | 25.03258 | 30.08987 | 52.64429 | +| ResNeXt152_64x4d | 224 | 256 | 46.58293 | 48.34563 | 56.97961 | 46.7564 | 56.34108 | 106.11736 | +| ResNeXt152_vd_64x4d | 224 | 256 | 47.68447 | 48.91406 | 57.29329 | 47.18638 | 57.16257 | 107.26288 | +| SE_ResNet18_vd | 224 | 256 | 1.61823 | 3.1391 | 4.60282 | 1.7691 | 4.19877 | 7.5331 | +| SE_ResNet34_vd | 224 | 256 | 2.67518 | 5.04694 | 7.18946 | 2.88559 | 7.03291 | 12.73502 | +| SE_ResNet50_vd | 224 | 256 | 3.65394 | 7.568 | 12.52793 | 4.28393 | 10.38846 | 18.33154 | +| SE_ResNeXt50_32x4d | 224 | 256 | 9.06957 | 11.37898 | 18.86282 | 8.74121 | 13.563 | 23.01954 | +| SE_ResNeXt50_vd_32x4d | 224 | 256 | 9.25016 | 11.85045 | 25.57004 | 9.17134 | 14.76192 | 19.914 | +| SE_ResNeXt101_32x4d | 224 | 256 | 19.34455 | 20.6104 | 32.20432 | 18.82604 | 25.31814 | 41.97758 | +| SENet154_vd | 224 | 256 | 49.85733 | 54.37267 | 74.70447 | 53.79794 | 66.31684 | 121.59885 | diff --git a/docs/zh_CN/models/models_intro.md b/docs/zh_CN/models/models_intro.md index da653ae30194cfbd05baa4051998b0fe916b9035..935db13a158db7466986ec08ab09f8bb1a0750f9 100644 --- a/docs/zh_CN/models/models_intro.md +++ b/docs/zh_CN/models/models_intro.md @@ -23,7 +23,8 @@ python tools/infer/predict.py \ --batch_size=1 ``` -![](../../images/models/main_fps_top1.png) +![](../../images/models/T4_benchmark/t4.fp32.bs4.main_fps_top1.png) + ![](../../images/models/mobile_arm_top1.png) @@ -193,7 +194,7 @@ python tools/infer/predict.py \ - [VGG16](https://paddle-imagenet-models-name.bj.bcebos.com/VGG16_pretrained.tar) - [VGG19](https://paddle-imagenet-models-name.bj.bcebos.com/VGG19_pretrained.tar) - DarkNet系列[[21](#ref21)]([论文地址](https://arxiv.org/abs/1506.02640)) - - [DarkNet53](https://paddle-imagenet-models-name.bj.bcebos.com/DarkNet53_pretrained.tar) + - 
diff --git a/docs/zh_CN/models/models_intro.md b/docs/zh_CN/models/models_intro.md
index da653ae30194cfbd05baa4051998b0fe916b9035..935db13a158db7466986ec08ab09f8bb1a0750f9 100644
--- a/docs/zh_CN/models/models_intro.md
+++ b/docs/zh_CN/models/models_intro.md
@@ -23,7 +23,8 @@ python tools/infer/predict.py \
     --batch_size=1
 ```
 
-![](../../images/models/main_fps_top1.png)
+![](../../images/models/T4_benchmark/t4.fp32.bs4.main_fps_top1.png)
+
 ![](../../images/models/mobile_arm_top1.png)
@@ -193,7 +194,7 @@ python tools/infer/predict.py \
   - [VGG16](https://paddle-imagenet-models-name.bj.bcebos.com/VGG16_pretrained.tar)
   - [VGG19](https://paddle-imagenet-models-name.bj.bcebos.com/VGG19_pretrained.tar)
 - DarkNet series[[21](#ref21)]([paper link](https://arxiv.org/abs/1506.02640))
-  - [DarkNet53](https://paddle-imagenet-models-name.bj.bcebos.com/DarkNet53_pretrained.tar)
+  - [DarkNet53](https://paddle-imagenet-models-name.bj.bcebos.com/DarkNet53_ImageNet1k_pretrained.tar)
 - ACNet series[[22](#ref22)]([paper link](https://arxiv.org/abs/1908.03930))
   - [ResNet50_ACNet_deploy](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_ACNet_deploy_pretrained.tar)
diff --git a/docs/zh_CN/update_history.md b/docs/zh_CN/update_history.md
index a3ba148b2be262e920e8a8f06c87c6b429b30413..b2ab286109354b9a75defc3ec5b29fbf85596349 100644
--- a/docs/zh_CN/update_history.md
+++ b/docs/zh_CN/update_history.md
@@ -1,3 +1,12 @@
 # Update log
 
-* 2020.04.10: first commit
+* 2020.05.17
+    * Added mixed precision training.
+
+* 2020.05.09
+    * Added Paddle Serving usage documentation.
+    * Added Paddle-Lite usage documentation.
+    * Added FP32/FP16 inference speed benchmarks on T4 GPU.
+
+* 2020.04.10:
+    * First commit.
diff --git a/ppcls/data/imaug/operators.py b/ppcls/data/imaug/operators.py
index 98e89ca8307d44ee156deaaeb0bf64ffbd6db58a..a8454740f4f6047f16104b32c5a06b6f5cc1e327 100644
--- a/ppcls/data/imaug/operators.py
+++ b/ppcls/data/imaug/operators.py
@@ -22,7 +22,6 @@ from __future__ import unicode_literals
 import six
 import math
 import random
-import functools
 
 import cv2
 import numpy as np
@@ -38,8 +37,8 @@ class DecodeImage(object):
 
     def __init__(self, to_rgb=True, to_np=False, channel_first=False):
         self.to_rgb = to_rgb
-        self.to_np = to_np  #to numpy
-        self.channel_first = channel_first  #only enabled when to_np is True
+        self.to_np = to_np  # to numpy
+        self.channel_first = channel_first  # only enabled when to_np is True
 
     def __call__(self, img):
         if six.PY2:
@@ -64,7 +63,8 @@ class DecodeImage(object):
 class ResizeImage(object):
     """ resize image """
 
-    def __init__(self, size=None, resize_short=None):
+    def __init__(self, size=None, resize_short=None, interpolation=-1):
+        self.interpolation = interpolation if interpolation >= 0 else None
         if resize_short is not None and resize_short > 0:
             self.resize_short = resize_short
             self.w = None
@@ -86,8 +86,10 @@ class ResizeImage(object):
         else:
             w = self.w
             h = self.h
-
-        return cv2.resize(img, (w, h))
+        if self.interpolation is None:
+            return cv2.resize(img, (w, h))
+        else:
+            return cv2.resize(img, (w, h), interpolation=self.interpolation)
 
 
 class CropImage(object):
@@ -138,8 +140,7 @@ class RandCropImage(object):
         scale_max = min(scale[1], bound)
         scale_min = min(scale[0], bound)
 
-        target_area = img_w * img_h * random.uniform(\
-            scale_min, scale_max)
+        target_area = img_w * img_h * random.uniform(scale_min, scale_max)
         target_size = math.sqrt(target_area)
         w = int(target_size * w)
         h = int(target_size * h)
@@ -176,7 +177,8 @@ class NormalizeImage(object):
     """
 
     def __init__(self, scale=None, mean=None, std=None, order='chw'):
-        if isinstance(scale, str): scale = eval(scale)
+        if isinstance(scale, str):
+            scale = eval(scale)
         self.scale = np.float32(scale if scale is not None else 1.0 / 255.0)
         mean = mean if mean is not None else [0.485, 0.456, 0.406]
         std = std if std is not None else [0.229, 0.224, 0.225]
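The new `interpolation` argument above is forwarded to OpenCV only when it is set to a non-negative value, so existing configs keep `cv2.resize`'s default method. A minimal sketch of how the operator could be driven (the dummy image stands in for a decoded JPEG; values are illustrative):

```python
import cv2
import numpy as np

from ppcls.data.imaug.operators import ResizeImage

# A dummy 3-channel image in place of a decoded JPEG.
img = np.zeros((480, 640, 3), dtype=np.uint8)

# Default behaviour: interpolation=-1 keeps cv2.resize's default method.
default_resize = ResizeImage(resize_short=256)

# Explicit method: any non-negative cv2 flag is forwarded to cv2.resize.
area_resize = ResizeImage(resize_short=256, interpolation=cv2.INTER_AREA)

print(default_resize(img).shape)  # short side becomes 256
print(area_resize(img).shape)     # same shape, different resampling
```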
diff --git a/ppcls/data/reader.py b/ppcls/data/reader.py
index 5bf83c2170319fa13c6c86cddc42b04ec48da7e7..20072db90923055818f2b148732e78915f4d73e0 100755
--- a/ppcls/data/reader.py
+++ b/ppcls/data/reader.py
@@ -1,27 +1,25 @@
-#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
 #
-#Licensed under the Apache License, Version 2.0 (the "License");
-#you may not use this file except in compliance with the License.
-#You may obtain a copy of the License at
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
-#Unless required by applicable law or agreed to in writing, software
-#distributed under the License is distributed on an "AS IS" BASIS,
-#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-#See the License for the specific language governing permissions and
-#limitations under the License.
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
 
-import cv2
 import numpy as np
 import os
 import signal
-
+import imghdr
 import paddle
 
 from . import imaug
 from .imaug import transform
-from .imaug import MixupOperator
 from ppcls.utils import logger
 
 trainers_num = int(os.environ.get('PADDLE_TRAINERS_NUM', 1))
@@ -35,7 +33,7 @@ class ModeException(Exception):
 
     def __init__(self, message='', mode=''):
         message += "\nOnly the following 3 modes are supported: " \
-                "train, valid, test. Given mode is {}".format(mode)
+                   "train, valid, test. Given mode is {}".format(mode)
         super(ModeException, self).__init__(message)
 
 
@@ -46,10 +44,10 @@ class SampleNumException(Exception):
 
     def __init__(self, message='', sample_num=0, batch_size=1):
         message += "\nError: The number of the whole data ({}) " \
-                "is smaller than the batch_size ({}), and drop_last " \
-                "is turnning on, so nothing will feed in program, " \
-                "Terminated now. Please reset batch_size to a smaller " \
-                "number or feed more data!".format(sample_num, batch_size)
+                   "is smaller than the batch_size ({}), and drop_last " \
+                   "is turned on, so nothing will be fed into the program. " \
+                   "Terminating now. Please reset batch_size to a smaller " \
+                   "number or feed more data!".format(sample_num, batch_size)
         super(SampleNumException, self).__init__(message)
 
 
@@ -80,12 +78,12 @@ def check_params(params):
 
     data_dir = params.get('data_dir', '')
     assert os.path.isdir(data_dir), \
-        "{} doesn't exist, please check datadir path".format(data_dir)
+            "{} doesn't exist, please check datadir path".format(data_dir)
 
     if params['mode'] != 'test':
         file_list = params.get('file_list', '')
         assert os.path.isfile(file_list), \
-            "{} doesn't exist, please check file list path".format(file_list)
+                "{} doesn't exist, please check file list path".format(file_list)
 
 
 def create_file_list(params):
@@ -176,8 +174,8 @@ def partial_reader(params, full_lines, part_id=0, part_num=1):
         part_id(int): part index of the current partial data
         part_num(int): part num of the dataset
     """
-    assert part_id < part_num, ("part_num: {} should be larger " \
-        "than part_id: {}".format(part_num, part_id))
+    assert part_id < part_num, ("part_num: {} should be larger "
+                                "than part_id: {}".format(part_num, part_id))
 
     full_lines = full_lines[part_id::part_num]
 
@@ -187,8 +185,9 @@ def partial_reader(params, full_lines, part_id=0, part_num=1):
 
     def reader():
         ops = create_operators(params['transforms'])
+        delimiter = params.get('delimiter', ' ')
        for line in full_lines:
-            img_path, label = line.split()
+            img_path, label = line.split(delimiter)
             img_path = os.path.join(params['data_dir'], img_path)
             with open(img_path, 'rb') as f:
                 img = f.read()
@@ -220,7 +219,7 @@ def mp_reader(params):
 
 
 def term_mp(sig_num, frame):
-    """ kill all child processes
+    """ kill all child processes
     """
     pid = os.getpid()
     pgid = os.getpgid(os.getpid())
diff --git a/ppcls/modeling/architectures/__init__.py b/ppcls/modeling/architectures/__init__.py
index 34be3ce2addc0a8320e4b3606e1d87f0873d2a47..b6b69a370b4bb56752715f6faed51501609609b5 100644
--- a/ppcls/modeling/architectures/__init__.py
+++ b/ppcls/modeling/architectures/__init__.py
@@ -36,7 +36,7 @@ from .densenet import DenseNet121, DenseNet161, DenseNet169, DenseNet201, DenseN
 from .squeezenet import SqueezeNet1_0, SqueezeNet1_1
 from .darknet import DarkNet53
 from .resnext101_wsl import ResNeXt101_32x8d_wsl, ResNeXt101_32x16d_wsl, ResNeXt101_32x32d_wsl, ResNeXt101_32x48d_wsl, Fix_ResNeXt101_32x48d_wsl
-from .efficientnet import EfficientNet, EfficientNetB0, EfficientNetB1, EfficientNetB2, EfficientNetB3, EfficientNetB4, EfficientNetB5, EfficientNetB6, EfficientNetB7
+from .efficientnet import EfficientNet, EfficientNetB0, EfficientNetB0_small, EfficientNetB1, EfficientNetB2, EfficientNetB3, EfficientNetB4, EfficientNetB5, EfficientNetB6, EfficientNetB7
 from .res2net import Res2Net50_48w_2s, Res2Net50_26w_4s, Res2Net50_14w_8s, Res2Net50_26w_6s, Res2Net50_26w_8s, Res2Net101_26w_4s, Res2Net152_26w_4s
 from .res2net_vd import Res2Net50_vd_48w_2s, Res2Net50_vd_26w_4s, Res2Net50_vd_14w_8s, Res2Net50_vd_26w_6s, Res2Net50_vd_26w_8s, Res2Net101_vd_26w_4s, Res2Net152_vd_26w_4s, Res2Net200_vd_26w_4s
 from .hrnet import HRNet_W18_C, HRNet_W30_C, HRNet_W32_C, HRNet_W40_C, HRNet_W44_C, HRNet_W48_C, HRNet_W60_C, HRNet_W64_C, SE_HRNet_W18_C, SE_HRNet_W30_C, SE_HRNet_W32_C, SE_HRNet_W40_C, SE_HRNet_W44_C, SE_HRNet_W48_C, SE_HRNet_W60_C, SE_HRNet_W64_C
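The `delimiter` key read by `partial_reader` above makes the file-list format configurable, e.g. for lists that separate path and label with a tab instead of a space. A self-contained sketch of just that parsing step (the sample lines are made up):

```python
# Standalone sketch mirroring partial_reader's `delimiter` handling.
def parse_file_list(lines, params):
    delimiter = params.get('delimiter', ' ')  # default: space-separated
    for line in lines:
        img_path, label = line.strip().split(delimiter)
        yield img_path, int(label)

# Space-separated list (the previous hard-coded behaviour).
space_lines = ["n01440764/n01440764_10026.JPEG 0"]
print(list(parse_file_list(space_lines, {})))

# Tab-separated list, enabled by setting `delimiter` in the reader config.
tab_lines = ["n01440764/n01440764_10026.JPEG\t0"]
print(list(parse_file_list(tab_lines, {'delimiter': '\t'})))
```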
diff --git a/ppcls/modeling/architectures/efficientnet.py b/ppcls/modeling/architectures/efficientnet.py
index 33135839d8ff3247b8f228946111c97ce1f31e68..d6bac79bd8674b6bcb315d512fc095577fc6c97a 100644
--- a/ppcls/modeling/architectures/efficientnet.py
+++ b/ppcls/modeling/architectures/efficientnet.py
@@ -288,7 +288,7 @@ class EfficientNet():
             name=conv_name,
             use_bias=use_bias)
 
-        if use_bn == False:
+        if use_bn is False:
             return conv
         else:
             bn_name = name + bn_name
@@ -350,8 +350,11 @@ class EfficientNet():
             conv = self._project_conv_norm(conv, block_args, is_test, name)
 
         # Skip connection and drop connect
-        input_filters, output_filters = block_args.input_filters, block_args.output_filters
-        if id_skip and block_args.stride == 1 and input_filters == output_filters:
+        input_filters = block_args.input_filters
+        output_filters = block_args.output_filters
+        if id_skip and \
+                block_args.stride == 1 and \
+                input_filters == output_filters:
             if drop_connect_rate:
                 conv = self._drop_connect(conv, drop_connect_rate,
                                           self.is_test)
@@ -416,7 +419,8 @@ class EfficientNet():
                     num_repeat=round_repeats(block_args.num_repeat,
                                              self._global_params))
 
-            # The first block needs to take care of stride and filter size increase.
+            # The first block needs to take care of stride,
+            # and filter size increase.
             drop_connect_rate = self._global_params.drop_connect_rate
             if drop_connect_rate:
                 drop_connect_rate *= float(idx) / block_size
@@ -444,7 +448,9 @@ class EfficientNet():
 
 
 class BlockDecoder(object):
-    """ Block Decoder for readability, straight from the official TensorFlow repository """
+    """
+    Block Decoder, straight from the official TensorFlow repository.
+    """
 
     @staticmethod
     def _decode_block_string(block_string):
@@ -460,9 +466,10 @@ class BlockDecoder(object):
                 options[key] = value
 
         # Check stride
-        assert (
-            ('s' in options and len(options['s']) == 1) or
-            (len(options['s']) == 2 and options['s'][0] == options['s'][1]))
+        cond_1 = ('s' in options and len(options['s']) == 1)
+        cond_2 = ((len(options['s']) == 2)
+                  and (options['s'][0] == options['s'][1]))
+        assert (cond_1 or cond_2)
 
         return BlockArgs(
             kernel_size=int(options['k']),
@@ -491,10 +498,11 @@ class BlockDecoder(object):
     @staticmethod
     def decode(string_list):
         """
-        Decodes a list of string notations to specify blocks inside the network.
+        Decode a list of string notations to specify blocks in the network.
 
-        :param string_list: a list of strings, each string is a notation of block
-        :return: a list of BlockArgs namedtuples of block args
+        string_list: list of strings, each string is a notation of block
+        return
+            list of BlockArgs namedtuples of block args
         """
         assert isinstance(string_list, list)
         blocks_args = []
@@ -529,6 +537,19 @@ def EfficientNetB0(is_test=False,
     return model
 
 
+def EfficientNetB0_small(is_test=False,
+                         padding_type='DYNAMIC',
+                         override_params=None,
+                         use_se=False):
+    model = EfficientNet(
+        name='b0',
+        is_test=is_test,
+        padding_type=padding_type,
+        override_params=override_params,
+        use_se=use_se)
+    return model
+
+
 def EfficientNetB1(is_test=False,
                    padding_type='SAME',
                    override_params=None,
diff --git a/ppcls/utils/config.py b/ppcls/utils/config.py
index b1c1be4ef616d504fce196e51d3ea5316e5b8c67..93b11569e7490ba1c7d3576e254dbfabf29f9344 100644
--- a/ppcls/utils/config.py
+++ b/ppcls/utils/config.py
@@ -64,14 +64,14 @@ def print_dict(d, delimiter=0):
     placeholder = "-" * 60
     for k, v in sorted(d.items()):
         if isinstance(v, dict):
-            logger.info("{}{} : ".format(delimiter * " ", k))
+            logger.info("{}{} : ".format(delimiter * " ", logger.coloring(k, "HEADER")))
             print_dict(v, delimiter + 4)
         elif isinstance(v, list) and len(v) >= 1 and isinstance(v[0], dict):
-            logger.info("{}{} : ".format(delimiter * " ", k))
+            logger.info("{}{} : ".format(delimiter * " ", logger.coloring(str(k), "HEADER")))
             for value in v:
                 print_dict(value, delimiter + 4)
         else:
-            logger.info("{}{} : {}".format(delimiter * " ", k, v))
+            logger.info("{}{} : {}".format(delimiter * " ", logger.coloring(k, "HEADER"), logger.coloring(v, "OKGREEN")))
 
         if k.isupper():
             logger.info(placeholder)
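For reference, the block strings parsed by `_decode_block_string` in the hunk above follow the notation of the official EfficientNet code: underscore-separated `key + value` options such as `r` (repeats), `k` (kernel size), `s` (strides), `e` (expand ratio), `i`/`o` (input/output filters), and `se` (squeeze-excite ratio). A stripped-down sketch of that parsing step (not the class above, just its core regex logic, with the `noskip` flag omitted):

```python
import re

def decode_block_string(block_string):
    """Parse e.g. 'r1_k3_s11_e1_i32_o16_se0.25' into an options dict."""
    options = {}
    for op in block_string.split('_'):
        splits = re.split(r'(\d.*)', op)  # split the key from its numeric value
        if len(splits) >= 2:
            key, value = splits[:2]
            options[key] = value
    return options

opts = decode_block_string('r1_k3_s11_e1_i32_o16_se0.25')
print(opts)  # {'r': '1', 'k': '3', 's': '11', 'e': '1', 'i': '32', 'o': '16', 'se': '0.25'}

# The stride check in the diff accepts 's1'-style or 's11'-style notation.
assert len(opts['s']) == 1 or (len(opts['s']) == 2 and opts['s'][0] == opts['s'][1])
```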
""" def wrapper(fmt, *args): @@ -39,12 +72,12 @@ def info(fmt, *args): @anti_fleet def warning(fmt, *args): - _logger.warning(fmt, *args) + _logger.warning(coloring(fmt, "RED"), *args) @anti_fleet def error(fmt, *args): - _logger.error(fmt, *args) + _logger.error(coloring(fmt, "FAIL"), *args) def advertise(): @@ -66,7 +99,7 @@ def advertise(): website = "https://github.com/PaddlePaddle/PaddleClas" AD_LEN = 6 + len(max([copyright, ad, website], key=len)) - info("\n{0}\n{1}\n{2}\n{3}\n{4}\n{5}\n{6}\n{7}\n".format( + info(coloring("\n{0}\n{1}\n{2}\n{3}\n{4}\n{5}\n{6}\n{7}\n".format( "=" * (AD_LEN + 4), "=={}==".format(copyright.center(AD_LEN)), "=" * (AD_LEN + 4), @@ -74,4 +107,4 @@ def advertise(): "=={}==".format(ad.center(AD_LEN)), "=={}==".format(' ' * AD_LEN), "=={}==".format(website.center(AD_LEN)), - "=" * (AD_LEN + 4), )) + "=" * (AD_LEN + 4), ),"RED")) diff --git a/ppcls/utils/model_zoo.py b/ppcls/utils/model_zoo.py index 8a154a90de99432c44ec004c08f0b5cb2a61d26e..dd65e921f3e258b75c911599ee234359f3fd7c51 100644 --- a/ppcls/utils/model_zoo.py +++ b/ppcls/utils/model_zoo.py @@ -24,7 +24,6 @@ import tqdm import zipfile from ppcls.modeling import similar_architectures -from ppcls.utils.check import check_architecture from ppcls.utils import logger __all__ = ['get'] @@ -168,11 +167,16 @@ def _decompress(fname): os.remove(fname) +def _get_pretrained(): + with open('./ppcls/utils/pretrained.list') as flist: + pretrained = [line.strip() for line in flist] + return pretrained + + def _check_pretrained_name(architecture): assert isinstance(architecture, str), \ - ("the type of architecture({}) should be str". format(architecture)) - with open('./configs/pretrained.list') as flist: - pretrained = [line.strip() for line in flist] + ("the type of architecture({}) should be str". format(architecture)) + pretrained = _get_pretrained() similar_names = similar_architectures(architecture, pretrained) model_list = ', '.join(similar_names) err = "{} is not exist! Maybe you want: [{}]" \ @@ -181,6 +185,14 @@ def _check_pretrained_name(architecture): raise ModelNameError(err) +def list_models(): + pretrained = _get_pretrained() + msg = "All avialable pretrained models are as follows: {}".format( + pretrained) + logger.info(msg) + return + + def get(architecture, path, decompress=True): """ Get the pretrained model. 
diff --git a/ppcls/utils/model_zoo.py b/ppcls/utils/model_zoo.py
index 8a154a90de99432c44ec004c08f0b5cb2a61d26e..dd65e921f3e258b75c911599ee234359f3fd7c51 100644
--- a/ppcls/utils/model_zoo.py
+++ b/ppcls/utils/model_zoo.py
@@ -24,7 +24,6 @@ import tqdm
 import zipfile
 
 from ppcls.modeling import similar_architectures
-from ppcls.utils.check import check_architecture
 from ppcls.utils import logger
 
 __all__ = ['get']
@@ -168,11 +167,16 @@ def _decompress(fname):
     os.remove(fname)
 
 
+def _get_pretrained():
+    with open('./ppcls/utils/pretrained.list') as flist:
+        pretrained = [line.strip() for line in flist]
+    return pretrained
+
+
 def _check_pretrained_name(architecture):
     assert isinstance(architecture, str), \
-        ("the type of architecture({}) should be str". format(architecture))
-    with open('./configs/pretrained.list') as flist:
-        pretrained = [line.strip() for line in flist]
+            ("the type of architecture({}) should be str".format(architecture))
+    pretrained = _get_pretrained()
     similar_names = similar_architectures(architecture, pretrained)
     model_list = ', '.join(similar_names)
     err = "{} is not exist! Maybe you want: [{}]" \
@@ -181,6 +185,14 @@ def _check_pretrained_name(architecture):
         raise ModelNameError(err)
 
 
+def list_models():
+    pretrained = _get_pretrained()
+    msg = "All available pretrained models are as follows: {}".format(
+        pretrained)
+    logger.info(msg)
+    return
+
+
 def get(architecture, path, decompress=True):
     """
     Get the pretrained model.
@@ -188,5 +200,6 @@ def get(architecture, path, decompress=True):
     _check_pretrained_name(architecture)
     url = _get_url(architecture)
     fname = _download(url, path)
-    if decompress: _decompress(fname)
+    if decompress:
+        _decompress(fname)
     logger.info("download {} finished ".format(fname))
diff --git a/configs/pretrained.list b/ppcls/utils/pretrained.list
similarity index 98%
rename from configs/pretrained.list
rename to ppcls/utils/pretrained.list
index b01272ea2bc7ed0d9587f486001a1bdb3d7697d9..633cafd921d7390f434b2b5f82dad70129349658 100644
--- a/configs/pretrained.list
+++ b/ppcls/utils/pretrained.list
@@ -114,5 +114,5 @@ VGG11
 VGG13
 VGG16
 VGG19
-DarkNet53
+DarkNet53_ImageNet1k
 ResNet50_ACNet_deploy
diff --git a/ppcls/utils/save_load.py b/ppcls/utils/save_load.py
index 27c680432b878799ef7551f4aaabeb3f077a383d..e310166ece236cadffa0fc28d3bef6558b4c43e4 100644
--- a/ppcls/utils/save_load.py
+++ b/ppcls/utils/save_load.py
@@ -74,7 +74,7 @@ def load_params(exe, prog, path, ignore_params=None):
         raise ValueError("Model pretrain path {} does not "
                          "exists.".format(path))
 
-    logger.info('Loading parameters from {}...'.format(path))
+    logger.info(logger.coloring('Loading parameters from {}...'.format(path), 'HEADER'))
 
     ignore_set = set()
     state = _load_state(path)
@@ -100,7 +100,7 @@ def load_params(exe, prog, path, ignore_params=None):
     if len(ignore_set) > 0:
         for k in ignore_set:
             if k in state:
-                logger.warning('variable {} not used'.format(k))
+                logger.warning('variable {} is excluded automatically'.format(k))
                 del state[k]
 
     fluid.io.set_program_state(prog, state)
@@ -113,7 +113,7 @@ def init_model(config, program, exe):
     checkpoints = config.get('checkpoints')
     if checkpoints:
         fluid.load(program, checkpoints, exe)
-        logger.info("Finish initing model from {}".format(checkpoints))
+        logger.info(logger.coloring("Finished initializing the model from {}".format(checkpoints), "HEADER"))
         return
 
     pretrained_model = config.get('pretrained_model')
@@ -122,7 +122,7 @@ def init_model(config, program, exe):
         pretrained_model = [pretrained_model]
     for pretrain in pretrained_model:
         load_params(exe, program, pretrain)
-    logger.info("Finish initing model from {}".format(pretrained_model))
+    logger.info(logger.coloring("Finished initializing the model from {}".format(pretrained_model), "HEADER"))
 
 
 def save_model(program, model_path, epoch_id, prefix='ppcls'):
@@ -133,4 +133,4 @@ def save_model(program, model_path, epoch_id, prefix='ppcls'):
     _mkdir_if_not_exist(model_path)
     model_prefix = os.path.join(model_path, prefix)
     fluid.save(program, model_prefix)
-    logger.info("Already save model in {}".format(model_path))
+    logger.info(logger.coloring("Model saved in {}".format(model_path), "HEADER"))
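With the list moved into the package, the model zoo can also be queried programmatically. A small sketch (run from the repository root, since `_get_pretrained` opens a relative path; the architecture name and output directory are illustrative):

```python
from ppcls import model_zoo

# Print every downloadable pretrained model name.
model_zoo.list_models()

# Download and decompress one pretrained model into ./pretrained/.
model_zoo.get('ResNet50_vd', './pretrained/', decompress=True)
```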
diff --git a/tools/download.py b/tools/download.py
index 157bebccdaa26bbb40a46a5bf065b4e092527af7..d9fe1a8ee04a14b31cf4917e60f9348ae51b8d20 100644
--- a/tools/download.py
+++ b/tools/download.py
@@ -1,18 +1,17 @@
 # Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
 #
-#Licensed under the Apache License, Version 2.0 (the "License");
-#you may not use this file except in compliance with the License.
-#You may obtain a copy of the License at
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
-#Unless required by applicable law or agreed to in writing, software
-#distributed under the License is distributed on an "AS IS" BASIS,
-#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-#See the License for the specific language governing permissions and
-#limitations under the License.
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
 
-import sys
 import argparse
 
 from ppcls import model_zoo
@@ -26,6 +25,7 @@ def parse_args():
     parser.add_argument('-a', '--architecture', type=str, default='ResNet50')
     parser.add_argument('-p', '--path', type=str, default='./pretrained/')
     parser.add_argument('-d', '--decompress', type=str2bool, default=True)
+    parser.add_argument('-l', '--list', type=str2bool, default=False)
 
     args = parser.parse_args()
     return args
@@ -33,7 +33,10 @@ def parse_args():
 
 def main():
     args = parse_args()
-    model_zoo.get(args.architecture, args.path, args.decompress)
+    if args.list:
+        model_zoo.list_models()
+    else:
+        model_zoo.get(args.architecture, args.path, args.decompress)
 
 
 if __name__ == '__main__':
diff --git a/tools/export_model.py b/tools/export_model.py
index 2cc32b3e52242652e9dcb53a1ea94c6b90ca80f9..763ce473c45d60ba577d65782f5d4737f38a10bc 100644
--- a/tools/export_model.py
+++ b/tools/export_model.py
@@ -13,7 +13,6 @@
 # limitations under the License.
 
 import argparse
-import numpy as np
 
 from ppcls.modeling import architectures
 import paddle.fluid as fluid
@@ -25,13 +24,14 @@ def parse_args():
     parser.add_argument("-p", "--pretrained_model", type=str)
     parser.add_argument("-o", "--output_path", type=str)
     parser.add_argument("--class_dim", type=int, default=1000)
+    parser.add_argument("--img_size", type=int, default=224)
 
     return parser.parse_args()
 
 
-def create_input():
+def create_input(img_size=224):
     image = fluid.data(
-        name='image', shape=[None, 3, 224, 224], dtype='float32')
+        name='image', shape=[None, 3, img_size, img_size], dtype='float32')
     return image
 
 
@@ -57,7 +57,7 @@ def main():
 
     with fluid.program_guard(infer_prog, startup_prog):
         with fluid.unique_name.guard():
-            image = create_input()
+            image = create_input(args.img_size)
             out = create_model(args, model, image, class_dim=args.class_dim)
 
     infer_prog = infer_prog.clone(for_test=True)
diff --git a/tools/export_serving_model.py b/tools/export_serving_model.py
new file mode 100644
index 0000000000000000000000000000000000000000..e6e4472cdbf8dfd1738dede98b5aa61121f8191a
--- /dev/null
+++ b/tools/export_serving_model.py
@@ -0,0 +1,76 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+import os
+from ppcls.modeling import architectures
+
+import paddle.fluid as fluid
+import paddle_serving_client.io as serving_io
+
+
+def parse_args():
+    parser = argparse.ArgumentParser()
+    parser.add_argument("-m", "--model", type=str)
+    parser.add_argument("-p", "--pretrained_model", type=str)
+    parser.add_argument("-o", "--output_path", type=str, default="")
+    parser.add_argument("--class_dim", type=int, default=1000)
+    parser.add_argument("--img_size", type=int, default=224)
+
+    return parser.parse_args()
+
+
+def create_input(img_size=224):
+    image = fluid.data(
+        name='image', shape=[None, 3, img_size, img_size], dtype='float32')
+    return image
+
+
+def create_model(args, model, input, class_dim=1000):
+    if args.model == "GoogLeNet":
+        out, _, _ = model.net(input=input, class_dim=class_dim)
+    else:
+        out = model.net(input=input, class_dim=class_dim)
+        out = fluid.layers.softmax(out)
+    return out
+
+
+def main():
+    args = parse_args()
+
+    model = architectures.__dict__[args.model]()
+
+    place = fluid.CPUPlace()
+    exe = fluid.Executor(place)
+
+    startup_prog = fluid.Program()
+    infer_prog = fluid.Program()
+
+    with fluid.program_guard(infer_prog, startup_prog):
+        with fluid.unique_name.guard():
+            image = create_input(args.img_size)
+            out = create_model(args, model, image, class_dim=args.class_dim)
+
+    infer_prog = infer_prog.clone(for_test=True)
+    fluid.load(
+        program=infer_prog, model_path=args.pretrained_model, executor=exe)
+
+    model_path = os.path.join(args.output_path, "ppcls_model")
+    conf_path = os.path.join(args.output_path, "ppcls_client_conf")
+    serving_io.save_model(model_path, conf_path, {"image": image},
+                          {"prediction": out}, infer_prog)
+
+
+if __name__ == "__main__":
+    main()
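Once `export_serving_model.py` has written the `ppcls_model` and `ppcls_client_conf` directories, a Paddle Serving RPC client can query the model directly. A hedged sketch (the server address, config filename, and dummy input are assumptions; the feed/fetch names match the `save_model` call above, and a real client would first run the preprocessing pipeline from `tools/serving/utils.py`):

```python
import numpy as np
from paddle_serving_client import Client

client = Client()
# Assumed filename generated by serving_io.save_model into ppcls_client_conf.
client.load_client_config("./ppcls_client_conf/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])  # placeholder server address

# Dummy preprocessed image in place of a real CHW float32 tensor.
img = np.random.rand(3, 224, 224).astype("float32")

fetch_map = client.predict(feed={"image": img}, fetch=["prediction"])
print(fetch_map["prediction"])
```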
diff --git a/tools/lite/benchmark.sh b/tools/lite/benchmark.sh
new file mode 100644
index 0000000000000000000000000000000000000000..591331e42a4ff7dc184a9873b3aaf05889d2aa02
--- /dev/null
+++ b/tools/lite/benchmark.sh
@@ -0,0 +1,73 @@
+#!/bin/bash
+# ref1: https://github.com/PaddlePaddle/Paddle-Lite/blob/58b2d7dd89/lite/api/benchmark.cc
+# ref2: https://paddle-inference-dist.bj.bcebos.com/PaddleLite/benchmark_0/benchmark.sh
+
+set -e
+
+# Check input
+if [ $# -lt 3 ];
+then
+    echo "Input error"
+    echo "Usage:"
+    echo "  sh benchmark.sh <benchmark_bin_path> <benchmark_models_path> <result_filename>"
+    echo "  sh benchmark.sh <benchmark_bin_path> <benchmark_models_path> <result_filename> <is_run_model_optimize>"
+    exit
+fi
+
+# Set benchmark params
+ANDROID_DIR=/data/local/tmp
+BENCHMARK_BIN=$1
+MODELS_DIR=$2
+RESULT_FILENAME=$3
+
+WARMUP=10
+REPEATS=30
+IS_RUN_MODEL_OPTIMIZE=false
+IS_RUN_QUANTIZED_MODEL=false
+NUM_THREADS_LIST=(1 2 4)
+MODELS_LIST=$(ls $MODELS_DIR)
+
+# Check input
+if [ $# -gt 3 ];
+then
+    IS_RUN_MODEL_OPTIMIZE=$4
+fi
+
+# Adb push benchmark_bin, models
+adb push $BENCHMARK_BIN $ANDROID_DIR/benchmark_bin
+adb shell chmod +x $ANDROID_DIR/benchmark_bin
+adb push $MODELS_DIR $ANDROID_DIR
+
+# Run benchmark
+adb shell "echo 'PaddleLite Benchmark' > $ANDROID_DIR/$RESULT_FILENAME"
+for threads in ${NUM_THREADS_LIST[@]}; do
+    adb shell "echo Threads=$threads Warmup=$WARMUP Repeats=$REPEATS >> $ANDROID_DIR/$RESULT_FILENAME"
+    for model_name in ${MODELS_LIST[@]}; do
+        echo "Model=$model_name Threads=$threads"
+        if [ "$IS_RUN_MODEL_OPTIMIZE" = true ];
+        then
+            adb shell "$ANDROID_DIR/benchmark_bin \
+                --model_dir=$ANDROID_DIR/${MODELS_DIR}/$model_name \
+                --model_filename=model \
+                --param_filename=params \
+                --warmup=$WARMUP \
+                --repeats=$REPEATS \
+                --threads=$threads \
+                --result_filename=$ANDROID_DIR/$RESULT_FILENAME"
+        else
+            adb shell "$ANDROID_DIR/benchmark_bin \
+                --optimized_model_path=$ANDROID_DIR/${MODELS_DIR}/$model_name \
+                --warmup=$WARMUP \
+                --repeats=$REPEATS \
+                --threads=$threads \
+                --result_filename=$ANDROID_DIR/$RESULT_FILENAME"
+        fi
+    done
+    adb shell "echo >> $ANDROID_DIR/$RESULT_FILENAME"
+done
+
+# Adb pull benchmark result, show result
+adb pull $ANDROID_DIR/$RESULT_FILENAME .
+echo "\n--------------------------------------"
+cat $RESULT_FILENAME
+echo "--------------------------------------"
diff --git a/tools/program.py b/tools/program.py
index 50c609de17186b234f8ce3632a94cf7f59454dcd..b73f1064284a7f1929f99d892bee43798a5d52fb 100644
--- a/tools/program.py
+++ b/tools/program.py
@@ -297,6 +297,19 @@ def dist_optimizer(config, optimizer):
     return optimizer
 
 
+def mixed_precision_optimizer(config, optimizer):
+    use_fp16 = config.get('use_fp16', False)
+    amp_scale_loss = config.get('amp_scale_loss', 1.0)
+    use_dynamic_loss_scaling = config.get('use_dynamic_loss_scaling', False)
+    if use_fp16:
+        optimizer = fluid.contrib.mixed_precision.decorate(
+            optimizer,
+            init_loss_scaling=amp_scale_loss,
+            use_dynamic_loss_scaling=use_dynamic_loss_scaling)
+
+    return optimizer
+
+
 def build(config, main_prog, startup_prog, is_train=True):
     """
     Build a program using a model and an optimizer
@@ -337,6 +350,8 @@ def build(config, main_prog, startup_prog, is_train=True):
                 optimizer = create_optimizer(config)
                 lr = optimizer._global_learning_rate()
                 fetchs['lr'] = (lr, AverageMeter('lr', 'f', need_avg=False))
+
+                optimizer = mixed_precision_optimizer(config, optimizer)
                 optimizer = dist_optimizer(config, optimizer)
                 optimizer.minimize(fetchs['loss'][0])
@@ -396,19 +411,30 @@ def run(dataloader, exe, program, fetchs, epoch=0, mode='train'):
         for i, m in enumerate(metrics):
             metric_list[i].update(m[0], len(batch[0]))
         fetchs_str = ''.join([str(m.value) + ' '
-                              for m in metric_list] + [batch_time.value])
+                              for m in metric_list] + [batch_time.value]) + 's'
         if mode == 'eval':
             logger.info("{:s} step:{:<4d} {:s}s".format(mode, idx, fetchs_str))
         else:
-            logger.info("epoch:{:<3d} {:s} step:{:<4d} {:s}s".format(
-                epoch, mode, idx, fetchs_str))
+            epoch_str = "epoch:{:<3d}".format(epoch)
+            step_str = "{:s} step:{:<4d}".format(mode, idx)
+
+            logger.info("{:s} {:s} {:s}".format(
+                logger.coloring(epoch_str, "HEADER")
+                if idx == 0 else epoch_str,
+                logger.coloring(step_str, "PURPLE"),
+                logger.coloring(fetchs_str, 'OKGREEN')))
 
     end_str = ''.join([str(m.mean) + ' '
-                       for m in metric_list] + [batch_time.total])
+                       for m in metric_list] + [batch_time.total]) + 's'
     if mode == 'eval':
         logger.info("END {:s} {:s}s".format(mode, end_str))
     else:
-        logger.info("END epoch:{:<3d} {:s} {:s}s".format(epoch, mode, end_str))
+        end_epoch_str = "END epoch:{:<3d}".format(epoch)
+
+        logger.info("{:s} {:s} {:s}".format(
+            logger.coloring(end_epoch_str, "RED"),
+            logger.coloring(mode, "PURPLE"),
+            logger.coloring(end_str, "OKGREEN")))
 
     # return top1_acc in order to save the best model
     if mode == 'valid':
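`mixed_precision_optimizer` is the only glue needed to honor the new `use_fp16` config keys: it wraps the optimizer with Paddle's AMP decorator before the distributed wrapper is applied. A minimal standalone sketch using the same defaults as `configs/ResNet/ResNet50_fp16.yml` (the loss variable is assumed to come from an existing network definition):

```python
import paddle.fluid as fluid

# Config keys as they appear in configs/ResNet/ResNet50_fp16.yml.
config = {
    'use_fp16': True,
    'amp_scale_loss': 128.0,
    'use_dynamic_loss_scaling': True,
}

optimizer = fluid.optimizer.Momentum(
    learning_rate=0.1,
    momentum=0.9,
    regularization=fluid.regularizer.L2Decay(1e-4))

if config['use_fp16']:
    # Same call as in mixed_precision_optimizer above: the decorator
    # rewrites the program to FP16 and handles loss scaling.
    optimizer = fluid.contrib.mixed_precision.decorate(
        optimizer,
        init_loss_scaling=config['amp_scale_loss'],
        use_dynamic_loss_scaling=config['use_dynamic_loss_scaling'])

# optimizer.minimize(loss) would then run the FP16-rewritten program.
```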
diff --git a/tools/serving/image_http_client.py b/tools/serving/image_http_client.py
new file mode 100644
index 0000000000000000000000000000000000000000..3b92091c659613c83e4423a3f22b0d4d20321f43
--- /dev/null
+++ b/tools/serving/image_http_client.py
@@ -0,0 +1,47 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import requests
+import base64
+import json
+import sys
+import numpy as np
+
+py_version = sys.version_info[0]
+
+
+def predict(image_path, server):
+    if py_version == 2:
+        image = base64.b64encode(open(image_path).read())
+    else:
+        image = base64.b64encode(open(image_path, "rb").read()).decode("utf-8")
+    req = json.dumps({"feed": [{"image": image}], "fetch": ["prediction"]})
+    r = requests.post(
+        server, data=req, headers={"Content-Type": "application/json"})
+    try:
+        pred = r.json()["result"]["prediction"][0]
+        cls_id = np.argmax(pred)
+        score = pred[cls_id]
+        pred = {"cls_id": cls_id, "score": score}
+        return pred
+    except ValueError:
+        print(r.text)
+        return r
+
+
+if __name__ == "__main__":
+    server = "http://127.0.0.1:{}/image/prediction".format(sys.argv[1])
+    image_file = sys.argv[2]
+    res = predict(image_file, server)
+    print("res:", res)
diff --git a/tools/serving/image_service_cpu.py b/tools/serving/image_service_cpu.py
new file mode 100644
index 0000000000000000000000000000000000000000..92f67d3220670ffa880ff663ed887984325a0723
--- /dev/null
+++ b/tools/serving/image_service_cpu.py
@@ -0,0 +1,60 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import sys
+import base64
+from paddle_serving_server.web_service import WebService
+import utils
+
+
+class ImageService(WebService):
+    def __init__(self, name):
+        super(ImageService, self).__init__(name=name)
+        self.operators = self.create_operators()
+
+    def create_operators(self):
+        size = 224
+        img_mean = [0.485, 0.456, 0.406]
+        img_std = [0.229, 0.224, 0.225]
+        img_scale = 1.0 / 255.0
+        decode_op = utils.DecodeImage()
+        resize_op = utils.ResizeImage(resize_short=256)
+        crop_op = utils.CropImage(size=(size, size))
+        normalize_op = utils.NormalizeImage(
+            scale=img_scale, mean=img_mean, std=img_std)
+        totensor_op = utils.ToTensor()
+        return [decode_op, resize_op, crop_op, normalize_op, totensor_op]
+
+    def _process_image(self, data, ops):
+        for op in ops:
+            data = op(data)
+        return data
+
+    def preprocess(self, feed={}, fetch=[]):
+        feed_batch = []
+        for ins in feed:
+            if "image" not in ins:
+                raise ValueError("feed data error!")
+            sample = base64.b64decode(ins["image"])
+            img = self._process_image(sample, self.operators)
+            feed_batch.append({"image": img})
+        return feed_batch, fetch
+
+
+image_service = ImageService(name="image")
+image_service.load_model_config(sys.argv[1])
+image_service.prepare_server(
+    workdir=sys.argv[2], port=int(sys.argv[3]), device="cpu")
+image_service.run_server()
+image_service.run_flask()
diff --git a/tools/serving/image_service_gpu.py b/tools/serving/image_service_gpu.py
new file mode 100644
index 0000000000000000000000000000000000000000..df61cdd60659713ac77176beb8c1ecfad1c8efd8
--- /dev/null
+++ b/tools/serving/image_service_gpu.py
@@ -0,0 +1,62 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import sys
+import base64
+from paddle_serving_server_gpu.web_service import WebService
+
+import utils
+
+
+class ImageService(WebService):
+    def __init__(self, name):
+        super(ImageService, self).__init__(name=name)
+        self.operators = self.create_operators()
+
+    def create_operators(self):
+        size = 224
+        img_mean = [0.485, 0.456, 0.406]
+        img_std = [0.229, 0.224, 0.225]
+        img_scale = 1.0 / 255.0
+        decode_op = utils.DecodeImage()
+        resize_op = utils.ResizeImage(resize_short=256)
+        crop_op = utils.CropImage(size=(size, size))
+        normalize_op = utils.NormalizeImage(
+            scale=img_scale, mean=img_mean, std=img_std)
+        totensor_op = utils.ToTensor()
+        return [decode_op, resize_op, crop_op, normalize_op, totensor_op]
+
+    def _process_image(self, data, ops):
+        for op in ops:
+            data = op(data)
+        return data
+
+    def preprocess(self, feed={}, fetch=[]):
+        feed_batch = []
+        for ins in feed:
+            if "image" not in ins:
+                raise ValueError("feed data error!")
+            sample = base64.b64decode(ins["image"])
+            img = self._process_image(sample, self.operators)
+            feed_batch.append({"image": img})
+        return feed_batch, fetch
+
+
+image_service = ImageService(name="image")
+image_service.load_model_config(sys.argv[1])
+image_service.set_gpus("0")
+image_service.prepare_server(
+    workdir=sys.argv[2], port=int(sys.argv[3]), device="gpu")
+image_service.run_server()
+image_service.run_flask()
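Both services run the same operator chain before feeding the network. A standalone sketch of that pipeline using the operators from `tools/serving/utils.py` below (run from `tools/serving/`; the image path is a placeholder):

```python
import utils  # tools/serving/utils.py, defined below

# The same chain built in ImageService.create_operators().
ops = [
    utils.DecodeImage(),
    utils.ResizeImage(resize_short=256),
    utils.CropImage(size=(224, 224)),
    utils.NormalizeImage(scale=1.0 / 255.0,
                         mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
    utils.ToTensor(),
]

with open("demo.jpeg", "rb") as f:  # placeholder image path
    data = f.read()

for op in ops:
    data = op(data)

print(data.shape)  # (3, 224, 224), ready to feed as "image"
```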
diff --git a/tools/serving/utils.py b/tools/serving/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..6c4a75e1afe2fc1e1710c7e8213f8ac4de8ffcc2
--- /dev/null
+++ b/tools/serving/utils.py
@@ -0,0 +1,84 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import cv2
+import numpy as np
+
+
+class DecodeImage(object):
+    def __init__(self, to_rgb=True):
+        self.to_rgb = to_rgb
+
+    def __call__(self, img):
+        data = np.frombuffer(img, dtype='uint8')
+        img = cv2.imdecode(data, 1)
+        if self.to_rgb:
+            assert img.shape[2] == 3, 'invalid shape of image[%s]' % (
+                img.shape)
+            img = img[:, :, ::-1]
+
+        return img
+
+
+class ResizeImage(object):
+    def __init__(self, resize_short=None):
+        self.resize_short = resize_short
+
+    def __call__(self, img):
+        img_h, img_w = img.shape[:2]
+        percent = float(self.resize_short) / min(img_w, img_h)
+        w = int(round(img_w * percent))
+        h = int(round(img_h * percent))
+        return cv2.resize(img, (w, h))
+
+
+class CropImage(object):
+    def __init__(self, size):
+        if type(size) is int:
+            self.size = (size, size)
+        else:
+            self.size = size
+
+    def __call__(self, img):
+        w, h = self.size
+        img_h, img_w = img.shape[:2]
+        w_start = (img_w - w) // 2
+        h_start = (img_h - h) // 2
+
+        w_end = w_start + w
+        h_end = h_start + h
+        return img[h_start:h_end, w_start:w_end, :]
+
+
+class NormalizeImage(object):
+    def __init__(self, scale=None, mean=None, std=None):
+        self.scale = np.float32(scale if scale is not None else 1.0 / 255.0)
+        mean = mean if mean is not None else [0.485, 0.456, 0.406]
+        std = std if std is not None else [0.229, 0.224, 0.225]
+
+        shape = (1, 1, 3)
+        self.mean = np.array(mean).reshape(shape).astype('float32')
+        self.std = np.array(std).reshape(shape).astype('float32')
+
+    def __call__(self, img):
+        return (img.astype('float32') * self.scale - self.mean) / self.std
+
+
+class ToTensor(object):
+    def __init__(self):
+        pass
+
+    def __call__(self, img):
+        img = img.transpose((2, 0, 1))
+        return img
diff --git a/tools/train.py b/tools/train.py
index ab7752fd17c6621c605365d5529bf20d91837c89..cd5b7d25b499751ccc9289fe96139e630b09fee7 100644
--- a/tools/train.py
+++ b/tools/train.py
@@ -62,7 +62,7 @@ def main():
     startup_prog = fluid.Program()
     train_prog = fluid.Program()
 
-    best_top1_acc_list = (0.0, -1)  # (top1_acc, epoch_id)
+    best_top1_acc = 0.0  # best top1 acc record
 
     train_dataloader, train_fetchs = program.build(
        config, train_prog, startup_prog, is_train=True)
@@ -101,13 +101,15 @@ def main():
             top1_acc = program.run(valid_dataloader, exe,
                                    compiled_valid_prog, valid_fetchs,
                                    epoch_id, 'valid')
-            if top1_acc > best_top1_acc_list[0]:
-                best_top1_acc_list = (top1_acc, epoch_id)
-                logger.info("Best top1 acc: {}, in epoch: {}".format(
-                    *best_top1_acc_list))
-                model_path = os.path.join(config.model_save_dir,
+            if top1_acc > best_top1_acc:
+                best_top1_acc = top1_acc
+                message = "The best top1 acc {:.5f}, in epoch: {:d}".format(best_top1_acc, epoch_id)
+                logger.info("{:s}".format(logger.coloring(message, "RED")))
+                if epoch_id % config.save_interval == 0:
+
+                    model_path = os.path.join(config.model_save_dir,
                                           config.ARCHITECTURE["name"])
-                save_model(train_prog, model_path, "best_model")
+                    save_model(train_prog, model_path, "best_model_in_epoch_" + str(epoch_id))
 
         # 3. save the persistable model
         if epoch_id % config.save_interval == 0: