Unverified commit 15388cb6, authored by ruri, committed by GitHub

polish code and add multi-card validate and infer in image classification (#4042)

Parent 775741f3
......@@ -14,6 +14,7 @@
- [Advanced Usage](#进阶使用)
- [Mixup Training](#mixup训练)
- [Mixed-Precision Training](#混合精度训练)
- [Profiling](#性能分析)
- [DALI Preprocessing](#DALI预处理)
- [Custom Dataset](#自定义数据集)
- [Released Models and Their Performance](#已发布模型及其性能)
......@@ -129,7 +130,7 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch train.py \
* **model**: model name. Default: "ResNet50"
* **total_images**: total number of images (ImageNet 2012). Default: 1281167
* **class_dim**: number of classes. Default: 1000
* **image_shape**: image shape. Default: 3 224 224
* **image_shape**: image shape. Default: [3,224,224]
* **num_epochs**: number of training epochs. Default: 120
* **batch_size**: batch size (across all devices). Default: 8
* **test_batch_size**: test batch size. Default: 16
......@@ -159,25 +160,37 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch train.py \
Switches:
* **validate**: whether to run validation during training. Default: True
* **use_gpu**: whether to run on GPU. Default: True
* **use_label_smoothing**: whether to apply label smoothing to the labels. Default: False
* **label_smoothing_epsilon**: epsilon used for label smoothing. Default: 0.1
* **random_seed**: random seed. Default: 1000
* **padding_type**: padding type for convolutions in EfficientNet. Default: "SAME"
* **use_se**: whether to use the Squeeze-and-Excitation module in EfficientNet. Default: True
* **use_ema**: whether to apply ExponentialMovingAverage when updating model parameters. Default: False
* **ema_decay**: decay rate of ExponentialMovingAverage. Default: 0.9999
Profiling:
* **enable_ce**: whether to enable CE. Default: False
* **random_seed**: random seed; once set, all randomness is fixed. Default: None
* **is_profiler**: whether to enable the profiler. Default: False
* **profiler_path**: directory for saving profiler output. Default: './profilier_files'
* **max_iter**: maximum number of training batches. Default: 0
* **same_feed**: whether to feed the same data into the network; set a number to specify how many samples are repeated. Default: 0
**Data reader notes:** The data reader is defined in ```reader.py```; the default reader is now based on cv2. In the [training](#模型训练) stage, the default augmentations are random crop and horizontal flip, while [evaluation](#模型评估) and [inference](#模型预测) use center crop by default (a minimal sketch of the two default training augmentations follows the list below). The currently supported augmentations are:
* rotation
* color jitter (not yet implemented)
* color jitter
* random crop
* center crop
* resize
* horizontal flip
* AutoAugment
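The following is a minimal, illustrative sketch (not the actual implementation in ```reader.py```) of how a cv2-based reader might apply the two default training augmentations, random crop and horizontal flip:

```python
import random

import cv2
import numpy as np


def random_crop_and_flip(img, crop_size=224):
    """Illustrative augmentation: random crop plus random horizontal flip on an HWC image."""
    h, w = img.shape[:2]
    # upscale first if the image is smaller than the crop (simplified handling)
    if min(h, w) < crop_size:
        scale = float(crop_size) / min(h, w)
        img = cv2.resize(img, (int(w * scale) + 1, int(h * scale) + 1))
        h, w = img.shape[:2]
    top = random.randint(0, h - crop_size)
    left = random.randint(0, w - crop_size)
    img = img[top:top + crop_size, left:left + crop_size]
    if random.random() < 0.5:  # horizontal flip with probability 0.5
        img = img[:, ::-1, :]
    return np.ascontiguousarray(img)
```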
### Fine-tuning
Fine-tuning means further training the parameters of a pretrained model on a specific task. You can download a model from [Released Models and Their Performance](#已发布模型及其性能), set ```path_to_pretrain_model``` to its path, and fine-tune it with a command like:
......@@ -193,6 +206,10 @@ python train.py \
Model evaluation (eval) measures the performance metrics of a trained model. You can download a model from [Released Models and Their Performance](#已发布模型及其性能) and set ```path_to_pretrain_model``` to its path. Running the following command reports the model's top-1/top-5 accuracy:
**Parameters**
* **save_json_path**: path to a JSON file for saving the eval results. Default: None
```bash
python eval.py \
--model=model_name \
......@@ -220,18 +237,21 @@ python eval.py \
**Parameters:**
* **save_inference**: whether to save the model. Default: False
* **save_inference**: whether to save the binary inference model. Default: False
* **topk**: number of predicted labels to return, sorted by confidence from high to low. Default: 1
* **label_path**: path to the readable label file. Default: "./utils/tools/readable_label.txt"
* **class_map_path**: path to the readable label file. Default: "./utils/tools/readable_label.txt"
* **image_path**: path to a single image file to predict. Default: None
* **save_json_path**: path to a JSON file for saving prediction results. Default: None
```bash
python infer.py \
--model=model_name \
--pretrained_model=${path_to_pretrain_model}
--image_path=${path_to_single_image}
```
Note: add and adjust other arguments according to the specific model and task.
By default, inference assumes the 1000 ImageNet classes; the label file is stored in /utils/tools/readable_label.txt. When using custom data, specify the --label_path argument.
By default, inference assumes the 1000 ImageNet classes; the map between numerical predictions and readable labels is stored in /utils/tools/readable_label.txt. When using custom data, specify the --class_map_path argument.
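The class map file is expected to contain one class per line: a numerical label followed by its readable label. A minimal sketch of how such a file can be parsed (mirroring the logic in ```infer.py```):

```python
def load_class_map(class_map_path):
    """Parse a readable-label file whose lines look like '<numeric_label> <readable label>'."""
    label_dict = {}
    with open(class_map_path) as f:
        for line in f:
            fields = line.rstrip("\n").split(" ")
            # key: the numerical label (as a string); value: the readable-label tokens
            label_dict[fields[0]] = fields[1:]
    return label_dict
```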
## Advanced Usage
......@@ -246,6 +266,34 @@ Mixup相关介绍参考[mixup: Beyond Empirical Risk Minimization](https://arxiv
FP16-related functionality has been migrated to PaddlePaddle/Fleet.
### Profiling
Note: this section mainly describes internal testing features.
They include enabling CE to monitor the stability of model runs, enabling the profiler for benchmarking, and enabling same_feed for quick debugging.
Enabling CE fixes all random initialization, including the shuffle in the data reader and the program's [random_seed](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/fluid_cn/Program_cn.html#random_seed).
``` bash
python train.py \
--enable_ce=True \
--data_dir=${path_to_a_smaller_dataset}
```
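As a rough sketch of what fixing the seeds involves (illustrative only; in ```train.py``` the value comes from --random_seed and is applied automatically when enable_ce is set):

```python
import numpy as np
import paddle.fluid as fluid

seed = 1000  # assumed value for illustration
# fix the program seeds so parameter initialization and in-graph randomness repeat
fluid.default_startup_program().random_seed = seed
fluid.default_main_program().random_seed = seed
# fix NumPy's seed so the reader's shuffle order repeats
np.random.seed(seed)
```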
Enable the profiler for performance analysis:
``` bash
python train.py \
--is_profiler=True
```
Set the same_feed argument for quick debugging: the same image, repeated same_feed times, will be fed into the network.
```bash
python train.py \
--same_feed=8 \
--batch_size=4 \
--print_step=1
```
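Conceptually, the reader then feeds copies of a single sample instead of iterating over the dataset. A minimal sketch of that behavior (assumed helper name, mirroring the same_feed branch in ```reader.py```):

```python
def apply_same_feed(full_lines, same_feed):
    """If same_feed > 0, repeat the first sample same_feed times; otherwise keep the list unchanged."""
    if same_feed:
        return [full_lines[0]] * same_feed
    return full_lines
```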
### DALI Preprocessing
Using the [Nvidia DALI](https://github.com/NVIDIA/DALI) preprocessing library can speed up training and improve GPU utilization.
......@@ -275,7 +323,7 @@ python -m paddle.distributed.launch train.py \
#### Notes
1. PaddlePaddle 1.6 or later is required, and it must be compiled with GCC 5.4 or later.
1. PaddlePaddle 1.6 or later is required and must be [compiled](https://www.paddlepaddle.org.cn/install/doc/source/ubuntu) with GCC 5.4 or later; in addition, specify -DWITH_DISTRIBUTE=ON during compilation to enable the multi-process training mode.
2. Nvidia DALI must be built from a git revision that includes [#1371](https://github.com/NVIDIA/DALI/pull/1371) or later. See [this document](https://docs.nvidia.com/deeplearning/sdk/dali-master-branch-user-guide/docs/installation.html) to install a nightly build or build from source.
3. Because DALI uses the GPU for image preprocessing and therefore consumes some GPU memory, adjust the `FLAGS_fraction_of_gpu_memory_to_use` environment variable (e.g. `0.8`) appropriately to reserve memory for DALI.
......@@ -491,7 +539,7 @@ PaddlePaddle/Models ImageClassification 支持自定义数据
|[ResNeXt50_vd_64x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt50_vd_64x4d_pretrained.tar) | 80.12% | 94.86% | 20.888 | 15.938 |
|[ResNeXt101_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_32x4d_pretrained.tar) | 78.65% | 94.19% | 24.154 | 17.661 |
|[ResNeXt101_vd_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_vd_32x4d_pretrained.tar) | 80.33% | 95.12% | 24.701 | 17.249 |
|[ResNeXt101_64x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt50_64x4d_pretrained.tar) | 78.43% | 94.13% | 41.073 | 31.288 |
|[ResNeXt101_64x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt50_64x4d_pretrained.tar) | 78.35% | 94.52% | 41.073 | 31.288 |
|[ResNeXt101_vd_64x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_vd_64x4d_pretrained.tar) | 80.78% | 95.20% | 42.277 | 32.620 |
|[ResNeXt152_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt152_32x4d_pretrained.tar) | 78.98% | 94.33% | 37.007 | 26.981 |
|[ResNeXt152_vd_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt152_vd_32x4d_pretrained.tar) | 80.72% | 95.20% | 35.783 | 26.081 |
......
......@@ -15,6 +15,7 @@ English | [中文](README.md)
- [Advanced Usage](#advanced-usage)
- [Mixup Training](#mixup-training)
- [Using Mixed-Precision Training](#using-mixed-precision-training)
- [Profiling](#profiling)
- [Preprocessing with Nvidia DALI](#preprocessing-with-nvidia-dali)
- [Custom Dataset](#custom-dataset)
- [Supported Models and Performances](#supported-models-and-performances)
......@@ -152,19 +153,29 @@ Reader and preprocess:
Switch:
* **validate**: whether to validate when training. Default: True.
* **use_gpu**: whether to use GPU or not. Default: True.
* **use_label_smoothing**: whether to use label_smoothing or not. Default:False.
* **label_smoothing_epsilon**: the label_smoothing_epsilon. Default:0.1.
* **random_seed**: random seed for debugging, Default: 1000.
* **padding_type**: padding type of convolution for efficientNet, Default: "SAME".
* **use_se**: whether to use Squeeze-and-Excitation module in efficientNet, Default: True.
* **use_ema**: whether to use ExponentialMovingAverage or not. Default: False.
* **ema_decay**: the value of ExponentialMovingAverage decay rate. Default: 0.9999.
Profiling:
* **enable_ce**: whether to start CE, Default: False
* **random_seed**: random seed, Default: None
* **is_profiler**: whether to start the profiler, Default: False
* **profiler_path**: path to save the profiler output, Default: './profilier_files'
* **max_iter**: maximum number of training batches, Default: 0
* **same_feed**: whether to feed the same data into the net; set a number to specify how many samples, Default: 0
**data reader introduction:** Data reader is defined in ```reader.py```, default reader is implemented by opencv. In the [Training](#training) Stage, random crop and flipping are applied, while center crop is applied in the [Evaluation](#evaluation) and [Inference](#inference) stages. Supported data augmentation includes:
* rotation
* color jitter (haven't implemented in cv2_reader)
* color jitter
* random crop
* center crop
* resize
......@@ -187,6 +198,10 @@ Note: Add and adjust other parameters accroding to specific models and tasks.
Evaluation is to evaluate the performance of a trained model. One can download [pretrained models](#supported-models-and-performances) and set its path to ```path_to_pretrain_model```. Then top1/top5 accuracy can be obtained by running the following command:
**parameters**
* **save_json_path**: path to save the eval results as a JSON file, default: None
```
python eval.py \
--model=model_name \
......@@ -215,7 +230,9 @@ python eval.py \
* **save_inference**: whether to save binary model, Default: False
* **topk**: the number of sorted predicted labels to show, Default: 1
* **label_path**: readable label filepath, Default: "/utils/tools/readable_label.txt"
* **class_map_path**: readable label filepath, Default: "/utils/tools/readable_label.txt"
* **save_json_path**: path to save the prediction results as a JSON file, Default: None
* **image_path**: path to a single image to predict, Default: None
Inference is used to get prediction scores or image features based on trained models. One can download [pretrained models](#supported-models-and-performances) and set its path to ```path_to_pretrain_model```. Run the following command to obtain the prediction score.
......@@ -480,7 +497,7 @@ Pretrained models can be downloaded by clicking related model names.
|[ResNeXt50_vd_64x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt50_vd_64x4d_pretrained.tar) | 80.12% | 94.86% | 20.888 | 15.938 |
|[ResNeXt101_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_32x4d_pretrained.tar) | 78.65% | 94.19% | 24.154 | 17.661 |
|[ResNeXt101_vd_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_vd_32x4d_pretrained.tar) | 80.33% | 95.12% | 24.701 | 17.249 |
|[ResNeXt101_64x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt50_64x4d_pretrained.tar) | 78.43% | 94.13% | 41.073 | 31.288 |
|[ResNeXt101_64x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt50_64x4d_pretrained.tar) | 79.35% | 94.52% | 41.073 | 31.288 |
|[ResNeXt101_vd_64x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt101_vd_64x4d_pretrained.tar) | 80.78% | 95.20% | 42.277 | 32.620 |
|[ResNeXt152_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt152_32x4d_pretrained.tar) | 78.98% | 94.33% | 37.007 | 26.981 |
|[ResNeXt152_vd_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/ResNeXt152_vd_32x4d_pretrained.tar) | 80.72% | 95.20% | 35.783 | 26.081 |
......
......@@ -48,7 +48,8 @@ def _basic_model(data, model, args, is_train):
avg_cost = fluid.layers.mean(cost)
acc_top1 = fluid.layers.accuracy(input=softmax_out, label=label, k=1)
acc_top5 = fluid.layers.accuracy(input=softmax_out, label=label, k=5)
acc_top5 = fluid.layers.accuracy(
input=softmax_out, label=label, k=min(5, args.class_dim))
return [avg_cost, acc_top1, acc_top5]
......@@ -73,7 +74,8 @@ def _googlenet_model(data, model, args, is_train):
avg_cost = avg_cost0 + 0.3 * avg_cost1 + 0.3 * avg_cost2
acc_top1 = fluid.layers.accuracy(input=out0, label=label, k=1)
acc_top5 = fluid.layers.accuracy(input=out0, label=label, k=5)
acc_top5 = fluid.layers.accuracy(
input=out0, label=label, k=min(5, args.class_dim))
return [avg_cost, acc_top1, acc_top5]
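# Illustrative note (not from this file): capping k at args.class_dim keeps the "top-5"
# metric well defined when a dataset has fewer than five classes. For a hypothetical
# class_dim of 3:
#     k = min(5, args.class_dim)                                  # k == 3
#     acc = fluid.layers.accuracy(input=out0, label=label, k=k)   # reports top-3 accuracy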
......
......@@ -34,7 +34,7 @@ parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser)
# yapf: disable
add_arg('data_dir', str, "./data/ILSVRC2012/", "The ImageNet datset")
add_arg('batch_size', int, 256, "Minibatch size.")
add_arg('batch_size', int, 256, "batch size on the all devices.")
add_arg('use_gpu', bool, True, "Whether to use GPU or not.")
add_arg('class_dim', int, 1000, "Class number.")
parser.add_argument("--pretrained_model", default=None, required=True, type=str, help="The path to load pretrained model")
......@@ -48,6 +48,8 @@ parser.add_argument('--image_shape', nargs="+", type=int, default=[3,224,224],
add_arg('interpolation', int, None, "The interpolation mode")
add_arg('padding_type', str, "SAME", "Padding type of convolution")
add_arg('use_se', bool, True, "Whether to use Squeeze-and-Excitation module for EfficientNet.")
add_arg('save_json_path', str, None, "Whether to save output in json file.")
add_arg('same_feed', int, 0, "Whether to feed same images")
# yapf: enable
......@@ -96,27 +98,37 @@ def eval(args):
acc_top1 = fluid.layers.accuracy(input=pred, label=label, k=1)
acc_top5 = fluid.layers.accuracy(input=pred, label=label, k=5)
#startup_prog = fluid.Program()
test_program = fluid.default_main_program().clone(for_test=True)
fetch_list = [avg_cost.name, acc_top1.name, acc_top5.name]
gpu_id = int(os.environ.get('FLAGS_selected_gpus', 0))
place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
place = fluid.CUDAPlace(gpu_id) if args.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
if args.use_gpu:
places = fluid.framework.cuda_places()
compiled_program = fluid.compiler.CompiledProgram(
test_program).with_data_parallel(places=places)
fluid.io.load_persistables(exe, args.pretrained_model)
imagenet_reader = reader.ImageNetReader()
val_reader = imagenet_reader.val(settings=args)
feeder = fluid.DataFeeder(place=place, feed_list=[image, label])
val_reader = feeder.decorate_reader(val_reader, multi_devices=True)
test_info = [[], [], []]
cnt = 0
for batch_id, data in enumerate(val_reader()):
t1 = time.time()
loss, acc1, acc5 = exe.run(test_program,
loss, acc1, acc5 = exe.run(compiled_program,
fetch_list=fetch_list,
feed=feeder.feed(data))
feed=data)
t2 = time.time()
period = t2 - t1
loss = np.mean(loss)
......@@ -127,10 +139,12 @@ def eval(args):
test_info[2].append(acc5 * len(data))
cnt += len(data)
if batch_id % 10 == 0:
print("Testbatch {0},loss {1}, "
"acc1 {2},acc5 {3},time {4}".format(batch_id, \
info = "Testbatch {0},loss {1}, acc1 {2},acc5 {3},time {4}".format(batch_id, \
"%.5f"%loss,"%.5f"%acc1, "%.5f"%acc5, \
"%2.2f sec" % period))
"%2.2f sec" % period)
print(info)
if args.save_json_path:
save_json(info, args.save_json_path)
sys.stdout.flush()
test_loss = np.sum(test_info[0]) / cnt
......
......@@ -37,7 +37,7 @@ add_arg('data_dir', str, "./data/ILSVRC2012/", "The ImageNet data")
add_arg('use_gpu', bool, True, "Whether to use GPU or not.")
add_arg('class_dim', int, 1000, "Class number.")
parser.add_argument("--pretrained_model", default=None, required=True, type=str, help="The path to load pretrained model")
add_arg('model', str, "ResNet50", "Set the network to use.")
add_arg('model', str, "ResNet50", "Set the network to use.")
add_arg('save_inference', bool, False, "Whether to save inference model or not")
add_arg('resize_short_size',int, 256, "Set resize short size")
add_arg('reader_thread', int, 1, "The number of multi thread reader")
......@@ -46,10 +46,13 @@ parser.add_argument('--image_mean', nargs='+', type=float, default=[0.485, 0.456
parser.add_argument('--image_std', nargs='+', type=float, default=[0.229, 0.224, 0.225], help="The std of input image data")
parser.add_argument('--image_shape', nargs='+', type=int, default=[3, 224, 224], help="the shape of image")
add_arg('topk', int, 1, "topk")
add_arg('label_path', str, "./utils/tools/readable_label.txt", "readable label filepath")
add_arg('class_map_path', str, "./utils/tools/readable_label.txt", "readable label filepath")
add_arg('interpolation', int, None, "The interpolation mode")
add_arg('padding_type', str, "SAME", "Padding type of convolution")
add_arg('use_se', bool, True, "Whether to use Squeeze-and-Excitation module for EfficientNet.")
add_arg('image_path', str, None, "single image path")
add_arg('batch_size', int, 8, "batch_size on all devices")
add_arg('save_json_path', str, None, "save output to a json file")
# yapf: enable
......@@ -63,6 +66,17 @@ def infer(args):
assert args.image_shape[
1] <= args.resize_short_size, "Please check the args:image_shape and args:resize_short_size, the cropped size (image_shape[1]) must be smaller than or equal to the resized length (resize_short_size)"
if args.image_path:
assert os.path.isfile(
args.image_path
), "Please check the args:image_path, it should be a path to single image."
if args.use_gpu:
assert fluid.core.get_cuda_device_count(
) == 1, "please set \"export CUDA_VISIBLE_DEVICES=\" available single card"
else:
assert int(os.environ.get('CPU_NUM',
1)) == 1, "please set CPU_NUM as 1"
image = fluid.data(
name='image', shape=[None] + args.image_shape, dtype='float32')
......@@ -87,6 +101,9 @@ def infer(args):
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
compiled_program = fluid.compiler.CompiledProgram(
test_program).with_data_parallel()
fluid.io.load_persistables(exe, args.pretrained_model)
if args.save_inference:
fluid.io.save_inference_model(
......@@ -100,32 +117,44 @@ def infer(args):
print("model: ", args.model, " is already saved")
exit(0)
args.test_batch_size = 1
imagenet_reader = reader.ImageNetReader()
test_reader = imagenet_reader.test(settings=args)
feeder = fluid.DataFeeder(place=place, feed_list=[image])
test_reader = feeder.decorate_reader(test_reader, multi_devices=True)
TOPK = args.topk
assert os.path.exists(args.label_path), "Index file doesn't exist!"
f = open(args.label_path)
label_dict = {}
for item in f.readlines():
key = item.split(" ")[0]
value = [l.replace("\n", "") for l in item.split(" ")[1:]]
label_dict[key] = value
if os.path.exists(args.class_map_path):
print("The map of readable label and numerical label has been found!")
f = open(args.class_map_path)
label_dict = {}
for item in f.readlines():
key = item.split(" ")[0]
value = [l.replace("\n", "") for l in item.split(" ")[1:]]
label_dict[key] = value
for batch_id, data in enumerate(test_reader()):
result = exe.run(test_program,
fetch_list=fetch_list,
feed=feeder.feed(data))
result = exe.run(compiled_program, fetch_list=fetch_list, feed=data)
result = result[0][0]
pred_label = np.argsort(result)[::-1][:TOPK]
readable_pred_label = []
for label in pred_label:
readable_pred_label.append(label_dict[str(label)])
print("Test-{0}-score: {1}, class{2} {3}".format(batch_id, result[
pred_label], pred_label, readable_pred_label))
if os.path.exists(args.class_map_path):
readable_pred_label = []
for label in pred_label:
readable_pred_label.append(label_dict[str(label)])
print(readable_pred_label)
info = "Test-{0}-score: {1}, class{2} {3}".format(
batch_id, result[pred_label], pred_label, readable_pred_label)
else:
info = "Test-{0}-score: {1}, class{2}".format(
batch_id, result[pred_label], pred_label)
print(info)
if args.save_json_path:
save_json(info, args.save_json_path)
sys.stdout.flush()
if args.image_path:
os.remove(".tmp.txt")
def main():
......
......@@ -271,15 +271,13 @@ class ImageNetReader:
rotate=False,
data_dir=None):
num_trainers = int(os.environ.get('PADDLE_TRAINERS_NUM', 1))
if mode == 'test':
batch_size = 1
if settings.use_gpu:
batch_size = settings.batch_size // paddle.fluid.core.get_cuda_device_count(
)
else:
if settings.use_gpu:
batch_size = settings.batch_size // paddle.fluid.core.get_cuda_device_count(
)
else:
batch_size = settings.batch_size // int(
os.environ.get('CPU_NUM', 1))
batch_size = settings.batch_size // int(
os.environ.get('CPU_NUM', 1))
def reader():
def read_file_list():
......@@ -295,11 +293,11 @@ class ImageNetReader:
np.random.RandomState(self.shuffle_seed).shuffle(
full_lines)
elif shuffle:
if not settings.enable_ce or settings.same_feed:
if not settings.enable_ce or not settings.same_feed:
np.random.shuffle(full_lines)
batch_data = []
if settings.same_feed:
if (mode == "train" or mode == "val") and settings.same_feed:
temp_file = full_lines[0]
print("Same images({},nums:{}) will feed in the net".format(
str(temp_file), settings.same_feed))
......@@ -319,6 +317,7 @@ class ImageNetReader:
return read_file_list
data_reader = reader()
if mode == 'train' and num_trainers > 1:
assert self.shuffle_seed is not None, \
"If num_trainers > 1, the shuffle_seed must be set, because " \
......@@ -391,7 +390,6 @@ class ImageNetReader:
assert os.path.isfile(
file_list), "{} doesn't exist, please check data list path".format(
file_list)
return self._reader_creator(
settings,
file_list,
......@@ -408,7 +406,13 @@ class ImageNetReader:
Returns:
test reader
"""
file_list = os.path.join(settings.data_dir, 'val_list.txt')
if settings.image_path:
tmp = open(".tmp.txt", "w")
tmp.write(settings.image_path + " 0")
file_list = ".tmp.txt"
settings.batch_size = 1
else:
file_list = os.path.join(settings.data_dir, 'val_list.txt')
assert os.path.isfile(
file_list), "{} doesn't exist, please check data list path".format(
file_list)
......
......@@ -31,7 +31,7 @@ from build_model import create_model
def build_program(is_train, main_prog, startup_prog, args):
"""build program, and add grad op in program accroding to different mode
"""build program, and add backward op in program accroding to different mode
Parameters:
is_train: indicate train mode or test mode
......@@ -86,13 +86,21 @@ def validate(args,
test_fetch_list,
pass_id,
train_batch_metrics_record,
train_batch_time_record=None):
train_batch_time_record=None,
train_prog=None):
test_batch_time_record = []
test_batch_metrics_record = []
test_batch_id = 0
compiled_program = best_strategy_compiled(
args,
test_prog,
test_fetch_list[0],
exe,
mode="val",
share_prog=train_prog)
for batch in test_iter:
t1 = time.time()
test_batch_metrics = exe.run(program=test_prog,
test_batch_metrics = exe.run(program=compiled_program,
feed=batch,
fetch_list=test_fetch_list)
t2 = time.time()
......@@ -103,7 +111,7 @@ def validate(args,
test_batch_metrics_record.append(test_batch_metrics_avg)
print_info("batch", test_batch_metrics_avg, test_batch_elapse, pass_id,
test_batch_id, args.print_step)
test_batch_id, args.print_step, args.class_dim)
sys.stdout.flush()
test_batch_id += 1
......@@ -118,7 +126,8 @@ def validate(args,
"epoch",
list(train_epoch_metrics_avg) + list(test_epoch_metrics_avg),
test_epoch_time_avg,
pass_id=pass_id)
pass_id=pass_id,
class_dim=args.class_dim)
if args.enable_ce:
device_num = fluid.core.get_cuda_device_count() if args.use_gpu else 1
print_info(
......@@ -136,8 +145,6 @@ def train(args):
"""
startup_prog = fluid.Program()
train_prog = fluid.Program()
test_prog = fluid.Program()
train_out = build_program(
is_train=True,
main_prog=train_prog,
......@@ -152,18 +159,20 @@ def train(args):
train_fetch_list = [var.name for var in train_fetch_vars]
test_out = build_program(
is_train=False,
main_prog=test_prog,
startup_prog=startup_prog,
args=args)
test_data_loader = test_out[-1]
test_fetch_vars = test_out[:-1]
if args.validate:
test_prog = fluid.Program()
test_out = build_program(
is_train=False,
main_prog=test_prog,
startup_prog=startup_prog,
args=args)
test_data_loader = test_out[-1]
test_fetch_vars = test_out[:-1]
test_fetch_list = [var.name for var in test_fetch_vars]
test_fetch_list = [var.name for var in test_fetch_vars]
#Create test_prog and set layers' is_test params to True
test_prog = test_prog.clone(for_test=True)
#Create test_prog and set layers' is_test params to True
test_prog = test_prog.clone(for_test=True)
gpu_id = int(os.environ.get('FLAGS_selected_gpus', 0))
place = fluid.CUDAPlace(gpu_id) if args.use_gpu else fluid.CPUPlace()
......@@ -183,12 +192,14 @@ def train(args):
else:
imagenet_reader = reader.ImageNetReader(0 if num_trainers > 1 else None)
train_reader = imagenet_reader.train(settings=args)
test_reader = imagenet_reader.val(settings=args)
places = place
if num_trainers <= 1 and args.use_gpu:
places = fluid.framework.cuda_places()
train_data_loader.set_sample_list_generator(train_reader, places)
test_data_loader.set_sample_list_generator(test_reader, place)
if args.validate:
test_reader = imagenet_reader.val(settings=args)
test_data_loader.set_sample_list_generator(test_reader, places)
compiled_train_prog = best_strategy_compiled(args, train_prog,
train_fetch_vars[0], exe)
......@@ -204,7 +215,8 @@ def train(args):
if not args.use_dali:
train_iter = train_data_loader()
test_iter = test_data_loader()
if args.validate:
test_iter = test_data_loader()
t1 = time.time()
for batch in train_iter:
......@@ -213,13 +225,16 @@ def train(args):
return
train_batch_metrics = exe.run(compiled_train_prog,
feed=batch,
fetch_list=train_fetch_list)
fetch_list=train_fetch_list
if pass_id % args.print_step == 0 else
[])
t2 = time.time()
train_batch_elapse = t2 - t1
train_batch_time_record.append(train_batch_elapse)
train_batch_metrics_avg = np.mean(
np.array(train_batch_metrics), axis=1)
train_batch_metrics_record.append(train_batch_metrics_avg)
if pass_id % args.print_step == 0:
train_batch_metrics_avg = np.mean(
np.array(train_batch_metrics), axis=1)
train_batch_metrics_record.append(train_batch_metrics_avg)
if trainer_id == 0:
print_info("batch", train_batch_metrics_avg, train_batch_elapse,
pass_id, train_batch_id, args.print_step)
......@@ -242,18 +257,20 @@ def train(args):
print('ExponentialMovingAverage validate start...')
with ema.apply(exe):
validate(args, test_iter, exe, test_prog, test_fetch_list,
pass_id, train_batch_metrics_record)
pass_id, train_batch_metrics_record,
compiled_train_prog)
print('ExponentialMovingAverage validate over!')
validate(args, test_iter, exe, test_prog, test_fetch_list, pass_id,
train_batch_metrics_record, train_batch_time_record)
#For now, save model per epoch.
if pass_id % args.save_step == 0:
save_model(args, exe, train_prog, pass_id)
train_batch_metrics_record, train_batch_time_record,
compiled_train_prog)
if args.use_dali:
test_iter.reset()
if pass_id % args.save_step == 0:
save_model(args, exe, train_prog, pass_id)
def main():
args = parse_args()
......
......@@ -12,4 +12,4 @@
#See the License for the specific language governing permissions and
#limitations under the License.
from .optimizer import cosine_decay, lr_warmup, cosine_decay_with_warmup, exponential_decay_with_warmup, Optimizer, create_optimizer
from .utility import add_arguments, print_arguments, parse_args, check_gpu, check_args, check_version, init_model, save_model, create_data_loader, print_info, best_strategy_compiled, init_model, save_model, ExponentialMovingAverage
from .utility import add_arguments, print_arguments, parse_args, check_gpu, check_args, check_version, init_model, save_model, create_data_loader, print_info, best_strategy_compiled, init_model, save_model, ExponentialMovingAverage, save_json
......@@ -26,6 +26,7 @@ import sys
import os
import warnings
import signal
import json
import paddle
import paddle.fluid as fluid
......@@ -101,8 +102,8 @@ def parse_args():
parser.add_argument('--image_shape', nargs='+', type=int, default=[3, 224, 224], help="The shape of image")
add_arg('num_epochs', int, 120, "The number of total epochs.")
add_arg('class_dim', int, 1000, "The number of total classes.")
add_arg('batch_size', int, 8, "Minibatch size on a device.")
add_arg('test_batch_size', int, 16, "Test batch size on a deveice.")
add_arg('batch_size', int, 8, "Minibatch size on all devices.")
add_arg('test_batch_size', int, 16, "Test batch size on all devices.")
add_arg('lr', float, 0.1, "The learning rate.")
add_arg('lr_strategy', str, "piecewise_decay", "The learning rate decay strategy.")
add_arg('l2_decay', float, 1e-4, "The l2_decay parameter.")
......@@ -129,6 +130,7 @@ def parse_args():
parser.add_argument('--image_std', nargs='+', type=float, default=[0.229, 0.224, 0.225], help="The std of input image data")
# SWITCH
add_arg('validate', bool, True, "whether to validate when training.")
#NOTE: (2019/08/08) FP16 is moving to PaddlePaddle/Fleet now
#add_arg('use_fp16', bool, False, "Whether to enable half precision training with fp16." )
#add_arg('scale_loss', float, 1.0, "The value of scale_loss for fp16." )
......@@ -136,18 +138,17 @@ def parse_args():
add_arg('label_smoothing_epsilon', float, 0.1, "The value of label_smoothing_epsilon parameter")
#NOTE: (2019/08/08) temporary disable use_distill
#add_arg('use_distill', bool, False, "Whether to use distill")
add_arg("enable_ce", bool, False, "Whether to enable ce")
add_arg('random_seed', int, None, "random seed")
add_arg('use_ema', bool, False, "Whether to use ExponentialMovingAverage.")
add_arg('ema_decay', float, 0.9999, "The value of ema decay rate")
add_arg('padding_type', str, "SAME", "Padding type of convolution")
add_arg('use_se', bool, True, "Whether to use Squeeze-and-Excitation module for EfficientNet.")
#NOTE: args for profiler
add_arg('is_profiler', int, 0, "the profiler switch.(used for benchmark)")
add_arg('profiler_path', str, './', "the profiler output file path.(used for benchmark)")
add_arg('max_iter', int, 0, "the max train batch num.(used for benchmark)")
add_arg('validate', int, 1, "whether validate.(used for benchmark)")
add_arg("enable_ce", bool, False, "Whether to enable ce")
add_arg('random_seed', int, None, "random seed")
add_arg('is_profiler', bool, False, "Whether to start the profiler")
add_arg('profiler_path', str, './profilier_files', "the profiler output file path")
add_arg('max_iter', int, 0, "the max train batch num")
add_arg('same_feed', int, 0, "whether to feed same images")
......@@ -270,30 +271,39 @@ def check_args(args):
args.random_seed = 0
print("CE is running now!")
#check gpu
assert args.class_dim > 1, "class_dim must be greater than 1"
#check gpu
check_gpu()
check_version()
def init_model(exe, args, program):
"""load model from checkpoint or pretrained model
"""
if args.checkpoint:
fluid.io.load_persistables(exe, args.checkpoint, main_program=program)
print("Finish initing model from %s" % (args.checkpoint))
if args.pretrained_model:
def if_exist(var):
return os.path.exists(os.path.join(args.pretrained_model, var.name))
def is_parameter(var):
return isinstance(var, fluid.framework.Parameter) and (
not ("fc_0" in var.name)) and os.path.exists(
os.path.join(args.pretrained_model, var.name))
print("Load pretrain weights from {}, exclude fc layer.".format(
args.pretrained_model))
vars = filter(is_parameter, program.list_vars())
fluid.io.load_vars(
exe,
args.pretrained_model,
main_program=program,
predicate=if_exist)
exe, args.pretrained_model, vars=vars, main_program=program)
def save_model(args, exe, train_prog, info):
"""save model in model_path
"""
model_path = os.path.join(args.model_save_dir, args.model, str(info))
if not os.path.isdir(model_path):
os.makedirs(model_path)
......@@ -301,6 +311,13 @@ def save_model(args, exe, train_prog, info):
print("Already save model in %s" % (model_path))
def save_json(info, path):
""" save eval result or infer result to file as json format.
"""
with open(path, 'a') as f:
json.dump(info, f)
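# Usage sketch (illustrative): each call appends one JSON-encoded value to the file, e.g.
#     save_json(info, "eval_output.json")
# Because the file is opened in append mode, repeated calls produce a sequence of
# concatenated JSON values rather than a single JSON array.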
def create_data_loader(is_train, args):
"""create data_loader
......@@ -357,7 +374,8 @@ def print_info(info_mode,
pass_id=0,
batch_id=0,
print_step=1,
device_num=1):
device_num=1,
class_dim=5):
"""print function
Args:
......@@ -383,16 +401,18 @@ def print_info(info_mode,
elif len(metrics) == 4:
loss, acc1, acc5, lr = metrics
print(
"[Pass {0}, train batch {1}] \tloss {2}, acc1 {3}, acc5 {4}, lr {5}, elapse {6}".
"[Pass {0}, train batch {1}] \tloss {2}, acc1 {3}, acc{7} {4}, lr {5}, elapse {6}".
format(pass_id, batch_id, "%.5f" % loss, "%.5f" % acc1,
"%.5f" % acc5, "%.5f" % lr, "%2.4f sec" % time_info))
"%.5f" % acc5, "%.5f" % lr, "%2.4f sec" % time_info,
min(class_dim, 5)))
# test output
elif len(metrics) == 3:
loss, acc1, acc5 = metrics
print(
"[Pass {0}, test batch {1}] \tloss {2}, acc1 {3}, acc5 {4}, elapse {5}".
"[Pass {0}, test batch {1}] \tloss {2}, acc1 {3}, acc{6} {4}, elapse {5}".
format(pass_id, batch_id, "%.5f" % loss, "%.5f" % acc1,
"%.5f" % acc5, "%2.4f sec" % time_info))
"%.5f" % acc5, "%2.4f sec" % time_info,
min(class_dim, 5)))
else:
raise Exception(
"length of metrics {} is not implemented, It maybe caused by wrong format of build_program_output".
......@@ -404,16 +424,16 @@ def print_info(info_mode,
if len(metrics) == 5:
train_loss, _, test_loss, test_acc1, test_acc5 = metrics
print(
"[End pass {0}]\ttrain_loss {1}, test_loss {2}, test_acc1 {3}, test_acc5 {4}".
"[End pass {0}]\ttrain_loss {1}, test_loss {2}, test_acc1 {3}, test_acc{5} {4}".
format(pass_id, "%.5f" % train_loss, "%.5f" % test_loss, "%.5f"
% test_acc1, "%.5f" % test_acc5))
% test_acc1, "%.5f" % test_acc5, min(class_dim, 5)))
elif len(metrics) == 7:
train_loss, train_acc1, train_acc5, _, test_loss, test_acc1, test_acc5 = metrics
print(
"[End pass {0}]\ttrain_loss {1}, train_acc1 {2}, train_acc5 {3},test_loss {4}, test_acc1 {5}, test_acc5 {6}".
"[End pass {0}]\ttrain_loss {1}, train_acc1 {2}, train_acc{7} {3},test_loss {4}, test_acc1 {5}, test_acc{7} {6}".
format(pass_id, "%.5f" % train_loss, "%.5f" % train_acc1, "%.5f"
% train_acc5, "%.5f" % test_loss, "%.5f" % test_acc1,
"%.5f" % test_acc5))
"%.5f" % test_acc5, min(class_dim, 5)))
sys.stdout.flush()
elif info_mode == "ce":
assert len(
......@@ -444,7 +464,12 @@ def print_ce(device_num, metrics, time_info):
print("kpis\ttrain_speed_card{}\t{}".format(device_num, train_speed))
def best_strategy_compiled(args, program, loss, exe):
def best_strategy_compiled(args,
program,
loss,
exe,
mode="train",
share_prog=None):
"""make a program which wrapped by a compiled program
"""
......@@ -468,7 +493,8 @@ def best_strategy_compiled(args, program, loss, exe):
exec_strategy.num_threads = 1
compiled_program = fluid.CompiledProgram(program).with_data_parallel(
loss_name=loss.name,
loss_name=loss.name if mode == "train" else loss,
share_vars_from=share_prog if mode == "val" else None,
build_strategy=build_strategy,
exec_strategy=exec_strategy)
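# Usage sketch (mirroring validate() above): compiling the test program with
# share_vars_from pointing at the compiled train program lets both programs reuse one
# copy of the parameters on each device instead of allocating them twice:
#     compiled_train = best_strategy_compiled(args, train_prog, avg_cost, exe)
#     compiled_test = best_strategy_compiled(args, test_prog, test_fetch_list[0], exe,
#                                            mode="val", share_prog=compiled_train)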
......