Commit cc473ce7 authored by Kaibing Chen, committed by qingqing01

add landmark models (#2417)

* add landmark models
Parent 8c034956
# Google Landmark Retrieval and Recognition 2019
The Google Landmark Dataset V2 is currently the largest publicly available image retrieval and recognition dataset, with 4M training images, more than 100,000 query images, and nearly 1M index images. The large number of training images is the driving force behind the generalizability of machine learning models. Here we release the models we trained for the Google Landmark 2019 competition; the details of our solution are described in our paper [[link](https://arxiv.org/pdf/1906.03990.pdf)].
## Retrieval Models
We fine-tune four convolutional neural networks to extract global image descriptors. The four backbones are ResNet152, ResNet200, SE-ResNeXt152, and InceptionV4, trained with either the arcmargin or the npairs loss on the Google Landmark V2 training and index sets. You can download the trained models below. For the training code, refer to metric learning [[link](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/metric_learning)].
|model | public | private
|- | - | -:
|[res152_arcmargin](https://landmark.gz.bcebos.com/res152_arcmargin.tar) | 0.2676 | 0.3020
|[res152_arcmargin_index](https://landmark.gz.bcebos.com/res152_arcmargin_index.tar) | 0.2476 | 0.2707
|[res152_npairs](https://landmark.gz.bcebos.com/res152_npairs.tar) | 0.2597 | 0.2870
|[res200_arcmargin](https://landmark.gz.bcebos.com/res200_arcmargin.tar) | 0.2670 | 0.3042
|[se_x152_arcmargin](https://landmark.gz.bcebos.com/se_x152_arcmargin.tar) | 0.2670 | 0.2914
|[inceptionv4_arcmargin](https://landmark.gz.bcebos.com/inceptionv4_arcmargin.tar) | 0.2685 | 0.2933
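Since the table provides several complementary descriptors, a common way to ensemble them is to L2-normalize each model's descriptor and concatenate the results before computing distances. The sketch below illustrates this standard practice; `combine_descriptors` is a hypothetical helper, not part of the released code.

```python
import numpy as np

def combine_descriptors(features):
    """L2-normalize each model's descriptor, then concatenate them.

    `features` is a list of 1-D numpy arrays, one per retrieval model.
    Normalizing first gives every model equal weight in the ensemble.
    """
    normed = [f / np.linalg.norm(f) for f in features]
    return np.concatenate(normed)

# Two toy 4-D descriptors standing in for two models' outputs
fea_a = np.array([3.0, 0.0, 4.0, 0.0])
fea_b = np.array([0.0, 1.0, 0.0, 0.0])
combined = combine_descriptors([fea_a, fea_b])
# Each normalized part has unit norm, so the combined vector has norm sqrt(2)
```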
In addition, we train a classification model based on ResNet152 on the ~4M-image Google Landmark V2 training set ([res152_softmax_v1](https://landmark.gz.bcebos.com/res152_softmax_v1.tar)).
For the training code, refer to image classification [[link](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/image_classification)].
## Recognition Models
There are three models in our recognition solution.
1.[res152_arcmargin](https://landmark.gz.bcebos.com/res152_arcmargin.tar): Retrieval model based on ResNet152 and arcmargin, the same model as in the retrieval task.
2.[res152_softmax_v2](https://landmark.gz.bcebos.com/res152_softmax_v2.tar): Classification model based on ResNet152 and softmax, trained on a cleaned ~3M-image subset of the Google Landmark V2 training set. For the training code, refer to image classification [[link](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/image_classification)].
3.[res50_oid_v4_detector](https://landmark.gz.bcebos.com/res50_oid_v4_detector.tar): Object detector used to filter out non-landmark images. The mAP of this model is ~0.55 on the OID V4 track (public LB). For the training code, refer to the RCNN detector [[link](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/rcnn)].
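One way the three models can interact at inference time is: the detector flags images dominated by common (non-landmark) objects, and only the remaining images receive a classification label. This is a minimal sketch of that gating logic; the exact fusion used in the competition is described in the paper, and the threshold and helper name here are illustrative assumptions.

```python
def recognize(det_boxes, cls_label, cls_score, score_thresh=0.3):
    """det_boxes: list of (class_id, score) pairs from the OID detector;
    cls_label/cls_score: top-1 output of the softmax classifier.
    Returns None for images rejected as non-landmark."""
    # If the detector fires confidently on any distractor object
    # (person, car, ...), treat the image as non-landmark.
    if any(score > score_thresh for _, score in det_boxes):
        return None
    return (cls_label, cls_score)

# A confident detection suppresses the classification result
print(recognize([(68, 0.9)], cls_label=123, cls_score=0.8))  # None
print(recognize([], cls_label=123, cls_score=0.8))           # (123, 0.8)
```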
## Environment
cuDNN >= 7, CUDA 9, PaddlePaddle >= 1.3, Python 2.7
## Inference
### 1. Compile the Paddle inference library and predict with binary models
PaddlePaddle has two model formats: train models and binary models. Predicting with a binary model is more efficient, so we first compile the Paddle inference shared library (.so) and convert the train models to binary models.
(1) Compile the Paddle inference .so
Please refer to the README.md in pypredict.
(2) Convert train models to binary models
```
pushd inference
sh convert.sh
popd
```
### 2. Extract retrieval features and calculate cosine distances
The folder ./inference/test_data contains four images: 0.jpg and 1.jpg show the same landmark, 2.jpg shows a different landmark, and 3.jpg is a non-landmark image.
We extract the features of these images and calculate the cosine distance between 0.jpg and each of 1.jpg, 2.jpg, and 3.jpg.
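The distance computed by the script is plain cosine similarity between descriptor vectors, close to 1 for images of the same landmark and near 0 for unrelated ones. A minimal numpy version of the same formula used in infer_retrieval.py:

```python
import numpy as np

def cosine_dist(a, b):
    # dot(a, b) / (|a| * |b|), as in cosinedist() in infer_retrieval.py
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 0.0])
print(cosine_dist(a, np.array([2.0, 0.0])))  # 1.0 (same direction)
print(cosine_dist(a, np.array([0.0, 3.0])))  # 0.0 (orthogonal)
```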
```
pushd inference
. set_env.sh
# model_name is one of: res152_arcmargin, res152_arcmargin_index, res152_npairs, res200_arcmargin, se_x152_arcmargin, inceptionv4_arcmargin
python infer_retrieval.py test_retrieval model_name
# example:
python infer_retrieval.py test_retrieval res152_arcmargin
popd
```
### 3. Predict the classification label of an image
```
pushd inference
. set_env.sh
# model_name is one of: res152_softmax_v1, res152_softmax_v2
python infer_recognition.py test_cls img_path model_name
# example:
python infer_recognition.py test_cls test_data/0.jpg res152_softmax_v1
popd
```
You will get the inferred label and its score.
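The reported label and score come straight from the classifier's softmax output: the label is the index of the largest probability and the score is that probability, mirroring the `np.argsort(result)[::-1][:1]` logic in infer_recognition.py. With a toy 3-class output:

```python
import numpy as np

result = np.array([0.05, 0.7, 0.25])  # toy softmax output, 3 classes
pred_label = int(np.argmax(result))   # index of the highest probability
print(pred_label, result[pred_label])  # 1 0.7
```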
### 4. Detect images
```
pushd inference
. set_env.sh
python infer_recognition.py test_det ./test_data/2e44b31818acc600.jpeg
popd
```
You will get the inferred detector bounding boxes and classes. The class mapping file is pretrained_models/res50_oid_v4_detector/cls_name_idx_map_openimagev4_500.txt.
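The detector reports numeric class ids, and the mapping file translates them back to Open Images class names. Assuming one `<name> <index>`-style entry per line (the exact delimiter in the released file may differ), the mapping can be loaded like this; `load_class_map` and the sample entries are illustrative, not taken from the real file:

```python
def load_class_map(lines):
    """Parse '<name> <index>' lines into an {index: name} dict."""
    mapping = {}
    for line in lines:
        parts = line.strip().split()
        if len(parts) >= 2:
            name, idx = parts[0], int(parts[-1])
            mapping[idx] = name
    return mapping

sample = ["Building 1", "Tower 2"]  # illustrative entries, not the real file
print(load_class_map(sample))  # {1: 'Building', 2: 'Tower'}
```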
# for C++ predict
[res152_softmax_v1_predict]
# which GPU card to use
res152_softmax_v1_device : 0
# whether to print debug info
res152_softmax_v1_debug : 0
# initial fraction of GPU memory to allocate
res152_softmax_v1_fraction_of_gpu_memory: 0.1
# binary model structure
res152_softmax_v1_prog_file: ./binary_models/res152_softmax_v1/model
# binary model params
res152_softmax_v1_param_file: ./binary_models/res152_softmax_v1/params
[res152_softmax_v2_predict]
res152_softmax_v2_device : 0
res152_softmax_v2_debug : 0
res152_softmax_v2_fraction_of_gpu_memory: 0.1
res152_softmax_v2_prog_file: ./binary_models/res152_softmax_v2/model
res152_softmax_v2_param_file: ./binary_models/res152_softmax_v2/params
# for C++ predict
[paddle-classify_predict]
# which GPU card to use
paddle-classify_device : 0
# whether to print debug info
paddle-classify_debug : 0
# initial fraction of GPU memory to allocate
paddle-classify_fraction_of_gpu_memory: 0.1
# binary model structure
paddle-classify_prog_file: ./pretrained_models/res50_oid_v4_detector/infer_model/model
# binary model params
paddle-classify_param_file: ./pretrained_models/res50_oid_v4_detector/infer_model/params
[paddle-det]
# total number of detector classes
class_nums:501
# target size of the shorter image side for inference
new_size:800
# maximum size of the longer image side for inference
max_size:1333
# for C++ predict
[res152_arcmargin_predict]
# which GPU card to use
res152_arcmargin_device : 0
# whether to print debug info
res152_arcmargin_debug : 0
# initial fraction of GPU memory to allocate
res152_arcmargin_fraction_of_gpu_memory: 0.1
# binary model structure
res152_arcmargin_prog_file: ./binary_models/res152_arcmargin/model
# binary model params
res152_arcmargin_param_file: ./binary_models/res152_arcmargin/params
# input shape
input_size: 448
[res152_arcmargin_index_predict]
res152_arcmargin_index_device : 0
res152_arcmargin_index_debug : 0
res152_arcmargin_index_fraction_of_gpu_memory: 0.1
res152_arcmargin_index_prog_file: ./binary_models/res152_arcmargin_index/model
res152_arcmargin_index_param_file: ./binary_models/res152_arcmargin_index/params
input_size: 448
[res152_npairs_predict]
res152_npairs_device : 0
res152_npairs_debug : 0
res152_npairs_fraction_of_gpu_memory: 0.1
res152_npairs_prog_file: ./binary_models/res152_npairs/model
res152_npairs_param_file: ./binary_models/res152_npairs/params
input_size: 448
[res200_arcmargin_predict]
res200_arcmargin_device : 0
res200_arcmargin_debug : 0
res200_arcmargin_fraction_of_gpu_memory: 0.1
res200_arcmargin_prog_file: ./binary_models/res200_arcmargin/model
res200_arcmargin_param_file: ./binary_models/res200_arcmargin/params
input_size: 448
[se_x152_arcmargin_predict]
se_x152_arcmargin_device : 0
se_x152_arcmargin_debug : 0
se_x152_arcmargin_fraction_of_gpu_memory: 0.1
se_x152_arcmargin_prog_file: ./binary_models/se_x152_arcmargin/model
se_x152_arcmargin_param_file: ./binary_models/se_x152_arcmargin/params
input_size: 448
[inceptionv4_arcmargin_predict]
inceptionv4_arcmargin_device : 0
inceptionv4_arcmargin_debug : 0
inceptionv4_arcmargin_fraction_of_gpu_memory: 0.1
inceptionv4_arcmargin_prog_file: ./binary_models/inceptionv4_arcmargin/model
inceptionv4_arcmargin_param_file: ./binary_models/inceptionv4_arcmargin/params
input_size: 555
#res152_softmax_v1
python convert_binary_model.py --model='ResNet152_vd_fc' --pretrained_model=pretrained_models/res152_softmax_v1/ --binary_model=./binary_models/res152_softmax_v1 --image_shape=3,224,224 --task_mode='classification'
#res152_softmax_v2
python convert_binary_model.py --model='ResNet152_vd' --pretrained_model=pretrained_models/res152_softmax_v2/ --binary_model=./binary_models/res152_softmax_v2 --image_shape=3,224,224 --task_mode='classification'
#res152_arcmargin
python convert_binary_model.py --model='ResNet152_vd_v0_embedding' --pretrained_model=pretrained_models/res152_arcmargin/ --binary_model=./binary_models/res152_arcmargin --image_shape=3,448,448 --task_mode='retrieval'
#res152_arcmargin_index
python convert_binary_model.py --model='ResNet152_vd_v0_embedding' --pretrained_model=pretrained_models/res152_arcmargin_index/ --binary_model=./binary_models/res152_arcmargin_index --image_shape=3,448,448 --task_mode='retrieval'
#res152_npairs
python convert_binary_model.py --model='ResNet152_vd_v0_embedding' --pretrained_model=pretrained_models/res152_npairs/ --binary_model=./binary_models/res152_npairs --image_shape=3,448,448 --task_mode='retrieval'
#res200_arcmargin
python convert_binary_model.py --model='ResNet200_vd_embedding' --pretrained_model=pretrained_models/res200_arcmargin/ --binary_model=./binary_models/res200_arcmargin --image_shape=3,448,448 --task_mode='retrieval'
#se_x152_arcmargin
python convert_binary_model.py --model='SE_ResNeXt152_64x4d_vd_embedding' --pretrained_model=pretrained_models/se_x152_arcmargin/ --binary_model=./binary_models/se_x152_arcmargin --image_shape=3,448,448 --task_mode='retrieval'
#inceptionv4_arcmargin
python convert_binary_model.py --model='InceptionV4_embedding' --pretrained_model=pretrained_models/inceptionv4_arcmargin --binary_model=./binary_models/inceptionv4_arcmargin --image_shape=3,555,555 --task_mode='retrieval'
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import sys
import argparse
import functools
import paddle
import paddle.fluid as fluid
import models
from utility import add_arguments, print_arguments
parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser)
# yapf: disable
add_arg('model', str, "ResNet200_vd", "Set the network to use.")
add_arg('embedding_size', int, 512, "Embedding size.")
add_arg('image_shape', str, "3,448,448", "Input image size.")
add_arg('pretrained_model', str, None, "Whether to use pretrained model.")
add_arg('binary_model', str, None, "Set binary_model dir")
add_arg('task_mode', str, "retrieval", "Set task mode")
# yapf: enable
model_list = [m for m in dir(models) if "__" not in m]
def convert(args):
    # parameters from arguments
    model_name = args.model
    pretrained_model = args.pretrained_model
    if not os.path.exists(pretrained_model):
        print("pretrained_model doesn't exist!")
        sys.exit(-1)
    image_shape = [int(m) for m in args.image_shape.split(",")]
    assert model_name in model_list, "{} is not in lists: {}".format(args.model,
                                                                     model_list)
    image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
    # model definition
    model = models.__dict__[model_name]()
    if args.task_mode == 'retrieval':
        out = model.net(input=image, embedding_size=args.embedding_size)
    else:
        out = model.net(input=image)
    place = fluid.CPUPlace()
    exe = fluid.Executor(place)
    exe.run(fluid.default_startup_program())

    def if_exist(var):
        return os.path.exists(os.path.join(pretrained_model, var.name))

    fluid.io.load_vars(exe, pretrained_model, predicate=if_exist)
    fluid.io.save_inference_model(
        dirname=args.binary_model,
        feeded_var_names=['image'],
        target_vars=[out['embedding']] if args.task_mode == 'retrieval' else [out],
        executor=exe,
        main_program=None,
        model_filename='model',
        params_filename='params')
    print('input_name: {}'.format('image'))
    if args.task_mode == 'retrieval':
        print('output_name: {}'.format(out['embedding'].name))
    else:
        print('output_name: {}'.format(out.name))
    print("convert done.")


def main():
    args = parser.parse_args()
    print_arguments(args)
    convert(args)


if __name__ == '__main__':
    main()
import os
import sys
sys.path.append('./so')
import time
import random
import cv2
import numpy as np
from ConfigParser import ConfigParser
from PyCNNPredict import PyCNNPredict
# infer detector
def det_preprocessor(im, new_size, max_size):
    im = im.astype(np.float32, copy=False)
    img_mean = [0.485, 0.456, 0.406]
    img_std = [0.229, 0.224, 0.225]
    im = im[:, :, ::-1]  # BGR -> RGB
    im = im / 255
    im -= img_mean
    im /= img_std
    im_shape = im.shape
    im_size_min = np.min(im_shape[0:2])
    im_size_max = np.max(im_shape[0:2])
    im_scale = float(new_size) / float(im_size_min)
    # Prevent the biggest axis from being more than max_size
    if np.round(im_scale * im_size_max) > max_size:
        im_scale = float(max_size) / float(im_size_max)
    im = cv2.resize(im, None, None, fx=im_scale, fy=im_scale, interpolation=cv2.INTER_LINEAR)
    channel_swap = (2, 0, 1)  # (channel, height, width)
    im = im.transpose(channel_swap)
    return im, im_scale


def nms(dets, thresh):
    """Greedy non-maximum suppression on [x1, y1, x2, y2, score] rows."""
    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]
    scores = dets[:, 4]
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    dt_num = dets.shape[0]
    order = np.array(range(dt_num))
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        ovr = inter / (areas[i] + areas[order[1:]] - inter)
        inds = np.where(ovr <= thresh)[0]
        order = order[inds + 1]
    return keep


def box_decoder(deltas, boxes, weights):
    boxes = boxes.astype(deltas.dtype, copy=False)
    widths = boxes[:, 2] - boxes[:, 0] + 1.0
    heights = boxes[:, 3] - boxes[:, 1] + 1.0
    ctr_x = boxes[:, 0] + 0.5 * widths
    ctr_y = boxes[:, 1] + 0.5 * heights
    wx, wy, ww, wh = weights
    dx = deltas[:, 0::4] * wx
    dy = deltas[:, 1::4] * wy
    dw = deltas[:, 2::4] * ww
    dh = deltas[:, 3::4] * wh
    # Prevent sending too large values into np.exp()
    clip_value = np.log(1000. / 16.)
    dw = np.minimum(dw, clip_value)
    dh = np.minimum(dh, clip_value)
    pred_ctr_x = dx * widths[:, np.newaxis] + ctr_x[:, np.newaxis]
    pred_ctr_y = dy * heights[:, np.newaxis] + ctr_y[:, np.newaxis]
    pred_w = np.exp(dw) * widths[:, np.newaxis]
    pred_h = np.exp(dh) * heights[:, np.newaxis]
    pred_boxes = np.zeros(deltas.shape, dtype=deltas.dtype)
    # x1
    pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w
    # y1
    pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h
    # x2 (note: "- 1" is correct; don't be fooled by the asymmetry)
    pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w - 1
    # y2 (note: "- 1" is correct; don't be fooled by the asymmetry)
    pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h - 1
    return pred_boxes


def clip_tiled_boxes(boxes, im_shape):
    """Clip boxes to image boundaries. im_shape is [height, width] and boxes
    has shape (N, 4 * num_tiled_boxes)."""
    # x1 >= 0
    boxes[:, 0::4] = np.maximum(np.minimum(boxes[:, 0::4], im_shape[1] - 1), 0)
    # y1 >= 0
    boxes[:, 1::4] = np.maximum(np.minimum(boxes[:, 1::4], im_shape[0] - 1), 0)
    # x2 < im_shape[1]
    boxes[:, 2::4] = np.maximum(np.minimum(boxes[:, 2::4], im_shape[1] - 1), 0)
    # y2 < im_shape[0]
    boxes[:, 3::4] = np.maximum(np.minimum(boxes[:, 3::4], im_shape[0] - 1), 0)
    return boxes
def get_dt_res_common(rpn_rois_v, confs_v, locs_v, class_nums, im_info, im_id):
    dts_res = []
    if len(rpn_rois_v) == 0:
        return None
    variance_v = np.array([0.1, 0.1, 0.2, 0.2])
    img_height, img_width, img_scale = im_info
    tmp_v = box_decoder(locs_v, rpn_rois_v, variance_v)
    tmp_v = clip_tiled_boxes(tmp_v, [img_height, img_width])
    decoded_box_v = tmp_v / img_scale
    cls_boxes = [[] for _ in range(class_nums)]
    for j in range(1, class_nums):
        inds = np.where(confs_v[:, j] >= 0.1)[0]
        scores_j = confs_v[inds, j]
        rois_j = decoded_box_v[inds, j * 4:(j + 1) * 4]
        dets_j = np.hstack((rois_j, scores_j[:, np.newaxis])).astype(np.float32, copy=False)
        cls_rank = np.argsort(-dets_j[:, -1])
        dets_j = dets_j[cls_rank]
        keep = nms(dets_j, 0.5)
        nms_dets = dets_j[keep, :]
        cls_boxes[j] = nms_dets
    # Limit to max_per_image detections **over all classes**
    image_scores = np.hstack([cls_boxes[j][:, -1] for j in range(1, class_nums)])
    if len(image_scores) > 100:
        image_thresh = np.sort(image_scores)[-100]
        for j in range(1, class_nums):
            keep = np.where(cls_boxes[j][:, -1] >= image_thresh)[0]
            cls_boxes[j] = cls_boxes[j][keep, :]
    for j in range(1, class_nums):
        for dt in cls_boxes[j]:
            xmin, ymin, xmax, ymax, score = dt.tolist()
            w = xmax - xmin + 1
            h = ymax - ymin + 1
            bbox = [xmin, ymin, w, h]
            dt_res = {
                'image_id': im_id,
                'category_id': j,
                'bbox': bbox,
                'score': score
            }
            dts_res.append(dt_res)
    return dts_res


def test_det(img_path):
    conf_file = './conf/paddle-det.conf'
    prefix = 'paddle-classify_'
    conf = loadconfig(conf_file)
    det_prefix = 'paddle-det'
    class_nums = conf.getint(det_prefix, 'class_nums')
    new_size = conf.getfloat(det_prefix, 'new_size')
    max_size = conf.getfloat(det_prefix, 'max_size')
    predictor = PyCNNPredict()
    predictor.init(conf_file, prefix)
    im = cv2.imread(img_path)
    if im is None:
        print("image doesn't exist!")
        sys.exit(-1)
    img_height_ori = im.shape[0]
    img_width_ori = im.shape[1]
    im, im_scale = det_preprocessor(im, new_size, max_size)
    im_height = np.round(img_height_ori * im_scale)
    im_width = np.round(img_width_ori * im_scale)
    im_info = np.array([im_height, im_width, im_scale], dtype=np.float32)
    im_data_shape = np.array([1, im.shape[0], im.shape[1], im.shape[2]])
    im_info_shape = np.array([1, 3])
    im = im.flatten().astype(np.float32)
    im_info = im_info.flatten().astype(np.float32)
    inputdatas = [im, im_info]
    inputshapes = [im_data_shape.astype(np.int32), im_info_shape.astype(np.int32)]
    for ino in range(2):
        starttime = time.time()
        res = predictor.predict(inputdatas, inputshapes, [])
        rpn_rois_v = res[0][0].reshape(-1, 4)
        confs_v = res[0][1].reshape(-1, class_nums)
        locs_v = res[0][2].reshape(-1, class_nums * 4)
        dts_res = get_dt_res_common(rpn_rois_v, confs_v, locs_v, class_nums, im_info, 0)
        print("Time:%.3f" % (time.time() - starttime))
        print(dts_res)
# infer cls
def normwidth(size, margin=32):
    outsize = size // margin * margin
    return outsize


def loadconfig(configurefile):
    """Load config from file."""
    config = ConfigParser()
    config.readfp(open(configurefile, 'r'))
    return config


def resize_short(img, target_size):
    """Resize the short side of img to target_size, keeping aspect ratio."""
    percent = float(target_size) / min(img.shape[0], img.shape[1])
    resized_width = int(round(img.shape[1] * percent))
    resized_height = int(round(img.shape[0] * percent))
    resized_width = normwidth(resized_width)
    resized_height = normwidth(resized_height)
    resized = cv2.resize(img, (resized_width, resized_height))
    return resized


def crop_image(img, target_size, center):
    """Center or random crop of size target_size."""
    height, width = img.shape[:2]
    size = target_size
    if center == True:
        w_start = (width - size) / 2
        h_start = (height - size) / 2
    else:
        w_start = random.randint(0, width - size)
        h_start = random.randint(0, height - size)
    w_end = w_start + size
    h_end = h_start + size
    img = img[h_start:h_end, w_start:w_end, :]
    return img


def cls_preprocessor(im, new_size):
    img_mean = [0.485, 0.456, 0.406]
    img_std = [0.229, 0.224, 0.225]
    img = resize_short(im, 224)
    img = crop_image(img, target_size=224, center=True)
    img = img[:, :, ::-1].astype('float32').transpose((2, 0, 1)) / 255
    img_mean = np.array(img_mean).reshape((3, 1, 1))
    img_std = np.array(img_std).reshape((3, 1, 1))
    img -= img_mean
    img /= img_std
    return img


def test_cls(img_path, model_name):
    conf_file = './conf/paddle-cls.conf'
    prefix = model_name + "_"
    conf = loadconfig(conf_file)
    predictor = PyCNNPredict()
    predictor.init(conf_file, prefix)
    im = cv2.imread(img_path)
    if im is None:
        print("image doesn't exist!")
        sys.exit(-1)
    im = cls_preprocessor(im, 224)
    im_data_shape = np.array([1, im.shape[0], im.shape[1], im.shape[2]])
    im = im.flatten().astype(np.float32)
    inputdatas = [im]
    inputshapes = [im_data_shape.astype(np.int32)]
    for ino in range(5):
        starttime = time.time()
        res = predictor.predict(inputdatas, inputshapes, [])
        print "Time:", time.time() - starttime
    result = res[0][0]
    pred_label = np.argsort(result)[::-1][:1]
    print(pred_label)
    print(result[pred_label])


if __name__ == "__main__":
    if len(sys.argv) > 1:
        func = getattr(sys.modules[__name__], sys.argv[1])
        func(*sys.argv[2:])
    else:
        print >> sys.stderr, 'tools.py command'
import os
import sys
sys.path.append('./so')
import time
import random
import cv2
import numpy as np
from ConfigParser import ConfigParser
from PyCNNPredict import PyCNNPredict
def normwidth(size, margin=32):
    outsize = size // margin * margin
    outsize = max(outsize, margin)
    return outsize


def loadconfig(configurefile):
    """Load config from file."""
    config = ConfigParser()
    config.readfp(open(configurefile, 'r'))
    return config


def resize_short(img, target_size):
    """Resize the short side of img to target_size, keeping aspect ratio."""
    percent = float(target_size) / min(img.shape[0], img.shape[1])
    resized_width = int(round(img.shape[1] * percent))
    resized_height = int(round(img.shape[0] * percent))
    resized_width = normwidth(resized_width)
    resized_height = normwidth(resized_height)
    resized = cv2.resize(img, (resized_width, resized_height), interpolation=cv2.INTER_LANCZOS4)
    return resized


def crop_image(img, target_size, center):
    """Center or random crop of size target_size."""
    height, width = img.shape[:2]
    size = target_size
    if center == True:
        w_start = (width - size) / 2
        h_start = (height - size) / 2
    else:
        w_start = random.randint(0, width - size)
        h_start = random.randint(0, height - size)
    w_end = w_start + size
    h_end = h_start + size
    img = img[h_start:h_end, w_start:w_end, :]
    return img


def preprocessor(img, crop_size):
    img_mean = [0.485, 0.456, 0.406]
    img_std = [0.229, 0.224, 0.225]
    h, w = img.shape[:2]
    ratio = float(max(w, h)) / min(w, h)
    if ratio > 3:
        crop_size = int(crop_size * 3 / ratio)
    img = resize_short(img, crop_size)
    img = img[:, :, ::-1].astype('float32').transpose((2, 0, 1)) / 255
    img_mean = np.array(img_mean).reshape((3, 1, 1))
    img_std = np.array(img_std).reshape((3, 1, 1))
    img -= img_mean
    img /= img_std
    return img


def cosinedist(a, b):
    return np.dot(a, b) / (np.sum(a * a) * np.sum(b * b))**0.5


def test_retrieval(model_name):
    conf_file = './conf/paddle-retrieval.conf'
    prefix = model_name + "_"
    config = loadconfig(conf_file)
    predictor = PyCNNPredict()
    predictor.init(conf_file, prefix)
    input_size = config.getint(prefix + 'predict', 'input_size')
    img_names = [
        './test_data/0.jpg',
        './test_data/1.jpg',
        './test_data/2.jpg',
        './test_data/3.jpg'
    ]
    img_feas = []
    for img_path in img_names:
        im = cv2.imread(img_path)
        if im is None:
            return None
        im = preprocessor(im, input_size)
        im_data_shape = np.array([1, im.shape[0], im.shape[1], im.shape[2]])
        im = im.flatten().astype(np.float32)
        inputdatas = [im]
        inputshapes = [im_data_shape.astype(np.int32)]
        run_time = 0
        starttime = time.time()
        res = predictor.predict(inputdatas, inputshapes, [])
        run_time += (time.time() - starttime)
        fea = res[0][0]
        img_feas.append(fea)
        print("Time:", run_time)
    for i in xrange(len(img_names) - 1):
        cosdist = cosinedist(img_feas[0], img_feas[i + 1])
        cosdist = max(min(cosdist, 1), 0)
        print('cosine dist between {} and {}: {}'.format(0, i + 1, cosdist))


if __name__ == "__main__":
    if len(sys.argv) > 1:
        func = getattr(sys.modules[__name__], sys.argv[1])
        func(*sys.argv[2:])
    else:
        print >> sys.stderr, 'tools.py command'
from .inceptionv4_embedding import InceptionV4_embedding
from .resnet_vd import ResNet50_vd, ResNet101_vd, ResNet152_vd, ResNet200_vd
from .resnet_vd_embedding import ResNet50_vd_embedding, ResNet101_vd_embedding, ResNet152_vd_embedding, ResNet200_vd_embedding
from .resnet_vd_fc import ResNet50_vd_fc, ResNet101_vd_fc, ResNet152_vd_fc
from .resnet_vd_v0_embedding import ResNet50_vd_v0_embedding, ResNet101_vd_v0_embedding, ResNet152_vd_v0_embedding
from .se_resnext_vd_embedding import SE_ResNeXt50_32x4d_vd_embedding, SE_ResNeXt101_32x4d_vd_embedding, SE_ResNeXt152_64x4d_vd_embedding
import paddle
import paddle.fluid as fluid
import math
from paddle.fluid.param_attr import ParamAttr
__all__ = ['InceptionV4_embedding']
train_parameters = {
    "input_size": [3, 224, 224],
    "input_mean": [0.485, 0.456, 0.406],
    "input_std": [0.229, 0.224, 0.225],
    "learning_strategy": {
        "name": "piecewise_decay",
        "batch_size": 256,
        "epochs": [10, 16, 20],
        "steps": [0.01, 0.001, 0.0001, 0.00001]
    }
}
class InceptionV4_embedding():
    def __init__(self):
        self.params = train_parameters

    def net(self, input, embedding_size=256):
        endpoints = {}
        x = self.inception_stem(input)
        for i in range(4):
            x = self.inceptionA(x, name=str(i + 1))
        x = self.reductionA(x)
        for i in range(7):
            x = self.inceptionB(x, name=str(i + 1))
        x = self.reductionB(x)
        for i in range(3):
            x = self.inceptionC(x, name=str(i + 1))
        pool = fluid.layers.pool2d(
            input=x, pool_size=8, pool_type='avg', global_pooling=True)
        if embedding_size > 0:
            embedding = fluid.layers.fc(input=pool, size=embedding_size)
            endpoints['embedding'] = embedding
        else:
            endpoints['embedding'] = pool
        return endpoints

    def conv_bn_layer(self,
                      data,
                      num_filters,
                      filter_size,
                      stride=1,
                      padding=0,
                      groups=1,
                      act='relu',
                      name=None):
        conv = fluid.layers.conv2d(
            input=data,
            num_filters=num_filters,
            filter_size=filter_size,
            stride=stride,
            padding=padding,
            groups=groups,
            act=None,
            param_attr=ParamAttr(name=name + "_weights"),
            bias_attr=False,
            name=name)
        bn_name = name + "_bn"
        return fluid.layers.batch_norm(
            input=conv,
            act=act,
            name=bn_name,
            param_attr=ParamAttr(name=bn_name + "_scale"),
            bias_attr=ParamAttr(name=bn_name + "_offset"),
            moving_mean_name=bn_name + '_mean',
            moving_variance_name=bn_name + '_variance')

    def inception_stem(self, data, name=None):
        conv = self.conv_bn_layer(data, 32, 3, stride=2, act='relu', name="conv1_3x3_s2")
        conv = self.conv_bn_layer(conv, 32, 3, act='relu', name="conv2_3x3_s1")
        conv = self.conv_bn_layer(conv, 64, 3, padding=1, act='relu', name="conv3_3x3_s1")
        pool1 = fluid.layers.pool2d(
            input=conv, pool_size=3, pool_stride=2, pool_type='max')
        conv2 = self.conv_bn_layer(conv, 96, 3, stride=2, act='relu', name="inception_stem1_3x3_s2")
        concat = fluid.layers.concat([pool1, conv2], axis=1)
        conv1 = self.conv_bn_layer(concat, 64, 1, act='relu', name="inception_stem2_3x3_reduce")
        conv1 = self.conv_bn_layer(conv1, 96, 3, act='relu', name="inception_stem2_3x3")
        conv2 = self.conv_bn_layer(concat, 64, 1, act='relu', name="inception_stem2_1x7_reduce")
        conv2 = self.conv_bn_layer(
            conv2, 64, (7, 1), padding=(3, 0), act='relu', name="inception_stem2_1x7")
        conv2 = self.conv_bn_layer(
            conv2, 64, (1, 7), padding=(0, 3), act='relu', name="inception_stem2_7x1")
        conv2 = self.conv_bn_layer(conv2, 96, 3, act='relu', name="inception_stem2_3x3_2")
        concat = fluid.layers.concat([conv1, conv2], axis=1)
        conv1 = self.conv_bn_layer(concat, 192, 3, stride=2, act='relu', name="inception_stem3_3x3_s2")
        pool1 = fluid.layers.pool2d(
            input=concat, pool_size=3, pool_stride=2, pool_type='max')
        concat = fluid.layers.concat([conv1, pool1], axis=1)
        return concat

    def inceptionA(self, data, name=None):
        pool1 = fluid.layers.pool2d(
            input=data, pool_size=3, pool_padding=1, pool_type='avg')
        conv1 = self.conv_bn_layer(pool1, 96, 1, act='relu', name="inception_a" + name + "_1x1")
        conv2 = self.conv_bn_layer(data, 96, 1, act='relu', name="inception_a" + name + "_1x1_2")
        conv3 = self.conv_bn_layer(data, 64, 1, act='relu', name="inception_a" + name + "_3x3_reduce")
        conv3 = self.conv_bn_layer(conv3, 96, 3, padding=1, act='relu', name="inception_a" + name + "_3x3")
        conv4 = self.conv_bn_layer(data, 64, 1, act='relu', name="inception_a" + name + "_3x3_2_reduce")
        conv4 = self.conv_bn_layer(conv4, 96, 3, padding=1, act='relu', name="inception_a" + name + "_3x3_2")
        conv4 = self.conv_bn_layer(conv4, 96, 3, padding=1, act='relu', name="inception_a" + name + "_3x3_3")
        concat = fluid.layers.concat([conv1, conv2, conv3, conv4], axis=1)
        return concat

    def reductionA(self, data, name=None):
        pool1 = fluid.layers.pool2d(
            input=data, pool_size=3, pool_stride=2, pool_type='max')
        conv2 = self.conv_bn_layer(data, 384, 3, stride=2, act='relu', name="reduction_a_3x3")
        conv3 = self.conv_bn_layer(data, 192, 1, act='relu', name="reduction_a_3x3_2_reduce")
        conv3 = self.conv_bn_layer(conv3, 224, 3, padding=1, act='relu', name="reduction_a_3x3_2")
        conv3 = self.conv_bn_layer(conv3, 256, 3, stride=2, act='relu', name="reduction_a_3x3_3")
        concat = fluid.layers.concat([pool1, conv2, conv3], axis=1)
        return concat

    def inceptionB(self, data, name=None):
        pool1 = fluid.layers.pool2d(
            input=data, pool_size=3, pool_padding=1, pool_type='avg')
        conv1 = self.conv_bn_layer(pool1, 128, 1, act='relu', name="inception_b" + name + "_1x1")
        conv2 = self.conv_bn_layer(data, 384, 1, act='relu', name="inception_b" + name + "_1x1_2")
        conv3 = self.conv_bn_layer(data, 192, 1, act='relu', name="inception_b" + name + "_1x7_reduce")
        conv3 = self.conv_bn_layer(
            conv3, 224, (1, 7), padding=(0, 3), act='relu', name="inception_b" + name + "_1x7")
        conv3 = self.conv_bn_layer(
            conv3, 256, (7, 1), padding=(3, 0), act='relu', name="inception_b" + name + "_7x1")
        conv4 = self.conv_bn_layer(data, 192, 1, act='relu', name="inception_b" + name + "_7x1_2_reduce")
        conv4 = self.conv_bn_layer(
            conv4, 192, (1, 7), padding=(0, 3), act='relu', name="inception_b" + name + "_1x7_2")
        conv4 = self.conv_bn_layer(
            conv4, 224, (7, 1), padding=(3, 0), act='relu', name="inception_b" + name + "_7x1_2")
        conv4 = self.conv_bn_layer(
            conv4, 224, (1, 7), padding=(0, 3), act='relu', name="inception_b" + name + "_1x7_3")
        conv4 = self.conv_bn_layer(
            conv4, 256, (7, 1), padding=(3, 0), act='relu', name="inception_b" + name + "_7x1_3")
        concat = fluid.layers.concat([conv1, conv2, conv3, conv4], axis=1)
        return concat

    def reductionB(self, data, name=None):
        pool1 = fluid.layers.pool2d(
            input=data, pool_size=3, pool_stride=2, pool_type='max')
        conv2 = self.conv_bn_layer(data, 192, 1, act='relu', name="reduction_b_3x3_reduce")
        conv2 = self.conv_bn_layer(conv2, 192, 3, stride=2, act='relu', name="reduction_b_3x3")
        conv3 = self.conv_bn_layer(data, 256, 1, act='relu', name="reduction_b_1x7_reduce")
        conv3 = self.conv_bn_layer(
            conv3, 256, (1, 7), padding=(0, 3), act='relu', name="reduction_b_1x7")
        conv3 = self.conv_bn_layer(
            conv3, 320, (7, 1), padding=(3, 0), act='relu', name="reduction_b_7x1")
        conv3 = self.conv_bn_layer(conv3, 320, 3, stride=2, act='relu', name="reduction_b_3x3_2")
        concat = fluid.layers.concat([pool1, conv2, conv3], axis=1)
        return concat

    def inceptionC(self, data, name=None):
        pool1 = fluid.layers.pool2d(
            input=data, pool_size=3, pool_padding=1, pool_type='avg')
        conv1 = self.conv_bn_layer(pool1, 256, 1, act='relu', name="inception_c" + name + "_1x1")
        conv2 = self.conv_bn_layer(data, 256, 1, act='relu', name="inception_c" + name + "_1x1_2")
        conv3 = self.conv_bn_layer(data, 384, 1, act='relu', name="inception_c" + name + "_1x1_3")
        conv3_1 = self.conv_bn_layer(
            conv3, 256, (1, 3), padding=(0, 1), act='relu', name="inception_c" + name + "_1x3")
        conv3_2 = self.conv_bn_layer(
            conv3, 256, (3, 1), padding=(1, 0), act='relu', name="inception_c" + name + "_3x1")
        conv4 = self.conv_bn_layer(data, 384, 1, act='relu', name="inception_c" + name + "_1x1_4")
        conv4 = self.conv_bn_layer(
            conv4, 448, (1, 3), padding=(0, 1), act='relu', name="inception_c" + name + "_1x3_2")
        conv4 = self.conv_bn_layer(
            conv4, 512, (3, 1), padding=(1, 0), act='relu', name="inception_c" + name + "_3x1_2")
        conv4_1 = self.conv_bn_layer(
            conv4, 256, (1, 3), padding=(0, 1), act='relu', name="inception_c" + name + "_1x3_3")
        conv4_2 = self.conv_bn_layer(
            conv4, 256, (3, 1), padding=(1, 0), act='relu', name="inception_c" + name + "_3x1_3")
        concat = fluid.layers.concat(
            [conv1, conv2, conv3_1, conv3_2, conv4_1, conv4_2], axis=1)
        return concat
import paddle
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
import math
__all__ = ["ResNet", "ResNet50_vd","ResNet101_vd", "ResNet152_vd", "ResNet200_vd"]
train_parameters = {
    "input_size": [3, 224, 224],
    "input_mean": [0.485, 0.456, 0.406],
    "input_std": [0.229, 0.224, 0.225],
    "learning_strategy": {
        "name": "piecewise_decay",
        "batch_size": 256,
        "epochs": [30, 45, 55],
        "steps": [0.01, 0.001, 0.0001, 0.00001]
    }
}
class ResNet():
def __init__(self, layers=50, is_3x3 = False):
self.params = train_parameters
self.layers = layers
self.is_3x3 = is_3x3
def net(self, input, class_dim=1000):
is_3x3 = self.is_3x3
layers = self.layers
supported_layers = [50, 101, 152, 200]
assert layers in supported_layers, \
"supported layers are {} but input layer is {}".format(supported_layers, layers)
if layers == 50:
depth = [3, 4, 6, 3]
elif layers == 101:
depth = [3, 4, 23, 3]
elif layers == 152:
depth = [3, 8, 36, 3]
elif layers == 200:
depth = [3, 12, 48, 3]
num_filters = [64, 128, 256, 512]
if is_3x3 == False:
conv = self.conv_bn_layer(
input=input, num_filters=64, filter_size=7, stride=2, act='relu')
else:
conv = self.conv_bn_layer(
input=input, num_filters=32, filter_size=3, stride=2, act='relu', name='conv1_1')
conv = self.conv_bn_layer(
input=conv, num_filters=32, filter_size=3, stride=1, act='relu', name='conv1_2')
conv = self.conv_bn_layer(
input=conv, num_filters=64, filter_size=3, stride=1, act='relu', name='conv1_3')
conv = fluid.layers.pool2d(
input=conv,
pool_size=3,
pool_stride=2,
pool_padding=1,
pool_type='max')
for block in range(len(depth)):
for i in range(depth[block]):
if layers in [101, 152, 200] and block == 2:
if i == 0:
conv_name="res"+str(block+2)+"a"
else:
conv_name="res"+str(block+2)+"b"+str(i)
else:
conv_name="res"+str(block+2)+chr(97+i)
conv = self.bottleneck_block(
input=conv,
num_filters=num_filters[block],
stride=2 if i == 0 and block != 0 else 1,
if_first=block==0,
name=conv_name)
pool = fluid.layers.pool2d(
input=conv, pool_size=7, pool_type='avg', global_pooling=True)
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
out = fluid.layers.fc(input=pool,
size=class_dim,
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv,
stdv)))
softmaxout = fluid.layers.softmax(input=out)
return softmaxout
def conv_bn_layer(self,
input,
num_filters,
filter_size,
stride=1,
groups=1,
act=None,
name=None):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
            padding=(filter_size - 1) // 2,
groups=groups,
act=None,
param_attr=ParamAttr(name=name + "_weights"),
bias_attr=False)
if name == "conv1":
bn_name = "bn_" + name
else:
bn_name = "bn" + name[3:]
return fluid.layers.batch_norm(input=conv,
act=act,
param_attr=ParamAttr(name=bn_name + '_scale'),
bias_attr=ParamAttr(bn_name + '_offset'),
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance')
def conv_bn_layer_new(self,
input,
num_filters,
filter_size,
stride=1,
groups=1,
act=None,
name=None):
pool = fluid.layers.pool2d(input=input,
pool_size=2,
pool_stride=2,
pool_padding=0,
pool_type='avg')
conv = fluid.layers.conv2d(
input=pool,
num_filters=num_filters,
filter_size=filter_size,
stride=1,
            padding=(filter_size - 1) // 2,
groups=groups,
act=None,
param_attr=ParamAttr(name=name + "_weights"),
bias_attr=False)
if name == "conv1":
bn_name = "bn_" + name
else:
bn_name = "bn" + name[3:]
return fluid.layers.batch_norm(input=conv,
act=act,
param_attr=ParamAttr(name=bn_name + '_scale'),
bias_attr=ParamAttr(bn_name + '_offset'),
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance')
def shortcut(self, input, ch_out, stride, name, if_first=False):
ch_in = input.shape[1]
if ch_in != ch_out or stride != 1:
if if_first:
return self.conv_bn_layer(input, ch_out, 1, stride, name=name)
else:
return self.conv_bn_layer_new(input, ch_out, 1, stride, name=name)
else:
return input
def bottleneck_block(self, input, num_filters, stride, name, if_first):
conv0 = self.conv_bn_layer(
input=input, num_filters=num_filters, filter_size=1, act='relu', name=name+"_branch2a")
conv1 = self.conv_bn_layer(
input=conv0,
num_filters=num_filters,
filter_size=3,
stride=stride,
act='relu',
name=name+"_branch2b")
conv2 = self.conv_bn_layer(
input=conv1, num_filters=num_filters * 4, filter_size=1, act=None, name=name+"_branch2c")
short = self.shortcut(input, num_filters * 4, stride, if_first=if_first, name=name + "_branch1")
return fluid.layers.elementwise_add(x=short, y=conv2, act='relu')
def ResNet50_vd():
model = ResNet(layers=50, is_3x3 = True)
return model
def ResNet101_vd():
model = ResNet(layers=101, is_3x3 = True)
return model
def ResNet152_vd():
model = ResNet(layers=152, is_3x3 = True)
return model
def ResNet200_vd():
model = ResNet(layers=200, is_3x3 = True)
return model
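The naming loop in `net()` above follows the Caffe-style ResNet convention: letter suffixes for short stages (`res2a`, `res2b`, ...) and numeric suffixes for the long third stage of the deep variants (`res4a`, `res4b1`, `res4b2`, ...). A small pure-Python sketch reproducing that scheme:

```python
def block_names(depth, deep_stage3=True):
    """Reproduce the conv block naming used in the ResNet_vd loop above."""
    names = []
    for block in range(len(depth)):
        for i in range(depth[block]):
            if deep_stage3 and block == 2:
                # deep nets (101/152/200) use numeric suffixes in stage 3
                name = "res" + str(block + 2) + ("a" if i == 0 else "b" + str(i))
            else:
                # short stages use letters: a, b, c, ...
                name = "res" + str(block + 2) + chr(97 + i)
            names.append(name)
    return names

names = block_names([3, 8, 36, 3])  # ResNet152 depths
print(names[0], names[3], names[12], names[-1])  # res2a res3a res4b1 res5c
```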
import paddle
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
import math
__all__ = ["ResNet", "ResNet50_vd_embedding","ResNet101_vd_embedding", "ResNet152_vd_embedding", "ResNet200_vd_embedding"]
train_parameters = {
"input_size": [3, 224, 224],
"input_mean": [0.485, 0.456, 0.406],
"input_std": [0.229, 0.224, 0.225],
"learning_strategy": {
"name": "piecewise_decay",
"batch_size": 256,
"epochs": [30, 60, 90],
"steps": [0.1, 0.01, 0.001, 0.0001]
}
}
class ResNet():
def __init__(self, layers=50, is_3x3 = False):
self.params = train_parameters
self.layers = layers
self.is_3x3 = is_3x3
def net(self, input, embedding_size=256):
is_3x3 = self.is_3x3
layers = self.layers
supported_layers = [50, 101, 152, 200]
assert layers in supported_layers, \
"supported layers are {} but input layer is {}".format(supported_layers, layers)
if layers == 50:
depth = [3, 4, 6, 3]
elif layers == 101:
depth = [3, 4, 23, 3]
elif layers == 152:
depth = [3, 8, 36, 3]
elif layers == 200:
depth = [3, 12, 48, 3]
num_filters = [64, 128, 256, 512]
endpoints = {}
if is_3x3 == False:
conv = self.conv_bn_layer(
input=input, num_filters=64, filter_size=7, stride=2, act='relu')
else:
conv = self.conv_bn_layer(
input=input, num_filters=32, filter_size=3, stride=2, act='relu', name='conv1_1')
conv = self.conv_bn_layer(
input=conv, num_filters=32, filter_size=3, stride=1, act='relu', name='conv1_2')
conv = self.conv_bn_layer(
input=conv, num_filters=64, filter_size=3, stride=1, act='relu', name='conv1_3')
conv = fluid.layers.pool2d(
input=conv,
pool_size=3,
pool_stride=2,
pool_padding=1,
pool_type='max')
for block in range(len(depth)):
for i in range(depth[block]):
if layers in [101, 152, 200] and block == 2:
if i == 0:
conv_name="res"+str(block+2)+"a"
else:
conv_name="res"+str(block+2)+"b"+str(i)
else:
conv_name="res"+str(block+2)+chr(97+i)
conv = self.bottleneck_block(
input=conv,
num_filters=num_filters[block],
stride=2 if i == 0 and block != 0 else 1,
if_first=block==0,
name=conv_name)
pool = fluid.layers.pool2d(
input=conv, pool_size=14, pool_type='avg', global_pooling=True)
if embedding_size > 0:
embedding = fluid.layers.fc(input=pool, size=embedding_size)
endpoints['embedding'] = embedding
else:
endpoints['embedding'] = pool
return endpoints
def conv_bn_layer(self,
input,
num_filters,
filter_size,
stride=1,
groups=1,
act=None,
name=None):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
            padding=(filter_size - 1) // 2,
groups=groups,
act=None,
param_attr=ParamAttr(name=name + "_weights"),
bias_attr=False)
if name == "conv1":
bn_name = "bn_" + name
else:
bn_name = "bn" + name[3:]
return fluid.layers.batch_norm(input=conv,
act=act,
param_attr=ParamAttr(name=bn_name + '_scale'),
bias_attr=ParamAttr(bn_name + '_offset'),
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance')
def conv_bn_layer_new(self,
input,
num_filters,
filter_size,
stride=1,
groups=1,
act=None,
name=None):
pool = fluid.layers.pool2d(input=input,
pool_size=2,
pool_stride=2,
pool_padding=0,
pool_type='avg')
conv = fluid.layers.conv2d(
input=pool,
num_filters=num_filters,
filter_size=filter_size,
stride=1,
            padding=(filter_size - 1) // 2,
groups=groups,
act=None,
param_attr=ParamAttr(name=name + "_weights"),
bias_attr=False)
if name == "conv1":
bn_name = "bn_" + name
else:
bn_name = "bn" + name[3:]
return fluid.layers.batch_norm(input=conv,
act=act,
param_attr=ParamAttr(name=bn_name + '_scale'),
bias_attr=ParamAttr(bn_name + '_offset'),
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance')
def shortcut(self, input, ch_out, stride, name, if_first=False):
ch_in = input.shape[1]
if ch_in != ch_out or stride != 1:
if if_first:
return self.conv_bn_layer(input, ch_out, 1, stride, name=name)
else:
return self.conv_bn_layer_new(input, ch_out, 1, stride, name=name)
else:
return input
def bottleneck_block(self, input, num_filters, stride, name, if_first):
conv0 = self.conv_bn_layer(
input=input, num_filters=num_filters, filter_size=1, act='relu', name=name+"_branch2a")
conv1 = self.conv_bn_layer(
input=conv0,
num_filters=num_filters,
filter_size=3,
stride=stride,
act='relu',
name=name+"_branch2b")
conv2 = self.conv_bn_layer(
input=conv1, num_filters=num_filters * 4, filter_size=1, act=None, name=name+"_branch2c")
short = self.shortcut(input, num_filters * 4, stride, if_first=if_first, name=name + "_branch1")
return fluid.layers.elementwise_add(x=short, y=conv2, act='relu')
def ResNet50_vd_embedding():
model = ResNet(layers=50, is_3x3 = True)
return model
def ResNet101_vd_embedding():
model = ResNet(layers=101, is_3x3 = True)
return model
def ResNet152_vd_embedding():
model = ResNet(layers=152, is_3x3 = True)
return model
def ResNet200_vd_embedding():
model = ResNet(layers=200, is_3x3 = True)
return model
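When `is_3x3` is set, the single 7x7/stride-2 stem is replaced by three stacked 3x3 convolutions (the "deep stem" from the ResNet-vd family of tricks). Note the stacked stem is not lighter: a quick weight count (a sketch, bias-free as in `conv_bn_layer`) shows it carries roughly 3x the parameters of the 7x7 stem; the motivation is accuracy, not cost:

```python
def conv_params(c_in, c_out, k):
    """Weight count of a k x k conv (bias-free, as in conv_bn_layer)."""
    return c_in * c_out * k * k

# single 7x7 stem: 3 -> 64
stem_7x7 = conv_params(3, 64, 7)
# vd deep stem: 3 -> 32 -> 32 -> 64, all 3x3
stem_vd = conv_params(3, 32, 3) + conv_params(32, 32, 3) + conv_params(32, 64, 3)
print(stem_7x7, stem_vd)  # 9408 28512
```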
import paddle
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
import math
__all__ = ["ResNet", "ResNet50_vd_fc", "ResNet101_vd_fc", "ResNet152_vd_fc"]
train_parameters = {
"input_size": [3, 224, 224],
"input_mean": [0.485, 0.456, 0.406],
"input_std": [0.229, 0.224, 0.225],
"learning_strategy": {
"name": "piecewise_decay",
"batch_size": 256,
"epochs": [30, 60, 90],
"steps": [0.1, 0.01, 0.001, 0.0001]
}
}
class ResNet(object):
"""ResNet"""
def __init__(self, layers=50, is_3x3=False):
self.params = train_parameters
self.layers = layers
self.is_3x3 = is_3x3
def net(self, input, class_dim=1000):
"""net"""
is_3x3 = self.is_3x3
layers = self.layers
supported_layers = [50, 101, 152]
assert layers in supported_layers, \
"supported layers are {} but input layer is {}".format(supported_layers, layers)
if layers == 50:
depth = [3, 4, 6, 3]
elif layers == 101:
depth = [3, 4, 23, 3]
elif layers == 152:
depth = [3, 8, 36, 3]
num_filters = [64, 128, 256, 512]
if is_3x3 == False:
conv = self.conv_bn_layer(
input=input, num_filters=64, filter_size=7, stride=2, act='relu')
else:
conv = self.conv_bn_layer(
input=input, num_filters=32, filter_size=3, stride=2, act='relu', name='conv1_1')
conv = self.conv_bn_layer(
input=conv, num_filters=32, filter_size=3, stride=1, act='relu', name='conv1_2')
conv = self.conv_bn_layer(
input=conv, num_filters=64, filter_size=3, stride=1, act='relu', name='conv1_3')
conv = fluid.layers.pool2d(
input=conv,
pool_size=3,
pool_stride=2,
pool_padding=1,
pool_type='max')
for block in range(len(depth)):
for i in range(depth[block]):
if layers in [101, 152] and block == 2:
if i == 0:
conv_name="res" + str(block + 2) + "a"
else:
conv_name="res" + str(block + 2) + "b" + str(i)
else:
conv_name="res" + str(block + 2) + chr(97 + i)
conv = self.bottleneck_block(
input=conv,
num_filters=num_filters[block],
stride=2 if i == 0 and block != 0 else 1,
if_first=block == 0,
name=conv_name)
pool = fluid.layers.pool2d(
input=conv, pool_size=7, pool_type='avg', global_pooling=True)
fcresult = fluid.layers.fc(input=pool, size=256)
stdv = 1.0 / math.sqrt(fcresult.shape[1] * 1.0)
out = fluid.layers.fc(input=fcresult, size=class_dim,
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv)))
#return out
softmaxout = fluid.layers.softmax(input=out)
return softmaxout
def conv_bn_layer(self,
input,
num_filters,
filter_size,
stride=1,
groups=1,
act=None,
name=None):
"""conv_bn_layer"""
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
            padding=(filter_size - 1) // 2,
groups=groups,
act=None,
param_attr=ParamAttr(name=name + "_weights"),
bias_attr=False)
if name == "conv1":
bn_name = "bn_" + name
else:
bn_name = "bn" + name[3:]
return fluid.layers.batch_norm(input=conv,
act=act,
param_attr=ParamAttr(name=bn_name + '_scale'),
bias_attr=ParamAttr(bn_name + '_offset'),
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance')
def conv_bn_layer_new(self,
input,
num_filters,
filter_size,
stride=1,
groups=1,
act=None,
name=None):
"""conv_bn_layer_new"""
pool = fluid.layers.pool2d(input=input,
pool_size=2,
pool_stride=2,
pool_padding=0,
pool_type='avg')
conv = fluid.layers.conv2d(
input=pool,
num_filters=num_filters,
filter_size=filter_size,
stride=1,
            padding=(filter_size - 1) // 2,
groups=groups,
act=None,
param_attr=ParamAttr(name=name + "_weights"),
bias_attr=False)
if name == "conv1":
bn_name = "bn_" + name
else:
bn_name = "bn" + name[3:]
return fluid.layers.batch_norm(input=conv,
act=act,
param_attr=ParamAttr(name=bn_name + '_scale'),
bias_attr=ParamAttr(bn_name + '_offset'),
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance')
def shortcut(self, input, ch_out, stride, name, if_first=False):
"""shortcut"""
ch_in = input.shape[1]
if ch_in != ch_out or stride != 1:
if if_first:
return self.conv_bn_layer(input, ch_out, 1, stride, name=name)
else:
return self.conv_bn_layer_new(input, ch_out, 1, stride, name=name)
else:
return input
def bottleneck_block(self, input, num_filters, stride, name, if_first):
"""bottleneck_block"""
conv0 = self.conv_bn_layer(
input=input, num_filters=num_filters,
filter_size=1, act='relu', name=name + "_branch2a")
conv1 = self.conv_bn_layer(
input=conv0,
num_filters=num_filters,
filter_size=3,
stride=stride,
act='relu',
name=name + "_branch2b")
conv2 = self.conv_bn_layer(input=conv1, num_filters=num_filters * 4,
filter_size=1, act=None, name=name + "_branch2c")
short = self.shortcut(input, num_filters * 4, stride, if_first=if_first, name=name + "_branch1")
return fluid.layers.elementwise_add(x=short, y=conv2, act='relu')
def ResNet50_vd_fc():
"""ResNet50_vd"""
model = ResNet(layers=50, is_3x3 = True)
return model
def ResNet101_vd_fc():
"""ResNet101_vd"""
model = ResNet(layers=101, is_3x3 = True)
return model
def ResNet152_vd_fc():
"""ResNet152_vd"""
model = ResNet(layers=152, is_3x3 = True)
return model
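`train_parameters` pairs three `epochs` boundaries with four `steps` learning rates (one more rate than boundaries). A hedged sketch of how piecewise decay resolves the rate for a given epoch, assuming the common convention that a boundary epoch switches to the next rate:

```python
import bisect

def piecewise_lr(epoch, boundaries, rates):
    """Learning rate for `epoch` under piecewise decay.
    len(rates) must be len(boundaries) + 1."""
    assert len(rates) == len(boundaries) + 1
    return rates[bisect.bisect_right(boundaries, epoch)]

boundaries = [30, 60, 90]
rates = [0.1, 0.01, 0.001, 0.0001]
print([piecewise_lr(e, boundaries, rates) for e in (0, 30, 59, 60, 120)])
# [0.1, 0.01, 0.01, 0.001, 0.0001]
```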
import paddle
import paddle.fluid as fluid
import math
__all__ = ["ResNet_vd", "ResNet50_vd_v0_embedding","ResNet101_vd_v0_embedding", "ResNet152_vd_v0_embedding"]
train_parameters = {
"input_size": [3, 224, 224],
"input_mean": [0.485, 0.456, 0.406],
"input_std": [0.229, 0.224, 0.225],
"learning_strategy": {
"name": "piecewise_decay",
"batch_size": 256,
"epochs": [30, 60, 90],
"steps": [0.1, 0.01, 0.001, 0.0001]
}
}
class ResNet_vd():
def __init__(self, layers=50, is_3x3 = False):
self.params = train_parameters
self.layers = layers
self.is_3x3 = is_3x3
def net(self, input, embedding_size=256):
is_3x3 = self.is_3x3
layers = self.layers
supported_layers = [50, 101, 152]
assert layers in supported_layers, \
"supported layers are {} but input layer is {}".format(supported_layers, layers)
if layers == 50:
depth = [3, 4, 6, 3]
elif layers == 101:
depth = [3, 4, 23, 3]
elif layers == 152:
depth = [3, 8, 36, 3]
num_filters = [64, 128, 256, 512]
endpoints = {}
if is_3x3 == False:
conv = self.conv_bn_layer(
input=input, num_filters=64, filter_size=7, stride=2, act='relu')
else:
conv = self.conv_bn_layer(
input=input, num_filters=32, filter_size=3, stride=2, act='relu')
conv = self.conv_bn_layer(
input=conv, num_filters=32, filter_size=3, stride=1, act='relu')
conv = self.conv_bn_layer(
input=conv, num_filters=64, filter_size=3, stride=1, act='relu')
conv = fluid.layers.pool2d(
input=conv,
pool_size=3,
pool_stride=2,
pool_padding=1,
pool_type='max')
for block in range(len(depth)):
for i in range(depth[block]):
conv = self.bottleneck_block(
input=conv,
num_filters=num_filters[block],
stride=2 if i == 0 and block != 0 else 1,
if_first=block==0)
pool = fluid.layers.pool2d(
input=conv, pool_size=14, pool_type='avg', global_pooling=True)
if embedding_size > 0:
embedding = fluid.layers.fc(input=pool, size=embedding_size)
endpoints['embedding'] = embedding
else:
endpoints['embedding'] = pool
return endpoints
def conv_bn_layer(self,
input,
num_filters,
filter_size,
stride=1,
groups=1,
act=None):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
            padding=(filter_size - 1) // 2,
groups=groups,
act=None,
bias_attr=False)
return fluid.layers.batch_norm(input=conv, act=act)
def conv_bn_layer_new(self,
input,
num_filters,
filter_size,
stride=1,
groups=1,
act=None):
pool = fluid.layers.pool2d(input=input,
pool_size=2,
pool_stride=2,
pool_padding=0,
pool_type='avg')
conv = fluid.layers.conv2d(
input=pool,
num_filters=num_filters,
filter_size=filter_size,
stride=1,
            padding=(filter_size - 1) // 2,
groups=groups,
act=None,
bias_attr=False)
return fluid.layers.batch_norm(input=conv, act=act)
def shortcut(self, input, ch_out, stride, if_first=False):
ch_in = input.shape[1]
if ch_in != ch_out or stride != 1:
if if_first:
return self.conv_bn_layer(input, ch_out, 1, stride)
else:
return self.conv_bn_layer_new(input, ch_out, 1, stride)
else:
return input
def bottleneck_block(self, input, num_filters, stride, if_first):
conv0 = self.conv_bn_layer(
input=input, num_filters=num_filters, filter_size=1, act='relu')
conv1 = self.conv_bn_layer(
input=conv0,
num_filters=num_filters,
filter_size=3,
stride=stride,
act='relu')
conv2 = self.conv_bn_layer(
input=conv1, num_filters=num_filters * 4, filter_size=1, act=None)
short = self.shortcut(input, num_filters * 4, stride, if_first=if_first)
return fluid.layers.elementwise_add(x=short, y=conv2, act='relu')
def ResNet50_vd_v0_embedding():
model = ResNet_vd(layers=50, is_3x3 = True)
return model
def ResNet101_vd_v0_embedding():
model = ResNet_vd(layers=101, is_3x3 = True)
return model
def ResNet152_vd_v0_embedding():
model = ResNet_vd(layers=152, is_3x3 = True)
return model
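Every conv in these models uses `padding = (filter_size - 1) / 2`, i.e. "same" padding for odd kernels at stride 1 (note this relies on Python 2 integer division; under Python 3 it should be `//` to avoid a float padding). The standard output-size formula confirms the spatial sizes:

```python
def conv_out_size(in_size, filter_size, stride, padding):
    """Standard conv output-size formula (floor division)."""
    return (in_size + 2 * padding - filter_size) // stride + 1

# odd kernels with (k - 1) // 2 padding preserve size at stride 1
for k in (1, 3, 7):
    pad = (k - 1) // 2
    assert conv_out_size(56, k, 1, pad) == 56
# the stride-2 stem halves 224 -> 112
print(conv_out_size(224, 3, 2, 1))  # 112
```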
import paddle
import paddle.fluid as fluid
import math
from paddle.fluid.param_attr import ParamAttr
__all__ = [
"SE_ResNeXt", "SE_ResNeXt50_32x4d_vd_embedding", "SE_ResNeXt101_32x4d_vd_embedding",
"SE_ResNeXt152_64x4d_vd_embedding"
]
train_parameters = {
"input_size": [3, 224, 224],
"input_mean": [0.485, 0.456, 0.406],
"input_std": [0.229, 0.224, 0.225],
"learning_strategy": {
"name": "piecewise_decay",
"batch_size": 256,
"epochs": [10, 16, 20],
"steps": [0.01, 0.001, 0.0001, 0.00001]
}
}
class SE_ResNeXt():
def __init__(self, layers=50):
self.params = train_parameters
self.layers = layers
def net(self, input, embedding_size=256):
layers = self.layers
supported_layers = [50, 101, 152]
assert layers in supported_layers, \
"supported layers are {} but input layer is {}".format(supported_layers, layers)
endpoints = {}
if layers == 50:
cardinality = 32
reduction_ratio = 16
depth = [3, 4, 6, 3]
num_filters = [128, 256, 512, 1024]
conv = self.conv_bn_layer(
input=input, num_filters=64, filter_size=3, stride=2, act='relu', name='conv1_1')
conv = self.conv_bn_layer(
input=conv, num_filters=64, filter_size=3, stride=1, act='relu', name='conv1_2')
conv = self.conv_bn_layer(
input=conv, num_filters=128, filter_size=3, stride=1, act='relu', name='conv1_3')
conv = fluid.layers.pool2d(
input=conv,
pool_size=3,
pool_stride=2,
pool_padding=1,
pool_type='max')
elif layers == 101:
cardinality = 32
reduction_ratio = 16
depth = [3, 4, 23, 3]
num_filters = [128, 256, 512, 1024]
conv = self.conv_bn_layer(
input=input, num_filters=64, filter_size=3, stride=2, act='relu', name='conv1_1')
conv = self.conv_bn_layer(
input=conv, num_filters=64, filter_size=3, stride=1, act='relu', name='conv1_2')
conv = self.conv_bn_layer(
input=conv, num_filters=128, filter_size=3, stride=1, act='relu', name='conv1_3')
conv = fluid.layers.pool2d(
input=conv,
pool_size=3,
pool_stride=2,
pool_padding=1,
pool_type='max')
elif layers == 152:
cardinality = 64
reduction_ratio = 16
depth = [3, 8, 36, 3]
num_filters = [256, 512, 1024, 2048]
conv = self.conv_bn_layer(
input=input,
num_filters=64,
filter_size=3,
stride=2,
act='relu',
name='conv1_1')
conv = self.conv_bn_layer(
input=conv, num_filters=64, filter_size=3, stride=1, act='relu',name='conv1_2')
conv = self.conv_bn_layer(
input=conv,
num_filters=128,
filter_size=3,
stride=1,
act='relu',
name='conv1_3')
conv = fluid.layers.pool2d(
input=conv, pool_size=3, pool_stride=2, pool_padding=1, \
pool_type='max')
n = 1 if layers == 50 or layers == 101 else 3
for block in range(len(depth)):
n += 1
for i in range(depth[block]):
conv = self.bottleneck_block(
input=conv,
num_filters=num_filters[block],
stride=2 if i == 0 and block != 0 else 1,
cardinality=cardinality,
reduction_ratio=reduction_ratio,
if_first=block==0,
name=str(n)+'_'+str(i+1))
pool = fluid.layers.pool2d(
input=conv, pool_size=14, pool_type='avg', global_pooling=True)
if embedding_size > 0:
embedding = fluid.layers.fc(input=pool, size=embedding_size)
endpoints['embedding'] = embedding
else:
endpoints['embedding'] = pool
return endpoints
def shortcut(self, input, ch_out, stride, name, if_first=False):
ch_in = input.shape[1]
if ch_in != ch_out or stride != 1:
filter_size = 1
if if_first:
return self.conv_bn_layer(input, ch_out, filter_size, stride, name='conv'+name+'_prj')
else:
return self.conv_bn_layer_new(input, ch_out, filter_size, stride, name='conv'+name+'_prj')
else:
return input
def bottleneck_block(self, input, num_filters, stride, cardinality,
reduction_ratio,if_first, name=None):
conv0 = self.conv_bn_layer(
input=input, num_filters=num_filters, filter_size=1, act='relu',name='conv'+name+'_x1')
conv1 = self.conv_bn_layer(
input=conv0,
num_filters=num_filters,
filter_size=3,
stride=stride,
groups=cardinality,
act='relu',
name='conv'+name+'_x2')
if cardinality == 64:
            num_filters = num_filters // 2
conv2 = self.conv_bn_layer(
input=conv1, num_filters=num_filters * 2, filter_size=1, act=None, name='conv'+name+'_x3')
scale = self.squeeze_excitation(
input=conv2,
num_channels=num_filters * 2,
reduction_ratio=reduction_ratio,
name='fc'+name)
short = self.shortcut(input, num_filters * 2, stride, if_first=if_first, name=name)
return fluid.layers.elementwise_add(x=short, y=scale, act='relu')
def conv_bn_layer(self,
input,
num_filters,
filter_size,
stride=1,
groups=1,
act=None,
name=None):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
            padding=(filter_size - 1) // 2,
groups=groups,
act=None,
bias_attr=False,
param_attr=ParamAttr(name=name + '_weights'),
)
bn_name = name + "_bn"
return fluid.layers.batch_norm(input=conv, act=act,
param_attr=ParamAttr(name=bn_name + '_scale'),
bias_attr=ParamAttr(bn_name + '_offset'),
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance')
def conv_bn_layer_new(self,
input,
num_filters,
filter_size,
stride=1,
groups=1,
act=None,
name=None):
pool = fluid.layers.pool2d(input=input,
pool_size=2,
pool_stride=2,
pool_padding=0,
pool_type='avg')
conv = fluid.layers.conv2d(
input=pool,
num_filters=num_filters,
filter_size=filter_size,
stride=1,
            padding=(filter_size - 1) // 2,
groups=groups,
act=None,
param_attr=ParamAttr(name=name + "_weights"),
bias_attr=False)
bn_name = name + "_bn"
return fluid.layers.batch_norm(input=conv,
act=act,
param_attr=ParamAttr(name=bn_name + '_scale'),
bias_attr=ParamAttr(bn_name + '_offset'),
moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance')
def squeeze_excitation(self, input, num_channels, reduction_ratio, name=None):
pool = fluid.layers.pool2d(
input=input, pool_size=0, pool_type='avg', global_pooling=True)
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
squeeze = fluid.layers.fc(input=pool,
                                  size=num_channels // reduction_ratio,
act='relu',
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(
-stdv, stdv),name=name+'_sqz_weights'),
bias_attr=ParamAttr(name=name+'_sqz_offset'))
stdv = 1.0 / math.sqrt(squeeze.shape[1] * 1.0)
excitation = fluid.layers.fc(input=squeeze,
size=num_channels,
act='sigmoid',
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(
-stdv, stdv),name=name+'_exc_weights'),
bias_attr=ParamAttr(name=name+'_exc_offset'))
scale = fluid.layers.elementwise_mul(x=input, y=excitation, axis=0)
return scale
def SE_ResNeXt50_32x4d_vd_embedding():
model = SE_ResNeXt(layers=50)
return model
def SE_ResNeXt101_32x4d_vd_embedding():
model = SE_ResNeXt(layers=101)
return model
def SE_ResNeXt152_64x4d_vd_embedding():
model = SE_ResNeXt(layers=152)
return model
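`squeeze_excitation` gates each channel by a global-average "squeeze" followed by a bottleneck FC -> ReLU -> FC -> sigmoid "excitation", and `elementwise_mul(..., axis=0)` scales the feature map per channel. A pure-Python sketch of that gate acting on per-channel means (the toy weight matrices `w1`/`w2` are made up for illustration):

```python
import math

def se_gate(channel_means, reduction_ratio, w1, w2):
    """Squeeze-and-excitation gate on per-channel averages (pure-Python
    sketch of squeeze_excitation above; w1/w2 are toy weight matrices)."""
    c = len(channel_means)
    r = c // reduction_ratio
    # squeeze FC + ReLU: c -> c // reduction_ratio
    hidden = [max(0.0, sum(w1[j][i] * channel_means[i] for i in range(c)))
              for j in range(r)]
    # excitation FC + sigmoid: back to c gates in (0, 1)
    return [1.0 / (1.0 + math.exp(-sum(w2[j][k] * hidden[k] for k in range(r))))
            for j in range(c)]

# 4 channels, reduction ratio 2, hand-picked toy weights
w1 = [[1, 0, 0, 0], [0, 1, 0, 0]]
w2 = [[1, 0], [0, 1], [1, 1], [0, 0]]
gates = se_gate([2.0, -1.0, 0.5, 3.0], 2, w1, w2)
print([round(g, 3) for g in gates])  # [0.881, 0.5, 0.881, 0.5]
```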
export CUDA_VISIBLE_DEVICES=0
export FLAGS_fraction_of_gpu_memory_to_use=0.8
export LD_LIBRARY_PATH=./so:$LD_LIBRARY_PATH
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import time
import os
import subprocess
import distutils.util
import numpy as np
from paddle.fluid import core
def print_arguments(args):
"""Print argparse's arguments.
Usage:
.. code-block:: python
parser = argparse.ArgumentParser()
        parser.add_argument("name", default="John", type=str, help="User name.")
args = parser.parse_args()
print_arguments(args)
:param args: Input argparse.Namespace for printing.
:type args: argparse.Namespace
"""
print("----------- Configuration Arguments -----------")
    for arg, value in sorted(vars(args).items()):
print("%s: %s" % (arg, value))
print("------------------------------------------------")
def add_arguments(argname, type, default, help, argparser, **kwargs):
"""Add argparse's argument.
Usage:
.. code-block:: python
parser = argparse.ArgumentParser()
        add_argument("name", str, "John", "User name.", parser)
args = parser.parse_args()
"""
type = distutils.util.strtobool if type == bool else type
argparser.add_argument(
"--" + argname,
default=default,
type=type,
help=help + ' Default: %(default)s.',
**kwargs)
def fmt_time():
""" get formatted time for now
"""
now_str = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time()))
return now_str
def get_gpunum():
""" get number of gpu devices
"""
visibledevice = os.getenv('CUDA_VISIBLE_DEVICES')
if visibledevice:
devicenum = len(visibledevice.split(','))
else:
        devicenum = subprocess.check_output(['nvidia-smi', '-L']).decode().count('\n')
return devicenum
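`add_arguments` wraps argparse so boolean flags are coerced through `strtobool`, letting callers pass `--use_gpu False` on the command line. A self-contained usage sketch (with a local `strtobool` stand-in, since `distutils` was removed in Python 3.12):

```python
import argparse

def strtobool(v):
    # local stand-in for distutils.util.strtobool (removed in Python 3.12)
    return {"true": 1, "yes": 1, "1": 1, "false": 0, "no": 0, "0": 0}[v.lower()]

def add_arguments(argname, type, default, help, argparser, **kwargs):
    """Mirrors the helper above: bool arguments are coerced via strtobool."""
    type = strtobool if type == bool else type
    argparser.add_argument("--" + argname, default=default, type=type,
                           help=help + ' Default: %(default)s.', **kwargs)

parser = argparse.ArgumentParser()
add_arguments("batch_size", int, 256, "Minibatch size.", parser)
add_arguments("use_gpu", bool, True, "Whether to use GPU.", parser)
args = parser.parse_args(["--batch_size", "128", "--use_gpu", "False"])
print(args.batch_size, args.use_gpu)  # 128 0
```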
.PHONY:clean
LDFLAGS = -fPIC -shared -Wl,--rpath=\$ORIGIN -Wl,--rpath=\$ORIGIN/../so -Wl,-z,origin
LIBS = ./so/libpaddle_fluid.so ./so/libmkldnn.so.0 ./so/libmklml_intel.so ./so/libiomp5.so -L $(PYTHONHOME)/lib -lpython2.7
CFLAGS= -fPIC -I pybind11/include -I ./fluid_inference/third_party/install/glog/include -I./fluid_inference/third_party/install/gflags/include -I ./fluid_inference/paddle/include/ -I. -I./fluid_inference/third_party/boost/ -I$(PYTHONHOME)/include/python2.7 -std=c++11
OBJS = predictor.o\
py_cnnpredict.o\
conf_parser.o
PyCNNPredict.so: $(OBJS)
g++ $(CFLAGS) $(LDFLAGS) -o $@ $^ $(LIBS)
%.o : %.cpp
g++ $(CFLAGS) -c $^ -o $@
clean:
rm -rf $(OBJS) PyCNNPredict.so
# Accelerated Infer Project
This project accelerates CNN inference; the C++ library below must be compiled before use.
## Environment
Python 2.7, NumPy
## Compile the C++ Project
First open build.sh and set the PYTHONHOME environment variable:
```
export PYTHONHOME=/your/python/home
sh build.sh
```
After the build an `so` folder appears; it contains the C++ libraries used to speed up prediction.
Copy it to ../inference to run model inference:
```
mv so ../inference
```
set -x
#http://www.paddlepaddle.org/documentation/docs/en/1.4/advanced_usage/deploy/inference/build_and_install_lib_en.html
alias wget='wget --no-check-certificate '
alias git="/usr/bin/git"
#set python home
export PYTHONHOME=~/.jumbo
wget https://paddle-inference-lib.bj.bcebos.com/1.4.1-gpu-cuda9-cudnn7-avx-mkl/fluid_inference.tgz
tar -xzf fluid_inference.tgz
mkdir so
cp `find fluid_inference -name '*.so*'` so/
git clone https://github.com/pybind/pybind11.git
cd pybind11 && git checkout v2.2.4 && cd -
make
mv PyCNNPredict.so so
#pragma once
#include <cstddef>
#include <memory>
#include <string>
#include <vector>
enum class DataType : int {
INT8 = 0,
INT32 = 2,
INT64 = 3,
FLOAT32 = 4,
};
inline size_t get_type_size(DataType type) {
switch (type) {
case DataType::INT8:
return sizeof(int8_t);
case DataType::INT32:
return sizeof(int32_t);
case DataType::INT64:
return sizeof(int64_t);
case DataType::FLOAT32:
return sizeof(float);
default:
return 0;
}
}
struct DataBuf {
std::size_t size;
DataType type;
std::shared_ptr<char> data;
DataBuf() = default;
DataBuf(DataType dtype, size_t dsize) { alloc(dtype, dsize); }
DataBuf(const void *ddata, DataType dtype, size_t dsize) {
alloc(dtype, dsize);
copy(ddata, dsize);
}
DataBuf(const DataBuf &dbuf)
: size(dbuf.size), type(dbuf.type), data(dbuf.data) {}
DataBuf &operator=(const DataBuf &dbuf) {
size = dbuf.size;
type = dbuf.type;
data = dbuf.data;
return *this;
}
void reset(const void *ddata, size_t dsize) {
clear();
alloc(type, dsize);
copy(ddata, dsize);
}
void clear() {
size = 0;
data.reset();
}
~DataBuf() { clear(); }
private:
void alloc(DataType dtype, size_t dsize) {
type = dtype;
size = dsize;
data.reset(new char[dsize * get_type_size(dtype)],
std::default_delete<char[]>());
}
void copy(const void *ddata, size_t dsize) {
const char *temp = reinterpret_cast<const char *>(ddata);
std::copy(temp, temp + dsize * get_type_size(type), data.get());
}
};
struct Tensor {
std::string name;
std::vector<int> shape;
std::vector<std::vector<size_t>> lod;
DataBuf data;
};
class ICNNPredict {
public:
ICNNPredict() {}
virtual ~ICNNPredict() {}
virtual ICNNPredict *clone() = 0;
virtual bool predict(const std::vector<Tensor> &inputs,
const std::vector<std::string> &layers,
std::vector<Tensor> &outputs) = 0;
virtual bool predict(const std::vector<std::vector<float>> &input_datas,
const std::vector<std::vector<int>> &input_shapes,
const std::vector<std::string> &layers,
std::vector<std::vector<float>> &output_datas,
std::vector<std::vector<int>> &output_shapes) = 0;
virtual void destroy(std::vector<Tensor> &tensors) {
std::vector<Tensor>().swap(tensors);
}
virtual void destroy(std::vector<std::vector<float>> &datas) {
std::vector<std::vector<float>>().swap(datas);
}
virtual void destroy(std::vector<std::vector<int>> &shapes) {
std::vector<std::vector<int>>().swap(shapes);
}
};
ICNNPredict *create_cnnpredict(const std::string &conf_file,
const std::string &prefix);
#pragma once
#include <unistd.h>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
static inline bool file_exist(const std::string &file_name) {
    return access(file_name.c_str(), 0) != -1;
}
template <class T>
static inline bool str2num(const std::string &str, T &num) {
std::istringstream istr(str);
istr >> num;
return !istr.fail();
};
template <class T>
static bool strs2nums(const std::vector<std::string> &strs,
std::vector<T> &nums) {
nums.resize(strs.size());
for (size_t i = 0; i < strs.size(); i++) {
if (!str2num(strs[i], nums[i])) {
nums.clear();
return false;
}
}
return true;
};
template <class T>
static inline std::string num2str(T a) {
std::stringstream istr;
istr << a;
return istr.str();
}
#include <algorithm>
#include <fstream>
#include <iostream>
#include <sstream>
#include "logger.h"
#include "common.h"
#include "conf_parser.h"
std::string join_string(const std::string &prefix, const std::string &str) {
if (prefix.empty() && str.empty()) {
return "";
} else if (prefix.empty()) {
return str;
} else if (str.empty()) {
return prefix;
}
return prefix + str;
}
bool read_text_file(const std::string &file_name, std::string &str) {
LOG(INFO) << "read_text_file!";
if (!file_exist(file_name)) {
    LOG(FATAL) << "file: " << file_name << " does not exist!";
return false;
}
std::ifstream ifs(file_name.c_str(), std::ios::binary);
if (!ifs) {
LOG(FATAL) << "fail to open " << file_name;
return false;
}
std::stringstream ss;
ss << ifs.rdbuf();
str = ss.str();
return true;
}
std::vector<std::string> split_str(const std::string &str,
const std::string &sep,
bool suppress_blanks) {
std::vector<std::string> array;
size_t position = 0;
size_t last_position = 0;
last_position = position = 0;
while (position + sep.size() <= str.size()) {
if (str[position] == sep[0] && str.substr(position, sep.size()) == sep) {
if (!suppress_blanks || position - last_position > 0) {
array.push_back(str.substr(last_position, position - last_position));
}
last_position = position = position + sep.size();
} else {
position++;
}
}
  if (!suppress_blanks || last_position != str.size()) {
array.push_back(str.substr(last_position));
}
return array;
}
// remove ALL whitespace (leading, trailing and interior): conf lines such as
// "key: value" later split into "key" / "value" with no stray spaces
void strip(std::string &s) {
  s.erase(std::remove_if(s.begin(),
                         s.end(),
                         [](char c) {
                           return c == ' ' || c == '\t' || c == '\n' ||
                                  c == '\r';
                         }),
          s.end());
}
bool ConfParserBase::load(const std::string &file_name) {
std::string str;
if (!read_text_file(file_name, str)) {
LOG(FATAL) << "fail to read " << file_name;
return false;
}
  return load_from_string(str);
}
}
bool ConfParserBase::load_from_string(const std::string &str) {
map_clear();
std::vector<std::string> lines = split_str(str, "\n", true);
int count = 0;
for (size_t i = 0; i < lines.size(); i++) {
if (parse_line(lines[i])) {
count++;
}
}
return (count > 0);
}
bool ConfParserBase::get_conf_float(const std::string &key,
float &value) const {
MapIter it = _map.find(key);
if (it == _map.end()) {
return false;
}
float temp = 0;
if (!str2num(it->second, temp)) {
    LOG(WARNING) << "fail to convert " << it->second << " to float";
return false;
}
value = temp;
return true;
}
bool ConfParserBase::get_conf_uint(const std::string &key,
unsigned int &value) const {
MapIter it = _map.find(key);
if (it == _map.end()) {
LOG(WARNING) << "fail to get: " << key;
return false;
}
unsigned int temp = 0;
if (!str2num(it->second, temp)) {
    LOG(ERROR) << "fail to convert " << it->second << " to unsigned int";
return false;
}
value = temp;
return true;
}
bool ConfParserBase::get_conf_int(const std::string &key, int &value) const {
MapIter it = _map.find(key);
if (it == _map.end()) {
return false;
}
int temp = 0;
if (!str2num(it->second, temp)) {
    LOG(ERROR) << "fail to convert " << it->second << " to int";
return false;
}
value = temp;
return true;
}
bool ConfParserBase::get_conf_str(const std::string &key,
std::string &value) const {
MapIter it = _map.find(key);
if (it == _map.end()) {
LOG(WARNING) << "fail to get: " << key;
return false;
} else {
value = it->second;
}
return true;
}
bool ConfParserBase::exist(const char *name) const {
return _map.find(name) != _map.end();
}
void ConfParserBase::map_clear() { _map.clear(); }
bool ConfParserBase::parse_line(const std::string &line) {
std::string strip_line = line;
strip(strip_line);
if (strip_line.empty() || strip_line[0] == '#' || strip_line[0] == ';') {
return false;
}
std::basic_string<char>::size_type index_pos = strip_line.find(':');
if (index_pos == std::string::npos) {
LOG(ERROR) << "wrong setting format of line: " << line;
return false;
}
std::string key = strip_line.substr(0, index_pos);
std::string value =
strip_line.substr(index_pos + 1, strip_line.size() - index_pos - 1);
if (!_map.insert(std::pair<std::string, std::string>(key, value)).second) {
    LOG(WARNING) << "value already exists for key: " << key;
return false;
}
return true;
}
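`parse_line` therefore accepts one `key:value` pair per line, skips lines beginning with `#` or `;`, and rejects duplicate keys (all whitespace having been stripped beforehand). A minimal illustrative fragment, using keys that `Predictor::init` reads elsewhere in this file:

```text
# comment line, skipped
; also skipped
device: 0
model_path: ./model
```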
ConfParser::~ConfParser() {
if (NULL != _conf) {
delete _conf;
_conf = NULL;
}
}
bool ConfParser::init(const std::string &conf_file) {
_conf = new ConfParserBase();
if (!_conf->load(conf_file)) {
    LOG(FATAL) << "fail to load conf file: " << conf_file;
return false;
}
return true;
}
bool ConfParser::get_uint(const std::string &prefix,
const std::string &key,
unsigned int &value) const {
std::string pre_key = join_string(prefix, key);
if (!_conf->get_conf_uint(pre_key, value)) {
return false;
}
return true;
}
bool ConfParser::get_uints(const std::string &prefix,
const std::string &key,
std::vector<unsigned int> &values) const {
  std::vector<std::string> str_values;
  if (!get_strings(prefix, key, str_values)) {
    return false;
  }
  return strs2nums(str_values, values);
}
bool ConfParser::get_int(const std::string &prefix,
const std::string &key,
int &value) const {
std::string pre_key = join_string(prefix, key);
if (!_conf->get_conf_int(pre_key, value)) {
return false;
}
return true;
}
bool ConfParser::get_ints(const std::string &prefix,
const std::string &key,
std::vector<int> &values) const {
  std::vector<std::string> str_values;
  if (!get_strings(prefix, key, str_values)) {
    return false;
  }
  return strs2nums(str_values, values);
}
bool ConfParser::get_float(const std::string &prefix,
const std::string &key,
float &value) const {
std::string pre_key = join_string(prefix, key);
if (!_conf->get_conf_float(pre_key, value)) {
return false;
}
return true;
}
bool ConfParser::get_floats(const std::string &prefix,
const std::string &key,
std::vector<float> &values) const {
  std::vector<std::string> str_values;
  if (!get_strings(prefix, key, str_values)) {
    return false;
  }
  return strs2nums(str_values, values);
}
bool ConfParser::get_string(const std::string &prefix,
const std::string &key,
std::string &value) const {
std::string pre_key = join_string(prefix, key);
if (!_conf->get_conf_str(pre_key, value)) {
return false;
}
return true;
}
bool ConfParser::get_strings(const std::string &prefix,
const std::string &key,
std::vector<std::string> &values) const {
std::string pre_key = join_string(prefix, key);
std::string value;
if (!_conf->get_conf_str(pre_key, value)) {
return false;
}
std::vector<std::string> split_value = split_str(value, ",", true);
values.swap(split_value);
return true;
}
#pragma once
#include <map>
#include <string>
#include <vector>
typedef std::map<std::string, std::string> Map;
typedef Map::const_iterator MapIter;
class ConfParserBase {
public:
ConfParserBase() {}
bool load(const std::string &file_name);
bool load_from_string(const std::string &str);
bool get_conf_float(const std::string &key, float &value) const;
bool get_conf_uint(const std::string &key, unsigned int &value) const;
bool get_conf_int(const std::string &key, int &value) const;
bool get_conf_str(const std::string &key, std::string &value) const;
bool exist(const char *name) const;
void map_clear();
private:
bool parse_line(const std::string &line);
Map _map;
};
class ConfParser {
public:
  ConfParser() : _conf(NULL) {}
~ConfParser();
bool init(const std::string &conf_file);
bool get_uint(const std::string &prefix,
const std::string &key,
unsigned int &value) const;
bool get_uints(const std::string &prefix,
const std::string &key,
std::vector<unsigned int> &values) const;
bool get_int(const std::string &prefix,
const std::string &key,
int &value) const;
bool get_ints(const std::string &prefix,
const std::string &key,
std::vector<int> &values) const;
bool get_float(const std::string &prefix,
const std::string &key,
float &value) const;
bool get_floats(const std::string &prefix,
const std::string &key,
std::vector<float> &values) const;
bool get_string(const std::string &prefix,
const std::string &key,
std::string &value) const;
bool get_strings(const std::string &prefix,
const std::string &key,
std::vector<std::string> &values) const;
public:
ConfParserBase *_conf;
};
#pragma once
#include <cstdlib>
#include <iostream>
#include <sstream>
// compatible with glog
enum {
INFO = 0,
WARNING = 1,
ERROR = 2,
FATAL = 3,
};
struct NullStream : std::ostream {
NullStream() : std::ios(0), std::ostream(0) {}
};
class Logger {
public:
Logger(const char *filename, int lineno, int loglevel) {
static const char *log_levels[] = {"INFO ", "WARN ", "ERROR", "FATAL"};
static NullStream nullstream;
_loglevel = loglevel;
_logstream = (_loglevel >= getloglevel()) ? &std::cerr : &nullstream;
(*_logstream) << log_levels[_loglevel] << ":" << filename << "[" << lineno
<< "]";
}
static inline int &getloglevel() {
// default initialized with glog env
static int globallevel = getgloglevel();
return globallevel;
}
static inline void setloglevel(int loglevel) { getloglevel() = loglevel; }
static int getgloglevel() {
char *env = getenv("GLOG_minloglevel");
int level = WARNING;
if (env != NULL) {
int num = 0;
std::istringstream istr(env);
istr >> num;
if (!istr.fail()) {
level = num;
}
}
return level;
}
~Logger() { *_logstream << std::endl; }
std::ostream &getstream() { return *_logstream; }
protected:
int _loglevel;
std::ostream *_logstream;
};
#define LOG(loglevel) Logger(__FILE__, __LINE__, loglevel).getstream()
#include <algorithm>
#include <functional>
#include <memory>
#include <numeric>
#include "logger.h"
#include "conf_parser.h"
#include "predictor.h"
Predictor::~Predictor() {}
bool feed(paddle::PaddlePredictor *predictor,
const std::vector<Tensor> &tensors) {
LOG(INFO) << "Predictor::feed";
auto names = predictor->GetInputNames();
if (names.size() != tensors.size()) {
LOG(WARNING) << "The given size " << tensors.size()
<< " is not equal to the required size " << names.size();
return false;
}
for (size_t i = 0; i < names.size(); ++i) {
auto i_t = predictor->GetInputTensor(names[i]);
i_t->Reshape(tensors[i].shape);
i_t->SetLoD(tensors[i].lod);
if (tensors[i].data.type == DataType::FLOAT32) {
const float *temp =
reinterpret_cast<const float *>(tensors[i].data.data.get());
i_t->copy_from_cpu(temp);
} else if (tensors[i].data.type == DataType::INT32) {
const int32_t *temp =
reinterpret_cast<const int32_t *>(tensors[i].data.data.get());
i_t->copy_from_cpu(temp);
} else if (tensors[i].data.type == DataType::INT64) {
const int64_t *temp =
reinterpret_cast<const int64_t *>(tensors[i].data.data.get());
i_t->copy_from_cpu(temp);
} else {
LOG(ERROR) << "do not support current datatype";
return false;
}
}
return true;
}
bool fetch(paddle::PaddlePredictor *predictor, std::vector<Tensor> &tensors) {
LOG(INFO) << "Predictor::fetch";
auto names = predictor->GetOutputNames();
for (auto &name : names) {
auto o_t = predictor->GetOutputTensor(name);
std::vector<int> s = o_t->shape();
Tensor out;
out.shape = s;
out.lod = o_t->lod();
int num = std::accumulate(s.begin(), s.end(), 1, std::multiplies<int>());
if (o_t->type() == paddle::PaddleDType::FLOAT32) {
out.data = DataBuf(DataType::FLOAT32, size_t(num));
float *p_data = reinterpret_cast<float *>(out.data.data.get());
o_t->copy_to_cpu(p_data);
} else if (o_t->type() == paddle::PaddleDType::INT32) {
out.data = DataBuf(DataType::INT32, size_t(num));
int32_t *p_data = reinterpret_cast<int32_t *>(out.data.data.get());
o_t->copy_to_cpu(p_data);
} else if (o_t->type() == paddle::PaddleDType::INT64) {
out.data = DataBuf(DataType::INT64, size_t(num));
int64_t *p_data = reinterpret_cast<int64_t *>(out.data.data.get());
o_t->copy_to_cpu(p_data);
} else {
      LOG(ERROR) << "do not support current datatype";
return false;
}
tensors.push_back(out);
}
return true;
}
bool Predictor::predict(const std::vector<Tensor> &inputs,
const std::vector<std::string> &layers,
std::vector<Tensor> &outputs) {
LOG(INFO) << "Predictor::predict";
(void)layers;
// 1. feed input
if (!feed(_predictor.get(), inputs)) {
return false;
}
// 2. execute inference
if (!_predictor->ZeroCopyRun()) {
LOG(WARNING) << "fail to execute predictor";
return false;
}
// 3. fetch output
if (!fetch(_predictor.get(), outputs)) {
return false;
}
return true;
}
bool check_shape(const std::vector<std::vector<float>> &datas,
const std::vector<std::vector<int>> &shapes) {
LOG(INFO) << "check_shape";
if (datas.size() != shapes.size()) {
LOG(ERROR) << "datas size: " << datas.size() << " != "
<< "shapes size(): " << shapes.size();
return false;
}
for (size_t i = 0; i < datas.size(); ++i) {
int count = 1;
for (auto num : shapes[i]) {
count *= num;
}
int data_size = static_cast<int>(datas[i].size());
if (count != data_size) {
LOG(ERROR) << "data[" << i << "] size " << data_size << " != "
<< "shape [" << i << "] size " << count;
return false;
}
}
return true;
}
bool feed(paddle::PaddlePredictor *predictor,
const std::vector<std::vector<float>> &datas,
const std::vector<std::vector<int>> &shapes) {
LOG(INFO) << "Predictor::feed";
// 1. check input shape
if (!check_shape(datas, shapes)) {
return false;
}
// 2. check given input and required input
auto names = predictor->GetInputNames();
if (names.size() != datas.size()) {
LOG(WARNING) << "The given size " << datas.size()
<< " is not equal to the required size " << names.size();
return false;
}
// 3. feed
for (size_t i = 0; i < names.size(); ++i) {
auto i_t = predictor->GetInputTensor(names[i]);
i_t->Reshape(shapes[i]);
i_t->copy_from_cpu(datas[i].data());
}
return true;
}
bool fetch(paddle::PaddlePredictor *predictor,
std::vector<std::vector<float>> &datas,
std::vector<std::vector<int>> &shapes) {
LOG(INFO) << "Predictor::fetch";
auto names = predictor->GetOutputNames();
for (auto &name : names) {
auto o_t = predictor->GetOutputTensor(name);
std::vector<int> s = o_t->shape();
shapes.push_back(s);
int num = std::accumulate(s.begin(), s.end(), 1, std::multiplies<int>());
std::vector<float> data(num);
o_t->copy_to_cpu(data.data());
datas.push_back(data);
}
return true;
}
bool Predictor::predict(const std::vector<std::vector<float>> &input_datas,
const std::vector<std::vector<int>> &input_shapes,
const std::vector<std::string> &layers,
std::vector<std::vector<float>> &output_datas,
std::vector<std::vector<int>> &output_shapes) {
LOG(INFO) << "Predictor::predict";
(void)layers;
// 1. feed input
if (!feed(_predictor.get(), input_datas, input_shapes)) {
return false;
}
// 2. execute inference
if (!_predictor->ZeroCopyRun()) {
LOG(WARNING) << "fail to execute predictor";
return false;
}
// 3. fetch output
if (!fetch(_predictor.get(), output_datas, output_shapes)) {
return false;
}
return true;
}
void init_tensorrt(const ConfParser *conf,
const std::string &prefix,
AnalysisConfig &config) {
LOG(INFO) << "Predictor::init_tensorrt()";
// 1. max_batch_size for tensorrt
int max_batch_size = 1;
if (!conf->get_int(prefix, "max_batch_size", max_batch_size)) {
LOG(WARNING) << "fail to get max_batch_size from conf, set as 1";
}
max_batch_size = std::max(1, max_batch_size);
// 2. workspace_size for tensorrt
int workspace_size = 0;
if (!conf->get_int(prefix, "workspace_size", workspace_size)) {
LOG(WARNING) << "fail to get workspace_size from conf, set as 0";
}
workspace_size = std::max(0, workspace_size);
// 3. min_subgraph_size for tensorrt
int min_subgraph_size = 3;
if (!conf->get_int(prefix, "min_subgraph_size", min_subgraph_size)) {
LOG(WARNING) << "fail to get min_subgraph_size from conf, set as 3";
}
min_subgraph_size = std::max(0, min_subgraph_size);
config.EnableTensorRtEngine(
workspace_size, max_batch_size, min_subgraph_size);
}
void init_anakin(const ConfParser *conf,
const std::string &prefix,
AnalysisConfig &config) {
LOG(INFO) << "Predictor::init_anakin()";
  // 1. max_batch_size for anakin
int max_batch_size = 1;
if (!conf->get_int(prefix, "max_batch_size", max_batch_size)) {
LOG(WARNING) << "fail to get max_batch_size from conf, set as 1";
}
max_batch_size = std::max(1, max_batch_size);
std::map<std::string, std::vector<int>> anakin_max_input_dict;
std::vector<std::string> input_names;
if (!conf->get_strings(prefix, "input_names", input_names)) {
LOG(WARNING) << "fail to get input_names from conf";
}
for (auto &n : input_names) {
std::vector<int> shape;
if (!conf->get_ints(prefix, n, shape)) {
LOG(WARNING) << "fail to get the shape of " + n;
} else {
anakin_max_input_dict[n] = shape;
}
}
config.EnableAnakinEngine(max_batch_size, anakin_max_input_dict);
config.pass_builder()->TurnOnDebug();
}
void init_gpu(const ConfParser *conf,
const std::string &prefix,
int device,
AnalysisConfig &config) {
LOG(INFO) << "Predictor::init_gpu()";
  // 1. GPU memory
uint32_t gpu_memory_mb = 1024;
if (!conf->get_uint(prefix, "gpu_memory_mb", gpu_memory_mb)) {
LOG(WARNING) << "fail to get gpu_memory_mb from conf, set as 1024";
}
config.EnableUseGpu(gpu_memory_mb, device);
// 2. use_tensorrt
std::string infer_engine;
if (!conf->get_string(prefix, "infer_engine", infer_engine)) {
LOG(WARNING) << "disable infer engine";
return;
} else if (infer_engine == "tensorrt") {
init_tensorrt(conf, prefix + "tensorrt_", config);
} else if (infer_engine == "anakin") {
init_anakin(conf, prefix + "anakin_", config);
} else {
    LOG(WARNING) << "unknown infer engine";
return;
}
}
void init_cpu(const ConfParser *conf,
const std::string &prefix,
AnalysisConfig &config) {
LOG(INFO) << "Predictor::init_cpu()";
config.DisableGpu();
// 1. cpu_math_library (such as mkl/openblas) num_threads
int num_threads = 1;
if (!conf->get_int(prefix, "num_threads", num_threads)) {
LOG(WARNING) << "fail to get num_threads conf, set as 1";
}
num_threads = std::max(1, num_threads);
config.SetCpuMathLibraryNumThreads(num_threads);
// 2. use_mkldnn
int use_mkldnn = -1;
if (conf->get_int(prefix, "use_mkldnn", use_mkldnn) && use_mkldnn > 0) {
config.EnableMKLDNN();
}
}
bool init_model(const ConfParser *conf,
const std::string &prefix,
AnalysisConfig &config) {
LOG(INFO) << "Predictor::init_model()";
std::string prog_file;
if (!conf->get_string(prefix, "prog_file", prog_file)) {
LOG(WARNING) << "fail to get prog_file from conf";
}
std::string param_file;
if (!conf->get_string(prefix, "param_file", param_file)) {
LOG(WARNING) << "fail to get param_file from conf";
}
if (!prog_file.empty() && !param_file.empty()) {
    if (!file_exist(prog_file)) {
      LOG(FATAL) << "file: " << prog_file << " does not exist";
      return false;
    }
    if (!file_exist(param_file)) {
      LOG(FATAL) << "file: " << param_file << " does not exist";
      return false;
    }
config.SetModel(prog_file, param_file);
return true;
}
std::string model_path;
if (!conf->get_string(prefix, "model_path", model_path)) {
LOG(FATAL) << "fail to get model_path from conf";
return false;
}
config.SetModel(model_path);
return true;
}
void show_version_info() {
static bool initialized = false;
if (initialized) {
return;
}
LOG(INFO) << "[date:" << __DATE__ << "]"
<< "[time:" << __TIME__ << "]";
LOG(INFO) << "paddle " << paddle::get_version();
initialized = true;
}
bool Predictor::init(const std::string &conf_file, const std::string &prefix) {
LOG(INFO) << "Predictor::init()";
show_version_info();
std::unique_ptr<AnalysisConfig> config(new AnalysisConfig());
std::unique_ptr<ConfParser> conf(new ConfParser());
if (!conf->init(conf_file)) {
LOG(FATAL) << "fail to load conf file: " << conf_file;
return false;
}
// 1. Debug
if (!conf->get_int(prefix, "debug", _debug)) {
_debug = -1;
LOG(WARNING) << "fail to get debug from conf, set as -1";
}
// 2. init model
if (!init_model(conf.get(), prefix, *config.get())) {
LOG(FATAL) << "fail to init model";
return false;
}
// 3. enable_ir_optim
int ir_optim = -1;
if (!conf->get_int(prefix, "enable_ir_optim", ir_optim)) {
LOG(WARNING) << "fail to get enable_ir_optim from conf, set as false";
}
config->SwitchIrOptim(ir_optim > 0);
// 4. specify_input_name
int sp_input = -1;
if (!conf->get_int(prefix, "specify_input_name", sp_input)) {
LOG(WARNING) << "fail to get specify_input_name from conf, set as false";
}
config->SwitchSpecifyInputNames(sp_input > 0);
// 5. use zerocopy
config->SwitchUseFeedFetchOps(false);
// 6. Device
int device = -1;
if (!conf->get_int(prefix, "device", device)) {
LOG(WARNING) << "fail to get device from conf";
return false;
}
if (device < 0) {
LOG(INFO) << "use cpu!";
init_cpu(conf.get(), prefix, *config.get());
} else {
LOG(INFO) << "use gpu!";
init_gpu(conf.get(), prefix, device, *config.get());
}
// 7. delete unused pass
std::vector<std::string> passes;
if (conf->get_strings(prefix, "delete_pass", passes)) {
for (auto &p : passes) {
LOG(INFO) << "delete pass: " << p;
config->pass_builder()->DeletePass(p);
}
}
// 8. create predictor
auto predictor = CreatePaddlePredictor<AnalysisConfig>(*config.get());
  if (nullptr == predictor) {
LOG(ERROR) << "fail to create paddle predictor";
return false;
}
_predictor = std::move(predictor);
return true;
}
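Collecting the keys that `Predictor::init` and its helpers read, a conf file for an empty prefix might look like the fragment below (values illustrative; note how the TensorRT sub-keys pick up the `tensorrt_` prefix appended in `init_gpu`):

```text
# model location: either prog_file + param_file, or a model_path directory
model_path: ./model
debug: 0
enable_ir_optim: 1
specify_input_name: 1
# device < 0 selects CPU; >= 0 is the GPU id
device: 0
gpu_memory_mb: 1024
infer_engine: tensorrt
tensorrt_max_batch_size: 1
tensorrt_workspace_size: 0
tensorrt_min_subgraph_size: 3
```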
bool Predictor::init_shared(Predictor *cls) {
LOG(INFO) << "Predictor::init_shared";
  this->_predictor = cls->_predictor->Clone();
  if (nullptr == this->_predictor) {
LOG(ERROR) << "fail to clone paddle predictor";
return false;
}
return true;
}
ICNNPredict *Predictor::clone() {
LOG(INFO) << "Predictor::clone";
Predictor *cls = new Predictor();
if (!cls->init_shared(this)) {
LOG(FATAL) << "fail to call cls->init_shared";
delete cls;
return NULL;
}
return cls;
}
ICNNPredict *create_cnnpredict(const std::string &conf_file,
const std::string &prefix) {
LOG(INFO) << "create_cnnpredict";
Predictor *predictor = new Predictor();
if (!predictor->init(conf_file, prefix)) {
delete predictor;
return NULL;
}
return predictor;
}
#pragma once
#include <iostream>
#include <vector>
#include "cnnpredict_interface.h"
#include "common.h"
#include "paddle_inference_api.h"
using paddle::CreatePaddlePredictor;
using paddle::AnalysisConfig;
using paddle::PaddleEngineKind;
class Predictor : public ICNNPredict {
public:
Predictor() : _debug(0) {}
virtual ~Predictor();
ICNNPredict *clone();
  /**
   * [init predictor from conf file]
   * @param conf_file [conf file]
   * @param prefix [prefix prepended to every key]
   * @return [true or false]
   */
bool init(const std::string &conf_file, const std::string &prefix);
bool predict(const std::vector<Tensor> &inputs,
const std::vector<std::string> &layers,
std::vector<Tensor> &outputs);
bool predict(const std::vector<std::vector<float>> &input_datas,
const std::vector<std::vector<int>> &input_shapes,
const std::vector<std::string> &layers,
std::vector<std::vector<float>> &output_datas,
std::vector<std::vector<int>> &output_shapes);
private:
bool init_shared(Predictor *cls);
int _debug;
std::unique_ptr<paddle::PaddlePredictor> _predictor;
};
#include <algorithm>
#include "logger.h"
#include "cnnpredict_interface.h"
#include "common.h"
#include "py_cnnpredict.h"
template <class T>
vector<T> ndarray_to_vector(const py::array &nd) {
  size_t data_num = nd.size();
  const T *nd_data = reinterpret_cast<const T *>(nd.data(0));
  vector<T> vec(data_num, 0);
  std::copy(nd_data, nd_data + data_num, vec.begin());
  return vec;
}
template <class T>
vector<T> list_to_vector(py::list &list) {
vector<T> vec;
for (size_t i = 0; i < py::len(list); i++) {
T l = py::cast<T>(list[i]);
vec.push_back(l);
}
return vec;
}
template <class T>
vector<vector<T>> ndlist_to_vectors(py::list &ndlist) {
vector<vector<T>> vecs;
for (unsigned int i = 0; i < py::len(ndlist); i++) {
py::array nd = py::array(ndlist[i]);
vector<T> vec = ndarray_to_vector<T>(nd);
vecs.push_back(vec);
}
return vecs;
}
template <class T>
py::array vector_to_ndarray(const vector<T> &vec) {
const std::vector<size_t> shape = {vec.size()};
auto format = py::format_descriptor<T>::format();
py::dtype dt(format);
py::array nd(dt, shape, (const char *)vec.data());
return nd;
}
template <class T>
py::list vectors_to_list(const vector<vector<T>> &vecs) {
py::list ndlist;
  for (size_t i = 0; i < vecs.size(); i++) {
py::array nd = vector_to_ndarray<T>(vecs[i]);
ndlist.append(nd);
}
return ndlist;
}
PyCNNPredict::~PyCNNPredict() {
if (_predictor != NULL) {
delete _predictor;
_predictor = NULL;
}
}
bool PyCNNPredict::init(string conf_file, string prefix) {
LOG(INFO) << "PyCNNPredict::init()";
_predictor = create_cnnpredict(conf_file, prefix);
if (_predictor == NULL) {
LOG(FATAL) << "fail to call create_cnnpredict";
return false;
}
return true;
}
py::list PyCNNPredict::postprocess(const vector<vector<float>> &vdatas,
const vector<vector<int>> &vshapes) {
LOG(INFO) << "PyCNNPredict::postprocess()";
py::list result;
if (vdatas.size() != vshapes.size()) {
LOG(FATAL) << "datas and shapes size not equal";
return result;
}
result.append(vectors_to_list(vdatas));
result.append(vectors_to_list(vshapes));
return result;
}
py::list PyCNNPredict::predict(py::list input_datas,
py::list input_shapes,
py::list layer_names) {
LOG(INFO) << "PyCNNPredict::predict()";
vector<vector<float>> inputdatas;
vector<vector<int>> inputshapes;
vector<string> layernames;
vector<vector<float>> outputdatas;
vector<vector<int>> outputshapes;
py::list result;
if (py::len(input_datas) != py::len(input_shapes)) {
LOG(FATAL) << "datas and shapes size not equal";
return result;
}
inputdatas = ndlist_to_vectors<float>(input_datas);
inputshapes = ndlist_to_vectors<int>(input_shapes);
layernames = list_to_vector<string>(layer_names);
bool ret = _predictor->predict(
inputdatas, inputshapes, layernames, outputdatas, outputshapes);
if (!ret) {
LOG(FATAL) << "fail to predict";
return result;
}
return postprocess(outputdatas, outputshapes);
}
PYBIND11_MODULE(PyCNNPredict, m) {
m.doc() = "pycnnpredict";
py::class_<PyCNNPredict>(m, "PyCNNPredict")
.def(py::init())
.def("init", &PyCNNPredict::init)
.def("predict", &PyCNNPredict::predict);
}
#pragma once
#include <pybind11/numpy.h>
#include <pybind11/pybind11.h>
#include <iostream>
#include <string>
#include <vector>
#include "cnnpredict_interface.h"
namespace py = pybind11;
using std::string;
using std::vector;
class PyCNNPredict {
public:
PyCNNPredict() : _predictor(NULL) {}
~PyCNNPredict();
bool init(string conf_file, string prefix);
py::list predict(py::list input_datas,
py::list input_shapes,
py::list layer_names);
private:
ICNNPredict *_predictor;
py::list postprocess(const vector<vector<float>> &vdatas,
const vector<vector<int>> &vshapes);
};