Unverified commit 693ab9b8 authored by: J jerrywgz, committed by: GitHub

add more config for lrc (#7)

* add more config for lrc

* refine code style and add README
Parent 8e1ec6d0
- repo: https://github.com/PaddlePaddle/mirrors-yapf.git
sha: 0d79c0c469bab64f7229c9aca2b1186ef47f0e37
hooks:
- id: yapf
files: \.py$
- repo: https://github.com/pre-commit/pre-commit-hooks
sha: a11d9314b22d8f8c7556443875b731ef05965464
hooks:
- id: check-merge-conflict
- id: check-symlinks
- id: detect-private-key
files: (?!.*paddle)^.*$
- id: end-of-file-fixer
files: \.md$
- id: trailing-whitespace
files: \.md$
- repo: https://github.com/Lucas-C/pre-commit-hooks
sha: v1.0.1
hooks:
- id: forbid-crlf
files: \.md$
- id: remove-crlf
files: \.md$
- id: forbid-tabs
files: \.md$
- id: remove-tabs
files: \.md$
[style]
based_on_style = pep8
column_limit = 80
#!/bin/bash
function abort(){
echo "Your commit does not fit PaddlePaddle code style" 1>&2
echo "Please use pre-commit scripts to auto-format your code" 1>&2
exit 1
}
trap 'abort' 0
set -e
cd `dirname $0`
cd ..
export PATH=/usr/bin:$PATH
pre-commit install
if ! pre-commit run -a ; then
ls -lh
git diff --exit-code
exit 1
fi
trap : 0
# LRC Local Rademacher Complexity Regularization
Regularization of Deep Neural Networks (DNNs) for the sake of improving their generalization capability is important and challenging. This directory contains an image classification model based on a novel regularizer rooted in Local Rademacher Complexity (LRC). We appreciate the contribution of [DARTS](https://arxiv.org/abs/1806.09055) to our research. LRC regularization and DARTS are combined in this model to reach an accuracy of 98.01% on the CIFAR-10 dataset. Code accompanying the paper
> [An Empirical Study on Regularization of Deep Neural Networks by Local Rademacher Complexity](https://arxiv.org/abs/1902.00873)\
> Yingzhen Yang, Xingjian Li, Jun Huan.\
> _arXiv:1902.00873_.
......@@ -7,13 +7,21 @@ Regularization of Deep Neural Networks(DNNs) for the sake of improving their gen
---
# Table of Contents
- [Introduction of algorithm](#introduction-of-algorithm)
- [Installation](#installation)
- [Data preparation](#data-preparation)
- [Training](#training)
- [Testing](#testing)
- [Experimental result](#experimental-result)
- [Reference](#reference)
## Introduction of algorithm
Rademacher complexity is well known as a distribution-free complexity measure of a function class, and LRC focuses on a restricted function class, which leads to sharper convergence rates and potentially better generalization. Our LRC-based regularizer is developed by estimating the complexity of the function class centered at the minimizer of the empirical loss of DNNs.
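To make the regularizer concrete, below is a minimal NumPy sketch of the kind of computation that `lrc_loss` in `model.py` performs: per-sample differences between non-label logits and the label logit are weighted by Rademacher (±1) samples, summed, normalized by the number of terms, and averaged. The function name and shapes are illustrative, not part of this repository's API.

```python
import numpy as np

def lrc_penalty_sketch(logits, labels, rad):
    """Illustrative NumPy version of the kind of penalty lrc_loss computes.

    logits: (B, C) class scores; labels: (B,) integer labels;
    rad: (S, B, C-1) Rademacher (+1/-1) samples.
    """
    B, C = logits.shape
    label_logit = logits[np.arange(B), labels][:, None]         # (B, 1)
    mask = np.ones_like(logits, dtype=bool)
    mask[np.arange(B), labels] = False
    non_label_logit = logits[mask].reshape(B, C - 1)            # (B, C-1)
    y_diff = non_label_logit - label_logit                      # (B, C-1)
    per_sample = np.abs((rad * y_diff[None]).sum(axis=(1, 2)))  # (S,)
    return per_sample.mean() / (B * (C - 1))

# Training combines this penalty with the mixup loss, weighted by
# lrc_loss_lambda (set to 0.7 in the configuration below).
```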
## Installation
Running the sample code in this directory requires PaddlePaddle Fluid v1.3.0 or later. If the PaddlePaddle version on your device is lower than this, please follow the instructions in the [installation document](http://www.paddlepaddle.org/documentation/docs/zh/1.3/beginners_guide/install/index_cn.html#paddlepaddle) and make an update.
## Data preparation
......@@ -30,13 +38,8 @@ The dataset will be downloaded to `dataset/cifar/cifar-10-batches-py` in the sam
After data preparation, one can start the training step by:
sh run_cifar.sh
- Set ```export CUDA_VISIBLE_DEVICES=0``` to specify one GPU for training.
- For more help on arguments:
......@@ -44,7 +47,7 @@ After data preparation, one can start the training step by:
**Data reader introduction:**
* Data reader is defined in `reader_cifar.py`.
* Reshape the images to 32 * 32.
* In the training stage, images are padded to 40 * 40 and randomly cropped back to the original size.
* In the training stage, images are randomly flipped horizontally.
......@@ -54,19 +57,40 @@ After data preparation, one can start the training step by:
**Model configuration:**
* Use auxiliary loss and auxiliary\_weight=0.4.
* Use dropout and drop\_path\_prob=0.2.
* Set lrc\_loss\_lambda=0.7.
**Training strategy:**
* Use momentum optimizer with momentum=0.9.
* Weight decay is 0.0003.
* Use cosine decay with init\_lr=0.025 (a short sketch of the schedule follows this list).
* Total epoch is 600.
* Use Xavier initializer for weights in conv2d, Constant initializer for weights in batch norm and Normal initializer for weights in fc.
* Initialize biases in batch norm and fc to zero and do not add bias to conv2d.
* Use global L2 norm to clip gradients.
* Other configurations are set in `run_cifar.sh`.
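A minimal Python sketch of the cosine schedule above (the actual implementation in `learning_rate.py` builds the same per-epoch formula into the Paddle graph):

```python
import math

def cosine_decay_lr(init_lr, epoch, total_epochs):
    # lr(epoch) = init_lr * (cos(pi * epoch / total_epochs) + 1) / 2
    return init_lr * (math.cos(math.pi * epoch / total_epochs) + 1) / 2

# With init_lr=0.025 and 600 epochs, the rate starts at 0.025,
# is 0.0125 at epoch 300 and decays towards 0 at the end of training.
print(cosine_decay_lr(0.025, 0, 600), cosine_decay_lr(0.025, 300, 600))
```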
## Testing
One can start the testing step by:
sh run_cifar_test.sh
- Set ```export CUDA_VISIBLE_DEVICES=0``` to specify one GPU for testing.
- For more help on arguments:
python test_mixup.py --help
After obtaining the six models, one can get the ensembled model by:
python voting.py
## Experimental result
Experimental results are shown below:
| Model | base lr | batch size | model id | acc-1 |
| :--------------- | :--------: | :------------: | :------------------: |------: |
| [model_0](https://paddlemodels.bj.bcebos.com/autodl/lrc_model_0.tar.gz) | 0.01 | 64 | 0 | 97.12% |
| [model_1](https://paddlemodels.bj.bcebos.com/autodl/lrc_model_1.tar.gz) | 0.02 | 80 | 0 | 97.34% |
| [model_2](https://paddlemodels.bj.bcebos.com/autodl/lrc_model_2.tar.gz) | 0.015 | 80 | 1 | 97.31% |
| [model_3](https://paddlemodels.bj.bcebos.com/autodl/lrc_model_3.tar.gz) | 0.02 | 80 | 1 | 97.52% |
| [model_4](https://paddlemodels.bj.bcebos.com/autodl/lrc_model_4.tar.gz) | 0.03 | 80 | 1 | 97.30% |
| [model_5](https://paddlemodels.bj.bcebos.com/autodl/lrc_model_5.tar.gz) | 0.015 | 64 | 2 | 97.32% |
Ensembled model: acc-1 = 98.01%.
## Reference
# LRC Local Rademacher Complexity Regularization
Choosing a regularizer to improve the generalization capability of deep neural networks is important and challenging. This directory contains an image classification model with a novel regularizer based on Local Rademacher Complexity (LRC). We appreciate the help of the [DARTS](https://arxiv.org/abs/1806.09055) model for this research. The model combines LRC regularization with the DARTS network and reaches an accuracy of 98.01% on the CIFAR-10 dataset. Code is released together with the paper
> [An Empirical Study on Regularization of Deep Neural Networks by Local Rademacher Complexity](https://arxiv.org/abs/1902.00873)\
> Yingzhen Yang, Xingjian Li, Jun Huan.\
> _arXiv:1902.00873_.
......@@ -7,13 +7,21 @@
---
# Table of Contents
- [Introduction of algorithm](#introduction-of-algorithm)
- [Installation](#installation)
- [Data preparation](#data-preparation)
- [Training](#training)
- [Testing](#testing)
- [Experimental result](#experimental-result)
- [Reference](#reference)
## Introduction of algorithm
Our method draws on existing work on local Rademacher complexity and only considers the Rademacher complexity within a ball around a minimizer of the empirical loss. Using recent estimation techniques for Rademacher complexity, a closed-form expression of this quantity is derived for the hinge loss and the cross-entropy loss; this term is called the local Rademacher regularizer and is added to the empirical loss. Applying this regularization on top of mixup and model ensembling achieves state-of-the-art accuracy on CIFAR-10.
## Installation
Running the sample code in this directory requires PaddlePaddle Fluid v1.3.0 or later. If the PaddlePaddle version in your environment is lower than this, please follow the instructions in the [installation document](http://www.paddlepaddle.org/documentation/docs/zh/1.3/beginners_guide/install/index_cn.html#paddlepaddle) to update PaddlePaddle.
## Data preparation
......@@ -21,27 +29,22 @@
sh ./dataset/download.sh
Please make sure your environment has an Internet connection. The data will be downloaded to `dataset/cifar/cifar-10-batches-py` in the same directory as `train.py`. If the download fails, you can download cifar-10-python.tar.gz from https://www.cs.toronto.edu/~kriz/cifar.html yourself and extract it to the location above.
## Training
After data preparation, training can be started with:
sh run_cifar.sh
- Set ```export CUDA_VISIBLE_DEVICES=0``` in ```run_cifar.sh``` to specify the GPU card for training.
- For more help on arguments:
python train_mixup.py --help
**Data reader introduction:**
* The data reader is defined in `reader_cifar.py`.
* Input images are uniformly resized to 32 * 32.
* During training, images are padded to 40 * 40 and then randomly cropped back to the original input size.
* During training, images are randomly flipped horizontally.
......@@ -51,19 +54,41 @@
**Model configuration:**
* Use the auxiliary loss, with an auxiliary loss weight of 0.4.
* Use dropout, with a drop rate of 0.2.
* Set lrc\_loss\_lambda to 0.7.
**Training strategy:**
* Train with the momentum optimizer, momentum=0.9.
* The weight decay coefficient is 0.0001.
* Use cosine learning rate decay with an initial learning rate of 0.025.
* Train for 600 epochs in total.
* Use Xavier initialization for conv weights, constant initialization for batch norm weights, and Gaussian initialization for fully connected layer weights.
* Initialize the biases of batch norm and fully connected layers to a constant, and add no bias to conv layers.
* Clip gradients by their global L2 norm.
* Other configurations are set in run_cifar.sh.
## Testing
One can start the testing step by:
sh run_cifar_test.sh
- Set ```export CUDA_VISIBLE_DEVICES=0``` in ```run_cifar_test.sh``` to specify the GPU card for testing.
- For more help on arguments:
python test_mixup.py --help
After obtaining the six models, run the following script to get the ensembled model:
python voting.py
## Experimental result
The table below shows the model evaluation results:
| Model | base lr | batch size | model id | acc-1 |
| :--------------- | :--------: | :------------: | :------------------: |------: |
| [model_0](https://paddlemodels.bj.bcebos.com/autodl/lrc_model_0.tar.gz) | 0.01 | 64 | 0 | 97.12% |
| [model_1](https://paddlemodels.bj.bcebos.com/autodl/lrc_model_1.tar.gz) | 0.02 | 80 | 0 | 97.34% |
| [model_2](https://paddlemodels.bj.bcebos.com/autodl/lrc_model_2.tar.gz) | 0.015 | 80 | 1 | 97.31% |
| [model_3](https://paddlemodels.bj.bcebos.com/autodl/lrc_model_3.tar.gz) | 0.02 | 80 | 1 | 97.52% |
| [model_4](https://paddlemodels.bj.bcebos.com/autodl/lrc_model_4.tar.gz) | 0.03 | 80 | 1 | 97.30% |
| [model_5](https://paddlemodels.bj.bcebos.com/autodl/lrc_model_5.tar.gz) | 0.015 | 64 | 2 | 97.32% |
Ensembled model: acc-1 = 98.01%.
## Reference
......@@ -113,4 +113,34 @@ MY_DARTS = Genotype(
('skip_connect', 2), ('skip_connect', 3)],
reduce_concat=range(2, 6))
DARTS = MY_DARTS
MY_DARTS_list = [
Genotype(
normal=[('sep_conv_3x3', 0), ('skip_connect', 1), ('sep_conv_3x3', 0),
('sep_conv_3x3', 1), ('sep_conv_3x3', 0), ('sep_conv_3x3', 1),
('skip_connect', 0), ('sep_conv_3x3', 2)],
normal_concat=range(2, 6),
reduce=[('max_pool_3x3', 0), ('max_pool_3x3', 1), ('skip_connect', 2),
('max_pool_3x3', 0), ('skip_connect', 3), ('avg_pool_3x3', 1),
('skip_connect', 2), ('skip_connect', 3)],
reduce_concat=range(2, 6)),
Genotype(
normal=[('sep_conv_3x3', 0), ('sep_conv_3x3', 1), ('skip_connect', 0),
('dil_conv_3x3', 2), ('skip_connect', 0), ('sep_conv_3x3', 1),
('skip_connect', 0), ('skip_connect', 1)],
normal_concat=range(2, 6),
reduce=[('max_pool_3x3', 0), ('max_pool_3x3', 1), ('skip_connect', 2),
('dil_conv_3x3', 0), ('skip_connect', 3), ('skip_connect', 2),
('skip_connect', 3), ('skip_connect', 2)],
reduce_concat=range(2, 6)),
Genotype(
normal=[('sep_conv_3x3', 0), ('skip_connect', 1), ('skip_connect', 0),
('dil_conv_5x5', 1), ('skip_connect', 0), ('sep_conv_3x3', 1),
('skip_connect', 0), ('sep_conv_3x3', 1)],
normal_concat=range(2, 6),
reduce=[('max_pool_3x3', 0), ('max_pool_3x3', 1), ('max_pool_3x3', 0),
('skip_connect', 2), ('max_pool_3x3', 0), ('skip_connect', 2),
('skip_connect', 2), ('skip_connect', 3)],
reduce_concat=range(2, 6))
]
DARTS = MY_DARTS_list[0]
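For reference, the training and test programs select one of these architectures by index: both `train_mixup.py` and `test_mixup.py` rebind the default genotype from the `--model_id` flag, which corresponds to the "model id" column in the README result table.

```python
import genotypes

model_id = 1  # 0, 1 or 2, matching the "model id" column in the README
genotypes.DARTS = genotypes.MY_DARTS_list[model_id]
```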
File added
......@@ -38,6 +38,41 @@ def cosine_decay(learning_rate, num_epoch, steps_one_epoch):
with init_on_cpu():
decayed_lr = learning_rate * \
(ops.cos((global_step / steps_one_epoch) \
(ops.cos(fluid.layers.floor(global_step / steps_one_epoch) \
* math.pi / num_epoch) + 1)/2
return decayed_lr
def cosine_with_warmup_decay(learning_rate, lr_min, steps_one_epoch,
warmup_epochs, total_epoch, num_gpu):
global_step = _decay_step_counter()
epoch_idx = fluid.layers.floor(global_step / steps_one_epoch)
lr = fluid.layers.create_global_var(
shape=[1],
value=0.0,
dtype='float32',
persistable=True,
name="learning_rate")
warmup_epoch_var = fluid.layers.fill_constant(
shape=[1], dtype='float32', value=float(warmup_epochs), force_cpu=True)
num_gpu_var = fluid.layers.fill_constant(
shape=[1], dtype='float32', value=float(num_gpu), force_cpu=True)
batch_idx = global_step - steps_one_epoch * epoch_idx
with fluid.layers.control_flow.Switch() as switch:
with switch.case(epoch_idx < warmup_epoch_var):
epoch_ = (batch_idx + 1) / steps_one_epoch
factor = 1 / num_gpu_var * (
epoch_ * (num_gpu_var - 1) / warmup_epoch_var + 1)
decayed_lr = learning_rate * factor * num_gpu_var
fluid.layers.assign(decayed_lr, lr)
epoch_ = (batch_idx + 1) / steps_one_epoch
m = epoch_ / total_epoch
frac = (1 + ops.cos(math.pi * m)) / 2
cosine_lr = (lr_min + (learning_rate - lr_min) * frac) * num_gpu_var
with switch.default():
fluid.layers.assign(cosine_lr, lr)
return lr
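A possible way to wire this warmup-plus-cosine schedule into the optimizer, mirroring how `cosine_decay` is used in `train_mixup.py`; the warmup length, minimum learning rate and GPU count below are illustrative assumptions rather than settings taken from this repository:

```python
import paddle.fluid as fluid

# Sketch only: warmup_epochs, lr_min and num_gpu values are assumptions.
steps_one_epoch = 50000 // 80  # CIFAR-10 training-set size / batch size
lr = cosine_with_warmup_decay(
    learning_rate=0.025, lr_min=0.0, steps_one_epoch=steps_one_epoch,
    warmup_epochs=5, total_epoch=600, num_gpu=1)
optimizer = fluid.optimizer.Momentum(
    learning_rate=lr,
    momentum=0.9,
    regularization=fluid.regularizer.L2Decay(3e-4))
# optimizer.minimize(loss)  # loss would come from model.train_model(...)
```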
......@@ -176,9 +176,9 @@ def StemConv(input, C_out, kernel_size, padding):
class NetworkCIFAR(object):
def __init__(self, C, class_num, layers, auxiliary, genotype):
self.class_num = class_num
self._layers = layers
self._auxiliary = auxiliary
self.class_num = class_num
stem_multiplier = 3
self.drop_path_prob = 0
......@@ -201,36 +201,12 @@ class NetworkCIFAR(object):
if i == 2 * layers // 3:
C_to_auxiliary = C_prev
def forward(self, init_channel, is_train):
self.training = is_train
self.logits_aux = None
num_channel = init_channel * 3
s0 = StemConv(self.image, num_channel, kernel_size=3, padding=1)
s1 = s0
for i, cell in enumerate(self.cells):
name = 'cells.' + str(i) + '.'
s0, s1 = s1, cell.forward(s0, s1, self.drop_path_prob, is_train,
name)
if i == int(2 * self._layers // 3):
if self._auxiliary and self.training:
self.logits_aux = AuxiliaryHeadCIFAR(s1, self.class_num)
out = fluid.layers.adaptive_pool2d(s1, (1, 1), "avg")
self.logits = fluid.layers.fc(out,
size=self.class_num,
param_attr=ParamAttr(
initializer=Normal(scale=1e-3),
name='classifier.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.),
name='classifier.bias'))
return self.logits, self.logits_aux
def build_input(self, image_shape, batch_size, is_train):
def build_input(self, image_shape, is_train):
if is_train:
py_reader = fluid.layers.py_reader(
capacity=64,
shapes=[[-1] + image_shape, [-1, 1], [-1, 1], [-1, 1], [-1, 1],
[-1, 1], [-1, batch_size, self.class_num - 1]],
[-1, 1], [50, -1, self.class_num - 1]],
lod_levels=[0, 0, 0, 0, 0, 0, 0],
dtypes=[
"float32", "int64", "int64", "float32", "int32", "int32",
......@@ -248,14 +224,35 @@ class NetworkCIFAR(object):
name='test_reader')
return py_reader
def train_model(self, py_reader, init_channels, aux, aux_w, batch_size,
loss_lambda):
def forward(self, init_channel, is_train):
self.training = is_train
self.logits_aux = None
num_channel = init_channel * 3
s0 = s1 = StemConv(self.image, num_channel, kernel_size=3, padding=1)
for i, cell in enumerate(self.cells):
name = 'cells.' + str(i) + '.'
s0, s1 = s1, cell.forward(s0, s1, self.drop_path_prob, is_train,
name)
if i == int(2 * self._layers // 3):
if self._auxiliary and self.training:
self.logits_aux = AuxiliaryHeadCIFAR(s1, self.class_num)
out = fluid.layers.adaptive_pool2d(s1, (1, 1), "avg")
self.logits = fluid.layers.fc(out,
size=self.class_num,
param_attr=ParamAttr(
initializer=Normal(scale=1e-3),
name='classifier.weight'),
bias_attr=ParamAttr(
initializer=Constant(0, ),
name='classifier.bias'))
return self.logits, self.logits_aux
def train_model(self, py_reader, init_channels, aux, aux_w, loss_lambda):
self.image, self.ya, self.yb, self.lam, self.label_reshape,\
self.non_label_reshape, self.rad_var = fluid.layers.read_file(py_reader)
self.logits, self.logits_aux = self.forward(init_channels, True)
self.mixup_loss = self.mixup_loss(aux, aux_w)
self.lrc_loss = self.lrc_loss(batch_size)
return self.mixup_loss + loss_lambda * self.lrc_loss
return self.mixup_loss
def test_model(self, py_reader, init_channels):
self.image, self.ya = fluid.layers.read_file(py_reader)
......@@ -264,12 +261,13 @@ class NetworkCIFAR(object):
loss = fluid.layers.cross_entropy(prob, self.ya)
acc_1 = fluid.layers.accuracy(self.logits, self.ya, k=1)
acc_5 = fluid.layers.accuracy(self.logits, self.ya, k=5)
return loss, acc_1, acc_5
return prob, acc_1, acc_5
def mixup_loss(self, auxiliary, auxiliary_weight):
prob = fluid.layers.softmax(self.logits, use_cudnn=False)
loss_a = fluid.layers.cross_entropy(prob, self.ya)
loss_b = fluid.layers.cross_entropy(prob, self.yb)
loss_a_mean = fluid.layers.reduce_mean(loss_a)
loss_b_mean = fluid.layers.reduce_mean(loss_b)
loss = self.lam * loss_a_mean + (1 - self.lam) * loss_b_mean
......@@ -283,7 +281,7 @@ class NetworkCIFAR(object):
) * loss_b_aux_mean
return loss + auxiliary_weight * loss_aux
def lrc_loss(self, batch_size):
def lrc_loss(self):
y_diff_reshape = fluid.layers.reshape(self.logits, shape=(-1, 1))
label_reshape = fluid.layers.squeeze(self.label_reshape, axes=[1])
non_label_reshape = fluid.layers.squeeze(
......@@ -296,18 +294,226 @@ class NetworkCIFAR(object):
y_diff_non_label_reshape = fluid.layers.gather(y_diff_reshape,
non_label_reshape)
y_diff_label = fluid.layers.reshape(
y_diff_label_reshape, shape=(-1, batch_size, 1))
y_diff_label_reshape, shape=(1, -1, 1))
y_diff_non_label = fluid.layers.reshape(
y_diff_non_label_reshape,
shape=(-1, batch_size, self.class_num - 1))
y_diff_non_label_reshape, shape=(1, -1, self.class_num - 1))
y_diff_ = y_diff_non_label - y_diff_label
y_diff_ = fluid.layers.transpose(y_diff_, perm=[1, 2, 0])
rad_var_trans = fluid.layers.transpose(self.rad_var, perm=[1, 2, 0])
rad_y_diff_trans = rad_var_trans * y_diff_
lrc_loss_sum = fluid.layers.reduce_sum(rad_y_diff_trans, dim=[0, 1])
lrc_loss_ = fluid.layers.abs(lrc_loss_sum) / (batch_size *
(self.class_num - 1))
shape_nbc = fluid.layers.shape(rad_y_diff_trans)
shape_nb = fluid.layers.slice(shape_nbc, axes=[0], starts=[0], ends=[2])
num = fluid.layers.reduce_prod(shape_nb)
num.stop_gradient = True
lrc_loss_ = fluid.layers.abs(lrc_loss_sum) / num
lrc_loss_mean = fluid.layers.reduce_mean(lrc_loss_)
return lrc_loss_mean
def AuxiliaryHeadImageNet(input, num_classes, aux_name='auxiliary_head'):
relu_a = fluid.layers.relu(input)
pool_a = fluid.layers.pool2d(relu_a, 5, 'avg', pool_stride=3)
conv2d_a = fluid.layers.conv2d(
pool_a,
128,
1,
name=aux_name + '.features.2',
param_attr=ParamAttr(
initializer=Xavier(
uniform=False, fan_in=0),
name=aux_name + '.features.2.weight'),
bias_attr=False)
bn_a_name = aux_name + '.features.3'
bn_a = fluid.layers.batch_norm(
conv2d_a,
act='relu',
name=bn_a_name,
param_attr=ParamAttr(
initializer=Constant(1.), name=bn_a_name + '.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.), name=bn_a_name + '.bias'),
moving_mean_name=bn_a_name + '.running_mean',
moving_variance_name=bn_a_name + '.running_var')
conv2d_b = fluid.layers.conv2d(
bn_a,
768,
2,
act='relu',
name=aux_name + '.features.5',
param_attr=ParamAttr(
initializer=Xavier(
uniform=False, fan_in=0),
name=aux_name + '.features.5.weight'),
bias_attr=False)
fc_name = aux_name + '.classifier'
fc = fluid.layers.fc(conv2d_b,
num_classes,
name=fc_name,
param_attr=ParamAttr(
initializer=Normal(scale=1e-3),
name=fc_name + '.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.), name=fc_name + '.bias'))
return fc
def Stem0Conv(input, C_out):
conv_a = fluid.layers.conv2d(
input,
C_out // 2,
3,
stride=2,
padding=1,
param_attr=ParamAttr(
initializer=Xavier(
uniform=False, fan_in=0), name='stem0.0.weight'),
bias_attr=False)
bn_a = fluid.layers.batch_norm(
conv_a,
param_attr=ParamAttr(
initializer=Constant(1.), name='stem0.1.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.), name='stem0.1.bias'),
moving_mean_name='stem0.1.running_mean',
moving_variance_name='stem0.1.running_var',
act='relu')
conv_b = fluid.layers.conv2d(
bn_a,  # output of the first conv + batch norm (ReLU fused via act='relu')
C_out,
3,
stride=2,
padding=1,
param_attr=ParamAttr(
initializer=Xavier(
uniform=False, fan_in=0), name='stem0.3.weight'),
bias_attr=False)
bn_b = fluid.layers.batch_norm(
conv_b,
param_attr=ParamAttr(
initializer=Constant(1.), name='stem0.4.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.), name='stem0.4.bias'),
moving_mean_name='stem0.4.running_mean',
moving_variance_name='stem0.4.running_var')
return bn_b
def Stem1Conv(input, C_out):
relu_a = fluid.layers.relu(input)
conv_a = fluid.layers.conv2d(
relu_a,
C_out,
3,
stride=2,
padding=1,
param_attr=ParamAttr(
initializer=Xavier(
uniform=False, fan_in=0), name='stem1.1.weight'),
bias_attr=False)
bn_a = fluid.layers.batch_norm(
conv_a,
param_attr=ParamAttr(
initializer=Constant(1.), name='stem1.2.weight'),
bias_attr=ParamAttr(
initializer=Constant(0.), name='stem1.2.bias'),
moving_mean_name='stem1.2.running_mean',
moving_variance_name='stem1.2.running_var')
return bn_a
class NetworkImageNet(object):
def __init__(self, C, class_num, layers, genotype):
self.class_num = class_num
self._layers = layers
self.drop_path_prob = 0
C_prev_prev, C_prev, C_curr = C, C, C
self.cells = []
reduction_prev = True
for i in range(layers):
if i in [layers // 3, 2 * layers // 3]:
C_curr *= 2
reduction = True
else:
reduction = False
cell = Cell(genotype, C_prev_prev, C_prev, C_curr, reduction,
reduction_prev)
reduction_prev = reduction
self.cells += [cell]
C_prev_prev, C_prev = C_prev, cell.multiplier * C_curr
if i == 2 * layers // 3:
C_to_auxiliary = C_prev
self.stem0 = functools.partial(Stem0Conv, C_out=C)
self.stem1 = functools.partial(Stem1Conv, C_out=C)
def build_input(self, image_shape, is_train):
if is_train:
py_reader = fluid.layers.py_reader(
capacity=64,
shapes=[[-1] + image_shape, [-1, 1]],
lod_levels=[0, 0],
dtypes=["float32", "int64"],
use_double_buffer=True,
name='train_reader')
else:
py_reader = fluid.layers.py_reader(
capacity=64,
shapes=[[-1] + image_shape, [-1, 1]],
lod_levels=[0, 0],
dtypes=["float32", "int64"],
use_double_buffer=True,
name='test_reader')
return py_reader
def forward(self, is_train):
self.training = is_train
self.logits_aux = None
s0 = self.stem0(self.image)
s1 = self.stem1(s0)
for i, cell in enumerate(self.cells):
name = 'cells.' + str(i) + '.'
s0, s1 = s1, cell.forward(s0, s1, self.drop_path_prob, is_train,
name)
if i == int(2 * self._layers // 3):
if self._auxiliary and self.training:
self.logits_aux = AuxiliaryHeadImageNet(s1, self.class_num)
out = fluid.layers.pool2d(s1, 7, "avg", pool_stride=7)
self.logits = fluid.layers.fc(out,
size=self.class_num,
param_attr=ParamAttr(
initializer=Normal(scale=1e-3),
name='classifier.weight'),
bias_attr=ParamAttr(
initializer=Constant(0, ),
name='classifier.bias'))
return self.logits, self.logits_aux
def calc_loss(self, auxiliary_weight):
prob = fluid.layers.softmax(self.logits, use_cudnn=False)
loss = fluid.layers.cross_entropy(prob, self.label)
loss_mean = fluid.layers.reduce_mean(loss)
prob_aux = fluid.layers.softmax(self.logits_aux, use_cudnn=False)
loss_aux = fluid.layers.cross_entropy(prob_aux, self.label)
loss_aux_mean = fluid.layers.reduce_mean(loss_aux)
return loss_mean + auxiliary_weight * loss_aux_mean
def train_model(self, py_reader, aux_w):
self.image, self.label = fluid.layers.read_file(py_reader)
self.logits, self.logits_aux = self.forward(True)
self.loss = self.calc_loss(aux_w)
return self.loss
def test_model(self, py_reader):
self.image, self.label = fluid.layers.read_file(py_reader)
self.logits, _ = self.forward(False)
prob = fluid.layers.softmax(self.logits, use_cudnn=False)
loss = fluid.layers.cross_entropy(prob, self.label)
acc_1 = fluid.layers.accuracy(self.logits, self.label, k=1)
acc_5 = fluid.layers.accuracy(self.logits, self.label, k=5)
return prob, acc_1, acc_5
......@@ -312,7 +312,8 @@ def FactorizedReduce(input, C_out, name='', affine=True):
bias_attr=False)
h_end = relu_a.shape[2]
w_end = relu_a.shape[3]
slice_a = fluid.layers.slice(relu_a, [2, 3], [1, 1], [h_end, w_end])
slice_a = fluid.layers.slice(
input=relu_a, axes=[2, 3], starts=[1, 1], ends=[h_end, w_end])
conv2d_b = fluid.layers.conv2d(
slice_a,
C_out // 2,
......@@ -31,7 +31,10 @@ from PIL import Image
from PIL import ImageOps
import numpy as np
import cPickle
try:
import cPickle as pickle
except:
import pickle
import random
import utils
import paddle.fluid as fluid
......@@ -46,7 +49,7 @@ image_size = 32
image_depth = 3
half_length = 8
CIFAR_MEAN = [0.4914, 0.4822, 0.4465]
CIFAR_MEAN = [0.49139968, 0.48215827, 0.44653124]
CIFAR_STD = [0.24703233, 0.24348505, 0.26158768]
......@@ -82,6 +85,7 @@ def generate_bernoulli_number(batch_size, CIFAR_CLASSES=10):
def preprocess(sample, is_training, args):
image_array = sample.reshape(3, image_size, image_size)
rgb_array = np.transpose(image_array, (1, 2, 0))
img = Image.fromarray(rgb_array, 'RGB')
......@@ -123,13 +127,15 @@ def reader_creator_filepath(filename, sub_name, is_training, args):
datasets = []
for name in names:
print("Reading file " + name)
batch = cPickle.load(open(filename + name, 'rb'))
batch = pickle.load(open(filename + name, 'rb'))
data = batch['data']
labels = batch.get('labels', batch.get('fine_labels', None))
assert labels is not None
dataset = zip(data, labels)
datasets.extend(dataset)
random.shuffle(datasets)
if is_training:
random.shuffle(datasets)
def read_batch(datasets, args):
for sample, label in datasets:
......@@ -160,6 +166,23 @@ def reader_creator_filepath(filename, sub_name, is_training, args):
yield batch_out
batch_data = []
batch_label = []
if len(batch_data) != 0:
batch_data = np.array(batch_data, dtype='float32')
batch_label = np.array(batch_label, dtype='int64')
if is_training:
flatten_label, flatten_non_label = \
generate_reshape_label(batch_label, len(batch_data))
rad_var = generate_bernoulli_number(len(batch_data))
mixed_x, y_a, y_b, lam = utils.mixup_data(
batch_data, batch_label, len(batch_data), args.mix_alpha)
batch_out = [[mixed_x, y_a, y_b, lam, flatten_label, \
flatten_non_label, rad_var]]
yield batch_out
else:
batch_out = [[batch_data, batch_label]]
yield batch_out
batch_data = []
batch_label = []
return reader
CUDA_VISIBLE_DEVICES=0 python -u train_mixup.py \
--batch_size=80 \
--auxiliary \
--weight_decay=0.0003 \
--learning_rate=0.025 \
--lrc_loss_lambda=0.7 \
--cutout
export FLAGS_fraction_of_gpu_memory_to_use=0.9
export FLAGS_eager_delete_tensor_gb=0.0
export FLAGS_fast_eager_deletion_mode=1
nohup env CUDA_VISIBLE_DEVICES=0 python -u train_mixup.py --batch_size=64 --auxiliary --mix_alpha=0.9 --model_id=0 --cutout --lrc_loss_lambda=0.5 --weight_decay=0.0002 --learning_rate=0.01 --save_model_path=model_0 > lrc_model_0.log 2>&1 &
nohup env CUDA_VISIBLE_DEVICES=1 python -u train_mixup.py --batch_size=64 --auxiliary --mix_alpha=0.6 --model_id=0 --cutout --lrc_loss_lambda=0.5 --weight_decay=0.0002 --learning_rate=0.02 --save_model_path=model_1 > lrc_model_1.log 2>&1 &
nohup env CUDA_VISIBLE_DEVICES=2 python -u train_mixup.py --batch_size=80 --auxiliary --mix_alpha=0.5 --model_id=1 --cutout --lrc_loss_lambda=0.5 --weight_decay=0.0002 --learning_rate=0.015 --save_model_path=model_2 > lrc_model_2.log 2>&1 &
nohup env CUDA_VISIBLE_DEVICES=3 python -u train_mixup.py --batch_size=80 --auxiliary --mix_alpha=0.6 --model_id=1 --cutout --lrc_loss_lambda=0.5 --weight_decay=0.0002 --learning_rate=0.02 --save_model_path=model_3 > lrc_model_3.log 2>&1 &
nohup env CUDA_VISIBLE_DEVICES=4 python -u train_mixup.py --batch_size=80 --auxiliary --mix_alpha=0.8 --model_id=1 --cutout --lrc_loss_lambda=0.5 --weight_decay=0.0002 --learning_rate=0.03 --save_model_path=model_4 > lrc_model_4.log 2>&1 &
nohup env CUDA_VISIBLE_DEVICES=5 python -u train_mixup.py --batch_size=64 --auxiliary --mix_alpha=0.5 --model_id=2 --cutout --lrc_loss_lambda=0.5 --weight_decay=0.0002 --learning_rate=0.015 --save_model_path=model_5 > lrc_model_5.log 2>&1 &
export FLAGS_fraction_of_gpu_memory_to_use=0.6
nohup env CUDA_VISIBLE_DEVICES=0 python -u test_mixup.py --batch_size=64 --auxiliary --model_id=0 --pretrained_model=model_0/final/ --dump_path=paddle_predict/prob_test_0.pkl > lrc_test_0.log 2>&1 &
nohup env CUDA_VISIBLE_DEVICES=1 python -u test_mixup.py --batch_size=64 --auxiliary --model_id=0 --pretrained_model=model_1/final/ --dump_path=paddle_predict/prob_test_1.pkl > lrc_test_1.log 2>&1 &
nohup env CUDA_VISIBLE_DEVICES=2 python -u test_mixup.py --batch_size=80 --auxiliary --model_id=1 --pretrained_model=model_2/final/ --dump_path=paddle_predict/prob_test_2.pkl > lrc_test_2.log 2>&1 &
nohup env CUDA_VISIBLE_DEVICES=3 python -u test_mixup.py --batch_size=80 --auxiliary --model_id=1 --pretrained_model=model_3/final/ --dump_path=paddle_predict/prob_test_3.pkl > lrc_test_3.log 2>&1 &
nohup env CUDA_VISIBLE_DEVICES=4 python -u test_mixup.py --batch_size=80 --auxiliary --model_id=1 --pretrained_model=model_4/final/ --dump_path=paddle_predict/prob_test_4.pkl > lrc_test_4.log 2>&1 &
nohup env CUDA_VISIBLE_DEVICES=5 python -u test_mixup.py --batch_size=64 --auxiliary --model_id=2 --pretrained_model=model_5/final/ --dump_path=paddle_predict/prob_test_5.pkl > lrc_test_5.log 2>&1 &
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
#
# Based on:
# --------------------------------------------------------
# DARTS
# Copyright (c) 2018, Hanxiao Liu.
# Licensed under the Apache License, Version 2.0;
# --------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from learning_rate import cosine_decay
import numpy as np
import argparse
from model import NetworkCIFAR as Network
import reader_cifar as reader
import sys
import os
import time
import logging
import genotypes
import paddle.fluid as fluid
import shutil
import utils
parser = argparse.ArgumentParser("cifar")
# yapf: disable
parser.add_argument('--data', type=str, default='./dataset/cifar/cifar-10-batches-py/', help='location of the data corpus')
parser.add_argument('--batch_size', type=int, default=96, help='batch size')
parser.add_argument('--model_id', type=int, help='model id')
parser.add_argument('--report_freq', type=float, default=50, help='report frequency')
parser.add_argument( '--init_channels', type=int, default=36, help='num of init channels')
parser.add_argument( '--layers', type=int, default=20, help='total number of layers')
parser.add_argument('--auxiliary', action='store_true', default=False, help='use auxiliary tower')
parser.add_argument('--auxiliary_weight', type=float, default=0.4, help='weight for auxiliary loss')
parser.add_argument('--drop_path_prob', type=float, default=0.2, help='drop path probability')
parser.add_argument('--pretrained_model', type=str, default='/model_0/final/', help='pretrained model to load')
parser.add_argument('--arch', type=str, default='DARTS', help='which architecture to use')
parser.add_argument('--dump_path', type=str, default='prob_test_0.pkl', help='dump path')
# yapf: enable
args = parser.parse_args()
CIFAR_CLASSES = 10
dataset_train_size = 50000
image_size = 32
genotypes.DARTS = genotypes.MY_DARTS_list[args.model_id]
print(genotypes.DARTS)
def main():
image_shape = [3, image_size, image_size]
devices = os.getenv("CUDA_VISIBLE_DEVICES") or ""
devices_num = len(devices.split(","))
logging.info("args = %s", args)
genotype = eval("genotypes.%s" % args.arch)
model = Network(args.init_channels, CIFAR_CLASSES, args.layers,
args.auxiliary, genotype)
test(model, args, image_shape)
def build_program(args, is_train, model, im_shape):
out = []
py_reader = model.build_input(im_shape, is_train)
prob, acc_1, acc_5 = model.test_model(py_reader, args.init_channels)
out = [py_reader, prob, acc_1, acc_5]
return out
def test(model, args, im_shape):
test_py_reader, prob, acc_1, acc_5 = build_program(args, False, model,
im_shape)
test_prog = fluid.default_main_program().clone(for_test=True)
place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
# yapf: disable
if args.pretrained_model:
def if_exist(var):
return os.path.exists(os.path.join(args.pretrained_model, var.name))
fluid.io.load_vars(exe, args.pretrained_model, predicate=if_exist)
# yapf: enable
exec_strategy = fluid.ExecutionStrategy()
exec_strategy.num_threads = 1
compile_program = fluid.compiler.CompiledProgram(
test_prog).with_data_parallel(exec_strategy=exec_strategy)
test_reader = reader.test10(args)
test_py_reader.decorate_paddle_reader(test_reader)
test_fetch_list = [prob, acc_1, acc_5]
prob = []
top1 = utils.AvgrageMeter()
top5 = utils.AvgrageMeter()
test_py_reader.start()
test_start_time = time.time()
step_id = 0
try:
while True:
prev_test_start_time = test_start_time
test_start_time = time.time()
prob_v, acc_1_v, acc_5_v = exe.run(compile_program,
fetch_list=test_fetch_list)
prob.append(list(np.array(prob_v)))
top1.update(np.array(acc_1_v), np.array(prob_v).shape[0])
top5.update(np.array(acc_5_v), np.array(prob_v).shape[0])
if step_id % args.report_freq == 0:
print('prob shape:', np.array(prob_v).shape)
print("Step {}, acc_1 {}, acc_5 {}, time {}".format(
step_id,
np.array(acc_1_v),
np.array(acc_5_v), test_start_time - prev_test_start_time))
step_id += 1
except fluid.core.EOFException:
test_py_reader.reset()
np.concatenate(prob).dump(args.dump_path)
print("top1 {0}, top5 {1}".format(top1.avg, top5.avg))
if __name__ == '__main__':
main()
......@@ -26,7 +26,7 @@ from learning_rate import cosine_decay
import numpy as np
import argparse
from model import NetworkCIFAR as Network
import reader
import reader_cifar as reader
import sys
import os
import time
......@@ -35,73 +35,40 @@ import genotypes
import paddle.fluid as fluid
import shutil
import utils
import cPickle as cp
import math
parser = argparse.ArgumentParser("cifar")
parser.add_argument(
'--data',
type=str,
default='./dataset/cifar/cifar-10-batches-py/',
help='location of the data corpus')
# yapf: disable
parser.add_argument('--data', type=str, default='./dataset/cifar/cifar-10-batches-py/', help='location of the data corpus')
parser.add_argument('--batch_size', type=int, default=96, help='batch size')
parser.add_argument(
'--learning_rate', type=float, default=0.025, help='init learning rate')
parser.add_argument('--pretrained_model', type=str, default=None, help='pretrained model to load')
parser.add_argument('--model_id', type=int, help='model id')
parser.add_argument('--learning_rate', type=float, default=0.025, help='init learning rate')
parser.add_argument('--momentum', type=float, default=0.9, help='momentum')
parser.add_argument(
'--weight_decay', type=float, default=3e-4, help='weight decay')
parser.add_argument(
'--report_freq', type=float, default=50, help='report frequency')
parser.add_argument(
'--epochs', type=int, default=600, help='num of training epochs')
parser.add_argument(
'--init_channels', type=int, default=36, help='num of init channels')
parser.add_argument(
'--layers', type=int, default=20, help='total number of layers')
parser.add_argument(
'--model_path',
type=str,
default='saved_models',
help='path to save the model')
parser.add_argument(
'--auxiliary',
action='store_true',
default=False,
help='use auxiliary tower')
parser.add_argument(
'--auxiliary_weight',
type=float,
default=0.4,
help='weight for auxiliary loss')
parser.add_argument(
'--cutout', action='store_true', default=False, help='use cutout')
parser.add_argument(
'--cutout_length', type=int, default=16, help='cutout length')
parser.add_argument(
'--drop_path_prob', type=float, default=0.2, help='drop path probability')
parser.add_argument('--save', type=str, default='EXP', help='experiment name')
parser.add_argument(
'--arch', type=str, default='DARTS', help='which architecture to use')
parser.add_argument(
'--grad_clip', type=float, default=5, help='gradient clipping')
parser.add_argument(
'--lr_exp_decay',
action='store_true',
default=False,
help='use exponential_decay learning_rate')
parser.add_argument('--weight_decay', type=float, default=3e-4, help='weight decay')
parser.add_argument('--report_freq', type=float, default=50, help='report frequency')
parser.add_argument('--epochs', type=int, default=600, help='num of training epochs')
parser.add_argument('--init_channels', type=int, default=36, help='num of init channels')
parser.add_argument('--layers', type=int, default=20, help='total number of layers')
parser.add_argument('--save_model_path', type=str, default='saved_models', help='path to save the model')
parser.add_argument('--auxiliary', action='store_true', default=False, help='use auxiliary tower')
parser.add_argument('--auxiliary_weight', type=float, default=0.4, help='weight for auxiliary loss')
parser.add_argument('--cutout', action='store_true', default=False, help='use cutout')
parser.add_argument('--cutout_length', type=int, default=16, help='cutout length')
parser.add_argument('--drop_path_prob', type=float, default=0.2, help='drop path probability')
parser.add_argument('--arch', type=str, default='DARTS', help='which architecture to use')
parser.add_argument('--grad_clip', type=float, default=5, help='gradient clipping')
parser.add_argument('--lr_exp_decay', action='store_true', default=False, help='use exponential_decay learning_rate')
parser.add_argument('--mix_alpha', type=float, default=0.5, help='mixup alpha')
parser.add_argument(
'--lrc_loss_lambda', default=0, type=float, help='lrc_loss_lambda')
parser.add_argument(
'--loss_type',
default=1,
type=float,
help='loss_type 0: cross entropy 1: multi margin loss 2: max margin loss')
parser.add_argument('--lrc_loss_lambda', default=0, type=float, help='lrc_loss_lambda')
# yapf: enable
args = parser.parse_args()
CIFAR_CLASSES = 10
dataset_train_size = 50000
dataset_train_size = 50000.
image_size = 32
genotypes.DARTS = genotypes.MY_DARTS_list[args.model_id]
def main():
......@@ -112,7 +79,9 @@ def main():
genotype = eval("genotypes.%s" % args.arch)
model = Network(args.init_channels, CIFAR_CLASSES, args.layers,
args.auxiliary, genotype)
steps_one_epoch = dataset_train_size / (devices_num * args.batch_size)
steps_one_epoch = math.ceil(dataset_train_size /
(devices_num * args.batch_size))
train(model, args, image_shape, steps_one_epoch)
......@@ -120,73 +89,84 @@ def build_program(main_prog, startup_prog, args, is_train, model, im_shape,
steps_one_epoch):
out = []
with fluid.program_guard(main_prog, startup_prog):
py_reader = model.build_input(im_shape, args.batch_size, is_train)
py_reader = model.build_input(im_shape, is_train)
if is_train:
with fluid.unique_name.guard():
loss = model.train_model(py_reader, args.init_channels,
args.auxiliary, args.auxiliary_weight,
args.batch_size, args.lrc_loss_lambda)
args.lrc_loss_lambda)
optimizer = fluid.optimizer.Momentum(
learning_rate=cosine_decay(args.learning_rate, \
args.epochs, steps_one_epoch),
regularization=fluid.regularizer.L2Decay(\
args.weight_decay),
momentum=args.momentum)
learning_rate=cosine_decay(args.learning_rate, args.epochs,
steps_one_epoch),
regularization=fluid.regularizer.L2Decay(args.weight_decay),
momentum=args.momentum)
optimizer.minimize(loss)
out = [py_reader, loss]
else:
with fluid.unique_name.guard():
loss, acc_1, acc_5 = model.test_model(py_reader,
prob, acc_1, acc_5 = model.test_model(py_reader,
args.init_channels)
out = [py_reader, loss, acc_1, acc_5]
out = [py_reader, prob, acc_1, acc_5]
return out
def train(model, args, im_shape, steps_one_epoch):
train_startup_prog = fluid.Program()
test_startup_prog = fluid.Program()
startup_prog = fluid.Program()
train_prog = fluid.Program()
test_prog = fluid.Program()
train_py_reader, loss_train = build_program(train_prog, train_startup_prog,
args, True, model, im_shape,
steps_one_epoch)
train_py_reader, loss_train = build_program(
train_prog, startup_prog, args, True, model, im_shape, steps_one_epoch)
test_py_reader, loss_test, acc_1, acc_5 = build_program(
test_prog, test_startup_prog, args, False, model, im_shape,
steps_one_epoch)
test_py_reader, prob, acc_1, acc_5 = build_program(
test_prog, startup_prog, args, False, model, im_shape, steps_one_epoch)
test_prog = test_prog.clone(for_test=True)
place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(train_startup_prog)
exe.run(test_startup_prog)
exe.run(startup_prog)
if args.pretrained_model:
def if_exist(var):
return os.path.exists(os.path.join(args.pretrained_model, var.name))
fluid.io.load_vars(
exe,
args.pretrained_model,
main_program=train_prog,
predicate=if_exist)
exec_strategy = fluid.ExecutionStrategy()
exec_strategy.num_threads = 1
train_exe = fluid.ParallelExecutor(
main_program=train_prog,
use_cuda=True,
loss_name=loss_train.name,
exec_strategy=exec_strategy)
build_strategy = fluid.BuildStrategy()
build_strategy.memory_optimize = False
build_strategy.enable_inplace = True
compile_program = fluid.compiler.CompiledProgram(
train_prog).with_data_parallel(
loss_name=loss_train.name,
build_strategy=build_strategy,
exec_strategy=exec_strategy)
train_reader = reader.train10(args)
test_reader = reader.test10(args)
train_py_reader.decorate_paddle_reader(train_reader)
test_py_reader.decorate_paddle_reader(test_reader)
fluid.clip.set_gradient_clip(fluid.clip.GradientClipByNorm(args.grad_clip))
fluid.memory_optimize(fluid.default_main_program())
fluid.clip.set_gradient_clip(
fluid.clip.GradientClipByGlobalNorm(args.grad_clip), program=train_prog)
train_fetch_list = [loss_train]
def save_model(postfix, main_prog):
model_path = os.path.join(args.model_path, postfix)
model_path = os.path.join(args.save_model_path, postfix)
if os.path.isdir(model_path):
shutil.rmtree(model_path)
fluid.io.save_persistables(exe, model_path, main_program=main_prog)
def test(epoch_id):
test_fetch_list = [loss_test, acc_1, acc_5]
objs = utils.AvgrageMeter()
test_fetch_list = [prob, acc_1, acc_5]
top1 = utils.AvgrageMeter()
top5 = utils.AvgrageMeter()
test_py_reader.start()
......@@ -196,11 +176,10 @@ def train(model, args, im_shape, steps_one_epoch):
while True:
prev_test_start_time = test_start_time
test_start_time = time.time()
loss_test_v, acc_1_v, acc_5_v = exe.run(
test_prog, fetch_list=test_fetch_list)
objs.update(np.array(loss_test_v), args.batch_size)
top1.update(np.array(acc_1_v), args.batch_size)
top5.update(np.array(acc_5_v), args.batch_size)
prob_v, acc_1_v, acc_5_v = exe.run(test_prog,
fetch_list=test_fetch_list)
top1.update(np.array(acc_1_v), np.array(prob_v).shape[0])
top5.update(np.array(acc_5_v), np.array(prob_v).shape[0])
if step_id % args.report_freq == 0:
print("Epoch {}, Step {}, acc_1 {}, acc_5 {}, time {}".
format(epoch_id, step_id,
......@@ -213,7 +192,6 @@ def train(model, args, im_shape, steps_one_epoch):
print("Epoch {0}, top1 {1}, top5 {2}".format(epoch_id, top1.avg,
top5.avg))
train_fetch_list = [loss_train]
epoch_start_time = time.time()
for epoch_id in range(args.epochs):
model.drop_path_prob = args.drop_path_prob * epoch_id / args.epochs
......@@ -230,7 +208,8 @@ def train(model, args, im_shape, steps_one_epoch):
while True:
prev_start_time = start_time
start_time = time.time()
loss_v, = train_exe.run(
loss_v, = exe.run(
compile_program,
fetch_list=[v.name for v in train_fetch_list])
print("Epoch {}, Step {}, loss {}, time {}".format(epoch_id, step_id, \
np.array(loss_v).mean(), start_time-prev_start_time))
......@@ -238,8 +217,10 @@ def train(model, args, im_shape, steps_one_epoch):
sys.stdout.flush()
except fluid.core.EOFException:
train_py_reader.reset()
if epoch_id % 50 == 0 or epoch_id == args.epochs - 1:
if epoch_id % 50 == 0:
save_model(str(epoch_id), train_prog)
if epoch_id == args.epochs - 1:
save_model('final', train_prog)
test(epoch_id)
import numpy as np
try:
import cPickle as pickle
except ImportError:
import pickle
import sys, os
model_path = 'paddle_predict'
fl = os.listdir(model_path)
labels = np.load('labels.npz')['arr_0']
pred = np.zeros((10000, 10))
fl.sort()
i = 0
for f in fl:
print(f)
pred += pickle.load(open(os.path.join(model_path, f), 'rb'))  # binary mode so pickle works on Python 3
print(np.mean(np.argmax(pred, axis=1) == labels))
i += 1
# AutoDL