diff --git a/README.md b/README.md
index b9793c3eab5d40c28f01cc67ad607b97261b3235..db0fbd88b250cdc2a3cc77521cc1c2cea77c6e87 100644
--- a/README.md
+++ b/README.md
@@ -51,19 +51,19 @@ Please refer to our [release announcement](https://github.com/PaddlePaddle/Paddl
 - **Connected to Products**
 
     In addition, PaddlePaddle is also designed to be easily deployable. At Baidu,
-    PaddlePaddle has been deployed into products or service with a vast number
+    PaddlePaddle has been deployed into products and services with a vast number
     of users, including ad click-through rate (CTR) prediction, large-scale image
     classification, optical character recognition(OCR), search ranking, computer
     virus detection, recommendation, etc. It is widely utilized in products at
-    Baidu and it has achieved a significant impact. We hope you can also exploit
-    the capability of PaddlePaddle to make a huge impact for your product.
+    Baidu and it has achieved a significant impact. We hope you can also explore
+    the capability of PaddlePaddle to make an impact on your product.
 
 ## Installation
 
 It is recommended to check out the
 [Docker installation guide](http://doc.paddlepaddle.org/develop/doc/getstarted/build_and_install/docker_install_en.html)
 before looking into the
-[build from source guide](http://doc.paddlepaddle.org/develop/doc/getstarted/build_and_install/build_from_source_en.html)
+[build from source guide](http://doc.paddlepaddle.org/develop/doc/getstarted/build_and_install/build_from_source_en.html).
 
 ## Documentation
 
@@ -72,7 +72,7 @@ We provide [English](http://doc.paddlepaddle.org/develop/doc/) and
 
 - [Deep Learning 101](http://book.paddlepaddle.org/index.html)
 
-  You might want to start from this online interactive book that can run in Jupyter Notebook.
+  You might want to start from this online interactive book that can run in a Jupyter Notebook.
 
 - [Distributed Training](http://doc.paddlepaddle.org/develop/doc/howto/usage/cluster/cluster_train_en.html)
 
diff --git a/doc/design/tensor_array.md b/doc/design/tensor_array.md
new file mode 100644
index 0000000000000000000000000000000000000000..a0419ec002159893b035fae1300fce489e68936a
--- /dev/null
+++ b/doc/design/tensor_array.md
@@ -0,0 +1,73 @@
+# Design for TensorArray
+TensorArray as a new concept is borrowed from TensorFlow, 
+it is meant to be used with dynamic iteration primitives such as `while_loop` and `map_fn`.
+
+This concept can be used to support our new design of dynamic operations, and help to refactor some existing variant-sentence-related layers, 
+such as `RecurrentGradientMachine`.
+
+In [our design for dynamic RNN](https://github.com/PaddlePaddle/Paddle/pull/4401), 
+`TensorArray` is used to segment inputs and store states in all time steps.
+By providing some methods similar to a C++ array,
+the definition of some state-based dynamic models such as RNN could be more natural and highly flexible.
+
+## Dynamic-Related Methods
+Some basic methods should be proposed as follows:
+
+### stack()
+Pack the values in a `TensorArray` into a tensor with rank one higher than each tensor in `values`.
+### unstack(axis=0)
+Unpacks the given dimension of a rank-`R` tensor into rank-`(R-1)` tensors.
+### concat()
+Return the values in the `TensorArray` as a concatenated Tensor.
+### write(index, value, data_shared=true)
+Write value into index of the TensorArray.
+### read(index)
+Read the value at location `index` in the `TensorArray`.
+### size()
+Return the number of values.
+
+## LoDTensor-related Supports
+The `RecurrentGradientMachine` in Paddle serves as a flexible RNN layer; it takes variant length sequences as input, 
+because each step of RNN could only take a tensor-represented batch of data as input, 
+some preprocess should be taken on the inputs such as sorting the sentences by their length in descending order and cut each word and pack to new batches.
+
+Such cut-like operations can be embedded into `TensorArray` as general methods called `unpack` and `pack`.
+
+With these two methods, a variant-sentence-RNN can be implemented like
+
+```c++
+// input is the varient-length data
+LodTensor sentence_input(xxx);
+TensorArray ta;
+Tensor indice_map;
+Tensor boot_state = xxx; // to initialize rnn's first state
+TensorArray::unpack(input, 1/*level*/, true/*sort_by_length*/, &ta, &indice_map);
+TessorArray step_outputs;
+TensorArray states;
+
+for (int step = 0; step = ta.size(); step++) {
+  auto state = states.read(step);
+  // rnnstep is a function which acts like a step of RNN
+  auto step_input = ta.read(step);
+  auto step_output = rnnstep(step_input, state);
+  step_outputs.write(step_output, true/*data_shared*/);
+}
+
+// rnn_output is the final output of an rnn
+LoDTensor rnn_output = ta.pack(ta, indice_map);
+```
+the code above shows that by embedding the LoDTensor-related preprocess operations into `TensorArray`,
+the implementation of a RNN that supports varient-length sentences is far more concise than `RecurrentGradientMachine` because the latter mixes all the codes together, hard to read and extend.
+
+
+some details are as follows.
+
+### unpack(level, sort_by_length)
+Split LodTensor in some `level` and generate batches, if set `sort_by_length`, will sort by length.
+
+Returns:
+
+- a new `TensorArray`, whose values are LodTensors and represents batches of data.
+- an int32 Tensor, which stores the map from the new batch's indices to original LoDTensor
+### pack(level, indices_map)
+Recover the original LoD-arranged LoDTensor with the values in a `TensorArray` and `level` and `indices_map`.
diff --git a/doc/howto/dev/new_op_en.md b/doc/howto/dev/new_op_en.md
index 60681cdd718547e1abc730ea720f05bbd39561f1..bad1dbc1de9cc5bd11914fddf397857f0bda7976 100644
--- a/doc/howto/dev/new_op_en.md
+++ b/doc/howto/dev/new_op_en.md
@@ -182,7 +182,7 @@ Note that **different devices (CPU, GPU)share an Op definition; whether or not t
 
 `MulOp`'s CPU and GPU share the same `Kernel`. A non-sharing  `OpKernel` example can be seen in [`OnehotCrossEntropyOpKernel`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/cross_entropy_op.h#L43).
 
-To ease the writing of `OpKernel` compute, and for reusing code cross-device, `Eigen unsupported Tensor` module is used to implement `Compute` interface. To learn about how the Eigen library is used in PaddlePaddle, please see [usage document](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/dev/use_eigen_cn.md).
+To ease the writing of `OpKernel` compute, and for reusing code cross-device, [`Eigen-unsupported Tensor`](https://bitbucket.org/eigen/eigen/src/default/unsupported/Eigen/CXX11/src/Tensor/README.md?fileviewer=file-view-default) module is used to implement `Compute` interface. To learn about how the Eigen library is used in PaddlePaddle, please see [usage document](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/dev/use_eigen_cn.md).
 
 
 This concludes the forward implementation of an operator. Next its operation and kernel need to be registered in a `.cc` file.
diff --git a/doc/howto/dev/use_eigen_en.md b/doc/howto/dev/use_eigen_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..e169106e12f5d62696f1f0e7163562793b32c18c
--- /dev/null
+++ b/doc/howto/dev/use_eigen_en.md
@@ -0,0 +1,146 @@
+## How to use Eigen in Paddle
+
+Essentially, a neural network is a compute graph. T data needed for the computation is stored in `Tensor`s and its computation procedure is described by `Operator`s. An `Operator` calls the `Compute` interface in its corresponding `OpKernel` and operates on the `Tensor`.
+
+
+### Eigen Tensor Module
+
+The Eigen Tensor module supports powerful element-wise computation. In addition, a piece of code written using it can be run on both the CPU and the GPU.
+
+Note that Eigen Tensor is still being actively developed, so its tests are not completely covered and its documentation may be sparse.
+
+For details on Eigen Tensor module, please see [doc 1](https://github.com/RLovelett/eigen/blob/master/unsupported/Eigen/CXX11/src/Tensor/README.md) and [doc 2](https://bitbucket.org/eigen/eigen/src/default/unsupported/Eigen/CXX11/src/Tensor/README.md).
+
+
+### paddle::framework::Tensor
+
+Paddle Tensor's is defined in the framework directory with the following interface:
+
+```cpp
+class Tensor {
+ public:
+  /*! Return a pointer to mutable memory block. */
+  template <typename T>
+  inline T* data();
+
+  /**
+   * @brief   Return a pointer to mutable memory block.
+   * @note    If not exist, then allocation.
+   */
+  template <typename T>
+  inline T* mutable_data(platform::Place place);
+
+  /**
+   * @brief     Return a pointer to mutable memory block.
+   *
+   * @param[in] dims    The dimensions of the memory block.
+   * @param[in] place   The place of the memory block.
+   *
+   * @note      If not exist, then allocation.
+   */
+  template <typename T>
+  inline T* mutable_data(DDim dims, platform::Place place);
+
+  /*! Resize the dimensions of the memory block. */
+  inline Tensor& Resize(const DDim& dims);
+
+  /*! Return the dimensions of the memory block. */
+  inline const DDim& dims() const;
+
+ private:
+  /*! holds the memory block if allocated. */
+  std::shared_ptr<Placeholder> holder_;
+
+  /*! points to dimensions of memory block. */
+  DDim dim_;
+};
+```
+
+`Placeholder` is used to delay memory allocation; that is, we can first define a tensor, using `Resize` to configure its shape, and then call `mutuable_data` to allocate the actual memory.
+
+```cpp
+paddle::framework::Tensor t;
+paddle::platform::CPUPlace place;
+// set size first
+t.Resize({2, 3});
+// allocate memory on CPU later
+t.mutable_data(place);
+```
+
+### paddle::framework::Tensor Usage
+`AddOp` demonstrates Tensor's usage.
+
+- InferShape
+
+When computing a neural network's compute graph, first call every `Operator`'s `InferShape` method, and use `Resize` to configure the size of the output tensor.
+
+```cpp
+void InferShape(const framework::InferShapeContext &ctx) const override {
+  PADDLE_ENFORCE_EQ(ctx.Input<Tensor>("X")->dims(),
+                    ctx.Input<Tensor>("Y")->dims(),
+                    "Two input of Add Op's dimension must be same.");
+  ctx.Output<Tensor>("Out")->Resize(ctx.Input<Tensor>("X")->dims());
+}
+```
+
+
+- Run
+
+```cpp
+void Compute(const framework::ExecutionContext& context) const override {
+  auto* input0 = context.Input<Tensor>("X");
+  auto* input1 = context.Input<Tensor>("Y");
+  auto* output = context.Output<Tensor>("Out");
+
+  output->mutable_data<T>(context.GetPlace());
+
+  auto x = EigenVector<T>::Flatten(*input0);
+  auto y = EigenVector<T>::Flatten(*input1);
+  auto z = EigenVector<T>::Flatten(*output);
+
+  auto place = context.GetEigenDevice<Place>();
+
+  z.device(place) = x + y;
+}
+```
+
+
+### paddle::framework::Tensor到EigenTensor的转换
+
+As shown above, in actual computation, we need to transform the input and output `Tensor`s into formats Eigen supports. We show some functions in [eigen.h](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/eigen.h) to implement the transformation from `paddle::framework::Tensor`to `EigenTensor/EigenMatrix/EigenVector/EigenScalar`.
+
+Using EigenTensor as an example:
+
+```cpp
+Tensor t;
+float* p = t.mutable_data<float>(make_ddim({1, 2, 3}), platform::CPUPlace());
+for (int i = 0; i < 1 * 2 * 3; i++) {
+  p[i] = static_cast<float>(i);
+}
+
+EigenTensor<float, 3>::Type et = EigenTensor<float, 3>::From(t);
+```
+
+`From` is an interfacing method provided by the EigenTensor template, which implements the transformation from a `paddle::framework::Tensor` object to an EigenTensor. Since `rank` is a template parameter, it needs to be explicitly specified at the time of the transformation.
+
+In Eigen, tensors with different ranks are different types, with `Vector` bring a rank-1 instance. Note that `EigenVector<T>::From` uses a transformation from an 1-dimensional Paddle tensor to a 1-dimensional Eigen tensor while `EigenVector<T>::Flatten` reshapes a paddle tensor and flattens it into a 1-dimensional Eigen tensor. Both resulting tensors are still typed EigenVector.
+
+For more transformations, see the [unit tests](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/eigen_test.cc) in the `eigen_test.cc` file.
+
+
+
+### Implementing Computation
+
+While computing, the device interface is needed from the EigenTensors on the left hand side of the assignments. Note that the computation between EigenTensors only changes the data originally inthe Tensor and does not change all the shape information associated with the Tensor.
+
+```cpp
+auto x = EigenVector<T>::Flatten(*input0);
+auto y = EigenVector<T>::Flatten(*input1);
+auto z = EigenVector<T>::Flatten(*output);
+auto place = context.GetEigenDevice<Place>();
+z.device(place) = x + y;
+```
+
+In this code segment, input0/input1/output can be Tensors of arbitrary dimension. We are calling Flatten from EigenVector, transforming a tensor of any dimension into a 1-dimensional EigenVector. After completing computation, input0/input1/output will retain the same shape information, and they can be resized using the `Resize` interface.
+
+Because the Eigen Tensor module is under-documented, please refer to `OpKernel`'s computation code in TensorFlow's [kernel module documentation](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/core/kernels).
diff --git a/paddle/framework/lod_tensor.cc b/paddle/framework/lod_tensor.cc
index 3c349637cdbe59b2cf9a1ea28e7715f4181f9293..5b7badf89c1714331bae9fc8cf94c8da2c66dbad 100644
--- a/paddle/framework/lod_tensor.cc
+++ b/paddle/framework/lod_tensor.cc
@@ -72,6 +72,22 @@ bool operator==(const LoD& a, const LoD& b) {
   return true;
 }
 
+size_t LoDTensor::NumElements(size_t level, size_t idx) const {
+  PADDLE_ENFORCE_LT(level, NumLevels());
+  PADDLE_ENFORCE_LT(idx, NumElements(level));
+  // the last level of LoD, just return number of records in Tensor
+  if (level == NumLevels() - 1) {
+    return lod_[level][idx + 1] - lod_[level][idx];
+  }
+  // high level of LoD, and there is another lower level, return number of
+  // lower-level elements
+  auto tmp = SliceInLevel(lod_, level, idx, idx + 1);
+  PADDLE_ENFORCE_GE(tmp.size(), 2);
+  // there is a 0 as a placeholder stored in LoD, so the number of elements
+  // equals lod.size() - 1
+  return tmp[1].size() - 1;
+}
+
 void LoDTensor::ShrinkLevels(size_t level_begin, size_t level_end) {
   auto new_lod = framework::SliceLevels(lod_, level_begin, level_end);
   lod_ = new_lod;
diff --git a/paddle/framework/lod_tensor.h b/paddle/framework/lod_tensor.h
index 82f58464264c6871b51251e0feae3d5ca076cd2b..49786a4a6635f1b39356dbf9633c4e7da443f04e 100644
--- a/paddle/framework/lod_tensor.h
+++ b/paddle/framework/lod_tensor.h
@@ -38,6 +38,18 @@ using Vector = thrust::host_vector<
     T, thrust::system::cuda::experimental::pinned_allocator<T>>;
 #endif
 
+/*
+ * 3-level LoD stores
+ *
+ * 0 10 20
+ * 0 5 10 15 20
+ * 0 2 5 7 10 12 15 20
+ *
+ * - in a level, each element indicates offset in the underlying Tensor
+ * - the first element should be 0 and that indicates that this sequence start
+ * from 0
+ * - each sequence's begin and end(no-inclusive) is level[id, id+1]
+ */
 using LoD = std::vector<Vector<size_t>>;
 
 LoD SliceLevels(const LoD& in, size_t level_begin, size_t level_end);
@@ -65,11 +77,8 @@ class LoDTensor : public Tensor {
    * Get a element from LoD.
    */
   size_t lod_element(size_t level, size_t elem) const {
-    PADDLE_ENFORCE(level < NumLevels(), "level [%d] out of range [%d]", level,
-                   NumLevels());
-    PADDLE_ENFORCE(elem < NumElements(level),
-                   "element begin [%d] out of range [%d]", elem,
-                   NumElements(level));
+    PADDLE_ENFORCE_LT(level, NumLevels());
+    PADDLE_ENFORCE_LT(elem, NumElements(level));
     return (lod_)[level][elem];
   }
 
@@ -82,12 +91,23 @@ class LoDTensor : public Tensor {
    * Number of elements in a level.
    */
   size_t NumElements(size_t level = 0) const {
-    PADDLE_ENFORCE(level < NumLevels(), "level [%d] out of range [%d]", level,
-                   NumLevels());
+    PADDLE_ENFORCE_LT(level, NumLevels());
     // the last offset is the end of last element
     return (lod_)[level].size() - 1;
   }
 
+  /*
+   * Number of lower-level elements.
+   * For example, a 2-level lod-tensor
+   *
+   * 0-th level   |   |
+   * 1-th level   ||  |||
+   *
+   * NumElements(0, 0) get 2
+   * NumElements(0, 1) get 3
+   */
+  size_t NumElements(size_t level, size_t idx) const;
+
   /*
    * Shrink levels[level_begin:level_end]
    */
diff --git a/paddle/framework/lod_tensor_test.cc b/paddle/framework/lod_tensor_test.cc
index 486b839738ec077545163bc47e6a97ef188c3c2f..44f09f584fb752d7003baa804979f3bb5cd9d651 100644
--- a/paddle/framework/lod_tensor_test.cc
+++ b/paddle/framework/lod_tensor_test.cc
@@ -56,6 +56,12 @@ TEST_F(LoDTensorTester, NumElements) {
   ASSERT_EQ(lod_tensor_.NumElements(2), 8UL);
 }
 
+TEST_F(LoDTensorTester, NumElements2) {
+  ASSERT_EQ(lod_tensor_.NumElements(0, 0), 2UL);
+  ASSERT_EQ(lod_tensor_.NumElements(0, 1), 2UL);
+  ASSERT_EQ(lod_tensor_.NumElements(1, 1), 2UL);
+}
+
 TEST_F(LoDTensorTester, ShrinkLevels) {
   // slice 1 level
   for (size_t level = 0; level < 3UL; ++level) {
@@ -65,7 +71,7 @@ TEST_F(LoDTensorTester, ShrinkLevels) {
     ASSERT_EQ(new_lod_tensor.NumElements(0), lod_tensor_.NumElements(level));
     ASSERT_EQ(new_lod_tensor.data<float>(), lod_tensor_.data<float>());
   }
-  // slice 2 level
+  // shrink 2 level
   for (size_t level = 0; level < 2UL; ++level) {
     LoDTensor new_lod_tensor = lod_tensor_;
     new_lod_tensor.ShrinkLevels(level, level + 2);
diff --git a/paddle/framework/lod_tensor_test.cu b/paddle/framework/lod_tensor_test.cu
index 97e69cdb2e5e1e64031c899f5e04020665485ba8..647d07536dd070bc37137fc01f683ec07ba7d6f4 100644
--- a/paddle/framework/lod_tensor_test.cu
+++ b/paddle/framework/lod_tensor_test.cu
@@ -36,8 +36,8 @@ TEST(LoDTensor, LoDInGPU) {
   lod_tensor.mutable_data<float>(place);
 
   lod_tensor.set_lod(src_lod);
-  CHECK_EQ(lod_tensor.lod_element(0, 2), 4);
-  CHECK_EQ(lod_tensor.lod_element(0, 4), 8);
+  CHECK_EQ(lod_tensor.lod_element(0, 2), 4UL);
+  CHECK_EQ(lod_tensor.lod_element(0, 4), 8UL);
 
   auto lod = lod_tensor.lod();
 
diff --git a/paddle/gserver/layers/MKLDNNConvLayer.cpp b/paddle/gserver/layers/MKLDNNConvLayer.cpp
index 9a0abd291ae8fae43b0e95c7371f3ce35d1261ec..0d6742e909635c1097b4fe21bbb304f8a71af5cb 100644
--- a/paddle/gserver/layers/MKLDNNConvLayer.cpp
+++ b/paddle/gserver/layers/MKLDNNConvLayer.cpp
@@ -28,7 +28,7 @@ bool MKLDNNConvLayer::init(const LayerMap& layerMap,
   if (!MKLDNNLayer::init(layerMap, parameterMap)) {
     return false;
   }
-  CHECK_EQ(inputLayers_.size(), 1) << "Only support one input layer yet";
+  CHECK_EQ(inputLayers_.size(), 1UL) << "Only support one input layer yet";
   CHECK_EQ(inputLayers_.size(), parameters_.size());
   CHECK(config_.shared_biases()) << "Only support shared biases yet";
 
diff --git a/paddle/gserver/layers/MKLDNNFcLayer.cpp b/paddle/gserver/layers/MKLDNNFcLayer.cpp
index 8cbfbd0d2b9f2149f7c959aec5c4ae1de952f903..e829456d6afd7cc844f752d4571cd9f90c73997f 100644
--- a/paddle/gserver/layers/MKLDNNFcLayer.cpp
+++ b/paddle/gserver/layers/MKLDNNFcLayer.cpp
@@ -28,7 +28,7 @@ bool MKLDNNFcLayer::init(const LayerMap& layerMap,
     return false;
   }
 
-  CHECK_EQ(inputLayers_.size(), 1) << "Only support one input layer yet";
+  CHECK_EQ(inputLayers_.size(), 1UL) << "Only support one input layer yet";
   CHECK_EQ(inputLayers_.size(), parameters_.size());
   CHECK(!parameters_[0]->isSparse()) << "Do not support sparse yet";
 
diff --git a/paddle/gserver/tests/test_CrossEntropyOverBeamGrad.cpp b/paddle/gserver/tests/test_CrossEntropyOverBeamGrad.cpp
index 538d18cdc3d262df0ddb031d9e6b38a3fea57606..c922237d33da5de0ece61df732334bee5592249d 100644
--- a/paddle/gserver/tests/test_CrossEntropyOverBeamGrad.cpp
+++ b/paddle/gserver/tests/test_CrossEntropyOverBeamGrad.cpp
@@ -228,7 +228,7 @@ void genGroundTruth(vector<SingleBeamExpansion>& beamExpansions,
         curBeam.groundTruth[j] = *(start + n);
         curBeam.inBeam[j] = 1;
       } else {
-        CHECK_LE(curBeam.rowIdxInBeam[j] + 1,
+        CHECK_LE((size_t)curBeam.rowIdxInBeam[j] + 1,
                  curBeam.subSeqStartPos.size() - 1);
         int start = curBeam.subSeqStartPos[curBeam.rowIdxInBeam[j]];
         int end = curBeam.subSeqStartPos[curBeam.rowIdxInBeam[j] + 1];
diff --git a/paddle/memory/.clang-format b/paddle/memory/.clang-format
deleted file mode 100644
index 29282dc87e2c499988c17d90d47d44cd5cf7f115..0000000000000000000000000000000000000000
--- a/paddle/memory/.clang-format
+++ /dev/null
@@ -1,5 +0,0 @@
----
-Language:        Cpp
-BasedOnStyle:  Google
-Standard:  Cpp11 
-...
diff --git a/paddle/memory/.clang-format b/paddle/memory/.clang-format
new file mode 120000
index 0000000000000000000000000000000000000000..7d28cb3924707d39dafe20f4664fb17b5538996c
--- /dev/null
+++ b/paddle/memory/.clang-format
@@ -0,0 +1 @@
+../framework/.clang-format
\ No newline at end of file
diff --git a/paddle/operators/.clang-format b/paddle/operators/.clang-format
deleted file mode 100644
index 47b8a85206ab457e2b3cb90a68b7a82a0753d327..0000000000000000000000000000000000000000
--- a/paddle/operators/.clang-format
+++ /dev/null
@@ -1,5 +0,0 @@
----
-Language:        Cpp
-BasedOnStyle:  Google
-Standard:  Cpp11
-...
diff --git a/paddle/operators/.clang-format b/paddle/operators/.clang-format
new file mode 120000
index 0000000000000000000000000000000000000000..7d28cb3924707d39dafe20f4664fb17b5538996c
--- /dev/null
+++ b/paddle/operators/.clang-format
@@ -0,0 +1 @@
+../framework/.clang-format
\ No newline at end of file
diff --git a/paddle/operators/multiplex_op.cu b/paddle/operators/multiplex_op.cu
index 70e46815fc9148a2530d437d20c14f5d40baa1a4..505776612e7119e568493506b113661a839e5bd1 100644
--- a/paddle/operators/multiplex_op.cu
+++ b/paddle/operators/multiplex_op.cu
@@ -42,7 +42,7 @@ class MultiplexGPUKernel : public framework::OpKernel {
     for (auto i = 0; i < rows; i++) {
       int32_t k = index[i];
       PADDLE_ENFORCE_GE(k, 0, "index must be nonnegative.");
-      PADDLE_ENFORCE_LT(k, ins.size(),
+      PADDLE_ENFORCE_LT((size_t)k, ins.size(),
                         "index exceeds the number of candidate tensors.");
       memory::Copy(place, out->data<T>() + i * cols, place,
                    ins[k]->data<T>() + i * cols, cols * sizeof(T), stream);
diff --git a/paddle/pybind/.clang-format b/paddle/pybind/.clang-format
new file mode 120000
index 0000000000000000000000000000000000000000..7d28cb3924707d39dafe20f4664fb17b5538996c
--- /dev/null
+++ b/paddle/pybind/.clang-format
@@ -0,0 +1 @@
+../framework/.clang-format
\ No newline at end of file
diff --git a/paddle/pybind/CMakeLists.txt b/paddle/pybind/CMakeLists.txt
index aa9ca4e31aa8bdae159ce2d8db8eadd2ab49dffc..326cc4a75bd5cc29f79de88a3e0802d17c812ecd 100644
--- a/paddle/pybind/CMakeLists.txt
+++ b/paddle/pybind/CMakeLists.txt
@@ -1,6 +1,6 @@
 if(WITH_PYTHON)
   cc_library(paddle_pybind SHARED
-    SRCS pybind.cc protobuf.cc
+    SRCS pybind.cc exception.cc protobuf.cc
     DEPS pybind python backward
     ${GLOB_OP_LIB})
 endif(WITH_PYTHON)
diff --git a/paddle/pybind/exception.cc b/paddle/pybind/exception.cc
new file mode 100644
index 0000000000000000000000000000000000000000..ff79b12ee4b28c53ee04f4c170b5bca9ca28d14a
--- /dev/null
+++ b/paddle/pybind/exception.cc
@@ -0,0 +1,34 @@
+/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License. */
+
+#include "paddle/pybind/exception.h"
+
+namespace paddle {
+namespace pybind {
+
+void BindException(pybind11::module& m) {
+  static pybind11::exception<platform::EnforceNotMet> exc(m, "EnforceNotMet");
+  pybind11::register_exception_translator([](std::exception_ptr p) {
+    try {
+      if (p) std::rethrow_exception(p);
+    } catch (const platform::EnforceNotMet& e) {
+      exc(e.what());
+    }
+  });
+
+  m.def("__unittest_throw_exception__", [] { PADDLE_THROW("test exception"); });
+}
+
+}  // namespace pybind
+}  // namespace paddle
diff --git a/paddle/pybind/exception.h b/paddle/pybind/exception.h
new file mode 100644
index 0000000000000000000000000000000000000000..12c7df93f617d40b5e028d1ae897ce47197c47c6
--- /dev/null
+++ b/paddle/pybind/exception.h
@@ -0,0 +1,23 @@
+/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License. */
+
+#pragma once
+#include "paddle/platform/enforce.h"
+#include "pybind11/pybind11.h"
+namespace paddle {
+namespace pybind {
+
+extern void BindException(pybind11::module& m);
+}  // namespace pybind
+}  // namespace paddle
diff --git a/paddle/pybind/protobuf.cc b/paddle/pybind/protobuf.cc
index de3f7bb97be9787e0e9b0130a87d19529edf44f3..1a29621bdf13030c8781dab4acccca08d7250dbe 100644
--- a/paddle/pybind/protobuf.cc
+++ b/paddle/pybind/protobuf.cc
@@ -46,8 +46,7 @@ struct variant_caster<V<Ts...>> {
 
   template <typename T>
   typename std::enable_if<
-      !std::is_same<T, boost::detail::variant::void_>::value,
-      bool>::type
+      !std::is_same<T, boost::detail::variant::void_>::value, bool>::type
   try_load(handle src, bool convert) {
     auto caster = make_caster<T>();
     if (!load_success_ && caster.load(src, convert)) {
@@ -71,8 +70,7 @@ struct variant_caster<V<Ts...>> {
     return load_success_;
   }
 
-  static handle cast(Type const &src,
-                     return_value_policy policy,
+  static handle cast(Type const &src, return_value_policy policy,
                      handle parent) {
     variant_caster_visitor visitor(policy, parent);
     return boost::apply_visitor(visitor, src);
@@ -101,8 +99,8 @@ inline std::vector<T> RepeatedToVector(
     const google::protobuf::RepeatedField<T> &repeated_field) {
   std::vector<T> ret;
   ret.reserve(repeated_field.size());
-  std::copy(
-      repeated_field.begin(), repeated_field.end(), std::back_inserter(ret));
+  std::copy(repeated_field.begin(), repeated_field.end(),
+            std::back_inserter(ret));
   return ret;
 }
 
@@ -134,7 +132,7 @@ class VarDescBind;
 // read/write speed. Only when we want the protobuf message, the local changes
 // will be synchronized (by `Sync` method).
 class VarDescBind {
-public:
+ public:
   explicit VarDescBind(const std::string &name) { desc_.set_name(name); }
 
   VarDesc *Proto() { return &desc_; }
@@ -157,12 +155,12 @@ public:
     return desc_.lod_tensor().data_type();
   }
 
-private:
+ private:
   VarDesc desc_;
 };
 
 class OpDescBind {
-public:
+ public:
   OpDesc *Proto() {
     Sync();
     return &op_desc_;
@@ -174,8 +172,8 @@ public:
 
   const std::vector<std::string> &Input(const std::string &name) const {
     auto it = inputs_.find(name);
-    PADDLE_ENFORCE(
-        it != inputs_.end(), "Input %s cannot be found in Op %s", name, Type());
+    PADDLE_ENFORCE(it != inputs_.end(), "Input %s cannot be found in Op %s",
+                   name, Type());
     return it->second;
   }
 
@@ -196,10 +194,8 @@ public:
 
   const std::vector<std::string> &Output(const std::string &name) const {
     auto it = outputs_.find(name);
-    PADDLE_ENFORCE(it != outputs_.end(),
-                   "Output %s cannot be found in Op %s",
-                   name,
-                   Type());
+    PADDLE_ENFORCE(it != outputs_.end(), "Output %s cannot be found in Op %s",
+                   name, Type());
     return it->second;
   }
 
@@ -258,7 +254,7 @@ public:
     return boost::get<BlockDesc *>(it->second)->idx();
   }
 
-private:
+ private:
   struct SetAttrDescVisitor : public boost::static_visitor<void> {
     explicit SetAttrDescVisitor(OpDesc::Attr *attr) : attr_(attr) {}
     mutable OpDesc::Attr *attr_;
@@ -325,7 +321,7 @@ private:
 };
 
 class BlockDescBind {
-public:
+ public:
   BlockDescBind(ProgramDescBind *prog, BlockDesc *desc)
       : prog_(prog), desc_(desc), need_update_(false) {}
 
@@ -349,8 +345,8 @@ public:
   VarDescBind *Var(py::bytes name_bytes) const {
     std::string name = name_bytes;
     auto it = vars_.find(name);
-    PADDLE_ENFORCE(
-        it != vars_.end(), "Can not find variable %s in current block.", name);
+    PADDLE_ENFORCE(it != vars_.end(),
+                   "Can not find variable %s in current block.", name);
     return it->second.get();
   }
 
@@ -398,7 +394,7 @@ public:
 
   BlockDesc *RawPtr() { return desc_; }
 
-private:
+ private:
   ProgramDescBind *prog_;  // not_own
   BlockDesc *desc_;        // not_own
   bool need_update_;
@@ -412,7 +408,7 @@ using ProgDescMap =
 static ProgDescMap *g_bind_map = nullptr;
 
 class ProgramDescBind {
-public:
+ public:
   static ProgramDescBind &Instance(ProgramDesc *prog) {
     if (g_bind_map == nullptr) {
       g_bind_map = new ProgDescMap();
@@ -449,7 +445,7 @@ public:
     return prog_;
   }
 
-private:
+ private:
   explicit ProgramDescBind(ProgramDesc *prog) : prog_(prog) {
     for (auto &block : *prog->mutable_blocks()) {
       blocks_.emplace_back(new BlockDescBind(this, &block));
@@ -492,8 +488,7 @@ void BindProgramDesc(py::module &m) {
                     return &ProgramDescBind::Instance(prog_desc);
                   },
                   py::return_value_policy::reference)
-      .def("append_block",
-           &ProgramDescBind::AppendBlock,
+      .def("append_block", &ProgramDescBind::AppendBlock,
            py::return_value_policy::reference)
       .def("block", &ProgramDescBind::Block, py::return_value_policy::reference)
       .def("__str__", &ProgramDescBind::DebugString)
@@ -504,20 +499,16 @@ void BindBlockDesc(py::module &m) {
   py::class_<BlockDescBind>(m, "BlockDesc", "")
       .def_property_readonly("id", &BlockDescBind::ID)
       .def_property_readonly("parent", &BlockDescBind::Parent)
-      .def("append_op",
-           &BlockDescBind::AppendOp,
+      .def("append_op", &BlockDescBind::AppendOp,
            py::return_value_policy::reference)
-      .def("prepend_op",
-           &BlockDescBind::PrependOp,
+      .def("prepend_op", &BlockDescBind::PrependOp,
+           py::return_value_policy::reference)
+      .def("new_var", &BlockDescBind::NewVar,
            py::return_value_policy::reference)
-      .def(
-          "new_var", &BlockDescBind::NewVar, py::return_value_policy::reference)
       .def("var", &BlockDescBind::Var, py::return_value_policy::reference)
-      .def("all_vars",
-           &BlockDescBind::AllVars,
+      .def("all_vars", &BlockDescBind::AllVars,
            py::return_value_policy::reference)
-      .def("all_ops",
-           &BlockDescBind::AllOps,
+      .def("all_ops", &BlockDescBind::AllOps,
            py::return_value_policy::reference);
 }
 
diff --git a/paddle/pybind/pybind.cc b/paddle/pybind/pybind.cc
index df9ebaa2438e271f416bdb02ec2c439c679a21fe..3816aee21f8842c8fc73c56621234b66661e880c 100644
--- a/paddle/pybind/pybind.cc
+++ b/paddle/pybind/pybind.cc
@@ -21,6 +21,7 @@ limitations under the License. */
 #include "paddle/operators/recurrent_op.h"
 #include "paddle/platform/enforce.h"
 #include "paddle/platform/place.h"
+#include "paddle/pybind/exception.h"
 #include "paddle/pybind/pybind.h"
 #include "paddle/pybind/tensor_py.h"
 #include "paddle/string/to_string.h"
@@ -47,6 +48,8 @@ PYBIND11_PLUGIN(core) {
   // not cause namespace pollution.
   using namespace paddle::framework;  // NOLINT
 
+  BindException(m);
+
   py::class_<Tensor>(m, "Tensor", py::buffer_protocol())
       .def_buffer(
           [](Tensor &self) -> py::buffer_info { return CastToPyBuffer(self); })
@@ -161,8 +164,7 @@ All parameter, weight, gradient are variables in Paddle.
            py::return_value_policy::reference)
       .def("find_var", &Scope::FindVar, py::return_value_policy::reference)
       .def(py::init<>())
-      .def("new_scope",
-           [](Scope &self) -> Scope * { return &self.NewScope(); },
+      .def("new_scope", [](Scope &self) -> Scope * { return &self.NewScope(); },
            py::return_value_policy::reference)
       .def("drop_kids", &Scope::DropKids);
 
@@ -230,8 +232,7 @@ All parameter, weight, gradient are variables in Paddle.
            })
       .def("infer_shape", &OperatorBase::InferShape)
       .def("run",
-           [](OperatorBase &self,
-              const Scope &scope,
+           [](OperatorBase &self, const Scope &scope,
               const platform::DeviceContext &dev_ctx) {
              self.Run(scope, dev_ctx);
              dev_ctx.Wait();
@@ -259,10 +260,8 @@ All parameter, weight, gradient are variables in Paddle.
                     retv->SetType("plain_net");
                     return retv;
                   })
-      .def("append_op",
-           [](operators::NetOp &self, const OperatorBase &op) {
-             self.AppendOp(op);
-           })
+      .def("append_op", [](operators::NetOp &self,
+                           const OperatorBase &op) { self.AppendOp(op); })
       .def("complete_add_op", &operators::NetOp::CompleteAddOp)
       .def("complete_add_op", [](std::shared_ptr<operators::NetOp> &self) {
         self->CompleteAddOp();
@@ -282,9 +281,10 @@ All parameter, weight, gradient are variables in Paddle.
             auto rnn_op = OpRegistry::CreateOp(desc);
             return static_cast<operators::RecurrentOp *>(rnn_op.release());
           })
-      .def("set_stepnet",
-           [](operators::RecurrentOp &self, const operators::NetOp &net)
-               -> void { self.set_stepnet(net.Clone()); });
+      .def("set_stepnet", [](operators::RecurrentOp &self,
+                             const operators::NetOp &net) -> void {
+        self.set_stepnet(net.Clone());
+      });
 
   // cond_op
   py::class_<operators::CondOp, OperatorBase>(m, "CondOp")
diff --git a/paddle/pybind/tensor_py.h b/paddle/pybind/tensor_py.h
index bcfba84a1aa6e646cf255dc4612dfda42169fc44..f0d5a6f9ff963ecd80d0c261daff56bff50663d4 100644
--- a/paddle/pybind/tensor_py.h
+++ b/paddle/pybind/tensor_py.h
@@ -63,11 +63,8 @@ struct CastToPyBufferImpl<true, I, ARGS...> {
       }
       return py::buffer_info(
           dst_tensor.mutable_data<CUR_TYPE>(dst_tensor.holder_->place()),
-          sizeof(CUR_TYPE),
-          py::format_descriptor<CUR_TYPE>::format(),
-          (size_t)framework::arity(dst_tensor.dims()),
-          dims_outside,
-          strides);
+          sizeof(CUR_TYPE), py::format_descriptor<CUR_TYPE>::format(),
+          (size_t)framework::arity(dst_tensor.dims()), dims_outside, strides);
     } else {
       constexpr bool less = I + 1 < std::tuple_size<std::tuple<ARGS...>>::value;
       return CastToPyBufferImpl<less, I + 1, ARGS...>()(tensor);
@@ -110,8 +107,8 @@ void PyCUDATensorSetFromArray(
 
   self.Resize(framework::make_ddim(dims));
   auto *dst = self.mutable_data<T>(place);
-  paddle::platform::GpuMemcpySync(
-      dst, array.data(), sizeof(T) * array.size(), cudaMemcpyHostToDevice);
+  paddle::platform::GpuMemcpySync(dst, array.data(), sizeof(T) * array.size(),
+                                  cudaMemcpyHostToDevice);
 }
 #endif
 
diff --git a/paddle/string/.clang-format b/paddle/string/.clang-format
new file mode 120000
index 0000000000000000000000000000000000000000..7d28cb3924707d39dafe20f4664fb17b5538996c
--- /dev/null
+++ b/paddle/string/.clang-format
@@ -0,0 +1 @@
+../framework/.clang-format
\ No newline at end of file
diff --git a/paddle/string/piece.h b/paddle/string/piece.h
index 03ae9243a4cc4e9e92e376bf46ab2b1d7162dfcb..7362ce02c7c80e121218fab77d87696403b1c5e8 100644
--- a/paddle/string/piece.h
+++ b/paddle/string/piece.h
@@ -30,7 +30,7 @@ namespace string {
 // its syntax is simple as it doesn't own/manage the string, it is
 // cheap to construct Pieces and pass them around.
 class Piece {
-public:
+ public:
   static const size_t npos = static_cast<size_t>(-1);
 
   // We provide non-explicit singleton constructors so users can
@@ -57,7 +57,7 @@ public:
   // Return a string that contains the copy of the referenced data.
   std::string ToString() const { return std::string(data_, size_); }
 
-private:
+ private:
   const char* data_;
   size_t size_;
 
diff --git a/paddle/string/printf_test.cc b/paddle/string/printf_test.cc
index d8f2454165d741b3937f908dcfd87501940750d5..2586264046a2e2ba24b0908c1f6eba163cdef448 100644
--- a/paddle/string/printf_test.cc
+++ b/paddle/string/printf_test.cc
@@ -11,6 +11,6 @@ TEST(StringPrintf, StringPrintf) {
   long hour = 14;
   int min = 44;
   EXPECT_EQ(std::string("Wednesday, July 27, 14:44"),
-            paddle::string::Sprintf(
-                "%s, %s %d, %.2d:%.2d", weekday, month, day, hour, min));
+            paddle::string::Sprintf("%s, %s %d, %.2d:%.2d", weekday, month, day,
+                                    hour, min));
 }
diff --git a/paddle/string/tinyformat/tinyformat.h b/paddle/string/tinyformat/tinyformat.h
index f0e5e0160fb018b813c1dade727da2861a295147..3516777d9f9669c1e1300b9136c26e61f65b14a7 100644
--- a/paddle/string/tinyformat/tinyformat.h
+++ b/paddle/string/tinyformat/tinyformat.h
@@ -133,7 +133,7 @@ namespace detail {
 // Test whether type T1 is convertible to type T2
 template <typename T1, typename T2>
 struct is_convertible {
-private:
+ private:
   // two types of different size
   struct fail {
     char dummy[2];
@@ -146,7 +146,7 @@ private:
   static succeed tryConvert(const T2 &);
   static const T1 &makeT1();
 
-public:
+ public:
   // Standard trick: the (...) version of tryConvert will be chosen from
   // the overload set only if the version taking a T2 doesn't match.
   // Then we compare the sizes of the return types to check which
@@ -156,8 +156,7 @@ public:
 
 // Format the value by casting to type fmtT.  This default implementation
 // should never be called.
-template <typename T,
-          typename fmtT,
+template <typename T, typename fmtT,
           bool convertible = is_convertible<T, fmtT>::value>
 struct formatValueAsType {
   static void invoke(std::ostream & /*out*/, const T & /*value*/) { assert(0); }
@@ -227,11 +226,8 @@ TINYFORMAT_DEFINE_FORMAT_TRUNCATED_CSTR(char)
 /// operator<< to format the type T, with special cases for the %c and %p
 /// conversions.
 template <typename T>
-inline void formatValue(std::ostream &out,
-                        const char * /*fmtBegin*/,
-                        const char *fmtEnd,
-                        int ntrunc,
-                        const T &value) {
+inline void formatValue(std::ostream &out, const char * /*fmtBegin*/,
+                        const char *fmtEnd, int ntrunc, const T &value) {
   // The mess here is to support the %c and %p conversions: if these
   // conversions are active we try to convert the type to a char or const
   // void* respectively and format that instead of the value itself.  For the
@@ -253,25 +249,22 @@ inline void formatValue(std::ostream &out,
 }
 
 // Overloaded version for char types to support printing as an integer
-#define TINYFORMAT_DEFINE_FORMATVALUE_CHAR(charType) \
-  inline void formatValue(std::ostream &out,         \
-                          const char * /*fmtBegin*/, \
-                          const char *fmtEnd,        \
-                          int /**/,                  \
-                          charType value) {          \
-    switch (*(fmtEnd - 1)) {                         \
-      case 'u':                                      \
-      case 'd':                                      \
-      case 'i':                                      \
-      case 'o':                                      \
-      case 'X':                                      \
-      case 'x':                                      \
-        out << static_cast<int>(value);              \
-        break;                                       \
-      default:                                       \
-        out << value;                                \
-        break;                                       \
-    }                                                \
+#define TINYFORMAT_DEFINE_FORMATVALUE_CHAR(charType)                      \
+  inline void formatValue(std::ostream &out, const char * /*fmtBegin*/,   \
+                          const char *fmtEnd, int /**/, charType value) { \
+    switch (*(fmtEnd - 1)) {                                              \
+      case 'u':                                                           \
+      case 'd':                                                           \
+      case 'i':                                                           \
+      case 'o':                                                           \
+      case 'X':                                                           \
+      case 'x':                                                           \
+        out << static_cast<int>(value);                                   \
+        break;                                                            \
+      default:                                                            \
+        out << value;                                                     \
+        break;                                                            \
+    }                                                                     \
   }
 // per 3.9.1: char, signed char and unsigned char are all distinct types
 TINYFORMAT_DEFINE_FORMATVALUE_CHAR(char)
@@ -468,7 +461,7 @@ namespace detail {
 // each argument to be allocated as a homogenous array inside FormatList
 // whereas a naive implementation based on inheritance does not.
 class FormatArg {
-public:
+ public:
   FormatArg() {}
 
   template <typename T>
@@ -477,22 +470,17 @@ public:
         m_formatImpl(&formatImpl<T>),
         m_toIntImpl(&toIntImpl<T>) {}
 
-  void format(std::ostream &out,
-              const char *fmtBegin,
-              const char *fmtEnd,
+  void format(std::ostream &out, const char *fmtBegin, const char *fmtEnd,
               int ntrunc) const {
     m_formatImpl(out, fmtBegin, fmtEnd, ntrunc, m_value);
   }
 
   int toInt() const { return m_toIntImpl(m_value); }
 
-private:
+ private:
   template <typename T>
-  static void formatImpl(std::ostream &out,
-                         const char *fmtBegin,
-                         const char *fmtEnd,
-                         int ntrunc,
-                         const void *value) {
+  static void formatImpl(std::ostream &out, const char *fmtBegin,
+                         const char *fmtEnd, int ntrunc, const void *value) {
     formatValue(out, fmtBegin, fmtEnd, ntrunc, *static_cast<const T *>(value));
   }
 
@@ -502,11 +490,8 @@ private:
   }
 
   const void *m_value;
-  void (*m_formatImpl)(std::ostream &out,
-                       const char *fmtBegin,
-                       const char *fmtEnd,
-                       int ntrunc,
-                       const void *value);
+  void (*m_formatImpl)(std::ostream &out, const char *fmtBegin,
+                       const char *fmtEnd, int ntrunc, const void *value);
   int (*m_toIntImpl)(const void *value);
 };
 
@@ -555,12 +540,10 @@ inline const char *printFormatStringLiteral(std::ostream &out,
 // necessary to pull out variable width and precision .  The function returns a
 // pointer to the character after the end of the current format spec.
 inline const char *streamStateFromFormat(std::ostream &out,
-                                         bool &spacePadPositive,
-                                         int &ntrunc,
+                                         bool &spacePadPositive, int &ntrunc,
                                          const char *fmtStart,
                                          const detail::FormatArg *formatters,
-                                         int &argIndex,
-                                         int numFormatters) {
+                                         int &argIndex, int numFormatters) {
   if (*fmtStart != '%') {
     TINYFORMAT_ERROR(
         "tinyformat: Not enough conversion specifiers in format string");
@@ -736,10 +719,8 @@ inline const char *streamStateFromFormat(std::ostream &out,
 }
 
 //------------------------------------------------------------------------------
-inline void formatImpl(std::ostream &out,
-                       const char *fmt,
-                       const detail::FormatArg *formatters,
-                       int numFormatters) {
+inline void formatImpl(std::ostream &out, const char *fmt,
+                       const detail::FormatArg *formatters, int numFormatters) {
   // Saved stream state
   std::streamsize origWidth = out.width();
   std::streamsize origPrecision = out.precision();
@@ -751,13 +732,9 @@ inline void formatImpl(std::ostream &out,
     fmt = printFormatStringLiteral(out, fmt);
     bool spacePadPositive = false;
     int ntrunc = -1;
-    const char *fmtEnd = streamStateFromFormat(out,
-                                               spacePadPositive,
-                                               ntrunc,
-                                               fmt,
-                                               formatters,
-                                               argIndex,
-                                               numFormatters);
+    const char *fmtEnd =
+        streamStateFromFormat(out, spacePadPositive, ntrunc, fmt, formatters,
+                              argIndex, numFormatters);
     if (argIndex >= numFormatters) {
       // Check args remain after reading any variable width/precision
       TINYFORMAT_ERROR("tinyformat: Not enough format arguments");
@@ -806,15 +783,14 @@ inline void formatImpl(std::ostream &out,
 /// information has been stripped from the arguments, leaving just enough of a
 /// common interface to perform formatting as required.
 class FormatList {
-public:
+ public:
   FormatList(detail::FormatArg *formatters, int N)
       : m_formatters(formatters), m_N(N) {}
 
-  friend void vformat(std::ostream &out,
-                      const char *fmt,
+  friend void vformat(std::ostream &out, const char *fmt,
                       const FormatList &list);
 
-private:
+ private:
   const detail::FormatArg *m_formatters;
   int m_N;
 };
@@ -827,7 +803,7 @@ namespace detail {
 // Format list subclass with fixed storage to avoid dynamic allocation
 template <int N>
 class FormatListN : public FormatList {
-public:
+ public:
   template <typename... Args>
   FormatListN(const Args &... args)
       : FormatList(&m_formatterStore[0], N),
@@ -835,14 +811,14 @@ public:
     static_assert(sizeof...(args) == N, "Number of args must be N");
   }
 
-private:
+ private:
   FormatArg m_formatterStore[N];
 };
 
 // Special 0-arg version - MSVC says zero-sized C array in struct is nonstandard
 template <>
 class FormatListN<0> : public FormatList {
-public:
+ public:
   FormatListN() : FormatList(0, 0) {}
 };
 
diff --git a/paddle/string/to_string_test.cc b/paddle/string/to_string_test.cc
index 5ff1b007f1875c7b920a08bd13b8d98cdc5138d3..542c771a98ec8ae187cd4f821ed1ee4373427041 100644
--- a/paddle/string/to_string_test.cc
+++ b/paddle/string/to_string_test.cc
@@ -17,7 +17,7 @@
 
 constexpr char kOutputString[] = "User Defined Output";
 class UserDefinedClass {
-public:
+ public:
 };
 
 std::ostream& operator<<(std::ostream& s, const UserDefinedClass& ins) {
diff --git a/python/paddle/v2/framework/tests/test_exception.py b/python/paddle/v2/framework/tests/test_exception.py
new file mode 100644
index 0000000000000000000000000000000000000000..5ae048817cfcc1ec85e0d0e0c5db749da4521012
--- /dev/null
+++ b/python/paddle/v2/framework/tests/test_exception.py
@@ -0,0 +1,17 @@
+import paddle.v2.framework.core as core
+import unittest
+
+
+class TestException(unittest.TestCase):
+    def test_exception(self):
+        ex = None
+        try:
+            core.__unittest_throw_exception__()
+        except core.EnforceNotMet as ex:
+            self.assertIn("test exception", ex.message)
+
+        self.assertIsNotNone(ex)
+
+
+if __name__ == "__main__":
+    unittest.main()