Merge branch 'develop' of github.com:baidu/Paddle into stable_elemwise_mul

55b68c6e · Yu Yang · 9a3efb28 · 184768e0

Showing 2 changed files with 100 additions and 98 deletions

doc/design/refactorization.md +92 -91

paddle/framework/op_registry.h +8 -7

--- a/doc/design/refactorization.md
+++ b/doc/design/refactorization.md
 # Design Doc: Refactorization Overview
-The goal of refactorizaiton include:
+The goals of refactoring include:
-1. Make it easy for external contributors to write new elementory computaiton operations.
+1. Making it easy for external contributors to write new elementary computation operations.
-1. Make the codebase clean and readable.
+1. Making the codebase clean and readable.
-1. Introduce a new design of computation representation -- a computation graph of operators and variables.
+1. Designing a new computation representation -- a computation graph of operators and variables.
-1. The graph representation helps implementing auto-scalable and auto fault recoverable distributed computing.
+1. Implementing auto-scalability and auto fault recoverable distributed computing with the help of computation graphs.
 ## Computation Graphs
-1. PaddlePaddle represent the computation, training and inference of DL models, by computation graphs.
+1. PaddlePaddle represents the computation, training and inference of Deep Learning models, by computation graphs.
-  1. Please dig into [computation graphs](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/graph.md) for a solid example.
+  1. Please refer to [computation graphs](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/graph.md) for a concrete example.
-1. Users write Python programs to describe the graphs and run it (locally or remotely).
+1. Users write Python programs to describe the graphs and run them (locally or remotely).
 1. A graph is composed of *variables* and *operators*.
-1. The description of graphs must be able to be serialized/deserialized, so it
+1. The description of graphs must be capable of being serialized/deserialized, so that
-   1. could to be sent to the cloud for distributed execution, and
+   1. It can be sent to the cloud for distributed execution, and
-   1. be sent to clients for mobile or enterprise deployment.
+   1. It can be sent to clients for mobile or enterprise deployment.
-1. The Python program do
+1. The Python program does the following steps
-   1. *compilation*: runs a Python program to generate a protobuf message representation of the graph and send it to
+   1. *compilation*: run a Python program to generate a protobuf message representation of the graph and send it to
      1. the C++ library `libpaddle.so` for local execution,
      1. the master process of a distributed training job for training, or
      1. the server process of a Kubernetes serving job for distributed serving.
-   1. *execution*: according to the protobuf message, constructs instances of class `Variable` and `OperatorBase`, and run them.
+   1. *execution*: execute the graph by constructing instances of class [`Variable`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/variable.h#L24) and [`OperatorBase`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/operator.h#L70), according to the protobuf message.
-## Description and Realization
+## Description and Realization of Computation Graph
-At compile time, the Python program generates protobuf message representation of the graph, or the description of the graph.
+At compile time, the Python program generates a protobuf message representation of the graph, or the description of the graph.
-At runtime, the C++ program realizes the graph and run it.
+At runtime, the C++ program realizes the graph and runs it.
 | | Representation (protobuf messages) | Realization (C++ class objects) |
 |---|---|---|
@@ -42,30 +42,31 @@ At runtime, the C++ program realizes the graph and run it.
 |Operation|[OpDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto#L35)|[Operator](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/operator.h#L64)|
 |Block|BlockDesc|Block|
-The word *graph* is exchangable with *block* in this document.  A graph represent computation steps and local variables as a C++/Java program block, or a pair of { and }.
+The word *graph* is interchangeable with *block* in this document.  A graph represents computation steps and local variables, similar to a C++/Java program block, or a pair of curly braces (`{` and `}`).
 ## Compilation and Execution
-1. Run an applicaton Python program to describe the graph.  In particular,
+1. Run an application Python program to describe the graph.  In particular, the Python application program does the following:
-   1. create VarDesc to represent local/intermediate variables,
+   1. Create `VarDesc` to represent local/intermediate variables,
-   1. create operators and set attributes,
+   1. Create operators and set attributes,
-   1. validate attribute values,
+   1. Validate attribute values,
-   1. inference the type and the shape of variables,
+   1. Infer the type and the shape of variables,
-   1. plan for memory-reuse for variables,
+   1. Plan memory-reuse for variables,
-   1. generate backward and optimization part of the Graph.
+   1. Generate the backward graph
-   1. possiblly split the graph for distributed training.
+   1. Optimize the computation graph.
+   1. Potentially, split the graph for distributed training.
-1. The invocation of `train` or `infer` in the application Python program:
+1. The invocation of `train` or [`infer`](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/inference.py#L108) methods in the application Python program does the following:
-   1. create a new Scope instance in the [scope hierarchy](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/scope.md) for each run of a block,
+   1. Create a new Scope instance in the [scope hierarchy](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/scope.md) for each run of a block,
      1. realize local variables defined in the BlockDesc message in the new scope,
      1. a scope is similar to the stack frame in programming languages,
-   1. create an instance of class `Block`, in which,
+   1. Create an instance of class `Block`, in which,
      1. realize operators in the BlockDesc message,
-   1. run the Block by calling
+   1. Run the Block by calling
      1. `Block::Eval(vector<Variable>* targets)` for forward and backward computations, or
      1. `Block::Eval(vector<Operator>* targets)` for optimization.
@@ -76,14 +77,14 @@ The word *graph* is exchangable with *block* in this document.  A graph represen
 Compile Time -> IR -> Runtime
 ```
-### Benefit
+### Benefits of IR
 - Optimization
  ```text
  Compile Time -> IR -> Optimized IR -> Runtime
  ```
- Send automatically partitioned IR to different nodes.
+- Automatically send partitioned IR to different nodes.
-  - Automatic data parallel
+  - Automatic Data Parallelism
    ```text
    Compile Time
    |-> Single GPU IR
@@ -92,7 +93,7 @@ Compile Time -> IR -> Runtime
            |-> Node-1 (runs trainer-IR-1)
            |-> Node-2 (runs pserver-IR)
    ```
-  - Automatic model parallel (planned for future)
+  - Automatic Model Parallelism (planned for future)
 ---
@@ -105,10 +106,10 @@ Compile Time -> IR -> Runtime
 # Operator
 ![class_diagram](http://api.paddlepaddle.org/graphviz?dot=https://gist.githubusercontent.com/reyoung/53df507f6749762675dff3e7ce53372f/raw/dd598e8f1976f5759f58af5e5ef94738a6b2e661/op.dot)
-* `Operator` is the fundamental building block as the user interface.
+* `Operator` is the fundamental building block of the user interface.
-    * Operator stores input/output variable name, and attributes.
+    * Operator stores input/output variable names, and attributes.
-    * The `InferShape` interface is used to infer output variable shapes by its input shapes.
+    * The `InferShape` interface is used to infer the shapes of the output variables based on the shapes of the input variables.
-    * Use `Run` to compute `input variables` to `output variables`.
+    * Use `Run` to compute the `output` variables from the `input` variables.
 ---
@@ -126,30 +127,30 @@ Compile Time -> IR -> Runtime
 # Why separate Kernel and Operator
 * Separate GPU and CPU code.
-    * Make Paddle can run without GPU.
+    * Make Paddle capable of running without GPU.
-* Make one operator (which is user interface) can contain many implementations.
+* Allow one operator (which is a user interface) to have many implementations.
-    * Same mul op, different FP16, FP32 Kernel. different MKL, eigen kernel.
+    * For example, the same multiplication op can have different kernel implementations, such as FP16, FP32, MKL, and Eigen kernels.
 ---
 # Libraries for Kernel development
 * `Eigen::Tensor` contains basic math and element-wise functions.
    * Note that `Eigen::Tensor` has broadcast implementation.
-    * Limit number of `tensor.device(dev) = ` in your code.
+    * Limit the number of `tensor.device(dev) = ` in your code.
 * `thrust::transform` and `std::transform`.
-    * `thrust` has the same API as C++ standard library. Using `transform` can quickly implement a customized elementwise kernel.
+    * `thrust` has the same API as C++ standard library. Using `transform`, one can quickly implement customized elementwise kernels.
-    * `thrust` has more complex API, like `scan`, `reduce`, `reduce_by_key`.
+    * `thrust` also has more complex APIs, like `scan`, `reduce`, `reduce_by_key`.
 * Hand-writing `GPUKernel` and `CPU` code
-    * Do not write `.h`. CPU Kernel should be in `.cc`. GPU kernel should be in `.cu`. (`GCC` cannot compile GPU code.)
+    * Do not write in header (`.h`) files. CPU Kernel should be in cpp source (`.cc`) and GPU kernels should be in cuda (`.cu`) files. (GCC cannot compile GPU code.)
 ---
-# Operator Register
+# Operator Registration
-## Why register is necessary?
+## Why is registration necessary?
 We need a method to build mappings between Op type names and Op classes.
-## How to do the register?
+## How is registration implemented?
-Maintain a map, whose key is the type name and value is corresponding Op constructor.
+Maintaining a map, whose key is the type name and the value is the corresponding Op constructor.
 ---
 # The Registry Map
@@ -177,34 +178,34 @@ REGISTER_OP(op_type, op_class, op_maker_class, grad_op_type, grad_op_class)
 REGISTER_OP_WITHOUT_GRADIENT(op_type, op_class, op_maker_class)
 ```
-### `USE` Macros
+### USE Macros
-make sure the registration process is executed and linked.
+Make sure the registration process is executed and linked.
 ---
-# Register Process
+# Registration Process
-1. Write Op class, as well as its gradient Op class if there is.
+1. Write an Op class and its gradient Op class, if required.
-2. Write Op maker class. In the constructor, describe its inputs, outputs, and attributes.
+2. Write an Op maker class. In the constructor of this class, describe the inputs, outputs and attributes of the operator.
-3. Invoke macro `REGISTER_OP`. The macro will
+3. Invoke the macro `REGISTER_OP`. This macro will
-	1. call maker class to complete `proto` and `checker`
+	1. Call maker class to complete the `proto` and the `checker`
-	2. with the completed `proto` and `checker`, build a new key-value pair in the `OpInfoMap`
+	2. Using the completed `proto` and `checker`, it will add a new key-value pair to the `OpInfoMap`
-4. Invoke `USE` macro in where the Op is used to make sure it is linked.
+4. Invoke the `USE` macro where the Op is used, to make sure that it is linked.
 ---
 # Backward Module (1/2)
 ### Create Backward Operator
- Mapping from forwarding Op to backward Op
+- Mapping from forward Op to backward Op
 ![backward](https://gist.githubusercontent.com/dzhwinter/a6fbd4623ee76c459f7f94591fd1abf0/raw/61026ab6e518e66bde66a889bc42557a1fccff33/backward.png)
 ---
 # Backward Module (2/2)
 ### Build Backward Network
- **Input** graph of forwarding operators
+- **Input**: graph of forward operators
- **Output** graph of backward operators
+- **Output**: graph of backward operators
- **corner case in construction**
+- **Corner cases in construction**
-	- shared variable => insert `Add` operator
+	- Shared Variables => insert an `Add` operator to combine gradients
-	- no gradient => insert `fill_zero_grad` operator
+	- No Gradient => insert a `fill_zero_grad` operator
-	- recursive netOp => call `Backward` recursively
+	- Recursive NetOp => call `Backward` recursively
 	- RNN Op => recursively call `Backward` on stepnet
@@ -213,41 +214,41 @@ make sure the registration process is executed and linked.
 * `Tensor` is an n-dimension array with type.
 	* Only dims and data pointers are stored in `Tensor`.
-	* All operators on `Tensor` is written in `Operator` or global functions.
+	* All operations on `Tensor` are written in `Operator` or global functions.
-	* variable length Tensor design [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md)
+	* Variable length Tensor design [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md)
-* `Variable` is the inputs and outputs of an operator. Not just `Tensor`.
+* `Variable` instances are the inputs and the outputs of an operator. Not just `Tensor`.
-	* step_scopes in RNN is a variable and not a tensor.
+	* `step_scopes` in RNN is a variable and not a tensor.
-* `Scope` is where variables store at.
+* `Scope` is where variables are stored.
-	* map<string/*var name */, Variable>
+	* map<string `variable_name`, Variable>
-	* `Scope` has a hierarchical structure. The local scope can get variable from its parent scope.
+	* `Scope` has a hierarchical structure. The local scope can get variables from its parent scope.
 ---
 # Block (in design)
 ## the difference with original RNNOp
- as an operator is more intuitive than `RNNOp`,
+- As an operator is more intuitive than `RNNOp`,
- offers new interface `Eval(targets)` to deduce the minimal block to `Run`,
+- Offers a new interface `Eval(targets)` to deduce the minimal block to `Run`,
- fits the compile-time/ runtime separation design.
+- Fits the compile-time/ runtime separation design paradigm.
-  - during the compilation, `SymbolTable` stores `VarDesc`s and `OpDesc`s and serialize to a `BlockDesc`
+  - During the compilation, `SymbolTable` stores `VarDesc`s and `OpDesc`s and serializes them to a `BlockDesc`
-  - when graph executes, a Block with `BlockDesc` passed in creates `Op` and `Var` then `Run`
+  - When graph executes, a Block with `BlockDesc` is passed. It then creates `Op` and `Var` instances and then invokes `Run`.
 ---
 # Milestone
- take Paddle/books as the main line, the requirement of the models motivates framework refactoring,
+- Take Paddle/books as the main line, the requirement of the models motivates framework refactoring,
- model migration
+- Model migration
-  - framework development gives **priority support** to model migration, for example,
+  - Framework development gives **priority support** to model migration, for example,
    - the MNIST demo needs a Python interface,
    - the RNN models require the framework to support `LoDTensor`.
-  - determine some timelines,
+  - Determine some timelines,
-  - heavily-relied Ops need to be migrated first,
+  - Frequently used Ops need to be migrated first,
-  - different models can be migrated parallelly.
+  - Different models can be migrated in parallel.
- improve the framework at the same time
+- Improve the framework at the same time
- accept imperfection, concentrated on solving the specific problem at the right price.
+- Accept imperfection, concentrate on solving the specific problem at the right price.
 ---
 # Control the migration quality
- compare the performance of migrated models with old ones.
+- Compare the performance of migrated models with old ones.
- follow google C style
+- Follow the Google C++ style
- build the automatic workflow of generating Python/C++ documentations
+- Build the automatic workflow of generating Python/C++ documentations.
-  - the documentation of layers and ops should be written inside the code
+  - The documentation of layers and ops should be written inside the code.
-  - take the documentation quality into account when doing PR
+  - Take the documentation quality into account when submitting pull requests.
-  - preview the documentations, read and improve them from users' perspective
+  - Preview the documentations, read and improve them from a user's perspective.
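
Note on the registration section of the design doc above: it describes a map whose key is the Op type name and whose value is the corresponding Op constructor. The sketch below is one minimal way such a map could look. It is not the code in `op_registry.h`; `OpCreatorMap`, `OpRegistrarSketch`, `CreateOp`, and `MulOp` are hypothetical names, and the `OperatorBase` here is a simplified stand-in for the real class.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Simplified stand-in for the framework's operator base class (assumption).
struct OperatorBase {
  virtual ~OperatorBase() = default;
  virtual std::string Type() const = 0;
};

using OpCreator = std::function<std::unique_ptr<OperatorBase>()>;

// Key: Op type name. Value: a callable that constructs the Op.
// A function-local static avoids static initialization order problems.
inline std::map<std::string, OpCreator>& OpCreatorMap() {
  static std::map<std::string, OpCreator> creators;
  return creators;
}

// Constructing a registrar adds one key-value pair to the map.
template <typename OpClass>
struct OpRegistrarSketch {
  explicit OpRegistrarSketch(const std::string& op_type) {
    assert(OpCreatorMap().count(op_type) == 0 && "duplicate op type");
    OpCreatorMap()[op_type] = [] {
      return std::unique_ptr<OperatorBase>(new OpClass);
    };
  }
};

// Look up the type name and invoke the stored constructor.
inline std::unique_ptr<OperatorBase> CreateOp(const std::string& op_type) {
  auto it = OpCreatorMap().find(op_type);
  if (it == OpCreatorMap().end()) {
    throw std::out_of_range("unregistered op: " + op_type);
  }
  return it->second();
}

// Hypothetical operator used only for illustration.
struct MulOp : OperatorBase {
  std::string Type() const override { return "mul"; }
};

// A namespace-scope object, so registration runs during static initialization.
static OpRegistrarSketch<MulOp> mul_op_registrar("mul");
```

With this in place, `CreateOp("mul")` returns a `MulOp` instance, and an unregistered name throws `std::out_of_range`. A registrar object that the linker discards would never run, which is presumably the problem the `USE` macros in the design doc are meant to prevent.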
--- a/paddle/framework/op_registry.h
+++ b/paddle/framework/op_registry.h
@@ -103,18 +103,19 @@ class OpRegistrar : public Registrar {
 template <typename PlaceType, bool at_end, size_t I, typename... KernelType>
 struct OpKernelRegistrarFunctor;
-template <typename PlaceType, size_t I, typename... KernelType>
+template <typename PlaceType, size_t I, typename... KernelTypes>
-struct OpKernelRegistrarFunctor<PlaceType, false, I, KernelType...> {
+struct OpKernelRegistrarFunctor<PlaceType, false, I, KernelTypes...> {
-  using KT = typename std::tuple_element<I, std::tuple<KernelType...>>::type;
+  using KERNEL_TYPE =
+      typename std::tuple_element<I, std::tuple<KernelTypes...>>::type;
  void operator()(const char* op_type) const {
-    using T = typename KT::ELEMENT_TYPE;
+    using T = typename KERNEL_TYPE::ELEMENT_TYPE;
    OperatorWithKernel::OpKernelKey key(ToDataType(std::type_index(typeid(T))),
                                        PlaceType());
-    OperatorWithKernel::AllOpKernels()[op_type][key].reset(new KT);
+    OperatorWithKernel::AllOpKernels()[op_type][key].reset(new KERNEL_TYPE);
-    constexpr auto size = std::tuple_size<std::tuple<KernelType...>>::value;
+    constexpr auto size = std::tuple_size<std::tuple<KernelTypes...>>::value;
-    OpKernelRegistrarFunctor<PlaceType, I + 1 == size, I + 1, KernelType...>
+    OpKernelRegistrarFunctor<PlaceType, I + 1 == size, I + 1, KernelTypes...>
        func;
    func(op_type);
  }
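
Note on the hunk above: `OpKernelRegistrarFunctor` walks the `KernelTypes...` pack by index, and the `at_end` boolean selects a terminating specialization once `I` reaches the pack size. The following is a self-contained sketch of that recursion pattern under simplifying assumptions: the `PlaceType` parameter is dropped, and a plain table of element types replaces `OperatorWithKernel::AllOpKernels()`. `KernelTable`, `KernelRegistrarFunctor`, `RegisterKernels`, `FloatKernel`, and `DoubleKernel` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <typeindex>
#include <typeinfo>
#include <vector>

// Hypothetical kernels; each exposes ELEMENT_TYPE, as the kernels in the diff do.
struct FloatKernel { using ELEMENT_TYPE = float; };
struct DoubleKernel { using ELEMENT_TYPE = double; };

// Simplified registry (assumption): op type -> element types that have a kernel.
inline std::map<std::string, std::vector<std::type_index>>& KernelTable() {
  static std::map<std::string, std::vector<std::type_index>> table;
  return table;
}

template <bool at_end, std::size_t I, typename... KernelTypes>
struct KernelRegistrarFunctor;

// Recursive case: register kernel I, then continue with kernel I + 1.
template <std::size_t I, typename... KernelTypes>
struct KernelRegistrarFunctor<false, I, KernelTypes...> {
  using KERNEL_TYPE =
      typename std::tuple_element<I, std::tuple<KernelTypes...>>::type;

  void operator()(const char* op_type) const {
    using T = typename KERNEL_TYPE::ELEMENT_TYPE;
    KernelTable()[op_type].push_back(std::type_index(typeid(T)));

    constexpr auto size = std::tuple_size<std::tuple<KernelTypes...>>::value;
    KernelRegistrarFunctor<I + 1 == size, I + 1, KernelTypes...> func;
    func(op_type);
  }
};

// Terminal case: the index has reached the end of the pack, so do nothing.
template <std::size_t I, typename... KernelTypes>
struct KernelRegistrarFunctor<true, I, KernelTypes...> {
  void operator()(const char*) const {}
};

// Entry point: start at index 0; an empty pack goes straight to the terminal case.
template <typename... KernelTypes>
void RegisterKernels(const char* op_type) {
  assert(op_type != nullptr);
  KernelRegistrarFunctor<sizeof...(KernelTypes) == 0, 0, KernelTypes...> func;
  func(op_type);
}
```

Calling `RegisterKernels<FloatKernel, DoubleKernel>("mul")` records `float` and then `double` under `"mul"`, in pack order. The rename from `KernelType`/`KT` to `KernelTypes`/`KERNEL_TYPE` in the diff does not change this behaviour; it only makes the pack and the selected element easier to tell apart.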