提交 a35e82a6 编写于 作者: Y Yancey1989

Merge branch 'develop' of github.com:PaddlePaddle/Paddle into seqconcat_op

set -e set -e
unset OMP_NUM_THREADS MKL_NUM_THREADS
export OMP_DYNAMIC="FALSE"
export KMP_AFFINITY="granularity=fine,compact,0,0"
function train() { function train() {
unset OMP_NUM_THREADS MKL_NUM_THREADS
export OMP_DYNAMIC="FALSE"
export KMP_AFFINITY="granularity=fine,compact,0,0"
topology=$1 topology=$1
bs=$2 bs=$2
use_mkldnn=$3 use_mkldnn=$3
......
# Design Doc: Python API
Due to the refactorization of the PaddlePaddle core, we need Python classes to construct corresponding protobuf messages that describe a DL program.
| Python classes | Protobuf messages |
| --- | --- |
| Program | ProgramDesc |
| Block | BlockDesc |
| Operator | OpDesc |
| Variable | VarDesc |
Please be aware that these Python classes need to maintain some construction-time information, which are not part of the protobuf messages.
## Core Concepts
### Program
A `ProgramDesc` describes a [DL program](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md), which is composed of an array of `BlockDesc`s. A `BlockDesc` refers to its parent block by its index in the array. For example, operators in the step block of an RNN operator needs to be able to access variables in its ancessor blocks.
Whenever we create a block, we need set its parent block to the current block, so the Python class `Program` needs to maintain a data member `current_block`.
```python
class Program(objects):
def __init__(self):
self.proto = core.NewProgram() # a C++ ProgramDesc pointer.
self.blocks = vector<Block>()
self.blocks.append(Block(self, -1)) # the global block
self.current_block = 0 # initialized to the global block
def global_block():
return self.blocks[0]
def current_block():
return self.get_block(self.current_block)
def rollback():
self.current_block = self.current_block().parent_idx
def create_block():
new_block_idx = len(self.block)
self.blocks.append(Block(self, self.current_block))
self.current_block = new_block_idx
return current_block()
```
`Program` is an accessor to the protobuf message `ProgramDesc`, which is created in C++ space, because the InferShape function is in C++, which manipulates `VarDesc` messages, which are in turn members of `BlockDesc`, which is a member of `ProgramDesc`.
`Program` creates the first block as the global block in its constructor. All parameters and their initializer operators are in the global block.
### Block
A [Block](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/block.md) includes
1. a map from variable names to an instance of the Python `Variable` class, and
1. a list of `Operator` instances.
```python
class Block(objects):
def __init__(self, program, parent_idx):
self.proto = core.NewBlock(program.proto)
self.program = program
self.vars = map<string, Variable>()
self.ops = vector<Operator>()
self.parent_idx = parent_idx
def create_var(self, ...):
return Variable(self, ...)
def _create_global_var(self, ...):
program.global_block().create_var(...)
def create_parameter(self, name, ...):
# Parameter is a subclass of variable. See Parameter section for details.
self.vars[name] = Parameter(self._create_global_var(...), ...)
return self.vars[name]
def append_operator(self, ...):
self.ops.append(Operator(self, ...))
def prepend_operator(self, ...): # Parameter's ctor prepands initialize operators.
self.ops.prepend(Operator(self, ...))
```
`create_parameter` is necessary because parameters are global variables, those defined in the global block, but can be created in some sub-blocks, e.g., an FC layer in the step block of an RNN operator.
`prepand_operator` is necessary because the constructor of `Parameter` needs to create the initialize (or load) operator of the parameter, and would like to put it in the *preamble* of the global block.
### Operator
The `Operator` class fills in the `OpDesc` message and calls the C++ function `InferShape` to infer output shape from input shape.
```python
class Operator(object):
def __init__(self,
block, # Block
type, # string
inputs, # dict<string, Variable>
outputs,# dict<stirng, Variable>
attrs # dict<string, Any>
):
self.proto = core.NewOpDesc(block.proto, type, inputs, outputs, attrs)
core.infer_shape(self.proto, inputs, outputs)
def type(self):
return self.proto.type()
```
`Operator` creates the `OpDesc` message in C++ space, so could it call the `InferShape` function, which is in C++.
### Variable
Operators take Variables as its inputs and outputs.
```python
class Variable(object):
def __init__(self,
block=None, # Block
name=None, # string
shape, # tuple
dtype="float32", # string
lod_level=None # int
):
if name is None:
name = unique_name_generator()
self.name = name
self.block = block
self.proto = core.NewVarDesc(block.proto, name, shape, lod_level)
self.writer = None
```
Please be aware of `self.writer`, that tracks operator who creates the variable. It possible that there are more than one operators who write a variable, but in Python space, each writes to a variable is represented by a Variable class. This is guaranteed by the fact that **`core.NewVarDesc` must NOT create a new `VarDesc` message if its name already exists in the specified block**.
### Parameter
A parameter is a global variable with an initializer (or load) operator.
```python
class Parameter(Variable):
def __init__(self,
block=None, # Block
name=None, # string
shape, # tuple
dtype="float32", # string
lod_level=None # int
trainable, # bool
initialize_op_attrs,
optimize_op_attrs):
super(Parameter, self).__init__(block, name, shape, dtype, lod_level)
self.trainable = trainable
self.optimize_op_attrs = optimize_op_attrs
block.prepend(Operator(block, # Block
initialize_op_attrs['type'], # string
None, # no inputs
self, # output is the parameter
initialize_op_attrs)
```
When users create a parameter, s/he can call
```python
program.create_parameter(
...,
init_attr={
type: "uniform_random",
min: -1.0,
max: 1.0,
})
)
```
In above example, `init_attr.type` names an initialize operator. It can also name the load operator
```python
init_attr={
type: "load",
filename: "something.numpy",
}
```
`optimize_op_attrs` is not in the `VarDesc` message, but kept in the Python instance, as it will be used in the Python space when creating the optimize operator's `OpDesc`, and will be in the `OpDesc` message.
## Layer Functions
A layer is a Python function that creates some operators and variables. Layers simplify the work of application programmers.
### Data Layer
```python
def data_layer(name, type, column_name):
block = the_current_program.glolal_block()
var = block.create_global_var(
name=name,
shape=[None] + type.dims(),
dtype=type.dtype)
block.prepend_operator(block,
type="Feed",
inputs = None,
outputs = [var],
{column_name: column_name})
return var
```
The input to the feed operator is a special variable in the global scope, which is the output of [Python readers](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/reader/README.md).
### FC Layer
```python
def fc_layer(input, size, ...):
block = program.current_block()
w = block.create_parameter(...)
b = block.create_parameter(...)
out = block.create_var()
op = block.append_operator("FC", X=input, W=w, b=b, out=out)
out.writer = op
return out
```
# Design Doc: Refactorization Overview # Design Doc: Refactorization Overview
The goal of refactorizaiton include: The goals of refactoring include:
1. Make it easy for external contributors to write new elementory computaiton operations. 1. Making it easy for external contributors to write new elementary computation operations.
1. Make the codebase clean and readable. 1. Making the codebase clean and readable.
1. Introduce a new design of computation representation -- a computation graph of operators and variables. 1. Designing a new computation representation -- a computation graph of operators and variables.
1. The graph representation helps implementing auto-scalable and auto fault recoverable distributed computing. 1. Implementing auto-scalability and auto fault recoverable distributed computing with the help of computation graphs.
## Computation Graphs ## Computation Graphs
1. PaddlePaddle represent the computation, training and inference of DL models, by computation graphs. 1. PaddlePaddle represents the computation, training and inference of Deep Learning models, by computation graphs.
1. Please dig into [computation graphs](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/graph.md) for a solid example. 1. Please refer to [computation graphs](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/graph.md) for a concrete example.
1. Users write Python programs to describe the graphs and run it (locally or remotely). 1. Users write Python programs to describe the graphs and run them (locally or remotely).
1. A graph is composed of *variables* and *operators*. 1. A graph is composed of *variables* and *operators*.
1. The description of graphs must be able to be serialized/deserialized, so it 1. The description of graphs must be capable of being serialized/deserialized, so that:
1. could to be sent to the cloud for distributed execution, and 1. It can to be sent to the cloud for distributed execution, and
1. be sent to clients for mobile or enterprise deployment. 1. It can be sent to clients for mobile or enterprise deployment.
1. The Python program do 1. The Python program does the following steps
1. *compilation*: runs a Python program to generate a protobuf message representation of the graph and send it to 1. *compilation*: run a Python program to generate a protobuf message representation of the graph and send it to
1. the C++ library `libpaddle.so` for local execution, 1. the C++ library `libpaddle.so` for local execution,
1. the master process of a distributed training job for training, or 1. the master process of a distributed training job for training, or
1. the server process of a Kubernetes serving job for distributed serving. 1. the server process of a Kubernetes serving job for distributed serving.
1. *execution*: according to the protobuf message, constructs instances of class `Variable` and `OperatorBase`, and run them. 1. *execution*: execute the graph by constructing instances of class [`Variable`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/variable.h#L24) and [`OperatorBase`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/operator.h#L70), according to the protobuf message.
## Description and Realization ## Description and Realization of Computation Graph
At compile time, the Python program generates protobuf message representation of the graph, or the description of the graph. At compile time, the Python program generates a protobuf message representation of the graph, or the description of the graph.
At runtime, the C++ program realizes the graph and run it. At runtime, the C++ program realizes the graph and runs it.
| | Representation (protobuf messages) | Realization (C++ class objects) | | | Representation (protobuf messages) | Realization (C++ class objects) |
|---|---|---| |---|---|---|
...@@ -42,30 +42,31 @@ At runtime, the C++ program realizes the graph and run it. ...@@ -42,30 +42,31 @@ At runtime, the C++ program realizes the graph and run it.
|Operation|[OpDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto#L35)|[Operator](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/operator.h#L64)| |Operation|[OpDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto#L35)|[Operator](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/operator.h#L64)|
|Block|BlockDesc|Block| |Block|BlockDesc|Block|
The word *graph* is exchangable with *block* in this document. A graph represent computation steps and local variables as a C++/Java program block, or a pair of { and }. The word *graph* is interchangeable with *block* in this document. A graph represents computation steps and local variables similar to a C++/Java program block, or a pair of parentheses(`{` and `}`).
## Compilation and Execution ## Compilation and Execution
1. Run an applicaton Python program to describe the graph. In particular, 1. Run an application Python program to describe the graph. In particular, the Python application program does the following:
1. create VarDesc to represent local/intermediate variables, 1. Create `VarDesc` to represent local/intermediate variables,
1. create operators and set attributes, 1. Create operators and set attributes,
1. validate attribute values, 1. Validate attribute values,
1. inference the type and the shape of variables, 1. Infer the type and the shape of variables,
1. plan for memory-reuse for variables, 1. Plan memory-reuse for variables,
1. generate backward and optimization part of the Graph. 1. Generate the backward graph
1. possiblly split the graph for distributed training. 1. Optimize the computation graph.
1. Potentially, split the graph for distributed training.
1. The invocation of `train` or `infer` in the application Python program: 1. The invocation of `train` or [`infer`](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/inference.py#L108) methods in the application Python program does the following:
1. create a new Scope instance in the [scope hierarchy](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/scope.md) for each run of a block, 1. Create a new Scope instance in the [scope hierarchy](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/scope.md) for each run of a block,
1. realize local variables defined in the BlockDesc message in the new scope, 1. realize local variables defined in the BlockDesc message in the new scope,
1. a scope is similar to the stack frame in programming languages, 1. a scope is similar to the stack frame in programming languages,
1. create an instance of class `Block`, in which, 1. Create an instance of class `Block`, in which,
1. realize operators in the BlockDesc message, 1. realize operators in the BlockDesc message,
1. run the Block by calling 1. Run the Block by calling
1. `Block::Eval(vector<Variable>* targets)` for forward and backward computations, or 1. `Block::Eval(vector<Variable>* targets)` for forward and backward computations, or
1. `Block::Eval(vector<Operator>* targets)` for optimization. 1. `Block::Eval(vector<Operator>* targets)` for optimization.
...@@ -76,14 +77,14 @@ The word *graph* is exchangable with *block* in this document. A graph represen ...@@ -76,14 +77,14 @@ The word *graph* is exchangable with *block* in this document. A graph represen
Compile Time -> IR -> Runtime Compile Time -> IR -> Runtime
``` ```
### Benefit ### Benefits of IR
- Optimization - Optimization
```text ```text
Compile Time -> IR -> Optimized IR -> Runtime Compile Time -> IR -> Optimized IR -> Runtime
``` ```
- Send automatically partitioned IR to different nodes. - Automatically send partitioned IR to different nodes.
- Automatic data parallel - Automatic Data Parallelism
```text ```text
Compile Time Compile Time
|-> Single GPU IR |-> Single GPU IR
...@@ -92,7 +93,7 @@ Compile Time -> IR -> Runtime ...@@ -92,7 +93,7 @@ Compile Time -> IR -> Runtime
|-> Node-1 (runs trainer-IR-1) |-> Node-1 (runs trainer-IR-1)
|-> Node-2 (runs pserver-IR) |-> Node-2 (runs pserver-IR)
``` ```
- Automatic model parallel (planned for future) - Automatic Model Parallelism (planned for future)
--- ---
...@@ -105,10 +106,10 @@ Compile Time -> IR -> Runtime ...@@ -105,10 +106,10 @@ Compile Time -> IR -> Runtime
# Operator # Operator
![class_diagram](http://api.paddlepaddle.org/graphviz?dot=https://gist.githubusercontent.com/reyoung/53df507f6749762675dff3e7ce53372f/raw/dd598e8f1976f5759f58af5e5ef94738a6b2e661/op.dot) ![class_diagram](http://api.paddlepaddle.org/graphviz?dot=https://gist.githubusercontent.com/reyoung/53df507f6749762675dff3e7ce53372f/raw/dd598e8f1976f5759f58af5e5ef94738a6b2e661/op.dot)
* `Operator` is the fundamental building block as the user interface. * `Operator` is the fundamental building block of the user interface.
* Operator stores input/output variable name, and attributes. * Operator stores input/output variable names, and attributes.
* The `InferShape` interface is used to infer output variable shapes by its input shapes. * The `InferShape` interface is used to infer the shape of the output variable shapes based on the shapes of the input variables.
* Use `Run` to compute `input variables` to `output variables`. * Use `Run` to compute the `output` variables from the `input` variables.
--- ---
...@@ -126,30 +127,29 @@ Compile Time -> IR -> Runtime ...@@ -126,30 +127,29 @@ Compile Time -> IR -> Runtime
# Why separate Kernel and Operator # Why separate Kernel and Operator
* Separate GPU and CPU code. * Separate GPU and CPU code.
* Make Paddle can run without GPU. * Make Paddle capable of running without GPU.
* Make one operator (which is user interface) can contain many implementations. * Make one operator (which is a user interface) and create many implementations.
* Same mul op, different FP16, FP32 Kernel. different MKL, eigen kernel. * For example, same multiplication op can have different implementations kernels such as FP16 kernel, FP32 kernel, MKL, eigen kernel.
--- ---
# Libraries for Kernel development # Libraries for Kernel development
* `Eigen::Tensor` contains basic math and element-wise functions. * `Eigen::Tensor` contains basic math and element-wise functions.
* Note that `Eigen::Tensor` has broadcast implementation. * Note that `Eigen::Tensor` has broadcast implementation.
* Limit number of `tensor.device(dev) = ` in your code. * Limit the number of `tensor.device(dev) = ` in your code.
* `thrust::tranform` and `std::transform`. * `thrust::transform` and `std::transform`.
* `thrust` has the same API as C++ standard library. Using `transform` can quickly implement a customized elementwise kernel. * `thrust` has the same API as C++ standard library. Using `transform`, one can quickly implement customized element-wise kernels.
* `thrust` has more complex API, like `scan`, `reduce`, `reduce_by_key`. * `thrust` also has more complex APIs, like `scan`, `reduce`, `reduce_by_key`.
* Hand-writing `GPUKernel` and `CPU` code * Hand-writing `GPUKernel` and `CPU` code
* Do not write `.h`. CPU Kernel should be in `.cc`. GPU kernel should be in `.cu`. (`GCC` cannot compile GPU code.) * Do not write in header (`.h`) files. CPU Kernel should be in cpp source (`.cc`) and GPU kernels should be in cuda (`.cu`) files. (GCC cannot compile GPU code.)
--- ---
# Operator Register # Operator Registration
## Why register is necessary? ## Why is registration necessary?
We need a method to build mappings between Op type names and Op classes. We need a method to build mappings between Op type names and Op classes.
## How to do the register? ## How is registration implemented?
Maintaining a map, whose key is the type name and the value is the corresponding Op constructor.
Maintain a map, whose key is the type name and value is corresponding Op constructor.
--- ---
# The Registry Map # The Registry Map
...@@ -169,7 +169,7 @@ Maintain a map, whose key is the type name and value is corresponding Op constru ...@@ -169,7 +169,7 @@ Maintain a map, whose key is the type name and value is corresponding Op constru
# Related Concepts # Related Concepts
### Op_Maker ### Op_Maker
It's constructor takes `proto` and `checker`. They are compeleted during Op_Maker's construction. ([ScaleOpMaker](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/scale_op.cc#L37)) It's constructor takes `proto` and `checker`. They are completed during Op_Maker's construction. ([ScaleOpMaker](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/scale_op.cc#L37))
### Register Macros ### Register Macros
```cpp ```cpp
...@@ -177,34 +177,34 @@ REGISTER_OP(op_type, op_class, op_maker_class, grad_op_type, grad_op_class) ...@@ -177,34 +177,34 @@ REGISTER_OP(op_type, op_class, op_maker_class, grad_op_type, grad_op_class)
REGISTER_OP_WITHOUT_GRADIENT(op_type, op_class, op_maker_class) REGISTER_OP_WITHOUT_GRADIENT(op_type, op_class, op_maker_class)
``` ```
### `USE` Macros ### USE Macros
make sure the registration process is executed and linked. Make sure the registration process is executed and linked.
--- ---
# Register Process # Registration Process
1. Write Op class, as well as its gradient Op class if there is. 1. Write an Op class and its gradient Op class, if required.
2. Write Op maker class. In the constructor, describe its inputs, outputs, and attributes. 2. Write an Op maker class. In the constructor of this class, describe the inputs, outputs and attributes of the operator.
3. Invoke macro `REGISTER_OP`. The macro will 3. Invoke the macro `REGISTER_OP`. This macro will
1. call maker class to complete `proto` and `checker` 1. Call maker class to complete the `proto` and the `checker`
2. with the completed `proto` and `checker`, build a new key-value pair in the `OpInfoMap` 2. Using the completed `proto` and `checker`, it will add a new key-value pair to the `OpInfoMap`
4. Invoke `USE` macro in where the Op is used to make sure it is linked. 4. Invoke the `USE` macro in which the Op is used, to make sure that it is linked.
--- ---
# Backward Module (1/2) # Backward Module (1/2)
### Create Backward Operator ### Create Backward Operator
- Mapping from forwarding Op to backward Op - Mapping from forward Op to backward Op
![backward](https://gist.githubusercontent.com/dzhwinter/a6fbd4623ee76c459f7f94591fd1abf0/raw/61026ab6e518e66bde66a889bc42557a1fccff33/backward.png) ![backward](https://gist.githubusercontent.com/dzhwinter/a6fbd4623ee76c459f7f94591fd1abf0/raw/61026ab6e518e66bde66a889bc42557a1fccff33/backward.png)
--- ---
# Backward Module (2/2) # Backward Module (2/2)
### Build Backward Network ### Build Backward Network
- **Input** graph of forwarding operators - **Input**: graph of forward operators
- **Output** graph of backward operators - **Output**: graph of backward operators
- **corner case in construction** - **Corner cases in construction**
- shared variable => insert `Add` operator - Shared Variables => insert an `Add` operator to combine gradients
- no gradient => insert `fill_zero_grad` operator - No Gradient => insert a `fill_zero_grad` operator
- recursive netOp => call `Backward` recursively - Recursive NetOp => call `Backward` recursively
- RNN Op => recursively call `Backward` on stepnet - RNN Op => recursively call `Backward` on stepnet
...@@ -213,41 +213,41 @@ make sure the registration process is executed and linked. ...@@ -213,41 +213,41 @@ make sure the registration process is executed and linked.
* `Tensor` is an n-dimension array with type. * `Tensor` is an n-dimension array with type.
* Only dims and data pointers are stored in `Tensor`. * Only dims and data pointers are stored in `Tensor`.
* All operators on `Tensor` is written in `Operator` or global functions. * All operations on `Tensor` are written in `Operator` or global functions.
* variable length Tensor design [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md) * Variable length Tensor design [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md)
* `Variable` is the inputs and outputs of an operator. Not just `Tensor`. * `Variable` instances are the inputs and the outputs of an operator. Not just `Tensor`.
* step_scopes in RNN is a variable and not a tensor. * `step_scopes` in RNN is a variable and not a tensor.
* `Scope` is where variables store at. * `Scope` is where variables are stores.
* map<string/*var name */, Variable> * map<string `variable_name`, Variable>
* `Scope` has a hierarchical structure. The local scope can get variable from its parent scope. * `Scope` has a hierarchical structure. The local scope can get variables from its parent scope.
--- ---
# Block (in design) # Block (in design)
## the difference with original RNNOp ## the difference between original RNNOp and Block
- as an operator is more intuitive than `RNNOp`, - As an operator is more intuitive than `RNNOp`,
- offers new interface `Eval(targets)` to deduce the minimal block to `Run`, - Offers a new interface `Eval(targets)` to deduce the minimal block to `Run`,
- fits the compile-time/ runtime separation design. - Fits the compile-time/ runtime separation design paradigm.
- during the compilation, `SymbolTable` stores `VarDesc`s and `OpDesc`s and serialize to a `BlockDesc` - During the compilation, `SymbolTable` stores `VarDesc`s and `OpDesc`s and serialize to a `BlockDesc`
- when graph executes, a Block with `BlockDesc` passed in creates `Op` and `Var` then `Run` - When graph executes, a Block with `BlockDesc` is passed. It then creates `Op` and `Var` instances and then invokes `Run`.
--- ---
# Milestone # Milestone
- take Paddle/books as the main line, the requirement of the models motivates framework refactoring, - Take Paddle/books as the main line, the requirement of the models motivates framework refactoring,
- model migration - Model migration
- framework development gives **priority support** to model migration, for example, - Framework development gives **priority support** to model migration, for example,
- the MNIST demo needs a Python interface, - the MNIST demo needs a Python interface,
- the RNN models require the framework to support `LoDTensor`. - the RNN models require the framework to support `LoDTensor`.
- determine some timelines, - Determine some timelines,
- heavily-relied Ops need to be migrated first, - Frequently used Ops need to be migrated first,
- different models can be migrated parallelly. - Different models can be migrated in parallel.
- improve the framework at the same time - Improve the framework at the same time
- accept imperfection, concentrated on solving the specific problem at the right price. - Accept imperfection, concentrate on solving the specific problem at the right price.
--- ---
# Control the migration quality # Control the migration quality
- compare the performance of migrated models with old ones. - Compare the performance of migrated models with old ones.
- follow google C style - Follow the google C++ style
- build the automatic workflow of generating Python/C++ documentations - Build the automatic workflow of generating Python/C++ documentations.
- the documentation of layers and ops should be written inside the code - The documentation of layers and ops should be written inside the code.
- take the documentation quality into account when doing PR - Take the documentation quality into account when submitting pull requests.
- preview the documentations, read and improve them from users' perspective - Preview the documentations, read and improve them from a user's perspective.
...@@ -285,41 +285,27 @@ class TestMulGradOp(GradientChecker): ...@@ -285,41 +285,27 @@ class TestMulGradOp(GradientChecker):
'Y': np.random.random((84, 100)).astype("float32") 'Y': np.random.random((84, 100)).astype("float32")
} }
def test_cpu_gpu_compare(self): def test_check_grad_normal(self):
self.compare_grad(self.op, self.inputs)
def test_normal(self):
# mul op will enlarge the relative error # mul op will enlarge the relative error
self.check_grad( self.check_grad(['X', 'Y'], 'Out', max_relative_error=0.5)
self.op, self.inputs, ["X", "Y"], "Out", max_relative_error=0.5)
def test_ignore_x(self): def test_check_grad_ingore_x(self):
self.check_grad( self.check_grad(
self.op, ['Y'], 'Out', max_relative_error=0.5, no_grad_set=set("X"))
self.inputs, ["Y"],
"Out",
max_relative_error=0.5,
no_grad_set={"X"})
def test_ignore_y(self): def test_check_grad_ingore_y(self):
self.check_grad( self.check_grad(
self.op, ['X'], 'Out', max_relative_error=0.5, no_grad_set=set('Y'))
self.inputs, ["X"],
"Out",
max_relative_error=0.5,
no_grad_set={"Y"})
``` ```
下面解释代码中一些关键的地方: 下面解释代码中一些关键的地方:
- 调用`create_op("mul")`创建反向Op对应的前向Op。 - 调用`create_op("mul")`创建反向Op对应的前向Op。
- 调用`compare_grad`函数对比CPU、GPU计算结果。 - `test_check_grad_normal`中调用`check_grad`使用数值法检测梯度正确性和稳定性。
- `test_normal`中调用`check_grad`使用数值法检测梯度正确性和稳定性。 - 第一个参数`["X", "Y"]` : 指定对输入变量`X``Y`做梯度检测。
- 第一个参数`self.op` : 前向Op。 - 第二个参数`"Out"` : 指定前向网络最终的输出目标变量`Out`
- 第二个参数`self.inputs` : 输入词典,词典的Key和`ProtoMaker`定义保持一致。 - 第三个参数`max_relative_error`:指定检测梯度时能容忍的最大错误值。
- 第三个参数`["X", "Y"]` : 指定对输入变量`X``Y`做梯度检测。 - `test_check_grad_ingore_x``test_check_grad_ingore_y`分支用来测试只需要计算一个输入梯度的情况。
- 第四个参数`"Out"` : 指定前向网络最终的输出目标变量`Out`
- `test_ignore_x``test_ignore_y`分支用来测试只需要计算一个输入梯度的情况。
### 编译和执行单元测试 ### 编译和执行单元测试
......
...@@ -293,41 +293,27 @@ class TestMulGradOp(GradientChecker): ...@@ -293,41 +293,27 @@ class TestMulGradOp(GradientChecker):
'Y': np.random.random((84, 100)).astype("float32") 'Y': np.random.random((84, 100)).astype("float32")
} }
def test_cpu_gpu_compare(self): def test_check_grad_normal(self):
self.compare_grad(self.op, self.inputs)
def test_normal(self):
# mul op will enlarge the relative error # mul op will enlarge the relative error
self.check_grad( self.check_grad(['X', 'Y'], 'Out', max_relative_error=0.5)
self.op, self.inputs, ["X", "Y"], "Out", max_relative_error=0.5)
def test_ignore_x(self): def test_check_grad_ingore_x(self):
self.check_grad( self.check_grad(
self.op, ['Y'], 'Out', max_relative_error=0.5, no_grad_set=set("X"))
self.inputs, ["Y"],
"Out",
max_relative_error=0.5,
no_grad_set={"X"})
def test_ignore_y(self): def test_check_grad_ingore_y(self):
self.check_grad( self.check_grad(
self.op, ['X'], 'Out', max_relative_error=0.5, no_grad_set=set('Y'))
self.inputs, ["X"],
"Out",
max_relative_error=0.5,
no_grad_set={"Y"})
``` ```
Some key points in the code above include: Some key points in the code above include:
- `create_op("mul")` creates the backward operator's corresponding forward operator. - `create_op("mul")` creates the backward operator's corresponding forward operator.
- `compare_grad` compares results between utilizing the CPU and the GPU.
- `test_normal` calls `check_grad` to validate scaling tests' correctness and stability through numeric methods. - `test_normal` calls `check_grad` to validate scaling tests' correctness and stability through numeric methods.
- The first variable `self.op` denotes the forward operator. - The first variable `["X", "Y"]` appoints `X` and `Y` to be scale tested.
- The second variable `self.inputs` denotes the input dictionary, which has its key value identical to its `ProtoMaker` definitions. - The second variable `"Out"` points to the network's final output target `Out`.
- The third variable `["X", "Y"]` appoints `X` and `Y` to be scale tested. - The third variable `max_relative_error` points to the maximum relative tolerance error during scaling tests.
- The fourth variable `"Out"` points to the network's final output target `Out`. - `test_check_grad_ingore_x` and `test_check_grad_ingore_y`branches test the cases where there is only one scaling input.
- `test_ignore_x` and `test_ignore_y`branches test the cases where there is only one scaling input.
### Compiling and Running ### Compiling and Running
......
...@@ -19,13 +19,14 @@ cc_test(scope_test SRCS scope_test.cc DEPS scope) ...@@ -19,13 +19,14 @@ cc_test(scope_test SRCS scope_test.cc DEPS scope)
proto_library(framework_proto SRCS framework.proto) proto_library(framework_proto SRCS framework.proto)
cc_library(attribute SRCS attribute.cc DEPS framework_proto) cc_library(attribute SRCS attribute.cc DEPS framework_proto)
cc_library(proto_desc SRCS var_desc.cc op_desc.cc block_desc.cc program_desc.cc DEPS attribute)
cc_library(op_proto_maker SRCS op_proto_maker.cc DEPS framework_proto attribute) cc_library(op_proto_maker SRCS op_proto_maker.cc DEPS framework_proto attribute)
cc_test(op_proto_maker_test SRCS op_proto_maker_test.cc DEPS op_proto_maker) cc_test(op_proto_maker_test SRCS op_proto_maker_test.cc DEPS op_proto_maker)
cc_library(op_info SRCS op_info.cc DEPS attribute framework_proto) cc_library(op_info SRCS op_info.cc DEPS attribute framework_proto)
cc_library(operator SRCS operator.cc DEPS op_info device_context tensor scope) cc_library(operator SRCS operator.cc DEPS op_info device_context tensor scope)
cc_test(operator_test SRCS operator_test.cc DEPS operator op_registry) cc_test(operator_test SRCS operator_test.cc DEPS operator op_registry)
cc_library(grad_op_builder SRCS grad_op_builder.cc DEPS operator) cc_library(grad_op_builder SRCS grad_op_builder.cc DEPS operator proto_desc)
cc_library(op_registry SRCS op_registry.cc DEPS grad_op_builder op_proto_maker op_info) cc_library(op_registry SRCS op_registry.cc DEPS grad_op_builder op_proto_maker op_info)
cc_test(op_registry_test SRCS op_registry_test.cc DEPS op_registry) cc_test(op_registry_test SRCS op_registry_test.cc DEPS op_registry)
cc_test(grad_op_builder_test SRCS grad_op_builder_test.cc DEPS grad_op_builder op_registry add_op) cc_test(grad_op_builder_test SRCS grad_op_builder_test.cc DEPS grad_op_builder op_registry add_op)
......
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/framework/block_desc.h"
#include "paddle/framework/program_desc.h"
namespace paddle {
namespace framework {
VarDescBind *BlockDescBind::NewVar(const std::string &name) {
need_update_ = true;
auto it = vars_.find(name);
PADDLE_ENFORCE(it == vars_.end(), "Duplicated variable %s", name);
auto var = new VarDescBind(name);
vars_[name].reset(var);
return var;
}
VarDescBind *BlockDescBind::Var(const std::string &name) const {
auto it = vars_.find(name);
PADDLE_ENFORCE(it != vars_.end(),
"Can not find variable %s in current block.", name);
return it->second.get();
}
std::vector<VarDescBind *> BlockDescBind::AllVars() const {
std::vector<VarDescBind *> res;
for (const auto &p : vars_) {
res.push_back(p.second.get());
}
return res;
}
OpDescBind *BlockDescBind::AppendOp() {
need_update_ = true;
ops_.emplace_back(new OpDescBind());
return ops_.back().get();
}
OpDescBind *BlockDescBind::PrependOp() {
need_update_ = true;
ops_.emplace_front(new OpDescBind());
return ops_.front().get();
}
std::vector<OpDescBind *> BlockDescBind::AllOps() const {
std::vector<OpDescBind *> res;
for (const auto &op : ops_) {
res.push_back(op.get());
}
return res;
}
void BlockDescBind::Sync() {
if (need_update_) {
auto &op_field = *this->desc_->mutable_ops();
op_field.Clear();
op_field.Reserve(static_cast<int>(ops_.size()));
for (auto &op_desc : ops_) {
op_field.AddAllocated(op_desc->Proto());
}
need_update_ = false;
}
}
BlockDescBind *BlockDescBind::ParentBlock() const {
if (this->desc_->parent_idx() == -1) {
return nullptr;
}
return prog_->Block(static_cast<size_t>(this->desc_->parent_idx()));
}
void OpDescBind::SetBlockAttr(const std::string &name, BlockDescBind &block) {
BlockDesc *desc = block.RawPtr();
this->attrs_[name] = desc;
}
} // namespace framework
} // namespace paddle
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <deque>
#include <unordered_map>
#include <vector>
#include "paddle/framework/op_desc.h"
#include "paddle/framework/var_desc.h"
namespace paddle {
namespace framework {
class ProgramDescBind;
// Each Protobuf Message, we provide a XXXBind class. In that class, we optimize
// read/write speed. Only when we want the protobuf message, the local changes
// will be synchronized (by `Sync` method).
class BlockDescBind {
public:
BlockDescBind(ProgramDescBind *prog, BlockDesc *desc)
: prog_(prog), desc_(desc), need_update_(false) {}
BlockDescBind(const BlockDescBind &o) = delete;
BlockDescBind &operator=(const BlockDescBind &o) = delete;
int32_t ID() const { return desc_->idx(); }
int32_t Parent() const { return desc_->parent_idx(); }
VarDescBind *NewVar(const std::string &name_bytes);
VarDescBind *Var(const std::string &name_bytes) const;
std::vector<VarDescBind *> AllVars() const;
BlockDescBind *ParentBlock() const;
OpDescBind *AppendOp();
OpDescBind *PrependOp();
std::vector<OpDescBind *> AllOps() const;
void Sync();
BlockDesc *RawPtr() { return desc_; }
private:
ProgramDescBind *prog_; // not_own
BlockDesc *desc_; // not_own
bool need_update_;
std::deque<std::unique_ptr<OpDescBind>> ops_;
std::unordered_map<std::string, std::unique_ptr<VarDescBind>> vars_;
};
} // namespace framework
} // namespace paddle
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <typeindex>
#include "paddle/framework/framework.pb.h"
namespace paddle {
namespace framework {
inline DataType ToDataType(std::type_index type) {
if (typeid(float).hash_code() == type.hash_code()) {
return DataType::FP32;
} else if (typeid(double).hash_code() == type.hash_code()) {
return DataType::FP64;
} else if (typeid(int).hash_code() == type.hash_code()) {
return DataType::INT32;
} else {
PADDLE_THROW("Not supported");
return static_cast<DataType>(-1);
}
}
} // namespace framework
} // namespace paddle
...@@ -54,5 +54,44 @@ OperatorBase* BuildGradOp(const OperatorBase* op) { ...@@ -54,5 +54,44 @@ OperatorBase* BuildGradOp(const OperatorBase* op) {
return grad_info.Creator()(info.grad_op_type_, inputs, outputs, op->Attrs()); return grad_info.Creator()(info.grad_op_type_, inputs, outputs, op->Attrs());
} }
static void TransOpDescArg(const OpDescBind* src_op, const OpArgType& src_type,
bool is_grad, OpDescBind* dst_op,
const OpArgType& dst_type) {
PADDLE_ENFORCE(dst_op != nullptr,
"Protobuf desc of gradient op must be initialized first.");
const auto& proto = OpInfoMap::Instance().Get(src_op->Type()).Proto();
const auto& src_arg_list =
src_type == OpArgType::IN ? proto.inputs() : proto.outputs();
for (const auto& arg : src_arg_list) {
if (arg.not_in_gradient() && !is_grad) continue;
const std::string src_name = arg.name();
std::vector<std::string> vars = src_type == OpArgType::IN
? src_op->Input(src_name)
: src_op->Output(src_name);
if (is_grad) {
for (std::string& var : vars) {
var = GradVarName(var);
}
}
std::string dst_name = is_grad ? GradVarName(src_name) : src_name;
dst_type == OpArgType::IN ? dst_op->SetInput(dst_name, vars)
: dst_op->SetOutput(dst_name, vars);
}
}
void CompleteGradOpDesc(const OpDescBind* forw_op, OpDescBind* grad_op) {
auto& info = OpInfoMap::Instance().Get(forw_op->Type());
PADDLE_ENFORCE(info.HasGradientOp());
grad_op->SetType(info.grad_op_type_);
TransOpDescArg(forw_op, OpArgType::IN, false, grad_op, OpArgType::IN);
TransOpDescArg(forw_op, OpArgType::OUT, false, grad_op, OpArgType::IN);
TransOpDescArg(forw_op, OpArgType::OUT, true, grad_op, OpArgType::IN);
TransOpDescArg(forw_op, OpArgType::IN, true, grad_op, OpArgType::OUT);
grad_op->SetAttrMap(forw_op->GetAttrMap());
}
} // namespace framework } // namespace framework
} // namespace paddle } // namespace paddle
...@@ -14,6 +14,7 @@ limitations under the License. */ ...@@ -14,6 +14,7 @@ limitations under the License. */
#pragma once #pragma once
#include "paddle/framework/op_desc.h"
#include "paddle/framework/operator.h" #include "paddle/framework/operator.h"
namespace paddle { namespace paddle {
...@@ -21,5 +22,7 @@ namespace framework { ...@@ -21,5 +22,7 @@ namespace framework {
OperatorBase* BuildGradOp(const OperatorBase* op); OperatorBase* BuildGradOp(const OperatorBase* op);
void CompleteGradOpDesc(const OpDescBind* forw_op, OpDescBind* grad_op);
} // namespace framework } // namespace framework
} // namespace paddle } // namespace paddle
...@@ -120,3 +120,82 @@ TEST(GradOpBuilder, IOIgnoredInGradient) { ...@@ -120,3 +120,82 @@ TEST(GradOpBuilder, IOIgnoredInGradient) {
std::vector<std::string>( std::vector<std::string>(
{f::GradVarName("in3_1"), f::GradVarName("in3_2")})); {f::GradVarName("in3_1"), f::GradVarName("in3_2")}));
} }
TEST(GradOpDescBuilder, MutiInOut) {
f::OpDescBind *forw_op = new f::OpDescBind();
forw_op->SetType("mult_io");
forw_op->SetInput("In1", {"in1"});
forw_op->SetInput("In2_mult", {"in2_1", "in2_2", "in2_3"});
forw_op->SetInput("In3", {"in3"});
forw_op->SetOutput("Out1", {"out1"});
forw_op->SetOutput("Out2_mult", {"out2_1", "out2_2"});
f::OpDescBind *grad_op = new f::OpDescBind();
f::CompleteGradOpDesc(forw_op, grad_op);
EXPECT_EQ(grad_op->Type(), "mult_io_grad");
ASSERT_EQ(grad_op->InputNames().size(), 3UL + 2UL + 2UL);
EXPECT_EQ(grad_op->Input("In1"), std::vector<std::string>({"in1"}));
EXPECT_EQ(grad_op->Input("In2_mult"),
std::vector<std::string>({"in2_1", "in2_2", "in2_3"}));
EXPECT_EQ(grad_op->Input("In3"), std::vector<std::string>({"in3"}));
EXPECT_EQ(grad_op->Input("Out1"), std::vector<std::string>({"out1"}));
EXPECT_EQ(grad_op->Input("Out2_mult"),
std::vector<std::string>({"out2_1", "out2_2"}));
EXPECT_EQ(grad_op->Input(f::GradVarName("Out1")),
std::vector<std::string>({f::GradVarName("out1")}));
EXPECT_EQ(grad_op->Input(f::GradVarName("Out2_mult")),
std::vector<std::string>(
{f::GradVarName("out2_1"), f::GradVarName("out2_2")}));
ASSERT_EQ(grad_op->OutputNames().size(), 3UL);
EXPECT_EQ(grad_op->Output(f::GradVarName("In1")),
std::vector<std::string>({f::GradVarName("in1")}));
EXPECT_EQ(grad_op->Output(f::GradVarName("In2_mult")),
std::vector<std::string>({f::GradVarName("in2_1"),
f::GradVarName("in2_2"),
f::GradVarName("in2_3")}));
EXPECT_EQ(grad_op->Output(f::GradVarName("In3")),
std::vector<std::string>({f::GradVarName("in3")}));
delete forw_op;
delete grad_op;
}
TEST(GradOpDescBuilder, IOIgnoredInGradient) {
f::OpDescBind *forw_op = new f::OpDescBind();
forw_op->SetType("io_ignored");
forw_op->SetInput("In1", {"in1"});
forw_op->SetInput("In2_mult", {"in2_1", "in2_2"});
forw_op->SetInput("In3_mult", {"in3_1", "in3_2"});
forw_op->SetOutput("Out1_mult", {"out1_1", "out1_2"});
forw_op->SetOutput("Out2", {"out2"});
f::OpDescBind *grad_op = new f::OpDescBind();
f::CompleteGradOpDesc(forw_op, grad_op);
EXPECT_EQ(grad_op->Type(), "io_ignored_grad");
// 'In2' and 'Out2' are ignored in gradient calculating
ASSERT_EQ(grad_op->InputNames().size(), 2UL + 1UL + 2UL);
EXPECT_EQ(grad_op->Input("In1"), std::vector<std::string>({"in1"}));
EXPECT_EQ(grad_op->Input("In3_mult"),
std::vector<std::string>({"in3_1", "in3_2"}));
EXPECT_EQ(grad_op->Input("Out1_mult"),
std::vector<std::string>({"out1_1", "out1_2"}));
EXPECT_EQ(grad_op->Input(f::GradVarName("Out1_mult")),
std::vector<std::string>(
{f::GradVarName("out1_1"), f::GradVarName("out1_2")}));
EXPECT_EQ(grad_op->Input(f::GradVarName("Out2")),
std::vector<std::string>({f::GradVarName("out2")}));
ASSERT_EQ(grad_op->OutputNames().size(), 3UL);
EXPECT_EQ(grad_op->Output(f::GradVarName("In1")),
std::vector<std::string>({f::GradVarName("in1")}));
EXPECT_EQ(grad_op->Output(f::GradVarName("In2_mult")),
std::vector<std::string>(
{f::GradVarName("in2_1"), f::GradVarName("in2_2")}));
EXPECT_EQ(grad_op->Output(f::GradVarName("In3_mult")),
std::vector<std::string>(
{f::GradVarName("in3_1"), f::GradVarName("in3_2")}));
delete forw_op;
delete grad_op;
}
\ No newline at end of file
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/framework/op_desc.h"
#include "paddle/framework/block_desc.h"
namespace paddle {
namespace framework {
OpDesc *OpDescBind::Proto() {
Sync();
return &op_desc_;
}
const std::vector<std::string> &OpDescBind::Input(
const std::string &name) const {
auto it = inputs_.find(name);
PADDLE_ENFORCE(it != inputs_.end(), "Input %s cannot be found in Op %s", name,
Type());
return it->second;
}
std::vector<std::string> OpDescBind::InputNames() const {
std::vector<std::string> retv;
retv.reserve(this->inputs_.size());
for (auto &ipt : this->inputs_) {
retv.push_back(ipt.first);
}
return retv;
}
void OpDescBind::SetInput(const std::string &param_name,
const std::vector<std::string> &args) {
need_update_ = true;
inputs_[param_name] = args;
}
const std::vector<std::string> &OpDescBind::Output(
const std::string &name) const {
auto it = outputs_.find(name);
PADDLE_ENFORCE(it != outputs_.end(), "Output %s cannot be found in Op %s",
name, Type());
return it->second;
}
std::vector<std::string> OpDescBind::OutputNames() const {
std::vector<std::string> retv;
retv.reserve(this->outputs_.size());
for (auto &ipt : this->outputs_) {
retv.push_back(ipt.first);
}
return retv;
}
void OpDescBind::SetOutput(const std::string &param_name,
const std::vector<std::string> &args) {
need_update_ = true;
this->outputs_[param_name] = args;
}
AttrType OpDescBind::GetAttrType(const std::string &name) const {
auto it = attrs_.find(name);
PADDLE_ENFORCE(it != attrs_.end(), "Attribute %s is not found", name);
return static_cast<AttrType>(it->second.which() - 1);
}
std::vector<std::string> OpDescBind::AttrNames() const {
std::vector<std::string> retv;
retv.reserve(attrs_.size());
for (auto &attr : attrs_) {
retv.push_back(attr.first);
}
return retv;
}
void OpDescBind::SetAttr(const std::string &name, const Attribute &v) {
this->attrs_[name] = v;
need_update_ = true;
}
void OpDescBind::SetAttrMap(
const std::unordered_map<std::string, Attribute> &attr_map) {
attrs_ = attr_map;
need_update_ = true;
}
Attribute OpDescBind::GetAttr(const std::string &name) const {
auto it = attrs_.find(name);
PADDLE_ENFORCE(it != attrs_.end(), "Attribute %s is not found", name);
return it->second;
}
int OpDescBind::GetBlockAttr(const std::string &name) const {
auto it = attrs_.find(name);
PADDLE_ENFORCE(it != attrs_.end(), "Attribute %s is not found", name);
return boost::get<BlockDesc *>(it->second)->idx();
}
const std::unordered_map<std::string, Attribute> &OpDescBind::GetAttrMap()
const {
return attrs_;
}
void OpDescBind::Sync() {
if (need_update_) {
this->op_desc_.mutable_inputs()->Clear();
for (auto &ipt : inputs_) {
auto *input = op_desc_.add_inputs();
input->set_parameter(ipt.first);
VectorToRepeated(ipt.second, input->mutable_arguments());
}
this->op_desc_.mutable_outputs()->Clear();
for (auto &opt : outputs_) {
auto *output = op_desc_.add_outputs();
output->set_parameter(opt.first);
VectorToRepeated(opt.second, output->mutable_arguments());
}
this->op_desc_.mutable_attrs()->Clear();
for (auto &attr : attrs_) {
auto *attr_desc = op_desc_.add_attrs();
attr_desc->set_name(attr.first);
attr_desc->set_type(
static_cast<framework::AttrType>(attr.second.which() - 1));
boost::apply_visitor(SetAttrDescVisitor(attr_desc), attr.second);
}
need_update_ = false;
}
}
} // namespace framework
} // namespace paddle
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <unordered_map>
#include <vector>
#include "paddle/framework/attribute.h"
#include "paddle/framework/var_desc.h"
namespace paddle {
namespace framework {
class BlockDescBind;
class OpDescBind {
public:
OpDesc *Proto();
std::string Type() const { return op_desc_.type(); }
void SetType(const std::string &type) { op_desc_.set_type(type); }
const std::vector<std::string> &Input(const std::string &name) const;
std::vector<std::string> InputNames() const;
void SetInput(const std::string &param_name,
const std::vector<std::string> &args);
const std::vector<std::string> &Output(const std::string &name) const;
std::vector<std::string> OutputNames() const;
void SetOutput(const std::string &param_name,
const std::vector<std::string> &args);
std::string DebugString() { return this->Proto()->DebugString(); }
bool HasAttr(const std::string &name) const {
return attrs_.find(name) != attrs_.end();
}
AttrType GetAttrType(const std::string &name) const;
std::vector<std::string> AttrNames() const;
void SetAttr(const std::string &name, const Attribute &v);
void SetBlockAttr(const std::string &name, BlockDescBind &block);
// Only be used in C++
void SetAttrMap(const std::unordered_map<std::string, Attribute> &attr_map);
Attribute GetAttr(const std::string &name) const;
int GetBlockAttr(const std::string &name) const;
// Only be used in C++
const std::unordered_map<std::string, Attribute> &GetAttrMap() const;
private:
struct SetAttrDescVisitor : public boost::static_visitor<void> {
explicit SetAttrDescVisitor(OpDesc::Attr *attr) : attr_(attr) {}
mutable OpDesc::Attr *attr_;
void operator()(int v) const { attr_->set_i(v); }
void operator()(float v) const { attr_->set_f(v); }
void operator()(const std::string &v) const { attr_->set_s(v); }
void operator()(bool b) const { attr_->set_b(b); }
void operator()(const std::vector<int> &v) const {
VectorToRepeated(v, attr_->mutable_ints());
}
void operator()(const std::vector<float> &v) const {
VectorToRepeated(v, attr_->mutable_floats());
}
void operator()(const std::vector<std::string> &v) const {
VectorToRepeated(v, attr_->mutable_strings());
}
void operator()(const std::vector<bool> &v) const {
VectorToRepeated(v, attr_->mutable_bools());
}
void operator()(BlockDesc *desc) const {
attr_->set_block_idx(desc->idx());
}
void operator()(boost::blank) const { PADDLE_THROW("Unexpected branch"); }
};
void Sync();
OpDesc op_desc_;
std::unordered_map<std::string, std::vector<std::string>> inputs_;
std::unordered_map<std::string, std::vector<std::string>> outputs_;
std::unordered_map<std::string, Attribute> attrs_;
// need_update_ indicate there some local changes not be synchronized. If
// local changes should be synchronized, need_update_ should be set to true.
bool need_update_{false};
};
} // namespace framework
} // namespace paddle
...@@ -100,13 +100,39 @@ class OpRegistrar : public Registrar { ...@@ -100,13 +100,39 @@ class OpRegistrar : public Registrar {
} }
}; };
template <typename PlaceType, typename KernelType> template <typename PlaceType, bool at_end, size_t I, typename... KernelType>
struct OpKernelRegistrarFunctor;
template <typename PlaceType, size_t I, typename... KernelTypes>
struct OpKernelRegistrarFunctor<PlaceType, false, I, KernelTypes...> {
using KERNEL_TYPE =
typename std::tuple_element<I, std::tuple<KernelTypes...>>::type;
void operator()(const char* op_type) const {
using T = typename KERNEL_TYPE::ELEMENT_TYPE;
OperatorWithKernel::OpKernelKey key(ToDataType(std::type_index(typeid(T))),
PlaceType());
OperatorWithKernel::AllOpKernels()[op_type][key].reset(new KERNEL_TYPE);
constexpr auto size = std::tuple_size<std::tuple<KernelTypes...>>::value;
OpKernelRegistrarFunctor<PlaceType, I + 1 == size, I + 1, KernelTypes...>
func;
func(op_type);
}
};
template <typename PlaceType, size_t I, typename... KernelType>
struct OpKernelRegistrarFunctor<PlaceType, true, I, KernelType...> {
void operator()(const char* op_type) const {}
};
// User can register many kernel in one place. The data type could be different.
template <typename PlaceType, typename... KernelType>
class OpKernelRegistrar : public Registrar { class OpKernelRegistrar : public Registrar {
public: public:
explicit OpKernelRegistrar(const char* op_type) { explicit OpKernelRegistrar(const char* op_type) {
OperatorWithKernel::OpKernelKey key; OpKernelRegistrarFunctor<PlaceType, false, 0, KernelType...> func;
key.place_ = PlaceType(); func(op_type);
OperatorWithKernel::AllOpKernels()[op_type][key].reset(new KernelType);
} }
}; };
......
...@@ -10,7 +10,6 @@ class CosineOp : public OperatorBase { ...@@ -10,7 +10,6 @@ class CosineOp : public OperatorBase {
using OperatorBase::OperatorBase; using OperatorBase::OperatorBase;
void Run(const Scope& scope, void Run(const Scope& scope,
const platform::DeviceContext& dev_ctx) const override {} const platform::DeviceContext& dev_ctx) const override {}
void InferShape(const Scope& scope) const override {}
}; };
class CosineOpProtoAndCheckerMaker : public OpProtoAndCheckerMaker { class CosineOpProtoAndCheckerMaker : public OpProtoAndCheckerMaker {
...@@ -29,7 +28,6 @@ class CosineOpProtoAndCheckerMaker : public OpProtoAndCheckerMaker { ...@@ -29,7 +28,6 @@ class CosineOpProtoAndCheckerMaker : public OpProtoAndCheckerMaker {
class MyTestOp : public OperatorBase { class MyTestOp : public OperatorBase {
public: public:
using OperatorBase::OperatorBase; using OperatorBase::OperatorBase;
void InferShape(const Scope& scope) const override {}
void Run(const Scope& scope, void Run(const Scope& scope,
const platform::DeviceContext& dev_ctx) const override {} const platform::DeviceContext& dev_ctx) const override {}
}; };
......
...@@ -22,14 +22,14 @@ namespace framework { ...@@ -22,14 +22,14 @@ namespace framework {
template <> template <>
Eigen::DefaultDevice& ExecutionContext::GetEigenDevice< Eigen::DefaultDevice& ExecutionContext::GetEigenDevice<
platform::CPUPlace, Eigen::DefaultDevice>() const { platform::CPUPlace, Eigen::DefaultDevice>() const {
return *device_context_.get_eigen_device<Eigen::DefaultDevice>(); return *device_context_.GetEigenDevice<platform::CPUPlace>();
} }
#ifndef PADDLE_ONLY_CPU #ifndef PADDLE_ONLY_CPU
template <> template <>
Eigen::GpuDevice& Eigen::GpuDevice&
ExecutionContext::GetEigenDevice<platform::GPUPlace, Eigen::GpuDevice>() const { ExecutionContext::GetEigenDevice<platform::GPUPlace, Eigen::GpuDevice>() const {
return *device_context_.get_eigen_device<Eigen::GpuDevice>(); return *device_context_.GetEigenDevice<platform::GPUPlace>();
} }
#endif #endif
......
...@@ -22,6 +22,7 @@ limitations under the License. */ ...@@ -22,6 +22,7 @@ limitations under the License. */
#include "op_info.h" #include "op_info.h"
#include "paddle/framework/attribute.h" #include "paddle/framework/attribute.h"
#include "paddle/framework/data_type.h"
#include "paddle/framework/framework.pb.h" #include "paddle/framework/framework.pb.h"
#include "paddle/framework/lod_tensor.h" #include "paddle/framework/lod_tensor.h"
#include "paddle/framework/scope.h" #include "paddle/framework/scope.h"
...@@ -83,10 +84,6 @@ class OperatorBase { ...@@ -83,10 +84,6 @@ class OperatorBase {
virtual std::string DebugString() const; virtual std::string DebugString() const;
/// InferShape infer the size of Variables used by this Operator with
/// information inside scope
virtual void InferShape(const Scope& scope) const = 0;
/// Net will call this function to Run an op. /// Net will call this function to Run an op.
virtual void Run(const Scope& scope, virtual void Run(const Scope& scope,
const platform::DeviceContext& dev_ctx) const = 0; const platform::DeviceContext& dev_ctx) const = 0;
...@@ -164,7 +161,6 @@ class OperatorBase { ...@@ -164,7 +161,6 @@ class OperatorBase {
class NOP : public OperatorBase { class NOP : public OperatorBase {
public: public:
using OperatorBase::OperatorBase; using OperatorBase::OperatorBase;
void InferShape(const Scope& scope) const override {}
void Run(const Scope& scope, void Run(const Scope& scope,
const platform::DeviceContext& dev_ctx) const override {} const platform::DeviceContext& dev_ctx) const override {}
std::unique_ptr<OperatorBase> Clone() const override { std::unique_ptr<OperatorBase> Clone() const override {
...@@ -300,21 +296,6 @@ template <> ...@@ -300,21 +296,6 @@ template <>
std::vector<Tensor*> InferShapeContext::MultiOutput<Tensor>( std::vector<Tensor*> InferShapeContext::MultiOutput<Tensor>(
const std::string& name) const; const std::string& name) const;
template <typename T>
struct EigenDeviceConverter;
template <>
struct EigenDeviceConverter<platform::CPUPlace> {
using EigenDeviceType = Eigen::DefaultDevice;
};
#ifndef PADDLE_ONLY_CPU
template <>
struct EigenDeviceConverter<platform::GPUPlace> {
using EigenDeviceType = Eigen::GpuDevice;
};
#endif
class ExecutionContext : public InferShapeContext { class ExecutionContext : public InferShapeContext {
public: public:
ExecutionContext(const OperatorBase& op, const Scope& scope, ExecutionContext(const OperatorBase& op, const Scope& scope,
...@@ -322,8 +303,8 @@ class ExecutionContext : public InferShapeContext { ...@@ -322,8 +303,8 @@ class ExecutionContext : public InferShapeContext {
: InferShapeContext(op, scope), device_context_(device_context) {} : InferShapeContext(op, scope), device_context_(device_context) {}
template <typename PlaceType, template <typename PlaceType,
typename DeviceType = typename DeviceType = typename platform::EigenDeviceConverter<
typename EigenDeviceConverter<PlaceType>::EigenDeviceType> PlaceType>::EigenDeviceType>
DeviceType& GetEigenDevice() const; DeviceType& GetEigenDevice() const;
platform::Place GetPlace() const { return device_context_.GetPlace(); } platform::Place GetPlace() const { return device_context_.GetPlace(); }
...@@ -353,6 +334,32 @@ class RuntimeInferShapeContext : public InferShapeContextBase { ...@@ -353,6 +334,32 @@ class RuntimeInferShapeContext : public InferShapeContextBase {
return var != nullptr; return var != nullptr;
} }
bool HasInputs(const std::string& name) const {
auto inputs = op_.Inputs(name);
if (inputs.size() == 0UL) {
return false;
}
for (auto& input : inputs) {
if (scope_.FindVar(input) == nullptr) {
return false;
}
}
return true;
}
bool HasOutputs(const std::string& name) const {
auto outputs = op_.Outputs(name);
if (outputs.size() == 0UL) {
return false;
}
for (auto& output : outputs) {
if (scope_.FindVar(output) == nullptr) {
return false;
}
}
return true;
}
DDim GetInputDim(const std::string& name) const { DDim GetInputDim(const std::string& name) const {
return GetDim(op_.Input(name)); return GetDim(op_.Input(name));
} }
...@@ -408,7 +415,7 @@ class RuntimeInferShapeContext : public InferShapeContextBase { ...@@ -408,7 +415,7 @@ class RuntimeInferShapeContext : public InferShapeContextBase {
const Scope& scope_; const Scope& scope_;
}; };
class OpKernel { class OpKernelBase {
public: public:
/** /**
* ExecutionContext is the only parameter of Kernel Run function. * ExecutionContext is the only parameter of Kernel Run function.
...@@ -419,48 +426,61 @@ class OpKernel { ...@@ -419,48 +426,61 @@ class OpKernel {
virtual void Compute(const ExecutionContext& context) const = 0; virtual void Compute(const ExecutionContext& context) const = 0;
virtual ~OpKernel() {} virtual ~OpKernelBase() = default;
};
template <typename T>
class OpKernel : public OpKernelBase {
public:
using ELEMENT_TYPE = T;
}; };
class OperatorWithKernel : public OperatorBase { class OperatorWithKernel : public OperatorBase {
public: public:
struct OpKernelKey { struct OpKernelKey {
platform::Place place_; platform::Place place_;
DataType data_type_;
OpKernelKey() = default; OpKernelKey(DataType data_type, platform::Place place)
explicit OpKernelKey(const platform::DeviceContext& dev_ctx) { : place_(place), data_type_(data_type) {}
place_ = dev_ctx.GetPlace();
} OpKernelKey(DataType data_type, const platform::DeviceContext& dev_ctx)
: place_(dev_ctx.GetPlace()), data_type_(data_type) {}
bool operator==(const OpKernelKey& o) const { bool operator==(const OpKernelKey& o) const {
return platform::places_are_same_class(place_, o.place_); return platform::places_are_same_class(place_, o.place_) &&
data_type_ == o.data_type_;
} }
}; };
struct OpKernelHash { struct OpKernelHash {
std::hash<bool> hash_; std::hash<int> hash_;
size_t operator()(const OpKernelKey& key) const { size_t operator()(const OpKernelKey& key) const {
return hash_(platform::is_gpu_place(key.place_)); int place = key.place_.which();
int data_type = static_cast<int>(key.data_type_);
int pre_hash = data_type << NUM_PLACE_TYPE_LIMIT_IN_BIT |
(place & ((1 << NUM_PLACE_TYPE_LIMIT_IN_BIT) - 1));
return hash_(pre_hash);
} }
}; };
using OpKernelMap = using OpKernelMap =
std::unordered_map<OpKernelKey, std::unique_ptr<OpKernel>, OpKernelHash>; std::unordered_map<OpKernelKey, std::unique_ptr<OpKernelBase>,
OpKernelHash>;
OperatorWithKernel(const std::string& type, const VariableNameMap& inputs, OperatorWithKernel(const std::string& type, const VariableNameMap& inputs,
const VariableNameMap& outputs, const AttributeMap& attrs) const VariableNameMap& outputs, const AttributeMap& attrs)
: OperatorBase(type, inputs, outputs, attrs) {} : OperatorBase(type, inputs, outputs, attrs) {}
// runtime infershape
void InferShape(const Scope& scope) const override {
auto c = RuntimeInferShapeContext(*this, scope);
InferShape(&c);
}
void Run(const Scope& scope, void Run(const Scope& scope,
const platform::DeviceContext& dev_ctx) const final { const platform::DeviceContext& dev_ctx) const final {
auto& opKernel = AllOpKernels().at(type_).at(OpKernelKey(dev_ctx)); RuntimeInferShapeContext infer_shape_ctx(*this, scope);
opKernel->Compute(ExecutionContext(*this, scope, dev_ctx)); this->InferShape(&infer_shape_ctx);
ExecutionContext ctx(*this, scope, dev_ctx);
auto& opKernel = AllOpKernels().at(type_).at(
OpKernelKey(IndicateDataType(ctx), dev_ctx));
opKernel->Compute(ctx);
} }
static std::unordered_map<std::string /* op_type */, OpKernelMap>& static std::unordered_map<std::string /* op_type */, OpKernelMap>&
...@@ -470,13 +490,43 @@ class OperatorWithKernel : public OperatorBase { ...@@ -470,13 +490,43 @@ class OperatorWithKernel : public OperatorBase {
} }
bool SupportGPU() const override { bool SupportGPU() const override {
OperatorWithKernel::OpKernelKey key; auto& op_kernels = OperatorWithKernel::AllOpKernels().at(type_);
key.place_ = platform::GPUPlace(); return std::any_of(op_kernels.begin(), op_kernels.end(),
return OperatorWithKernel::AllOpKernels().at(type_).count(key) != 0; [](OpKernelMap::const_reference kern_pair) {
return platform::is_gpu_place(kern_pair.first.place_);
});
} }
protected: protected:
virtual void InferShape(InferShapeContextBase* ctx) const = 0; virtual void InferShape(InferShapeContextBase* ctx) const = 0;
// indicate kernel DataType by input data. Defaultly all input data must be
// same.
virtual DataType IndicateDataType(const ExecutionContext& ctx) const {
auto& scope = ctx.scope();
int data_type = -1;
for (auto& input : this->inputs_) {
for (auto& ipt_name : input.second) {
auto* var = scope.FindVar(ipt_name);
if (var != nullptr) {
const Tensor* t = nullptr;
if (var->IsType<Tensor>()) {
t = &var->Get<Tensor>();
} else if (var->IsType<LoDTensor>()) {
t = &var->Get<LoDTensor>();
}
if (t != nullptr) {
int tmp = static_cast<int>(ToDataType(t->type()));
PADDLE_ENFORCE(tmp == data_type || data_type == -1,
"DataType of Paddle Op must be same.");
data_type = tmp;
}
}
}
}
PADDLE_ENFORCE(data_type != -1, "DataType should be indicated by input");
return static_cast<DataType>(data_type);
}
}; };
} // namespace framework } // namespace framework
......
...@@ -27,7 +27,6 @@ class OpWithoutKernelTest : public OperatorBase { ...@@ -27,7 +27,6 @@ class OpWithoutKernelTest : public OperatorBase {
OpWithoutKernelTest(const std::string& type, const VariableNameMap& inputs, OpWithoutKernelTest(const std::string& type, const VariableNameMap& inputs,
const VariableNameMap& outputs, const AttributeMap& attrs) const VariableNameMap& outputs, const AttributeMap& attrs)
: OperatorBase(type, inputs, outputs, attrs), x(1) {} : OperatorBase(type, inputs, outputs, attrs), x(1) {}
void InferShape(const Scope& scope) const override {}
void Run(const Scope& scope, void Run(const Scope& scope,
const platform::DeviceContext& dev_ctx) const override { const platform::DeviceContext& dev_ctx) const override {
++op_run_num; ++op_run_num;
...@@ -87,7 +86,6 @@ TEST(OperatorBase, all) { ...@@ -87,7 +86,6 @@ TEST(OperatorBase, all) {
auto op = paddle::framework::OpRegistry::CreateOp(op_desc); auto op = paddle::framework::OpRegistry::CreateOp(op_desc);
scope.NewVar("OUT1"); scope.NewVar("OUT1");
ASSERT_EQ(paddle::framework::op_run_num, 0); ASSERT_EQ(paddle::framework::op_run_num, 0);
op->InferShape(scope);
op->Run(scope, device_context); op->Run(scope, device_context);
ASSERT_EQ(paddle::framework::op_run_num, 1); ASSERT_EQ(paddle::framework::op_run_num, 1);
} }
...@@ -116,10 +114,13 @@ class OpWithKernelTest : public OperatorWithKernel { ...@@ -116,10 +114,13 @@ class OpWithKernelTest : public OperatorWithKernel {
protected: protected:
void InferShape(framework::InferShapeContextBase* ctx) const override {} void InferShape(framework::InferShapeContextBase* ctx) const override {}
DataType IndicateDataType(const ExecutionContext& ctx) const override {
return DataType::FP32;
}
}; };
template <typename T1, typename T2> template <typename T1, typename T2>
class CPUKernelTest : public OpKernel { class CPUKernelTest : public OpKernel<float> {
public: public:
void Compute(const ExecutionContext& ctx) const { void Compute(const ExecutionContext& ctx) const {
std::cout << "this is cpu kernel" << std::endl; std::cout << "this is cpu kernel" << std::endl;
...@@ -146,7 +147,7 @@ class OpKernelTestMultiInputsProtoAndCheckerMaker ...@@ -146,7 +147,7 @@ class OpKernelTestMultiInputsProtoAndCheckerMaker
} }
}; };
class CPUKernalMultiInputsTest : public OpKernel { class CPUKernalMultiInputsTest : public OpKernel<float> {
public: public:
void Compute(const ExecutionContext& ctx) const { void Compute(const ExecutionContext& ctx) const {
auto xs = ctx.op().Inputs("xs"); auto xs = ctx.op().Inputs("xs");
...@@ -255,7 +256,6 @@ class OperatorClone : public paddle::framework::OperatorBase { ...@@ -255,7 +256,6 @@ class OperatorClone : public paddle::framework::OperatorBase {
const paddle::framework::VariableNameMap& outputs, const paddle::framework::VariableNameMap& outputs,
const paddle::framework::AttributeMap& attrs) const paddle::framework::AttributeMap& attrs)
: OperatorBase(type, inputs, outputs, attrs) {} : OperatorBase(type, inputs, outputs, attrs) {}
void InferShape(const paddle::framework::Scope& scope) const override {}
void Run(const paddle::framework::Scope& scope, void Run(const paddle::framework::Scope& scope,
const paddle::platform::DeviceContext& dev_ctx) const override {} const paddle::platform::DeviceContext& dev_ctx) const override {}
}; };
......
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/framework/program_desc.h"
#include "paddle/framework/block_desc.h"
namespace paddle {
namespace framework {
using ProgDescMap =
std::unordered_map<ProgramDesc *, std::unique_ptr<ProgramDescBind>>;
static ProgDescMap *g_bind_map = nullptr;
ProgramDescBind &ProgramDescBind::Instance(ProgramDesc *prog) {
if (g_bind_map == nullptr) {
g_bind_map = new ProgDescMap();
}
auto &map = *g_bind_map;
auto &ptr = map[prog];
if (ptr == nullptr) {
ptr.reset(new ProgramDescBind(prog));
}
return *ptr;
}
BlockDescBind *ProgramDescBind::AppendBlock(const BlockDescBind &parent) {
auto *b = prog_->add_blocks();
b->set_parent_idx(parent.ID());
b->set_idx(prog_->blocks_size() - 1);
blocks_.emplace_back(new BlockDescBind(this, b));
return blocks_.back().get();
}
ProgramDesc *ProgramDescBind::Proto() {
for (auto &block : blocks_) {
block->Sync();
}
return prog_;
}
ProgramDescBind::ProgramDescBind(ProgramDesc *prog) {
prog_ = prog;
for (auto &block : *prog->mutable_blocks()) {
blocks_.emplace_back(new BlockDescBind(this, &block));
}
}
} // namespace framework
} // namespace paddle
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <vector>
#include "paddle/framework/framework.pb.h"
namespace paddle {
namespace framework {
class BlockDescBind;
class ProgramDescBind {
public:
static ProgramDescBind &Instance(ProgramDesc *prog);
ProgramDescBind(const ProgramDescBind &o) = delete;
ProgramDescBind &operator=(const ProgramDescBind &o) = delete;
BlockDescBind *AppendBlock(const BlockDescBind &parent);
BlockDescBind *Block(size_t idx) { return blocks_[idx].get(); }
std::string DebugString() { return Proto()->DebugString(); }
size_t Size() const { return blocks_.size(); }
ProgramDesc *Proto();
private:
explicit ProgramDescBind(ProgramDesc *prog);
// Not owned
ProgramDesc *prog_;
std::vector<std::unique_ptr<BlockDescBind>> blocks_;
};
} // namespace framework
} // namespace paddle
...@@ -24,6 +24,10 @@ class InferShapeContextBase { ...@@ -24,6 +24,10 @@ class InferShapeContextBase {
virtual ~InferShapeContextBase() {} virtual ~InferShapeContextBase() {}
virtual bool HasInput(const std::string &name) const = 0; virtual bool HasInput(const std::string &name) const = 0;
virtual bool HasOutput(const std::string &name) const = 0; virtual bool HasOutput(const std::string &name) const = 0;
virtual bool HasInputs(const std::string &name) const = 0;
virtual bool HasOutputs(const std::string &name) const = 0;
virtual framework::DDim GetInputDim(const std::string &name) const = 0; virtual framework::DDim GetInputDim(const std::string &name) const = 0;
std::vector<framework::DDim> GetInputsDim(const std::string &name) const { std::vector<framework::DDim> GetInputsDim(const std::string &name) const {
const std::vector<std::string> &names = Inputs(name); const std::vector<std::string> &names = Inputs(name);
......
...@@ -29,20 +29,10 @@ limitations under the License. */ ...@@ -29,20 +29,10 @@ limitations under the License. */
namespace paddle { namespace paddle {
namespace pybind {
namespace details {
template <bool less, size_t i, typename... args>
struct CastToPyBufferImpl;
}
} // namespace pybind
namespace framework { namespace framework {
class Tensor { class Tensor {
public: public:
template <bool less, size_t i, typename... args>
friend struct pybind::details::CastToPyBufferImpl;
template <typename T, size_t D, int MajorType, typename IndexType> template <typename T, size_t D, int MajorType, typename IndexType>
friend struct EigenTensor; friend struct EigenTensor;
...@@ -119,6 +109,8 @@ class Tensor { ...@@ -119,6 +109,8 @@ class Tensor {
return holder_->place(); return holder_->place();
} }
std::type_index type() const { return holder_->type(); }
private: private:
template <typename T> template <typename T>
inline void check_memory_size() const; inline void check_memory_size() const;
......
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/framework/var_desc.h"
namespace paddle {
namespace framework {
void VarDescBind::SetShape(const std::vector<int64_t> &dims) {
VectorToRepeated(dims, desc_.mutable_lod_tensor()->mutable_dims());
}
void VarDescBind::SetDataType(DataType data_type) {
desc_.mutable_lod_tensor()->set_data_type(data_type);
}
std::vector<int64_t> VarDescBind::Shape() const {
return RepeatedToVector(desc_.lod_tensor().dims());
}
DataType VarDescBind::GetDataType() const {
return desc_.lod_tensor().data_type();
}
} // namespace framework
} // namespace paddle
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <vector>
#include "paddle/framework/framework.pb.h"
namespace paddle {
namespace framework {
// convert between std::vector and protobuf repeated.
template <typename T>
inline std::vector<T> RepeatedToVector(
const google::protobuf::RepeatedField<T> &repeated_field) {
std::vector<T> ret;
ret.reserve(repeated_field.size());
std::copy(repeated_field.begin(), repeated_field.end(),
std::back_inserter(ret));
return ret;
}
template <typename T, typename RepeatedField>
inline void VectorToRepeated(const std::vector<T> &vec,
RepeatedField *repeated_field) {
repeated_field->Reserve(vec.size());
for (const auto &elem : vec) {
*repeated_field->Add() = elem;
}
}
// Specialize vector<bool>.
template <typename RepeatedField>
inline void VectorToRepeated(const std::vector<bool> &vec,
RepeatedField *repeated_field) {
repeated_field->Reserve(vec.size());
for (auto elem : vec) {
*repeated_field->Add() = elem;
}
}
class VarDescBind {
public:
explicit VarDescBind(const std::string &name) { desc_.set_name(name); }
VarDesc *Proto() { return &desc_; }
std::string Name() const { return desc_.name(); }
void SetShape(const std::vector<int64_t> &dims);
void SetDataType(DataType data_type);
std::vector<int64_t> Shape() const;
DataType GetDataType() const;
private:
VarDesc desc_;
};
} // namespace framework
} // namespace paddle
...@@ -18,7 +18,6 @@ limitations under the License. */ ...@@ -18,7 +18,6 @@ limitations under the License. */
#include "neon_util.h" #include "neon_util.h"
namespace paddle { namespace paddle {
namespace neon { namespace neon {
#if defined(__ARM_NEON__) || defined(__ARM_NEON) #if defined(__ARM_NEON__) || defined(__ARM_NEON)
...@@ -26,17 +25,20 @@ namespace neon { ...@@ -26,17 +25,20 @@ namespace neon {
template <int filterSize, int stride> template <int filterSize, int stride>
struct DepthwiseConvKernel {}; struct DepthwiseConvKernel {};
inline float32_t conv3x3(float32x4_t r0, inline float32_t conv3x3(const float* r0,
float32x4_t r1, const float* r1,
float32x4_t r2, const float* r2,
float32x4_t k0, float32x4_t k0,
float32x4_t k1, float32x4_t k1,
float32x4_t k2) { float32x4_t k2) {
float32x4_t tmp; float32_t tmp[12];
tmp = vmulq_f32(r0, k0); vst1q_f32(&(tmp[0]), k0);
tmp = vmlaq_f32(tmp, r1, k1); vst1q_f32(&(tmp[4]), k1);
tmp = vmlaq_f32(tmp, r2, k2); vst1q_f32(&(tmp[8]), k2);
return vaddvq_f32(tmp); float32_t sum0 = r0[0] * tmp[0] + r0[1] * tmp[1] + r0[2] * tmp[2];
float32_t sum1 = r1[0] * tmp[4] + r1[1] * tmp[5] + r1[2] * tmp[6];
float32_t sum2 = r2[0] * tmp[8] + r2[1] * tmp[9] + r2[2] * tmp[10];
return sum0 + sum1 + sum2;
} }
inline float32_t conv4x4(float32x4_t r0, inline float32_t conv4x4(float32x4_t r0,
...@@ -136,10 +138,7 @@ struct DepthwiseConvKernel<3, 1> { ...@@ -136,10 +138,7 @@ struct DepthwiseConvKernel<3, 1> {
} }
for (int r = 0; r < remain; r++) { for (int r = 0; r < remain; r++) {
float32x4_t i0 = vld1q_f32(r0); *outputData = conv3x3(r0, r1, r2, k[0], k[1], k[2]);
float32x4_t i1 = vld1q_f32(r1);
float32x4_t i2 = vld1q_f32(r2);
*outputData = conv3x3(i0, i1, i2, k[0], k[1], k[2]);
r0++; r0++;
r1++; r1++;
r2++; r2++;
...@@ -243,10 +242,7 @@ struct DepthwiseConvKernel<3, 2> { ...@@ -243,10 +242,7 @@ struct DepthwiseConvKernel<3, 2> {
} }
for (int r = 0; r < remain; r++) { for (int r = 0; r < remain; r++) {
float32x4_t i0 = vld1q_f32(r0); *outputData = conv3x3(r0, r1, r2, k[0], k[1], k[2]);
float32x4_t i1 = vld1q_f32(r1);
float32x4_t i2 = vld1q_f32(r2);
*outputData = conv3x3(i0, i1, i2, k[0], k[1], k[2]);
r0 += 2; r0 += 2;
r1 += 2; r1 += 2;
r2 += 2; r2 += 2;
......
...@@ -215,13 +215,13 @@ struct testActDesc { ...@@ -215,13 +215,13 @@ struct testActDesc {
static void getAddtoConfig(TestConfig& cfg, const testActDesc& pm) { static void getAddtoConfig(TestConfig& cfg, const testActDesc& pm) {
cfg.biasSize = 0; cfg.biasSize = 0;
cfg.layerConfig.set_type("addto"); cfg.layerConfig.set_type("addto");
size_t layerSize = pm.ih * pm.ih * pm.iw; size_t layerSize = pm.ic * pm.ih * pm.iw;
cfg.layerConfig.set_size(layerSize); cfg.layerConfig.set_size(layerSize);
cfg.inputDefs.push_back({INPUT_DATA, "layer_0", layerSize, 0}); cfg.inputDefs.push_back({INPUT_DATA, "layer_0", layerSize, 0});
cfg.layerConfig.add_inputs(); cfg.layerConfig.add_inputs();
} }
void testActivation(std::string& actType, const testActDesc& pm) { void testActivation(std::string actType, const testActDesc& pm) {
// TODO(TJ): remove me when paddle support elu activation // TODO(TJ): remove me when paddle support elu activation
if (actType == "mkldnn_elu") { if (actType == "mkldnn_elu") {
return; return;
...@@ -240,6 +240,7 @@ TEST(MKLDNNActivation, Activations) { ...@@ -240,6 +240,7 @@ TEST(MKLDNNActivation, Activations) {
for (auto type : types) { for (auto type : types) {
/* bs, c, h, w*/ /* bs, c, h, w*/
testActivation(type, {16, 64, 32, 32}); testActivation(type, {16, 64, 32, 32});
testActivation(type, {2, 8, 1, 1});
} }
} }
......
...@@ -99,7 +99,11 @@ public: ...@@ -99,7 +99,11 @@ public:
/** /**
* @brief clear local buffer. It only affect auto-growth buffer. * @brief clear local buffer. It only affect auto-growth buffer.
*/ */
inline void clear() { rowStore_.clear(); } inline void clear() {
// swap an empty vector to it to free the memory.
std::vector<real, AlignedAllocator<real, 32>> empty;
rowStore_.swap(empty);
}
/** /**
* @brief get current number of rows. * @brief get current number of rows.
......
...@@ -55,12 +55,25 @@ function(op_library TARGET) ...@@ -55,12 +55,25 @@ function(op_library TARGET)
set(pybind_flag 1) set(pybind_flag 1)
endif() endif()
if ("${TARGET}" STREQUAL "pool_op")
set(pybind_flag 1)
# It's enough to just adding one operator to pybind
file(APPEND ${pybind_file} "USE_OP(pool2d);\n")
endif()
# activation_op contains several operators # activation_op contains several operators
if ("${TARGET}" STREQUAL "activation_op") if ("${TARGET}" STREQUAL "activation_op")
set(pybind_flag 1) set(pybind_flag 1)
# It's enough to just adding one operator to pybind # It's enough to just adding one operator to pybind
file(APPEND ${pybind_file} "USE_OP(sigmoid);\n") file(APPEND ${pybind_file} "USE_OP(sigmoid);\n")
endif() endif()
# reduce_op contains several operators
if ("${TARGET}" STREQUAL "reduce_op")
set(pybind_flag 1)
# It's enough to just adding one operator to pybind
file(APPEND ${pybind_file} "USE_OP(reduce_sum);\n")
endif()
# pybind USE_NO_KERNEL_OP # pybind USE_NO_KERNEL_OP
file(READ ${TARGET}.cc TARGET_CONTENT) file(READ ${TARGET}.cc TARGET_CONTENT)
...@@ -94,8 +107,8 @@ set(DEPS_OPS ...@@ -94,8 +107,8 @@ set(DEPS_OPS
op_library(recurrent_op SRCS recurrent_op.cc rnn/recurrent_op_utils.cc op_library(recurrent_op SRCS recurrent_op.cc rnn/recurrent_op_utils.cc
DEPS framework_proto tensor net_op) DEPS framework_proto tensor net_op)
op_library(cond_op SRCS cond_op.cc DEPS framework_proto tensor operator net_op) op_library(cond_op SRCS cond_op.cc DEPS framework_proto tensor operator net_op)
op_library(cross_entropy_op DEPS cross_entropy_function) op_library(cross_entropy_op DEPS cross_entropy)
op_library(softmax_with_cross_entropy_op DEPS cross_entropy_function softmax_function) op_library(softmax_with_cross_entropy_op DEPS cross_entropy softmax)
list(REMOVE_ITEM GENERAL_OPS ${DEPS_OPS}) list(REMOVE_ITEM GENERAL_OPS ${DEPS_OPS})
foreach(src ${GENERAL_OPS}) foreach(src ${GENERAL_OPS})
......
...@@ -47,7 +47,7 @@ __global__ void AccuracyCudaKernel(const int N, const int D, const int* Xdata, ...@@ -47,7 +47,7 @@ __global__ void AccuracyCudaKernel(const int N, const int D, const int* Xdata,
} }
template <typename T> template <typename T>
class AccuracyOpCUDAKernel : public framework::OpKernel { class AccuracyOpCUDAKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()), PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
......
...@@ -35,7 +35,7 @@ template <typename T, int MajorType = Eigen::RowMajor, ...@@ -35,7 +35,7 @@ template <typename T, int MajorType = Eigen::RowMajor,
using EigenScalar = framework::EigenScalar<T, MajorType, IndexType>; using EigenScalar = framework::EigenScalar<T, MajorType, IndexType>;
template <typename Place, typename T> template <typename Place, typename T>
class AccuracyKernel : public framework::OpKernel { class AccuracyKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
auto* inference = ctx.Input<Tensor>("Inference"); auto* inference = ctx.Input<Tensor>("Inference");
......
...@@ -132,6 +132,17 @@ class SquareOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -132,6 +132,17 @@ class SquareOpMaker : public framework::OpProtoAndCheckerMaker {
} }
}; };
class SoftsignOpMaker : public framework::OpProtoAndCheckerMaker {
public:
SoftsignOpMaker(framework::OpProto *proto,
framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "Input of Softsign operator");
AddOutput("Y", "Output of Softsign operator");
AddComment("Softsign activation operator, softsign(x) = x / (1 + |x|)");
}
};
template <typename AttrType> template <typename AttrType>
class BReluOpMaker : public framework::OpProtoAndCheckerMaker { class BReluOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
...@@ -195,111 +206,57 @@ class STanhOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -195,111 +206,57 @@ class STanhOpMaker : public framework::OpProtoAndCheckerMaker {
} // namespace paddle } // namespace paddle
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OP(sigmoid, ops::ActivationOp, ops::SigmoidOpMaker, sigmoid_grad, REGISTER_OP(sigmoid, ops::ActivationOp, ops::SigmoidOpMaker, sigmoid_grad,
ops::ActivationOpGrad); ops::ActivationOpGrad);
REGISTER_OP_CPU_KERNEL(sigmoid,
ops::ActivationKernel<paddle::platform::CPUPlace, float,
ops::SigmoidFunctor<float>>);
REGISTER_OP_CPU_KERNEL(
sigmoid_grad, ops::ActivationGradKernel<paddle::platform::CPUPlace, float,
ops::SigmoidGradFunctor<float>>);
REGISTER_OP(exp, ops::ActivationOp, ops::ExpOpMaker, exp_grad, REGISTER_OP(exp, ops::ActivationOp, ops::ExpOpMaker, exp_grad,
ops::ActivationOpGrad); ops::ActivationOpGrad);
REGISTER_OP_CPU_KERNEL(
exp,
ops::ActivationKernel<paddle::platform::CPUPlace, float, ops::ExpFunctor>);
REGISTER_OP_CPU_KERNEL(exp_grad,
ops::ActivationGradKernel<paddle::platform::CPUPlace,
float, ops::ExpGradFunctor>);
REGISTER_OP(relu, ops::ActivationOp, ops::ReluOpMaker, relu_grad, REGISTER_OP(relu, ops::ActivationOp, ops::ReluOpMaker, relu_grad,
ops::ActivationOpGrad); ops::ActivationOpGrad);
REGISTER_OP_CPU_KERNEL(relu,
ops::ActivationKernel<paddle::platform::CPUPlace, float,
ops::ReluFunctor<float>>);
REGISTER_OP_CPU_KERNEL(
relu_grad, ops::ActivationGradKernel<paddle::platform::CPUPlace, float,
ops::ReluGradFunctor<float>>);
REGISTER_OP(tanh, ops::ActivationOp, ops::TanhOpMaker, tanh_grad, REGISTER_OP(tanh, ops::ActivationOp, ops::TanhOpMaker, tanh_grad,
ops::ActivationOpGrad); ops::ActivationOpGrad);
REGISTER_OP_CPU_KERNEL(
tanh,
ops::ActivationKernel<paddle::platform::CPUPlace, float, ops::TanhFunctor>);
REGISTER_OP_CPU_KERNEL(
tanh_grad, ops::ActivationGradKernel<paddle::platform::CPUPlace, float,
ops::TanhGradFunctor<float>>);
REGISTER_OP(sqrt, ops::ActivationOp, ops::SqrtOpMaker, sqrt_grad, REGISTER_OP(sqrt, ops::ActivationOp, ops::SqrtOpMaker, sqrt_grad,
ops::ActivationOpGrad); ops::ActivationOpGrad);
REGISTER_OP_CPU_KERNEL(
sqrt,
ops::ActivationKernel<paddle::platform::CPUPlace, float, ops::SqrtFunctor>);
REGISTER_OP_CPU_KERNEL(
sqrt_grad, ops::ActivationGradKernel<paddle::platform::CPUPlace, float,
ops::SqrtGradFunctor<float>>);
REGISTER_OP(abs, ops::ActivationOp, ops::AbsOpMaker, abs_grad, REGISTER_OP(abs, ops::ActivationOp, ops::AbsOpMaker, abs_grad,
ops::ActivationOpGrad); ops::ActivationOpGrad);
REGISTER_OP_CPU_KERNEL(
abs,
ops::ActivationKernel<paddle::platform::CPUPlace, float, ops::AbsFunctor>);
REGISTER_OP_CPU_KERNEL(abs_grad,
ops::ActivationGradKernel<paddle::platform::CPUPlace,
float, ops::AbsGradFunctor>);
REGISTER_OP(reciprocal, ops::ActivationOp, ops::ReciprocalOpMaker, REGISTER_OP(reciprocal, ops::ActivationOp, ops::ReciprocalOpMaker,
reciprocal_grad, ops::ActivationOpGrad); reciprocal_grad, ops::ActivationOpGrad);
REGISTER_OP_CPU_KERNEL(reciprocal,
ops::ActivationKernel<paddle::platform::CPUPlace, float,
ops::ReciprocalFunctor<float>>);
REGISTER_OP_CPU_KERNEL(
reciprocal_grad,
ops::ActivationGradKernel<paddle::platform::CPUPlace, float,
ops::ReciprocalGradFunctor<float>>);
REGISTER_OP(log, ops::ActivationOp, ops::LogOpMaker, log_grad, REGISTER_OP(log, ops::ActivationOp, ops::LogOpMaker, log_grad,
ops::ActivationOpGrad); ops::ActivationOpGrad);
REGISTER_OP_CPU_KERNEL(
log,
ops::ActivationKernel<paddle::platform::CPUPlace, float, ops::LogFunctor>);
REGISTER_OP_CPU_KERNEL(
log_grad, ops::ActivationGradKernel<paddle::platform::CPUPlace, float,
ops::LogGradFunctor<float>>);
REGISTER_OP(square, ops::ActivationOp, ops::SquareOpMaker, square_grad, REGISTER_OP(square, ops::ActivationOp, ops::SquareOpMaker, square_grad,
ops::ActivationOpGrad); ops::ActivationOpGrad);
REGISTER_OP_CPU_KERNEL(square,
ops::ActivationKernel<paddle::platform::CPUPlace, float, REGISTER_OP(softsign, ops::ActivationOp, ops::SoftsignOpMaker, softsign_grad,
ops::SquareFunctor>); ops::ActivationOpGrad);
REGISTER_OP_CPU_KERNEL(
square_grad, ops::ActivationGradKernel<paddle::platform::CPUPlace, float,
ops::SquareGradFunctor<float>>);
REGISTER_OP(brelu, ops::ActivationOp, ops::BReluOpMaker<float>, brelu_grad, REGISTER_OP(brelu, ops::ActivationOp, ops::BReluOpMaker<float>, brelu_grad,
ops::ActivationOpGrad); ops::ActivationOpGrad);
REGISTER_OP_CPU_KERNEL(brelu,
ops::BReluKernel<paddle::platform::CPUPlace, float>);
REGISTER_OP_CPU_KERNEL(brelu_grad,
ops::BReluGradKernel<paddle::platform::CPUPlace, float>);
REGISTER_OP(soft_relu, ops::ActivationOp, ops::SoftReluOpMaker<float>, REGISTER_OP(soft_relu, ops::ActivationOp, ops::SoftReluOpMaker<float>,
soft_relu_grad, ops::ActivationOpGrad); soft_relu_grad, ops::ActivationOpGrad);
REGISTER_OP_CPU_KERNEL(soft_relu,
ops::SoftReluKernel<paddle::platform::CPUPlace, float>);
REGISTER_OP_CPU_KERNEL(
soft_relu_grad, ops::SoftReluGradKernel<paddle::platform::CPUPlace, float>);
REGISTER_OP(pow, ops::ActivationOp, ops::PowOpMaker<float>, pow_grad, REGISTER_OP(pow, ops::ActivationOp, ops::PowOpMaker<float>, pow_grad,
ops::ActivationOpGrad); ops::ActivationOpGrad);
REGISTER_OP_CPU_KERNEL(pow, ops::PowKernel<paddle::platform::CPUPlace, float>);
REGISTER_OP_CPU_KERNEL(pow_grad,
ops::PowGradKernel<paddle::platform::CPUPlace, float>);
REGISTER_OP(stanh, ops::ActivationOp, ops::STanhOpMaker<float>, stanh_grad, REGISTER_OP(stanh, ops::ActivationOp, ops::STanhOpMaker<float>, stanh_grad,
ops::ActivationOpGrad); ops::ActivationOpGrad);
REGISTER_OP_CPU_KERNEL(stanh,
ops::STanhKernel<paddle::platform::CPUPlace, float>); #define REGISTER_ACTIVATION_CPU_KERNEL(act_type, functor, grad_functor) \
REGISTER_OP_CPU_KERNEL(stanh_grad, REGISTER_OP_CPU_KERNEL( \
ops::STanhGradKernel<paddle::platform::CPUPlace, float>); act_type, \
paddle::operators::ActivationKernel<paddle::platform::CPUPlace, \
paddle::operators::functor<float>>); \
REGISTER_OP_CPU_KERNEL(act_type##_grad, \
paddle::operators::ActivationGradKernel< \
paddle::platform::CPUPlace, \
paddle::operators::grad_functor<float>>);
FOR_EACH_KERNEL_FUNCTOR(REGISTER_ACTIVATION_CPU_KERNEL);
...@@ -15,86 +15,14 @@ ...@@ -15,86 +15,14 @@
#define EIGEN_USE_GPU #define EIGEN_USE_GPU
#include "paddle/operators/activation_op.h" #include "paddle/operators/activation_op.h"
namespace ops = paddle::operators; #define REGISTER_ACTIVATION_GPU_KERNEL(act_type, functor, grad_functor) \
REGISTER_OP_GPU_KERNEL( \
REGISTER_OP_GPU_KERNEL(sigmoid, act_type, \
ops::ActivationKernel<paddle::platform::GPUPlace, float, paddle::operators::ActivationKernel<paddle::platform::GPUPlace, \
ops::SigmoidFunctor<float>>); paddle::operators::functor<float>>); \
REGISTER_OP_GPU_KERNEL( REGISTER_OP_GPU_KERNEL(act_type##_grad, \
sigmoid_grad, ops::ActivationGradKernel<paddle::platform::GPUPlace, float, paddle::operators::ActivationGradKernel< \
ops::SigmoidGradFunctor<float>>); paddle::platform::GPUPlace, \
paddle::operators::grad_functor<float>>);
REGISTER_OP_GPU_KERNEL(
exp, FOR_EACH_KERNEL_FUNCTOR(REGISTER_ACTIVATION_GPU_KERNEL);
ops::ActivationKernel<paddle::platform::GPUPlace, float, ops::ExpFunctor>);
REGISTER_OP_GPU_KERNEL(exp_grad,
ops::ActivationGradKernel<paddle::platform::GPUPlace,
float, ops::ExpGradFunctor>);
REGISTER_OP_GPU_KERNEL(relu,
ops::ActivationKernel<paddle::platform::GPUPlace, float,
ops::ReluFunctor<float>>);
REGISTER_OP_GPU_KERNEL(
relu_grad, ops::ActivationGradKernel<paddle::platform::GPUPlace, float,
ops::ReluGradFunctor<float>>);
REGISTER_OP_GPU_KERNEL(
tanh,
ops::ActivationKernel<paddle::platform::GPUPlace, float, ops::TanhFunctor>);
REGISTER_OP_GPU_KERNEL(
tanh_grad, ops::ActivationGradKernel<paddle::platform::GPUPlace, float,
ops::TanhGradFunctor<float>>);
REGISTER_OP_GPU_KERNEL(
sqrt,
ops::ActivationKernel<paddle::platform::GPUPlace, float, ops::SqrtFunctor>);
REGISTER_OP_GPU_KERNEL(
sqrt_grad, ops::ActivationGradKernel<paddle::platform::GPUPlace, float,
ops::SqrtGradFunctor<float>>);
REGISTER_OP_GPU_KERNEL(
abs,
ops::ActivationKernel<paddle::platform::GPUPlace, float, ops::AbsFunctor>);
REGISTER_OP_GPU_KERNEL(abs_grad,
ops::ActivationGradKernel<paddle::platform::GPUPlace,
float, ops::AbsGradFunctor>);
REGISTER_OP_GPU_KERNEL(reciprocal,
ops::ActivationKernel<paddle::platform::GPUPlace, float,
ops::ReciprocalFunctor<float>>);
REGISTER_OP_GPU_KERNEL(
reciprocal_grad,
ops::ActivationGradKernel<paddle::platform::GPUPlace, float,
ops::ReciprocalGradFunctor<float>>);
REGISTER_OP_GPU_KERNEL(
log,
ops::ActivationKernel<paddle::platform::GPUPlace, float, ops::LogFunctor>);
REGISTER_OP_GPU_KERNEL(
log_grad, ops::ActivationGradKernel<paddle::platform::GPUPlace, float,
ops::LogGradFunctor<float>>);
REGISTER_OP_GPU_KERNEL(square,
ops::ActivationKernel<paddle::platform::GPUPlace, float,
ops::SquareFunctor>);
REGISTER_OP_GPU_KERNEL(
square_grad, ops::ActivationGradKernel<paddle::platform::GPUPlace, float,
ops::SquareGradFunctor<float>>);
REGISTER_OP_GPU_KERNEL(brelu,
ops::BReluKernel<paddle::platform::GPUPlace, float>);
REGISTER_OP_GPU_KERNEL(brelu_grad,
ops::BReluGradKernel<paddle::platform::GPUPlace, float>);
REGISTER_OP_GPU_KERNEL(soft_relu,
ops::SoftReluKernel<paddle::platform::GPUPlace, float>);
REGISTER_OP_GPU_KERNEL(
soft_relu_grad, ops::SoftReluGradKernel<paddle::platform::GPUPlace, float>);
REGISTER_OP_GPU_KERNEL(pow, ops::PowKernel<paddle::platform::GPUPlace, float>);
REGISTER_OP_GPU_KERNEL(pow_grad,
ops::PowGradKernel<paddle::platform::GPUPlace, float>);
REGISTER_OP_GPU_KERNEL(stanh,
ops::STanhKernel<paddle::platform::GPUPlace, float>);
REGISTER_OP_GPU_KERNEL(stanh_grad,
ops::STanhGradKernel<paddle::platform::GPUPlace, float>);
...@@ -19,9 +19,12 @@ ...@@ -19,9 +19,12 @@
namespace paddle { namespace paddle {
namespace operators { namespace operators {
template <typename Place, typename T, typename Functor> template <typename Place, typename Functor>
class ActivationKernel : public framework::OpKernel { class ActivationKernel
: public framework::OpKernel<typename Functor::ELEMENT_TYPE> {
public: public:
using T = typename Functor::ELEMENT_TYPE;
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
auto* X = context.Input<framework::Tensor>("X"); auto* X = context.Input<framework::Tensor>("X");
auto* Y = context.Output<framework::Tensor>("Y"); auto* Y = context.Output<framework::Tensor>("Y");
...@@ -31,13 +34,20 @@ class ActivationKernel : public framework::OpKernel { ...@@ -31,13 +34,20 @@ class ActivationKernel : public framework::OpKernel {
auto y = framework::EigenVector<T>::Flatten(*Y); auto y = framework::EigenVector<T>::Flatten(*Y);
auto place = context.GetEigenDevice<Place>(); auto place = context.GetEigenDevice<Place>();
Functor functor; Functor functor;
auto attrs = functor.GetAttrs();
for (auto& attr : attrs) {
*attr.second = context.Attr<float>(attr.first);
}
functor(place, x, y); functor(place, x, y);
} }
}; };
template <typename Place, typename T, typename Functor> template <typename Place, typename Functor>
class ActivationGradKernel : public framework::OpKernel { class ActivationGradKernel
: public framework::OpKernel<typename Functor::ELEMENT_TYPE> {
public: public:
using T = typename Functor::ELEMENT_TYPE;
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
auto* X = context.Input<framework::Tensor>("X"); auto* X = context.Input<framework::Tensor>("X");
auto* Y = context.Input<framework::Tensor>("Y"); auto* Y = context.Input<framework::Tensor>("Y");
...@@ -51,303 +61,322 @@ class ActivationGradKernel : public framework::OpKernel { ...@@ -51,303 +61,322 @@ class ActivationGradKernel : public framework::OpKernel {
auto dx = framework::EigenVector<T>::Flatten(*dX); auto dx = framework::EigenVector<T>::Flatten(*dX);
auto place = context.GetEigenDevice<Place>(); auto place = context.GetEigenDevice<Place>();
Functor functor; Functor functor;
auto attrs = functor.GetAttrs();
for (auto& attr : attrs) {
*attr.second = context.Attr<float>(attr.first);
}
functor(place, x, y, dy, dx); functor(place, x, y, dy, dx);
} }
}; };
template <typename T>
struct BaseActivationFunctor {
using ELEMENT_TYPE = T;
using AttrPair = std::vector<std::pair<const char*, float*>>;
AttrPair GetAttrs() { return AttrPair(); }
};
// sigmoid(x) = 1 / (1 + exp(-x)) // sigmoid(x) = 1 / (1 + exp(-x))
template <typename T> template <typename T>
struct SigmoidFunctor { struct SigmoidFunctor : public BaseActivationFunctor<T> {
template <typename Device, typename X, typename Y> template <typename Device, typename X, typename Y>
void operator()(Device d, X x, Y y) { void operator()(Device d, X x, Y y) const {
y.device(d) = static_cast<T>(1) / (static_cast<T>(1) + (-x).exp()); y.device(d) = static_cast<T>(1) / (static_cast<T>(1) + (-x).exp());
} }
}; };
template <typename T> template <typename T>
struct SigmoidGradFunctor { struct SigmoidGradFunctor : public BaseActivationFunctor<T> {
template <typename Device, typename X, typename Y, typename dY, typename dX> template <typename Device, typename X, typename Y, typename dY, typename dX>
void operator()(Device d, X x, Y y, dY dy, dX dx) { void operator()(Device d, X x, Y y, dY dy, dX dx) const {
dx.device(d) = dy * y * (static_cast<T>(1) - y); dx.device(d) = dy * y * (static_cast<T>(1) - y);
} }
}; };
// exp(x) = e^x // exp(x) = e^x
struct ExpFunctor { template <typename T>
struct ExpFunctor : public BaseActivationFunctor<T> {
template <typename Device, typename X, typename Y> template <typename Device, typename X, typename Y>
void operator()(Device d, X x, Y y) { void operator()(Device d, X x, Y y) const {
y.device(d) = x.exp(); y.device(d) = x.exp();
} }
}; };
struct ExpGradFunctor { template <typename T>
struct ExpGradFunctor : public BaseActivationFunctor<T> {
template <typename Device, typename X, typename Y, typename dY, typename dX> template <typename Device, typename X, typename Y, typename dY, typename dX>
void operator()(Device d, X x, Y y, dY dy, dX dx) { void operator()(Device d, X x, Y y, dY dy, dX dx) const {
dx.device(d) = dy * y; dx.device(d) = dy * y;
} }
}; };
// relu(x) = max(x, 0) // relu(x) = max(x, 0)
template <typename T> template <typename T>
struct ReluFunctor { struct ReluFunctor : public BaseActivationFunctor<T> {
template <typename Device, typename X, typename Y> template <typename Device, typename X, typename Y>
void operator()(Device d, X x, Y y) { void operator()(Device d, X x, Y y) const {
y.device(d) = x.cwiseMax(static_cast<T>(0)); y.device(d) = x.cwiseMax(static_cast<T>(0));
} }
}; };
template <typename T> template <typename T>
struct ReluGradFunctor { struct ReluGradFunctor : public BaseActivationFunctor<T> {
template <typename Device, typename X, typename Y, typename dY, typename dX> template <typename Device, typename X, typename Y, typename dY, typename dX>
void operator()(Device d, X x, Y y, dY dy, dX dx) { void operator()(Device d, X x, Y y, dY dy, dX dx) const {
dx.device(d) = dy * (x > static_cast<T>(0)).template cast<T>(); dx.device(d) = dy * (x > static_cast<T>(0)).template cast<T>();
} }
}; };
// tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x)) // tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
struct TanhFunctor { template <typename T>
struct TanhFunctor : public BaseActivationFunctor<T> {
template <typename Device, typename X, typename Y> template <typename Device, typename X, typename Y>
void operator()(Device d, X x, Y y) { void operator()(Device d, X x, Y y) const {
y.device(d) = x.tanh(); y.device(d) = x.tanh();
} }
}; };
template <typename T> template <typename T>
struct TanhGradFunctor { struct TanhGradFunctor : public BaseActivationFunctor<T> {
template <typename Device, typename X, typename Y, typename dY, typename dX> template <typename Device, typename X, typename Y, typename dY, typename dX>
void operator()(Device d, X x, Y y, dY dy, dX dx) { void operator()(Device d, X x, Y y, dY dy, dX dx) const {
dx.device(d) = dy * (static_cast<T>(1) - y * y); dx.device(d) = dy * (static_cast<T>(1) - y * y);
} }
}; };
// sqrt(x) = x^(1/2) // sqrt(x) = x^(1/2)
struct SqrtFunctor { template <typename T>
struct SqrtFunctor : public BaseActivationFunctor<T> {
template <typename Device, typename X, typename Y> template <typename Device, typename X, typename Y>
void operator()(Device d, X x, Y y) { void operator()(Device d, X x, Y y) const {
y.device(d) = x.sqrt(); y.device(d) = x.sqrt();
} }
}; };
template <typename T> template <typename T>
struct SqrtGradFunctor { struct SqrtGradFunctor : public BaseActivationFunctor<T> {
template <typename Device, typename X, typename Y, typename dY, typename dX> template <typename Device, typename X, typename Y, typename dY, typename dX>
void operator()(Device d, X x, Y y, dY dy, dX dx) { void operator()(Device d, X x, Y y, dY dy, dX dx) const {
const Y y_conj = Eigen::numext::conj(y); const Y y_conj = Eigen::numext::conj(y);
dx.device(d) = static_cast<T>(0.5) * dy / y_conj; dx.device(d) = static_cast<T>(0.5) * dy / y_conj;
} }
}; };
// abs(x) = |x| // abs(x) = |x|
struct AbsFunctor { template <typename T>
struct AbsFunctor : public BaseActivationFunctor<T> {
template <typename Device, typename X, typename Y> template <typename Device, typename X, typename Y>
void operator()(Device d, X x, Y y) { void operator()(Device d, X x, Y y) const {
y.device(d) = x.abs(); y.device(d) = x.abs();
} }
}; };
struct AbsGradFunctor { template <typename T>
struct AbsGradFunctor : public BaseActivationFunctor<T> {
template <typename Device, typename X, typename Y, typename dY, typename dX> template <typename Device, typename X, typename Y, typename dY, typename dX>
void operator()(Device d, X x, Y y, dY dy, dX dx) { void operator()(Device d, X x, Y y, dY dy, dX dx) const {
dx.device(d) = dy * x.sign(); dx.device(d) = dy * x.sign();
} }
}; };
// reciprocal(x) = 1 / x // reciprocal(x) = 1 / x
template <typename T> template <typename T>
struct ReciprocalFunctor { struct ReciprocalFunctor : public BaseActivationFunctor<T> {
template <typename Device, typename X, typename Y> template <typename Device, typename X, typename Y>
void operator()(Device d, X x, Y y) { void operator()(Device d, X x, Y y) const {
y.device(d) = static_cast<T>(1) / x; y.device(d) = static_cast<T>(1) / x;
} }
}; };
template <typename T> template <typename T>
struct ReciprocalGradFunctor { struct ReciprocalGradFunctor : public BaseActivationFunctor<T> {
template <typename Device, typename X, typename Y, typename dY, typename dX> template <typename Device, typename X, typename Y, typename dY, typename dX>
void operator()(Device d, X x, Y y, dY dy, dX dx) { void operator()(Device d, X x, Y y, dY dy, dX dx) const {
dx.device(d) = dy * static_cast<T>(-1) * y * y; dx.device(d) = dy * static_cast<T>(-1) * y * y;
} }
}; };
// log(x) = natural logarithm of x // log(x) = natural logarithm of x
struct LogFunctor { template <typename T>
struct LogFunctor : public BaseActivationFunctor<T> {
template <typename Device, typename X, typename Y> template <typename Device, typename X, typename Y>
void operator()(Device d, X x, Y y) { void operator()(Device d, X x, Y y) const {
y.device(d) = x.log(); y.device(d) = x.log();
} }
}; };
template <typename T> template <typename T>
struct LogGradFunctor { struct LogGradFunctor : public BaseActivationFunctor<T> {
template <typename Device, typename X, typename Y, typename dY, typename dX> template <typename Device, typename X, typename Y, typename dY, typename dX>
void operator()(Device d, X x, Y y, dY dy, dX dx) { void operator()(Device d, X x, Y y, dY dy, dX dx) const {
dx.device(d) = dy * (static_cast<T>(1) / x); dx.device(d) = dy * (static_cast<T>(1) / x);
} }
}; };
// square(x) = x^2 // square(x) = x^2
struct SquareFunctor { template <typename T>
struct SquareFunctor : public BaseActivationFunctor<T> {
template <typename Device, typename X, typename Y> template <typename Device, typename X, typename Y>
void operator()(Device d, X x, Y y) { void operator()(Device d, X x, Y y) const {
y.device(d) = x.square(); y.device(d) = x.square();
} }
}; };
template <typename T> template <typename T>
struct SquareGradFunctor { struct SquareGradFunctor : public BaseActivationFunctor<T> {
template <typename Device, typename X, typename Y, typename dY, typename dX> template <typename Device, typename X, typename Y, typename dY, typename dX>
void operator()(Device d, X x, Y y, dY dy, dX dx) { void operator()(Device d, X x, Y y, dY dy, dX dx) const {
dx.device(d) = dy * static_cast<T>(2) * x; dx.device(d) = dy * static_cast<T>(2) * x;
} }
}; };
template <typename Place, typename T, typename AttrType = T> template <typename T>
class BReluKernel : public framework::OpKernel { struct BReluFunctor : public BaseActivationFunctor<T> {
public: float t_min;
void Compute(const framework::ExecutionContext& context) const override { float t_max;
auto* X = context.Input<framework::Tensor>("X");
auto* Y = context.Output<framework::Tensor>("Y"); // NOTE: Explicit hides the `BaseActivationFunctor<T>::GetAttrs`
auto t_min = static_cast<T>(context.Attr<AttrType>("t_min")); // not polymorphism for speed.
auto t_max = static_cast<T>(context.Attr<AttrType>("t_max")); typename BaseActivationFunctor<T>::AttrPair GetAttrs() {
Y->mutable_data<T>(context.GetPlace()); return {{"t_min", &t_min}, {"t_max", &t_max}};
}
auto x = framework::EigenVector<T>::Flatten(*X); template <typename Device, typename X, typename Y>
auto y = framework::EigenVector<T>::Flatten(*Y); void operator()(Device d, X x, Y y) const {
auto place = context.GetEigenDevice<Place>(); y.device(d) = x.cwiseMax(t_min).cwiseMin(t_max);
y.device(place) = x.cwiseMax(t_min).cwiseMin(t_max);
} }
}; };
template <typename Place, typename T, typename AttrType = T> template <typename T>
class BReluGradKernel : public framework::OpKernel { struct BReluGradFunctor : public BaseActivationFunctor<T> {
public: float t_min;
void Compute(const framework::ExecutionContext& context) const override { float t_max;
auto* X = context.Input<framework::Tensor>("X"); typename BaseActivationFunctor<T>::AttrPair GetAttrs() {
auto* dY = context.Input<framework::Tensor>(framework::GradVarName("Y")); return {{"t_min", &t_min}, {"t_max", &t_max}};
auto* dX = context.Output<framework::Tensor>(framework::GradVarName("X")); }
auto t_min = static_cast<T>(context.Attr<AttrType>("t_min")); template <typename Device, typename X, typename Y, typename dY, typename dX>
auto t_max = static_cast<T>(context.Attr<AttrType>("t_max")); void operator()(Device d, X x, Y y, dY dy, dX dx) const {
dX->mutable_data<T>(context.GetPlace()); dx.device(d) = dy * ((x > t_min) * (x < t_max)).template cast<T>();
}
};
auto dy = framework::EigenVector<T>::Flatten(*dY); // softsign(x) = x / (1 + |x|)
auto x = framework::EigenVector<T>::Flatten(*X); template <typename T>
auto dx = framework::EigenVector<T>::Flatten(*dX); struct SoftsignFunctor : public BaseActivationFunctor<T> {
auto place = context.GetEigenDevice<Place>(); template <typename Device, typename X, typename Y>
void operator()(Device d, X x, Y y) {
y.device(d) = x / (static_cast<T>(1) + x.abs());
}
};
dx.device(place) = dy * ((x > t_min) * (x < t_max)).template cast<T>(); // d(softsign(x))/dx = 1 / (1 + |x|)^2
// Taken from https://en.wikipedia.org/wiki/Activation_function
template <typename T>
struct SoftsignGradFunctor : public BaseActivationFunctor<T> {
template <typename Device, typename X, typename Y, typename dY, typename dX>
void operator()(Device d, X x, Y y, dY dy, dX dx) {
dx.device(d) =
dy * (static_cast<T>(1) / (static_cast<T>(1) + x.abs()).square());
} }
}; };
template <typename Place, typename T, typename AttrType = T> template <typename T>
class SoftReluKernel : public framework::OpKernel { struct SoftReluFunctor : public BaseActivationFunctor<T> {
public: float threshold;
void Compute(const framework::ExecutionContext& context) const override { typename BaseActivationFunctor<T>::AttrPair GetAttrs() {
auto* X = context.Input<framework::Tensor>("X"); return {{"threshold", &threshold}};
auto* Y = context.Output<framework::Tensor>("Y"); }
auto threshold = static_cast<T>(context.Attr<AttrType>("threshold"));
Y->mutable_data<T>(context.GetPlace());
auto x = framework::EigenVector<T>::Flatten(*X); template <typename Device, typename X, typename Y>
auto y = framework::EigenVector<T>::Flatten(*Y); void operator()(Device d, X x, Y y) const {
auto place = context.GetEigenDevice<Place>(); auto temp = x.cwiseMax(-threshold).cwiseMin(threshold);
auto temp = x.cwiseMax(-threshold).cwiseMin(threshold).eval(); y.device(d) = (static_cast<T>(1) + temp.exp()).log();
y.device(place) = (static_cast<T>(1) + temp.exp()).log();
} }
}; };
template <typename Place, typename T, typename AttrType = T> template <typename T>
class SoftReluGradKernel : public framework::OpKernel { struct SoftReluGradFunctor : public BaseActivationFunctor<T> {
public: float threshold;
void Compute(const framework::ExecutionContext& context) const override { typename BaseActivationFunctor<T>::AttrPair GetAttrs() {
auto* X = context.Input<framework::Tensor>("X"); return {{"threshold", &threshold}};
auto* Y = context.Input<framework::Tensor>("Y"); }
auto* dY = context.Input<framework::Tensor>(framework::GradVarName("Y")); template <typename Device, typename X, typename Y, typename dY, typename dX>
auto* dX = context.Output<framework::Tensor>(framework::GradVarName("X")); void operator()(Device d, X x, Y y, dY dy, dX dx) const {
auto threshold = static_cast<T>(context.Attr<AttrType>("threshold"));
dX->mutable_data<T>(context.GetPlace());
auto x = framework::EigenVector<T>::Flatten(*X);
auto y = framework::EigenVector<T>::Flatten(*Y);
auto dy = framework::EigenVector<T>::Flatten(*dY);
auto dx = framework::EigenVector<T>::Flatten(*dX);
auto place = context.GetEigenDevice<Place>();
auto temp = ((x > -threshold) * (x < threshold)).template cast<T>().eval(); auto temp = ((x > -threshold) * (x < threshold)).template cast<T>().eval();
dx.device(place) = dy * (static_cast<T>(1) - (-y).exp()) * temp; dx.device(d) = dy * (static_cast<T>(1) - (-y).exp()) * temp;
} }
}; };
template <typename Place, typename T, typename AttrType = T> template <typename T>
class PowKernel : public framework::OpKernel { struct PowFunctor : public BaseActivationFunctor<T> {
public: float factor;
void Compute(const framework::ExecutionContext& context) const override { typename BaseActivationFunctor<T>::AttrPair GetAttrs() {
auto* X = context.Input<framework::Tensor>("X"); return {{"factor", &factor}};
auto* Y = context.Output<framework::Tensor>("Y"); }
auto factor = static_cast<T>(context.Attr<AttrType>("factor")); template <typename Device, typename X, typename Y>
Y->mutable_data<T>(context.GetPlace()); void operator()(Device d, X x, Y y) const {
y.device(d) = x.pow(factor);
auto x = framework::EigenVector<T>::Flatten(*X);
auto y = framework::EigenVector<T>::Flatten(*Y);
auto place = context.GetEigenDevice<Place>();
y.device(place) = x.pow(factor);
} }
}; };
template <typename Place, typename T, typename AttrType = T> template <typename T>
class PowGradKernel : public framework::OpKernel { struct PowGradFunctor : public BaseActivationFunctor<T> {
public: float factor;
void Compute(const framework::ExecutionContext& context) const override { typename BaseActivationFunctor<T>::AttrPair GetAttrs() {
auto* X = context.Input<framework::Tensor>("X"); return {{"factor", &factor}};
auto* dY = context.Input<framework::Tensor>(framework::GradVarName("Y")); }
auto* dX = context.Output<framework::Tensor>(framework::GradVarName("X")); template <typename Device, typename X, typename Y, typename dY, typename dX>
auto factor = static_cast<T>(context.Attr<AttrType>("factor")); void operator()(Device d, X x, Y y, dY dy, dX dx) const {
dX->mutable_data<T>(context.GetPlace()); dx.device(d) = dy * factor * x.pow(factor - static_cast<T>(1));
auto dy = framework::EigenVector<T>::Flatten(*dY);
auto x = framework::EigenVector<T>::Flatten(*X);
auto dx = framework::EigenVector<T>::Flatten(*dX);
auto place = context.GetEigenDevice<Place>();
dx.device(place) = dy * factor * x.pow(factor - static_cast<T>(1));
} }
}; };
template <typename Place, typename T, typename AttrType = T> template <typename T>
class STanhKernel : public framework::OpKernel { struct STanhFunctor : public BaseActivationFunctor<T> {
public: float scale_a;
void Compute(const framework::ExecutionContext& context) const override { float scale_b;
auto* X = context.Input<framework::Tensor>("X"); typename BaseActivationFunctor<T>::AttrPair GetAttrs() {
auto* Y = context.Output<framework::Tensor>("Y"); return {{"scale_a", &scale_a}, {"scale_b", &scale_b}};
auto scale_a = static_cast<T>(context.Attr<AttrType>("scale_a")); }
auto scale_b = static_cast<T>(context.Attr<AttrType>("scale_b"));
Y->mutable_data<T>(context.GetPlace());
auto x = framework::EigenVector<T>::Flatten(*X); template <typename Device, typename X, typename Y>
auto y = framework::EigenVector<T>::Flatten(*Y); void operator()(Device d, X x, Y y) const {
auto place = context.GetEigenDevice<Place>(); y.device(d) = scale_b * (scale_a * x).tanh();
y.device(place) = scale_b * (scale_a * x).tanh();
} }
}; };
template <typename Place, typename T, typename AttrType = T> template <typename T>
class STanhGradKernel : public framework::OpKernel { struct STanhGradFunctor : public BaseActivationFunctor<T> {
public: float scale_a;
void Compute(const framework::ExecutionContext& context) const override { float scale_b;
auto* X = context.Input<framework::Tensor>("X"); typename BaseActivationFunctor<T>::AttrPair GetAttrs() {
auto* dY = context.Input<framework::Tensor>(framework::GradVarName("Y")); return {{"scale_a", &scale_a}, {"scale_b", &scale_b}};
auto* dX = context.Output<framework::Tensor>(framework::GradVarName("X")); }
auto scale_a = static_cast<T>(context.Attr<AttrType>("scale_a"));
auto scale_b = static_cast<T>(context.Attr<AttrType>("scale_b"));
dX->mutable_data<T>(context.GetPlace());
auto dy = framework::EigenVector<T>::Flatten(*dY);
auto x = framework::EigenVector<T>::Flatten(*X);
auto dx = framework::EigenVector<T>::Flatten(*dX);
auto place = context.GetEigenDevice<Place>();
template <typename Device, typename X, typename Y, typename dY, typename dX>
void operator()(Device d, X x, Y y, dY dy, dX dx) const {
auto temp = (scale_a * x).tanh() * (scale_a * x).tanh(); auto temp = (scale_a * x).tanh() * (scale_a * x).tanh();
dx.device(place) = dy * scale_a * scale_b * (static_cast<T>(1) - temp); dx.device(d) = dy * scale_a * scale_b * (static_cast<T>(1) - temp);
} }
}; };
} // namespace operators } // namespace operators
} // namespace paddle } // namespace paddle
#define FOR_EACH_KERNEL_FUNCTOR(__macro) \
__macro(sigmoid, SigmoidFunctor, SigmoidGradFunctor); \
__macro(exp, ExpFunctor, ExpGradFunctor); \
__macro(relu, ReluFunctor, ReluGradFunctor); \
__macro(tanh, TanhFunctor, TanhGradFunctor); \
__macro(sqrt, SqrtFunctor, SqrtGradFunctor); \
__macro(abs, AbsFunctor, AbsGradFunctor); \
__macro(reciprocal, ReciprocalFunctor, ReciprocalGradFunctor); \
__macro(log, LogFunctor, LogGradFunctor); \
__macro(square, SquareFunctor, SquareGradFunctor); \
__macro(brelu, BReluFunctor, BReluGradFunctor); \
__macro(soft_relu, SoftReluFunctor, SoftReluGradFunctor); \
__macro(pow, PowFunctor, PowGradFunctor); \
__macro(stanh, STanhFunctor, STanhGradFunctor); \
__macro(softsign, SoftsignFunctor, SoftsignGradFunctor)
...@@ -25,7 +25,7 @@ template <typename T, int MajorType = Eigen::RowMajor, ...@@ -25,7 +25,7 @@ template <typename T, int MajorType = Eigen::RowMajor,
using EigenVector = framework::EigenVector<T, MajorType, IndexType>; using EigenVector = framework::EigenVector<T, MajorType, IndexType>;
template <typename Place, typename T> template <typename Place, typename T>
class AddKernel : public framework::OpKernel { class AddKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
auto* input0 = context.Input<Tensor>("X"); auto* input0 = context.Input<Tensor>("X");
......
...@@ -56,7 +56,7 @@ class ClipGradFunctor { ...@@ -56,7 +56,7 @@ class ClipGradFunctor {
}; };
template <typename Place, typename T> template <typename Place, typename T>
class ClipKernel : public framework::OpKernel { class ClipKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
auto max = context.Attr<T>("max"); auto max = context.Attr<T>("max");
...@@ -73,7 +73,7 @@ class ClipKernel : public framework::OpKernel { ...@@ -73,7 +73,7 @@ class ClipKernel : public framework::OpKernel {
}; };
template <typename Place, typename T> template <typename Place, typename T>
class ClipGradKernel : public framework::OpKernel { class ClipGradKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
auto max = context.Attr<T>("max"); auto max = context.Attr<T>("max");
......
...@@ -22,7 +22,7 @@ namespace paddle { ...@@ -22,7 +22,7 @@ namespace paddle {
namespace operators { namespace operators {
template <typename Place, typename T> template <typename Place, typename T>
class ConcatKernel : public framework::OpKernel { class ConcatKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
auto ins = ctx.MultiInput<framework::Tensor>("X"); auto ins = ctx.MultiInput<framework::Tensor>("X");
...@@ -44,7 +44,7 @@ class ConcatKernel : public framework::OpKernel { ...@@ -44,7 +44,7 @@ class ConcatKernel : public framework::OpKernel {
}; };
template <typename Place, typename T> template <typename Place, typename T>
class ConcatGradKernel : public framework::OpKernel { class ConcatGradKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& ctx) const { void Compute(const framework::ExecutionContext& ctx) const {
auto* in = ctx.Input<framework::Tensor>(framework::GradVarName("Out")); auto* in = ctx.Input<framework::Tensor>(framework::GradVarName("Out"));
......
...@@ -14,12 +14,7 @@ limitations under the License. */ ...@@ -14,12 +14,7 @@ limitations under the License. */
#include "paddle/operators/cond_op.h" #include "paddle/operators/cond_op.h"
#include <cstring>
#include <sstream>
#include "paddle/framework/op_registry.h"
#include "paddle/operators/gather.h" #include "paddle/operators/gather.h"
#include "paddle/operators/net_op.h"
#include "paddle/operators/scatter.h" #include "paddle/operators/scatter.h"
namespace paddle { namespace paddle {
...@@ -31,142 +26,104 @@ using Tensor = framework::Tensor; ...@@ -31,142 +26,104 @@ using Tensor = framework::Tensor;
using LoDTensor = framework::LoDTensor; using LoDTensor = framework::LoDTensor;
using DDim = framework::DDim; using DDim = framework::DDim;
void CondOp::CreateScope(const Scope& scope) const { framework::Scope& CondOp::AddSubScope(const Scope& scope) const {
auto sub_scopes_var = scope.FindVar("SubScopes"); auto sub_scopes_var = scope.FindVar("SubScopes");
PADDLE_ENFORCE_NOT_NULL(sub_scopes_var, PADDLE_ENFORCE_NOT_NULL(sub_scopes_var,
"Output(SubScopes) of CondOp should not be null."); "Output(SubScopes) of CondOp should not be null.");
auto sub_scopes = sub_scopes_var->GetMutable<std::vector<Scope*>>(); auto sub_scopes = sub_scopes_var->GetMutable<std::vector<Scope*>>();
auto& sub_scope = scope.NewScope(); auto& sub_scope = scope.NewScope();
sub_scopes->push_back(&sub_scope); sub_scopes->push_back(&sub_scope);
return sub_scope;
} }
void CondOp::CreateIndexTensor(const Scope& scope) const { std::vector<framework::Scope*>& CondOp::GetSubScopes(
const framework::Scope& scope) const {
auto sub_scopes_var = scope.FindVar("SubScopes");
PADDLE_ENFORCE_NOT_NULL(sub_scopes_var,
"Output(SubScopes) of CondOp should not be null.");
return *sub_scopes_var->GetMutable<std::vector<framework::Scope*>>();
}
LoDTensor& CondOp::AddIndexTensor(const Scope& scope) const {
auto index_tensors_var = scope.FindVar("IndexTensors"); auto index_tensors_var = scope.FindVar("IndexTensors");
PADDLE_ENFORCE_NOT_NULL(index_tensors_var, PADDLE_ENFORCE_NOT_NULL(index_tensors_var,
"Output(IndexTensors) of CondOp should not be null."); "Output(IndexTensors) of CondOp should not be null.");
auto& index_tensors = auto& index_tensors =
*index_tensors_var->GetMutable<std::vector<LoDTensor>>(); *index_tensors_var->GetMutable<std::vector<LoDTensor>>();
index_tensors.push_back(LoDTensor()); index_tensors.push_back(LoDTensor());
return index_tensors.back();
} }
void CondOp::InferShape(const Scope& scope) const { std::vector<framework::LoDTensor>& CondOp::GetIndexTensors(
auto sub_scopes_var = scope.FindVar("SubScopes"); const framework::Scope& scope) const {
PADDLE_ENFORCE_NOT_NULL(sub_scopes_var,
"Output(SubScopes) of CondOp should not be null.");
auto& sub_scopes = *sub_scopes_var->GetMutable<std::vector<Scope*>>();
for (int i = 0; i < 2; ++i) {
// Create two sub scopes for true and false branches
// sub_scopes[0] for the true branch and sub_scopes[1] for the false
// branch
CreateScope(scope);
// Create two tensors for true and false indices
// index_tensors[0] for the true branch and index_tensors[1] for the false
// branch
CreateIndexTensor(scope);
PADDLE_ENFORCE(!Inputs("Xs").empty(),
"Inputs(Xs) of CondOp can't be empty.");
for (auto& input : Inputs("Xs")) {
// Create a new tensor in sub-scope for input-type tensor
Variable* v = sub_scopes[i]->NewVar(input);
LoDTensor* sub_input = v->GetMutable<LoDTensor>();
sub_input->Resize(scope.FindVar(input)->GetMutable<LoDTensor>()->dims());
}
for (auto& output : (*sub_net_op_[i]).Outputs()) {
for (auto& var_name : output.second) {
sub_scopes[i]->NewVar(var_name);
}
}
// each net calls InferShape
sub_net_op_[i]->InferShape(*sub_scopes[i]);
}
for (auto& output : Outputs("Outs")) {
LoDTensor* tensor_t_out =
sub_scopes[0]->FindVar(output)->GetMutable<LoDTensor>();
PADDLE_ENFORCE_NOT_NULL(tensor_t_out, "True output should not be NULL");
LoDTensor* tensor_f_out =
sub_scopes[1]->FindVar(output)->GetMutable<LoDTensor>();
PADDLE_ENFORCE_NOT_NULL(tensor_f_out, "False output should not be NULL");
auto* tensor_out_var = scope.FindVar(output);
PADDLE_ENFORCE_NOT_NULL(tensor_out_var, "Output not found");
LoDTensor* tensor_out = tensor_out_var->GetMutable<LoDTensor>();
PADDLE_ENFORCE_NOT_NULL(tensor_t_out,
"True output tensor should not be NULL");
// check output size should be same
PADDLE_ENFORCE_EQ(tensor_t_out->dims(), tensor_f_out->dims(),
"Outputs not of the same shape");
tensor_out->Resize(tensor_t_out->dims());
// tensor_out->mutable_data<float>(tensor_out->dims(),
// platform::CPUPlace());
tensor_out->mutable_data<float>(platform::CPUPlace());
}
}
void CondOp::Run(const Scope& scope,
const platform::DeviceContext& dev_ctx) const {
auto* sub_scopes_var = scope.FindVar("SubScopes");
PADDLE_ENFORCE_NOT_NULL(sub_scopes_var,
"Output(SubScopes) of CondOp should not be null.");
auto sub_scopes = sub_scopes_var->Get<std::vector<Scope*>>();
auto* index_tensors_var = scope.FindVar("IndexTensors"); auto* index_tensors_var = scope.FindVar("IndexTensors");
PADDLE_ENFORCE_NOT_NULL(index_tensors_var, PADDLE_ENFORCE_NOT_NULL(index_tensors_var,
"Output(IndexTensors) of CondOp should not be null."); "Output(IndexTensors) of CondOp should not be null.");
auto index_tensors = index_tensors_var->Get<std::vector<LoDTensor>>(); return *index_tensors_var->GetMutable<std::vector<framework::LoDTensor>>();
}
std::string cond_name = Input("Cond"); void CondOp::PrepareDataForSubnet(
Variable* cond_var = scope.FindVar(cond_name); const framework::Scope& scope,
const platform::DeviceContext& dev_ctx) const {
PADDLE_ENFORCE(!Inputs("Xs").empty(), "Inputs(Xs) of CondOp can't be empty.");
for (int i = 0; i < BRANCH_NUM; ++i) {
// Create two sub scopes for true and false branches
// sub_scopes[0] for the true branch
// sub_scopes[1] for the false branch
AddSubScope(scope);
// Create two tensors for true and false indices:
// index_tensors[0] for the true branch
// index_tensors[1] for the false branch
AddIndexTensor(scope);
}
Variable* cond_var = scope.FindVar(Input("Cond"));
PADDLE_ENFORCE_NOT_NULL(cond_var, PADDLE_ENFORCE_NOT_NULL(cond_var,
"Input(Cond) of CondOp should not be null."); "Input(Cond) of CondOp should not be null.");
const LoDTensor* cond = cond_var->GetMutable<LoDTensor>(); const LoDTensor* cond = cond_var->GetMutable<LoDTensor>();
// Step 1: get the true/false index at runtime // get the true/false index at runtime according to cond tensor
// index_[0]: vector<int>, contains all index for cond[i] == true // index_vectors[0]: vector<int>, contains all index for cond[i] == true
// index_[1]: vector<int>, contains all index for cond[i] == false // index_vectors[1]: vector<int>, contains all index for cond[i] == false
for (int i = 0; i < 2; ++i) index_[i].clear(); std::vector<std::vector<int>> index_vectors;
index_vectors.resize(BRANCH_NUM);
const int* cond_data = cond->data<int>(); const int* cond_data = cond->data<int>();
for (int i = 0; i < cond->dims()[0]; ++i) { for (int i = 0; i < cond->dims()[0]; ++i) {
if (cond_data[i]) if (cond_data[i])
index_[0].push_back(i); index_vectors[TRUE_BRANCH].push_back(i);
else else
index_[1].push_back(i); index_vectors[FALSE_BRANCH].push_back(i);
} }
// put index_[0] and index_[1] into two tensors: // put index_vectors[0] and index_vectors[1] into two tensors:
// index_tensor_[0] and index_tensor_[1] // index_tensors[0] and index_tensors[1]
DDim dim = paddle::framework::make_ddim({0}); std::vector<framework::LoDTensor>& index_tensors = GetIndexTensors(scope);
for (int i = 0; i < 2; ++i) { std::vector<framework::Scope*>& sub_scopes = GetSubScopes(scope);
dim[0] = index_[i].size();
int* tmp_ptr = for (int i = 0; i < BRANCH_NUM; ++i) {
DDim dim = {static_cast<int64_t>(index_vectors[i].size())};
int* index_tensor_data_ptr =
index_tensors[i].mutable_data<int>(dim, platform::CPUPlace()); index_tensors[i].mutable_data<int>(dim, platform::CPUPlace());
index_tensors[i].Resize(dim); memcpy(index_tensor_data_ptr, index_vectors[i].data(),
memcpy(tmp_ptr, index_[i].data(), dim[0] * sizeof(int)); dim[0] * sizeof(int));
} }
// Step 2: collect data by calling gather // create input in subscopes according to index_vectors
for (int i = 0; i < 2; ++i) { for (auto& input : Inputs("Xs")) {
// i= 0/i for True and False branches respectively Variable* var_parent = scope.FindVar(input);
for (auto& input : Inputs("Xs")) { PADDLE_ENFORCE_NOT_NULL(var_parent);
// find Tensor const auto* tensor_parent = &var_parent->Get<LoDTensor>();
Variable* v = scope.FindVar(input);
PADDLE_ENFORCE_NOT_NULL(v);
LoDTensor* tensor_parent = v->GetMutable<LoDTensor>();
v = sub_scopes[i]->FindVar(input); for (int i = 0; i < BRANCH_NUM; ++i) {
PADDLE_ENFORCE_NOT_NULL(v); Variable* var_child = sub_scopes[i]->FindVar(input);
LoDTensor* tensor_child = v->GetMutable<LoDTensor>(); PADDLE_ENFORCE_NOT_NULL(var_child);
auto* tensor_child = var_child->GetMutable<LoDTensor>();
// Resize child // Resize child
DDim dim = tensor_child->dims(); DDim dim = tensor_parent->dims();
dim[0] = index_[i].size(); dim[0] = index_tensors[i].dims()[0];
tensor_child->Resize(dim);
tensor_child->mutable_data<float>(dim, platform::CPUPlace()); tensor_child->mutable_data<float>(dim, platform::CPUPlace());
Gather<float>(dev_ctx.GetPlace(), tensor_parent, &index_tensors[i], Gather<float>(dev_ctx.GetPlace(), tensor_parent, &index_tensors[i],
...@@ -174,32 +131,79 @@ void CondOp::Run(const Scope& scope, ...@@ -174,32 +131,79 @@ void CondOp::Run(const Scope& scope,
} }
} }
// Step 3: run // create output_tensors in subscope for sub_net
for (int i = 0; i < 2; ++i) { for (int i = 0; i < BRANCH_NUM; ++i) {
sub_net_op_[i]->Run(*sub_scopes[i], dev_ctx); for (auto& output : (*sub_net_op_[i]).Outputs()) {
for (auto& var_name : output.second) {
sub_scopes[i]->NewVar(var_name);
}
}
} }
}
// Step 4: merge output results void CondOp::MergeDataFromSubnet(const framework::Scope& scope,
const platform::DeviceContext& dev_ctx) const {
std::vector<framework::Scope*>& sub_scopes = GetSubScopes(scope);
const std::vector<framework::LoDTensor>& index_tensors =
GetIndexTensors(scope);
// Infer the output dim, out_dim[0] = true_dim[0] + false_dim[0]
PADDLE_ENFORCE(!Outputs("Outs").empty(), PADDLE_ENFORCE(!Outputs("Outs").empty(),
"Outputs(Outs) of CondOp can't be empty."); "Outputs(Outs) of CondOp can't be empty.");
for (int i = 0; i < 2; ++i) { for (auto& output : Outputs("Outs")) {
// i= 0/i for True and False branches respectively const LoDTensor* tensor_t_out =
for (auto& output : Outputs("Outs")) { &sub_scopes[TRUE_BRANCH]->FindVar(output)->Get<LoDTensor>();
// find Tensor PADDLE_ENFORCE_NOT_NULL(tensor_t_out, "True output should not be NULL");
Variable* v = scope.FindVar(output); const LoDTensor* tensor_f_out =
PADDLE_ENFORCE_NOT_NULL(v); &sub_scopes[FALSE_BRANCH]->FindVar(output)->Get<LoDTensor>();
LoDTensor* tensor_parent = v->GetMutable<LoDTensor>(); PADDLE_ENFORCE_NOT_NULL(tensor_f_out, "False output should not be NULL");
v = sub_scopes[i]->FindVar(output); auto* var_out = scope.FindVar(output);
PADDLE_ENFORCE_NOT_NULL(v); PADDLE_ENFORCE_NOT_NULL(var_out, "Output not found");
LoDTensor* tensor_child = v->GetMutable<LoDTensor>(); LoDTensor* tensor_out = var_out->GetMutable<LoDTensor>();
PADDLE_ENFORCE_NOT_NULL(tensor_t_out,
"True output tensor should not be NULL");
DDim true_dim = tensor_t_out->dims();
DDim false_dim = tensor_f_out->dims();
true_dim[0] = 0;
false_dim[0] = 0;
PADDLE_ENFORCE_EQ(true_dim, false_dim,
"Outputs not of the same shape except the first dim");
DDim out_dim = tensor_t_out->dims();
out_dim[0] = tensor_t_out->dims()[0] + tensor_f_out->dims()[0];
tensor_out->Resize(out_dim);
tensor_out->mutable_data<float>(platform::CPUPlace());
}
// merge output results:
// output_tensor = true_output_tensor + false_output_tensor
for (auto& output : Outputs("Outs")) {
Variable* var_parent = scope.FindVar(output);
PADDLE_ENFORCE_NOT_NULL(var_parent);
auto* tensor_parent = var_parent->GetMutable<LoDTensor>();
for (int i = 0; i < BRANCH_NUM; ++i) {
Variable* var_child = sub_scopes[i]->FindVar(output);
PADDLE_ENFORCE_NOT_NULL(var_child);
auto* tensor_child = &var_child->Get<LoDTensor>();
ScatterUpdate<float>(dev_ctx.GetPlace(), tensor_child, &index_tensors[i], ScatterUpdate<float>(dev_ctx.GetPlace(), tensor_child, &index_tensors[i],
tensor_parent); tensor_parent);
} }
} }
} }
void CondOp::Run(const Scope& scope,
const platform::DeviceContext& dev_ctx) const {
PrepareDataForSubnet(scope, dev_ctx);
std::vector<framework::Scope*>& sub_scopes = GetSubScopes(scope);
for (int i = 0; i < BRANCH_NUM; ++i) {
sub_net_op_[i]->Run(*sub_scopes[i], dev_ctx);
}
MergeDataFromSubnet(scope, dev_ctx);
}
class CondOpProtoAndCheckerMaker : public framework::OpProtoAndCheckerMaker { class CondOpProtoAndCheckerMaker : public framework::OpProtoAndCheckerMaker {
public: public:
CondOpProtoAndCheckerMaker(framework::OpProto* proto, CondOpProtoAndCheckerMaker(framework::OpProto* proto,
......
...@@ -40,8 +40,7 @@ class CondOp : public framework::OperatorBase { ...@@ -40,8 +40,7 @@ class CondOp : public framework::OperatorBase {
const framework::VariableNameMap& outputs, const framework::VariableNameMap& outputs,
const framework::AttributeMap& attrs) const framework::AttributeMap& attrs)
: OperatorBase(type, inputs, outputs, attrs) { : OperatorBase(type, inputs, outputs, attrs) {
index_.resize(2); sub_net_op_.resize(BRANCH_NUM);
sub_net_op_.resize(2);
} }
CondOp(const CondOp& o) CondOp(const CondOp& o)
...@@ -51,40 +50,44 @@ class CondOp : public framework::OperatorBase { ...@@ -51,40 +50,44 @@ class CondOp : public framework::OperatorBase {
PADDLE_THROW("Not implemented"); PADDLE_THROW("Not implemented");
} }
void CreateScope(const framework::Scope& scope) const; framework::Scope& AddSubScope(const framework::Scope& scope) const;
std::vector<framework::Scope*>& GetSubScopes(
const framework::Scope& scope) const;
void CreateIndexTensor(const framework::Scope& scope) const; framework::LoDTensor& AddIndexTensor(const framework::Scope& scope) const;
std::vector<framework::LoDTensor>& GetIndexTensors(
const framework::Scope& scope) const;
/* void PrepareDataForSubnet(const framework::Scope& scope,
* InferShape must be called before Run. const platform::DeviceContext& dev_ctx) const;
*/ void MergeDataFromSubnet(const framework::Scope& scope,
void InferShape(const framework::Scope& scope) const override; const platform::DeviceContext& dev_ctx) const;
/* /*
* Set True Block * Set True Block
*/ */
void set_truenet(std::unique_ptr<OperatorBase>&& net) { void set_truenet(std::unique_ptr<OperatorBase>&& net) {
sub_net_op_[0] = std::move(net); sub_net_op_[TRUE_BRANCH] = std::move(net);
} }
/* /*
* Set False Block * Set False Block
*/ */
void set_falsenet(std::unique_ptr<OperatorBase>&& net) { void set_falsenet(std::unique_ptr<OperatorBase>&& net) {
sub_net_op_[1] = std::move(net); sub_net_op_[FALSE_BRANCH] = std::move(net);
} }
void Run(const framework::Scope& scope, void Run(const framework::Scope& scope,
const platform::DeviceContext& dev_ctx) const override; const platform::DeviceContext& dev_ctx) const override;
private: private:
const int TRUE_BRANCH = 0;
const int FALSE_BRANCH = 1;
const int BRANCH_NUM = 2;
// sub_net_op_[0]: subnet_t // sub_net_op_[0]: subnet_t
// sub_net_op_[1]: subnet_f // sub_net_op_[1]: subnet_f
std::vector<std::unique_ptr<framework::OperatorBase>> sub_net_op_; std::vector<std::unique_ptr<framework::OperatorBase>> sub_net_op_;
// index_[0]: True_index;
// index_[1]: False_index;
mutable std::vector<std::vector<int>> index_;
}; };
} // namespace operators } // namespace operators
......
...@@ -28,7 +28,7 @@ template <typename T, int MajorType = Eigen::RowMajor, ...@@ -28,7 +28,7 @@ template <typename T, int MajorType = Eigen::RowMajor,
using EigenVector = framework::EigenVector<T, MajorType, IndexType>; using EigenVector = framework::EigenVector<T, MajorType, IndexType>;
template <typename Place, typename T> template <typename Place, typename T>
class CosSimKernel : public framework::OpKernel { class CosSimKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
// get Tensor // get Tensor
...@@ -67,7 +67,7 @@ class CosSimKernel : public framework::OpKernel { ...@@ -67,7 +67,7 @@ class CosSimKernel : public framework::OpKernel {
}; };
template <typename Place, typename T> template <typename Place, typename T>
class CosSimGradKernel : public framework::OpKernel { class CosSimGradKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
// get Tensor // get Tensor
......
...@@ -27,7 +27,7 @@ using EigenTensor = framework::EigenTensor<T, D, MajorType, IndexType>; ...@@ -27,7 +27,7 @@ using EigenTensor = framework::EigenTensor<T, D, MajorType, IndexType>;
using framework::Tensor; using framework::Tensor;
template <typename T> template <typename T>
class CropKernel : public framework::OpKernel { class CropKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
auto* x = context.Input<Tensor>("X"); auto* x = context.Input<Tensor>("X");
...@@ -69,7 +69,7 @@ void CropGradFunction(const framework::ExecutionContext& context) { ...@@ -69,7 +69,7 @@ void CropGradFunction(const framework::ExecutionContext& context) {
} }
template <typename Place, typename T> template <typename Place, typename T>
class CropGradKernel : public framework::OpKernel { class CropGradKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
size_t rank = size_t rank =
......
...@@ -47,6 +47,12 @@ class CrossEntropyOp : public framework::OperatorWithKernel { ...@@ -47,6 +47,12 @@ class CrossEntropyOp : public framework::OperatorWithKernel {
ctx->SetOutputDim("Y", {x_dims[0], 1}); ctx->SetOutputDim("Y", {x_dims[0], 1});
ctx->ShareLoD("X", /*->*/ "Y"); ctx->ShareLoD("X", /*->*/ "Y");
} }
// CrossEntropy's data type just determined by "X"
framework::DataType IndicateDataType(
const framework::ExecutionContext& ctx) const override {
return framework::ToDataType(ctx.Input<Tensor>("X")->type());
}
}; };
class CrossEntropyGradientOp : public framework::OperatorWithKernel { class CrossEntropyGradientOp : public framework::OperatorWithKernel {
...@@ -87,6 +93,12 @@ class CrossEntropyGradientOp : public framework::OperatorWithKernel { ...@@ -87,6 +93,12 @@ class CrossEntropyGradientOp : public framework::OperatorWithKernel {
} }
ctx->SetOutputDim(framework::GradVarName("X"), x_dims); ctx->SetOutputDim(framework::GradVarName("X"), x_dims);
} }
// CrossEntropy's data type just determined by "X"
framework::DataType IndicateDataType(
const framework::ExecutionContext& ctx) const override {
return framework::ToDataType(ctx.Input<Tensor>("X")->type());
}
}; };
class CrossEntropyOpMaker : public framework::OpProtoAndCheckerMaker { class CrossEntropyOpMaker : public framework::OpProtoAndCheckerMaker {
......
...@@ -18,14 +18,6 @@ namespace paddle { ...@@ -18,14 +18,6 @@ namespace paddle {
namespace operators { namespace operators {
namespace { namespace {
// TODO(qingqing): make zero setting a common function.
template <typename T>
__global__ void Zero(T* X, const int N) {
for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < N;
i += blockDim.x * gridDim.x) {
X[i] = 0.0;
}
}
template <typename T> template <typename T>
__global__ void CrossEntropyGradientKernel(T* dX, const T* dY, const T* X, __global__ void CrossEntropyGradientKernel(T* dX, const T* dY, const T* X,
...@@ -53,7 +45,7 @@ __global__ void SoftCrossEntropyGradientKernel(T* dX, const T* dY, const T* X, ...@@ -53,7 +45,7 @@ __global__ void SoftCrossEntropyGradientKernel(T* dX, const T* dY, const T* X,
} // namespace } // namespace
template <typename T> template <typename T>
class CrossEntropyOpCUDAKernel : public framework::OpKernel { class CrossEntropyOpCUDAKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()), PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
...@@ -64,12 +56,12 @@ class CrossEntropyOpCUDAKernel : public framework::OpKernel { ...@@ -64,12 +56,12 @@ class CrossEntropyOpCUDAKernel : public framework::OpKernel {
y->mutable_data<T>(ctx.GetPlace()); y->mutable_data<T>(ctx.GetPlace());
math::CrossEntropyFunctor<platform::GPUPlace, T>()( math::CrossEntropyFunctor<platform::GPUPlace, T>()(
ctx, y, x, label, ctx.Attr<bool>("softLabel")); ctx.device_context(), y, x, label, ctx.Attr<bool>("softLabel"));
} }
}; };
template <typename T> template <typename T>
class CrossEntropyGradientOpCUDAKernel : public framework::OpKernel { class CrossEntropyGradientOpCUDAKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()), PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
...@@ -99,11 +91,7 @@ class CrossEntropyGradientOpCUDAKernel : public framework::OpKernel { ...@@ -99,11 +91,7 @@ class CrossEntropyGradientOpCUDAKernel : public framework::OpKernel {
.stream()>>>(dx_data, dy_data, x_data, label_data, .stream()>>>(dx_data, dy_data, x_data, label_data,
batch_size, class_num); batch_size, class_num);
} else { } else {
Zero<T><<<grid, block, 0, math::SetConstant<platform::GPUPlace, T>(ctx.device_context(), dx, 0);
reinterpret_cast<const platform::CUDADeviceContext&>(
ctx.device_context())
.stream()>>>(dx_data, batch_size * class_num);
auto* label_data = label->data<int>(); auto* label_data = label->data<int>();
grid = (batch_size + block - 1) / block; grid = (batch_size + block - 1) / block;
CrossEntropyGradientKernel<T><<< CrossEntropyGradientKernel<T><<<
......
...@@ -16,6 +16,7 @@ limitations under the License. */ ...@@ -16,6 +16,7 @@ limitations under the License. */
#include "paddle/framework/eigen.h" #include "paddle/framework/eigen.h"
#include "paddle/framework/op_registry.h" #include "paddle/framework/op_registry.h"
#include "paddle/operators/math/cross_entropy.h" #include "paddle/operators/math/cross_entropy.h"
#include "paddle/operators/math/math_function.h"
namespace paddle { namespace paddle {
namespace operators { namespace operators {
...@@ -26,7 +27,7 @@ template <typename T, int MajorType = Eigen::RowMajor, ...@@ -26,7 +27,7 @@ template <typename T, int MajorType = Eigen::RowMajor,
using EigenMatrix = framework::EigenMatrix<T, MajorType, IndexType>; using EigenMatrix = framework::EigenMatrix<T, MajorType, IndexType>;
template <typename T> template <typename T>
class CrossEntropyOpKernel : public framework::OpKernel { class CrossEntropyOpKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
PADDLE_ENFORCE(platform::is_cpu_place(ctx.GetPlace()), PADDLE_ENFORCE(platform::is_cpu_place(ctx.GetPlace()),
...@@ -37,12 +38,12 @@ class CrossEntropyOpKernel : public framework::OpKernel { ...@@ -37,12 +38,12 @@ class CrossEntropyOpKernel : public framework::OpKernel {
y->mutable_data<T>(ctx.GetPlace()); y->mutable_data<T>(ctx.GetPlace());
math::CrossEntropyFunctor<platform::CPUPlace, T>()( math::CrossEntropyFunctor<platform::CPUPlace, T>()(
ctx, y, x, labels, ctx.Attr<bool>("softLabel")); ctx.device_context(), y, x, labels, ctx.Attr<bool>("softLabel"));
} }
}; };
template <typename T> template <typename T>
class CrossEntropyGradientOpKernel : public framework::OpKernel { class CrossEntropyGradientOpKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
PADDLE_ENFORCE(platform::is_cpu_place(ctx.GetPlace()), PADDLE_ENFORCE(platform::is_cpu_place(ctx.GetPlace()),
...@@ -69,8 +70,7 @@ class CrossEntropyGradientOpKernel : public framework::OpKernel { ...@@ -69,8 +70,7 @@ class CrossEntropyGradientOpKernel : public framework::OpKernel {
const T* x_data = x->data<T>(); const T* x_data = x->data<T>();
const int* label_data = label->data<int>(); const int* label_data = label->data<int>();
// TODO(qingqing): make zero setting a common function. math::SetConstant<platform::CPUPlace, T>(ctx.device_context(), dx, 0);
memset(dx_data, 0, sizeof(T) * batch_size * class_num);
for (int i = 0; i < batch_size; ++i) { for (int i = 0; i < batch_size; ++i) {
PADDLE_ASSERT(label_data[i] >= 0 || label_data[i] < class_num); PADDLE_ASSERT(label_data[i] >= 0 || label_data[i] < class_num);
......
...@@ -47,7 +47,7 @@ struct MaskGenerator { ...@@ -47,7 +47,7 @@ struct MaskGenerator {
// Use std::random and thrust::random(thrust is a std library in CUDA) to // Use std::random and thrust::random(thrust is a std library in CUDA) to
// implement uniform random. // implement uniform random.
template <typename Place, typename T, typename AttrType> template <typename Place, typename T, typename AttrType>
class GPUDropoutKernel : public framework::OpKernel { class GPUDropoutKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
auto* x = context.Input<Tensor>("X"); auto* x = context.Input<Tensor>("X");
......
...@@ -26,7 +26,7 @@ template <typename T, int MajorType = Eigen::RowMajor, ...@@ -26,7 +26,7 @@ template <typename T, int MajorType = Eigen::RowMajor,
using EigenMatrix = framework::EigenMatrix<T, MajorType, IndexType>; using EigenMatrix = framework::EigenMatrix<T, MajorType, IndexType>;
template <typename Place, typename T, typename AttrType> template <typename Place, typename T, typename AttrType>
class CPUDropoutKernel : public framework::OpKernel { class CPUDropoutKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
auto* x = context.Input<Tensor>("X"); auto* x = context.Input<Tensor>("X");
...@@ -62,7 +62,7 @@ class CPUDropoutKernel : public framework::OpKernel { ...@@ -62,7 +62,7 @@ class CPUDropoutKernel : public framework::OpKernel {
}; };
template <typename Place, typename T> template <typename Place, typename T>
class DropoutGradKernel : public framework::OpKernel { class DropoutGradKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
PADDLE_ENFORCE(context.Attr<bool>("is_training"), PADDLE_ENFORCE(context.Attr<bool>("is_training"),
......
...@@ -20,7 +20,7 @@ namespace paddle { ...@@ -20,7 +20,7 @@ namespace paddle {
namespace operators { namespace operators {
template <typename Place, typename T> template <typename Place, typename T>
class ElementwiseAddKernel : public framework::OpKernel { class ElementwiseAddKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
ElementwiseCompute<EigenAddFunctor, Place, T>(ctx); ElementwiseCompute<EigenAddFunctor, Place, T>(ctx);
...@@ -101,7 +101,7 @@ struct ElementwiseAddBroadCast2GradFunctor { ...@@ -101,7 +101,7 @@ struct ElementwiseAddBroadCast2GradFunctor {
}; };
template <typename Place, typename T> template <typename Place, typename T>
class ElementwiseAddGradKernel : public framework::OpKernel { class ElementwiseAddGradKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
ElementwiseGradCompute<Place, T, ElementwiseAddGradFunctor<T>, ElementwiseGradCompute<Place, T, ElementwiseAddGradFunctor<T>,
......
...@@ -20,7 +20,7 @@ namespace paddle { ...@@ -20,7 +20,7 @@ namespace paddle {
namespace operators { namespace operators {
template <typename Place, typename T> template <typename Place, typename T>
class ElementwiseDivKernel : public framework::OpKernel { class ElementwiseDivKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
ElementwiseCompute<EigenDivFunctor, Place, T>(ctx); ElementwiseCompute<EigenDivFunctor, Place, T>(ctx);
...@@ -103,7 +103,7 @@ struct ElementwiseDivBroadCast2GradFunctor { ...@@ -103,7 +103,7 @@ struct ElementwiseDivBroadCast2GradFunctor {
}; };
template <typename Place, typename T> template <typename Place, typename T>
class ElementwiseDivGradKernel : public framework::OpKernel { class ElementwiseDivGradKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
ElementwiseGradCompute<Place, T, ElementwiseDivGradFunctor<T>, ElementwiseGradCompute<Place, T, ElementwiseDivGradFunctor<T>,
......
...@@ -36,7 +36,9 @@ REGISTER_OP(elementwise_mul, ops::ElementwiseOp, ops::ElementwiseMulOpMaker, ...@@ -36,7 +36,9 @@ REGISTER_OP(elementwise_mul, ops::ElementwiseOp, ops::ElementwiseMulOpMaker,
elementwise_mul_grad, ops::ElementwiseOpGrad); elementwise_mul_grad, ops::ElementwiseOpGrad);
REGISTER_OP_CPU_KERNEL( REGISTER_OP_CPU_KERNEL(
elementwise_mul, elementwise_mul,
ops::ElementwiseMulKernel<paddle::platform::CPUPlace, float>); ops::ElementwiseMulKernel<paddle::platform::CPUPlace, float>,
ops::ElementwiseMulKernel<paddle::platform::CPUPlace, double>);
REGISTER_OP_CPU_KERNEL( REGISTER_OP_CPU_KERNEL(
elementwise_mul_grad, elementwise_mul_grad,
ops::ElementwiseMulGradKernel<paddle::platform::CPUPlace, float>); ops::ElementwiseMulGradKernel<paddle::platform::CPUPlace, float>,
ops::ElementwiseMulGradKernel<paddle::platform::CPUPlace, double>);
...@@ -19,7 +19,9 @@ namespace ops = paddle::operators; ...@@ -19,7 +19,9 @@ namespace ops = paddle::operators;
REGISTER_OP_GPU_KERNEL( REGISTER_OP_GPU_KERNEL(
elementwise_mul, elementwise_mul,
ops::ElementwiseMulKernel<paddle::platform::GPUPlace, float>); ops::ElementwiseMulKernel<paddle::platform::GPUPlace, float>,
ops::ElementwiseMulKernel<paddle::platform::GPUPlace, double>);
REGISTER_OP_GPU_KERNEL( REGISTER_OP_GPU_KERNEL(
elementwise_mul_grad, elementwise_mul_grad,
ops::ElementwiseMulGradKernel<paddle::platform::GPUPlace, float>); ops::ElementwiseMulGradKernel<paddle::platform::GPUPlace, float>,
ops::ElementwiseMulGradKernel<paddle::platform::GPUPlace, double>);
...@@ -19,7 +19,7 @@ namespace paddle { ...@@ -19,7 +19,7 @@ namespace paddle {
namespace operators { namespace operators {
template <typename Place, typename T> template <typename Place, typename T>
class ElementwiseMulKernel : public framework::OpKernel { class ElementwiseMulKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
ElementwiseCompute<EigenMulFunctor, Place, T>(ctx); ElementwiseCompute<EigenMulFunctor, Place, T>(ctx);
...@@ -102,7 +102,7 @@ struct ElementwiseMulBroadCast2GradFunctor { ...@@ -102,7 +102,7 @@ struct ElementwiseMulBroadCast2GradFunctor {
}; };
template <typename Place, typename T> template <typename Place, typename T>
class ElementwiseMulGradKernel : public framework::OpKernel { class ElementwiseMulGradKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
ElementwiseGradCompute<Place, T, ElementwiseMulGradFunctor<T>, ElementwiseGradCompute<Place, T, ElementwiseMulGradFunctor<T>,
......
...@@ -19,7 +19,7 @@ namespace paddle { ...@@ -19,7 +19,7 @@ namespace paddle {
namespace operators { namespace operators {
template <typename Place, typename T> template <typename Place, typename T>
class ElementwiseSubKernel : public framework::OpKernel { class ElementwiseSubKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
ElementwiseCompute<EigenSubFunctor, Place, T>(ctx); ElementwiseCompute<EigenSubFunctor, Place, T>(ctx);
...@@ -102,7 +102,7 @@ struct ElementwiseSubBroadCast2GradFunctor { ...@@ -102,7 +102,7 @@ struct ElementwiseSubBroadCast2GradFunctor {
}; };
template <typename Place, typename T> template <typename Place, typename T>
class ElementwiseSubGradKernel : public framework::OpKernel { class ElementwiseSubGradKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
ElementwiseGradCompute<Place, T, ElementwiseSubGradFunctor<T>, ElementwiseGradCompute<Place, T, ElementwiseSubGradFunctor<T>,
......
...@@ -100,7 +100,7 @@ class FCOp : public NetOp { ...@@ -100,7 +100,7 @@ class FCOp : public NetOp {
add_out = Output("AddOut"); add_out = Output("AddOut");
AppendOp(framework::OpRegistry::CreateOp( AppendOp(framework::OpRegistry::CreateOp(
"rowwise_add", {{"X", {sum_out}}, {"b", {Input("B")}}}, "elementwise_add", {{"X", {sum_out}}, {"Y", {Input("B")}}},
{{"Out", {add_out}}}, {})); {{"Out", {add_out}}}, {}));
} else { } else {
if (Output("AddOut") != framework::kEmptyVarName) { if (Output("AddOut") != framework::kEmptyVarName) {
......
...@@ -20,7 +20,7 @@ namespace paddle { ...@@ -20,7 +20,7 @@ namespace paddle {
namespace operators { namespace operators {
template <typename Place, typename T> template <typename Place, typename T>
class FillZerosLikeKernel : public framework::OpKernel { class FillZerosLikeKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
auto* output = context.Output<framework::Tensor>("Y"); auto* output = context.Output<framework::Tensor>("Y");
......
...@@ -37,6 +37,11 @@ class GatherOp : public framework::OperatorWithKernel { ...@@ -37,6 +37,11 @@ class GatherOp : public framework::OperatorWithKernel {
output_dims[0] = batch_size; output_dims[0] = batch_size;
ctx->SetOutputDim("Out", output_dims); ctx->SetOutputDim("Out", output_dims);
} }
framework::DataType IndicateDataType(
const framework::ExecutionContext& ctx) const override {
return framework::ToDataType(ctx.Input<Tensor>("X")->type());
}
}; };
class GatherGradOp : public framework::OperatorWithKernel { class GatherGradOp : public framework::OperatorWithKernel {
...@@ -47,6 +52,11 @@ class GatherGradOp : public framework::OperatorWithKernel { ...@@ -47,6 +52,11 @@ class GatherGradOp : public framework::OperatorWithKernel {
void InferShape(framework::InferShapeContextBase* ctx) const override { void InferShape(framework::InferShapeContextBase* ctx) const override {
ctx->SetOutputDim(framework::GradVarName("X"), ctx->GetInputDim("X")); ctx->SetOutputDim(framework::GradVarName("X"), ctx->GetInputDim("X"));
} }
framework::DataType IndicateDataType(
const framework::ExecutionContext& ctx) const override {
return framework::ToDataType(ctx.Input<Tensor>("X")->type());
}
}; };
class GatherOpMaker : public framework::OpProtoAndCheckerMaker { class GatherOpMaker : public framework::OpProtoAndCheckerMaker {
......
...@@ -24,7 +24,7 @@ namespace operators { ...@@ -24,7 +24,7 @@ namespace operators {
using Tensor = framework::Tensor; using Tensor = framework::Tensor;
template <typename Place, typename T> template <typename Place, typename T>
class GatherOpKernel : public framework::OpKernel { class GatherOpKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext &ctx) const override { void Compute(const framework::ExecutionContext &ctx) const override {
auto *X = ctx.Input<Tensor>("X"); auto *X = ctx.Input<Tensor>("X");
...@@ -37,7 +37,7 @@ class GatherOpKernel : public framework::OpKernel { ...@@ -37,7 +37,7 @@ class GatherOpKernel : public framework::OpKernel {
}; };
template <typename Place, typename T> template <typename Place, typename T>
class GatherGradientOpKernel : public framework::OpKernel { class GatherGradientOpKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext &ctx) const override { void Compute(const framework::ExecutionContext &ctx) const override {
auto *Index = ctx.Input<Tensor>("Index"); auto *Index = ctx.Input<Tensor>("Index");
......
...@@ -16,7 +16,7 @@ namespace paddle { ...@@ -16,7 +16,7 @@ namespace paddle {
namespace operators { namespace operators {
template <typename T> template <typename T>
class CPUGaussianRandomKernel : public framework::OpKernel { class CPUGaussianRandomKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
float mean = context.Attr<float>("mean"); float mean = context.Attr<float>("mean");
...@@ -56,6 +56,11 @@ class GaussianRandomOp : public framework::OperatorWithKernel { ...@@ -56,6 +56,11 @@ class GaussianRandomOp : public framework::OperatorWithKernel {
"dims can be one int or array. dims must be set."); "dims can be one int or array. dims must be set.");
ctx->SetOutputDim("Out", framework::make_ddim(temp)); ctx->SetOutputDim("Out", framework::make_ddim(temp));
} }
framework::DataType IndicateDataType(
const framework::ExecutionContext& ctx) const override {
return static_cast<framework::DataType>(Attr<int>("data_type"));
}
}; };
class GaussianRandomOpMaker : public framework::OpProtoAndCheckerMaker { class GaussianRandomOpMaker : public framework::OpProtoAndCheckerMaker {
...@@ -76,6 +81,8 @@ Use to initialize tensor with gaussian random generator. ...@@ -76,6 +81,8 @@ Use to initialize tensor with gaussian random generator.
"Random seed of generator." "Random seed of generator."
"0 means use system wide seed") "0 means use system wide seed")
.SetDefault(0); .SetDefault(0);
AddAttr<int>("data_type", "output data type")
.SetDefault(framework::DataType::FP32);
} }
}; };
......
...@@ -37,7 +37,7 @@ struct GaussianGenerator { ...@@ -37,7 +37,7 @@ struct GaussianGenerator {
}; };
template <typename T> template <typename T>
class GPUGaussianRandomKernel : public framework::OpKernel { class GPUGaussianRandomKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
auto* tensor = context.Output<framework::Tensor>("Out"); auto* tensor = context.Output<framework::Tensor>("Out");
......
...@@ -25,7 +25,7 @@ namespace operators { ...@@ -25,7 +25,7 @@ namespace operators {
using Tensor = framework::Tensor; using Tensor = framework::Tensor;
template <typename Place, typename T> template <typename Place, typename T>
class GemmConv2DKernel : public framework::OpKernel { class GemmConv2DKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
const Tensor* input = context.Input<Tensor>("Input"); const Tensor* input = context.Input<Tensor>("Input");
...@@ -98,7 +98,7 @@ class GemmConv2DKernel : public framework::OpKernel { ...@@ -98,7 +98,7 @@ class GemmConv2DKernel : public framework::OpKernel {
}; };
template <typename Place, typename T> template <typename Place, typename T>
class GemmConvGrad2DKernel : public framework::OpKernel { class GemmConvGrad2DKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
const Tensor* input = context.Input<Tensor>("Input"); const Tensor* input = context.Input<Tensor>("Input");
......
...@@ -36,6 +36,11 @@ class LookupTableOp : public framework::OperatorWithKernel { ...@@ -36,6 +36,11 @@ class LookupTableOp : public framework::OperatorWithKernel {
ctx->SetOutputDim("Out", {ids_dims[0], table_dims[1]}); ctx->SetOutputDim("Out", {ids_dims[0], table_dims[1]});
ctx->ShareLoD("Ids", /*->*/ "Out"); ctx->ShareLoD("Ids", /*->*/ "Out");
} }
framework::DataType IndicateDataType(
const framework::ExecutionContext& ctx) const override {
return framework::ToDataType(ctx.Input<Tensor>("W")->type());
}
}; };
class LookupTableOpMaker : public framework::OpProtoAndCheckerMaker { class LookupTableOpMaker : public framework::OpProtoAndCheckerMaker {
...@@ -69,6 +74,11 @@ class LookupTableOpGrad : public framework::OperatorWithKernel { ...@@ -69,6 +74,11 @@ class LookupTableOpGrad : public framework::OperatorWithKernel {
auto table_dims = ctx->GetInputDim("W"); auto table_dims = ctx->GetInputDim("W");
ctx->SetOutputDim(framework::GradVarName("W"), table_dims); ctx->SetOutputDim(framework::GradVarName("W"), table_dims);
} }
framework::DataType IndicateDataType(
const framework::ExecutionContext& ctx) const override {
return framework::ToDataType(ctx.Input<Tensor>("W")->type());
}
}; };
} // namespace operators } // namespace operators
......
...@@ -61,7 +61,7 @@ __global__ void LookupTableGrad(T* table, const T* output, const int32_t* ids, ...@@ -61,7 +61,7 @@ __global__ void LookupTableGrad(T* table, const T* output, const int32_t* ids,
} }
template <typename T> template <typename T>
class LookupTableCUDAKernel : public framework::OpKernel { class LookupTableCUDAKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
auto table_t = context.Input<Tensor>("W"); auto table_t = context.Input<Tensor>("W");
...@@ -85,7 +85,7 @@ class LookupTableCUDAKernel : public framework::OpKernel { ...@@ -85,7 +85,7 @@ class LookupTableCUDAKernel : public framework::OpKernel {
}; };
template <typename T> template <typename T>
class LookupTableGradCUDAKernel : public framework::OpKernel { class LookupTableGradCUDAKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
auto ids_t = context.Input<Tensor>("Ids"); auto ids_t = context.Input<Tensor>("Ids");
......
...@@ -23,7 +23,7 @@ namespace operators { ...@@ -23,7 +23,7 @@ namespace operators {
using Tensor = framework::Tensor; using Tensor = framework::Tensor;
template <typename T> template <typename T>
class LookupTableKernel : public framework::OpKernel { class LookupTableKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
auto table_t = context.Input<Tensor>("W"); // float tensor auto table_t = context.Input<Tensor>("W"); // float tensor
...@@ -44,7 +44,7 @@ class LookupTableKernel : public framework::OpKernel { ...@@ -44,7 +44,7 @@ class LookupTableKernel : public framework::OpKernel {
}; };
template <typename T> template <typename T>
class LookupTableGradKernel : public framework::OpKernel { class LookupTableGradKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
auto ids_t = context.Input<Tensor>("Ids"); auto ids_t = context.Input<Tensor>("Ids");
......
...@@ -47,7 +47,6 @@ class LstmUnitOp : public framework::OperatorWithKernel { ...@@ -47,7 +47,6 @@ class LstmUnitOp : public framework::OperatorWithKernel {
} }
}; };
template <typename AttrType>
class LstmUnitOpMaker : public framework::OpProtoAndCheckerMaker { class LstmUnitOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
LstmUnitOpMaker(framework::OpProto* proto, LstmUnitOpMaker(framework::OpProto* proto,
...@@ -68,7 +67,7 @@ Equation: ...@@ -68,7 +67,7 @@ Equation:
H = C * sigm(o) H = C * sigm(o)
)DOC"); )DOC");
AddAttr<AttrType>("forget_bias", "The forget bias of Lstm Unit.") AddAttr<float>("forget_bias", "The forget bias of Lstm Unit.")
.SetDefault(0.0); .SetDefault(0.0);
} }
}; };
...@@ -93,9 +92,11 @@ class LstmUnitGradOp : public framework::OperatorWithKernel { ...@@ -93,9 +92,11 @@ class LstmUnitGradOp : public framework::OperatorWithKernel {
} // namespace paddle } // namespace paddle
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OP(lstm_unit, ops::LstmUnitOp, ops::LstmUnitOpMaker<float>, REGISTER_OP(lstm_unit, ops::LstmUnitOp, ops::LstmUnitOpMaker, lstm_unit_grad,
lstm_unit_grad, ops::LstmUnitGradOp); ops::LstmUnitGradOp);
REGISTER_OP_CPU_KERNEL(lstm_unit, REGISTER_OP_CPU_KERNEL(lstm_unit,
ops::LstmUnitKernel<paddle::platform::CPUPlace, float>); ops::LstmUnitKernel<paddle::platform::CPUPlace, float>,
ops::LstmUnitKernel<paddle::platform::CPUPlace, double>);
REGISTER_OP_CPU_KERNEL( REGISTER_OP_CPU_KERNEL(
lstm_unit_grad, ops::LstmUnitGradKernel<paddle::platform::CPUPlace, float>); lstm_unit_grad, ops::LstmUnitGradKernel<paddle::platform::CPUPlace, float>,
ops::LstmUnitGradKernel<paddle::platform::CPUPlace, double>);
...@@ -89,8 +89,8 @@ __global__ void LSTMUnitGradientKernel(const int nthreads, const int dim, ...@@ -89,8 +89,8 @@ __global__ void LSTMUnitGradientKernel(const int nthreads, const int dim,
} }
} }
template <typename T, typename AttrType = T> template <typename T>
class LstmUnitOpCUDAKernel : public framework::OpKernel { class LstmUnitOpCUDAKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()), PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
...@@ -101,7 +101,7 @@ class LstmUnitOpCUDAKernel : public framework::OpKernel { ...@@ -101,7 +101,7 @@ class LstmUnitOpCUDAKernel : public framework::OpKernel {
auto* c_tensor = ctx.Output<framework::Tensor>("C"); auto* c_tensor = ctx.Output<framework::Tensor>("C");
auto* h_tensor = ctx.Output<framework::Tensor>("H"); auto* h_tensor = ctx.Output<framework::Tensor>("H");
auto forget_bias = static_cast<T>(ctx.Attr<AttrType>("forget_bias")); auto forget_bias = static_cast<T>(ctx.Attr<float>("forget_bias"));
int b_size = c_tensor->dims()[0]; int b_size = c_tensor->dims()[0];
int D = c_tensor->dims()[1]; int D = c_tensor->dims()[1];
...@@ -120,8 +120,8 @@ class LstmUnitOpCUDAKernel : public framework::OpKernel { ...@@ -120,8 +120,8 @@ class LstmUnitOpCUDAKernel : public framework::OpKernel {
} }
}; };
template <typename T, typename AttrType = T> template <typename T>
class LstmUnitGradOpCUDAKernel : public framework::OpKernel { class LstmUnitGradOpCUDAKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()), PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
...@@ -153,7 +153,7 @@ class LstmUnitGradOpCUDAKernel : public framework::OpKernel { ...@@ -153,7 +153,7 @@ class LstmUnitGradOpCUDAKernel : public framework::OpKernel {
int N = c_tensor->dims()[0]; int N = c_tensor->dims()[0];
int D = c_tensor->dims()[1]; int D = c_tensor->dims()[1];
auto forget_bias = static_cast<T>(ctx.Attr<AttrType>("forget_bias")); auto forget_bias = static_cast<T>(ctx.Attr<float>("forget_bias"));
int block = 512; int block = 512;
int n = N * D; int n = N * D;
...@@ -169,5 +169,7 @@ class LstmUnitGradOpCUDAKernel : public framework::OpKernel { ...@@ -169,5 +169,7 @@ class LstmUnitGradOpCUDAKernel : public framework::OpKernel {
} // namespace paddle } // namespace paddle
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OP_GPU_KERNEL(lstm_unit, ops::LstmUnitOpCUDAKernel<float>); REGISTER_OP_GPU_KERNEL(lstm_unit, ops::LstmUnitOpCUDAKernel<float>,
REGISTER_OP_GPU_KERNEL(lstm_unit_grad, ops::LstmUnitGradOpCUDAKernel<float>); ops::LstmUnitOpCUDAKernel<double>);
REGISTER_OP_GPU_KERNEL(lstm_unit_grad, ops::LstmUnitGradOpCUDAKernel<float>,
ops::LstmUnitGradOpCUDAKernel<double>);
...@@ -32,8 +32,8 @@ inline T tanh(T x) { ...@@ -32,8 +32,8 @@ inline T tanh(T x) {
return 2. * sigmoid(2. * x) - 1.; return 2. * sigmoid(2. * x) - 1.;
} }
template <typename Place, typename T, typename AttrType = T> template <typename Place, typename T>
class LstmUnitKernel : public framework::OpKernel { class LstmUnitKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
PADDLE_ENFORCE(platform::is_cpu_place(ctx.GetPlace()), PADDLE_ENFORCE(platform::is_cpu_place(ctx.GetPlace()),
...@@ -44,7 +44,7 @@ class LstmUnitKernel : public framework::OpKernel { ...@@ -44,7 +44,7 @@ class LstmUnitKernel : public framework::OpKernel {
auto* c_tensor = ctx.Output<framework::Tensor>("C"); auto* c_tensor = ctx.Output<framework::Tensor>("C");
auto* h_tensor = ctx.Output<framework::Tensor>("H"); auto* h_tensor = ctx.Output<framework::Tensor>("H");
auto forget_bias = static_cast<T>(ctx.Attr<AttrType>("forget_bias")); auto forget_bias = static_cast<T>(ctx.Attr<float>("forget_bias"));
int b_size = c_tensor->dims()[0]; int b_size = c_tensor->dims()[0];
int D = c_tensor->dims()[1]; int D = c_tensor->dims()[1];
...@@ -75,8 +75,8 @@ class LstmUnitKernel : public framework::OpKernel { ...@@ -75,8 +75,8 @@ class LstmUnitKernel : public framework::OpKernel {
} }
}; };
template <typename Place, typename T, typename AttrType = T> template <typename Place, typename T>
class LstmUnitGradKernel : public framework::OpKernel { class LstmUnitGradKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& ctx) const override { void Compute(const framework::ExecutionContext& ctx) const override {
PADDLE_ENFORCE(platform::is_cpu_place(ctx.GetPlace()), PADDLE_ENFORCE(platform::is_cpu_place(ctx.GetPlace()),
...@@ -108,7 +108,7 @@ class LstmUnitGradKernel : public framework::OpKernel { ...@@ -108,7 +108,7 @@ class LstmUnitGradKernel : public framework::OpKernel {
int N = c_tensor->dims()[0]; int N = c_tensor->dims()[0];
int D = c_tensor->dims()[1]; int D = c_tensor->dims()[1];
auto forget_bias = static_cast<T>(ctx.Attr<AttrType>("forget_bias")); auto forget_bias = static_cast<T>(ctx.Attr<float>("forget_bias"));
for (int n = 0; n < N; ++n) { for (int n = 0; n < N; ++n) {
for (int d = 0; d < D; ++d) { for (int d = 0; d < D; ++d) {
......
if(WITH_GPU) if(WITH_GPU)
nv_library(math_function SRCS math_function.cc math_function.cu im2col.cc nv_library(math_function SRCS math_function.cc math_function.cu im2col.cc im2col.cu pooling.cc pooling.cu DEPS cblas device_context operator)
im2col.cu DEPS cblas device_context operator) nv_test(math_function_test SRCS math_function_test.cc DEPS math_function tensor)
nv_library(softmax_function SRCS softmax.cc softmax.cu nv_library(softmax SRCS softmax.cc softmax.cu DEPS operator)
DEPS operator) nv_library(cross_entropy SRCS cross_entropy.cc cross_entropy.cu DEPS operator)
nv_library(cross_entropy_function SRCS cross_entropy.cc cross_entropy.cu
DEPS operator)
else() else()
cc_library(math_function SRCS math_function.cc im2col.cc cc_library(math_function SRCS math_function.cc im2col.cc pooling.cc DEPS cblas device_context operator)
DEPS cblas device_context operator) cc_test(math_function_test SRCS math_function_test.cc DEPS math_function tensor)
cc_library(softmax_function SRCS softmax.cc DEPS operator) cc_library(softmax SRCS softmax.cc DEPS operator)
cc_library(cross_entropy_function SRCS cross_entropy.cc DEPS operator) cc_library(cross_entropy SRCS cross_entropy.cc DEPS operator)
endif() endif()
nv_test(math_function_test SRCS math_function_test.cc DEPS math_function tensor)
cc_test(im2col_test SRCS im2col_test.cc DEPS math_function tensor) cc_test(im2col_test SRCS im2col_test.cc DEPS math_function tensor)
...@@ -26,8 +26,8 @@ using EigenMatrix = framework::EigenMatrix<T, MajorType, IndexType>; ...@@ -26,8 +26,8 @@ using EigenMatrix = framework::EigenMatrix<T, MajorType, IndexType>;
template <typename T> template <typename T>
class CrossEntropyFunctor<platform::CPUPlace, T> { class CrossEntropyFunctor<platform::CPUPlace, T> {
public: public:
void operator()(const framework::ExecutionContext& ctx, void operator()(const platform::DeviceContext& ctx, framework::Tensor* out,
framework::Tensor* out, const framework::Tensor* prob, const framework::Tensor* prob,
const framework::Tensor* labels, const bool softLabel) { const framework::Tensor* labels, const bool softLabel) {
const int batch_size = prob->dims()[0]; const int batch_size = prob->dims()[0];
if (softLabel) { if (softLabel) {
...@@ -35,7 +35,7 @@ class CrossEntropyFunctor<platform::CPUPlace, T> { ...@@ -35,7 +35,7 @@ class CrossEntropyFunctor<platform::CPUPlace, T> {
auto lbl = EigenMatrix<T>::From(*labels); auto lbl = EigenMatrix<T>::From(*labels);
auto loss = EigenMatrix<T>::From(*out); auto loss = EigenMatrix<T>::From(*out);
loss.device(ctx.GetEigenDevice<platform::CPUPlace>()) = loss.device(*ctx.GetEigenDevice<platform::CPUPlace>()) =
-((lbl * in.log().unaryExpr(math::TolerableValue<T>())) -((lbl * in.log().unaryExpr(math::TolerableValue<T>()))
.sum(Eigen::DSizes<int, 1>(1)) .sum(Eigen::DSizes<int, 1>(1))
.reshape(Eigen::DSizes<int, 2>(batch_size, 1))); .reshape(Eigen::DSizes<int, 2>(batch_size, 1)));
......
...@@ -74,8 +74,8 @@ using Tensor = framework::Tensor; ...@@ -74,8 +74,8 @@ using Tensor = framework::Tensor;
template <typename T> template <typename T>
class CrossEntropyFunctor<platform::GPUPlace, T> { class CrossEntropyFunctor<platform::GPUPlace, T> {
public: public:
void operator()(const framework::ExecutionContext& ctx, void operator()(const platform::DeviceContext& ctx, framework::Tensor* out,
framework::Tensor* out, const framework::Tensor* prob, const framework::Tensor* prob,
const framework::Tensor* labels, bool softLabel) { const framework::Tensor* labels, bool softLabel) {
const T* prob_data = prob->data<T>(); const T* prob_data = prob->data<T>();
T* loss_data = out->mutable_data<T>(ctx.GetPlace()); T* loss_data = out->mutable_data<T>(ctx.GetPlace());
...@@ -87,20 +87,18 @@ class CrossEntropyFunctor<platform::GPUPlace, T> { ...@@ -87,20 +87,18 @@ class CrossEntropyFunctor<platform::GPUPlace, T> {
const T* label_data = labels->data<T>(); const T* label_data = labels->data<T>();
int block = class_num > 512 ? 512 : pow(2, int(std::log2(class_num))); int block = class_num > 512 ? 512 : pow(2, int(std::log2(class_num)));
SoftCrossEntropyKernel< SoftCrossEntropyKernel<T><<<
T><<<batch_size, block, block * sizeof(T), batch_size, block, block * sizeof(T),
reinterpret_cast<const platform::CUDADeviceContext&>( reinterpret_cast<const platform::CUDADeviceContext&>(ctx).stream()>>>(
ctx.device_context()) loss_data, prob_data, label_data, class_num);
.stream()>>>(loss_data, prob_data, label_data, class_num);
} else { } else {
const int* label_data = labels->data<int>(); const int* label_data = labels->data<int>();
int block = 512; int block = 512;
int grid = (batch_size + block - 1) / block; int grid = (batch_size + block - 1) / block;
CrossEntropyKernel<T><<< CrossEntropyKernel<T><<<
grid, block, 0, reinterpret_cast<const platform::CUDADeviceContext&>( grid, block, 0,
ctx.device_context()) reinterpret_cast<const platform::CUDADeviceContext&>(ctx).stream()>>>(
.stream()>>>(loss_data, prob_data, label_data, loss_data, prob_data, label_data, batch_size, class_num);
batch_size, class_num);
} }
} }
}; };
......
...@@ -37,9 +37,7 @@ struct TolerableValue { ...@@ -37,9 +37,7 @@ struct TolerableValue {
template <typename Place, typename T> template <typename Place, typename T>
class CrossEntropyFunctor { class CrossEntropyFunctor {
public: public:
// (TODO caoying) it is much better to use DeviceContext as the first void operator()(const platform::DeviceContext& context,
// parameter.
void operator()(const framework::ExecutionContext& context,
framework::Tensor* out, const framework::Tensor* prob, framework::Tensor* out, const framework::Tensor* prob,
const framework::Tensor* labels, const bool softLabel); const framework::Tensor* labels, const bool softLabel);
}; };
......
...@@ -52,6 +52,7 @@ int LAPACKE_dgetri(int matrix_layout, int n, double* a, int lda, ...@@ -52,6 +52,7 @@ int LAPACKE_dgetri(int matrix_layout, int n, double* a, int lda,
#include <cmath> #include <cmath>
#include "paddle/framework/eigen.h"
#include "paddle/framework/tensor.h" #include "paddle/framework/tensor.h"
#include "paddle/platform/device_context.h" #include "paddle/platform/device_context.h"
#include "paddle/platform/enforce.h" #include "paddle/platform/enforce.h"
...@@ -84,6 +85,13 @@ void matmul(const platform::DeviceContext& context, ...@@ -84,6 +85,13 @@ void matmul(const platform::DeviceContext& context,
const framework::Tensor& matrix_b, bool trans_b, T alpha, const framework::Tensor& matrix_b, bool trans_b, T alpha,
framework::Tensor* matrix_out, T beta); framework::Tensor* matrix_out, T beta);
template <typename Place, typename T>
void SetConstant(const platform::DeviceContext& context,
framework::Tensor* tensor, T num) {
auto t = framework::EigenVector<T>::Flatten(*tensor);
t.device(*context.GetEigenDevice<Place>()) = t.constant(static_cast<T>(num));
}
} // namespace math } // namespace math
} // namespace operators } // namespace operators
} // namespace paddle } // namespace paddle
...@@ -243,3 +243,24 @@ TEST(math_function, gemm_trans_clbas) { ...@@ -243,3 +243,24 @@ TEST(math_function, gemm_trans_clbas) {
EXPECT_EQ(input3_ptr[6], 86); EXPECT_EQ(input3_ptr[6], 86);
EXPECT_EQ(input3_ptr[7], 99); EXPECT_EQ(input3_ptr[7], 99);
} }
TEST(math_function, zero) {
paddle::framework::Tensor tensor;
auto* cpu_place = new paddle::platform::CPUPlace();
float* t = tensor.mutable_data<float>({2, 2}, *cpu_place);
paddle::platform::CPUDeviceContext context(*cpu_place);
paddle::operators::math::SetConstant<paddle::platform::CPUPlace, float>(
context, &tensor, 0);
EXPECT_EQ(t[0], 0);
EXPECT_EQ(t[1], 0);
EXPECT_EQ(t[2], 0);
EXPECT_EQ(t[3], 0);
paddle::operators::math::SetConstant<paddle::platform::CPUPlace, float>(
context, &tensor, 1);
EXPECT_EQ(t[0], 1);
EXPECT_EQ(t[1], 1);
EXPECT_EQ(t[2], 1);
EXPECT_EQ(t[3], 1);
}
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/operators/math/pooling.h"
namespace paddle {
namespace operators {
namespace math {
template <typename PoolProcess, typename T>
class Pool2dFunctor<platform::CPUPlace, PoolProcess, T> {
public:
void operator()(const platform::DeviceContext& context,
const framework::Tensor& input, framework::Tensor& output,
std::vector<int>& ksize, std::vector<int>& strides,
std::vector<int>& paddings, PoolProcess pool_process) {
const int batch_size = input.dims()[0];
const int input_height = input.dims()[2];
const int input_width = input.dims()[3];
const int output_channels = output.dims()[1];
const int output_height = output.dims()[2];
const int output_width = output.dims()[3];
const int ksize_height = ksize[0];
const int ksize_width = ksize[1];
const int stride_height = strides[0];
const int stride_width = strides[1];
const int padding_height = paddings[0];
const int padding_width = paddings[1];
const int input_stride = input_height * input_width;
const int output_stride = output_height * output_width;
const T* input_data = input.data<T>();
T* output_data = output.mutable_data<T>(context.GetPlace());
for (int i = 0; i < batch_size; i++) {
for (int c = 0; c < output_channels; ++c) {
for (int ph = 0; ph < output_height; ++ph) {
int hstart = ph * stride_height - padding_height;
int hend = std::min(hstart + ksize_height, input_height);
hstart = std::max(hstart, 0);
for (int pw = 0; pw < output_width; ++pw) {
int wstart = pw * stride_width - padding_width;
int wend = std::min(wstart + ksize_width, input_width);
wstart = std::max(wstart, 0);
T ele = pool_process.initial();
for (int h = hstart; h < hend; ++h) {
for (int w = wstart; w < wend; ++w) {
pool_process.compute(ele, input_data[h * input_width + w]);
}
}
int pool_size = (hend - hstart) * (wend - wstart);
pool_process.finalize(ele, (static_cast<T>(pool_size)));
output_data[ph * output_width + pw] = ele;
}
}
input_data += input_stride;
output_data += output_stride;
}
}
}
};
template <typename PoolProcess, class T>
class Pool2dGradFunctor<platform::CPUPlace, PoolProcess, T> {
public:
void operator()(const platform::DeviceContext& context,
const framework::Tensor& input, framework::Tensor& input_grad,
const framework::Tensor& output,
const framework::Tensor& output_grad, std::vector<int>& ksize,
std::vector<int>& strides, std::vector<int>& paddings,
PoolProcess pool_grad_process) {
const int batch_size = input.dims()[0];
const int input_height = input.dims()[2];
const int input_width = input.dims()[3];
const int output_channels = output.dims()[1];
const int output_height = output.dims()[2];
const int output_width = output.dims()[3];
const int ksize_height = ksize[0];
const int ksize_width = ksize[1];
const int stride_height = strides[0];
const int stride_width = strides[1];
const int padding_height = paddings[0];
const int padding_width = paddings[1];
const int input_stride = input_height * input_width;
const int output_stride = output_height * output_width;
const T* input_data = input.data<T>();
const T* output_data = output.data<T>();
const T* output_grad_data = output_grad.data<T>();
T* input_grad_data = input_grad.mutable_data<T>(context.GetPlace());
for (int i = 0; i < batch_size; i++) {
for (int c = 0; c < output_channels; ++c) {
for (int ph = 0; ph < output_height; ++ph) {
int hstart = ph * stride_height - padding_height;
int hend = std::min(hstart + ksize_height, input_height);
hstart = std::max(hstart, 0);
for (int pw = 0; pw < output_width; ++pw) {
int wstart = pw * stride_width - padding_width;
int wend = std::min(wstart + ksize_width, input_width);
wstart = std::max(wstart, 0);
int pool_size = (hend - hstart) * (wend - wstart);
float scale = 1.0 / pool_size;
for (int h = hstart; h < hend; ++h) {
for (int w = wstart; w < wend; ++w) {
pool_grad_process.compute(
input_data[h * input_width + w],
output_data[ph * output_width + pw],
output_grad_data[ph * output_width + pw],
input_grad_data[h * input_width + w],
static_cast<T>(scale));
}
}
}
}
input_data += input_stride;
output_data += output_stride;
input_grad_data += input_stride;
output_grad_data += output_stride;
}
}
}
};
template <class T>
class MaxPool2dGradFunctor<platform::CPUPlace, T> {
public:
void operator()(const platform::DeviceContext& context,
const framework::Tensor& input, framework::Tensor& input_grad,
const framework::Tensor& output,
const framework::Tensor& output_grad, std::vector<int>& ksize,
std::vector<int>& strides, std::vector<int>& paddings) {
const int batch_size = input.dims()[0];
const int input_height = input.dims()[2];
const int input_width = input.dims()[3];
const int output_channels = output.dims()[1];
const int output_height = output.dims()[2];
const int output_width = output.dims()[3];
const int ksize_height = ksize[0];
const int ksize_width = ksize[1];
const int stride_height = strides[0];
const int stride_width = strides[1];
const int padding_height = paddings[0];
const int padding_width = paddings[1];
const int input_stride = input_height * input_width;
const int output_stride = output_height * output_width;
const T* input_data = input.data<T>();
const T* output_data = output.data<T>();
const T* output_grad_data = output_grad.data<T>();
T* input_grad_data = input_grad.mutable_data<T>(context.GetPlace());
for (int i = 0; i < batch_size; i++) {
for (int c = 0; c < output_channels; ++c) {
for (int ph = 0; ph < output_height; ++ph) {
int hstart = ph * stride_height - padding_height;
int hend = std::min(hstart + ksize_height, input_height);
hstart = std::max(hstart, 0);
for (int pw = 0; pw < output_width; ++pw) {
int wstart = pw * stride_width - padding_width;
int wend = std::min(wstart + ksize_width, input_width);
wstart = std::max(wstart, 0);
bool stop = false;
for (int h = hstart; h < hend && !stop; ++h) {
for (int w = wstart; w < wend && !stop; ++w) {
int input_idx = h * input_width + w;
int output_idx = ph * output_width + pw;
if (input_data[input_idx] == output_data[output_idx]) {
input_grad_data[input_idx] += output_grad_data[output_idx];
stop = true;
}
}
}
}
}
input_data += input_stride;
output_data += output_stride;
input_grad_data += input_stride;
output_grad_data += output_stride;
}
}
}
};
template class MaxPool2dGradFunctor<platform::CPUPlace, float>;
// template class MaxPool2dGradFunctor<platform::CPUPlace, double>;
template class Pool2dFunctor<platform::CPUPlace,
paddle::operators::math::MaxPool<float>, float>;
template class Pool2dFunctor<platform::CPUPlace,
paddle::operators::math::AvgPool<float>, float>;
template class Pool2dGradFunctor<
platform::CPUPlace, paddle::operators::math::MaxPoolGrad<float>, float>;
template class Pool2dGradFunctor<
platform::CPUPlace, paddle::operators::math::AvgPoolGrad<float>, float>;
template class Pool2dFunctor<platform::CPUPlace,
paddle::operators::math::MaxPool<double>, double>;
template class Pool2dFunctor<platform::CPUPlace,
paddle::operators::math::AvgPool<double>, double>;
template class Pool2dGradFunctor<
platform::CPUPlace, paddle::operators::math::MaxPoolGrad<double>, double>;
template class Pool2dGradFunctor<
platform::CPUPlace, paddle::operators::math::AvgPoolGrad<double>, double>;
template <typename PoolProcess, class T>
class Pool3dFunctor<platform::CPUPlace, PoolProcess, T> {
public:
void operator()(const platform::DeviceContext& context,
const framework::Tensor& input, framework::Tensor& output,
std::vector<int>& ksize, std::vector<int>& strides,
std::vector<int>& paddings, PoolProcess pool_process) {
const int batch_size = input.dims()[0];
const int input_depth = input.dims()[2];
const int input_height = input.dims()[3];
const int input_width = input.dims()[4];
const int output_channels = output.dims()[1];
const int output_depth = output.dims()[2];
const int output_height = output.dims()[3];
const int output_width = output.dims()[4];
const int ksize_depth = ksize[0];
const int ksize_height = ksize[1];
const int ksize_width = ksize[2];
const int stride_depth = strides[0];
const int stride_height = strides[1];
const int stride_width = strides[2];
const int padding_depth = paddings[0];
const int padding_height = paddings[1];
const int padding_width = paddings[2];
const int input_stride = input_depth * input_height * input_width;
const int output_stride = output_depth * output_height * output_width;
const T* input_data = input.data<T>();
T* output_data = output.mutable_data<T>(context.GetPlace());
for (int i = 0; i < batch_size; i++) {
for (int c = 0; c < output_channels; ++c) {
for (int pd = 0; pd < output_depth; ++pd) {
int dstart = pd * stride_depth - padding_depth;
int dend = std::min(dstart + ksize_depth, input_depth);
dstart = std::max(dstart, 0);
for (int ph = 0; ph < output_height; ++ph) {
int hstart = ph * stride_height - padding_height;
int hend = std::min(hstart + ksize_height, input_height);
hstart = std::max(hstart, 0);
for (int pw = 0; pw < output_width; ++pw) {
int wstart = pw * stride_width - padding_width;
int wend = std::min(wstart + ksize_width, input_width);
wstart = std::max(wstart, 0);
int output_idx = (pd * output_height + ph) * output_width + pw;
T ele = pool_process.initial();
for (int d = dstart; d < dend; ++d) {
for (int h = hstart; h < hend; ++h) {
for (int w = wstart; w < wend; ++w) {
pool_process.compute(
ele,
input_data[(d * input_height + h) * input_width + w]);
}
}
}
int pool_size =
(dend - dstart) * (hend - hstart) * (wend - wstart);
pool_process.finalize(ele, static_cast<T>(pool_size));
output_data[output_idx] = ele;
}
}
}
input_data += input_stride;
output_data += output_stride;
}
}
}
};
template <typename PoolProcess, class T>
class Pool3dGradFunctor<platform::CPUPlace, PoolProcess, T> {
public:
void operator()(const platform::DeviceContext& context,
const framework::Tensor& input, framework::Tensor& input_grad,
const framework::Tensor& output,
const framework::Tensor& output_grad, std::vector<int>& ksize,
std::vector<int>& strides, std::vector<int>& paddings,
PoolProcess pool_grad_process) {
const int batch_size = input.dims()[0];
const int input_depth = input.dims()[2];
const int input_height = input.dims()[3];
const int input_width = input.dims()[4];
const int output_channels = output.dims()[1];
const int output_depth = output.dims()[2];
const int output_height = output.dims()[3];
const int output_width = output.dims()[4];
const int ksize_depth = ksize[0];
const int ksize_height = ksize[1];
const int ksize_width = ksize[2];
const int stride_depth = strides[0];
const int stride_height = strides[1];
const int stride_width = strides[2];
const int padding_depth = paddings[0];
const int padding_height = paddings[1];
const int padding_width = paddings[2];
const int input_stride = input_depth * input_height * input_width;
const int output_stride = output_depth * output_height * output_width;
const T* input_data = input.data<T>();
const T* output_data = output.data<T>();
const T* output_grad_data = output_grad.data<T>();
T* input_grad_data = input_grad.mutable_data<T>(context.GetPlace());
for (int i = 0; i < batch_size; i++) {
for (int c = 0; c < output_channels; ++c) {
for (int pd = 0; pd < output_depth; ++pd) {
int dstart = pd * stride_depth - padding_depth;
int dend = std::min(dstart + ksize_depth, input_depth);
dstart = std::max(dstart, 0);
for (int ph = 0; ph < output_height; ++ph) {
int hstart = ph * stride_height - padding_height;
int hend = std::min(hstart + ksize_height, input_height);
hstart = std::max(hstart, 0);
for (int pw = 0; pw < output_width; ++pw) {
int wstart = pw * stride_width - padding_width;
int wend = std::min(wstart + ksize_width, input_width);
wstart = std::max(wstart, 0);
int pool_size =
(dend - dstart) * (hend - hstart) * (wend - wstart);
float scale = 1.0 / pool_size;
for (int d = dstart; d < dend; ++d) {
for (int h = hstart; h < hend; ++h) {
for (int w = wstart; w < wend; ++w) {
int input_idx = (d * input_height + h) * input_width + w;
int output_idx =
(pd * output_height + ph) * output_width + pw;
pool_grad_process.compute(
input_data[input_idx], output_data[output_idx],
output_grad_data[output_idx],
input_grad_data[input_idx], static_cast<T>(scale));
}
}
}
}
}
}
input_data += input_stride;
output_data += output_stride;
input_grad_data += input_stride;
output_grad_data += output_stride;
}
}
}
};
template <class T>
class MaxPool3dGradFunctor<platform::CPUPlace, T> {
public:
void operator()(const platform::DeviceContext& context,
const framework::Tensor& input, framework::Tensor& input_grad,
const framework::Tensor& output,
const framework::Tensor& output_grad, std::vector<int>& ksize,
std::vector<int>& strides, std::vector<int>& paddings) {
const int batch_size = input.dims()[0];
const int input_depth = input.dims()[2];
const int input_height = input.dims()[3];
const int input_width = input.dims()[4];
const int output_channels = output.dims()[1];
const int output_depth = output.dims()[2];
const int output_height = output.dims()[3];
const int output_width = output.dims()[4];
const int ksize_depth = ksize[0];
const int ksize_height = ksize[1];
const int ksize_width = ksize[2];
const int stride_depth = strides[0];
const int stride_height = strides[1];
const int stride_width = strides[2];
const int padding_depth = paddings[0];
const int padding_height = paddings[1];
const int padding_width = paddings[2];
const int input_stride = input_depth * input_height * input_width;
const int output_stride = output_depth * output_height * output_width;
const T* input_data = input.data<T>();
const T* output_data = output.data<T>();
const T* output_grad_data = output_grad.data<T>();
T* input_grad_data = input_grad.mutable_data<T>(context.GetPlace());
for (int i = 0; i < batch_size; i++) {
for (int c = 0; c < output_channels; ++c) {
for (int pd = 0; pd < output_depth; ++pd) {
int dstart = pd * stride_depth - padding_depth;
int dend = std::min(dstart + ksize_depth, input_depth);
dstart = std::max(dstart, 0);
for (int ph = 0; ph < output_height; ++ph) {
int hstart = ph * stride_height - padding_height;
int hend = std::min(hstart + ksize_height, input_height);
hstart = std::max(hstart, 0);
for (int pw = 0; pw < output_width; ++pw) {
int wstart = pw * stride_width - padding_width;
int wend = std::min(wstart + ksize_width, input_width);
wstart = std::max(wstart, 0);
bool stop = false;
for (int d = dstart; d < dend && !stop; ++d) {
for (int h = hstart; h < hend && !stop; ++h) {
for (int w = wstart; w < wend && !stop; ++w) {
int input_idx = (d * input_height + h) * input_width + w;
int output_idx =
(pd * output_height + ph) * output_width + pw;
if (input_data[input_idx] == output_data[output_idx]) {
input_grad_data[input_idx] +=
output_grad_data[output_idx];
stop = true;
}
}
}
}
}
}
}
input_data += input_stride;
output_data += output_stride;
input_grad_data += input_stride;
output_grad_data += output_stride;
}
}
}
};
template class MaxPool3dGradFunctor<platform::CPUPlace, float>;
// template class MaxPool3dGradFunctor<platform::CPUPlace, double>;
template class Pool3dFunctor<platform::CPUPlace,
paddle::operators::math::MaxPool<float>, float>;
template class Pool3dFunctor<platform::CPUPlace,
paddle::operators::math::AvgPool<float>, float>;
template class Pool3dGradFunctor<
platform::CPUPlace, paddle::operators::math::MaxPoolGrad<float>, float>;
template class Pool3dGradFunctor<
platform::CPUPlace, paddle::operators::math::AvgPoolGrad<float>, float>;
template class Pool3dFunctor<platform::CPUPlace,
paddle::operators::math::MaxPool<double>, double>;
template class Pool3dFunctor<platform::CPUPlace,
paddle::operators::math::AvgPool<double>, double>;
template class Pool3dGradFunctor<
platform::CPUPlace, paddle::operators::math::MaxPoolGrad<double>, double>;
template class Pool3dGradFunctor<
platform::CPUPlace, paddle::operators::math::AvgPoolGrad<double>, double>;
} // namespace math
} // namespace operators
} // namespace paddle
此差异已折叠。
此差异已折叠。
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve. /* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License"); Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License. you may not use this file except in compliance with the License.
You may obtain a copy of the License at You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0 http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and See the License for the specific language governing permissions and
limitations under the License. */ limitations under the License. */
#include "paddle/operators/math/softmax.h" #include "paddle/operators/math/softmax.h"
...@@ -19,6 +19,7 @@ namespace operators { ...@@ -19,6 +19,7 @@ namespace operators {
namespace math { namespace math {
template class SoftmaxFunctor<platform::CPUPlace, float>; template class SoftmaxFunctor<platform::CPUPlace, float>;
template class SoftmaxGradFunctor<platform::CPUPlace, float>;
} // namespace math } // namespace math
} // namespace operators } // namespace operators
......
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve. /* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Licensed under the Apache License, Version 2.0 (the "License"); Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License. you may not use this file except in compliance with the License.
You may obtain a copy of the License at You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0 http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and See the License for the specific language governing permissions and
limitations under the License. */ limitations under the License. */
#define EIGEN_USE_GPU #define EIGEN_USE_GPU
...@@ -21,6 +21,7 @@ namespace operators { ...@@ -21,6 +21,7 @@ namespace operators {
namespace math { namespace math {
template class SoftmaxFunctor<platform::GPUPlace, float>; template class SoftmaxFunctor<platform::GPUPlace, float>;
template class SoftmaxGradFunctor<platform::GPUPlace, float>;
} // namespace math } // namespace math
} // namespace operators } // namespace operators
......
此差异已折叠。
...@@ -28,7 +28,7 @@ template <typename T, int MajorType = Eigen::RowMajor, ...@@ -28,7 +28,7 @@ template <typename T, int MajorType = Eigen::RowMajor,
using EigenVector = framework::EigenVector<T, MajorType, IndexType>; using EigenVector = framework::EigenVector<T, MajorType, IndexType>;
template <typename Place, typename T> template <typename Place, typename T>
class MeanKernel : public framework::OpKernel { class MeanKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
auto* input = context.Input<Tensor>("X"); auto* input = context.Input<Tensor>("X");
...@@ -45,7 +45,7 @@ class MeanKernel : public framework::OpKernel { ...@@ -45,7 +45,7 @@ class MeanKernel : public framework::OpKernel {
}; };
template <typename Place, typename T> template <typename Place, typename T>
class MeanGradKernel : public framework::OpKernel { class MeanGradKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
auto OG = context.Input<Tensor>(framework::GradVarName("Out")); auto OG = context.Input<Tensor>(framework::GradVarName("Out"));
......
...@@ -20,7 +20,7 @@ namespace paddle { ...@@ -20,7 +20,7 @@ namespace paddle {
namespace operators { namespace operators {
template <typename Place, typename T> template <typename Place, typename T>
class MinusKernel : public framework::OpKernel { class MinusKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
auto* left_tensor = context.Input<framework::Tensor>("X"); auto* left_tensor = context.Input<framework::Tensor>("X");
......
...@@ -39,7 +39,7 @@ struct ModifiedHuberLossBackward { ...@@ -39,7 +39,7 @@ struct ModifiedHuberLossBackward {
}; };
template <typename T> template <typename T>
class ModifiedHuberLossGradGPUKernel : public framework::OpKernel { class ModifiedHuberLossGradGPUKernel : public framework::OpKernel<T> {
public: public:
void Compute(const framework::ExecutionContext& context) const override { void Compute(const framework::ExecutionContext& context) const override {
auto* in0 = context.Input<Tensor>("Y"); auto* in0 = context.Input<Tensor>("Y");
......
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册