diff --git a/doc/design/refactorization.md b/doc/design/refactorization.md index ffcc069ccd879f910a7b86d075075dfb8ad9fb66..a2a353c28374213605be0996fcff75ad13d736f1 100644 --- a/doc/design/refactorization.md +++ b/doc/design/refactorization.md @@ -1,40 +1,40 @@ # Design Doc: Refactorization Overview -The goal of refactorizaiton include: +The goals of refactoring include: -1. Make it easy for external contributors to write new elementory computaiton operations. -1. Make the codebase clean and readable. -1. Introduce a new design of computation representation -- a computation graph of operators and variables. -1. The graph representation helps implementing auto-scalable and auto fault recoverable distributed computing. +1. Making it easy for external contributors to write new elementary computation operations. +1. Making the codebase clean and readable. +1. Designing a new computation representation -- a computation graph of operators and variables. +1. Implementing auto-scalability and auto fault recoverable distributed computing with the help of computation graphs. ## Computation Graphs -1. PaddlePaddle represent the computation, training and inference of DL models, by computation graphs. +1. PaddlePaddle represents the computation, training and inference of Deep Learning models, by computation graphs. - 1. Please dig into [computation graphs](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/graph.md) for a solid example. + 1. Please refer to [computation graphs](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/graph.md) for a concrete example. -1. Users write Python programs to describe the graphs and run it (locally or remotely). +1. Users write Python programs to describe the graphs and run them (locally or remotely). 1. A graph is composed of *variables* and *operators*. -1. The description of graphs must be able to be serialized/deserialized, so it +1. 
The description of graphs must be capable of being serialized/deserialized, so that: - 1. could to be sent to the cloud for distributed execution, and - 1. be sent to clients for mobile or enterprise deployment. + 1. It can be sent to the cloud for distributed execution, and + 1. It can be sent to clients for mobile or enterprise deployment. -1. The Python program do +1. The Python program does the following steps: - 1. *compilation*: runs a Python program to generate a protobuf message representation of the graph and send it to + 1. *compilation*: run a Python program to generate a protobuf message representation of the graph and send it to 1. the C++ library `libpaddle.so` for local execution, 1. the master process of a distributed training job for training, or 1. the server process of a Kubernetes serving job for distributed serving. - 1. *execution*: according to the protobuf message, constructs instances of class `Variable` and `OperatorBase`, and run them. + 1. *execution*: execute the graph by constructing instances of class [`Variable`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/variable.h#L24) and [`OperatorBase`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/operator.h#L70), according to the protobuf message. -## Description and Realization +## Description and Realization of Computation Graph -At compile time, the Python program generates protobuf message representation of the graph, or the description of the graph. +At compile time, the Python program generates a protobuf message representation of the graph, that is, the description of the graph. -At runtime, the C++ program realizes the graph and run it. +At runtime, the C++ program realizes the graph and runs it. | | Representation (protobuf messages) | Realization (C++ class objects) | |---|---|---| @@ -42,30 +42,31 @@ At runtime, the C++ program realizes the graph and run it. 
|Operation|[OpDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto#L35)|[Operator](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/operator.h#L64)| |Block|BlockDesc|Block| -The word *graph* is exchangable with *block* in this document. A graph represent computation steps and local variables as a C++/Java program block, or a pair of { and }. +The word *graph* is interchangeable with *block* in this document. A graph represents computation steps and local variables, similar to a C++/Java program block, or a pair of curly braces (`{` and `}`). ## Compilation and Execution -1. Run an applicaton Python program to describe the graph. In particular, +1. Run an application Python program to describe the graph. In particular, the Python application program does the following: - 1. create VarDesc to represent local/intermediate variables, - 1. create operators and set attributes, - 1. validate attribute values, - 1. inference the type and the shape of variables, - 1. plan for memory-reuse for variables, - 1. generate backward and optimization part of the Graph. - 1. possiblly split the graph for distributed training. + 1. Create `VarDesc` to represent local/intermediate variables, + 1. Create operators and set attributes, + 1. Validate attribute values, + 1. Infer the type and the shape of variables, + 1. Plan memory-reuse for variables, + 1. Generate the backward graph, + 1. Optimize the computation graph, + 1. Potentially, split the graph for distributed training. -1. The invocation of `train` or `infer` in the application Python program: +1. The invocation of `train` or [`infer`](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/inference.py#L108) methods in the application Python program does the following: - 1. create a new Scope instance in the [scope hierarchy](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/scope.md) for each run of a block, + 1. 
Create a new Scope instance in the [scope hierarchy](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/scope.md) for each run of a block, 1. realize local variables defined in the BlockDesc message in the new scope, 1. a scope is similar to the stack frame in programming languages, - 1. create an instance of class `Block`, in which, + 1. Create an instance of class `Block`, in which, 1. realize operators in the BlockDesc message, - 1. run the Block by calling + 1. Run the Block by calling 1. `Block::Eval(vector* targets)` for forward and backward computations, or 1. `Block::Eval(vector* targets)` for optimization. @@ -76,14 +77,14 @@ The word *graph* is exchangable with *block* in this document. A graph represen Compile Time -> IR -> Runtime ``` -### Benefit +### Benefits of IR - Optimization ```text Compile Time -> IR -> Optimized IR -> Runtime ``` -- Send automatically partitioned IR to different nodes. - - Automatic data parallel +- Automatically send partitioned IR to different nodes. + - Automatic Data Parallelism ```text Compile Time |-> Single GPU IR @@ -92,7 +93,7 @@ Compile Time -> IR -> Runtime |-> Node-1 (runs trainer-IR-1) |-> Node-2 (runs pserver-IR) ``` - - Automatic model parallel (planned for future) + - Automatic Model Parallelism (planned for future) --- @@ -105,10 +106,10 @@ Compile Time -> IR -> Runtime # Operator ![class_diagram](http://api.paddlepaddle.org/graphviz?dot=https://gist.githubusercontent.com/reyoung/53df507f6749762675dff3e7ce53372f/raw/dd598e8f1976f5759f58af5e5ef94738a6b2e661/op.dot) -* `Operator` is the fundamental building block as the user interface. - * Operator stores input/output variable name, and attributes. - * The `InferShape` interface is used to infer output variable shapes by its input shapes. - * Use `Run` to compute `input variables` to `output variables`. +* `Operator` is the fundamental building block of the user interface. + * Operator stores input/output variable names, and attributes. 
+ * The `InferShape` interface is used to infer the shapes of the output variables from the shapes of the input variables. + * Use `Run` to compute the `output` variables from the `input` variables. --- @@ -126,30 +127,29 @@ Compile Time -> IR -> Runtime # Why separate Kernel and Operator * Separate GPU and CPU code. - * Make Paddle can run without GPU. -* Make one operator (which is user interface) can contain many implementations. - * Same mul op, different FP16, FP32 Kernel. different MKL, eigen kernel. + * Make Paddle capable of running without GPU. +* Allow one operator (which is the user interface) to have many implementations. + * For example, the same multiplication op can have different kernel implementations, such as an FP16 kernel, an FP32 kernel, an MKL kernel, or an Eigen kernel. --- # Libraries for Kernel development * `Eigen::Tensor` contains basic math and element-wise functions. * Note that `Eigen::Tensor` has broadcast implementation. - * Limit number of `tensor.device(dev) = ` in your code. -* `thrust::tranform` and `std::transform`. - * `thrust` has the same API as C++ standard library. Using `transform` can quickly implement a customized elementwise kernel. - * `thrust` has more complex API, like `scan`, `reduce`, `reduce_by_key`. + * Limit the number of `tensor.device(dev) = ` in your code. +* `thrust::transform` and `std::transform`. + * `thrust` has the same API as the C++ standard library. Using `transform`, one can quickly implement customized element-wise kernels. + * `thrust` also has more complex APIs, like `scan`, `reduce`, `reduce_by_key`. * Hand-writing `GPUKernel` and `CPU` code - * Do not write `.h`. CPU Kernel should be in `.cc`. GPU kernel should be in `.cu`. (`GCC` cannot compile GPU code.) + * Do not write kernels in header (`.h`) files. CPU kernels should be in C++ source (`.cc`) files and GPU kernels should be in CUDA (`.cu`) files. (GCC cannot compile GPU code.) --- -# Operator Register +# Operator Registration -## Why register is necessary? 
+## Why is registration necessary? We need a method to build mappings between Op type names and Op classes. -## How to do the register? - -Maintain a map, whose key is the type name and value is corresponding Op constructor. +## How is registration implemented? +By maintaining a map whose key is the type name and whose value is the corresponding Op constructor. --- # The Registry Map @@ -169,7 +169,7 @@ Maintain a map, whose key is the type name and value is corresponding Op constru # Related Concepts ### Op_Maker -It's constructor takes `proto` and `checker`. They are compeleted during Op_Maker's construction. ([ScaleOpMaker](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/scale_op.cc#L37)) +Its constructor takes `proto` and `checker`. They are completed during Op_Maker's construction. ([ScaleOpMaker](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/scale_op.cc#L37)) ### Register Macros ```cpp @@ -177,32 +177,30 @@ REGISTER_OP(op_type, op_class, op_maker_class, grad_op_type, grad_op_class) REGISTER_OP_WITHOUT_GRADIENT(op_type, op_class, op_maker_class) ``` - --- -# Register Process -1. Write Op class, as well as its gradient Op class if there is. -2. Write Op maker class. In the constructor, describe its inputs, outputs, and attributes. -3. Invoke macro `REGISTER_OP`. The macro will - 1. call maker class to complete `proto` and `checker` - 2. with the completed `proto` and `checker`, build a new key-value pair in the `OpInfoMap` - +# Registration Process +1. Write an Op class and its gradient Op class, if required. +2. Write an Op maker class. In the constructor of this class, describe the inputs, outputs and attributes of the operator. +3. Invoke the macro `REGISTER_OP`. This macro will + 1. Call the maker class to complete the `proto` and the `checker` + 2. 
Using the completed `proto` and `checker`, it will add a new key-value pair to the `OpInfoMap` --- # Backward Module (1/2) ### Create Backward Operator -- Mapping from forwarding Op to backward Op +- Mapping from forward Op to backward Op ![backward](https://gist.githubusercontent.com/dzhwinter/a6fbd4623ee76c459f7f94591fd1abf0/raw/61026ab6e518e66bde66a889bc42557a1fccff33/backward.png) --- # Backward Module (2/2) ### Build Backward Network -- **Input** graph of forwarding operators -- **Output** graph of backward operators -- **corner case in construction** - - shared variable => insert `Add` operator - - no gradient => insert `fill_zero_grad` operator - - recursive netOp => call `Backward` recursively +- **Input**: graph of forward operators +- **Output**: graph of backward operators +- **Corner cases in construction** + - Shared Variables => insert an `Add` operator to combine gradients + - No Gradient => insert a `fill_zero_grad` operator + - Recursive NetOp => call `Backward` recursively - RNN Op => recursively call `Backward` on stepnet @@ -211,41 +209,41 @@ REGISTER_OP_WITHOUT_GRADIENT(op_type, op_class, op_maker_class) * `Tensor` is an n-dimension array with type. * Only dims and data pointers are stored in `Tensor`. - * All operators on `Tensor` is written in `Operator` or global functions. - * variable length Tensor design [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md) -* `Variable` is the inputs and outputs of an operator. Not just `Tensor`. - * step_scopes in RNN is a variable and not a tensor. -* `Scope` is where variables store at. - * map - * `Scope` has a hierarchical structure. The local scope can get variable from its parent scope. + * All operations on `Tensor` are written in `Operator` or global functions. 
+ * Variable length Tensor design [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md) +* `Variable` instances are the inputs and the outputs of an operator, not just `Tensor`. + * `step_scopes` in RNN is a variable and not a tensor. +* `Scope` is where variables are stored. + * map + * `Scope` has a hierarchical structure. The local scope can get variables from its parent scope. --- # Block (in design) -## the difference with original RNNOp -- as an operator is more intuitive than `RNNOp`, -- offers new interface `Eval(targets)` to deduce the minimal block to `Run`, -- fits the compile-time/ runtime separation design. - - during the compilation, `SymbolTable` stores `VarDesc`s and `OpDesc`s and serialize to a `BlockDesc` - - when graph executes, a Block with `BlockDesc` passed in creates `Op` and `Var` then `Run` +## The difference between the original RNNOp and Block +- Block as an operator is more intuitive than `RNNOp`, +- Offers a new interface `Eval(targets)` to deduce the minimal block to `Run`, +- Fits the compile-time/runtime separation design paradigm. + - During compilation, `SymbolTable` stores `VarDesc`s and `OpDesc`s and serializes them to a `BlockDesc` + - When the graph executes, a Block is created with the `BlockDesc` passed in. It then creates `Op` and `Var` instances and invokes `Run`. --- # Milestone -- take Paddle/books as the main line, the requirement of the models motivates framework refactoring, -- model migration - - framework development gives **priority support** to model migration, for example, +- Take Paddle/books as the main line; the requirements of the models motivate framework refactoring, +- Model migration + - Framework development gives **priority support** to model migration, for example, - the MNIST demo needs a Python interface, - the RNN models require the framework to support `LoDTensor`. 
- determine some timelines, - heavily-relied Ops need to be migrated first, - different models can be migrated parallelly. -- improve the framework at the same time -- accept imperfection, concentrated on solving the specific problem at the right price. + - Determine some timelines, + - Frequently used Ops need to be migrated first, + - Different models can be migrated in parallel. +- Improve the framework at the same time +- Accept imperfection; concentrate on solving the specific problem at the right price. --- # Control the migration quality -- compare the performance of migrated models with old ones. -- follow google C style -- build the automatic workflow of generating Python/C++ documentations - - the documentation of layers and ops should be written inside the code - - take the documentation quality into account when doing PR - - preview the documentations, read and improve them from users' perspective +- Compare the performance of migrated models with old ones. +- Follow the Google C++ style guide. +- Build an automatic workflow for generating Python/C++ documentation. + - The documentation of layers and ops should be written inside the code. + - Take the documentation quality into account when submitting pull requests. + - Preview the documentation, then read and improve it from a user's perspective. 
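The registration mechanism described in the design document above (a map whose key is the Op type name and whose value is the corresponding Op constructor) can be illustrated with a minimal sketch. This is not PaddlePaddle's `OpInfoMap` or `REGISTER_OP`; `ToyOp`, `ToyScaleOp`, `ToyRegistry` and `kScaleRegistered` are hypothetical names introduced only for illustration, and the real registry also stores `proto`, `checker` and gradient information that is omitted here.

```cpp
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for an operator base class (not OperatorBase).
struct ToyOp {
  virtual ~ToyOp() = default;
  virtual std::string Type() const = 0;
};

// Hypothetical concrete operator used to populate the registry.
struct ToyScaleOp : ToyOp {
  std::string Type() const override { return "scale"; }
};

// Maps an op type name to a function that constructs the op.
class ToyRegistry {
 public:
  using Creator = std::function<std::unique_ptr<ToyOp>()>;

  // A function-local static avoids static-initialization-order problems.
  static ToyRegistry& Instance() {
    static ToyRegistry registry;
    return registry;
  }

  // Returns false if the type name was already registered.
  bool Register(const std::string& type, Creator creator) {
    return creators_.emplace(type, std::move(creator)).second;
  }

  // Looks up the creator by type name and constructs a new op.
  std::unique_ptr<ToyOp> Create(const std::string& type) const {
    auto it = creators_.find(type);
    if (it == creators_.end()) {
      throw std::out_of_range("unregistered op type: " + type);
    }
    return it->second();
  }

 private:
  std::unordered_map<std::string, Creator> creators_;
};

// Registration at static-initialization time; a registration macro could
// plausibly expand to something of this shape (an assumption, not a fact
// about REGISTER_OP).
static const bool kScaleRegistered = ToyRegistry::Instance().Register(
    "scale", [] { return std::unique_ptr<ToyOp>(new ToyScaleOp); });
```

With this sketch, `ToyRegistry::Instance().Create("scale")` returns a new `ToyScaleOp`, while an unknown type name raises `std::out_of_range`. Keying the map by a string lets a serialized graph description name its operators without linking against their classes directly.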
diff --git a/doc/howto/dev/new_op_cn.md b/doc/howto/dev/new_op_cn.md index 264b998f50df016da0741d97d4b26f759ee90900..9d3d02ffc3116ebec537ab9b890eafccad196ed0 100644 --- a/doc/howto/dev/new_op_cn.md +++ b/doc/howto/dev/new_op_cn.md @@ -285,41 +285,27 @@ class TestMulGradOp(GradientChecker): 'Y': np.random.random((84, 100)).astype("float32") } - def test_cpu_gpu_compare(self): - self.compare_grad(self.op, self.inputs) - - def test_normal(self): + def test_check_grad_normal(self): # mul op will enlarge the relative error - self.check_grad( - self.op, self.inputs, ["X", "Y"], "Out", max_relative_error=0.5) + self.check_grad(['X', 'Y'], 'Out', max_relative_error=0.5) - def test_ignore_x(self): + def test_check_grad_ingore_x(self): self.check_grad( - self.op, - self.inputs, ["Y"], - "Out", - max_relative_error=0.5, - no_grad_set={"X"}) + ['Y'], 'Out', max_relative_error=0.5, no_grad_set=set("X")) - def test_ignore_y(self): + def test_check_grad_ingore_y(self): self.check_grad( - self.op, - self.inputs, ["X"], - "Out", - max_relative_error=0.5, - no_grad_set={"Y"}) + ['X'], 'Out', max_relative_error=0.5, no_grad_set=set('Y')) ``` Some key points in the code above are explained below: - Calling `create_op("mul")` creates the forward Op that corresponds to the backward Op. -- The `compare_grad` function is called to compare the CPU and GPU results. -- `test_normal` calls `check_grad` to check gradient correctness and stability numerically. - - The first argument `self.op`: the forward Op. - - The second argument `self.inputs`: the input dictionary, whose keys match the `ProtoMaker` definition. - - The third argument `["X", "Y"]`: specifies that gradients are checked for the input variables `X` and `Y`. - - The fourth argument `"Out"`: specifies the final output target variable `Out` of the forward network -- The `test_ignore_x` and `test_ignore_y` branches test the case where the gradient of only one input needs to be computed. +- `test_check_grad_normal` calls `check_grad` to check gradient correctness and stability numerically. + - The first argument `["X", "Y"]`: specifies that gradients are checked for the input variables `X` and `Y`. + - The second argument `"Out"`: specifies the final output target variable `Out` of the forward network. + - The third argument `max_relative_error`: specifies the maximum error tolerated when checking gradients. +- The `test_check_grad_ingore_x` and `test_check_grad_ingore_y` branches test the case where the gradient of only one input needs to be computed. ### Compiling and running unit tests diff --git a/doc/howto/dev/new_op_en.md b/doc/howto/dev/new_op_en.md index bad1dbc1de9cc5bd11914fddf397857f0bda7976..57ff7caad19cc6bf2e4a052d306d4fc303c8875d 100644 --- a/doc/howto/dev/new_op_en.md +++ 
b/doc/howto/dev/new_op_en.md @@ -293,41 +293,27 @@ class TestMulGradOp(GradientChecker): 'Y': np.random.random((84, 100)).astype("float32") } - def test_cpu_gpu_compare(self): - self.compare_grad(self.op, self.inputs) - - def test_normal(self): + def test_check_grad_normal(self): # mul op will enlarge the relative error - self.check_grad( - self.op, self.inputs, ["X", "Y"], "Out", max_relative_error=0.5) + self.check_grad(['X', 'Y'], 'Out', max_relative_error=0.5) - def test_ignore_x(self): + def test_check_grad_ingore_x(self): self.check_grad( - self.op, - self.inputs, ["Y"], - "Out", - max_relative_error=0.5, - no_grad_set={"X"}) + ['Y'], 'Out', max_relative_error=0.5, no_grad_set=set("X")) - def test_ignore_y(self): + def test_check_grad_ingore_y(self): self.check_grad( - self.op, - self.inputs, ["X"], - "Out", - max_relative_error=0.5, - no_grad_set={"Y"}) + ['X'], 'Out', max_relative_error=0.5, no_grad_set=set('Y')) ``` Some key points in the code above include: - `create_op("mul")` creates the backward operator's corresponding forward operator. -- `compare_grad` compares results between utilizing the CPU and the GPU. - `test_normal` calls `check_grad` to validate scaling tests' correctness and stability through numeric methods. - - The first variable `self.op` denotes the forward operator. - - The second variable `self.inputs` denotes the input dictionary, which has its key value identical to its `ProtoMaker` definitions. - - The third variable `["X", "Y"]` appoints `X` and `Y` to be scale tested. - - The fourth variable `"Out"` points to the network's final output target `Out`. -- `test_ignore_x` and `test_ignore_y`branches test the cases where there is only one scaling input. + - The first argument `["X", "Y"]` specifies that gradients are checked for the input variables `X` and `Y`. + - The second argument `"Out"` names the forward network's final output target `Out`. + - The third argument `max_relative_error` sets the maximum relative error tolerated during the gradient check. 
+- The `test_check_grad_ingore_x` and `test_check_grad_ingore_y` branches test the cases where the gradient of only one input needs to be computed. ### Compiling and Running diff --git a/paddle/framework/CMakeLists.txt b/paddle/framework/CMakeLists.txt index 4aaa43d79612111856dd4dfc954ca2bfd8f4fa63..8a5d8532bb32db917b893f7f59039e08d85c8c34 100644 --- a/paddle/framework/CMakeLists.txt +++ b/paddle/framework/CMakeLists.txt @@ -26,7 +26,7 @@ cc_library(op_info SRCS op_info.cc DEPS attribute framework_proto) cc_library(operator SRCS operator.cc DEPS op_info device_context tensor scope) cc_test(operator_test SRCS operator_test.cc DEPS operator op_registry) -cc_library(grad_op_builder SRCS grad_op_builder.cc DEPS operator) +cc_library(grad_op_builder SRCS grad_op_builder.cc DEPS operator proto_desc) cc_library(op_registry SRCS op_registry.cc DEPS grad_op_builder op_proto_maker op_info) cc_test(op_registry_test SRCS op_registry_test.cc DEPS op_registry) cc_test(grad_op_builder_test SRCS grad_op_builder_test.cc DEPS grad_op_builder op_registry add_op) diff --git a/paddle/operators/rowwise_add_op.cu b/paddle/framework/data_type.h similarity index 51% rename from paddle/operators/rowwise_add_op.cu rename to paddle/framework/data_type.h index 4a57f64c890ce99d6060faec6a4a01b107403344..55e3931f870d62dcaddc6c067f66999c59e2a262 100644 --- a/paddle/operators/rowwise_add_op.cu +++ b/paddle/framework/data_type.h @@ -12,12 +12,25 @@ See the License for the specific language governing permissions and limitations under the License. 
*/ -#define EIGEN_USE_GPU -#include "paddle/operators/rowwise_add_op.h" +#pragma once +#include <typeindex> +#include "paddle/framework/framework.pb.h" -namespace ops = paddle::operators; -REGISTER_OP_GPU_KERNEL( - rowwise_add, ops::RowwiseAddKernel); -REGISTER_OP_GPU_KERNEL( - rowwise_add_grad, - ops::RowwiseAddGradKernel); +namespace paddle { +namespace framework { + +inline DataType ToDataType(std::type_index type) { + if (typeid(float).hash_code() == type.hash_code()) { + return DataType::FP32; + } else if (typeid(double).hash_code() == type.hash_code()) { + return DataType::FP64; + } else if (typeid(int).hash_code() == type.hash_code()) { + return DataType::INT32; + } else { + PADDLE_THROW("Not supported"); + return static_cast<DataType>(-1); + } +} + +} // namespace framework +} // namespace paddle diff --git a/paddle/framework/grad_op_builder.cc b/paddle/framework/grad_op_builder.cc index b02a599a800668b22e7fe39a10fa6dc132e305bd..3661ce41beba1328d1b1cdd9f0f913e693af9cff 100644 --- a/paddle/framework/grad_op_builder.cc +++ b/paddle/framework/grad_op_builder.cc @@ -54,5 +54,44 @@ OperatorBase* BuildGradOp(const OperatorBase* op) { return grad_info.Creator()(info.grad_op_type_, inputs, outputs, op->Attrs()); } +static void TransOpDescArg(const OpDescBind* src_op, const OpArgType& src_type, + bool is_grad, OpDescBind* dst_op, + const OpArgType& dst_type) { + PADDLE_ENFORCE(dst_op != nullptr, + "Protobuf desc of gradient op must be initialized first."); + const auto& proto = OpInfoMap::Instance().Get(src_op->Type()).Proto(); + const auto& src_arg_list = + src_type == OpArgType::IN ? proto.inputs() : proto.outputs(); + for (const auto& arg : src_arg_list) { + if (arg.not_in_gradient() && !is_grad) continue; + const std::string src_name = arg.name(); + std::vector<std::string> vars = src_type == OpArgType::IN + ? src_op->Input(src_name) + : src_op->Output(src_name); + if (is_grad) { + for (std::string& var : vars) { + var = GradVarName(var); + } + } + std::string dst_name = is_grad ? 
GradVarName(src_name) : src_name; + dst_type == OpArgType::IN ? dst_op->SetInput(dst_name, vars) + : dst_op->SetOutput(dst_name, vars); + } +} + +void CompleteGradOpDesc(const OpDescBind* forw_op, OpDescBind* grad_op) { + auto& info = OpInfoMap::Instance().Get(forw_op->Type()); + PADDLE_ENFORCE(info.HasGradientOp()); + + grad_op->SetType(info.grad_op_type_); + + TransOpDescArg(forw_op, OpArgType::IN, false, grad_op, OpArgType::IN); + TransOpDescArg(forw_op, OpArgType::OUT, false, grad_op, OpArgType::IN); + TransOpDescArg(forw_op, OpArgType::OUT, true, grad_op, OpArgType::IN); + TransOpDescArg(forw_op, OpArgType::IN, true, grad_op, OpArgType::OUT); + + grad_op->SetAttrMap(forw_op->GetAttrMap()); +} + } // namespace framework } // namespace paddle diff --git a/paddle/framework/grad_op_builder.h b/paddle/framework/grad_op_builder.h index 998f8ebbb5f2f4fb8b7e938b5916afd0f8a7930d..b601406061f9f8f24302251c2144b07b6e65717f 100644 --- a/paddle/framework/grad_op_builder.h +++ b/paddle/framework/grad_op_builder.h @@ -14,6 +14,7 @@ limitations under the License. 
*/ #pragma once +#include "paddle/framework/op_desc.h" #include "paddle/framework/operator.h" namespace paddle { @@ -21,5 +22,7 @@ namespace framework { OperatorBase* BuildGradOp(const OperatorBase* op); +void CompleteGradOpDesc(const OpDescBind* forw_op, OpDescBind* grad_op); + } // namespace framework } // namespace paddle diff --git a/paddle/framework/grad_op_builder_test.cc b/paddle/framework/grad_op_builder_test.cc index 9e3ca563c6765637f8471d142d32cec447f0b977..d09892f81bea34415d454b017258fd2a0d4575db 100644 --- a/paddle/framework/grad_op_builder_test.cc +++ b/paddle/framework/grad_op_builder_test.cc @@ -120,3 +120,82 @@ TEST(GradOpBuilder, IOIgnoredInGradient) { std::vector<std::string>( {f::GradVarName("in3_1"), f::GradVarName("in3_2")})); } + +TEST(GradOpDescBuilder, MutiInOut) { + f::OpDescBind *forw_op = new f::OpDescBind(); + forw_op->SetType("mult_io"); + forw_op->SetInput("In1", {"in1"}); + forw_op->SetInput("In2_mult", {"in2_1", "in2_2", "in2_3"}); + forw_op->SetInput("In3", {"in3"}); + forw_op->SetOutput("Out1", {"out1"}); + forw_op->SetOutput("Out2_mult", {"out2_1", "out2_2"}); + + f::OpDescBind *grad_op = new f::OpDescBind(); + f::CompleteGradOpDesc(forw_op, grad_op); + + EXPECT_EQ(grad_op->Type(), "mult_io_grad"); + ASSERT_EQ(grad_op->InputNames().size(), 3UL + 2UL + 2UL); + EXPECT_EQ(grad_op->Input("In1"), std::vector<std::string>({"in1"})); + EXPECT_EQ(grad_op->Input("In2_mult"), + std::vector<std::string>({"in2_1", "in2_2", "in2_3"})); + EXPECT_EQ(grad_op->Input("In3"), std::vector<std::string>({"in3"})); + EXPECT_EQ(grad_op->Input("Out1"), std::vector<std::string>({"out1"})); + EXPECT_EQ(grad_op->Input("Out2_mult"), + std::vector<std::string>({"out2_1", "out2_2"})); + EXPECT_EQ(grad_op->Input(f::GradVarName("Out1")), + std::vector<std::string>({f::GradVarName("out1")})); + EXPECT_EQ(grad_op->Input(f::GradVarName("Out2_mult")), + std::vector<std::string>( + {f::GradVarName("out2_1"), f::GradVarName("out2_2")})); + + ASSERT_EQ(grad_op->OutputNames().size(), 3UL); + EXPECT_EQ(grad_op->Output(f::GradVarName("In1")), + 
std::vector<std::string>({f::GradVarName("in1")})); + EXPECT_EQ(grad_op->Output(f::GradVarName("In2_mult")), + std::vector<std::string>({f::GradVarName("in2_1"), + f::GradVarName("in2_2"), + f::GradVarName("in2_3")})); + EXPECT_EQ(grad_op->Output(f::GradVarName("In3")), + std::vector<std::string>({f::GradVarName("in3")})); + delete forw_op; + delete grad_op; +} + +TEST(GradOpDescBuilder, IOIgnoredInGradient) { + f::OpDescBind *forw_op = new f::OpDescBind(); + forw_op->SetType("io_ignored"); + forw_op->SetInput("In1", {"in1"}); + forw_op->SetInput("In2_mult", {"in2_1", "in2_2"}); + forw_op->SetInput("In3_mult", {"in3_1", "in3_2"}); + forw_op->SetOutput("Out1_mult", {"out1_1", "out1_2"}); + forw_op->SetOutput("Out2", {"out2"}); + + f::OpDescBind *grad_op = new f::OpDescBind(); + f::CompleteGradOpDesc(forw_op, grad_op); + + EXPECT_EQ(grad_op->Type(), "io_ignored_grad"); + // 'In2' and 'Out2' are ignored in gradient calculating + ASSERT_EQ(grad_op->InputNames().size(), 2UL + 1UL + 2UL); + EXPECT_EQ(grad_op->Input("In1"), std::vector<std::string>({"in1"})); + EXPECT_EQ(grad_op->Input("In3_mult"), + std::vector<std::string>({"in3_1", "in3_2"})); + EXPECT_EQ(grad_op->Input("Out1_mult"), + std::vector<std::string>({"out1_1", "out1_2"})); + EXPECT_EQ(grad_op->Input(f::GradVarName("Out1_mult")), + std::vector<std::string>( + {f::GradVarName("out1_1"), f::GradVarName("out1_2")})); + EXPECT_EQ(grad_op->Input(f::GradVarName("Out2")), + std::vector<std::string>({f::GradVarName("out2")})); + + ASSERT_EQ(grad_op->OutputNames().size(), 3UL); + EXPECT_EQ(grad_op->Output(f::GradVarName("In1")), + std::vector<std::string>({f::GradVarName("in1")})); + EXPECT_EQ(grad_op->Output(f::GradVarName("In2_mult")), + std::vector<std::string>( + {f::GradVarName("in2_1"), f::GradVarName("in2_2")})); + EXPECT_EQ(grad_op->Output(f::GradVarName("In3_mult")), + std::vector<std::string>( + {f::GradVarName("in3_1"), f::GradVarName("in3_2")})); + delete forw_op; + delete grad_op; +} \ No newline at end of file diff --git a/paddle/framework/op_desc.cc b/paddle/framework/op_desc.cc index 
99b5a9c37700adce56f9a83af3792ef113a873ff..0c12c55dc09f6aa064066b5c73bc5e985a57343f 100644 --- a/paddle/framework/op_desc.cc +++ b/paddle/framework/op_desc.cc @@ -89,6 +89,12 @@ void OpDescBind::SetAttr(const std::string &name, const Attribute &v) { need_update_ = true; } +void OpDescBind::SetAttrMap( + const std::unordered_map &attr_map) { + attrs_ = attr_map; + need_update_ = true; +} + Attribute OpDescBind::GetAttr(const std::string &name) const { auto it = attrs_.find(name); PADDLE_ENFORCE(it != attrs_.end(), "Attribute %s is not found", name); @@ -101,6 +107,11 @@ int OpDescBind::GetBlockAttr(const std::string &name) const { return boost::get(it->second)->idx(); } +const std::unordered_map &OpDescBind::GetAttrMap() + const { + return attrs_; +} + void OpDescBind::Sync() { if (need_update_) { this->op_desc_.mutable_inputs()->Clear(); diff --git a/paddle/framework/op_desc.h b/paddle/framework/op_desc.h index ffc8ac61abfb74e4716f10c457d0fbc18b2e2ab8..0cf7d13971675eb825bcd0c7636896f0862d6ebb 100644 --- a/paddle/framework/op_desc.h +++ b/paddle/framework/op_desc.h @@ -60,10 +60,16 @@ class OpDescBind { void SetBlockAttr(const std::string &name, BlockDescBind &block); + // Only be used in C++ + void SetAttrMap(const std::unordered_map &attr_map); + Attribute GetAttr(const std::string &name) const; int GetBlockAttr(const std::string &name) const; + // Only be used in C++ + const std::unordered_map &GetAttrMap() const; + private: struct SetAttrDescVisitor : public boost::static_visitor { explicit SetAttrDescVisitor(OpDesc::Attr *attr) : attr_(attr) {} diff --git a/paddle/framework/op_registry.h b/paddle/framework/op_registry.h index 90077d0192421f3678a049a723972fcb1e8d67af..4db38badaea8ae22d9ad47951f4941f3bdeb401a 100644 --- a/paddle/framework/op_registry.h +++ b/paddle/framework/op_registry.h @@ -100,13 +100,39 @@ class OpRegistrar : public Registrar { } }; -template +template +struct OpKernelRegistrarFunctor; + +template +struct OpKernelRegistrarFunctor { + using 
KERNEL_TYPE = + typename std::tuple_element>::type; + + void operator()(const char* op_type) const { + using T = typename KERNEL_TYPE::ELEMENT_TYPE; + OperatorWithKernel::OpKernelKey key(ToDataType(std::type_index(typeid(T))), + PlaceType()); + OperatorWithKernel::AllOpKernels()[op_type][key].reset(new KERNEL_TYPE); + + constexpr auto size = std::tuple_size>::value; + OpKernelRegistrarFunctor + func; + func(op_type); + } +}; + +template +struct OpKernelRegistrarFunctor { + void operator()(const char* op_type) const {} +}; + +// User can register many kernel in one place. The data type could be different. +template class OpKernelRegistrar : public Registrar { public: explicit OpKernelRegistrar(const char* op_type) { - OperatorWithKernel::OpKernelKey key; - key.place_ = PlaceType(); - OperatorWithKernel::AllOpKernels()[op_type][key].reset(new KernelType); + OpKernelRegistrarFunctor func; + func(op_type); } }; diff --git a/paddle/framework/operator.cc b/paddle/framework/operator.cc index d7beff5bc1df1def6bf35381e103cf87eeb68fd0..8b5560ffa1234145fb4291f5730f89fd7375ee15 100644 --- a/paddle/framework/operator.cc +++ b/paddle/framework/operator.cc @@ -22,14 +22,14 @@ namespace framework { template <> Eigen::DefaultDevice& ExecutionContext::GetEigenDevice< platform::CPUPlace, Eigen::DefaultDevice>() const { - return *device_context_.get_eigen_device(); + return *device_context_.GetEigenDevice(); } #ifndef PADDLE_ONLY_CPU template <> Eigen::GpuDevice& ExecutionContext::GetEigenDevice() const { - return *device_context_.get_eigen_device(); + return *device_context_.GetEigenDevice(); } #endif diff --git a/paddle/framework/operator.h b/paddle/framework/operator.h index 79bda2e2f9173ab632307bc52167d7d8c17d4418..310d68d7c1baac231a2f1709af28bfb58ae1a436 100644 --- a/paddle/framework/operator.h +++ b/paddle/framework/operator.h @@ -22,6 +22,7 @@ limitations under the License. 
*/ #include "op_info.h" #include "paddle/framework/attribute.h" +#include "paddle/framework/data_type.h" #include "paddle/framework/framework.pb.h" #include "paddle/framework/lod_tensor.h" #include "paddle/framework/scope.h" @@ -295,21 +296,6 @@ template <> std::vector InferShapeContext::MultiOutput( const std::string& name) const; -template -struct EigenDeviceConverter; - -template <> -struct EigenDeviceConverter { - using EigenDeviceType = Eigen::DefaultDevice; -}; - -#ifndef PADDLE_ONLY_CPU -template <> -struct EigenDeviceConverter { - using EigenDeviceType = Eigen::GpuDevice; -}; -#endif - class ExecutionContext : public InferShapeContext { public: ExecutionContext(const OperatorBase& op, const Scope& scope, @@ -317,8 +303,8 @@ class ExecutionContext : public InferShapeContext { : InferShapeContext(op, scope), device_context_(device_context) {} template ::EigenDeviceType> + typename DeviceType = typename platform::EigenDeviceConverter< + PlaceType>::EigenDeviceType> DeviceType& GetEigenDevice() const; platform::Place GetPlace() const { return device_context_.GetPlace(); } @@ -403,7 +389,7 @@ class RuntimeInferShapeContext : public InferShapeContextBase { const Scope& scope_; }; -class OpKernel { +class OpKernelBase { public: /** * ExecutionContext is the only parameter of Kernel Run function. 
@@ -414,33 +400,47 @@ class OpKernel { virtual void Compute(const ExecutionContext& context) const = 0; - virtual ~OpKernel() {} + virtual ~OpKernelBase() = default; +}; + +template +class OpKernel : public OpKernelBase { + public: + using ELEMENT_TYPE = T; }; class OperatorWithKernel : public OperatorBase { public: struct OpKernelKey { platform::Place place_; + DataType data_type_; - OpKernelKey() = default; - explicit OpKernelKey(const platform::DeviceContext& dev_ctx) { - place_ = dev_ctx.GetPlace(); - } + OpKernelKey(DataType data_type, platform::Place place) + : place_(place), data_type_(data_type) {} + + OpKernelKey(DataType data_type, const platform::DeviceContext& dev_ctx) + : place_(dev_ctx.GetPlace()), data_type_(data_type) {} bool operator==(const OpKernelKey& o) const { - return platform::places_are_same_class(place_, o.place_); + return platform::places_are_same_class(place_, o.place_) && + data_type_ == o.data_type_; } }; struct OpKernelHash { - std::hash hash_; + std::hash hash_; size_t operator()(const OpKernelKey& key) const { - return hash_(platform::is_gpu_place(key.place_)); + int place = key.place_.which(); + int data_type = static_cast(key.data_type_); + int pre_hash = data_type << NUM_PLACE_TYPE_LIMIT_IN_BIT | + (place & ((1 << NUM_PLACE_TYPE_LIMIT_IN_BIT) - 1)); + return hash_(pre_hash); } }; using OpKernelMap = - std::unordered_map, OpKernelHash>; + std::unordered_map, + OpKernelHash>; OperatorWithKernel(const std::string& type, const VariableNameMap& inputs, const VariableNameMap& outputs, const AttributeMap& attrs) @@ -451,8 +451,10 @@ class OperatorWithKernel : public OperatorBase { RuntimeInferShapeContext infer_shape_ctx(*this, scope); this->InferShape(&infer_shape_ctx); - auto& opKernel = AllOpKernels().at(type_).at(OpKernelKey(dev_ctx)); - opKernel->Compute(ExecutionContext(*this, scope, dev_ctx)); + ExecutionContext ctx(*this, scope, dev_ctx); + auto& opKernel = AllOpKernels().at(type_).at( + OpKernelKey(IndicateDataType(ctx), 
dev_ctx)); + opKernel->Compute(ctx); } static std::unordered_map& @@ -462,13 +464,43 @@ class OperatorWithKernel : public OperatorBase { } bool SupportGPU() const override { - OperatorWithKernel::OpKernelKey key; - key.place_ = platform::GPUPlace(); - return OperatorWithKernel::AllOpKernels().at(type_).count(key) != 0; + auto& op_kernels = OperatorWithKernel::AllOpKernels().at(type_); + return std::any_of(op_kernels.begin(), op_kernels.end(), + [](OpKernelMap::const_reference kern_pair) { + return platform::is_gpu_place(kern_pair.first.place_); + }); } protected: virtual void InferShape(InferShapeContextBase* ctx) const = 0; + + // indicate kernel DataType by input data. Defaultly all input data must be + // same. + virtual DataType IndicateDataType(const ExecutionContext& ctx) const { + auto& scope = ctx.scope(); + int data_type = -1; + for (auto& input : this->inputs_) { + for (auto& ipt_name : input.second) { + auto* var = scope.FindVar(ipt_name); + if (var != nullptr) { + const Tensor* t = nullptr; + if (var->IsType()) { + t = &var->Get(); + } else if (var->IsType()) { + t = &var->Get(); + } + if (t != nullptr) { + int tmp = static_cast(ToDataType(t->type())); + PADDLE_ENFORCE(tmp == data_type || data_type == -1, + "DataType of Paddle Op must be same."); + data_type = tmp; + } + } + } + } + PADDLE_ENFORCE(data_type != -1, "DataType should be indicated by input"); + return static_cast(data_type); + } }; } // namespace framework diff --git a/paddle/framework/operator_test.cc b/paddle/framework/operator_test.cc index e1d8f040b837a6ad598351dae0427cc7c231e79f..a0c17b41f27d9ec9a0f8e80576a052617919b000 100644 --- a/paddle/framework/operator_test.cc +++ b/paddle/framework/operator_test.cc @@ -114,10 +114,13 @@ class OpWithKernelTest : public OperatorWithKernel { protected: void InferShape(framework::InferShapeContextBase* ctx) const override {} + DataType IndicateDataType(const ExecutionContext& ctx) const override { + return DataType::FP32; + } }; template -class 
CPUKernelTest : public OpKernel { +class CPUKernelTest : public OpKernel { public: void Compute(const ExecutionContext& ctx) const { std::cout << "this is cpu kernel" << std::endl; @@ -144,7 +147,7 @@ class OpKernelTestMultiInputsProtoAndCheckerMaker } }; -class CPUKernalMultiInputsTest : public OpKernel { +class CPUKernalMultiInputsTest : public OpKernel { public: void Compute(const ExecutionContext& ctx) const { auto xs = ctx.op().Inputs("xs"); diff --git a/paddle/framework/tensor.h b/paddle/framework/tensor.h index f040c09c089ec75c9773d752685be5e232e8f4b7..80a3f0a3935ef6809ebd6f3bfb849d4e87d76d1b 100644 --- a/paddle/framework/tensor.h +++ b/paddle/framework/tensor.h @@ -29,20 +29,10 @@ limitations under the License. */ namespace paddle { -namespace pybind { -namespace details { -template -struct CastToPyBufferImpl; -} -} // namespace pybind - namespace framework { class Tensor { public: - template - friend struct pybind::details::CastToPyBufferImpl; - template friend struct EigenTensor; @@ -119,6 +109,8 @@ class Tensor { return holder_->place(); } + std::type_index type() const { return holder_->type(); } + private: template inline void check_memory_size() const; diff --git a/paddle/gserver/tests/test_MKLDNN.cpp b/paddle/gserver/tests/test_MKLDNN.cpp index 857d07df3e3088be28943d9e2fe58017e9e57f4a..a70b2f17f4f1130322f3c50d244f70fdcf34468b 100644 --- a/paddle/gserver/tests/test_MKLDNN.cpp +++ b/paddle/gserver/tests/test_MKLDNN.cpp @@ -215,13 +215,13 @@ struct testActDesc { static void getAddtoConfig(TestConfig& cfg, const testActDesc& pm) { cfg.biasSize = 0; cfg.layerConfig.set_type("addto"); - size_t layerSize = pm.ih * pm.ih * pm.iw; + size_t layerSize = pm.ic * pm.ih * pm.iw; cfg.layerConfig.set_size(layerSize); cfg.inputDefs.push_back({INPUT_DATA, "layer_0", layerSize, 0}); cfg.layerConfig.add_inputs(); } -void testActivation(std::string& actType, const testActDesc& pm) { +void testActivation(std::string actType, const testActDesc& pm) { // TODO(TJ): remove me 
when paddle support elu activation
   if (actType == "mkldnn_elu") {
     return;
@@ -240,6 +240,7 @@ TEST(MKLDNNActivation, Activations) {
   for (auto type : types) {
     /* bs, c, h, w*/
     testActivation(type, {16, 64, 32, 32});
+    testActivation(type, {2, 8, 1, 1});
   }
 }
diff --git a/paddle/math/RowBuffer.h b/paddle/math/RowBuffer.h
index dbb829c4e24a659e4a97c0a3ba4c5c78b68815d3..9ef5b89680b00981188d78cb312dc75e2c0a79ee 100644
--- a/paddle/math/RowBuffer.h
+++ b/paddle/math/RowBuffer.h
@@ -99,7 +99,11 @@ public:
   /**
    * @brief clear local buffer. It only affect auto-growth buffer.
    */
-  inline void clear() { rowStore_.clear(); }
+  inline void clear() {
+    // swap an empty vector to it to free the memory.
+    std::vector> empty;
+    rowStore_.swap(empty);
+  }
   /**
    * @brief get current number of rows.
diff --git a/paddle/operators/CMakeLists.txt b/paddle/operators/CMakeLists.txt
index 21166354937c378dc3f295f9011d034eb24cfc7c..87efb900cd59e6adeb051e0e458f2b86c1b510c9 100644
--- a/paddle/operators/CMakeLists.txt
+++ b/paddle/operators/CMakeLists.txt
@@ -101,8 +101,8 @@ set(DEPS_OPS
 op_library(recurrent_op SRCS recurrent_op.cc rnn/recurrent_op_utils.cc DEPS framework_proto tensor net_op)
 op_library(cond_op SRCS cond_op.cc DEPS framework_proto tensor operator net_op)
-op_library(cross_entropy_op DEPS cross_entropy_function)
-op_library(softmax_with_cross_entropy_op DEPS cross_entropy_function softmax_function)
+op_library(cross_entropy_op DEPS cross_entropy)
+op_library(softmax_with_cross_entropy_op DEPS cross_entropy softmax)
 list(REMOVE_ITEM GENERAL_OPS ${DEPS_OPS})
 foreach(src ${GENERAL_OPS})
diff --git a/paddle/operators/accuracy_op.cu b/paddle/operators/accuracy_op.cu
index 75e8a989036f0b818687e1fec3e600bb90e86b22..0ca9ef941d4cb15619caea2b6baed197e4b15e5a 100644
--- a/paddle/operators/accuracy_op.cu
+++ b/paddle/operators/accuracy_op.cu
@@ -47,7 +47,7 @@ __global__ void AccuracyCudaKernel(const int N, const int D, const int* Xdata,
 }
 template
-class AccuracyOpCUDAKernel : public
framework::OpKernel {
+class AccuracyOpCUDAKernel : public framework::OpKernel<T> {
  public:
   void Compute(const framework::ExecutionContext& ctx) const override {
     PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
diff --git a/paddle/operators/accuracy_op.h b/paddle/operators/accuracy_op.h
index fe704efe1c979f4fc6a5a37184e51b416f5e517f..12c6b9aac8819caedbc02017cee81b37322bb72a 100644
--- a/paddle/operators/accuracy_op.h
+++ b/paddle/operators/accuracy_op.h
@@ -35,7 +35,7 @@ template ;
 template
-class AccuracyKernel : public framework::OpKernel {
+class AccuracyKernel : public framework::OpKernel<T> {
  public:
   void Compute(const framework::ExecutionContext& ctx) const override {
     auto* inference = ctx.Input("Inference");
diff --git a/paddle/operators/activation_op.cc b/paddle/operators/activation_op.cc
index f77e1c572e33533ac672e3d476a7e6dad122031f..7ae4d2f6b6c0b0f30c06adc34c811bfe34b59fa6 100644
--- a/paddle/operators/activation_op.cc
+++ b/paddle/operators/activation_op.cc
@@ -132,6 +132,17 @@ class SquareOpMaker : public framework::OpProtoAndCheckerMaker {
   }
 };
+class SoftsignOpMaker : public framework::OpProtoAndCheckerMaker {
+ public:
+  SoftsignOpMaker(framework::OpProto *proto,
+                  framework::OpAttrChecker *op_checker)
+      : OpProtoAndCheckerMaker(proto, op_checker) {
+    AddInput("X", "Input of Softsign operator");
+    AddOutput("Y", "Output of Softsign operator");
+    AddComment("Softsign activation operator, softsign(x) = x / (1 + |x|)");
+  }
+};
+
 template
 class BReluOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
@@ -195,111 +206,57 @@ class STanhOpMaker : public framework::OpProtoAndCheckerMaker {
 } // namespace paddle
 namespace ops = paddle::operators;
+
 REGISTER_OP(sigmoid, ops::ActivationOp, ops::SigmoidOpMaker, sigmoid_grad,
             ops::ActivationOpGrad);
-REGISTER_OP_CPU_KERNEL(sigmoid,
-                       ops::ActivationKernel>);
-REGISTER_OP_CPU_KERNEL(
-    sigmoid_grad, ops::ActivationGradKernel>);
 REGISTER_OP(exp, ops::ActivationOp, ops::ExpOpMaker, exp_grad,
ops::ActivationOpGrad); -REGISTER_OP_CPU_KERNEL( - exp, - ops::ActivationKernel); -REGISTER_OP_CPU_KERNEL(exp_grad, - ops::ActivationGradKernel); REGISTER_OP(relu, ops::ActivationOp, ops::ReluOpMaker, relu_grad, ops::ActivationOpGrad); -REGISTER_OP_CPU_KERNEL(relu, - ops::ActivationKernel>); -REGISTER_OP_CPU_KERNEL( - relu_grad, ops::ActivationGradKernel>); REGISTER_OP(tanh, ops::ActivationOp, ops::TanhOpMaker, tanh_grad, ops::ActivationOpGrad); -REGISTER_OP_CPU_KERNEL( - tanh, - ops::ActivationKernel); -REGISTER_OP_CPU_KERNEL( - tanh_grad, ops::ActivationGradKernel>); REGISTER_OP(sqrt, ops::ActivationOp, ops::SqrtOpMaker, sqrt_grad, ops::ActivationOpGrad); -REGISTER_OP_CPU_KERNEL( - sqrt, - ops::ActivationKernel); -REGISTER_OP_CPU_KERNEL( - sqrt_grad, ops::ActivationGradKernel>); REGISTER_OP(abs, ops::ActivationOp, ops::AbsOpMaker, abs_grad, ops::ActivationOpGrad); -REGISTER_OP_CPU_KERNEL( - abs, - ops::ActivationKernel); -REGISTER_OP_CPU_KERNEL(abs_grad, - ops::ActivationGradKernel); REGISTER_OP(reciprocal, ops::ActivationOp, ops::ReciprocalOpMaker, reciprocal_grad, ops::ActivationOpGrad); -REGISTER_OP_CPU_KERNEL(reciprocal, - ops::ActivationKernel>); -REGISTER_OP_CPU_KERNEL( - reciprocal_grad, - ops::ActivationGradKernel>); REGISTER_OP(log, ops::ActivationOp, ops::LogOpMaker, log_grad, ops::ActivationOpGrad); -REGISTER_OP_CPU_KERNEL( - log, - ops::ActivationKernel); -REGISTER_OP_CPU_KERNEL( - log_grad, ops::ActivationGradKernel>); REGISTER_OP(square, ops::ActivationOp, ops::SquareOpMaker, square_grad, ops::ActivationOpGrad); -REGISTER_OP_CPU_KERNEL(square, - ops::ActivationKernel); -REGISTER_OP_CPU_KERNEL( - square_grad, ops::ActivationGradKernel>); + +REGISTER_OP(softsign, ops::ActivationOp, ops::SoftsignOpMaker, softsign_grad, + ops::ActivationOpGrad); REGISTER_OP(brelu, ops::ActivationOp, ops::BReluOpMaker, brelu_grad, ops::ActivationOpGrad); -REGISTER_OP_CPU_KERNEL(brelu, - ops::BReluKernel); -REGISTER_OP_CPU_KERNEL(brelu_grad, - ops::BReluGradKernel); 
REGISTER_OP(soft_relu, ops::ActivationOp, ops::SoftReluOpMaker, soft_relu_grad, ops::ActivationOpGrad); -REGISTER_OP_CPU_KERNEL(soft_relu, - ops::SoftReluKernel); -REGISTER_OP_CPU_KERNEL( - soft_relu_grad, ops::SoftReluGradKernel); REGISTER_OP(pow, ops::ActivationOp, ops::PowOpMaker, pow_grad, ops::ActivationOpGrad); -REGISTER_OP_CPU_KERNEL(pow, ops::PowKernel); -REGISTER_OP_CPU_KERNEL(pow_grad, - ops::PowGradKernel); REGISTER_OP(stanh, ops::ActivationOp, ops::STanhOpMaker, stanh_grad, ops::ActivationOpGrad); -REGISTER_OP_CPU_KERNEL(stanh, - ops::STanhKernel); -REGISTER_OP_CPU_KERNEL(stanh_grad, - ops::STanhGradKernel); + +#define REGISTER_ACTIVATION_CPU_KERNEL(act_type, functor, grad_functor) \ + REGISTER_OP_CPU_KERNEL( \ + act_type, \ + paddle::operators::ActivationKernel>); \ + REGISTER_OP_CPU_KERNEL(act_type##_grad, \ + paddle::operators::ActivationGradKernel< \ + paddle::platform::CPUPlace, \ + paddle::operators::grad_functor>); + +FOR_EACH_KERNEL_FUNCTOR(REGISTER_ACTIVATION_CPU_KERNEL); diff --git a/paddle/operators/activation_op.cu b/paddle/operators/activation_op.cu index feed1302b292a546f88fa35457c86aa2cfdaa307..93e9f1c694bacba48c4f8c46f90fb5b512bead99 100644 --- a/paddle/operators/activation_op.cu +++ b/paddle/operators/activation_op.cu @@ -15,86 +15,14 @@ #define EIGEN_USE_GPU #include "paddle/operators/activation_op.h" -namespace ops = paddle::operators; - -REGISTER_OP_GPU_KERNEL(sigmoid, - ops::ActivationKernel>); -REGISTER_OP_GPU_KERNEL( - sigmoid_grad, ops::ActivationGradKernel>); - -REGISTER_OP_GPU_KERNEL( - exp, - ops::ActivationKernel); -REGISTER_OP_GPU_KERNEL(exp_grad, - ops::ActivationGradKernel); -REGISTER_OP_GPU_KERNEL(relu, - ops::ActivationKernel>); -REGISTER_OP_GPU_KERNEL( - relu_grad, ops::ActivationGradKernel>); - -REGISTER_OP_GPU_KERNEL( - tanh, - ops::ActivationKernel); -REGISTER_OP_GPU_KERNEL( - tanh_grad, ops::ActivationGradKernel>); - -REGISTER_OP_GPU_KERNEL( - sqrt, - ops::ActivationKernel); -REGISTER_OP_GPU_KERNEL( - sqrt_grad, 
ops::ActivationGradKernel>); - -REGISTER_OP_GPU_KERNEL( - abs, - ops::ActivationKernel); -REGISTER_OP_GPU_KERNEL(abs_grad, - ops::ActivationGradKernel); - -REGISTER_OP_GPU_KERNEL(reciprocal, - ops::ActivationKernel>); -REGISTER_OP_GPU_KERNEL( - reciprocal_grad, - ops::ActivationGradKernel>); - -REGISTER_OP_GPU_KERNEL( - log, - ops::ActivationKernel); -REGISTER_OP_GPU_KERNEL( - log_grad, ops::ActivationGradKernel>); - -REGISTER_OP_GPU_KERNEL(square, - ops::ActivationKernel); -REGISTER_OP_GPU_KERNEL( - square_grad, ops::ActivationGradKernel>); - -REGISTER_OP_GPU_KERNEL(brelu, - ops::BReluKernel); -REGISTER_OP_GPU_KERNEL(brelu_grad, - ops::BReluGradKernel); - -REGISTER_OP_GPU_KERNEL(soft_relu, - ops::SoftReluKernel); -REGISTER_OP_GPU_KERNEL( - soft_relu_grad, ops::SoftReluGradKernel); - -REGISTER_OP_GPU_KERNEL(pow, ops::PowKernel); -REGISTER_OP_GPU_KERNEL(pow_grad, - ops::PowGradKernel); - -REGISTER_OP_GPU_KERNEL(stanh, - ops::STanhKernel); -REGISTER_OP_GPU_KERNEL(stanh_grad, - ops::STanhGradKernel); +#define REGISTER_ACTIVATION_GPU_KERNEL(act_type, functor, grad_functor) \ + REGISTER_OP_GPU_KERNEL( \ + act_type, \ + paddle::operators::ActivationKernel>); \ + REGISTER_OP_GPU_KERNEL(act_type##_grad, \ + paddle::operators::ActivationGradKernel< \ + paddle::platform::GPUPlace, \ + paddle::operators::grad_functor>); + +FOR_EACH_KERNEL_FUNCTOR(REGISTER_ACTIVATION_GPU_KERNEL); diff --git a/paddle/operators/activation_op.h b/paddle/operators/activation_op.h index 15f8afb4ba45cc989fe7576b82b8bf853b1df7de..ff35c2d97e856ab76581c74512a0b451ea6fe60c 100644 --- a/paddle/operators/activation_op.h +++ b/paddle/operators/activation_op.h @@ -19,9 +19,12 @@ namespace paddle { namespace operators { -template -class ActivationKernel : public framework::OpKernel { +template +class ActivationKernel + : public framework::OpKernel { public: + using T = typename Functor::ELEMENT_TYPE; + void Compute(const framework::ExecutionContext& context) const override { auto* X = context.Input("X"); 
auto* Y = context.Output("Y"); @@ -31,13 +34,20 @@ class ActivationKernel : public framework::OpKernel { auto y = framework::EigenVector::Flatten(*Y); auto place = context.GetEigenDevice(); Functor functor; + + auto attrs = functor.GetAttrs(); + for (auto& attr : attrs) { + *attr.second = context.Attr(attr.first); + } functor(place, x, y); } }; -template -class ActivationGradKernel : public framework::OpKernel { +template +class ActivationGradKernel + : public framework::OpKernel { public: + using T = typename Functor::ELEMENT_TYPE; void Compute(const framework::ExecutionContext& context) const override { auto* X = context.Input("X"); auto* Y = context.Input("Y"); @@ -51,303 +61,322 @@ class ActivationGradKernel : public framework::OpKernel { auto dx = framework::EigenVector::Flatten(*dX); auto place = context.GetEigenDevice(); Functor functor; + auto attrs = functor.GetAttrs(); + for (auto& attr : attrs) { + *attr.second = context.Attr(attr.first); + } functor(place, x, y, dy, dx); } }; +template +struct BaseActivationFunctor { + using ELEMENT_TYPE = T; + + using AttrPair = std::vector>; + + AttrPair GetAttrs() { return AttrPair(); } +}; + // sigmoid(x) = 1 / (1 + exp(-x)) template -struct SigmoidFunctor { +struct SigmoidFunctor : public BaseActivationFunctor { template - void operator()(Device d, X x, Y y) { + void operator()(Device d, X x, Y y) const { y.device(d) = static_cast(1) / (static_cast(1) + (-x).exp()); } }; template -struct SigmoidGradFunctor { +struct SigmoidGradFunctor : public BaseActivationFunctor { template - void operator()(Device d, X x, Y y, dY dy, dX dx) { + void operator()(Device d, X x, Y y, dY dy, dX dx) const { dx.device(d) = dy * y * (static_cast(1) - y); } }; // exp(x) = e^x -struct ExpFunctor { +template +struct ExpFunctor : public BaseActivationFunctor { template - void operator()(Device d, X x, Y y) { + void operator()(Device d, X x, Y y) const { y.device(d) = x.exp(); } }; -struct ExpGradFunctor { +template +struct ExpGradFunctor : 
public BaseActivationFunctor { template - void operator()(Device d, X x, Y y, dY dy, dX dx) { + void operator()(Device d, X x, Y y, dY dy, dX dx) const { dx.device(d) = dy * y; } }; // relu(x) = max(x, 0) template -struct ReluFunctor { +struct ReluFunctor : public BaseActivationFunctor { template - void operator()(Device d, X x, Y y) { + void operator()(Device d, X x, Y y) const { y.device(d) = x.cwiseMax(static_cast(0)); } }; template -struct ReluGradFunctor { +struct ReluGradFunctor : public BaseActivationFunctor { template - void operator()(Device d, X x, Y y, dY dy, dX dx) { + void operator()(Device d, X x, Y y, dY dy, dX dx) const { dx.device(d) = dy * (x > static_cast(0)).template cast(); } }; // tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x)) -struct TanhFunctor { +template +struct TanhFunctor : public BaseActivationFunctor { template - void operator()(Device d, X x, Y y) { + void operator()(Device d, X x, Y y) const { y.device(d) = x.tanh(); } }; template -struct TanhGradFunctor { +struct TanhGradFunctor : public BaseActivationFunctor { template - void operator()(Device d, X x, Y y, dY dy, dX dx) { + void operator()(Device d, X x, Y y, dY dy, dX dx) const { dx.device(d) = dy * (static_cast(1) - y * y); } }; // sqrt(x) = x^(1/2) -struct SqrtFunctor { +template +struct SqrtFunctor : public BaseActivationFunctor { template - void operator()(Device d, X x, Y y) { + void operator()(Device d, X x, Y y) const { y.device(d) = x.sqrt(); } }; template -struct SqrtGradFunctor { +struct SqrtGradFunctor : public BaseActivationFunctor { template - void operator()(Device d, X x, Y y, dY dy, dX dx) { + void operator()(Device d, X x, Y y, dY dy, dX dx) const { const Y y_conj = Eigen::numext::conj(y); dx.device(d) = static_cast(0.5) * dy / y_conj; } }; // abs(x) = |x| -struct AbsFunctor { +template +struct AbsFunctor : public BaseActivationFunctor { template - void operator()(Device d, X x, Y y) { + void operator()(Device d, X x, Y y) const { y.device(d) = x.abs(); } }; 
-struct AbsGradFunctor { +template +struct AbsGradFunctor : public BaseActivationFunctor { template - void operator()(Device d, X x, Y y, dY dy, dX dx) { + void operator()(Device d, X x, Y y, dY dy, dX dx) const { dx.device(d) = dy * x.sign(); } }; // reciprocal(x) = 1 / x template -struct ReciprocalFunctor { +struct ReciprocalFunctor : public BaseActivationFunctor { template - void operator()(Device d, X x, Y y) { + void operator()(Device d, X x, Y y) const { y.device(d) = static_cast(1) / x; } }; template -struct ReciprocalGradFunctor { +struct ReciprocalGradFunctor : public BaseActivationFunctor { template - void operator()(Device d, X x, Y y, dY dy, dX dx) { + void operator()(Device d, X x, Y y, dY dy, dX dx) const { dx.device(d) = dy * static_cast(-1) * y * y; } }; // log(x) = natural logarithm of x -struct LogFunctor { +template +struct LogFunctor : public BaseActivationFunctor { template - void operator()(Device d, X x, Y y) { + void operator()(Device d, X x, Y y) const { y.device(d) = x.log(); } }; template -struct LogGradFunctor { +struct LogGradFunctor : public BaseActivationFunctor { template - void operator()(Device d, X x, Y y, dY dy, dX dx) { + void operator()(Device d, X x, Y y, dY dy, dX dx) const { dx.device(d) = dy * (static_cast(1) / x); } }; // square(x) = x^2 -struct SquareFunctor { +template +struct SquareFunctor : public BaseActivationFunctor { template - void operator()(Device d, X x, Y y) { + void operator()(Device d, X x, Y y) const { y.device(d) = x.square(); } }; template -struct SquareGradFunctor { +struct SquareGradFunctor : public BaseActivationFunctor { template - void operator()(Device d, X x, Y y, dY dy, dX dx) { + void operator()(Device d, X x, Y y, dY dy, dX dx) const { dx.device(d) = dy * static_cast(2) * x; } }; -template -class BReluKernel : public framework::OpKernel { - public: - void Compute(const framework::ExecutionContext& context) const override { - auto* X = context.Input("X"); - auto* Y = context.Output("Y"); - auto 
t_min = static_cast(context.Attr("t_min")); - auto t_max = static_cast(context.Attr("t_max")); - Y->mutable_data(context.GetPlace()); +template +struct BReluFunctor : public BaseActivationFunctor { + float t_min; + float t_max; + + // NOTE: Explicit hides the `BaseActivationFunctor::GetAttrs` + // not polymorphism for speed. + typename BaseActivationFunctor::AttrPair GetAttrs() { + return {{"t_min", &t_min}, {"t_max", &t_max}}; + } - auto x = framework::EigenVector::Flatten(*X); - auto y = framework::EigenVector::Flatten(*Y); - auto place = context.GetEigenDevice(); - y.device(place) = x.cwiseMax(t_min).cwiseMin(t_max); + template + void operator()(Device d, X x, Y y) const { + y.device(d) = x.cwiseMax(t_min).cwiseMin(t_max); } }; -template -class BReluGradKernel : public framework::OpKernel { - public: - void Compute(const framework::ExecutionContext& context) const override { - auto* X = context.Input("X"); - auto* dY = context.Input(framework::GradVarName("Y")); - auto* dX = context.Output(framework::GradVarName("X")); - auto t_min = static_cast(context.Attr("t_min")); - auto t_max = static_cast(context.Attr("t_max")); - dX->mutable_data(context.GetPlace()); +template +struct BReluGradFunctor : public BaseActivationFunctor { + float t_min; + float t_max; + typename BaseActivationFunctor::AttrPair GetAttrs() { + return {{"t_min", &t_min}, {"t_max", &t_max}}; + } + template + void operator()(Device d, X x, Y y, dY dy, dX dx) const { + dx.device(d) = dy * ((x > t_min) * (x < t_max)).template cast(); + } +}; - auto dy = framework::EigenVector::Flatten(*dY); - auto x = framework::EigenVector::Flatten(*X); - auto dx = framework::EigenVector::Flatten(*dX); - auto place = context.GetEigenDevice(); +// softsign(x) = x / (1 + |x|) +template +struct SoftsignFunctor : public BaseActivationFunctor { + template + void operator()(Device d, X x, Y y) { + y.device(d) = x / (static_cast(1) + x.abs()); + } +}; - dx.device(place) = dy * ((x > t_min) * (x < t_max)).template cast(); 
+// d(softsign(x))/dx = 1 / (1 + |x|)^2 +// Taken from https://en.wikipedia.org/wiki/Activation_function +template +struct SoftsignGradFunctor : public BaseActivationFunctor { + template + void operator()(Device d, X x, Y y, dY dy, dX dx) { + dx.device(d) = + dy * (static_cast(1) / (static_cast(1) + x.abs()).square()); } }; -template -class SoftReluKernel : public framework::OpKernel { - public: - void Compute(const framework::ExecutionContext& context) const override { - auto* X = context.Input("X"); - auto* Y = context.Output("Y"); - auto threshold = static_cast(context.Attr("threshold")); - Y->mutable_data(context.GetPlace()); +template +struct SoftReluFunctor : public BaseActivationFunctor { + float threshold; + typename BaseActivationFunctor::AttrPair GetAttrs() { + return {{"threshold", &threshold}}; + } - auto x = framework::EigenVector::Flatten(*X); - auto y = framework::EigenVector::Flatten(*Y); - auto place = context.GetEigenDevice(); - auto temp = x.cwiseMax(-threshold).cwiseMin(threshold).eval(); - y.device(place) = (static_cast(1) + temp.exp()).log(); + template + void operator()(Device d, X x, Y y) const { + auto temp = x.cwiseMax(-threshold).cwiseMin(threshold); + y.device(d) = (static_cast(1) + temp.exp()).log(); } }; -template -class SoftReluGradKernel : public framework::OpKernel { - public: - void Compute(const framework::ExecutionContext& context) const override { - auto* X = context.Input("X"); - auto* Y = context.Input("Y"); - auto* dY = context.Input(framework::GradVarName("Y")); - auto* dX = context.Output(framework::GradVarName("X")); - auto threshold = static_cast(context.Attr("threshold")); - dX->mutable_data(context.GetPlace()); - - auto x = framework::EigenVector::Flatten(*X); - auto y = framework::EigenVector::Flatten(*Y); - auto dy = framework::EigenVector::Flatten(*dY); - auto dx = framework::EigenVector::Flatten(*dX); - auto place = context.GetEigenDevice(); +template +struct SoftReluGradFunctor : public BaseActivationFunctor { + 
float threshold; + typename BaseActivationFunctor::AttrPair GetAttrs() { + return {{"threshold", &threshold}}; + } + template + void operator()(Device d, X x, Y y, dY dy, dX dx) const { auto temp = ((x > -threshold) * (x < threshold)).template cast().eval(); - dx.device(place) = dy * (static_cast(1) - (-y).exp()) * temp; + dx.device(d) = dy * (static_cast(1) - (-y).exp()) * temp; } }; -template -class PowKernel : public framework::OpKernel { - public: - void Compute(const framework::ExecutionContext& context) const override { - auto* X = context.Input("X"); - auto* Y = context.Output("Y"); - auto factor = static_cast(context.Attr("factor")); - Y->mutable_data(context.GetPlace()); - - auto x = framework::EigenVector::Flatten(*X); - auto y = framework::EigenVector::Flatten(*Y); - auto place = context.GetEigenDevice(); - y.device(place) = x.pow(factor); +template +struct PowFunctor : public BaseActivationFunctor { + float factor; + typename BaseActivationFunctor::AttrPair GetAttrs() { + return {{"factor", &factor}}; + } + template + void operator()(Device d, X x, Y y) const { + y.device(d) = x.pow(factor); } }; -template -class PowGradKernel : public framework::OpKernel { - public: - void Compute(const framework::ExecutionContext& context) const override { - auto* X = context.Input("X"); - auto* dY = context.Input(framework::GradVarName("Y")); - auto* dX = context.Output(framework::GradVarName("X")); - auto factor = static_cast(context.Attr("factor")); - dX->mutable_data(context.GetPlace()); - - auto dy = framework::EigenVector::Flatten(*dY); - auto x = framework::EigenVector::Flatten(*X); - auto dx = framework::EigenVector::Flatten(*dX); - auto place = context.GetEigenDevice(); - - dx.device(place) = dy * factor * x.pow(factor - static_cast(1)); +template +struct PowGradFunctor : public BaseActivationFunctor { + float factor; + typename BaseActivationFunctor::AttrPair GetAttrs() { + return {{"factor", &factor}}; + } + template + void operator()(Device d, X x, Y y, dY 
dy, dX dx) const { + dx.device(d) = dy * factor * x.pow(factor - static_cast(1)); } }; -template -class STanhKernel : public framework::OpKernel { - public: - void Compute(const framework::ExecutionContext& context) const override { - auto* X = context.Input("X"); - auto* Y = context.Output("Y"); - auto scale_a = static_cast(context.Attr("scale_a")); - auto scale_b = static_cast(context.Attr("scale_b")); - Y->mutable_data(context.GetPlace()); +template +struct STanhFunctor : public BaseActivationFunctor { + float scale_a; + float scale_b; + typename BaseActivationFunctor::AttrPair GetAttrs() { + return {{"scale_a", &scale_a}, {"scale_b", &scale_b}}; + } - auto x = framework::EigenVector::Flatten(*X); - auto y = framework::EigenVector::Flatten(*Y); - auto place = context.GetEigenDevice(); - y.device(place) = scale_b * (scale_a * x).tanh(); + template + void operator()(Device d, X x, Y y) const { + y.device(d) = scale_b * (scale_a * x).tanh(); } }; -template -class STanhGradKernel : public framework::OpKernel { - public: - void Compute(const framework::ExecutionContext& context) const override { - auto* X = context.Input("X"); - auto* dY = context.Input(framework::GradVarName("Y")); - auto* dX = context.Output(framework::GradVarName("X")); - auto scale_a = static_cast(context.Attr("scale_a")); - auto scale_b = static_cast(context.Attr("scale_b")); - dX->mutable_data(context.GetPlace()); - - auto dy = framework::EigenVector::Flatten(*dY); - auto x = framework::EigenVector::Flatten(*X); - auto dx = framework::EigenVector::Flatten(*dX); - auto place = context.GetEigenDevice(); +template +struct STanhGradFunctor : public BaseActivationFunctor { + float scale_a; + float scale_b; + typename BaseActivationFunctor::AttrPair GetAttrs() { + return {{"scale_a", &scale_a}, {"scale_b", &scale_b}}; + } + template + void operator()(Device d, X x, Y y, dY dy, dX dx) const { auto temp = (scale_a * x).tanh() * (scale_a * x).tanh(); - dx.device(place) = dy * scale_a * scale_b * 
(static_cast(1) - temp);
+    dx.device(d) = dy * scale_a * scale_b * (static_cast(1) - temp);
   }
 };
 }  // namespace operators
 }  // namespace paddle
+
+#define FOR_EACH_KERNEL_FUNCTOR(__macro) \
+  __macro(sigmoid, SigmoidFunctor, SigmoidGradFunctor); \
+  __macro(exp, ExpFunctor, ExpGradFunctor); \
+  __macro(relu, ReluFunctor, ReluGradFunctor); \
+  __macro(tanh, TanhFunctor, TanhGradFunctor); \
+  __macro(sqrt, SqrtFunctor, SqrtGradFunctor); \
+  __macro(abs, AbsFunctor, AbsGradFunctor); \
+  __macro(reciprocal, ReciprocalFunctor, ReciprocalGradFunctor); \
+  __macro(log, LogFunctor, LogGradFunctor); \
+  __macro(square, SquareFunctor, SquareGradFunctor); \
+  __macro(brelu, BReluFunctor, BReluGradFunctor); \
+  __macro(soft_relu, SoftReluFunctor, SoftReluGradFunctor); \
+  __macro(pow, PowFunctor, PowGradFunctor); \
+  __macro(stanh, STanhFunctor, STanhGradFunctor); \
+  __macro(softsign, SoftsignFunctor, SoftsignGradFunctor)
diff --git a/paddle/operators/add_op.h b/paddle/operators/add_op.h
index a7307b6818aa3d10ff215d06281e2b53196fd101..75163032a1ff11a1f18cfd0a4ff7289ff0cb66bf 100644
--- a/paddle/operators/add_op.h
+++ b/paddle/operators/add_op.h
@@ -25,7 +25,7 @@ template ;

 template
-class AddKernel : public framework::OpKernel {
+class AddKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& context) const override {
     auto* input0 = context.Input("X");
diff --git a/paddle/operators/clip_op.h b/paddle/operators/clip_op.h
index ce1d4e1f460414e6e4acee4fa3207f309c55d86b..ac702e9935201ba5263a80ebeb1ab22fa0bd1340 100644
--- a/paddle/operators/clip_op.h
+++ b/paddle/operators/clip_op.h
@@ -56,7 +56,7 @@ class ClipGradFunctor {
 };

 template
-class ClipKernel : public framework::OpKernel {
+class ClipKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& context) const override {
     auto max = context.Attr("max");
@@ -73,7 +73,7 @@ class ClipKernel : public framework::OpKernel {
 };

 template
-class ClipGradKernel : public framework::OpKernel {
+class ClipGradKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& context) const override {
     auto max = context.Attr("max");
diff --git a/paddle/operators/concat_op.h b/paddle/operators/concat_op.h
index b37063261123bce1f22c39ab021e88f2faf58e9f..c113f19fb5cf806709bff845ee0f1078b34014bb 100644
--- a/paddle/operators/concat_op.h
+++ b/paddle/operators/concat_op.h
@@ -22,7 +22,7 @@ namespace paddle {
 namespace operators {

 template
-class ConcatKernel : public framework::OpKernel {
+class ConcatKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& ctx) const override {
     auto ins = ctx.MultiInput("X");
@@ -44,7 +44,7 @@ class ConcatKernel : public framework::OpKernel {
 };

 template
-class ConcatGradKernel : public framework::OpKernel {
+class ConcatGradKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& ctx) const {
     auto* in = ctx.Input(framework::GradVarName("Out"));
diff --git a/paddle/operators/cos_sim_op.h b/paddle/operators/cos_sim_op.h
index bcf6f758cae561a2e22f5be6c7a242647ef1c144..68c56f531f941e1b8f66ac7ba6bf318881642c4f 100644
--- a/paddle/operators/cos_sim_op.h
+++ b/paddle/operators/cos_sim_op.h
@@ -28,7 +28,7 @@ template ;

 template
-class CosSimKernel : public framework::OpKernel {
+class CosSimKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& context) const override {
     // get Tensor
@@ -67,7 +67,7 @@ class CosSimKernel : public framework::OpKernel {
 };

 template
-class CosSimGradKernel : public framework::OpKernel {
+class CosSimGradKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& context) const override {
     // get Tensor
diff --git a/paddle/operators/crop_op.h b/paddle/operators/crop_op.h
index ac3aeaf41e206c1deb74c7022c36f02c4777a84b..2e72583d68d0acf0e2f5044637dba55de3b57209 100644
--- a/paddle/operators/crop_op.h
+++ b/paddle/operators/crop_op.h
@@ -27,7 +27,7 @@ using EigenTensor = framework::EigenTensor;
 using framework::Tensor;

 template
-class CropKernel : public framework::OpKernel {
+class CropKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& context) const override {
     auto* x = context.Input("X");
@@ -69,7 +69,7 @@ void CropGradFunction(const framework::ExecutionContext& context) {
 }

 template
-class CropGradKernel : public framework::OpKernel {
+class CropGradKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& context) const override {
     size_t rank =
diff --git a/paddle/operators/cross_entropy_op.cc b/paddle/operators/cross_entropy_op.cc
index 26fc9b51c44d21d92851030449e116538f937846..4b67887f3638f32a89d1a4fd1316c0596b444629 100644
--- a/paddle/operators/cross_entropy_op.cc
+++ b/paddle/operators/cross_entropy_op.cc
@@ -47,6 +47,12 @@ class CrossEntropyOp : public framework::OperatorWithKernel {
     ctx->SetOutputDim("Y", {x_dims[0], 1});
     ctx->ShareLoD("X", /*->*/ "Y");
   }
+
+  // CrossEntropy's data type just determined by "X"
+  framework::DataType IndicateDataType(
+      const framework::ExecutionContext& ctx) const override {
+    return framework::ToDataType(ctx.Input("X")->type());
+  }
 };

 class CrossEntropyGradientOp : public framework::OperatorWithKernel {
@@ -87,6 +93,12 @@ class CrossEntropyGradientOp : public framework::OperatorWithKernel {
     }
     ctx->SetOutputDim(framework::GradVarName("X"), x_dims);
   }
+
+  // CrossEntropy's data type just determined by "X"
+  framework::DataType IndicateDataType(
+      const framework::ExecutionContext& ctx) const override {
+    return framework::ToDataType(ctx.Input("X")->type());
+  }
 };

 class CrossEntropyOpMaker : public framework::OpProtoAndCheckerMaker {
diff --git a/paddle/operators/cross_entropy_op.cu b/paddle/operators/cross_entropy_op.cu
index 1cfeb7a53b047541322ac53c5b7249e660039d5c..5e2024e0ea9040b758e1cec4dbaa4b329bbb727e 100644
--- a/paddle/operators/cross_entropy_op.cu
+++ b/paddle/operators/cross_entropy_op.cu
@@ -18,14 +18,6 @@ namespace paddle {
 namespace operators {
 namespace {

-// TODO(qingqing): make zero setting a common function.
-template
-__global__ void Zero(T* X, const int N) {
-  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < N;
-       i += blockDim.x * gridDim.x) {
-    X[i] = 0.0;
-  }
-}

 template
 __global__ void CrossEntropyGradientKernel(T* dX, const T* dY, const T* X,
@@ -53,7 +45,7 @@ __global__ void SoftCrossEntropyGradientKernel(T* dX, const T* dY, const T* X,
 }  // namespace

 template
-class CrossEntropyOpCUDAKernel : public framework::OpKernel {
+class CrossEntropyOpCUDAKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& ctx) const override {
     PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
@@ -64,12 +56,12 @@ class CrossEntropyOpCUDAKernel : public framework::OpKernel {
     y->mutable_data(ctx.GetPlace());

     math::CrossEntropyFunctor()(
-        ctx, y, x, label, ctx.Attr("softLabel"));
+        ctx.device_context(), y, x, label, ctx.Attr("softLabel"));
   }
 };

 template
-class CrossEntropyGradientOpCUDAKernel : public framework::OpKernel {
+class CrossEntropyGradientOpCUDAKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& ctx) const override {
     PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
@@ -99,11 +91,7 @@ class CrossEntropyGradientOpCUDAKernel : public framework::OpKernel {
               .stream()>>>(dx_data, dy_data, x_data, label_data,
                            batch_size, class_num);
     } else {
-      Zero<<(
-          ctx.device_context())
-          .stream()>>>(dx_data, batch_size * class_num);
-
+      math::SetConstant(ctx.device_context(), dx, 0);
       auto* label_data = label->data();
       grid = (batch_size + block - 1) / block;
       CrossEntropyGradientKernel<<<
diff --git a/paddle/operators/cross_entropy_op.h b/paddle/operators/cross_entropy_op.h
index 1f67461d3fadb1a979832ad049d4e0098256b834..d2d321aa7ed8e32cc19d5a171beea34d36195b10 100644
--- a/paddle/operators/cross_entropy_op.h
+++ b/paddle/operators/cross_entropy_op.h
@@ -16,6 +16,7 @@ limitations under the License. */
 #include "paddle/framework/eigen.h"
 #include "paddle/framework/op_registry.h"
 #include "paddle/operators/math/cross_entropy.h"
+#include "paddle/operators/math/math_function.h"

 namespace paddle {
 namespace operators {
@@ -26,7 +27,7 @@ template ;

 template
-class CrossEntropyOpKernel : public framework::OpKernel {
+class CrossEntropyOpKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& ctx) const override {
     PADDLE_ENFORCE(platform::is_cpu_place(ctx.GetPlace()),
@@ -37,12 +38,12 @@ class CrossEntropyOpKernel : public framework::OpKernel {
     y->mutable_data(ctx.GetPlace());

     math::CrossEntropyFunctor()(
-        ctx, y, x, labels, ctx.Attr("softLabel"));
+        ctx.device_context(), y, x, labels, ctx.Attr("softLabel"));
   }
 };

 template
-class CrossEntropyGradientOpKernel : public framework::OpKernel {
+class CrossEntropyGradientOpKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& ctx) const override {
     PADDLE_ENFORCE(platform::is_cpu_place(ctx.GetPlace()),
@@ -69,8 +70,7 @@ class CrossEntropyGradientOpKernel : public framework::OpKernel {
       const T* x_data = x->data();
       const int* label_data = label->data();

-      // TODO(qingqing): make zero setting a common function.
-      memset(dx_data, 0, sizeof(T) * batch_size * class_num);
+      math::SetConstant(ctx.device_context(), dx, 0);

       for (int i = 0; i < batch_size; ++i) {
         PADDLE_ASSERT(label_data[i] >= 0 || label_data[i] < class_num);
diff --git a/paddle/operators/dropout_op.cu b/paddle/operators/dropout_op.cu
index a04e4a22cc09d4e8106a528e490ccf8e90681c08..30c769000f2b98c69eaa78a4c139630dd0956386 100644
--- a/paddle/operators/dropout_op.cu
+++ b/paddle/operators/dropout_op.cu
@@ -47,7 +47,7 @@ struct MaskGenerator {
 // Use std::random and thrust::random(thrust is a std library in CUDA) to
 // implement uniform random.
 template
-class GPUDropoutKernel : public framework::OpKernel {
+class GPUDropoutKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& context) const override {
     auto* x = context.Input("X");
diff --git a/paddle/operators/dropout_op.h b/paddle/operators/dropout_op.h
index d57f64afcb3558aeea6aed23fae06866e9af874a..745525fe81dadb22cbb64d66203f5a75608d3718 100644
--- a/paddle/operators/dropout_op.h
+++ b/paddle/operators/dropout_op.h
@@ -26,7 +26,7 @@ template ;

 template
-class CPUDropoutKernel : public framework::OpKernel {
+class CPUDropoutKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& context) const override {
     auto* x = context.Input("X");
@@ -62,7 +62,7 @@ class CPUDropoutKernel : public framework::OpKernel {
 };

 template
-class DropoutGradKernel : public framework::OpKernel {
+class DropoutGradKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& context) const override {
     PADDLE_ENFORCE(context.Attr("is_training"),
diff --git a/paddle/operators/elementwise_add_op.h b/paddle/operators/elementwise_add_op.h
index e9f78ef26e05878053d968c35f17b456c128827a..f04fe3ec6069ab1bf227be6a3a5c10ee908e4824 100644
--- a/paddle/operators/elementwise_add_op.h
+++ b/paddle/operators/elementwise_add_op.h
@@ -20,7 +20,7 @@ namespace paddle {
 namespace operators {

 template
-class ElementwiseAddKernel : public framework::OpKernel {
+class ElementwiseAddKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& ctx) const override {
     ElementwiseCompute(ctx);
@@ -101,7 +101,7 @@ struct ElementwiseAddBroadCast2GradFunctor {
 };

 template
-class ElementwiseAddGradKernel : public framework::OpKernel {
+class ElementwiseAddGradKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& ctx) const override {
     ElementwiseGradCompute,
diff --git a/paddle/operators/elementwise_div_op.h b/paddle/operators/elementwise_div_op.h
index 99b6d9c1991edfb0018f8a459dfa373948cec434..8946ff3d25c2aff3dc3aa69368f0083371cd2fef 100644
--- a/paddle/operators/elementwise_div_op.h
+++ b/paddle/operators/elementwise_div_op.h
@@ -20,7 +20,7 @@ namespace paddle {
 namespace operators {

 template
-class ElementwiseDivKernel : public framework::OpKernel {
+class ElementwiseDivKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& ctx) const override {
     ElementwiseCompute(ctx);
@@ -103,7 +103,7 @@ struct ElementwiseDivBroadCast2GradFunctor {
 };

 template
-class ElementwiseDivGradKernel : public framework::OpKernel {
+class ElementwiseDivGradKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& ctx) const override {
     ElementwiseGradCompute,
diff --git a/paddle/operators/elementwise_mul_op.cc b/paddle/operators/elementwise_mul_op.cc
index bda5dfe03e974740fe4a07191ae6b68ebfcd5d3a..da7765aa6a7a81c9e0b4f462022cad54c16aec47 100644
--- a/paddle/operators/elementwise_mul_op.cc
+++ b/paddle/operators/elementwise_mul_op.cc
@@ -36,7 +36,9 @@ REGISTER_OP(elementwise_mul, ops::ElementwiseOp, ops::ElementwiseMulOpMaker,
             elementwise_mul_grad, ops::ElementwiseOpGrad);
 REGISTER_OP_CPU_KERNEL(
     elementwise_mul,
-    ops::ElementwiseMulKernel);
+    ops::ElementwiseMulKernel,
+    ops::ElementwiseMulKernel);
 REGISTER_OP_CPU_KERNEL(
     elementwise_mul_grad,
-    ops::ElementwiseMulGradKernel);
+    ops::ElementwiseMulGradKernel,
+    ops::ElementwiseMulGradKernel);
diff --git a/paddle/operators/elementwise_mul_op.cu b/paddle/operators/elementwise_mul_op.cu
index da08a75596c4d3b89dc8892bd4405464fec96389..056f081d3e6ac349978ff00689700c035bed8e39 100644
--- a/paddle/operators/elementwise_mul_op.cu
+++ b/paddle/operators/elementwise_mul_op.cu
@@ -19,7 +19,9 @@ namespace ops = paddle::operators;

 REGISTER_OP_GPU_KERNEL(
     elementwise_mul,
-    ops::ElementwiseMulKernel);
+    ops::ElementwiseMulKernel,
+    ops::ElementwiseMulKernel);
 REGISTER_OP_GPU_KERNEL(
     elementwise_mul_grad,
-    ops::ElementwiseMulGradKernel);
+    ops::ElementwiseMulGradKernel,
+    ops::ElementwiseMulGradKernel);
diff --git a/paddle/operators/elementwise_mul_op.h b/paddle/operators/elementwise_mul_op.h
index 6ab642378bb0af8593ca0677014aede3c03cff8e..4469b07eaa08a3b011a88e58f1d645dd30b10ced 100644
--- a/paddle/operators/elementwise_mul_op.h
+++ b/paddle/operators/elementwise_mul_op.h
@@ -19,7 +19,7 @@ namespace paddle {
 namespace operators {

 template
-class ElementwiseMulKernel : public framework::OpKernel {
+class ElementwiseMulKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& ctx) const override {
     ElementwiseCompute(ctx);
@@ -102,7 +102,7 @@ struct ElementwiseMulBroadCast2GradFunctor {
 };

 template
-class ElementwiseMulGradKernel : public framework::OpKernel {
+class ElementwiseMulGradKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& ctx) const override {
     ElementwiseGradCompute,
diff --git a/paddle/operators/elementwise_sub_op.h b/paddle/operators/elementwise_sub_op.h
index 3ca1376c73b3332b76a5973e201f9e4fba77cd21..3f40c1c5bcea5e8473765b039de4ee2a16054f0c 100644
--- a/paddle/operators/elementwise_sub_op.h
+++ b/paddle/operators/elementwise_sub_op.h
@@ -19,7 +19,7 @@ namespace paddle {
 namespace operators {

 template
-class ElementwiseSubKernel : public framework::OpKernel {
+class ElementwiseSubKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& ctx) const override {
     ElementwiseCompute(ctx);
@@ -102,7 +102,7 @@ struct ElementwiseSubBroadCast2GradFunctor {
 };

 template
-class ElementwiseSubGradKernel : public framework::OpKernel {
+class ElementwiseSubGradKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& ctx) const override {
     ElementwiseGradCompute,
diff --git a/paddle/operators/fc_op.cc b/paddle/operators/fc_op.cc
index 5ac0e8cc45f007d42f1b6d7f86333f5cbedb3ea8..7c422c81fc479fa2e317bdee1b66017096381d27 100644
--- a/paddle/operators/fc_op.cc
+++ b/paddle/operators/fc_op.cc
@@ -100,7 +100,7 @@ class FCOp : public NetOp {
       add_out = Output("AddOut");

       AppendOp(framework::OpRegistry::CreateOp(
-          "rowwise_add", {{"X", {sum_out}}, {"b", {Input("B")}}},
+          "elementwise_add", {{"X", {sum_out}}, {"Y", {Input("B")}}},
           {{"Out", {add_out}}}, {}));
     } else {
       if (Output("AddOut") != framework::kEmptyVarName) {
diff --git a/paddle/operators/fill_zeros_like_op.h b/paddle/operators/fill_zeros_like_op.h
index 4474581784531faee1741f0b143743e31cc3788f..cdf56a723b117fe7b08ef2749aa2c2978c923d44 100644
--- a/paddle/operators/fill_zeros_like_op.h
+++ b/paddle/operators/fill_zeros_like_op.h
@@ -20,7 +20,7 @@ namespace paddle {
 namespace operators {

 template
-class FillZerosLikeKernel : public framework::OpKernel {
+class FillZerosLikeKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& context) const override {
     auto* output = context.Output("Y");
diff --git a/paddle/operators/gather_op.cc b/paddle/operators/gather_op.cc
index 0e3cd174adee1e50d0a63861286a26d325484efb..da22bd0c52c27d7decd10e2e2b34fa38d0620da8 100644
--- a/paddle/operators/gather_op.cc
+++ b/paddle/operators/gather_op.cc
@@ -37,6 +37,11 @@ class GatherOp : public framework::OperatorWithKernel {
     output_dims[0] = batch_size;
     ctx->SetOutputDim("Out", output_dims);
   }
+
+  framework::DataType IndicateDataType(
+      const framework::ExecutionContext& ctx) const override {
+    return framework::ToDataType(ctx.Input("X")->type());
+  }
 };

 class GatherGradOp : public framework::OperatorWithKernel {
@@ -47,6 +52,11 @@ class GatherGradOp : public framework::OperatorWithKernel {
   void InferShape(framework::InferShapeContextBase* ctx) const override {
     ctx->SetOutputDim(framework::GradVarName("X"), ctx->GetInputDim("X"));
   }
+
+  framework::DataType IndicateDataType(
+      const framework::ExecutionContext& ctx) const override {
+    return framework::ToDataType(ctx.Input("X")->type());
+  }
 };

 class GatherOpMaker : public framework::OpProtoAndCheckerMaker {
diff --git a/paddle/operators/gather_op.h b/paddle/operators/gather_op.h
index 381854f301870beadb72d9e9b4eb17ff199960fb..073e566e8f6962d62cc1b738672843421dcb4ee5 100644
--- a/paddle/operators/gather_op.h
+++ b/paddle/operators/gather_op.h
@@ -24,7 +24,7 @@ namespace operators {
 using Tensor = framework::Tensor;

 template
-class GatherOpKernel : public framework::OpKernel {
+class GatherOpKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext &ctx) const override {
     auto *X = ctx.Input("X");
@@ -37,7 +37,7 @@ class GatherOpKernel : public framework::OpKernel {
 };

 template
-class GatherGradientOpKernel : public framework::OpKernel {
+class GatherGradientOpKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext &ctx) const override {
     auto *Index = ctx.Input("Index");
diff --git a/paddle/operators/gaussian_random_op.cc b/paddle/operators/gaussian_random_op.cc
index 05120a6e7bcfdb8641c722731f462c89e4223339..5cd2c7d2c066cd31e2d38a3c0d682f02339b4d59 100644
--- a/paddle/operators/gaussian_random_op.cc
+++ b/paddle/operators/gaussian_random_op.cc
@@ -16,7 +16,7 @@ namespace paddle {
 namespace operators {

 template
-class CPUGaussianRandomKernel : public framework::OpKernel {
+class CPUGaussianRandomKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& context) const override {
     float mean = context.Attr("mean");
@@ -56,6 +56,11 @@ class GaussianRandomOp : public framework::OperatorWithKernel {
                       "dims can be one int or array. dims must be set.");
     ctx->SetOutputDim("Out", framework::make_ddim(temp));
   }
+
+  framework::DataType IndicateDataType(
+      const framework::ExecutionContext& ctx) const override {
+    return static_cast(Attr("data_type"));
+  }
 };

 class GaussianRandomOpMaker : public framework::OpProtoAndCheckerMaker {
@@ -76,6 +81,8 @@ Use to initialize tensor with gaussian random generator.
                  "Random seed of generator."
                  "0 means use system wide seed")
         .SetDefault(0);
+    AddAttr("data_type", "output data type")
+        .SetDefault(framework::DataType::FP32);
   }
 };

diff --git a/paddle/operators/gaussian_random_op.cu b/paddle/operators/gaussian_random_op.cu
index 2d63b3049988cfc3135a87a57dad56b970df3eab..315560bf1ba8a66b9a3b7d79510d202885e845d6 100644
--- a/paddle/operators/gaussian_random_op.cu
+++ b/paddle/operators/gaussian_random_op.cu
@@ -37,7 +37,7 @@ struct GaussianGenerator {
 };

 template
-class GPUGaussianRandomKernel : public framework::OpKernel {
+class GPUGaussianRandomKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& context) const override {
     auto* tensor = context.Output("Out");
diff --git a/paddle/operators/gemm_conv2d_op.h b/paddle/operators/gemm_conv2d_op.h
index 5c9e81732aa72211c2021382cf9a907880c53c17..323e3f7c3bd506c6b63bf4d1152384649f5da575 100644
--- a/paddle/operators/gemm_conv2d_op.h
+++ b/paddle/operators/gemm_conv2d_op.h
@@ -25,7 +25,7 @@ namespace operators {
 using Tensor = framework::Tensor;

 template
-class GemmConv2DKernel : public framework::OpKernel {
+class GemmConv2DKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& context) const override {
     const Tensor* input = context.Input("Input");
@@ -98,7 +98,7 @@ class GemmConv2DKernel : public framework::OpKernel {
 };

 template
-class GemmConvGrad2DKernel : public framework::OpKernel {
+class GemmConvGrad2DKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& context) const override {
     const Tensor* input = context.Input("Input");
diff --git a/paddle/operators/lookup_table_op.cc b/paddle/operators/lookup_table_op.cc
index 9b1314bfbade8551d98b0fbabb7c2968d7600db5..929008fbcbe03bd6591b0a02252b343c46d00b8f 100644
--- a/paddle/operators/lookup_table_op.cc
+++ b/paddle/operators/lookup_table_op.cc
@@ -36,6 +36,11 @@ class LookupTableOp : public framework::OperatorWithKernel {
     ctx->SetOutputDim("Out", {ids_dims[0], table_dims[1]});
     ctx->ShareLoD("Ids", /*->*/ "Out");
   }
+
+  framework::DataType IndicateDataType(
+      const framework::ExecutionContext& ctx) const override {
+    return framework::ToDataType(ctx.Input("W")->type());
+  }
 };

 class LookupTableOpMaker : public framework::OpProtoAndCheckerMaker {
@@ -69,6 +74,11 @@ class LookupTableOpGrad : public framework::OperatorWithKernel {
     auto table_dims = ctx->GetInputDim("W");
     ctx->SetOutputDim(framework::GradVarName("W"), table_dims);
   }
+
+  framework::DataType IndicateDataType(
+      const framework::ExecutionContext& ctx) const override {
+    return framework::ToDataType(ctx.Input("W")->type());
+  }
 };

 }  // namespace operators
diff --git a/paddle/operators/lookup_table_op.cu b/paddle/operators/lookup_table_op.cu
index 62f63b4f3c876e084e2468001e8bcb9310d16a82..c3808fa9a8de031fcae3ac0417e8c4330b2f5aad 100644
--- a/paddle/operators/lookup_table_op.cu
+++ b/paddle/operators/lookup_table_op.cu
@@ -61,7 +61,7 @@ __global__ void LookupTableGrad(T* table, const T* output, const int32_t* ids,
 }

 template
-class LookupTableCUDAKernel : public framework::OpKernel {
+class LookupTableCUDAKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& context) const override {
     auto table_t = context.Input("W");
@@ -85,7 +85,7 @@ class LookupTableCUDAKernel : public framework::OpKernel {
 };

 template
-class LookupTableGradCUDAKernel : public framework::OpKernel {
+class LookupTableGradCUDAKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& context) const override {
     auto ids_t = context.Input("Ids");
diff --git a/paddle/operators/lookup_table_op.h b/paddle/operators/lookup_table_op.h
index a1298906dd4b4209644fe06584f70169519de01c..dfead2fc5b25b9be26bb19cd74a3a94daf62cca6 100644
--- a/paddle/operators/lookup_table_op.h
+++ b/paddle/operators/lookup_table_op.h
@@ -23,7 +23,7 @@ namespace operators {
 using Tensor = framework::Tensor;

 template
-class LookupTableKernel : public framework::OpKernel {
+class LookupTableKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& context) const override {
     auto table_t = context.Input("W");  // float tensor
@@ -44,7 +44,7 @@ class LookupTableKernel : public framework::OpKernel {
 };

 template
-class LookupTableGradKernel : public framework::OpKernel {
+class LookupTableGradKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& context) const override {
     auto ids_t = context.Input("Ids");
diff --git a/paddle/operators/lstm_unit_op.cu b/paddle/operators/lstm_unit_op.cu
index 6e5e4978994c281416a65af5f8ffdec688768d63..b1db0d53227148de53b04587b943945f8563346e 100644
--- a/paddle/operators/lstm_unit_op.cu
+++ b/paddle/operators/lstm_unit_op.cu
@@ -90,7 +90,7 @@ __global__ void LSTMUnitGradientKernel(const int nthreads, const int dim,
 }

 template
-class LstmUnitOpCUDAKernel : public framework::OpKernel {
+class LstmUnitOpCUDAKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& ctx) const override {
     PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
@@ -121,7 +121,7 @@ class LstmUnitOpCUDAKernel : public framework::OpKernel {
 };

 template
-class LstmUnitGradOpCUDAKernel : public framework::OpKernel {
+class LstmUnitGradOpCUDAKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& ctx) const override {
     PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
diff --git a/paddle/operators/lstm_unit_op.h b/paddle/operators/lstm_unit_op.h
index 683034fe15df8cabfdff5e856adb5c0467055064..0dc9a7d9a7aae2e16bc4488731f572f43778baf8 100644
--- a/paddle/operators/lstm_unit_op.h
+++ b/paddle/operators/lstm_unit_op.h
@@ -33,7 +33,7 @@ inline T tanh(T x) {
 }

 template
-class LstmUnitKernel : public framework::OpKernel {
+class LstmUnitKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& ctx) const override {
     PADDLE_ENFORCE(platform::is_cpu_place(ctx.GetPlace()),
@@ -76,7 +76,7 @@ class LstmUnitKernel : public framework::OpKernel {
 };

 template
-class LstmUnitGradKernel : public framework::OpKernel {
+class LstmUnitGradKernel : public framework::OpKernel {
  public:
   void Compute(const framework::ExecutionContext& ctx) const override {
     PADDLE_ENFORCE(platform::is_cpu_place(ctx.GetPlace()),
diff --git a/paddle/operators/math/CMakeLists.txt b/paddle/operators/math/CMakeLists.txt
index 91ae3d49f1df51d9524547f7765285bff9dbb5c5..b39d4f0ac27bf0a8378344f852a602c5ecf4cf6a 100644
--- a/paddle/operators/math/CMakeLists.txt
+++ b/paddle/operators/math/CMakeLists.txt
@@ -1,16 +1,15 @@
 if(WITH_GPU)
   nv_library(math_function SRCS math_function.cc math_function.cu im2col.cc
-    im2col.cu DEPS cblas device_context operator)
-  nv_library(softmax_function SRCS softmax.cc softmax.cu
-    DEPS operator)
-  nv_library(cross_entropy_function SRCS cross_entropy.cc cross_entropy.cu
-    DEPS operator)
+      im2col.cu DEPS cblas device_context operator)
+  nv_test(math_function_test SRCS math_function_test.cc DEPS math_function tensor)
+  nv_library(softmax SRCS softmax.cc softmax.cu DEPS operator)
+  nv_library(cross_entropy SRCS cross_entropy.cc cross_entropy.cu DEPS operator)
 else()
   cc_library(math_function SRCS math_function.cc im2col.cc
-    DEPS cblas device_context operator)
-  cc_library(softmax_function SRCS softmax.cc DEPS operator)
-  cc_library(cross_entropy_function SRCS cross_entropy.cc DEPS operator)
+      DEPS cblas device_context operator)
+  cc_test(math_function_test SRCS math_function_test.cc DEPS math_function tensor)
+  cc_library(softmax SRCS softmax.cc DEPS operator)
+  cc_library(cross_entropy SRCS cross_entropy.cc DEPS operator)
 endif()

-nv_test(math_function_test SRCS math_function_test.cc DEPS math_function tensor)
 cc_test(im2col_test SRCS im2col_test.cc DEPS math_function tensor)
diff --git a/paddle/operators/math/cross_entropy.cc b/paddle/operators/math/cross_entropy.cc
index a5a426bc7b16852e67afd790df7a91d89a458c8a..150a65f2751aaeac17f9403404d2efd990a0c72b 100644
--- a/paddle/operators/math/cross_entropy.cc
+++ b/paddle/operators/math/cross_entropy.cc
@@ -26,8 +26,8 @@ using EigenMatrix = framework::EigenMatrix;
 template
 class CrossEntropyFunctor {
  public:
-  void operator()(const framework::ExecutionContext& ctx,
-                  framework::Tensor* out, const framework::Tensor* prob,
+  void operator()(const platform::DeviceContext& ctx, framework::Tensor* out,
+                  const framework::Tensor* prob,
                   const framework::Tensor* labels, const bool softLabel) {
     const int batch_size = prob->dims()[0];
     if (softLabel) {
@@ -35,7 +35,7 @@ class CrossEntropyFunctor {
       auto lbl = EigenMatrix::From(*labels);
       auto loss = EigenMatrix::From(*out);

-      loss.device(ctx.GetEigenDevice()) =
+      loss.device(*ctx.GetEigenDevice()) =
           -((lbl * in.log().unaryExpr(math::TolerableValue()))
                 .sum(Eigen::DSizes(1))
                 .reshape(Eigen::DSizes(batch_size, 1)));
diff --git a/paddle/operators/math/cross_entropy.cu b/paddle/operators/math/cross_entropy.cu
index d14a75a30c01deb86937a3ced43005aed4066d86..367190e6b0682ec62550e869e2f04c3a2b2cbec3 100644
--- a/paddle/operators/math/cross_entropy.cu
+++ b/paddle/operators/math/cross_entropy.cu
@@ -74,8 +74,8 @@ using Tensor = framework::Tensor;
 template
 class CrossEntropyFunctor {
  public:
-  void operator()(const framework::ExecutionContext& ctx,
-                  framework::Tensor* out, const framework::Tensor* prob,
+  void operator()(const platform::DeviceContext& ctx, framework::Tensor* out,
+                  const framework::Tensor* prob,
                   const framework::Tensor* labels, bool softLabel) {
     const T* prob_data = prob->data();
     T* loss_data = out->mutable_data(ctx.GetPlace());
@@ -87,20 +87,18 @@ class CrossEntropyFunctor {
       const T* label_data = labels->data();
       int block = class_num > 512 ? 512 : pow(2, int(std::log2(class_num)));

-      SoftCrossEntropyKernel<
-          T><<(
-          ctx.device_context())
-          .stream()>>>(loss_data, prob_data, label_data, class_num);
+      SoftCrossEntropyKernel<<<
+          batch_size, block, block * sizeof(T),
+          reinterpret_cast(ctx).stream()>>>(
+          loss_data, prob_data, label_data, class_num);
     } else {
       const int* label_data = labels->data();
       int block = 512;
       int grid = (batch_size + block - 1) / block;
       CrossEntropyKernel<<<
-          grid, block, 0, reinterpret_cast(
-                              ctx.device_context())
-                              .stream()>>>(loss_data, prob_data, label_data,
-                                           batch_size, class_num);
+          grid, block, 0,
+          reinterpret_cast(ctx).stream()>>>(
+          loss_data, prob_data, label_data, batch_size, class_num);
     }
   }
 };
diff --git a/paddle/operators/math/cross_entropy.h b/paddle/operators/math/cross_entropy.h
index 18e637cf9186b5dc21e94f1ab15b3d858ec93c67..0ab6827ffa8f8b90b432a801607a97206e010cf4 100644
--- a/paddle/operators/math/cross_entropy.h
+++ b/paddle/operators/math/cross_entropy.h
@@ -37,9 +37,7 @@ struct TolerableValue {
 template
 class CrossEntropyFunctor {
  public:
-  // (TODO caoying) it is much better to use DeviceContext as the first
-  // parameter.
-  void operator()(const framework::ExecutionContext& context,
+  void operator()(const platform::DeviceContext& context,
                   framework::Tensor* out, const framework::Tensor* prob,
                   const framework::Tensor* labels, const bool softLabel);
 };
diff --git a/paddle/operators/math/math_function.h b/paddle/operators/math/math_function.h
index 43306fca73387b7b212f556a2b187df113a1b327..473eff4d198ca9b17b6af8eebd6dfe39d49d138d 100644
--- a/paddle/operators/math/math_function.h
+++ b/paddle/operators/math/math_function.h
@@ -52,6 +52,7 @@ int LAPACKE_dgetri(int matrix_layout, int n, double* a, int lda,

 #include

+#include "paddle/framework/eigen.h"
 #include "paddle/framework/tensor.h"
 #include "paddle/platform/device_context.h"
 #include "paddle/platform/enforce.h"
@@ -84,6 +85,13 @@ void matmul(const platform::DeviceContext& context,
             const framework::Tensor& matrix_b, bool trans_b, T alpha,
             framework::Tensor* matrix_out, T beta);

+template
+void SetConstant(const platform::DeviceContext& context,
+                 framework::Tensor* tensor, T num) {
+  auto t = framework::EigenVector::Flatten(*tensor);
+  t.device(*context.GetEigenDevice()) = t.constant(static_cast(num));
+}
+
 }  // namespace math
 }  // namespace operators
 }  // namespace paddle
diff --git a/paddle/operators/math/math_function_test.cc b/paddle/operators/math/math_function_test.cc
index f272f7e5135e7092618b8c94ee55faf1cfd8e8a5..22468a0c4a4b0aca343fe766c8c9d63393a338eb 100644
--- a/paddle/operators/math/math_function_test.cc
+++ b/paddle/operators/math/math_function_test.cc
@@ -243,3 +243,24 @@ TEST(math_function, gemm_trans_clbas) {
   EXPECT_EQ(input3_ptr[6], 86);
   EXPECT_EQ(input3_ptr[7], 99);
 }
+
+TEST(math_function, zero) {
+  paddle::framework::Tensor tensor;
+  auto* cpu_place = new paddle::platform::CPUPlace();
+  float* t = tensor.mutable_data({2, 2}, *cpu_place);
+  paddle::platform::CPUDeviceContext context(*cpu_place);
+  paddle::operators::math::SetConstant(
+      context, &tensor, 0);
+  EXPECT_EQ(t[0], 0);
+  EXPECT_EQ(t[1], 0);
+  EXPECT_EQ(t[2], 0);
+  EXPECT_EQ(t[3], 0);
+
+  paddle::operators::math::SetConstant(
+      context, &tensor, 1);
+
+  EXPECT_EQ(t[0], 1);
+  EXPECT_EQ(t[1], 1);
+  EXPECT_EQ(t[2], 1);
+  EXPECT_EQ(t[3], 1);
+}
diff --git a/paddle/operators/math/softmax.cc b/paddle/operators/math/softmax.cc
index ac9f3c4bf61bf8e13faa17387f1112756db9a100..0ba8197ab8b64649c8adcf67771ba01eca7f1d10 100644
--- a/paddle/operators/math/softmax.cc
+++ b/paddle/operators/math/softmax.cc
@@ -1,16 +1,16 @@
 /* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.

-   Licensed under the Apache License, Version 2.0 (the "License");
-   you may not use this file except in compliance with the License.
-   You may obtain a copy of the License at
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at

-       http://www.apache.org/licenses/LICENSE-2.0
+    http://www.apache.org/licenses/LICENSE-2.0

-   Unless required by applicable law or agreed to in writing, software
-   distributed under the License is distributed on an "AS IS" BASIS,
-   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-   See the License for the specific language governing permissions and
-   limitations under the License. */
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License. */

 #include "paddle/operators/math/softmax.h"

@@ -19,6 +19,7 @@ namespace operators {
 namespace math {

 template class SoftmaxFunctor;
+template class SoftmaxGradFunctor;

 }  // namespace math
 }  // namespace operators
diff --git a/paddle/operators/math/softmax.cu b/paddle/operators/math/softmax.cu
index 4c3df0550e7ca6f4310db1d35cc34d5c73a2dd16..99f988d51e4b16c3f3bfd9c76b411bb53619603e 100644
--- a/paddle/operators/math/softmax.cu
+++ b/paddle/operators/math/softmax.cu
@@ -1,16 +1,16 @@
 /* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.

-   Licensed under the Apache License, Version 2.0 (the "License");
-   you may not use this file except in compliance with the License.
-   You may obtain a copy of the License at
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at

-       http://www.apache.org/licenses/LICENSE-2.0
+    http://www.apache.org/licenses/LICENSE-2.0

-   Unless required by applicable law or agreed to in writing, software
-   distributed under the License is distributed on an "AS IS" BASIS,
-   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-   See the License for the specific language governing permissions and
-   limitations under the License. */
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License. */

 #define EIGEN_USE_GPU

@@ -21,6 +21,7 @@ namespace operators {
 namespace math {

 template class SoftmaxFunctor;
+template class SoftmaxGradFunctor;

 }  // namespace math
 }  // namespace operators
diff --git a/paddle/operators/math/softmax.h b/paddle/operators/math/softmax.h
index 3d2f0d0aecffcd0fe51166c3d863aa8b91bba196..b7f627eee7f8fe68a83595a3390a55d438c97afb 100644
--- a/paddle/operators/math/softmax.h
+++ b/paddle/operators/math/softmax.h
@@ -1,16 +1,16 @@
 /* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.

-   Licensed under the Apache License, Version 2.0 (the "License");
-   you may not use this file except in compliance with the License.
-   You may obtain a copy of the License at
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at

-       http://www.apache.org/licenses/LICENSE-2.0
+    http://www.apache.org/licenses/LICENSE-2.0

-   Unless required by applicable law or agreed to in writing, software
-   distributed under the License is distributed on an "AS IS" BASIS,
-   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-   See the License for the specific language governing permissions and
-   limitations under the License. */
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
*/ #pragma once #include "paddle/framework/eigen.h" @@ -36,7 +36,7 @@ struct ValueClip { template class SoftmaxFunctor { public: - void operator()(const framework::ExecutionContext& context, + void operator()(const platform::DeviceContext& context, const framework::Tensor* X, framework::Tensor* Y) { auto logits = EigenMatrix::From(*X); auto softmax = EigenMatrix::From(*Y); @@ -58,8 +58,8 @@ class SoftmaxFunctor { .broadcast(one_by_class)) .unaryExpr(ValueClip()); - softmax.device(context.GetEigenDevice()) = shifted_logits.exp(); - softmax.device(context.GetEigenDevice()) = + softmax.device(*context.GetEigenDevice()) = shifted_logits.exp(); + softmax.device(*context.GetEigenDevice()) = (softmax * softmax.sum(along_class) .inverse() @@ -68,6 +68,37 @@ class SoftmaxFunctor { .broadcast(one_by_class)); } }; + +template +class SoftmaxGradFunctor { + public: + void operator()(const platform::DeviceContext& context, + const framework::Tensor* y, const framework::Tensor* y_grad, + framework::Tensor* x_grad) { + auto softmax = EigenMatrix::From(*y); + auto softmax_grad = EigenMatrix::From(*y_grad); + auto logits_grad = EigenMatrix::From(*x_grad); + + const int kBatchDim = 0; + const int kClassDim = 1; + + const int batch_size = softmax.dimension(kBatchDim); + const int num_classes = softmax.dimension(kClassDim); + + Eigen::DSizes along_class(kClassDim); + Eigen::DSizes batch_by_one(batch_size, 1); + Eigen::DSizes one_by_class(1, num_classes); + + auto dot = (softmax * softmax_grad) + .sum(along_class) + .eval() + .reshape(batch_by_one) + .broadcast(one_by_class); + logits_grad.device(*context.GetEigenDevice()) = + (softmax_grad - dot) * softmax; + } +}; + } // namespace math } // namespace operators } // namespace paddle diff --git a/paddle/operators/mean_op.h b/paddle/operators/mean_op.h index ce31e178d8e375dc59be80a6c05133201308da70..c99286a5b928f1edcd845b01b21b95654c25db07 100644 --- a/paddle/operators/mean_op.h +++ b/paddle/operators/mean_op.h @@ -28,7 +28,7 @@ template 
; template -class MeanKernel : public framework::OpKernel { +class MeanKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { auto* input = context.Input("X"); @@ -45,7 +45,7 @@ class MeanKernel : public framework::OpKernel { }; template -class MeanGradKernel : public framework::OpKernel { +class MeanGradKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { auto OG = context.Input(framework::GradVarName("Out")); diff --git a/paddle/operators/minus_op.h b/paddle/operators/minus_op.h index 6310a4fd5141516cff4fc7acbe1d17913a1b5506..bd9a2790aa2b208c2d3dfc792031283eb6c42397 100644 --- a/paddle/operators/minus_op.h +++ b/paddle/operators/minus_op.h @@ -20,7 +20,7 @@ namespace paddle { namespace operators { template -class MinusKernel : public framework::OpKernel { +class MinusKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { auto* left_tensor = context.Input("X"); diff --git a/paddle/operators/modified_huber_loss_op.cu b/paddle/operators/modified_huber_loss_op.cu index bce760f95e72cfec05b07591e0fa1250168b112f..8854e166cd99ce914d7f9f9bcead3234b0649506 100644 --- a/paddle/operators/modified_huber_loss_op.cu +++ b/paddle/operators/modified_huber_loss_op.cu @@ -39,7 +39,7 @@ struct ModifiedHuberLossBackward { }; template -class ModifiedHuberLossGradGPUKernel : public framework::OpKernel { +class ModifiedHuberLossGradGPUKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { auto* in0 = context.Input("Y"); diff --git a/paddle/operators/modified_huber_loss_op.h b/paddle/operators/modified_huber_loss_op.h index cb51007749e3c59572d4852959f4119ac377decc..aba75efad9c19e3e113b4f09bc1fbd4732f4e187 100644 --- a/paddle/operators/modified_huber_loss_op.h +++ b/paddle/operators/modified_huber_loss_op.h @@ -47,7 +47,7 @@ struct 
ModifiedHuberLossForward { }; template -class ModifiedHuberLossKernel : public framework::OpKernel { +class ModifiedHuberLossKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { auto* in0 = context.Input("X"); @@ -73,7 +73,7 @@ class ModifiedHuberLossKernel : public framework::OpKernel { // CPU backward kernel template -class ModifiedHuberLossGradCPUKernel : public framework::OpKernel { +class ModifiedHuberLossGradCPUKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { auto* in0 = context.Input("Y"); diff --git a/paddle/operators/mul_op.cc b/paddle/operators/mul_op.cc index 9858c4d9c2195c7bd0e767aaa86a950e0a791443..3c8fe04d2edeccc0e0d55aa2a28d71085ccf5145 100644 --- a/paddle/operators/mul_op.cc +++ b/paddle/operators/mul_op.cc @@ -1,16 +1,16 @@ /* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve. - Licensed under the Apache License, Version 2.0 (the "License"); - you may not use this file except in compliance with the License. - You may obtain a copy of the License at +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at - http://www.apache.org/licenses/LICENSE-2.0 + http://www.apache.org/licenses/LICENSE-2.0 - Unless required by applicable law or agreed to in writing, software - distributed under the License is distributed on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - See the License for the specific language governing permissions and - limitations under the License. */ +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+See the License for the specific language governing permissions and +limitations under the License. */ #include "paddle/operators/mul_op.h" @@ -35,12 +35,14 @@ class MulOp : public framework::OperatorWithKernel { int x_num_col_dims = ctx->Attrs().Get("x_num_col_dims"); int y_num_col_dims = ctx->Attrs().Get("y_num_col_dims"); - PADDLE_ENFORCE(x_dims.size() > x_num_col_dims, - "The rank of input tensor X should be larger than " - "`mul_op`'s `x_num_col_dims`."); - PADDLE_ENFORCE(y_dims.size() > y_num_col_dims, - "The rank of input tensor Y should be larger than " - "`mul_op`'s `y_num_col_dims`."); + PADDLE_ENFORCE_GT( + x_dims.size(), x_num_col_dims, + "The input tensor X's rank of MulOp should be larger than " + "x_num_col_dims."); + PADDLE_ENFORCE_GT( + y_dims.size(), y_num_col_dims, + "The input tensor Y's rank of MulOp should be larger than " + "y_num_col_dims."); auto x_mat_dims = framework::flatten_to_2d(x_dims, x_num_col_dims); auto y_mat_dims = framework::flatten_to_2d(y_dims, y_num_col_dims); diff --git a/paddle/operators/mul_op.h b/paddle/operators/mul_op.h index ac7136a76933d1f3ead86518c65d589747227631..684b1ea0c0c8ddabc9809cc05ed985e0cc250955 100644 --- a/paddle/operators/mul_op.h +++ b/paddle/operators/mul_op.h @@ -28,7 +28,7 @@ template ; template -class MulKernel : public framework::OpKernel { +class MulKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { const Tensor* x = context.Input("X"); @@ -52,7 +52,7 @@ class MulKernel : public framework::OpKernel { }; template -class MulGradKernel : public framework::OpKernel { +class MulGradKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& ctx) const override { int x_num_col_dims = ctx.template Attr("x_num_col_dims"); diff --git a/paddle/operators/multiplex_op.cc b/paddle/operators/multiplex_op.cc index 9896d269ccc86d8fdc3bf6375e44ef5bf3e6b9c7..a069127a19a1d0ba4eaa2b3450a1c46262ace3ed 100644 --- 
a/paddle/operators/multiplex_op.cc +++ b/paddle/operators/multiplex_op.cc @@ -50,6 +50,11 @@ class MultiplexOp : public framework::OperatorWithKernel { } ctx->SetOutputDim("Out", in_dim); } + + framework::DataType IndicateDataType( + const framework::ExecutionContext& ctx) const override { + return framework::ToDataType(ctx.MultiInput("X")[0]->type()); + } }; class MultiplexOpMaker : public framework::OpProtoAndCheckerMaker { @@ -99,6 +104,11 @@ class MultiplexGradOp : public framework::OperatorWithKernel { } ctx->SetOutputsDim(framework::GradVarName("X"), d_ins); } + + framework::DataType IndicateDataType( + const framework::ExecutionContext& ctx) const override { + return framework::ToDataType(ctx.MultiInput("X")[0]->type()); + } }; } // namespace operators diff --git a/paddle/operators/multiplex_op.cu b/paddle/operators/multiplex_op.cu index 505776612e7119e568493506b113661a839e5bd1..72b1f96eafde37976b4b067b534112b17e02b807 100644 --- a/paddle/operators/multiplex_op.cu +++ b/paddle/operators/multiplex_op.cu @@ -21,7 +21,7 @@ namespace operators { using Tensor = framework::Tensor; template -class MultiplexGPUKernel : public framework::OpKernel { +class MultiplexGPUKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& ctx) const { auto ins = ctx.MultiInput("X"); @@ -51,7 +51,7 @@ class MultiplexGPUKernel : public framework::OpKernel { }; template -class MultiplexGradGPUKernel : public framework::OpKernel { +class MultiplexGradGPUKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& ctx) const { auto* d_out = ctx.Input(framework::GradVarName("Out")); diff --git a/paddle/operators/multiplex_op.h b/paddle/operators/multiplex_op.h index 637c63a34af394f5f54997c46c00a9ff00577476..ab3cafaa324a29d6f249cf1f73db92e1364eebc8 100644 --- a/paddle/operators/multiplex_op.h +++ b/paddle/operators/multiplex_op.h @@ -23,7 +23,7 @@ namespace paddle { namespace operators { template -class 
MultiplexCPUKernel : public framework::OpKernel { +class MultiplexCPUKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& ctx) const { auto ins = ctx.MultiInput("X"); @@ -48,7 +48,7 @@ class MultiplexCPUKernel : public framework::OpKernel { }; template -class MultiplexGradCPUKernel : public framework::OpKernel { +class MultiplexGradCPUKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& ctx) const { auto* d_out = ctx.Input(framework::GradVarName("Out")); diff --git a/paddle/operators/pad_op.h b/paddle/operators/pad_op.h index 2cc3b945ae5b2e2e93d8531c7f99e4c215d1d806..9534dbf54529e3b9ae2b6640d51fe291e9521927 100644 --- a/paddle/operators/pad_op.h +++ b/paddle/operators/pad_op.h @@ -47,7 +47,7 @@ void PadFunction(const framework::ExecutionContext& context) { } template -class PadKernel : public framework::OpKernel { +class PadKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { int rank = context.Input("X")->dims().size(); @@ -97,7 +97,7 @@ void PadGradFunction(const framework::ExecutionContext& context) { } template -class PadGradKernel : public framework::OpKernel { +class PadGradKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { size_t rank = diff --git a/paddle/operators/prelu_op.h b/paddle/operators/prelu_op.h index 6b78ed295cbac060d816fb3dd27a4b80145cb1ce..5ad31c2203ae6c9bf6f48bb9ecf9a714597e7da8 100644 --- a/paddle/operators/prelu_op.h +++ b/paddle/operators/prelu_op.h @@ -40,7 +40,7 @@ class PReluFunctor { }; template -class PReluKernel : public framework::OpKernel { +class PReluKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { auto* x = context.Input("X"); @@ -77,7 +77,7 @@ class PReluGradFunctor { }; template -class PReluGradKernel : public framework::OpKernel { +class 
PReluGradKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { auto* dx = context.Output(framework::GradVarName("X")); diff --git a/paddle/operators/rank_loss_op.h b/paddle/operators/rank_loss_op.h index 7df195ff47ecfd79388385eed4bd37b8c9b45979..f184d6efcb496a1d7f38540712b6c431f816482e 100644 --- a/paddle/operators/rank_loss_op.h +++ b/paddle/operators/rank_loss_op.h @@ -21,7 +21,7 @@ namespace paddle { namespace operators { template -class RankLossKernel : public framework::OpKernel { +class RankLossKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& ctx) const { auto* out_t = ctx.Output("Out"); @@ -42,7 +42,7 @@ class RankLossKernel : public framework::OpKernel { }; template -class RankLossGradKernel : public framework::OpKernel { +class RankLossGradKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& ctx) const { auto* d_left_t = diff --git a/paddle/operators/reduce_op.h b/paddle/operators/reduce_op.h index 2fbf94e34f3961a9b3140fb682a7c479f3b71f4d..ba3f3db81dc6251a063d27e597fd7e486e7b6c14 100644 --- a/paddle/operators/reduce_op.h +++ b/paddle/operators/reduce_op.h @@ -87,7 +87,7 @@ struct MaxOrMinGradFunctor { }; template -class ReduceKernel : public framework::OpKernel { +class ReduceKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { int rank = context.Input("X")->dims().size(); @@ -141,7 +141,7 @@ class ReduceKernel : public framework::OpKernel { }; template -class ReduceGradKernel : public framework::OpKernel { +class ReduceGradKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { int rank = context.Input("X")->dims().size(); diff --git a/paddle/operators/reshape_op.h b/paddle/operators/reshape_op.h index 
873acf30782d390cdca5e7e864c76e1f743f9a7c..628dfe4c0fadcfeec188d8ae5049a994e3281bc1 100644 --- a/paddle/operators/reshape_op.h +++ b/paddle/operators/reshape_op.h @@ -21,7 +21,7 @@ namespace paddle { namespace operators { template -class ReshapeKernel : public framework::OpKernel { +class ReshapeKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& ctx) const { auto* out = ctx.Output("Out"); @@ -39,7 +39,7 @@ class ReshapeKernel : public framework::OpKernel { }; template -class ReshapeGradKernel : public framework::OpKernel { +class ReshapeGradKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& ctx) const { auto* d_out = ctx.Input(framework::GradVarName("Out")); diff --git a/paddle/operators/rowwise_add_op.cc b/paddle/operators/rowwise_add_op.cc deleted file mode 100644 index 1fcf0959dffd6a68d97dec4e2b5b509d06c0d09c..0000000000000000000000000000000000000000 --- a/paddle/operators/rowwise_add_op.cc +++ /dev/null @@ -1,109 +0,0 @@ -/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve. - - Licensed under the Apache License, Version 2.0 (the "License"); - you may not use this file except in compliance with the License. - You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, software - distributed under the License is distributed on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - See the License for the specific language governing permissions and - limitations under the License. 
*/ - -#include "paddle/operators/rowwise_add_op.h" - -namespace paddle { -namespace operators { - -using framework::Tensor; - -class RowwiseAddOp : public framework::OperatorWithKernel { - public: - using framework::OperatorWithKernel::OperatorWithKernel; - - protected: - void InferShape(framework::InferShapeContextBase* ctx) const override { - PADDLE_ENFORCE(ctx->HasInput("X"), - "Input(X) of RowwiseAddOp should not be null."); - PADDLE_ENFORCE(ctx->HasInput("b"), - "Input(b) of RowwiseAddOp should not be null."); - PADDLE_ENFORCE(ctx->HasOutput("Out"), - "Output(Out) of RowwiseAddOp should not be null."); - - auto x_dims = ctx->GetInputDim("X"); - auto b_dims = ctx->GetInputDim("b"); - PADDLE_ENFORCE_GT( - x_dims.size(), b_dims.size(), - "The rank of input `X` must be larger than the one of input `b`."); - - int num_col_dims = x_dims.size() - b_dims.size(); - - PADDLE_ENFORCE_EQ( - framework::slice_ddim(x_dims, num_col_dims, x_dims.size()), b_dims, - "The width of two operands must be same"); - PADDLE_ENFORCE_EQ(ctx->Outputs("Out").size(), 1, - "The output size must be 1"); - ctx->SetOutputDim("Out", x_dims); - ctx->ShareLoD("X", /*->*/ "Out"); - } -}; - -class RowwiseAddOpMaker : public framework::OpProtoAndCheckerMaker { - public: - RowwiseAddOpMaker(framework::OpProto* proto, - framework::OpAttrChecker* op_checker) - : OpProtoAndCheckerMaker(proto, op_checker) { - AddInput("X", "The left input of row-wise add op, must be matrix"); - AddInput("b", "The right input of row-wise add op, must be vector"); - AddOutput("Out", "The output of row-wise add op"); - AddComment(R"DOC(Row-wise Add operator - -for i in xrange(X.shape[0]): - Out = X[i] + b -)DOC"); - } -}; -class RowwiseAddGradOp : public framework::OperatorWithKernel { - public: - using framework::OperatorWithKernel::OperatorWithKernel; - - protected: - void InferShape(framework::InferShapeContextBase* ctx) const override { - PADDLE_ENFORCE(ctx->HasInput("X"), "X should not be null"); - 
PADDLE_ENFORCE(ctx->HasInput("b"), "b should not be null"); - PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Out")), - "Input(Out@GRAD) should not be null"); - auto x_dims = ctx->GetInputDim("X"); - auto b_dims = ctx->GetInputDim("b"); - PADDLE_ENFORCE_GT( - x_dims.size(), b_dims.size(), - "The rank of input `X` must be larger than the one of input `b`."); - - int64_t num_col_dims = x_dims.size() - b_dims.size(); - PADDLE_ENFORCE_EQ( - framework::slice_ddim(x_dims, num_col_dims, x_dims.size()), b_dims, - "The width of two operands must be same"); - auto x_grad_name = framework::GradVarName("X"); - auto b_grad_name = framework::GradVarName("b"); - if (ctx->HasOutput(x_grad_name)) { - ctx->SetOutputDim(x_grad_name, x_dims); - } - if (ctx->HasOutput(b_grad_name)) { - ctx->SetOutputDim(b_grad_name, b_dims); - } - } -}; - -} // namespace operators -} // namespace paddle - -namespace ops = paddle::operators; -REGISTER_OP(rowwise_add, ops::RowwiseAddOp, ops::RowwiseAddOpMaker, - rowwise_add_grad, ops::RowwiseAddGradOp); -REGISTER_OP_CPU_KERNEL( - rowwise_add, ops::RowwiseAddKernel); -REGISTER_OP_CPU_KERNEL( - rowwise_add_grad, - ops::RowwiseAddGradKernel); diff --git a/paddle/operators/rowwise_add_op.h b/paddle/operators/rowwise_add_op.h deleted file mode 100644 index 35774b940926f77167b8f19597027e74d3477e5b..0000000000000000000000000000000000000000 --- a/paddle/operators/rowwise_add_op.h +++ /dev/null @@ -1,80 +0,0 @@ -/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve. - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-See the License for the specific language governing permissions and -limitations under the License. */ - -#pragma once -#include "paddle/framework/eigen.h" -#include "paddle/framework/op_registry.h" - -namespace paddle { -namespace operators { - -using Tensor = framework::Tensor; -template -using EigenVector = framework::EigenVector; -template -using EigenMatrix = framework::EigenMatrix; - -template -class RowwiseAddKernel : public framework::OpKernel { - public: - void Compute(const framework::ExecutionContext& context) const override { - auto out = context.Output("Out"); - out->mutable_data(context.GetPlace()); - int num_col_dims = context.Input("X")->dims().size() - - context.Input("b")->dims().size(); - auto input = - EigenMatrix::Reshape(*context.Input("X"), num_col_dims); - auto bias = EigenVector::Flatten(*context.Input("b")); - auto output = EigenMatrix::Reshape(*out, num_col_dims); - - const int bias_size = bias.dimension(0); - const int rest_size = input.size() / bias_size; - Eigen::DSizes one_d(input.size()); - Eigen::DSizes bcast(rest_size); - output.reshape(one_d).device(context.GetEigenDevice()) = - input.reshape(one_d) + bias.broadcast(bcast).reshape(one_d); - } -}; - -template -class RowwiseAddGradKernel : public framework::OpKernel { - public: - void Compute(const framework::ExecutionContext& context) const override { - auto* dout = context.Input(framework::GradVarName("Out")); - auto* dx = context.Output(framework::GradVarName("X")); - auto* db = context.Output(framework::GradVarName("b")); - int num_col_dims = context.Input("X")->dims().size() - - context.Input("b")->dims().size(); - - auto out_grad = EigenMatrix::Reshape(*dout, num_col_dims); - auto place = context.GetEigenDevice(); - - if (dx) { - dx->mutable_data(context.GetPlace()); - EigenMatrix::Reshape(*dx, num_col_dims).device(place) = out_grad; - } - - if (db) { - db->mutable_data(context.GetPlace()); - // https://eigen.tuxfamily.org/dox/unsupported/TensorBase_8h_source.html - // 
colwise add - Eigen::array dims{{0}}; /* dimension to reduce */ - EigenVector::Flatten(*db).device(place) = out_grad.sum(dims); - } - } -}; -} // namespace operators -} // namespace paddle diff --git a/paddle/operators/scale_op.h b/paddle/operators/scale_op.h index 02fbdc52bbf89c9f2acc5eeaa1197e4ccbca9d31..dc6bc768997f4fdd049bb63bdc11252ab52fcda9 100644 --- a/paddle/operators/scale_op.h +++ b/paddle/operators/scale_op.h @@ -20,7 +20,7 @@ namespace paddle { namespace operators { template -class ScaleKernel : public framework::OpKernel { +class ScaleKernel : public framework::OpKernel { public: virtual void Compute(const framework::ExecutionContext& context) const { auto* tensor = context.Output("Out"); diff --git a/paddle/operators/scatter_op.cc b/paddle/operators/scatter_op.cc index 3fc4a39ebc5526bfed61ba667c3cdc214cdd056c..cadd8841b6ab3a3674054240265eb6d4b474db1e 100644 --- a/paddle/operators/scatter_op.cc +++ b/paddle/operators/scatter_op.cc @@ -48,6 +48,11 @@ class ScatterOp : public framework::OperatorWithKernel { } ctx->SetOutputDim("Out", ref_dims); } + + framework::DataType IndicateDataType( + const framework::ExecutionContext& ctx) const override { + return framework::ToDataType(ctx.Input("Ref")->type()); + } }; class ScatterGradOp : public framework::OperatorWithKernel { @@ -60,6 +65,11 @@ class ScatterGradOp : public framework::OperatorWithKernel { ctx->GetInputDim("Updates")); ctx->SetOutputDim(framework::GradVarName("Ref"), ctx->GetInputDim("Ref")); } + + framework::DataType IndicateDataType( + const framework::ExecutionContext& ctx) const override { + return framework::ToDataType(ctx.Input("Ref")->type()); + } }; class ScatterOpMaker : public framework::OpProtoAndCheckerMaker { diff --git a/paddle/operators/scatter_op.h b/paddle/operators/scatter_op.h index e9595638a86a4a4536ddad4e6f20fd80a54b1608..a8eb54399a932913de208e1ddc90a6ff0dfaa452 100644 --- a/paddle/operators/scatter_op.h +++ b/paddle/operators/scatter_op.h @@ -24,7 +24,7 @@ namespace 
operators { using Tensor = framework::Tensor; template -class ScatterOpKernel : public framework::OpKernel { +class ScatterOpKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext &ctx) const override { auto *Ref = ctx.Input("Ref"); @@ -40,7 +40,7 @@ class ScatterOpKernel : public framework::OpKernel { }; template -class ScatterGradientOpKernel : public framework::OpKernel { +class ScatterGradientOpKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext &ctx) const override { auto *dRef = ctx.Output(framework::GradVarName("Ref")); diff --git a/paddle/operators/sequence_pool_op.cc b/paddle/operators/sequence_pool_op.cc index 17685ea654715f6996e17f6228f266c3aa1ee424..bc4af2f70427e684dfb531b8c61d68f28ae20794 100644 --- a/paddle/operators/sequence_pool_op.cc +++ b/paddle/operators/sequence_pool_op.cc @@ -24,9 +24,9 @@ class SequencePoolOp : public framework::OperatorWithKernel { protected: void InferShape(framework::InferShapeContextBase* ctx) const override { PADDLE_ENFORCE(ctx->HasInput("X"), - "Input(X) of SequenceAvgPoolOp should not be null."); + "Input(X) of SequencePoolOp should not be null."); PADDLE_ENFORCE(ctx->HasOutput("Out"), - "Output(Out) of SequenceAvgPoolOp should not be null."); + "Output(Out) of SequencePoolOp should not be null."); ctx->SetOutputDim("Out", ctx->GetInputDim("X")); } }; diff --git a/paddle/operators/sequence_pool_op.h b/paddle/operators/sequence_pool_op.h index cb80586e88f8d9e31b7b91a54f5e05ac6fa73f0f..752d714125578b2d1f926765b183495ec5cc203e 100644 --- a/paddle/operators/sequence_pool_op.h +++ b/paddle/operators/sequence_pool_op.h @@ -38,7 +38,7 @@ enum SeqPoolType { }; template -class SequencePoolKernel : public framework::OpKernel { +class SequencePoolKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { auto* in = context.Input("X"); @@ -85,7 +85,7 @@ class SequencePoolKernel : public 
framework::OpKernel { }; template -class SequencePoolGradKernel : public framework::OpKernel { +class SequencePoolGradKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { auto* in = context.Input("X"); diff --git a/paddle/operators/sequence_softmax_op.cc b/paddle/operators/sequence_softmax_op.cc new file mode 100644 index 0000000000000000000000000000000000000000..621779ab6133f56a43fb2d20c814ebed8762ea7d --- /dev/null +++ b/paddle/operators/sequence_softmax_op.cc @@ -0,0 +1,103 @@ +/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
*/ + +#include "paddle/operators/sequence_softmax_op.h" + +namespace paddle { +namespace operators { + +class SequenceSoftmaxOp : public framework::OperatorWithKernel { + public: + using framework::OperatorWithKernel::OperatorWithKernel; + + protected: + void InferShape(framework::InferShapeContextBase* ctx) const override { + PADDLE_ENFORCE(ctx->HasInput("X"), + "Input(X) of SequenceSoftmaxOp should not be null."); + PADDLE_ENFORCE(ctx->HasOutput("Out"), + "Output(Out) of SequenceSoftmaxOp should not be null."); + ctx->SetOutputDim("Out", ctx->GetInputDim("X")); + ctx->ShareLoD("X", /*->*/ "Out"); + } +}; + +class SequenceSoftmaxOpMaker : public framework::OpProtoAndCheckerMaker { + public: + SequenceSoftmaxOpMaker(framework::OpProto* proto, + framework::OpAttrChecker* op_checker) + : OpProtoAndCheckerMaker(proto, op_checker) { + AddInput("X", + "(LoDTensor) 1-D or 2-D input LoDTensor with the 2-nd dimension " + "of length 1."); + AddOutput("Out", + "(LoDTensor) 1-D or 2-D output LoDTensor with the 2-nd dimension " + "of length 1."); + AddComment(R"DOC( +SequenceSoftmaxOp computes softmax activation among all time-steps for each +sequence. The dimension of each time-step should be 1. Thus, the shape of +input Tensor can be either [N, 1] or [N], where N is the sum of all sequences' +lengths. + +Equation: + for i-th sequence in a mini-batch: + Out(X[lod[i]:lod[i+1]], :) = + exp(X[lod[i]:lod[i+1], :]) / sum(exp(X[lod[i]:lod[i+1], :])) + +For example, for a mini-batch of 3 sequences with variable-length, +each containing 2, 3, 2 time-steps, the lod of which is [0, 2, 5, 7], +then softmax will be computed among X[0:2, :], X[2:5, :], X[5:7, :] +and N turns out to be 7. 
+)DOC"); + } +}; + +class SequenceSoftmaxGradOp : public framework::OperatorWithKernel { + public: + using framework::OperatorWithKernel::OperatorWithKernel; + + protected: + void InferShape(framework::InferShapeContextBase* ctx) const override { + PADDLE_ENFORCE(ctx->HasInput("Out"), + "Input(Out) of SequenceSoftmaxGradOp should not be null."); + PADDLE_ENFORCE( + ctx->HasInput(framework::GradVarName("Out")), + "Input(Out@GRAD) of SequenceSoftmaxGradOp should not be null."); + PADDLE_ENFORCE(ctx->HasInput("X"), + "Input(X) of SequenceSoftmaxOp should not be null."); + PADDLE_ENFORCE(ctx->HasOutput(framework::GradVarName("X")), + "Output(X@GRAD) of SequenceSoftmaxOp should not be null."); + + PADDLE_ENFORCE_EQ( + ctx->GetInputDim("Out"), + ctx->GetInputDim(framework::GradVarName("Out")), + "Input(Out) and Input(Out@GRAD) of SequenceSoftmaxGradOp should be of " + "the same shape."); + + ctx->SetOutputDim(framework::GradVarName("X"), ctx->GetInputDim("X")); + } +}; + +} // namespace operators +} // namespace paddle + +namespace ops = paddle::operators; +REGISTER_OP(sequence_softmax, ops::SequenceSoftmaxOp, + ops::SequenceSoftmaxOpMaker, sequence_softmax_grad, + ops::SequenceSoftmaxGradOp); +REGISTER_OP_CPU_KERNEL( + sequence_softmax, + ops::SequenceSoftmaxKernel); +REGISTER_OP_CPU_KERNEL( + sequence_softmax_grad, + ops::SequenceSoftmaxGradKernel); diff --git a/paddle/operators/sequence_softmax_op.cu b/paddle/operators/sequence_softmax_op.cu new file mode 100644 index 0000000000000000000000000000000000000000..f2a1e3d5e31ef21b95a51b287bdd1d4aa9221e89 --- /dev/null +++ b/paddle/operators/sequence_softmax_op.cu @@ -0,0 +1,25 @@ +/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. 
+You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. */ + +#define EIGEN_USE_GPU + +#include "paddle/operators/sequence_softmax_op.h" + +namespace ops = paddle::operators; +REGISTER_OP_GPU_KERNEL( + sequence_softmax, + ops::SequenceSoftmaxKernel) +REGISTER_OP_GPU_KERNEL( + sequence_softmax_grad, + ops::SequenceSoftmaxGradKernel); diff --git a/paddle/operators/sequence_softmax_op.h b/paddle/operators/sequence_softmax_op.h new file mode 100644 index 0000000000000000000000000000000000000000..96d87c404d217280d74bd088e7a23f539ef6e7ce --- /dev/null +++ b/paddle/operators/sequence_softmax_op.h @@ -0,0 +1,94 @@ +/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
*/
+
+#pragma once
+
+#include "paddle/framework/eigen.h"
+#include "paddle/framework/op_registry.h"
+#include "paddle/operators/math/softmax.h"
+
+namespace paddle {
+namespace operators {
+
+using Tensor = framework::Tensor;
+using LoDTensor = framework::LoDTensor;
+
+template <typename Place, typename T>
+class SequenceSoftmaxKernel : public framework::OpKernel<T> {
+ public:
+  void Compute(const framework::ExecutionContext& ctx) const override {
+    auto* x = ctx.Input<LoDTensor>("X");
+    auto* out = ctx.Output<LoDTensor>("Out");
+
+    auto lod = x->lod();
+    auto dims = x->dims();
+
+    const size_t level = lod.size() - 1;
+    PADDLE_ENFORCE_EQ(dims[0], static_cast<int64_t>(lod[level].back()),
+                      "The first dimension of Input(X) should be equal to the "
+                      "sum of all sequences' lengths.");
+    PADDLE_ENFORCE_EQ(dims[0], x->numel(),
+                      "The width of each timestep in Input(X) of "
+                      "SequenceSoftmaxOp should be 1.");
+
+    out->mutable_data<T>(ctx.GetPlace());
+    for (int i = 0; i < static_cast<int>(lod[level].size()) - 1; ++i) {
+      int start_pos = static_cast<int>(lod[level][i]);
+      int end_pos = static_cast<int>(lod[level][i + 1]);
+      Tensor x_i = x->Slice<T>(start_pos, end_pos);
+      Tensor out_i = out->Slice<T>(start_pos, end_pos);
+
+      // Reshape from (end_pos - start_pos) x 1UL to 1UL x (end_pos - start_pos)
+      framework::DDim dims_i = framework::make_ddim({1UL, end_pos - start_pos});
+      x_i.Resize(dims_i);
+      out_i.Resize(dims_i);
+      math::SoftmaxFunctor<Place, T>()(ctx.device_context(), &x_i, &out_i);
+    }
+  }
+};
+
+template <typename Place, typename T>
+class SequenceSoftmaxGradKernel : public framework::OpKernel<T> {
+ public:
+  void Compute(const framework::ExecutionContext& ctx) const override {
+    auto* out = ctx.Input<LoDTensor>("Out");
+    auto* out_grad = ctx.Input<LoDTensor>(framework::GradVarName("Out"));
+    auto* x = ctx.Input<LoDTensor>("X");
+    auto* x_grad = ctx.Output<LoDTensor>(framework::GradVarName("X"));
+
+    auto lod = x->lod();
+    const size_t level = lod.size() - 1;
+
+    x_grad->mutable_data<T>(ctx.GetPlace());
+    for (int i = 0; i < static_cast<int>(lod[level].size()) - 1; ++i) {
+      int start_pos = static_cast<int>(lod[level][i]);
+      int end_pos =
static_cast(lod[level][i + 1]); + + Tensor out_i = out->Slice(start_pos, end_pos); + Tensor out_grad_i = out_grad->Slice(start_pos, end_pos); + Tensor x_grad_i = x_grad->Slice(start_pos, end_pos); + + // Reshape from (end_pos - start_pos) x 1UL to 1UL x (end_pos - start_pos) + framework::DDim dims_i = framework::make_ddim({1UL, end_pos - start_pos}); + out_i.Resize(dims_i); + out_grad_i.Resize(dims_i); + x_grad_i.Resize(dims_i); + math::SoftmaxGradFunctor()(ctx.device_context(), &out_i, + &out_grad_i, &x_grad_i); + } + } +}; + +} // namespace operators +} // namespace paddle diff --git a/paddle/operators/sgd_op.h b/paddle/operators/sgd_op.h index f8888f9c362e1c39af42236bb3a23be37aa3ae15..a3fe3308942f98e2c28376b589b6fc930e6878a1 100644 --- a/paddle/operators/sgd_op.h +++ b/paddle/operators/sgd_op.h @@ -25,7 +25,7 @@ template ; template -class SGDOpKernel : public framework::OpKernel { +class SGDOpKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& ctx) const override { auto param = ctx.Input("param"); diff --git a/paddle/operators/sigmoid_cross_entropy_with_logits_op.h b/paddle/operators/sigmoid_cross_entropy_with_logits_op.h index a6de9043fdbcdcca47407aac0b4892cbad3a9a42..41c619f181c878f08959a8ca461c60af5ffdff2a 100644 --- a/paddle/operators/sigmoid_cross_entropy_with_logits_op.h +++ b/paddle/operators/sigmoid_cross_entropy_with_logits_op.h @@ -21,7 +21,7 @@ namespace operators { // Out = max(X, 0) - X * Labels + log(1 + exp(-abs(X))) template -class SigmoidCrossEntropyWithLogitsKernel : public framework::OpKernel { +class SigmoidCrossEntropyWithLogitsKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext &context) const override { const framework::Tensor *X = context.Input("X"); @@ -48,7 +48,7 @@ class SigmoidCrossEntropyWithLogitsKernel : public framework::OpKernel { // dX = sigmoid(X) - labels template -class SigmoidCrossEntropyWithLogitsGradKernel : public framework::OpKernel { 
+class SigmoidCrossEntropyWithLogitsGradKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext &context) const override { const framework::Tensor *X = context.Input("X"); diff --git a/paddle/operators/smooth_l1_loss_op.h b/paddle/operators/smooth_l1_loss_op.h index 0604fb5e1c2f17c702208520a1d23bd5c3c65b5d..39d0070b6c8909b8f433de48038240e851d9d6cf 100644 --- a/paddle/operators/smooth_l1_loss_op.h +++ b/paddle/operators/smooth_l1_loss_op.h @@ -45,7 +45,7 @@ struct SmoothL1LossForward { }; template -class SmoothL1LossKernel : public framework::OpKernel { +class SmoothL1LossKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { auto* in0 = context.Input("X"); @@ -115,7 +115,7 @@ struct SmoothL1LossBackward { }; template -class SmoothL1LossGradKernel : public framework::OpKernel { +class SmoothL1LossGradKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { auto* in0 = context.Input("InsideWeight"); diff --git a/paddle/operators/softmax_op.h b/paddle/operators/softmax_op.h index 7220f486be055e1b841a06b15f519717c54f575c..2c08853f4f615bfe95f51aa20776ddddcdaa8f61 100644 --- a/paddle/operators/softmax_op.h +++ b/paddle/operators/softmax_op.h @@ -26,46 +26,31 @@ template ; template -class SoftmaxKernel : public framework::OpKernel { +class SoftmaxKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { - auto X = context.Input("X"); - auto Y = context.Output("Y"); + auto* X = context.Input("X"); + auto* Y = context.Output("Y"); // allocate memory on device. 
Y->mutable_data(context.GetPlace()); - math::SoftmaxFunctor()(context, X, Y); + math::SoftmaxFunctor()(context.device_context(), X, Y); } }; template -class SoftmaxGradKernel : public framework::OpKernel { +class SoftmaxGradKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { - auto Y = context.Input("Y"); - auto dY = context.Input(framework::GradVarName("Y")); - auto dX = context.Output(framework::GradVarName("X")); - dX->mutable_data(context.GetPlace()); - - const int batch_size = Y->dims()[0]; - const int class_num = Y->dims()[1]; - - Eigen::DSizes along_class(1); - Eigen::DSizes batch_by_one(batch_size, 1); - Eigen::DSizes one_by_class(1, class_num); + auto* Y = context.Input("Y"); + auto* dY = context.Input(framework::GradVarName("Y")); + auto* dX = context.Output(framework::GradVarName("X")); - auto Y_eigen = EigenMatrix::From(*Y); - auto dY_eigen = EigenMatrix::From(*dY); - auto dX_eigen = EigenMatrix::From(*dX); - auto place = context.GetEigenDevice(); + // allocate memory on device. + dX->mutable_data(context.GetPlace()); - auto dot = (Y_eigen * dY_eigen) - .sum(along_class) - .eval() - .reshape(batch_by_one) - .broadcast(one_by_class); - dX_eigen.device(place) = (dY_eigen - dot) * Y_eigen; + math::SoftmaxGradFunctor()(context.device_context(), Y, dY, dX); } }; diff --git a/paddle/operators/softmax_with_cross_entropy_op.cc b/paddle/operators/softmax_with_cross_entropy_op.cc index e2299b254458cdd42dee4683561d4d5c81653fb1..a76489871f30dc8d852b6a783efeff41704fd4a4 100644 --- a/paddle/operators/softmax_with_cross_entropy_op.cc +++ b/paddle/operators/softmax_with_cross_entropy_op.cc @@ -13,6 +13,7 @@ limitations under the License. 
*/ #include "paddle/operators/softmax_with_cross_entropy_op.h" +#include namespace paddle { namespace operators { @@ -115,6 +116,11 @@ class SoftmaxWithCrossEntropyOp : public framework::OperatorWithKernel { ctx->ShareLoD("Logits", /*->*/ "Softmax"); ctx->ShareLoD("Logits", /*->*/ "Loss"); } + + framework::DataType IndicateDataType( + const framework::ExecutionContext& ctx) const override { + return framework::ToDataType(ctx.Input("Logits")->type()); + } }; class SoftmaxWithCrossEntropyOpGrad : public framework::OperatorWithKernel { @@ -149,6 +155,12 @@ class SoftmaxWithCrossEntropyOpGrad : public framework::OperatorWithKernel { ctx->SetOutputDim(framework::GradVarName("Logits"), ctx->GetInputDim("Softmax")); } + + framework::DataType IndicateDataType( + const framework::ExecutionContext& ctx) const override { + return framework::ToDataType( + ctx.Input(framework::GradVarName("Loss"))->type()); + } }; } // namespace operators diff --git a/paddle/operators/softmax_with_cross_entropy_op.cu b/paddle/operators/softmax_with_cross_entropy_op.cu index 1cf4296dccf68aece6fdfb7910a9c68449633b76..2bc53ecf871eb1800a920ba85e8eac31d7037efe 100644 --- a/paddle/operators/softmax_with_cross_entropy_op.cu +++ b/paddle/operators/softmax_with_cross_entropy_op.cu @@ -53,7 +53,7 @@ __global__ void SoftCrossEntropyGradientKernel(T* logit_grad, } // namespace template -class SoftmaxWithCrossEntropyCUDAKernel : public framework::OpKernel { +class SoftmaxWithCrossEntropyCUDAKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { PADDLE_ENFORCE(platform::is_gpu_place(context.GetPlace()), @@ -66,14 +66,16 @@ class SoftmaxWithCrossEntropyCUDAKernel : public framework::OpKernel { softmax->mutable_data(context.GetPlace()); loss->mutable_data(context.GetPlace()); - math::SoftmaxFunctor()(context, logits, softmax); + math::SoftmaxFunctor()(context.device_context(), + logits, softmax); math::CrossEntropyFunctor()( - context, loss, 
softmax, labels, context.Attr("softLabel")); + context.device_context(), loss, softmax, labels, + context.Attr("softLabel")); } }; template -class SoftmaxWithCrossEntropyGradCUDAKernel : public framework::OpKernel { +class SoftmaxWithCrossEntropyGradCUDAKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { PADDLE_ENFORCE(platform::is_gpu_place(context.GetPlace()), diff --git a/paddle/operators/softmax_with_cross_entropy_op.h b/paddle/operators/softmax_with_cross_entropy_op.h index bf792c1f59e2e43a98c93bddbc2aa63d646dee6f..cffd422f1827b646a8abcd881fdcb5455e6a663a 100644 --- a/paddle/operators/softmax_with_cross_entropy_op.h +++ b/paddle/operators/softmax_with_cross_entropy_op.h @@ -27,7 +27,7 @@ template ; template -class SoftmaxWithCrossEntropyKernel : public framework::OpKernel { +class SoftmaxWithCrossEntropyKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { PADDLE_ENFORCE(platform::is_cpu_place(context.GetPlace()), @@ -40,14 +40,16 @@ class SoftmaxWithCrossEntropyKernel : public framework::OpKernel { softmax->mutable_data(context.GetPlace()); loss->mutable_data(context.GetPlace()); - math::SoftmaxFunctor()(context, logits, softmax); + math::SoftmaxFunctor()(context.device_context(), + logits, softmax); math::CrossEntropyFunctor()( - context, loss, softmax, labels, context.Attr("softLabel")); + context.device_context(), loss, softmax, labels, + context.Attr("softLabel")); } }; template -class SoftmaxWithCrossEntropyGradKernel : public framework::OpKernel { +class SoftmaxWithCrossEntropyGradKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { const Tensor* out_grad = diff --git a/paddle/operators/split_op.h b/paddle/operators/split_op.h index 8ab8e0ee4fea621b34da73507c53846100d61a17..fa26e5f677b18c84b45dd583004d02cab4c1d375 100644 --- a/paddle/operators/split_op.h 
+++ b/paddle/operators/split_op.h @@ -22,7 +22,7 @@ namespace paddle { namespace operators { template -class SplitOpKernel : public framework::OpKernel { +class SplitOpKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& ctx) const override { auto* in = ctx.Input("X"); diff --git a/paddle/operators/squared_l2_distance_op.h b/paddle/operators/squared_l2_distance_op.h index 097ac04fc09a10b3b624f491a847e281e41a802c..259ef4029646914f83a112b9c6d7fdf8401483f6 100644 --- a/paddle/operators/squared_l2_distance_op.h +++ b/paddle/operators/squared_l2_distance_op.h @@ -28,7 +28,7 @@ template ; template -class SquaredL2DistanceKernel : public framework::OpKernel { +class SquaredL2DistanceKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { auto* in0 = context.Input("X"); @@ -68,7 +68,7 @@ class SquaredL2DistanceKernel : public framework::OpKernel { }; template -class SquaredL2DistanceGradKernel : public framework::OpKernel { +class SquaredL2DistanceGradKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { auto* in0 = context.Input("sub_result"); diff --git a/paddle/operators/sum_op.h b/paddle/operators/sum_op.h index 0b1e9ebaa38d455fb5e3ce8c1a39cbbcdad9a940..7e8fbb9e41c694df9169ea583ce47c33d3bcf2bb 100644 --- a/paddle/operators/sum_op.h +++ b/paddle/operators/sum_op.h @@ -22,7 +22,7 @@ template ; template -class SumKernel : public framework::OpKernel { +class SumKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { auto ins = context.MultiInput("X"); @@ -43,7 +43,7 @@ class SumKernel : public framework::OpKernel { }; template -class SumGradKernel : public framework::OpKernel { +class SumGradKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { auto* input = 
context.Input(framework::GradVarName("Out")); diff --git a/paddle/operators/top_k_op.cu b/paddle/operators/top_k_op.cu index 53fe505b77bfac8a33803f082f8e935d3ed403b6..7be6932f1e301d06e0e232367a38bfa673ff45be 100644 --- a/paddle/operators/top_k_op.cu +++ b/paddle/operators/top_k_op.cu @@ -279,7 +279,7 @@ __global__ void KeMatrixTopK(T* output, int output_stride, int* indices, } template -class TopkOpCUDAKernel : public framework::OpKernel { +class TopkOpCUDAKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& ctx) const override { PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()), diff --git a/paddle/operators/top_k_op.h b/paddle/operators/top_k_op.h index ef66acc1d569282a42be64b7a5e90f3fbdb20690..4b248faa120bcfd20e70d288cce2d485d3e6371e 100644 --- a/paddle/operators/top_k_op.h +++ b/paddle/operators/top_k_op.h @@ -28,7 +28,7 @@ template ; template -class TopkKernel : public framework::OpKernel { +class TopkKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& ctx) const override { // Get the top k elements of each row of input tensor diff --git a/paddle/operators/transpose_op.h b/paddle/operators/transpose_op.h index ea299dce72ad340b0a65ee50582dc156b5ad7abb..aaa3f47ab5545accd4d1108e0ad6f5a3062186d0 100644 --- a/paddle/operators/transpose_op.h +++ b/paddle/operators/transpose_op.h @@ -38,7 +38,7 @@ void EigenTranspose(const framework::ExecutionContext& context, } template -class TransposeKernel : public framework::OpKernel { +class TransposeKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { auto* x = context.Input("X"); @@ -73,7 +73,7 @@ class TransposeKernel : public framework::OpKernel { }; template -class TransposeGradKernel : public framework::OpKernel { +class TransposeGradKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { auto* out_grad = 
diff --git a/paddle/operators/uniform_random_op.cc b/paddle/operators/uniform_random_op.cc
index 2771df56086ff261728af84edcdf01cda3e45e9f..97b1d0bed4595cb750e4d2122f294f10edfbe0ff 100644
--- a/paddle/operators/uniform_random_op.cc
+++ b/paddle/operators/uniform_random_op.cc
@@ -21,7 +21,7 @@ namespace operators {
 // Use std::random and thrust::random(thrust is a std library in CUDA) to
 // implement uniform random.
 template <typename T>
-class CPUUniformRandomKernel : public framework::OpKernel {
+class CPUUniformRandomKernel : public framework::OpKernel<T> {
  public:
   void Compute(const framework::ExecutionContext& ctx) const override {
     auto* tensor = ctx.Output<framework::Tensor>("Out");
@@ -62,6 +62,11 @@ class UniformRandomOp : public framework::OperatorWithKernel {
     }
     ctx->SetOutputDim("Out", framework::make_ddim(temp));
   }
+
+  framework::DataType IndicateDataType(
+      const framework::ExecutionContext& ctx) const override {
+    return static_cast<framework::DataType>(Attr<int>("data_type"));
+  }
 };
 
 class UniformRandomOpMaker : public framework::OpProtoAndCheckerMaker {
@@ -80,6 +85,8 @@ Used to initialize tensor with uniform random generator.
                  "Random seed of uniform random. "
                  "0 means generate a seed by system")
         .SetDefault(0);
+    AddAttr<int>("data_type", "output tensor data type")
+        .SetDefault(framework::DataType::FP32);
   }
 };
 }  // namespace operators
diff --git a/paddle/operators/uniform_random_op.cu b/paddle/operators/uniform_random_op.cu
index 6614b53b3f990d10c82633f3c1f079acea0cd827..5612ce9eb1c644d6271b4a9bb949f685848e05c0 100644
--- a/paddle/operators/uniform_random_op.cu
+++ b/paddle/operators/uniform_random_op.cu
@@ -40,7 +40,7 @@ struct UniformGenerator {
 // Use std::random and thrust::random(thrust is a std library in CUDA) to
 // implement uniform random.
template -class GPUUniformRandomKernel : public framework::OpKernel { +class GPUUniformRandomKernel : public framework::OpKernel { public: void Compute(const framework::ExecutionContext& context) const override { auto* tensor = context.Output("Out"); diff --git a/paddle/platform/device_context.cc b/paddle/platform/device_context.cc index 93b472b41c8a4c3a2bfada9d4fbf0e9e1b0cc736..36af1ac677f6bb3e5b6392ff0de678afe7e47950 100644 --- a/paddle/platform/device_context.cc +++ b/paddle/platform/device_context.cc @@ -16,8 +16,8 @@ namespace paddle { namespace platform { template <> -Eigen::DefaultDevice* DeviceContext::get_eigen_device() - const { +Eigen::DefaultDevice* DeviceContext::GetEigenDevice< + platform::CPUPlace, Eigen::DefaultDevice>() const { return reinterpret_cast(this)->eigen_device(); } @@ -37,6 +37,12 @@ Place CPUDeviceContext::GetPlace() const { return CPUPlace(); } #ifndef PADDLE_ONLY_CPU +template <> +Eigen::GpuDevice* +DeviceContext::GetEigenDevice() const { + return reinterpret_cast(this)->eigen_device(); +} + class EigenCudaStreamDevice : public Eigen::StreamInterface { public: EigenCudaStreamDevice() : scratch_(nullptr), semaphore_(nullptr) { @@ -90,11 +96,6 @@ class EigenCudaStreamDevice : public Eigen::StreamInterface { mutable unsigned int* semaphore_; }; -template <> -Eigen::GpuDevice* DeviceContext::get_eigen_device() const { - return reinterpret_cast(this)->eigen_device(); -} - CUDADeviceContext::CUDADeviceContext(GPUPlace place) : place_(place) { SetDeviceId(place_.device); PADDLE_ENFORCE(cudaStreamCreate(&stream_)); diff --git a/paddle/platform/device_context.h b/paddle/platform/device_context.h index f6a39a8e26c301296aac0af7f4e8b2c6c97ece24..d805d2ab085f76e119edf1c6f2acb9715883d755 100644 --- a/paddle/platform/device_context.h +++ b/paddle/platform/device_context.h @@ -27,13 +27,23 @@ limitations under the License. 
*/
 namespace paddle {
 namespace platform {
 
+template <typename T>
+struct EigenDeviceConverter;
+
+template <>
+struct EigenDeviceConverter<platform::CPUPlace> {
+  using EigenDeviceType = Eigen::DefaultDevice;
+};
+
 class DeviceContext {
  public:
   virtual ~DeviceContext() {}
   virtual Place GetPlace() const = 0;
 
-  template <typename DeviceType>
-  DeviceType* get_eigen_device() const;
+  template <typename PlaceType,
+            typename DeviceType =
+                typename EigenDeviceConverter<PlaceType>::EigenDeviceType>
+  DeviceType* GetEigenDevice() const;
 
   virtual void Wait() const {}
 };
@@ -52,6 +62,11 @@ class CPUDeviceContext : public DeviceContext {
 };
 
 #ifndef PADDLE_ONLY_CPU
+template <>
+struct EigenDeviceConverter<platform::GPUPlace> {
+  using EigenDeviceType = Eigen::GpuDevice;
+};
+
 class EigenCudaStreamDevice;
 
 class CUDADeviceContext : public DeviceContext {
diff --git a/paddle/platform/device_context_test.cc b/paddle/platform/device_context_test.cc
index 5883a55272f0f24c94d48bc43c62ddb7bef15465..f4b00c57dee5196e535816d8985fd7e831c4c226 100644
--- a/paddle/platform/device_context_test.cc
+++ b/paddle/platform/device_context_test.cc
@@ -24,7 +24,7 @@ TEST(Device, Init) {
   for (int i = 0; i < count; i++) {
     DeviceContext* device_context = new CUDADeviceContext(GPUPlace(i));
     Eigen::GpuDevice* gpu_device =
-        device_context->template get_eigen_device<Eigen::GpuDevice>();
+        device_context->template GetEigenDevice<GPUPlace>();
     ASSERT_NE(nullptr, gpu_device);
     delete device_context;
   }
diff --git a/paddle/platform/place.cc b/paddle/platform/place.cc
index b31515e1f028acac885a506ff1c20479407a05e3..856e54df89c1c18ade040957188a2fbda0901473 100644
--- a/paddle/platform/place.cc
+++ b/paddle/platform/place.cc
@@ -47,7 +47,7 @@ bool is_cpu_place(const Place &p) {
 }
 
 bool places_are_same_class(const Place &p1, const Place &p2) {
-  return is_gpu_place(p1) == is_gpu_place(p2);
+  return p1.which() == p2.which();
 }
 
 std::ostream &operator<<(std::ostream &os, const Place &p) {
diff --git a/paddle/platform/place.h b/paddle/platform/place.h
index 1117476bb37f1b0f3876c55e610803d5ee2558ce..0efc6932349a5b3ad295d195a16737a642e18943 100644
--- a/paddle/platform/place.h
+++ b/paddle/platform/place.h
@@ -15,6 +15,7 @@ limitations under the License. */ #pragma once #include + #include "paddle/platform/variant.h" namespace paddle { @@ -46,8 +47,18 @@ struct IsGPUPlace : public boost::static_visitor { bool operator()(const GPUPlace &gpu) const { return true; } }; +// Define the max number of Place in bit length. i.e., the max number of places +// should be less equal than 2^(NUM_PLACE_TYPE_LIMIT_IN_BIT) +#define NUM_PLACE_TYPE_LIMIT_IN_BIT 4 + typedef boost::variant Place; +// static check number of place types is less equal than +// 2^(NUM_PLACE_TYPE_LIMIT_IN_BIT) +BOOST_MPL_ASSERT((boost::mpl::less_equal< + Place::types::size, + boost::mpl::long_<1 << NUM_PLACE_TYPE_LIMIT_IN_BIT>>)); + void set_place(const Place &); const Place &get_place(); diff --git a/paddle/platform/variant.h b/paddle/platform/variant.h index c2257af1b5dd1a1e284979bf17e1a947072baa85..16ee00efe7a9b0406f8459e19a55e1e1b9ca7419 100644 --- a/paddle/platform/variant.h +++ b/paddle/platform/variant.h @@ -29,4 +29,6 @@ #endif #endif +#include +#include #include diff --git a/paddle/pybind/pybind.cc b/paddle/pybind/pybind.cc index d85bf6c7faa5f65c7b39682f7639fe269bdfa345..f4121e9d71824296770f86c1e94c096f767dec0a 100644 --- a/paddle/pybind/pybind.cc +++ b/paddle/pybind/pybind.cc @@ -77,20 +77,18 @@ PYBIND11_PLUGIN(core) { }) .def("set", PyCPUTensorSetFromArray) .def("set", PyCPUTensorSetFromArray) + .def("set", PyCPUTensorSetFromArray) #ifndef PADDLE_ONLY_CPU .def("set", PyCUDATensorSetFromArray) .def("set", PyCUDATensorSetFromArray) + .def("set", PyCUDATensorSetFromArray) #endif .def("shape", [](Tensor &self) { return vectorize(self.dims()); }) - .def("set_float_element", - [](Tensor &self, size_t offset, float f) { - // TODO(yuyang18): Only support GPU now. - self.data()[offset] = f; - }) - .def("get_float_element", [](Tensor &self, size_t offset) -> float { - // TODO(yuyang18): Only support GPU now. 
- return self.data()[offset]; - }); + .def("set_float_element", TensorSetElement) + .def("get_float_element", TensorGetElement) + .def("set_double_element", TensorSetElement) + .def("get_double_element", TensorGetElement) + .def("dtype", [](Tensor &self) { return ToDataType(self.type()); }); py::class_(m, "LoDTensor") .def_buffer( diff --git a/paddle/pybind/tensor_py.h b/paddle/pybind/tensor_py.h index f0d5a6f9ff963ecd80d0c261daff56bff50663d4..3e3e6bc0312974fab50e17d428c7dea9ca547d1e 100644 --- a/paddle/pybind/tensor_py.h +++ b/paddle/pybind/tensor_py.h @@ -42,7 +42,7 @@ template struct CastToPyBufferImpl { using CUR_TYPE = typename std::tuple_element>::type; py::buffer_info operator()(framework::Tensor &tensor) { - if (std::type_index(typeid(CUR_TYPE)) == tensor.holder_->type()) { + if (std::type_index(typeid(CUR_TYPE)) == tensor.type()) { auto dim_vec = framework::vectorize(tensor.dims()); std::vector dims_outside; std::vector strides; @@ -56,13 +56,13 @@ struct CastToPyBufferImpl { prod *= dims_outside[i - 1]; } framework::Tensor dst_tensor; - if (paddle::platform::is_gpu_place(tensor.holder_->place())) { + if (paddle::platform::is_gpu_place(tensor.place())) { dst_tensor.CopyFrom(tensor, platform::CPUPlace()); - } else if (paddle::platform::is_cpu_place(tensor.holder_->place())) { + } else if (paddle::platform::is_cpu_place(tensor.place())) { dst_tensor = tensor; } return py::buffer_info( - dst_tensor.mutable_data(dst_tensor.holder_->place()), + dst_tensor.mutable_data(dst_tensor.place()), sizeof(CUR_TYPE), py::format_descriptor::format(), (size_t)framework::arity(dst_tensor.dims()), dims_outside, strides); } else { @@ -73,10 +73,23 @@ struct CastToPyBufferImpl { }; } // namespace details inline py::buffer_info CastToPyBuffer(framework::Tensor &tensor) { - auto buffer_info = details::CastToPyBufferImpl()(tensor); + auto buffer_info = + details::CastToPyBufferImpl()(tensor); return buffer_info; } +template +T TensorGetElement(framework::Tensor &self, size_t 
offset) { + PADDLE_ENFORCE(platform::is_cpu_place(self.place())); + return self.data()[offset]; +} + +template +void TensorSetElement(framework::Tensor &self, size_t offset, T elem) { + PADDLE_ENFORCE(platform::is_cpu_place(self.place())); + self.data()[offset] = elem; +} + template void PyCPUTensorSetFromArray( framework::Tensor &self, diff --git a/paddle/scripts/submit_local.sh.in b/paddle/scripts/submit_local.sh.in index 26f9c0fcd4e045f5d603fc4e4b16691a418823ca..5c4b5a2495182ea5d2b3341cff650dfb4d8b0c0f 100755 --- a/paddle/scripts/submit_local.sh.in +++ b/paddle/scripts/submit_local.sh.in @@ -18,7 +18,7 @@ function version(){ echo "PaddlePaddle @PADDLE_VERSION@, compiled with" echo " with_avx: @WITH_AVX@" echo " with_gpu: @WITH_GPU@" - echo " with_mkldnn: @WITH_MKLDNN" + echo " with_mkldnn: @WITH_MKLDNN@" echo " with_mklml: @WITH_MKLML@" echo " with_double: @WITH_DOUBLE@" echo " with_python: @WITH_PYTHON@" diff --git a/python/paddle/v2/framework/tests/op_test.py b/python/paddle/v2/framework/tests/op_test.py index 89979044f29a301daa7435ff903ae902c981ea1b..75df2eeddfe67269d4709887c7cfdb8fab108bd8 100644 --- a/python/paddle/v2/framework/tests/op_test.py +++ b/python/paddle/v2/framework/tests/op_test.py @@ -1,5 +1,6 @@ import unittest import numpy as np +import random import itertools import paddle.v2.framework.core as core from paddle.v2.framework.op import Operator @@ -12,17 +13,19 @@ def grad_var_name(var_name): def create_op(scope, op_type, inputs, outputs, attrs): kwargs = dict() + def __create_var__(name, var_name): + scope.new_var(var_name) + kwargs[name].append(var_name) + for in_name, in_dup in Operator.get_op_inputs(op_type): if in_name in inputs: kwargs[in_name] = [] if in_dup: sub_in = inputs[in_name] for sub_in_name, _ in sub_in: - var = scope.new_var(sub_in_name) - kwargs[in_name].append(sub_in_name) + __create_var__(in_name, sub_in_name) else: - var = scope.new_var(in_name) - kwargs[in_name].append(in_name) + __create_var__(in_name, in_name) for 
out_name, out_dup in Operator.get_op_outputs(op_type): if out_name in outputs: @@ -30,11 +33,9 @@ def create_op(scope, op_type, inputs, outputs, attrs): if out_dup: sub_out = outputs[out_name] for sub_out_name, _ in sub_out: - var = scope.new_var(sub_out_name) - kwargs[out_name].append(sub_out_name) + __create_var__(out_name, sub_out_name) else: - var = scope.new_var(out_name) - kwargs[out_name].append(out_name) + __create_var__(out_name, out_name) for attr_name in Operator.get_op_attr_names(op_type): if attr_name in attrs: @@ -44,49 +45,46 @@ def create_op(scope, op_type, inputs, outputs, attrs): def set_input(scope, op, inputs, place): + def __set_input__(var_name, var): + tensor = scope.find_var(var_name).get_tensor() + if isinstance(var, tuple): + tensor.set_lod(var[1]) + var = var[0] + tensor.set_dims(var.shape) + tensor.set(var, place) + for in_name, in_dup in Operator.get_op_inputs(op.type()): if in_name in inputs: if in_dup: sub_in = inputs[in_name] for sub_in_name, sub_in_val in sub_in: - var = scope.find_var(sub_in_name) - tensor = var.get_tensor() - sub_in_array = sub_in_val[0] \ - if isinstance(sub_in_val, tuple) else sub_in_val - tensor.set_dims(sub_in_array.shape) - tensor.set(sub_in_array, place) - if isinstance(sub_in_val, tuple): - tensor.set_lod(sub_in_val[1]) + __set_input__(sub_in_name, sub_in_val) else: - var = scope.find_var(in_name) - tensor = var.get_tensor() - in_val = inputs[in_name] - in_array = in_val[0] if isinstance(in_val, tuple) else in_val - tensor.set_dims(in_array.shape) - tensor.set(in_array, place) - if isinstance(in_val, tuple): - tensor.set_lod(in_val[1]) + __set_input__(in_name, inputs[in_name]) def set_output_grad(scope, op, outputs, place): + def __set_tensor__(name): + out_tensor = scope.find_var(name).get_tensor() + grad_tensor = scope.new_var(grad_var_name(name)).get_tensor() + out_dtype = out_tensor.dtype() + if out_dtype == core.DataType.FP64: + data = np.ones(out_tensor.shape(), dtype=np.float64) + elif out_dtype == 
core.DataType.FP32: + data = np.ones(out_tensor.shape(), dtype=np.float32) + else: + raise ValueError("Not supported data type " + str(out_dtype)) + + grad_tensor.set(data, place) + for out_name, out_dup in Operator.get_op_outputs(op.type()): if out_name in outputs: if out_dup: sub_out = outputs[out_name] for sub_out_name, _ in sub_out: - out_tensor = scope.find_var(sub_out_name).get_tensor() - grad_tensor = scope.new_var(grad_var_name( - sub_out_name)).get_tensor() - grad_tensor.set_dims(out_tensor.shape()) - data = np.ones(out_tensor.shape(), dtype=np.float32) - grad_tensor.set(data, place) + __set_tensor__(sub_out_name) else: - out_tensor = scope.find_var(out_name).get_tensor() - grad_tensor = scope.new_var(grad_var_name(out_name)).get_tensor( - ) - grad_tensor.set_dims(out_tensor.shape()) - data = np.ones(out_tensor.shape(), dtype=np.float32) - grad_tensor.set(data, place) + __set_tensor__(out_name) def get_numeric_gradient(scope, @@ -96,7 +94,6 @@ def get_numeric_gradient(scope, output_names, delta=0.005, in_place=False): - set_input(scope, op, inputs, core.CPUPlace()) tensor_to_check = scope.find_var(input_to_check).get_tensor() @@ -115,7 +112,29 @@ def get_numeric_gradient(scope, tensor_to_check = scope.find_var(input_to_check).get_tensor() tensor_size = product(tensor_to_check.get_dims()) - gradient_flat = np.zeros(shape=(tensor_size, ), dtype='float32') + tensor_to_check_dtype = tensor_to_check.dtype() + if tensor_to_check_dtype == core.DataType.FP32: + tensor_to_check_dtype = np.float32 + elif tensor_to_check_dtype == core.DataType.FP64: + tensor_to_check_dtype = np.float64 + else: + raise ValueError("Not supported data type " + str( + tensor_to_check_dtype)) + + gradient_flat = np.zeros(shape=(tensor_size, ), dtype=tensor_to_check_dtype) + + def __get_elem__(tensor, i): + if tensor_to_check_dtype == np.float32: + return tensor.get_float_element(i) + else: + return tensor.get_double_element(i) + + def __set_elem__(tensor, i, e): + if tensor_to_check_dtype 
== np.float32: + tensor.set_float_element(i, e) + else: + tensor.set_double_element(i, e) + # we only compute gradient of one element each time. # we use a for loop to compute the gradient of every element. for i in xrange(tensor_size): @@ -123,20 +142,20 @@ def get_numeric_gradient(scope, set_input(scope, op, inputs, core.CPUPlace()) # get one input element throw it's index i. - origin = tensor_to_check.get_float_element(i) + origin = __get_elem__(tensor_to_check, i) # add delta to it, run op and then get the sum of the result tensor. x_pos = origin + delta - tensor_to_check.set_float_element(i, x_pos) + __set_elem__(tensor_to_check, i, x_pos) y_pos = get_output() if in_place: set_input(scope, op, inputs, core.CPUPlace()) x_neg = origin - delta - tensor_to_check.set_float_element(i, x_neg) + __set_elem__(tensor_to_check, i, x_neg) y_neg = get_output() - tensor_to_check.set_float_element(i, origin) + __set_elem__(tensor_to_check, i, origin) gradient_flat[i] = (y_pos - y_neg) / delta / 2 return gradient_flat.reshape(tensor_to_check.get_dims()) @@ -174,6 +193,21 @@ def get_gradient(scope, op, inputs, outputs, grad_name, place, class OpTest(unittest.TestCase): + @classmethod + def setUpClass(cls): + '''Fix random seeds to remove randomness from tests''' + cls._np_rand_state = np.random.get_state() + cls._py_rand_state = random.getstate() + + np.random.seed(123) + random.seed(124) + + @classmethod + def tearDownClass(cls): + '''Restore random seeds''' + np.random.set_state(cls._np_rand_state) + random.setstate(cls._py_rand_state) + def check_output_with_place(self, place, atol): self.scope = core.Scope() op_inputs = self.inputs if hasattr(self, "inputs") else dict() diff --git a/python/paddle/v2/framework/tests/test_activation_op.py b/python/paddle/v2/framework/tests/test_activation_op.py index 8f6d2be17758b7f6604d2db74fe466fb30695bd5..c44eb849063592fbda417ec1516d195dd4358612 100644 --- a/python/paddle/v2/framework/tests/test_activation_op.py +++ 
b/python/paddle/v2/framework/tests/test_activation_op.py @@ -219,5 +219,22 @@ class TestSTanh(OpTest): self.check_grad(['X'], 'Y', max_relative_error=0.007) +class TestSoftsign(OpTest): + def setUp(self): + self.op_type = "softsign" + self.inputs = { + 'X': np.random.uniform(-1, 1, [11, 17]).astype("float32") + } + self.outputs = { + 'Y': np.divide(self.inputs['X'], 1 + np.abs(self.inputs['X'])) + } + + def test_check_output(self): + self.check_output() + + def test_check_grad(self): + self.check_grad(['X'], 'Y', max_relative_error=0.007) + + if __name__ == "__main__": unittest.main() diff --git a/python/paddle/v2/framework/tests/test_cross_entropy_op.py b/python/paddle/v2/framework/tests/test_cross_entropy_op.py index 1de514dff487158e0823fd628d9b3b50f36fdd9b..4ea14da7fd3d84870965d62514d6a79b4926a6ec 100644 --- a/python/paddle/v2/framework/tests/test_cross_entropy_op.py +++ b/python/paddle/v2/framework/tests/test_cross_entropy_op.py @@ -80,7 +80,7 @@ class TestCrossEntropyOp3(OpTest): cross_entropy2 = (-label * np.log(X)).sum( axis=1, keepdims=True).astype("float32") - self.inputs = {"X": X, "Label": label} + self.inputs = {"X": X, "Label": label.astype(np.float32)} self.outputs = {"Y": cross_entropy} self.attrs = {"softLabel": True} diff --git a/python/paddle/v2/framework/tests/test_elementwise_mul_op.py b/python/paddle/v2/framework/tests/test_elementwise_mul_op.py index cee4385a8176f7a441a280e3cd40c39ca51493c5..261ca9cb3da90dee91b016fee98f67b4c19356a1 100644 --- a/python/paddle/v2/framework/tests/test_elementwise_mul_op.py +++ b/python/paddle/v2/framework/tests/test_elementwise_mul_op.py @@ -7,8 +7,8 @@ class ElementwiseMulOp(OpTest): def setUp(self): self.op_type = "elementwise_mul" self.inputs = { - 'X': np.random.uniform(0.1, 1, [13, 17]).astype("float32"), - 'Y': np.random.uniform(0.1, 1, [13, 17]).astype("float32") + 'X': np.random.uniform(0.1, 1, [13, 17]).astype("float64"), + 'Y': np.random.uniform(0.1, 1, [13, 17]).astype("float64") } self.outputs = 
{'Out': np.multiply(self.inputs['X'], self.inputs['Y'])} @@ -16,23 +16,21 @@ class ElementwiseMulOp(OpTest): self.check_output() def test_check_grad_normal(self): - self.check_grad(['X', 'Y'], 'Out', max_relative_error=0.1) + self.check_grad(['X', 'Y'], 'Out') def test_check_grad_ingore_x(self): - self.check_grad( - ['Y'], 'Out', max_relative_error=0.1, no_grad_set=set("X")) + self.check_grad(['Y'], 'Out', no_grad_set=set("X")) def test_check_grad_ingore_y(self): - self.check_grad( - ['X'], 'Out', max_relative_error=0.1, no_grad_set=set('Y')) + self.check_grad(['X'], 'Out', no_grad_set=set('Y')) class TestElementwiseMulOp_Vector(ElementwiseMulOp): def setUp(self): self.op_type = "elementwise_mul" self.inputs = { - 'X': np.random.random((32, )).astype("float32"), - 'Y': np.random.random((32, )).astype("float32") + 'X': np.random.random((32, )).astype("float64"), + 'Y': np.random.random((32, )).astype("float64") } self.outputs = {'Out': np.multiply(self.inputs['X'], self.inputs['Y'])} @@ -41,8 +39,8 @@ class TestElementwiseMulOp_broadcast_0(ElementwiseMulOp): def setUp(self): self.op_type = "elementwise_mul" self.inputs = { - 'X': np.random.rand(2, 3, 4).astype(np.float32), - 'Y': np.random.rand(2).astype(np.float32) + 'X': np.random.rand(2, 3, 4).astype(np.float64), + 'Y': np.random.rand(2).astype(np.float64) } self.attrs = {'axis': 0} @@ -55,8 +53,8 @@ class TestElementwiseMulOp_broadcast_1(ElementwiseMulOp): def setUp(self): self.op_type = "elementwise_mul" self.inputs = { - 'X': np.random.rand(2, 3, 4).astype(np.float32), - 'Y': np.random.rand(3).astype(np.float32) + 'X': np.random.rand(2, 3, 4).astype(np.float64), + 'Y': np.random.rand(3).astype(np.float64) } self.attrs = {'axis': 1} @@ -69,8 +67,8 @@ class TestElementwiseMulOp_broadcast_2(ElementwiseMulOp): def setUp(self): self.op_type = "elementwise_mul" self.inputs = { - 'X': np.random.rand(2, 3, 4).astype(np.float32), - 'Y': np.random.rand(4).astype(np.float32) + 'X': np.random.rand(2, 3, 
4).astype(np.float64), + 'Y': np.random.rand(4).astype(np.float64) } self.outputs = { @@ -82,8 +80,8 @@ class TestElementwiseMulOp_broadcast_3(ElementwiseMulOp): def setUp(self): self.op_type = "elementwise_mul" self.inputs = { - 'X': np.random.rand(2, 3, 4, 5).astype(np.float32), - 'Y': np.random.rand(3, 4).astype(np.float32) + 'X': np.random.rand(2, 3, 4, 5).astype(np.float64), + 'Y': np.random.rand(3, 4).astype(np.float64) } self.attrs = {'axis': 1} diff --git a/python/paddle/v2/framework/tests/test_prelu_op.py b/python/paddle/v2/framework/tests/test_prelu_op.py index 676fd9f7c555fd5c8544e760345ab954cd137dc5..7be932ac8f6b82283fecd32ac4b3b7bb9aff0338 100644 --- a/python/paddle/v2/framework/tests/test_prelu_op.py +++ b/python/paddle/v2/framework/tests/test_prelu_op.py @@ -17,7 +17,7 @@ class PReluTest(OpTest): x_np_sign = np.sign(x_np) x_np = x_np_sign * np.maximum(x_np, .005) - alpha_np = np.array([.1]) + alpha_np = np.array([.1], dtype="float32") self.inputs = {'X': x_np, 'Alpha': alpha_np} out_np = np.maximum(self.inputs['X'], 0.) 
out_np = out_np + np.minimum(self.inputs['X'], diff --git a/python/paddle/v2/framework/tests/test_rowwise_add_op.py b/python/paddle/v2/framework/tests/test_rowwise_add_op.py deleted file mode 100644 index 336645bd993ff743cbe20bb5cae5cd278db57ce7..0000000000000000000000000000000000000000 --- a/python/paddle/v2/framework/tests/test_rowwise_add_op.py +++ /dev/null @@ -1,51 +0,0 @@ -import unittest -import numpy as np -from op_test import OpTest - - -class TestRowwiseAddOp(OpTest): - def setUp(self): - self.op_type = "rowwise_add" - self.inputs = { - 'X': np.random.uniform(0.1, 1, [5, 10]).astype("float32"), - 'b': np.random.uniform(0.1, 1, [10]).astype("float32") - } - self.outputs = {'Out': np.add(self.inputs['X'], self.inputs['b'])} - - def test_check_output(self): - self.check_output() - - def test_check_grad_normal(self): - self.check_grad(['X', 'b'], 'Out') - - def test_check_grad_ingore_b(self): - self.check_grad(['X'], 'Out', no_grad_set=set('b')) - - def test_check_grad_ingore_x(self): - self.check_grad(['b'], 'Out', no_grad_set=set('X')) - - -class TestRowwiseAddOp2(OpTest): - def setUp(self): - self.op_type = "rowwise_add" - self.inputs = { - 'X': np.random.uniform(0.1, 1, [2, 3, 2, 5]).astype("float32"), - 'b': np.random.uniform(0.1, 1, [2, 5]).astype("float32") - } - self.outputs = {'Out': np.add(self.inputs['X'], self.inputs['b'])} - - def test_check_output(self): - self.check_output() - - def test_check_grad_normal(self): - self.check_grad(['X', 'b'], 'Out') - - def test_check_grad_ignore_b(self): - self.check_grad(['X'], 'Out', no_grad_set=set('b')) - - def test_check_grad_ignore_x(self): - self.check_grad(['b'], 'Out', no_grad_set=set('X')) - - -if __name__ == "__main__": - unittest.main() diff --git a/python/paddle/v2/framework/tests/test_sequence_softmax_op.py b/python/paddle/v2/framework/tests/test_sequence_softmax_op.py new file mode 100644 index 0000000000000000000000000000000000000000..b54a56aa6d3f76baa4d1fc6ba8f963332deba002 --- /dev/null +++ 
b/python/paddle/v2/framework/tests/test_sequence_softmax_op.py @@ -0,0 +1,38 @@ +import unittest +import numpy as np +from op_test import OpTest + + +def stable_softmax(x): + """Compute the softmax of vector x in a numerically stable way.""" + shiftx = x - np.max(x).clip(-64.) + exps = np.exp(shiftx) + return exps / np.sum(exps) + + +class TestSequenceSoftmaxOp(OpTest): + def setUp(self): + self.op_type = "sequence_softmax" + x = np.random.uniform(0.1, 1, (11, 1)).astype("float32") + lod = [[0, 4, 5, 8, 11]] + + out = np.zeros((11, 1)).astype("float32") + for i in range(4): + sub_x = x[lod[0][i]:lod[0][i + 1], :] + sub_x = sub_x.reshape(1, lod[0][i + 1] - lod[0][i]) + sub_out = stable_softmax(sub_x) + out[lod[0][i]:lod[0][i + 1], :] = sub_out.reshape( + lod[0][i + 1] - lod[0][i], 1) + + self.inputs = {"X": (x, lod)} + self.outputs = {"Out": out} + + def test_check_output(self): + self.check_output() + + def test_check_grad(self): + self.check_grad(["X"], "Out", max_relative_error=0.01) + + +if __name__ == "__main__": + unittest.main() diff --git a/python/paddle/v2/framework/tests/test_softmax_with_cross_entropy_op.py b/python/paddle/v2/framework/tests/test_softmax_with_cross_entropy_op.py index 428395b76c8fbcbc07b19ee1979419f0e64aca85..377d07fb5927a108e9bd39ab227da4f40a9cd447 100644 --- a/python/paddle/v2/framework/tests/test_softmax_with_cross_entropy_op.py +++ b/python/paddle/v2/framework/tests/test_softmax_with_cross_entropy_op.py @@ -43,7 +43,7 @@ class TestSoftmaxWithCrossEntropyOp2(OpTest): def setUp(self): self.op_type = "softmax_with_cross_entropy" batch_size = 2 - class_num = 17 + class_num = 37 logits = np.random.uniform(0.1, 1.0, [batch_size, class_num]).astype("float32") diff --git a/python/paddle/v2/inference.py b/python/paddle/v2/inference.py index e80456d9bbeb3c34ac9eab873a84dbf8f06e34df..9148cb56cf78e1ebb994f4a4a34d4a1b6e2e6ef4 100644 --- a/python/paddle/v2/inference.py +++ b/python/paddle/v2/inference.py @@ -96,6 +96,9 @@ class Inference(object): 
for i, item in enumerate(result): retv[i].append(item) + if retv == None: + return [] + if flatten_result: retv = [numpy.concatenate(out) for out in retv]
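
The `get_numeric_gradient` change in `op_test.py` above makes the central-difference check follow the dtype of the tensor under test, and the `elementwise_mul` tests switch to `float64` and drop their loose `max_relative_error=0.1`. The sketch below illustrates why this might help. It is not the Paddle implementation: `numeric_gradient` is a hypothetical standalone helper that works on plain NumPy arrays rather than scopes and tensors, and the loss `sum(x * y)` is chosen only because its exact gradient with respect to `x` is `y`.

```python
import numpy as np


def numeric_gradient(f, x, delta=0.005):
    """Central-difference gradient of the scalar function f at x.

    Hypothetical helper (not part of Paddle): it mirrors the loop in
    get_numeric_gradient but operates on a NumPy array directly.
    """
    if x.dtype not in (np.float32, np.float64):
        raise ValueError("Not supported data type " + str(x.dtype))

    # The gradient buffer takes the dtype of the checked array,
    # instead of being hard-coded to float32.
    grad_flat = np.zeros(x.size, dtype=x.dtype)
    work = x.copy().reshape(-1)  # flat view onto a private copy
    for i in range(work.size):
        origin = work[i]
        work[i] = origin + delta
        y_pos = f(work.reshape(x.shape))
        work[i] = origin - delta
        y_neg = f(work.reshape(x.shape))
        work[i] = origin  # restore the element before moving on
        grad_flat[i] = (y_pos - y_neg) / delta / 2
    return grad_flat.reshape(x.shape)


# Same seed and shapes as the tests in the diff.
np.random.seed(123)
x = np.random.uniform(0.1, 1, [13, 17])
y = np.random.uniform(0.1, 1, [13, 17])

errors = {}
for dtype in (np.float32, np.float64):
    xd, yd = x.astype(dtype), y.astype(dtype)
    grad = numeric_gradient(lambda a: np.sum(a * yd), xd)
    # For sum(x * y) the exact gradient with respect to x is y.
    errors[dtype.__name__] = float(np.abs(grad - yd).max())

print(errors)
```

Because the difference `y_pos - y_neg` is divided by a small `delta`, rounding error in the forward pass is amplified, so the `float32` run is expected to show a noticeably larger maximum error than the `float64` run. The exact figures depend on the platform and NumPy version.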