Commit 54cbf799, authored by yuyang18

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into feature/change_pe_strategy

...@@ -9,7 +9,7 @@ import subprocess ...@@ -9,7 +9,7 @@ import subprocess
import platform import platform
COPYRIGHT = ''' COPYRIGHT = '''
Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved. Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License"); Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License. you may not use this file except in compliance with the License.
......
# Embed Paddle Inference in Your Application
Paddle inference offers APIs in `C` and `C++`.
You can deploy a model trained by Paddle by following the steps below:
1. Optimize the native model;
2. Write some code for deployment.
Let's explain the steps in detail.
## Optimize the native Fluid Model
The native model obtained from the training phase needs to be optimized for inference:
- Clean out noise such as the cost operators, which are not needed for inference;
- Prune computation branches that have nothing to do with the output;
- Remove extraneous variables;
- Reuse memory for the native Fluid executor;
- Translate the model storage format to that of a third-party engine, so that the inference API can use the engine for acceleration.
We have an official tool to do the optimization; run `paddle_inference_optimize --help` for more information.
## Write some code
Read `paddle_inference_api.h` for more information.
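As a rough illustration of how the arguments of `Predictor::Run` line up, the sketch below checks that every variable name has one shape and one data buffer, and that each buffer holds as many floats as its shape implies. `NumElements` and `ArgumentsConsistent` are hypothetical helpers, not part of `paddle_inference_api.h`, and the flat one-float-per-element layout is an assumption that the header does not state.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not in paddle_inference_api.h): number of elements in
// a tensor of the given shape. An empty shape counts as a scalar (1 element);
// a shape with a non-positive dimension yields 0.
std::size_t NumElements(const std::vector<int>& shape) {
  std::size_t n = 1;
  for (int d : shape) {
    if (d <= 0) return 0;
    n *= static_cast<std::size_t>(d);
  }
  return n;
}

// Hypothetical check: true when names, shapes and data have the same length
// and every data buffer matches the element count of its shape.
// Assumes each buffer stores the tensor as a flat array of floats.
bool ArgumentsConsistent(const std::vector<std::string>& names,
                         const std::vector<std::vector<int>>& shapes,
                         const std::vector<std::vector<float>>& data) {
  if (names.size() != shapes.size() || names.size() != data.size()) {
    return false;
  }
  for (std::size_t i = 0; i < shapes.size(); ++i) {
    const std::size_t n = NumElements(shapes[i]);
    if (n == 0 || n != data[i].size()) return false;
  }
  return true;
}
```

A caller could run such a check on `inputs`, `input_shapes` and `input_data` before passing them to `Run`, so that a mismatch is reported before inference starts.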
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <string>
#include <vector>
namespace paddle {
class Predictor {
public:
struct Attr;
Predictor() = default;
// Build the network before inference.
bool Init(const Attr& attr);
// Predict a record.
// Arguments:
// inputs: the names of the input variables.
// outputs: the names of the output variables.
// input_shapes: the shape of the input variables.
// output_shapes: the shape of the output variables.
// input_data: the data of the input variables.
// output_data: the data of the output variables.
bool Run(const std::vector<std::string>& inputs,
const std::vector<std::string>& outputs,
const std::vector<std::vector<int>>& input_shapes,
const std::vector<std::vector<int>>& output_shapes,
const std::vector<std::vector<float>>& input_data,
std::vector<std::vector<float>>* output_data);
// Clone a predictor that shares the model weights.
Predictor* Clone();
// Destroy the Predictor.
~Predictor();
struct Attr {
enum class EngineKind;
std::string model_dir; // path to the model directory.
bool enable_engine{false}; // Enable to execute (part of) the model on
// third-party engines.
EngineKind engine_kind{Attr::EngineKind::kNone};
enum class EngineKind {
kNone = -1, // Use the native Fluid facility.
kAnakin, // Use Anakin for inference.
kTensorRT, // Use TensorRT for inference.
kAutoMixedAnakin, // Automatically mix Fluid with Anakin.
kAutoMixedTensorRT, // Automatically mix Fluid with TensorRT.
};
};
};
} // namespace paddle
...@@ -77,8 +77,7 @@ print "The sematic-vector of testA: ", paddle.infer(fA, parameters, testA) ...@@ -77,8 +77,7 @@ print "The sematic-vector of testA: ", paddle.infer(fA, parameters, testA)
### Example 2. Sharing Parameters between "Models" ### Example 2. Sharing Parameters between "Models"
We use [GAN](https://github.com/PaddlePaddle/book/tree/develop/gan) in We use GAN in this example. In the following example program, `d0` and `d1`
this example. In the following example program, `d0` and `d1`
correspond to the two networks in the following figure: correspond to the two networks in the following figure:
<img src="https://github.com/wangyang59/book/raw/00036f4b0da5225041a6824587c1a01cf20159b1/gan/image/gan_ig.png" width=400 /> <img src="https://github.com/wangyang59/book/raw/00036f4b0da5225041a6824587c1a01cf20159b1/gan/image/gan_ig.png" width=400 />
......
...@@ -75,7 +75,7 @@ Different layout leads to different implementation of the operator kernel. There ...@@ -75,7 +75,7 @@ Different layout leads to different implementation of the operator kernel. There
- The inference of Layout is at run-time, not at compile-time. - The inference of Layout is at run-time, not at compile-time.
- Every operator has to implement different kernels for different layouts. Let's take MKLDNN as an example. If we want to implement an MKLDNN convolution operator, we have to implement all the kernels for different layouts, which are listed [here](http://01org.github.io/mkl-dnn/structmkldnn_1_1memory.html). And we will have a special macro to register kernels for MKLDNN operators. - Every operator has to implement different kernels for different layouts. Let's take MKLDNN as an example. If we want to implement an MKLDNN convolution operator, we have to implement all the kernels for different layouts, which are listed [here](http://intel.github.io/mkl-dnn/structmkldnn_1_1memory.html). And we will have a special macro to register kernels for MKLDNN operators.
`Layout` is also defined as a enum variable: `Layout` is also defined as a enum variable:
......
# Distributed Training with NCCL2 and RDMA
When doing distributed multi-GPU training, network bandwidth often becomes the
bottleneck. This document introduces a way to use NCCL2 for such training jobs to
achieve the best performance.
## Prepare Hardware with RDMA and Multiple GPUs
I'm using two Linux servers, each with 8 GPUs and
one 100Gb RDMA card installed.
The base environment is:
* OS: CentOS 7.4
* RDMA device: "Mellanox Technologies MT27700 Family [ConnectX-4]"
* Kernel version: `4.4.88-1.el7.elrepo.x86_64`
* Docker version: `1.12.6`
* Docker storage driver: `overlay2`
* IP addresses: 192.168.16.30,192.168.16.34
In general, the steps include:
1. Install GPU drivers
1. Install RDMA drivers
1. Install "InfiniBand Support"
1. Use docker to run tests and make sure GPUs and RDMA can work inside
the container.
I'll omit the section "Install GPU drivers" because it is easy to find
elsewhere.
### Install RDMA drivers
In my case, I've got two machines with the device
"Mellanox Technologies MT27700 Family [ConnectX-4]" installed. The OS was
"CentOS 7.4" and I updated the kernel to version 4.4 so that docker can
work with the latest overlay2 filesystem.
***NOTE: before you start, make sure you have a way to get a console
of the server other than ssh because we may need to re-configure the
network device.***
1. Go to http://www.mellanox.com/page/products_dyn?product_family=26,
download the `MLNX_OFED` software at the bottom of the page, and upload it
onto the server.
1. Run `./mlnxofedinstall --add-kernel-support` in the software package.
1. Run `/etc/init.d/openibd restart` to make everything work. Note that
this operation may take the network down if you are using this
RDMA device as the default network device and log in to the server over ssh.
1. Re-configure the network interface, for example:
`ifconfig eth2 192.168.16.30/20 up`, then add routes if needed:
`ip route add default via 192.168.16.1 dev eth2`.
1. Do the same thing on the other node.
1. Use `ping` to test whether the two nodes have ICMP connectivity.
1. Use either `udaddy` or `ib_write_bw` to test that the network connection is
ready and has the desired bandwidth.
### Prepare Docker Image to Run RDMA Programs
1. Build a docker image using a CUDA base image such as `nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04` and install the paddlepaddle whl
package in it.
1. Start a docker container and mount GPU driver libs into it (you can
skip this step if you are using nvidia-docker).
1. Mount RDMA drivers and libs into the docker image (see the section below),
also `udaddy` and `ib_write_bw` if needed.
1. Mount GPU devices and RDMA devices into the container using `--device`
or just use privileged mode `--privileged`.
1. Start the container using host network mode: `--net=host`
### RDMA Library Files Needed
Usually, `MLNX_OFED` installs the latest supported libs under
`/usr/lib64/mlnx_ofed/valgrind`. Other libs needed to run RDMA programs
are listed below. These libs must be mounted into the docker container.
* Libs under `/usr/lib64/mlnx_ofed/valgrind`
  * libibcm.so
  * libibverbs.so
  * libmlx4.so
  * libmlx5.so
  * libmlx5-rdmav2.so
  * librdmacm.so
* Other libs:
  * libnl-3.so.200
  * libnl-route-3.so.200
  * libnuma.so.1
## Start to Run the Training Job
Set NCCL environment variables to turn NCCL switches on and off:
| Env Name | Description |
| --- | --- |
| NCCL_SOCKET_IFNAME | The RDMA device, e.g. eth2 |
| NCCL_P2P_DISABLE | Set to 1 to disable P2P transfer between GPUs |
| NCCL_IB_DISABLE | Set to 1 to disable using RDMA |
| NCCL_IB_CUDA_SUPPORT | Set to 1 to enable GPU Direct if supported |
| NCCL_DEBUG | Set debug level: VERSION, WARN, INFO |
My two servers are `192.168.16.30` and `192.168.16.34`. On node 1, run:
```bash
PADDLE_TRAINER_ID=0 PADDLE_PORT=48372 PADDLE_WORKERS=192.168.16.30,192.168.16.34 POD_IP=192.168.16.30 stdbuf -oL python vgg16.py
```
On node 2, run:
```bash
PADDLE_TRAINER_ID=1 PADDLE_PORT=48372 PADDLE_WORKERS=192.168.16.30,192.168.16.34 POD_IP=192.168.16.34 stdbuf -oL python vgg16.py
```
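The two commands above differ only in `PADDLE_TRAINER_ID` and `POD_IP`. As a minimal sketch, assuming `POD_IP` is always the entry of `PADDLE_WORKERS` at index `PADDLE_TRAINER_ID` (this holds for the two commands shown, but is an assumption in general), the per-node command line could be generated as follows. `LaunchCommand` is a hypothetical helper and not part of Paddle.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Paddle): builds the launch command for one
// node. Assumes POD_IP equals workers[trainer_id]; throws std::out_of_range
// if trainer_id is not a valid index into workers.
std::string LaunchCommand(const std::vector<std::string>& workers,
                          std::size_t trainer_id, int port,
                          const std::string& script) {
  const std::string& pod_ip = workers.at(trainer_id);

  // Join the worker addresses with commas, as PADDLE_WORKERS expects above.
  std::string joined;
  for (std::size_t i = 0; i < workers.size(); ++i) {
    if (i > 0) joined += ",";
    joined += workers[i];
  }

  return "PADDLE_TRAINER_ID=" + std::to_string(trainer_id) +
         " PADDLE_PORT=" + std::to_string(port) +
         " PADDLE_WORKERS=" + joined + " POD_IP=" + pod_ip +
         " stdbuf -oL python " + script;
}
```

Calling `LaunchCommand` with the two addresses above, trainer id `1`, port `48372` and `vgg16.py` reproduces the command shown for node 2. Generating the commands from one worker list keeps `PADDLE_WORKERS` identical on every node.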
...@@ -57,7 +57,7 @@ cc_library(data_transform SRCS data_transform.cc DEPS math_function tensor ...@@ -57,7 +57,7 @@ cc_library(data_transform SRCS data_transform.cc DEPS math_function tensor
cc_library(attribute SRCS attribute.cc DEPS framework_proto boost) cc_library(attribute SRCS attribute.cc DEPS framework_proto boost)
cc_test(program_desc_test SRCS program_desc_test.cc DEPS proto_desc cc_test(program_desc_test SRCS program_desc_test.cc DEPS proto_desc
device_context) device_context)
cc_library(op_proto_maker SRCS op_proto_maker.cc DEPS framework_proto attribute) cc_library(op_proto_maker SRCS op_proto_maker.cc DEPS framework_proto attribute glog)
cc_test(op_proto_maker_test SRCS op_proto_maker_test.cc DEPS op_proto_maker) cc_test(op_proto_maker_test SRCS op_proto_maker_test.cc DEPS op_proto_maker)
cc_library(op_info SRCS op_info.cc DEPS attribute framework_proto) cc_library(op_info SRCS op_info.cc DEPS attribute framework_proto)
cc_library(shape_inference SRCS shape_inference.cc DEPS ddim attribute device_context) cc_library(shape_inference SRCS shape_inference.cc DEPS ddim attribute device_context)
......
...@@ -134,6 +134,11 @@ OpDesc *BlockDesc::PrependOp() { ...@@ -134,6 +134,11 @@ OpDesc *BlockDesc::PrependOp() {
return ops_.front().get(); return ops_.front().get();
} }
void BlockDesc::PrependAllocatedOp(std::unique_ptr<OpDesc> &&op_desc) {
need_update_ = true;
ops_.emplace_front(std::move(op_desc));
}
OpDesc *BlockDesc::InsertOp(size_t index) { OpDesc *BlockDesc::InsertOp(size_t index) {
need_update_ = true; need_update_ = true;
auto it = ops_.begin() + index; auto it = ops_.begin() + index;
......
...@@ -88,6 +88,8 @@ class BlockDesc { ...@@ -88,6 +88,8 @@ class BlockDesc {
OpDesc *PrependOp(); OpDesc *PrependOp();
void PrependAllocatedOp(std::unique_ptr<OpDesc> &&op_desc);
OpDesc *InsertOp(size_t index); OpDesc *InsertOp(size_t index);
/* /*
......
...@@ -32,8 +32,7 @@ struct AddFunctor { ...@@ -32,8 +32,7 @@ struct AddFunctor {
class OpKernelTestProtoAndCheckerMaker : public OpProtoAndCheckerMaker { class OpKernelTestProtoAndCheckerMaker : public OpProtoAndCheckerMaker {
public: public:
OpKernelTestProtoAndCheckerMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("input", "input1 of test op"); AddInput("input", "input1 of test op");
AddOutput("output", "output of test op"); AddOutput("output", "output of test op");
AddAttr<bool>("use_gpu", "force to use gpu kernel").SetDefault(false); AddAttr<bool>("use_gpu", "force to use gpu kernel").SetDefault(false);
......
...@@ -38,9 +38,7 @@ void BroadcastOpHandle::RunImpl() { ...@@ -38,9 +38,7 @@ void BroadcastOpHandle::RunImpl() {
out_var_handles.size(), places_.size(), out_var_handles.size(), places_.size(),
"The number of output should equal to the number of places."); "The number of output should equal to the number of places.");
// Wait input done, this Wait is asynchronous operation platform::Place WaitInputVarGenerated();
// &in_place;
WaitInputVarGenerated(*in_var_handle);
std::vector<const Scope *> var_scopes; std::vector<const Scope *> var_scopes;
for (auto *s : local_scopes_) { for (auto *s : local_scopes_) {
...@@ -50,29 +48,9 @@ void BroadcastOpHandle::RunImpl() { ...@@ -50,29 +48,9 @@ void BroadcastOpHandle::RunImpl() {
auto *in_var = auto *in_var =
var_scopes.at(in_var_handle->scope_idx_)->FindVar(in_var_handle->name_); var_scopes.at(in_var_handle->scope_idx_)->FindVar(in_var_handle->name_);
PADDLE_ENFORCE_NOT_NULL(in_var); PADDLE_ENFORCE_NOT_NULL(in_var);
Tensor &in_tensor = VariableVisitor::GetMutableTensor(in_var); Tensor &in_tensor = VariableVisitor::GetMutableTensor(in_var);
// NOTE: The tensors' Place of input and output must be all on GPU or all on InitOutputValue(*in_var_handle, out_var_handles);
// CPU.
for (auto *out_var_handle : out_var_handles) {
if (out_var_handle->IsTheSameVar(*in_var_handle)) {
continue;
}
auto t_out_p = out_var_handle->place_;
auto *out_var = var_scopes.at(out_var_handle->scope_idx_)
->FindVar(out_var_handle->name_);
PADDLE_ENFORCE_NOT_NULL(out_var);
if (platform::is_gpu_place(in_tensor.place())) {
PADDLE_ENFORCE(platform::is_gpu_place(t_out_p),
"Places of input and output must be all on GPU.");
} else {
t_out_p = platform::CPUPlace();
}
VariableVisitor::ShareDimsAndLoD(*in_var, out_var);
VariableVisitor::GetMutableTensor(out_var).mutable_data(t_out_p,
in_tensor.type());
}
if (platform::is_cpu_place(in_tensor.place())) { if (platform::is_cpu_place(in_tensor.place())) {
for (auto *out_var_handle : out_var_handles) { for (auto *out_var_handle : out_var_handles) {
...@@ -147,11 +125,37 @@ void BroadcastOpHandle::RunImpl() { ...@@ -147,11 +125,37 @@ void BroadcastOpHandle::RunImpl() {
} }
} }
void BroadcastOpHandle::WaitInputVarGenerated(const VarHandle &in_var) { void BroadcastOpHandle::InitOutputValue(
if (in_var.generated_op_) { const VarHandle &in_var_handle,
for (auto &pair : dev_ctxes_) { const std::vector<VarHandle *> &out_var_handles) const {
in_var.generated_op_->Wait(pair.second); std::vector<const Scope *> var_scopes;
for (auto *s : local_scopes_) {
var_scopes.emplace_back(s->FindVar(kLocalExecScopeName)->Get<Scope *>());
}
auto *in_var =
var_scopes.at(in_var_handle.scope_idx_)->FindVar(in_var_handle.name_);
Tensor &in_tensor = VariableVisitor::GetMutableTensor(in_var);
// NOTE: The tensors' Place of input and output must be all on GPU or all on
// CPU.
for (auto *out_var_handle : out_var_handles) {
if (out_var_handle->IsTheSameVar(in_var_handle)) {
continue;
} }
auto t_out_p = out_var_handle->place_;
auto *out_var = var_scopes.at(out_var_handle->scope_idx_)
->FindVar(out_var_handle->name_);
PADDLE_ENFORCE_NOT_NULL(out_var);
if (is_gpu_place(in_tensor.place())) {
PADDLE_ENFORCE(platform::is_gpu_place(t_out_p),
"Places of input and output must be all on GPU.");
} else {
t_out_p = platform::CPUPlace();
}
VariableVisitor::ShareDimsAndLoD(*in_var, out_var);
VariableVisitor::GetMutableTensor(out_var).mutable_data(t_out_p,
in_tensor.type());
} }
} }
......
...@@ -57,7 +57,6 @@ struct BroadcastOpHandle : public OpHandleBase { ...@@ -57,7 +57,6 @@ struct BroadcastOpHandle : public OpHandleBase {
protected: protected:
void RunImpl() override; void RunImpl() override;
void WaitInputVarGenerated(const VarHandle &in_var);
private: private:
const std::vector<Scope *> &local_scopes_; const std::vector<Scope *> &local_scopes_;
...@@ -65,6 +64,9 @@ struct BroadcastOpHandle : public OpHandleBase { ...@@ -65,6 +64,9 @@ struct BroadcastOpHandle : public OpHandleBase {
#ifdef PADDLE_WITH_CUDA #ifdef PADDLE_WITH_CUDA
const platform::NCCLContextMap *nccl_ctxs_; const platform::NCCLContextMap *nccl_ctxs_;
#endif #endif
void InitOutputValue(const VarHandle &in_var_handle,
const std::vector<VarHandle *> &out_var_handles) const;
}; };
} // namespace details } // namespace details
} // namespace framework } // namespace framework
......
...@@ -26,20 +26,20 @@ ComputationOpHandle::ComputationOpHandle(const OpDesc &op_desc, Scope *scope, ...@@ -26,20 +26,20 @@ ComputationOpHandle::ComputationOpHandle(const OpDesc &op_desc, Scope *scope,
place_(place) {} place_(place) {}
void ComputationOpHandle::RunImpl() { void ComputationOpHandle::RunImpl() {
auto *cur_ctx = dev_ctxes_[place_]; WaitInputVarGenerated(place_);
for (auto *in : inputs_) {
bool need_wait = in->generated_op_ &&
in->generated_op_->DeviceContext(place_) != cur_ctx;
if (need_wait) {
in->generated_op_->Wait(cur_ctx);
}
}
this->RunAndRecordEvent([this] { this->RunAndRecordEvent([this] {
op_->Run(*scope_->FindVar(kLocalExecScopeName)->Get<Scope *>(), place_); op_->Run(*scope_->FindVar(kLocalExecScopeName)->Get<Scope *>(), place_);
}); });
} }
bool ComputationOpHandle::NeedWait(VarHandleBase *in_var) {
bool need_wait =
in_var && in_var->generated_op_ &&
in_var->generated_op_->DeviceContext(place_) != dev_ctxes_[place_];
return need_wait;
}
std::string ComputationOpHandle::Name() const { return op_->Type(); } std::string ComputationOpHandle::Name() const { return op_->Type(); }
} // namespace details } // namespace details
} // namespace framework } // namespace framework
......
...@@ -36,6 +36,8 @@ struct ComputationOpHandle : public OpHandleBase { ...@@ -36,6 +36,8 @@ struct ComputationOpHandle : public OpHandleBase {
protected: protected:
void RunImpl() override; void RunImpl() override;
bool NeedWait(VarHandleBase *in_var) override;
private: private:
std::unique_ptr<OperatorBase> op_; std::unique_ptr<OperatorBase> op_;
Scope *scope_; Scope *scope_;
......
...@@ -31,7 +31,7 @@ FetchOpHandle::~FetchOpHandle() { ...@@ -31,7 +31,7 @@ FetchOpHandle::~FetchOpHandle() {
} }
} }
void FetchOpHandle::Wait(platform::DeviceContext *waited_dev) { void FetchOpHandle::RecordWaitEventOnCtx(platform::DeviceContext *waited_ctx) {
PADDLE_THROW("Nobody should wait FetchOp. Unexpceted Error"); PADDLE_THROW("Nobody should wait FetchOp. Unexpceted Error");
} }
...@@ -45,14 +45,8 @@ void FetchOpHandle::WaitAndMergeCPUTensors() const { ...@@ -45,14 +45,8 @@ void FetchOpHandle::WaitAndMergeCPUTensors() const {
} }
void FetchOpHandle::RunImpl() { void FetchOpHandle::RunImpl() {
auto cpu_ctx = WaitInputVarGenerated(platform::CPUPlace());
platform::DeviceContextPool::Instance().Get(platform::CPUPlace());
for (auto *input : inputs_) {
auto *var = static_cast<VarHandle *>(input);
if (var->generated_op_) {
var->generated_op_->Wait(cpu_ctx);
}
}
tensors_.resize(inputs_.size()); tensors_.resize(inputs_.size());
auto *var_handle = static_cast<VarHandle *>(inputs_[0]); auto *var_handle = static_cast<VarHandle *>(inputs_[0]);
auto &var_name = var_handle->name_; auto &var_name = var_handle->name_;
...@@ -79,6 +73,15 @@ void FetchOpHandle::RunImpl() { ...@@ -79,6 +73,15 @@ void FetchOpHandle::RunImpl() {
this->WaitAndMergeCPUTensors(); this->WaitAndMergeCPUTensors();
} }
void FetchOpHandle::WaitInputVarGenerated(const platform::Place &place) {
auto cpu_ctx = platform::DeviceContextPool::Instance().Get(place);
for (auto *input : inputs_) {
if (input->generated_op_) {
input->generated_op_->RecordWaitEventOnCtx(cpu_ctx);
}
}
}
std::string FetchOpHandle::Name() const { return "Fetch"; } std::string FetchOpHandle::Name() const { return "Fetch"; }
} // namespace details } // namespace details
......
...@@ -33,7 +33,7 @@ struct FetchOpHandle : public OpHandleBase { ...@@ -33,7 +33,7 @@ struct FetchOpHandle : public OpHandleBase {
~FetchOpHandle(); ~FetchOpHandle();
void Wait(platform::DeviceContext *waited_dev) override; void RecordWaitEventOnCtx(platform::DeviceContext *waited_ctx) override;
void WaitAndMergeCPUTensors() const; void WaitAndMergeCPUTensors() const;
...@@ -42,6 +42,8 @@ struct FetchOpHandle : public OpHandleBase { ...@@ -42,6 +42,8 @@ struct FetchOpHandle : public OpHandleBase {
protected: protected:
void RunImpl() override; void RunImpl() override;
void WaitInputVarGenerated(const platform::Place &place) override;
private: private:
FeedFetchList *data_; FeedFetchList *data_;
size_t offset_; size_t offset_;
......
...@@ -55,7 +55,7 @@ void GatherOpHandle::RunImpl() { ...@@ -55,7 +55,7 @@ void GatherOpHandle::RunImpl() {
"Currently, gather_op only can gather SelectedRows."); "Currently, gather_op only can gather SelectedRows.");
// Wait input done, this Wait is asynchronous operation // Wait input done, this Wait is asynchronous operation
WaitInputVarGenerated(in_var_handles); WaitInputVarGenerated();
auto &pre_in_value = pre_in_var->Get<framework::SelectedRows>(); auto &pre_in_value = pre_in_var->Get<framework::SelectedRows>();
std::vector<int64_t> out_rows; std::vector<int64_t> out_rows;
...@@ -111,17 +111,6 @@ void GatherOpHandle::RunImpl() { ...@@ -111,17 +111,6 @@ void GatherOpHandle::RunImpl() {
}); });
} }
void GatherOpHandle::WaitInputVarGenerated(
const std::vector<VarHandle *> &in_var_handles) {
for (auto *in : in_var_handles) {
if (in->generated_op_) {
for (auto pair : dev_ctxes_) {
in->generated_op_->Wait(pair.second);
}
}
}
}
std::string GatherOpHandle::Name() const { return "gather"; } std::string GatherOpHandle::Name() const { return "gather"; }
} // namespace details } // namespace details
} // namespace framework } // namespace framework
......
...@@ -39,7 +39,6 @@ struct GatherOpHandle : public OpHandleBase { ...@@ -39,7 +39,6 @@ struct GatherOpHandle : public OpHandleBase {
protected: protected:
void RunImpl() override; void RunImpl() override;
void WaitInputVarGenerated(const std::vector<VarHandle *> &in_var_handles);
private: private:
const std::vector<Scope *> &local_scopes_; const std::vector<Scope *> &local_scopes_;
......
...@@ -34,12 +34,7 @@ void NCCLAllReduceOpHandle::RunImpl() { ...@@ -34,12 +34,7 @@ void NCCLAllReduceOpHandle::RunImpl() {
return; // No need to all reduce when GPU count = 1; return; // No need to all reduce when GPU count = 1;
} else { } else {
// Wait input done // Wait input done
for (auto *in : inputs_) { WaitInputVarGenerated();
auto &p = static_cast<VarHandle *>(in)->place_;
if (in->generated_op_) {
in->generated_op_->Wait(dev_ctxes_[p]);
}
}
auto &var_name = static_cast<VarHandle *>(this->inputs_[0])->name_; auto &var_name = static_cast<VarHandle *>(this->inputs_[0])->name_;
int dtype = -1; int dtype = -1;
......
...@@ -56,15 +56,15 @@ void OpHandleBase::Run(bool use_event) { ...@@ -56,15 +56,15 @@ void OpHandleBase::Run(bool use_event) {
RunImpl(); RunImpl();
} }
void OpHandleBase::Wait(platform::DeviceContext *waited_dev) { void OpHandleBase::RecordWaitEventOnCtx(platform::DeviceContext *waited_ctx) {
#ifdef PADDLE_WITH_CUDA #ifdef PADDLE_WITH_CUDA
if (platform::is_cpu_place(waited_dev->GetPlace()) || events_.empty()) { if (platform::is_cpu_place(waited_ctx->GetPlace()) || events_.empty()) {
for (auto &dev_ctx : dev_ctxes_) { for (auto &dev_ctx : dev_ctxes_) {
dev_ctx.second->Wait(); dev_ctx.second->Wait();
} }
} else { } else {
auto stream = auto stream =
static_cast<platform::CUDADeviceContext *>(waited_dev)->stream(); static_cast<platform::CUDADeviceContext *>(waited_ctx)->stream();
for (auto &ev : events_) { for (auto &ev : events_) {
PADDLE_ENFORCE(cudaStreamWaitEvent(stream, ev.second, 0)); PADDLE_ENFORCE(cudaStreamWaitEvent(stream, ev.second, 0));
} }
...@@ -86,6 +86,28 @@ void OpHandleBase::AddOutput(VarHandleBase *out) { ...@@ -86,6 +86,28 @@ void OpHandleBase::AddOutput(VarHandleBase *out) {
out->generated_op_ = this; out->generated_op_ = this;
} }
void OpHandleBase::WaitInputVarGenerated() {
for (auto in_var : inputs_) {
if (NeedWait(in_var)) {
for (auto &pair : dev_ctxes_) {
in_var->generated_op_->RecordWaitEventOnCtx(pair.second);
}
}
}
}
void OpHandleBase::WaitInputVarGenerated(const platform::Place &place) {
for (auto *in : inputs_) {
if (NeedWait(in)) {
in->generated_op_->RecordWaitEventOnCtx(dev_ctxes_[place]);
}
}
}
bool OpHandleBase::NeedWait(VarHandleBase *in_var) {
return in_var && in_var->generated_op_;
}
void OpHandleBase::RunAndRecordEvent(const std::function<void()> &callback) { void OpHandleBase::RunAndRecordEvent(const std::function<void()> &callback) {
#ifdef PADDLE_WITH_CUDA #ifdef PADDLE_WITH_CUDA
if (!events_.empty()) { // Use event if (!events_.empty()) { // Use event
......
...@@ -38,12 +38,24 @@ class OpHandleBase { ...@@ -38,12 +38,24 @@ class OpHandleBase {
void Run(bool use_event); void Run(bool use_event);
virtual void Wait(platform::DeviceContext *waited_dev); virtual void RecordWaitEventOnCtx(platform::DeviceContext *waited_ctx);
void AddInput(VarHandleBase *in); void AddInput(VarHandleBase *in);
void AddOutput(VarHandleBase *out); void AddOutput(VarHandleBase *out);
// This method adds the wait events of all the inputs on all the device
// contexts.
// NOTE: This wait is an asynchronous operation.
virtual void WaitInputVarGenerated();
// This method adds the wait events of all the inputs on the specified device
// context.
// NOTE: This wait is an asynchronous operation.
virtual void WaitInputVarGenerated(const platform::Place &place);
virtual bool NeedWait(VarHandleBase *in_var);
// If the Op involves data transfer of multiple devices that // If the Op involves data transfer of multiple devices that
// will likely block other computations. // will likely block other computations.
virtual bool IsMultiDeviceTransfer() { return false; } virtual bool IsMultiDeviceTransfer() { return false; }
......
...@@ -95,7 +95,10 @@ struct OpInfoFiller<T, kOpProtoAndCheckerMaker> { ...@@ -95,7 +95,10 @@ struct OpInfoFiller<T, kOpProtoAndCheckerMaker> {
void operator()(const char* op_type, OpInfo* info) const { void operator()(const char* op_type, OpInfo* info) const {
info->proto_ = new proto::OpProto; info->proto_ = new proto::OpProto;
info->checker_ = new OpAttrChecker(); info->checker_ = new OpAttrChecker();
auto maker = T(info->proto_, info->checker_); T maker;
maker.SetProto(info->proto_);
maker.SetChecker(info->checker_);
maker.Make();
maker.Validate(); maker.Validate();
info->proto_->set_type(op_type); info->proto_->set_type(op_type);
PADDLE_ENFORCE( PADDLE_ENFORCE(
......
...@@ -51,7 +51,7 @@ void ReduceOpHandle::RunImpl() { ...@@ -51,7 +51,7 @@ void ReduceOpHandle::RunImpl() {
PADDLE_ENFORCE_NOT_NULL(pre_in_var); PADDLE_ENFORCE_NOT_NULL(pre_in_var);
// Wait input done, this Wait is asynchronous operation // Wait input done, this Wait is asynchronous operation
WaitInputVarGenerated(in_var_handles); WaitInputVarGenerated();
// NOTE: The Places of all input tensor must be all on CPU or all on GPU. // NOTE: The Places of all input tensor must be all on CPU or all on GPU.
std::vector<platform::Place> in_places; // used to get dev_ctx std::vector<platform::Place> in_places; // used to get dev_ctx
...@@ -80,19 +80,21 @@ void ReduceOpHandle::RunImpl() { ...@@ -80,19 +80,21 @@ void ReduceOpHandle::RunImpl() {
} }
if (pre_in_var->IsType<framework::SelectedRows>()) { if (pre_in_var->IsType<framework::SelectedRows>()) {
std::vector<const SelectedRows *> in_selected_rows = this->RunAndRecordEvent([&] {
GetInputValues<SelectedRows>(in_var_handles, var_scopes); std::vector<const SelectedRows *> in_selected_rows =
GetInputValues<SelectedRows>(in_var_handles, var_scopes);
GatherSelectedRows(in_selected_rows, in_places, dev_ctxes_, t_out_p, GatherSelectedRows(in_selected_rows, in_places, dev_ctxes_, t_out_p,
out_var->GetMutable<framework::SelectedRows>()); out_var->GetMutable<framework::SelectedRows>());
});
} else { } else {
std::vector<const LoDTensor *> lod_tensors = std::vector<const LoDTensor *> lod_tensors =
GetInputValues<LoDTensor>(in_var_handles, var_scopes); GetInputValues<LoDTensor>(in_var_handles, var_scopes);
if (paddle::platform::is_cpu_place(lod_tensors[0]->place())) { if (paddle::platform::is_cpu_place(lod_tensors[0]->place())) {
ReduceLoDTensor func(lod_tensors, this->RunAndRecordEvent([&] {
out_var->GetMutable<framework::LoDTensor>()); ReduceLoDTensor func(lod_tensors,
VisitDataType(ToDataType(lod_tensors[0]->type()), func); out_var->GetMutable<framework::LoDTensor>());
VisitDataType(ToDataType(lod_tensors[0]->type()), func);
});
} else if (paddle::platform::is_gpu_place(lod_tensors[0]->place())) { } else if (paddle::platform::is_gpu_place(lod_tensors[0]->place())) {
#ifdef PADDLE_WITH_CUDA #ifdef PADDLE_WITH_CUDA
auto pre_in = pre_in_var->Get<framework::LoDTensor>(); auto pre_in = pre_in_var->Get<framework::LoDTensor>();
...@@ -157,17 +159,6 @@ std::vector<const T *> ReduceOpHandle::GetInputValues( ...@@ -157,17 +159,6 @@ std::vector<const T *> ReduceOpHandle::GetInputValues(
return in_selected_rows; return in_selected_rows;
} }
void ReduceOpHandle::WaitInputVarGenerated(
const std::vector<VarHandle *> &in_var_handles) {
for (auto *in : in_var_handles) {
if (in->generated_op_) {
for (auto pair : dev_ctxes_) {
in->generated_op_->Wait(pair.second);
}
}
}
}
std::string ReduceOpHandle::Name() const { return "reduce"; } std::string ReduceOpHandle::Name() const { return "reduce"; }
} // namespace details } // namespace details
} // namespace framework } // namespace framework
......
...@@ -60,8 +60,6 @@ struct ReduceOpHandle : public OpHandleBase { ...@@ -60,8 +60,6 @@ struct ReduceOpHandle : public OpHandleBase {
protected: protected:
void RunImpl() override; void RunImpl() override;
void WaitInputVarGenerated(const std::vector<VarHandle *> &in_var_handles);
template <typename T> template <typename T>
std::vector<const T *> GetInputValues( std::vector<const T *> GetInputValues(
const std::vector<VarHandle *> &in_var_handles, const std::vector<VarHandle *> &in_var_handles,
......
...@@ -29,6 +29,7 @@ ScaleLossGradOpHandle::ScaleLossGradOpHandle(size_t num_dev, Scope *scope, ...@@ -29,6 +29,7 @@ ScaleLossGradOpHandle::ScaleLossGradOpHandle(size_t num_dev, Scope *scope,
ScaleLossGradOpHandle::~ScaleLossGradOpHandle() {} ScaleLossGradOpHandle::~ScaleLossGradOpHandle() {}
void ScaleLossGradOpHandle::RunImpl() { void ScaleLossGradOpHandle::RunImpl() {
// Doesn't wait any event
std::string var_name = static_cast<VarHandle *>(this->outputs_[0])->name_; std::string var_name = static_cast<VarHandle *>(this->outputs_[0])->name_;
auto &local_scope = *scope_->FindVar(kLocalExecScopeName)->Get<Scope *>(); auto &local_scope = *scope_->FindVar(kLocalExecScopeName)->Get<Scope *>();
......
...@@ -26,6 +26,7 @@ SendOpHandle::SendOpHandle(const framework::OpDesc &op_desc, ...@@ -26,6 +26,7 @@ SendOpHandle::SendOpHandle(const framework::OpDesc &op_desc,
place_(place) {} place_(place) {}
void SendOpHandle::RunImpl() { void SendOpHandle::RunImpl() {
// TODO(wuyi): need further analysis whether wait VarDummyHandle.
// Wait input done // Wait input done
for (auto *in : inputs_) { for (auto *in : inputs_) {
auto &p = static_cast<VarHandle *>(in)->place_; auto &p = static_cast<VarHandle *>(in)->place_;
...@@ -33,7 +34,7 @@ void SendOpHandle::RunImpl() { ...@@ -33,7 +34,7 @@ void SendOpHandle::RunImpl() {
continue; continue;
} }
if (in->generated_op_) { if (in->generated_op_) {
in->generated_op_->Wait(dev_ctxes_[p]); in->generated_op_->RecordWaitEventOnCtx(dev_ctxes_[p]);
} }
} }
auto &tmp_scope = local_scope_->FindVar(kLocalExecScopeName)->Get<Scope *>(); auto &tmp_scope = local_scope_->FindVar(kLocalExecScopeName)->Get<Scope *>();
......
...@@ -14,8 +14,6 @@ ...@@ -14,8 +14,6 @@
#include "paddle/fluid/framework/details/threaded_ssa_graph_executor.h" #include "paddle/fluid/framework/details/threaded_ssa_graph_executor.h"
#include "paddle/fluid/framework/details/fetch_op_handle.h"
namespace paddle { namespace paddle {
namespace framework { namespace framework {
namespace details { namespace details {
...@@ -45,73 +43,33 @@ FeedFetchList ThreadedSSAGraphExecutor::Run( ...@@ -45,73 +43,33 @@ FeedFetchList ThreadedSSAGraphExecutor::Run(
// Should revisit it if overlapping is available. // Should revisit it if overlapping is available.
std::unordered_set<OpHandleBase *> delayed_ops; std::unordered_set<OpHandleBase *> delayed_ops;
auto InsertPendingVar = [&pending_vars, &ready_vars](VarHandleBase &var) {
pending_vars.insert(&var);
if (var.generated_op_ == nullptr) {
ready_vars.Push(&var);
}
};
auto InsertPendingOp = [&pending_ops](OpHandleBase &op_instance) {
pending_ops.insert({&op_instance, op_instance.Inputs().size()});
};
// Transform SSAGraph to pending_ops & pending_vars // Transform SSAGraph to pending_ops & pending_vars
for (auto &var_map : graph_->vars_) { for (auto &var_map : graph_->vars_) {
for (auto &name_pair : var_map) { for (auto &name_pair : var_map) {
for (auto &version_pair : name_pair.second) { for (auto &version_pair : name_pair.second) {
InsertPendingVar(*version_pair); InsertPendingVar(&pending_vars, &ready_vars, version_pair.get());
} }
} }
} }
for (auto &var : graph_->dep_vars_) { for (auto &var : graph_->dep_vars_) {
InsertPendingVar(*var); InsertPendingVar(&pending_vars, &ready_vars, var.get());
} }
for (auto &op : graph_->ops_) { for (auto &op : graph_->ops_) {
if (op->Inputs().empty()) { // Special case, Op has no input. if (op->Inputs().empty()) { // Special case, Op has no input.
ready_ops.insert(op.get()); ready_ops.insert(op.get());
} else { } else {
InsertPendingOp(*op); InsertPendingOp(&pending_ops, op.get());
} }
} }
// Step 2. Insert FetchOps // Step 2. Insert FetchOps
std::vector<std::unique_ptr<FetchOpHandle>> fetch_ops; std::vector<std::unique_ptr<FetchOpHandle>> fetch_ops;
FeedFetchList fetch_data(fetch_tensors.size());
std::unordered_map<std::string, std::vector<VarHandleBase *>> fetched_vars;
for (auto &fetch_var_name : fetch_tensors) {
for (auto &var_map : graph_->vars_) {
auto it = var_map.find(fetch_var_name);
if (it != var_map.end()) {
fetched_vars[fetch_var_name].push_back(it->second.rbegin()->get());
}
}
}
std::unordered_set<std::unique_ptr<VarHandleBase>> fetch_dependencies; std::unordered_set<std::unique_ptr<VarHandleBase>> fetch_dependencies;
for (size_t i = 0; i < fetch_tensors.size(); ++i) { FeedFetchList fetch_data(fetch_tensors.size());
auto &var_name = fetch_tensors[i];
auto &vars = fetched_vars.at(var_name);
auto *op = new FetchOpHandle(&fetch_data, i, &local_scopes_);
fetch_ops.emplace_back(op);
for (auto &p : places_) {
op->SetDeviceContext(p, fetch_ctxs_.Get(p));
}
for (auto *var : vars) {
op->AddInput(var);
}
auto *fetch_dummy = new DummyVarHandle(); InsertFetchOps(fetch_tensors, &fetch_ops, &fetch_dependencies, &pending_ops,
op->AddOutput(fetch_dummy); &pending_vars, &ready_vars, &fetch_data);
fetch_dependencies.emplace(fetch_dummy);
InsertPendingVar(*fetch_dummy);
InsertPendingOp(*op);
}
auto run_all_ops = [&](std::unordered_set<OpHandleBase *> &set) { auto run_all_ops = [&](std::unordered_set<OpHandleBase *> &set) {
for (auto *op : set) { for (auto *op : set) {
...@@ -174,6 +132,60 @@ FeedFetchList ThreadedSSAGraphExecutor::Run( ...@@ -174,6 +132,60 @@ FeedFetchList ThreadedSSAGraphExecutor::Run(
return fetch_data; return fetch_data;
} }
void ThreadedSSAGraphExecutor::InsertFetchOps(
const std::vector<std::string> &fetch_tensors,
std::vector<std::unique_ptr<FetchOpHandle>> *fetch_ops,
std::unordered_set<std::unique_ptr<VarHandleBase>> *fetch_dependencies,
std::unordered_map<OpHandleBase *, size_t> *pending_ops,
std::unordered_set<VarHandleBase *> *pending_vars,
BlockingQueue<VarHandleBase *> *ready_vars, FeedFetchList *fetch_data) {
std::unordered_map<std::string, std::vector<VarHandleBase *>> fetched_vars;
for (auto &fetch_var_name : fetch_tensors) {
for (auto &var_map : graph_->vars_) {
auto it = var_map.find(fetch_var_name);
if (it != var_map.end()) {
fetched_vars[fetch_var_name].push_back(it->second.rbegin()->get());
}
}
}
for (size_t i = 0; i < fetch_tensors.size(); ++i) {
auto &var_name = fetch_tensors[i];
auto &vars = fetched_vars.at(var_name);
auto *op = new FetchOpHandle(fetch_data, i, &local_scopes_);
fetch_ops->emplace_back(op);
for (auto &p : places_) {
op->SetDeviceContext(p, fetch_ctxs_.Get(p));
}
for (auto *var : vars) {
op->AddInput(var);
}
auto *fetch_dummy = new DummyVarHandle();
op->AddOutput(fetch_dummy);
fetch_dependencies->emplace(fetch_dummy);
this->InsertPendingVar(pending_vars, ready_vars, fetch_dummy);
this->InsertPendingOp(pending_ops, op);
}
}
void ThreadedSSAGraphExecutor::InsertPendingOp(
std::unordered_map<OpHandleBase *, size_t> *pending_ops,
OpHandleBase *op_instance) const {
pending_ops->insert({op_instance, op_instance->Inputs().size()});
}
void ThreadedSSAGraphExecutor::InsertPendingVar(
std::unordered_set<VarHandleBase *> *pending_vars,
BlockingQueue<VarHandleBase *> *ready_vars, VarHandleBase *var) const {
pending_vars->insert(var);
if (var->generated_op_ == nullptr) {
ready_vars->Push(var);
}
}
void ThreadedSSAGraphExecutor::RunOp( void ThreadedSSAGraphExecutor::RunOp(
BlockingQueue<VarHandleBase *> *ready_var_q, details::OpHandleBase *op) { BlockingQueue<VarHandleBase *> *ready_var_q, details::OpHandleBase *op) {
auto op_run = [ready_var_q, op, this] { auto op_run = [ready_var_q, op, this] {
......
...@@ -23,6 +23,7 @@ ...@@ -23,6 +23,7 @@
#include <functional> #include <functional>
#include "ThreadPool.h" // ThreadPool in third party #include "ThreadPool.h" // ThreadPool in third party
#include "paddle/fluid/framework/blocking_queue.h" #include "paddle/fluid/framework/blocking_queue.h"
#include "paddle/fluid/framework/details/fetch_op_handle.h"
#include "paddle/fluid/framework/details/ssa_graph_executor.h" #include "paddle/fluid/framework/details/ssa_graph_executor.h"
namespace paddle { namespace paddle {
...@@ -58,6 +59,21 @@ class ThreadedSSAGraphExecutor : public SSAGraphExecutor { ...@@ -58,6 +59,21 @@ class ThreadedSSAGraphExecutor : public SSAGraphExecutor {
std::unique_ptr<platform::EnforceNotMet> exception_; std::unique_ptr<platform::EnforceNotMet> exception_;
std::atomic<int> running_ops_; std::atomic<int> running_ops_;
bool allow_op_delay_; bool allow_op_delay_;
void InsertPendingOp(std::unordered_map<OpHandleBase *, size_t> *pending_ops,
OpHandleBase *op_instance) const;
void InsertPendingVar(std::unordered_set<VarHandleBase *> *pending_vars,
BlockingQueue<VarHandleBase *> *ready_vars,
VarHandleBase *var) const;
void InsertFetchOps(
const std::vector<std::string> &fetch_tensors,
std::vector<std::unique_ptr<FetchOpHandle>> *fetch_ops,
std::unordered_set<std::unique_ptr<VarHandleBase>> *fetch_dependencies,
std::unordered_map<OpHandleBase *, size_t> *pending_ops,
std::unordered_set<VarHandleBase *> *pending_vars,
BlockingQueue<VarHandleBase *> *ready_vars, FeedFetchList *fetch_data);
}; };
} // namespace details } // namespace details
......
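The helpers factored out above (`InsertPendingOp`, `InsertPendingVar`) hold the bookkeeping the executor uses to decide when an op may run: each pending op is stored with the number of inputs it still waits for, and a variable that no op generates is ready immediately. Below is a minimal single-threaded sketch of that bookkeeping. `Var`, `Op`, `RunAll`, and `DemoOrder` are hypothetical stand-ins for illustration, not Paddle's `VarHandleBase`/`OpHandleBase`, and a plain `std::queue` replaces `BlockingQueue`.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-ins for VarHandleBase / OpHandleBase.
struct Op;
struct Var {
  Op *generated_op = nullptr;     // op that produces this var, if any
  std::vector<Op *> pending_ops;  // ops that consume this var
};
struct Op {
  std::string name;
  std::vector<Var *> inputs;
  std::vector<Var *> outputs;
};

// Mirrors InsertPendingOp: record how many inputs the op still waits for.
void InsertPendingOp(std::unordered_map<Op *, std::size_t> *pending_ops,
                     Op *op) {
  pending_ops->insert({op, op->inputs.size()});
}

// Mirrors InsertPendingVar: a var that nobody generates is ready right away.
void InsertPendingVar(std::unordered_set<Var *> *pending_vars,
                      std::queue<Var *> *ready_vars, Var *var) {
  pending_vars->insert(var);
  if (var->generated_op == nullptr) {
    ready_vars->push(var);
  }
}

// "Runs" ops in dependency order and returns their names in that order.
std::vector<std::string> RunAll(const std::vector<Var *> &vars,
                                const std::vector<Op *> &ops) {
  std::unordered_map<Op *, std::size_t> pending_ops;
  std::unordered_set<Var *> pending_vars;
  std::queue<Var *> ready_vars;
  std::vector<Op *> ready_ops;
  std::vector<std::string> order;

  for (Var *v : vars) InsertPendingVar(&pending_vars, &ready_vars, v);
  for (Op *op : ops) {
    if (op->inputs.empty()) {
      ready_ops.push_back(op);
    } else {
      InsertPendingOp(&pending_ops, op);
    }
  }

  while (!pending_vars.empty()) {
    // Every ready op runs; its outputs then become ready vars.
    for (Op *op : ready_ops) {
      order.push_back(op->name);
      for (Var *out : op->outputs) ready_vars.push(out);
    }
    ready_ops.clear();
    if (ready_vars.empty()) break;  // nothing left that can make progress
    Var *v = ready_vars.front();
    ready_vars.pop();
    pending_vars.erase(v);
    // One fewer outstanding input for each consumer of v.
    for (Op *op : v->pending_ops) {
      auto it = pending_ops.find(op);
      if (--it->second == 0) {
        ready_ops.push_back(op);
        pending_ops.erase(it);
      }
    }
  }
  return order;
}

// Chain a -> op1 -> b -> op2 -> c, with the ops passed in reverse order.
std::vector<std::string> DemoOrder() {
  Var a, b, c;
  Op op1{"op1", {&a}, {&b}};
  Op op2{"op2", {&b}, {&c}};
  a.pending_ops = {&op1};
  b.pending_ops = {&op2};
  b.generated_op = &op1;
  c.generated_op = &op2;
  return RunAll({&a, &b, &c}, {&op2, &op1});
}
```

In this sketch `DemoOrder()` runs `op1` before `op2` even though the ops are registered in the opposite order, because `op2` only becomes ready once its single input `b` has been produced.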
...@@ -14,56 +14,57 @@ limitations under the License. */ ...@@ -14,56 +14,57 @@ limitations under the License. */
#pragma once #pragma once
#include <string> #include <string>
#include "glog/logging.h"
#include "paddle/fluid/framework/attribute.h" #include "paddle/fluid/framework/attribute.h"
#include "paddle/fluid/framework/framework.pb.h" #include "paddle/fluid/framework/framework.pb.h"
namespace paddle { namespace paddle {
namespace framework { namespace framework {
// This class not only makes the proto but also initializes the attribute checkers. // This class not only makes the proto but also initializes the attribute checkers.
class OpProtoAndCheckerMaker { class OpProtoAndCheckerMaker {
public: public:
using OpProto = proto::OpProto; virtual void Make() = 0;
using OpAttrChecker = framework::OpAttrChecker;
OpProtoAndCheckerMaker(OpProto* proto, OpAttrChecker* op_checker)
: proto_(proto), op_checker_(op_checker) {}
virtual ~OpProtoAndCheckerMaker() { virtual ~OpProtoAndCheckerMaker() {
PADDLE_ENFORCE(validated_, "should call Validate after build"); CHECK(validated_) << "should call Validate after build";
} }
void SetProto(proto::OpProto *proto) { proto_ = proto; }
void SetChecker(OpAttrChecker *attr_checker) { op_checker_ = attr_checker; }
void Validate(); void Validate();
protected: protected:
struct VariableBuilder { struct VariableBuilder {
OpProto::Var* var_; proto::OpProto::Var *var_;
VariableBuilder& AsDuplicable() { VariableBuilder &AsDuplicable() {
var_->set_duplicable(true); var_->set_duplicable(true);
return *this; return *this;
} }
VariableBuilder& AsIntermediate() { VariableBuilder &AsIntermediate() {
var_->set_intermediate(true); var_->set_intermediate(true);
return *this; return *this;
} }
VariableBuilder& AsDispensable() { VariableBuilder &AsDispensable() {
var_->set_dispensable(true); var_->set_dispensable(true);
return *this; return *this;
} }
}; };
VariableBuilder AddInput(const std::string& name, const std::string& comment); VariableBuilder AddInput(const std::string &name, const std::string &comment);
VariableBuilder AddOutput(const std::string& name, VariableBuilder AddOutput(const std::string &name,
const std::string& comment); const std::string &comment);
template <typename T> template <typename T>
TypedAttrChecker<T>& AddAttr(const std::string& name, TypedAttrChecker<T> &AddAttr(const std::string &name,
const std::string& comment, const std::string &comment,
bool generated = false) { bool generated = false) {
auto* attr = proto_->add_attrs(); auto *attr = proto_->add_attrs();
attr->set_name(name); attr->set_name(name);
attr->set_comment(comment); attr->set_comment(comment);
attr->set_generated(generated); attr->set_generated(generated);
...@@ -71,21 +72,14 @@ class OpProtoAndCheckerMaker { ...@@ -71,21 +72,14 @@ class OpProtoAndCheckerMaker {
return op_checker_->AddAttrChecker<T>(name); return op_checker_->AddAttrChecker<T>(name);
} }
void AddComment(const std::string& comment) { proto_->set_comment(comment); } void AddComment(const std::string &comment) { proto_->set_comment(comment); }
private: private:
void CheckNoDuplicatedInOutAttrs(); void CheckNoDuplicatedInOutAttrs();
OpProto* proto_; proto::OpProto *proto_;
OpAttrChecker* op_checker_; OpAttrChecker *op_checker_;
bool validated_{false}; bool validated_{false};
}; };
class NOPMaker : public OpProtoAndCheckerMaker {
public:
NOPMaker(OpProto* proto, framework::OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {}
};
} // namespace framework } // namespace framework
} // namespace paddle } // namespace paddle
...@@ -18,9 +18,7 @@ limitations under the License. */ ...@@ -18,9 +18,7 @@ limitations under the License. */
class TestAttrProtoMaker : public paddle::framework::OpProtoAndCheckerMaker { class TestAttrProtoMaker : public paddle::framework::OpProtoAndCheckerMaker {
public: public:
TestAttrProtoMaker(paddle::framework::proto::OpProto* proto, void Make() {
paddle::framework::OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddAttr<float>("scale", "scale of test op"); AddAttr<float>("scale", "scale of test op");
AddAttr<float>("scale", "scale of test op"); AddAttr<float>("scale", "scale of test op");
} }
...@@ -29,15 +27,16 @@ class TestAttrProtoMaker : public paddle::framework::OpProtoAndCheckerMaker { ...@@ -29,15 +27,16 @@ class TestAttrProtoMaker : public paddle::framework::OpProtoAndCheckerMaker {
TEST(ProtoMaker, DuplicatedAttr) { TEST(ProtoMaker, DuplicatedAttr) {
paddle::framework::proto::OpProto op_proto; paddle::framework::proto::OpProto op_proto;
paddle::framework::OpAttrChecker op_checker; paddle::framework::OpAttrChecker op_checker;
auto proto_maker = TestAttrProtoMaker(&op_proto, &op_checker); TestAttrProtoMaker proto_maker;
proto_maker.SetProto(&op_proto);
proto_maker.SetChecker(&op_checker);
proto_maker.Make();
ASSERT_THROW(proto_maker.Validate(), paddle::platform::EnforceNotMet); ASSERT_THROW(proto_maker.Validate(), paddle::platform::EnforceNotMet);
} }
class TestInOutProtoMaker : public paddle::framework::OpProtoAndCheckerMaker { class TestInOutProtoMaker : public paddle::framework::OpProtoAndCheckerMaker {
public: public:
TestInOutProtoMaker(paddle::framework::proto::OpProto* proto, void Make() {
paddle::framework::OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("input", "input of test op"); AddInput("input", "input of test op");
AddInput("input", "input of test op"); AddInput("input", "input of test op");
} }
...@@ -46,6 +45,9 @@ class TestInOutProtoMaker : public paddle::framework::OpProtoAndCheckerMaker { ...@@ -46,6 +45,9 @@ class TestInOutProtoMaker : public paddle::framework::OpProtoAndCheckerMaker {
TEST(ProtoMaker, DuplicatedInOut) { TEST(ProtoMaker, DuplicatedInOut) {
paddle::framework::proto::OpProto op_proto; paddle::framework::proto::OpProto op_proto;
paddle::framework::OpAttrChecker op_checker; paddle::framework::OpAttrChecker op_checker;
auto proto_maker = TestInOutProtoMaker(&op_proto, &op_checker); TestInOutProtoMaker proto_maker;
proto_maker.SetProto(&op_proto);
proto_maker.SetChecker(&op_checker);
proto_maker.Make();
ASSERT_THROW(proto_maker.Validate(), paddle::platform::EnforceNotMet); ASSERT_THROW(proto_maker.Validate(), paddle::platform::EnforceNotMet);
} }
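The tests above exercise the new construction protocol: the maker is default-constructed, receives its proto and checker through setters, then has `Make()` called, and `Validate()` comes last. The sketch below reproduces that sequence in a self-contained form. `Proto`, `MakerBase`, `DupInputMaker`, `SingleInputMaker`, and `ValidateThrows` are hypothetical simplifications, `std::runtime_error` stands in for `EnforceNotMet`, and the attribute checker is left out.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for proto::OpProto: only a list of input names.
struct Proto {
  std::vector<std::string> inputs;
};

// Simplified base class. The real class also keeps a validated_ flag that
// its destructor checks; that part is omitted here.
class MakerBase {
 public:
  virtual ~MakerBase() = default;
  virtual void Make() = 0;  // subclasses describe the op here
  void SetProto(Proto *proto) { proto_ = proto; }
  void Validate() {
    std::set<std::string> seen;
    for (const auto &name : proto_->inputs) {
      if (!seen.insert(name).second) {
        throw std::runtime_error("duplicated input: " + name);
      }
    }
  }

 protected:
  void AddInput(const std::string &name) { proto_->inputs.push_back(name); }

 private:
  Proto *proto_ = nullptr;
};

class DupInputMaker : public MakerBase {
 public:
  void Make() override {
    AddInput("input");
    AddInput("input");  // deliberate duplicate
  }
};

class SingleInputMaker : public MakerBase {
 public:
  void Make() override { AddInput("input"); }
};

// Runs SetProto -> Make -> Validate and reports whether Validate threw.
bool ValidateThrows(MakerBase *maker) {
  Proto proto;
  maker->SetProto(&proto);
  maker->Make();
  try {
    maker->Validate();
  } catch (const std::runtime_error &) {
    return true;
  }
  return false;
}
```

The commit does not state its motivation. One general C++ reason for an explicit `Make()` step is that a base-class constructor cannot dispatch to a derived-class override, so the description logic has to run after construction has finished.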
...@@ -33,8 +33,7 @@ class CosineOp : public OperatorBase { ...@@ -33,8 +33,7 @@ class CosineOp : public OperatorBase {
class CosineOpProtoAndCheckerMaker : public OpProtoAndCheckerMaker { class CosineOpProtoAndCheckerMaker : public OpProtoAndCheckerMaker {
public: public:
CosineOpProtoAndCheckerMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("input", "input of cosine op"); AddInput("input", "input of cosine op");
AddOutput("output", "output of cosine op"); AddOutput("output", "output of cosine op");
AddAttr<float>("scale", "scale of cosine op") AddAttr<float>("scale", "scale of cosine op")
...@@ -55,8 +54,7 @@ class MyTestOp : public OperatorBase { ...@@ -55,8 +54,7 @@ class MyTestOp : public OperatorBase {
class MyTestOpProtoAndCheckerMaker : public OpProtoAndCheckerMaker { class MyTestOpProtoAndCheckerMaker : public OpProtoAndCheckerMaker {
public: public:
MyTestOpProtoAndCheckerMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("input", "input of cosine op").AsDuplicable(); AddInput("input", "input of cosine op").AsDuplicable();
AddOutput("output", "output of cosine op").AsIntermediate(); AddOutput("output", "output of cosine op").AsIntermediate();
auto my_checker = [](int i) { auto my_checker = [](int i) {
...@@ -212,10 +210,7 @@ namespace framework { ...@@ -212,10 +210,7 @@ namespace framework {
class OpKernelTestMaker : public OpProtoAndCheckerMaker { class OpKernelTestMaker : public OpProtoAndCheckerMaker {
public: public:
OpKernelTestMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() { AddComment("NoGradOp, same input output. no Grad"); }
: OpProtoAndCheckerMaker(proto, op_checker) {
AddComment("NoGradOp, same input output. no Grad");
}
}; };
class OpWithKernelTest : public OperatorWithKernel { class OpWithKernelTest : public OperatorWithKernel {
...@@ -275,9 +270,9 @@ TEST(OperatorRegistrar, CUDA) { ...@@ -275,9 +270,9 @@ TEST(OperatorRegistrar, CUDA) {
static int op_test_value = 0; static int op_test_value = 0;
using paddle::platform::DeviceContext;
using paddle::platform::CPUDeviceContext; using paddle::platform::CPUDeviceContext;
using paddle::platform::CUDADeviceContext; using paddle::platform::CUDADeviceContext;
using paddle::platform::DeviceContext;
namespace paddle { namespace paddle {
namespace framework { namespace framework {
......
...@@ -46,8 +46,7 @@ class OpWithoutKernelTest : public OperatorBase { ...@@ -46,8 +46,7 @@ class OpWithoutKernelTest : public OperatorBase {
class OpWithoutKernelCheckerMaker : public OpProtoAndCheckerMaker { class OpWithoutKernelCheckerMaker : public OpProtoAndCheckerMaker {
public: public:
OpWithoutKernelCheckerMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("input", "input of test op"); AddInput("input", "input of test op");
AddOutput("output", "output of test op"); AddOutput("output", "output of test op");
AddAttr<float>("scale", "scale of cosine op"); AddAttr<float>("scale", "scale of cosine op");
...@@ -98,8 +97,7 @@ namespace framework { ...@@ -98,8 +97,7 @@ namespace framework {
class OpKernelTestProtoAndCheckerMaker : public OpProtoAndCheckerMaker { class OpKernelTestProtoAndCheckerMaker : public OpProtoAndCheckerMaker {
public: public:
OpKernelTestProtoAndCheckerMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("x", "input of test op"); AddInput("x", "input of test op");
AddOutput("y", "output of test op"); AddOutput("y", "output of test op");
AddAttr<float>("scale", "scale of cosine op") AddAttr<float>("scale", "scale of cosine op")
...@@ -137,9 +135,7 @@ class CPUKernelTest : public OpKernel<float> { ...@@ -137,9 +135,7 @@ class CPUKernelTest : public OpKernel<float> {
class OpKernelTestMultiInputsProtoAndCheckerMaker class OpKernelTestMultiInputsProtoAndCheckerMaker
: public OpProtoAndCheckerMaker { : public OpProtoAndCheckerMaker {
public: public:
OpKernelTestMultiInputsProtoAndCheckerMaker(OpProto* proto, void Make() {
OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("xs", "inputs of test op").AsDuplicable(); AddInput("xs", "inputs of test op").AsDuplicable();
AddInput("k", "input of test op"); AddInput("k", "input of test op");
AddOutput("ys", "outputs of test op").AsDuplicable(); AddOutput("ys", "outputs of test op").AsDuplicable();
......
...@@ -24,8 +24,7 @@ namespace framework { ...@@ -24,8 +24,7 @@ namespace framework {
class SumOpMaker : public OpProtoAndCheckerMaker { class SumOpMaker : public OpProtoAndCheckerMaker {
public: public:
SumOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "").AsDuplicable(); AddInput("X", "").AsDuplicable();
AddOutput("Out", ""); AddOutput("Out", "");
AddComment(""); AddComment("");
......
...@@ -20,8 +20,8 @@ if(NOT APPLE) ...@@ -20,8 +20,8 @@ if(NOT APPLE)
endif() endif()
if(WITH_TESTING) if(WITH_TESTING)
# both tests/book and analysis depend on the models generated by python/paddle/fluid/tests/book
add_subdirectory(tests/book) add_subdirectory(tests/book)
# analysis test depends on the models generated by python/paddle/fluid/tests/book
add_subdirectory(analysis) add_subdirectory(analysis)
endif() endif()
......
nv_library(tensorrt_engine SRCS engine.cc) nv_library(tensorrt_engine SRCS engine.cc DEPS framework_proto)
nv_test(test_tensorrt SRCS test_tensorrt.cc DEPS dynload_cuda device_context dynamic_loader) nv_test(test_tensorrt SRCS test_tensorrt.cc DEPS dynload_cuda device_context dynamic_loader)
nv_test(test_tensorrt_engine SRCS test_engine.cc DEPS dynload_cuda tensorrt_engine) nv_test(test_tensorrt_engine SRCS test_engine.cc DEPS dynload_cuda tensorrt_engine)
......
...@@ -98,7 +98,7 @@ TEST_F(TensorRTEngineTest, add_layer_multi_dim) { ...@@ -98,7 +98,7 @@ TEST_F(TensorRTEngineTest, add_layer_multi_dim) {
float x_v[2] = {1.0, 2.0}; float x_v[2] = {1.0, 2.0};
engine_->SetInputFromCPU("x", reinterpret_cast<void*>(&x_v), engine_->SetInputFromCPU("x", reinterpret_cast<void*>(&x_v),
2 * sizeof(float)); 2 * sizeof(float));
engine_->Execute(1); engine_->Execute(1);
LOG(INFO) << "to get output"; LOG(INFO) << "to get output";
......
...@@ -166,6 +166,8 @@ function(op_library TARGET) ...@@ -166,6 +166,8 @@ function(op_library TARGET)
# NOTE(*): activation uses a macro to register the kernels, so set use_op manually. # NOTE(*): activation uses a macro to register the kernels, so set use_op manually.
if(${TARGET} STREQUAL "activation") if(${TARGET} STREQUAL "activation")
file(APPEND ${pybind_file} "USE_OP(relu);\n") file(APPEND ${pybind_file} "USE_OP(relu);\n")
elseif(${TARGET} STREQUAL "reduce")
file(APPEND ${pybind_file} "USE_OP(reduce_sum);\n")
else() else()
file(APPEND ${pybind_file} "USE_OP(${TARGET});\n") file(APPEND ${pybind_file} "USE_OP(${TARGET});\n")
endif() endif()
......
...@@ -63,8 +63,7 @@ class AccuracyOp : public framework::OperatorWithKernel { ...@@ -63,8 +63,7 @@ class AccuracyOp : public framework::OperatorWithKernel {
class AccuracyOpMaker : public framework::OpProtoAndCheckerMaker { class AccuracyOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
AccuracyOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
// TODO(typhoonzero): support both inference value and indices. // TODO(typhoonzero): support both inference value and indices.
AddInput("Out", "The network output of topk (inferences)"); AddInput("Out", "The network output of topk (inferences)");
AddInput("Indices", "The network output of topk (indices)"); AddInput("Indices", "The network output of topk (indices)");
......
...@@ -19,19 +19,18 @@ limitations under the License. */ ...@@ -19,19 +19,18 @@ limitations under the License. */
namespace paddle { namespace paddle {
namespace operators { namespace operators {
#define REGISTER_ACTIVATION_OP_MAKER(OP_NAME, OP_COMMENT) \ #define REGISTER_ACTIVATION_OP_MAKER(OP_NAME, OP_COMMENT) \
class OP_NAME##OpMaker \ class OP_NAME##OpMaker \
: public ::paddle::framework::OpProtoAndCheckerMaker { \ : public ::paddle::framework::OpProtoAndCheckerMaker { \
public: \ public: \
OP_NAME##OpMaker(OpProto *proto, OpAttrChecker *op_checker) \ void Make() override { \
: ::paddle::framework::OpProtoAndCheckerMaker(proto, op_checker) { \ AddInput("X", "Input of " #OP_NAME " operator"); \
AddInput("X", "Input of " #OP_NAME " operator"); \ AddOutput("Out", "Output of " #OP_NAME " operator"); \
AddOutput("Out", "Output of " #OP_NAME " operator"); \ AddAttr<bool>("use_mkldnn", \
AddAttr<bool>("use_mkldnn", \ "(bool, default false) Only used in mkldnn kernel") \
"(bool, default false) Only used in mkldnn kernel") \ .SetDefault(false); \
.SetDefault(false); \ AddComment(#OP_COMMENT); \
AddComment(#OP_COMMENT); \ } \
} \
} }
#define REGISTER_ACTIVATION_OP_GRAD_MAKER(OP_NAME, KERNEL_TYPE) \ #define REGISTER_ACTIVATION_OP_GRAD_MAKER(OP_NAME, KERNEL_TYPE) \
...@@ -204,8 +203,7 @@ $$out = \frac{x}{1 + |x|}$$ ...@@ -204,8 +203,7 @@ $$out = \frac{x}{1 + |x|}$$
class LeakyReluOpMaker : public framework::OpProtoAndCheckerMaker { class LeakyReluOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
LeakyReluOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: framework::OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "Input of LeakyRelu operator"); AddInput("X", "Input of LeakyRelu operator");
AddOutput("Out", "Output of LeakyRelu operator"); AddOutput("Out", "Output of LeakyRelu operator");
AddAttr<float>("alpha", "The small negative slope").SetDefault(0.02f); AddAttr<float>("alpha", "The small negative slope").SetDefault(0.02f);
...@@ -220,8 +218,7 @@ $out = \max(x, \alpha * x)$ ...@@ -220,8 +218,7 @@ $out = \max(x, \alpha * x)$
class SoftShrinkOpMaker : public framework::OpProtoAndCheckerMaker { class SoftShrinkOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
SoftShrinkOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: framework::OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "Input of Softshrink operator"); AddInput("X", "Input of Softshrink operator");
AddOutput("Out", "Output of Softshrink operator"); AddOutput("Out", "Output of Softshrink operator");
AddAttr<float>("lambda", "non-negative offset").SetDefault(0.5f); AddAttr<float>("lambda", "non-negative offset").SetDefault(0.5f);
...@@ -242,8 +239,7 @@ $$ ...@@ -242,8 +239,7 @@ $$
class HardShrinkOpMaker : public framework::OpProtoAndCheckerMaker { class HardShrinkOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
HardShrinkOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: framework::OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "Input of HardShrink operator"); AddInput("X", "Input of HardShrink operator");
AddOutput("Out", "Output of HardShrink operator"); AddOutput("Out", "Output of HardShrink operator");
AddAttr<float>("threshold", "The value of threshold for HardShrink") AddAttr<float>("threshold", "The value of threshold for HardShrink")
...@@ -265,8 +261,7 @@ $$ ...@@ -265,8 +261,7 @@ $$
class BReluOpMaker : public framework::OpProtoAndCheckerMaker { class BReluOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
BReluOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: framework::OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "Input of BRelu operator"); AddInput("X", "Input of BRelu operator");
AddOutput("Out", "Output of BRelu operator"); AddOutput("Out", "Output of BRelu operator");
AddAttr<float>("t_min", "The min marginal value of BRelu") AddAttr<float>("t_min", "The min marginal value of BRelu")
...@@ -284,8 +279,7 @@ $out = \min(\max(x, t_{min}), t_{max})$ ...@@ -284,8 +279,7 @@ $out = \min(\max(x, t_{min}), t_{max})$
class SoftReluOpMaker : public framework::OpProtoAndCheckerMaker { class SoftReluOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
SoftReluOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: framework::OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "Input of SoftRelu operator"); AddInput("X", "Input of SoftRelu operator");
AddOutput("Out", "Output of SoftRelu operator"); AddOutput("Out", "Output of SoftRelu operator");
AddAttr<float>("threshold", "The threshold value of SoftRelu") AddAttr<float>("threshold", "The threshold value of SoftRelu")
...@@ -301,8 +295,7 @@ $out = \ln(1 + \exp(\max(\min(x, threshold), -threshold)))$ ...@@ -301,8 +295,7 @@ $out = \ln(1 + \exp(\max(\min(x, threshold), -threshold)))$
class ELUOpMaker : public framework::OpProtoAndCheckerMaker { class ELUOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
ELUOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: framework::OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "Input of ELU operator"); AddInput("X", "Input of ELU operator");
AddOutput("Out", "Output of ELU operator"); AddOutput("Out", "Output of ELU operator");
AddAttr<float>("alpha", "The alpha value of ELU").SetDefault(1.0f); AddAttr<float>("alpha", "The alpha value of ELU").SetDefault(1.0f);
...@@ -320,8 +313,7 @@ $out = \max(0, x) + \min(0, \alpha * (e^x - 1))$ ...@@ -320,8 +313,7 @@ $out = \max(0, x) + \min(0, \alpha * (e^x - 1))$
class Relu6OpMaker : public framework::OpProtoAndCheckerMaker { class Relu6OpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
Relu6OpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: framework::OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "Input of Relu6 operator"); AddInput("X", "Input of Relu6 operator");
AddOutput("Out", "Output of Relu6 operator"); AddOutput("Out", "Output of Relu6 operator");
AddAttr<float>("threshold", "The threshold value of Relu6") AddAttr<float>("threshold", "The threshold value of Relu6")
...@@ -337,8 +329,7 @@ $out = \min(\max(0, x), 6)$ ...@@ -337,8 +329,7 @@ $out = \min(\max(0, x), 6)$
class PowOpMaker : public framework::OpProtoAndCheckerMaker { class PowOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
PowOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: framework::OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "Input of Pow operator"); AddInput("X", "Input of Pow operator");
AddOutput("Out", "Output of Pow operator"); AddOutput("Out", "Output of Pow operator");
AddAttr<float>("factor", "The exponential factor of Pow").SetDefault(1.0f); AddAttr<float>("factor", "The exponential factor of Pow").SetDefault(1.0f);
...@@ -353,8 +344,7 @@ $out = x^{factor}$ ...@@ -353,8 +344,7 @@ $out = x^{factor}$
class STanhOpMaker : public framework::OpProtoAndCheckerMaker { class STanhOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
STanhOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: framework::OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "Input of STanh operator"); AddInput("X", "Input of STanh operator");
AddOutput("Out", "Output of STanh operator"); AddOutput("Out", "Output of STanh operator");
AddAttr<float>("scale_a", "The scale parameter of a for the input") AddAttr<float>("scale_a", "The scale parameter of a for the input")
...@@ -372,8 +362,7 @@ $$out = b * \frac{e^{a * x} - e^{-a * x}}{e^{a * x} + e^{-a * x}}$$ ...@@ -372,8 +362,7 @@ $$out = b * \frac{e^{a * x} - e^{-a * x}}{e^{a * x} + e^{-a * x}}$$
class ThresholdedReluOpMaker : public framework::OpProtoAndCheckerMaker { class ThresholdedReluOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
ThresholdedReluOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: framework::OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "Input of ThresholdedRelu operator"); AddInput("X", "Input of ThresholdedRelu operator");
AddOutput("Out", "Output of ThresholdedRelu operator"); AddOutput("Out", "Output of ThresholdedRelu operator");
AddAttr<float>("threshold", "The threshold location of activation") AddAttr<float>("threshold", "The threshold location of activation")
...@@ -394,8 +383,7 @@ $$ ...@@ -394,8 +383,7 @@ $$
class HardSigmoidOpMaker : public framework::OpProtoAndCheckerMaker { class HardSigmoidOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
HardSigmoidOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: framework::OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "Input of HardSigmoid operator"); AddInput("X", "Input of HardSigmoid operator");
AddOutput("Out", "Output of HardSigmoid operator"); AddOutput("Out", "Output of HardSigmoid operator");
AddAttr<float>("slope", "Slope for linear approximation of sigmoid") AddAttr<float>("slope", "Slope for linear approximation of sigmoid")
...@@ -420,8 +408,7 @@ It is recommended to use the defaults for this activation. ...@@ -420,8 +408,7 @@ It is recommended to use the defaults for this activation.
class SwishOpMaker : public framework::OpProtoAndCheckerMaker { class SwishOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
SwishOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: framework::OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "Input of Swish operator"); AddInput("X", "Input of Swish operator");
AddOutput("Out", "Output of Swish operator"); AddOutput("Out", "Output of Swish operator");
AddAttr<float>("beta", "Constant beta of swish operator").SetDefault(1.0f); AddAttr<float>("beta", "Constant beta of swish operator").SetDefault(1.0f);
......
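The `REGISTER_ACTIVATION_OP_MAKER` macro earlier in this file generates one maker class per activation by token-pasting the op name into the class name and stringizing it into the comments. The sketch below shows how those two preprocessor features combine with a `Make()` override. `MakerBase`, `REGISTER_DEMO_OP_MAKER`, and the accessor helpers are hypothetical names used only for illustration. Adjacent string literals are concatenated exactly as written, so any spacing around `#OP_NAME` has to be inside the literals themselves.

```cpp
#include <cassert>
#include <string>

// Hypothetical minimal base class; the real one is OpProtoAndCheckerMaker.
class MakerBase {
 public:
  virtual ~MakerBase() = default;
  virtual void Make() = 0;
  const std::string &input_comment() const { return input_comment_; }
  const std::string &comment() const { return comment_; }

 protected:
  // The input name is ignored in this sketch; only the comment is stored.
  void AddInput(const std::string &, const std::string &c) {
    input_comment_ = c;
  }
  void AddComment(const std::string &c) { comment_ = c; }

 private:
  std::string input_comment_;
  std::string comment_;
};

// OP_NAME##OpMaker pastes tokens into a class name; #OP_NAME and
// #OP_COMMENT turn the macro arguments into string literals.
#define REGISTER_DEMO_OP_MAKER(OP_NAME, OP_COMMENT)      \
  class OP_NAME##OpMaker : public MakerBase {            \
   public:                                               \
    void Make() override {                               \
      AddInput("X", "Input of " #OP_NAME " operator");   \
      AddComment(#OP_COMMENT);                           \
    }                                                    \
  }

REGISTER_DEMO_OP_MAKER(Relu, ReluDoc);

// Helpers that build the generated maker and read back what Make() stored.
std::string ReluInputComment() {
  ReluOpMaker maker;
  maker.Make();
  return maker.input_comment();
}

std::string ReluDocComment() {
  ReluOpMaker maker;
  maker.Make();
  return maker.comment();
}
```

In this sketch `#OP_COMMENT` yields the argument's spelling as text (`"ReluDoc"`), not the contents of a variable with that name.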
...@@ -66,8 +66,7 @@ class AdadeltaOp : public framework::OperatorWithKernel { ...@@ -66,8 +66,7 @@ class AdadeltaOp : public framework::OperatorWithKernel {
class AdadeltaOpMaker : public framework::OpProtoAndCheckerMaker { class AdadeltaOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
AdadeltaOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Param", "(Tensor) Input parameter"); AddInput("Param", "(Tensor) Input parameter");
AddInput("Grad", "(Tensor) Input gradient"); AddInput("Grad", "(Tensor) Input gradient");
AddInput("AvgSquaredGrad", "(Tensor) Input average of squared gradient"); AddInput("AvgSquaredGrad", "(Tensor) Input average of squared gradient");
......
...@@ -67,8 +67,7 @@ class AdagradOp : public framework::OperatorWithKernel { ...@@ -67,8 +67,7 @@ class AdagradOp : public framework::OperatorWithKernel {
class AdagradOpMaker : public framework::OpProtoAndCheckerMaker { class AdagradOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
AdagradOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Param", "(Tensor) Input parameter"); AddInput("Param", "(Tensor) Input parameter");
AddInput("Grad", "(Tensor) Input gradient"); AddInput("Grad", "(Tensor) Input gradient");
AddInput("Moment", "(Tensor) Second moment"); AddInput("Moment", "(Tensor) Second moment");
......
...@@ -80,8 +80,7 @@ class AdamOp : public framework::OperatorWithKernel { ...@@ -80,8 +80,7 @@ class AdamOp : public framework::OperatorWithKernel {
class AdamOpMaker : public framework::OpProtoAndCheckerMaker { class AdamOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
AdamOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Param", "(Tensor) Input parameter"); AddInput("Param", "(Tensor) Input parameter");
AddInput("Grad", "(Tensor) Input gradient"); AddInput("Grad", "(Tensor) Input gradient");
AddInput("LearningRate", "(Tensor) Learning rate"); AddInput("LearningRate", "(Tensor) Learning rate");
......
...@@ -74,8 +74,7 @@ class AdamaxOp : public framework::OperatorWithKernel { ...@@ -74,8 +74,7 @@ class AdamaxOp : public framework::OperatorWithKernel {
class AdamaxOpMaker : public framework::OpProtoAndCheckerMaker { class AdamaxOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
AdamaxOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Param", "(Tensor) Input parameter"); AddInput("Param", "(Tensor) Input parameter");
AddInput("Grad", "(Tensor) Input gradient"); AddInput("Grad", "(Tensor) Input gradient");
AddInput("LearningRate", "(Tensor) Learning rate"); AddInput("LearningRate", "(Tensor) Learning rate");
......
...@@ -123,8 +123,7 @@ class ArrayToLoDTensorOp : public framework::OperatorBase { ...@@ -123,8 +123,7 @@ class ArrayToLoDTensorOp : public framework::OperatorBase {
class ArrayToLoDTensorOpProtoMaker : public framework::OpProtoAndCheckerMaker { class ArrayToLoDTensorOpProtoMaker : public framework::OpProtoAndCheckerMaker {
public: public:
ArrayToLoDTensorOpProtoMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", AddInput("X",
"(std::vector<LodTensor>) A vector of tensors that is going to " "(std::vector<LodTensor>) A vector of tensors that is going to "
"be casted to a big LoDTensor."); "be casted to a big LoDTensor.");
......
...@@ -94,8 +94,7 @@ class AssignOp : public framework::OperatorBase { ...@@ -94,8 +94,7 @@ class AssignOp : public framework::OperatorBase {
class AssignOpProtoMaker : public framework::OpProtoAndCheckerMaker { class AssignOpProtoMaker : public framework::OpProtoAndCheckerMaker {
public: public:
AssignOpProtoMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", AddInput("X",
"(LoDTensor, SelectedRows or LoDTensorArray) The input variable " "(LoDTensor, SelectedRows or LoDTensorArray) The input variable "
"could be LoDTensor, SelectedRows or LoDTensorArray.") "could be LoDTensor, SelectedRows or LoDTensorArray.")
......
...@@ -45,8 +45,7 @@ class AssignValueOp : public framework::OperatorWithKernel { ...@@ -45,8 +45,7 @@ class AssignValueOp : public framework::OperatorWithKernel {
class AssignValueOpMaker : public framework::OpProtoAndCheckerMaker { class AssignValueOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
AssignValueOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddOutput("Out", "(Tensor) Output tensor of assign_value operator."); AddOutput("Out", "(Tensor) Output tensor of assign_value operator.");
AddAttr<std::vector<int>>("shape", AddAttr<std::vector<int>>("shape",
"(vector<int>) " "(vector<int>) "
......
...@@ -50,8 +50,7 @@ class AucOp : public framework::OperatorWithKernel { ...@@ -50,8 +50,7 @@ class AucOp : public framework::OperatorWithKernel {
class AucOpMaker : public framework::OpProtoAndCheckerMaker { class AucOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
AucOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Out", AddInput("Out",
"A floating point 2D tensor, values are in the range [0, 1]." "A floating point 2D tensor, values are in the range [0, 1]."
"Each row is sorted in descending order. This input should be the" "Each row is sorted in descending order. This input should be the"
......
...@@ -111,8 +111,7 @@ class AverageAccumulatesOp : public framework::OperatorWithKernel { ...@@ -111,8 +111,7 @@ class AverageAccumulatesOp : public framework::OperatorWithKernel {
class AverageAccumulatesOpMaker : public framework::OpProtoAndCheckerMaker { class AverageAccumulatesOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
AverageAccumulatesOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("param", "(Tensor), The parameter to be accumulated."); AddInput("param", "(Tensor), The parameter to be accumulated.");
AddInput("in_sum_1", AddInput("in_sum_1",
"(Tensor), A tensor used to store the parameter " "(Tensor), A tensor used to store the parameter "
......
...@@ -126,8 +126,7 @@ class BatchNormOp : public framework::OperatorWithKernel { ...@@ -126,8 +126,7 @@ class BatchNormOp : public framework::OperatorWithKernel {
class BatchNormOpMaker : public framework::OpProtoAndCheckerMaker { class BatchNormOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
BatchNormOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddAttr<bool>("is_test", "").SetDefault(false); AddAttr<bool>("is_test", "").SetDefault(false);
AddAttr<float>("momentum", "").SetDefault(0.9); AddAttr<float>("momentum", "").SetDefault(0.9);
AddAttr<float>("epsilon", "") AddAttr<float>("epsilon", "")
......
...@@ -53,8 +53,7 @@ class BatchSizeLikeOp : public framework::OperatorWithKernel { ...@@ -53,8 +53,7 @@ class BatchSizeLikeOp : public framework::OperatorWithKernel {
class BatchSizeLikeOpMaker : public framework::OpProtoAndCheckerMaker { class BatchSizeLikeOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
BatchSizeLikeOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() final {
: framework::OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Input", AddInput("Input",
"(Tensor) Tensor " "(Tensor) Tensor "
"whose input_dim_idx'th dimension specifies the batch_size"); "whose input_dim_idx'th dimension specifies the batch_size");
...@@ -68,7 +67,11 @@ class BatchSizeLikeOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -68,7 +67,11 @@ class BatchSizeLikeOpMaker : public framework::OpProtoAndCheckerMaker {
AddAttr<int>("output_dim_idx", AddAttr<int>("output_dim_idx",
"(int, default 0) The index of output's batch size dimension") "(int, default 0) The index of output's batch size dimension")
.SetDefault(0); .SetDefault(0);
Apply();
} }
protected:
virtual void Apply() = 0;
}; };
} // namespace operators } // namespace operators
......
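Note on `BatchSizeLikeOpMaker` above: unlike the other makers, it marks `Make()` as `final` and ends it with a call to a pure virtual `Apply()`, so the shared inputs and attributes are always registered first and each concrete operator adds only its own parts. A minimal sketch of that arrangement follows. The class names `MakerBase`, `BatchSizeLikeMakerSketch` and `FillConstantLikeMakerSketch`, the `"value"` attribute and the helper `CollectAttrs` are assumptions made for illustration, and the attribute list is shortened.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for framework::OpProtoAndCheckerMaker; the real
// class has a richer interface that is not shown in this diff.
class MakerBase {
 public:
  virtual ~MakerBase() = default;
  virtual void Make() = 0;
  const std::vector<std::string>& inputs() const { return inputs_; }
  const std::vector<std::string>& attrs() const { return attrs_; }

 protected:
  void AddInput(const std::string& name) { inputs_.push_back(name); }
  void AddAttr(const std::string& name) { attrs_.push_back(name); }

 private:
  std::vector<std::string> inputs_;
  std::vector<std::string> attrs_;
};

// Mirrors the shape of BatchSizeLikeOpMaker: Make() is final, registers the
// shared parts, then defers to Apply() for what each concrete op adds.
class BatchSizeLikeMakerSketch : public MakerBase {
 public:
  void Make() final {
    AddInput("Input");
    AddAttr("output_dim_idx");
    Apply();
  }

 protected:
  virtual void Apply() = 0;
};

// Hypothetical concrete maker that contributes one extra attribute.
class FillConstantLikeMakerSketch : public BatchSizeLikeMakerSketch {
 protected:
  void Apply() override { AddAttr("value"); }
};

// Builds the hypothetical maker and returns the attribute names it registered.
std::vector<std::string> CollectAttrs() {
  FillConstantLikeMakerSketch maker;
  maker.Make();
  return maker.attrs();
}
```

Because `Make()` is `final`, a derived maker cannot accidentally skip the shared registration. It can only extend it through `Apply()`.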
...@@ -134,8 +134,7 @@ class BeamSearchDecodeOp : public framework::OperatorBase { ...@@ -134,8 +134,7 @@ class BeamSearchDecodeOp : public framework::OperatorBase {
class BeamSearchDecodeOpProtoMaker : public framework::OpProtoAndCheckerMaker { class BeamSearchDecodeOpProtoMaker : public framework::OpProtoAndCheckerMaker {
public: public:
BeamSearchDecodeOpProtoMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: framework::OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Ids", AddInput("Ids",
"(LodTensorArray)" "(LodTensorArray)"
"score of the candidate words in each step"); "score of the candidate words in each step");
......
...@@ -197,8 +197,7 @@ std::string ItemToString(const BeamSearch::Item &item) { ...@@ -197,8 +197,7 @@ std::string ItemToString(const BeamSearch::Item &item) {
class BeamSearchOpMaker : public framework::OpProtoAndCheckerMaker { class BeamSearchOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
BeamSearchOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
// inputs and outputs stored in proto // inputs and outputs stored in proto
AddInput("pre_ids", "ids in previous step"); AddInput("pre_ids", "ids in previous step");
AddInput("ids", "a LoDTensor of shape of [None,k]"); AddInput("ids", "a LoDTensor of shape of [None,k]");
......
...@@ -41,8 +41,7 @@ class BilinearInterpOp : public framework::OperatorWithKernel { ...@@ -41,8 +41,7 @@ class BilinearInterpOp : public framework::OperatorWithKernel {
class BilinearInterpOpMaker : public framework::OpProtoAndCheckerMaker { class BilinearInterpOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
BilinearInterpOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", AddInput("X",
"(Tensor) The input tensor of bilinear interpolation, " "(Tensor) The input tensor of bilinear interpolation, "
"This is a 4-D tensor with shape of (N x C x h x w)"); "This is a 4-D tensor with shape of (N x C x h x w)");
......
...@@ -65,8 +65,7 @@ class BilinearTensorProductOp : public framework::OperatorWithKernel { ...@@ -65,8 +65,7 @@ class BilinearTensorProductOp : public framework::OperatorWithKernel {
class BilinearTensorProductOpMaker : public framework::OpProtoAndCheckerMaker { class BilinearTensorProductOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
BilinearTensorProductOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "The first input of bilinear_tensor_product operator."); AddInput("X", "The first input of bilinear_tensor_product operator.");
AddInput("Y", "The second input of bilinear_tensor_product operator."); AddInput("Y", "The second input of bilinear_tensor_product operator.");
AddInput("Weight", AddInput("Weight",
......
...@@ -182,8 +182,7 @@ class BipartiteMatchKernel : public framework::OpKernel<T> { ...@@ -182,8 +182,7 @@ class BipartiteMatchKernel : public framework::OpKernel<T> {
class BipartiteMatchOpMaker : public framework::OpProtoAndCheckerMaker { class BipartiteMatchOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
BipartiteMatchOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput( AddInput(
"DistMat", "DistMat",
"(LoDTensor or Tensor) this input is a 2-D LoDTensor with shape " "(LoDTensor or Tensor) this input is a 2-D LoDTensor with shape "
......
...@@ -60,8 +60,7 @@ class BoxCoderOp : public framework::OperatorWithKernel { ...@@ -60,8 +60,7 @@ class BoxCoderOp : public framework::OperatorWithKernel {
class BoxCoderOpMaker : public framework::OpProtoAndCheckerMaker { class BoxCoderOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
BoxCoderOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput( AddInput(
"PriorBox", "PriorBox",
"(Tensor, default Tensor<float>) " "(Tensor, default Tensor<float>) "
......
...@@ -21,8 +21,7 @@ namespace operators { ...@@ -21,8 +21,7 @@ namespace operators {
class CastOpProtoMaker : public framework::OpProtoAndCheckerMaker { class CastOpProtoMaker : public framework::OpProtoAndCheckerMaker {
public: public:
CastOpProtoMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "The input tensor of cast op"); AddInput("X", "The input tensor of cast op");
AddOutput("Out", "The output tensor of cast op"); AddOutput("Out", "The output tensor of cast op");
AddAttr<int>("out_dtype", "output data type"); AddAttr<int>("out_dtype", "output data type");
......
...@@ -50,8 +50,7 @@ class ChannelCloseOpOpInferShape : public framework::InferShapeBase { ...@@ -50,8 +50,7 @@ class ChannelCloseOpOpInferShape : public framework::InferShapeBase {
class ChannelCloseOpMaker : public framework::OpProtoAndCheckerMaker { class ChannelCloseOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
ChannelCloseOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput(kChannel, AddInput(kChannel,
"The Channel Variable that should be closed by" "The Channel Variable that should be closed by"
" the ChannelClose Op."); " the ChannelClose Op.");
......
...@@ -91,8 +91,7 @@ class ChannelCreateOpOpInferShape : public framework::InferShapeBase { ...@@ -91,8 +91,7 @@ class ChannelCreateOpOpInferShape : public framework::InferShapeBase {
class ChannelCreateOpMaker : public framework::OpProtoAndCheckerMaker { class ChannelCreateOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
ChannelCreateOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddOutput(kOutput, AddOutput(kOutput,
"The object of a Channel type created by ChannelCreate Op."); "The object of a Channel type created by ChannelCreate Op.");
AddAttr<int>("capacity", "The size of the buffer of Channel.") AddAttr<int>("capacity", "The size of the buffer of Channel.")
......
...@@ -72,8 +72,7 @@ class ChannelRecvOp : public framework::OperatorBase { ...@@ -72,8 +72,7 @@ class ChannelRecvOp : public framework::OperatorBase {
class ChannelRecvOpMaker : public framework::OpProtoAndCheckerMaker { class ChannelRecvOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
ChannelRecvOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput(Channel, AddInput(Channel,
"(Channel) A variable which \"receives\" the a value sent" "(Channel) A variable which \"receives\" the a value sent"
"to it by a channel_send op.") "to it by a channel_send op.")
......
...@@ -57,8 +57,7 @@ class ChannelSendOp : public framework::OperatorBase { ...@@ -57,8 +57,7 @@ class ChannelSendOp : public framework::OperatorBase {
class ChannelSendOpMaker : public framework::OpProtoAndCheckerMaker { class ChannelSendOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
ChannelSendOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput(Channel, AddInput(Channel,
"(Channel) A variable which \"sends\" the passed in value to " "(Channel) A variable which \"sends\" the passed in value to "
"a listening receiver.") "a listening receiver.")
......
...@@ -66,8 +66,7 @@ class ChunkEvalOp : public framework::OperatorWithKernel { ...@@ -66,8 +66,7 @@ class ChunkEvalOp : public framework::OperatorWithKernel {
class ChunkEvalOpMaker : public framework::OpProtoAndCheckerMaker { class ChunkEvalOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
ChunkEvalOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Inference", AddInput("Inference",
"(Tensor, default: Tensor<int64_t>). " "(Tensor, default: Tensor<int64_t>). "
"Predictions from the network."); "Predictions from the network.");
......
...@@ -37,8 +37,7 @@ class ClipByNormOp : public framework::OperatorWithKernel { ...@@ -37,8 +37,7 @@ class ClipByNormOp : public framework::OperatorWithKernel {
class ClipByNormOpMaker : public framework::OpProtoAndCheckerMaker { class ClipByNormOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
ClipByNormOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", AddInput("X",
"(Tensor) The input of clip_by_norm op." "(Tensor) The input of clip_by_norm op."
"The number of dimensions must be between [1, 9]."); "The number of dimensions must be between [1, 9].");
......
...@@ -38,8 +38,7 @@ class ClipOp : public framework::OperatorWithKernel { ...@@ -38,8 +38,7 @@ class ClipOp : public framework::OperatorWithKernel {
template <typename AttrType> template <typename AttrType>
class ClipOpMaker : public framework::OpProtoAndCheckerMaker { class ClipOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
ClipOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", AddInput("X",
"(Tensor)The input of clip op." "(Tensor)The input of clip op."
"The number of dimensions must be between [1, 9]."); "The number of dimensions must be between [1, 9].");
......
...@@ -21,8 +21,7 @@ namespace operators { ...@@ -21,8 +21,7 @@ namespace operators {
template <typename OpComment> template <typename OpComment>
class CompareOpProtoMaker : public framework::OpProtoAndCheckerMaker { class CompareOpProtoMaker : public framework::OpProtoAndCheckerMaker {
public: public:
CompareOpProtoMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
OpComment comment; OpComment comment;
AddInput("X", AddInput("X",
string::Sprintf("(LoDTensor) the left hand operand of %s operator", string::Sprintf("(LoDTensor) the left hand operand of %s operator",
......
...@@ -63,8 +63,7 @@ class ConcatOp : public framework::OperatorWithKernel { ...@@ -63,8 +63,7 @@ class ConcatOp : public framework::OperatorWithKernel {
class ConcatOpMaker : public framework::OpProtoAndCheckerMaker { class ConcatOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
ConcatOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "Input tensors of concat operator.").AsDuplicable(); AddInput("X", "Input tensors of concat operator.").AsDuplicable();
AddOutput("Out", "Output tensor of concat operator."); AddOutput("Out", "Output tensor of concat operator.");
AddAttr<int>("axis", AddAttr<int>("axis",
......
...@@ -108,8 +108,7 @@ class ConditionalBlockOp : public ConditionalOp { ...@@ -108,8 +108,7 @@ class ConditionalBlockOp : public ConditionalOp {
class ConditionalBlockOpProtoMaker : public framework::OpProtoAndCheckerMaker { class ConditionalBlockOpProtoMaker : public framework::OpProtoAndCheckerMaker {
public: public:
ConditionalBlockOpProtoMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", AddInput("X",
"The conditional variable of this operator. If X is empty, the " "The conditional variable of this operator. If X is empty, the "
"whole sub-block will not be executed.") "whole sub-block will not be executed.")
......
...@@ -106,8 +106,7 @@ framework::OpKernelType ConvOp::GetExpectedKernelType( ...@@ -106,8 +106,7 @@ framework::OpKernelType ConvOp::GetExpectedKernelType(
library); library);
} }
Conv2DOpMaker::Conv2DOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Conv2DOpMaker::Make() {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput( AddInput(
"Input", "Input",
"(Tensor) The input tensor of convolution operator. " "(Tensor) The input tensor of convolution operator. "
...@@ -200,8 +199,7 @@ $$ ...@@ -200,8 +199,7 @@ $$
)DOC"); )DOC");
} }
Conv3DOpMaker::Conv3DOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Conv3DOpMaker::Make() {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput( AddInput(
"Input", "Input",
"(Tensor) The input tensor of convolution operator. " "(Tensor) The input tensor of convolution operator. "
......
...@@ -60,12 +60,12 @@ inline bool IsExpand(const std::vector<int64_t>& filter_dim, ...@@ -60,12 +60,12 @@ inline bool IsExpand(const std::vector<int64_t>& filter_dim,
// operator implementations can reuse the code. // operator implementations can reuse the code.
class Conv2DOpMaker : public framework::OpProtoAndCheckerMaker { class Conv2DOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
Conv2DOpMaker(OpProto* proto, OpAttrChecker* op_checker); void Make() override;
}; };
class Conv3DOpMaker : public framework::OpProtoAndCheckerMaker { class Conv3DOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
Conv3DOpMaker(OpProto* proto, OpAttrChecker* op_checker); void Make() override;
}; };
class ConvOp : public framework::OperatorWithKernel { class ConvOp : public framework::OperatorWithKernel {
......
...@@ -75,8 +75,7 @@ class ConvShiftGradOp : public framework::OperatorWithKernel { ...@@ -75,8 +75,7 @@ class ConvShiftGradOp : public framework::OperatorWithKernel {
class ConvShiftOpMaker : public framework::OpProtoAndCheckerMaker { class ConvShiftOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
ConvShiftOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: framework::OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", AddInput("X",
"(Tensor, default Tensor<float>), a 2-D tensor with shape B x M, " "(Tensor, default Tensor<float>), a 2-D tensor with shape B x M, "
"where B is the batch size and M is the data dimension."); "where B is the batch size and M is the data dimension.");
......
...@@ -84,9 +84,7 @@ framework::OpKernelType ConvTransposeOp::GetExpectedKernelType( ...@@ -84,9 +84,7 @@ framework::OpKernelType ConvTransposeOp::GetExpectedKernelType(
layout_, library_); layout_, library_);
} }
Conv2DTransposeOpMaker::Conv2DTransposeOpMaker(OpProto* proto, void Conv2DTransposeOpMaker::Make() {
OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput( AddInput(
"Input", "Input",
"(Tensor) The input tensor of convolution transpose operator. " "(Tensor) The input tensor of convolution transpose operator. "
...@@ -168,9 +166,7 @@ Example: ...@@ -168,9 +166,7 @@ Example:
)DOC"); )DOC");
} }
Conv3DTransposeOpMaker::Conv3DTransposeOpMaker(OpProto* proto, void Conv3DTransposeOpMaker::Make() {
OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Input", AddInput("Input",
"(Tensor) The input tensor of convolution transpose operator." "(Tensor) The input tensor of convolution transpose operator."
"The format of input tensor is NCDHW. Where N is batch size, C is " "The format of input tensor is NCDHW. Where N is batch size, C is "
......
...@@ -30,12 +30,12 @@ using DDim = framework::DDim; ...@@ -30,12 +30,12 @@ using DDim = framework::DDim;
// operator implementations can reuse the code. // operator implementations can reuse the code.
class Conv2DTransposeOpMaker : public framework::OpProtoAndCheckerMaker { class Conv2DTransposeOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
Conv2DTransposeOpMaker(OpProto* proto, OpAttrChecker* op_checker); void Make() override;
}; };
class Conv3DTransposeOpMaker : public framework::OpProtoAndCheckerMaker { class Conv3DTransposeOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
Conv3DTransposeOpMaker(OpProto* proto, OpAttrChecker* op_checker); void Make() override;
}; };
class ConvTransposeOp : public framework::OperatorWithKernel { class ConvTransposeOp : public framework::OperatorWithKernel {
......
...@@ -62,8 +62,7 @@ class CosSimOp : public framework::OperatorWithKernel { ...@@ -62,8 +62,7 @@ class CosSimOp : public framework::OperatorWithKernel {
class CosSimOpMaker : public framework::OpProtoAndCheckerMaker { class CosSimOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
CosSimOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "The 1st input of cos_sim op."); AddInput("X", "The 1st input of cos_sim op.");
AddInput("Y", "The 2nd input of cos_sim op."); AddInput("Y", "The 2nd input of cos_sim op.");
AddOutput("Out", "The output of cos_sim op."); AddOutput("Out", "The output of cos_sim op.");
......
...@@ -18,8 +18,7 @@ namespace paddle { ...@@ -18,8 +18,7 @@ namespace paddle {
namespace operators { namespace operators {
class CRFDecodingOpMaker : public framework::OpProtoAndCheckerMaker { class CRFDecodingOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
CRFDecodingOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Emission", AddInput("Emission",
"(LoDTensor, default: LoDTensor<float>). A LoDTensor with shape " "(LoDTensor, default: LoDTensor<float>). A LoDTensor with shape "
"[N x D] where N is the size of the mini-batch and D is the total " "[N x D] where N is the size of the mini-batch and D is the total "
......
...@@ -52,8 +52,7 @@ class CropOp : public framework::OperatorWithKernel { ...@@ -52,8 +52,7 @@ class CropOp : public framework::OperatorWithKernel {
class CropOpMaker : public framework::OpProtoAndCheckerMaker { class CropOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
CropOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", AddInput("X",
"The input of pad op. " "The input of pad op. "
"The input should be a k-D tensor(k > 0 and k < 7)."); "The input should be a k-D tensor(k > 0 and k < 7).");
......
...@@ -111,8 +111,7 @@ class CrossEntropyGradientOp : public framework::OperatorWithKernel { ...@@ -111,8 +111,7 @@ class CrossEntropyGradientOp : public framework::OperatorWithKernel {
class CrossEntropyOpMaker : public framework::OpProtoAndCheckerMaker { class CrossEntropyOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
CrossEntropyOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", AddInput("X",
"(Tensor, default Tensor<float>), a 2-D tensor with shape [N x D]," "(Tensor, default Tensor<float>), a 2-D tensor with shape [N x D],"
" where N is the batch size and D is the number of classes. " " where N is the batch size and D is the number of classes. "
......
...@@ -44,8 +44,7 @@ class CTCAlignOp : public framework::OperatorWithKernel { ...@@ -44,8 +44,7 @@ class CTCAlignOp : public framework::OperatorWithKernel {
class CTCAlignOpMaker : public framework::OpProtoAndCheckerMaker { class CTCAlignOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
CTCAlignOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Input", AddInput("Input",
"(LodTensor, default: LoDTensor<int>), Its shape is " "(LodTensor, default: LoDTensor<int>), Its shape is "
"[Lp, 1], where Lp is the sum of all input sequences' length."); "[Lp, 1], where Lp is the sum of all input sequences' length.");
......
...@@ -29,8 +29,7 @@ class CumOp : public framework::OperatorWithKernel { ...@@ -29,8 +29,7 @@ class CumOp : public framework::OperatorWithKernel {
class CumsumOpMaker : public framework::OpProtoAndCheckerMaker { class CumsumOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
CumsumOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: framework::OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "Input of Cumsum operator"); AddInput("X", "Input of Cumsum operator");
AddOutput("Out", "Output of Cumsum operator"); AddOutput("Out", "Output of Cumsum operator");
AddAttr<int>("axis", AddAttr<int>("axis",
......
...@@ -62,8 +62,7 @@ class DecayedAdagradOp : public framework::OperatorWithKernel { ...@@ -62,8 +62,7 @@ class DecayedAdagradOp : public framework::OperatorWithKernel {
class DecayedAdagradOpMaker : public framework::OpProtoAndCheckerMaker { class DecayedAdagradOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
DecayedAdagradOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Param", "(Tensor) Input parameter"); AddInput("Param", "(Tensor) Input parameter");
AddInput("Grad", "(Tensor) Input gradient"); AddInput("Grad", "(Tensor) Input gradient");
AddInput("Moment", "(Tensor) Second moment"); AddInput("Moment", "(Tensor) Second moment");
......
...@@ -34,8 +34,7 @@ class DeleteVarOp : public framework::OperatorBase { ...@@ -34,8 +34,7 @@ class DeleteVarOp : public framework::OperatorBase {
class DeleteVarOpInfoMaker : public framework::OpProtoAndCheckerMaker { class DeleteVarOpInfoMaker : public framework::OpProtoAndCheckerMaker {
public: public:
DeleteVarOpInfoMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "The input of delete op").AsDuplicable(); AddInput("X", "The input of delete op").AsDuplicable();
AddComment(R"DOC( AddComment(R"DOC(
Delete Operator. Delete Operator.
......
...@@ -29,129 +29,127 @@ namespace paddle { ...@@ -29,129 +29,127 @@ namespace paddle {
namespace operators { namespace operators {
namespace detail { namespace detail {
using VarMsg = sendrecv::VariableMessage;
void GetTensorPayload(framework::Variable* var,
const platform::DeviceContext& ctx, VarMsg* request,
void** payload, size_t* payload_size) {
auto tensor = var->Get<framework::LoDTensor>();
// FIXME(wuyi): data types in send_recv.proto is copied from
// framework.proto
request->set_data_type(
static_cast<VarMsg::Type>(framework::ToDataType(tensor.type())));
for (auto& dim : framework::vectorize(tensor.dims())) {
request->add_dims(dim);
}
const framework::LoD lod = tensor.lod();
if (lod.size() > 0) {
request->set_lod_level(lod.size());
for (auto& each : lod) {
VarMsg::LodData* lod_inner = request->add_lod();
for (auto& d : each) {
lod_inner->add_lod_data(d);
}
}
}
if (platform::is_gpu_place(ctx.GetPlace())) {
#ifdef PADDLE_WITH_CUDA
PADDLE_ENFORCE(platform::is_gpu_place(tensor.place()));
platform::CPUPlace cpu;
auto& gpu_dev_ctx = static_cast<const platform::CUDADeviceContext&>(ctx);
auto copy_size = tensor.numel() * framework::SizeOfType(tensor.type());
*payload = memory::Alloc(cpu, copy_size);
memory::Copy(cpu, *payload, boost::get<platform::CUDAPlace>(tensor.place()),
reinterpret_cast<const void*>(tensor.data<void>()), copy_size,
gpu_dev_ctx.stream());
ctx.Wait();
#endif
} else {
*payload = tensor.data<void>();
}
*payload_size = tensor.numel() * framework::SizeOfType(tensor.type());
}
void GetSelectedRowsPayload(framework::Variable* var,
const platform::DeviceContext& ctx, VarMsg* request,
void** payload, size_t* payload_size) {
auto* slr = var->GetMutable<framework::SelectedRows>();
request->set_data_type(
static_cast<VarMsg::Type>(framework::ToDataType(slr->value().type())));
request->set_lod_level(0);
request->set_slr_height(slr->height());
for (auto& dim : framework::vectorize(slr->value().dims())) {
request->add_dims(dim);
}
auto* tensor = slr->mutable_value();
if (platform::is_gpu_place(ctx.GetPlace())) {
#ifdef PADDLE_WITH_CUDA
platform::CPUPlace cpu;
auto& gpu_dev_ctx = static_cast<const platform::CUDADeviceContext&>(ctx);
auto copy_size = tensor->numel() * framework::SizeOfType(tensor->type());
*payload = memory::Alloc(cpu, copy_size);
memory::Copy(cpu, *payload,
boost::get<platform::CUDAPlace>(tensor->place()),
reinterpret_cast<const void*>(tensor->data<void>()), copy_size,
gpu_dev_ctx.stream());
ctx.Wait();
#endif
} else {
*payload = slr->mutable_value()->data<void>();
}
*payload_size = tensor->numel() * framework::SizeOfType(tensor->type());
}
void SerializeToByteBuffer(const std::string& name, framework::Variable* var, void SerializeToByteBuffer(const std::string& name, framework::Variable* var,
const platform::DeviceContext& ctx, const platform::DeviceContext& ctx,
::grpc::ByteBuffer* msg, ::grpc::ByteBuffer* msg,
const std::string& out_name) { const std::string& out_name) {
using VarMsg = sendrecv::VariableMessage; // Default DestroyCallback does nothing. When using GPU,
// When using GPU, need to free the copied CPU buffer // the CPU buffer needs to be freed.
// when the ByteBuffer is destroyed
// TODO(typhoonzero): add unref here, if we have dependent
// parallelism execution, need to know when to free the tensor.
DestroyCallback destroy_callback = [](void* backing) {}; DestroyCallback destroy_callback = [](void* backing) {};
VarMsg request;
auto buffer = std::unique_ptr<char[]>(new char[1024]);
void* buf = buffer.get();
void* payload = nullptr; void* payload = nullptr;
size_t payload_size; size_t payload_size;
ProtoEncodeHelper e(static_cast<char*>(buf), 1024);
request.set_varname(name);
// Note: normally the profiler is enabled in 1 trainer, hence only // Note: normally the profiler is enabled in 1 trainer, hence only
// 1 trainer returns true for ShouldSendProfileState(). It tells PS // 1 trainer returns true for ShouldSendProfileState(). It tells PS
// servers the trainer's profiling state so that PS can follow the // servers the trainer's profiling state so that PS can follow the
// trainer. // trainer.
if (platform::ShouldSendProfileState()) { request.set_profile(platform::IsProfileEnabled());
e.WriteBool(VarMsg::kProfileFieldNumber, platform::IsProfileEnabled()); if (!out_name.empty()) {
request.set_out_varname(out_name);
} }
e.WriteString(VarMsg::kVarnameFieldNumber, name);
if (var->IsType<framework::LoDTensor>()) { if (var->IsType<framework::LoDTensor>()) {
e.WriteUint64(VarMsg::kTypeFieldNumber, 0); request.set_type(::sendrecv::LOD_TENSOR);
GetTensorPayload(var, ctx, &request, &payload, &payload_size);
} else if (var->IsType<framework::SelectedRows>()) { } else if (var->IsType<framework::SelectedRows>()) {
e.WriteUint64(VarMsg::kTypeFieldNumber, 1); request.set_type(::sendrecv::SELECTED_ROWS);
GetSelectedRowsPayload(var, ctx, &request, &payload, &payload_size);
} else {
PADDLE_THROW("Serialize does not support type: %s",
typeid(var->Type()).name());
} }
if (!out_name.empty()) { if (platform::is_gpu_place(ctx.GetPlace())) {
e.WriteString(VarMsg::kOutVarnameFieldNumber, out_name); // GPU data is copied to CPU buffer when sending,
// free the buffer when possible.
destroy_callback = [](void* backing) {
platform::CPUPlace cpu;
memory::Free(cpu, backing);
};
} }
switch (framework::ToVarType(var->Type())) {
case framework::proto::VarType_Type_LOD_TENSOR: {
auto tensor = var->Get<framework::LoDTensor>();
e.WriteUint64(VarMsg::kDataTypeFieldNumber,
framework::ToDataType(tensor.type()));
for (auto& dim : framework::vectorize(tensor.dims())) {
e.WriteUint64(VarMsg::kDimsFieldNumber, dim);
}
auto lod = tensor.lod(); // std::vector<Vector<size_t>>
if (lod.size() > 0) {
e.WriteUint64(VarMsg::kLodLevelFieldNumber, lod.size());
for (auto& each : lod) {
e.WriteVarlengthBeginning(VarMsg::kLodFieldNumber,
2 + // tag + varintlength of submessage
1 + // kLodDataFieldNumber
each.size());
// auto copied from GPU
for (auto& d : each) {
e.WriteUint64(VarMsg::LodData::kLodDataFieldNumber, d);
}
}
}
if (platform::is_gpu_place(ctx.GetPlace())) {
#ifdef PADDLE_WITH_CUDA
PADDLE_ENFORCE(platform::is_gpu_place(tensor.place()));
platform::CPUPlace cpu;
auto& gpu_dev_ctx =
static_cast<const platform::CUDADeviceContext&>(ctx);
auto copy_size = tensor.numel() * framework::SizeOfType(tensor.type());
payload = memory::Alloc(cpu, copy_size);
memory::Copy(cpu, payload,
boost::get<platform::CUDAPlace>(tensor.place()),
reinterpret_cast<const void*>(tensor.data<void>()),
copy_size, gpu_dev_ctx.stream());
ctx.Wait();
destroy_callback = [](void* backing) {
platform::CPUPlace cpu;
memory::Free(cpu, backing);
};
#endif std::string header;
} else { request.AppendToString(&header);
payload = tensor.data<void>(); auto buffer = std::unique_ptr<char[]>(new char[1024]);
} void* buf = buffer.get();
payload_size = tensor.numel() * framework::SizeOfType(tensor.type()); ProtoEncodeHelper e(static_cast<char*>(buf), 1024);
e.WriteVarlengthBeginning(VarMsg::kSerializedFieldNumber, payload_size); e.WriteRawBytes(std::string(header.data(), header.size()));
} break; e.WriteVarlengthBeginning(VarMsg::kSerializedFieldNumber, payload_size);
case framework::proto::VarType_Type_SELECTED_ROWS: {
// TODO(typhoonzero): selectedrows implement should not use unique_ptr
auto* slr = var->GetMutable<framework::SelectedRows>();
e.WriteUint64(VarMsg::kDataTypeFieldNumber,
framework::ToDataType(slr->value().type()));
for (auto& dim : framework::vectorize(slr->value().dims())) {
e.WriteUint64(VarMsg::kDimsFieldNumber, dim);
}
e.WriteUint64(VarMsg::kLodLevelFieldNumber, 0);
e.WriteUint64(VarMsg::kSlrHeightFieldNumber, slr->height());
auto* tensor = slr->mutable_value();
if (platform::is_gpu_place(ctx.GetPlace())) {
#ifdef PADDLE_WITH_CUDA
platform::CPUPlace cpu;
auto& gpu_dev_ctx =
static_cast<const platform::CUDADeviceContext&>(ctx);
auto copy_size =
tensor->numel() * framework::SizeOfType(tensor->type());
payload = memory::Alloc(cpu, copy_size);
memory::Copy(cpu, payload,
boost::get<platform::CUDAPlace>(tensor->place()),
reinterpret_cast<const void*>(tensor->data<void>()),
copy_size, gpu_dev_ctx.stream());
ctx.Wait();
destroy_callback = [](void* backing) {
platform::CPUPlace cpu;
memory::Free(cpu, backing);
};
#endif
} else {
payload = slr->mutable_value()->data<void>();
}
payload_size = tensor->numel() * framework::SizeOfType(tensor->type());
e.WriteVarlengthBeginning(VarMsg::kSerializedFieldNumber, payload_size);
} break;
default:
PADDLE_THROW("Serialize does not support type: %s",
typeid(var->Type()).name());
break;
}
// steal reference of tensor data // steal reference of tensor data
::grpc::Slice slices[4]; // metadata, tensor, rows meta, rows ::grpc::Slice slices[4]; // metadata, tensor, rows meta, rows
int num_slices = 2; // only SelectedRows have rows buffer int num_slices = 2; // only SelectedRows have rows buffer
...@@ -162,12 +160,9 @@ void SerializeToByteBuffer(const std::string& name, framework::Variable* var, ...@@ -162,12 +160,9 @@ void SerializeToByteBuffer(const std::string& name, framework::Variable* var,
static_cast<char*>(payload)), static_cast<char*>(payload)),
::grpc::Slice::STEAL_REF); ::grpc::Slice::STEAL_REF);
if (framework::ToVarType(var->Type()) == if (var->IsType<framework::SelectedRows>()) {
framework::proto::VarType_Type_SELECTED_ROWS) {
auto* slr = var->GetMutable<framework::SelectedRows>(); auto* slr = var->GetMutable<framework::SelectedRows>();
ProtoEncodeHelper e2(static_cast<char*>(buf), 128); ProtoEncodeHelper e2(static_cast<char*>(buf), 128);
// NOTE: rows is of type int64_t
size_t rows_memory_size = size_t rows_memory_size =
slr->rows().size() * framework::SizeOfType(typeid(int64_t)); slr->rows().size() * framework::SizeOfType(typeid(int64_t));
e2.WriteVarlengthBeginning(VarMsg::kRowsFieldNumber, rows_memory_size); e2.WriteVarlengthBeginning(VarMsg::kRowsFieldNumber, rows_memory_size);
...@@ -178,10 +173,7 @@ void SerializeToByteBuffer(const std::string& name, framework::Variable* var, ...@@ -178,10 +173,7 @@ void SerializeToByteBuffer(const std::string& name, framework::Variable* var,
grpc_slice_new_with_user_data( grpc_slice_new_with_user_data(
const_cast<void*>( const_cast<void*>(
reinterpret_cast<const void*>(slr->rows().data())), reinterpret_cast<const void*>(slr->rows().data())),
rows_memory_size, rows_memory_size, [](void* backing) {},
[](void* backing) {
// TODO(typhoonzero): add unref here, same as above.
},
const_cast<char*>( const_cast<char*>(
reinterpret_cast<const char*>(slr->rows().data()))), reinterpret_cast<const char*>(slr->rows().data()))),
::grpc::Slice::STEAL_REF); ::grpc::Slice::STEAL_REF);
......
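Note on the serialization change above: the new code appends the protobuf-encoded `VarMsg` header and then calls `WriteVarlengthBeginning` before attaching the tensor payload as its own slice. The following is a minimal sketch of what such a length-delimited prefix looks like on the wire, assuming standard protobuf encoding (field key `(field_number << 3) | 2` followed by the payload length as a varint). `AppendVarint`, `AppendLengthDelimitedPrefix`, and the field number used in the usage note are hypothetical names for illustration, not Paddle's `ProtoEncodeHelper` API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Appends v as a base-128 varint, least significant 7-bit group first.
void AppendVarint(uint64_t v, std::vector<uint8_t>* out) {
  assert(out != nullptr);
  while (v >= 0x80) {
    out->push_back(static_cast<uint8_t>((v & 0x7F) | 0x80));
    v >>= 7;
  }
  out->push_back(static_cast<uint8_t>(v));
}

// Appends the key and length of a length-delimited field (wire type 2).
// The payload bytes themselves are NOT written here; in the diff they
// travel in a separate slice so the tensor data is not copied.
void AppendLengthDelimitedPrefix(uint32_t field_number, uint64_t payload_size,
                                 std::vector<uint8_t>* out) {
  const uint64_t kWireTypeLengthDelimited = 2;
  AppendVarint((static_cast<uint64_t>(field_number) << 3) |
                   kWireTypeLengthDelimited,
               out);
  AppendVarint(payload_size, out);
}
```

For a hypothetical field number 8 and a 300-byte payload, the prefix is the three bytes `0x42 0xAC 0x02`; a receiver that concatenates the prefix and the payload slice sees one ordinary length-delimited field.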
...@@ -117,11 +117,11 @@ void RunTestLodTensor(platform::Place place, int from_type = 0) { ...@@ -117,11 +117,11 @@ void RunTestLodTensor(platform::Place place, int from_type = 0) {
// serialize var to ByteBuffer // serialize var to ByteBuffer
framework::Variable var; framework::Variable var;
auto* tensor = var.GetMutable<framework::LoDTensor>(); auto* tensor = var.GetMutable<framework::LoDTensor>();
tensor->Resize(framework::make_ddim({4, 8, 4, 2})); tensor->Resize(framework::make_ddim({512, 8, 4, 2}));
framework::LoD lod; framework::LoD lod;
lod.push_back(framework::Vector<size_t>({1, 3, 8})); lod.push_back(framework::Vector<size_t>({1, 3, 8}));
tensor->set_lod(lod); tensor->set_lod(lod);
int tensor_numel = 4 * 8 * 4 * 2; int tensor_numel = 512 * 8 * 4 * 2;
platform::DeviceContextPool& pool = platform::DeviceContextPool::Instance(); platform::DeviceContextPool& pool = platform::DeviceContextPool::Instance();
auto& ctx = *pool.Get(place); auto& ctx = *pool.Get(place);
tensor->mutable_data<float>(place); tensor->mutable_data<float>(place);
...@@ -142,7 +142,7 @@ void RunTestLodTensor(platform::Place place, int from_type = 0) { ...@@ -142,7 +142,7 @@ void RunTestLodTensor(platform::Place place, int from_type = 0) {
EXPECT_TRUE(varmsg.ParseFromString(tmp)); EXPECT_TRUE(varmsg.ParseFromString(tmp));
EXPECT_EQ(varmsg.varname(), "myvar"); EXPECT_EQ(varmsg.varname(), "myvar");
EXPECT_EQ(varmsg.type(), 0); EXPECT_EQ(varmsg.type(), 0);
EXPECT_EQ(varmsg.dims()[0], 4); EXPECT_EQ(varmsg.dims()[0], 512);
EXPECT_EQ(varmsg.dims()[1], 8); EXPECT_EQ(varmsg.dims()[1], 8);
EXPECT_EQ(varmsg.dims()[2], 4); EXPECT_EQ(varmsg.dims()[2], 4);
EXPECT_EQ(varmsg.dims()[3], 2); EXPECT_EQ(varmsg.dims()[3], 2);
......
...@@ -210,15 +210,15 @@ bool ParseLodData(::google::protobuf::io::CodedInputStream* input, ...@@ -210,15 +210,15 @@ bool ParseLodData(::google::protobuf::io::CodedInputStream* input,
} }
if (wt == WIRETYPE_LENGTH_DELIMITED) { if (wt == WIRETYPE_LENGTH_DELIMITED) {
int length = 0; int num_bytes = 0;
if (!input->ReadVarintSizeAsInt(&length)) { if (!input->ReadVarintSizeAsInt(&num_bytes)) {
return tag; return tag;
} }
int start_pos = input->CurrentPosition();
for (int i = 0; i < length; i++) { while (input->CurrentPosition() - start_pos < num_bytes) {
uint64_t v; uint64_t v;
if (!input->ReadVarint64(&v)) { if (!input->ReadVarint64(&v)) {
return false; return tag;
} }
lod->push_back(v); lod->push_back(v);
} }
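Note on the loop change above: the length prefix of a packed repeated field counts bytes, not elements, so iterating `length` times over-reads whenever an element needs more than one byte. The sketch below shows the byte-bounded loop on a plain byte vector, assuming standard protobuf varint encoding. `ReadVarint64` and `ReadPackedVarints` here are hypothetical stand-ins written for illustration, not the `CodedInputStream` methods used in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decodes one base-128 varint starting at data[*pos] and advances *pos.
// Returns false if the input ends early or the varint exceeds 10 bytes.
bool ReadVarint64(const std::vector<uint8_t>& data, std::size_t* pos,
                  uint64_t* out) {
  uint64_t result = 0;
  for (int shift = 0; shift < 64; shift += 7) {
    if (*pos >= data.size()) return false;
    const uint8_t byte = data[(*pos)++];
    result |= static_cast<uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) {
      *out = result;
      return true;
    }
  }
  return false;
}

// Decodes a packed run occupying exactly num_bytes bytes from `start`.
// The loop is bounded by bytes consumed, mirroring the
// CurrentPosition() - start_pos < num_bytes condition in the diff.
bool ReadPackedVarints(const std::vector<uint8_t>& data, std::size_t start,
                       std::size_t num_bytes, std::vector<uint64_t>* values) {
  assert(values != nullptr);
  if (start > data.size() || num_bytes > data.size() - start) return false;
  std::size_t pos = start;
  while (pos - start < num_bytes) {
    uint64_t v = 0;
    if (!ReadVarint64(data, &pos, &v)) return false;
    values->push_back(v);
  }
  // A varint that straddles the end of the run is a malformed message.
  return pos - start == num_bytes;
}
```

With the payload `{0x01, 0x03, 0xAC, 0x02}` the run is 4 bytes long but holds only three values (1, 3, 300); a loop that counted four elements would read past the end of the field.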
...@@ -275,8 +275,8 @@ int VariableResponse::Parse(Source* source) { ...@@ -275,8 +275,8 @@ int VariableResponse::Parse(Source* source) {
break; break;
} }
case sendrecv::VariableMessage::kTypeFieldNumber: { case sendrecv::VariableMessage::kTypeFieldNumber: {
uint64_t v; uint32_t v;
if ((wt != WIRETYPE_VARINT) || !input.ReadVarint64(&v)) { if ((wt != WIRETYPE_VARINT) || !input.ReadVarint32(&v)) {
return tag; return tag;
} }
...@@ -284,8 +284,8 @@ int VariableResponse::Parse(Source* source) { ...@@ -284,8 +284,8 @@ int VariableResponse::Parse(Source* source) {
break; break;
} }
case sendrecv::VariableMessage::kDataTypeFieldNumber: { case sendrecv::VariableMessage::kDataTypeFieldNumber: {
uint64_t v = 0; uint32_t v = 0;
if ((wt != WIRETYPE_VARINT) || !input.ReadVarint64(&v)) { if ((wt != WIRETYPE_VARINT) || !input.ReadVarint32(&v)) {
return tag; return tag;
} }
...@@ -305,11 +305,12 @@ int VariableResponse::Parse(Source* source) { ...@@ -305,11 +305,12 @@ int VariableResponse::Parse(Source* source) {
// packed // packed
if (wt == WIRETYPE_LENGTH_DELIMITED) { if (wt == WIRETYPE_LENGTH_DELIMITED) {
int length = 0; int num_bytes = 0;
if (!input.ReadVarintSizeAsInt(&length)) { if (!input.ReadVarintSizeAsInt(&num_bytes)) {
return tag; return tag;
} }
for (int i = 0; i < length; i++) { int start_pos = input.CurrentPosition();
while (input.CurrentPosition() - start_pos < num_bytes) {
uint64_t v; uint64_t v;
if (!input.ReadVarint64(&v)) { if (!input.ReadVarint64(&v)) {
return tag; return tag;
...@@ -318,7 +319,6 @@ int VariableResponse::Parse(Source* source) { ...@@ -318,7 +319,6 @@ int VariableResponse::Parse(Source* source) {
} }
break; break;
} }
return tag; return tag;
} }
case sendrecv::VariableMessage::kLodLevelFieldNumber: { case sendrecv::VariableMessage::kLodLevelFieldNumber: {
...@@ -372,9 +372,9 @@ int VariableResponse::Parse(Source* source) { ...@@ -372,9 +372,9 @@ int VariableResponse::Parse(Source* source) {
meta_.varname() != "", meta_.varname() != "",
"meta info should be got first!"); "meta info should be got first!");
int length = 0; int num_bytes = 0;
if (wt != WIRETYPE_LENGTH_DELIMITED || if (wt != WIRETYPE_LENGTH_DELIMITED ||
!ReadVarintSizeAsInt(&input, &length)) { !ReadVarintSizeAsInt(&input, &num_bytes)) {
return tag; return tag;
} }
...@@ -382,14 +382,14 @@ int VariableResponse::Parse(Source* source) { ...@@ -382,14 +382,14 @@ int VariableResponse::Parse(Source* source) {
if (meta_.type() == sendrecv::LOD_TENSOR) { if (meta_.type() == sendrecv::LOD_TENSOR) {
PADDLE_ENFORCE(meta_.lod_size() >= 0, PADDLE_ENFORCE(meta_.lod_size() >= 0,
"lod info should be got first!"); "lod info should be got first!");
if (!CopyLodTensorData(&input, *dev_ctx_, dims, length)) { if (!CopyLodTensorData(&input, *dev_ctx_, dims, num_bytes)) {
return tag; return tag;
} }
break; break;
} }
if (meta_.type() == sendrecv::SELECTED_ROWS) { if (meta_.type() == sendrecv::SELECTED_ROWS) {
if (!CopySelectRowsTensorData(&input, *dev_ctx_, dims, length)) { if (!CopySelectRowsTensorData(&input, *dev_ctx_, dims, num_bytes)) {
return tag; return tag;
} }
break; break;
...@@ -403,13 +403,13 @@ int VariableResponse::Parse(Source* source) { ...@@ -403,13 +403,13 @@ int VariableResponse::Parse(Source* source) {
meta_.varname() != "", meta_.varname() != "",
"meta info should be got first!"); "meta info should be got first!");
int length = 0; int num_bytes = 0;
if (wt != WIRETYPE_LENGTH_DELIMITED || if (wt != WIRETYPE_LENGTH_DELIMITED ||
!ReadVarintSizeAsInt(&input, &length)) { !ReadVarintSizeAsInt(&input, &num_bytes)) {
return tag; return tag;
} }
if (!CopySelectRowsData(&input, *dev_ctx_, length)) { if (!CopySelectRowsData(&input, *dev_ctx_, num_bytes)) {
return tag; return tag;
} }
break; break;
......
...@@ -78,8 +78,7 @@ class DetectionMAPOp : public framework::OperatorWithKernel { ...@@ -78,8 +78,7 @@ class DetectionMAPOp : public framework::OperatorWithKernel {
class DetectionMAPOpMaker : public framework::OpProtoAndCheckerMaker { class DetectionMAPOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
DetectionMAPOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("DetectRes", AddInput("DetectRes",
"(LoDTensor) A 2-D LoDTensor with shape [M, 6] represents the " "(LoDTensor) A 2-D LoDTensor with shape [M, 6] represents the "
"detections. Each row has 6 values: " "detections. Each row has 6 values: "
......
...@@ -37,8 +37,7 @@ class DropoutOp : public framework::OperatorWithKernel { ...@@ -37,8 +37,7 @@ class DropoutOp : public framework::OperatorWithKernel {
class DropoutOpMaker : public framework::OpProtoAndCheckerMaker { class DropoutOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
DropoutOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "The input of dropout op."); AddInput("X", "The input of dropout op.");
AddOutput("Out", "The output of dropout op."); AddOutput("Out", "The output of dropout op.");
AddOutput("Mask", "The random sampled dropout mask.").AsIntermediate(); AddOutput("Mask", "The random sampled dropout mask.").AsIntermediate();
......
...@@ -49,8 +49,7 @@ class EditDistanceOp : public framework::OperatorWithKernel { ...@@ -49,8 +49,7 @@ class EditDistanceOp : public framework::OperatorWithKernel {
class EditDistanceOpMaker : public framework::OpProtoAndCheckerMaker { class EditDistanceOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
EditDistanceOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Hyps", AddInput("Hyps",
"(2-D LoDTensor<int64_t>, 2nd dim. equal to 1) " "(2-D LoDTensor<int64_t>, 2nd dim. equal to 1) "
"The indices for hypothesis strings."); "The indices for hypothesis strings.");
......
...@@ -14,26 +14,8 @@ limitations under the License. */ ...@@ -14,26 +14,8 @@ limitations under the License. */
#include "paddle/fluid/operators/elementwise_add_op.h" #include "paddle/fluid/operators/elementwise_add_op.h"
#include "paddle/fluid/operators/elementwise_op.h" #include "paddle/fluid/operators/elementwise_op.h"
namespace paddle {
namespace operators {
class ElementwiseAddOpMaker : public ElementwiseOpMaker {
public:
ElementwiseAddOpMaker(OpProto* proto, OpAttrChecker* op_checker)
: ElementwiseOpMaker(proto, op_checker) {
SetComment("Add", "Out = X + Y");
AddComment(comment_);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OPERATOR(elementwise_add, ops::ElementwiseOp, REGISTER_ELEMWISE_OP(elementwise_add, "Add", "Out = X + Y");
ops::ElementwiseAddOpMaker, ops::ElementwiseOpInferVarType,
paddle::framework::DefaultGradOpDescMaker<true>);
REGISTER_OPERATOR(elementwise_add_grad, ops::ElementwiseOpGrad);
REGISTER_OP_CPU_KERNEL( REGISTER_OP_CPU_KERNEL(
elementwise_add, elementwise_add,
ops::ElementwiseAddKernel<paddle::platform::CPUDeviceContext, float>, ops::ElementwiseAddKernel<paddle::platform::CPUDeviceContext, float>,
......
...@@ -14,26 +14,8 @@ limitations under the License. */ ...@@ -14,26 +14,8 @@ limitations under the License. */
#include "paddle/fluid/operators/elementwise_div_op.h" #include "paddle/fluid/operators/elementwise_div_op.h"
#include "paddle/fluid/operators/elementwise_op.h" #include "paddle/fluid/operators/elementwise_op.h"
namespace paddle {
namespace operators {
class ElementwiseDivOpMaker : public ElementwiseOpMaker {
public:
ElementwiseDivOpMaker(OpProto* proto, OpAttrChecker* op_checker)
: ElementwiseOpMaker(proto, op_checker) {
SetComment("Div", "Out = X / Y");
AddComment(comment_);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OPERATOR(elementwise_div, ops::ElementwiseOp, REGISTER_ELEMWISE_OP(elementwise_div, "Div", "Out = X / Y");
ops::ElementwiseDivOpMaker,
paddle::framework::DefaultGradOpDescMaker<true>);
REGISTER_OPERATOR(elementwise_div_grad, ops::ElementwiseOpGrad);
REGISTER_OP_CPU_KERNEL( REGISTER_OP_CPU_KERNEL(
elementwise_div, elementwise_div,
ops::ElementwiseDivKernel<paddle::platform::CPUDeviceContext, float>, ops::ElementwiseDivKernel<paddle::platform::CPUDeviceContext, float>,
......
...@@ -14,25 +14,8 @@ limitations under the License. */ ...@@ -14,25 +14,8 @@ limitations under the License. */
#include "paddle/fluid/operators/elementwise_max_op.h" #include "paddle/fluid/operators/elementwise_max_op.h"
#include "paddle/fluid/operators/elementwise_op.h" #include "paddle/fluid/operators/elementwise_op.h"
namespace paddle {
namespace operators {
class ElementwiseMaxOpMaker : public ElementwiseOpMaker {
public:
ElementwiseMaxOpMaker(OpProto* proto, OpAttrChecker* op_checker)
: ElementwiseOpMaker(proto, op_checker) {
SetComment("Max", "Out = max(X, Y)");
AddComment(comment_);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OPERATOR(elementwise_max, ops::ElementwiseOp, REGISTER_ELEMWISE_OP(elementwise_max, "Max", "Out = max(X, Y)");
ops::ElementwiseMaxOpMaker,
paddle::framework::DefaultGradOpDescMaker<true>);
REGISTER_OPERATOR(elementwise_max_grad, ops::ElementwiseOpGrad);
REGISTER_OP_CPU_KERNEL( REGISTER_OP_CPU_KERNEL(
elementwise_max, elementwise_max,
ops::ElementwiseMaxKernel<paddle::platform::CPUDeviceContext, float>, ops::ElementwiseMaxKernel<paddle::platform::CPUDeviceContext, float>,
......
...@@ -14,25 +14,8 @@ limitations under the License. */ ...@@ -14,25 +14,8 @@ limitations under the License. */
#include "paddle/fluid/operators/elementwise_min_op.h" #include "paddle/fluid/operators/elementwise_min_op.h"
#include "paddle/fluid/operators/elementwise_op.h" #include "paddle/fluid/operators/elementwise_op.h"
namespace paddle {
namespace operators {
class ElementwiseMinOpMaker : public ElementwiseOpMaker {
public:
ElementwiseMinOpMaker(OpProto* proto, OpAttrChecker* op_checker)
: ElementwiseOpMaker(proto, op_checker) {
SetComment("Max", "Out = min(X, Y)");
AddComment(comment_);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OPERATOR(elementwise_min, ops::ElementwiseOp, REGISTER_ELEMWISE_OP(elementwise_min, "Min", "Out = min(X, Y)");
ops::ElementwiseMinOpMaker,
paddle::framework::DefaultGradOpDescMaker<true>);
REGISTER_OPERATOR(elementwise_min_grad, ops::ElementwiseOpGrad);
REGISTER_OP_CPU_KERNEL( REGISTER_OP_CPU_KERNEL(
elementwise_min, elementwise_min,
ops::ElementwiseMinKernel<paddle::platform::CPUDeviceContext, float>, ops::ElementwiseMinKernel<paddle::platform::CPUDeviceContext, float>,
......
...@@ -14,27 +14,8 @@ limitations under the License. */ ...@@ -14,27 +14,8 @@ limitations under the License. */
#include "paddle/fluid/operators/elementwise_mul_op.h" #include "paddle/fluid/operators/elementwise_mul_op.h"
#include "paddle/fluid/operators/elementwise_op.h" #include "paddle/fluid/operators/elementwise_op.h"
namespace paddle {
namespace operators {
class ElementwiseMulOpMaker : public ElementwiseOpMaker {
public:
ElementwiseMulOpMaker(OpProto* proto, OpAttrChecker* op_checker)
: ElementwiseOpMaker(proto, op_checker) {
SetComment("Mul", "Out = X \\odot\\ Y");
AddComment(comment_);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OPERATOR(elementwise_mul, ops::ElementwiseOp, REGISTER_ELEMWISE_OP(elementwise_mul, "Mul", "Out = X \\odot\\ Y");
ops::ElementwiseMulOpMaker,
paddle::framework::DefaultGradOpDescMaker<true>);
REGISTER_OPERATOR(elementwise_mul_grad, ops::ElementwiseOpGrad);
REGISTER_OP_CPU_KERNEL( REGISTER_OP_CPU_KERNEL(
elementwise_mul, elementwise_mul,
ops::ElementwiseMulKernel<paddle::platform::CPUDeviceContext, float>, ops::ElementwiseMulKernel<paddle::platform::CPUDeviceContext, float>,
......
...@@ -54,8 +54,7 @@ class ElementwiseOpInferVarType : public framework::VarTypeInference { ...@@ -54,8 +54,7 @@ class ElementwiseOpInferVarType : public framework::VarTypeInference {
class ElementwiseOpMaker : public framework::OpProtoAndCheckerMaker { class ElementwiseOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
ElementwiseOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() final {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "(Tensor), The first input tensor of elementwise op."); AddInput("X", "(Tensor), The first input tensor of elementwise op.");
AddInput("Y", "(Tensor), The second input tensor of elementwise op."); AddInput("Y", "(Tensor), The second input tensor of elementwise op.");
AddOutput("Out", "The output of elementwise op."); AddOutput("Out", "The output of elementwise op.");
...@@ -64,12 +63,12 @@ class ElementwiseOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -64,12 +63,12 @@ class ElementwiseOpMaker : public framework::OpProtoAndCheckerMaker {
"for broadcasting Y onto X.") "for broadcasting Y onto X.")
.SetDefault(-1) .SetDefault(-1)
.EqualGreaterThan(-1); .EqualGreaterThan(-1);
comment_ = R"DOC( AddComment(string::Sprintf(R"DOC(
Limited Elementwise {name} Operator. Limited Elementwise %s Operator.
The equation is: The equation is:
$${equation}$$ $$%s$$
$X$ is a tensor of any dimension and the dimensions of tensor $Y$ must be $X$ is a tensor of any dimension and the dimensions of tensor $Y$ must be
smaller than or equal to the dimensions of $X$. smaller than or equal to the dimensions of $X$.
...@@ -100,26 +99,13 @@ For example ...@@ -100,26 +99,13 @@ For example
Either of the inputs $X$ and $Y$ or none can carry the LoD (Level of Details) Either of the inputs $X$ and $Y$ or none can carry the LoD (Level of Details)
information. However, the output only shares the LoD information with input $X$. information. However, the output only shares the LoD information with input $X$.
)DOC"; )DOC",
AddComment(comment_); GetName(), GetEquation()));
} }
protected: protected:
std::string comment_; virtual std::string GetName() const = 0;
virtual std::string GetEquation() const = 0;
void Replace(std::string* src, std::string from, std::string to) {
std::size_t len_from = std::strlen(from.c_str());
std::size_t len_to = std::strlen(to.c_str());
for (std::size_t pos = src->find(from); pos != std::string::npos;
pos = src->find(from, pos + len_to)) {
src->replace(pos, len_from, to);
}
}
void SetComment(std::string name, std::string equation) {
Replace(&comment_, "{name}", name);
Replace(&comment_, "{equation}", equation);
}
}; };
class ElementwiseOpGrad : public framework::OperatorWithKernel { class ElementwiseOpGrad : public framework::OperatorWithKernel {
...@@ -152,3 +138,16 @@ class ElementwiseOpGrad : public framework::OperatorWithKernel { ...@@ -152,3 +138,16 @@ class ElementwiseOpGrad : public framework::OperatorWithKernel {
}; };
} // namespace operators } // namespace operators
} // namespace paddle } // namespace paddle
#define REGISTER_ELEMWISE_OP(op_type, op_name, equation) \
class __ElemwiseOp##op_type##Maker__ \
: public ::paddle::operators::ElementwiseOpMaker { \
protected: \
virtual std::string GetName() const { return op_name; } \
virtual std::string GetEquation() const { return equation; } \
}; \
REGISTER_OPERATOR(op_type, ::paddle::operators::ElementwiseOp, \
__ElemwiseOp##op_type##Maker__, \
::paddle::operators::ElementwiseOpInferVarType, \
::paddle::framework::DefaultGradOpDescMaker<true>); \
REGISTER_OPERATOR(op_type##_grad, ::paddle::operators::ElementwiseOpGrad)
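Note on the macro above: it replaces the per-operator maker classes with a generated subclass that only overrides `GetName()` and `GetEquation()`, while the base class formats the shared documentation once. The sketch below reproduces that pattern in isolation. `DocMakerBase`, `DEFINE_DOC_MAKER`, and the shortened template text are hypothetical simplifications; operator registration and `string::Sprintf` are replaced by a plain `std::snprintf` call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical stand-in for ElementwiseOpMaker: the base class owns the
// documentation template and asks subclasses only for the variable parts.
class DocMakerBase {
 public:
  virtual ~DocMakerBase() = default;

  std::string Make() const {
    const std::string name = GetName();
    const std::string equation = GetEquation();
    const char* fmt =
        "Limited Elementwise %s Operator.\n"
        "The equation is: $$%s$$\n";
    // First call measures the formatted length, second call writes it.
    const int n =
        std::snprintf(nullptr, 0, fmt, name.c_str(), equation.c_str());
    assert(n >= 0);
    std::string out(static_cast<std::size_t>(n) + 1, '\0');
    std::snprintf(&out[0], out.size(), fmt, name.c_str(), equation.c_str());
    out.resize(static_cast<std::size_t>(n));  // drop the trailing NUL
    return out;
  }

 protected:
  virtual std::string GetName() const = 0;
  virtual std::string GetEquation() const = 0;
};

// Generates one subclass per operator, as REGISTER_ELEMWISE_OP does.
#define DEFINE_DOC_MAKER(op_type, op_name, equation)              \
  class op_type##DocMaker : public DocMakerBase {                 \
   protected:                                                     \
    std::string GetName() const override { return op_name; }      \
    std::string GetEquation() const override { return equation; } \
  }

DEFINE_DOC_MAKER(elementwise_min, "Min", "Out = min(X, Y)");
```

Calling `elementwise_minDocMaker().Make()` returns the template with "Min" and the equation filled in. Because the name and the equation sit side by side at a single call site, a mismatch such as the removed `SetComment("Max", "Out = min(X, Y)")` is easier to spot.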
...@@ -13,17 +13,15 @@ See the License for the specific language governing permissions and ...@@ -13,17 +13,15 @@ See the License for the specific language governing permissions and
limitations under the License. */ limitations under the License. */
#include "paddle/fluid/operators/elementwise_pow_op.h" #include "paddle/fluid/operators/elementwise_pow_op.h"
#include <string>
#include "paddle/fluid/operators/elementwise_op.h" #include "paddle/fluid/operators/elementwise_op.h"
namespace paddle { namespace paddle {
namespace operators { namespace operators {
class ElementwisePowOpMaker : public ElementwiseOpMaker { class ElementwisePowOpMaker : public ElementwiseOpMaker {
public: protected:
ElementwisePowOpMaker(OpProto* proto, OpAttrChecker* op_checker) std::string GetName() const override { return "Pow"; }
: ElementwiseOpMaker(proto, op_checker) { std::string GetEquation() const override { return "Out = X ^ Y"; }
SetComment("Pow", "Out = X ^ Y");
AddComment(comment_);
}
}; };
} // namespace operators } // namespace operators
} // namespace paddle } // namespace paddle
......
...@@ -14,25 +14,8 @@ limitations under the License. */ ...@@ -14,25 +14,8 @@ limitations under the License. */
#include "paddle/fluid/operators/elementwise_sub_op.h" #include "paddle/fluid/operators/elementwise_sub_op.h"
#include "paddle/fluid/operators/elementwise_op.h" #include "paddle/fluid/operators/elementwise_op.h"
namespace paddle {
namespace operators {
class ElementwiseSubOpMaker : public ElementwiseOpMaker {
public:
ElementwiseSubOpMaker(OpProto* proto, OpAttrChecker* op_checker)
: ElementwiseOpMaker(proto, op_checker) {
SetComment("Sub", "Out = X - Y");
AddComment(comment_);
}
};
} // namespace operators
} // namespace paddle
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OPERATOR(elementwise_sub, ops::ElementwiseOp, REGISTER_ELEMWISE_OP(elementwise_sub, "Sub", "Out = X - Y");
ops::ElementwiseSubOpMaker,
paddle::framework::DefaultGradOpDescMaker<true>);
REGISTER_OPERATOR(elementwise_sub_grad, ops::ElementwiseOpGrad);
REGISTER_OP_CPU_KERNEL( REGISTER_OP_CPU_KERNEL(
elementwise_sub, elementwise_sub,
ops::ElementwiseSubKernel<paddle::platform::CPUDeviceContext, float>, ops::ElementwiseSubKernel<paddle::platform::CPUDeviceContext, float>,
......
...@@ -56,8 +56,7 @@ class ExpandOp : public framework::OperatorWithKernel { ...@@ -56,8 +56,7 @@ class ExpandOp : public framework::OperatorWithKernel {
class ExpandOpMaker : public framework::OpProtoAndCheckerMaker { class ExpandOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
ExpandOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", AddInput("X",
"(Tensor, default Tensor<float>). A tensor with rank in [1, 6]." "(Tensor, default Tensor<float>). A tensor with rank in [1, 6]."
"X is the input to be expanded."); "X is the input to be expanded.");
......
...@@ -72,8 +72,7 @@ framework::OpKernelType FCOpGrad::GetExpectedKernelType( ...@@ -72,8 +72,7 @@ framework::OpKernelType FCOpGrad::GetExpectedKernelType(
layout, library); layout, library);
} }
FCOpMaker::FCOpMaker(OpProto* proto, OpAttrChecker* op_checker) void FCOpMaker::Make() {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Input", "(Tensor) The input tensor of fully connected operator. "); AddInput("Input", "(Tensor) The input tensor of fully connected operator. ");
AddInput("W", "(Tensor), The second input tensor of fc op."); AddInput("W", "(Tensor), The second input tensor of fc op.");
AddOutput("Out", "(Tensor) The output tensor of fully connected operator. "); AddOutput("Out", "(Tensor) The output tensor of fully connected operator. ");
......
...@@ -45,7 +45,7 @@ class FCOpGrad : public framework::OperatorWithKernel { ...@@ -45,7 +45,7 @@ class FCOpGrad : public framework::OperatorWithKernel {
class FCOpMaker : public framework::OpProtoAndCheckerMaker { class FCOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
FCOpMaker(OpProto* proto, OpAttrChecker* op_checker); void Make() override;
}; };
} // namespace operators } // namespace operators
......
...@@ -66,8 +66,7 @@ class FeedOp : public framework::OperatorBase { ...@@ -66,8 +66,7 @@ class FeedOp : public framework::OperatorBase {
class FeedOpInfoMaker : public framework::OpProtoAndCheckerMaker { class FeedOpInfoMaker : public framework::OpProtoAndCheckerMaker {
public: public:
FeedOpInfoMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "The input of feed op"); AddInput("X", "The input of feed op");
AddOutput("Out", "The output of feed op"); AddOutput("Out", "The output of feed op");
AddAttr<int>("col", "(int) The column of feed"); AddAttr<int>("col", "(int) The column of feed");
......
...@@ -66,8 +66,7 @@ class FetchOp : public framework::OperatorBase { ...@@ -66,8 +66,7 @@ class FetchOp : public framework::OperatorBase {
class FetchOpInfoMaker : public framework::OpProtoAndCheckerMaker { class FetchOpInfoMaker : public framework::OpProtoAndCheckerMaker {
public: public:
FetchOpInfoMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "The input of fetch op"); AddInput("X", "The input of fetch op");
AddOutput("Out", "The output of fetch op"); AddOutput("Out", "The output of fetch op");
AddAttr<int>("col", "(int) The column of fetch"); AddAttr<int>("col", "(int) The column of fetch");
......
...@@ -30,9 +30,8 @@ class FillConstantBatchSizeLikeOp : public BatchSizeLikeOp { ...@@ -30,9 +30,8 @@ class FillConstantBatchSizeLikeOp : public BatchSizeLikeOp {
}; };
class FillConstantBatchSizeLikeOpMaker : public BatchSizeLikeOpMaker { class FillConstantBatchSizeLikeOpMaker : public BatchSizeLikeOpMaker {
public: protected:
FillConstantBatchSizeLikeOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Apply() override {
: BatchSizeLikeOpMaker(proto, op_checker) {
AddAttr<int>("dtype", AddAttr<int>("dtype",
"(int, default 5 (FP32)) " "(int, default 5 (FP32)) "
"Output data type") "Output data type")
......
...@@ -59,8 +59,7 @@ class FillConstantOp : public framework::OperatorBase { ...@@ -59,8 +59,7 @@ class FillConstantOp : public framework::OperatorBase {
class FillConstantOpMaker : public framework::OpProtoAndCheckerMaker { class FillConstantOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
FillConstantOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: framework::OpProtoAndCheckerMaker(proto, op_checker) {
AddAttr<int>("dtype", AddAttr<int>("dtype",
"(int, default 5 (FP32)) " "(int, default 5 (FP32)) "
"Output data type") "Output data type")
......
...@@ -82,8 +82,7 @@ class FillOp : public framework::OperatorBase { ...@@ -82,8 +82,7 @@ class FillOp : public framework::OperatorBase {
class FillOpMaker : public framework::OpProtoAndCheckerMaker { class FillOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
FillOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddComment(R"DOC(Fill operator AddComment(R"DOC(Fill operator
Fill an tensor with `value` and `shape`. The type of the tensor is specify by Fill an tensor with `value` and `shape`. The type of the tensor is specify by
......
...@@ -33,8 +33,7 @@ class FillZerosLikeOp : public framework::OperatorWithKernel { ...@@ -33,8 +33,7 @@ class FillZerosLikeOp : public framework::OperatorWithKernel {
class FillZerosLikeOpMaker : public framework::OpProtoAndCheckerMaker { class FillZerosLikeOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
FillZerosLikeOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: framework::OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "The input of fill-zeros-like op."); AddInput("X", "The input of fill-zeros-like op.");
AddOutput("Out", "The variable will be filled up with zeros."); AddOutput("Out", "The variable will be filled up with zeros.");
AddComment(R"DOC( AddComment(R"DOC(
......
...@@ -64,8 +64,7 @@ class FTRLOp : public framework::OperatorWithKernel { ...@@ -64,8 +64,7 @@ class FTRLOp : public framework::OperatorWithKernel {
class FTRLOpMaker : public framework::OpProtoAndCheckerMaker { class FTRLOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
FTRLOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Param", AddInput("Param",
"(Tensor, default Tensor<float>) " "(Tensor, default Tensor<float>) "
"Input parameter value that has to be updated."); "Input parameter value that has to be updated.");
......
...@@ -67,8 +67,7 @@ class GatherGradOp : public framework::OperatorWithKernel { ...@@ -67,8 +67,7 @@ class GatherGradOp : public framework::OperatorWithKernel {
class GatherOpMaker : public framework::OpProtoAndCheckerMaker { class GatherOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
GatherOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "The source input of gather op"); AddInput("X", "The source input of gather op");
AddInput("Index", "The index input of gather op"); AddInput("Index", "The index input of gather op");
AddOutput("Out", "The output of gather op"); AddOutput("Out", "The output of gather op");
......
...@@ -32,9 +32,8 @@ class GaussianRandomBatchSizeLikeOp : public BatchSizeLikeOp { ...@@ -32,9 +32,8 @@ class GaussianRandomBatchSizeLikeOp : public BatchSizeLikeOp {
}; };
class GaussianRandomBatchSizeLikeOpMaker : public BatchSizeLikeOpMaker { class GaussianRandomBatchSizeLikeOpMaker : public BatchSizeLikeOpMaker {
public: protected:
GaussianRandomBatchSizeLikeOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Apply() override {
: BatchSizeLikeOpMaker(proto, op_checker) {
AddAttr<float>("mean", AddAttr<float>("mean",
"(float, default 0.0) " "(float, default 0.0) "
"mean of random tensor.") "mean of random tensor.")
......
...@@ -70,8 +70,7 @@ class GaussianRandomOp : public framework::OperatorWithKernel { ...@@ -70,8 +70,7 @@ class GaussianRandomOp : public framework::OperatorWithKernel {
class GaussianRandomOpMaker : public framework::OpProtoAndCheckerMaker { class GaussianRandomOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
GaussianRandomOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: framework::OpProtoAndCheckerMaker(proto, op_checker) {
AddOutput("Out", "Output matrix of gaussian random op"); AddOutput("Out", "Output matrix of gaussian random op");
AddAttr<std::vector<int>>("shape", AddAttr<std::vector<int>>("shape",
......
...@@ -78,8 +78,7 @@ class GetPlacesOp : public framework::OperatorBase { ...@@ -78,8 +78,7 @@ class GetPlacesOp : public framework::OperatorBase {
class GetPlacesOpProtoMaker : public framework::OpProtoAndCheckerMaker { class GetPlacesOpProtoMaker : public framework::OpProtoAndCheckerMaker {
public: public:
GetPlacesOpProtoMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddOutput("Out", "vector of Place"); AddOutput("Out", "vector of Place");
AddAttr<int>("device_count", "device count").SetDefault(0); AddAttr<int>("device_count", "device count").SetDefault(0);
AddAttr<std::string>("device_type", "device type") AddAttr<std::string>("device_type", "device type")
......
...@@ -89,8 +89,7 @@ class GoOp : public framework::OperatorBase { ...@@ -89,8 +89,7 @@ class GoOp : public framework::OperatorBase {
class GoOpMaker : public framework::OpProtoAndCheckerMaker { class GoOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
GoOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput(kX, AddInput(kX,
"A set of variables, which are required by operators inside the " "A set of variables, which are required by operators inside the "
"block of Go Op.") "block of Go Op.")
......
...@@ -71,8 +71,7 @@ class GRUOp : public framework::OperatorWithKernel { ...@@ -71,8 +71,7 @@ class GRUOp : public framework::OperatorWithKernel {
class GRUOpMaker : public framework::OpProtoAndCheckerMaker { class GRUOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
GRUOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Input", AddInput("Input",
"(LoDTensor) The first input is a LodTensor, which supports " "(LoDTensor) The first input is a LodTensor, which supports "
"variable-time length input sequence. The underlying tensor in " "variable-time length input sequence. The underlying tensor in "
......
...@@ -71,8 +71,7 @@ class GRUUnitOp : public framework::OperatorWithKernel { ...@@ -71,8 +71,7 @@ class GRUUnitOp : public framework::OperatorWithKernel {
class GRUUnitOpMaker : public framework::OpProtoAndCheckerMaker { class GRUUnitOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
GRUUnitOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Input", AddInput("Input",
"(Tensor) Matrix with shape [batch_size, frame_size * 3] for the " "(Tensor) Matrix with shape [batch_size, frame_size * 3] for the "
"input."); "input.");
......
...@@ -46,8 +46,7 @@ class HingeLossOp : public framework::OperatorWithKernel { ...@@ -46,8 +46,7 @@ class HingeLossOp : public framework::OperatorWithKernel {
template <typename AttrType> template <typename AttrType>
class HingeLossOpMaker : public framework::OpProtoAndCheckerMaker { class HingeLossOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
HingeLossOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Logits", AddInput("Logits",
"The input value (Logits) of Hinge loss op." "The input value (Logits) of Hinge loss op."
"Logits is a 2-D tensor with shape [batch_size, 1]."); "Logits is a 2-D tensor with shape [batch_size, 1].");
......
...@@ -45,8 +45,7 @@ class HuberLossOp : public framework::OperatorWithKernel { ...@@ -45,8 +45,7 @@ class HuberLossOp : public framework::OperatorWithKernel {
template <typename AttrType> template <typename AttrType>
class HuberLossOpMaker : public framework::OpProtoAndCheckerMaker { class HuberLossOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
HuberLossOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", AddInput("X",
"The input value of huber loss op." "The input value of huber loss op."
"X is a 2-D tensor with shape [batch_size, 1]."); "X is a 2-D tensor with shape [batch_size, 1].");
......
...@@ -54,8 +54,7 @@ class Im2SequenceOp : public framework::OperatorWithKernel { ...@@ -54,8 +54,7 @@ class Im2SequenceOp : public framework::OperatorWithKernel {
class Im2SequenceOpMaker : public framework::OpProtoAndCheckerMaker { class Im2SequenceOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
Im2SequenceOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", AddInput("X",
"(Tensor) The input tensor has NCHW format." "(Tensor) The input tensor has NCHW format."
"N: batch size" "N: batch size"
......
...@@ -47,8 +47,7 @@ class IncrementOp : public framework::OperatorWithKernel { ...@@ -47,8 +47,7 @@ class IncrementOp : public framework::OperatorWithKernel {
class IncrementOpMaker : public framework::OpProtoAndCheckerMaker { class IncrementOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
IncrementOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "(Tensor) The input tensor of increment operator"); AddInput("X", "(Tensor) The input tensor of increment operator");
AddOutput("Out", "(Tensor) The output tensor of increment operator."); AddOutput("Out", "(Tensor) The output tensor of increment operator.");
AddAttr<float>("step", AddAttr<float>("step",
......
...@@ -42,8 +42,7 @@ class IOUSimilarityOp : public framework::OperatorWithKernel { ...@@ -42,8 +42,7 @@ class IOUSimilarityOp : public framework::OperatorWithKernel {
class IOUSimilarityOpMaker : public framework::OpProtoAndCheckerMaker { class IOUSimilarityOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
IOUSimilarityOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", AddInput("X",
"(LoDTensor, default LoDTensor<float>) " "(LoDTensor, default LoDTensor<float>) "
"Box list X is a 2-D LoDTensor with shape [N, 4] holds N boxes, " "Box list X is a 2-D LoDTensor with shape [N, 4] holds N boxes, "
......
...@@ -48,8 +48,7 @@ class IsEmptyOp : public framework::OperatorBase { ...@@ -48,8 +48,7 @@ class IsEmptyOp : public framework::OperatorBase {
class IsEmptyOpProtoMaker : public framework::OpProtoAndCheckerMaker { class IsEmptyOpProtoMaker : public framework::OpProtoAndCheckerMaker {
public: public:
IsEmptyOpProtoMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput(kInput, "(Tensor) Tensor which is to be checked."); AddInput(kInput, "(Tensor) Tensor which is to be checked.");
AddOutput(kOutput, "(Tensor) a boolean Tensor that indicate empty or not."); AddOutput(kOutput, "(Tensor) a boolean Tensor that indicate empty or not.");
AddComment(R"DOC( AddComment(R"DOC(
......
...@@ -48,8 +48,7 @@ class L1NormGradOp : public framework::OperatorWithKernel { ...@@ -48,8 +48,7 @@ class L1NormGradOp : public framework::OperatorWithKernel {
class L1NormOpMaker : public framework::OpProtoAndCheckerMaker { class L1NormOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
L1NormOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: framework::OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "(Tensor) The input of l1_norm op."); AddInput("X", "(Tensor) The input of l1_norm op.");
AddOutput("Out", "(Scalar) The output of l1_norm op."); AddOutput("Out", "(Scalar) The output of l1_norm op.");
AddComment(R"DOC( AddComment(R"DOC(
......
...@@ -47,8 +47,7 @@ class LabelSmoothOp : public framework::OperatorWithKernel { ...@@ -47,8 +47,7 @@ class LabelSmoothOp : public framework::OperatorWithKernel {
class LabelSmoothOpMaker : public framework::OpProtoAndCheckerMaker { class LabelSmoothOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
LabelSmoothOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", AddInput("X",
"(LoDTensor) The input labels of LabelSmooth operator. This " "(LoDTensor) The input labels of LabelSmooth operator. This "
"input can be batched labels in one-hot encoding or output from " "input can be batched labels in one-hot encoding or output from "
......
...@@ -61,8 +61,7 @@ class LayerNormOp : public framework::OperatorWithKernel { ...@@ -61,8 +61,7 @@ class LayerNormOp : public framework::OperatorWithKernel {
class LayerNormOpMaker : public framework::OpProtoAndCheckerMaker { class LayerNormOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
LayerNormOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "(LoDTensor) The input tensor."); AddInput("X", "(LoDTensor) The input tensor.");
AddInput("Scale", AddInput("Scale",
"(Tensor, optional) Scale is a 1-dimensional tensor of size " "(Tensor, optional) Scale is a 1-dimensional tensor of size "
......
...@@ -19,8 +19,7 @@ namespace operators { ...@@ -19,8 +19,7 @@ namespace operators {
class LinearChainCRFOpMaker : public framework::OpProtoAndCheckerMaker { class LinearChainCRFOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
LinearChainCRFOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Emission", AddInput("Emission",
"(LoDTensor, default LoDTensor<float>) " "(LoDTensor, default LoDTensor<float>) "
"A 2-D LoDTensor with shape [N x D], where N is the size of the " "A 2-D LoDTensor with shape [N x D], where N is the size of the "
......
...@@ -343,8 +343,7 @@ void ListenAndServOp::RunImpl(const framework::Scope &scope, ...@@ -343,8 +343,7 @@ void ListenAndServOp::RunImpl(const framework::Scope &scope,
class ListenAndServOpMaker : public framework::OpProtoAndCheckerMaker { class ListenAndServOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
ListenAndServOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "(Tensor) Variables that server recv.").AsDuplicable(); AddInput("X", "(Tensor) Variables that server recv.").AsDuplicable();
AddComment(R"DOC( AddComment(R"DOC(
ListenAndServ operator ListenAndServ operator
......
...@@ -77,8 +77,7 @@ class LoadCombineOp : public framework::OperatorBase { ...@@ -77,8 +77,7 @@ class LoadCombineOp : public framework::OperatorBase {
class LoadCombineOpProtoMaker : public framework::OpProtoAndCheckerMaker { class LoadCombineOpProtoMaker : public framework::OpProtoAndCheckerMaker {
public: public:
LoadCombineOpProtoMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddOutput( AddOutput(
"Out", "Out",
"(vector) The output LoDTensors that will be read from the input file.") "(vector) The output LoDTensors that will be read from the input file.")
......
...@@ -13,6 +13,7 @@ See the License for the specific language governing permissions and ...@@ -13,6 +13,7 @@ See the License for the specific language governing permissions and
limitations under the License. */ limitations under the License. */
#include <fstream> #include <fstream>
#include "paddle/fluid/framework/data_type_transform.h"
#include "paddle/fluid/framework/op_registry.h" #include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/platform/device_context.h" #include "paddle/fluid/platform/device_context.h"
#include "paddle/fluid/platform/profiler.h" #include "paddle/fluid/platform/profiler.h"
...@@ -46,14 +47,41 @@ class LoadOp : public framework::OperatorBase { ...@@ -46,14 +47,41 @@ class LoadOp : public framework::OperatorBase {
auto *tensor = out_var->GetMutable<framework::LoDTensor>(); auto *tensor = out_var->GetMutable<framework::LoDTensor>();
DeserializeFromStream(fin, tensor, *dev_ctx); DeserializeFromStream(fin, tensor, *dev_ctx);
auto load_as_fp16 = Attr<bool>("load_as_fp16");
auto in_dtype = framework::ToDataType(tensor->type());
auto out_dtype = load_as_fp16 ? framework::proto::VarType::FP16 : in_dtype;
if (in_dtype != out_dtype) {
// convert to float16 tensor
auto in_kernel_type = framework::OpKernelType(in_dtype, place);
auto out_kernel_type = framework::OpKernelType(out_dtype, place);
framework::LoDTensor fp16_tensor;
// copy LoD info to the new tensor
fp16_tensor.set_lod(tensor->lod());
framework::TransDataType(in_kernel_type, out_kernel_type, *tensor,
&fp16_tensor);
// reset output tensor
out_var->Clear();
tensor = out_var->GetMutable<framework::LoDTensor>();
tensor->set_lod(fp16_tensor.lod());
tensor->ShareDataWith(fp16_tensor);
}
} }
}; };
class LoadOpProtoMaker : public framework::OpProtoAndCheckerMaker { class LoadOpProtoMaker : public framework::OpProtoAndCheckerMaker {
public: public:
LoadOpProtoMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddOutput("Out", "(Tensor) The tensor need to be loaded"); AddOutput("Out", "(Tensor) The tensor need to be loaded");
AddAttr<bool>(
"load_as_fp16",
"(boolean, default false)"
"If true, the tensor will be first loaded and then "
"converted to float16 data type. Otherwise, the tensor will be "
"directly loaded without data type conversion.")
.SetDefault(false);
AddAttr<std::string>("file_path", AddAttr<std::string>("file_path",
"(string) " "(string) "
"Variable will be loaded from \"file_path\".") "Variable will be loaded from \"file_path\".")
......
...@@ -40,8 +40,7 @@ class LoDArrayLengthOp : public framework::OperatorBase { ...@@ -40,8 +40,7 @@ class LoDArrayLengthOp : public framework::OperatorBase {
class LoDArrayLengthProtoMaker : public framework::OpProtoAndCheckerMaker { class LoDArrayLengthProtoMaker : public framework::OpProtoAndCheckerMaker {
public: public:
LoDArrayLengthProtoMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "(LoDTensorArray) The input tensor array."); AddInput("X", "(LoDTensorArray) The input tensor array.");
AddOutput("Out", "(Tensor) 1x1 CPU Tensor of length, int64_t"); AddOutput("Out", "(Tensor) 1x1 CPU Tensor of length, int64_t");
AddComment(R"DOC( AddComment(R"DOC(
......
...@@ -38,8 +38,7 @@ class LoDRankTableOp : public framework::OperatorBase { ...@@ -38,8 +38,7 @@ class LoDRankTableOp : public framework::OperatorBase {
class LoDRankTableOpProtoMaker : public framework::OpProtoAndCheckerMaker { class LoDRankTableOpProtoMaker : public framework::OpProtoAndCheckerMaker {
public: public:
LoDRankTableOpProtoMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", AddInput("X",
"(LoDTensor) input lod tensor, must contain lod information."); "(LoDTensor) input lod tensor, must contain lod information.");
AddOutput("Out", "(LoDRankTable) The rank table of specific level."); AddOutput("Out", "(LoDRankTable) The rank table of specific level.");
......
...@@ -47,8 +47,7 @@ class LoDResetOp : public framework::OperatorWithKernel { ...@@ -47,8 +47,7 @@ class LoDResetOp : public framework::OperatorWithKernel {
class LoDResetOpMaker : public framework::OpProtoAndCheckerMaker { class LoDResetOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
LoDResetOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", AddInput("X",
"(Tensor, LoDTensor) Input variable of LoDResetOp which " "(Tensor, LoDTensor) Input variable of LoDResetOp which "
"could be a Tensor or LoDTensor, where the data of output " "could be a Tensor or LoDTensor, where the data of output "
......
...@@ -105,8 +105,7 @@ class LoDTensorToArrayOp : public framework::OperatorBase { ...@@ -105,8 +105,7 @@ class LoDTensorToArrayOp : public framework::OperatorBase {
class LoDTensorToArrayOpProtoMaker : public framework::OpProtoAndCheckerMaker { class LoDTensorToArrayOpProtoMaker : public framework::OpProtoAndCheckerMaker {
public: public:
LoDTensorToArrayOpProtoMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", ""); AddInput("X", "");
AddInput("RankTable", ""); AddInput("RankTable", "");
AddOutput("Out", ""); AddOutput("Out", "");
......
...@@ -46,8 +46,7 @@ class LogLossOp : public framework::OperatorWithKernel { ...@@ -46,8 +46,7 @@ class LogLossOp : public framework::OperatorWithKernel {
template <typename AttrType> template <typename AttrType>
class LogLossOpMaker : public framework::OpProtoAndCheckerMaker { class LogLossOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
LogLossOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Predicted", AddInput("Predicted",
"The input value (Predicted) of Log loss op." "The input value (Predicted) of Log loss op."
"Predicted is a 2-D tensor with shape [batch_size, 1]."); "Predicted is a 2-D tensor with shape [batch_size, 1].");
......
...@@ -21,8 +21,7 @@ namespace operators { ...@@ -21,8 +21,7 @@ namespace operators {
template <typename OpComment> template <typename OpComment>
class BinaryLogicalOpProtoMaker : public framework::OpProtoAndCheckerMaker { class BinaryLogicalOpProtoMaker : public framework::OpProtoAndCheckerMaker {
public: public:
BinaryLogicalOpProtoMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
OpComment comment; OpComment comment;
AddInput("X", AddInput("X",
string::Sprintf("(LoDTensor) Left hand operand of %s operator", string::Sprintf("(LoDTensor) Left hand operand of %s operator",
...@@ -45,8 +44,7 @@ Each element of Out is calculated by %s ...@@ -45,8 +44,7 @@ Each element of Out is calculated by %s
template <typename OpComment> template <typename OpComment>
class UnaryLogicalOpProtoMaker : public framework::OpProtoAndCheckerMaker { class UnaryLogicalOpProtoMaker : public framework::OpProtoAndCheckerMaker {
public: public:
UnaryLogicalOpProtoMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
OpComment comment; OpComment comment;
AddInput("X", string::Sprintf("(LoDTensor) Operand of %s operator", AddInput("X", string::Sprintf("(LoDTensor) Operand of %s operator",
comment.type)); comment.type));
......
...@@ -105,8 +105,7 @@ class LookupSparseTableOp : public framework::OperatorBase { ...@@ -105,8 +105,7 @@ class LookupSparseTableOp : public framework::OperatorBase {
class LookupSparseTableOpMaker : public framework::OpProtoAndCheckerMaker { class LookupSparseTableOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
LookupSparseTableOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: framework::OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("W", AddInput("W",
"(SelectedRows) The input represents embedding table, " "(SelectedRows) The input represents embedding table, "
"which is a learnable parameter."); "which is a learnable parameter.");
......
...@@ -58,8 +58,7 @@ class LookupTableOp : public framework::OperatorWithKernel { ...@@ -58,8 +58,7 @@ class LookupTableOp : public framework::OperatorWithKernel {
class LookupTableOpMaker : public framework::OpProtoAndCheckerMaker { class LookupTableOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
LookupTableOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("W", AddInput("W",
"(Tensor) The input represents embedding tensors, " "(Tensor) The input represents embedding tensors, "
"which is a learnable parameter."); "which is a learnable parameter.");
......
...@@ -169,8 +169,7 @@ class LRNOp : public framework::OperatorWithKernel { ...@@ -169,8 +169,7 @@ class LRNOp : public framework::OperatorWithKernel {
template <typename T> template <typename T>
class LRNOpMaker : public framework::OpProtoAndCheckerMaker { class LRNOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
LRNOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", AddInput("X",
"(Tensor) The input of LRN operator. " "(Tensor) The input of LRN operator. "
"It must be a 4D tenor with NCHW format."); "It must be a 4D tenor with NCHW format.");
......
...@@ -103,8 +103,7 @@ class LSTMOp : public framework::OperatorWithKernel { ...@@ -103,8 +103,7 @@ class LSTMOp : public framework::OperatorWithKernel {
class LSTMOpMaker : public framework::OpProtoAndCheckerMaker { class LSTMOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
LSTMOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Input", AddInput("Input",
"(LoDTensor) the first input is a LodTensor, which support " "(LoDTensor) the first input is a LodTensor, which support "
"variable-time length input sequence. The underlying tensor in " "variable-time length input sequence. The underlying tensor in "
......
...@@ -48,8 +48,7 @@ class LstmUnitOp : public framework::OperatorWithKernel { ...@@ -48,8 +48,7 @@ class LstmUnitOp : public framework::OperatorWithKernel {
class LstmUnitOpMaker : public framework::OpProtoAndCheckerMaker { class LstmUnitOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
LstmUnitOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", AddInput("X",
"Lstm unit only applies non-linear activations, please make sure" "Lstm unit only applies non-linear activations, please make sure"
"that linear tranformation has already been applied to `X`. " "that linear tranformation has already been applied to `X`. "
......
...@@ -120,8 +120,7 @@ class LSTMPOp : public framework::OperatorWithKernel { ...@@ -120,8 +120,7 @@ class LSTMPOp : public framework::OperatorWithKernel {
class LSTMPOpMaker : public framework::OpProtoAndCheckerMaker { class LSTMPOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
LSTMPOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Input", AddInput("Input",
"(LoDTensor) the input for sequence data, which supports " "(LoDTensor) the input for sequence data, which supports "
"variable-time length input sequence. The underlying tensor in " "variable-time length input sequence. The underlying tensor in "
......
...@@ -42,8 +42,7 @@ class MarginRankLossOp : public framework::OperatorWithKernel { ...@@ -42,8 +42,7 @@ class MarginRankLossOp : public framework::OperatorWithKernel {
template <typename T> template <typename T>
class MarginRankLossOpMaker : public framework::OpProtoAndCheckerMaker { class MarginRankLossOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
MarginRankLossOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X1", AddInput("X1",
"(2-D tensor with shape [batch_size x 1]) The score for " "(2-D tensor with shape [batch_size x 1]) The score for "
"one item X1 to be ranked, from pairwise ranking model."); "one item X1 to be ranked, from pairwise ranking model.");
......
...@@ -13,10 +13,40 @@ ...@@ -13,10 +13,40 @@
// limitations under the License. // limitations under the License.
#include "paddle/fluid/operators/math/blas.h" #include "paddle/fluid/operators/math/blas.h"
#include <utility>
namespace paddle { namespace paddle {
namespace operators { namespace operators {
namespace math { namespace math {
// Do nothing. Blas is a header only library. MatDescriptor CreateMatrixDescriptor(const framework::DDim &tensor_dim,
int num_flatten_cols, bool trans) {
PADDLE_ENFORCE_GT(tensor_dim.size(), 1);
MatDescriptor retv;
if (num_flatten_cols > 1) {
auto flatten_dim = framework::flatten_to_2d(tensor_dim, num_flatten_cols);
retv.height_ = flatten_dim[0];
retv.width_ = flatten_dim[1];
} else {
if (tensor_dim.size() == 2) {
retv.height_ = tensor_dim[0];
retv.width_ = tensor_dim[1];
} else {
auto dim_vec = framework::vectorize(tensor_dim);
retv.batch_size_ = 1;
for (size_t i = 0; i < dim_vec.size() - 2; ++i) {
retv.batch_size_ *= dim_vec[i];
}
retv.height_ = dim_vec[dim_vec.size() - 2];
retv.width_ = dim_vec[dim_vec.size() - 1];
retv.stride_ = retv.height_ * retv.width_;
}
}
if (trans) {
std::swap(retv.width_, retv.height_);
}
retv.trans_ = trans;
return retv;
}
} // namespace math } // namespace math
} // namespace operators } // namespace operators
} // namespace paddle } // namespace paddle
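To make the branching in `CreateMatrixDescriptor` easier to follow, here is a minimal standalone sketch of the same logic. It is not the Paddle implementation: `MatDescriptorSketch` and `MakeDescriptorSketch` are hypothetical names, a plain `std::vector<int64_t>` stands in for `framework::DDim`, and the flattening step assumes `flatten_to_2d` multiplies the first `num_flatten_cols` dimensions into the row count and the remaining ones into the column count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for MatDescriptor, with every field defaulted.
struct MatDescriptorSketch {
  int64_t height = 0;
  int64_t width = 0;
  int64_t stride = 0;
  int64_t batch_size = 0;
  bool trans = false;
};

// Hypothetical helper mirroring the branches of CreateMatrixDescriptor.
inline MatDescriptorSketch MakeDescriptorSketch(
    const std::vector<int64_t>& dims, int num_flatten_cols, bool trans) {
  assert(dims.size() > 1);
  MatDescriptorSketch d;
  if (num_flatten_cols > 1) {
    // Flatten to 2-D (assumed behaviour of flatten_to_2d).
    int64_t rows = 1;
    int64_t cols = 1;
    for (std::size_t i = 0; i < dims.size(); ++i) {
      if (i < static_cast<std::size_t>(num_flatten_cols)) {
        rows *= dims[i];
      } else {
        cols *= dims[i];
      }
    }
    d.height = rows;
    d.width = cols;
  } else if (dims.size() == 2) {
    // Plain matrix: no batch dimension.
    d.height = dims[0];
    d.width = dims[1];
  } else {
    // Leading dimensions collapse into the batch size.
    d.batch_size = 1;
    for (std::size_t i = 0; i + 2 < dims.size(); ++i) {
      d.batch_size *= dims[i];
    }
    d.height = dims[dims.size() - 2];
    d.width = dims[dims.size() - 1];
    d.stride = d.height * d.width;
  }
  if (trans) {
    std::swap(d.width, d.height);  // report the logical shape after transposition
  }
  d.trans = trans;
  return d;
}
```

For a `[2, 3, 4]` tensor with `num_flatten_cols = 0`, the sketch yields a batch of two 3x4 matrices with stride 12. With `num_flatten_cols = 2` it yields a single 6x4 matrix and a batch size of zero.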
...@@ -46,6 +46,50 @@ namespace paddle { ...@@ -46,6 +46,50 @@ namespace paddle {
namespace operators { namespace operators {
namespace math { namespace math {
/**
 * Matrix Descriptor of a memory buffer.
 *
 * It is used for Blas::MatMul. The MatMul operator can be batched.
 * If Mat A is [BatchSize, H1, W1] and Mat B is [BatchSize, H2, W2] (with
 * W1 == H2), MatMul performs `batch_size` GEMMs. The batched GEMM could be
 * faster, depending on the implementation of the BLAS library. The batch size
 * could be zero, which means the matrix is not batched. If either matrix of
 * `matmul` has a batch size, a batched GEMM is used, too. E.g., if Mat A is
 * [BatchSize, H1, W1] and Mat B is [H2, W2], the result matrix will be
 * [BatchSize, H1, W2].
 *
 * The boolean flag `trans` describes whether the memory holds the transpose
 * of the matrix. If `trans` is true, the last two dims of the matrix are
 * transposed, i.e. the memory layout of the matrix is [Width, Height] or
 * [BatchSize, Width, Height].
 *
 * The MatDescriptor is not only the dimension or shape of a matrix; it also
 * contains the layout and stride of the matrix. It is clearer to have a
 * dedicated structure than to reuse `DDim`.
 */
struct MatDescriptor {
int64_t height_;
int64_t width_;
int64_t stride_{0};
int64_t batch_size_{0};
bool trans_;
};
/**
 * Create a Matrix Descriptor from a tensor dim, num_flatten_cols, and a
 * transpose flag.
 *
 * @param tensor_dim: The dimension of the tensor. The rank of this dimension
 * must be larger than 1.
 *
 * @param num_flatten_cols: Reshape a tensor to a matrix. The matrix's first
 * dimension (column length) will be the product of the tensor's first
 * `num_flatten_cols` dimensions. If num_flatten_cols is zero, the first N-2
 * dimensions will be the batch_size of the descriptor.
 *
 * @param trans: True if the matrix is transposed.
 */
extern MatDescriptor CreateMatrixDescriptor(const framework::DDim& tensor_dim,
int num_flatten_cols, bool trans);
template <typename DeviceContext> template <typename DeviceContext>
class Blas { class Blas {
public: public:
...@@ -90,6 +134,11 @@ class Blas { ...@@ -90,6 +134,11 @@ class Blas {
int K, T alpha, const T* A, const T* B, T beta, T* C, int K, T alpha, const T* A, const T* B, T beta, T* C,
int batchCount, int64_t strideA, int64_t strideB) const; int batchCount, int64_t strideA, int64_t strideB) const;
template <typename T>
void MatMul(const framework::Tensor& mat_a, const MatDescriptor& dim_a,
const framework::Tensor& mat_b, const MatDescriptor& dim_b,
T alpha, framework::Tensor* mat_out, T beta) const;
private: private:
const DeviceContext& context_; const DeviceContext& context_;
}; };
......
...@@ -180,6 +180,31 @@ void Blas<platform::CPUDeviceContext>::BatchedGEMM( ...@@ -180,6 +180,31 @@ void Blas<platform::CPUDeviceContext>::BatchedGEMM(
#endif #endif
} }
template <typename DeviceContext>
template <typename T>
void Blas<DeviceContext>::MatMul(const framework::Tensor &mat_a,
const MatDescriptor &dim_a,
const framework::Tensor &mat_b,
const MatDescriptor &dim_b, T alpha,
framework::Tensor *mat_out, T beta) const {
PADDLE_ENFORCE_EQ(dim_a.width_, dim_b.height_);
CBLAS_TRANSPOSE transA = !dim_a.trans_ ? CblasNoTrans : CblasTrans;
CBLAS_TRANSPOSE transB = !dim_b.trans_ ? CblasNoTrans : CblasTrans;
if (dim_a.batch_size_ == 0 && dim_b.batch_size_ == 0) {
this->template GEMM<T>(transA, transB, dim_a.height_, dim_b.width_,
dim_a.width_, alpha, mat_a.data<T>(),
mat_b.data<T>(), beta, mat_out->data<T>());
} else {
PADDLE_ENFORCE(dim_a.batch_size_ == dim_b.batch_size_ ||
dim_a.batch_size_ == 0 || dim_b.batch_size_ == 0);
this->template BatchedGEMM<T>(
transA, transB, dim_a.height_, dim_b.width_, dim_a.width_, alpha,
mat_a.data<T>(), mat_b.data<T>(), beta, mat_out->data<T>(),
dim_a.batch_size_ == 0 ? dim_b.batch_size_ : dim_a.batch_size_,
dim_a.stride_, dim_b.stride_);
}
}
} // namespace math
} // namespace operators
} // namespace paddle
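The dispatch in `Blas::MatMul` above accepts operands whose batch sizes are equal or where one of them is zero. A naive reference loop can make that rule concrete. The sketch below is illustrative only: `NaiveBatchedMatMul` is a hypothetical helper, it assumes row-major storage with no transposition, and it assumes that a stride of 0 (the default in `MatDescriptor`) makes every batch entry re-read the same unbatched matrix.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical reference for the batch handling in Blas::MatMul.
// a holds [batch_a] matrices of m x k, b holds [batch_b] matrices of k x n.
inline std::vector<float> NaiveBatchedMatMul(
    const std::vector<float>& a, int64_t batch_a, int64_t stride_a,
    const std::vector<float>& b, int64_t batch_b, int64_t stride_b,
    int64_t m, int64_t k, int64_t n) {
  // Same precondition as the PADDLE_ENFORCE in the batched branch.
  assert(batch_a == batch_b || batch_a == 0 || batch_b == 0);
  const int64_t batch = batch_a == 0 ? batch_b : batch_a;
  const int64_t loops = batch == 0 ? 1 : batch;  // plain GEMM when unbatched
  std::vector<float> out(loops * m * n, 0.0f);
  for (int64_t p = 0; p < loops; ++p) {
    const float* pa = a.data() + p * stride_a;  // stride 0 reuses one matrix
    const float* pb = b.data() + p * stride_b;
    float* po = out.data() + p * m * n;
    for (int64_t i = 0; i < m; ++i)
      for (int64_t j = 0; j < n; ++j)
        for (int64_t x = 0; x < k; ++x)
          po[i * n + j] += pa[i * k + x] * pb[x * n + j];
  }
  return out;
}
```

For example, multiplying a batch of two 1 x 2 matrices by a single unbatched 2 x 1 matrix produces two 1 x 1 results, with the right-hand operand shared across both batch entries.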
/* Copyright (c) 2017 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <algorithm>
#include <vector>
#include "paddle/fluid/operators/math/blas.h"
namespace paddle {
namespace operators {
namespace math {
// Implements the logic of numpy matmul:
// https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.matmul.html
//
// but allowing also for a, b to be transposed
//
// Both a & b can be 1- to 3-dimensional. Higher rank tensors are not supported
// yet.
template <typename DeviceContext, typename T>
class MatMulFunctor {
public:
void operator()(const DeviceContext& context, const framework::Tensor& a,
bool trans_a, const framework::Tensor& b, bool trans_b,
T alpha, framework::Tensor* out, T beta) {
auto dim_a = a.dims();
auto dim_b = b.dims();
PADDLE_ENFORCE(a.place() == b.place() && b.place() == out->place(),
"Tensors must all be in the same place.");
PADDLE_ENFORCE_GE(dim_a.size(), 1,
"Input tensor a must be at least 1-dimensional.");
PADDLE_ENFORCE_GE(dim_b.size(), 1,
"Input tensor b must be at least 1-dimensional.");
std::vector<int64_t> out_dim;
int64_t batch_count = 1;
if (dim_a.size() > 3) {
PADDLE_ENFORCE(dim_b.size() == dim_a.size(),
"The dimensions of X and Y must be the same, and both of "
"them should be %d-dimensional.",
dim_b.size());
// The first rank-2 dimensions are accumulated on the batch_count, and the
// last two dimensions are used for matrix multiplication.
for (int j = 0; j < dim_a.size() - 2; ++j) {
PADDLE_ENFORCE_EQ(dim_b[j], dim_a[j],
"The %d-th dimension of X and Y must be the same.",
j);
out_dim.push_back(dim_a[j]);
batch_count *= dim_a[j];
}
}
int M = 0, N = 0, kA = 0, kB = 0, batchCountA = 0, batchCountB = 0,
strideA = 0, strideB = 0;
switch (dim_a.size()) {
case 1:
// similar to np.matmul:
// prepend dimension 1 (no transpose) or append dimension 1 (transpose)
M = trans_a ? dim_a[0] : 1;
kA = trans_a ? 1 : dim_a[0];
break;
case 2:
M = trans_a ? dim_a[1] : dim_a[0];
kA = trans_a ? dim_a[0] : dim_a[1];
break;
case 3:
batchCountA = dim_a[0];
M = trans_a ? dim_a[2] : dim_a[1];
kA = trans_a ? dim_a[1] : dim_a[2];
strideA = M * kA;
break;
default:
batchCountA = batch_count;
size_t mat_s = dim_a.size() - 2;
M = trans_a ? dim_a[mat_s + 1] : dim_a[mat_s];
kA = trans_a ? dim_a[mat_s] : dim_a[mat_s + 1];
strideA = M * kA;
}
switch (dim_b.size()) {
case 1:
// similar to np.matmul:
// append dimension 1 (no transpose) or prepend dimension 1 (transpose)
kB = trans_b ? 1 : dim_b[0];
N = trans_b ? dim_b[0] : 1;
break;
case 2:
kB = trans_b ? dim_b[1] : dim_b[0];
N = trans_b ? dim_b[0] : dim_b[1];
break;
case 3:
batchCountB = dim_b[0];
kB = trans_b ? dim_b[2] : dim_b[1];
N = trans_b ? dim_b[1] : dim_b[2];
strideB = kB * N;
break;
default:
batchCountB = batch_count;
size_t mat_s = dim_b.size() - 2;
kB = trans_b ? dim_b[mat_s + 1] : dim_b[mat_s];
N = trans_b ? dim_b[mat_s] : dim_b[mat_s + 1];
strideB = kB * N;
}
PADDLE_ENFORCE_EQ(
kA, kB,
"First matrix's width must be equal with second matrix's height.");
if (batchCountA && batchCountB) {
PADDLE_ENFORCE_EQ(
batchCountA, batchCountB,
"When input tensors a and b are both batched, they must have the "
"same batch dimension.");
}
int batchCount = std::max(batchCountA, batchCountB);
CBLAS_TRANSPOSE transA = (trans_a == false) ? CblasNoTrans : CblasTrans;
CBLAS_TRANSPOSE transB = (trans_b == false) ? CblasNoTrans : CblasTrans;
auto blas = GetBlas<DeviceContext, T>(context);
if (!batchCount) {
// regular matrix multiplication
blas.GEMM(transA, transB, M, N, kA, alpha, a.data<T>(), b.data<T>(), beta,
out->data<T>());
} else {
// batched matrix multiplication
blas.BatchedGEMM(transA, transB, M, N, kA, alpha, a.data<T>(),
b.data<T>(), beta, out->data<T>(), batchCount, strideA,
strideB);
}
}
};
} // namespace math
} // namespace operators
} // namespace paddle
...@@ -12,14 +12,257 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/operators/matmul_op.h"
#include <algorithm>
#include <utility>
#include <vector>
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/operators/detail/safe_ref.h"
#include "paddle/fluid/operators/math/blas.h"
namespace paddle {
namespace operators {
/**
* Get row matrix shape from a vector shape. If the rank of x_dim > 1, the
* original x_dim is returned.
*/
static framework::DDim RowMatrixFromVector(const framework::DDim& x_dim) {
if (x_dim.size() > 1) {
return x_dim;
}
return framework::make_ddim({1, x_dim[0]});
}
/**
* Get column matrix shape from a vector shape. If the rank of y_dim > 1, the
* original y_dim is returned.
*/
static framework::DDim ColumnMatrixFromVector(const framework::DDim& y_dim) {
if (y_dim.size() > 1) {
return y_dim;
}
return framework::make_ddim({y_dim[0], 1});
}
template <typename DeviceContext, typename T>
class MatMulKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& context) const override {
auto& x =
detail::Ref(context.Input<framework::Tensor>("X"), "Cannot find X");
auto& y =
detail::Ref(context.Input<framework::Tensor>("Y"), "Cannot find Y");
auto* out = context.Output<framework::Tensor>("Out");
out->mutable_data<T>(context.GetPlace());
auto blas = math::GetBlas<DeviceContext, T>(context);
auto mat_dim_a = math::CreateMatrixDescriptor(
RowMatrixFromVector(x.dims()), 0, context.Attr<bool>("transpose_X"));
auto mat_dim_b = math::CreateMatrixDescriptor(
ColumnMatrixFromVector(y.dims()), 0, context.Attr<bool>("transpose_Y"));
blas.MatMul(x, mat_dim_a, y, mat_dim_b, T(1), out, T(0));
}
};
// Reshape a rank-3 tensor from P x M x N to (P * M) x N.
// Identity op if the tensor is not of rank 3.
static framework::Tensor FoldInitDims(const framework::Tensor& input) {
auto output = input;
auto in_dims = input.dims();
if (in_dims.size() == 3) {
output.Resize({in_dims[0] * in_dims[1], in_dims[2]});
}
return output;
}
// Reshape a rank-3 tensor from P x M x N to M x (P * N).
// (Warning: This requires transposing data and writes into new memory.)
// Identity op if the tensor is not of rank 3.
template <typename DeviceContext, typename T>
static framework::Tensor FoldHeadAndLastDims(const DeviceContext& context,
const framework::Tensor& input) {
auto in_dims = input.dims();
if (in_dims.size() != 3) {
return input;
}
framework::Tensor output;
output.Resize({in_dims[1], in_dims[0], in_dims[2]});
output.mutable_data<T>(context.GetPlace());
std::vector<int> axis = {1, 0, 2};
math::Transpose<DeviceContext, T, 3> trans;
trans(context, input, &output, axis);
output.Resize({in_dims[1], in_dims[0] * in_dims[2]});
return output;
}
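The warning on `FoldHeadAndLastDims` says the P x M x N to M x (P * N) reshape has to move data. A plain-array sketch of that axis permutation `{1, 0, 2}` follows; `FoldHeadAndLast` is a hypothetical helper working on a flat row-major buffer rather than on `framework::Tensor`, and it is meant only to show the index mapping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: reorder a row-major P x M x N buffer into M x P x N,
// which can then be viewed as an M x (P * N) matrix without further copying.
inline std::vector<float> FoldHeadAndLast(const std::vector<float>& in,
                                          int64_t p, int64_t m, int64_t n) {
  assert(in.size() == static_cast<std::size_t>(p * m * n));
  std::vector<float> out(in.size());
  for (int64_t i = 0; i < p; ++i)
    for (int64_t j = 0; j < m; ++j)
      for (int64_t c = 0; c < n; ++c)
        // Element (i, j, c) of the input lands at (j, i, c) of the output.
        out[(j * p + i) * n + c] = in[(i * m + j) * n + c];
  return out;
}
```

By contrast, the P x M x N to (P * M) x N case handled by `FoldInitDims` needs no copy in row-major storage, because the element order is already the same.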
/**
* Reshape a tensor to 3-D or 2-D tensor by matrix descriptor.
*
* The shape would be [BatchSize, H, W] or [H, W].
* If transposed, `H,W` will be swapped.
*/
static void ReshapeTensorIntoMatrixSequence(
framework::Tensor* x, const math::MatDescriptor& descriptor) {
int64_t h, w;
h = descriptor.height_;
w = descriptor.width_;
if (descriptor.trans_) {
std::swap(w, h);
}
if (descriptor.batch_size_) {
x->Resize({descriptor.batch_size_, h, w});
} else {
x->Resize({h, w});
}
}
/**
* Reshape the x,y,out tensor to 3-D or 2-D tensor by matrix descriptor
* Out = matmul(x, y)
*
* This method will first calculate X,Y matrix sequence, and then calculate
* the out shape.
*
* Assume X = [BatchSize, H1, W1], Y = [BatchSize, H2, W2]
* The out = [BatchSize, H1, W2]
*
* If neither `X` nor `Y` has a batch size, the out will be [H1, W2].
* If either `X` or `Y` has a batch size BatchSize, the out will have that
* BatchSize.
*/
static void ReshapeXYOutIntoMatrixSequence(framework::Tensor* x,
framework::Tensor* y,
framework::Tensor* out, bool trans_x,
bool trans_y) {
auto x_dim = RowMatrixFromVector(x->dims());
auto y_dim = ColumnMatrixFromVector(y->dims());
auto mat_dim_x = math::CreateMatrixDescriptor(x_dim, 0, trans_x);
auto mat_dim_y = math::CreateMatrixDescriptor(y_dim, 0, trans_y);
if (mat_dim_x.batch_size_ == 0 && mat_dim_y.batch_size_ == 0) {
out->Resize({mat_dim_x.height_, mat_dim_y.width_});
} else {
out->Resize({std::max(mat_dim_x.batch_size_, mat_dim_y.batch_size_),
mat_dim_x.height_, mat_dim_y.width_});
}
ReshapeTensorIntoMatrixSequence(x, mat_dim_x);
ReshapeTensorIntoMatrixSequence(y, mat_dim_y);
}
// Using dimensional constraints on matrix multiplication, it is
// straight-forward to check the following table for when X and Y
// are both matrices.
//
// transpose_X | False | True | False | True
// transpose_Y | False | False | True | True
// -----------+----------+----------+----------+-----------
// dX = | dOut Y^T | Y dOut^T | dOut Y | Y^T dOut^T
// dY = | X^T dOut | X dOut | dOut^T X | dOut^T X^T
//
// When X is a vector of size K, we treat it instead as a matrix of shape
// (1, K). Similarly, when Y is a vector of size K, we treat it instead as
// a matrix of shape (K, 1).
//
// When X and Y are both 3-dimensional tensors, the first dimension (the
// batch dimension) can be ignored and the exact same formulas apply
// as for two matrices.
//
// Finally, when, e.g., X is a 3-dimensional tensor but Y is a matrix, we end
// up with formulas like
//
// dY_{ij} = \sum_{p, m} X_{pmi} dOut_{pmj}
//
// To handle this sort of scenario, we reshape X : P x M x K, dOut: P x M x N
// to X: (P * M) x K, dOut: (P * M) x N.
template <typename DeviceContext, typename T>
class MatMulGradKernel : public framework::OpKernel<T> {
public:
void MatMul(const framework::ExecutionContext& context,
const framework::Tensor& a, bool trans_a,
const framework::Tensor& b, bool trans_b,
framework::Tensor* out) const {
out->mutable_data<T>(context.GetPlace());
auto blas = math::GetBlas<DeviceContext, T>(context);
auto mat_dim_a = math::CreateMatrixDescriptor(a.dims(), 0, trans_a);
auto mat_dim_b = math::CreateMatrixDescriptor(b.dims(), 0, trans_b);
blas.MatMul(a, mat_dim_a, b, mat_dim_b, T(1), out, T(0));
}
void CalcInputGrad(const framework::ExecutionContext& context,
const framework::Tensor& a, bool trans_a,
bool is_fold_init_dims_a, const framework::Tensor& b,
bool trans_b, bool is_fold_init_dims_b,
framework::Tensor* out) const {
if (out == nullptr) return;
bool need_combine = (a.dims().size() == 3 || b.dims().size() == 3) &&
out->dims().size() == 2;
if (!need_combine) {
MatMul(context, a, trans_a, b, trans_b, out);
} else {
auto& ctx = context.template device_context<DeviceContext>();
MatMul(context, is_fold_init_dims_a
? FoldInitDims(a)
: FoldHeadAndLastDims<DeviceContext, T>(ctx, a),
trans_a, is_fold_init_dims_b
? FoldInitDims(b)
: FoldHeadAndLastDims<DeviceContext, T>(ctx, b),
trans_b, out);
}
}
void Compute(const framework::ExecutionContext& context) const override {
auto x = *context.Input<framework::Tensor>("X");
auto y = *context.Input<framework::Tensor>("Y");
auto dout =
*context.Input<framework::Tensor>(framework::GradVarName("Out"));
auto* dx = context.Output<framework::Tensor>(framework::GradVarName("X"));
auto* dy = context.Output<framework::Tensor>(framework::GradVarName("Y"));
bool transpose_x = context.Attr<bool>("transpose_X");
bool transpose_y = context.Attr<bool>("transpose_Y");
ReshapeXYOutIntoMatrixSequence(&x, &y, &dout, transpose_x, transpose_y);
framework::DDim dx_dims;
if (dx) {
dx_dims = dx->dims();
if (dx_dims != x.dims()) {
dx->Resize(x.dims());
}
}
framework::DDim dy_dims;
if (dy) {
dy_dims = dy->dims();
if (dy_dims != y.dims()) {
dy->Resize(y.dims());
}
}
if (transpose_x && transpose_y) {
CalcInputGrad(context, y, true, true, dout, true, false, dx);
CalcInputGrad(context, dout, true, true, x, true, false, dy);
} else if (transpose_x) {
CalcInputGrad(context, y, false, false, dout, true, false, dx);
CalcInputGrad(context, x, false, false, dout, false, true, dy);
} else if (transpose_y) {
CalcInputGrad(context, dout, false, false, y, false, true, dx);
CalcInputGrad(context, dout, true, true, x, false, true, dy);
} else {
CalcInputGrad(context, dout, false, false, y, true, false, dx);
CalcInputGrad(context, x, true, true, dout, false, true, dy);
}
if (dx) {
if (dx_dims != x.dims()) {
dx->Resize(dx_dims);
}
}
if (dy) {
if (dy_dims != y.dims()) {
dy->Resize(dy_dims);
}
}
}
};
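The first column of the gradient table (no transposes: dX = dOut Y^T, dY = X^T dOut) can be checked numerically on small matrices. The sketch below is independent of Paddle; `Matrix`, `Transpose`, `MatMul`, `Loss`, and `FirstColumnHolds` are hypothetical helpers. It assumes a scalar loss L = sum_ij dOut_ij (X Y)_ij, which is linear in X and in Y, so bumping a single entry by 1 changes L by exactly the corresponding gradient entry.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

inline Matrix Transpose(const Matrix& a) {
  Matrix t(a[0].size(), std::vector<double>(a.size()));
  for (std::size_t i = 0; i < a.size(); ++i)
    for (std::size_t j = 0; j < a[0].size(); ++j) t[j][i] = a[i][j];
  return t;
}

inline Matrix MatMul(const Matrix& a, const Matrix& b) {
  assert(a[0].size() == b.size());
  Matrix c(a.size(), std::vector<double>(b[0].size(), 0.0));
  for (std::size_t i = 0; i < a.size(); ++i)
    for (std::size_t j = 0; j < b[0].size(); ++j)
      for (std::size_t k = 0; k < b.size(); ++k) c[i][j] += a[i][k] * b[k][j];
  return c;
}

// Scalar loss L = sum_ij dOut_ij * (X Y)_ij, so that dL/dOut equals dOut.
inline double Loss(const Matrix& x, const Matrix& y, const Matrix& dout) {
  const Matrix out = MatMul(x, y);
  double loss = 0.0;
  for (std::size_t i = 0; i < out.size(); ++i)
    for (std::size_t j = 0; j < out[0].size(); ++j)
      loss += dout[i][j] * out[i][j];
  return loss;
}

// True when unit bumps of X and Y change the loss by dX and dY respectively.
inline bool FirstColumnHolds(const Matrix& x, const Matrix& y,
                             const Matrix& dout) {
  const Matrix dx = MatMul(dout, Transpose(y));  // dX = dOut Y^T
  const Matrix dy = MatMul(Transpose(x), dout);  // dY = X^T dOut
  const double base = Loss(x, y, dout);
  for (std::size_t i = 0; i < x.size(); ++i)
    for (std::size_t k = 0; k < x[0].size(); ++k) {
      Matrix bumped = x;
      bumped[i][k] += 1.0;
      if (Loss(bumped, y, dout) - base != dx[i][k]) return false;
    }
  for (std::size_t k = 0; k < y.size(); ++k)
    for (std::size_t j = 0; j < y[0].size(); ++j) {
      Matrix bumped = y;
      bumped[k][j] += 1.0;
      if (Loss(x, bumped, dout) - base != dy[k][j]) return false;
    }
  return true;
}
```

The exact equality comparison is only safe here because the intended inputs are small integer-valued doubles; with arbitrary values a tolerance would be needed.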
class MatMulOp : public framework::OperatorWithKernel {
public:
...@@ -36,121 +279,41 @@ class MatMulOp : public framework::OperatorWithKernel {
     auto dim_x = context->GetInputDim("X");
     auto dim_y = context->GetInputDim("Y");
-    bool transpose_x = context->Attrs().Get<bool>("transpose_X");
-    bool transpose_y = context->Attrs().Get<bool>("transpose_Y");
-    PADDLE_ENFORCE_GE(dim_x.size(), 1,
-                      "Input tensor X must be at least 1-dimensional.");
-    PADDLE_ENFORCE_GE(dim_y.size(), 1,
-                      "Input tensor Y must be at least 1-dimensional.");
-    std::vector<int64_t> out_dim;
-    int64_t batch_count = 1;
-    if (dim_x.size() > 3) {
-      PADDLE_ENFORCE_EQ(
-          dim_y.size(), dim_x.size(),
-          "The dimensions of X and Y must be the same, and both of "
-          "them should be %d-dimensional.",
-          dim_x.size());
-      // The first rank-2 dimensions are accumulated on the batch_count, and the
-      // last two dimensions are used for matrix multiplication.
-      for (int j = 0; j < dim_x.size() - 2; ++j) {
-        PADDLE_ENFORCE_EQ(dim_y[j], dim_x[j],
-                          "The %d-th dimension of X and Y must be the same.",
-                          j);
-        out_dim.push_back(dim_x[j]);
-        batch_count *= dim_x[j];
-      }
-    }
-    int M = 0, N = 0, KX = 0, KY = 0, batchCountX = 0, batchCountY = 0;
-    bool remove_initial_dim = false, remove_final_dim = false;
-    switch (dim_x.size()) {
-      case 1:
-        if (transpose_x) {
-          M = dim_x[0];
-          KX = 1;
-        } else {
-          M = 1;
-          KX = dim_x[0];
-          remove_initial_dim = true;
-        }
-        break;
-      case 2:
-        M = transpose_x ? dim_x[1] : dim_x[0];
-        KX = transpose_x ? dim_x[0] : dim_x[1];
-        break;
-      case 3:
-        batchCountX = dim_x[0];
-        M = transpose_x ? dim_x[2] : dim_x[1];
-        KX = transpose_x ? dim_x[1] : dim_x[2];
-        break;
-      default:
-        batchCountX = batch_count;
-        size_t mat_s = dim_x.size() - 2;
-        M = transpose_x ? dim_x[mat_s + 1] : dim_x[mat_s];
-        KX = transpose_x ? dim_x[mat_s] : dim_x[mat_s + 1];
-        break;
-    }
-    switch (dim_y.size()) {
-      case 1:
-        if (transpose_y) {
-          N = dim_y[0];
-          KY = 1;
-        } else {
-          N = 1;
-          KY = dim_y[0];
-          remove_final_dim = true;
-        }
-        break;
-      case 2:
-        KY = transpose_y ? dim_y[1] : dim_y[0];
-        N = transpose_y ? dim_y[0] : dim_y[1];
-        break;
-      case 3:
-        batchCountY = dim_y[0];
-        KY = transpose_y ? dim_y[2] : dim_y[1];
-        N = transpose_y ? dim_y[1] : dim_y[2];
-        break;
-      default:
-        batchCountY = batch_count;
-        size_t mat_s = dim_y.size() - 2;
-        KY = transpose_y ? dim_y[mat_s + 1] : dim_y[mat_s];
-        N = transpose_y ? dim_y[mat_s] : dim_y[mat_s + 1];
+    auto mat_dim_x =
+        math::CreateMatrixDescriptor(RowMatrixFromVector(dim_x), 0,
+                                     context->Attrs().Get<bool>("transpose_X"));
+    auto mat_dim_y =
+        math::CreateMatrixDescriptor(ColumnMatrixFromVector(dim_y), 0,
+                                     context->Attrs().Get<bool>("transpose_Y"));
+    PADDLE_ENFORCE_EQ(mat_dim_x.width_, mat_dim_y.height_);
+    PADDLE_ENFORCE(mat_dim_x.batch_size_ == mat_dim_y.batch_size_ ||
+                   mat_dim_x.batch_size_ == 0 || mat_dim_y.batch_size_ == 0);
+    std::vector<int64_t> dim_out;
+    if (mat_dim_x.batch_size_ != 0) {
+      dim_out = framework::vectorize(dim_x);
+      dim_out[dim_out.size() - 2] = mat_dim_x.height_;
+      dim_out[dim_out.size() - 1] = mat_dim_y.width_;
+    } else if (mat_dim_y.batch_size_ != 0) {
+      dim_out = framework::vectorize(dim_y);
+      dim_out[dim_out.size() - 2] = mat_dim_x.height_;
+      dim_out[dim_out.size() - 1] = mat_dim_y.width_;
+    } else {
+      dim_out = {mat_dim_x.height_, mat_dim_y.width_};
     }
-    PADDLE_ENFORCE_EQ(
-        KX, KY,
-        "First matrix's width must be equal with second matrix's height.");
-    if (batchCountX && batchCountY) {
-      PADDLE_ENFORCE_EQ(
-          batchCountX, batchCountY,
-          "When Input(X) and Input(Y) are both three dimensional, they "
-          "must have the same batch dimension.");
+    if (dim_x.size() == 1 && dim_out[dim_out.size() - 2] == 1) {
+      std::swap(dim_out[dim_out.size() - 2], dim_out[dim_out.size() - 1]);
+      dim_out.resize(dim_out.size() - 1);
     }
-    int batchCount = std::max(batchCountX, batchCountY);
-    std::vector<int64_t> dim_out;
-    if (batchCount) {
-      if (dim_x.size() > 3) {
-        dim_out.insert(dim_out.begin(), out_dim.begin(), out_dim.end());
-      } else {
-        dim_out.push_back(batchCount);
-      }
+    if (dim_y.size() == 1 && dim_out[dim_out.size() - 1] == 1) {
+      dim_out.resize(dim_out.size() - 1);
     }
-    if (!remove_initial_dim) {
-      dim_out.push_back(M);
-    }
-    if (!remove_final_dim) {
-      dim_out.push_back(N);
-    }
-    if (dim_out.size() == 0) {
-      // We don't support 0-dimensional Tensors (scalars), so instead
-      // treat the output as a Tensor of shape (1, ) in this case.
-      dim_out.push_back(1);
+    if (dim_out.empty()) {
+      dim_out = {1};
     }
     context->SetOutputDim("Out", framework::make_ddim(dim_out));
     context->ShareLoD("X", /*->*/ "Out");
...@@ -159,8 +322,7 @@ class MatMulOp : public framework::OperatorWithKernel {
 class MatMulOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  MatMulOpMaker(OpProto* proto, OpAttrChecker* op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput("X", "The first input of MatMul op");
     AddInput("Y", "The second input of MatMul op");
     AddOutput("Out", "The output of MatMul op");
...@@ -233,15 +395,40 @@ class MatMulOpGrad : public framework::OperatorWithKernel {
   }
 };
+class MatMulOpGradMaker : public framework::SingleGradOpDescMaker {
+ public:
+  using framework::SingleGradOpDescMaker::SingleGradOpDescMaker;
+ protected:
+  std::unique_ptr<framework::OpDesc> Apply() const override {
+    auto* retv = new framework::OpDesc();
+    retv->SetType("matmul_grad");
+    retv->SetInput("X", Input("X"));
+    retv->SetInput("Y", Input("Y"));
+    retv->SetInput(framework::GradVarName("Out"), OutputGrad("Out"));
+    retv->SetOutput(framework::GradVarName("X"), InputGrad("X"));
+    retv->SetOutput(framework::GradVarName("Y"), InputGrad("Y"));
+    retv->SetAttrMap(Attrs());
+    return std::unique_ptr<framework::OpDesc>(retv);
+  }
+};
 } // namespace operators
 } // namespace paddle
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(matmul, ops::MatMulOp, ops::MatMulOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>);
+                  ops::MatMulOpGradMaker);
 REGISTER_OPERATOR(matmul_grad, ops::MatMulOpGrad);
 REGISTER_OP_CPU_KERNEL(
     matmul, ops::MatMulKernel<paddle::platform::CPUDeviceContext, float>);
 REGISTER_OP_CPU_KERNEL(
     matmul_grad,
     ops::MatMulGradKernel<paddle::platform::CPUDeviceContext, float>);
+#ifdef PADDLE_WITH_CUDA
+REGISTER_OP_CUDA_KERNEL(
+    matmul, ops::MatMulKernel<paddle::platform::CUDADeviceContext, float>);
+REGISTER_OP_CUDA_KERNEL(
+    matmul_grad,
+    ops::MatMulGradKernel<paddle::platform::CUDADeviceContext, float>);
+#endif
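The rewritten `InferShape` in this file derives the output shape from the two matrix descriptors and then squeezes the dimensions that were only added to promote a vector to a matrix. The standalone sketch below retraces those steps on plain `std::vector<int64_t>` shapes. `Shape`, `BatchSizeOf`, and `InferMatMulShape` are hypothetical names, and the batch-size rule (product of all but the last two dims, 0 when absent) is an assumption about `CreateMatrixDescriptor`, whose body is not shown in this diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Shape = std::vector<int64_t>;

// Assumed rule: product of all but the last two dims; 0 marks "no batch".
inline int64_t BatchSizeOf(const Shape& s) {
  if (s.size() <= 2) return 0;
  int64_t batch = 1;
  for (std::size_t i = 0; i + 2 < s.size(); ++i) batch *= s[i];
  return batch;
}

inline Shape InferMatMulShape(Shape x, bool trans_x, Shape y, bool trans_y) {
  assert(!x.empty() && !y.empty());
  const bool x_is_vector = x.size() == 1;
  const bool y_is_vector = y.size() == 1;
  if (x_is_vector) x.insert(x.begin(), 1);  // [K] -> [1, K]
  if (y_is_vector) y.push_back(1);          // [K] -> [K, 1]

  // Logical height and width of the trailing matrices after transposition.
  int64_t x_h = x[x.size() - 2], x_w = x[x.size() - 1];
  int64_t y_h = y[y.size() - 2], y_w = y[y.size() - 1];
  if (trans_x) std::swap(x_h, x_w);
  if (trans_y) std::swap(y_h, y_w);
  assert(x_w == y_h);

  const int64_t batch_x = BatchSizeOf(x), batch_y = BatchSizeOf(y);
  assert(batch_x == batch_y || batch_x == 0 || batch_y == 0);

  // Start from whichever operand carries the batch dims, else a 2-D shape.
  Shape out = batch_x != 0 ? x : (batch_y != 0 ? y : Shape(2));
  out[out.size() - 2] = x_h;
  out[out.size() - 1] = y_w;

  // Drop the unit dims that only exist because a vector was promoted.
  if (x_is_vector && out[out.size() - 2] == 1) {
    std::swap(out[out.size() - 2], out[out.size() - 1]);
    out.pop_back();
  }
  if (y_is_vector && out.back() == 1) out.pop_back();
  if (out.empty()) out = {1};  // scalars are reported as shape {1}
  return out;
}
```

With these assumptions, a vector times a vector collapses to shape `{1}`, and a `[5, 2, 3]` tensor times a `[3, 4]` matrix keeps the batch dimension of the left operand.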
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "paddle/fluid/operators/matmul_op.h"
namespace ops = paddle::operators;
REGISTER_OP_CUDA_KERNEL(
matmul, ops::MatMulKernel<paddle::platform::CUDADeviceContext, float>);
REGISTER_OP_CUDA_KERNEL(
matmul_grad,
ops::MatMulGradKernel<paddle::platform::CUDADeviceContext, float>);
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#pragma once
#include <algorithm>
#include <functional>
#include <vector>
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/operators/math/math_function.h"
#include "paddle/fluid/operators/math/matmul.h"
namespace paddle {
namespace operators {
namespace matmul_detail {
using Tensor = framework::Tensor;
using DDim = framework::DDim;
using framework::make_ddim;
using framework::vectorize;
template <typename DeviceContext, typename T>
class MatMulKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& context) const override {
const Tensor& x = *context.Input<Tensor>("X");
const Tensor& y = *context.Input<Tensor>("Y");
Tensor* out = context.Output<Tensor>("Out");
out->mutable_data<T>(context.GetPlace());
bool transpose_x = context.Attr<bool>("transpose_X");
bool transpose_y = context.Attr<bool>("transpose_Y");
math::MatMulFunctor<DeviceContext, T>()(
context.template device_context<DeviceContext>(), x, transpose_x, y,
transpose_y, T(1), out, T(0));
}
};
template <typename T>
inline Tensor Reshape(const Tensor& input, const DDim& dims) {
Tensor output;
output.ShareDataWith(input);
output.Resize(dims);
return output;
}
// Reshape a rank-3 tensor from P x M x N to (P * M) x N.
// Identity op if the tensor is not of rank 3.
template <typename T>
Tensor CombineBatchAndM(const Tensor& input) {
Tensor output;
output.ShareDataWith(input);
auto in_dims = input.dims();
if (in_dims.size() == 3) {
std::vector<int64_t> out_dims = {in_dims[0] * in_dims[1], in_dims[2]};
output.Resize(make_ddim(out_dims));
}
return output;
}
// Reshape a rank-3 tensor from P x M x N to M x (P * N).
// (Warning: This requires transposing data and writes into new memory.)
// Identity op if the tensor is not of rank 3.
template <typename DeviceContext, typename T>
Tensor CombineBatchAndN(const DeviceContext& context, const Tensor& input) {
Tensor output;
auto in_dims = input.dims();
if (in_dims.size() == 3) {
output.Resize({in_dims[1], in_dims[0], in_dims[2]});
output.mutable_data<T>(context.GetPlace());
std::vector<int> axis = {1, 0, 2};
math::Transpose<DeviceContext, T, 3> trans;
trans(context, input, &output, axis);
std::vector<int64_t> out_dims = {in_dims[1], in_dims[0] * in_dims[2]};
output.Resize({in_dims[1], in_dims[0] * in_dims[2]});
} else {
output.ShareDataWith(input);
}
return output;
}
// Using dimensional constraints on matrix multiplication, it is
// straight-forward to check the following table for when X and Y
// are both matrices.
//
// transpose_X | False | True | False | True
// transpose_Y | False | False | True | True
// -----------+----------+----------+----------+-----------
// dX = | dOut Y^T | Y dOut^T | dOut Y | Y^T dOut^T
// dY = | X^T dOut | X dOut | dOut^T X | dOut^T X^T
//
// When X is a vector of size K, we treat it instead as a matrix of shape
// (1, K). Similarly, when Y is a vector of size K, we treat it instead as
// a matrix of shape (K, 1).
//
// When X and Y are both 3-dimensional tensors, the first dimension (the
// batch dimension) can be ignored and the exact same formulas apply
// as for two matrices.
//
// Finally, when, e.g., X is a 3-dimensional tensor but Y is a matrix, we end
// up with formulas like
//
// dY_{ij} = \sum_{p, m} X_{pmi} dOut_{pmj}
//
// To handle this sort of scenario, we reshape X : P x M x K, dOut: P x M x N
// to X: (P * M) x K, dOut: (P * M) x N.
template <typename DeviceContext, typename T>
class MatMulGradKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& context) const override {
const Tensor& x = *context.Input<Tensor>("X");
const Tensor& y = *context.Input<Tensor>("Y");
const Tensor& dout = *context.Input<Tensor>(framework::GradVarName("Out"));
Tensor* dx = context.Output<Tensor>(framework::GradVarName("X"));
Tensor* dy = context.Output<Tensor>(framework::GradVarName("Y"));
bool transpose_x = context.Attr<bool>("transpose_X");
bool transpose_y = context.Attr<bool>("transpose_Y");
std::vector<int64_t> x_dims = vectorize(x.dims());
std::vector<int64_t> y_dims = vectorize(y.dims());
// If X is a vector, reshape it to a matrix.
if (x_dims.size() == 1) {
x_dims.insert(x_dims.begin(), 1);
}
// If Y is a vector, reshape it to a matrix.
if (y_dims.size() == 1) {
y_dims.push_back(1);
}
int batch_count = 0;
// The first rank-2 dimensions are accumulated on the batch_count, and the
// last two dimensions are used for matrix multiplication.
if (x_dims.size() > 3) {
batch_count = accumulate(x_dims.begin(), x_dims.end() - 2, 1,
std::multiplies<int>());
}
// Fix the dOut dimensions.
int M = 0, N = 0, batchCountX = 0, batchCountY = 0;
switch (x_dims.size()) {
case 2:
M = transpose_x ? x_dims[1] : x_dims[0];
break;
case 3:
batchCountX = x_dims[0];
M = transpose_x ? x_dims[2] : x_dims[1];
break;
default:
batchCountX = batch_count;
size_t mat_s = x_dims.size() - 2;
M = transpose_x ? x_dims[mat_s + 1] : x_dims[mat_s];
}
switch (y_dims.size()) {
case 2:
N = transpose_y ? y_dims[0] : y_dims[1];
break;
case 3:
batchCountY = y_dims[0];
N = transpose_y ? y_dims[1] : y_dims[2];
break;
default:
batchCountY = batch_count;
size_t mat_s = y_dims.size() - 2;
N = transpose_y ? y_dims[mat_s] : y_dims[mat_s + 1];
}
if (batchCountX && batchCountY) {
PADDLE_ENFORCE_EQ(
batchCountX, batchCountY,
"When Input(X) and Input(Y) are both three dimensional, they "
"must have the same batch dimension.");
}
int batchCount = std::max(batchCountX, batchCountY);
std::vector<int64_t> dout_dims = {M, N};
if (batchCount) {
if (x_dims.size() > 3) {
dout_dims.insert(dout_dims.begin(), x_dims.begin(), x_dims.end() - 2);
} else {
dout_dims.insert(dout_dims.begin(), batchCount);
}
}
Tensor X = Reshape<T>(x, make_ddim(x_dims));
Tensor Y = Reshape<T>(y, make_ddim(y_dims));
Tensor dOut = Reshape<T>(dout, make_ddim(dout_dims));
auto& dev_ctx = context.template device_context<DeviceContext>();
if (dx) {
dx->mutable_data<T>(context.GetPlace());
const Tensor& dOut_for_dX =
(x_dims.size() == 2 && y_dims.size() == 3)
? CombineBatchAndN<DeviceContext, T>(dev_ctx, dOut)
: dOut;
if (x_dims.size() == 2 && y_dims.size() == 3) {
Y = transpose_y ? CombineBatchAndM<T>(Y)
: CombineBatchAndN<DeviceContext, T>(dev_ctx, Y);
}
if (transpose_x) {
math::MatMulFunctor<DeviceContext, T>()(
dev_ctx, Y, transpose_y, dOut_for_dX, transpose_x, T(1), dx, T(0));
} else {
math::MatMulFunctor<DeviceContext, T>()(
dev_ctx, dOut_for_dX, transpose_x, Y, !transpose_y, T(1), dx, T(0));
}
}
if (dy) {
dy->mutable_data<T>(context.GetPlace());
const Tensor& dOut_for_dY = (y_dims.size() == 2 && x_dims.size() == 3)
? CombineBatchAndM<T>(dOut)
: dOut;
if (y_dims.size() == 2 && x_dims.size() == 3) {
X = transpose_x ? CombineBatchAndN<DeviceContext, T>(dev_ctx, X)
: CombineBatchAndM<T>(X);
dOut = CombineBatchAndM<T>(dOut);
}
if (transpose_y) {
math::MatMulFunctor<DeviceContext, T>()(
dev_ctx, dOut_for_dY, transpose_y, X, transpose_x, T(1), dy, T(0));
} else {
math::MatMulFunctor<DeviceContext, T>()(
dev_ctx, X, !transpose_x, dOut_for_dY, transpose_y, T(1), dy, T(0));
}
}
}
};
} // namespace matmul_detail
using matmul_detail::MatMulKernel;
using matmul_detail::MatMulGradKernel;
} // namespace operators
} // namespace paddle
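The comment on the gradient kernel states that when X is 3-dimensional and Y is a matrix, dY_{ij} = \sum_{p, m} X_{pmi} dOut_{pmj}, and that this is handled by viewing X as (P * M) x K and dOut as (P * M) x N. The sketch below compares the two formulations on flat row-major buffers. `GradYPerBatch` and `GradYFolded` are hypothetical helpers written for illustration, not part of the kernel.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// dY accumulated one batch entry at a time:
// dY_{ij} = sum_p sum_m X_{pmi} * dOut_{pmj}.
inline std::vector<double> GradYPerBatch(const std::vector<double>& x,
                                         const std::vector<double>& dout,
                                         int64_t p, int64_t m, int64_t k,
                                         int64_t n) {
  std::vector<double> dy(k * n, 0.0);
  for (int64_t b = 0; b < p; ++b)
    for (int64_t r = 0; r < m; ++r)
      for (int64_t i = 0; i < k; ++i)
        for (int64_t j = 0; j < n; ++j)
          dy[i * n + j] += x[(b * m + r) * k + i] * dout[(b * m + r) * n + j];
  return dy;
}

// The same quantity as a single X^T * dOut, with both buffers viewed as
// matrices that have `rows` = P * M rows. No data is moved for this view.
inline std::vector<double> GradYFolded(const std::vector<double>& x,
                                       const std::vector<double>& dout,
                                       int64_t rows, int64_t k, int64_t n) {
  std::vector<double> dy(k * n, 0.0);
  for (int64_t r = 0; r < rows; ++r)
    for (int64_t i = 0; i < k; ++i)
      for (int64_t j = 0; j < n; ++j)
        dy[i * n + j] += x[r * k + i] * dout[r * n + j];
  return dy;
}
```

Both functions visit the products in the same order, so their results agree exactly; the folded form is the one that maps onto a single GEMM call.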
...@@ -41,8 +41,7 @@ class MaxSeqenceLenOp : public framework::OperatorBase {
 class MaxSeqenceLenOpProtoMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  MaxSeqenceLenOpProtoMaker(OpProto *proto, OpAttrChecker *op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput("RankTable", "The lod_rank_table.");
     AddOutput("Out", "The max sequence length.");
     AddComment(
......
...@@ -22,8 +22,7 @@ using framework::Tensor;
 class MaxOutOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  MaxOutOpMaker(OpProto* proto, OpAttrChecker* op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput(
         "X",
         "(Tensor) The input tensor of maxout operator. "
......
...@@ -32,8 +32,7 @@ class MeanOp : public framework::OperatorWithKernel {
 class MeanOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  MeanOpMaker(OpProto* proto, OpAttrChecker* op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput("X", "The input of mean op");
     AddOutput("Out", "The output of mean op");
     AddComment(R"DOC(
......
...@@ -121,8 +121,7 @@ class MergeLoDTensorOp : public framework::OperatorBase {
 class MergeLoDTensorOpProtoMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  MergeLoDTensorOpProtoMaker(OpProto *proto, OpAttrChecker *op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput("X",
              "The input LoDTensor, contains complete lod information to "
              "construct the output");
......
...@@ -253,8 +253,7 @@ class MineHardExamplesOp : public framework::OperatorWithKernel {
 class MineHardExamplesOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  MineHardExamplesOpMaker(OpProto* proto, OpAttrChecker* op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput(
         "ClsLoss",
         "(Tensor, default Tensor<float>), The classification loss with shape "
......
...@@ -48,8 +48,7 @@ class MinusOp : public framework::OperatorWithKernel {
 class MinusOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  MinusOpMaker(OpProto *proto, OpAttrChecker *op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput("X", "The left tensor of minus operator.");
     AddInput("Y", "The right tensor of minus operator.");
     AddOutput("Out", "The output tensor of minus operator.");
......
...@@ -39,8 +39,7 @@ class ModifiedHuberLossOp : public framework::OperatorWithKernel {
 class ModifiedHuberLossOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
ModifiedHuberLossOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", AddInput("X",
"The input tensor of modified huber loss op. " "The input tensor of modified huber loss op. "
"X is 2-D tensor with shape [batch_size, 1]."); "X is 2-D tensor with shape [batch_size, 1].");
......
...@@ -62,8 +62,7 @@ class MomentumOp : public framework::OperatorWithKernel { ...@@ -62,8 +62,7 @@ class MomentumOp : public framework::OperatorWithKernel {
class MomentumOpMaker : public framework::OpProtoAndCheckerMaker { class MomentumOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
MomentumOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Param", AddInput("Param",
"(Tensor, default Tensor<float>) " "(Tensor, default Tensor<float>) "
"Input parameter that has to be updated"); "Input parameter that has to be updated");
......
...@@ -96,8 +96,7 @@ class MulOp : public framework::OperatorWithKernel { ...@@ -96,8 +96,7 @@ class MulOp : public framework::OperatorWithKernel {
class MulOpMaker : public framework::OpProtoAndCheckerMaker { class MulOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
MulOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "(Tensor), The first input tensor of mul op."); AddInput("X", "(Tensor), The first input tensor of mul op.");
AddInput("Y", "(Tensor), The second input tensor of mul op."); AddInput("Y", "(Tensor), The second input tensor of mul op.");
AddOutput("Out", "(Tensor), The output tensor of mul op."); AddOutput("Out", "(Tensor), The output tensor of mul op.");
......
...@@ -309,8 +309,7 @@ class MultiClassNMSKernel : public framework::OpKernel<T> { ...@@ -309,8 +309,7 @@ class MultiClassNMSKernel : public framework::OpKernel<T> {
class MultiClassNMSOpMaker : public framework::OpProtoAndCheckerMaker { class MultiClassNMSOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
MultiClassNMSOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("BBoxes", AddInput("BBoxes",
"(Tensor) A 3-D Tensor with shape [N, M, 4] represents the " "(Tensor) A 3-D Tensor with shape [N, M, 4] represents the "
"predicted locations of M bounding bboxes, N is the batch size. " "predicted locations of M bounding bboxes, N is the batch size. "
......
...@@ -61,8 +61,7 @@ class MultiplexOp : public framework::OperatorWithKernel { ...@@ -61,8 +61,7 @@ class MultiplexOp : public framework::OperatorWithKernel {
class MultiplexOpMaker : public framework::OpProtoAndCheckerMaker { class MultiplexOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
MultiplexOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Ids", "The index tensor of multiplex operator."); AddInput("Ids", "The index tensor of multiplex operator.");
AddInput("X", "The candidate tensors of multiplex operator.") AddInput("X", "The candidate tensors of multiplex operator.")
.AsDuplicable(); .AsDuplicable();
......
...@@ -76,8 +76,7 @@ class NCCLInitOpShapeInference : public framework::InferShapeBase { ...@@ -76,8 +76,7 @@ class NCCLInitOpShapeInference : public framework::InferShapeBase {
class NCCLInitOpMaker : public framework::OpProtoAndCheckerMaker { class NCCLInitOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
NCCLInitOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput(kParallelScopes, "The working place of parallel do."); AddInput(kParallelScopes, "The working place of parallel do.");
AddOutput("Communicator", AddOutput("Communicator",
"Create Communicator for communicating between gpus"); "Create Communicator for communicating between gpus");
...@@ -118,8 +117,7 @@ class NCCLAllReduceOp : public framework::OperatorWithKernel { ...@@ -118,8 +117,7 @@ class NCCLAllReduceOp : public framework::OperatorWithKernel {
// AllReduceOp // AllReduceOp
class NCCLAllReduceOpMaker : public framework::OpProtoAndCheckerMaker { class NCCLAllReduceOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
NCCLAllReduceOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "The input of AllReduce op"); AddInput("X", "The input of AllReduce op");
AddInput("Communicator", "Communicator for communicating between gpus"); AddInput("Communicator", "Communicator for communicating between gpus");
AddOutput("Out", "The output of AllReduce op"); AddOutput("Out", "The output of AllReduce op");
...@@ -165,8 +163,7 @@ class NCCLReduceOp : public framework::OperatorWithKernel { ...@@ -165,8 +163,7 @@ class NCCLReduceOp : public framework::OperatorWithKernel {
// ReduceOp // ReduceOp
class NCCLReduceOpMaker : public framework::OpProtoAndCheckerMaker { class NCCLReduceOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
NCCLReduceOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "The input of Reduce op"); AddInput("X", "The input of Reduce op");
AddInput("Communicator", "Communicator for communicating between gpus"); AddInput("Communicator", "Communicator for communicating between gpus");
AddOutput("Out", "The output of Reduce op"); AddOutput("Out", "The output of Reduce op");
...@@ -214,8 +211,7 @@ class NCCLBcastOp : public framework::OperatorWithKernel { ...@@ -214,8 +211,7 @@ class NCCLBcastOp : public framework::OperatorWithKernel {
// BcastOp // BcastOp
class NCCLBcastOpMaker : public framework::OpProtoAndCheckerMaker { class NCCLBcastOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
NCCLBcastOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "The input of BcastSend op"); AddInput("X", "The input of BcastSend op");
AddInput("Communicator", "Communicator for communicating between gpus"); AddInput("Communicator", "Communicator for communicating between gpus");
AddOutput("Out", "The output of Bcast"); AddOutput("Out", "The output of Bcast");
......
...@@ -75,8 +75,7 @@ class NCEOp : public framework::OperatorWithKernel { ...@@ -75,8 +75,7 @@ class NCEOp : public framework::OperatorWithKernel {
class NCEOpMaker : public framework::OpProtoAndCheckerMaker { class NCEOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
NCEOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Input", "(Tensor) A tensor of shape [batch_size, dim]."); AddInput("Input", "(Tensor) A tensor of shape [batch_size, dim].");
AddInput( AddInput(
"Label", "Label",
......
...@@ -19,8 +19,7 @@ namespace operators { ...@@ -19,8 +19,7 @@ namespace operators {
template <typename AttrType> template <typename AttrType>
class NormOpMaker : public framework::OpProtoAndCheckerMaker { class NormOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
NormOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput( AddInput(
"X", "X",
"(Tensor) The input tensor of norm operator. " "(Tensor) The input tensor of norm operator. "
......
...@@ -46,8 +46,7 @@ class OneHotOp : public framework::OperatorWithKernel { ...@@ -46,8 +46,7 @@ class OneHotOp : public framework::OperatorWithKernel {
class OneHotOpMaker : public framework::OpProtoAndCheckerMaker { class OneHotOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
OneHotOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", AddInput("X",
"(LoDTensor, LoDTensor<int>) Input variable with rank at least 2. " "(LoDTensor, LoDTensor<int>) Input variable with rank at least 2. "
"The last dimension of X should be 1. Each value of X is an index " "The last dimension of X should be 1. Each value of X is an index "
......
...@@ -48,8 +48,7 @@ class PadOp : public framework::OperatorWithKernel { ...@@ -48,8 +48,7 @@ class PadOp : public framework::OperatorWithKernel {
class PadOpMaker : public framework::OpProtoAndCheckerMaker { class PadOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
PadOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", AddInput("X",
"The input of pad op. " "The input of pad op. "
"The input should be a k-D tensor(k > 0 and k < 7)"); "The input should be a k-D tensor(k > 0 and k < 7)");
......
...@@ -196,8 +196,7 @@ class ParallelDoOp : public framework::OperatorBase { ...@@ -196,8 +196,7 @@ class ParallelDoOp : public framework::OperatorBase {
class ParallelDoOpProtoMaker : public framework::OpProtoAndCheckerMaker { class ParallelDoOpProtoMaker : public framework::OpProtoAndCheckerMaker {
public: public:
ParallelDoOpProtoMaker(OpProto *proto, framework::OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput(kInputs, "").AsDuplicable(); AddInput(kInputs, "").AsDuplicable();
AddInput(kParameters, "").AsDuplicable(); AddInput(kParameters, "").AsDuplicable();
AddInput(kPlaces, ""); AddInput(kPlaces, "");
......
...@@ -135,8 +135,7 @@ framework::OpKernelType PoolOpGrad::GetExpectedKernelType( ...@@ -135,8 +135,7 @@ framework::OpKernelType PoolOpGrad::GetExpectedKernelType(
library_); library_);
} }
Pool2dOpMaker::Pool2dOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Pool2dOpMaker::Make() {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput( AddInput(
"X", "X",
"(Tensor) The input tensor of pooling operator. " "(Tensor) The input tensor of pooling operator. "
...@@ -229,8 +228,7 @@ Example: ...@@ -229,8 +228,7 @@ Example:
)DOC"); )DOC");
} }
Pool3dOpMaker::Pool3dOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Pool3dOpMaker::Make() {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", AddInput("X",
"(Tensor) The input tensor of pooling operator. " "(Tensor) The input tensor of pooling operator. "
"The format of input tensor is NCDHW, where N is batch size, C is " "The format of input tensor is NCDHW, where N is batch size, C is "
......
...@@ -50,12 +50,12 @@ class PoolOpGrad : public framework::OperatorWithKernel { ...@@ -50,12 +50,12 @@ class PoolOpGrad : public framework::OperatorWithKernel {
class Pool2dOpMaker : public framework::OpProtoAndCheckerMaker { class Pool2dOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
Pool2dOpMaker(OpProto* proto, OpAttrChecker* op_checker); void Make() override;
}; };
class Pool3dOpMaker : public framework::OpProtoAndCheckerMaker { class Pool3dOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
Pool3dOpMaker(OpProto* proto, OpAttrChecker* op_checker); void Make() override;
}; };
template <typename DeviceContext, typename T> template <typename DeviceContext, typename T>
......
...@@ -100,8 +100,7 @@ class MaxPoolWithIndexOpGrad : public framework::OperatorWithKernel { ...@@ -100,8 +100,7 @@ class MaxPoolWithIndexOpGrad : public framework::OperatorWithKernel {
class MaxPool2dWithIndexOpMaker : public framework::OpProtoAndCheckerMaker { class MaxPool2dWithIndexOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
MaxPool2dWithIndexOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput( AddInput(
"X", "X",
"(Tensor) The input tensor of pooling operator. " "(Tensor) The input tensor of pooling operator. "
...@@ -177,8 +176,7 @@ Example: ...@@ -177,8 +176,7 @@ Example:
class MaxPool3dWithIndexOpMaker : public framework::OpProtoAndCheckerMaker { class MaxPool3dWithIndexOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
MaxPool3dWithIndexOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", AddInput("X",
"(Tensor) The input tensor of pooling operator. " "(Tensor) The input tensor of pooling operator. "
"The format of input tensor is NCDHW, where N is batch size, C is " "The format of input tensor is NCDHW, where N is batch size, C is "
......
...@@ -95,8 +95,7 @@ class PositiveNegativePairOp : public framework::OperatorWithKernel { ...@@ -95,8 +95,7 @@ class PositiveNegativePairOp : public framework::OperatorWithKernel {
class PositiveNegativePairOpMaker : public framework::OpProtoAndCheckerMaker { class PositiveNegativePairOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
PositiveNegativePairOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Score", AddInput("Score",
"(Tensor, float) Model Score on an item (with " "(Tensor, float) Model Score on an item (with "
"respect to QueryID). It's a 2-D tensor with shape [batch_size, " "respect to QueryID). It's a 2-D tensor with shape [batch_size, "
......
...@@ -90,8 +90,7 @@ class PrecisionRecallOp : public framework::OperatorWithKernel { ...@@ -90,8 +90,7 @@ class PrecisionRecallOp : public framework::OperatorWithKernel {
class PrecisionRecallOpMaker : public framework::OpProtoAndCheckerMaker { class PrecisionRecallOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
PrecisionRecallOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("MaxProbs", AddInput("MaxProbs",
"(Tensor, default Tensor<float>) A 2-D tensor with shape N x 1, " "(Tensor, default Tensor<float>) A 2-D tensor with shape N x 1, "
"where N is the batch size. Each row contains the max probability " "where N is the batch size. Each row contains the max probability "
......
...@@ -64,8 +64,7 @@ class PrefetchOp : public framework::OperatorBase { ...@@ -64,8 +64,7 @@ class PrefetchOp : public framework::OperatorBase {
class PrefetchOpMaker : public framework::OpProtoAndCheckerMaker { class PrefetchOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
PrefetchOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "(LoDTensor) Input Id variables to be sent").AsDuplicable(); AddInput("X", "(LoDTensor) Input Id variables to be sent").AsDuplicable();
AddOutput("RPCClient", AddOutput("RPCClient",
"(RPCClient) The RPC client object which will be" "(RPCClient) The RPC client object which will be"
......
...@@ -38,8 +38,7 @@ class PReluOp : public framework::OperatorWithKernel { ...@@ -38,8 +38,7 @@ class PReluOp : public framework::OperatorWithKernel {
class PReluOpMaker : public framework::OpProtoAndCheckerMaker { class PReluOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
PReluOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "The input tensor of prelu operator."); AddInput("X", "The input tensor of prelu operator.");
AddInput("Alpha", "The alpha weight of prelu operator."); AddInput("Alpha", "The alpha weight of prelu operator.");
AddOutput("Out", "The output tensor of prelu operator."); AddOutput("Out", "The output tensor of prelu operator.");
......
...@@ -209,8 +209,7 @@ class TensorPrintOp : public framework::OperatorBase { ...@@ -209,8 +209,7 @@ class TensorPrintOp : public framework::OperatorBase {
class PrintOpProtoAndCheckMaker : public framework::OpProtoAndCheckerMaker { class PrintOpProtoAndCheckMaker : public framework::OpProtoAndCheckerMaker {
public: public:
PrintOpProtoAndCheckMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("In", "Input tensor to be displayed."); AddInput("In", "Input tensor to be displayed.");
AddAttr<int>("first_n", "Only log `first_n` number of times."); AddAttr<int>("first_n", "Only log `first_n` number of times.");
AddAttr<std::string>("message", "A string message to print as a prefix."); AddAttr<std::string>("message", "A string message to print as a prefix.");
......
...@@ -79,8 +79,7 @@ class PriorBoxOp : public framework::OperatorWithKernel { ...@@ -79,8 +79,7 @@ class PriorBoxOp : public framework::OperatorWithKernel {
class PriorBoxOpMaker : public framework::OpProtoAndCheckerMaker { class PriorBoxOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
PriorBoxOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Input", AddInput("Input",
"(Tensor, default Tensor<float>), " "(Tensor, default Tensor<float>), "
"the input feature data of PriorBoxOp, The layout is NCHW."); "the input feature data of PriorBoxOp, The layout is NCHW.");
......
...@@ -66,8 +66,7 @@ class ProximalAdagradOp : public framework::OperatorWithKernel { ...@@ -66,8 +66,7 @@ class ProximalAdagradOp : public framework::OperatorWithKernel {
class ProximalAdagradOpMaker : public framework::OpProtoAndCheckerMaker { class ProximalAdagradOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
ProximalAdagradOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Param", AddInput("Param",
"(Tensor, default Tensor<float>) " "(Tensor, default Tensor<float>) "
"Input parameter that has to be updated."); "Input parameter that has to be updated.");
......
...@@ -54,8 +54,7 @@ class ProximalGDOp : public framework::OperatorWithKernel { ...@@ -54,8 +54,7 @@ class ProximalGDOp : public framework::OperatorWithKernel {
class ProximalGDOpMaker : public framework::OpProtoAndCheckerMaker { class ProximalGDOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
ProximalGDOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Param", AddInput("Param",
"(Tensor, default Tensor<float>) " "(Tensor, default Tensor<float>) "
"Input parameter value that has to be updated."); "Input parameter value that has to be updated.");
......
...@@ -46,8 +46,7 @@ class RankLossOp : public framework::OperatorWithKernel { ...@@ -46,8 +46,7 @@ class RankLossOp : public framework::OperatorWithKernel {
class RankLossOpMaker : public framework::OpProtoAndCheckerMaker { class RankLossOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
RankLossOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Label", AddInput("Label",
"(2-D Tensor with shape [batch_size x 1]) " "(2-D Tensor with shape [batch_size x 1]) "
"The label indicating A ranked higher than B or not."); "The label indicating A ranked higher than B or not.");
......
...@@ -79,8 +79,7 @@ class ReadOp : public framework::OperatorBase { ...@@ -79,8 +79,7 @@ class ReadOp : public framework::OperatorBase {
class ReadOpMaker : public framework::OpProtoAndCheckerMaker { class ReadOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
ReadOpMaker(OpProto* op_proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(op_proto, op_checker) {
AddInput("Reader", "(ReaderHolder) The executed reader."); AddInput("Reader", "(ReaderHolder) The executed reader.");
AddOutput("Out", "(LoDTensor) The output data.").AsDuplicable(); AddOutput("Out", "(LoDTensor) The output data.").AsDuplicable();
AddComment(R"DOC( AddComment(R"DOC(
......
...@@ -52,9 +52,8 @@ class CreateBatchReaderOp : public framework::OperatorBase { ...@@ -52,9 +52,8 @@ class CreateBatchReaderOp : public framework::OperatorBase {
}; };
class CreateBatchReaderOpMaker : public DecoratedReaderMakerBase { class CreateBatchReaderOpMaker : public DecoratedReaderMakerBase {
public: protected:
CreateBatchReaderOpMaker(OpProto* op_proto, OpAttrChecker* op_checker) void Apply() override {
: DecoratedReaderMakerBase(op_proto, op_checker) {
AddAttr<int>("batch_size", AddAttr<int>("batch_size",
"How many instances the batch reader yields each time.") "How many instances the batch reader yields each time.")
.GreaterThan(0); .GreaterThan(0);
......
...@@ -113,14 +113,13 @@ class CreateDoubleBufferReaderOp : public framework::OperatorBase { ...@@ -113,14 +113,13 @@ class CreateDoubleBufferReaderOp : public framework::OperatorBase {
}; };
class CreateDoubleBufferReaderOpMaker : public DecoratedReaderMakerBase { class CreateDoubleBufferReaderOpMaker : public DecoratedReaderMakerBase {
public: protected:
CreateDoubleBufferReaderOpMaker(OpProto* op_proto, OpAttrChecker* op_checker) void Apply() override {
: DecoratedReaderMakerBase(op_proto, op_checker) {
AddComment(R"DOC( AddComment(R"DOC(
CreateDoubleBufferReader Operator CreateDoubleBufferReader Operator
A double buffer reader takes another reader as its 'underlying reader'. A double buffer reader takes another reader as its 'underlying reader'.
It launches another thread to execute the 'underlying reader' asynchronously, It launches another thread to execute the 'underlying reader' asynchronously,
which prevents reading process from blocking subsequent training. which prevents reading process from blocking subsequent training.
)DOC"); )DOC");
std::unordered_set<std::string> enum_range; std::unordered_set<std::string> enum_range;
......
...@@ -65,20 +65,19 @@ class CreateMultiPassReaderOp : public framework::OperatorBase { ...@@ -65,20 +65,19 @@ class CreateMultiPassReaderOp : public framework::OperatorBase {
}; };
class CreateMultiPassReaderOpMaker : public DecoratedReaderMakerBase { class CreateMultiPassReaderOpMaker : public DecoratedReaderMakerBase {
public: protected:
CreateMultiPassReaderOpMaker(OpProto* op_proto, OpAttrChecker* op_checker) void Apply() override {
: DecoratedReaderMakerBase(op_proto, op_checker) {
AddAttr<int>("pass_num", "The number of pass to run.").GreaterThan(0); AddAttr<int>("pass_num", "The number of pass to run.").GreaterThan(0);
AddComment(R"DOC( AddComment(R"DOC(
CreateMultiPassReader Operator CreateMultiPassReader Operator
This operator creates a multi-pass reader. A multi-pass reader This operator creates a multi-pass reader. A multi-pass reader
is used to yield data for several pass training continuously. is used to yield data for several pass training continuously.
It takes the number of passes to run as one of its attributes It takes the number of passes to run as one of its attributes
('pass_num'), and maintains a pass counter to record how many ('pass_num'), and maintains a pass counter to record how many
passes it has completed. When the underlying reader reaches the passes it has completed. When the underlying reader reaches the
EOF, the multi-pass reader checks whether it has completed training EOF, the multi-pass reader checks whether it has completed training
of the given number of pass. If not, the underlying reader will of the given number of pass. If not, the underlying reader will
be re-initialized and starts a new pass automatically. be re-initialized and starts a new pass automatically.
)DOC"); )DOC");
} }
......
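Note on the hunk above: the CreateMultiPassReader doc comment describes a pass counter and an automatic re-initialization at EOF. A minimal sketch of that control flow follows. `VectorReader`, `MultiPassReader`, and `ReadAll` are hypothetical stand-ins written for illustration only; they are not Paddle's reader classes, and the real `ReadNext()`/`ReInit()` signatures may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical in-memory reader standing in for the 'underlying reader'.
class VectorReader {
 public:
  explicit VectorReader(std::vector<int> data) : data_(std::move(data)) {}

  // Returns false once the end of the data (EOF) is reached.
  bool ReadNext(int* out) {
    if (pos_ >= data_.size()) return false;
    *out = data_[pos_++];
    return true;
  }

  // Rewinds to the first element so another pass can start.
  void ReInit() { pos_ = 0; }

 private:
  std::vector<int> data_;
  std::size_t pos_ = 0;
};

// Hypothetical multi-pass wrapper: counts completed passes and restarts the
// underlying reader until 'pass_num' passes have been served.
class MultiPassReader {
 public:
  MultiPassReader(VectorReader* underlying, int pass_num)
      : underlying_(underlying), pass_num_(pass_num) {
    assert(pass_num > 0);  // mirrors the GreaterThan(0) check on 'pass_num'
  }

  bool ReadNext(int* out) {
    while (!underlying_->ReadNext(out)) {
      ++pass_count_;                               // one more pass completed
      if (pass_count_ >= pass_num_) return false;  // all passes done
      underlying_->ReInit();                       // start the next pass
    }
    return true;
  }

 private:
  VectorReader* underlying_;
  int pass_num_;
  int pass_count_ = 0;
};

// Drains a multi-pass reader built over 'data' and returns everything read.
std::vector<int> ReadAll(std::vector<int> data, int pass_num) {
  VectorReader base(std::move(data));
  MultiPassReader reader(&base, pass_num);
  std::vector<int> out;
  int value = 0;
  while (reader.ReadNext(&value)) out.push_back(value);
  return out;
}
```

Under these assumptions, `ReadAll({1, 2}, 3)` yields the two elements three times in a row, and an empty input yields nothing regardless of `pass_num`.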
...@@ -84,9 +84,8 @@ class CreateRandomDataGeneratorOp : public framework::OperatorBase { ...@@ -84,9 +84,8 @@ class CreateRandomDataGeneratorOp : public framework::OperatorBase {
}; };
class CreateRandomDataGeneratorOpMaker : public FileReaderMakerBase { class CreateRandomDataGeneratorOpMaker : public FileReaderMakerBase {
public: protected:
CreateRandomDataGeneratorOpMaker(OpProto* op_proto, OpAttrChecker* op_checker) void Apply() override {
: FileReaderMakerBase(op_proto, op_checker) {
AddAttr<float>("min", "The lower bound of reader's uniform distribution."); AddAttr<float>("min", "The lower bound of reader's uniform distribution.");
AddAttr<float>("max", "The upper bound of reader's uniform distribution."); AddAttr<float>("max", "The upper bound of reader's uniform distribution.");
AddComment(R"DOC( AddComment(R"DOC(
......
...@@ -76,9 +76,8 @@ class CreateRecordIOReaderOp : public framework::OperatorBase { ...@@ -76,9 +76,8 @@ class CreateRecordIOReaderOp : public framework::OperatorBase {
}; };
class CreateRecordIOReaderOpMaker : public FileReaderMakerBase { class CreateRecordIOReaderOpMaker : public FileReaderMakerBase {
public: protected:
CreateRecordIOReaderOpMaker(OpProto* op_proto, OpAttrChecker* op_checker) void Apply() override {
: FileReaderMakerBase(op_proto, op_checker) {
AddAttr<std::string>("filename", "The filename of record io reader"); AddAttr<std::string>("filename", "The filename of record io reader");
AddComment(R"DOC( AddComment(R"DOC(
CreateRecordIOReader Operator CreateRecordIOReader Operator
......
...@@ -92,9 +92,8 @@ class CreateShuffleReaderOp : public framework::OperatorBase { ...@@ -92,9 +92,8 @@ class CreateShuffleReaderOp : public framework::OperatorBase {
}; };
class CreateShuffleReaderOpMaker : public DecoratedReaderMakerBase { class CreateShuffleReaderOpMaker : public DecoratedReaderMakerBase {
public: protected:
CreateShuffleReaderOpMaker(OpProto* op_proto, OpAttrChecker* op_checker) void Apply() override {
: DecoratedReaderMakerBase(op_proto, op_checker) {
AddAttr<int>("buffer_size", "The shuffle buffer size.").GreaterThan(0); AddAttr<int>("buffer_size", "The shuffle buffer size.").GreaterThan(0);
AddComment(R"DOC( AddComment(R"DOC(
CreateShuffleReader Operator CreateShuffleReader Operator
......
...@@ -53,17 +53,16 @@ class CreateThreadedReaderOp : public framework::OperatorBase { ...@@ -53,17 +53,16 @@ class CreateThreadedReaderOp : public framework::OperatorBase {
}; };
class CreateThreadedReaderOpMaker : public DecoratedReaderMakerBase { class CreateThreadedReaderOpMaker : public DecoratedReaderMakerBase {
public: protected:
CreateThreadedReaderOpMaker(OpProto* op_proto, OpAttrChecker* op_checker) void Apply() override {
: DecoratedReaderMakerBase(op_proto, op_checker) {
AddComment(R"DOC( AddComment(R"DOC(
CreateThreadedReader Operator CreateThreadedReader Operator
This operator creates a threaded reader. A threaded reader's This operator creates a threaded reader. A threaded reader's
'ReadNext()' can be invoked by several threads at the same 'ReadNext()' can be invoked by several threads at the same
time. time.
When the attribute 'safe_mode' is true, the threaded reader's When the attribute 'safe_mode' is true, the threaded reader's
'ReInit()' is disabled to avoid unexpected bugs in multi-thread 'ReInit()' is disabled to avoid unexpected bugs in multi-thread
environment. environment.
)DOC"); )DOC");
} }
......
...@@ -185,9 +185,8 @@ class OpenFilesOp : public framework::OperatorBase { ...@@ -185,9 +185,8 @@ class OpenFilesOp : public framework::OperatorBase {
}; };
class OpenFilesOpMaker : public FileReaderMakerBase { class OpenFilesOpMaker : public FileReaderMakerBase {
public: protected:
OpenFilesOpMaker(OpProto* op_proto, OpAttrChecker* op_checker) void Apply() override {
: FileReaderMakerBase(op_proto, op_checker) {
AddAttr<std::vector<std::string>>("file_names", "Files to be read."); AddAttr<std::vector<std::string>>("file_names", "Files to be read.");
AddAttr<int>("thread_num", "The maximal concurrent prefetch thread number.") AddAttr<int>("thread_num", "The maximal concurrent prefetch thread number.")
.GreaterThan(0); .GreaterThan(0);
...@@ -196,7 +195,7 @@ class OpenFilesOpMaker : public FileReaderMakerBase { ...@@ -196,7 +195,7 @@ class OpenFilesOpMaker : public FileReaderMakerBase {
AddComment(R"DOC( AddComment(R"DOC(
OpenFiles Operator OpenFiles Operator
An OpenFilesOp creates a MultiFileReader, which is able to An OpenFilesOp creates a MultiFileReader, which is able to
read data multi-threaded from multiple files. read data multi-threaded from multiple files.
)DOC"); )DOC");
} }
......
...@@ -53,10 +53,7 @@ std::unique_ptr<framework::ReaderBase> CreateReaderByFileName( ...@@ -53,10 +53,7 @@ std::unique_ptr<framework::ReaderBase> CreateReaderByFileName(
return std::unique_ptr<framework::ReaderBase>(reader); return std::unique_ptr<framework::ReaderBase>(reader);
} }
FileReaderMakerBase::FileReaderMakerBase( void FileReaderMakerBase::Make() {
framework::OpProtoAndCheckerMaker::OpProto* op_proto,
framework::OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(op_proto, op_checker) {
AddOutput("Out", "(ReaderHolder) The created random reader.").AsDuplicable(); AddOutput("Out", "(ReaderHolder) The created random reader.").AsDuplicable();
AddAttr<std::vector<int>>("shape_concat", "The concat of all data's shapes."); AddAttr<std::vector<int>>("shape_concat", "The concat of all data's shapes.");
AddAttr<std::vector<int>>( AddAttr<std::vector<int>>(
...@@ -68,6 +65,7 @@ FileReaderMakerBase::FileReaderMakerBase( ...@@ -68,6 +65,7 @@ FileReaderMakerBase::FileReaderMakerBase(
"It means the reader will generate two data each time," "It means the reader will generate two data each time,"
"whose shapes are [2,3,4] and [5,6] respectively."); "whose shapes are [2,3,4] and [5,6] respectively.");
AddAttr<std::vector<int>>("lod_levels", "The LoD levels of each data."); AddAttr<std::vector<int>>("lod_levels", "The LoD levels of each data.");
Apply();
} }
void FileReaderInferShape::operator()(framework::InferShapeContext* ctx) const { void FileReaderInferShape::operator()(framework::InferShapeContext* ctx) const {
...@@ -127,13 +125,11 @@ void DecoratedReaderInferVarType::operator()( ...@@ -127,13 +125,11 @@ void DecoratedReaderInferVarType::operator()(
out_reader->SetDataTypes(in_reader->GetDataTypes()); out_reader->SetDataTypes(in_reader->GetDataTypes());
} }
DecoratedReaderMakerBase::DecoratedReaderMakerBase( void DecoratedReaderMakerBase::Make() {
framework::OpProtoAndCheckerMaker::OpProto* op_proto,
framework::OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(op_proto, op_checker) {
AddInput("UnderlyingReader", AddInput("UnderlyingReader",
"(ReaderHolder) The underlying reader for creating a batch reader."); "(ReaderHolder) The underlying reader for creating a batch reader.");
AddOutput("Out", "(ReaderHolder) The created batch reader."); AddOutput("Out", "(ReaderHolder) The created batch reader.");
Apply();
} }
} // namespace reader } // namespace reader
......
...@@ -47,7 +47,10 @@ extern std::vector<framework::DDim> RestoreShapes( ...@@ -47,7 +47,10 @@ extern std::vector<framework::DDim> RestoreShapes(
class FileReaderMakerBase : public framework::OpProtoAndCheckerMaker { class FileReaderMakerBase : public framework::OpProtoAndCheckerMaker {
public: public:
FileReaderMakerBase(OpProto* op_proto, OpAttrChecker* op_checker); void Make() final;
protected:
virtual void Apply() = 0;
}; };
class FileReaderInferShape : public framework::InferShapeBase { class FileReaderInferShape : public framework::InferShapeBase {
...@@ -76,7 +79,10 @@ class DecoratedReaderInferVarType : public framework::VarTypeInference { ...@@ -76,7 +79,10 @@ class DecoratedReaderInferVarType : public framework::VarTypeInference {
class DecoratedReaderMakerBase : public framework::OpProtoAndCheckerMaker { class DecoratedReaderMakerBase : public framework::OpProtoAndCheckerMaker {
public: public:
DecoratedReaderMakerBase(OpProto* op_proto, OpAttrChecker* op_checker); void Make() final;
protected:
virtual void Apply() = 0;
}; };
} // namespace reader } // namespace reader
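The reader maker bases in this hunk replace constructor-time registration with a `Make()` that is marked `final` and ends by calling a pure-virtual `Apply()`. The shape of that pattern can be sketched as follows. This is a minimal illustration only: `MakerBase`, `FileReaderMakerSketch`, `OpenFilesMakerSketch`, and the `file_names` attribute are hypothetical stand-ins, not Paddle's actual classes or attributes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for framework::OpProtoAndCheckerMaker: it only
// records what was declared, so the call order is observable.
class MakerBase {
 public:
  virtual ~MakerBase() = default;
  virtual void Make() = 0;
  const std::vector<std::string>& entries() const { return entries_; }

 protected:
  void AddOutput(const std::string& name) { entries_.push_back("out:" + name); }
  void AddAttr(const std::string& name) { entries_.push_back("attr:" + name); }
  void AddComment(const std::string& text) { entries_.push_back("doc:" + text); }

 private:
  std::vector<std::string> entries_;
};

// Shared declarations live in Make(); it is final so subclasses cannot skip
// them, and it hands control to Apply() for the op-specific part.
class FileReaderMakerSketch : public MakerBase {
 public:
  void Make() final {
    AddOutput("Out");
    AddAttr("shape_concat");
    AddAttr("lod_levels");
    Apply();
  }

 protected:
  virtual void Apply() = 0;
};

// A concrete maker only supplies Apply(). The attribute name is an assumption.
class OpenFilesMakerSketch : public FileReaderMakerSketch {
 protected:
  void Apply() override {
    AddAttr("file_names");
    AddComment("OpenFiles");
  }
};

// Helper for trying the sketch: returns the declarations in call order.
std::vector<std::string> DescribeOpenFiles() {
  OpenFilesMakerSketch maker;
  maker.Make();
  return maker.entries();
}
```

Calling `DescribeOpenFiles()` returns the shared declarations first and the subclass-specific ones last, which is the ordering the `Make()`/`Apply()` split appears intended to guarantee.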
......
...@@ -508,8 +508,7 @@ class RecurrentGradOp : public RecurrentBase { ...@@ -508,8 +508,7 @@ class RecurrentGradOp : public RecurrentBase {
class RecurrentOpProtoMaker : public framework::OpProtoAndCheckerMaker { class RecurrentOpProtoMaker : public framework::OpProtoAndCheckerMaker {
public: public:
RecurrentOpProtoMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput(kInputs, "rnn inputs").AsDuplicable(); AddInput(kInputs, "rnn inputs").AsDuplicable();
AddInput(kInitialStates, "rnn initial states").AsDuplicable(); AddInput(kInitialStates, "rnn initial states").AsDuplicable();
AddInput(kParameters, AddInput(kParameters,
......
...@@ -53,8 +53,7 @@ class RecvOp : public framework::OperatorBase { ...@@ -53,8 +53,7 @@ class RecvOp : public framework::OperatorBase {
class RecvOpMaker : public framework::OpProtoAndCheckerMaker { class RecvOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
RecvOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddOutput("Out", "(Tensor) Variables to get from server.").AsDuplicable(); AddOutput("Out", "(Tensor) Variables to get from server.").AsDuplicable();
AddComment(R"DOC( AddComment(R"DOC(
Recv operator Recv operator
......
...@@ -90,8 +90,7 @@ class ReduceGradOp : public framework::OperatorWithKernel { ...@@ -90,8 +90,7 @@ class ReduceGradOp : public framework::OperatorWithKernel {
class ReduceOpMaker : public framework::OpProtoAndCheckerMaker { class ReduceOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
ReduceOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() final {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", AddInput("X",
"(Tensor) The input tensor. Tensors with rank at most 6 are " "(Tensor) The input tensor. Tensors with rank at most 6 are "
"supported."); "supported.");
...@@ -111,78 +110,20 @@ class ReduceOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -111,78 +110,20 @@ class ReduceOpMaker : public framework::OpProtoAndCheckerMaker {
"(bool, default false) " "(bool, default false) "
"If true, output a scalar reduced along all dimensions.") "If true, output a scalar reduced along all dimensions.")
.SetDefault(false); .SetDefault(false);
-    comment_ = R"DOC(
-{ReduceOp} Operator.
-This operator computes the {reduce} of input tensor along the given dimension.
+    AddComment(string::Sprintf(R"DOC(
+%s Operator.
+This operator computes the %s of input tensor along the given dimension.
 The result tensor has 1 fewer dimension than the input unless keep_dim is true.
 If reduce_all is true, just reduce along all dimensions and output a scalar.
-)DOC";
-    AddComment(comment_);
+)DOC",
+                               GetOpType(), GetName()));
   }
  protected:
-  std::string comment_;
-  void Replace(std::string *src, std::string from, std::string to) {
-    std::size_t len_from = std::strlen(from.c_str());
-    std::size_t len_to = std::strlen(to.c_str());
-    for (std::size_t pos = src->find(from); pos != std::string::npos;
-         pos = src->find(from, pos + len_to)) {
-      src->replace(pos, len_from, to);
-    }
-  }
-  void SetComment(std::string name, std::string op) {
-    Replace(&comment_, "{ReduceOp}", name);
-    Replace(&comment_, "{reduce}", op);
-  }
-};
-class ReduceSumOpMaker : public ReduceOpMaker {
- public:
-  ReduceSumOpMaker(OpProto *proto, OpAttrChecker *op_checker)
-      : ReduceOpMaker(proto, op_checker) {
-    SetComment("ReduceSum", "sum");
-    AddComment(comment_);
-  }
-};
-class ReduceMeanOpMaker : public ReduceOpMaker {
- public:
-  ReduceMeanOpMaker(OpProto *proto, OpAttrChecker *op_checker)
-      : ReduceOpMaker(proto, op_checker) {
-    SetComment("ReduceMean", "mean");
-    AddComment(comment_);
-  }
-};
-class ReduceMaxOpMaker : public ReduceOpMaker {
- public:
-  ReduceMaxOpMaker(OpProto *proto, OpAttrChecker *op_checker)
-      : ReduceOpMaker(proto, op_checker) {
-    SetComment("ReduceMax", "max");
-    AddComment(comment_);
-  }
-};
-class ReduceMinOpMaker : public ReduceOpMaker {
- public:
-  ReduceMinOpMaker(OpProto *proto, OpAttrChecker *op_checker)
-      : ReduceOpMaker(proto, op_checker) {
-    SetComment("ReduceMin", "min");
-    AddComment(comment_);
-  }
-};
-class ReduceProdOpMaker : public ReduceOpMaker {
- public:
-  ReduceProdOpMaker(OpProto *proto, OpAttrChecker *op_checker)
-      : ReduceOpMaker(proto, op_checker) {
-    SetComment("ReduceProd", "production");
-    AddComment(comment_);
-  }
+  virtual std::string GetName() const = 0;
+  virtual std::string GetOpType() const = 0;
 };
} // namespace operators } // namespace operators
...@@ -190,25 +131,21 @@ class ReduceProdOpMaker : public ReduceOpMaker { ...@@ -190,25 +131,21 @@ class ReduceProdOpMaker : public ReduceOpMaker {
namespace ops = paddle::operators; namespace ops = paddle::operators;
-REGISTER_OPERATOR(reduce_sum, ops::ReduceOp, ops::ReduceSumOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>);
-REGISTER_OPERATOR(reduce_sum_grad, ops::ReduceGradOp);
-REGISTER_OPERATOR(reduce_mean, ops::ReduceOp, ops::ReduceMeanOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>);
-REGISTER_OPERATOR(reduce_mean_grad, ops::ReduceGradOp);
-REGISTER_OPERATOR(reduce_max, ops::ReduceOp, ops::ReduceMaxOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>);
-REGISTER_OPERATOR(reduce_max_grad, ops::ReduceGradOp);
-REGISTER_OPERATOR(reduce_min, ops::ReduceOp, ops::ReduceMinOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>);
-REGISTER_OPERATOR(reduce_min_grad, ops::ReduceGradOp);
-REGISTER_OPERATOR(reduce_prod, ops::ReduceOp, ops::ReduceProdOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>);
-REGISTER_OPERATOR(reduce_prod_grad, ops::ReduceGradOp);
+#define REGISTER_REDUCE_OP(op_name) \
+  class __##op_name##Maker__ : public ops::ReduceOpMaker { \
+   protected: \
+    virtual std::string GetName() const { return #op_name; } \
+    virtual std::string GetOpType() const { return "Reduce " #op_name; } \
+  }; \
+  REGISTER_OPERATOR(reduce_##op_name, ops::ReduceOp, __##op_name##Maker__, \
+                    paddle::framework::DefaultGradOpDescMaker<true>); \
+  REGISTER_OPERATOR(reduce_##op_name##_grad, ops::ReduceGradOp)
+REGISTER_REDUCE_OP(sum);
+REGISTER_REDUCE_OP(mean);
+REGISTER_REDUCE_OP(max);
+REGISTER_REDUCE_OP(min);
+REGISTER_REDUCE_OP(prod);
#define REGISTER_REDUCE_CPU_KERNEL(reduce_type, functor, grad_functor) \ #define REGISTER_REDUCE_CPU_KERNEL(reduce_type, functor, grad_functor) \
REGISTER_OP_CPU_KERNEL(reduce_type, \ REGISTER_OP_CPU_KERNEL(reduce_type, \
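The reduce-op change above replaces five hand-written maker subclasses with one macro that generates a subclass per operator, and replaces placeholder substitution with a format string filled from two virtual getters. A rough, self-contained sketch of that idea follows. `ReduceMakerSketch` and `DEFINE_REDUCE_MAKER_SKETCH` are hypothetical names, and `std::snprintf` stands in for Paddle's `string::Sprintf`, whose exact behavior is not shown in this diff.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Base class: builds the operator comment from two virtual getters instead of
// patching placeholders inside a stored string.
class ReduceMakerSketch {
 public:
  virtual ~ReduceMakerSketch() = default;

  std::string Comment() const {
    const std::string op_type = GetOpType();
    const std::string name = GetName();
    char buf[256];
    std::snprintf(buf, sizeof(buf),
                  "%s Operator. Computes the %s of the input tensor.",
                  op_type.c_str(), name.c_str());
    return std::string(buf);
  }

 protected:
  virtual std::string GetName() const = 0;
  virtual std::string GetOpType() const = 0;
};

// One macro invocation per operator generates the subclass. #op_name turns the
// macro argument into a string literal, and adjacent literals are concatenated.
#define DEFINE_REDUCE_MAKER_SKETCH(op_name)                               \
  class op_name##MakerSketch : public ReduceMakerSketch {                 \
   protected:                                                             \
    std::string GetName() const override { return #op_name; }             \
    std::string GetOpType() const override { return "Reduce " #op_name; } \
  }

DEFINE_REDUCE_MAKER_SKETCH(sum);
DEFINE_REDUCE_MAKER_SKETCH(prod);

// Helpers for trying the sketch.
std::string SumComment() { return sumMakerSketch().Comment(); }
std::string ProdComment() { return prodMakerSketch().Comment(); }
```

The sketch deliberately avoids the `__name__` spelling used by the generated class in the diff, because identifiers containing a double underscore are reserved for the implementation in C++.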
......
...@@ -23,9 +23,7 @@ namespace operators { ...@@ -23,9 +23,7 @@ namespace operators {
class ReorderLoDTensorByRankTableOpProtoMaker class ReorderLoDTensorByRankTableOpProtoMaker
: public framework::OpProtoAndCheckerMaker { : public framework::OpProtoAndCheckerMaker {
public: public:
ReorderLoDTensorByRankTableOpProtoMaker(OpProto *proto, void Make() override {
OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", AddInput("X",
"(LoDTensor), the input lod tensor to be reordered according to " "(LoDTensor), the input lod tensor to be reordered according to "
"Input(RankTable)."); "Input(RankTable).");
......
...@@ -22,8 +22,7 @@ namespace operators { ...@@ -22,8 +22,7 @@ namespace operators {
class ReshapeOpMaker : public framework::OpProtoAndCheckerMaker { class ReshapeOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
ReshapeOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "(Tensor). The input tensor of reshape operator."); AddInput("X", "(Tensor). The input tensor of reshape operator.");
AddInput("Shape", AddInput("Shape",
"(Tensor<int32>, optional). If provided, reshape according to " "(Tensor<int32>, optional). If provided, reshape according to "
......
...@@ -63,8 +63,7 @@ class RmspropOp : public framework::OperatorWithKernel { ...@@ -63,8 +63,7 @@ class RmspropOp : public framework::OperatorWithKernel {
class RmspropOpMaker : public framework::OpProtoAndCheckerMaker { class RmspropOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
RmspropOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Param", AddInput("Param",
"(Tensor, default Tensor<float>) " "(Tensor, default Tensor<float>) "
"Input parameter value that has to be updated."); "Input parameter value that has to be updated.");
......
...@@ -59,8 +59,7 @@ class RNNMemoryHelperOpShapeInference : public framework::InferShapeBase { ...@@ -59,8 +59,7 @@ class RNNMemoryHelperOpShapeInference : public framework::InferShapeBase {
class RNNMemoryHelperOpInfoMaker : public framework::OpProtoAndCheckerMaker { class RNNMemoryHelperOpInfoMaker : public framework::OpProtoAndCheckerMaker {
public: public:
RNNMemoryHelperOpInfoMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", ""); AddInput("X", "");
AddOutput("Out", ""); AddOutput("Out", "");
AddAttr<int>("dtype", AddAttr<int>("dtype",
...@@ -117,8 +116,7 @@ class RNNMemoryHelperGradOp : public framework::OperatorBase { ...@@ -117,8 +116,7 @@ class RNNMemoryHelperGradOp : public framework::OperatorBase {
class RNNMemoryHelperGradOpInfoMaker class RNNMemoryHelperGradOpInfoMaker
: public framework::OpProtoAndCheckerMaker { : public framework::OpProtoAndCheckerMaker {
public: public:
RNNMemoryHelperGradOpInfoMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput(framework::GradVarName("Out"), ""); AddInput(framework::GradVarName("Out"), "");
AddInput("X", ""); AddInput("X", "");
AddInput("Out", ""); AddInput("Out", "");
......
...@@ -98,8 +98,7 @@ class ROIPoolGradOp : public framework::OperatorWithKernel { ...@@ -98,8 +98,7 @@ class ROIPoolGradOp : public framework::OperatorWithKernel {
class ROIPoolOpMaker : public framework::OpProtoAndCheckerMaker { class ROIPoolOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
ROIPoolOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", AddInput("X",
"(Tensor), " "(Tensor), "
"the input of ROIPoolOp. " "the input of ROIPoolOp. "
......
...@@ -76,8 +76,7 @@ class RowConvGradOp : public framework::OperatorWithKernel { ...@@ -76,8 +76,7 @@ class RowConvGradOp : public framework::OperatorWithKernel {
class RowConvOpMaker : public framework::OpProtoAndCheckerMaker { class RowConvOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
RowConvOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: framework::OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", AddInput("X",
"(LoDTensor), the input(X) is a LodTensor, which supports " "(LoDTensor), the input(X) is a LodTensor, which supports "
"variable time-length input sequences. The underlying tensor " "variable time-length input sequences. The underlying tensor "
......
...@@ -18,6 +18,7 @@ limitations under the License. */ ...@@ -18,6 +18,7 @@ limitations under the License. */
#include <numeric> #include <numeric>
#include <sstream> #include <sstream>
#include "paddle/fluid/framework/data_type.h" #include "paddle/fluid/framework/data_type.h"
#include "paddle/fluid/framework/data_type_transform.h"
#include "paddle/fluid/framework/framework.pb.h" #include "paddle/fluid/framework/framework.pb.h"
#include "paddle/fluid/framework/lod_tensor.h" #include "paddle/fluid/framework/lod_tensor.h"
#include "paddle/fluid/framework/op_registry.h" #include "paddle/fluid/framework/op_registry.h"
...@@ -69,6 +70,7 @@ class SaveCombineOp : public framework::OperatorBase { ...@@ -69,6 +70,7 @@ class SaveCombineOp : public framework::OperatorBase {
const platform::Place &place) const override { const platform::Place &place) const override {
auto filename = Attr<std::string>("file_path"); auto filename = Attr<std::string>("file_path");
auto overwrite = Attr<bool>("overwrite"); auto overwrite = Attr<bool>("overwrite");
auto save_as_fp16 = Attr<bool>("save_as_fp16");
bool is_present = FileExists(filename); bool is_present = FileExists(filename);
if (is_present && !overwrite) { if (is_present && !overwrite) {
...@@ -100,8 +102,24 @@ class SaveCombineOp : public framework::OperatorBase { ...@@ -100,8 +102,24 @@ class SaveCombineOp : public framework::OperatorBase {
inp_var_names[i]); inp_var_names[i]);
auto &tensor = var->Get<framework::LoDTensor>(); auto &tensor = var->Get<framework::LoDTensor>();
-      // Serialize tensor
-      framework::SerializeToStream(fout, tensor, dev_ctx);
+      // Serialize tensors one by one
+      // Check types to see if a fp16 transformation is required
+      auto in_dtype = framework::ToDataType(tensor.type());
+      auto out_dtype =
+          save_as_fp16 ? framework::proto::VarType::FP16 : in_dtype;
+      if (in_dtype != out_dtype) {
+        auto in_kernel_type = framework::OpKernelType(in_dtype, place);
+        auto out_kernel_type = framework::OpKernelType(out_dtype, place);
+        framework::LoDTensor out;
+        // copy LoD info to the new tensor
+        out.set_lod(tensor.lod());
+        framework::TransDataType(in_kernel_type, out_kernel_type, tensor, &out);
+        framework::SerializeToStream(fout, out, dev_ctx);
+      } else {
+        framework::SerializeToStream(fout, tensor, dev_ctx);
+      }
} }
fout.close(); fout.close();
} }
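The serialization change above picks an output data type per tensor, and only when it differs from the input type does it build a converted copy that carries the original LoD. The control flow can be sketched roughly as below. `DTypeSketch`, `TensorSketch`, `ChooseSaveDType`, and `PrepareForSave` are hypothetical names, and the sketch copies the values unchanged where the real op calls `framework::TransDataType`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins for the framework's dtype enum and
// LoDTensor. Only the fields needed to show the control flow are modeled.
enum class DTypeSketch { INT32, FP32, FP16 };

struct TensorSketch {
  DTypeSketch dtype;
  std::vector<std::vector<std::size_t>> lod;
  std::vector<float> values;
};

// Mirrors the ternary in the diff: the flag forces FP16, otherwise the input
// type is kept. Note that the flag applies to every input type.
DTypeSketch ChooseSaveDType(DTypeSketch in_dtype, bool save_as_fp16) {
  return save_as_fp16 ? DTypeSketch::FP16 : in_dtype;
}

// Returns the tensor that would be handed to the serializer.
TensorSketch PrepareForSave(const TensorSketch& in, bool save_as_fp16) {
  const DTypeSketch out_dtype = ChooseSaveDType(in.dtype, save_as_fp16);
  if (out_dtype == in.dtype) {
    return in;  // no conversion needed, serialize the original
  }
  TensorSketch out;
  out.lod = in.lod;  // copy LoD info to the new tensor, as the diff does
  out.dtype = out_dtype;
  // Assumption: values are copied as-is here. The real op converts element
  // types at this point.
  out.values = in.values;
  return out;
}
```

As written in the diff, setting `save_as_fp16` selects FP16 for any input type, so this sketch does the same for `INT32` as well as `FP32`.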
...@@ -109,8 +127,7 @@ class SaveCombineOp : public framework::OperatorBase { ...@@ -109,8 +127,7 @@ class SaveCombineOp : public framework::OperatorBase {
class SaveCombineOpProtoMaker : public framework::OpProtoAndCheckerMaker { class SaveCombineOpProtoMaker : public framework::OpProtoAndCheckerMaker {
public: public:
SaveCombineOpProtoMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput( AddInput(
"X", "X",
"(vector) Input LoDTensors that need to be saved together in a file.") "(vector) Input LoDTensors that need to be saved together in a file.")
...@@ -125,6 +142,12 @@ to a file on disk. ...@@ -125,6 +142,12 @@ to a file on disk.
"(boolean, default true)" "(boolean, default true)"
"Overwrite the output file if it exists.") "Overwrite the output file if it exists.")
.SetDefault(true); .SetDefault(true);
AddAttr<bool>("save_as_fp16",
"(boolean, default false)"
"If true, the tensor will be converted to float16 data "
"type and then saved. Otherwise, the tensor will be "
"directly saved without data type conversion.")
.SetDefault(false);
AddAttr<std::string>( AddAttr<std::string>(
"file_path", "file_path",
"(string)" "(string)"
......
...@@ -17,15 +17,17 @@ limitations under the License. */ ...@@ -17,15 +17,17 @@ limitations under the License. */
#include <vector> #include <vector>
#include "gtest/gtest.h" #include "gtest/gtest.h"
#include "paddle/fluid/framework/op_registry.h" #include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/platform/float16.h"
USE_NO_KERNEL_OP(save_combine); USE_NO_KERNEL_OP(save_combine);
USE_NO_KERNEL_OP(load_combine); USE_NO_KERNEL_OP(load_combine);
-int* CreateForSaveCombineOp(int x, int y, const std::vector<int>& lod_info,
-                            std::string var_name,
-                            const paddle::platform::CPUPlace& place,
-                            paddle::framework::Scope* scope,
-                            paddle::framework::LoD* expect_lod) {
+template <typename T, typename U>
+T* CreateForSaveCombineOp(int x, int y, const std::vector<int>& lod_info,
+                          std::string var_name,
+                          const paddle::platform::CPUPlace& place,
+                          paddle::framework::Scope* scope,
+                          paddle::framework::LoD* expect_lod) {
auto var = scope->Var(var_name); auto var = scope->Var(var_name);
auto tensor = var->GetMutable<paddle::framework::LoDTensor>(); auto tensor = var->GetMutable<paddle::framework::LoDTensor>();
tensor->Resize({x, y}); tensor->Resize({x, y});
...@@ -34,9 +36,10 @@ int* CreateForSaveCombineOp(int x, int y, const std::vector<int>& lod_info, ...@@ -34,9 +36,10 @@ int* CreateForSaveCombineOp(int x, int y, const std::vector<int>& lod_info,
(*expect_lod)[0].push_back(lod_info[i]); (*expect_lod)[0].push_back(lod_info[i]);
} }
tensor->set_lod(*expect_lod); tensor->set_lod(*expect_lod);
-  int* expect = tensor->mutable_data<int>(place);
+  T* expect = tensor->mutable_data<T>(place);
   for (int64_t i = 0; i < tensor->numel(); ++i) {
-    expect[i] = static_cast<int>(i);
+    expect[i] = static_cast<T>(
+        static_cast<U>(i));  // For FP16, we intend to do float(float16(i))
} }
return expect; return expect;
} }
...@@ -48,18 +51,20 @@ paddle::framework::LoDTensor* GeneratePlaceholderBeforeLoad( ...@@ -48,18 +51,20 @@ paddle::framework::LoDTensor* GeneratePlaceholderBeforeLoad(
return target; return target;
} }
-int* GetValuesAfterLoadCombineOp(paddle::framework::LoDTensor* target,
-                                 const paddle::framework::Scope& scope,
-                                 paddle::framework::LoD* actual_lod) {
-  int* actual = target->data<int>();
+template <typename T>
+T* GetValuesAfterLoadCombineOp(paddle::framework::LoDTensor* target,
+                               const paddle::framework::Scope& scope,
+                               paddle::framework::LoD* actual_lod) {
+  T* actual = target->data<T>();
*actual_lod = target->lod(); *actual_lod = target->lod();
return actual; return actual;
} }
-void CheckValues(int* expect, int* actual, paddle::framework::LoD expect_lod,
-                 paddle::framework::LoD actual_lod, const int& numel) {
-  for (int64_t i = 0; i < numel; ++i) {
-    EXPECT_EQ(expect[i], actual[i]);
+template <typename T, typename U>
+void CheckValues(T* expect, U* actual, const paddle::framework::LoD& expect_lod,
+                 const paddle::framework::LoD& actual_lod, const int& numel) {
+  for (int i = 0; i < numel; ++i) {
+    EXPECT_EQ(expect[i], static_cast<T>(actual[i]));
} }
EXPECT_EQ(expect_lod.size(), actual_lod.size()); EXPECT_EQ(expect_lod.size(), actual_lod.size());
for (size_t i = 0; i < expect_lod.size(); ++i) { for (size_t i = 0; i < expect_lod.size(); ++i) {
...@@ -78,26 +83,26 @@ TEST(SaveLoadCombineOp, CPU) { ...@@ -78,26 +83,26 @@ TEST(SaveLoadCombineOp, CPU) {
std::vector<int> lod1 = {0, 1, 2, 3, 10}; std::vector<int> lod1 = {0, 1, 2, 3, 10};
int numel1 = 100; int numel1 = 100;
paddle::framework::LoD expect_lod1; paddle::framework::LoD expect_lod1;
int* expect1 = CreateForSaveCombineOp(10, 10, lod1, "test_var1", place, int* expect1 = CreateForSaveCombineOp<int, int>(10, 10, lod1, "test_var1",
&scope, &expect_lod1); place, &scope, &expect_lod1);
std::vector<int> lod2 = {0, 2, 5, 10}; std::vector<int> lod2 = {0, 2, 5, 10};
int numel2 = 200; int numel2 = 200;
paddle::framework::LoD expect_lod2; paddle::framework::LoD expect_lod2;
int* expect2 = CreateForSaveCombineOp(10, 20, lod2, "test_var2", place, int* expect2 = CreateForSaveCombineOp<int, int>(10, 20, lod2, "test_var2",
&scope, &expect_lod2); place, &scope, &expect_lod2);
std::vector<int> lod3 = {0, 2, 3, 20}; std::vector<int> lod3 = {0, 2, 3, 20};
int numel3 = 4000; int numel3 = 4000;
paddle::framework::LoD expect_lod3; paddle::framework::LoD expect_lod3;
int* expect3 = CreateForSaveCombineOp(20, 200, lod3, "test_var3", place, int* expect3 = CreateForSaveCombineOp<int, int>(20, 200, lod3, "test_var3",
&scope, &expect_lod3); place, &scope, &expect_lod3);
std::vector<int> lod4 = {0, 1, 20}; std::vector<int> lod4 = {0, 1, 20};
int numel4 = 1000; int numel4 = 1000;
paddle::framework::LoD expect_lod4; paddle::framework::LoD expect_lod4;
int* expect4 = CreateForSaveCombineOp(20, 50, lod4, "test_var4", place, int* expect4 = CreateForSaveCombineOp<int, int>(20, 50, lod4, "test_var4",
&scope, &expect_lod4); place, &scope, &expect_lod4);
// Set attributes // Set attributes
std::string filename = "check_tensor.ls"; std::string filename = "check_tensor.ls";
...@@ -123,15 +128,92 @@ TEST(SaveLoadCombineOp, CPU) { ...@@ -123,15 +128,92 @@ TEST(SaveLoadCombineOp, CPU) {
load_combine_op->Run(scope, place); load_combine_op->Run(scope, place);
paddle::framework::LoD actual_lod1, actual_lod2, actual_lod3, actual_lod4; paddle::framework::LoD actual_lod1, actual_lod2, actual_lod3, actual_lod4;
int* actual1 = GetValuesAfterLoadCombineOp(target1, scope, &actual_lod1); int* actual1 = GetValuesAfterLoadCombineOp<int>(target1, scope, &actual_lod1);
int* actual2 = GetValuesAfterLoadCombineOp(target2, scope, &actual_lod2); int* actual2 = GetValuesAfterLoadCombineOp<int>(target2, scope, &actual_lod2);
int* actual3 = GetValuesAfterLoadCombineOp(target3, scope, &actual_lod3); int* actual3 = GetValuesAfterLoadCombineOp<int>(target3, scope, &actual_lod3);
int* actual4 = GetValuesAfterLoadCombineOp(target4, scope, &actual_lod4); int* actual4 = GetValuesAfterLoadCombineOp<int>(target4, scope, &actual_lod4);
CheckValues(expect1, actual1, expect_lod1, actual_lod1, numel1); CheckValues<int, int>(expect1, actual1, expect_lod1, actual_lod1, numel1);
CheckValues(expect2, actual2, expect_lod2, actual_lod2, numel2); CheckValues<int, int>(expect2, actual2, expect_lod2, actual_lod2, numel2);
CheckValues(expect3, actual3, expect_lod3, actual_lod3, numel3); CheckValues<int, int>(expect3, actual3, expect_lod3, actual_lod3, numel3);
CheckValues(expect4, actual4, expect_lod4, actual_lod4, numel4); CheckValues<int, int>(expect4, actual4, expect_lod4, actual_lod4, numel4);
}
// FP16 version of SaveLoadCombineOp Test
TEST(SaveLoadCombineFP16Op, CPU) {
paddle::framework::Scope scope;
paddle::platform::CPUPlace place;
std::vector<int> lod1 = {0, 1, 2, 3, 10};
int numel1 = 100;
paddle::framework::LoD expect_lod1;
float* expect1 = CreateForSaveCombineOp<float, paddle::platform::float16>(
10, 10, lod1, "test_var1", place, &scope, &expect_lod1);
std::vector<int> lod2 = {0, 2, 5, 10};
int numel2 = 200;
paddle::framework::LoD expect_lod2;
float* expect2 = CreateForSaveCombineOp<float, paddle::platform::float16>(
10, 20, lod2, "test_var2", place, &scope, &expect_lod2);
std::vector<int> lod3 = {0, 20};
int numel3 = 4000;
paddle::framework::LoD expect_lod3;
float* expect3 = CreateForSaveCombineOp<float, paddle::platform::float16>(
20, 200, lod3, "test_var3", place, &scope, &expect_lod3);
std::vector<int> lod4 = {0, 1, 20};
int numel4 = 1000;
paddle::framework::LoD expect_lod4;
float* expect4 = CreateForSaveCombineOp<float, paddle::platform::float16>(
20, 50, lod4, "test_var4", place, &scope, &expect_lod4);
// Set attributes
std::string filename = "check_tensor_fp16.ls";
paddle::framework::AttributeMap attrs;
attrs.insert({"file_path", std::string(filename)});
attrs.insert({"save_as_fp16", true});
// Run the save_combine_op
auto save_combine_op = paddle::framework::OpRegistry::CreateOp(
"save_combine",
{{"X", {"test_var1", "test_var2", "test_var3", "test_var4"}}}, {}, attrs);
save_combine_op->Run(scope, place);
// Set up output vars
auto target1 = GeneratePlaceholderBeforeLoad("out_var1", &scope);
auto target2 = GeneratePlaceholderBeforeLoad("out_var2", &scope);
auto target3 = GeneratePlaceholderBeforeLoad("out_var3", &scope);
auto target4 = GeneratePlaceholderBeforeLoad("out_var4", &scope);
// Run the load_combine_op
auto load_combine_op = paddle::framework::OpRegistry::CreateOp(
"load_combine", {},
{{"Out", {"out_var1", "out_var2", "out_var3", "out_var4"}}}, attrs);
load_combine_op->Run(scope, place);
paddle::framework::LoD actual_lod1, actual_lod2, actual_lod3, actual_lod4;
paddle::platform::float16* actual1 =
GetValuesAfterLoadCombineOp<paddle::platform::float16>(target1, scope,
&actual_lod1);
paddle::platform::float16* actual2 =
GetValuesAfterLoadCombineOp<paddle::platform::float16>(target2, scope,
&actual_lod2);
paddle::platform::float16* actual3 =
GetValuesAfterLoadCombineOp<paddle::platform::float16>(target3, scope,
&actual_lod3);
paddle::platform::float16* actual4 =
GetValuesAfterLoadCombineOp<paddle::platform::float16>(target4, scope,
&actual_lod4);
CheckValues<float, paddle::platform::float16>(expect1, actual1, expect_lod1,
actual_lod1, numel1);
CheckValues<float, paddle::platform::float16>(expect2, actual2, expect_lod2,
actual_lod2, numel2);
CheckValues<float, paddle::platform::float16>(expect3, actual3, expect_lod3,
actual_lod3, numel3);
CheckValues<float, paddle::platform::float16>(expect4, actual4, expect_lod4,
actual_lod4, numel4);
} }
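The FP16 test above fills its expected values with `float(float16(i))` rather than `float(i)`. A plausible reason is that a half-precision value has an 11-bit significand, so integers above 2048 are not all exactly representable, and the largest tensor here holds 4000 elements. The sketch below illustrates the rounding involved. It assumes IEEE 754 binary16 with round-to-nearest-even, covers only non-negative values in the normal range, and uses the hypothetical helper name `RoundToHalfPrecisionSketch`. It is not Paddle's `float16` implementation.

```cpp
#include <cassert>
#include <cmath>

// Rounds a non-negative float to the nearest value that has an 11-bit
// significand, which is the precision of IEEE 754 binary16. Overflow and
// subnormal handling are deliberately left out of this sketch.
float RoundToHalfPrecisionSketch(float x) {
  if (x == 0.0f) return 0.0f;
  int exp = 0;
  // x == mantissa * 2^exp with mantissa in [0.5, 1).
  const float mantissa = std::frexp(x, &exp);
  // Scale so that exactly 11 significant bits sit left of the binary point.
  const float scaled = std::ldexp(mantissa, 11);
  // nearbyint rounds ties to even under the default rounding mode.
  const float rounded = std::nearbyint(scaled);
  return std::ldexp(rounded, exp - 11);
}
```

Under these assumptions small integers such as 100 survive unchanged, while 2049 and 3999 land on neighboring representable values. Comparing loaded FP16 data against unrounded `float(i)` would therefore fail for the larger indices.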
// Test with original SaveLoadTest // Test with original SaveLoadTest
...@@ -141,7 +223,7 @@ TEST(SaveLoadTestWithCombineOp, CPU) { ...@@ -141,7 +223,7 @@ TEST(SaveLoadTestWithCombineOp, CPU) {
auto var = scope.Var("test_var"); auto var = scope.Var("test_var");
auto tensor = var->GetMutable<paddle::framework::LoDTensor>(); auto tensor = var->GetMutable<paddle::framework::LoDTensor>();
tensor->Resize({3, 10}); tensor->Resize({3, 4000});
paddle::framework::LoD expect_lod; paddle::framework::LoD expect_lod;
expect_lod.resize(1); expect_lod.resize(1);
expect_lod[0].push_back(0); expect_lod[0].push_back(0);
......
...@@ -63,14 +63,21 @@ TEST(SaveLoadOp, CPU) { ...@@ -63,14 +63,21 @@ TEST(SaveLoadOp, CPU) {
} }
} }
TEST(SaveLoadFP16Op, CPU) { TEST(SaveFP16Op, CPU) {
paddle::framework::Scope scope; paddle::framework::Scope scope;
paddle::platform::CPUPlace place; paddle::platform::CPUPlace place;
auto var = scope.Var("test_var"); auto var = scope.Var("test_var");
auto tensor = var->GetMutable<paddle::framework::LoDTensor>(); auto tensor = var->GetMutable<paddle::framework::LoDTensor>();
tensor->Resize({3, 10}); tensor->Resize({3, 10});
paddle::framework::LoD expect_lod;
expect_lod.resize(1);
expect_lod[0].push_back(0);
expect_lod[0].push_back(1);
expect_lod[0].push_back(2);
expect_lod[0].push_back(3);
tensor->set_lod(expect_lod);
float* expect = tensor->mutable_data<float>(place); float* expect = tensor->mutable_data<float>(place);
for (int64_t i = 0; i < tensor->numel(); ++i) { for (int64_t i = 0; i < tensor->numel(); ++i) {
expect[i] = static_cast<float>(paddle::platform::float16(i)); expect[i] = static_cast<float>(paddle::platform::float16(i));
...@@ -93,4 +100,60 @@ TEST(SaveLoadFP16Op, CPU) { ...@@ -93,4 +100,60 @@ TEST(SaveLoadFP16Op, CPU) {
for (int64_t i = 0; i < tensor->numel(); ++i) { for (int64_t i = 0; i < tensor->numel(); ++i) {
EXPECT_EQ(expect[i], static_cast<float>(actual[i])); EXPECT_EQ(expect[i], static_cast<float>(actual[i]));
} }
auto& actual_lod = target->lod();
EXPECT_EQ(expect_lod.size(), actual_lod.size());
for (size_t i = 0; i < expect_lod.size(); ++i) {
for (size_t j = 0; j < expect_lod[i].size(); ++j) {
EXPECT_EQ(expect_lod[i][j], actual_lod[i][j]);
}
}
}
TEST(LoadFP16Op, CPU) {
paddle::framework::Scope scope;
paddle::platform::CPUPlace place;
auto var = scope.Var("test_var");
auto tensor = var->GetMutable<paddle::framework::LoDTensor>();
tensor->Resize({3, 10});
paddle::framework::LoD expect_lod;
expect_lod.resize(1);
expect_lod[0].push_back(0);
expect_lod[0].push_back(1);
expect_lod[0].push_back(2);
expect_lod[0].push_back(3);
tensor->set_lod(expect_lod);
float* expect = tensor->mutable_data<float>(place);
for (int64_t i = 0; i < tensor->numel(); ++i) {
expect[i] = static_cast<float>(paddle::platform::float16(i));
}
paddle::framework::AttributeMap attrs;
attrs.insert({"file_path", std::string("tensor.save")});
attrs.insert({"load_as_fp16", true});
auto save_op = paddle::framework::OpRegistry::CreateOp(
"save", {{"X", {"test_var"}}}, {}, attrs);
save_op->Run(scope, place);
auto load_var = scope.Var("out_var");
auto load_op = paddle::framework::OpRegistry::CreateOp(
"load", {}, {{"Out", {"out_var"}}}, attrs);
load_op->Run(scope, place);
auto target = load_var->Get<paddle::framework::LoDTensor>();
paddle::platform::float16* actual = target.data<paddle::platform::float16>();
for (int64_t i = 0; i < tensor->numel(); ++i) {
EXPECT_EQ(expect[i], static_cast<float>(actual[i]));
}
auto& actual_lod = target.lod();
EXPECT_EQ(expect_lod.size(), actual_lod.size());
for (size_t i = 0; i < expect_lod.size(); ++i) {
for (size_t j = 0; j < expect_lod[i].size(); ++j) {
EXPECT_EQ(expect_lod[i][j], actual_lod[i][j]);
}
}
} }
...@@ -117,8 +117,7 @@ class SaveOp : public framework::OperatorBase { ...@@ -117,8 +117,7 @@ class SaveOp : public framework::OperatorBase {
class SaveOpProtoMaker : public framework::OpProtoAndCheckerMaker { class SaveOpProtoMaker : public framework::OpProtoAndCheckerMaker {
public: public:
SaveOpProtoMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "(Tensor ) Input tensor to be saved"); AddInput("X", "(Tensor ) Input tensor to be saved");
AddComment(R"DOC( AddComment(R"DOC(
Save operator Save operator
......
...@@ -37,8 +37,7 @@ class ScaleOp : public framework::OperatorWithKernel { ...@@ -37,8 +37,7 @@ class ScaleOp : public framework::OperatorWithKernel {
class ScaleOpMaker : public framework::OpProtoAndCheckerMaker { class ScaleOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
ScaleOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "(Tensor) Input tensor of scale operator."); AddInput("X", "(Tensor) Input tensor of scale operator.");
AddOutput("Out", "(Tensor) Output tensor of scale operator."); AddOutput("Out", "(Tensor) Output tensor of scale operator.");
AddComment(R"DOC( AddComment(R"DOC(
......
...@@ -78,8 +78,7 @@ class ScatterGradOp : public framework::OperatorWithKernel { ...@@ -78,8 +78,7 @@ class ScatterGradOp : public framework::OperatorWithKernel {
class ScatterOpMaker : public framework::OpProtoAndCheckerMaker { class ScatterOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
ScatterOpMaker(OpProto* proto, OpAttrChecker* op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "The source input of scatter op"); AddInput("X", "The source input of scatter op");
AddInput("Ids", "The index input of scatter op where X will be updated"); AddInput("Ids", "The index input of scatter op where X will be updated");
AddInput("Updates", "The updated value of updates op"); AddInput("Updates", "The updated value of updates op");
......
...@@ -380,8 +380,7 @@ class SelectOp : public framework::OperatorBase { ...@@ -380,8 +380,7 @@ class SelectOp : public framework::OperatorBase {
class SelectOpMaker : public framework::OpProtoAndCheckerMaker { class SelectOpMaker : public framework::OpProtoAndCheckerMaker {
public: public:
SelectOpMaker(OpProto *proto, OpAttrChecker *op_checker) void Make() override {
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput(kX, AddInput(kX,
"A set of variables, which are required by operators inside the " "A set of variables, which are required by operators inside the "
"cases of Select Op") "cases of Select Op")
......
@@ -57,8 +57,7 @@ class SendBarrierOp : public framework::OperatorBase {
 class SendBarrierOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  SendBarrierOpMaker(OpProto* proto, OpAttrChecker* op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() {
     AddOutput("RPCClient",
               "(RPCClient) The RPC client object which is"
               "initialized at most once.");
......
@@ -92,8 +92,7 @@ class SendOp : public framework::OperatorBase {
 class SendOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  SendOpMaker(OpProto* proto, OpAttrChecker* op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() {
     AddInput("X", "(Tensor) Input tensor to be sent").AsDuplicable();
     AddOutput("Out", "(Tensor) Output tensor to be received from server")
         .AsDuplicable();
......
@@ -66,8 +66,7 @@ class SendVarsOp : public framework::OperatorBase {
 class SendVarsOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  SendVarsOpMaker(OpProto* proto, OpAttrChecker* op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() {
     AddInput("X", "(Tensor, SelectedRows) Input variables to be sent")
         .AsDuplicable();
     AddOutput("RPCClient",
......
@@ -43,8 +43,7 @@ class SequenceConcatOp : public framework::OperatorWithKernel {
 class SequenceConcatOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  SequenceConcatOpMaker(OpProto* proto, OpAttrChecker* op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput("X",
              "(LodTensorArray) Input is a vector of LoDTensor, "
              "each of which is a variable-length sequence or nested sequence.")
......
@@ -102,8 +102,7 @@ class SequenceConvGradOp : public framework::OperatorWithKernel {
 class SequenceConvOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  SequenceConvOpMaker(OpProto* proto, OpAttrChecker* op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput(
         "X",
         "(LoDTensor) the input(X) is a LodTensor, which supports "
......
@@ -37,8 +37,7 @@ class SequenceEraseOp : public framework::OperatorWithKernel {
 class SequenceEraseOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  SequenceEraseOpMaker(OpProto* proto, OpAttrChecker* op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput("X",
              "(2-D LoDTensor with the 2nd dim. equal to 1) "
              "Input LoDTensor of SequenceEraseOp.");
......
@@ -94,8 +94,7 @@ class SequenceExpandOp : public framework::OperatorWithKernel {
 class SequenceExpandOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  SequenceExpandOpMaker(OpProto* proto, OpAttrChecker* op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput("X",
              "(LoDTensor, default LoDTensor<float>) A 2-D LoDTensor whose lod "
              "level is at most 1.");
......
@@ -38,8 +38,7 @@ class SequencePoolOp : public framework::OperatorWithKernel {
 class SequencePoolOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  SequencePoolOpMaker(OpProto* proto, OpAttrChecker* op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput("X", "(LoDTensor) The variable-length input of SequencePoolOp");
     AddOutput("Out",
               "(Tensor) The output of SequencePoolOp does not contain LoD "
......
@@ -42,8 +42,7 @@ class SequenceReshapeOp : public framework::OperatorWithKernel {
 class SequenceReshapeOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  SequenceReshapeOpMaker(OpProto* proto, OpAttrChecker* op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput("X",
              "(LoDTensor, default LoDTensor<float>) A 2-D LoDTensor with shape "
              "being [N, M].");
......
@@ -79,8 +79,7 @@ class SequenceSliceGradOp : public framework::OperatorWithKernel {
 class SequenceSliceOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  SequenceSliceOpMaker(OpProto* proto, OpAttrChecker* op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput("X",
              "(LoDTensor), "
              "the input of SequenceSliceOp.");
......
@@ -57,8 +57,7 @@ class SequenceSoftmaxOp : public framework::OperatorWithKernel {
 class SequenceSoftmaxOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  SequenceSoftmaxOpMaker(OpProto* proto, OpAttrChecker* op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput("X",
              "(LoDTensor) 1-D or 2-D input LoDTensor with the 2-nd dimension "
              "of length 1.");
......
@@ -68,8 +68,7 @@ class SGDOpInferVarType : public framework::VarTypeInference {
 class SGDOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  SGDOpMaker(OpProto* proto, OpAttrChecker* op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput("Param", "(Tensor or SelectedRows) Input parameter");
     AddInput("LearningRate", "(Tensor) Learning rate of SGD");
     AddInput("Grad", "(Tensor or SelectedRows) Input gradient");
......
@@ -69,8 +69,7 @@ class ShrinkRNNMemoryOp : public ArrayOp {
 class ShrinkRNNMemoryOpProtoMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  ShrinkRNNMemoryOpProtoMaker(OpProto *proto, OpAttrChecker *op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput("X", "(LoDTensor) The RNN step memory to be shrinked.");
     AddInput("RankTable", "(LoDRankTable) The lod_rank_table of dynamic RNN.");
     AddInput("I",
......
@@ -86,9 +86,7 @@ class SigmoidCrossEntropyWithLogitsGradOp
 class SigmoidCrossEntropyWithLogitsOpMaker
     : public framework::OpProtoAndCheckerMaker {
  public:
-  SigmoidCrossEntropyWithLogitsOpMaker(OpProto* proto,
-                                       OpAttrChecker* op_checker)
-      : framework::OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput("X",
              "(Tensor, default Tensor<float>), a 2-D tensor with shape N x D, "
              "where N is the batch size and D is the number of classes. "
......
@@ -34,8 +34,7 @@ class SignOp : public framework::OperatorWithKernel {
 template <typename AttrType>
 class SignOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  SignOpMaker(OpProto *proto, OpAttrChecker *op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput("X", "(Tensor) Input tensor of sign operator.");
     AddOutput("Out", "(Tensor) Output tensor of sign operator.");
     AddComment(R"DOC(
......
@@ -46,8 +46,7 @@ class SmoothL1LossOp : public framework::OperatorWithKernel {
 class SmoothL1LossOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  SmoothL1LossOpMaker(OpProto* proto, OpAttrChecker* op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput("X",
              "(Tensor, default Tensor<float>) A tensor with rank at least 2. "
              "The input value of smooth l1 loss op with shape "
......
@@ -77,8 +77,7 @@ class SoftmaxOp : public framework::OperatorWithKernel {
 class SoftmaxOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  SoftmaxOpMaker(OpProto* proto, OpAttrChecker* op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput("X",
              "The input tensor of softmax. "
              "2-D with shape [batch_size, input_feature_dimensions].");
......
@@ -20,8 +20,7 @@ namespace operators {
 class SoftmaxWithCrossEntropyOpMaker
     : public framework::OpProtoAndCheckerMaker {
  public:
-  SoftmaxWithCrossEntropyOpMaker(OpProto* proto, OpAttrChecker* op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput("Logits",
              "(Tensor, default: Tensor<float>), The unscaled log probabilities "
              "which is a 2-D tensor with shape [N x K]. N is the batch_size, "
......
@@ -64,8 +64,7 @@ class SplitByrefOp : public framework::OperatorWithKernel {
 class SplitByrefOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  SplitByrefOpMaker(OpProto *proto, OpAttrChecker *op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput("X", "(Tensor) Input tensor of the split operator.");
     AddOutput("Out", "(Tensor) Output tensors of the split operator.")
         .AsDuplicable();
......
@@ -19,8 +19,7 @@ namespace operators {
 class SplitIdsOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  SplitIdsOpMaker(OpProto *proto, OpAttrChecker *op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput("Ids", "(LoDTensor) the input ids with shape{batch_num, 1}");
     AddOutput("Out", "(LoDTensor) The outputs of the input Ids.")
         .AsDuplicable();
......
@@ -125,8 +125,7 @@ class SplitLoDTensorOp : public framework::OperatorBase {
 class SplitLoDTensorOpProtoMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  SplitLoDTensorOpProtoMaker(OpProto *proto, OpAttrChecker *op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput("X", "The input LoDTensor");
     AddInput("Mask", "A bool column vector which mask the input");
     AddOutput("OutTrue", "True branch of input LoDTensor");
......
@@ -70,8 +70,7 @@ class SplitOp : public framework::OperatorWithKernel {
 class SplitOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  SplitOpMaker(OpProto *proto, OpAttrChecker *op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput("X", "(Tensor) Input tensor of the split operator.");
     AddOutput("Out", "(Tensor) Output tensors of the split operator.")
         .AsDuplicable();
......
@@ -19,8 +19,7 @@ namespace operators {
 class SplitSelectedRowsOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  SplitSelectedRowsOpMaker(OpProto *proto, OpAttrChecker *op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput("X", "The input SelectedRows.");
     AddOutput("Out", "The outputs of the input SelectedRows.").AsDuplicable();
     AddAttr<std::vector<int>>("height_sections",
......
@@ -20,8 +20,7 @@ namespace operators {
 class SppOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  SppOpMaker(OpProto* proto, OpAttrChecker* op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput(
         "X",
         "(Tensor) The input tensor of spp operator. "
......
@@ -56,8 +56,7 @@ class SquaredL2DistanceOp : public framework::OperatorWithKernel {
 class SquaredL2DistanceOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  SquaredL2DistanceOpMaker(OpProto* proto, OpAttrChecker* op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput("X", "(Tensor) Input of SquaredL2DistanceOp.");
     AddInput("Y", "(Tensor) Target of SquaredL2DistanceOp.");
     AddOutput("sub_result",
......
@@ -48,8 +48,7 @@ class SquaredL2NormGradOp : public framework::OperatorWithKernel {
 class SquaredL2NormOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  SquaredL2NormOpMaker(OpProto* proto, OpAttrChecker* op_checker)
-      : framework::OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput("X", "(Tensor) The input of squared_l2_norm op.");
     AddOutput("Out", "(Scalar) The output of squared_l2_norm op.");
     AddComment(R"DOC(
......
@@ -112,8 +112,7 @@ class SumOp : public framework::OperatorWithKernel {
 class SumOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  SumOpMaker(OpProto* proto, OpAttrChecker* op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput("X", "(vector<Tensor>) The input tensors of sum operator.")
         .AsDuplicable();
     AddOutput("Out", "(Tensor) The output tensor of sum operator.");
......
@@ -65,8 +65,7 @@ class TargetAssignOp : public framework::OperatorWithKernel {
 class TargetAssignOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  TargetAssignOpMaker(OpProto* proto, OpAttrChecker* op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput("X",
              "(LoDTensor), This input is a 3D LoDTensor with shape [M, P, K]. "
              "Some elements in X will be assigned to Out based on the "
......
@@ -57,8 +57,7 @@ class WriteToArrayOp : public ArrayOp {
 class WriteToArrayOpProtoMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  WriteToArrayOpProtoMaker(OpProto *proto, OpAttrChecker *op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput("X", "(LoDTensor) the tensor will be written to tensor array");
     AddInput(
         "I",
@@ -148,8 +147,7 @@ class ReadFromArrayOp : public ArrayOp {
 class ReadFromArrayProtoMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  ReadFromArrayProtoMaker(OpProto *proto, OpAttrChecker *op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput("X", "(TensorArray) the array will be read from.");
     AddInput("I",
              "(Tensor) the subscript index in tensor array. The number of "
......
@@ -48,8 +48,7 @@ class TopkOp : public framework::OperatorWithKernel {
 class TopkOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  TopkOpMaker(OpProto *proto, OpAttrChecker *op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput("X", "(Tensor) The input of Topk op");
     AddOutput("Out", "(Tensor) The output tensor of Topk op");
     AddOutput("Indices", "(Tensor) The indices of Topk elements of input");
......
@@ -56,8 +56,7 @@ class TransposeOp : public framework::OperatorWithKernel {
 class TransposeOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  TransposeOpMaker(OpProto* proto, OpAttrChecker* op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput(
         "X",
         "(Tensor) The input tensor, tensors with rank up to 6 are supported.");
......
@@ -32,9 +32,8 @@ class UniformRandomBatchSizeLikeOp : public BatchSizeLikeOp {
 };
 class UniformRandomBatchSizeLikeOpMaker : public BatchSizeLikeOpMaker {
- public:
-  UniformRandomBatchSizeLikeOpMaker(OpProto *proto, OpAttrChecker *op_checker)
-      : BatchSizeLikeOpMaker(proto, op_checker) {
+ protected:
+  void Apply() override {
     AddComment(R"DOC(
 Uniform random operator
......
@@ -85,8 +85,7 @@ class UniformRandomOp : public framework::OperatorWithKernel {
 class UniformRandomOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  UniformRandomOpMaker(OpProto* proto, OpAttrChecker* op_checker)
-      : framework::OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddOutput("Out", "(Tensor) The output tensor of uniform random op");
     AddComment(R"DOC(
 Uniform random operator.
......
@@ -20,8 +20,7 @@ namespace operators {
 class Unpool2dOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  Unpool2dOpMaker(OpProto* proto, OpAttrChecker* op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput(
         "X",
         "(Tensor) The input tensor of unpool operator. "
......
@@ -53,8 +53,7 @@ class WarpCTCOp : public framework::OperatorWithKernel {
 class WarpCTCOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  WarpCTCOpMaker(OpProto* proto, OpAttrChecker* op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput("Logits",
              "(LodTensor, default: LoDTensor<float>), the unscaled "
              "probabilities of variable-length sequences, which is a 2-D "
......
@@ -68,8 +68,7 @@ class WhileOp : public framework::OperatorBase {
 class WhileOpMaker : public framework::OpProtoAndCheckerMaker {
  public:
-  WhileOpMaker(OpProto *proto, OpAttrChecker *op_checker)
-      : OpProtoAndCheckerMaker(proto, op_checker) {
+  void Make() override {
     AddInput(kX,
              "A set of variables, which are required by operators inside the "
              "block of While Op.")
......
@@ -398,7 +398,7 @@ function gen_dockerfile() {
     cat <<EOF
     ========================================
-    Generate /paddle/build/Dockerfile ...
+    Generate ${PADDLE_ROOT}/build/Dockerfile ...
     ========================================
 EOF
@@ -422,7 +422,7 @@ EOF
         CMD='"true"'
     fi
-    cat >> /paddle/build/Dockerfile <<EOF
+    cat >> ${PADDLE_ROOT}/build/Dockerfile <<EOF
     ADD python/dist/*.whl /
     # run paddle version to install python packages first
     RUN apt-get update &&\
@@ -436,8 +436,14 @@ EOF
     ${DOCKERFILE_CUDNN_DSO}
     ${DOCKERFILE_GPU_ENV}
     ENV NCCL_LAUNCH_MODE PARALLEL
-    ADD go/cmd/pserver/pserver /usr/bin/
-    ADD go/cmd/master/master /usr/bin/
+EOF
+    if [[ ${WITH_GOLANG:-OFF} == "ON" ]]; then
+        cat >> ${PADDLE_ROOT}/build/Dockerfile <<EOF
+        ADD go/cmd/pserver/pserver /usr/bin/
+        ADD go/cmd/master/master /usr/bin/
+EOF
+    fi
+    cat >> ${PADDLE_ROOT}/build/Dockerfile <<EOF
     # default command shows the paddle version and exit
     CMD [${CMD}]
 EOF
@@ -467,6 +473,7 @@ EOF
 }
 function main() {
+    set -e
     local CMD=$1
     init
     case $CMD in
......
@@ -32,7 +32,7 @@ function start_build_docker() {
     DOCKER_ENV=$(cat <<EOL
         -e FLAGS_fraction_of_gpu_memory_to_use=0.15 \
         -e CTEST_OUTPUT_ON_FAILURE=1 \
-        -e CTEST_PARALLEL_LEVEL=5 \
+        -e CTEST_PARALLEL_LEVEL=1 \
         -e APT_MIRROR=${apt_mirror} \
         -e WITH_GPU=ON \
         -e CUDA_ARCH_NAME=Auto \
@@ -59,7 +59,7 @@ EOL
     if [ ! -d "${HOME}/.ccache" ]; then
         mkdir ${HOME}/.ccache
     fi
-    set -x
+    set -ex
     ${DOCKER_CMD} run -it \
         --name $CONTAINER_ID \
         ${DOCKER_ENV} \
......
@@ -299,14 +299,18 @@ class Executor(object):
         if feed is None:
             feed = {}
         if not isinstance(feed, dict):
-            raise TypeError("feed should be a map")
+            raise TypeError(
+                "feed requires dict as its Parameter. But you passed in %s" %
+                (type(feed)))
         if fetch_list is None:
             fetch_list = []
         if program is None:
             program = default_main_program()
         if not isinstance(program, Program):
-            raise TypeError()
+            raise TypeError(
+                "Executor requires Program as its Parameter. But you passed in %s"
+                % (type(program)))
         if scope is None:
             scope = global_scope()
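The hunk above replaces bare `TypeError`s with messages that name the offending type. A minimal, self-contained sketch of that validation pattern follows; `check_run_args` and the empty `Program` class are hypothetical stand-ins for illustration, not the real `Executor.run` or fluid `Program`.

```python
class Program(object):
    """Hypothetical stand-in for fluid's Program class."""


def check_run_args(feed=None, program=None):
    """Validate arguments the way the hunk does, reporting the bad type."""
    if feed is None:
        feed = {}
    if not isinstance(feed, dict):
        raise TypeError(
            "feed requires dict as its Parameter. But you passed in %s" %
            (type(feed)))
    if program is None:
        # Assumption: the real code falls back to default_main_program().
        program = Program()
    if not isinstance(program, Program):
        raise TypeError(
            "Executor requires Program as its Parameter. But you passed in %s"
            % (type(program)))
    return feed, program
```

Passing a list as `feed` then raises a `TypeError` whose text includes the list type, which is easier to debug than the former empty exception.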
......
@@ -160,6 +160,7 @@ class Variable(object):
                  persistable=None,
                  error_clip=None,
                  stop_gradient=False,
+                 is_data=False,
                  **kwargs):
         self.block = block
         self.error_clip = error_clip
@@ -238,6 +239,7 @@ class Variable(object):
         self.block.vars[name] = self
         self.op = None
         self.stop_gradient = stop_gradient
+        self.is_data = is_data
     def __str__(self):
         return self.to_string(True)
...@@ -475,7 +477,7 @@ class Operator(object): ...@@ -475,7 +477,7 @@ class Operator(object):
if isinstance(attrs[attr_name], Block): if isinstance(attrs[attr_name], Block):
self.desc.set_block_attr(attr_name, attrs[attr_name].desc) self.desc.set_block_attr(attr_name, attrs[attr_name].desc)
elif isinstance(attrs[attr_name], core.BlockDesc) or \ elif isinstance(attrs[attr_name], core.BlockDesc) or \
isinstance(attrs[attr_name], core.ProgramDesc): isinstance(attrs[attr_name], core.ProgramDesc):
self.desc.set_serialized_attr( self.desc.set_serialized_attr(
attr_name, attrs[attr_name].serialize_to_string()) attr_name, attrs[attr_name].serialize_to_string())
else: else:
@@ -978,7 +980,8 @@ class Block(object):
                     shape=var.shape,
                     dtype=var.dtype,
                     type=var.type,
-                    persistable=True)
+                    persistable=True,
+                    is_data=var.is_data)
             else:
                 ret_var = self.create_var(
                     name=var.name,
@@ -986,7 +989,8 @@ class Block(object):
                     dtype=var.dtype,
                     type=var.type,
                     lod_level=var.lod_level,
-                    persistable=True)
+                    persistable=True,
+                    is_data=var.is_data)
             return ret_var
@@ -1051,6 +1055,7 @@ class Program(object):
         p.sync_with_cpp()
         p.copy_param_info_from(self)
+        p.copy_data_info_from(self)
         return p
     def prune(self, targets):
@@ -1172,6 +1177,26 @@ class Program(object):
                              "program, with represent the same topology")
         self.global_block().copy_param_info_from(other.global_block())
+    def copy_data_info_from(self, other):
+        """
+        Copy the information of data variables from other program.
+        Args:
+            other(Program): Other program
+        Returns:
+            None
+        """
+        if not isinstance(other, Program):
+            raise TypeError("copy_param_info_from should be invoked with "
+                            "Program")
+        if len(self.blocks) != len(other.blocks):
+            raise ValueError("copy_param_info_from should be invoked with two "
+                             "program, with represent the same topology")
+        for var in other.global_block().vars.itervalues():
+            if var.is_data:
+                self.global_block().var(var.name).is_data = True
     def list_vars(self):
         for each_block in self.blocks:
             for each_var in each_block.vars.itervalues():
......
...@@ -78,8 +78,8 @@ def data(name, ...@@ -78,8 +78,8 @@ def data(name,
dtype=dtype, dtype=dtype,
type=type, type=type,
stop_gradient=stop_gradient, stop_gradient=stop_gradient,
lod_level=lod_level) lod_level=lod_level,
data_var.is_data = True is_data=True)
return data_var return data_var
......
...@@ -113,7 +113,7 @@ def generate_layer_fn(op_type): ...@@ -113,7 +113,7 @@ def generate_layer_fn(op_type):
if len(not_intermediate_outputs) != 1: if len(not_intermediate_outputs) != 1:
raise ValueError("Only one non intermediate output operator can be", raise ValueError("Only one non intermediate output operator can be",
"automatically generated.") "automatically generated. {0}".format(op_type))
if not_intermediate_outputs[0].duplicable: if not_intermediate_outputs[0].duplicable:
raise ValueError( raise ValueError(
......
...@@ -47,6 +47,8 @@ class Optimizer(object): ...@@ -47,6 +47,8 @@ class Optimizer(object):
raise TypeError("learning rate should be float or Variable") raise TypeError("learning rate should be float or Variable")
self.regularization = regularization self.regularization = regularization
self._learning_rate = learning_rate self._learning_rate = learning_rate
# the learning rate type should be inferred from loss
self._dtype = None
# each program should have a independent learning rate # each program should have a independent learning rate
# program -> Variable(learning_rate) # program -> Variable(learning_rate)
self._learning_rate_map = dict() self._learning_rate_map = dict()
...@@ -77,7 +79,7 @@ class Optimizer(object): ...@@ -77,7 +79,7 @@ class Optimizer(object):
name=unique_name.generate("learning_rate"), name=unique_name.generate("learning_rate"),
shape=[1], shape=[1],
value=float(self._learning_rate), value=float(self._learning_rate),
dtype='float32', dtype='float32' if self._dtype == None else self._dtype,
persistable=True) persistable=True)
def global_learning_rate(self, program=None): def global_learning_rate(self, program=None):
...@@ -200,6 +202,7 @@ class Optimizer(object): ...@@ -200,6 +202,7 @@ class Optimizer(object):
# Create any accumulators # Create any accumulators
program = loss.block.program program = loss.block.program
self._dtype = loss.dtype
with program_guard(program, startup_program): with program_guard(program, startup_program):
global_block = framework.default_main_program().global_block() global_block = framework.default_main_program().global_block()
start = len(global_block.ops) start = len(global_block.ops)
...@@ -391,7 +394,7 @@ class AdamOptimizer(Optimizer): ...@@ -391,7 +394,7 @@ class AdamOptimizer(Optimizer):
beta_shape = [1] beta_shape = [1]
self._beta1_pow_acc = self.helper.create_global_variable( self._beta1_pow_acc = self.helper.create_global_variable(
name=unique_name.generate('beta1_pow_acc'), name=unique_name.generate('beta1_pow_acc'),
dtype='float32', dtype='float32' if self._dtype == None else self._dtype,
shape=beta_shape, shape=beta_shape,
lod_level=0, lod_level=0,
persistable=True) persistable=True)
...@@ -400,7 +403,7 @@ class AdamOptimizer(Optimizer): ...@@ -400,7 +403,7 @@ class AdamOptimizer(Optimizer):
self._beta2_pow_acc = self.helper.create_global_variable( self._beta2_pow_acc = self.helper.create_global_variable(
name=unique_name.generate('beta2_pow_acc'), name=unique_name.generate('beta2_pow_acc'),
dtype='float32', dtype='float32' if self._dtype == None else self._dtype,
shape=beta_shape, shape=beta_shape,
lod_level=0, lod_level=0,
persistable=True) persistable=True)
...@@ -493,7 +496,7 @@ class AdamaxOptimizer(Optimizer): ...@@ -493,7 +496,7 @@ class AdamaxOptimizer(Optimizer):
beta_shape = [1] beta_shape = [1]
self._beta1_pow_acc = self.helper.create_global_variable( self._beta1_pow_acc = self.helper.create_global_variable(
name=unique_name.generate('beta1_pow_acc'), name=unique_name.generate('beta1_pow_acc'),
dtype='float32', dtype='float32' if self._dtype == None else self._dtype,
shape=beta_shape, shape=beta_shape,
lod_level=0, lod_level=0,
persistable=True) persistable=True)
...@@ -900,8 +903,10 @@ class ModelAverage(Optimizer): ...@@ -900,8 +903,10 @@ class ModelAverage(Optimizer):
# param = (sum_1 + sum_2 + sum_3) / (num_accumulates + old_num_accumulates) # param = (sum_1 + sum_2 + sum_3) / (num_accumulates + old_num_accumulates)
tmp = layers.sum(x=[num_accumulates, old_num_accumulates]) tmp = layers.sum(x=[num_accumulates, old_num_accumulates])
sum = layers.sum(x=[sum_1, sum_2, sum_3]) sum = layers.sum(x=[sum_1, sum_2, sum_3])
tmp = layers.cast(x=tmp, dtype='float32') tmp = layers.cast(
sum = layers.cast(x=sum, dtype='float32') x=tmp, dtype='float32' if self._dtype == None else self._dtype)
sum = layers.cast(
x=sum, dtype='float32' if self._dtype == None else self._dtype)
layers.elementwise_div(x=sum, y=tmp, out=param) layers.elementwise_div(x=sum, y=tmp, out=param)
def _add_average_restore_op(self, block, param_grad): def _add_average_restore_op(self, block, param_grad):
......
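The optimizer changes above share one pattern: `_dtype` starts as `None`, `minimize` records `loss.dtype`, and every variable created afterwards falls back to `'float32'` only when no loss has been seen. A minimal sketch of that pattern follows; `SketchOptimizer` and `_resolved_dtype` are hypothetical names used for illustration and are not part of Paddle.

```python
class SketchOptimizer(object):
    """Hypothetical stand-in for the Optimizer pattern in the diff above."""

    def __init__(self, learning_rate):
        self._learning_rate = learning_rate
        # Unknown until minimize() has seen the loss.
        self._dtype = None

    def _resolved_dtype(self):
        # Fall back to float32 when no loss dtype has been recorded yet.
        return 'float32' if self._dtype is None else self._dtype

    def minimize(self, loss_dtype):
        # The diff reads loss.dtype; this sketch takes the dtype directly.
        self._dtype = loss_dtype
        return self._resolved_dtype()


opt = SketchOptimizer(learning_rate=0.001)
before = opt._resolved_dtype()   # no loss seen yet
after = opt.minimize('float64')  # dtype now follows the loss
```

The sketch uses `is None` rather than `== None`; both behave the same for the plain string dtypes used here.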
...@@ -80,8 +80,11 @@ def inference_program(is_sparse): ...@@ -80,8 +80,11 @@ def inference_program(is_sparse):
def train_program(is_sparse): def train_program(is_sparse):
next_word = fluid.layers.data(name='nextw', shape=[1], dtype='int64') # The declaration of 'next_word' must be after the invoking of inference_program,
# or the data input order of train program would be [next_word, firstw, secondw,
# thirdw, forthw], which is not correct.
predict_word = inference_program(is_sparse) predict_word = inference_program(is_sparse)
next_word = fluid.layers.data(name='nextw', shape=[1], dtype='int64')
cost = fluid.layers.cross_entropy(input=predict_word, label=next_word) cost = fluid.layers.cross_entropy(input=predict_word, label=next_word)
avg_cost = fluid.layers.mean(cost) avg_cost = fluid.layers.mean(cost)
return avg_cost return avg_cost
...@@ -90,14 +93,17 @@ def train_program(is_sparse): ...@@ -90,14 +93,17 @@ def train_program(is_sparse):
def train(use_cuda, is_sparse, save_path): def train(use_cuda, is_sparse, save_path):
train_reader = paddle.batch( train_reader = paddle.batch(
paddle.dataset.imikolov.train(word_dict, N), BATCH_SIZE) paddle.dataset.imikolov.train(word_dict, N), BATCH_SIZE)
test_reader = paddle.batch(
paddle.dataset.imikolov.test(word_dict, N), BATCH_SIZE)
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace() place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
def event_handler(event): def event_handler(event):
print type(event) # print type(event)
if isinstance(event, fluid.EndEpochEvent): if isinstance(event, fluid.EndEpochEvent):
avg_cost = trainer.test(reader=paddle.dataset.imikolov.test( outs = trainer.test(reader=test_reader)
word_dict, N)) avg_cost = outs[0]
print("loss= ", avg_cost)
if avg_cost < 5.0: if avg_cost < 5.0:
trainer.save_params(save_path) trainer.save_params(save_path)
......
...@@ -36,7 +36,7 @@ depth = 8 ...@@ -36,7 +36,7 @@ depth = 8
mix_hidden_lr = 1e-3 mix_hidden_lr = 1e-3
IS_SPARSE = True IS_SPARSE = True
PASS_NUM = 100 PASS_NUM = 10
BATCH_SIZE = 10 BATCH_SIZE = 10
embedding_name = 'emb' embedding_name = 'emb'
......
...@@ -111,21 +111,24 @@ class Generator(object): ...@@ -111,21 +111,24 @@ class Generator(object):
# Generate test cases for all possibilities # Generate test cases for all possibilities
for dim_X in [1, 2, 3]: def inject_test(dim_x, dim_y, trans_x, trans_y):
for dim_Y in [1, 2, 3]: test_name = ('TestMatMulOp_dimX_{}_dim_Y_{}_transX_{}_transY_{}'.format(
for transpose_X in [False, True]: dim_x, dim_y, trans_x, trans_y))
for transpose_Y in [False, True]: shape_x, shape_y = generate_compatible_shapes(dim_x, dim_y, trans_x,
test_name = ( trans_y)
'TestMatMulOp_dimX_{}_dim_Y_{}_transX_{}_transY_{}'.format( globals()[test_name] = type(test_name, (Generator, OpTest), {
dim_X, dim_Y, transpose_X, transpose_Y)) 'shape_X': shape_x,
shape_X, shape_Y = generate_compatible_shapes( 'shape_Y': shape_y,
dim_X, dim_Y, transpose_X, transpose_Y) 'transpose_X': trans_x,
globals()[test_name] = type(test_name, (Generator, OpTest), { 'transpose_Y': trans_y,
'shape_X': shape_X, })
'shape_Y': shape_Y,
'transpose_X': transpose_X,
'transpose_Y': transpose_Y, for dim_X in (1, 2, 3):
}) for dim_Y in (1, 2, 3):
for transpose_x in (False, True):
for transpose_y in (False, True):
inject_test(dim_X, dim_Y, transpose_x, transpose_y)
# Test case n-dim # Test case n-dim
...@@ -149,7 +152,7 @@ def generate_compatible_shapes(dim, transpose_X, transpose_Y): ...@@ -149,7 +152,7 @@ def generate_compatible_shapes(dim, transpose_X, transpose_Y):
return shape_X, shape_Y return shape_X, shape_Y
# Test case n-dim # # Test case n-dim
for dim in [4]: for dim in [4]:
for transpose_X in [False, True]: for transpose_X in [False, True]:
for transpose_Y in [False, True]: for transpose_Y in [False, True]:
......
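The `inject_test` helper above builds one test class per parameter combination with `type(...)` and publishes it under a generated name. The sketch below shows the same pattern in isolation. `ShapeCheck`, the shape rule, and the explicit `namespace` argument are assumptions made for illustration; the diff itself writes into `globals()` and mixes `Generator` with `OpTest`.

```python
import unittest


class ShapeCheck(object):
    """Hypothetical mixin holding the shared test body."""

    def test_rank(self):
        self.assertEqual(len(self.shape_x), self.dim_x)


def inject_test(namespace, dim_x, trans_x):
    test_name = 'TestShape_dimX_{}_transX_{}'.format(dim_x, trans_x)
    # Temporaries such as shape_x stay local to this function; only the
    # generated class is published into the namespace.
    shape_x = tuple(range(2, 2 + dim_x))
    namespace[test_name] = type(test_name, (ShapeCheck, unittest.TestCase), {
        'shape_x': shape_x,
        'dim_x': dim_x,
        'trans_x': trans_x,
    })


generated = {}
for dim_x in (1, 2, 3):
    for trans_x in (False, True):
        inject_test(generated, dim_x, trans_x)
```

Passing `globals()` as `namespace` would make the generated classes discoverable by `unittest.main()`, which is how the diff uses them.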
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import unittest
import numpy as np
import paddle
import paddle.fluid as fluid
import paddle.fluid.core as core
from paddle.fluid.executor import Executor
BATCH_SIZE = 20
class TestNetWithDtype(unittest.TestCase):
def setUp(self):
self.dtype = "float64"
self.init_dtype()
self.x = fluid.layers.data(name='x', shape=[13], dtype=self.dtype)
self.y = fluid.layers.data(name='y', shape=[1], dtype=self.dtype)
y_predict = fluid.layers.fc(input=self.x, size=1, act=None)
cost = fluid.layers.square_error_cost(input=y_predict, label=self.y)
avg_cost = fluid.layers.mean(cost)
self.fetch_list = [avg_cost]
sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.001)
sgd_optimizer.minimize(avg_cost)
def run_net_on_place(self, place):
train_reader = paddle.batch(
paddle.dataset.uci_housing.train(), batch_size=BATCH_SIZE)
feeder = fluid.DataFeeder(place=place, feed_list=[self.x, self.y])
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
for data in train_reader():
exe.run(fluid.default_main_program(),
feed=feeder.feed(data),
fetch_list=self.fetch_list)
# the main program is runnable, so the datatype is fully supported
break
def init_dtype(self):
pass
def test_cpu(self):
place = fluid.CPUPlace()
self.run_net_on_place(place)
def test_gpu(self):
if not core.is_compiled_with_cuda():
return
place = fluid.CUDAPlace(0)
self.run_net_on_place(place)
# TODO(dzhwinter): make sure the fp16 is runnable
# class TestFloat16(TestNetWithDtype):
# def init_dtype(self):
# self.dtype = "float16"
if __name__ == '__main__':
unittest.main()
...@@ -75,11 +75,15 @@ class Trainer(object): ...@@ -75,11 +75,15 @@ class Trainer(object):
self.train_program = framework.Program() self.train_program = framework.Program()
with framework.program_guard(self.train_program, self.startup_program): with framework.program_guard(self.train_program, self.startup_program):
loss = program_func() program_func_outs = program_func()
self.test_outputs = program_func_outs if isinstance(
program_func_outs, list) else [program_func_outs]
self.test_program = self.train_program.clone()
if not isinstance(optimizer, opt_module.Optimizer): if not isinstance(optimizer, opt_module.Optimizer):
raise TypeError( raise TypeError(
"The optimizer should be an instance of Optimizer") "The optimizer should be an instance of Optimizer")
# The first element of program_func_outs is loss.
loss = self.test_outputs[0]
optimize_ops, params_grads = optimizer.minimize(loss) optimize_ops, params_grads = optimizer.minimize(loss)
self.place = Trainer._check_and_get_place(place) self.place = Trainer._check_and_get_place(place)
...@@ -168,8 +172,17 @@ class Trainer(object): ...@@ -168,8 +172,17 @@ class Trainer(object):
self._train_by_executor(num_epochs, event_handler, reader, feed_order) self._train_by_executor(num_epochs, event_handler, reader, feed_order)
def test(self, reader): def test(self, reader, feed_order=None):
pass """
Test the model on the given test data.
Args:
reader: The reader that yields test data.
feed_order: Feeding order of the reader. If None, the order in which
the data variables are defined in the program is used.
"""
return self._test_by_executor(reader, feed_order, self.test_outputs)
def save_params(self, param_path): def save_params(self, param_path):
# reference: save_persistables in io.py # reference: save_persistables in io.py
...@@ -225,22 +238,10 @@ class Trainer(object): ...@@ -225,22 +238,10 @@ class Trainer(object):
""" """
with self._prog_and_scope_guard(): with self._prog_and_scope_guard():
exe = executor.Executor(self.place) feed_var_list = build_feed_var_list(self.train_program, feed_order)
if feed_order is None:
feed_var_list = [
var
for var in self.train_program.global_block(
).vars.itervalues()
if hasattr(var, 'is_data') and var.is_data
]
else:
feed_var_list = [
self.train_program.global_block().var(var_name)
for var_name in feed_order
]
feeder = data_feeder.DataFeeder( feeder = data_feeder.DataFeeder(
feed_list=feed_var_list, place=self.place) feed_list=feed_var_list, place=self.place)
exe = executor.Executor(self.place)
for epoch_id in range(num_epochs): for epoch_id in range(num_epochs):
event_handler(BeginEpochEvent(epoch_id)) event_handler(BeginEpochEvent(epoch_id))
for step_id, data in enumerate(reader()): for step_id, data in enumerate(reader()):
...@@ -248,3 +249,48 @@ class Trainer(object): ...@@ -248,3 +249,48 @@ class Trainer(object):
exe.run(feed=feeder.feed(data), fetch_list=[]) exe.run(feed=feeder.feed(data), fetch_list=[])
event_handler(EndStepEvent(epoch_id, step_id)) event_handler(EndStepEvent(epoch_id, step_id))
event_handler(EndEpochEvent(epoch_id)) event_handler(EndEpochEvent(epoch_id))
def _test_by_executor(self, reader, feed_order, fetch_list):
with executor.scope_guard(self.scope):
feed_var_list = build_feed_var_list(self.test_program, feed_order)
feeder = data_feeder.DataFeeder(
feed_list=feed_var_list, place=self.place)
exe = executor.Executor(self.place)
accumulated = len(fetch_list) * [0]
count = 0
for data in reader():
outs = exe.run(program=self.test_program,
feed=feeder.feed(data),
fetch_list=fetch_list)
accumulated = [x[0] + x[1][0] for x in zip(accumulated, outs)]
count += 1
return [x / count for x in accumulated]
def build_feed_var_list(program, feed_order):
if not isinstance(program, framework.Program):
raise TypeError("The 'program' should be an object of Program")
if feed_order is None:
feed_var_list = [
var for var in program.global_block().vars.itervalues()
if var.is_data
]
elif isinstance(feed_order, list):
feed_var_list = [
program.global_block().var(var_name) for var_name in feed_order
]
else:
if not isinstance(feed_order, dict):
raise TypeError(
"The 'feed_order' should be either None, list or dict.")
if not sorted(feed_order.values()) == range(len(feed_order)):
raise ValueError(
"The values of 'feed_order' should be a permutation of [0, len(feed_order))"
)
sorted_pair_list = sorted(feed_order.items(), key=lambda item: item[1])
feed_var_list = [
program.global_block().var(pair[0]) for pair in sorted_pair_list
]
return feed_var_list
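`build_feed_var_list` above accepts `feed_order` as `None`, a list of names, or a dict mapping each name to its position. The sketch below reproduces only the ordering logic on plain strings. `resolve_feed_order` and `data_var_names` are hypothetical, and the sketch does not look variables up in a `Program` as the real function does.

```python
def resolve_feed_order(data_var_names, feed_order=None):
    """Return variable names in feeding order.

    data_var_names stands in for the program's data variables,
    listed in definition order.
    """
    if feed_order is None:
        return list(data_var_names)
    if isinstance(feed_order, list):
        return list(feed_order)
    if not isinstance(feed_order, dict):
        raise TypeError("feed_order should be either None, list or dict.")
    # list(range(...)) keeps the comparison meaningful on Python 3 as well.
    if sorted(feed_order.values()) != list(range(len(feed_order))):
        raise ValueError(
            "The values of feed_order should be a permutation of "
            "[0, len(feed_order))")
    ordered = sorted(feed_order.items(), key=lambda item: item[1])
    return [name for name, _ in ordered]


default_order = resolve_feed_order(['firstw', 'nextw'])
dict_order = resolve_feed_order(['firstw', 'nextw'], {'nextw': 0, 'firstw': 1})
```

The dict form is checked more strictly than the list form: its values must cover every index from 0 to `len(feed_order) - 1` exactly once.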
...@@ -18,7 +18,9 @@ import math ...@@ -18,7 +18,9 @@ import math
import distributed_splitter as splitter import distributed_splitter as splitter
from .. import core from .. import core
from ..framework import Program, default_main_program, Variable, Parameter from ..framework import Program, default_main_program, \
default_startup_program, \
Variable, Parameter, grad_var_name
LOOKUP_TABLE_TYPE = "lookup_table" LOOKUP_TABLE_TYPE = "lookup_table"
LOOKUP_TABLE_GRAD_TYPE = "lookup_table_grad" LOOKUP_TABLE_GRAD_TYPE = "lookup_table_grad"
...@@ -153,43 +155,43 @@ class DistributeTranspiler: ...@@ -153,43 +155,43 @@ class DistributeTranspiler:
split_method=splitter.round_robin, split_method=splitter.round_robin,
sync_mode=True): sync_mode=True):
""" """
Transpile the program to distributed data-parallelism programs. Transpile the program to distributed data-parallelism programs.
The main_program will be transformed to use a remote parameter server The main_program will be transformed to use a remote parameter server
to do parameter optimization. And the optimization graph will be put to do parameter optimization. And the optimization graph will be put
into a parameter server program. into a parameter server program.
Use different methods to split trainable variables to different Use different methods to split trainable variables to different
parameter servers. parameter servers.
Steps to transpile trainer: Steps to transpile trainer:
1. split variable to multiple blocks, aligned by product(dim[1:]) (width). 1. split variable to multiple blocks, aligned by product(dim[1:]) (width).
2. rename splited grad variables to add trainer_id suffix ".trainer_%d". 2. rename splited grad variables to add trainer_id suffix ".trainer_%d".
3. modify trainer program add split_op to each grad variable. 3. modify trainer program add split_op to each grad variable.
4. append send_op to send splited variables to server and fetch 4. append send_op to send splited variables to server and fetch
params(splited blocks or origin param) from server. params(splited blocks or origin param) from server.
5. append concat_op to merge splited blocks to update local weights. 5. append concat_op to merge splited blocks to update local weights.
Steps to transpile pserver: Steps to transpile pserver:
1. create new program for parameter server. 1. create new program for parameter server.
2. create params and grad variables that assigned to current server instance. 2. create params and grad variables that assigned to current server instance.
3. create a sub-block in the server side program 3. create a sub-block in the server side program
4. append ops that should run on current server instance. 4. append ops that should run on current server instance.
5. add listen_and_serv op 5. add listen_and_serv op
:param trainer_id: one unique id for each trainer in a job. :param trainer_id: one unique id for each trainer in a job.
:type trainer_id: int :type trainer_id: int
:param program: program to transpile, default is default_main_program :param program: program to transpile, default is default_main_program
:type program: Program :type program: Program
:param pservers: parameter server endpoints like "m1:6174,m2:6174" :param pservers: parameter server endpoints like "m1:6174,m2:6174"
:type pservers: string :type pservers: string
:param trainers: total number of workers/trainers in the job :param trainers: total number of workers/trainers in the job
:type trainers: int :type trainers: int
:param split_method: A function to determin how to split variables :param split_method: A function to determin how to split variables
to different servers equally. to different servers equally.
:type split_method: function :type split_method: function
:type sync_mode: boolean default True :type sync_mode: boolean default True
:param sync_mode: if sync_mode is set True, it means that dist transpiler :param sync_mode: if sync_mode is set True, it means that dist transpiler
will transpile the program into sync_mode pserver and trainer program. will transpile the program into sync_mode pserver and trainer program.
""" """
assert (callable(split_method)) assert (callable(split_method))
if program is None: if program is None:
...@@ -244,7 +246,7 @@ class DistributeTranspiler: ...@@ -244,7 +246,7 @@ class DistributeTranspiler:
] ]
grad_list = [ grad_list = [
grad for grad in grad_list grad for grad in grad_list
if grad.name != framework.grad_var_name(self.table_name) if grad.name != grad_var_name(self.table_name)
] ]
self.table_param_grad = [ self.table_param_grad = [
param_grad for param_grad in params_grads param_grad for param_grad in params_grads
...@@ -494,7 +496,7 @@ class DistributeTranspiler: ...@@ -494,7 +496,7 @@ class DistributeTranspiler:
were split to several blocks. were split to several blocks.
""" """
s_prog = Program() s_prog = Program()
orig_s_prog = framework.default_startup_program() orig_s_prog = default_startup_program()
params = self.param_grad_ep_mapping[endpoint]["params"] params = self.param_grad_ep_mapping[endpoint]["params"]
def _get_splited_name_and_shape(varname): def _get_splited_name_and_shape(varname):
...@@ -619,7 +621,7 @@ class DistributeTranspiler: ...@@ -619,7 +621,7 @@ class DistributeTranspiler:
# 2. add split_ids_op and send_vars_op to send gradient to pservers # 2. add split_ids_op and send_vars_op to send gradient to pservers
# there should only be one table_name # there should only be one table_name
all_ops = program.global_block().ops all_ops = program.global_block().ops
table_grad_name = framework.grad_var_name(self.table_name) table_grad_name = grad_var_name(self.table_name)
for op in all_ops: for op in all_ops:
if table_grad_name in op.output_arg_names: if table_grad_name in op.output_arg_names:
op_index = list(all_ops).index(op) op_index = list(all_ops).index(op)
...@@ -692,7 +694,7 @@ class DistributeTranspiler: ...@@ -692,7 +694,7 @@ class DistributeTranspiler:
persistable=True) persistable=True)
grad_var = _clone_var( grad_var = _clone_var(
pserver_program.global_block(), pserver_program.global_block(),
self.origin_program.global_block().vars[framework.grad_var_name( self.origin_program.global_block().vars[grad_var_name(
self.table_name)], self.table_name)],
persistable=False) persistable=False)
......
...@@ -20,6 +20,7 @@ import time ...@@ -20,6 +20,7 @@ import time
import threading import threading
import logging import logging
import copy import copy
import csv
import netaddr import netaddr
import boto3 import boto3
...@@ -136,6 +137,12 @@ parser.add_argument( ...@@ -136,6 +137,12 @@ parser.add_argument(
parser.add_argument( parser.add_argument(
'--master_server_ip', type=str, default="", help="master server private ip") '--master_server_ip', type=str, default="", help="master server private ip")
parser.add_argument(
'--metric_data_identifier',
type=str,
default="**metrics_data: ",
help="key string to identify metrics data")
parser.add_argument( parser.add_argument(
'--no_clean_up', '--no_clean_up',
type=str2bool, type=str2bool,
...@@ -155,6 +162,11 @@ logging.basicConfig( ...@@ -155,6 +162,11 @@ logging.basicConfig(
log_files = ["master.log"] log_files = ["master.log"]
metrics = {}
metrics_csv_file_name = "metrics.csv"
is_metrics_file_created = False
def create_subnet(): def create_subnet():
# if no vpc id provided, list vpcs # if no vpc id provided, list vpcs
...@@ -329,12 +341,42 @@ def create_pservers(): ...@@ -329,12 +341,42 @@ def create_pservers():
cleanup(args.task_name) cleanup(args.task_name)
def save_metrics_data(str_msg):
#parse msg
logging.info("found metrics data, saving it to csv file")
global is_metrics_file_created
metrics_raw = str_msg.split(",")
with open(args.log_path + metrics_csv_file_name, 'a') as csvfile:
csv_fieldnames = []
csv_write_data = {}
for metric in metrics_raw:
metric_data = metric.split("=")
metric_key = metric_data[0].strip()
metric_val = float(metric_data[1].strip())
if not metric_key in metrics:
metrics[metric_key] = []
metric_repo = metrics[metric_key]
metric_repo.append(metric_val)
csv_fieldnames.append(metric_key)
csv_write_data[metric_key] = metric_val
writer = csv.DictWriter(csvfile, fieldnames=csv_fieldnames)
if not is_metrics_file_created:
writer.writeheader()
is_metrics_file_created = True
writer.writerow(csv_write_data)
logging.info("csv file appended")
def log_to_file(source, filename): def log_to_file(source, filename):
if not filename in log_files: if not filename in log_files:
log_files.append(filename) log_files.append(filename)
with open(args.log_path + filename, "a") as log_file: with open(args.log_path + filename, "a") as log_file:
for line in iter(source.readline, ""): for line in iter(source.readline, ""):
log_file.write(line) log_file.write(line)
if (line.startswith(args.metric_data_identifier)):
#found key data, trying to add to csv
line = line.replace(args.metric_data_identifier, "")
save_metrics_data(line)
def parse_command(command_raw, defaults={}): def parse_command(command_raw, defaults={}):
......
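The metrics hook above watches each log line for the `--metric_data_identifier` prefix, splits the remainder into `key=value` pairs, and appends one CSV row per line. A self-contained sketch of that parsing step follows. The function names and the sample metric names `train_speed` and `loss` are made up for illustration, and an in-memory buffer replaces the log file.

```python
import csv
import io

# Default value of --metric_data_identifier in the diff above.
METRIC_PREFIX = "**metrics_data: "


def parse_metrics_line(line, prefix=METRIC_PREFIX):
    """Return (key, value) pairs in order, or None for a non-metrics line."""
    if not line.startswith(prefix):
        return None
    pairs = []
    for item in line[len(prefix):].split(","):
        key, value = item.split("=", 1)
        pairs.append((key.strip(), float(value.strip())))
    return pairs


def append_row(csvfile, pairs, write_header):
    """Write one CSV row, preceded by a header row when requested."""
    writer = csv.DictWriter(csvfile, fieldnames=[key for key, _ in pairs])
    if write_header:
        writer.writeheader()
    writer.writerow(dict(pairs))


buffer = io.StringIO()
pairs = parse_metrics_line("**metrics_data: train_speed=12.5, loss=0.75\n")
append_row(buffer, pairs, write_header=True)
```

As in the diff, the header is derived from the first metrics line, so this approach assumes later lines report the same keys in the same order.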