Commit 216443de authored by wanghaoshuang

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into fix_gru_unit

@@ -134,7 +134,7 @@
 **Unequal-length inputs** means that the multiple input sequences of recurrent_group may have sub-sequences of different lengths at each time step. When the sequence is output, however, it must be declared to share its sequence information with one of the inputs. Use \ :red:`targetInlink`\ to specify which input the output is consistent with; it defaults to the first input.
-The configurations for Example 3 are the \ `single-layer unequal-length RNN <https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/gserver/tests/sequence_rnn_multi_unequalength_inputs.conf>`_\ and the \ `hierarchical unequal-length RNN <https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/gserver/tests/sequence_nest_rnn_multi_unequalength_inputs.conf>`_\ .
+The configurations for Example 3 are the \ `single-layer unequal-length RNN <https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/gserver/tests/sequence_rnn_multi_unequalength_inputs.py>`_\ and the \ `hierarchical unequal-length RNN <https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/gserver/tests/sequence_nest_rnn_multi_unequalength_inputs.py>`_\ .
 For Example 3, the data for the single-layer RNN and the hierarchical RNN are exactly the same.
......
.. _algo_hrnn_rnn_api_compare:
################################################
API comparison between RNN and hierarchical RNN
################################################

This article takes PaddlePaddle's hierarchical RNN unit tests as an example. We use several examples to illustrate the usage of single-layer and hierarchical RNNs. Each example comes with two model configurations, one for a single-layer RNN and the other for a hierarchical RNN. Although the implementations differ, the two configurations are semantically equivalent. All of the examples in this article only describe the API of the hierarchical RNN; they do not apply it to a practical problem. If you want to see how hierarchical RNNs are used on concrete tasks, please refer to \ :ref:`algo_hrnn_demo`\ . The unit test file used by the examples in this article is \ `test_RecurrentGradientMachine.cpp <https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/gserver/tests/test_RecurrentGradientMachine.cpp>`_\ .
Example 1: Hierarchical RNN without Memory between subsequences
===============================================================
A classical use of the hierarchical RNN is to perform sequence operations on the inner-layer time series of each sample separately, where the inner-layer sequence operations are independent of each other, i.e., they do not need a Memory.

In this example, the network configurations of the single-layer RNN and the hierarchical RNN both use an LSTM as the encoder that compresses a word-segmented sentence into a vector. The difference is that the hierarchical RNN treats multiple sentences as a whole and encodes them simultaneously with the same encoder. The two configurations are semantically identical. This pair of semantically identical example configurations is as follows:
* Single-layer RNN\: `sequence_layer_group.conf <https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/gserver/tests/sequence_layer_group.conf>`_
* Hierarchical RNN\: `sequence_nest_layer_group.conf <https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/gserver/tests/sequence_nest_layer_group.conf>`_
Reading hierarchical sequence data
----------------------------------
First, the original data in this example is as follows:

- The original data has 10 samples. Each sample consists of two parts: a label (all 2 here) and a word-segmented sentence. This data is used by the single-layer RNN as well.
.. literalinclude:: ../../../../paddle/gserver/tests/Sequence/tour_train_wdseg
:language: text
- The data for the hierarchical RNN has 4 samples. The samples are separated by blank lines, while the content of the data is the same as the original data. For the hierarchical LSTM, the first sample encodes two sentences into two vectors simultaneously. The numbers of sentences dealt with simultaneously by these 4 samples are \ :code:`[2, 3, 2, 3]`\ .
.. literalinclude:: ../../../../paddle/gserver/tests/Sequence/tour_train_wdseg.nest
:language: text
Second, for these two different input data formats, the corresponding DataProviders are compared below (`sequenceGen.py <https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/gserver/tests/sequenceGen.py>`_):
.. literalinclude:: ../../../../paddle/gserver/tests/sequenceGen.py
:language: python
:lines: 21-39
:linenos:
- This is the DataProvider code for an ordinary single-layer time series. Its description is as follows:

  * The DataProvider returns two parts, "words" and "label", as in line 19 of the code above.

    - "words" is a list of word-table indices, one for each word of the sentence in the original data. Its data type is integer_value_sequence, that is, a list of integers; so "words" is a single-layer time series in the data.
    - "label" is the categorical label of each sentence, whose data type is integer_value.
.. literalinclude:: ../../../../paddle/gserver/tests/sequenceGen.py
:language: python
:lines: 42-71
:linenos:
- This is the DataProvider code for the same data organized as a hierarchical time series. Its description is as follows:

  - The DataProvider returns two lists, "sentences" and "labels", corresponding to the sentences and labels of each group in the original hierarchical data.
  - "sentences" comes from the hierarchical original data. It contains every sentence of each group, and each sentence is represented by a list of word-table indices, so its data type is integer_value_sub_sequence, i.e., a hierarchical time series.
  - "labels" holds the categorical label of each sentence, so it is a single-layer time series. A minimal sketch of both providers follows.
Model configuration
------------------------------------------
First, let's look at the configuration of the single-layer RNN. The highlighted part, lines 9 to 15, is the usage of the single-layer RNN. Here we use the RNN process function pre-defined in PaddlePaddle, in which the RNN passes each time step through an LSTM network.
.. literalinclude:: ../../../../paddle/gserver/tests/sequence_layer_group.conf
:language: python
:lines: 38-63
:linenos:
:emphasize-lines: 9-15
Second, let's look at the semantically equivalent hierarchical RNN configuration:

* Most layers in PaddlePaddle do not care whether their input is a time series or not, e.g. \ :code:`embedding_layer`\ . In these layers, every operation is processed on each time step.
* In the highlighted part, lines 7 to 26 of this configuration, we transform the hierarchical time series data into single-layer time series data, then process each single-layer time series.

  * The transformation is done with the function \ :code:`recurrent_group`\ . The input sequences must be passed in when transforming; since we want to transform a hierarchical time series into single-layer sequences, we need to label the input data as \ :code:`SubsequenceInput`\ .
  * In this example, we disassemble every group of the original data into sentences using \ :code:`recurrent_group`\ . Each of the disassembled sentences passes through an LSTM network, which is equivalent to the single-layer RNN configuration.

* Similar to the single-layer RNN configuration, we only use the last vector of the LSTM encoding. So we apply the \ :code:`last_seq`\ operation to the output of \ :code:`recurrent_group`\ . But unlike the single-layer RNN, we take the last element of every subsequence, so we need to set \ :code:`agg_level=AggregateLevel.TO_SEQUENCE`\ .
* At this point, \ :code:`lstm_last`\ has the same result as \ :code:`lstm_last`\ in the single-layer RNN configuration. A simplified sketch of this pattern follows the configuration below.
.. literalinclude:: ../../../../paddle/gserver/tests/sequence_nest_layer_group.conf
:language: python
:lines: 38-64
:linenos:
:emphasize-lines: 7-26
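The essence of the configuration above can be condensed into a short sketch. The layer names and sizes below are made up for illustration, and :code:`simple_lstm` stands in schematically for the LSTM group used in the real config; see the linked files for the actual code.

.. code-block:: python

    # A simplified sketch (made-up names/sizes) of the hierarchical pattern.
    from paddle.trainer_config_helpers import *

    emb = embedding_layer(input=data_layer(name="sentences", size=10000),
                          size=256)

    def inner_lstm(seq):
        # inner step over one sentence; schematic: the real configs use
        # lstmemory_group with a projected input here
        return simple_lstm(input=seq, size=128)

    # SubsequenceInput marks emb as hierarchical; recurrent_group feeds the
    # inner step one subsequence (sentence) at a time.
    lstm_out = recurrent_group(step=inner_lstm, input=SubsequenceInput(emb))

    # take the last vector of every subsequence, i.e. one vector per sentence
    lstm_last = last_seq(input=lstm_out, agg_level=AggregateLevel.TO_SEQUENCE)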
Example 2: Hierarchical RNN with Memory between subsequences
============================================================
This example implements two fully-equivalent fully-connected RNNs, one with a single-layer RNN and one with a hierarchical RNN.

* For the single-layer RNN, the input is a whole time series, e.g. \ :code:`[4, 5, 2, 0, 9, 8, 1, 4]`\ .
* For the hierarchical RNN, the input is a hierarchical time series whose elements are an arbitrary partition of the single-layer data, e.g. \ :code:`[[4, 5, 2], [0, 9], [8, 1, 4]]`\ .
Model configuration
-------------------

We pick out the parts that differ between the single-layer and hierarchical RNN configurations and analyze why they are semantically the same.

- Single-layer RNN: passes through a simple recurrent_group. At each time step, the current input y and the previous time step's output rnn_state pass through a fully-connected layer.
.. literalinclude:: ../../../../paddle/gserver/tests/sequence_rnn.conf
:language: python
:lines: 36-48
- Hierarchical RNN, in which the outer layer's memory is a single element:

  - The inner layer's inner_step recurrent_group is almost the same as in the single-layer configuration, except for boot_layer=outer_mem, which means the outer layer's outer_mem is used as the initial state of the inner layer's memory. In the outer layer's outer_step, outer_mem is the last vector of a subsequence; that is, the whole hierarchical group uses the last vector of the previous subsequence as the initial memory state of the next subsequence.
  - From the input data's point of view, the sentences of the single-layer and hierarchical RNNs are the same. The only difference is that the hierarchical RNN disassembles the sequence into subsequences. Therefore, in the hierarchical configuration we must use the last element of the previous subsequence as the boot_layer of the next subsequence's memory, so that the result is identical to "every time step uses the output of the last time step" in the single-layer RNN configuration. A sketch of this pattern is given after the warning below.
.. literalinclude:: ../../../../paddle/gserver/tests/sequence_nest_rnn.conf
:language: python
:lines: 39-66
.. warning::
Currently, PaddlePaddle only supports the case in which the Memory's time series have the same length at every time step.
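The following sketch, with made-up names and sizes (:code:`hidden_dim` and :code:`emb` are illustrative), condenses the nested memory pattern just described; it paraphrases sequence_nest_rnn.conf rather than reproducing it.

.. code-block:: python

    # A condensed sketch (made-up names/sizes) of Example 2's hierarchical RNN.
    hidden_dim = 128

    def outer_step(x):
        # outer memory: remembers the last vector of the previous subsequence
        outer_mem = memory(name="outer_rnn_state", size=hidden_dim)

        def inner_step(y):
            # the inner memory boots from outer_mem, so the first state of
            # this subsequence is the last state of the previous one
            inner_mem = memory(name="inner_rnn_state", size=hidden_dim,
                               boot_layer=outer_mem)
            return fc_layer(input=[y, inner_mem], size=hidden_dim,
                            act=TanhActivation(), name="inner_rnn_state")

        inner_rnn_output = recurrent_group(step=inner_step, input=x)
        # the layer named "outer_rnn_state" is what outer_mem remembers
        last_seq(input=inner_rnn_output, name="outer_rnn_state")
        return inner_rnn_output

    out = recurrent_group(step=outer_step, input=SubsequenceInput(emb))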
Example 3: Hierarchical RNN with unequal-length inputs
======================================================
.. role:: red

.. raw:: html

   <style> .red {color:red} </style>
**Unequal-length inputs** means that the multiple input sequences of a recurrent_group may have subsequences of different lengths at each time step. The output sequence, however, must share its sequence information with one of the inputs. \ :red:`targetInlink`\ specifies which input the output sequence is consistent with; by default it is the first input.

The configurations of Example 3 are \ `sequence_rnn_multi_unequalength_inputs <https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/gserver/tests/sequence_rnn_multi_unequalength_inputs.py>`_\ and \ `sequence_nest_rnn_multi_unequalength_inputs <https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/gserver/tests/sequence_nest_rnn_multi_unequalength_inputs.py>`_\ .

The data for Example 3's single-layer RNN and hierarchical RNN are exactly the same.

* For the single-layer RNN, the data has two samples: \ :code:`[1, 2, 4, 5, 2], [5, 4, 1, 3, 1]`\ and \ :code:`[0, 2, 2, 5, 0, 1, 2], [1, 5, 4, 2, 3, 6, 1]`\ . Each sample has two groups of features.
* On the basis of the single-layer data, the hierarchical RNN's data randomly adds some partitions. For example, the first sample is transformed to \ :code:`[[0, 2], [2, 5], [0, 1, 2]], [[1, 5], [4], [2, 3, 6, 1]]`\ .
* Note that PaddlePaddle currently only supports multi-input hierarchical RNNs whose inputs have the same number of subsequences. In this example, both features have 3 subsequences; the lengths of the subsequences may differ, but their number must be the same.
Model configuration
-------------------

Similar to Example 2, Example 3's configuration implements two fully-equivalent fully-connected RNNs with a single-layer RNN and a hierarchical RNN.

* Single-layer RNN\:
.. literalinclude:: ../../../../paddle/gserver/tests/sequence_rnn_multi_unequalength_inputs.py
:language: python
:lines: 42-59
:linenos:
* Hierarchical RNN\:
.. literalinclude:: ../../../../paddle/gserver/tests/sequence_nest_rnn_multi_unequalength_inputs.py
:language: python
:lines: 41-80
:linenos:
In the code above, the usage of the single-layer and hierarchical RNNs is similar to Example 2; the difference is that they process two inputs simultaneously. For the hierarchical RNN, the subsequence lengths of the two inputs are not equal, but we use the parameter \ :code:`targetInlink` \ to set the output format of the outer \ :code:`recurrent_group`\ , so the shape of the outer layer's output is the same as the shape of \ :code:`emb2`\ . A short sketch of the call follows.
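A condensed sketch of the outer call (with made-up names :code:`outer_step`, :code:`emb1`, :code:`emb2`) is:

.. code-block:: python

    # Both hierarchical inputs are fed in; targetInlink=emb2 makes the
    # output share emb2's sequence information.
    out = recurrent_group(step=outer_step,
                          input=[SubsequenceInput(emb1),
                                 SubsequenceInput(emb2)],
                          targetInlink=emb2)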
Glossary
========
.. _glossary_memory:
Memory
------
Memory is a concept in PaddlePaddle's RNN implementation. An RNN (recurrent neural network) usually requires some dependency between time steps; that is, the network at the current time step depends on one of the neurons of the network at a previous time step, as the following figure shows:
.. graphviz:: src/glossary_rnn.dot
The dotted connections in the figure are the network connections across time steps. When implementing an RNN, PaddlePaddle realizes this cross-time-step connection with a special neural network unit called Memory. Memory caches the output of one of the neurons at the previous time step so that it can be passed to another neuron at the next time step. An RNN implemented with Memory looks as follows:
.. graphviz:: src/glossary_rnn_with_memory.dot
With this method, PaddlePaddle can easily determine which outputs should cross time steps, and which should not.
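In the (legacy) Python configuration, the Memory pattern looks roughly as follows; the names, sizes, and the input :code:`emb` are illustrative.

.. code-block:: python

    # A minimal sketch: rnn_state caches the output of the layer with the
    # same name and hands it to the next time step.
    def step(y):
        rnn_state = memory(name="rnn_state", size=128)      # previous output
        return fc_layer(input=[y, rnn_state], size=128,
                        act=TanhActivation(),
                        name="rnn_state")                   # cached for next step

    rnn_out = recurrent_group(step=step, input=emb)  # emb: some input sequence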
.. _glossary_timestep:
time step
---------

Refers to a time series.
.. _glossary_sequence:
time series
-----------

A time series is a series of featured data whose order is meaningful: it is a list of features, not a set. Each element of this list, i.e., each featured datum in the series, is called a time step. Note that the concepts of time series and time step are not necessarily related to "time"; as long as the order within a series of featured data is meaningful, it can serve as a time series input.

For example, in a text classification task we regard a sentence as a time series: each word in the sentence is mapped to its index in the word table, so the sentence can be represented as a list of these indices, e.g. :code:`[9, 2, 3, 5, 3]`.
For a more detailed and accurate definition of the time series, please refer to `Wikipedia of Time series <https://en.wikipedia.org/wiki/Time_series>`_ or `Chinese Wikipedia of time series <https://zh.wikipedia.org/wiki/%E6%99%82%E9%96%93%E5%BA%8F%E5%88%97>`_ .
In addition, Paddle always refers to a time series as a :code:`Sequence`; they are the same concept in Paddle's documentation and APIs.
.. _glossary_RNN:
RNN
---
In PaddlePaddle's documentation, RNN usually stands for :code:`Recurrent neural network`. For more information, please refer to `Wikipedia Recurrent neural network <https://en.wikipedia.org/wiki/Recurrent_neural_network>`_ or the `Chinese Wikipedia <https://zh.wikipedia.org/wiki/%E9%80%92%E5%BD%92%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C>`_ .

In PaddlePaddle, an RNN usually means that, for time-series input data, the neural networks of consecutive time steps are related to each other; for example, the input of some neuron can be the output of some neuron of the previous time step's network. Equivalently, unrolled over time steps, the network structure contains a directed cycle.
.. _glossary_hierarchical_RNN:
hierarchical RNN
----------------

A hierarchical RNN, as the name suggests, contains a nested RNN structure. Its input data is a time series, but each element of the series is itself a time series, i.e., a two-dimensional array, or an array of arrays. A hierarchical RNN is a neural network that can process this type of input data.
Consider, for example, the task of classifying a paragraph of sentences. We can treat the paragraph as an array of sentences, and each sentence as an array of words; this is exactly the input type of a hierarchical RNN. We first encode every sentence of the paragraph into a vector with an LSTM, then encode the sequence of sentence vectors into a paragraph vector with another LSTM, and finally use this paragraph vector for classification. This is the network structure of a hierarchical RNN, sketched below.
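A rough sketch of such a network, with made-up names and sizes (:code:`emb` is an assumed hierarchical word-embedding input, and :code:`simple_lstm` schematically stands for the LSTM groups used in real configs):

.. code-block:: python

    # A rough sketch (made-up names/sizes) of the paragraph classifier above.
    def sentence_lstm(words):
        return simple_lstm(input=words, size=128)

    # encode every sentence; the result is still hierarchical
    inner_out = recurrent_group(step=sentence_lstm,
                                input=SubsequenceInput(emb))
    # one vector per sentence: the last LSTM state of each subsequence
    sentence_vecs = last_seq(input=inner_out,
                             agg_level=AggregateLevel.TO_SEQUENCE)
    # encode the sequence of sentence vectors into a paragraph vector
    paragraph_vec = last_seq(input=simple_lstm(input=sentence_vecs, size=128))
    # classify the paragraph
    prediction = fc_layer(input=paragraph_vec, size=4, act=SoftmaxActivation())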
TBD
@@ -29,9 +29,7 @@ namespace framework {
 namespace details {

 struct BroadcastOpHandle : public OpHandleBase {
-  const std::vector<Scope *> &local_scopes_;
-  const std::vector<platform::Place> &places_;
+ public:
   BroadcastOpHandle(const std::vector<Scope *> &local_scopes,
                     const std::vector<platform::Place> &places);
@@ -41,6 +39,10 @@ struct BroadcastOpHandle : public OpHandleBase {
  protected:
   void RunImpl() override;

+ private:
+  const std::vector<Scope *> &local_scopes_;
+  const std::vector<platform::Place> &places_;
 };
 }  // namespace details
......
@@ -90,7 +90,7 @@ struct TestBroadcastOpHandle {
     op_handle_->AddInput(dummy_var_handle);

     for (size_t j = 0; j < gpu_list_.size(); ++j) {
-      op_handle_->dev_ctxes_[gpu_list_[j]] = ctxs_[j].get();
+      op_handle_->SetDeviceContext(gpu_list_[j], ctxs_[j].get());
       VarHandle* out_var_handle = new VarHandle(2, j, "out", gpu_list_[j]);
       vars_.emplace_back(out_var_handle);
       op_handle_->AddOutput(out_var_handle);
......
@@ -28,8 +28,8 @@ ComputationOpHandle::ComputationOpHandle(const OpDesc &op_desc, Scope *scope,
 void ComputationOpHandle::RunImpl() {
   auto *cur_ctx = dev_ctxes_[place_];
   for (auto *in : inputs_) {
-    bool need_wait =
-        in->generated_op_ && in->generated_op_->dev_ctxes_[place_] != cur_ctx;
+    bool need_wait = in->generated_op_ &&
+                     in->generated_op_->DeviceContext(place_) != cur_ctx;
     if (need_wait) {
       in->generated_op_->Wait(cur_ctx);
     }
......
@@ -14,6 +14,9 @@
 #pragma once

+#include <string>
+#include <vector>
+
 #include "paddle/fluid/framework/details/op_handle_base.h"
 #include "paddle/fluid/framework/op_registry.h"
 #include "paddle/fluid/framework/operator.h"
@@ -24,10 +27,7 @@ namespace paddle {
 namespace framework {
 namespace details {

 struct ComputationOpHandle : public OpHandleBase {
-  std::unique_ptr<OperatorBase> op_;
-  Scope *scope_;
-  platform::Place place_;
+ public:
   ComputationOpHandle(const OpDesc &op_desc, Scope *scope,
                       platform::Place place);
@@ -35,6 +35,11 @@ struct ComputationOpHandle : public OpHandleBase {
  protected:
   void RunImpl() override;

+ private:
+  std::unique_ptr<OperatorBase> op_;
+  Scope *scope_;
+  platform::Place place_;
 };
 }  // namespace details
 }  // namespace framework
......
@@ -14,6 +14,9 @@
 #pragma once

+#include <string>
+#include <vector>
+
 #include "paddle/fluid/framework/details/op_handle_base.h"
 #include "paddle/fluid/framework/feed_fetch_type.h"
 #include "paddle/fluid/framework/scope.h"
@@ -24,11 +27,7 @@ namespace framework {
 namespace details {

 struct FetchOpHandle : public OpHandleBase {
-  FeedFetchList *data_;
-  size_t offset_;
-  std::vector<Scope *> *local_scopes_;
-  std::vector<LoDTensor> tensors_;
+ public:
   FetchOpHandle(FeedFetchList *data, size_t offset,
                 std::vector<Scope *> *local_scopes);
@@ -42,6 +41,12 @@ struct FetchOpHandle : public OpHandleBase {
  protected:
   void RunImpl() override;

+ private:
+  FeedFetchList *data_;
+  size_t offset_;
+  std::vector<Scope *> *local_scopes_;
+  std::vector<LoDTensor> tensors_;
 };
 }  // namespace details
......
@@ -29,9 +29,7 @@ namespace framework {
 namespace details {

 struct GatherOpHandle : public OpHandleBase {
-  const std::vector<Scope *> &local_scopes_;
-  const std::vector<platform::Place> &places_;
+ public:
   GatherOpHandle(const std::vector<Scope *> &local_scopes,
                  const std::vector<platform::Place> &places);
@@ -41,6 +39,10 @@ struct GatherOpHandle : public OpHandleBase {
  protected:
   void RunImpl() override;

+ private:
+  const std::vector<Scope *> &local_scopes_;
+  const std::vector<platform::Place> &places_;
 };
 }  // namespace details
......
@@ -78,7 +78,7 @@ struct TestGatherOpHandle {
     op_handle_.reset(new GatherOpHandle(local_scopes_, gpu_list_));
     // add input
     for (size_t j = 0; j < gpu_list_.size(); ++j) {
-      op_handle_->dev_ctxes_[gpu_list_[j]] = ctxs_[j].get();
+      op_handle_->SetDeviceContext(gpu_list_[j], ctxs_[j].get());
       auto* in_var_handle = new VarHandle(1, j, "input", gpu_list_[j]);
       vars_.emplace_back(in_var_handle);
       op_handle_->AddInput(in_var_handle);
......
@@ -60,7 +60,8 @@ void MultiDevSSAGraphBuilder::CreateOpHandleIOs(SSAGraph *result,
                                                 const platform::Place &p,
                                                 const size_t &i) const {
   auto *op_handle = result->ops_.back().get();
-  op_handle->dev_ctxes_[p] = platform::DeviceContextPool::Instance().Get(p);
+  op_handle->SetDeviceContext(p,
+                              platform::DeviceContextPool::Instance().Get(p));

   auto var_names = op.InputArgumentNames();
......
@@ -27,10 +27,6 @@ namespace framework {
 namespace details {

 struct NCCLAllReduceOpHandle : public OpHandleBase {
-  const std::vector<Scope *> &local_scopes_;
-  const std::vector<platform::Place> &places_;
-  const platform::NCCLContextMap &nccl_ctxs_;
-
   NCCLAllReduceOpHandle(const std::vector<Scope *> &local_scopes,
                         const std::vector<platform::Place> &places,
                         const platform::NCCLContextMap &ctxs);
@@ -43,6 +39,11 @@ struct NCCLAllReduceOpHandle : public OpHandleBase {
  protected:
   void RunImpl() override;

+ private:
+  const std::vector<Scope *> &local_scopes_;
+  const std::vector<platform::Place> &places_;
+  const platform::NCCLContextMap &nccl_ctxs_;
 };
 }  // namespace details
......
@@ -27,28 +27,15 @@ namespace details {
 constexpr char kLocalExecScopeName[] = "@LCOAL_SCOPE@";

 class OpHandleBase {
- private:
-  DISABLE_COPY_AND_ASSIGN(OpHandleBase);
-
  public:
-  std::vector<VarHandleBase *> inputs_;
-  std::vector<VarHandleBase *> outputs_;
-  std::unordered_map<platform::Place, platform::DeviceContext *,
-                     platform::PlaceHash>
-      dev_ctxes_;
-
-#ifdef PADDLE_WITH_CUDA
-  std::unordered_map<int, cudaEvent_t> events_;
-#endif
-
   OpHandleBase() {}

+  virtual ~OpHandleBase();
+
   std::string DebugString() const;

   virtual std::string Name() const = 0;

-  virtual ~OpHandleBase();
-
   void Run(bool use_event);

   virtual void Wait(platform::DeviceContext *waited_dev);
@@ -61,6 +48,18 @@ class OpHandleBase {
   // will likely block other computations.
   virtual bool IsMultiDeviceTransfer() { return false; }

+  const platform::DeviceContext *DeviceContext(platform::Place place) {
+    return dev_ctxes_[place];
+  }
+
+  void SetDeviceContext(platform::Place place, platform::DeviceContext *ctx_) {
+    dev_ctxes_[place] = ctx_;
+  }
+
+  const std::vector<VarHandleBase *> &Inputs() const { return inputs_; }
+
+  const std::vector<VarHandleBase *> &Outputs() const { return outputs_; }
+
  protected:
   void RunAndRecordEvent(const std::function<void()> &callback);
@@ -68,6 +67,18 @@ class OpHandleBase {
                          const std::function<void()> &callback);

   virtual void RunImpl() = 0;

+  std::vector<VarHandleBase *> inputs_;
+  std::vector<VarHandleBase *> outputs_;
+  std::unordered_map<platform::Place, platform::DeviceContext *,
+                     platform::PlaceHash>
+      dev_ctxes_;
+
+#ifdef PADDLE_WITH_CUDA
+  std::unordered_map<int, cudaEvent_t> events_;
+#endif
+
+  DISABLE_COPY_AND_ASSIGN(OpHandleBase);
 };
 }  // namespace details
......
@@ -14,6 +14,8 @@
 #pragma once

+#include <string>
+
 #include "paddle/fluid/framework/details/op_handle_base.h"
 #include "paddle/fluid/framework/lod_tensor.h"
 #include "paddle/fluid/framework/scope.h"
@@ -23,10 +25,6 @@ namespace framework {
 namespace details {

 struct ScaleLossGradOpHandle : public OpHandleBase {
-  float coeff_;
-  Scope *scope_;
-  platform::Place place_;
-
   ScaleLossGradOpHandle(size_t num_dev, Scope *scope, platform::Place place,
                         platform::DeviceContext *context);
@@ -36,6 +34,11 @@ struct ScaleLossGradOpHandle : public OpHandleBase {
  protected:
   void RunImpl() override;

+ private:
+  float coeff_;
+  Scope *scope_;
+  platform::Place place_;
 };
 }  // namespace details
......
@@ -28,10 +28,6 @@ namespace framework {
 namespace details {

 struct SendOpHandle : public OpHandleBase {
-  std::unique_ptr<OperatorBase> op_;
-  const Scope* local_scope_;
-  const platform::Place& place_;
-
   SendOpHandle(const framework::OpDesc& op_desc, const Scope* local_scope,
                const platform::Place& place);
@@ -43,6 +39,11 @@ struct SendOpHandle : public OpHandleBase {
  protected:
   void RunImpl() override;

+ private:
+  std::unique_ptr<OperatorBase> op_;
+  const Scope* local_scope_;
+  const platform::Place& place_;
 };
 }  // namespace details
......
@@ -117,12 +117,12 @@ void SSAGraphBuilder::PrintGraphviz(const SSAGraph &graph, std::ostream &sout) {
     std::string op_name = "op_" + std::to_string(op_id++);
     sout << op_name << " [label=\"" << op->Name() << "\", shape=rect]"
          << std::endl;
-    for (auto in : op->inputs_) {
+    for (auto in : op->Inputs()) {
       std::string var_name = "var_" + std::to_string(vars[in]);
       sout << var_name << " -> " << op_name << std::endl;
     }
-    for (auto out : op->outputs_) {
+    for (auto out : op->Outputs()) {
       std::string var_name = "var_" + std::to_string(vars[out]);
       sout << op_name << " -> " << var_name << std::endl;
     }
@@ -133,7 +133,7 @@ void SSAGraphBuilder::PrintGraphviz(const SSAGraph &graph, std::ostream &sout) {
 void SSAGraphBuilder::AddOutputToLeafOps(SSAGraph *graph) {
   for (auto &op : graph->ops_) {
-    if (!op->outputs_.empty()) {
+    if (!op->Outputs().empty()) {
       continue;
     }
     auto *dummy_leaf = new DummyVarHandle();
......
@@ -53,7 +53,7 @@ FeedFetchList ThreadedSSAGraphExecutor::Run(
   };

   auto InsertPendingOp = [&pending_ops](OpHandleBase &op_instance) {
-    pending_ops.insert({&op_instance, op_instance.inputs_.size()});
+    pending_ops.insert({&op_instance, op_instance.Inputs().size()});
   };

   // Transform SSAGraph to pending_ops & pending_vars
@@ -69,7 +69,7 @@ FeedFetchList ThreadedSSAGraphExecutor::Run(
   }

   for (auto &op : graph_->ops_) {
-    if (op->inputs_.empty()) {  // Special case, Op has no input.
+    if (op->Inputs().empty()) {  // Special case, Op has no input.
       ready_ops.insert(op.get());
     } else {
       InsertPendingOp(*op);
@@ -99,7 +99,7 @@ FeedFetchList ThreadedSSAGraphExecutor::Run(
     fetch_ops.emplace_back(op);

     for (auto &p : places_) {
-      op->dev_ctxes_[p] = fetch_ctxs_.Get(p);
+      op->SetDeviceContext(p, fetch_ctxs_.Get(p));
     }

     for (auto *var : vars) {
@@ -180,7 +180,7 @@ void ThreadedSSAGraphExecutor::RunOp(
     op->Run(use_event_);
     VLOG(10) << op << " " << op->Name() << " Done ";
     running_ops_--;
-    ready_var_q->Extend(op->outputs_);
+    ready_var_q->Extend(op->Outputs());
     VLOG(10) << op << " " << op->Name() << "Signal posted";
   } catch (platform::EnforceNotMet ex) {
     exception_.reset(new platform::EnforceNotMet(ex));
......
@@ -119,7 +119,7 @@ class OpDesc {
   void InferVarType(BlockDesc *block) const;

-  void MarkAsTarget() { desc_.set_is_target(true); }
+  void SetIsTarget(bool is_target) { desc_.set_is_target(is_target); }

   void Flush();
......
@@ -559,125 +559,125 @@ $$out = \frac{x}{1 + e^{- \beta x}}$$
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(sigmoid, ops::ActivationOp, ops::SigmoidOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(sigmoid_grad, ops::ActivationOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(sigmoid_grad, ops::ActivationOpGrad);
 REGISTER_OPERATOR(logsigmoid, ops::ActivationOp, ops::LogSigmoidOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(logsigmoid_grad, ops::ActivationOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(logsigmoid_grad, ops::ActivationOpGrad);
 REGISTER_OPERATOR(exp, ops::ActivationOp, ops::ExpOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(exp_grad, ops::ActivationOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(exp_grad, ops::ActivationOpGrad);
 REGISTER_OPERATOR(relu, ops::ActivationWithMKLDNNOp, ops::ReluOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(relu_grad, ops::ActivationWithMKLDNNOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(relu_grad, ops::ActivationWithMKLDNNOpGrad);
 REGISTER_OPERATOR(tanh, ops::ActivationWithMKLDNNOp, ops::TanhOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(tanh_grad, ops::ActivationWithMKLDNNOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(tanh_grad, ops::ActivationWithMKLDNNOpGrad);
 REGISTER_OPERATOR(tanh_shrink, ops::ActivationOp, ops::TanhShrinkOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(tanh_shrink_grad, ops::ActivationOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(tanh_shrink_grad, ops::ActivationOpGrad);
 REGISTER_OPERATOR(softshrink, ops::ActivationOp, ops::SoftShrinkOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(softshrink_grad, ops::ActivationOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(softshrink_grad, ops::ActivationOpGrad);
 REGISTER_OPERATOR(sqrt, ops::ActivationWithMKLDNNOp, ops::SqrtOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(sqrt_grad, ops::ActivationWithMKLDNNOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(sqrt_grad, ops::ActivationWithMKLDNNOpGrad);
 REGISTER_OPERATOR(abs, ops::ActivationWithMKLDNNOp, ops::AbsOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(abs_grad, ops::ActivationWithMKLDNNOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(abs_grad, ops::ActivationWithMKLDNNOpGrad);
 REGISTER_OPERATOR(ceil, ops::ActivationOp, ops::CeilOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(ceil_grad, ops::ActivationOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(ceil_grad, ops::ActivationOpGrad);
 REGISTER_OPERATOR(floor, ops::ActivationOp, ops::FloorOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(floor_grad, ops::ActivationOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(floor_grad, ops::ActivationOpGrad);
 REGISTER_OPERATOR(cos, ops::ActivationOp, ops::CosOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(cos_grad, ops::ActivationOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(cos_grad, ops::ActivationOpGrad);
 REGISTER_OPERATOR(sin, ops::ActivationOp, ops::SinOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(sin_grad, ops::ActivationOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(sin_grad, ops::ActivationOpGrad);
 REGISTER_OPERATOR(round, ops::ActivationOp, ops::RoundOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(round_grad, ops::ActivationOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(round_grad, ops::ActivationOpGrad);
 REGISTER_OPERATOR(reciprocal, ops::ActivationOp, ops::ReciprocalOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(reciprocal_grad, ops::ActivationOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(reciprocal_grad, ops::ActivationOpGrad);
 REGISTER_OPERATOR(log, ops::ActivationOp, ops::LogOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(log_grad, ops::ActivationOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(log_grad, ops::ActivationOpGrad);
 REGISTER_OPERATOR(square, ops::ActivationOp, ops::SquareOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(square_grad, ops::ActivationOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(square_grad, ops::ActivationOpGrad);
 REGISTER_OPERATOR(softplus, ops::ActivationOp, ops::SoftplusOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(softplus_grad, ops::ActivationOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(softplus_grad, ops::ActivationOpGrad);
 REGISTER_OPERATOR(softsign, ops::ActivationOp, ops::SoftsignOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(softsign_grad, ops::ActivationOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(softsign_grad, ops::ActivationOpGrad);
 REGISTER_OPERATOR(brelu, ops::ActivationOp, ops::BReluOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(brelu_grad, ops::ActivationOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(brelu_grad, ops::ActivationOpGrad);
 REGISTER_OPERATOR(leaky_relu, ops::ActivationOp, ops::LeakyReluOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(leaky_relu_grad, ops::ActivationOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(leaky_relu_grad, ops::ActivationOpGrad);
 REGISTER_OPERATOR(soft_relu, ops::ActivationOp, ops::SoftReluOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(soft_relu_grad, ops::ActivationOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(soft_relu_grad, ops::ActivationOpGrad);
 REGISTER_OPERATOR(elu, ops::ActivationOp, ops::ELUOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(elu_grad, ops::ActivationOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(elu_grad, ops::ActivationOpGrad);
 REGISTER_OPERATOR(relu6, ops::ActivationOp, ops::Relu6OpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(relu6_grad, ops::ActivationOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(relu6_grad, ops::ActivationOpGrad);
 REGISTER_OPERATOR(pow, ops::ActivationOp, ops::PowOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(pow_grad, ops::ActivationOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(pow_grad, ops::ActivationOpGrad);
 REGISTER_OPERATOR(stanh, ops::ActivationOp, ops::STanhOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(stanh_grad, ops::ActivationOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(stanh_grad, ops::ActivationOpGrad);
 REGISTER_OPERATOR(hard_shrink, ops::ActivationOp, ops::HardShrinkOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(hard_shrink_grad, ops::ActivationOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(hard_shrink_grad, ops::ActivationOpGrad);
 REGISTER_OPERATOR(thresholded_relu, ops::ActivationOp,
                   ops::ThresholdedReluOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(thresholded_relu_grad, ops::ActivationOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(thresholded_relu_grad, ops::ActivationOpGrad);
 REGISTER_OPERATOR(hard_sigmoid, ops::ActivationOp, ops::HardSigmoidOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(hard_sigmoid_grad, ops::ActivationOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(hard_sigmoid_grad, ops::ActivationOpGrad);
 REGISTER_OPERATOR(swish, ops::ActivationOp, ops::SwishOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(swish_grad, ops::ActivationOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(swish_grad, ops::ActivationOpGrad);
 #define REGISTER_ACTIVATION_CPU_KERNEL(act_type, functor, grad_functor) \
   REGISTER_OP_CPU_KERNEL(                                               \
......
@@ -155,9 +155,9 @@ class BilinearTensorProductOpGrad : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(bilinear_tensor_product, ops::BilinearTensorProductOp,
                   ops::BilinearTensorProductOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
 REGISTER_OPERATOR(bilinear_tensor_product_grad,
-                  ops::BilinearTensorProductOpGrad)
+                  ops::BilinearTensorProductOpGrad);
 REGISTER_OP_CPU_KERNEL(
     bilinear_tensor_product,
     ops::BilinearTensorProductKernel<paddle::platform::CPUDeviceContext, float>,
......
@@ -82,8 +82,8 @@ class ClipOpGrad : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(clip, ops::ClipOp, ops::ClipOpMaker<float>,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(clip_grad, ops::ClipOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(clip_grad, ops::ClipOpGrad);
 REGISTER_OP_CPU_KERNEL(
     clip, ops::ClipKernel<paddle::platform::CPUDeviceContext, float>);
 REGISTER_OP_CPU_KERNEL(
......
@@ -105,10 +105,10 @@ class ConcatOpGrad : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(concat, ops::ConcatOp, ops::ConcatOpMaker,
                   paddle::framework::DefaultGradOpDescMaker<
-                      false> /* set false to disable empty grad */)
-REGISTER_OPERATOR(concat_grad, ops::ConcatOpGrad)
+                      false> /* set false to disable empty grad */);
+REGISTER_OPERATOR(concat_grad, ops::ConcatOpGrad);
 REGISTER_OP_CPU_KERNEL(
-    concat, ops::ConcatKernel<paddle::platform::CPUDeviceContext, float>)
+    concat, ops::ConcatKernel<paddle::platform::CPUDeviceContext, float>);
 REGISTER_OP_CPU_KERNEL(
     concat_grad,
-    ops::ConcatGradKernel<paddle::platform::CPUDeviceContext, float>)
+    ops::ConcatGradKernel<paddle::platform::CPUDeviceContext, float>);
@@ -336,16 +336,16 @@ framework::OpKernelType ConvOpGrad::GetExpectedKernelType(
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(conv2d, ops::ConvOp, ops::Conv2DOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(conv2d_grad, ops::ConvOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(conv2d_grad, ops::ConvOpGrad);

 // depthwise convolution op
 REGISTER_OPERATOR(depthwise_conv2d, ops::ConvOp, ops::Conv2DOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(depthwise_conv2d_grad, ops::ConvOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(depthwise_conv2d_grad, ops::ConvOpGrad);

 REGISTER_OPERATOR(conv3d, ops::ConvOp, ops::Conv3DOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(conv3d_grad, ops::ConvOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(conv3d_grad, ops::ConvOpGrad);

 // depthwise conv kernel
 // TODO(xingzhaolong): neon kernel for mobile
......
@@ -194,8 +194,8 @@ class ConvShiftGradKernel<platform::CPUPlace, T>
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(conv_shift, ops::ConvShiftOp, ops::ConvShiftOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(conv_shift_grad, ops::ConvShiftGradOp)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(conv_shift_grad, ops::ConvShiftGradOp);
 REGISTER_OP_CPU_KERNEL(conv_shift,
                        ops::ConvShiftKernel<paddle::platform::CPUPlace, float>);
 REGISTER_OP_CPU_KERNEL(
......
@@ -300,8 +300,8 @@ namespace ops = paddle::operators;
 REGISTER_OPERATOR(conv2d_transpose, ops::ConvTransposeOp,
                   ops::Conv2DTransposeOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(conv2d_transpose_grad, ops::ConvTransposeOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(conv2d_transpose_grad, ops::ConvTransposeOpGrad);

 REGISTER_OP_CPU_KERNEL(
     conv2d_transpose,
@@ -315,8 +315,8 @@ REGISTER_OP_CPU_KERNEL(
 REGISTER_OPERATOR(conv3d_transpose, ops::ConvTransposeOp,
                   ops::Conv3DTransposeOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(conv3d_transpose_grad, ops::ConvTransposeOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(conv3d_transpose_grad, ops::ConvTransposeOpGrad);
 REGISTER_OP_CPU_KERNEL(
     conv3d_transpose,
......
@@ -154,8 +154,8 @@ class CosSimOpGrad : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(cos_sim, ops::CosSimOp, ops::CosSimOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(cos_sim_grad, ops::CosSimOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(cos_sim_grad, ops::CosSimOpGrad);
 REGISTER_OP_CPU_KERNEL(
     cos_sim, ops::CosSimKernel<paddle::platform::CPUDeviceContext, float>);
 REGISTER_OP_CPU_KERNEL(
......
@@ -165,8 +165,8 @@ or not. But the output only shares the LoD information with input X.
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(cross_entropy, ops::CrossEntropyOp, ops::CrossEntropyOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(cross_entropy_grad, ops::CrossEntropyGradientOp)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(cross_entropy_grad, ops::CrossEntropyGradientOp);
 REGISTER_OP_CPU_KERNEL(cross_entropy, ops::CrossEntropyOpKernel<float>,
                        ops::CrossEntropyOpKernel<double>);
 REGISTER_OP_CPU_KERNEL(cross_entropy_grad,
......
@@ -79,4 +79,4 @@ using CPU = paddle::platform::CPUDeviceContext;
 REGISTER_OPERATOR(cumsum, ops::CumOp, ops::CumsumOpMaker, ops::CumsumGradMaker);
 REGISTER_OP_CPU_KERNEL(cumsum, ops::CumKernel<CPU, ops::CumsumFunctor<float>>,
                        ops::CumKernel<CPU, ops::CumsumFunctor<double>>,
-                       ops::CumKernel<CPU, ops::CumsumFunctor<int>>)
+                       ops::CumKernel<CPU, ops::CumsumFunctor<int>>);
@@ -19,4 +19,4 @@ using CUDA = paddle::platform::CUDADeviceContext;
 REGISTER_OP_CUDA_KERNEL(cumsum, ops::CumKernel<CUDA, ops::CumsumFunctor<float>>,
                         ops::CumKernel<CUDA, ops::CumsumFunctor<double>>,
-                        ops::CumKernel<CUDA, ops::CumsumFunctor<int>>)
+                        ops::CumKernel<CUDA, ops::CumsumFunctor<int>>);
@@ -102,8 +102,8 @@ class DropoutOpGrad : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(dropout, ops::DropoutOp, ops::DropoutOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(dropout_grad, ops::DropoutOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(dropout_grad, ops::DropoutOpGrad);
 REGISTER_OP_CPU_KERNEL(
     dropout, ops::CPUDropoutKernel<paddle::platform::CPUDeviceContext, float>);
 REGISTER_OP_CPU_KERNEL(
......
@@ -32,8 +32,8 @@ class ElementwiseDivOpMaker : public ElementwiseOpMaker {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(elementwise_div, ops::ElementwiseOp,
                   ops::ElementwiseDivOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(elementwise_div_grad, ops::ElementwiseOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(elementwise_div_grad, ops::ElementwiseOpGrad);
 REGISTER_OP_CPU_KERNEL(
     elementwise_div,
     ops::ElementwiseDivKernel<paddle::platform::CPUDeviceContext, float>,
......
@@ -31,8 +31,8 @@ class ElementwiseMaxOpMaker : public ElementwiseOpMaker {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(elementwise_max, ops::ElementwiseOp,
                   ops::ElementwiseMaxOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(elementwise_max_grad, ops::ElementwiseOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(elementwise_max_grad, ops::ElementwiseOpGrad);
 REGISTER_OP_CPU_KERNEL(
     elementwise_max,
     ops::ElementwiseMaxKernel<paddle::platform::CPUDeviceContext, float>,
......
@@ -31,8 +31,8 @@ class ElementwiseMinOpMaker : public ElementwiseOpMaker {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(elementwise_min, ops::ElementwiseOp,
                   ops::ElementwiseMinOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(elementwise_min_grad, ops::ElementwiseOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(elementwise_min_grad, ops::ElementwiseOpGrad);
 REGISTER_OP_CPU_KERNEL(
     elementwise_min,
     ops::ElementwiseMinKernel<paddle::platform::CPUDeviceContext, float>,
......
@@ -33,8 +33,8 @@ class ElementwiseMulOpMaker : public ElementwiseOpMaker {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(elementwise_mul, ops::ElementwiseOp,
                   ops::ElementwiseMulOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(elementwise_mul_grad, ops::ElementwiseOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(elementwise_mul_grad, ops::ElementwiseOpGrad);
 REGISTER_OP_CPU_KERNEL(
     elementwise_mul,
     ops::ElementwiseMulKernel<paddle::platform::CPUDeviceContext, float>,
......
...@@ -31,8 +31,8 @@ class ElementwiseSubOpMaker : public ElementwiseOpMaker { ...@@ -31,8 +31,8 @@ class ElementwiseSubOpMaker : public ElementwiseOpMaker {
namespace ops = paddle::operators; namespace ops = paddle::operators;
REGISTER_OPERATOR(elementwise_sub, ops::ElementwiseOp, REGISTER_OPERATOR(elementwise_sub, ops::ElementwiseOp,
ops::ElementwiseSubOpMaker, ops::ElementwiseSubOpMaker,
paddle::framework::DefaultGradOpDescMaker<true>) paddle::framework::DefaultGradOpDescMaker<true>);
REGISTER_OPERATOR(elementwise_sub_grad, ops::ElementwiseOpGrad) REGISTER_OPERATOR(elementwise_sub_grad, ops::ElementwiseOpGrad);
REGISTER_OP_CPU_KERNEL( REGISTER_OP_CPU_KERNEL(
elementwise_sub, elementwise_sub,
ops::ElementwiseSubKernel<paddle::platform::CPUDeviceContext, float>, ops::ElementwiseSubKernel<paddle::platform::CPUDeviceContext, float>,
......
@@ -13,6 +13,7 @@ See the License for the specific language governing permissions and
 limitations under the License. */
 #include "paddle/fluid/operators/expand_op.h"
+#include <vector>
@@ -131,8 +132,8 @@ class ExpandGradOp : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(expand, ops::ExpandOp, ops::ExpandOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(expand_grad, ops::ExpandGradOp)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(expand_grad, ops::ExpandGradOp);
 REGISTER_OP_CPU_KERNEL(
     expand, ops::ExpandKernel<paddle::platform::CPUDeviceContext, float>);
 REGISTER_OP_CPU_KERNEL(
...
@@ -14,13 +14,14 @@ limitations under the License. */
 #pragma once
+#include <vector>
 #include <boost/preprocessor/arithmetic/div.hpp>
 #include <boost/preprocessor/arithmetic/mod.hpp>
 #include <boost/preprocessor/comparison/greater.hpp>
 #include <boost/preprocessor/comparison/greater_equal.hpp>
 #include <boost/preprocessor/control/if.hpp>
 #include <boost/preprocessor/repetition/repeat.hpp>
-#include <iostream>
 #include "paddle/fluid/framework/eigen.h"
 #include "paddle/fluid/framework/op_registry.h"
 #include "paddle/fluid/framework/operator.h"
...
@@ -99,5 +99,5 @@ FCOpMaker::FCOpMaker(OpProto* proto, OpAttrChecker* op_checker)
 } // namespace paddle
 REGISTER_OPERATOR(fc, paddle::operators::FCOp, paddle::operators::FCOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(fc_grad, paddle::operators::FCOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(fc_grad, paddle::operators::FCOpGrad);
@@ -101,7 +101,7 @@ Out = [[3, 4],
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(gather, ops::GatherOp, ops::GatherOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(gather_grad, ops::GatherGradOp)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(gather_grad, ops::GatherGradOp);
 REGISTER_OP_CPU_KERNEL(gather, ops::GatherOpKernel<float>);
 REGISTER_OP_CPU_KERNEL(gather_grad, ops::GatherGradientOpKernel<float>);
@@ -12,10 +12,10 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License. */
-#include "gather.cu.h"
 #include "paddle/fluid/framework/eigen.h"
+#include "paddle/fluid/operators/gather.cu.h"
 #include "paddle/fluid/operators/gather_op.h"
-#include "scatter.cu.h"
+#include "paddle/fluid/operators/scatter.cu.h"
 namespace paddle {
 namespace operators {
...
@@ -13,10 +13,10 @@ See the License for the specific language governing permissions and
 limitations under the License. */
 #pragma once
-#include "gather.h"
 #include "paddle/fluid/framework/eigen.h"
 #include "paddle/fluid/framework/op_registry.h"
-#include "scatter.h"
+#include "paddle/fluid/operators/gather.h"
+#include "paddle/fluid/operators/scatter.h"
 namespace paddle {
 namespace operators {
...
@@ -12,38 +12,37 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License. */
-#include "paddle/fluid/operators/gather.h"
-#include "paddle/fluid/framework/ddim.h"
-#include "paddle/fluid/framework/tensor.h"
-#include "paddle/fluid/platform/place.h"
 #include <gtest/gtest.h>
 #include <iostream>
 #include <string>
+#include "paddle/fluid/framework/ddim.h"
+#include "paddle/fluid/framework/tensor.h"
+#include "paddle/fluid/operators/gather.h"
+#include "paddle/fluid/platform/place.h"
 TEST(Gather, GatherData) {
-  using namespace paddle::framework;
-  using namespace paddle::platform;
-  using namespace paddle::operators;
-  Tensor* src = new Tensor();
-  Tensor* index = new Tensor();
-  Tensor* output = new Tensor();
+  paddle::framework::Tensor* src = new paddle::framework::Tensor();
+  paddle::framework::Tensor* index = new paddle::framework::Tensor();
+  paddle::framework::Tensor* output = new paddle::framework::Tensor();
   int* p_src = nullptr;
   int* p_index = nullptr;
-  p_src = src->mutable_data<int>(make_ddim({3, 4}), CPUPlace());
-  p_index = index->mutable_data<int>(make_ddim({2}), CPUPlace());
+  p_src = src->mutable_data<int>(paddle::framework::make_ddim({3, 4}),
+                                 paddle::platform::CPUPlace());
+  p_index = index->mutable_data<int>(paddle::framework::make_ddim({2}),
+                                     paddle::platform::CPUPlace());
   for (int i = 0; i < 12; ++i) p_src[i] = i;
   p_index[0] = 1;
   p_index[1] = 0;
-  int* p_output = output->mutable_data<int>(make_ddim({2, 4}), CPUPlace());
+  int* p_output = output->mutable_data<int>(
+      paddle::framework::make_ddim({2, 4}), paddle::platform::CPUPlace());
   auto* cpu_place = new paddle::platform::CPUPlace();
   paddle::platform::CPUDeviceContext ctx(*cpu_place);
-  CPUGather<int>(ctx, *src, *index, output);
+  paddle::operators::CPUGather<int>(ctx, *src, *index, output);
   for (int i = 0; i < 4; ++i) EXPECT_EQ(p_output[i], i + 4);
   for (int i = 4; i < 8; ++i) EXPECT_EQ(p_output[i], i - 4);
...
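The gather_test rewrite above deletes three `using namespace` directives and fully qualifies every name instead, in line with the Google C++ style guide that cpplint enforces: using-directives pull entire namespaces into scope and can silently change which symbol a name resolves to as headers evolve. A small stand-alone sketch of the same cleanup, with a stand-in Tensor type:

    #include <iostream>

    namespace paddle {
    namespace framework {
    struct Tensor {  // stand-in for the real class, for this sketch only
      int numel = 0;
    };
    }  // namespace framework
    }  // namespace paddle

    int main() {
      // Before: using namespace paddle::framework;  Tensor* src = new Tensor();
      // After, as in the hunk above: fully qualified, no using-directive.
      paddle::framework::Tensor* src = new paddle::framework::Tensor();
      std::cout << "numel = " << src->numel << '\n';
      delete src;
      return 0;
    }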
@@ -12,7 +12,7 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License. */
-#include <thread>
+#include <thread>  // NOLINT
 #include "paddle/fluid/framework/op_registry.h"
 #include "paddle/fluid/operators/detail/safe_ref.h"
 #include "paddle/fluid/platform/place.h"
...
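The `// NOLINT` added here is a cpplint suppression: cpplint's build/c++11 check rejects `<thread>` (along with `<mutex>` and friends) because Google's internal style mandated in-house concurrency wrappers. Code that does want `std::thread` keeps the include and silences the finding on that line, as this hunk does. A sketch:

    #include <thread>  // NOLINT(build/c++11) -- suppress cpplint's header ban

    #include <iostream>

    int main() {
      // Without the NOLINT comment, cpplint flags the <thread> include itself.
      std::thread worker([] { std::cout << "worker ran\n"; });
      worker.join();
      return 0;
    }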
@@ -217,8 +217,8 @@ class GRUGradOp : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(gru, ops::GRUOp, ops::GRUOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(gru_grad, ops::GRUGradOp)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(gru_grad, ops::GRUGradOp);
 REGISTER_OP_CPU_KERNEL(
     gru, ops::GRUKernel<paddle::platform::CPUDeviceContext, float>,
     ops::GRUKernel<paddle::platform::CPUDeviceContext, double>);
...
@@ -104,8 +104,8 @@ class HingeLossGradOp : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(hinge_loss, ops::HingeLossOp, ops::HingeLossOpMaker<float>,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(hinge_loss_grad, ops::HingeLossGradOp)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(hinge_loss_grad, ops::HingeLossGradOp);
 REGISTER_OP_CPU_KERNEL(
     hinge_loss,
     ops::HingeLossKernel<paddle::platform::CPUDeviceContext, float>);
...
@@ -122,8 +122,8 @@ class HuberLossGradOp : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(huber_loss, ops::HuberLossOp, ops::HuberLossOpMaker<float>,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(huber_loss_grad, ops::HuberLossGradOp)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(huber_loss_grad, ops::HuberLossGradOp);
 REGISTER_OP_CPU_KERNEL(
     huber_loss,
     ops::HuberLossKernel<paddle::platform::CPUDeviceContext, float>);
...
@@ -149,8 +149,8 @@ class Im2SequenceGradOp : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(im2sequence, ops::Im2SequenceOp, ops::Im2SequenceOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(im2sequence_grad, ops::Im2SequenceGradOp)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(im2sequence_grad, ops::Im2SequenceGradOp);
 REGISTER_OP_CPU_KERNEL(
     im2sequence,
     ops::Im2SequenceKernel<paddle::platform::CPUDeviceContext, float>);
...
@@ -89,4 +89,4 @@ REGISTER_OP_CPU_KERNEL(
     increment, ops::IncrementKernel<paddle::platform::CPUDeviceContext, float>,
     ops::IncrementKernel<paddle::platform::CPUDeviceContext, double>,
     ops::IncrementKernel<paddle::platform::CPUDeviceContext, int>,
-    ops::IncrementKernel<paddle::platform::CPUDeviceContext, int64_t>)
+    ops::IncrementKernel<paddle::platform::CPUDeviceContext, int64_t>);
@@ -19,4 +19,4 @@ REGISTER_OP_CUDA_KERNEL(
     increment, ops::IncrementKernel<paddle::platform::CUDADeviceContext, float>,
     ops::IncrementKernel<paddle::platform::CUDADeviceContext, double>,
     ops::IncrementKernel<paddle::platform::CUDADeviceContext, int>,
-    ops::IncrementKernel<paddle::platform::CUDADeviceContext, int64_t>)
+    ops::IncrementKernel<paddle::platform::CUDADeviceContext, int64_t>);
File mode changed from 100755 to 100644
File mode changed from 100755 to 100644
@@ -68,8 +68,8 @@ $$Out = \sum{|X|}$$
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(l1_norm, ops::L1NormOp, ops::L1NormOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(l1_norm_grad, ops::L1NormGradOp)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(l1_norm_grad, ops::L1NormGradOp);
 REGISTER_OP_CPU_KERNEL(
     l1_norm, ops::L1NormKernel<paddle::platform::CPUDeviceContext, float>);
 REGISTER_OP_CPU_KERNEL(
...
@@ -118,8 +118,8 @@ class LabelSmoothGradOp : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(label_smooth, ops::LabelSmoothOp, ops::LabelSmoothOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(label_smooth_grad, ops::LabelSmoothGradOp)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(label_smooth_grad, ops::LabelSmoothGradOp);
 REGISTER_OP_CPU_KERNEL(
     label_smooth,
     ops::LabelSmoothKernel<paddle::platform::CPUDeviceContext, float>,
...
@@ -163,8 +163,8 @@ class LayerNormGradOp : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(layer_norm, ops::LayerNormOp, ops::LayerNormOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(layer_norm_grad, ops::LayerNormGradOp)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(layer_norm_grad, ops::LayerNormGradOp);
 REGISTER_OP_CPU_KERNEL(
     layer_norm, ops::LayerNormKernel<paddle::platform::CPUDeviceContext, float>,
     ops::LayerNormKernel<paddle::platform::CPUDeviceContext, double>);
...
@@ -258,8 +258,8 @@ class LinearChainCRFGradOp : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(linear_chain_crf, ops::LinearChainCRFOp,
                   ops::LinearChainCRFOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(linear_chain_crf_grad, ops::LinearChainCRFGradOp)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(linear_chain_crf_grad, ops::LinearChainCRFGradOp);
 REGISTER_OP_CPU_KERNEL(
     linear_chain_crf,
     ops::LinearChainCRFOpKernel<paddle::platform::CPUDeviceContext, float>,
...
@@ -156,8 +156,8 @@ class LoDResetGradOp : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(lod_reset, ops::LoDResetOp, ops::LoDResetOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(lod_reset_grad, ops::LoDResetGradOp)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(lod_reset_grad, ops::LoDResetGradOp);
 REGISTER_OP_CPU_KERNEL(
     lod_reset, ops::LoDResetKernel<paddle::platform::CPUPlace, float>,
     ops::LoDResetKernel<paddle::platform::CPUPlace, double>,
...
@@ -107,8 +107,8 @@ class LogLossGradOp : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(log_loss, ops::LogLossOp, ops::LogLossOpMaker<float>,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(log_loss_grad, ops::LogLossGradOp)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(log_loss_grad, ops::LogLossGradOp);
 REGISTER_OP_CPU_KERNEL(
     log_loss, ops::LogLossKernel<paddle::platform::CPUDeviceContext, float>);
 REGISTER_OP_CPU_KERNEL(
...
@@ -277,8 +277,8 @@ class LRNOpGrad : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(lrn, ops::LRNOp, ops::LRNOpMaker<float>,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(lrn_grad, ops::LRNOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(lrn_grad, ops::LRNOpGrad);
 REGISTER_OP_CPU_KERNEL(
     lrn, ops::LRNKernel<paddle::platform::CPUDeviceContext, float>);
 REGISTER_OP_CPU_KERNEL(
...
@@ -274,8 +274,8 @@ class LSTMGradOp : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(lstm, ops::LSTMOp, ops::LSTMOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(lstm_grad, ops::LSTMGradOp)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(lstm_grad, ops::LSTMGradOp);
 REGISTER_OP_CPU_KERNEL(
     lstm, ops::LSTMKernel<paddle::platform::CPUDeviceContext, float>,
     ops::LSTMKernel<paddle::platform::CPUDeviceContext, double>);
...
@@ -98,8 +98,8 @@ class LstmUnitGradOp : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(lstm_unit, ops::LstmUnitOp, ops::LstmUnitOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(lstm_unit_grad, ops::LstmUnitGradOp)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(lstm_unit_grad, ops::LstmUnitGradOp);
 REGISTER_OP_CPU_KERNEL(lstm_unit,
                        ops::LstmUnitKernel<paddle::platform::CPUPlace, float>,
                        ops::LstmUnitKernel<paddle::platform::CPUPlace, double>);
...
@@ -323,8 +323,8 @@ class LSTMPGradOp : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(lstmp, ops::LSTMPOp, ops::LSTMPOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(lstmp_grad, ops::LSTMPGradOp)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(lstmp_grad, ops::LSTMPGradOp);
 REGISTER_OP_CPU_KERNEL(
     lstmp, ops::LSTMPKernel<paddle::platform::CPUDeviceContext, float>,
     ops::LSTMPKernel<paddle::platform::CPUDeviceContext, double>);
...
@@ -113,8 +113,8 @@ namespace ops = paddle::operators;
 REGISTER_OPERATOR(margin_rank_loss, ops::MarginRankLossOp,
                   ops::MarginRankLossOpMaker<float>,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(margin_rank_loss_grad, ops::MarginRankLossGradOp)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(margin_rank_loss_grad, ops::MarginRankLossGradOp);
 REGISTER_OP_CPU_KERNEL(
     margin_rank_loss,
     ops::MarginRankLossKernel<paddle::platform::CPUDeviceContext, float>);
...
@@ -238,8 +238,8 @@ class MatMulOpGrad : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(matmul, ops::MatMulOp, ops::MatMulOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(matmul_grad, ops::MatMulOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(matmul_grad, ops::MatMulOpGrad);
 REGISTER_OP_CPU_KERNEL(
     matmul, ops::MatMulKernel<paddle::platform::CPUDeviceContext, float>);
 REGISTER_OP_CPU_KERNEL(
...
@@ -102,8 +102,8 @@ class MaxOutOpGrad : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(maxout, ops::MaxOutOp, ops::MaxOutOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(maxout_grad, ops::MaxOutOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(maxout_grad, ops::MaxOutOpGrad);
 REGISTER_OP_CPU_KERNEL(
     maxout, ops::MaxOutKernel<paddle::platform::CPUDeviceContext, float>);
 REGISTER_OP_CPU_KERNEL(
...
@@ -110,8 +110,8 @@ class ModifiedHuberLossGradOp : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(modified_huber_loss, ops::ModifiedHuberLossOp,
                   ops::ModifiedHuberLossOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(modified_huber_loss_grad, ops::ModifiedHuberLossGradOp)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(modified_huber_loss_grad, ops::ModifiedHuberLossGradOp);
 REGISTER_OP_CPU_KERNEL(
     modified_huber_loss,
...
@@ -161,8 +161,8 @@ class MulGradOp : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(mul, ops::MulOp, ops::MulOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(mul_grad, ops::MulGradOp)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(mul_grad, ops::MulGradOp);
 REGISTER_OP_CPU_KERNEL(
     mul, ops::MulKernel<paddle::platform::CPUDeviceContext, float>);
 REGISTER_OP_CPU_KERNEL(
...
@@ -173,8 +173,8 @@ class MultiClassNMSKernel : public framework::OpKernel<T> {
   void MultiClassNMS(const framework::ExecutionContext& ctx,
                      const Tensor& scores, const Tensor& bboxes,
-                     std::map<int, std::vector<int>>& indices,
-                     int& num_nmsed_out) const {
+                     std::map<int, std::vector<int>>* indices,
+                     int* num_nmsed_out) const {
     int64_t background_label = ctx.Attr<int>("background_label");
     int64_t nms_top_k = ctx.Attr<int>("nms_top_k");
     int64_t keep_top_k = ctx.Attr<int>("keep_top_k");
@@ -189,15 +189,15 @@ class MultiClassNMSKernel : public framework::OpKernel<T> {
       if (c == background_label) continue;
       Tensor score = scores.Slice(c, c + 1);
       NMSFast(bboxes, score, score_threshold, nms_threshold, nms_eta, nms_top_k,
-              &(indices[c]));
-      num_det += indices[c].size();
+              &((*indices)[c]));
+      num_det += (*indices)[c].size();
     }
-    num_nmsed_out = num_det;
+    *num_nmsed_out = num_det;
     const T* scores_data = scores.data<T>();
     if (keep_top_k > -1 && num_det > keep_top_k) {
       std::vector<std::pair<float, std::pair<int, int>>> score_index_pairs;
-      for (const auto& it : indices) {
+      for (const auto& it : *indices) {
         int label = it.first;
         const T* sdata = scores_data + label * predict_dim;
         const std::vector<int>& label_indices = it.second;
@@ -220,13 +220,13 @@ class MultiClassNMSKernel : public framework::OpKernel<T> {
         int idx = score_index_pairs[j].second.second;
         new_indices[label].push_back(idx);
       }
-      new_indices.swap(indices);
-      num_nmsed_out = keep_top_k;
+      new_indices.swap(*indices);
+      *num_nmsed_out = keep_top_k;
     }
   }
   void MultiClassOutput(const Tensor& scores, const Tensor& bboxes,
-                        std::map<int, std::vector<int>>& selected_indices,
+                        const std::map<int, std::vector<int>>& selected_indices,
                         Tensor* outs) const {
     int predict_dim = scores.dims()[1];
     auto* scores_data = scores.data<T>();
@@ -273,7 +273,7 @@ class MultiClassNMSKernel : public framework::OpKernel<T> {
       std::map<int, std::vector<int>> indices;
       int num_nmsed_out = 0;
-      MultiClassNMS(ctx, ins_score, ins_boxes, indices, num_nmsed_out);
+      MultiClassNMS(ctx, ins_score, ins_boxes, &indices, &num_nmsed_out);
       all_indices.push_back(indices);
       batch_starts.push_back(batch_starts.back() + num_nmsed_out);
     }
...
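Every MultiClassNMS change above (and the test refactors further down) trades a mutable reference parameter for a pointer, the Google-style rule cpplint reports as runtime/references: inputs stay const references, outputs become pointers so mutation is visible at the call site (`&indices, &num_nmsed_out`). A minimal sketch of the pattern with hypothetical names:

    #include <iostream>
    #include <map>
    #include <vector>

    // Outputs are pointers; inputs are const references. The call site's
    // `&detections, &count` now advertises that both will be written.
    void CollectDetections(const std::vector<int>& raw,
                           std::map<int, std::vector<int>>* detections,
                           int* count) {
      for (int v : raw) (*detections)[v % 2].push_back(v);  // deref as in the hunk
      *count = static_cast<int>(raw.size());
    }

    int main() {
      std::map<int, std::vector<int>> detections;
      int count = 0;
      CollectDetections({1, 2, 3, 4}, &detections, &count);
      std::cout << count << " values in " << detections.size() << " buckets\n";
      return 0;
    }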
@@ -135,8 +135,9 @@ class NCCLBcastKernel : public framework::OpKernel<T> {
       auto* x = ctx.Input<LoDTensor>("X");
       VLOG(3) << "gpu : " << gpu_id << " invoke Bcast. send " << x->numel();
       PADDLE_ENFORCE(platform::dynload::ncclBcast(
-          (void*)x->data<T>(), x->numel(), NCCLTypeWrapper<T>::type, root,
-          comm->comms().at(idx), ctx.cuda_device_context().stream()));
+          reinterpret_cast<void*>(const_cast<T*>(x->data<T>())), x->numel(),
+          NCCLTypeWrapper<T>::type, root, comm->comms().at(idx),
+          ctx.cuda_device_context().stream()));
       VLOG(3) << "gpu : " << gpu_id << " finished Bcast.";
     } else {
       auto* out = ctx.Output<LoDTensor>("Out");
...
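This hunk replaces a C-style `(void*)` on a const buffer with the two conversions it was silently performing: `const_cast` to shed constness (ncclBcast takes a mutable `void*` even though the root rank only reads from the buffer) and `reinterpret_cast` to erase the element type. Named casts are what cpplint's readability/casting rule asks for, and they make the const-stripping auditable. A sketch against a stand-in C API:

    #include <cstdio>

    // Stand-in for a C API (like ncclBcast) that takes void* unconditionally.
    void c_api_send(void* buf, int n) {
      std::printf("first byte: %d (n=%d)\n", static_cast<const char*>(buf)[0], n);
    }

    int main() {
      const char data[4] = {7, 8, 9, 10};  // the framework hands out const data
      // (void*)data compiles too, but hides that const is being dropped;
      // the named casts make each conversion explicit and greppable.
      c_api_send(reinterpret_cast<void*>(const_cast<char*>(data)), 4);
      return 0;
    }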
@@ -182,8 +182,8 @@ class NCEOpGrad : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(nce, ops::NCEOp, ops::NCEOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(nce_grad, ops::NCEOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(nce_grad, ops::NCEOpGrad);
 REGISTER_OP_CPU_KERNEL(nce, ops::NCEKernel<paddle::platform::CPUPlace, float>,
                        ops::NCEKernel<paddle::platform::CPUPlace, double>);
 REGISTER_OP_CPU_KERNEL(nce_grad,
...
@@ -16,6 +16,7 @@ limitations under the License. */
 #include <math.h>
 #include <random>
+#include <vector>
 #include "paddle/fluid/framework/eigen.h"
 #include "paddle/fluid/framework/op_registry.h"
 #include "unsupported/Eigen/CXX11/Tensor"
@@ -108,7 +109,7 @@ class NCEKernel : public framework::OpKernel<T> {
     auto weight_mat = EigenMatrix<T>::From(*(context.Input<Tensor>("Weight")));
     for (int64_t i = 0; i < sample_labels->numel(); ++i) {
       Eigen::Tensor<T, 0, Eigen::RowMajor, Eigen::DenseIndex> result =
-          (input_mat.chip((int)(i / sample_labels->dims()[1]), 0) *
+          (input_mat.chip(static_cast<int>(i / sample_labels->dims()[1]), 0) *
            weight_mat.chip(sample_labels_data[i], 0))
               .sum();
       sample_out_data[i] += result(0);
@@ -190,7 +191,7 @@ class NCEGradKernel : public framework::OpKernel<T> {
     auto x_matrix = EigenMatrix<T>::From(*(context.Input<Tensor>("Input")));
     for (int64_t i = 0; i < sample_labels->numel(); ++i) {
       d_w_matrix.chip(sample_labels_data[i], 0) +=
-          x_matrix.chip((int)(i / sample_labels->dims()[1]), 0) *
+          x_matrix.chip(static_cast<int>(i / sample_labels->dims()[1]), 0) *
           sample_grad_data[i];
     }
   }
@@ -202,7 +203,7 @@ class NCEGradKernel : public framework::OpKernel<T> {
     auto d_x_matrix = EigenMatrix<T>::From(*d_x);
     auto w_matrix = EigenMatrix<T>::From(*(context.Input<Tensor>("Weight")));
     for (int64_t i = 0; i < sample_labels->numel(); ++i) {
-      d_x_matrix.chip((int)(i / sample_labels->dims()[1]), 0) +=
+      d_x_matrix.chip(static_cast<int>(i / sample_labels->dims()[1]), 0) +=
           w_matrix.chip(sample_labels_data[i], 0) * sample_grad_data[i];
     }
   }
...
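The three NCE kernel edits swap C-style `(int)` truncations for `static_cast<int>`, again for cpplint (readability/casting). The behavior is identical; the gain is that narrowing conversions become searchable and can never silently turn into a reinterpreting cast if the operand's type changes later:

    #include <cstdint>
    #include <iostream>

    int main() {
      int64_t i = 7, width = 2;  // same shape as the row index in the kernels
      int row_old = (int)(i / width);             // old: C-style cast
      int row_new = static_cast<int>(i / width);  // new: explicit, greppable
      std::cout << row_old << " == " << row_new << '\n';
      return 0;
    }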
@@ -86,8 +86,8 @@ class NormOpGrad : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(norm, ops::NormOp, ops::NormOpMaker<float>,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(norm_grad, ops::NormOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(norm_grad, ops::NormOpGrad);
 REGISTER_OP_CPU_KERNEL(
     norm, ops::NormKernel<paddle::platform::CPUDeviceContext, float>,
     ops::NormKernel<paddle::platform::CPUDeviceContext, double, float>);
...
@@ -334,19 +334,19 @@ Example:
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(pool2d, ops::PoolOp, ops::Pool2dOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(pool2d_grad, ops::PoolOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(pool2d_grad, ops::PoolOpGrad);
 REGISTER_OP_CPU_KERNEL(
     pool2d, ops::PoolKernel<paddle::platform::CPUDeviceContext, float>,
     ops::PoolKernel<paddle::platform::CPUDeviceContext, double>);
 REGISTER_OP_CPU_KERNEL(
     pool2d_grad, ops::PoolGradKernel<paddle::platform::CPUDeviceContext, float>,
-    ops::PoolGradKernel<paddle::platform::CPUDeviceContext, double>)
+    ops::PoolGradKernel<paddle::platform::CPUDeviceContext, double>);
 REGISTER_OPERATOR(pool3d, ops::PoolOp, ops::Pool3dOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(pool3d_grad, ops::PoolOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(pool3d_grad, ops::PoolOpGrad);
 REGISTER_OP_CPU_KERNEL(
     pool3d, ops::PoolKernel<paddle::platform::CPUDeviceContext, float>,
...
@@ -260,8 +260,8 @@ namespace ops = paddle::operators;
 REGISTER_OPERATOR(max_pool2d_with_index, ops::MaxPoolWithIndexOp,
                   ops::MaxPool2dWithIndexOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(max_pool2d_with_index_grad, ops::MaxPoolWithIndexOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(max_pool2d_with_index_grad, ops::MaxPoolWithIndexOpGrad);
 REGISTER_OP_CPU_KERNEL(
     max_pool2d_with_index,
@@ -273,12 +273,12 @@ REGISTER_OP_CPU_KERNEL(
     ops::MaxPoolWithIndexGradKernel<paddle::platform::CPUDeviceContext, float,
                                     int>,
     ops::MaxPoolWithIndexGradKernel<paddle::platform::CPUDeviceContext, double,
-                                    int>)
+                                    int>);
 REGISTER_OPERATOR(max_pool3d_with_index, ops::MaxPoolWithIndexOp,
                   ops::MaxPool3dWithIndexOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(max_pool3d_with_index_grad, ops::MaxPoolWithIndexOpGrad)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(max_pool3d_with_index_grad, ops::MaxPoolWithIndexOpGrad);
 REGISTER_OP_CPU_KERNEL(
     max_pool3d_with_index,
@@ -290,4 +290,4 @@ REGISTER_OP_CPU_KERNEL(
     ops::MaxPoolWithIndexGradKernel<paddle::platform::CPUDeviceContext, float,
                                     int>,
     ops::MaxPoolWithIndexGradKernel<paddle::platform::CPUDeviceContext, double,
-                                    int>)
+                                    int>);
@@ -27,7 +27,7 @@ REGISTER_OP_CUDA_KERNEL(
     ops::MaxPoolWithIndexGradKernel<paddle::platform::CUDADeviceContext, float,
                                     int>,
     ops::MaxPoolWithIndexGradKernel<paddle::platform::CUDADeviceContext, double,
-                                    int>)
+                                    int>);
 REGISTER_OP_CUDA_KERNEL(
     max_pool3d_with_index,
@@ -40,4 +40,4 @@ REGISTER_OP_CUDA_KERNEL(
     ops::MaxPoolWithIndexGradKernel<paddle::platform::CUDADeviceContext, float,
                                     int>,
     ops::MaxPoolWithIndexGradKernel<paddle::platform::CUDADeviceContext, double,
-                                    int>)
+                                    int>);
@@ -84,8 +84,8 @@ class PReluGradOp : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(prelu, ops::PReluOp, ops::PReluOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(prelu_grad, ops::PReluGradOp)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(prelu_grad, ops::PReluGradOp);
 REGISTER_OP_CPU_KERNEL(
     prelu, ops::PReluKernel<paddle::platform::CPUDeviceContext, float>);
 REGISTER_OP_CPU_KERNEL(
...
@@ -122,8 +122,8 @@ class RankLossGradOp : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(rank_loss, ops::RankLossOp, ops::RankLossOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(rank_loss_grad, ops::RankLossGradOp)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(rank_loss_grad, ops::RankLossGradOp);
 REGISTER_OP_CPU_KERNEL(
     rank_loss, ops::RankLossKernel<paddle::platform::CPUDeviceContext, float>);
 REGISTER_OP_CPU_KERNEL(
...
@@ -191,24 +191,24 @@ class ReduceProdOpMaker : public ReduceOpMaker {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(reduce_sum, ops::ReduceOp, ops::ReduceSumOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(reduce_sum_grad, ops::ReduceGradOp)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(reduce_sum_grad, ops::ReduceGradOp);
 REGISTER_OPERATOR(reduce_mean, ops::ReduceOp, ops::ReduceMeanOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(reduce_mean_grad, ops::ReduceGradOp)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(reduce_mean_grad, ops::ReduceGradOp);
 REGISTER_OPERATOR(reduce_max, ops::ReduceOp, ops::ReduceMaxOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(reduce_max_grad, ops::ReduceGradOp)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(reduce_max_grad, ops::ReduceGradOp);
 REGISTER_OPERATOR(reduce_min, ops::ReduceOp, ops::ReduceMinOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(reduce_min_grad, ops::ReduceGradOp)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(reduce_min_grad, ops::ReduceGradOp);
 REGISTER_OPERATOR(reduce_prod, ops::ReduceOp, ops::ReduceProdOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(reduce_prod_grad, ops::ReduceGradOp)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(reduce_prod_grad, ops::ReduceGradOp);
 #define REGISTER_REDUCE_CPU_KERNEL(reduce_type, functor, grad_functor) \
   REGISTER_OP_CPU_KERNEL(reduce_type,                                  \
...
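`REGISTER_REDUCE_CPU_KERNEL`, whose definition begins at the end of this hunk, is the usual stamp-out-a-registration-per-pair macro: each reduce variant then costs one line naming its forward and gradient functors. A trimmed, self-contained sketch of that shape (illustrative names only, not the real macro):

    #include <cstdio>

    // One macro expands to the whole registration for an (op, functor) pair.
    #define REGISTER_REDUCE_DEMO(op_name, functor) \
      static const char* reg_##op_name = "registered " #op_name " / " #functor;

    REGISTER_REDUCE_DEMO(reduce_sum, SumFunctor)
    REGISTER_REDUCE_DEMO(reduce_mean, MeanFunctor)

    int main() {
      std::puts(reg_reduce_sum);
      std::puts(reg_reduce_mean);
      return 0;
    }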
@@ -35,77 +35,77 @@ using EigenVector = framework::EigenVector<T, MajorType, IndexType>;
 struct SumFunctor {
   template <typename DeviceContext, typename X, typename Y, typename Dim>
-  void operator()(const DeviceContext& place, X& x, Y& y, const Dim& dim) {
-    y.device(place) = x.sum(dim);
+  void operator()(const DeviceContext& place, X* x, Y* y, const Dim& dim) {
+    y->device(place) = x->sum(dim);
   }
 };
 struct SumGradFunctor {
   template <typename DeviceContext, typename X, typename Y, typename DX,
             typename DY, typename Dim>
-  void operator()(const DeviceContext& place, X& x, Y& y, DX& dx, DY& dy,
+  void operator()(const DeviceContext& place, X* x, Y* y, DX* dx, DY* dy,
                   const Dim& dim, int size) {
-    dx.device(place) = dy.broadcast(dim);
+    dx->device(place) = dy->broadcast(dim);
   }
 };
 struct MeanFunctor {
   template <typename DeviceContext, typename X, typename Y, typename Dim>
-  void operator()(const DeviceContext& place, X& x, Y& y, const Dim& dim) {
-    y.device(place) = x.mean(dim);
+  void operator()(const DeviceContext& place, X* x, Y* y, const Dim& dim) {
+    y->device(place) = x->mean(dim);
   }
 };
 struct MeanGradFunctor {
   template <typename DeviceContext, typename X, typename Y, typename DX,
             typename DY, typename Dim>
-  void operator()(const DeviceContext& place, X& x, Y& y, DX& dx, DY& dy,
+  void operator()(const DeviceContext& place, X* x, Y* y, DX* dx, DY* dy,
                   const Dim& dim, int size) {
-    dx.device(place) = dy.broadcast(dim) / dx.constant(size);
+    dx->device(place) = dy->broadcast(dim) / dx->constant(size);
   }
 };
 struct MaxFunctor {
   template <typename DeviceContext, typename X, typename Y, typename Dim>
-  void operator()(const DeviceContext& place, X& x, Y& y, const Dim& dim) {
-    y.device(place) = x.maximum(dim);
+  void operator()(const DeviceContext& place, X* x, Y* y, const Dim& dim) {
+    y->device(place) = x->maximum(dim);
   }
 };
 struct MinFunctor {
   template <typename DeviceContext, typename X, typename Y, typename Dim>
-  void operator()(const DeviceContext& place, X& x, Y& y, const Dim& dim) {
-    y.device(place) = x.minimum(dim);
+  void operator()(const DeviceContext& place, X* x, Y* y, const Dim& dim) {
+    y->device(place) = x->minimum(dim);
   }
 };
 struct MaxOrMinGradFunctor {
   template <typename DeviceContext, typename X, typename Y, typename DX,
             typename DY, typename Dim>
-  void operator()(const DeviceContext& place, X& x, Y& y, DX& dx, DY& dy,
+  void operator()(const DeviceContext& place, X* x, Y* y, DX* dx, DY* dy,
                   const Dim& dim, int size) {
-    auto equals = x == y.broadcast(dim);
-    auto ones = dx.constant(1);
-    auto zeros = dx.constant(0);
+    auto equals = (*x) == y->broadcast(dim);
+    auto ones = dx->constant(1);
+    auto zeros = dx->constant(0);
     // If there are multiple minimum or maximum elements, the subgradient of
     // each is the set [0, 1], and we pass gradient to all of them here.
-    dx.device(place) = dy.broadcast(dim) * equals.select(ones, zeros);
+    dx->device(place) = dy->broadcast(dim) * equals.select(ones, zeros);
   }
 };
 struct ProdFunctor {
   template <typename DeviceContext, typename X, typename Y, typename Dim>
-  void operator()(const DeviceContext& place, X& x, Y& y, const Dim& dim) {
-    y.device(place) = x.prod(dim);
+  void operator()(const DeviceContext& place, X* x, Y* y, const Dim& dim) {
+    y->device(place) = x->prod(dim);
   }
 };
 struct ProdGradFunctor {
   template <typename DeviceContext, typename X, typename Y, typename DX,
             typename DY, typename Dim>
-  void operator()(const DeviceContext& place, X& x, Y& y, DX& dx, DY& dy,
+  void operator()(const DeviceContext& place, X* x, Y* y, DX* dx, DY* dy,
                   const Dim& dim, int size) {
-    dx.device(place) = dy.broadcast(dim) * y.broadcast(dim) * x.inverse();
+    dx->device(place) = dy->broadcast(dim) * y->broadcast(dim) * x->inverse();
   }
 };
@@ -125,7 +125,7 @@ class ReduceKernel : public framework::OpKernel<T> {
           *context.template device_context<DeviceContext>().eigen_device();
       auto reduce_dim = Eigen::array<int, 1>({{0}});
       Functor functor;
-      functor(place, x, out, reduce_dim);
+      functor(place, &x, &out, reduce_dim);
     } else {
       int rank = context.Input<Tensor>("X")->dims().size();
       switch (rank) {
@@ -178,10 +178,10 @@ class ReduceKernel : public framework::OpKernel<T> {
     if (D == 1) {
       auto out = EigenScalar<T>::From(*output);
-      functor(place, x, out, reduce_dim);
+      functor(place, &x, &out, reduce_dim);
     } else {
       auto out = EigenTensor<T, (D - 1)>::From(*output, dims);
-      functor(place, x, out, reduce_dim);
+      functor(place, &x, &out, reduce_dim);
     }
   }
 };
@@ -206,7 +206,7 @@ class ReduceGradKernel : public framework::OpKernel<T> {
       auto broadcast_dim =
           Eigen::array<int, 1>({{static_cast<int>(input0->numel())}});
       Functor functor;
-      functor(place, x, x_reduce, x_grad, x_reduce_grad, broadcast_dim,
+      functor(place, &x, &x_reduce, &x_grad, &x_reduce_grad, broadcast_dim,
               broadcast_dim[0]);
     } else {
       int rank = context.Input<Tensor>("X")->dims().size();
@@ -258,7 +258,7 @@ class ReduceGradKernel : public framework::OpKernel<T> {
     auto& place =
         *context.template device_context<DeviceContext>().eigen_device();
     Functor functor;
-    functor(place, x, x_reduce, x_grad, x_reduce_grad, broadcast_dim,
+    functor(place, &x, &x_reduce, &x_grad, &x_reduce_grad, broadcast_dim,
             broadcast_dim[dim]);
   }
 };
...
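The functor rewrite above is the same runtime/references cleanup applied to the Eigen-expression arguments: every in/out expression now arrives by pointer, so the kernels call `functor(place, &x, &out, reduce_dim)` and the functor body dereferences. A reduced, Paddle-free sketch of that calling shape (std containers standing in for Eigen expressions):

    #include <iostream>
    #include <numeric>
    #include <vector>

    // Reduction functor whose arguments are pointers, mirroring the new
    // SumFunctor signature; call sites pass addresses explicitly.
    struct SumFunctor {
      template <typename X, typename Y>
      void operator()(const X* x, Y* y) const {
        *y = std::accumulate(x->begin(), x->end(), Y{});
      }
    };

    int main() {
      std::vector<double> x = {1.0, 2.0, 3.5};
      double out = 0.0;
      SumFunctor functor;
      functor(&x, &out);  // addresses at the call site, as in ReduceKernel
      std::cout << "sum = " << out << '\n';
      return 0;
    }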
@@ -114,8 +114,8 @@ namespace ops = paddle::operators;
 using CPU = paddle::platform::CPUDeviceContext;
 REGISTER_OPERATOR(reshape, ops::ReshapeOp, ops::ReshapeOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(reshape_grad, ops::ReshapeGradOp)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(reshape_grad, ops::ReshapeGradOp);
 REGISTER_OP_CPU_KERNEL(reshape, ops::ReshapeKernel<CPU, float>,
                        ops::ReshapeKernel<CPU, double>,
                        ops::ReshapeKernel<CPU, int>,
...
@@ -154,8 +154,8 @@ https://stackoverflow.com/questions/43430056/what-is-roi-layer-in-fast-rcnn
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(roi_pool, ops::ROIPoolOp, ops::ROIPoolOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(roi_pool_grad, ops::ROIPoolGradOp)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(roi_pool_grad, ops::ROIPoolGradOp);
 REGISTER_OP_CPU_KERNEL(
     roi_pool,
     ops::CPUROIPoolOpKernel<paddle::platform::CPUDeviceContext, float>,
...
@@ -251,8 +251,8 @@ class RowConvGradKernel<platform::CPUDeviceContext, T>
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(row_conv, ops::RowConvOp, ops::RowConvOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
-REGISTER_OPERATOR(row_conv_grad, ops::RowConvGradOp)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
+REGISTER_OPERATOR(row_conv_grad, ops::RowConvGradOp);
 REGISTER_OP_CPU_KERNEL(
     row_conv, ops::RowConvKernel<paddle::platform::CPUDeviceContext, float>);
 REGISTER_OP_CPU_KERNEL(
...
@@ -23,17 +23,17 @@ USE_NO_KERNEL_OP(load_combine);
 int* CreateForSaveCombineOp(int x, int y, const std::vector<int>& lod_info,
                             std::string var_name,
-                            paddle::platform::CPUPlace& place,
-                            paddle::framework::Scope& scope,
-                            paddle::framework::LoD& expect_lod) {
-  auto var = scope.Var(var_name);
+                            const paddle::platform::CPUPlace& place,
+                            paddle::framework::Scope* scope,
+                            paddle::framework::LoD* expect_lod) {
+  auto var = scope->Var(var_name);
   auto tensor = var->GetMutable<paddle::framework::LoDTensor>();
   tensor->Resize({x, y});
-  expect_lod.resize(1);
+  expect_lod->resize(1);
   for (size_t i = 0; i < lod_info.size(); i++) {
-    expect_lod[0].push_back(lod_info[i]);
+    (*expect_lod)[0].push_back(lod_info[i]);
   }
-  tensor->set_lod(expect_lod);
+  tensor->set_lod(*expect_lod);
   int* expect = tensor->mutable_data<int>(place);
   for (int64_t i = 0; i < tensor->numel(); ++i) {
     expect[i] = static_cast<int>(i);
@@ -42,17 +42,17 @@ int* CreateForSaveCombineOp(int x, int y, const std::vector<int>& lod_info,
 }
 paddle::framework::LoDTensor* GeneratePlaceholderBeforeLoad(
-    const std::string out_var_name, paddle::framework::Scope& scope) {
-  auto load_var = scope.Var(out_var_name);
+    const std::string out_var_name, paddle::framework::Scope* scope) {
+  auto load_var = scope->Var(out_var_name);
   auto target = load_var->GetMutable<paddle::framework::LoDTensor>();
   return target;
 }
 int* GetValuesAfterLoadCombineOp(paddle::framework::LoDTensor* target,
-                                 paddle::framework::Scope& scope,
-                                 paddle::framework::LoD& actual_lod) {
+                                 const paddle::framework::Scope& scope,
+                                 paddle::framework::LoD* actual_lod) {
   int* actual = target->data<int>();
-  actual_lod = target->lod();
+  *actual_lod = target->lod();
   return actual;
 }
@@ -78,26 +78,26 @@ TEST(SaveLoadCombineOp, CPU) {
   std::vector<int> lod1 = {0, 1, 2, 3, 10};
   int numel1 = 100;
   paddle::framework::LoD expect_lod1;
-  int* expect1 = CreateForSaveCombineOp(10, 10, lod1, "test_var1", place, scope,
-                                        expect_lod1);
+  int* expect1 = CreateForSaveCombineOp(10, 10, lod1, "test_var1", place,
+                                        &scope, &expect_lod1);
   std::vector<int> lod2 = {0, 2, 5, 10};
   int numel2 = 200;
   paddle::framework::LoD expect_lod2;
-  int* expect2 = CreateForSaveCombineOp(10, 20, lod2, "test_var2", place, scope,
-                                        expect_lod2);
+  int* expect2 = CreateForSaveCombineOp(10, 20, lod2, "test_var2", place,
+                                        &scope, &expect_lod2);
   std::vector<int> lod3 = {0, 2, 3, 20};
   int numel3 = 4000;
   paddle::framework::LoD expect_lod3;
   int* expect3 = CreateForSaveCombineOp(20, 200, lod3, "test_var3", place,
-                                        scope, expect_lod3);
+                                        &scope, &expect_lod3);
   std::vector<int> lod4 = {0, 1, 20};
   int numel4 = 1000;
   paddle::framework::LoD expect_lod4;
-  int* expect4 = CreateForSaveCombineOp(20, 50, lod4, "test_var4", place, scope,
-                                        expect_lod4);
+  int* expect4 = CreateForSaveCombineOp(20, 50, lod4, "test_var4", place,
+                                        &scope, &expect_lod4);
   // Set attributes
   std::string filename = "check_tensor.ls";
@@ -111,10 +111,10 @@ TEST(SaveLoadCombineOp, CPU) {
   save_combine_op->Run(scope, place);
   // Set up output vars
-  auto target1 = GeneratePlaceholderBeforeLoad("out_var1", scope);
-  auto target2 = GeneratePlaceholderBeforeLoad("out_var2", scope);
-  auto target3 = GeneratePlaceholderBeforeLoad("out_var3", scope);
-  auto target4 = GeneratePlaceholderBeforeLoad("out_var4", scope);
+  auto target1 = GeneratePlaceholderBeforeLoad("out_var1", &scope);
+  auto target2 = GeneratePlaceholderBeforeLoad("out_var2", &scope);
+  auto target3 = GeneratePlaceholderBeforeLoad("out_var3", &scope);
+  auto target4 = GeneratePlaceholderBeforeLoad("out_var4", &scope);
   // Run the load_combine_op
   auto load_combine_op = paddle::framework::OpRegistry::CreateOp(
@@ -123,10 +123,10 @@ TEST(SaveLoadCombineOp, CPU) {
   load_combine_op->Run(scope, place);
   paddle::framework::LoD actual_lod1, actual_lod2, actual_lod3, actual_lod4;
-  int* actual1 = GetValuesAfterLoadCombineOp(target1, scope, actual_lod1);
-  int* actual2 = GetValuesAfterLoadCombineOp(target2, scope, actual_lod2);
-  int* actual3 = GetValuesAfterLoadCombineOp(target3, scope, actual_lod3);
-  int* actual4 = GetValuesAfterLoadCombineOp(target4, scope, actual_lod4);
+  int* actual1 = GetValuesAfterLoadCombineOp(target1, scope, &actual_lod1);
+  int* actual2 = GetValuesAfterLoadCombineOp(target2, scope, &actual_lod2);
+  int* actual3 = GetValuesAfterLoadCombineOp(target3, scope, &actual_lod3);
+  int* actual4 = GetValuesAfterLoadCombineOp(target4, scope, &actual_lod4);
   CheckValues(expect1, actual1, expect_lod1, actual_lod1, numel1);
   CheckValues(expect2, actual2, expect_lod2, actual_lod2, numel2);
...
@@ -103,7 +103,7 @@ $$
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(scatter, ops::ScatterOp, ops::ScatterOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
-REGISTER_OPERATOR(scatter_grad, ops::ScatterGradOp)
+REGISTER_OPERATOR(scatter_grad, ops::ScatterGradOp);
 REGISTER_OP_CPU_KERNEL(scatter, ops::ScatterOpKernel<float>);
 REGISTER_OP_CPU_KERNEL(scatter_grad, ops::ScatterGradientOpKernel<float>);
@@ -127,7 +127,7 @@ namespace ops = paddle::operators;
 REGISTER_OPERATOR(sequence_concat, ops::SequenceConcatOp,
                   ops::SequenceConcatOpMaker,
                   paddle::framework::DefaultGradOpDescMaker<
-                      false> /* set false to disable empty grad */)
+                      false> /* set false to disable empty grad */);
 REGISTER_OPERATOR(sequence_concat_grad, ops::SequenceConcatGradOp);
 REGISTER_OP_CPU_KERNEL(
     sequence_concat,
...
@@ -177,8 +177,8 @@ context_length, context_stride and context_start.
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(sequence_conv, ops::SequenceConvOp, ops::SequenceConvOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
-REGISTER_OPERATOR(sequence_conv_grad, ops::SequenceConvGradOp)
+REGISTER_OPERATOR(sequence_conv_grad, ops::SequenceConvGradOp);
 REGISTER_OP_CPU_KERNEL(
     sequence_conv,
...
@@ -202,8 +202,8 @@ class SequenceExpandOpGrad : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(sequence_expand, ops::SequenceExpandOp,
                   ops::SequenceExpandOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
-REGISTER_OPERATOR(sequence_expand_grad, ops::SequenceExpandOpGrad)
+REGISTER_OPERATOR(sequence_expand_grad, ops::SequenceExpandOpGrad);
 REGISTER_OP_CPU_KERNEL(
     sequence_expand,
     ops::SequenceExpandKernel<paddle::platform::CPUDeviceContext, float>,
...
@@ -122,8 +122,8 @@ NOTE: The first dimension size of input, the size of offset and Length, should b
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(sequence_slice, ops::SequenceSliceOp,
                   ops::SequenceSliceOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
-REGISTER_OPERATOR(sequence_slice_grad, ops::SequenceSliceGradOp)
+REGISTER_OPERATOR(sequence_slice_grad, ops::SequenceSliceGradOp);
 REGISTER_OP_CPU_KERNEL(
     sequence_slice,
     ops::SequenceSliceOpKernel<paddle::platform::CPUDeviceContext, float>);
...
File mode changed from 100755 to 100644
@@ -99,7 +99,7 @@ class SequenceSoftmaxGradCUDNNKernel : public framework::OpKernel<T> {
 namespace ops = paddle::operators;
 REGISTER_OP_KERNEL(sequence_softmax, CUDNN, ::paddle::platform::CUDAPlace,
                    ops::SequenceSoftmaxCUDNNKernel<float>,
-                   ops::SequenceSoftmaxCUDNNKernel<double>)
+                   ops::SequenceSoftmaxCUDNNKernel<double>);
 REGISTER_OP_KERNEL(sequence_softmax_grad, CUDNN, ::paddle::platform::CUDAPlace,
                    ops::SequenceSoftmaxGradCUDNNKernel<float>,
-                   ops::SequenceSoftmaxGradCUDNNKernel<double>)
+                   ops::SequenceSoftmaxGradCUDNNKernel<double>);
@@ -157,8 +157,8 @@ class SequenceSoftmaxGradOp : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(sequence_softmax, ops::SequenceSoftmaxOp,
                   ops::SequenceSoftmaxOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
-REGISTER_OPERATOR(sequence_softmax_grad, ops::SequenceSoftmaxGradOp)
+REGISTER_OPERATOR(sequence_softmax_grad, ops::SequenceSoftmaxGradOp);
 REGISTER_OP_CPU_KERNEL(
     sequence_softmax,
     ops::SequenceSoftmaxKernel<paddle::platform::CPUDeviceContext, float>,
...
@@ -18,7 +18,7 @@ namespace ops = paddle::operators;
 REGISTER_OP_CUDA_KERNEL(
     sequence_softmax,
     ops::SequenceSoftmaxKernel<paddle::platform::CUDADeviceContext, float>,
-    ops::SequenceSoftmaxKernel<paddle::platform::CUDADeviceContext, double>)
+    ops::SequenceSoftmaxKernel<paddle::platform::CUDADeviceContext, double>);
 REGISTER_OP_CUDA_KERNEL(
     sequence_softmax_grad,
     ops::SequenceSoftmaxGradKernel<paddle::platform::CUDADeviceContext, float>,
...
@@ -138,9 +138,9 @@ namespace ops = paddle::operators;
 REGISTER_OPERATOR(sigmoid_cross_entropy_with_logits,
                   ops::SigmoidCrossEntropyWithLogitsOp,
                   ops::SigmoidCrossEntropyWithLogitsOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
 REGISTER_OPERATOR(sigmoid_cross_entropy_with_logits_grad,
-                  ops::SigmoidCrossEntropyWithLogitsGradOp)
+                  ops::SigmoidCrossEntropyWithLogitsGradOp);
 REGISTER_OP_CPU_KERNEL(sigmoid_cross_entropy_with_logits,
                        ops::SigmoidCrossEntropyWithLogitsKernel<
                            paddle::platform::CPUDeviceContext, float>);
...
@@ -133,8 +133,8 @@ class SmoothL1LossGradOp : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(smooth_l1_loss, ops::SmoothL1LossOp, ops::SmoothL1LossOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
-REGISTER_OPERATOR(smooth_l1_loss_grad, ops::SmoothL1LossGradOp)
+REGISTER_OPERATOR(smooth_l1_loss_grad, ops::SmoothL1LossGradOp);
 REGISTER_OP_CPU_KERNEL(
     smooth_l1_loss,
     ops::SmoothL1LossKernel<paddle::platform::CPUDeviceContext, float>);
...
@@ -161,8 +161,8 @@ class SoftmaxOpGrad : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(softmax, ops::SoftmaxOp, ops::SoftmaxOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
-REGISTER_OPERATOR(softmax_grad, ops::SoftmaxOpGrad)
+REGISTER_OPERATOR(softmax_grad, ops::SoftmaxOpGrad);
 REGISTER_OP_CPU_KERNEL(
     softmax, ops::SoftmaxKernel<paddle::platform::CPUDeviceContext, float>);
 REGISTER_OP_CPU_KERNEL(
...
@@ -93,8 +93,8 @@ class SppOpGrad : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(spp, ops::SppOp, ops::SppOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
-REGISTER_OPERATOR(spp_grad, ops::SppOpGrad)
+REGISTER_OPERATOR(spp_grad, ops::SppOpGrad);
 REGISTER_OP_CPU_KERNEL(
     spp, ops::SppKernel<paddle::platform::CPUDeviceContext, float>,
     ops::SppKernel<paddle::platform::CPUDeviceContext, double>);
...
@@ -111,8 +111,8 @@ class SquaredL2DistanceGradOp : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(squared_l2_distance, ops::SquaredL2DistanceOp,
                   ops::SquaredL2DistanceOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
-REGISTER_OPERATOR(squared_l2_distance_grad, ops::SquaredL2DistanceGradOp)
+REGISTER_OPERATOR(squared_l2_distance_grad, ops::SquaredL2DistanceGradOp);
 REGISTER_OP_CPU_KERNEL(
     squared_l2_distance,
     ops::SquaredL2DistanceKernel<paddle::platform::CPUDeviceContext, float>);
...
@@ -69,8 +69,8 @@ $$Out = \sum_{i} X_{i}^2$$
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(squared_l2_norm, ops::SquaredL2NormOp,
                   ops::SquaredL2NormOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
-REGISTER_OPERATOR(squared_l2_norm_grad, ops::SquaredL2NormGradOp)
+REGISTER_OPERATOR(squared_l2_norm_grad, ops::SquaredL2NormGradOp);
 REGISTER_OP_CPU_KERNEL(
     squared_l2_norm,
     ops::SquaredL2NormKernel<paddle::platform::CPUDeviceContext, float>);
...
@@ -119,8 +119,8 @@ class TransposeOpGrad : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(transpose, ops::TransposeOp, ops::TransposeOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
-REGISTER_OPERATOR(transpose_grad, ops::TransposeOpGrad)
+REGISTER_OPERATOR(transpose_grad, ops::TransposeOpGrad);
 REGISTER_OP_CPU_KERNEL(
     transpose, ops::TransposeKernel<paddle::platform::CPUDeviceContext, float>);
 REGISTER_OP_CPU_KERNEL(
...
@@ -133,8 +133,8 @@ class UnpoolOpGrad : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(unpool, ops::UnpoolOp, ops::Unpool2dOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
-REGISTER_OPERATOR(unpool_grad, ops::UnpoolOpGrad)
+REGISTER_OPERATOR(unpool_grad, ops::UnpoolOpGrad);
 REGISTER_OP_CPU_KERNEL(
     unpool, ops::UnpoolKernel<paddle::platform::CPUDeviceContext, float>,
     ops::UnpoolKernel<paddle::platform::CPUDeviceContext, double>);
...
@@ -133,8 +133,8 @@ class WarpCTCGradOp : public framework::OperatorWithKernel {
 namespace ops = paddle::operators;
 REGISTER_OPERATOR(warpctc, ops::WarpCTCOp, ops::WarpCTCOpMaker,
-                  paddle::framework::DefaultGradOpDescMaker<true>)
+                  paddle::framework::DefaultGradOpDescMaker<true>);
-REGISTER_OPERATOR(warpctc_grad, ops::WarpCTCGradOp)
+REGISTER_OPERATOR(warpctc_grad, ops::WarpCTCGradOp);
 REGISTER_OP_CPU_KERNEL(
     warpctc, ops::WarpCTCKernel<paddle::platform::CPUDeviceContext, float>);
 REGISTER_OP_CPU_KERNEL(
...
@@ -127,6 +127,8 @@ void BindProgramDesc(pybind11::module *m) {
       .def("block", &pd::ProgramDesc::MutableBlock,
            pybind11::return_value_policy::reference)
       .def("num_blocks", &pd::ProgramDesc::Size)
+      .def("get_feed_target_names", &pd::ProgramDesc::GetFeedTargetNames)
+      .def("get_fetch_target_names", &pd::ProgramDesc::GetFetchTargetNames)
       .def("serialize_to_string", SerializeMessage<pd::ProgramDesc>)
       .def("parse_from_string",
            [](pd::ProgramDesc &program_desc, const std::string &data) {
@@ -299,6 +301,7 @@ void BindOpDesc(pybind11::module *m) {
       .def("check_attrs", &pd::OpDesc::CheckAttrs)
       .def("infer_shape", &pd::OpDesc::InferShape)
       .def("infer_var_type", &pd::OpDesc::InferVarType)
+      .def("set_is_target", &pd::OpDesc::SetIsTarget)
       .def("serialize_to_string", SerializeMessage<pd::OpDesc>)
       .def("block", &pd::OpDesc::Block,
            pybind11::return_value_policy::reference);
...
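The `.def(...)` additions above expose C++ accessors to Python, which is what lets `load_inference_model` further down call `program.desc.get_feed_target_names()` directly. A self-contained pybind11 sketch of the same pattern; `ToyProgramDesc` and the `toy_core` module are illustrative stand-ins, not Paddle's real classes:

```cpp
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>  // converts std::vector<std::string> to a Python list
#include <string>
#include <vector>

// Illustrative only: a stand-in for ProgramDesc with one query method.
struct ToyProgramDesc {
  std::vector<std::string> GetFeedTargetNames() const {
    return {"image", "label"};
  }
};

PYBIND11_MODULE(toy_core, m) {
  pybind11::class_<ToyProgramDesc>(m, "ToyProgramDesc")
      .def(pybind11::init<>())
      // Same shape as the bindings added above: snake_case Python name,
      // pointer-to-member-function on the C++ side.
      .def("get_feed_target_names", &ToyProgramDesc::GetFeedTargetNames);
}
```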
@@ -294,7 +294,7 @@ All parameter, weight, gradient are variables in Paddle.
            const std::vector<std::array<size_t, 2>> &targets) {
           ProgramDesc prog_with_targets(origin);
           for (const auto &t : targets) {
-            prog_with_targets.MutableBlock(t[0])->Op(t[1])->MarkAsTarget();
+            prog_with_targets.MutableBlock(t[0])->Op(t[1])->SetIsTarget(true);
           }
           proto::ProgramDesc pruned_desc;
           Prune(*prog_with_targets.Proto(), &pruned_desc);
...
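Replacing `MarkAsTarget()` with `SetIsTarget(true)` turns a one-way marker into a settable flag; the Python side relies on this when it clears stale flags with `op.desc.set_is_target(False)` before re-pruning. A sketch of the API shape, using a hypothetical class name:

```cpp
// Illustrative only: the flag can now be cleared as well as set,
// which a one-way MarkAsTarget() could not express.
class ToyOpDesc {
 public:
  void SetIsTarget(bool is_target) { is_target_ = is_target; }
  bool IsTarget() const { return is_target_; }

 private:
  bool is_target_ = false;
};
```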
@@ -1070,6 +1070,12 @@ class Program(object):
         for t in targets:
             if not isinstance(t, Operator):
                 if isinstance(t, Variable):
+                    if t.op is None:
+                        global_block = self.global_block()
+                        for op in global_block.ops:
+                            if t.name in op.output_arg_names:
+                                t.op = op
+                                break
                     t = t.op
                 else:
                     raise ValueError(("All targets of prune() can only be "
...
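Note on the hunk above: `Program.prune()` previously assumed every target Variable already carried a reference to its producing operator. The added lines fall back to scanning the global block's ops for one whose `output_arg_names` include the variable's name, attaching it before pruning proceeds.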
@@ -340,6 +340,13 @@ def save_inference_model(dirname,
     if not os.path.isdir(dirname):
         os.makedirs(dirname)

+    # Clear the is_target information and remove the existed feed and fetch op
+    global_block = main_program.global_block()
+    for i, op in enumerate(global_block.ops):
+        op.desc.set_is_target(False)
+        if op.type == "feed" or op.type == "fetch":
+            global_block.remove_op(i)
+
     pruned_program = main_program.prune(targets=target_vars)
     inference_program = pruned_program.inference_optimize()
     fetch_var_names = [v.name for v in target_vars]
@@ -362,24 +369,6 @@ def save_inference_model(dirname,
     save_persistables(executor, dirname, inference_program, params_filename)

-def get_feed_targets_names(program):
-    feed_targets_names = []
-    global_block = program.global_block()
-    for op in global_block.ops:
-        if op.desc.type() == 'feed':
-            feed_targets_names.insert(0, op.desc.output('Out')[0])
-    return feed_targets_names
-
-def get_fetch_targets_names(program):
-    fetch_targets_names = []
-    global_block = program.global_block()
-    for op in global_block.ops:
-        if op.desc.type() == 'fetch':
-            fetch_targets_names.append(op.desc.input('X')[0])
-    return fetch_targets_names
-
 def load_inference_model(dirname,
                          executor,
                          model_filename=None,
@@ -418,8 +407,8 @@ def load_inference_model(dirname,
     program = Program.parse_from_string(program_desc_str)
     load_persistables(executor, dirname, program, params_filename)

-    feed_target_names = get_feed_targets_names(program)
-    fetch_target_names = get_fetch_targets_names(program)
+    feed_target_names = program.desc.get_feed_target_names()
+    fetch_target_names = program.desc.get_fetch_target_names()
     fetch_targets = [
         program.global_block().var(name) for name in fetch_target_names
     ]
...
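The deleted Python helpers move into C++ as `ProgramDesc::GetFeedTargetNames` / `GetFetchTargetNames`. A hedged sketch of the equivalent logic; the real implementation lives in Paddle's framework code, and the `ToyOp` type here is illustrative only (ordering details, such as the original `insert(0, ...)` for feed names, are elided):

```cpp
#include <string>
#include <vector>

// Illustrative stand-in: a 'feed' op names its output, a 'fetch' op its input.
struct ToyOp {
  std::string type;
  std::string arg;
};

// Mirrors the removed get_feed_targets_names(): collect outputs of feed ops.
std::vector<std::string> GetFeedTargetNames(const std::vector<ToyOp>& block) {
  std::vector<std::string> names;
  for (const auto& op : block) {
    if (op.type == "feed") names.push_back(op.arg);
  }
  return names;
}

// Mirrors the removed get_fetch_targets_names(): collect inputs of fetch ops.
std::vector<std::string> GetFetchTargetNames(const std::vector<ToyOp>& block) {
  std::vector<std::string> names;
  for (const auto& op : block) {
    if (op.type == "fetch") names.push_back(op.arg);
  }
  return names;
}
```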
@@ -248,6 +248,10 @@ def infer(use_cuda, save_dirname=None):
         print("infer results: ", results[0])

+        fluid.io.save_inference_model(save_dirname, feed_target_names,
+                                      fetch_targets, exe,
+                                      inference_transpiler_program)
+
 def main(net_type, use_cuda, is_local=True):
     if use_cuda and not fluid.core.is_compiled_with_cuda():
...
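Note on the hunk above: the inference test now also saves the transpiler-optimized program (`inference_transpiler_program`) via `fluid.io.save_inference_model`, exercising the new target-flag clearing and feed/fetch handling end to end.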