Unverified commit 8dde7aea, authored by Nyakku Shigure, committed by GitHub

[CodeStyle] trim trailing whitespace in .h, .cc, .cu, etc. (#46006)

Parent bc77e6d5
@@ -511,9 +511,9 @@ class CustomOpMaker : public OpProtoAndCheckerMaker {
AddComment(R"DOC(
Custom Operator.
According to the Tensor operation function implemented by the user
independently of the framework, it is encapsulated into a framework
operator to adapt to various execution scenarios such as dynamic graph
mode, static graph mode, and inference mode.
)DOC");
...
@@ -600,9 +600,9 @@ void StatisticsEngine::Log(const std::string& filepath) {
for (size_t idx = 0; idx < statistics_.size(); ++idx) {
const auto& evt_stat = statistics_[idx];
ofs << platform::string_format(std::string(R"JSON(
{
"statistical item" : "%s",
"total time(ns)" : %llu,
"total number of times" : %llu,
"normalization time(ns)" : %llu
},)JSON"),
...
@@ -607,7 +607,7 @@ class LogitOpMaker : public framework::OpProtoAndCheckerMaker {
"(float, default 1e-6f) the epsilon for input clamp bound")
.SetDefault(1e-6f);
AddComment(R"DOC(
Logit Operator.
This function is defined as follows:
$ logit=ln\left ( {\frac {x} {1-x}} \right ) $
...
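For reference, this maps to the Python-level paddle.logit API; a minimal sketch, assuming a Paddle 2.x build where paddle.logit is available:

import paddle

x = paddle.to_tensor([0.1, 0.5, 0.9])
# logit(x) = ln(x / (1 - x)); eps first clamps x into [eps, 1 - eps]
y = paddle.logit(x, eps=1e-6)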
@@ -87,7 +87,7 @@ class AddPositionEncodingOpMaker : public framework::OpProtoAndCheckerMaker {
});
AddComment(R"DOC(
Add Position Encoding Operator.
The add position encoding calculates the output based on the input, alpha, beta.
The size of each dimension of the parameters is checked during infer-shape.
)DOC");
...
@@ -77,7 +77,7 @@ class AddMMOpMaker : public framework::OpProtoAndCheckerMaker {
AddComment(R"DOC(
AddMM Operator.
This operator is used to perform matrix multiplication for input $x$ and $y$ with coefficient $alpha$.
$input$ with coefficient $beta$ is added to the final result.
The equation is:
$$Out = alpha * x * y + beta * input$$
...
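A minimal sketch of the equation via the Python-level paddle.addmm API (assuming Paddle 2.x; alpha and beta follow the public signature):

import paddle

input = paddle.ones([2, 2])
x = paddle.ones([2, 3])
y = paddle.ones([3, 2])
# Out = alpha * x @ y + beta * input
out = paddle.addmm(input, x, y, alpha=1.0, beta=0.5)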
@@ -177,7 +177,7 @@ class AffineGridOpMaker : public framework::OpProtoAndCheckerMaker {
[x_14, x_15, x_16]]
[[x_21, x_22, x_23]
[x_24, x_25, x_26]]]
OutputShape = [2, 3, 5, 5]
Step 1:
@@ -185,12 +185,12 @@ class AffineGridOpMaker : public framework::OpProtoAndCheckerMaker {
Generate relative coordinates according to OutputShape.
The values of relative coordinates are in the interval between -1 and 1.
The shape of the relative coordinates is [2, H, W] as below:
C = [[[-1.  -1.  -1.  -1.  -1. ]
[-0.5 -0.5 -0.5 -0.5 -0.5]
[ 0.   0.   0.   0.   0. ]
[ 0.5  0.5  0.5  0.5  0.5]
[ 1.   1.   1.   1.   1. ]]
[[-1.  -0.5  0.   0.5  1. ]
[-1.  -0.5  0.   0.5  1. ]
[-1.  -0.5  0.   0.5  1. ]
@@ -198,7 +198,7 @@ class AffineGridOpMaker : public framework::OpProtoAndCheckerMaker {
[-1.  -0.5  0.   0.5  1. ]]]
C[0] is the coordinates in height axis and C[1] is the coordinates in
width axis.
Step 2:
Transpose and reshape C to shape [H * W, 2] and append ones to the last
dimension. Then we get:
...
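Step 1 can be reproduced with a short NumPy sketch (illustrative only; the kernel's internal layout may differ):

import numpy as np

H, W = 5, 5
h = np.linspace(-1, 1, H)   # height coordinates in [-1, 1]
w = np.linspace(-1, 1, W)   # width coordinates in [-1, 1]
C = np.stack(np.meshgrid(h, w, indexing="ij"))   # shape [2, H, W]
# C[0] varies along the height axis, C[1] along the width axis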
@@ -47,7 +47,7 @@ class AllcloseOpMaker : public framework::OpProtoAndCheckerMaker {
"compared as equal. Default: :math:`False` .")
.SetDefault(false);
AddComment(R"DOC(
This operator checks if all :math:`x` and :math:`y` satisfy the condition:
.. math::
@@ -110,7 +110,7 @@ REGISTER_OP_VERSION(allclose)
"The added input 'Atol' is not"
"dispensable."))
.AddCheckpoint(
R"ROC(Delete two float attributes [rtol] and [atol],
then add 2 string attributes [atol, rtol]. Don't be surprised.
This is because float cannot represent high-precision
floating-point values, and our framework doesn't support
...
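The elided condition is the usual elementwise closeness test, all(|x - y| <= atol + rtol * |y|); a sketch via the Python-level paddle.allclose API:

import paddle

x = paddle.to_tensor([1.0, 2.0])
y = paddle.to_tensor([1.0, 2.00001])
# all(|x - y| <= atol + rtol * |y|) -> single boolean tensor
close = paddle.allclose(x, y, rtol=1e-05, atol=1e-08)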
@@ -69,8 +69,8 @@ Check if input X contains all finite data, if yes, scale it by input Scale.
$$Out = X / scale$$
If any tensor in X contains Inf or NaN, Out will generate an indicator.
FoundInfinite will be 1 (True), and Out will not be scaled. In this case, the data of
Out should not be used, and its data may not be deterministic.
Otherwise, FoundInfinite will be 0 (False).
)DOC");
...
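A NumPy sketch of the semantics described above (the function name and return convention are hypothetical, not the kernel's API):

import numpy as np

def check_finite_and_unscale(xs, scale):
    # FoundInfinite is True if any tensor holds Inf or NaN
    found_infinite = any(not np.isfinite(x).all() for x in xs)
    if found_infinite:
        return xs, True    # outputs must not be used in this case
    return [x / scale for x in xs], False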
@@ -111,8 +111,8 @@ class UpdateLossScalingOpMaker : public framework::OpProtoAndCheckerMaker {
"Stop updating loss scaling, and just zero inputs.")
.SetDefault(false);
AddComment(R"DOC(
Update loss scaling according to the overall gradients. If all gradients are
finite after incr_every_n_steps, loss scaling will increase by incr_ratio.
Otherwise, loss scaling will decrease by decr_ratio after
decr_every_n_nan_or_inf steps in which some gradients are infinite.
...
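The described policy in pseudocode (a sketch of the update rule only, not the operator's exact state handling):

def update_loss_scaling(scale, found_inf, state,
                        incr_every_n_steps, decr_every_n_nan_or_inf,
                        incr_ratio, decr_ratio):
    # state tracks consecutive good/bad steps across calls
    if found_inf:
        state["good"] = 0
        state["bad"] += 1
        if state["bad"] == decr_every_n_nan_or_inf:
            scale *= decr_ratio   # too many steps hit Inf/NaN: shrink
            state["bad"] = 0
    else:
        state["bad"] = 0
        state["good"] += 1
        if state["good"] == incr_every_n_steps:
            scale *= incr_ratio   # enough clean steps: grow
            state["good"] = 0
    return scale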
@@ -58,9 +58,9 @@ class ArgsortOpMaker : public framework::OpProtoAndCheckerMaker {
AddComment(R"DOC(
Argsort operator
Performs sorting on the input tensor along the given axis and outputs two
tensors, Output(Out) and Output(Indices). They preserve the same shape
as Input(X); Output(Out) represents the sorted tensor while
Output(Indices) gives the sorted order along the given axis Attr(axis).
)DOC");
...
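At the Python level the Out/Indices pair corresponds to paddle.sort and paddle.argsort (a sketch assuming Paddle 2.x):

import paddle

x = paddle.to_tensor([[3.0, 1.0, 2.0]])
out = paddle.sort(x, axis=-1)      # sorted values ("Out")
idx = paddle.argsort(x, axis=-1)   # sorted order  ("Indices")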
@@ -223,10 +223,10 @@ class ArrayToLoDTensorOpProtoMaker : public framework::OpProtoAndCheckerMaker {
"'paddle/framework/lod_rank_table.h' for more details.");
AddOutput("Out", "(LoDTensor) The LoDTensor formed by input tensor array.");
AddComment(
R"DOC(This Op builds a big LoDTensor from a std::vector<LoDTensor>
and a LoDRankTable. It is supposed to be used in getting dynamic RNN's
outputs back to a normal LoDTensor. The std::vector<LoDTensor>
would be the output of the RNN Op and the LoDRankTable would be built
with the RNN's input.)DOC");
}
};
...
@@ -62,7 +62,7 @@ class AssignPosOpMaker : public framework::OpProtoAndCheckerMaker {
AddComment(R"DOC(
assign_pos_op Operator.
Assign pos decides which tokens should be fetched and assigns them to
the corresponding expert counter, in order.
)DOC");
...
@@ -297,7 +297,7 @@ tmp(seqlen*(M+D)) * fc((M+D)*1) => fcout(seqlen*1) with bias, relu
fcout(seqlen*1) * scalar => fcout(seqlen*1) with bias, relu
dotmul and sum pool ( fcout(seqlen*1), x(seqlen * M) ) => lstm_x_t(1, M)
LSTM part:
use lstm_x_t as input and compute as standard LSTM.
...
@@ -44,8 +44,8 @@ class BmmOpMaker : public framework::OpProtoAndCheckerMaker {
AddOutput("Out", "(Tensor), The output tensor of Bmm op.");
AddComment(R"DOC(
The Bmm operator is used to perform batched matrix multiplication
over the last two dimensions of the input tensors `X` and `Y`,
which are both 3-dimensional.
Examples:
- X: [B, M, K], Y: [B, K, N] => Out: [B, M, N]
...
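A shape-level sketch of the first example via the Python-level paddle.bmm API:

import paddle

x = paddle.rand([10, 3, 4])   # [B, M, K]
y = paddle.rand([10, 4, 5])   # [B, K, N]
out = paddle.bmm(x, y)        # [B, M, N] == [10, 3, 5]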
@@ -54,7 +54,7 @@ class BroadcastTensorsOpMaker : public framework::OpProtoAndCheckerMaker {
"consistent with :code:`x`.")
.AsDuplicable();
AddComment(
R"DOC(This OP is used to broadcast a vector of inputs
with Tensor or LoDTensor type, following broadcast semantics.)DOC");
}
};
...
@@ -80,10 +80,10 @@ class CenterLossOpMaker : public framework::OpProtoAndCheckerMaker {
AddAttr<bool>("need_update", "whether need to update center info.");
AddComment(R"DOC(
**CenterLoss operator**
Implementation of the center loss function from the paper <<A Discriminative
Feature Learning Approach for Deep Face Recognition>>. The equation in this
implementation is: loss = 1/2 * (x-y)^2, where x(X) means the deep feature
(output of the last hidden layer) and y(Label) means the target label.
)DOC");
}
};
...
@@ -52,9 +52,9 @@ class ChannelShuffleOpMaker : public framework::OpProtoAndCheckerMaker {
while keeping the original tensor shape.
Please refer to the paper:
`ShuffleNet: An Extremely Efficient Convolutional Neural Network for
Mobile Devices <https://arxiv.org/abs/1707.01083>`_
by Zhang et al. (2017) for more details.
)DOC");
}
...
@@ -145,7 +145,7 @@ For some basics of chunking, please refer to
ChunkEvalOp computes the precision, recall, and F1-score of chunk detection,
and supports IOB, IOE, IOBES and IO (also known as plain) tagging schemes.
Here is a NER example of labeling for these tagging schemes:
Li Ming works at Agricultural Bank of China in Beijing.
IO I-PER I-PER O O I-ORG I-ORG I-ORG I-ORG O I-LOC
IOB B-PER I-PER O O B-ORG I-ORG I-ORG I-ORG O B-LOC
@@ -158,13 +158,13 @@ and LOC(LOCATION), and we can see that the labels have the form <tag type>-<chun
Since the calculations actually use label ids rather than labels, extra attention
should be paid when mapping labels to ids to make ChunkEvalOp work. The key point
is that the listed equations are satisfied by ids.
tag_type = label % num_tag_type
chunk_type = label / num_tag_type
where `num_tag_type` is the number of tag types in the tagging scheme, `num_chunk_type`
is the number of chunk types, and `tag_type` gets its value from the following table.
Scheme Begin Inside End Single
plain  0     -      -   -
IOB    0     1      -   -
...
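The two id equations translate directly to Python (integer division for chunk_type, as the `/` above implies for ids):

num_tag_type = 2   # e.g. IOB scheme: tag 0 = Begin, tag 1 = Inside

def split_label(label):
    tag_type = label % num_tag_type     # position of the tag in the table
    chunk_type = label // num_tag_type  # integer division on label ids
    return tag_type, chunk_type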
@@ -94,7 +94,7 @@ CINN(https://github.com/PaddlePaddle/CINN/blob/develop/README.md) instruction ex
Both the input and output of this operator are a set of variables
which are the input and output arguments of the bound cinn instruction respectively.
In addition, an attribute named 'cached_index' must be set to get the
CinnCompiledObject that includes the instruction, and 'instruction_index' is
used to fetch the instruction object from the compiled runtime program.
It accomplishes the execution of the instruction according to the following steps:
...
@@ -75,8 +75,8 @@ class ClassCenterSampleOpMaker : public framework::OpProtoAndCheckerMaker {
The process of sampling subset class centers is straightforward: 1) First select the positive class centers;
2) Randomly sample negative class centers. Specifically, given a Label tensor, shape [batch_size], select all
the positive class centers and randomly sample negative class centers, then remap the input label tensor using
the sampled class centers. Note that if the number of the positive class centers is greater than the input
num_samples, it keeps all the positive class centers and the shape of SampledLocalClassCenter will be
[num_positive_class_centers]. The op supports CPU, single GPU and multi GPU.
For more information, see Partial FC: Training 10 Million Identities on a Single Machine
...
@@ -80,8 +80,8 @@ class GlobalScatterOpMaker : public framework::OpProtoAndCheckerMaker {
AddOutput("Out", "(Tensor) the result of global_scatter.");
AddComment(R"DOC(
Global Scatter Operator
Scatter data in X, which has been grouped so that data belonging to one
expert is contiguous, to n_expert * world_size experts according to local_count,
and receive tensors from n_expert * world_size experts according
to global_count.
)DOC");
...
@@ -41,8 +41,8 @@ class AsComplexOpMaker : public framework::OpProtoAndCheckerMaker {
As_complex Operator.
This operator is used to return a complex tensor represented
by an old-fashioned real tensor. The size of the last dimension of
the input tensor should be 2, which corresponds to 'real' and
'imaginary', respectively.
)DOC");
@@ -75,7 +75,7 @@ class AsRealOpMaker : public framework::OpProtoAndCheckerMaker {
AddComment(R"DOC(
AsReal Operator.
This operator is used to return an old-fashioned real tensor from a
complex tensor. The size of the last dimension of the output tensor is 2,
which corresponds to 'real' and 'imaginary', respectively.
...
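A round-trip sketch via the Python-level paddle.as_complex / paddle.as_real APIs (assuming Paddle 2.x):

import paddle

x = paddle.rand([3, 2])       # last dim holds (real, imaginary)
z = paddle.as_complex(x)      # shape [3], complex64
x2 = paddle.as_real(z)        # back to shape [3, 2]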
@@ -38,7 +38,7 @@ class CompareReduceOpProtoMaker : public framework::OpProtoAndCheckerMaker {
comment.equation));
AddComment(string::Sprintf(R"DOC(
It operates element-wise on X and Y, and returns Out. X and Y are
N-dim tensors, which could be any type. If all elements satisfy $%s$,
the Out tensor is [True], else [False]
)DOC",
comment.equation));
...
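For the equality case, this reduce-all comparison is exposed in Python as paddle.equal_all (an assumption that this kernel backs it; the $%s$ placeholder is filled per comparison):

import paddle

x = paddle.to_tensor([1, 2, 3])
y = paddle.to_tensor([1, 2, 3])
out = paddle.equal_all(x, y)   # single-element bool tensor: [True]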
@@ -73,7 +73,7 @@ b = opA(a)
y = opB(x)
If tensor b and tensor x have some inner dependency, for example, x shares data with b,
we need to add an explicit dependency for x <- b, otherwise these two operators may
be executed in parallel in static graph mode. We can use the depend op as below,
b = opA(a)
...
@@ -140,9 +140,9 @@ class CopyCrossScopeOpMaker : public framework::OpProtoAndCheckerMaker {
.SetDefault(false);
AddAttr<int>("num_micro_batches", "Number of micro batches for pipeline.");
AddComment(R"DOC(
This op is used by pipeline to copy tensors across micro batch scopes.
It copies the variable value of the given Id's micro scope to the micro scope at the Id + 1 position.
If it needs to copy back to the main scope, use the to_main_scope option to copy the variable value of
the current micro scope to the main scope.
)DOC");
}
...
@@ -58,9 +58,9 @@ class CRFDecodingOpMaker : public framework::OpProtoAndCheckerMaker {
.AsDispensable();
AddComment(R"DOC(
The crf_decoding operator reads the emission feature weights and the transition
feature weights learned by the linear_chain_crf operator and performs decoding.
It implements the Viterbi algorithm, which is a dynamic programming algorithm
for finding the most likely sequence of hidden states, called the Viterbi path,
that results in a sequence of observed tags.
The output of this operator changes according to whether Input(Label) is given:
@@ -68,15 +68,15 @@ The output of this operator changes according to whether Input(Label) is given:
1. Input(Label) is given:
This happens in training. This operator is used to co-work with the chunk_eval
operator.
When Input(Label) is given, the crf_decoding operator returns a tensor with the
same shape as Input(Label) whose values are fixed to be 0, indicating an
incorrect prediction, or 1 indicating a tag is correctly predicted. Such an
output is the input to the chunk_eval operator.
2. Input(Label) is not given:
This is the standard decoding process.
The crf_decoding operator returns a row vector with shape [N x 1]/[B x S]; here
the shape depends on whether the inputs are LoDTensors or common tensors. The values
range from 0 to maximum tag number - 1. Each element indicates an index of a
predicted tag.
...
@@ -102,14 +102,14 @@ Crop Operator.
Crop input into output, as specified by offsets and shape.
There are two ways to set the offsets:
1. At runtime: Using the input 'Offsets', which is a Variable and can be
the output of other operators. This way is suitable for
dynamic offsets.
2. In network configuration: Using the attribute 'offsets', which will be
set in the Python configuration script. This way is
suitable for fixed offsets.
You CANNOT use these two ways at the same time. An exception will be raised
if input 'Offset' is configured and meanwhile the attribute 'offsets' is
not empty.
There are two ways to set shape:
...
@@ -180,26 +180,26 @@ CropTensor Operator.
Crop input into output, as specified by offsets and shape.
There are three ways to set the offsets:
1. Input 'OffsetsTensor': It is a tensor list. It should be set as a list that
contains tensor variables in the Python configuration script.
This way is suitable for dynamic offsets.
2. Input 'Offsets': It is a variable and can be the output of other operators.
This way is suitable for dynamic offsets.
3. Attribute 'offsets': It will be set in the Python configuration script. This way
is suitable for fixed offsets.
You CANNOT use these three ways at the same time. An exception will be raised
if input 'OffsetsTensor' or 'Offset' is configured and meanwhile the attribute 'offsets' is
not empty.
There are three ways to set shape:
1. Input 'ShapeTensor': It is a tensor list. It should be set as a list that contains
tensor variables in the Python configuration script. This way is suitable
for dynamic shape.
2. Input 'Shape': It is a Variable and can be the output of other operators. This way is suitable
for dynamic shape.
3. Attribute 'shape': crop input X into the shape described by a list<int>. The size of the shape
list should be the same as the dimension size of input X. This way is
suitable for fixed shape.
The input should be a k-D tensor (k > 0 and k < 7). As an example:
...
@@ -250,10 +250,10 @@ class CrossEntropyOpMaker : public framework::OpProtoAndCheckerMaker {
AddComment(R"DOC(
CrossEntropy Operator.
The input 'X' and 'Label' will first be logically flattened to 2-D matrices.
The matrix's second dimension (row length) is the same as the original last
dimension, and the first dimension (column length) is the product of all other
original dimensions. The softmax computation will then take place on each row
of the flattened matrices.
It supports both standard cross-entropy and soft-label cross-entropy loss
@@ -385,10 +385,10 @@ class CrossEntropyOpMaker2 : public framework::OpProtoAndCheckerMaker {
AddComment(R"DOC(
Hard-label CrossEntropy Operator.
The input 'X' and 'Label' will first be logically flattened to 2-D matrices.
The matrix's second dimension (row length) is the same as the original last
dimension, and the first dimension (column length) is the product of all other
original dimensions. The softmax computation will then take place on each row
of the flattened matrices.
Only hard labels are supported.
...
@@ -93,12 +93,12 @@ Then:
Output.dims = {8, 1}
Output.LoD = [[0, 6, 8]]
or Given:
Input.data = [[0, 1, 2, 2, 0, 4],
              [0, 4, 5, 0, 6, 0],
              [0, 7, 7, 7, 0, 0]]
InputLength.data = [[6],
                    [5],
                    [4]],
Input.dims = {3, 6},
Input.Lod = []
And:
...
@@ -190,7 +190,7 @@ class CudnnLSTMOpMaker : public framework::OpProtoAndCheckerMaker {
CUDNN LSTM implementation
A four-gate Long Short-Term Memory network with no peephole connections.
In the forward pass the output ht and cell output ct for a given iteration can be computed from the recurrent input ht-1,
the cell input ct-1 and the previous layer input xt given matrices W, R and biases bW, bR from the following equations:
$$ i_t = sigmoid(W_{ix}x_{t} + W_{ih}h_{t-1} + bx_i + bh_i) $$
@@ -217,7 +217,7 @@ $$ h_t = o_t \\odot tanh(c_t) $$
- $\tilde{c_t}$ is also called the candidate hidden state,
which is computed based on the current input and the previous hidden state.
Where sigmoid is the sigmoid operator: sigmoid(x) = 1 / (1 + e^-x), * represents point-wise multiplication, and
X represents matrix multiplication.
...
@@ -35,7 +35,7 @@ class CumprodOpMaker : public framework::OpProtoAndCheckerMaker {
"(int), The dim along which the input tensors will be cumproded");
AddComment(
R"DOC(Cumprod operator. Return the cumprod results of the input elements along the dim.
For example, if input X is a tensor with rank 1 and N elements, the output will also be a tensor
with rank 1 and N elements, and elements y[i] = x[0] * x[1] * x[2] *...* x[i] (0<=i<N))DOC");
}
};
...
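A minimal sketch via the Python-level paddle.cumprod API (dim names the axis to accumulate along):

import paddle

x = paddle.to_tensor([1.0, 2.0, 3.0, 4.0])
out = paddle.cumprod(x, dim=0)   # [1., 2., 6., 24.]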
@@ -61,9 +61,9 @@ class DecodeJpegOpMaker : public framework::OpProtoAndCheckerMaker {
"of the JPEG image. It is a tensor with rank 1.");
AddOutput("Out", "The output tensor of DecodeJpeg op");
AddComment(R"DOC(
This operator decodes a JPEG image into a 3-dimensional RGB Tensor
or a 1-dimensional gray Tensor. Optionally, it converts the image to the
desired format. The values of the output tensor are uint8 between 0
and 255.
)DOC");
AddAttr<std::string>(
...
@@ -73,13 +73,13 @@ class DeformableConvV1OpMaker : public framework::OpProtoAndCheckerMaker {
AddComment(R"DOC(
**Deformable Convolution v1 Operator**
Deformable Convolution is a new convolution-based method in which features have offsets
in spatial location.
1. Get the offset of each pixel in the feature map with convolution layers whose number
of channels should be double the weight size.
2. Add the offset to the pixel to get the new location and the new value, which is computed
directly through bilinear interpolation with the four nearest pixels.
3. Get the product of pixel and weight as the result.
...
@@ -104,7 +104,7 @@ class DeformablePSROIPoolOpMaker : public framework::OpProtoAndCheckerMaker {
"W is the width of output. ");
AddComment(R"DOC(
**DeformablePSROIPooling Operator**
DeformablePSROIPooling is a new method based on region-of-interest pooling
(also known as RoI pooling).
The operator has four steps:
...
@@ -82,14 +82,14 @@ This operator clips input boxes to original input images.
For each input box, the formula is given as follows:
$$xmin = \max(\min(xmin, im_w - 1), 0)$$
$$ymin = \max(\min(ymin, im_h - 1), 0)$$
$$xmax = \max(\min(xmax, im_w - 1), 0)$$
$$ymax = \max(\min(ymax, im_h - 1), 0)$$
where im_w and im_h are computed from ImInfo; the formula is given as follows:
$$im_w = \round(width / im_scale)$$
$$im_h = \round(height / im_scale)$$
)DOC");
}
};
...
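A NumPy sketch of the clipping formulas (illustrative; boxes are assumed to be in (xmin, ymin, xmax, ymax) order):

import numpy as np

def clip_boxes(boxes, height, width, im_scale):
    # boxes: [N, 4] in (xmin, ymin, xmax, ymax) order
    im_w = np.round(width / im_scale)
    im_h = np.round(height / im_scale)
    boxes[:, 0::2] = np.clip(boxes[:, 0::2], 0, im_w - 1)  # xmin, xmax
    boxes[:, 1::2] = np.clip(boxes[:, 1::2], 0, im_h - 1)  # ymin, ymax
    return boxes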
@@ -98,9 +98,9 @@ The Encoding schema described below:
oy = (ty - py) / ph / pyv
ow = log(abs(tw / pw)) / pwv
oh = log(abs(th / ph)) / phv
The Decoding schema described below:
@@ -116,11 +116,11 @@ where `tx`, `ty`, `tw`, `th` denote the target box's center coordinates, width
and height respectively. Similarly, `px`, `py`, `pw`, `ph` denote the
priorbox's (anchor) center coordinates, width and height. `pxv`, `pyv`, `pwv`,
`phv` denote the variance of the priorbox and `ox`, `oy`, `ow`, `oh` denote the
encoded/decoded coordinates, width and height.
During Box Decoding, two modes for broadcast are supported. Say target box has
shape [N, M, 4], and the shape of prior box can be [N, 4] or [M, 4]. Then prior
box will broadcast to target box along the assigned axis.
)DOC");
}
};
...
@@ -189,7 +189,7 @@ Decode the target bounding box with the prior_box information.
The Decoding schema is described below:
$$
ox = (pw \\times pxv \\times tx + px) - \\frac{tw}{2}
$$
$$
oy = (ph \\times pyv \\times ty + py) - \\frac{th}{2}
@@ -205,11 +205,11 @@ where `tx`, `ty`, `tw`, `th` denote the target box's center coordinates, width
and height respectively. Similarly, `px`, `py`, `pw`, `ph` denote the
prior_box's (anchor) center coordinates, width and height. `pxv`, `pyv`, `pwv`,
`phv` denote the variance of the prior_box and `ox`, `oy`, `ow`, `oh` denote the
decoded coordinates, width and height in decode_box.
decode_box is obtained after box decoding, then the assigning schema is described below:
For each prior_box, use the best non-background class's decoded values to
update the prior_box locations and get output_assign_box. So, the shape of
output_assign_box is the same as PriorBox.
)DOC");
...
@@ -125,7 +125,7 @@ class CollectFpnProposalsOpMaker : public framework::OpProtoAndCheckerMaker {
This operator concatenates all proposals from different images
and different FPN levels. It then sorts all of those proposals
by objectness confidence and selects the post_nms_topN RoIs in
total. Finally, it re-sorts the RoIs in the order of batch index.
)DOC");
}
};
@@ -145,7 +145,7 @@ REGISTER_OP_CPU_KERNEL(collect_fpn_proposals,
REGISTER_OP_VERSION(collect_fpn_proposals)
.AddCheckpoint(
R"ROC(
Upgrade collect_fpn_proposals: add a new input
[MultiLevelRoIsNum] and a new output [RoisNum].)ROC",
paddle::framework::compatible::OpVersionDesc()
.NewInput("MultiLevelRoIsNum",
...
@@ -86,7 +86,7 @@ class GenerateProposalsV2OpMaker : public framework::OpProtoAndCheckerMaker {
"If true, im_shape pixel offset is 1.")
.SetDefault(true);
AddComment(R"DOC(
This operator is the second version of the generate_proposals op, used to generate
bounding box proposals for Faster RCNN.
The proposals are generated for a list of images based on the image
score 'Scores' and the bounding box regression result 'BboxDeltas' as
@@ -96,9 +96,9 @@ boxes.
The difference between this version and the first version is that the image
scale is no longer needed now, so the input requires im_shape instead of im_info.
The change aims to unify the input for all kinds of object detection,
such as YOLO-v3 and Faster R-CNN. As a result, the min_size represents the
size on the input image instead of the original image, which is slightly different
from before and will not affect the result.
)DOC");
...
@@ -95,7 +95,7 @@ boxes in 'Y' are shared by all instance of the batched inputs of X.
Given two boxes A and B, the calculation of IOU is as follows:
$$
IOU(A, B) =
\\frac{area(A\\cap B)}{area(A)+area(B)-area(A\\cap B)}
$$
...
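The IOU formula in NumPy for two (xmin, ymin, xmax, ymax) boxes (a sketch; the op itself broadcasts over batched inputs):

import numpy as np

def iou(a, b):
    # a, b: (xmin, ymin, xmax, ymax)
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)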
@@ -116,7 +116,7 @@ independently for each class. The outputs is a 2-D LoDTenosr, for each
image, the offsets in the first dimension of the LoDTensor are called LoD. The number
of offsets is N + 1, where N is the batch size. If LoD[i + 1] - LoD[i] == 0, it
means there is no detected bbox for this image. Now this operator has one more
output, which is RoisNum. The size of RoisNum is N; RoisNum[i] means the number of
detected bboxes for this image.
For more information on Matrix NMS, please refer to:
...
@@ -383,11 +383,11 @@ class MineHardExamplesOpMaker : public framework::OpProtoAndCheckerMaker {
AddComment(R"DOC(
Mine hard examples Operator.
This operator implements hard example mining to select a subset of negative box indices.
For each image, it selects the boxes with the highest losses, subject to the condition that a
box cannot have a match > neg_dist_threshold when mining_type is max_negative.
The selected number is min(sample_size, max_negative_box_number) when mining_type is
hard_example, or min(neg_pos_ratio * positive_box_number, max_negative_box_number)
when mining_type is max_negative, where max_negative_box_number is the count of
MatchIndices elements with value -1.
)DOC");
}
...
@@ -640,7 +640,7 @@ where `tx`, `ty`, `tw`, `th` denote the predicted box's center coordinates, widt
and height respectively. Similarly, `px`, `py`, `pw`, `ph` denote the
anchor's center coordinates, width and height. `pxv`, `pyv`, `pwv`,
`phv` denote the variance of the anchor box and `ox`, `oy`, `ow`, `oh` denote the
decoded coordinates, width and height.
Then the top decoded predictions from all levels are merged, followed by NMS.
In the NMS step, this operator prunes away boxes that have high IOU
...
@@ -661,7 +661,7 @@ The rest anchors would not contibute to the RPN training loss
ScoreIndex is composed of foreground anchor indexes (positive labels) and
background anchor indexes (negative labels). LocationIndex is exactly the same
as the foreground anchor indexes since we cannot assign a regression target to
the background anchors.
The classification target (TargetLabel) is a binary class label (of being
@@ -730,16 +730,16 @@ class RetinanetTargetAssignOpMaker : public framework::OpProtoAndCheckerMaker {
This layer can be used, given the Intersection-over-Union (IoU) overlap
between anchors and ground truth boxes, to assign classification and
regression targets to each anchor; these target labels are used to
train retinanet.
Every anchor is assigned a length-C one-hot vector of
classification targets, and a 4-vector of box regression targets,
where C is the class number. The assignment rules are as follows:
1. An anchor is assigned to a ground-truth box when: (i) it has the highest
IoU overlap with a ground-truth box, or (ii) it has an IoU overlap higher
than positive_overlap (0.5) with any ground-truth box.
2. An anchor is assigned to background when its IoU ratio is lower than
negative_overlap (0.4) for all ground-truth boxes.
...
@@ -131,7 +131,7 @@ If id = MatchIndices[i][j] > 0,
Out[i][j][0 : K] = X[lod[i] + id][j % P][0 : K]
OutWeight[i][j] = 1.
Otherwise,
Out[i][j][0 : K] = {mismatch_value, mismatch_value, ...}
OutWeight[i][j] = 0.
...
@@ -192,19 +192,19 @@ class YoloBoxOpMaker : public framework::OpProtoAndCheckerMaker {
.SetDefault(0.5);
AddComment(R"DOC(
This operator generates YOLO detection boxes from the output of a YOLOv3 network.
The output of the previous network is in shape [N, C, H, W], where H and W
should be the same. H and W specify the grid size; each grid point predicts a
given number of boxes. This given number, hereafter represented as S,
is specified by the number of anchors. In the second dimension (the channel
dimension), C should be equal to S * (5 + class_num) if :attr:`iou_aware` is false,
otherwise C should be equal to S * (6 + class_num). class_num is the object
category number of the source dataset (such as 80 in the COCO dataset), so the
second (channel) dimension, apart from the 4 box location coordinates x, y, w, h,
also includes the confidence score of the box and the class one-hot key of each anchor
box.
Assume the 4 location coordinates are :math:`t_x, t_y, t_w, t_h`; the box
predictions should be as follows:
$$
@@ -225,9 +225,9 @@ class YoloBoxOpMaker : public framework::OpProtoAndCheckerMaker {
The logistic regression value of the 5th channel of each anchor prediction box
represents the confidence score of each prediction box, and the logistic
regression value of the last :attr:`class_num` channels of each anchor prediction
box represents the classification scores. Boxes with confidence scores less than
:attr:`conf_thresh` should be ignored, and a box's final score is the product of its
confidence score and classification score.
$$
...
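The box prediction equations elided above follow the standard YOLOv3 decode; a hedged NumPy sketch (cx, cy, pw, ph name the grid offsets and anchor sizes from the paper, introduced here for illustration):

import numpy as np

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    bx = sigmoid(tx) + cx    # box center x, offset from grid cell cx
    by = sigmoid(ty) + cy    # box center y, offset from grid cell cy
    bw = pw * np.exp(tw)     # box width, scaled from anchor width pw
    bh = ph * np.exp(th)     # box height, scaled from anchor height ph
    return bx, by, bw, bh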
...@@ -105,14 +105,14 @@ class Yolov3LossOpMaker : public framework::OpProtoAndCheckerMaker {
AddComment(R"DOC(
This operator generates the yolov3 loss based on the given predictions and ground
truth boxes.
The output of the previous network is in shape [N, C, H, W], where H and W
should be the same and specify the grid size. Each grid point predicts a
given number of bounding boxes; this given number, represented below as S,
is specified by the number of anchor clusters in each scale. In the second (channel)
dimension, C should be equal to S * (class_num + 5), where class_num is the object
category number of the source dataset (such as 80 in the COCO dataset), so the
second (channel) dimension, apart from the 4 box location coordinates x, y, w, h,
also includes the confidence score of the box and the class one-hot key of each anchor box.
Assume the 4 location coordinates are :math:`t_x, t_y, t_w, t_h`; the box predictions
...@@ -135,21 +135,21 @@ class Yolov3LossOpMaker : public framework::OpProtoAndCheckerMaker {
and :math:`p_w, p_h` are specified by the anchors.
As for the confidence score, it is the logistic regression value of the IoU between
the anchor boxes and the ground truth boxes; the score of the anchor box which has
the max IoU should be 1, and if an anchor box has an IoU bigger than the ignore
thresh, the confidence score loss of this anchor box will be ignored.
Therefore, the yolov3 loss consists of three major parts: box location loss,
objectness loss and classification loss. The L1 loss is used for the
box coordinates (w, h); sigmoid cross entropy loss is used for the box
coordinates (x, y), the objectness loss and the classification loss.
Each ground truth box finds a best-matching anchor box among all anchors.
Prediction of this anchor box will incur all three parts of the loss, and
predictions of anchor boxes with no GT box matched will only incur the objectness
loss.
In order to trade off box coordinate losses between big boxes and small
boxes, the box coordinate losses will be multiplied by a scale weight, which is
calculated as follows.
...@@ -165,12 +165,12 @@ class Yolov3LossOpMaker : public framework::OpProtoAndCheckerMaker {
$$
While :attr:`use_label_smooth` is set to :attr:`True`, the classification
target will be smoothed when calculating the classification loss: the target of
positive samples will be smoothed to :math:`1.0 - 1.0 / class\_num` and the target of
negative samples will be smoothed to :math:`1.0 / class\_num`.
While :attr:`GTScore` is given, which means the mixup score of the ground truth
boxes, all losses incurred by a ground truth box will be multiplied by its
mixup score.
)DOC");
}
......
...@@ -126,10 +126,10 @@ class DGCOpMaker : public framework::OpProtoAndCheckerMaker {
DGC also uses momentum factor masking and warmup training to overcome the staleness problem caused by reduced communication.
This optimizer will do two things:
1. Compress the gradient by taking the top-k important values from the tensor \
and use them for allreduce to reduce network bandwidth.
2. Call momentum to optimize on the cost.
)DOC");
......
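A hedged NumPy sketch of the top-k gradient compression step described above; the local residual accumulation is an assumption based on the DGC paper rather than on this diff, and `ratio` is an illustrative name.

    import numpy as np

    def compress_topk(grad, ratio=0.001):
        # Keep only the largest-magnitude fraction of entries; the rest are
        # zeroed locally (and typically accumulated as a residual for later).
        flat = grad.ravel()
        k = max(1, int(flat.size * ratio))
        idx = np.argpartition(np.abs(flat), -k)[-k:]  # top-k by magnitude
        sparse = np.zeros_like(flat)
        sparse[idx] = flat[idx]                       # values sent to allreduce
        residual = flat - sparse                      # kept on this worker
        return sparse.reshape(grad.shape), residual.reshape(grad.shape)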
...@@ -47,11 +47,11 @@ class DiagEmbedOpMaker : public framework::OpProtoAndCheckerMaker {
)DOC")
.SetDefault(-1);
AddComment(R"DOC(Creates a tensor whose diagonals of certain 2D planes
(specified by dim1 and dim2) are filled by input.
To facilitate creating batched diagonal matrices,
the 2D planes formed by the last two dimensions of the returned tensor
are chosen by default.
)DOC");
}
};
......
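For the default dim1/dim2 (the last two dimensions), the behavior described above can be sketched in NumPy as follows; this is illustrative only, not the kernel implementation.

    import numpy as np

    def diag_embed(x):
        # Input (..., n) -> output (..., n, n) with x placed on the diagonal
        # of each trailing 2D plane.
        n = x.shape[-1]
        out = np.zeros(x.shape + (n,), dtype=x.dtype)
        i = np.arange(n)
        out[..., i, i] = x
        return out

    # diag_embed(np.array([[1., 2.], [3., 4.]])).shape == (2, 2, 2)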
...@@ -45,7 +45,7 @@ class DiagOpMaker : public framework::OpProtoAndCheckerMaker {
"Diagonal values of square matrix. It is a tensor with rank 1.");
AddOutput("Out", "A square matrix.");
AddComment(R"DOC(
Return a square matrix with specified diagonal values.
)DOC");
}
};
......
...@@ -65,7 +65,7 @@ strings and their references.
Edit distance, also called Levenshtein distance, measures how dissimilar two strings
are by counting the minimum number of operations to transform one string into another.
The operations include insertion, deletion, and substitution.
For example, given hypothesis string A = "kitten" and reference B = "sitting",
A will be transformed into B at least after two substitutions and one
......
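A minimal dynamic-programming sketch of the Levenshtein distance used in the example above; for A = "kitten" and B = "sitting" it returns 3 (two substitutions and one insertion).

    def edit_distance(a, b):
        # dp[i][j] = distance between a[:i] and b[:j]
        dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i in range(len(a) + 1):
            dp[i][0] = i                          # i deletions
        for j in range(len(b) + 1):
            dp[0][j] = j                          # j insertions
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                cost = 0 if a[i - 1] == b[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                               dp[i][j - 1] + 1,         # insertion
                               dp[i - 1][j - 1] + cost)  # substitution
        return dp[-1][-1]

    assert edit_distance("kitten", "sitting") == 3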
...@@ -30,7 +30,7 @@ class FillAnyOpMaker : public framework::OpProtoAndCheckerMaker {
.SetDefault(0);
AddAttr<int>("value_int", "The int var to fill in Tensor").SetDefault(0);
AddComment(R"DOC(Fill operator with backward;
Fill a tensor with `value`.
)DOC");
};
};
......
...@@ -80,15 +80,15 @@ class FilterByInstagOpMaker : public framework::OpProtoAndCheckerMaker {
AddOutput("LossWeight", "(Tensor) loss weight.");
AddOutput("IndexMap", "(LoDTensor) mapping from Out rows to X1 rows");
AddComment(R"DOC(
Filter By Instag Op
This operator is used to filter embedded ins.
There are 3 inputs. The first is the embedded ins, the second is the tags for ins,
and the third is the tags to filter by.
There are 3 outputs. The first is the filtered embedded ins, the second is the loss weight,
and the third is the IndexMap from Out line number to X1 line number.
)DOC");
}
};
......
...@@ -70,9 +70,9 @@ class FoldOpMaker : public framework::OpProtoAndCheckerMaker {
**Fold Operator**
This operator is used to combine an array of sliding local blocks into a large containing
tensor, also known as col2im when operated on a batched 2D image tensor. Fold calculates each
combined value in the resulting large tensor by summing all values from all containing blocks.
Unfold extracts the values in the local blocks by copying from the large tensor. So, if the
blocks overlap, they are not inverses of each other.
)DOC");
}
......
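A 1-D toy illustration (assumed for exposition, not taken from the diff) of why fold sums overlapping blocks and is therefore not the inverse of unfold when blocks overlap:

    import numpy as np

    x = np.array([1., 2., 3., 4.])
    # "unfold" with window 2, stride 1: copy out overlapping blocks
    blocks = np.stack([x[i:i + 2] for i in range(3)])  # [[1,2],[2,3],[3,4]]
    # "fold": sum every block value back into its source position
    folded = np.zeros_like(x)
    for i, b in enumerate(blocks):
        folded[i:i + 2] += b
    # folded == [1, 4, 6, 4]: overlapped positions were summed, not restored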
...@@ -432,8 +432,8 @@ class FusedAttentionOpMaker : public framework::OpProtoAndCheckerMaker {
AddComment(R"DOC(
The fused_attention operator is the same as the following pseudo code:
// @input: [batch_size, seq_len, embed_dim]
// @final_out: [batch_size, seq_len, num_heads, head_dim]
residual = input
if (pre_layernorm)
  query = layer_norm(input);
...@@ -447,7 +447,7 @@ class FusedAttentionOpMaker : public framework::OpProtoAndCheckerMaker {
  out = dropout(out);
  out = out * v;
  out = transpose(out, perm=[0, 2, 1, 3]);
}
// out linear
out = linear(out);
......
...@@ -140,8 +140,8 @@ class FusedBiasDropoutResidualLnOpMaker
AddComment(R"DOC(
Add fused bias_dropout_residual_layer_norm op whose logic is as follows:
// @input: [batch_size, seq_len, embed_dim]
// @final_out: [batch_size, seq_len, embed_dim]
y = layer_norm(residual + dropout(bias + x));
)DOC");
}
......
...@@ -174,7 +174,7 @@ class FusedGateAttentionOpMaker : public framework::OpProtoAndCheckerMaker {
AddComment(R"DOC(
Add fused attention op whose logic is as follows:
{
  q = paddle.einsum('nbqa,ahc->nbqhc', q_data, self.query_w)
  k = paddle.einsum('nbka,ahc->nbkhc', m_data, self.key_w)
  v = paddle.einsum('nbka,ahc->nbkhc', m_data, self.value_w)
...@@ -189,10 +189,10 @@ class FusedGateAttentionOpMaker : public framework::OpProtoAndCheckerMaker {
      self.gating_w) + self.gating_b
  gate_values_1 = nn.functional.sigmoid(gate_values)
  weighted_avg *= gate_values_1
  output = paddle.einsum('nbqhc,hco->nbqo', weighted_avg,
      self.output_w) + self.output_b
}
)DOC");
}
......
...@@ -164,32 +164,32 @@ class FusedGemmEpilogueOpMaker : public framework::OpProtoAndCheckerMaker {
AddOutput("Out", "The output tensor Out of Out = Act((X * Y) + Bias).");
AddOutput("ReserveSpace",
R"DOC(Reserve GPU space to place
auxiliary data pointer. It is used to pass the auxiliary data pointer
for the fused_gemm_epilogue op. If not given (empty string), the
auxiliary mode would not be enabled.)DOC")
.AsDispensable()
.AsExtra();
AddAttr<bool>(
"trans_x",
R"DOC((bool, default false), Whether to transpose input tensor X
or not. The input tensor X could be more than two-dimensional. When
trans_x is set to true, it would fully reverse X. For instance: X with shape
[d0, d1, d2, d3] -> [d3, d2, d1, d0].)DOC")
.SetDefault(false);
AddAttr<bool>(
"trans_y",
R"DOC((bool, default false), Whether to transpose input tensor Y
or not. The input tensor Y should be two-dimensional. When
trans_y is set to true, it would transpose Y. For instance: Y with shape
[d0, d1] -> [d1, d0].)DOC")
.SetDefault(false);
AddAttr<std::string>(
"activation",
R"DOC((string, default none), The activation function. It could be
one of {none, relu, gelu}. When none is given, Act would be a null
operation)DOC")
.SetDefault("none");
...@@ -337,9 +337,9 @@ class FusedGemmEpilogueGradOpMaker : public framework::OpProtoAndCheckerMaker {
AddInput("X", "The input tensor X of Out = (Act(X) * Y) + bias");
AddInput("Y", "The input tensor Y of Out = (Act(X) * Y) + bias");
AddInput("ReserveSpace",
R"DOC(A GPU space to fetch
the auxiliary data pointer. It is used to pass the auxiliary data pointer
for the fused_gemm_epilogue_grad op. If not given (empty string), the
auxiliary mode would not be enabled.)DOC")
.AsDispensable();
...@@ -352,23 +352,23 @@ class FusedGemmEpilogueGradOpMaker : public framework::OpProtoAndCheckerMaker {
.AsDispensable();
AddAttr<bool>(
"trans_x",
R"DOC((bool, default false), Whether to transpose input tensor X
or not. The input tensor X could be more than two-dimensional. When
trans_x is set to true, it would fully reverse X. For instance: X with shape
[d0, d1, d2, d3] -> [d3, d2, d1, d0].)DOC")
.SetDefault(false);
AddAttr<bool>(
"trans_y",
R"DOC((bool, default false), Whether to transpose input tensor Y
or not. The input tensor Y should be two-dimensional. When
trans_y is set to true, it would transpose Y. For instance: Y with shape
[d0, d1] -> [d1, d0].)DOC")
.SetDefault(false);
AddAttr<std::string>(
"activation_grad",
R"DOC((string, default none), The backward activation function. It could be
one of {none, relu_grad, gelu_grad}. When none is given, the backward Act would
be a null operation)DOC")
.SetDefault("none");
......
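A NumPy sketch of the semantics documented above (Out = Act((X * Y) + Bias), trans_x fully reversing X's dimensions, trans_y transposing the 2-D Y); the function name is an assumption and the gelu branch is omitted for brevity.

    import numpy as np

    def fused_gemm_epilogue(x, y, bias, trans_x=False, trans_y=False,
                            activation="none"):
        if trans_x:
            x = x.transpose(tuple(reversed(range(x.ndim))))  # full reverse
        if trans_y:
            y = y.T                                          # 2-D transpose
        out = np.matmul(x, y) + bias
        if activation == "relu":
            out = np.maximum(out, 0.0)
        return out                                 # "none": identity epilogue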
...@@ -251,7 +251,7 @@ void FusionGRUOpMaker::Make() {
.SetDefault(false);
AddComment(R"DOC(
The Fusion complete GRU Operator.
This operator fuses the fully-connected operator into the GRU;
for more details, refer to the GRU op.
)DOC");
}
......
...@@ -79,7 +79,7 @@ void FusionSquaredMatSubOpMaker::Make() {
AddAttr<float>("scalar", "The scalar on output matrix.").SetDefault(1.f);
AddComment(R"DOC(
Fusion Squared Matrix and Subtract operator.
( (X * Y).^2 - (X.^2 * Y.^2) ) .* scalar
)DOC");
}
......
...@@ -219,7 +219,7 @@ void MultiGRUOpMaker::Make() {
.SetDefault(false);
AddComment(R"DOC(
The Fusion complete GRU Operator.
This operator fuses the fully-connected operator into the GRU;
for more details, refer to the GRU op.
)DOC");
}
......
...@@ -274,10 +274,10 @@ class ResNetUnitOpMaker : public framework::OpProtoAndCheckerMaker {
AddAttr<std::string>("act_type", "The activation type to be fused.")
.SetDefault("relu");
AddComment(R"DOC(
Fusion op of the basic unit of resnet block.
The implementation is based on the latest fusion op interface in cuDNN v8.0.
For more details:
https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnFusedOps_t
)DOC");
......
...@@ -81,7 +81,7 @@ class FusedTokenPruneOpMaker : public framework::OpProtoAndCheckerMaker {
AddComment(R"DOC(
fused_token_prune op is used to fuse multiple ops to perform token pruning.
In this op:
1. Elements of Attn will be set to zero if their corresponding mask is smaller than 0.
2. The second dimension of X will be sorted by Attn.
3. The last (max_seq_len - slimmed_seq_len) lines of X will be pruned.
4. The remaining part of the sorted X will be output.
......
...@@ -59,13 +59,13 @@ class GatherNdOpMaker : public framework::OpProtoAndCheckerMaker {
AddComment(R"DOC(
Gather_Nd Operator.
This function is actually a high-dimensional extension of gather
and supports simultaneous indexing by multiple axes. Out is
obtained by gathering slices from X into a tensor with shape
Index.shape[:-1] + X.shape[Index.shape[-1]:].
Example:
Given:
X = [[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
...@@ -73,7 +73,7 @@ class GatherNdOpMaker : public framework::OpProtoAndCheckerMaker {
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]]
X.shape = (2, 3, 4)
*Case 1:
...@@ -81,7 +81,7 @@ class GatherNdOpMaker : public framework::OpProtoAndCheckerMaker {
Index = [[1]]
we get:
Out =
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]
...@@ -91,7 +91,7 @@ class GatherNdOpMaker : public framework::OpProtoAndCheckerMaker {
Index = [[0,2]]
we get:
Out = [8, 9, 10, 11]
*Case 3:
......
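The two cases above can be reproduced with a small NumPy sketch of the documented shape rule Index.shape[:-1] + X.shape[Index.shape[-1]:]; this is an illustration, not the kernel.

    import numpy as np

    def gather_nd(x, index):
        # Each vector along the last axis of index addresses one slice of x.
        return x[tuple(np.moveaxis(index, -1, 0))]

    x = np.arange(24).reshape(2, 3, 4)
    gather_nd(x, np.array([[1]]))     # shape (1, 3, 4): the rows 12..23
    gather_nd(x, np.array([[0, 2]]))  # shape (1, 4): [[8, 9, 10, 11]]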
...@@ -161,7 +161,7 @@ REGISTER_OP_CPU_KERNEL(gaussian_random_batch_size_like,
REGISTER_OP_VERSION(gaussian_random)
.AddCheckpoint(
R"ROC(
Upgrade gaussian_random add new inputs [ShapeTensor] and [ShapeTensorList]
and modify the attribute of [shape])ROC",
paddle::framework::compatible::OpVersionDesc()
.NewInput("ShapeTensor",
......
...@@ -100,7 +100,7 @@ class GeluOpMaker : public framework::OpProtoAndCheckerMaker {
"(bool, default false) use approximation of gelu")
.SetDefault(false);
AddComment(R"DOC(
Gelu Activation Operator.
For more details, please refer to [Gaussian Error Linear Units](https://arxiv.org/pdf/1606.08415.pdf).
......
...@@ -83,10 +83,10 @@ Graph Learning Send_Recv combine operator.
$Out = Recv(Send(X, Src_index), Dst_index, reduce_op)$
This operator is mainly used in the Graph Learning domain, and its main purpose is to reduce
intermediate memory consumption in the process of message passing.
Take `x` as the input tensor; we first use `src_index` to gather the corresponding data,
and then use `dst_index` to update the corresponding positions of the output tensor with different
pooling types, like sum, mean, max, or min.
)DOC");
......
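A hedged NumPy sketch of the gather-then-scatter-reduce pattern described above; `out_size` and the sum/mean handling are assumptions for illustration.

    import numpy as np

    def send_recv(x, src_index, dst_index, out_size, reduce_op="sum"):
        msgs = x[src_index]                       # Send: gather by src_index
        out = np.zeros((out_size,) + x.shape[1:], dtype=float)
        np.add.at(out, dst_index, msgs)           # Recv: scatter-reduce (sum)
        if reduce_op == "mean":
            counts = np.bincount(dst_index, minlength=out_size)
            out /= np.maximum(counts, 1)[:, None]  # avoid divide-by-zero
        return out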
...@@ -97,7 +97,7 @@ intermediate memory consumption in the process of message passing.
Take `X` as the input tensor; we first use `src_index` to gather the corresponding data.
Then the gathered data is computed with `Y` using different message_ops, like add, sub, mul, and div,
to get the computation result. Then, `dst_index` is used to update the corresponding positions of the output
tensor with different pooling types, like sum, mean, max, or min.
)DOC");
......
...@@ -89,12 +89,12 @@ class GridSampleOpMaker : public framework::OpProtoAndCheckerMaker {
.SetDefault("zeros");
AddComment(R"DOC(
This operation samples input X by using bilinear or nearest interpolation based on a
flow field grid, which is usually generated by affine_grid. The grid of
shape [N, H, W, 2] is the concatenation of (grid_x, grid_y) coordinates
with shape [N, H, W] each, where grid_x indexes the 4th dimension
(the width dimension) of input data x and grid_y indexes the 3rd
dimension (the height dimension); finally, the result is the bilinear
interpolation value or nearest value of the 4 nearest corner points.
For bilinear interpolation mode:
...@@ -105,7 +105,7 @@ class GridSampleOpMaker : public framework::OpProtoAndCheckerMaker {
grid_y = 0.5 * (grid[:, :, :, 1] + 1) * (H - 1)
Step 2:
Index input data X with grid (x, y) in each [H, W] area, and bilinearly
interpolate the point value from the 4 nearest points.
wn ------- y_n ------- en
......
...@@ -63,7 +63,7 @@ class HashOpMaker : public framework::OpProtoAndCheckerMaker {
AddInput("X", "(Tensor) Input tensor of hash operator.");
AddOutput("Out", "(Tensor) Output tensor of hash operator.");
AddComment(R"DOC(
Execute the xxHash algorithm `num_hash` times on all elements along the second dimension of the input.
)DOC");
AddAttr<int>("num_hash", "").SetDefault(1);
AddAttr<int64_t>("mod_by", "").SetDefault(100000);
......
...@@ -82,7 +82,7 @@ take any values from (-inf, inf), but the labels should be either -1 or 1.
Then, the hinge loss is computed as follows:
$$
L(x, y) = \max(1 - y \cdot x, 0)
$$
Note that the labels passed as input will have values of either 0 or 1.
......
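A one-function sketch of the hinge loss above; since the labels arrive as 0/1 per the note, they are first mapped to -1/+1 (an assumption consistent with the formula).

    import numpy as np

    def hinge_loss(logits, labels01):
        y = 2.0 * labels01 - 1.0          # map {0, 1} -> {-1, +1}
        return np.maximum(0.0, 1.0 - y * logits)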
...@@ -61,7 +61,7 @@ class IncrementOpMaker : public framework::OpProtoAndCheckerMaker {
AddComment(R"DOC(
Increment Operator.
The equation is:
$$Out = X + step$$
)DOC");
......
...@@ -30,10 +30,10 @@ class IndexSampleOpMaker : public framework::OpProtoAndCheckerMaker {
AddOutput("Out", "Return the element of input at index");
AddComment(R"DOC(
IndexSample OP returns the elements at the specified locations of X,
where the locations are specified by Index.
The X tensor and the Index tensor must both be 2-D,
and their 0th dimensions, which usually are the batch size, must be equal.
The returned tensor has the same shape and dimensions as the Index tensor.
......
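The 2-D row-wise gather described above corresponds to NumPy's take_along_axis; an illustrative sketch:

    import numpy as np

    x = np.array([[10, 11, 12],
                  [20, 21, 22]])
    index = np.array([[2, 0],
                      [1, 1]])
    # out[i, j] = x[i, index[i, j]]; out has the same shape as index
    out = np.take_along_axis(x, index, axis=1)   # [[12, 10], [21, 21]]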
...@@ -452,25 +452,25 @@ class InterpolateOpMaker : public framework::OpProtoAndCheckerMaker {
AddComment(R"DOC(
This operator samples input X to the given output shape by using the specified
interpolation method; the interpolation methods can be \"nearest\"
for nearest neighbor interpolation, \"bilinear\" for bilinear
interpolation, and \"linear\" for linear interpolation.
Nearest neighbor interpolation performs nearest neighbor interpolation
in both the 3rd dimension (the height direction) and the 4th dimension (the width
direction) of the input tensor.
Linear interpolation is the method of using a line connecting two known quantities
to determine the value of an unknown quantity between the two known quantities.
Bilinear interpolation is an extension of linear interpolation for
interpolating functions of two variables (e.g. the H-direction and
W-direction in this op) on a rectilinear 2D grid. The key idea is
to perform linear interpolation first in one direction, and then
again in the other direction.
Trilinear interpolation is an extension of linear interpolation for
interpolating functions of three variables (e.g. the D-direction,
H-direction and W-direction in this op) on a rectilinear 3D grid.
The linear interpolation is performed in three directions.
Bicubic interpolation is an extension of cubic interpolation for interpolating
...@@ -478,24 +478,24 @@ class InterpolateOpMaker : public framework::OpProtoAndCheckerMaker {
smoother than corresponding surfaces obtained by bilinear interpolation or
nearest-neighbor interpolation.
Align_corners and align_mode are optional parameters; the calculation method
of interpolation can be selected by them.
Example:
For scale:
  if align_corners = True and out_{size} > 1 :
    scale_{factor} = (in_{size}-1.0)/(out_{size}-1.0)
  else:
    scale_{factor} = float(in_{size}/out_{size})
Nearest neighbor interpolation:
  if:
    align_corners = False
...@@ -518,16 +518,16 @@ class InterpolateOpMaker : public framework::OpProtoAndCheckerMaker {
  if:
    align_corners = False , align_mode = 0
    input : (N,C,H_in,W_in)
    output: (N,C,H_out,W_out) where:
    H_out = (H_{in}+0.5) * scale_{factor} - 0.5
    W_out = (W_{in}+0.5) * scale_{factor} - 0.5
  else:
    input : (N,C,H_in,W_in)
    output: (N,C,H_out,W_out) where:
...@@ -538,17 +538,17 @@ class InterpolateOpMaker : public framework::OpProtoAndCheckerMaker {
  if:
    align_corners = False , align_mode = 0
    input : (N,C,D_in,H_in,W_in)
    output: (N,C,D_out,H_out,W_out) where:
    D_out = (D_{in}+0.5) * scale_{factor} - 0.5
    H_out = (H_{in}+0.5) * scale_{factor} - 0.5
    W_out = (W_{in}+0.5) * scale_{factor} - 0.5
  else:
    input : (N,C,D_in,H_in,W_in)
    output: (N,C,D_out,H_out,W_out) where:
...@@ -570,13 +570,13 @@ class InterpolateOpMaker : public framework::OpProtoAndCheckerMaker {
    H_out = H_{in} * scale_{factor}
    W_out = W_{in} * scale_{factor}
For details of nearest neighbor interpolation, please refer to Wikipedia:
https://en.wikipedia.org/wiki/Nearest-neighbor_interpolation
For details of bilinear interpolation, please refer to Wikipedia:
https://en.wikipedia.org/wiki/Bilinear_interpolation
For details of trilinear interpolation, please refer to Wikipedia:
https://en.wikipedia.org/wiki/Trilinear_interpolation
For details of bicubic interpolation, please refer to Wikipedia:
......
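The scale-factor and coordinate rules quoted above can be condensed into a small sketch; this is the standard half-pixel reading of the align_mode 0 formulas, and the function names are illustrative.

    def scale_factor(in_size, out_size, align_corners):
        if align_corners and out_size > 1:
            return (in_size - 1.0) / (out_size - 1.0)
        return float(in_size) / out_size

    def src_coord(dst, scale, align_corners):
        # align_corners=False, align_mode=0: half-pixel mapping
        if align_corners:
            return dst * scale
        return (dst + 0.5) * scale - 0.5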
...@@ -553,25 +553,25 @@ class InterpolateV2OpMaker : public framework::OpProtoAndCheckerMaker {
AddComment(R"DOC(
This operator samples input X to the given output shape by using the specified
interpolation method; the interpolation methods can be \"nearest\"
for nearest neighbor interpolation, \"bilinear\" for bilinear
interpolation, and \"linear\" for linear interpolation.
Nearest neighbor interpolation performs nearest neighbor interpolation
in both the 3rd dimension (the height direction) and the 4th dimension (the width
direction) of the input tensor.
Linear interpolation is the method of using a line connecting two known quantities
to determine the value of an unknown quantity between the two known quantities.
Bilinear interpolation is an extension of linear interpolation for
interpolating functions of two variables (e.g. the H-direction and
W-direction in this op) on a rectilinear 2D grid. The key idea is
to perform linear interpolation first in one direction, and then
again in the other direction.
Trilinear interpolation is an extension of linear interpolation for
interpolating functions of three variables (e.g. the D-direction,
H-direction and W-direction in this op) on a rectilinear 3D grid.
The linear interpolation is performed in three directions.
Bicubic interpolation is an extension of cubic interpolation for interpolating
...@@ -579,24 +579,24 @@ class InterpolateV2OpMaker : public framework::OpProtoAndCheckerMaker {
smoother than corresponding surfaces obtained by bilinear interpolation or
nearest-neighbor interpolation.
Align_corners and align_mode are optional parameters; the calculation method
of interpolation can be selected by them.
Example:
For scale:
  if align_corners = True and out_{size} > 1 :
    scale_{factor} = (in_{size}-1.0)/(out_{size}-1.0)
  else:
    scale_{factor} = float(in_{size}/out_{size})
Nearest neighbor interpolation:
  if:
    align_corners = False
...@@ -619,16 +619,16 @@ class InterpolateV2OpMaker : public framework::OpProtoAndCheckerMaker {
  if:
    align_corners = False , align_mode = 0
    input : (N,C,H_in,W_in)
    output: (N,C,H_out,W_out) where:
    H_out = (H_{in}+0.5) * scale_{factor} - 0.5
    W_out = (W_{in}+0.5) * scale_{factor} - 0.5
  else:
    input : (N,C,H_in,W_in)
    output: (N,C,H_out,W_out) where:
...@@ -639,17 +639,17 @@ class InterpolateV2OpMaker : public framework::OpProtoAndCheckerMaker {
  if:
    align_corners = False , align_mode = 0
    input : (N,C,D_in,H_in,W_in)
    output: (N,C,D_out,H_out,W_out) where:
    D_out = (D_{in}+0.5) * scale_{factor} - 0.5
    H_out = (H_{in}+0.5) * scale_{factor} - 0.5
    W_out = (W_{in}+0.5) * scale_{factor} - 0.5
  else:
    input : (N,C,D_in,H_in,W_in)
    output: (N,C,D_out,H_out,W_out) where:
...@@ -671,13 +671,13 @@ class InterpolateV2OpMaker : public framework::OpProtoAndCheckerMaker {
    H_out = H_{in} * scale_{factor}
    W_out = W_{in} * scale_{factor}
For details of nearest neighbor interpolation, please refer to Wikipedia:
https://en.wikipedia.org/wiki/Nearest-neighbor_interpolation
For details of bilinear interpolation, please refer to Wikipedia:
https://en.wikipedia.org/wiki/Bilinear_interpolation
For details of trilinear interpolation, please refer to Wikipedia:
https://en.wikipedia.org/wiki/Trilinear_interpolation
For details of bicubic interpolation, please refer to Wikipedia:
......
...@@ -46,7 +46,7 @@ class IscloseOpMaker : public framework::OpProtoAndCheckerMaker {
"compared as equal. Default: :math:`False` .")
.SetDefault(false);
AddComment(R"DOC(
This operator checks if all :math:`x` and :math:`y` satisfy the condition:
.. math::
......
...@@ -72,19 +72,19 @@ class KLDivLossOpMaker : public framework::OpProtoAndCheckerMaker {
While :math:`x` is Input(X) and :math:`y` is Input(Target).
While :attr:`reduction` is :attr:`none`, the output loss has
the same shape as Input(X); the loss at each point is calculated
separately and no reduction is applied.
While :attr:`reduction` is :attr:`mean`, the output loss has
shape [1] and the loss value is the mean value of all losses.
While :attr:`reduction` is :attr:`sum`, the output loss has
shape [1] and the loss value is the sum of all losses.
While :attr:`reduction` is :attr:`batchmean`, the output loss
has shape [1] and the loss value is the sum of all losses
divided by the batch size.
)DOC");
}
};
......
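A hedged sketch of the reduction modes described above, assuming the usual pointwise KL form loss = y * (log y - x) with x already holding log-probabilities (that pointwise form is not shown in this diff):

    import numpy as np

    def kl_div(x, y, reduction="mean"):
        point = y * (np.log(y) - x)          # elementwise loss, shape of x
        if reduction == "none":
            return point
        if reduction == "sum":
            return point.sum()
        if reduction == "batchmean":
            return point.sum() / x.shape[0]  # sum divided by batch size
        return point.mean()                  # "mean"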
...@@ -63,14 +63,14 @@ class KronOpMaker : public framework::OpProtoAndCheckerMaker {
Kron Operator.
This operator computes the Kronecker product of two tensors, a
composite tensor made of blocks of the second tensor scaled by the
first.
This operator assumes that the ranks of the two tensors, $X$ and $Y$,
are the same; if necessary, the smaller one is prepended with ones. If the
shape of $X$ is [$r_0$, $r_1$, ..., $r_N$] and the shape of $Y$ is
[$s_0$, $s_1$, ..., $s_N$], then the shape of the output tensor is
[$r_{0}s_{0}$, $r_{1}s_{1}$, ..., $r_{N}s_{N}$]. The elements are
products of elements from $X$ and $Y$.
The equation is:
......
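The shape rule above is easy to check with NumPy's kron:

    import numpy as np

    x = np.ones((2, 3))
    y = np.ones((4, 5))
    # The output shape is the elementwise product of the input shapes.
    assert np.kron(x, y).shape == (2 * 4, 3 * 5)   # (8, 15)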
...@@ -92,23 +92,23 @@ class LabelSmoothOpMaker : public framework::OpProtoAndCheckerMaker {
AddComment(R"DOC(
LabelSmooth Operator.
Label smoothing is a mechanism to regularize the classifier layer. In machine
learning, optimizing the log-likelihood of the correct label directly may
cause two problems. First, it may result in overfitting: if the model learns
to assign full probability to the ground-truth label for each training example,
it is not guaranteed to generalize. Second, it encourages the differences
between the largest logit and all others to become large, reducing the ability
of the model to adapt. Label smoothing is proposed to encourage the model to
be less confident, which replaces the ground-truth label $y$ with the weighted
sum of itself and some fixed distribution $\mu$, i.e.
$$
\tilde{y} = (1 - \epsilon) * y + \epsilon * \mu,
$$
where $(1 - \epsilon)$ and $\epsilon$ are the weights respectively, and
$\tilde{y}$ is the smoothed label. Usually a uniform distribution is used for
$\mu$. This change in the ground-truth label is called label-smoothing
regularization or LSR.
See more details about label smoothing in https://arxiv.org/abs/1512.00567.
......
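The smoothing formula above, with the usual uniform prior mu = 1/num_classes, in a short sketch:

    import numpy as np

    def label_smooth(y_onehot, epsilon=0.1):
        num_classes = y_onehot.shape[-1]
        mu = 1.0 / num_classes                        # uniform distribution
        return (1.0 - epsilon) * y_onehot + epsilon * mu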
...@@ -54,11 +54,11 @@ class LogspaceOpMaker : public framework::OpProtoAndCheckerMaker {
AddAttr<int>("dtype", "The output data type.");
AddOutput("Out", "A sequence of numbers.");
AddComment(R"DOC(
Return a fixed number of logarithmically evenly spaced values within a given
interval. The first entry is Base raised to the power Start, and the last
entry is Base raised to the power Stop. In the case when Num is 1,
only Base raised to the power Start is returned. If dtype is int32
or int64, the decimal part of the values will be truncated.
Like the logspace function of numpy.
)DOC");
}
......
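NumPy's logspace matches the description above and makes the endpoint behavior concrete:

    import numpy as np

    # base**start ... base**stop, num points
    np.logspace(0, 3, num=4, base=2.0)   # [1., 2., 4., 8.]
    np.logspace(1, 5, num=1, base=10.0)  # [10.]  (only base**start)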
...@@ -114,7 +114,7 @@ Lookup Table Dequant Operator.
The `W` input is a quantized parameter for the sake of saving memory.
This operator first indexes the embeddings with `Ids`,
then dequantizes them and concatenates them as the output (`Out`).
The input Ids can carry the LoD (Level of Details) information,
or not. And the output only shares the LoD information with the input Ids.
......
...@@ -259,11 +259,11 @@ class LSTMPOpMaker : public framework::OpProtoAndCheckerMaker {
AddComment(R"DOC(
Long-Short Term Memory with recurrent Projection layer (LSTMP) Operator.
LSTMP has a separate projection layer after the LSTM layer, projecting the
original hidden state to a lower-dimensional one, which is proposed to reduce
the total number of parameters and, furthermore, the computational complexity of
the LSTM, especially for the case where the number of output units is relatively
large (https://research.google.com/pubs/archive/43905.pdf).
The formula is as follows:
...@@ -291,14 +291,14 @@ denote bias vectors ($b_i$ is the input gate bias vector), $\sigma$
is the activation, such as the logistic sigmoid function, and
$i, f, o$ and $c$ are the input gate, forget gate, output gate,
and cell activation vectors, respectively, all of which have the same size as
the cell output activation vector $h$. Here $h$ is usually called the hidden
state and $r$ denotes its recurrent projection. And $\tilde{c_t}$ is also
called the candidate hidden state, whose computation is based on the current
input and the previous hidden state.
The $\odot$ is the element-wise product of the vectors. $act_g$ and $act_h$
are the cell input and cell output activation functions, and `tanh` is usually
used for them. $\overline{act_h}$ is the activation function for the
projection output, usually using `identity` or the same as $act_h$.
Note that these $W_{xi}x_{t}, W_{xf}x_{t}, W_{xc}x_{t}, W_{xo}x_{t}$
......
...@@ -24,7 +24,7 @@ namespace operators {
class LUOpMaker : public framework::OpProtoAndCheckerMaker {
public:
void Make() override {
AddComment(R"DOC(LU decomposition,
Computes the LU factorization of a matrix or batches of matrices A.
)DOC");
AddInput("X", "(Tensor) The input tensor, shape of (*,m,n)");
......
...@@ -24,7 +24,7 @@ namespace operators {
class LU_UnpackOpMaker : public framework::OpProtoAndCheckerMaker {
public:
void Make() override {
AddComment(R"DOC(Unpack L, U and P to single matrix tensors:
unpack the L and U matrices from LU, and unpack the permutation matrix Pmat from Pivots.
)DOC");
AddInput("X", "(Tensor) The input LU tensor, shape of (*,m,n)");
......
...@@ -102,19 +102,19 @@ class MarginRankLossOpMaker : public framework::OpProtoAndCheckerMaker {
MarginRankLoss Operator.
This operator measures the loss given a pair of training samples
{`X1`, `X2`} and the `Label` with attribute `margin`, where `Label = +1`
indicates that `X1` is ranked higher than `X2`, and `Label = -1` otherwise. The loss
is calculated as:
$loss(X1, X2, Label) = \max(0, -Label * (X1 - X2) + margin)$
The attribute `margin` here helps make the predictions more robust.
Denote the item ranked higher as the positive sample and the other as the negative
sample. If the scores of the two samples satisfy
$positive sample - negative sample < margin$
the pair of samples will contribute to the final loss, which will backpropagate
and train the ranking model to enlarge the difference between the two scores.
For batch input with size `batch_size`, `X1`, `X2` and `Label`
......
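A minimal sketch of the per-sample formula above; variable names are illustrative:

#include <algorithm>
#include <cstdio>

double MarginRankLoss(double x1, double x2, double label, double margin) {
  // loss = max(0, -label * (x1 - x2) + margin)
  return std::max(0.0, -label * (x1 - x2) + margin);
}

int main() {
  // Correctly ranked pair with enough separation: zero loss.
  std::printf("%g\n", MarginRankLoss(0.9, 0.2, +1.0, 0.1));   // 0
  // Separation smaller than the margin: positive loss.
  std::printf("%g\n", MarginRankLoss(0.55, 0.5, +1.0, 0.1));  // 0.05
}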
@@ -230,9 +230,9 @@ void MatchMatrixTensorOpMaker::Make() {
Match Matrix Tensor Operator
This operator calculates X * W * Y; only 2-D X and Y are supported.
The output is a level-1 LoDTensor:
    level_0: dim_t
NOTE: only the 'float32' data type is supported now.
)DOC");
...
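For one slice of the dim_t axis, the computation is a plain matrix chain (m x k) * (k x k') * (k' x n). A minimal sketch, with illustrative names:

#include <cstdio>
#include <vector>

// Row-major matmul: C(m x n) = A(m x k) * B(k x n).
std::vector<float> MatMul(const std::vector<float>& a,
                          const std::vector<float>& b, int m, int k, int n) {
  std::vector<float> c(m * n, 0.f);
  for (int i = 0; i < m; ++i)
    for (int p = 0; p < k; ++p)
      for (int j = 0; j < n; ++j) c[i * n + j] += a[i * k + p] * b[p * n + j];
  return c;
}

int main() {
  std::vector<float> x = {1, 2};       // X: 1 x 2
  std::vector<float> w = {1, 0, 0, 1};  // W: 2 x 2 identity
  std::vector<float> y = {3, 4};       // Y: 2 x 1
  std::vector<float> xw = MatMul(x, w, 1, 2, 2);
  std::vector<float> out = MatMul(xw, y, 1, 2, 1);
  std::printf("out = %g\n", out[0]);  // 1*3 + 2*4 = 11
}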
@@ -193,8 +193,8 @@ class MatMulV2OpMaker : public framework::OpProtoAndCheckerMaker {
                  "doing multiplication")
        .SetDefault(false);
    AddComment(
        R"DOC(Matrix multiplication Out = X * Y. X has shape (d0, d1, ..., M, K),
Y has shape (d0, d1, ..., K, N), and Out has shape (d0, d1, ..., M, N).
In addition, it also follows the broadcast rule, which is similar to
numpy.matmul.
)DOC");
...
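The broadcast rule applies to the leading (batch) dimensions only: aligned from the right, each dim pair must be equal or contain a 1, as in numpy.matmul. A minimal sketch of that shape rule, under those stated assumptions:

#include <algorithm>
#include <cstdio>
#include <vector>

// Returns the broadcast batch shape, or an empty vector if incompatible.
std::vector<int> BroadcastBatch(const std::vector<int>& a,
                                const std::vector<int>& b) {
  size_t n = std::max(a.size(), b.size());
  std::vector<int> out(n, 1);
  for (size_t i = 0; i < n; ++i) {
    int da = i < n - a.size() ? 1 : a[i - (n - a.size())];
    int db = i < n - b.size() ? 1 : b[i - (n - b.size())];
    if (da != db && da != 1 && db != 1) return {};
    out[i] = std::max(da, db);
  }
  return out;
}

int main() {
  std::vector<int> s = BroadcastBatch({2, 1}, {3});  // X batch (2,1), Y batch (3,)
  std::printf("batch = (%d, %d)\n", s[0], s[1]);     // -> (2, 3)
}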
@@ -87,10 +87,10 @@ class MeanIoUOpMaker : public framework::OpProtoAndCheckerMaker {
mean-IOU Operator.
Mean Intersection-Over-Union is a common evaluation metric for
semantic image segmentation: it first computes the IOU for each
semantic class and then computes the average over classes.
IOU is defined as follows:
    IOU = true_positive / (true_positive + false_positive + false_negative).
It is based on pixel-level area, while the "IOU Similarity Operator"
is based on the area of rectangles.
)DOC");
...
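A minimal sketch of the metric from per-class pixel counts; the tp/fp/fn arrays here are hypothetical confusion counts, not the operator's actual inputs:

#include <cstdio>
#include <vector>

double MeanIoU(const std::vector<long>& tp, const std::vector<long>& fp,
               const std::vector<long>& fn) {
  double sum = 0.0;
  for (size_t c = 0; c < tp.size(); ++c)
    sum += static_cast<double>(tp[c]) / (tp[c] + fp[c] + fn[c]);  // per-class IOU
  return sum / tp.size();  // average over classes
}

int main() {
  // Two classes: IOUs are 50/100 = 0.5 and 30/40 = 0.75 -> mean 0.625.
  std::printf("%g\n", MeanIoU({50, 30}, {25, 5}, {25, 5}));
}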
@@ -118,7 +118,7 @@ class MemcpyOpProtoMaker : public framework::OpProtoAndCheckerMaker {
                 "6: dst is on CustomDevicePlace");
    AddComment(R"DOC(
Memcpy Operator.
By now, it ONLY supports the memory copy between CUDAPinnedPlace <-> CUDAPlace or
NPUPlace <-> CPUPlace, and it is used as an internal op by Recompute-Offload.
You would have to extend it if you want more capabilities.
...
@@ -65,7 +65,7 @@ Take N tensors, each of which can be either a scalar or a 1-dimensional vector, and create
N-dimensional grids.
Args:
    tensors (list of tensor): if the k input tensors have shapes (N1,), (N2,), ..., (Nk,), then
    the output tensors are all of size (N1, N2, ..., Nk).
Example::
...
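A minimal 2-D sketch of the shape rule above: inputs of shapes (N1,) and (N2,) produce two (N1 x N2) grids, with grid_x[i][j] = x[i] and grid_y[i][j] = y[j]. Illustrative only:

#include <cstdio>
#include <vector>

int main() {
  std::vector<int> x = {1, 2, 3};  // shape (3,)
  std::vector<int> y = {10, 20};   // shape (2,)
  // Print the (3, 2) grid of (grid_x, grid_y) pairs.
  for (size_t i = 0; i < x.size(); ++i)
    for (size_t j = 0; j < y.size(); ++j)
      std::printf("(%d, %d)%s", x[i], y[j], j + 1 == y.size() ? "\n" : " ");
}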
@@ -44,7 +44,7 @@ class AccuracyOpMaker : public framework::OpProtoAndCheckerMaker {
    AddOutput("Total", "The sample count of the current batch");
    AddComment(R"DOC(
Accuracy Operator.
It will print the accuracy rate for classification.
The accuracy is calculated as follows:
@@ -52,7 +52,7 @@ The accuracy is calculated as follows:
$$accuracy = \frac{NumOfCorrectPredicts}{NumOfAllSamples}$$
Both the input Out and Label can carry the LoD (Level of Details)
information, or not. But the output only shares the LoD information
with the input Out(Inference).
)DOC");
...
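A minimal sketch of the accuracy formula, on hypothetical top-1 predictions and labels:

#include <cstdio>
#include <vector>

double Accuracy(const std::vector<int>& pred, const std::vector<int>& label) {
  int correct = 0;
  for (size_t i = 0; i < pred.size(); ++i) correct += (pred[i] == label[i]);
  // accuracy = NumOfCorrectPredicts / NumOfAllSamples
  return static_cast<double>(correct) / pred.size();
}

int main() {
  std::printf("%g\n", Accuracy({1, 0, 2, 2}, {1, 0, 1, 2}));  // 3/4 = 0.75
}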
@@ -51,7 +51,7 @@ class ModeOpMaker : public framework::OpProtoAndCheckerMaker {
        .SetDefault(-1);
    AddAttr<bool>("keepdim", "Keep the dim to reduce.").SetDefault(false);
    AddComment(R"DOC(
This operator finds the mode of the input Tensor, and outputs its values and indices as vectors.
)DOC");
  }
};
...
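A minimal 1-D sketch of finding the mode and an index for it; tie-breaking in the real operator may differ:

#include <cstdio>
#include <map>
#include <vector>

// Returns the most frequent value and the index of its first occurrence.
void Mode1D(const std::vector<int>& x, int* value, int* index) {
  std::map<int, int> count;
  for (int v : x) ++count[v];
  int best = 0;
  for (size_t i = 0; i < x.size(); ++i)
    if (count[x[i]] > best) {
      best = count[x[i]];
      *value = x[i];
      *index = static_cast<int>(i);
    }
}

int main() {
  int v = 0, idx = 0;
  Mode1D({2, 7, 7, 3, 2, 7}, &v, &idx);
  std::printf("mode = %d at index %d\n", v, idx);  // mode = 7 at index 1
}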
@@ -86,7 +86,7 @@ Since target Y is not differentiable, calculating the gradient for Y is illegal.
The formula of the modified huber loss is:
$$
L(y, f(x)) =
\begin{cases}
(\max(0, 1 - yf(x)))^2, & \text{if } yf(x) \geq -1 \\
-4yf(x), & \text{otherwise}
...
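A minimal sketch of the piecewise loss above, with y in {-1, +1} and f(x) the raw prediction:

#include <algorithm>
#include <cstdio>

double ModifiedHuber(double y, double fx) {
  double m = y * fx;  // the margin term y*f(x)
  if (m >= -1.0) {
    double t = std::max(0.0, 1.0 - m);
    return t * t;  // quadratic branch
  }
  return -4.0 * m;  // linear branch for badly misclassified points
}

int main() {
  std::printf("%g %g %g\n",
              ModifiedHuber(1, 2),    // margin 2   -> 0
              ModifiedHuber(1, 0.5),  // margin 0.5 -> 0.25
              ModifiedHuber(1, -2));  // margin -2  -> 8
}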
@@ -82,10 +82,10 @@ The loss can be described as:
$Out[i] = -X[Label[i]]*Weight[Label[i]]$
It can also be used for higher-dimensional inputs, such as 2D images, by
providing an input of shape (batch_size, C, d1, d2, ..., dK), with
K >= 1, where K is the number of dimensions, and a Label of the
appropriate shape. In the case of images, it computes the NLL loss
per pixel.
)DOC");
...
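A minimal sketch of the per-sample formula: pick the log-probability at the label index, negate, and scale by the per-class weight. The inputs here are hypothetical log-probabilities:

#include <cstdio>
#include <vector>

int main() {
  std::vector<double> x = {-0.1, -2.5, -3.0};  // log-probs for one sample, C=3
  std::vector<double> weight = {1.0, 2.0, 1.0};
  int label = 1;
  double out = -x[label] * weight[label];  // Out[i] = -X[Label[i]] * Weight[Label[i]]
  std::printf("loss = %g\n", out);         // -(-2.5) * 2 = 5
}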
@@ -54,7 +54,7 @@ y = \frac{x}{\sqrt{\sum {x^2} + epsilon}}
$$
where $\sum {x^2}$ is calculated along the `axis` dimension.
)DOC");
  }
};
...
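A minimal sketch of the normalization above for a 1-D vector, with a small epsilon guarding against division by zero:

#include <cmath>
#include <cstdio>
#include <vector>

int main() {
  std::vector<double> x = {3.0, 4.0};
  const double epsilon = 1e-10;
  double sumsq = 0.0;
  for (double v : x) sumsq += v * v;          // sum(x^2) along the axis
  double denom = std::sqrt(sumsq + epsilon);  // sqrt(sum(x^2) + epsilon)
  for (double& v : x) v /= denom;
  std::printf("y = (%g, %g)\n", x[0], x[1]);  // ~ (0.6, 0.8)
}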
@@ -116,7 +116,7 @@ class DpsgdOpMaker : public framework::OpProtoAndCheckerMaker {
    AddComment(R"DOC(
Dpsgd Optimizer.
We implement the Dpsgd optimizer according to the CCS16 paper -
Deep Learning with Differential Privacy.
Dpsgd updates:
...
@@ -101,8 +101,8 @@ class LambOpMaker : public framework::OpProtoAndCheckerMaker {
    AddComment(R"DOC(
LAMB (Layer-wise Adaptive Moments optimizer for Batch training) Optimizer.
LAMB Optimizer is designed to scale up the batch size of training without losing
accuracy, and it supports adaptive element-wise updating and accurate layer-wise
correction. For more information, please refer to https://arxiv.org/abs/1904.00962.
The updating of parameters follows:
@@ -121,7 +121,7 @@ r_t &= \frac{m_t}{\sqrt{v_t}+\epsilon} \\
w_t &= w_{t-1} - \eta_t \frac{\left\| w_{t-1} \right\|}{\left\| r_t + \lambda w_{t-1} \right\|} (r_t + \lambda w_{t-1})
$$
where $m$ is the 1st moment, $v$ the 2nd moment, $\eta$ the
learning rate, and $\lambda$ the weight decay rate.
)DOC");
  }
...
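A minimal sketch of one LAMB step for a single parameter vector, following the update rule above; the moment updates use the usual Adam-style definitions and bias correction is omitted for brevity, so this is an illustration under those assumptions rather than the operator's kernel:

#include <cmath>
#include <cstdio>
#include <vector>

void LambStep(std::vector<double>& w, const std::vector<double>& grad,
              std::vector<double>& m, std::vector<double>& v, double lr,
              double beta1, double beta2, double eps, double lambda) {
  size_t n = w.size();
  std::vector<double> r(n);
  double w_norm = 0.0, u_norm = 0.0;
  for (size_t i = 0; i < n; ++i) {
    m[i] = beta1 * m[i] + (1 - beta1) * grad[i];            // 1st moment m_t
    v[i] = beta2 * v[i] + (1 - beta2) * grad[i] * grad[i];  // 2nd moment v_t
    r[i] = m[i] / (std::sqrt(v[i]) + eps) + lambda * w[i];  // r_t + lambda*w
    w_norm += w[i] * w[i];
    u_norm += r[i] * r[i];
  }
  // Layer-wise trust ratio: ||w_{t-1}|| / ||r_t + lambda*w_{t-1}||.
  double trust = std::sqrt(w_norm) / std::sqrt(u_norm);
  for (size_t i = 0; i < n; ++i) w[i] -= lr * trust * r[i];
}

int main() {
  std::vector<double> w = {1.0, -1.0}, g = {0.1, 0.2}, m(2, 0.0), v(2, 0.0);
  LambStep(w, g, m, v, 0.01, 0.9, 0.999, 1e-6, 0.01);
  std::printf("w = (%g, %g)\n", w[0], w[1]);
}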
@@ -62,11 +62,11 @@ class Pow2DecayWithLinearWarmupOpMaker
    AddComment(R"DOC(
The Pow2DecayWithLinearWarmup learning rate scheduler.
When step_num < warmup_steps:
    lr = base_lr * step_num / warmup_steps
When warmup_steps <= step_num <= total_steps:
    factor = 1 - (step_num - warmup_steps) / (total_steps - warmup_steps)
    lr = (base_lr - end_lr) * factor * factor + end_lr
When step_num > total_steps:
    lr = end_lr
...
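A minimal sketch of the three-phase schedule above: linear warmup, squared polynomial decay, then a constant floor at end_lr:

#include <cstdio>

double Pow2DecayWithLinearWarmup(long step, long warmup, long total,
                                 double base_lr, double end_lr) {
  if (step < warmup) return base_lr * step / warmup;  // linear warmup
  if (step > total) return end_lr;                    // constant tail
  double factor =
      1.0 - static_cast<double>(step - warmup) / (total - warmup);
  return (base_lr - end_lr) * factor * factor + end_lr;  // squared decay
}

int main() {
  // warmup = 100, total = 1000, base_lr = 0.1, end_lr = 0.001
  std::printf("%g %g %g\n",
              Pow2DecayWithLinearWarmup(50, 100, 1000, 0.1, 0.001),    // warming up
              Pow2DecayWithLinearWarmup(550, 100, 1000, 0.1, 0.001),   // decaying
              Pow2DecayWithLinearWarmup(2000, 100, 1000, 0.1, 0.001)); // floor
}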
@@ -119,9 +119,9 @@ param = sign(prox\_param) / (1 + learning\_rate * l2) *
        \max(|prox\_param| - learning\_rate * l1, 0)
$$
The paper that proposed Proximal GD:
(http://papers.nips.cc/paper/3793-efficient-learning-using-forward-backward-splitting.pdf)
Here, we use the adagrad learning rate as specified here:
(http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf)
)DOC");
...
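A minimal sketch of the proximal update above for one scalar parameter: soft-thresholding by l1, then shrinking by l2. Names are illustrative:

#include <algorithm>
#include <cmath>
#include <cstdio>

double ProximalStep(double prox_param, double lr, double l1, double l2) {
  double sign = prox_param >= 0 ? 1.0 : -1.0;
  // param = sign(prox_param) / (1 + lr*l2) * max(|prox_param| - lr*l1, 0)
  return sign / (1.0 + lr * l2) *
         std::max(std::fabs(prox_param) - lr * l1, 0.0);
}

int main() {
  std::printf("%g\n", ProximalStep(0.5, 0.1, 1.0, 1.0));   // shrunk toward 0
  std::printf("%g\n", ProximalStep(0.05, 0.1, 1.0, 1.0));  // thresholded to 0
}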