Commit 5d8cdf20 authored by kexinzhao, committed by Yi Wang

Polish operator docs (n to p) (#5376)

* polish p ops

* fix precision_recall

* fix linear_chain_crf_op

* small fix
Parent fb2aa717
@@ -23,21 +23,21 @@ class LinearChainCRFOpMaker : public framework::OpProtoAndCheckerMaker {
                        framework::OpAttrChecker* op_checker)
      : OpProtoAndCheckerMaker(proto, op_checker) {
    AddInput("Emission",
             "(LoDTensor, default LoDTensor<float>) "
             "A 2-D LoDTensor with shape [N x D], where N is the size of the "
             "mini-batch and D is the total tag number. The unscaled emission "
             "weight matrix for the linear chain CRF.");
    AddInput("Transition",
             "(Tensor, default Tensor<float>) A 2-D Tensor with shape "
             "[(D + 2) x D]. The learnable parameter for the linear_chain_crf "
             "operator. See more details in the operator's comments.");
    AddInput("Label",
             "(LoDTensor, default LoDTensor<int>) A LoDTensor with shape "
             "[N x 1], where N is the total element number in a mini-batch. "
             "The ground truth.");
    AddOutput(
        "Alpha",
        "(Tensor, default Tensor<float>) A 2-D Tensor with shape [N x D]. "
        "The forward vectors for the entire batch. Denote it as \f$\alpha\f$. "
        "\f$\alpha\f$ is a memo table used to calculate the normalization "
        "factor in CRF. \f$\alpha[k, v]\f$ stores the unnormalized "
@@ -49,26 +49,28 @@ class LinearChainCRFOpMaker : public framework::OpProtoAndCheckerMaker {
        .AsIntermediate();
    AddOutput(
        "EmissionExps",
        "(Tensor, default Tensor<float>) A 2-D Tensor with shape [N x D]. "
        "The exponentials of Input(Emission). This is an intermediate "
        "computational result in forward computation, and will be reused in "
        "backward computation.")
        .AsIntermediate();
    AddOutput(
        "TransitionExps",
        "(Tensor, default Tensor<float>) A 2-D Tensor with shape "
        "[(D + 2) x D]. The exponentials of Input(Transition). This is an "
        "intermediate computational result in forward computation, and "
        "will be reused in backward computation.")
        .AsIntermediate();
    AddOutput(
        "LogLikelihood",
        "(Tensor, default Tensor<float>) The logarithm of the conditional "
        "likelihood of each training sample in a mini-batch. This is a 2-D "
        "tensor with shape [S x 1], where S is the sequence number in a "
        "mini-batch. Note: The output is no longer a LoDTensor.");
    AddComment(R"DOC(
LinearChainCRF Operator.

Conditional Random Field defines an undirected probabilistic graph with nodes
denoting random variables and edges denoting dependencies between these
variables. CRF learns the conditional probability \f$P(Y|X)\f$, where
@@ -82,29 +84,28 @@ and output must be linear sequences. Thus, the graph of such a CRF is a simple
chain or a line, which results in the linear chain CRF.

This operator implements the Forward-Backward algorithm for the linear chain
CRF. Please refer to http://www.cs.columbia.edu/~mcollins/fb.pdf and
http://cseweb.ucsd.edu/~elkan/250Bwinter2012/loglinearCRFs.pdf for details.

Equation:

1. Denote Input(Emission) to this operator as \f$x\f$ here.
2. The first D values of Input(Transition) to this operator are for starting
weights, denoted as \f$a\f$ here.
3. The next D values of Input(Transition) of this operator are for ending
weights, denoted as \f$b\f$ here.
4. The remaining values of Input(Transition) are for transition weights,
denoted as \f$w\f$ here.
5. Denote Input(Label) as \f$s\f$ here.

The probability of a sequence \f$s\f$ of length \f$L\f$ is defined as:
\f$P(s) = (1/Z) \exp(a_{s_1} + b_{s_L}
                + \sum_{l=1}^L x_{s_l}
                + \sum_{l=2}^L w_{s_{l-1},s_l})\f$
where \f$Z\f$ is a normalization value so that the sum of \f$P(s)\f$ over
all possible sequences is \f$1\f$, and \f$x\f$ is the emission feature weight
to the linear chain CRF.

Finally, the linear chain CRF operator outputs the logarithm of the conditional
likelihood of each training sample in a mini-batch.

NOTE:
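// A minimal, self-contained sketch (illustration only, not PaddlePaddle code)
// that evaluates the sequence probability defined in the comment above by
// brute force: P(s) = (1/Z) exp(a_{s_1} + b_{s_L} + sum_l x_{l, s_l}
// + sum_l w_{s_{l-1}, s_l}), with Z summing exp(score) over all D^L possible
// tag sequences. Real CRF implementations compute Z with the forward
// algorithm in O(L * D^2) instead of this exponential enumeration.
#include <cmath>
#include <cstdio>
#include <vector>

double Score(const std::vector<std::vector<double>>& x,  // emissions [L x D]
             const std::vector<double>& a,                // start weights [D]
             const std::vector<double>& b,                // end weights [D]
             const std::vector<std::vector<double>>& w,   // transitions [D x D]
             const std::vector<int>& s) {                 // tag sequence [L]
  const int L = static_cast<int>(s.size());
  double score = a[s[0]] + b[s[L - 1]];
  for (int l = 0; l < L; ++l) score += x[l][s[l]];
  for (int l = 1; l < L; ++l) score += w[s[l - 1]][s[l]];
  return score;
}

int main() {
  // Tiny example: D = 2 tags, sequence length L = 3.
  std::vector<std::vector<double>> x = {{0.5, 0.1}, {0.2, 0.9}, {0.3, 0.3}};
  std::vector<double> a = {0.1, 0.0}, b = {0.0, 0.2};
  std::vector<std::vector<double>> w = {{0.4, -0.1}, {-0.2, 0.3}};

  // Brute-force normalizer Z: enumerate all D^L = 8 tag sequences.
  double Z = 0.0;
  for (int code = 0; code < 8; ++code) {
    std::vector<int> t = {(code / 4) % 2, (code / 2) % 2, code % 2};
    Z += std::exp(Score(x, a, b, w, t));
  }

  std::vector<int> s = {0, 1, 1};
  // The operator outputs this log-likelihood per sequence in the mini-batch.
  std::printf("log P(s) = %f\n", Score(x, a, b, w, s) - std::log(Z));
  return 0;
}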
@@ -48,12 +48,17 @@ class NCCLInitOpMaker : public framework::OpProtoAndCheckerMaker {
      : OpProtoAndCheckerMaker(proto, op_checker) {
    AddOutput("Communicator",
              "Create Communicator for communicating between GPUs");
    AddAttr<std::vector<int>>("gpus", "(vector<int>) GPU id lists");
    AddAttr<int>("data_type",
                 "(int, default 5 (FP32)) "
                 "Output data type")
        .SetDefault(framework::DataType::FP32);
    AddComment(R"DOC(
NCCLInit Operator.

Create communicator.

)DOC");
  }
};
@@ -143,11 +148,15 @@ class NCCLAllReduceOpMaker : public framework::OpProtoAndCheckerMaker {
    AddInput("Communicator", "Communicator for communicating between GPUs");
    AddOutput("Out", "The output of AllReduce op");
    AddAttr<std::string>("reduction",
                         "(string, default 'ncclSum') "
                         "{'ncclMin', 'ncclMax', 'ncclProd', 'ncclSum'}.")
        .SetDefault("ncclSum");
    AddComment(R"DOC(
NCCLAllReduce Operator.

AllReduce the input tensors.

)DOC");
  }
};
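// A minimal CPU sketch (illustration only, not the NCCL API) of the semantics
// behind the 'reduction' attribute above: every participant contributes a
// buffer, the buffers are combined elementwise with the chosen reduction, and
// every participant receives the identical result.
#include <cstdio>
#include <functional>
#include <vector>

void AllReduce(std::vector<std::vector<float>>* bufs,
               const std::function<float(float, float)>& reduce) {
  std::vector<float> out = (*bufs)[0];
  for (size_t r = 1; r < bufs->size(); ++r)  // combine all ranks
    for (size_t i = 0; i < out.size(); ++i)
      out[i] = reduce(out[i], (*bufs)[r][i]);
  for (auto& b : *bufs) b = out;             // replicate to all ranks
}

int main() {
  // Three "GPUs", each holding a 2-element tensor.
  std::vector<std::vector<float>> bufs = {{1, 5}, {2, 6}, {3, 7}};
  AllReduce(&bufs, [](float a, float b) { return a + b; });  // 'ncclSum'
  std::printf("%g %g\n", bufs[0][0], bufs[0][1]);            // prints: 6 18
  return 0;
}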
@@ -161,14 +170,20 @@ class NCCLReduceOpMaker : public framework::OpProtoAndCheckerMaker {
    AddInput("Communicator", "Communicator for communicating between GPUs");
    AddOutput("Out", "The output of Reduce op");
    AddAttr<std::string>("reduction",
                         "(string, default 'ncclSum') "
                         "{'ncclMin', 'ncclMax', 'ncclProd', 'ncclSum'}.")
        .SetDefault("ncclSum");
    AddAttr<int>("root",
                 "(int, default kInvalidGPUId) "
                 "Root GPU of the parameter. If not set "
                 "(platform::kInvalidGPUId), it is hashed by name.")
        .SetDefault(platform::kInvalidGPUId);
    AddComment(R"DOC(
NCCLReduce Operator.

Reduce the tensors.

)DOC");
  }
};
@@ -182,12 +197,16 @@ class NCCLBcastOpMaker : public framework::OpProtoAndCheckerMaker {
    AddInput("Communicator", "Communicator for communicating between GPUs");
    AddOutput("Out", "The output of Bcast");
    AddAttr<int>("root",
                 "(int, default kInvalidGPUId) "
                 "Root GPU of the parameter. If not set "
                 "(platform::kInvalidGPUId), it is hashed by name.")
        .SetDefault(platform::kInvalidGPUId);
    AddComment(R"DOC(
NCCLBcast Operator.

Bcast the tensors.

)DOC");
  }
};
@@ -54,41 +54,44 @@ class PadOpMaker : public framework::OpProtoAndCheckerMaker {
             "The input of pad op. "
             "The input should be a k-D tensor(k > 0 and k < 7)");
    AddOutput("Out",
              "The output of pad op. "
              "A tensor with the same shape as X.");
    AddAttr<std::vector<int>>(
        "paddings",
        "(vector<int>) "
        "A list<int> to describe the padding rules for each dimension. "
        "For 2-D image tensor, paddings=[0, 1, 2, 3] means "
        "padding 0 row to top, 1 row to bottom, 2 columns to left "
        "and 3 columns to right. Size of paddings should be equal to "
        "2 * dimension size of the input tensor.");
    AddAttr<float>("pad_value",
                   "(float, default 0.0) "
                   "The value to fill the padded areas.")
        .SetDefault(0.0f);
    AddComment(R"DOC(
Pad Operator.

Pad input into output, as specified by paddings and pad_value.
The input should be a k-D tensor(k > 0 and k < 7). As an example:

Given:

X = [[1, 2],
     [3, 4]],

paddings = [0, 1, 1, 2],

and

pad_value = 0,

we have:

Out = [[0, 1, 2, 0, 0]
       [0, 3, 4, 0, 0]
       [0, 0, 0, 0, 0]]

)DOC");
  }
};
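// A minimal sketch (illustration only, not the operator's actual kernel) of
// the 2-D padding rule described above: paddings = [top, bottom, left, right]
// and pad_value fills the new cells. It reproduces the Out matrix from the
// example in the comment.
#include <cstdio>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

Matrix Pad2D(const Matrix& x, const std::vector<int>& paddings,
             float pad_value) {
  const int top = paddings[0], bottom = paddings[1];
  const int left = paddings[2], right = paddings[3];
  const int h = static_cast<int>(x.size()), w = static_cast<int>(x[0].size());
  Matrix out(h + top + bottom,
             std::vector<float>(w + left + right, pad_value));
  for (int i = 0; i < h; ++i)  // copy X into the interior region
    for (int j = 0; j < w; ++j) out[i + top][j + left] = x[i][j];
  return out;
}

int main() {
  Matrix x = {{1, 2}, {3, 4}};
  Matrix out = Pad2D(x, /*paddings=*/{0, 1, 1, 2}, /*pad_value=*/0.f);
  for (const auto& row : out) {  // prints the 3 x 5 Out matrix above
    for (float v : row) std::printf("%g ", v);
    std::printf("\n");
  }
  return 0;
}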
@@ -73,125 +73,138 @@ Pool2dOpMaker::Pool2dOpMaker(framework::OpProto *proto,
  AddInput(
      "X",
      "(Tensor) The input tensor of pooling operator. "
      "The format of input tensor is NCHW, where N is batch size, C is the "
      "number of channels, H is the height of the feature, "
      "and W is the width of the feature.");
  AddOutput("Out",
            "(Tensor) The output tensor of pooling operator. "
            "The format of output tensor is also NCHW, "
            "where N is batch size, C is the number of channels, "
            "H is the height of the feature, "
            "and W is the width of the feature.");
  AddAttr<std::string>("poolingType",
                       "(string) Pooling type, can be \"max\" for max-pooling "
                       "and \"avg\" for average-pooling.")
      .InEnum({"max", "avg"});
  AddAttr<std::vector<int>>("ksize",
                            "(vector<int>) The pooling window "
                            "size(height, width) of the pooling operator. "
                            "If globalPooling = true, ksize and paddings will "
                            "be ignored.");  // TODO(Chengduo): Add checker.
                                             // (Currently,
                                             // TypedAttrChecker don't support vector type.)
  AddAttr<bool>("globalPooling",
                "(bool, default false) Whether to use the global pooling. "
                "If globalPooling = true, ksize and paddings will be ignored.")
      .SetDefault(false);
  AddAttr<std::vector<int>>("strides",
                            "(vector<int>, default {1, 1}), strides(height, "
                            "width) of pooling operator.")
      .SetDefault({1, 1});  // TODO(Chengduo): Add checker. (Currently,
                            // TypedAttrChecker don't support vector type.)
  AddAttr<std::vector<int>>(
      "paddings",
      "(vector<int>, default {0,0}), paddings(height, width) of pooling "
      "operator. "
      "If globalPooling = true, paddings and ksize will be ignored.")
      .SetDefault({0, 0});  // TODO(Chengduo): Add checker. (Currently,
                            // TypedAttrChecker don't support vector type.)
  AddComment(R"DOC(
Pool2d Operator.

The pooling2d operation calculates the output based on
the input, poolingType and ksize, strides, paddings parameters.
Input(X) and output(Out) are in NCHW format, where N is batch size, C is the
number of channels, H is the height of the feature, and W is the width of the feature.
Parameters(ksize, strides, paddings) are two elements.
These two elements represent height and width, respectively.
The input(X) size and output(Out) size may be different.

Example:
  Input:
       X shape: $(N, C, H_{in}, W_{in})$
  Output:
       Out shape: $(N, C, H_{out}, W_{out})$
  where
       $$
       H_{out} = (H_{in} - ksize[0] + 2 * paddings[0]) / strides[0] + 1 \\
       W_{out} = (W_{in} - ksize[1] + 2 * paddings[1]) / strides[1] + 1
       $$

)DOC");
}
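// A small standalone sketch (illustration only) of the output-shape formula in
// the Pool2d comment above; the same per-dimension rule extends to Pool3d and
// to the max-pool-with-index operators further below.
#include <cstdio>

// One spatial dimension: out = (in - ksize + 2 * padding) / stride + 1.
int PooledSize(int in, int ksize, int padding, int stride) {
  return (in - ksize + 2 * padding) / stride + 1;
}

int main() {
  // H_in = 7, W_in = 7, ksize = {3, 3}, paddings = {1, 1}, strides = {2, 2}.
  std::printf("H_out = %d, W_out = %d\n",
              PooledSize(7, 3, 1, 2),   // (7 - 3 + 2) / 2 + 1 = 4
              PooledSize(7, 3, 1, 2));  // 4
  return 0;
}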
Pool3dOpMaker::Pool3dOpMaker(framework::OpProto *proto,
                             framework::OpAttrChecker *op_checker)
    : OpProtoAndCheckerMaker(proto, op_checker) {
  AddInput("X",
           "(Tensor) The input tensor of pooling operator. "
           "The format of input tensor is NCDHW, where N is batch size, C is "
           "the number of channels, and D, H and W are the depth, height and "
           "width of the feature, respectively.");
  AddOutput("Out",
            "(Tensor) The output tensor of pooling operator. "
            "The format of output tensor is also NCDHW, "
            "where N is batch size, C is "
            "the number of channels, and D, H and W are the depth, height and "
            "width of the feature, respectively.");
  AddAttr<std::string>("poolingType",
                       "(string) Pooling type, can be \"max\" for max-pooling "
                       "and \"avg\" for average-pooling.")
      .InEnum({"max", "avg"});
  AddAttr<std::vector<int>>(
      "ksize",
      "(vector<int>) The pooling window size(depth, height, "
      "width) of pooling operator. "
      "If globalPooling = true, ksize and paddings will "
      "be ignored.");  // TODO(Chengduo): Add checker.
                       // (Currently,
                       // TypedAttrChecker don't support vector type.)
  AddAttr<bool>("globalPooling",
                "(bool, default false) Whether to use the global pooling. "
                "If globalPooling = true, ksize and paddings will be ignored.")
      .SetDefault(false);
  AddAttr<std::vector<int>>(
      "strides",
      "(vector<int>, default {1,1,1}) Strides(depth, height, "
      "width) of the pooling operator.")
      .SetDefault({1, 1, 1});  // TODO(Chengduo): Add checker. (Currently,
                               // TypedAttrChecker don't support vector type.)
  AddAttr<std::vector<int>>(
      "paddings",
      "(vector<int>, default {0,0,0}), paddings(depth, height, "
      "width) of pooling operator. "
      "If globalPooling = true, ksize and paddings will be ignored.")
      .SetDefault({0, 0, 0});  // TODO(Chengduo): Add checker. (Currently,
                               // TypedAttrChecker don't support vector type.)
  AddComment(R"DOC(
Pool3d Operator.

The pooling3d operation calculates the output based on
the input, poolingType, ksize, strides, and paddings parameters.
Input(X) and output(Out) are in NCDHW format, where N is batch
size, C is the number of channels, and D, H and W are the depth, height and
width of the feature, respectively. Parameters(ksize, strides, paddings)
are three elements. These three elements represent depth, height and
width, respectively. The input(X) size and output(Out) size may be different.

Example:
  Input:
       X shape: $(N, C, D_{in}, H_{in}, W_{in})$
  Output:
       Out shape: $(N, C, D_{out}, H_{out}, W_{out})$
  where
       $$
       D_{out} = (D_{in} - ksize[0] + 2 * paddings[0]) / strides[0] + 1 \\
       H_{out} = (H_{in} - ksize[1] + 2 * paddings[1]) / strides[1] + 1 \\
       W_{out} = (W_{in} - ksize[2] + 2 * paddings[2]) / strides[2] + 1
       $$

)DOC");
}

}  // namespace operators
@@ -89,64 +89,73 @@ class MaxPool2dWithIndexOpMaker : public framework::OpProtoAndCheckerMaker {
      : OpProtoAndCheckerMaker(proto, op_checker) {
    AddInput(
        "X",
        "(Tensor) The input tensor of pooling operator. "
        "The format of input tensor is NCHW, where N is batch size, C is the "
        "number of channels, H is the height of the image, "
        "and W is the width of the image.");
    AddOutput("Out",
              "(Tensor) The output tensor of pooling operator. "
              "The format of output tensor is also NCHW, "
              "where N is batch size, C is "
              "the number of channels, H is the height of the image "
              "and W is the width of the image.");
    AddOutput("Mask",
              "(Tensor) The Mask tensor of pooling operator. "
              "The format of output tensor is also NCHW, "
              "where N is batch size, C is the number of channels, "
              "H is the height of the image, "
              "and W is the width of the image. "
              "It represents the index in the current feature map.");
    AddAttr<std::vector<int>>("ksize",
                              "(vector<int>) The pooling window size(height, "
                              "width) of pooling operator. "
                              "If globalPooling = true, ksize and paddings "
                              "will be ignored.");  // TODO(Chengduo): Add
                                                    // checker. (Currently,
                                                    // TypedAttrChecker don't support vector type.)
    AddAttr<bool>(
        "globalPooling",
        "(bool, default false) Whether to use the global pooling. "
        "If globalPooling = true, ksize and paddings will be ignored.")
        .SetDefault(false);
    AddAttr<std::vector<int>>("strides",
                              "(vector<int>, default {1, 1}), strides(height, "
                              "width) of pooling operator.")
        .SetDefault({1, 1});  // TODO(Chengduo): Add checker. (Currently,
                              // TypedAttrChecker don't support vector type.)
    AddAttr<std::vector<int>>(
        "paddings",
        "(vector<int>, default {0, 0}), paddings(height, width) of pooling "
        "operator. "
        "If globalPooling = true, paddings and ksize will be ignored.")
        .SetDefault({0, 0});  // TODO(Chengduo): Add checker. (Currently,
                              // TypedAttrChecker don't support vector type.)
    AddComment(R"DOC(
MaxPool2d Operator.

The maxPooling2d with index operation calculates the output and the mask
based on the input, ksize, strides, and paddings parameters. Input(X) and
output(Out, Mask) are in NCHW format, where N is batch size, C is the
number of channels, H is the height of the feature,
and W is the width of the feature.
Parameters(ksize, strides, paddings) are two elements.
These two elements represent height and width, respectively.
The input(X) size and output(Out, Mask) size may be different.

Example:
  Input:
       X shape: $(N, C, H_{in}, W_{in})$
  Output:
       Out shape: $(N, C, H_{out}, W_{out})$
       Mask shape: $(N, C, H_{out}, W_{out})$
  where
       $$
       H_{out} = (H_{in} - ksize[0] + 2 * paddings[0]) / strides[0] + 1 \\
       W_{out} = (W_{in} - ksize[1] + 2 * paddings[1]) / strides[1] + 1
       $$

)DOC");
  }
};
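// A minimal CPU sketch (illustration only, single channel, no padding) of the
// Out/Mask pair described above: Out holds the window maxima and Mask holds
// the flattened index of each maximum in the input feature map.
#include <cstdio>
#include <vector>

void MaxPoolWithIndex(const std::vector<float>& x, int h, int w,
                      int ksize, int stride,
                      std::vector<float>* out, std::vector<int>* mask) {
  const int h_out = (h - ksize) / stride + 1;
  const int w_out = (w - ksize) / stride + 1;
  out->assign(h_out * w_out, 0.f);
  mask->assign(h_out * w_out, 0);
  for (int i = 0; i < h_out; ++i)
    for (int j = 0; j < w_out; ++j) {
      int best = (i * stride) * w + (j * stride);
      for (int di = 0; di < ksize; ++di)
        for (int dj = 0; dj < ksize; ++dj) {
          int idx = (i * stride + di) * w + (j * stride + dj);
          if (x[idx] > x[best]) best = idx;
        }
      (*out)[i * w_out + j] = x[best];
      (*mask)[i * w_out + j] = best;  // index into the input feature map
    }
}

int main() {
  // 4 x 4 input, ksize = 2, stride = 2  ->  2 x 2 Out and Mask.
  std::vector<float> x = {1, 2, 5, 6,
                          3, 4, 7, 8,
                          9, 1, 2, 3,
                          4, 5, 6, 7};
  std::vector<float> out;
  std::vector<int> mask;
  MaxPoolWithIndex(x, 4, 4, 2, 2, &out, &mask);
  for (size_t k = 0; k < out.size(); ++k)
    std::printf("out=%g mask=%d\n", out[k], mask[k]);  // 4@5, 8@7, 9@8, 7@15
  return 0;
}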
@@ -156,70 +165,76 @@ class MaxPool3dWithIndexOpMaker : public framework::OpProtoAndCheckerMaker {
  MaxPool3dWithIndexOpMaker(framework::OpProto *proto,
                            framework::OpAttrChecker *op_checker)
      : OpProtoAndCheckerMaker(proto, op_checker) {
    AddInput("X",
             "(Tensor) The input tensor of pooling operator. "
             "The format of input tensor is NCDHW, where N is batch size, C is "
             "the number of channels, and D, H and W are the depth, height and "
             "width of the image, respectively.");
    AddOutput("Out",
              "(Tensor) The output tensor of pooling operator. "
              "The format of output tensor is also NCDHW, "
              "where N is the batch size, C is the number of channels, "
              "and D, H and W are the depth, height and "
              "width of the image, respectively.");
    AddOutput("Mask",
              "(Tensor) The Mask tensor of pooling operator. "
              "The format of output tensor is also NCDHW, "
              "where N is the batch size, C is the number of channels, and "
              "D, H and W are the depth, height and width "
              "of the image, respectively. "
              "It represents the index in the current feature map.");
    AddAttr<std::vector<int>>("ksize",
                              "(vector<int>) The pooling window size(depth, "
                              "height, width) of pooling operator. "
                              "If globalPooling = true, ksize and paddings "
                              "will be ignored.");  // TODO(Chengduo): Add
                                                    // checker. (Currently,
                                                    // TypedAttrChecker don't support vector type.)
    AddAttr<bool>(
        "globalPooling",
        "(bool, default false) Whether to use the global pooling. "
        "If globalPooling = true, ksize and paddings will be ignored.")
        .SetDefault(false);
    AddAttr<std::vector<int>>("strides",
                              "(vector<int>, default {1,1,1}), strides(depth, "
                              "height, width) of pooling operator.")
        .SetDefault({1, 1, 1});  // TODO(Chengduo): Add checker. (Currently,
                                 // TypedAttrChecker don't support vector type.)
    AddAttr<std::vector<int>>(
        "paddings",
        "(vector<int>, default {0,0,0}), paddings(depth, "
        "height, width) of pooling operator. "
        "If globalPooling = true, paddings and ksize will be ignored.")
        .SetDefault({0, 0, 0});  // TODO(Chengduo): Add checker. (Currently,
                                 // TypedAttrChecker don't support vector type.)
    AddComment(R"DOC(
MaxPool3d Operator.

The maxpooling3d with index operation calculates the output and the mask
based on the input and ksize, strides, paddings parameters.
Input(X) and output(Out, Mask) are in NCDHW format, where N is batch
size, C is the number of channels, and D, H and W are the depth, height and
width of the feature, respectively.
Parameters(ksize, strides, paddings) are three elements.
These three elements represent depth, height and width, respectively.
The input(X) size and output(Out, Mask) size may be different.

Example:
  Input:
       X shape: $(N, C, D_{in}, H_{in}, W_{in})$
  Output:
       Out shape: $(N, C, D_{out}, H_{out}, W_{out})$
       Mask shape: $(N, C, D_{out}, H_{out}, W_{out})$
  where
       $$
       D_{out} = (D_{in} - ksize[0] + 2 * paddings[0]) / strides[0] + 1 \\
       H_{out} = (H_{in} - ksize[1] + 2 * paddings[1]) / strides[1] + 1 \\
       W_{out} = (W_{in} - ksize[2] + 2 * paddings[2]) / strides[2] + 1
       $$

)DOC");
  }
};
@@ -92,76 +92,78 @@ class PrecisionRecallOpMaker : public framework::OpProtoAndCheckerMaker {
                         framework::OpAttrChecker *op_checker)
      : OpProtoAndCheckerMaker(proto, op_checker) {
    AddInput("MaxProbs",
             "(Tensor, default Tensor<float>) A 2-D tensor with shape N x 1, "
             "where N is the batch size. Each row contains the max probability "
             "of an instance computed by the previous top_k (k=1) "
             "operator.");
    AddInput("Indices",
             "(Tensor, default Tensor<int>) A 2-D tensor with shape N x 1, "
             "where N is the batch size. Each row contains the corresponding "
             "index computed by the previous top_k (k=1) operator.");
    AddInput("Labels",
             "(Tensor, default Tensor<int>) A 2-D tensor with shape N x 1, "
             "where N is the batch size. Each element is a label and the "
             "value should be in [0, class_number - 1].");
    AddInput("Weights",
             "(Tensor, default Tensor<float>) A 2-D tensor with shape N x 1, "
             "where N is the batch size. This input is optional. If provided, "
             "the weight of each instance will be considered when computing "
             "metrics.")
        .AsDispensable();
    AddInput("StatesInfo",
             "(Tensor, default Tensor<int>) A 2-D tensor with shape D x 4, "
             "where D is the number of classes. This input is optional. If "
             "provided, the current state will be accumulated to this state "
             "and the accumulated state will be the output state.")
        .AsDispensable();
    AddOutput("BatchMetrics",
              "(Tensor, default Tensor<float>) A 1-D tensor with shape {6}. "
              "This output tensor contains metrics for current batch data. "
              "The layout is [macro average precision, macro average recall, "
              "macro f1 score, micro average precision, micro average recall, "
              "micro f1 score].");
    AddOutput("AccumMetrics",
              "(Tensor, default Tensor<float>) A 1-D tensor with shape {6}. "
              "This output tensor contains metrics for accumulated data. "
              "The layout is [macro average precision, macro average recall, "
              "macro f1 score, micro average precision, micro average recall, "
              "micro f1 score].");
    AddOutput("AccumStatesInfo",
              "(Tensor, default Tensor<float>) A 2-D tensor with shape D x 4, "
              "where D is equal to the class number. This output tensor "
              "contains accumulated state variables used to compute metrics. "
              "The layout for each class is [true positives, false positives, "
              "true negatives, false negatives].");
    AddAttr<int>("class_number", "(int) Number of classes to be evaluated.");
    AddComment(R"DOC(
Precision Recall Operator.

When given Input(Indices) and Input(Labels), this operator can be used
to compute various metrics including:
1. macro average precision
2. macro average recall
3. macro f1 score
4. micro average precision
5. micro average recall
6. micro f1 score

To compute the above metrics, we need to do statistics for true positives,
false positives and false negatives. Here the count of true negatives is not
necessary, but counting it may provide potential usage and the cost is
trivial, so the operator also provides the count of true negatives.

We define state as a 2-D tensor with shape [class_number, 4]. Each row of a
state contains statistic variables for the corresponding class. The layout of
each row is: TP(true positives), FP(false positives), TN(true negatives),
FN(false negatives). If Input(Weights) is provided, TP, FP, TN, FN will be
calculated by the given weight instead of the instance count.

This operator also supports metrics computing for cross-batch situations. To
achieve this, Input(StatesInfo) should be provided. The state of the current
batch data will be accumulated to Input(StatesInfo), and
Output(AccumStatesInfo) is the accumulation state.

Output(BatchMetrics) is the metrics of the current batch data while
Output(AccumStatesInfo) is the metrics of the accumulated data.

)DOC");
  }
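// A small standalone sketch (illustration only) of how the six BatchMetrics
// values above follow from the per-class [TP, FP, TN, FN] state rows: macro
// metrics average the per-class precision/recall, while micro metrics pool
// the raw counts before dividing.
#include <cstdio>
#include <vector>

struct State { double tp, fp, tn, fn; };  // one row per class

int main() {
  std::vector<State> states = {{10, 2, 80, 8}, {5, 5, 85, 5}};  // D = 2

  double macro_p = 0, macro_r = 0, tp = 0, fp = 0, fn = 0;
  for (const State& s : states) {
    macro_p += s.tp / (s.tp + s.fp);  // per-class precision
    macro_r += s.tp / (s.tp + s.fn);  // per-class recall
    tp += s.tp; fp += s.fp; fn += s.fn;
  }
  macro_p /= states.size();
  macro_r /= states.size();
  double macro_f1 = 2 * macro_p * macro_r / (macro_p + macro_r);

  double micro_p = tp / (tp + fp);
  double micro_r = tp / (tp + fn);
  double micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r);

  // Layout matches BatchMetrics:
  // [macro P, macro R, macro F1, micro P, micro R, micro F1].
  std::printf("%f %f %f %f %f %f\n",
              macro_p, macro_r, macro_f1, micro_p, micro_r, micro_f1);
  return 0;
}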
@@ -41,17 +41,24 @@ class PReluOpMaker : public framework::OpProtoAndCheckerMaker {
  PReluOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker)
      : OpProtoAndCheckerMaker(proto, op_checker) {
    AddInput("X", "The input tensor of prelu operator.");
    AddInput("Alpha", "The alpha weight of prelu operator.");
    AddOutput("Out", "The output tensor of prelu operator.");
    AddComment(R"DOC(
PRelu Operator.

The equation is:

$$
f(x) =
\begin{cases}
\alpha * x, \quad \text{if} \ x < 0 \\
x,          \qquad \text{if} \ x >= 0
\end{cases}
$$

The input `X` can carry the LoD (Level of Details) information,
or not. And the output shares the LoD information with input `X`.

)DOC");
  }
};
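// A one-line reference implementation (illustration only) of the PRelu
// equation above, applied elementwise with a single shared alpha.
#include <cstdio>

float PRelu(float x, float alpha) {
  return x >= 0.f ? x : alpha * x;  // f(x) = x for x >= 0, alpha * x otherwise
}

int main() {
  const float alpha = 0.25f;
  std::printf("%g %g %g\n", PRelu(-2.f, alpha), PRelu(0.f, alpha),
              PRelu(3.f, alpha));  // prints: -0.5 0 3
  return 0;
}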
@@ -83,22 +83,26 @@ class ProximalAdagradOpMaker : public framework::OpProtoAndCheckerMaker {
                   "L1 regularization strength.")
        .SetDefault(0.0f);
    AddAttr<float>("l2",
                   "(float, default 0.0) "
                   "L2 regularization strength.")
        .SetDefault(0.0f);
    AddComment(R"DOC(
Proximal Adagrad Optimizer.

Optimizer that implements the proximal adagrad algorithm:

$$
moment = moment + grad * grad \\
prox\_param = param - learning\_rate * grad * (1 / \sqrt{moment}) \\
param = sign(prox\_param) / (1 + learning\_rate * l2) *
        \max(|prox\_param| - learning\_rate * l1, 0)
$$

The paper that proposed Proximal GD:
(http://papers.nips.cc/paper/3793-efficient-learning-using-forward-backward-splitting.pdf)
Here, we use the adagrad learning rate as specified here:
(http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf)

)DOC");
  }
};
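// A scalar sketch (illustration only) of one Proximal Adagrad step as written
// above; dropping the moment accumulation and the 1/sqrt(moment) scaling
// (i.e. using the raw gradient) gives the ProximalGD update below.
#include <algorithm>
#include <cmath>
#include <cstdio>

void ProximalAdagradStep(double* param, double* moment, double grad,
                         double lr, double l1, double l2) {
  *moment += grad * grad;  // accumulate squared gradients (adagrad)
  double prox_param = *param - lr * grad / std::sqrt(*moment);
  double sign = prox_param > 0 ? 1.0 : (prox_param < 0 ? -1.0 : 0.0);
  // Soft-threshold by l1, then shrink by l2.
  *param = sign / (1.0 + lr * l2) *
           std::max(std::fabs(prox_param) - lr * l1, 0.0);
}

int main() {
  double param = 1.0, moment = 0.0;
  ProximalAdagradStep(&param, &moment, /*grad=*/0.5, /*lr=*/0.1,
                      /*l1=*/0.01, /*l2=*/0.001);
  std::printf("param = %f, moment = %f\n", param, moment);
  return 0;
}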
@@ -67,19 +67,23 @@ class ProximalGDOpMaker : public framework::OpProtoAndCheckerMaker {
                   "L1 regularization strength.")
        .SetDefault(0.0f);
    AddAttr<float>("l2",
                   "(float, default 0.0) "
                   "L2 regularization strength.")
        .SetDefault(0.0f);
    AddComment(R"DOC(
ProximalGD Operator.

Optimizer that implements the proximal gradient descent algorithm:

$$
prox\_param = param - learning\_rate * grad \\
param = sign(prox\_param) / (1 + learning\_rate * l2) *
        \max(|prox\_param| - learning\_rate * l1, 0)
$$

The paper that proposed Proximal Gradient Descent:
(http://papers.nips.cc/paper/3793-efficient-learning-using-forward-backward-splitting.pdf)

)DOC");
  }
};