Commit 2ac5d7d0 authored by kavyasrinet, committed by Yi Wang

Fixing documentation for operators (#5373)

* Adding documentation for seq_expand

* Adding documentation for seq_concat_op

* Adding documentation for sequence_conv

* Adding sequence_pool

* Fixing review comment

* Adding sequence_softmax

* Updating doc for sigmoid_cross_entropy_with_logits
Parent e65ab795
@@ -53,8 +53,10 @@ class SeqExpandOpMaker : public framework::OpProtoAndCheckerMaker {
              "(LodTensor)The output of seq_expand op."
              "The lod of output will be as same as input(Y)'s lod.");
    AddComment(R"DOC(
Seq Expand Operator.

This operator expands input(X) according to the LoD of input(Y).
Following are cases to better explain how this works:

Case 1:

Given a 2-level LoDTensor input(X)
......
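The case walkthrough is truncated in this view, but the basic expansion rule can still be illustrated. Below is a minimal plain-C++ sketch, assuming the simple case where row i of X is repeated (lod[i+1] - lod[i]) times according to Y's LoD; expand_by_lod is a hypothetical helper for illustration, not PaddlePaddle's actual kernel.

// Hypothetical sketch: expand each row of x according to the segment
// lengths implied by y_lod (row offsets into Y). Row i of x is repeated
// (y_lod[i+1] - y_lod[i]) times, mirroring the simple seq_expand case.
#include <cstdio>
#include <vector>

std::vector<std::vector<float>> expand_by_lod(
    const std::vector<std::vector<float>>& x,
    const std::vector<int>& y_lod) {
  std::vector<std::vector<float>> out;
  for (size_t i = 0; i + 1 < y_lod.size(); ++i) {
    int repeats = y_lod[i + 1] - y_lod[i];
    for (int r = 0; r < repeats; ++r) out.push_back(x[i]);
  }
  return out;
}

int main() {
  // X has 2 rows; Y's LoD {0, 2, 5} asks for 2 and 3 copies respectively.
  std::vector<std::vector<float>> x = {{1.f, 2.f}, {3.f, 4.f}};
  std::vector<int> y_lod = {0, 2, 5};
  for (const auto& row : expand_by_lod(x, y_lod))
    std::printf("[%g, %g]\n", row[0], row[1]);
  return 0;
}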
@@ -68,11 +68,12 @@ class SequenceConcatOpMaker : public framework::OpProtoAndCheckerMaker {
                 "The level should be less than the level number of inputs.")
        .SetDefault(0);
    AddComment(R"DOC(
Sequence Concat Operator.

The sequence_concat operator concatenates multiple LoDTensors.
It supports a sequence (LoDTensor with a level number of 1)
or a nested sequence (LoDTensor with a level number of 2) as its input.
The following examples explain how the operator works:

- Case1:
  If the axis is other than 0 (here, axis is 1 and level is 1),
  each input should have the same LoD information and the LoD
@@ -98,6 +99,7 @@ or a nested sequence (LoD tensor with level number is 2) as its input.
    LoD(Out) = {{0,5,9}, {0,2,5,7,9}}; Dims(Out) = (9,3,4)

NOTE: The levels of all the inputs should be the same.
)DOC");
  }
};
......
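To complement the LoD(Out) example above, here is a minimal plain-C++ sketch of axis-0 sequence concat, where the i-th output sequence is the i-th sequence of the first input followed by the i-th sequence of the second; concat_axis0 is a hypothetical helper for illustration, not the operator's actual implementation.

// Hypothetical sketch of axis-0 sequence concat: lod vectors hold row
// offsets; rows are feature vectors. Output sequence i = seq i of a,
// then seq i of b, and the output LoD records the merged offsets.
#include <cstdio>
#include <vector>

using Rows = std::vector<std::vector<float>>;

void concat_axis0(const Rows& a, const std::vector<int>& lod_a,
                  const Rows& b, const std::vector<int>& lod_b,
                  Rows* out, std::vector<int>* lod_out) {
  lod_out->push_back(0);
  for (size_t i = 0; i + 1 < lod_a.size(); ++i) {
    for (int r = lod_a[i]; r < lod_a[i + 1]; ++r) out->push_back(a[r]);
    for (int r = lod_b[i]; r < lod_b[i + 1]; ++r) out->push_back(b[r]);
    lod_out->push_back(static_cast<int>(out->size()));
  }
}

int main() {
  Rows a = {{1}, {2}, {3}};            // two sequences: {1,2} and {3}
  std::vector<int> lod_a = {0, 2, 3};
  Rows b = {{4}, {5}, {6}};            // two sequences: {4} and {5,6}
  std::vector<int> lod_b = {0, 1, 3};
  Rows out;
  std::vector<int> lod_out;
  concat_axis0(a, lod_a, b, lod_b, &out, &lod_out);
  // Expected rows: 1 2 4 3 5 6, with output LoD {0, 3, 6}.
  for (const auto& row : out) std::printf("%g ", row[0]);
  std::printf("\nLoD: ");
  for (int o : lod_out) std::printf("%d ", o);
  std::printf("\n");
  return 0;
}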
@@ -105,10 +105,10 @@ class SequenceConvOpMaker : public framework::OpProtoAndCheckerMaker {
      : OpProtoAndCheckerMaker(proto, op_checker) {
    AddInput(
        "X",
        "(LoDTensor) the input(X) is a LodTensor, which supports "
        "variable-time length input sequence. The underlying tensor in "
        "this LoDTensor is a matrix with shape (T, N), where T is the "
        "total time steps in this mini-batch and N is the input_hidden_size.");
    AddInput("PaddingData",
             "(Tensor, optional) the input(PaddingData) is an optional "
             "parameter, and it is learnable. "
@@ -157,14 +157,16 @@ class SequenceConvOpMaker : public framework::OpProtoAndCheckerMaker {
        .GreaterThan(0);
    AddComment(R"DOC(
Sequence Conv Operator.

SequenceConvOp performs a convolution operation on features of contextLength
time-steps of each instance. The convolution operation calculates the output
based on the input, filter, strides and paddings parameters.
The size of each dimension of the parameters is checked during infer-shape.
In order to ensure the equal length of sequence before and after convolution,
it is necessary to fill the top and bottom of each sequence based on
context_length, context_stride and context_start.
)DOC");
  }
};
......
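The padding rule described above can be illustrated with a minimal plain-C++ sketch of the context-window gathering step; the actual operator then multiplies the gathered matrix by the filter, which is omitted here, and the window parameters are example values, not defaults.

// Hypothetical sketch of the context-window step behind sequence conv:
// for each time-step t, gather context_length values starting at
// t + context_start, filling with zeros where the window runs past the
// sequence boundary (this is the "fill the top and bottom" above).
#include <cstdio>
#include <vector>

int main() {
  const int context_start = -1, context_length = 3;  // window [t-1, t+1]
  std::vector<float> seq = {1.f, 2.f, 3.f, 4.f};     // one sequence, N = 1
  for (size_t t = 0; t < seq.size(); ++t) {
    std::printf("t=%zu:", t);
    for (int c = 0; c < context_length; ++c) {
      int src = static_cast<int>(t) + context_start + c;
      float v = (src >= 0 && src < static_cast<int>(seq.size()))
                    ? seq[src] : 0.f;  // zero padding at the boundaries
      std::printf(" %g", v);
    }
    std::printf("\n");
  }
  return 0;
}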
@@ -45,33 +45,36 @@ class SequencePoolOpMaker : public framework::OpProtoAndCheckerMaker {
        .SetDefault("AVERAGE")
        .InEnum({"AVERAGE", "SUM", "SQRT", "LAST", "FIRST", "MAX"});
    AddComment(R"DOC(
Sequence Pool Operator.

The SequencePoolOp pools features of all time-steps of each instance.
It supports six pooling types:
1. AVERAGE: Out[i] = $$avg(X_i)$$
2. SUM: Out[i] = $$\sum_j X_{ij}$$
3. SQRT: Out[i] = $$\frac{\sum_j X_{ij}}{\sqrt{len(X_i)}}$$
4. LAST: Out[i] = last instance in i-th sequence X[i]
5. FIRST: Out[i] = first instance in i-th sequence X[i]
6. MAX: Out[i] = $$max(X_i)$$

The following example explains how this works:

For a mini-batch of 3 variable-length sentences,
containing 2, 3, and 2 time-steps:

Assume X is a [7,M,N] LoDTensor, and X->lod()[0] = [0, 2, 5, 7], 7=2+3+2.
Besides, for the sake of simplicity, we assume M=1 and N=1,
and the value of X = [[1, 3], [2, 4, 6], [5, 1]].

Thus, Out is a [3,1,1] Tensor without LoD information.
For the different pooltypes, the value of Out is as follows:

- AVERAGE: [2, 4, 3], where 2=(1+3)/2, 4=(2+4+6)/3, 3=(5+1)/2
- SUM: [4, 12, 6], where 4=1+3, 12=2+4+6, 6=5+1
- SQRT: [2.82, 6.93, 4.24], where 2.82=(1+3)/sqrt(2),
  6.93=(2+4+6)/sqrt(3), 4.24=(5+1)/sqrt(2)
- MAX: [3, 6, 5], where 3=max(1,3), 6=max(2,4,6), 5=max(5,1)
- LAST: [3, 6, 1], where 3=last(1,3), 6=last(2,4,6), 1=last(5,1)
- FIRST: [1, 2, 5], where 1=first(1,3), 2=first(2,4,6), 5=first(5,1)
)DOC");
  }
};
......
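The worked example above can be checked with a minimal plain-C++ sketch that pools each LoD segment of X = [1, 3 | 2, 4, 6 | 5, 1] directly; it is an illustration of the arithmetic, not the operator's kernel.

// Hypothetical sketch reproducing the worked example: pool each segment
// of x given by lod = {0, 2, 5, 7} with every supported pooltype.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <numeric>
#include <vector>

int main() {
  std::vector<float> x = {1, 3, 2, 4, 6, 5, 1};
  std::vector<int> lod = {0, 2, 5, 7};
  for (size_t i = 0; i + 1 < lod.size(); ++i) {
    auto begin = x.begin() + lod[i], end = x.begin() + lod[i + 1];
    float n = static_cast<float>(lod[i + 1] - lod[i]);
    float sum = std::accumulate(begin, end, 0.f);
    // Prints AVERAGE=2 SUM=4 SQRT=2.83 MAX=3 LAST=3 FIRST=1 for segment 0, etc.
    std::printf("seq %zu: AVERAGE=%g SUM=%g SQRT=%.2f MAX=%g LAST=%g FIRST=%g\n",
                i, sum / n, sum, sum / std::sqrt(n),
                *std::max_element(begin, end), *(end - 1), *begin);
  }
  return 0;
}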
@@ -43,20 +43,24 @@ class SequenceSoftmaxOpMaker : public framework::OpProtoAndCheckerMaker {
              "(LoDTensor) 1-D or 2-D output LoDTensor with the 2-nd dimension "
              "of length 1.");
    AddComment(R"DOC(
Sequence Softmax Operator.

SequenceSoftmaxOp computes the softmax activation among all time-steps for each
sequence. The dimension of each time-step should be 1. Thus, the shape of
input Tensor can be either [N, 1] or [N], where N is the sum of the lengths
of all sequences.

The algorithm works as follows:
for i-th sequence in a mini-batch:
$$Out(X[lod[i]:lod[i+1]], :) =
\frac{\exp(X[lod[i]:lod[i+1], :])}
{\sum(\exp(X[lod[i]:lod[i+1], :]))}$$

For example, for a mini-batch of 3 sequences with variable length,
each containing 2, 3, 2 time-steps, the lod of which is [0, 2, 5, 7],
softmax will be computed among X[0:2, :], X[2:5, :], X[5:7, :],
and N turns out to be 7.
)DOC");
  }
};
......
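A minimal plain-C++ sketch of the per-segment softmax above follows; it subtracts each segment's maximum before exponentiating, a standard stability trick assumed here for illustration rather than taken from the kernel.

// Hypothetical sketch: apply softmax independently to each LoD segment
// of x, so each segment's outputs sum to 1.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
  std::vector<float> x = {1, 2, 3, 1, 2, 0, 0};
  std::vector<int> lod = {0, 2, 5, 7};  // three sequences: 2, 3, 2 steps
  for (size_t i = 0; i + 1 < lod.size(); ++i) {
    float mx = *std::max_element(x.begin() + lod[i], x.begin() + lod[i + 1]);
    float denom = 0.f;
    for (int t = lod[i]; t < lod[i + 1]; ++t) denom += std::exp(x[t] - mx);
    for (int t = lod[i]; t < lod[i + 1]; ++t)
      std::printf("%.4f ", std::exp(x[t] - mx) / denom);
    std::printf("\n");  // each printed line sums to 1
  }
  return 0;
}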
@@ -107,26 +107,28 @@ class SigmoidCrossEntropyWithLogitsOpMaker
    AddComment(R"DOC(
SigmoidCrossEntropyWithLogits Operator.

This measures the element-wise probability error in classification tasks
in which each class is independent. This can be thought of as predicting labels
for a data-point, where labels are not mutually exclusive.
For example, a news article can be about politics, technology or sports
at the same time or none of these.

The logistic loss is given as follows:

$$loss = -Labels * \log(\sigma(X)) - (1 - Labels) * \log(1 - \sigma(X))$$

We know that $$\sigma(X) = (1 / (1 + \exp(-X)))$$. By substituting this we get:

$$loss = X - X * Labels + \log(1 + \exp(-X))$$

For stability and to prevent overflow of $$\exp(-X)$$ when X < 0,
we reformulate the loss as follows:

$$loss = \max(X, 0) - X * Labels + \log(1 + \exp(-|X|))$$

Both the input `X` and `Labels` can carry the LoD (Level of Details) information.
However the output only shares the LoD with input `X`.
)DOC");
  }
};
......
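A minimal plain-C++ sketch can confirm that the stable reformulation matches the naive logistic loss for moderate logits; it illustrates the two formulas above rather than the operator's implementation.

// Hypothetical check: naive loss -y*log(s) - (1-y)*log(1-s) versus the
// stable form max(x,0) - x*y + log(1 + exp(-|x|)).
#include <algorithm>
#include <cmath>
#include <cstdio>

float naive_loss(float x, float label) {
  float s = 1.f / (1.f + std::exp(-x));
  return -label * std::log(s) - (1.f - label) * std::log(1.f - s);
}

float stable_loss(float x, float label) {
  return std::max(x, 0.f) - x * label + std::log(1.f + std::exp(-std::fabs(x)));
}

int main() {
  const float xs[] = {-3.f, -0.5f, 0.f, 2.f};
  for (float x : xs)
    std::printf("x=%g: naive=%.6f stable=%.6f\n",
                x, naive_loss(x, 1.f), stable_loss(x, 1.f));
  return 0;
}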