diff --git a/develop/doc/api/v2/fluid/layers.html b/develop/doc/api/v2/fluid/layers.html
index ebd7308d038b6776c9ab3258c723d6c8baa8e407..3b53810eff969afe25c8438d2748a3d3b16409de 100644
--- a/develop/doc/api/v2/fluid/layers.html
+++ b/develop/doc/api/v2/fluid/layers.html
@@ -1304,19 +1304,12 @@ bias weights will be created and be set to default value.
 sequence. The dimension of each time-step should be 1. Thus, the shape of
 input Tensor can be either [N, 1] or [N], where N is the sum of the length
 of all sequences.
-The algorithm works as follows:
-for i-th sequence in a mini-batch:
-$$Out(X[lod[i]:lod[i+1]], :) = \frac{\exp(X[lod[i]:lod[i+1], :])} {\sum(\exp(X[lod[i]:lod[i+1], :]))}$$
+The algorithm works as follows:
+
+for i-th sequence in a mini-batch:
+
+$$
+Out(X[lod[i]:lod[i+1]], :) = \frac{\exp(X[lod[i]:lod[i+1], :])} {\sum(\exp(X[lod[i]:lod[i+1], :]))}
+$$
 For example, for a mini-batch of 3 sequences with variable-length,
 each containing 2, 3, 2 time-steps, the lod of which is [0, 2, 5, 7],
 then softmax will be computed among X[0:2, :], X[2:5, :], X[5:7, :]
diff --git a/develop/doc/operators.json b/develop/doc/operators.json
index 5883824dc5a2bde081d40551e06f32062263aa78..04538facedb046ed28d6fc891daf395bb12b90ec 100644
--- a/develop/doc/operators.json
+++ b/develop/doc/operators.json
@@ -2545,7 +2545,7 @@
   "attrs" : [ ]
 },{
   "type" : "sequence_softmax",
-  "comment" : "\nSequence Softmax Operator.\n\nSequenceSoftmaxOp computes the softmax activation among all time-steps for each\nsequence. The dimension of each time-step should be 1. Thus, the shape of\ninput Tensor can be either [N, 1] or [N], where N is the sum of the length\nof all sequences.\n\nThe algorithm works as follows:\n for i-th sequence in a mini-batch:\n $$Out(X[lod[i]:lod[i+1]], :) =\n \\frac{\\exp(X[lod[i]:lod[i+1], :])}\n {\\sum(\\exp(X[lod[i]:lod[i+1], :]))}$$\n\nFor example, for a mini-batch of 3 sequences with variable-length,\neach containing 2, 3, 2 time-steps, the lod of which is [0, 2, 5, 7],\nthen softmax will be computed among X[0:2, :], X[2:5, :], X[5:7, :]\nand N turns out to be 7.\n\n",
+  "comment" : "\nSequence Softmax Operator.\n\nSequenceSoftmaxOp computes the softmax activation among all time-steps for each\nsequence. The dimension of each time-step should be 1. Thus, the shape of\ninput Tensor can be either [N, 1] or [N], where N is the sum of the length\nof all sequences.\n\nThe algorithm works as follows:\n\n for i-th sequence in a mini-batch:\n\n$$\nOut(X[lod[i]:lod[i+1]], :) = \\\n\\frac{\\exp(X[lod[i]:lod[i+1], :])} \\\n{\\sum(\\exp(X[lod[i]:lod[i+1], :]))}\n$$\n\nFor example, for a mini-batch of 3 sequences with variable-length,\neach containing 2, 3, 2 time-steps, the lod of which is [0, 2, 5, 7],\nthen softmax will be computed among X[0:2, :], X[2:5, :], X[5:7, :]\nand N turns out to be 7.\n\n",
   "inputs" : [
     { "name" : "X",
diff --git a/develop/doc_cn/api/v2/fluid/layers.html b/develop/doc_cn/api/v2/fluid/layers.html
index 12d2f698c201cfe3d5e794cb20f2d2455e353004..e530e803d8062335977c197443dc7058f23a77fd 100644
--- a/develop/doc_cn/api/v2/fluid/layers.html
+++ b/develop/doc_cn/api/v2/fluid/layers.html
@@ -1317,19 +1317,12 @@ bias weights will be created and be set to default value.
 sequence. The dimension of each time-step should be 1. Thus, the shape of
 input Tensor can be either [N, 1] or [N], where N is the sum of the length
 of all sequences.
-The algorithm works as follows:
-for i-th sequence in a mini-batch:
-$$Out(X[lod[i]:lod[i+1]], :) = \frac{\exp(X[lod[i]:lod[i+1], :])} {\sum(\exp(X[lod[i]:lod[i+1], :]))}$$
+The algorithm works as follows:
+
+for i-th sequence in a mini-batch:
+
+$$
+Out(X[lod[i]:lod[i+1]], :) = \frac{\exp(X[lod[i]:lod[i+1], :])} {\sum(\exp(X[lod[i]:lod[i+1], :]))}
+$$
 For example, for a mini-batch of 3 sequences with variable-length,
 each containing 2, 3, 2 time-steps, the lod of which is [0, 2, 5, 7],
 then softmax will be computed among X[0:2, :], X[2:5, :], X[5:7, :]
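
Reviewer note: as a sanity check on the formula above, the following is a minimal NumPy sketch of per-sequence softmax over lod slices. The name sequence_softmax_ref is hypothetical (not part of the PaddlePaddle API), and this is not the operator's actual implementation; subtracting each slice's max is an equivalent, numerically stable form of exp(x)/sum(exp(x)).

    import numpy as np

    def sequence_softmax_ref(x, lod):
        # Hypothetical reference sketch: apply softmax independently to
        # each sequence slice x[lod[i]:lod[i+1]], as in the formula above.
        # x: array of shape [N] or [N, 1]; lod: sequence offsets.
        x = np.asarray(x, dtype=np.float64).reshape(-1)
        out = np.empty_like(x)
        for i in range(len(lod) - 1):
            seq = x[lod[i]:lod[i + 1]]
            # Max-subtraction keeps exp() from overflowing; the result
            # equals exp(seq) / sum(exp(seq)).
            e = np.exp(seq - seq.max())
            out[lod[i]:lod[i + 1]] = e / e.sum()
        return out

    # The docs' example: 3 sequences with 2, 3, 2 time-steps, so
    # lod = [0, 2, 5, 7] and N = 7.
    lod = [0, 2, 5, 7]
    y = sequence_softmax_ref(np.random.rand(7), lod)
    # Each sequence slice sums to 1 independently.
    assert all(np.isclose(y[lod[i]:lod[i + 1]].sum(), 1.0) for i in range(3))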