From 9a0b4d3c1323a3d6e6e07c15c462e93bc9e4f731 Mon Sep 17 00:00:00 2001
From: Travis CI
The algorithm works as follows:
+++for i-th sequence in a mini-batch:
$$ +Out(X[lod[i]:lod[i+1]], :) = frac{exp(X[lod[i]:lod[i+1], :])} {sum(exp(X[lod[i]:lod[i+1], :]))} +$$
For example, for a mini-batch of 3 sequences with variable-length, each containing 2, 3, 2 time-steps, the lod of which is [0, 2, 5, 7], then softmax will be computed among X[0:2, :], X[2:5, :], X[5:7, :] diff --git a/develop/doc/operators.json b/develop/doc/operators.json index 5883824dc5a..04538facedb 100644 --- a/develop/doc/operators.json +++ b/develop/doc/operators.json @@ -2545,7 +2545,7 @@ "attrs" : [ ] },{ "type" : "sequence_softmax", - "comment" : "\nSequence Softmax Operator.\n\nSequenceSoftmaxOp computes the softmax activation among all time-steps for each\nsequence. The dimension of each time-step should be 1. Thus, the shape of\ninput Tensor can be either [N, 1] or [N], where N is the sum of the length\nof all sequences.\n\nThe algorithm works as follows:\n for i-th sequence in a mini-batch:\n $$Out(X[lod[i]:lod[i+1]], :) =\n \\frac{\\exp(X[lod[i]:lod[i+1], :])}\n {\\sum(\\exp(X[lod[i]:lod[i+1], :]))}$$\n\nFor example, for a mini-batch of 3 sequences with variable-length,\neach containing 2, 3, 2 time-steps, the lod of which is [0, 2, 5, 7],\nthen softmax will be computed among X[0:2, :], X[2:5, :], X[5:7, :]\nand N turns out to be 7.\n\n", + "comment" : "\nSequence Softmax Operator.\n\nSequenceSoftmaxOp computes the softmax activation among all time-steps for each\nsequence. The dimension of each time-step should be 1. Thus, the shape of\ninput Tensor can be either [N, 1] or [N], where N is the sum of the length\nof all sequences.\n\nThe algorithm works as follows:\n\n for i-th sequence in a mini-batch:\n\n$$\nOut(X[lod[i]:lod[i+1]], :) = \\\n\\frac{\\exp(X[lod[i]:lod[i+1], :])} \\\n{\\sum(\\exp(X[lod[i]:lod[i+1], :]))}\n$$\n\nFor example, for a mini-batch of 3 sequences with variable-length,\neach containing 2, 3, 2 time-steps, the lod of which is [0, 2, 5, 7],\nthen softmax will be computed among X[0:2, :], X[2:5, :], X[5:7, :]\nand N turns out to be 7.\n\n", "inputs" : [ { "name" : "X", diff --git a/develop/doc_cn/api/v2/fluid/layers.html b/develop/doc_cn/api/v2/fluid/layers.html index 12d2f698c20..e530e803d80 100644 --- a/develop/doc_cn/api/v2/fluid/layers.html +++ b/develop/doc_cn/api/v2/fluid/layers.html @@ -1317,19 +1317,12 @@ bias weights will be created and be set to default value. sequence. The dimension of each time-step should be 1. Thus, the shape of input Tensor can be either [N, 1] or [N], where N is the sum of the length of all sequences.
-The algorithm works as follows:
+++for i-th sequence in a mini-batch:
$$ +Out(X[lod[i]:lod[i+1]], :) = frac{exp(X[lod[i]:lod[i+1], :])} {sum(exp(X[lod[i]:lod[i+1], :]))} +$$
For example, for a mini-batch of 3 sequences with variable-length, each containing 2, 3, 2 time-steps, the lod of which is [0, 2, 5, 7], then softmax will be computed among X[0:2, :], X[2:5, :], X[5:7, :] -- GitLab