diff --git a/develop/doc/api/v2/fluid/layers.html b/develop/doc/api/v2/fluid/layers.html
index ebd7308d038b6776c9ab3258c723d6c8baa8e407..3b53810eff969afe25c8438d2748a3d3b16409de 100644
--- a/develop/doc/api/v2/fluid/layers.html
+++ b/develop/doc/api/v2/fluid/layers.html
@@ -1304,19 +1304,12 @@ bias weights will be created and be set to default value.
 sequence. The dimension of each time-step should be 1. Thus, the shape of
 input Tensor can be either [N, 1] or [N], where N is the sum of the length
 of all sequences.
-The algorithm works as follows:
-for i-th sequence in a mini-batch:
-$$Out(X[lod[i]:lod[i+1]], :) = \frac{\exp(X[lod[i]:lod[i+1], :])} {\sum(\exp(X[lod[i]:lod[i+1], :]))}$$
+The algorithm works as follows:
+
+for i-th sequence in a mini-batch:
+
+$$
+Out(X[lod[i]:lod[i+1]], :) = \frac{\exp(X[lod[i]:lod[i+1], :])} {\sum(\exp(X[lod[i]:lod[i+1], :]))}
+$$
 For example, for a mini-batch of 3 sequences with variable-length,
 each containing 2, 3, 2 time-steps, the lod of which is [0, 2, 5, 7],
 then softmax will be computed among X[0:2, :], X[2:5, :], X[5:7, :]
diff --git a/develop/doc/operators.json b/develop/doc/operators.json
index 5883824dc5a2bde081d40551e06f32062263aa78..04538facedb046ed28d6fc891daf395bb12b90ec 100644
--- a/develop/doc/operators.json
+++ b/develop/doc/operators.json
@@ -2545,7 +2545,7 @@
   "attrs" : [ ]
 },{
   "type" : "sequence_softmax",
-  "comment" : "\nSequence Softmax Operator.\n\nSequenceSoftmaxOp computes the softmax activation among all time-steps for each\nsequence. The dimension of each time-step should be 1. Thus, the shape of\ninput Tensor can be either [N, 1] or [N], where N is the sum of the length\nof all sequences.\n\nThe algorithm works as follows:\n for i-th sequence in a mini-batch:\n $$Out(X[lod[i]:lod[i+1]], :) =\n \\frac{\\exp(X[lod[i]:lod[i+1], :])}\n {\\sum(\\exp(X[lod[i]:lod[i+1], :]))}$$\n\nFor example, for a mini-batch of 3 sequences with variable-length,\neach containing 2, 3, 2 time-steps, the lod of which is [0, 2, 5, 7],\nthen softmax will be computed among X[0:2, :], X[2:5, :], X[5:7, :]\nand N turns out to be 7.\n\n",
+  "comment" : "\nSequence Softmax Operator.\n\nSequenceSoftmaxOp computes the softmax activation among all time-steps for each\nsequence. The dimension of each time-step should be 1. Thus, the shape of\ninput Tensor can be either [N, 1] or [N], where N is the sum of the length\nof all sequences.\n\nThe algorithm works as follows:\n\n for i-th sequence in a mini-batch:\n\n$$\nOut(X[lod[i]:lod[i+1]], :) = \\\n\\frac{\\exp(X[lod[i]:lod[i+1], :])} \\\n{\\sum(\\exp(X[lod[i]:lod[i+1], :]))}\n$$\n\nFor example, for a mini-batch of 3 sequences with variable-length,\neach containing 2, 3, 2 time-steps, the lod of which is [0, 2, 5, 7],\nthen softmax will be computed among X[0:2, :], X[2:5, :], X[5:7, :]\nand N turns out to be 7.\n\n",
   "inputs" : [
     { "name" : "X",
diff --git a/develop/doc_cn/api/v2/fluid/layers.html b/develop/doc_cn/api/v2/fluid/layers.html
index 12d2f698c201cfe3d5e794cb20f2d2455e353004..e530e803d8062335977c197443dc7058f23a77fd 100644
--- a/develop/doc_cn/api/v2/fluid/layers.html
+++ b/develop/doc_cn/api/v2/fluid/layers.html
@@ -1317,19 +1317,12 @@ bias weights will be created and be set to default value.
 sequence. The dimension of each time-step should be 1. Thus, the shape of
 input Tensor can be either [N, 1] or [N], where N is the sum of the length
 of all sequences.
-The algorithm works as follows:
-for i-th sequence in a mini-batch:
-$$Out(X[lod[i]:lod[i+1]], :) = \frac{\exp(X[lod[i]:lod[i+1], :])} {\sum(\exp(X[lod[i]:lod[i+1], :]))}$$
+The algorithm works as follows:
+
+for i-th sequence in a mini-batch:
+
+$$
+Out(X[lod[i]:lod[i+1]], :) = \frac{\exp(X[lod[i]:lod[i+1], :])} {\sum(\exp(X[lod[i]:lod[i+1], :]))}
+$$
 For example, for a mini-batch of 3 sequences with variable-length,
 each containing 2, 3, 2 time-steps, the lod of which is [0, 2, 5, 7],
 then softmax will be computed among X[0:2, :], X[2:5, :], X[5:7, :]
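
Reviewer note: as a sanity check on the formula above, the following is a minimal NumPy sketch of per-sequence softmax over lod slices. The name sequence_softmax_ref is hypothetical (not part of the PaddlePaddle API), and this is not the operator's actual implementation; subtracting each slice's max is an equivalent, numerically stable form of exp(x)/sum(exp(x)).

    import numpy as np

    def sequence_softmax_ref(x, lod):
        # Hypothetical reference sketch: apply softmax independently to
        # each sequence slice x[lod[i]:lod[i+1]], as in the formula above.
        # x: array of shape [N] or [N, 1]; lod: sequence offsets.
        x = np.asarray(x, dtype=np.float64).reshape(-1)
        out = np.empty_like(x)
        for i in range(len(lod) - 1):
            seq = x[lod[i]:lod[i + 1]]
            # Max-subtraction keeps exp() from overflowing; the result
            # equals exp(seq) / sum(exp(seq)).
            e = np.exp(seq - seq.max())
            out[lod[i]:lod[i + 1]] = e / e.sum()
        return out

    # The docs' example: 3 sequences with 2, 3, 2 time-steps, so
    # lod = [0, 2, 5, 7] and N = 7.
    lod = [0, 2, 5, 7]
    y = sequence_softmax_ref(np.random.rand(7), lod)
    # Each sequence slice sums to 1 independently.
    assert all(np.isclose(y[lod[i]:lod[i + 1]].sum(), 1.0) for i in range(3))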