Add doc for gru_unit op (in fluid) (#7151)

* Add squared error layers doc
* Add doc for gru_unit
* Remove cdot which isn't supported
* Update layers.rst
* Update layers.rst (minor)

f3c42f60 · Siddharth Goyal · GitHub · 564dba17

Showing 2 changed files with 43 additions and 11 deletions.

doc/api/v2/fluid/layers.rst +6 -0

python/paddle/v2/fluid/layers/nn.py +37 -11
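For reference, one GRU step as described by the new `gru_unit` docstring in `nn.py` can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not Paddle's kernel: the helper name `gru_unit_step` is hypothetical, inputs are treated as row-vector batches (so `W_u h_{t-1}` becomes `h_prev @ w_u`), `dot` is read as element-wise multiplication, and the `[D, 3D]` weight is assumed to be laid out as `[W_u | W_r | W_c]`. The hidden-state update uses `h_t = (1 - u_t) * h_{t-1} + u_t * ch_t`, the form given in the operator comment this commit replaces.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_unit_step(z_t, h_prev, weight, bias, act_node=np.tanh, act_gate=sigmoid):
    """Hypothetical NumPy sketch of one GRU step (not Paddle's implementation).

    z_t:    [batch, 3 * D] FC-transformed input, split into xu_t, xr_t, xc_t.
    h_prev: [batch, D] hidden state from the previous step.
    weight: [D, 3 * D], assumed layout [W_u | W_r | W_c].
    bias:   [3 * D], assumed layout [b_u | b_r | b_c].
    Returns (h_t, r_t * h_prev, concat(u_t, r_t, ch_t)).
    """
    d = h_prev.shape[1]
    xu, xr, xc = z_t[:, :d], z_t[:, d:2 * d], z_t[:, 2 * d:]
    w_u, w_r, w_c = weight[:, :d], weight[:, d:2 * d], weight[:, 2 * d:]
    b_u, b_r, b_c = bias[:d], bias[d:2 * d], bias[2 * d:]

    u_t = act_gate(xu + h_prev @ w_u + b_u)     # update gate
    r_t = act_gate(xr + h_prev @ w_r + b_r)     # reset gate
    reset_h = r_t * h_prev                      # dot(r_t, h_{t-1}), element-wise
    ch_t = act_node(xc + reset_h @ w_c + b_c)   # output candidate
    h_t = (1.0 - u_t) * h_prev + u_t * ch_t     # new hidden state
    gates = np.concatenate([u_t, r_t, ch_t], axis=1)
    return h_t, reset_h, gates


# Usage mirroring the docstring example: hidden size 10, so the fc layer
# output (and the gru_unit input size) is 3 * 10 = 30. Random data only.
rng = np.random.default_rng(0)
batch, hidden, in_dim = 4, 10, 8
x_t_data = rng.standard_normal((batch, in_dim))
w_fc = rng.standard_normal((in_dim, 3 * hidden))
z_t = x_t_data @ w_fc                            # z_t = W_fc x_t (row-vector form)
prev_hidden = np.zeros((batch, hidden))
weight = 0.1 * rng.standard_normal((hidden, 3 * hidden))
bias = np.zeros(3 * hidden)

h_t, reset_h, gates = gru_unit_step(z_t, prev_hidden, weight, bias)
print(h_t.shape, reset_h.shape, gates.shape)     # (4, 10) (4, 10) (4, 30)
```

Returning the reset-gated hidden state and the concatenated gates alongside `h_t` mirrors the three outputs the docstring lists.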

--- a/doc/api/v2/fluid/layers.rst
+++ b/doc/api/v2/fluid/layers.rst
@@ -307,6 +307,12 @@ sequence_expand
    :noindex:
+gru_unit
+--------
+..  autofunction:: paddle.v2.fluid.layers.gru_unit
+    :noindex:
 lstm_unit
 ---------
 ..  autofunction:: paddle.v2.fluid.layers.lstm_unit

--- a/python/paddle/v2/fluid/layers/nn.py
+++ b/python/paddle/v2/fluid/layers/nn.py
@@ -236,21 +236,47 @@ def gru_unit(input,
             activation='tanh',
             gate_activation='sigmoid'):
    """
-    GRUUnit Operator implements partial calculations of the GRU unit as following:
+    GRU unit layer. The equations of a GRU step are:
-    $$
+        .. math::
-    update \ gate: u_t = actGate(xu_t + W_u * h_{t-1} + b_u) \\
+            u_t & = actGate(xu_{t} + W_u h_{t-1} + b_u)
-    reset \ gate: r_t = actGate(xr_t + W_r * h_{t-1} + b_r)  \\
-    output \ candidate: {h}_t = actNode(xc_t + W_c * dot(r_t, h_{t-1}) + b_c) \\
+            r_t & = actGate(xr_{t} + W_r h_{t-1} + b_r)
-    output: h_t = dot((1 - u_t), h_{t-1}) + dot(u_t, {h}_t)
-    $$
+            ch_t & = actNode(xc_t + W_c dot(r_t, h_{t-1}) + b_c)
+            h_t & = dot((1-u_t), h_{t-1}) + dot(u_t, ch_t)
-    which is same as one time step of GRU Operator.
+    The inputs of the GRU unit are :math:`z_t` and :math:`h_{t-1}`. In terms
+    of the equations above, :math:`z_t` is split into three parts:
+    :math:`xu_t`, :math:`xr_t` and :math:`xc_t`. This means that, to
+    implement a full GRU unit operator for an input, a fully connected
+    layer must be applied first, such that :math:`z_t = W_{fc}x_t`.
+    This layer has three outputs: :math:`h_t`, :math:`dot(r_t, h_{t-1})`,
+    and the concatenation of :math:`u_t`, :math:`r_t` and :math:`ch_t`.
+    Args:
+        input (Variable): The FC-transformed input value of the current step.
+        hidden (Variable): The hidden value of the GRU unit from the previous step.
+        size (integer): The input dimension value (three times the hidden size).
+        weight (ParamAttr): The weight parameters for the GRU unit. Default: None
+        bias (ParamAttr): The bias parameters for the GRU unit. Default: None
+        activation (string): The activation type for the cell (actNode). Default: 'tanh'
+        gate_activation (string): The activation type for the gates (actGate). Default: 'sigmoid'
+    Returns:
+        tuple: The hidden value, reset-hidden value and gate values.
+    Examples:
+        .. code-block:: python
-    @note To implement the complete GRU unit, fully-connected operator must be
+             # assuming we have x_t_data and prev_hidden of size=10
-    used before to feed xu, xr and xc as the Input of GRUUnit operator.
+             x_t = fluid.layers.fc(input=x_t_data, size=30)
+             hidden_val, r_h_val, gate_val = fluid.layers.gru_unit(
+                 input=x_t, hidden=prev_hidden, size=30)
-    TODO(ChunweiYan) add more document here
    """
    activation_dict = dict(
        identity=0,