update ctc_greedy_decoder chinese doc, test=document_preview (#1935)

107d113c · Double_V · GitHub · d9e300b1 · 107d113c

Showing 1 changed file with 36 additions and 4 deletions

doc/fluid/api_cn/layers_cn/ctc_greedy_decoder_cn.rst +36 -4

--- a/doc/fluid/api_cn/layers_cn/ctc_greedy_decoder_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/ctc_greedy_decoder_cn.rst
@@ -5,17 +5,18 @@ ctc_greedy_decoder

 .. py:function:: paddle.fluid.layers.ctc_greedy_decoder(input, blank, name=None)

-**Note: the input of this OP must be a 2-D LoDTensor with lod_level 1.**

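 This OP decodes sequences with a greedy strategy, in the following steps:
    1. Get the index of the maximum value in each row of the input, i.e. numpy.argmax(input, axis=1).
    2. For each sequence in the result of step 1, merge the repeated elements between two blanks and remove all blanks.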

+This API supports two kinds of input, LoDTensor and Tensor. Examples for each kind of input are shown below:

 **Example**:

 ::

+        # for lod tensor input 
        Given:

        input.data = [[0.6, 0.1, 0.3, 0.1],
@@ -45,13 +46,38 @@ ctc_greedy_decoder

        output.lod = [[2, 1]]

+        # for tensor input
+        input.data = [[[0.6, 0.1, 0.3, 0.1],
+                [0.3, 0.2, 0.4, 0.1],
+                [0.1, 0.5, 0.1, 0.3],
+                [0.5, 0.1, 0.3, 0.1]],
+
+               [[0.5, 0.1, 0.3, 0.1],
+                [0.2, 0.2, 0.2, 0.4],
+                [0.2, 0.2, 0.1, 0.5],
+                [0.5, 0.1, 0.3, 0.1]]]
+
+        input_length.data = [[4], [4]]
+        input.shape = [2, 4, 4]
+
+        step1: Apply argmax to the first input sequence, input.data[0], to get
+            [[0], [2], [1], [0]]; for the second sequence, input.data[1], the result is [[0], [3], [3], [0]]. The combined shape is [2, 4, 1].
+        step2: Convert the argmax result to padded form, giving
+                [[0, 2, 1, 0], [0, 3, 3, 0]], with shape [2, 4], lod [], and input_length [[4], [4]].
+        step3: Apply ctc_align to the padded argmax result, with padding_value 0.
+
+        Finally:
+        output.data = [[2, 1, 0, 0],
+                       [3, 0, 0, 0]]
+        output_length.data = [[2], [1]]
+

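 Parameters:
-        - **input** (Variable) — Probabilities of variable-length sequences, a 2-D LoDTensor with lod_level 1. Its shape is [Lp, num_classes + 1], where Lp is the sum of the lengths of all input sequences and num_classes is the number of classes (not including the blank label). The data type is float32 or float64.
+        - **input** (Variable) — Probabilities of variable-length sequences. For LoDTensor input, it is a 2-D LoDTensor carrying LoD information, with shape [Lp, num_classes + 1], where Lp is the sum of the lengths of all input sequences and num_classes is the actual number of classes (not including the blank label). For Tensor input, it is a padded 3-D tensor with shape [batch_size, N, num_classes + 1]. The data type can be float32 or float64.
        - **blank** (int) — Index of the blank label for the Connectionist Temporal Classification (CTC) loss. Its value lies in the half-open interval [0, num_classes + 1).
        - **name** (str|None, optional) — Used by developers when printing debugging information; see :ref:`api_guide_Name` for details. Default is None.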

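-Returns: The CTC greedy decoding result, a 2-D LoDTensor of shape (Lp, 1) with lod_level 1, where Lp is the sum of the lengths of all output sequences. If all sequences in the result are empty, the output LoDTensor is [-1] and its LoD information is empty.
+Returns: For LoDTensor input, the result of the CTC greedy decoder, a 2-D LoDTensor with shape [Lp, 1] and data type int64, where Lp is the sum of the lengths of all output sequences. If all sequences in the result are empty, the result LoDTensor is [-1] with LoD [[]]. For Tensor input, a tuple (output, output_length), where output is an int64 Tensor of shape [batch_size, N] and output_length is an int64 Tensor of shape [batch_size, 1] giving the length of each output sequence.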

 Return type: Variable

@@ -60,9 +86,15 @@ ctc_greedy_decoder

 ..  code-block:: python

+    # for lod mode
    import paddle.fluid as fluid
-    x = fluid.layers.data(name='x', shape=[8], dtype='float32')
+    x = fluid.data(name='x', shape=[None, 8], dtype='float32', lod_level=1)
    cost = fluid.layers.ctc_greedy_decoder(input=x, blank=0)
+    # for padding mode
+    x_pad = fluid.data(name='x_pad', shape=[10, 4, 8], dtype='float32')
+    x_pad_len = fluid.data(name='x_pad_len', shape=[10, 1], dtype='int64')
+    out, out_len = fluid.layers.ctc_greedy_decoder(input=x_pad, blank=0,
+                input_length=x_pad_len)
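
For readers checking the padded-mode example in this diff by hand, the decoding steps can be sketched in plain Python. This is only an illustrative reference under stated assumptions, not Paddle's implementation: the function name `ctc_greedy_decode_padded` and its `lengths` and `padding_value` parameters are hypothetical, and the input is a nested list rather than a Tensor.

```python
def ctc_greedy_decode_padded(probs, lengths, blank=0, padding_value=0):
    """Hypothetical reference sketch of greedy CTC decoding for padded input.

    probs:   nested list with shape [batch_size, N, num_classes + 1]
    lengths: number of valid time steps in each sequence
    """
    max_steps = len(probs[0])
    output, output_length = [], []
    for seq, length in zip(probs, lengths):
        # Step 1: index of the largest probability at each valid time step.
        best = [max(range(len(step)), key=step.__getitem__)
                for step in seq[:length]]
        # Step 2: merge consecutive repeats, then drop blank labels.
        decoded, prev = [], None
        for token in best:
            if token != prev and token != blank:
                decoded.append(token)
            prev = token
        output_length.append([len(decoded)])
        # Step 3: pad each decoded sequence back to N entries.
        output.append(decoded + [padding_value] * (max_steps - len(decoded)))
    return output, output_length


# Input data taken from the "for tensor input" example in the diff.
probs = [[[0.6, 0.1, 0.3, 0.1],
          [0.3, 0.2, 0.4, 0.1],
          [0.1, 0.5, 0.1, 0.3],
          [0.5, 0.1, 0.3, 0.1]],
         [[0.5, 0.1, 0.3, 0.1],
          [0.2, 0.2, 0.2, 0.4],
          [0.2, 0.2, 0.1, 0.5],
          [0.5, 0.1, 0.3, 0.1]]]

out, out_len = ctc_greedy_decode_padded(probs, lengths=[4, 4], blank=0)
print(out)      # [[2, 1, 0, 0], [3, 0, 0, 0]]
print(out_len)  # [[2], [1]]
```

The printed values match output.data and output_length.data in the example. Because the padding value and the blank index are both 0 here, output_length is needed to tell real labels from padding.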