add text/datasets Chinese doc (#2628)

fb4e0c3b · LiuChiachi · GitHub · 55bd7e03
8 changed files
--- a/doc/paddle/api/paddle/text/datasets/conll05/Conll05st_cn.rst
+++ b/doc/paddle/api/paddle/text/datasets/conll05/Conll05st_cn.rst
@@ -6,36 +6,35 @@ Conll05st
 .. py:class:: paddle.text.datasets.Conll05st()


-    Implementation of `Conll05st <https://www.cs.upc.edu/~srlconll/soft.html>`_
-    test dataset.
+Implementation of the `Conll05st <https://www.cs.upc.edu/~srlconll/soft.html>`_
+test dataset.

-    Note: only support download test dataset automatically for that
-          only test dataset of Conll05st is public.
+.. note::
+    Only the test dataset can be downloaded automatically, because it is the only part of Conll05st that is public.

-    Arguments
+Arguments
 :::::::::
-        data_file(str): path to data tar file, can be set None if
-            :attr:`download` is True. Default None
-        word_dict_file(str): path to word dictionary file, can be set None if
-            :attr:`download` is True. Default None
-        verb_dict_file(str): path to verb dictionary file, can be set None if
-            :attr:`download` is True. Default None
-        target_dict_file(str): path to target dictionary file, can be set None if
-            :attr:`download` is True. Default None
-        emb_file(str): path to embedding dictionary file, only used for
-            :code:`get_embedding` can be set None if :attr:`download` is
-            True. Default None
-        download(bool): whether to download dataset automatically if
-            :attr:`data_file` :attr:`word_dict_file` :attr:`verb_dict_file`
-            :attr:`target_dict_file` is not set. Default True
-
-    Returns:
-        Dataset: instance of conll05st dataset
-
-    Examples
+    - data_file (str) - Path to the data tar file. Can be None if
+      :attr:`download` is True. Default: None.
+    - word_dict_file (str) - Path to the word dictionary file. Can be None if
+      :attr:`download` is True. Default: None.
+    - verb_dict_file (str) - Path to the verb dictionary file. Can be None if
+      :attr:`download` is True. Default: None.
+    - target_dict_file (str) - Path to the target dictionary file. Can be None if
+      :attr:`download` is True. Default: None.
+    - emb_file (str) - Path to the embedding dictionary file, used only by
+      :code:`get_embedding`. Can be None if :attr:`download` is True. Default: None.
+    - download (bool) - Whether to download the dataset automatically if :attr:`data_file`,
+      :attr:`word_dict_file`, :attr:`verb_dict_file`, or :attr:`target_dict_file` is not set. Default: True.
+
+Returns
 :::::::::
+``Dataset``: an instance of the Conll05st dataset.

-        .. code-block:: python
+Examples
+:::::::::
+
+.. code-block:: python

    import paddle
    from paddle.text.datasets import Conll05st
@@ -61,4 +60,3 @@ Conll05st
        pred_idx, mark, label= model(pred_idx, mark, label)
        print(pred_idx.numpy(), mark.numpy(), label.numpy())

-    
\ No newline at end of file
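
Note on the `data_file` / `download` pairing documented above: the rule that a path "can be None if download is True" can be sketched as below. This is only an illustration under stated assumptions, not paddle's implementation; `resolve_data_file` and the `fetch` callable are hypothetical names introduced here.

```python
from typing import Callable, Optional


def resolve_data_file(data_file: Optional[str], download: bool,
                      fetch: Callable[[], str]) -> str:
    """Return a usable path, fetching data only when no path was given.

    `fetch` is a hypothetical zero-argument callable standing in for the
    real download step; it is not part of paddle.
    """
    if data_file is not None:
        # An explicit path always wins; nothing is downloaded.
        return data_file
    if not download:
        # No path and no permission to download: there is nothing to load.
        raise ValueError("data_file is not set and automatic download is disabled")
    return fetch()


print(resolve_data_file("local/conll05st.tar.gz", download=False, fetch=lambda: "unused"))
print(resolve_data_file(None, download=True, fetch=lambda: "cache/conll05st.tar.gz"))
```

The same pattern would apply to each of the dictionary-file arguments, which is why they share the "can be None if download is True" wording.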
--- a/doc/paddle/api/paddle/text/datasets/imdb/Imdb_cn.rst
+++ b/doc/paddle/api/paddle/text/datasets/imdb/Imdb_cn.rst
@@ -6,24 +6,24 @@ Imdb
 .. py:class:: paddle.text.datasets.Imdb()


-    Implementation of `IMDB <https://www.imdb.com/interfaces/>`_ dataset.
+Implementation of the `IMDB <https://www.imdb.com/interfaces/>`_ dataset.

-    Arguments
+Arguments
 :::::::::
-        data_file(str): path to data tar file, can be set None if
-            :attr:`download` is True. Default None
-        mode(str): 'train' 'test' mode. Default 'train'.
-        cutoff(int): cutoff number for building word dictionary. Default 150.
-        download(bool): whether to download dataset automatically if
-            :attr:`data_file` is not set. Default True
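+    - data_file (str) - Path to the data tar file. Can be None if
+      :attr:`download` is True. Default: None.
+    - mode (str) - Either 'train' or 'test'. Default: 'train'.
+    - cutoff (int) - Cutoff number for building the word dictionary. Default: 150.
+    - download (bool) - Whether to download the dataset automatically if :attr:`data_file` is not set. Default: True.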

-    Returns:
-        Dataset: instance of IMDB dataset
+Returns
+:::::::::
+``Dataset``: an instance of the IMDB dataset.

-    Examples
+Examples
 :::::::::

-        .. code-block:: python
+.. code-block:: python

    import paddle
    from paddle.text.datasets import Imdb
@@ -48,4 +48,3 @@ Imdb
        image, label = model(doc, label)
        print(doc.numpy().shape, label.numpy().shape)

-    
\ No newline at end of file
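
Note on `cutoff`: one plausible reading of "cutoff number for building the word dictionary" is a frequency threshold, sketched below. The helper name `build_word_dict`, the strict `>` comparison, and the shared `<unk>` entry are assumptions made for illustration; the doc text itself does not spell them out.

```python
from collections import Counter


def build_word_dict(docs, cutoff):
    """Map words seen more than `cutoff` times to integer ids.

    Assumption: rarer words are dropped and share a single '<unk>' id.
    """
    freq = Counter(word for doc in docs for word in doc)
    kept = [(word, count) for word, count in freq.items() if count > cutoff]
    # Most frequent first; ties are broken alphabetically for a stable order.
    kept.sort(key=lambda item: (-item[1], item[0]))
    word_idx = {word: idx for idx, (word, _) in enumerate(kept)}
    word_idx["<unk>"] = len(word_idx)
    return word_idx


docs = [["good", "movie"], ["good", "plot"], ["bad", "movie"], ["good"]]
print(build_word_dict(docs, cutoff=1))  # {'good': 0, 'movie': 1, '<unk>': 2}
```

A larger cutoff yields a smaller vocabulary and sends more tokens to the unknown-word id.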
--- a/doc/paddle/api/paddle/text/datasets/imikolov/Imikolov_cn.rst
+++ b/doc/paddle/api/paddle/text/datasets/imikolov/Imikolov_cn.rst
@@ -6,26 +6,26 @@ Imikolov
 .. py:class:: paddle.text.datasets.Imikolov()


-    Implementation of imikolov dataset.
+Implementation of the imikolov dataset.

-    Arguments
+Arguments
 :::::::::
-        data_file(str): path to data tar file, can be set None if
-            :attr:`download` is True. Default None
-        data_type(str): 'NGRAM' or 'SEQ'. Default 'NGRAM'.
-        window_size(int): sliding window size for 'NGRAM' data. Default -1.
-        mode(str): 'train' 'test' mode. Default 'train'.
-        min_word_freq(int): minimal word frequence for building word dictionary. Default 50.
-        download(bool): whether to download dataset automatically if
-            :attr:`data_file` is not set. Default True
-
-    Returns:
-        Dataset: instance of imikolov dataset
-
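-    Examples
+    - data_file (str) - Path to the data tar file. Can be None if
+      :attr:`download` is True. Default: None.
+    - data_type (str) - Either 'NGRAM' or 'SEQ'. Default: 'NGRAM'.
+    - window_size (int) - Sliding window size for 'NGRAM' data. Default: -1.
+    - mode (str) - Either 'train' or 'test'. Default: 'train'.
+    - min_word_freq (int) - Minimum word frequency for building the word dictionary. Default: 50.
+    - download (bool) - Whether to download the dataset automatically if :attr:`data_file` is not set. Default: True.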
+
+Returns
 :::::::::
+``Dataset``: an instance of the imikolov dataset.

-        .. code-block:: python
+Examples
+:::::::::
+
+.. code-block:: python

    import paddle
    from paddle.text.datasets import Imikolov
@@ -50,4 +50,3 @@ Imikolov
        src, trg = model(src, trg)
        print(src.numpy().shape, trg.numpy().shape)

-    
\ No newline at end of file
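
Note on `data_type` and `window_size`: the difference between 'NGRAM' and 'SEQ' samples can be pictured roughly as follows. This is a sketch on plain tokens, assuming sentence boundary markers `<s>` and `<e>`; the function names are hypothetical and the real dataset would yield word ids rather than strings.

```python
def ngram_samples(tokens, window_size):
    """Return every run of `window_size` consecutive tokens (sliding window)."""
    padded = ["<s>"] + tokens + ["<e>"]
    return [tuple(padded[end - window_size:end])
            for end in range(window_size, len(padded) + 1)]


def seq_sample(tokens):
    """Return (source, target), where target is source shifted by one token."""
    return ["<s>"] + tokens, tokens + ["<e>"]


tokens = "the cat sat".split()
print(ngram_samples(tokens, window_size=3))
print(seq_sample(tokens))
```

With `window_size=3` the three-word sentence produces three overlapping windows, while the 'SEQ' form produces a single source/target pair per sentence.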
--- a/doc/paddle/api/paddle/text/datasets/movie_reviews/MovieReviews_cn.rst
+++ b/doc/paddle/api/paddle/text/datasets/movie_reviews/MovieReviews_cn.rst
@@ -6,23 +6,23 @@ MovieReviews
 .. py:class:: paddle.text.datasets.MovieReviews()


-    Implementation of `NLTK movie reviews <http://www.nltk.org/nltk_data/>`_ dataset.
+Implementation of the `NLTK movie reviews <http://www.nltk.org/nltk_data/>`_ dataset.

-    Arguments
+Arguments
 :::::::::
-        data_file(str): path to data tar file, can be set None if
-            :attr:`download` is True. Default None
-        mode(str): 'train' 'test' mode. Default 'train'.
-        download(bool): whether auto download cifar dataset if
-            :attr:`data_file` unset. Default True.
+    - data_file (str) - Path to the data tar file. Can be None if
+      :attr:`download` is True. Default: None.
+    - mode (str) - Either 'train' or 'test'. Default: 'train'.
+    - download (bool) - Whether to download the dataset automatically if :attr:`data_file` is not set. Default: True.

-    Returns:
-        Dataset: instance of movie reviews dataset
+Returns
+:::::::::
+``Dataset``: an instance of the NLTK movie reviews dataset.

-    Examples
+Examples
 :::::::::

-        .. code-block:: python
+.. code-block:: python

    import paddle
    from paddle.text.datasets import MovieReviews
@@ -47,4 +47,3 @@ MovieReviews
        word_list, category = model(word_list, category)
        print(word_list.numpy().shape, category.numpy())

-    
\ No newline at end of file
--- a/doc/paddle/api/paddle/text/datasets/movielens/Movielens_cn.rst
+++ b/doc/paddle/api/paddle/text/datasets/movielens/Movielens_cn.rst
@@ -6,25 +6,26 @@ Movielens
 .. py:class:: paddle.text.datasets.Movielens()


-    Implementation of `Movielens 1-M <https://grouplens.org/datasets/movielens/1m/>`_ dataset.
+Implementation of the `Movielens 1-M <https://grouplens.org/datasets/movielens/1m/>`_
+dataset.

-    Arguments
+Arguments
 :::::::::
-        data_file(str): path to data tar file, can be set None if
-            :attr:`download` is True. Default None
-        mode(str): 'train' or 'test' mode. Default 'train'.
-        test_ratio(float): split ratio for test sample. Default 0.1.
-        rand_seed(int): random seed. Default 0.
-        download(bool): whether to download dataset automatically if
-            :attr:`data_file` is not set. Default True
-
-    Returns:
-        Dataset: instance of Movielens 1-M dataset
-
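-    Examples
+    - data_file (str) - Path to the data tar file. Can be None if
+      :attr:`download` is True. Default: None.
+    - mode (str) - Either 'train' or 'test'. Default: 'train'.
+    - test_ratio (float) - Fraction of samples assigned to the test split. Default: 0.1.
+    - rand_seed (int) - Random seed. Default: 0.
+    - download (bool) - Whether to download the dataset automatically if :attr:`data_file` is not set. Default: True.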
+
+Returns
 :::::::::
+``Dataset``: an instance of the Movielens 1-M dataset.

-        .. code-block:: python
+Examples
+:::::::::
+
+.. code-block:: python

    import paddle
    from paddle.text.datasets import Movielens
@@ -49,5 +50,3 @@ Movielens
        model = SimpleNet()
        category, title, rating = model(category, title, rating)
        print(category.numpy().shape, title.numpy().shape, rating.numpy().shape)
-
-    
\ No newline at end of file
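
Note on `test_ratio` and `rand_seed`: a seeded random split of the kind these two arguments suggest might look like the sketch below. It uses Python's standard `random` module and a hypothetical `split_samples` helper; the actual class may draw its random numbers differently, so the exact membership of each split is not meant to match.

```python
import random


def split_samples(samples, test_ratio=0.1, rand_seed=0):
    """Assign each sample to 'test' with probability `test_ratio`, else 'train'."""
    rng = random.Random(rand_seed)  # private generator, so the split is reproducible
    train, test = [], []
    for sample in samples:
        (test if rng.random() < test_ratio else train).append(sample)
    return train, test


ratings = list(range(1000))
train, test = split_samples(ratings, test_ratio=0.1, rand_seed=0)
print(len(train), len(test))
```

Because the generator is seeded, building the 'train' and 'test' views with the same `rand_seed` gives two disjoint subsets that together cover every sample.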
--- a/doc/paddle/api/paddle/text/datasets/uci_housing/UCIHousing_cn.rst
+++ b/doc/paddle/api/paddle/text/datasets/uci_housing/UCIHousing_cn.rst
@@ -6,24 +6,24 @@ UCIHousing
 .. py:class:: paddle.text.datasets.UCIHousing()


-    Implementation of `UCI housing <https://archive.ics.uci.edu/ml/datasets/Housing>`_
-    dataset
+Implementation of the `UCI housing <https://archive.ics.uci.edu/ml/datasets/Housing>`_
+dataset.

-    Arguments
+Arguments
 :::::::::
-        data_file(str): path to data file, can be set None if
-            :attr:`download` is True. Default None
-        mode(str): 'train' or 'test' mode. Default 'train'.
-        download(bool): whether to download dataset automatically if
-            :attr:`data_file` is not set. Default True
+    - data_file (str) - Path to the data file. Can be None if
+      :attr:`download` is True. Default: None.
+    - mode (str) - Either 'train' or 'test'. Default: 'train'.
+    - download (bool) - Whether to download the dataset automatically if :attr:`data_file` is not set. Default: True.

-    Returns:
-        Dataset: instance of UCI housing dataset.
+Returns
+:::::::::
+``Dataset``: an instance of the UCI housing dataset.

-    Examples
+Examples
 :::::::::
        
-        .. code-block:: python
+.. code-block:: python

    import paddle
    from paddle.text.datasets import UCIHousing
@@ -48,4 +48,3 @@ UCIHousing
        feature, target = model(feature, target)
        print(feature.numpy().shape, target.numpy())

-    
\ No newline at end of file
--- a/doc/paddle/api/paddle/text/datasets/wmt14/WMT14_cn.rst
+++ b/doc/paddle/api/paddle/text/datasets/wmt14/WMT14_cn.rst
@@ -6,27 +6,27 @@ WMT14
 .. py:class:: paddle.text.datasets.WMT14()


-    Implementation of `WMT14 <http://www.statmt.org/wmt14/>`_ test dataset.
-    The original WMT14 dataset is too large and a small set of data for set is
-    provided. This module will download dataset from
-    http://paddlepaddle.bj.bcebos.com/demo/wmt_shrinked_data/wmt14.tgz
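+Implementation of the `WMT14 <http://www.statmt.org/wmt14/>`_ test dataset.
+The original WMT14 dataset is too large, so a smaller subset is provided instead.
+This class downloads the dataset from
+http://paddlepaddle.bj.bcebos.com/demo/wmt_shrinked_data/wmt14.tgz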

-    Arguments
+Arguments
 :::::::::
-        data_file(str): path to data tar file, can be set None if
-            :attr:`download` is True. Default None
-        mode(str): 'train', 'test' or 'gen'. Default 'train'
-        dict_size(int): word dictionary size. Default -1.
-        download(bool): whether to download dataset automatically if
-            :attr:`data_file` is not set. Default True
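+    - data_file (str) - Path to the data tar file. Can be None if :attr:`download` is True.
+      Default: None.
+    - mode (str) - One of 'train', 'test', or 'gen'. Default: 'train'.
+    - dict_size (int) - Word dictionary size. Default: -1.
+    - download (bool) - Whether to download the dataset automatically if :attr:`data_file` is not set. Default: True.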

-    Returns:
-        Dataset: instance of WMT14 dataset
+Returns
+:::::::::
+``Dataset``: an instance of the WMT14 dataset.

-    Examples
+Examples
 :::::::::

-        .. code-block:: python
+.. code-block:: python

    import paddle
    from paddle.text.datasets import WMT14
@@ -51,5 +51,3 @@ WMT14
        model = SimpleNet()
        src_ids, trg_ids, trg_ids_next = model(src_ids, trg_ids, trg_ids_next)
        print(src_ids.numpy(), trg_ids.numpy(), trg_ids_next.numpy())
-
-    
\ No newline at end of file
--- a/doc/paddle/api/paddle/text/datasets/wmt16/WMT16_cn.rst
+++ b/doc/paddle/api/paddle/text/datasets/wmt16/WMT16_cn.rst
@@ -6,14 +6,14 @@ WMT16
 .. py:class:: paddle.text.datasets.WMT16()


-    Implementation of `WMT16 <http://www.statmt.org/wmt16/>`_ test dataset.
-    ACL2016 Multimodal Machine Translation. Please see this website for more
-    details: http://www.statmt.org/wmt16/multimodal-task.html#task1
+Implementation of the `WMT16 <http://www.statmt.org/wmt16/>`_ test dataset.
+ACL2016 Multimodal Machine Translation. For more details, see:
+http://www.statmt.org/wmt16/multimodal-task.html#task1

-    If you use the dataset created for your task, please cite the following paper:
-    Multi30K: Multilingual English-German Image Descriptions.
+If you use this dataset in your work, please cite the following paper:
+Multi30K: Multilingual English-German Image Descriptions.

-    .. code-block:: text
+.. code-block:: text

    @article{elliott-EtAl:2016:VL16,
        author    = {{Elliott}, D. and {Frank}, S. and {Sima"an}, K. and {Specia}, L.},
@@ -24,24 +24,24 @@ WMT16
        year      = 2016
    }

-    Arguments
+Arguments
 :::::::::
-        data_file(str): path to data tar file, can be set None if
-            :attr:`download` is True. Default None
-        mode(str): 'train', 'test' or 'val'. Default 'train'
-        src_dict_size(int): word dictionary size for source language word. Default -1.
-        trg_dict_size(int): word dictionary size for target language word. Default -1.
-        lang(str): source language, 'en' or 'de'. Default 'en'.
-        download(bool): whether to download dataset automatically if
-            :attr:`data_file` is not set. Default True
-
-    Returns:
-        Dataset: instance of WMT16 dataset
-
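-    Examples
+    - data_file (str) - Path to the data tar file. Can be None if :attr:`download` is True.
+      Default: None.
+    - mode (str) - One of 'train', 'test', or 'val'. Default: 'train'.
+    - src_dict_size (int) - Word dictionary size for the source language. Default: -1.
+    - trg_dict_size (int) - Word dictionary size for the target language. Default: -1.
+    - lang (str) - Source language, either 'en' or 'de'. Default: 'en'.
+    - download (bool) - Whether to download the dataset automatically if :attr:`data_file` is not set. Default: True.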
+
+Returns
 :::::::::
+``Dataset``: an instance of the WMT16 dataset.

-        .. code-block:: python
+Examples
+:::::::::
+
+.. code-block:: python

    import paddle
    from paddle.text.datasets import WMT16
@@ -67,4 +67,3 @@ WMT16
        src_ids, trg_ids, trg_ids_next = model(src_ids, trg_ids, trg_ids_next)
        print(src_ids.numpy(), trg_ids.numpy(), trg_ids_next.numpy())

-    
\ No newline at end of file