diff --git a/doc/algorithm/rnn/rnn.rst b/doc/algorithm/rnn/rnn.rst
index a918f02ab160eb181a02e5c8a4535b5353c88701..9653ddbf371764df726b4c2db6724cbb80b64861 100644
--- a/doc/algorithm/rnn/rnn.rst
+++ b/doc/algorithm/rnn/rnn.rst
@@ -30,7 +30,7 @@ Then at the :code:`process` function, each :code:`yield` function will return th

     yield src_ids, trg_ids, trg_ids_next

-For more details description of how to write a data provider, please refer to :doc:`Python Data Provider <../py_data_provider_wrapper>`. The full data provider file is located at :code:`demo/seqToseq/dataprovider.py`.
+For more details description of how to write a data provider, please refer to `PyDataProvider2 <../../ui/data_provider/index.html>`_. The full data provider file is located at :code:`demo/seqToseq/dataprovider.py`.

 ===============================================
 Configure Recurrent Neural Network Architecture
@@ -106,7 +106,7 @@ We will use the sequence to sequence model with attention as an example to demon

 In this model, the source sequence :math:`S = \{s_1, \dots, s_T\}` is encoded with a bidirectional gated recurrent neural networks. The hidden states of the bidirectional gated recurrent neural network :math:`H_S = \{H_1, \dots, H_T\}` is called *encoder vector* The decoder is a gated recurrent neural network. When decoding each token :math:`y_t`, the gated recurrent neural network generates a set of weights :math:`W_S^t = \{W_1^t, \dots, W_T^t\}`, which are used to compute a weighted sum of the encoder vector. The weighted sum of the encoder vector is utilized to condition the generation of the token :math:`y_t`.

-The encoder part of the model is listed below. It calls :code:`grumemory` to represent gated recurrent neural network. It is the recommended way of using recurrent neural network if the network architecture is simple, because it is faster than :code:`recurrent_group`. We have implemented most of the commonly used recurrent neural network architectures, you can refer to :doc:`Layers <../trainer_config_helpers/layers>` for more details.
+The encoder part of the model is listed below. It calls :code:`grumemory` to represent gated recurrent neural network. It is the recommended way of using recurrent neural network if the network architecture is simple, because it is faster than :code:`recurrent_group`. We have implemented most of the commonly used recurrent neural network architectures, you can refer to `Layers <../../ui/api/trainer_config_helpers/layers_index.html>`_ for more details.

 We also project the encoder vector to :code:`decoder_size` dimensional space, get the first instance of the backward recurrent network, and project it to :code:`decoder_size` dimensional space:

@@ -246,6 +246,6 @@ The code is listed below:

     outputs(beam_gen)

-Notice that this generation technique is only useful for decoder like generation process. If you are working on sequence tagging tasks, please refer to :doc:`Semantic Role Labeling Demo <../../../demo/semantic_role_labeling>` for more details.
+Notice that this generation technique is only useful for decoder like generation process. If you are working on sequence tagging tasks, please refer to `Semantic Role Labeling Demo <../../demo/semantic_role_labeling/index.html>`_ for more details.

 The full configuration file is located at :code:`demo/seqToseq/seqToseq_net.py`.
diff --git a/doc/build/index.rst b/doc/build/index.rst
index 6fefa7990ae461ba706482fe5b1c7fbe32686827..2b983dceb2777e6c79ee1efaa977fef6e5c33ad6 100644
--- a/doc/build/index.rst
+++ b/doc/build/index.rst
@@ -5,6 +5,7 @@ Install PaddlePaddle
 ----------------------

 .. toctree::
+   :maxdepth: 1
    :glob:

    install_*
@@ -15,6 +16,7 @@ Build from Source
 If you want to hack and contribute PaddlePaddle source code, following guides can help you\:

 .. toctree::
+   :maxdepth: 1
    :glob:

    build_from_source.md
@@ -29,6 +31,7 @@ state and your experience of installation may not be smooth.
 If you want to pack docker image, the following guide can help you\:

 .. toctree::
+   :maxdepth: 1
    :glob:

    docker_install.md
diff --git a/doc/ui/data_provider/pydataprovider2.rst b/doc/ui/data_provider/pydataprovider2.rst
index 94472ad0d8cfb78e348e08b7c575dde61c59670a..152f8a6df6634c6292b4f219f216881c7024f4e4 100644
--- a/doc/ui/data_provider/pydataprovider2.rst
+++ b/doc/ui/data_provider/pydataprovider2.rst
@@ -152,7 +152,6 @@ Please refer to the following section reference for details.
 Reference
 ---------

-.. _@provider::
 @provider
 +++++++++

@@ -170,31 +169,28 @@ PaddlePaddle from a user defined function. Its parameters are:
   usefull in sequential model, that defines batch size is counted upon sequence
   or token. By default, each sample or sequence counts to 1 when calculating
   batch size.
-* cache is a data cache strategy, see `cache`_
+* cache is a data cache strategy, see `cache`_.
 * Init_hook function is invoked once the data provider is initialized,
-  see `init_hook`_
+  see `init_hook`_.

-.. _input_types::
 input_types
 +++++++++++

 PaddlePaddle has four data types, and three sequence types. The four data types
 are:

-* dense_vector represents dense float vector.
-* sparse_binary_vector sparse binary vector, most of the value is 0, and
+* :code:`dense_vector`: dense float vector.
+* :code:`sparse_binary_vector`: sparse binary vector, most of the value is 0, and
   the non zero elements are fixed to 1.
-* sparse_float_vector sparse float vector, most of the value is 0, and some
-  non zero elements that can be any float value. They are given by the user.
-* integer represents an integer scalar, that is especially used for label or
-  word index.
+* :code:`sparse_float_vector`: sparse float vector, most of the value is 0, and some
+  non zero elements can be any float value. They are given by the user.
+* :code:`integer`: an integer scalar, that is especially used for label or word index.

+The three sequence types are:

-The three sequence types are
-
-* SequenceType.NO_SEQUENCE means the sample is not a sequence
-* SequenceType.SEQUENCE means the sample is a sequence
-* SequenceType.SUB_SEQUENCE means it is a nested sequence, that each timestep of
+* :code:`SequenceType.NO_SEQUENCE` means the sample is not a sequence.
+* :code:`SequenceType.SEQUENCE` means the sample is a sequence.
+* :code:`SequenceType.SUB_SEQUENCE` means it is a nested sequence, that each timestep of
   the input sequence is also a sequence.

 Different input type has a defferenct input format. Their formats are shown
@@ -214,36 +210,39 @@ in the above table.

 where f represents a float value, i represents an integer value.

-.. _init_hook::
-.. _settings::
 init_hook
 +++++++++

 init_hook is a function that is invoked once the data provoder is initialized.
 Its parameters lists as follows:

-* The first parameter is a settings object, which is the same to :code:'settings'
-  in :code:`process` method. The object contains several attributes, including:
-  * settings.input_types the input types. Reference `input_types`_
-  * settings.logger a logging object
+* The first parameter is a settings object, which is the same to :code:`settings`
+  in :code:`process` method. The object contains several attributes, including:
+
+  * :code:`settings.input_types`: the input types. Reference `input_types`_.
+  * :code:`settings.logger`: a logging object.
+
 * The rest parameters are the key word arguments. It is made up of PaddpePaddle
   pre-defined parameters and user defined parameters.
-  * PaddlePaddle defines parameters including:
-    * is_train is a bool parameter that indicates the DataProvider is used in
-      training or testing
-    * file_list is the list of all files.
+
+  * PaddlePaddle-defined parameters including:
+
+    * :code:`is_train` is a bool parameter that indicates the DataProvider is used in
+      training or testing.
+    * :code:`file_list` is the list of all files.
+
   * User-defined parameters args can be set in training configuration.

 Note, PaddlePaddle reserves the right to add pre-defined parameter, so please
 use :code:`**kwargs` in init_hook to ensure compatibility by accepting the
 parameters which your init_hook does not use.

-.. _cache ::
 cache
 +++++

-DataProvider provides two simple cache strategy. They are
-* CacheType.NO_CACHE means do not cache any data, then data is read at runtime by
+DataProvider provides two simple cache strategy. They are:
+
+* :code:`CacheType.NO_CACHE` means do not cache any data, then data is read at runtime by
   the user implemented python module every pass.
-* CacheType.CACHE_PASS_IN_MEM means the first pass reads data by the user
+* :code:`CacheType.CACHE_PASS_IN_MEM` means the first pass reads data by the user
   implemented python module, and the rest passes will directly read data from
   memory.
diff --git a/doc_cn/build_and_install/index.rst b/doc_cn/build_and_install/index.rst
index 67d85eca9bbd047c262826a256348f36a048b97f..e9182903c5f62b3a96c196d5ba1ebba2fd14f669 100644
--- a/doc_cn/build_and_install/index.rst
+++ b/doc_cn/build_and_install/index.rst
@@ -1,8 +1,15 @@
 编译与安装
 ========================

-.. toctree::
-   :maxdepth: 1
-
-   install/index.rst
-   cmake/index.rst
+PaddlePaddle提供数个预编译的二进制来进行安装,包括Docker镜像,ubuntu的deb安装包等。我们推荐使用Docker镜像来部署环境,同时欢迎贡献更多的安装包。
+
+Note: The intallation packages are still in pre-release state and your experience of installation may not be smooth.
+
+注意:目前PaddlePaddle的安装包还处在pre-release的状态,使用起来或许会不是很顺畅。
+
+.. toctree::
+   :maxdepth: 1
+
+   install/docker_install.rst
+   install/ubuntu_install.rst
+   cmake/index.rst
diff --git a/doc_cn/build_and_install/install/index.rst b/doc_cn/build_and_install/install/index.rst
deleted file mode 100644
index ce463728c78c954439f17284bdceea2c502172ea..0000000000000000000000000000000000000000
--- a/doc_cn/build_and_install/install/index.rst
+++ /dev/null
@@ -1,15 +0,0 @@
-安装PaddlePaddle
-==========
-
-PaddlePaddle提供数个预编译的二进制来进行安装。他们包括Docker镜像,ubuntu的deb安装包等
-。欢迎贡献更多的安装包。我们更推荐使用Docker镜像来部署PaddlePaddle环境。
-
-Note: The intallation packages are still in pre-release
-state and your experience of installation may not be smooth.
-
-注意!目前PaddlePaddle的安装包还处在pre-release的状态,
-使用起来或许会不是很顺畅。
-
-.. toctree::
-   docker_install.rst
-   ubuntu_install.rst
diff --git a/doc_cn/ui/data_provider/index.rst b/doc_cn/ui/data_provider/index.rst
index 681a131b66389917e81f629a473d1528c9a5a4a8..ec8f8e5dc5b29e3504d0087e844c1f14436919d9 100644
--- a/doc_cn/ui/data_provider/index.rst
+++ b/doc_cn/ui/data_provider/index.rst
@@ -1,24 +1,15 @@
 PaddlePaddle的数据提供(DataProvider)介绍
-==================================
+========================================

-数据提供(DataProvider,后用DataProvider代替)是PaddlePaddle负责提供数据的模块。其作用是将训练数据
-传入内存或者显存,让神经网络可以进行训练。简单的使用,用户可以使用Python的
-:code:`PyDataProvider` 来自定义传数据的过程。如果有更复杂的使用,或者需要更高的效率,
-用户也可以在C++端自定义一个 :code:`DataProvider` 。
+数据提供(DataProvider)是PaddlePaddle负责提供数据的模块。其作用是将训练数据传入内存或者显存,让神经网络可以进行训练。简单的使用,用户可以使用Python的 :code:`PyDataProvider` 来自定义传数据的过程。如果有更复杂的使用,或者需要更高的效率,用户也可以在C++端自定义一个 :code:`DataProvider` 。

-PaddlePaddle需要用户在网络配置(trainer_config.py)中定义使用什么DataProvider,和DataProvider
-的一些参数,训练文件列表(train.list)和测试文件列表(test.list)。
+PaddlePaddle需要用户在网络配置(trainer_config.py)中定义使用哪种DataProvider及其参数,训练文件列表(train.list)和测试文件列表(test.list)。

-其中,train.list和test.list均为本地的两个文件(推荐直接放置到训练目录,以相对路径引用)。如果
-test.list不设置,或者设置为None的话,那么在训练过程中,不会执行测试操作。否则,则会根据命令行
-参数指定的测试方式,在训练过程中进行测试,从而防止过拟合。
+其中,train.list和test.list均为本地的两个文件(推荐直接放置到训练目录,以相对路径引用)。如果test.list不设置,或者设置为None,那么在训练过程中,不会执行测试操作。否则,会根据命令行参数指定的测试方式,在训练过程中进行测试,从而防止过拟合。

-一般情况下,train.list和test.list为纯文本文件,其每一行对应这每一个数据文件。数据文件存放在
-本地磁盘中,将文件的绝对路径或相对路径(相对于PaddlePaddle程序运行时的路径)的方式写在train.list和
-test.list中。当然,train.list和test.list也可以放置hdfs文件路径,或者数据库连接地址等等。
-用户在DataProvider中需要实现如何访问其中每一个文件。
+一般情况下,train.list和test.list为纯文本文件,一行对应一个数据文件,数据文件存放在本地磁盘中。将文件的绝对路径或相对路径(相对于PaddlePaddle程序运行时的路径)写在train.list和test.list中。当然,train.list和test.list也可以放置hdfs文件路径,或者数据库连接地址等等。

-DataProvider的具体用法和如何实现一个新的DataProvider,请参考下述文章:
+用户在DataProvider中需要实现如何访问其中每一个文件。DataProvider的具体用法和如何实现一个新的DataProvider,请参考下述文章:

 .. toctree::

diff --git a/doc_cn/ui/data_provider/pydataprovider2.rst b/doc_cn/ui/data_provider/pydataprovider2.rst
index 766f5835386557e1a092c70f60157ec8552ef0d3..e743e4168821ff4713ddb015d03586ce82da4969 100644
--- a/doc_cn/ui/data_provider/pydataprovider2.rst
+++ b/doc_cn/ui/data_provider/pydataprovider2.rst
@@ -116,8 +116,6 @@ DataProvider创建的时候执行。这个初始化函数具有如下参数:
 参考(Reference)
 ---------------

-.. _@provider::
-
 @provider
 +++++++++

@@ -134,9 +132,6 @@ DataProvider创建的时候执行。这个初始化函数具有如下参数:
 * cache 是数据缓存的策略,参考 `cache`_
 * init_hook 是初始化时调用的函数,参考 `init_hook`_

-
-.. _input_types::
-
 input_types
 +++++++++++

@@ -169,16 +164,11 @@ PaddlePaddle的数据包括四种主要类型,和三种序列模式。其中

 其中,f代表一个浮点数,i代表一个整数。

-.. _init_hook::
-.. _settings::
-
 init_hook
 +++++++++

 init_hook可以传入一个函数。这个函数在初始化的时候会被调用。这个函数的参数是:
-
-

 * 第一个参数是 settings 对象。这个对象和process的第一个参数一致。具有的属性有
   * settings.input_types 设置输入类型。参考 `input_types`_
   * settings.logger 一个logging对象
@@ -192,8 +182,6 @@ init_hook可以传入一个函数。这个函数在初始化的时候会被调
 注意,PaddlePaddle保留添加参数的权力,所以init_hook尽量使用 :code:`**kwargs` , 来接受不使用的
 函数来保证兼容性。

-.. _cache::
-
 cache
 +++++