提交 0948ea30 编写于 作者: Q qingqing01 提交者: GitHub

Merge pull request #806 from luotao1/link

fix some dead links in doc/
.. _api_pydataprovider: .. _api_pydataprovider2_en:
PyDataProvider2 PyDataProvider2
=============== ===============
...@@ -104,6 +104,8 @@ And PaddlePadle will do all of the rest things\: ...@@ -104,6 +104,8 @@ And PaddlePadle will do all of the rest things\:
Is this cool? Is this cool?
.. _api_pydataprovider2_en_sequential_model:
DataProvider for the sequential model DataProvider for the sequential model
------------------------------------- -------------------------------------
A sequence model takes sequences as its input. A sequence is made up of several A sequence model takes sequences as its input. A sequence is made up of several
......
...@@ -23,7 +23,7 @@ python's :code:`help()` function. Let's walk through the above python script: ...@@ -23,7 +23,7 @@ python's :code:`help()` function. Let's walk through the above python script:
* At the beginning, use :code:`swig_paddle.initPaddle()` to initialize * At the beginning, use :code:`swig_paddle.initPaddle()` to initialize
PaddlePaddle with command line arguments, for more about command line arguments PaddlePaddle with command line arguments, for more about command line arguments
see `Command Line Arguments <../cmd_argument/detail_introduction.html>`_. see :ref:`cmd_detail_introduction_en` .
* Parse the configuration file that is used in training with :code:`parse_config()`. * Parse the configuration file that is used in training with :code:`parse_config()`.
Because data to predict with always have no label, and output of prediction work Because data to predict with always have no label, and output of prediction work
normally is the output layer rather than the cost layer, so you should modify normally is the output layer rather than the cost layer, so you should modify
...@@ -36,7 +36,7 @@ python's :code:`help()` function. Let's walk through the above python script: ...@@ -36,7 +36,7 @@ python's :code:`help()` function. Let's walk through the above python script:
- Note: As swig_paddle can only accept C++ matrices, we offer a utility - Note: As swig_paddle can only accept C++ matrices, we offer a utility
class DataProviderConverter that can accept the same input data with class DataProviderConverter that can accept the same input data with
PyDataProvider2, for more information please refer to document PyDataProvider2, for more information please refer to document
of `PyDataProvider2 <../data_provider/pydataprovider2.html>`_. of :ref:`api_pydataprovider2_en` .
* Do the prediction with :code:`forwardTest()`, which takes the converted * Do the prediction with :code:`forwardTest()`, which takes the converted
input data and outputs the activations of the output layer. input data and outputs the activations of the output layer.
......
.. _api_trainer_config_helpers_layers:
====== ======
Layers Layers
====== ======
......
...@@ -99,11 +99,3 @@ In PaddlePaddle, training is just to get a collection of model parameters, which ...@@ -99,11 +99,3 @@ In PaddlePaddle, training is just to get a collection of model parameters, which
Although starts from a random guess, you can see that value of ``w`` changes quickly towards 2 and ``b`` changes quickly towards 0.3. In the end, the predicted line is almost identical with real answer. Although starts from a random guess, you can see that value of ``w`` changes quickly towards 2 and ``b`` changes quickly towards 0.3. In the end, the predicted line is almost identical with real answer.
There, you have recovered the underlying pattern between ``X`` and ``Y`` only from observed data. There, you have recovered the underlying pattern between ``X`` and ``Y`` only from observed data.
5. Where to Go from Here
-------------------------
- `Install and Build <../build_and_install/index.html>`_
- `Tutorials <../demo/quick_start/index_en.html>`_
- `Example and Demo <../demo/index.html>`_
```eval_rst
.. _cmd_detail_introduction_en:
```
# Detail Description # Detail Description
## Common ## Common
......
...@@ -30,7 +30,7 @@ Then at the :code:`process` function, each :code:`yield` function will return th ...@@ -30,7 +30,7 @@ Then at the :code:`process` function, each :code:`yield` function will return th
yield src_ids, trg_ids, trg_ids_next yield src_ids, trg_ids, trg_ids_next
For more details description of how to write a data provider, please refer to `PyDataProvider2 <../../ui/data_provider/index.html>`_. The full data provider file is located at :code:`demo/seqToseq/dataprovider.py`. For more details description of how to write a data provider, please refer to :ref:`api_pydataprovider2_en` . The full data provider file is located at :code:`demo/seqToseq/dataprovider.py`.
=============================================== ===============================================
Configure Recurrent Neural Network Architecture Configure Recurrent Neural Network Architecture
...@@ -106,7 +106,7 @@ We will use the sequence to sequence model with attention as an example to demon ...@@ -106,7 +106,7 @@ We will use the sequence to sequence model with attention as an example to demon
In this model, the source sequence :math:`S = \{s_1, \dots, s_T\}` is encoded with a bidirectional gated recurrent neural networks. The hidden states of the bidirectional gated recurrent neural network :math:`H_S = \{H_1, \dots, H_T\}` is called *encoder vector* The decoder is a gated recurrent neural network. When decoding each token :math:`y_t`, the gated recurrent neural network generates a set of weights :math:`W_S^t = \{W_1^t, \dots, W_T^t\}`, which are used to compute a weighted sum of the encoder vector. The weighted sum of the encoder vector is utilized to condition the generation of the token :math:`y_t`. In this model, the source sequence :math:`S = \{s_1, \dots, s_T\}` is encoded with a bidirectional gated recurrent neural networks. The hidden states of the bidirectional gated recurrent neural network :math:`H_S = \{H_1, \dots, H_T\}` is called *encoder vector* The decoder is a gated recurrent neural network. When decoding each token :math:`y_t`, the gated recurrent neural network generates a set of weights :math:`W_S^t = \{W_1^t, \dots, W_T^t\}`, which are used to compute a weighted sum of the encoder vector. The weighted sum of the encoder vector is utilized to condition the generation of the token :math:`y_t`.
The encoder part of the model is listed below. It calls :code:`grumemory` to represent gated recurrent neural network. It is the recommended way of using recurrent neural network if the network architecture is simple, because it is faster than :code:`recurrent_group`. We have implemented most of the commonly used recurrent neural network architectures, you can refer to `Layers <../../ui/api/trainer_config_helpers/layers_index.html>`_ for more details. The encoder part of the model is listed below. It calls :code:`grumemory` to represent gated recurrent neural network. It is the recommended way of using recurrent neural network if the network architecture is simple, because it is faster than :code:`recurrent_group`. We have implemented most of the commonly used recurrent neural network architectures, you can refer to :ref:`api_trainer_config_helpers_layers` for more details.
We also project the encoder vector to :code:`decoder_size` dimensional space, get the first instance of the backward recurrent network, and project it to :code:`decoder_size` dimensional space: We also project the encoder vector to :code:`decoder_size` dimensional space, get the first instance of the backward recurrent network, and project it to :code:`decoder_size` dimensional space:
...@@ -246,6 +246,6 @@ The code is listed below: ...@@ -246,6 +246,6 @@ The code is listed below:
outputs(beam_gen) outputs(beam_gen)
Notice that this generation technique is only useful for decoder like generation process. If you are working on sequence tagging tasks, please refer to `Semantic Role Labeling Demo <../../demo/semantic_role_labeling/index.html>`_ for more details. Notice that this generation technique is only useful for decoder like generation process. If you are working on sequence tagging tasks, please refer to :ref:`semantic_role_labeling_en` for more details.
The full configuration file is located at :code:`demo/seqToseq/seqToseq_net.py`. The full configuration file is located at :code:`demo/seqToseq/seqToseq_net.py`.
...@@ -51,7 +51,7 @@ In this tutorial, we will focus on nvprof and nvvp. ...@@ -51,7 +51,7 @@ In this tutorial, we will focus on nvprof and nvvp.
:code:`test_GpuProfiler` from :code:`paddle/math/tests` directory will be used to evaluate :code:`test_GpuProfiler` from :code:`paddle/math/tests` directory will be used to evaluate
above profilers. above profilers.
.. literalinclude:: ../../paddle/math/tests/test_GpuProfiler.cpp .. literalinclude:: ../../../paddle/math/tests/test_GpuProfiler.cpp
:language: c++ :language: c++
:lines: 111-124 :lines: 111-124
:linenos: :linenos:
...@@ -77,7 +77,7 @@ As a simple example, consider the following: ...@@ -77,7 +77,7 @@ As a simple example, consider the following:
1. Add :code:`REGISTER_TIMER_INFO` and :code:`printAllStatus` functions (see the emphasize-lines). 1. Add :code:`REGISTER_TIMER_INFO` and :code:`printAllStatus` functions (see the emphasize-lines).
.. literalinclude:: ../../paddle/math/tests/test_GpuProfiler.cpp .. literalinclude:: ../../../paddle/math/tests/test_GpuProfiler.cpp
:language: c++ :language: c++
:lines: 111-124 :lines: 111-124
:emphasize-lines: 8-10,13 :emphasize-lines: 8-10,13
...@@ -124,7 +124,7 @@ To use this command line profiler **nvprof**, you can simply issue the following ...@@ -124,7 +124,7 @@ To use this command line profiler **nvprof**, you can simply issue the following
1. Add :code:`REGISTER_GPU_PROFILER` function (see the emphasize-lines). 1. Add :code:`REGISTER_GPU_PROFILER` function (see the emphasize-lines).
.. literalinclude:: ../../paddle/math/tests/test_GpuProfiler.cpp .. literalinclude:: ../../../paddle/math/tests/test_GpuProfiler.cpp
:language: c++ :language: c++
:lines: 111-124 :lines: 111-124
:emphasize-lines: 6-7 :emphasize-lines: 6-7
......
...@@ -93,7 +93,7 @@ where `train.sh` is almost the same as `demo/seqToseq/translation/train.sh`, the ...@@ -93,7 +93,7 @@ where `train.sh` is almost the same as `demo/seqToseq/translation/train.sh`, the
- `--init_model_path`: path of the initialization model, here is `data/paraphrase_model` - `--init_model_path`: path of the initialization model, here is `data/paraphrase_model`
- `--load_missing_parameter_strategy`: operations when model file is missing, here use a normal distibution to initialize the other parameters except for the embedding layer - `--load_missing_parameter_strategy`: operations when model file is missing, here use a normal distibution to initialize the other parameters except for the embedding layer
For users who want to understand the dataset format, model architecture and training procedure in detail, please refer to [Text generation Tutorial](../text_generation/text_generation.md). For users who want to understand the dataset format, model architecture and training procedure in detail, please refer to [Text generation Tutorial](../text_generation/index_en.md).
## Optional Function ## ## Optional Function ##
### Embedding Parameters Observation ### Embedding Parameters Observation
......
...@@ -264,7 +264,7 @@ In this :code:`dataprovider.py`, we should set\: ...@@ -264,7 +264,7 @@ In this :code:`dataprovider.py`, we should set\:
* use_seq\: Whether this :code:`dataprovider.py` in sequence mode or not. * use_seq\: Whether this :code:`dataprovider.py` in sequence mode or not.
* process\: Return each sample of data to :code:`paddle`. * process\: Return each sample of data to :code:`paddle`.
The data provider details document see :ref:`api_pydataprovider`. The data provider details document see :ref:`api_pydataprovider2_en`.
Train Train
````` `````
......
```eval_rst
.. _semantic_role_labeling_en:
```
# Semantic Role labeling Tutorial # # Semantic Role labeling Tutorial #
Semantic role labeling (SRL) is a form of shallow semantic parsing whose goal is to discover the predicate-argument structure of each predicate in a given input sentence. SRL is useful as an intermediate step in a wide range of natural language processing tasks, such as information extraction. automatic document categorization and question answering. An instance is as following [1]: Semantic role labeling (SRL) is a form of shallow semantic parsing whose goal is to discover the predicate-argument structure of each predicate in a given input sentence. SRL is useful as an intermediate step in a wide range of natural language processing tasks, such as information extraction. automatic document categorization and question answering. An instance is as following [1]:
......
# 语义角色标注教程 #
语义角色标注(Semantic role labeling, SRL)是浅语义解析的一种形式,其目的是在给定的输入句子中发现每个谓词的谓词参数结构。 SRL作为很多自然语言处理任务中的中间步骤是很有用的,如信息提取、文档自动分类和问答。 实例如下 [1]:
[ <sub>A0</sub> 他 ] [ <sub>AM-MOD</sub> 将 ][ <sub>AM-NEG</sub> 不会 ] [ <sub>V</sub> 接受] [ <sub>A1</sub> 任何东西 ] 从 [<sub>A2</sub> 那些他写的东西中 ]。
- V: 动词
- A0: 接受者
- A1: 接受的东西
- A2: 从……接受
- A3: 属性
- AM-MOD: 情态动词
- AM-NEG: 否定
给定动词“接受”,句子中的大部分将会扮演某些语义角色。这里,标签方案来自 Penn Proposition Bank。
到目前为止,大多数成功的SRL系统是建立在某种形式的解析结果之上的,其中在语法结构上使用了预先定义的特征模板。 本教程将介绍使用深度双向长短期记忆(DB-LSTM)模型[2]的端到端系统来解决SRL任务,这在很大程度上优于先前的最先进的系统。 这个系统将SRL任务视为序列标记问题。
## 数据描述
相关论文[2]采用 CoNLL-2005&2012 共享任务中设置的数据进行训练和测试。根据数据许可证,演示采用 CoNLL-2005 的测试数据集,可以在网站上找到。
用户只需执行以下命令就可以下载并处理原始数据:
```bash
cd data
./get_data.sh
```
`data `目录会出现如下几个新的文件:
```bash
conll05st-release:the test data set of CoNll-2005 shared task
test.wsj.words:the Wall Street Journal data sentences
test.wsj.props: the propositional arguments
feature: the extracted features from data set
```
## 训练
### DB-LSTM
请参阅情绪分析的演示以了解有关长期短期记忆单元的更多信息。
与在 Sentiment Analysis 演示中使用的 Bidirectional-LSTM 不同,DB-LSTM 采用另一种方法来堆叠LSTM层。首先,标准LSTM以正向处理该序列。该 LSTM 层的输入和输出作为下一个 LSTM 层的输入,并被反向处理。这两个标准 LSTM 层组成一对 LSTM。然后我们堆叠一对对的 LSTM 层后得到深度 LSTM 模型。
下图展示了时间扩展的2层 DB-LSTM 网络。
<center>
![pic](./network_arch.png)
</center>
### 特征
两个输入特性在这个管道中起着至关重要的作用:predicate(pred)和argument(arguments)。 还采用了两个其他特征:谓词上下文(ctx-p)和区域标记(mr)。 因为单个谓词不能精确地描述谓词信息,特别是当相同的词在句子中出现多于一次时。 使用谓词上下文,可以在很大程度上消除歧义。类似地,如果它位于谓词上下文区域中,则使用区域标记 m<sub>r</sub> = 1 来表示参数位置,反之则 m<sub>r</sub> = 0。这四个简单的特征是我们的SRL系统所需要的。上下文大小设置为1的一个样本的特征如下[2]所示:
<center>
![pic](./feature.jpg)
</center>
在这个示例中,相应的标记句子是:
[ <sub>A1</sub> A record date ] has [ <sub>AM-NEG</sub> n't ] been [ <sub>V</sub> set ] .
在演示中, 我们采用上面的特征模板, 包括: `argument`, `predicate`, `ctx-p (p=-1,0,1)`, `mark` 并使用 `B/I/O` 方案来标记每个参数。这些特征和标签存储在 `feature` 文件中, 用`\t`分割。
### 数据提供
`dataprovider.py` 是一个包装数据的 Python 文件。 函数 `hook()` 定义了网络的数据槽。六个特征和标签都是索引槽。
```
def hook(settings, word_dict, label_dict, **kwargs):
settings.word_dict = word_dict
settings.label_dict = label_dict
#all inputs are integral and sequential type
settings.slots = [
integer_value_sequence(len(word_dict)),
integer_value_sequence(len(predicate_dict)),
integer_value_sequence(len(word_dict)),
integer_value_sequence(len(word_dict)),
integer_value_sequence(len(word_dict)),
integer_value_sequence(len(word_dict)),
integer_value_sequence(len(word_dict)),
integer_value_sequence(2),
integer_value_sequence(len(label_dict))]
```
相应的数据迭代器如下:
```
@provider(init_hook=hook, should_shuffle=True, calc_batch_size=get_batch_size,
can_over_batch_size=False, cache=CacheType.CACHE_PASS_IN_MEM)
def process(settings, file_name):
with open(file_name, 'r') as fdata:
for line in fdata:
sentence, predicate, ctx_n2, ctx_n1, ctx_0, ctx_p1, ctx_p2, mark, label = \
line.strip().split('\t')
words = sentence.split()
sen_len = len(words)
word_slot = [settings.word_dict.get(w, UNK_IDX) for w in words]
predicate_slot = [settings.predicate_dict.get(predicate)] * sen_len
ctx_n2_slot = [settings.word_dict.get(ctx_n2, UNK_IDX)] * sen_len
ctx_n1_slot = [settings.word_dict.get(ctx_n1, UNK_IDX)] * sen_len
ctx_0_slot = [settings.word_dict.get(ctx_0, UNK_IDX)] * sen_len
ctx_p1_slot = [settings.word_dict.get(ctx_p1, UNK_IDX)] * sen_len
ctx_p2_slot = [settings.word_dict.get(ctx_p2, UNK_IDX)] * sen_len
marks = mark.split()
mark_slot = [int(w) for w in marks]
label_list = label.split()
label_slot = [settings.label_dict.get(w) for w in label_list]
yield word_slot, predicate_slot, ctx_n2_slot, ctx_n1_slot, \
ctx_0_slot, ctx_p1_slot, ctx_p2_slot, mark_slot, label_slot
```
函数 `process` 产出有8个特征和标签的9个表。
### 神经网络配置
`db_lstm.py` 是在训练过程中加载字典并定义数据提供程序模块和网络架构的神经网络配置文件。
九个 `data_layer` 从数据提供程序加载实例。八个特征分别转换为嵌入,并由`mixed_layer`混合。 深度双向LSTM层提取softmax层的特征。目标函数是标签的交叉熵。
### 训练
训练的脚本是 `train.sh`,用户只需执行:
```bash
./train.sh
```
`train.sh` 中的内容:
```
paddle train \
--config=./db_lstm.py \
--use_gpu=0 \
--log_period=5000 \
--trainer_count=1 \
--show_parameter_stats_period=5000 \
--save_dir=./output \
--num_passes=10000 \
--average_test_period=10000000 \
--init_model_path=./data \
--load_missing_parameter_strategy=rand \
--test_all_data_in_one_period=1 \
2>&1 | tee 'train.log'
```
- \--config=./db_lstm.py : 网络配置文件
- \--use_gpu=false: 使用 CPU 训练(如果已安装 PaddlePaddle GPU版本并想使用 GPU 训练可以设置为true,目前 crf_layer 不支持 GPU)
- \--log_period=500: 每20批(batch)输出日志
- \--trainer_count=1: 设置线程数(或 GPU 数)
- \--show_parameter_stats_period=5000: 每100批显示参数统计
- \--save_dir=./output: 模型输出路径
- \--num_passes=10000: 设置通过数,一次通过意味着PaddlePaddle训练数据集中的所有样本一次
- \--average_test_period=10000000: 每个 average_test_period 批次对平均参数进行测试
- \--init_model_path=./data: 参数初始化路径
- \--load_missing_parameter_strategy=rand: 随机初始不存在的参数
- \--test_all_data_in_one_period=1: 在一个周期内测试所有数据
训练后,模型将保存在目录`output`中。 我们的训练曲线如下:
<center>
![pic](./curve.jpg)
</center>
### 测试
测试脚本是 `test.sh`, 执行:
```bash
./test.sh
```
`tesh.sh` 的主要部分:
```
paddle train \
--config=./db_lstm.py \
--model_list=$model_list \
--job=test \
--config_args=is_test=1 \
```
- \--config=./db_lstm.py: 网络配置文件
- \--model_list=$model_list.list: 模型列表文件
- \--job=test: 指示测试任务
- \--config_args=is_test=1: 指示测试任务的标记
- \--test_all_data_in_one_period=1: 在一个周期内测试所有数据
### 预测
预测脚本是 `predict.sh`,用户只需执行:
```bash
./predict.sh
```
`predict.sh`中,用户应该提供网络配置文件,模型路径,标签文件,字典文件,特征文件。
```
python predict.py
-c $config_file \
-w $best_model_path \
-l $label_file \
-p $predicate_dict_file \
-d $dict_file \
-i $input_file \
-o $output_file
```
`predict.py` 是主要的可执行python脚本,其中包括函数:加载模型,加载数据,数据预测。网络模型将输出标签的概率分布。 在演示中,我们使用最大概率的标签作为结果。用户还可以根据概率分布矩阵实现集束搜索或维特比解码。
预测后,结果保存在 `predict.res` 中。
## 引用
[1] Martha Palmer, Dan Gildea, and Paul Kingsbury. The Proposition Bank: An Annotated Corpus of Semantic Roles , Computational Linguistics, 31(1), 2005.
[2] Zhou, Jie, and Wei Xu. "End-to-end learning of semantic role labeling using recurrent neural networks." Proceedings of the Annual Meeting of the Association for Computational Linguistics. 2015.
...@@ -186,8 +186,7 @@ def define_py_data_sources2(train_list, test_list, module, obj, args=None): ...@@ -186,8 +186,7 @@ def define_py_data_sources2(train_list, test_list, module, obj, args=None):
obj="process", obj="process",
args={"dictionary": dict_name}) args={"dictionary": dict_name})
The related data provider can refer to The related data provider can refer to :ref:`api_pydataprovider2_en_sequential_model` .
`here <../../data_provider/pydataprovider2.html#dataprovider-for-the-sequential-model>`__.
:param train_list: Train list name. :param train_list: Train list name.
:type train_list: basestring :type train_list: basestring
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册