Commit 3e87021d authored by luotao02

refine demo dataprovider and some tiny fix

ISSUE=4597359 

git-svn-id: https://svn.baidu.com/idl/trunk/paddle@1432 1ad973e4-5ce8-4261-8a94-b56d1f490c56
Parent 13f46029
@@ -22,13 +22,13 @@ def hook(settings, word_dict, label_dict, **kwargs):
    settings.label_dict = label_dict
    # all inputs are integral and sequential type
    settings.slots = [
        integer_value_sequence(len(word_dict)),
        integer_value_sequence(len(word_dict)),
        integer_value_sequence(len(word_dict)),
        integer_value_sequence(len(word_dict)),
        integer_value_sequence(len(word_dict)),
        integer_value_sequence(2),
        integer_value_sequence(len(label_dict))]

@provider(init_hook=hook)
......
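The change above is purely a spelling refactor: `integer_value_sequence(n)` is shorthand for `integer_value(n, seq_type=SequenceType.SEQUENCE)`. A toy reconstruction of that equivalence (stand-in types, not the actual PaddlePaddle source):

```python
from collections import namedtuple

# Minimal stand-ins for the PyDataProvider2 slot types; the real paddle
# classes carry more state, this only mirrors the call shapes.
SlotType = namedtuple('SlotType', ['dim', 'seq_type'])

NO_SEQUENCE = 0
SEQUENCE = 1  # stand-in for SequenceType.SEQUENCE

def integer_value(value_range, seq_type=NO_SEQUENCE):
    return SlotType(value_range, seq_type)

def integer_value_sequence(value_range):
    # The shorthand simply fixes seq_type to SEQUENCE.
    return integer_value(value_range, seq_type=SEQUENCE)

# Both spellings describe the same slot:
assert integer_value(100, seq_type=SEQUENCE) == integer_value_sequence(100)
```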
@@ -17,7 +17,7 @@ from paddle.trainer.PyDataProvider2 import *
def hook(settings, dictionary, **kwargs):
    settings.word_dict = dictionary
    settings.input_types = [
        integer_value_sequence(len(settings.word_dict)),
        integer_value(2)]
    settings.logger.info('dict len : %d' % (len(settings.word_dict)))
......
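With `input_types` declared this way, the matching `process()` generator yields one word-ID sequence plus one label ID per sample. A self-contained sketch of that shape in plain Python (the `@provider` decorator, file reading, `unk_id`, and the sample data here are simplified or made up):

```python
def process(word_dict, samples):
    """Yield (word_id_sequence, label) pairs, mimicking the two slots:
    a sequence of word IDs and a single 0/1 label."""
    unk_id = len(word_dict)  # hypothetical out-of-vocabulary bucket
    for sentence, label in samples:
        ids = [word_dict.get(w, unk_id) for w in sentence.split()]
        yield ids, label

word_dict = {'the': 0, 'movie': 1, 'was': 2, 'great': 3, 'awful': 4}
samples = [('the movie was great', 1), ('the movie was awful', 0)]
for ids, label in process(word_dict, samples):
    print(ids, label)  # each sample: a list of word IDs and a 0/1 label
```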
@@ -30,22 +30,15 @@ def hook(settings, src_dict, trg_dict, file_list, **kwargs):
    if settings.job_mode:
        settings.trg_dict = trg_dict
        settings.slots = [
            integer_value_sequence(len(settings.src_dict)),
            integer_value_sequence(len(settings.trg_dict)),
            integer_value_sequence(len(settings.trg_dict))
        ]
        settings.logger.info("trg dict len : %d" % (len(settings.trg_dict)))
    else:
        settings.slots = [
            integer_value_sequence(len(settings.src_dict)),
            integer_value_sequence(len(open(file_list[0], "r").readlines()))
        ]
......
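In training mode the three sequence slots above carry a source sentence and two target-side sequences (commonly the target and the target shifted by one word for next-word prediction). A hand-written illustration with made-up two-word dictionaries; the start/end-of-sentence handling that the real demo performs is omitted:

```python
def make_sample(src_words, trg_words, src_dict, trg_dict):
    """Build (src_ids, trg_ids, trg_next_ids) for the three
    sequence slots declared in training mode (illustration only)."""
    src_ids = [src_dict[w] for w in src_words]
    trg_ids = [trg_dict[w] for w in trg_words]
    # Shift the target by one step so the model is trained to predict
    # the next word; repeating the last id stands in for an end marker.
    trg_next_ids = trg_ids[1:] + [trg_ids[-1]]
    return src_ids, trg_ids, trg_next_ids

src_dict = {'bonjour': 0, 'monde': 1}
trg_dict = {'hello': 0, 'world': 1}
print(make_sample(['bonjour', 'monde'], ['hello', 'world'], src_dict, trg_dict))
```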
doc/demo/quick_start/NetRNN_en.png (image updated: 70.3 KB → 57.1 KB)
@@ -225,7 +225,7 @@ Performance summary: You can refer to the training and testing scripts later. In
<br>
### Word Embedding Model
In order to use the word embedding model, you need to change the data provider slightly so that the input words are fed as a sequence of word IDs. The revised data provider `dataprovider_emb.py` is listed below. You only need to change initializer() for the type of the first input: it is changed from sparse_binary_vector to a sequence of integers. process() remains the same. This data provider can also be used for later sequence models.
```python
def initializer(settings, dictionary, **kwargs):
......
@@ -260,7 +260,7 @@ avg = pooling_layer(input=emb, pooling_type=AvgPooling())
The other parts of the model are the same as the logistic regression network.
The performance is summarized in the following table:
<html>
<center>
......
@@ -400,7 +400,7 @@ If you want to install the remote training platform, which enables distributed t
You can use the trained model to perform prediction on the dataset with no labels. You can also evaluate the model on a dataset with labels to obtain its test accuracy.
<center> ![](./PipelineTest_en.png) </center>
The test script is listed below. PaddlePaddle can evaluate a model on the data with labels specified in `test.list`.
```bash
paddle train \
......
@@ -497,11 +497,12 @@ The scripts of data downloading, network configurations, and training scripts are
## Appendix
### Command Line Argument
* \--config: network architecture path.
* \--save_dir: model save directory.
* \--log_period: the logging period per batch.
* \--num_passes: number of training passes. One pass means the training would go over the whole training dataset once.
* \--config_args: other configuration arguments.
* \--init_model_path: the path of the initial model parameters.
By default, the trainer will save the model every pass. You can also specify `saving_period_by_batches` to set the frequency of batch saving. You can use `show_parameter_stats_period` to print the statistics of the parameters, which are very useful for tuning parameters. Other command line arguments can be found in the <a href = "../../ui/index.html#command-line-argument">command line argument documentation</a>.
......
......
@@ -71,15 +71,14 @@ def hook(settings, word_dict, label_dict, **kwargs):
    settings.word_dict = word_dict
    settings.label_dict = label_dict
    # all inputs are integral and sequential type
    settings.slots = [
        integer_value_sequence(len(word_dict)),
        integer_value_sequence(len(word_dict)),
        integer_value_sequence(len(word_dict)),
        integer_value_sequence(len(word_dict)),
        integer_value_sequence(len(word_dict)),
        integer_value_sequence(2),
        integer_value_sequence(len(label_dict))]
```
The corresponding data iterator is as follows:
```
......
# Layer Documents
* [Layer Source Code Document](source/gserver/layers/index.rst)
* [Layer Python API Document](ui/api/trainer_config_helpers/layers_index.rst)
......
@@ -510,11 +510,24 @@ NCELayer
.. doxygenclass:: paddle::NCELayer
   :members:

Validation Layers
-----------------

ValidationLayer
```````````````
.. doxygenclass:: paddle::ValidationLayer
   :members:

AucValidation
`````````````
.. doxygenclass:: paddle::AucValidation
   :members:

PnpairValidation
````````````````
.. doxygenclass:: paddle::PnpairValidation
   :members:

Check Layers
============
......
Activations
===========
.. toctree::
:maxdepth: 3
activations.rst
......
@@ -207,17 +207,16 @@ classification_cost(input=output, label=label)
### Word Vector Model (Word Vector)
The embedding model requires a slight change to the data provider script, `dataprovider_emb.py`; the word vector, convolution, and sequence models all use this script. In it, the text input type is defined as the integer sequence type integer_value_sequence.
```
def initializer(settings, dictionary, **kwargs):
    settings.word_dict = dictionary
    settings.input_types = [
        # Define the type of the first input as a sequence of integers.
        # The values of the integers range from 0 to len(dictionary)-1
        integer_value_sequence(len(dictionary)),
        # Define the second input for label id
        integer_value(2)]
```
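Concretely, the two declared input types expect per-sample values of different shapes: a list of integers for the sequence slot and a single 0/1 integer for the label slot. A hand-written illustration (tiny made-up dictionary, not produced by paddle):

```python
dictionary = {'i': 0, 'like': 1, 'it': 2}

# First slot, integer_value_sequence(len(dictionary)): one ID per word.
text_sample = [dictionary[w] for w in 'i like it'.split()]

# Second slot, integer_value(2): a single class ID in {0, 1}.
label_sample = 1

print(text_sample, label_sample)  # [0, 1, 2] 1
```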
......
@@ -479,12 +478,12 @@ else:
## Appendix
### Command Line Arguments
* \--config: network configuration
* \--save_dir: model save path
* \--log_period: print a log every this many batches
* \--num_passes: number of training passes; one pass goes over all the training samples once
* \--config_args: arguments given on the command line are passed into the network configuration
* \--init_model_path: path of the initial model; can be used to specify an initial model for testing or training
By default the model is saved once per pass; you can also set saving_period_by_batches to save the model every given number of batches.
You can set show_parameter_stats_period to print parameter statistics and other information.
......