Commit 3e87021d authored by luotao02

refine demo dataprovider and some tiny fix

ISSUE=4597359 

git-svn-id: https://svn.baidu.com/idl/trunk/paddle@1432 1ad973e4-5ce8-4261-8a94-b56d1f490c56
Parent 13f46029
......@@ -22,13 +22,13 @@ def hook(settings, word_dict, label_dict, **kwargs):
settings.label_dict = label_dict
# all inputs are integer sequence types
settings.slots = [
integer_value(len(word_dict), seq_type=SequenceType.SEQUENCE),
integer_value(len(word_dict), seq_type=SequenceType.SEQUENCE),
integer_value(len(word_dict), seq_type=SequenceType.SEQUENCE),
integer_value(len(word_dict), seq_type=SequenceType.SEQUENCE),
integer_value(len(word_dict), seq_type=SequenceType.SEQUENCE),
integer_value(2, seq_type=SequenceType.SEQUENCE),
integer_value(len(label_dict), seq_type=SequenceType.SEQUENCE)]
integer_value_sequence(len(word_dict)),
integer_value_sequence(len(word_dict)),
integer_value_sequence(len(word_dict)),
integer_value_sequence(len(word_dict)),
integer_value_sequence(len(word_dict)),
integer_value_sequence(2),
integer_value_sequence(len(label_dict))]
@provider(init_hook=hook)
......
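Each `integer_value_sequence` slot expects one list of integer IDs per instance. As a framework-free sketch of what the provider's `process()` has to produce for the word and label slots above, the helper below maps a tokenized sentence and its per-token labels to ID sequences (the helper name and the toy dictionaries are hypothetical, not taken from this commit):

```python
def to_id_sequences(words, labels, word_dict, label_dict, unk_id=0):
    """Map tokens and per-token labels to the integer ID lists
    expected by integer_value_sequence slots."""
    word_ids = [word_dict.get(w, unk_id) for w in words]   # unknown words fall back to unk_id
    label_ids = [label_dict[l] for l in labels]            # labels are assumed in-vocabulary
    return word_ids, label_ids

# Hypothetical dictionaries for illustration only.
word_dict = {"the": 1, "cat": 2, "sat": 3}
label_dict = {"O": 0, "B-NP": 1, "I-NP": 2}
print(to_id_sequences(["the", "cat", "sat"], ["B-NP", "I-NP", "O"],
                      word_dict, label_dict))
# -> ([1, 2, 3], [1, 2, 0])
```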
......@@ -17,7 +17,7 @@ from paddle.trainer.PyDataProvider2 import *
def hook(settings, dictionary, **kwargs):
settings.word_dict = dictionary
settings.input_types = [
integer_value(len(settings.word_dict), seq_type=SequenceType.SEQUENCE),
integer_value_sequence(len(settings.word_dict)),
integer_value(2)]
settings.logger.info('dict len : %d' % (len(settings.word_dict)))
......
......@@ -30,22 +30,15 @@ def hook(settings, src_dict, trg_dict, file_list, **kwargs):
if settings.job_mode:
settings.trg_dict = trg_dict
settings.slots = [
integer_value(
len(settings.src_dict),
seq_type=SequenceType.SEQUENCE), integer_value(
len(settings.trg_dict),
seq_type=SequenceType.SEQUENCE), integer_value(
len(settings.trg_dict),
seq_type=SequenceType.SEQUENCE)
integer_value_sequence(len(settings.src_dict)),
integer_value_sequence(len(settings.trg_dict)),
integer_value_sequence(len(settings.trg_dict))
]
settings.logger.info("trg dict len : %d" % (len(settings.trg_dict)))
else:
settings.slots = [
integer_value(
len(settings.src_dict),
seq_type=SequenceType.SEQUENCE), integer_value(
len(open(file_list[0], "r").readlines()),
seq_type=SequenceType.SEQUENCE)
integer_value_sequence(len(settings.src_dict)),
integer_value_sequence(len(open(file_list[0], "r").readlines()))
]
......
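In training mode the three slots are the source ID sequence, the target ID sequence, and the next-word labels; the last two are commonly the same target sentence shifted by one position. A minimal sketch of that shift, assuming conventional start/end-of-sequence IDs (the marker IDs and helper name are assumptions, not taken from this diff):

```python
def make_trg_slots(trg_ids, start_id, end_id):
    """Build decoder-input and next-word-label sequences from one
    target sentence by shifting it one position."""
    decoder_input = [start_id] + trg_ids    # <s> w1 w2 ... wn
    next_word_label = trg_ids + [end_id]    # w1 w2 ... wn <e>
    return decoder_input, next_word_label

print(make_trg_slots([5, 7, 9], start_id=0, end_id=1))
# -> ([0, 5, 7, 9], [5, 7, 9, 1])
```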
doc/demo/quick_start/NetRNN_en.png (binary image changed: 70.3 KB → 57.1 KB)
......@@ -225,7 +225,7 @@ Performance summary: You can refer to the training and testing scripts later. In
<br>
### Word Embedding Model
In order to use the word embedding model, you need to change the data provider slightly so that the input words are represented as a sequence of word IDs. The revised data provider is listed below. You only need to change initializer() for the type of the first input. It is changed from sparse_binary_vector to a sequence of integers. process() remains the same. This data provider can also be used for later sequence models.
In order to use the word embedding model, you need to change the data provider slightly so that the input words are represented as a sequence of word IDs. The revised data provider `dataprovider_emb.py` is listed below. You only need to change initializer() for the type of the first input. It is changed from sparse_binary_vector to a sequence of integers. process() remains the same. This data provider can also be used for later sequence models.
```python
def initializer(settings, dictionary, **kwargs):
......@@ -260,7 +260,7 @@ avg = pooling_layer(input=emb, pooling_type=AvgPooling())
The other parts of the model are the same as in the logistic regression network.
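`pooling_layer` with `AvgPooling` reduces the variable-length sequence of word embedding vectors to a single fixed-size vector by element-wise averaging. A plain-Python sketch of that reduction, on toy vectors with no PaddlePaddle dependency:

```python
def avg_pool(embeddings):
    """Average a list of equal-length embedding vectors into one
    fixed-size vector (element-wise mean over the sequence)."""
    n = len(embeddings)
    dim = len(embeddings[0])
    return [sum(vec[d] for vec in embeddings) / n for d in range(dim)]

# Three 4-dimensional toy embeddings for a 3-word sentence.
seq = [[1.0, 2.0, 0.0, 4.0],
       [3.0, 2.0, 2.0, 0.0],
       [2.0, 2.0, 4.0, 2.0]]
print(avg_pool(seq))  # -> [2.0, 2.0, 2.0, 2.0]
```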
The performance is summarized in the following table:
<html>
<center>
......@@ -400,7 +400,7 @@ If you want to install the remote training platform, which enables distributed t
You can use the trained model to perform prediction on a dataset with no labels. You can also evaluate the model on a dataset with labels to obtain its test accuracy.
<center> ![](./PipelineTest_en.png) </center>
The test script (test.sh) is listed below. PaddlePaddle can evaluate a model on the data with labels specified in `test.list`.
The test script is listed below. PaddlePaddle can evaluate a model on the data with labels specified in `test.list`.
```bash
paddle train \
......@@ -497,11 +497,12 @@ The scripts of data downloading, network configurations, and training scripts are
## Appendix
### Command Line Argument
* --config: network architecture path.
* --save_dir: model save directory.
* --log_period: the logging period per batch.
* --num_passes: number of training passes. One pass means the training would go over the whole training dataset once.* --config_args: Other configuration arguments.
* --init_model_path: The path of the initial model parameter.
* \--config: network architecture path.
* \--save_dir: model save directory.
* \--log_period: the logging period per batch.
* \--num_passes: number of training passes. One pass means the training would go over the whole training dataset once.
* \--config_args: Other configuration arguments.
* \--init_model_path: The path of the initial model parameter.
By default, the trainer saves the model after every pass. You can also specify `saving_period_by_batches` to set how often the model is saved in batches. You can use `show_parameter_stats_period` to print the statistics of the parameters, which are very useful for tuning. Other command line arguments can be found in the <a href = "../../ui/index.html#command-line-argument">command line argument documentation</a>.
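Putting the flags above together, a training invocation might look like the following sketch (the config path and directories are placeholders, not taken from this repository):

```bash
paddle train \
  --config=trainer_config.py \
  --save_dir=./output \
  --log_period=100 \
  --num_passes=10 \
  --init_model_path=./pretrained_model
```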
......
......@@ -71,15 +71,14 @@ def hook(settings, word_dict, label_dict, **kwargs):
settings.word_dict = word_dict
settings.label_dict = label_dict
# all inputs are integer sequence types
settings.slots = [
integer_value(len(word_dict), seq_type=SequenceType.SEQUENCE),
integer_value(len(word_dict), seq_type=SequenceType.SEQUENCE),
integer_value(len(word_dict), seq_type=SequenceType.SEQUENCE),
integer_value(len(word_dict), seq_type=SequenceType.SEQUENCE),
integer_value(len(word_dict), seq_type=SequenceType.SEQUENCE),
integer_value(2, seq_type=SequenceType.SEQUENCE),
integer_value(len(label_dict), seq_type=SequenceType.SEQUENCE)]```
settings.slots = [
integer_value_sequence(len(word_dict)),
integer_value_sequence(len(word_dict)),
integer_value_sequence(len(word_dict)),
integer_value_sequence(len(word_dict)),
integer_value_sequence(len(word_dict)),
integer_value_sequence(2),
integer_value_sequence(len(label_dict))]
```
The corresponding data iterator is as follows:
```
......
# Layer Documents
* [Layer Source Code Document](source/gserver/layers/index.rst)
* [Layer Python API Document](ui/api/trainer_config_helpers/layers_index.rst)
......@@ -510,11 +510,24 @@ NCELayer
.. doxygenclass:: paddle::NCELayer
:members:
Validation Layers
-----------------
ValidationLayer
---------------
```````````````
.. doxygenclass:: paddle::ValidationLayer
:members:
AucValidation
`````````````
.. doxygenclass:: paddle::AucValidation
:members:
PnpairValidation
````````````````
.. doxygenclass:: paddle::PnpairValidation
:members:
Check Layers
============
......
Activations
===========
.. toctree::
:maxdepth: 3
activations.rst
......@@ -207,17 +207,16 @@ classification_cost(input=output, label=label)
### Word Vector Model
The embedding model requires a small change to the data provider script, `dataprovider_emb.py`; the word vector,
convolutional, and sequence models all use this script
- The text input type is defined as the integer type integer_value
- Set the text input's seq_type to SequenceType.SEQUENCE
The embedding model requires a small change to the data provider script, `dataprovider_emb.py`; the word vector,
convolutional, and sequence models all use this script. The text input type is defined as the integer sequence type integer_value_sequence.
```
def initializer(settings, dictionary, **kwargs):
settings.word_dict = dictionary
settings.input_types = [
# Define the type of the first input as sequence of integer.
integer_value(len(dictionary), seq_type=SequenceType.SEQUENCE),
# The values of the integers range from 0 to len(dictionary)-1
integer_value_sequence(len(dictionary)),
# Define the second input for label id
integer_value(2)]
......@@ -479,12 +478,12 @@ else:
## Appendix
### Command Line Arguments
* --config: network configuration.
* --save_dir: model save directory.
* --log_period: print a log every this many batches.
* --num_passes: number of training passes; one pass goes over all training samples once.
* --config_args: arguments given on the command line are passed into the network configuration.
* --init_model_path: path of the initial model; can be used to specify an initial model for testing or training.
* \--config: network configuration.
* \--save_dir: model save directory.
* \--log_period: print a log every this many batches.
* \--num_passes: number of training passes; one pass goes over all training samples once.
* \--config_args: arguments given on the command line are passed into the network configuration.
* \--init_model_path: path of the initial model; can be used to specify an initial model for testing or training.
By default, the model is saved once per pass; you can also set saving_period_by_batches to save the model every given number of batches.
You can set show_parameter_stats_period to print parameter statistics and related information.
......