提交 13cd4dc0 编写于 作者: P peterzhang2029

refine docstring and notation

上级 0096515a
...@@ -76,8 +76,8 @@ pip install -r requirements.txt ...@@ -76,8 +76,8 @@ pip install -r requirements.txt
## 指定训练配置参数 ## 指定训练配置参数
`config.py`脚本中包含训练配置和模型配置的参数设置, 示例代码如下: 通过 `config.py` 脚本修改训练和模型配置参数,脚本中有对可配置参数的详细解释,示例如下:
``` ```python
class TrainerConfig(object): class TrainerConfig(object):
# whether to use GPU for training # whether to use GPU for training
...@@ -98,8 +98,7 @@ class ModelConfig(object): ...@@ -98,8 +98,7 @@ class ModelConfig(object):
... ...
``` ```
用户可以对具体参数进行设置实现训练, 例如通过设置 `use_gpu` 参数来指定是否使用 GPU 修改 `config.py` 对参数进行调整。例如,通过修改 `use_gpu` 参数来指定是否使用 GPU 进行训练。
进行训练。
## 使用 PaddlePaddle 内置数据运行 ## 使用 PaddlePaddle 内置数据运行
...@@ -200,7 +199,11 @@ Options: ...@@ -200,7 +199,11 @@ Options:
修改`train.py`脚本中的启动参数,可以直接运行本例。 以`data`目录下的示例数据为例,在终端执行: 修改`train.py`脚本中的启动参数,可以直接运行本例。 以`data`目录下的示例数据为例,在终端执行:
```bash ```bash
python train.py --train_data_dir 'data/train_data' --test_data_dir 'data/test_data' --word_dict_path 'word_dict.txt' --label_dict_path 'label_dict.txt' python train.py \
--train_data_dir 'data/train_data' \
--test_data_dir 'data/test_data' \
--word_dict_path 'word_dict.txt' \
--label_dict_path 'label_dict.txt'
``` ```
即可对样例数据进行训练。 即可对样例数据进行训练。
...@@ -226,7 +229,11 @@ Options: ...@@ -226,7 +229,11 @@ Options:
2.`data`目录下的示例数据为例,在终端执行: 2.`data`目录下的示例数据为例,在终端执行:
```bash ```bash
python infer.py --data_path 'data/infer.txt' --word_dict_path 'word_dict.txt' --label_dict_path 'label_dict.txt' --model_path 'models/params_pass_00000.tar.gz' python infer.py \
--data_path 'data/infer.txt' \
--word_dict_path 'word_dict.txt' \
--label_dict_path 'label_dict.txt' \
--model_path 'models/params_pass_00000.tar.gz'
``` ```
即可对样例数据进行预测。 即可对样例数据进行预测。
...@@ -3,37 +3,44 @@ __all__ = ["TrainerConfig", "ModelConfig"] ...@@ -3,37 +3,44 @@ __all__ = ["TrainerConfig", "ModelConfig"]
class TrainerConfig(object): class TrainerConfig(object):
# whether to use GPU for training # Whether to use GPU in training or not.
use_gpu = False use_gpu = False
# the number of threads used in one machine # The number of computing threads.
trainer_count = 1 trainer_count = 1
# train batch size # The training batch size.
batch_size = 32 batch_size = 32
# number of pass during training # The epoch number.
num_passes = 10 num_passes = 10
# learning rate for optimizer # The global learning rate.
learning_rate = 1e-3 learning_rate = 1e-3
# learning rate for L2Regularization # The decay rate for L2Regularization
l2_learning_rate = 1e-3 l2_learning_rate = 1e-3
# average_window for ModelAverage # This parameter is used for the averaged SGD.
# About the average_window * (number of the processed batch) parameters
# are used for average.
# To be accurate, between average_window *(number of the processed batch)
# and 2 * average_window * (number of the processed batch) parameters
# are used for average.
average_window = 0.5 average_window = 0.5
# buffer size for shuffling # The buffer size of the data reader.
# The number of buffer size samples will be shuffled in training.
buf_size = 1000 buf_size = 1000
# log progress every log_period batches # The parameter is used to control logging period.
# Training log will be printed every log_period.
log_period = 100 log_period = 100
class ModelConfig(object): class ModelConfig(object):
# embedding vector dimension # The dimension of embedding vector.
emb_size = 28 emb_size = 28
# size of sentence vector representation and fc layer in cnn # The hidden size of sentence vectors.
hidden_size = 128 hidden_size = 128
...@@ -118,8 +118,8 @@ pip install -r requirements.txt ...@@ -118,8 +118,8 @@ pip install -r requirements.txt
## 指定训练配置参数 ## 指定训练配置参数
`config.py`脚本中包含训练配置和模型配置的参数设置, 示例代码如下: 通过 `config.py` 脚本修改训练和模型配置参数,脚本中有对可配置参数的详细解释,示例如下:
``` ```python
class TrainerConfig(object): class TrainerConfig(object):
# whether to use GPU for training # whether to use GPU for training
...@@ -140,8 +140,7 @@ class ModelConfig(object): ...@@ -140,8 +140,7 @@ class ModelConfig(object):
... ...
``` ```
用户可以对具体参数进行设置实现训练, 例如通过设置 `use_gpu` 参数来指定是否使用 GPU 修改 `config.py` 对参数进行调整。例如,通过修改 `use_gpu` 参数来指定是否使用 GPU 进行训练。
进行训练。
## 使用 PaddlePaddle 内置数据运行 ## 使用 PaddlePaddle 内置数据运行
...@@ -242,7 +241,11 @@ Options: ...@@ -242,7 +241,11 @@ Options:
修改`train.py`脚本中的启动参数,可以直接运行本例。 以`data`目录下的示例数据为例,在终端执行: 修改`train.py`脚本中的启动参数,可以直接运行本例。 以`data`目录下的示例数据为例,在终端执行:
```bash ```bash
python train.py --train_data_dir 'data/train_data' --test_data_dir 'data/test_data' --word_dict_path 'word_dict.txt' --label_dict_path 'label_dict.txt' python train.py \
--train_data_dir 'data/train_data' \
--test_data_dir 'data/test_data' \
--word_dict_path 'word_dict.txt' \
--label_dict_path 'label_dict.txt'
``` ```
即可对样例数据进行训练。 即可对样例数据进行训练。
...@@ -268,7 +271,11 @@ Options: ...@@ -268,7 +271,11 @@ Options:
2.以`data`目录下的示例数据为例,在终端执行: 2.以`data`目录下的示例数据为例,在终端执行:
```bash ```bash
python infer.py --data_path 'data/infer.txt' --word_dict_path 'word_dict.txt' --label_dict_path 'label_dict.txt' --model_path 'models/params_pass_00000.tar.gz' python infer.py \
--data_path 'data/infer.txt' \
--word_dict_path 'word_dict.txt' \
--label_dict_path 'label_dict.txt' \
--model_path 'models/params_pass_00000.tar.gz'
``` ```
即可对样例数据进行预测。 即可对样例数据进行预测。
......
...@@ -58,6 +58,7 @@ def infer(data_path, model_path, word_dict_path, batch_size, label_dict_path): ...@@ -58,6 +58,7 @@ def infer(data_path, model_path, word_dict_path, batch_size, label_dict_path):
word_reverse_dict = dict((value, key) word_reverse_dict = dict((value, key)
for key, value in word_dict.iteritems()) for key, value in word_dict.iteritems())
# The reversed label dict of the imdb dataset
label_reverse_dict = {0: "positive", 1: "negative"} label_reverse_dict = {0: "positive", 1: "negative"}
test_reader = reader.imdb_test(word_dict) test_reader = reader.imdb_test(word_dict)
class_num = 2 class_num = 2
...@@ -75,11 +76,12 @@ def infer(data_path, model_path, word_dict_path, batch_size, label_dict_path): ...@@ -75,11 +76,12 @@ def infer(data_path, model_path, word_dict_path, batch_size, label_dict_path):
test_reader = reader.infer_reader(data_path, word_dict)() test_reader = reader.infer_reader(data_path, word_dict)()
dict_dim = len(word_dict) dict_dim = len(word_dict)
prob_layer = nested_net(dict_dim, class_num, is_infer=True)
# initialize PaddlePaddle. # initialize PaddlePaddle.
paddle.init(use_gpu=False, trainer_count=1) paddle.init(use_gpu=False, trainer_count=1)
prob_layer = nested_net(dict_dim, class_num, is_infer=True)
# load the trained models. # load the trained models.
parameters = paddle.parameters.Parameters.from_tar( parameters = paddle.parameters.Parameters.from_tar(
gzip.open(model_path, "r")) gzip.open(model_path, "r"))
......
...@@ -4,10 +4,10 @@ from config import ModelConfig as conf ...@@ -4,10 +4,10 @@ from config import ModelConfig as conf
def cnn_cov_group(group_input, hidden_size): def cnn_cov_group(group_input, hidden_size):
""" """
Covolution group definition Convolution group definition.
:param group_input: The input of this layer. :param group_input: The input of this layer.
:type group_input: LayerOutput :type group_input: LayerOutput
:params hidden_size: Size of FC layer. :params hidden_size: The size of the fully connected layer.
:type hidden_size: int :type hidden_size: int
""" """
conv3 = paddle.networks.sequence_conv_pool( conv3 = paddle.networks.sequence_conv_pool(
...@@ -32,7 +32,7 @@ def nested_net(dict_dim, class_num, is_infer=False): ...@@ -32,7 +32,7 @@ def nested_net(dict_dim, class_num, is_infer=False):
:type dict_dim: int :type dict_dim: int
:params class_num: Number of instance class. :params class_num: Number of instance class.
:type class_num: int :type class_num: int
:params is_infer: The boolean parameter :params is_infer: The boolean parameter
indicating inferring or training. indicating inferring or training.
:type is_infer: bool :type is_infer: bool
""" """
......
...@@ -37,7 +37,7 @@ from config import TrainerConfig as conf ...@@ -37,7 +37,7 @@ from config import TrainerConfig as conf
"--label_dict_path", "--label_dict_path",
type=str, type=str,
default=None, default=None,
help=("The path of label dictionary (default: None)." help=("The path of label dictionary (default: None). "
"If this parameter is not set, imdb dataset will be used. " "If this parameter is not set, imdb dataset will be used. "
"If this parameter is set, but the file does not exist, " "If this parameter is set, but the file does not exist, "
"label dictionay will be built from " "label dictionay will be built from "
...@@ -50,16 +50,16 @@ from config import TrainerConfig as conf ...@@ -50,16 +50,16 @@ from config import TrainerConfig as conf
def train(train_data_dir, test_data_dir, word_dict_path, label_dict_path, def train(train_data_dir, test_data_dir, word_dict_path, label_dict_path,
model_save_dir): model_save_dir):
""" """
:params train_data_path: path of training data, if this parameter :params train_data_path: The path of training data, if this parameter
is not specified, imdb dataset will be used to run this example is not specified, imdb dataset will be used to run this example
:type train_data_path: str :type train_data_path: str
:params test_data_path: path of testing data, if this parameter :params test_data_path: The path of testing data, if this parameter
is not specified, imdb dataset will be used to run this example is not specified, imdb dataset will be used to run this example
:type test_data_path: str :type test_data_path: str
:params word_dict_path: path of word dictionary, if this parameter :params word_dict_path: The path of word dictionary, if this parameter
is not specified, imdb dataset will be used to run this example is not specified, imdb dataset will be used to run this example
:type word_dict_path: str :type word_dict_path: str
:params label_dict_path: path of label dictionary, if this parameter :params label_dict_path: The path of label dictionary, if this parameter
is not specified, imdb dataset will be used to run this example is not specified, imdb dataset will be used to run this example
:type label_dict_path: str :type label_dict_path: str
:params model_save_dir: dir where models saved :params model_save_dir: dir where models saved
......
...@@ -7,6 +7,18 @@ logger.setLevel(logging.INFO) ...@@ -7,6 +7,18 @@ logger.setLevel(logging.INFO)
def build_word_dict(data_dir, save_path, use_col=1, cutoff_fre=1): def build_word_dict(data_dir, save_path, use_col=1, cutoff_fre=1):
"""
Build word dictionary from training data.
:param data_dir: The directory of training dataset.
:type data_dir: str
:params save_path: The path where the word dictionary will be saved.
:type save_path: str
:params use_col: The index of text juring line split.
:type use_col: int
:params cutoff_fre: The word will not be added to dictionary if it's
frequency is less than cutoff_fre.
:type cutoff_fre: int
"""
values = defaultdict(int) values = defaultdict(int)
for file_name in os.listdir(data_dir): for file_name in os.listdir(data_dir):
...@@ -33,6 +45,15 @@ def build_word_dict(data_dir, save_path, use_col=1, cutoff_fre=1): ...@@ -33,6 +45,15 @@ def build_word_dict(data_dir, save_path, use_col=1, cutoff_fre=1):
def build_label_dict(data_dir, save_path, use_col=0): def build_label_dict(data_dir, save_path, use_col=0):
"""
Build label dictionary from training data.
:param data_dir: The directory of training dataset.
:type data_dir: str
:params save_path: The path where the label dictionary will be saved.
:type save_path: str
:params use_col: The index of label juring line split.
:type use_col: int
"""
values = defaultdict(int) values = defaultdict(int)
for file_name in os.listdir(data_dir): for file_name in os.listdir(data_dir):
...@@ -53,10 +74,22 @@ def build_label_dict(data_dir, save_path, use_col=0): ...@@ -53,10 +74,22 @@ def build_label_dict(data_dir, save_path, use_col=0):
def load_dict(dict_path): def load_dict(dict_path):
"""
Load word dictionary from dictionary path.
:param dict_path: The path of word dictionary.
:type data_dir: str
"""
return dict((line.strip().split("\t")[0], idx) return dict((line.strip().split("\t")[0], idx)
for idx, line in enumerate(open(dict_path, "r").readlines())) for idx, line in enumerate(open(dict_path, "r").readlines()))
def load_reverse_dict(dict_path): def load_reverse_dict(dict_path):
"""
Load the reversed word dictionary from dictionary path.
Index of each word is saved in key of the dictionary and the
corresponding word saved in value of the dictionary.
:param dict_path: The path of word dictionary.
:type data_dir: str
"""
return dict((idx, line.strip().split("\t")[0]) return dict((idx, line.strip().split("\t")[0])
for idx, line in enumerate(open(dict_path, "r").readlines())) for idx, line in enumerate(open(dict_path, "r").readlines()))
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册