未验证 提交 baaa6166 编写于 作者: Z zhengya01 提交者: GitHub

Merge pull request #8 from PaddlePaddle/develop

update
......@@ -13,6 +13,57 @@ PaddlePaddle 提供了丰富的计算单元,使得用户可以采用模块化
- [fluid模型](fluid): 使用 PaddlePaddle Fluid版本的 APIs,我们特别推荐您使用Fluid模型。
## PaddleCV
模型|简介|模型优势|参考论文
--|:--:|:--:|:--:
[AlexNet](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models)|图像分类经典模型|首次在CNN中成功的应用了ReLU、Dropout和LRN,并使用GPU进行运算加速|[ImageNet Classification with Deep Convolutional Neural Networks](https://www.researchgate.net/publication/267960550_ImageNet_Classification_with_Deep_Convolutional_Neural_Networks)
[VGG](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models)|图像分类经典模型|在AlexNet的基础上使用3*3小卷积核,增加网络深度,具有很好的泛化能力|[Very Deep ConvNets for Large-Scale Inage Recognition](https://arxiv.org/pdf/1409.1556.pdf)
[GoogleNet](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models)|图像分类经典模型|在不增加计算负载的前提下增加了网络的深度和宽度,性能更加优越|[Going deeper with convolutions](https://ieeexplore.ieee.org/document/7298594)
[ResNet](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models)|残差网络|引入了新的残差结构,解决了随着网络加深,准确率下降的问题|[Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)
[Inception-v4](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models)|图像分类经典模型|更加deeper和wider的inception结构|[Inception-ResNet and the Impact of Residual Connections on Learning](http://arxiv.org/abs/1602.07261)
[MobileNet](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models)|轻量级网络模型|为移动和嵌入式设备提出的高效模型|[MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861)
[DPN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models)|图像分类模型|结合了DenseNet和ResNeXt的网络结构,对图像分类效果有所提升|[Dual Path Networks](https://arxiv.org/abs/1707.01629)
[SE-ResNeXt](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models)|图像分类模型|ResNeXt中加入了SE block,提高了模型准确率|[Squeeze-and-excitation networks](https://arxiv.org/abs/1709.01507)
[SSD](https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleCV/object_detection/README_cn.md)|单阶段目标检测器|在不同尺度的特征图上检测对应尺度的目标,可以方便地插入到任何一种标准卷积网络中|[SSD: Single Shot MultiBox Detector](https://arxiv.org/abs/1512.02325)
[Face Detector: PyramidBox](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/face_detection/README_cn.md)|基于SSD的单阶段人脸检测器|利用上下文信息解决困难人脸的检测问题,网络表达能力高,鲁棒性强|[PyramidBox: A Context-assisted Single Shot Face Detector](https://arxiv.org/pdf/1803.07737.pdf)
[Faster RCNN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/faster_rcnn/README_cn.md)|典型的两阶段目标检测器|创造性地采用卷积网络自行产生建议框,并且和目标检测网络共享卷积网络,建议框数目减少,质量提高|[Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks](https://arxiv.org/abs/1506.01497)
[ICNet](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/icnet)|图像实时语义分割模型|即考虑了速度,也考虑了准确性,在高分辨率图像的准确性和低复杂度网络的效率之间获得平衡|[ICNet for Real-Time Semantic Segmentation on High-Resolution Images](https://arxiv.org/abs/1704.08545)
[DCGAN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/gan/c_gan)|图像生成模型|深度卷积生成对抗网络,将GAN和卷积网络结合起来,以解决GAN训练不稳定的问题|[Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks](https://arxiv.org/pdf/1511.06434.pdf)
[ConditionalGAN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/gan/c_gan)|图像生成模型|条件生成对抗网络,一种带条件约束的GAN,使用额外信息对模型增加条件,可以指导数据生成过程|[Conditional Generative Adversarial Nets](https://arxiv.org/abs/1411.1784)
[CycleGAN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/gan/cycle_gan)|图片转化模型|自动将某一类图片转换成另外一类图片,可用于风格迁移|[Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks](https://arxiv.org/abs/1703.10593)
[CRNN-CTC模型](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/ocr_recognition)|场景文字识别模型|使用CTC model识别图片中单行英文字符|[Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks](https://www.researchgate.net/publication/221346365_Connectionist_temporal_classification_Labelling_unsegmented_sequence_data_with_recurrent_neural_'networks)
[Attention模型](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/ocr_recognition)|场景文字识别模型|使用attention 识别图片中单行英文字符|[Recurrent Models of Visual Attention](https://arxiv.org/abs/1406.6247)
[Metric Learning](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/metric_learning)|度量学习模型|能够用于分析对象时间的关联、比较关系,可应用于辅助分类、聚类问题,也广泛用于图像检索、人脸识别等领域|-
[TSN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/video_classification)|视频分类模型|基于长范围时间结构建模,结合了稀疏时间采样策略和视频级监督来保证使用整段视频时学习得有效和高效|[Temporal Segment Networks: Towards Good Practices for Deep Action Recognition](https://arxiv.org/abs/1608.00859)
[caffe2fluid](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/caffe2fluid)|将Caffe模型转换为Paddle Fluid配置和模型文件工具|-|-
## PaddleNLP
模型|简介|模型优势|参考论文
--|:--:|:--:|:--:
[Transformer](https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleNLP/neural_machine_translation/transformer/README_cn.md)|机器翻译模型|基于self-attention,计算复杂度小,并行度高,容易学习长程依赖,翻译效果更好|[Attention Is All You Need](https://arxiv.org/abs/1706.03762)
[LAC](https://github.com/baidu/lac/blob/master/README.md)|联合的词法分析模型|能够整体性地完成中文分词、词性标注、专名识别任务|[Chinese Lexical Analysis with Deep Bi-GRU-CRF Network](https://arxiv.org/abs/1807.01882)
[Senta](https://github.com/baidu/Senta/blob/master/README.md)|情感倾向分析模型集|百度AI开放平台中情感倾向分析模型|-
[DAM](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleNLP/deep_attention_matching_net)|语义匹配模型|百度自然语言处理部发表于ACL-2018的工作,用于检索式聊天机器人多轮对话中应答的选择|[Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network](http://aclweb.org/anthology/P18-1103)
[SimNet](https://github.com/baidu/AnyQ/blob/master/tools/simnet/train/paddle/README.md)|语义匹配框架|使用SimNet构建出的模型可以便捷的加入AnyQ系统中,增强AnyQ系统的语义匹配能力|-
[DuReader](https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleNLP/machine_reading_comprehension/README.md)|阅读理解模型|百度MRC数据集上的机器阅读理解模型|-
[Bi-GRU-CRF](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleNLP/sequence_tagging_for_ner/README.md)|命名实体识别|结合了CRF和双向GRU的命名实体识别模型|-
## PaddleRec
模型|简介|模型优势|参考论文
--|:--:|:--:|:--:
[TagSpace](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleRec/tagspace)|文本及标签的embedding表示学习模型|应用于工业级的标签推荐,具体应用场景有feed新闻标签推荐等|[#TagSpace: Semantic embeddings from hashtags](https://www.bibsonomy.org/bibtex/0ed4314916f8e7c90d066db45c293462)
[GRU4Rec](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleRec/gru4rec)|个性化推荐模型|首次将RNN(GRU)运用于session-based推荐,相比传统的KNN和矩阵分解,效果有明显的提升|[Session-based Recommendations with Recurrent Neural Networks](https://arxiv.org/abs/1511.06939)
[SSR](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleRec/ssr)|序列语义检索推荐模型|使用参考论文中的思想,使用多种时间粒度进行用户行为预测|[Multi-Rate Deep Learning for Temporal Recommendation](https://dl.acm.org/citation.cfm?id=2914726)
[DeepCTR](https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleRec/ctr/README.cn.md)|点击率预估模型|只实现了DeepFM论文中介绍的模型的DNN部分,DeepFM会在其他例子中给出|[DeepFM: A Factorization-Machine based Neural Network for CTR Prediction](https://arxiv.org/abs/1703.04247)
[Multiview-Simnet](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleRec/multiview_simnet)|个性化推荐模型|基于多元视图,将用户和项目的多个功能视图合并为一个统一模型|[A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems](http://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/frp1159-songA.pdf)
## Other Models
模型|简介|模型优势|参考论文
--|:--:|:--:|:--:
[DeepASR](https://github.com/PaddlePaddle/models/blob/develop/fluid/DeepASR/README_cn.md)|语音识别系统|利用Fluid框架完成语音识别中声学模型的配置和训练,并集成 Kaldi 的解码器|-
[DQN](https://github.com/PaddlePaddle/models/blob/develop/fluid/DeepQNetwork/README_cn.md)|深度Q网络|value based强化学习算法,第一个成功地将深度学习和强化学习结合起来的模型|[Human-level control through deep reinforcement learning](https://www.nature.com/articles/nature14236)
[DoubleDQN](https://github.com/PaddlePaddle/models/blob/develop/fluid/DeepQNetwork/README_cn.md)|DQN的变体|将Double Q的想法应用在DQN上,解决过优化问题|[Font Size: Deep Reinforcement Learning with Double Q-Learning](https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/viewPaper/12389)
[DuelingDQN](https://github.com/PaddlePaddle/models/blob/develop/fluid/DeepQNetwork/README_cn.md)|DQN的变体|改进了DQN模型,提高了模型的性能|[Dueling Network Architectures for Deep Reinforcement Learning](http://proceedings.mlr.press/v48/wangf16.html)
## License
This tutorial is contributed by [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) and licensed under the [Apache-2.0 license](LICENSE).
......
......@@ -440,7 +440,8 @@ class Network(object):
if need_transpose:
order = range(dims)
order.remove(axis).append(axis)
order.remove(axis)
order.append(axis)
input = fluid.layers.transpose(
input,
perm=order,
......@@ -525,11 +526,21 @@ class Network(object):
scale_shape = input.shape[axis:axis + num_axes]
param_attr = fluid.ParamAttr(name=prefix + 'scale')
scale_param = fluid.layers.create_parameter(
shape=scale_shape, dtype=input.dtype, name=name, attr=param_attr)
shape=scale_shape,
dtype=input.dtype,
name=name,
attr=param_attr,
is_bias=True,
default_initializer=fluid.initializer.Constant(value=1.0))
offset_attr = fluid.ParamAttr(name=prefix + 'offset')
offset_param = fluid.layers.create_parameter(
shape=scale_shape, dtype=input.dtype, name=name, attr=offset_attr)
shape=scale_shape,
dtype=input.dtype,
name=name,
attr=offset_attr,
is_bias=True,
default_initializer=fluid.initializer.Constant(value=0.0))
output = fluid.layers.elementwise_mul(
input,
......
......@@ -76,7 +76,7 @@ python ./train.py \
--train_crop_size=769 \
--total_step=90000 \
--init_weights_path=deeplabv3plus_xception65_initialize.params \
--save_weights_path=output \
--save_weights_path=output/ \
--dataset_path=$DATASET_PATH
```
......
......@@ -99,7 +99,7 @@ python -u train.py --batch_size=16 --pretrained_model=vgg_ilsvrc_16_fc_reduced
模型训练所采用的数据增强:
**数据增强**:数据的读取行为定义在 `reader.py` 中,所有的图片都会被缩放到640x640。在训练时还会对图片进行数据增强,包括随机扰动、翻转、裁剪等,和[物体检测SSD算法](https://github.com/PaddlePaddle/models/blob/develop/fluid/object_detection/README_cn.md#%E8%AE%AD%E7%BB%83-pascal-voc-%E6%95%B0%E6%8D%AE%E9%9B%86)中数据增强类似,除此之外,增加了上面提到的Data-anchor-sampling:
**数据增强**:数据的读取行为定义在 `reader.py` 中,所有的图片都会被缩放到640x640。在训练时还会对图片进行数据增强,包括随机扰动、翻转、裁剪等,和[物体检测SSD算法](https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleCV/object_detection/README.md)中数据增强类似,除此之外,增加了上面提到的Data-anchor-sampling:
**尺度变换(Data-anchor-sampling)**:随机将图片尺度变换到一定范围的尺度,大大增强人脸的尺度变化。具体操作为根据随机选择的人脸高(height)和宽(width),得到$v=\\sqrt{width * height}$,判断$v$的值位于缩放区间$[16,32,64,128,256,512]$中的的哪一个。假设$v=45$,则选定$32<v<64$,以均匀分布的概率选取$[16,32,64]$中的任意一个值。若选中$64$,则该人脸的缩放区间在 $[64 / 2,min(v * 2, 64 * 2)]$中选定。
......
......@@ -38,18 +38,6 @@ Train the model on [MS-COCO dataset](http://cocodataset.org/#download), download
## Training
After data preparation, one can start the training step by:
python train.py \
--model_save_dir=output/ \
--pretrained_model=${path_to_pretrain_model}
--data_dir=${path_to_data}
- Set ```export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7``` to specifiy 8 GPU to train.
- For more help on arguments:
python train.py --help
**download the pre-trained model:** This sample provides Resnet-50 pre-trained model which is converted from Caffe. The model fuses the parameters in batch normalization layer. One can download pre-trained model as:
sh ./pretrained/download.sh
......@@ -72,6 +60,18 @@ To train the model, [cocoapi](https://github.com/cocodataset/cocoapi) is needed.
# not to install the COCO API into global site-packages
python2 setup.py install --user
After data preparation, one can start the training step by:
python train.py \
--model_save_dir=output/ \
--pretrained_model=${path_to_pretrain_model}
--data_dir=${path_to_data}
- Set ```export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7``` to specifiy 8 GPU to train.
- For more help on arguments:
python train.py --help
**data reader introduction:**
* Data reader is defined in `reader.py`.
......@@ -128,7 +128,7 @@ Inference is used to get prediction score or image features based on trained mod
python infer.py \
--dataset=coco2017 \
--pretrained_model=${path_to_pretrain_model} \
--image_path=data/COCO17/val2017/ \
--image_path=dataset/coco/val2017/ \
--image_name=000000000139.jpg \
--draw_threshold=0.6
......
......@@ -37,18 +37,6 @@ Faster RCNN 目标检测模型
## 模型训练
数据准备完毕后,可以通过如下的方式启动训练:
python train.py \
--model_save_dir=output/ \
--pretrained_model=${path_to_pretrain_model}
--data_dir=${path_to_data}
- 通过设置export CUDA\_VISIBLE\_DEVICES=0,1,2,3,4,5,6,7指定8卡GPU训练。
- 可选参数见:
python train.py --help
**下载预训练模型:** 本示例提供Resnet-50预训练模型,该模性转换自Caffe,并对批标准化层(Batch Normalization Layer)进行参数融合。采用如下命令下载预训练模型:
sh ./pretrained/download.sh
......@@ -71,6 +59,18 @@ Faster RCNN 目标检测模型
# not to install the COCO API into global site-packages
python2 setup.py install --user
数据准备完毕后,可以通过如下的方式启动训练:
python train.py \
--model_save_dir=output/ \
--pretrained_model=${path_to_pretrain_model}
--data_dir=${path_to_data}
- 通过设置export CUDA\_VISIBLE\_DEVICES=0,1,2,3,4,5,6,7指定8卡GPU训练。
- 可选参数见:
python train.py --help
**数据读取器说明:** 数据读取器定义在reader.py中。所有图像将短边等比例缩放至`scales`,若长边大于`max_size`, 则再次将长边等比例缩放至`max_size`。在训练阶段,对图像采用水平翻转。支持将同一个batch内的图像padding为相同尺寸。
**模型设置:**
......@@ -124,7 +124,7 @@ Faster RCNN 目标检测模型
python infer.py \
--dataset=coco2017 \
--pretrained_model=${path_to_pretrain_model} \
--image_path=data/COCO17/val2017/ \
--image_path=dataset/coco/val2017/ \
--image_name=000000000139.jpg \
--draw_threshold=0.6
......
......@@ -28,6 +28,7 @@ from __future__ import unicode_literals
import cv2
import numpy as np
from config import cfg
import os
def get_image_blob(roidb, mode):
......@@ -43,8 +44,11 @@ def get_image_blob(roidb, mode):
target_size = cfg.TEST.scales[0]
max_size = cfg.TEST.max_size
im = cv2.imread(roidb['image'])
assert im is not None, \
'Failed to read image \'{}\''.format(roidb['image'])
try:
assert im is not None
except AssertionError as e:
print('Failed to read image \'{}\''.format(roidb['image']))
os._exit(0)
if roidb['flipped']:
im = im[:, ::-1, :]
im, im_scale = prep_im_for_blob(im, cfg.pixel_means, target_size, max_size)
......
......@@ -98,7 +98,7 @@ def parse_args():
add_arg('pretrained_model', str, 'imagenet_resnet50_fusebn', "The init model path.")
add_arg('dataset', str, 'coco2017', "coco2014, coco2017.")
add_arg('class_num', int, 81, "Class number.")
add_arg('data_dir', str, 'data/COCO17', "The data root path.")
add_arg('data_dir', str, 'dataset/coco', "The data root path.")
add_arg('use_pyreader', bool, True, "Use pyreader.")
add_arg('use_profile', bool, False, "Whether use profiler.")
add_arg('padding_minibatch',bool, False,
......@@ -127,7 +127,7 @@ def parse_args():
add_arg('debug', bool, False, "Debug mode")
# SINGLE EVAL AND DRAW
add_arg('draw_threshold', float, 0.8, "Confidence threshold to draw bbox.")
add_arg('image_path', str, 'data/COCO17/val2017', "The image path used to inference and visualize.")
add_arg('image_path', str, 'dataset/coco/val2017', "The image path used to inference and visualize.")
add_arg('image_name', str, '', "The single image used to inference and visualize.")
# ce
parser.add_argument(
......
......@@ -3,7 +3,7 @@
# This file is only used for continuous evaluation.
export FLAGS_cudnn_deterministic=True
export ce_mode=1
(CUDA_VISIBLE_DEVICES=6 python c_gan.py --batch_size=121 --epoch=1 --run_ce=True --use_gpu=True & \
CUDA_VISIBLE_DEVICES=7 python dc_gan.py --batch_size=121 --epoch=1 --run_ce=True --use_gpu=True) | python _ce.py
(CUDA_VISIBLE_DEVICES=2 python c_gan.py --batch_size=121 --epoch=1 --run_ce=True --use_gpu=True & \
CUDA_VISIBLE_DEVICES=3 python dc_gan.py --batch_size=121 --epoch=1 --run_ce=True --use_gpu=True) | python _ce.py
......@@ -6,6 +6,7 @@ Image classification, which is an important field of computer vision, is to clas
- [Installation](#installation)
- [Data preparation](#data-preparation)
- [Training a model with flexible parameters](#training-a-model)
- [Using Mixed-Precision Training](#using-mixed-precision-training)
- [Finetuning](#finetuning)
- [Evaluation](#evaluation)
- [Inference](#inference)
......@@ -112,6 +113,13 @@ The error rate curves of AlexNet, ResNet50 and SE-ResNeXt-50 are shown in the fi
Training and validation Curves
</p>
## Using Mixed-Precision Training
You may add `--fp16 1` to start train using mixed precisioin training, which the training process will use float16 and the output model ("master" parameters) is saved as float32. You also may need to pass `--scale_loss` to overcome accuracy issues, usually `--scale_loss 8.0` will do.
Note that currently `--fp16` can not use together with `--with_mem_opt`, so pass `--with_mem_opt 0` to disable memory optimization pass.
## Finetuning
Finetuning is to finetune model weights in a specific task by loading pretrained weights. After initializing ```path_to_pretrain_model```, one can finetune a model as:
......
......@@ -109,6 +109,11 @@ End pass 9, train_loss 3.3745200634, train_acc1 0.303871691227, train_acc5 0.545
训练集合与验证集合上的错误率曲线
</p>
## 混合精度训练
可以通过开启`--fp16 1`启动混合精度训练,这样训练过程会使用float16数据,并输出float32的模型参数("master"参数)。您可能需要同时传入`--scale_loss`来解决fp16训练的精度问题,通常传入`--scale_loss 8.0`即可。
注意,目前混合精度训练不能和内存优化功能同时使用,所以需要传`--with_mem_opt 0`这个参数来禁用内存优化功能。
## 参数微调
......
......@@ -49,7 +49,7 @@ def eval(args):
# model definition
model = models.__dict__[model_name]()
if model_name is "GoogleNet":
if model_name == "GoogleNet":
out0, out1, out2 = model.net(input=image, class_dim=class_dim)
cost0 = fluid.layers.cross_entropy(input=out0, label=label)
cost1 = fluid.layers.cross_entropy(input=out1, label=label)
......@@ -71,8 +71,10 @@ def eval(args):
test_program = fluid.default_main_program().clone(for_test=True)
fetch_list = [avg_cost.name, acc_top1.name, acc_top5.name]
if with_memory_optimization:
fluid.memory_optimize(fluid.default_main_program())
fluid.memory_optimize(
fluid.default_main_program(), skip_opt_set=set(fetch_list))
place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
......@@ -88,8 +90,6 @@ def eval(args):
val_reader = paddle.batch(reader.val(), batch_size=args.batch_size)
feeder = fluid.DataFeeder(place=place, feed_list=[image, label])
fetch_list = [avg_cost.name, acc_top1.name, acc_top5.name]
test_info = [[], [], []]
cnt = 0
for batch_id, data in enumerate(val_reader()):
......
......@@ -11,7 +11,6 @@ import models
import reader
import argparse
import functools
from models.learning_rate import cosine_decay
from utility import add_arguments, print_arguments
import math
......@@ -44,7 +43,6 @@ def infer(args):
# model definition
model = models.__dict__[model_name]()
if model_name is "GoogleNet":
out, _, _ = model.net(input=image, class_dim=class_dim)
else:
......@@ -52,8 +50,10 @@ def infer(args):
test_program = fluid.default_main_program().clone(for_test=True)
fetch_list = [out.name]
if with_memory_optimization:
fluid.memory_optimize(fluid.default_main_program())
fluid.memory_optimize(
fluid.default_main_program(), skip_opt_set=set(fetch_list))
place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
......@@ -70,8 +70,6 @@ def infer(args):
test_reader = paddle.batch(reader.test(), batch_size=test_batch_size)
feeder = fluid.DataFeeder(place=place, feed_list=[image])
fetch_list = [out.name]
TOPK = 1
for batch_id, data in enumerate(test_reader()):
result = exe.run(test_program,
......
......@@ -142,7 +142,6 @@ class AlexNet():
out = fluid.layers.fc(
input=fc7,
size=class_dim,
act='softmax',
bias_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv)),
param_attr=fluid.param_attr.ParamAttr(
......
......@@ -94,7 +94,6 @@ class DPN(object):
initializer=fluid.initializer.Uniform(-stdv, stdv))
fc6 = fluid.layers.fc(input=pool5,
size=class_dim,
act='softmax',
param_attr=param_attr)
return fc6
......
......@@ -47,7 +47,6 @@ class InceptionV4():
out = fluid.layers.fc(
input=drop,
size=class_dim,
act='softmax',
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv)))
return out
......
......@@ -120,7 +120,6 @@ class MobileNet():
output = fluid.layers.fc(input=input,
size=class_dim,
act='softmax',
param_attr=ParamAttr(initializer=MSRA()))
return output
......
......@@ -73,7 +73,6 @@ class MobileNetV2():
output = fluid.layers.fc(input=input,
size=class_dim,
act='softmax',
param_attr=ParamAttr(initializer=MSRA()))
return output
......
......@@ -60,7 +60,6 @@ class ResNet():
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
out = fluid.layers.fc(input=pool,
size=class_dim,
act='softmax',
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv,
stdv)))
......
......@@ -62,7 +62,6 @@ class DistResNet():
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
out = fluid.layers.fc(input=pool,
size=class_dim,
act='softmax',
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv,
stdv),
......
......@@ -110,7 +110,6 @@ class SE_ResNeXt():
stdv = 1.0 / math.sqrt(drop.shape[1] * 1.0)
out = fluid.layers.fc(input=drop,
size=class_dim,
act='softmax',
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv,
stdv)))
......
......@@ -93,7 +93,6 @@ class ShuffleNetV2():
output = fluid.layers.fc(input=pool_last,
size=class_dim,
act='softmax',
param_attr=ParamAttr(initializer=MSRA()))
return output
......
......@@ -64,7 +64,6 @@ class VGGNet():
out = fluid.layers.fc(
input=fc2,
size=class_dim,
act='softmax',
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Normal(scale=0.005)),
bias_attr=fluid.param_attr.ParamAttr(
......
......@@ -159,7 +159,6 @@ class AlexNet():
out = fluid.layers.fc(
input=fc7,
size=class_dim,
act='softmax',
bias_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv),
name=layer_name[7] + "_offset"),
......
......@@ -122,7 +122,6 @@ class DPN(object):
initializer=fluid.initializer.Uniform(-stdv, stdv))
fc6 = fluid.layers.fc(input=pool5,
size=class_dim,
act='softmax',
param_attr=param_attr,
name="fc6")
......
......@@ -48,7 +48,6 @@ class InceptionV4():
out = fluid.layers.fc(
input=drop,
size=class_dim,
act='softmax',
param_attr=ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv),
name="final_fc_weights"),
......
......@@ -130,7 +130,6 @@ class MobileNet():
output = fluid.layers.fc(input=input,
size=class_dim,
act='softmax',
param_attr=ParamAttr(
initializer=MSRA(), name="fc7_weights"),
bias_attr=ParamAttr(name="fc7_offset"))
......
......@@ -80,7 +80,6 @@ class MobileNetV2():
output = fluid.layers.fc(input=input,
size=class_dim,
act='softmax',
param_attr=ParamAttr(name='fc10_weights'),
bias_attr=ParamAttr(name='fc10_offset'))
return output
......
......@@ -74,7 +74,6 @@ class ResNet():
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
out = fluid.layers.fc(input=pool,
size=class_dim,
act='softmax',
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv,
stdv)))
......
......@@ -123,7 +123,6 @@ class SE_ResNeXt():
out = fluid.layers.fc(
input=drop,
size=class_dim,
act='softmax',
param_attr=ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv),
name='fc6_weights'),
......
......@@ -97,7 +97,6 @@ class ShuffleNetV2():
output = fluid.layers.fc(input=pool_last,
size=class_dim,
act='softmax',
param_attr=ParamAttr(
initializer=MSRA(), name='fc6_weights'),
bias_attr=ParamAttr(name='fc6_offset'))
......
......@@ -61,7 +61,6 @@ class VGGNet():
out = fluid.layers.fc(
input=fc2,
size=class_dim,
act='softmax',
param_attr=fluid.param_attr.ParamAttr(name=fc_name[2] + "_weights"),
bias_attr=fluid.param_attr.ParamAttr(name=fc_name[2] + "_offset"))
......
......@@ -17,6 +17,7 @@ import functools
import subprocess
import utils
from utils.learning_rate import cosine_decay
from utils.fp16_utils import create_master_params_grads, master_param_to_train_param
from utility import add_arguments, print_arguments
import models
import models_name
......@@ -40,7 +41,9 @@ add_arg('model', str, "SE_ResNeXt50_32x4d", "Set the network to use
add_arg('enable_ce', bool, False, "If set True, enable continuous evaluation job.")
add_arg('data_dir', str, "./data/ILSVRC2012", "The ImageNet dataset root dir.")
add_arg('model_category', str, "models", "Whether to use models_name or not, valid value:'models','models_name'" )
# yapf: enabl
add_arg('fp16', bool, False, "Enable half precision training with fp16." )
add_arg('scale_loss', float, 1.0, "Scale loss for fp16." )
# yapf: enable
def set_models(model):
......@@ -145,12 +148,15 @@ def net_config(image, label, model, args):
acc_top1 = fluid.layers.accuracy(input=out0, label=label, k=1)
acc_top5 = fluid.layers.accuracy(input=out0, label=label, k=5)
else:
out = model.net(input=image, class_dim=class_dim)
cost = fluid.layers.cross_entropy(input=out, label=label)
out = model.net(input=image, class_dim=class_dim)
cost, pred = fluid.layers.softmax_with_cross_entropy(out, label, return_softmax=True)
if args.scale_loss > 1:
avg_cost = fluid.layers.mean(x=cost) * float(args.scale_loss)
else:
avg_cost = fluid.layers.mean(x=cost)
avg_cost = fluid.layers.mean(x=cost)
acc_top1 = fluid.layers.accuracy(input=out, label=label, k=1)
acc_top5 = fluid.layers.accuracy(input=out, label=label, k=5)
acc_top1 = fluid.layers.accuracy(input=pred, label=label, k=1)
acc_top5 = fluid.layers.accuracy(input=pred, label=label, k=5)
return avg_cost, acc_top1, acc_top5
......@@ -171,6 +177,8 @@ def build_program(is_train, main_prog, startup_prog, args):
use_double_buffer=True)
with fluid.unique_name.guard():
image, label = fluid.layers.read_file(py_reader)
if args.fp16:
image = fluid.layers.cast(image, "float16")
avg_cost, acc_top1, acc_top5 = net_config(image, label, model, args)
avg_cost.persistable = True
acc_top1.persistable = True
......@@ -184,7 +192,15 @@ def build_program(is_train, main_prog, startup_prog, args):
params["learning_strategy"]["name"] = args.lr_strategy
optimizer = optimizer_setting(params)
optimizer.minimize(avg_cost)
if args.fp16:
params_grads = optimizer.backward(avg_cost)
master_params_grads = create_master_params_grads(
params_grads, main_prog, startup_prog, args.scale_loss)
optimizer.apply_gradients(master_params_grads)
master_param_to_train_param(master_params_grads, params_grads, main_prog)
else:
optimizer.minimize(avg_cost)
return py_reader, avg_cost, acc_top1, acc_top5
......
from .learning_rate import cosine_decay, lr_warmup
from .fp16_utils import create_master_params_grads, master_param_to_train_param
from __future__ import print_function
import paddle
import paddle.fluid as fluid
def cast_fp16_to_fp32(i, o, prog):
prog.global_block().append_op(
type="cast",
inputs={"X": i},
outputs={"Out": o},
attrs={
"in_dtype": fluid.core.VarDesc.VarType.FP16,
"out_dtype": fluid.core.VarDesc.VarType.FP32
}
)
def cast_fp32_to_fp16(i, o, prog):
prog.global_block().append_op(
type="cast",
inputs={"X": i},
outputs={"Out": o},
attrs={
"in_dtype": fluid.core.VarDesc.VarType.FP32,
"out_dtype": fluid.core.VarDesc.VarType.FP16
}
)
def copy_to_master_param(p, block):
v = block.vars.get(p.name, None)
if v is None:
raise ValueError("no param name %s found!" % p.name)
new_p = fluid.framework.Parameter(
block=block,
shape=v.shape,
dtype=fluid.core.VarDesc.VarType.FP32,
type=v.type,
lod_level=v.lod_level,
stop_gradient=p.stop_gradient,
trainable=p.trainable,
optimize_attr=p.optimize_attr,
regularizer=p.regularizer,
gradient_clip_attr=p.gradient_clip_attr,
error_clip=p.error_clip,
name=v.name + ".master")
return new_p
def create_master_params_grads(params_grads, main_prog, startup_prog, scale_loss):
master_params_grads = []
tmp_role = main_prog._current_role
OpRole = fluid.core.op_proto_and_checker_maker.OpRole
main_prog._current_role = OpRole.Backward
for p, g in params_grads:
# create master parameters
master_param = copy_to_master_param(p, main_prog.global_block())
startup_master_param = startup_prog.global_block()._clone_variable(master_param)
startup_p = startup_prog.global_block().var(p.name)
cast_fp16_to_fp32(startup_p, startup_master_param, startup_prog)
# cast fp16 gradients to fp32 before apply gradients
if g.name.startswith("batch_norm"):
if scale_loss > 1:
scaled_g = g / float(scale_loss)
else:
scaled_g = g
master_params_grads.append([p, scaled_g])
continue
master_grad = fluid.layers.cast(g, "float32")
if scale_loss > 1:
master_grad = master_grad / float(scale_loss)
master_params_grads.append([master_param, master_grad])
main_prog._current_role = tmp_role
return master_params_grads
def master_param_to_train_param(master_params_grads, params_grads, main_prog):
for idx, m_p_g in enumerate(master_params_grads):
train_p, _ = params_grads[idx]
if train_p.name.startswith("batch_norm"):
continue
with main_prog._optimized_guard([m_p_g[0], m_p_g[1]]):
cast_fp32_to_fp16(m_p_g[0], train_p, main_prog)
......@@ -248,8 +248,9 @@ def train(args):
print("device count %d" % dev_count)
print("theoretical memory usage: ")
print(fluid.contrib.memory_usage(
program=train_program, batch_size=args.batch_size))
print(
fluid.contrib.memory_usage(
program=train_program, batch_size=args.batch_size))
exe = fluid.Executor(place)
exe.run(train_startup)
......@@ -318,8 +319,9 @@ def train(args):
if (args.save_path is not None) and (step % save_step == 0):
save_path = os.path.join(args.save_path, "step_" + str(step))
print("Save model at step %d ... " % step)
print(time.strftime('%Y-%m-%d %H:%M:%S',
time.localtime(time.time())))
print(
time.strftime('%Y-%m-%d %H:%M:%S',
time.localtime(time.time())))
fluid.io.save_persistables(exe, save_path, train_program)
score_path = os.path.join(args.save_path, 'score.' + str(step))
......@@ -358,8 +360,9 @@ def train(args):
save_path = os.path.join(args.save_path,
"step_" + str(step))
print("Save model at step %d ... " % step)
print(time.strftime('%Y-%m-%d %H:%M:%S',
time.localtime(time.time())))
print(
time.strftime('%Y-%m-%d %H:%M:%S',
time.localtime(time.time())))
fluid.io.save_persistables(exe, save_path, train_program)
score_path = os.path.join(args.save_path,
......@@ -389,9 +392,11 @@ def train(args):
global_step, last_cost = train_with_pyreader(global_step)
else:
global_step, last_cost = train_with_feed(global_step)
train_time += time.time() - begin_time
pass_time_cost = time.time() - begin_time
train_time += pass_time_cost
print("Pass {0}, pass_time_cost {1}"
.format(epoch, "%2.2f sec" % (time.time() - begin_time)))
.format(epoch, "%2.2f sec" % pass_time_cost))
# For internal continuous evaluation
if "CE_MODE_X" in os.environ:
print("kpis train_cost %f" % last_cost)
......
......@@ -14,7 +14,7 @@
## 简介,模型详解
在PaddlePaddle v2版本[文本分类](https://github.com/PaddlePaddle/models/blob/develop/text/README.md)中对于文本分类任务有较详细的介绍,在本例中不再重复介绍。
在PaddlePaddle v2版本[文本分类](https://github.com/PaddlePaddle/models/blob/develop/legacy/text_classification/README.md)中对于文本分类任务有较详细的介绍,在本例中不再重复介绍。
在模型上,我们采用了bow, cnn, lstm, gru四种常见的文本分类模型。
## 训练
......
# 文本分类
以下是本例的简要目录结构及说明:
```text
.
|-- README.md # README
|-- data_generator # IMDB数据集生成工具
| |-- IMDB.py # 在data_generator.py基础上扩展IMDB数据集处理逻辑
| |-- build_raw_data.py # IMDB数据预处理,其产出被splitfile.py读取。格式:word word ... | label
| |-- data_generator.py # 与AsyncExecutor配套的数据生成工具框架
| `-- splitfile.py # 将build_raw_data.py生成的文件切分,其产出被IMDB.py读取
|-- data_generator.sh # IMDB数据集生成工具入口
|-- data_reader.py # 预测脚本使用的数据读取工具
|-- infer.py # 预测脚本
`-- train.py # 训练脚本
```
## 简介
本目录包含用fluid.AsyncExecutor训练文本分类任务的脚本。网络模型定义沿用自父目录nets.py
## 训练
1. 运行命令 `sh data_generator.sh`,下载IMDB数据集,并转化成适合AsyncExecutor读取的训练数据
2. 运行命令 `python train.py bow` 开始训练模型。
```python
python train.py bow # bow指定网络结构,可替换成cnn, lstm, gru
```
3. (可选)想自定义网络结构,需在[nets.py](../nets.py)中自行添加,并设置[train.py](./train.py)中的相应参数。
```python
def train(train_reader, # 训练数据
word_dict, # 数据字典
network, # 模型配置
use_cuda, # 是否用GPU
parallel, # 是否并行
save_dirname, # 保存模型路径
lr=0.2, # 学习率大小
batch_size=128, # 每个batch的样本数
pass_num=30): # 训练的轮数
```
## 训练结果示例
```text
pass_id: 0 pass_time_cost 4.723438
pass_id: 1 pass_time_cost 3.867186
pass_id: 2 pass_time_cost 4.490111
pass_id: 3 pass_time_cost 4.573296
pass_id: 4 pass_time_cost 4.180547
pass_id: 5 pass_time_cost 4.214476
pass_id: 6 pass_time_cost 4.520387
pass_id: 7 pass_time_cost 4.149485
pass_id: 8 pass_time_cost 3.821354
pass_id: 9 pass_time_cost 5.136178
pass_id: 10 pass_time_cost 4.137318
pass_id: 11 pass_time_cost 3.943429
pass_id: 12 pass_time_cost 3.766478
pass_id: 13 pass_time_cost 4.235983
pass_id: 14 pass_time_cost 4.796462
pass_id: 15 pass_time_cost 4.668116
pass_id: 16 pass_time_cost 4.373798
pass_id: 17 pass_time_cost 4.298131
pass_id: 18 pass_time_cost 4.260021
pass_id: 19 pass_time_cost 4.244411
pass_id: 20 pass_time_cost 3.705138
pass_id: 21 pass_time_cost 3.728070
pass_id: 22 pass_time_cost 3.817919
pass_id: 23 pass_time_cost 4.698598
pass_id: 24 pass_time_cost 4.859262
pass_id: 25 pass_time_cost 5.725732
pass_id: 26 pass_time_cost 5.102599
pass_id: 27 pass_time_cost 3.876582
pass_id: 28 pass_time_cost 4.762538
pass_id: 29 pass_time_cost 3.797759
```
与fluid.Executor不同,AsyncExecutor在每个pass结束不会将accuracy打印出来。为了观察训练过程,可以将fluid.AsyncExecutor.run()方法的Debug参数设为True,这样每个pass结束会把参数指定的fetch variable打印出来:
```
async_executor.run(
main_program,
dataset,
filelist,
thread_num,
[acc],
debug=True)
```
## 预测
1. 运行命令 `python infer.py bow_model`, 开始预测。
```python
python infer.py bow_model # bow_model指定需要导入的模型
```
## 预测结果示例
```text
model_path: bow_model/epoch0.model, avg_acc: 0.882600
model_path: bow_model/epoch1.model, avg_acc: 0.887920
model_path: bow_model/epoch2.model, avg_acc: 0.886920
model_path: bow_model/epoch3.model, avg_acc: 0.884720
model_path: bow_model/epoch4.model, avg_acc: 0.879760
model_path: bow_model/epoch5.model, avg_acc: 0.876920
model_path: bow_model/epoch6.model, avg_acc: 0.874160
model_path: bow_model/epoch7.model, avg_acc: 0.872000
model_path: bow_model/epoch8.model, avg_acc: 0.870360
model_path: bow_model/epoch9.model, avg_acc: 0.868480
model_path: bow_model/epoch10.model, avg_acc: 0.867240
model_path: bow_model/epoch11.model, avg_acc: 0.866200
model_path: bow_model/epoch12.model, avg_acc: 0.865560
model_path: bow_model/epoch13.model, avg_acc: 0.865160
model_path: bow_model/epoch14.model, avg_acc: 0.864480
model_path: bow_model/epoch15.model, avg_acc: 0.864240
model_path: bow_model/epoch16.model, avg_acc: 0.863800
model_path: bow_model/epoch17.model, avg_acc: 0.863520
model_path: bow_model/epoch18.model, avg_acc: 0.862760
model_path: bow_model/epoch19.model, avg_acc: 0.862680
model_path: bow_model/epoch20.model, avg_acc: 0.862240
model_path: bow_model/epoch21.model, avg_acc: 0.862280
model_path: bow_model/epoch22.model, avg_acc: 0.862080
model_path: bow_model/epoch23.model, avg_acc: 0.861560
model_path: bow_model/epoch24.model, avg_acc: 0.861280
model_path: bow_model/epoch25.model, avg_acc: 0.861160
model_path: bow_model/epoch26.model, avg_acc: 0.861080
model_path: bow_model/epoch27.model, avg_acc: 0.860920
model_path: bow_model/epoch28.model, avg_acc: 0.860800
model_path: bow_model/epoch29.model, avg_acc: 0.860760
```
注:过拟合导致acc持续下降,请忽略
#!/usr/bin/env bash
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
pushd .
cd ./data_generator
# wget "http://ai.stanford.edu/%7Eamaas/data/sentiment/aclImdb_v1.tar.gz"
if [ ! -f aclImdb_v1.tar.gz ]; then
wget "http://10.64.74.104:8080/paddle/dataset/imdb/aclImdb_v1.tar.gz"
fi
tar zxvf aclImdb_v1.tar.gz
mkdir train_data
python build_raw_data.py train | python splitfile.py 12 train_data
mkdir test_data
python build_raw_data.py test | python splitfile.py 12 test_data
/opt/python27/bin/python IMDB.py train_data
/opt/python27/bin/python IMDB.py test_data
mv ./output_dataset/train_data ../
mv ./output_dataset/test_data ../
cp aclImdb/imdb.vocab ../
rm -rf ./output_dataset
rm -rf train_data
rm -rf test_data
rm -rf aclImdb
popd
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import re
import os, sys
sys.path.append(os.path.abspath(os.path.join('..')))
from data_generator import MultiSlotDataGenerator
class IMDbDataGenerator(MultiSlotDataGenerator):
def load_resource(self, dictfile):
self._vocab = {}
wid = 0
with open(dictfile) as f:
for line in f:
self._vocab[line.strip()] = wid
wid += 1
self._unk_id = len(self._vocab)
self._pattern = re.compile(r'(;|,|\.|\?|!|\s|\(|\))')
def process(self, line):
send = '|'.join(line.split('|')[:-1]).lower().replace("<br />",
" ").strip()
label = [int(line.split('|')[-1])]
words = [x for x in self._pattern.split(send) if x and x != " "]
feas = [
self._vocab[x] if x in self._vocab else self._unk_id for x in words
]
return ("words", feas), ("label", label)
imdb = IMDbDataGenerator()
imdb.load_resource("aclImdb/imdb.vocab")
# data from files
file_names = os.listdir(sys.argv[1])
filelist = []
for i in range(0, len(file_names)):
filelist.append(os.path.join(sys.argv[1], file_names[i]))
line_limit = 2500
process_num = 24
imdb.run_from_files(
filelist=filelist,
line_limit=line_limit,
process_num=process_num,
output_dir=('output_dataset/%s' % (sys.argv[1])))
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Split file into parts
"""
import sys
import os
block = int(sys.argv[1])
datadir = sys.argv[2]
file_list = []
for i in range(block):
file_list.append(open(datadir + "/part-" + str(i), "w"))
id_ = 0
for line in sys.stdin:
file_list[id_ % block].write(line)
id_ += 1
for f in file_list:
f.close()
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import os
import paddle
def parse_fields(fields):
words_width = int(fields[0])
words = fields[1:1 + words_width]
label = fields[-1]
return words, label
def imdb_data_feed_reader(data_dir, batch_size, buf_size):
"""
Data feed reader for IMDB dataset.
This data set has been converted from original format to a format suitable
for AsyncExecutor
See data.proto for data format
"""
def reader():
for file in os.listdir(data_dir):
if file.endswith('.proto'):
continue
with open(os.path.join(data_dir, file), 'r') as f:
for line in f:
fields = line.split(' ')
words, label = parse_fields(fields)
yield words, label
test_reader = paddle.batch(
paddle.reader.shuffle(
reader, buf_size=buf_size), batch_size=batch_size)
return test_reader
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import time
import unittest
import contextlib
import numpy as np
import paddle
import paddle.fluid as fluid
import data_reader
def infer(test_reader, use_cuda, model_path=None):
"""
inference function
"""
if model_path is None:
print(str(model_path) + " cannot be found")
return
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
exe = fluid.Executor(place)
inference_scope = fluid.core.Scope()
with fluid.scope_guard(inference_scope):
[inference_program, feed_target_names,
fetch_targets] = fluid.io.load_inference_model(model_path, exe)
total_acc = 0.0
total_count = 0
for data in test_reader():
acc = exe.run(inference_program,
feed=utils.data2tensor(data, place),
fetch_list=fetch_targets,
return_numpy=True)
total_acc += acc[0] * len(data)
total_count += len(data)
avg_acc = total_acc / total_count
print("model_path: %s, avg_acc: %f" % (model_path, avg_acc))
if __name__ == "__main__":
if __package__ is None:
from os import sys, path
sys.path.append(
os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
import utils
batch_size = 128
model_path = sys.argv[1]
test_data_dirname = 'test_data'
if len(sys.argv) == 3:
test_data_dirname = sys.argv[2]
test_reader = data_reader.imdb_data_feed_reader(
'test_data', batch_size, buf_size=500000)
models = os.listdir(model_path)
for i in range(0, len(models)):
epoch_path = "epoch" + str(i) + ".model"
epoch_path = os.path.join(model_path, epoch_path)
infer(test_reader, use_cuda=False, model_path=epoch_path)
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import time
import multiprocessing
import paddle
import paddle.fluid as fluid
def train(network, dict_dim, lr, save_dirname, training_data_dirname, pass_num,
thread_num, batch_size):
file_names = os.listdir(training_data_dirname)
filelist = []
for i in range(0, len(file_names)):
if file_names[i] == 'data_feed.proto':
continue
filelist.append(os.path.join(training_data_dirname, file_names[i]))
dataset = fluid.DataFeedDesc(
os.path.join(training_data_dirname, 'data_feed.proto'))
dataset.set_batch_size(
batch_size) # datafeed should be assigned a batch size
dataset.set_use_slots(['words', 'label'])
data = fluid.layers.data(
name="words", shape=[1], dtype="int64", lod_level=1)
label = fluid.layers.data(name="label", shape=[1], dtype="int64")
avg_cost, acc, prediction = network(data, label, dict_dim)
optimizer = fluid.optimizer.Adagrad(learning_rate=lr)
opt_ops, weight_and_grad = optimizer.minimize(avg_cost)
startup_program = fluid.default_startup_program()
main_program = fluid.default_main_program()
place = fluid.CPUPlace()
executor = fluid.Executor(place)
executor.run(startup_program)
async_executor = fluid.AsyncExecutor(place)
for i in range(pass_num):
pass_start = time.time()
async_executor.run(main_program,
dataset,
filelist,
thread_num, [acc],
debug=False)
print('pass_id: %u pass_time_cost %f' % (i, time.time() - pass_start))
fluid.io.save_inference_model('%s/epoch%d.model' % (save_dirname, i),
[data.name, label.name], [acc], executor)
if __name__ == "__main__":
if __package__ is None:
from os import sys, path
sys.path.append(
os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
from nets import bow_net, cnn_net, lstm_net, gru_net
from utils import load_vocab
batch_size = 4
lr = 0.002
pass_num = 30
save_dirname = ""
thread_num = multiprocessing.cpu_count()
if sys.argv[1] == "bow":
network = bow_net
batch_size = 128
save_dirname = "bow_model"
elif sys.argv[1] == "cnn":
network = cnn_net
lr = 0.01
save_dirname = "cnn_model"
elif sys.argv[1] == "lstm":
network = lstm_net
lr = 0.05
save_dirname = "lstm_model"
elif sys.argv[1] == "gru":
network = gru_net
batch_size = 128
lr = 0.05
save_dirname = "gru_model"
training_data_dirname = 'train_data/'
if len(sys.argv) == 3:
training_data_dirname = sys.argv[2]
if len(sys.argv) == 4:
if thread_num >= int(sys.argv[3]):
thread_num = int(sys.argv[3])
vocab = load_vocab('imdb.vocab')
dict_dim = len(vocab)
train(network, dict_dim, lr, save_dirname, training_data_dirname, pass_num,
thread_num, batch_size)
#!/bin/bash
wget --no-check-certificate https://s3-eu-west-1.amazonaws.com/criteo-labs/dac.tar.gz
wget --no-check-certificate https://s3-eu-west-1.amazonaws.com/kaggle-display-advertising-challenge-dataset/dac.tar.gz
tar zxf dac.tar.gz
rm -f dac.tar.gz
......
......@@ -25,7 +25,14 @@ cd data && ./download.sh && cd ..
```bash
python preprocess.py --data_path ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled --dict_path data/1-billion_dict
```
如果您想使用我们支持的第三方词汇表,请将--other_dict_path设置为您存放将使用的词汇表的目录,并设置--with_other_dict使用它
如果您想使用自定义的词典形如:
```bash
<UNK>
a
b
c
```
请将--other_dict_path设置为您存放将使用的词典的目录,并设置--with_other_dict使用它
## 训练
训练的命令行选项可以通过`python train.py -h`列出。
......@@ -40,6 +47,14 @@ python train.py \
--with_hs --with_nce --is_local \
2>&1 | tee train.log
```
如果您想使用自定义的词典形如:
```bash
<UNK>
a
b
c
```
请将--other_dict_path设置为您存放将使用的词典的目录,并设置--with_other_dict使用它
### 分布式训练
......
......@@ -29,9 +29,16 @@ This model implement a skip-gram model of word2vector.
Preprocess the training data to generate a word dict.
```bash
python preprocess.py --data_path ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled --is_local --dict_path data/1-billion_dict
python preprocess.py --data_path ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled --dict_path data/1-billion_dict
```
if you would like to use our supported third party vocab, please set --other_dict_path as the directory of where you
if you would like to use your own vocab follow the format below:
```bash
<UNK>
a
b
c
```
Then, please set --other_dict_path as the directory of where you
save the vocab you will use and set --with_other_dict flag on to using it.
## Train
......@@ -47,7 +54,8 @@ python train.py \
--with_hs --with_nce --is_local \
2>&1 | tee train.log
```
if you would like to use our supported third party vocab, please set --other_dict_path as the directory of where you
save the vocab you will use and set --with_other_dict flag on to using it.
### Distributed Train
Run a 2 pserver 2 trainer distribute training on a single machine.
......
......@@ -27,12 +27,6 @@ def parse_args():
type=int,
default=5,
help="If the word count is less then freq, it will be removed from dict")
parser.add_argument(
'--is_local',
action='store_true',
required=False,
default=False,
help='Local train or not, (default: False)')
parser.add_argument(
'--with_other_dict',
......@@ -203,26 +197,27 @@ def preprocess(args):
for line in f:
word_count[native_to_unicode(line.strip())] = 1
if args.is_local:
for i in range(1, 100):
with io.open(
args.data_path + "/news.en-000{:0>2d}-of-00100".format(i),
encoding='utf-8') as f:
for line in f:
for i in range(1, 100):
with io.open(
args.data_path + "/news.en-000{:0>2d}-of-00100".format(i),
encoding='utf-8') as f:
for line in f:
if args.with_other_dict:
line = strip_lines(line)
words = line.split()
if args.with_other_dict:
for item in words:
if item in word_count:
word_count[item] = word_count[item] + 1
else:
word_count[native_to_unicode('<UNK>')] += 1
else:
for item in words:
if item in word_count:
word_count[item] = word_count[item] + 1
else:
word_count[item] = 1
for item in words:
if item in word_count:
word_count[item] = word_count[item] + 1
else:
word_count[native_to_unicode('<UNK>')] += 1
else:
line = text_strip(line)
words = line.split()
for item in words:
if item in word_count:
word_count[item] = word_count[item] + 1
else:
word_count[item] = 1
item_to_remove = []
for item in word_count:
if word_count[item] <= args.freq:
......
......@@ -105,7 +105,7 @@ class Word2VecReader(object):
return set(targets)
def train(self, with_hs):
def train(self, with_hs, with_other_dict):
def _reader():
for file in self.filelist:
with io.open(
......@@ -116,7 +116,11 @@ class Word2VecReader(object):
count = 1
for line in f:
if self.trainer_id == count % self.trainer_num:
line = preprocess.strip_lines(line, self.word_count)
if with_other_dict:
line = preprocess.strip_lines(line,
self.word_count)
else:
line = preprocess.text_strip(line)
word_ids = [
self.word_to_id_[word] for word in line.split()
if word in self.word_to_id_
......@@ -140,7 +144,11 @@ class Word2VecReader(object):
count = 1
for line in f:
if self.trainer_id == count % self.trainer_num:
line = preprocess.strip_lines(line, self.word_count)
if with_other_dict:
line = preprocess.strip_lines(line,
self.word_count)
else:
line = preprocess.text_strip(line)
word_ids = [
self.word_to_id_[word] for word in line.split()
if word in self.word_to_id_
......
......@@ -116,6 +116,13 @@ def parse_args():
default=False,
help='Do inference every 100 batches , (default: False)')
parser.add_argument(
'--with_other_dict',
action='store_true',
required=False,
default=False,
help='if use other dict , (default: False)')
parser.add_argument(
'--rank_num',
type=int,
......@@ -161,8 +168,8 @@ def train_loop(args, train_program, reader, py_reader, loss, trainer_id):
py_reader.decorate_tensor_provider(
convert_python_to_tensor(args.batch_size,
reader.train((args.with_hs or (
not args.with_nce))), (args.with_hs or (
not args.with_nce))))
not args.with_nce)), args.with_other_dict),
(args.with_hs or (not args.with_nce))))
place = fluid.CPUPlace()
......@@ -261,7 +268,7 @@ def train(args):
args.dict_path, args.train_data_path, filelist, 0, 1)
else:
trainer_id = int(os.environ["PADDLE_TRAINER_ID"])
trainers = int(os.environ["PADDLE_TRAINERS"])
trainer_num = int(os.environ["PADDLE_TRAINERS"])
word2vec_reader = reader.Word2VecReader(args.dict_path,
args.train_data_path, filelist,
trainer_id, trainer_num)
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册