Add an argument to enable the use of experimental feature, fusion_group. !4252

!4252 · Merged · Created Feb 08, 2020 by saxon_zh@saxon_zh

Created by: Xreki

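  • Paddle's develop branch now includes fusion_group detection, which can automatically fuse a computation such as the two lines below into a single fusion_group_op. This raises the throughput of the small model in static mode from 60 steps/s to 84 steps/s. Because the feature has not been thoroughly validated, this change adds an argument, enable_auto_fusion, that turns it on so it can be tested in the Benchmark system.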
c = pre_cell * layers.sigmoid(f) + layers.sigmoid(i) * layers.tanh(j)
m = layers.tanh(c) * layers.sigmoid(o)
  • Evaluation (eval) should not be run when profiling.
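
To illustrate what fusing the two expressions above means, here is a minimal pure-Python sketch. It is not Paddle's fusion_group implementation: `cell_unfused`, `cell_fused`, `str2bool`, and `build_parser` are hypothetical names made up for this example, and how the real flag is wired into Paddle is not shown here. The unfused version makes one pass and one temporary list per elementwise operator, while the fused version computes `c` and `m` in a single pass, which is the kind of saving an operator-fusion pass aims for.

```python
import argparse
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def str2bool(value):
    # Hypothetical helper: the commands in this PR pass the flag as "True"/"False".
    return str(value).lower() in ("true", "1", "yes")


def cell_unfused(pre_cell, f, i, j, o):
    """One pass (and one temporary list) per elementwise operator."""
    sig_f = [sigmoid(v) for v in f]
    sig_i = [sigmoid(v) for v in i]
    tanh_j = [math.tanh(v) for v in j]
    keep = [p * s for p, s in zip(pre_cell, sig_f)]
    new = [s * t for s, t in zip(sig_i, tanh_j)]
    c = [a + b for a, b in zip(keep, new)]
    tanh_c = [math.tanh(v) for v in c]
    sig_o = [sigmoid(v) for v in o]
    m = [t * s for t, s in zip(tanh_c, sig_o)]
    return c, m


def cell_fused(pre_cell, f, i, j, o):
    """A single pass that computes c and m per element, without temporaries."""
    c, m = [], []
    for p, fv, iv, jv, ov in zip(pre_cell, f, i, j, o):
        cv = p * sigmoid(fv) + sigmoid(iv) * math.tanh(jv)
        c.append(cv)
        m.append(math.tanh(cv) * sigmoid(ov))
    return c, m


def build_parser():
    # Assumed shape of the new argument; only the name comes from this PR.
    parser = argparse.ArgumentParser()
    parser.add_argument("--enable_auto_fusion", type=str2bool, default=False)
    return parser


args = build_parser().parse_args(["--enable_auto_fusion", "True"])
cell = cell_fused if args.enable_auto_fusion else cell_unfused
c, m = cell([0.5, -1.0], [0.1, 0.2], [0.3, -0.4], [1.0, 2.0], [0.0, 0.5])
print(c, m)
```

Both functions return the same values for the same inputs, so the flag in this sketch changes only how the work is scheduled, not the result.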

Training log for the small static model on the develop build:

+ python -u train.py --data_path data/simple-examples/data/ --model_type small --use_gpu True --enable_ce --max_epoch=3 --rnn_model static --use_dataloader True --enable_auto_fusion True --profile 0 --profiler_path=/work/models/PaddleNLP/language_model/profiler_paddingrnn_small_static --batch_size 20
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
/usr/local/lib/python2.7/dist-packages/paddle/fluid/framework.py:147: UserWarning: PaddlePaddle version 1.6.0 or higher is required, but 0.0.0 installed, Maybe you are using a develop version, please make sure the version is good with your code.
  (min_version, fluid_version.full_version))
2020-02-11 03:02:26,571 - lm - INFO - Running with args : Namespace(batch_size=20, data_path='data/simple-examples/data/', enable_auto_fusion=True, enable_ce=True, init_from_pretrain_model=None, log_path=None, max_epoch=3, model_type='small', para_init=False, parallel=True, profile=False, profiler_path='/work/models/PaddleNLP/language_model/profiler_paddingrnn_small_static', rnn_model='static', save_model_dir='models', use_dataloader=True, use_gpu=True)
2020-02-11 03:02:26,571-INFO: Running with args : Namespace(batch_size=20, data_path='data/simple-examples/data/', enable_auto_fusion=True, enable_ce=True, init_from_pretrain_model=None, log_path=None, max_epoch=3, model_type='small', para_init=False, parallel=True, profile=False, profiler_path='/work/models/PaddleNLP/language_model/profiler_paddingrnn_small_static', rnn_model='static', save_model_dir='models', use_dataloader=True, use_gpu=True)
W0211 03:02:28.825318 25986 device_context.cc:237] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 10.1, Runtime API Version: 9.0
W0211 03:02:28.833045 25986 device_context.cc:245] device: 0, cuDNN Version: 7.5.
/usr/local/lib/python2.7/dist-packages/paddle/fluid/framework.py:147: UserWarning: PaddlePaddle version 1.7.0 or higher is required, but 0.0.0 installed, Maybe you are using a develop version, please make sure the version is good with your code.
  (min_version, fluid_version.full_version))
begin to load data
vocab word num 10000
finished load data
I0211 03:02:31.421459 25986 parallel_executor.cc:430] The Program will be executed on CUDA using ParallelExecutor, 1 cards are used, so 1 programs are executed in parallel.
W0211 03:02:53.896385 25986 fuse_optimizer_op_pass.cc:191] Find sgd operators : 7, and 7 for dense gradients. To make the speed faster, those optimization are fused during training.
I0211 03:02:53.925590 25986 build_strategy.cc:368] SeqOnlyAllReduceOps:0, num_trainers:1
I0211 03:02:53.980311 25986 parallel_executor.cc:307] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0211 03:02:53.994568 25986 parallel_executor.cc:375] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
-- Epoch:[0]; Batch:[232]; Time: 0.01198 s; ppl: 860.27307, lr: 1.00000
-- Epoch:[0]; Batch:[464]; Time: 0.01208 s; ppl: 632.97675, lr: 1.00000
-- Epoch:[0]; Batch:[696]; Time: 0.01195 s; ppl: 510.73911, lr: 1.00000
-- Epoch:[0]; Batch:[928]; Time: 0.01209 s; ppl: 438.34106, lr: 1.00000
-- Epoch:[0]; Batch:[1160]; Time: 0.01187 s; ppl: 393.82922, lr: 1.00000
-- Epoch:[0]; Batch:[1392]; Time: 0.01198 s; ppl: 353.60745, lr: 1.00000
-- Epoch:[0]; Batch:[1624]; Time: 0.01206 s; ppl: 325.96683, lr: 1.00000
-- Epoch:[0]; Batch:[1856]; Time: 0.01363 s; ppl: 305.43149, lr: 1.00000
-- Epoch:[0]; Batch:[2088]; Time: 0.01187 s; ppl: 285.78018, lr: 1.00000
-- Epoch:[0]; Batch:[2320]; Time: 0.01242 s; ppl: 270.18124, lr: 1.00000

Train epoch:[0]; epoch Time: 52.11883; ppl: 270.12582; avg_time: 44.64967 steps/s

Valid ppl: 178.84186
Saved model to: models/0/params.

-- Epoch:[1]; Batch:[232]; Time: 0.01204 s; ppl: 151.05600, lr: 1.00000
-- Epoch:[1]; Batch:[464]; Time: 0.01266 s; ppl: 158.79492, lr: 1.00000
-- Epoch:[1]; Batch:[696]; Time: 0.01202 s; ppl: 153.96539, lr: 1.00000
-- Epoch:[1]; Batch:[928]; Time: 0.01185 s; ppl: 150.36273, lr: 1.00000
-- Epoch:[1]; Batch:[1160]; Time: 0.01205 s; ppl: 148.70813, lr: 1.00000
-- Epoch:[1]; Batch:[1392]; Time: 0.01187 s; ppl: 143.61874, lr: 1.00000
-- Epoch:[1]; Batch:[1624]; Time: 0.01522 s; ppl: 141.26443, lr: 1.00000
-- Epoch:[1]; Batch:[1856]; Time: 0.01220 s; ppl: 139.66539, lr: 1.00000
-- Epoch:[1]; Batch:[2088]; Time: 0.01191 s; ppl: 136.02548, lr: 1.00000
-- Epoch:[1]; Batch:[2320]; Time: 0.01184 s; ppl: 133.62427, lr: 1.00000

Train epoch:[1]; epoch Time: 29.02805; ppl: 133.63580; avg_time: 80.21580 steps/s

Valid ppl: 143.01823
Saved model to: models/1/params.

Training log for the large static model on the develop build:

+ python -u train.py --data_path data/simple-examples/data/ --model_type large --use_gpu True --enable_ce --max_epoch=3 --rnn_model static --use_dataloader True --enable_auto_fusion True --profile 0 --profiler_path=/work/models/PaddleNLP/language_model/profiler_paddingrnn_large_static --batch_size 20
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
/usr/local/lib/python2.7/dist-packages/paddle/fluid/framework.py:147: UserWarning: PaddlePaddle version 1.6.0 or higher is required, but 0.0.0 installed, Maybe you are using a develop version, please make sure the version is good with your code.
  (min_version, fluid_version.full_version))
2020-02-11 03:05:36,212 - lm - INFO - Running with args : Namespace(batch_size=20, data_path='data/simple-examples/data/', enable_auto_fusion=True, enable_ce=True, init_from_pretrain_model=None, log_path=None, max_epoch=3, model_type='large', para_init=False, parallel=True, profile=False, profiler_path='/work/models/PaddleNLP/language_model/profiler_paddingrnn_large_static', rnn_model='static', save_model_dir='models', use_dataloader=True, use_gpu=True)
2020-02-11 03:05:36,212-INFO: Running with args : Namespace(batch_size=20, data_path='data/simple-examples/data/', enable_auto_fusion=True, enable_ce=True, init_from_pretrain_model=None, log_path=None, max_epoch=3, model_type='large', para_init=False, parallel=True, profile=False, profiler_path='/work/models/PaddleNLP/language_model/profiler_paddingrnn_large_static', rnn_model='static', save_model_dir='models', use_dataloader=True, use_gpu=True)
W0211 03:05:39.587276 26016 device_context.cc:237] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 10.1, Runtime API Version: 9.0
W0211 03:05:39.591958 26016 device_context.cc:245] device: 0, cuDNN Version: 7.5.
/usr/local/lib/python2.7/dist-packages/paddle/fluid/framework.py:147: UserWarning: PaddlePaddle version 1.7.0 or higher is required, but 0.0.0 installed, Maybe you are using a develop version, please make sure the version is good with your code.
  (min_version, fluid_version.full_version))
begin to load data
vocab word num 10000
finished load data
I0211 03:05:42.239248 26016 parallel_executor.cc:430] The Program will be executed on CUDA using ParallelExecutor, 1 cards are used, so 1 programs are executed in parallel.
W0211 03:06:29.715612 26016 fuse_optimizer_op_pass.cc:191] Find sgd operators : 7, and 7 for dense gradients. To make the speed faster, those optimization are fused during training.
I0211 03:06:29.775419 26016 build_strategy.cc:368] SeqOnlyAllReduceOps:0, num_trainers:1
I0211 03:06:29.936949 26016 parallel_executor.cc:307] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0211 03:06:29.978932 26016 parallel_executor.cc:375] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
-- Epoch:[0]; Batch:[132]; Time: 0.05168 s; ppl: 5350.31348, lr: 1.00000
-- Epoch:[0]; Batch:[264]; Time: 0.05244 s; ppl: 2302.86011, lr: 1.00000
-- Epoch:[0]; Batch:[396]; Time: 0.05178 s; ppl: 1542.09558, lr: 1.00000
-- Epoch:[0]; Batch:[528]; Time: 0.05234 s; ppl: 1232.63391, lr: 1.00000
-- Epoch:[0]; Batch:[660]; Time: 0.05203 s; ppl: 1059.05396, lr: 1.00000
-- Epoch:[0]; Batch:[792]; Time: 0.05198 s; ppl: 922.34430, lr: 1.00000
-- Epoch:[0]; Batch:[924]; Time: 0.05422 s; ppl: 823.51917, lr: 1.00000
-- Epoch:[0]; Batch:[1056]; Time: 0.05134 s; ppl: 747.93158, lr: 1.00000
-- Epoch:[0]; Batch:[1188]; Time: 0.05243 s; ppl: 683.94556, lr: 1.00000
-- Epoch:[0]; Batch:[1320]; Time: 0.05182 s; ppl: 631.68329, lr: 1.00000

Train epoch:[0]; epoch Time: 117.88483; ppl: 629.88489; avg_time: 11.27102 steps/s

Valid ppl: 300.55658
Saved model to: models/0/params.

-- Epoch:[1]; Batch:[132]; Time: 0.05165 s; ppl: 284.29178, lr: 1.00000
-- Epoch:[1]; Batch:[264]; Time: 0.05174 s; ppl: 293.22693, lr: 1.00000
-- Epoch:[1]; Batch:[396]; Time: 0.05155 s; ppl: 285.89767, lr: 1.00000
-- Epoch:[1]; Batch:[528]; Time: 0.05342 s; ppl: 279.74893, lr: 1.00000
-- Epoch:[1]; Batch:[660]; Time: 0.05232 s; ppl: 277.07483, lr: 1.00000
-- Epoch:[1]; Batch:[792]; Time: 0.05249 s; ppl: 267.44934, lr: 1.00000
-- Epoch:[1]; Batch:[924]; Time: 0.05259 s; ppl: 261.28140, lr: 1.00000
-- Epoch:[1]; Batch:[1056]; Time: 0.05274 s; ppl: 256.47076, lr: 1.00000
-- Epoch:[1]; Batch:[1188]; Time: 0.05138 s; ppl: 249.94817, lr: 1.00000
-- Epoch:[1]; Batch:[1320]; Time: 0.05127 s; ppl: 244.38910, lr: 1.00000

Train epoch:[1]; epoch Time: 69.60030; ppl: 244.37781; avg_time: 19.09562 steps/s

Valid ppl: 193.27681
Saved model to: models/1/params.
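
The throughput figures can be cross-checked against logs like the ones above with a small script. The sketch below assumes that the `Time:` field on each `-- Epoch` line is the wall-clock time of a single step, and it compares the rate implied by those lines with the `avg_time` reported on the `Train epoch` line. `summarize` and both regular expressions are hypothetical helpers written for this example; they are not part of train.py.

```python
import re

# Patterns matching the two kinds of log lines shown above.
STEP_RE = re.compile(r"Epoch:\[(\d+)\]; Batch:\[(\d+)\]; Time: ([0-9.]+) s")
EPOCH_RE = re.compile(
    r"Train epoch:\[(\d+)\]; epoch Time: ([0-9.]+); "
    r"ppl: ([0-9.]+); avg_time: ([0-9.]+) steps/s"
)


def summarize(log_text):
    """Return per-epoch steps/s: implied by step times vs. reported by the log."""
    step_times = {}
    reported = {}
    for line in log_text.splitlines():
        match = STEP_RE.search(line)
        if match:
            epoch = int(match.group(1))
            step_times.setdefault(epoch, []).append(float(match.group(3)))
            continue
        match = EPOCH_RE.search(line)
        if match:
            reported[int(match.group(1))] = float(match.group(4))
    summary = {}
    for epoch, times in sorted(step_times.items()):
        mean_step = sum(times) / len(times)
        summary[epoch] = {
            "from_step_time": 1.0 / mean_step,
            "reported": reported.get(epoch),
        }
    return summary


# Three lines copied from the small static log above.
sample = """\
-- Epoch:[1]; Batch:[232]; Time: 0.01204 s; ppl: 151.05600, lr: 1.00000
-- Epoch:[1]; Batch:[464]; Time: 0.01266 s; ppl: 158.79492, lr: 1.00000
Train epoch:[1]; epoch Time: 29.02805; ppl: 133.63580; avg_time: 80.21580 steps/s
"""

for epoch, stats in summarize(sample).items():
    print(epoch, stats)
```

Run over a full log, this makes it easy to see where the sampled step times and the epoch-level average disagree, as they do for epoch 0 in both logs above.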
Reference: paddlepaddle/models!4252
Source branch: github/fork/Xreki/language_model/enable_fusion_group