Created by: Xreki
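- The fusion_group detection feature has been merged into Paddle's develop branch. It can automatically fuse the computation below into a single fusion_group_op, which raises the throughput of the small model in static mode from 60 steps/s to 84 steps/s. Because the feature has not yet been thoroughly validated, an enable_auto_fusion argument is added to turn it on, so that it can be tested in the Benchmark system.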
c = pre_cell * layers.sigmoid(f) + layers.sigmoid(i) * layers.tanh(j)
m = layers.tanh(c) * layers.sigmoid(o)
- eval should not be executed when profiling.
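To make the idea behind the first item concrete, here is a minimal pure-Python sketch of the two cell expressions, written once as a chain of separate element-wise passes and once as a single pass per element. It only illustrates what "fusing" element-wise operators means; it is not Paddle's fusion_group implementation, and the function names `cell_step_by_step` and `cell_fused` are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function, standing in for layers.sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def cell_step_by_step(pre_cell, i, j, f, o):
    """One full pass over the data per operator, with an intermediate list each time."""
    sig_f = [sigmoid(v) for v in f]
    sig_i = [sigmoid(v) for v in i]
    tanh_j = [math.tanh(v) for v in j]
    c = [p * sf + si * tj for p, sf, si, tj in zip(pre_cell, sig_f, sig_i, tanh_j)]
    tanh_c = [math.tanh(v) for v in c]
    sig_o = [sigmoid(v) for v in o]
    m = [tc * so for tc, so in zip(tanh_c, sig_o)]
    return c, m


def cell_fused(pre_cell, i, j, f, o):
    """A single pass: both outputs are computed per element, with no intermediate lists."""
    c, m = [], []
    for p, iv, jv, fv, ov in zip(pre_cell, i, j, f, o):
        cv = p * sigmoid(fv) + sigmoid(iv) * math.tanh(jv)
        c.append(cv)
        m.append(math.tanh(cv) * sigmoid(ov))
    return c, m


if __name__ == "__main__":
    # Arbitrary illustrative inputs, not taken from the model.
    args = ([0.3, -1.2], [0.5, 0.1], [-0.7, 2.0], [1.5, -0.4], [0.2, 0.9])
    print(cell_step_by_step(*args))
    print(cell_fused(*args))
```

Both functions compute the same values. The second form avoids materializing the intermediate results, which is the kind of saving a fused operator aims for on a GPU, where each separate operator would otherwise launch its own kernel and read and write its own temporaries.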
The training log of the small static model on the develop version is as follows:
+ python -u train.py --data_path data/simple-examples/data/ --model_type small --use_gpu True --enable_ce --max_epoch=3 --rnn_model static --use_dataloader True --enable_auto_fusion True --profile 0 --profiler_path=/work/models/PaddleNLP/language_model/profiler_paddingrnn_small_static --batch_size 20
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
/usr/local/lib/python2.7/dist-packages/paddle/fluid/framework.py:147: UserWarning: PaddlePaddle version 1.6.0 or higher is required, but 0.0.0 installed, Maybe you are using a develop version, please make sure the version is good with your code.
(min_version, fluid_version.full_version))
2020-02-11 03:02:26,571 - lm - INFO - Running with args : Namespace(batch_size=20, data_path='data/simple-examples/data/', enable_auto_fusion=True, enable_ce=True, init_from_pretrain_model=None, log_path=None, max_epoch=3, model_type='small', para_init=False, parallel=True, profile=False, profiler_path='/work/models/PaddleNLP/language_model/profiler_paddingrnn_small_static', rnn_model='static', save_model_dir='models', use_dataloader=True, use_gpu=True)
2020-02-11 03:02:26,571-INFO: Running with args : Namespace(batch_size=20, data_path='data/simple-examples/data/', enable_auto_fusion=True, enable_ce=True, init_from_pretrain_model=None, log_path=None, max_epoch=3, model_type='small', para_init=False, parallel=True, profile=False, profiler_path='/work/models/PaddleNLP/language_model/profiler_paddingrnn_small_static', rnn_model='static', save_model_dir='models', use_dataloader=True, use_gpu=True)
W0211 03:02:28.825318 25986 device_context.cc:237] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 10.1, Runtime API Version: 9.0
W0211 03:02:28.833045 25986 device_context.cc:245] device: 0, cuDNN Version: 7.5.
/usr/local/lib/python2.7/dist-packages/paddle/fluid/framework.py:147: UserWarning: PaddlePaddle version 1.7.0 or higher is required, but 0.0.0 installed, Maybe you are using a develop version, please make sure the version is good with your code.
(min_version, fluid_version.full_version))
begin to load data
vocab word num 10000
finished load data
I0211 03:02:31.421459 25986 parallel_executor.cc:430] The Program will be executed on CUDA using ParallelExecutor, 1 cards are used, so 1 programs are executed in parallel.
W0211 03:02:53.896385 25986 fuse_optimizer_op_pass.cc:191] Find sgd operators : 7, and 7 for dense gradients. To make the speed faster, those optimization are fused during training.
I0211 03:02:53.925590 25986 build_strategy.cc:368] SeqOnlyAllReduceOps:0, num_trainers:1
I0211 03:02:53.980311 25986 parallel_executor.cc:307] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0211 03:02:53.994568 25986 parallel_executor.cc:375] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
-- Epoch:[0]; Batch:[232]; Time: 0.01198 s; ppl: 860.27307, lr: 1.00000
-- Epoch:[0]; Batch:[464]; Time: 0.01208 s; ppl: 632.97675, lr: 1.00000
-- Epoch:[0]; Batch:[696]; Time: 0.01195 s; ppl: 510.73911, lr: 1.00000
-- Epoch:[0]; Batch:[928]; Time: 0.01209 s; ppl: 438.34106, lr: 1.00000
-- Epoch:[0]; Batch:[1160]; Time: 0.01187 s; ppl: 393.82922, lr: 1.00000
-- Epoch:[0]; Batch:[1392]; Time: 0.01198 s; ppl: 353.60745, lr: 1.00000
-- Epoch:[0]; Batch:[1624]; Time: 0.01206 s; ppl: 325.96683, lr: 1.00000
-- Epoch:[0]; Batch:[1856]; Time: 0.01363 s; ppl: 305.43149, lr: 1.00000
-- Epoch:[0]; Batch:[2088]; Time: 0.01187 s; ppl: 285.78018, lr: 1.00000
-- Epoch:[0]; Batch:[2320]; Time: 0.01242 s; ppl: 270.18124, lr: 1.00000
Train epoch:[0]; epoch Time: 52.11883; ppl: 270.12582; avg_time: 44.64967 steps/s
Valid ppl: 178.84186
Saved model to: models/0/params.
-- Epoch:[1]; Batch:[232]; Time: 0.01204 s; ppl: 151.05600, lr: 1.00000
-- Epoch:[1]; Batch:[464]; Time: 0.01266 s; ppl: 158.79492, lr: 1.00000
-- Epoch:[1]; Batch:[696]; Time: 0.01202 s; ppl: 153.96539, lr: 1.00000
-- Epoch:[1]; Batch:[928]; Time: 0.01185 s; ppl: 150.36273, lr: 1.00000
-- Epoch:[1]; Batch:[1160]; Time: 0.01205 s; ppl: 148.70813, lr: 1.00000
-- Epoch:[1]; Batch:[1392]; Time: 0.01187 s; ppl: 143.61874, lr: 1.00000
-- Epoch:[1]; Batch:[1624]; Time: 0.01522 s; ppl: 141.26443, lr: 1.00000
-- Epoch:[1]; Batch:[1856]; Time: 0.01220 s; ppl: 139.66539, lr: 1.00000
-- Epoch:[1]; Batch:[2088]; Time: 0.01191 s; ppl: 136.02548, lr: 1.00000
-- Epoch:[1]; Batch:[2320]; Time: 0.01184 s; ppl: 133.62427, lr: 1.00000
Train epoch:[1]; epoch Time: 29.02805; ppl: 133.63580; avg_time: 80.21580 steps/s
Valid ppl: 143.01823
Saved model to: models/1/params.
The training log of the large static model on the develop version is as follows:
+ python -u train.py --data_path data/simple-examples/data/ --model_type large --use_gpu True --enable_ce --max_epoch=3 --rnn_model static --use_dataloader True --enable_auto_fusion True --profile 0 --profiler_path=/work/models/PaddleNLP/language_model/profiler_paddingrnn_large_static --batch_size 20
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
/usr/local/lib/python2.7/dist-packages/paddle/fluid/framework.py:147: UserWarning: PaddlePaddle version 1.6.0 or higher is required, but 0.0.0 installed, Maybe you are using a develop version, please make sure the version is good with your code.
(min_version, fluid_version.full_version))
2020-02-11 03:05:36,212 - lm - INFO - Running with args : Namespace(batch_size=20, data_path='data/simple-examples/data/', enable_auto_fusion=True, enable_ce=True, init_from_pretrain_model=None, log_path=None, max_epoch=3, model_type='large', para_init=False, parallel=True, profile=False, profiler_path='/work/models/PaddleNLP/language_model/profiler_paddingrnn_large_static', rnn_model='static', save_model_dir='models', use_dataloader=True, use_gpu=True)
2020-02-11 03:05:36,212-INFO: Running with args : Namespace(batch_size=20, data_path='data/simple-examples/data/', enable_auto_fusion=True, enable_ce=True, init_from_pretrain_model=None, log_path=None, max_epoch=3, model_type='large', para_init=False, parallel=True, profile=False, profiler_path='/work/models/PaddleNLP/language_model/profiler_paddingrnn_large_static', rnn_model='static', save_model_dir='models', use_dataloader=True, use_gpu=True)
W0211 03:05:39.587276 26016 device_context.cc:237] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 10.1, Runtime API Version: 9.0
W0211 03:05:39.591958 26016 device_context.cc:245] device: 0, cuDNN Version: 7.5.
/usr/local/lib/python2.7/dist-packages/paddle/fluid/framework.py:147: UserWarning: PaddlePaddle version 1.7.0 or higher is required, but 0.0.0 installed, Maybe you are using a develop version, please make sure the version is good with your code.
(min_version, fluid_version.full_version))
begin to load data
vocab word num 10000
finished load data
I0211 03:05:42.239248 26016 parallel_executor.cc:430] The Program will be executed on CUDA using ParallelExecutor, 1 cards are used, so 1 programs are executed in parallel.
W0211 03:06:29.715612 26016 fuse_optimizer_op_pass.cc:191] Find sgd operators : 7, and 7 for dense gradients. To make the speed faster, those optimization are fused during training.
I0211 03:06:29.775419 26016 build_strategy.cc:368] SeqOnlyAllReduceOps:0, num_trainers:1
I0211 03:06:29.936949 26016 parallel_executor.cc:307] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0211 03:06:29.978932 26016 parallel_executor.cc:375] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
-- Epoch:[0]; Batch:[132]; Time: 0.05168 s; ppl: 5350.31348, lr: 1.00000
-- Epoch:[0]; Batch:[264]; Time: 0.05244 s; ppl: 2302.86011, lr: 1.00000
-- Epoch:[0]; Batch:[396]; Time: 0.05178 s; ppl: 1542.09558, lr: 1.00000
-- Epoch:[0]; Batch:[528]; Time: 0.05234 s; ppl: 1232.63391, lr: 1.00000
-- Epoch:[0]; Batch:[660]; Time: 0.05203 s; ppl: 1059.05396, lr: 1.00000
-- Epoch:[0]; Batch:[792]; Time: 0.05198 s; ppl: 922.34430, lr: 1.00000
-- Epoch:[0]; Batch:[924]; Time: 0.05422 s; ppl: 823.51917, lr: 1.00000
-- Epoch:[0]; Batch:[1056]; Time: 0.05134 s; ppl: 747.93158, lr: 1.00000
-- Epoch:[0]; Batch:[1188]; Time: 0.05243 s; ppl: 683.94556, lr: 1.00000
-- Epoch:[0]; Batch:[1320]; Time: 0.05182 s; ppl: 631.68329, lr: 1.00000
Train epoch:[0]; epoch Time: 117.88483; ppl: 629.88489; avg_time: 11.27102 steps/s
Valid ppl: 300.55658
Saved model to: models/0/params.
-- Epoch:[1]; Batch:[132]; Time: 0.05165 s; ppl: 284.29178, lr: 1.00000
-- Epoch:[1]; Batch:[264]; Time: 0.05174 s; ppl: 293.22693, lr: 1.00000
-- Epoch:[1]; Batch:[396]; Time: 0.05155 s; ppl: 285.89767, lr: 1.00000
-- Epoch:[1]; Batch:[528]; Time: 0.05342 s; ppl: 279.74893, lr: 1.00000
-- Epoch:[1]; Batch:[660]; Time: 0.05232 s; ppl: 277.07483, lr: 1.00000
-- Epoch:[1]; Batch:[792]; Time: 0.05249 s; ppl: 267.44934, lr: 1.00000
-- Epoch:[1]; Batch:[924]; Time: 0.05259 s; ppl: 261.28140, lr: 1.00000
-- Epoch:[1]; Batch:[1056]; Time: 0.05274 s; ppl: 256.47076, lr: 1.00000
-- Epoch:[1]; Batch:[1188]; Time: 0.05138 s; ppl: 249.94817, lr: 1.00000
-- Epoch:[1]; Batch:[1320]; Time: 0.05127 s; ppl: 244.38910, lr: 1.00000
Train epoch:[1]; epoch Time: 69.60030; ppl: 244.37781; avg_time: 19.09562 steps/s
Valid ppl: 193.27681
Saved model to: models/1/params.
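When comparing runs with and without enable_auto_fusion, it can help to pull the per-epoch summary out of logs like the ones above. The following is a small sketch that assumes the "Train epoch" line format shown in these logs; the function name `parse_epoch_summaries` and the dictionary keys are hypothetical, not part of train.py.

```python
import re

# Matches lines such as:
# Train epoch:[1]; epoch Time: 29.02805; ppl: 133.63580; avg_time: 80.21580 steps/s
EPOCH_RE = re.compile(
    r"Train epoch:\[(\d+)\]; epoch Time: ([\d.]+); ppl: ([\d.]+); "
    r"avg_time: ([\d.]+) steps/s"
)


def parse_epoch_summaries(log_text):
    """Return one dict per 'Train epoch' summary line found in log_text."""
    summaries = []
    for match in EPOCH_RE.finditer(log_text):
        summaries.append({
            "epoch": int(match.group(1)),
            "epoch_time": float(match.group(2)),
            "ppl": float(match.group(3)),
            "steps_per_sec": float(match.group(4)),
        })
    return summaries


if __name__ == "__main__":
    # Two summary lines copied from the small static log above.
    sample = (
        "Train epoch:[0]; epoch Time: 52.11883; ppl: 270.12582; avg_time: 44.64967 steps/s\n"
        "Valid ppl: 178.84186\n"
        "Train epoch:[1]; epoch Time: 29.02805; ppl: 133.63580; avg_time: 80.21580 steps/s\n"
    )
    for row in parse_epoch_summaries(sample):
        print(row["epoch"], row["steps_per_sec"])
```

In both logs the first epoch reports a lower figure than the second (44.6 vs 80.2 steps/s for small, 11.3 vs 19.1 steps/s for large), so a comparison between configurations should use matching epochs.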