The profiling results for book/test_recommender_system.py .
Created by: chengduoZH
Config and Env
- test_recommender_system.py
- pass=2; batch_size = 512
- OMP_NUM_THREADS = 1
- Env: nvidia-docker, CPU, profiling machine.
The profiling results
- By code: https://github.com/chengduoZH/Paddle/tree/follow_qingqing_add_timer
- Compling mode: Release
- CMake config:
-DWITH_GPU=ON -DWITH_TIMER=ON -DWITH_MKL=ON -DCMAKE_BUILD_TYPE=Release
I1211 11:50:30.828790 599 Stat.cc:102] ======= StatSet: [GlobalStatInfo] status ======
I1211 11:50:30.828833 599 Stat.cc:105] Stat=sgd TID=599 total=1881.91 avg=0.023 max=1.329 min=0.003 count=80914
I1211 11:50:30.828867 599 Stat.cc:105] Stat=sequence_conv_grad TID=599 total=4819.14 avg=1.369 max=4.989 min=0.153 count=3518
I1211 11:50:30.828876 599 Stat.cc:105] Stat=concat_grad TID=599 total=312.379 avg=0.044 max=0.136 min=0.01 count=7036
I1211 11:50:30.828882 599 Stat.cc:105] Stat=sequence_pool_grad TID=599 total=3842.45 avg=0.546 max=1.368 min=0.064 count=7036
I1211 11:50:30.828889 599 Stat.cc:105] Stat=tanh_grad TID=599 total=441.579 avg=0.041 max=0.264 min=0.005 count=10554
I1211 11:50:30.828897 599 Stat.cc:105] Stat=elementwise_add TID=599 total=1473.18 avg=0.052 max=0.91 min=0.004 count=28144
I1211 11:50:30.828919 599 Stat.cc:105] Stat=square_grad TID=599 total=13.419 avg=0.003 max=0.014 min=0.003 count=3518
I1211 11:50:30.828927 599 Stat.cc:105] Stat=elementwise_sub_grad TID=599 total=23.033 avg=0.006 max=0.023 min=0.005 count=3518
I1211 11:50:30.828964 599 Stat.cc:105] Stat=cos_sim_grad TID=599 total=87166.1 avg=24.777 max=54.537 min=1.002 count=3518
I1211 11:50:30.828974 599 Stat.cc:105] Stat=ExecutorRunTimer TID=599 total=128299 avg=36.458 max=91.512 min=3.412 count=3519
I1211 11:50:30.828980 599 Stat.cc:105] Stat=lookup_table TID=599 total=686.032 avg=0.027 max=0.669 min=0.003 count=24626
I1211 11:50:30.828987 599 Stat.cc:105] Stat=uniform_random TID=599 total=7.372 avg=0.491 max=2.658 min=0.005 count=15
I1211 11:50:30.828994 599 Stat.cc:105] Stat=mul_grad TID=599 total=4040.95 avg=0.164 max=1.742 min=0.012 count=24626
I1211 11:50:30.829000 599 Stat.cc:105] Stat=mul TID=599 total=2029.29 avg=0.082 max=13.604 min=0.005 count=24626
I1211 11:50:30.829007 599 Stat.cc:105] Stat=sequence_conv TID=599 total=2587.79 avg=0.735 max=1.994 min=0.099 count=3518
I1211 11:50:30.829015 599 Stat.cc:105] Stat=CreateLocalScopeTimer TID=599 total=315.411 avg=0.089 max=2.587 min=0.026 count=3519
I1211 11:50:30.829020 599 Stat.cc:105] Stat=elementwise_add_grad TID=599 total=1668.98 avg=0.059 max=1.184 min=0.007 count=28144
I1211 11:50:30.829027 599 Stat.cc:105] Stat=elementwise_sub TID=599 total=23.68 avg=0.006 max=0.03 min=0.005 count=3518
I1211 11:50:30.829035 599 Stat.cc:105] Stat=lookup_table_grad TID=599 total=6358.48 avg=0.258 max=3.266 min=0.033 count=24626
I1211 11:50:30.829041 599 Stat.cc:105] Stat=DeleteLocalScopeTimer TID=599 total=4444.55 avg=1.263 max=2.673 min=0.001 count=3519
I1211 11:50:30.829047 599 Stat.cc:105] Stat=feed TID=599 total=169.309 avg=0.006 max=0.296 min=0.001 count=28144
I1211 11:50:30.829056 599 Stat.cc:105] Stat=cos_sim TID=599 total=1089 avg=0.309 max=1.209 min=0.02 count=3518
I1211 11:50:30.829062 599 Stat.cc:105] Stat=CreateOpTimer TID=599 total=685.823 avg=0.001 max=0.069 min=0 count=355364
I1211 11:50:30.829069 599 Stat.cc:105] Stat=fill_constant TID=599 total=10.788 avg=0.003 max=0.195 min=0.002 count=3549
I1211 11:50:30.829075 599 Stat.cc:105] Stat=concat TID=599 total=323.133 avg=0.045 max=0.179 min=0.008 count=7036
I1211 11:50:30.829082 599 Stat.cc:105] Stat=sequence_pool TID=599 total=2653.1 avg=0.377 max=1.077 min=0.052 count=7036
I1211 11:50:30.829089 599 Stat.cc:105] Stat=square TID=599 total=10.888 avg=0.003 max=0.019 min=0.002 count=3518
I1211 11:50:30.829097 599 Stat.cc:105] Stat=mean TID=599 total=10.421 avg=0.002 max=0.031 min=0.002 count=3518
I1211 11:50:30.829103 599 Stat.cc:105] Stat=fetch TID=599 total=9.181 avg=0.002 max=0.013 min=0.002 count=3518
I1211 11:50:30.829109 599 Stat.cc:105] Stat=tanh TID=599 total=860.985 avg=0.081 max=0.236 min=0.006 count=10554
I1211 11:50:30.829116 599 Stat.cc:105] Stat=mean_grad TID=599 total=29.918 avg=0.008 max=0.022 min=0.003 count=3518
The operators needing to optimize
- cos_sim_grad/cos_sim
- cos_sim_grad is very time-consuming, because a lot of eigen's braodcast is used in it.
- sequence_conv_grad