Profiling of text_classification with Fluid.
Created by: peterzhang2029
Profiling results for text_classification. Source code: https://github.com/PaddlePaddle/models/tree/develop/fluid/text_classification
Training configuration:
batch_size: 4
num_of_batch: 500
dict_size: 89528
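Profiling Report metrics:
Calls: total number of calls to the op
Total: total running time over all calls
Min: duration of the shortest single call
Max: duration of the longest single call
Ave: average duration per call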
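To make the metric definitions concrete, here is a minimal sketch of how the five columns could be derived from raw per-call timings. The names `EventStats` and `summarize` are hypothetical and are not part of the Fluid profiler; the sketch only illustrates the definitions above.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class EventStats:
    calls: int    # Calls: number of recorded calls
    total: float  # Total: sum of all call durations (ms)
    min: float    # Min: shortest single call (ms)
    max: float    # Max: longest single call (ms)
    ave: float    # Ave: total / calls (ms)


def summarize(events: Iterable[Tuple[str, float]]) -> Dict[str, EventStats]:
    """Group (event_name, duration_ms) samples and compute the report columns."""
    durations: Dict[str, List[float]] = {}
    for name, ms in events:
        durations.setdefault(name, []).append(ms)
    return {
        name: EventStats(len(d), sum(d), min(d), max(d), sum(d) / len(d))
        for name, d in durations.items()
    }


# Hypothetical samples, not taken from the report.
samples = [("sgd", 1.0), ("sgd", 3.0), ("mul", 0.5)]
stats = summarize(samples)
```

Note that Ave is simply Total divided by Calls, so a row of the report can be sanity-checked by hand.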
Profiling Report for training 500 batches:
The dictionary size is : 89528
Total time: 16.033737
-------------------------> Profiling Report <-------------------------
Place: CPU
Time unit: ms
Sorted by total time in descending order in the same thread
Event                           Calls   Total       Min.        Max.        Ave.
thread0::sgd                    3493    4830.48     0.009502    15.5302     1.3829
thread0::lookup_table_grad      499     4775.31     9.31438     12.5104     9.56976
thread0::sequence_conv_grad     499     1683.32     1.04954     15.5015     3.3734
thread0::sequence_conv          499     1048.58     0.698105    14.0347     2.10136
thread0::elementwise_add_grad   1497    210.856     0.013004    1.14577     0.140852
thread0::elementwise_add        1497    122.64      0.00811     0.577024    0.0819238
thread0::tanh                   499     117.027     0.090373    0.623305    0.234523
thread0::sequence_pool          499     104.869     0.100548    0.479671    0.210159
thread0::lookup_table           499     67.5452     0.059546    0.555258    0.135361
thread0::tanh_grad              499     54.0636     0.044505    0.316446    0.108344
thread0::sequence_pool_grad     499     36.3875     0.039143    0.18097     0.0729209
thread0::mul_grad               998     35.8055     0.023166    0.237668    0.0358772
thread0::mul                    998     26.0178     0.011744    0.074651    0.02607
thread0::cast                   1996    19.4615     0.0061      0.158941    0.00975027
thread0::sum                    998     15.1658     0.009326    0.09614     0.0151962
thread0::top_k                  499     14.7059     0.025781    0.0949      0.0294708
thread0::accuracy               499     10.6152     0.018964    0.059349    0.021273
thread0::softmax                499     8.14185     0.014948    0.044151    0.0163163
thread0::cross_entropy_grad     499     7.07577     0.0132      0.023645    0.0141799
thread0::cross_entropy          499     6.96024     0.012991    0.02396     0.0139484
thread0::feed                   998     6.73899     0.004097    0.035801    0.00675249
thread0::softmax_grad           499     6.43104     0.011922    0.032686    0.0128879
thread0::fetch                  1497    5.35905     0.001853    0.019403    0.00357986
thread0::elementwise_div        499     5.16685     0.008804    0.073607    0.0103544
thread0::mean_grad              499     4.91204     0.009089    0.044894    0.00984377
thread0::mean                   499     3.64095     0.006551    0.041162    0.00729649
thread0::fill_constant          499     3.02958     0.005365    0.031708    0.00607131
Profiling Report for training 1 pass:
The dictionary size is : 89528
Total time: 246.533985
-------------------------> Profiling Report <-------------------------
Place: CPU
Time unit: ms
Sorted by total time in descending order in the same thread
Event                           Calls   Total       Min.        Max.        Ave.
thread0::lookup_table_grad      6248    85039.4     12.9076     32.3335     13.6107
thread0::sgd                    43736   68442.7     0.009991    35.3864     1.5649
thread0::sequence_conv_grad     6248    24102.5     0.776669    45.2805     3.85763
thread0::sequence_conv          6248    16146       0.47397     29.5967     2.58418
thread0::elementwise_add_grad   18744   2645.15     0.013075    1.76686     0.14112
thread0::elementwise_add        18744   1515.66     0.007877    1.35418     0.0808611
thread0::tanh                   6248    1412.74     0.078598    2.58319     0.226111
thread0::sequence_pool          6248    1296.17     0.084166    1.01645     0.207454
thread0::lookup_table           6248    996.414     0.061383    2.69139     0.159477
thread0::tanh_grad              6248    686.66      0.040493    3.34762     0.109901
thread0::sequence_pool_grad     6248    460.941     0.03908     0.767207    0.0737741
thread0::mul_grad               12496   455.801     0.023004    1.12555     0.0364757
thread0::mul                    12496   323.891     0.011963    0.718241    0.0259196
thread0::cast                   24992   319.577     0.006238    24.1128     0.0127872
thread0::top_k                  6248    238.347     0.026201    12.0761     0.0381477
thread0::sum                    12496   227.388     0.009395    16.0632     0.0181969
thread0::accuracy               6248    171.487     0.020024    15.2487     0.0274467
thread0::softmax                6248    105.258     0.015404    0.702274    0.0168467
thread0::feed                   12496   94.2271     0.003959    4.75252     0.00754058
thread0::cross_entropy_grad     6248    90.4667     0.013344    0.148185    0.0144793
thread0::cross_entropy          6248    88.0551     0.012642    0.698313    0.0140933
thread0::softmax_grad           6248    84.509      0.012117    0.699262    0.0135258
thread0::fetch                  18744   76.1771     0.001746    0.098168    0.00406408
thread0::elementwise_div        6248    69.8883     0.009183    1.36969     0.0111857
thread0::mean_grad              6248    62.9747     0.008981    0.056408    0.0100792
thread0::mean                   6248    45.4185     0.006505    0.045735    0.00726928
thread0::fill_constant          6248    38.8432     0.005422    0.637213    0.0062169
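Conclusion: lookup_table_grad takes a disproportionately long time. Over one pass it is the most expensive op, with 85039.4 ms in total and 13.6107 ms per call on average, about 85 times the 0.159477 ms average of the forward lookup_table op. In the 500-batch run it is second only to sgd (4775.31 ms vs. 4830.48 ms), even though sgd is called seven times as often.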
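The share of each op in the overall run time can be estimated directly from the report rows. The following is a rough sketch, assuming that the "Total time" line is reported in seconds (the log does not state its unit) and using a hypothetical helper `parse_row`; the rows are copied from the 1-pass report.

```python
from typing import Dict, List, NamedTuple


class Row(NamedTuple):
    event: str
    calls: int
    total_ms: float
    min_ms: float
    max_ms: float
    ave_ms: float


def parse_row(line: str) -> Row:
    """Parse one whitespace-separated report row, dropping the 'thread0::' prefix."""
    name, calls, total, lo, hi, ave = line.split()
    return Row(name.split("::", 1)[-1], int(calls),
               float(total), float(lo), float(hi), float(ave))


def shares(rows: List[Row], wall_ms: float) -> Dict[str, float]:
    """Percentage of the wall-clock time spent in each op."""
    return {r.event: 100.0 * r.total_ms / wall_ms for r in rows}


# Top rows of the 1-pass report, plus the forward lookup_table for comparison.
ONE_PASS_ROWS = """\
thread0::lookup_table_grad 6248 85039.4 12.9076 32.3335 13.6107
thread0::sgd 43736 68442.7 0.009991 35.3864 1.5649
thread0::sequence_conv_grad 6248 24102.5 0.776669 45.2805 3.85763
thread0::sequence_conv 6248 16146 0.47397 29.5967 2.58418
thread0::lookup_table 6248 996.414 0.061383 2.69139 0.159477
"""

# Assumption: "Total time: 246.533985" is in seconds.
ONE_PASS_WALL_MS = 246.533985 * 1000.0

rows = [parse_row(line) for line in ONE_PASS_ROWS.splitlines()]
pct = shares(rows, ONE_PASS_WALL_MS)

if __name__ == "__main__":
    for event, p in sorted(pct.items(), key=lambda kv: -kv[1]):
        print(f"{event:<24}{p:6.2f}%")
```

Under that unit assumption, lookup_table_grad alone accounts for roughly a third of the 1-pass run time, which is consistent with the conclusion above.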