Created by: luotao1
Motivated by the need to profile framework overhead, this PR adds two flags:
- --profile_infershape: records the infershape elapsed time of each operator.
- --profile_compute: records the compute elapsed time of each operator.
Examples:
- command:
./paddle/fluid/inference/tests/api/test_analyzer_pyramid_dnn --infer_model=third_party/inference_demo/pyramid_dnn/model/ --infer_data=third_party/inference_demo/pyramid_dnn/data.txt --gtest_filter=Analyzer_Pyramid_DNN.profile --repeat=10000 --zero_copy --warmup --profile --profile_infershape --profile_compute
- result:
Event Calls Total Min. Max. Ave. Ratio.
thread0::hash 120000 794.965 0.005452 0.473659 0.00662471 0.192506
thread0::fused_embedding_seq_pool 160000 691.328 0.003215 0.926386 0.0043208 0.167409
thread0::hash_compute 120000 588.484 0.003927 0.179345 0.00490404 0.142505
thread0::sequence_enumerate 120000 394.58 0.002732 0.838274 0.00328816 0.0955499
thread0::fused_embedding_seq_pool_compute 160000 385.397 0.001561 0.88532 0.00240873 0.0933263
thread0::fc 20000 347.146 0.016292 0.34892 0.0173573 0.0840636
thread0::fc_compute 20000 269.738 0.012725 0.267968 0.0134869 0.0653187
thread0::sequence_enumerate_compute 120000 172.885 0.001212 0.610675 0.00144071 0.0418652
thread0::sum 20000 161.158 0.007017 0.145441 0.0080579 0.0390254
thread0::softsign 20000 91.8232 0.004247 0.054925 0.00459116 0.0222355
thread0::sum_infershape 20000 69.8391 0.002788 0.027515 0.00349195 0.016912
thread0::cos_sim 10000 43.9477 0.003986 0.098801 0.00439477 0.0106422
thread0::sum_compute 20000 43.2604 0.001959 0.139936 0.00216302 0.0104758
thread0::softsign_compute 20000 30.9593 0.001391 0.042987 0.00154797 0.00749699
thread0::cos_sim_compute 10000 23.7063 0.00214 0.096713 0.00237063 0.00574063
thread0::softsign_infershape 20000 20.3502 0.000926 0.021815 0.00101751 0.00492793
In this example, the framework overhead is large. For the sum op, infershape accounts for 69.8/161.2 = 43% of the op's total time, while compute accounts for only 43.3/161.2 = 27%:
thread0::sum 20000 161.158 0.007017 0.145441 0.0080579 0.0390254
thread0::sum_infershape 20000 69.8391 0.002788 0.027515 0.00349195 0.016912
thread0::sum_compute 20000 43.2604 0.001959 0.139936 0.00216302 0.0104758
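The percentages above can be reproduced from the three rows with a short script. This is only a sketch for reading the profiler output, not part of the PR: the helper names `parse_totals` and `breakdown` are hypothetical, and it assumes that the op's Total column includes both the `_infershape` and `_compute` events, so the remainder can be read as other framework overhead.

```python
# Rows copied verbatim from the profiler output above.
# Columns: Event Calls Total Min. Max. Ave. Ratio.
rows = """\
thread0::sum 20000 161.158 0.007017 0.145441 0.0080579 0.0390254
thread0::sum_infershape 20000 69.8391 0.002788 0.027515 0.00349195 0.016912
thread0::sum_compute 20000 43.2604 0.001959 0.139936 0.00216302 0.0104758
"""


def parse_totals(text):
    """Map each event name (without the thread prefix) to its Total column."""
    totals = {}
    for line in text.strip().splitlines():
        fields = line.split()
        name = fields[0].split("::", 1)[1]  # drop the "thread0::" prefix
        totals[name] = float(fields[2])     # third column is Total
    return totals


def breakdown(totals, op):
    """Split an op's total time into infershape, compute, and the remainder.

    Assumption: the op's total covers both sub-events, so
    total - infershape - compute is treated as other overhead.
    """
    total = totals[op]
    infershape = totals[op + "_infershape"]
    compute = totals[op + "_compute"]
    other = total - infershape - compute
    return {
        "infershape": infershape / total,
        "compute": compute / total,
        "other": other / total,
    }


shares = breakdown(parse_totals(rows), "sum")
for part, share in shares.items():
    print(f"sum {part}: {share:.1%}")
```

Under that assumption, roughly 30% of the sum op's time falls outside both infershape and compute.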