Created by: Xreki
All inference examples can be profiled with this PR.
The profiling result of recognize_digits_conv on my server:
$ ./paddle/fluid/inference/tests/book/test_inference_recognize_digits_conv --dirname=/home/liuyiqun01/PaddlePaddle/Paddle/python/paddle/fluid/tests/book/recognize_digits_conv.inference.model --repeat=100
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from inference
[ RUN ] inference.recognize_digits
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0307 08:28:55.945766 22596 test_inference_recognize_digits.cc:29] FLAGS_dirname: /home/liuyiqun01/PaddlePaddle/Paddle/python/paddle/fluid/tests/book/recognize_digits_conv.inference.model
I0307 08:28:58.983011 22596 test_inference_recognize_digits.cc:51] --- CPU Runs: is_combined=0 ---
-------------------------> Profiling Report <-------------------------
Place: CPU
Time unit: ms
Sorted by event first end time in descending order in the same thread
Event                     Calls       Total       Min.        Max.        Ave.
thread0::init_program     1           1.0292      1.0292      1.0292      1.0292
thread0::conv2d           200         25.4355     0.068962    0.538576    0.127177
thread0::elementwise_add  300         3.86118     0.00526     0.030406    0.0128706
thread0::relu             200         1.48954     0.005296    0.024998    0.0074477
thread0::pool2d           200         4.79081     0.014388    0.059552    0.0239541
thread0::batch_norm       100         1.59581     0.014327    0.03309     0.0159581
thread0::mul              100         1.59436     0.01552     0.01875     0.0159436
thread0::softmax          100         0.925062    0.008602    0.035497    0.00925062
thread0::run_inference    100         72.8846     0.654562    1.08013     0.728846
I0307 08:28:59.059149 22596 test_inference_recognize_digits.cc:54] 1, 10
I0307 08:28:59.059178 22596 test_inference_recognize_digits.cc:62] --- GPU Runs: is_combined=0 ---
-------------------------> Profiling Report <-------------------------
Place: CUDA
Time unit: ms
Sorted by event first end time in descending order in the same thread
Event                     Calls       Total       Min.        Max.        Ave.
thread0::init_program     1           10.8645     10.8645     10.8645     10.8645
thread0::conv2d           200         20.242      0.071392    0.1608      0.10121
thread0::elementwise_add  300         7.44586     0.007392    0.067872    0.0248195
thread0::relu             200         3.20122     0.005824    0.034112    0.0160061
thread0::pool2d           200         8.21853     0.037696    0.059424    0.0410926
thread0::batch_norm       100         6.44045     0.060096    0.095584    0.0644045
thread0::mul              100         3.73216     0.034784    0.061344    0.0373216
thread0::softmax          100         5.14694     0.047328    0.076448    0.0514694
thread0::run_inference    100         100.706     0.955872    1.29904     1.00706
I0307 08:28:59.189551 22596 test_inference_recognize_digits.cc:65] 1, 10
I0307 08:28:59.189587 22596 test_inference_recognize_digits.cc:51] --- CPU Runs: is_combined=1 ---
-------------------------> Profiling Report <-------------------------
Place: CPU
Time unit: ms
Sorted by event first end time in descending order in the same thread
Event                     Calls       Total       Min.        Max.        Ave.
thread0::init_program     1           0.315289    0.315289    0.315289    0.315289
thread0::conv2d           200         23.926      0.068842    0.313521    0.11963
thread0::elementwise_add  300         3.85191     0.005079    0.027506    0.0128397
thread0::relu             200         1.41673     0.005313    0.012125    0.00708363
thread0::pool2d           200         4.66253     0.014217    0.038472    0.0233126
thread0::batch_norm       100         1.55707     0.01443     0.022271    0.0155707
thread0::mul              100         1.60407     0.01551     0.037741    0.0160407
thread0::softmax          100         0.902546    0.008676    0.01315     0.00902546
thread0::run_inference    100         70.3325     0.679056    0.863227    0.703325
I0307 08:28:59.262215 22596 test_inference_recognize_digits.cc:54] 1, 10
I0307 08:28:59.262234 22596 test_inference_recognize_digits.cc:62] --- GPU Runs: is_combined=1 ---
-------------------------> Profiling Report <-------------------------
Place: CUDA
Time unit: ms
Sorted by event first end time in descending order in the same thread
Event                     Calls       Total       Min.        Max.        Ave.
thread0::init_program     1           0.65248     0.65248     0.65248     0.65248
thread0::conv2d           200         20.3624     0.070976    0.177568    0.101812
thread0::elementwise_add  300         7.47549     0.00736     0.047872    0.0249183
thread0::relu             200         3.10365     0.005856    0.02992     0.0155182
thread0::pool2d           200         8.17149     0.037984    0.061472    0.0408574
thread0::batch_norm       100         6.37984     0.061152    0.07664     0.0637984
thread0::mul              100         3.70675     0.03504     0.039968    0.0370675
thread0::softmax          100         4.96397     0.045664    0.05664     0.0496397
thread0::run_inference    100         99.6111     0.951648    1.08646     0.996111
I0307 08:28:59.380663 22596 test_inference_recognize_digits.cc:65] 1, 10
[ OK ] inference.recognize_digits (3435 ms)
[----------] 1 test from inference (3435 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (3435 ms total)
[ PASSED ] 1 test.
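Each profiling report above lists, per event, the number of calls and the total, minimum, maximum and average time in ms. The following is a minimal post-processing sketch, assuming the whitespace-separated row layout shown in the log. `Event`, `parse_report` and `CPU_REPORT` are hypothetical names for this example, not Paddle APIs. It checks that "Ave." equals Total / Calls and prints each operator's share of the `run_inference` total for the first CPU report (is_combined=0).

```python
from typing import Dict, NamedTuple


class Event(NamedTuple):
    """One row of a profiling report; all times are in ms."""
    calls: int
    total_ms: float
    min_ms: float
    max_ms: float
    ave_ms: float


def parse_report(text: str) -> Dict[str, Event]:
    """Parse 'threadN::name calls total min max ave' rows (hypothetical helper).

    Assumption: event rows have exactly 6 whitespace-separated fields and the
    first one contains '::'. Headers and glog lines are skipped.
    """
    events: Dict[str, Event] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6 or "::" not in fields[0]:
            continue
        name = fields[0].split("::", 1)[1]  # drop the "thread0::" prefix
        total, lo, hi, ave = (float(f) for f in fields[2:])
        events[name] = Event(int(fields[1]), total, lo, hi, ave)
    return events


# Rows copied from the first CPU report in the log (is_combined=0, --repeat=100).
CPU_REPORT = """
Event Calls Total Min. Max. Ave.
thread0::init_program 1 1.0292 1.0292 1.0292 1.0292
thread0::conv2d 200 25.4355 0.068962 0.538576 0.127177
thread0::elementwise_add 300 3.86118 0.00526 0.030406 0.0128706
thread0::relu 200 1.48954 0.005296 0.024998 0.0074477
thread0::pool2d 200 4.79081 0.014388 0.059552 0.0239541
thread0::batch_norm 100 1.59581 0.014327 0.03309 0.0159581
thread0::mul 100 1.59436 0.01552 0.01875 0.0159436
thread0::softmax 100 0.925062 0.008602 0.035497 0.00925062
thread0::run_inference 100 72.8846 0.654562 1.08013 0.728846
"""

events = parse_report(CPU_REPORT)
whole = events["run_inference"].total_ms
for name, ev in events.items():
    # "Ave." is Total / Calls, up to the 6 significant digits the report prints.
    assert abs(ev.total_ms / ev.calls - ev.ave_ms) < 1e-5 * max(1.0, ev.ave_ms)
    if name not in ("init_program", "run_inference"):
        # Share of the whole run_inference time spent inside this operator.
        print(f"{name:16s} {100 * ev.total_ms / whole:5.1f}% of run_inference")
```

The same function applies unchanged to the CUDA and is_combined=1 reports. In the first CPU run, the listed operators add up to roughly 39.7 ms of the 72.9 ms recorded for `run_inference`, so the rest of that time is spent outside the listed operator events.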