Created by: guoshengCS
Support pyreader and data feeding in Transformer. Also fix the profiling.
Place: All
Time unit: ms
Sorted by total time in descending order in the same thread
Event Calls Total Min. Max. Ave. Ratio.
thread4::mul_grad 486 366.476 0.443392 1.60563 0.754066 0.0570016
thread4::matmul_grad 199 248.72 0.135168 79.6283 1.24985 0.0386859
thread4::matmul 199 211.183 0.070656 41.4771 1.06122 0.0328473
thread4::mul 489 174.483 0.210944 0.842752 0.356817 0.0271391
thread4::dropout 333 142.126 0.08192 1.14278 0.426805 0.0221062
thread4::elementwise_add_grad 376 63.3958 0.01536 0.709632 0.168606 0.00986057
thread4::scale 2138 39.7292 0.01024 0.1024 0.0185824 0.00617946
thread4::layer_norm_grad 158 39.3718 0.2304 0.366592 0.249188 0.00612387
thread4::sum 199 34.7832 0.098304 1.08646 0.17479 0.00541017
thread4::reshape 399 29.8783 0.037888 1.50016 0.0748829 0.00464726
thread4::softmax_with_cross_entropy 5 28.7642 5.71392 5.79789 5.75283 0.00447397
thread4::adam 952 27.0336 0.011264 0.654336 0.0283966 0.0042048
thread4::reshape_grad 385 25.7628 0.036864 1.5145 0.0669164 0.00400714
thread4::elementwise_add 411 22.7492 0.019456 0.10752 0.0553508 0.0035384
thread4::dropout_grad 330 20.5896 0.02048 0.145408 0.0623926 0.00320249
thread4::softmax_with_cross_entropy_grad 9 19.5502 2.11866 2.20058 2.17225 0.00304083
thread4::transpose 350 18.1371 0.04096 0.077824 0.0518203 0.00282104
thread4::transpose_grad 347 17.2237 0.04096 0.06144 0.049636 0.00267897
thread4::fill_zeros_like 639 15.4419 0.01024 0.062464 0.0241658 0.00240183
thread4::softmax 71 14.6258 0.181248 0.218112 0.205997 0.00227489
thread4::layer_norm 184 10.4694 0.044032 0.120832 0.0568988 0.0016284
thread4::softmax_grad 86 9.96454 0.1024 0.175104 0.115867 0.00154988
thread4::relu_grad 54 8.30566 0.141312 0.162816 0.153809 0.00129186
thread4::one_hot 7 6.24845 0.86528 0.920576 0.892635 0.000971882
thread4::relu 49 6.18701 0.118784 0.134144 0.126265 0.000962325
thread4::label_smooth 2 4.46874 2.22515 2.24358 2.23437 0.000695066
thread4::lookup_table_grad 17 3.30752 0.156672 0.263168 0.19456 0.000514451
thread4::lookup_table 24 2.73818 0.07168 0.132096 0.114091 0.000425895
thread4::fill_constant 23 1.08442 0.011264 0.108544 0.0471485 0.00016867
thread4::elementwise_mul 22 0.612352 0.011264 0.111616 0.0278342 9.5245e-05
thread4::elementwise_pow 4 0.51712 0.111616 0.151552 0.12928 8.04327e-05
thread4::elementwise_div_grad 8 0.140288 0.016384 0.018432 0.017536 2.18203e-05
thread4::elementwise_min 4 0.12288 0.011264 0.060416 0.03072 1.91127e-05
thread4::increment 5 0.118784 0.004096 0.034816 0.0237568 1.84756e-05
thread4::reduce_sum_grad 3 0.100352 0.03072 0.037888 0.0334507 1.56087e-05
thread4::reduce_sum 6 0.093184 0.013312 0.018432 0.0155307 1.44938e-05
thread4::elementwise_mul_grad 4 0.072704 0.016384 0.02048 0.018176 1.13084e-05
thread4::elementwise_div 4 0.067584 0.016384 0.017408 0.016896 1.0512e-05
thread4::cast 1 0.003072 0.003072 0.003072 0.003072 4.77818e-07
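
The columns of the report above are related by simple arithmetic: Ave. is Total divided by Calls, and Ratio. appears to be Total divided by the overall profiled time. A minimal sketch for checking this on a single row follows. It assumes rows are whitespace-separated with exactly seven fields, as shown above; the names `ProfileRow`, `parse_row`, and `implied_overall_ms` are hypothetical helpers for illustration and are not part of the profiler.

```python
from typing import NamedTuple


class ProfileRow(NamedTuple):
    """One row of the report (hypothetical container, not a profiler type)."""
    event: str
    calls: int
    total_ms: float
    min_ms: float
    max_ms: float
    ave_ms: float
    ratio: float


def parse_row(line: str) -> ProfileRow:
    """Parse one whitespace-separated report row into a ProfileRow.

    Assumes the layout: Event Calls Total Min. Max. Ave. Ratio.
    """
    parts = line.split()
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}: {line!r}")
    event, calls = parts[0], int(parts[1])
    total, lo, hi, ave, ratio = (float(p) for p in parts[2:])
    return ProfileRow(event, calls, total, lo, hi, ave, ratio)


def implied_overall_ms(row: ProfileRow) -> float:
    """Overall profiled time implied by a row, assuming Ratio = Total / overall."""
    return row.total_ms / row.ratio


if __name__ == "__main__":
    row = parse_row(
        "thread4::mul_grad 486 366.476 0.443392 1.60563 0.754066 0.0570016"
    )
    # Ave. should match Total / Calls up to the report's rounding.
    print(row.total_ms / row.calls, row.ave_ms)
    # Rows from the same report should imply roughly the same overall time.
    print(implied_overall_ms(row))
```

Under that assumption, the listed operators account for only part of the overall profiled time, since their Ratio. values sum to well below 1.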