Created by: jczaja
Hi,
During work on Fluid optimization for CPU, I created helpful code that counts how much time given type of fluid operator takes within analyzed workload. I decided to share this as it can be useful for other developers, and also wanted to get your opinion if it is useful, or perhaps should be improved.
It produce info in a format:
Operator[type of operator]: totalExec: total time operator of specific type took Ratio[totalExec/total_time]: percentage of total time, operator of specific type took% aveExec[ms]: Average time of specific type of operator took Num Execs: Number of times operator of specific type was called
Notes/TODO:
- Counters are accumulated during execution and presented after paddle finish work
- To enable profiler: -DWITH_FLUID_PROFILER=ON # OFF is default
- Works on linux only (I do not know how to read TSC clock Value to convert cycles to ms on MacOS)
- Produce results only if paddle is terminated with signal eg. interruppted training will not show stats.
- No unit tests yet
example:
Operator[mul_grad]: totalExec: 12973 ms Ratio[totalExec/total_time]: 25.787 % aveExec[ms]: 2.30258 Num Execs: 5628
Part of output when running RNN Search model
(https://github.com/dzhwinter/benchmark/blob/master/fluid/machine_translation.py):
........ pass_id=0, batch_id=6, train_loss: 10.306499 pass_id=0, batch_id=7, train_loss: 10.305863 pass_id=0, batch_id=8, train_loss: 10.304803 pass_id=0, batch_id=9, train_loss: 10.304098 Operator[mul_grad]: totalExec: 12604 ms Ratio[totalExec/total_time]: 30.1659 % aveExec[ms]: 2.29946 Num Execs: 5474 Operator[adam]: totalExec: 7769 ms Ratio[totalExec/total_time]: 18.594 % aveExec[ms]: 29.9425 Num Execs: 260 Operator[softmax]: totalExec: 5439 ms Ratio[totalExec/total_time]: 13.0175 % aveExec[ms]: 10.9859 Num Execs: 494 Operator[mul]: totalExec: 5313 ms Ratio[totalExec/total_time]: 12.7159 % aveExec[ms]: 0.962658 Num Execs: 5474 Operator[elementwise_mul_grad]: totalExec: 4912 ms Ratio[totalExec/total_time]: 11.7562 % aveExec[ms]: 2.48743 Num Execs: 1976 Operator[softmax_grad]: totalExec: 1799 ms Ratio[totalExec/total_time]: 4.30565 % aveExec[ms]: 3.64903 Num Execs: 494 ...... Total average execution time: 41782.3ms
I'm looking forward to your feedback. Whether it is useful? Or perhaps it should be moved (implementation) to some other location in code. Based on feedback I can introduce suggested changes.