Forked from PaddlePaddle / Paddle
* add kernel profiler
* add GPU timer tool
* remove warmup
* fix ROCm compilation error