Performance improvement suggestions for ConvNets
Created by: tonyyang-svail
Some general feedback from NVIDIA on profiling the Fluid ConvNet (https://github.com/PaddlePaddle/Paddle/issues/6179):
- cuDNN convolution is not used (I am not sure whether this is intended): https://github.com/PaddlePaddle/Paddle/issues/6089
- Benchmarking: when profiling, the first minibatch (or first several minibatches) is normally excluded from benchmark results, because those iterations are slowed by memory allocation and algorithm tuning. Doing the same here would make it easier to compare our results with other frameworks and see how well we are doing.
- Data pipeline: part of it does not run in parallel with the GPU. It is also slow, and it will become the bottleneck once GPU performance is reasonable.
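One common way to overlap data loading with GPU work is to prefetch batches on a background thread into a bounded queue, so the next batch is ready while the current one is being computed. The sketch below is a generic illustration, not Paddle's actual reader API; `prefetch` and its `capacity` parameter are hypothetical names. Threads help mainly when loading is I/O-bound or spends its time in code that releases the GIL; CPU-heavy pure-Python preprocessing may need separate processes instead.

```python
import queue
import threading

# Marker object signalling that the producer has finished (or failed).
_DONE = object()


def prefetch(batch_iter, capacity=4):
    """Yield batches from batch_iter while a background thread loads ahead.

    Hypothetical helper: the producer thread fills a bounded queue, so data
    loading can overlap with the training step running in the consumer loop.
    """
    q = queue.Queue(maxsize=capacity)

    def producer():
        try:
            for batch in batch_iter:
                q.put((batch, None))  # blocks when the queue is full
        except Exception as exc:
            q.put((_DONE, exc))  # forward the error to the consumer
            return
        q.put((_DONE, None))

    # Daemon thread, so an abandoned consumer does not keep the process alive.
    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch, err = q.get()
        if batch is _DONE:
            if err is not None:
                raise err
            return
        yield batch
```

Usage: wrap an existing reader, e.g. `for batch in prefetch(reader()): train_step(batch)`. The bounded queue caps memory use and applies back-pressure when the GPU is the slower side.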
After making the three changes above (using cuDNN, feeding fake NumPy data, and measuring speed only over minibatches 10-50), throughput on a Titan Xp increased from 53 img/sec to ~108 img/sec.
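The measurement method above (fixed fake input, untimed warm-up, timing only minibatches 10-50) can be sketched as a small harness. This is an illustrative assumption, not the script used for these numbers: `measure_throughput` and `step_fn` are hypothetical names, and in practice `batch` would be a fixed NumPy array reused every step so the data pipeline is taken out of the measurement. For GPU training, `step_fn` must wait for the step to finish (for example by fetching the loss value); otherwise only kernel launches are timed.

```python
import time


def measure_throughput(step_fn, batch, batch_size, start=10, end=50):
    """Return images/sec averaged over minibatches [start, end).

    step_fn: hypothetical callable that runs one training step on `batch`
    and returns only after the step has completed.
    The first `start` minibatches are run but not timed, so one-off costs
    such as memory allocation and cuDNN algorithm selection are excluded.
    """
    if not 0 <= start < end:
        raise ValueError("need 0 <= start < end")
    for _ in range(start):  # warm-up iterations, not timed
        step_fn(batch)
    t0 = time.perf_counter()
    for _ in range(start, end):  # timed iterations
        step_fn(batch)
    elapsed = time.perf_counter() - t0
    return (end - start) * batch_size / elapsed
```

Reusing one in-memory batch isolates model compute from data loading, which is what makes the result comparable with other frameworks' benchmark numbers.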
Another bug was also found: https://github.com/PaddlePaddle/Paddle/issues/6320.
After fixing all four issues, we got a further ~40% speedup, to 150 img/sec on my Titan.