Performance Improvement Suggestions on ConvNets (#6321) · Issue · PaddlePaddle / Paddle

Performance Improvement Suggestions on ConvNets

Created by: tonyyang-svail

Some general feedback from Nvidia on profiling the fluid ConvNet (https://github.com/PaddlePaddle/Paddle/issues/6179):

cuDNN convolution is not used (I am not sure whether this is intended). https://github.com/PaddlePaddle/Paddle/issues/6089
For profiling, we normally exclude the first minibatch, or the first several minibatches, from the benchmark result, because they are slow due to memory allocation and algorithm tuning. Doing the same here makes it easier to compare our results with other frameworks and see how well we are doing.
Data pipeline: part of it does not run in parallel with the GPU. It is also slow, and it becomes the bottleneck once GPU performance is reasonable.
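To illustrate the data pipeline point, here is a minimal sketch of overlapping batch loading with compute by using a background thread and a bounded queue. This is not Paddle's reader API; `prefetch`, `batches`, and `buffer_size` are hypothetical names, and the sketch assumes the loading work releases the GIL (file I/O, numpy calls) so that it can actually overlap with the training step.

```python
import queue
import threading


def prefetch(batches, buffer_size=4):
    """Yield items from `batches`, which are produced by a background thread.

    `batches` is any iterable of already-decoded minibatches (hypothetical).
    """
    q = queue.Queue(maxsize=buffer_size)  # bounded, so the loader cannot run far ahead
    done = object()  # sentinel marking the end of the stream

    def worker():
        try:
            for batch in batches:
                q.put(batch)  # blocks while the buffer is full
        finally:
            q.put(done)  # always signal completion, even if loading fails

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is done:
            return
        yield item


# Usage: the loop body (the training step) runs while the next batches load.
for batch in prefetch(iter(range(5)), buffer_size=2):
    pass  # the training step on `batch` would go here
```

The sketch does not handle a consumer that stops early (the daemon thread would stay blocked on `q.put`), which is acceptable for a benchmark script but not for production code.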

After changing the three things above (using cuDNN, using fake numpy data, and measuring speed only over minibatches 10-50), the Titan Xp performance increased from 53 img/sec to ~108 img/sec.
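The measurement approach (skip the warm-up minibatches, then time the rest) can be sketched roughly as follows. This is not the actual benchmark script; `measure_throughput` and `run_batch` are hypothetical names, and it assumes `run_batch` blocks until the GPU work for that minibatch has finished, since otherwise the timer only measures kernel launch overhead.

```python
import time


def measure_throughput(run_batch, batch_size, num_batches=50, warmup=10,
                       clock=time.perf_counter):
    """Return images/sec over batches [warmup, num_batches).

    The first `warmup` batches still run, but they are not timed, because
    they include one-off costs such as allocation and algorithm tuning.
    """
    if not 0 <= warmup < num_batches:
        raise ValueError("need 0 <= warmup < num_batches")
    start = None
    for i in range(num_batches):
        if i == warmup:
            start = clock()  # start timing only after the warm-up batches
        run_batch(i)  # hypothetical: one training step on fake data
    elapsed = clock() - start
    return (num_batches - warmup) * batch_size / elapsed


# Usage with a dummy step that just sleeps for 1 ms per minibatch.
rate = measure_throughput(lambda i: time.sleep(0.001), batch_size=32)
print(f"{rate:.0f} img/sec")
```

The `clock` parameter is only there so that the function can be checked with a fake timer; in a real run the default is fine.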

Also, another bug was caught at https://github.com/PaddlePaddle/Paddle/issues/6320.

After changing all four things above, we got a ~40% speedup over the ~108 img/sec figure, reaching 150 img/sec on my Titan.
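As a quick check on how these figures relate, the speedups can be computed from the reported throughputs. Note that 108 img/sec is itself an approximate figure, so the ratios are approximate too.

```python
def speedup(before, after):
    """Ratio of the new throughput to the old throughput."""
    return after / before


# Throughputs in img/sec, taken from the numbers reported in this issue.
baseline, after_three, after_four = 53.0, 108.0, 150.0

print(f"first three changes: {speedup(baseline, after_three):.2f}x")   # 2.04x
print(f"fourth change:       {speedup(after_three, after_four):.2f}x")  # 1.39x
print(f"overall:             {speedup(baseline, after_four):.2f}x")     # 2.83x
```

So the ~40% figure is relative to the ~108 img/sec result, and the overall gain over the original 53 img/sec is roughly 2.8x.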
