Single-GPU Performance Improvement
Created by: tonyyang-svail
Profiler
Use either nvprof or paddle.v2.fluid.profiler.
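The question either profiler answers is "which parts of a step take the most time". The following is a minimal pure-Python sketch of that idea. It does not use nvprof or the fluid profiler API. It adds up wall-clock time per named region and sorts the regions by total time. `SectionProfiler` and `record` are hypothetical names, not Paddle APIs. The sketch measures host time only. CUDA kernels launch asynchronously, so timing a GPU region this way is only meaningful if the region waits for the device before it ends.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class SectionProfiler:
    """Accumulates host wall-clock time per named section (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)  # section name -> total seconds
        self.calls = defaultdict(int)     # section name -> number of times entered

    @contextmanager
    def record(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Recorded even if the body raises, so failed steps are still counted.
            self.totals[name] += time.perf_counter() - start
            self.calls[name] += 1

    def report(self):
        """Return (name, calls, total_seconds) rows, largest total first."""
        rows = [(n, self.calls[n], self.totals[n]) for n in self.totals]
        return sorted(rows, key=lambda r: r[2], reverse=True)


if __name__ == "__main__":
    prof = SectionProfiler()
    for _ in range(3):
        with prof.record("forward"):
            sum(i * i for i in range(100_000))
        with prof.record("backward"):
            sum(i for i in range(10_000))
    for name, calls, total in prof.report():
        print(f"{name:10s} calls={calls} total={total * 1e3:.2f} ms")
```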
Benchmark:
- resnet_cifar10: https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/fluid/tests/book/test_image_classification_train.py
- RNN
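Results for resnet_cifar10 and the RNN benchmark are easier to compare across the steps below if every run uses the same timing protocol. Below is a minimal sketch, assuming the training step can be wrapped in a zero-argument callable. `benchmark` and `summarize` are hypothetical helpers. `batch_size` is whatever the benchmark script uses. For GPU runs, the callable should end with an operation that waits for the device, such as fetching the loss to the host. Otherwise the timer measures only the kernel launches.

```python
import statistics
import time


def benchmark(step, warmup=5, iters=20):
    """Time a training-step callable (hypothetical helper).

    Runs `warmup` untimed calls first so one-off costs (e.g. memory
    allocation or algorithm selection) do not skew the result, then
    returns the wall-clock time of each of the `iters` timed calls.
    """
    if iters < 1:
        raise ValueError("iters must be >= 1")
    for _ in range(warmup):
        step()
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        step()
        times.append(time.perf_counter() - start)
    return times


def summarize(times, batch_size):
    """Median step latency (seconds) and throughput (samples per second)."""
    median = statistics.median(times)
    throughput = batch_size / median if median > 0 else float("inf")
    return {"median_s": median, "samples_per_s": throughput}
```

The median is used instead of the mean so that a few slow iterations have less influence on the result, for example iterations delayed by other processes.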
Steps:
- Update cuDNN. This includes:
  - Updating the NVIDIA container to CUDA 9 and cuDNN 7 (the current versions are CUDA 8 and cuDNN 5)
  - Linking cuDNN into Paddle via cudnn.h
- Search over Paddle operators and use cuDNN directly where possible
- Reduce the number of kernel launches for kernels written with Eigen
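After the container moves to cuDNN 7, the build may compile against one cuDNN header while loading a different shared library at runtime. Checking the version at startup makes that kind of mismatch visible. Below is a minimal sketch. It assumes the pre-cuDNN-9 version encoding from cudnn.h (`CUDNN_VERSION = MAJOR * 1000 + MINOR * 100 + PATCHLEVEL`) and a library file named `libcudnn.so.7`; the file name depends on the container layout. `cudnnGetVersion()` is the real cuDNN entry point. The helper functions are hypothetical.

```python
import ctypes

MIN_CUDNN = (7, 0, 0)  # (major, minor, patch): the cuDNN 7 target from the plan


def decode_cudnn_version(v):
    """Split a cuDNN version integer into (major, minor, patch).

    Assumes the pre-cuDNN-9 encoding: MAJOR * 1000 + MINOR * 100 + PATCHLEVEL.
    """
    return v // 1000, (v % 1000) // 100, v % 100


def runtime_cudnn_version(libname="libcudnn.so.7"):
    """Load the cuDNN shared library and return its version, or None if absent.

    The default file name is an assumption about the container layout.
    """
    try:
        lib = ctypes.CDLL(libname)
    except OSError:
        return None
    lib.cudnnGetVersion.restype = ctypes.c_size_t  # declared as size_t cudnnGetVersion(void)
    return decode_cudnn_version(lib.cudnnGetVersion())


def check_cudnn(version, minimum=MIN_CUDNN):
    """Raise if cuDNN is missing or older than `minimum`; tuples compare field by field."""
    if version is None:
        raise RuntimeError("cuDNN library not found")
    if version < minimum:
        raise RuntimeError(f"cuDNN {version} is older than required {minimum}")
    return version
```

For example, cuDNN 7.0.5 reports `7005`, which decodes to `(7, 0, 5)` and passes the check. cuDNN 5.1.10 reports `5110`, which decodes to `(5, 1, 10)` and is rejected.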
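Searching over the operators for cuDNN opportunities amounts to two things. First, the kernel chooser should prefer a cuDNN kernel whenever one is registered for an op. Second, the ops that still lack a cuDNN kernel should be listed, because they are the ones to review. The sketch below models both with a plain dictionary. The registry, the kernel names, and both functions are hypothetical and do not reflect Paddle's actual kernel registration code. Only `conv2d`, `pool2d`, and `elementwise_add` are real Paddle op types.

```python
# Hypothetical kernel registry: (op_type, library) -> kernel name.
KERNELS = {
    ("conv2d", "cudnn"): "conv2d_cudnn",
    ("conv2d", "plain"): "conv2d_cuda",
    ("pool2d", "cudnn"): "pool2d_cudnn",
    ("pool2d", "plain"): "pool2d_cuda",
    ("elementwise_add", "plain"): "elementwise_add_eigen",
}


def select_kernel(op_type, use_cudnn=True, registry=KERNELS):
    """Prefer the cuDNN kernel when allowed and registered, else fall back."""
    if use_cudnn and (op_type, "cudnn") in registry:
        return registry[(op_type, "cudnn")]
    try:
        return registry[(op_type, "plain")]
    except KeyError:
        raise KeyError(f"no kernel registered for {op_type}") from None


def missing_cudnn(op_types, registry=KERNELS):
    """Distinct ops in a program that have no cuDNN kernel yet, sorted by name."""
    return sorted({op for op in op_types if (op, "cudnn") not in registry})
```

Running `missing_cudnn` over the op types of the benchmark programs yields a short list of ops to check for an equivalent cuDNN routine.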
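The kernel-launch step rests on how Eigen evaluates GPU expressions. Each expression assigned to a GPU device typically costs at least one kernel launch. A computation split into several assignments with temporaries therefore launches more kernels than the same computation written as one expression. The toy model below only counts launches in pure Python, using `y = a * x + b` as the example computation. `Device`, `launch`, and the `axpb_*` functions are hypothetical and do not call Eigen or CUDA.

```python
class Device:
    """Toy stand-in for a GPU that counts 'kernel launches' (hypothetical)."""

    def __init__(self):
        self.launches = 0

    def launch(self, fn, *arrays):
        """Apply fn elementwise over equal-length lists as one 'kernel'."""
        self.launches += 1
        return [fn(*vals) for vals in zip(*arrays)]


def axpb_unfused(dev, a, x, b):
    # Two separate assignments: two launches plus a temporary buffer.
    tmp = dev.launch(lambda ai, xi: ai * xi, a, x)
    return dev.launch(lambda ti, bi: ti + bi, tmp, b)


def axpb_fused(dev, a, x, b):
    # One expression evaluated in a single pass: one launch, no temporary.
    return dev.launch(lambda ai, xi, bi: ai * xi + bi, a, x, b)
```

Both functions return the same values. In this model, the fused version makes half as many launches and never materializes `tmp`.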