Created by: qingqing01
This code has been split into https://github.com/baidu/Paddle/pull/218 and https://github.com/baidu/Paddle/pull/219, so please review those pull requests instead.
- add benchmark
- add ConvProjection to reduce GPU memory usage for GoogLeNet
- increase the workspace size limit in cudnn_conv to speed up training
- use TmpMatrix in cudnn_conv to reduce GPU memory usage