Created by: qingqing01
Image processing is a key step for CNNs. Reading an image list directly through PyDataProvider is slow, which lowers the speedup on multiple GPUs, especially with 8 or 16 GPUs.
This PR adds two features:
- A C++ acceleration library that can be called from Python, built on OpenCV and multi-threading
- A Python module built on Python multiprocessing and Python-OpenCV
Either one can be used.
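To illustrate the idea behind the second option, here is a minimal sketch of decoding and resizing images in a pool of worker processes. It is not the module added by this PR: the names `decode_and_resize` and `process_in_parallel`, the `start_method` parameter, and the default of 4 workers are all hypothetical. Only `cv2.imread`, `cv2.resize`, and the standard `multiprocessing` API are real calls.

```python
import multiprocessing as mp


def decode_and_resize(path, size=(224, 224)):
    """Read one image file and resize it; returns None if decoding fails."""
    # Imported lazily, inside the function that runs in the workers.
    import cv2

    img = cv2.imread(path, cv2.IMREAD_COLOR)  # H x W x 3 BGR array, or None
    if img is None:
        return None
    return cv2.resize(img, size)  # size is (width, height)


def process_in_parallel(fn, items, num_workers=4, start_method=None):
    """Apply fn to every item in a pool of worker processes, keeping order."""
    # start_method=None selects the platform's default start method.
    ctx = mp.get_context(start_method)
    with ctx.Pool(processes=num_workers) as pool:
        return pool.map(fn, items)
```

Calling `process_in_parallel(decode_and_resize, paths)` on a list of image paths would return the resized images in the same order as the input. On platforms that spawn worker processes, the call should sit under an `if __name__ == "__main__":` guard.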
The performance on 4 GPUs (K40) with an input size of 224 * 224 * 3 is as follows. Experimental setup:
- 4 Tesla K40m GPUs
- Total batch size: 192
- The times below are for 20 mini-batches
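- Reading the image list directly with PIL: 64.8s
- Packing the data into batches with Pickle, then reading them: 46.8s
- Batched data + Python multiprocessing + Python-OpenCV: 29.7s
- Batched data + C++ multi-threading + OpenCV: 26.7s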
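For easier comparison, the timings above can be turned into relative speedups and an approximate throughput. This is only a rough calculation: it assumes each time covers the full 20 mini-batches and that the total batch size of 192 was the same in all four runs. The dictionary keys are labels made up for this snippet.

```python
# Seconds for 20 mini-batches, copied from the list above.
seconds = {
    "pil_image_list": 64.8,
    "pickle_batches": 46.8,
    "batches_py_multiprocessing_opencv": 29.7,
    "batches_cpp_threads_opencv": 26.7,
}

NUM_MINI_BATCHES = 20
TOTAL_BATCH_SIZE = 192  # assumed to be the same for all four runs
images = NUM_MINI_BATCHES * TOTAL_BATCH_SIZE

# Speedup is measured against reading the image list directly with PIL.
baseline = seconds["pil_image_list"]
speedup = {name: baseline / t for name, t in seconds.items()}
images_per_second = {name: images / t for name, t in seconds.items()}

for name in seconds:
    print(f"{name}: {speedup[name]:.2f}x, {images_per_second[name]:.1f} images/s")
```

Under these assumptions, the C++ multi-threaded path is roughly 2.4 times faster than reading the image list directly with PIL, and about 10% faster than the Python multiprocessing path.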
The next PR will add usage documentation.