Created by: Xreki
Inspired by the results in #8990 (closed), we know that cloning the program can be time-consuming. However, the optimization in #8990 (closed) does not affect inference.
Profile results of image_classification_resnet (GPU: Tesla K40m, CUDA 8.0, cuDNN v7):
batch_size | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 |
---|---|---|---|---|---|---|---|---|---|
2019/03/09 | 1048.01 | 1081.18 | 1041.19 | 1135.8 | 1222.75 | 1658.94 | 2339.08 | 4478.58 | 8556.82 |
2019/03/14, remove clone | 866.048 | 852.3 | 858.021 | 987.86 | 1115.01 | 1556.01 | 2232.87 | 4375.75 | 8443.42 |
speed up | 1.210106 | 1.268544 | 1.213478 | 1.149758 | 1.096627 | 1.06615 | 1.047567 | 1.0235 | 1.013431 |
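
For reference, the "speed up" row appears to be the first timing row divided by the second. The sketch below recomputes it from the numbers in the table. The list names and the `speedups` helper are illustrative only and are not part of the profiling script, and the timing unit is not stated in the table.

```python
# Timings copied from the table above (the unit is not stated in the source).
batch_sizes = [1, 2, 4, 8, 16, 32, 64, 128, 256]
baseline = [1048.01, 1081.18, 1041.19, 1135.8, 1222.75,
            1658.94, 2339.08, 4478.58, 8556.82]
no_clone = [866.048, 852.3, 858.021, 987.86, 1115.01,
            1556.01, 2232.87, 4375.75, 8443.42]


def speedups(before, after):
    """Return before/after ratios; a value above 1 means `after` is faster."""
    return [b / a for b, a in zip(before, after)]


ratios = speedups(baseline, no_clone)
for bs, r in zip(batch_sizes, ratios):
    print(f"batch_size={bs:<4d} speed up={r:.6f}")
```

The ratio is largest at small batch sizes and approaches 1 as the batch size grows, which is what one would expect if removing the clone saves a roughly fixed amount of time per run.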