In-place split to avoid inter-device duplication (#10230)
New benchmark results with in-place split:
>> keras.applications.ResNet50 224x224x3 (NCHW; NVIDIA Tesla P100 x 4)
input_shape = 3x224x224, batch_size = 96 x 4: 392(images/sec) => 417(images/sec)
input_shape = 3x299x299, batch_size = 64 x 4: 229(images/sec) => 244(images/sec)
input_shape = 3x224x224, batch_size = 8 x 4: 148(images/sec) => 163(images/sec)
>> keras.applications.InceptionV3 (NCHW; NVIDIA Tesla P100 x 4)
input_shape = 3x224x224, batch_size = 128 x 4: 488(images/sec) => 526(images/sec)
input_shape = 3x299x299, batch_size = 96 x 4: 270(images/sec) => 294(images/sec)
input_shape = 3x224x224, batch_size = 8 x 4: 146(images/sec) => 158(images/sec)
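The throughput pairs above can be turned into relative gains with a few lines of Python. This is only a reading aid, not part of the change itself: the `RESULTS` table and the `relative_gain` helper are hypothetical names introduced here, and the numbers are copied from the benchmark lines.

```python
# (before, after) throughput in images/sec, copied from the benchmark lines.
# RESULTS and relative_gain are illustrative names, not part of the patch.
RESULTS = {
    "ResNet50    3x224x224, batch 96 x 4":  (392, 417),
    "ResNet50    3x299x299, batch 64 x 4":  (229, 244),
    "ResNet50    3x224x224, batch 8 x 4":   (148, 163),
    "InceptionV3 3x224x224, batch 128 x 4": (488, 526),
    "InceptionV3 3x299x299, batch 96 x 4":  (270, 294),
    "InceptionV3 3x224x224, batch 8 x 4":   (146, 158),
}


def relative_gain(before, after):
    """Return the throughput improvement as a percentage of the baseline."""
    return (after - before) / before * 100.0


for name, (before, after) in RESULTS.items():
    print(f"{name}: {before} => {after} images/sec "
          f"(+{relative_gain(before, after):.1f}%)")
```

By this measure the gains range from roughly 6% to 10% across the six configurations.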
Signed-off-by: NCUI Wei <ghostplant@qq.com>