Why Larger Batch Size Slows Training
Created by: sbl1996
I am training WRN-28-10 on CIFAR10 using PaddleClas. Once the batch size exceeds 128, training gets slower as the batch size grows. A detailed comparison is shown below.
| Batch Size | Time per Epoch (s) |
|---|---|
| 32 | 82.2 |
| 64 | 72.8 |
| 128 | 68.5 |
| 256 | 74.1 |
| 512 | 86.4 |
| 1024 | 110.5 |
The time of the 2nd epoch is reported, so warm-up time is not counted. Repeated runs gave consistent results.
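For reference, the per-epoch time is measured roughly like this (a simplified sketch, not the exact repro script: the real run uses WRN-28-10 built by PaddleClas, and the model, optimizer, and `num_workers` below are placeholders):

```python
import time

import paddle
from paddle.io import DataLoader
from paddle.vision.datasets import Cifar10
from paddle.vision.transforms import ToTensor

# Placeholder model; the actual run uses WRN-28-10 from PaddleClas.
model = paddle.vision.models.resnet18(num_classes=10)
loss_fn = paddle.nn.CrossEntropyLoss()
opt = paddle.optimizer.Momentum(learning_rate=0.1, parameters=model.parameters())

train_set = Cifar10(mode="train", transform=ToTensor())

for batch_size in [32, 64, 128, 256, 512, 1024]:
    loader = DataLoader(train_set, batch_size=batch_size,
                        shuffle=True, num_workers=4)
    model.train()
    for epoch in range(2):
        start = time.perf_counter()
        for images, labels in loader:
            loss = loss_fn(model(images), labels)
            loss.backward()
            opt.step()
            opt.clear_grad()
        elapsed = time.perf_counter() - start
    # Only the 2nd epoch is reported, so warm-up is excluded.
    print(f"batch_size={batch_size}: 2nd epoch took {elapsed:.1f}s")
```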
This behavior is strange and unexpected. Could you help me find the reason?
Code to reproduce is here.
Thank you very much!