Created by: chenwhql
The dygraph DataLoader currently slows down training because of some implementation issues. This PR refines the dygraph DataLoader implementation to address them.
Optimization effect (single card)
transformer, batch_size = 256, epoch = 20 (time in seconds)
| Original DataLoader | New DataLoader | Time change |
|---|---|---|
| 1418.013 | 1320.847 | -6.9% |
resnet, batch_size = 256, epoch = 120 (time in seconds)
| Original DataLoader | New DataLoader | Time change |
|---|---|---|
| 12221.662 | 11468.035 | -6.2% |
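
For reference, the last column of the tables above can be reproduced with a small calculation. This is a minimal sketch that assumes the percentage is the relative change `(new - original) / original` and that the timings are total wall-clock seconds. The helper name `relative_change_percent` is hypothetical and is not part of this PR.

```python
def relative_change_percent(original: float, new: float) -> float:
    """Relative change of `new` versus `original`, in percent.

    Assumption: this is how the last column of the tables is derived.
    A negative value means the new DataLoader run took less time.
    """
    return (new - original) / original * 100.0


# Timings copied from the tables above (assumed to be seconds).
timings = {
    "transformer": (1418.013, 1320.847),
    "resnet": (12221.662, 11468.035),
}

for model, (original, new) in timings.items():
    change = relative_change_percent(original, new)
    print(f"{model}: {change:.1f}%")
```

Under that assumption the rounded results match the reported -6.9% and -6.2%.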