We Should Have Our Own Alternative BN Implementation
Created by: KaiyuYue
Hi there,
This issue is tracked by the project Simple Baselines for Human Pose Estimation in Fluid.
When cuDNN's BN function is used during training for Human Pose Estimation, there is a known issue: training fails to converge on Tesla P40 / P100 / V100 GPU cards, in both PyTorch and PaddlePaddle.
PyTorch has its own BN implementation that avoids this problem: setting torch.backends.cudnn.enabled to False disables the cuDNN BN in their code.
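For reference, the PyTorch workaround described above is a one-line change, shown here as a minimal sketch. It assumes the flag is set once, before the model is built and trained.

```python
import torch

# Turn off cuDNN globally so BatchNorm layers fall back to
# PyTorch's native implementation instead of the cuDNN kernel.
torch.backends.cudnn.enabled = False
```

Note that this flag disables cuDNN for every operation, not only BN, so convolutions may also run slower.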
PaddlePaddle doesn't allow us to make this change. Luckily, training with 1 image on each GPU card eases this problem. So I think we should have our own alternative BN implementation.
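To make the request concrete, here is a rough reference sketch of the training-mode BN forward pass that an alternative implementation would need to reproduce. It is plain Python for illustration only, not PaddlePaddle code. The function name batch_norm_forward and the [N][C][L] nested-list layout are assumptions made for this example, and running statistics and the backward pass are left out.

```python
from math import sqrt


def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Hypothetical reference BN forward pass (training mode).

    x: nested list shaped [N][C][L] (batch, channels, flattened spatial).
    gamma, beta: per-channel scale and shift, lists of length C.
    """
    n, c = len(x), len(x[0])
    out = [[None] * c for _ in range(n)]
    for ch in range(c):
        # Gather every value of this channel across the batch and spatial dims.
        values = [v for sample in x for v in sample[ch]]
        mean = sum(values) / len(values)
        # Biased variance, as used for normalization during training.
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / sqrt(var + eps)
        for i in range(n):
            out[i][ch] = [gamma[ch] * (v - mean) * inv_std + beta[ch]
                          for v in x[i][ch]]
    return out


# Two samples, one channel, two spatial positions each.
x = [[[1.0, 2.0]], [[3.0, 4.0]]]
y = batch_norm_forward(x, gamma=[1.0], beta=[0.0])
```

With a batch of 1 image per GPU card, the statistics for each channel are computed over the spatial positions of that single image only.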