Created by: zhhsplendid
The training script used a deprecated memory optimization API, so some newer memory optimization strategies were not being applied. With the new API, GPU usage for the SE-Resnext model can match that of PyTorch. This change switches the script to the new API.