Created by: sneaxiy
The following unit tests consume a lot of GPU memory and may cause other unit tests to fail randomly in the Py35 CI. This PR fixes the problem by running them serially:
- test_parallel_executor_seresnext_base_gpu: about 8 GB of GPU memory
- test_parallel_executor_seresnext_with_reduce_gpu: about 4 GB of GPU memory
- test_parallel_executor_seresnext_with_fuse_all_reduce_gpu: about 7 GB of GPU memory