Created by: wangxicoding
Cherry-pick from https://github.com/PaddlePaddle/Paddle/pull/22914
- The build strategy fuse_all_reduce_ops requires that gradients not be sparse, so disable fusion when DGC is enabled.
- Move the program._enable_dgc strategy handling from parallel_executor.py to compiler.py, since compiler.py is the final entry point.
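The rule in the first bullet can be sketched in plain Python. This is a minimal illustration under stated assumptions. `BuildStrategyStub` and `resolve_build_strategy` are hypothetical names and are not Paddle's API. The real check is the one this PR places in compiler.py. The idea is that DGC sends sparsified gradients, so fused all-reduce has to be switched off whenever DGC is on:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BuildStrategyStub:
    # Hypothetical stand-in for a build strategy; only the flag relevant here.
    fuse_all_reduce_ops: bool = True


def resolve_build_strategy(strategy: BuildStrategyStub, enable_dgc: bool) -> BuildStrategyStub:
    """Return a strategy with all-reduce fusion disabled when DGC is enabled.

    Hypothetical helper: fused all-reduce expects dense gradients, while DGC
    produces sparse ones, so the two cannot be combined.
    """
    if enable_dgc and strategy.fuse_all_reduce_ops:
        # Return a modified copy instead of mutating the caller's strategy.
        return replace(strategy, fuse_all_reduce_ops=False)
    return strategy
```

Placing this check at the last compile step means every execution path sees it. That is the reasoning the second bullet gives for moving it into compiler.py.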