* add model parallel support in dygraph
* support hyparallel, add topology
* fix utest
* new group
* ci compatible fix
* assert nccl