Created by: guru4elephant
PR types: Function optimization
PR changes: Others

Describe:
In Paddle 2.0, paddle.distributed.fleet.DistributedStrategy can already be serialized into protobuf, but the serialized form is not pretty-printed, which makes it hard to read in logs. This PR optimizes the log format of DistributedStrategy. Log samples are shown below.
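For context, a minimal sketch of how such a strategy might be configured and printed (the attributes and config keys below are the public DistributedStrategy API; that the new tables are emitted via the strategy's string representation, i.e. `print(strategy)`, is an assumption):

```python
import paddle.distributed.fleet as fleet

strategy = fleet.DistributedStrategy()

# enable AMP and set a couple of the options shown in the sample below
strategy.amp = True
strategy.amp_configs = {
    "init_loss_scaling": 32768.0,
    "incr_every_n_steps": 1000,
}
strategy.recompute = True

# DistributedStrategy serializes to protobuf; with this PR the
# human-readable form is the pretty-printed tables shown below
# (assuming the formatting is wired into __str__/__repr__)
print(strategy)
```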
fleetrun log sample:
```
+=======================================================================================+
| Distributed Envs Value |
+---------------------------------------------------------------------------------------+
| PADDLE_CURRENT_ENDPOINT 127.0.0.1:25538 |
| PADDLE_TRAINERS_NUM 4 |
| FLAGS_selected_gpus 4 |
| PADDLE_TRAINER_ENDPOINTS ... 0.1:49874,127.0.0.1:15274,127.0.0.1:58230 |
| PADDLE_TRAINER_ID 0 |
+=======================================================================================+
```
DistributedStrategy log sample:
```
+==============================================================================+
| |
| DistributedStrategy Overview |
| |
+==============================================================================+
| amp = True, please check amp_configs |
+------------------------------------------------------------------------------+
| init_loss_scaling 32768.0 |
| incr_every_n_steps 1000 |
| decr_every_n_nan_or_inf 2 |
| incr_ratio 2.0 |
| decr_ratio 0.800000011921 |
| use_dynamic_loss_scaling True |
+==============================================================================+
| recompute = True, please check recompute_configs |
+------------------------------------------------------------------------------+
| checkpoints pool2d_0.tmp_0 |
| res2a.add.output.5.tmp_1 |
| res2b.add.output.5.tmp_1 |
| res2c.add.output.5.tmp_1 |
| res3a.add.output.5.tmp_1 |
| res3b.add.output.5.tmp_1 |
| res3c.add.output.5.tmp_1 |
| res3d.add.output.5.tmp_1 |
| res4a.add.output.5.tmp_1 |
| res4b.add.output.5.tmp_1 |
| res4c.add.output.5.tmp_1 |
| res4d.add.output.5.tmp_1 |
| res4e.add.output.5.tmp_1 |
| res4f.add.output.5.tmp_1 |
| res5a.add.output.5.tmp_1 |
| res5b.add.output.5.tmp_1 |
| res5c.add.output.5.tmp_1 |
| pool2d_1.tmp_0 |
| fc_0.tmp_1 |
+==============================================================================+
| a_sync = True, please check a_sync_configs |
+------------------------------------------------------------------------------+
| k_steps -1 |
| max_merge_var_num 1 |
| send_queue_size 16 |
| independent_recv_thread False |
| min_send_grad_num_before_recv 1 |
| thread_pool_size 1 |
| send_wait_times 1 |
| runtime_split_send_recv False |
+==============================================================================+
| Environment Flags, Communication Flags |
+------------------------------------------------------------------------------+
| mode 1 |
| elastic False |
| auto False |
| sync_nccl_allreduce True |
| nccl_comm_num 1 |
| use_hierarchical_allreduce False |
| hierarchical_allreduce_inter_nranks 1 |
| sync_batch_norm False |
| fuse_all_reduce_ops True |
| fuse_grad_size_in_MB 32 |
| fuse_grad_size_in_TFLOPS 50.0 |
| cudnn_exhaustive_search True |
| conv_workspace_size_limit 4000 |
| cudnn_batchnorm_spatial_persistent True |
+==============================================================================+
| Build Strategy |
+------------------------------------------------------------------------------+
| enable_sequential_execution False |
| fuse_elewise_add_act_ops False |
| fuse_bn_act_ops False |
| fuse_relu_depthwise_conv False |
| fuse_broadcast_ops False |
| fuse_all_optimizer_ops False |
| enable_inplace False |
| enable_backward_optimizer_op_deps True |
| cache_runtime_context False |
+==============================================================================+
| Execution Strategy |
+------------------------------------------------------------------------------+
| num_threads 1 |
| num_iteration_per_drop_scope 10 |
| num_iteration_per_run 1 |
| use_thread_barrier False |
+==============================================================================+
```
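For reference, a sketch of how the recompute section above would be populated (the `checkpoints` key is the real recompute_configs entry; the tensor names come from the user's own network, and the list is abbreviated here):

```python
import paddle.distributed.fleet as fleet

strategy = fleet.DistributedStrategy()
strategy.recompute = True
# checkpoint tensor names as listed in the recompute table above (abbreviated)
strategy.recompute_configs = {
    "checkpoints": ["pool2d_0.tmp_0", "res2a.add.output.5.tmp_1", "fc_0.tmp_1"]
}
```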