【paddle.fleet】refine launch and distributed repr string for print !27093

!27093 · Merged Sep 06, 2020 · created by saxon_zh@saxon_zh

Created by: guru4elephant

PR types

Function optimization

PR changes

Others

Describe

In Paddle 2.0, paddle.distributed.fleet.DistributedStrategy can be serialized into protobuf, but the serialized form is still not pretty enough for printing. This PR optimizes the log format of the fleetrun launch utility and of DistributedStrategy's repr string. Log samples are listed below.

fleetrun log sample:

    +=======================================================================================+
    |                        Distributed Envs                      Value                    |
    +---------------------------------------------------------------------------------------+
    |                 PADDLE_CURRENT_ENDPOINT                 127.0.0.1:25538               |
    |                     PADDLE_TRAINERS_NUM                        4                      |
    |                     FLAGS_selected_gpus                        4                      |
    |                PADDLE_TRAINER_ENDPOINTS  ... 0.1:49874,127.0.0.1:15274,127.0.0.1:58230|
    |                       PADDLE_TRAINER_ID                        0                      |
    +=======================================================================================+
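
The variables above are exported by fleetrun into each trainer's environment; a launch such as fleetrun --gpus=0,1,2,3 train.py (the script name and GPU list here are placeholders) would start four trainers and print a table like this one. A minimal sketch of how a training script could read these variables:

    import os

    # Each process started by fleetrun sees the variables from the table above.
    trainer_id = int(os.environ["PADDLE_TRAINER_ID"])      # e.g. 0
    num_trainers = int(os.environ["PADDLE_TRAINERS_NUM"])  # e.g. 4
    endpoint = os.environ["PADDLE_CURRENT_ENDPOINT"]       # e.g. 127.0.0.1:25538
    print(f"trainer {trainer_id} of {num_trainers} at {endpoint}")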

DistributedStrategy log sample:

    +==============================================================================+                        
    |                                                                              |
    |                         DistributedStrategy Overview                         |
    |                                                                              |
    +==============================================================================+
    |                     amp = True, please check amp_configs                     |
    +------------------------------------------------------------------------------+
    |                     init_loss_scaling                 32768.0                |
    |                    incr_every_n_steps                   1000                 |
    |               decr_every_n_nan_or_inf                    2                   |
    |                            incr_ratio                   2.0                  |
    |                            decr_ratio              0.800000011921            |
    |              use_dynamic_loss_scaling                   True                 |
    +==============================================================================+
    |               recompute = True, please check recompute_configs               |
    +------------------------------------------------------------------------------+
    |                           checkpoints              pool2d_0.tmp_0            |
    |                                               res2a.add.output.5.tmp_1       |
    |                                               res2b.add.output.5.tmp_1       |
    |                                               res2c.add.output.5.tmp_1       |
    |                                               res3a.add.output.5.tmp_1       |
    |                                               res3b.add.output.5.tmp_1       |
    |                                               res3c.add.output.5.tmp_1       |
    |                                               res3d.add.output.5.tmp_1       |
    |                                               res4a.add.output.5.tmp_1       |
    |                                               res4b.add.output.5.tmp_1       |
    |                                               res4c.add.output.5.tmp_1       |
    |                                               res4d.add.output.5.tmp_1       |
    |                                               res4e.add.output.5.tmp_1       |
    |                                               res4f.add.output.5.tmp_1       |
    |                                               res5a.add.output.5.tmp_1       |
    |                                               res5b.add.output.5.tmp_1       |
    |                                               res5c.add.output.5.tmp_1       |
    |                                                    pool2d_1.tmp_0            |
    |                                                      fc_0.tmp_1              |
    +==============================================================================+
    |                  a_sync = True, please check a_sync_configs                  |
    +------------------------------------------------------------------------------+
    |                               k_steps                    -1                  |
    |                     max_merge_var_num                    1                   |
    |                       send_queue_size                    16                  |
    |               independent_recv_thread                  False                 |
    |         min_send_grad_num_before_recv                    1                   |
    |                      thread_pool_size                    1                   |
    |                       send_wait_times                    1                   |
    |               runtime_split_send_recv                  False                 |
    +==============================================================================+
    |                    Environment Flags, Communication Flags                    |
    +------------------------------------------------------------------------------+
    |                                  mode                    1                   |
    |                               elastic                  False                 |
    |                                  auto                  False                 |
    |                   sync_nccl_allreduce                   True                 |
    |                         nccl_comm_num                    1                   |
    |            use_hierarchical_allreduce                  False                 |
    |   hierarchical_allreduce_inter_nranks                    1                   |
    |                       sync_batch_norm                  False                 |
    |                   fuse_all_reduce_ops                   True                 |
    |                  fuse_grad_size_in_MB                    32                  |
    |              fuse_grad_size_in_TFLOPS                   50.0                 |
    |               cudnn_exhaustive_search                   True                 |
    |             conv_workspace_size_limit                   4000                 |
    |    cudnn_batchnorm_spatial_persistent                   True                 |
    +==============================================================================+
    |                                Build Strategy                                |
    +------------------------------------------------------------------------------+
    |           enable_sequential_execution                  False                 |
    |              fuse_elewise_add_act_ops                  False                 |
    |                       fuse_bn_act_ops                  False                 |
    |              fuse_relu_depthwise_conv                  False                 |
    |                    fuse_broadcast_ops                  False                 |
    |                fuse_all_optimizer_ops                  False                 |
    |                        enable_inplace                  False                 |
    |     enable_backward_optimizer_op_deps                   True                 |
    |                 cache_runtime_context                  False                 |
    +==============================================================================+
    |                              Execution Strategy                              |
    +------------------------------------------------------------------------------+
    |                           num_threads                    1                   |
    |          num_iteration_per_drop_scope                    10                  |
    |                 num_iteration_per_run                    1                   |
    |                    use_thread_barrier                  False                 |
    +==============================================================================+
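
For reference, here is a minimal sketch (assuming the Paddle 2.0 fleet API) of a configuration that would produce an overview like the one above; the checkpoint names and config values are illustrative:

    import paddle.distributed.fleet as fleet

    strategy = fleet.DistributedStrategy()

    # Enabling a feature adds its section to the printed overview,
    # together with the values from the corresponding *_configs dict.
    strategy.amp = True
    strategy.amp_configs = {"init_loss_scaling": 32768.0}
    strategy.recompute = True
    strategy.recompute_configs = {"checkpoints": ["pool2d_0.tmp_0", "fc_0.tmp_1"]}

    # After this PR, printing the strategy renders the formatted tables above.
    print(strategy)
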
Reference: paddlepaddle/Paddle!27093
Source branch: github/fork/guru4elephant/refine_log_format