PaddlePaddle / Paddle
Integrated Trainer of Parameter Server (API add `fluid.contrib.layers.sparse_embedding` only) !22957

!22957 merged on Mar 11, 2020, created by saxon_zh@saxon_zh
  • Overview 63
  • Commits 368
  • Changes 119

Created by: seiriosPlus

PR types

New features, Function optimization

PR changes

APIs, OPs, Others

Describe

Fully optimized code for parameter server training.

Introduction


API changes:

  1. Add `fluid.contrib.layers.sparse_embedding` for large sparse embeddings.
  2. Other API changes are caused by the code formatter.

OP changes:

  1. recv_save: add attribute is_sparse
  2. send: remove unused attribute send
  3. checkpoint_notify: remove attributes trainer_id/dir/lookup_table/epmap
  4. checkpoint_notify: add attributes is_slice/varname/remote_varnames/endpoints/slice_varnames/dirname
  5. distributed_lookup_table: delete unused attribute height_sections

Note: recv_save, send, checkpoint_notify, and distributed_lookup_table are all private ops used internally for distributed training, so these changes should be transparent to users.

Transpiler

old: huge methods with if/for loops

new: passes

trainer: delete_optimizer_pass -> distributed_ops_pass -> append_send_ops_pass -> fake_init_ops_pass -> init_from_server_pass -> delet_extra_optimizes_pass

pserver: add_listen_and_serv_pass -> add_rpc_global_flags_pass -> add_optimizer_pass -> large_scale_sparse_pass -> build_pserver_startup_program_pass -> large_scale_sparse_pass
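The pass-based design can be pictured as a chain of functions, each taking a program and returning a transformed one. The following is only a minimal sketch under stated assumptions, not the PR's implementation: `Program` is a hypothetical stand-in (a plain list of op names), `make_pass` and `run_passes` are illustrative helpers, and the pass bodies are placeholders that merely record their own name. Only the pass names and their order are taken from the trainer chain above.

```python
from functools import reduce
from typing import Callable, List

# Hypothetical stand-in for a program: an ordered list of op names.
Program = List[str]
Pass = Callable[[Program], Program]


def make_pass(name: str) -> Pass:
    """Build a placeholder pass that appends its own name to the program."""
    def _pass(program: Program) -> Program:
        return program + [name]
    _pass.__name__ = name
    return _pass


# Pass order copied from the trainer chain in the description.
TRAINER_PASSES = [make_pass(n) for n in (
    "delete_optimizer_pass",
    "distributed_ops_pass",
    "append_send_ops_pass",
    "fake_init_ops_pass",
    "init_from_server_pass",
    "delet_extra_optimizes_pass",
)]


def run_passes(program: Program, passes: List[Pass]) -> Program:
    """Apply each pass in order, feeding the output of one into the next."""
    return reduce(lambda prog, p: p(prog), passes, program)


trainer_program = run_passes([], TRAINER_PASSES)
```

Keeping each transformation in its own function means a step can be reordered, skipped, or tested in isolation, which is harder with one large method full of if/for branches.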

Communicator

reimplement:

Communicator -> AsyncCommunicator -> GeoCommunicator

Communicator -> AsyncCommunicator -> HalfAsyncCommunicator -> SyncCommunicator
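Assuming the arrows denote "is extended by", the two chains could be sketched as a small class hierarchy. This is an illustration in Python only: the class bodies are placeholders, and the `mode` attribute and the `lineage` helper are hypothetical names introduced here, not part of the PR.

```python
class Communicator:
    """Base class; placeholder for the shared send/recv interface."""
    mode = "base"


class AsyncCommunicator(Communicator):
    mode = "async"


class GeoCommunicator(AsyncCommunicator):
    mode = "geo"


class HalfAsyncCommunicator(AsyncCommunicator):
    mode = "half_async"


class SyncCommunicator(HalfAsyncCommunicator):
    mode = "sync"


def lineage(cls):
    """Return class names from the base class down to cls."""
    return [c.__name__ for c in reversed(cls.__mro__) if c is not object]
```

For example, `lineage(SyncCommunicator)` walks the second chain from `Communicator` down to `SyncCommunicator`.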

Server

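Add a LargeScaleKV implementation:

  1. Ids grow automatically; rows do not need to be pre-allocated.
  2. Ids may be any value in [0, INT64].
  3. Ids are hashed across pservers, which fixes hotspot issues.
  4. Tables can be saved as SelectedRows or as text.
  5. Saving is performed on the PServer.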
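The properties listed above (auto-growing ids, a wide id range, sharding across pservers, text export) can be illustrated with a toy table. This is a sketch under stated assumptions, not the LargeScaleKV code: `ShardedKV` and its methods are hypothetical names, plain modulo stands in for whatever hash the PR uses, each dict stands in for one pserver, and the random initializer and text layout are arbitrary choices.

```python
import random
from typing import Dict, List


class ShardedKV:
    """Toy sparse table: rows are created lazily and sharded by id."""

    def __init__(self, num_shards: int, dim: int, seed: int = 0) -> None:
        self.num_shards = num_shards
        self.dim = dim
        self._rng = random.Random(seed)
        # One dict per (hypothetical) pserver; ids need not be contiguous.
        self.shards: List[Dict[int, List[float]]] = [{} for _ in range(num_shards)]

    def shard_of(self, key: int) -> int:
        """Pick a shard by modulo (an assumption) to spread ids across shards."""
        return key % self.num_shards

    def lookup(self, key: int) -> List[float]:
        """Return the row for key, creating it on first access (auto growth)."""
        if key < 0:
            raise ValueError("ids must be non-negative")
        shard = self.shards[self.shard_of(key)]
        if key not in shard:
            shard[key] = [self._rng.uniform(-0.1, 0.1) for _ in range(self.dim)]
        return shard[key]

    def dump_text(self, shard_id: int) -> str:
        """Serialize one shard as 'id<TAB>v0,v1,...' lines, sorted by id."""
        rows = self.shards[shard_id]
        lines = []
        for k in sorted(rows):
            values = ",".join(repr(v) for v in rows[k])
            lines.append(f"{k}\t{values}")
        return "\n".join(lines)
```

A call such as `ShardedKV(num_shards=4, dim=3).lookup(2**62 + 1)` allocates exactly one row, so memory follows the ids actually seen rather than the size of the id space.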

Experiments

CTR: LARGE_SCALE vs. 1.7.2 (8), ASYNC + DATASET

| EPOCH | SPEED (lines/sec), 22957 | SPEED (lines/sec), 1.7.2 | TEST AUC, 22957 | TEST AUC, 1.7.2 |
|---|---|---|---|---|
| 0 | 40382.7114 | 41824.4988 | 0.743417 | 0.747783 |
| 1 | 40852.5408 | 42742.1863 | 0.763407 | 0.765855 |
| 2 | 41757.7742 | 42824.1947 | 0.773173 | 0.775051 |
| 3 | 42245.0450 | 41419.2507 | 0.779447 | 0.780185 |
| 4 | 42255.9334 | 43046.1868 | 0.782906 | 0.783628 |
| 5 | 41638.7534 | 43064.7863 | 0.785447 | 0.785996 |
| 6 | 41625.8496 | 43903.9036 | 0.787759 | 0.787912 |
| 7 | 41826.0234 | 43849.0151 | 0.789067 | 0.789307 |
| 8 | 41644.1003 | 43084.6209 | 0.790005 | 0.790547 |
| 9 | 42006.6102 | 45298.1854 | 0.791332 | 0.791442 |
| 10 | 41520.9987 | 44090.6176 | 0.791979 | 0.792363 |
| 11 | 41644.4256 | 44754.7476 | 0.79242 | 0.792934 |
| 12 | 41600.1201 | 44129.0481 | 0.792934 | 0.793618 |
| 13 | 42242.2510 | 43911.7627 | 0.793653 | 0.793995 |
| 14 | 42493.9051 | 43280.5678 | 0.793839 | 0.794289 |
| 15 | 42483.4592 | 43156.0943 | 0.794197 | 0.794641 |
| 16 | 41973.2283 | 43050.7187 | 0.794247 | 0.794467 |
| 17 | 41648.6511 | 43470.0413 | 0.794318 | 0.795061 |
| 18 | 42001.9489 | 43878.2971 | 0.794401 | 0.794801 |
| 19 | 43204.9721 | 44350.7521 | 0.794796 | 0.794991 |

W2V: LARGE_SCALE vs. 1.7.2 (8), ASYNC + DATASET

| EPOCH | SPEED (words/sec), 22957 | SPEED (words/sec), 1.7.2 | TEST ACC, 22957 | TEST ACC, 1.7.2 |
|---|---|---|---|---|
| 0 | 38710.8308 | 52546.4205 | 0.357 | 0.291 |
| 1 | 31047.3975 | 53228.6121 | 0.488 | 0.421 |
| 2 | 38398.9667 | 53219.9925 | 0.552 | 0.483 |
| 3 | 30892.7404 | 52935.8162 | 0.592 | 0.53 |
| 4 | 38253.3532 | 53083.7943 | 0.609 | 0.57 |
| 5 | 30509.0984 | 52890.2298 | 0.614 | 0.583 |
| 6 | 36201.4896 | 52968.6198 | 0.621 | 0.6 |
| 7 | 31285.0067 | 53509.7486 | 0.628 | 0.608 |
| 8 | 34499.2948 | 52988.9057 | 0.632 | 0.615 |
| 9 | 34003.5555 | 53091.7894 | 0.634 | 0.62 |
| 10 | 32231.5232 | 52983.9373 | 0.637 | 0.623 |
| 11 | 36732.4116 | 52930.2642 | 0.64 | 0.63 |
| 12 | 29601.1020 | 52820.1654 | 0.639 | 0.632 |
| 13 | 47984.6335 | 53022.2126 | 0.64 | 0.634 |
| 14 | 44598.9333 | 53156.1390 | 0.641 | 0.635 |

SIMNET: LARGE_SCALE vs. 1.7.2 (8), ASYNC + DATASET

| EPOCH | SPEED (lines/sec), 22957 | SPEED (lines/sec), 1.7.2 | TEST PN, 22957 | TEST PN, 1.7.2 |
|---|---|---|---|---|
| 0 | 80882.7005 | 69527.7479 | 1.80131 | 1.83487 |
| 2 | 80851.1122 | 69427.2556 | 1.92519 | 1.95693 |
| 4 | 81212.7949 | 69551.4940 | 1.99275 | 2.01557 |
| 6 | 81046.8461 | 69718.6492 | 2.02755 | 2.065 |
| 8 | 81191.8261 | 69810.9468 | 2.0512 | 2.07834 |
| 10 | 81614.7143 | 69974.7245 | 2.08456 | 2.11172 |
| 12 | 80365.6785 | 69782.9580 | 2.06726 | 2.12081 |
| 14 | 80845.8114 | 70085.8516 | 2.1108 | 2.1329 |
| 16 | 81453.6759 | 70270.9776 | 2.12224 | 2.11487 |
| 18 | 80598.0124 | 70201.1548 | 2.14547 | 2.16318 |
| 20 | 80936.1689 | 69586.5245 | 2.12098 | 2.17401 |
| 22 | 80281.5667 | 69935.3969 | 2.16638 | 2.17755 |
| 24 | 80648.5904 | 69712.8900 | 2.17197 | 2.18454 |
| 26 | 80175.3672 | 69874.0455 | 2.17442 | 2.18979 |
| 28 | 80482.3953 | 69670.9059 | 2.1805 | 2.19599 |
| 30 | 80357.8911 | 69693.6987 | 2.18897 | 2.20132 |
| 32 | 80576.7476 | 69591.5745 | 2.19511 | 2.20476 |
| 34 | 81010.0043 | 69194.1689 | 2.19904 | 2.21155 |
| 36 | 80852.6387 | 69869.2369 | 2.19907 | 2.20889 |
| 38 | 80696.7542 | 69573.0540 | 2.2062 | 2.20458 |
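
To read the speed columns as a relative change, one option is to divide the 22957 throughput by the 1.7.2 throughput for each epoch. The sketch below does this for the first three SIMNET rows reported above; `relative_speed` is an illustrative helper introduced here, and the same calculation applies to the CTR and W2V rows.

```python
from statistics import mean

# (epoch, speed with 22957, speed with 1.7.2), copied from the SIMNET results.
SIMNET_ROWS = [
    (0, 80882.7005, 69527.7479),
    (2, 80851.1122, 69427.2556),
    (4, 81212.7949, 69551.4940),
]


def relative_speed(rows):
    """Return per-epoch ratios of 22957 throughput to 1.7.2 throughput."""
    return [new / old for _, new, old in rows]


ratios = relative_speed(SIMNET_ROWS)
print(f"mean ratio over {len(ratios)} epochs: {mean(ratios):.3f}")
```

A ratio above 1 means 22957 processed more lines per second than 1.7.2 in that epoch; a ratio below 1 means it was slower.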

Next work

  1. Fully unify Tensor and LargeScaleKV on the PServer.
  2. Speed up Geo/Async.
  3. More testing in business scenarios.
Reference: paddlepaddle/Paddle!22957
Source branch: github/fork/seiriosPlus/feature/integrated_ps_trainer