- 18 Oct 2021, 1 commit
Submitted by Haohongxiang
* [HybridParallel] Support fp16 in dygraph hybrid parallel
* update
* update
* update for recompute
* add unittest of pp+fp16
* add unittest of recompute+fp16
* update
* modify ut
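A minimal sketch of how fp16 is typically enabled on top of dygraph hybrid parallel after this change; the parallel degrees, the placeholder network, and the use of fleet.distributed_scaler are illustrative assumptions, not the exact code from this commit.

```python
# Hypothetical sketch: fp16 (AMP) training under dygraph hybrid parallel.
# Launch with paddle.distributed.launch; the degrees below are placeholders.
import paddle
from paddle.distributed import fleet

strategy = fleet.DistributedStrategy()
strategy.hybrid_configs = {"dp_degree": 2, "mp_degree": 2, "pp_degree": 1}
fleet.init(is_collective=True, strategy=strategy)

model = paddle.nn.Linear(1024, 1024)                       # placeholder network
optimizer = paddle.optimizer.AdamW(learning_rate=1e-3, parameters=model.parameters())
model = fleet.distributed_model(model)
optimizer = fleet.distributed_optimizer(optimizer)

scaler = paddle.amp.GradScaler(init_loss_scaling=2.**16)
scaler = fleet.distributed_scaler(scaler)                  # assumed hybrid-parallel-aware scaler wrapper

for _ in range(10):
    x = paddle.randn([16, 1024], dtype='float32')
    with paddle.amp.auto_cast(enable=True):                # run the forward in mixed precision
        loss = model(x).mean()
    scaled = scaler.scale(loss)                            # scale loss to avoid fp16 underflow
    scaled.backward()
    scaler.minimize(optimizer, scaled)                     # unscale, check inf/nan, then step
    optimizer.clear_grad()
```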
- 15 Oct 2021, 1 commit
Submitted by duanboqiang
- 14 Oct 2021, 3 commits
Submitted by duanboqiang
Submitted by ShenLiang
* add no_sync for parameters sync
* add pipeline for moe
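For context, no_sync refers to skipping the per-step gradient all-reduce while gradients accumulate locally. A hedged sketch of that pattern using paddle.DataParallel's no_sync context manager follows; whether this commit's hybrid-parallel/MoE path exposes exactly the same interface is an assumption.

```python
# Sketch: accumulate gradients locally under no_sync, sync on the last micro-batch.
import paddle
import paddle.distributed as dist

dist.init_parallel_env()
model = paddle.DataParallel(paddle.nn.Linear(32, 32))      # placeholder network
opt = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())

x = paddle.randn([8, 32])
with model.no_sync():              # no gradient all-reduce inside this block
    model(x).mean().backward()
model(x).mean().backward()         # outside the block, grads are all-reduced as usual
opt.step()
opt.clear_grad()
```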
Submitted by Yuang Liu
- 13 Oct 2021, 2 commits
Submitted by Guoxia Wang
Submitted by Leo Chen
* refine amp level
* fix typo
* update tracer._amp_level
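For context, the dygraph AMP level is scoped by auto_cast, which sets the tracer's amp level for the enclosed region. A small sketch, assuming the standard public API (the Conv2D layer and shapes are illustrative):

```python
# Sketch: amp level is scoped by auto_cast; O1 runs white-list ops (e.g. conv2d) in fp16.
import paddle

conv = paddle.nn.Conv2D(3, 2, 3)
x = paddle.rand([10, 3, 32, 32])

with paddle.amp.auto_cast(enable=True, level='O1'):
    out = conv(x)
print(out.dtype)    # expected: paddle.float16 inside the O1 autocast region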
- 12 Oct 2021, 1 commit
Submitted by Haohongxiang
* fix calling bug of HybridParallelClipGrad
* fix bugs of HybridParallelClipGrad
* add unittest of pp with HybridParallelClipGrad
* fix bugs in mp_layers.py
* update
* fix bugs in pp_layers.py
* update
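HybridParallelClipGrad is the clip variant that fleet substitutes when a ClipGradByGlobalNorm is attached to the optimizer it wraps. A hedged sketch of that wiring; the degrees and the placeholder network are assumptions.

```python
# Hypothetical sketch: global-norm gradient clipping under dygraph hybrid parallel.
import paddle
from paddle.distributed import fleet

strategy = fleet.DistributedStrategy()
strategy.hybrid_configs = {"dp_degree": 1, "mp_degree": 2, "pp_degree": 1}
fleet.init(is_collective=True, strategy=strategy)

model = paddle.nn.Linear(256, 256)                         # placeholder network
clip = paddle.nn.ClipGradByGlobalNorm(clip_norm=1.0)       # clip grads by their global norm
opt = paddle.optimizer.Momentum(learning_rate=0.01,
                                parameters=model.parameters(),
                                grad_clip=clip)

model = fleet.distributed_model(model)
opt = fleet.distributed_optimizer(opt)   # returned optimizer applies the hybrid-parallel clip
```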
- 11 Oct 2021, 1 commit
Submitted by danleifeng
* heterps: add fuse_allreduce op; test=develop
* add program_mode in minimize for pslib mode; test=develop
- 09 Oct 2021, 1 commit
Submitted by zhaoyingli
* support ClipGradByGlobalNorm in sharding
* support ClipGradByGlobalNorm in sharding
* test=allcase
- 08 Oct 2021, 1 commit
Submitted by yaoxuefeng
- 07 Oct 2021, 1 commit
Submitted by Haohongxiang
* fix bugs in HybridParallelClipGrad of hybrid_parallel_optimizer
* update
* update
- 30 Sep 2021, 1 commit
Submitted by 李季
* fix raw optim
* pre-commit test file
Co-authored-by: sneaxiy <sneaxiy@126.com>
- 29 Sep 2021, 1 commit
Submitted by WangXi
- 28 Sep 2021, 1 commit
Submitted by WangXi
- 24 Sep 2021, 2 commits
Submitted by ShenLiang
Submitted by seemingwang
* graph engine demo
* upload unsaved changes
* fix dependency error
* fix shard_num problem
* py client
* remove lock and graph-type
* add load direct graph
* add load direct graph
* add load direct graph
* batch random_sample
* batch_sample_k
* fix num_nodes size
* batch brpc
* batch brpc
* add test
* add test
* add load_nodes; change add_node function
* change sample return type to pair
* resolve conflict
* resolved conflict
* resolved conflict
* separate server and client
* merge pair type
* fix
* resolved conflict
* fixed segment fault; high-level VLOG for load edges and load nodes
* random_sample return 0
* rm useless loop
* test: load edge
* fix ret -1
* test: rm sample
* rm sample
* random_sample return future
* random_sample return int
* test fake node
* fixed here
* memory leak
* remove test code
* fix return problem
* add common_graph_table
* random sample node & test & change data-structure from linkedList to vector
* add common_graph_table
* sample with srand
* add node_types
* optimize nodes sample
* recover test
* random sample
* destruct weighted sampler
* GraphEdgeBlob
* WeightedGraphEdgeBlob to GraphEdgeBlob
* WeightedGraphEdgeBlob to GraphEdgeBlob
* pybind sample nodes api
* pull nodes with step
* fixed pull_graph_list bug; add test for pull_graph_list by step
* add graph table; name
* add graph table; name
* add pybind
* add pybind
* add FeatureNode
* add FeatureNode
* add FeatureNode Serialize
* add FeatureNode Serialize
* get_feat_node
* avoid local rpc
* fix get_node_feat
* fix get_node_feat
* remove log
* get_node_feat return py:bytes
* merge develop with graph_engine
* fix threadpool.h head
* fix
* fix typo
* resolve conflict
* fix conflict
* recover lost content
* fix pybind of FeatureNode
* recover cmake
* recover tools
* resolve conflict
* resolve linking problem
* code style
* change test_server port
* fix code problems
* remove shard_num config
* remove redundant threads
* optimize start server
* remove logs
* fix code problems by reviewers' suggestions
* move graph files into a folder
* code style change
* remove graph operations from base table
* optimize get_feat function of graph engine
* fix long long count problem
* remove redundant graph files
* remove unused shell
* recover dropout_op_pass.h
* fix potential stack overflow when request number is too large & node add & node clear & node remove
* when sample k is larger than neighbor num, return directly
* using random seed generator of paddle to speed up
* fix bug of random sample k
* fix code style
* fix code style
* add remove graph to fleet_py.cc
* fix blocking_queue problem
* fix style
* fix
* recover capacity check
* add remove graph node; add set_feature
* add remove graph node; add set_feature
* add remove graph node; add set_feature
* add remove graph node; add set_feature
* fix distributed op combining problems
* optimize
* remove logs
Co-authored-by: Huang Zhengjie <270018958@qq.com>
Co-authored-by: Weiyue Su <weiyue.su@gmail.com>
Co-authored-by: suweiyue <suweiyue@baidu.com>
Co-authored-by: luobin06 <luobin06@baidu.com>
Co-authored-by: liweibin02 <liweibin02@baidu.com>
Co-authored-by: tangwei12 <tangwei12@baidu.com>
- 18 Sep 2021, 2 commits
Submitted by WangXi
Submitted by Guoxia Wang
* fix bug
- 17 Sep 2021, 3 commits
Submitted by zhangbo9674
* add pure fp16 major function in auto_cast & tracer
* support master weight in dygraph for pure fp16
* check mix dtype of fp16&fp32 for check_finite_and_unscale op
* change pure fp16 function name
* refine some bug in auto_cast
* refine auto_cast interface logic
* add param _casted_by_pure_fp16 for class Layer
* support state_dict hook for save model by user appointed dtype in pure_fp16_decorator
* refine pure_fp16_decorator as decorator
* add unittest
* add comment
* add comment
* support recompute
* add comment for auto_cast and decorator
* support to_static_state_dict for paddle.jit.save
* remove limit on models num and optimizers num
* add lookup_table in black_list
* fix momentum and layer state_dict
* fix bug in layer state_dict
* fix bug in layer state_dict_helper
* refine unittest
* refine test_momentum_op
* refine interface and some code
* refine amp_decorator interface
* refine pure fp16 interface
* refine master weight interface
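A hedged sketch of the pure-fp16 (O2) path this work describes, written against the public paddle.amp.decorate / auto_cast APIs that the feature fed into; the placeholder model and hyper-parameters are assumptions.

```python
# Sketch: pure fp16 (O2) — decorate casts the model to fp16, the optimizer keeps
# fp32 master weights, and auto_cast(level='O2') runs the forward in fp16.
import paddle

model = paddle.nn.Linear(128, 128)                            # placeholder network
optimizer = paddle.optimizer.Momentum(learning_rate=0.01,
                                      parameters=model.parameters(),
                                      multi_precision=True)   # fp32 master weights

model, optimizer = paddle.amp.decorate(models=model, optimizers=optimizer, level='O2')
scaler = paddle.amp.GradScaler(init_loss_scaling=2.**16)

data = paddle.randn([8, 128], dtype='float32')
with paddle.amp.auto_cast(level='O2'):
    loss = model(data).mean()
scaled = scaler.scale(loss)
scaled.backward()
scaler.minimize(optimizer, scaled)      # unscale, check inf/nan, update master weights
optimizer.clear_grad()
```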
Submitted by Guoxia Wang
Submitted by Guoxia Wang
* add launch doc
- 16 Sep 2021, 2 commits
- 15 Sep 2021, 2 commits
Submitted by Haohongxiang
Submitted by WangXi
- 14 Sep 2021, 2 commits
Submitted by Haohongxiang
* Add solutions to PyLayer which is unsupported in DataParallel
* modify note format for parallel.py
* modify docs of dataparallel
* add docs of dp with pylayer
* modify docs format
* modify example format
* change example of dp with pylayer
* add unittest for dp with pylayer
* modify ut
* merge latest codes
* update
* modify for CI-Coverage
* modify text-indent
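The pattern this entry documents is calling a custom PyLayer from inside the forward of a model wrapped by paddle.DataParallel. A small sketch, assuming the standard PyLayer API; the tanh layer and sizes are illustrative.

```python
# Sketch: a custom PyLayer used inside a DataParallel-wrapped model's forward.
import paddle
import paddle.distributed as dist
from paddle.autograd import PyLayer

class CusTanh(PyLayer):
    @staticmethod
    def forward(ctx, x):
        y = paddle.tanh(x)
        ctx.save_for_backward(y)        # stash activation for the backward pass
        return y

    @staticmethod
    def backward(ctx, dy):
        y, = ctx.saved_tensor()
        return dy * (1 - paddle.square(y))

class SimpleNet(paddle.nn.Layer):
    def __init__(self):
        super().__init__()
        self.linear = paddle.nn.Linear(16, 16)

    def forward(self, x):
        return CusTanh.apply(self.linear(x))   # PyLayer must be invoked inside forward

dist.init_parallel_env()
model = paddle.DataParallel(SimpleNet())
loss = model(paddle.randn([4, 16])).mean()
loss.backward()                                # gradients still synchronize across ranks
```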
Submitted by Zeng Jinle
* fix raw optimizer gm
* update
* update ut
- 13 Sep 2021, 2 commits
Submitted by ShenLiang
* support grad group
* fix single card condition
Submitted by Guoxia Wang
* support hybrid parallel inference helper class
- 10 Sep 2021, 2 commits
- 08 Sep 2021, 2 commits
Submitted by Yulong Ao
* add auto_parallel dir
* mv to paddle.distributed
* add shard_xx api
* add distributed attrs for var
* add ut, test=develop
* add dist
* update
* update
* update
* update
* update
* update, test=develop
* update, test=develop
* update, test=develop
* update, test=develop
* update, test=develop
* update, test=develop
* update, test=develop
* update
* update
* update
* update
* update
* update, test=develop
* update, test=develop
* update
* update
* delete unused proto
* restore op_desc
* restore type_defs
* update var_desc
* remove dims_mapping for proto_pybind
* update interface.py
* update framework.py
* update
* update
* add auto_parallel dir
* mv to paddle.distributed
* add shard_xx api
* add distributed attrs for var
* add ut, test=develop
* [WIP] Add the auto completion feature and related codes
* [WIP] Improve the auto completion and related codes
* [WIP] Make the auto completion to support data-parallel
* [WIP] Make the completion support mp and dp+mp
* [WIP] Refactor auto completion unit test for MLP
* [WIP] Refactor the implementation of DistributedOperatorImpl
* [WIP] Improve dims_mapping update rule and fix a bug
* [WIP] Support auto completion for one transformer decoder layer
* [WIP] Add a minor change
* [WIP] Fix a bug within the unit test
* Shard XShape tensor, add embedding completion and refactor code
* Add the distributed_operators dir to setup.py.in
* Improve the completion process and add the unittest for gpt
* fix process_mesh ut
* fix process_mesh ut
* update
* update, test=develop
* Add support for automatically completing distributed attrs of special ops
* update
* update
* update
* fix doc sample codes, test=develop
* improve coverage, test=develop
* add static_mode check, test=develop
* Model the cluster for cost model and physical mapping
* update, test=develop
* add set_placement, test=develop
* Add the check to make sure the candidate tensors' size is greater than zero
* update doc, test=develop
* update doc, test=develop
* update doc, test=develop
* update doc, test=develop
* update, test=develop
* Auto mark dist attrs annotated by user
* update ndarray to nested list, test=develop
* update, test=develop
* Add auto-completion module for auto-parallel (based on PR#33804)
* Remove unnecessary files
* Remove unrelated files for the auto completion pr
* Update the unit test to improve the coverage
* Modify codes based on reviews
* Minor changes for CI
* Improve some codes based on new comments
* Fix bugs caused by shallow copy in attributes.py
* Improve amend_distributed_attr_for_program in context.py
* Other changes for weihang's comments
* support shard reader
* support shard reader
* add parallel mode
* update process mesh
* add method to compute comm_group
* implement dist_embedding forward func
* implement dist matmul forward func
* implement dist reshape forward func
* add transpiler framework
* add transpiler forward
* implement transpiler forward
* implement transpiler backward & update
* add process
* add unittest
* chmod
* chmod
* chmod
* update unittest
* add unittest for gpt
* remove unused print
* rename transpiler --> partitioner
* rename transpiler --> partitioner
* chmod
* chmod
* bug fixed
* remove amp function
* update case for dp mode
* update case for dp mode
* [Auto Parallel] Integrate all parts with the newest code
* Integrate all parts of auto parallel and improve codes
* Integrate all parts by AutoParallelizer
* Add unit test for AutoParallelizer
* Improve auto completion module for pipeline parallel
* Add support for matmul_v2 in dist_matmul
* Correct the typo "stratergy" to "strategy"
* Modify distributed_strategy.proto to conform the main stream
* Restore parts of distributed_strategy to conform the develop branch
Co-authored-by: sandyhouse <lilong12@baidu.com>
Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>
Submitted by Zeng Jinle
* add fleet api for program pass
* turn on apply pass for CI test
* fix disable fuse_all_optimizer bug
* try to test ci
* fix CI
* fill unspecified op role
* fix fuse_allreduce
* add ut to improve coverage
* remove useless change
* improve c++ coverage
* follow some comments
* test ir pass pipeline
* update doc
* reduce ut time again
- 01 Sep 2021, 2 commits
- 25 Aug 2021, 1 commit
Submitted by WangXi
- 20 Aug 2021, 1 commit
Submitted by Yuang Liu
- 18 Aug 2021, 2 commits