Commits · adaeee4d3d3834616e121c32c95b09a87f24712d · PaddlePaddle / Paddle

17 Sep, 2021 · 3 commits

[AMP] Support pure fp16 training mode for dygraph (#35521) · adaeee4d

Committed by zhangbo9674 on Sep 17, 2021

* add pure fp16 major function in auto_cast & tracer

* support master weight in dygraph for pure fp16

* check mix dtype of fp16&fp32 for check_finite_and_unscale op

* change pure fp16 function name

* fix some bugs in auto_cast

* refine auto_cast interface logic

* add param _casted_by_pure_fp16 for class Layer

* support state_dict hook for save model by user appointed dtype in pure_fp16_decorator

* refine pure_fp16_decorator as decorator

* add unittest

* add comment

* add comment

* support recompute

* add comment for auto_cast and decorator

* support to_static_state_dict for paddle.jit.save

* remove the limit on the number of models and optimizers

* add lookup_table in black_list

* fix momentum and layer state_dict

* fix bug in layer state_dict

* fix bug in layer state_dict_helper

* refine unittest

* refine test_momentum_op

* refine interface and some code

* refine amp_decorator interface

* refine pure fp16 interface

* refine master weight interface


test=document_fix (#35824) · 177bf52f
Committed by Guoxia Wang on Sep 17, 2021

add launch doc (#35634) · 5548061b
Committed by Guoxia Wang on Sep 17, 2021
```
* add launch doc
```

16 Sep, 2021 · 3 commits
- [hybrid] Fix mp multi gradient clip prob (#35713) · a4eadd15
  Committed by Yuang Liu on Sep 16, 2021
- remove distributed attributes at the last stage for auto parallel (#35605) · a3790606
  Committed by lilong12 on Sep 16, 2021
```
* update
```
- [hybrid] remove scale op in insert_scale_loss_grad_ops (#35775) · 02b0be08
  Committed by WangXi on Sep 16, 2021
15 Sep, 2021 · 3 commits
- add dist_attr for dist op and var (#35585) · fc5fb2a1
  Committed by zhaoyingli on Sep 15, 2021
```
* add dist_attr for dist op

* add unittest

* update inputname

* update function name

* add unittest

* update CMakeLists.txt for CI

* fix dist_matmul

* fix compile error

* update matmul to matmul_v2
```
- fix bugs of PR 35401 (#35746) · 09eaa7d7
  Committed by Haohongxiang on Sep 15, 2021
- [hybrid] out data parallel as optimizer sharding parallel (#35593) · 78465703
  Committed by WangXi on Sep 15, 2021
14 Sep, 2021 · 2 commits

Add solutions to PyLayer which is unsupported in DataParallel (#35401) · d483b8c0

Committed by Haohongxiang on Sep 14, 2021

* Add solutions to PyLayer which is unsupported in DataParallel

* modify note format for parallel.py

* modify docs of dataparallel

* add docs of dp with pylayer

* modify docs format

* modify example format

* change example of dp with pylayer

* add unittest for dp with pylayer

* modify ut

* merge latest codes

* update

* modify for CI-Coverage

* modify text-indent

Fix RawProgramOptimizer bug (#35704) · 0f741880
Committed by Zeng Jinle on Sep 14, 2021
```
* fix raw optimizer gm

* update

* update ut
```

13 Sep, 2021 · 4 commits
- fix launch util trainer rank function; test=develop (#35610) · a6ac4e80
  Committed by danleifeng on Sep 13, 2021
- upload global scatter and global gather operators related files (#35546) · ecfe8375
  Committed by 李季 on Sep 13, 2021
```
* upload global scatter and global gather operators related files
```
- [HybridParallel]Fix scaler bug in pipeline_parallel/model_parallel (#35556) · 2bb44317
  Committed by ShenLiang on Sep 13, 2021
```
* support grad group

* fix single card condition
```
- support hybrid parallel inference helper class (#35576) · dc3c845a
  Committed by Guoxia Wang on Sep 13, 2021
```
* support hybrid parallel inference helper class
```
11 Sep, 2021 · 1 commit
- Add cpu npu cembedding (#35467) · ec252914
  Committed by Baibaifan on Sep 11, 2021
10 Sep, 2021 · 2 commits
- [Dygraph 4D Parallel] Sharding Support MP-PP-DP Parallelism (#35580) · 2c922d63
  Committed by JZ-LIANG on Sep 10, 2021
```
* sharding support dp

* sharding support mp

* sharding support pp
```
- fix bug of recompute in hybridparallel (#35588) · d53e567a
  Committed by ShenLiang on Sep 10, 2021
08 Sep, 2021 · 5 commits

[Auto Parallel] Integrate all modules (#35483) · 12155358

Committed by Yulong Ao on Sep 08, 2021

* add auto_parallel dir

* mv to paddle.distributed

* add shard_xx api

* add distributed attrs for var

* add ut, test=develop

* add dist

* update

* update

* update

* update

* update

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update

* update

* update

* update

* update

* update, test=develop

* update, test=develop

* update

* update

* delete unused proto

* restore op_desc

* restore type_defs

* update var_desc

* remove dims_mapping for proto_pybind

* update interface.py

* update framework.py

* update

* update

* add auto_parallel dir

* mv to paddle.distributed

* add shard_xx api

* add distributed attrs for var

* add ut, test=develop

* [WIP] Add the auto completion feature and related codes

* [WIP] Improve the auto completion and related codes

* [WIP] Make the auto completion to support data-parallel

* [WIP] Make the completion support mp and dp+mp

* [WIP] Refactor auto completion unit test for MLP

* [WIP] Refactor the implementation of DistributedOperatorImpl

* [WIP] Improve dims_mapping update rule and fix a bug

* [WIP] Support auto completion for one transformer decoder layer

* [WIP] Add a minor change

* [WIP] Fix a bug within the unit test

* Shard XShape tensor, add embedding completion and refactor code

* Add the distributed_operators dir to setup.py.in

* Improve the completion process and add the unittest for gpt

* fix process_mesh ut

* fix process_mesh ut

* update

* update, test=develop

* Add support for automatically completing distributed attrs of special ops

* update

* update

* update

* fix doc sample codes, test=develop

* improve coverage, test=develop

* add static_mode check, test=develop

* Model the cluster for cost model and physical mapping

* update, test=develop

* add set_placement, test=develop

* Add the check to make sure the candidate tensors' size is greater than zero

* update doc, test=develop

* update doc, test=develop

* update doc, test=develop

* update doc, test=develop

* update, test=develop

* Auto mark dist attrs annotated by user

* update ndarray to nested list, test=develop

* update, test=develop

* Add auto-completion module for auto-parallel (based on PR#33804)

* Remove unnecessary files

* Remove unrelated files for the auto completion pr

* Update the unit test to improve the coverage

* Modify codes based on reviews

* Minor changes for CI

* Improve some codes based on new comments

* Fix bugs caused by shallow copy in attributes.py
* Improve amend_distributed_attr_for_program in context.py
* Other changes for weihang's comments

* support shard reader

* support shard reader

* add parallel mode

* update process mesh

* add method to compute comm_group

* implement dist_embedding forward func

* implement dist matmul forward func

* implement dist reshape forward func

* add transpiler framework

* add transpiler forward

* implement transpiler forward

* implement transpiler backward & update

* add process

* add unittest

* chmod

* chmod

* chmod

* update unittest

* add unitest for gpt

* remove unused print

* rename transpiler --> partitioner

* rename transpiler --> partitioner

* chmod

* chmod

* bug fixed

* remove amp function

* update case for dp mode

* update case for dp mode

* [Auto Parallel] Integrate all parts with the newest code

* Integrate all parts of auto parallel and improve codes

* Integrate all parts by AutoParallelizer
* Add unit test for AutoParallelizer
* Improve auto completion module for pipeline parallel
* Add support for matmul_v2 in dist_matmul
* Correct the typo "stratergy" to "strategy"

* Modify distributed_strategy.proto to conform to the mainstream

* Restore parts of distributed_strategy to conform to the develop branch

Co-authored-by: sandyhouse <lilong12@baidu.com>
Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>


Integrate GLOOParallelContext to support Multi-CPU Core for Dygraph DataParallel (#35154) · 51cc73f0

Committed by xiongkun on Sep 08, 2021

* can pass the fake test

* add files

* modify cmake to pass windows-ci

* for ci pass

* WITH_GLOO=ON

* for pass coverage test

* add cpuonly testcase

* add

* disable nccl when compile with cuda

* change python version in cpuonly

* add backend argument

* add required gpu

* add required:gpu


Enable program passes on Fleet APIs (#34955) · 5f369881

Committed by Zeng Jinle on Sep 08, 2021

* add fleet api for program pass

* turn on apply pass for CI test

* fix disable fuse_all_optimizer bug

* try to test ci

* fix CI

* fill unspecified op role

* fix fuse_allreduce

* add ut to improve coverage

* remove useless change

* improve c++ coverage

* follow some comments

* test ir pass pipeline

* update doc

* reduce ut time again

hide the auto parallel apis (#35385) · afd1b372
Committed by lilong12 on Sep 08, 2021
```
* update, test=develop
```
add checkers for auto parallel apis (#35486) · 39540b0e
Committed by lilong12 on Sep 08, 2021
```
* update, test=develop
```

02 Sep, 2021 · 1 commit

[Auto Parallel] Logical Partition & Dist Op (#35117) · a622b701

Committed by JZ-LIANG on Sep 02, 2021

* support shard reader

* support shard reader

* add parallel mode

* update process mesh

* add method to compute comm_group

* implement dist_embedding forward func

* implement dist matmul forward func

* implement dist reshape forward func

* add transpiler framework

* add transpiler forward

* implement transpiler forward

* implement transpiler backward & update

* add process

* add unittest

* chmod

* chmod

* chmod

* update unittest

* add unitest for gpt

* remove unused print

* rename transpiler --> partitioner

* rename transpiler --> partitioner

* chmod

* chmod

* bug fixed

* remove amp function

* update case for dp mode

* update case for dp mode


01 Sep, 2021 · 2 commits
- [HybridParallel]Support finetune model for PipelineParallel (#35287) · 264ff9ef
  Committed by ShenLiang on Sep 01, 2021
```
* add cache for send_recv

* add eval_batch for pipeline

* add eval batch for pipelineparallel

* add style code
```
- bugfix for mp accuracy (#35326) · 7f17f9a0
  Committed by JZ-LIANG on Sep 01, 2021
27 Aug, 2021 · 1 commit
- [hybrid] Fix row parallel linear bias (#35186) · 1533d7e2
  Committed by WangXi on Aug 27, 2021
25 Aug, 2021 · 1 commit
- [hybrid npu] fix npu found_finite in hybrid (#35134) · f609ca37
  Committed by WangXi on Aug 25, 2021
24 Aug, 2021 · 2 commits

add checker, test=develop (#35109) · 881e55e4
Committed by lilong12 on Aug 24, 2021


Add auto completion module for auto parallel (#34813) · 93d862b0

Committed by Yulong Ao on Aug 24, 2021

* add auto_parallel dir

* mv to paddle.distributed

* add shard_xx api

* add distributed attrs for var

* add ut, test=develop

* add dist

* update

* update

* update

* update

* update

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update

* update

* update

* update

* update

* update, test=develop

* update, test=develop

* update

* update

* delete unused proto

* restore op_desc

* restore type_defs

* update var_desc

* remove dims_mapping for proto_pybind

* update interface.py

* update framework.py

* update

* update

* add auto_parallel dir

* mv to paddle.distributed

* add shard_xx api

* add distributed attrs for var

* add ut, test=develop

* [WIP] Add the auto completion feature and related codes

* [WIP] Improve the auto completion and related codes

* [WIP] Make the auto completion to support data-parallel

* [WIP] Make the completion support mp and dp+mp

* [WIP] Refactor auto completion unit test for MLP

* [WIP] Refactor the implementation of DistributedOperatorImpl

* [WIP] Improve dims_mapping update rule and fix a bug

* [WIP] Support auto completion for one transformer decoder layer

* [WIP] Add a minor change

* [WIP] Fix a bug within the unit test

* Shard XShape tensor, add embedding completion and refactor code

* Add the distributed_operators dir to setup.py.in

* Improve the completion process and add the unittest for gpt

* fix process_mesh ut

* fix process_mesh ut

* update

* update, test=develop

* Add support for automatically completing distributed attrs of special ops

* update

* update

* update

* fix doc sample codes, test=develop

* improve coverage, test=develop

* add static_mode check, test=develop

* Model the cluster for cost model and physical mapping

* update, test=develop

* add set_placement, test=develop

* Add the check to make sure the candidate tensors' size is greater than zero

* update doc, test=develop

* update doc, test=develop

* update doc, test=develop

* update doc, test=develop

* update, test=develop

* Auto mark dist attrs annotated by user

* update ndarray to nested list, test=develop

* update, test=develop

* Add auto-completion module for auto-parallel (based on PR#33804)

* Remove unnecessary files

* Remove unrelated files for the auto completion pr

* Update the unit test to improve the coverage

* Modify codes based on reviews

* Minor changes for CI

* Improve some codes based on new comments

* Fix bugs caused by shallow copy in attributes.py
* Improve amend_distributed_attr_for_program in context.py
* Other changes for weihang's comments

Co-authored-by: sandyhouse <lilong12@baidu.com>


23 Aug, 2021 · 1 commit
- [CPU] Enable barrier op upon gloo (#34671) · e8f146a9
  Committed by Bo Liu on Aug 23, 2021
20 Aug, 2021 · 1 commit
- [hybrid performance] Grad fuse for gradient merge under pipeline mode (#35004) · 4d9b2d6d
  Committed by Yuang Liu on Aug 20, 2021
18 Aug, 2021 · 3 commits
- [Hybrid Performance] Move the cast op of AMP which cast fp32 param to fp16... · a9673b44
  Committed by WangXi on Aug 18, 2021
```
[Hybrid Performance] Move the cast op of AMP which cast fp32 param to fp16 param to the optimizer (#34965)
```
- [CPU-PSLIB] Add consistency inspection of use_var_list and data_generator... · 209075a4
  Committed by Fan Zhang on Aug 18, 2021
```
[CPU-PSLIB] Add consistency inspection of use_var_list and data_generator data, test=develop (#34463)
```
- Fix bug in alltoall (#34975) · 2e9a31eb
  Committed by lilong12 on Aug 18, 2021
17 Aug, 2021 · 1 commit
- [NPU]Adamw skip update for npu (#34897) · b4474fb4
  Committed by Roc on Aug 17, 2021
13 Aug, 2021 · 1 commit
- [Bug-Fix]fix bug of py36 import utils (#34873) · 507ea06f
  Committed by ShenLiang on Aug 13, 2021
```
* fix bug of py36 import
```
12 Aug, 2021 · 1 commit
- [HybridParallel]Add Recompute for PipeLineParallel (#34607) · 589d13c5
  Committed by ShenLiang on Aug 12, 2021
```
* add recompute for pp

* add recompute offload

* add recompute partition
```
11 Aug, 2021 · 2 commits
- [hybrid] pp+dp support fp16 allreduce (#34762) · 4d7af372
  Committed by WangXi on Aug 11, 2021
- add the basic apis for auto_parallel (#33804) · 3f962e77
  Committed by lilong12 on Aug 11, 2021
```
* add auto_parallel apis
```
