Commits · 2c922d63f275d09fe62b8db4577ad9d1eb484834 · 机器未来 / Paddle

Sep 10, 2021 · 24 commits
- [Dygraph 4D Parallel] Sharding Support MP-PP-DP Parallelism (#35580) · 2c922d63
  Committed by JZ-LIANG on Sep 10, 2021
```
* sharding support dp

* sharding support mp

* sharding support pp
```
- test=document_fix (#35655) · 49e243c9
  Committed by Feng Xing on Sep 10, 2021
- re-submit softmax_with_cross_entropy hard label (#35283) · a4b67f78
  Committed by Feng Xing on Sep 10, 2021
- Fix warning (#34875) · 966f042d
  Committed by sunzhongkai588 on Sep 10, 2021
```
* fix warning error , test=document_fix

* fix warning error , test=document_fix

* fix warning error , test=document_fix

* fix warning error , test=document_fix

* fix warning error , test=document_fix

* fix warning error , test=document_fix

* fix warning error , test=document_fix
```
- add the extra for op rnn/sequence_conv/sequence_pool/sequence_softmax (#35554) · d8bfe83d
  Committed by zhouweiwei2014 on Sep 10, 2021
- set gradient_merge_cond persistable to false (#35578) · 47d15a30
  Committed by Zhong Hui on Sep 10, 2021
- add api_op fill_diagonal_tensor (#34515) · 98d047d7
  Committed by zhiboniu on Sep 10, 2021
- fix api doc of paddle.any' (#35631) · deb40f06
  Committed by Shang Zhizhou on Sep 10, 2021
- add cumprod op (#35185) · 4e509f46
  Committed by hlygit66666 on Sep 10, 2021
```
* add test_cumprod_op

* Revert "add test_cumprod_op"

This reverts commit c96cf6dff5d09ae7d8cc72c1e8ae4369a153aa19.

* recommit

* add error message

* test input(x) initialize

* test use cpu

* update test code

* add test type

* add test case

* solve ci problem

* add complex case test

* add complex case test

* fix review problem

* fix conflict

* fix some docs

* change test case

* change test case

* fix review problems again

* fix docs

* fix inclusivescan bug
```
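The `cumprod` op added in #35185 computes a cumulative product, which is an inclusive scan with multiplication (the commit notes mention fixing an "inclusivescan" bug). A minimal plain-Python sketch of the semantics, assuming inputs are 1-D or 2-D nested lists; `cumprod` and `cumprod_2d` are hypothetical helper names, not Paddle's API or kernel:

```python
from typing import List


def cumprod(xs: List[float]) -> List[float]:
    """Inclusive scan with *: out[i] = xs[0] * xs[1] * ... * xs[i]."""
    out: List[float] = []
    running = 1
    for x in xs:
        running *= x  # carry the product of everything seen so far
        out.append(running)
    return out


def cumprod_2d(rows: List[List[float]], dim: int) -> List[List[float]]:
    """Cumulative product of a 2-D nested list along dim 0 (down columns) or dim 1 (along rows)."""
    if dim == 1:
        return [cumprod(r) for r in rows]
    if dim == 0:
        # Scan each column, then transpose the columns back into rows.
        cols = [cumprod(list(c)) for c in zip(*rows)]
        return [list(r) for r in zip(*cols)]
    raise ValueError("dim must be 0 or 1 for a 2-D input")
```

For example, `cumprod([1, 2, 3, 4])` gives `[1, 2, 6, 24]`, and `cumprod_2d([[1, 2], [3, 4]], dim=0)` gives `[[1, 2], [3, 8]]`. The same loop works unchanged for complex values, which the commit's test cases also cover.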
- Support float16 when using ClipGradByGlobalNorm. (#33565) · 5bdca05b
  Committed by huangxu96 on Sep 10, 2021
```
This PR supports gradient clip (ClipGradByGlobalNorm) when training with AMP(auto mixed precision).
```
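`ClipGradByGlobalNorm` rescales all gradients together using the rule documented for it: compute the global L2 norm over every gradient, then multiply each gradient by `clip_norm / max(global_norm, clip_norm)`. A plain-Python sketch of that rule follows; `clip_by_global_norm` is a hypothetical name, gradients are assumed to be flat lists of floats, and nothing here describes how PR #33565 handles float16 internally:

```python
import math
from typing import List


def clip_by_global_norm(grads: List[List[float]], clip_norm: float) -> List[List[float]]:
    # Global norm: L2 norm over every element of every gradient, taken together.
    # Python floats are double precision, so this sum of squares cannot overflow
    # the way a float16 accumulator (largest finite value 65504) could.
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # max() means gradients are only ever scaled down, never up.
    scale = clip_norm / max(global_norm, clip_norm)
    return [[g * scale for g in grad] for grad in grads]


# Two one-element "gradients" with global norm sqrt(3^2 + 4^2) = 5.0.
clipped = clip_by_global_norm([[3.0], [4.0]], clip_norm=1.0)     # scaled by 1.0 / 5.0
unchanged = clip_by_global_norm([[3.0], [4.0]], clip_norm=10.0)  # norm already below clip_norm
```

Because the scale factor is shared, clipping preserves the direction of the combined gradient vector and only shrinks its length.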
- add llvm::SmallVector to paddle (#34832) · 11965bca
  Committed by chentianyu03 on Sep 10, 2021
```
* add llvm::SmallVector to paddle

* rename small vector file

* merge paddle small vector to one file

* add small_vector_test

* modify smallvector test argument type

* add string header
```
- import ska flat_hash_map (#34464) · 3d9603dc
  Committed by chentianyu03 on Sep 10, 2021
```
* import ska flat_hash_map

* add define NOMINMAX macro to fix windows build failed bug

* add brackets to std::max in flat_hash_map

* move flat_hash_map directions

* modify namespace to paddle

* modify namespace to paddle

* modify namespace to paddle

* modify namespace to paddle

* rm not used map.h and replace with op_info
```
- add prelu trt converter test case (#35512) · 749945b3
  Committed by baoachun on Sep 10, 2021
- change trt_tile_op half diff and add some func for CE (#35597) · 922e23bf
  Committed by feng_shuai on Sep 10, 2021
- add elementwise trt converter test cases (#35552) · 29cacee4
  Committed by baoachun on Sep 10, 2021
- [NPU] support gradient_accumulator (#35044) · 0b6623d7
  Committed by ronnywang on Sep 10, 2021
- fix bug of recompute in hybridparallel (#35588) · d53e567a
  Committed by ShenLiang on Sep 10, 2021
- Add As_extra to dropout op and lrn op (#35349) · 652da1f4
  Committed by huangjun12 on Sep 10, 2021
```
* add as_extra to dropout op and lrn op

* refine details

* fix dropout op maker
```
- fix extra op for expand, expand_as, tile, unstack (#35598) · 9c9eba13
  Committed by Jiawei Wang on Sep 10, 2021
```
* fix extra op for expand, expand_as, tile, unstack

* Update expand_v2_op.cc
```
- fix bn/in/squeeze/syncbn extra (#35502) · d7985052
  Committed by ceci3 on Sep 10, 2021
```
* fix bn/in/squeeze/syncbn extra

* update bn

* update

* update
```
- add opdef extra (#35514) · 3896bdbd
  Committed by Shang Zhizhou on Sep 10, 2021
```
* add opdef extra

* add reduce mean

* update style
```
- Fix scatter and gather bug (#35595) · 6f7aca9e
  Committed by Zeng Jinle on Sep 10, 2021
```
* fix scatter gather bug:

* fix windows ci
```
- conv3d (#35507) · 42847d2e
  Committed by wenbin on Sep 10, 2021
```
* conv3d

* remove const_cast

* modify ut

* disable dynamic shape for trt6.0

* remove trt5
```
- add asExtra for nce op (#35474) · 512329b0
  Committed by pangyoki on Sep 10, 2021
```
* add asExtra for nce op

* fix unittest error in macos

* remove asExtra for is_test
```
Sep 9, 2021 · 9 commits
- mark extra attr for unsqueeze2 (#35528) · 4beaa754
  Committed by Wei Shengyu on Sep 9, 2021
```
* mark extra attr for unsqueeze2

* debug for inference
```
- optimization of index_select forward op (#32863) · f05e444a
  Committed by crystal on Sep 9, 2021
- quant: fix a export bug (#35410) · 81e702ac
  Committed by XGZhang on Sep 9, 2021
- Update quant_layers.py (#35392) · 2d6871d3
  Committed by XGZhang on Sep 9, 2021
- Add extra flags for attr of affine_grid_op (#35581) · 7fcb9e37
  Committed by whs on Sep 9, 2021
- test=document_fix (#35606) · 92810e69
  Committed by zhangchunle on Sep 9, 2021
- test=document_fix (#35592) · 16a2fdaf
  Committed by tianshuo78520a on Sep 9, 2021
- add a fusion op: fused_residual_dropout_bias (#34963) · cf8bf032
  Committed by zhangkaihuo on Sep 9, 2021
- Add matrix_rank Op and it's GPU and CPU kernel (#34823) · eb1fbf12
  Committed by 0x45f on Sep 9, 2021
```
* init matrix_rank op, add matrix_rank CPU code and test

* add GPU kernel, remove svd_eigen.h

* add CPU kernel when tol is tensor

* add cpu and gpu code when tol is tensor

* fix CI-ROCM error

* add matrix_rank API describe, fix PR-CI-Py3 error

* fix PR-CI-Windows error, add matrix_rank API test

* delete useless comments

* fix review

* add my code in svd_helper.h

* update doc commets

* remove spaces
```
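The `matrix_rank` op added in #34823 reports the rank of a matrix as the number of singular values greater than a tolerance `tol` (the commit notes reference `svd_helper.h`). Implementing an SVD in plain Python would be long, so the sketch below swaps in a different technique, Gaussian elimination with partial pivoting that counts pivots larger than `tol`. It agrees with the SVD-based count on well-conditioned inputs but is less robust numerically. `matrix_rank` here is a hypothetical helper, and its default tolerance is an assumption, not Paddle's default:

```python
from typing import List, Optional


def matrix_rank(a: List[List[float]], tol: Optional[float] = None) -> int:
    """Rank by Gaussian elimination: count pivots whose magnitude exceeds tol."""
    m = [row[:] for row in a]  # work on a copy so the caller's matrix is untouched
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    if tol is None:
        # Hypothetical default: scale machine epsilon by the matrix size and largest entry.
        largest = max((abs(x) for row in m for x in row), default=0.0)
        tol = max(n_rows, n_cols) * largest * 2.220446049250313e-16
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Partial pivoting: pick the row at or below `rank` with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol:
            continue  # column is (numerically) dependent on earlier columns
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every row below the pivot row.
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank
```

For example, `matrix_rank([[1, 2], [2, 4]])` is `1` because the second row is twice the first. Passing an explicit `tol` mirrors the commit's case where the tolerance is supplied by the caller: `matrix_rank([[1, 0], [0, 1e-10]], tol=1e-5)` treats the tiny diagonal entry as zero and returns `1`.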
Sep 8, 2021 · 7 commits

- add backward inplace for dygraph (#35412) · 0cb413d3
  Committed by Leo Chen on Sep 8, 2021
```
* add backward inplace for dygraph

* fix bug

* support gradient accumulation
```
- Change depth as 1 when cloning benchmark code,test=document_fix (#35590) · abe70d3e
  Committed by xiegegege on Sep 8, 2021
- Refactor softmax_cudnn kernel impl for code reuse. (#35350) · ef61da86
  Committed by Li Min on Sep 8, 2021
- modify unittest parallel rule to avoid UT failure (#35567) · 1159f753
  Committed by zhouweiwei2014 on Sep 8, 2021
- add API Tensor.T for reverse dim of Tensor (#35379) · 2133f3dd
  Committed by zhouweiwei2014 on Sep 8, 2021
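`Tensor.T` (added in #35379) permutes a tensor with its dimensions reversed: for an n-D tensor it is a transpose with permutation `[n-1, ..., 1, 0]`, so a 2-D tensor gets the ordinary matrix transpose and a 1-D tensor is unchanged. A plain-Python sketch of that index mapping on rectangular nested lists; `shape_of`, `get_at`, `build`, and `reverse_dims` are hypothetical helpers, not Paddle's implementation:

```python
from typing import Any, Callable, Tuple


def shape_of(x: Any) -> Tuple[int, ...]:
    """Shape of a rectangular nested list; a scalar has shape ()."""
    shape = []
    while isinstance(x, list):
        shape.append(len(x))
        x = x[0] if x else None
    return tuple(shape)


def get_at(x: Any, idx: Tuple[int, ...]) -> Any:
    """Read the element of nested list x at multi-index idx."""
    for i in idx:
        x = x[i]
    return x


def build(shape: Tuple[int, ...], fn: Callable[[Tuple[int, ...]], Any],
          prefix: Tuple[int, ...] = ()) -> Any:
    """Build a nested list of the given shape, filling each slot with fn(its index)."""
    if not shape:
        return fn(prefix)
    return [build(shape[1:], fn, prefix + (i,)) for i in range(shape[0])]


def reverse_dims(x: Any) -> Any:
    """Reverse all dimensions: out[j0, ..., j(n-1)] = x[j(n-1), ..., j0]."""
    out_shape = shape_of(x)[::-1]
    return build(out_shape, lambda idx: get_at(x, idx[::-1]))
```

For example, `reverse_dims([[1, 2, 3], [4, 5, 6]])` returns `[[1, 4], [2, 5], [3, 6]]`, and a tensor of shape `(2, 3, 4)` comes back with shape `(4, 3, 2)`.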

- [Auto Parallel] Integrate all modules (#35483) · 12155358
  Committed by Yulong Ao on Sep 8, 2021

* add auto_parallel dir

* mv to paddle.distributed

* add shard_xx api

* add distributed attrs for var

* add ut, test=develop

* add dist

* update

* update

* update

* update

* update

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update

* update

* update

* update

* update

* update, test=develop

* update, test=develop

* update

* update

* delete unused proto

* resotre op_desc

* restore type_defs

* update var_desc

* remove dimss_mapping for proto_pybind

* update interface.py

* update framework.py

* update

* update

* add auto_parallel dir

* mv to paddle.distributed

* add shard_xx api

* add distributed attrs for var

* add ut, test=develop

* [WIP] Add the auto completion feature and related codes

* [WIP] Improve the auto completion and related codes

* [WIP] Make the auto completion to support data-parallel

* [WIP] Make the completion support mp and dp+mp

* [WIP] Refactor auto completion unit test for MLP

* [WIP] Refactor the implementation of DistributedOperatorImpl

* [WIP] Improve dims_mapping update rule and fix a bug

* [WIP] Support auto completion for one transformer decoder layer

* [WIP] Add a minor change

* [WIP] Fix a bug within the uint test

* Shard XShape tensor, add embedding completion and refactor code

* Add the distributed_operators dir to setup.py.in

* Improve the completion process and add the unittest for gpt

* fix process_mesh ut

* fix process_mesh ut

* update

* update, test=develop

* Add support for automatically completing distributed attrs of special ops

* update

* update

* update

* fix doc sample codes, test=develop

* improve coverage, test=develop

* add static_mode check, test=develop

* Model the cluster for cost model and physical mapping

* update, test=develop

* add set_placement, test=develop

* Add the check to make sure the candidate tensors' size is great than zero

* update doc, test=develop

* update doc, test=develop

* update doc, test=develop

* update doc, test=develop

* update, test=develop

* Auto mark dist attrs annotated by user

* update ndarray to nested list, test=develop

* update, test=develop

* Add auto-completion module for auto-parallel (based on PR#33804)

* Remove unnecessary files

* Remove unrelated files for the auto completion pr

* Update the unit test to improve the coverage

* Modify codes based on reviews

* Minor changes for CI

* Improve some codes based on new comments

* Fix bugs caused by shallow copy in attributes.py
* Imporve amend_distributed_attr_for_program in context.py
* Other changes for weihang's comments

* support shard reader

* support shard reader

* add parallel mode

* update process mesh

* add method to compute comm_group

* implement dist_embedding forward func

* implement dist matmul forward func

* implement dist reshape forward func

* add transpiler framework

* add transpiler forward

* implement transpiler forward

* implement transpiler backward & update

* add process

* add unitest

* chmod

* chmod

* chmod

* update unitest

* add unitest for gpt

* remove unused print

* rename transpiler --> partitioner

* rename transpiler --> partitioner

* chmod

* chmod

* bug fixed

* remove amp function

* update case for dp mode

* update case for dp mode

* [Auto Parallel] Integrate all parts with the newest code

* Integrate all parts of auto parallel and improve codes

* Integrate all parts by AutoParallelizer
* Add unit test for AutoParallelizer
* Improve auto completion module for pipeline parallel
* Add support for matmul_v2 in dist_matmul
* Correct the typo "stratergy" to "strategy"

* Modify distributed_strategy.proto to conform the main stream

* Restore parts of distributed_strategy to conform the develop branch
Co-authored-by: sandyhouse <lilong12@baidu.com>
Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>


- multiply supports bool · db5fd2a1
  Committed by will-jl944 on Sep 8, 2021
```
multiply supports bool  
```

机器未来 / Paddle is in sync with the upstream project it was forked from.
