Commits · 1159f7532d0748e26511861de36b6faabb5520d6 · PaddlePaddle / Paddle

08 Sep, 2021 21 commits

[Auto Parallel] Integrate all modules (#35483) · 12155358

Committed by Yulong Ao on Sep 08, 2021

* add auto_parallel dir

* mv to paddle.distributed

* add shard_xx api

* add distributed attrs for var

* add ut, test=develop

* add dist

* update

* update

* update

* update

* update

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update

* update

* update

* update

* update

* update, test=develop

* update, test=develop

* update

* update

* delete unused proto

* restore op_desc

* restore type_defs

* update var_desc

* remove dims_mapping for proto_pybind

* update interface.py

* update framework.py

* update

* update

* add auto_parallel dir

* mv to paddle.distributed

* add shard_xx api

* add distributed attrs for var

* add ut, test=develop

* [WIP] Add the auto completion feature and related codes

* [WIP] Improve the auto completion and related codes

* [WIP] Make the auto completion to support data-parallel

* [WIP] Make the completion support mp and dp+mp

* [WIP] Refactor auto completion unit test for MLP

* [WIP] Refactor the implementation of DistributedOperatorImpl

* [WIP] Improve dims_mapping update rule and fix a bug

* [WIP] Support auto completion for one transformer decoder layer

* [WIP] Add a minor change

* [WIP] Fix a bug within the unit test

* Shard XShape tensor, add embedding completion and refactor code

* Add the distributed_operators dir to setup.py.in

* Improve the completion process and add the unittest for gpt

* fix process_mesh ut

* fix process_mesh ut

* update

* update, test=develop

* Add support for automatically completing distributed attrs of special ops

* update

* update

* update

* fix doc sample codes, test=develop

* improve coverage, test=develop

* add static_mode check, test=develop

* Model the cluster for cost model and physical mapping

* update, test=develop

* add set_placement, test=develop

* Add the check to make sure the candidate tensors' size is greater than zero

* update doc, test=develop

* update doc, test=develop

* update doc, test=develop

* update doc, test=develop

* update, test=develop

* Auto mark dist attrs annotated by user

* update ndarray to nested list, test=develop

* update, test=develop

* Add auto-completion module for auto-parallel (based on PR#33804)

* Remove unnecessary files

* Remove unrelated files for the auto completion pr

* Update the unit test to improve the coverage

* Modify codes based on reviews

* Minor changes for CI

* Improve some codes based on new comments

* Fix bugs caused by shallow copy in attributes.py
* Improve amend_distributed_attr_for_program in context.py
* Other changes for weihang's comments

* support shard reader

* support shard reader

* add parallel mode

* update process mesh

* add method to compute comm_group

* implement dist_embedding forward func

* implement dist matmul forward func

* implement dist reshape forward func

* add transpiler framework

* add transpiler forward

* implement transpiler forward

* implement transpiler backward & update

* add process

* add unittest

* chmod

* chmod

* chmod

* update unittest

* add unittest for gpt

* remove unused print

* rename transpiler --> partitioner

* rename transpiler --> partitioner

* chmod

* chmod

* bug fixed

* remove amp function

* update case for dp mode

* update case for dp mode

* [Auto Parallel] Integrate all parts with the newest code

* Integrate all parts of auto parallel and improve codes

* Integrate all parts by AutoParallelizer
* Add unit test for AutoParallelizer
* Improve auto completion module for pipeline parallel
* Add support for matmul_v2 in dist_matmul
* Correct the typo "stratergy" to "strategy"

* Modify distributed_strategy.proto to conform to the mainstream

* Restore parts of distributed_strategy to conform to the develop branch
Co-authored-by: sandyhouse <lilong12@baidu.com>
Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>

12155358

multiply supports bool · db5fd2a1
Committed by will-jl944 on Sep 08, 2021
```
multiply supports bool  
```
db5fd2a1

modified pool_op for higher performance (#33144) · b95c5ae0
Committed by niuliling123 on Sep 08, 2021

b95c5ae0

refactor new executor (#35537) · 0eb7c942

Committed by wanghuancoder on Sep 08, 2021

* refactor new executor, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

0eb7c942

mark WhileOp AsExtra attribute (#35499) · ce7c18f6
Committed by CtfGo on Sep 08, 2021
```
* mark WhileOp AsExtra attribute

* revert kX and kOutputs
```
ce7c18f6

Modify the reduce op according to the kernel primitive api (#35282) · 82b33be3
Committed by niuliling123 on Sep 08, 2021

82b33be3

add clip_by_norm fp16 kernel (#35446) · 7aa4d879
Committed by Leo Chen on Sep 08, 2021
```
* add clip_by_norm fp16 kernel

* add ut
```
7aa4d879

Slice bug (#35357) · 28abd5d8

Committed by Shang Zhizhou on Sep 08, 2021

* update slice plugin

* add test

* fix code style

* fix trt6

* update test

* fix test

* add timeout

* update trt version

* update cmake

28abd5d8

Integrate GLOOParallelContext to support Multi-CPU Core for Dygraph DataParallel (#35154) · 51cc73f0

Committed by xiongkun on Sep 08, 2021

* can pass the fake test

* add files

* modify cmake to pass windows-ci

* for ci pass

* WITH_GLOO=ON

* for pass coverage test

* add cpuonly testcase

* add

* disable nccl when compile with cuda

* change python version in cpuonly

* add backend argument

* add required gpu

* add required:gpu

51cc73f0

Add AsExtra in relu6 Op Maker (#35472) · 692ac3e5
Committed by hong19860320 on Sep 08, 2021

692ac3e5

fix bug (#35482) · e133d8ef
Committed by Guoxia Wang on Sep 08, 2021

e133d8ef

Fix scatter_nd_add and gather bug (#35544) · 3c457a38
Committed by Zeng Jinle on Sep 08, 2021
```
* fix scatter_nd_add and gather bug

* fix gather compile error
```
3c457a38

Enable program passes on Fleet APIs (#34955) · 5f369881

Committed by Zeng Jinle on Sep 08, 2021

* add fleet api for program pass

* turn on apply pass for CI test

* fix disable fuse_all_optimizer bug

* try to test ci

* fix CI

* fill unspecified op role

* fix fuse_allreduce

* add ut to improve coverage

* remove useless change

* improve c++ coverage

* follow some comments

* test ir pass pipeline

* update doc

* reduce ut time again

5f369881

fix the bug of layer_norm when batch_size=1 (#35480) · ad5f7494

Committed by zhangkaihuo on Sep 08, 2021

The bug is an out-of-bounds access to mean and var: both have shape [batch_size], but the thread index idx ranges over 0~feature_size, so mean[idx] and var[idx] read past the end of the arrays.

When batch_size=1, the correct access is mean[0] and var[0]. A unit test with batch_size=1 is added.
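
The indexing mistake described above can be illustrated with a minimal pure-Python sketch. This is not the actual CUDA kernel from the commit: the function names `row_stats` and `layer_norm` and the `buggy` flag are hypothetical, and the loop variable `idx` only stands in for the thread index. Note that Python raises `IndexError` on the bad access, whereas a GPU kernel would read out-of-bounds memory instead.

```python
import math


def row_stats(x):
    """Return per-row mean and variance; both lists have length batch_size."""
    means = [sum(row) / len(row) for row in x]
    variances = [
        sum((v - m) ** 2 for v in row) / len(row)
        for row, m in zip(x, means)
    ]
    return means, variances


def layer_norm(x, eps=1e-5, buggy=False):
    """Normalize each row of x (shape [batch_size, feature_size]).

    With buggy=True the statistics are indexed by the feature index,
    mimicking the mean[idx] / var[idx] access described in the commit.
    """
    means, variances = row_stats(x)
    out = []
    for row_idx, row in enumerate(x):
        normalized = []
        for idx, v in enumerate(row):  # idx stands in for the thread index
            # Hypothetical switch: feature index (wrong) vs. row index (right).
            stat_idx = idx if buggy else row_idx
            normalized.append(
                (v - means[stat_idx]) / math.sqrt(variances[stat_idx] + eps)
            )
        out.append(normalized)
    return out


x = [[1.0, 2.0, 3.0, 4.0]]  # batch_size=1, feature_size=4
print(layer_norm(x))         # every element uses means[0] and variances[0]
```

Calling `layer_norm(x, buggy=True)` on the same input fails as soon as `idx` reaches 1, because the statistics hold only one entry per row.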

ad5f7494

Add FP16 PRelu (#35532) · 4e62af80
Committed by cc on Sep 08, 2021

4e62af80

merge CMakeList.txt manual (#35378) · c4a3e8b4

Committed by feng_shuai on Sep 08, 2021

* merge CMakeList.txt manual

* add platform for changethreadnum

* repair some bugs according to make error

* do nothing just flush CI

* forget change thread num

* add inplace_atol param for check_output_with_place

* Windows

* std::min and std::max should be changed because of Windows

c4a3e8b4

[NPU] release gil before op run (#35370) · db6242e9
Committed by Leo Chen on Sep 08, 2021
```
* release gil before op run

* support npu grad test

* fix op_test
```
db6242e9

Add op define extra for norm and frobenius norm op. (#35329) · 3dab2e20
Committed by Zhong Hui on Sep 08, 2021

3dab2e20

Work queue group (#35470) · a53460aa

Committed by liutiexing on Sep 08, 2021

* Split Tracker and WorkQueue

* add WorkQueueGroup

* add unittest

* fix

* update

* update

* fix compile

a53460aa

add the matmul v2 grad kernel · b3787d1b

Committed by wawltor on Sep 08, 2021

* add the matmul v2 grad kernel

* reduce the test case time

* update the test case for the matmul double grad

* remove the unused code for the matmul double grad

* update the test case for the double grad matmul

* remove the unused code in dot

b3787d1b

[NPU] add get_float_status op and refine NPU check_nan_inf (#35274) · c727ec4a
Committed by WangXi on Sep 08, 2021

c727ec4a

07 Sep, 2021 15 commits

support multi-node (#35396) · c6e0cedc
Committed by yaoxuefeng on Sep 07, 2021

c6e0cedc

add conv op check for illegal input or attributes (#35337) · 8307b0cb
Committed by wangxinxin08 on Sep 07, 2021
```
* add conv op check for illegal input or attributes
```
8307b0cb

Modify the elementwise op according to the kernel primitive API (#34456) · eae4bf5b
Committed by niuliling123 on Sep 07, 2021

eae4bf5b

add as-extra for softplus/leaky_relu/softmax (#35493) · b211f02b
Committed by Pei Yang on Sep 07, 2021

b211f02b

[NPU] update batch norm op, test=develop (#35223) · cc6d2b07
Committed by Qi Li on Sep 07, 2021
```
* [NPU] update batch norm op, test=develop

* add NHWC support for bn, test=develop
```
cc6d2b07

[NPU] Add norm_grad kernel (#35237) · cf408949

Committed by furnace on Sep 07, 2021

* [NPU] fix for test_norm_op_npu

* [NPU] add norm_grad

* [NPU] add CheckAxis for axis

* [NPU] delete debug codes

* norm can not use L2Normalize, norm_grad can use L2NormalizeGrad

* [NPU] delete useless codes

* [NPU] optimize norm_grad OpMaker

* Update python import path

cf408949

[NPU] log_softmax_grad, test=develop (#35484) · e928274c

Committed by Qi Li on Sep 07, 2021

* [NPU] log_softmax_grad, test=develop

* remove debug files, test=develop

* update lookup_table_v2 for CANN 5.0.x, test=develop

e928274c

[oneDNN] Disable cache matmul v1 & refactoring (#35331) · e9ae8dd0

Committed by Jacek Czaja on Sep 07, 2021

* - refactoring progressing

- Fix

- compilation fix

- another compilation fix

- refactoring

* - fix

* - compilation fix

* - compilation fix

* - missing set_format

* - compilation fix

* - reverted setting memory format

* - Brought back format

* - Fix

* - fixes after review

* CI rerun

* CI rerun

e9ae8dd0

Fix for reshape2 oneDNN op (#35455) · 36cdb6e2
Committed by jakpiase on Sep 07, 2021
```
* fix for reshape2

* added reviewers' suggestions
```
36cdb6e2

fix int8 (#35504) · ed97be09
Committed by ceci3 on Sep 07, 2021

ed97be09

operators/flatten_op.cc add AsExtra (#35471) · 0c71edc3
Committed by dyning on Sep 07, 2021
```
* operators/flatten_op.cc add AsExtra

* operators/flatten_op.cc add AsExtra

* fix format
```
0c71edc3

add AsExtra in data_norm op (#35420) · 7907e241

Committed by XiangGao on Sep 07, 2021

* add AsExtra in data_norm op

* pass data_layout from python to data_norm op

* fix data_layout in data_norm op
Co-authored-by: root <root@bjyz-sys-gpu-kongming9.bjyz.baidu.com>

7907e241

Fix DryRun unittest failed from test_standalon_executor.py (#35433) · 071e8156
Committed by Aurelius84 on Sep 07, 2021
```
* fix commit

* Open unittest

* fix unittest on Windows

* fix constructor
```
071e8156

support test different infer_ut suite type (#35435) · 5bb12853

Committed by Peihan on Sep 07, 2021

* notest,test=inference;support test different suite type

* notest,test=inference;fix script bugs

* notest,test=inference;fix count time issue

* test=document_fix; fix readme grammar

5bb12853

[HIP] fix op not support AMD GPU bug, the flag PADDLE_WITH_ROCM is in… (#35394) · 28b64075

Committed by xiayanming on Sep 07, 2021

* [HIP] fix op not support AMD GPU bug, the flag PADDLE_WITH_ROCM is invalid

* [HIP] fix op not support AMD GPU bug, the flag PADDLE_WITH_ROCM is invalid

* [HIP] fix op not support AMD GPU bug

28b64075

06 Sep, 2021 4 commits
- support double in deformable conv (#35330) · 266fcbe0
  Committed by wangguanzhong on Sep 06, 2021
```
* support double in deformable conv

* add double for dcn v2
```
  266fcbe0
- Add the extra flag for the some ops (#35442) · 49797d85
  Committed by wawltor on Sep 06, 2021
```
* Add the extra flag for the some ops

* fix the compile problem in matmul extra
```
  49797d85
- Add fusion_lstm INT8 PTQ (#35334) · 7ef04da6
  Committed by joanna.wozna.intel on Sep 06, 2021
```
* Add fusion_lstm INT8 PTQ

* Correct mkldnn_cache_capacity and enable fc_lstm_fuse_pass only for this test

* Change mkldnn_cache_capacity
```
  7ef04da6
- Add grad grad for AvgPool2D (#35388) · 97798f9a
  Committed by Wei Shengyu on Sep 06, 2021
```
* add pool2d grad grad

* dbg

* add unittest

* update format

* add more unittests

* dbg
```
  97798f9a

PaddlePaddle / Paddle: synced successfully more than 1 year ago