Commits · 44b81e633b87ed2366649f13523d1a4b24082ad6 · BaiXuePrincess / Paddle

Jan 07, 2021 · 5 commits

[Cherry-pick] Layer norm fp16 and Nvidia optimize (#29169 #29434 #29522 #29576) (#30110) · 44b81e63

Committed by furnace on Jan 07, 2021

* Layer norm fp16 (#29169)

* add fp16 for layer_norm op

* revert layernorm api

* fix forward

* fix forward

* fix backward for layernorm with fp16

* fix unit test for layernorm with fp16

* fix with_mkldnn compile error for layernorm with fp16

* 1. revert to PADDLE_ENFORCE_NOT_NULL, 2. change static_cast<float> to static_cast<U>

* fix with_mkldnn compile error for layernorm with fp16

* fix with_mkldnn compile error for layernorm with fp16
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>

* fix layer_norm accuracy (#29434)

* Layernorm opt (#29522)

* layernorm fw opt

* layernorm bw opt

* fix typo, test=develop

* remove const dim3 for windows CI compatibility

* merge develop
Co-authored-by: zlsh80826 <zlsh80826@gmail.com>

* Fix compile problem when cuda_arch < 6000 (#29576)

* fix compile problem when cuda_arch < 6000

* refine code

* refine code
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
Co-authored-by: zlsh80826 <zlsh80826@gmail.com>

44b81e63
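The `static_cast<float>` to `static_cast<U>` change above hints that the fp16 layer norm computes its statistics in a separate accumulation type `U` rather than in the fp16 storage type. A minimal pure-Python sketch of that idea, assuming fp16 storage is emulated with the `struct` half-precision `'e'` format and Python floats (doubles) stand in for `U`; `to_fp16` and `layer_norm_fp16` are hypothetical helpers, not Paddle functions:

```python
import struct
from typing import List

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def layer_norm_fp16(x: List[float], gamma: List[float], beta: List[float],
                    eps: float = 1e-5) -> List[float]:
    # Hypothetical sketch: inputs are fp16 values ("T"); statistics use doubles ("U").
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / (var + eps) ** 0.5
    # Normalize and apply the affine transform in high precision,
    # rounding only the final result back to fp16.
    return [to_fp16((v - mean) * inv_std * g + b)
            for v, g, b in zip(x, gamma, beta)]
```

Accumulating in the wider type avoids the rounding error that summing many fp16 values directly would introduce; only the stored output pays the fp16 precision cost.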

add inference api: DisableTensorRtOps (#30109) (#30178) · cb71fea0
Committed by Shang Zhizhou on Jan 07, 2021

cb71fea0

pre padding in dygraph (#30179) · a2b0357d
Committed by tangwei12 on Jan 07, 2021
```
Change-Id: Ia5279b0cbb6a5b3970aff66e9510e0d85efa70ce
```
a2b0357d

fix xpu pe sync, test=notest (#30095) (#30114) · 85545bbc
Committed by liuyuhui on Jan 07, 2021

85545bbc

Cherry pick bn (#30136) · 157ff094

Committed by ceci3 on Jan 07, 2021

* fix bn docs (#30096)

* add attribute for batch_norm (#29950)

* add attribute for batch_norm

157ff094

Jan 06, 2021 · 5 commits

support dygraph in xpu place (#30051) (#30112) · 285f33e5

Committed by hong on Jan 06, 2021

* support dygraph in xpu place; test=develop

* fix cpu/gpu compile error; test=develop

* fix compile error; test=develop

* fix xpu compile error; testd=develop

285f33e5

Cherrypick 30071 (#30074) · 19bec2fe
Committed by gongweibao on Jan 06, 2021
```
* fix log test=release/2.0

* fix ut test=develop
```
19bec2fe

[Cherry-pick]cherry-pick to Release/2.0 (#30076) · 1ad7fcbf

Committed by huangxu96 on Jan 06, 2021

* add fp16 check into max and avg pool (#29479)

* Add ReserveSpace in dygraph batch_norm. (#29221)

* Add ReserveSpace in dygraph batch_norm.

* Add unittest for reservespace

* add float16 into adaptive_avg_pool2d check list. (#29547)

1ad7fcbf

[Cherry-Pick 2.0][Dynamic Inplace] Support ShareInplaceVersionCounterWith for... · 743649b5

Committed by liym27 on Jan 06, 2021

[Cherry-Pick 2.0][Dynamic Inplace] Support ShareInplaceVersionCounterWith for C++ Tensor (#29842) (#30105)

Before this PR, SharePlaceHolderWith shared a Tensor between different C++ Variables, which means sharing the Tensor's data, shape, and inplace_version_counter_.
But some cases need to share the data and inplace_version_counter_ without sharing the shape. For example, the inplace op reshape cannot share the shape.

This PR discards SharePlaceHolderWith and exposes ShareInplaceVersionCounterWith for C++ Tensor.
This reverts commit b10ecd9d.

* Support ShareInplaceVersionCounterWith to share the same inplace version counter for VarBase

743649b5
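The distinction drawn above (share the data and the inplace version counter, but keep a separate shape) can be modeled in a few lines of Python. This is a conceptual sketch only: `VersionCounter`, `ToyTensor`, and `reshape_inplace` are hypothetical names, not Paddle's C++ `Tensor` API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class VersionCounter:
    # Hypothetical counter that records how often shared storage was modified.
    version: int = 0

    def bump(self) -> None:
        self.version += 1

@dataclass
class ToyTensor:
    data: List[float]            # storage that may be shared between tensors
    shape: Tuple[int, ...]       # metadata that each tensor keeps for itself
    counter: VersionCounter = field(default_factory=VersionCounter)

    def share_inplace_version_counter_with(self, other: "ToyTensor") -> None:
        # Share only the version counter; data and shape are handled separately.
        self.counter = other.counter

def reshape_inplace(x: ToyTensor, new_shape: Tuple[int, ...]) -> ToyTensor:
    # Reuse x's storage but give the output its own shape.
    out = ToyTensor(x.data, new_shape)
    out.share_inplace_version_counter_with(x)
    # An inplace op records a modification on the shared counter.
    out.counter.bump()
    return out
```

Because the counter object is shared, a later consumer of either tensor can detect that the storage was modified in place, even though the two tensors report different shapes.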

[Cherry-pick 2.0] Migrate 4 APIs about array to paddle.tensor.* (#29565) (#30101) · 52caf787
Committed by liym27 on Jan 06, 2021
```
4 APIs: array_length, array_read, array_write, create_array, cherry-pick #29565
```
52caf787

Jan 05, 2021 · 7 commits

add topo-aware in heter-ps (#30087) (#30117) · 7fc2ce50
Committed by Thunderbrook on Jan 05, 2021
```
* add topo aware

* resource.h

* topo aware

* format
```
7fc2ce50

[Cherry-pick 2.0] cherry pick 3 PRs about Dynamic-to-Static (#30100) · faeee3c3

Committed by liym27 on Jan 05, 2021

* [cherry-pick 2.0] Fix unitest test_slice (#29740)

Before this commit, test_slice used the old API `dygraph_to_static_func` for Dynamic-to-Static conversion and called Executor explicitly, which is not recommended for users.
After the fix, it uses the recommended API `paddle.jit.to_static` in place of `dygraph_to_static_func`, which no longer triggers the random exception on the coverage CI.

* [cherry-pick 2.0][Dy2Stat] Support grammar: for ele in var[idx] (#29541)

Support transforming `for ele in var` statements in which var is a slice of a Tensor.

* [cherry-pick 2.0][Dy2Stat] Fix bug for loop: a variable is used and created in loop, but used before created (#29769)

faeee3c3

[cherry-pick 2.0] Support dygraph quant model and avoid the scale to be infinity (#30098) · 3fe71d0a

Committed by cc on Jan 05, 2021

* fix ininite scale values (#29386)

* Support dygraph quant model (#29927)

* Avoid the scale to be infinity in quant2_int8_mkldnn_pass, test=develop
* support quantized model for paddle2.0 dygraph, test=develop
Co-authored-by: Wojciech Uss <wojciech.uss@intel.com>

3fe71d0a
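One common way a quantization scale becomes infinite is dividing by a zero (or denormal) max-abs value, for example for an all-zero tensor. The sketch below shows a guard against that. The formula `scale = 127 / max_abs`, the fallback value of 1.0, and the name `safe_int8_scale` are assumptions for illustration, not the actual code of `quant2_int8_mkldnn_pass`.

```python
import math
from typing import Sequence

def safe_int8_scale(values: Sequence[float], max_int: float = 127.0,
                    fallback: float = 1.0) -> float:
    """Hypothetical helper: max_int / max|v|, or `fallback` if that is not finite."""
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        # An all-zero (or empty) tensor would give max_int / 0.
        return fallback
    scale = max_int / max_abs
    # A denormal max_abs can still overflow the division to inf.
    return scale if math.isfinite(scale) else fallback
```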

fix test=release/2.0 (#30045) · 6e2066b0
Committed by gongweibao on Jan 05, 2021

6e2066b0

[cherry pick]Set FLAGS_selected_gpus for spawn (#29962) (#30097) · cda7397f

Committed by Chen Weihang on Jan 05, 2021

Set FLAGS_selected_gpus for spawn.

When the child process starts, it inherits the main process's configuration and sets the FLAGS once, but the environment variable has not been set yet at that point, so FLAGS_selected_gpus stays the same as in the main process (usually empty). The flags are therefore updated manually here.

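Note: a unit test was added and then removed. Its output showed that nvidia-smi on the CI machine reports only two GPUs, while more than two are needed to test this issue.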

cda7397f
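The ordering problem described above (flags read once, before the per-process environment variable exists) can be reproduced with a plain-Python model. `read_flags_once` and `start_worker` are hypothetical helpers; this is not the actual code of Paddle's spawn launcher.

```python
import os
from typing import Dict

def read_flags_once() -> Dict[str, str]:
    # Hypothetical: snapshot every FLAGS_* environment variable, as a process might at start-up.
    return {k: v for k, v in os.environ.items() if k.startswith("FLAGS_")}

def start_worker(gpu_id: str) -> Dict[str, str]:
    # The snapshot is taken before the per-worker variable is exported, so it
    # still reflects the parent process (often no FLAGS_selected_gpus at all).
    flags = read_flags_once()
    os.environ["FLAGS_selected_gpus"] = gpu_id
    # Setting the environment variable does not refresh the snapshot,
    # so the stale entry is updated by hand.
    flags["FLAGS_selected_gpus"] = gpu_id
    return flags
```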

fix large scale memory (#30035) (#30085) · e3975223

Committed by tangwei12 on Jan 05, 2021

* memory holder optimize

Change-Id: Ic91af8ac6f2853336d28a9fbbc5e8d0c57b5d05e

* memory holder optimize

Change-Id: I2fd1c14ecc17f5d5ce88b87890381ea801e6367f

* fix large scale memory holder

Change-Id: Ief0992b02b00220e16c72cc637a56e7b5788140f

* fix large scale memory holder

Change-Id: I910142a3952ead643a5604f8f80955f3e6efe655

e3975223

[cherry-pick] Add mkldnn interpolate op, support manual enable mkldnn interpolate op (#30083) · 9a6926f5
Committed by cc on Jan 05, 2021

9a6926f5

Jan 04, 2021 · 3 commits
- [cherry pick 2.0]support deepcopy for Layer/Tensor/Paramerbase (#29387) (#29873) · c06350c9
  Committed by Zhou Wei on Jan 04, 2021
  ```
  * support deepcopy for Layer/Tensor/Paramerbase

  * fix some code
  ```
  c06350c9
- make lite subgraph support multiple tensor precision. (#30055) · 878b6972
  Committed by Wilber on Jan 04, 2021

  878b6972
- fix op version checker of pass bug (#30028) (#30084) · 477b0c46
  Committed by Shang Zhizhou on Jan 04, 2021

  477b0c46
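Supporting `copy.deepcopy` on a Python object generally means implementing the `__deepcopy__(self, memo)` protocol, so that an object referenced several times is copied once and reused through `memo`. A minimal sketch with hypothetical `ToyParam` and `ToyLayer` classes (not Paddle's `Layer`, `Tensor`, or `ParamBase`):

```python
import copy
from typing import Dict, List

class ToyParam:
    """Hypothetical stand-in for a parameter that owns a list of values."""

    def __init__(self, values: List[float], name: str) -> None:
        self.values = values
        self.name = name

    def __deepcopy__(self, memo: Dict[int, object]) -> "ToyParam":
        # Copy the storage so the clone never aliases the original values.
        new = ToyParam(list(self.values), self.name)
        # Register the clone so later references to the same object reuse it.
        memo[id(self)] = new
        return new

class ToyLayer:
    """Hypothetical layer that references one parameter twice (weight tying)."""

    def __init__(self) -> None:
        self.weight = ToyParam([1.0, 2.0], "w")
        self.tied = self.weight

layer = ToyLayer()
clone = copy.deepcopy(layer)
```

`ToyLayer` needs no `__deepcopy__` of its own: the default machinery deep-copies its `__dict__`, calls `ToyParam.__deepcopy__` for the first reference, and finds the clone in `memo` for the second, so the tie survives in the copy.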
Dec 31, 2020 · 6 commits
- add the paddle.distributed.split api (#29970) (#30041) · 84c2315a
  Committed by lilong12 on Dec 31, 2020
  ```
  * add distributed.split, test=develop
  ```
  84c2315a
- fix the bug in pipeline data parallelism (#29731) (#29918) · f0e04e1f
  Committed by lilong12 on Dec 31, 2020
  ```
  * update, test=develop
  ```
  f0e04e1f
- [Cherry-pick] Disable gloo by default #29559 #29805 (#29601) · 640f8cf0
  Committed by lilong12 on Dec 31, 2020
  ```
  * update, test=develop (#29559)

  * Disable gloo by default (#29805)

  * update, test=develop

  * update, test=develop
  ```
  640f8cf0
- [cherry-pick] hardsigmoid add attr slope and offset (#29999) (#30032) · 38f83788
  Committed by zhupengyang on Dec 31, 2020
  ```
  test=develop
  ```
  38f83788
- [cherry-pick] add alias for upsample (#29984) · 5d5faba8
  Committed by xiaoting on Dec 31, 2020
  ```
  * add alias for upsample, test=develop

  * add alias for upsample

  * fix example
  ```
  5d5faba8
- [cherry-pick]update release/2.0 readme (#30015) · a01b7b7e
  Committed by Chen Long on Dec 31, 2020
  ```
  * update readme

  * update readme test=document_fix
  ```
  a01b7b7e
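`paddle.distributed.split`, as described in the 2.0 API docs, is a model-parallel helper for splitting linear and embedding layers across devices. The arithmetic behind a column-wise split of a linear layer can be checked without any framework: each of N hypothetical ranks holds a block of the weight's columns, computes a partial output, and concatenating the partial outputs reproduces the full product. Plain Python with hypothetical helper names; it does not call the Paddle API.

```python
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Naive (n x m) @ (m x p) product.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def split_columns(w: Matrix, num_partitions: int) -> List[Matrix]:
    # Hypothetical: give each rank a contiguous block of columns (assumes even division).
    cols = len(w[0]) // num_partitions
    return [[row[p * cols:(p + 1) * cols] for row in w] for p in range(num_partitions)]

def column_parallel_linear(x: Matrix, w: Matrix, num_partitions: int) -> Matrix:
    # Each "rank" multiplies the full input by its own column block ...
    partials = [matmul(x, shard) for shard in split_columns(w, num_partitions)]
    # ... and an all-gather along the last axis is modeled as row-wise concatenation.
    return [sum((p[i] for p in partials), []) for i in range(len(x))]
```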
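For the hardsigmoid commit above, `slope` and `offset` are the coefficients of the activation's linear segment, `y = max(0, min(1, slope * x + offset))`. A plain-Python sketch of that formula; the defaults `slope=1/6` and `offset=0.5` are an assumption for illustration, so check the 2.0 API reference for the actual defaults.

```python
def hardsigmoid(x: float, slope: float = 1.0 / 6.0, offset: float = 0.5) -> float:
    """Piecewise-linear sigmoid: clip(slope * x + offset, 0, 1). Defaults are assumed."""
    return max(0.0, min(1.0, slope * x + offset))
```

With these assumed defaults the output saturates at roughly x = -3 (to 0) and x = 3 (to 1); changing `slope` moves those saturation points, and `offset` sets the value at x = 0.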
Dec 30, 2020 · 3 commits
- fix the state_dict bug for the xpu (#30008) · 9859afa9
  Committed by wawltor on Dec 30, 2020

  9859afa9
- [cherry-pick] Fix 2.0 bugs (#29992) · faf2bb39
  Committed by Chen Long on Dec 30, 2020
  ```
  * fix doc bugs test=document_fix

  * fix code bugs test=document_fix

  * fix code bugs test=document_fix

  * fix doc bugs test=document_fix

  * fix doc bugs test=document_fix

  * fix doc bugs test=document_fix
  ```
  faf2bb39
- Fix rotation bug when use cv2 backend (#29933) (#29982) · d6a4f89a
  Committed by LielinJiang on Dec 30, 2020
  ```
  * fix cv2 rotation
  ```
  d6a4f89a
Dec 29, 2020 · 10 commits

[Kunlun] 2.0 cherry-pick:Support for Baidu Kunlun XPU multi card training (#29713) · 847aa172

Committed by liuyuhui on Dec 29, 2020

* [Kunlun] PR1:Support one Kunlun card training in parallel executor (#29337)

* [Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor (#29574)

* [Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor  (#29926)

* add bkcl.so in whl for kunlun (#29947)

* [Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor  (#29961)
Co-authored-by: QingshuChen <qingshu.chen714@gmail.com>

847aa172

[Cherry-pick] Complex network execute support (#29905) · 91ebc460

Committed by Chen Weihang on Dec 29, 2020

* [Complex] Add support for complex grad accumulated (#29889)

* add support for complex grad accumulated

* add unittest for coverage

* update test dtype

* remove useless blank line

* [Complex] Handle complex to real after type promotion (#29855)

* try to add fwd op input dtypes

* refactor base impl

* return tmp_ins after dygraph prepare data

* fix typo found in debug

* polish comment & add complex net test

* revert detail change

* fix unittest failed

* add complex kernel condition control

* fix xpu test failed & polish comment

* polish details by review comments

* Complex op test (#29753)

* delete no need to calculate inputs in dygraph op_test

* delete no need to calculate inputs in dygraph op_test

* change grad elementwise_mul for complex types (#29757)

* add conj op for complex types

* add conj for complex types

* add more test case

* add conj_op test

* modify conj api and impl

* add complex type for fill_constant_op xpu

* add setConstant for complex type

* remove complex conj test file

* user define grad for test_conj_op

* add test case for static mode of conj api

* modify conj doc

* change input args name to x

* remove useless codes

* conj support real types

* add conj test case for real number

* delete no need to calculate inputs in dygraph op_test

* delete no need to calculate inputs in dygraph op_test

* modify grad of mul for complex types

* fix the grads of inputs args order not match bug

* change the grad of div when complex types (#29804)

* change the grad of div when complex types

* fix the grads of inputs args order not match bug
Co-authored-by: chentianyu03 <chentianyu03@baidu.com>

91ebc460
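The commits above change the gradients of `elementwise_mul` and `elementwise_div` for complex types and add a `conj` op, but the messages do not state the formulas. Assuming the conjugate convention for complex gradients, `out = x * y` gives `dx = dout * conj(y)` and `dy = dout * conj(x)`, while `out = x / y` gives `dx = dout / conj(y)` and `dy = -dout * conj(out) / conj(y)`. A plain-Python sketch with hypothetical function names:

```python
from typing import Tuple

def mul_grad(x: complex, y: complex, dout: complex) -> Tuple[complex, complex]:
    # Hypothetical sketch: each input's gradient uses the conjugate of the other input.
    return dout * y.conjugate(), dout * x.conjugate()

def div_grad(x: complex, y: complex, dout: complex) -> Tuple[complex, complex]:
    # Hypothetical sketch of the conjugate-convention gradients of out = x / y.
    out = x / y
    dx = dout / y.conjugate()
    dy = -dout * out.conjugate() / y.conjugate()
    return dx, dy
```

For purely real inputs the conjugates are no-ops, so both functions reduce to the familiar real-valued gradients; that is why the change only affects complex dtypes.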

[cherry-pick] #26920 , #22924 (#29948) · bea300dd
Committed by 石晓伟 on Dec 29, 2020

bea300dd

[cherry-pick] map matmul/squeeze2+matmul/reshape2+matmul to mul #29911 (#29980) · 160b3477
Committed by cc on Dec 29, 2020

160b3477

Support mips (#29943) · 5a8d43bb
Committed by Wilber on Dec 29, 2020

5a8d43bb

cherry pick heter ps (#29955) · a839ddca
Committed by Thunderbrook on Dec 29, 2020
```
* cherry pick heter ps

* CMakeList
```
a839ddca

[Inference] FLAGS_call_statck is turned on default when ON_INFER=ON (#29800) · fae406ae
Committed by Wilber on Dec 29, 2020
```
* [Inference] FLAGS_call_statck is turned on default when ON_INFER=ON

* cherry-pick 29828
```
fae406ae

[Inference] Solve 2.0 trt performance reduce compare 1.8. (#29925) (#29964) · bf653d8e
Committed by Wilber on Dec 29, 2020

bf653d8e

Fix Conv2DTanspose bug when padding='same' (#29915) (#29936) · acb29ff8
Committed by LielinJiang on Dec 29, 2020
```
* fix conv_transpose bug when padding=same
```
acb29ff8
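With `padding='same'`, a transposed convolution is usually expected to produce an output `stride` times the input size. Under that assumption the implicit padding follows from the standard output-size formula `out = (in - 1) * stride - pad_total + dilation * (kernel - 1) + 1`. The helper below solves it for `pad_total`; the name and the choice to put an odd leftover pixel at the end are illustrative assumptions, not the code of the fix.

```python
from typing import Tuple

def same_padding_conv_transpose(in_size: int, kernel: int, stride: int = 1,
                                dilation: int = 1) -> Tuple[int, int]:
    """Hypothetical helper: (pad_before, pad_after) so the output is in_size * stride."""
    target = in_size * stride
    # Output size of a transposed convolution with zero padding.
    unpadded = (in_size - 1) * stride + dilation * (kernel - 1) + 1
    pad_total = unpadded - target
    if pad_total < 0:
        # Padding can only shrink the output; an output_padding would be needed instead.
        raise ValueError("effective kernel smaller than stride; 'same' needs output_padding")
    # Assumed convention: any odd leftover goes to the end.
    return pad_total // 2, pad_total - pad_total // 2
```

For example, a length-4 input with `kernel=3, stride=2` gives an unpadded output of 9, so one pixel of padding brings it to the expected 8.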

[cherry-pick] clean redundant API alias in 2.0 - part 1 #29928 (#29960) · c9c835b5

Committed by XiaoguangHu on Dec 28, 2020

* [cherry-pick] cherry-pick of PR#29928

* delete paddle.metric.chunk_eval and paddle.metric.mean_iou

* delete paddle.nn.clip and paddle.nn.clip_by_norm

* delete paddle.nn.functional.activation.hard_sigmoid and paddle.nn.functional.activation.hard_swish

* [cherry-pick] cherry-pick of PR#29928

* fix extension import error

c9c835b5

Dec 28, 2020 · 1 commit
- support some shape for matmul and cast in xpu place (#29900) (#29907) · d84b8e83
  Committed by taixiurong on Dec 28, 2020
  ```
  * support some shape in matmul and cast

  * modify matmul
  ```
  d84b8e83

BaiXuePrincess / Paddle is in sync with the upstream project it was forked from
