Commits · b95c5ae0413cf121c0fa2a80a46cdca532c08df7 · 机器未来 / Paddle

08 Sep, 2021 16 commits

add clip_by_norm fp16 kernel (#35446) · 7aa4d879
Committed by Leo Chen on Sep 08, 2021
```
* add clip_by_norm fp16 kernel

* add ut
```
7aa4d879

Committed by Shang Zhizhou on Sep 08, 2021

* update slice plugin

* add test

* fix code style

* fix trt6

* update test

* fix test

* add timeout

* update trt version

* update cmake

28abd5d8

Integrate GLOOParallelContext to support Multi-CPU Core for Dygraph DataParallel (#35154) · 51cc73f0

Committed by xiongkun on Sep 08, 2021

* can pass the fake test

* add files

* modify cmake to pass windows-ci

* for ci pass

* WITH_GLOO=ON

* for pass coverage test

* add cpuonly testcase

* add

* disable nccl when compile with cuda

* change python version in cpuonly

* add backend argument

* add required gpu

* add required:gpu

51cc73f0

fix bug (#35482) · e133d8ef
Committed by Guoxia Wang on Sep 08, 2021

e133d8ef
Fix scatter_nd_add and gather bug (#35544) · 3c457a38
Committed by Zeng Jinle on Sep 08, 2021
```
* fix scatter_add_nd and gather bug

* fix gather compile error
```
3c457a38

Enable program passes on Fleet APIs (#34955) · 5f369881

Committed by Zeng Jinle on Sep 08, 2021

* add fleet api for program pass

* turn on apply pass for CI test

* fix disable fuse_all_optimizer bug

* try to test ci

* fix CI

* fill unspecified op role

* fix fuse_allreduce

* add ut to improve coverage

* remove useless change

* improve c++ coverage

* follow some comments

* test ir pass pipeline

* update doc

* reduce ut time again

5f369881

fix the bug of layer_norm when batch_size=1 (#35480) · ad5f7494

Committed by zhangkaihuo on Sep 08, 2021

The bug is an incorrect, out-of-bounds access to mean and var: both have shape [batch_size], but the thread index idx ranges over 0~feature_size, so indexing them as mean[idx] and var[idx] is wrong.

When batch_size=1, the correct accesses are mean[0] and var[0]. A unit test with batch_size=1 is added.

ad5f7494
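
The indexing issue described in ad5f7494 can be illustrated with a small NumPy sketch. This is not the Paddle kernel: the function name `layer_norm_rowwise`, the `eps` value, and the explicit Python loops are assumptions made only to show why per-row statistics must be indexed by the row, not by the feature index.

```python
import numpy as np


def layer_norm_rowwise(x, eps=1e-5):
    """Normalize each row of a [batch_size, feature_size] array (illustrative only)."""
    batch_size, feature_size = x.shape
    mean = x.mean(axis=1)  # shape [batch_size]
    var = x.var(axis=1)    # shape [batch_size]
    out = np.empty_like(x)
    for row in range(batch_size):
        for idx in range(feature_size):
            # Correct: statistics are indexed by the row. Using mean[idx] here
            # would read past the end of mean whenever idx >= batch_size.
            out[row, idx] = (x[row, idx] - mean[row]) / np.sqrt(var[row] + eps)
    return out


# batch_size=1, feature_size=8: the case the commit message calls out.
x = np.arange(8, dtype=np.float64).reshape(1, 8)
out = layer_norm_rowwise(x)
stats_shape = x.mean(axis=1).shape
```

With batch_size=1 the statistics arrays hold a single element, so only index 0 is valid. NumPy would raise an `IndexError` for `mean[1]`, whereas a GPU kernel performing the same read would typically access invalid memory silently.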

Add FP16 PRelu (#35532) · 4e62af80
Committed by cc on Sep 08, 2021

4e62af80
hidden the auto parallel apis (#35385) · afd1b372
Committed by lilong12 on Sep 08, 2021
```
* update, test=develop
```
afd1b372
add checkers for auto parallel apis (#35486) · 39540b0e
Committed by lilong12 on Sep 08, 2021
```
* update, test=develop
```
39540b0e

merge CMakeList.txt manual (#35378) · c4a3e8b4

Committed by feng_shuai on Sep 08, 2021

* merge CMakeList.txt manual

* add platform for changethreadnum

* repair some bugs according to make error

* do nothing just flush CI

* forget change thread num

* add inplace_atol param for check_output_with_place

* Windows

* std:min and std::max should be change because of windows

c4a3e8b4

support weight sharing for pipeline (#35351) · 5199c744
Committed by lilong12 on Sep 08, 2021
```
* support weight sharing
```
5199c744
[NPU] release gil before op run (#35370) · db6242e9
Committed by Leo Chen on Sep 08, 2021
```
* release gil before op run

* support npu grad test

* fix op_test
```
db6242e9
Add op define extra for norm and frobenius norm op. (#35329) · 3dab2e20
Committed by Zhong Hui on Sep 08, 2021

3dab2e20

add the matmul v2 grad kernel · b3787d1b

Committed by wawltor on Sep 08, 2021

* add the matmul v2 grad kernel

* relief the test case time

* update the test case for the matmul double grad

* remove the unsed code for the matmul double grad

* update the test case for the double grad matmul

* remove the unused code in dot

b3787d1b

[NPU] add get_float_status op and refine NPU check_nan_inf (#35274) · c727ec4a
Committed by WangXi on Sep 08, 2021

c727ec4a

07 Sep, 2021 15 commits
- Fix scatter_nd_add doc (#35542) · 1635c02b
  Committed by Zeng Jinle on Sep 07, 2021
```
* fix scatter_nd_add doc, test=document_fix

* update
test=document_fix
```
  1635c02b
- support multi-node (#35396) · c6e0cedc
  Committed by yaoxuefeng on Sep 07, 2021
  
  c6e0cedc
- add conv op check for illegal input or attributes (#35337) · 8307b0cb
  Committed by wangxinxin08 on Sep 07, 2021
```
* add conv op check for illegal input or attributes
```
  8307b0cb
- [NPU] update batch norm op, test=develop (#35223) · cc6d2b07
  Committed by Qi Li on Sep 07, 2021
```
* [NPU] update batch norm op, test=develop

* add NHWC support for bn, test=develop
```
  cc6d2b07
- fix trace op stack overflow (#35419) · d47a97db
  Committed by XiangGao on Sep 07, 2021
```
Co-authored-by: Nroot <root@bjyz-sys-gpu-kongming9.bjyz.baidu.com>
```
  d47a97db
- Add DPADDLE_WITH_CUDA for GCC (#35448) · cec36ea6
  Committed by Aurelius84 on Sep 07, 2021
```
* Add DPADDLE_WITH_CUDA for GCC

* polish code
```
  cec36ea6
- [NPU] Add norm_grad kernel (#35237) · cf408949
  Committed by furnace on Sep 07, 2021
```
* [NPU] fix for test_norm_op_npu

* [NPU] add norm_grad

* [NPU] add CheckAxis for axis

* [NPU] delete debug codes

* norm can not use L2Normalize, norm_grad can use L2NormalizeGrad

* [NPU] delete useless codes

* [NPU] optimize norm_grad OpMaker

* Update python import path
```
  cf408949
- [NPU] log_softmax_grad, test=develop (#35484) · e928274c
  Committed by Qi Li on Sep 07, 2021
```
* [NPU] log_softmax_grad, test=develop

* remove debug files, test=develop

* update lookup_table_v2 for CANN 5.0.x, test=develop
```
  e928274c
- Fix for reshape2 oneDNN op (#35455) · 36cdb6e2
  Committed by jakpiase on Sep 07, 2021
```
* fix for reshape2

* added reviewers sugestions
```
  36cdb6e2
- add AsExtra in data_norm op (#35420) · 7907e241
  Committed by XiangGao on Sep 07, 2021
```
* add AsExtra in data_norm op

* pass data_layout from python to data_norm op

* fix data_layout in data_norm op
Co-authored-by: Nroot <root@bjyz-sys-gpu-kongming9.bjyz.baidu.com>
```
  7907e241
- Fix DryRun unittest failed from test_standalon_executor.py (#35433) · 071e8156
  Committed by Aurelius84 on Sep 07, 2021
```
* fix commit

* Open unittest

* fix unittest on Windows

* fix constructor
```
  071e8156
- merge from latest develop branch, test=document_fix (#34995) · 1445103b
  Committed by Sing_chan on Sep 07, 2021
  
  1445103b
- [Dy2Stat]Open test_resnet_amp on Windows (#35323) · 3c8eeb5d
  Committed by Aurelius84 on Sep 07, 2021
```
* open test_resnet_amp on Windows

* disable on Windows CPU CI for timeout

* disable on Windows CPU CI for timeout

* fix code style
```
  3c8eeb5d
- transfer the static.accurcay to v2 op (#35494) · 2b1efc35
  Committed by wawltor on Sep 07, 2021
```
* transfer the static.accurcay to v2 api

* remove the unused code
```
  2b1efc35
- [HIP] fix op not support AMD GPU bug, the flag PADDLE_WITH_ROCM is in… (#35394) · 28b64075
  Committed by xiayanming on Sep 07, 2021
```
* [HIP] fix op not support AMD GPU bug, the flag PADDLE_WITH_ROCM is invalid

* [HIP] fix op not support AMD GPU bug, the flag PADDLE_WITH_ROCM is invalid

* [HIP] fix op not support AMD GPU bug
```
  28b64075
06 Sep, 2021 8 commits

support double in deformable conv (#35330) · 266fcbe0
Committed by wangguanzhong on Sep 06, 2021
```
* support double in deformable conv

* add double for dcn v2
```
266fcbe0

Add fusion_lstm INT8 PTQ (#35334) · 7ef04da6

Committed by joanna.wozna.intel on Sep 06, 2021

* Add fusion_lstm INT8 PTQ

* Correct mkldnn_cache_capacity and enable fc_lstm_fuse_pass only for this test

* Change mkldnn_cache_capacity

7ef04da6

Add grad grad for AvgPool2D (#35388) · 97798f9a
Committed by Wei Shengyu on Sep 06, 2021
```
* add pool2d grad grad

* dbg

* add unittest

* update format

* add more unittests

* dbg
```
97798f9a

add kernel, stride check (#35106) · 13bbb6b6

Committed by Double_V on Sep 06, 2021

* add kernel, stride check

* add unitest for param out of range

* delete max limit check

13bbb6b6

[NPU]add depthwise_conv_npu_grad op (#35374) · 4bea0ff1

Committed by heliqi on Sep 06, 2021

* add depthwise_conv_npu_grad op

* add depthwise_conv_npu_grad op

* add depthwise_conv_npu_grad op

* add NHWC test case

4bea0ff1

support numpy dtype and polish code of list index. (#35404) · 60c5adaa
Committed by WeiXin on Sep 06, 2021
```
* support numpy dtype and polish code of list index.

* polish code.
```
60c5adaa

replace pass with error exception (#35367) · 5675042d

Committed by Feng Xing on Sep 06, 2021

This PR raises an exception in the fused transformer Python interface.
The function bodies are not implemented yet (they will be implemented later).
Following zhiqiu's comment on the previous PR-35206 (already merged), it is better to raise an exception than to use "pass".

5675042d
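
The reasoning in 5675042d (raise an exception in a placeholder instead of using "pass") can be sketched in plain Python. The function names below are hypothetical and are not the actual fused transformer interface; the sketch only contrasts the two behaviours.

```python
def silent_placeholder(x):
    """Hypothetical stub that uses `pass`: callers silently receive None."""
    pass


def unimplemented_placeholder(x):
    """Hypothetical stub that fails loudly until a real body exists."""
    raise NotImplementedError("unimplemented_placeholder is not implemented yet")


# The silent stub hides the missing implementation from the caller.
silent_result = silent_placeholder(1)

# The raising stub surfaces the problem at the call site.
try:
    unimplemented_placeholder(1)
    error_message = None
except NotImplementedError as exc:
    error_message = str(exc)
```

A stub that returns None can let an error appear far from its cause, for example when the result is later used as a tensor. Raising at the call site points directly at the unimplemented function.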

update trt ut. (#35458) · 18934c53
Committed by Wilber on Sep 06, 2021

18934c53

05 Sep, 2021 1 commit
- [WIP] paddle.where api add broadcast, when x_shape == y_shape, and x_shape != cond_shape (#35092) · ffc3d364
  Committed by furnace on Sep 05, 2021
```
* where op add broadcast, when x_shape == y_shape, and x_shape != cond_shape

* add static api tests, and delete debug codes
```
  ffc3d364
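
The case named in ffc3d364 (x and y share a shape while the condition has a different, broadcastable shape) can be illustrated with NumPy, which follows the same general broadcasting idea. This is a sketch of the semantics only, not Paddle's implementation; the helper name `where_with_cond_broadcast` is an assumption.

```python
import numpy as np


def where_with_cond_broadcast(cond, x, y):
    """Select from x or y after broadcasting cond up to their common shape."""
    if x.shape != y.shape:
        raise ValueError("this sketch only covers the x_shape == y_shape case")
    # Expand the condition to the shape of x and y before selecting.
    cond_full = np.broadcast_to(cond, x.shape)
    return np.where(cond_full, x, y)


x = np.array([[1, 2, 3], [4, 5, 6]])
y = -x
cond = np.array([True, False, True])  # shape (3,), while x and y are (2, 3)

result = where_with_cond_broadcast(cond, x, y)
# result is [[1, -2, 3], [4, -5, 6]]
```

Here the one-dimensional condition is applied to every row, so each column consistently takes its value from x or from y.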

机器未来 / Paddle is in sync with the fork's source project
