Commits · 7b45a46e13fe057ca12a001dac7b8d6d24d9f211 · PaddlePaddle / Paddle

11 Oct, 2021 13 commits

Add FLAGS_allreduce_record_one_event to remove event waiting number (#36263) · 7b45a46e
Committed by Zeng Jinle on Oct 11, 2021
```
* add FLAGS_allreduce_record_one_event

* add more comments

* fix ut

* improve coverage

* fix ut, improve coverage
```
7b45a46e

Add nn.functional.sparse_attention and some test cases, test=develop (#35757) · 85b77232

Committed by Liu-xiandong on Oct 11, 2021

Add paddle.nn.functional.sparse_attention API

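This PR mainly wraps the sparse_attention functionality in a Python-level interface; the main OP code is in PR #35676.

In addition, corresponding unit tests are added for the wrapped Python interface.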

85b77232
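As a rough illustration of what a sparse attention computation does, the sketch below restricts each query row to a listed set of key columns. The CSR-style `csr_offset` / `csr_columns` layout, the function name, and the single-head pure-Python form are assumptions made for illustration; they are not taken from this PR, and the real operator works on GPU tensors.

```python
import math


def sparse_attention_rows(query, key, value, csr_offset, csr_columns):
    """Hypothetical single-head sketch: row i attends only to its listed columns.

    query, key, value: lists of vectors (seq_len x head_dim).
    csr_offset, csr_columns: CSR description of the kept (row, col) pairs.
    Assumes every row keeps at least one column.
    """
    head_dim = len(query[0])
    out = []
    for i, q in enumerate(query):
        cols = csr_columns[csr_offset[i]:csr_offset[i + 1]]
        # Scaled dot-product scores for the kept columns only.
        scores = [
            sum(a * b for a, b in zip(q, key[j])) / math.sqrt(head_dim)
            for j in cols
        ]
        # Numerically stable softmax over the kept columns.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        row = [0.0] * len(value[0])
        for w, j in zip(weights, cols):
            for d in range(len(row)):
                row[d] += (w / total) * value[j][d]
        out.append(row)
    return out


# Each row attends only to itself, so the output equals `value`.
q = k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
result = sparse_attention_rows(q, k, v, csr_offset=[0, 1, 2], csr_columns=[0, 1])
```

Columns that are not listed for a row never enter the softmax, which is where the memory and compute savings of a sparse pattern would come from.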

fix_dp_grad_merge_with_grad_clip_by_global_norm (#36334) · 1026052c
Committed by Yuang Liu on Oct 11, 2021

1026052c

[Paddle-ASP] Revise 4d tensor sparsity mask pattern for conv2d sparsity (#36054) · 00245cfd

Committed by zlsh80826 on Oct 11, 2021

Sparse tensor cores for convolution require the input channel dimension to be 2:4 structured sparse,
so we have to mask the input channel dimension in order to use sparse tensor cores.

00245cfd
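A minimal sketch of a 2:4 mask along the input channel dimension, in plain Python. The function names, the `(out_c, in_c, kh, kw)` weight layout, and the keep-the-two-largest-magnitudes rule are assumptions for illustration; the actual Paddle-ASP code may select and reshape differently.

```python
def mask_2_4(values):
    """Return a 0/1 mask keeping the 2 largest-magnitude entries per group of 4."""
    if len(values) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    mask = []
    for start in range(0, len(values), 4):
        group = values[start:start + 4]
        # Indices of the two largest |v|; ties go to the earlier position.
        keep = sorted(range(4), key=lambda i: -abs(group[i]))[:2]
        mask.extend(1 if i in keep else 0 for i in range(4))
    return mask


def mask_conv_weight_input_channels(weight):
    """weight[out_c][in_c][kh][kw] -> same-shaped 0/1 mask, 2:4 along in_c."""
    out_c, in_c = len(weight), len(weight[0])
    kh, kw = len(weight[0][0]), len(weight[0][0][0])
    mask = [[[[0] * kw for _ in range(kh)] for _ in range(in_c)]
            for _ in range(out_c)]
    for o in range(out_c):
        for h in range(kh):
            for w in range(kw):
                # Gather the values that differ only in the input channel.
                channel_slice = [weight[o][c][h][w] for c in range(in_c)]
                for c, m in enumerate(mask_2_4(channel_slice)):
                    mask[o][c][h][w] = m
    return mask


# One output channel, four input channels, 1x1 kernel.
weight = [[[[0.1]], [[-0.9]], [[0.5]], [[0.2]]]]
mask = mask_conv_weight_input_channels(weight)
```

For every fixed output channel and kernel position, exactly two of each four consecutive input channels stay nonzero.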

add reshard module (#35779) · c38b0488

Committed by caozhou on Oct 11, 2021

* add reshard module

* fix conflict

* update reshard module

* update and add unittest

* update reshard module and unittest

* add more unittests

c38b0488

fix multi-node (#36329) · 7a724ddb
Committed by yaoxuefeng on Oct 11, 2021

7a724ddb
enhance yolobox trt plugin (#34128) · 71cb3ff8
Committed by wangxinxin08 on Oct 11, 2021
```
* enhance yolobox plugin
```
71cb3ff8

[NPU] fix matmul_v2 and utils.run_check, test=develop (#36164) · 7850f7ce

Committed by Qi Li on Oct 11, 2021

* [NPU] fix matmul_v2 and utils.run_check, test=develop

* remove debug files, test=develop

* fix install_check, test=develop

* fix doc, test=develop

* fix review comments, test=develop

7850f7ce

[NPU] fix set_value, test=develop (#36272) · 83541fd4
Committed by Qi Li on Oct 11, 2021
```
* [NPU] fix set_value, test=develop

* fix typo, test=develop

* fix typo, test=develop
```
83541fd4

add mish trt plugin (#34123) · 2b7b752a

Committed by wangxinxin08 on Oct 11, 2021

* add mish trt plugin, compile & install success, run error. test=develop
* modify code according to review
* add TRT_NOEXCEPT for mish trt plugin
* add unittest for mish trt plugin
* remove unnecessary check of mish in op_teller.cc
* fix some problem of trt8
* add check and modify unittest while converting mish to trt plugin
Co-authored-by: dengkaipeng <dengkaipeng@baidu.com>

2b7b752a

add skip case in trt converter ut (#36287) · 34bd18ff
Committed by baoachun on Oct 11, 2021
```
* add skip case in trt converter ut

* disable group_norm trt plugin
```
34bd18ff

Add use_cinn Flag and RunFromCinn in PE (#36107) · 5690666c

Committed by Huihuang Zheng on Oct 11, 2021

Add use_cinn flag and use it to control whether we run PaddlePaddle using CINN.

Also add:

* A pass that replaces the PaddlePaddle graph with a CINN graph
* A PE method to feed data and run the graph by CINN

5690666c

Add skip case for conv2d convert test (#36301) · 9b987b3d
Committed by JingZhuangzhuang on Oct 10, 2021

9b987b3d

09 Oct, 2021 5 commits
- Enhance OpTest for bfloat16. (#36079) · 91119271
  Committed by Yiqun Liu on Oct 09, 2021
  
  91119271
- Add new API 'tensordot' (#36273) · 21dc7f40
  Committed by From00 on Oct 09, 2021
```
* Add new API tensordot

* Set timeout value 400 for UT; Fix format for EN docs

* Set timeout value 1000 for UT; Fix format for EN docs

* Remove some input check

* Coding style improvement: don't compare boolean values to True or False using ==
```
  21dc7f40
- fill_diagonal op fix border cross caused by offset (#36212) · 62e41150
  Committed by zhiboniu on Oct 09, 2021
  
  62e41150
- support ClipGradByGlobalNorm in sharding (#36012) · 623df429
  Committed by zhaoyingli on Oct 09, 2021
```
* support ClipGradByGlobalNorm in sharding

* support ClipGradByGlobalNorm in sharding

* test=allcase
```
  623df429
- fix hasattr(paddle.fluid.ir.PassDesc.OP, '__name__') error (#36229) · d8887afa
  Committed by wuhuanzhou on Oct 09, 2021
```
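For arguments that do not meet the conditions of the overloaded __getattr__, always raise AttributeError, so that the behavior is consistent with the non-overloaded version.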
```
  d8887afa
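The `hasattr` fix above relies on a general Python rule: `hasattr` returns False only when the lookup raises AttributeError, so an overloaded `__getattr__` should raise exactly that for names it does not handle. A minimal sketch, where `OpHelper` and `_known_ops` are hypothetical stand-ins and not Paddle classes:

```python
class OpHelper:
    """Hypothetical object that resolves some attribute names dynamically."""

    _known_ops = {"relu", "conv2d"}

    def __getattr__(self, name):
        # Only called when normal attribute lookup has already failed.
        if name not in self._known_ops:
            # Raising AttributeError keeps hasattr() and getattr(obj, name, default)
            # behaving the same as for a class without __getattr__.
            raise AttributeError(
                f"{type(self).__name__!r} object has no attribute {name!r}"
            )
        return f"op:{name}"


helper = OpHelper()
has_relu = hasattr(helper, "relu")
has_dunder_name = hasattr(helper, "__name__")
```

Any other exception type raised from `__getattr__` would propagate out of `hasattr` instead of producing False.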
08 Oct, 2021 5 commits
- Support CUDA Graph on ParallelExecutor (#36250) · f9591bb1
  Committed by Zeng Jinle on Oct 08, 2021
```
* support CUDA Graph on PE

* add ut, fix CI compile

* reduce memory consumption

* fix CUDA 10 CI

* improve coverage

* improve python coverage
```
  f9591bb1
- add fs list_files_info (#36224) · ca16e8fd
  Committed by yaoxuefeng on Oct 08, 2021
  
  ca16e8fd
- [NPU] BatchNorm support layout of NCL and NLC, test=develop (#35668) · 7cb19f57
  Committed by Qi Li on Oct 08, 2021
```
* [NPU] support NCL and NLC for BatchNorm, test=develop

* [NPU] remove debug files, test=develop

* update, test=develop
```
  7cb19f57
- add python interface of sub_graph (#36120) · a29ff4c7
  Committed by huangxu96 on Oct 08, 2021
```
Add python interface of subgraph: 1. all_sub_graphs() 2. get_sub_graph(idx)
```
  a29ff4c7
- Added oneDNN BF16 relu (#36265) · 1bd9cfef
  Committed by arlesniak on Oct 08, 2021
```
* Added oneDNN BF16 relu

* fixed typo

* refactored test, review fixes
```
  1bd9cfef
07 Oct, 2021 1 commit

[OneDNN] Conv op refactor. (#36252) · e9288340

Committed by Adam Osewski on Oct 07, 2021

* Remove unused header.

* Use ConvMKLDNNHandlerT for conv2d INT8.

* Use absolute module path to import.

e9288340

05 Oct, 2021 1 commit

Added concat BF16/FP32 BWD OneDNN kernel (#35889) · dc4d5719

Committed by jakpiase on Oct 05, 2021

* tmp

* added concat BF16/FP32 BWD oneDNN kernel

* minor change

* minor change

* fix for CI

* added formatting

* Reverted deleting static keyword

* added reviewers suggestions

* reverted deleting concat bf16 test file

* fixed concat tests

dc4d5719

30 Sep, 2021 4 commits
- add test_hessian time out (#36234) · 56b04bc1
  Committed by levi131 on Sep 30, 2021
  
  56b04bc1
- [NPU] modify transpose2 and index_select_grad kernels for model xlnet (#36214) · a66b9fba
  Committed by Aganlengzi on Sep 30, 2021
```
* [NPU] modify transpose2 and index_select_grad kernels for model xlnet

* add transpose2 int64_t unit test

* add more transpose2 unit tests

* update test_transpose_op_npu.py
```
  a66b9fba
- Fix raw optim (#36176) · 5e0f199a
  Committed by 李季 on Sep 30, 2021
```
* fix raw optim

* pre-commit test file
Co-authored-by: sneaxiy <sneaxiy@126.com>
```
  5e0f199a
- fix the undefined variable bug in dist_transformer file (#36211) · 8af939f1
  Committed by 李季 on Sep 30, 2021
  
  8af939f1
29 Sep, 2021 10 commits

Add basic support for CUDA Graph (#36190) · 21b93c3d

Committed by Zeng Jinle on Sep 29, 2021

* add basic support for CUDA Graph

* fix ci compile error

* fix LOG print, fix windows CI

* follow comments and update

* small fix for default ctor

* fix rocm compile error

* fix CPU compile error

21b93c3d

add optest for adamw (#36148) · 69eed34d
Committed by zhaoyingli on Sep 29, 2021
```
* update func name

* skip cpu

* update unittest

* update unittest
```
69eed34d
fix cusparse compile problem, test=develop (#36199) · 3eb50715
Committed by Liu-xiandong on Sep 29, 2021
```
* fix cusparse compile problem, test=develop

* Modify file permissions
```
3eb50715

Add functional autograd API:hessian (#36108) · 1f93582c

Committed by levi131 on Sep 29, 2021

* init functional jacobian api

* finish test with dtype float32

* add float64 test case

* polish code

* use atol=1e-5 with dtype float64

* fix for ci

* set timeout for test_jacobian

* init hessian API

* save status

* polish API docstring

* modify docstring

* add utils.py

* save status

* fix dygraph double grad dtype error when calling for high differential scenario

* reinvoke ci

* test_hessian.py is ok

* polish hessian API

* init vhp

* Revert "init vhp"

This reverts commit cbd4d3b66abe82b0ac10721b9eddeb7d82e0a1c8.

* add test for partial_engine.cc

* modify numerical_delta with dtype float32

* merge fix for dtype float64

* spell fix

* polish code

* rm _stop_gradient_pre_process
Co-authored-by: JiabinYang <360788950@qq.com>

1f93582c

[npu] add box coder (#36171) · 83578cfa
Committed by zhulei on Sep 29, 2021
```
* [npu] add box coder

* [npu] add box coder
```
83578cfa
fix bug of top_k npu op (#36175) · 2b8fd704
Committed by pangyoki on Sep 29, 2021

2b8fd704

[NPU] Add group norm (#35937) · c79de728

Committed by zhulei on Sep 29, 2021

* [NPU] Add group norm

* [NPU] Add group norm

* [NPU] Add group norm

* [NPU] Add group norm

* [NPU] Add group_norm op

c79de728

[NPU] mod for model bert (#36165) · 7bddf2e8

Committed by Aganlengzi on Sep 29, 2021

* merge conflict of paddle_gtest_main.cc

* modify FLAGS_npu_precision_mode and default not to call aclSetCompileopt

7bddf2e8

[hybrid] Fix model parallel non-distributed param broadcast (#36186) · bec9fc9a
Committed by WangXi on Sep 29, 2021

bec9fc9a

Add op paddle.device.cuda.get_device_name and paddle.device.cuda.get_device_capability. (#35672) · f703558d

Committed by hlygit66666 on Sep 29, 2021

* add op paddle.device.cuda.get_device_name

* fix some bugs

* fix some bugs

* fix error message bugs

* fix en docs

* fix bugs

* fix bugs

* fix bugs

* add error message test case

* add get_device_name and get_device_capability

* fix review

* fix docs bug

* fix docs

* fix docs

f703558d

28 Sep, 2021 1 commit
- Add sparse_attention api, test=develop (#35676) · 6b587e93
  Committed by Liu-xiandong on Sep 28, 2021
```
Add sparse_attention OPs, python api will be added in next pr
```
  6b587e93

PaddlePaddle / Paddle synced successfully more than 1 year ago
