Commits · 7b45a46e13fe057ca12a001dac7b8d6d24d9f211 · Crayon鑫 / Paddle

11 Oct, 2021 · 14 commits
- Add FLAGS_allreduce_record_one_event to remove event waiting number (#36263) · 7b45a46e
  Committed by Zeng Jinle on Oct 11, 2021
```
* add FLAGS_allreduce_record_one_event

* add more comments

* fix ut

* improve coverage

* fix ut, improve coverage
```
- Add nn.functional.sparse_attention and some test cases, test=develop (#35757) · 85b77232
  Committed by Liu-xiandong on Oct 11, 2021
```
Add paddle.nn.functional.sparse_attention API

    This PR mainly wraps the sparse_attention functionality in a Python-level API; the main OP code is in PR #35676.

    In addition, unit tests are added for the wrapped Python interface.
```
- added missing bf16 ops (#36291) · 14393876
  Committed by jakpiase on Oct 11, 2021
- Add more tests and fix bugs for cudnn_norm_conv_test and cudnn_bn_and_relu_test (#36314) · a679fcbb
  Committed by Zhang Zheng on Oct 11, 2021
- Add functor_primitives.h for kernel primitive api (#36203) · 830debc2
  Committed by niuliling123 on Oct 11, 2021
```
* Add functor_primitives.h for kernel primitive api

* update

* move namespace kps

* subFunctor init_data

* delete InvalidArgumentError
```
- fix multi-node (#36329) · 7a724ddb
  Committed by yaoxuefeng on Oct 11, 2021
- enhance yolobox trt plugin (#34128) · 71cb3ff8
  Committed by wangxinxin08 on Oct 11, 2021
```
* enhance yolobox plugin
```
- [NPU] fix matmul_v2 and utils.run_check, test=develop (#36164) · 7850f7ce
  Committed by Qi Li on Oct 11, 2021
```
* [NPU] fix matmul_v2 and utils.run_check, test=develop

* remove debug files, test=develop

* fix install_check, test=develop

* fix doc, test=develop

* fix review comments, test=develop
```
- [NPU] fix set_value, test=develop (#36272) · 83541fd4
  Committed by Qi Li on Oct 11, 2021
```
* [NPU] fix set_value, test=develop

* fix typo, test=develop

* fix typo, test=develop
```
- [NPU] fix softmax_with_cross_entropy in dygraph, test=develop (#36297) · 11061325
  Committed by Qi Li on Oct 11, 2021
- use unified external error message for cufft api (#36114) · 642aaa2e
  Committed by Xiaoxu Chen on Oct 11, 2021
- add mish trt plugin (#34123) · 2b7b752a
  Committed by wangxinxin08 on Oct 11, 2021
```
* add mish trt plugin, compile & install success, run error. test=develop
* modify code according to review
* add TRT_NOEXCEPT for mish trt plugin
* add unittest for mish trt plugin
* remove unnecessary check of mish in op_teller.cc
* fix some problem of trt8
* add check and modify unittest while converting mish to trt plugin
Co-authored-by: dengkaipeng <dengkaipeng@baidu.com>
```
- add skip case in trt converter ut (#36287) · 34bd18ff
  Committed by baoachun on Oct 11, 2021
```
* add skip case in trt converter ut

* disable group_norm trt plugin
```
- Add use_cinn Flag and RunFromCinn in PE (#36107) · 5690666c
  Committed by Huihuang Zheng on Oct 11, 2021
```
Add use_cinn flag and use it to control whether we run PaddlePaddle using CINN.

Also add:

* Replace PaddlePaddle graph with a CINN graph in a pass
* PE method to feed data and run the graph by CINN
```
09 Oct, 2021 · 5 commits
- Implement Fused BN + Add + Relu with cudnnFusedOps API. (#35955) · 7e6c0cee
  Committed by Zhang Zheng on Oct 09, 2021
- Enhance OpTest for bfloat16. (#36079) · 91119271
  Committed by Yiqun Liu on Oct 09, 2021
- Add const for OpDesc::id() and VarDesc::id() (#36298) · cb620ca6
  Committed by Zeng Jinle on Oct 09, 2021
```
* add const OpDesc id()

* add const for VarDesc::id()
```
- fill_diagonal op fix border cross caused by offset (#36212) · 62e41150
  Committed by zhiboniu on Oct 09, 2021
- C++ support register pass via PassDesc (#36095) · 2fd8deea
  Committed by wuhuanzhou on Oct 09, 2021
```
Support registering GeneratePass from C++, simplifying development for subgraph optimization scenarios such as fusion.
```
08 Oct, 2021 · 6 commits
- Fix for oneDNN conv op (#36284) · 57e8cbec
  Committed by jakpiase on Oct 08, 2021
```
* fix for conv op

* Minor change
```
- Support CUDA Graph on ParallelExecutor (#36250) · f9591bb1
  Committed by Zeng Jinle on Oct 08, 2021
```
* support CUDA Graph on PE

* add ut, fix CI compile

* reduce memory consumption

* fix CUDA 10 CI

* improve coverage

* improve python coverage
```
- [NPU] BatchNorm support layout of NCL and NLC, test=develop (#35668) · 7cb19f57
  Committed by Qi Li on Oct 08, 2021
```
* [NPU] support NCL and NLC for BatchNorm, test=develop

* [NPU] remove debug files, test=develop

* update, test=develop
```
- add python interface of sub_graph (#36120) · a29ff4c7
  Committed by huangxu96 on Oct 08, 2021
```
Add python interface of subgraph: 1. all_sub_graphs() 2. get_sub_graph(idx)
```
- Added oneDNN BF16 relu (#36265) · 1bd9cfef
  Committed by arlesniak on Oct 08, 2021
```
* Added oneDNN BF16 relu

* fixed typo

* refactored test, review fixes
```
- fix cast cuda implementation (#36266) · 9814f895
  Committed by Zeng Jinle on Oct 08, 2021
07 Oct, 2021 · 1 commit
- [OneDNN] Conv op refactor. (#36252) · e9288340
  Committed by Adam Osewski on Oct 07, 2021
  * Remove unused header.
  * Use ConvMKLDNNHandlerT for conv2d INT8.
  * Use absolute module path to import.
05 Oct, 2021 · 1 commit
- Added concat BF16/FP32 BWD OneDNN kernel (#35889) · dc4d5719
  Committed by jakpiase on Oct 05, 2021
  * tmp
  * added concat BF16/FP32 BWD oneDNN kernel
  * minor change
  * minor change
  * fix for CI
  * added formatting
  * Reverted deleting static keyword
  * added reviewers suggestions
  * reverted deleting concat bf16 test file
  * fixed concat tests
30 Sep, 2021 · 3 commits
- add slotrecord datafeed (#36099) · 0a3dbe8a
  Committed by yaoxuefeng on Sep 30, 2021
- fix yolo (#36240) · c12176e8
  Committed by wenbin on Sep 30, 2021
- [NPU] modify transpose2 and index_select_grad kernels for model xlnet (#36214) · a66b9fba
  Committed by Aganlengzi on Sep 30, 2021
```
* [NPU] modify transpose2 and index_select_grad kernels for model xlnet

* add transpose2 int64_t unit test

* add more transpose2 unit tests

* update test_transpose_op_npu.py
```
29 Sep, 2021 · 10 commits
- Add basic support for CUDA Graph (#36190) · 21b93c3d
  Committed by Zeng Jinle on Sep 29, 2021
```
* add basic support for CUDA Graph

* fix ci compile error

* fix LOG print, fix windows CI

* follow comments and update

* small fix for default ctor

* fix rocm compile error

* fix CPU compile error
```
- fix cusparse compile problem, test=develop (#36199) · 3eb50715
  Committed by Liu-xiandong on Sep 29, 2021
```
* fix cusparse compile problem, test=develop

* Modify file permissions
```
- Spinlock (#36030) · a9ea41c5
  Committed by liutiexing on Sep 29, 2021
```
* add align for WorkQueue

* add spinlock

* merge spinlock
```
- add slot record dataset (#36200) · 79bd5f90
  Committed by yaoxuefeng on Sep 29, 2021
- [npu] add box coder (#36171) · 83578cfa
  Committed by zhulei on Sep 29, 2021
```
* [npu] add box coder

* [npu] add box coder
```
- fix bug of top_k npu op (#36175) · 2b8fd704
  Committed by pangyoki on Sep 29, 2021
- [NPU] Add group norm (#35937) · c79de728
  Committed by zhulei on Sep 29, 2021
```
* [NPU] Add group norm

* [NPU] Add group norm

* [NPU] Add group norm

* [NPU] Add group norm

* [NPU] Add group_norm op
```
- [NPU] mod for model bert (#36165) · 7bddf2e8
  Committed by Aganlengzi on Sep 29, 2021
```
* merge conflict of paddle_gtest_main.cc

* modify FLAGS_npu_precision_mode and default not to call aclSetCompileopt
```
- Implement the grad and enhance the cache of norm_convolution fusion ops. (#36168) · 767050d9
  Committed by Yiqun Liu on Sep 29, 2021
- remove wait if no fetch (#36150) · b3d2dc7b
  Committed by Zeng Jinle on Sep 29, 2021
