Commits · 8c3decd8d464de1126e355352c312936f92bf4ae · Crayon鑫 / Paddle

27 Oct, 2021 · 11 commits

add dcnv2 trt plugin (#36612) · 8c3decd8
Committed by wangxinxin08 on Oct 27, 2021
```
* add dcnv2 plugin
```
8c3decd8

fix ernie serialize problem (#36769) · d6b1beb0
Committed by zlsh80826 on Oct 27, 2021

d6b1beb0

Added fp32 / bf16 forward and backward elementwise_div_mkldnn operator (#36158) · e92e6b06

Committed by piotrekobiIntel on Oct 27, 2021

* Add WIP version of elementwise_div_mkldnn without working dy grad

* Add dy gradient calculation implementation, disable broadcast tests

* Readd removed tests from static_mode_white_list

* Add bfloat16 gradient tests, remove int8 and uint8 support

* - Change the way dy grad is calculated to improve performance
- Refactor BinaryMKLDNNHandler to use a default parameter

* Change copyright year

* Refactor as suggested

* Attempt to bypass CI Approval
not accepting max_relative_error

* Fix formatting issue

e92e6b06

Add LRUCache for fft plans (#36646) · 737992eb

Committed by Feiyu Chan on Oct 27, 2021

* WIP: add cache

* delete move constructor and operator= for CuFFTHandle and FFTConfig

* remove log from CuFFTHandle and FFTConfig

* add lrucache for fft rocm backend

* disable LRUCache when CUFFT_VERSION >= 10200

* disable copy and move for hipFFTHandle; format code

* clean debug code
Co-authored-by: Xiaoxu Chen <chenxx_id@163.com>

737992eb
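As a rough illustration of the idea behind caching FFT plans with an LRU policy, here is a minimal sketch in plain Python. It is not the code from this commit: the `LRUCache` class, the `make_plan` helper and the plan key layout (shape, dtype name, direction) are all hypothetical stand-ins for an expensive plan construction.

```python
from collections import OrderedDict


class LRUCache:
    """Least-recently-used cache with a fixed capacity."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()

    def get_or_create(self, key, factory):
        """Return the cached value for key, building it with factory() on a miss."""
        if key in self._items:
            self._items.move_to_end(key)      # mark as most recently used
            return self._items[key]
        value = factory()
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)   # evict the least recently used entry
        return value

    def __len__(self):
        return len(self._items)

    def __contains__(self, key):
        return key in self._items


# Hypothetical plan key: (signal shape, dtype name, transform direction).
created = []


def make_plan(key):
    created.append(key)          # stands in for an expensive plan construction
    return {"config": key}


key_8 = ((8,), "complex64", "forward")
key_16 = ((16,), "complex64", "forward")
key_32 = ((32,), "complex64", "forward")

cache = LRUCache(capacity=2)
for key in [key_8, key_16, key_8, key_32]:
    # The third lookup is a hit; the fourth evicts key_16, the least recently used.
    cache.get_or_create(key, lambda k=key: make_plan(k))
```

With a capacity of two, only three plans are built for the four lookups, and the plan that was touched least recently is the one that gets dropped.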

Fused transformer encoder layer and fused feedforward layer (#36604) · 9f3613f3

Committed by zhangkaihuo on Oct 27, 2021

This PR contains the layer-level code for fused_transformer, including the FusedFeedForward layer and the FusedTransformerEncoderLayer.

9f3613f3

add matmul_v2 to v1 CPU pass and fix matmul dim error (#36731) · d5245a35
Committed by baoachun on Oct 27, 2021
```
* fix matmul dim error

* fix wrong dim check in matmul
```
d5245a35

fix fftshift/ifftshift on static mode (#36748) · 34b6860e

Committed by Feiyu Chan on Oct 27, 2021

* fix fftshift/ifftshift on static mode
* update roll_op version
* add more test cases for fftshift/ifftshift

34b6860e

add fp16 unittests for kl2 (#36583) · 6838a187
Committed by taixiurong on Oct 27, 2021

6838a187

enable trt test check and fix trt ut error (3/3) (#36581) · 8c1c72af
Committed by Wilber on Oct 27, 2021

8c1c72af

add paddle.linalg.eigvalsh API (#35615) · 9f9ed3ae

Committed by huangjun12 on Oct 27, 2021

* add eigvalsh with is_test

* add eigvalsh op

* fix backward bug

* forward and backward, float and complex, unittest

* remove eigvalsh_helper.h

* remove changes of cusolver.h

* fix unittest

* fix unittest bug

* update code following eigh

* fix test

* update lapack

* pull develop

* update functor

* fix unittest bug

* fix details

* add tensor_method_func

* fix notes

9f9ed3ae

Fix inverse in fake quant (#36762) · 542ba214
Committed by whs on Oct 27, 2021

542ba214

26 Oct, 2021 · 14 commits

Add fused attention op backward and python layer. (#36498) · 5119428e

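Committed by Li Min on Oct 26, 2021

Purpose: this PR aims to improve the computational performance of the attention module.
To reduce the framework's per-op scheduling overhead, the PR implements the attention module by hand in C++ and exposes it as a single large attention op.
To reduce memory-access overhead, the PR applies two optimizations:
(1) when computing q, k and v, the input X is shared, which reduces the gemm, transpose and bias add at that point from three calls to one;
(2) kernel fusion is used so that data is passed through registers between different CUDA kernels.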

5119428e
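The first optimization in the commit message above (sharing the input X so that q, k and v come from one gemm and one bias add instead of three) can be sketched in plain Python. This is only an illustration on small integer matrices, not the fused_attention kernel; the helper names `matmul`, `add_bias`, `separate_qkv` and `fused_qkv` are hypothetical, and the transpose step is left out for brevity.

```python
def matmul(a, b):
    """Multiply two row-major matrices given as lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_bias(m, bias):
    """Add a bias vector to every row of a matrix."""
    return [[v + b for v, b in zip(row, bias)] for row in m]


def separate_qkv(x, weights, biases):
    """Three projections: one matmul and one bias add each for q, k and v."""
    return [add_bias(matmul(x, w), b) for w, b in zip(weights, biases)]


def fused_qkv(x, weights, biases):
    """One projection: concatenate the three weight matrices column-wise,
    run a single matmul and bias add, then slice the result into q, k, v."""
    w_all = [sum((w[i] for w in weights), []) for i in range(len(weights[0]))]
    b_all = sum(biases, [])
    out = add_bias(matmul(x, w_all), b_all)
    d = len(biases[0])
    return [[row[j * d:(j + 1) * d] for row in out] for j in range(len(weights))]


x = [[1, 2], [3, 4], [5, 6]]          # 3 tokens, hidden size 2
wq = [[1, 0], [0, 1]]
wk = [[2, 1], [1, 2]]
wv = [[0, 3], [3, 0]]
bq, bk, bv = [1, 1], [0, 2], [5, 0]

q, k, v = fused_qkv(x, [wq, wk, wv], [bq, bk, bv])
```

Both functions produce the same q, k and v; the fused variant reads X once and issues a single larger matrix multiply, which is the effect the PR description attributes to sharing the input.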

roll_op: support Tensor as input for shifts (#36727) · 7b1e30fc
Committed by Feiyu Chan on Oct 26, 2021

7b1e30fc

Add roi_align grad (#36724) · 236ed94d
Committed by zhulei on Oct 26, 2021

236ed94d

[new-exec] cache exception in child thread (#36692) · 87fbbd36
Committed by Leo Chen on Oct 26, 2021
```
* cache exception in child thread

* add ut

* fix ut
```
87fbbd36

[new-exec] Add cancel for thread pool (#36688) · fe6dbdd3

Committed by liutiexing on Oct 26, 2021

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* update

* update

* update Error MSG

* update EventsWaiter

* Add Cancel For ThreadPool

* Add UT for Cancel

fe6dbdd3

Move fused_attention and fused_feedforward functional api path to incubate (#36704) · 9aeca2f1
Committed by Li Min on Oct 26, 2021
```
Move the Python API interfaces added in PRs #35905 and #35843 into the incubate directory.
```
9aeca2f1

[NPU] fix argsort op, test=develop (#36576) · 3523bbe8

Committed by Qi Li on Oct 26, 2021

* [NPU] fix argsort op, test=develop

* remove debug files, test=develop

* fix typo, test=develop

* address review comments, test=develop

3523bbe8

fix wrong trt dim when input dim is 2 (#36614) · 43dcf235

Committed by baoachun on Oct 26, 2021

* fix wrong trt dim when input dim is 2

* update leaky_relu and instance_norm converter unit test

* add instance_norm input dim check

43dcf235

Fix the null ptr bug in build_cinn_pass. (#36698) · 28bab073
Committed by Zhen Wang on Oct 26, 2021
```
* Fix the null ptr bug in build_cinn_pass.

* Add test for empty&ctrl var.
```
28bab073

enable flags_benchmark for dygraph (#36686) · 21bece3f
Committed by Leo Chen on Oct 26, 2021

21bece3f

[Paddle-Inference]Add MatmulV2ToMatmul convert Pass, fix (matmul_v2, matmul,... · 93c591e2

Committed by Wangzheee on Oct 26, 2021

[Paddle-Inference]Add MatmulV2ToMatmul convert Pass, fix (matmul_v2, matmul, mul) convert pass, fix (matmul, mul) op_teller (#36652)

* new_Matmul2ToMatmulToMul

* new_Matmul2ToMatmulToMul

* fix paddle_pass_builder

* fix paddle_pass_builder

* fix paddle_pass_builder

* tem

* tem

* Add MatmulV2ToMatmul convert Pass; MatmulV2ToMul convert Pass

* Add MatmulV2ToMatmul convert Pass; MatmulV2ToMul convert Pass

* add matmul_broadcast_unitest

* fix op_teller

93c591e2

Optimize FasterTokenizer (#36701) · 290ded7a
Committed by Jack Zhou on Oct 26, 2021
```
* optimize fast tokenizer
```
290ded7a

Support various length support for SelectedRows in GLOO::AllGather (#36637) · eca78a9f

Committed by xiongkun on Oct 26, 2021

* In cpu parallel using gloo, add various length support for SelectedRows

* fix bug

* fix bugs

* fix by code review

* remove timeout

eca78a9f

Pool3d 2.0 (#36545) · 229bae81
Committed by feng_shuai on Oct 26, 2021

229bae81

25 Oct, 2021 · 10 commits

add ctr accessor (#36601) · cea1ba88
Committed by zhaocaibei123 on Oct 25, 2021

cea1ba88

[NPU] modifications for model ernie-1.0 (#36642) · 19b02d95
Committed by Aganlengzi on Oct 25, 2021
```
* [NPU] modifications for model ernie-1.0

* rollback 503003 and change cast to dtype
```
19b02d95

add op: fused_feedforward(backward) (#35611) · 2dd0a46a

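Committed by zhangkaihuo on Oct 25, 2021

This PR contains the backward code for fused_feedforward.

Related kernel implementations: fused_dropout_act_bias, fused_residual_dropout_bias, fused_layernorm_residual_dropout_bias

fused_feedforward is a fused operator. It fuses and wraps the operators of the transformer model's feed-forward layer so that the front end exposes a single interface; fusion cuts some memory accesses and kernel launch time, which improves performance.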

2dd0a46a
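To make the fusion idea in the description above concrete, here is a minimal plain-Python sketch of an elementwise bias add, activation and dropout done in three passes versus one. It is not the CUDA kernel from this PR: the function names are hypothetical, ReLU is assumed as the activation, the dropout mask is passed in explicitly so the result is deterministic, and kept values are assumed to be scaled by `1 / keep_prob`.

```python
def relu(v):
    """Rectified linear unit, assumed here as the activation."""
    return v if v > 0.0 else 0.0


def unfused_bias_act_dropout(x, bias, keep_mask, keep_prob):
    """Three separate passes, each materialising a full intermediate list."""
    biased = [v + b for v, b in zip(x, bias)]
    activated = [relu(v) for v in biased]
    return [a * m / keep_prob for a, m in zip(activated, keep_mask)]


def fused_bias_act_dropout(x, bias, keep_mask, keep_prob):
    """One pass: bias add, activation and dropout applied per element,
    so no intermediate list is written out and read back."""
    out = []
    for v, b, m in zip(x, bias, keep_mask):
        out.append(relu(v + b) * m / keep_prob)
    return out


x = [1.0, -2.0, 3.0]
bias = [0.5, 0.5, 0.5]
keep_mask = [1, 1, 0]      # 1 keeps the element, 0 drops it
keep_prob = 0.5

result = fused_bias_act_dropout(x, bias, keep_mask, keep_prob)
```

The two functions return identical values. In a GPU setting the one-pass form corresponds to a single kernel launch and a single round trip to memory, which is the saving the PR description refers to.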

Add bincount op (#36317) · 39f19127

Committed by smallv0221 on Oct 25, 2021

* Add bincount op

* upload cpu version

* fix unittest

* fix unittest

* fix unittest

* fix en doc

* add more test

* fix en doc

* add more test case

* fix test

* fix input validation

* fix input check

* fix unittest

* fix test

* fix en doc

39f19127

CI build PR and dev whl (#36532) · e16fe48d
Committed by tianshuo78520a on Oct 25, 2021
```
CI build PR and dev whl
```
e16fe48d

Create CinnCompiler class for compiling subgraphs found by build_cinn_pass. (#36562) · 4c460378

Committed by Zhen Wang on Oct 25, 2021

* Init the functions of CinnCompiler.

* Add the unit test for CinnCompiler.

* Fix some compilation errors.

* Update the UT of cinn_compiler.

* Use Decomposer&OpFusion passes in CinnCompiler::CompileGraph.

* Update some comments.

* Uncomment some includes in build_cinn_pass.cc.

* Use refs instead of ptrs as returned types of FindGraph & Compile in
CinnCompiler.

* Use the merged CinnGraphSymbolization functions in CinnCompiler.

4c460378

add some ops to train ssd on kunlun (#36407) · 50778ad6

Committed by TTerror on Oct 25, 2021

* add some ops to train ssd on kunlun

* add some ops to train ssd on kunlun

* add some ops to train ssd on kunlun

* update cast op unittest

* update cast op unittest

* update cast op unittest

* update xpu cmake

* update cast unittest

50778ad6

[new-exec] Add events waiter (#36480) · cdb9bfa3

Committed by liutiexing on Oct 25, 2021

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* update

* update

* update Error MSG

* update EventsWaiter

cdb9bfa3

Fix grid sampler while input size is [1] (#36183) · eff3ee5e
Committed by whs on Oct 25, 2021

eff3ee5e

add op: fused_feedforward(forward) (#35843) · b18cbfb2

Committed by zhangkaihuo on Oct 25, 2021

This PR contains only the forward code for fused_feedforward.

Related kernel implementations: fused_dropout_act_bias, fused_residual_dropout_bias, fused_layernorm_residual_dropout_bias

b18cbfb2

24 Oct, 2021 · 1 commit

Add the macro `-DPADDLE_WITH_CINN`. (#36660) · e2173b68
Committed by Zhen Wang on Oct 24, 2021

e2173b68

23 Oct, 2021 · 4 commits

add cinn graph symbolization (#36417) · bbd4bd73

Committed by jiangcheng on Oct 23, 2021

* add cinn graph symbolization

* fix some bug

* add paddle scope to cinn scope

* add paddle scope to CINN scope in Symbolization, and add feed op when build cinn pass

* fix some bug

* fix some bug by review advices

* optimize code problem

* revert build_cinn_pass and move the change to https://github.com/PaddlePaddle/Paddle/pull/36503

* fix some bug after co-compilation

* perfect single test script

* remove scope and rename feed_target to input_tensor

* using std::unordered_map instead of absl::flat_hash_map

* fix single test bug

* revert to previous version, since WITH_CINN has been added in a later PR

* full error information for CI

* full enforce information for CI pass

bbd4bd73

disable padding if dynamic shape (#36648) · 99e396f8
Committed by wenbin on Oct 23, 2021
```
* disable padding if dynamic shape

* add parentheses

* correct
```
99e396f8

fix interpolate mkldnn op error (#36623) · f6d82526
Committed by baoachun on Oct 23, 2021

f6d82526

add file exists check (#36628) · 425db7c8
Committed by Wilber on Oct 23, 2021
```
* add file check

* add ut
```
425db7c8
