Commits · d80fe2683d21710a339877715d4d8157669457e3 · PaddlePaddle / Paddle

Dec 17, 2021 · 6 commits

Refine some AMP operators for BERT (#37923) · d80fe268

Committed by sneaxiy on Dec 17, 2021

* support multi precision update for LAMB

* hide some api

* fix ci uts

* fix lamb output of dygraph

* remove some changes to some PR

* try to fix Py3 CI compile error

* fix test_imperative_optimizer, add lars ut, add layer_norm ut

* fix ut, fix format

* fix ut

* fix windows ci

d80fe268

[pten] modify reduce_sum reduce_mean args (#38216) · eaa2363e

Committed by chentianyu03 on Dec 17, 2021

* modify sum mean args

* add GetExpectedPtenKernelArgs for reduce_op

* modify kernel args number

* modify kernel args number

eaa2363e

add op/api repeat/interleave (#37981) · a7de0e66
Committed by kuizhiqing on Dec 17, 2021

a7de0e66

add launch bound to limit the registers usage for volta architecture (#38113) · 18a59822

Committed by zlsh80826 on Dec 17, 2021

According to `--ptxas-options=-v`, SegmentOpsKernel uses 66 registers per thread.
There are two ways to resolve this problem:
1. Reduce the threads-per-block launch configuration.
2. Add `__launch_bounds__` to give the nvcc compiler the information it needs to reduce register usage.
This PR chooses the `__launch_bounds__` solution because changing gpu_launch_config may affect other ops.
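
The register pressure described above can be checked with simple arithmetic. The sketch below is illustrative only: it assumes a limit of 65536 32-bit registers per thread block (the documented figure for compute capability 7.0) and a block size of 1024 threads, which the commit message does not state. It also ignores register allocation granularity. The function names are hypothetical.

```python
# Assumption: 64K 32-bit registers available to one thread block on Volta (CC 7.0).
REGISTERS_PER_BLOCK_LIMIT = 65536


def registers_needed(regs_per_thread, threads_per_block):
    """Total registers one block requests (allocation granularity ignored)."""
    return regs_per_thread * threads_per_block


def fits_in_block(regs_per_thread, threads_per_block,
                  limit=REGISTERS_PER_BLOCK_LIMIT):
    """True if a block of this size can launch under the register limit."""
    return registers_needed(regs_per_thread, threads_per_block) <= limit


def max_registers_per_thread(threads_per_block,
                             limit=REGISTERS_PER_BLOCK_LIMIT):
    """Largest per-thread register count that still lets the block launch."""
    return limit // threads_per_block


if __name__ == "__main__":
    # 66 registers per thread is the figure reported in the commit message;
    # 1024 threads per block is an assumed launch configuration.
    print(registers_needed(66, 1024), fits_in_block(66, 1024))
    print(max_registers_per_thread(1024))
```

Under these assumptions, either option in the commit message brings the block back under the limit: a smaller block lowers the product, while a launch bound asks the compiler to cap the per-thread register count.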

18a59822

Delete cub_reduce.h and modified the TensorReduce to TensorReduceFunctorImpl (#38197) · 9a8a4c77
Committed by niuliling123 on Dec 17, 2021

9a8a4c77
[BugFix]: Elementwise branch selection and Broadcast dimension merge (#38204) · e097a748
Committed by limingshu on Dec 17, 2021
```
* fix_bugs_for_elementwise_branch_selection

* fix merge_dims bugs

* fix all influenced file
```
e097a748

Dec 16, 2021 · 11 commits

Faster implementation of CPU kernel for ROI Align operator (#37848) · 023ff4f5

Committed by Tomasz Socha on Dec 16, 2021

* Faster implementation of CPU kernel for ROI_ALIGN Operator

* Add missing variable to CUDA roi_align_op

* Style

* Fix boundaries

* Rename variables for indexes calculation

* Remove unnecessary emplace

* Revert "Remove unnecessary emplace"

This reverts commit c10e87f7fb812f1a672fde32f2690a97d47e2f5a.

* Style

023ff4f5

pylayer support HIP (#38184) · 2e76d5ad
Committed by chentianyu03 on Dec 16, 2021

2e76d5ad

Add arc hyperbolic function op (#37076) · 36b7368d

Committed by xiaoting on Dec 16, 2021

* add activation

* update activation_op

* add unittest for activation

* fix acosh for init, test=develop

36b7368d

Revert "modify the fix_seed attribute in dropout op is a def... · 464f2af8

Committed by 王明冬 on Dec 16, 2021

Revert "modify the fix_seed attribute in dropout op is a def attribute.test=develop (#38100)" (#38127)

This reverts commit f44add7b.

464f2af8

[PTen] Add register_ctx_kernel marco and move scale kernel (#38121) · af498677

Committed by Chen Weihang on Dec 16, 2021

* add register_ctx_kernel and move scale kernel

* polish details by reviewer comment

* fix xpu compile failed

* fix cmake error

af498677

[psgpu]add checknan print and fix trainer device (#38131) · 092839d6
Committed by danleifeng on Dec 16, 2021
```
* trainer_device fix and checknan tool for psgpu;test=develop

* disable show_one_table;test=develop
```
092839d6
Add fmax and fmin operators (#37826) · dd3afc9d
Committed by LJQ❤️ on Dec 16, 2021
```
Add elementwise_fmax and elementwise_fmin operators
```
dd3afc9d

Add sparse_attention mask ,test=develop (#37973) · fa463b90

Committed by Liu-xiandong on Dec 16, 2021

Add key_padding_mask and attn_mask to the sparse_attention API.

1. The key padding mask is a tensor with dimensions [batch_size, seq_len], and the attention mask is a tensor with dimensions [seq_len, seq_len]. Both masks have the same data type as Q, K, and V, which is float32 or float64. A value of 0 in a mask means that the position is masked.

2. The changed files are mainly paddle/fluid/operators/sparse_attention_op.cu and python/paddle/fluid/tests/unittests/test_sparse_attention_op.py. sparse_attention has three parts: sddmm, softmax, and dsd. Adding the mask operation only requires modifying the softmax part; the other two parts are unaffected. Related tests have been added to cover the mask function.
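
The mask semantics described above can be illustrated with a small dense softmax. This is a minimal sketch in plain Python, not the CUDA kernel from the PR: it handles one batch element and one head, uses a dense score matrix rather than the sparse layout, and the function name `masked_softmax` is hypothetical. How the real kernel treats a row in which every position is masked is not stated in the commit message; returning zeros for such a row is an assumption made here.

```python
import math


def masked_softmax(scores, key_padding_mask, attn_mask):
    """Row-wise softmax over scores, skipping masked positions.

    scores:           [seq_len][seq_len] attention scores for one batch element
    key_padding_mask: [seq_len] row of the [batch_size, seq_len] mask
    attn_mask:        [seq_len][seq_len]
    A mask value of 0 means the position is masked (as in the commit message).
    """
    result = []
    for i, row in enumerate(scores):
        # A key position j is visible only if both masks are non-zero there.
        keep = [key_padding_mask[j] != 0 and attn_mask[i][j] != 0
                for j in range(len(row))]
        visible = [s for s, k in zip(row, keep) if k]
        if not visible:
            # Assumption: a fully masked row yields all zeros.
            result.append([0.0] * len(row))
            continue
        row_max = max(visible)  # subtract the max for numerical stability
        exps = [math.exp(s - row_max) if k else 0.0
                for s, k in zip(row, keep)]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


if __name__ == "__main__":
    scores = [[1.0, 2.0, 3.0], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
    key_padding_mask = [1.0, 1.0, 0.0]           # last key is padding
    attn_mask = [[1.0, 0.0, 0.0],
                 [1.0, 1.0, 0.0],
                 [1.0, 1.0, 1.0]]                # causal-style mask
    for probs in masked_softmax(scores, key_padding_mask, attn_mask):
        print(probs)
```

Because masking only changes which entries take part in the normalization, it touches the softmax step alone, which matches the commit's note that sddmm and dsd are unaffected.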

fa463b90

Add the transformop parameter in TensorReduceFunctorImpl (#38135) · 524389ee
Committed by niuliling123 on Dec 16, 2021
```
* Add the transformop parameter in TensorReduceFunctorImpl
```
524389ee

[Pten]Modify registered kernel name (#38109) · be874c08

Committed by YuanRisheng on Dec 16, 2021

* Reduce reshape kernel functions in pten

* delete notes

* fix bugs when compile

* modify register name

* fix compile bugs

be874c08

Add float16 type for scatter op. (#38136) · 9bac4a76

Committed by Li Min on Dec 16, 2021

* Add float16 type for scatter op.

* Add fp16 test for scatter op.

* Add int and int64 support for scatter_grad on gpu.

* Add int and int64 for check_variable_and_dtype routine.

* Minors.

* Code format.

9bac4a76

Dec 15, 2021 · 3 commits
- Change a comment to avoid the disturb to op benchmark ci. (#38148) · 4d8242df
  Committed by Yiqun Liu on Dec 15, 2021
```
test=document_fix
```
  4d8242df
- Add cinn_launch_op_test into Paddle-CINN ci (#38076) · e5a838f8
  Committed by Huihuang Zheng on Dec 15, 2021
```
As the title.
```
  e5a838f8
- replace with pten kernel in cast cuda compute and remove unused codes (#38074) · 75332401
  Committed by chentianyu03 on Dec 15, 2021
```
* replace with pten kernel in cast cuda compute and remove unused codes

* rm unused header file

* replace CastCUDAOpKernel with CastOpKernel
```
  75332401
Dec 14, 2021 · 6 commits
- add map_matmul and fc_act_fuse passes to quant2_int8_mkldnn_pass (#38023) · 8f800dc0
  Committed by Sylwester Fraczek on Dec 14, 2021
```
* add map_matmul passes to quant2_int8_mkldnn_pass

* fix fc+act fuse (activation scale)

* ci fix, c++17 structured bindings not available

* fix ci static check
```
  8f800dc0
- add conv_gelu_mkldnn_fuse_pass (#38107) · 206a33b3
  Committed by baoachun on Dec 14, 2021
```
* add conv_gelu_mkldnn_fuse_pass

* add post ops
```
  206a33b3
- modify the fix_seed attribute in dropout op is a def attribute.test=develop (#38100) · f44add7b
  Committed by weishengying on Dec 14, 2021
  
  f44add7b
- [PTen] Reduce reshape kernel functions in pten (#38055) · a3c8abc7
  Committed by YuanRisheng on Dec 14, 2021
```
* Reduce reshape kernel functions in pten

* delete notes

* fix bugs when compile
```
  a3c8abc7
- fix generate_proposals op doc (#38048) · c117dfba
  Committed by wangguanzhong on Dec 14, 2021
  
  c117dfba
- add reshape+transpose+matmul_v2 only (#37847) · a922168a
  Committed by Sylwester Fraczek on Dec 14, 2021
```
* reshape+transpose+matmul_v2

* in_name->input_name

* fix pr-ci-static-check
```
  a922168a
Dec 13, 2021 · 4 commits

update xpu_memcpy (#38049) · bdf5834e
Committed by taixiurong on Dec 13, 2021

bdf5834e

[pnorm] Optimize p_norm op for special cases (#37685) · 10d9ab4b
Committed by Noel on Dec 13, 2021

10d9ab4b

add logit API (#37844) · b197bfe6

Committed by wangzhen38 on Dec 13, 2021

* add Logit API

* add unittest

* conflict

* pull conflict

* pull conflict logit

* fix unittest

* fix code style

* update docs style of

* update en doc

* fix docs en style

* fix docs en style1

* fix docs en style2

* fix docs en style3

* fix docs en style4

* fix docs en style5

* fix docs en style6

* fix docs en style7

* fix docs en style8

* update by review

* fix nan bug

b197bfe6

complement deps on cinn_launch_context cmake (#38031) · cba84f88
Committed by CtfGo on Dec 13, 2021
```
complement deps of cmake files under WITH_CINN compilation
```
cba84f88

Dec 10, 2021 · 5 commits
- fix int32 overflow in cuda kernel loop (#38007) · 37f43ebc
  Committed by Leo Chen on Dec 10, 2021
  
  37f43ebc
- fix pscore geo&lr_decay (#37995) · 513d1f97
  Committed by zhaocaibei123 on Dec 10, 2021
```
* fix

* modify log

* fix batch_size
```
  513d1f97
- add as_complex and as_real op (#37784) · ae40370d
  Committed by Feiyu Chan on Dec 10, 2021
```
* add as_complex and as_real op
```
  ae40370d
- support pylayer with different input dtype (#37974) · c732c831
  Committed by Jiabin Yang on Dec 10, 2021
  
  c732c831
- change serval variable name and usage related cinn_launch (#38022) · a9bd6f0c
  Committed by CtfGo on Dec 10, 2021
  
  a9bd6f0c
Dec 09, 2021 · 5 commits
- cache scope and place on CinnLaunchContext and pass them to callback (#37983) · 151c5d74
  Committed by CtfGo on Dec 09, 2021
```
cinn_launch_op: cache scope and place on CinnLaunchContext to skip duplicate alloc/free callback construction
```
  151c5d74
- [PTen] Refine Kernel Registrar Writing (#37977) · b199ba85
  Committed by Chen Weihang on Dec 09, 2021
```
* refine the kernel register impl

* fix cmake and symbol error

* remove overload macro

* polish details
```
  b199ba85
- add ipu device p2 (#37840) · cb636a48
  Committed by jianghaicheng on Dec 09, 2021
  
  cb636a48
- optimize flip op, removing duplicated computation when dim size is one (#37825) · 890638cf
  Committed by Roc on Dec 09, 2021
  
  890638cf
- format softmax forward (#37927) · 18aca3f5
  Committed by Feng Xing on Dec 09, 2021
  
  18aca3f5

PaddlePaddle / Paddle · synced successfully about 1 year ago
