Commits · 6b59b58cd19d6d7a1e1a4a6fdbda4d5f740eb5b6 · PaddlePaddle / Paddle

23 Dec, 2021 · 9 commits

Committed by wuhuanzhou on Dec 23, 2021

* add erfinv API, test=develop

* fix gradient accuracy error, test=develop

* fix cuda compilation error on Windows, test=develop

* fix M_2_SQRTPI undeclared identifier on Windows, test=develop

6b59b58c
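
For context on the gradient and `M_2_SQRTPI` items in this commit: `M_2_SQRTPI` is the POSIX `math.h` constant 2/sqrt(pi), which is not part of standard C/C++, and the derivative of the inverse error function is d/dx erfinv(x) = (sqrt(pi)/2) * exp(erfinv(x)^2). The following is a minimal pure-Python sketch of that relationship, not Paddle's implementation. The names `erfinv_bisect` and `erfinv_grad` are hypothetical, and bisection is used only because it is simple to verify.

```python
import math

# 2/sqrt(pi): the value that POSIX math.h exposes as M_2_SQRTPI.
TWO_OVER_SQRT_PI = 2.0 / math.sqrt(math.pi)


def erfinv_bisect(x, iterations=100):
    """Illustrative inverse of math.erf on (-1, 1), found by bisection."""
    if not -1.0 < x < 1.0:
        raise ValueError("x must lie strictly between -1 and 1")
    lo, hi = -6.0, 6.0  # erf(+/-6) rounds to +/-1 in double precision
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if math.erf(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def erfinv_grad(x):
    """d/dx erfinv(x) = 1 / erf'(y) = exp(y**2) / (2/sqrt(pi)), y = erfinv(x)."""
    y = erfinv_bisect(x)
    return math.exp(y * y) / TWO_OVER_SQRT_PI
```

A production kernel would use a closed-form approximation rather than bisection; the sketch only shows where the 2/sqrt(pi) constant enters the gradient.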

Upgrade work queue (#38335) · 198d11be

Committed by liutiexing on Dec 23, 2021

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* update EventsWater

* fix

* split workqueue files

* add more tests

* fix

* bugfix

* bugfix

* update

Co-authored-by: liutiexing <liutiexing@google.com>

198d11be

【PTen】Add empty and empty_like kernel in pten (#38334) · 4221cd33
Committed by zyfncg on Dec 23, 2021
```
* add empty and empty_like kernel in pten

* add empty dev_api
```
4221cd33
Support external stream. (#38373) · 15ad7ee4
Committed by Wilber on Dec 23, 2021
```
* support external stream.

* update

* update

* update
```
15ad7ee4
add-leaky-relu-to-xpu2-op-list (#38366) · b7bafee8
Committed by houj04 on Dec 23, 2021

b7bafee8

add mkldnn conv_elementwise_add_mkldnn_fuse_pass ut (#37612) · f88065d3

Committed by baoachun on Dec 23, 2021

* add mkldnn conv_elementwise_add_mkldnn_fuse_pass ut

* update mkldnn conv_elementwise_add_mkldnn_fuse_pass ut

* update conv_elementwise_add_mkldnn_fuse_pass ut

* update conv_elementwise_add_mkldnn_fuse_pass ut

* update conv_elementwise_add_mkldnn_fuse_pass ut

* restrict conv2d data_format in conv_elementwise_add_mkldnn_fuse_pass

* update conv_elementwise_add_mkldnn_fuse_pass OpCompat

* update conv_elementwise_add_mkldnn_fuse_pass ut

* update ut

f88065d3

add new API: paddle.clone;Tensor.element_size;nn.utils.parameters_to_vector (#38020) · 0eb03ed7
Committed by zhouweiwei2014 on Dec 23, 2021
```
* add new API: paddle.clone;Tensor.element_size;nn.utils.parameters_to_vector

* fix comment
```
0eb03ed7

Add unittest for flatten2_matmul squeeze2_matmul reshape2_matmul pass (#37644) · aa059885

Committed by heliqi on Dec 23, 2021

* add flatten2_matmul squeeze2_matmul reshape2_matmul test case

* modify skip func to ignore_pass_case func

* rebuild CI

* add test_xx_matmul_fuse_pass timeout

* add test_map_xx_pass timeout

* add max_duration of test cast

* add trt skip

* add timeout

* del commented code

aa059885

move sign kernel impl (#38363) · bb38b6aa
Committed by Chen Weihang on Dec 22, 2021

bb38b6aa

22 Dec, 2021 · 8 commits
- use elementwise to optimize gelu backward implementation on GPU (#38263) · 858e4358
  Committed by crystal on Dec 22, 2021
```
* optimize gelu backward

* optimize gelu backward

* optimize code

* Number to expression

* Replacement number
```
  858e4358
- optimize buddy_allocator (#38312) · 8fe1cb72
  Committed by Yang on Dec 22, 2021
  
  8fe1cb72
- add mkldnn reshape_transpose_matmul fuse pass ut and op version check (#37468) · 274b135b
  Committed by baoachun on Dec 22, 2021
```
* add mkldnn reshape_transpose_matmul fuse pass ut and op version check

* update reshape_transpose_matmul_mkldnn_fuse_pass ut

* update ut
```
  274b135b
- update mkldnn batch_norm_activation fuse pass ut (#37402) · 3d7e737c
  Committed by baoachun on Dec 22, 2021
```
* update mkldnn batch_norm_activation fuse pass ut

* update ut

* update mkldnn batch_norm_act_fuse_pass ut

* update batch_norm_act_fuse_pass ut

* update ut
```
  3d7e737c
- [fleet_executor] Move IntraSend to Carrier. Using blocking queue (#38322) · ddc15a18
  Committed by LiYuRio on Dec 22, 2021
  
  ddc15a18
- [PTen]Move flatten kernel to new directory (#38255) · 4d1ce184
  Committed by YuanRisheng on Dec 22, 2021
```
* move flatten

* fix bugs of test

* modify header file

* add copy declare

* fix compile bugs
```
  4d1ce184
- Add nearest_interp/v2 int8 and uint8 support (#37985) · 56e2a6a6
  Committed by joanna.wozna.intel on Dec 22, 2021
  
  56e2a6a6
- CE fix (#38324) · 90e9a486
  Committed by wenbin on Dec 22, 2021
```
* CE fix

* more format
```
  90e9a486
21 Dec, 2021 · 9 commits
- Fix inplace problem of setitem (#38298) · da61df5c
  Committed by zyfncg on Dec 21, 2021
```
* add inplace_map for trace_op in pybind

* fix inplace problem of setitem

* refactor the param format of trace_op

Co-authored-by: pangyoki <pangyoki@126.com>
```
  da61df5c
- update seqconv_eltadd_relu_fuse_pass ut (#37907) · 4e578c2b
  Committed by baoachun on Dec 21, 2021
```
* update seqconv_eltadd_relu_fuse_pass ut

* update ut

* update ut

* update ut
```
  4e578c2b
- update squared_mat_sub_fuse_pass ut (#37838) · aadc8674
  Committed by baoachun on Dec 21, 2021
```
* update squared_mat_sub_fuse_pass ut

* update ut

* update ut
```
  aadc8674
- [PTen] Rename cuda dir and context to gpu (#38296) · dc7597e3
  Committed by Chen Weihang on Dec 21, 2021
```
* rename cuda to gpu

* revert CMake change

* resolve conflict

* rename other cuda to gpu

* polish details
```
  dc7597e3
- use elementwise to optimize gelu forward implementation on GPU (#38188) · aff43684
  Committed by crystal on Dec 21, 2021
```
* relu forward opt

* add gelu functor

* optimize code
```
  aff43684
- Fix for wrong conditions between forward and backward in elementwise_add_grad op (#38176) · d9780a22
  Committed by arlesniak on Dec 21, 2021
  
  d9780a22
- [fleet_executor] Python side fleet executor and task node (#38290) · a4afb97a
  Committed by Yuang Liu on Dec 21, 2021
  
  a4afb97a
- add seqpool_cvm_concat_fuse_pass ut (#37902) · 06cf314a
  Committed by baoachun on Dec 21, 2021
```
* add seqpool_cvm_concat_fuse_pass ut

* rename ut name
```
  06cf314a
- Support FP16 mean (#38289) · 643a268e
  Committed by sneaxiy on Dec 21, 2021
```
* mean first version

* fix scalar mean

* add fp16 dtype for api
```
  643a268e
20 Dec, 2021 · 14 commits

add mkldnn conv_transpose_bias fuse pass ut (#37508) · ac696941

Committed by baoachun on Dec 20, 2021

* add mkldnn conv_transpose_bias fuse pass ut

* update conv_transpose_bias_mkldnn_fuse_pass ut

* update conv_transpose_bias_mkldnn_fuse_pass ut

* update conv_transpose_bias_mkldnn_fuse_pass ut

* restrict conv2d data_format in conv_transpose_bias_mkldnn_fuse_pass

* update ut timeout setting

* update ut

ac696941

[pten]add pten conj kernel (#38247) · a2793e5e

Committed by chentianyu03 on Dec 20, 2021

* add pten conj kernel

* modify conj_kernel file path

* add defined cuda macro to cuda/conj_kernel.h

a2793e5e

add gelu pbtxt for conv+gelu mkldnn fuse pass (#38162) · 1b7f6ae9
Committed by baoachun on Dec 20, 2021

1b7f6ae9
[MLU]add mlu backend (#38207) · 76514a1f
Committed by fwenguang on Dec 20, 2021

76514a1f
Skip zero-size Allocation in RecordStream (#38264) · 48937020
Committed by From00 on Dec 20, 2021

48937020

Support FP16 for more ops (#38123) · 1f445bf3

Committed by sneaxiy on Dec 20, 2021

* support FP16 for more ops

* add amp list tests

* refine reduce_mean_grad

* fix OP benchmark ci

* fix fp16 reduce_mean

* update ut, but still have some problems

* remove mean/reduce_mean fp16 kernel

1f445bf3

optimize softmax with cross entropy soft label (#32387) · f8955602

Committed by Feng Xing on Dec 20, 2021

Optimizes softmax_with_cross_entropy with soft labels. This PR optimizes the following:
    "SoftmaxWithCrossEntropySoftLabel": computes log_softmax, then the loss.
    "CrossEntropySoftLabel": computes the loss with softmax as input.
The optimizations use the following techniques:
    read data into a buffer with vectorized loads
    compute max and sum within a warp
    fix the loop size with a macro
Performance (change in computation time):
    softmax_with_cross_entropy_0 (forward): -40.1%
    softmax_with_cross_entropy_0 (backward): -41%

f8955602
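
The "compute log_softmax and then compute loss" path described in this commit can be illustrated with a small CPU reference. This is a minimal pure-Python sketch of the math only, assuming a single row of logits and a soft label distribution that sums to 1. The function names are hypothetical, and none of the GPU techniques listed in the commit (vectorized reads, warp-level max and sum) are reproduced here.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax: subtract the row max before exp."""
    row_max = max(logits)
    log_sum = math.log(sum(math.exp(x - row_max) for x in logits))
    return [x - row_max - log_sum for x in logits]


def soft_label_cross_entropy(logits, soft_labels):
    """Loss = -sum_i label_i * log_softmax(logits)_i for one sample."""
    if len(logits) != len(soft_labels):
        raise ValueError("logits and soft_labels must have the same length")
    log_probs = log_softmax(logits)
    return -sum(p * lp for p, lp in zip(soft_labels, log_probs))
```

Subtracting the row maximum before exponentiating keeps the computation finite for large logits. That max reduction and the following sum are the two per-row reductions that the commit says are computed within a warp on the GPU.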

changes the call AllocShared to Alloc, test=develop (#38258) · bb0713b2
Committed by 石晓伟 on Dec 20, 2021

bb0713b2
fix typos in header inclusion in complex_op.cc (#38272) · 2635cc86
Committed by Feiyu Chan on Dec 20, 2021

2635cc86

add matmul_scale_fuse_pass (#37962) · ce335c23

Committed by heliqi on Dec 20, 2021

* add matmul_scale matmul_v2_scale fuse pass

* add scaletensor judge

* modify var name

* add timeout notest;test=coverag

* fix error commit

* fix use_mkldnn attr

* fix use_mkldnn attr

ce335c23

fix use of implicitly deleted constructor (#38225) · 23d9e947
Committed by Sylwester Fraczek on Dec 20, 2021

23d9e947

Add multi_tensor for momentum optimizer and clear_grads (#37564) · 0cc5e22c

Committed by zhangbo9674 on Dec 20, 2021

* add multi_tensor for momentum and clear_grads for optimizer

* fix bug for dygraph

* add unittest

* refine comment

* add param_group

* refine regularization logic

* del clear_grads

* add clear_grads

* add dispensable check of None

* refine clear_grad

* fix build bug

* refine code by comment

* refine code

* add multi tensor check

* refine param_group update

* add multi tensor for static mode

* refine comments

* delete useless comma for momentum

* refine comment for momentum

* refine code by comment

0cc5e22c

[fleet_executor] Remove runtime graph, all scheduler on python side (#38261) · 2f188341
Committed by Yuang Liu on Dec 20, 2021

2f188341
Fix bugs that copy occurs when tensor "in" and tensor "out" is same in reshape kernel (#38249) · a615002a
Committed by YuanRisheng on Dec 20, 2021
```
* fix bugs when run reshape

* fix ci bug
```
a615002a

PaddlePaddle / Paddle · synced successfully about 1 year ago
