- 15 March 2023, 7 commits
-
-
Submitted by Siming Dai
* add fp16 test for divide, matmul, pnorm
* add cumsum fp16 unittest
* fix threshold
* revert cumsum
* fix code-style
* fix according to review
* fix kernel not found
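A hypothetical sketch of the kind of fp16 coverage this commit adds, using divide as the example; the shapes, tolerances, and test structure here are assumptions, not the actual test code:

```python
import numpy as np
import paddle

# Hypothetical fp16 smoke test in the spirit of this commit; op choice,
# shapes, and tolerances are illustrative. fp16 kernels generally need a GPU.
paddle.set_device("gpu")

x = paddle.to_tensor(np.random.rand(4, 8), dtype="float16")
y = paddle.to_tensor(np.random.rand(4, 8) + 0.5, dtype="float16")

out = paddle.divide(x, y)
ref = x.astype("float32") / y.astype("float32")  # fp32 reference result

np.testing.assert_allclose(
    out.astype("float32").numpy(), ref.numpy(), rtol=1e-3, atol=1e-3
)
```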
-
Submitted by Guoxia Wang
-
Submitted by WangZhen
-
Submitted by zhangyuqin1998
* Delete hardswish_raw op
* fix ut
-
Submitted by ronnywang
* [CustomDevice] fix SyncDefaultStream for process_group_custom
* update
-
Submitted by wanghuancoder
* refine _found_inf
-
Submitted by xiaoguoguo626807
* modify_yaml
* delete default param
* add output for matmul_double_grad
-
- 14 March 2023, 33 commits
-
-
Submitted by zhouweiwei2014
-
Submitted by Vvsmile
-
Submitted by pangyoki
-
Submitted by engineer1109
-
Submitted by chenxujun
-
Submitted by zhouweiwei2014
-
Submitted by ccrrong
* add split_with_num composite rule
* add split composite rule
* update
* update test
* delete split_with_num_grad
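For orientation, a composite rule rewrites an op like split into more primitive ops such as slice. A rough user-level Python sketch of that decomposition (illustrative only; the actual rule is registered inside Paddle's prim framework, and split_via_slices is a hypothetical helper):

```python
import paddle

# Illustrative decomposition of split into slice ops, the general shape of
# what a composite rule does; function and variable names are hypothetical.
def split_via_slices(x, num_sections, axis=0):
    section = x.shape[axis] // num_sections
    outs = []
    for i in range(num_sections):
        outs.append(
            paddle.slice(x, axes=[axis], starts=[i * section],
                         ends=[(i + 1) * section])
        )
    return outs

x = paddle.arange(12, dtype="float32").reshape([6, 2])
a, b, c = split_via_slices(x, 3, axis=0)  # matches paddle.split(x, 3, axis=0)
```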
-
Submitted by engineer1109
* fix abi
* fix tab
-
Submitted by qizhaoaoe
-
Submitted by limingshu
* first commit
* fix code bugs in for_loop
* fix bugs in cuLoadAddStridedInputs
* optimization for LayerNormBackwardComputeGradInput
* add unittest for validating the optimization
* fix windows ci error
-
Submitted by gouzil
-
Submitted by pangyoki
* cuda graph support multi-stream for new executor
* fix windows compile error
* delete create_cuda_graph_stream
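For context, Paddle exposes CUDA graph capture through paddle.device.cuda.graphs.CUDAGraph. A minimal capture/replay sketch, assuming a CUDA build and a visible GPU; this shows the user-facing API, not the executor-internal multi-stream change this commit makes:

```python
import paddle
from paddle.device.cuda.graphs import CUDAGraph

paddle.set_device("gpu")

x = paddle.ones([4, 4])
y = paddle.matmul(x, x) + 1.0  # warm up the kernels once before capture

g = CUDAGraph()
g.capture_begin()              # start recording kernel launches into the graph
y = paddle.matmul(x, x) + 1.0
g.capture_end()                # stop recording

g.replay()                     # re-launch the captured kernels without re-dispatch
```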
-
Submitted by zhaoyingli
-
Submitted by YuhangLi
* elementwise_max fp16 support
* add bf16 support for elementwise_max
* append broadcast op for fp16 / bf16
* fix elementwise_max ut bf16 numeric delta
* append fp/bf16 uts
* add fp/bf16 uts
* change bf16 uts delta
* fix some issues
* add prim for fp16
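On the user side, the new dtype coverage corresponds to calls like the following (a small sketch assuming a device and build with bf16 support; the actual unit tests set per-dtype tolerances):

```python
import paddle

# paddle.maximum is the Python entry for elementwise_max; fp16/bf16 inputs
# broadcast just like fp32 ones once the kernels support those dtypes.
x = paddle.to_tensor([[1.0, 5.0, 3.0]], dtype="bfloat16")
y = paddle.to_tensor([[4.0], [2.0]], dtype="bfloat16")  # broadcasts to [2, 3]

out = paddle.maximum(x, y)
print(out.astype("float32"))  # cast up for readable printing
```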
-
Submitted by wangxiaoning
-
Submitted by wenbin
-
Submitted by xiongkun
-
Submitted by Wang Bojun
* fix conv2d filter
-
Submitted by xiaoguoguo626807
* init
* modify
-
Submitted by wangxiaoning
-
Submitted by Infinity_lee
-
Submitted by zhiboniu
* add fp16 and bf16 test
* update
-
Submitted by Ackeraa
* add register of select
Co-authored-by: wqgo <1552367872@qq.com>
-
Submitted by Li-fAngyU
* update empty api to support complex dtype at static mode
* code style
* add type descriptions in the comments
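A short sketch of what the extended empty API allows in static graph mode (a minimal illustration assuming a standard Paddle static-mode setup; program and variable names are arbitrary):

```python
import paddle

paddle.enable_static()

main_prog = paddle.static.Program()
with paddle.static.program_guard(main_prog):
    # After this change, complex dtypes are accepted here in static mode.
    x = paddle.empty(shape=[2, 3], dtype="complex64")

exe = paddle.static.Executor()
(out,) = exe.run(main_prog, fetch_list=[x])
print(out.dtype)  # complex64
```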
-
Submitted by HongyuJia
-
Submitted by cxxly
-
Submitted by xiongkun
* [CINN] Enhance CacheKey hash logic by considering input dtypes (#50557)
* [prim] enable dygraph_to_static to support custom_vjp
* Pr 50885 (#7)
* fix code in a dy2static-friendly way
* [dystatic] add hooker for prim
* fix cast prim and vjp dtype mapping error bug
* [dy2static-ci] fix dy2static ci errors
Co-authored-by: Aurelius84 <zhangliujie@baidu.com>
Co-authored-by: jiangcheng <thisjiang@qq.com>
Co-authored-by: cxxly <chenxx_id@163.com>
-
Submitted by cxxly
-
Submitted by cxxly
-
Submitted by xiongkun
* [CINN] Enhance CacheKey hash logic by considering input dtypes (#50557)
* add unittest
* fix typo
* fix map.at
* fix find
* fix test
* fix cinn cache key structure realize
* using ordered map for attributes
* add test by review advice
* [prim] enable dygraph_to_static to support custom_vjp
* fix code in a dy2static-friendly way
* [dystatic] add hooker for prim
Co-authored-by: Aurelius84 <zhangliujie@baidu.com>
Co-authored-by: jiangcheng <thisjiang@qq.com>
Co-authored-by: cxxly <chenxx_id@163.com>
-
Submitted by cxxly
-
Submitted by Aurelius84
* [CINN] Enhance CacheKey hash logic by considering input dtypes
* add unittest
* fix typo
* fix map.at
* fix find
* fix test
* fix cinn cache key structure realize
* using ordered map for attributes
* add test by review advice
Co-authored-by: jiangcheng <thisjiang@qq.com>
-
Submitted by zhangbo9674
* add builtin-type DenseTensorType, Float16Type, Float64Type, Int16Type, Int64Type
* refine comment
* add classof for Type class
* refine test code
* add get param func for DenseTensorType
* add dyn_cast and refine isa
* set default WITH_NEWIR=OFF
* refine cast_utils
* refine code by comment
* fix bug of dyn_cast
-