Commits · 18de0c9496d670ecb3bb6174e0cfa9ee6aed60ed · PaddlePaddle / Paddle

26 Jul, 2023 3 commits
- Add FP16 & BF16 for lamb (#55641) · 84a56b4a
  Committed by Difer on Jul 26, 2023
  
  84a56b4a
- [Reshard] Implement replicated to split with same placement (#55552) · 9f3b5f15
  Committed by LiYuRio on Jul 26, 2023
```
* Implement replicated to split reshard function

* fix link error in clang

* refine split functor

* simplify reshard code
```
  9f3b5f15
- add modernize-redundant-void-arg check (#55652) · 12fb18dd
  Committed by gouzil on Jul 26, 2023
  
  12fb18dd
25 Jul, 2023 7 commits

fix a bug caused by hipcc lambda value capture (#55612) · 8db3ff1f
Committed by lishicheng1996 on Jul 25, 2023

8db3ff1f

Bugfix, fast layer norm, OOB (#55639) · 017a6164

Committed by Jeng Bai-Cheng on Jul 25, 2023

* Fix LayerNormForward perf issue

* Bugfix, fast_layer_norm OOB

* apply pre-commit

---------
Co-authored-by: Shijie Wang <jaywan@nvidia.com>

017a6164

add all false bool indices support for index_put (#55655) · c737f0ae
Committed by 傅剑寒 on Jul 25, 2023

c737f0ae
fix bugs in rnn op (#55656) · 0cd422b6
Committed by Lucas on Jul 25, 2023

0cd422b6
fix div 0 bug (#55644) · 690ffe81
Committed by wanghuancoder on Jul 25, 2023

690ffe81

[NewIR]new ir dygraph to static supoort gpu (#55620) · fb9bec5d

Committed by hong on Jul 25, 2023

* add kernel dialect

* change DenseTensorTypeStorage to DenseTensorType

* add test case`

* add first pd_op to kernel dialect

* lower pd op to kernel dialect

* update

* update

* remove useless code

* add attrite print test

* fix bug

* update

* update

* update

* update

* polish code

* fix bug

* polish  code  and add python test

* add test

* fix test error

* relax constraint when inserting get_parameter

* add env flag

* fix bug

* dygraph2static support new ir

* fix bug

* revert test env

* change cc_test_old to cc_test

* update

* fix build_static bug

* update test

* fix type test error

* udpate cmake

* disable test in windows

* fix inference compile

* fix program translator error

* only run on cpu, not support gpu yet

* fix conflict

* polish code

* fix bug

* add feed with place op

* update

* remove useless unitest

* udpate mkldnn

* update

* update

* align mkldnn version

* new ir support builtin slice op

* fix bug

* fix phi kernel adaptor bug

* add enable static

* add enable_static

* remove useless test case

* change feed list to single variable

* update

* add feed with place and shaddow output op

* fix bug

* remove usless code

* support gpu

* fix bug

* fix bug

* remove template

* add more data type

* fix cimpile bug

* udpate

* remove useless code

* revert dygraph2st test

* remove usless code

* revert op

* fix bug

* new ir dygraph2static support gpu

* remove usless code

* code polish

* add const

* revert code and remove useless code

* revert code

* revert legacy op yaml

* remove useless code

* delete std::move

---------
Co-authored-by: kangguangli <kangguangli@hotmail.com>

fb9bec5d

[XPU] Add FP16 support for arg_min_max (#55642) · 14094aad
Committed by jiangfan06 on Jul 25, 2023

14094aad

24 Jul, 2023 1 commit
- [PHI] add fused_softmax_mask and fused_softmax_mask_grad for CPU. (#55616) · b10b899c
  Committed by houj04 on Jul 24, 2023
  
  b10b899c
20 Jul, 2023 4 commits

[NewIR]Change feed list to variable list && support GPU (#55401) · 75517841

Committed by hong on Jul 20, 2023

* add feed with place op

* remove useless unitest

* udpate mkldnn

* update

* new ir support builtin slice op

* fix phi kernel adaptor bug

* add enable_static

* remove useless test case

* change feed list to single variable

* support gpu

* fix bug

* remove template

* add more data type

* fix cimpile bug

75517841

[XPU] fuse cast to conv2d/fc in mixed precision model (#54493) · 4df00939
Committed by zhupengyang on Jul 20, 2023

4df00939
rename hard_sigmoid to hardsigmoid for kernel name (#55559) · c3080386
Committed by zyfncg on Jul 20, 2023

c3080386

[XPU][PHI Kernels] bind reduce_max_int64 set_value_bool sin_grad_fp32 cos_grad_fp32 for XPU (#55375) · ab00c96c

Committed by lijin23 on Jul 20, 2023

* bind kernels for xpu

* format code

* format code

* 0d support for set value

* refine set_value

ab00c96c

19 Jul, 2023 3 commits

[NewIR]Add feed with place op (#55343) · 8e9e0659

Committed by hong on Jul 19, 2023

* add feed with place op

* remove useless unitest

* udpate mkldnn

* update

* add enable_static

* remove useless test case

* register int and doubel type

* fix bug

8e9e0659

delete relu6_raw (#55383) · 56d46ccc

Committed by zhangyuqin1998 on Jul 19, 2023

* delete relu6_raw

* fix codestyle

* Update test_mkldnn_matmul_activation_fuse_pass.py

* fix

* Update backward.yaml

* Update ops.yaml

* Update backward.yaml

56d46ccc

Fix mea segmentation fault error (#55408) · cc262c55
Committed by sneaxiy on Jul 19, 2023
```
* fix mea seg fault develop

* fix bias_grad seg fault
```
cc262c55

18 Jul, 2023 2 commits

batch add inpalce api (#55078) · 19302938

Committed by GGBond8488 on Jul 18, 2023

* batch add inpalce api

* fix inplace fn generate

* add test for  new inpalce api

* fix typro

* fix typro

* fix typro

* fix test error

* fix atan2

* remove atan2

* auto genereate inpalce api

* fix inplace generate fn error

* fix windows error

* fix test error

* fix test error

* fix windows ci error

* fix test error

* fix test_error

* fix test error

* fix eigen aliasing error in inplace

* remove elementwise_pow inplace

* fix doc error

* fix test error

19302938

[NewIR]Fix new ir concat split bug (#55419) · 5e6645d7

Committed by hong on Jul 18, 2023

* fix new ir concat op bug

* fix bug

* using add_n_with_kernel instead of add_n impl

* fix pd_op yaml bug

* fix bug

5e6645d7

17 Jul, 2023 2 commits

Support more dtype for any/all API. (#55253) · 7b19efe4

Committed by zxcd on Jul 17, 2023

* add more data type for all/any.

* remove xpu fix.

* add test unit.

* fix typename name.

* fix output data type.

7b19efe4

TensorSetConstantXPU support to use xpu::constant when T is float/float16 (#55122) · 6692dc9a
Committed by zhangyikun02 on Jul 17, 2023
```
* TensorSetConstantXPU support to use xpu::constant when T is float/float16

* add xpu_wait for TensorSetConstantXPU
```
6692dc9a

14 Jul, 2023 4 commits
- fix embedding_with_eltwise_add_xpu (#55354) · 95aab366
  Committed by zhupengyang on Jul 14, 2023
  
  95aab366
- fix fisher yates sample (#55329) · f311a927
  Committed by Siming Dai on Jul 14, 2023
  
  f311a927
- [XPU] Fix yolo_box to support multi-stream based inference (#55310) · 7e4290c5
  Committed by hong19860320 on Jul 14, 2023
  
  7e4290c5
- Update CUDNN Frontend API to v0.9.1 (#54949) · 76b77d81
  Committed by Tian Zheng on Jul 14, 2023
```
* Update CUDNN Frontend API to v0.9.1
- Remove old patches
- Remove workarounds that are no longer needed

* Fix test_switch_autotune
```
  76b77d81
13 Jul, 2023 7 commits
- [inference] Add FusedBiasActKernel (#55301) · 0a4d1999
  Committed by freeliuzc on Jul 13, 2023
```
* add init value for CudaSwishFunctor

* add new phi kernel fusedBiasActKernel
```
  0a4d1999
- add phi operator c_concat and ut (#55320) · 788be26d
  Committed by lil-Xing on Jul 13, 2023
```
* add phi operator c_concat and ut

* update create_var use

* update copyright
```
  788be26d
- Move compare_raw_kernel to legacy (#53928) · 1dd8770a
  Committed by zhangyuqin1998 on Jul 13, 2023
```
* Move compare_raw_kernel to legacy

* fix

* Update compare_kernel.cc

* Move compare_raw_kernel to legacy
```
  1dd8770a
- fix roi_align roi_pool to static num 0 (#55342) · 0a21836d
  Committed by Feng Ni on Jul 13, 2023
  
  0a21836d
- fix conv_fusion in multi thread. (#55374) · ceb83562
  Committed by Wilber on Jul 13, 2023
  
  ceb83562
- Add matmul_int8 op (#55228) · 27cc0df5
  Committed by RichardWooSJTU on Jul 13, 2023
```
* add matmul int8
```
  27cc0df5
- Modify bf16 and fix the elementwise_max (#54799) · 6f7ceca0
  Committed by Qi Shao on Jul 13, 2023
```
* modify the accuracy checking framework of bf16 optest, including both of forward and backward
```
  6f7ceca0
12 Jul, 2023 3 commits

Fix llm int8 build error (#55338) · 006bd959

Committed by FormlessUnit on Jul 12, 2023

* add macro to avoid llm.int8 build error

* fix ci

---------
Co-authored-by: wufeisheng <wfs1997@163.com>

006bd959

[ONEDNN] Upgrade oneDNN version to v3.1 (#52463) · cfa513f7

Committed by YangQun on Jul 12, 2023

* squash pick the poc code
* fix build after rebase
* fix int8 conv and fc uts
* Fix and clean-up Get_SRC_Scale_Memory
* fix floating point fc uts
* fix test_analyzer_int8_googlenet
* test_analyzer_int8_mobilenetv1
* fix int8 mobilenet v2 and v3
* fix build error after rebase
* [oneDNN] rename library version
* fix conv bias datatype
* try to fix import error
* fix rebase error
* [oneDNN] pack library into python wheel
* add MKLDNN_SHARED_LIB_3 to env_dict
* fix test_analyzer_bert
* fix fill_constant op kernel
* fix ernie and matmul op ut
* fix softplus ut
* fix conv+relu6 fusion ut
* fix hardswish fusion
* fix quant+transpose fusion ut
* fixsgd ut
* fix int8 matmul with flatten
* fix fc+scale fusion
* fix conv/matmul+gelu fusion uts
* fix rebase error
* Revert "fix conv/matmul+gelu fusion uts"
This reverts commit 47eb5e49972bd8f7271a233def9bfb3e98ce78e1.
* upgrade to onednn v3.1
* remove older version onednn
* use densetensor::data() for achieving mean and var in layernorm impl
* comments for atol of integer tests
* fix clang-format
* Revert "remove older version onednn"
This reverts commit 783e57ddfd4401254596eae7d47adb9b03590c09.
* improve binary handle
* fix expand kernel
* Revert "use densetensor::data() for achieving mean and var in layernorm impl"
* always use forward_inference for conv
* remove activation scales
* rollback changes to mkldnn.cmake
* address comments
* port changes to dequantize kernel
* fix merge error
* fix fused_elementwise_kernel
* upgrade onednn version to v3.1.1
* fix some approval error
* fix error msg format
* remove old onednn libs
* try to fix symbolic link issue
* fix cinn test case segfault
* do not explicit link test with onednn
* remove unnecessary changes
* integrate CINN with onednn v3
* link with mkldnn project
* fix cinn build file

---------
Co-authored-by: Tomasz Socha <tomasz.socha@intel.com>
Co-authored-by: Chen, Xinyu1 <xinyu1.chen@intel.com>
Co-authored-by: tianshuo78520a <707759223@qq.com>

cfa513f7

[clang-tidy] enable `readability-container-size-empty` check (#55279) · be3a6fa7

Committed by Wang Xin on Jul 12, 2023

* [clang-tidy] enable readability-container-size-empty check

* fix test_custom_kernel Failed

* add clang-tid-10 in dockerfile

* add clang-tidy in dockerfile

* fix bug

be3a6fa7

11 Jul, 2023 3 commits
- [ROCM] reduce build log (#55097) · a1396a80
  Committed by ronnywang on Jul 11, 2023
  
  a1396a80
- Integrate rmsnorm kernel (#54998) · 97d3d6ee
  Committed by MarDino on Jul 11, 2023
```
* add rmsnorm kernel
* add static graph test
* fix round type
* use alignas to avoid msvc compile error
* remove redundant headerfile to avoid rocm compile error
* fix rocm compile not found cub
* Add document
```
  97d3d6ee
- Linear compress (#55128) · f4290a92
  Committed by FormlessUnit on Jul 11, 2023
```
* rename weight_only/llm.int8
```
  f4290a92
07 Jul, 2023 1 commit
- [fix] move exception throw out of omp parallel for loop (#55064) · 9ed8bafd
  Committed by xiaoye on Jul 07, 2023
  
  9ed8bafd

PaddlePaddle / Paddle: last synced successfully more than 1 year ago
