Commits · 7d138402a57d27cd12c75f92953702f87b07894b · PaddlePaddle / Paddle

07 Mar, 2023 1 commit

[OpTest]add only_check_prim parameter in check grad (#51210) · 017452e9

Committed by Charles-hit on Mar 07, 2023

* support elementwise_pow bfloat16

* add only_check_prim parameters in check_grad

* modify unit test

* fix floor test

* fix sigmoid bfloat16 test

017452e9

06 Mar, 2023 1 commit

[phi decoupling] decouple dependency to device_context in phi (Part 1) (#50865) · a1006b2b

Committed by Huang Jiyi on Mar 06, 2023

* move DeviceContextPool to phi

* add EmplaceExternalContextFunc

* update namespace

* update cmake

* fix bugs and create context_pool_impl.h

* replace platform::is_xxx_place

* fix bugs

* update generator

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* fix enforce usage

* Revert "fix enforce usage"

This reverts commit 5f521f08a69713cee506e64a00ec6d9fba709e27.

* fix bugs

* rm XPUDeviceContext and CustomDeviceContext

* fix bugs

* fix fix context init bug

* fix bugs after merge

* fix bugs

* fix name

* fix mutable_data

* update and fix bugs

* fix bugs

* update

* fix bugs

* fix name

* fix bugs

* merge

* fix bugs

* create context_pool in phi/backends

* create context_pool in phi/backends

* fix bugs

* fix xpu bugs

* fix rocm bugs

* fix bugs

* fix bugs

* fix bugs

* fix xpu bugs

* update

* update

* fix bugs

* fix bugs

a1006b2b

03 Mar, 2023 2 commits

【Hackathon No.70】[PHI decoupling] move jit kernels from fluid to phi (#50911) · 2d36c9a9

Committed by gouzil on Mar 03, 2023

* [phi] move jit kernels from fluid to phi

* [phi] fix paddle::phi err

* [phi] fix windows 'posix_memalign': identifier not found

* [phi] fix windows 'posix_memalign_free': identifier not found

* [phi] fix readme directory structure, fc_functor  paddle::platform

2d36c9a9

[PHI Decoupling]Remove memory header (Part2) (#50870) · 558068cc

Committed by YuanRisheng on Mar 03, 2023

* decouple memory copy

* fix ci bugs

* fix ci compile bugs

* fix rocm compile

* fix ci bugs

558068cc

02 Mar, 2023 2 commits

Cache for cublaslt descriptor (#50931) · 819f8939

Committed by limingshu on Mar 02, 2023

* first commit

* finish base work

* modification for good

* fix for cache setting and gather the algo and desc as one data for cache storage

* fix for cache setting and gather the algo and desc as one data for cache storage

* install pre-commit check

819f8939

[AMP OP&Test] register fp16 and bf16 kernel for uniform_random (#50993) · 72f34450

Committed by Leo Chen on Mar 02, 2023

* register fp16 and bf16 kernel for uniform_random

* fix compile

* support selected_rows

* add ut

* revert cpu

* fp16 test skip cpu

72f34450

01 Mar, 2023 1 commit

[XPU] Add kernels for VITDET (#50992) · 798b527c

Committed by duanyanhui on Mar 01, 2023

* add support of int64 add for xpu

* add transpose support for int64

* add randperm kernel

* fix randperm

* add distribute_fpn_proposal kernel

* fix comment

* add reduce_sum_int32

798b527c

28 Feb, 2023 1 commit

【Hackathon No.69】[PHI decoupling] move device_wrapper from fluid to phi (#50749) · 7b6d5ac0

Committed by gouzil on Feb 28, 2023

* [phi] move device_wrapper from fluid to phi

* [phi] fix ‘PADDLE_ENFORCE_XDNN_SUCCESS’ was not declared in this scope

7b6d5ac0

27 Feb, 2023 2 commits

Reduce redundant cpu computation in slice compute (#50348) · 8aec0580

Committed by Bo Zhang on Feb 27, 2023

* conflict

* add UpdateSliceAttrs

8aec0580

Add PADDLE_THROW in ToCudaDataType and polish codes. (#50922) · 2eeaaa7d

Committed by Yiqun Liu on Feb 27, 2023

2eeaaa7d

26 Feb, 2023 1 commit

Matmul performance optimization with cuBlasLt (#46431) · d4217fc6

Committed by limingshu on Feb 26, 2023


* implement of matmul using cublasLt instead of cublas

* Update matmul_kernel_impl_via_blasLt.h

---------
Co-authored-by: zhangbopd <1299246947@qq.com>
Co-authored-by: Bo Zhang <105368690+zhangbopd@users.noreply.github.com>
Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>

d4217fc6

22 Feb, 2023 1 commit

Fix some typos. (#50429) · 93b2bf4b

Committed by Shuangchi He on Feb 22, 2023

* Fix some typos.
Signed-off-by: Yulv-git <yulvchi@qq.com>

* pre-commit
Signed-off-by: Yulv-git <yulvchi@qq.com>

---------
Signed-off-by: Yulv-git <yulvchi@qq.com>

93b2bf4b

21 Feb, 2023 2 commits

[PHI Decoupling]Remove memory header (Part1) (#50419) · 1cfcb71d

Committed by YuanRisheng on Feb 21, 2023

* decouple_memory

* perfect memory utils

* fix ci bugs

* fix inference bugs

* fix custom test bugs

* fix converage bugs

* modify code according comment

* modify namespace

* deal with compile bugs

1cfcb71d

[phi decoupling] move sequence_padding from fluid to phi (#50639) · 5f443601

Committed by Huang Jiyi on Feb 21, 2023

* move sequence_padding to phi

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* fix buga

* fix bugs

* revert and update phi::XPUContext

5f443601

20 Feb, 2023 1 commit

[PHI decoupling] remove reference to fluid/framework/tensor.h in phi (#50475) · 1c8e15c9

Committed by RedContritio on Feb 20, 2023

1c8e15c9

17 Feb, 2023 2 commits

[phi decoupling] move platform/transform to phi (#50498) · fe332794

Committed by Huang Jiyi on Feb 17, 2023

* move platform::transform to phi

* fix bugs

* move transform_test to phi

* fix cmake

* update namespace

* fix cmake

fe332794

[phi decoupling] clean TensorCopy usage in phi (#50538) · b5da73c5

Committed by Huang Jiyi on Feb 17, 2023

* rm framework::tensor_util in phi

* clean TensoCopy

* fix bugs

* fix bugs

* fix bugs

* repalce mutable_data

* revert custom_device_test.cc

b5da73c5

16 Feb, 2023 2 commits

[Phi decouple] move layer_norm_kernel.cu.h to phi (#50506) · 8910bb4a

Committed by Huang Jiyi on Feb 16, 2023

* move layer_norm_kernel.cu.h to phi

* fix bugs

* fix namespace

* fix bugs

* fix CI-Windwos

* replace mutable_data

* fix bugs

* fix bugs

8910bb4a

[phi decoupling] remove variable.h in phi (#50407) · 905cefd4

Committed by Huang Jiyi on Feb 16, 2023

* move variable_utils from phi_api_utils to fluid

* fix coment

* update include

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* update

* update

* fix CI-Windows-OpenBLAS

* fix bugs

* fix bugs

* fix bugs

* update include

* move variable_utils to phi_utils

* fix namespace

905cefd4

15 Feb, 2023 1 commit

[PHI Decoupling]Remove Profiler header (Part2) (#50183) · 8fabca11

Committed by YuanRisheng on Feb 15, 2023

* move profiler

* add file

* fix mac compile bugs

* fix ci bugs

* fix mac bugs

* fix ci bugs

* fix compile bugs

* perfect code according comment

8fabca11

14 Feb, 2023 2 commits

decouple tensor_utils (#50264) · 057cdb95

Committed by engineer1109 on Feb 14, 2023

fix X

remove TensorCopy

codestyle

add fluid memory header

fix symbol

fix cmake

fix cmake

fix context

fix header

fix place

fix context

fix context

fix context

fix code

fix custom context

fix custom context

fix copy

fix data_transform

fix style

remove changes of custom

fix scalar

057cdb95

Decrease usage of GetVecSize for optimizing host computation efficiency (#50353) · 976606fe

Committed by limingshu on Feb 14, 2023

* first commit.

* a little changes

* add some changes for get vec_size efficiently

* fix bugs

---------
Co-authored-by: zhangbopd <1299246947@qq.com>

976606fe

10 Feb, 2023 1 commit

Fix UFA illegal address access of case2: paddle.scatter (#50025) · fb228c4a

Committed by RedContritio on Feb 10, 2023

* add dim check in scatter

* add check in scatter.cu

* add unittest

* remove unnecessary log and comment

---------

Co-authored-by: RedContritio <>

fb228c4a

09 Feb, 2023 2 commits

[PHI decoupling] move strided_memcpy.h to phi (#50346) · 17318c1a

Committed by Huang Jiyi on Feb 09, 2023

* decouple strided_memcpy

* move strided_memcpy

* move strided_memcpy to phi

* fix namespace

* update

* fix gpu compile bugs

17318c1a

Add MultiTenosrAdam OP (#49220) · 10654c77

Committed by yuehuayingxueluo on Feb 09, 2023

* add multi_tenosr_adam

* update multi_tensor_base.py, test_multi_tensor_adam.py, adamw.py

* fix adam.py optimizer.py

* fix adamw.py

* fix test_multi_tensor_adam.py

* fix CI bug

* fix CI coverage

* fix ci bug

* fix betapow

* fix some bugs

* fix test_adamw_op.py

* fix CI coverage

* fix multi_tensor_adam_kernel.cc

* fix CI bug

* fix multi_tensor_adam_op.cc and test_multi_tensor_adam.py

* fix code style

* update C++ parts

* remove python parts modification temporarily

* add C++ ut

* update betapow copy code logic

* fix ci ut

* fix windows ci

* fix coverage ci

* improve coverage rate

---------
Co-authored-by: sneaxiy <sneaxiy@126.com>

10654c77

08 Feb, 2023 1 commit

move mixed_vector (#50282) · 35d7d1f0

Committed by Huang Jiyi on Feb 08, 2023

35d7d1f0

07 Feb, 2023 1 commit

Fix gather, scatter op 0d tenor GPU error. (#50271) · 05c9c0a5

Committed by Yuang Liu on Feb 07, 2023

05c9c0a5

03 Feb, 2023 1 commit

Fix div 0 error of case20: paddle.min (#50013) · 50c43dd3

Committed by RedContritio on Feb 03, 2023

50c43dd3

02 Feb, 2023 2 commits

Fix div 0 error of case10: paddle.nn.functional.max_pool2d/max_pool3d (#50012) · 1451fa51

Committed by RedContritio on Feb 02, 2023

* add stride check for PoolOutputSize

* add unittest

1451fa51

[BugFix]Fix bugs when compile with OneDNN (#50096) · 3c557e2f

Committed by YuanRisheng on Feb 02, 2023

* fix bugs

* fix ci bugs

3c557e2f

01 Feb, 2023 3 commits

Fix div 0 error of case11: paddle.nn.functional.max_pool1d/max_pool2d/max_pool3d (#50010) · 3ab6faa8

Committed by RedContritio on Feb 01, 2023

* add stride check for MaxPool

* add unittests

3ab6faa8
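
The stride checks added for the pooling ops in #50012 and #50010 guard a division by the stride when the pooled output size is computed. A minimal sketch of that kind of guard follows, assuming the standard pooling output-size formula with symmetric padding. The function name `pool_output_size`, its parameters, and the error messages are hypothetical and are not taken from the Paddle source.

```python
def pool_output_size(input_size, filter_size, padding, stride, ceil_mode=False):
    """Output length of one pooled dimension (illustrative helper, not Paddle's API)."""
    # Validate the stride before it is used as a divisor.
    if stride <= 0:
        raise ValueError(f"stride must be greater than 0, but received {stride}")

    # Extent the window can slide over, with the same padding on both sides.
    span = input_size - filter_size + 2 * padding
    if ceil_mode:
        output_size = (span + stride - 1) // stride + 1
    else:
        output_size = span // stride + 1

    # A non-positive result means the window does not fit the padded input.
    if output_size <= 0:
        raise ValueError(
            f"computed output size {output_size} is not positive; "
            "check input size, filter size, padding and stride"
        )
    return output_size


if __name__ == "__main__":
    print(pool_output_size(32, 2, 0, 2))  # 16
```

Rejecting a zero stride up front turns a hardware-level division fault into an ordinary, catchable error with a readable message.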

Combination of multiple paddle::memory::allocate operation into one for ops (#49126) · bdae5481

Committed by limingshu on Feb 01, 2023

* A leap of try for cudaLaunchCooperativeKernel

* fix bugs

* Totally replace the lar cuda kernel

* Fix bugs

* fix code according to comments

* fix codes according to  review comments

* adding some function overload

* relocate the power operation.

* add bf16 support for index select relevant ops

* revert bf16 type change.

* add changes for more op

* fix code writting bugs

bdae5481

H2D data transfer optimization for split kernel (#49086) · 057ba778

Committed by limingshu on Feb 01, 2023

* profile reduce kernel for fp16 and reduceHigherdim

* use reinterpret_cast

* fix for CI on ROCm

* add Macro for ROCm

* ROCm CI config

* ROCm CI config

* unit test repair

* pull

* add common_funcs.h

* reduceType

* Update reduce_function.h

* not higher

* rename

* implement of matmul using cublasLt instead of cublas

* cublasLt bugfix

* Update matmul_kernel_impl.h

* Update matmul_kernel_impl_via_blasLt.h

* for-loop-algo

* PR comments changes

* add macro

* ci unused variable isCublasLt

* ci unused variable isCublasLt macro

* split matmul to autotune

* rewrite the split kernel with segmented_array

* rewrite the split kernel with segmented_array

* rewrite the split kernel with segmented_array

* add some method for cuda_graph

* fix bugs for rocm

* change for ci-error

* i dont know why ci-model-benchmark gives a shit error, so i recover codes with original one to see if original codes work.

* add some changes for passing mode_benchmark and coverage ci

* fix ci error

* fix ci-rocm error

* add some changes for header

---------
Co-authored-by: zhangbopd <1299246947@qq.com>
Co-authored-by: Bo Zhang <105368690+zhangbopd@users.noreply.github.com>

057ba778

31 Jan, 2023 5 commits

optimize 2D sync_batch_norm (#49663) · 9a4acfee

Committed by zhangkaihuo on Jan 31, 2023

9a4acfee

fix div 0 error in floormod (#49997) · 26bdea0f

Committed by 张春乔 on Jan 31, 2023

* fix mod 0 error

* fix div 0 error in floormod

26bdea0f

support 0d tensor for interpolate (#49929) · 2e156ac8

Committed by xiaoting on Jan 31, 2023

* support 0d tensor for interpolate

* support 0d tensor for interpolate

* add xpu unittest for interp

* update unittest for interpolate

* fix coverage

* fix code style

* fix for coverage

* fix coverage

2e156ac8

fix div 0 error in conv1_transpose (#50000) · 1755a154

Committed by 张春乔 on Jan 31, 2023

1755a154

Unify the gpu implementation of stack and unstack to reuse the optimization. (#49748) · 3586e856

Committed by Yiqun Liu on Jan 31, 2023

* Unify the gpu implementation of stack and unstack to reuse the optimization.

* Optimize the cuda implementation of unstack.

* Use GpuMemcpyAsync instead of memory::Copy.

* Fix error of calculating the index.

* Use FastDivMod to further imporve the performance of unstack.

3586e856

30 Jan, 2023 1 commit

add phi tensor vector array api from fluid (#49885) · 094e3b8c

Committed by engineer1109 on Jan 30, 2023

replace all TensorFromVector & TensorToVector

AssignKernel async copy

094e3b8c

18 Jan, 2023 1 commit

Add align check for Concat Kernel (#49761) · 24379442

Committed by MarDino on Jan 18, 2023

* add align check

* refine

24379442

PaddlePaddle / Paddle synced successfully about 1 year ago
