Commits · 2039115c51af202df0e845f2efbea1319a497d7a · PaddlePaddle / Paddle

05 May, 2023 2 commits
- S
  
  [XPU] Fusion of gather and assign operators to fused_mt op for reducing memory usage (#53262) · 2039115c
  Committed by shentanyue on May 05, 2023
  
  2039115c
- S
  
  [XPU] Fix the out_max of the branch in xpu_conv2d op(#53343) · d27f15ed
  Committed by sprouteer on May 05, 2023
  
  d27f15ed
04 May, 2023 1 commit
- W
  
  Fix a bug in constant folding pass (#53456) · ace61b8b
  Committed by weishengying on May 04, 2023
  
  ace61b8b
28 Apr, 2023 1 commit
- H
  
  [CINN Support 0D-Tensor] CINN hack squeeze2 with trick temporarily (#53454) · 09f8e31d
  Committed by HongyuJia on Apr 28, 2023
  
  09f8e31d
27 Apr, 2023 4 commits
- Z
  
  xpu quant weight only (#53306) · 1c97aa69
  Committed by zhupengyang on Apr 27, 2023
  
  1c97aa69
- W
  
  set sync_param default true (#53335) · 421f56a8
  Committed by wuhuachaocoding on Apr 27, 2023
  
  421f56a8
- H
  [CINN Support 0D-Tensor] CINN supports 0D-Tensor with trick temporarily (#53382) · 9ab14865
  Committed by HongyuJia on Apr 27, 2023
```
* [CINN Support 0D-Tensor] CINN supports 0D-Tensor with trick temporarily

* Add unittest
```
  9ab14865
- G
  remove some [-Wunused-parameter] warning (#53365) · 0fac3281
  Committed by Galaxy1458 on Apr 27, 2023
```
* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop
```
  0fac3281
26 Apr, 2023 1 commit

remove some [-Wunused-parameter] warning (#53319) · f9e5072b

Committed by Galaxy1458 on Apr 26, 2023

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

f9e5072b

25 Apr, 2023 3 commits
- S
  
  [XPU][BUG] Fix link_xpu_op_max_pass bug (#53258) · be1b3fc3
  Committed by sprouteer on Apr 25, 2023
  
  be1b3fc3
- W
  
  add mp_sync config. (#53254) · 503f422e
  Committed by wuhuachaocoding on Apr 25, 2023
  
  503f422e
- Y
  [PHI]Add flags macro for PHI (#52991) · 22e96bde
  Committed by YuanRisheng on Apr 25, 2023
```
* add flags for phi

* fix compile bugs

* fix ci bugs

* fix inference bugs

* fix cinn bugs

* fix cinn bugs

* refine code according to review comments

* fix ci bugs

* fix ci bugs
```
  22e96bde
24 Apr, 2023 4 commits
- N
  
  Add "enable_tensor_checker" and "disable_tensor_checker" to api list (#52936) · 41138718
  Committed by niuliling123 on Apr 24, 2023
  
  41138718
- Z
  
  transform cachekv datalayout of fused_multi_transformer_xpu (#53144) · bfa5d6b8
  Committed by zhupengyang on Apr 24, 2023
  
  bfa5d6b8
- 张
  
  rm is_npu_place (#53105) · a85e038a
  Committed by 张春乔 on Apr 24, 2023
  
  a85e038a
- G
  remove some [-Wunused-parameter] (#53185) · 834eb2ba
  Committed by Galaxy1458 on Apr 24, 2023
```
* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop
```
  834eb2ba
23 Apr, 2023 2 commits

apply gcc12 to gpups (#52960) · cbfd43e4

Committed by risemeup1 on Apr 23, 2023

* apply gcc12 to gpups

* apply gcc12 to gpups

* apply gcc12 to gpups

* apply gcc12 to gpups

* apply gcc12 to gpups

* apply gcc12 to gpups

* apply gcc12 to gpups

* apply gcc12 to gpups

* apply gcc12 to gpups

* test

* test

* apply gcc12 to gpups

* apply_gcc12_to_gpups

* fix compiler bug

* fix compiler bug

* test

* fix dangling-pointer compiler

* fix dangling-pointer compiler

* fix dangling-pointer compiler

* apply_gcc12_to_gpups

* apply gcc12 to gpups

* Update cuda_streams_py.cc

cbfd43e4

remove some [-Wunused-parameter] (#53162) · b02687cc

Committed by Galaxy1458 on Apr 23, 2023

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

* test,test=develop

b02687cc

21 Apr, 2023 4 commits

support 0-D output and 0-D as index in __getitem__/__setitem__ (#52814) · 4e939c89

Committed by JYChen on Apr 21, 2023

* support 0-D output and 0-D as index in __getitem__

* fix tests

* fix inference and UT

* add unittest for setitem

* fix xpu test

* fix xpu 0-d

4e939c89

init output 4 all backend (#53124) · c2cd02de

Committed by YuhangLi on Apr 21, 2023

c2cd02de

Fix bug of block desc. (#53163) · ba899b5c

Committed by Ghost Screaming on Apr 21, 2023

* Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result is wrong.

* Remove climits.

* Fix bug of BlockDesc::MoveFrom(). It is used to rebuild main_program_desc from a ProgramDesc modified by a Fusion Pass. Because some fused operators need to create new Variables in the modified ProgramDesc, MoveFrom uses std::move() to move these VarDesc objects into main_program_desc. As a result, the pointers held by the modified ProgramDesc become nullptr. Calling block()->Program()->proto() first calls ProgramDesc::Flush(), which may then cause a segmentation fault.

ba899b5c
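The failure mode described for BlockDesc::MoveFrom() can be modelled in a few lines. The sketch below is a toy Python model, not Paddle code: the class and method names are hypothetical, `None` stands in for a moved-from (null) pointer, and `move_from_safe` is only one possible remedy, not necessarily the fix made in this commit.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class VarDesc:
    name: str


@dataclass
class BlockDesc:
    # Each slot models an owning pointer; None models a moved-from (null) pointer.
    vars: List[Optional[VarDesc]] = field(default_factory=list)

    def move_from_buggy(self, other: "BlockDesc") -> None:
        """Take ownership of other's variables but leave null slots behind."""
        for i, var in enumerate(other.vars):
            self.vars.append(var)
            other.vars[i] = None  # the source still has a slot, now empty

    def move_from_safe(self, other: "BlockDesc") -> None:
        """Take ownership and drop the emptied slots (assumed remedy)."""
        self.vars.extend(v for v in other.vars if v is not None)
        other.vars.clear()

    def flush(self) -> List[str]:
        """Visit every slot, as a flush before serialization might."""
        return [var.name for var in self.vars]  # fails on a None slot


def flush_fails_after(move: Callable[[BlockDesc, BlockDesc], None]) -> bool:
    """Move one variable out of a fused block, then flush the source block."""
    fused, main = BlockDesc([VarDesc("fused_out")]), BlockDesc()
    move(main, fused)
    try:
        fused.flush()
    except AttributeError:  # Python's stand-in for the segmentation fault
        return True
    return False
```

In this model, flushing the source block after `move_from_buggy` raises, while flushing after `move_from_safe` does not, because no empty slot is left to dereference.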

optimize write_read_array, gather if beam_size=1 (#53130) · e8e9d6c5

Committed by zhupengyang on Apr 21, 2023

e8e9d6c5

20 Apr, 2023 4 commits
- T
  Revert "[BUG FIX]Fix performance bugs that created by PR#49116 (#52124)" (#53109) · 06ecc6d2
  Committed by tianshuo78520a on Apr 20, 2023
```
This reverts commit 543efcc5.
```
  06ecc6d2
- H
  [CustomOP error] Add attrs type check (#53030) · 195d6d0f
  Committed by HongyuJia on Apr 20, 2023
```
* [CustomOP error] Add attrs type check

* fix global variable order bug

* include unordered_set

* fix ParseAttrStr compile error
```
  195d6d0f
- H
  Register fluid kernels to phi [part 10] (#53034) · 1c12f2f3
  Committed by huangjiyi on Apr 20, 2023
```
* update

* update

* Revert "update"

* fix bug

* update
```
  1c12f2f3
- R
  
  fix ninja error (#53076) · cea6b6de
  Committed by risemeup1 on Apr 20, 2023
  
  cea6b6de
19 Apr, 2023 4 commits
- S
  Move fused_attention op to phi [Migrate XPU OpKernel] [ test=kunlun ] (#53011) · 7b56bd25
  Committed by Sonder on Apr 19, 2023
```
* trans fused attention to phi

* add optional param

* trans fused_attention_grad to phi

* add fused attention grad register info

* fix include

* test=kunlun

* add fused attention to static build list

* add remove

* update remove
```
  7b56bd25
- Y
  [BUG FIX]Fix performance bugs that created by PR#49116 (#52124) · 543efcc5
  Committed by YuanRisheng on Apr 19, 2023
```
* fix performance bugs

* fix ci bugs
```
  543efcc5
- J
  
  optimize reshape related fusion (#53066) · c29dc34e
  Committed by Jiabin Yang on Apr 19, 2023
  
  c29dc34e
- C
  
  fix_delete_repeated_ops_pass bug (#53042) · b64b8163
  Committed by csy0225 on Apr 19, 2023
  
  b64b8163
18 Apr, 2023 3 commits
- H
  register fluid kernels to phi [part 6.5] (#52882) · cb81befa
  Committed by huangjiyi on Apr 18, 2023
```
* update

* fix bug

* update

* fix bug
```
  cb81befa
- G
  
  test,test=develop (#52993) · 8b82f77e
  Committed by Galaxy1458 on Apr 18, 2023
  
  8b82f77e
- 张
  
  remove mlu(#53007) · 4d5a3ad6
  Committed by 张春乔 on Apr 18, 2023
  
  4d5a3ad6
17 Apr, 2023 4 commits

[Paddle-Inference] Add cutlass conv2d_depthwise (#51792) · bd3b096a

Committed by zhoutianzi666 on Apr 17, 2023

* initial commit for cutlass_teller

* second commit for cutlass_teller

* add conv2d_depthwise python template

* add conv2d_depthwise cutlass template

* /zhoukangkang/paddle_cutlass/Paddle/paddle/fluid/framework/ir/cutlass_teller.h

* refine code in Conv2dFusionCanSupport

* add macro in cutlass_teller.h

* add 3x3 5x5 teller

* add groups not 1 or conv2d_depthwise teller

* only generate conv2d_depthwise kernels where ic is a multiple of 8

* add EXPLICIT in cutlass_teller.h

* final commit

* add split_k_slices in conv2d_depthwise

* make stages == 2

* refactor part of the code

* add CutlassFusionType

* solve illegal memory

* make stride_h=stride_w && make dilation==1

* must check HasAttr(use_cutlass) before GetAttrIfExists

* add CONV2D_DEPTHWISE_BIAS_SILU to OpType2String

* modify decl.h and util.cu

bd3b096a
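Several bullets in this commit describe conditions under which the cutlass path is taken (3x3 and 5x5 kernels, ic a multiple of 8, stride_h equal to stride_w, dilation of 1, and checking that `use_cutlass` exists before reading it). As a rough illustration only, they can be combined into a single predicate. Everything below is a hypothetical Python sketch: the real check lives in C++ in `cutlass_teller.h`, the names `ConvDesc` and `can_use_cutlass_depthwise` are invented here, and treating "depthwise" as `groups == in_channels` is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ConvDesc:
    """Hypothetical description of a conv2d op, for illustration only."""
    in_channels: int
    groups: int
    kernel_h: int
    kernel_w: int
    stride_h: int
    stride_w: int
    dilation: int
    attrs: Dict[str, Any] = field(default_factory=dict)


def can_use_cutlass_depthwise(conv: ConvDesc) -> bool:
    """Return True only if every condition listed in the commit bullets holds."""
    # Check that the attribute exists before reading its value.
    if "use_cutlass" not in conv.attrs or not conv.attrs["use_cutlass"]:
        return False
    is_depthwise = conv.groups == conv.in_channels  # assumed definition
    return (
        is_depthwise
        and conv.in_channels % 8 == 0  # ic must be a multiple of 8
        and (conv.kernel_h, conv.kernel_w) in {(3, 3), (5, 5)}
        and conv.stride_h == conv.stride_w
        and conv.dilation == 1
    )
```

For example, a 3x3 depthwise convolution with 32 channels and `use_cutlass` set passes the check, while the same convolution with 30 channels, or with the attribute missing, does not.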

remove some [-Wunused-parameter] warning (#52924) · 337cc2ca

Committed by Galaxy1458 on Apr 17, 2023

337cc2ca

Add output defs for some kernelsPhi register (#52941) · 23f87442

Committed by Sonder on Apr 17, 2023

* add register info for eigh and eig_grad

* add sync_batch_norm_op.cu register info

* add lamb output register info

* add unique register info

* change type name

* change type name

* add output register info for check_finite_and_unscale

* update cmake and config file

* add register info for adagrad

* fix build error

* add sync to run_unittests.sh

* add register info for unique_consecutive

* fix build error

* add eigh to STATIC_BUILD_TESTS

* update eig_kernel.cc

* update eig_kernel.cc

* fix infer meta error

* fix unique register error

* fix lamb register info error

* fix lamb register info

* update lamb register info

* fix lamb

* remove one Output Register

* update static build file

* add eigh op to disable_wingpu_test

* update run_unittests

23f87442

[Dygraph] Support delaying div loss by accumulate_steps in PipelineLayer (#52848) · 0abdcff6

Committed by Haohongxiang on Apr 17, 2023

0abdcff6
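The title only names the feature, so the following is a hedged reading of it rather than a description of the actual change: with gradient accumulation, the division by `accumulate_steps` can be applied to each micro-batch before accumulating, or once after all micro-batches have been accumulated. The sketch below uses hypothetical function names and exact `Fraction` arithmetic to show that the two orderings give the same accumulated gradient.

```python
from fractions import Fraction
from typing import List


def grad_scaled_each_step(micro_grads: List[Fraction], accumulate_steps: int) -> Fraction:
    """Divide every micro-batch gradient before adding it to the total."""
    total = Fraction(0)
    for grad in micro_grads:
        total += grad / accumulate_steps
    return total


def grad_scaled_once(micro_grads: List[Fraction], accumulate_steps: int) -> Fraction:
    """Accumulate unscaled gradients, then divide a single time at the end."""
    total = Fraction(0)
    for grad in micro_grads:
        total += grad
    return total / accumulate_steps
```

Because division distributes over the sum, both functions return the same value; delaying the division replaces one division per micro-batch with a single division at the end.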

14 Apr, 2023 3 commits

delete SupportNPU(), SupportMLU() (#52911) · 8601859e
Committed by jjyaoao on Apr 14, 2023
```
* delete SupportNPU(), SupportMLU()

* delete npu branch
```
8601859e

1. modify set_value op, use Scalars to represent attr `values`, instead of a... · dd2a749a

Committed by Feiyu Chan on Apr 14, 2023

1. modify set_value op, use Scalars to represent attr `values`, instead of a bunch of attributes of various types; (#52408)

2. add program converter and set_value op as an example, which provides the functionality to convert `paddle::framework::ProgramDesc` between old and new formats (the differences are mainly some operators with incompatible updates in the definition);
3. program version and operator version map are now always saved when serializing `paddle::framework::ProgramDesc` to identify the version;
4. provide an option `legacy_format=false` in serialization of `paddle::framework::ProgramDesc`; it decides whether to convert the ProgramDesc back to a legacy format that paddle 2.4.2 or earlier versions can load and execute;
5. deserialization of `paddle::framework::ProgramDesc` now automatically detects whether the bytes it receives are in legacy format (contains any of the operators that have been incompatibly updated and have any attribute of type `Scalar`) and converts them to the new format. If you want a faithful deserialization without the automatic conversion, you can use protobuf's deserialization instead. It is not recommended, but it can be used for testing.

dd2a749a
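The conversion scheme in the list above can be sketched with a toy model. This is not Paddle's API: the program is a plain list of dicts serialized as JSON rather than a protobuf `ProgramDesc`, the function names are hypothetical, the legacy attribute names in `LEGACY_ATTRS` are assumed for illustration, and legacy detection here simply checks for a `set_value` op without a `values` attribute.

```python
import json
from typing import Any, Dict, List

# Assumed legacy layout: one typed list per scalar type instead of a single `values`.
LEGACY_ATTRS = {bool: "bool_values", int: "int64_values", float: "fp32_values"}


def to_legacy(op: Dict[str, Any]) -> Dict[str, Any]:
    """Split the unified `values` attribute into typed legacy attributes."""
    attrs = {k: v for k, v in op["attrs"].items() if k != "values"}
    for name in LEGACY_ATTRS.values():
        attrs[name] = []
    values = op["attrs"].get("values", [])
    if values:
        attrs[LEGACY_ATTRS[type(values[0])]] = list(values)
    return {"type": op["type"], "attrs": attrs}


def to_new(op: Dict[str, Any]) -> Dict[str, Any]:
    """Merge the typed legacy attributes back into a single `values` list."""
    legacy_names = set(LEGACY_ATTRS.values())
    attrs = {k: v for k, v in op["attrs"].items() if k not in legacy_names}
    attrs["values"] = next(
        (op["attrs"][n] for n in LEGACY_ATTRS.values() if op["attrs"].get(n)), []
    )
    return {"type": op["type"], "attrs": attrs}


def is_legacy(op: Dict[str, Any]) -> bool:
    """Toy detection rule: an updated op type that lacks the new attribute."""
    return op["type"] == "set_value" and "values" not in op["attrs"]


def serialize(program: List[Dict[str, Any]], legacy_format: bool = False) -> bytes:
    """Serialize, optionally converting updated ops back to the legacy layout."""
    ops = [
        to_legacy(op) if legacy_format and op["type"] == "set_value" else op
        for op in program
    ]
    return json.dumps({"ops": ops}).encode()


def deserialize(data: bytes) -> List[Dict[str, Any]]:
    """Deserialize and convert any legacy ops to the new layout automatically."""
    ops = json.loads(data.decode())["ops"]
    return [to_new(op) if is_legacy(op) else op for op in ops]
```

In this model a program round-trips to the same in-memory form whether or not `legacy_format` is set, because `deserialize` converts legacy ops on load. A reader that wants the bytes exactly as written would bypass `deserialize` and parse them directly, mirroring the note about using protobuf's own deserialization.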

delete cast if lookup_table_v2 support fp16; delete repeated ops (#52888) · 7aafeb45

Committed by zhupengyang on Apr 14, 2023

7aafeb45

PaddlePaddle / Paddle synced successfully about 1 year ago