1. May 23, 2023 (6 commits)
  2. May 22, 2023 (25 commits)
  3. May 20, 2023 (3 commits)
  4. May 19, 2023 (6 commits)
    • [Inference] Save optimized model by pass (#53696) · fa08a514
      Committed by shentanyue
    • Improve stability of Paddle-TensorRT FP16 UT (#51554) · 645e81f0
      Committed by Frank Lin
      * Improve Readability and Overall Clarity of Logging
      
      * Adds the set_input_type API for specifying input data types
      
      * Specifying input data types
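      The entry above mentions a set_input_type helper for declaring input data types in the Paddle-TensorRT FP16 unit tests, but the log does not show its signature. The snippet below is only a hypothetical sketch of how such a helper could drive dtype selection for generated test inputs; the class and method names are illustrative assumptions, not the actual test-framework API.

      ```python
      # Hypothetical sketch: not the real Paddle-TensorRT UT API.
      import numpy as np

      class TrtFp16Case:
          def __init__(self):
              # Default to FP32 inputs; FP16 runs override this via set_input_type().
              self._input_dtype = np.float32

          def set_input_type(self, dtype):
              """Record the dtype that generated test inputs should use."""
              self._input_dtype = dtype

          def make_input(self, shape):
              # Seeded generator keeps FP32 and FP16 runs of the same test comparable.
              rng = np.random.default_rng(seed=0)
              return rng.standard_normal(shape).astype(self._input_dtype)

      case = TrtFp16Case()
      case.set_input_type(np.float16)
      print(case.make_input((1, 3, 224, 224)).dtype)  # float16
      ```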
    • [XPU] fix fallback (#53801) · 4b85e5db
      Committed by wz1qqx
    • add minimum grad composite rules (#52561) · 97690816
      Committed by warrentdrew
      * add minimum grad composite rules
      
      * add public python api
      
      * fix format
      
      * fix format
      
      * update testcase
      
      * fix testcase
      
      * fix format
      
      * fix cmakelist.txt
      
      * fix format
      
      * fix param problem
      
      * fix op and composite rule
      
      * fix bf16 cpu support problem
      
      * fix bf16 cpu issue
      
      * fix axis error log
      
      * add axis for maximum
      
      * revert commit
      
      * remove .orig
      
      * fix generic problem
      
      * revert max op
      
      * fix axis error
      
      * fix maximum axis
      
      * fix test_check_output
      
      * fix cinn
      
      * fix minimum maximum axis check
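      As a side note on what a composite gradient rule for minimum(x, y) decomposes into: the incoming gradient is routed to whichever input supplied the smaller value. The numpy sketch below illustrates only the elementwise math for equal-shaped inputs (the broadcasting/axis handling that several bullets above deal with is omitted); the function name is illustrative, not the operator registered by this PR.

      ```python
      import numpy as np

      def minimum_grad_composite(x, y, out_grad):
          # True where x supplied the minimum; ties go to x (the x <= y convention).
          x_is_min = x <= y
          grad_x = out_grad * x_is_min.astype(out_grad.dtype)
          grad_y = out_grad * (~x_is_min).astype(out_grad.dtype)
          return grad_x, grad_y

      x = np.array([1.0, 5.0, 2.0])
      y = np.array([3.0, 4.0, 2.0])
      gx, gy = minimum_grad_composite(x, y, np.ones_like(x))
      print(gx, gy)  # [1. 0. 1.] [0. 1. 0.]
      ```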
    • Add flash attention to speed up fused_gate_attention. (#52731) · d29c1f8e
      Committed by limingshu
      * Reorganize the forward codes of flash-attention.
      
      * Fix forward.
      
      * Remove some unused code.
      
      * Simplify codes and fix backward.
      
      * Change all LOG(INFO) to VLOG and fix the backward.
      
      * Add scale for AF2 flash_attn; many thanks to xreki and shaojie for debugging this code.
      
      * Decrease the effect of debug printing on performance.
      
      * Unify the initialize of flashattn arguments.
      
      * Rewrite the reshape of temp_mask and temp_bias.
      
      * Add API support for use_flash_attn.
      
      * Fix compiling error on CI.
      
      * Try to crop the flash-attention lib.
      
      * Correct the condition for whether flash-attn can be used.
      
      * Remove the softmax_out argument.
      
      * Remove is_causal.
      
      * Polish codes.
      
      * Fix qkv_transpose_out's shape and scaling of Q * K.
      
      * Update commit of flash-attention.
      
      ---------
      Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>
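      For reference, the computation that the flash-attention path in fused_gate_attention fuses is standard scaled dot-product attention, with the 1/sqrt(head_dim) scaling of Q*K^T that the bullets above mention fixing. The numpy sketch below shows only that reference math (including an optional additive bias/mask); it is not the fused kernel, and the function names are illustrative.

      ```python
      import numpy as np

      def softmax(x, axis=-1):
          x = x - x.max(axis=axis, keepdims=True)  # stabilize before exp
          e = np.exp(x)
          return e / e.sum(axis=axis, keepdims=True)

      def attention_reference(q, k, v, bias=None):
          # q, k, v: [batch, heads, seq_len, head_dim]
          scale = 1.0 / np.sqrt(q.shape[-1])
          scores = np.einsum("bhqd,bhkd->bhqk", q, k) * scale
          if bias is not None:
              scores = scores + bias  # e.g. attention mask or pair bias
          return np.einsum("bhqk,bhkd->bhqd", softmax(scores), v)

      q = k = v = np.random.randn(2, 4, 8, 16).astype(np.float32)
      print(attention_reference(q, k, v).shape)  # (2, 4, 8, 16)
      ```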