Commits · d7d9807e8dd45ca11da43c6d0bfd7b84819465b4 · PaddlePaddle / Paddle

02 Sep, 2022: 2 commits
- enable add passes pre-calculating scales/quantizing weights (#44680) · cdb36da4
  Committed by Sylwester Fraczek on Sep 02, 2022
- padding the length of input for vit_attention (#45506) · f79be656
  Committed by feng_shuai on Sep 02, 2022
```
* vit_384_opt

* just support trt8

* padding + unpadding

* fix:unit test

* refactor:padding

* fix: change the position of round_up

* refactor: delete workspace
```
30 Aug, 2022: 1 commit
- [Paddle-TRT] constant-folding (#45494) · 97f43a8e
  Committed by zhoutianzi666 on Aug 30, 2022
```
add constant folding pass; for some models it will get less latency.
```
29 Aug, 2022: 1 commit
- TensorRT Engine context memory bind with predictor id (#45468) · 02621079
  Committed by Yuanle Liu on Aug 29, 2022
26 Aug, 2022: 2 commits
- Layernorm shape bugfix (#45431) · 3ca8cf44
  Committed by Wang Bojun on Aug 26, 2022
```
* fix bug fix

* add shape size check

* polish code

* multi -1 shape fix

* code style improve

* bug fix

* code style fix
```
- fix_multihead (#45429) · fa06d9c3
  Committed by Wangzheee on Aug 26, 2022
25 Aug, 2022: 2 commits
- fix params sync multi times problem (#45406) · 20d38664
  Committed by Wilber on Aug 25, 2022
- enforce_reshape (#45386) · 0bf40070
  Committed by zhoutianzi666 on Aug 25, 2022
24 Aug, 2022: 3 commits
- fix mean/variance shape infer bug during loop call of dynamic trt enqueue (#45387) · 4e3f0b95
  Committed by Wang Bojun on Aug 24, 2022
```
* fix bug fix
```
- fix op_teller with_dynamic_shape judge bug (#45384) · 9e0baf6e
  Committed by Yuanle Liu on Aug 24, 2022
- fix convert weight failed. (#45346) · 3d514e48
  Committed by Wilber on Aug 24, 2022
22 Aug, 2022: 4 commits
- Add int8 support for matmul+elementwise_add fuse pass (#45077) · 9e5f3a38
  Committed by joanna.wozna.intel on Aug 22, 2022
```
* Add int8 support for matmul+elementwise_add fuse

* Corrections after review and ernie test fix
```
- Extend conv_concat_relu to support all activations (#45089) · d03ef054
  Committed by Sławomir Siwek on Aug 22, 2022
```
* merge conv_concat_relu to conv_act

* fix typo

* extend unit test

* reuse existing gpd

* codestyle

* enforce mkldnn conv
```
- [Paddle-TRT] support output_padding in conv2d_transpose and conv3d_transpose (#45004) · 25d25b00
  Committed by zhoutianzi666 on Aug 22, 2022
- remove trt_skip_layernorm_fuse_pass from gpu passes (#45293) · 25d58db6
  Committed by Yuanle Liu on Aug 22, 2022
19 Aug, 2022: 2 commits
- fix layernormTrt meanVar alloc bug (#45255) · 6fb34e74
  Committed by Wang Bojun on Aug 19, 2022
```
* fix layernormTrt meanVar alloc bug
```
- Trt groupnorm dynamic plugin (#44911) · 1aa6adb1
  Committed by Wang Bojun on Aug 19, 2022
```
* add group_norm dynamic plugin
```
18 Aug, 2022: 2 commits
- [inference]predictor add GetInputType interface (#45143) · a8ae87f1
  Committed by heliqi on Aug 18, 2022

* predictor add GetInputType interface

* predictor change GetInputType to GetInputTypes

* predictor add tester

* predictor add tester

* predictor change GetInputType to GetInputTypes

* predictor change GetInputType to GetInputTypes

* predictor add tester

- fix infer tans scope (#45203) · 2d0bb2c3
  Committed by JingZhuangzhuang on Aug 18, 2022

* fix infer trans scope

* fix infer trans scope

* fix infer trans scope

* fix infer trans scope
Co-authored-by: dingjiawei <327396238@qq.com>

16 Aug, 2022: 2 commits
- convert multihead to oss (#45019) · f706d95d
  Committed by feng_shuai on Aug 16, 2022

* convert multihead to oss

* fix:bug

* fix:delete const cast

* fix:don't support bias_qk

* add vit pass

* fix:convert bug and add preln_residual_bias

* support length=-1

* add UT for convert

* add no_bias_qk support for gpu_multihead_op

* delete infer_shape depends on bias_qk

* oss just can be used in T4 and A*

* fix:change api for ROCM CI

- memoptim and fp16 mixed precision (#45132) · fa890092
  Committed by Wilber on Aug 16, 2022

15 Aug, 2022: 3 commits
- fused_embedding_eltwise_layernorm_op and skip_layernorm_op support fp16 (#44969) · ac0553a0
  Committed by Yuanle Liu on Aug 15, 2022
- Refine TRT unit test (#45102) · 3512bf11
  Committed by zlsh80826 on Aug 15, 2022

* Reduce pool2d test configuration

* Reduce depthwise_conv2d test configuration

* Reduce trt_convert_conv2d_fusion test configuration

* Reduce trt_convert_conv2d test configuration

* Reduce trt_convert_conv2d_transpose test configuration

* Reduce trt_convert_hard_swish test configuration

* Enhance trt auto scan test error message and mechanism

* Increase FP16 trt ut tolerance

- convert_fp16 support multi block (#45050) · 9aecf286
  Committed by Wilber on Aug 15, 2022
```
* convert_fp16 support multi block

* update

* update
```
14 Aug, 2022: 1 commit
- Revert "[Paddle Inference] Support cuda_graph. (#44878)" (#45115) · b0e7681f
  Committed by xiaoxiaohehe001 on Aug 14, 2022
```
This reverts commit 84bf5c31.
```
12 Aug, 2022: 1 commit
- trt engine input data type should be consistent with trt input bindin… (#45103) · a3eb341e
  Committed by Yuanle Liu on Aug 12, 2022
```
* trt engine input data type should be consistent with trt input bindings type

* fix some bugs

* fix some bugs

* fix some bugs
```
11 Aug, 2022: 1 commit
- Change bias to persistable in preln_residual_bias_fuse_pass (#45037) · 26c573de
  Committed by whs on Aug 11, 2022
10 Aug, 2022: 2 commits
- [Paddle Inference]Disable skip layernorm half (#45047) · 4805da50
  Committed by Wangzheee on Aug 10, 2022
```
* disable_skip_layernorm_fp16
```
- [Paddle Inference] Support cuda_graph. (#44878) · 84bf5c31
  Committed by xiaoxiaohehe001 on Aug 10, 2022
```
* cuda_graph

* cuda_graph_

* cuda_graph_

* cuda_graph_
```
09 Aug, 2022: 1 commit
- fix format for paddle/phi/api/lib/tensor.cc (#44972) · b54abbe8
  Committed by Allen Guo on Aug 09, 2022
08 Aug, 2022: 1 commit
- clean includes of tensor.h (#44928) · ee9ea48d
  Committed by Leo Chen on Aug 08, 2022
```
* clean tensor.h

* fix gather_nd
```
05 Aug, 2022: 2 commits
- Merge matmul_v1 and matmul_v2 fuse passes (#44870) · d0cf9d9d
  Committed by Sławomir Siwek on Aug 05, 2022

* remove v2_transpose_reshape

* matmul_transpose_reshape

* reshape_transpose_matmul

* restore ut

* adjust old ut

* restore parallel UT rules

* feedback from review

- update trt workspace size param (#44469) · bdce552b
  Committed by Zhang Jun on Aug 05, 2022

* update trt workspace size param

* update

* update

* update

* use int64_t

* use int64_t

* update

* update

04 Aug, 2022: 4 commits
- Matmuls with activation and elementwise_add fuses (#44655) · 0420d514
  Committed by Sławomir Siwek on Aug 04, 2022

* Add unit tests

* matmul_v2 + activation

* matmuls + elementwise_add

* matmul_v2 postops

* transform matmul to v2

* opcompat

* fix fusing matmul with multiple outs

* add shape constraints

* remove unused vars

* change pass order

* - Unit tests to be debugged

- fix

- refactor

- diagnostic

- more diagnostic

- fix

- Fix number two

- fix

- fix

- fix

- alpha added

- more fixes

- compilation fix

- removed diagnostic code

- cosmetic fixes

* lint

* add alpha constraint

* merge matmul refactor

* trigger CI

* - fix

* - another fix

* code style

* add support for matmul+elementwise_add+activation

* code style

* fix bfloat16 bugs

* change append_binary to append_sum
Co-authored-by: Jacek Czaja <jacek.czaja@intel.com>

- [Paddle-TRT] add Rnn (#44678) · ffc8defa
  Committed by zhoutianzi666 on Aug 04, 2022
```
* add rnn
```
- fix bug (#44875) · c693a027
  Committed by ccrrong on Aug 04, 2022
- convert support multi block. (#44866) · b4a4eef2
  Committed by Wilber on Aug 04, 2022
```
* convert support multi block.

* update
```
03 Aug, 2022: 1 commit
- remove stack plugin (#44756) · a9f3719b
  Committed by zhoutianzi666 on Aug 03, 2022
02 Aug, 2022: 1 commit
- Multihead matmul fp16 (#44792) · 0fd8ee63
  Committed by Wilber on Aug 02, 2022

* multihead matmul add fp16

* fix windows error

* fix rocm error

* fix rocm error

01 Aug, 2022: 1 commit
- unify gpu context (#44740) · 86763023
  Committed by Leo Chen on Aug 01, 2022

* remove cudaDeviceContext

* remove more template

* fix rocm compile

* remove alias name CUDADeviceContext

* fix compile

* fix tests

* revert changes
