Commits · d03ef0541959391e7414d6d8780f6248383fef18 · 机器未来 / Paddle

22 Aug, 2022 · 3 commits
- Extend conv_concat_relu to support all activations (#45089) · d03ef054
  Committed by Sławomir Siwek on Aug 22, 2022
```
* merge conv_concat_relu to conv_act

* fix typo

* extend unit test

* reuse existing gpd

* codestyle

* enforce mkldnn conv
```
  d03ef054
- [Paddle-TRT] support output_padding in conv2d_transpose and conv3d_transpose (#45004) · 25d25b00
  Committed by zhoutianzi666 on Aug 22, 2022
  
  25d25b00
- remove trt_skip_layernorm_fuse_pass from gpu passes (#45293) · 25d58db6
  Committed by Yuanle Liu on Aug 22, 2022
  
  25d58db6
19 Aug, 2022 · 2 commits
- fix layernormTrt meanVar alloc bug (#45255) · 6fb34e74
  Committed by Wang Bojun on Aug 19, 2022
```
* fix layernormTrt meanVar alloc bug
```
  6fb34e74
- Trt groupnorm dynamic plugin (#44911) · 1aa6adb1
  Committed by Wang Bojun on Aug 19, 2022
```
* add group_norm dyanmic plugin
```
  1aa6adb1
18 Aug, 2022 · 2 commits

[inference]predictor add GetInputType interface (#45143) · a8ae87f1

Committed by heliqi on Aug 18, 2022

* predictor add GetInputType interface

* predictor change GetInputType to GetInputTypes

* predictor add tester

* predictor add tester

* predictor change GetInputType to GetInputTypes

* predictor change GetInputType to GetInputTypes

* predictor add tester

a8ae87f1

fix infer tans scope (#45203) · 2d0bb2c3

Committed by JingZhuangzhuang on Aug 18, 2022

* fix infer tans scop

* fix infer trans scope

* fic infer trans scope

* fic infer trans scope

Co-authored-by: dingjiawei <327396238@qq.com>

2d0bb2c3

16 Aug, 2022 · 2 commits

convert multihead to oss (#45019) · f706d95d

Committed by feng_shuai on Aug 16, 2022

* convert multihead to oss

* fix:bug

* fix:delete const cast

* fix:don't support bias_qk

* add vit pass

* fix:convert bug and add preln_residual_bias

* support length=-1

* add UT for convert

* add no_bias_qk support for gpu_multihead_op

* delete infer_shape depends on bias_qk

* oss just can be used in T4 and A*

* fix:change api for ROCM CI

f706d95d

memoptim and fp16 mixed precision (#45132) · fa890092

Committed by Wilber on Aug 16, 2022

fa890092

15 Aug, 2022 · 3 commits

fused_embedding_eltwise_layernorm_op and skip_layernorm_op support fp16 (#44969) · ac0553a0

Committed by Yuanle Liu on Aug 15, 2022

ac0553a0

Refine TRT unit test (#45102) · 3512bf11

Committed by zlsh80826 on Aug 15, 2022

* Reduce pool2d test configuration

* Reduce depthwise_conv2d test configuration

* Reduce trt_convert_conv2d_fusion test configuration

* Reduce trt_convert_conv2d test configuration

* Reduce trt_convert_conv2d_transpose test configuration

* Reduce trt_convert_hard_swish test configuration

* Enhance trt auto scan test error message and mechanism

* Increase FP16 trt ut tolerance

3512bf11

convert_fp16 support multi block (#45050) · 9aecf286

Committed by Wilber on Aug 15, 2022
```
* convert_fp16 support multi block

* update

* update
```
9aecf286

14 Aug, 2022 · 1 commit
- Revert "[Paddle Inference] Support cuda_graph. (#44878)" (#45115) · b0e7681f
  Committed by xiaoxiaohehe001 on Aug 14, 2022
```
This reverts commit 84bf5c31.
```
  b0e7681f
12 Aug, 2022 · 1 commit
- trt engine input data type should be consistent with trt input bindin… (#45103) · a3eb341e
  Committed by Yuanle Liu on Aug 12, 2022
```
* trt engine input data type should be consistent with trt input bindings type

* fix some bugs

* fix some bugs

* fix some bugs
```
  a3eb341e
11 Aug, 2022 · 1 commit
- Change bias to persistable in preln_residual_bias_fuse_pass (#45037) · 26c573de
  Committed by whs on Aug 11, 2022
  
  26c573de
10 Aug, 2022 · 2 commits
- [Paddle Inference]Disable skip layernorm half (#45047) · 4805da50
  Committed by Wangzheee on Aug 10, 2022
```
* disable_skip_layernorm_fp16
```
  4805da50
- [Paddle Inference] Support cuda_graph. (#44878) · 84bf5c31
  Committed by xiaoxiaohehe001 on Aug 10, 2022
```
* cuda_graph

* cuda_graph_

* cuda_graph_

* cuda_graph_
```
  84bf5c31
09 Aug, 2022 · 1 commit
- fix format for paddle/phi/api/lib/tensor.cc (#44972) · b54abbe8
  Committed by Allen Guo on Aug 09, 2022
  
  b54abbe8
08 Aug, 2022 · 1 commit
- clean includes of tensor.h (#44928) · ee9ea48d
  Committed by Leo Chen on Aug 08, 2022
```
* clean tensor.h

* fix gather_nd
```
  ee9ea48d
05 Aug, 2022 · 2 commits

Merge matmul_v1 and matmul_v2 fuse passes (#44870) · d0cf9d9d

Committed by Sławomir Siwek on Aug 05, 2022

* remove v2_transpose_reshape

* matmul_transpose_reshape

* reshape_transpose_matmul

* restore ut

* adjust old ut

* restore parallel UT ruels

* feedback from review

d0cf9d9d

update trt workspace size param (#44469) · bdce552b

Committed by Zhang Jun on Aug 05, 2022

* update trt workspace size param

* update

* update

* update

* use int64_t

* use int64_t

* upate

* update

bdce552b

04 Aug, 2022 · 4 commits

Matmuls with activation and elementwise_add fuses (#44655) · 0420d514

Committed by Sławomir Siwek on Aug 04, 2022

* Add unit tests

* matmul_v2 + activation

* matmuls + elementwise_add

* matmul_v2 postops

* transform matmul to v2

* opcompat

* fix fusing matmul with multipe outs

* add shape constraints

* remove unused vars

* change pass order

* - Unit tests to be debugged

- fix

- refactor

- diagnostic

- more diagnostic

- fix

- Fix number two

- fix

- fix

- fix

- alpha added

- more fixes

- compilation fix

- removed diagnostic code

- cosmetic fixes

* lint

* add alpha constraint

* merge matmul refactor

* trigger CI

* - fix

* - another fix

* code style

* add support for matmul+elementwise_add+activation

* code style

* fix bfloat16 bugs

* change append_binary to append_sum

Co-authored-by: Jacek Czaja <jacek.czaja@intel.com>

0420d514

[Paddle-TRT] add Rnn (#44678) · ffc8defa

Committed by zhoutianzi666 on Aug 04, 2022
```
* add rnn
```
ffc8defa

fix bug (#44875) · c693a027

Committed by ccrrong on Aug 04, 2022

c693a027

convert support multi block. (#44866) · b4a4eef2

Committed by Wilber on Aug 04, 2022
```
* convert support multi block.

* update
```
b4a4eef2

03 Aug, 2022 · 1 commit
- remove stack plugin (#44756) · a9f3719b
  Committed by zhoutianzi666 on Aug 03, 2022
  
  a9f3719b
02 Aug, 2022 · 1 commit

Multihead matmul fp16 (#44792) · 0fd8ee63

Committed by Wilber on Aug 02, 2022

* multihead matmul add fp16

* fix windows error

* fix rocm error

* fix rocm error

0fd8ee63

01 Aug, 2022 · 4 commits
- unify gpu context (#44740) · 86763023
  Committed by Leo Chen on Aug 01, 2022
```
* remove cudaDeviceContext

* remove more template

* fix rocm compile

* remove alias name CUDADeviceContext

* fix compile

* fix tests

* revert changes
```
  86763023
- infer context fix place error. (#44726) · 74e46a93
  Committed by Wilber on Aug 01, 2022
```
* infer context fix place error.

* update

* update
```
  74e46a93
- [Paddle Inference] add varlen_token_prune plugin, pass, convert (#44733) · 24187fcb
  Committed by Wangzheee on Aug 01, 2022
```
* add varlen_token_prune plugin, pass, convert
```
  24187fcb
- ort backend support output mutable data (#44724) · 3948c243
  Committed by heliqi on Jul 31, 2022
  
  3948c243
29 Jul, 2022 · 2 commits
- skip cast trt convert when input dtype is bool (#44716) · 5d94618d
  Committed by ccrrong on Jul 29, 2022
```
* skip cast trt convert when input dtype is bool
```
  5d94618d
- fused_fc_elementwise_layernorm_op support fp16 (#44710) · 856f741a
  Committed by ming1753 on Jul 29, 2022
```
* fused_fc_elementwise_layernorm support fp16

* fused_fc_elementwise_layernorm support double
```
  856f741a
28 Jul, 2022 · 1 commit
- clone ort_predictor reuse session (#44703) · 72b65d6b
  Committed by heliqi on Jul 28, 2022
  
  72b65d6b
26 Jul, 2022 · 1 commit
- inference multi stream support handle lazy init. (#44563) · 1892a441
  Committed by Wilber on Jul 26, 2022
```
* multi stream support handle lazy init.

* support eigen lazy init

* update

* fix ci problem
```
  1892a441
25 Jul, 2022 · 1 commit
- add swish using TensorRT layer (#44561) · c5a1e49c
  Committed by Zhang Jun on Jul 25, 2022
```
* update

* empty commit

* update

* update

* update
```
  c5a1e49c
22 Jul, 2022 · 3 commits
- commit (#44534) · db864f0b
  Committed by zhoutianzi666 on Jul 22, 2022
  
  db864f0b
- shufflechannelfix (#44516) · a2b39320
  Committed by xiaoxiaohehe001 on Jul 22, 2022
  
  a2b39320
- add batch stream (#44524) · 4f86092b
  Committed by Wilber on Jul 22, 2022
  
  4f86092b
21 Jul, 2022 · 1 commit

Fc fp16 (#44505) · 3e1280ea

Committed by ming1753 on Jul 21, 2022

* fc support fp16

* add a ‘,’ on paddle_pass_builder.cc

* fc support fp16 on non-cuda.

3e1280ea

机器未来 / Paddle is up to date with the fork's source project