Commits · 9336dd3ee9f6938507a67ab33d8b4b4b83f55249 · 机器未来 / Paddle

08 Aug, 2022 1 commit

add trt int8 dynamic support (#44800) · 9336dd3e

Committed by JingZhuangzhuang on Aug 08, 2022

* add trt int8 dynamic support

* just support trt7+

* just for trt7.1.3.a

* Update tensorrt_subgraph_pass.cc

* delete trt_engine when it not use

9336dd3e

05 Aug, 2022 1 commit
- commit (#44887) · 247002ec
  Committed by zhoutianzi666 on Aug 05, 2022
  
  247002ec
04 Aug, 2022 3 commits
- [cherry-pick] fix QuantizeLinear pass and support reduce_max in quantization (#44872) · 24b3bbde
  Committed by Guanghua Yu on Aug 04, 2022
```
* fix QuantizeLinear kernel and pass in QAT (#44784)

* Add Reduce Max in Quant (#44825)
Co-authored-by: Chang Xu <molixu7@gmail.com>
```
  24b3bbde
- [Paddle-TRT][cherry pick] Slice to 2.3 (#44757) · 245005d4
  Committed by zhoutianzi666 on Aug 04, 2022
```
* slice_to_2.3
```
  245005d4
- [cherry pick] add cast trt convert (#44837) · 7cdce09b
  Committed by ccrrong on Aug 04, 2022
```
* add cast trt convert

* skip cast trt convert when input dtype is bool

* code format

* fix bug

* update unittest

* fix bug
```
  7cdce09b
02 Aug, 2022 2 commits

[cherry-pick]Ort backend optimizer(#44136 #44703 #44724) (#44766) · 35297bd8

Committed by heliqi on Aug 02, 2022

* [Inference]ort backend optimizer (#44136)

* add ort clone interface

* paddle2onnx update to 1.0.0rc

* ort input_tensor use mutable data of scope

* clone ort_predictor reuse session (#44703)

* ort backend support output mutable data (#44724)

* 2.3 interface is different from the Develop interface

* 2.3 interface is different from the Develop interface

* 2.3 interface is different from the Develop interface

35297bd8

Fix operator type record in profiler [cherry-pick PR44582] (#44654) · 6de20581

Committed by chenjian on Aug 02, 2022

* fix record event for operator type in new dygraph (#44582)

* fix new dygraph record event for op

* update unit test

* fix file mode

6de20581

25 Jul, 2022 1 commit
- [cherry-pick]remove unuse cuSparse function (#44511) · 684b12ee
  Committed by zhouweiwei2014 on Jul 25, 2022
```
cherry-pick #43626
```
  684b12ee
19 Jul, 2022 1 commit

Record op shape data for profiler [cherry-pick PR43405 43578 43822] (#44384) · a2240190

Committed by chenjian on Jul 19, 2022

* add serialization for new field in event node (#43405)

* add serialization for new field in event node

* fix a bug

* add more field to memory record (#43578)

* Add infer shape in dygraph (#43822)

* record memory and op supplement info

* update

* update

* fix a bug

* fix memory recording

* fix a bug

* update

* update

* fix a bug

* update

* fix a bug

* fix a bug

* fix a bug

* update dygraph record

* add infer shape record

* fix

* fix

* fix

* add comments

* fix a bug

* fix

* fix

* add record op info

* fix file mode

* add op input shape info

* fix dependency

a2240190

12 Jul, 2022 1 commit

add new field for event node (#43223) (#44245) · 94271bc2

Committed by chenjian on Jul 12, 2022

* add new field for event node

* fix

* fix bug

* fix bug

* fix clang

* fix clang format

* fix code format

94271bc2

30 Jun, 2022 2 commits
- [Paddle Inference ]Fix emb pass for ernie3.0 (#43948) · 35abeda7
  Committed by Wangzheee on Jun 30, 2022
```
* fix emb pass for ernie3.0

* fix emb pass for ernie3.0

* fix emb pass for ernie3.0
```
  35abeda7
- modify graph_pattern to thread_local (#43945) · 1ea9971a
  Committed by JingZhuangzhuang on Jun 30, 2022
  
  1ea9971a
29 Jun, 2022 1 commit
- cherry pick 43890 (#43892) · 69e82d83
  Committed by ronnywang on Jun 29, 2022
```
* cherry pick 43890
```
  69e82d83
28 Jun, 2022 1 commit
- [Inference TRT] elementwise layer support (#43851) · 17a2003d
  Committed by zhoutianzi666 on Jun 28, 2022
```
* elementwise support

* commit
```
  17a2003d
27 Jun, 2022 2 commits

[cherry-pick]Update quantization round and clip calculation methods (#43829) · ff70a269
Committed by Guanghua Yu on Jun 27, 2022
```
* update quantization clip and round

* fix quantization clip and round Attribute

* fix typo
```
ff70a269

[Cherry-pick] Fix incompatible error for place type (#43830) · 9e776f62

Committed by Chen Weihang on Jun 27, 2022

* Create Tensor by paddle::empty  in custom operator (#41840)

* create tensor by empty in custom op

* fix some bug

* update relu custom op demo (#43173)

* Fix incompatible error for custom op Placetype (#43749)

* fix incompatible error

* rmeove default constructor

* add macro

* fix cpu make error

* add DefaultGPUPlace api
Co-authored-by: zyfncg <zhangyunfei07@baidu.com>

9e776f62

25 Jun, 2022 1 commit
- [new-exec] lazy creating work queue (#43551) (#43768) · 0c44dd64
  Committed by Leo Chen on Jun 25, 2022
```
* lazy creating work queue

* fix dry_run
```
  0c44dd64
24 Jun, 2022 1 commit

[cherry-pick] NVIDIA fixes (#43780) · 9edbe4aa

Committed by Aganlengzi on Jun 24, 2022

* Use all sitepackages path as the library/include path (#42940)

* Fix several unit tests and increase the unit tests stability (#43670)

* Reduce gather op unit tests size and increase the timeout

* Add NVIDIA_TF32_OVERRIDE for multi-processes environment

* Remove record test for device event ut

* Fix 3 unittest errors (#43532)

* Fix test_fuse_resnet_unit failure

* Fix test_imperative_auto_mixed_precision failure

* Fix sparse_attention_op error

* Fix sparse_attention_op error

* Use fixed random seed (#43659)

* for CI test_collective_sendrecv_api
Co-authored-by: zlsh80826 <rewang@nvidia.com>
Co-authored-by: Shijie <505749828@qq.com>

9edbe4aa

23 Jun, 2022 4 commits
- remove slowing down pass (#43750) · 096eb801
  Committed by lidanqing on Jun 23, 2022
  
  096eb801
- [cherry pick][Inference]Enhance gpu multihead matmul v3 fuse pass (#43765) · 94bacb47
  Committed by WJJ1995 on Jun 23, 2022
  
  94bacb47
- [cherry pick 2.3][Inference]Fix the ort Backend multiple input bug(#43621 #43742) (#43739) · babba557
  Committed by heliqi on Jun 22, 2022
```
* cherry pick form develop 43621

* code format

* paddle2onnx update to 0.9.8
```
  babba557
- [cherry-pick] release/2.3 elementwise_mul and matmul mkldnn fix (#43725) · a7e0cdea
  Committed by lidanqing on Jun 23, 2022
```
* Correct elementwise quantization (#43693)

* [Bug fix] Do not quantize weights Y when matmul X and Y both other ops outputs (#43297)

* fix some matmul that X and Y both other ops outputs, do not dequantize the Y.

* fix CI format

* fix according to review
Co-authored-by: joanna.wozna.intel <joanna.wozna@intel.com>
```
  a7e0cdea
22 Jun, 2022 4 commits

Cherry pick 43307 (#43618) · d0bbf46c

Committed by ccrrong on Jun 22, 2022

* add bilinear_interp_v2 converter

* update op_teller.cc

* add unittest for bilinear_interp_v2 converter

* code format

* bug fix

* code format and add unittest

* remove merged modify in op_teller.cc

* code format

* code format

* fix scale init error

d0bbf46c

Optimize linspace to avoid GPU -> CPU copy. (#42750) (#43746) · 4dcfc6df

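Committed by Yiqun Liu on Jun 22, 2022

cherry-pick #42750.

QA reported that the optimization in #42750 can improve solov2 model performance by 6%, so it is cherry-picked to 2.3. Because #41096, which moved the Python implementation of linspace from fluid.layers.tensor to paddle.tensor.creation, is not in the release/2.3 branch, the Python changes from #42750 are applied to fluid.layers.tensor.linspace instead.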

4dcfc6df

[cherry pick] Support optional residual add in fused ops and slice large... · 0660d5f2

Committed by Zhang Ting on Jun 22, 2022

[cherry pick] Support optional residual add in fused ops and slice large tensor for cudnn_softmax (#43719)

 [cherry pick] Support optional residual add in fused ops and slice large tensor for cudnn_softmax

cherry-pick #43635 #43681 #43474

0660d5f2

test=document_fix;cherry pick code format check upgrade to release/2.3 (#43732) · 8e6a1945

Committed by Sing_chan on Jun 22, 2022

Only the format tool upgrades (clang-format, yapf, cmake-format) are cherry-picked to release/2.3; lint tools such as cpplint are not moved, because cpplint errors will not be fixed in release/2.3.
pre_commit.sh is also moved to release/2.3 so that both PR-CI-pre-commit and PR-CI-pre-commit-23 can work.
Pre-install clang-format to avoid repeated installation caused by pre-commit's multi-threaded runs.

8e6a1945

21 Jun, 2022 4 commits
- [cherry pick #43088 #40664] Add float16 to fake quantize/dequantize OP (#43689) · 9783e887
  Committed by Guanghua Yu on Jun 21, 2022
```
* cherry pick #43088 #40664

* fix clang format
```
  9783e887
- [Cherry-pick] Update CUDA and TensorRT version for CI (#43642) · a363e5ab
  Committed by chalsliu on Jun 21, 2022
```
* Update CUDA and TensorRT version for CI

* disable ut

* Update TensorRT for CUDA 10.2
```
  a363e5ab
- delete the log printing in layout autotune (#43677) · 090a9132
  Committed by niuliling123 on Jun 21, 2022
```
Remove redundant log printing in layout autotune.
Background: the layout autotune log increases the amount of information a model prints.
```
  090a9132
- fix compile fail in cuda11.6 (#43588) · e1604f9e
  Committed by zhoutianzi666 on Jun 21, 2022
  
  e1604f9e
20 Jun, 2022 1 commit
- [Cherry pick] Einsum memory optimization PR #43397 (#43554) · 638b69dc
  Committed by xiongkun on Jun 20, 2022
```
* cherry pick from #43397

* fix code
```
  638b69dc
17 Jun, 2022 2 commits

Export symbols of phi operator library (#43478) · 68ed3b86
Committed by weishengying on Jun 17, 2022

68ed3b86

[cherry-pick 2.3] Cherry parallel fused transformer api (#43505) · 19b87aec

Committed by WangXi on Jun 17, 2022

* Rename dropout is test (#43098)

* replace dropout_is_test with is_test.
* improve atol on a100.

* fused_attention fused_feedforward api support Model Tensor Parallel (#42985)

* fix is_test bug in fused_feedforward. (#43508)
Co-authored-by: Li Min <11663212+limin2021@users.noreply.github.com>

19b87aec

15 Jun, 2022 1 commit
- [cherry-pick] Fix bug of strided_slice and slice (#43388, #43443) (#43432) · 7e940b84
  Committed by zyfncg on Jun 15, 2022
```
* fix bug of strided_slice (#43388)

* fix stride_slice bug

* fix bug

* fix bug of infer shape for slice (#43443)
```
  7e940b84
14 Jun, 2022 1 commit

[ CherryPick ] Cherry pick for einsum optimization. (#43468) · 22e75d92

Committed by xiongkun on Jun 14, 2022

* [EinsumOp] Polish forward logic and backward logic for optimize (#42603)

* change logic for optimize

* modifty

* merge

* change einsum_v2 as default and add new flags: FLAG_einsum_opt=1|0 (#43010)

* [EinsumOp] Make EinsumOp support bfloat16. (#43085)

* change einsum_v2 as default and add new flags: FLAG_einsum_opt=1|0

* make EInsumOP support bf16

* add unittest for BF16

* add condition for test_BF16

* fix bugs

* fix

* change the backward api to fit einsum op

22e75d92

09 Jun, 2022 1 commit
- disable lite gpu (#43178) · 36980306
  Committed by zhupengyang on Jun 09, 2022
  
  36980306
08 Jun, 2022 2 commits

Replace ReduceAmax/Amax.part.cu with KP (#43202) (#43263) · e161979e

Committed by niuliling123 on Jun 08, 2022

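The original implementations of Reduce amax/amin and frobenius_norm_kerne were based on Eigen and took a long time to compile, so this PR replaces them with KP implementations.
Remove the duplicated functionality support in DefaultElementwiseOperator to reduce the compile time of the elementwise_double_grad OP.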

e161979e

Resolve protobuf of ORT Backend conflict (#43275) · c2804390
Committed by heliqi on Jun 07, 2022
```
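Resolve the version conflict between the protobuf that the onnxruntime backend depends on and the protobuf used by the framework or by external code.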
```
c2804390

06 Jun, 2022 1 commit

cherry-pick 42645 (#43205) · 835a1888

Committed by niuliling123 on Jun 06, 2022

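Remove the rank instantiation and the Elementwise call in the Broadcast function to reduce compile time.
Adapted from PR #42645 on the develop branch. Because the develop and release branches differ substantially, a direct cherry-pick is not possible, so a new PR is submitted for release2.3.
Instantiating Broadcast on rank causes extensive expansion of the underlying templates, which makes reduce_sum_grad_kernel.cu.o too large; the change reduces both the .o size and the compile time.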

835a1888

30 May, 2022 1 commit
- [Dy2St]Fix cond_block_grad error when handle no need grad vras (#43034) (#43084) · e6e85b35
  Committed by WangZhen on May 30, 2022
```
* Fix cond_block_grad error when handle no need grad vras

* Add comment and UT
```
  e6e85b35

机器未来 / Paddle is in sync with the fork's source project
