Commits · 394f92aac8f8308849896209172f0f6db81edc69 · Crayon鑫 / Paddle

Jul 12, 2021 · 10 commits

[Paddle-TRT] IPluginExt -> IPluginV2 (#33680) · 394f92aa

Committed by zlsh80826 on Jul 12, 2021

* add trt LT version helper

* upgrade PluginTensorRT to IPluginV2Ext

* trt plugin factory is not usable in IPluginV2

* upgrade add plugin api to use IPluginV2

* remove IPlugin register and adapt getSerializeSize(), serialize()

* adapt IPluginV2Layer

* downgrade to IPluginV2

* implement elementwise clone

* add gelu plugin creator and fix gelu serialization bug

* add swish plugin creator and fix swish serialization bug

* format

* fix typo

* add elementwise plugin creator and fix serialization

* add base creator class

* add gelu plugin creator

* add hard swish creator and fix serialization

* add instance norm creator and fix serialization

* add layer norm creator and fix serialization

* add pool creator and fix serialization

* add prelu creator and fix serialization

* add slice creator and fix serialization

* add swish creator and fix serialization

* add instance norm op unittest

* remove redundant api

* fix wrong graph size to enable trt

* instance norm function move to cc

* add trt elementwise ut to trigger coverage

* remove opt cache to hit serialization coverage

* remove opt cache to hit serialization coverage

* remove unused code

* remove unused inputs_

* add dbg info

* remove dbg info

* add instance norm serialization

* roll back

* remove comment code

* remove trt plugin registry

* fix prelu dynamic serialization

* add prelu ut and reduce the input size to reduce memory usage

* fix pool dynamic plugin serialization and add ut

* refine pool ut with subtest

* add env for avoiding oom

* reduce test input size & increase pool op ut to 45s

* add the contributor

* remove copyright (will add in contributor)

* remove copyright (will add in contributor)

[NPU] add NPU ops of stack and unstack, test=develop (#34084) · 0b20b76e
Committed by Qi Li on Jul 12, 2021

optimize performance of multiple-dimension reduce (#33761) · 2dde0eb0
Committed by Zhang Zheng on Jul 12, 2021

Support finetuning the model saved on the mac platform on the Linux platform (#34027) · 4d259b91
Committed by WeiXin on Jul 12, 2021

[NPU] add dropout npu op (#34081) · c4e04986
Committed by pangyoki on Jul 12, 2021
```
* add dropout npu op

* fix bugs

* add unittest

* fix bugs

* support 1-D input
```

[NPU] change ScatterAdd to EmbeddingDenseGrad in lookup_table NPU op (#33866) · 4d842050
Committed by pangyoki on Jul 12, 2021
```
* change ScatterAdd to EmbeddingDenseGrad in lookup_table NPU op

* EmbeddingDenseGrad only supports dim 32

* fix shape error
```

[NPU] slice support Tensor Input (#34067) · 871edade
Committed by pangyoki on Jul 12, 2021

tem_fix_reshape_unitest (#34069) · 113539eb
Committed by Wangzheee on Jul 12, 2021

softmax mask fuse upper triangle (#33981) · e2e1c57b
Committed by Yuang Liu on Jul 12, 2021
```
* softmax mask fuse upper triangle

* cover not implemented cpu code
```

add paddle/linalg.py to add new linalg apis (#34033) · bfbea8fd
Committed by zhiboniu on Jul 12, 2021

Jul 09, 2021 · 11 commits
- [hybrid performance] pipeline cache trainer (#33998) · 98c7191d
  Committed by Yuang Liu on Jul 09, 2021
- refine varbase init function (#34052) · dfff52ea
  Committed by Leo Chen on Jul 09, 2021
  ```
  * remove check on kwargs

  * refine code, reuse common function
  ```
- Use CBLAS for SelectedRows elementwise add operation. (#34008) · 1412d3bc
  Committed by arlesniak on Jul 09, 2021
  ```
  * Use CBLAS for SelectedRows elementwise add operation. It's faster.

  * template compilation fix

  * reverted template compilation fix

  * slimmed template compilation fix

  Co-authored-by: Adam Osewski <adam.osewski@intel.com>
  ```
- depthwise_conv_mkl_pass (#33936) · 78ab656c
  Committed by feng_shuai on Jul 09, 2021
- fix output data type selection (#34040) · 033d736d
  Committed by zlsh80826 on Jul 09, 2021
- opt dygraph python code (#33997) · 0a9ad8d7
  Committed by wanghuancoder on Jul 09, 2021
  ```
  * opt dygraph python code, test=develop

  * refine, test=develop
  ```
- [dygraph qat] change default config and fix bug (#34047) · 7858d332
  Committed by cc on Jul 09, 2021
- [NPU] Fix vector overflow in slice grad npu op (#34032) · 1f28968b
  Committed by Leo Chen on Jul 09, 2021
  ```
  * fix vector overflow

  * refine code

  * refine ut
  ```
- [PTQ ] wrap simulated layers and save the quantized model (#33962) · fd85be80
  Committed by cc on Jul 09, 2021
  ```
  * PTQ save quantized model

  * Wrap simulated layer

  * post process the inference model
  ```
- add NVIDIAN into AUTHORS (#34035) · 477d9f1e
  Committed by Jeng Bai-Cheng on Jul 09, 2021
- fix double grad hang bug (#34023) · 8768ffb7
  Committed by Zeng Jinle on Jul 09, 2021

Jul 08, 2021 · 12 commits
- Add the op def for elementwise_mul and enhance layer_norm_fuse_pass (#33560) · 3508bd28
  Committed by dyning on Jul 08, 2021
- correct conditions of gather in opteller (#33999) · 11f5a400
  Committed by wenbin on Jul 08, 2021
  ```
  * correct conditions of gather in opteller

  * test=develop

  * test=allcase
  ```
- fix for no python coverage found (#33848) · 9f0411f1
  Committed by YUNSHEN XIE on Jul 08, 2021
  ```
  * fix error for no python coverage data
  ```
- delete the function of saving layer object. (#33697) · e22701c4
  Committed by WeiXin on Jul 08, 2021
  ```
  * delete the function of saving layer object.

  * edit doc of paddle.save/load and polish error message
  ```
- up cxx11 to cxx14 (#34015) · 6df7ac72
  Committed by Chen Weihang on Jul 08, 2021
- opt dygraph python code for 215 unchecked calls (#34024) · 9b611ea2
  Committed by Hao Lin on Jul 08, 2021
  ```
  * opt dygraph python API, test=develop

  * Fix unbind bug in manipulation.py
  ```
- add num_iters in fit/evaluate (#33986) · 97faf90e
  Committed by shangliang Xu on Jul 08, 2021
  ```
  * add num_iters in fit/evaluate, test=develop
  ```
- fix the bug, test=develop (#33996) · 6a36977d
  Committed by lilong12 on Jul 08, 2021
- [pass_enhance]add global extra attributes for op def, test=develop (#34009) · 05643dc3
  Committed by 王明冬 on Jul 08, 2021
- fix zip inference library bug (#34025) · 80bd093a
  Committed by zhouweiwei2014 on Jul 08, 2021
- Distributed Automatic SParsity with Fleet (#33558) · 86cb3fb8
  Committed by Ming-Xu Huang on Jul 08, 2021
- Fix test_jit_save_load random failure. (#34004) · 1e5437de
  Committed by WeiXin on Jul 08, 2021
  ```
  * Fix test_jit_save_load random failure.

  * Since CI is not activated, recommit the code.

  * delete temp file.
  ```

Jul 07, 2021 · 7 commits
- fix some errors about pass enhance, test=develop (#33993) · 77a5b8b0
  Committed by 王明冬 on Jul 07, 2021
- fix reshape trt condition (#34007) · 0914ff97
  Committed by Wilber on Jul 07, 2021
- add Wait after TensorCopy (#34005) · cb73feea
  Committed by pangyoki on Jul 07, 2021
- [NPU] NpuOpRunner supports host tensor as input (#33992) · cbf22d65
  Committed by Leo Chen on Jul 07, 2021
  ```
  * NpuOpRunner supports host tensor as input

  * fix compile issue
  ```
- [HIP] Fix hipMemcpy being unable to overlap; after the change, AMD GPU performance improves by more than 10% (#33982) · 20da7703
  Committed by xiayanming on Jul 07, 2021
- add no tensorrt warning (#33874) · 758dd7bb
  Committed by feng_shuai on Jul 07, 2021
- Added PRelu BF16/FP32 FWD/BWD kernels (#33878) · 375e5618
  Committed by jakpiase on Jul 07, 2021
  ```
  * added prelu bf16/fp32 fwd/bwd kernel
  ```

Crayon鑫 / Paddle is in sync with the upstream repository it was forked from.