Commits · 2b557da0a1561b4d2cbc1e62c3bdb28dd76a1dc4 · PaddlePaddle / Paddle

13 Jul, 2021 · 1 commit
- expose gc analysis interface (#34092) · 2b557da0
  Committed by Zeng Jinle on Jul 13, 2021
12 Jul, 2021 · 10 commits

- [hybrid performance] Optimize pipeline send wait (#34086) · 5f65ff91
  Committed by WangXi on Jul 12, 2021

- [NPU ]add npu kernel for gaussian random (#33983) · 9cda0596
  Committed by houj04 on Jul 12, 2021

* add npu operator for gaussian random.

* bugfix: add wait after memory copy.

* update gaussian random op: use TensorCopy.

- [Paddle-TRT] IPluginExt -> IPluginV2 (#33680) · 394f92aa
  Committed by zlsh80826 on Jul 12, 2021

* add trt LT version helper

* upgrade PluginTensorRT to IPluginV2Ext

* trt plugin factory is not usable in IPluginV2

* upgrade add plugin api to use IPluginV2

* remove IPlugin register and adapt getSerializeSize(), serialize()

* adapt IPluginV2Layer

* downgrade to IPluginV2

* implement elementwise clone

* add gelu plugin creator and fix gelu serialization bug

* add swish plugin creator and fix swish serialization bug

* format

* fix typo

* add elementwise plugin creator and fix serialization

* add base creator class

* add gelu plugin creator

* add hard swish creator and fix serialization

* add instance norm creator and fix serialization

* add layer norm creator and fix serialization

* add pool creator and fix serialization

* add prelu creator and fix serialization

* add slice creator and fix serialization

* add swish creator and fix serialization

* add instance norm op unittest

* remove redundent api

* fix wrong graph size to enable trt

* instance norm function move to cc

* add trt elementwise ut to trigger coverage

* remove opt cahce to hit serialization coverage

* remove opt cahce to hit serialization coverage

* remove unused code

* remove unused inputs_

* add dbg info

* remove dbg info

* add instance norm serialization

* roll back

* remove comment code

* remove trt plugin registery

* fix prelu dynamic serialization

* add prelu ut and reduce the input size to reduce memory usage

* fix pool dynamic plugin serialization and add ut

* refine pool ut with subtest

* add env for avoiding oom

* reduce test input size & increase pool op ut to 45s

* add the contributor

* remove copyright (will add in contributor)

* remove copyright (will add in contributor)

- [NPU] add NPU ops of stack and unstack, test=develop (#34084) · 0b20b76e
  Committed by Qi Li on Jul 12, 2021
- [NPU] add dropout npu op (#34081) · c4e04986
  Committed by pangyoki on Jul 12, 2021
```
* add dropout npu op

* fix bugs

* add unittest

* fix bugs

* support 1-D input
```
- [NPU] change ScatterAdd to EmbeddingDenseGrad in lookup_table NPU op (#33866) · 4d842050
  Committed by pangyoki on Jul 12, 2021
```
* change ScatterAdd to EmbeddingDenseGrad in lookup_table NPU op

* EmbeddingDenseGrad only supports dim 32

* fix shape error
```
- [NPU] slice support Tensor Input (#34067) · 871edade
  Committed by pangyoki on Jul 12, 2021
- tem_fix_reshape_unitest (#34069) · 113539eb
  Committed by Wangzheee on Jul 12, 2021
- softmax mask fuse upper triangle (#33981) · e2e1c57b
  Committed by Yuang Liu on Jul 12, 2021
```
* softmax mask fuse upper triangle

* cover not implemented cpu code
```
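For readers unfamiliar with the operation named in this commit, here is a minimal NumPy reference sketch. It assumes the fused op is equivalent to masking every position strictly above the diagonal and then applying softmax over the last axis; the function name `softmax_mask_upper_triangle` is hypothetical and is not taken from the commit.

```python
import numpy as np


def softmax_mask_upper_triangle(x):
    """Unfused reference: mask the strict upper triangle, then softmax (assumed semantics)."""
    rows, cols = x.shape[-2], x.shape[-1]
    # True for positions strictly above the main diagonal.
    upper = np.triu(np.ones((rows, cols), dtype=bool), k=1)
    # Masked positions become -inf so they get zero probability.
    masked = np.where(upper, -np.inf, x.astype(np.float32))
    # Subtract the row maximum for numerical stability (the diagonal is never masked).
    masked = masked - masked.max(axis=-1, keepdims=True)
    exp = np.exp(masked)
    return exp / exp.sum(axis=-1, keepdims=True)


scores = np.zeros((1, 1, 3, 3), dtype=np.float16)
probs = softmax_mask_upper_triangle(scores)
print(probs[0, 0])
```

With all-zero scores, each row spreads its probability evenly over the positions at or below the diagonal, which makes the masking pattern easy to see.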
- add paddle/linalg.py to add new linalg apis (#34033) · bfbea8fd
  Committed by zhiboniu on Jul 12, 2021

09 Jul, 2021 · 6 commits
- [hybrid performance] pipeline cache trainer (#33998) · 98c7191d
  Committed by Yuang Liu on Jul 09, 2021
- opt dygraph python code (#33997) · 0a9ad8d7
  Committed by wanghuancoder on Jul 09, 2021
```
* opt dygraph python code, test=develop

* refine, test=develop
```
- [dygraph qat] change default config and fix bug (#34047) · 7858d332
  Committed by cc on Jul 09, 2021
- [NPU] Fix vector overflow in slice grad npu op (#34032) · 1f28968b
  Committed by Leo Chen on Jul 09, 2021
```
* fix vector overflow

* refine code

* refine ut
```
- [PTQ ] wrap simulated layers and save the quantized model (#33962) · fd85be80
  Committed by cc on Jul 09, 2021
```
* PTQ save quantized model

* Wrap simulated layer

* post process the inference model
```
- fix double grad hang bug (#34023) · 8768ffb7
  Committed by Zeng Jinle on Jul 09, 2021
08 Jul, 2021 · 6 commits
- delete the function of saving layer object. (#33697) · e22701c4
  Committed by WeiXin on Jul 08, 2021
```
* delete the function of saving layer object.

* edit doc of paddle.save/load and polish error message
```
- opt dygraph python code for 215 unchecked calls (#34024) · 9b611ea2
  Committed by Hao Lin on Jul 08, 2021
```
* opt dygraph python API, test=develop

* Fix unbind bug in manipulation.py
```
- add num_iters in fit/evalate (#33986) · 97faf90e
  Committed by shangliang Xu on Jul 08, 2021
```
* add num_iters in fit/evalate, test=develop
```
- fix the bug, test=develop (#33996) · 6a36977d
  Committed by lilong12 on Jul 08, 2021
- Distributed Automatic SParsity with Fleet (#33558) · 86cb3fb8
  Committed by Ming-Xu Huang on Jul 08, 2021
- Fix test_jit_save_load random failure. (#34004) · 1e5437de
  Committed by WeiXin on Jul 08, 2021
```
* Fix test_jit_save_load random failure.

* Since CI is not activated, recommit the code.

* delete temp file.
```
07 Jul, 2021 · 4 commits
- fix reshape trt condition (#34007) · 0914ff97
  Committed by Wilber on Jul 07, 2021
- add Wait after TensorCopy (#34005) · cb73feea
  Committed by pangyoki on Jul 07, 2021
- Added PRelu BF16/FP32 FWD/BWD kernels (#33878) · 375e5618
  Committed by jakpiase on Jul 07, 2021
```
* added prelu bf16/fp32 fwd/bwd kernel
```
- [xpu] add dropout & amp ops in xpu place (#33891) · 84e813e3
  Committed by taixiurong on Jul 07, 2021
06 Jul, 2021 · 7 commits
- add so parser (#33969) · b1c458d0
  Committed by Thunderbrook on Jul 06, 2021
```
* add delta score, scale show

* so parser

* windows

* windows
```
- Add gpu implementation of shuffle_batch_op (#33938) · c6b6ba1f
  Committed by Zeng Jinle on Jul 06, 2021
```
* add gpu implementation of shuffle batch
test=develop

* add thrust cuda patches
test=develop

* fix macro guard

* fix shuffle batch compile on windows/hip

* fix hip compilation error

* refine CMakeLists.txt

* fix windows compile error

* try to fix windows CI compilation error

* fix windows compilation again

* fix shuffle_batch op test on Windows
```
- make DataLoader warning less noisy. test=develop (#33712) · 5085c44b
  Committed by Kaipeng Deng on Jul 06, 2021
- [hybrid performance] pipeline add program cache (#33954) · c9ae1362
  Committed by WangXi on Jul 06, 2021
- public api:add bn\ln\in; add static.xpu_place (#33897) · 6b95e674
  Committed by zhiboniu on Jul 06, 2021
- Enhance error message for interpolate_v2 (#33941) · f2068eec
  Committed by xiaoting on Jul 06, 2021
```
* fix interpolate for shape[i]=0, test=develop

* fix test_trilinear_interp_v2 random failure, test=develop
```
- 【HETERPS】pipeline adaptive for heterps (#33159) · bfef7feb
  Committed by danleifeng on Jul 06, 2021
```
* pipeline adaptive for heterps;test=develop
* fix finalize hang;test=develop
* add is_compiled_with_heterps for dataset;test=develop
* fix hashtable core when pass ins_num=0;test=develop
```
05 Jul, 2021 · 6 commits
- [Dy2Stat]Fix unique_name in create_static_variable_gast_node (#33963) · 740f4e30
  Committed by Aurelius84 on Jul 05, 2021
- add `reduce_sum` op into amp black list (#33960) · aa9fdd0d
  Committed by jiangcheng on Jul 05, 2021
```
* reduce sum op default fp32, add into amp black list

* reduce_sum default fp32 can avoid return inf when the sum value large than 65504
```
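The 65504 figure in this commit message is the largest finite float16 value. The NumPy sketch below illustrates the overflow the commit describes. It is an illustration only, not Paddle's AMP code, and the variable names are made up for this example.

```python
import numpy as np

# Largest finite float16 value; anything above it rounds to inf.
FP16_MAX = float(np.finfo(np.float16).max)

# 1000 values of 100.0 each: the exact sum is 100000, which exceeds FP16_MAX.
values = np.full(1000, 100.0, dtype=np.float16)

# Producing the sum in float16 overflows (the overflow warning is silenced here).
with np.errstate(over="ignore"):
    sum_fp16 = values.sum(dtype=np.float16)

# Accumulating in float32 keeps the result finite.
sum_fp32 = values.sum(dtype=np.float32)

print(FP16_MAX, sum_fp16, sum_fp32)
```

Keeping the reduction in fp32, as the commit does by adding `reduce_sum` to the AMP black list, avoids this overflow at the cost of running that op in full precision.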
- [hybrid performance] optimize pipeline performance · 9914dff7
  Committed by WangXi on Jul 05, 2021
- make stop_gradient=True for random op in static graph (#33959) · 43876e8b
  Committed by Leo Chen on Jul 05, 2021
- Add fused elemwise gelu and optimize performance (#33480) · eae31856
  Committed by WangXi on Jul 05, 2021
- [NPU] change Add to AddN in sum npu op (#33957) · fa5ddfd9
  Committed by pangyoki on Jul 05, 2021
```
* change Add to AddN in sum npu op

* add AddInputNames

* change fp16 to fp32 because numpy has accuracy loss in fp16 adding

* delete check

* fix runner error
```
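The note above about accuracy loss when adding in fp16 can be reproduced with two scalars. This is a small NumPy illustration of the rounding effect, not the unit test from the commit. It relies on float16 having a spacing of 2 between representable values from 2048 upward.

```python
import numpy as np

a = np.float16(2048.0)
b = np.float16(1.0)

# In float16, 2049 is not representable, so the sum rounds back to 2048.
fp16_result = a + b

# The same addition in float32 is exact.
fp32_result = np.float32(a) + np.float32(b)

print(float(fp16_result), float(fp32_result))
```

A reference result computed in fp16 can therefore differ from the true sum, which is a plausible reason for a test to compare against fp32 instead.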

PaddlePaddle / Paddle · synced successfully more than 1 year ago
