Commits · 12df57fbd03b762de0736c4440881822df18e15d · Crayon鑫 / Paddle

01 Sep, 2021 5 commits
- add ElementwiseTernary, Reduce, ReadDataStride (#35075) · 12df57fb
  Committed by niuliling123 on Sep 01, 2021
```
* add ElementwiseTernary, Reduce, ReadDataStride
```
  12df57fb
- label prelu op (#35315) · d9afa839
  Committed by cc on Sep 01, 2021
  
  d9afa839
- [Dy2Stat]Support append method and initialized value for List in ControlFlow (#35212) · 3b52f68e
  Committed by Aurelius84 on Sep 01, 2021
```
* Support append method and initialized value for List in ControlFlow

* polish error msg and en doc

* fix code style
```
  3b52f68e
- Support setitem by Bool index (#35133) · d387820d
  Committed by zyfncg on Sep 01, 2021
```
* Support getitem by Bool index

* delete some debug info of bool index

* support the case that the shape of bool index is different from indexed tensor

* support setitem by bool index

* add the unittest for throwing exception

* merge conflict

* add check for int tensor when index is bool
```
  d387820d
- reverse xpu adamw to the combination of ops version. (#35286) · 884011a4
  Committed by zhaoyingli on Sep 01, 2021
  
  884011a4
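The "Support setitem by Bool index (#35133)" entry above describes reading and writing tensor elements through a boolean mask. The following is only a plain-Python sketch of that idea on nested lists, assuming NumPy-style masking along the first axis; the function names `getitem_by_bool` and `setitem_by_bool` are hypothetical and this is not Paddle's implementation. Raising on a length mismatch is likewise an assumption.

```python
def getitem_by_bool(rows, mask):
    """Return the rows whose mask entry is True (mask applies to the first axis)."""
    if len(mask) != len(rows):
        raise IndexError("mask length must match the first dimension")
    return [row for row, keep in zip(rows, mask) if keep]


def setitem_by_bool(rows, mask, value):
    """Overwrite, in place, every element of the rows selected by the mask."""
    if len(mask) != len(rows):
        raise IndexError("mask length must match the first dimension")
    for i, keep in enumerate(mask):
        if keep:
            rows[i] = [value] * len(rows[i])


# Usage: a 3x2 "tensor" indexed by a mask of shape (3,)
data = [[1, 2], [3, 4], [5, 6]]
mask = [True, False, True]
selected = getitem_by_bool(data, mask)  # rows 0 and 2
setitem_by_bool(data, mask, 0)          # rows 0 and 2 are filled with 0
```

Here the mask has fewer dimensions than the indexed data, which mirrors the commit note about a bool index whose shape differs from the indexed tensor.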
31 Aug, 2021 18 commits

Support CostInfo and MemProfiler in InterpreterCore (#34981) · 572bad8a

Committed by Aurelius84 on Aug 31, 2021

* polish code

* fix unittest on windows

* refine pybind interface

* support statistic MemSize of AllocatorPool

* Replace mutex with atomic

572bad8a

transformer opt python files (#35206) · e2991555

Committed by Feng Xing on Aug 31, 2021

This PR adds the Python files for the fused transformer and defines its interface.

The fused transformer implements an optimized version of the transformer layer (in python/paddle/nn/layer/transformer.py). In this PR, four layers (functions) are defined:
(1) FusedMultiHeadAttention: multi-head attention layer
(2) FusedFeedForward: feed forward layer
(3) FusedTransformerEncoderLayer: transformer encoder layer
(4) FusedTransformer: transformer layer

e2991555

[Dy2Stat]Add model ResNet50 for Dy2stat AMP training (#35276) · 079c585c
Committed by Aurelius84 on Aug 31, 2021
```
* Add model for ResNet50 for Dy2stat AMP training

* fix timeout

* fix dataloader
```
079c585c
[NPU] fix cmake for ascend ci, test=develop (#35255) · f6004ab9
Committed by Qi Li on Aug 31, 2021
```
* [NPU] fix cmake for ascend ci, test=develop

* update paddle_build.sh scripts, test=allcase
```
f6004ab9
Revert "Revert "Add copy from tensor (#34406)" (#35173)" (#35256) · 6116f9af
Committed by Shang Zhizhou on Aug 31, 2021
```
* Revert "Revert "Add copy from tensor (#34406)" (#35173)"

This reverts commit 32c1ec42.

* add template instantiation
```
6116f9af
fix bug that cmake find python (#35304) · 00c9aeb0
Committed by zhouweiwei2014 on Aug 31, 2021

00c9aeb0

New whl release strategy with pruned nv_fatbin (#35239) · 2f3b393d

Committed by Zhanlue Yang on Aug 31, 2021

[Background]
Code size expansion can be irreversible in the long run, leading to huge release packages, which
not only hamper the user experience but also exceed a hard limit of PyPI.

In particular, the NV_FATBIN section takes up 86% of the compiled dylib size, owing to the vast number of GPU
arches supported.

This PR aims to prune this NV_FATBIN.

[Solution]
In the new release strategy, two types of whl packages will be involved:

Cubin PIP package:
PIP package maintains a smaller window for GPU arches support, containing
sm_60, sm_70, sm_75, sm_80 cubins, covering Pascal - Ampere arches

JIT release package:
This is a backup for Cubin PIP package, containing compute_35, compute_50, compute_60,
compute_70, compute_75, compute_80, with best performance and GPU arches coverage.

However, it takes around 10 min to install due to the JIT compilation.

[How to use]
The new release strategy is disabled by default.
To compile for Cubin PIP package, add this to cmake: -DCUBIN_RELEASE_PIP
To compile for JIT release package, add this to cmake: -DJIT_RELEASE_WHL
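
To illustrate the difference between the two package types, here is a minimal sketch of how the arch lists above could map to nvcc `-gencode` flags. The arch numbers come from the description above; the helper names are hypothetical, and whether the build scripts in this PR emit exactly these flags is an assumption. In nvcc, `code=sm_XX` embeds precompiled machine code (cubin), while `code=compute_XX` embeds PTX that the driver JIT-compiles on first use.

```python
# Arch lists copied from the PR description; everything else is illustrative.
CUBIN_PIP_ARCHS = [60, 70, 75, 80]
JIT_WHL_ARCHS = [35, 50, 60, 70, 75, 80]


def cubin_gencode_flags(archs):
    """Flags that embed precompiled cubins, one per sm_XX target."""
    return [f"-gencode arch=compute_{a},code=sm_{a}" for a in archs]


def jit_gencode_flags(archs):
    """Flags that embed only PTX, compiled by the driver at load time."""
    return [f"-gencode arch=compute_{a},code=compute_{a}" for a in archs]


cubin_flags = cubin_gencode_flags(CUBIN_PIP_ARCHS)
jit_flags = jit_gencode_flags(JIT_WHL_ARCHS)
```

The PTX-only variant covers more GPU generations with less binary code per arch, at the cost of the JIT compilation time mentioned above.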

2f3b393d

Put code style check on gpu_ci (#35309) · d9f59fd1
Committed by tianshuo78520a on Aug 31, 2021
```
* notest;test=cpu

* test

* test=document_fix
```
d9f59fd1
update infer trt ut. (#35261) · 96e7d903
Committed by Wilber on Aug 31, 2021

96e7d903
add trt error information. (#35277) · a2afcace
Committed by wenbin on Aug 31, 2021
```
* add trt error information.

* rerun ci
```
a2afcace
fix CI skip cc test error (#35264) · 3d76d003
Committed by wuhuanzhou on Aug 31, 2021
```
* fix CI skip cc test error, test=develop

* remove test code, test=develop
```
3d76d003
Add AsExtra() for conditional_block_op.h (#35268) · 2100816c
Committed by Huihuang Zheng on Aug 31, 2021
```
As the title, see details at the PR description.
```
2100816c
fix windows batch file error:The system cannot find the batch label specified (#35288) · 2c0d667b
Committed by zhouweiwei2014 on Aug 31, 2021

2c0d667b
fix the pass compat check position error, test=develop (#35272) · 54f07019
Committed by 王明冬 on Aug 31, 2021

54f07019
support fuse layers for ptq (#35015) · ef536250
Committed by XGZhang on Aug 31, 2021

ef536250
NPU add elementwise_mod (#35245) · 561841d2
Committed by Aganlengzi on Aug 31, 2021

561841d2
NPU add fill_zeros_like kernel (#35246) · aaaa9965
Committed by Aganlengzi on Aug 31, 2021

aaaa9965
change exit code and polish infer_ut summary style (#35254) · 531a8909
Committed by Peihan on Aug 31, 2021
```
* change exit code and summary style

* disable test_ernie_text_cls on windows
```
531a8909

30 Aug, 2021 10 commits
- [Paddle Inference-TRT]Adding six op unittest codes of TRT INT8 (#35130) · 39565147
  Committed by xiaoxiaohehe001 on Aug 30, 2021
```
* add_op_unittest
```
  39565147
- [NPU] Add log_loss op (#35010) · b94d7ff3
  Committed by zhulei on Aug 30, 2021
```
* [NPU] Add log_loss op

* [NPU] Add log_loss op

* [NPU] Add log_loss op
```
  b94d7ff3
- fix using boost::none as the init value when using paddle::optional (#35215) · e864667b
  Committed by chentianyu03 on Aug 30, 2021
  
  e864667b
- \- candidate fix (#35231) · ca4d2fca
  Committed by Jacek Czaja on Aug 30, 2021
  
  ca4d2fca
- [Op Def] Add extra def of linear_interp & linear_interp_v2 & addmm (#35233) · 6ff179ab
  Committed by zhulei on Aug 30, 2021
```
* [Op Def] Add extra def of linear_interp & linear_interp_v2

* [Op Def] Add extra def of linear_interp & linear_interp_v2 & addmm
```
  6ff179ab
- [paddle-TRT]support matmul set to int8 in multihead (#34917) · 0043fa8c
  Committed by ceci3 on Aug 30, 2021
```
* update ernie int8
```
  0043fa8c
- del message;test=document_fix (#35248) · c0bdef5d
  Committed by tianshuo78520a on Aug 30, 2021
  
  c0bdef5d
- Set value (#34886) · 37d281c9
  Committed by xiongkun on Aug 30, 2021
```
* tmp

* Tile - Assign - Crop

* Finish the set value npu kernel and test case in npu

* improve the error message

* Modify according to zhangliujie

* code review
```
  37d281c9
- Add cpu/gpu for PR-CI-CPU-Py2 (#35174) · 8f94d349
  Committed by tianshuo78520a on Aug 30, 2021
```
* notest;test=cpu_gpu

* notest;test=cpu_gpu

* notest;test=cpu_gpu

* notest;test=cpu_gpu

* notest;test=cpu_gpu

* notest;test=cpu_gpu

* notest;test=cpu_gpu

* fix

* fix
```
  8f94d349
- Abstract GenerateDeviceEventFlag to shield platforms (#35219) · 20cfa8ba
  Committed by Aurelius84 on Aug 30, 2021
```
* Abstract GenerateDeviceEventFlag to shield platforms

* Remove get_cuda_flags
```
  20cfa8ba
29 Aug, 2021 1 commit
- test=document_fix (#35221) · 31cd1065
  Committed by Guoxia Wang on Aug 29, 2021
  
  31cd1065
27 Aug, 2021 6 commits
- test=document_fix (#35222) · 5dcff7c8
  Committed by Guoxia Wang on Aug 27, 2021
  
  5dcff7c8
- add uniform_ op and UT (#33934) · be29b8ee
  Committed by JYChen on Aug 27, 2021
  
  be29b8ee
- add more models for model_benchmark_ci,test=document_fix (#35178) · 5a72cf43
  Committed by xiegegege on Aug 27, 2021
  
  5a72cf43
- Add unpool2d op & Expose max_unpool2d API (#35056) · ceee71a0
  Committed by xiaoting on Aug 27, 2021
```
* add maxunpool2d op, test=develop

* fix typo, test=develop

* fix unpool unittest, test=develop

* fix unpool code-example, test=develop

* fix for unpool_op_unittest,test=develop

* fix example code, test=develop

* add noqa:F401, test=develop

* fix coverage, test=develop

* fix unittest for unpool, test=develop

* rename unpool2d to unpool, test=develop

* rename unpool2d to unpool, test=develop
```
  ceee71a0
- sparse_momentum_op is used to save w@GRAD memory for gather_op (#34942) · 234ce932
  Committed by Guoxia Wang on Aug 27, 2021
```
* sparse_momentum_op is used to save w@GRAD memory for gather_op when gather from a large parameter
```
  234ce932
- [hybrid] Fix row parallel linear bias (#35186) · 1533d7e2
  Committed by WangXi on Aug 27, 2021
  
  1533d7e2
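
For readers unfamiliar with the operation behind "Add unpool2d op & Expose max_unpool2d API (#35056)" above, the following is a small plain-Python sketch of max unpooling in general: each pooled value is written back to the position recorded when the maximum was taken, and every other position stays zero. It handles a single channel only, assumes the indices are flat offsets into the output plane, and uses the hypothetical name `max_unpool2d_single`; it is not the kernel added in that commit.

```python
def max_unpool2d_single(pooled, indices, out_h, out_w):
    """Scatter pooled values into a zero-filled out_h x out_w plane.

    pooled and indices are nested lists of the same shape; each index is a
    flat offset (row * out_w + col) into the output plane.
    """
    flat = [0.0] * (out_h * out_w)
    for value_row, index_row in zip(pooled, indices):
        for value, idx in zip(value_row, index_row):
            flat[idx] = value
    # Reshape the flat buffer back into rows.
    return [flat[r * out_w:(r + 1) * out_w] for r in range(out_h)]


# Usage: two pooled values restored into a 2x4 plane
restored = max_unpool2d_single([[5.0, 7.0]], [[1, 6]], out_h=2, out_w=4)
```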

Crayon鑫 / Paddle is in sync with the fork source project
