提交 · e57a88b373633200cc82473a8444aae0f5e0688d · PaddlePaddle / Paddle

02 9月, 2021 8 次提交

Z
[NPU] Add label_smooth_op (#34828) · e57a88b3
由 zhulei 提交于 9月 02, 2021
```
* [NPU] Add label_smooth_op

* [NPU] Add label_smooth_op
```
e57a88b3
Y

[hybrid] [npu] fit npu nan/inf check (#35171) · 67ed7e12
由 Yuang Liu 提交于 9月 02, 2021

67ed7e12
L

Refactor transpose cuda kernel impl. (#35308) · 6e638d72
由 Li Min 提交于 9月 02, 2021

6e638d72
Z

add npu code not exec linux/windows cases (#35363) · 8525dd13
由 zhangchunle 提交于 9月 02, 2021

8525dd13
W

fix static error in summary (#35303) · b28cc734
由 wangna11BD 提交于 9月 02, 2021

b28cc734
W
add axis check for elementwise op while the dimension of x is equal to the... · 25871e0e
由 wangxinxin08 提交于 9月 02, 2021
```
add axis check for elementwise op while the dimension of x is equal to the dimension of tensor (#35340)
```
25871e0e

[Auto Parallel] Logical Partition & Dist Op (#35117) · a622b701

由 JZ-LIANG 提交于 9月 02, 2021

* support shard reader

* support shard reader

* add parallel mode

* update process mesh

* add method to compute comm_group

* implement dist_embedding forward func

* implement dist matmul forward func

* implement dist reshape forward func

* add transpiler framework

* add transpiler forward

* implement transpiler forward

* implement transpiler backward & update

* add process

* add unitest

* chmod

* chmod

* chmod

* update unitest

* add unitest for gpt

* remove unused print

* rename transpiler --> partitioner

* rename transpiler --> partitioner

* chmod

* chmod

* bug fixed

* remove amp function

* update case for dp mode

* update case for dp mode

a622b701

B

[npu] add update_loss_scaling npu min value (#35270) · 280d7421
由 Baibaifan 提交于 9月 02, 2021

280d7421

01 9月, 2021 22 次提交
- L
  
  add AsExtra for grid_sampler_op (#35339) · df57df94
  由 LielinJiang 提交于 9月 01, 2021
  
  df57df94
- J
  Added slice BF16/FP32 FWD/BWD kernels (#34332) · 070cab11
  由 jakpiase 提交于 9月 01, 2021
```
* aded slice FWD FP32

* added tests for slice FWD FP32

* added slice bwd

* added bf16 tests

* CI fix

* CI fix

* added reason to skip_if

* minor change

* temporary fix for failing test

* temporary fix

* changes after review

* CI rerun
```
  070cab11
- T
  [HeterPs] merge dense && data norm && g2sum (#35029) · a647b80a
  由 Thunderbrook 提交于 9月 01, 2021
```
* merge dense

* log level

* tensor copy sync

* format
```
  a647b80a
- S
  [HybridParallel]Support finetinue model for PipelineParallel (#35287) · 264ff9ef
  由 ShenLiang 提交于 9月 01, 2021
```
* add cache for send_recv

* add eval_batch for pipeline

* add eval batch for pipelineparallel

* add style code
```
  264ff9ef
- Y
  
  [NPU] set constant before copy data (#35335) · bee511d5
  由 Yuang Liu 提交于 9月 01, 2021
  
  bee511d5
- W
  modify fetch logic, use D2H Stream (#35191) · c56d6978
  由 wanghuancoder 提交于 9月 01, 2021
```
* modify fetch logic, use D2H Stream, test=develop

* refine, test=develop
```
  c56d6978
- B
  add strided_slice_grad op for npu (#35204) · 7743cdf2
  由 baoachun 提交于 9月 01, 2021
```
* add strided_slice_grad op for npu
```
  7743cdf2
- L
  support setting linewidth when printing tensor (#35175) · 5fa7d9ce
  由 Leo Chen 提交于 9月 01, 2021
```
* support setting linewith when printing tensor

* fix ut

* refine code

* update comments

* use small precision since windows/linux has different ramdom value

* fix typo

* adjust parameter order for consistency
```
  5fa7d9ce
- L
  add input and output description docs for vision transform (#34926) · 4f54891c
  由 LielinJiang 提交于 9月 01, 2021
```
* add input and output docs for vision transform
```
  4f54891c
- Q
  
  [NPU] skip NPU UT if no npu files changed, test=develop (#35338) · 5eefc8c7
  由 Qi Li 提交于 9月 01, 2021
  
  5eefc8c7
- J
  
  bugfix for mp accuracy (#35326) · 7f17f9a0
  由 JZ-LIANG 提交于 9月 01, 2021
  
  7f17f9a0
- 0
  [Dy2stat]modify dy2stat error message in compile time (#35320) · b24f84c8
  由 0x45f 提交于 9月 01, 2021
```
* modify dy2stat error message in compile time

* fix variable name
```
  b24f84c8
- W
  fix bug:When axes in paddle.slice is a tuple, an error occurs. (#35267) · b53887fd
  由 WeiXin 提交于 9月 01, 2021
```
* fix bug:When axes in paddle.sile is a tuple, an error occurs.

* polish code.
```
  b53887fd
- W
  Stablize depthwise conv (#35161) · 3c21f26b
  由 wangguanzhong 提交于 9月 01, 2021
```
* stablize depthwise conv

* clean commend
```
  3c21f26b
- Q
  support KL label smooth (#35177) · 7ca28bb6
  由 QingshuChen 提交于 9月 01, 2021
```
* support KL label smooth

* update UT for KL label_smooth
```
  7ca28bb6
- C
  
  add support ops for quantization (#35312) · 5baccfdd
  由 cc 提交于 9月 01, 2021
  
  5baccfdd
- R
  
  [NPU]shard index op for npu (#35281) · 5c27c2c0
  由 Roc 提交于 9月 01, 2021
  
  5c27c2c0
- N
  add ElementwiseTernary, Reduce, ReadDataStride (#35075) · 12df57fb
  由 niuliling123 提交于 9月 01, 2021
```
* add ElementwiseTernary, Reduce, ReadDataStride
```
  12df57fb
- C
  
  label prelu op (#35315) · d9afa839
  由 cc 提交于 9月 01, 2021
  
  d9afa839
- A
  [Dy2Stat]Support append method and initialized value for List in ControlFlow (#35212) · 3b52f68e
  由 Aurelius84 提交于 9月 01, 2021
```
* Support append method and initialized value for List in ControlFlow

* polish error msg and en doc

* fix code style
```
  3b52f68e
- Z
  Support settiem by Bool index (#35133) · d387820d
  由 zyfncg 提交于 9月 01, 2021
```
* Support getitem by Bool index

* delete some debug info of bool index

* support the case that the shape of bool index is different from indexed tensor

* support setitem by bool index

* add the unittest for throwing exception

* merge conflict

* add check for int tensor when index is bool
```
  d387820d
- Z
  
  reverse xpu adamw to the combination of ops version. (#35286) · 884011a4
  由 zhaoyingli 提交于 9月 01, 2021
  
  884011a4
31 8月, 2021 10 次提交

Support CostInfo and MemProfiler in InterpreterCore (#34981) · 572bad8a

由 Aurelius84 提交于 8月 31, 2021

* polish code

* fix unittest on windows

* refine pybind interface

* support statistic MemSize of AllocatorPool

* Replace mutex into atomic

572bad8a

transformer opt python files (#35206) · e2991555

由 Feng Xing 提交于 8月 31, 2021

This PR adds fused transformer python related files. It defines interface of fused transformer.

Fused transformer implements an optimized version of transformer layer (in python/paddle/nn/layer/transformer.py). In this PR, four layers (functions) are defined:
(1) FusedMultiHeadAttention: multi-head attention layer
(2) FusedFeedForward: feed forward layer
(3) FusedTransformerEncoderLayer: transformer encoder layer
(4) FusedTransformer: transformer layer

e2991555

A
[Dy2Stat]Add model ResNet50 for Dy2stat AMP training (#35276) · 079c585c
由 Aurelius84 提交于 8月 31, 2021
```
* Add model for ResNet50 for Dy2stat AMP training

* fix timeout

* fix dataloader
```
079c585c
Q
[NPU] fix cmake for ascend ci, test=develop (#35255) · f6004ab9
由 Qi Li 提交于 8月 31, 2021
```
* [NPU] fix cmake for ascend ci, test=develop

* update paddle_build.sh scripts, test=allcase
```
f6004ab9
S
Revert "Revert "Add copy from tensor (#34406)" (#35173)" (#35256) · 6116f9af
由 Shang Zhizhou 提交于 8月 31, 2021
```
* Revert "Revert "Add copy from tensor (#34406)" (#35173)"

This reverts commit 32c1ec42.

* add template instantiation
```
6116f9af
fix bug that cmake find python (#35304) · 00c9aeb0
由 zhouweiwei2014 提交于 8月 31, 2021

00c9aeb0

New whl release strategy with pruned nv_fatbin (#35239) · 2f3b393d

由 Zhanlue Yang 提交于 8月 31, 2021

[Background]
Expansion in code size can be irreversible in the long run, leading to huge release packages which
not only hampers user experience but also exceeds a hard limit of pypi.

In such, NV_FATBIN section takes up 86% of the compiled dylib size, owing to the vast number of GPU
arches supported.

This PR aims to prune this NV_FATBIN.

[Solution]
In the new release strategy, two types of whl packages will be involved:

Cubin PIP package:
PIP package maintains a smaller window for GPU arches support, containing
sm_60, sm_70, sm_75, sm_80 cubins, covering Pascal - Ampere arches

JIT release package:
This is a backup for Cubin PIP package, containing compute_35, compute_50, compute_60,
compute_70, compute_75, compute_80, with best performance and GPU arches coverage.

However, it takes around 10 min to install due to the JIT compilation.

[How to use]
The new release strategy is disabled by default.
To compile for Cubin PIP package, add this to cmake: -DCUBIN_RELEASE_PIP
To compile for JIT release package, add this to cmake: -DJIT_RELEASE_WHL

2f3b393d

T
Put code style check on gpu_ci (#35309) · d9f59fd1
由 tianshuo78520a 提交于 8月 31, 2021
```
* notest;test=cpu

* test

* test=document_fix
```
d9f59fd1
W

update infer trt ut. (#35261) · 96e7d903
由 Wilber 提交于 8月 31, 2021

96e7d903
W
add trt error information. (#35277) · a2afcace
由 wenbin 提交于 8月 31, 2021
```
* add trt error information.

* rerun ci
```
a2afcace

PaddlePaddle / Paddle 大约 1 年 前同步成功

PaddlePaddle / Paddle
大约 1 年前同步成功