Commits · 01063218f273931f6856777b7aa797109fedbbae · PaddlePaddle / Paddle

18 Sep, 2021 5 commits

split cuda_profiler into .h and .cc (#35821) · 01063218
Committed by Aurelius84 on Sep 18, 2021
```
* split cuda_profiler into .h and .cc

* fix cmake

* remove inline
```
01063218

trt support serialize and deserialize (#35828) · ba71421c
Committed by Wilber on Sep 18, 2021

ba71421c
Clean ParseMemInfo and Fix unittest failed under multi-thread (#35840) · 2fff5a58
Committed by Aurelius84 on Sep 18, 2021
```
* Clean ParseMemInfo and fix unittest with multi-thread

* fix declare
```
2fff5a58

[oneDNN] Disable caching of Reorder operation (#35664) · e4c2a854

Committed by Jacek Czaja on Sep 18, 2021

* - Reorder disabling caching

* - compilation fix

* - another compilation fix

* - another compilation fix

* - compilation fix

* - Fix

* - yet another compilation fix

* - surprisingly another compilation fix

* - lint

* - fix after review

* - fix

e4c2a854

Add new API "eigvals" in linalg (#35720) · d411a038

Committed by From00 on Sep 18, 2021

* Add linalg.eigvals API

* pre-commit check

* Adjust code style

* Fix conflict

* Improve code style

* Modify the test code to ignore testing CUDA kernel

* Sort output data before checking in test code

* Set timeout value for UT

* Improve API example code to pass CI

* Fix bug for None fetch_list in Windows

* Delete grad Op

d411a038

17 Sep, 2021 29 commits

[AMP] Support pure fp16 training mode for dygraph (#35521) · adaeee4d

Committed by zhangbo9674 on Sep 17, 2021

* add pure fp16 major function in auto_cast & tracer

* support master weight in dygraph for pure fp16

* check mix dtype of fp16&fp32 for check_finite_and_unscale op

* change pure fp16 function name

* refine some bug in auto_cast

* refine auto_cast interface logic

* add param _casted_by_pure_fp16 for class Layer

* support state_dict hook for save model by user appointed dtype in pure_fp16_decorator

* refine pure_fp16_decorator as decorator

* add unittest

* add comment

* add comment

* support recompute

* add comment for auto_cast and decorator

* support to_static_state_dict for paddle.jit.save

* remove the limit on the number of models and optimizers

* add lookup_table in black_list

* fix momentum and layer state_dict

* fix bug in layer state_dict

* fix bug in layer state_dict_helper

* refine unittest

* refine test_momentun_op

* refine interface and some code

* refine amp_decorator interface

* refine pure fp16 interface

* refine master weight interface

adaeee4d

temporally disable the warnings (#35560) · 68ae6345
Committed by Leo Chen on Sep 17, 2021
```
* temporally disable the warnings

* disable ut
```
68ae6345

change to PADDLE_DEFINE_EXPORTED (#35841) · d22914fd
Committed by Zeng Jinle on Sep 17, 2021

d22914fd

fix unittest (#35808) · fcfb0afe
Committed by Guoxia Wang on Sep 17, 2021

fcfb0afe

Disabled oneDNN reshape1/2 and squeeze1/2 kernels (#35781) · 0eaab803

Committed by jakpiase on Sep 17, 2021

* disabled matmul_v2 grad

* Revert "disabled matmul_v2 grad"

This reverts commit b569bcef162116ca9f7963f3975b4a412f9e8555.

* reverted disabling matmul_v2, disabled reshape and squeeze

0eaab803

Make flag adding easier (#35823) · 2c781455

Committed by Zeng Jinle on Sep 17, 2021

* make flag setter easier

* update

* rename macro name

* fix bug of public/writable

* update to pass CI

* polish

* fix CPU link error

2c781455

Add linalg pinv api (#35804) · 71e01d3f

Committed by andyjpaddle on Sep 17, 2021

* add pinv api, test=develop
* add linalg pinv api, test=develop
* update example code, test=develop

71e01d3f

broadcast qkv_op (#35780) · cf9eae4c
Committed by feng_shuai on Sep 17, 2021
```
* broadcast qkv_op

* use PADDLE_ENFORCE_GT to replace assert
```
cf9eae4c

add a fusion op: fused_layernorm_residual_dropout_bias (#35151) · 7975dfcf

Committed by zhangkaihuo on Sep 17, 2021

Fuses elementwise_add, dropout, elementwise_add, and layer_norm into one operator; only the forward pass is supported.
No Python API changed.
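
As a rough illustration of what the fused operator combines, here is an unfused reference for a single row in plain Python. It is only a sketch: the function name, the argument layout, the explicit `keep_mask`, and the upscale-in-train dropout scaling are assumptions made for the example, not the operator's actual interface or kernel.

```python
import math


def fused_ln_residual_dropout_bias(x, bias, residual, keep_mask, dropout_prob,
                                   gamma, beta, eps=1e-5):
    """Unfused reference for one row: layer_norm(residual + dropout(x + bias)).

    Hypothetical helper for illustration; not Paddle's API.
    """
    # Assumed upscale-in-train dropout: kept values are divided by (1 - p).
    scale = 0.0 if dropout_prob >= 1.0 else 1.0 / (1.0 - dropout_prob)
    # First elementwise_add: add the bias.
    biased = [xi + bi for xi, bi in zip(x, bias)]
    # Dropout: zero the masked elements and rescale the kept ones.
    dropped = [v * scale if keep else 0.0 for v, keep in zip(biased, keep_mask)]
    # Second elementwise_add: add the residual.
    summed = [d + r for d, r in zip(dropped, residual)]
    # layer_norm over the row.
    mean = sum(summed) / len(summed)
    var = sum((v - mean) ** 2 for v in summed) / len(summed)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std * g + b for v, g, b in zip(summed, gamma, beta)]


row = fused_ln_residual_dropout_bias(
    x=[1.0, 2.0, 3.0, 4.0],
    bias=[0.5, 0.5, 0.5, 0.5],
    residual=[0.0, 1.0, 0.0, 1.0],
    keep_mask=[True, True, False, True],
    dropout_prob=0.25,
    gamma=[1.0] * 4,
    beta=[0.0] * 4,
)
print(row)
```

A fused kernel would compute the same result in one pass instead of materializing each intermediate list.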

7975dfcf

Support EMA in Paddle2.x and Fleet (#35673) · fb4d5689

Committed by Haohongxiang on Sep 17, 2021

* Support EMA in Paddle2.x and Fleet

* update

* update

* update

* modify ut of ema

* modify docs

* modify bugs

* update

* update

* update

* modify ut

fb4d5689


test=document_fix (#35824) · 177bf52f
Committed by Guoxia Wang on Sep 17, 2021

177bf52f

add inplace op support to prune, scale_op is no longer need in jit.save (#35730) · 21921936

Committed by Haipeng Wang on Sep 17, 2021

* adding scale_op in the model save step is unnecessary; instead, fix the prune method to support static graph and inplace ops

* fix jit.save: scale_op no longer needs to be added to each output var.
fix prune_with_input so that it supports inplace ops

* temporarily disable test_trt_dynamic_shape.TRTDynamicShapeOutOfBound2Test

21921936

Intergrate MultiThreadedWorkQueue to execute program ops (#35356) · a0871194

Committed by Aurelius84 on Sep 17, 2021

* format code

* format interface

* polish interface

* Remove std::memory_order

* modify into SpinLock

* remove fetch_context_pool_

* fix comment

* modify into WorkQueueGroup

* refine code

* fix pointer

* fix paddle_enforce

* split into AsyncWorkQueue

* polish code

* specify std::memory_relax

* fix atomic fetch_sub

* fix num_thread

a0871194


[inference]add hard_swish dynamic plugin (#35214) · c59c8e4f
Committed by 津 on Sep 17, 2021

c59c8e4f

remove cuda sync in ext_tensor copy_to (#35802) · d43f797a
Committed by Chen Weihang on Sep 17, 2021

d43f797a

Fix segment api document. (#35818) · 6d5fc220
Committed by Zhong Hui on Sep 17, 2021

6d5fc220

Add skip teller (#35807) · 0f74e5e7

Committed by xiaoxiaohehe001 on Sep 17, 2021

* add_skip_layernorm

* add_skip_layernorm

* add_skip_layernorm

* add_skip_layernorm

* add_skip_layernorm

* add_skip_layernorm

* add_skiplayernorm_teller

* add_skip_layernorm

* add_skip_layernorm_teller

* add_skip_layernorm_teller

* add_skip_layernorm

* add_skip_teller

0f74e5e7

expose cuda stream to users (#35813) · 40cfa512
Committed by Leo Chen on Sep 17, 2021
```
* expose cuda stream to users

* add ut
```
40cfa512
[inference]add reduce converter test (#35145) · 05275010
Committed by 津 on Sep 17, 2021
```
* add test

* add test

* add test
```
05275010
leaky_relu test (#35318) · 867f4fa0
Committed by 津 on Sep 17, 2021
```
* add test

* add test

* add test

* add test

* add test
```
867f4fa0

polish code. (#35783) · 61010bb8
Committed by WeiXin on Sep 17, 2021

61010bb8
fix unpool doc, test=document_fix (#35806) · 652e655f
Committed by xiaoting on Sep 17, 2021
```
* fix unpool doc, test=document_fix

* fix typo for python example, test=document_fix
```
652e655f
Add New CI -- GPUBOX (#35755) · 00865930
Committed by Fan Zhang on Sep 17, 2021
```
* Add New CI - GPUBOX
```
00865930
fix the memory leak for the static.auc · 0fd09fdf
Committed by wawltor on Sep 17, 2021
```
fix the memory leak for the static.auc 
```
0fd09fdf

update acc func using topk v2 (#35789) · 94d2cf82
Committed by Jiaqi Liu on Sep 17, 2021

94d2cf82

Enhance the equal API: input Y supports int, float, bool, or tensor types (#35695) · 9b2d53fc

Committed by yeliang2258 on Sep 17, 2021

* update equal op, input Y can be float,int,bool or tensor

* update test

* update code style

* update code style

* update doc

* update str check

* remove str

* add type check

9b2d53fc


refine matrix_rank op code and doc (#35722) · 28fffef6
Committed by 0x45f on Sep 17, 2021

28fffef6
add launch doc (#35634) · 5548061b
Committed by Guoxia Wang on Sep 17, 2021
```
* add launch doc
```
5548061b

GeneratePass for Python Pass (#35708) · f6db9806

Committed by wuhuanzhou on Sep 17, 2021

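#### Background

#35602 provides a way to develop subgraph-replacement passes on the Python side:

- Use the Paddle Python API or helper types to define the subgraph programs used to match and replace graphs;
- When a pass is registered on the Python side, the registered function is ultimately converted into PassDesc data defined in protobuf, which the C++ side parses to complete registration of the pass instance.

This PR parses the PassDesc rule description and generates the pass instance from it.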

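#### Design

##### Pass rule verification

In earlier pass development, operator iteration could cause a pass to stop matching or to match incorrectly. This can be detected by scanning the parameter settings and parameter types an operator supports, to decide whether the pass should be applied or to report that the pass code needs to be updated.

Pass development currently offers the operator-compatibility pass OpCompatSensiblePass to address this. It still has a shortcoming: earlier passes could only obtain their pattern information at runtime, so the check can only happen while the pass is executing.

A pass represented by PassDesc can be checked for these problems before it executes. This check is done in VerifyDesc.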

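##### Constructing the pattern from the match subgraph

GeneratePass uses GraphPatternDetector for graph matching and replacement. Constructing the match pattern amounts to adding PDNodes and edge relations to the PDPattern member of the corresponding object. This is done in the function `InitGeneratePattern`. The function is not a member method of GeneratePass, mainly because a new Detector may be developed later; the operations of GeneratePass and the Detector are not coupled.

The pattern is initialized by traversing all operators of the match-subgraph program:

1. Add the PDNode for the current operator together with its constraints (operator type, attribute constraints, and so on);
2. Traverse the inputs of the current operator and try to get the corresponding PDNode from the pattern:
   - A PDNode is found in the pattern and it is an output node: it is an intermediate node of the match subgraph, so mark that PDNode as an intermediate node;
   - No PDNode is found in the pattern: add the input PDNode and mark it as an input node;
   - Set the edge from the input to the operator;
3. Traverse the outputs of the current operator:
   - A PDNode is found in the pattern and it is an input node: it is an intermediate node of the match subgraph, so mark that PDNode as an intermediate node;
   - No PDNode is found in the pattern: add the output PDNode and mark it as an output node;
   - Set the edge from the operator to the output;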
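
The traversal above can be sketched in plain Python on toy data structures. This is a minimal sketch, not the C++ implementation: `Pattern`, `init_pattern`, the role strings, and the `(op_type, inputs, outputs)` tuple layout are all hypothetical names introduced for illustration, and attribute constraints are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Pattern:
    """Toy stand-in for a PDPattern (hypothetical, for illustration only)."""
    op_nodes: list = field(default_factory=list)
    var_roles: dict = field(default_factory=dict)  # var name -> role string
    edges: list = field(default_factory=list)      # (src, dst) pairs


def init_pattern(ops):
    """Build a pattern from ops given as (op_type, inputs, outputs) tuples."""
    pattern = Pattern()
    for index, (op_type, inputs, outputs) in enumerate(ops):
        # Step 1: add a node for the operator itself.
        op_node = f"{op_type}_{index}"
        pattern.op_nodes.append(op_node)
        # Step 2: inputs. A var already seen as an output becomes intermediate.
        for name in inputs:
            role = pattern.var_roles.get(name)
            if role is None:
                pattern.var_roles[name] = "input"
            elif role == "output":
                pattern.var_roles[name] = "intermediate"
            pattern.edges.append((name, op_node))
        # Step 3: outputs. A var already seen as an input becomes intermediate.
        for name in outputs:
            role = pattern.var_roles.get(name)
            if role is None:
                pattern.var_roles[name] = "output"
            elif role == "input":
                pattern.var_roles[name] = "intermediate"
            pattern.edges.append((op_node, name))
    return pattern


ops = [
    ("matmul", ["x", "w"], ["tmp"]),
    ("elementwise_add", ["tmp", "b"], ["out"]),
]
pattern = init_pattern(ops)
print(pattern.var_roles)
```

Here `tmp` is first recorded as an output of `matmul` and is then reclassified as intermediate when it shows up as an input of `elementwise_add`.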

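##### Rewriting the graph from the replacement subgraph

The replacement is performed in the function `GetGenerateRewrite`, which, like `InitGeneratePattern`, is not a member method of GeneratePass.

The replacement operation is generated as follows:

1. Check for a redundant replacement subgraph;
2. Traverse all operators of the replacement-subgraph program and add the replacement Nodes:
   1. Add the Node for the current operator and set its attributes;
   2. Traverse the inputs of the current operator and add intermediate variable nodes;
   3. Traverse the outputs of the current operator and add intermediate variable nodes;
   4. Add the edges between the input/output nodes and the operator node;
3. Delete the Nodes in the matched graph that are intermediate nodes;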
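
A toy version of steps 2 and 3 might look like the following. It is a sketch under stated assumptions: the graph is modelled as plain sets of names and edges, `rewrite` and its parameters are hypothetical, the matched operator nodes are assumed to be removed along with the intermediate variables, and step 1 is left out because the description does not spell out the redundancy check.

```python
def rewrite(graph_nodes, graph_edges, match_ops, match_intermediates, replace_ops):
    """Replace a matched subgraph in a toy graph (hypothetical helper).

    graph_nodes: set of node names; graph_edges: set of (src, dst) pairs.
    match_ops / match_intermediates: matched operator and intermediate var names.
    replace_ops: list of (op_name, inputs, outputs) for the replacement subgraph.
    """
    nodes, edges = set(graph_nodes), set(graph_edges)
    # Step 2: add replacement operator nodes, their variables, and the edges.
    for op_name, inputs, outputs in replace_ops:
        nodes.add(op_name)
        for var in inputs:
            nodes.add(var)  # no-op when the variable already exists in the graph
            edges.add((var, op_name))
        for var in outputs:
            nodes.add(var)
            edges.add((op_name, var))
    # Step 3: drop matched operators and intermediate variables (assumption:
    # matched operators count as intermediate nodes of the match).
    removed = set(match_ops) | set(match_intermediates)
    nodes -= removed
    edges = {(s, d) for s, d in edges if s not in removed and d not in removed}
    return nodes, edges


graph_nodes = {"x", "w", "b", "tmp", "out", "matmul_0", "elementwise_add_1"}
graph_edges = {
    ("x", "matmul_0"), ("w", "matmul_0"), ("matmul_0", "tmp"),
    ("tmp", "elementwise_add_1"), ("b", "elementwise_add_1"),
    ("elementwise_add_1", "out"),
}
new_nodes, new_edges = rewrite(
    graph_nodes, graph_edges,
    match_ops=["matmul_0", "elementwise_add_1"],
    match_intermediates=["tmp"],
    replace_ops=[("fc_0", ["x", "w", "b"], ["out"])],
)
print(sorted(new_nodes))
print(sorted(new_edges))
```

The inputs and outputs of the match (`x`, `w`, `b`, `out`) survive the rewrite and are reconnected to the single replacement operator.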

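##### Verifying the optimized subgraph

Whether the replacement subgraph, or the computation graph after replacement, can run correctly can be verified while the pass executes, which prevents exceptions when the computation graph is executed later.

Pass execution currently modifies the computation graph directly, so the graph cannot be restored cleanly when verification fails. For now, subgraph verification defaults to success; this is left for later improvement.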

f6db9806

16 Sep, 2021 6 commits
- fix bug in pscore (#35698) · e64fed86
  Committed by wangguanqun on Sep 16, 2021
```
* add trainer desc config to distributed strategy

* code style modified

* data_feed set lod

* fix bug

* code style

* fix bug
```
  e64fed86
- [hybrid] Fix mp multi gradient clip prob (#35713) · a4eadd15
  Committed by Yuang Liu on Sep 16, 2021
  
  a4eadd15
- Add segment apis to paddle.incubate (#35759) · 4b683887
  Committed by Zhong Hui on Sep 16, 2021
  
  4b683887
- fix group_size = floor(C/groups) from ceil(C/groups); add groupnorm group divisible check (#35644) · f218330e
  Committed by zhiboniu on Sep 16, 2021
  
  f218330e
- [NPU] add index_select_grad kernel and unit tests (#35594) · 67a094b5
  Committed by Aganlengzi on Sep 16, 2021
```
* [NPU] add index_select_grad kernel and unit tests

* dim=0 not need transpose
```
  67a094b5
- fix dataloader exit terminate error (#34501) · e93c18a3
  Committed by Kaipeng Deng on Sep 16, 2021
```
* fix DataLoader exit with SIGABRT/SIGSEGV. test=develop
```
  e93c18a3
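
For the group_norm change in #35644 listed above, the difference between floor and ceil only shows up when the channel count C is not a multiple of groups, which is exactly the case the added divisibility check rejects. The helper below is a hypothetical illustration of that idea, not Paddle's code:

```python
import math


def group_size(channels, groups):
    """Channels per group; rejects channel counts not divisible by groups."""
    if groups <= 0 or channels % groups != 0:
        raise ValueError(
            f"channels ({channels}) must be divisible by groups ({groups})"
        )
    # Floor division is exact once divisibility has been checked.
    return channels // groups


# Without the check, floor and ceil disagree for C=10, groups=4.
print(10 // 4, math.ceil(10 / 4))
print(group_size(32, 8))
```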

PaddlePaddle / Paddle synced successfully more than 1 year ago
