Commits · bd40dd9aebba4d7c0ec85fd63a364df74df5109f · Crayon鑫 / Paddle

25 Oct, 2021 5 commits

[Cherry Pick]Add fp16 kernel for clip_op (#36577) (#36672) · bd40dd9a
Committed by zhangbo9674 on Oct 25, 2021
```
Add fp16 kernel for clip_op.
```
bd40dd9a
[Cherry Pick] refine comments for GradScaler state_dict (#36522) (#36671) · 304fb2b5
Committed by zhangbo9674 on Oct 25, 2021
```
Refine comments for GradScaler state_dict.
```
304fb2b5
[cherry-pick] Add new API 'tensordot' (#36273) (#36454) · 2bfee7d3
Committed by From00 on Oct 25, 2021
```
* Add new API tensordot
cherry-pick #36273
```
2bfee7d3

Add fused_attention_op: add impl wrappers. (#35903) (#36673) · 8c0bacd4

Committed by Li Min on Oct 25, 2021

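Purpose: this PR aims to improve the computational performance of the attention module.
To reduce the framework's per-op scheduling overhead, this PR implements the attention module by hand in C++ and exposes it as a single large attention op.
To reduce memory access overhead, this PR applies two optimizations:
(1) When computing q, k and v, the shared input X is used to reduce the gemm, transpose and bias add at that point from three calls to one.
(2) Kernel fusion is applied so that data is passed between the formerly separate CUDA kernels through registers.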

8c0bacd4
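The first memory-access optimization described for this commit (sharing the input X across the q, k and v projections) can be illustrated outside of Paddle. The sketch below is a pure-Python toy, not the fused_attention_op implementation: it assumes the three projection weights are concatenated column-wise, so one matrix multiply and one bias add produce q, k and v together. All function and variable names are hypothetical.

```python
# Toy illustration only (assumption: weights are concatenated column-wise).
# This is not the code of Paddle's fused_attention_op.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add_bias(m, bias):
    """Add a bias vector to every row of a matrix."""
    return [[v + b for v, b in zip(row, bias)] for row in m]

def separate_qkv(x, wq, wk, wv, bq, bk, bv):
    """Baseline: three GEMMs and three bias adds, one per projection."""
    return (add_bias(matmul(x, wq), bq),
            add_bias(matmul(x, wk), bk),
            add_bias(matmul(x, wv), bv))

def fused_qkv(x, wq, wk, wv, bq, bk, bv):
    """One GEMM and one bias add over the concatenated weights."""
    w = [rq + rk + rv for rq, rk, rv in zip(wq, wk, wv)]  # [d_in, 3 * d_out]
    b = bq + bk + bv
    d_out = len(bq)  # assumes q, k and v share the same output width
    out = add_bias(matmul(x, w), b)
    q = [row[:d_out] for row in out]
    k = [row[d_out:2 * d_out] for row in out]
    v = [row[2 * d_out:] for row in out]
    return q, k, v

x = [[1.0, 2.0], [3.0, 4.0]]
wq = [[1.0, 0.0], [0.0, 1.0]]
wk = [[2.0, 0.0], [0.0, 2.0]]
wv = [[0.0, 1.0], [1.0, 0.0]]
bq, bk, bv = [0.5, 0.5], [0.0, 1.0], [1.0, 0.0]

q, k, v = fused_qkv(x, wq, wk, wv, bq, bk, bv)
```

Both functions return the same q, k and v; the fused variant simply reads X once instead of three times, which is the source of the memory-access saving.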

Add fused_dropout wrapper to ease use. (#36185) (#36640) · 05d7e2fd

Committed by Li Min on Oct 25, 2021

In fused_attention op and fused_ffn op, the fused bias_add+dropout+residual+layernorm kernel or bias_add+dropout+residual kernel is used. To ease the use of this kernel, we provide a wrapper in this PR.
1. To reuse the increment-computing code, we extract it into the "GetSeedDataAndIncrement" routine in dropout_impl_util.h.
2. The fused_dropout_helper.h provides the fused dropout kernel wrapper.

Note: the test of this wrapper will be provided in the following fused_attention_op and fused_ffn PRs.

05d7e2fd

24 Oct, 2021 1 commit

Add viterbi decode (#35778) (#36615) · 1906c746

Committed by Jack Zhou on Oct 24, 2021

* add viterbi decode cpu kernel

* add viterbi decoder api in paddle.text

* add a data buffer once to avoid create many small pieces of data buffer frequently

* fix viterbi max_seq_length bug

* fix seq_len=1 bug

* fix device context

* move split out of for loop

* remove INVERSE_SUB

* remove 2 GET_CAST_MASK

* remove 1 loop

* remove Functor

* add to_static deploy code

* use MAX_FUNC instead of ELE_MAX

* add MaxFunctor

* impl max_func

* remove MaxFunctor

* remove cast op

* use REGISTER_OP_WITHOUT_GRADIENT

* add viterbi cuda kernel

* add FIX_BLOCKDIM_CASE macro

* add MKL add, mul; add get data mask

* add arange mkl impl

* add CPU Argmax

* add cpu gather

* use EXECUTE_MKL_ELEMENT_BINARY_OP instead of some ADD, MUL

* use SameDimsBinaryOP instead of EXECUTE_MKL_ELEMENT_BINARY_OP

* use SAME_DIMS_ELEMENT_BINARY_OP

* add SimpleBroadcastBinaryOP

* use int instead of int64_t to accelerate

* optimize SimpleBroadcastBinaryOP

* optimize SimpleBroadcastBinaryOP

* optimize performance in both single thread and multithread situation

* remove useless line

* remove useless code

* add CREATE_TENSOR_BUFFER macro

* add INIT_REQUIRED_TENSOR macro

* add comment

* fix windows ci

* add viterbi unittest

* remove cuda add functor

* remove cuda equal

* remove a template function

* fix windows ci

* fix windows dtype

* remove some template instance

* remove useless header file

* remove some blockdim

* remove transpose impl

* accelerate cpu performance on single thread situation

* viterbi_decode->crf_decode

* rename crf params name

* add viterbi api test

* remove useless import

* add enable_static

* use viterbi decoder

* fix viterbi len=1

* fix  viterbi unittest

* remove useless comments

* reconstruct viterbi decode

* remove ADD,SUB,MUL structure

* fix coverage

* remove CREATE_TENSOR

* add name args

* crf.py->ops.py; with_start_stop_tag->include_start_end_tag

* update crf_decode en docs

* fix viterbi decode en docs

* fix some review comments

* add FIXED_BLOCK_DIM_CASE in cuda

* push_back->emplace_back

* crf_decode->viterbi_decode; include_start_end_tag->include_bos_eos_tag

* paddle.text.ops.viterbi_decode->paddle.text.viterbi_decode

* fix viterbi_decode en docs

1906c746
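For readers unfamiliar with the algorithm behind this commit, here is a minimal pure-Python sketch of Viterbi decoding for a single sequence. It is an illustration under simplifying assumptions (no batching, no length mask, no BOS/EOS tags) and does not reproduce the signature or the kernels of paddle.text.viterbi_decode; the function below is hypothetical.

```python
def viterbi_decode(emissions, transitions):
    """Return (best_score, best_path) for one sequence.

    emissions[t][j]   : score of tag j at step t
    transitions[i][j] : score of moving from tag i to tag j
    """
    num_tags = len(transitions)
    scores = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        step_scores, step_back = [], []
        for j in range(num_tags):
            # Best previous tag for landing on tag j at this step.
            best_prev = max(range(num_tags),
                            key=lambda i: scores[i] + transitions[i][j])
            step_back.append(best_prev)
            step_scores.append(scores[best_prev] + transitions[best_prev][j] + emit[j])
        scores = step_scores
        backpointers.append(step_back)
    # Trace the best path backwards from the best final tag.
    last = max(range(num_tags), key=lambda j: scores[j])
    path = [last]
    for step_back in reversed(backpointers):
        path.append(step_back[path[-1]])
    path.reverse()
    return scores[last], path

emissions = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
transitions = [[0.0, -0.25], [-0.25, 0.0]]
best_score, best_path = viterbi_decode(emissions, transitions)
```

With a single time step the loop body never runs and the result is just the argmax of the first emission row, which is the kind of edge case the "fix seq_len=1 bug" entry above refers to.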

22 Oct, 2021 1 commit

Fix a bug in ReadData, ReadDataBc and ReadDataReduce when NX != 1 (#36373) (#36616) · 6840cf55

Committed by niuliling123 on Oct 22, 2021

* Fix a bug in ReadData, ReadDataBc and ReadDataReduce when NX != 1
* Update the implementation of reduceAnyKernel according to kernel primitive api

6840cf55

21 Oct, 2021 3 commits
- [Cherry-pick] Add functor_primitives.h for kernel primitive api (#36418) · 30909889
  Committed by niuliling123 on Oct 21, 2021
```
* Add functor_primitives.h for kernel primitive api
```
  30909889
- improve replicate pad error information (#36531) · a201a691
  Committed by littletomatodonkey on Oct 21, 2021
```
* fix replicate pad when input size is 0

* add unit test
```
  a201a691
- remove no_value using var.name (#36513) (#36565) · 6a20205d
  Committed by 0x45f on Oct 21, 2021
```
* remove no_value using var.name
```
  6a20205d
20 Oct, 2021 2 commits
- [cherry-pick] Inference add type check in copy_from_cpu (#36552) · b5404f09
  Committed by Wilber on Oct 20, 2021
  
  b5404f09
- catch the generatorfunction and intercept it. (#35369) (#36536) · 023eb3f9
  Committed by xiongkun on Oct 20, 2021
```
* catch the generatorfunction and intercept it.

* add test generator

* add test case

* refine the testcase
```
  023eb3f9
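The commit message does not say how the generator function is caught, so the following is only a sketch of one plausible approach, assuming the standard-library check `inspect.isgeneratorfunction` and a fallback that leaves generator functions untouched. The names `convert_or_skip` and `mark_converted` are hypothetical and are not Paddle APIs.

```python
import inspect
import warnings

def convert_or_skip(fn, transform):
    """Apply `transform` to fn unless fn is a generator function."""
    if inspect.isgeneratorfunction(fn):
        # Intercept: warn and hand back the original function unchanged.
        warnings.warn(f"{fn.__name__} is a generator function; leaving it unconverted")
        return fn
    return transform(fn)

def mark_converted(fn):
    """Hypothetical transform: wraps fn and tags its result."""
    def wrapper(*args, **kwargs):
        return ("converted", fn(*args, **kwargs))
    return wrapper

def double(x):
    return 2 * x

def count_up(n):
    for i in range(n):
        yield i

static_double = convert_or_skip(double, mark_converted)
static_count = convert_or_skip(count_up, mark_converted)  # emits a warning
```

Here `static_double` is the wrapped function, while `static_count` is the original generator function, so iterating over it behaves exactly as before.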
19 Oct, 2021 4 commits

[cherry-pick]Add sparse attention cherrypick (#36447) · 36edb0e1

Committed by Liu-xiandong on Oct 19, 2021

The code in this PR supports only CUDA 11.2. CI currently has no GPU with CUDA 11.2, so all tests are skipped automatically.

The new op is paddle._C_ops.sparse_attention. The Python API will be added in a follow-up PR.

This PR lacks tests for dynamic graphs and static graphs; they will be added in subsequent PRs.

36edb0e1

cherry-pick 36424 inference support bert when exists matmul_v2 (#36500) · d974dbd1
Committed by Wilber on Oct 19, 2021

d974dbd1
quant support matmul_v2 (#36469) (#36499) · b8167ed2
Committed by ceci3 on Oct 19, 2021
```
* quant support matmul_v2

* fix format
```
b8167ed2

Add operators for async read & async write (#36333) (#36501) · d65f8af8

Committed by Siming Dai on Oct 19, 2021

* fix async_read bug

* change index place to cpu

* add tensor size judge

* add async_read & async_write test

* fix bug in async_write

* fix mac py3 ci

* fix bug for cpu version paddle

* fix windows ci bug

* change input argument error type

* change const_cast to mutable_data

* add async_write out-of-bound check and improve error hint

* fix a small bug for dst_tensor

* add docs and refine codes

* refine docs

* notest,test=windows_ci

* fix windows ci

* fix require

* fix code-block

* add core.is_compiled_with_cuda()

d65f8af8

18 Oct, 2021 1 commit

[Cherry-pick][Dy2stat]fix no_grad context error in train mode when using... · 2b9d1922

Committed by 0x45f on Oct 18, 2021

[Cherry-pick][Dy2stat]fix no_grad context error in train mode when using save/load (#36434) (#36463)

Fixes the issue where GPU memory keeps growing in train mode under a no_grad context after a model is loaded through the jit.save/load interfaces.

2b9d1922

15 Oct, 2021 2 commits

[cherry-pick]Verify the correctness of graph rewrited by GeneratePass (#36453) · cc449652

Committed by wuhuanzhou on Oct 15, 2021

* [WIP]Verify the correctness of graph rewrited by GeneratePass, test=develop

* add delete subgraph and unittest, test=develop

* check simple pass, test=develop

* fix coverage, test=develop

* limit with input_spec via Paddle API, test=develop

cc449652

[cherry-pick] add sparse_embedding doc (#36312) · fc429fea
Committed by Yanxing Shi on Oct 15, 2021
```
* add sparse_embedding doc

* modify sample code

* fix sample code error
```
fc429fea

14 Oct, 2021 1 commit
- fix windows bug that python virtual env can't find python executable (#36227) (#36370) · 976f0146
  Committed by zhouweiwei2014 on Oct 14, 2021
```
ATT, cherry-pick #36227
```
  976f0146
13 Oct, 2021 3 commits
- delete remove_static_file() function in error.py (#36153) (#36375) · a5767bb6
  Committed by 0x45f on Oct 13, 2021
```
* change time to remove static tempfile

* delete remove_static_file() function
```
  a5767bb6
- [cherrypick] change paddle.mm api to matmul v2 op (#36374) · 7a66160d
  Committed by wawltor on Oct 13, 2021
```
* change the paddle.mm to matmul_v2

* update the code for the mm

* update the document for the mm
```
  7a66160d
- fix for matmul_v2 6D x 2D (#36379) · ce6a27d9
  Committed by jakpiase on Oct 13, 2021
  
  ce6a27d9
12 Oct, 2021 2 commits
- Fix stop_gradient in RunProgramOp (#36339) (#36353) · a6868c91
  Committed by Aurelius84 on Oct 12, 2021
```
* Fix stop_gradient in RunProgramOp

* fix reference
```
  a6868c91
- fix yolo precision issue(#36365) · 10eebfa0
  Committed by wenbin on Oct 12, 2021
  
  10eebfa0
11 Oct, 2021 3 commits
- dlpack fix (#35817) (#36177) · 31a5829a
  Committed by Siming Dai on Oct 11, 2021
  
  31a5829a
- [cherry-pick]C++ support register pass via PassDesc (#36302) · 21c65f66
  Committed by wuhuanzhou on Oct 11, 2021
```
(cherry picked from PR #36095)

Main purpose of the PR: support registering a GeneratePass from C++, which simplifies developing subgraph optimizations such as fusion.
```
  21c65f66
- [cherry-pick]fix hasattr(paddle.fluid.ir.PassDesc.OP, '__name__') error (#36294) · 45de9312
  Committed by wuhuanzhou on Oct 11, 2021
```
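For arguments that do not meet the conditions after __getattr__ is overloaded, always raise AttributeError, so that the behavior matches the non-overloaded version.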

(cherry picked from PR #36229)
```
  45de9312
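The reason this fix matters is that Python's built-in `hasattr` only swallows `AttributeError`; any other exception raised from an overloaded `__getattr__` propagates to the caller. The sketch below demonstrates that behavior with two hypothetical stand-in classes. It is not the PassDesc.OP implementation, and the class and helper names are assumptions made for illustration.

```python
class StrictProxy:
    """Hypothetical object whose __getattr__ follows the AttributeError contract."""
    _known = {"matmul", "relu"}

    def __getattr__(self, name):
        # __getattr__ only runs after normal attribute lookup has failed.
        if name in self._known:
            return f"<op {name}>"
        raise AttributeError(f"{type(self).__name__} has no attribute {name!r}")

class LeakyProxy:
    """Hypothetical object that raises the wrong exception type."""

    def __getattr__(self, name):
        raise ValueError(f"unsupported op: {name}")

def probe(obj, name):
    """Call hasattr and report what happened instead of crashing."""
    try:
        return hasattr(obj, name)
    except ValueError as exc:
        return f"hasattr raised {type(exc).__name__}"

strict_known = probe(StrictProxy(), "relu")
strict_missing = probe(StrictProxy(), "__name__")
leaky_missing = probe(LeakyProxy(), "__name__")
```

With `StrictProxy`, `hasattr` answers `True` or `False` as callers expect; with `LeakyProxy`, the same call fails with the leaked `ValueError`.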
30 Sep, 2021 6 commits
- add optest for adamw (#36148) (#36239) · 70e67843
  Committed by zhaoyingli on Sep 30, 2021
```
* update func name

* skip cpu

* update unittest

* update unittest
```
  70e67843
- Fix raw optim (#36176) (#36231) · 28d12007
  Committed by 李季 on Sep 30, 2021
```
* fix raw optim

* pre-commit test file
Co-authored-by: Nsneaxiy <sneaxiy@126.com>
```
  28d12007
- fix the undefined variable bug in dist_transformer file (#36211) (#36233) · 789012c0
  Committed by 李季 on Sep 30, 2021
  
  789012c0
- fix bug of reduce_sum when src_dtype != dst_dtype and reduce_num == 1 (#36123) (#36193) · e8efba57
  Committed by Guoxia Wang on Sep 30, 2021
  
  e8efba57
- support fp16 (#35888) (#36191) · 87cc8d48
  Committed by Guoxia Wang on Sep 30, 2021
  
  87cc8d48
- [cherry-pick] add roi align (#36207) · dcd17d6b
  Committed by Feng Ni on Sep 30, 2021
```
add roi align, cherry-pick #35102
```
  dcd17d6b
29 Sep, 2021 4 commits
- add API paddle.linalg.eig (#35674) (#36188) · 4e2daa9a
  Committed by Lijunhui on Sep 29, 2021
```
Adds the eig operator to PaddlePaddle's linear algebra library; the operator computes the eigendecomposition of a general square matrix.
Cherry-picked from #35674.
```
  4e2daa9a
- Add op paddle.device.cuda.get_device_name and paddle.device.cuda.get_device_capability. (#36172) · 96fd98bc
  Committed by hlygit66666 on Sep 29, 2021
```
* add get_device_name and get_device_capability

* fix docs

* fix docs

* fix decs
```
  96fd98bc
- [cherry-pick] fix paddle.device.cuda.get_device_properties doc (#36174) · dd14f7f0
  Committed by Yanxing Shi on Sep 29, 2021
```
* test=document_fix

* test=document_fix

* test=document_fix

* test=document_fix
```
  dd14f7f0
- Add roi pool (#35084) (#36154) · b0289de5
  Committed by Wenyu on Sep 29, 2021
```
* add roi pool

* rename input as x
```
  b0289de5
28 Sep, 2021 2 commits

[cherry-pick] update multi_dot exposure rules (#36018) (#36131) · 632a0064

Committed by zhangkaihuo on Sep 28, 2021

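Update how multi_dot is exposed so that it follows the API exposure rules of the linear algebra library:
1. Implement it in python/paddle/tensor/linalg.py
2. Import it in python/paddle/linalg.py and add it to the __all__ list
3. Import it in python/paddle/tensor/init.py and add it to the tensor_method_func list
4. Remove the import from python/paddle/init.py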

632a0064

[cherry-pick] [ROCM] bugfix for bilinear_interp_v2_grad (#36160) #36161 · c576169b
Committed by ronnywang on Sep 28, 2021
```
ATT, cherry-pick #36160
```
c576169b

Crayon鑫 / Paddle is in sync with the project it was forked from
