Commits · beb920cd332217c542021999b88881959053853c · PaddlePaddle / Paddle

26 Oct, 2021 1 commit

[cherry-pick] Support CPU Parallel in DataParallel Interface by GLOO to speed... · beb920cd

Committed by xiongkun on Oct 26, 2021

[cherry-pick] Support CPU Parallel in DataParallel Interface by GLOO to speed up training (#35745) (#36605)

* User specified backend (#35745)

* remove tensordot

beb920cd

25 Oct, 2021 8 commits

[cherry-pick 2.2] static model parallel dropout support deterministic RandomSeedGenerator (#36682) · 59615fff

Committed by WangXi on Oct 25, 2021

* Revert "Add fused_dropout wrapper to ease use. (#36185) (#36640)"

This reverts commit 05d7e2fd.

* [hybrid] seed and dropout op support force-cpu (#35820)

* [HIP] fix op not support AMD GPU bug, the flag PADDLE_WITH_ROCM is invalid

* [HIP] fix op not support AMD GPU bug, the flag PADDLE_WITH_ROCM is invalid

* [HIP] fix op not support AMD GPU bug

* [hybrid] seed and dropout op support force-cpu

* [hybrid] seed and dropout op support force-cpu

* [hybrid] seed and dropout op support force-cpu

* [hybrid] seed and dropout op support force-cpu

* [hybrid] seed and dropout op support force-cpu

* [hybrid] fix seed ci failed issue

* add AsExtra for force_cpu of seed op

* Add fused_dropout wrapper to ease use. (#36185)

* [hybrid] static model parallel dropout support deterministic RandomSeedGenerator (#36228)
Co-authored-by: xiayanming <41795079@qq.com>
Co-authored-by: Li Min <11663212+limin2021@users.noreply.github.com>

59615fff

[cherry-pick] enable trt test check and fix trt ut error (#36549) (#36655) · a540769b
Committed by Wilber on Oct 25, 2021

a540769b

[cherry-pick] enable trt test check and fix trt ut error (#36371) (#36654) · 0951bfd1
Committed by Wilber on Oct 25, 2021

0951bfd1

Add pool2d test convert (#36338) (#36663) · 7612bf1c
Committed by JingZhuangzhuang on Oct 24, 2021

7612bf1c

Cherrypick (#36666) · a9b7d1d2
Committed by wenbin on Oct 25, 2021

a9b7d1d2

Add nn.functional.sparse_attention and some test cases, test=develop (#35757) (#36551) · c57d1e91

Committed by Liu-xiandong on Oct 25, 2021

Add paddle.nn.functional.sparse_attention API

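This PR mainly wraps the sparse_attention functionality in a Python-level API. For the main body of the OP code, see #PR35676.

In addition, unit tests were added for the wrapped Python interface.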

c57d1e91

[Cherry Pick]Add fp16 kernel for clip_op (#36577) (#36672) · bd40dd9a
Committed by zhangbo9674 on Oct 25, 2021
```
Add fp16 kernel for clip_op.
```
bd40dd9a
[cherry-pick] Add new API 'tensordot' (#36273) (#36454) · 2bfee7d3
Committed by From00 on Oct 25, 2021
```
* Add new API tensordot
cherry-pick #36273
```
2bfee7d3
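
For readers unfamiliar with the operation, here is a minimal pure-Python sketch of what a tensordot contraction computes for two 2-D inputs. The helper name `tensordot_2d` is hypothetical and this is not Paddle's implementation; it only illustrates that contracting one axis gives a matrix product and contracting two axes gives a scalar.

```python
def tensordot_2d(x, y, axes):
    """Contract the last `axes` dims of 2-D x with the first `axes` dims of 2-D y."""
    if axes == 1:
        # One shared axis: ordinary matrix multiplication.
        return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
                 for j in range(len(y[0]))]
                for i in range(len(x))]
    if axes == 2:
        # Both axes shared: sum of elementwise products, a scalar.
        return sum(x[i][j] * y[i][j]
                   for i in range(len(x)) for j in range(len(x[0])))
    raise ValueError("this sketch handles axes=1 or axes=2 only")


x = [[1, 2], [3, 4]]
y = [[5, 6], [7, 8]]
print(tensordot_2d(x, y, axes=1))
print(tensordot_2d(x, y, axes=2))
```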

24 Oct, 2021 1 commit

Add viterbi decode (#35778) (#36615) · 1906c746

Committed by Jack Zhou on Oct 24, 2021

* add viterbi decode cpu kernel

* add viterbi decoder api in paddle.text

* add a data buffer once to avoid create many small pieces of data buffer frequently

* fix viterbi max_seq_length bug

* fix seq_len=1 bug

* fix device context

* move split out of for loop

* remove INVERSE_SUB

* remove 2 GET_CAST_MASK

* remove 1 loop

* remove Functor

* add to_static deploy code

* use MAX_FUNC instead of ELE_MAX

* add MaxFunctor

* impl max_func

* remove MaxFunctor

* remove cast op

* use REGISTER_OP_WITHOUT_GRADIENT

* add viterbi cuda kernel

* add FIX_BLOCKDIM_CASE macro

* add MKL add, mul; add get data mask

* add arange mkl impl

* add CPU Argmax

* add cpu gather

* use EXECUTE_MKL_ELEMENT_BINARY_OP instead of some ADD, MUL

* use SameDimsBinaryOP instead of EXECUTE_MKL_ELEMENT_BINARY_OP

* use SAME_DIMS_ELEMENT_BINARY_OP

* add SimpleBroadcastBinaryOP

* use int instead of int64_t to accelerate

* optimize SimpleBroadcastBinaryOP

* optimize SimpleBroadcastBinaryOP

* optimize performance in both single thread and multithread situation

* remove useless line

* remove useless code

* add CREATE_TENSOR_BUFFER macro

* add INIT_REQUIRED_TENSOR macro

* add comment

* fix windows ci

* add viterbi unittest

* remove cuda add functor

* remove cuda equal

* remove a template function

* fix windows ci

* fix windows dtype

* remove some template instance

* remove useless header file

* remove some blockdim

* remove transpose impl

* accelerate cpu performance on single thread situation

* viterbi_decode->crf_decode

* rename crf params name

* add viterbi api test

* remove useless import

* add enable_static

* use viterbi decoder

* fix viterbi len=1

* fix  viterbi unittest

* remove useless comments

* reconstruct viterbi decode

* remove ADD,SUB,MUL structure

* fix coverage

* remove CREATE_TENSOR

* add name args

* crf.py->ops.py; with_start_stop_tag->include_start_end_tag

* update crf_decode en docs

* fix viterbi decode en docs

* fix some review comments

* add FIXED_BLOCK_DIM_CASE in cuda

* push_back->emplace_back

* crf_decode->viterbi_decode; include_start_end_tag->include_bos_eos_tag

* paddle.text.ops.viterbi_decode->paddle.text.viterbi_decode

* fix viterbi_decode en docs

1906c746
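
The commit above adds CPU and CUDA kernels for Viterbi decoding. As a reference for what such a kernel computes, here is a minimal pure-Python sketch for a single sequence. The function name and argument layout are assumptions for illustration only; the sketch ignores batching, sequence-length masks, and the `include_bos_eos_tag` option mentioned in the log.

```python
def viterbi_decode(emissions, transitions):
    """Return (best_score, best_path) for one sequence.

    emissions: T rows of N per-tag scores.
    transitions[i][j]: score of moving from tag i to tag j.
    """
    num_tags = len(transitions)
    scores = list(emissions[0])
    backpointers = []
    for step in emissions[1:]:
        new_scores, pointers = [], []
        for j in range(num_tags):
            # Best previous tag for landing on tag j at this step.
            best_prev = max(range(num_tags),
                            key=lambda i: scores[i] + transitions[i][j])
            pointers.append(best_prev)
            new_scores.append(scores[best_prev] + transitions[best_prev][j] + step[j])
        scores = new_scores
        backpointers.append(pointers)
    # Pick the best final tag, then walk the backpointers to recover the path.
    last = max(range(num_tags), key=lambda i: scores[i])
    best_score = scores[last]
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return best_score, path


transitions = [[0.5, -1.0], [-1.0, 0.5]]
emissions = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
print(viterbi_decode(emissions, transitions))
```

A sequence of length 1 has no transitions to score, so the loop body never runs and the result is simply the best-scoring tag of the single step.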

21 Oct, 2021 2 commits
- improve replicate pad error information (#36531) · a201a691
  Committed by littletomatodonkey on Oct 21, 2021
```
* fix replicate pad when input size is 0

* add unit test
```
  a201a691
- remove no_value using var.name (#36513) (#36565) · 6a20205d
  Committed by 0x45f on Oct 21, 2021
```
* remove no_value using var.name
```
  6a20205d
20 Oct, 2021 2 commits
- [cherry-pick] Inference add type check in copy_from_cpu (#36552) · b5404f09
  Committed by Wilber on Oct 20, 2021
  
  b5404f09
- catch the generatorfunction and intercept it. (#35369) (#36536) · 023eb3f9
  Committed by xiongkun on Oct 20, 2021
```
* catch the generatorfunction and intercept it.

* add test generator

* add test case

* refine the testcase
```
  023eb3f9
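
The log does not say what happens once a generator function is intercepted. As one plausible reading, here is a sketch in which a hypothetical converter detects generator functions with the standard library's `inspect.isgeneratorfunction` and returns them unchanged. The name `to_static_sketch` and the pass-through behavior are assumptions, not the project's code.

```python
import inspect


def to_static_sketch(fn):
    """Hypothetical converter: wraps plain functions, passes generators through."""
    if inspect.isgeneratorfunction(fn):
        # A generator body runs lazily across yields, so this sketch
        # leaves it untouched instead of trying to convert it.
        return fn

    def converted(*args, **kwargs):
        return fn(*args, **kwargs)

    converted.is_converted = True
    return converted


def plain(x):
    return x + 1


def gen(n):
    for i in range(n):
        yield i


print(to_static_sketch(gen) is gen)
print(getattr(to_static_sketch(plain), "is_converted", False))
```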
19 Oct, 2021 1 commit

[cherry-pick]Add sparse attention cherrypick (#36447) · 36edb0e1

Committed by Liu-xiandong on Oct 19, 2021

The code in this PR supports only CUDA 11.2. CI currently has no GPU with CUDA 11.2, so all tests are skipped automatically.

The new OP is paddle._C_ops.sparse_attention. The Python API will be handled in a follow-up PR.

This PR lacks tests on dynamic graphs and static graphs; they will be added in subsequent PRs.

36edb0e1

18 Oct, 2021 1 commit

[Cherry-pick][Dy2stat]fix no_grad context error in train mode when using... · 2b9d1922

Committed by 0x45f on Oct 18, 2021

[Cherry-pick][Dy2stat]fix no_grad context error in train mode when using save/load (#36434) (#36463)

Fix GPU memory growing continuously in train mode under a no_grad context after a model is loaded through the jit.save/load interface.

2b9d1922

15 Oct, 2021 1 commit

[cherry-pick]Verify the correctness of graph rewrited by GeneratePass (#36453) · cc449652

Committed by wuhuanzhou on Oct 15, 2021

* [WIP]Verify the correctness of graph rewrited by GeneratePass, test=develop

* add delete subgraph and unittest, test=develop

* check simple pass, test=develop

* fix coverage, test=develop

* limit with input_spec via Paddle API, test=develop

cc449652

13 Oct, 2021 1 commit
- fix for matmul_v2 6D x 2D (#36379) · ce6a27d9
  Committed by jakpiase on Oct 13, 2021
  
  ce6a27d9
12 Oct, 2021 1 commit
- Fix stop_gradient in RunProgramOp (#36339) (#36353) · a6868c91
  Committed by Aurelius84 on Oct 12, 2021
```
* Fix stop_gradient in RunProgramOp

* fix reference
```
  a6868c91
11 Oct, 2021 1 commit

[cherry-pick]fix hasattr(paddle.fluid.ir.PassDesc.OP, '__name__') error (#36294) · 45de9312

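Committed by wuhuanzhou on Oct 11, 2021

For arguments that do not meet the conditions of the overridden __getattr__, always raise AttributeError, so the behavior matches the non-overridden version.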

(cherry picked from PR #36229)

45de9312
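
Why the exception type matters: `hasattr` treats only `AttributeError` as "missing" and lets any other exception propagate. The sketch below uses a hypothetical class, not `paddle.fluid.ir.PassDesc.OP` itself, to show the pattern the commit describes.

```python
class StrictAttrs:
    """Hypothetical stand-in for an object with an overridden __getattr__."""

    _known = {"conv2d", "relu"}  # illustrative names only

    def __getattr__(self, name):
        # __getattr__ runs only when normal attribute lookup fails.
        if name in self._known:
            return f"<op {name}>"
        # Raising AttributeError (rather than, say, ValueError) keeps hasattr()
        # and getattr(obj, name, default) working as they do without an override.
        raise AttributeError(f"{type(self).__name__!r} has no attribute {name!r}")


ops = StrictAttrs()
print(hasattr(ops, "relu"))
print(hasattr(ops, "__name__"))
```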

30 Sep, 2021 3 commits
- add optest for adamw (#36148) (#36239) · 70e67843
  Committed by zhaoyingli on Sep 30, 2021
```
* update func name

* skip cpu

* update unittest

* update unittest
```
  70e67843
- Fix raw optim (#36176) (#36231) · 28d12007
  Committed by 李季 on Sep 30, 2021
```
* fix raw optim

* pre-commit test file
Co-authored-by: sneaxiy <sneaxiy@126.com>
```
  28d12007
- fix the undefined variable bug in dist_transformer file (#36211) (#36233) · 789012c0
  Committed by 李季 on Sep 30, 2021
  
  789012c0
29 Sep, 2021 2 commits
- add API paddle.linalg.eig (#35674) (#36188) · 4e2daa9a
  Committed by Lijunhui on Sep 29, 2021
```
Add the eig operator to PaddlePaddle's linear algebra library; it computes the eigendecomposition of a general square matrix.
Cherry-picked from #35674.
```
  4e2daa9a
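
"General square matrix" is the key phrase: unlike the symmetric case, a general real matrix can have complex eigenvalues. The following pure-Python sketch, with a hypothetical helper `eig_2x2` that is unrelated to the operator's actual kernel, solves the 2x2 characteristic polynomial to show this.

```python
import cmath


def eig_2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] as roots of x^2 - (a + d) x + (ad - bc)."""
    trace = a + d
    det = a * d - b * c
    # cmath.sqrt handles a negative discriminant, which yields a complex pair.
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2


# A 90-degree rotation has no real eigenvectors, so its eigenvalues are complex.
print(eig_2x2(0, -1, 1, 0))
# A diagonal matrix has its diagonal entries as eigenvalues.
print(eig_2x2(2, 0, 0, 3))
```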
- Add op paddle.device.cuda.get_device_name and paddle.device.cuda.get_device_capability. (#36172) · 96fd98bc
  Committed by hlygit66666 on Sep 29, 2021
```
* add get_device_name and get_device_capability

* fix docs

* fix docs

* fix decs
```
  96fd98bc
27 Sep, 2021 4 commits
- Add paddle.device.cuda.get_device_properties (#35875) · cea0bc26
  Committed by Yanxing Shi on Sep 27, 2021
```
* Initial Commit

* fix py2 error

* fix wrong words and doc

* test=document_fix

* fix _gpuDeviceProperties
```
  cea0bc26
- remove linalg api in paddle.__init__ (#36112) · a57f0810
  Committed by zhiboniu on Sep 27, 2021
```
remove recent linalg api in paddle.init;
add args 'name' in some new linalg api interface
```
  a57f0810
- Correct the misspelled part of the unit test (#36101) · 4bcff7b2
  Committed by LJQ❤️ on Sep 27, 2021
  
  4bcff7b2
- [Cherry-pick] Add new func/class API psroi_pool and UT (#36111) · 81557da6
  Committed by JYChen on Sep 27, 2021
```
cherry-pick from #35352

Add new detection api paddle.vision.ops.psroi_pool and paddle.vision.ops.PSRoIPool
```
  81557da6
26 Sep, 2021 4 commits
- [cherry-pick] Add Det and Slogdet API to Release 2.2 (#36083) · ba2a1bb4
  Committed by Huihuang Zheng on Sep 26, 2021
```
This PR added det and slogdet API to release/2.2
It is cherry-pick from #34992 and #36013
```
  ba2a1bb4
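
A sign-and-log-determinant function exists alongside a plain determinant because the determinant itself overflows easily, while its sign and the log of its absolute value stay representable. The pure-Python sketch below uses Gaussian elimination with partial pivoting to illustrate the idea. The function `slogdet` here is a hypothetical helper; the return format of Paddle's API is not described in this log.

```python
import math


def slogdet(matrix):
    """Return (sign, log|det|) of a square matrix via Gaussian elimination."""
    a = [list(row) for row in matrix]
    n = len(a)
    sign, logabs = 1.0, 0.0
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0:
            return 0.0, float("-inf")  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign  # each row swap flips the determinant's sign
        p = a[col][col]
        if p < 0:
            sign = -sign
        logabs += math.log(abs(p))
        for r in range(col + 1, n):
            factor = a[r][col] / p
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return sign, logabs


# det = (1e10)**50 overflows a float, but its logarithm is unremarkable.
big = [[1e10 if i == j else 0.0 for j in range(50)] for i in range(50)]
print(slogdet(big))
print(slogdet([[0, 1], [1, 0]]))
```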
- [Cherry-Pick]Add paddle.linalg.solve OP (#35715) (#36056) · 6b4f2fbf
  Committed by Weilong Wu on Sep 26, 2021
```
This PR adds linalg.solve to Paddle's linear algebra module. Call paddle.linalg.solve to use it.
```
  6b4f2fbf
- [NPU] add randperm_op_npu (#35763) (#36026) · df81915a
  Committed by ronnywang on Sep 26, 2021
```
* add randperm_op_npu

* fix test_set_value_op_npu
```
  df81915a
- [cherry pick]split minimize and add unscale_ for GradScaler (#35927) · e262125d
  Committed by zhangbo9674 on Sep 26, 2021
```
1. Split function GradScaler::minimize() into GradScaler::step() + GradScaler::update()
2. Add GradScaler::unscale_(optimizer)
```
  e262125d
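
Splitting `minimize()` into `step()` and `update()`, with a separate `unscale_()`, lets a caller inspect or clip true-valued gradients before the parameter update. The toy classes below are hypothetical and not Paddle's `GradScaler`; they only sketch how the three calls could divide the work. The scale-adjustment policy is simplified on purpose.

```python
import math


class ToyOptimizer:
    """Hypothetical SGD over dicts of plain floats (not a Paddle optimizer)."""

    def __init__(self, params, grads, lr=0.1):
        self.params, self.grads, self.lr = params, grads, lr

    def step(self):
        for name in self.params:
            self.params[name] -= self.lr * self.grads[name]


class ToyGradScaler:
    """Hypothetical loss scaler showing the unscale_ / step / update split."""

    def __init__(self, init_scale=1024.0, growth=2.0, backoff=0.5):
        self.scale_value = init_scale
        self.growth, self.backoff = growth, backoff
        self._unscaled = False
        self._found_inf = False

    def unscale_(self, optimizer):
        # Divide gradients by the loss scale in place and record any inf/nan.
        if self._unscaled:
            return
        for name, grad in optimizer.grads.items():
            grad /= self.scale_value
            optimizer.grads[name] = grad
            self._found_inf = self._found_inf or not math.isfinite(grad)
        self._unscaled = True

    def step(self, optimizer):
        # Unscale if the caller has not done so, then skip the update on inf/nan.
        self.unscale_(optimizer)
        if not self._found_inf:
            optimizer.step()

    def update(self):
        # Simplified policy: shrink the scale after a bad step, grow it otherwise.
        self.scale_value *= self.backoff if self._found_inf else self.growth
        self._unscaled = False
        self._found_inf = False


params = {"w": 1.0}
opt = ToyOptimizer(params, grads={"w": 2.0 * 1024.0})  # gradient of a scaled loss
scaler = ToyGradScaler()
scaler.unscale_(opt)                       # gradients are now true-valued
opt.grads["w"] = min(opt.grads["w"], 1.0)  # e.g. clip before the update
scaler.step(opt)
scaler.update()
print(params["w"], scaler.scale_value)
```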
24 Sep, 2021 2 commits
- Basic PR on Cost Model (#35774) (#35915) · efcd108d
  Committed by Huihuang Zheng on Sep 24, 2021
```
Add a basic Cost Model. It uses the executor to run a program and profiles it to get op time.

This is an early basic version, we will add more functions in the future.
```
  efcd108d
- add pool2d convert test (#35925) · 063fca8e
  Committed by JingZhuangzhuang on Sep 23, 2021
  
  063fca8e
23 Sep, 2021 3 commits
- [cherry-pick] FixEighOP; Unified MatrixEighFunctor function (#35812) (#35919) · 4629401e
  Committed by crystal on Sep 23, 2021
```
cherry-pick #35812, fix the Eigh OP
```
  4629401e
- add dilation check for conv (#35894) · 91f25ee3
  Committed by wangguanzhong on Sep 23, 2021
  
  91f25ee3
- op:transpose_op supports bool type (#35886) (#35926) · 95c100c1
  Committed by TeslaZhao on Sep 23, 2021
```
* Pass compat of conv_transpose_bias_mkldnn_fuse_pass

* Fix a bug of strided_slice op, about the axes parameter access memory out of bounds

* Fix a bug of transpose op, about accessing memory out of bounds of the perm param

* op:transpose_op supports bool type
```
  95c100c1
22 Sep, 2021 2 commits
- add hard_sigmoid trt converter test cases (#35908) · 6cc8b167
  Committed by baoachun on Sep 22, 2021
  
  6cc8b167
- [cherry-pick]increase test_imperative_auto_mixed_precision time PROPERTIES... · 17879369
  Committed by zhangbo9674 on Sep 22, 2021
```
 [cherry-pick]increase test_imperative_auto_mixed_precision time PROPERTIES TIMEOUT (#35863) (#35898)

Increase test_imperative_auto_mixed_precision PROPERTIES TIMEOUT from 120s to 300s.
```
  17879369

PaddlePaddle / Paddle synced successfully more than 1 year ago