Commits · 856873482aecafae0cca727051fe512b1f0c0fd7 · BaiXuePrincess / Paddle

01 Jul, 2021 1 commit

[AMP] add get() and set() for Grad_scaler (#33835) · 85687348

Authored by zhangbo9674 on Jul 01, 2021

* add get and set for Grad_scaler

* refine some API name and comments

* refine API name and comments

* refine some comments

85687348

30 Jun, 2021 3 commits

Added matmul_v2 BF16/FP32 FWD kernel (#33750) · 24783c84

Authored by jakpiase on Jun 30, 2021

* added matmul_v2 bf16/fp32 FWD kernel

added matmul_v2 bf16/fp32 FWD kernel

* added formatting

* removed some tests due to timeout in CI

* refactored tests

* merged tests classes into one file

* minor change

* removed test guard for CUDA

* remove skipIf

* changes after review

* formatted one file

* minor change

* added skipping UT in CUDA place

24783c84

[Dy2Stat] Refine PartialProgramLayer logic (#33796) · 97f86d84

Authored by Aurelius84 on Jun 30, 2021

* refine temp_scope_vec logic

* polish partial_program

* fix fake var

* add stop_gradient in spec

* fix fake_var

* fix unittest

97f86d84

[NPU] support set_device (#33815) · 8225a6a1
Authored by houj04 on Jun 30, 2021
```
* support set_device for NPU.

* minor update doc and add more unit test.
```
8225a6a1

29 Jun, 2021 4 commits
- Add Returns for BeamSearchDecoder doc (#33721) · 07eeb36e
  Authored by liu zhengxi on Jun 29, 2021
```
* add returns for beamsearchdecoder doc, test=document_fix
```
  07eeb36e
- Remove HeterBox (#33718) · 66c7a076
  Authored by Thunderbrook on Jun 29, 2021
```
* remove heterbox

* remove heterbox
```
  66c7a076
- polish avx/no_avx install (#33818) · 5c514f5e
  Authored by Zhou Wei on Jun 29, 2021
  
  5c514f5e
- xpu support amp (#33809) · 4d4fb660
  Authored by taixiurong on Jun 29, 2021
  
  4d4fb660
28 Jun, 2021 3 commits
- fix undef var (#33780) · 83284c8c
  Authored by Jiangxinz on Jun 28, 2021
  
  83284c8c
- [ROCM] fix RNN miopen as weight needs to be permuted, test=develop (#33733) · 6024488d
  Authored by Qi Li on Jun 28, 2021
```
* [ROCM] fix RNN miopen as weight needs to be permuted, test=develop

* [ROCM] fix data share when is_test, test=develop

* update, test=develop
```
  6024488d
- [Paddle-TRT]Fix flatten converter when batch_size > 1 (#33768) · d91352c0
  Authored by Pei Yang on Jun 28, 2021
```
* fix trt flatten converter when batch_size > 1

* change ut to same dynamic shape
```
  d91352c0
25 Jun, 2021 2 commits
- [ pass_enhance ]quant_conv2d_dequant_fuse_pass (#33737) · bd68761a
  Authored by Wangzheee on Jun 25, 2021
  
  bd68761a
- static support mp_layers (#33700) · 91a0acdb
  Authored by WangXi on Jun 25, 2021
  
  91a0acdb
24 Jun, 2021 8 commits
- [NPU] support dygraph execution on npu place(#33579) · 6aea6be2
  Authored by houj04 on Jun 24, 2021
```
* in NPU environment, use CPUPlace for missing operators.

* in NPU environment, use CPUPlace for missing operators.

* fix TensorCopy bug and add unit test.

* fix code style.

* add more unit tests.
```
  6aea6be2
- [oneDNN] Fix to #33282 , added support of X input broadcasting to oneDNN elementwise ops (#33549) · 049dd853
  Authored by Jacek Czaja on Jun 24, 2021
```
* - fix to #33282

* - Increased threshold for elementwise_mul_bf16 grad

* -disabled faulty UT

* - fix to approval
```
  049dd853
- [Dy2Stat]Support Python3 type hint (#33745) · c7797802
  Authored by Aurelius84 on Jun 24, 2021
```
* support type hint

* fix unittest
```
  c7797802
- TestSaveLoadLargeParameters use cpu place. (#33756) · 1def9e05
  Authored by WeiXin on Jun 24, 2021
```
* TestSaveLoadLargeParameters use cpu place.

* edit unittest
```
  1def9e05
- fix undef var (#33692) · 68c1fe8c
  Authored by Jiangxinz on Jun 24, 2021
  
  68c1fe8c
- fix undef var (#33691) · 49638f25
  Authored by Jiangxinz on Jun 24, 2021
  
  49638f25
- fix a quantization bug (#33753) · 57352bc7
  Authored by XGZhang on Jun 24, 2021
  
  57352bc7
- supplement several interfaces of static Variable to be consistent with dygraph Tensor (#33330) · af9dcb2d
  Authored by CtfGo on Jun 24, 2021
```
As the title
```
  af9dcb2d
23 Jun, 2021 4 commits
- Added split op bf16/fp32 oneDNN kernel (#33584) · 68106509
  Authored by jakpiase on Jun 23, 2021
```
* base changes for split op

* 90% of split functionality added

* full fp32 functionality

* added bf16 test

* added submemory caching

* added bf test to static mode whitelist

* minor change

* enabled split op for inference

* minor fix

* minor fix
```
  68106509
- elastic unittest (#33728) · 9b58cbf1
  Authored by kuizhiqing on Jun 23, 2021
```
* elastic unittest

* rename demo
```
  9b58cbf1
- repair npu matmul_grad and comm_init_hccl (#33719) · 9bf00cd5
  Authored by Baibaifan on Jun 23, 2021
  
  9bf00cd5
- Add new operation: BroadcastTensorsOp (#33294) · affddfaa
  Authored by Zhanlue Yang on Jun 23, 2021
  
  affddfaa
22 Jun, 2021 7 commits

[API/OP]Add a new API paddle.diagonal (#33586) · ad106290

Authored by zhangbo9674 on Jun 22, 2021

* new api diagonal, test=develop

* add new api diagonal, test=develop

* new api diagonal, test=develop

* add new api paddle.diagonal, test=develop

* use framework::stride replace ComputeDimStride

* replace cudaMalloc/cudaMemcpy by TensorFromVector in cudaKernel and cudaGradKernel

* improve function: when attr(offset) exceeds attr(axis1) or attr(axis2), set the diagonal dim to 0

* fix RP-Mac-CI bug: replace framework::stride() by ComputeDimStride.

* polish code-block

* polish code of python API diagonal

* api supports dtype of float16 and bool

* api supports dtype of float16 and bool

* modify unittest code

* modify unittest code

* polish dtype description

* polish code-block

ad106290

adaptive for py3 for ps util;test=develop (#33727) · e5a6bb1d
Authored by danleifeng on Jun 22, 2021

e5a6bb1d

Fix the save path problem of UT test_pass_builder. (#33717) · 8a5bbae6
Authored by Zhen Wang on Jun 22, 2021

8a5bbae6

Gpu samplecode test On PR-CPU-Py2 (#33634) · dd4297cd

Authored by Ren Wei (任卫) on Jun 22, 2021

* using argparse to handle selections

* 2 TODOs

* do not change the pipeline configuration for now; force the GPU version here

* sorted the all_names

* exec gpu sample codes tests incrementally

* get all apis from the pr.spec file

* condition with WITH_GPU

WITH_GPU == ON

save

* delete the useless codes

* delete the useless codes.

test=document_fix

* echo the diff result

test=document_fix

* don't reuse the variables

* renaming fun to _func does not work; put it into the skiplist

https://github.com/PaddlePaddle/Paddle/commit/038ffc795025170e8cda74bcd473b46301b9a1c0
test=document_fix

* skip it in check api approvals

test=document_fix

save

* skip the private _variables

* signatures are printed wrong; now rename it to _func

test=document_fix

dd4297cd

transform complex scale to tensor (#33699) · 5db0c84b
Authored by chentianyu03 on Jun 22, 2021
```
* transform complex scale to tensor

* add test_case for complex scalar

* modify import paddle
```
5db0c84b

solve ANSI escape sequences print error in cmd and powershell (#33689) · 18284261
Authored by jiangcheng on Jun 22, 2021

18284261

Dygraph post training quantization (#33445) · 2b6fc108
Authored by cc on Jun 22, 2021
```
* dygraph post training quantization

* refine the ptq config

* refine ptq quantizer
```
2b6fc108

21 Jun, 2021 8 commits

Add AXPY oneDNN handler (#33632) · 773aabc7

Authored by lidanqing on Jun 21, 2021

* Add oneDNN AXPY handler.

* Add fallback for small tensors.

* Fix ifdefs

* Remove unnecessary namespace prefixes and add missing headers.

* Guard handler_axpy with proper ifdefs.

* Compilation of this function is possible only when Paddle is built
with neither CUDA nor HIP.

* Move AXPY handler code to separate files.

* Use oneDNN AXPY handler in SGD op.

* Use axpy handler only when Paddle is built with oneDNN.

* Add test for SUM BF16 with big rows.

* Fix SFINAE rules for elementwise_add_to.

* Add test case for SGD with big rows.

* update

* update

Co-authored-by: Adam Osewski <adam.osewski@intel.com>

773aabc7

add sync calc stream and add ut for fuse on gpu (#33580) · e0e0c0fa
Authored by Yuang Liu on Jun 21, 2021

e0e0c0fa

update fp16 gray_list for tensor parallel (#33660) · 1681a2dd
Authored by WangXi on Jun 21, 2021

1681a2dd

[NPU] optimize mul op, use BatchMatMul to realize (#33616) · f91dfe15

Authored by pangyoki on Jun 21, 2021

* use BatchMatMul

* replace TensorCopy with ShareDataWith

* remove check fp16 grad

* fix format

* add grad_check

* fix grad check

f91dfe15

Combine amp and qat (#33484) · f88af205
Authored by cc on Jun 21, 2021
```
* Combine amp and qat
* add unit test
```
f88af205

Del six.PY code2 (#33607) · 0f7187af
Authored by tianshuo78520a on Jun 21, 2021
```
* del py2 code2

* fix test timeout
```
0f7187af

fix undef val (#33562) · 4b9430a1
Authored by Jiangxinz on Jun 21, 2021

4b9430a1

[NPU] flatten params and grads, fuse grad_clip and optimizer op (#33461) · c269a160

Authored by Leo Chen on Jun 21, 2021

* enable npu alignment

* support flatten_params/grads

* support clip by global norm

* remove memset in coalesce_tensor_op

* fix npu kernel of sum op when input is one tensor

* add ut for flatten_param_grads+regularizer

* fix ut

* fix typo

c269a160

BaiXuePrincess / Paddle is in sync with the fork source project
