Commits · c6b6ba1fc45344d8f8d0201b63d530ab2e8a703c · BaiXuePrincess / Paddle

06 Jul, 2021 3 commits

Add gpu implementation of shuffle_batch_op (#33938) · c6b6ba1f

Committed by Zeng Jinle on Jul 06, 2021

* add gpu implementation of shuffle batch
test=develop

* add thrust cuda patches
test=develop

* fix macro guard

* fix shuffle batch compile on windows/hip

* fix hip compilation error

* refine CMakeLists.txt

* fix windows compile error

* try to fix windows CI compilation error

* fix windows compilation again

* fix shuffle_batch op test on Windows

c6b6ba1f

Enhance error message for interpolate_v2 (#33941) · f2068eec

Committed by xiaoting on Jul 06, 2021

* fix interpolate for shape[i]=0, test=develop

* fix test_trilinear_interp_v2 random failure, test=develop

f2068eec

【HETERPS】pipeline adaptive for heterps (#33159) · bfef7feb

Committed by danleifeng on Jul 06, 2021

* pipeline adaptive for heterps;test=develop
* fix finalize hang;test=develop
* add is_compiled_with_heterps for dataset;test=develop
* fix hashtable core when pass ins_num=0;test=develop

bfef7feb

05 Jul, 2021 5 commits
- Add fused elemwise gelu and optimize performance (#33480) · eae31856
  Committed by WangXi on Jul 05, 2021
  
  eae31856
- [NPU] change Add to AddN in sum npu op (#33957) · fa5ddfd9
  Committed by pangyoki on Jul 05, 2021
```
* change Add to AddN in sum npu op

* add AddInputNames

* change fp16 to fp32 because numpy has accuracy loss in fp16 adding

* delete check

* fix runner error
```
  fa5ddfd9
- [NPU] add abs and uniform_random op and npu dockerfile, test=develop (#33942) · a84e48b9
  Committed by Qi Li on Jul 05, 2021
  
  a84e48b9
- [HybridParallel] Add amp support for pipeline_parallel (#33951) · 0b911330
  Committed by ShenLiang on Jul 05, 2021
```
* add amp support for pp

* add amp untest
```
  0b911330
- 【HeterPS】fix hdfs and fleet_util for supporting save/load/infer (#33903) · 2ef6188b
  Committed by danleifeng on Jul 05, 2021
```
* fix hdfs and fleet_util for supporting save/load infer;test=develop
```
  2ef6188b
04 Jul, 2021 1 commit
- [NPU] delete useless GELU in gelu grad npu op (#33872) · 4d167240
  Committed by pangyoki on Jul 04, 2021
```
* delete useless GELU in gelu npu op

* add description

* fix format

* add check_grad in gelu unittest
```
  4d167240
02 Jul, 2021 1 commit
- fix fleet amp get_loss_scaling (#33935) · 17a81df6
  Committed by WangXi on Jul 02, 2021
  
  17a81df6
01 Jul, 2021 4 commits
- gradient scale (#33862) · 57aabbab
  Committed by Yuang Liu on Jul 01, 2021
  
  57aabbab
- roll optimize (#32880) · 3fc56aa0
  Committed by sunli on Jul 01, 2021
  
  3fc56aa0
- Dygraph/sharding (#33633) · f33f2444
  Committed by JZ-LIANG on Jul 01, 2021
```
* dygraph sharding

* update unitest hybrid_parallel_communicate_group
```
  f33f2444
- [AMP] add get() and set() for Grad_scaler (#33835) · 85687348
  Committed by zhangbo9674 on Jul 01, 2021
```
* add get and set for Grad_scaler

* refine some API name and comments

* refine API name and comments

* refine some comments
```
  85687348
30 Jun, 2021 2 commits

Added matmul_v2 BF16/FP32 FWD kernel (#33750) · 24783c84

Committed by jakpiase on Jun 30, 2021

* added matmul_v2 bf16/fp32 FWD kernel

added matmul_v2 bf16/fp32 FWD kernel

* added formatting

* removed some tests due to timeout in CI

* refactored tests

* merged tests classes into one file

* minor change

* removed test guard for CUDA

* remove skipIf

* changes after review

* formatted one file

* minor change

* added skipping UT in CUDA place

24783c84

[NPU] support set_device (#33815) · 8225a6a1
Committed by houj04 on Jun 30, 2021
```
* support set_device for NPU.

* minor update doc and add more unit test.
```
8225a6a1

28 Jun, 2021 3 commits
- fix undef var (#33780) · 83284c8c
  Committed by Jiangxinz on Jun 28, 2021
  
  83284c8c
- [ROCM] fix RNN miopen as weight need to permuted, test=develop (#33733) · 6024488d
  Committed by Qi Li on Jun 28, 2021
```
* [ROCM] fix RNN miopen as weight need to permuted, test=develop

* [ROCM] fix data share when is_test, test=develop

* update, test=develop
```
  6024488d
- [Paddle-TRT]Fix flatten converter when batch_size > 1 (#33768) · d91352c0
  Committed by Pei Yang on Jun 28, 2021
```
* fix trt flatten converter when batch_size > 1

* change ut to same dynamic shape
```
  d91352c0
25 Jun, 2021 1 commit
- static support mp_layers (#33700) · 91a0acdb
  Committed by WangXi on Jun 25, 2021
  
  91a0acdb
24 Jun, 2021 7 commits
- [NPU] support dygraph execution on npu place(#33579) · 6aea6be2
  Committed by houj04 on Jun 24, 2021
```
* in NPU environment, use CPUPlace for missing operators.

* in NPU environment, use CPUPlace for missing operators.

* fix TensorCopy bug and add unit test.

* fix code style.

* add more unit tests.
```
  6aea6be2
- [oneDNN] Fix to #33282 , added support of X input broadcasting to oneDNN elementwise ops (#33549) · 049dd853
  Committed by Jacek Czaja on Jun 24, 2021
```
* - fix to #33282

* - Increased threshold for elementwise_mul_bf16 grad

* -disabled faulty UT

* - fix to approval
```
  049dd853
- [Dy2Stat]Support Python3 type hint (#33745) · c7797802
  Committed by Aurelius84 on Jun 24, 2021
```
* support type hint

* fix unittest
```
  c7797802
- TestSaveLoadLargeParameters use cpu place. (#33756) · 1def9e05
  Committed by WeiXin on Jun 24, 2021
```
* TestSaveLoadLargeParameters use cpu place.

* edit unittest
```
  1def9e05
- fix undef var (#33692) · 68c1fe8c
  Committed by Jiangxinz on Jun 24, 2021
  
  68c1fe8c
- fix undef var (#33691) · 49638f25
  Committed by Jiangxinz on Jun 24, 2021
  
  49638f25
- supplet several interface of static Variable to consistent with dygraph Tensor (#33330) · af9dcb2d
  Committed by CtfGo on Jun 24, 2021
```
As the title
```
  af9dcb2d
23 Jun, 2021 4 commits
- Added split op bf16/fp32 oneDNN kernel (#33584) · 68106509
  Committed by jakpiase on Jun 23, 2021
```
* base changes for split op

* 90% of split functionality added

* full fp32 functionality

* added bf16 test

* added submemory caching

* added bf test to static mode whitelist

* minor change

* enabled split op for inference

* minor fix

* minor fix
```
  68106509
- elastic unitest (#33728) · 9b58cbf1
  Committed by kuizhiqing on Jun 23, 2021
```
* elastic unitest

* rename demo
```
  9b58cbf1
- repair npu matmul_grad and comm_init_hccl (#33719) · 9bf00cd5
  Committed by Baibaifan on Jun 23, 2021
  
  9bf00cd5
- Add new operation: BroadcastTensorsOp (#33294) · affddfaa
  Committed by Zhanlue Yang on Jun 23, 2021
  
  affddfaa
22 Jun, 2021 3 commits

[API/OP]Add a new API paddle.diagonal (#33586) · ad106290

Committed by zhangbo9674 on Jun 22, 2021

* new api diagonal, test=develop

* add new api diagonal, test=develop

* new api diagonal, test=develop

* add new api paddle.diagonal, test=develop

* use framework::stride replace ComputeDimStride

* replace cudaMalloc/cudaMemcpy by TensorFormVector in cudaKernel and cudaGradKernel

* perfect function: when attr(offset) exceeds attr(axis1) or attr(axis2), set the diagonal dim to 0

* fix RP-Mac-CI bug: replace framework::stride() by ComputDimStride.

* perfect code-block

* perfect code of python API diagonal

* api supports dtype of float16 and bool

* api supports dtype of float16 and bool

* modify unittest code

* modify unittest code

* perfect dtype describe

* perfect code-block

ad106290

Fix the save path problem of UT test_pass_builder. (#33717) · 8a5bbae6
Committed by Zhen Wang on Jun 22, 2021

8a5bbae6
transform complex scale to tensor (#33699) · 5db0c84b
Committed by chentianyu03 on Jun 22, 2021
```
* transform complex scale to tensor

* add test_case for complex scalar

* modify import paddle
```
5db0c84b

21 Jun, 2021 6 commits

Add AXPY oneDNN handler (#33632) · 773aabc7

Committed by lidanqing on Jun 21, 2021

* Add oneDNN AXPY handler.

* Add fallback for small tensors.

* Fix ifdefs

* Remove unnecessary namespace prefixes and add missing headers.

* Guard handler_axpy with proper ifdefs.

* Compilation of this function is possible only when Paddle is not build with CUDA nor HIP.

* Move AXPY handler code to separate files.

* Use oneDNN AXPY handler in SGD op.

* Use axpy handler only when Paddle is built with oneDNN.

* Add test for SUM BF16 with big rows.

* Fix SFINAE rules for elementwise_add_to.

* Add test case for SGD with big rows.

* update

* update
Co-authored-by: Adam Osewski <adam.osewski@intel.com>

773aabc7

add sync calc stream and add ut for fuse on gpu (#33580) · e0e0c0fa
Committed by Yuang Liu on Jun 21, 2021

e0e0c0fa

[NPU] optimize mul op, use BatchMatMul to realize (#33616) · f91dfe15

Committed by pangyoki on Jun 21, 2021

* use BatchMatMul

* replace TensorCopy with ShareDataWith

* remove check fp16 grad

* fix format

* add grad_check

* fix grad check

f91dfe15

Del six.PY code2 (#33607) · 0f7187af
Committed by tianshuo78520a on Jun 21, 2021
```
* del py2 code2

* fix test timeout
```
0f7187af
fix undef val (#33562) · 4b9430a1
Committed by Jiangxinz on Jun 21, 2021

4b9430a1

[NPU] flatten params and grads, fuse grad_clip and optimizer op (#33461) · c269a160

Committed by Leo Chen on Jun 21, 2021

* enable npu alignment

* support flatten_params/grads

* support clip by global norm

* remove memset in coalesce_tensor_op

* fix npu kernel of sum op when input is one tensor

* add ut for flatten_param_grads+regularizer

* fix ut

* fix typo

c269a160

BaiXuePrincess / Paddle is in sync with its fork source project
