Commits · 0ae8a2d67623f33c13f2dc14141587619cc3ba7e · 机器未来 / Paddle

31 May, 2022 · 3 commits

Fix the underflow of fp16 fake quantize operators (#43088) · 0ae8a2d6
Committed by Leo Chen on May 31, 2022
```
Co-authored-by: Ryan Jeng <rjeng@nvidia.com>
```
Rename dropout is test (#43098) · 67497119
Committed by Li Min on May 31, 2022
```
* replace dropout_is_test with is_test.
* improve atol on a100.
```

OneDNN md-in-tensor refactoring part 5: Memory descriptor enabled for elementwises, reductions and expand_v2 ops (#43036) · 12d8a567

Committed by jakpiase on May 30, 2022

* enabled md in elementwises, reductions and expand_v2

* CI fix for invalid numpy copy

* fixed formatting

* CI rerun

* changes after review

30 May, 2022 · 6 commits

[mlu] add one_hot_v2 mlu kernel (#43025) · 13a21cf7
Committed by Chenxiao Niu on May 30, 2022

Add fused_bias_dropout_residual_ln op and layer. (#43062) · dceccd9d
Committed by Li Min on May 30, 2022
```
* add fused_bias_dropout_residual_ln op and layer.
```
Implement fused_gate_attention operator for AlphaFold. (#42018) · fdcdbec5
Committed by crystal on May 30, 2022

[PaddlePaddle Hackathon 2] 15 Add new API Nanmedian (#42385) · f87fa3c0

Committed by thunder95 on May 30, 2022

* nanmedian op

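* fix bug in cuda kernel

* fix count_if incompatibility on other hardware platforms

* fix incompatibility on some cpu hardware

* fix incompatibility on some cpu hardware

* fix isnan check

* handle older numpy versions that do not support all-nan input

* handle older numpy versions that do not support all-nan input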

* fix code example

* fix api comment error

* revise backward propagation logic and c++ handling logic

* address review suggestions

* typo pre_dim

* update en docs, test=document_fix

* remove numpy in en doc, test=document_fix

* add r,test=document_fix

* add api to all

* follow advice from chenwhql

[MLU]add mlu kernel for log_softmax op (#43040) · 586f9429
Committed by cambriconhsq on May 30, 2022

[Dy2St]Fix cond_block_grad error when handle no need grad vars (#43034) · cd3d0911
Committed by WangZhen on May 30, 2022
```
* Fix cond_block_grad error when handle no need grad vars

* Add comment and UT
```

28 May, 2022 · 1 commit
- [Bug Fix]Fix global_scatter/global_gather in ProcessGroup (#43027) · 8cc2e28c
  Committed by ShenLiang on May 28, 2022
```
* fix alltoall

* rename utest
```
27 May, 2022 · 2 commits

[Phi] Change optional tensor from `optional<const Tensor&>` to `optional<Tensor>` (#42939) · 6d78524c

Committed by zyfncg on May 27, 2022

* refactor the optional tensor

* remove optional<MetaTensor> in InferMeta

* fix bug

* fix optional<vector<Tensor>>

* fix bug

* fix rmsprop

* fix amp of eager_gen

* polish code

* fix deleted code

* fix merge conflict

* polish code

* remove is_nullopt_

* fix merge conflict

* fix merge conflict

Support memory stats for CPU (#42945) · 21f11d35
Committed by Ruibiao Chen on May 27, 2022
```
* Support memory stats for CPU

* Add UTs

* Fix typos

* Fix typos
```

26 May, 2022 · 2 commits
- move instance_norm_double_grad (#43021) · b2b78cd4
  Committed by YuanRisheng on May 26, 2022
- [Phi]Refactor InstanceNormKernel and InstanceNormGradKernel (#42978) · cc272afb
  Committed by YuanRisheng on May 26, 2022
```
* move instance_norm

* change mutable_data

* fix compile bugs
```
25 May, 2022 · 4 commits

OneDNN md-in-tensor refactoring part 4: Memory descriptor enabled for more ops (#42946) · 657abd51
Committed by jakpiase on May 25, 2022
```
* added support for md in more ops

* fixed typo
```

fix maybe-uninitialized warning (#42902) · f1f79b0d

Committed by Leo Chen on May 25, 2022

* fix maybe-uninitialized warning

* fix compile

* fix xpu compile

* fix npu compile

* fix infer compile

* fix compile

* fix compile

[MLU] adapt coalesce_tensor op for mlu (#42873) · cbb24136
Committed by fwenguang on May 25, 2022

[EinsumOp] Optimize the backward speed of EinsumOp (#42663) · 71b046cd

Committed by xiongkun on May 25, 2022

* change logic for optimize

* modify

* optimize the backward speed of EinsumOp

* add cache optimizer for einsum op

* EinsumOp: fix new dygraph mode error

* fix bug

* change Cache->InnerCache

* fix code

* fix

* add nan inf utils for einsum op

* add as_extra

* Compatible with v2.3 EinsumOp

* remove dispensable

24 May, 2022 · 2 commits

[Phi]Move grad_add op kernel into phi and delete elementwise_add_op file (#42903) · 4d7a9eef
Committed by YuanRisheng on May 24, 2022
```
* move grad_add

* fix unittest bugs

* fix compile bugs
```

[XPUPS] Modify XPU Kernel (#42745) · f8931c97

Committed by Fan Zhang on May 24, 2022

* Adapt XPUPS - 1st version - 3.24

* Adapt XPUPS - update XPU PushSparse -  2nd version - 3.24

* Adapt XPUPS - add XPU PullSparseOp - 3rd version - 3.25

* refactor heter comm kernel

* update. test=develop

* Adapt XPUPS - modify by compilation - 4th version - 3.27

* update calc_shard_offset. test=develop

* update xpu kernel. test=develop

* update args of calc_shard_offset

* update. test=develop

* remove customGradMerger

* update. test=develop

* heter_comm update

* heter_comm update

* update calc_shard_offset. test=develop

* heter_comm update

* update args of calc_shard_offset

* update. test=develop

* remove customGradMerger

* update. test=develop

* fix. test=develop

* update. test=develop

* update. test=develop

* update optimizer kernel

* Adapt XPUPS - use WITH_XPU_KP and modify wrapper kernel function - 5th version - 3.30

* update. test=develop

* update pslib.cmake

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* Adapt XPUPS - modify by kp compilation  - 6th version - 3.30

* update. test=develop

* update. test=develop

* update. test=develop

* update optimizer kernel

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* fix. test=develop

* fix. test=develop

* used by minxu

* update heter_comm_inl

* fix. test=develop

* Adapt XPUPS - modify by kp compilation  - 7th version - 3.30

* fix. test=develop

* add optimizer kernel. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* 3.31 update

* Adapt XPUPS - update kp compilation path  - 8th version - 3.31

* add optimizer kernel. test=develop

* fix kunlun not support size_t. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix kunlun not support size_t. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* update heter_comm_kernel.kps 3.31

* fix. test=develop

* fix. test=develop

* update heter_comm_kernel.kps 3.31

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* update heter_comm.h 3.31

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* update hashtable. test=develop

* update. test=develop

* Adapt XPUPS - update by kp compilation  - 9th version - 4.1

* update hashtable. test=develop

* fix. test=develop

* update hashtable 4.1

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* Adapt XPUPS - update by kp compilation  - 10th version - 4.1

* fix. test=develop

* fix. test=develop

* fix. test=develop

* update. test=develop

* modify by compilation 4.1

* update. test=develop

* update. test=develop

* fix. test=develop

* modify by compilation 4.1

* update. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* modify by compilation 4.1

* fix. test=develop

* fix. test=develop

* fix. test=develop

* modify by compilation 4.1 19:30

* fix. test=develop

* update ps_gpu_wrapper.kps 4.1

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* Adapt XPUPS - update by kp compilation  - 11th version - 4.1

* fix. test=develop

* Adapt XPUPS - update by kp compilation  - 12th version - 4.2

* fix. test=develop

* fix. test=develop

* modify by compilation 4.2

* 4.2 update

* fix. test=develop

* template init. test=develop

* update 4.6

* fix. test=develop

* template init. test=develop

* 4.6 modify by compilation

* hashtable template init. test=develop

* hashtable template init. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* Adapt XPUPS - update by kp compilation  - 13th version - 4.7

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* 4.11 update

* fix. test=develop

* fix. test=develop

* 4.11 update

* update by pre-commit

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* 4.12 update

* fix. test=develop

* Adapt XPUPS - update by kp compilation  - 14th version - 4.13

* 4.13 update

* 4.14 update

* 4.14 update

* 4.14 update

* 4.14 modify by merged latest compilation

* retry CI 4.14

* 4.15 pass static check

* 4.15 modify by gpups CI

* 3.16 update by gpups CI - modify ps_gpu_wrapper.h

* 4.16 update

* 4.16 pass xpu compile

* 4.16 retry CI

* 4.16 update

* Adapt XPUPS - adapt BKCL comm for XPUPS - 4.24

* update by compilation

* Adapt XPUPS - register PSGPUTrainer for XPUPS - 4.25

* update device_worker_factory

* Adapt XPUPS - split heter_ps into .cu and .cc - 4.27

* Adapt XPUPS - register pull_box_sparse op under XPU_KP - 4.28

* update

* 5.7 modify ps_gpu_wrapper pull_sparse

* 5.11 update ps_gpu_wrapper CopyKeysKernel

* 5.13 modify calc_shard_offset_kernel & fill_shard_key_kernel

* modify fill_dvals_kernel & PullCopy & c_sync_calc_stream - 5.18

* modify PushCopy & fill_shard_grads_kernel & register push_box_sparse - 5.19

Co-authored-by: zmxdream <zhangminxu01@baidu.com>

23 May, 2022 · 2 commits

[Internal reviewing] NHWC fix to am_vocoder model for oneDNN 2.6 (#42729) · d414af94

Committed by Jacek Czaja on May 23, 2022

* - prototype of reimplemented fixes

* - compilation fixes

* - compilation fix

* - cosmetic info

* - hopefully fix

* - compilation fix

* - supported for nested blocking of cache clearing

* - fix

* - Unit test to changes

* - Compilation fix to windows (hopefully)

* - Moved resetting layout to ResetBlob

* - fixes after review

remove is_init_py of RandomGenerator, and use Global RandomGenerator by default (#42876) · 3b488bae
Committed by zhouweiwei2014 on May 23, 2022
```
* remove is_init_py of RandomGenerator, and use Global Generator if not OP seed

* fix comment
```

20 May, 2022 · 4 commits

fix fused_attention_op cacheKV InferShape (#42900) · 7306d1fb
Committed by WangXi on May 20, 2022

move activation kernel (#42880) · 191c441a
Committed by YuanRisheng on May 20, 2022

[Hackathon No.5] tril_indices OP (#41639) · 75db5b86

Committed by xiaoguoguo626807 on May 20, 2022

* add tril_indices cpu kernel

* modify tril_indice cpu op

* modify bug

* modify bug

* add tril_indices python api

* add tril_indices python api

* resolve conflict

* add tril_indices test

* modify details

* add tril_indices.cu

* pythonapi pass

* save tril_indices

* CPU tril_indices pass

* delete vlog

* modify test_tril_indices_op.py

* delete tril_indices_kernel.cc.swp

* delete tril_indice.cu

* modify code style

* add newline in creation.py

* modify creation.py linux newline

* delete annotation

* check code style

* check .py style add final_state??

* modify code style

* add gpu_tril_indices

* modify gpu_compiled_juage

* modify gpu judge

* code style

* add test example

* modify english document

modify english document

modify english document

modify document

modify document

* modify param name

* modify param name

* modify param

* reduce test ex

merge dymf branch (#42714) · 3f619290
Committed by yaoxuefeng on May 20, 2022
```
merge dymf branch
```

19 May, 2022 · 3 commits
- [MLU] add lookup_table_v2 and unstack op (#42847) · e726960a
  Committed by qipengh on May 19, 2022
- OneDNN md-in-tensor refactoring part 3: Changes in quantize and dequantize (#42766) · b522ca52
  Committed by jakpiase on May 19, 2022
```
* added md support inside (de)quantizes

* added missing file

* changed paddle enforce text

* another paddle enforce change

* same as before

* removed broken tests
```
- [NPU] minor changes for version control to support version without suffix (#42856) · 892f6850
  Committed by Aganlengzi on May 19, 2022
18 May, 2022 · 3 commits
- matmul and matmul_v2 refactor (#42732) · 570d0322
  Committed by Sławomir Siwek on May 18, 2022
```
* matmul refactor

* remove UT which only check ENFORCE output

* code format

* improve memory usage
```
- [NPU] add take_along_axis and take_along_axis_grad kernels (#42773) · 6f0a28f5
  Committed by Aganlengzi on May 18, 2022
```
* [NPU] add take_along_axis and take_along_axis_grad ops

* [NPU] add take_along_axis and take_along_axis_grad ops

* fix ut because cpu kernel can not be fallbacked
```
- [collective] dynamic shape for send_v2 and recv_v2 (#42765) · 1f64c42e
  Committed by Yuang Liu on May 18, 2022
17 May, 2022 · 3 commits
- [NPU] add multinomial op (#42613) · fd140696
  Committed by Aganlengzi on May 17, 2022
```
* [NPU] add multinomial op

* fix place

* deal with cann version

* fix for old operator

* change another way
```
- add yolo_box_fuse_pass, yolo_box_head_op, yolo_box_post_op (#42641) · 6b58de95
  Committed by zhupengyang on May 17, 2022
- [NPU] add reduce_max_grad op (#42672) · 78d5cf7b
  Committed by Aganlengzi on May 17, 2022
16 May, 2022 · 5 commits
- delete rank switch in broadcast_function.h for compile (#42645) · 8501fb00
  Committed by niuliling123 on May 16, 2022
- Add the new XDNN implementation. test=kunlun (#42683) · 87667c66
  Committed by wbn on May 16, 2022
```
* Add the new XDNN implementation. test=kunlun

* Add the new XDNN implementation. test=kunlun

* Modify the code based on review, test=kunlun
```
- Optimize linspace to avoid GPU -> CPU copy. (#42750) · 34cda80b
  Committed by Yiqun Liu on May 16, 2022
- fused_multi_transformer add fused softmax mask (#42636) · f9d5ae4e
  Committed by WangXi on May 16, 2022
- optimize cinn find graph by graph address (#42697) · 661d0800
  Committed by jiangcheng on May 16, 2022
```
* optimize cinn find graph by graph address

* graph_key use int64_t instead of program string

* fix framework _to_readable_code python code

* rename get_readable_comile_key to get_serialize_comile_key
```