Commits · d8b691242d02b4117eb4b06985cd0553946bac12 · PaddlePaddle / Paddle

20 May, 2022 · 2 commits

[Hackathon No.5] tril_indices OP (#41639) · 75db5b86

Committed by xiaoguoguo626807 on May 20, 2022

* add tril_indices cpu kernal

* modify tril_indice cpu op

* modify bug

* modify bug

* add tril_indices python api

* add tril_indices python api

* resolve conflict

* add tril_indices test

* modify details

* add tril_indices.cu

* pythonapi pass

* save tril_indices

* CPU tril_indices pass

* delete vlog

* modify test_tril_indices_op.py

* delete tril_indices_kernel.cc.swp

* delete tril_indice.cu

* modify code style

* add newline in creation.py

* modify creation.py linux newline

* delete annotation

* check code style

* check .py style add final_state??

* modify code style

* add gpu_tril_indices

* modify gpu_compiled_juage

* modify gpu judge

* code style

* add test example

* modify english document

modify english document

modify english document

modify document

modify document

* modify pram name

* modify pram name

* modify pram

* reduce test ex

75db5b86

merge dymf branch (#42714) · 3f619290
Committed by yaoxuefeng on May 20, 2022
```
merge dymf branch
```
3f619290

19 May, 2022 · 3 commits
- [MLU] add lookup_table_v2 and unstack op (#42847) · e726960a
  Committed by qipengh on May 19, 2022
  
  e726960a
- OneDNN md-in-tensor refactoring part 3: Changes in quantize and dequantize (#42766) · b522ca52
  Committed by jakpiase on May 19, 2022
```
* added md support inside (de)quantizes

* added missing file

* changed paddle enforce text

* another paddle enforce change

* same as before

* removed broken tests
```
  b522ca52
- [NPU] minor changes for version control to support version without suffix (#42856) · 892f6850
  Committed by Aganlengzi on May 19, 2022
  
  892f6850
18 May, 2022 · 3 commits
- matmul and matmul_v2 refactor (#42732) · 570d0322
  Committed by Sławomir Siwek on May 18, 2022
```
* matmul refactor

* remove UT which only check ENFORCE output

* code format

* improve memory usage
```
  570d0322
- [NPU] add take_along_axis and take_along_axis_grad kernels (#42773) · 6f0a28f5
  Committed by Aganlengzi on May 18, 2022
```
* [NPU] add take_along_axis and take_along_axis_grad ops

* [NPU] add take_along_axis and take_along_axis_grad ops

* fix ut because cpu kernel can not be fallbacked
```
  6f0a28f5
- [collective] dynamic shape for send_v2 and recv_v2 (#42765) · 1f64c42e
  Committed by Yuang Liu on May 18, 2022
  
  1f64c42e
17 May, 2022 · 3 commits
- [NPU] add multinomial op (#42613) · fd140696
  Committed by Aganlengzi on May 17, 2022
```
* [NPU] add multinomial op

* fix place

* deal with cann version

* fix for old operator

* change another way
```
  fd140696
- add yolo_box_fuse_pass, yolo_box_head_op, yolo_box_post_op (#42641) · 6b58de95
  Committed by zhupengyang on May 17, 2022
  
  6b58de95
- [NPU] add reduce_max_grad op (#42672) · 78d5cf7b
  Committed by Aganlengzi on May 17, 2022
  
  78d5cf7b
16 May, 2022 · 5 commits
- delete rank switch in broadcast_function.h for compile (#42645) · 8501fb00
  Committed by niuliling123 on May 16, 2022
  
  8501fb00
- Add the new XDNN implementation. test=kunlun (#42683) · 87667c66
  Committed by wbn on May 16, 2022
```
* Add the new XDNN implementation. test=kunlun

* Add the new XDNN implementation. test=kunlun

* Modify the code based on review, test=kunlun
```
  87667c66
- Optimize linspace to avoid GPU -> CPU copy. (#42750) · 34cda80b
  Committed by Yiqun Liu on May 16, 2022
  
  34cda80b
- fused_multi_transformer add fused softmax mask (#42636) · f9d5ae4e
  Committed by WangXi on May 16, 2022
  
  f9d5ae4e
- optimize cinn find graph by graph address (#42697) · 661d0800
  Committed by jiangcheng on May 16, 2022
```
* optimize cinn find graph by graph address

* graph_key use int64_t instead of program string

* fix framework _to_readable_code python code

* rename get_readable_comile_key to get_serialize_comile_key
```
  661d0800
12 May, 2022 · 5 commits

Fix some typos in paddle/. (#42408) · 2012672c
Committed by Shuangchi He on May 12, 2022

2012672c

[MLU] fix cnnl error when index is 2D (#42669) · 190cf44f
Committed by fwenguang on May 12, 2022

190cf44f

Add cinn pass to program (#42623) · 9ac736c2

Committed by sneaxiy on May 12, 2022

* add cinn pass to program

* remove build_cinn_pass ut

* polish ut, add ut

* guard ut with is_compiled_with_cinn

* enable ut test_build_cinn_pass_resnet

9ac736c2

add xpu buffer_reader, *test=kunlun (#42578) · cc343a41

Committed by z8hanghuan on May 12, 2022

* add xpu buffer_reader, *test=kunlun

* xpu buffer_reader, use XPUDeviceGuard, *test=kunlun

* modify xpu.cmake, *test=kunlun

* modify xpu.cmake, *test=kunlun

* modify xpu.cmake, *test=kunlun

* add xpu buffer_reader, *test=kunlun

* add xpu buffer reader, *test=kunlun

* add xpu buffer reader, *test=kunlun

cc343a41

[MLU] add slice kernel (#42245) · ddb3868e
Committed by fwenguang on May 12, 2022

ddb3868e

11 May, 2022 · 2 commits

Move weights and biases scale computing into pass (#42241) · c0652972

Committed by Zuza Gawrysiak on May 11, 2022

* Add int8 scales gathering pass for convolution

* Fix typo

* Add unittest

* Add corrected unit test

* Change test name

* Remove enabling mkldnn in test

* Speed up test

* Change max examples

* Add functional test

* Change test name

* Add new test case

* Rename pass

c0652972

remove old XDNN implementation test=kunlun (#42404) · 7b828f71
Committed by taixiurong on May 11, 2022

7b828f71

10 May, 2022 · 8 commits
- shape mkldnn kernel adapted to NHWC (#42548) · d47690b2
  Committed by Jacek Czaja on May 10, 2022
```
* - shape mkldnn adapted to NHWC

- NHWC shape mkldnn ut

- fixes to UT

- Fix to UT

- Fixes to UT

- Fix of compilation

* - lint candidate fix
```
  d47690b2
- [Video detection] Added fill_constant FP32 FWD oneDNN kernel (#37216) · 66a10f36
  Committed by jakpiase on May 10, 2022
```
* added fill_constant kernel

* CI fix

* ci fix

* switched from nan to zero memory

* CI FIX

* ci fixes

* CI rerun

* ci fix

* minor change

* CI rerun
```
  66a10f36
- [PaddlePaddle Hackathon 2] No. 18: Add paddle.heaviside and paddle.Tensor.heaviside API to Paddle (#41872) · 4892d592
  Committed by BrilliantYuKaimin on May 10, 2022
```
* Create elementwise_heaviside_op.cc

* add ElementwiseHeavisideFunctor

* Create test_elementwise_heaviside_op.py

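* Add the Python interface for heaviside

* add heaviside in white list

* Add the signature for heaviside

* Add the kernel for heaviside

* Add the gradient kernel for heaviside

* Add registration of the heaviside gradient

* Adjust code format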

* Update elementwise_sig.cc

* add heaviside in __all__

* Update heaviside docs

* Update math.py

* Update math.py

* Update math.py
```
  4892d592
- add int8 for cast (#42634) · 8a100774
  Committed by lilong12 on May 10, 2022
  
  8a100774
- [MLU]add adam, adamw op of mlu device (#42557) · cc077693
  Committed by qipengh on May 10, 2022
  
  cc077693
- [MLU] add layernorm mlu kernel (#42356) · ecd6db43
  Committed by fwenguang on May 10, 2022
  
  ecd6db43
- [MLU]add assign op of mlu device (#42591) · 4e5fb733
  Committed by qipengh on May 10, 2022
  
  4e5fb733
- fix bug for heter (#42590) · 21b35167
  Committed by lilong12 on May 10, 2022
  
  21b35167
09 May, 2022 · 3 commits
- [Need approval] Add AdamW-CPU FP32 JIT assembly kernel (#42522) · 766c50ac
  Committed by joanna.wozna.intel on May 09, 2022
```
* Add AdamW jit kernel

* Second implementation

* Add missing header

* Correct number of jit kernels in the test
```
  766c50ac
- [Ready to merge] oneDNN NHWC matmul & elementwise kernels fixes (#42506) · bf481550
  Committed by Jacek Czaja on May 09, 2022
```
* - fix to crash

- more fixes

- added diagnostic

- matmul output fixes.

- compilation fix

- stop rotating too small shapes

* - Added enabling of matmul_V2 onednn test
```
  bf481550
- fix split stride_numel may be 0 (#42537) · d878f971
  Committed by chentianyu03 on May 09, 2022
  
  d878f971
07 May, 2022 · 1 commit

Reduce the number of threads per block of deformable_psroi_pooling to solve... · 8c1b2fa6

Committed by FlyingQianMM on May 07, 2022

Reduce the number of threads per block of deformable_psroi_pooling to solve the bug where too many resources requested for launch (#42531)

8c1b2fa6

06 May, 2022 · 5 commits

bind elementwise_mod_op_xpu (#42175) · 6ea2f049

Committed by enzodechine on May 06, 2022

* bind elementwise_mod_op_xpu *test=kunlun

* add more supported dtypes and UTs *test=kunlun

* fix datatype error

* add op to in xpu1_op_list

* Update Mac cmake version >=3.15 (#41456)

* Update Mac cmake version >=3.15

* notest;read test1

notest;read test2

notest;read test3

* fix inference link error

* fix inference link error

* fix windows link error

* fix cmake_policy

* fix build big size

* Add paddle::variant and replace paddle::any (#42139)

* add variant and replace any

* split attribute

* disable unittest failed in eager CI in temporary (#42101)

* test=py3-eager

* test=py3-eager

* test=py3-eager

* combine graph_table and feature_table in graph_engine (#42134)

* extract sub-graph

* graph-engine merging

* fix

* fix

* fix heter-ps config

* test performance

* test performance

* test performance

* test

* test

* update bfs

* change cmake

* test

* test gpu speed

* gpu_graph_engine optimization

* add dsm sample method

* add graph_neighbor_sample_v2

* Add graph_neighbor_sample_v2

* fix for loop

* add cpu sample interface

* fix kernel judgement

* add ssd layer to graph_engine

* fix allocation

* fix syntax error

* fix syntax error

* fix pscore class

* fix

* change index settings

* recover test

* recover test

* fix spelling

* recover

* fix

* move cudamemcpy after cuda stream sync

* fix linking problem

* remove comment

* add cpu test

* test

* add cpu test

* change comment

* combine feature table and graph table

* test

* test

* pybind

* test

* test

* test

* test

* pybind

* pybind

* fix cmake

* pybind

* fix

* fix

* add pybind

* add pybind
Co-authored-by: DesmonDay <908660116@qq.com>

* [CustomDevice] add eager mode support (#42034)

* fix FlattenContiguousRangeOpConverter out dim error (#42087)

* fix FlattenContiguousRangeOpConverter out dim error

* update code

* fix python3.10 compile bug on windows (#42140)

* Optimize dygraph GetExpectedKernelType perf (#42154)

* opt dygraph scheduling

* revert part impl

* fix incorrect usages of std::move and other compile errors (#41045)

* fix bug of std::move and others

* fix an compile error in debug mode

* fix wrong copy assignment operator
Signed-off-by: tiancaishaonvjituizi <452565578@qq.com>

* reformat
Signed-off-by: tiancaishaonvjituizi <452565578@qq.com>

* reformat
Signed-off-by: tiancaishaonvjituizi <452565578@qq.com>

* fix ArrayRef constructor following llvm

* fix format

* fix conflict with master

* fix variant compile error (#42203)

* [Eager] Support numpy.ndarry in CastNumpy2Scalar (#42136)

* [Eager] Remove redundancy code, fix fp16 case (#42169)

* [Eager] Support div(scalar) in eager mode (#42148)

* [Eager] Support div scalar in eager mode

* Updated and remove debug logs

* Remove list, use 'or' directly

* Remove useless statement

* fix recompute (#42128)

* fix recompute

* modify return

* add LICENSE in wheel dist-info package (#42187)

* replace any by variant in infermeta (#42181)

* [PaddlePaddle Hackathon 2] No. 24: Add nn.ChannelShuffle network-building API to Paddle (#40743)

* Add infermeta for ChannelShuffle

* Create channel_shuffle_grad_kernel.h

* Create channel_shuffle_kernel.h

* Create channel_shuffle_sig.cc

* Create channel_shuffle_op.cc

Description of the ChannelShuffle operator

* Create channel_shuffle_kernel_impl.h

Implementation of the ChannelShuffle kernel

* Create channel_shuffle_grad_kernel_impl.h

Implementation of the ChannelShuffle backward kernel

* Add kernel register of channel shuffle and grad

Register the ChannelShuffle kernel and its backward kernel

* add nn.functional.channel_shuffle

* add nn.ChannelShuffle

* Create test_channel_shuffle.py

* Update example of ChannelShuffle in vision.py

* Update test_channel_shuffle.py

* Change where the channel_shuffle kernel is implemented

* Fix code format

* Remove extra spaces

* Improve error checking for channel_shuffle

* Update unary.cc

* Update channel_shuffle_op.cc

* Update test_channel_shuffle.py

* Update unary.cc

* add channel_shuffle

* Update test_channel_shuffle.py

* Update vision.py

* Adjust code format

* Update channel_shuffle_sig.cc

* Update the ChannelShuffle documentation

* Update the channel_shuffle documentation

* remove ChannelShuffleOpArgumentMapping

* add ChannelShuffleGradInferMeta

* Update channel_shuffle_op.cc

* Move the channel_shuffle kernel and its gradient kernel

* Do not reset default stream for StreamSafeCUDAAllocator (#42149)

* remove redundant computation in Categorical.probs (#42114)

* Downloading data for test_analyzer_vit_ocr (#42041)

* Change server URL

* update config

* add test to parallel UT rule

* add checksum to ensure files are downloaded

* change downloading target

* reuse existing variable

* change target directory

* fix en docs of some Apis (gradients, scope_guard, cuda_places, name_scope, device_guard, load_program_state, scale, ParamAttr and WeightNormParamAttr) (#41604)

* Update scope_guard; test=document_fix

* gradients; test=document_fix

* gradients; test=document_fix

* name_scope; test=document_fix

* cpu_places; test=document_fix

* WeightNormParamAttr; test=document_fix

* cuda_places; test=document_fix

* load_program_state; test=document_fix

* device_guard; test=document_fix

* device_guard; test=document_fix

* ParamAttr; test=document_fix

* scale; test=document_fix

* scale; test=document_fix

* update code example; test=document_fix
Co-authored-by: Chen Long <1300851984@qq.com>

* fix datatype error

add op to in xpu1_op_list

*test=kunlun

* fix elementwise_mod op path error  *test=kunlun

* fix elementwise_mod UT error  *test=kunlun

* fix datatype error

add op to in xpu1_op_list

*test=kunlun

add op to in xpu1_op_list

fix elementwise_mod op path error  *test=kunlun

fix elementwise_mod UT error  *test=kunlun
Co-authored-by: tianshuo78520a <707759223@qq.com>
Co-authored-by: Chen Weihang <chenweihang@baidu.com>
Co-authored-by: pangyoki <pangyoki@126.com>
Co-authored-by: seemingwang <seemingwang@users.noreply.github.com>
Co-authored-by: DesmonDay <908660116@qq.com>
Co-authored-by: ronnywang <524019753@qq.com>
Co-authored-by: baoachun <962571062@qq.com>
Co-authored-by: Zhou Wei <1183042833@qq.com>
Co-authored-by: tiancaishaonvjituizi <452565578@qq.com>
Co-authored-by: Weilong Wu <veyron_wu@163.com>
Co-authored-by: Roc <30228238+sljlp@users.noreply.github.com>
Co-authored-by: BrilliantYuKaimin <91609464+BrilliantYuKaimin@users.noreply.github.com>
Co-authored-by: Ruibiao Chen <chenruibiao@baidu.com>
Co-authored-by: Feiyu Chan <chenfeiyu@baidu.com>
Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com>
Co-authored-by: Yilingyelu <103369238+Yilingyelu@users.noreply.github.com>
Co-authored-by: Chen Long <1300851984@qq.com>

6ea2f049

[NPU] support model PPO (#42484) · d73eb38c
Committed by Aganlengzi on May 06, 2022

d73eb38c

[NPU] add clip_by_norm op (#42411) · 1588e7e7
Committed by Aganlengzi on May 06, 2022
```
* [NPU] add clip_by_norm op

* fix

* update
```
1588e7e7

[XPUPS] Register pull_box_sparse op under XPU_KP compilation (#42354) · 63067e90

Committed by Fan Zhang on May 06, 2022

* Adapt XPUPS - 1st version - 3.24

* Adapt XPUPS - update XPU PushSparse -  2nd version - 3.24

* Adapt XPUPS - add XPU PullSparseOp - 3nd version - 3.25

* refactor heter comm kernel

* update. test=develop

* Adapt XPUPS - modify by compilation - 4th version - 3.27

* update calc_shard_offset. test=develop

* update xpu kernel. test=develop

* update args of calc_shard_offset

* update. test=develop

* remove customGradMerger

* update. test=develop

* heter_comm update

* heter_comm update

* update calc_shard_offset. test=develop

* heter_comm update

* update args of calc_shard_offset

* update. test=develop

* remove customGradMerger

* update. test=develop

* fix. test=develop

* update. test=develop

* update. test=develop

* update optimizer kernel

* Adapt XPUPS - use WITH_XPU_KP and modify wrapper kernel function - 5th version - 3.30

* update. test=develop

* update pslib.cmake

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* Adapt XPUPS - modify by kp compilation  - 6th version - 3.30

* update. test=develop

* update. test=develop

* update. test=develop

* update optimizer kernel

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* fix. test=develop

* fix. test=develop

* used by minxu

* update heter_comm_inl

* fix. test=develop

* Adapt XPUPS - modify by kp compilation  - 7th version - 3.30

* fix. test=develop

* add optimizer kernel. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* 3.31 update

* Adapt XPUPS - update kp compilation path  - 8th version - 3.31

* add optimizer kernel. test=develop

* fix kunlun not support size_t. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix kunlun not support size_t. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* update heter_comm_kernel.kps 3.31

* fix. test=develop

* fix. test=develop

* update heter_comm_kernel.kps 3.31

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* update heter_comm.h 3.31

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* update hashtable. test=develop

* update. test=develop

* Adapt XPUPS - update by kp compilation  - 9th version - 4.1

* update hashtable. test=develop

* fix. test=develop

* update hashtable 4.1

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* Adapt XPUPS - update by kp compilation  - 10th version - 4.1

* fix. test=develop

* fix. test=develop

* fix. test=develop

* update. test=develop

* modify by compilation 4.1

* update. test=develop

* update. test=develop

* fix. test=develop

* modify by compilation 4.1

* update. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* modify by compilation 4.1

* fix. test=develop

* fix. test=develop

* fix. test=develop

* modify by compilation 4.1 19:30

* fix. test=develop

* update ps_gpu_wrapper.kps 4.1

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* Adapt XPUPS - update by kp compilation  - 11th version - 4.1

* fix. test=develop

* Adapt XPUPS - update by kp compilation  - 12nd version - 4.2

* fix. test=develop

* fix. test=develop

* modify by compilation 4.2

* 4.2 update

* fix. test=develop

* template init. test=develop

* update 4.6

* fix. test=develop

* template init. test=develop

* 4.6 modify by compilation

* hashtable template init. test=develop

* hashtable template init. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=devlop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=devlop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* Adapt XPUPS - update by kp compilation  - 13nd version - 4.7

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* 4.11 update

* fix. test=develop

* fix. test=develop

* 4.11 update

* update by pre-commit

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* 4.12 update

* fix. test=develop

* Adapt XPUPS - update by kp compilation  - 14th version - 4.13

* 4.13 update

* 4.14 update

* 4.14 update

* 4.14 update

* 4.14 modify by merged latest compilation

* retry CI 4.14

* 4.15 pass static check

* 4.15 modify by gpups CI

* 3.16 update by gpups CI - modify ps_gpu_wrapper.h

* 4.16 update

* 4.16 pass xpu compile

* 4.16 retry CI

* 4.16 update

* Adapt XPUPS - adapt BKCL comm for XPUPS - 4.24

* update by compilation

* Adapt XPUPS - register PSGPUTrainer for XPUPS - 4.25

* update device_worker_factory

* Adapt XPUPS - split heter_ps into .cu and .cc - 4.27

* Adapt XPUPS - register pull_box_sparse op under XPU_KP - 4.28

* update
Co-authored-by: zmxdream <zhangminxu01@baidu.com>

63067e90

Fix the implementation of fused_fast_ln_fwd_kernel in test mode (#42527) · 5acd764d
Committed by Zhang Zheng on May 06, 2022

5acd764d
