Commits · c37af19c963ab1b4c65ac4f7ca83e31f864c76d3 · Crayon鑫 / Paddle

15 Apr, 2022 16 commits

Committed by Roc on Apr 15, 2022

* moe ref

* ref commit; test=document_fix

* update; test=document_fix

* update test=document_fix

c37af19c

fix a bug which will cause cuda address error when the input size is very large (#41824) · e25b75b6
Committed by huangxu96 on Apr 15, 2022
```
As the title
```
e25b75b6

[Phi]Reduce kernels into multiple files (#41747) · 1927aff9

Committed by chentianyu03 on Apr 15, 2022

* split reduce_kernel

* rm reduce_kernel in cmake

* split reduce_grad kernels

* fix cmake build error

* format code

* fix standalone_executor_test error

1927aff9

[DoubleGrad] Enabled test_imperative_star_gan_with_gradient_penalty.py under eager mode (#41730) · 27f28e82

Committed by Zhanlue Yang on Apr 15, 2022

* [DoubleGrad] Enabled double grad test cases in eager_mode for test_imperative_double_grad

* Fixed elementwise issue

* Addressed CI failures

* [DoubleGrad] Enabled test_imperative_triple_grad test cases under eager_mode

* [DoubleGrad] Enabled test_autograd_functional_dynamic.py under eager mode

* Enabled more test cases

* [DoubleGrad] Enabled test_imperative_star_gan_with_gradient_penalty.py under eager mode

* Adjusted test_imperative_star_gan_with_gradient_penalty.py

27f28e82

[Dygraph] Refactor Model Parallel in eager mode (#41761) · e6fb6599
Committed by Haohongxiang on Apr 15, 2022
```
* refactor mp in eager mode

* update

* update

* add uts
```
e6fb6599

add fp16 for masked_select on kunlun, *test=kunlun (#41215) · ff818c77
Committed by TTerror on Apr 15, 2022

ff818c77

update (#41762) · 482e5b6c
Committed by lilong12 on Apr 15, 2022

482e5b6c
【GPUPS】add afsclient and gpupsutil (#41324) · 30a1213b
Committed by danleifeng on Apr 15, 2022
```
* add gpupsutil and afsclient; test=develop
```
30a1213b

[MLU] add mlu softmax kernel (#41816) · 2d6b71a2
Committed by fwenguang on Apr 15, 2022

2d6b71a2

Add eager string tensor (#41039) · a22b68b8

Committed by Jack Zhou on Apr 15, 2022

* Add core.eager.StringTensor __init__ which pyarray args can be passed

* Add the numpy method of core.eager.StringTensor

* revert tensor.to_string modification

* Add ToPyObject for core.eager.StringTensor

* Add debug string for core.eager.StringTensor

* Remove place args of core.eager.StringTensor temporarily

* Fix check string_tensor error

* remove dtype of core.eager.StringTensor

* add core.eager.StringTensor unittest

* remove pstring from VarDesc

* Add InitStringTensorWithStringTensor

* Remove to_string modification

* Remove zero_copy arg from StringTensor creator

a22b68b8

[XPUPS]fix hashtable_kernel.kps (#41790) · ef6ff4ef

Committed by zmxdream on Apr 15, 2022

* refactor heter comm kernel

* update. test=develop

* update calc_shard_offset. test=develop

* update xpu kernel. test=develop

* update args of calc_shard_offset

* update. test=develop

* remove customGradMerger

* update. test=develop

* update. test=develop

* fix. test=develop

* update. test=develop

* update. test=develop

* update optimizer kernel

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* fix. test=develop

* fix. test=develop

* add optimizer kernel. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix kunlun not support size_t. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* update hashtable. test=develop

* update. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* update. test=develop

* update. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* template init. test=develop

* hashtable template init. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix hashtable_kernel. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop
Co-authored-by: WorgenZhang <frank08081993@gmail.com>

ef6ff4ef

[IPU] add mixed-precision support for ipu (#41733) · d7224482
Committed by Allen Guo on Apr 15, 2022
```
* add mixed-precision support for ipu

* restore cast_model_to_fp16 api

* update UTs
```
d7224482

support no_need_buffer in eager_fluid state (#41720) · 840d2eb6

Committed by pangyoki on Apr 15, 2022

* support no_need_buffer in eager_fluid state

* change no_need_buffer info from fwd_info to bwd_info

* fix CI fail, gru_unit does not use no_need_buffer

* fix conflict between no_need_buffer and dispensable

* use tensor.define in dispensable

* solve conflict

* solve conflict

840d2eb6

Change cuDNN Conv kernel for auto tune feature (#41313) · 35acfeda

Committed by limingshu on Apr 15, 2022

* change cudnn helper for auto-tune

* Add FLAGS_use_autotune to set the global status of autotune and change the order of choosing algorithm.

* Fix the bug in calculating and printing current step cache hit rate.

* Improve the autotune cache and fix unittest.

* Change the key from AlgorithmType to int64_t.

* Fix unittest for cpu-only env.

* change ChooseAlgoByWorkspace for heuristic mode
Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>

35acfeda


[MLU] add mlu activation kernels (#41751) · 10114859
Committed by fwenguang on Apr 15, 2022

10114859
[MLU] add mlu new profiler (#41138) · fc208b7e
Committed by fwenguang on Apr 15, 2022
```
* [MLU] add mlu new profiler

* fix format
```
fc208b7e

14 Apr, 2022 16 commits

[KP] Add registry for elementwise_add/max/min/sub/div/mul/floordiv on XPU2 with KP lib (#41494) · fbe2c311
Committed by Lijunhui on Apr 14, 2022
```
* register elementwise_xxx
```
fbe2c311

remove all is initialized using (#41766) · 4733fe60
Committed by Chen Weihang on Apr 14, 2022

4733fe60

[Phi] Support construct Scalar by using Non-CPU Tensor (#41765) · 54ccc308

Committed by YuanRisheng on Apr 14, 2022

* support construct scalar using non-cpu tensor

* fix bugs when run unittest

* fix compile bugs

* fix bugs when run ci

* fix compile bugs

* fix bugs when move copy

* polish unit test

* polish unittest

* update according to comment

* add target dependency

* deal with conflict

* fix bugs when run unit test

* fix unit test bugs

54ccc308


Optimize the finding of max workspace size. (#41741) · 3ce879db
Committed by Yiqun Liu on Apr 14, 2022

3ce879db
executor perf statistics (#41648) · cbe7466f
Committed by liutiexing on Apr 14, 2022
```
* executor perf statistics

* fix ut

* fix ut

* fix ut

* add ut

* add ut
```
cbe7466f

Fix to #38693 (minimal UT) (#41026) · d0f3296b

Committed by Jacek Czaja on Apr 14, 2022

* Add UT

- Added missed data_layout

- Added missing conversions

- NDHWC added

- NDHWC support in data_transform

- another fix

- candidate change

- fix

- fix

- fix

- fix

- fix

- fix

- fix to hack

- compilation fix

- fix to automatic merge

* - reduced UT

* - fix

* - lint

* - fix to lint

d0f3296b

FC+elementwise_add (residual connection) (#41776) · 92d8d0bc

Committed by Sławomir Siwek on Apr 14, 2022

* Change tensor name to match activation

* declare fc_eltwise_add pass

* merge conv_eltwise refactor PR

* first compilable draft

* unittest feedback tools

* Fuse pass tester

* Move IsReachable() to shared file

* 100% coverage of fuse_pass_tester.cc

* register pass

* Add bias node

* Improve unit tests / remove bias node from pattern

* improve fc_eltwiseadd_unittest

* cancel eltwise_add fuse if act is already fused

* Add elementwise_input scale

* Residual MVP

* Add new FC attrs

* Add more test cases

* Add missing op attrs

* Adapt code to new Elementwise pattern

* reuse existing fcpattern

* improve code style

* remove unused arguments

* fix typo

* remove whitespace

* remove int8 related code

* Remove attributes from base ops

* style

* style check

* Remove input from base op

* Set attribute during fuse

* ut timeout

* download and test model

* DRY

* apply feedback from review

* Style check

* fix typo

* cosmetic changes

* explicitly set residual as output

* VIT-OCR accuracy check

* trigger CI

* remove whitespaces

* fix missing data file

92d8d0bc


fix bug of ps_py_proto can't find path for the folder not created (#41793) · 6dc881e9
Committed by Sing_chan on Apr 14, 2022

6dc881e9

support multi layer and bidirection of lstm_grad, *test=kunlun (#41742) · 8b07ce0e

Committed by z8hanghuan on Apr 14, 2022

* support multi layer and bidirection of lstm_grad, *test=kunlun

* support multi layer and bidirection of lstm_grad, *test=kunlun

8b07ce0e


fix bug of set cuda lib in demo_ci and infer_ut (#41677) · bda4965a
Committed by Sing_chan on Apr 14, 2022

bda4965a

[DoubleGrad] Enabled test_autograd_functional_dynamic.py under eager mode (#41668) · ad9585b6

Committed by Zhanlue Yang on Apr 14, 2022

* [DoubleGrad] Enabled double grad test cases in eager_mode for test_imperative_double_grad

* Fixed elementwise issue

* Addressed CI failures

* [DoubleGrad] Enabled test_imperative_triple_grad test cases under eager_mode

* [DoubleGrad] Enabled test_autograd_functional_dynamic.py under eager mode

* Enabled more test cases

* Fixed performance issues

* Fixed minor issue

ad9585b6


remove inner_place using (#41768) · de2a3942
Committed by Chen Weihang on Apr 14, 2022

de2a3942

[fix bug] communication op support rccl (#41763) · e26e51ba
Committed by xiayanming on Apr 14, 2022

e26e51ba

support weakref for eager tensor (#41769) · 419d8eb2
Committed by zhangbo9674 on Apr 14, 2022

419d8eb2

add mkldnn int8 pass [step3] (#41599) · 8e2d4d30

Committed by baoachun on Apr 14, 2022

* add mkldnn int8 pass [step3]

* Add test for compute_propagate_scales_mkldnn_pass

* update pass

* update api comment and python api
Co-authored-by: wozna <joanna.wozna@intel.com>

8e2d4d30

Added shuffle_channel BF16/FP32 FWD oneDNN kernel (#39756) · c7623d72

Committed by jakpiase on Apr 14, 2022

* added shuffle_channel bf16/fp32 fwd kernel

* added missing files

* CI fix

* changed from pten to phi

* tmp save

* added reviewers suggestions

* fix for test

c7623d72

13 Apr, 2022 8 commits

Lml/add prim ops (#41201) · 97dec7ca

Committed by levi131 on Apr 13, 2022

* native commit for triple grad of sigmoid

* Updated unittests files

* init functional jacobian api

* Updated trible_test func

* Updated gradient_checker & test_script

* finish test with dtype float32

* add float64 test case

* polish code

* use atol=1e-5 with dtype float64

* fix for ci

* set timeout for test_jacobian

* fix dygraph grad to support high differential

* polish API docstring

* Updated gradient checker and some related files

* fix double grad strip error for high differential

* fix double grad strip error for high differential

* Add Sigmoid triple grad tests

* fix dygraph double grad dtype error when calling for high differential scenario

* Updated triple grad tests func

* Use np.random to initialize ddx

* Updated triple_grad_check func

* add todo for gradient checker and refine some comments

* remove additional code

* add test for warning in backward.py

* format python code

* support multi input in triple gradient checker

* Add matmul triple grad kernel

* Updated comments of TODO

* Supported some special tests

* Change code-format to follow CI std

* Updated gradient_checker.py

* Fix conflicts

* Removed unnecessary printing log

* Change code style to follow CI std

* merge upstream

* add_p

* rm useless files

* add sub_p mul_p div_p

* add sqrt_p and tanh_p

* add reshape_p

* add broadcast_p

* add broadcast_p fill_constant_p matmul_p reduce_p reshape_p transpose_p

* add split_p and concat_p

* add gather_p and scatter_add_p

* add slice_select_p and slice_assign_p

* add multi input check for add_p, sub_p, mul_p, div_p

* update concat_p

* refine gather_p and scatter_add_p

* refine slice_assign_p and slice_select_p

* add 9 test for prim ops

* add more test and fix some bug

* add more test

* register proto

* add shape valid check for broadcast_p op, and add keepdim attr into reduce_p op proto

* support multi input and multi output for split_p and concat_p

* fix slice bug for slice_select_p and slice_assign_p

* dtype for axis attr should be long int

* update dtype for axis attr int64_t

* update for iscan CI

* add more shape and dtype check

* change IndexTensor into int32 dtype

97dec7ca

the one ps proto (#41659) · b12af9e1

Committed by wangguanqun on Apr 13, 2022

* the one ps proto

* the one ps proto

* fix

* fix

* fix

* fix windows ci

* fix windows ci

* add dependency

* add dependency

b12af9e1

[XPUPS]add support for kunlun2 (#40985) · c9c03e7b
Committed by zmxdream on Apr 13, 2022
```
[XPUPS]add support for kunlun2
Co-authored-by: WorgenZhang <frank08081993@gmail.com>
```
c9c03e7b
fix new dygraph record event (#41715) · ca4aea2c
Committed by chenjian on Apr 13, 2022
```
* fix new dygraph record event

* refine name

* fix

* fix

* fix according to review
```
ca4aea2c

Use densetensor instead of Tensor for ProcessGroup (#41403) · 1e56ca8a
Committed by lilong12 on Apr 13, 2022

1e56ca8a

init roll convert (#41689) · 14c3c450

Committed by feng_shuai on Apr 13, 2022

* init roll convert

* add ut for roll convert

* roll convert don't support trt6.0

* fix: change ut for trt 7.0.0.1

14c3c450

Add yaml and unittest for SGD (#41485) · 6d1e03a2
Committed by zyfncg on Apr 13, 2022
```
* add sgd yaml

* change python api

* open eager mode in sgd

* fix bug
```
6d1e03a2
Revert "[Phi] Support construct Scalar by using Non-CPU Tensor (#41528)" (#41740) · 404c4a6b
Committed by tianshuo78520a on Apr 13, 2022
```
This reverts commit fe214af2.
```
404c4a6b

Crayon鑫 / Paddle is in sync with the fork's source project
