1. 18 April 2022 (7 commits)
  2. 17 April 2022 (3 commits)
    • XPUPS Adaptation (#40991) · 0ef3ef28
      Committed by Fan Zhang
      * Adapt XPUPS - 1st version - 3.24
      
      * Adapt XPUPS - update XPU PushSparse -  2nd version - 3.24
      
      * Adapt XPUPS - add XPU PullSparseOp - 3nd version - 3.25
      
      * refactor heter comm kernel
      
      * update. test=develop
      
      * Adapt XPUPS - modify by compilation - 4th version - 3.27
      
      * update calc_shard_offset. test=develop
      
      * update xpu kernel. test=develop
      
      * update args of calc_shard_offset
      
      * update. test=develop
      
      * remove customGradMerger
      
      * update. test=develop
      
      * heter_comm update
      
      * heter_comm update
      
      * update calc_shard_offset. test=develop
      
      * heter_comm update
      
      * update args of calc_shard_offset
      
      * update. test=develop
      
      * remove customGradMerger
      
      * update. test=develop
      
      * fix. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update optimizer kernel
      
      * Adapt XPUPS - use WITH_XPU_KP and modify wrapper kernel function - 5th version - 3.30
      
      * update. test=develop
      
      * update pslib.cmake
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * Adapt XPUPS - modify by kp compilation  - 6th version - 3.30
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update optimizer kernel
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * used by minxu
      
      * update heter_comm_inl
      
      * fix. test=develop
      
      * Adapt XPUPS - modify by kp compilation  - 7th version - 3.30
      
      * fix. test=develop
      
      * add optimizer kernel. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * 3.31 update
      
      * Adapt XPUPS - update kp compilation path  - 8th version - 3.31
      
      * add optimizer kernel. test=develop
      
      * fix kunlun not support size_t. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix kunlun not support size_t. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * update heter_comm_kernel.kps 3.31
      
      * fix. test=develop
      
      * fix. test=develop
      
      * update heter_comm_kernel.kps 3.31
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * update heter_comm.h 3.31
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * update hashtable. test=develop
      
      * update. test=develop
      
      * Adapt XPUPS - update by kp compilation  - 9th version - 4.1
      
      * update hashtable. test=develop
      
      * fix. test=develop
      
      * update hashtable 4.1
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * Adapt XPUPS - update by kp compilation  - 10th version - 4.1
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * update. test=develop
      
      * modify by compilation 4.1
      
      * update. test=develop
      
      * update. test=develop
      
      * fix. test=develop
      
      * modify by compilation 4.1
      
      * update. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * modify by compilation 4.1
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * modify by compilation 4.1 19:30
      
      * fix. test=develop
      
      * update ps_gpu_wrapper.kps 4.1
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * Adapt XPUPS - update by kp compilation  - 11th version - 4.1
      
      * fix. test=develop
      
      * Adapt XPUPS - update by kp compilation  - 12nd version - 4.2
      
      * fix. test=develop
      
      * fix. test=develop
      
      * modify by compilation 4.2
      
      * 4.2 update
      
      * fix. test=develop
      
      * template init. test=develop
      
      * update 4.6
      
      * fix. test=develop
      
      * template init. test=develop
      
      * 4.6 modify by compilation
      
      * hashtable template init. test=develop
      
      * hashtable template init. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=devlop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=devlop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * Adapt XPUPS - update by kp compilation  - 13nd version - 4.7
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * 4.11 update
      
      * fix. test=develop
      
      * fix. test=develop
      
      * 4.11 update
      
      * update by pre-commit
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * 4.12 update
      
      * fix. test=develop
      
      * Adapt XPUPS - update by kp compilation  - 14th version - 4.13
      
      * 4.13 update
      
      * 4.14 update
      
      * 4.14 update
      
      * 4.14 update
      
      * 4.14 modify by merged latest compilation
      
      * retry CI 4.14
      
      * 4.15 pass static check
      
      * 4.15 modify by gpups CI
      
      * 3.16 update by gpups CI - modify ps_gpu_wrapper.h
      
      * 4.16 update
      
      * 4.16 pass xpu compile
      
      * 4.16 retry CI
      
      * 4.16 update
      Co-authored-by: zmxdream <zhangminxu01@baidu.com>
    • [Perf] Optimize dygraph scheduling performance (#41696) · 7ee31a96
      Committed by Chen Weihang
      * split phi and fluid infermeta context
      
      * resolve conflict
      
      * fix type error
      
      * optimize scheduling perf
      
      * spec small vector size
      
      * replace all grad var name
      
      * fix test failed
      
      * move init defalut signature
      
      * polish details
      
      * polish details
      
      * fix no init bug
      
      * init sig for tests
      
      * add init sig for infer
      
      * fix infrt error
      
      * fix infrt failed
      
      * fix kunlun error
      
      * fix infrt failed
    • [CustomOp] Fix PlaceType related compat error (#41826) · b5d9c31c
      Committed by Chen Weihang
      * fix place type related compat error
      
      * fix test failed
      
      * remove dll decl
      
      * revert place type change
      
      * add dll decl
  3. 16 April 2022 (5 commits)
    • move fc_functor from fluid to phi.test=develop (#41856) · 21aa3adc
      Committed by 王明冬
    • modify xpu.cmake,*test=kunlun (#41832) · f3753b7f
      Committed by z8hanghuan
      * modify xpu.cmake,*test=kunlun
      
      * modify xpu.cmake,*test=kunlun
      
      * modify xpu.cmake,*test=kunlun
      
      * modify xpu.cmake,*test=kunlun
    • fix_sharding_copy_right (#41849) · 5e5ae0a0
      Committed by Baibaifan
    • Moe ref (#41864) · e9a63237
      Committed by Roc
      * moe ref
      
      * ref commit; test=document_fix
      
      * update; test=document_fix
      
      * update test=document_fix
      
      * update; test=document_fix
    • Lml/prim op pywrapper (#41813) · ebf4fe6e
      Committed by levi131
      * native commit for triple grad of sigmod
      
      * Updated unittests files
      
      * init functional jacobian api
      
      * Updated trible_test func
      
      * Updated gradient_checker & test_script
      
      * finish test with dtype float32
      
      * add float64 test case
      
      * polish code
      
      * use atol=1e-5 with dtype float64
      
      * fix for ci
      
      * set timeout for test_jacobian
      
      * fix dygraph grad to support high differential
      
      * polish API docstring
      
      * Updated gradient checker and some related files
      
      * fix double grad strip error for high differential
      
      * fix double grad strip error for high differential
      
      * Add Sigmoid triple grad tests
      
      * fix dygraph double grad dtype error when calling for high differential senario
      
      * Updated triple grad teses func
      
      * Use np.random to initialize ddx
      
      * Updated triple_grad_check func
      
      * add todo for gradient checker and refine some comments
      
      * remove additional code
      
      * add test for warnging in backward.py
      
      * format python code
      
      * support multi input in triple gradient checker
      
      * Add matmul triple grad kernel
      
      * Updated comments of TODO
      
      * Supported some special tests
      
      * Change code-format to follow CI std
      
      * Updated gradient_checker.py
      
      * Fix conflicts
      
      * Removed unnecessary printing log
      
      * Change code style to follow CI std
      
      * merge upstream
      
      * add priops.py
      
      * add_p
      
      * rm useless files
      
      * add sub_p mul_p div_p
      
      * add sqrt_p and tanh_p
      
      * add reshape_p
      
      * add broadcast_p
      
      * Add python primitive wrappers.
      
      * Jvp rules updated.
      
      * JVP rules done for all the 17 primops.
      
      * quick check and fixes.
      
      * add jvp(op, *args)
      
      * add broadcast_p fill_constant_p matmul_p reduce_p reshape_p transpose_p
      
      * add split_p and concat_p
      
      * add gather_p and scatter_add_p
      
      * add slice_select_p and slice_assign_p
      
      * Add transpose rules.
      
      * add multi input check for add_p, sub_p, mul_p, div_p
      
      * update concat_p
      
      * Linearize and transpose in progress..
      
      * refine gather_p and scatter_add_p
      
      * updated.
      
      * update transpose.
      
      * refine slice_assign_p and slice_select_p
      
      * init commit for lower
      
      * Merged with primitive ops.
      
      * small update
      
      * add rules for orig2prim and prim2orig
      
      * add 9 test for prim ops
      
      * add more test and fix some bug
      
      * add more test
      
      * register proto
      
      * Adding primops test.
      
      * add shape valid check for broadcast_p op, and add keepdim attr into reduce_p op proto
      
      * support multi input and multi output for split_p and concat_p
      
      * Test updated.
      
      * update
      
      * fix slice bug for slice_select_p and slice_assign_p
      
      * updated.
      
      * Ops updated.
      
      * Refactor and bug fixes.
      
      * updated.
      
      * finish orig2prim and prim2orig rules
      
      * dtype for axis attr should be long int
      
      * update dtype for axis attr int64_t
      
      * update for iscan CI
      
      * Update primx.
      
      * Refactor vars in primx.
      
      * update for lower transform
      
      * update primx.py
      
      * update
      
      * Fix linearize and transpose.
      
      * Update is_dot
      
      * Update is_dot
      
      * Update is_dot
      
      * add gradient aggregation, fix add_transpose.
      
      * pass first linearize+transpose test.
      
      * update test
      
      * add_prim_op_pywrapper
      
      * Add primops UT
      
      * Fix set_value and update
      
      * Fix code format and PR-CI-Coverage
      Co-authored-by: veyron95 <veyron_wu@163.com>
      Co-authored-by: Jiabin Yang <360788950@qq.com>
      Co-authored-by: Tongxin Bai <waffle.bai@gmail.com>
      Co-authored-by: 0x45f <wangzhen45@baidu.com>
  4. 15 April 2022 (25 commits)
    • solve brpc compile in arm-ubantu18 (#41649) · 56dafc4f
      Committed by ziyoujiyi
      * back fl
      
      * delete ssl cert
      
      * make warning
      
      * unittest paral degree
      
      * solve unittest
      
      * heter & multi cloud commm ready
      
      * arm_brpc compile
      
      * only output is ok
      
      * base is ok
      
      * add switch server bin
      
      * adapt brpc ssl
    • gpu_graph engine optimization+ (#41455) · ce72690c
      Committed by seemingwang
      * extract sub-graph
      
      * graph-engine merging
      
      * fix
      
      * fix
      
      * fix heter-ps config
      
      * test performance
      
      * test performance
      
      * test performance
      
      * test
      
      * test
      
      * update bfs
      
      * change cmake
      
      * test
      
      * test gpu speed
      
      * gpu_graph_engine optimization
      
      * add ssd layer to graph_engine
      
      * fix allocation
      
      * fix syntax error
      
      * fix syntax error
      
      * fix pscore class
      
      * fix
      
      * recover test
      
      * recover test
      
      * fix spelling
      
      * recover
      
      * fix
    • Moe ref (#41836) · c37af19c
      Committed by Roc
      * moe ref
      
      * ref commit; test=document_fix
      
      * update; test=document_fix
      
      * update test=document_fix
    • e25b75b6
    • [Yaml]add adamw yaml (#41678) · ea0a164b
      Committed by chentianyu03
      * add adamw yaml
      
      * fix test case error
      
      * make the name of weight and bias in linear1 and linear2 to be constant
    • [Phi]Reduce kernels into multiply files (#41747) · 1927aff9
      Committed by chentianyu03
      * split reduce_kernel
      
      * rm reduce_kernel in cmake
      
      * split reduce_grad kernels
      
      * fix cmake build error
      
      * format code
      
      * fix standalone_executor_test error
    • [DoubleGrad] Enabled test_imperative_star_gan_with_gradient_penalty.py under eager mode (#41730) · 27f28e82
      Committed by Zhanlue Yang
      * [DoubleGrad] Enabled double grad test cases in eager_mode for test_imperative_double_grad
      
      * Fixed elementwise issue
      
      * Addressed CI failures
      
      * [DoubleGrad] Enabled test_imperative_triple_grad test cases under eager_mode
      
      * [DoubleGrad] Enabled test_autograd_functional_dynamic.py under eager mode
      
      * Enabled more test cases
      
      * [DoubleGrad] Enabled test_imperative_star_gan_with_gradient_penalty.py under eager mode
      
      * Adjusted test_imperative_star_gan_with_gradient_penalty.py
    • [Dygraph] Refactor Model Parallel in eager mode (#41761) · e6fb6599
      Committed by Haohongxiang
      * refactor mp in eager mode
      
      * update
      
      * update
      
      * add uts
    • ff818c77
    • update (#41762) · 482e5b6c
      Committed by lilong12
    • 【GPUPS】add afsclient and gpupsutil (#41324) · 30a1213b
      Committed by danleifeng
      * add gpupsutil and afsclient; test=develop
    • [MLU] add mlu softmax kernel (#41816) · 2d6b71a2
      Committed by fwenguang
    • Add eager string tensor (#41039) · a22b68b8
      Committed by Jack Zhou
      * Add core.eager.StringTensor __init__ which pyarray args can be passed
      
      * Add the numpy method of core.eager.StringTensor
      
      * revert tensor.to_string modification
      
      * Add ToPyObject for core.eager.StringTensor
      
      * Add debug string for core.eager.StringTensor
      
      * Remove place args of core.eager.StringTensor temporarily
      
      * Fix check string_tensor error
      
      * remove dtype of core.eager.StringTensor
      
      * add core.eager.StringTensor unittest
      
      * remove pstring from VarDesc
      
      * Add InitStringTensorWithStringTensor
      
      * Remove to_string modification
      
      * Remove zero_copy arg from StringTensor creator
    • [XPUPS]fix hashtable_kernel.kps (#41790) · ef6ff4ef
      Committed by zmxdream
      * refactor heter comm kernel
      
      * update. test=develop
      
      * update calc_shard_offset. test=develop
      
      * update xpu kernel. test=develop
      
      * update args of calc_shard_offset
      
      * update. test=develop
      
      * remove customGradMerger
      
      * update. test=develop
      
      * update. test=develop
      
      * fix. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update optimizer kernel
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * add optimizer kernel. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix kunlun not support size_t. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * update hashtable. test=develop
      
      * update. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * template init. test=develop
      
      * hashtable template init. test=develop
      
      * fix. test=develop
      
      * fix. test=devlop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix hashtable_kernel. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      Co-authored-by: WorgenZhang <frank08081993@gmail.com>
    • [IPU] add mixed-precission support for ipu (#41733) · d7224482
      Committed by Allen Guo
      * add mixed-precission support for ipu
      
      * restore cast_model_to_fp16 api
      
      * update UTs
    • polish tensor depreacted method warning (#41807) · e83e44c7
      Committed by Chen Weihang
    • Add API: Sparse Convolution3D (#41434) · 1665594d
      Committed by zhangkaihuo
    • support no_need_buffer in eager_fluid state (#41720) · 840d2eb6
      Committed by pangyoki
      * support no_need_buffer in eager_fluid state
      
      * change no_need_buffer info from fwd_info to bwd_info
      
      * fix CI fail, gru_unit donnot use no_need_buffer
      
      * fix conflict between no_need_buffer and dispensable
      
      * use tensor.define in dispensable
      
      * solve conflict
      
      * solve conflict
    • Change cuDNN Conv kernel for auto tune feature (#41313) · 35acfeda
      Committed by limingshu
      * change cudnn helper for auto-tune
      
      * Add FLAGS_use_autotune to set the global status of autotune and change the order of choosing algorithm.
      
      * Fix the bug in calculating and printing current step cache hit rate.
      
      * Improve the autotune cache and fix unittest.
      
      * Change the key from AlgorithmType to int64_t.
      
      * Fix unittest for cpu-only env.
      
      * change ChooseAlgoByWorkspace for heuristic mode
      Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>
    • [MLU] add mlu activation kernels (#41751) · 10114859
      Committed by fwenguang
    • [MLU] add mlu new profiler (#41138) · fc208b7e
      Committed by fwenguang
      * [MLU] add mlu new profiler
      
      * fix format
    • [Auto Parallel]update cluster (#41722) · 605552a9
      Committed by caozhou
      * update cluster
    • fix batch norm memory issue (#41717) · 42abcc08
      Committed by hong
      * try to fix batch norm memory issue
      
      * fix batch norm memroy alloc bug
      
      * polish some code