提交 · 64afa638cfa61a2e2716494ea38f54e32c22fc85 · PaddlePaddle / Paddle

25 8月, 2022 4 次提交

R
[NPU] add run_program_op_npu (#45349) · 64afa638
由 ronnywang 提交于 8月 25, 2022
```
* [NPU] add run_program_op_npu

* add run_program_op_npu ut
```
64afa638

optimize conv algo cache (#41891) · 1cd7e68b

由 hong 提交于 8月 25, 2022

* optimizer conv alog speed

* code polish

* remove useless code

* fix compile error

* fix cpu compile error

* not use cudnn alog t

* add search cache max number

* polish code

* fix cache test bug

* add groups data format to conv args

* fix cache test bug

* fix cudnn_deterministic bug

* fix test switch auto tune bug

* fix test swith autotune bug;

* fix conv cache bug

* fix cache test error

* fix cache test bug

* fix windows mac compile error

* fix workspace search error

* update cudnn cache

* fix cache test bug; test=develop

* fix autotune swith test error

* polish code

* oplish code

1cd7e68b

R

[triu_indices] add triu_indices_op (#45168) · a410c397
由 Rayman 提交于 8月 25, 2022

a410c397
U

fix roi_align_op_npu to pass the unittest (#45310) · 256bf6ff
由 USTCKAY 提交于 8月 25, 2022

256bf6ff

24 8月, 2022 6 次提交

make tensor_util contains no cuda code (#45256) · 78916a7a

由 Leo Chen 提交于 8月 24, 2022

* make tensor_util contains no cuda code

* refine isfinite

* revert ut

* move isfinite function to its op

* fix test

* fix compile

* std::isnan is not defined for int type on windows

* fix windows compile

* fix fp16

* fix rocm compile

* revert gradient node

78916a7a

W

Adapt tensor axis for cumsum (#45372) · 7f49b9ba
由 WangZhen 提交于 8月 24, 2022

7f49b9ba

Support fp16 of adam operator in xpu environment (#45292) · a012d426

由 mengqingchun02 提交于 8月 24, 2022

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support fp16 of adam operator in xpu environment. test=kunlun

* support fp16 of adam operator in xpu environment. test=kunlun

* support fp16 of adam operator in xpu environment. test=kunlun

a012d426

W
[OpAttr]Adapt tensor minlength for bincount (#45342) · 12917c8c
由 WangZhen 提交于 8月 24, 2022
```
* Adapt minlength attr for bincount
```
12917c8c
W
trt input tensor issue (#45358) · 55393172
由 wenbin 提交于 8月 24, 2022
```
* fix

* optimize
```
55393172
Z

[MLU]: compile prior box CPU kernel when WITH_MLU is ON for SSD (#45238) · 596a9405
由 zhaoying9105 提交于 8月 24, 2022

596a9405

23 8月, 2022 2 次提交
- N
  
  Delete the template parameter BLockSize in Kernel Primitive API (#45220) · 1a0cd447
  由 niuliling123 提交于 8月 23, 2022
  
  1a0cd447
- Y
  [Phi]Move distribute_fpn_proposals to PHI (#45212) · 8f8ed7de
  由 YuanRisheng 提交于 8月 23, 2022
```
* move distribute_fpn_proposals

* fix some code

* fix yaml bugs

* add set dtype

* move proposal_impl to funcs

* fix compile bugs
```
  8f8ed7de
20 8月, 2022 1 次提交
- S
  【autograd】add max_p primitive operator for new autograd (#45178) · 197f4048
  由 Sing_chan 提交于 8月 20, 2022
```
* add max_p without test

* add test of max_p

* make max_p consistent with paddle.maximum
```
  197f4048
19 8月, 2022 5 次提交

H

polish REGISTER_OPERATOR parameter of fill_any (#45263) · 1c4134f6
由 HongyuJia 提交于 8月 19, 2022

1c4134f6
H

[XPU] c_allreduce support int. update bkcl to 1.0.5. test=kunlun (#45248) · 9f1f1b0a
由 houj04 提交于 8月 19, 2022

9f1f1b0a

Make up beam_search_decode operator test cases on xpu and cpu environment (#45264) · 3deab77f

由 mengqingchun02 提交于 8月 19, 2022

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* make up beam_search_decode operator test cases on xpu and cpu environment. test=kunlun

3deab77f

[XPU] add merged_momentum unittest and change momentum (#45241) · e0f1c9f2

由 dongfangshenzhu 提交于 8月 19, 2022

* add merged_momentum *test=kunlun

* add merged_momentum *test=kunlun

* add fp16 to merged_momentum,*test=kunlun

* change dist_model.cc

* add merged_momentum unittest and  change momentum,test=kunlun

* add merged_momentum unittest and  change momentum,test=kunlun

* add merged_momentum unittest and  change momentum,test=kunlun

* add merged_momentum unittest and  change momentum,test=kunlun

e0f1c9f2

Support beam search decode op in XPU environment (#44917) · adaffb7b

由 mengqingchun02 提交于 8月 19, 2022

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* fix beam_search operator bugs on xpu. test=kunlun

* fix beam_search operator bugs on xpu. test=kunlun

* fix beam_search operator bugs on xpu. test=kunlun

* fix beam_search operator bugs on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

* support beam_search_decode operator on xpu. test=kunlun

adaffb7b

18 8月, 2022 2 次提交

apply buffer_shared_inplace_pass and inplace_addto_op_pass pass to program in... · d8d124b6

由 pangyoki 提交于 8月 18, 2022

apply buffer_shared_inplace_pass and inplace_addto_op_pass pass to program in Standalone Executor (#45085)

* apply inplace addto in python apply_pass

* fix

* apply inplace pass for program

* skip feed and fetch var

* fix block_desc.move_from

* fix block desc

* alltoall remove inplace

* fix

d8d124b6

A
[OpAttr]Squeeze axes support Tensor (#45189) · c93451f4
由 Aurelius84 提交于 8月 18, 2022
```
* [OpAttr]Squeeze axes support Tensor

* add support_tensor

* fix unittest

* fix coverage
```
c93451f4

17 8月, 2022 4 次提交
- A
  [OpAttr]Add SupportTensor for OpMaker with whitelist mechanism (#45084) · 2594935a
  由 Aurelius84 提交于 8月 17, 2022
```
* [OpAttr]Add SupportTensor for OpMaker

* fix typo

* fix code style

* add SupportTensor for concat op

* add unittest for register Tensor

* add shape checker and split attribute
```
  2594935a
- W
  fix multi stream error. (#45196) · a79d4a75
  由 Wilber 提交于 8月 17, 2022
```
* fix multi stream error.
```
  a79d4a75
- F
  
  [MLU] fix copy error (#45194) · 75690584
  由 fwenguang 提交于 8月 17, 2022
  
  75690584
- Y
  add instance norm op for xpu (#45097) · 216d25ac
  由 ykkk2333 提交于 8月 17, 2022
```
* xpu unittest grad compute supports more types, *test=kunlun

* add instance norm xpu, *test=kunlun
```
  216d25ac
16 8月, 2022 5 次提交

[Phi] Move amp ops into phi (#45079) · b4f67757

由 Chen Weihang 提交于 8月 16, 2022

* move check finite and unscale kernel into phi

* move infershape into phi

* move update_loss_scaling kernel into phi

* remove original kernels

* move update loss scaling infershape into phi

* add header for xpu and npu

* solve coverage failed

* fix npu test failed

* remove mutable data in cu file

* fix new executor failed

* add valid check for meta tensor output

b4f67757

convert multihead to oss (#45019) · f706d95d

由 feng_shuai 提交于 8月 16, 2022

* convert multihead to oss

* fix:bug

* fix:delete const cast

* fix:don't support bias_qk

* add vit pass

* fix:convert bug and add preln_residual_bias

* support length=-1

* add UT for convert

* add no_bias_qk support for gpu_multihead_op

* delete infer_shape depends on bias_qk

* oss just can be used in T4 and A*

* fix:change api for ROCM CI

f706d95d

A

support fp16 softmax on custom place (#45177) · a0bbfbd4
由 Aganlengzi 提交于 8月 16, 2022

a0bbfbd4
F
Fix problem that the shape of tensor is not inited correctly when backward in static graph (#45030) · e26f80ad
由 feifei-111 提交于 8月 16, 2022
```
* fix_shape

* code style

* fix assert

* fix to_tensor badreturn
```
e26f80ad

【autograd】add select_p、eq_p、pow_p primitive operator for new autograd (#44813) · b681c88c

由 Sing_chan 提交于 8月 16, 2022

* add select_p

* fix bugs

* add custom test for select_p; modify select_p primrules

* modify according to xiaoxu's comment

* add eq_p, select_p, pow_p, use autograd to test grad of high order

* add requirement of autograd, modify expected type of eq

* modify according to xiaoxu's comment

* import primops to use primops.pow

b681c88c

15 8月, 2022 4 次提交
- Y
  
  fused_embedding_eltwise_layernorm_op and skip_layernorm_op support fp16 (#44969) · ac0553a0
  由 Yuanle Liu 提交于 8月 15, 2022
  
  ac0553a0
- Z
  
  add mish and mish_grad for XPU, test=kunlun (#45098) · 6815c8ab
  由 zhangyikun02 提交于 8月 15, 2022
  
  6815c8ab
- H
  [XPU] add some collective ops. (#45049) · 7e2a20d5
  由 houj04 提交于 8月 15, 2022
```
* [XPU] add some collective ops. test=kunlun

* use XPUOpTestWrapper. test=kunlun

* skip kl1 for collective ops. fix typo: deivce -> device. test=kunlun
```
  7e2a20d5
- W
  convert_fp16 support multi block (#45050) · 9aecf286
  由 Wilber 提交于 8月 15, 2022
```
* convert_fp16 support multi block

* update

* update
```
  9aecf286
12 8月, 2022 6 次提交

Offload calculations from matmul op to fuse pass (#44941) · acb78ea2

由 Sławomir Siwek 提交于 8月 12, 2022

* remove v2_transpose_reshape

* matmul_transpose_reshape

* reshape_transpose_matmul

* Add int8 support for matmulV2

* restore ut

* adjust old ut

* restore parallel UT ruels

* remove mkldnn code from base ops

* move enforces to pass

* remove duplicated functions

* delete duplicated enforces

* feedback from review

* add comments to variables

* enable eltwise support

* dynamic attribute

* remove fusepass tests from op test

* remove fuse pass cases from op test

* revert introduction of dynamic attributes

* style
Co-authored-by: Nwozna <joanna.wozna@intel.com>

acb78ea2

transfer memcpy_h2d from fluid to phi (#44932) · 7bc57d35

由 kangguangli 提交于 8月 12, 2022

* transfer memcpy_h2d from fluid to phi

* use UnchangedInferMeta instead

* restore test_standalone_executor

* add newline to fix codestyle check

* rename pt -> phi

* simplify logic and add check

* make the comment more clear

* remove useless comment

* refine code

7bc57d35

Y
trt engine input data type should be consistent with trt input bindin… (#45103) · a3eb341e
由 Yuanle Liu 提交于 8月 12, 2022
```
* trt engine input data type should be consistent with trt input bindings type

* fix some bugs

* fix some bugs

* fix some bugs
```
a3eb341e
D
enhance grid_sampler to support 3d input (#45015) · 1773fbba
由 duanyanhui 提交于 8月 12, 2022
```
* enhance grid_sampler to support 3d input
```
1773fbba
Z

fix extra output of kernels for inference (#45048) · 1cb883da
由 zyfncg 提交于 8月 12, 2022

1cb883da

[geometric]Add paddle.geometric.send_ue_recv API (#43174) · 615b15a3

由 Siming Dai 提交于 8月 12, 2022

* add init file

* add op definition and infermeta

* add kernel definition funcs

* add broadcast infer shape

* add gpu forward kernel

* delete SUB and DIV

* add x_grad

* add template

* add e_grad for min and max

* fix small bug

* temp commit

* temp commit

* add e_grad for sum and mean

* fix some compile bug

* fix compile bugs

* fix compile problem

* add sum forward unittest

* fix broadcast error, add kernel sig, register e_grad, change unit test

* fix grad

* add temp grad fix

* temp commit

* add min max unittest

* add max, min unittest, fix mul bug

* add cpu forward sum and mean

* add forward min max, fix mean unittest

* add cpu backward min max

* fix code-style

* add backward sum mean

* fix rocm ci

* set uniitest timeout

* fix bug of x broadcast to e, gpu grad

* fix bug of x broadcast to e, cpu grad

* rename BOOST_GET_CONST macro

* fix rocm ci

* mv graph_send_e_recv to graph_send_ue_recv

* move out_size to IntArray

* add eager op test

* fix max pool type bug, add unittest for api

* revise api doc

* add fp16 for atomic min and max, add unittest

* add unittest

* add fp16 support for graph_send_recv

* fix unittest fp16 bug

* change OutSizeTensor to Out_size

* move E to Y

* add copyright, fix comment

* review code

* fix thread block size

* fix thread block size

* change api attribute name: pool_type to reduce_op, compute_type to message_op

* change api attribute name, move pool_type to reduce_op, move compute_type to message_op

615b15a3

11 8月, 2022 1 次提交
- C
  make affine_grid_op support 5d input_dim on cpu and gpu (#45012) · 7812522c
  由 carryyu 提交于 8月 11, 2022
```
* make affine_grid_op support 5d_input on cpu and gpu
```
  7812522c

PaddlePaddle / Paddle 大约 1 年 前同步成功

PaddlePaddle / Paddle
大约 1 年前同步成功