Commits · acb78ea20c9cc1a55889c5169290c8f178d72486 · PaddlePaddle / Paddle

Aug 12, 2022 · 17 commits

Offload calculations from matmul op to fuse pass (#44941) · acb78ea2

Committed by Sławomir Siwek on Aug 12, 2022

* remove v2_transpose_reshape

* matmul_transpose_reshape

* reshape_transpose_matmul

* Add int8 support for matmulV2

* restore ut

* adjust old ut

* restore parallel UT rules

* remove mkldnn code from base ops

* move enforces to pass

* remove duplicated functions

* delete duplicated enforces

* feedback from review

* add comments to variables

* enable eltwise support

* dynamic attribute

* remove fusepass tests from op test

* remove fuse pass cases from op test

* revert introduction of dynamic attributes

* style

Co-authored-by: Nwozna <joanna.wozna@intel.com>

[phi] Transfer linear_interp_v2 yaml to phi (#45072) · c737232f

Committed by HongyuJia on Aug 12, 2022

* support optional<vector<Tensor>> in yaml and eager

* delete useless comments in eager_gen.py

* fix api_base.py support optional<vector<Tensor>>

* python_c_gen.py support optional<vector<tensor>>

* transfer linear_interp_v2 yaml from fluid to phi

* fix op_test typo error

* change linear_interp_v2 testcase

* fix args in final_state_linear_interp_v2

* fix zeropad2d typo. test=document_fix

[Auto Parallel] Update reshard for auto search (#45002) · 8624f3b1

Committed by caozhou on Aug 12, 2022

* update reshard for auto search

* fix unittest bug

* update dist tensor

* update reshard output

* fix unittests bug

* merge develop

Add Quant Row&Column ParallelLinear (#44869) · 236ad4fc
Committed by Chang Xu on Aug 12, 2022

fix compilation (#45087) · 4eec94dd
Committed by Allen Guo on Aug 12, 2022

Fix concat and tile attribute for 2ONNX (#44658) · bb8203cd
Committed by Aurelius84 on Aug 12, 2022
```
* Fix concat and tile attribute for ONNX

* disable unittest
```
[Auto Parallel] Data Parallel Optimization Pass 1 (#44882) · 7aeec4ed
Committed by JZ-LIANG on Aug 12, 2022
```
* bugfix

* remove scaling

* support rescale_grad opt
```

[Eager] Support more final_state code (#44986) · cf17ae8a

Committed by Jiabin Yang on Aug 12, 2022

* support more final_state code

* support more final_state code

* fix api error

* fix norm error

* fix pool3d error

* revert pool3d and max_pool_3d_adaptive

* fix code check error

* fix norm problem

transfer memcpy_h2d from fluid to phi (#44932) · 7bc57d35

Committed by kangguangli on Aug 12, 2022

* transfer memcpy_h2d from fluid to phi

* use UnchangedInferMeta instead

* restore test_standalone_executor

* add newline to fix codestyle check

* rename pt -> phi

* simplify logic and add check

* make the comment more clear

* remove useless comment

* refine code

trt engine input data type should be consistent with trt input bindin… (#45103) · a3eb341e
Committed by Yuanle Liu on Aug 12, 2022
```
* trt engine input data type should be consistent with trt input bindings type

* fix some bugs

* fix some bugs

* fix some bugs
```

change default log level (#45093) · 34234282
Committed by hong on Aug 12, 2022

Remove some custom_impl api (#45066) · adb61b7b

Committed by zyfncg on Aug 12, 2022

* remove some custom_impl api and make them generated by yaml completely

* delete useless code

* fix adamw bug

* fix infermeta

* revert adamw

* polish code

* fix bug

refix index resize in multiclassnms3 (#45095) · 49e2a4d8
Committed by zhiboniu on Aug 12, 2022

[Auto Parallel] Pybind ProcessMesh and DeviceMesh (#45013) · 5bf3dec9

Committed by Yulong Ao on Aug 12, 2022

* [Auto Parallel] Pybind11 ProcessMesh and DeviceMesh

* [Auto Parallel] Fix the unittest problem

* [Auto Parallel] Explicitly add the src file for auto_parallel target

* [Auto Parallel] Add the proto dependency explicitly

* [Auto Parallel] Fix the cmake bug on windows and mac

* [Auto Parallel] Remove the pybind11 header file in process_mesh.h

enhance grid_sampler to support 3d input (#45015) · 1773fbba
Committed by duanyanhui on Aug 12, 2022
```
* enhance grid_sampler to support 3d input
```

fix extra output of kernels for inference (#45048) · 1cb883da
Committed by zyfncg on Aug 12, 2022

[geometric]Add paddle.geometric.send_ue_recv API (#43174) · 615b15a3

Committed by Siming Dai on Aug 12, 2022

* add init file

* add op definition and infermeta

* add kernel definition funcs

* add broadcast infer shape

* add gpu forward kernel

* delete SUB and DIV

* add x_grad

* add template

* add e_grad for min and max

* fix small bug

* temp commit

* temp commit

* add e_grad for sum and mean

* fix some compile bug

* fix compile bugs

* fix compile problem

* add sum forward unittest

* fix broadcast error, add kernel sig, register e_grad, change unit test

* fix grad

* add temp grad fix

* temp commit

* add min max unittest

* add max, min unittest, fix mul bug

* add cpu forward sum and mean

* add forward min max, fix mean unittest

* add cpu backward min max

* fix code-style

* add backward sum mean

* fix rocm ci

* set unittest timeout

* fix bug of x broadcast to e, gpu grad

* fix bug of x broadcast to e, cpu grad

* rename BOOST_GET_CONST macro

* fix rocm ci

* mv graph_send_e_recv to graph_send_ue_recv

* move out_size to IntArray

* add eager op test

* fix max pool type bug, add unittest for api

* revise api doc

* add fp16 for atomic min and max, add unittest

* add unittest

* add fp16 support for graph_send_recv

* fix unittest fp16 bug

* change OutSizeTensor to Out_size

* move E to Y

* add copyright, fix comment

* review code

* fix thread block size

* fix thread block size

* change api attribute name: pool_type to reduce_op, compute_type to message_op

* change api attribute name, move pool_type to reduce_op, move compute_type to message_op

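The commit notes above describe `send_ue_recv` as combining node features with edge features through a `message_op` and then aggregating the messages per destination node through a `reduce_op`. The following is a minimal NumPy sketch of that idea, not Paddle's implementation. The function name `send_ue_recv_reference`, the argument names `src_index` and `dst_index`, the restriction of `message_op` to `add` and `mul` (suggested by the "delete SUB and DIV" note), the default output size, and the choice to return zeros for nodes that receive no message are all assumptions made for illustration.

```python
import numpy as np


def send_ue_recv_reference(x, y, src_index, dst_index,
                           message_op="add", reduce_op="sum", out_size=None):
    """Hypothetical reference: gather x by source, combine with y, reduce by destination."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    src_index = np.asarray(src_index)
    dst_index = np.asarray(dst_index)

    # Step 1: one message per edge, built from the source node feature and the edge feature.
    gathered = x[src_index]
    if message_op == "add":
        messages = gathered + y
    elif message_op == "mul":
        messages = gathered * y
    else:
        raise ValueError(f"unsupported message_op: {message_op}")

    # Assumption: without out_size, the output has one row per input node.
    num_out = x.shape[0] if out_size is None else out_size
    out_shape = (num_out,) + messages.shape[1:]
    counts = np.bincount(dst_index, minlength=num_out)

    # Step 2: reduce all messages that share a destination node.
    if reduce_op in ("sum", "mean"):
        out = np.zeros(out_shape)
        np.add.at(out, dst_index, messages)
        if reduce_op == "mean":
            divisor = np.maximum(counts, 1).reshape((-1,) + (1,) * (out.ndim - 1))
            out = out / divisor
    elif reduce_op == "min":
        out = np.full(out_shape, np.inf)
        np.minimum.at(out, dst_index, messages)
        out[counts == 0] = 0.0  # assumption: nodes without messages get zeros
    elif reduce_op == "max":
        out = np.full(out_shape, -np.inf)
        np.maximum.at(out, dst_index, messages)
        out[counts == 0] = 0.0  # assumption: nodes without messages get zeros
    else:
        raise ValueError(f"unsupported reduce_op: {reduce_op}")
    return out


x = [[0, 2, 3], [1, 4, 5], [2, 6, 7]]   # 3 nodes, 3 features each
y = [[1], [1], [1], [1]]                # 4 edges, 1 feature each (broadcast over columns)
src = [0, 1, 2, 0]
dst = [1, 2, 1, 0]
result = send_ue_recv_reference(x, y, src, dst, message_op="add", reduce_op="sum")
print(result)
```

Using `np.add.at`, `np.minimum.at`, and `np.maximum.at` keeps the reduction correct when several edges point at the same destination, which plain fancy-index assignment would not.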

Aug 11, 2022 · 12 commits
- [XPU] fix typo in unit tests. (#45081) · 9b35f035
  Committed by houj04 on Aug 11, 2022
- make affine_grid_op support 5d input_dim on cpu and gpu (#45012) · 7812522c
  Committed by carryyu on Aug 11, 2022
```
* make affine_grid_op support 5d_input on cpu and gpu
```
- add new ptq method ptf (#44246) · f4bc69ec
  Committed by handiz on Aug 11, 2022
```
* add new ptq method ptf

* add post training quantization mobilenetv1 test for ptf

* add post training quantization mobilenetv1 test for ptf test=allcases
```
- Disable ExecutionStrategy UTs for standalone executor (#45067) · f901f020
  Committed by Ruibiao Chen on Aug 11, 2022
- [PaddlePaddle Hackathon 3 No.17] Add sgn to Paddle (#44568) · f7a0bfa1
  Committed by peachlcy on Aug 11, 2022
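The entry above only names the new `sgn` operator. As a rough illustration, and assuming it follows the usual definition (the sign for real inputs, `x / |x|` for complex inputs, and 0 where the input is 0), the behavior can be sketched in NumPy. The function name `sgn_reference` is hypothetical and this is not the code from the commit.

```python
import numpy as np


def sgn_reference(x):
    """Hypothetical reference: sign for real input, unit-modulus value for complex input."""
    x = np.asarray(x)
    if np.iscomplexobj(x):
        magnitude = np.abs(x)
        # Avoid dividing by zero; zero inputs are mapped to zero below.
        safe_magnitude = np.where(magnitude == 0, 1.0, magnitude)
        return np.where(magnitude == 0, 0, x / safe_magnitude)
    return np.sign(x)


print(sgn_reference([-2.5, 0.0, 7.0]))
print(sgn_reference([3 + 4j, 0j, -2j]))
```

For complex inputs every nonzero result keeps the input's angle and has absolute value 1.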
- Refine cpups cmake (#45055) · 0dd895d2
  Committed by zhaocaibei123 on Aug 11, 2022
```
* first refine

* second refine

* remove some unused code
```
- Add input shape record for new dygraph operator (#44999) · 8ea83400
  Committed by chenjian on Aug 11, 2022
```
* fix

* add control flag and input shapes for new dygraph

* fix file mode

* improve code coverage

* fix a bug in statistic

* fix according to review

* optimize performance

* fix
```
- launch support ip port (#45052) · 35166902
  Committed by kuizhiqing on Aug 11, 2022
- Fix submanifold conv (#45060) · 27e3b06f
  Committed by zhangkaihuo on Aug 11, 2022
```
* fix submanifold conv
```
- Change bias to persistable in preln_residual_bias_fuse_pass (#45037) · 26c573de
  Committed by whs on Aug 11, 2022
- Polish black_ops_list logic in eager_gen (#44188) · 49d2a778
  Committed by Weilong Wu on Aug 11, 2022
```
* Polish black_ops_list logic in eager_gen

* update black_ops_list
```
- [Eager] use final_state_full / *full_ instead fill_constant under eager mode (#45044) · b61d8f77
  Committed by Weilong Wu on Aug 11, 2022
```
* [Eager] use final_state_fill_constant_

* fill_constant use str_value

* add fill_constant_ to no_amp_list

* use float(value) as input

* support final state full_ same as fill_constant

* [Eager] use final_state_full / *full_ instead fill_constant under eager

* polish code

* fix mistakes
```
Aug 10, 2022 · 11 commits
- [Paddle Inference]Disable skip layernorm half (#45047) · 4805da50
  Committed by Wangzheee on Aug 10, 2022
```
* disable_skip_layernorm_fp16
```
- fix mkldnn interpolate ops (#45008) · 3f49817a
  Committed by yeliang2258 on Aug 10, 2022
- polish backend and layout details (#45029) · 35839aee
  Committed by Chen Weihang on Aug 10, 2022
- 1. change the codegen code to avoid conversion from heterogeneous 'initializer... · 083b4eb6
  Committed by Feiyu Chan on Aug 10, 2022
```
1. change the codegen code to avoid conversion from heterogeneous 'initializer list' to tuple, which fails on gcc 5.4; (#45036)

2. add a template CheckTensorHasNanOrInf to handle arbitrary tuple of supported types.
```
- [BugFix]Fix save/load_inference_model API BUG while program contains no param (#45038) · aa42bc25
  Committed by Aurelius84 on Aug 10, 2022
- [phi] migration of class center sample infermeta (#45025) · b1e33bea
  Committed by duanboqiang on Aug 10, 2022
```
* add class center sample infershape

* add yaml

* modify unittest

* modify unittest

* remove comment
```
- add macro control in enforce_xpu.h, test=kunlun (#45022) · 9e74211f
  Committed by zhangxiaoci on Aug 10, 2022
```
* add macro control in enforce_xpu.h, test=kunlun

* minor bugfix

* minor bugfix
```
- fix bug of adaptive pool2d_grad, *test=kunlun (#45031) · 01d05bc0
  Committed by z8hanghuan on Aug 10, 2022
```
* fix bug of adaptive pool2d_grad, *test=kunlun

* fix bug of adaptive pool2d_grad, *test=kunlun

* fix bug of adaptive pool2d_grad, *test=kunlun
```
- [Paddle Inference] Support cuda_graph. (#44878) · 84bf5c31
  Committed by xiaoxiaohehe001 on Aug 10, 2022
```
* cuda_graph

* cuda_graph_

* cuda_graph_

* cuda_graph_
```
- [CodeStyle] use np.testing.assert_array_equal instead of... · 93c5c887
  Committed by Nyakku Shigure on Aug 10, 2022
```
[CodeStyle] use np.testing.assert_array_equal instead of self.assertTrue(np.array_equal(...)) (#44947)

* automatically fix

* update comments

* numpy -> np

* self.assertEqual(..., True)

* wrong usage (err_msg=True)

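This is not an error introduced by the fix. These were already incorrect uses of `self.assertTrue(..., True)`,
so after the fix the stray `True` is treated as the positional argument `err_msg`.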

* some missing fix
```
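The commit message above motivates replacing `self.assertTrue(np.array_equal(...))` with `np.testing.assert_array_equal(...)` and notes that a stray second argument `True` was never an expected value. The sketch below illustrates both points with plain `unittest` and NumPy. The helper `failure_message` and the sample arrays are made up for this example and do not come from the Paddle test suite.

```python
import unittest

import numpy as np

case = unittest.TestCase()  # used only for its assert* helpers


def failure_message(check):
    """Run check() and return the AssertionError text, or None if it passed."""
    try:
        check()
    except AssertionError as exc:
        return str(exc)
    return None


a = np.array([1, 2, 3])
b = np.array([1, 2, 4])

# Old style: the failure only reports that a boolean was not true.
old_msg = failure_message(lambda: case.assertTrue(np.array_equal(a, b)))

# New style: the failure describes how the arrays differ.
new_msg = failure_message(lambda: np.testing.assert_array_equal(a, b))

# assertTrue(expr, msg): the second positional argument is the failure message,
# so a stray True is not compared against anything and the check still passes.
stray_msg = failure_message(lambda: case.assertTrue(np.array_equal(a, a), True))

print(old_msg)
print(new_msg)
print(stray_msg)
```

An automatic rewrite that keeps the argument order moves such a stray `True` into the third positional slot of `np.testing.assert_array_equal`, which is `err_msg`. That matches the "wrong usage (err_msg=True)" note in the commit body.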
- prepare_gradient_aggregation for non-leaf output of PartialProgramLayer (#44893) · f694e991
  Committed by xiongkun on Aug 10, 2022
```
* 1. add prepare_gradient_aggregation in PartialProgramLayer

* 1. draft

* fix ci problem
```

PaddlePaddle / Paddle · synced successfully more than 1 year ago