提交 · bc113e10487115fd91cfc738c4279372eeb7c2a2 · PaddlePaddle / Paddle

02 3月, 2022 16 次提交
- J
  
  add logic kernel for mlu (#39940) · bc113e10
  由 joeqiao12 提交于 3月 02, 2022
  
  bc113e10
- Y
  [fleet_executor] Add entrance of FleetExecutor in AnalysisPredictor for... · 244ae318
  由 Yuang Liu 提交于 3月 02, 2022
```
[fleet_executor] Add entrance of FleetExecutor in AnalysisPredictor for distributed inference (#39992)
```
  244ae318
- L
  
  [KP] Activation op registration for XPU2. part 1/2 (#40002) · 90ab7403
  由 Lijunhui 提交于 3月 02, 2022
  
  90ab7403
- C
  [Phi] Unify complex type trait and fix real imag bug (#40036) · 0764fda2
  由 Chen Weihang 提交于 3月 02, 2022
```
* unify complex type trait and fix real imag bug

* add unittest for type tratis
```
  0764fda2
- Q
  [MLU] adapt matmul op (#39727) · b4d931e8
  由 qipengh 提交于 3月 02, 2022
```
* [MLU] adapt matmul op

* [MLU] fix phi namespace
```
  b4d931e8
- Z
  
  test=document_fix;record py3 case time (#40018) · 9070d5c5
  由 zhangchunle 提交于 3月 02, 2022
  
  9070d5c5
- 王
  
  [infrt] speed up the infrt ci. test=devvelop (#40032) · 36660d4c
  由王明冬提交于 3月 02, 2022
  
  36660d4c
- F
  
  [MLU] add transpose2 mlu kernel (#39994) · 4cab812e
  由 fwenguang 提交于 3月 02, 2022
  
  4cab812e
- B
  
  add_new_comm_primitive (#40040) · 4e00d2bb
  由 Baibaifan 提交于 3月 02, 2022
  
  4e00d2bb
- optimize CUDA implementaion of randint OP (#39952) · fb635089
  由 zhouweiwei2014 提交于 3月 02, 2022
```
* change CUDA implementaion of randint OP,move distribution common func to phi

* fix CI

* fix CI
```
  fb635089
- W
  [Eager] open eager when WITH_PYTHON (#39979) · 9af72957
  由 wanghuancoder 提交于 3月 02, 2022
```
* open eager when WITH_PYTHON, test=develop

* refine, test=develop

* refine, test=develop

* add DWITH_PYTHON for gen_fluid_lib, test=develop
```
  9af72957
- W
  
  ernie: revert skip_layernorm_fp16 (#39991) · 26e2b918
  由 Wangzheee 提交于 3月 02, 2022
  
  26e2b918
- J
  
  add share external data interface (#39809) · 1ff1c1e0
  由 JingZhuangzhuang 提交于 3月 02, 2022
  
  1ff1c1e0
- F
  [Pten] Gru lstm migration (#39729) · e4dba69a
  由 Feiyu Chan 提交于 3月 02, 2022
```
* move sequence2batch

* move lstm and gru

* Add phi/kernels directory into exclusion to stop using hipcc to compile non .cu files in it.
```
  e4dba69a
- F
  
  Fix bug for prepare phi OP (#40033) · fb0cadfd
  由 From00 提交于 3月 02, 2022
  
  fb0cadfd
- S
  update pd_2_trt lower pass (#40019) · acdf0663
  由 Shang Zhizhou 提交于 3月 02, 2022
```
* update pd_2_trt lower pass

* update pd_2_trt lower pass

* update style

* udpate

* change trt.graph to trt.create_engine

* update comments

* update comments

* add test
```
  acdf0663
01 3月, 2022 24 次提交

Z

Added attr & tensor type mapping for final state codegen (#39997) · 852a872f
由 Zhanlue Yang 提交于 3月 01, 2022

852a872f
Q

[ROCM] fix to get rocm number in script, test=develop (#39938) · 72e462cd
由 Qi Li 提交于 3月 01, 2022

72e462cd
A

fix compiling and running with ipu (#39920) · 69ab2700
由 Allen Guo 提交于 3月 01, 2022

69ab2700

[Phi]rm reduce infershape (#39820) · 09039636

由 chentianyu03 提交于 3月 01, 2022

* modify infershape utils and rm reduce infershape

* merge develop

* fix infermete bug

* add IsForInferShape func in ArgumentMappingContext

* add reduce_mean infermeta

* modify annotation

* add default dims

09039636

[phi] tranfer the selu_op and pass the CI (#39819) · 197da15a

由 xiongkun 提交于 3月 01, 2022

* tranfer the selu_op and pass the CI

* add sig files

* fix code

* fix by code review

* remove TOOD

* change the include position

* change the head position

197da15a

Add function description for Kernel Primitive API (#39884) · 255bf609

由 niuliling123 提交于 3月 01, 2022

* Add function description for Kernel Primitive API
1. Set cumsum and sort share memory size = 1024
2.sort and cumsum api limitation : blockDim.x must be less than 512 (blockDim.x <= 512)

255bf609

Fixed auto codegen for intermediate tensors (#39797) · 2592805b

由 Zhanlue Yang 提交于 3月 01, 2022

* Refactored GradNodeAccumulation data structure and behaviour

* Fixed CI issues

* Fix compilation issues

* Fixed minor issues

* Reverted changes for intermediate and OverwriteOutput

* fixed minor issue

* Fixed auto codegen for intermediate tensors

* Removed restriction on AccumulationNode modification

* Fixed CI Coverage issues

* Adjusted Log contents

* Fixed CI issues

2592805b

Add mobilenetv3_large performance test for bf16 and int8 (#39738) · eb7c211a

由 joanna.wozna.intel 提交于 3月 01, 2022

* Add mobilenetv3_large performance test

* Disable the BF16 test if the device does not support BF16 computations

* Change test timeout

eb7c211a

[bf16] add bf16 kernel: layer_norm p_norm reduce_sum (#39843) · ce8ed978

由 zhangbo9674 提交于 3月 01, 2022

* add layer norm

* add p norm

* add reduce sum

* refine layer norm register bf16 for cudnn811

* add bf16 cast for hip

* add unittest

* refine rocm

* refine layer_norm unittest

* refine reduce op

* refine unittest

* enhance atol for reduce unittest

ce8ed978

W
remove conv_affine_channel_fuse_pass (#39817) · fc06be9d
由 wenbin 提交于 3月 01, 2022
```
* remove

* pass

* more pass
```
fc06be9d

[bf16] add bf16 kernel: scale gather sum (#39683) · 6d26b332

由 zhangbo9674 提交于 3月 01, 2022

* add scale gather sum

* refine CUDA_ATOMIC_WRAPPER ADD for bf16

* add gather unittest

* solve conflict

* add scale uinttest

* add sum unittest

* solve conflict

* refine gather unittest

* refine unittest

6d26b332

G

add MasterParam and MasterParamOut for sparse_momentum op (#39969) · 9de79892
由 Guoxia Wang 提交于 3月 01, 2022

9de79892
R

[phi] migrate where kernel into phi (#39811) · 468a2a17
由 ronnywang 提交于 3月 01, 2022

468a2a17

[PHI] Remove reseting dtype, layout and allocation by arg_def for outputs in executor (#39781) · 4fbcf6f4

由 zyfncg 提交于 3月 01, 2022

* remove SetAllocationForOutputTenosr

* add place param for copy kernel

* recover SetAllocationForOutputTenosr

* polish code

* fix empty_dev api bug

* remove reseting dtype and layout for output in executor

* fix merge bug

* [Phi] Add ClearHolder when re-alloc on new place in DeviceContext

* fix hostAlloc

* remove setting output allocation

* remove full_kernel_impl.h

* fix bug of xpu full_like
Co-authored-by: NAurelius84 <zhangliujie@baidu.com>

4fbcf6f4

L
[phi] move uniform_random to phi (#39937) · b3466387
由 Leo Chen 提交于 3月 01, 2022
```
* move uniform_random to phi

* fit selected_rows

* replace mutable_data
```
b3466387

[Phi] Support kps backend and kernel registry (#39941) · 08b43cce

由 Chen Weihang 提交于 3月 01, 2022

* support kps backend and compile

* resolve conflict

* fix kps backend trans

* test in xpu2 device

* remove dummy kernel

08b43cce

optimize mergeadd for sparse_adam,*test=kunlun (#39966) · d4911594

由 z8hanghuan 提交于 3月 01, 2022

* optimize mergeadd for sparse_adam,*test=kunlun

* optimize mergeadd for sparse_adam,*test=kunlun

* optimize mergeadd for sparse_adam, *test=kunlun

d4911594

[PHI] Support Multi Input and Output for InferShape (#39870) · e8d45583

由 zyfncg 提交于 3月 01, 2022

* add multi input for infer_shape

* support multi output for infershape

* fix split bug

* fix bug of concat

* support vector<MetaTensor*> in infrt

* fix bug

e8d45583

A
[Phi] Migrate logical_and/or/not/xor into Phi (#39942) · 8c237973
由 Aurelius84 提交于 3月 01, 2022
```
* [Phi] Migrate logical_and/or/not/xor into Phi

* fix unittest

* fix function name
```
8c237973
S
[DP] Construct reducer group (#39987) · 4da841e0
由 ShenLiang 提交于 3月 01, 2022
```
* add reducer
```
4da841e0

Optimize group_norm op forward (#39596) · 657dd5a9

由 crystal 提交于 3月 01, 2022

* optimize group norm forward

* use vectorized optimization

* add scalar calculation code

* optimize code

657dd5a9

C

remove dot infershape (#39945) · 75280d36
由 chentianyu03 提交于 3月 01, 2022

75280d36
王

add type constrait for DenseTensor (#39967) · 4149cabe
由王明冬提交于 3月 01, 2022

4149cabe
S
Optimize the CUDA kernel in DistributedFusedLamb optimizer (#39972) · d17961ed
由 sneaxiy 提交于 3月 01, 2022
```
* vectorize lamb kernel

* remove flags, add ut

* remove useless codes

* refine code, add param order
```
d17961ed

PaddlePaddle / Paddle 大约 1 年 前同步成功

PaddlePaddle / Paddle
大约 1 年前同步成功