Commits · 1980e33a901efa5128e7799a83bcd35ee8ada199 · BaiXuePrincess / Paddle

02 Mar, 2022 24 commits

add check for backward hook (#40041) · 1980e33a
Committed by Leo Chen on Mar 02, 2022
```
* add check for backward hook

* refine ut
```
1980e33a
Move gather.h/gather.cu.h/scatter.h/scatter.cu.h to the phi library (#40043) · 09258040
Committed by sneaxiy on Mar 02, 2022
```
* move gather.h gather.cu.h scatter.h scatter.cu.h to phi library

* fix CI

* fix rocm ci
```
09258040
vec scale kernel (#40011) · 2e6548a9
Committed by sneaxiy on Mar 02, 2022

2e6548a9
[Phi]Move elementwise function to funcs directory (#39986) · 5898e9ab
Committed by YuanRisheng on Mar 02, 2022
```
* move elementwise function to funcs directory

* fix compile bugs

* modify according to comment
```
5898e9ab
[XPU] Fix Phi Kernel cache problem in operator.cc (#40044) · 66196573
Committed by Aurelius84 on Mar 02, 2022
```
* [XPU] Fix Phi Kernel cache problem in operator.cc

* fix typo
```
66196573

Move transpose to pten (#39327) · 7a857924

Committed by hong on Mar 02, 2022

* immigrate_transpose_to_pten cpu kernel only; test=develop

* fix bug; test=develop

* add transpose cuda api

* bug fix;

* fix bugs

* fix bugs; test=develop

* bug fix;

* move transepose to pten; test=develop

* fix bug; test=develop

* fix bugs; test=develop

* add transpose grad fp16 support; test=develop

* fix bug; test=develop

* fix npu bug; test=develop

* fix nemul = 0 bug; test=develop

* add fp16 support; test=develop

* fix data type register bug; test=develop

* fix transpose bug; test=develop

* update transpose

* fix transpose bug; test=develop

* remove useless code; test=develop

* remove useless code; test=develop

* fix transpose alias bug; test=develop

* polish code; test=develop

* resolve confict; test=develop

* resolve confilct; test=develop

* recover prepared operator; test=develop

* fix bug; test=develop

* polish code; test=develop

* fix bug; test=develop

* fix bug; test=develop

7a857924

Move BroadcastTensors OP to phi (#40047) · 2a5590a1

Committed by From00 on Mar 02, 2022

* Move BroadcastTensors OP to phi

* Remove mutable_data in impl

* Move BilinearTensorProductInferMeta to multiary.h/cc

2a5590a1

new fleet_desc builder (#39948) · 1c4e3e5d

Committed by ziyoujiyi on Mar 02, 2022

* delete gloo connect retry

* the_one_ps dirs reconstruct

* .

* .

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* the one ps dirs modify

* the one ps dirs modify

* the one ps dirs modify

* the one ps dirs modify

* refactor ps optimize

* refactor ps optimize

* refactor ps optimize

* .

* .

* .

* .

* .

* .

* refactor theoneps

* the_one_ps

* add ps pass unittest

* add ps pass unittest

* ps unitest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* add cpu_async_ps_mode test

* add cpu_async_ps_mode test

* add cpu_async_ps_mode test

* ps unittest ready

* ps unittest ready

* solve dist_pass init conflict

* solve import CommContext error

* unittest ok

* implement AllocateFrom

* solve setup.py.in conflict

* solve conflict

* solve conflict

* solve conflict

* .

* .

* cpu-async-ps minimize test ok & gpu minimize test ok

* add heter 2stage unittest

* add heter 2stage unittest

* add heter 2stage unittest

* sync/geo test ok & fix heter_worker program ok

* .

* new fleet desc generator

* new fleet_desc builder

* new fleet_desc builder

* .

* .

* correct ps.proto compile

* .
Co-authored-by: zkh2016 <zhangkaihuo@baidu.com>

1c4e3e5d

[Infrt]add phi kernel dialect (#39726) · 07dad6d6
Committed by huzhiqiang on Mar 02, 2022

07dad6d6
[bf16] add bf16 kernel: softmax & log_softmax (#39999) · 4a4215ff
Committed by zhangbo9674 on Mar 02, 2022
```
* add softmax log_softmax

* refine rocm

* refine unittest
```
4a4215ff
【phi】migrate gather_tree,reduce_prod to phi (#39844) · 6af2729e
Committed by crystal on Mar 02, 2022
```
* move to phi

* migrate gather_tree_op into phi

* move reduce_prod tp phi

* optimize code
```
6af2729e

Upgrade new profiler (#39984) · 0c3f7fbc

Committed by chenjian on Mar 02, 2022

* add new profiler components

* fix bug

* upgrade new profiler

* fix operator.cc

* fix operator.cc

* fix cmakelists.txt

* fix bug

* fix according to pr

* fix bug

* fix cmake

* fix bug

* fix a bug

* fix bug

* fix bug

0c3f7fbc

add logic kernel for mlu (#39940) · bc113e10
Committed by joeqiao12 on Mar 02, 2022

bc113e10
[fleet_executor] Add entrance of FleetExecutor in AnalysisPredictor for... · 244ae318
Committed by Yuang Liu on Mar 02, 2022
```
[fleet_executor] Add entrance of FleetExecutor in AnalysisPredictor for distributed inference (#39992)
```
244ae318
[KP] Activation op registration for XPU2. part 1/2 (#40002) · 90ab7403
Committed by Lijunhui on Mar 02, 2022

90ab7403
[Phi] Unify complex type trait and fix real imag bug (#40036) · 0764fda2
Committed by Chen Weihang on Mar 02, 2022
```
* unify complex type trait and fix real imag bug

* add unittest for type tratis
```
0764fda2
[MLU] adapt matmul op (#39727) · b4d931e8
Committed by qipengh on Mar 02, 2022
```
* [MLU] adapt matmul op

* [MLU] fix phi namespace
```
b4d931e8
[MLU] add transpose2 mlu kernel (#39994) · 4cab812e
Committed by fwenguang on Mar 02, 2022

4cab812e
add_new_comm_primitive (#40040) · 4e00d2bb
Committed by Baibaifan on Mar 02, 2022

4e00d2bb

[Eager] open eager when WITH_PYTHON (#39979) · 9af72957

Committed by wanghuancoder on Mar 02, 2022

* open eager when WITH_PYTHON, test=develop

* refine, test=develop

* refine, test=develop

* add DWITH_PYTHON for gen_fluid_lib, test=develop

9af72957

ernie: revert skip_layernorm_fp16 (#39991) · 26e2b918
Committed by Wangzheee on Mar 02, 2022

26e2b918
add share external data interface (#39809) · 1ff1c1e0
Committed by JingZhuangzhuang on Mar 02, 2022

1ff1c1e0

[Pten] Gru lstm migration (#39729) · e4dba69a

Committed by Feiyu Chan on Mar 02, 2022

* move sequence2batch

* move lstm and gru

* Add phi/kernels directory into exclusion to stop using hipcc to compile non .cu files in it.

e4dba69a

Fix bug for prepare phi OP (#40033) · fb0cadfd
Committed by From00 on Mar 02, 2022

fb0cadfd

01 Mar, 2022 16 commits

Added attr & tensor type mapping for final state codegen (#39997) · 852a872f
Committed by Zhanlue Yang on Mar 01, 2022

852a872f
fix compiling and running with ipu (#39920) · 69ab2700
Committed by Allen Guo on Mar 01, 2022

69ab2700

[Phi]rm reduce infershape (#39820) · 09039636

Committed by chentianyu03 on Mar 01, 2022

* modify infershape utils and rm reduce infershape

* merge develop

* fix infermete bug

* add IsForInferShape func in ArgumentMappingContext

* add reduce_mean infermeta

* modify annotation

* add default dims

09039636

[phi] tranfer the selu_op and pass the CI (#39819) · 197da15a

Committed by xiongkun on Mar 01, 2022

* tranfer the selu_op and pass the CI

* add sig files

* fix code

* fix by code review

* remove TOOD

* change the include position

* change the head position

197da15a

Fixed auto codegen for intermediate tensors (#39797) · 2592805b

Committed by Zhanlue Yang on Mar 01, 2022

* Refactored GradNodeAccumulation data structure and behaviour

* Fixed CI issues

* Fix compilation issues

* Fixed minor issues

* Reverted changes for intermediate and OverwriteOutput

* fixed minor issue

* Fixed auto codegen for intermediate tensors

* Removed restriction on AccumulationNode modification

* Fixed CI Coverage issues

* Adjusted Log contents

* Fixed CI issues

2592805b

Add mobilenetv3_large performance test for bf16 and int8 (#39738) · eb7c211a

Committed by joanna.wozna.intel on Mar 01, 2022

* Add mobilenetv3_large performance test

* Disable the BF16 test if the device does not support BF16 computations

* Change test timeout

eb7c211a

[bf16] add bf16 kernel: layer_norm p_norm reduce_sum (#39843) · ce8ed978

Committed by zhangbo9674 on Mar 01, 2022

* add layer norm

* add p norm

* add reduce sum

* refine layer norm register bf16 for cudnn811

* add bf16 cast for hip

* add unittest

* refine rocm

* refine layer_norm unittest

* refine reduce op

* refine unittest

* enhance atol for reduce unittest

ce8ed978

remove conv_affine_channel_fuse_pass (#39817) · fc06be9d
Committed by wenbin on Mar 01, 2022
```
* remove

* pass

* more pass
```
fc06be9d

[bf16] add bf16 kernel: scale gather sum (#39683) · 6d26b332

Committed by zhangbo9674 on Mar 01, 2022

* add scale gather sum

* refine CUDA_ATOMIC_WRAPPER ADD for bf16

* add gather unittest

* solve conflict

* add scale uinttest

* add sum unittest

* solve conflict

* refine gather unittest

* refine unittest

6d26b332

add MasterParam and MasterParamOut for sparse_momentum op (#39969) · 9de79892
Committed by Guoxia Wang on Mar 01, 2022

9de79892
[phi] migrate where kernel into phi (#39811) · 468a2a17
Committed by ronnywang on Mar 01, 2022

468a2a17

[PHI] Remove reseting dtype, layout and allocation by arg_def for outputs in executor (#39781) · 4fbcf6f4

Committed by zyfncg on Mar 01, 2022

* remove SetAllocationForOutputTenosr

* add place param for copy kernel

* recover SetAllocationForOutputTenosr

* polish code

* fix empty_dev api bug

* remove reseting dtype and layout for output in executor

* fix merge bug

* [Phi] Add ClearHolder when re-alloc on new place in DeviceContext

* fix hostAlloc

* remove setting output allocation

* remove full_kernel_impl.h

* fix bug of xpu full_like
Co-authored-by: Aurelius84 <zhangliujie@baidu.com>

4fbcf6f4

[phi] move uniform_random to phi (#39937) · b3466387
Committed by Leo Chen on Mar 01, 2022
```
* move uniform_random to phi

* fit selected_rows

* replace mutable_data
```
b3466387

[Phi] Support kps backend and kernel registry (#39941) · 08b43cce

Committed by Chen Weihang on Mar 01, 2022

* support kps backend and compile

* resolve conflict

* fix kps backend trans

* test in xpu2 device

* remove dummy kernel

08b43cce

optimize mergeadd for sparse_adam,*test=kunlun (#39966) · d4911594

Committed by z8hanghuan on Mar 01, 2022

* optimize mergeadd for sparse_adam,*test=kunlun

* optimize mergeadd for sparse_adam,*test=kunlun

* optimize mergeadd for sparse_adam, *test=kunlun

d4911594

[PHI] Support Multi Input and Output for InferShape (#39870) · e8d45583

Committed by zyfncg on Mar 01, 2022

* add multi input for infer_shape

* support multi output for infershape

* fix split bug

* fix bug of concat

* support vector<MetaTensor*> in infrt

* fix bug

e8d45583
