Commits · 8492d3bbf6f01e98d6674b57b27913fe537584dd · 机器未来 / Paddle

Mar 02, 2022 · 28 commits

The backward code of Sparse Conv3d (#40054) · 8492d3bb
Committed by zhangkaihuo on Mar 02, 2022
```
Sparse Conv3d backward code
```
8492d3bb

run recompute's real backward with amp disabled (#40042) · 28795771
Committed by Leo Chen on Mar 02, 2022

28795771

new fleet_desc builder (#39948) · 1c4e3e5d

Committed by ziyoujiyi on Mar 02, 2022

* delete gloo connect retry

* the_one_ps dirs reconstruct

* .

* .

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* the one ps dirs modify

* the one ps dirs modify

* the one ps dirs modify

* the one ps dirs modify

* refactor ps optimize

* refactor ps optimize

* refactor ps optimize

* .

* .

* .

* .

* .

* .

* refactor theoneps

* the_one_ps

* add ps pass unittest

* add ps pass unittest

* ps unitest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* add cpu_async_ps_mode test

* add cpu_async_ps_mode test

* add cpu_async_ps_mode test

* ps unittest ready

* ps unittest ready

* solve dist_pass init conflict

* solve import CommContext error

* unittest ok

* implement AllocateFrom

* solve setup.py.in conflict

* solve conflict

* solve conflict

* solve conflict

* .

* .

* cpu-async-ps minimize test ok & gpu minimize test ok

* add heter 2stage unittest

* add heter 2stage unittest

* add heter 2stage unittest

* sync/geo test ok & fix heter_worker program ok

* .

* new fleet desc generator

* new fleet_desc builder

* new fleet_desc builder

* .

* .

* correct ps.proto compile

* .
Co-authored-by: Nzkh2016 <zhangkaihuo@baidu.com>

1c4e3e5d

support checking `phi` directory in CI op benchmark (#40026) · f30b3f81
Committed by pangyoki on Mar 02, 2022
```
* support phi checking in CI op benchmark

* add sparse/gpu

* remove h file in cpu directory
```
f30b3f81

[Infrt]add phi kernel dialect (#39726) · 07dad6d6
Committed by huzhiqiang on Mar 02, 2022

07dad6d6
[bf16] add bf16 kernel: softmax & log_softmax (#39999) · 4a4215ff
Committed by zhangbo9674 on Mar 02, 2022
```
* add softmax log_softmax

* refine rocm

* refine unittest
```
4a4215ff
[Auto Parallel] Adapt Partitioner & DistOp for ERNIE3.0 Inference and cache (#39895) · c9cd47d9
Committed by JZ-LIANG on Mar 02, 2022
```
* adapt dist op

* add dist_fill_constant_batch_size_like

* remove print

* update compatible

* add unittest
```
c9cd47d9
【phi】migrate gather_tree,reduce_prod to phi (#39844) · 6af2729e
Committed by crystal on Mar 02, 2022
```
* move to phi

* migrate gather_tree_op into phi

* move reduce_prod to phi

* optimize code
```
6af2729e

[IPU] update ipu unittests p0 (#39707) · 1db188f3

Committed by Allen Guo on Mar 02, 2022

* update ipu UTs part0

* rename UT

* sync api changes

* update uts for new api

* use_ipumodel() as classmethod

1db188f3

Upgrade new profiler (#39984) · 0c3f7fbc

Committed by chenjian on Mar 02, 2022

* add new profiler components

* fix bug

* upgrade new profiler

* fix operator.cc

* fix operator.cc

* fix cmakelists.txt

* fix bug

* fix according to pr

* fix bug

* fix cmake

* fix bug

* fix a bug

* fix bug

* fix bug

0c3f7fbc


add logic kernel for mlu (#39940) · bc113e10
Committed by joeqiao12 on Mar 02, 2022

bc113e10
[fleet_executor] Add entrance of FleetExecutor in AnalysisPredictor for... · 244ae318
Committed by Yuang Liu on Mar 02, 2022
```
[fleet_executor] Add entrance of FleetExecutor in AnalysisPredictor for distributed inference (#39992)
```
244ae318

[KP] Activation op registration for XPU2. part 1/2 (#40002) · 90ab7403
Committed by Lijunhui on Mar 02, 2022

90ab7403
[Phi] Unify complex type trait and fix real imag bug (#40036) · 0764fda2
Committed by Chen Weihang on Mar 02, 2022
```
* unify complex type trait and fix real imag bug

* add unittest for type traits
```
0764fda2
[MLU] adapt matmul op (#39727) · b4d931e8
Committed by qipengh on Mar 02, 2022
```
* [MLU] adapt matmul op

* [MLU] fix phi namespace
```
b4d931e8

test=document_fix;record py3 case time (#40018) · 9070d5c5
Committed by zhangchunle on Mar 02, 2022

9070d5c5

[infrt] speed up the infrt ci. test=devvelop (#40032) · 36660d4c
Committed by 王明冬 on Mar 02, 2022

36660d4c

[MLU] add transpose2 mlu kernel (#39994) · 4cab812e
Committed by fwenguang on Mar 02, 2022

4cab812e

add_new_comm_primitive (#40040) · 4e00d2bb
Committed by Baibaifan on Mar 02, 2022

4e00d2bb

fix unittests for eignvalsh (#39841) · aa47297a
Committed by lkylkylky on Mar 02, 2022

aa47297a
optimize CUDA implementaion of randint OP (#39952) · fb635089
Committed by zhouweiwei2014 on Mar 02, 2022
```
* change CUDA implementaion of randint OP,move distribution common func to phi

* fix CI

* fix CI
```
fb635089

[Eager] open eager when WITH_PYTHON (#39979) · 9af72957

Committed by wanghuancoder on Mar 02, 2022

* open eager when WITH_PYTHON, test=develop

* refine, test=develop

* refine, test=develop

* add DWITH_PYTHON for gen_fluid_lib, test=develop

9af72957


ernie: revert skip_layernorm_fp16 (#39991) · 26e2b918
Committed by Wangzheee on Mar 02, 2022

26e2b918

add share external data interface (#39809) · 1ff1c1e0
Committed by JingZhuangzhuang on Mar 02, 2022

1ff1c1e0

[Pten] Gru lstm migration (#39729) · e4dba69a

Committed by Feiyu Chan on Mar 02, 2022

* move sequence2batch

* move lstm and gru

* Add phi/kernels directory into exclusion to stop using hipcc to compile non .cu files in it.

e4dba69a


[Eager] Support gnn ptb_rnn in eager mode (#39993) · dbcf8797
Committed by Weilong Wu on Mar 02, 2022

dbcf8797

Fix bug for prepare phi OP (#40033) · fb0cadfd
Committed by From00 on Mar 02, 2022

fb0cadfd

update pd_2_trt lower pass (#40019) · acdf0663

Committed by Shang Zhizhou on Mar 02, 2022

* update pd_2_trt lower pass

* update pd_2_trt lower pass

* update style

* update

* change trt.graph to trt.create_engine

* update comments

* update comments

* add test

acdf0663

Mar 01, 2022 · 12 commits


Added attr & tensor type mapping for final state codegen (#39997) · 852a872f
Committed by Zhanlue Yang on Mar 01, 2022

852a872f

[ROCM] fix to get rocm number in script, test=develop (#39938) · 72e462cd
Committed by Qi Li on Mar 01, 2022

72e462cd
fix bug of paddle.to_tensor and paddle.moveaxis (#39662) · 4617c1b2
Committed by zhouweiwei2014 on Mar 01, 2022
```
* fix bug of paddle.to_tensor and paddle.moveaxis

* fix CI
```
4617c1b2

fix compiling and running with ipu (#39920) · 69ab2700
Committed by Allen Guo on Mar 01, 2022

69ab2700

[Phi]rm reduce infershape (#39820) · 09039636

Committed by chentianyu03 on Mar 01, 2022

* modify infershape utils and rm reduce infershape

* merge develop

* fix infermeta bug

* add IsForInferShape func in ArgumentMappingContext

* add reduce_mean infermeta

* modify annotation

* add default dims

09039636

[phi] tranfer the selu_op and pass the CI (#39819) · 197da15a

Committed by xiongkun on Mar 01, 2022

* tranfer the selu_op and pass the CI

* add sig files

* fix code

* fix by code review

* remove TODO

* change the include position

* change the head position

197da15a

Add function description for Kernel Primitive API (#39884) · 255bf609

Committed by niuliling123 on Mar 01, 2022

* Add function description for Kernel Primitive API
1. Set cumsum and sort share memory size = 1024
2. sort and cumsum API limitation: blockDim.x must not exceed 512 (blockDim.x <= 512)

255bf609

Fixed auto codegen for intermediate tensors (#39797) · 2592805b

Committed by Zhanlue Yang on Mar 01, 2022

* Refactored GradNodeAccumulation data structure and behaviour

* Fixed CI issues

* Fix compilation issues

* Fixed minor issues

* Reverted changes for intermediate and OverwriteOutput

* fixed minor issue

* Fixed auto codegen for intermediate tensors

* Removed restriction on AccumulationNode modification

* Fixed CI Coverage issues

* Adjusted Log contents

* Fixed CI issues

2592805b

Add mobilenetv3_large performance test for bf16 and int8 (#39738) · eb7c211a

Committed by joanna.wozna.intel on Mar 01, 2022

* Add mobilenetv3_large performance test

* Disable the BF16 test if the device does not support BF16 computations

* Change test timeout

eb7c211a

[bf16] add bf16 kernel: layer_norm p_norm reduce_sum (#39843) · ce8ed978

Committed by zhangbo9674 on Mar 01, 2022

* add layer norm

* add p norm

* add reduce sum

* refine layer norm register bf16 for cudnn811

* add bf16 cast for hip

* add unittest

* refine rocm

* refine layer_norm unittest

* refine reduce op

* refine unittest

* enhance atol for reduce unittest

ce8ed978

remove conv_affine_channel_fuse_pass (#39817) · fc06be9d
Committed by wenbin on Mar 01, 2022
```
* remove

* pass

* more pass
```
fc06be9d

add test_warpctc_op in mac (#39983) · 25650774
Committed by zhangchunle on Mar 01, 2022

25650774

机器未来 / Paddle · in sync with the fork source project
