Commits · 1c4e3e5dd0d32a4216bdad0b1cafcab4ca5ed5bb · PaddlePaddle / Paddle

02 Mar, 2022 13 commits

new fleet_desc builder (#39948) · 1c4e3e5d

Committed by ziyoujiyi on Mar 02, 2022

* delete gloo connect retry

* the_one_ps dirs reconstruct

* .

* .

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* the one ps dirs modify

* the one ps dirs modify

* the one ps dirs modify

* the one ps dirs modify

* refactor ps optimize

* refactor ps optimize

* refactor ps optimize

* .

* .

* .

* .

* .

* .

* refactor theoneps

* the_one_ps

* add ps pass unittest

* add ps pass unittest

* ps unitest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* ps unittest frame

* add cpu_async_ps_mode test

* add cpu_async_ps_mode test

* add cpu_async_ps_mode test

* ps unittest ready

* ps unittest ready

* solve dist_pass init conflict

* solve import CommContext error

* unittest ok

* implement AllocateFrom

* solve setup.py.in conflict

* solve conflict

* solve conflict

* solve conflict

* .

* .

* cpu-async-ps minimize test ok & gpu minimize test ok

* add heter 2stage unittest

* add heter 2stage unittest

* add heter 2stage unittest

* sync/geo test ok & fix heter_worker program ok

* .

* new fleet desc generator

* new fleet_desc builder

* new fleet_desc builder

* .

* .

* correct ps.proto compile

* .
Co-authored-by: zkh2016 <zhangkaihuo@baidu.com>

1c4e3e5d

[bf16] add bf16 kernel: softmax & log_softmax (#39999) · 4a4215ff
Committed by zhangbo9674 on Mar 02, 2022
```
* add softmax log_softmax

* refine rocm

* refine unittest
```
4a4215ff
[Auto Parallel] Adapt Partitioner & DistOp for ERNIE3.0 Inference and cache (#39895) · c9cd47d9
Committed by JZ-LIANG on Mar 02, 2022
```
* adapot dist op

* add dist_fill_constant_batch_size_like

* remvoe print

* update compitable

* add unitest
```
c9cd47d9

[IPU] update ipu unittests p0 (#39707) · 1db188f3

Committed by Allen Guo on Mar 02, 2022

* update ipu UTs part0

* rename UT

* sync api changes

* update uts for new api

* use_ipumodel() as classmethod

1db188f3

add logic kernel for mlu (#39940) · bc113e10
Committed by joeqiao12 on Mar 02, 2022

bc113e10
[fleet_executor] Add entrance of FleetExecutor in AnalysisPredictor for... · 244ae318
Committed by Yuang Liu on Mar 02, 2022
```
[fleet_executor] Add entrance of FleetExecutor in AnalysisPredictor for distributed inference (#39992)
```
244ae318
[MLU] adapt matmul op (#39727) · b4d931e8
Committed by qipengh on Mar 02, 2022
```
* [MLU] adapt matmul op

* [MLU] fix phi namespace
```
b4d931e8
[MLU] add transpose2 mlu kernel (#39994) · 4cab812e
Committed by fwenguang on Mar 02, 2022

4cab812e
add_new_comm_primitive (#40040) · 4e00d2bb
Committed by Baibaifan on Mar 02, 2022

4e00d2bb
fix unittests for eignvalsh (#39841) · aa47297a
Committed by lkylkylky on Mar 02, 2022

aa47297a
optimize CUDA implementaion of randint OP (#39952) · fb635089
Committed by zhouweiwei2014 on Mar 02, 2022
```
* change CUDA implementaion of randint OP,move distribution common func to phi

* fix CI

* fix CI
```
fb635089

[Eager] open eager when WITH_PYTHON (#39979) · 9af72957

Committed by wanghuancoder on Mar 02, 2022

* open eager when WITH_PYTHON, test=develop

* refine, test=develop

* refine, test=develop

* add DWITH_PYTHON for gen_fluid_lib, test=develop

9af72957

[Eager] Support gnn ptb_rnn in eager mode (#39993) · dbcf8797
Committed by Weilong Wu on Mar 02, 2022

dbcf8797

01 Mar, 2022 13 commits
- fix bug of paddle.to_tensor and paddle.moveaxis (#39662) · 4617c1b2
  Committed by zhouweiwei2014 on Mar 01, 2022
```
* fix bug of paddle.to_tensor and paddle.moveaxis

* fix CI
```
  4617c1b2
- fix compiling and running with ipu (#39920) · 69ab2700
  Committed by Allen Guo on Mar 01, 2022
  
  69ab2700
- [Phi]rm reduce infershape (#39820) · 09039636
  Committed by chentianyu03 on Mar 01, 2022
```
* modify infershape utils and rm reduce infershape

* merge develop

* fix infermete bug

* add IsForInferShape func in ArgumentMappingContext

* add reduce_mean infermeta

* modify annotation

* add default dims
```
  09039636
- Add mobilenetv3_large performance test for bf16 and int8 (#39738) · eb7c211a
  Committed by joanna.wozna.intel on Mar 01, 2022
```
* Add mobilenetv3_large performance test

* Disable the BF16 test if the device does not support BF16 computations

* Change test timeout
```
  eb7c211a
- [bf16] add bf16 kernel: layer_norm p_norm reduce_sum (#39843) · ce8ed978
  Committed by zhangbo9674 on Mar 01, 2022
```
* add layer norm

* add p norm

* add reduce sum

* refine layer norm register bf16 for cudnn811

* add bf16 cast for hip

* add unittest

* refine rocm

* refine layer_norm unittest

* refine reduce op

* refine unittest

* enhance atol for reduce unittest
```
  ce8ed978
- remove conv_affine_channel_fuse_pass (#39817) · fc06be9d
  Committed by wenbin on Mar 01, 2022
```
* remove

* pass

* more pass
```
  fc06be9d
- add test_warpctc_op in mac (#39983) · 25650774
  Committed by zhangchunle on Mar 01, 2022
  
  25650774
- [bf16] add bf16 kernel: scale gather sum (#39683) · 6d26b332
  Committed by zhangbo9674 on Mar 01, 2022
```
* add scale gather sum

* refine CUDA_ATOMIC_WRAPPER ADD for bf16

* add gather unittest

* solve conflict

* add scale uinttest

* add sum unittest

* solve conflict

* refine gather unittest

* refine unittest
```
  6d26b332
- update error_string when target is out of bound (#40001) · a7acfc5b
  Committed by HydrogenSulfate on Mar 01, 2022
  
  a7acfc5b
- [PHI] Support Multi Input and Output for InferShape (#39870) · e8d45583
  Committed by zyfncg on Mar 01, 2022
```
* add multi input for infer_shape

* support multi output for infershape

* fix split bug

* fix bug of concat

* support vector<MetaTensor*> in infrt

* fix bug
```
  e8d45583
- [Phi] Migrate logical_and/or/not/xor into Phi (#39942) · 8c237973
  Committed by Aurelius84 on Mar 01, 2022
```
* [Phi] Migrate logical_and/or/not/xor into Phi

* fix unittest

* fix function name
```
  8c237973
- [DP] Construct reducer group (#39987) · 4da841e0
  Committed by ShenLiang on Mar 01, 2022
```
* add reducer
```
  4da841e0
- Optimize the CUDA kernel in DistributedFusedLamb optimizer (#39972) · d17961ed
  Committed by sneaxiy on Mar 01, 2022
```
* vectorize lamb kernel

* remove flags, add ut

* remove useless codes

* refine code, add param order
```
  d17961ed
28 Feb, 2022 5 commits
- fix where api doc (#39980) · 5471d162
  Committed by ronnywang on Feb 28, 2022
  
  5471d162
- PR-CI-Py3 change cpu test (#39659) · 3cb93edf
  Committed by zhangchunle on Feb 28, 2022
```
* update;test=cpu-py3
```
  3cb93edf
- [Pten->Phi PR4] Rename pten in funcs to phi (#39961) · eb42dd52
  Committed by Chen Weihang on Feb 28, 2022
```
* rename pten_utils to phi_utils

* rename pten_utils target

* rename Pten to Phi

* replace pten with phi

* resolve conflict
```
  eb42dd52
- [bf16] Refine BF16 amp-o1 logic (#39815) · 18ee051e
  Committed by zhangbo9674 on Feb 28, 2022
```
* refine bf16 amp-o1 logic

* refine amp GLOG

* refine unittest

* refine unittest
```
  18ee051e
- [Pten] Support optional param for C++ API (#39760) · aceb25e1
  Committed by zyfncg on Feb 28, 2022
```
* fix selected_rows bug in C++ API

* add optional for C++ APIO

* data transform support optional

* remove data transform for optional vector<Tensor>

* adjust some format of funtcion

* fix empyt bug
```
  aceb25e1
27 Feb, 2022 1 commit
- fix pylayer problem with amp (#39950) · 282e09dc
  Committed by Leo Chen on Feb 27, 2022
```
* fix pylayer problem with amp

* add ut

* refine code
```
  282e09dc
26 Feb, 2022 2 commits

Support custom implement for C++ API (#39521) · caea126c

Committed by zyfncg on Feb 26, 2022

* Support custom implement for C++ API

* rename api_invoke_impl to api_custom_impl

* remove manual_api

* delete mutable_data in copy_to api

* fix problem of copy_to

* add unittest for infer_meta_fn_factory

* fix split cofig in yaml

* fix split cofig in yaml

* modify sum api yaml

* add copy_to wrapped infermeta

* rollback copy impl

caea126c

[Eager Hook] Support GradientHook and ReduceHook, expose related interface to python (#39893) · a456dda6
Committed by Weilong Wu on Feb 26, 2022
```
* Support Eager Hook, expose interface to python

* Fix CI issue
```
a456dda6

25 Feb, 2022 6 commits
- added logsoftmax oneDNN kernel (#39793) · 584844ec
  Committed by jakpiase on Feb 25, 2022
  
  584844ec
- Add MultiTensorApply to calculate L2-Norm in DistributedFusedLamb optimizer (#39900) · d32a0102
  Committed by sneaxiy on Feb 25, 2022
```
* add multi tensor apply l2 norm

* add multi_tensor_apply code

* make sizeof(TensorMeta) smalller

* move code to distributed_fused_lamb_op.cu

* remove useless FLAGS
```
  d32a0102
- [MLU]support launch process on mlu (#39839) · 2533cac6
  Committed by zn on Feb 25, 2022
  
  2533cac6
- [bf16] add bf16 kernel: elementwise_add elementwise_mul elementwise_sub (#39716) · 2fedd39b
  Committed by zhangbo9674 on Feb 25, 2022
```
* add ele_add

* add ele_mul

* add ele_sub

* sovle conflict

* fix npu

* refine ele_add

* add ele_mul unittest

* refine ele_sub

* refine ci

* refine unittest
```
  2fedd39b
- add reduce_min and reduce_max (#39899) · 44da9b42
  Committed by joeqiao12 on Feb 25, 2022
  
  44da9b42
- [MLU] add elementwise_mul mlu kernel (#39864) · 04d324b2
  Committed by fwenguang on Feb 25, 2022
  
  04d324b2

PaddlePaddle / Paddle: synced successfully about 1 year ago
