Commits · c770053cb2230c1893a4d4995d45b95183a297d1 · 机器未来 / Paddle

03 Aug, 2022 4 commits
- Add use_hierarchical_allreduce for DistributedFusedLAMB (#44821) · c770053c
  Committed by sneaxiy on Aug 03, 2022
```
* add use_hierarchical_allreduce

* support hierarchical allreduce for more cases
```
- Phi edit distance (#44447) · 5ad3228c
  Committed by zhiboniu on Aug 03, 2022
```
* phi_edit_distance

* fix
```
- clean class EigenCudaStreamDevice and CudnnWorkspaceHandle in device_context.cc (#44829) · 7eb37a7e
  Committed by Leo Chen on Aug 03, 2022
- opt bn1d backward (#44783) · 36f08826
  Committed by zhangkaihuo on Aug 03, 2022

02 Aug, 2022 25 commits

fix namespace of GPUContext (#44822) · 65f38869
Committed by Leo Chen on Aug 02, 2022

Committed by seemingwang on Aug 02, 2022

* move renorm op

* change python api

* change op class func

* alloc data

* remove comments

* fix grad arguments

* fix python argument

* fix python argument

* change unit-test

* remove shape func registration

* recover extra-arguments

* recover shape functor

669353c1

[PFCC operator performance optimization] SeluKernel Optimization (#44490) · 859c4077
Committed by carryyu on Aug 02, 2022
```
* [PFCC] SeluKernel Optimization

* selu kernel optimization

* add private

Co-authored-by: carryyu <>
```

Multihead matmul fp16 (#44792) · 0fd8ee63

Committed by Wilber on Aug 02, 2022

* multihead matmul add fp16

* fix windows error

* fix rocm error

* fix rocm error


[ Dy2Static ] Remove assign split (#44769) · be0ec904
Committed by xiongkun on Aug 02, 2022
```
* just a test

* remove split assign test

* remove other useless code related to split assign
```
Do not set return_merge in test_parallel_executor_run_cinn (#44809) · 683f8190
Committed by Ruibiao Chen on Aug 02, 2022

Skip not return_merged cases for standalone executor (#44810) · 221e1376
Committed by Ruibiao Chen on Aug 02, 2022

fix gpups CUDADeviceContext to phi-GPUContext;test=develop (#44804) · 3491d183
Committed by danleifeng on Aug 02, 2022

[Eager] use eager final state instead intermediate state (#44722) · f1873b90

Committed by Weilong Wu on Aug 02, 2022

* [Eager] call final_state_slice under eager mode

* rm useless comments

* use eager final state instead intermidiate state

* update fill_constant yaml

* update fill_constant yaml

* modify wrapped_infermeta_gen logic to fix special case

* fix slice in manipulation

* use fill_constant_

* modify slice infermeta

* rm final_state_conv2d

* use final_state_slice

* use final_state_slice only

* polish slice, use final state

* add paddle_throw for SplitInferMeta

* rm fill_constant_ temply

* recover array_equal, not allclose

* recover original code


[Phi] Move QR to Phi (#44742) · 2cf2e786

Committed by Yulong Ao on Aug 02, 2022

* [Phi] Move Qr to the Phi

* [Phi] Regiter the cpu grad kernel for qr

* [Phi] Share the cuda kernels to lstsq

* [Phi] Remove some improper inlcude files

* [Phi] Modify codes based on the reviews

* [Phi] Remove unecessary files and add the cuda_only comment

* [Phi] Remove the unecessary include file

* [Phi] Remove qr_op.cu and lstsq_op.cu


[Eager]Menual fused_gemm_epilogue (#44748) · a2980169
Committed by xiaoguoguo626807 on Aug 02, 2022
```
* manuel_fused_gemm_epilogue
```
[Phi] polish and rename, pt* -> phi* (#44697) · 942ff89f
Committed by Weilong Wu on Aug 02, 2022
```
* polish and rename, pt* -> phi*

* fix code format
```
fix get_pr_ut error (#44787) · d985b4b1
Committed by YUNSHEN XIE on Aug 02, 2022
```
* fix get_pr_ut error

* fix bug
```
[XPU] fp16 for layer_norm op (#44778) · 4c3e13de
Committed by houj04 on Aug 02, 2022
```
* [XPU] fp16 for layer_norm op. test=kunlun
```

[Dy2St]Raise TypeError when call to_static to convert a method of a common class (#44781) · c3d4a3d8

Committed by WangZhen on Aug 02, 2022

* Fix to_static error when call to_static to convert a method of a common class

* raise typerror when class no inherits from layer

* Fix @to_static

fix test-einsum-v2 unittest in cuda 11.7 (#44772) · acfdb8b3
Committed by xiongkun on Aug 02, 2022

write trainer_desc file (#44702) · 65a3530c

Committed by ziyoujiyi on Aug 02, 2022

* back fl

* delete ssl cert

* .

* make warning

* .

* unittest paral degree

* solve unittest

* heter & multi cloud commm ready

* .

* .

* fl-ps v1.0

* .

* support N + N mode

* .

* .

* .

* .

* delete print

* .

* .

* .

* .

* fix bug

* .

* .

* fl-ps with coordinator ready

* merge dev

* update message parse only

* update fl client scheduler

* fix bug

* update multithreads sync

* fix ci errors

* update role_maker.py

* update role_maker.py

* fix ci error: windows py import error

* fix ci error: windows py import error

* fix windows ci pylib import error

* add dump fields & params

* try to fix windows import fleet error

* fix ps FLAGS error

* fix logging risk

* fix logging possible risk

* write trainer_desc file

Skip inplace for coalesce_tensor_op outputs (#44795) · bb22e59c
Committed by Ruibiao Chen on Aug 02, 2022
```
* Skip inplace for coalesce_tensor_op outputs

* Fix typos

* Add UTs

* Fix typos
```
Update manipulation.py for rot90() (#44038) · 756f01db
Committed by 熊峻峰 on Aug 02, 2022

[phi] add yolov3_loss yaml and unittest (#44476) · c7cf12fc

Committed by ccrrong on Aug 02, 2022

* add yaml and unittest

* update yaml

* update backward yaml and unittest

* update yaml

* add Yolov3LossGradInferMeta

* update yolov3_loss_op.cc

* fix bug

* code format


support beam_search operator on xpu. test=kunlun (#44720) · 9bf80772

Committed by mengqingchun02 on Aug 02, 2022

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

* support beam_search operator on xpu. test=kunlun

fix ut new_group_api (#44764) · d8fedcb9
Committed by kuizhiqing on Aug 02, 2022

update xpu.cmake to 20220731, test=kunlun (#44767) · 1bd6e28f
Committed by zhangyikun02 on Aug 02, 2022

Refactor build_op_downstream_map for standalone executor (#44729) · 9b97ac70
Committed by Ruibiao Chen on Aug 02, 2022
```
* Refactor build_op_downstream_map for standalone executor

* Add some comments
```
Modify the output result annotation under the lerp function (#44035) · d788e727
Committed by 沧夜2021 on Aug 02, 2022

01 Aug, 2022 11 commits

API doc(en) Bugs fix in the fourth round of experience evaluation (#44749) · 937ea24e

Committed by yang131313 on Aug 01, 2022

* fix docs(en) bugs;test=document_fix

* update paddle.add docs;test=document_fix

* update paddle.where docs;test=document_fix

* for ci;test=document_fix

* Update manipulation.py

* update paddle.where;test=document_fix
Co-authored-by: NLigoml <39876205+Ligoml@users.noreply.github.com>


unify gpu context (#44740) · 86763023

Committed by Leo Chen on Aug 01, 2022

* remove cudaDeviceContext

* remove more template

* fix rocm compile

* remove alias name CUDADeviceContext

* fix compile

* fix tests

* revert changes

Revert for cmake static library errors on XPU KP #44762 · f15d930a
Committed by zhiboniu on Aug 01, 2022

GPUGraph merge to develop (#44594) · 798670bb

Committed by danleifeng on Aug 01, 2022

[Sparse] optimize sparse attention (#44743) · 1149a378
Committed by zhouweiwei2014 on Aug 01, 2022

set parallel_job according to CUDA memory in Windows CI unittest (#44695) · c28bb981
Committed by Sing_chan on Aug 01, 2022
```
* set parallel_job according to CUDA memory

* fix bug: add whitespace between conten and [] or condition wont work
```
paddle2onnx update version to 1.0.0rc2 (#44759) · ffb31540
Committed by heliqi on Aug 01, 2022

[JitLayer]Polish PEFuntion to speed up JitLayer and fix memory leak (#44738) · 75122319
Committed by WangZhen on Aug 01, 2022
```
* Polish PEFuntion to speed up JitLayer

* Polish PEFunction code

* Fix comments
```
generate_unify_header supports excludes (#44761) · 212f015f
Committed by Aganlengzi on Aug 01, 2022

[CI] CI for Distributed (#44085) · f064ead6
Committed by Roc on Aug 01, 2022

update manipulation.py paddle.moveaxis (#44191) · 16c7c96e
Committed by 沧夜2021 on Aug 01, 2022
