Commits · b027652b9272a5acd2b52528671f950db7228ba5 · 机器未来 / Paddle

21 Sep, 2022 1 commit

remove tmp fp32 var for gaussian_random (#46285) · b027652b
Committed by Guoxia Wang on Sep 21, 2022

b027652b

20 Sep, 2022 21 commits

Committed by ziyoujiyi on Sep 20, 2022

* back fl

* delete ssl cert

* .

* make warning

* .

* unittest paral degree

* solve unittest

* heter & multi cloud comm ready

* .

* .

* fix gloo compile warning

* adapt for nn fl-ps

* flps del fake-init op

* add learning_rate_0 initializer op

3e8b3220

fix miss return error. (#46298) · 461099c0
Committed by Wilber on Sep 20, 2022

461099c0
Revert "Optimiza params sync between CPU and GPU. (#45805)" (#46274) · bab11094
Committed by Wilber on Sep 20, 2022
```
This reverts commit a2b2af90.
```
bab11094
[Paddle-TRT][Cherry-Pick]Fix cast bug (#46293) · 230b9a82
Committed by zhoutianzi666 on Sep 20, 2022
```
* fix cast bug
```
230b9a82
[cherry-pick][xpu] update xdnn activations (#46282) · a43f960e
Committed by houj04 on Sep 20, 2022
```
* [XPU] update xdnn activations. (#46246)

* [XPU] update xpu cmake. test=kunlun
```
a43f960e
[Paddle-TRT] Full support for ops with persistable input (#45545) (#46280) · adb2f5e6
Committed by zhoutianzi666 on Sep 20, 2022
```
* Move ITensor construction for Weight (persistable variable) from OpConvert to TensorRTEngine.
```
adb2f5e6
[PolishComments] Polish some code comments (#46032) (#46261) · 42e56f65
Committed by HongyuJia on Sep 20, 2022
```
* polish code comments

* polish data_device_transform.cc
```
42e56f65

[Cherry-Pick][AutoParallel] change import way and fix strategy (#46270) · c43ebfcf

Committed by zhaoyingli on Sep 20, 2022

* [Auto Parallel] Change the import way of Auto Parallel (#46115)

* fix strategy (#46256)

* [Auto Parallel] performance improvement for Sharding-DP hybrid parallelism (#46180)

* remove no need grad allreduce communication when sharding-dp

* remove no need grad allreduce communication when sharding-dp

* bugfix

* bugfix

* bugfix
Co-authored-by: Yulong Ao <aoyulong@baidu.com>
Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>

c43ebfcf

[Cherry-pick] Fix amp error cp (#46272) · da173c40

Committed by Jiabin Yang on Sep 20, 2022

* [Eager] Fix ocr (#46124)

* fix linspace error in amp

* fix log

* fix amp error

* fix ocr error which caused by amp

* add more check

* rename dtype ns

* [Eager Bug fix]Fix Detection (#46147)

* fix linspace error in amp

* fix log

* fix amp error

* Revert "Simplify size op impl (#45808)"

This reverts commit c252b1de.

* fix_seg

* fix detection
Co-authored-by: Chen Weihang <sunny_cwh@163.com>

da173c40

[Release/2.4][Cherry-pick] Fix bug of reduce_sum op (#46160) · 759736df

Committed by Ghost Screaming on Sep 20, 2022

* Fix bug of reduce_sum op. When input.numel() > INT32_MAX, its result is wrong.

* Cherry-pick of PR 46045

* Fix bug of reduce_sum kp op.

* Fix bug of reduce_sum kp operator compilation. If compilation device is XPU, eigen kernel should be ignored.

759736df
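
Note on the reduce_sum fix above: the commit message says only that results are wrong once `input.numel()` exceeds `INT32_MAX`; it does not state the root cause. A common cause of this class of bug is an element count or index held in a signed 32-bit integer, and the sketch below illustrates that assumption in plain Python. The helper names `wrap_int32` and `needs_int64_indexing` are hypothetical and are not part of Paddle.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Reduce a Python int to the value a signed 32-bit integer would hold."""
    return (value + 2**31) % 2**32 - 2**31


def needs_int64_indexing(numel: int) -> bool:
    """Return True when the element count no longer fits in a signed int32."""
    return numel > INT32_MAX


# Hypothetical element count just past the limit named in the commit message.
numel = INT32_MAX + 10
wrapped = wrap_int32(numel)  # what a 32-bit counter would hold instead
```

Under this assumption, a kernel that stores `numel` in 32 bits sees a negative count, so a guard like `needs_int64_indexing` is one way to choose a 64-bit indexing path.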

[Paddle-TRT] Support matmul_v2 in Paddle-TensorRT (#46177) · 654807cd
Committed by zhoutianzi666 on Sep 20, 2022
```
* Support matmul_v2 in Paddle-TensorRT converter.
```
654807cd
[cherry-pick] Refine thread pool config of interpretercore (#46219) · 1418a719
Committed by Leo Chen on Sep 20, 2022
```
* add config

* add config

* follow comments

* fix serial run
```
1418a719
Fix TransDataBackend Error when call unsqueeze using MKL Tensor (#46094) (#46186) · 50340302
Committed by WangZhen on Sep 20, 2022
```
* Fix TransDataBackend Error when call unsqueeze using MKL Tensor

* Add UT

* Refine UT
```
50340302

[Cherry-pick] Sparse add InferMeta (#46235) · fd8ec4a1

Committed by zhangkaihuo on Sep 20, 2022

cherry-pick : #46016, #46021, #45974

* [Sparse]Sparse add support gpu (#45974)

* [Sparse]Remove unused code (#46021)

* [Sparse] Add infer meta (#46016)

fd8ec4a1

[Eager] Fix linspace error in amp (#46088) (#46206) · 38c0fd02
Committed by Jiabin Yang on Sep 20, 2022
```
* fix linspace error in amp

* fix log

* fix amp error
```
38c0fd02

(cherry-pick)Support some op refuse forward and fix some bugs (#46211) · bc92d5f5

Committed by Charles-hit on Sep 20, 2022

* support cast op backward refuse forward and fix some bugs (#46173)

* support cast op backward refuse forward

* Fix the bug of high order unit test framework

* support sign op backward refuse forward (#46002)

bc92d5f5

[Inference] fix preln_residual_bias_fuse_pass bug in TNT_small model (#46178) (#46260) · c384b00d
Committed by zhoutianzi666 on Sep 20, 2022
```
* fix preln_residual_bias_fuse_pass bug in TNT_small model
```
c384b00d

Run_program_op add scope cache & reuse (#45813) (#46223) · 4f28a4c2

Committed by zhangbo9674 on Sep 20, 2022

* add scope cache & reuse

* add gc scope for end of each train step

* del scope reuse for jit

* refine code

* test

4f28a4c2

[Cherry-pick] Update layoutautotune for inplace (#45826) (#46226) · c0324e82

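Committed by niuliling123 on Sep 20, 2022

cherry-pick from #45826
LayoutAutotune now supports inplace ops.
UseAutotune was adjusted following the review comments on "Add eager layout autotune" (#45409).
The LayoutAutotune check was moved into the controller, consistent with the AMP check.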

c0324e82

[Cherry-pick] Fix static check (#46253) · 7712ce14

Committed by Roc on Sep 20, 2022

* fix static_check error when compile twice (#46140)

* [CI] fix static check in build_pr_dev (#46192)
Co-authored-by: Zhou Wei <1183042833@qq.com>

7712ce14

Fix wrong eigen header include (#46082) (#46202) · ac8cce20

Committed by zyfncg on Sep 20, 2022

* fix wrong eigen header include

* fix compile bug

* fix nan_inf_utils_detail

* fix resource_manager

* fix conv_miopen_helper

ac8cce20

19 Sep, 2022 18 commits

Recompute unify incubate (#46073) (#46210) · 4bced24a
Committed by wuhuachaocoding on Sep 19, 2022

4bced24a

[vision.ops.nms] Fix return order error and duplicate results with specific inputs (#46148) (#46193) · be84cac7

Committed by RichardWooSJTU on Sep 19, 2022

* fix return order error and duplicate results with specific inputs

be84cac7
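
Note on the nms fix above: the commit names two properties, return order and the absence of duplicate results. The sketch below shows greedy non-maximum suppression in plain Python to make those properties concrete. It is not Paddle's implementation. The function names `iou` and `nms_sketch` and the `[x1, y1, x2, y2]` box layout are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_sketch(boxes, scores, iou_threshold):
    """Return kept box indices, highest score first, each index at most once."""
    # Visit candidates in descending score order.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


# Hypothetical inputs: boxes 0 and 1 overlap heavily, box 2 is separate.
boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.5, 0.9, 0.7]
kept = nms_sketch(boxes, scores, iou_threshold=0.5)
```

In this sketch the result is ordered by descending score and contains no repeated index, which is the behavior a caller would typically check when validating a fix of this kind.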

[cherry-pick] add abs,mean,sum,ge,gt,pow,etc higher-order differentiation operators (#46184) · ad8beaaf

Committed by Xiaoxu Chen on Sep 19, 2022

* [cherry-pick] extend reduce_sum,reduce_sum,eq,ne,ge,abs,pow,etc higher order operators

* add reduce_mean,reduce_sum primitive ops
* add ne_p gt_p primitive operators
* add ge_p abs_p primitive operators
* add cast primitive operators
* add pow,square prim2orig rules
* add elementwise_div orig2prim rule

* [cherry-pick] add mean,sum,ge,gt,ne,abs,etc higher-order differentiation operators(#45888)

* add reduce_mean,reduce_sum primitive ops

* add ne_p gt_p primitive operators

* add ge_p abs_p primitive operators

ad8beaaf

[JitLayer]Save property meta file to correct path (#46131) (#46195) · 45a3c656
Committed by WangZhen on Sep 19, 2022

45a3c656

[cherry-pick] [dy2static] support user to use decorator in their program (#46194) · d1ce974e

Committed by feifei-111 on Sep 19, 2022

* [dy2static] support user to use decorator in their program (#45768)

* support deco

* fix deco ast type

* arg_str

* 1

* support callable deco

* code style

* codestyle

* test_error

* fix decos in another file

* recover conflict codes

* [BugFix] fixed a bug in decorator transformer, it can not analyze decorator with params correctly (#46055)

* fix deco call

* add raise

* add test

* add warn, fix paddle api

* fix error type

* fix coverage

d1ce974e

Add symbolic shape deduction function for general Plugin mechanism (#46179) · a0566010
Committed by weishengying on Sep 19, 2022

a0566010
cherry-pick 46152 (#46183) · 707d838b
Committed by Wilber on Sep 19, 2022

707d838b

(cherry-pick)support some op backward refuse forward (#46201) · adab3c59

Committed by Charles-hit on Sep 19, 2022

* add unit test for sum higher level op (#45961)

* support slice op backward refuse forward and add high level unit test (#45960)

* support tile op backward refuse forward (#45942)

* support expand_v2 op backward refuse forward (#45941)

* support concat backward refuse forward (#45940)

adab3c59

Remove redundant code in pe engine (#46110) (#46145) · 7f0c1f0d
Committed by WangZhen on Sep 19, 2022

7f0c1f0d

[Cherry-pick] Support bmm and bmm_grad in xpu (#45887) (#46132) · 1c7e95cc

Committed by Jiabin Yang on Sep 19, 2022

* [PHI] Support bmm and bmm_grad in xpu (#45887)

* support bmm and bmm_grad in xpu

* add error removal

* test=kunlun

* refactor code for better structure

* test=kunlun

* add fp16 kernel for bmm

* test=kunlun

* test=kunlun

1c7e95cc

fix (#46125) · 855fddec
Committed by zhaocaibei123 on Sep 19, 2022

855fddec
fix_recover_remove_padding kernel (#46050) (#46198) · 6b59a073
Committed by Wangzheee on Sep 19, 2022

6b59a073
Add INT8 support for fused_multi_transformer_op (#45284) (#46169) · db368d5b
Committed by minghaoBD on Sep 19, 2022
```
Co-authored-by: RichardWooSJTU <37864677+RichardWooSJTU@users.noreply.github.com>
```
db368d5b

refactor mp. (#45803) (#46121) · e5dc9d61

Committed by wuhuachaocoding on Sep 19, 2022

* refactor mp.

* update setup.py.

* update mp_layers.py for compatibility.

* add documents for mp_layers.py

* update init.py

* update collective.py.

* update.

* update mp_ops.py

* update.

* update code style.

* update code style.

e5dc9d61

[Cherry-pick][Auto Parallel] Improve the APIs (#46164) · c5cc4278

Committed by Yulong Ao on Sep 19, 2022

* [AutoParallel] adapt gradient merge pass (#45915)

* adapt gradient merge

* fix op_role

* fix strategy

* [Auto Parallel] Gradient Fuse Allreduce (#45643)

* bugfix (#45332)

* dist embedding support lookup table v1

* add unittest

* customize wait_comm

* group gradients

* bugfix

* update program

* [Auto Parallel] Improve the APIs (#45776)

* [Auto Parallel] Use c++ dist attr in the completion process

* [Auto Parallel] Add minor changes

* [Auto Parallel] Use c++ dist attr in the completion process

* [Auto Parallel] Add minor changes

* [Auto Parallel] Add the serialization process for dist attrs

* [Auto Parallel] Remove unnecessary comments

* [Auto Parallel] Fix some bugs

* [Auto Parallel] Fix the code style

* [Auto Parallel] Remove unnecessary impls

* [Auto Parallel] Fix the importing error

* [Auto Parallel] Fix the copy from bugs of op dist attr

* [Auto Parallel] Replace the use of constexpr if

* [Auto Parallel] Redesign the shard_tensor, shard_op and ProcessMesh

* [Auto Parallel] Change API of the completion unittest

* [Auto Parallel] Fix the bug when set_attr an int

* [Auto Parallel] Add the unittest for the serialization

* [Auto Parallel] Add some unit tests

* [Auto Parallel] Unify the strategy

* [Auto Parallel] Improve the engine api

* [Auto Parallel] Reset the changes made to the framework

* [Auto Parallel] Change the engine unittest

* [Auto Parallel] Update API of the completion and partitioner

* [Auto Parallel] Update unit tests using engine api

* update shard annotation

* [Auto Parallel] Remove the modifications of other modules

* [Auto Parallel] Add docs for APIs

* add new strategy

* [Auto Parallel] Replace the logger

* [Auto Parallel] Restore the test_program.py

* [Auto Parallel] Change the import rules

* [Auto Parallel] Add the examples for Engine

* [Auto Parallel] Do some minor changes

* [Auto Parallel] Remove yaml dependency

* [Auto Parallel] Fix the unittests

* add valid after train

* bug fix
Co-authored-by: zhaoyingli <zhaoyingli@baidu.com>
Co-authored-by: caozhou <caozhou@radi.ac.cn>
Co-authored-by: caozhou <48191911+Caozhou1995@users.noreply.github.com>

* [Auto Parallel] Bugfix allreduce fuse for MP (#46086)

* bugfix

* bugfix

* typos fixed

* update strategy (#46138)
Co-authored-by: zhaoyingli <86812880+zhaoyinglia@users.noreply.github.com>
Co-authored-by: JZ-LIANG <jianzhongliang10@gmail.com>
Co-authored-by: zhaoyingli <zhaoyingli@baidu.com>
Co-authored-by: caozhou <caozhou@radi.ac.cn>
Co-authored-by: caozhou <48191911+Caozhou1995@users.noreply.github.com>

c5cc4278

fix broadcast kernel (#46158) · 860f6077
Committed by sneaxiy on Sep 19, 2022

860f6077
[Eager] Optimize log (#45783) (#46133) · e468e93c
Committed by Jiabin Yang on Sep 19, 2022
```
* make eager log readable

* fix compile error

* recover test

* invoke ci again
```
e468e93c
convfusion_cache (#46054) · f4ec1563
Committed by xiaoxiaohehe001 on Sep 19, 2022

f4ec1563

机器未来 / Paddle: in sync with the fork source project