- 25 April 2022, 2 commits
-
-
Committed by zyfncg
* Optimize performance of PreparePhiData (#42093)
* Dygraph performance optimization (v2) (#42103)
* Optimize performance of PreparePhiData
* Dygraph performance optimization
* Optimize performance of dygraph (#42137)
-
Committed by Aurelius84
[Cherry-Pick][Performance] Remove CudaStreamSynchronize in ClipGradByGlobalNorm and fix shape op (#42170)
* [Performance] Set ShapeKernel with ALL_BACKEND and ALL_LAYOUT (#42138)
* [Performance] Remove CudaStreamSynchronize in ClipGradByGlobalNorm (#42132)
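The ClipGradByGlobalNorm change above removes a host-device synchronization; the clipping rule itself is the standard global-norm formula. A minimal pure-Python sketch of that rule (a hypothetical helper for illustration, not Paddle's actual kernel, which keeps the norm on-device precisely so it no longer needs a cudaStreamSynchronize to read it back):

```python
import math

def clip_by_global_norm(grads, clip_norm):
    """Scale all gradients so their joint L2 norm is at most clip_norm.

    Sketch of the global-norm clipping rule: if the combined norm
    exceeds clip_norm, every gradient is scaled by clip_norm / norm;
    otherwise gradients pass through unchanged.
    """
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    scale = clip_norm / max(global_norm, clip_norm)
    return [[g * scale for g in grad] for grad in grads]

# A gradient with norm 5.0 clipped down to norm 1.0:
clipped = clip_by_global_norm([[3.0, 4.0]], clip_norm=1.0)
print([round(g, 6) for g in clipped[0]])  # [0.6, 0.8]
```

Gradients already below the threshold are returned unscaled, so the operation is a no-op in the common case.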
-
- 24 April 2022, 2 commits
-
-
Committed by tianshuo78520a
Fix build dependency failure
-
Committed by Weilong Wu
* [Eager] Fix CastPyArg2Scalar for max value of int64 (#42098)
* [Eager] Fix CastPyArg2Scalar in Long case
* Add more test cases for paddle.clip
* Use PyLong_AsLongLong
* Fix merge conflicts
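The CastPyArg2Scalar fix above is about the int64 boundary: parsing a Python int through a 32-bit long path cannot represent INT64_MAX (the default bound used by paddle.clip), while a 64-bit path such as PyLong_AsLongLong can. A small stdlib-only illustration of that boundary, independent of Paddle:

```python
import struct

INT64_MAX = 2**63 - 1  # largest value an int64 scalar can hold

# A 64-bit signed pack/unpack (analogous to PyLong_AsLongLong)
# round-trips the value losslessly.
packed = struct.pack("<q", INT64_MAX)  # "<q" is an 8-byte signed int
assert struct.unpack("<q", packed)[0] == INT64_MAX

# A 32-bit signed pack (analogous to a plain 32-bit long cast)
# cannot represent the value at all.
try:
    struct.pack("<l", INT64_MAX)  # "<l" is a 4-byte signed long
    overflowed = False
except struct.error:
    overflowed = True
print(overflowed)  # True
```

This is why the fix switches the scalar-parsing path to the 64-bit conversion rather than relying on the platform `long`.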
-
- 23 April 2022, 1 commit
-
-
Committed by zmxdream
* add hashtable interface. test=develop
* update. test=develop
* update. test=develop
* fix. test=develop
* fix optimizer config for xpups. test=develop
* fix. test=develop
* fix. test=develop
-
- 22 April 2022, 7 commits
-
-
Committed by Aurelius84
-
Committed by pangyoki
* add _grad_name and _grad_value for eager tensor
* fix paddle_enforce
* fix paddle_enforce 2
* fix grad_name
* _grad_value returns LoDTensor rather than Tensor
* fix
-
Committed by heliqi
Fix incorrect ORT output shape when the batch size changes
-
Committed by baoachun
-
Committed by Jacek Czaja
-
Committed by Baibaifan
* sharding_for_eager_tensor (#41415)
* fix_sharding_copy_right (#41849)
-
Committed by Allen Guo
Add mixed-precision support for IPU (cherry-pick from #41733)
-
- 21 April 2022, 18 commits
-
-
Committed by Weilong Wu
-
Committed by Zhen Wang
* Move pass optimizations into CINN.
-
Committed by zhangyikun02
-
Committed by z8hanghuan
* modify xpu.cmake, *test=kunlun (#41832)
* modify xpu.cmake, *test=kunlun
* modify xpu.cmake, *test=kunlun
* modify xpu.cmake, *test=kunlun
* modify xpu.cmake, *test=kunlun
* support bilstm, *test=kunlun
* [cherry-pick] support multi_layer of bilstm, *test=kunlun
-
Committed by lilong12
* fix_nccl_barrier (#41970)
* be compatible with the old version of alltoall (#42007)
Co-authored-by: Baibaifan <39549453+Baibaifan@users.noreply.github.com>
-
Committed by WangXi
-
Committed by wangguanqun
* double accessor and show_scale
* double accessor and show_scale
* rename
* fix bug in pslib config
* add unittest
-
Committed by baoachun
* update gpu fp16 op blacklist
* update blacklist
-
Committed by baoachun
-
Committed by zyfncg
* [PHI] Support some c++ api in paddle namespace (#41778)
* support some c++ api in paddle namespace
* change c++ api namespace in custom op
* [Phi] Support setting size of vector<Tensor> for out in yaml (#41576)
* support setting vector out size in yaml
* support setting size of vector<Tensor> for out in yaml
* add data transform config for shape and size (#41909)
* fix api_gen bug
-
Committed by Chen Weihang
* [Phi] Support setting size of vector<Tensor> for out in yaml (#41576)
* support setting vector out size in yaml
* support setting size of vector<Tensor> for out in yaml
* resolve conflict
Co-authored-by: zyfncg <zhangyunfei07@baidu.com>
-
Committed by Aurelius84
* [Eager] Fix full_like/clip with np.generic type as attribute
* support numpy generic
* remove useless code
-
Committed by TeFeng Chen
Cherry-pick #41795
-
Committed by baoachun
-
Committed by JingZhuangzhuang
-
Committed by Jiabin Yang
* cherry-pick python/paddle/utils/code_gen/backward.yaml
* remove unsupported yaml
Co-authored-by: Zhanlue Yang <jim19930609@gmail.com>
-
Committed by Jiabin Yang
* make fast through to linear
* make fast through to linear
* add TODO for later upgrades
* support build once for now
-
Committed by Chen Weihang
* polish tensor api details (#41971)
* [CustomOp] Fix custom op pinned input error (#41972)
* fix custom op pinned input error
* fix compile error
* fix inference custom op (#41999)
* resolve conflict
-
- 20 April 2022, 10 commits
-
-
Committed by lilong12
-
Committed by Haohongxiang
* refactor mp in eager mode
* update
* update
* add uts
-
Committed by seemingwang
* gpu_graph engine optimization+ (#41455)
* extract sub-graph
* graph-engine merging
* fix heter-ps config
* test performance
* update bfs
* change cmake
* test gpu speed
* gpu_graph_engine optimization
* add ssd layer to graph_engine
* fix allocation
* fix syntax error
* fix pscore class
* recover test
* fix spelling
* Cpu gpu graph engine (#41942): same commit list as #41455, plus
* fix linking problem
* remove comment
-
Committed by Leo Chen
* [new-exec] shrink downstream map (#41471)
* shrink downstream map
* shrink last live ops of var
* add comment
* fix bug
* add dependency for send/recv to support pp parallel (#41652)
* [new-exec] clear the scope listener after run (#41947)
* clear the listener after run
* only sync variables in program
* refine code
* fit for lod_tensor_blocking_queue
-
Committed by heliqi
Add an onnxruntime build option to the Windows build script
-
Committed by feng_shuai
-
Committed by pangyoki
* support no_need_buffer in eager_fluid state
* change no_need_buffer info from fwd_info to bwd_info
* fix CI failure: gru_unit does not use no_need_buffer
* fix conflict between no_need_buffer and dispensable
* use tensor.define in dispensable
* solve conflict
* solve conflict
-
Committed by Jiabin Yang
Co-authored-by: Zhanlue Yang <jim19930609@gmail.com>
-
Committed by YuanRisheng
* support construct scalar using non-cpu tensor
* fix bugs when run unittest
* fix compile bugs
* fix bugs when run ci
* fix compile bugs
* fix bugs when move copy
* perfect unit test
* perfect unittest
* update according to comment
* add target dependency
* deal with conflict
* fix bugs when run unit test
* fix unit test bugs
-
Committed by Aurelius84
* update (#41636)
* fix bug for eager mode distributed training (#41841)
Co-authored-by: lilong12 <lilong12@baidu.com>
-