Commits · 69e82d8331472ee6a3709ccff2ac8c6809c6a813 · 机器未来 / Paddle

29 Jun, 2022 · 1 commit
- cherry pick 43890 (#43892) · 69e82d83
  Committed by ronnywang on Jun 29, 2022
```
* cherry pick 43890
```
  69e82d83
27 Jun, 2022 · 1 commit

[Cherry-pick] Fix incompatible error for place type (#43830) · 9e776f62

Committed by Chen Weihang on Jun 27, 2022

* Create Tensor by paddle::empty  in custom operator (#41840)

* create tensor by empty in custom op

* fix some bug

* update relu custom op demo (#43173)

* Fix incompatible error for custom op Placetype (#43749)

* fix incompatible error

* remove default constructor

* add macro

* fix cpu make error

* add DefaultGPUPlace api

Co-authored-by: zyfncg <zhangyunfei07@baidu.com>

9e776f62

23 Jun, 2022 · 1 commit
- [cherry pick 2.3][Inference]Fix the ort Backend multiple input bug(#43621 #43742) (#43739) · babba557
  Committed by heliqi on Jun 22, 2022
```
* cherry pick from develop 43621

* code format

* paddle2onnx update to 0.9.8
```
  babba557
08 Jun, 2022 · 1 commit
- Resolve protobuf of ORT Backend conflict (#43275) · c2804390
  Committed by heliqi on Jun 07, 2022
```
Resolve version conflicts between the protobuf that the onnxruntime backend depends on and the protobuf used by the framework or by external code
```
  c2804390
11 May, 2022 · 1 commit
- [Eager]Fix EagerTensor _copy_to memory overlap problem (#42668) (#42686) · d0e733dd
  Committed by Aurelius84 on May 11, 2022
  
  d0e733dd
10 May, 2022 · 1 commit

[cherry-pick][MLU] support add callback to stream and profiler (#42115) · 25124d7f

Committed by fwenguang on May 10, 2022

* [MLU] add mlu new profiler (#41138)

* [MLU] add mlu new profiler

* fix format

* [MLU] support add callback to stream (#41831)

* [MLU] add gather mlu kernel (#41969)

* [MLU] add mlu activation kernels (#41751)

25124d7f

09 May, 2022 · 1 commit

[Cherry-pick][IPU] merge recent changes (#42078) (#42582) · 1f9b60df

Committed by Allen Guo on May 09, 2022

    add class NameScopeHelper for adding namescope info
    add mappings for more kinds of optimizer state
    add a compilation_progress_logger option to IpuStrategy for reporting compilation progress
    some code cleanup and miscellaneous optimizations

1f9b60df

05 May, 2022 · 1 commit
- fix the v100 cuda11.2 matmul_v2 and elementwise_div bug (#42479) · e052fde7
  Committed by wawltor on May 05, 2022
  
  e052fde7
04 May, 2022 · 1 commit

graph partition (#42472) · a3917625

Committed by seemingwang on May 04, 2022

* enable graph-engine to return all id (#42319)

* enable graph-engine to return all id

* change vector's dimension

* change vector's dimension

* enlarge returned ids dimensions

* change sample result's structure to fit training (#42426)

* enable graph-engine to return all id

* change vector's dimension

* change vector's dimension

* enlarge returned ids dimensions

* add actual_val

* change vlog

* fix bug

* bug fix

* bug fix

* fix display test

* singleton of gpu_graph_wrapper

* change sample result's structure to fit training

* recover sample code

* fix

* secondary sample

* add graph partition

* fix pybind

Co-authored-by: DesmonDay <908660116@qq.com>
Co-authored-by: DesmonDay <908660116@qq.com>

a3917625

29 Apr, 2022 · 1 commit

[cherry-pick 2.3] Add fused_multi_transformer op to optimize transformer... · 50bfe420

Committed by WangXi on Apr 29, 2022

[cherry-pick 2.3] Add fused_multi_transformer op to optimize transformer generation performance (#42311)

* Add fused_multi_transformer op to optimize transformer generation performance (#41814)

* fix fused_multi_transformer compile failed in cuda arch < sm53 (#42315)

* fix ci timeout

50bfe420

28 Apr, 2022 · 2 commits

[Performance]Add static inline for MakeReturnPyObject (#42334) (#42339) · 3d35827c
Committed by Aurelius84 on Apr 28, 2022

3d35827c

[cherry-pick] Optimize performance of dygraph (#42231, #42253) (#42309) · 69a92b7b

Committed by zyfncg on Apr 28, 2022

* Optimize the performance of sum api (#42231)

* optimize the performance of sum api

* optimize IsDenseTensorInput

* remove debug log

* Add move construct for KernelSignature (#42253)

* add move construct for KernelSignature

* add noexcept

* fix cherry-pick problem

69a92b7b

27 Apr, 2022 · 1 commit

fix data_structure problems in gpu graph_engine (#42321) · 9e1aa116

Committed by seemingwang on Apr 27, 2022

* combine graph_table and feature_table in graph_engine (#42134)

* extract sub-graph

* graph-engine merging

* fix

* fix

* fix heter-ps config

* test performance

* test performance

* test performance

* test

* test

* update bfs

* change cmake

* test

* test gpu speed

* gpu_graph_engine optimization

* add dsm sample method

* add graph_neighbor_sample_v2

* Add graph_neighbor_sample_v2

* fix for loop

* add cpu sample interface

* fix kernel judgement

* add ssd layer to graph_engine

* fix allocation

* fix syntax error

* fix syntax error

* fix pscore class

* fix

* change index settings

* recover test

* recover test

* fix spelling

* recover

* fix

* move cudamemcpy after cuda stream sync

* fix linking problem

* remove comment

* add cpu test

* test

* add cpu test

* change comment

* combine feature table and graph table

* test

* test

* pybind

* test

* test

* test

* test

* pybind

* pybind

* fix cmake

* pybind

* fix

* fix

* add pybind

* add pybind

Co-authored-by: DesmonDay <908660116@qq.com>

* fix conflicts

* fix test api problem (#42297)

* extract sub-graph

* graph-engine merging

* fix

* fix

* fix heter-ps config

* test performance

* test performance

* test performance

* test

* test

* update bfs

* change cmake

* test

* test gpu speed

* gpu_graph_engine optimization

* add dsm sample method

* add graph_neighbor_sample_v2

* Add graph_neighbor_sample_v2

* fix for loop

* add cpu sample interface

* fix kernel judgement

* add ssd layer to graph_engine

* fix allocation

* fix syntax error

* fix syntax error

* fix pscore class

* fix

* change index settings

* recover test

* recover test

* fix spelling

* recover

* fix

* move cudamemcpy after cuda stream sync

* fix linking problem

* remove comment

* add cpu test

* test

* add cpu test

* change comment

* combine feature table and graph table

* test

* test

* pybind

* test

* test

* test

* test

* pybind

* pybind

* fix cmake

* pybind

* fix

* fix

* add pybind

* add pybind

* optimize pybind

* test

* fix pybind

* fix

* pybind change

* remove file

Co-authored-by: DesmonDay <908660116@qq.com>
Co-authored-by: DesmonDay <908660116@qq.com>

9e1aa116

26 Apr, 2022 · 3 commits
- [Cherry-Pick]Fix compiling ort test cases error on Windows(#42186) (#42247) · 5eba3847
  Committed by heliqi on Apr 26, 2022
```
* fix windows compile test case error
```
  5eba3847
- [Eager] Support numpy.ndarry in CastNumpy2Scalar (#42136) (#42213) · 983fcb56
  Committed by Weilong Wu on Apr 26, 2022
  
  983fcb56
- fix python3.10 compile bug on windows (#42140) (#42180) · 42297995
  Committed by zhouweiwei2014 on Apr 26, 2022
```
cherry-pick #42140
```
  42297995
25 Apr, 2022 · 1 commit

[cherry-pick] Optimize performance of dygraph (#42093, #42103, #42137) (#42171) · 0d537003

Committed by zyfncg on Apr 25, 2022

* optimize performance of PreparePhiData (#42093)

* Dygraph performance optimization (v2) (#42103)

* optimize performance of PreparePhiData

* dygraph performance optimization

* optimize performance of dygraph (#42137)

0d537003

24 Apr, 2022 · 1 commit

[Cherry-pick, Eager] Fix CastPyArg2scalar for max value of int64 (#42098) (#42129) · b543998f

Committed by Weilong Wu on Apr 24, 2022

* [Eager] Fix CastPyArg2scalar for max value of int64 (#42098)

* [Eager] Fix CastPyArg2Scalar in Long case

* Add more test cases for paddle.clip

* Use PyLong_AsLongLong

* Fix merge conflicts

b543998f

22 Apr, 2022 · 2 commits

Cherry pick PR41990, add _grad_name and _grad_value for eager tensor (#41990) (#42079) · 3475c2bf

Committed by pangyoki on Apr 22, 2022

* add _grad_name and _grad_value for eager tensor

* fix paddle_enforce

* fix paddle_enforce 2

* fix grad_name

* _grad_value return lodtensor rather than tensor

* fix

3475c2bf

[Cherry-pick] sharding for eager tensor (#42054) · 6ad0f061
Committed by Baibaifan on Apr 22, 2022
```
* sharding_for_eager_tensor (#41415)

* fix_sharding_copy_right (#41849)
```
6ad0f061

21 Apr, 2022 · 4 commits
- [Eager] Support numpy.narray as input for eager expand (#42043) (#42064) · ef0b5fdc
  Committed by Weilong Wu on Apr 21, 2022
  
  ef0b5fdc
- [Cherry-pick] Optimize dygraph scheduling performance (#42010) · ec1d2a16
  Committed by Chen Weihang on Apr 21, 2022
```
* [Phi] Support setting size of vector<Tensor> for out in yaml (#41576)

* support setting vector out size in yaml

* support setting size of vector<tensor> for out in yaml

* resolve conflict

Co-authored-by: zyfncg <zhangyunfei07@baidu.com>
```
  ec1d2a16
- [Eager]Fix full_like/clip with np.generic type as attribute (#41808) (#41974) · e4cb897e
  Committed by Aurelius84 on Apr 21, 2022
```
* [Eager]Fix full_like/clip with np.generic type as attribute

* support numpy generic

* remove useless code
```
  e4cb897e
- [Eager] make fast through to linear (#41945) (#41995) · 0c141322
  Committed by Jiabin Yang on Apr 21, 2022
```
* make fast through to linear

* make fast through to linear

* add to do for later upgrades

* support build once for now
```
  0c141322
20 Apr, 2022 · 3 commits

Cherry-pick PR41720, support no_need_buffer in eager_fluid state (#41720) (#41956) · 279d2db3

Committed by pangyoki on Apr 20, 2022

* support no_need_buffer in eager_fluid state

* change no_need_buffer info from fwd_info to bwd_info

* fix CI failure, gru_unit does not use no_need_buffer

* fix conflict between no_need_buffer and dispensable

* use tensor.define in dispensable

* solve conflict

* solve conflict

279d2db3

[Cherry-pick]fix bug for eager mode distributed training (#41975) · 9a75b4b9

Committed by Aurelius84 on Apr 20, 2022

* update (#41636)

* fix bug for eager mode distributed training (#41841)

Co-authored-by: lilong12 <lilong12@baidu.com>

9a75b4b9

[cherry-pick] Implement Amp Layout AutoTune(41884) (#41964) · 85a4ecb6
Committed by Zhang Ting on Apr 20, 2022
```
 cherry-pick #41884 
```
85a4ecb6

19 Apr, 2022 · 3 commits

[Eager] Fix numpy interface for constructing empty tensor (#41904) (#41954) · 551e9140

Committed by Weilong Wu on Apr 19, 2022

* [Eager] Fix numpy interface for constructing empty tensor

* Fix CI, construct empty tensor

* Modify empty tensor's shape from [] to [0]

* Add more test for constructing empty tensor

551e9140

[Cherry-pick 2.3] Autotune the workspace and kernel choosing of conv (#41833) · b4adbe5c
Committed by Yiqun Liu on Apr 19, 2022
```
Cherry-pick #40338 #41741 #41313
```
b4adbe5c

[cherry-pick] XPUPS Adaptation (#41917) · a9d8b947

Committed by Fan Zhang on Apr 19, 2022

* XPUPS Adaptation (#40991)

* Adapt XPUPS - 1st version - 3.24

* Adapt XPUPS - update XPU PushSparse -  2nd version - 3.24

* Adapt XPUPS - add XPU PullSparseOp - 3rd version - 3.25

* refactor heter comm kernel

* update. test=develop

* Adapt XPUPS - modify by compilation - 4th version - 3.27

* update calc_shard_offset. test=develop

* update xpu kernel. test=develop

* update args of calc_shard_offset

* update. test=develop

* remove customGradMerger

* update. test=develop

* heter_comm update

* heter_comm update

* update calc_shard_offset. test=develop

* heter_comm update

* update args of calc_shard_offset

* update. test=develop

* remove customGradMerger

* update. test=develop

* fix. test=develop

* update. test=develop

* update. test=develop

* update optimizer kernel

* Adapt XPUPS - use WITH_XPU_KP and modify wrapper kernel function - 5th version - 3.30

* update. test=develop

* update pslib.cmake

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* Adapt XPUPS - modify by kp compilation  - 6th version - 3.30

* update. test=develop

* update. test=develop

* update. test=develop

* update optimizer kernel

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* fix. test=develop

* fix. test=develop

* used by minxu

* update heter_comm_inl

* fix. test=develop

* Adapt XPUPS - modify by kp compilation  - 7th version - 3.30

* fix. test=develop

* add optimizer kernel. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* 3.31 update

* Adapt XPUPS - update kp compilation path  - 8th version - 3.31

* add optimizer kernel. test=develop

* fix kunlun not support size_t. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix kunlun not support size_t. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* update heter_comm_kernel.kps 3.31

* fix. test=develop

* fix. test=develop

* update heter_comm_kernel.kps 3.31

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* update heter_comm.h 3.31

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* update hashtable. test=develop

* update. test=develop

* Adapt XPUPS - update by kp compilation  - 9th version - 4.1

* update hashtable. test=develop

* fix. test=develop

* update hashtable 4.1

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* Adapt XPUPS - update by kp compilation  - 10th version - 4.1

* fix. test=develop

* fix. test=develop

* fix. test=develop

* update. test=develop

* modify by compilation 4.1

* update. test=develop

* update. test=develop

* fix. test=develop

* modify by compilation 4.1

* update. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* modify by compilation 4.1

* fix. test=develop

* fix. test=develop

* fix. test=develop

* modify by compilation 4.1 19:30

* fix. test=develop

* update ps_gpu_wrapper.kps 4.1

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* Adapt XPUPS - update by kp compilation  - 11th version - 4.1

* fix. test=develop

* Adapt XPUPS - update by kp compilation  - 12th version - 4.2

* fix. test=develop

* fix. test=develop

* modify by compilation 4.2

* 4.2 update

* fix. test=develop

* template init. test=develop

* update 4.6

* fix. test=develop

* template init. test=develop

* 4.6 modify by compilation

* hashtable template init. test=develop

* hashtable template init. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=devlop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=devlop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* Adapt XPUPS - update by kp compilation  - 13th version - 4.7

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* 4.11 update

* fix. test=develop

* fix. test=develop

* 4.11 update

* update by pre-commit

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* 4.12 update

* fix. test=develop

* Adapt XPUPS - update by kp compilation  - 14th version - 4.13

* 4.13 update

* 4.14 update

* 4.14 update

* 4.14 update

* 4.14 modify by merged latest compilation

* retry CI 4.14

* 4.15 pass static check

* 4.15 modify by gpups CI

* 3.16 update by gpups CI - modify ps_gpu_wrapper.h

* 4.16 update

* 4.16 pass xpu compile

* 4.16 retry CI

* 4.16 update

Co-authored-by: zmxdream <zhangminxu01@baidu.com>

* modify ps_gpu_wrapper.cc

* update

Co-authored-by: zmxdream <zhangminxu01@baidu.com>

a9d8b947

18 Apr, 2022 · 3 commits

update (#41756) · 97d1ab2a
Committed by lilong12 on Apr 18, 2022

97d1ab2a

Add eager string tensor (#41039) (#41839) · 623f8308

Committed by Jack Zhou on Apr 18, 2022

* Add core.eager.StringTensor __init__ which pyarray args can be passed

* Add the numpy method of core.eager.StringTensor

* revert tensor.to_string modification

* Add ToPyObject for core.eager.StringTensor

* Add debug string for core.eager.StringTensor

* Remove place args of core.eager.StringTensor temporarily

* Fix check string_tensor error

* remove dtype of core.eager.StringTensor

* add core.eager.StringTensor unittest

* remove pstring from VarDesc

* Add InitStringTensorWithStringTensor

* Remove to_string modification

* Remove zero_copy arg from StringTensor creator

623f8308

[Cherry-pick] Organize the API of custom operators (#41882) · 897911fc

Committed by Chen Weihang on Apr 18, 2022

* [Phi&CustomOp] Remove deprecated enum PlaceType for custom op & add warning (#41647)

* remove old custom op placetype

* replace dist  placetype using

* add with gpu macro

* fix mutable_data error

* fix set value error

* add comment

* remove all is initialized using (#41766)

* remove inner_place using (#41768)

* polish tensor deprecated method warning (#41807)

* [CustomOp] Fix PlaceType related compat error (#41826)

* fix place type related compat error

* fix test failed

* remove dll decl

* revert place type change

* add dll decl

* resolve conflict

897911fc

15 Apr, 2022 · 2 commits
- [cherry-pick] Add Sparse API to_dense, to_sparse_coo and values (#41394) (#41834) · 8300e618
  Committed by zhangkaihuo on Apr 15, 2022
```
Add paddle.sparse and three Sparse API (#41276)
Add Sparse API to_dense, to_sparse_coo and values (#41394)
```
  8300e618
- support weakref for eager tensor (#41769) (#41797) · 6c067e09
  Committed by zhangbo9674 on Apr 15, 2022
  
  6c067e09
12 Apr, 2022 · 2 commits
- [cherry-pick] add python share_date interface (#41627) · 43ee4a33
  Committed by JingZhuangzhuang on Apr 12, 2022
```
* add python share_date interface

* Update inference_api.cc

* add python share_data interface
```
  43ee4a33
- fix dynamic flag bug on mac (#41571) (#41660) · 883d5be3
  Committed by zhouweiwei2014 on Apr 12, 2022
```
cherry-pick #41571
```
  883d5be3
11 Apr, 2022 · 2 commits

Unittest recover (#41431) (#41590) · 74626f66

Committed by zhaocaibei123 on Apr 11, 2022

* update name

* update name

* fix test

* fix fleet bind

* update name

* update name

* fix test

* fix gpups wrapper

* remove Push/Pull/Load/Save with context in client and wrapper base class

* fix

* fix

* remove some interface

* fix

* remove

* code style

* recover

* fix

* remove code unused

* remove some unused table & accessor & CommonDenseTable => MemoryDenseTable

* fix

* fix

* fix

* recover

* remove unused code

* recover unittest

* fix

* remove

* fix

* remove code unuseful

* remove

* fix

* recover

* remove

Co-authored-by: esythan <esythan@126.com>
Co-authored-by: esythan <esythan@126.com>

74626f66

add send/recv to/from switch module for PrcoessGroupHeter (#41285) (#41502) · 8525bc63
Committed by lilong12 on Apr 11, 2022

8525bc63

07 Apr, 2022 · 1 commit
- Fix eager try catch (#41438) (#41477) · b73a70d6
  Committed by 0x45f on Apr 07, 2022
```
[Cherry-Pick]Fix eager try catch (#41438)
```
  b73a70d6

机器未来 / Paddle · Up to date with the fork's source project
