提交 · b2704837d2f4dd094abd1f97167862e09f0fc1b2 · Crayon鑫 / Paddle

24 6月, 2022 2 次提交

王

add xpu support for new static alone executor. test=develop (#43076) · b2704837
由王明冬提交于 6月 24, 2022

b2704837

record memory and op supplement info (#43550) · 8dd0a3b9

由 chenjian 提交于 6月 24, 2022

* record memory and op supplement info

* update

* update

* fix a bug

* fix memory recording

* fix a bug

* update

* update

* fix a bug

* update

* fix a bug

* fix a bug

* fix a bug

* Revert "fix a bug"

This reverts commit c1d4df52762ba9ae7c7e27cd2ba4fc3a7ed9c7a5.

* fix a bug

* fix format

* fix

8dd0a3b9

13 6月, 2022 1 次提交
- R
  
  Fix cmakelint errors for some files (#43428) · edf69ae0
  由 Ruibiao Chen 提交于 6月 13, 2022
  
  edf69ae0
04 6月, 2022 1 次提交
- S
  
  【code format check upgrade】 step2：cmake-format (#43057) · 92568edb
  由 Sing_chan 提交于 6月 04, 2022
  
  92568edb
12 5月, 2022 2 次提交

S

Fix some typos in paddle/. (#42408) · 2012672c
由 Shuangchi He 提交于 5月 12, 2022

2012672c

add xpu buffer_reader, *test=kunlun (#42578) · cc343a41

由 z8hanghuan 提交于 5月 12, 2022

* add xpu buffer_reader, *test=kunlun

* xpu buffer_reader, use XPUDeviceGuard, *test=kunlun

* modify xpu.cmake, *test=kunlun

* modify xpu.cmake, *test=kunlun

* modify xpu.cmake, *test=kunlun

* add xpu buffer_reader, *test=kunlun

* add xpu buffer reader, *test=kunlun

* add xpu buffer reader, *test=kunlun

cc343a41

14 4月, 2022 1 次提交

[Phi] Support construct Scalar by using Non-CPU Tensor (#41765) · 54ccc308

由 YuanRisheng 提交于 4月 14, 2022

* support construct scalar using non-cpu tensor

* fix bugs when run unittest

* fix compile bugs

* fix bugs when run ci

* fix compile bugs

* fix bugs when move copy

* perfect unit test

* perfect unittest

* update according to comment

* add target dependency

* deal with conflict

* fix bugs when run unit test

* fix unit test bugs

54ccc308

13 4月, 2022 2 次提交

T
Revert "[Phi] Support construct Scalar by using Non-CPU Tensosr (#41528)" (#41740) · 404c4a6b
由 tianshuo78520a 提交于 4月 13, 2022
```
This reverts commit fe214af2.
```
404c4a6b

[Phi] Support construct Scalar by using Non-CPU Tensosr (#41528) · fe214af2

由 YuanRisheng 提交于 4月 13, 2022

* support construct scalar using non-cpu tensor

* fix bugs when run unittest

* fix compile bugs

* fix bugs when run ci

* fix compile bugs

* fix bugs when move copy

* perfect unit test

* perfect unittest

* update according to comment

* add target dependency

fe214af2

31 3月, 2022 1 次提交

Maintain old profiler (#41132) · a6bf2218

由 chenjian 提交于 3月 31, 2022

* no

* maintain old profiler

* exclude new python record events for old profiler

* maintain old profiler

* maintain

* maintain old profiler

* maintain

* fix cmakes

a6bf2218

21 3月, 2022 1 次提交

[Phi] Add phi device context pool (#40635) · 0e1191f4

由 Chen Weihang 提交于 3月 21, 2022

* add phi device context pool

* change year

* fix compile error

* fix operator = error

* refine init impl

* polish details

* refine init impl

0e1191f4

08 3月, 2022 1 次提交
- A
  [custom kernel]Upgrade support for multiple libs (#40223) · c39aa18e
  由 Aganlengzi 提交于 3月 08, 2022
```
* [custom kernel]Upgade support for multi libs

* upgrade phi_custom_kernel deps
```
  c39aa18e
03 3月, 2022 1 次提交
- R
  
  [CustomRuntime] migrate CustomRuntime into phi (#39908) · b4665d23
  由 ronnywang 提交于 3月 03, 2022
  
  b4665d23
24 2月, 2022 1 次提交
- C
  [PTen->Phi PR3] Rename pten make target to phi (#39832) · f77019a0
  由 Chen Weihang 提交于 2月 24, 2022
```
* rename pten to phi

* fix infrt compile failed

* resolve conflict
```
  f77019a0
23 2月, 2022 1 次提交
- [MLU] add cncl parallel context and mlu resource pool (#39803) · 6241913b
  由 mhhhh1 提交于 2月 23, 2022
```
* [MLU] add cncl parallel context and mlu resource pool

* [MLU] fix the cncl_context_test
```
  6241913b
22 2月, 2022 1 次提交
- R
  
  [CustomRuntime] fix CustomDeviceContext (#39766) · 60fc555e
  由 ronnywang 提交于 2月 22, 2022
  
  60fc555e
21 2月, 2022 1 次提交

[PluggableDevice]custom kernel to phi core structs (#39690) · 68631ed4

由 Aganlengzi 提交于 2月 21, 2022

* [PluggableDevice]custom kernel to pten core structs

* mod extension.h for custom op

* compatible python for CI

* support custom context

* refactor to pten

* fix windows and ut

68631ed4

14 2月, 2022 1 次提交
- W
  context add generator (#39475) · 463e31f4
  由 Wilber 提交于 2月 14, 2022
```
* context add generator

* update
```
  463e31f4
06 2月, 2022 1 次提交
- W
  
  [PTEN] Add Gpu context (#39305) · a821c4a9
  由 Wilber 提交于 2月 06, 2022
  
  a821c4a9
30 1月, 2022 1 次提交
- feat(cncl_mlu): add cncl dev for mlu distributed backend (#39294) · d28f6f7b
  由 mhhhh1 提交于 1月 30, 2022
  
  d28f6f7b
28 1月, 2022 1 次提交

Host tracer and ProfilerController (#39230) · 7c489c2e

由 liutiexing 提交于 1月 28, 2022

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* split template

* Add Profiler and HostTracer

* update

* update

* update

* updateg

* fix cmake
Co-authored-by: Nliutiexing <liutiexing@google.com>

7c489c2e

27 1月, 2022 2 次提交

Q

[MLU] add compile ci scripts for MLU, test=mlu_ci (#39122) · 56410b4a
由 Qi Li 提交于 1月 27, 2022

56410b4a

[PluggableDevice] Add custom kernel support based on pten kernel management (#38848) · a8879215

由 Aganlengzi 提交于 1月 27, 2022

* [Demo] custom kernel based on pten kernel

* merge and npu custom work well

* del comments

* delete other code

* fix CUDAContext

* fix not found small_vector.h

* support NPU

* fix NPUContext

* fix DeviceContext support

* add UT

* fix call

* add UT

* fix

* fix for comments and ut

* add MACRO control

* fix multi input output

* support env CUSTOM_DEVICE_ROOT

* deal with special cases

* fix for Windows

* try coverage with test_custom_kernel_dot.py

* fix test_custom_kernel_dot

* fix test_custom_kernel_dot

* fix merge

* fix merge

* fix CI

* update

* merge and fix

* remove WITH_CUSTOM_KERNEL

* fix merge

* merge and fix

* fix ut

* fix ut for mac

* add more UT

* add more UT

* fix

a8879215

26 1月, 2022 2 次提交

[IPU] sync misc changes 01 (#38876) · 4efbebea

由 Allen Guo 提交于 1月 26, 2022

* sync misc changes

* apply comments 01

* fix compile error

* remove is_ipu_place check

* add authors
Co-authored-by: NXiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: NAllen Guo <alleng@graphcore.ai>
Co-authored-by: NZhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: NHaicheng Jiang <haichengj@graphcore.ai>
Co-authored-by: NHan Zhao <hanzhao@graphcore.ai>

* sync changes

* restore cmake

* update ir cmake and setup.py

* update inference_lib cmake

* split PR
Co-authored-by: NXiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: NZhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: NHaicheng Jiang <haichengj@graphcore.ai>
Co-authored-by: NHan Zhao <hanzhao@graphcore.ai>

4efbebea

W
[PTEN] cpu_context add eigen deps (#39234) · bd5c962d
由 Wilber 提交于 1月 26, 2022
```
* add eigen deps

* update
```
bd5c962d

25 1月, 2022 2 次提交
- W
  
  [PTEN] Add xpu context. (#39098) · c1e5a393
  由 Wilber 提交于 1月 25, 2022
  
  c1e5a393
- X
  [PTen] Migrate string tinyformat errors and part of enforce into pten (#39051) · 6ca49164
  由 xiongkun 提交于 1月 25, 2022
```
* transfer: string tinyformat errors and part of enforce into pten

* remove comment

* fix by code review

* assert is not compile in -DNDEBUG

* add string as dependences of paddle_inference
```
  6ca49164
21 1月, 2022 1 次提交

[PTEN] Add cpu context (#38979) · 064bc4b8

由 Wilber 提交于 1月 21, 2022

* add cpu_context.

* update

* update

* update

* update

* update

* fix ci problem

* fix npu ci problem

* update

* fix ci compile

064bc4b8

17 1月, 2022 1 次提交

[Pten] Replace platform::Place to pten::Place. (#38899) · c48a9ad5

由 Wilber 提交于 1月 17, 2022

* add pten::Place data structure.

* update ci problem

* fix ci problem

* update

* using platform::Place=pten::Place

* remove BOOST_GET_CONST for CPUPlace and GPUPlace

* compile pass 25%.

* compile pass 45%

* compile pass 60%

* remove boost_get for xpu npu mlu and ipu

* compile pass on cpu and gpu.

* fix compile problem

* fix compile error.

* update

* fix ci problem

* update

* ci approve

* fix ci problem

* fix ci eager test problem

* remove BOOST_GET_CONST

* fix npu compile

c48a9ad5

12 1月, 2022 1 次提交

Os info (#38779) · 0d8d1e0e

由 liutiexing 提交于 1月 12, 2022

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* os_info update

* update

* update

* update

* update

* update

* fix

* update

* update for windows

* fix windows

* update

* update
Co-authored-by: Nliutiexing <liutiexing@google.com>

0d8d1e0e

10 1月, 2022 1 次提交

Profiler skeleton (#38826) · a8afed69

由 liutiexing 提交于 1月 10, 2022

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* profiler skeleton

* update

* update

* update
Co-authored-by: Nliutiexing <liutiexing@google.com>

a8afed69

29 12月, 2021 1 次提交

Make profiler better (#38280) · 851637fd

由 liutiexing 提交于 12月 29, 2021

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* update OS info

* split host_event_recorder

* split host_event_recorder

* update

* update

* update

* update

* update

* update

* update
Co-authored-by: Nliutiexing <liutiexing@google.com>

851637fd

20 12月, 2021 1 次提交
- F
  
  [MLU]add mlu backend (#38207) · 76514a1f
  由 fwenguang 提交于 12月 20, 2021
  
  76514a1f
16 12月, 2021 1 次提交

Adapt host event recorder to profiler (#37766) · 5b6be4d7

由 liutiexing 提交于 12月 16, 2021

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* add os_info

* update

* update

* update

* update

* update

* update for bugfix

* update

* update

* update
Co-authored-by: Nliutiexing <liutiexing@google.com>

5b6be4d7

07 12月, 2021 1 次提交
- J
  
  add ipu device p1 (#37841) · c9a3c669
  由 jianghaicheng 提交于 12月 07, 2021
  
  c9a3c669
03 12月, 2021 1 次提交
- R
  refine structure for cuda and rocm (#37202) · a6d2fddb
  由 ronnywang 提交于 12月 03, 2021
```
* refine structure for cuda and rocm

* update

* update

* update

* update
```
  a6d2fddb
27 11月, 2021 1 次提交

[NPU] reorganization for device API abstraction (#37110) · 72241a6a

由 Aganlengzi 提交于 11月 27, 2021

* [NPU] reorganization for device API abstraction

* [NPU] delete old files

* [NPU] fix npu_collective_helper

* [NPU] fix collective_helper

* [NPU] fix ut

* [NPU] mod memory allocation and hccl_helper

* [NPU] fix place_type

* [NPU] split enfoce.h

* move acl* call into npu_info

* merge conflict

* fix merge

* merge conflict

* merge conflict

72241a6a

23 11月, 2021 1 次提交
- Q
  [XPU] Reorganize xpu device codes in platform, test=develop (#37428) · 79800978
  由 Qi Li 提交于 11月 23, 2021
```
* [XPU] Reorganize xpu device codes in platform, test=develop

* fix xpu_header.h, test=develop
```
  79800978
01 11月, 2021 1 次提交

Paddle Tensor Operation Library initial implementation (#34425) · b9fdd3bc

由 Chen Weihang 提交于 11月 01, 2021

* initial tensor design & sign kernel demo

* add move constructor for meta & add lodtensor

* add dirs & sign xpu kernel

* add mean cpu&cuda kernel impl

* move sign & mean xpu & npu kernel

* add selected_rows basic impl

* refactor design, BaseTensor to DenseTensor, etc.

* add scale mkldnn kernel

* polish xpu & npu impl details

* fix mkldnn reuse compile failed

* change tensor operation lib name

* rename util filename

* add more comments

* change TensorImplInterface to TensorInterface

* add kernel key and factory

* remove MKLDNNTensorMeta, add MKLDNNDenseTensor

* change XXDeviceContext to XXContext

* add base kernel registrar utils & test on sign

* replace boost::any by paddle::any

* fix several ci failed

* fix npu compile error

* add ordered map util

* fix multiple ordered_map compile errors

* move dev into include dir

* support sign op in static op run

* fix static op run error

* fix new executor compile failed

* add dygraph branch & remove sign_op.h

* fix test_infer_no_need_buffer_slots

* fix rocm compile link error

* fix unitybuild error & clear glog

* fix npu compile failed

* skip quant trans test

* fix part windows compile problem

* fix xpu enforce error

* fix inference test failed

* remove ordered_map to solve quant failed

* fix part of rcom compile faild

* add more register kernels

* revert scale kernel temporarily

* fix code format error

* add new kernel registrar marco

* rename top to tcmpt

* revert xpu, npu, mkldnn impl & remove op def

* add kernel args parse functor to auto parse args

* revert some change & add scale kernels

* add op proto in dygraph kernelcontext building

* polish kernel dispatch logic & nameing rule

* fix scale kernel match error

* fix scale test failed

* add mean API and unittest

* test mean api success

* add branch to solve compiled error

* skip clang format error

* add mean skip rule in op_library

* add dot kernel, api and unittest (#6)

* remove old kernel and add symbol link

* fix dot compiled failed

* add merco for module declare

* fix npu and xpu compile error

* revert sign, mean, scale, dot kernel removing

* add comment for keeping old kernel impl

* fix mutable_data error

* fix bfloat16 conflit

* fix inference undef error

* adapt to msvc compile rules

* polish comment for template inst

* add cmake template instantiation for win

* fix backend to place device id bug

* fix ifdef error

* Op2functor (#7)

* add kernel args maker class

* make args maker non-const

* remove debug log

* modify codes by review options

* split constructPrKernelContext function

* fix output name bug

* fix test_mean_op test_sign_op failed

* fill_any_like kernel refactor (#10)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* skip dtype for fill_any_like

* add attrs for kernel key constrcut

* add use_pt_kernel Flags to control whether to use pt kernel (#13)

* add use_pt_kernel Flags to control whether to use pt kernel

* change the default value to true for cheking pt kernels

* fix mutable_data cuda place error

* move high level apis into hapi

* remove selectedrows adapting temporarily

* Support Scalar in Tensor Compute Library (#14)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* remove mkldnn tensor & polish details

* use flat_hash_map and small_vector in kernel factory

* Refactor flatten kernel (#12)

* refactor flatten kernel

* update infershape function

* fix compile bugs

* fix bugs when merge

* fix compiler bugs

* fix bugs when run test_flatten_api

* fix bugs when run test

* Revert "use flat_hash_map and small_vector in kernel factory"

This reverts commit 23091495cfdd3df8cc1be592d30f09ea66a7c72b.

* Move cpu, cuda and other device code into kernels (#15)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* start refactor matmul

* move cpu, cuda and other device modules into kernels

* merge code

* polish code in operator.cc

* Perfect unitests (#16)

* perfect unittest

* update license

* replace with flat_hash_map, small_vector (#19)

* fix small_vector build error on windows platform

* replace with flat_hash_map, small_vector

* remove todo

* Perfect unitests (#20)

* perfect unittest

* update license

* fix bug when run tcmpt_utils_test

* refactor execution adapting impl

* fix insert conflit

* Fix CI bug of test_yolov3 (#21)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* start refactor matmul

* move cpu, cuda and other device modules into kernels

* merge code

* polish code in operator.cc

* Fix CI bug of test_yolov3

* add the tensor base class, test=develop (#17)

* update the tensor base class, test=develop

* remove two funcs, test=develop

* update the error msg, test=develop
Co-authored-by: NChen Weihang <chenweihang@baidu.com>

* [no-verify] commit backend and tensor signature changes

* Rename tcmpt to pten (#23)

* rename tcmpt to pten

* update omitted files for rename to pten

* update omitted file for rename to pten

* remove k of all enum var

* remove kernel_instantiate (#26)

* remove symbols and spatial_tensor

* change common to functions

* readd share tensor impl methods

* add a candidate dense tensor class, test=develop (#28)

* change all Pt to Pten

* resolve conflit with xiaowei

* Op2functor opt1 (#27)

* replace to small vector and change to const &

* add std::move
Co-authored-by: NChen Weihang <chenweihang@baidu.com>

* polish kernel factory and kernel registry

* fix operator test error msg mismatch

* remove tensor signature and backend set member

* move scalar and polish enforce

* revert dtype layout change to fix error

* fix enum operator override error

* add several base unittests

* add pten utils tests

* polish some details

* Dev/op2func refactor 3 (#30)

* add a candidate dense tensor class, test=develop

* remove TensorBase::backend(), test=develop

* remove some ops, test=develop

* cherry-pick the pr of tensor meta, test=develop

* moves the dense tensor and some ops, test=develop

* update the linalg operator, test=develop

* update other operators, test=develop

* fix errors, test=develop

* fix bugs, test=develop

* try to resolve the problem of windows ci, test=develop

* updates codes, test=develop

* fix the tensor_utils.cc, test=develop

* modify the dense tensor, test=develop

* fix the data type, test=develop
Co-authored-by: Nshixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

* polish some details

* polish kernel signature details

* fix a bug about offsets of the tensor, test=develop (#31)
Co-authored-by: Nshixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

* polish some details
Co-authored-by: Nchentianyu03 <ctychentianyu@gmail.com>
Co-authored-by: Nzyfncg <1370305206@qq.com>
Co-authored-by: NYuanRisheng <yuanrisheng@baidu.com>
Co-authored-by: N石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>

b9fdd3bc

29 9月, 2021 1 次提交

Add basic support for CUDA Graph (#36190) · 21b93c3d

由 Zeng Jinle 提交于 9月 29, 2021

* add basic support for CUDA Graph

* fix ci compile error

* fix LOG print, fix windows CI

* follow comments and update

* small fix for default ctor

* fix rocm compile error

* fix CPU compile error

21b93c3d

Crayon鑫 / Paddle 与 Fork 源项目一致

Crayon鑫 / Paddle
与 Fork 源项目一致