Commits · 1d6fd81dbbdf63cf702ddb265208118f012e8f78 · PaddlePaddle / Paddle

29 Jan, 2022 1 commit

Committed by Liu-xiandong on Jan 29, 2022

* Add XPU compiler for paddle, test=develop

* clean code

* clean useless code

* clean useless code

* clean useless code

* test

* add include path

* use clang compiler

* xpu2.cmake

* XPU2 compiler passed

* update

* update after pten

* combine WITH_XPU and WITH_XPU2

* update the fuse operation in WITH_XPU and WITH_XPU2

* update

* update

* update

* fix the merge error

* update

* update the code

* update the code

* add run_kp_kernel flag

* update

* update

* fix prepared type_ bug

* clean and update the code

* reset the kernel_primitives

* update

* clean the code

* delete useless comment

* fix the bug in WITH_XPU

* update

* update

* modify the abi

* delete some useless code

* Parameter automation in xpu compilation

* Parameter automation in xpu compilation

* delete kps in cmake

* delete useless comment

* clean the code

* clean the code

92da5055

01 Nov, 2021 1 commit

Paddle Tensor Operation Library initial implementation (#34425) · b9fdd3bc

Committed by Chen Weihang on Nov 01, 2021

* initial tensor design & sign kernel demo

* add move constructor for meta & add lodtensor

* add dirs & sign xpu kernel

* add mean cpu&cuda kernel impl

* move sign & mean xpu & npu kernel

* add selected_rows basic impl

* refactor design, BaseTensor to DenseTensor, etc.

* add scale mkldnn kernel

* polish xpu & npu impl details

* fix mkldnn reuse compile failed

* change tensor operation lib name

* rename util filename

* add more comments

* change TensorImplInterface to TensorInterface

* add kernel key and factory

* remove MKLDNNTensorMeta, add MKLDNNDenseTensor

* change XXDeviceContext to XXContext

* add base kernel registrar utils & test on sign

* replace boost::any by paddle::any

* fix several ci failed

* fix npu compile error

* add ordered map util

* fix multiple ordered_map compile errors

* move dev into include dir

* support sign op in static op run

* fix static op run error

* fix new executor compile failed

* add dygraph branch & remove sign_op.h

* fix test_infer_no_need_buffer_slots

* fix rocm compile link error

* fix unitybuild error & clear glog

* fix npu compile failed

* skip quant trans test

* fix part windows compile problem

* fix xpu enforce error

* fix inference test failed

* remove ordered_map to solve quant failed

* fix part of rocm compile failed

* add more register kernels

* revert scale kernel temporarily

* fix code format error

* add new kernel registrar macro

* rename top to tcmpt

* revert xpu, npu, mkldnn impl & remove op def

* add kernel args parse functor to auto parse args

* revert some change & add scale kernels

* add op proto in dygraph kernelcontext building

* polish kernel dispatch logic & naming rule

* fix scale kernel match error

* fix scale test failed

* add mean API and unittest

* test mean api success

* add branch to solve compiled error

* skip clang format error

* add mean skip rule in op_library

* add dot kernel, api and unittest (#6)

* remove old kernel and add symbol link

* fix dot compiled failed

* add macro for module declare

* fix npu and xpu compile error

* revert sign, mean, scale, dot kernel removing

* add comment for keeping old kernel impl

* fix mutable_data error

* fix bfloat16 conflict

* fix inference undef error

* adapt to msvc compile rules

* polish comment for template inst

* add cmake template instantiation for win

* fix backend to place device id bug

* fix ifdef error

* Op2functor (#7)

* add kernel args maker class

* make args maker non-const

* remove debug log

* modify codes by review options

* split constructPrKernelContext function

* fix output name bug

* fix test_mean_op test_sign_op failed

* fill_any_like kernel refactor (#10)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* skip dtype for fill_any_like

* add attrs for kernel key construct

* add use_pt_kernel Flags to control whether to use pt kernel (#13)

* add use_pt_kernel Flags to control whether to use pt kernel

* change the default value to true for checking pt kernels

* fix mutable_data cuda place error

* move high level apis into hapi

* remove selectedrows adapting temporarily

* Support Scalar in Tensor Compute Library (#14)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* remove mkldnn tensor & polish details

* use flat_hash_map and small_vector in kernel factory

* Refactor flatten kernel (#12)

* refactor flatten kernel

* update infershape function

* fix compile bugs

* fix bugs when merge

* fix compiler bugs

* fix bugs when run test_flatten_api

* fix bugs when run test

* Revert "use flat_hash_map and small_vector in kernel factory"

This reverts commit 23091495cfdd3df8cc1be592d30f09ea66a7c72b.

* Move cpu, cuda and other device code into kernels (#15)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* start refactor matmul

* move cpu, cuda and other device modules into kernels

* merge code

* polish code in operator.cc

* Perfect unittests (#16)

* perfect unittest

* update license

* replace with flat_hash_map, small_vector (#19)

* fix small_vector build error on windows platform

* replace with flat_hash_map, small_vector

* remove todo

* Perfect unittests (#20)

* perfect unittest

* update license

* fix bug when run tcmpt_utils_test

* refactor execution adapting impl

* fix insert conflict

* Fix CI bug of test_yolov3 (#21)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* start refactor matmul

* move cpu, cuda and other device modules into kernels

* merge code

* polish code in operator.cc

* Fix CI bug of test_yolov3

* add the tensor base class, test=develop (#17)

* update the tensor base class, test=develop

* remove two funcs, test=develop

* update the error msg, test=develop
Co-authored-by: Chen Weihang <chenweihang@baidu.com>

* [no-verify] commit backend and tensor signature changes

* Rename tcmpt to pten (#23)

* rename tcmpt to pten

* update omitted files for rename to pten

* update omitted file for rename to pten

* remove k of all enum var

* remove kernel_instantiate (#26)

* remove symbols and spatial_tensor

* change common to functions

* readd share tensor impl methods

* add a candidate dense tensor class, test=develop (#28)

* change all Pt to Pten

* resolve conflict with xiaowei

* Op2functor opt1 (#27)

* replace to small vector and change to const &

* add std::move
Co-authored-by: Chen Weihang <chenweihang@baidu.com>

* polish kernel factory and kernel registry

* fix operator test error msg mismatch

* remove tensor signature and backend set member

* move scalar and polish enforce

* revert dtype layout change to fix error

* fix enum operator override error

* add several base unittests

* add pten utils tests

* polish some details

* Dev/op2func refactor 3 (#30)

* add a candidate dense tensor class, test=develop

* remove TensorBase::backend(), test=develop

* remove some ops, test=develop

* cherry-pick the pr of tensor meta, test=develop

* moves the dense tensor and some ops, test=develop

* update the linalg operator, test=develop

* update other operators, test=develop

* fix errors, test=develop

* fix bugs, test=develop

* try to resolve the problem of windows ci, test=develop

* updates codes, test=develop

* fix the tensor_utils.cc, test=develop

* modify the dense tensor, test=develop

* fix the data type, test=develop
Co-authored-by: shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

* polish some details

* polish kernel signature details

* fix a bug about offsets of the tensor, test=develop (#31)
Co-authored-by: shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

* polish some details
Co-authored-by: chentianyu03 <ctychentianyu@gmail.com>
Co-authored-by: zyfncg <1370305206@qq.com>
Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>

b9fdd3bc

31 Aug, 2021 1 commit
- fix CI skip cc test error (#35264) · 3d76d003
  Committed by wuhuanzhou on Aug 31, 2021
```
* fix CI skip cc test error, test=develop

* remove test code, test=develop
```
  3d76d003
29 Jul, 2021 1 commit
- Improve sccache hit rate and avoid absolute path (#34435) · 92d8fed8
  Committed by zhouweiwei2014 on Jul 29, 2021
  
  92d8fed8
26 May, 2021 1 commit
- Fix ninja compilation bug and warning on windows (#32987) · accf284b
  Committed by Zhou Wei on May 26, 2021
```
* fix ninja compilation bug on windows

* polish windows ci

* polish windows ci
```
  accf284b
19 May, 2021 1 commit

CI skip inference test if only python files modified (#32962) · 7896b51a

Committed by wuhuanzhou on May 19, 2021

* CI skip inference test if only python files modified, test=develop

* fix compilation error on ROCM, test=develop

* fix cmake error on PR-CI-ROCM-Compile, test=develop

7896b51a
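
The rule named in this commit title (skip a test stage when only Python files changed) can be sketched as a small predicate. This is an illustrative sketch only: the function name `only_python_changed` and the example paths are hypothetical, and the real check lives in the project's CI scripts, which are not shown on this page.

```python
from typing import Iterable


def only_python_changed(changed_files: Iterable[str]) -> bool:
    """Return True when the change set is non-empty and every path is a .py file.

    Hypothetical helper for illustration; not the project's actual CI logic.
    """
    # Ignore blank entries, which often appear when splitting `git diff` output.
    files = [f.strip() for f in changed_files if f.strip()]
    # An empty change set is treated as "not Python-only" so nothing is skipped by accident.
    return bool(files) and all(f.endswith(".py") for f in files)
```

Treating an empty list as `False` is a deliberately conservative choice: when the change set cannot be determined, the full test run still happens.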

21 Apr, 2021 1 commit

[NPU] Merge NPU ccl code (#32381) · c3158527

Committed by zhang wenhui on Apr 21, 2021

* add allreduce and broadcast without test (#31024)

add allreduce and broadcast without test

* Refactor HCCLCommContext to be compatible with Paddle (#31359)

Refactor HCCLCommContext to be compatible with Paddle (#31359)

* [NPU] add npu kernel for communication op (#31437)

* add allreduce and broadcast without test

* add c_broadcast_test case

* build c_comm_init and c_create_group operators

* make the whole thing compile

* add broadcast and init op test case but run failed

* make unit test compile

* fix broadcast test bug and change into hcom for ccl

* change c_comm_init and c_create_group ops accordingly

* make tests compile

* transfer code to 27

* compiled successfully in 28, but run failed

* test broadcast in 28, but failed

* make hcom primitives work

* change hccl data type for base.h

* fix broadcast bug

* make attributes work

* fix group name bug

* add allreduce but test failed

* allreduce bug for qiuliang

* allreduce finished

* add allgather and reducescatter

* merge all op code

* add allgather test

* finish run all ccl op test exclude send/recv

* all all op and test exclude send/recv

* send_v2_npu.cc recv_v2_npiu.cc compiled

* fix ccl core dump bug and test allgather, reducescatter, broadcast op

* fix allreduce bug just for test

* hcom send&recv test pass, without hcom_destroy

* for qiuliang test

* Ascend Send&Recv Test Pass

* all op (ex send/recv) ok

* fix bug

* merge all ccl op

* style merge to PaddlePaddle

* merge style

* new merge style

* merge style 2

* insert an empty at the end

* disable ctest for hcom to pass ci
Co-authored-by: void-main <voidmain1313113@gmail.com>
Co-authored-by: f2hkop <f2huestc@outlook.com>

* Add auto-increasing tag id for Hcom OPs (#31702)

* add c_reduce_sum op (#31793)

add c_reduce_sum op

* update Ascendrc hccl to 20.3 (#32126)

update Ascendrc hccl to 20.3 (#32126)

* fix merge code

* change cmake.txt1

* [NPU] Support npu kernel for c sync stream op (#31386)

* sync stream npu op

* add with_ascend_acl

* update c++ unittest

* compile all failed

* try to pre commit

* after pre commit

* merge&compile&test hccl successfully!

* fix code style

* fix code style

* fix bugs about hccl

* fix some bugs

* fix code style

* fix style

* fix style

* fix

* fixed

* merge develop
Co-authored-by: lw921014 <liuwei921014@yeah.net>
Co-authored-by: Void Main <voidmain1313113@gmail.com>
Co-authored-by: f2hkop <f2huestc@outlook.com>
Co-authored-by: xiayanming <41795079@qq.com>

c3158527

23 Mar, 2021 1 commit
- Update windows compiler and CI from VS2015 to VS2017 (#31652) · a70de87d
  Committed by Zhou Wei on Mar 23, 2021
```
* modify windows CI to VS2017

* modify windows CI to VS2017

* modify windows CI to VS2017
```
  a70de87d
04 Mar, 2021 2 commits
- fix python full coverage decrease issue (#31429) · 62289fcc
  Committed by YUNSHEN XIE on Mar 04, 2021
```
* fix python full coverage decrease issue

* fix
```
  62289fcc
- Windows system supports Ninja compilation (#31161) · 4d6d2db8
  Committed by wuhuanzhou on Mar 04, 2021
  
  4d6d2db8
03 Mar, 2021 1 commit
- compile with VS2017, test=develop (#31388) · c1bc2236
  Committed by wuhuanzhou on Mar 03, 2021
  
  c1bc2236
23 Feb, 2021 1 commit
- fix UNIX cmake problem (#31113) · 44ee251f
  Committed by Zhou Wei on Feb 23, 2021
  
  44ee251f
21 Jan, 2021 1 commit
- [ROCM] update cmake and dockerfile, test=develop (#30598) · 1f5841c2
  Committed by Qi Li on Jan 21, 2021
  
  1f5841c2
18 Jan, 2021 1 commit
- if pybind.cc changed, generate total report, test=develop (#30514) · bd971922
  Committed by wanghuancoder on Jan 18, 2021
  
  bd971922
24 Dec, 2020 2 commits

if PR have no .py files, do not use 'python coverage run', to speedup unit test (#29739) · 26f9ab70

Committed by wanghuancoder on Dec 24, 2020

* reopen python coverage --include for test, test=develop

* if no .py file modified, not use coverage run, test=develop

* remove test code, test=develop

* add WITH_INCREMENTAL_COVERAGE, test=develop

* refine if else, test=develop

26f9ab70

[Feature] one ps (3/4) (#29604) · 032414ca

Committed by tangwei12 on Dec 24, 2020

* oneps (3/4)
Co-authored-by: MrChengmo <cmchengmo@163.com>
Co-authored-by: malin10 <malin10@baidu.com>
Co-authored-by: chengmo <chengmo@baidu.com>

032414ca

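16 Dec, 2020 1 commit

Add ROCm platform support code (#29342) · 76738504

Committed by Y_Xuan on Dec 16, 2020

* Add ROCm platform support code

* Fix some issues

* Resolve some ambiguities and add comments

* Fix code formatting

* Update code after resolving conflicts

* Modify operators.cmake

* Fix formatting

* Fix errors

* Unify interfaces

* Update dates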

76738504

15 Dec, 2020 1 commit

New UT should not exceed 15s (#29492) · 2926e743

Committed by YUNSHEN XIE on Dec 15, 2020

* added UT should not exceed 15s

* fix error

* UT limit of 15s is the first to be executed

* fix error

* fix error with CI_SKIP_CPP_TEST

* modified timeout setting

* fix error

2926e743

11 Dec, 2020 1 commit

Add the strategy of skipping cc/cu test compilation and execution in CI (#29499) · b5d4a1f3

Committed by LoveAn on Dec 11, 2020

* Add the strategy of skipping cc/cu test compilation and execution in CI, test=develop

* fix if error with CI_SKIP_TEST, test=develop

* fix add properties to test error on Linux/MAC, test=develop

* fix set test properties of test_code_generator error, test=develop

* remove test codes and advance judgment of file modification on Linux, test=develop

* rename CI_SKIP_TEST to CI_SKIP_CPP_TEST, test=document_fix

* Add branch judgement on Linux, test=develop

b5d4a1f3

03 Dec, 2020 1 commit
- add coverage incremental switch, test=develop (#29290) · 3765da98
  Committed by wanghuancoder on Dec 03, 2020
  
  3765da98
01 Dec, 2020 1 commit
- revert python file coverage, delete coverage run --include, test=develop (#29230) · 2b2cd186
  Committed by wanghuancoder on Dec 01, 2020
  
  2b2cd186
30 Nov, 2020 1 commit

Generate code coverage reports only for incremental files (#28508) · 0239f796

Committed by wanghuancoder on Nov 30, 2020

* Generate code coverage reports only for incremental files, test=develop

* Generate code coverage reports only for incremental files, test=develop

* Generate code coverage reports only for incremental files, test=develop

* test for diff python file, test=develop

* fix no python diff report, test=develop

* add cc test file, test=develop

* fix bug in generic.cmake, test=develop

* for debug no cc report, test=develp

* modify compare branch from test_pr to test, test=develop

* fix bug, test=develop

* test for h file changed, test=develop

* debug for redefinition of argument optimize error, test=develop

* close -o3 for test, test=develop

* remove -o3 for test, test=develop

* remove coverage option for nvcc, test=develop

* use CMAKE_CXX_FLAGS open coverage option when header file changed, test=develop

* reopen -o3, test=develop

* remove debug code, test=develop

* remove unused code, test=develop

0239f796

27 Nov, 2020 1 commit
- fix CUDA 11 error on windows (#29101) · e668cb07
  Committed by Zhou Wei on Nov 27, 2020
  
  e668cb07
24 Nov, 2020 1 commit
- restore timeout value (#29027) · 5cb8e17a
  Committed by YUNSHEN XIE on Nov 24, 2020
  
  5cb8e17a
09 Nov, 2020 2 commits
- modified timeout value on windows (#28499) · d3b2d07d
  Committed by YUNSHEN XIE on Nov 09, 2020
```
* modified timeout value on windows

* fix some error
```
  d3b2d07d
- exec ut no more than 15s 2 (#28441) · 72c78e4d
  Committed by YUNSHEN XIE on Nov 09, 2020
```
* exec ut no more than 15s 2

* fix for ut test_inplace_addto_strategy timeout
```
  72c78e4d
24 Sep, 2020 2 commits
- add unittest count, install check on windows (#27492) · d20349b5
  Committed by Zhou Wei on Sep 24, 2020
```
* add unittest count of windows

* Reduce the number of retries
```
  d20349b5
- windows lib size crop from 5.4G to 3.9G (#27477) · ec4155d7
  Committed by Wilber on Sep 24, 2020
  
  ec4155d7
15 Sep, 2020 1 commit
- Set timeout value on windows and mac (#27197) · cb34cf18
  Committed by chalsliu on Sep 15, 2020
  
  cb34cf18
26 Aug, 2020 1 commit
- modified timeout value on windows and mac (#26690) · ada1e129
  Committed by YUNSHEN XIE on Aug 26, 2020
  
  ada1e129
24 Aug, 2020 1 commit

find timeout unittests (#26371) · 39fe0d35

Committed by YUNSHEN XIE on Aug 24, 2020

* find timeout unittests

* setting timeout value

* fix some error

* fix some error

* fix some error

* fix no newline of end file error

39fe0d35

29 Jul, 2020 1 commit
- fix random compile failure due to missing file (#25661) · e0a9115e
  Committed by Zhou Wei on Jul 29, 2020
  
  e0a9115e
02 Jul, 2020 1 commit
- Encryption infer (#25119) · 3b8f0a64
  Committed by MRXLT on Jul 02, 2020
```
* add encrypt api for inference lib
```
  3b8f0a64
23 Jun, 2020 1 commit
- generate dummy file using cmake configure_file function to avoid re-generating it. (#25161) · f8d5fd6f
  Committed by Shibo Tao on Jun 23, 2020
```
* generate dummy file using cmake configure_file function to avoid re-generating it. test=develop

* add cmake/dummy.c.in. test=develop
```
  f8d5fd6f
21 Jun, 2020 1 commit

don't re-generate header file if content doesn't change (#25130) · 19c4db1b

Committed by Shibo Tao on Jun 21, 2020

* don't re-generate header file if content doesn't change. test=develop

* add copy_if_different function. test=develop

19c4db1b
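
The idea behind this commit (leave a generated file untouched when its content is unchanged, so timestamp-based build tools do not rebuild its dependents) can be sketched in a few lines. This is only an illustration under stated assumptions: the helper name `write_if_different` is hypothetical, and the commit's own `copy_if_different` function belongs to the project's build scripts, which are not reproduced on this page.

```python
def write_if_different(path: str, content: str) -> bool:
    """Write content to path only when it differs from what is already there.

    Returns True if the file was (re)written, False if it was left untouched.
    Hypothetical helper for illustration; not taken from the project's build scripts.
    """
    try:
        with open(path, "r", encoding="utf-8") as f:
            if f.read() == content:
                return False  # identical content: keep the existing modification time
    except FileNotFoundError:
        pass  # no previous file, so fall through and create it

    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return True
```

Because an unchanged file keeps its old modification time, targets that depend on it are not considered stale on the next build.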

05 Jun, 2020 1 commit

Builtin cuda (#24904) · 211ef78c

Committed by T8T9 on Jun 05, 2020

* support CUDA using cmake built-in way (#24395)

* support CUDA using cmake built-in way. test=develop

* test=develop

* cmake_minimum_required 3.10

* test=develop

211ef78c

01 Jun, 2020 1 commit
- [Inference] [unittest] Inference unit tests rely on dynamic libraries (#24743) · f8e370ac
  Committed by Wilber on Jun 01, 2020
  
  f8e370ac
13 May, 2020 1 commit
- Revert "support CUDA using cmake built-in way (#24395). test=develop" (#24468) · 30efee33
  Committed by Shibo Tao on May 13, 2020
```
This reverts commit 068d3690.
```
  30efee33
12 May, 2020 1 commit
- support CUDA using cmake built-in way (#24395) · 068d3690
  Committed by Shibo Tao on May 12, 2020
```
* support CUDA using cmake built-in way. test=develop

* test=develop
```
  068d3690
14 Jan, 2020 1 commit
- faster build by reduce by-product, reduce linking library and fix compile... · 549e6de7
  Committed by zhouwei25 on Jan 14, 2020
```
faster build by reduce by-product, reduce linking library and fix compile warning of std=c++11 (#22164)
```
  549e6de7

PaddlePaddle / Paddle synced successfully about 1 year ago
