Commits · 2d6d6fa1ed3e6dca69dc1d3c1bf2248a2743570f · 机器未来 / Paddle

30 Jan, 2022 · 2 commits
- delete FLAGS_run_pten_kernel (#39330) · 2d6d6fa1
  Committed by Leo Chen on Jan 30, 2022
  
  2d6d6fa1
- feat(cncl_mlu): add cncl dev for mlu distributed backend (#39294) · d28f6f7b
  Committed by mhhhh1 on Jan 30, 2022
  
  d28f6f7b
29 Jan, 2022 · 2 commits

Committed by Liu-xiandong on Jan 29, 2022

* Add XPU compiler for paddle, test=develop

* clean code

* clean useless code

* clean useless code

* clean useless code

* test

* add include path

* use clang compiler

* xpu2.cmake

* XPU2 compiler passed

* update

* update after pten

* combination the WITH_XPU and WITH_XPU2

* update the fuse operation in WITH_XPU and WITH_XPU2

* update

* update

* update

* fix the merge error

* update

* update the code

* update the code

* add run_kp_kernel flag

* update

* update

* fix prepared type_ bug

* clean and update the code

* reset the kernel_primitives

* update

* clean the code

* delete useless comment

* fix the bug in WITH_XPU

* update

* update

* modify the abi

* delete some useless code

* Parameter automation in xpu compilation

* Parameter automation in xpu compilation

* delete kps in cmake

* delete useless comment

* clean the code

* clean the code

92da5055

fix kunlun2 softmax unitest bug (#39274) · 23bb2836
Committed by QingshuChen on Jan 29, 2022
```
* fix kunlun2 softmax unitest bug
*test=kunlun

* minor
```
23bb2836

28 Jan, 2022 · 2 commits

Remove macro for GetGpuBasePtr (#39279) · 9a001c09
Committed by From00 on Jan 28, 2022

9a001c09

Host tracer and ProfilerController (#39230) · 7c489c2e

Committed by liutiexing on Jan 28, 2022

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* split template

* Add Profiler and HostTracer

* update

* update

* update

* updateg

* fix cmake
Co-authored-by: liutiexing <liutiexing@google.com>

7c489c2e

27 Jan, 2022 · 4 commits

[PTen]Support AllocateFrom in Tensor and Alloc/HostAlloc in Context (#39022) · 5631da9c

Committed by Aurelius84 on Jan 27, 2022

* Support allocate_from in Tensor and allocate_data in Context

* fix #ifdef CUDA

* fix cycle depends

* fix test_xxx_dev_api failed

* fix windows compiling error

* fix unittest

* modify into PImpl

* fix selected rows

* add TODO comment

* refine interface according reviewer

5631da9c

[MLU] add compile ci scripts for MLU, test=mlu_ci (#39122) · 56410b4a
Committed by Qi Li on Jan 27, 2022

56410b4a

[PluggableDevice] Add custom kernel support based on pten kernel management (#38848) · a8879215

Committed by Aganlengzi on Jan 27, 2022

* [Demo] custom kernel based on pten kernel

* merge and npu custom work well

* del comments

* delete other code

* fix CUDAContext

* fix not found small_vector.h

* support NPU

* fix NPUContext

* fix DeviceContext support

* add UT

* fix call

* add UT

* fix

* fix for comments and ut

* add MACRO control

* fix multi input output

* support env CUSTOM_DEVICE_ROOT

* deal with special cases

* fix for Windows

* try coverage with test_custom_kernel_dot.py

* fix test_custom_kernel_dot

* fix test_custom_kernel_dot

* fix merge

* fix merge

* fix CI

* update

* merge and fix

* remove WITH_CUSTOM_KERNEL

* fix merge

* merge and fix

* fix ut

* fix ut for mac

* add more UT

* add more UT

* fix

a8879215

[NPU] fix aarch64 deps (#39257) · 80dfa010
Committed by Aganlengzi on Jan 27, 2022

80dfa010

26 Jan, 2022 · 4 commits

[pten] remove deprecated fluid op kernel for pten (#38842) · 3ab9aef1

Committed by Leo Chen on Jan 26, 2022

* update cmake file to remove fluid kernel

* add pten declaration.h to where pybind.h used

* fix sync_bn and tensorrt_engine

* refine detection_library

* fix interpreter_core

* support eager legacy

* fit eager legacy for pten

* fall back to cpu if not found kernel

* fix compile problem

* fix compile problem

* refine fallback logic

* fit operator.run()

* fix xpu compile

* fit for new_exec

* add REGISTER_OP_WITHOUT_GRADIENT

* un-cache pt_kernel_context

* fix compile

* fix cudnn

* fix compiling with on_infer

* fix mkldnn

* fix isfinite_v2

* fix xpu problem

* fix op_device

* refine fallback for xpu

* fix xpu compile

* merge develop

* refine code format

* fix compile

* fix compile

* add data_transfer

* fix PreparePtenData

* fix cpu context

* merge develop

* fix compile

* fix error device context

* fix xpu

* fix dev_ctx

3ab9aef1

[IPU] sync misc changes 01 (#38876) · 4efbebea

Committed by Allen Guo on Jan 26, 2022

* sync misc changes

* apply comments 01

* fix compile error

* remove is_ipu_place check

* add authors
Co-authored-by: Xiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: Allen Guo <alleng@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Haicheng Jiang <haichengj@graphcore.ai>
Co-authored-by: Han Zhao <hanzhao@graphcore.ai>

* sync changes

* restore cmake

* update ir cmake and setup.py

* update inference_lib cmake

* split PR
Co-authored-by: Xiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Haicheng Jiang <haichengj@graphcore.ai>
Co-authored-by: Han Zhao <hanzhao@graphcore.ai>

4efbebea

[PTEN] cpu_context add eigen deps (#39234) · bd5c962d
Committed by Wilber on Jan 26, 2022
```
* add eigen deps

* update
```
bd5c962d

add sigmoid cross entropy with logits to kl2 (#38915) · fd44de58

Committed by houj04 on Jan 26, 2022

* add sigmoid cross entropy with logits to kl2. test=kunlun

* add sigmoid cross entropy with logits to kl2. test=kunlun

* follow comments. test=kunlun

fd44de58

25 Jan, 2022 · 5 commits

add trace event data structure definition (#39109) · 57b2033b

Committed by chenjian on Jan 25, 2022

* add trace event data structure definition

* convert enum item to string for cupti enum explaination

* modify paddle_enforce_eq description

57b2033b

[MLU]add mlu kernel for split and concat (#39020) · ac3dc0bb
Committed by joeqiao12 on Jan 25, 2022
```
* [MLU]add mlu kernel for concat and split op

* delete device_context DEPS
```
ac3dc0bb

Optimize nearest_interp forward (#38528) · 232bbce2

Committed by Lijunhui on Jan 25, 2022

* init commit

* remove comments

* remove nchw branch

* optimize code

* apply fast div mod in 1D kernel, rm 3D kernel

* move init of FastDivMode to CPU

* 3D kernel for nchw, FastDiv for 1D kernel

* debug done. process boundary

* 2^n

* optimize

* optimize

* change code & optimize code

232bbce2

[PTEN] Add xpu context. (#39098) · c1e5a393
Committed by Wilber on Jan 25, 2022

c1e5a393

[PTen] Migrate string tinyformat errors and part of enforce into pten (#39051) · 6ca49164

Committed by xiongkun on Jan 25, 2022

* transfer: string tinyformat errors and part of enforce into pten

* remove comment

* fix by code review

* assert is not compile in -DNDEBUG

* add string as dependences of paddle_inference

6ca49164

24 Jan, 2022 · 3 commits

Remved redundant defintions of likely/unlikely (#38911) · 43919d0a

Committed by Jacek Czaja on Jan 24, 2022

* - more unlikely

* - compilation fix

* - removed redundant definition

* - fix

* - Fixes

* - compilation fix for windows

43919d0a

[Pten] Migration of eigen numeric extensions and functors in paddle/fluid/operatos/eigen (#39124) · a1e40dc6

Committed by Feiyu Chan on Jan 24, 2022

* migration of functors in paddle/fluid/operators/eigen and paddle/fluid/platform/eigen_ext.h
* update path of data types like float16.h in includes in extensions.h

a1e40dc6

[PTEN] Move dynload from fluid to pten. (#39120) · 3c1dc6f6

Committed by Wilber on Jan 24, 2022

* move dynload from fluid to pten.

* fix ci compile

* fix windows ci compile.

* update

* update

* fix compile error

3c1dc6f6

21 Jan, 2022 · 4 commits
- refactor unittest for kunlun (#38772) · 4f1fef60
  Committed by TTerror on Jan 21, 2022
```
* refactor unittests for kunlun

* refactor unittests for kunlun, test=kunlun
```
  4f1fef60
- [PTen]Migrate Dim and DDim from paddle::framework into pten namespace (#39053) · 4e23ba32
  Committed by Aurelius84 on Jan 21, 2022
```
* Migrate Dim and DDim from paddle::framework into pten namespace

* fix paddle::framework::Array

* fix framework::Array
```
  4e23ba32
- fix npu c_allgather int64 (#39099) · 89f903da
  Committed by ronnywang on Jan 21, 2022
  
  89f903da
- [PTEN] Add cpu context (#38979) · 064bc4b8
  Committed by Wilber on Jan 21, 2022
```
* add cpu_context.

* update

* update

* update

* update

* update

* fix ci problem

* fix npu ci problem

* update

* fix ci compile
```
  064bc4b8
20 Jan, 2022 · 2 commits
- fix device_context place print (#39062) · 3dd7f353
  Committed by sneaxiy on Jan 20, 2022
  
  3dd7f353
- [Pten] Migrate bfloat16/float16/complex from paddle::platform into pten::common (#39044) · f1143f0c
  Committed by Aurelius84 on Jan 20, 2022
```
* Migrate bfloat16/float16/complex from platform into pten::common

* fix typo

* fix code style
```
  f1143f0c
19 Jan, 2022 · 1 commit
- Add conv2d_transpose and conv2d_transpose_grad for XPU,test=kunlun (#38956) · c7de7440
  Committed by zhangyikun02 on Jan 19, 2022
  
  c7de7440
18 Jan, 2022 · 3 commits

change CUDA implementaion of uniform/gaussian OP (#38611) · bbbd75e4
Committed by zhouweiwei2014 on Jan 18, 2022
```
* change CUDA implementaion of uniform/gaussian OP

* fix unittest
```
bbbd75e4

[Unify Tensors PR #8] Merged Tensor into DenseTensor, test=allcases (#38914) · 2052f1e3

Committed by Zhanlue Yang on Jan 18, 2022

* Merged LoDTensor with Tensor,test=allcases

* Patched python level LoDTensor

* Patched python level LoDTensor

* Merge Tensor into DenseTensor

* Fixed namespace issues,test=allcases

* Fixed merge issues

* Fixed inference issues

* Fixed NPU test issues

* Fixed merge issues

2052f1e3

Speedup FP16 Gelu op using fast math and vectorized 8 kernel (#38980) · 8c20d668
Committed by sneaxiy on Jan 18, 2022
```
* speedup gelu using fast math

* add bwd part
```
8c20d668

17 Jan, 2022 · 4 commits

fix for conv2D training error (#38938) · 944ea436
Committed by jakpiase on Jan 17, 2022

944ea436

update ipu_executor, remove ipu_optimizer (#38986) · 05c98ec7

Committed by Allen Guo on Jan 17, 2022

Co-authored-by: Xiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: Allen Guo <alleng@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Haicheng Jiang <haichengj@graphcore.ai>
Co-authored-by: Han Zhao <hanzhao@graphcore.ai>
Co-authored-by: Xiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Haicheng Jiang <haichengj@graphcore.ai>
Co-authored-by: Han Zhao <hanzhao@graphcore.ai>

05c98ec7

[IPU] update ipu_backend p0 (#38854) · b2aee3e3

Committed by Allen Guo on Jan 17, 2022

* update ipu_backend

* sync with paddle internal
Co-authored-by: Xiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: Allen Guo <alleng@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Haicheng Jiang <haichengj@graphcore.ai>
Co-authored-by: Han Zhao <hanzhao@graphcore.ai>

* apply comments 01

* update error messag

* restore ipu_executor and ipu_optimizer

* add clang-format on
Co-authored-by: Xiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Haicheng Jiang <haichengj@graphcore.ai>
Co-authored-by: Han Zhao <hanzhao@graphcore.ai>

b2aee3e3

[Pten] Replace platform::Place to pten::Place. (#38899) · c48a9ad5

Committed by Wilber on Jan 17, 2022

* add pten::Place data structure.

* update ci problem

* fix ci problem

* update

* using platform::Place=pten::Place

* remove BOOST_GET_CONST for CPUPlace and GPUPlace

* compile pass 25%.

* compile pass 45%

* compile pass 60%

* remove boost_get for xpu npu mlu and ipu

* compile pass on cpu and gpu.

* fix compile problem

* fix compile error.

* update

* fix ci problem

* update

* ci approve

* fix ci problem

* fix ci eager test problem

* remove BOOST_GET_CONST

* fix npu compile

c48a9ad5

15 Jan, 2022 · 1 commit

[Unify Tensors PR ] Merged LoDTensor with Tensor, test=allcases (#38880) · 88966b28

Committed by Zhanlue Yang on Jan 15, 2022

* Merged LoDTensor with Tensor,test=allcases

* Patched python level LoDTensor

* Fixed example code failure

* Polished function names, removed duplicated forward declarations

88966b28

14 Jan, 2022 · 1 commit

[XPU]add stack_grad op for kunlun2,*test=kunlun (#38674) · 87ee3e4f

Committed by Zhangjingyu06 on Jan 14, 2022

* [XPU]add split op for kunlun2,*test=kunlun

* [XPU]add split op for kunlun2,*test=kunlun

* [XPU]add split op for kunlun,*test=kunlun

* [XPU]add stack_grad op for kunlun2,*test=kunlun
Co-authored-by: QingshuChen <chenqingshu@baidu.com>

87ee3e4f

13 Jan, 2022 · 2 commits

Added mul BF16/FP32 FWD/BWD oneDNN kernel (#38552) · fc6eed5b

Committed by jakpiase on Jan 13, 2022

* base changes for mul reimplementation

* empty commit

* tmp save

* full implementation of mul bf16/fp32 fwd bwd

* CI fix

* CI rerun

* changed unity build cmake to avoid gpu issues

* removed mul mkldnn from unity build

* added skipping tests if not cpu_bf16

* CI fix

* CI fix

* CI fix

fc6eed5b

splits allocation for pten, test=develop (#38853) · 277cf900
Committed by 石晓伟 on Jan 13, 2022

277cf900
