Commits · 463e31f4b3d64361f8fdbf57e6f2a918c2d1f2cc · Crayon鑫 / Paddle

14 Feb, 2022 · 2 commits

context add generator (#39475) · 463e31f4
Committed by Wilber on Feb 14, 2022
```
* context add generator

* update
```
463e31f4

Committed by liutiexing on Feb 14, 2022

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* add log for Executor

* Add CudaTracer to trace CUDA events
Co-authored-by: liutiexing <liutiexing@google.com>

0790f949

11 Feb, 2022 · 5 commits

Fix add profiler node tree implementation cmake error (#39474) · 739da6cb

Committed by chenjian on Feb 11, 2022

* add event node implementation

* modify profiler.stop interface

* fix according to review

* fix file mode

* modify class method name in event_node.cc

* modify LLONG_MAX to ULLONG_MAX

* fix ci error

* fix ci error

* fix dependency error

739da6cb

Added shape (U)INT8/BF16/FP32 oneDNN kernel (#36033) · 52bbaae9

Committed by jakpiase on Feb 11, 2022

* added shape oneDNN kernel

* removed unnecessary import from test

* added skipping tests for GPU

* refactoring

* refactored shape kernel

* added tests in new framework

* removed one line

* minor change

* added newline at EOF

* added formatting

* added attributes as extra

52bbaae9

[MLU]support c_gen_cncl_id_op run on MLU device (#39336) · 89aa8b1a
Committed by zn on Feb 11, 2022
```
Co-authored-by: zhangna <zhangna@cambricon.com>
```
89aa8b1a

Add profiler node tree implementation (#39316) · f38c2e5c

Committed by chenjian on Feb 11, 2022

* add event node implementation

* modify profiler.stop interface

* fix according to review

* fix file mode

* modify class method name in event_node.cc

* modify LLONG_MAX to ULLONG_MAX

* fix ci error

* fix ci error

f38c2e5c

Support different dtypes of inputs for elementwise ops (#38859) · bf305033
Committed by Zhang Ting on Feb 11, 2022
```
* improve backward performance

* support different dtypes for elementwise ops
```
bf305033

09 Feb, 2022 · 2 commits
- move stream into pten (#39392) · 266955a9
  Committed by Chen Weihang on Feb 09, 2022
  
  266955a9
- [MLU]fix compile and add cncl (#39394) · a7d08db9
  Committed by qipengh on Feb 09, 2022
  
  a7d08db9
08 Feb, 2022 · 5 commits

Fix to #38126 (#39097) · f884edb9

Committed by Jacek Czaja on Feb 08, 2022

* - 38126 potential fix

* - fix

* - build fix

* - another candidate fix

* - compilation fix

* - another fix

* - Fix to activation of NHWC being first oneDNN op in chain on oneDNN ops

* - compilation fix

* - added NHWC reotating for elementwise being first op

* - compilation fix

* - compilation fix

* - Added UT

* - cosmetic fixes

f884edb9

[PTEN] Update gpu_context. (#39359) · 24103cbb
Committed by Wilber on Feb 08, 2022
```
* gpu_context..

* update

* update

* update
```
24103cbb

Replace clip, bce_loss, full and full_like with elementwise (#39197) · 424700ff
Committed by niuliling123 on Feb 08, 2022
```
* Replace clip, bce_loss, full and full_like with elementwise
```
424700ff

Support allocate CUDA managed memory (#39075) · 42910361

Committed by From00 on Feb 08, 2022

* Rough implementation for experiment

* Support allocate cuda managed memory

* Fix CI error

* Modify UT

* Check whether support memory oversubscription

* Fix ROCM Compile error

* Fix ROCM Compile error

* Fix UT cuda_managed_memory_test

* Set UT timeout to 40

* Add UT OOMExceptionTest

* Set UT timeout to 50

42910361

[bf16] change bf16 print behavior (#39370) · 96964ff8
Committed by Leo Chen on Feb 08, 2022

96964ff8

07 Feb, 2022 · 1 commit
- add sequence_conv op in xpu place (#39025) · fee4316d
  Committed by tanzhipeng on Feb 07, 2022
  
  fee4316d
06 Feb, 2022 · 1 commit
- [PTEN] Add Gpu context (#39305) · a821c4a9
  Committed by Wilber on Feb 06, 2022
  
  a821c4a9
30 Jan, 2022 · 2 commits
- delete FLAGS_run_pten_kernel (#39330) · 2d6d6fa1
  Committed by Leo Chen on Jan 30, 2022
  
  2d6d6fa1
- feat(cncl_mlu): add cncl dev for mlu distributed backend (#39294) · d28f6f7b
  Committed by mhhhh1 on Jan 30, 2022
  
  d28f6f7b
29 Jan, 2022 · 2 commits

Add xpu2 compiler (#37254) · 92da5055

Committed by Liu-xiandong on Jan 29, 2022

* Add XPU compiler for paddle, test=develop

* clean code

* clean useless code

* clean useless code

* clean useless code

* test

* add include path

* use clang compiler

* xpu2.cmake

* XPU2 compiler passed

* update

* update after pten

* combination the WITH_XPU and WITH_XPU2

* update the fuse operation in WITH_XPU and WITH_XPU2

* update

* update

* update

* fix the merge error

* update

* update the code

* update the code

* add run_kp_kernel flag

* update

* update

* fix prepared type_ bug

* clean and update the code

* reset the kernel_primitives

* update

* clean the code

* delete useless comment

* fix the bug in WITH_XPU

* update

* update

* modify the abi

* delete some useless code

* Parameter automation in xpu compilation

* Parameter automation in xpu compilation

* delete kps in cmake

* delete useless comment

* clean the code

* clean the code

92da5055

fix kunlun2 softmax unitest bug (#39274) · 23bb2836
Committed by QingshuChen on Jan 29, 2022
```
* fix kunlun2 softmax unitest bug
*test=kunlun

* minor
```
23bb2836

28 Jan, 2022 · 2 commits

Remove macro for GetGpuBasePtr (#39279) · 9a001c09
Committed by From00 on Jan 28, 2022

9a001c09

Host tracer and ProfilerController (#39230) · 7c489c2e

Committed by liutiexing on Jan 28, 2022

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* split template

* Add Profiler and HostTracer

* update

* update

* update

* updateg

* fix cmake
Co-authored-by: liutiexing <liutiexing@google.com>

7c489c2e

27 Jan, 2022 · 4 commits

[PTen]Support AllocateFrom in Tensor and Alloc/HostAlloc in Context (#39022) · 5631da9c

Committed by Aurelius84 on Jan 27, 2022

* Support allocate_from in Tensor and allocate_data in Context

* fix #ifdef CUDA

* fix cycle depends

* fix test_xxx_dev_api failed

* fix windows compiling error

* fix unittest

* modify into PImpl

* fix selected rows

* add TODO comment

* refine interface according reviewer

5631da9c

[MLU] add compile ci scripts for MLU, test=mlu_ci (#39122) · 56410b4a
Committed by Qi Li on Jan 27, 2022

56410b4a

[PluggableDevice] Add custom kernel support based on pten kernel management (#38848) · a8879215

Committed by Aganlengzi on Jan 27, 2022

* [Demo] custom kernel based on pten kernel

* merge and npu custom work well

* del comments

* delete other code

* fix CUDAContext

* fix not found small_vector.h

* support NPU

* fix NPUContext

* fix DeviceContext support

* add UT

* fix call

* add UT

* fix

* fix for comments and ut

* add MACRO control

* fix multi input output

* support env CUSTOM_DEVICE_ROOT

* deal with special cases

* fix for Windows

* try coverage with test_custom_kernel_dot.py

* fix test_custom_kernel_dot

* fix test_custom_kernel_dot

* fix merge

* fix merge

* fix CI

* update

* merge and fix

* remove WITH_CUSTOM_KERNEL

* fix merge

* merge and fix

* fix ut

* fix ut for mac

* add more UT

* add more UT

* fix

a8879215

[NPU] fix aarch64 deps (#39257) · 80dfa010
Committed by Aganlengzi on Jan 27, 2022

80dfa010

26 Jan, 2022 · 4 commits

[pten] remove deprecated fluid op kernel for pten (#38842) · 3ab9aef1

Committed by Leo Chen on Jan 26, 2022

* update cmake file to remove fluid kernel

* add pten declaration.h to where pybind.h used

* fix sync_bn and tensorrt_engine

* refine detection_library

* fix interpreter_core

* support eager legacy

* fit eager legacy for pten

* fall back to cpu if not found kernel

* fix compile problem

* fix compile problem

* refine fallback logic

* fit operator.run()

* fix xpu compile

* fit for new_exec

* add REGISTER_OP_WITHOUT_GRADIENT

* un-cache pt_kernel_context

* fix compile

* fix cudnn

* fix compiling with on_infer

* fix mkldnn

* fix isfinite_v2

* fix xpu problem

* fix op_device

* refine fallback for xpu

* fix xpu compile

* merge develop

* refine code format

* fix compile

* fix compile

* add data_transfer

* fix PreparePtenData

* fix cpu context

* merge develop

* fix compile

* fix error device context

* fix xpu

* fix dev_ctx

3ab9aef1

[IPU] sync misc changes 01 (#38876) · 4efbebea

Committed by Allen Guo on Jan 26, 2022

* sync misc changes

* apply comments 01

* fix compile error

* remove is_ipu_place check

* add authors
Co-authored-by: Xiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: Allen Guo <alleng@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Haicheng Jiang <haichengj@graphcore.ai>
Co-authored-by: Han Zhao <hanzhao@graphcore.ai>

* sync changes

* restore cmake

* update ir cmake and setup.py

* update inference_lib cmake

* split PR
Co-authored-by: Xiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Haicheng Jiang <haichengj@graphcore.ai>
Co-authored-by: Han Zhao <hanzhao@graphcore.ai>

4efbebea

[PTEN] cpu_context add eigen deps (#39234) · bd5c962d
Committed by Wilber on Jan 26, 2022
```
* add eigen deps

* update
```
bd5c962d

add sigmoid cross entropy with logits to kl2 (#38915) · fd44de58

Committed by houj04 on Jan 26, 2022

* add sigmoid cross entropy with logits to kl2. test=kunlun

* add sigmoid cross entropy with logits to kl2. test=kunlun

* follow comments. test=kunlun

fd44de58

25 Jan, 2022 · 5 commits

add trace event data structure definition (#39109) · 57b2033b

Committed by chenjian on Jan 25, 2022

* add trace event data structure definition

* convert enum item to string for cupti enum explaination

* modify paddle_enforce_eq description

57b2033b

[MLU]add mlu kernel for split and concat (#39020) · ac3dc0bb
Committed by joeqiao12 on Jan 25, 2022
```
* [MLU]add mlu kernel for concat and split op

* delete device_context DEPS
```
ac3dc0bb

Optimize nearest_interp forward (#38528) · 232bbce2

Committed by Lijunhui on Jan 25, 2022

* init commit

* remove comments

* remove nchw branch

* optimize code

* apply fast div mod in 1D kernel, rm 3D kernel

* move init of FastDivMode to CPU

* 3D kernel for nchw, FastDiv for 1D kernel

* debug done. process boundary

* 2^n

* optimize

* optimize

* change code & optimize code

232bbce2

[PTEN] Add xpu context. (#39098) · c1e5a393
Committed by Wilber on Jan 25, 2022

c1e5a393

[PTen] Migrate string tinyformat errors and part of enforce into pten (#39051) · 6ca49164

Committed by xiongkun on Jan 25, 2022

* transfer: string tinyformat errors and part of enforce into pten

* remove comment

* fix by code review

* assert is not compile in -DNDEBUG

* add string as dependences of paddle_inference

6ca49164

24 Jan, 2022 · 3 commits

Remved redundant defintions of likely/unlikely (#38911) · 43919d0a

Committed by Jacek Czaja on Jan 24, 2022

* - more unlikely

* - compilation fix

* - removed redundant definition

* - fix

* - Fixes

* - compilation fix for windows

43919d0a

[Pten] Migration of eigen numeric extensions and functors in paddle/fluid/operatos/eigen (#39124) · a1e40dc6

Committed by Feiyu Chan on Jan 24, 2022

* migration of functors in paddle/fluid/operators/eigen and paddle/fluid/platform/eigen_ext.h
* update path of data types like float16.h in includes in extensions.h

a1e40dc6

[PTEN] Move dynload from fluid to pten. (#39120) · 3c1dc6f6

Committed by Wilber on Jan 24, 2022

* move dynload from fluid to pten.

* fix ci compile

* fix windows ci compile.

* update

* update

* fix compile error

3c1dc6f6

21 Jan, 2022 · 2 commits
- refactor unittest for kunlun (#38772) · 4f1fef60
  Committed by TTerror on Jan 21, 2022
```
* refactor unittests for kunlun

* refactor unittests for kunlun, test=kunlun
```
  4f1fef60
- [PTen]Migrate Dim and DDim from paddle::framework into pten namespace (#39053) · 4e23ba32
  Committed by Aurelius84 on Jan 21, 2022
```
* Migrate Dim and DDim from paddle::framework into pten namespace

* fix paddle::framework::Array

* fix framework::Array
```
  4e23ba32

Crayon鑫 / Paddle is in sync with its fork source project