Commits · 270f25e91d45cb7b0d0f7fd621ba5ff68ed151ec · Crayon鑫 / Paddle

15 Jul, 2022 · 1 commit
- support KL2 multi-card training, *test=kunlun (#43889) · 270f25e9
  Committed by zhangxiaoci on Jul 15, 2022
```
* update xccl lib
    * use separate streams for compute/comm on XPU
    * add broadcast op to xpu2_op_list
```
14 Jul, 2022 · 2 commits

[Phi]Improve the mechanism for mkldnn kernel in PHI (#43941) · e9b4d0be

Committed by YuanRisheng on Jul 14, 2022

* adapt mkldnn kernel in PHI

* fix ci compile bugs

* fix compile bugs

* fix compile bugs

* fix compile bugs

* fix compile bugs

* delete comment

* fix compile bugs in windows-inference

* delete code for coverage

* modify code by review

* modify code by review

* add todo

* fix compile bugs

* fix compile bugs

* fix compile bugs

* fix unittest bugs

[CustomDevice] add custom ccl 1/2 (#44294) · d88e77a7
Committed by ronnywang on Jul 14, 2022
```
* [CustomDevice] add custom ccl api

* add ut
```

13 Jul, 2022 · 1 commit
- [CustomKernel] capi add eager mode support (#44164) · 033ef5e9
  Committed by ronnywang on Jul 13, 2022
```
* [CustomKernel] add capi eager mode support

* add ut

* add capi test
```
12 Jul, 2022 · 1 commit
- [PHI] Clean glog header in public header (#44216) · b0c9f24a
  Committed by Chen Weihang on Jul 12, 2022
```
* clean glog header in public header

* move macro pos
```
06 Jul, 2022 · 1 commit
- minor fix VLOG for xpu. test=kunlun. (#44099) · 502062da
  Committed by houj04 on Jul 06, 2022
05 Jul, 2022 · 1 commit
- Dataloader add custom device support (#44013) · a0dc361c
  Committed by ronnywang on Jul 05, 2022
```
* Dataloader add custom device support

* update test=document_fix
```
02 Jul, 2022 · 1 commit

unify cpu context (#43989) · 09096aeb

Committed by Leo Chen on Jul 01, 2022

* unify cpu context

* fix init()

* delete test_device_context

* fix test_scalar

28 Jun, 2022 · 1 commit
- 【Sparse】add SparseTensor mv kernel(csr*dense_vec->dence_vec, coo*dense_vec->dense_vec) (#43668) · 5161a047
  Committed by zhouweiwei2014 on Jun 28, 2022
```
* [Sparse]add SparseTensor mv kernel(csr*dense_vec->dense_vec, coo*dense_vec->dense_vec)

* fix CI
```
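As a rough illustration of the `csr*dense_vec->dense_vec` case named in #43668, the C++ sketch below computes y = A·x for a matrix stored in CSR form. It is a serial CPU sketch, not Paddle's kernel; the struct `CsrMatrix`, its fields `crows`/`cols`/`values`, and the function `CsrMv` are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CSR container (not Paddle's SparseCsrTensor): the non-zeros
// of row i are values[crows[i]] .. values[crows[i + 1] - 1], and their
// column ids are stored at the same positions in cols.
struct CsrMatrix {
  int64_t rows;
  int64_t num_cols;
  std::vector<int64_t> crows;  // size rows + 1, crows[0] == 0
  std::vector<int64_t> cols;   // column index of each non-zero
  std::vector<float> values;   // value of each non-zero
};

// Serial sparse-matrix * dense-vector product:
// y[i] = sum over k in row i of values[k] * x[cols[k]].
std::vector<float> CsrMv(const CsrMatrix& a, const std::vector<float>& x) {
  assert(static_cast<int64_t>(x.size()) == a.num_cols);
  assert(static_cast<int64_t>(a.crows.size()) == a.rows + 1);
  std::vector<float> y(static_cast<std::size_t>(a.rows), 0.0f);
  for (int64_t i = 0; i < a.rows; ++i) {
    float acc = 0.0f;
    // Only the stored non-zeros of row i are visited.
    for (int64_t k = a.crows[i]; k < a.crows[i + 1]; ++k) {
      acc += a.values[k] * x[a.cols[k]];
    }
    y[i] = acc;
  }
  return y;
}
```

For A = [[1, 0, 2], [0, 3, 0]] stored as `crows = {0, 2, 3}`, `cols = {0, 2, 1}`, `values = {1, 2, 3}` and x = [1, 1, 1], `CsrMv` returns [3, 3]. Each row reads only its own non-zeros, so rows can be processed independently (for example, one GPU thread or warp per row).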
24 Jun, 2022 · 2 commits
- [Sparse] support batch compute of SparseTensor matmul/masked_matmul/softmax (#43703) · eec4e034
  Committed by zhouweiwei2014 on Jun 24, 2022
- change svd_cpu_kernel from Eigen to Lapack, speed up the compile from 120s -> 20s (#43784) · bafd8dec
  Committed by xiongkun on Jun 24, 2022
18 Jun, 2022 · 1 commit
- remove unuse cuSparse function (#43626) · 4a08c781
  Committed by zhouweiwei2014 on Jun 18, 2022
16 Jun, 2022 · 1 commit

[CustomKernel] add custom kernel c api (#42986) · 6fe10181

Committed by ronnywang on Jun 16, 2022

* [CustomKernel] add custom kernel c api

* update

* update

* fix unable to export capi
Co-authored-by: ronny1996 <524019753@qq.com>

15 Jun, 2022 · 3 commits

add some kernels(csr*dense->csr, dense*dense->csr) of SparseTensor matmul (#42935) · 346efe96
Committed by zhouweiwei2014 on Jun 15, 2022
```
* add some kernel(csr*dense->csr, dense*dense->csr) of SparseTensor matmul

* fix CI

* fix CI

* fix comment

* fix comment
```
Use int64_t in GetGpuLaunchConfig1D and ElementwiseKernel as index type to support large tensor. (#43506) · 15577630

Committed by Yiqun Liu on Jun 15, 2022

* Change some data type from int to int64_t in GetGpuLaunchConfig1D to support large tensor.

* Use int64_t in ElementwiseKernel as index type to support large tensor.

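The reasoning behind #43506 is plain integer arithmetic: a signed 32-bit `int` can represent at most 2^31 - 1 = 2,147,483,647, so an element count or flat index for a larger tensor overflows. The C++ sketch below shows the 64-bit form of the usual ceil-divide used to size a 1-D launch. `LaunchConfig1D`, `MakeLaunchConfig1D`, `FitsInInt32`, and the default of 256 threads per block are hypothetical and do not reflect the signature of Paddle's `GetGpuLaunchConfig1D`; the code only computes sizes on the host.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical 1-D launch configuration; every count is 64-bit.
struct LaunchConfig1D {
  int64_t threads_per_block;
  int64_t blocks;
};

// Ceil-divide numel by the block size in 64-bit arithmetic, so element
// counts above INT32_MAX do not overflow.
LaunchConfig1D MakeLaunchConfig1D(int64_t numel, int64_t threads_per_block = 256) {
  assert(numel >= 0 && threads_per_block > 0);
  const int64_t blocks = (numel + threads_per_block - 1) / threads_per_block;
  return {threads_per_block, blocks};
}

// True when value can be stored in a signed 32-bit int without overflow.
bool FitsInInt32(int64_t value) {
  return value <= static_cast<int64_t>(std::numeric_limits<int32_t>::max());
}
```

For a tensor with 3,000,000,000 elements, `FitsInInt32` returns false and `MakeLaunchConfig1D(3000000000)` yields 11,718,750 blocks of 256 threads; the same count cannot even be held in a 32-bit `int`, let alone used in the addition before the division.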

Refactor dynload/port.h (#43431) · 332fdd1e
Committed by Ruibiao Chen on Jun 15, 2022
```
* Refactor port.h

* Remove some unnecessary code

* Fix CI errors
```

13 Jun, 2022 · 2 commits
- Fix cmakelint errors for some files (#43428) · edf69ae0
  Committed by Ruibiao Chen on Jun 13, 2022
- sparse convertion kernel support secondary dispatch (#43345) · 5752643b
  Committed by zhangkaihuo on Jun 13, 2022
```
* use GpuMemcpy and GpuMemset

* sparse convert kernel support double dispatch by indices dtype

* cudaMemcpyKind->gpuMemcpyKind
```
09 Jun, 2022 · 1 commit
- [sparse inference] Supporting 2:4 sparse inference (#43179) · 20b38cfa
  Committed by minghaoBD on Jun 09, 2022
08 Jun, 2022 · 1 commit
- call_once (#43206) · cad139a7
  Committed by xiaoxiaohehe001 on Jun 08, 2022
07 Jun, 2022 · 1 commit
- [multi-stream] Fix split and concat problem. (#43039) · 8c3777df
  Committed by Wilber on Jun 07, 2022
05 Jun, 2022 · 1 commit
- 【code format check upgrade】 step2：clang-format (#42840) · a3730dc8
  Committed by Sing_chan on Jun 05, 2022
04 Jun, 2022 · 1 commit
- 【code format check upgrade】 step2：cmake-format (#43057) · 92568edb
  Committed by Sing_chan on Jun 04, 2022
19 May, 2022 · 1 commit
- [CompileOpt] Refine enforce code and remove boost/variant include (#41093) · ca359fec
  Committed by Chen Weihang on May 19, 2022
```
* refine enforce code

* refine enforce code

* fix compile failed

* fix infrt failed
```
13 May, 2022 · 1 commit
- add gpu resources. (#42723) · 1280f294
  Committed by Wilber on May 13, 2022
05 May, 2022 · 1 commit

update xpu depends (#42365) · d90e24ac

Committed by QingshuChen on May 05, 2022

* update xpu depends
*test=kunlun

* minor
*test=kunlun
Co-authored-by: root <root@yq01-sys-hic-p40-0091.yq01.baidu.com>

04 May, 2022 · 1 commit
- fix bug when compiling with cusparse in CUDA version >=11.4 (#42455) · 92fdfe33
  Committed by XiaoguangHu on May 04, 2022
22 Apr, 2022 · 1 commit

[WIP] Algorithm Cache of cuBlasLt Epilogue (#41010) · 19650d72

Committed by Ming-Xu Huang on Apr 22, 2022

* Fix leading dimension setting error in fused_gemm_epilogue_grad_op.

* Add dynload to cuBlasLt functions.

* Added cublasLtMatmulAlgoGetHeuristic to improve performance.

* Added FLAGS_cublaslt_exhaustive_search_times to cublasLt epilogue

* Added UTs to FLAGS_cublaslt_exhaustive_search_times

* Added warmup runs in algo searching of Gemm epilogue.

* Update copyright and documents.

* Fixed error handling.

21 Apr, 2022 · 1 commit
- [CustomDevice] fix macro (#42073) · ec995c59
  Committed by Aganlengzi on Apr 21, 2022
```
* [CustomDevice] fix macro

* fix
```
12 Apr, 2022 · 2 commits

[CustomOp] Add context pool unittests (#41085) · 59ec9599

Committed by Chen Weihang on Apr 12, 2022

* add context pool unittests

* fix timeout

* polish details

* change option pos

* add dll decl for windows

* fix pre-commit error

* move dll_decl and export DeviceContext

* replace lost dll_decl.h

fix_paddle_numel_check (#41607) · 51cae7f7
Committed by JingZhuangzhuang on Apr 12, 2022
```
* fix_paddle_numel_check

* fix_paddle_numel_check
```
09 Apr, 2022 · 1 commit

Autotune the workspace_size_limit in conv. (#40338) · b937cdc5

Committed by limingshu on Apr 09, 2022

* Using the maximum workspace_size of all algorithms to limit the workspace size in exhaustive search mode.

* Use the system cudaMalloc and cudaFree to allocate workspace during searching.

* Enable switch of two kind of workspace setting methods.
Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>

01 Apr, 2022 · 1 commit

[Phi]Interploatd kernels into phi (#40855) · d65a7a46

Committed by chentianyu03 on Apr 01, 2022

* add interpolate cpu kernel

* fix nullptr bug

* add interpolate gpu kernel

* fix unit test error

* remove raw kernels

* add cuda kernel impl

* add infermeta

* recover accidentally deleted kernels in interpolate op

* fix grad x_grad name error

* remove interpolate_v2_op.h

* rm unused codes

* fix xpu build error

* fix build error

* fix namespace error

* add register header for nup

* fix infermeta error

* modify by review

* add the missing args in test_trt_convert_nearest_interp_v2

25 Mar, 2022 · 2 commits
- add maximum limit for grid of reduce, elementwise, gather and scatter (#40813) · 608a5f55
  Committed by FlyingQianMM on Mar 25, 2022
```
* add maximum limit for grid of reduce, elementwise and gather

* add {} after if
```
- [ROCm] fix compile error on DTK21.10, test=develop (#40893) · 41f813e9
  Committed by Qi Li on Mar 25, 2022
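Commit #40813 adds a maximum limit for the grid of the reduce, elementwise, gather and scatter kernels. A common way to keep a kernel correct once its grid is capped is a grid-stride loop, in which each thread advances by the total number of threads in the grid. The C++ sketch below emulates that loop on the CPU and checks that every element is still visited exactly once. The cap `kMaxBlocks = 65535` and the helpers `CappedBlocks` and `VisitCounts` are illustrative assumptions, not Paddle's actual limit or code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative cap on the number of blocks (assumed value, not Paddle's).
constexpr int64_t kMaxBlocks = 65535;

// Blocks needed to give every element its own thread, clamped to the cap.
int64_t CappedBlocks(int64_t numel, int64_t threads) {
  assert(numel >= 0 && threads > 0);
  const int64_t needed = (numel + threads - 1) / threads;
  return std::min(needed, kMaxBlocks);
}

// CPU emulation of a grid-stride loop: each (block, thread) pair starts at
// its global id and advances by blocks * threads, so all elements are still
// covered when the grid is smaller than numel / threads.
std::vector<int> VisitCounts(int64_t numel, int64_t threads) {
  const int64_t blocks = CappedBlocks(numel, threads);
  const int64_t stride = blocks * threads;
  std::vector<int> visits(static_cast<std::size_t>(numel), 0);
  for (int64_t b = 0; b < blocks; ++b) {
    for (int64_t t = 0; t < threads; ++t) {
      for (int64_t i = b * threads + t; i < numel; i += stride) {
        ++visits[static_cast<std::size_t>(i)];
      }
    }
  }
  return visits;
}
```

With 1,000,000 elements and 4 threads per block, an uncapped grid would need 250,000 blocks; capped at 65,535 blocks, each emulated thread handles about four elements and every visit count is still 1.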
24 Mar, 2022 · 1 commit
- [custom runtime] clear headers (#40845) · d3a43477
  Committed by ronnywang on Mar 24, 2022
17 Mar, 2022 · 1 commit

Trt engine. (#40532) · 3082ed46

Committed by Wilber on Mar 17, 2022

* infrt add trt engine

* fix register

* file generate

* fix ci error

* fix conflict

* add copyright

* update

* update

* update

* update engine name

* refactor trt code

* update

* update

* update

* update

* fix conflict

* update

* fix compile with cuda

16 Mar, 2022 · 1 commit
- clean up DeviceManager in advance manually (#40504) · 23c036d6
  Committed by ronnywang on Mar 16, 2022
15 Mar, 2022 · 1 commit
- add CHECK_VERSION macro (#40512) · 464f65b1
  Committed by ronnywang on Mar 15, 2022
14 Mar, 2022 · 1 commit

fix gpu callback (#40445) · 2c21d240

Committed by Leo Chen on Mar 14, 2022

* fix gpu context callback

* fix gpu callback

* fix callback early destruct problem

Crayon鑫 / Paddle is in sync with the upstream project it was forked from
