- 19 Jul, 2022 1 commit
-
-
Committed by Leo Chen
* compile into one static library * fix xpu compile * fix xpu compile * fix inference compile * fix inference compile * add custom test * revert one file
-
- 18 Jul, 2022 2 commits
-
-
Committed by zhouweiwei2014
-
Committed by ronnywang
-
- 15 Jul, 2022 1 commit
-
-
Committed by zhangxiaoci
* update xccl lib * use separate streams for compute/comm on XPU * add broadcast op to xpu2_op_list
-
- 14 Jul, 2022 2 commits
-
-
Committed by YuanRisheng
* adapt mkldnn kernel in PHI * fix ci compile bugs * fix compile bugs * fix compile bugs * fix compile bugs * fix compile bugs * delete comment * fix compile bugs in windows-inference * delete code for coverage * modify code by review * modify code by review * add todo * fix compile bugs * fix compile bugs * fix compile bugs * fix unittest bugs
-
Committed by ronnywang
* [CustomDevice] add custom ccl api * add ut
-
- 13 Jul, 2022 1 commit
-
-
Committed by ronnywang
* [CustomKernel] add capi eager mode support * add ut * add capi test
-
- 12 Jul, 2022 1 commit
-
-
Committed by Chen Weihang
* clean glog header in public header * move macro pos
-
- 06 Jul, 2022 1 commit
-
-
Committed by houj04
-
- 05 Jul, 2022 1 commit
-
-
Committed by ronnywang
* Dataloader add custom device support * update test=document_fix
-
- 02 Jul, 2022 1 commit
-
-
Committed by Leo Chen
* unify cpu context * fix init() * delete test_device_context * fix test_scalar
-
- 28 Jun, 2022 1 commit
-
-
Committed by zhouweiwei2014
* [Sparse] add SparseTensor mv kernel (csr*dense_vec->dense_vec, coo*dense_vec->dense_vec) * fix CI
-
- 24 Jun, 2022 2 commits
-
-
Committed by zhouweiwei2014
-
Committed by xiongkun
-
- 18 Jun, 2022 1 commit
-
-
Committed by zhouweiwei2014
-
- 16 Jun, 2022 1 commit
-
-
Committed by ronnywang
* [CustomKernel] add custom kernel c api * update * update * fix unable to export capi Co-authored-by: ronny1996 <524019753@qq.com>
-
- 15 Jun, 2022 3 commits
-
-
Committed by zhouweiwei2014
* add some kernel(csr*dense->csr, dense*dense->csr) of SparseTensor matmul * fix CI * fix CI * fix comment * fix comment
-
Committed by Yiqun Liu
Use int64_t in GetGpuLaunchConfig1D and ElementwiseKernel as index type to support large tensor. (#43506) * Change some data type from int to int64_t in GetGpuLaunchConfig1D to support large tensor. * Use int64_t in ElementwiseKernel as index type to support large tensor.
-
Committed by Ruibiao Chen
* Refactor port.h * Remove some unnecessary code * Fix CI errors
-
- 13 Jun, 2022 2 commits
-
-
Committed by Ruibiao Chen
-
Committed by zhangkaihuo
* use GpuMemcpy and GpuMemset * sparse convert kernel support double dispatch by indices dtype * cudaMemcpyKind->gpuMemcpyKind
-
- 09 Jun, 2022 1 commit
-
-
Committed by minghaoBD
-
- 08 Jun, 2022 1 commit
-
-
Committed by xiaoxiaohehe001
-
- 07 Jun, 2022 1 commit
-
-
Committed by Wilber
-
- 05 Jun, 2022 1 commit
-
-
Committed by Sing_chan
-
- 04 Jun, 2022 1 commit
-
-
Committed by Sing_chan
-
- 19 May, 2022 1 commit
-
-
Committed by Chen Weihang
* refine enforce code * refine enforce code * fix compile failed * fix infrt failed
-
- 13 May, 2022 1 commit
-
-
Committed by Wilber
-
- 05 May, 2022 1 commit
-
-
Committed by QingshuChen
* update xpu depends *test=kunlun * minor *test=kunlun Co-authored-by: root <root@yq01-sys-hic-p40-0091.yq01.baidu.com>
-
- 04 May, 2022 1 commit
-
-
Committed by XiaoguangHu
-
- 22 Apr, 2022 1 commit
-
-
Committed by Ming-Xu Huang
* Fix leading dimension setting error in fused_gemm_epilogue_grad_op. * Add dyload to cuBlasLt functions. * Added cublasLtMatmulAlgoGetHeuristic to improve performance. * Added FLAGS_cublaslt_exhaustive_search_times to cublasLt epilogue * Added UTs to FLAGS_cublaslt_exhaustive_search_times * Added warmup runs in algo searching of Gemm epilogue. * Update copyright and documents. * Fixed error handling.
-
- 21 Apr, 2022 1 commit
-
-
Committed by Aganlengzi
* [CustomDevice] fix macro * fix
-
- 12 Apr, 2022 2 commits
-
-
Committed by Chen Weihang
* add context pool unittests * fix timeout * polish details * change option pos * add dll decl for windows * fix pre-commit error * move dll_decl and export DeviceContext * replace lost dll_decl.h
-
Committed by JingZhuangzhuang
* fix_paddle_numel_check * fix_paddle_numel_check
-
- 09 Apr, 2022 1 commit
-
-
Committed by limingshu
* Using the maximum workspace_size of all algorithms to limit the workspace size in exhaustive search mode. * Use the system cudaMalloc and cudaFree to allocate workspace during searching. * Enable switch of two kinds of workspace setting methods. Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>
-
- 01 Apr, 2022 1 commit
-
-
Committed by chentianyu03
* add interpolate cpu kernel * fix nullptr bug * add interpolate gpu kernel * fix unit test error * remove raw kernels * add cuda kernel impl * add infermeta * recover accidentally deleted kernels in interpolate op * fix grad x_grad name error * remove interpolate_v2_op.h * rm unused codes * fix xpu build error * fix build error * fix namespace error * add register header for nup * fix infermeta error * modify by review * add the missing args in test_trt_convert_nearest_interp_v2
-
- 25 Mar, 2022 2 commits
-
-
Committed by FlyingQianMM
* add maximum limit for grid of reduce, elementwise and gather * add {} after if
-
Committed by Qi Li
-
- 24 Mar, 2022 1 commit
-
-
Committed by ronnywang
-
- 17 Mar, 2022 1 commit
-
-
Committed by Wilber
* infrt add trt engine * fix register * file generate * fix ci error * fix conflict * add copyright * update * update * update * update engine name * refactor trt code * update * update * update * update * fix conflict * update * fix compile with cuda
-