- 09 12月, 2022 1 次提交
-
-
由 PuQing 提交于
-
- 08 12月, 2022 1 次提交
-
-
由 huangjiyi 提交于
* move cuda_graph from fluid to phi * move device_memory_aligment from fluid to phi * Revert "move device_memory_aligment from fluid to phi" This reverts commit b92fcd39a0a50fdac13278f49be0237a85f3a13f. * update xpu cmake
-
- 05 12月, 2022 1 次提交
-
-
由 huangjiyi 提交于
-
- 30 11月, 2022 1 次提交
-
-
由 Netpunk 提交于
* migrate transpose_op.cu.h and gpu_utils.h * format code style * fix some problems * format code * reset tranpose_op.cc * test commit * recover transpose_op.h * delete transpose_op.h * adjust header files order in transpose_op.cc
-
- 28 11月, 2022 2 次提交
-
-
由 huangjiyi 提交于
* decouple cudnn_desc.h from fluid * move cudnn_desc.h from fluid to phi * fix bugs * decouple cudnn_helper.h from fluid * fix bugs * move cudnn_helper.h from fluid to phi * add fluid cudnn_helper.h * move miopen_desc.h from fluid to phi * move miopen_helper.h from fluid to phi * fix bugs * move gpu_dnn.h from fluid to phi * fix bugs * update copyright year * simplify gpu_dnn.h in fluid * fix bugs * fix xpu build bug * fix compile bug * fix bug
-
由 YuanRisheng 提交于
* Fix onednn kernel bugs * fix gpu bugs
-
- 25 11月, 2022 1 次提交
-
-
由 sneaxiy 提交于
-
- 24 11月, 2022 1 次提交
-
-
由 PuQing 提交于
-
- 23 11月, 2022 1 次提交
-
-
由 sneaxiy 提交于
* make bfloat16 implicit convert to float/double * fix bfloat16_test ut compile
-
- 21 11月, 2022 1 次提交
-
-
由 LiYuRio 提交于
-
- 18 11月, 2022 2 次提交
- 17 11月, 2022 1 次提交
-
-
由 sneaxiy 提交于
* add vectorized bfloat16 atomicAdd * fix compile error * fix compile error again * fix V100 compile error * fix V100 compile again
-
- 16 11月, 2022 1 次提交
-
-
由 Wang Xin 提交于
-
- 10 11月, 2022 1 次提交
-
-
由 pangyoki 提交于
change cudnn error to cuda error if compiled cuda version is incompatible with installed cuda version (#47743) * fix cudnn error * fix * fix * fix
-
- 04 11月, 2022 1 次提交
-
-
由 pangyoki 提交于
-
- 01 11月, 2022 1 次提交
-
-
由 Chen Weihang 提交于
* add extra attr property set * add type_info for all context * add onednn context to all context * fix context compile error * simplify conv kernel args * pass runtime attr into dev_ctx * fix marco error * clear conv_grad_kernel extra args * merge conv_grad_grad into conv_grad * clear conv2d_grad_grad extra attrs * clear yaml and eager extra attr * fix conv1d error * change to thread local * fix npu compile failed * try to fix windows compile failed * add conv2d onednn phi kernel * fix ci bugs (#36) * fix compile bugs (#38) * fix extra input transform bug (#39) * support dynamic created attr (#40) * reset extra info gen code * rm conv_grad_grad kernel * reimpl pass attr adapting * add int attr support * remove vector inputnames creating * fix map at error * Update paddle/phi/kernels/onednn/conv_grad_kernel.cc Co-authored-by: NSławomir Siwek <slawomir.siwek@intel.com> * remove useless extra attrs * replace mkldnn_engine by onednn_engine Co-authored-by: NYuanRisheng <yuanrisheng@baidu.com> Co-authored-by: NSławomir Siwek <slawomir.siwek@intel.com>
-
- 16 9月, 2022 1 次提交
-
-
由 sneaxiy 提交于
* support int64 non-broadcast * support broadcast case for int64 index * fix bug * support more Arity * remove some codes * upgrade patchelf to v0.15.0 to pass CI build * fix bug * fix patchelf installation * add debug flags * remove useless codes * fix viterbi_decode and set_value op uts * remove always enable int64
-
- 06 9月, 2022 1 次提交
-
-
由 Wilber 提交于
-
- 05 9月, 2022 1 次提交
-
-
由 sneaxiy 提交于
-
- 24 8月, 2022 1 次提交
-
-
由 Rayman 提交于
* 【Hackathon No.34】优化 poisson op * [poisson] code style fix * modify code style * prevent from big number * modify code style * modify code style * modify import * modify import * modify code style
-
- 10 8月, 2022 1 次提交
-
-
由 Leo Chen 提交于
* set cuda device before run * add header file * fix compile
-
- 05 8月, 2022 1 次提交
-
-
由 Qi Li 提交于
-
- 01 8月, 2022 1 次提交
-
-
由 Wilber 提交于
* infer context fix place error. * update * update
-
- 29 7月, 2022 1 次提交
-
-
由 Leo Chen 提交于
* init * move CUDAStream to phi * fix compilation * merge develop * add stream_owned_ member * split cuda_stream.h * fix cpu compile * fix constructor * fix bug * fix windows compile * fix inference test_levit * fix windows tests
-
- 26 7月, 2022 1 次提交
-
-
由 Wilber 提交于
* multi stream support handle lazy init. * support eigen lazy init * update * fix ci problem
-
- 19 7月, 2022 1 次提交
-
-
由 Leo Chen 提交于
* compile into one static library * fix xpu compile * fix xpu compile * fix inference compile * fix inference compile * add custom test * revert one file
-
- 12 7月, 2022 1 次提交
-
-
由 Chen Weihang 提交于
* clean glog header in public header * move marco pos
-
- 15 6月, 2022 2 次提交
-
-
由 zhouweiwei2014 提交于
* add some kernel(csr*dense->csr, dense*dense->csr) of SparseTensor matmul * fix CI * fix CI * fix comment * fix comment
-
由 Yiqun Liu 提交于
Use int64_t in GetGpuLaunchConfig1D and ElementwiseKernel as index type to support large tensor. (#43506) * Change some data type from int to int64_t in GetGpuLaunchConfig1D to support large tensor. * Use int64_t in ElementwiseKernel as index type to support large tensor.
-
- 13 6月, 2022 1 次提交
-
-
由 zhangkaihuo 提交于
* use GpuMemcpy and GpuMemset * sparse convert kernel support double dispatch by indices dtype * cudaMemcpyKind->gpuMemcpyKind
-
- 08 6月, 2022 1 次提交
-
-
由 xiaoxiaohehe001 提交于
-
- 07 6月, 2022 1 次提交
-
-
由 Wilber 提交于
-
- 05 6月, 2022 1 次提交
-
-
由 Sing_chan 提交于
-
- 04 6月, 2022 1 次提交
-
-
由 Sing_chan 提交于
-
- 19 5月, 2022 1 次提交
-
-
由 Chen Weihang 提交于
* refine enforce code * refine enforce code * fix compile failed * fix infrt failed
-
- 13 5月, 2022 1 次提交
-
-
由 Wilber 提交于
-
- 12 4月, 2022 2 次提交
-
-
由 Chen Weihang 提交于
* add context pool unittests * fix timeout * polish details * change option pos * add dll decl for wndows * fix pre-commit error * move dll_decl and export DeviceContext * replace lost dll_decl.h
-
由 JingZhuangzhuang 提交于
* fix_paddle_numel_check * fix_paddle_numel_check
-
- 09 4月, 2022 1 次提交
-
-
由 limingshu 提交于
* Using the maximum workspace_size of all alogirhms to limit the workspace size in exhaustive search mode. * Use the system cudaMalloc and cudaFree to allocate workspace during searching. * Enable switch of two kind of workspace setting methods. Co-authored-by: NLiu Yiqun <liuyiqun01@baidu.com>
-