- 28 Nov, 2022 2 commits
Committed by huangjiyi
* decouple cudnn_desc.h from fluid * move cudnn_desc.h from fluid to phi * fix bugs * decouple cudnn_helper.h from fluid * fix bugs * move cudnn_helper.h from fluid to phi * add fluid cudnn_helper.h * move miopen_desc.h from fluid to phi * move miopen_helper.h from fluid to phi * fix bugs * move gpu_dnn.h from fluid to phi * fix bugs * update copyright year * simplify gpu_dnn.h in fluid * fix bugs * fix xpu build bug * fix compile bug * fix bug
Committed by YuanRisheng
* Fix onednn kernel bugs * fix gpu bugs
 
 - 25 Nov, 2022 1 commit
Committed by sneaxiy
 
 - 24 Nov, 2022 1 commit
Committed by PuQing
 
 - 23 Nov, 2022 1 commit
Committed by sneaxiy
* make bfloat16 implicit convert to float/double * fix bfloat16_test ut compile
 
 - 21 Nov, 2022 1 commit
Committed by LiYuRio
 
 - 18 Nov, 2022 2 commits
 - 17 Nov, 2022 1 commit
Committed by sneaxiy
* add vectorized bfloat16 atomicAdd * fix compile error * fix compile error again * fix V100 compile error * fix V100 compile again
 
 - 16 Nov, 2022 1 commit
Committed by Wang Xin
 
 - 10 Nov, 2022 1 commit
Committed by pangyoki
change cudnn error to cuda error if compiled cuda version is incompatible with installed cuda version (#47743) * fix cudnn error * fix * fix * fix
 
 - 04 Nov, 2022 1 commit
Committed by pangyoki
 
 - 01 Nov, 2022 1 commit
Committed by Chen Weihang
* add extra attr property set * add type_info for all context * add onednn context to all context * fix context compile error * simplify conv kernel args * pass runtime attr into dev_ctx * fix macro error * clear conv_grad_kernel extra args * merge conv_grad_grad into conv_grad * clear conv2d_grad_grad extra attrs * clear yaml and eager extra attr * fix conv1d error * change to thread local * fix npu compile failed * try to fix windows compile failed * add conv2d onednn phi kernel * fix ci bugs (#36) * fix compile bugs (#38) * fix extra input transform bug (#39) * support dynamic created attr (#40) * reset extra info gen code * rm conv_grad_grad kernel * reimpl pass attr adapting * add int attr support * remove vector inputnames creating * fix map at error * Update paddle/phi/kernels/onednn/conv_grad_kernel.cc Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com> * remove useless extra attrs * replace mkldnn_engine by onednn_engine Co-authored-by: YuanRisheng <yuanrisheng@baidu.com> Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com>
 
 - 16 Sep, 2022 1 commit
Committed by sneaxiy
* support int64 non-broadcast * support broadcast case for int64 index * fix bug * support more Arity * remove some codes * upgrade patchelf to v0.15.0 to pass CI build * fix bug * fix patchelf installation * add debug flags * remove useless codes * fix viterbi_decode and set_value op uts * remove always enable int64
 
 - 06 Sep, 2022 1 commit
Committed by Wilber
 
 - 05 Sep, 2022 1 commit
Committed by sneaxiy
 
 - 24 Aug, 2022 1 commit
Committed by Rayman
* [Hackathon No.34] Optimize poisson op * [poisson] code style fix * modify code style * prevent from big number * modify code style * modify code style * modify import * modify import * modify code style
 
 - 10 Aug, 2022 1 commit
Committed by Leo Chen
* set cuda device before run * add header file * fix compile
 
 - 05 Aug, 2022 1 commit
Committed by Qi Li
 
 - 01 Aug, 2022 1 commit
Committed by Wilber
* infer context fix place error. * update * update
 
 - 29 Jul, 2022 1 commit
Committed by Leo Chen
* init * move CUDAStream to phi * fix compilation * merge develop * add stream_owned_ member * split cuda_stream.h * fix cpu compile * fix constructor * fix bug * fix windows compile * fix inference test_levit * fix windows tests
 
 - 26 Jul, 2022 1 commit
Committed by Wilber
* multi stream support handle lazy init. * support eigen lazy init * update * fix ci problem
 
 - 19 Jul, 2022 1 commit
Committed by Leo Chen
* compile into one static library * fix xpu compile * fix xpu compile * fix inference compile * fix inference compile * add custom test * revert one file
 
 - 12 Jul, 2022 1 commit
Committed by Chen Weihang
* clean glog header in public header * move macro pos
 
 - 15 Jun, 2022 2 commits
Committed by zhouweiwei2014
* add some kernel(csr*dense->csr, dense*dense->csr) of SparseTensor matmul * fix CI * fix CI * fix comment * fix comment
Committed by Yiqun Liu
Use int64_t in GetGpuLaunchConfig1D and ElementwiseKernel as index type to support large tensor. (#43506) * Change some data type from int to int64_t in GetGpuLaunchConfig1D to support large tensor. * Use int64_t in ElementwiseKernel as index type to support large tensor.
 
 - 13 Jun, 2022 1 commit
Committed by zhangkaihuo
* use GpuMemcpy and GpuMemset * sparse convert kernel support double dispatch by indices dtype * cudaMemcpyKind->gpuMemcpyKind
 
 - 08 Jun, 2022 1 commit
Committed by xiaoxiaohehe001
 
 - 07 Jun, 2022 1 commit
Committed by Wilber
 
 - 05 Jun, 2022 1 commit
Committed by Sing_chan
 
 - 04 Jun, 2022 1 commit
Committed by Sing_chan
 
 - 19 May, 2022 1 commit
Committed by Chen Weihang
* refine enforce code * refine enforce code * fix compile failed * fix infrt failed
 
 - 13 May, 2022 1 commit
Committed by Wilber
 
 - 12 Apr, 2022 2 commits
Committed by Chen Weihang
* add context pool unittests * fix timeout * polish details * change option pos * add dll decl for windows * fix pre-commit error * move dll_decl and export DeviceContext * replace lost dll_decl.h
Committed by JingZhuangzhuang
* fix_paddle_numel_check * fix_paddle_numel_check
 
 - 09 Apr, 2022 1 commit
Committed by limingshu
* Using the maximum workspace_size of all algorithms to limit the workspace size in exhaustive search mode. * Use the system cudaMalloc and cudaFree to allocate workspace during searching. * Enable switch of two kinds of workspace setting methods. Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>
 - 01 Apr, 2022 1 commit
Committed by chentianyu03
* add interpolate cpu kernel * fix nullptr bug * add interpolate gpu kernel * fix unit test error * remove raw kernels * add cuda kernel impl * add infermeta * recover accidentally deleted kernels in interpolate op * fix grad x_grad name error * remove interpolate_v2_op.h * rm unused codes * fix xpu build error * fix build error * fix namespace error * add register header for nup * fix infermeta error * modify by review * add the missing args in test_trt_convert_nearest_interp_v2
 
 - 25 Mar, 2022 2 commits
Committed by FlyingQianMM
* add maximum limit for grid of reduce, elementwise and gather * add {} after if
Committed by Qi Li
 
 - 17 Mar, 2022 1 commit
Committed by Wilber
* infrt add trt engine * fix register * file generate * fix ci error * fix conflict * add copyright * update * update * update * update engine name * refactor trt code * update * update * update * update * fix conflict * update * fix compile with cuda
 