- 21 Mar, 2023 1 commit
Committed by iSerendipity
* move DataType from paddle::experimental to phi
* convert namespace
* convert namespace
* convert namespace
* clarify namespace
* convert more datatype
* Revert "convert more datatype" This reverts commit 083b462959e6a22d4d8767707b628b95b396642e.
* convert more in auto_code_generator
* fix conflicts for XPU
* fix namespace conflicts
* fix errors
* Revert "fix errors" This reverts commit f9d9958b54ee32141112274c8a5c3c381ab0f876.
* fix errors
* fix formatting
- 16 Mar, 2023 1 commit
Committed by Huang Jiyi
* remove fluid thread_data_registry
* update
* fix bug
- 15 Mar, 2023 1 commit
Committed by JingZhuangzhuang
- 16 Jan, 2023 1 commit
Committed by zlsh80826
* Update warpctc for cuda-12
* Deprecate cudaProfilerInitialize for CUDA > 11
* Deprecate CUSPARSE_MV_ALG_DEFAULT for CUDA_VERSION >= 11040
* Add the missing thrust header
- 20 Dec, 2022 1 commit
Committed by huangjiyi
* move dropout_impl from fluid to phi
* move cuda_graph_with_memory_pool from fluid to phi
* update namespace
* remove cuda_graph in fluid
* fix mac-build
* fix bugs
* correct CodeStyle
* fix mac-build
* fix mutable_data
* fix stl include
* fix copy param
- 09 Dec, 2022 1 commit
Committed by PuQing
- 08 Dec, 2022 1 commit
Committed by huangjiyi
* move cuda_graph from fluid to phi
* move device_memory_aligment from fluid to phi
* Revert "move device_memory_aligment from fluid to phi" This reverts commit b92fcd39a0a50fdac13278f49be0237a85f3a13f.
* update xpu cmake
- 05 Dec, 2022 1 commit
Committed by huangjiyi
- 28 Nov, 2022 1 commit
Committed by huangjiyi
* decouple cudnn_desc.h from fluid
* move cudnn_desc.h from fluid to phi
* fix bugs
* decouple cudnn_helper.h from fluid
* fix bugs
* move cudnn_helper.h from fluid to phi
* add fluid cudnn_helper.h
* move miopen_desc.h from fluid to phi
* move miopen_helper.h from fluid to phi
* fix bugs
* move gpu_dnn.h from fluid to phi
* fix bugs
* update copyright year
* simplify gpu_dnn.h in fluid
* fix bugs
* fix xpu build bug
* fix compile bug
* fix bug
- 24 Nov, 2022 1 commit
Committed by huangjiyi
* rm dependence to "convert_utils.h" in some files
* fix bugs
* replace DataType2String with DataTypeToString
* replace framework::DataTypeSize with phi::SizeOf
* mv convert_function from fluid to phi and rm old map
* recommit with pre-commit
* replace ProtoVarType with ProtoDataType and update comment.
* fix error about include "dnnl.hpp"
* revert add dep mkldnn to convert_utils in phi
* add mkldnn deps in convert_utils.h in phi
* move deps to convert_utils.h in phi
- 22 Nov, 2022 1 commit
Committed by huangjiyi
* move "paddle/phi/backends/gpu/gpu_device_function.h" to phi
* update copyright years
* rm "fluid/platform/device/gpu/gpu_device_function.h" in phi
* rm dependence to "gpu_device_function.h" in fluid
* rm gpu_device_function.h etc in fluid
* fix rocm-compile bugs
* fix cuda_helper_test.cu bugs
- 18 Nov, 2022 3 commits
Committed by zyfncg
* fix bug of zero_allocator in host
* fix test compile bug
* add unittest
* update test
Committed by Tian Zheng
* Refactor conv_kernel and conv_grad_kernel to provide interface for CUDNNv8 implementation
* Fix macro
* Add implementation for conv_kernel and conv_grad_kernel
* Modification after rebase onto latest develop
* Modify plan cache to comply with the API of phi::autotune
* Refactor to reduce duplicate code
* Review fix:
  - move functions in conv_kernel_impl_v8.h and conv_grad_kernel_impl_v8.h to conv_kernel.cu and conv_grad_kernelk.cu
  - add const specifier for input tensor
  - add logging when plans fail to execute
  - move CudnnConvBwdFilterV8 and CudnnConvBwdDataV8 to conv_cudnn_frontend.h
* move plan building outside of cache
* Fix ROCM build
Committed by Wang Xin
* remove "gpu_primitives.h" in fluid namespace
* fix PR-CI-GpuPS fail
* fix PR-CI-GpuPS fail
- 17 Nov, 2022 1 commit
Committed by sneaxiy
* add vectorized bfloat16 atomicAdd
* fix compile error
* fix compile error again
* fix V100 compile error
* fix V100 compile again
- 07 Nov, 2022 1 commit
Committed by HongyuJia
* move cudnn hardcode outside GetExpectedKernelType
* add header file
* debug
* update interpreter_util with hardcode
* update interpreter_util headerfile
* solve activation hardcode
* debug with CI
* add mkldnn_op_list header file
* temporarily uncomment mkldnn
* temporarily uncomment mkldnn
* delete sequence_softmax cudnn hardcode
* add hardcode to data_transfer.cc
* update data_transfer headerfile
* try fix segment fault
* update cudnn&miopen_helper
* reset HasAttr of DygraphExctnCtx
* debug, this commit should pass all CI
* debug should pass CI, temporarily disable activation
* debug should pass CI
* fix default_attr=nullptr bug
* clean debug code
* Call SetDnnFallback function in the base class
* activation fallback to plain kernel
* fix default GetExpectedKernelType find wrong kernel
* search cudnn kernel instead of fallback
* fix cudnn_handle bug
* remove tanh use_cudnn
* restore tanh use_cudnn
* debug tanh
* fix tanh bug
* delete activation cudnn kernel
* polish code
- 02 Nov, 2022 1 commit
- 01 Nov, 2022 1 commit
Committed by HongyuJia
* move cudnn hardcode outside GetExpectedKernelType
* add header file
* debug
* update interpreter_util with hardcode
* update interpreter_util headerfile
* solve activation hardcode
* debug with CI
* add mkldnn_op_list header file
* temporarily uncomment mkldnn
* temporarily uncomment mkldnn
* delete sequence_softmax cudnn hardcode
* add hardcode to data_transfer.cc
* update data_transfer headerfile
* try fix segment fault
* update cudnn&miopen_helper
* reset HasAttr of DygraphExctnCtx
* debug, this commit should pass all CI
* debug should pass CI, temporarily disable activation
* debug should pass CI
* fix default_attr=nullptr bug
* clean debug code
- 25 Oct, 2022 1 commit
Committed by HongyuJia
- 21 Oct, 2022 1 commit
Committed by Yuanle Liu
* fix nvprof_nvtx_push interface bug
- 19 Oct, 2022 1 commit
Committed by Yuanle Liu
- 11 Oct, 2022 1 commit
Committed by Wen Sun
- 30 Sep, 2022 1 commit
Committed by sneaxiy
* support pure bfloat16
* support bf16 linear
* update PR to pass CI
* tiny fix where_grad_kernel.cu
* add bfloat16 to selu_grad to pass CI
* fix selu grad compilation error
- 28 Sep, 2022 1 commit
Committed by Chen Weihang
* remove needless using tensor
* remove needless using tensor
* resolve conflict
* replace tensor using
* fix format error
* revert needless changing
* fix rocm and npu compile error
* fix cinn compile error
* fix format error
* fix mkldnn format error
* fix mkldnn format error
* fix cinn compile error
* fix cinn compile error
* fix cinn compile error
* resolve conflict
- 16 Sep, 2022 1 commit
Committed by sneaxiy
* support int64 non-broadcast
* support broadcast case for int64 index
* fix bug
* support more Arity
* remove some codes
* upgrade patchelf to v0.15.0 to pass CI build
* fix bug
* fix patchelf installation
* add debug flags
* remove useless codes
* fix viterbi_decode and set_value op uts
* remove always enable int64
- 05 Sep, 2022 2 commits
- 12 Aug, 2022 1 commit
Committed by Siming Dai
* add init file
* add op definition and infermeta
* add kernel definition funcs
* add broadcast infer shape
* add gpu forward kernel
* delete SUB and DIV
* add x_grad
* add template
* add e_grad for min and max
* fix small bug
* temp commit
* temp commit
* add e_grad for sum and mean
* fix some compile bug
* fix compile bugs
* fix compile problem
* add sum forward unittest
* fix broadcast error, add kernel sig, register e_grad, change unit test
* fix grad
* add temp grad fix
* temp commit
* add min max unittest
* add max, min unittest, fix mul bug
* add cpu forward sum and mean
* add forward min max, fix mean unittest
* add cpu backward min max
* fix code-style
* add backward sum mean
* fix rocm ci
* set unittest timeout
* fix bug of x broadcast to e, gpu grad
* fix bug of x broadcast to e, cpu grad
* rename BOOST_GET_CONST macro
* fix rocm ci
* mv graph_send_e_recv to graph_send_ue_recv
* move out_size to IntArray
* add eager op test
* fix max pool type bug, add unittest for api
* revise api doc
* add fp16 for atomic min and max, add unittest
* add unittest
* add fp16 support for graph_send_recv
* fix unittest fp16 bug
* change OutSizeTensor to Out_size
* move E to Y
* add copyright, fix comment
* review code
* fix thread block size
* fix thread block size
* change api attribute name: pool_type to reduce_op, compute_type to message_op
* change api attribute name, move pool_type to reduce_op, move compute_type to message_op
- 01 Aug, 2022 1 commit
Committed by Leo Chen
* remove cudaDeviceContext
* remove more template
* fix rocm compile
* remove alias name CUDADeviceContext
* fix compile
* fix tests
* revert changes
- 28 Jul, 2022 1 commit
Committed by LiYuRio
- 27 Jul, 2022 1 commit
Committed by Yuang Liu
- 19 Jul, 2022 1 commit
Committed by Leo Chen
* compile into one static library
* fix xpu compile
* fix xpu compile
* fix inference compile
* fix inference compile
* add custom test
* revert one file
- 14 Jul, 2022 2 commits
Committed by Leo Chen
* build into one static library
* move memory/detail to memory/allocation
* fix bug
* fix profiler
* fix framework_proto
* fix deps
* fix inference compilation
* fix rocm compile
* follow comments
* fix buddy_allocator_test
Committed by wanghuancoder
* Compilation optimization
- 26 Jun, 2022 1 commit
Committed by Sing_chan
- 24 Jun, 2022 1 commit
Committed by chenjian
* record memory and op supplement info
* update
* update
* fix a bug
* fix memory recording
* fix a bug
* update
* update
* fix a bug
* update
* fix a bug
* fix a bug
* fix a bug
* Revert "fix a bug" This reverts commit c1d4df52762ba9ae7c7e27cd2ba4fc3a7ed9c7a5.
* fix a bug
* fix format
* fix
- 15 Jun, 2022 1 commit
Committed by zhouweiwei2014
* add some kernel(csr*dense->csr, dense*dense->csr) of SparseTensor matmul
* fix CI
* fix CI
* fix comment
* fix comment
- 05 Jun, 2022 1 commit
Committed by Sing_chan
- 04 Jun, 2022 1 commit
Committed by Sing_chan
- 02 Jun, 2022 1 commit
Committed by sneaxiy
* fix cuda graph sizeof
* fix tuple type