- 22 May, 2023 1 commit
Authored by Tian Zheng
* Add GPU kernel for multiclass_nms3 op * Make multiclass_nms3 gpu kernel output consistent with cpu kernel * Fix API incompatibility * Fix unittests on builds without CUDA * Fix ROCM build * Remove fluid headers; Use default atol for unittest * Change function and variable naming * Add comments; Reduce redundant code * Use paddle test framework
- 18 May, 2023 1 commit
Authored by RedContritio
* simplify layer_norm_op.cc * support auto generate for op layer_norm * update unittest for composite_layer_norm * remove layer_norm_op.cc from scripts * replace layer_norm_op with generated_op * add get_expected_kernel for layer_norm * update cmake kernel register function for layer_norm_mkldnn_op
- 25 April, 2023 2 commits
- 18 April, 2023 1 commit
Authored by 张春乔
- 13 April, 2023 1 commit
Authored by zhangyuqin1998
* rename PD_REGISTER_GENERAL_KERNEL * Update feed_op.cc * fix * Update strings_empty_kernel.cc
- 10 April, 2023 1 commit
Authored by jjyaoao
- 07 April, 2023 1 commit
Authored by Wang Xin
- 06 April, 2023 1 commit
Authored by RedContritio
- 03 April, 2023 1 commit
Authored by engineer1109
- 30 March, 2023 1 commit
Authored by huangjiyi
* update assign_pos * update attention_lstm * update barrier * update batch_fc * update beam_search * update beam_search_decode * update bilateral_slice * fix bug * Handle Structure kernel for InterpreterCore::RunOperator * fix bug * fix rocm compile * fix rocm compile * Revert "fix rocm compile" * test * revert test and update cmake --------- Co-authored-by: Nchenruibiao <chenruibiao@baidu.com>
- 29 March, 2023 1 commit
Authored by huangjiyi
* fix kp compile * test * Revert "test" This reverts commit 3a1cbfaa0f23e6e06d3dcd8d0b0c28aa63a98e70. * update copyright * update cmake * update cmake * update cmake * update cmake
- 23 March, 2023 1 commit
Authored by Huang Jiyi
* update * update * update * update * update * fix test
- 08 February, 2023 1 commit
Authored by YuanRisheng
* unify_kernel * fix compile bugs * modify macro name * perfect code according comment * fix compile bugs * fix compile bugs * fix ci bugs * fix ci bug * fix ci bugs * fix ci bugs * modify code according comment * rm conv_fusion_op
- 03 February, 2023 1 commit
Authored by HappyHeavyRain
* generate some static graph ops * fix the bug of pow * add REGISTER_ACTIVATION_OP in operators.cmake * modify the file operators.cmake
- 17 January, 2023 1 commit
Authored by YuanRisheng
* change feed_op to phi kernel * fix ci bugs * fix build bugs * fix ci bugs * fix compile bugs * fix ci bugs * perfect code * perfect comment code * fix install bugs * modify code according comment * remove visitor in feed_op * modify according comment * perfect code according comment * add infershape * fix py3 bugs * fix getexpected kernel type * fix getexpected kernel type * fix ci bugs * add registry for custom device * fix py3 bugs * fix floating point error * fix py3 test bugs
- 12 December, 2022 1 commit
Authored by YuanRisheng
* add new tensor * fix windows compile bugs * fix ci bugs * fix ci bugs * fix ci bugs * perfect according comment * fix ci compile bugs * add raw tensor * fix ci bugs * modify code by comment * delete String
- 20 October, 2022 2 commits
Authored by HongyuJia
* remove fc mkldnn hardcode * remove useless enum of kFCMKLDNN * fix macro error * update operators.cmake
Authored by JingZhuangzhuang
* Add infer prune function * Update phi.cmake * Update operators.cmake * add fusion op
- 14 October, 2022 1 commit
Authored by Chen Weihang
* simplify conv_mkldnn op registration * remove custom type value in conv grad op
- 11 October, 2022 1 commit
Authored by HongyuJia
* solve transpose2, follow #22402 * fix CI cmake * update REGISTER_OP_KERNEL of transpose2
- 22 September, 2022 1 commit
Authored by Sławomir Siwek
* gaussian random * mkldnn to onednn renaming * fix merge conflicts * remove fluid code * onednn renaming * gelu fwd * sort activations * gelu gradient * remove unused macros * merge conflicts * fix merge conflicts * remove extra constraint from gelu op
- 05 August, 2022 1 commit
Authored by YuanRisheng
* move mkldnn activation kernel * fix compile bugs * fix compile bugs * deal with conflict * fix compile bugs * fix windows compile bugs * mkldnn unittest fix * change mutable to alloc * fix unittest bugs * modify code according comment
- 14 June, 2022 1 commit
Authored by Wilber
* cmake-lint * update
- 04 June, 2022 1 commit
Authored by Sing_chan
- 18 May, 2022 1 commit
Authored by Aganlengzi
* [NPU] add take_along_axis and take_along_axis_grad ops * [NPU] add take_along_axis and take_along_axis_grad ops * fix ut because cpu kernel can not be fallbacked
- 17 May, 2022 1 commit
Authored by Aganlengzi
* [NPU] add multinomial op * fix place * deal with cann version * fix for old operator * change another way
- 28 March, 2022 1 commit
Authored by Chen Weihang
* fix assign kernel bug * fix xpu kernel select error * add cudn pinned place * fix copy error * fix infrt error
- 08 March, 2022 1 commit
Authored by YuanRisheng
[Phi]Move Relu/Cos/Sin/Tan/Acos/Asin/Atan/Sinh/Cosh/Asinh/Acosh/Atanh kernels in Activation to Phi (#40175) * move activation op * adjust code format * fix compile bugs * fix ci bugs * code format adjust * code format adjust2 * activate ci status * modify according to comment
- 07 March, 2022 1 commit
Authored by Ming-Xu Huang
* Added cuBlasLtHandle_t to device context. * Added fused_gemm_epilogue op. 1. Added fused_gemm_epilogue op to leverage cuBLASLt Epilogue. 2. Support fusion Act(X*Y + bias); X's dims >= 2 and Y's dims should be 2. 3. Act currently supports only ReLU (GeLU will be added in the future). * Added UT to fused_gemm_epilogue op. * Added LinearAct Pattern 1. Added LinearAct into graph_pattern_detector.* to define (2.)'s pattern. 2. LinearAct is used to detect act(element_add(matmul_v2(x, w), bias)). 3. act currently supports only ReLU (GeLU will be supported in the future). * Added FuseGemmEpiloguePass 1. Added FuseGemmEpiloguePass to handle nn.Linear + Act{ReLU} fusion (GeLU will be supported in the future). 2. Only support matmul_v2 from nn.Linear. * Added pybind to BuildStrategy.fuse_gemm_epilogue_. * Added UT for fuse_gemm_epilogue_pass. * GeLU support and EpilogueSingleton 1. Added GeLU support to fused_gemm_epilogue op. 2. Added EpilogueSingleton to cache auxiliary pointer. 3. Added related UTs. * Rename cublaslt_epilogue_op to gemm_epilogue_op.*. * Added both train and infer pattern to LinearAct. 1. Added support of fwd graph with grad_ops linking to LinearAct. 2. Added related changes to fuse_gemm_epilogue_pass for above modification. * Changed CUDA requirement from 11.4 to 11.6 for fuse_gemm_epilogue_pass. * Added identity activation support to gemm_epilogue_op. * Added Linear Fusion (matmul_v2 + ele_add) 1. Added matmul_v2 + ele_add pattern to LinearActPattern. 2. Added matmul_v2 + ele_add support to fuse_gemm_epilogue_pass. * Rename gemm_epilogue_op.* to fused_gemm_epilogue_op.* * Add fused_gemm_epilogue_grad op. 1. Added fused_gemm_epilogue_grad to support backward epilogue fusion. * Add UTs to fused_gemm_epilogue_grad_op. * Change attribute name in fused_gemm_epilogue_grad_op for clarity. * Allow DX and DBias to be dispensable in fused_gemm_epilogue_grad op. * Added ElementwiseAdd+Matmul+Act graph pattern detection. * Fuse backward of Linear(Act(x)) 1. Added backward fusion pass to Linear(Act(x)). 2. Added backward fusion pass to Linear(x). * Added UTs to backward fusion of Linear(Act(x)). * Complete document of arguments to fused_gemm_epilogue_op. * Made arguments of some functions pass by reference. * Modify code with review comments. 1. Made arguments of some functions pass by reference. 2. Removed redundant code. 3. Followed Google code style to change code. * Made 'const' code style consistent * Fixed random seed of python UTs. * Set compile constraints for cuBLASLt 1. Require CUDA 11.6+ 2. Remove fuse_gemm_epilogue related tests when CUDA < 11.6. * Code Review from Paddle 1. Changed argument name is_first_gemm to without_x_gradient for clarity. 2. Applied PADDLE_THROW in fused_gemm_epilogue_op. * Remove EpilogueSingleton 1. Applied ReserveSpace to replace Epilogue for passing auxiliary pointers between FWD and BWD. * Fix a logical error and enhance UTs. 1. Added act op count checking in UTs. 2. Fix issue to fuse backward of ReLU(Linear(X)). 3. TODO: solve GELU fusion issues. * Fix Linear and GeLU fusion issues. 1. Modified graph_pattern_detector to fit both linear with gelu and linear with relu. 2. Modified data range in UTs to allow negative values. * Removed fused_gemm_epilogue_op.h. * Rename namespace pten to phi. * Rename arguments in fused_gemm_epilogue_op 1. bias -> Bias. 2. out -> Out. 3. reserve_space -> ReserveSpace. * Change EpiloguePassActivationCache to a local variable. 1. Removed singleton in EpiloguePassActivationCache. 2. Made EpiloguePassActivationCache an argument to each pass function.
- 28 February, 2022 1 commit
Authored by Liu-xiandong
* [KP] Unify .cu and .xpu files with .kps files * fix CI bug in GPU and modify the list * fix conflict * modify the date
- 23 February, 2022 1 commit
Authored by Liu-xiandong
* [KP] Add elementwise add xpu, test=develop * modify the File Permissions * modify the copyright time * modify code style * modify code style
- 14 February, 2022 1 commit
Authored by Qi Li
- 29 January, 2022 1 commit
Authored by Liu-xiandong
* Add XPU compiler for paddle, test=develop * clean code * clean useless code * clean useless code * clean useless code * test * add include path * use clang compiler * xpu2.cmake * XPU2 compiler passed * update * update after pten * combination the WITH_XPU and WITH_XPU2 * update the fuse operation in WITH_XPU and WITH_XPU2 * update * update * update * fix the merge error * update * update the code * update the code * add run_kp_kernel flag * update * update * fix prepared type_ bug * clean and update the code * reset the kernel_primitives * update * clean the code * delete useless comment * fix the bug in WITH_XPU * update * update * modify the abi * delete some useless code * Parameter automation in xpu compilation * Parameter automation in xpu compilation * delete kps in cmake * delete useless comment * clean the code * clean the code
- 26 January, 2022 1 commit
Authored by Leo Chen
* update cmake file to remove fluid kernel * add pten declaration.h to where pybind.h used * fix sync_bn and tensorrt_engine * refine detection_library * fix interpreter_core * support eager legacy * fit eager legacy for pten * fall back to cpu if not found kernel * fix compile problem * fix compile problem * refine fallback logic * fit operator.run() * fix xpu compile * fit for new_exec * add REGISTER_OP_WITHOUT_GRADIENT * un-cache pt_kernel_context * fix compile * fix cudnn * fix compiling with on_infer * fix mkldnn * fix isfinite_v2 * fix xpu problem * fix op_device * refine fallback for xpu * fix xpu compile * merge develop * refine code format * fix compile * fix compile * add data_transfer * fix PreparePtenData * fix cpu context * merge develop * fix compile * fix error device context * fix xpu * fix dev_ctx
- 10 January, 2022 1 commit
Authored by Haohongxiang
* add lstsq gpu kernel * update * add docs_en * modify ut * fix bugs * modify example in docs_en * remove lstsq_op.cu from ROCM cmake * modify docs_en * modify docs_en * modify docs_en * remove unnecessary TensorCopy
- 30 December, 2021 1 commit
Authored by zhiboniu
LGTM
- 24 December, 2021 1 commit
Authored by zhiboniu
- 20 December, 2021 1 commit
Authored by fwenguang
- 27 October, 2021 1 commit
Authored by huangjun12
* add eigvalsh with is_test * add eigvalsh op * fix backward bug * forward and backward, float and complex, unittest * remove eigvalsh_helper.h * remove changes of cusolver.h * fix unittest * fix unittest bug * update code following eigh * fix test * update lapack * pull develop * update functor * fix unittest bug * fix details * add tensor_method_func * fix notes