  1. 08 Mar, 2022 · 1 commit
  2. 07 Mar, 2022 · 1 commit
  3. 04 Mar, 2022 · 1 commit
  4. 02 Mar, 2022 · 1 commit
    • [Pten] GRU/LSTM migration (#39729) · e4dba69a
      Feiyu Chan committed
      * move sequence2batch
      
      * move lstm and gru
      
      * Add phi/kernels directory into exclusion to stop using hipcc to compile non .cu files in it.
  5. 28 Feb, 2022 · 1 commit
  6. 23 Feb, 2022 · 1 commit
  7. 19 Feb, 2022 · 1 commit
    • [Pten] Add selected_rows kernel for Full (#39465) · 79f8eeca
      zyfncg committed
      * Add selected_rows kernel for full
      
      * remove fill_constant register in fluid
      
      * fix bug without GPU
      
      * add jit_kernel_helper dependency for fc
      
      * do some refactor
      
      * add unittest for ops signatures
      
      * add coverage unittest
      
      * fix merge conflict
      
      * fix full selected_rows bug
  8. 18 Feb, 2022 · 1 commit
  9. 11 Feb, 2022 · 1 commit
  10. 21 Jan, 2022 · 1 commit
  11. 30 Dec, 2021 · 1 commit
    • Add cusparse and unittest (#38431) · 667dc9f0
      zhangkaihuo committed
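      
      Bind the cuSPARSE handle to the DeviceContext so that ops no longer create and destroy it themselves
      Add wrappers for the cuSPARSE dense/sparse conversion APIs
      Add unit tests for the wrapped APIs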
  12. 25 Nov, 2021 · 1 commit
  13. 23 Nov, 2021 · 1 commit
  14. 24 Sep, 2021 · 1 commit
    • Add paddle.linalg.solve OP (#35715) · 8caf951c
      Weilong Wu committed
      * Add linalg.solve op, test=develop
      
      * Fix a bug caused by accidental deletion
      
      * updated description and fix a bug: missing a comma
      
      * Add linalg.solve op, test=develop
      
      * updated solve op backward logic
      
      * updated solve op backward logic again
      
      * Add linalg.solve Op, test=develop
      
      * Updated and modified to fit CI requirements
      
      * Fix a bug
      
      * 1)Add more test cases; 2)Fix a wrong usage in reduces operation; 3)Remove redundant code
      
      * Remove redundant comments
      
      * 1)Removed redundant code; 2)Updated to enhance code robustness
      
      * Removed redundant code
      
      * Updated API documents
  15. 22 Sep, 2021 · 1 commit
  16. 15 Sep, 2021 · 1 commit
    • [NPU] add beam_search npu op (#34860) · 3760be06
      pangyoki committed
      * add beam_search npu op
      
      * fix CMakeList and add unittest
      
      * fix bug of beam search npu op
      
      * fix unittest
      
      * let input ids become int64
      
      * set output ids to int64_t
      
      * delete check_dygraph
      
      * fix beam_width=1
  17. 21 Jun, 2021 · 1 commit
    • Add AXPY oneDNN handler (#33632) · 773aabc7
      lidanqing committed
      * Add oneDNN AXPY handler.
      
      * Add fallback for small tensors.
      
      * Fix ifdefs
      
      * Remove unnecessary namespace prefixes and add missing headers.
      
      * Guard handler_axpy with proper ifdefs.
      
      * Compilation of this function is possible only when Paddle is built
      with neither CUDA nor HIP.
      
      * Move AXPY handler code to separate files.
      
      * Use oneDNN AXPY handler in SGD op.
      
      * Use axpy handler only when Paddle is built with oneDNN.
      
      * Add test for SUM BF16 with big rows.
      
      * Fix SFINAE rules for elementwise_add_to.
      
      * Add test case for SGD with big rows.
      
      * update
      
      * update
      Co-authored-by: Adam Osewski <adam.osewski@intel.com>
  18. 02 Mar, 2021 · 1 commit
  19. 16 Dec, 2020 · 1 commit
    • Add ROCm platform support code (#29342) · 76738504
      Y_Xuan committed
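      * Add ROCm platform support code
      
      * Fix some issues
      
      * Resolve some ambiguities and add comments
      
      * Fix code formatting
      
      * Adjust code after resolving conflicts
      
      * Modify operators.cmake
      
      * Fix formatting
      
      * Fix errors
      
      * Unify interfaces
      
      * Update dates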
  20. 11 Dec, 2020 · 1 commit
    • Add the strategy of skipping cc/cu test compilation and execution in CI (#29499) · b5d4a1f3
      LoveAn committed
      * Add the strategy of skipping cc/cu test compilation and execution in CI, test=develop
      
      * fix if error with CI_SKIP_TEST, test=develop
      
      * fix add properties to test error on Linux/MAC, test=develop
      
      * fix set test properties of test_code_generator error, test=develop
      
      * remove test codes and advance judgment of file modification on Linux, test=develop
      
      * rename CI_SKIP_TEST to CI_SKIP_CPP_TEST, test=document_fix
      
      * Add branch judgement on Linux, test=develop
  21. 07 Dec, 2020 · 1 commit
    • Compiling operator libraries with Unity build (#29130) · 671555ed
      LoveAn committed
      * Compiling operator libraries with Unity Build on Windows CPU.
      
      * Compiling operator libraries with Unity Build on Windows GPU, no_test, test=windows_ci
      
      * Add option in windows ci script, no_test, test=windows_ci
      
      * Optimize parallel compiling, test=develop
      
      * remove limit of parallel compile and skip some ops in UB, test=develop
      
      * remove changes of header file, test=develop
      
      * remove changes of header file, test=develop
      
      * fix test_eye_op unittest failed, test=develop
      
      * Compiling operator libraries with Unity Build on Linux, test=develop
      
      * set default WITH_UNITY_BUILD=OFF, test=develop
      
      * Move unity build rules into a single file and add comment, test=develop
      
      * optimize parallel compilation, test=develop
      
      * fix undefined reference error on coverage ci, test=develop
  22. 08 Nov, 2020 · 1 commit
    • exec ut no more than 15s 1 (#28439) · ba075632
      YUNSHEN XIE committed
      * disable ut test_parallel_executor_fetch_isolated_var,test=document_fix
      
      * test for limiting ut exec time to 15s
      
      * fix an error caused by a ut that could not be found
      
      * fix some errors
      
      * cannot find test_transformer
      
      * fix error caused by ut not running on Windows
      
      * fix error caused by Compiler Options
      
      * fix error caused by setting timeout value as 15 in python/paddle/tests/CMakeLists.txt
      
      * set timeout value to 120s for old ut
      
      * add the timeout value setting
      
      * fix error caused by ut only running in coverage_ci
      
      * add analyzer_transformer_profile_tester
      
      * fix some errors
      
      * fix some errors
      
      * fix error with inference option
      
      * fix error with inference option setting as ON_INFER
      
      * add some ut to set timeout
      
      * modify some options
      
      * fix error
      
      * fix some timeout errors
      
      * fix error
      
      * fix error
      
      * fix timeout for test_analyzer_bfloat16_resnet50
      
      * fix error
      
      * set timeout property for some ut
      
      * first pr for new ut timeout of 15s
  23. 22 Sep, 2020 · 1 commit
  24. 09 Sep, 2020 · 1 commit
  25. 27 Apr, 2020 · 1 commit
  26. 24 Apr, 2020 · 1 commit
  27. 26 Mar, 2020 · 1 commit
    • [Paddle-TRT]: Ernie Dynamic shape support. (#23138) · 430b0099
      Zhaolong Xing committed
      * add dynamic plugin support.
      test=develop
      
      * change emb eltwise layernorm to math function
      test=develop
      
      * add emb eltwise layernorm
      test=develop
      
      * can run dynamic shape ernie
      test=develop
      
      * fix ci
      test=develop
      
      * add ut for trt ernie dynamic
      test=develop
      
      * refine dynamic shape c++ interface.
      test=develop
      
      * fix comments
      test=develop
      
      * fix comments
      test=develop
  28. 11 Sep, 2019 · 1 commit
    • Implement the GPU kernel of fc operator (#19687) · a65c728e
      Yiqun Liu committed
      * Refine the codes related to fc op.
      
      * Add GPU implementation for fc functor.
      
      * Apply fc_fuse_pass in GPU inference.
      test=develop
      
      * Change the cmake for fc op.
      
      * Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.
      
      * Add an attribute to set the activation type in fc_op.
      
      * Enhance the unittest of fc_op.
      test=develop
      
      * Remove the declaration of FCOpGrad back to the header file.
      test=develop
      
      * Set default value for newly added arguments in test_fc_op.
      test=develop
  29. 05 Sep, 2019 · 1 commit
  30. 02 Feb, 2019 · 1 commit
  31. 30 Jan, 2019 · 1 commit
  32. 29 Jan, 2019 · 1 commit
  33. 24 Jan, 2019 · 1 commit
    • Add the CUDA kernel for beam_search op (#15020) · 3008fa12
      Yiqun Liu committed
      * Refine the beam_search op and test.
      
      * A basic CUDA implementation of beam_search for small batch_size.
      
      * Implement CUDA kernel for beam_search_op.
      
      * Use multiple CUDA threads in the same block to select the top beam.
      
      * Update the python api of beam_search op.
      
      * Enable extend function in CPU kernel of beam_search op.
      
      * Unify the CUDA codes.
      test=develop
      
      * Unify the CPU kernel of beam_search op.
      
      * Ensure the selected items of beam_search_op's CPU kernel are sorted by scores.
      
      * Update the description of beam_search in API.spec.
      
      * Enable the use of CUDA kernel in beam_search op.
      
      * Exclude the beam_search CUDA unittest when there is no CUDA GPU, and delete some debugging statements.
      test=develop
      
      * Follow comments.
      test=develop
      
      * Call the CPU kernel for beam_search op when batch_size > 4.
      test=develop
      
      * Remove the except of is_empty op in PrepareData.
      test=develop
  34. 18 Jan, 2019 · 1 commit
    • Tree conv op (#15217) · e2ba9668
      zhaozhehao committed
      * refactor tree2col operator with new memory mechanism test=develop
      
      * test=develop
      
      * test=develop
      
      * Modified API according to panyx0718 test=develop
      
      * fix API change according to heavengate test=develop
      
      * Modify API comment test=develop
  35. 04 Jan, 2019 · 1 commit
  36. 17 Dec, 2018 · 1 commit
  37. 05 Dec, 2018 · 1 commit
  38. 03 Dec, 2018 · 1 commit
  39. 22 Nov, 2018 · 1 commit
    • Windows/online (#14474) · d9a1f3e5
      wopeizl committed
      * add recordio support
      
      * disable OpenBLAS multithreading on Windows since it is not supported there;
      adjust the python script
      
      * code style
      
      * code style
      test=develop
      
      * add create_recordio_file_reader back
      
      * fix code style
      test=develop
      
      * fix the gtest.cmake on windows
      
      * fix cc_test on windows
      
      * fix the win build
      test=develop
      
      * remove fused compile support on windows
      test=develop
      
      * add the jit support
      test=develop
      
      * add the jit support, test=develop
      
      * add the jit support, test=develop
      
      * add the jit back
      fix compile error on windows
      
      * rollback test=develop
      
      * test case fix
      
      * disable DSO by default on windows
      
      * exclude warpctc_op on windows
      
      * exclude the dynload_warpctc out on windows
      test=develop
      
      * fix the scripts error
      test=develop
      
      * disable avx on windows by default
      test=develop
      
      * re-organize the cmake file
      
      * disable mkl on windows by default
      
      * add warp_ctc back
      
      * fix the dependency
      
      * fix the dependency
      
      * fix the build issue on windows
      
      * remove unsupported flag on windows
      
      * code style
      
      * code style
      test=develop
      
      * fix issue
      
      * add profiler, parallel_executor back
      
      * clean up the pre-definitions on windows
      
      * fix build issue
      
      * test=develop
  40. 19 Nov, 2018 · 1 commit
    • Optimize the layer_norm operator with AVX intrinsic functions (#14417) · f4c869d8
      Yihua Xu committed
      * Optimize layer_norm operator with AVX intrinsic functions
      
      * Revert the wrong modifications
      
      * Implement the jit kernel for layer_norm operator
      
      * Add math header file to fix the compile issue (test=develop)
      
      * Add math header file to fix the compile issue (test=develop)
      
      * Fixed the intrinsic header file issue (test=develop)
      
      * Fix the conflicts (test=develop)
      
      * Revert for CUDA compiler (test=develop)
      
      * Fixed the CUDA dependency (test=develop)
      
      * Fix the macro issues (test=develop)