1. 24 4月, 2020 1 次提交
  2. 26 3月, 2020 1 次提交
    • Z
      [Paddle-TRT]: Ernie Dynamic shape support. (#23138) · 430b0099
      Zhaolong Xing 提交于
      * add dynamic plugin support.
      test=develop
      
      * change emb eltwise layernorm to math function
      test=develop
      
      * add emb eltwise layernorm
      test=develop
      
      * can run dynamic shape ernie
      test=develop
      
      * fix ci
      test=develop
      
      * add ut for trt ernie dynamic
      
      test=develop
      
      * refine dynamic shape c++ interface.
      test=develop
      
      * fix comments
      test=develop
      
      * fix comments
      test=develop
      430b0099
  3. 11 9月, 2019 1 次提交
    • Y
      Implement the GPU kernel of fc operator (#19687) · a65c728e
      Yiqun Liu 提交于
      * Refine the codes related to fc op.
      
      * Add GPU implementation for fc functor.
      
      * Apply fc_fuse_pass in GPU inference.
      test=develop
      
      * Change the cmake for fc op.
      
      * Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.
      
      * Add an attribute to set the activation type in fc_op.
      
      * Enhance the unittest of fc_op.
      test=develop
      
      * Remove the declaration of FCOpGrad back to the header file.
      test=develop
      
      * Set default value for newly added arguments in test_fc_op.
      test=develop
      a65c728e
  4. 05 9月, 2019 1 次提交
  5. 02 2月, 2019 1 次提交
  6. 30 1月, 2019 1 次提交
  7. 29 1月, 2019 1 次提交
  8. 24 1月, 2019 1 次提交
    • Y
      Add the CUDA kernel for beam_search op (#15020) · 3008fa12
      Yiqun Liu 提交于
      * Refine the beam_search op and test.
      
      * A basic CUDA implementation of beam_search for small batch_size.
      
      * Implement CUDA kernel for beam_search_op.
      
      * Use multiple CUDA threads in the same block to select the top beam.
      
      * Update the python api of beam_search op.
      
      * Enable extend function in CPU kernel of beam_search op.
      
      * Unify the CUDA codes.
      test=develop
      
      * Unify the CPU kernel of beam_search op.
      
      * Ensure the seletced items of beam_search_op's CPU kernel sorted by scores.
      
      * Update the description of beam_search in API.spec.
      
      * Enable the use of CUDA kernel in beam_search op.
      
      * Exclude the beam_search's CUDA unittest when there is no CUDA gpu, and delete some debuging statements.
      test=develop
      
      * Follow comments.
      test=develop
      
      * Call the CPU kernel for beam_search op when batch_size > 4.
      test=develop
      
      * Remove the except of is_empty op in PrepareData.
      test=develop
      3008fa12
  9. 18 1月, 2019 1 次提交
    • Z
      Tree conv op (#15217) · e2ba9668
      zhaozhehao 提交于
      * refactor tree2col operator with new memory mechanism test=develop
      
      * test=develop
      
      * test=develop
      
      * Modified API according to panyx0718 test=develop
      
      * fix API change according to heavengate test=develop
      
      * Modify API comment test=develop
      e2ba9668
  10. 04 1月, 2019 1 次提交
  11. 17 12月, 2018 1 次提交
  12. 05 12月, 2018 1 次提交
  13. 03 12月, 2018 1 次提交
  14. 22 11月, 2018 1 次提交
    • W
      Windows/online (#14474) · d9a1f3e5
      wopeizl 提交于
      * add recordio support
      
      * disable the openblas multi-thread on windows since no support
      adjust the python script
      
      * code style
      
      * code style
      test=develop
      
      * add create_recordio_file_reader back
      
      * fix code style
      test=develop
      
      * fix the gtest.cmake on windows
      
      * fix cc_test on windows
      
      * fix the win build
      test=develop
      
      * remove fused compile support on windows
      test=develop
      
      * add the jit support
      test=develop
      
      * add the jit support, test=develop
      
      * add the jit support, test=develop
      
      * add the jit back
      fix compile error on windows
      
      * rollback test=develop
      
      * test case fix
      
      * disable DSO by default on windows
      
      * exclude warpctc_op on windows
      
      * exclude the dynload_warpctc out on windows
      test=develop
      
      * fix the scripts error
      test=develop
      
      * disable avx on windows by default
      test=develop
      
      * re-organize the cmake file
      
      * disable mkl on windows by default
      
      * add warp_ctc back
      
      * fix the dependency
      
      * fix the dependency
      
      * fix the build issue on windows
      
      * remove unsupported flag on windows
      
      * code style
      
      * code style
      test=develop
      
      * fix issue
      
      * add profiler, parallel_executor back
      
      * clean up the pre-definitions on windows
      
      * fix build issue
      
      * test=develop
      d9a1f3e5
  15. 19 11月, 2018 1 次提交
    • Y
      Optimize the layer_norm operator with AVX intrinsic function (#14417) · f4c869d8
      Yihua Xu 提交于
      * Optimize layer_norm operator with AVX intrinsic functions
      
      * Revert the wrong modifications
      
      * Implement the jit kernel for layer_norm operator
      
      * Add math headfile to fix the compile issue (test=develop)
      
      * Add math headfile to fix the compile issue (test=develop)
      
      * Fixed the intrinsic headfile issue (test=develop)
      
      * Fix the conflicts (test=develop)
      
      * Revert for CUDA compiler (test=develop)
      
      * Fixed the cuda depency (test=develop)
      
      * Fix the marco issues (test=develop)
      f4c869d8
  16. 18 11月, 2018 1 次提交
  17. 17 11月, 2018 1 次提交
  18. 16 11月, 2018 1 次提交
    • W
      Make nce support more distribution. (#13549) · 17226782
      whs 提交于
      * Fix truncated normal.
      
      * Fix.
      
      * Make nce support more distribution.
      
      * Fix API.spec.
      
      * Fix python API.
      
      * Fix.
      test=develop
      
      * Fix API.spec
      test=develop
      
      * Fix sampler.
      
      * Fix order of arguments in python API.
      test=develop
      17226782
  19. 08 11月, 2018 3 次提交
  20. 06 11月, 2018 1 次提交
  21. 05 11月, 2018 1 次提交
  22. 01 11月, 2018 3 次提交
  23. 31 10月, 2018 1 次提交
  24. 30 10月, 2018 1 次提交
  25. 26 10月, 2018 1 次提交
  26. 24 10月, 2018 1 次提交
  27. 23 10月, 2018 1 次提交
    • C
      Refine Split op (#13967) · a7497653
      chengduo 提交于
      * speedup split_op
      test=develop
      
      * speedup split_op
      test=develop
      
      * rename ConcatGrad to Split
      
      * refine concat and split
      test=develop
      
      * fix compile error
      a7497653
  28. 22 10月, 2018 1 次提交
  29. 18 10月, 2018 1 次提交
  30. 17 10月, 2018 1 次提交
  31. 11 10月, 2018 2 次提交
  32. 09 10月, 2018 2 次提交
  33. 29 9月, 2018 2 次提交