1. 11 Jan, 2022 1 commit
    • [cherry-pick]mish trt plugin (#38866) · 4cd8a78a
      wangxinxin08 committed
      * add mish trt plugin, compile & install success, run error. test=develop
      
      * modify code of mish plugin
      
      * upgrade mish trt plugin
      
      * modify code according to review
      
      * add TRT_NOEXCEPT for mish trt plugin
      
      * add unittest for mish trt plugin
      
      * remove unnecessary check of mish in op_teller.cc
      
      * fix some problems with trt8
      
      * add check and modify unittest while converting mish to trt plugin
      Co-authored-by: dengkaipeng <dengkaipeng@baidu.com>
  2. 23 Apr, 2021 1 commit
    • move semantic checks to op_teller (#32279) · 7c38114f
      wenbin committed
      * move semantic checks to op_teller
      
      * more ops
      
      * more ops
      
      * revert block related change
      
      * part1
      
      * revert activation
      
      * remove if
      
      * remove const_cast
      
      * resolve conflict
      
      * remove const_cast
      
      * delete useless var
      
      * replace vlog(1) with vlog(3), replace assert with PADDLE_ENFORCE
      
      * down to 19 files
  3. 02 Apr, 2021 1 commit
  4. 04 Feb, 2021 1 commit
  5. 27 Nov, 2020 1 commit
    • detect tensorRT plugin fp16 in runtime (#27933) · b9e76a01
      Shang Zhizhou committed
      * remove -DSUPPORTS_CUDA_FP16 in cuda.cmake
      
      * compile with cuda9
      
      * add some unittest
      
      * notest;test=coverage
      
      * add unittest for trt plugin swish && split
      
      * update ernie unittest
      
      * fix some error message
      
      * remove repeated judgement of CUDA version in mbEltwiseLayerNormOpConverter
      
      * fix compile error when CUDA_ARCH_NAME < Pascal
      
      * fix compile error
      
      * update unittest timeout
      
      * compile with cuda9
      
      * update error msg
      
      * fix code style
      
      * add some comments
      
      * add define IF_CUDA_ARCH_SUPPORT_FP16
      
      * rename IF_CUDA_ARCH_SUPPORT_FP16 to CUDA_ARCH_FP16_SUPPORTED
  6. 24 Sep, 2020 1 commit
    • use iwyu clean include (#27267) · df43905f
      wanghuancoder committed
      * use iwyu clean include, test=develop, test=win
      
      * compilation error, test=develop
      
      * fix compilation error2, test=develop
      
      * fix compilation error3, test=develop
      
      * fix compilation error4, test=develop
      
      * fix compilation error5, test=develop
      
      * fix compilation error6, test=develop
      
      * fix compilation error7, test=develop
      
      * fix compilation error8, test=develop
      
      * fix compilation error8, test=develop
      
      * fix compilation error10, test=develop
      
      * fix compilation error11, test=develop
  7. 01 Apr, 2020 1 commit
  8. 06 Jan, 2020 1 commit
  9. 24 Jul, 2019 1 commit
    • Update trt5 for paddle-trt (#18645) · 26ae6d49
      Zhaolong Xing committed
      * update paddle-trt for:
          1. fix bug: core dump in split plugin when batch > 2.
          2. add leaky_relu trt5.0 support (yolov3 from 65ms to 42ms.)
          3. add new attr to dropout.
          4. shuffle channel, swish, relu6 support
          test=develop
      
      * 1. fix ci
      test=develop
  10. 25 May, 2019 1 commit
    • TRT: Support set dynamic range in int8 mode. (#17524) · 61221ebc
      Zhaolong Xing committed
      * align fluid int8 training and trt int8 prediction.
      trt int8 predict init
      op converter
      
      * 2. align fluid int8 train and trt int8 inference.
      enhance quant dequant fuse pass
      enhance op converter, trt engine, trt engine op, trt subgraph pass.
      
      * 3. add delete_quant_dequant_pass for trt
      
      test=develop
      
      * 4. add the missing file
      test=develop
      
      * 5. I modified the C++ interface but forgot to modify the pybind code
      fix the IS_TRT_VERSION_GE bug, and fix elementwise op converter
      test=develop
  11. 12 Nov, 2018 1 commit
  12. 08 Nov, 2018 1 commit
  13. 09 Aug, 2018 1 commit
  14. 25 Jul, 2018 1 commit
  15. 24 Jul, 2018 2 commits
  16. 07 Jun, 2018 2 commits
  17. 06 Jun, 2018 1 commit
  18. 01 Jun, 2018 1 commit
  19. 14 May, 2018 1 commit
  20. 03 May, 2018 1 commit
  21. 27 Apr, 2018 1 commit
  22. 25 Apr, 2018 2 commits
  23. 23 Apr, 2018 1 commit
  24. 26 Feb, 2018 2 commits
  25. 12 Feb, 2018 1 commit
  26. 10 Feb, 2018 2 commits
  27. 09 Jan, 2018 1 commit
    • Port WarpCTC Operator (#5107) · b5fda272
      Yiqun Liu committed
      * Add Seq2BatchFunctor, which will be used in WarpCTCOp.
      
      * Implement WarpCTCFunctor and WarpCTCKernel.
      
      * Add unittest of warpctc_op.
      
      * Modify the check_output interface in python unittest framework to allow checking a subset of outputs.
      
      * Use absolute offset lod in warpctc_op and related functors.
      
      * Refine the comments of warpctc_op.
      
      * The new python unittest supports checking a subset of the outputs, so revoke the previous change.
      
      * Rename the transform from LoDTensor to Tensor with shape [max_sequence_length, num_sequences, sequence_width] to PaddingSequenceFunctor.
      
      * Update to the newest codes.
      
      * Rename the PaddingSequenceFunctor to PaddingLoDTensorFunctor and remove the computation of dimensions out of the functors.
  28. 04 Aug, 2017 1 commit
  29. 11 Jul, 2017 1 commit