1. 05 Dec, 2022 1 commit
    • Reverse roll fuse (#46914) · feb68dd1
      Committed by Wang Bojun
      * pass
      
      * pass
      
      * draft version
      
      * share mem opt
      
      * remove sharemem
      
      * add pattern for the case with circle_shift=0
      
      * add UT
      
      * pass opt
      
      * test_fix
      
      * code-commit
      
      * code-style
      
      * code style
      
      * code-style
      
      * ut-fix
      
      * op teller refine
      
      * resolve conflict
      
      * adjust position op_teller list and pass order for swin
      
      * ut code style update
      
      * adjust paddle pass order
      
      * refine pass order
      
      * refine pass order
      
      * refine pass order
  2. 01 Dec, 2022 1 commit
  3. 10 Nov, 2022 2 commits
  4. 09 Nov, 2022 1 commit
  5. 26 Oct, 2022 1 commit
  6. 18 Oct, 2022 1 commit
    • Merge layernorm trt fuse (#46320) · 5e9f491e
      Committed by Wang Bojun
      * first version, accuracy corrected
      
      * disable debug print
      
      * use blockReduceSum in phi
      
      * add UT
      
      * add opCompat
      
      * code style
      
      * code refine
      
      * bug fix
      
      * code refine
      
      * test fix
      
      * bugfix
      
      * code style fix
      
      * code style
      
      * code-style
      
      * code-style
      
      * code-style
  7. 08 Oct, 2022 1 commit
  8. 27 Sep, 2022 1 commit
  9. 15 Sep, 2022 1 commit
  10. 07 Sep, 2022 1 commit
    • Layernorm shift partition (#45736) · 960109af
      Committed by wenbin
      * first commit
      
      * convert done
      
      * correct format
      
      * layernorm_shift_partition
      
      * correct convert
      
      * redefine plugin
      
      * runnable
      
      * bug fix
      
      * modify ShiftPartitionPattern
      
      * correct
      
      * add UT
      
      * modify ut
      
      * compile
      
      * modify enforce
      
      * modify UT
  11. 19 Aug, 2022 1 commit
  12. 15 Jul, 2022 1 commit
  13. 20 Jun, 2022 1 commit
  14. 09 Jun, 2022 1 commit
  15. 04 Jun, 2022 1 commit
  16. 02 Jun, 2022 1 commit
  17. 17 May, 2022 1 commit
  18. 24 Nov, 2021 1 commit
  19. 27 Oct, 2021 1 commit
  20. 26 Oct, 2021 1 commit
  21. 11 Oct, 2021 1 commit
    • add mish trt plugin (#34123) · 2b7b752a
      Committed by wangxinxin08
      * add mish trt plugin, compile & install success, run error. test=develop
      * modify code according to review
      * add TRT_NOEXCEPT for mish trt plugin
      * add unittest for mish trt plugin
      * remove unnecessary check of mish in op_teller.cc
      * fix some problem of trt8
      * add check and modify unittest while converting mish to trt plugin
      Co-authored-by: dengkaipeng <dengkaipeng@baidu.com>
  22. 12 Jul, 2021 1 commit
    • [Paddle-TRT] IPluginExt -> IPluginV2 (#33680) · 394f92aa
      Committed by zlsh80826
      * add trt LT version helper
      
      * upgrade PluginTensorRT to IPluginV2Ext
      
      * trt plugin factory is not usable in IPluginV2
      
      * upgrade add plugin api to use IPluginV2
      
      * remove IPlugin register and adapt getSerializeSize(), serialize()
      
      * adapt IPluginV2Layer
      
      * downgrade to IPluginV2
      
      * implement elementwise clone
      
      * add gelu plugin creator and fix gelu serialization bug
      
      * add swish plugin creator and fix swish serialization bug
      
      * format
      
      * fix typo
      
      * add elementwise plugin creator and fix serialization
      
      * add base creator class
      
      * add gelu plugin creator
      
      * add hard swish creator and fix serialization
      
      * add instance norm creator and fix serialization
      
      * add layer norm creator and fix serialization
      
      * add pool creator and fix serialization
      
      * add prelu creator and fix serialization
      
      * add slice creator and fix serialization
      
      * add swish creator and fix serialization
      
      * add instance norm op unittest
      
      * remove redundant api
      
      * fix wrong graph size to enable trt
      
      * instance norm function move to cc
      
      * add trt elementwise ut to trigger coverage
      
      * remove opt cache to hit serialization coverage
      
      * remove opt cache to hit serialization coverage
      
      * remove unused code
      
      * remove unused inputs_
      
      * add dbg info
      
      * remove dbg info
      
      * add instance norm serialization
      
      * roll back
      
      * remove comment code
      
      * remove trt plugin registry
      
      * fix prelu dynamic serialization
      
      * add prelu ut and reduce the input size to reduce memory usage
      
      * fix pool dynamic plugin serialization and add ut
      
      * refine pool ut with subtest
      
      * add env for avoiding oom
      
      * reduce test input size & increase pool op ut to 45s
      
      * add the contributor
      
      * remove copyright (will add in contributor)
      
      * remove copyright (will add in contributor)
  23. 05 Jun, 2021 1 commit
  24. 01 Apr, 2021 1 commit
    • [Paddle-TRT] add anchor generator op plugin (#31730) · b807e408
      Committed by zlsh80826
      * add anchor generator op plugin
      
      * add anchor generator unit_test
      
      * remove dbg info
      
      * remove redundant line
      
      * replace assertion with paddle enforce
      
      * dynamic plugin replaces assertion with paddle enforce
      
      * anchor generator support dynamic shape on spatial axis
      
      * anchor generator test with fp16, dynamic shape
      
      * add anchor generator test all
      
      * add back main
      
      * reduce test input size to not exceed the timelimit of ci
      
      * change super to InferencePassTest for python2 compatibility
      
      * reuse paddle operator anchor generator
      
      * move creator construct to header with default
      
      * add cuda ifdef
      
      * reduce line
      
      * change super to InferencePassTest for python2 compatibility
      
      * fix anchor generator fp16 serialize setting
      
      * split unittest from test_all
      
      * restrict anchor generator input format before version 7234
      
      * anchor generator only support greater than trt7.1
      
      * change min_graph_size to 2
      
      * min_graph size to 3 if dynamic shape
      
      * reduce dynamic shape size to avoid trt search tactic too long to exceed time limit
      
      * remove anchor from fetch list
      
      * anchor generator support all trt version
      
      * fix memory not allocated but if serialized
  25. 30 Mar, 2021 1 commit
  26. 29 Mar, 2021 1 commit
    • [Paddle-TRT] roi_align_plugin (#31732) · e3a38d79
      Committed by zlsh80826
      * add roi_align_plugin
      
      * add roi align unit_test
      
      * add roi align serialization
      
      * remove roi align static plugin because of batch dim issue
      
      * refine roi align unittest and add fp16/serialization
      
      * add trt roi align condition to op_teller
      
      * refine error message
      
      * remove unnecessary reshape layer
  27. 23 Mar, 2021 1 commit
  28. 03 Nov, 2020 1 commit
    • Optimize ernie model inference performance in TensorRT, support variable-length input (#28367) · ea851796
      Committed by Shang Zhizhou
      * fp16 result ok
      
      * change -DWITH_NVINFER_PLUGIN to config.EnableTensorRtOSS
      
      * auto detect special slice op converter for ernie with trt oss
      
      * ernie oss only support fp16
      
      * fix special_slice_plugin serialize bug
      
      * matmul in tensorrt ok
      
      * ernie unittest ok
      
      * add matmul tensorrt unittest
      
      * remove demo code
  29. 01 Sep, 2020 1 commit
    • [Paddle-TRT] Stack op plugin (#25605) · ad6e3dd6
      Committed by zlsh80826
      * add stack_op to CMakeLists
      
      * add dim=3 support for scale op
      
      * add trt stack op, test=develop
      
      * remove debug message
      
      * add stack plugin serialize
      
      * remove slice, scale op, will add later
      
      * enhance error message
      
      * revise trt ernie test to cover the stack op CI test, test=develop
      
      * add stack op serialization
      
      * fix test shape after adding stack op
      
      * remove slice op, will add after implementing serialization
      
      * roll back to min_graph=5 to avoid using slice op
      
      * fix scale op output layer
      
      * implement stack op createPlugin
      
      * use workspace and move the definition to .cu
      
      * move stack plugin creator definition to .cu, test=develop
  30. 19 Apr, 2020 1 commit
  31. 14 Apr, 2020 1 commit
  32. 08 Apr, 2020 2 commits
  33. 26 Mar, 2020 1 commit
    • [Paddle-TRT]: Ernie Dynamic shape support. (#23138) · 430b0099
      Committed by Zhaolong Xing
      * add dynamic plugin support.
      test=develop
      
      * change emb eltwise layernorm to math function
      test=develop
      
      * add emb eltwise layernorm
      test=develop
      
      * can run dynamic shape ernie
      test=develop
      
      * fix ci
      test=develop
      
      * add ut for trt ernie dynamic
      
      test=develop
      
      * refine dynamic shape c++ interface.
      test=develop
      
      * fix comments
      test=develop
      
      * fix comments
      test=develop
  34. 07 Jan, 2020 1 commit
  35. 06 Jan, 2020 1 commit
  36. 23 Oct, 2019 1 commit
  37. 24 Jul, 2019 1 commit
    • Update trt5 for paddle-trt (#18645) · 26ae6d49
      Committed by Zhaolong Xing
      * update paddle-trt for:
          1. fix bug: when batch > 2, core in split plugin.
          2. add leaky_relu trt5.0 support (yolov3 from 65ms to 42ms.)
          3. add new attr to dropout.
          4. shuffle channel, swish, relu6 support
          test=develop
      
      * 1. fix ci
      test=develop
  38. 08 Mar, 2019 1 commit
    • 5. add static trt load model · f3d164fa
      Committed by nhzlx
      1). add static trt load model
      2). fix bug: when device_id is not 0, the trt will have a bug
      test=develop