1. 24 Jun, 2021 1 commit
  2. 27 Nov, 2020 1 commit
    • detect tensorRT plugin fp16 in runtime (#27933) · b9e76a01
      Shang Zhizhou committed
      * remove -DSUPPORTS_CUDA_FP16 in cuda.cmake
      
      * compile with cuda9
      
      * add some unittest
      
      * notest;test=coverage
      
      * add unittest for trt plugin swish && split
      
      * update ernie unittest
      
      * fix some error message
      
      * remove repeated judgement of CUDA version in mbEltwiseLayerNormOpConverter
      
      * fix compile error when CUDA_ARCH_NAME < Pascal
      
      * fix compile error
      
      * update unittest timeout
      
      * compile with cuda9
      
      * update error msg
      
      * fix code style
      
      * add some comments
      
      * add define IF_CUDA_ARCH_SUPPORT_FP16
      
      * rename IF_CUDA_ARCH_SUPPORT_FP16 to CUDA_ARCH_FP16_SUPPORTED
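      The commit above replaces a compile-time `-DSUPPORTS_CUDA_FP16` flag with a `CUDA_ARCH_FP16_SUPPORTED` check and moves fp16 detection to run time. A minimal sketch of that idea follows. It is not taken from the patch: the threshold of 600 (Pascal, suggested by the "CUDA_ARCH_NAME < Pascal" note), the macro's argument form, and the helpers `ToCudaArch` and `DeviceSupportsFp16` are all assumptions made for illustration.

      ```cpp
      // Assumed form of the macro named in the commit message: true when the
      // architecture value (in __CUDA_ARCH__ encoding) is Pascal or newer.
      // The 600 threshold is an assumption, not confirmed by the source.
      #define CUDA_ARCH_FP16_SUPPORTED(CUDA_ARCH) ((CUDA_ARCH) >= 600)

      // Hypothetical helper: converts a compute capability (major, minor)
      // into the __CUDA_ARCH__ encoding, e.g. (6, 1) -> 610.
      constexpr int ToCudaArch(int major, int minor) {
        return major * 100 + minor * 10;
      }

      // Hypothetical run-time check: a plugin would query the device's compute
      // capability once and use this result to choose the fp16 or fp32 path.
      constexpr bool DeviceSupportsFp16(int major, int minor) {
        return CUDA_ARCH_FP16_SUPPORTED(ToCudaArch(major, minor));
      }

      static_assert(DeviceSupportsFp16(7, 5), "sm_75 is newer than Pascal");
      static_assert(!DeviceSupportsFp16(5, 2), "sm_52 is older than Pascal");
      ```

      In a real plugin the major and minor values would come from the CUDA runtime's device properties; they are passed in explicitly here so the sketch compiles without the CUDA toolkit.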
  3. 15 Sep, 2020 1 commit
    • Optimize slice trt plugin (#26970) · 47fdc60e
      Shang Zhizhou committed
      * optimize slice TRT plugin
      
      This patch removes an unnecessary barrier on the data transfer of the needed
      offsets, so the data transfer can overlap with GPU kernel execution.
      
      This patch also fixes the incorrect name of the slice plugin, replacing
      "layernorm" with "slice".
      
      test=develop
      
      * add serialize/deserialize to slice plugin
      
      * add static shape slice trt plugin
      
      * fix slice trt op converter dynamic shape bug
      
      * fix format by clang-format
      
      * fix pylint format error
      
      * fix problems commented by peiyang
      Co-authored-by: NRyan Jeng <rjeng@nvidia.com>
  4. 15 Jun, 2020 1 commit
  5. 23 Apr, 2020 1 commit
  6. 19 Apr, 2020 1 commit