1. 16 Jul, 2021 3 commits
  2. 15 Jul, 2021 9 commits
  3. 14 Jul, 2021 7 commits
  4. 13 Jul, 2021 11 commits
  5. 12 Jul, 2021 10 commits
    • W
      [hybrid performance] Optimize pipeline send wait (#34086) · 5f65ff91
      WangXi committed
      5f65ff91
    • H
      [NPU] add npu kernel for gaussian random (#33983) · 9cda0596
      houj04 committed
      * add npu operator for gaussian random.
      
      * bugfix: add wait after memory copy.
      
      * update gaussian random op: use TensorCopy.
      9cda0596
    • Z
      [Paddle-TRT] IPluginExt -> IPluginV2 (#33680) · 394f92aa
      zlsh80826 committed
      * add trt LT version helper
      
      * upgrade PluginTensorRT to IPluginV2Ext
      
      * trt plugin factory is not usable in IPluginV2
      
      * upgrade add plugin api to use IPluginV2
      
      * remove IPlugin register and adapt getSerializeSize(), serialize()
      
      * adapt IPluginV2Layer
      
      * downgrade to IPluginV2
      
      * implement elementwise clone
      
      * add gelu plugin creator and fix gelu serialization bug
      
      * add swish plugin creator and fix swish serialization bug
      
      * format
      
      * fix typo
      
      * add elementwise plugin creator and fix serialization
      
      * add base creator class
      
      * add gelu plugin creator
      
      * add hard swish creator and fix serialization
      
      * add instance norm creator and fix serialization
      
      * add layer norm creator and fix serialization
      
      * add pool creator and fix serialization
      
      * add prelu creator and fix serialization
      
      * add slice creator and fix serialization
      
      * add swish creator and fix serialization
      
      * add instance norm op unittest
      
      * remove redundant api
      
      * fix wrong graph size to enable trt
      
      * instance norm function move to cc
      
      * add trt elementwise ut to trigger coverage
      
      * remove opt cache to hit serialization coverage
      
      * remove opt cache to hit serialization coverage
      
      * remove unused code
      
      * remove unused inputs_
      
      * add dbg info
      
      * remove dbg info
      
      * add instance norm serialization
      
      * roll back
      
      * remove comment code
      
      * remove trt plugin registry
      
      * fix prelu dynamic serialization
      
      * add prelu ut and reduce the input size to reduce memory usage
      
      * fix pool dynamic plugin serialization and add ut
      
      * refine pool ut with subtest
      
      * add env for avoiding oom
      
      * reduce test input size & increase pool op ut to 45s
      
      * add the contributor
      
      * remove copyright (will add in contributor)
      
      * remove copyright (will add in contributor)
      394f92aa
    • Q
      0b20b76e
    • Z
      2dde0eb0
    • P
      [NPU] add dropout npu op (#34081) · c4e04986
      pangyoki committed
      * add dropout npu op
      
      * fix bugs
      
      * add unittest
      
      * fix bugs
      
      * support 1-D input
      c4e04986
    • P
      [NPU] change ScatterAdd to EmbeddingDenseGrad in lookup_table NPU op (#33866) · 4d842050
      pangyoki committed
      * change ScatterAdd to EmbeddingDenseGrad in lookup_table NPU op
      
      * EmbeddingDenseGrad only supports dim 32
      
      * fix shape error
      4d842050
    • P
      [NPU] slice support Tensor Input (#34067) · 871edade
      pangyoki committed
      871edade
    • Y
      softmax mask fuse upper triangle (#33981) · e2e1c57b
      Yuang Liu committed
      * softmax mask fuse upper triangle
      
      * cover not implemented cpu code
      e2e1c57b