1. 14 Jul, 2021 4 commits
  2. 13 Jul, 2021 8 commits
  3. 12 Jul, 2021 10 commits
    • W
      [hybrid performance] Optimize pipeline send wait (#34086) · 5f65ff91
      WangXi committed
      5f65ff91
    • H
      [NPU ]add npu kernel for gaussian random (#33983) · 9cda0596
      houj04 committed
      * add npu operator for gaussian random.
      
      * bugfix: add wait after memory copy.
      
      * update gaussian random op: use TensorCopy.
      9cda0596
    • Z
      [Paddle-TRT] IPluginExt -> IPluginV2 (#33680) · 394f92aa
      zlsh80826 committed
      * add trt LT version helper
      
      * upgrade PluginTensorRT to IPluginV2Ext
      
      * trt plugin factory is not usable in IPluginV2
      
      * upgrade add plugin api to use IPluginV2
      
      * remove IPlugin register and adapt getSerializeSize(), serialize()
      
      * adapt IPluginV2Layer
      
      * downgrade to IPluginV2
      
      * implement elementwise clone
      
      * add gelu plugin creator and fix gelu serialization bug
      
      * add swish plugin creator and fix swish serialization bug
      
      * format
      
      * fix typo
      
      * add elementwise plugin creator and fix serialization
      
      * add base creator class
      
      * add gelu plugin creator
      
      * add hard swish creator and fix serialization
      
      * add instance norm creator and fix serialization
      
      * add layer norm creator and fix serialization
      
      * add pool creator and fix serialization
      
      * add prelu creator and fix serialization
      
      * add slice creator and fix serialization
      
      * add swish creator and fix serialization
      
      * add instance norm op unittest
      
      * remove redundant api
      
      * fix wrong graph size to enable trt
      
      * instance norm function move to cc
      
      * add trt elementwise ut to trigger coverage
      
      * remove opt cache to hit serialization coverage
      
      * remove opt cache to hit serialization coverage
      
      * remove unused code
      
      * remove unused inputs_
      
      * add dbg info
      
      * remove dbg info
      
      * add instance norm serialization
      
      * roll back
      
      * remove comment code
      
      * remove trt plugin registry
      
      * fix prelu dynamic serialization
      
      * add prelu ut and reduce the input size to reduce memory usage
      
      * fix pool dynamic plugin serialization and add ut
      
      * refine pool ut with subtest
      
      * add env for avoiding oom
      
      * reduce test input size & increase pool op ut to 45s
      
      * add the contributor
      
      * remove copyright (will add in contributor)
      
      * remove copyright (will add in contributor)
      394f92aa
    • Q
      0b20b76e
    • P
      [NPU] add dropout npu op (#34081) · c4e04986
      pangyoki committed
      * add dropout npu op
      
      * fix bugs
      
      * add unittest
      
      * fix bugs
      
      * support 1-D input
      c4e04986
    • P
      [NPU] change ScatterAdd to EmbeddingDenseGrad in lookup_table NPU op (#33866) · 4d842050
      pangyoki committed
      * change ScatterAdd to EmbeddingDenseGrad in lookup_table NPU op
      
      * EmbeddingDenseGrad only supports dim 32
      
      * fix shape error
      4d842050
    • P
      [NPU] slice support Tensor Input (#34067) · 871edade
      pangyoki committed
      871edade
    • W
      tem_fix_reshape_unitest (#34069) · 113539eb
      Wangzheee committed
      113539eb
    • Y
      softmax mask fuse upper triangle (#33981) · e2e1c57b
      Yuang Liu committed
      * softmax mask fuse upper triangle
      
      * cover not implemented cpu code
      e2e1c57b
    • Z
      add paddle/linalg.py to add new linalg apis (#34033) · bfbea8fd
      zhiboniu committed
      bfbea8fd
  4. 09 Jul, 2021 6 commits
  5. 08 Jul, 2021 6 commits
  6. 07 Jul, 2021 4 commits
  7. 06 Jul, 2021 2 commits
    • T
      add so parser (#33969) · b1c458d0
      Thunderbrook committed
      * add delta score, scale show
      
      * so parser
      
      * windows
      
      * windows
      b1c458d0
    • Z
      Add gpu implementation of shuffle_batch_op (#33938) · c6b6ba1f
      Zeng Jinle committed
      * add gpu implementation of shuffle batch
      test=develop
      
      * add thrust cuda patches
      test=develop
      
      * fix macro guard
      
      * fix shuffle batch compile on windows/hip
      
      * fix hip compilation error
      
      * refine CMakeLists.txt
      
      * fix windows compile error
      
      * try to fix windows CI compilation error
      
      * fix windows compilation again
      
      * fix shuffle_batch op test on Windows
      c6b6ba1f