  1. 13 Jul, 2021 2 commits
  2. 12 Jul, 2021 8 commits
    • [NPU] add npu kernel for gaussian random (#33983) · 9cda0596
      houj04 committed
      * add npu operator for gaussian random.
      
      * bugfix: add wait after memory copy.
      
      * update gaussian random op: use TensorCopy.
    • [Paddle-TRT] IPluginExt -> IPluginV2 (#33680) · 394f92aa
      zlsh80826 committed
      * add trt LT version helper
      
      * upgrade PluginTensorRT to IPluginV2Ext
      
      * trt plugin factory is not usable in IPluginV2
      
      * upgrade add plugin api to use IPluginV2
      
      * remove IPlugin register and adapt getSerializeSize(), serialize()
      
      * adapt IPluginV2Layer
      
      * downgrade to IPluginV2
      
      * implement elementwise clone
      
      * add gelu plugin creator and fix gelu serialization bug
      
      * add swish plugin creator and fix swish serialization bug
      
      * format
      
      * fix typo
      
      * add elementwise plugin creator and fix serialization
      
      * add base creator class
      
      * add gelu plugin creator
      
      * add hard swish creator and fix serialization
      
      * add instance norm creator and fix serialization
      
      * add layer norm creator and fix serialization
      
      * add pool creator and fix serialization
      
      * add prelu creator and fix serialization
      
      * add slice creator and fix serialization
      
      * add swish creator and fix serialization
      
      * add instance norm op unittest
      
      * remove redundant api
      
      * fix wrong graph size to enable trt
      
      * instance norm function move to cc
      
      * add trt elementwise ut to trigger coverage
      
      * remove opt cache to hit serialization coverage
      
      * remove opt cache to hit serialization coverage
      
      * remove unused code
      
      * remove unused inputs_
      
      * add dbg info
      
      * remove dbg info
      
      * add instance norm serialization
      
      * roll back
      
      * remove comment code
      
      * remove trt plugin registry
      
      * fix prelu dynamic serialization
      
      * add prelu ut and reduce the input size to reduce memory usage
      
      * fix pool dynamic plugin serialization and add ut
      
      * refine pool ut with subtest
      
      * add env for avoiding oom
      
      * reduce test input size & increase pool op ut to 45s
      
      * add the contributor
      
      * remove copyright (will add in contributor)
      
      * remove copyright (will add in contributor)
    • 0b20b76e
    • [NPU] add dropout npu op (#34081) · c4e04986
      pangyoki committed
      * add dropout npu op
      
      * fix bugs
      
      * add unittest
      
      * fix bugs
      
      * support 1-D input
    • [NPU] change ScatterAdd to EmbeddingDenseGrad in lookup_table NPU op (#33866) · 4d842050
      pangyoki committed
      * change ScatterAdd to EmbeddingDenseGrad in lookup_table NPU op
      
      * EmbeddingDenseGrad only supports dim 32
      
      * fix shape error
    • [NPU] slice support Tensor Input (#34067) · 871edade
      pangyoki committed
    • tem_fix_reshape_unitest (#34069) · 113539eb
      Wangzheee committed
    • softmax mask fuse upper triangle (#33981) · e2e1c57b
      Yuang Liu committed
      * softmax mask fuse upper triangle
      
      * cover not implemented cpu code
  3. 09 Jul, 2021 2 commits
  4. 08 Jul, 2021 3 commits
  5. 07 Jul, 2021 4 commits
  6. 06 Jul, 2021 3 commits
    • Add gpu implementation of shuffle_batch_op (#33938) · c6b6ba1f
      Zeng Jinle committed
      * add gpu implementation of shuffle batch
      test=develop
      
      * add thrust cuda patches
      test=develop
      
      * fix macro guard
      
      * fix shuffle batch compile on windows/hip
      
      * fix hip compilation error
      
      * refine CMakeLists.txt
      
      * fix windows compile error
      
      * try to fix windows CI compilation error
      
      * fix windows compilation again
      
      * fix shuffle_batch op test on Windows
    • Enhance error message for interpolate_v2 (#33941) · f2068eec
      xiaoting committed
      * fix interpolate for shape[i]=0, test=develop
      
      * fix test_trilinear_interp_v2 random failure, test=develop
    • 【HETERPS】pipeline adaptive for heterps (#33159) · bfef7feb
      danleifeng committed
      * pipeline adaptive for heterps;test=develop
      * fix finalize hang;test=develop
      * add is_compiled_with_heterps for dataset;test=develop
      * fix hashtable core when pass ins_num=0;test=develop
  7. 05 Jul, 2021 5 commits
  8. 04 Jul, 2021 1 commit
  9. 02 Jul, 2021 1 commit
  10. 01 Jul, 2021 4 commits
  11. 30 Jun, 2021 2 commits
    • Added matmul_v2 BF16/FP32 FWD kernel (#33750) · 24783c84
      jakpiase committed
      * added matmul_v2 bf16/fp32 FWD kernel
      
      added matmul_v2 bf16/fp32 FWD kernel
      
      * added formatting
      
      * removed some tests due to timeout in CI
      
      * refactored tests
      
      * merged tests classes into one file
      
      * minor change
      
      * removed test guard for CUDA
      
      * remove skipIf
      
      * changes after review
      
      * formatted one file
      
      * minor change
      
      * added skipping UT in CUDA place
    • [NPU] support set_device (#33815) · 8225a6a1
      houj04 committed
      * support set_device for NPU.
      
      * minor update doc and add more unit test.
  12. 28 Jun, 2021 3 commits
  13. 25 Jun, 2021 1 commit
  14. 24 Jun, 2021 1 commit
    • [NPU] support dygraph execution on npu place (#33579) · 6aea6be2
      houj04 committed
      * in NPU environment, use CPUPlace for missing operators.
      
      * in NPU environment, use CPUPlace for missing operators.
      
      * fix TensorCopy bug and add unit test.
      
      * fix code style.
      
      * add more unit tests.