1. 19 Jul, 2021 8 commits
    • C
      Add Cuda event and stream API (#32460) · 9c7f6af5
      chentianyu03 committed
      * add cuda event and stream api
      
      * add cuda event and stream api
      
      * add get_current_stream api
      
      * add get_current_stream api
      
      * init streams
      
      * modify get_current_stream
      
      * modify get_cuttent_stream
      
      * add synchronize func
      
      * add current_stream doc and test file
      
      * move get_current_stream into CUDA macro
      
      * move CudaEvent into CUDA macro
      
      * move _get_current_stream and _device_synchronize into cuda macro
      
      * modify the macro of cuda stream and event
      
      * add test case for synchronize
      
      * add paddle.devices.cuda module
      
      * event and stream support hip
      
      * add doc for stream and event class
      
      * move cuda stream and event into single pybind
      
      * add cuda_streams_py.cc to cmakelist
      
      * add _device_synchronize and _get_current_stream to core module
      
      * add test case for cudastream and cudaevent
      
      * move __all__ in streams.py
      
      * fix test fail
      
      * add cuda to devices __all__
      
      * fix current_stream doc writing error
      
      * move devices to device direction, and merge device.py into __init__.py
      
      * add required:gpu to sample codes
      
      * remove cuda direction from device/__init__.py
    • J
      enabled bf16 tests in prelu (#34196) · 68f51239
      jakpiase committed
    • J
      Fix Bug of Used before Assignment (#34090) · d8839292
      Jiangxinz committed
      * fix used before assign
      
      * fix used before assign
    • R
      [NPU hybrid] Partial send /recv/ allgather for npu (#34189) · 0cd21fac
      Roc committed
    • J
      Fix function redefine (#34186) · bf700264
      Jiangxinz committed
      * fix func redef
      
      * fix-func-redef
    • J
      Fix function redefine (#34185) · a4ded243
      Jiangxinz committed
      * fix func redef
      
      * fix func redef
      
      * fix func redef
    • J
      Fix func redef3 (#34184) · 8b8124b5
      Jiangxinz committed
      * fix func redef
      
      * fix func redef
      
      * fix func redef
    • W
      [Inference] Add config.Summary api (#34122) · 831c1c6c
      Wilber committed
  2. 16 Jul, 2021 5 commits
  3. 15 Jul, 2021 4 commits
  4. 14 Jul, 2021 10 commits
  5. 13 Jul, 2021 8 commits
  6. 12 Jul, 2021 5 commits
    • W
      [hybrid performance] Optimize pipeline send wait (#34086) · 5f65ff91
      WangXi committed
    • H
      [NPU ]add npu kernel for gaussian random (#33983) · 9cda0596
      houj04 committed
      * add npu operator for gaussian random.
      
      * bugfix: add wait after memory copy.
      
      * update gaussian random op: use TensorCopy.
    • Z
      [Paddle-TRT] IPluginExt -> IPluginV2 (#33680) · 394f92aa
      zlsh80826 committed
      * add trt LT version helper
      
      * upgrade PluginTensorRT to IPluginV2Ext
      
      * trt plugin factory is not usable in IPluginV2
      
      * upgrade add plugin api to use IPluginV2
      
      * remove IPlugin register and adapt getSerializeSize(), serialize()
      
      * adapt IPluginV2Layer
      
      * downgrade to IPluginV2
      
      * implement elementwise clone
      
      * add gelu plugin creator and fix gelu serialization bug
      
      * add swish plugin creator and fix swish serialization bug
      
      * format
      
      * fix typo
      
      * add elementwise plugin creator and fix serialization
      
      * add base creator class
      
      * add gelu plugin creator
      
      * add hard swish creator and fix serialization
      
      * add instance norm creator and fix serialization
      
      * add layer norm creator and fix serialization
      
      * add pool creator and fix serialization
      
      * add prelu creator and fix serialization
      
      * add slice creator and fix serialization
      
      * add swish creator and fix serialization
      
      * add instance norm op unittest
      
      * remove redundent api
      
      * fix wrong graph size to enable trt
      
      * instance norm function move to cc
      
      * add trt elementwise ut to trigger coverage
      
      * remove opt cahce to hit serialization coverage
      
      * remove opt cahce to hit serialization coverage
      
      * remove unused code
      
      * remove unused inputs_
      
      * add dbg info
      
      * remove dbg info
      
      * add instance norm serialization
      
      * roll back
      
      * remove comment code
      
      * remove trt plugin registery
      
      * fix prelu dynamic serialization
      
      * add prelu ut and reduce the input size to reduce memory usage
      
      * fix pool dynamic plugin serialization and add ut
      
      * refine pool ut with subtest
      
      * add env for avoiding oom
      
      * reduce test input size & increase pool op ut to 45s
      
      * add the contributor
      
      * remove copyright (will add in contributor)
      
      * remove copyright (will add in contributor)
    • Q
      0b20b76e
    • P
      [NPU] add dropout npu op (#34081) · c4e04986
      pangyoki committed
      * add dropout npu op
      
      * fix bugs
      
      * add unittest
      
      * fix bugs
      
      * support 1-D input