1. 18 Nov, 2021 1 commit
    • Z
      Opt topk (#37256) · c4862d99
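      zhangkaihuo committed
      topk has two implementations: one based on cub and one hand-written kernel. The cub version obtains the top k by sorting, and measurements over multiple data sets show that it outperforms the hand-written kernel only when input_width >= 128 and k exceeds 75% of input_width.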
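The selection rule described in this commit message can be sketched as follows. This is only an illustration under stated assumptions, not the Paddle implementation: the function names `use_sort_based_topk` and `topk` are hypothetical, "exceeds 75%" is read as a strict inequality, and a Python sort and `heapq.nlargest` stand in for the cub path and the hand-written kernel.

```python
import heapq


def use_sort_based_topk(input_width: int, k: int) -> bool:
    """Hypothetical predicate mirroring the rule in the commit message.

    Assumption: "k exceeds 75% of input_width" is a strict inequality.
    Integer arithmetic (4 * k > 3 * input_width) avoids float rounding.
    """
    return input_width >= 128 and 4 * k > 3 * input_width


def topk(values, k):
    """Return the k largest values in descending order.

    The two branches are stand-ins: a full sort plays the role of the
    cub path, and a heap-based selection plays the role of the
    hand-written kernel.
    """
    if use_sort_based_topk(len(values), k):
        return sorted(values, reverse=True)[:k]
    return heapq.nlargest(k, values)
```

Both branches return the same result; only the strategy differs. A full sort does work that is independent of k, which is one plausible reason it only pays off once k is a large fraction of a wide input.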
  2. 17 Nov, 2021 20 commits
  3. 16 Nov, 2021 16 commits
  4. 15 Nov, 2021 3 commits
    • C
      [Pten] Refactor the implementation of custom operator (#37122) · 1e598f1a
      Chen Weihang committed
      * move extension into pten [no-verify]
      
      * append tensor methods by ext_tensor [no-verify]
      
      * append other tensor methods [no-verify]
      
      * ext related files tidy [no-verify]
      
      * include relation tidy [no-verify]
      
      * add pten tensor test [no-verify]
      
      * replace tensor in custom op & compile success
      
      * refine tensor constructor for unittest
      
      * custom relu jit run success
      
      * fix all custom op unittests
      
      * add inference cmake adapt [no-verify]
      
      * fix failed unittests
      
      * fix windows failed unittests
      
      * try to fix kunlun and inference failed
      
      * fix test_elementwise_api error
      
      * try to fix win compile failed
      
      * fix kunlun fp16 type error
      
      * remove useless handle error macro
      
      * add custom linear op test
      
      * fix compile failed & add win symbols
      
      * fix non pten kernel cast failed
      
      * add dll decl for api
      
      * polish several details
      
      * polish details by review comment
      
      * add dll_decl for register
    • L
      [new-exec] fix stream analysis (#37161) · 584b4b24
      Leo Chen committed
      * fix record_event
      
      * refine class Instruction
      
      * refine Instruction and InterpreterCore
      
      * make instruction and operator_base consistent
      
      * support NoNeedBufferVar in stream_analyzer
      
      * fix place of event
      
      * add vlog before continue
    • C
      remove needless declare (#37195) · 9c591703
      Chen Weihang committed