1. 18 Nov, 2021: 4 commits
    • [PTen]elementwise_sub kernel refactor (#37260) · 36a95654
      YuanRisheng committed
      * elementwise_add kernel refactor
      
      * fix compile bugs in elementwise_add refactor
      
      * fix compile bugs when run in npu/xpu
      
      * fix bugs when run unit test
      
      * fix bugs when run ci-windows
      
      * modify code as recommended
      
      * code format adjust
      
      * fix bugs when run ci
      
      * fix compile bug when run in ci-windows
      
      * elementwise_sub refactor
      
      * add PD_DLL_DECL for elementwise_sub
      
      * fix bugs when compiling
    • f85bd5c9
    • Add the `GetFetchNames` method in CinnGraphSymbolization. (#37218) · 3ad495e8
      Zhen Wang committed
      * Add the `GetFetchNames` method in CinnGraphSymbolization.
      
      * Use unordered_set instead of vector as the type of fetch_var_names.
      
      * Reuse the definition of kCompilationKey.
      
      * Use CompileOptions to set fetch_var_ids.
      
      * Update the argument passing of GraphCompiler.Build.
      
      * Fix some bugs in CinnGraphSymbolization::GetFetchIds.
    • Opt topk (#37256) · c4862d99
      zhangkaihuo committed
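      topk has two implementations: one based on cub and one hand-written kernel. The cub version obtains the top k by sorting. Benchmarks on multiple data sets showed that it outperforms the hand-written kernel only when input_width >= 128 and k exceeds 75% of input_width.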
  2. 17 Nov, 2021: 16 commits
  3. 16 Nov, 2021: 11 commits
  4. 15 Nov, 2021: 9 commits
    • [Pten] Refactor the implementation of custom operator (#37122) · 1e598f1a
      Chen Weihang committed
      * move extension into pten [no-verify]
      
      * append tensor methods by ext_tensor [no-verify]
      
      * append other tensor methods [no-verify]
      
      * ext related files tidy [no-verify]
      
      * include relation tidy [no-verify]
      
      * add pten tensor test [no-verify]
      
      * replace tensor in custom op & compile success
      
      * refine tensor constructor for unittest
      
      * custom relu jit run success
      
      * fix all custom op unittests
      
      * add inference cmake adapt [no-verify]
      
      * fix failed unittests
      
      * fix windows failed unittests
      
      * try to fix kunlun and inference failed
      
      * fix test_elementwise_api error
      
      * try to fix win compile failed
      
      * fix kunlun fp16 type error
      
      * remove useless handle error macro
      
      * add custom linear op test
      
      * fix compile failed & add win symbols
      
      * fix non pten kernel cast failed
      
      * add dll decl for api
      
      * polish several details
      
      * polish details by review comment
      
      * add dll_decl for register
    • [new-exec] fix stream analysis (#37161) · 584b4b24
      Leo Chen committed
      * fix record_event
      
      * refine class Instruction
      
      * refine Instruction and InterpreterCore
      
      * make instruction and operator_base consistent
      
      * support NoNeedBufferVar in stream_analyzer
      
      * fix place of event
      
      * add vlog before continue
    • remove needless declare (#37195) · 9c591703
      Chen Weihang committed
    • remove input dim check in op_teller and update ut (#37097) · 6b21bb0b
      baoachun committed
      * remove input dim check of activation in op_teller
      
      * remove input dim check of concat in op_teller
      
      * remove input dim check of clip in op_teller
      
      * remove input dim check of scale in op_teller
      
      * remove input dim check in op_teller
      
      * update attr check of slice in op_teller
    • fix ctest depent probs (#37203) · cf958f2f
      Yuang Liu committed
    • fix 3 bug of new_executor (#37142) · 8358d614
      wanghuancoder committed
      * fix 3 bug, test=develop
      
      * refine, test=develop
    • fix:delete macro INFERENCE (#37130) · b628c316
      feng_shuai committed
    • Added BF16 to mean op (#37104) · df7cc457
      arlesniak committed
      * Added BF16 to mean op
      
      * fix for CI
      
      * fix for CI
      
      * fix for CI
    • fix cinn_compile_test not pass problem (#37190) · 83eef6d2
      jiangcheng committed