1. 20 Apr, 2023 7 commits
  2. 19 Apr, 2023 18 commits
  3. 18 Apr, 2023 11 commits
  4. 17 Apr, 2023 4 commits
    • [Paddle-Inference] Add cutlass conv2d_depthwise (#51792) · bd3b096a
      Committed by zhoutianzi666
      * initial commit for cutlass_teller
      
      * second commit for cutlass_teller
      
      * add conv2d_depthwise python template
      
      * add conv2d_depthwise cutlass template
      
      * /zhoukangkang/paddle_cutlass/Paddle/paddle/fluid/framework/ir/cutlass_teller.h
      
      * refine code in Conv2dFusionCanSupport
      
      * add macro in cutlass_teller.h
      
      * add 3x3 5x5 teller
      
      * add groups not 1 or conv2d_depthwise teller
      
      * only generate conv2d_depthwise kernels whose ic is a multiple of 8
      
      * add EXPLICIT in cutlass_teller.h
      
      * final commit
      
      * add split_k_slices in conv2d_depthwise
      
      * make stages == 2
      
      * refactor part of the code
      
      * add CutlassFusionType
      
      * solve illegal memory
      
      * make stride_h=stride_w && make dilation==1
      
      * must check HasAttr(use_cutlass) before GetAttrIfExists
      
      * add CONV2D_DEPTHWISE_BIAS_SILU to OpType2String
      
      * modify decl.h and util.cu
    • cherry-pick fleet executor from 2.4 (#52896) · bafe287a
      Committed by LiYuRio
      * cherry-pick fleet executor from 2.4
      
      * fix test case
    • Support static graph code-gen for matrix_rank (#52659) · a2aa0087
      Committed by Sanbu
    • Add unique counter for shared memory used in DataLoader (#52976) · b0911ecb
      Committed by sneaxiy
      * fix ipc counter
      
      * fix missing std::to_string