1. 21 Nov, 2022 4 commits
    • S
      [PHI] Migrate mul_grad kernel (#48061) · 55f6fb3d
      Committed by Sławomir Siwek
      * cleanup unused code
      
      * unify is_int8 is_bfloat16
      
      * Simplify matmul_v2 FWD kernel
      
      * remove RunKernel methods
      
      * remove import namespace
      
      * remove headers
      
      * clean fluid/phi cross imports
      
      * remove fluid axpy_handler
      
      * delete fluid methods
      
      * activations
      
      * OneDNNMemDesc
      
      * MKLDNNFormatForSize
      
      * MatchShapeToLayout
      
      * MKLDNNMemoryFormat
      
      * MKLDNNFormat
      
      * ReorderMKLDNNHandler
      
      * to_void_cast
      
      * review suggestions
      
      * interpolate
      
      * remove fluid dependency
      
      * init
      
      * ExecuteMatMulV2
      
      * rm fluid kernel
      
      * matmul_grad
      
      * remove mutable_data
      
      * mul_grad
    • L
      mma qk tensor_core (#48087) · d79eda71
      Committed by lzy
      * use mma for QK dot computing in fused_multi_transformer.
      * Update fused_multi_transformer_op.cu.h
    • H
      [PHI decoupling] move cross_entropy from fluid to phi (#48160) · 3501ff7d
      Committed by huangjiyi
      * move cross_entropy from fluid to phi
      
      * replace mutable_data with Alloc
      
      * use .template
    • W
      Unify `ProcessGroupNCCL` APIs underlying implementation (#48163) · 88410225
      Committed by Wen Sun
      * refactor: replace Collective & PointToPoint with NCCLEnv
      
      * refactor: rename to RunFnInNCCLEnv
      
      * refactor: pass std::function by value
  2. 18 Nov, 2022 7 commits
  3. 17 Nov, 2022 6 commits
  4. 15 Nov, 2022 4 commits
  5. 14 Nov, 2022 3 commits
  6. 11 Nov, 2022 3 commits
  7. 10 Nov, 2022 5 commits
  8. 09 Nov, 2022 7 commits
  9. 08 Nov, 2022 1 commit
    • P
      Split quant (#47449) · 130db92a
      Committed by Paulina Gacek
      * Split kernel registered, tests for uint/int added
      
      * Split quantized
      
      * Split output scales calculated only once
      
      * NearestInterp test fix reversed
      
      * DequantizeOutputs corrected