1. 22 Nov, 2022 10 commits
  2. 21 Nov, 2022 16 commits
  3. 19 Nov, 2022 2 commits
  4. 18 Nov, 2022 12 commits
    • refine save hook (#48124) · 04709310
      wanghuancoder committed
    • Fused QKVBiasAdd and Transpose with Split Q, KV (#47680) · d595928e
      MarDino committed
      * fused qkvBiasAdd and transpose with split qkv
      
      * fix typo
      
      * fix format
      
      * fix name
      
      * add annotation
      
      * fix comment
    • [PHI] Migrate matmul_grad kernel (#48023) · 4ab18ada
      Sławomir Siwek committed
      * cleanup unused code
      
      * unify is_int8 is_bfloat16
      
      * Simplify matmul_v2 FWD kernel
      
      * remove RunKernel methods
      
      * remove import namespace
      
      * remove headers
      
      * clean fluid/phi cross imports
      
      * remove fluid axpy_handler
      
      * delete fluid methods
      
      * activations
      
      * OneDNNMemDesc
      
      * MKLDNNFormatForSize
      
      * MatchShapeToLayout
      
      * MKLDNNMemoryFormat
      
      * MKLDNNFormat
      
      * ReorderMKLDNNHandler
      
      * to_void_cast
      
      * review suggestions
      
      * interpolate
      
      * remove fluid dependency
      
      * init
      
      * ExecuteMatMulV2
      
      * rm fluid kernel
      
      * matmul_grad
      
      * remove mutable_data
    • [PHI] Migrate conv_transpose kernel (#48119) · 9aacb31b
      Zuza Gawrysiak committed
      * Migrate conv_transpose to phi
      
      * Move handler to kernel
      
      * kernel m
      
      * Fix formatting
      
      * handler
      
      * remove fluid
      
      * revert tcp_store
      
      * tcp_store
      
      * remove unused
      
      * Fix declaration
      
      * add dnn input
      
      * Fix typo
      Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com>
    • Fix bug of zero_allocator in HostAlloc (#48108) · 7f92e27e
      zyfncg committed
      * fix bug of zero_allocator in host
      
      * fix test compile bug
      
      * add unittest
      
      * update test
    • Optimize FusedBiasAddGelu Kernel (#47679) · b0e28540
      MarDino committed
      * Add quick gelu and fused bias add kernel
      
      * fix annotation
      
      * remove useless code
      
      * add fast gelu option and set it in multi transformer op
      
      * add flag to restrict if use fast gelu approximate
      
      * fix flags conflict
      
      * fix use tanh function instead
      
      * add cudart version limit
      
      * use phi fast tanh func
      
      * fix comment
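      For context on the "fast gelu" and "use tanh function instead" items above, here is a minimal sketch comparing exact GELU with the widely used tanh approximation. The commit messages do not state which formula the kernel uses, so the formula, the constant 0.044715, and the function names `gelu_exact` and `gelu_tanh` are assumptions for illustration, not the PR's code.

```python
import math


def gelu_exact(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh-based GELU approximation (assumed form, not taken from the PR)."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def max_abs_error(lo: float = -5.0, hi: float = 5.0, steps: int = 1000) -> float:
    """Largest absolute gap between the two variants on an even grid."""
    worst = 0.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        worst = max(worst, abs(gelu_exact(x) - gelu_tanh(x)))
    return worst


if __name__ == "__main__":
    print(f"max |exact - tanh| on [-5, 5]: {max_abs_error():.2e}")
```

      The approximation trades a small numerical difference for a cheaper activation, which is presumably why the commit gates it behind a flag.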
    • [PHI decoupling] move "gpu_device_function.h" from fluid to phi (#48097) · 27ee6e71
      huangjiyi committed
      * move "paddle/phi/backends/gpu/gpu_device_function.h" to phi
      
      * update copyright years
      
      * rm "fluid/platform/device/gpu/gpu_device_function.h" in phi
      
      * fix rocm-compile bugs
    • correct sync behavior for XPU distributed training (#47882) · aafa9820
      james committed
      * correct sync behavior for XPU distributed training
      
      XPU supports an event mechanism similar to CUDA events, so it is advisable to
      use an event to sync the compute/comm streams for performance. However, this
      mechanism was never fully tested, and inconsistent loss/ending_epochs have been
      reported. Therefore, this PR replaces event sync with stream waiting as
      a temporary solution.
      
      * remove compile warning
    • fix device id issue for xpu eager mode (#48076) · 3b18d96b
      james committed
      * fix device id issue for xpu eager
      
      The XPU device id is not correctly set in eager mode, so vars are placed on dev0 unless
      XPUDeviceGuard is called, leading to this error message on every node with rank != 0:
      "NotImplementedError: (Unimplemented) Place Place(xpu:0) is not supported."
      
      * fix typo
      
      * fix pybind error
    • CUDNN v8 Implementation of Convolution Kernels (#47454) · 14a6e67b
      Tian Zheng committed
      * Refactor conv_kernel and conv_grad_kernel to provide interface for CUDNNv8 implementation
      
      * Fix macro
      
      * Add implementation for conv_kernel and conv_grad_kernel
      
      * Modification after rebase onto latest develop
      
      * Modify plan cache to comply with the API of phi::autotune
      
      * Refactor to reduce duplicate code
      
      * Review fix:
      - move functions in conv_kernel_impl_v8.h and conv_grad_kernel_impl_v8.h to conv_kernel.cu and conv_grad_kernel.cu
      - add const specifier for input tensor
      - add logging when plans fail to execute
      - move CudnnConvBwdFilterV8 and CudnnConvBwdDataV8 to conv_cudnn_frontend.h
      
      * move plan building outside of cache
      
      * Fix ROCM build
    • add bf16 for numel (#48121) · a7d306af
      Yuang Liu committed