1. 24 Nov 2022, 9 commits
    • 22555e96
    • [Fluid clean] (#48105) · 43b92b63
      Committed by wangxiaoning
      * add index sample fp16 support
      
      * remove fluid APIs in distributed_strategy.py and role_maker.py
      
      * Revert "remove fluid APIs in distributed_strategy.py and role_maker.py"
      
      This reverts commit 223bbee990d3bf69e252fc3c0f19e3873550a264.
      
      * remove fluid APIs in distributed_strategy.py and role_maker.py
      
      * remove index sample op changes
      
      * remove fluid APIs under fleet.base
      
      * remove fluid APIs under fleet.layers.mpu
      
      * remove fluid APIs under fleet.meta_optimizers
      
      * fix fluid error
      
      * fix util_factory.py
      
      * reset fluid.io.load_inference_model API
    • [PHI decoupling] simplify "convert_utils.h" in fluid (#48168) · de4310e6
      Committed by huangjiyi
      * rm dependence to "convert_utils.h" in some files
      
      * fix bugs
      
      * replace DataType2String with DataTypeToString
      
      * replace framework::DataTypeSize with phi::SizeOf
      
      * mv convert_function from fluid to phi and rm old map
      
      * recommit with pre-commit
      
      * replace ProtoVarType with ProtoDataType and update comment.
      
      * fix include error for "dnnl.hpp"
      
      * revert add dep mkldnn to convert_utils in phi
      
      * add mkldnn deps in convert_utils.h in phi
      
      * move deps to convert_utils.h in phi
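The convert_utils work above replaces `framework::DataTypeSize` with `phi::SizeOf`, both of which map a data-type tag to its width in bytes. A minimal Python sketch of what such a utility does (the `size_of` name and dtype strings here are illustrative, not phi's actual C++ API):

```python
# Illustrative sketch of a SizeOf-style utility: map a dtype tag to its
# byte width, raising on unknown types instead of returning a default.
SIZEOF = {
    "bool": 1,
    "float16": 2,
    "bfloat16": 2,
    "float32": 4,
    "float64": 8,
    "int32": 4,
    "int64": 8,
}

def size_of(dtype: str) -> int:
    try:
        return SIZEOF[dtype]
    except KeyError:
        raise ValueError(f"unsupported dtype: {dtype}")

assert size_of("float16") == 2
assert size_of("int64") == 8
```

Centralizing the mapping in one table (as phi does with a single `SizeOf`) avoids the duplicated per-framework maps the commit removes.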
    • df23c7c3
    • [Phi Support CuDNN] Support ALL CuDNN (#47865) · 1623f1b4
      Committed by HongyuJia
      * support default use_gpudnn=True
      
      * fully support cudnn in phi
      
      * add header file
      
      * add white_list, verify accuracy
      
      * phi support all cudnn
      
      * opt affine_grad
      
      * try different arches of pretrained_model
      
      * try different arches of pretrained_model
      
      * add debug string
      
      * debug eager_method
      
      * add debug string, pass all local ctest
      
      * polish all debug code
      
      * delete use_cudnn relevant code autogen
      
      * fix depthwise_conv2d
      
      * Share all other members of Tensor except use_cudnn
      
      * polish codes according to review opinion
      
      * polish codes according to review opinion, fix bug
      
      * polish codes according to review opinion, opt performance
      
      * polish codes according to review opinion, fix pooling.py
    • [PHI] Migrate batch_norm_grad kernel (#48288) · 561b7278
      Committed by Sławomir Siwek
    • fix adam thread num (#48297) · dd27996c
      Committed by sneaxiy
    • do not calc reduce_all in eager mode (#48199) · bcf75132
      Committed by wanghuancoder
      * do not calc reduce_all in eager mode
      
      * refine python c cast list
      
      * refine (×9)
  2. 23 Nov 2022, 7 commits
  3. 22 Nov 2022, 5 commits
  4. 21 Nov 2022, 8 commits
  5. 18 Nov 2022, 11 commits
    • [PHI] Migrate matmul_grad kernel (#48023) · 4ab18ada
      Committed by Sławomir Siwek
      * cleanup unused code
      
      * unify is_int8 is_bfloat16
      
      * Simplify matmul_v2 FWD kernel
      
      * remove RunKernel methods
      
      * remove import namespace
      
      * remove headers
      
      * clean fluid/phi cross imports
      
      * remove fluid axpy_handler
      
      * delete fluid methods
      
      * activations
      
      * OneDNNMemDesc
      
      * MKLDNNFormatForSize
      
      * MatchShapeToLayout
      
      * MKLDNNMemoryFormat
      
      * MKLDNNFormat
      
      * ReorderMKLDNNHandler
      
      * to_void_cast
      
      * review suggestions
      
      * interpolate
      
      * remove fluid dependency
      
      * init
      
      * ExecuteMatMulV2
      
      * rm fluid kernel
      
      * matmul_grad
      
      * remove mutable_data
    • [PHI] Migrate conv_transpose kernel (#48119) · 9aacb31b
      Committed by Zuza Gawrysiak
      * Migrate conv_transpose to phi
      
      * Move handler to kernel
      
      * kernel m
      
      * Fix formatting
      
      * handler
      
      * remove fluid
      
      * revert tcp_store
      
      * tcp_store
      
      * remove unused
      
      * Fix declaration
      
      * add dnn input
      
      * Fix typo
      Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com>
    • Fix bug of zero_allocator in HostAlloc (#48108) · 7f92e27e
      Committed by zyfncg
      * fix bug of zero_allocator in host
      
      * fix test compile bug
      
      * add unittest
      
      * update test
    • Optimize FusedBiasAddGelu Kernel (#47679) · b0e28540
      Committed by MarDino
      * Add quick gelu and fused bias add kernel
      
      * fix annotation
      
      * remove useless code
      
      * add fast gelu option and set it in multi transformer op
      
      * add flag to restrict if use fast gelu approximate
      
      * fix flags conflict
      
      * fix use tanh function instead
      
      * add cudart version limit
      
      * use phi fast tanh func
      
      * fix comment
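The FusedBiasAddGelu commit above switches between an exact GELU and a faster tanh-based approximation. A short Python sketch of the two well-known formulas being traded off (function names here are illustrative, not Paddle's kernel code):

```python
# Exact GELU vs. the tanh approximation ("fast gelu") chosen behind a flag
# in the commit above. Both formulas are standard; the approximation avoids
# the erf evaluation at a small accuracy cost.
import math

def gelu_exact(x: float) -> float:
    # GELU(x) = 0.5 * x * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    # 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))

# The two variants agree closely over typical activation ranges.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    assert abs(gelu_exact(x) - gelu_tanh(x)) < 1e-2
```

The commit gates the fast path behind a flag and a CUDA runtime version check, falling back to the tanh-based phi helper rather than a custom approximation.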
    • [PHI decoupling] move "gpu_device_function.h" from fluid to phi (#48097) · 27ee6e71
      Committed by huangjiyi
      * move "paddle/phi/backends/gpu/gpu_device_function.h" to phi
      
      * update copyright years
      
      * rm "fluid/platform/device/gpu/gpu_device_function.h" in phi
      
      * fix rocm-compile bugs
    • correct sync behavior for XPU distributed training (#47882) · aafa9820
      Committed by james
      * correct sync behavior for XPU distributed training
      
      XPU supports an event mechanism similar to CUDA events, so it is
      advisable to use an event to sync compute/comm streams for performance.
      However, this mechanism has never been fully tested, and inconsistent
      loss/ending_epochs have been reported. Therefore, this PR replaces
      event sync with stream waiting as a temporary solution.
      
      * remove compile warning
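The trade-off in the commit message above (signal an event between streams vs. simply block until the producing stream has drained) can be sketched with a rough Python threading analogy; this is illustrative only, not Paddle or XPU runtime code:

```python
# Analogy: a "compute" worker produces data, a "comm" worker consumes it.
# Event-style sync lets comm start early and wait on a signal; stream-wait
# style just blocks until the compute worker has fully finished.
import threading

results = []

def compute(done: threading.Event):
    results.append("compute")   # work on the "compute stream"
    done.set()                  # event-style: signal completion

def comm(done: threading.Event):
    done.wait()                 # wait on the event before communicating
    results.append("comm")

# Event-based sync (faster on XPU, but per the commit not fully reliable)
done = threading.Event()
t_comm = threading.Thread(target=comm, args=(done,))
t_compute = threading.Thread(target=compute, args=(done,))
t_comm.start(); t_compute.start()
t_compute.join(); t_comm.join()
assert results == ["compute", "comm"]

# Stream-wait fallback: block until the whole "compute stream" drains,
# then run communication. Simpler ordering, less overlap.
results.clear()
t = threading.Thread(target=lambda: results.append("compute"))
t.start(); t.join()
results.append("comm")
assert results == ["compute", "comm"]
```

Both schemes preserve the compute-before-comm ordering; the event version allows more overlap, which is why the PR frames stream waiting as a temporary, conservative replacement.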
    • CUDNN v8 Implementation of Convolution Kernels (#47454) · 14a6e67b
      Committed by Tian Zheng
      * Refactor conv_kernel and conv_grad_kernel to provide interface for CUDNNv8 implementation
      
      * Fix macro
      
      * Add implementation for conv_kernel and conv_grad_kernel
      
      * Modification after rebase onto latest develop
      
      * Modify plan cache to comply with the API of phi::autotune
      
      * Refactor to reduce duplicate code
      
      * Review fix:
      - move functions in conv_kernel_impl_v8.h and conv_grad_kernel_impl_v8.h to conv_kernel.cu and conv_grad_kernel.cu
      - add const specifier for input tensor
      - add logging when plans fail to execute
      - move CudnnConvBwdFilterV8 and CudnnConvBwdDataV8 to conv_cudnn_frontend.h
      
      * - move plan building outside of cache
      
      * Fix ROCM build
    • add bf16 for numel (#48121) · a7d306af
      Committed by Yuang Liu
    • [PHI decoupling] remove "gpu_primitives.h" in fluid (#48063) · 9918bf9c
      Committed by Wang Xin
      * remove "gpu_primitives.h" in fluid namespace
      
      * fix PR-CI-GpuPS fail
      
      * fix PR-CI-GpuPS fail
    • fix onednn prelu header (#48064) · 85598e31
      Committed by Sylwester Fraczek