1. 29 Jan, 2022 5 commits
    • Optimize layer norm backward cuda kernel when cols is 1024. (#39247) · 99cfcc09
      Li Min committed
      * Add fp16 support for scale/bias for fused_layernorm_residual_dropout_bias op.
      
      * Remove useless code.
      
      * Remove useless code.
      
      * Optimize layer_norm fwd when cols is 1024.
      
      * Remove useless code.
      
      * Minors.
      
      * Minors.
      
      * Modifications according to reviews.
      
      * Minors.
      
      * Optimize layer_norm bwd kernel when cols is 1024.
      
      * Polish layer_norm_bwd_1024 kernel.
      
      * Limit ln_bwd_1024_kernel to paddle_with_cuda.
      
      * Fix double type compile error.
      
      * Add optimization of ln bwd for fused_dropout_add_ln op.
      
      * Polish codes.
    • Add xpu2 compiler (#37254) · 92da5055
      Liu-xiandong committed
      * Add XPU compiler for paddle, test=develop
      
      * clean code
      
      * clean useless code
      
      * clean useless code
      
      * clean useless code
      
      * test
      
      * add include path
      
      * use clang compiler
      
      * xpu2.cmake
      
      * XPU2 compiler passed
      
      * update
      
      * update after pten
      
      * combine WITH_XPU and WITH_XPU2
      
      * update the fuse operation in WITH_XPU and WITH_XPU2
      
      * update
      
      * update
      
      * update
      
      * fix the merge error
      
      * update
      
      * update the code
      
      * update the code
      
      * add run_kp_kernel flag
      
      * update
      
      * update
      
      * fix prepared type_ bug
      
      * clean and update the code
      
      * reset the kernel_primitives
      
      * update
      
      * clean the code
      
      * delete useless comment
      
      * fix the bug in WITH_XPU
      
      * update
      
      * update
      
      * modify the abi
      
      * delete some useless code
      
      * Parameter automation in xpu compilation
      
      * Parameter automation in xpu compilation
      
      * delete kps in cmake
      
      * delete useless comment
      
      * clean the code
      
      * clean the code
    • [PTen] Tidy pten core headers (#39188) · dd990981
      Chen Weihang committed
      * open header for custom kernel
      
      * add core utils
      
      * tidy core code
      
      * tidy header
      
      * tidy include
      
      * tidy namespace
      
      * resolve conflict
      
      * fix unittest and coverage
      
      * remove platform using
      
      * resolve conflict
      
      * resolve conflict
      
      * fix digamma namespace error
      
      * fix xpu full kernel error
      
      * fix xpu full kernel error
      
      * polish details
      
      * add place for lib storage
    • fix kunlun2 softmax unittest bug (#39274) · 23bb2836
      QingshuChen committed
      * fix kunlun2 softmax unittest bug
      *test=kunlun
      
      * minor
    • 7b4916c4
  2. 28 Jan, 2022 12 commits
  3. 27 Jan, 2022 19 commits
  4. 26 Jan, 2022 4 commits
    • [pten] remove deprecated fluid op kernel for pten (#38842) · 3ab9aef1
      Leo Chen committed
      * update cmake file to remove fluid kernel
      
      * add pten declaration.h to where pybind.h used
      
      * fix sync_bn and tensorrt_engine
      
      * refine detection_library
      
      * fix interpreter_core
      
      * support eager legacy
      
      * fit eager legacy for pten
      
      * fall back to cpu if kernel is not found
      
      * fix compile problem
      
      * fix compile problem
      
      * refine fallback logic
      
      * fit operator.run()
      
      * fix xpu compile
      
      * fit for new_exec
      
      * add REGISTER_OP_WITHOUT_GRADIENT
      
      * un-cache pt_kernel_context
      
      * fix compile
      
      * fix cudnn
      
      * fix compiling with on_infer
      
      * fix mkldnn
      
      * fix isfinite_v2
      
      * fix xpu problem
      
      * fix op_device
      
      * refine fallback for xpu
      
      * fix xpu compile
      
      * merge develop
      
      * refine code format
      
      * fix compile
      
      * fix compile
      
      * add data_transfer
      
      * fix PreparePtenData
      
      * fix cpu context
      
      * merge develop
      
      * fix compile
      
      * fix error device context
      
      * fix xpu
      
      * fix dev_ctx
    • [pten] Cast xpu kernel (#39179) · 93d2f0a6
      chentianyu03 committed
      * cast xpu kernel init
      
      * cast xpu kernel
      
      * replace with raw cast xpu kernel
      
      * fix cast kernel bug
      
      * add the missing break
      
      * modify namespace and header file
    • [Eager] Support imperative selected_rows_to_lod_tensor and the opposite case (#39223) · 787980b1
      Weilong Wu committed
      * Added selected_rows and rw_lock to pten
      
      * Renamed the unit test target to fix CI
      
      * Removed Class SelectedRows in Fluid, changed include/cmake relationship, use pten::SelectedRows in Fluid
      
      * Remove rw_lock.h, rw_lock_test.cc in fluid
      
      * Use pten::RWLock and pten::AutoRDLock, fix CI
      
      * Use pten::SelectedRows
      
      * Use pten::SelectedRows
      
      * Fix to pass NPU CI
      
      * Selected_Rows inherits from TensorBase
      
      * Use pten::SelectedRows, to pass NPU CI
      
      * To fix NPU CI
      
      * To fix NPU CI again
      
      * Use paddle/pten/core/enforce and polish code
      
      * Support imperative selected_rows_to_lod_tensor
      
      * Polish code
    • [MLU]Add conv2d op (#39110) · 71634a61
      qipengh committed
      * [MLU]Add conv2d op
      
      * [MLU]fix comment
      
      * [MLU]adapt NCHW of conv2d op