1. 21 Jan, 2022 19 commits
    • Y
      [PTen]Separate origin Kernel and add Kernel for C++ API (#39002) · a0f586bc
      YuanRisheng committed
      * add kernel for c++ api
      
      * fix compile bugs
      
      * fix kunlun compile bugs
      
      * perfect cmake
      
      * fix compile bugs when run ci-inference
      
      * fix compile bugs
      
      * add non-raw kernel for fluid op
      
      * fix compile bugs
      
      * fix compile bugs
      
      * fix unit test bug
      a0f586bc
    • S
      add pten dependency to infrt (#39079) · 854a7ab3
      Shang Zhizhou committed
      * add pten dependency to infrt
      
      * fix code style
      
      * add pten::CPUContext
      
      * revert .ignore
      854a7ab3
    • C
      [pten] add concat pten kernel (#38955) · 06803c29
      chentianyu03 committed
      06803c29
    • W
      814e5ab4
    • Z
      df515255
    • T
      Keep strided_slice op behavior consistent with slice op when starts input is... · b47fb764
      TeslaZhao committed
      Keep strided_slice op behavior consistent with slice op when starts input is less than -rank (#39066)
      
      b47fb764
    • F
      [MLU]add mlu ci dockerfile (#39021) · fdab43b5
      fwenguang committed
      * [MLU]add mlu ci dockerfile
      
      * fix comment
      
      * add cncl
      fdab43b5
    • T
      refactor unittest for kunlun (#38772) · 4f1fef60
      TTerror committed
      * refactor unittests for kunlun
      
      * refactor unittests for kunlun, test=kunlun
      4f1fef60
    • A
      [PTen]Migrate Dim and DDim from paddle::framework into pten namespace (#39053) · 4e23ba32
      Aurelius84 committed
      * Migrate Dim and DDim from paddle::framework into pten namespace
      
      * fix paddle::framework::Array
      
      * fix framework::Array
      4e23ba32
    • W
      update recommend member (#39083) · cf6516ff
      wuhuanzhou committed
      * update recommend member, test=document_fix
      
      * remove update of UB rule file, test=document_fix
      cf6516ff
    • R
      fix npu c_allgather int64 (#39099) · 89f903da
      ronnywang committed
      89f903da
    • F
      add block and grid loop for index_sample kernel to deal with a large-shape tensor (#37816) · 4adeff06
      FlyingQianMM committed
      * add block and grid loop for index_sample kernel to deal with a large-shape tensor
      
      * fix code format
      
      * limit grid dim
      4adeff06
    • T
      fix gcd and lcm data type (#39043) · ba51a6c8
      Tao Luo committed
      ba51a6c8
    • Y
      [Auto Parallel] Use the new completion algorithm (#39086) · e5cda6fa
      Yulong Ao committed
      * Add the backward support for QR
      
      * Remove unnecessary comments
      
      * [Auto Parallel] Improve the dist op interface and compatible computation
      
      * Remove unnecessary modification
      
      * Recover some modifications
      
      * Add lost files
      
      * Fix a minor bug
      
      * Fix the bug of the planner
      
      * Fix the format problem
      
      * [Auto Parallel] Update the completion algorithm
      
      * Fix the bug of auto_searcher unittest
      e5cda6fa
    • W
      Support test_imperative parameterlist and layerdict (#38800) · f68ef9d2
      Weilong Wu committed
      * Rearranged Eager AutoCodeGen directory structure
      
      * Removed USE_OP in Eager AutoCodeGen
      
      * Enabled generation for Operators without Grad/Inputs/Outputs
      
      * Resolved operators without input
      
      * Fixed merge conflicts
      
      * Enabled Eager AutoCodeGen for 10+ more operators
      
      * Refactored Eager AutoCodeGen with more organized helper objects
      
      * Enabled Eager AutoCodeGen for operators with multiple OpBases
      
      * Adjusted Eager AutoCodeGen to Enable Passing Output Tensor as Input Argument
      
      * Handled Dispensable Inputs/Outputs in Eager AutoCodeGen
      
      * Adjusted function generation/call between Python-C API & Dygraph API
      
      * Synchronized auto-generated Python-C API with Dygraph Forward Functions
      
      * support more eager tensor api
      
      * fix merge compile error
      
      * fix compile error and fit develop code
      
      * support pure CPU
      
      * fix some logic error in eager_mode
      
      * support _varbase_creator in eager mode
      
      * Added safe_initialized interface to EagerTensor for use in processing dispensable inputs
      
      * for eager mode
      
      * refine
      
      * support multiple constructor for eager tensor
      
      * add place related code
      
      * polish code
      
      * specific randint with dtype of int64
      
      * Support pure cpu test
      
      * eager logic
      
      * refine test in pure cpu
      
      * eager logic
      
      * eager logic
      
      * eager logic, test=develop
      
      * skip core.eager when in inference, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * call RetainGrad after run forward kernel, test=develop
      
      * refine, test=develop
      
      * support dygraph util, meta, guard test
      
      * eager test case
      
      * support inference test
      
      * refine test and fix initializer failed
      
      * modify eagertensor patch method
      
      * add eagertensor.clear_grandint, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * support create varbase and fix retain grad error
      
      * call monkey_patch_varbase in _test_eager_guard, test=develop
      
      * fix windows error
      
      * split clear_gradient to clear_gradient and zero_grads, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * support test_imperative_basic test in eager mode
      
      * remove additional log in variable.h
      
      * remove additional log in variable.h
      
      * remove additional code create in merge
      
      * eager
      
      * fix some eager logic, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * patch_tensor_method_func, test=develop
      
      * refine, test=develop
      
      * eager test case, test=develop
      
      * refine, test=develop
      
      * Support eager_guard() in container_layerdict&parameterlist
      
      * eager, test=develop
      
      * eager, test=develop
      
      * eager optimizer, test=develop
      
      * eager optimizer, test=develop
      
      * eager test_imperative_optimizer_v2, test=develop
      
      * eager, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * eager, test=develop
      
      * add resize in share buffer to, test=develop
      
      * eager, test=develop
      
      * fix _share_buffer_to, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * support eager for dataloader,test=develop
      Co-authored-by: jim19930609 <jim19930609@gmail.com>
      Co-authored-by: JiabinYang <360788950@qq.com>
      Co-authored-by: Wang Huan <wanghuan29@baidu.com>
      Co-authored-by: wanghuancoder <wanghuancoder@163.com>
      f68ef9d2
    • F
      [MLU]add batch_norm mlu kernel (#39070) · 29796efe
      fwenguang committed
      29796efe
    • C
      fix save channel wise quant model (#39054) · ab1abd40
      ceci3 committed
      ab1abd40
    • W
      [PTEN] Add cpu context (#38979) · 064bc4b8
      Wilber committed
      * add cpu_context.
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * fix ci problem
      
      * fix npu ci problem
      
      * update
      
      * fix ci compile
      064bc4b8
    • Y
  2. 20 Jan, 2022 16 commits
    • S
      fix device_context place print (#39062) · 3dd7f353
      sneaxiy committed
      3dd7f353
    • F
      [MLU]add mlu kernel for top_k and top_k_v2 (#39065) · e02dec01
      fwenguang committed
      e02dec01
    • F
      [MLU]add mlu kernel for cast and scale op (#38961) · e3e50ea8
      fwenguang committed
      e3e50ea8
    • A
      [Pten] Migrate bfloat16/float16/complex from paddle::platform into pten::common (#39044) · f1143f0c
      Aurelius84 committed
      * Migrate bfloat16/float16/complex from platform into pten::common
      
      * fix typo
      
      * fix code style
      f1143f0c
    • W
      Modify Code AutoGen logics and Support test_imperative decorator and... · 655f76d2
      Weilong Wu committed
      Modify Code AutoGen logics and Support test_imperative decorator and layer_children, layer_trainable (#38633)
      
      * Rearranged Eager AutoCodeGen directory structure
      
      * Removed USE_OP in Eager AutoCodeGen
      
      * Enabled generation for Operators without Grad/Inputs/Outputs
      
      * Resolved operators without input
      
      * Fixed merge conflicts
      
      * Enabled Eager AutoCodeGen for 10+ more operators
      
      * Refactored Eager AutoCodeGen with more organized helper objects
      
      * Enabled Eager AutoCodeGen for operators with multiple OpBases
      
      * Adjusted Eager AutoCodeGen to Enable Passing Output Tensor as Input Argument
      
      * Handled Dispensable Inputs/Outputs in Eager AutoCodeGen
      
      * Adjusted function generation/call between Python-C API & Dygraph API
      
      * Synchronized auto-generated Python-C API with Dygraph Forward Functions
      
      * support more eager tensor api
      
      * fix merge compile error
      
      * fix compile error and fit develop code
      
      * support pure CPU
      
      * fix some logic error in eager_mode
      
      * support _varbase_creator in eager mode
      
      * Added safe_initialized interface to EagerTensor for use in processing dispensable inputs
      
      * for eager mode
      
      * refine
      
      * support multiple constructor for eager tensor
      
      * add place related code
      
      * polish code
      
      * specific randint with dtype of int64
      
      * Support pure cpu test
      
      * eager logic
      
      * refine test in pure cpu
      
      * eager logic
      
      * eager logic
      
      * eager logic, test=develop
      
      * skip core.eager when in inference, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * call RetainGrad after run forward kernel, test=develop
      
      * refine, test=develop
      
      * support dygraph util, meta, guard test
      
      * eager test case
      
      * support inference test
      
      * refine test and fix initializer failed
      
      * modify eagertensor patch method
      
      * add eagertensor.clear_grandint, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * support create varbase and fix retain grad error
      
      * call monkey_patch_varbase in _test_eager_guard, test=develop
      
      * fix windows error
      
      * split clear_gradient to clear_gradient and zero_grads, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * support test_imperative_basic test in eager mode
      
      * remove additional log in variable.h
      
      * remove additional log in variable.h
      
      * remove additional code create in merge
      
      * eager
      
      * fix some eager logic, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * Support test_imperative decorator and layer_children, layer_trainable
      
      * Compare ori_dygraph and new_egr
      
      * refine, test=develop
      
      * patch_tensor_method_func, test=develop
      
      * refine, test=develop
      
      * eager test case, test=develop
      
      * refine, test=develop
      
      * Updated assert_equal func
      
      * eager, test=develop
      
      * Updated assert statement
      
      * eager, test=develop
      
      * eager optimizer, test=develop
      
      * eager optimizer, test=develop
      
      * eager test_imperative_optimizer_v2, test=develop
      
      * eager, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * eager, test=develop
      
      * add resize in share buffer to, test=develop
      
      * eager, test=develop
      
      * fix _share_buffer_to, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * support eager for dataloader,test=develop
      
      * Modified eager_generator logic to use ptr
      
      * Updated eager_generator logic
      Co-authored-by: jim19930609 <jim19930609@gmail.com>
      Co-authored-by: JiabinYang <360788950@qq.com>
      Co-authored-by: Wang Huan <wanghuan29@baidu.com>
      Co-authored-by: wanghuancoder <wanghuancoder@163.com>
      655f76d2
    • Y
      Disable the accuracy test in op benchmark ci temporary, because ci will not... · 3ebe7964
      Yiqun Liu committed
      Disable the accuracy test in op benchmark ci temporary, because ci will not fail when accuracy check failed. (#39049)
      
      * Disable the accuracy test in op benchmark ci temporary, because ci will not fail when accuracy check failed.
      
      * Revert the modification in source codes.
      3ebe7964
    • Y
      fix mac ci bug (#38964) · 5d5d8450
      YUNSHEN XIE committed
      * test=allcases;notest,test=mac_py3
      
      * fix bug in mac ci
      
      * fix format issue
      5d5d8450
    • Y
      [Auto Parallel] Improve the dist op interface and the compatible computation (#39014) · 9acc26ca
      Yulong Ao committed
      * Add the backward support for QR
      
      * Remove unnecessary comments
      
      * [Auto Parallel] Improve the dist op interface and compatible computation
      
      * Remove unnecessary modification
      
      * Recover some modifications
      
      * Add lost files
      
      * Fix a minor bug
      
      * Fix the bug of the planner
      
      * Fix the format problem
      9acc26ca
    • Y
      mod communicator (#39064) · 2a9c993e
      yaoxuefeng committed
      2a9c993e
    • Z
      Fix master weight bug for multi_tensor optimizer(momentum, adam) (#38991) · 6b0c57cf
      zhangbo9674 committed
      * fix mp
      
      * support merged_momentum for mp
      6b0c57cf
    • M
      [Paddle-ASP]Make test_asp_sharding running on non-mac platform (#39034) · c0f27282
      minghaoBD committed
      * [Paddle-ASP]Make test_asp_sharding running on non-mac platform
      
      * syntax check
      
      * syntax check
      c0f27282
    • Z
      【PTen】Remove code of converting Tensor to DenseTensor (#38926) · 8784ec65
      zyfncg committed
      * remove MakePtenTensor in BuildKernelContext
      
      * fix a bug caused by storage
      
      * remove WriteBackOutput in dynamic and static mode
      
      * fix compile error of std::max
      
      * fix compile error of std::max
      
      * fix data_type bug
      
      * fix memory alloc bug
      
      * add some debug info
      
      * fix compile problem
      
      * fix problem of data_type check
      
      * comment out some unreached code
      8784ec65
    • S
      remove if !defined(WIN32) (#39058) · 90e9233a
      sneaxiy committed
      90e9233a
    • W
      [Eager] Support Eager mode for some testcase (#38783) · d21074cd
      wanghuancoder committed
      * Rearranged Eager AutoCodeGen directory structure
      
      * Removed USE_OP in Eager AutoCodeGen
      
      * Enabled generation for Operators without Grad/Inputs/Outputs
      
      * Resolved operators without input
      
      * Fixed merge conflicts
      
      * Enabled Eager AutoCodeGen for 10+ more operators
      
      * Refactored Eager AutoCodeGen with more organized helper objects
      
      * Enabled Eager AutoCodeGen for operators with multiple OpBases
      
      * Adjusted Eager AutoCodeGen to Enable Passing Output Tensor as Input Argument
      
      * Handled Dispensable Inputs/Outputs in Eager AutoCodeGen
      
      * Adjusted function generation/call between Python-C API & Dygraph API
      
      * Synchronized auto-generated Python-C API with Dygraph Forward Functions
      
      * support more eager tensor api
      
      * fix merge compile error
      
      * fix compile error and fit develop code
      
      * support pure CPU
      
      * fix some logic error in eager_mode
      
      * support _varbase_creator in eager mode
      
      * Added safe_initialized interface to EagerTensor for use in processing dispensable inputs
      
      * for eager mode
      
      * refine
      
      * support multiple constructor for eager tensor
      
      * add place related code
      
      * polish code
      
      * specific randint with dtype of int64
      
      * Support pure cpu test
      
      * eager logic
      
      * refine test in pure cpu
      
      * eager logic
      
      * eager logic
      
      * eager logic, test=develop
      
      * skip core.eager when in inference, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * call RetainGrad after run forward kernel, test=develop
      
      * refine, test=develop
      
      * support dygraph util, meta, guard test
      
      * eager test case
      
      * support inference test
      
      * refine test and fix initializer failed
      
      * modify eagertensor patch method
      
      * add eagertensor.clear_grandint, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * support create varbase and fix retain grad error
      
      * call monkey_patch_varbase in _test_eager_guard, test=develop
      
      * fix windows error
      
      * split clear_gradient to clear_gradient and zero_grads, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * support test_imperative_basic test in eager mode
      
      * remove additional log in variable.h
      
      * remove additional log in variable.h
      
      * remove additional code create in merge
      
      * eager
      
      * fix some eager logic, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * patch_tensor_method_func, test=develop
      
      * refine, test=develop
      
      * eager test case, test=develop
      
      * refine, test=develop
      
      * eager, test=develop
      
      * eager, test=develop
      
      * eager optimizer, test=develop
      
      * eager optimizer, test=develop
      
      * eager test_imperative_optimizer_v2, test=develop
      
      * eager, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * eager, test=develop
      
      * add resize in share buffer to, test=develop
      
      * eager, test=develop
      
      * fix _share_buffer_to, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * support eager for dataloader,test=develop
      Co-authored-by: jim19930609 <jim19930609@gmail.com>
      Co-authored-by: JiabinYang <360788950@qq.com>
      d21074cd
    • C
      revert cached kernel context removing (#39055) · 4d413d02
      Chen Weihang committed
      4d413d02
    • S
      fix gelu compile on CUDA 10 (#39045) · 0617a3ed
      sneaxiy committed
      0617a3ed
  3. 19 Jan, 2022 5 commits