1. 31 Mar, 2023 1 commit
    • H
      register fluid kernels to phi [part 2] (#52044) · d05b73e4
      huangjiyi committed
      * update bipartite_match
      
      * update
      
      * fix bug
      
      * fix test
      
      * fix bug
      
      * fix Kunlun-KP-Build
      
      * Revert "fix Kunlun-KP-Build"
      
      This reverts commit ceab63cc23079fd6839c826bb52db893fb056355.
      
      * update
  2. 26 Jun, 2022 1 commit
  3. 20 Jan, 2022 1 commit
  4. 17 Jan, 2022 1 commit
    • W
      [Pten] Replace platform::Place with pten::Place. (#38899) · c48a9ad5
      Wilber committed
      * add pten::Place data structure.
      
      * update ci problem
      
      * fix ci problem
      
      * update
      
      * using platform::Place=pten::Place
      
      * remove BOOST_GET_CONST for CPUPlace and GPUPlace
      
      * compile pass 25%.
      
      * compile pass 45%
      
      * compile pass 60%
      
      * remove boost_get for xpu npu mlu and ipu
      
      * compile pass on cpu and gpu.
      
      * fix compile problem
      
      * fix compile error.
      
      * update
      
      * fix ci problem
      
      * update
      
      * ci approve
      
      * fix ci problem
      
      * fix ci eager test problem
      
      * remove BOOST_GET_CONST
      
      * fix npu compile
  5. 24 Sep, 2020 1 commit
    • W
      use iwyu clean include (#27267) · df43905f
      wanghuancoder committed
      * use iwyu clean include, test=develop, test=win
      
      * compilation error, test=develop
      
      * fix compilation error2, test=develop
      
      * fix compilation error3, test=develop
      
      * fix compilation error4, test=develop
      
      * fix compilation error5, test=develop
      
      * fix compilation error6, test=develop
      
      * fix compilation error7, test=develop
      
      * fix compilation error8, test=develop
      
      * fix compilation error8, test=develop
      
      * fix compilation error10, test=develop
      
      * fix compilation error11, test=develop
  6. 02 Jul, 2019 1 commit
    • Y
      supports collective training with programs (#18392) · a873fa84
      Yi Liu committed
      1. Since the allreduce op has 4 reduce types, we split these four reduce types into four ops.
      2. We also refined the collective op code: we separated the collective op kernel into CPUKernel and CUDAKernel, and removed the device-specific DeviceContext template parameter, since the target DeviceContext is already known.
      3. We removed the newly added Collective op role to reduce the complexity of program and graph analysis.
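      The split described in item 1 can be pictured with a small in-process simulation. This is only a sketch: the four reduce types are assumed here to be sum, product, max and min, the function names are hypothetical, and the "ranks" are plain Python lists rather than devices communicating over NCCL.

```python
from functools import reduce
import operator


def _allreduce(rank_buffers, combine):
    """Combine the buffers element-wise and hand every rank the same result.

    rank_buffers is a list with one equally sized list of numbers per
    simulated rank (an assumption made for this sketch).
    """
    reduced = [reduce(combine, column) for column in zip(*rank_buffers)]
    return [list(reduced) for _ in rank_buffers]


# One function per reduce type, mirroring "four reduce types -> four ops".
# All four names are hypothetical.
def allreduce_sum(rank_buffers):
    return _allreduce(rank_buffers, operator.add)


def allreduce_prod(rank_buffers):
    return _allreduce(rank_buffers, operator.mul)


def allreduce_max(rank_buffers):
    return _allreduce(rank_buffers, max)


def allreduce_min(rank_buffers):
    return _allreduce(rank_buffers, min)


if __name__ == "__main__":
    buffers = [[1, 2], [3, 4], [5, 6]]  # three simulated ranks
    print(allreduce_sum(buffers))
    print(allreduce_max(buffers))
```

      With one op per reduce type, the reduction is fixed by the op's name rather than selected through an attribute at run time, which is one plausible reading of why the commit made the split.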
  7. 27 Jun, 2019 1 commit
    • H
      supports collective communicated training (#18175) · b7128bac
      HaoRen committed
      * fix prepare context redundant code problem, optimize executor by caching create_variables
      test=develop
      
      * supports collective training in executor
      
      * make fetch_list runnable with variables, add more unittest for use_program_cache
      test=develop
      
      * fix comment
      test=develop
      
      * use unique name for nccl_id
      
      * supports output to stream in program_to_code
      
      * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
      
      * set op role in collective training
      
      * add collective op role
      
      * remove orig file
      
      * add build optimizer by strategy
      
      * add collective strategy
      
      * refine collective strategy
      
      * add multi-process role maker
      
      * refine strategy building factory so that we can easily plug in more strategies
      
      * scale loss grad in collective sgd transpiler
      
      * add support for distributed fc
      
      * code format
      
      * revert some features for dist fc
      
      * add support for distributed fc training
      
      * fix prepare context redundant code problem, optimize executor by caching create_variables
      test=develop
      
      * supports collective training in executor
      
      * make fetch_list runnable with variables, add more unittest for use_program_cache
      test=develop
      
      * use unique name for nccl_id
      
      * supports output to stream in program_to_code
      
      * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
      
      * set op role in collective training
      
      * add collective op role
      
      * fix comment
      test=develop
      
      * remove orig file
      
      * add build optimizer by strategy
      
      * add collective strategy
      
      * refine collective strategy
      
      * add multi-process role maker
      
      * refine strategy building factory so that we can easily plug in more strategies
      
      * scale loss grad in collective sgd transpiler
      
      * add support for distributed fc
      
      * code format
      
      * revert some features for dist fc
      
      * add support for distributed fc training
      
      * test=develop
      add collective op unittest standard
      
      * test=develop
      remove the test_collective directory
      
      * test=develop
      remove the test_collective directory
      
      * remove slicegather test
      
      * code format for reducescatter
      
      * update attr of shard_index_op
      
      * Modify macro nccl_helper
      
      * remove test without distribute
      
      * macro collective_helper
      
      * macro update
      
      * test=develop
      update support python3.5
      
      * test=develop change gpu memory use to 0.1 when test
      
      * test=develop
      update ut equal func
      
      * test=develop
      set flags to 1.5
      
      * test=develop fix pickle dump for py35
      
      * test=develop
      fix divide in slice and add sync_comm_stream
      update atol and rtol to 1e-05
      rm shard_index op and test
      modify read input from file to read from memory
      remove origin_program in framework and add i/o in c_sync_calc_stream
      
      * test=develop update unittest sync operator I/O
  8. 03 Apr, 2019 1 commit
    • R
      Add Pixel shuffle OP (#15782) · 229dc932
      ruri committed
      * add pixel_shuffle op
      
      * add pixel_shuffle op, test=develop
      
      * rewrite code, test=develop
      
      * delete useless comment, test=develop
      
      * Refine pixel_shuffle_op and unit testing
      
      * refine code,test=develop
      
      * refine .cu,test=develop
      
      * fix unittest,test=develop
      
      * Fix unit testing
      test=develop
      
      * resolve conflict, test=develop
      
      * fix test, test=develop
      
      * fix API, test=develop
      
      * fix test datatype bug,test=develop
      
      * polish comments,test=develop
      
      * add API,test=develop
      
      * test=develop
      
      * Add Pixel_Shuffle OP,test=develop
      
      * support python3,test=develop
      
      * add include memory to travis CI bug,test=develop
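      The commit message does not describe what pixel shuffle computes. As commonly defined for sub-pixel convolution, it rearranges a tensor of shape [C * r * r, H, W] into [C, H * r, W * r]. The sketch below assumes that channel layout and works on nested Python lists for a single image; the function name and argument names are illustrative and are not taken from the operator added here.

```python
def pixel_shuffle(x, upscale_factor):
    """Rearrange nested lists shaped [C * r * r][H][W] into [C][H * r][W * r].

    Assumed layout: input channel c * r * r + i * r + j supplies the pixel at
    offset (i, j) inside each r x r output block of output channel c.
    """
    r = upscale_factor
    channels_in = len(x)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by upscale_factor ** 2")
    channels_out = channels_in // (r * r)
    height, width = len(x[0]), len(x[0][0])

    out = [
        [[0] * (width * r) for _ in range(height * r)]
        for _ in range(channels_out)
    ]
    for c in range(channels_out):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out


if __name__ == "__main__":
    # Four 1x1 channels become one 2x2 channel when r = 2.
    print(pixel_shuffle([[[0]], [[1]], [[2]], [[3]]], 2))
```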
  9. 21 Mar, 2019 2 commits
  10. 12 Feb, 2018 1 commit
  11. 10 Feb, 2018 2 commits
  12. 26 Dec, 2017 1 commit
  13. 12 Dec, 2017 1 commit
    • Q
      Refine device context (#6433) · 61ec0b95
      QI JUN committed
      The main changes are:
      
      - take `DeviceContext` as the template parameter of math functors and OpKernel instead of `Place`
      - remove `eigen_device` interface in base class  `DeviceContext`
      - remove `GetEigenDevice` interface in `ExecutionContext` and base class `DeviceContext`
      - remove unused `platform::EigenDeviceConverter`
      - rename `REGISTER_OP_GPU_KERNEL` to `REGISTER_OP_CUDA_KERNEL`
      - rename `USE_GPU_ONLY_OP` to `USE_CUDA_ONLY_OP`
  14. 03 Nov, 2017 1 commit
  15. 02 Nov, 2017 1 commit
  16. 13 Oct, 2017 1 commit
    • A
      Adding the Adam Optimizer operator (#4733) · 11680037
      Abhinav Arora committed
      * add adam op
      
      moment1_out = beta1 * moment1 + (1 - beta1) * grad
      moment2_out = beta2 * moment2 + (1 - beta2) * grad * grad
      moment1_hat = moment1_out / (1 - beta1^t)
      moment2_hat = moment2_out / (1 - beta2^t)
      param_out = param - learning_rate * moment1_hat / (sqrt(moment2_hat) + epsilon)
      
      * fix moment 2
      
      * Adding the Adam optimization operator
      
      * Adding more tests for Adam op
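      The update equations in this commit message translate almost line for line into code. The following is a minimal sketch in plain Python over flat lists of floats, not the operator's actual kernel. The variable names follow the commit message; the default hyperparameter values and the function name `adam_step` are assumptions made for illustration.

```python
import math


def adam_step(param, grad, moment1, moment2, t,
              learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """Apply one Adam update; t is the 1-based step count.

    Default hyperparameters are commonly used values, assumed here and
    not taken from the commit.
    """
    param_out, moment1_out, moment2_out = [], [], []
    for p, g, m1, m2 in zip(param, grad, moment1, moment2):
        # Exponential moving averages of the gradient and its square.
        m1_new = beta1 * m1 + (1 - beta1) * g
        m2_new = beta2 * m2 + (1 - beta2) * g * g
        # Bias correction for the zero-initialised moments.
        m1_hat = m1_new / (1 - beta1 ** t)
        m2_hat = m2_new / (1 - beta2 ** t)
        param_out.append(
            p - learning_rate * m1_hat / (math.sqrt(m2_hat) + epsilon)
        )
        moment1_out.append(m1_new)
        moment2_out.append(m2_new)
    return param_out, moment1_out, moment2_out


if __name__ == "__main__":
    params, m1, m2 = adam_step(
        [1.0, -2.0], [0.5, -0.25], [0.0, 0.0], [0.0, 0.0],
        t=1, learning_rate=0.1,
    )
    print(params, m1, m2)
```

      On the first step with zero-initialised moments, the bias correction makes `moment1_hat` equal to `grad` and `moment2_hat` equal to `grad * grad`, so each parameter moves by roughly `learning_rate` in the direction opposite to its gradient's sign.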
  17. 07 Aug, 2017 1 commit
  18. 04 Aug, 2017 1 commit
  19. 31 Jul, 2017 1 commit
  20. 25 Jul, 2017 1 commit
  21. 19 Jul, 2017 1 commit