1. 01 Mar, 2022 1 commit
    • [bf16] add bf16 kernel: scale gather sum (#39683) · 6d26b332
      Committed by zhangbo9674
      * add scale gather sum
      
      * refine CUDA_ATOMIC_WRAPPER ADD for bf16
      
      * add gather unittest
      
      * solve conflict
      
      * add scale unittest
      
      * add sum unittest
      
      * solve conflict
      
      * refine gather unittest
      
      * refine unittest
  2. 20 Feb, 2022 1 commit
  3. 11 Feb, 2022 1 commit
  4. 25 Jan, 2022 1 commit
    • [Move selected_rows PR #3] Change the relationship of [include/Cmake]. (#39128) · 2bafd338
      Committed by Weilong Wu
      * Added selected_rows and rw_lock to pten
      
      * Renamed the unit test target to fix CI
      
      * Removed Class SelectedRows in Fluid, changed include/cmake relationship, use pten::SelectedRows in Fluid
      
      * Remove rw_lock.h,rw_lock_test.cc in fluid
      
      * Use pten::RWLock and pten::AutoRDLock, fix CI
      
      * Use pten::SelectedRows
      
      * Use pten::SelectedRows
      
      * Fix to pass NPU CI
      
      * Use pten::SelectedRows, to pass NPU CI
      
      * To fix NPU CI
      
      * To fix NPU CI again
  5. 17 Jan, 2022 1 commit
    • [Pten] Replace platform::Place with pten::Place. (#38899) · c48a9ad5
      Committed by Wilber
      * add pten::Place data structure.
      
      * update ci problem
      
      * fix ci problem
      
      * update
      
      * using platform::Place=pten::Place
      
      * remove BOOST_GET_CONST for CPUPlace and GPUPlace
      
      * compile pass 25%.
      
      * compile pass 45%
      
      * compile pass 60%
      
      * remove boost_get for xpu npu mlu and ipu
      
      * compile pass on cpu and gpu.
      
      * fix compile problem
      
      * fix compile error.
      
      * update
      
      * fix ci problem
      
      * update
      
      * ci approve
      
      * fix ci problem
      
      * fix ci eager test problem
      
      * remove BOOST_GET_CONST
      
      * fix npu compile
  6. 16 Sep, 2020 1 commit
  7. 11 May, 2020 1 commit
    • Add macro BOOST_GET to enrich the error information of boost::get (#24175) · aa0f254f
      Committed by Chen Weihang
      * add new macro BOOST_GET_SAFELY & unittests, test=develop
      
      * add different macro type, test=develop
      
      * fix get macro type in executor, test=develop
      
      * four macro part change backup
      
      * using one macro for all case, test=develop
      
      * revert attribute change, test=develop
      
      * change to three func to solve gcc4.8 bug, test=develop
      
      * polish some details, test=develop
  8. 16 Jan, 2020 1 commit
  9. 29 Nov, 2019 1 commit
    • Fix Cond Bug for Nested Control Flow (#21340) · 630be319
      Committed by Huihuang Zheng
      * Commit before merging develop
      
      test=develop
      
      * Backup after working with Huihuang logs
      
      * Commit before deleting Huihuang debug loggings
      
      * Commit before debug
      
      test=develop
      
      * Fix bug commit
      
      test=develop
      
      * Backup of fixing bugs
      
      test=develop
      
      * Clean up code
      
      test=develop
      
      * Fix a bug in sum_op
      
      test=develop
  10. 15 Oct, 2019 1 commit
  11. 11 Sep, 2019 1 commit
    • Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989) · 12542320
      Committed by Huihuang Zheng
      TemporaryAllocator is a singleton used to allocate memory for cuDNN. Since it is a singleton, we can delete it for better memory performance.

      We replace TemporaryAllocator with CUDADeviceContextAllocator and CUDADeviceContextAllocation, which use a stream callback to free the memory allocated for the stream, avoiding the singleton.
      
      Also added data_feed_proto to operator to fix CI in CPU compilation
  12. 22 Aug, 2019 1 commit
    • Enhance OpTest to check the consistency of operators when using and not using inplace (#19101) · a9d5fc51
      Committed by Leo Chen
      * add pybind interface to get all inplace ops, test=develop
      
      * enhance OpTest to check the consistency of operators when using and not using inplace, test=develop
      
      * handle corner cases in op_test, test=develop
      
      * support outputs without tensor holder_, like XShape in reshape_op, test=develop
      
      * fix bug: some ops have a GradOpMaker but no grad_op in OpInfoMap, test=develop
      
      * use reshape_grad instead of reshape in FlattenGradOp, test=develop
      
      * fix error debug dims info for variables like XShape, test=develop
      
      * change computational order in sum_op to relieve computation difference using inplace, test=develop
      
      * add inplace_atol to check group_norm, and skip inplace_grad for mkldnn, test=develop
      
      * follow sneaxiy's comments, test=develop
      
      * remove unused DefaultGradOpDescMaker in mkldnn op, test=develop
  13. 11 Jul, 2019 1 commit
    • Feature/buffer_shared_inplace (#17911) · d3003a16
      Committed by Zeng Jinle
      * feature/buffer_shared_inplace, test=develop
      
      * refine code, test=develop
      
      * fix elementwise_add op cpu inplace and sum inplace bug, test=develop
      
      * add unittest and debug log, test=develop
      
      * fix parallel_executor scope bug, polish code, test=develop
      
      * fix sum op, activation op, single_in_place_inference bug, test=develop
      
      * remove kLocalExecScopeName, test=develop
      
      * fix unittest,test=develop
      
      * fix out_var first version bug, test=develop
      
      * follow comments,test=develop
  14. 16 Jun, 2019 1 commit
  15. 08 May, 2019 1 commit
  16. 07 May, 2019 1 commit
  17. 11 Dec, 2018 1 commit
  18. 07 Nov, 2018 1 commit
    • Add fp16 backward support (#14202) · a9b5d42d
      Committed by chengduo
      * add fp16 backward support
      test=develop
      
      * add sum_op fp16 test
      
      * disable test_dist_save_load
      test=develop
      
      * add check_grad for sum
      
      * add unit test for softmax_grad fp16
      test=develop
      
      * add scale_op unit test
      
      * add mul_grad_op unit test for fp16
      
      * add cross_entropy_grad and eman_grad unit test for fp16
      test=develop
      
      * fix cross_entropy unit test
      
      * add pool2d fp16 unit test
      
      * refine conv2d fp16 unit test
      test=develop
      
      * refine activation unit test
      test=develop
      
      * fix ci
      test=develop
      
      * follow zhihong's comment, copy from https://github.com/PaddlePaddle/Paddle/pull/12796
      test=develop
  19. 17 Aug, 2018 1 commit
  20. 16 Aug, 2018 1 commit
  21. 12 Feb, 2018 1 commit
  22. 10 Feb, 2018 2 commits
  23. 12 Dec, 2017 1 commit
    • Refine device context (#6433) · 61ec0b95
      Committed by QI JUN
      The main changes are:
      
      - take `DeviceContext` as the template parameter of math functors and OpKernel instead of `Place`
      - remove `eigen_device` interface in base class  `DeviceContext`
      - remove `GetEigenDevice` interface in `ExecutionContext` and base class `DeviceContext`
      - remove unused `platform::EigenDeviceConverter`
      - rename `REGISTER_OP_GPU_KERNEL` to `REGISTER_OP_CUDA_KERNEL`
      - rename `USE_GPU_ONLY_OP` to `USE_CUDA_ONLY_OP`
  24. 23 Nov, 2017 1 commit
  25. 27 Oct, 2017 1 commit
    • Gradient check use graph (#5027) · be00b0c4
      Committed by Yu Yang
      * Simplify Gradient Check
      
      * Stash
      
      * Extract apply_backward_pass to backward.py
      
      Rename apply_backward_pass to append_backward_ops
      
      * Use graph API to check gradient
      
      * Fix ci
      
      * Fix CI
      
      * Fix backward for double precision
      
      * Stash
      
      * Fix CI
      
      * Fix ci
      
      * Ignore GRU test
      
      * Ignore xe op
      
      * Fix CI
      
      * Fix softmax with xe gradient
      
      The correct equation should be IG = OG * (d_softmax_with_xe())
      
      * Fix typo
      
      * Fix merge error
      
      * Disable LRN
  26. 03 Oct, 2017 2 commits
  27. 05 Sep, 2017 1 commit
  28. 04 Sep, 2017 1 commit
  29. 26 Aug, 2017 1 commit
  30. 25 Aug, 2017 1 commit
  31. 07 Aug, 2017 1 commit
  32. 04 Aug, 2017 1 commit
  33. 31 Jul, 2017 1 commit
  34. 25 Jul, 2017 1 commit
  35. 19 Jul, 2017 1 commit