1. 27 6月, 2019 1 次提交
    • H
      supports collective communicated training (#18175) · b7128bac
      HaoRen 提交于
      * fix prepare context redundant code problem, optimize executor by caching create_varaiables
      test=develop
      
      * supports collective training in executor
      
      * make fetch_list runable with variables, add more unittest for use_program_cache
      test=develop
      
      * fix comment
      test=develop
      
      * use unique name for nccl_id
      
      * supports output to stream in program_to_code
      
      * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
      
      * set op role in collective training
      
      * add collective op role
      
      * remove orig file
      
      * add build optimizer by strategy
      
      * add collective strategy
      
      * refine collective strategy
      
      * add multi-process role maker
      
      * refine strategy building factory so that we can easily plugin more strategy
      
      * scale loss grad in collective sgd transpiler
      
      * add support for distributed fc
      
      * code format
      
      * revert some features for dist fc
      
      * add support for distributed fc training
      
      * fix prepare context redundant code problem, optimize executor by caching create_varaiables
      test=develop
      
      * supports collective training in executor
      
      * make fetch_list runable with variables, add more unittest for use_program_cache
      test=develop
      
      * use unique name for nccl_id
      
      * supports output to stream in program_to_code
      
      * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
      
      * set op role in collective training
      
      * add collective op role
      
      * fix comment
      test=develop
      
      * remove orig file
      
      * add build optimizer by strategy
      
      * add collective strategy
      
      * refine collective strategy
      
      * add multi-process role maker
      
      * refine strategy building factory so that we can easily plugin more strategy
      
      * scale loss grad in collective sgd transpiler
      
      * add support for distributed fc
      
      * code format
      
      * revert some features for dist fc
      
      * add support for distributed fc training
      
      * test=develop
      add collective op unittest standard
      
      * test=develop
      remove the test_collective directory
      
      * test=develop
      remove the test_collective directory
      
      * remove slicegather test
      
      * code format for reducescatter
      
      * update attr of shard_index_op
      
      * Modify macro nccl_helper
      
      * remove test without distribute
      
      * macro collective_helper
      
      * marcro update
      
      * test=develop
      update support python3.5
      
      * test=develop change gpu memory use to 0.1 when test
      
      * test=develop
      update ut equal func
      
      * test=develop
      set flags to 1.5
      
      * test=develop fix pickle dumple  py35
      
      * test=develop
      fix divide in slice and add sync_comm_stream
      update atol and rtol to 1e-05
      rm shard_index op and test
      modify read input from file to read from memory
      remove origin_program in framework and add i/o in c_sync_calc_stream
      
      * test=develop update unittest sync operator I/O
      b7128bac
  2. 03 4月, 2019 1 次提交
    • R
      Add Pixel shuffle OP (#15782) · 229dc932
      ruri 提交于
      * add pixel_shuffle op
      
      * add pixel_shuffle op, test=develop
      
      * rewrite code, test=develop
      
      * delete useless comment, test=develop
      
      * Refine pixel_shuffle_op and unit testing
      
      * refine code,test=develop
      
      * refine .cu,test=develop
      
      * fix unittest,test=develop
      
      * Fix unit testing
      test=develop
      
      * resolve conflict, test=develop
      
      * fix test, test=develop
      
      * fix API, test=develop
      
      * fix test datatype bug,test=develop
      
      * polish comments,test=develop
      
      * add API,test=develop
      
      * test=develop
      
      * Add Pixel_Shuffle OP,test=develop
      
      * support python3,test=develop
      
      * add include memory to travis CI bug,test=develop
      229dc932
  3. 21 3月, 2019 2 次提交
  4. 12 2月, 2018 1 次提交
  5. 10 2月, 2018 2 次提交
  6. 26 12月, 2017 1 次提交
  7. 12 12月, 2017 1 次提交
    • Q
      Refine device context (#6433) · 61ec0b95
      QI JUN 提交于
      There are mainly following fixes:
      
      - take `DeviceContext` as the template parameter of math functors and OpKernel instead of `Place`
      - remove `eigen_device` interface in base class  `DeviceContext`
      - remove `GetEigenDevice` interface in `ExecutionContext` and base class `DeviceContext`
      - remove unused `platform::EigenDeviceConverter`
      - rename `REGISTER_OP_GPU_KERNEL` to `REGISTER_OP_CUDA_KERNEL`
      - rename `USE_GPU_ONLY_OP` to `USE_CUDA_ONLY_OP`
      61ec0b95
  8. 03 11月, 2017 1 次提交
  9. 02 11月, 2017 1 次提交
  10. 13 10月, 2017 1 次提交
    • A
      Adding the Adam Optimizer operator (#4733) · 11680037
      Abhinav Arora 提交于
      * add adam op
      
      moment1_out = beta1 * moment1 + (1 − beta1) * grad
      moment2_out = beta2 * moment2 + (1 − beta2) * grad * grad
      moment1_hat =  moment1_out / (1 - beta1^t)
      moment2_hat =  moment2_out / (1 - beta2^t)
      param_out = param - learning_rate * moment1_hat / (sqrt(moment2_hat) +
      epsilon)
      
      * fix moment 2
      
      * Adding the Adam optimization operator
      
      * Adding more tests for Adam op
      11680037
  11. 07 8月, 2017 1 次提交
  12. 04 8月, 2017 1 次提交
  13. 31 7月, 2017 1 次提交
  14. 25 7月, 2017 1 次提交
  15. 19 7月, 2017 1 次提交