1. 03 3月, 2020 1 次提交
  2. 12 11月, 2019 1 次提交
  3. 29 7月, 2019 1 次提交
  4. 02 7月, 2019 1 次提交
    • Y
      supports collective training with programs (#18392) · a873fa84
      Yi Liu 提交于
      1. Since allreduce op has 4 reduce types, We split these four reduce types into four ops
      2. We also refined the collective op code, e.g. we separated the collective op kernel into CPUKernel and CUDAKernel, and remove the device specified DeviceContext parameter in template as we already knew the target DeviceContext
      3. We remove the newly added Collective op role to reduce the complexity of program and graph analysis
      a873fa84
  5. 27 6月, 2019 1 次提交
    • H
      supports collective communicated training (#18175) · b7128bac
      HaoRen 提交于
      * fix prepare context redundant code problem, optimize executor by caching create_varaiables
      test=develop
      
      * supports collective training in executor
      
      * make fetch_list runable with variables, add more unittest for use_program_cache
      test=develop
      
      * fix comment
      test=develop
      
      * use unique name for nccl_id
      
      * supports output to stream in program_to_code
      
      * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
      
      * set op role in collective training
      
      * add collective op role
      
      * remove orig file
      
      * add build optimizer by strategy
      
      * add collective strategy
      
      * refine collective strategy
      
      * add multi-process role maker
      
      * refine strategy building factory so that we can easily plugin more strategy
      
      * scale loss grad in collective sgd transpiler
      
      * add support for distributed fc
      
      * code format
      
      * revert some features for dist fc
      
      * add support for distributed fc training
      
      * fix prepare context redundant code problem, optimize executor by caching create_varaiables
      test=develop
      
      * supports collective training in executor
      
      * make fetch_list runable with variables, add more unittest for use_program_cache
      test=develop
      
      * use unique name for nccl_id
      
      * supports output to stream in program_to_code
      
      * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
      
      * set op role in collective training
      
      * add collective op role
      
      * fix comment
      test=develop
      
      * remove orig file
      
      * add build optimizer by strategy
      
      * add collective strategy
      
      * refine collective strategy
      
      * add multi-process role maker
      
      * refine strategy building factory so that we can easily plugin more strategy
      
      * scale loss grad in collective sgd transpiler
      
      * add support for distributed fc
      
      * code format
      
      * revert some features for dist fc
      
      * add support for distributed fc training
      
      * test=develop
      add collective op unittest standard
      
      * test=develop
      remove the test_collective directory
      
      * test=develop
      remove the test_collective directory
      
      * remove slicegather test
      
      * code format for reducescatter
      
      * update attr of shard_index_op
      
      * Modify macro nccl_helper
      
      * remove test without distribute
      
      * macro collective_helper
      
      * marcro update
      
      * test=develop
      update support python3.5
      
      * test=develop change gpu memory use to 0.1 when test
      
      * test=develop
      update ut equal func
      
      * test=develop
      set flags to 1.5
      
      * test=develop fix pickle dumple  py35
      
      * test=develop
      fix divide in slice and add sync_comm_stream
      update atol and rtol to 1e-05
      rm shard_index op and test
      modify read input from file to read from memory
      remove origin_program in framework and add i/o in c_sync_calc_stream
      
      * test=develop update unittest sync operator I/O
      b7128bac
  6. 08 5月, 2019 1 次提交
  7. 21 4月, 2019 1 次提交
    • Z
      Refine model gpu memory (#16993) · 1202d3fc
      Zeng Jinle 提交于
      * speedup gc and inplace softmax_with_cross_entropy_grad
      test=develop
      
      * refine models gpu mem
      Merge skip vars and warning messages of mem opt
      remove relu mem opt
      test=develop
      
      * follow comments
      test=develop
      1202d3fc
  8. 18 4月, 2019 1 次提交
  9. 08 1月, 2019 1 次提交
  10. 26 12月, 2018 1 次提交
  11. 25 12月, 2018 1 次提交
  12. 08 11月, 2018 1 次提交
    • C
      Fix input<tensor> (#14208) · c5b6573a
      chengduo 提交于
      * fix input<tensor>
      test=develop
      
      * fix split_ids
      test=develop
      
      * ElementwiseMul should not support SelectedRows
      
      * fix scale op
      test=develop
      
      * change GetTensorFromVar() method to GetTensorOrSelectedRowsFromVar()
      
      * fix operator
      
      * refine MultiOutput
      
      * fix MultiOutput
      test=develop
      
      * disable test_dist_save_load
      test=develop
      
      * fix elementwise_op
      test=develop
      
      * add get_sparse_as_op
      test=develop
      
      * add info for check
      test=develop
      
      * rename get_sparse_as_op with extract_rows_as_op.
      test=develop
      
      * elementwise doesn't support selected_rows
      
      * fix regularizer
      
      * remove extract_rows_as
      test=develop
      
      * fix ci
      test=develop
      
      * add test for sum_op
      
      * fix regularizer
      test=develop
      
      *  test=develop
      
      * fix pserver weight decay multi inputs test=develop
      c5b6573a
  13. 30 9月, 2018 1 次提交
  14. 21 9月, 2018 1 次提交
  15. 16 9月, 2018 1 次提交
  16. 14 9月, 2018 1 次提交
  17. 04 9月, 2018 1 次提交
  18. 29 8月, 2018 1 次提交
  19. 23 8月, 2018 3 次提交
  20. 29 5月, 2018 1 次提交
  21. 15 5月, 2018 1 次提交
  22. 07 4月, 2018 1 次提交
  23. 12 2月, 2018 1 次提交
  24. 10 2月, 2018 2 次提交
  25. 05 1月, 2018 1 次提交
    • D
      Feature/use cudnn (#7141) · 5593858d
      dzhwinter 提交于
      * "add c++ side kernel selection"
      
      * "add multiple kernel op test"
      
      * "kernel selection only support cudnn"
      
      * "better formatter"
      
      * "small fix with UseCPU"
      
      * "depends on change interface Get(Place, Library)"
      
      * "fix CI"
      
      * "fix python cudnn test"
      
      * "leave the register cudnn op to another PR"
      
      * "fix CI"
      
      * "use all kernel by default"
      
      * "fix CI"
      5593858d
  26. 25 12月, 2017 1 次提交
    • Q
      Impl kernel hint (#6883) · af0c4c45
      Qiao Longfei 提交于
      * init kernel hint
      
      * fix typo
      
      * rm unused code
      
      * add include in op_kernel.h
      
      * restore op_kernel since it will be moved to op_kernel_type
      
      * change force_cpu to use_cpu
      
      * fix compilation
      af0c4c45
  27. 19 12月, 2017 1 次提交
  28. 25 5月, 2017 1 次提交
  29. 09 12月, 2016 1 次提交
  30. 22 11月, 2016 1 次提交
  31. 29 8月, 2016 1 次提交