1. 20 Jul, 2019 1 commit
  2. 19 Jul, 2019 2 commits
  3. 18 Jul, 2019 2 commits
  4. 17 Jul, 2019 3 commits
  5. 16 Jul, 2019 2 commits
    • J
      [MKL-DNN] Reimplemented pool2d mkl-dnn to use Acquire API (#18585) · 71d883b8
      Jacek Czaja committed
      * - Added partial draft of pooling acquire
      
      - Workspace support
      
      - compilation fix
      
      - Added draft of pooling backward reimplementation
      
      - Segfault fix
      
      - reverted 'any' for diff_dst creation in pooling
      
      - Lint fixes
      
      test=develop
      
      - lint fixes
      
      test=develop
      
      - Further lint fixes
      
      test=develop
      
      * - Fixes after review
      
      test=develop
      
      * - Lint fixes
      
      test=develop
      
      * - Even more lint fixes
      
      test=develop
    • C
      fix bug of scatter op (#18640) · f4ec7d54
      chengduo committed
      test=develop
  6. 15 Jul, 2019 1 commit
  7. 11 Jul, 2019 2 commits
    • H
      fix cudnn lstm shape bug; test=develop (#18492) · a20b2b43
      Hongyu Liu committed
    • Z
      Feature/buffer_shared_inplace (#17911) · d3003a16
      Zeng Jinle committed
      * feature/buffer_shared_inplace, test=develop
      
      * refine code, test=develop
      
      * fix elementwise_add op cpu inplace and sum inplace bug, test=develop
      
      * add unittest and debug log, test=develop
      
      * fix parallel_executor scope bug, polish code, test=develop
      
      * fix sum op, activation op, single_in_place_inference bug, test=develop
      
      * remove kLocalExecScopeName, test=develop
      
      * fix unittest,test=develop
      
      * fix out_var first version bug, test=develop
      
      * follow comments,test=develop
  8. 10 Jul, 2019 4 commits
  9. 09 Jul, 2019 3 commits
  10. 08 Jul, 2019 1 commit
  11. 05 Jul, 2019 1 commit
  12. 04 Jul, 2019 2 commits
  13. 03 Jul, 2019 6 commits
  14. 02 Jul, 2019 3 commits
    • L
      rename mkldnn set/get_cur_thread_id() to set/get_cur_mkldnn_session_id() (#18453) · 8f5fffca
      Leo Zhao committed
      * rename mkldnn set/get_cur_thread_id() to set/get_cur_mkldnn_session_id()
      
      test=develop
      
      * update session id definition and adjust logic for default behavior
      
      test=develop
      
      * reset logic in mkldnn reuse as most of cases work in default.
      
      test=develop
    • Y
      supports collective training with programs (#18392) · a873fa84
      Yi Liu committed
      1. Since the allreduce op has 4 reduce types, we split these four reduce types into four ops.
      2. We also refined the collective op code, e.g. we separated the collective op kernel into CPUKernel and CUDAKernel, and removed the device-specific DeviceContext template parameter, since the target DeviceContext is already known.
      3. We removed the newly added Collective op role to reduce the complexity of program and graph analysis.
    • C
      Add find_no_grad_vars in backward.py (#17942) · e0d8c6ac
      chengduo committed
      * add not_been_used_vars to no_grad_set
      test=develop
  15. 01 Jul, 2019 2 commits
  16. 28 Jun, 2019 2 commits
  17. 27 Jun, 2019 3 commits
    • T
      fix communicator with pyreader (#18350) · 999d9a59
      tangwei12 committed
      * add is_runnning in communicator, test=develop
    • H
      supports collective communicated training (#18175) · b7128bac
      HaoRen committed
      * fix prepare context redundant code problem, optimize executor by caching create_variables
      test=develop
      
      * supports collective training in executor
      
      * make fetch_list runnable with variables, add more unittest for use_program_cache
      test=develop
      
      * fix comment
      test=develop
      
      * use unique name for nccl_id
      
      * supports output to stream in program_to_code
      
      * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
      
      * set op role in collective training
      
      * add collective op role
      
      * remove orig file
      
      * add build optimizer by strategy
      
      * add collective strategy
      
      * refine collective strategy
      
      * add multi-process role maker
      
      * refine strategy building factory so that we can easily plug in more strategies
      
      * scale loss grad in collective sgd transpiler
      
      * add support for distributed fc
      
      * code format
      
      * revert some features for dist fc
      
      * add support for distributed fc training
      
      * fix prepare context redundant code problem, optimize executor by caching create_variables
      test=develop
      
      * supports collective training in executor
      
      * make fetch_list runnable with variables, add more unittest for use_program_cache
      test=develop
      
      * use unique name for nccl_id
      
      * supports output to stream in program_to_code
      
      * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
      
      * set op role in collective training
      
      * add collective op role
      
      * fix comment
      test=develop
      
      * remove orig file
      
      * add build optimizer by strategy
      
      * add collective strategy
      
      * refine collective strategy
      
      * add multi-process role maker
      
      * refine strategy building factory so that we can easily plug in more strategies
      
      * scale loss grad in collective sgd transpiler
      
      * add support for distributed fc
      
      * code format
      
      * revert some features for dist fc
      
      * add support for distributed fc training
      
      * test=develop
      add collective op unittest standard
      
      * test=develop
      remove the test_collective directory
      
      * test=develop
      remove the test_collective directory
      
      * remove slicegather test
      
      * code format for reducescatter
      
      * update attr of shard_index_op
      
      * Modify macro nccl_helper
      
      * remove test without distribute
      
      * macro collective_helper
      
      * macro update
      
      * test=develop
      update support python3.5
      
      * test=develop change gpu memory use to 0.1 when test
      
      * test=develop
      update ut equal func
      
      * test=develop
      set flags to 1.5
      
      * test=develop fix pickle dump py35
      
      * test=develop
      fix divide in slice and add sync_comm_stream
      update atol and rtol to 1e-05
      rm shard_index op and test
      modify read input from file to read from memory
      remove origin_program in framework and add i/o in c_sync_calc_stream
      
      * test=develop update unittest sync operator I/O
    • S
      add int8 mkldnn prior_box (#17242) · 9252e8fa
      Sylwester Fraczek committed
      add prior_box quantization code
      
      add scale algo rules for prior box
      
      test=develop