1. 16 Oct, 2019 1 commit
  2. 14 Oct, 2019 1 commit
    • DLPack support (#20039) · 12e4be03
      633WHU committed
      * support dlpack to tensor and implement python interface test=develop
      
      * add unittest for _to_dlpack and from_dlpack test=develop
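The `_to_dlpack` / `from_dlpack` interface named in this commit exchanges tensors through DLPack's `DLTensor` descriptor. The following is a minimal CPU-only sketch of that idea using only `ctypes`. The struct fields mirror `DLTensor` in `dlpack.h`; `describe` and `view` are hypothetical helpers written for illustration and are not Paddle's API.

```python
import ctypes


class DLDevice(ctypes.Structure):
    _fields_ = [("device_type", ctypes.c_int32), ("device_id", ctypes.c_int32)]


class DLDataType(ctypes.Structure):
    _fields_ = [("code", ctypes.c_uint8), ("bits", ctypes.c_uint8),
                ("lanes", ctypes.c_uint16)]


class DLTensor(ctypes.Structure):
    _fields_ = [
        ("data", ctypes.c_void_p),
        ("device", DLDevice),
        ("ndim", ctypes.c_int32),
        ("dtype", DLDataType),
        ("shape", ctypes.POINTER(ctypes.c_int64)),
        ("strides", ctypes.POINTER(ctypes.c_int64)),
        ("byte_offset", ctypes.c_uint64),
    ]


K_DL_CPU = 1    # kDLCPU in dlpack.h
K_DL_FLOAT = 2  # kDLFloat in dlpack.h


def describe(buf, shape):
    """Hypothetical producer: describe an existing float32 buffer, no copy."""
    shape_arr = (ctypes.c_int64 * len(shape))(*shape)
    tensor = DLTensor(
        data=ctypes.addressof(buf),
        device=DLDevice(K_DL_CPU, 0),
        ndim=len(shape),
        dtype=DLDataType(K_DL_FLOAT, 32, 1),
        shape=shape_arr,
        strides=None,  # NULL strides mean a compact row-major layout
        byte_offset=0,
    )
    return tensor, shape_arr  # the caller must keep shape_arr alive


def view(tensor):
    """Hypothetical consumer: wrap the described memory, again without a copy."""
    count = 1
    for i in range(tensor.ndim):
        count *= tensor.shape[i]
    return (ctypes.c_float * count).from_address(tensor.data + tensor.byte_offset)


buf = (ctypes.c_float * 6)(0, 1, 2, 3, 4, 5)
tensor, shape_keepalive = describe(buf, (2, 3))
shared = view(tensor)
buf[0] = 42.0  # a write through the producer is visible to the consumer
```

Because both sides point at the same memory, the producer has to keep the buffer alive for as long as the consumer uses it. The real protocol handles this with a deleter in `DLManagedTensor`, which this sketch leaves out.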
  3. 24 Sep, 2019 1 commit
  4. 11 Sep, 2019 1 commit
    • Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989) · 12542320
      Huihuang Zheng committed
      TemporaryAllocator is a singleton used to allocate memory for cuDNN. Removing the singleton improves memory performance.
      
      We replace TemporaryAllocator with CUDADeviceContextAllocator and CUDADeviceContextAllocation, which use a stream callback to free the memory allocated for a stream, so the singleton is no longer needed.
      
      Also add data_feed_proto to operator to fix CI for CPU compilation
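The stream-callback approach described in this commit can be illustrated with a small simulation. This is a sketch under stated assumptions, not Paddle's implementation: `FakeStream` stands in for a CUDA stream by running queued work in FIFO order, and `StreamAllocator` is a hypothetical name. The idea shown is that the release is queued behind the work that uses the memory, so the memory is freed only after that work has run.

```python
from collections import deque


class FakeStream:
    """Stand-in for a CUDA stream: queued items run in FIFO order."""

    def __init__(self):
        self._queue = deque()

    def enqueue(self, fn):
        self._queue.append(fn)

    def synchronize(self):
        while self._queue:
            self._queue.popleft()()


class StreamAllocator:
    """Hypothetical per-stream allocator that frees memory via a callback."""

    def __init__(self, stream):
        self.stream = stream
        self.live = set()
        self._next_id = 0

    def allocate(self, size):
        handle = (self._next_id, size)
        self._next_id += 1
        self.live.add(handle)
        return handle

    def release_after_pending_work(self, handle):
        # The release is queued behind everything already on the stream.
        self.stream.enqueue(lambda: self.live.discard(handle))


stream = FakeStream()
allocator = StreamAllocator(stream)
log = []

workspace = allocator.allocate(1024)
# Simulated kernel that records whether its workspace is still allocated.
stream.enqueue(lambda: log.append(workspace in allocator.live))
allocator.release_after_pending_work(workspace)

still_live_before_sync = workspace in allocator.live
stream.synchronize()
freed_after_sync = workspace not in allocator.live
```

Each allocator belongs to one stream, so no process-wide singleton has to track temporary buffers.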
  5. 05 Sep, 2019 1 commit
    • Integrate NVRTC to support compiling CUDA kernel at runtime (#19422) · 42b5bec6
      Yiqun Liu committed
      * Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
      test=develop
      
      * Call CUDA driver api to launch the kernel compiled by nvrtc.
      test=develop
      
      * Disable for mac and windows.
      test=develop
      
      * Refine the code to support manually specified num_threads and workload_per_thread.
      test=develop
      
      * Refine the CUDA kernel to support large dims.
      test=develop
  6. 16 Aug, 2019 1 commit
  7. 23 Jul, 2019 1 commit
  8. 27 Jun, 2019 2 commits
    • add dependency of collective_helper (#18365) · 9931bc64
      HaoRen committed
      * add dependency of collective_helper
      
      * test=develop
      fix dependency of collective_helper
    • supports collective communicated training (#18175) · b7128bac
      HaoRen committed
      * fix redundant code in prepare context, optimize executor by caching create_variables
      test=develop
      
      * supports collective training in executor
      
      * make fetch_list runnable with variables, add more unittest for use_program_cache
      test=develop
      
      * fix comment
      test=develop
      
      * use unique name for nccl_id
      
      * supports output to stream in program_to_code
      
      * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
      
      * set op role in collective training
      
      * add collective op role
      
      * remove orig file
      
      * add build optimizer by strategy
      
      * add collective strategy
      
      * refine collective strategy
      
      * add multi-process role maker
      
      * refine strategy building factory so that we can easily plug in more strategies
      
      * scale loss grad in collective sgd transpiler
      
      * add support for distributed fc
      
      * code format
      
      * revert some features for dist fc
      
      * add support for distributed fc training
      
      * fix redundant code in prepare context, optimize executor by caching create_variables
      test=develop
      
      * supports collective training in executor
      
      * make fetch_list runnable with variables, add more unittest for use_program_cache
      test=develop
      
      * use unique name for nccl_id
      
      * supports output to stream in program_to_code
      
      * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
      
      * set op role in collective training
      
      * add collective op role
      
      * fix comment
      test=develop
      
      * remove orig file
      
      * add build optimizer by strategy
      
      * add collective strategy
      
      * refine collective strategy
      
      * add multi-process role maker
      
      * refine strategy building factory so that we can easily plug in more strategies
      
      * scale loss grad in collective sgd transpiler
      
      * add support for distributed fc
      
      * code format
      
      * revert some features for dist fc
      
      * add support for distributed fc training
      
      * test=develop
      add collective op unittest standard
      
      * test=develop
      remove the test_collective directory
      
      * test=develop
      remove the test_collective directory
      
      * remove slicegather test
      
      * code format for reducescatter
      
      * update attr of shard_index_op
      
      * Modify macro nccl_helper
      
      * remove test without distribute
      
      * macro collective_helper
      
      * macro update
      
      * test=develop
      update support python3.5
      
      * test=develop change gpu memory use to 0.1 when test
      
      * test=develop
      update ut equal func
      
      * test=develop
      set flags to 1.5
      
      * test=develop fix pickle dump in py35
      
      * test=develop
      fix divide in slice and add sync_comm_stream
      update atol and rtol to 1e-05
      rm shard_index op and test
      modify read input from file to read from memory
      remove origin_program in framework and add i/o in c_sync_calc_stream
      
      * test=develop update unittest sync operator I/O
  9. 18 Apr, 2019 1 commit
  10. 30 Mar, 2019 1 commit
  11. 29 Mar, 2019 3 commits
  12. 28 Mar, 2019 1 commit
  13. 04 Mar, 2019 1 commit
    • polish cudnn related code and fix bug. (#15164) · 4449e855
      dzhwinter committed
      * staged.
      
      * polish code
      
      * polish code. test=develop
      
      * polish code. test=develop
      
      * api change. test=develop
      
      * fix default value. test=develop
      
      * fix default value. test=develop
  14. 27 Feb, 2019 1 commit
    • polish cudnn related code and fix bug. (#15164) · 225c11a9
      dzhwinter committed
      * staged.
      
      * polish code
      
      * polish code. test=develop
      
      * polish code. test=develop
      
      * api change. test=develop
      
      * fix default value. test=develop
      
      * fix default value. test=develop
  15. 25 Feb, 2019 3 commits
  16. 21 Feb, 2019 2 commits
    • disable dam temporarily (#15860) · e3dd6970
      Tao Luo committed
      test=develop
    • Profiler refine and add CUDA runtime api tracer (#15301) · a83e4704
      Dun committed
      * refine profiler && add runtime tracer
      
      * test=develop
      
      * test=develop
      
      * test=develop
      
      * test=develop
      
      * test=develop
      
      * test=develop
      
      * test=develop
      
      * test=develop
      
      * fix bug && test=develop
      
      * add thread id map && test=develop
      
      * test=develop
      
      * testing
      
      * bug fix
      
      * remove cuda event && refine code && test=develop
      
      * test=develop
      
      * test=develop
      
      * test=develop
      
      * fix windows temp file && test=develop
      
      * test=develop
      
      * fix windows bug && test=develop
      
      * fix start up issue && test=develop
      
      * code polish &&  test=develop
      
      * remove unused code && test=develop
      
      * add some cupti cbid && test=develop
      
      * add FLAGS_multiple_of_cupti_buffer_size && test=develop
      
      * fix compile error && test=develop
      
      * add keyword && test=develop
      
      * fix && test=develop
      
      * code polish && test=develop
  17. 20 Feb, 2019 1 commit
  18. 03 Feb, 2019 1 commit
  19. 02 Feb, 2019 1 commit
  20. 14 Jan, 2019 1 commit
  21. 08 Jan, 2019 1 commit
  22. 02 Jan, 2019 1 commit
  23. 24 Dec, 2018 1 commit
  24. 21 Dec, 2018 1 commit
    • [Feature] Add Temporary Allocator (#14875) · 79bd6dfa
      chengduo committed
      * Add Temporary Allocator
      
      * add Temporary Allocator to DeviceContext
      test=develop
      
      * code refine
      test=develop
      
      * fix mean_iou
      test=develop
      
      * Add DeviceTemporaryAllocator
      test=develop
      
      * fix conv_op bug
      test=develop
      
      * small fix
      test=develop
      
      * code refine
      test=develop
      
      * log refine
      test=develop
      
      * fix unit test
      test=develop
      
      * move double check
      
      * refine concat_and_split
      test=develop
      
      * add limit_of_temporary_allocation
      test=develop
      
      * fix name
      test=develop
  25. 14 Dec, 2018 1 commit
  26. 29 Nov, 2018 1 commit
  27. 22 Nov, 2018 1 commit
    • Windows/online (#14474) · d9a1f3e5
      wopeizl committed
      * add recordio support
      
      * disable the openblas multi-thread on windows since no support
      adjust the python script
      
      * code style
      
      * code style
      test=develop
      
      * add create_recordio_file_reader back
      
      * fix code style
      test=develop
      
      * fix the gtest.cmake on windows
      
      * fix cc_test on windows
      
      * fix the win build
      test=develop
      
      * remove fused compile support on windows
      test=develop
      
      * add the jit support
      test=develop
      
      * add the jit support, test=develop
      
      * add the jit support, test=develop
      
      * add the jit back
      fix compile error on windows
      
      * rollback test=develop
      
      * test case fix
      
      * disable DSO by default on windows
      
      * exclude warpctc_op on windows
      
      * exclude the dynload_warpctc out on windows
      test=develop
      
      * fix the scripts error
      test=develop
      
      * disable avx on windows by default
      test=develop
      
      * re-organize the cmake file
      
      * disable mkl on windows by default
      
      * add warp_ctc back
      
      * fix the dependency
      
      * fix the dependency
      
      * fix the build issue on windows
      
      * remove unsupported flag on windows
      
      * code style
      
      * code style
      test=develop
      
      * fix issue
      
      * add profiler, parallel_executor back
      
      * clean up the pre-definitions on windows
      
      * fix build issue
      
      * test=develop
  28. 21 Nov, 2018 1 commit
  29. 01 Nov, 2018 1 commit
  30. 26 Oct, 2018 1 commit
  31. 29 Sep, 2018 1 commit
  32. 15 Sep, 2018 1 commit
  33. 28 Aug, 2018 1 commit
  34. 27 Aug, 2018 1 commit