- 14 10月, 2019 1 次提交
-
-
由 633WHU 提交于
-
- 24 9月, 2019 1 次提交
-
-
由 Zeng Jinle 提交于
-
- 11 9月, 2019 1 次提交
-
-
由 Huihuang Zheng 提交于
TemporaryAllocator is a singleton used for allocating memory for Cudnn. Since it is a singleton, we can delete it for better performance in memory. We replace TemporaryAllocator by CUDADeviceContextAllocator and CUDADeviceContextAllocation, which uses stream callback to delete the memory allocated for the stream to avoid singleton. Also added data_feed_proto to operator to fix CI in CPU compilation
-
- 05 9月, 2019 1 次提交
-
-
由 Yiqun Liu 提交于
* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop * Call CUDA driver api to launch the kernel compiled by nvrtc. test=develop * Disable for mac and windows. test=develop * Refine the codes to support manually specified num_threads and workload_per_thread. test=develop * Refine the CUDA kernel to support large dims. test=develop
-
- 16 8月, 2019 1 次提交
-
-
由 Zeng Jinle 提交于
-
- 23 7月, 2019 1 次提交
-
-
由 chengduo 提交于
* support sparse gradients test=develop
-
- 27 6月, 2019 2 次提交
-
-
由 HaoRen 提交于
* add dependecy of collective_helper * test=develop fix dependecy of collective_helper
-
由 HaoRen 提交于
* fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop * supports collective training in executor * make fetch_list runable with variables, add more unittest for use_program_cache test=develop * fix comment test=develop * use unique name for nccl_id * supports output to stream in program_to_code * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code * set op role in collective training * add collective op role * remove orig file * add build optimizer by strategy * add collective strategy * refine collective strategy * add multi-process role maker * refine strategy building factory so that we can easily plugin more strategy * scale loss grad in collective sgd transpiler * add support for distributed fc * code format * revert some features for dist fc * add support for distributed fc training * fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop * supports collective training in executor * make fetch_list runable with variables, add more unittest for use_program_cache test=develop * use unique name for nccl_id * supports output to stream in program_to_code * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code * set op role in collective training * add collective op role * fix comment test=develop * remove orig file * add build optimizer by strategy * add collective strategy * refine collective strategy * add multi-process role maker * refine strategy building factory so that we can easily plugin more strategy * scale loss grad in collective sgd transpiler * add support for distributed fc * code format * revert some features for dist fc * add support for distributed fc training * test=develop add collective op unittest standard * test=develop remove the test_collective directory * test=develop remove the test_collective directory * remove slicegather test * code format for reducescatter * update attr of shard_index_op * Modify macro nccl_helper * remove test without distribute * macro collective_helper * marcro update * test=develop update support python3.5 * test=develop change gpu memory use to 0.1 when test * test=develop update ut equal func * test=develop set flags to 1.5 * test=develop fix pickle dumple py35 * test=develop fix divide in slice and add sync_comm_stream update atol and rtol to 1e-05 rm shard_index op and test modify read input from file to read from memory remove origin_program in framework and add i/o in c_sync_calc_stream * test=develop update unittest sync operator I/O
-
- 18 4月, 2019 1 次提交
-
-
由 gongweibao 提交于
-
- 30 3月, 2019 1 次提交
-
-
由 gongweibao 提交于
* fix compiled test=develop * follow comments test=develop
-
- 29 3月, 2019 3 次提交
-
-
由 dongdaxiang 提交于
test=develop
-
由 dongdaxiang 提交于
test=develop
-
由 dongdaxiang 提交于
-
- 28 3月, 2019 1 次提交
-
-
由 gongweibao 提交于
-
- 04 3月, 2019 1 次提交
-
-
由 dzhwinter 提交于
* staged. * polish code * polish code. test=develop * polish code. test=develop * api change. test=develop * fix default value. test=develop * fix default value. test=develop
-
- 27 2月, 2019 1 次提交
-
-
由 dzhwinter 提交于
* staged. * polish code * polish code. test=develop * polish code. test=develop * api change. test=develop * fix default value. test=develop * fix default value. test=develop
-
- 25 2月, 2019 3 次提交
- 21 2月, 2019 2 次提交
-
-
由 Tao Luo 提交于
test=develop
-
由 Dun 提交于
* refine profiler && add runtime tracer * test=develop * test=develop * test=develop * test=develop * test=develop * test=develop * test=develop * test=develop * fix bug && test=develop * add thread id map && test=develop * test=develop * testing * bug fix * remove cuda event && refine code && test=develop * test=develop * test=develop * test=develop * fix windows temp file && test=develop * test=develop * fix windows bug && test=develop * fix start up issue && test=develop * code polish && test=develop * remove unused code && test=develop * add some cupti cbid && test=develop * add FLAGS_multiple_of_cupti_buffer_size && test=develop * fix compile error && test=develop * add keyword && test=develop * fix && test=develop * code polish && test=develop
-
- 20 2月, 2019 1 次提交
-
-
由 Tao Luo 提交于
-
- 03 2月, 2019 1 次提交
-
-
由 peizhilin 提交于
test=develop
-
- 02 2月, 2019 1 次提交
-
-
由 peizhilin 提交于
test=develop
-
- 14 1月, 2019 1 次提交
-
-
由 peizhilin 提交于
test=develop
-
- 08 1月, 2019 1 次提交
-
-
由 peizhilin 提交于
-
- 02 1月, 2019 1 次提交
-
-
由 Xin Pan 提交于
test=develop
-
- 24 12月, 2018 1 次提交
-
-
由 dongdaxiang 提交于
-
- 21 12月, 2018 1 次提交
-
-
由 chengduo 提交于
* Add Temporal Allocator * add Temporay Allocator to DeviceContext test=develop * code refine test=develop * fix mean_iou test=develop * Add DeviceTemporaryAllocator test=develop * fix conv_op bug test=develop * small fix test=develop * code refine test=develop * log refine test=develop * fix unit test test=develop * move double check * refine concat_and_split test=develop * add limit_of_temporary_allocation test=develop * fix name test=develop
-
- 14 12月, 2018 1 次提交
-
-
由 peizhilin 提交于
test=develop
-
- 29 11月, 2018 1 次提交
-
-
由 sneaxiy 提交于
test=develop
-
- 22 11月, 2018 1 次提交
-
-
由 wopeizl 提交于
* add recordio support * disable the openblas multi-thread on windows since no support adjust the python script * code style * code style test=develop * add create_recordio_file_reader back * fix code style test=develop * fix the gtest.cmake on windows * fix cc_test on windows * fix the win build test=develop * remove fused compile support on windows test=develop * add the jit support test=develop * add the jit support, test=develop * add the jit support, test=develop * add the jit back fix compile error on windows * rollback test=develop * test case fix * disable DSO by default on windows * exclude warpctc_op on windows * exclude the dynload_warpctc out on windows test=develop * fix the scripts error test=develop * disable avx on windows by default test=develop * re-organize the cmake file * disable mkl on windows by default * add warp_ctc back * fix the dependency * fix the dependency * fix the build issue on windows * remove unsupported flag on windows * code style * code style test=develop * fix issue * add profiler, parallel_executor back * clean up the pre-definitions on windows * fix build issue * test=develop
-
- 21 11月, 2018 1 次提交
-
-
由 peizhilin 提交于
-
- 01 11月, 2018 1 次提交
-
-
由 dzhwinter 提交于
-
- 26 10月, 2018 1 次提交
-
-
由 dzhwinter 提交于
-
- 29 9月, 2018 1 次提交
-
-
由 Yu Yang 提交于
-
- 15 9月, 2018 1 次提交
-
-
由 sneaxiy 提交于
-
- 28 8月, 2018 1 次提交
-
-
由 dzhwinter 提交于
-
- 27 8月, 2018 2 次提交