- 05 Sep, 2019 1 commit
Committed by Yiqun Liu
* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop
* Call CUDA driver api to launch the kernel compiled by nvrtc. test=develop
* Disable for mac and windows. test=develop
* Refine the codes to support manually specified num_threads and workload_per_thread. test=develop
* Refine the CUDA kernel to support large dims. test=develop
- 16 Aug, 2019 1 commit
Committed by Zeng Jinle
- 23 Jul, 2019 1 commit
Committed by chengduo
* support sparse gradients test=develop
- 27 Jun, 2019 2 commits
Committed by HaoRen
* add dependency of collective_helper
* test=develop fix dependency of collective_helper
Committed by HaoRen
* fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop
* supports collective training in executor
* make fetch_list runnable with variables, add more unittest for use_program_cache test=develop
* fix comment test=develop
* use unique name for nccl_id
* supports output to stream in program_to_code
* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
* set op role in collective training
* add collective op role
* remove orig file
* add build optimizer by strategy
* add collective strategy
* refine collective strategy
* add multi-process role maker
* refine strategy building factory so that we can easily plugin more strategy
* scale loss grad in collective sgd transpiler
* add support for distributed fc
* code format
* revert some features for dist fc
* add support for distributed fc training
* fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop
* supports collective training in executor
* make fetch_list runnable with variables, add more unittest for use_program_cache test=develop
* use unique name for nccl_id
* supports output to stream in program_to_code
* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
* set op role in collective training
* add collective op role
* fix comment test=develop
* remove orig file
* add build optimizer by strategy
* add collective strategy
* refine collective strategy
* add multi-process role maker
* refine strategy building factory so that we can easily plugin more strategy
* scale loss grad in collective sgd transpiler
* add support for distributed fc
* code format
* revert some features for dist fc
* add support for distributed fc training
* test=develop add collective op unittest standard
* test=develop remove the test_collective directory
* test=develop remove the test_collective directory
* remove slicegather test
* code format for reducescatter
* update attr of shard_index_op
* Modify macro nccl_helper
* remove test without distribute
* macro collective_helper
* macro update
* test=develop update support python3.5
* test=develop change gpu memory use to 0.1 when test
* test=develop update ut equal func
* test=develop set flags to 1.5
* test=develop fix pickle dumple py35
* test=develop fix divide in slice and add sync_comm_stream; update atol and rtol to 1e-05; rm shard_index op and test; modify read input from file to read from memory; remove origin_program in framework and add i/o in c_sync_calc_stream
* test=develop update unittest sync operator I/O
- 18 Apr, 2019 1 commit
Committed by gongweibao
- 30 Mar, 2019 1 commit
Committed by gongweibao
* fix compiled test=develop
* follow comments test=develop
- 29 Mar, 2019 3 commits
Committed by dongdaxiang
test=develop
Committed by dongdaxiang
test=develop
Committed by dongdaxiang
- 28 Mar, 2019 1 commit
Committed by gongweibao
- 04 Mar, 2019 1 commit
Committed by dzhwinter
* staged.
* polish code
* polish code. test=develop
* polish code. test=develop
* api change. test=develop
* fix default value. test=develop
* fix default value. test=develop
- 27 Feb, 2019 1 commit
Committed by dzhwinter
* staged.
* polish code
* polish code. test=develop
* polish code. test=develop
* api change. test=develop
* fix default value. test=develop
* fix default value. test=develop
- 25 Feb, 2019 3 commits
- 21 Feb, 2019 2 commits
Committed by Tao Luo
test=develop
Committed by Dun
* refine profiler && add runtime tracer
* test=develop
* test=develop
* test=develop
* test=develop
* test=develop
* test=develop
* test=develop
* test=develop
* fix bug && test=develop
* add thread id map && test=develop
* test=develop
* testing
* bug fix
* remove cuda event && refine code && test=develop
* test=develop
* test=develop
* test=develop
* fix windows temp file && test=develop
* test=develop
* fix windows bug && test=develop
* fix start up issue && test=develop
* code polish && test=develop
* remove unused code && test=develop
* add some cupti cbid && test=develop
* add FLAGS_multiple_of_cupti_buffer_size && test=develop
* fix compile error && test=develop
* add keyword && test=develop
* fix && test=develop
* code polish && test=develop
- 20 Feb, 2019 1 commit
Committed by Tao Luo
- 03 Feb, 2019 1 commit
Committed by peizhilin
test=develop
- 02 Feb, 2019 1 commit
Committed by peizhilin
test=develop
- 14 Jan, 2019 1 commit
Committed by peizhilin
test=develop
- 08 Jan, 2019 1 commit
Committed by peizhilin
- 02 Jan, 2019 1 commit
Committed by Xin Pan
test=develop
- 24 Dec, 2018 1 commit
Committed by dongdaxiang
- 21 Dec, 2018 1 commit
Committed by chengduo
* Add Temporal Allocator
* add Temporary Allocator to DeviceContext test=develop
* code refine test=develop
* fix mean_iou test=develop
* Add DeviceTemporaryAllocator test=develop
* fix conv_op bug test=develop
* small fix test=develop
* code refine test=develop
* log refine test=develop
* fix unit test test=develop
* move double check
* refine concat_and_split test=develop
* add limit_of_temporary_allocation test=develop
* fix name test=develop
- 14 Dec, 2018 1 commit
Committed by peizhilin
test=develop
- 29 Nov, 2018 1 commit
Committed by sneaxiy
test=develop
- 22 Nov, 2018 1 commit
Committed by wopeizl
* add recordio support
* disable the openblas multi-thread on windows since no support adjust the python script
* code style
* code style test=develop
* add create_recordio_file_reader back
* fix code style test=develop
* fix the gtest.cmake on windows
* fix cc_test on windows
* fix the win build test=develop
* remove fused compile support on windows test=develop
* add the jit support test=develop
* add the jit support, test=develop
* add the jit support, test=develop
* add the jit back fix compile error on windows
* rollback test=develop
* test case fix
* disable DSO by default on windows
* exclude warpctc_op on windows
* exclude the dynload_warpctc out on windows test=develop
* fix the scripts error test=develop
* disable avx on windows by default test=develop
* re-organize the cmake file
* disable mkl on windows by default
* add warp_ctc back
* fix the dependency
* fix the dependency
* fix the build issue on windows
* remove unsupported flag on windows
* code style
* code style test=develop
* fix issue
* add profiler, parallel_executor back
* clean up the pre-definitions on windows
* fix build issue
* test=develop
- 21 Nov, 2018 1 commit
Committed by peizhilin
- 01 Nov, 2018 1 commit
Committed by dzhwinter
- 26 Oct, 2018 1 commit
Committed by dzhwinter
- 29 Sep, 2018 1 commit
Committed by Yu Yang
- 15 Sep, 2018 1 commit
Committed by sneaxiy
- 28 Aug, 2018 1 commit
Committed by dzhwinter
- 27 Aug, 2018 3 commits
- 24 Aug, 2018 1 commit
Committed by dzhwinter
- 23 Aug, 2018 1 commit
Committed by tensor-tang