- 21 4月, 2020 1 次提交
-
-
由 Zhou Wei 提交于
* cherry-pick,Optimize the error messages of paddle CUDA API * fix the error messages of paddle CUDA API * Refactoring PADDLE_ENFORCE_CUDA_SUCCESS, and apply to curand/cudnn/cublas/NCCL * remove build_ex_string
-
- 01 4月, 2020 1 次提交
-
-
由 石晓伟 提交于
-
- 30 3月, 2020 1 次提交
-
-
由 石晓伟 提交于
-
- 25 3月, 2020 1 次提交
-
-
由 Zeng Jinle 提交于
-
- 18 3月, 2020 1 次提交
-
-
由 Yi Liu 提交于
initialize global nccl context in dygraph test=develop
-
- 04 3月, 2020 1 次提交
-
-
由 Zeng Jinle 提交于
* add recorded cuda memory apis, fix typo, test=develop * add more ut, test=develop * follow comments, test=develop * fix py35 incompatible issues, test=develop
-
- 03 12月, 2019 1 次提交
-
-
由 Zhaolong Xing 提交于
* add jeston compile support test=develop * refine the cmake test=develop
-
- 25 11月, 2019 1 次提交
-
-
由 zhouwei25 提交于
-
- 08 11月, 2019 1 次提交
-
-
由 Chen Weihang 提交于
* Enrich the type of error and declare the error type interfaces, test=develop * adjust tests to adapt new form, test=develop * add inference deps with error_codes.pb.h, test=develop * restore stack iter start pos, test=develop * polish code based review comments, test=develop
-
- 16 10月, 2019 1 次提交
-
-
由 Zeng Jinle 提交于
-
- 14 10月, 2019 1 次提交
-
-
由 633WHU 提交于
* support dlpack to tensor and implement python interface test=develop * add unittest for _to_dlpack and from_dlpack test=develop
-
- 24 9月, 2019 1 次提交
-
-
由 Zeng Jinle 提交于
-
- 11 9月, 2019 1 次提交
-
-
由 Huihuang Zheng 提交于
TemporaryAllocator is a singleton used for allocating memory for Cudnn. Since it is a singleton, we can delete it for better performance in memory. We replace TemporaryAllocator by CUDADeviceContextAllocator and CUDADeviceContextAllocation, which uses stream callback to delete the memory allocated for the stream to avoid singleton. Also added data_feed_proto to operator to fix CI in CPU compilation
-
- 05 9月, 2019 1 次提交
-
-
由 Yiqun Liu 提交于
* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop * Call CUDA driver api to launch the kernel compiled by nvrtc. test=develop * Disable for mac and windows. test=develop * Refine the codes to support manually specified num_threads and workload_per_thread. test=develop * Refine the CUDA kernel to support large dims. test=develop
-
- 16 8月, 2019 1 次提交
-
-
由 Zeng Jinle 提交于
-
- 23 7月, 2019 1 次提交
-
-
由 chengduo 提交于
* support sparse gradients test=develop
-
- 27 6月, 2019 2 次提交
-
-
由 HaoRen 提交于
* add dependecy of collective_helper * test=develop fix dependecy of collective_helper
-
由 HaoRen 提交于
* fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop * supports collective training in executor * make fetch_list runable with variables, add more unittest for use_program_cache test=develop * fix comment test=develop * use unique name for nccl_id * supports output to stream in program_to_code * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code * set op role in collective training * add collective op role * remove orig file * add build optimizer by strategy * add collective strategy * refine collective strategy * add multi-process role maker * refine strategy building factory so that we can easily plugin more strategy * scale loss grad in collective sgd transpiler * add support for distributed fc * code format * revert some features for dist fc * add support for distributed fc training * fix prepare context redundant code problem, optimize executor by caching create_varaiables test=develop * supports collective training in executor * make fetch_list runable with variables, add more unittest for use_program_cache test=develop * use unique name for nccl_id * supports output to stream in program_to_code * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code * set op role in collective training * add collective op role * fix comment test=develop * remove orig file * add build optimizer by strategy * add collective strategy * refine collective strategy * add multi-process role maker * refine strategy building factory so that we can easily plugin more strategy * scale loss grad in collective sgd transpiler * add support for distributed fc * code format * revert some features for dist fc * add support for distributed fc training * test=develop add collective op unittest standard * test=develop remove the test_collective directory * test=develop remove the test_collective directory * remove slicegather test * code format for reducescatter * update attr of shard_index_op * Modify macro nccl_helper * remove test without distribute * macro collective_helper * marcro update * test=develop update support python3.5 * test=develop change gpu memory use to 0.1 when test * test=develop update ut equal func * test=develop set flags to 1.5 * test=develop fix pickle dumple py35 * test=develop fix divide in slice and add sync_comm_stream update atol and rtol to 1e-05 rm shard_index op and test modify read input from file to read from memory remove origin_program in framework and add i/o in c_sync_calc_stream * test=develop update unittest sync operator I/O
-
- 18 4月, 2019 1 次提交
-
-
由 gongweibao 提交于
-
- 30 3月, 2019 1 次提交
-
-
由 gongweibao 提交于
* fix compiled test=develop * follow comments test=develop
-
- 29 3月, 2019 3 次提交
-
-
由 dongdaxiang 提交于
test=develop
-
由 dongdaxiang 提交于
test=develop
-
由 dongdaxiang 提交于
-
- 28 3月, 2019 1 次提交
-
-
由 gongweibao 提交于
-
- 04 3月, 2019 1 次提交
-
-
由 dzhwinter 提交于
* staged. * polish code * polish code. test=develop * polish code. test=develop * api change. test=develop * fix default value. test=develop * fix default value. test=develop
-
- 27 2月, 2019 1 次提交
-
-
由 dzhwinter 提交于
* staged. * polish code * polish code. test=develop * polish code. test=develop * api change. test=develop * fix default value. test=develop * fix default value. test=develop
-
- 25 2月, 2019 3 次提交
- 21 2月, 2019 2 次提交
-
-
由 Tao Luo 提交于
test=develop
-
由 Dun 提交于
* refine profiler && add runtime tracer * test=develop * test=develop * test=develop * test=develop * test=develop * test=develop * test=develop * test=develop * fix bug && test=develop * add thread id map && test=develop * test=develop * testing * bug fix * remove cuda event && refine code && test=develop * test=develop * test=develop * test=develop * fix windows temp file && test=develop * test=develop * fix windows bug && test=develop * fix start up issue && test=develop * code polish && test=develop * remove unused code && test=develop * add some cupti cbid && test=develop * add FLAGS_multiple_of_cupti_buffer_size && test=develop * fix compile error && test=develop * add keyword && test=develop * fix && test=develop * code polish && test=develop
-
- 20 2月, 2019 1 次提交
-
-
由 Tao Luo 提交于
-
- 03 2月, 2019 1 次提交
-
-
由 peizhilin 提交于
test=develop
-
- 02 2月, 2019 1 次提交
-
-
由 peizhilin 提交于
test=develop
-
- 14 1月, 2019 1 次提交
-
-
由 peizhilin 提交于
test=develop
-
- 08 1月, 2019 1 次提交
-
-
由 peizhilin 提交于
-
- 02 1月, 2019 1 次提交
-
-
由 Xin Pan 提交于
test=develop
-
- 24 12月, 2018 1 次提交
-
-
由 dongdaxiang 提交于
-
- 21 12月, 2018 1 次提交
-
-
由 chengduo 提交于
* Add Temporal Allocator * add Temporay Allocator to DeviceContext test=develop * code refine test=develop * fix mean_iou test=develop * Add DeviceTemporaryAllocator test=develop * fix conv_op bug test=develop * small fix test=develop * code refine test=develop * log refine test=develop * fix unit test test=develop * move double check * refine concat_and_split test=develop * add limit_of_temporary_allocation test=develop * fix name test=develop
-
- 14 12月, 2018 1 次提交
-
-
由 peizhilin 提交于
test=develop
-