- 02 Dec, 2019 1 commit
-
-
Committed by zhaoyuchen2018
* Improve topk performance: computing topk over 200000 elements cost 1s before the optimization and 0.0028s after.
* Refine return value.
* Add CUDA utility functions.
* Fix ComputeBlockSize bug and refine comments.
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
-
- 25 Nov, 2019 1 commit
-
-
Committed by liuwei1031
cudaStreamSynchronize randomly hangs when used in a multi-threaded environment; replace it with the cudaStreamQuery API on Windows.
-
- 24 Sep, 2019 1 commit
-
-
Committed by Zeng Jinle
-
- 22 Sep, 2019 1 commit
-
-
Committed by Zeng Jinle
* refine reallocation of workspace size, test=develop
* add lock to cudnn handle calls, test=develop
-
- 11 Sep, 2019 1 commit
-
-
Committed by Huihuang Zheng
TemporaryAllocator is a singleton used to allocate memory for cuDNN. Because it is a singleton, removing it improves memory performance. This commit replaces TemporaryAllocator with CUDADeviceContextAllocator and CUDADeviceContextAllocation, which use a stream callback to free the memory allocated for the stream and so avoid the singleton. It also adds data_feed_proto to operator to fix CI in CPU compilation.
-
- 12 Aug, 2019 1 commit
-
-
Committed by gongweibao
Polish fleet API to support CUDA collective mode and nccl2 mode
-
- 11 Jul, 2019 1 commit
-
-
Committed by Tao Luo
* add config.SetMkldnnCacheCapacity api for mkldnn cache clear strategy test=develop
* enhance MkldnnPostReset test=develop
* add comments for mkldnn_cache_capacity field test=develop
-
- 08 Jul, 2019 1 commit
-
-
Committed by Tao Luo
* add mkldnn shapeblob cache clear strategy test=develop
* refine with comments test=develop
* make cache clear strategy safer test=develop
* add lock for GetShapeBlobSize test=develop
-
- 03 Jul, 2019 1 commit
-
-
Committed by Tao Luo
test=develop
-
- 02 Jul, 2019 1 commit
-
-
Committed by Leo Zhao
* rename mkldnn set/get_cur_thread_id() to set/get_cur_mkldnn_session_id() test=develop
* update session id definition and adjust logic for default behavior test=develop
* reset logic in mkldnn reuse, as most cases work with the default. test=develop
-
- 27 Jun, 2019 1 commit
-
-
Committed by Michał Gallus
test=develop
-
- 18 Jun, 2019 1 commit
-
-
Committed by chengduo
* remove nccl dependency when the number of GPUs is 1 test=develop
-
- 10 Jun, 2019 1 commit
-
-
Committed by Zeng Jinle
* remove attribute in Allocator::Allocate, test=develop
* fix travis ci error, test=develop
-
- 07 Jun, 2019 1 commit
-
-
Committed by Zeng Jinle
* fix cuda/cudnn version detection error, test=develop
* fix again, test=develop
-
- 28 Mar, 2019 1 commit
-
-
Committed by gongweibao
-
- 25 Mar, 2019 1 commit
-
-
Committed by nhzlx
test=develop
-
- 20 Mar, 2019 1 commit
-
-
Committed by nhzlx
-
- 19 Mar, 2019 1 commit
-
-
Committed by zhhsplendid
test=develop
-
- 16 Mar, 2019 1 commit
-
-
Committed by qingqing01
test=develop
-
- 15 Mar, 2019 1 commit
-
-
Committed by qingqing01
* Support Sync Batch Norm.
* Note: do not enable it on a single device.

Usage:

    build_strategy = fluid.BuildStrategy()
    build_strategy.sync_batch_norm = True
    binary = fluid.compiler.CompiledProgram(tp).with_data_parallel(
        loss_name=loss_mean.name,
        build_strategy=build_strategy)
-
- 22 Feb, 2019 1 commit
-
-
Committed by Sylwester Fraczek
Reason: dereferencing a smart pointer is the same as dereferencing the underlying pointer.
test=develop
-
- 19 Feb, 2019 1 commit
-
-
Committed by sneaxiy
test=develop
-
- 16 Jan, 2019 1 commit
-
-
Committed by minqiyang
-
- 11 Jan, 2019 3 commits
-
-
Committed by chengduozh
test=develop
-
Committed by chengduozh
test=develop
This reverts commit 064512aa.
-
Committed by chengduo
* remove workspace_handle in conv2d_cudnn test=develop
* remove workspace_handle test=develop
* fix bug test=develop
* make test_conv2d_op SERIAL test=develop
* save memory in conv_cudnn test=develop
* enhance thread safety test=develop
* enhance temporary allocator test=develop
* Add excess fraction test=develop
* follow comments test=develop
* fix bug and code refine test=develop
* fix memory size check test=develop
* rename reuse_tmp_allocation_excess_fraction test=develop
-
- 08 Jan, 2019 2 commits
-
-
Committed by sneaxiy
test=develop
-
Committed by Zeng Jinle
test=develop
-
- 07 Jan, 2019 1 commit
-
-
Committed by sneaxiy
-
- 02 Jan, 2019 1 commit
-
-
Committed by sneaxiy
test=develop
-
- 29 Dec, 2018 1 commit
-
-
Committed by sneaxiy
test=develop
-
- 25 Dec, 2018 1 commit
-
-
Committed by chengduo
* refine tensor test=develop
* refine tensor test=develop
* fix device_context log test=develop
-
- 21 Dec, 2018 1 commit
-
-
Committed by chengduo
* Add Temporary Allocator
* add Temporary Allocator to DeviceContext test=develop
* code refine test=develop
* fix mean_iou test=develop
* Add DeviceTemporaryAllocator test=develop
* fix conv_op bug test=develop
* small fix test=develop
* code refine test=develop
* log refine test=develop
* fix unit test test=develop
* move double check
* refine concat_and_split test=develop
* add limit_of_temporary_allocation test=develop
* fix name test=develop
-
- 14 Dec, 2018 1 commit
-
-
Committed by Yan Chunwei
-
- 10 Dec, 2018 1 commit
-
-
Committed by sneaxiy
test=develop
-
- 06 Dec, 2018 1 commit
-
-
Committed by sneaxiy
test=develop
-
- 05 Dec, 2018 1 commit
-
-
Committed by sneaxiy
test=develop
-
- 14 Nov, 2018 1 commit
-
-
Committed by Yu Yang
-
- 09 Nov, 2018 1 commit
-
-
Committed by qingqing01
* Exhaustive search for cuDNN conv.
* Refine code and add unit testing.
* Fix model load in fluid/inference and unit testing in conv2d.
* Follow comments.
* Fix compiling test=develop
-
- 08 Nov, 2018 1 commit
-
-
Committed by Zhaolong Xing
-