- 03 Dec, 2019 1 commit
-
-
Committed by bingyanghuang
-
- 02 Dec, 2019 1 commit
-
-
Committed by zhaoyuchen2018
* Improve topk performance. Computing topk over 200000 elements cost 1s before the optimization and 0.0028s after.
* Refine return value.
* Add CUDA util functions.
* Fix ComputeBlockSize bug and refine comments.
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
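As a rough illustration of why avoiding a full sort helps topk, the sketch below compares a sort-based baseline with a bounded-heap selection on 200000 values. It is not the CUDA kernel from this commit; the function names `topk_by_sort` and `topk_by_heap` are hypothetical, and only the Python standard library is used.

```python
import heapq
import random


def topk_by_sort(values, k):
    """Baseline: sort every element, then keep the k largest (O(n log n))."""
    return sorted(values, reverse=True)[:k]


def topk_by_heap(values, k):
    """Keep only k candidates in a heap while scanning (O(n log k))."""
    return heapq.nlargest(k, values)


# 200000 elements, matching the input size quoted in the commit message.
random.seed(0)
data = [random.random() for _ in range(200000)]

# Both strategies return the same k largest values in descending order.
assert topk_by_sort(data, 5) == topk_by_heap(data, 5)
```

The heap variant never materializes a fully sorted copy of the input, which is the same general idea a GPU kernel exploits when each thread block keeps only its local k candidates.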
-
- 28 Sep, 2019 1 commit
-
-
Committed by qingqing01
* Writing a custom op needs to follow the framework OP spec.
* Package fluid_framework.so and headers into whl.
* Add paddle.sysconfig.get_include() and paddle.sysconfig.get_lib() to get the include dir and lib dir.
* Export some C-APIs to merge OpInfo between core.so and custom_op.so.
* Add unit testing.
* Update API.spec.
-
- 24 Sep, 2019 1 commit
-
-
Committed by Zeng Jinle
-
- 22 Sep, 2019 1 commit
-
-
Committed by Zeng Jinle
* refine reallocate of workspace size, test=develop
* add lock to cudnn handle calls, test=develop
-
- 18 Sep, 2019 1 commit
-
-
Committed by Zeng Jinle
-
- 11 Sep, 2019 1 commit
-
-
Committed by Huihuang Zheng
TemporaryAllocator is a singleton used for allocating memory for cuDNN. Removing this singleton gives better memory performance. We replace TemporaryAllocator with CUDADeviceContextAllocator and CUDADeviceContextAllocation, which use a stream callback to delete the memory allocated for the stream, so no singleton is needed. Also added data_feed_proto to operator to fix CI in CPU compilation.
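The stream-callback idea in this message can be sketched without a GPU. The example below is a simulation under stated assumptions: `FakeStream` and `StreamScopedAllocator` are hypothetical names, the "stream" is just a FIFO queue, and nothing here reflects the actual CUDADeviceContextAllocator code. It only shows the ordering guarantee: the release is queued behind the work that uses the buffer, so the buffer stays alive until that work has run.

```python
from collections import deque


class FakeStream:
    """Hypothetical stand-in for a CUDA stream: queued items run in FIFO order."""

    def __init__(self):
        self._queue = deque()

    def enqueue(self, fn):
        # Queue a kernel or a callback; nothing runs until synchronize().
        self._queue.append(fn)

    def synchronize(self):
        # Run everything queued so far, in submission order.
        while self._queue:
            self._queue.popleft()()


class StreamScopedAllocator:
    """Hypothetical allocator whose buffers are freed by a stream callback."""

    def __init__(self, stream):
        self._stream = stream
        self.live = set()  # ids of buffers that are currently allocated
        self._next_id = 0

    def allocate(self, size):
        # `size` is ignored in this simulation; only the lifetime is modeled.
        buf_id = self._next_id
        self._next_id += 1
        self.live.add(buf_id)
        return buf_id

    def release_after_stream(self, buf_id):
        # Free the buffer only after all previously queued work has finished.
        self._stream.enqueue(lambda: self.live.discard(buf_id))


stream = FakeStream()
allocator = StreamScopedAllocator(stream)
buf = allocator.allocate(1024)
stream.enqueue(lambda: None)          # placeholder for a kernel that uses `buf`
allocator.release_after_stream(buf)   # queued behind the kernel
assert buf in allocator.live          # still alive: the stream has not run yet
stream.synchronize()
assert buf not in allocator.live      # freed by the callback
```

Because each allocation is tied to the stream that uses it, no process-wide singleton has to track temporary buffers.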
-
- 03 Sep, 2019 1 commit
-
-
Committed by Tao Luo
test=develop
-
- 08 Jul, 2019 1 commit
-
-
Committed by Tao Luo
* add mkldnn shapeblob cache clear strategy test=develop
* refine with comments test=develop
* make cache clear strategy safer test=develop
* add lock for GetShapeBlobSize test=develop
-
- 03 Jul, 2019 1 commit
-
-
Committed by Tao Luo
test=develop
-
- 02 Jul, 2019 1 commit
-
-
Committed by Leo Zhao
* rename mkldnn set/get_cur_thread_id() to set/get_cur_mkldnn_session_id() test=develop
* update session id definition and adjust logic for default behavior test=develop
* reset logic in mkldnn reuse, as most cases work in default mode. test=develop
-
- 27 Jun, 2019 1 commit
-
-
Committed by Michał Gallus
test=develop
-
- 28 Apr, 2019 1 commit
-
-
Committed by Huihuang Zheng
1. Use CudnnWorkspaceHandle in exhaustive search of conv_cudnn.
2. For Ops using CudnnWorkspaceHandle in exhaustive search, release their GPU memory after exhaustive search.
test=develop
-
- 21 Apr, 2019 1 commit
-
-
Committed by Zeng Jinle
* speedup gc and inplace softmax_with_cross_entropy_grad test=develop
* refine models gpu mem Merge skip vars and warning messages of mem opt remove relu mem opt test=develop
* follow comments test=develop
-
- 25 Mar, 2019 1 commit
-
-
Committed by nhzlx
test=develop
-
- 21 Mar, 2019 1 commit
-
-
Committed by Wu Yi
-
- 20 Mar, 2019 2 commits
-
-
Committed by nhzlx
-
Committed by Wu Yi
* wip allreduce in op
* wip
* wip
* wip
* wip adding test
* wip for conflict with mp mode
* fix tests test=develop
* fix cpu build test=develop
* fix travis clang format test=develop
* fix cpu build test=develop
* update api.spec test=develop
* delete comment test=develop
* fix cpplint test=develop
* fix test=develop
* follow comment test=develop
* add file test=develop
* fix build test=develop
* update test=develop
* to be compatible with sync_bn, and fix mp mode in develop test=develop
-
- 19 Mar, 2019 1 commit
-
-
Committed by zhhsplendid
test=develop
-
- 16 Mar, 2019 1 commit
-
-
Committed by qingqing01
test=develop
-
- 15 Mar, 2019 1 commit
-
-
Committed by qingqing01
* Support Sync Batch Norm.
* Note: do not enable it when running on a single device.
Usage:
    build_strategy = fluid.BuildStrategy()
    build_strategy.sync_batch_norm = True
    binary = fluid.compiler.CompiledProgram(tp).with_data_parallel(
        loss_name=loss_mean.name, build_strategy=build_strategy)
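The commit message does not describe how the statistics are synchronized, so the following is only a sketch of the general idea behind synchronized batch normalization, assuming each device contributes its element count, sum, and sum of squares, and that these are added together (as an all-reduce would do). The helper names `local_stats` and `synced_mean_var` are hypothetical, and plain Python lists stand in for per-device batches.

```python
def local_stats(batch):
    """Per-device partial statistics: count, sum, and sum of squares."""
    return len(batch), sum(batch), sum(x * x for x in batch)


def synced_mean_var(per_device_batches):
    """Combine partial statistics from all devices into one mean and variance."""
    stats = [local_stats(batch) for batch in per_device_batches]
    count = sum(s[0] for s in stats)
    total = sum(s[1] for s in stats)
    total_sq = sum(s[2] for s in stats)
    mean = total / count
    # Biased (population) variance: E[x^2] - E[x]^2.
    var = total_sq / count - mean * mean
    return mean, var


# Two simulated devices, each holding part of the global batch.
device_batches = [[1.0, 2.0], [3.0, 4.0]]
mean, var = synced_mean_var(device_batches)
assert abs(mean - 2.5) < 1e-12
assert abs(var - 1.25) < 1e-12
```

Without synchronization, the first device alone would normalize with mean 1.5 and variance 0.25. With a single device there is nothing to combine, which is consistent with the note above about not enabling the option in that case.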
-
- 14 Jan, 2019 1 commit
-
- 11 Jan, 2019 2 commits
-
-
Committed by chengduozh
test=develop
This reverts commit 064512aa.
-
Committed by chengduo
* remove workspace_handle in conv2d_cudnn test=develop
* remove workspace_handle test=develop
* fix bug test=develop
* make test_conv2d_op SERIAL test=develop
* save memory in conv_cudnn test=develop
* enhance thread safety test=develop
* enhance temporary allocator test=develop
* Add excess fraction test=develop
* follow comments test=develop
* fix bug and code refine test=develop
* fix memory size check test=develop
* rename reuse_tmp_allocation_excess_fraction test=develop
-
- 08 Jan, 2019 2 commits
-
-
Committed by sneaxiy
test=develop
-
Committed by Zeng Jinle
test=develop
-
- 02 Jan, 2019 1 commit
-
-
Committed by sneaxiy
test=develop
-
- 29 Dec, 2018 1 commit
-
-
Committed by sneaxiy
test=develop
-
- 25 Dec, 2018 1 commit
-
-
Committed by chengduo
* refine tensor test=develop
* refine tensor test=develop
* fix device_context log test=develop
-
- 21 Dec, 2018 1 commit
-
-
Committed by chengduo
* Add Temporal Allocator
* add Temporary Allocator to DeviceContext test=develop
* code refine test=develop
* fix mean_iou test=develop
* Add DeviceTemporaryAllocator test=develop
* fix conv_op bug test=develop
* small fix test=develop
* code refine test=develop
* log refine test=develop
* fix unit test test=develop
* move double check
* refine concat_and_split test=develop
* add limit_of_temporary_allocation test=develop
* fix name test=develop
-
- 11 Dec, 2018 1 commit
-
-
Committed by Yu Yang
The macro should be defined by the compiler rather than by the source. test=develop
-
- 03 Dec, 2018 1 commit
-
-
Committed by sneaxiy
-
- 22 Nov, 2018 1 commit
-
-
Committed by chengduo
* refine cublas test=develop
* code refine
* refine cublas
* add GEMM_EX
* add enable_cublas_tensor_op_math doc and add cublasCall test=develop
* fix CublasCall for cuda version test=develop
* fix error test=develop
* fix GEMM_EX to be compatible with gcc 4.8 test=develop
* add GEMM_EX test=develop
* be compatible with gcc 4.8 test=develop
-
- 15 Nov, 2018 1 commit
-
-
Committed by Yu Yang
-
- 08 Nov, 2018 2 commits
-
-
Committed by peizhilin
-
Committed by Zhaolong Xing
-
- 07 Nov, 2018 1 commit
-
-
Committed by Yu Yang
test=develop
-
- 06 Nov, 2018 1 commit
-
-
Committed by sneaxiy
test=develop
-
- 31 Oct, 2018 1 commit
-
-
Committed by Yu Yang
* feat(platform): lazy initialization of devicecontext in pool
  Use std::async(deferer, []{...}) to lazy initialize DeviceContext in Pool test=develop
* Add future includes test=develop
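The lazy-initialization pattern named in this message can be sketched in Python. The commit itself uses C++ `std::async` with a deferred launch policy; the version below is only an analogue under stated assumptions: `LazyContextPool` and the factory callables are hypothetical, and a lock stands in for the once-only evaluation a deferred future provides.

```python
import threading


class LazyContextPool:
    """Hypothetical pool that builds each context on first access only."""

    def __init__(self, factories):
        # Map from place name to a zero-argument callable that builds the context.
        self._factories = dict(factories)
        self._contexts = {}
        self._lock = threading.Lock()

    def get(self, place):
        # Build the context at most once, even with concurrent callers.
        with self._lock:
            if place not in self._contexts:
                self._contexts[place] = self._factories[place]()
            return self._contexts[place]


created = []


def make_context(place):
    # Record construction so the laziness is observable.
    created.append(place)
    return {"place": place}


pool = LazyContextPool({
    "cpu": lambda: make_context("cpu"),
    "gpu:0": lambda: make_context("gpu:0"),
})

assert created == []                     # nothing is built when the pool is created
ctx = pool.get("cpu")
assert created == ["cpu"]                # only the requested context was built
assert pool.get("cpu") is ctx            # later calls reuse the same object
```

Deferring construction means a place that is registered but never used (here `gpu:0`) costs nothing at startup.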
-
- 30 Oct, 2018 1 commit
-
-
Committed by dzhwinter
-