- 29 12月, 2020 1 次提交
-
-
由 liuyuhui 提交于
* [Kunlun] PR1:Support one Kunlun card training in parallel executor (#29337) * [Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor (#29574) * [Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor (#29926) * add bkcl.so in whl for kunlun (#29947) * [Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor (#29961) Co-authored-by: NQingshuChen <qingshu.chen714@gmail.com>
-
- 17 12月, 2020 1 次提交
-
-
由 arlesniak 提交于
fix #27935 (comment) by QA @OliverLPH (Could you add some MKLDNN-related print log when use FLAGS_use_mkldnn?)
-
- 26 11月, 2020 1 次提交
-
-
由 Aurelius84 提交于
-
- 25 11月, 2020 1 次提交
-
-
由 wawltor 提交于
remove eigen threadpool for the speed up
-
- 29 9月, 2020 1 次提交
-
-
由 123malin 提交于
* test=develop, optimize geo communicator
-
- 17 9月, 2020 1 次提交
-
-
由 Jack Zhou 提交于
enhance reduce op which can reduce tensor with arbitrary rank
-
- 21 8月, 2020 2 次提交
-
-
由 Adam 提交于
* Add mechanism for blocking oneDNN cache clearing * Review changes and Add thread guards
-
由 QingshuChen 提交于
* support Baidu AI Accelerator * test=kunlun * minor * test=kunlun * support xpu op in separate file * test=kunlun * update XPU error message and remove duplicated code * test=kunlun * minor * test=kunlun * minor * test=kunlun
-
- 15 7月, 2020 1 次提交
-
-
由 GaoWei8 提交于
* Refine PADDLE_ENFORCE in paddle/fluid/platform test=develop
-
- 07 7月, 2020 1 次提交
-
-
由 GaoWei8 提交于
* refine PADDLE_ENFORCE test=develop
-
- 03 6月, 2020 1 次提交
-
-
由 Chen Weihang 提交于
* remove REPLACE_ENFORCE_GLOG compile option & add ci rule prohibit LOG(FATAL) using, test=develop * remove ci test case, test=develop * replace all LOG(FATAL) & polish message, test=develop * fix typo, test=develop * polish error info detail, test=develop
-
- 14 5月, 2020 1 次提交
-
-
由 pawelpiotrowicz 提交于
test=develop
-
- 11 5月, 2020 1 次提交
-
-
由 Chen Weihang 提交于
* add new macro BOOST_GET_SAFELY & unittests, test=develop * add different macro type, test=develop * fix get macro type in executor, test=develop * four macro part change backup * using one macro for all case, test=develop * revert attribute change, test=develop * change to three func to solve gcc4.8 bug, test=develop * polish some details, test=develop
-
- 28 4月, 2020 1 次提交
-
-
由 Sylwester Fraczek 提交于
-
- 24 4月, 2020 1 次提交
-
-
由 Guo Sheng 提交于
* Add cholesky_op forward part. test=develop * Complete cholesky_op forward part. test=develop * Add cholesky_op backward part. test=develop * Complete cholesky_op backward part. test=develop * Refine cholesky_op error check and docs. test=develop * Add grad_check unit test for cholesky_op. test=develop * Fix sample code in cholesky doc. test=develop * Refine some error messages of cholesky_op. test=develop * Refine some error messages of cholesky_op. test=develop * Remove unused input in cholesky_grad. test=develop * Remove unused input in cholesky_grad. test=develop * Fix stream for cusolverDnSetStream. test=develop * Update PADDLE_ENFORCE_CUDA_SUCCESS from cholesky_op to adapt to latest code. test=develop * Add CUSOLVER ERROR in enforce.h test=develop * Fix the missing return value in cholesky. test=develop
-
- 23 4月, 2020 1 次提交
-
-
由 石晓伟 提交于
-
- 18 4月, 2020 1 次提交
-
-
由 Zhang Ting 提交于
* update eigen, test=develop * remove patches, test=develop * add definition of -fabi-version, test=develop * add patch for TensorBlock.h, test=develop * test windows, test=develop * only update eigen for Linux, test=develop * add code comments, test=develop
-
- 17 4月, 2020 1 次提交
-
-
由 石晓伟 提交于
* supports thread-binding stream, test=develop * avoid using thread_local variables in dtor, test=develop * modify the stream priority enum, test=develop
-
- 01 4月, 2020 1 次提交
-
-
由 石晓伟 提交于
-
- 31 3月, 2020 1 次提交
-
-
由 Yi Liu 提交于
As nccl comm is not created by CUDADeviceContext, it should be destroyed by the creator as the best practice of RAII.
-
- 30 3月, 2020 1 次提交
-
-
由 石晓伟 提交于
-
- 05 2月, 2020 1 次提交
-
-
由 Wilber 提交于
cmake选项中添加了WITH_NCCL,显示指定是否编译NCCL的部分代码,WITH_NCCL默认打开,但如果WITH_GPU为OFF,则关闭WITH_NCCL 添加了PADDLE_WITH_NCCL定义 单机单卡能够关闭NCCL编译,多卡的话需要默认打开NCCL,如果关闭NCCL,则只能使用单卡 Co-authored-by: N石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
-
- 08 1月, 2020 1 次提交
-
-
由 zhaoyuchen2018 提交于
stack's wait cost a lot of cpu time, use cuda kernel to do memory copy will reduce cpu time. Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
-
- 10 12月, 2019 1 次提交
-
-
由 Adam 提交于
* MKLDNN v1.0 rebase to Paddle 1.6 test=develop * Add hacky paddle::string::to_string() implementation * vectorize<int64-t>() -> vectorize() cleanup test=develop * PADDLE_ENFORCE and void_cast fixes test=develop * Rebase changes test=develop * Cosmetics test=develop * Delete MKL from mkldnn.cmake test=develop * CMake debug commands test=develop * Delete MKLDNN_VERBOSE and rebase fixes test=develop * Rebase fixes test=develop * Temporarily disable int8 resnet101 vgg16 and vgg19 tests test=develop * Add libmkldnn.so.1 to python setup test=develop * Add libmkldnn.so.1 to inference_lib cmake after rebase test=develop * Post rebase fixes + FC int8 changes test=develop * Fix LRN NHWC test=develop * Fix NHWC conv3d test=develop * Windows build fix + next conv3d fix test=develop * Fix conv2d on AVX2 machines test=develop
-
- 06 12月, 2019 1 次提交
-
-
由 Zeng Jinle 提交于
-
- 29 11月, 2019 1 次提交
-
-
由 Jacek Czaja 提交于
-
- 18 11月, 2019 1 次提交
-
-
由 liuwei1031 提交于
cudaStreamSynchronize randomly hang when used in multi-thread environment, replace it with cudaStreamQuery API on windows
-
- 14 11月, 2019 1 次提交
-
-
由 zhaoyuchen2018 提交于
* Improve topk performance. give 200000 data to compute topk, before opt: cost 1s after opt: cost 0.0028s. * Refine return value. * Add cuda util funtions. * Fix ComputeBlockSize bug & refine comments. Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
-
- 24 9月, 2019 1 次提交
-
-
由 Zeng Jinle 提交于
-
- 22 9月, 2019 1 次提交
-
-
由 Zeng Jinle 提交于
* refine reallocate of workspace size, test=develop * add lock to cudnn handle calls, test=develop
-
- 11 9月, 2019 1 次提交
-
-
由 Huihuang Zheng 提交于
TemporaryAllocator is a singleton used for allocating memory for Cudnn. Since it is a singleton, we can delete it for better performance in memory. We replace TemporaryAllocator by CUDADeviceContextAllocator and CUDADeviceContextAllocation, which uses stream callback to delete the memory allocated for the stream to avoid singleton. Also added data_feed_proto to operator to fix CI in CPU compilation
-
- 12 8月, 2019 1 次提交
-
-
由 gongweibao 提交于
Polish fleet API to support cuda collective mode and nccl2 mode
-
- 11 7月, 2019 1 次提交
-
-
由 Tao Luo 提交于
* add config.SetMkldnnCacheCapacity api for mkldnn cache clear strategy test=develop * enhance MkldnnPostReset test=develop * add comments for mkldnn_cache_capacity field test=develop
-
- 08 7月, 2019 1 次提交
-
-
由 Tao Luo 提交于
* add mkldnn shapeblob cache clear strategy test=develop * refine with comments test=develop * make cache clear strategy more safey test=develop * add lock for GetShapeBlobSize test=develop
-
- 03 7月, 2019 1 次提交
-
-
由 Tao Luo 提交于
test=develop
-
- 02 7月, 2019 1 次提交
-
-
由 Leo Zhao 提交于
* rename mkldnn set/get_cur_thread_id() to set/get_cur_mkldnn_session_id() test=develop * update session id definition and adjust logic for default behavior test=develop * reset logic in mkldnn reuse as most of cases work in default. test=develop
-
- 27 6月, 2019 1 次提交
-
-
由 Michał Gallus 提交于
test=develop
-
- 18 6月, 2019 1 次提交
-
-
由 chengduo 提交于
* remove nccl dep when the number of GPU is 1 test=develop
-
- 10 6月, 2019 1 次提交
-
-
由 Zeng Jinle 提交于
* remove attribute in Allocator::Allocate, test=develop * fix travis ci error, test=develop
-
- 07 6月, 2019 1 次提交
-
-
由 Zeng Jinle 提交于
* fix cuda/cudnn version detection error, test=develop * fix again, test=develop
-