- 25 2月, 2021 1 次提交
-
-
由 liu zhengxi 提交于
* add get_cublas_handle() api * update format * add unittests * alter function name
-
- 20 1月, 2021 2 次提交
-
-
由 AshburnLee 提交于
* Add tf32 support for A100 tensor core acceleration for cuBLAS (#28732) * Fixed an error * Fixed an error
-
由 AshburnLee 提交于
This PR is cherry-picked from PR: #29192 Function: Added TF32 switch for cuDNN. Turned on as default, turned off when users set the switch as False
-
- 29 12月, 2020 1 次提交
-
-
由 liuyuhui 提交于
* [Kunlun] PR1:Support one Kunlun card training in parallel executor (#29337) * [Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor (#29574) * [Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor (#29926) * add bkcl.so in whl for kunlun (#29947) * [Kunlun] bug fix of PR2: Support MultiDevicePass and BKCL in parallel executor (#29961) Co-authored-by: NQingshuChen <qingshu.chen714@gmail.com>
-
- 21 12月, 2020 1 次提交
-
-
由 Jacek Czaja 提交于
-
- 17 12月, 2020 1 次提交
-
-
由 arlesniak 提交于
fix #27935 (comment) by QA @OliverLPH (Could you add some MKLDNN-related print log when use FLAGS_use_mkldnn?)
-
- 25 11月, 2020 1 次提交
-
-
由 wawltor 提交于
remove eigen threadpool for the speed up
-
- 23 11月, 2020 1 次提交
-
-
由 Jacek Czaja 提交于
-
- 02 11月, 2020 1 次提交
-
-
由 Huihuang Zheng 提交于
This PR is follow up of #28213. On that PR we tried to decrease GPU usage, however the CI still randomly failed. So I added retry logic for the initialization of nccl and cusolver. If the initialization failed, we can retry to avoid the random failure.
-
- 24 9月, 2020 1 次提交
-
-
由 wanghuancoder 提交于
* use iwyu clean include, test=develop, test=win * compilation error, test=develop * fix compilation error2, test=develop * fix compilation error3, test=develop * fix compilation error4, test=develop * fix compilation error5, test=develop * fix compilation error6, test=develop * fix compilation error7, test=develop * fix compilation error8, test=develop * fix compilation error8, test=develop * fix compilation error10, test=develop * fix compilation error11, test=develop
-
- 17 9月, 2020 1 次提交
-
-
由 Jack Zhou 提交于
enhance reduce op which can reduce tensor with arbitrary rank
-
- 21 8月, 2020 2 次提交
-
-
由 Adam 提交于
* Add mechanism for blocking oneDNN cache clearing * Review changes and Add thread guards
-
由 QingshuChen 提交于
* support Baidu AI Accelerator * test=kunlun * minor * test=kunlun * support xpu op in separate file * test=kunlun * update XPU error message and remove duplicated code * test=kunlun * minor * test=kunlun * minor * test=kunlun
-
- 03 7月, 2020 1 次提交
-
-
由 GaoWei8 提交于
* fix PADDLE_ENFORCE and refine the description test=develop
-
- 14 5月, 2020 1 次提交
-
-
由 pawelpiotrowicz 提交于
test=develop
-
- 24 4月, 2020 1 次提交
-
-
由 Guo Sheng 提交于
* Add cholesky_op forward part. test=develop * Complete cholesky_op forward part. test=develop * Add cholesky_op backward part. test=develop * Complete cholesky_op backward part. test=develop * Refine cholesky_op error check and docs. test=develop * Add grad_check unit test for cholesky_op. test=develop * Fix sample code in cholesky doc. test=develop * Refine some error messages of cholesky_op. test=develop * Refine some error messages of cholesky_op. test=develop * Remove unused input in cholesky_grad. test=develop * Remove unused input in cholesky_grad. test=develop * Fix stream for cusolverDnSetStream. test=develop * Update PADDLE_ENFORCE_CUDA_SUCCESS from cholesky_op to adapt to latest code. test=develop * Add CUSOLVER ERROR in enforce.h test=develop * Fix the missing return value in cholesky. test=develop
-
- 23 4月, 2020 1 次提交
-
-
由 石晓伟 提交于
-
- 20 4月, 2020 1 次提交
-
-
由 Zhou Wei 提交于
* Optimize the error messages of paddle CUDA API, test=develop * fix the error messages of paddle CUDA API, test=develop * Refactoring PADDLE_ENFORCE_CUDA_SUCCESS, and apply to curand/cudnn/cublas/NCCL,test=develop * remove build_ex_string,test=develop * merge conflict,test=develop
-
- 17 4月, 2020 1 次提交
-
-
由 石晓伟 提交于
* supports thread-binding stream, test=develop * avoid using thread_local variables in dtor, test=develop * modify the stream priority enum, test=develop
-
- 01 4月, 2020 1 次提交
-
-
由 石晓伟 提交于
-
- 30 3月, 2020 1 次提交
-
-
由 石晓伟 提交于
-
- 05 2月, 2020 1 次提交
-
-
由 Wilber 提交于
cmake选项中添加了WITH_NCCL,显示指定是否编译NCCL的部分代码,WITH_NCCL默认打开,但如果WITH_GPU为OFF,则关闭WITH_NCCL 添加了PADDLE_WITH_NCCL定义 单机单卡能够关闭NCCL编译,多卡的话需要默认打开NCCL,如果关闭NCCL,则只能使用单卡 Co-authored-by: N石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
-
- 08 1月, 2020 1 次提交
-
-
由 zhaoyuchen2018 提交于
stack's wait cost a lot of cpu time, use cuda kernel to do memory copy will reduce cpu time. Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
-
- 29 11月, 2019 1 次提交
-
-
由 Jacek Czaja 提交于
-
- 18 11月, 2019 1 次提交
-
-
由 Zeng Jinle 提交于
* fix warnings oof gcc 8 compilation, test=develop * fix boost::bad_get, test=develop * refine PADDLE_ENFORCE, test=develop
-
- 14 11月, 2019 1 次提交
-
-
由 zhaoyuchen2018 提交于
* Improve topk performance. give 200000 data to compute topk, before opt: cost 1s after opt: cost 0.0028s. * Refine return value. * Add cuda util funtions. * Fix ComputeBlockSize bug & refine comments. Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
-
- 28 9月, 2019 1 次提交
-
-
由 qingqing01 提交于
* How to write custom op needs to follow framework OP spec. * Package fluid_framework.so and headers into whl. * Add paddle.sysconfig.get_include() and paddle.sysconfig.get_lib() to get include dir and lib dir. * Export some C-APIs to merge OpInfo between core.so and custom_op.so. * Add unit testing. * Update API.spec.
-
- 24 9月, 2019 1 次提交
-
-
由 Zeng Jinle 提交于
-
- 22 9月, 2019 1 次提交
-
-
由 Zeng Jinle 提交于
* refine reallocate of workspace size, test=develop * add lock to cudnn handle calls, test=develop
-
- 18 9月, 2019 1 次提交
-
-
由 Zeng Jinle 提交于
-
- 11 9月, 2019 1 次提交
-
-
由 Huihuang Zheng 提交于
TemporaryAllocator is a singleton used for allocating memory for Cudnn. Since it is a singleton, we can delete it for better performance in memory. We replace TemporaryAllocator by CUDADeviceContextAllocator and CUDADeviceContextAllocation, which uses stream callback to delete the memory allocated for the stream to avoid singleton. Also added data_feed_proto to operator to fix CI in CPU compilation
-
- 03 9月, 2019 1 次提交
-
-
由 Tao Luo 提交于
test=develop
-
- 08 7月, 2019 1 次提交
-
-
由 Tao Luo 提交于
* add mkldnn shapeblob cache clear strategy test=develop * refine with comments test=develop * make cache clear strategy more safey test=develop * add lock for GetShapeBlobSize test=develop
-
- 03 7月, 2019 1 次提交
-
-
由 Tao Luo 提交于
test=develop
-
- 02 7月, 2019 1 次提交
-
-
由 Leo Zhao 提交于
* rename mkldnn set/get_cur_thread_id() to set/get_cur_mkldnn_session_id() test=develop * update session id definition and adjust logic for default behavior test=develop * reset logic in mkldnn reuse as most of cases work in default. test=develop
-
- 27 6月, 2019 1 次提交
-
-
由 Michał Gallus 提交于
test=develop
-
- 28 4月, 2019 1 次提交
-
-
由 Huihuang Zheng 提交于
1. Use CudnnWorkspaceHandle in exhaustive search of conv_cudnn. 2. For Ops using CudnnWorkspaceHandle in exhaustive search, release their GPU memory after exhaustive search. test=develop
-
- 21 4月, 2019 1 次提交
-
-
由 Zeng Jinle 提交于
* speedup gc and inplace softmax_with_cross_entropy_grad test=develop * refine models gpu mem Merge skip vars and warning messages of mem opt remove relu mem opt test=develop * follow comments test=develop
-
- 25 3月, 2019 1 次提交
-
-
由 nhzlx 提交于
test=develop
-
- 21 3月, 2019 1 次提交
-
-
由 Wu Yi 提交于
-