- 26 3月, 2021 1 次提交
-
-
由 Leo Chen 提交于
* support GarbageCollector for npu * fix typo * fix gather_grad * disable NPUDefaultStreamGarbageCollector on NPU
-
- 02 3月, 2021 1 次提交
-
-
由 Void Main 提交于
Refactor HCCLCommContext to be compatible with Paddle (#31359)
-
- 09 2月, 2021 3 次提交
-
-
由 Leo Chen 提交于
* support npu allocator * add npu device context * fix some compile problem * fix some compile problem * add npu info * compile ok * fix include dir * support naive_best_fit_allocator * run ut ok, bug failed to exit * call aclrtResetDevice before exit * fix aclFinilize * add system allocatot test * add selected_gpus in gtest * add tensor_test for npu * support npu op, initial commit * add npu stream * add elementwise_add_op * compile ok * fix typo * fix elementwise_add_op_npu_test * support op run * test can run but failed * change aclopExecuteV2 to aclopCompileAndExecute
-
由 Leo Chen 提交于
[feature] support npu operator
-
由 Leo Chen 提交于
[feature] support npu allocator
-
- 11 1月, 2021 1 次提交
-
-
由 AshburnLee 提交于
-
- 28 12月, 2020 1 次提交
-
-
由 liuyuhui 提交于
-
- 26 12月, 2020 1 次提交
-
-
由 liuyuhui 提交于
-
- 23 12月, 2020 1 次提交
-
-
由 Jacek Czaja 提交于
-
- 15 12月, 2020 1 次提交
-
-
由 AshburnLee 提交于
-
- 14 12月, 2020 2 次提交
-
-
由 arlesniak 提交于
-
由 Jacek Czaja 提交于
-
- 25 11月, 2020 1 次提交
-
-
由 wawltor 提交于
remove eigen threadpool for the speed up
-
- 23 11月, 2020 1 次提交
-
-
由 Jacek Czaja 提交于
-
- 02 11月, 2020 1 次提交
-
-
由 Huihuang Zheng 提交于
This PR is follow up of #28213. On that PR we tried to decrease GPU usage, however the CI still randomly failed. So I added retry logic for the initialization of nccl and cusolver. If the initialization failed, we can retry to avoid the random failure.
-
- 24 9月, 2020 1 次提交
-
-
由 wanghuancoder 提交于
* use iwyu clean include, test=develop, test=win * compilation error, test=develop * fix compilation error2, test=develop * fix compilation error3, test=develop * fix compilation error4, test=develop * fix compilation error5, test=develop * fix compilation error6, test=develop * fix compilation error7, test=develop * fix compilation error8, test=develop * fix compilation error8, test=develop * fix compilation error10, test=develop * fix compilation error11, test=develop
-
- 17 9月, 2020 1 次提交
-
-
由 Jack Zhou 提交于
enhance reduce op which can reduce tensor with arbitrary rank
-
- 21 8月, 2020 2 次提交
-
-
由 Adam 提交于
* Add mechanism for blocking oneDNN cache clearing * Review changes and Add thread guards
-
由 QingshuChen 提交于
* support Baidu AI Accelerator * test=kunlun * minor * test=kunlun * support xpu op in separate file * test=kunlun * update XPU error message and remove duplicated code * test=kunlun * minor * test=kunlun * minor * test=kunlun
-
- 03 7月, 2020 1 次提交
-
-
由 GaoWei8 提交于
* fix PADDLE_ENFORCE and refine the description test=develop
-
- 14 5月, 2020 1 次提交
-
-
由 pawelpiotrowicz 提交于
test=develop
-
- 24 4月, 2020 1 次提交
-
-
由 Guo Sheng 提交于
* Add cholesky_op forward part. test=develop * Complete cholesky_op forward part. test=develop * Add cholesky_op backward part. test=develop * Complete cholesky_op backward part. test=develop * Refine cholesky_op error check and docs. test=develop * Add grad_check unit test for cholesky_op. test=develop * Fix sample code in cholesky doc. test=develop * Refine some error messages of cholesky_op. test=develop * Refine some error messages of cholesky_op. test=develop * Remove unused input in cholesky_grad. test=develop * Remove unused input in cholesky_grad. test=develop * Fix stream for cusolverDnSetStream. test=develop * Update PADDLE_ENFORCE_CUDA_SUCCESS from cholesky_op to adapt to latest code. test=develop * Add CUSOLVER ERROR in enforce.h test=develop * Fix the missing return value in cholesky. test=develop
-
- 23 4月, 2020 1 次提交
-
-
由 石晓伟 提交于
-
- 20 4月, 2020 1 次提交
-
-
由 Zhou Wei 提交于
* Optimize the error messages of paddle CUDA API, test=develop * fix the error messages of paddle CUDA API, test=develop * Refactoring PADDLE_ENFORCE_CUDA_SUCCESS, and apply to curand/cudnn/cublas/NCCL,test=develop * remove build_ex_string,test=develop * merge conflict,test=develop
-
- 17 4月, 2020 1 次提交
-
-
由 石晓伟 提交于
* supports thread-binding stream, test=develop * avoid using thread_local variables in dtor, test=develop * modify the stream priority enum, test=develop
-
- 01 4月, 2020 1 次提交
-
-
由 石晓伟 提交于
-
- 30 3月, 2020 1 次提交
-
-
由 石晓伟 提交于
-
- 05 2月, 2020 1 次提交
-
-
由 Wilber 提交于
cmake选项中添加了WITH_NCCL,显示指定是否编译NCCL的部分代码,WITH_NCCL默认打开,但如果WITH_GPU为OFF,则关闭WITH_NCCL 添加了PADDLE_WITH_NCCL定义 单机单卡能够关闭NCCL编译,多卡的话需要默认打开NCCL,如果关闭NCCL,则只能使用单卡 Co-authored-by: N石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
-
- 08 1月, 2020 1 次提交
-
-
由 zhaoyuchen2018 提交于
stack's wait cost a lot of cpu time, use cuda kernel to do memory copy will reduce cpu time. Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
-
- 29 11月, 2019 1 次提交
-
-
由 Jacek Czaja 提交于
-
- 18 11月, 2019 1 次提交
-
-
由 Zeng Jinle 提交于
* fix warnings oof gcc 8 compilation, test=develop * fix boost::bad_get, test=develop * refine PADDLE_ENFORCE, test=develop
-
- 14 11月, 2019 1 次提交
-
-
由 zhaoyuchen2018 提交于
* Improve topk performance. give 200000 data to compute topk, before opt: cost 1s after opt: cost 0.0028s. * Refine return value. * Add cuda util funtions. * Fix ComputeBlockSize bug & refine comments. Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
-
- 28 9月, 2019 1 次提交
-
-
由 qingqing01 提交于
* How to write custom op needs to follow framework OP spec. * Package fluid_framework.so and headers into whl. * Add paddle.sysconfig.get_include() and paddle.sysconfig.get_lib() to get include dir and lib dir. * Export some C-APIs to merge OpInfo between core.so and custom_op.so. * Add unit testing. * Update API.spec.
-
- 24 9月, 2019 1 次提交
-
-
由 Zeng Jinle 提交于
-
- 22 9月, 2019 1 次提交
-
-
由 Zeng Jinle 提交于
* refine reallocate of workspace size, test=develop * add lock to cudnn handle calls, test=develop
-
- 18 9月, 2019 1 次提交
-
-
由 Zeng Jinle 提交于
-
- 11 9月, 2019 1 次提交
-
-
由 Huihuang Zheng 提交于
TemporaryAllocator is a singleton used for allocating memory for Cudnn. Since it is a singleton, we can delete it for better performance in memory. We replace TemporaryAllocator by CUDADeviceContextAllocator and CUDADeviceContextAllocation, which uses stream callback to delete the memory allocated for the stream to avoid singleton. Also added data_feed_proto to operator to fix CI in CPU compilation
-
- 03 9月, 2019 1 次提交
-
-
由 Tao Luo 提交于
test=develop
-
- 08 7月, 2019 1 次提交
-
-
由 Tao Luo 提交于
* add mkldnn shapeblob cache clear strategy test=develop * refine with comments test=develop * make cache clear strategy more safey test=develop * add lock for GetShapeBlobSize test=develop
-
- 03 7月, 2019 1 次提交
-
-
由 Tao Luo 提交于
test=develop
-