- 07 Apr 2021 (1 commit)
  - Committed by Leo Chen
    * fix compile problem on CANN 20.3
    * fix ut
    * fix test_mul
    * fix check_finite_and_scale
    * fix lookup_table_v2_grad
    * fix cmake
    * support print op
- 01 Apr 2021 (3 commits)
  - Committed by liym27
  - Committed by Leo Chen
    * enable async copy and add wait before sync operation
    * remove unnecessary wait
    * add FillNpuTensorWithConstant
    * refine
    * fix fill_constant
    * make TensorFromVector/TensorToVector sync
  - Committed by Leo Chen
    * support npu profiler
    * add python api
    * fix bugs
    * add wrapper for incomplete type
    * update profile proto
    * record npu wait
    * add xpu placeholder
- 26 Mar 2021 (1 commit)
  - Committed by Leo Chen
    * support GarbageCollector for npu
    * fix typo
    * fix gather_grad
    * disable NPUDefaultStreamGarbageCollector on NPU
- 18 Mar 2021 (1 commit)
  - Committed by Void Main
- 15 Mar 2021 (1 commit)
  - Committed by Leo Chen
- 08 Mar 2021 (1 commit)
  - Committed by lw921014
    * add allreduce and broadcast without test
    * add c_broadcast_test case
    * build c_comm_init and c_create_group operators
    * make the whole thing compile
    * add broadcast and init op test case but run failed
    * make unit test compile
    * fix broadcast test bug and change into hcom for ccl
    * change c_comm_init and c_create_group ops accordingly
    * make tests compile
    * transfer code to 27
    * compiled successfully in 28, but run failed
    * test broadcast in 28, but failed
    * make hcom primitives work
    * change hccl data type for base.h
    * fix broadcast bug
    * make attributes work
    * fix group name bug
    * add allreduce but test failed
    * allreduce bug for qiuliang
    * allreduce finished
    * add allgather and reducescatter
    * merge all op code
    * add allgather test
    * finish running all ccl op tests, excluding send/recv
    * add all ops and tests, excluding send/recv
    * send_v2_npu.cc and recv_v2_npu.cc compiled
    * fix ccl core dump bug and test allgather, reducescatter, broadcast op
    * fix allreduce bug just for test
    * hcom send&recv test pass, without hcom_destroy
    * for qiuliang test
    * Ascend Send&Recv Test Pass
    * all op (ex send/recv) ok
    * fix bug
    * merge all ccl op
    * style merge to PaddlePaddle
    * merge style
    * new merge style
    * merge style 2
    * insert an empty line at the end
    * disable ctest for hcom to pass ci
    Co-authored-by: void-main <voidmain1313113@gmail.com>
    Co-authored-by: f2hkop <f2huestc@outlook.com>
- 02 Mar 2021 (1 commit)
  - Committed by Void Main
    Refactor HCCLCommContext to be compatible with Paddle (#31359)
- 01 Mar 2021 (1 commit)
  - Committed by lw921014
    add allreduce and broadcast without test
- 25 Feb 2021 (2 commits)
- 23 Feb 2021 (1 commit)
  - Committed by Leo Chen
    Fix compilation problem (#31100)
- 09 Feb 2021 (3 commits)
  - Committed by Leo Chen
    * support npu allocator
    * add npu device context
    * fix some compile problems
    * fix some compile problems
    * add npu info
    * compile ok
    * fix include dir
    * support naive_best_fit_allocator
    * run ut ok, but failed to exit
    * call aclrtResetDevice before exit
    * fix aclFinalize
    * add system allocator test
    * add selected_gpus in gtest
    * add tensor_test for npu
    * support npu op, initial commit
    * add npu stream
    * add elementwise_add_op
    * compile ok
    * fix typo
    * fix elementwise_add_op_npu_test
    * support op run
    * test can run but failed
    * change aclopExecuteV2 to aclopCompileAndExecute
  - Committed by Leo Chen
    [feature] support npu operator
  - Committed by Leo Chen
    [feature] support npu allocator
- 21 Jan 2021 (1 commit)
  - Committed by gongweibao
    Add distribution support
- 15 Jan 2021 (1 commit)
  - Committed by 石晓伟
- 13 Jan 2021 (2 commits)
  - Committed by Huihuang Zheng
    The usleep function in <unistd.h> only accepts arguments less than 1,000,000. The current call can exceed this limit, so we have to fix it. This PR should fix the random CI errors.
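The limit described in this commit can be illustrated with a minimal sketch, assuming a POSIX system with <unistd.h>: split the delay into whole seconds for sleep() and a sub-second remainder for usleep(), so usleep() never receives 1,000,000 or more. SplitDelay, SplitMicroseconds and SleepMicroseconds are hypothetical names, not the code from this PR.

```cpp
#include <unistd.h>  // sleep, usleep, useconds_t

#include <cstdint>

// Hypothetical result type: whole seconds plus a remainder that is
// always < 1,000,000 microseconds, the documented bound for usleep().
struct SplitDelay {
  unsigned int seconds;
  useconds_t micros;
};

// Hypothetical helper: break a microsecond delay into the two parts.
inline SplitDelay SplitMicroseconds(std::uint64_t total_us) {
  SplitDelay d;
  d.seconds = static_cast<unsigned int>(total_us / 1000000);
  d.micros = static_cast<useconds_t>(total_us % 1000000);
  return d;
}

// Hypothetical wrapper: sleep() takes the whole seconds and usleep() only
// ever sees the sub-second remainder. sleep() can return early if a signal
// interrupts it; this sketch does not retry in that case.
inline void SleepMicroseconds(std::uint64_t total_us) {
  SplitDelay d = SplitMicroseconds(total_us);
  if (d.seconds > 0) sleep(d.seconds);
  if (d.micros > 0) usleep(d.micros);
}
```

With this split, a 2,500,000 microsecond delay becomes sleep(2) followed by usleep(500000).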
  - Committed by QingshuChen
    * optimize memcpy perf for kunlun
    * remove useless unittest for kunlun mean
    * minor
- 12 Jan 2021 (1 commit)
  - Committed by Chen Weihang
- 11 Jan 2021 (1 commit)
  - Committed by AshburnLee
- 09 Jan 2021 (1 commit)
  - Committed by Jacek Czaja
    * Added UT for testing elementwise_mul caching
    * lint fixes
- 07 Jan 2021 (1 commit)
  - Committed by WeiXin
- 06 Jan 2021 (1 commit)
  - Committed by Zhou Wei
    * Polish and optimize the print/repr messages of all layers
    * fix some code formatting
- 29 Dec 2020 (2 commits)
  - Committed by 石晓伟
  - Committed by Huihuang Zheng
    PADDLE_RETRY_CUDA_SUCCESS used the wrong sleep time, which could cause timeouts in unit tests. This PR fixes it. After checking the documentation (https://pubs.opengroup.org/onlinepubs/7908799/xsh/unistd.h.html), we found that sleep in unistd.h takes seconds and usleep takes microseconds, whereas Sleep in windows.h takes milliseconds.
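The unit mismatch described in this commit can be made explicit with a small sketch that accepts milliseconds and converts them to whatever unit each platform call expects. SleepMilliseconds, MsToSeconds and MsToMicroseconds are hypothetical helpers, not Paddle's implementation. The POSIX branch also keeps the usleep() argument below 1,000,000.

```cpp
#include <cstdint>
#ifdef _WIN32
#include <windows.h>  // Sleep(DWORD) takes milliseconds
#else
#include <unistd.h>   // sleep() takes seconds, usleep() takes microseconds
#endif

// Hypothetical conversions from a millisecond delay.
constexpr std::uint64_t MsToSeconds(std::uint64_t ms) { return ms / 1000; }
constexpr std::uint64_t MsToMicroseconds(std::uint64_t ms) { return ms * 1000; }

// Hypothetical portable wrapper: callers always pass milliseconds.
inline void SleepMilliseconds(std::uint64_t ms) {
#ifdef _WIN32
  Sleep(static_cast<DWORD>(ms));  // already in milliseconds
#else
  // Whole seconds go to sleep(); the sub-second part goes to usleep()
  // in microseconds, which is at most 999,000.
  sleep(static_cast<unsigned int>(MsToSeconds(ms)));
  usleep(static_cast<useconds_t>(MsToMicroseconds(ms % 1000)));
#endif
}
```

In portable C++11 code, std::this_thread::sleep_for with a std::chrono duration avoids the per-platform unit question, because the unit is part of the duration's type.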
- 28 Dec 2020 (3 commits)
- 26 Dec 2020 (1 commit)
  - Committed by liuyuhui
- 25 Dec 2020 (2 commits)
  - Committed by LielinJiang
    * enable bilateral_slice unittest on Windows
    * reduce max threads
  - Committed by Chen Weihang
    * add support for complex grad accumulation
    * add unittest for coverage
    * update test dtype
    * remove useless blank line
- 24 Dec 2020 (1 commit)
  - Committed by Wilber
- 23 Dec 2020 (1 commit)
  - Committed by Jacek Czaja
- 21 Dec 2020 (1 commit)
  - Committed by Huihuang Zheng
    Add retry logic to CublasHandleHolder to avoid random unit test failures.
- 19 Dec 2020 (1 commit)
  - Committed by Jacek Czaja
    * Reimplemented elementwise_add grad
    * lint
    * fix after review
    * fix to fix after review
- 18 Dec 2020 (1 commit)
  - Committed by Aurelius84
- 17 Dec 2020 (2 commits)
  - Committed by wanghuancoder
    * generate PDB and dump files on Windows for debugging
    * fix code style, test=develop
    * modify CMakeLists
  - Committed by Huihuang Zheng
    Modify CublasHandleHolder to use PADDLE_RETRY_CUDA_SUCCESS instead of PADDLE_ENFORCE_CUDA_SUCCESS to fix a random unit test failure. The unit test log showed a CUDA allocation error in this file, which may be due to insufficient GPU memory. We fixed similar failures this way in the past, so we applied PADDLE_RETRY_CUDA_SUCCESS here.
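The difference between an enforce-style check and a retrying check can be sketched as follows. Instead of failing on the first non-success status, the call is re-issued a few times with a pause in between. This can help when the failure is transient, such as a temporary shortage of GPU memory. RetryUntilSuccess, the Status alias, the retry count and the delay are hypothetical. This is not Paddle's PADDLE_RETRY_CUDA_SUCCESS macro, which may behave differently.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical status type: 0 means success, matching the convention
// that cudaSuccess and CUBLAS_STATUS_SUCCESS are both 0.
using Status = int;
constexpr Status kSuccess = 0;

// Minimal retry sketch: call `op` once, then retry up to `max_retries`
// more times while it keeps failing, sleeping `delay` before each retry.
// The last status is returned so the caller can still raise an error.
inline Status RetryUntilSuccess(const std::function<Status()>& op,
                                int max_retries,
                                std::chrono::milliseconds delay) {
  Status s = op();
  for (int i = 0; i < max_retries && s != kSuccess; ++i) {
    std::this_thread::sleep_for(delay);
    s = op();
  }
  return s;
}
```

Returning the final status, rather than aborting inside the helper, leaves the fatal check to the caller once the retries are exhausted.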