- 12 Apr, 2021: 2 commits

Committed by Leo Chen
* fix NPUDeviceContext in all C++ unittests
* refine log

Co-authored-by: pangyoki <pangyoki@126.com>

Committed by Leo Chen
* change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance
* refine code

- 07 Apr, 2021: 1 commit

Committed by Leo Chen
* fix compile problem on CANN 20.3
* fix ut
* fix test_mul
* fix check_finite_and_scale
* fix lookup_table_v2_grad
* fix cmake
* support print op

- 01 Apr, 2021: 3 commits

Committed by liym27

Committed by Leo Chen
* enable async copy and add wait before sync operation
* remove unnecessary wait
* add FillNpuTensorWithConstant
* refine
* fix fill_constant
* make TensorFromVector/TensorToVector sync

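The change above pairs asynchronous copies with an explicit wait before any synchronous read. A minimal sketch of that ordering, assuming a `std::async` task as a stand-in for an NPU stream; `AsyncCopy` and `CopyToVectorSync` are hypothetical helpers, not Paddle's `TensorFromVector`/`TensorToVector`:

```cpp
#include <cassert>
#include <future>
#include <vector>

// Stand-in for an asynchronous device copy: the copy runs on another thread
// and the returned future is the only signal that it has finished.
std::future<void> AsyncCopy(const std::vector<float>& src, std::vector<float>& dst) {
  return std::async(std::launch::async,
                    [&src, &dst] { dst.assign(src.begin(), src.end()); });
}

// Synchronous wrapper: issue the async copy, then wait on it before the
// result is read, so callers always see a completed copy.
std::vector<float> CopyToVectorSync(const std::vector<float>& src) {
  std::vector<float> dst;
  std::future<void> done = AsyncCopy(src, dst);
  done.wait();  // synchronization point: the copy is complete after this line
  assert(dst.size() == src.size());
  return dst;
}
```

The explicit `wait()` is the synchronization point: it guarantees the copy has finished before `dst` is read or returned.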
Committed by Leo Chen
* support npu profiler
* add python api
* fix bugs
* add wrapper for incomplete type
* update profile proto
* record npu wait
* add xpu placeholder

- 26 Mar, 2021: 1 commit

Committed by Leo Chen
* support GarbageCollector for npu
* fix typo
* fix gather_grad
* disable NPUDefaultStreamGarbageCollector on NPU

- 18 Mar, 2021: 1 commit

Committed by Void Main

- 15 Mar, 2021: 1 commit

Committed by Leo Chen

- 08 Mar, 2021: 1 commit

Committed by lw921014
* add allreduce and broadcast without test
* add c_broadcast_test case
* build c_comm_init and c_create_group operators
* make the whole thing compile
* add broadcast and init op test case but run failed
* make unit test compile
* fix broadcast test bug and change into hcom for ccl
* change c_comm_init and c_create_group ops accordingly
* make tests compile
* transfer code to 27
* compiled successfully in 28, but run failed
* test broadcast in 28, but failed
* make hcom primitives work
* change hccl data type for base.h
* fix broadcast bug
* make attributes work
* fix group name bug
* add allreduce but test failed
* allreduce bug for qiuliang
* allreduce finished
* add allgather and reducescatter
* merge all op code
* add allgather test
* finish running all ccl op tests excluding send/recv
* add all ops and tests excluding send/recv
* send_v2_npu.cc recv_v2_npu.cc compiled
* fix ccl core dump bug and test allgather, reducescatter, broadcast op
* fix allreduce bug just for test
* hcom send&recv test pass, without hcom_destroy
* for qiuliang test
* Ascend Send&Recv Test Pass
* all ops (except send/recv) ok
* fix bug
* merge all ccl op
* style merge to PaddlePaddle
* merge style
* new merge style
* merge style 2
* insert an empty line at the end
* disable ctest for hcom to pass CI

Co-authored-by: void-main <voidmain1313113@gmail.com>
Co-authored-by: f2hkop <f2huestc@outlook.com>

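The collectives added in this commit follow the standard semantics: all-reduce leaves every rank with the element-wise reduction of all ranks' buffers, and it can be decomposed into a reduce-scatter followed by an all-gather. A single-process sketch of those semantics (sum only, ranks simulated as vectors); the helpers are hypothetical and do not call HCCL:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One buffer per simulated rank (in a real run each rank is a separate device).
using Buffers = std::vector<std::vector<float>>;

// Reduce-scatter (sum): rank r ends up with the element-wise sum of chunk r
// taken from every rank. Assumes equal-length buffers divisible by the rank count.
Buffers ReduceScatterSum(const Buffers& in) {
  const std::size_t ranks = in.size();
  assert(ranks > 0 && in[0].size() % ranks == 0);
  const std::size_t chunk = in[0].size() / ranks;
  Buffers out(ranks, std::vector<float>(chunk, 0.0f));
  for (std::size_t r = 0; r < ranks; ++r)
    for (std::size_t src = 0; src < ranks; ++src)
      for (std::size_t i = 0; i < chunk; ++i)
        out[r][i] += in[src][r * chunk + i];
  return out;
}

// All-gather: every rank receives the concatenation of all ranks' buffers.
Buffers AllGather(const Buffers& in) {
  std::vector<float> gathered;
  for (const auto& part : in) gathered.insert(gathered.end(), part.begin(), part.end());
  return Buffers(in.size(), gathered);
}

// All-reduce (sum) expressed as reduce-scatter followed by all-gather.
Buffers AllReduceSum(const Buffers& in) { return AllGather(ReduceScatterSum(in)); }
```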
- 02 Mar, 2021: 1 commit

Committed by Void Main
Refactor HCCLCommContext to be compatible with Paddle (#31359)

- 01 Mar, 2021: 1 commit

Committed by lw921014
add allreduce and broadcast without test

- 25 Feb, 2021: 2 commits
- 23 Feb, 2021: 1 commit

Committed by Leo Chen
Fix compilation problem (#31100)

- 09 Feb, 2021: 3 commits

Committed by Leo Chen
* support npu allocator
* add npu device context
* fix some compile problems
* fix some compile problems
* add npu info
* compile ok
* fix include dir
* support naive_best_fit_allocator
* run ut ok, but failed to exit
* call aclrtResetDevice before exit
* fix aclFinalize
* add system allocator test
* add selected_gpus in gtest
* add tensor_test for npu
* support npu op, initial commit
* add npu stream
* add elementwise_add_op
* compile ok
* fix typo
* fix elementwise_add_op_npu_test
* support op run
* test can run but failed
* change aclopExecuteV2 to aclopCompileAndExecute

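`naive_best_fit_allocator` is named after the best-fit policy: serve each request from the smallest free block that is large enough. A minimal sketch of that policy over a fixed arena, using offsets instead of real device pointers; `BestFitArena` is hypothetical and is not Paddle's allocator:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>

// Hypothetical best-fit allocator over a contiguous arena of `capacity` bytes.
// Free blocks are indexed by size, so lower_bound finds the smallest fit.
class BestFitArena {
 public:
  explicit BestFitArena(std::size_t capacity) { free_.emplace(capacity, 0); }

  // Returns the offset of the allocated block, or nullopt if nothing fits.
  std::optional<std::size_t> Allocate(std::size_t size) {
    auto it = free_.lower_bound(size);  // smallest free block with block_size >= size
    if (it == free_.end()) return std::nullopt;
    const std::size_t block_size = it->first;
    const std::size_t offset = it->second;
    free_.erase(it);
    if (block_size > size) free_.emplace(block_size - size, offset + size);  // keep remainder
    return offset;
  }

  // Returns a block to the free list (no coalescing, to keep the sketch short).
  void Free(std::size_t offset, std::size_t size) { free_.emplace(size, offset); }

 private:
  std::multimap<std::size_t, std::size_t> free_;  // block size -> offset
};
```

Indexing free blocks by size in a `std::multimap` makes the best-fit lookup a single `lower_bound`; a production allocator would also merge adjacent free blocks, which this sketch omits.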
Committed by Leo Chen
[feature] support npu operator

Committed by Leo Chen
[feature] support npu allocator

- 21 Jan, 2021: 1 commit

Committed by gongweibao
Add distribution support

- 15 Jan, 2021: 1 commit

Committed by 石晓伟

- 13 Jan, 2021: 2 commits

Committed by Huihuang Zheng
The usleep function in <unistd.h> only accepts arguments less than 1,000,000 (microseconds). The current call can exceed this limit, so it has to be fixed. This PR fixes a random CI error.

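One way to respect that limit is to split a long wait into whole seconds for `sleep()` and a sub-second remainder for `usleep()`. A sketch, assuming POSIX `<unistd.h>`; `SplitMicroseconds` and `SleepMicroseconds` are hypothetical helpers and not necessarily what the PR did:

```cpp
#include <cassert>
#include <unistd.h>

// A wait split into the two units the POSIX calls accept.
struct SleepParts {
  unsigned int seconds;  // for sleep()
  useconds_t micros;     // for usleep(); always < 1,000,000
};

// Hypothetical helper: split a microsecond count so that the usleep() part
// stays below the 1,000,000 limit POSIX places on its argument.
SleepParts SplitMicroseconds(unsigned long long total_us) {
  constexpr unsigned long long kMicrosPerSecond = 1000000ULL;
  SleepParts parts;
  parts.seconds = static_cast<unsigned int>(total_us / kMicrosPerSecond);
  parts.micros = static_cast<useconds_t>(total_us % kMicrosPerSecond);
  assert(parts.micros < kMicrosPerSecond);
  return parts;
}

// Sleep for total_us microseconds without ever passing >= 1,000,000 to usleep().
// Note: this sketch does not resume if sleep() is interrupted by a signal.
void SleepMicroseconds(unsigned long long total_us) {
  const SleepParts parts = SplitMicroseconds(total_us);
  if (parts.seconds > 0) sleep(parts.seconds);
  if (parts.micros > 0) usleep(parts.micros);
}
```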
Committed by QingshuChen
* optimize memcpy perf for kunlun
* remove useless unittest for kunlun mean
* minor

- 12 Jan, 2021: 1 commit

Committed by Chen Weihang

- 11 Jan, 2021: 1 commit

Committed by AshburnLee

- 09 Jan, 2021: 1 commit

Committed by Jacek Czaja
* Added UT for testing elementwise_mul caching
* lint fixes

- 07 Jan, 2021: 1 commit

Committed by WeiXin

- 06 Jan, 2021: 1 commit

Committed by Zhou Wei
* Polish and optimize the print/repr messages of all layers
* fix some code formatting

- 29 Dec, 2020: 2 commits

Committed by 石晓伟

Committed by Huihuang Zheng
PADDLE_RETRY_CUDA_SUCCESS used the wrong sleep time, which could cause timeouts in unittests. This PR fixes it. According to the documentation at https://pubs.opengroup.org/onlinepubs/7908799/xsh/unistd.h.html, sleep in unistd.h takes seconds and usleep takes microseconds, while Sleep in windows.h takes milliseconds.

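Because each of these calls takes a different unit, a unit mix-up compiles silently. As an alternative (not what the PR used), `std::chrono` durations carry the unit in their type, and `std::this_thread::sleep_for` behaves the same on POSIX and Windows. A sketch with a hypothetical 20 ms retry interval:

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical retry interval, stated once with its unit attached.
constexpr std::chrono::milliseconds kRetryWait{20};

// The same interval converted to the unit each legacy call expects.
constexpr long long kForUsleep =  // usleep(): microseconds
    std::chrono::duration_cast<std::chrono::microseconds>(kRetryWait).count();
constexpr long long kForWindowsSleep = kRetryWait.count();  // Sleep(): milliseconds
constexpr long long kForPosixSleep =  // sleep(): seconds, truncates to 0 here
    std::chrono::duration_cast<std::chrono::seconds>(kRetryWait).count();

// Portable wait: sleep_for accepts any duration type and converts as needed.
void WaitBeforeRetry() { std::this_thread::sleep_for(kRetryWait); }
```

`kForPosixSleep` shows how handing a millisecond-scale wait to `sleep()` can round it down to no wait at all.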
- 28 Dec, 2020: 3 commits
- 26 Dec, 2020: 1 commit

Committed by liuyuhui

- 25 Dec, 2020: 2 commits

Committed by LielinJiang
* enable bilateral_slice unittest on the Windows platform
* reduce max threads

Committed by Chen Weihang
* add support for complex grad accumulation
* add unittest for coverage
* update test dtype
* remove useless blank line

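Gradient accumulation sums the partial gradients a variable receives from each of its uses; supporting complex dtypes means that sum must also work on complex values. A minimal sketch, assuming flat buffers of equal length; `AccumulateGrad` is hypothetical and not Paddle's gradient accumulator:

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: a variable used by several ops receives one partial
// gradient per use, and its total gradient is their sum. The same template
// works for float and std::complex<float>, whose operator+= adds the real and
// imaginary parts separately.
template <typename T>
void AccumulateGrad(std::vector<T>& acc, const std::vector<T>& partial) {
  assert(acc.size() == partial.size());
  for (std::size_t i = 0; i < acc.size(); ++i) acc[i] += partial[i];
}
```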
- 24 Dec, 2020: 1 commit

Committed by Wilber

- 23 Dec, 2020: 1 commit

Committed by Jacek Czaja

- 21 Dec, 2020: 1 commit

Committed by Huihuang Zheng
Add retry logic to CublasHandlerHolder to avoid random unittest failures.

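The retry idea can be sketched generically: repeat an operation that may fail transiently, sleep between attempts, and give up after a fixed budget. Here `op` stands in for creating the cuBLAS handle; `RetryUntilSuccess`, the retry count, and the wait are hypothetical and are not Paddle's actual values:

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical retry wrapper: run `op` up to 1 + max_retries times, sleeping
// `wait` between failed attempts. Returns true as soon as one attempt succeeds.
bool RetryUntilSuccess(const std::function<bool()>& op, int max_retries,
                       std::chrono::milliseconds wait) {
  assert(max_retries >= 0);
  for (int attempt = 0; attempt <= max_retries; ++attempt) {
    if (op()) return true;
    if (attempt < max_retries) std::this_thread::sleep_for(wait);  // no sleep after last try
  }
  return false;
}
```

In a real handle holder the callable would wrap the handle-creation call and check its status code; that wiring is left out here.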
- 19 Dec, 2020: 1 commit

Committed by Jacek Czaja
* Reimplemented elementwise_add grad
* lint
* fix after review
* fix to the fix after review

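For reference, the math an elementwise_add gradient kernel has to implement is simple: the upstream gradient passes straight through to X, and for a broadcast Y it is summed over the broadcast axis. A plain C++ sketch for one 2-D broadcast case; `ElementwiseAddGrad` is hypothetical and is not the reimplemented kernel from this commit:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Gradient of Out = X + Y, with X of shape [rows, cols] (row-major) and Y of
// shape [cols] broadcast across rows. The partial derivatives of Out with
// respect to X and Y are identity maps, so:
//   dX    = dOut               (same shape as X)
//   dY[j] = sum_i dOut[i][j]   (reduce over the broadcast axis)
void ElementwiseAddGrad(const std::vector<float>& dout, std::size_t rows,
                        std::size_t cols, std::vector<float>& dx,
                        std::vector<float>& dy) {
  assert(dout.size() == rows * cols);
  dx = dout;
  dy.assign(cols, 0.0f);
  for (std::size_t i = 0; i < rows; ++i)
    for (std::size_t j = 0; j < cols; ++j) dy[j] += dout[i * cols + j];
}
```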
- 18 Dec, 2020: 1 commit

Committed by Aurelius84