- 13 Aug 2021: 1 commit

Committed by Tongxin Bai
* OP dot: refactor CPU kernels and get better loop performance.
* Minor fix on code format.
* Fixed minor errors.
* Add new API: einsum
* Update the Einsum unit test. One case failed with matmul_v2 when the dtype is int64:
  a = np.arange(2 * 3 * 1).reshape(2, 3, 1)
  b = np.arange(1)
  paddle.einsum("...i, ...i", a, b)
* Test cases in test_einsum test floating-point dtypes only. As of now, Paddle supports only float/double dtypes in matmul, which is one of the building blocks of this Einsum implementation, so we decided not to test einsum against other dtypes.
* Polish format.
* More formatting.
* Format...
* Einsum: improve test coverage.
* Einsum: bug fixes and more test cases for error messages.
* Einsum: fix format.
* Einsum: fix typo and format.
* Einsum: format again...
* Einsum: apply suggested changes.
* Einsum API: improve API documentation.
* Einsum API: apply suggested changes.
* Einsum API: add dygraph-only note.
* Einsum API: add dygraph-only note.
* Einsum API: fix unit test.
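For reference, the failing case above can be checked against NumPy's `np.einsum`. This is only a sketch that uses NumPy as a reference implementation (an assumption; it is not Paddle's code path). In implicit mode, `...i,...i` sums over the repeated label `i` and keeps the broadcast ellipsis dimensions, so the result should have shape (2, 3) and, because `b` holds the single value 0, every entry is 0.

```python
import numpy as np

# Inputs from the failing case; np.arange yields an integer dtype
# (int64 on most Linux/macOS builds).
a = np.arange(2 * 3 * 1).reshape(2, 3, 1)  # shape (2, 3, 1)
b = np.arange(1)                            # shape (1,), values [0]

# NumPy is used only as a reference here, not Paddle's implementation.
# Implicit mode: the repeated label i is summed over; the ellipsis
# dimensions (2, 3) are broadcast and kept in the output.
out = np.einsum("...i,...i", a, b)

# The same contraction written out: broadcast multiply, then sum the last axis.
expected = (a * b).sum(axis=-1)

print(out.shape)                      # (2, 3)
print(np.array_equal(out, expected))  # True
```

An integer-dtype reference like this is what an int64 einsum test would compare against once matmul supports integer dtypes.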
- 29 Jul 2021: 1 commit

Committed by Zeng Jinle
* add fix_op_run_order pass
* add unit test for fix_op_run_order
* fix CI error
* improve coverage
* improve coverage again and fix CPU test case
* address review comments
- 07 Jul 2021: 1 commit

Committed by feng_shuai
- 24 Jun 2021: 1 commit

Committed by Zhou Wei
* Modify the search order of dynamic libraries
* Modify the search order of dynamic libraries
- 02 Jun 2021: 1 commit

Committed by Qi Li
- 07 May 2021: 1 commit

Committed by LielinJiang
* fix compile error on the Jetson platform
- 06 May 2021: 1 commit

Committed by ronnywang
* fix test_unpool_op
* fix test_inplace_addto_strategy
* fix test_conv2d_fusion_op
* fix test_imperative_lod_tensor_to_selected_rows, test_imperative_selected_rows_to_lod_tensor
* fix test_dot_op
* fix test_correlation_op
* fix tracer
* fix test_memcpy_op
- 29 Apr 2021: 1 commit

Committed by LielinJiang
* add ops read_file and decode_jpeg
- 25 Apr 2021: 1 commit

Committed by Pei Yang
* add TRT runtime version check
* use a different wrap, and change to a major-version check
- 21 Apr 2021: 1 commit

Committed by zhang wenhui
* add allreduce and broadcast without test (#31024)
  add allreduce and broadcast without test
* Refactor HCCLCommContext to be compatible with Paddle (#31359)
  Refactor HCCLCommContext to be compatible with Paddle (#31359)
* [NPU] add npu kernel for communication op (#31437)
* add allreduce and broadcast without test
* add c_broadcast_test case
* build c_comm_init and c_create_group operators
* make the whole thing compile
* add broadcast and init op test case but run failed
* make unit test compile
* fix broadcast test bug and change into hcom for ccl
* change c_comm_init and c_create_group ops accordingly
* make tests compile
* transfer code to 27
* compiled successfully in 28, but run failed
* test broadcast in 28, but failed
* make hcom primitives work
* change hccl data type for base.h
* fix broadcast bug
* make attributes work
* fix group name bug
* add allreduce but test failed
* allreduce bug for qiuliang
* allreduce finished
* add allgather and reducescatter
* merge all op code
* add allgather test
* finish run all ccl op test exclude send/recv
* all all op and test exclude send/recv
* send_v2_npu.cc recv_v2_npu.cc compiled
* fix ccl core dump bug and test allgather, reducescatter, broadcast op
* fix allreduce bug just for test
* hcom send&recv test pass, without hcom_destroy
* for qiuliang test
* Ascend Send&Recv Test Pass
* all op (ex send/recv) ok
* fix bug
* merge all ccl op
* style merge to PaddlePaddle
* merge style
* new merge style
* merge style 2
* insert an empty at the end
* disable ctest for hcom to pass ci
  Co-authored-by: void-main <voidmain1313113@gmail.com>
  Co-authored-by: f2hkop <f2huestc@outlook.com>
* Add auto-increasing tag id for Hcom OPs (#31702)
* add c_reduce_sum op (#31793)
  add c_reduce_sum op
* update Ascendrc hccl to 20.3 (#32126)
  update Ascendrc hccl to 20.3 (#32126)
* fix merge code
* change cmake.txt1
* [NPU] Support npu kernel for c sync stream op (#31386)
* sync stream npu op
* add with_ascend_acl
* update c++ unittest
* compile all failed
* try to pre commit
* after pre commit
* merge&compile&test hccl successfully!
* fix code style
* fix code style
* fix bugs about hccl
* fix some bugs
* fix code style
* fix style
* fix style
* fix
* fixed
* merge develop

Co-authored-by: lw921014 <liuwei921014@yeah.net>
Co-authored-by: Void Main <voidmain1313113@gmail.com>
Co-authored-by: f2hkop <f2huestc@outlook.com>
Co-authored-by: xiayanming <41795079@qq.com>
- 15 Apr 2021: 1 commit

Committed by furnace
* [ROCM] bugfix for test_conv_transpose_nn_grad
* [ROCM] bugfix for test_batch_norm_op_v2
* [ROCM] bugfix for test_empty_like_op
* [ROCM] bugfix for test_conv_transpose_nn_grad
- 09 Apr 2021: 2 commits

Committed by Leo Chen
* [feature] support npu allocator (#30840)
  [feature] support npu allocator
* [feature] support npu operator (#30951)
  [feature] support npu operator
* [feature] support npu allocator, part 2 (#30972)
* support npu allocator
* add npu device context
* fix some compile problem
* fix some compile problem
* add npu info
* compile ok
* fix include dir
* support naive_best_fit_allocator
* run unit tests ok, but failed to exit
* call aclrtResetDevice before exit
* fix aclFinalize
* add system allocator test
* add selected_gpus in gtest
* add tensor_test for npu
* support npu op, initial commit
* add npu stream
* add elementwise_add_op
* compile ok
* fix typo
* fix elementwise_add_op_npu_test
* support op run
* test can run but failed
* change aclopExecuteV2 to aclopCompileAndExecute
* support parsing ascend rank table file (#31000)
  support parsing ascend rank table file
* Fix reshape on GE graph. (#31084)
  Fix reshape on GE graph
* add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)
* add npu sub op
* fix typo
* rename test
* fix bug
* fix bug
* add fp16 kernel
* fix typo
* support sub grad op
* support elementwise_sub_grad op
  Co-authored-by: frankwhzhang <frankwhzhang@126.com>
* Fix compilation problem (#31100)
  Fix compilation problem (#31100)
* fix compile
* fix code style
* remove const_cast
* support adding correct npu op in pybind.h (#31143)
* support adding correct npu op in pybind.h
* refine code
* [NPU] Support executor with NPU (#31057)
* [NPU] Support executor with NPU
* Fix code according to reviews
* Fix code
* Add unittest for sub op npu
* refactor npu device manager (#31154)
  refactor npu device manager (#31154)
* fix selected npus
* fix compile
* fix reading flags from env
* format

Co-authored-by: xiayanming <41795079@qq.com>
Co-authored-by: gongweibao <weibao.gong@gmail.com>
Co-authored-by: frankwhzhang <frankwhzhang@126.com>
Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
Committed by Aurelius84
* Remove old custom OP to reduce whl package volume
* [Custom OP] Remove old custom OP to reduce whl package volume
* support macOS
- 06 Apr 2021: 1 commit

Committed by tianshuo78520a
- 02 Apr 2021: 1 commit

Committed by ronnywang
- 19 Mar 2021: 1 commit

Committed by ronnywang
- 22 Feb 2021: 2 commits
- 04 Feb 2021: 1 commit

Committed by wanghuancoder
* use iwyu to clean includes (second pass), test=develop
- 28 Jan 2021: 1 commit

Committed by Qi Li
* [ROCM] update fluid platform for rocm35 (part1), test=develop
* address review comments, test=develop
- 20 Jan 2021: 1 commit

Committed by wanghuancoder
* delete empty line of pybind.cc, test=develop
* use nvtx push/pop in timeline, test=develop
* change year, test=develop
* add #ifdef PADDLE_WITH_CUDA, test=develop
* add #ifndef WIN32, test=develop
* rename is_pushed to is_pushed_, test=develop
- 06 Jan 2021: 1 commit

Committed by Zhou Wei
* Polish and optimize the print/repr message of all layers
* fix some code format
- 25 Dec 2020: 1 commit

Committed by Chen Weihang
* add support for accumulating complex gradients
* add unittest for coverage
* update test dtype
* remove useless blank line
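- 16 Dec 2020: 1 commit

Committed by Y_Xuan
* Add code supporting the ROCm platform
* Fix some issues
* Resolve some ambiguities and add comments
* Fix code format
* Code changes after resolving conflicts
* Modify operators.cmake
* Fix format
* Fix errors
* Unify interfaces
* Update the date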
- 01 Dec 2020: 1 commit

Committed by chentianyu03
* add complex64 and complex128 types; add +-*/@ and slice operators for complex types
* add test cases for complex elementwise, matmul and getitem unittest
* add test cases for complex types
* add test cases for complex matmul unittest
- 27 Nov 2020: 1 commit

Committed by Zhou Wei
- 23 Nov 2020: 1 commit

Committed by Pei Yang
* change avg pooling and global pooling to TRT layer
* add support for static shape global pooling
* modify TRT error message
- 17 Nov 2020: 1 commit

Committed by lilong12
- 03 Nov 2020: 1 commit

Committed by Shang Zhizhou
* fp16 result ok
* change -DWITH_NVINFER_PLUGIN to config.EnableTensorRtOSS
* auto-detect special slice op converter for ernie with trt oss
* ernie oss only supports fp16
* fix special_slice_plugin serialize bug
* matmul in tensorrt ok
* ernie unittest ok
* add matmul tensorrt unittest
* remove demo code
- 21 Oct 2020: 1 commit

Committed by Zhou Wei
- 19 Oct 2020: 1 commit

Committed by Pei Yang
- 14 Oct 2020: 1 commit

Committed by Zhang Ting
* use exhaustive_search for float16
* tune algo only when dtype is float16
- 28 Sep 2020: 1 commit

Committed by lilong12
* include ncclRecv and ncclSend, test=develop
- 27 Sep 2020: 1 commit

Committed by Li Fuchen
* add float64 input to ctc_loss
* modify error message of warpctc
* update repo and tag of warpctc
* add test for warpctc with float64 input
* modify warpctc.cmake to make sure it always builds
* resolve sample code bug of warpctc
* add core.ops in warpctc dygraph
* fix a test bug
- 24 Sep 2020: 2 commits

Committed by Shibo Tao
* fix tensorrt 6 build error. test=develop
* fix. test=develop
* bug fix
* test=develop
Committed by wanghuancoder
* use iwyu to clean includes, test=develop, test=win
* compilation error, test=develop
* fix compilation error2, test=develop
* fix compilation error3, test=develop
* fix compilation error4, test=develop
* fix compilation error5, test=develop
* fix compilation error6, test=develop
* fix compilation error7, test=develop
* fix compilation error8, test=develop
* fix compilation error8, test=develop
* fix compilation error10, test=develop
* fix compilation error11, test=develop
- 23 Sep 2020: 1 commit

Committed by Shang Zhizhou
* [bug fix]: memory increases after adapting to cuDNN version 8
* [bug fix]: cudnnGetConvolutionForwardAlgorithm not defined
- 18 Sep 2020: 1 commit

Committed by GaoWei8
* fix cuDNN dynamic-load (dyload) error
- 07 Sep 2020: 1 commit

Committed by GaoWei8
* add cuDNN LSTM for padded data and refine cuDNN code
- 03 Sep 2020: 1 commit

Committed by wangchaochaohu