- 21 Apr, 2022 1 commit
-
-
Committed by Chen Weihang
* [Phi] Support setting size of vector<Tensor> for out in yaml (#41576)
* support setting vector out size in yaml
* support setting size of vector<tensor> for out in yaml
* resolve conflict
Co-authored-by: zyfncg <zhangyunfei07@baidu.com>
-
- 08 Mar, 2022 1 commit
-
-
Committed by Aganlengzi
* [custom kernel] Upgrade support for multi libs
* upgrade phi_custom_kernel deps
-
- 23 Feb, 2022 1 commit
-
-
Committed by chentianyu03
-
- 22 Feb, 2022 1 commit
-
-
Committed by chentianyu03
-
- 27 Jan, 2022 1 commit
-
-
Committed by Aganlengzi
* [Demo] custom kernel based on pten kernel
* merge and npu custom work well
* del comments
* delete other code
* fix CUDAContext
* fix not found small_vector.h
* support NPU
* fix NPUContext
* fix DeviceContext support
* add UT
* fix call
* add UT
* fix
* fix for comments and ut
* add MACRO control
* fix multi input output
* support env CUSTOM_DEVICE_ROOT
* deal with special cases
* fix for Windows
* try coverage with test_custom_kernel_dot.py
* fix test_custom_kernel_dot
* fix test_custom_kernel_dot
* fix merge
* fix merge
* fix CI
* update
* merge and fix
* remove WITH_CUSTOM_KERNEL
* fix merge
* merge and fix
* fix ut
* fix ut for mac
* add more UT
* add more UT
* fix
-
- 27 Nov, 2021 1 commit
-
-
Committed by Aganlengzi
* [NPU] reorganization for device API abstraction
* [NPU] delete old files
* [NPU] fix npu_collective_helper
* [NPU] fix collective_helper
* [NPU] fix ut
* [NPU] mod memory allocation and hccl_helper
* [NPU] fix place_type
* [NPU] split enforce.h
* move acl* call into npu_info
* merge conflict
* fix merge
* merge conflict
* merge conflict
-
- 28 Sep, 2021 1 commit
-
-
Committed by Leo Chen
* read envs in flags_map
* add flags to undefok
-
- 27 Sep, 2021 1 commit
-
- 26 Sep, 2021 1 commit
-
-
Committed by Leo Chen
-
- 10 Sep, 2021 1 commit
-
-
Committed by chentianyu03
* add llvm::SmallVector to paddle
* rename small vector file
* merge paddle small vector to one file
* add small_vector_test
* modify smallvector test argument type
* add string header
-
- 30 Jul, 2021 1 commit
-
-
Committed by Leo Chen
-
- 22 Apr, 2021 1 commit
-
-
Committed by tianshuo78520a
-
- 09 Apr, 2021 1 commit
-
-
Committed by Leo Chen
* [feature] support npu allocator (#30840) [feature] support npu allocator
* [feature] support npu operator (#30951) [feature] support npu operator
* [feature] support npu allocator, part 2 (#30972)
* support npu allocator
* add npu device context
* fix some compile problem
* fix some compile problem
* add npu info
* compile ok
* fix include dir
* support naive_best_fit_allocator
* run ut ok, but failed to exit
* call aclrtResetDevice before exit
* fix aclFinalize
* add system allocator test
* add selected_gpus in gtest
* add tensor_test for npu
* support npu op, initial commit
* add npu stream
* add elementwise_add_op
* compile ok
* fix typo
* fix elementwise_add_op_npu_test
* support op run
* test can run but failed
* change aclopExecuteV2 to aclopCompileAndExecute
* support parsing ascend rank table file (#31000) support parsing ascend rank table file
* Fix reshape on GE graph. (#31084) Fix reshape on GE graph
* add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)
* add npu sub op
* fix typo
* rename test
* fix bug
* fix bug
* add fp16 kernel
* fix typo
* support sub grad op
* support elementwise_sub_grad op
Co-authored-by: frankwhzhang <frankwhzhang@126.com>
* Fix compilation problem (#31100) Fix compilation problem (#31100)
* fix compile
* fix code style
* remove const_cast
* support adding correct npu op in pybind.h (#31143)
* support adding correct npu op in pybind.h
* refine code
* [NPU] Support executor with NPU (#31057)
* [NPU] Support executor with NPU
* Fix code according to reviews
* Fix code
* Add unittest for sub op npu
* refactor npu device manager (#31154) refactor npu device manager (#31154)
* fix selected npus
* fix compile
* fix reading flags from env
* format
Co-authored-by: xiayanming <41795079@qq.com>
Co-authored-by: gongweibao <weibao.gong@gmail.com>
Co-authored-by: frankwhzhang <frankwhzhang@126.com>
Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
-
- 15 Jan, 2021 1 commit
-
-
Committed by 石晓伟
-
- 12 Jan, 2021 1 commit
-
-
Committed by tangwei12
* rename sendrecv.proto to namespace paddle.distributed
* split ps with distributed
-
- 20 Nov, 2020 1 commit
-
-
Committed by gongweibao
-
- 24 Sep, 2020 1 commit
-
-
Committed by wanghuancoder
* use iwyu clean include, test=develop, test=win
* compilation error, test=develop
* fix compilation error2, test=develop
* fix compilation error3, test=develop
* fix compilation error4, test=develop
* fix compilation error5, test=develop
* fix compilation error6, test=develop
* fix compilation error7, test=develop
* fix compilation error8, test=develop
* fix compilation error8, test=develop
* fix compilation error10, test=develop
* fix compilation error11, test=develop
-
- 01 Jun, 2020 1 commit
-
-
Committed by Wilber
-
- 11 Sep, 2019 1 commit
-
-
Committed by Huihuang Zheng
TemporaryAllocator is a singleton used to allocate memory for cuDNN. Because it is a singleton, removing it improves memory behavior. This change replaces TemporaryAllocator with CUDADeviceContextAllocator and CUDADeviceContextAllocation, which use a stream callback to free the memory allocated for a stream, so no singleton is needed. It also adds data_feed_proto to operator to fix CI for CPU compilation.
-
- 02 Aug, 2019 1 commit
-
-
Committed by Krzysztof Binias
* Fix memory leak in test test=develop
* Fix memory leak in test test=develop
* Fix memory leak in test test=develop
* Pull out vars of the loops test=develop
-
- 19 Mar, 2019 1 commit
-
-
Committed by zhhsplendid
test=develop
-
- 21 Jan, 2019 1 commit
-
-
Committed by gongweibao
-
- 18 Jan, 2019 2 commits
- 28 Dec, 2018 1 commit
-
-
Committed by gongweibao
-
- 26 Nov, 2018 1 commit
-
-
Committed by minqiyang
test=develop
-
- 23 Nov, 2018 2 commits
-
-
Committed by JiabinYang
-
Committed by sabreshao
* HIP cmake. Enable whole archive build for pybind library. Disable two warnings. Rollback to C++11. Link RCCL to work around a GPU kernel loading issue. Update eigen to fix build failure. Add more include directories. Fix O3 build failure. Update eigen. Fix tensor_util_test segmentation fault issue. Add more macro checks in hip.cmake. We may consider refining hip.cmake to inherit all add_definitions() from the parent scope in the future. Fix rocRAND load. Update eigen to fix gru_unit_op and reduce_op. Add HIP support to testing. Update eigen to support int16 and int8 in arg min and arg max.
* add rocprim as cub library used by nv implementation
* Reduce build time in rocprim.
* Add rocprim introduction, remove useless cmake code.
* Remove useless flags and format cmake file.
-
- 22 Nov, 2018 1 commit
-
-
Committed by peizhilin
-
- 12 Nov, 2018 1 commit
-
-
Committed by Yu Yang
-
- 09 Nov, 2018 1 commit
-
-
Committed by Yu Yang
test=develop
-
- 08 Nov, 2018 1 commit
-
-
Committed by minqiyang
Fix code to support cpplint syntax check test=develop
-
- 28 Sep, 2018 1 commit
-
-
Committed by Yu Yang
Use OO style to rewrite memory allocation.
-
- 05 Jul, 2018 1 commit
-
-
Committed by dzhwinter
* move to platform
* "move init from framework to platform"
* "remove used init"
* "fix ci"
* "fix ci"
* "fix generic"
* "fix ci"
* "fix ci"
* "fix ci"
* "disable fragile test"
-
- 03 Jul, 2018 1 commit
-
-
Committed by Xin Pan
-
- 01 Jul, 2018 1 commit
-
-
Committed by Xin Pan
-
- 17 Jun, 2018 1 commit
-
-
Committed by tensor-tang
-
- 15 Jun, 2018 1 commit
-
-
Committed by tensor-tang
-
- 12 Jun, 2018 1 commit
-
-
Committed by tensor-tang
-
- 08 Jun, 2018 1 commit
-
-
Committed by Luo Tao
-