- 14 Mar 2022: 1 commit
Committed by Zhong Hui
[multiprocessing] Add paddle.incubate.multiprocessing for sharing tensors between Python processes. (#37302)
* Add support for paddle.multiprocessing
* move multiprocessing to incubate.

- 03 Mar 2022: 1 commit
Committed by From00
* Support CUDA Graph in StreamSafeCudaAllocator
* Fix CI error
* Arrange AllocatorFacade
* Fix CI error
* Fix CI error
* Fix ROCm compile error
* Fix ROCm compile error

- 15 Feb 2022: 1 commit
Committed by ronnywang
* [CustomRuntime] Add DeviceManager
* [CustomRuntime] Add DeviceInterface
* [CustomRuntime] Add Stream, Event, DeviceGuard, CallbackManager
* [CustomRuntime] Add plug-in device
* [CustomRuntime] Memory module supports PluggableDevice
* [CustomRuntime] Add WITH_PLUGGABLE_DEVICE CMake option
* update
* [API] update API doc based on comments, test=develop

Co-authored-by: qili93 <qili93@qq.com>

- 08 Feb 2022: 1 commit
Committed by From00
* Rough implementation for experiment
* Support allocating CUDA managed memory
* Fix CI error
* Modify UT
* Check whether memory oversubscription is supported
* Fix ROCm compile error
* Fix ROCm compile error
* Fix UT cuda_managed_memory_test
* Set UT timeout to 40
* Add UT OOMExceptionTest
* Set UT timeout to 50

- 25 Jan 2022: 1 commit
Committed by From00

- 17 Dec 2021: 3 commits
Committed by Leo Chen

Committed by From00
* Get GPU BasePtr from CUDA allocation
* Fix compile error for ROCm
* Add BasePtr function for IPUPlace in naive_best_fit_allocator.cc
* Add alignment for BuddyAllocator
* Set address alignment of BuddyAllocator to 32 bytes
* Fix CI error
* Remove code for naive_best_fit strategy

Committed by From00

- 07 Dec 2021: 1 commit
Committed by jianghaicheng

- 25 Nov 2021: 1 commit
Committed by From00
* Support multi-stream allocation for CUDA place
* Do not notify retrying requests from other streams when freeing a CUDA allocation
* Fix compile error for CPU
* Fix compile error for HIP
* Release memory for StreamSafeCUDAAllocaRetry in malloc_test
* Add FLAGS_use_stream_safe_cuda_allocator
* Fix CI error for 'set_tests_properties'
* Invalidate stream-safe CUDA allocator for the naive_best_fit and thread_local strategies
* Performance improvement: insert the allocation pair into outstanding_events_map on free rather than on alloc; replace recursive_mutex with SpinLock
* FLAGS priority change: FLAGS_use_system_allocator > FLAGS_use_stream_safe_cuda_allocator
* Performance improvement: in FreeImpl of StreamSafeCUDAAllocator, delete the allocation directly when recorded_streams is empty
* Add UT for alloc interface
* Change multi-stream interface; move retry code from AllocatorFacadePrivate to StreamSafeCUDAAllocator

- 08 Nov 2021: 1 commit
Committed by wanghuancoder
* Use CUDA virtual memory management and merge blocks, test=develop
* refine, test=develop
* refine, test=develop
* refine, test=develop
* refine, test=develop
* refine, test=develop
* refine, test=develop
* refine, test=develop
* Windows DLL, test=develop
* fix CUDA error CUDA_ERROR_NOT_INITIALIZED, test=develop
* use autogrowthv2 for system allocator, test=develop
* remove ~CUDAVirtualMemAllocator(), test=develop
* refine, test=develop
* fix CUDA error CUDA_ERROR_NOT_INITIALIZED, test=develop
* fix CUDA error CUDA_ERROR_NOT_INITIALIZED, test=develop
* fix bug, test=develop
* revert system allocator, test=develop
* revert multiprocessing, test=develop
* fix AutoGrowthBestFitAllocatorV2 mutex, test=develop
* catch cudaErrorInitializationError when creating allocator, test=develop
* fix cuMemSetAccess use, test=develop
* refine CUDA API use, test=develop
* refine, test=develop
* for test, test=develop
* for test, test=develop
* switch to v2, test=develop
* refine virtual allocator, test=develop
* Record cuMemCreate and cuMemRelease, test=develop
* refine, test=develop
* avoid out of bounds, test=develop
* rename allocator, test=develop
* refine, test=develop
* use PADDLE_ENFORCE_CUDA_SUCCESS, test=develop
* for test, test=develop
* refine, test=develop
* refine, test=develop
* refine, test=develop
* refine, test=develop
* refine, test=develop
* refine, test=develop

- 29 Sep 2021: 1 commit
Committed by Zeng Jinle
* add basic support for CUDA Graph
* fix CI compile error
* fix LOG print, fix Windows CI
* follow comments and update
* small fix for default ctor
* fix ROCm compile error
* fix CPU compile error

- 17 Sep 2021: 1 commit
Committed by Zeng Jinle
* make flag setter easier
* update
* rename macro name
* fix bug of public/writable
* update to pass CI
* polish
* fix CPU link error

- 12 May 2021: 1 commit
Committed by liym27

- 09 Apr 2021: 1 commit
Committed by Leo Chen
* [feature] support npu allocator (#30840) [feature] support npu allocator
* [feature] support npu operator (#30951) [feature] support npu operator
* [feature] support npu allocator, part 2 (#30972)
* support npu allocator
* add npu device context
* fix some compile problem
* fix some compile problem
* add npu info
* compile ok
* fix include dir
* support naive_best_fit_allocator
* run ut ok, but failed to exit
* call aclrtResetDevice before exit
* fix aclFinalize
* add system allocator test
* add selected_gpus in gtest
* add tensor_test for npu
* support npu op, initial commit
* add npu stream
* add elementwise_add_op
* compile ok
* fix typo
* fix elementwise_add_op_npu_test
* support op run
* test can run but failed
* change aclopExecuteV2 to aclopCompileAndExecute
* support parsing ascend rank table file (#31000) support parsing ascend rank table file
* Fix reshape on GE graph. (#31084) Fix reshape on GE graph
* add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)
* add npu sub op
* fix typo
* rename test
* fix bug
* fix bug
* add fp16 kernel
* fix typo
* support sub grad op
* support elementwise_sub_grad op
  Co-authored-by: frankwhzhang <frankwhzhang@126.com>
* Fix compilation problem (#31100) Fix compilation problem (#31100)
* fix compile
* fix code style
* remove const_cast
* support adding correct npu op in pybind.h (#31143)
* support adding correct npu op in pybind.h
* refine code
* [NPU] Support executor with NPU (#31057)
* [NPU] Support executor with NPU
* Fix code according to reviews
* Fix code
* Add unittest for sub op npu
* refactor npu device manager (#31154) refactor npu device manager (#31154)
* fix selected npus
* fix compile
* fix reading flags from env
* format

Co-authored-by: xiayanming <41795079@qq.com>
Co-authored-by: gongweibao <weibao.gong@gmail.com>
Co-authored-by: frankwhzhang <frankwhzhang@126.com>
Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>

- 07 Apr 2021: 1 commit
Committed by zhang wenhui
* Ascend rc (#30483)
* Fix compilation on CANN20.1 and older (#30494) Fix compilation on CANN20.1 and older
* Add distribution supported (#30578) Add distribution supported
* Build parser for Hcom* operators (#30627) Build parser for Hcom* operators
* Pass device_ids info from launch to trainer. (#30632) Pass device_ids info from launch to trainer
* Add Hccl program group (#30642) Add Hccl program group
* Add startup bash files of test_ascend_group. (#30645) Add startup bash files of test_ascend_group
* cleanup (#30646) cleanup test_ascend_group.py
* [Feature] Build parser to support distributed training (#30658) [Feature] Build parser to support distributed training
* fix compilation on ascend-20.1 (#30722) fix compilation on ascend-20.1
* Dev/fix ascend string (#30749) Dev/fix ascend string
* code style (#30781) code style
* Merge ascend_optimizer and ascend_parser. (#30776) Merge ascend_optimizer and ascend_parser.
* Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug (#30797) Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug
* Add paddle ascend distribution training supported (#30796) Add paddle ascend distribution training supported
* pass cxx_flags to gloo cmake (#30857)
* Destroy session first. (#30954) Destroy session first.
* merge
* fix, test=develop
* fix, test=develop
* fix style, test=develop
* fix, test=develop
* fix
* fix log fatal, test=develop
* fix enforce style, test=develop
* fix, test=develop
* fix, test=develop
* fix rccl, test=develop
* fix test, test=develop
* fix, test=develop
* fix, test=develop
* fix, test=develop
* fix node_num, test=develop
* fix ids str, test=develop
* fix ids str, test=develop
* fix ids str, test=develop
* fix, test=develop
* fix, test=develop
* fix, test=develop
* fix, test=develop
* fix, test=develop
* fix, test=develop
* fix, test=develop
* fix, test=develop
* fix style code, test=develop
* fix style code, test=develop
* fix style code, test=develop
* fix style code, test=develop

Co-authored-by: hutuxian <hutuxian2011@sina.cn>
Co-authored-by: gongweibao <weibao.gong@gmail.com>
Co-authored-by: Void Main <voidmain1313113@gmail.com>
Co-authored-by: Leo Chen <chenqiuliang@baidu.com>
Co-authored-by: dingsiyu <18369187719@163.com>
Co-authored-by: OleNet <olenet@126.com>

- 22 Feb 2021: 1 commit
Committed by Qi Li

- 11 Dec 2020: 1 commit
Committed by LoveAn
* Add the strategy of skipping cc/cu test compilation and execution in CI, test=develop
* fix if error with CI_SKIP_TEST, test=develop
* fix add properties to test error on Linux/MAC, test=develop
* fix set test properties of test_code_generator error, test=develop
* remove test codes and advance judgment of file modification on Linux, test=develop
* rename CI_SKIP_TEST to CI_SKIP_CPP_TEST, test=document_fix
* Add branch judgement on Linux, test=develop

- 04 Nov 2020: 1 commit
Committed by Wilber

- 21 Aug 2020: 1 commit
Committed by QingshuChen
* support Baidu AI Accelerator
* test=kunlun
* minor
* test=kunlun
* support xpu op in separate file
* test=kunlun
* update XPU error message and remove duplicated code
* test=kunlun
* minor
* test=kunlun
* minor
* test=kunlun

- 22 Jul 2020: 1 commit
Committed by Leo Chen
* fix best_fit_allocator_test on Windows, test=develop
* enable best_fit_allocator_test and test_math_op_patch_var_base, test=develop

- 08 Jun 2020: 1 commit
Committed by Zhou Wei

- 21 Apr 2020: 1 commit
Committed by 石晓伟
* add the thread_local_allocator, test=develop
* refactor the thread_local_allocator, test=develop
* provide option setting strategy, test=develop

- 02 Mar 2020: 1 commit
Committed by Chen Weihang
* add lodtensor share memory & serialization, test=develop
* fix Windows compile error, test=develop
* deal vartype pickle & fix unittest matching error message, test=develop
* update timeout variable name, test=develop
* refactor memory map implementation, test=develop
* clear mmap file descriptor when exiting unexpectedly, test=develop
* remove the child process fd in advance, test=develop
* remove mmap fds after Queue.put in child process, test=develop
* add hard unittests for register exit func, test=develop
* fix python2 compatibility problem in unittest, test=develop
* fix exception unittest error, test=develop
* polish code based on review comments, test=develop

- 19 Dec 2019: 1 commit
Committed by Zeng Jinle
* add some debug flags to auto growth allocator, test=develop
* add comments about auto growth, test=develop

- 24 Sep 2019: 1 commit
Committed by Zeng Jinle

- 11 Sep 2019: 1 commit
Committed by Huihuang Zheng
TemporaryAllocator is a singleton used to allocate memory for cuDNN. Because it is a singleton, removing it can improve memory performance. We replace TemporaryAllocator with CUDADeviceContextAllocator and CUDADeviceContextAllocation, which use a stream callback to free the memory allocated for a stream, so no singleton is needed. Also added data_feed_proto to operator to fix CI in CPU compilation.
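The stream-callback idea in the commit above can be illustrated with a minimal Python sketch: releasing a buffer does not free it immediately but enqueues the free on the stream, so it takes effect only after work already queued on that stream has finished. `FakeStream`, `StreamCallbackAllocator`, `allocate`, `release` and `bytes_in_use` are hypothetical names for illustration only, not Paddle or CUDA APIs, and `FakeStream.synchronize()` is a toy stand-in for the stream draining on the device.

```python
class FakeStream:
    """Toy stand-in for a CUDA stream: queues host callbacks, runs them on synchronize()."""

    def __init__(self):
        self._callbacks = []

    def add_callback(self, fn):
        # Callbacks run in FIFO order, after everything queued before them.
        self._callbacks.append(fn)

    def synchronize(self):
        pending, self._callbacks = self._callbacks, []
        for fn in pending:
            fn()


class StreamCallbackAllocator:
    """Hypothetical per-stream allocator: frees are deferred via stream callbacks."""

    def __init__(self, stream):
        self._stream = stream
        self._live = {}  # buffer id -> backing storage
        self._next_id = 0

    def allocate(self, nbytes):
        buf_id = self._next_id
        self._next_id += 1
        self._live[buf_id] = bytearray(nbytes)
        return buf_id

    def release(self, buf_id):
        # Work queued on the stream may still use this buffer, so the actual
        # free is enqueued on the stream instead of happening right now.
        self._stream.add_callback(lambda: self._live.pop(buf_id))

    def bytes_in_use(self):
        return sum(len(buf) for buf in self._live.values())


if __name__ == "__main__":
    stream = FakeStream()
    alloc = StreamCallbackAllocator(stream)
    buf = alloc.allocate(256)
    alloc.release(buf)
    print(alloc.bytes_in_use())  # 256: the free is still queued on the stream
    stream.synchronize()
    print(alloc.bytes_in_use())  # 0: the callback ran once the stream drained
```

Because each allocator is tied to one stream's lifetime rather than living in a process-wide singleton, its memory can be returned as soon as that stream's pending work completes.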

- 03 Sep 2019: 1 commit
Committed by Zeng Jinle

- 01 Sep 2019: 1 commit
Committed by Zeng Jinle
* add retry_allocator for gpu, test=develop
* follow chengduoZH's comments, test=develop
* follow huihuang's comments, test=develop
* change f,l in enforce.h to be file,line, test=develop
* increase code coverage by adding unittests, test=develop
* fix CMakeLists.txt, test=develop

- 18 Jul 2019: 1 commit
Committed by Zeng Jinle
* feature/auto_growth_allocator, test=develop
* add unittest of AlignedAllocator, test=develop
* try to turn on auto_growth to test on CI, test=develop
* fix segmentation fault in mixed_vector.h, test=develop
* add unittests, test=develop

- 27 May 2019: 1 commit
Committed by Zeng Jinle
* Revert "Revert "Fix allocator bug"" This reverts commit 174d0d0b.
* Revert "fix travis ci" This reverts commit 5656fa9f. test=develop
* add inlined_vector.h, test=develop
* add inlined_vector_test, test=develop
* clean code of allocator, test=develop
* delete zero_size_allocator.h, test=develop
* fix failed unittest, test=develop

- 23 May 2019: 1 commit
Committed by Zeng Jinle
* Revert "Revert "Fix allocator bug"" This reverts commit 174d0d0b.
* Revert "fix travis ci" This reverts commit 5656fa9f. test=develop
* add inlined_vector.h, test=develop
* add inlined_vector_test, test=develop

- 07 May 2019: 1 commit
Committed by Zeng Jinle
test=develop

- 28 Mar 2019: 1 commit
Committed by Zeng Jinle
add include headers to fix travis-ci test=develop

- 25 Mar 2019: 1 commit
Committed by sneaxiy
test=develop

- 21 Mar 2019: 1 commit
Committed by sneaxiy
modify allocator strategy
remove changes of legacy buddy_allocator
test=develop

- 18 Mar 2019: 1 commit
Committed by sneaxiy
test=develop

- 13 Mar 2019: 1 commit
Committed by chengduo
test=develop

- 06 Mar 2019: 1 commit
Committed by sneaxiy
test=develop

- 16 Nov 2018: 1 commit
Committed by Yu Yang
test=develop