- 20 2月, 2022 1 次提交
-
-
由 Chen Weihang 提交于
* rename pten dir to phi * rename namespace to phi * rename infrt pten dir to phi * resolve conflict * rename pten to phi in cmake * revert all infrt change * change needed files * fix infrt failed * fix inference failed
-
- 19 2月, 2022 1 次提交
-
-
由 sneaxiy 提交于
* add DistributedFusedLamb op * polish code * fix compile error * compatible with pten changement * fix rocm compile error * improve converage * update upstream/develop * fix cast_with_ptr.h * add FLAGS_distributed_lamb_divide_nranks_when_allreduce=1 * fix clip before allreduce * add use_master_param_norm * code polish * fix bug * fix ROCM ci
-
- 15 2月, 2022 1 次提交
-
-
由 ronnywang 提交于
* [CustomRuntime] Add DeviceManager * [CustomRuntime] Add DeviceInterface * [CustomRuntime] Add Stream, Event, DeviceGuard, CallbackManager * [CustomRuntime] Add plug-in device * [CustomRuntime] Memory module support PluggableDevice * [CustomRuntime] Add WITH_PLUGGABLE_DEVICE cmake option * update * [API] update API doc based on comments, test=develop Co-authored-by: Nqili93 <qili93@qq.com>
-
- 09 2月, 2022 1 次提交
-
-
由 Chen Weihang 提交于
-
- 08 2月, 2022 1 次提交
-
-
由 From00 提交于
* Rough implementation for experiment * Support allocate cuda managed memory * Fix CI error * Modify UT * Check whether support memory oversubscription * Fix ROCM Compile error * Fix ROCM Compile error * Fix UT cuda_managed_memory_test * Set UT timeout to 40 * Add UT OOMExceptionTest * Set UT timeout to 50
-
- 06 2月, 2022 1 次提交
-
-
由 Wilber 提交于
-
- 27 1月, 2022 1 次提交
-
-
由 Aurelius84 提交于
* Support allocate_from in Tensor and allocate_data in Context * fix #ifdef CUDA * fix cycle depends * fix test_xxx_dev_api failed * fix windows compiling error * fix unittest * modify into PImpl * fix selected rows * add TODO comment * refine interface according reviewer
-
- 26 1月, 2022 1 次提交
-
-
由 Allen Guo 提交于
* sync misc changes * apply comments 01 * fix compile error * remove is_ipu_place check * add authors Co-authored-by: NXiaobing Wang <xiaobingw@graphcore.ai> Co-authored-by: NAllen Guo <alleng@graphcore.ai> Co-authored-by: NZhixin Yao <zhixiny@graphcore.ai> Co-authored-by: NHaicheng Jiang <haichengj@graphcore.ai> Co-authored-by: NHan Zhao <hanzhao@graphcore.ai> * sync changes * restore cmake * update ir cmake and setup.py * update inference_lib cmake * split PR Co-authored-by: NXiaobing Wang <xiaobingw@graphcore.ai> Co-authored-by: NZhixin Yao <zhixiny@graphcore.ai> Co-authored-by: NHaicheng Jiang <haichengj@graphcore.ai> Co-authored-by: NHan Zhao <hanzhao@graphcore.ai>
-
- 25 1月, 2022 1 次提交
-
-
由 From00 提交于
-
- 21 1月, 2022 2 次提交
- 17 1月, 2022 2 次提交
-
-
由 Wilber 提交于
* add pten::Place data structure. * update ci problem * fix ci problem * update * using platform::Place=pten::Place * remove BOOST_GET_CONST for CPUPlace and GPUPlace * compile pass 25%. * compile pass 45% * compile pass 60% * remove boost_get for xpu npu mlu and ipu * compile pass on cpu and gpu. * fix compile problem * fix compile error. * update * fix ci problem * update * ci approve * fix ci problem * fix ci eager test problem * remove BOOST_GET_CONST * fix npu compile
-
由 sneaxiy 提交于
-
- 14 1月, 2022 1 次提交
-
-
由 qipengh 提交于
* [MLU]: add mean and reduce mean op * [MLU]add mlu pytest dir in CMakeLists.txt * [MLU]fix tensor data * [MLU]fix TensorToPyArray and license
-
- 13 1月, 2022 1 次提交
-
-
由 石晓伟 提交于
-
- 10 1月, 2022 1 次提交
-
-
由 taixiurong 提交于
-
- 05 1月, 2022 1 次提交
-
-
由 From00 提交于
* Fix bug of GetAllocatorInterfaceTest * Replace some shared_ptr with unique_ptr * Change Alloc call
-
- 04 1月, 2022 1 次提交
-
-
由 Qi Li 提交于
-
- 31 12月, 2021 1 次提交
-
-
由 fwenguang 提交于
* [MLU]support calling mlu op from python interface * [MLU]fix * fix * [mlu]fix mlu_places * [mlu]fix required mlu * fix * [MLU]fix tensor copy * [mlu] fix MLUPlace call path
-
- 30 12月, 2021 1 次提交
-
-
由 From00 提交于
-
- 29 12月, 2021 1 次提交
-
-
由 Huihuang Zheng 提交于
Fix Buddy Allocator random CI failure due to machine environment.
-
- 28 12月, 2021 1 次提交
-
-
由 From00 提交于
* fix reshape move storage error * remove needless set type * alloc tensor by shared storage * Utilize StreamSafeCUDAAllocator to support fast GC in new executor * Fix compile error for Windows and ROCm * Fix compile error for Windows * Modify UT stream_safe_cuda_alloc_test * Modify UT stream_safe_cuda_alloc_test * Rewrite fast GC * Rewrite fast GC * Fix compile error for BOOST_GET_CONST * Fix compile error for BOOST_GET_CONST * Changes default stream for StreamSafeCUDAAllocator * Fix a small CI error * Remove some redundant code * Fix conflict * Fix compile error for ROCm * Fix Windoes CI error * Fix CI error * Remove some unnecessary code * Fix CI error * Add UT for fast GC * Fix CI error * add device-agnostic stream class * add stream.h * fix ut * fix cpu compile * Use RWLock in GetAllocator * Fix CI error Co-authored-by: NChen Weihang <chenweihang@baidu.com> Co-authored-by: Nzhiqiu <chenqiuliang@baidu.com>
-
- 27 12月, 2021 1 次提交
-
-
由 Leo Chen 提交于
* add device-agnostic stream class * add stream.h * fix ut * fix cpu compile
-
- 22 12月, 2021 1 次提交
-
-
由 Yang 提交于
-
- 20 12月, 2021 2 次提交
- 17 12月, 2021 3 次提交
-
-
由 Leo Chen 提交于
-
由 From00 提交于
* Get GPU BasePtr from CUDA allocation * Fix compile error for ROCm * Add BasePtr function for IPUPlace in naive_best_fit_allocator.cc * Add alignment for BuddyAllocator * Set address alignment of BuddyAllocator to 32 bytes * Fix CI error * Remove code for naive_best_fit strategy
-
由 From00 提交于
-
- 13 12月, 2021 1 次提交
-
-
由 taixiurong 提交于
-
- 10 12月, 2021 1 次提交
-
-
由 sneaxiy 提交于
-
- 09 12月, 2021 1 次提交
-
-
由 jianghaicheng 提交于
-
- 08 12月, 2021 1 次提交
-
-
由 From00 提交于
* Fix CUDAGraph bug for StreamSafeCUDAAllocator * Add CUDAGrapthAllocator check in multi-stream interface * Set FLAGS_use_stream_safe_cuda_allocator defaulted to false * Fix environment error for cmake * Fix cmake error * Add UT of GetAllocatorInterfaceTest * Add UT of CUDAGraphExceptionTest * Enhance CUDAGraphExceptionTest
-
- 07 12月, 2021 1 次提交
-
-
由 jianghaicheng 提交于
-
- 03 12月, 2021 1 次提交
-
-
由 ronnywang 提交于
* refine structure for cuda and rocm * update * update * update * update
-
- 01 12月, 2021 1 次提交
-
-
由 Leo Chen 提交于
-
- 29 11月, 2021 1 次提交
-
-
由 piotrekobiIntel 提交于
-
- 27 11月, 2021 1 次提交
-
-
由 Aganlengzi 提交于
* [NPU] reorganization for device API abstraction * [NPU] delete old files * [NPU] fix npu_collective_helper * [NPU] fix collective_helper * [NPU] fix ut * [NPU] mod memory allocation and hccl_helper * [NPU] fix place_type * [NPU] split enfoce.h * move acl* call into npu_info * merge conflict * fix merge * merge conflict * merge conflict
-
- 25 11月, 2021 1 次提交
-
-
由 From00 提交于
* Support multi-stream allocation for CUDA place * Do not notify the retrying from other streams when free CUDA allocation * Fix compile error for CPU * Fix compile error for HIP * Release memory for StreamSafeCUDAAllocaRetry in malloc_test * Add FLAGS_use_stream_safe_cuda_allocator * Fix CI error for 'set_tests_properties' * Invalidate stream safe CUDA allocator for naive_best_fit and thread_local strategy * Performance improvement: insert allocation pair to outstanding_events_map when free but not alloc; replace recursive_mutex with SpinLock * FLAGS priority changes: FLAGS_use_system_allocator > FLAGS_use_stream_safe_cuda_allocator * Performance improvement: directly delete allocation when the recorded_streams is empty in FreeImpl of StreamSafeCUDAAllocator * Add UT for alloc interface * Changes multi-stream interface; move retry code from AllocatorFacadePrivate to StreamSafeCUDAAllocator
-
- 23 11月, 2021 1 次提交
-
-
由 Qi Li 提交于
* [XPU] Reorganize xpu device codes in platform, test=develop * fix xpu_header.h, test=develop
-