提交 · c6950ab2573aece1fa0728aef1446bd8b0b8c1a0 · Crayon鑫 / Paddle

19 2月, 2022 1 次提交

Add the DistributedFusedLamb optimizer (#39148) · 5df3cd61

由 sneaxiy 提交于 2月 19, 2022

* add DistributedFusedLamb op

* polish code

* fix compile error

* compatible with pten changement

* fix rocm compile error

* improve converage

* update upstream/develop

* fix cast_with_ptr.h

* add FLAGS_distributed_lamb_divide_nranks_when_allreduce=1

* fix clip before allreduce

* add use_master_param_norm

* code polish

* fix bug

* fix ROCM ci

5df3cd61

15 2月, 2022 1 次提交

[PluggableDevice] Add custom runtime support (#38740) · 3e7825f3

由 ronnywang 提交于 2月 15, 2022

* [CustomRuntime] Add DeviceManager

* [CustomRuntime] Add DeviceInterface

* [CustomRuntime] Add Stream, Event, DeviceGuard, CallbackManager

* [CustomRuntime] Add plug-in device

* [CustomRuntime] Memory module support PluggableDevice

* [CustomRuntime] Add WITH_PLUGGABLE_DEVICE cmake option

* update

* [API] update API doc based on comments, test=develop
Co-authored-by: Nqili93 <qili93@qq.com>

3e7825f3

09 2月, 2022 1 次提交
- C
  
  move stream into pten (#39392) · 266955a9
  由 Chen Weihang 提交于 2月 09, 2022
  
  266955a9
08 2月, 2022 1 次提交

Support allocate CUDA managed memory (#39075) · 42910361

由 From00 提交于 2月 08, 2022

* Rough implementation for experiment

* Support allocate cuda managed memory

* Fix CI error

* Modify UT

* Check whether support memory oversubscription

* Fix ROCM Compile error

* Fix ROCM Compile error

* Fix UT cuda_managed_memory_test

* Set UT timeout to 40

* Add UT OOMExceptionTest

* Set UT timeout to 50

42910361

06 2月, 2022 1 次提交
- W
  
  [PTEN] Add Gpu context (#39305) · a821c4a9
  由 Wilber 提交于 2月 06, 2022
  
  a821c4a9
27 1月, 2022 1 次提交

[PTen]Support AllocateFrom in Tensor and Alloc/HostAlloc in Context (#39022) · 5631da9c

由 Aurelius84 提交于 1月 27, 2022

* Support allocate_from in Tensor and allocate_data in Context

* fix #ifdef CUDA

* fix cycle depends

* fix test_xxx_dev_api failed

* fix windows compiling error

* fix unittest

* modify into PImpl

* fix selected rows

* add TODO comment

* refine interface according reviewer

5631da9c

26 1月, 2022 1 次提交

[IPU] sync misc changes 01 (#38876) · 4efbebea

由 Allen Guo 提交于 1月 26, 2022

* sync misc changes

* apply comments 01

* fix compile error

* remove is_ipu_place check

* add authors
Co-authored-by: NXiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: NAllen Guo <alleng@graphcore.ai>
Co-authored-by: NZhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: NHaicheng Jiang <haichengj@graphcore.ai>
Co-authored-by: NHan Zhao <hanzhao@graphcore.ai>

* sync changes

* restore cmake

* update ir cmake and setup.py

* update inference_lib cmake

* split PR
Co-authored-by: NXiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: NZhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: NHaicheng Jiang <haichengj@graphcore.ai>
Co-authored-by: NHan Zhao <hanzhao@graphcore.ai>

4efbebea

25 1月, 2022 1 次提交
- F
  
  Add GetBasePtr interface in paddle::memory (#39145) · b2a7261d
  由 From00 提交于 1月 25, 2022
  
  b2a7261d
21 1月, 2022 2 次提交

F
[MLU]add mlu ci dockerfile (#39021) · fdab43b5
由 fwenguang 提交于 1月 21, 2022
```
* [MLU]add mlu ci dockerfile

* fix comment

* add cncl
```
fdab43b5

[PTEN] Add cpu context (#38979) · 064bc4b8

由 Wilber 提交于 1月 21, 2022

* add cpu_context.

* update

* update

* update

* update

* update

* fix ci problem

* fix npu ci problem

* update

* fix ci compile

064bc4b8

17 1月, 2022 2 次提交

[Pten] Replace platform::Place to pten::Place. (#38899) · c48a9ad5

由 Wilber 提交于 1月 17, 2022

* add pten::Place data structure.

* update ci problem

* fix ci problem

* update

* using platform::Place=pten::Place

* remove BOOST_GET_CONST for CPUPlace and GPUPlace

* compile pass 25%.

* compile pass 45%

* compile pass 60%

* remove boost_get for xpu npu mlu and ipu

* compile pass on cpu and gpu.

* fix compile problem

* fix compile error.

* update

* fix ci problem

* update

* ci approve

* fix ci problem

* fix ci eager test problem

* remove BOOST_GET_CONST

* fix npu compile

c48a9ad5

S

add squared_l2_norm (#38968) · 6eeb16b8
由 sneaxiy 提交于 1月 17, 2022

6eeb16b8

14 1月, 2022 1 次提交

[MLU]Add mean and reduce_mean op (#38872) · 7f8d5bc8

由 qipengh 提交于 1月 14, 2022

* [MLU]: add mean and reduce mean op

* [MLU]add mlu pytest dir in CMakeLists.txt

* [MLU]fix tensor data

* [MLU]fix TensorToPyArray and license

7f8d5bc8

13 1月, 2022 1 次提交
- 石
  
  splits allocation for pten, test=develop (#38853) · 277cf900
  由石晓伟提交于 1月 13, 2022
  
  277cf900
10 1月, 2022 1 次提交
- T
  
  1.fix elementwise_add_grad bug. 2. add dropout kernel in kl2 (#38726) · 7b860a23
  由 taixiurong 提交于 1月 10, 2022
  
  7b860a23
05 1月, 2022 1 次提交

Fix bug for UT GetAllocatorInterfaceTest (#38720) · 905c8022

由 From00 提交于 1月 05, 2022

* Fix bug of GetAllocatorInterfaceTest

* Replace some shared_ptr with unique_ptr

* Change Alloc call

905c8022

04 1月, 2022 1 次提交
- Q
  
  [XPU] update XPU device info, test=develop (#37884) · e1187e50
  由 Qi Li 提交于 1月 04, 2022
  
  e1187e50
31 12月, 2021 1 次提交

[MLU]support calling mlu op from python interface (#38292) · b6bf650a

由 fwenguang 提交于 12月 31, 2021

* [MLU]support calling mlu op from python interface

* [MLU]fix

* fix

* [mlu]fix mlu_places

* [mlu]fix required mlu

* fix

* [MLU]fix tensor copy

* [mlu] fix MLUPlace call path

b6bf650a

30 12月, 2021 1 次提交
- F
  
  Replace shared_ptr with unique_ptr in base_ptr_test (#38530) · 3f6229c6
  由 From00 提交于 12月 30, 2021
  
  3f6229c6
29 12月, 2021 1 次提交
- H
  Fix buddy allocator random CI failure (#38545) · 14658d8f
  由 Huihuang Zheng 提交于 12月 29, 2021
```
Fix Buddy Allocator random CI failure due to machine environment.
```
  14658d8f
28 12月, 2021 1 次提交

Utilize StreamSafeCUDAAllocator to support fast GC in new executor (#37642) · 0c7153a4

由 From00 提交于 12月 28, 2021

* fix reshape move storage error

* remove needless set type

* alloc tensor by shared storage

* Utilize StreamSafeCUDAAllocator to support fast GC in new executor

* Fix compile error for Windows and ROCm

* Fix compile error for Windows

* Modify UT stream_safe_cuda_alloc_test

* Modify UT stream_safe_cuda_alloc_test

* Rewrite fast GC

* Rewrite fast GC

* Fix compile error for BOOST_GET_CONST

* Fix compile error for BOOST_GET_CONST

* Changes default stream for StreamSafeCUDAAllocator

* Fix a small CI error

* Remove some redundant code

* Fix conflict

* Fix compile error for ROCm

* Fix Windoes CI error

* Fix CI error

* Remove some unnecessary code

* Fix CI error

* Add UT for fast GC

* Fix CI error

* add device-agnostic stream class

* add stream.h

* fix ut

* fix cpu compile

* Use RWLock in GetAllocator

* Fix CI error
Co-authored-by: NChen Weihang <chenweihang@baidu.com>
Co-authored-by: Nzhiqiu <chenqiuliang@baidu.com>

0c7153a4

27 12月, 2021 1 次提交
- L
  add device-agnostic stream class (#38391) · 6b5e33b4
  由 Leo Chen 提交于 12月 27, 2021
```
* add device-agnostic stream class

* add stream.h

* fix ut

* fix cpu compile
```
  6b5e33b4
22 12月, 2021 1 次提交
- Y
  
  optimize buddy_allocator (#38312) · 8fe1cb72
  由 Yang 提交于 12月 22, 2021
  
  8fe1cb72
20 12月, 2021 2 次提交
- F
  
  [MLU]add mlu backend (#38207) · 76514a1f
  由 fwenguang 提交于 12月 20, 2021
  
  76514a1f
- F
  
  Skip zero-size Allocation in RecordStream (#38264) · 48937020
  由 From00 提交于 12月 20, 2021
  
  48937020
17 12月, 2021 3 次提交
- L
  
  fit CI_SKIP_CPP_TEST (#38242) · b613c31e
  由 Leo Chen 提交于 12月 17, 2021
  
  b613c31e
- F
  Get base pointer from Allocation (#37978) · 431a2d6a
  由 From00 提交于 12月 17, 2021
```
* Get GPU BasePtr from CUDA allocation

* Fix compile error for ROCm

* Add BasePtr function for IPUPlace in naive_best_fit_allocator.cc

* Add alignment for BuddyAllocator

* Set address alignment of BuddyAllocator to 32 bytes

* Fix CI error

* Remove code for naive_best_fit strategy
```
  431a2d6a
- F
  
  Add GetStream Interface for StreamSafeCUDAAllocator (#38195) · b0d12d99
  由 From00 提交于 12月 17, 2021
  
  b0d12d99
13 12月, 2021 1 次提交
- T
  
  update xpu_memcpy (#38049) · bdf5834e
  由 taixiurong 提交于 12月 13, 2021
  
  bdf5834e
10 12月, 2021 1 次提交
- S
  
  make cuda graph thread local allocator (#37814) · 62b1f38c
  由 sneaxiy 提交于 12月 10, 2021
  
  62b1f38c
09 12月, 2021 1 次提交
- J
  
  add ipu device p2 (#37840) · cb636a48
  由 jianghaicheng 提交于 12月 09, 2021
  
  cb636a48
08 12月, 2021 1 次提交

Fix CUDAGraphAllocator bug for StreamSafeCUDAAllocator (#37821) · b4a67491

由 From00 提交于 12月 08, 2021

* Fix CUDAGraph bug for StreamSafeCUDAAllocator

* Add CUDAGrapthAllocator check in multi-stream interface

* Set FLAGS_use_stream_safe_cuda_allocator defaulted to false

* Fix environment error for cmake

* Fix cmake error

* Add UT of GetAllocatorInterfaceTest

* Add UT of CUDAGraphExceptionTest

* Enhance CUDAGraphExceptionTest

b4a67491

07 12月, 2021 1 次提交
- J
  
  add ipu device p1 (#37841) · c9a3c669
  由 jianghaicheng 提交于 12月 07, 2021
  
  c9a3c669
03 12月, 2021 1 次提交
- R
  refine structure for cuda and rocm (#37202) · a6d2fddb
  由 ronnywang 提交于 12月 03, 2021
```
* refine structure for cuda and rocm

* update

* update

* update

* update
```
  a6d2fddb
01 12月, 2021 1 次提交
- L
  
  add vlog to auto_growth_best_fit_allocator (#37601) · 934e5d09
  由 Leo Chen 提交于 12月 01, 2021
  
  934e5d09
29 11月, 2021 1 次提交
- P
  
  Add third batch of deprecated mkldnn namespace name changes (#37558) · 1ba81500
  由 piotrekobiIntel 提交于 11月 29, 2021
  
  1ba81500
27 11月, 2021 1 次提交

[NPU] reorganization for device API abstraction (#37110) · 72241a6a

由 Aganlengzi 提交于 11月 27, 2021

* [NPU] reorganization for device API abstraction

* [NPU] delete old files

* [NPU] fix npu_collective_helper

* [NPU] fix collective_helper

* [NPU] fix ut

* [NPU] mod memory allocation and hccl_helper

* [NPU] fix place_type

* [NPU] split enfoce.h

* move acl* call into npu_info

* merge conflict

* fix merge

* merge conflict

* merge conflict

72241a6a

25 11月, 2021 1 次提交

Support multi-stream allocation for CUDA place (#37290) · b9c464c3

由 From00 提交于 11月 25, 2021

* Support multi-stream allocation for CUDA place

* Do not notify the retrying from other streams when free CUDA allocation

* Fix compile error for CPU

* Fix compile error for HIP

* Release memory for StreamSafeCUDAAllocaRetry in malloc_test

* Add FLAGS_use_stream_safe_cuda_allocator

* Fix CI error for 'set_tests_properties'

* Invalidate stream safe CUDA allocator for naive_best_fit and thread_local strategy

* Performance improvement: insert allocation pair to outstanding_events_map when free but not alloc; replace recursive_mutex with SpinLock

* FLAGS priority changes: FLAGS_use_system_allocator > FLAGS_use_stream_safe_cuda_allocator

* Performance improvement: directly delete allocation when the recorded_streams is empty in FreeImpl of StreamSafeCUDAAllocator

* Add UT for alloc interface

* Changes multi-stream interface; move retry code from AllocatorFacadePrivate to StreamSafeCUDAAllocator

b9c464c3

23 11月, 2021 1 次提交
- Q
  [XPU] Reorganize xpu device codes in platform, test=develop (#37428) · 79800978
  由 Qi Li 提交于 11月 23, 2021
```
* [XPU] Reorganize xpu device codes in platform, test=develop

* fix xpu_header.h, test=develop
```
  79800978
22 11月, 2021 1 次提交
- W
  
  fix cuda_virtual_mem_allocator a bug, test=develop (#37390) · e28d5b89
  由 wanghuancoder 提交于 11月 22, 2021
  
  e28d5b89

Crayon鑫 / Paddle 与 Fork 源项目一致

Crayon鑫 / Paddle
与 Fork 源项目一致