Commits · 9ccb622898202fc32bd24c96c119b4e828459960 · BaiXuePrincess / Paddle

01 Dec, 2021 · 1 commit

add vlog to auto_growth_best_fit_allocator (#37601) · 934e5d09

Committed by Leo Chen on Dec 01, 2021

27 Nov, 2021 · 1 commit

[NPU] reorganization for device API abstraction (#37110) · 72241a6a

Committed by Aganlengzi on Nov 27, 2021

* [NPU] reorganization for device API abstraction

* [NPU] delete old files

* [NPU] fix npu_collective_helper

* [NPU] fix collective_helper

* [NPU] fix ut

* [NPU] mod memory allocation and hccl_helper

* [NPU] fix place_type

* [NPU] split enfoce.h

* move acl* call into npu_info

* merge conflict

* fix merge

* merge conflict

* merge conflict

25 Nov, 2021 · 1 commit

Support multi-stream allocation for CUDA place (#37290) · b9c464c3

Committed by From00 on Nov 25, 2021

* Support multi-stream allocation for CUDA place

* Do not notify the retrying from other streams when free CUDA allocation

* Fix compile error for CPU

* Fix compile error for HIP

* Release memory for StreamSafeCUDAAllocaRetry in malloc_test

* Add FLAGS_use_stream_safe_cuda_allocator

* Fix CI error for 'set_tests_properties'

* Invalidate stream safe CUDA allocator for naive_best_fit and thread_local strategy

* Performance improvement: insert allocation pair to outstanding_events_map when free but not alloc; replace recursive_mutex with SpinLock

* FLAGS priority changes: FLAGS_use_system_allocator > FLAGS_use_stream_safe_cuda_allocator

* Performance improvement: directly delete allocation when the recorded_streams is empty in FreeImpl of StreamSafeCUDAAllocator

* Add UT for alloc interface

* Changes multi-stream interface; move retry code from AllocatorFacadePrivate to StreamSafeCUDAAllocator

23 Nov, 2021 · 1 commit

[XPU] Reorganize xpu device codes in platform, test=develop (#37428) · 79800978

Committed by Qi Li on Nov 23, 2021

```
* [XPU] Reorganize xpu device codes in platform, test=develop

* fix xpu_header.h, test=develop
```

22 Nov, 2021 · 1 commit

fix cuda_virtual_mem_allocator a bug, test=develop (#37390) · e28d5b89

Committed by wanghuancoder on Nov 22, 2021

17 Nov, 2021 · 1 commit

[npu][hybrid] support offload (#37224) · 762819a8

Committed by WangXi on Nov 17, 2021

08 Nov, 2021 · 1 commit

Use cuda virtual memory management and merge blocks (#36189) · a1ec1d5a

Committed by wanghuancoder on Nov 08, 2021

* Use cuda virtual memory management and merge blocks, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* window dll, test=develop

* fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop

* use autogrowthv2 for system allocator, test=develop

* remove ~CUDAVirtualMemAllocator(), test=develop

* refine, test=develop

* fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop

* fix cuda error of CUDA_ERROR_NOT_INITIALIZED, test=develop

* fix bug, test=develop

* revert system allocator, test =develop

* revert multiprocessing, test=develop

* fix AutoGrowthBestFitAllocatorV2 mutex, test=develop

* catch cudaErrorInitializationError when create allocator, test=develop

* fix cuMemSetAccess use, test=develop

* refine cuda api use, test=develop

* refine, test=develop

* for test, test=develop

* for test, test=develop

* switch to v2, test=develop

* refine virtual allocator, test=develop

* Record cuMemCreate and cuMemRelease, test=develop

* refine, test=develop

* avoid out of bounds, test=develop

* rename allocator, test=develop

* refine, test=develop

* use PADDLE_ENFORCE_CUDA_SUCCESS, test=develop

* for test,test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

01 Nov, 2021 · 1 commit

[new-exec] refine vlog of interpretercore (#36865) · 4c93c4c3

Committed by Leo Chen on Nov 01, 2021

```
* refine vlog of interpretercore

* fix ut
```

11 Oct, 2021 · 1 commit

refine auto_growth allocator (#35732) · 6d353aa5

Committed by Leo Chen on Oct 11, 2021

* do not use alignedAllocator when cuda has alignment

* update test

* fix error during multiple process

29 Sep, 2021 · 2 commits

Add basic support for CUDA Graph (#36190) · 21b93c3d

Committed by Zeng Jinle on Sep 29, 2021

* add basic support for CUDA Graph

* fix ci compile error

* fix LOG print, fix windows CI

* follow comments and update

* small fix for default ctor

* fix rocm compile error

* fix CPU compile error

Spinlock (#36030) · a9ea41c5

Committed by liutiexing on Sep 29, 2021

```
* add align for WorkQueue

* add spinlock

* merge spinlock
```

22 Sep, 2021 · 1 commit

Fix copy elision warning (#35885) · 47d6bc86

Committed by Tomasz Socha on Sep 22, 2021

```
* Fix copy elision warning

* Remove redundant code
```

17 Sep, 2021 · 1 commit

Make flag adding easier (#35823) · 2c781455

Committed by Zeng Jinle on Sep 17, 2021

* make flag setter easier

* update

* rename macro name

* fix bug of public/writable

* update to pass CI

* polish

* fix CPU link error

11 Sep, 2021 · 1 commit

refactor gc (#35525) · adaa207b

Committed by wanghuancoder on Sep 10, 2021

* refactor gc, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* gc each tensor, test=develop

* refine, test=develop

03 Sep, 2021 · 1 commit

[NPU] add 32 extra bytes for npu memory slot (#35347) · 668bfb35

Committed by Leo Chen on Sep 03, 2021

26 Aug, 2021 · 1 commit

use spinlock in auto growth (#35139) · 0efda9d9

Committed by wanghuancoder on Aug 26, 2021

```
* use spinlock in auto growth, test=develop

* refine,test=develop
```

23 Aug, 2021 · 1 commit

Revert "use spin lock in auto growth allocator (#34910)" (#35069) · 97fef015

Committed by wanghuancoder on Aug 23, 2021

```
This reverts commit 6bacfb0e.
```

20 Aug, 2021 · 1 commit

use spin lock in auto growth allocator (#34910) · 6bacfb0e

Committed by wanghuancoder on Aug 20, 2021

* use spin lock in auto growth allocator, test=develop

* use pthread spin lock, test=develop

* use lock guard, test=develop

* use malloc spin lock, test=develop

* use lock_guard, test=develop

09 Aug, 2021 · 1 commit

[NPU] add lock for npu_pinned_allocator (#34700) · e285258e

Committed by Leo Chen on Aug 09, 2021

```
* add lock

* fix typo
```

03 Aug, 2021 · 1 commit

support Kunlun2 (#34459) · 2d0f3d9b

Committed by QingshuChen on Aug 03, 2021

```
* support Kunlun2

* support KL2

* support KL2
```

19 Jul, 2021 · 1 commit

[NPU] add is_empty_op_npu, test=develop (#34234) · d4fb5c68

Committed by Qi Li on Jul 19, 2021

12 May, 2021 · 1 commit

[NPU] Support npu pinned allocator and manage Tensor on NPUPinnedPlace (#32840) · 6b3bb796

Committed by liym27 on May 12, 2021

26 Apr, 2021 · 1 commit

refine error msg when out of memory (#32527) · 756f4639

Committed by Leo Chen on Apr 26, 2021

12 Apr, 2021 · 1 commit

follow comments to refine PR 32144 (#32174) · af374ae6

Committed by Leo Chen on Apr 12, 2021

09 Apr, 2021 · 1 commit

[NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d

Committed by Leo Chen on Apr 09, 2021

* [feature] support npu allocator (#30840)

[feature] support npu allocator

* [feature] support npu operator (#30951)

[feature] support npu operator

* [feature] support npu allocator, part 2 (#30972)

* support npu allocator

* add npu device context

* fix some compile problem

* fix some compile problem

* add npu info

* compile ok

* fix include dir

* support naive_best_fit_allocator

* run ut ok, but failed to exit

* call aclrtResetDevice before exit

* fix aclFinilize

* add system allocator test

* add selected_gpus in gtest

* add tensor_test for npu

* support npu op, initial commit

* add npu stream

* add elementwise_add_op

* compile ok

* fix typo

* fix elementwise_add_op_npu_test

* support op run

* test can run but failed

* change aclopExecuteV2 to aclopCompileAndExecute

* support parsing ascend rank table file (#31000)

support parsing ascend rank table file

* Fix reshape on GE graph. (#31084)

Fix reshape on GE graph

* add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)

* add npu sub op

* fix typo

* rename test

* fix bug

* fix bug

* add fp16 kernel

* fix typo

* support sub grad op

* support elementwise_sub_grad op
Co-authored-by: frankwhzhang <frankwhzhang@126.com>

* Fix compilation problem (#31100)

Fix compilation problem (#31100)

* fix compile

* fix code style

* remove const_cast

* support adding correct npu op in pybind.h (#31143)

* support adding correct npu op in pybind.h

* refine code

* [NPU] Support executor with NPU (#31057)

* [NPU] Support executor with NPU

* Fix code according to reviews

* Fix code

* Add unittest for sub op npu

* refactor npu device manager (#31154)

refactor npu device manager (#31154)

* fix selected npus

* fix compile

* fix reading flags from env

* format
Co-authored-by: xiayanming <41795079@qq.com>
Co-authored-by: gongweibao <weibao.gong@gmail.com>
Co-authored-by: frankwhzhang <frankwhzhang@126.com>
Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>

07 Apr, 2021 · 1 commit

【NPU】Merge ascend GE&distributed code by 0208 from ascendrc (#31957) · 8c7c53b3

Committed by zhang wenhui on Apr 07, 2021

* Ascend rc (#30483)

* Fix compilation on CANN20.1 and older (#30494)

Fix compilation on CANN20.1 and older

* Add distribution supported (#30578)

Add distribution supported

* Build parser for Hcom* operators (#30627)

Build parser for Hcom* operators

* Pass device_ids info from launch to trainer. (#30632)

Pass device_ids info from launch to trainer

* Add Hccl program group (#30642)

Add Hccl program group

* Add startup bash files of test_ascend_group. (#30645)

Add startup bash files of test_ascend_group

* cleanup (#30646)

cleanup test_ascend_group.py

* [Feature] Build parser to support distributed training (#30658)

[Feature] Build parser to support distributed training

* fix compilation on ascend-20.1 (#30722)

fix compilation on ascend-20.1

* Dev/fix ascend string (#30749)

Dev/fix ascend string

* code style (#30781)

code style

* Merge ascend_optimizer and ascend_parser. (#30776)

Merge ascend_optimizer and ascend_parser.

* Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug  (#30797)

Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug

* Add paddle ascend distribution training supported (#30796)

Add paddle ascend distribution training supported

* pass cxx_flags to gloo cmake (#30857)

* Destroy session first. (#30954)

Destroy session first.

* merge

* fix, test=develop

* fix, test=develop

* fix style, test=develop

* fix, test=develop

* fix

* fix log fatal, test=develop

* fix enforce style, test=develop

* fix, test=develop

* fix, test=develop

* fix rccl, test=develop

* fix test, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix node_num, test=develop

* fix ids str, test=develop

* fix ids str, test=develop

* fix ids str, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix style code, test=develop

* fix style code, test=develop

* fix style code, test=develop

* fix style code, test=develop
Co-authored-by: hutuxian <hutuxian2011@sina.cn>
Co-authored-by: gongweibao <weibao.gong@gmail.com>
Co-authored-by: Void Main <voidmain1313113@gmail.com>
Co-authored-by: Leo Chen <chenqiuliang@baidu.com>
Co-authored-by: dingsiyu <18369187719@163.com>
Co-authored-by: OleNet <olenet@126.com>

22 Feb, 2021 · 1 commit

[ROCM] update fluid platform for rocm39 (part4), test=develop (#30936) · 33429630

Committed by Qi Li on Feb 22, 2021

04 Feb, 2021 · 1 commit

use iwyu clean include second time, test=develop (#30829) · 35c5b23f

Committed by wanghuancoder on Feb 04, 2021

```
* use iwyu clean include second time, test=develop
```

03 Feb, 2021 · 2 commits

try again if kunlun memory malloc failed (#30855) · 5c8455d6

Committed by QingshuChen on Feb 03, 2021

```
* try again if kunlun memory malloc failed

* minor
```

support xpu with analysis predictor, test=develop (#30832) · 2ac4143b

Committed by 石晓伟 on Feb 03, 2021

* support xpu inference with analysis predictor, test=develop

* merge the cmake of the xpu toolchain, test=develop

* add c-apis, test=develop

* fix a bug in extern_xpu, test=develop

01 Feb, 2021 · 1 commit

[ROCM] update fluid memory for rocm35 (part1), test=develop (#30758) · 69875dc4

Committed by Qi Li on Feb 01, 2021

12 Jan, 2021 · 1 commit

fix header file paths of gflags, commit 3, test=develop (#30273) · efa54629

Committed by 石晓伟 on Jan 12, 2021

11 Dec, 2020 · 1 commit

Add the strategy of skipping cc/cu test compilation and execution in CI (#29499) · b5d4a1f3

Committed by LoveAn on Dec 11, 2020

* Add the strategy of skipping cc/cu test compilation and execution in CI, test=develop

* fix if error with CI_SKIP_TEST, test=develop

* fix add properties to test error on Linux/MAC, test=develop

* fix set test properties of test_code_generator error, test=develop

* remove test codes and advance judgment of file modification on Linux, test=develop

* rename CI_SKIP_TEST to CI_SKIP_CPP_TEST, test=document_fix

* Add branch judgement on Linux, test=develop

02 Dec, 2020 · 1 commit

change import math.h to cmath (#29260) · 6673fb05

Committed by Wojciech Uss on Dec 02, 2020

20 Nov, 2020 · 1 commit

Fix gpu memory allocation bug. (#28703) · 1dad8cea

Committed by gongweibao on Nov 20, 2020

06 Nov, 2020 · 1 commit

Update memory release interface. (#28456) · ced5c40c

Committed by Wilber on Nov 05, 2020

04 Nov, 2020 · 1 commit

[Inference] Memory modification for ShrinkMemory. (#28355) · 05114693

Committed by Wilber on Nov 04, 2020

23 Oct, 2020 · 1 commit

Add compile limit for PADDLE_ENFORCE without error message (#28221) · 2babd6ff

Committed by Chen Weihang on Oct 23, 2020

* add compile limit for paddle enforce

* polish elementwise_op_function.cu.h

* fix failed unittest

* fix windows compile failed

* detail polish

* revert no type constructor

27 Sep, 2020 · 1 commit

support elementwise add, activation, matmul on Baidu Kunlun (#27143) · 6b727e08

Committed by QingshuChen on Sep 27, 2020

* support elementwise add, activation, matmul on Baidu Kunlun
* test=kunlun

* minor
* test=kunlun

* reconstruct the xpu directory
* test=kunlun

* minor
* test=kunlun

* minor
* test=kunlun

* minor
* test=kunlun

* minor
* test=kunlun

* minor
* test=kunlun

25 Sep, 2020 · 1 commit

increase retry time (#27553) · 6bb02e8e

Committed by Leo Chen on Sep 25, 2020

BaiXuePrincess / Paddle is in sync with the upstream project it was forked from