Commits · f014e301197cc7bd9e101cd3b478da63f92557f3 · Crayon鑫 / Paddle

03 Sep, 2021 1 commit

[NPU] add int64_t kernels for YoloV3, test=develop (#35045) · f014e301

Committed by Qi Li on Sep 03, 2021

* [NPU] add int64 kernels, test=develop

* update ci scripts to be able to turn WITH_ASCEND_INT64 on, test=develop

f014e301

31 Aug, 2021 1 commit

New whl release strategy with pruned nv_fatbin (#35239) · 2f3b393d

Committed by Zhanlue Yang on Aug 31, 2021

[Background]
Code size expansion can be irreversible in the long run, leading to huge release packages that
not only hamper the user experience but also exceed a hard size limit of PyPI.

In particular, the NV_FATBIN section takes up 86% of the compiled dylib size, owing to the vast number of GPU
arches supported.

This PR aims to prune this NV_FATBIN.

[Solution]
In the new release strategy, two types of whl packages will be involved:

Cubin PIP package:
The PIP package maintains a smaller window of supported GPU arches, containing
sm_60, sm_70, sm_75 and sm_80 cubins, which cover the Pascal through Ampere arches.

JIT release package:
This is a backup for the Cubin PIP package. It contains compute_35, compute_50, compute_60,
compute_70, compute_75 and compute_80, with the best performance and GPU arch coverage.

However, it takes around 10 minutes to install because of the JIT compilation.

[How to use]
The new release strategy is disabled by default.
To compile for Cubin PIP package, add this to cmake: -DCUBIN_RELEASE_PIP
To compile for JIT release package, add this to cmake: -DJIT_RELEASE_WHL
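
As a rough illustration of the two package types described above, the sketch below maps each one to nvcc `-gencode` flags. The arch lists come from this commit message; the function name, the package keys and the way flags are assembled are hypothetical, and Paddle's actual CMake logic may differ.

```python
from typing import List

# Arch lists copied from the commit message; everything else is illustrative.
CUBIN_PIP_ARCHS = [60, 70, 75, 80]        # sm_XX cubins
JIT_WHL_ARCHS = [35, 50, 60, 70, 75, 80]  # compute_XX PTX


def gencode_flags(package: str) -> List[str]:
    """Return nvcc -gencode flags for a hypothetical package kind."""
    if package == "cubin_pip":
        # code=sm_XX embeds precompiled machine code (cubin) for that arch.
        return [f"-gencode arch=compute_{a},code=sm_{a}" for a in CUBIN_PIP_ARCHS]
    if package == "jit_whl":
        # code=compute_XX embeds PTX, which is JIT-compiled for the actual GPU.
        return [f"-gencode arch=compute_{a},code=compute_{a}" for a in JIT_WHL_ARCHS]
    raise ValueError(f"unknown package kind: {package!r}")


if __name__ == "__main__":
    for kind in ("cubin_pip", "jit_whl"):
        print(kind, " ".join(gencode_flags(kind)))
```

The cubin variant trades arch coverage for a smaller binary, while the PTX variant covers more arches at the cost of the JIT step mentioned above.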

2f3b393d

09 Aug, 2021 1 commit
- Increase the speed of incremental compilation (#34616) · aab4d6e4
  Committed by zhouweiwei2014 on Aug 09, 2021
  
  aab4d6e4
22 Jul, 2021 1 commit

[NPU] update NPU ci tests, test=npu_aarch64 (#34272) · e0da9666

Committed by Qi Li on Jul 22, 2021

* [NPU] update NPU ci tests, test=npu_aarch64

* [NPU] fix x86 build and add disable_ut for NPU, test=npu_aarch64

* [NPU] address review comments, test=develop

e0da9666

21 Jul, 2021 1 commit
- Polish windows compile for Ninja, fix UT random compile (#34237) · 05805d91
  Committed by zhouweiwei2014 on Jul 21, 2021
```
* polish windows compile for Ninja, fix random compile fail

* polish windows compile for Ninja, fix random compile fail
```
  05805d91
14 Jul, 2021 2 commits
- Support Mac M1 build (#34071) · ec0ea4c5
  Committed by tianshuo78520a on Jul 14, 2021
```
* Support Mac M1 make

* cmake version check
```
  ec0ea4c5
- Support sccache to speed up compilation on Windows (#34019) · 4ce66826
  Committed by zhouweiwei2014 on Jul 14, 2021
```
* Support sccache to speed up compilation on Windows

* Support sccache to speed up compilation on Windows
```
  4ce66826
17 Jun, 2021 1 commit
- test=document_fix (#33623) · d9941c83
  Committed by tianshuo78520a on Jun 17, 2021
  
  d9941c83
16 Jun, 2021 1 commit
- del python2 code (#33556) · 0b4a7f1a
  Committed by tianshuo78520a on Jun 16, 2021
  
  0b4a7f1a
02 Jun, 2021 1 commit
- [ROCM] update paddle inference cmake, test=develop (#33260) · e7541209
  Committed by Qi Li on Jun 02, 2021
  
  e7541209
26 May, 2021 1 commit
- Fix ninja compilation bug and warning on windows (#32987) · accf284b
  Committed by Zhou Wei on May 26, 2021
```
* fix ninja compilation bug on windows

* polish windows ci

* polish windows ci
```
  accf284b
28 Apr, 2021 1 commit

[PsCore] solve Brpc dep (#32632) · 4ead9a5a

Committed by Thunderbrook on Apr 28, 2021

* Revert "Revert "[PsCore] optimize performance of large kv (#32535)" (#32599)"

This reverts commit 809ac036.

* brpc dep

4ead9a5a

27 Apr, 2021 1 commit
- Revert "[PsCore] optimize performance of large kv (#32535)" (#32599) · 809ac036
  Committed by tianshuo78520a on Apr 27, 2021
```
This reverts commit 4b7242b0.
```
  809ac036
26 Apr, 2021 2 commits
- Fix OPENBLAS ci and fix windows CPU CI to parallel compile (#32548) · 1ec9525a
  Committed by Zhou Wei on Apr 26, 2021
```
* clear CUDA compile environment on windows

* fix Windows CI

* fix Windows CI

* fix Windows CI
```
  1ec9525a
- [PsCore] optimize performance of large kv (#32535) · 4b7242b0
  Committed by Thunderbrook on Apr 26, 2021
```
* optimize pull sparse

* optimize pull sparse

* change macro

* format
```
  4b7242b0
23 Apr, 2021 1 commit

fix Windows CI MP compile and environment install script and openblas CI (#32378) · 7a681f0b

Committed by Zhou Wei on Apr 23, 2021

* fix Windows CI MP compile and environment install script

* clear Windows CI environment

* clear Windows CI environment

* clear Windows CI environment

7a681f0b

22 Apr, 2021 2 commits
- strip after compilation (#32145) · e727820d
  Committed by wuhuanzhou on Apr 22, 2021
  
  e727820d
- Delete WITH_GRPC flag and Distributed old code (#32383) · e58c705b
  Committed by tianshuo78520a on Apr 22, 2021
  
  e58c705b
21 Apr, 2021 1 commit

【NPU】Merge NPU ccl code (#32381) · c3158527

Committed by zhang wenhui on Apr 21, 2021

* add allreduce and broadcast without test (#31024)

add allreduce and broadcast without test

* Refactor HCCLCommContext to be compatible with Paddle (#31359)

Refactor HCCLCommContext to be compatible with Paddle (#31359)

* [NPU] add npu kernel for communication op (#31437)

* add allreduce and broadcast without test

* add c_broadcast_test case

* build c_comm_init and c_create_group operators

* make the whole thing compile

* add broadcast and init op test case but run failed

* make unit test compile

* fix broadcast test bug and change into hcom for ccl

* change c_comm_init and c_create_group ops accordingly

* make tests compile

* transfer code to 27

* compiled successfully in 28, but run failed

* test broadcast in 28, but failed

* make hcom primitives work

* change hccl data type for base.h

* fix broadcast bug

* make attributes work

* fix group name bug

* add allreduce but test failed

* allreduce bug for qiuliang

* allreduce finished

* add allgather and reducescatter

* merge all op code

* add allgather test

* finish run all ccl op test exclude send/recv

* all all op and test exclude send/recv

* send_v2_npu.cc recv_v2_npu.cc compiled

* fix ccl core dump bug and test allgather, reducescatter, broadcast op

* fix allreduce bug just for test

* hcom send&recv test pass, without hcom_destroy

* for qiuliang test

* Ascend Send&Recv Test Pass

* all op (ex send/recv) ok

* fix bug

* merge all ccl op

* style merge to PaddlePaddle

* merge style

* new merge style

* merge style 2

* insert an empty at the end

* disable ctest for hcom to pass ci
Co-authored-by: void-main <voidmain1313113@gmail.com>
Co-authored-by: f2hkop <f2huestc@outlook.com>

* Add auto-increasing tag id for Hcom OPs (#31702)

* add c_reduce_sum op (#31793)

add c_reduce_sum op

* update Ascendrc hccl to 20.3 (#32126)

update Ascendrc hccl to 20.3 (#32126)

* fix merge code

* change cmake.txt1

* [NPU] Support npu kernel for c sync stream op (#31386)

* sync stream npu op

* add with_ascend_acl

* update c++ unittest

* compile all failed

* try to pre commit

* after pre commit

* merge&compile&test hccl successfully!

* fix code style

* fix code style

* fix bugs about hccl

* fix some bugs

* fix code style

* fix style

* fix style

* fix

* fixed

* merge develop
Co-authored-by: lw921014 <liuwei921014@yeah.net>
Co-authored-by: Void Main <voidmain1313113@gmail.com>
Co-authored-by: f2hkop <f2huestc@outlook.com>
Co-authored-by: xiayanming <41795079@qq.com>

c3158527

15 Apr, 2021 1 commit

heterps support pscore (#32093) · 9f8c8f96

Committed by Thunderbrook on Apr 15, 2021

* pscore support heterps

* fleet cmake

* fleet wrapper

* macro

* solve conflict

* solve conflict

* add unitest

* paddle enforce

* unitest

* unitest

* unitest

9f8c8f96

14 Apr, 2021 1 commit

Fix rocm cmake (#32230) · f3e49c40

Committed by Qi Li on Apr 14, 2021

* [ROCM] fix some typo in cmake, test=develop

* [ROCM] fix rccl in paddle build script, test=develop

f3e49c40

09 Apr, 2021 1 commit

[NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d

Committed by Leo Chen on Apr 09, 2021

* [feature] support npu allocator (#30840)

[feature] support npu allocator

* [feature] support npu operator (#30951)

[feature] support npu operator

* [feature] support npu allocator, part 2 (#30972)

* support npu allocator

* add npu device context

* fix some compile problem

* fix some compile problem

* add npu info

* compile ok

* fix include dir

* support naive_best_fit_allocator

* run ut ok, but failed to exit

* call aclrtResetDevice before exit

* fix aclFinalize

* add system allocator test

* add selected_gpus in gtest

* add tensor_test for npu

* support npu op, initial commit

* add npu stream

* add elementwise_add_op

* compile ok

* fix typo

* fix elementwise_add_op_npu_test

* support op run

* test can run but failed

* change aclopExecuteV2 to aclopCompileAndExecute

* support parsing ascend rank table file (#31000)

support parsing ascend rank table file

* Fix reshape on GE graph. (#31084)

Fix reshape on GE graph

* add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)

* add npu sub op

* fix typo

* rename test

* fix bug

* fix bug

* add fp16 kernel

* fix typo

* support sub grad op

* support elementwise_sub_grad op
Co-authored-by: frankwhzhang <frankwhzhang@126.com>

* Fix compilation problem (#31100)

Fix compilation problem (#31100)

* fix compile

* fix code style

* remove const_cast

* support adding correct npu op in pybind.h (#31143)

* support adding correct npu op in pybind.h

* refine code

* [NPU] Support executor with NPU (#31057)

* [NPU] Support executor with NPU

* Fix code according to reviews

* Fix code

* Add unittest for sub op npu

* refactor npu device manager (#31154)

refactor npu device manager (#31154)

* fix selected npus

* fix compile

* fix reading flags from env

* format
Co-authored-by: xiayanming <41795079@qq.com>
Co-authored-by: gongweibao <weibao.gong@gmail.com>
Co-authored-by: frankwhzhang <frankwhzhang@126.com>
Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>

ccf5709d

07 Apr, 2021 1 commit

【NPU】Merge ascend GE&distributed code by 0208 from ascendrc (#31957) · 8c7c53b3

Committed by zhang wenhui on Apr 07, 2021

* Ascend rc (#30483)

* Fix compilation on CANN20.1 and older (#30494)

Fix compilation on CANN20.1 and older

* Add distribution supported (#30578)

Add distribution supported

* Build parser for Hcom* operators (#30627)

Build parser for Hcom* operators

* Pass device_ids info from launch to trainer. (#30632)

Pass device_ids info from launch to trainer

* Add Hccl program group (#30642)

Add Hccl program group

* Add startup bash files of test_ascend_group. (#30645)

Add startup bash files of test_ascend_group

* cleanup (#30646)

cleanup test_ascend_group.py

* [Feature] Build parser to support distributed training (#30658)

[Feature] Build parser to support distributed training

* fix compilation on ascend-20.1 (#30722)

fix compilation on ascend-20.1

* Dev/fix ascend string (#30749)

Dev/fix ascend string

* code style (#30781)

code style

* Merge ascend_optimizer and ascend_parser. (#30776)

Merge ascend_optimizer and ascend_parser.

* Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug  (#30797)

Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug

* Add paddle ascend distribution training supported (#30796)

Add paddle ascend distribution training supported

* pass cxx_flags to gloo cmake (#30857)

* Destroy session first. (#30954)

Destroy session first.

* merge

* fix, test=develop

* fix, test=develop

* fix style, test=develop

* fix, test=develop

* fix

* fix log fatal, test=develop

* fix enforce style, test=develop

* fix, test=develop

* fix, test=develop

* fix rccl, test=develop

* fix test, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix node_num, test=develop

* fix ids str, test=develop

* fix ids str, test=develop

* fix ids str, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix style code, test=develop

* fix style code, test=develop

* fix style code, test=develop

* fix style code, test=develop
Co-authored-by: hutuxian <hutuxian2011@sina.cn>
Co-authored-by: gongweibao <weibao.gong@gmail.com>
Co-authored-by: Void Main <voidmain1313113@gmail.com>
Co-authored-by: Leo Chen <chenqiuliang@baidu.com>
Co-authored-by: dingsiyu <18369187719@163.com>
Co-authored-by: OleNet <olenet@126.com>

8c7c53b3

31 Mar, 2021 1 commit

update cmake minimum version to 3.15 (#31807) · 3a95a0bc

Committed by wuhuanzhou on Mar 31, 2021

* update cmake minimum version to 3.15, test=develop

* fix compilation error on Windows, test=develop

* fix compilation error on Windows, test=develop

* fix compilation error on Windows, test=develop

3a95a0bc

23 Mar, 2021 2 commits
- Restore the third-party library cache for windows (#31811) · 1eb927f9
  Committed by Zhou Wei on Mar 23, 2021
  
  1eb927f9
- Update windows compiler and CI from VS2015 to VS2017 (#31652) · a70de87d
  Committed by Zhou Wei on Mar 23, 2021
```
* modify windows CI to VS2017

* modify windows CI to VS2017

* modify windows CI to VS2017
```
  a70de87d
17 Mar, 2021 1 commit
- support Geforce RTX 30+ GPU (#31529) · 4c0c55bb
  Committed by Zhou Wei on Mar 17, 2021
  
  4c0c55bb
16 Mar, 2021 1 commit

Optimize compilation with Ninja (#31449) · 41e9ecfd

Committed by wuhuanzhou on Mar 16, 2021

* Optimize compilation with Ninja, notest, test=windows_ci, test=windows_op

* no cache on windows ci, notest, test=windows_ci, test=windows_op

* delete /Zc:inline compiled in NVCC, notest, test=windows_ci, test=windows_op

* fix test_warpctc_op, notest, test=windows_ci

* remove test code, test=develop

41e9ecfd

22 Feb, 2021 1 commit

[2.0Custom OP]Support New Custom OP on Windows (#31063) · adaec007

Committed by Zhou Wei on Feb 22, 2021

* [2.0.1]Support New Custom OP on windows

* fix CI

* fix code style

* fix CI

* fix CI

* fix coverage

* fix CI

* fix CI

adaec007

09 Feb, 2021 1 commit
- update eigen version on Windows (#30573) · 9b3c80c8
  Committed by wuhuanzhou on Feb 09, 2021
```
* update eigen version on Windows, test=develop

* add /bigobj for cl, test=develop
```
  9b3c80c8
21 Jan, 2021 1 commit
- [ROCM] update cmake and dockerfile, test=develop (#30598) · 1f5841c2
  Committed by Qi Li on Jan 21, 2021
  
  1f5841c2
20 Jan, 2021 1 commit

optimize unity build (#30195) · 7e671c07

Committed by wuhuanzhou on Jan 20, 2021

* optimize unity build, test=develop

* fix code style error, test=develop

* fix code style error and test /MP settings, test=develop

7e671c07

18 Jan, 2021 1 commit
- Ascend Framework Part1: OP & Wrapper (#30281) · 40ede126
  Committed by hutuxian on Jan 18, 2021
  
  40ede126
14 Jan, 2021 1 commit
- fix jetson compile error (#30378) · 49e79cad
  Committed by Shang Zhizhou on Jan 14, 2021
  
  49e79cad
12 Jan, 2021 1 commit

Fix/distributed proto (#29981) · 25f80fd3

Committed by tangwei12 on Jan 12, 2021

* rename sendrecv.proto to namespace paddle.distributed

* split ps with distributed

25f80fd3

28 Dec, 2020 1 commit
- Support mips arch (#29903) · 332da133
  Committed by Wilber on Dec 28, 2020
```
* Support MIPS arch.
```
  332da133
26 Dec, 2020 1 commit
- [Kunlun] PR2: Support MultiDevicePass and BKCL in parallel executor (#29574) · 4427df37
  Committed by liuyuhui on Dec 26, 2020
  
  4427df37
24 Dec, 2020 1 commit

[Feature] one ps (3/4) (#29604) · 032414ca

Committed by tangwei12 on Dec 24, 2020

* oneps (3/4)
Co-authored-by: MrChengmo <cmchengmo@163.com>
Co-authored-by: malin10 <malin10@baidu.com>
Co-authored-by: chengmo <chengmo@baidu.com>

032414ca

21 Dec, 2020 1 commit

Optimize compilation time with Unity Build (#29733) · 2e5b4a21

Committed by LoveAn on Dec 21, 2020

* Test compilation time with less parallel count, notest, test=windows_ci

* optimize rules of Unity Build, notest, test=windows_ci, test=windows_op

* limit parallel counts used only on GPU, test=develop

* remove limit of argument /m:8 on Windows, test=develop

2e5b4a21

17 Dec, 2020 1 commit
- Windows generate pdb and dump, for debug (#29628) · 0c59ad2a
  Committed by wanghuancoder on Dec 17, 2020
```
* Windows generate pdb and dump, for debug

* fix code style, test=develop

* modify cmakelist
```
  0c59ad2a
