Commits · 1e598f1addf350c5d295ee28ab06bbc826136f5c · BaiXuePrincess / Paddle

15 Nov, 2021: 2 commits

[Pten] Refactor the implementation of custom operator (#37122) · 1e598f1a

Committed by Chen Weihang on Nov 15, 2021

* move extension into pten [no-verify]

* append tensor methods by ext_tensor [no-verify]

* append other tensor methods [no-verify]

* ext related files tidy [no-verify]

* include relation tidy [no-verify]

* add pten tensor test [no-verify]

* replace tensor in custom op & compile success

* refine tensor constructor for unittest

* custom relu jit run success

* fix all custom op unittests

* add inference cmake adapt [no-verify]

* fix failed unittests

* fix windows failed unittests

* try to fix kunlun and inference failed

* fix test_elementwise_api error

* try to fix win compile failed

* fix kunlun fp16 type error

* remove useless handle error macro

* add custom linear op test

* fix compile failed & add win symbols

* fix non pten kernel cast failed

* add dll decl for api

* polish several details

* polish details by review comment

* add dll_decl for register

1e598f1a

Add distributed pass framework: including PassBase/PassTest/PassUtils (#36643) · 12339fa0

Committed by Zeng Jinle on Nov 15, 2021

* add split_program

* make ut faster

* increase ut timeout

* make result deterministic

* add fuse_all_reduce pass

* add ut framework, update

* fix ut framework

* remove useless code

* add coverage support

* update

* fix CI

* fix some bugs and fix ci coverage

* fix conflict

12339fa0

10 Nov, 2021: 1 commit
- Add libcinnapi.so to setup.py.in (#37068) · b4e25436
  Committed by Huihuang Zheng on Nov 10, 2021
```
Add libcinnapi.so to setup.py.in
```
  b4e25436
04 Nov, 2021: 1 commit
- static cost model (#36775) · d33e99fe
  Committed by huangxu96 on Nov 04, 2021
```
Add Static CostModel. Static data is based on op benchmark system
```
  d33e99fe
29 Oct, 2021: 1 commit
- Move the ASP training API to paddle.static.sparsity. (#36525) · 113816d8
  Committed by Ming-Xu Huang on Oct 29, 2021
  
  113816d8
28 Oct, 2021: 1 commit
- Expose paddle.version.show API and add doc for it (#36800) · d88c3e12
  Committed by pangyoki on Oct 28, 2021
```
* add doc for show() in paddle.version

* fix format

* print cuda and cudnn in show API
```
  d88c3e12
27 Oct, 2021: 2 commits

add paddle.version.cuda and paddle.version.cudnn API (#36556) · d65f41db

Committed by pangyoki on Oct 27, 2021

* add paddle.version.cuda and paddle.version.cudnn API

* fix little bug

* fix bug

* add doc string

* fix mkdir error

* fix windows path

* fix new paddle/version path

* fix unittest

* fix format

d65f41db

Fused transformer encoder layer and fused feedforward layer (#36604) · 9f3613f3

Committed by zhangkaihuo on Oct 27, 2021

This PR contains the layer-level code for fused_transformer: the FusedFeedForward layer and the FusedTransformerEncoderLayer.

9f3613f3
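As a reference for what a fused feed-forward layer computes, the unfused sketch below follows the standard transformer feed-forward block: linear, activation, linear, residual add, with layer normalization either before the block or after the residual add. The `normalize_before` flag name is borrowed from Paddle's transformer layers. This is a simplification under stated assumptions, not the fused kernel: dropout is omitted (inference behaviour), ReLU is assumed as the activation, layer norm has no learnable scale or bias, and plain Python lists stand in for tensors.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def linear(x: Sequence[float], w: Matrix, b: Sequence[float]) -> Vector:
    """y[j] = sum_i x[i] * w[i][j] + b[j]; w has shape [in_features][out_features]."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]


def layer_norm(x: Sequence[float], eps: float = 1e-5) -> Vector:
    """Normalize to zero mean and unit variance (no learnable scale or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def feed_forward(x, w1, b1, w2, b2, normalize_before: bool = False) -> Vector:
    """Unfused reference of a transformer FFN block for a single token vector."""
    h = layer_norm(x) if normalize_before else list(x)   # pre-LN variant
    h = [max(0.0, v) for v in linear(h, w1, b1)]          # linear1 + ReLU
    h = linear(h, w2, b2)                                 # linear2 (dropout omitted)
    out = [a + c for a, c in zip(x, h)]                   # residual connection
    return out if normalize_before else layer_norm(out)   # post-LN variant
```

A fused implementation computes the same result but merges these steps into fewer kernel launches, avoiding round trips of the intermediate activations through memory.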

22 Sep, 2021: 2 commits
- [Inference] Support NNAdapter and ascend310 (#35226) · 10e53044
  Committed by JingZhuangzhuang on Sep 22, 2021
  
  10e53044
- [2.2]support extern third_party lapack API on Linux/Windows/Mac (#35690) · ae65257d
  Committed by zhouweiwei2014 on Sep 22, 2021
```
* support extern third_party lapack on Linux/Windows/Mac

* fix ci
```
  ae65257d
16 Sep, 2021: 2 commits
- Add segment apis to paddle.incubate (#35759) · 4b683887
  Committed by Zhong Hui on Sep 16, 2021
  
  4b683887
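Segment reductions combine the entries of `data` that share an id in a sorted `segment_ids` array. In the common definition (as in TensorFlow's segment ops), the output has `max(segment_ids) + 1` entries. The pure-Python sketch below illustrates those semantics for 1-D data only; it is not the `paddle.incubate` implementation, which operates on tensors, and the helper names here are chosen for illustration.

```python
from typing import List, Sequence


def _check(data: Sequence[float], segment_ids: Sequence[int]) -> None:
    if len(data) != len(segment_ids):
        raise ValueError("data and segment_ids must have the same length")
    if any(b < a for a, b in zip(segment_ids, segment_ids[1:])):
        raise ValueError("segment_ids must be sorted in ascending order")


def segment_sum(data: Sequence[float], segment_ids: Sequence[int]) -> List[float]:
    """Sum the entries of `data` that share a segment id."""
    _check(data, segment_ids)
    if not segment_ids:
        return []
    out = [0.0] * (segment_ids[-1] + 1)   # ids are sorted, so the last one is the max
    for value, seg in zip(data, segment_ids):
        out[seg] += value
    return out


def segment_mean(data: Sequence[float], segment_ids: Sequence[int]) -> List[float]:
    """Average per segment; segments with no entries are reported as 0.0."""
    sums = segment_sum(data, segment_ids)
    counts = [0] * len(sums)
    for seg in segment_ids:
        counts[seg] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]
```

For example, `segment_sum([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1])` gives `[3.0, 7.0]`, and `segment_mean` on the same input gives `[1.5, 3.5]`.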
- fix py api for paddle.inference.contrib.utils (#35769) · 554771dd
  Committed by Shang Zhizhou on Sep 16, 2021
  
  554771dd
31 Aug, 2021: 1 commit

New whl release strategy with pruned nv_fatbin (#35239) · 2f3b393d

Committed by Zhanlue Yang on Aug 31, 2021

[Background]
Growth in code size can be irreversible in the long run, leading to huge release packages that not only hamper the user experience but also exceed PyPI's hard size limit.

In particular, the NV_FATBIN section takes up 86% of the compiled dylib size, owing to the large number of supported GPU architectures.

This PR aims to prune this NV_FATBIN.

[Solution]
In the new release strategy, two types of whl packages will be involved:

Cubin PIP package:
The PIP package supports a narrower range of GPU architectures, containing sm_60, sm_70, sm_75 and sm_80 cubins, which cover the Pascal through Ampere architectures.

JIT release package:
This is a backup for the Cubin PIP package, containing compute_35, compute_50, compute_60, compute_70, compute_75 and compute_80, and offering the best performance and the widest GPU architecture coverage.

However, it takes around 10 minutes to install because of the JIT compilation.

[How to use]
The new release strategy is disabled by default.
To compile for Cubin PIP package, add this to cmake: -DCUBIN_RELEASE_PIP
To compile for JIT release package, add this to cmake: -DJIT_RELEASE_WHL

2f3b393d
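The two packaging modes map onto the standard nvcc `-gencode` option: a `code=sm_XX` target embeds a precompiled cubin for one real architecture, while a `code=compute_XX` target embeds PTX that the CUDA driver JIT-compiles for the GPU it finds at run time. The helper below is a hypothetical sketch (its name and structure are not Paddle's CMake logic) that builds both flag sets from the architecture lists given in the commit message.

```python
from typing import List

# Architecture lists taken from the commit message above.
CUBIN_ARCHS = [60, 70, 75, 80]            # Cubin PIP package: sm_60 .. sm_80
JIT_ARCHS = [35, 50, 60, 70, 75, 80]      # JIT release package: compute_35 .. compute_80


def gencode_flags(mode: str) -> List[str]:
    """Return nvcc -gencode flags for the 'cubin' or 'jit' release mode (hypothetical)."""
    if mode == "cubin":
        # Embed only precompiled machine code (cubin) for each real architecture.
        return [f"-gencode=arch=compute_{a},code=sm_{a}" for a in CUBIN_ARCHS]
    if mode == "jit":
        # Embed only PTX for each virtual architecture; the driver JIT-compiles it.
        return [f"-gencode=arch=compute_{a},code=compute_{a}" for a in JIT_ARCHS]
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, `gencode_flags("cubin")[0]` is `-gencode=arch=compute_60,code=sm_60`. Shipping only PTX keeps the fatbin small and covers more architectures, but moves compilation cost onto the user's machine, which is the trade-off described above.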

24 Aug, 2021: 1 commit

Add auto completion module for auto parallel (#34813) · 93d862b0

Committed by Yulong Ao on Aug 24, 2021

* add auto_parallel dir

* mv to paddle.distributed

* add shard_xx api

* add distributed attrs for var

* add ut, test=develop

* add dist

* update

* update

* update

* update

* update

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update

* update

* update

* update

* update

* update, test=develop

* update, test=develop

* update

* update

* delete unused proto

* restore op_desc

* restore type_defs

* update var_desc

* remove dims_mapping for proto_pybind

* update interface.py

* update framework.py

* update

* update

* add auto_parallel dir

* mv to paddle.distributed

* add shard_xx api

* add distributed attrs for var

* add ut, test=develop

* [WIP] Add the auto completion feature and related codes

* [WIP] Improve the auto completion and related codes

* [WIP] Make the auto completion to support data-parallel

* [WIP] Make the completion support mp and dp+mp

* [WIP] Refactor auto completion unit test for MLP

* [WIP] Refactor the implementation of DistributedOperatorImpl

* [WIP] Improve dims_mapping update rule and fix a bug

* [WIP] Support auto completion for one transformer decoder layer

* [WIP] Add a minor change

* [WIP] Fix a bug within the unit test

* Shard XShape tensor, add embedding completion and refactor code

* Add the distributed_operators dir to setup.py.in

* Improve the completion process and add the unittest for gpt

* fix process_mesh ut

* fix process_mesh ut

* update

* update, test=develop

* Add support for automatically completing distributed attrs of special ops

* update

* update

* update

* fix doc sample codes, test=develop

* improve coverage, test=develop

* add static_mode check, test=develop

* Model the cluster for cost model and physical mapping

* update, test=develop

* add set_placement, test=develop

* Add the check to make sure the candidate tensors' size is greater than zero

* update doc, test=develop

* update doc, test=develop

* update doc, test=develop

* update doc, test=develop

* update, test=develop

* Auto mark dist attrs annotated by user

* update ndarray to nested list, test=develop

* update, test=develop

* Add auto-completion module for auto-parallel (based on PR#33804)

* Remove unnecessary files

* Remove unrelated files for the auto completion pr

* Update the unit test to improve the coverage

* Modify codes based on reviews

* Minor changes for CI

* Improve some codes based on new comments

* Fix bugs caused by shallow copy in attributes.py
* Improve amend_distributed_attr_for_program in context.py
* Other changes for weihang's comments
Co-authored-by: sandyhouse <lilong12@baidu.com>

93d862b0
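In Paddle's auto-parallel annotations, a tensor's `dims_mapping` lists, for each tensor dimension, the process-mesh dimension it is sharded along, with -1 meaning not sharded. Completion propagates these annotations to tensors the user left unannotated. The sketch below shows one such rule for a 2-D matmul `Out[M, N] = X[M, K] @ Y[K, N]`: dimensions that name the same logical axis must agree, and a -1 entry takes its partner's value. The helper names and this simplified merge rule (which treats -1 as both "replicated" and "not yet known") are assumptions for illustration; the real completion module covers many more ops and constraints.

```python
from typing import List, Tuple

REPLICATED = -1  # dims_mapping entry meaning "not sharded on any mesh dimension"


def merge_dim(a: int, b: int) -> int:
    """Merge two dims_mapping entries that describe the same logical axis."""
    if a == REPLICATED:
        return b
    if b == REPLICATED or a == b:
        return a
    raise ValueError(f"conflicting dims_mapping entries: {a} vs {b}")


def complete_matmul(
    x: List[int], y: List[int], out: List[int]
) -> Tuple[List[int], List[int], List[int]]:
    """Propagate dims_mapping for Out[M, N] = X[M, K] @ Y[K, N] (2-D only)."""
    m = merge_dim(x[0], out[0])   # M axis: shared by X and Out
    k = merge_dim(x[1], y[0])     # K axis: shared by X and Y (contracted)
    n = merge_dim(y[1], out[1])   # N axis: shared by Y and Out
    return [m, k], [k, n], [m, n]
```

For instance, annotating only X as `[0, -1]` (rows split over mesh dimension 0) and Y as `[-1, 1]` completes Out to `[0, 1]`.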

11 Aug, 2021: 1 commit
- add the basic apis for auto_parallel (#33804) · 3f962e77
  Committed by lilong12 on Aug 11, 2021
```
* add auto_parallel apis
```
  3f962e77
10 Aug, 2021: 1 commit

copy boost/any.hpp to utils and replace boost::any with self defined any (#34613) · 12892929

Committed by chentianyu03 on Aug 10, 2021

* add any.hpp to utils and replace boost::any with self defined paddle::any

* add copy any.hpp to custom op depends

* modify any.hpp include path

* remove boost from setup.py.in

* add copy any.hpp to custom op depends

* move any.hpp to paddle/utils/ dirs

* move any.h to extension/include directory

* copy utils to right directories

12892929

05 Aug, 2021: 1 commit

[Dy2st]Integrated gast library to fix compatibility problem permanently (#34556) · a9ee3833

Committed by 0x45f on Aug 05, 2021

* integrated gast library

* integrated gast library

* fix unittest and remove ast2.py

* remove 'gast' from __all__ in __init__.py

* add copyright in other files

* fix copyright

a9ee3833

04 Aug, 2021: 1 commit
- Elastic as module (#34572) · 1f76a2f7
  Committed by kuizhiqing on Aug 04, 2021
  
  1f76a2f7
19 Jul, 2021: 1 commit

Add Cuda event and stream API (#32460) · 9c7f6af5

Committed by chentianyu03 on Jul 19, 2021

* add cuda event and stream api

* add cuda event and stream api

* add get_current_stream api

* add get_current_stream api

* init streams

* modify get_current_stream

* modify get_current_stream

* add synchronize func

* add current_stream doc and test file

* move get_current_stream into CUDA macro

* move CudaEvent into CUDA macro

* move _get_current_stream and _device_synchronize into cuda macro

* modify the macro of cuda stream and event

* add test case for synchronize

* add paddle.devices.cuda module

* event and stream support hip

* add doc for stream and event class

* move cuda stream and event into single pybind

* add cuda_streams_py.cc to cmakelist

* add _device_synchronize and _get_current_stream to core module

* add test case for cudastream and cudaevent

* move __all__ in streams.py

* fix test fail

* add cuda to devices __all__

* fix current_stream doc writing error

* move devices to device directory, and merge device.py into __init__.py

* add required:gpu to sample codes

* remove cuda directory from device/__init__.py

9c7f6af5

12 Jul, 2021: 1 commit
- softmax mask fuse upper triangle (#33981) · e2e1c57b
  Committed by Yuang Liu on Jul 12, 2021
```
* softmax mask fuse upper triangle

* cover not implemented cpu code
```
  e2e1c57b
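The fused op combines a causal mask (every score above the diagonal, i.e. key index j greater than query index i, is excluded) with a row-wise softmax, the pattern used in decoder self-attention. The sketch below shows the unfused computation on a single square score matrix in pure Python; the function name is hypothetical, and the real op works on batched attention-score tensors on GPU.

```python
import math
from typing import List, Sequence


def softmax_mask_upper_triangle(scores: Sequence[Sequence[float]]) -> List[List[float]]:
    """Row-wise softmax over a square score matrix with the upper triangle masked."""
    n = len(scores)
    probs: List[List[float]] = []
    for i, row in enumerate(scores):
        if len(row) != n:
            raise ValueError("scores must be a square matrix")
        visible = list(row[: i + 1])            # keys j <= i stay visible
        peak = max(visible)                     # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in visible]
        total = sum(exps)
        # Masked positions (j > i) get probability exactly 0, as if their score were -inf.
        probs.append([e / total for e in exps] + [0.0] * (n - i - 1))
    return probs
```

Fusing the mask into the softmax avoids materializing a separate mask tensor and an extra pass over the scores.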
15 Jun, 2021: 1 commit
- [XPU] Update cmake options for xpu. (#33450) · e47c3f04
  Committed by Wilber on Jun 15, 2021
  
  e47c3f04
09 Jun, 2021: 1 commit

[quant] Add quant wrap for functional api and refine the qat (#33162) · ddc95a01

Committed by cc on Jun 09, 2021

* Add wrap for functional api
* Refine the wrapped api
* Add unit test for quant functional layers
* Update all unit tests for dygraph qat

ddc95a01

01 Jun, 2021: 2 commits
- remove complex64 file (#33237) · 44dd918d
  Committed by chentianyu03 on Jun 01, 2021
  
  44dd918d
- replace and remove complex64/128 types in custom OP and other files (#33195) · 06c63ca0
  Committed by chentianyu03 on Jun 01, 2021
```
* replace and remove complex64/128 types in custom OP and other files

* fix custom_tensor_test fail bug

* fix custom_conj_test fail bug

* fix dispatch_test_op build fail bug
```
  06c63ca0
27 May, 2021: 1 commit
- Unify all external API error message mechanism and enhance third-party API error msg (#33003) · b425215a
  Committed by Zhou Wei on May 27, 2021
```
* Unify all external API error message mechanism and enhance third-party API error msg

* fix some comment

* fix some comment
```
  b425215a
25 May, 2021: 1 commit
- Add Automatic SParsity Utilities (#32995) · f91e0f45
  Committed by Ming-Xu Huang on May 25, 2021
  
  f91e0f45
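Automatic SParsity targets n:m structured sparsity, most commonly the 2:4 pattern that NVIDIA Ampere sparse Tensor Cores accelerate: in every group of m consecutive weights, at most n are non-zero. The sketch below illustrates the simplest magnitude-based 1-D mask (keep the n largest |w| in each group) in pure Python. The function names are hypothetical, and this is not the code in `paddle.static.sparsity`, which works on tensors and may use different mask-selection strategies.

```python
from typing import List, Sequence


def create_nm_mask(weights: Sequence[float], n: int = 2, m: int = 4) -> List[int]:
    """Return a 0/1 mask keeping the n largest-magnitude weights in each group of m."""
    if not 0 < n <= m:
        raise ValueError("require 0 < n <= m")
    if len(weights) % m != 0:
        raise ValueError("len(weights) must be a multiple of m")
    mask: List[int] = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Stable sort: among equal magnitudes, the earlier index is kept.
        keep = set(sorted(range(m), key=lambda k: abs(group[k]), reverse=True)[:n])
        mask.extend(1 if k in keep else 0 for k in range(m))
    return mask


def is_nm_sparse(values: Sequence[float], n: int = 2, m: int = 4) -> bool:
    """Check that every group of m consecutive values has at most n non-zeros."""
    return all(
        sum(1 for v in values[s:s + m] if v != 0) <= n
        for s in range(0, len(values), m)
    )
```

For example, `create_nm_mask([0.1, -0.9, 0.5, 0.2, 1.0, 2.0, 3.0, 4.0])` returns `[0, 1, 1, 0, 0, 0, 1, 1]`. In ASP-style training the mask is typically re-applied after optimizer updates so that pruned weights stay zero.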
07 May, 2021: 1 commit
- Remove paddle_custom_op dynamic libraries, and link to FLUID_CORE on Windows (#32583) · 7610c2b4
  Committed by Zhou Wei on May 07, 2021
```
* Remove paddle_custom_op dynamic libraries, change link to FLUID_CORE on windows, and check copy_to

* fix CI
```
  7610c2b4
30 Apr, 2021: 1 commit
- revert data_generator __init__.py (#32670) · eb13c19f
  Committed by tianshuo78520a on Apr 30, 2021
```
* revert data_generator

* test

* add setup.py
```
  eb13c19f
25 Apr, 2021: 2 commits
- add pipeline for dynamic graph (#32511) · 561dc719
  Committed by lilong12 on Apr 25, 2021
```
* add pp dygraph, test=develop
```
  561dc719
- [HybridParallel] Add pipeline layer in dygraph (#32449) · 7ef1de67
  Committed by ShenLiang on Apr 25, 2021
```
* add pipeline layer
```
  7ef1de67
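A pipeline layer has to split a sequential list of layers into contiguous stages, one per pipeline rank. The sketch below shows a simple uniform split in which the earlier stages absorb the remainder; the function names and this remainder policy are assumptions, and Paddle's pipeline layer may partition differently.

```python
from typing import List


def uniform_segments(num_layers: int, num_stages: int) -> List[int]:
    """Return boundaries b so that stage s owns layers b[s] .. b[s + 1] - 1."""
    if not 0 < num_stages <= num_layers:
        raise ValueError("require 0 < num_stages <= num_layers")
    base, extra = divmod(num_layers, num_stages)
    bounds = [0]
    for stage in range(num_stages):
        # The first `extra` stages take one additional layer each.
        bounds.append(bounds[-1] + base + (1 if stage < extra else 0))
    return bounds


def stage_layers(layers: List[str], num_stages: int, stage: int) -> List[str]:
    """Slice out the layers that a given pipeline stage should build."""
    b = uniform_segments(len(layers), num_stages)
    return layers[b[stage]:b[stage + 1]]
```

With 10 layers and 4 stages the boundaries are `[0, 3, 6, 8, 10]`, so the stages hold 3, 3, 2 and 2 layers. Each rank only builds its own slice, which is what lets a model larger than one device's memory be trained.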
23 Apr, 2021: 1 commit
- [CustomOp] Remove useless extension headers for old custom op (#32463) · 7d4998ac
  Committed by Chen Weihang on Apr 23, 2021
```
* remove useless ext headers

* fix boost header compile failed
```
  7d4998ac
22 Apr, 2021: 2 commits
- strip after compilation (#32145) · e727820d
  Committed by wuhuanzhou on Apr 22, 2021
  
  e727820d
- Delete WITH_GRPC flag and Distributed old code (#32383) · e58c705b
  Committed by tianshuo78520a on Apr 22, 2021
  
  e58c705b
21 Apr, 2021: 1 commit
- remove fluid for auto_checkpoint. (#32157) · 1593ee25
  Committed by xiemoyuan on Apr 21, 2021
```
* remove fluid for auto_checkpoint.

* fix bug.
```
  1593ee25
19 Apr, 2021: 1 commit
- [Hybrid Parallel] Support dp & mp in dygraph (#32323) · ffd40860
  Committed by ShenLiang on Apr 19, 2021
```
* support dp & mp
```
  ffd40860
17 Apr, 2021: 1 commit
- [Hybrid Parallel] Add model parallel support in dygraph (#32248) · 66d46221
  Committed by ShenLiang on Apr 17, 2021
```
* add model parallel support in dygraph
```
  66d46221
09 Apr, 2021: 1 commit
- [CustomOp]Support MacOS platform and Remove libpaddle_custom_op.so dependency (#31976) · d815fbf9
  Committed by Aurelius84 on Apr 09, 2021
```
* Remove old custom OP to reduce whl package volume

* [Custom OP]Remove old custom OP to reduce whl package volume

* support macos
```
  d815fbf9
07 Apr, 2021: 1 commit

【NPU】Merge ascend GE&distributed code by 0208 from ascendrc (#31957) · 8c7c53b3

Committed by zhang wenhui on Apr 07, 2021

* Ascend rc (#30483)

* Fix compilation on CANN20.1 and older (#30494)

Fix compilation on CANN20.1 and older

* Add distribution supported (#30578)

Add distribution supported

* Build parser for Hcom* operators (#30627)

Build parser for Hcom* operators

* Pass device_ids info from launch to trainer. (#30632)

Pass device_ids info from launch to trainer

* Add Hccl program group (#30642)

Add Hccl program group

* Add startup bash files of test_ascend_group. (#30645)

Add startup bash files of test_ascend_group

* cleanup (#30646)

cleanup test_ascend_group.py

* [Feature] Build parser to support distributed training (#30658)

[Feature] Build parser to support distributed training

* fix compilation on ascend-20.1 (#30722)

fix compilation on ascend-20.1

* Dev/fix ascend string (#30749)

Dev/fix ascend string

* code style (#30781)

code style

* Merge ascend_optimizer and ascend_parser. (#30776)

Merge ascend_optimizer and ascend_parser.

* Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug  (#30797)

Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug

* Add paddle ascend distribution training supported (#30796)

Add paddle ascend distribution training supported

* pass cxx_flags to gloo cmake (#30857)

* Destroy session first. (#30954)

Destroy session first.

* merge

* fix, test=develop

* fix, test=develop

* fix style, test=develop

* fix, test=develop

* fix

* fix log fatal, test=develop

* fix enforce style, test=develop

* fix, test=develop

* fix, test=develop

* fix rccl, test=develop

* fix test, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix node_num, test=develop

* fix ids str, test=develop

* fix ids str, test=develop

* fix ids str, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix style code, test=develop

* fix style code, test=develop

* fix style code, test=develop

* fix style code, test=develop
Co-authored-by: hutuxian <hutuxian2011@sina.cn>
Co-authored-by: gongweibao <weibao.gong@gmail.com>
Co-authored-by: Void Main <voidmain1313113@gmail.com>
Co-authored-by: Leo Chen <chenqiuliang@baidu.com>
Co-authored-by: dingsiyu <18369187719@163.com>
Co-authored-by: OleNet <olenet@126.com>

8c7c53b3

01 Apr, 2021: 1 commit

add custom init grad for backward function (#31540) · 83b953f5

Committed by chentianyu03 on Apr 01, 2021

* add custom init grad for backward function

* add custom init grad for backward function

* handle when the grad_tensor is none

* handle when the grad_tensor is none

* fix the args type error on windows platform

* modify the args order and doc

* format code

* add grad_tensor to xpu

* modify the grad_tensor type check

* add paddle.backward api to support multi tensors gradient compute

* add paddle.backward api to support multi tensors gradient compute

* add paddle.autograd module and backward api

* change tensor.backward func args

* modify tensor backward api

* remove create_graph inputs args

* add doc and example code for backward api

* when have the same tensor, throw error

* modify test Init func args

* modify the execute.Init func args in test files

* add paddle.autograd package in setup.py.in

* modify error msg, remove _run_backward method in class Tensor

* add test cases for backward api

83b953f5
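The `grad_tensors` argument of `paddle.autograd.backward` (and `grad_tensor` of `Tensor.backward`) seeds the backward pass with a custom initial gradient instead of the default all-ones tensor, and when several outputs are passed, their contributions to a shared input are accumulated. The pure-Python sketch below works this out by hand for two hypothetical outputs of one input x, y1 = x * x and y2 = 3 * x. It illustrates the arithmetic only and does not call Paddle.

```python
from typing import List, Optional, Sequence


def backward_two_outputs(
    x: Sequence[float],
    grad_y1: Optional[Sequence[float]] = None,
    grad_y2: Optional[Sequence[float]] = None,
) -> List[float]:
    """Gradient of x for y1 = x * x and y2 = 3 * x, seeded with custom output grads."""
    ones = [1.0] * len(x)
    g1 = ones if grad_y1 is None else list(grad_y1)   # default seed: all ones
    g2 = ones if grad_y2 is None else list(grad_y2)
    if len(g1) != len(x) or len(g2) != len(x):
        raise ValueError("each grad tensor must match the shape of its output")
    # Chain rule per element: dy1/dx = 2x, dy2/dx = 3; both contributions are summed.
    return [a * 2.0 * xi + b * 3.0 for a, b, xi in zip(g1, g2, x)]
```

With default seeds, `backward_two_outputs([1.0, 2.0])` gives `[5.0, 7.0]`; seeding with `grad_y1=[0.5, 0.0]` and `grad_y2=[0.0, 2.0]` gives `[1.0, 6.0]`.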

31 Mar, 2021: 1 commit
- fix whl package push pypi (#31585) · b09c1ce0
  Committed by tianshuo78520a on Mar 31, 2021
```
* fix whl package push pypi

* add rst
```
  b09c1ce0

BaiXuePrincess / Paddle is in sync with the upstream project it was forked from