Commits · e3cca8ac7c32906244e25e07f19347b5d7ec7f24 · Crayon鑫 / Paddle

07 Dec, 2021 · 1 commit
  Set runtime_include_dir in Paddle.__init__.py (#37886) · e3cca8ac
  Committed by Huihuang Zheng on Dec 07, 2021
```
Paddle doesn't have to set runtime_include_dir when running CINN.
```
03 Dec, 2021 · 1 commit

[Eager] publish python c api for eager (#37550) · 07b4fe93

Committed by wanghuancoder on Dec 03, 2021

* refine a test case, test=develop

* publish python c api for eager, test=develop

* revert modify about test_allclose_layer.py, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* delete numpy includes, use pybind11 numpy.h, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* support eager error msg, and add grad test case, test=develop

* refine, test=develop

* refine, test=develop

29 Nov, 2021 · 1 commit
  fix_InternalStorage (#37568) · d0a89744
  Committed by Baibaifan on Nov 29, 2021
  
19 Nov, 2021 · 1 commit

Add fuse_resnet_unit pass (#36818) · 3cd3bf29

Committed by wuhuanzhou on Nov 19, 2021

* GeneratePass support attr condition and mapping, test=develop

* fix coverage, test=develop

* Add fuse_resnet_unit pass, test=develop

* fix CI errors, test=develop

* fix CI errors, test=develop

* fix unittest error when compiling without CUDA, test=develop

* fix static ci error, test=develop

* limit the kernel size to 1, test=develop

15 Nov, 2021 · 2 commits

[Pten] Refactor the implementation of custom operator (#37122) · 1e598f1a

Committed by Chen Weihang on Nov 15, 2021

* move extension into pten [no-verify]

* append tensor methods by ext_tensor [no-verify]

* append other tensor methods [no-verify]

* ext related files tidy [no-verify]

* include relation tidy [no-verify]

* add pten tensor test [no-verify]

* replace tensor in custom op & compile success

* refine tensor constructor for unittest

* custom relu jit run success

* fix all custom op unittests

* add inference cmake adapt [no-verify]

* fix failed unittests

* fix windows failed unittests

* try to fix kunlun and inference failed

* fix test_elementwise_api error

* try to fix win compile failed

* fix kunlun fp16 type error

* remove useless handle error macro

* add custom linear op test

* fix compile failed & add win symbols

* fix non pten kernel cast failed

* add dll decl for api

* polish several details

* polish details by review comment

* add dll_decl for register

Add distributed pass framework: including PassBase/PassTest/PassUtils (#36643) · 12339fa0

Committed by Zeng Jinle on Nov 15, 2021

* add split_program

* make ut faster

* increase ut timeout

* make result deterministic

* add fuse_all_reduce pass

* add ut framework, update

* fix ut framework

* remove useless code

* add coverage support

* update

* fix CI

* fix some bugs and fix ci coverage

* fix conflict

10 Nov, 2021 · 1 commit
  Add libcinnapi.so to setup.py.in (#37068) · b4e25436
  Committed by Huihuang Zheng on Nov 10, 2021
```
Add libcinnapi.so to setup.py.in
```
04 Nov, 2021 · 1 commit
  static cost model (#36775) · d33e99fe
  Committed by huangxu96 on Nov 04, 2021
```
Add a static CostModel. The static data is based on the op benchmark system.
```
29 Oct, 2021 · 1 commit
  Move the ASP training API to paddle.static.sparsity. (#36525) · 113816d8
  Committed by Ming-Xu Huang on Oct 29, 2021
  
28 Oct, 2021 · 1 commit
  Expose paddle.version.show API and add doc for it (#36800) · d88c3e12
  Committed by pangyoki on Oct 28, 2021
```
* add doc for show() in paddle.version

* fix format

* print cuda and cudnn in show API
```
27 Oct, 2021 · 2 commits

add paddle.version.cuda and paddle.version.cudnn API (#36556) · d65f41db

Committed by pangyoki on Oct 27, 2021

* add paddle.version.cuda and paddle.version.cudnn API

* fix little bug

* fix bug

* add doc string

* fix mkdir error

* fix windows path

* fix new paddle/version path

* fix unittest

* fix format

Fused transformer encoder layer and fused feedforward layer (#36604) · 9f3613f3

Committed by zhangkaihuo on Oct 27, 2021

This PR adds the layer-level code for fused_transformer, including the FusedFeedForward layer and the FusedTransformerEncoderLayer.

22 Sep, 2021 · 2 commits
  [Inference] Support NNAdapter and ascend310 (#35226) · 10e53044
  Committed by JingZhuangzhuang on Sep 22, 2021

  [2.2]support extern third_party lapack API on Linux/Windows/Mac (#35690) · ae65257d
  Committed by zhouweiwei2014 on Sep 22, 2021
```
* support extern third_party lapack on Linux/Windows/Mac

* fix ci
```
16 Sep, 2021 · 2 commits
  Add segment apis to paddle.incubate (#35759) · 4b683887
  Committed by Zhong Hui on Sep 16, 2021
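Segment reductions collapse consecutive elements that share a segment id into one output per id. The pure-Python sketch below shows the assumed semantics of a sum and a mean reduction on 1-D lists; it is not Paddle's implementation, the names only echo the idea behind the new incubate APIs, and requiring sorted non-negative ids and returning 0 for empty segments are assumptions made for illustration.

```python
from typing import List, Sequence


def _check_ids(data: Sequence[float], segment_ids: Sequence[int]) -> None:
    # Assumed preconditions: one id per element, ids non-negative and sorted.
    if len(data) != len(segment_ids):
        raise ValueError("data and segment_ids must have the same length")
    if any(s < 0 for s in segment_ids):
        raise ValueError("segment_ids must be non-negative")
    if any(b < a for a, b in zip(segment_ids, segment_ids[1:])):
        raise ValueError("segment_ids must be sorted in non-decreasing order")


def segment_sum(data: Sequence[float], segment_ids: Sequence[int]) -> List[float]:
    """out[k] = sum of data[i] over all i with segment_ids[i] == k."""
    _check_ids(data, segment_ids)
    num_segments = segment_ids[-1] + 1 if segment_ids else 0
    out = [0.0] * num_segments  # empty segments stay 0 (assumption)
    for value, seg in zip(data, segment_ids):
        out[seg] += value
    return out


def segment_mean(data: Sequence[float], segment_ids: Sequence[int]) -> List[float]:
    """out[k] = mean of data[i] over all i with segment_ids[i] == k (0 if empty)."""
    sums = segment_sum(data, segment_ids)
    counts = [0] * len(sums)
    for seg in segment_ids:
        counts[seg] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


print(segment_sum([1, 2, 3, 4], [0, 0, 1, 1]))  # [3.0, 7.0]
print(segment_mean([1, 2, 3], [0, 2, 2]))       # [1.0, 0.0, 2.5]
```

The real APIs operate on tensors along the first axis; the 1-D lists here only keep the example dependency-free.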
  
  fix py api for paddle.inference.contrib.utils (#35769) · 554771dd
  Committed by Shang Zhizhou on Sep 16, 2021
  
31 Aug, 2021 · 1 commit

New whl release strategy with pruned nv_fatbin (#35239) · 2f3b393d

Committed by Zhanlue Yang on Aug 31, 2021

[Background]
Expansion in code size can be irreversible in the long run, leading to huge release packages that
not only hamper the user experience but also exceed a hard limit of PyPI.

At present, the NV_FATBIN section takes up 86% of the compiled dylib size, owing to the vast number of GPU
arches supported.

This PR aims to prune this NV_FATBIN.

[Solution]
In the new release strategy, two types of whl packages will be involved:

Cubin PIP package:
This PIP package supports a narrower window of GPU arches, containing
sm_60, sm_70, sm_75 and sm_80 cubins and covering the Pascal to Ampere arches.

JIT release package:
This is a fallback for the Cubin PIP package, containing compute_35, compute_50, compute_60,
compute_70, compute_75 and compute_80, with the best performance and GPU arch coverage.

However, it takes around 10 min to install due to the JIT compilation.

[How to use]
The new release strategy is disabled by default.
To compile for Cubin PIP package, add this to cmake: -DCUBIN_RELEASE_PIP
To compile for JIT release package, add this to cmake: -DJIT_RELEASE_WHL
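Which package a given GPU needs follows from CUDA's documented binary-compatibility rules: a cubin built for `sm_XY` runs on devices with the same major version X and a minor version of at least Y, while PTX built for `compute_XY` can be JIT-compiled for any device of compute capability X.Y or newer. The sketch below applies those two rules to the arch lists quoted above; the helper names are hypothetical and nothing here is part of Paddle's build scripts.

```python
from typing import List, Tuple

# Arch lists copied from the commit message; helper names are hypothetical.
CUBIN_PIP_ARCHES = ["sm_60", "sm_70", "sm_75", "sm_80"]
JIT_RELEASE_ARCHES = ["compute_35", "compute_50", "compute_60",
                      "compute_70", "compute_75", "compute_80"]


def parse_arch(arch: str) -> Tuple[str, int, int]:
    """Split an nvcc arch name such as 'sm_75' into ('sm', 7, 5)."""
    kind, digits = arch.split("_")
    return kind, int(digits[:-1]), int(digits[-1])


def runs_natively(arches: List[str], major: int, minor: int) -> bool:
    """True if some sm_XY cubin runs on a GPU of compute capability major.minor.

    A cubin is usable only within the same major version and only when the
    device's minor version is at least the cubin's minor version.
    """
    for arch in arches:
        kind, a_major, a_minor = parse_arch(arch)
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
    return False


def runs_via_jit(arches: List[str], major: int, minor: int) -> bool:
    """True if some compute_XY PTX can be JIT-compiled for major.minor (needs X.Y <= device)."""
    for arch in arches:
        kind, a_major, a_minor = parse_arch(arch)
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


for cap in [(8, 6), (6, 1), (5, 2), (3, 0)]:
    print(cap, runs_natively(CUBIN_PIP_ARCHES, *cap), runs_via_jit(JIT_RELEASE_ARCHES, *cap))
# (8, 6) True True    -> served by the sm_80 cubin
# (6, 1) True True    -> served by the sm_60 cubin
# (5, 2) False True   -> Maxwell needs the JIT release package
# (3, 0) False False  -> older than compute_35, covered by neither
```

The sketch models compatibility only; it does not capture the roughly 10-minute JIT cost at install time.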

24 Aug, 2021 · 1 commit

Add auto completion module for auto parallel (#34813) · 93d862b0

Committed by Yulong Ao on Aug 24, 2021

* add auto_parallel dir

* mv to paddle.distributed

* add shard_xx api

* add distributed attrs for var

* add ut, test=develop

* add dist

* update

* update

* update

* update

* update

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update, test=develop

* update

* update

* update

* update

* update

* update, test=develop

* update, test=develop

* update

* update

* delete unused proto

* restore op_desc

* restore type_defs

* update var_desc

* remove dims_mapping for proto_pybind

* update interface.py

* update framework.py

* update

* update

* add auto_parallel dir

* mv to paddle.distributed

* add shard_xx api

* add distributed attrs for var

* add ut, test=develop

* [WIP] Add the auto completion feature and related codes

* [WIP] Improve the auto completion and related codes

* [WIP] Make the auto completion support data-parallel

* [WIP] Make the completion support mp and dp+mp

* [WIP] Refactor auto completion unit test for MLP

* [WIP] Refactor the implementation of DistributedOperatorImpl

* [WIP] Improve dims_mapping update rule and fix a bug

* [WIP] Support auto completion for one transformer decoder layer

* [WIP] Add a minor change

* [WIP] Fix a bug within the unit test

* Shard XShape tensor, add embedding completion and refactor code

* Add the distributed_operators dir to setup.py.in

* Improve the completion process and add the unittest for gpt

* fix process_mesh ut

* fix process_mesh ut

* update

* update, test=develop

* Add support for automatically completing distributed attrs of special ops

* update

* update

* update

* fix doc sample codes, test=develop

* improve coverage, test=develop

* add static_mode check, test=develop

* Model the cluster for cost model and physical mapping

* update, test=develop

* add set_placement, test=develop

* Add the check to make sure the candidate tensors' size is greater than zero

* update doc, test=develop

* update doc, test=develop

* update doc, test=develop

* update doc, test=develop

* update, test=develop

* Auto mark dist attrs annotated by user

* update ndarray to nested list, test=develop

* update, test=develop

* Add auto-completion module for auto-parallel (based on PR#33804)

* Remove unnecessary files

* Remove unrelated files for the auto completion pr

* Update the unit test to improve the coverage

* Modify codes based on reviews

* Minor changes for CI

* Improve some codes based on new comments

* Fix bugs caused by shallow copy in attributes.py
* Improve amend_distributed_attr_for_program in context.py
* Other changes for weihang's comments
Co-authored-by: Nsandyhouse <lilong12@baidu.com>

11 Aug, 2021 · 1 commit
  add the basic apis for auto_parallel (#33804) · 3f962e77
  Committed by lilong12 on Aug 11, 2021
```
* add auto_parallel apis
```
10 Aug, 2021 · 1 commit

copy boost/any.hpp to utils and replace boost::any with self defined any (#34613) · 12892929

Committed by chentianyu03 on Aug 10, 2021

* add any.hpp to utils and replace boost::any with self defined paddle::any

* add copy any.hpp to custom op depends

* modify any.hpp include path

* remove boost from setup.py.in

* add copy any.hpp to custom op depends

* move any.hpp to paddle/utils/ dirs

* move any.h to extension/include directory

* copy utils to the right directories

05 Aug, 2021 · 1 commit

[Dy2st]Integrated gast library to fix compatibility problem permanently (#34556) · a9ee3833

Committed by 0x45f on Aug 05, 2021

* integrated gast library

* integrated gast library

* fix unittest and remove ast2.py

* remove 'gast' from __all__ in __init__.py

* add copyright in other files

* fix copyright

04 Aug, 2021 · 1 commit
  Elastic as module (#34572) · 1f76a2f7
  Committed by kuizhiqing on Aug 04, 2021
  
19 Jul, 2021 · 1 commit

Add Cuda event and stream API (#32460) · 9c7f6af5

Committed by chentianyu03 on Jul 19, 2021

* add cuda event and stream api

* add cuda event and stream api

* add get_current_stream api

* add get_current_stream api

* init streams

* modify get_current_stream

* modify get_current_stream

* add synchronize func

* add current_stream doc and test file

* move get_current_stream into CUDA macro

* move CudaEvent into CUDA macro

* move _get_current_stream and _device_synchronize into cuda macro

* modify the macro of cuda stream and event

* add test case for synchronize

* add paddle.devices.cuda module

* event and stream support hip

* add doc for stream and event class

* move cuda stream and event into single pybind

* add cuda_streams_py.cc to cmakelist

* add _device_synchronize and _get_current_stream to core module

* add test case for cudastream and cudaevent

* move __all__ in streams.py

* fix test fail

* add cuda to devices __all__

* fix current_stream doc writing error

* move devices to device directory, and merge device.py into __init__.py

* add required:gpu to sample codes

* remove cuda directory from device/__init__.py

12 Jul, 2021 · 1 commit
  softmax mask fuse upper triangle (#33981) · e2e1c57b
  Committed by Yuang Liu on Jul 12, 2021
```
* softmax mask fuse upper triangle

* cover not implemented cpu code
```
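A fused "softmax mask upper triangle" op stands in for the two-step `softmax(x + mask)` used in causal (GPT-style) attention, where the mask hides every score above the diagonal. The pure-Python reference below is a sketch of those semantics under the assumption that entries strictly above the diagonal are masked and the diagonal is kept; it is not the CUDA kernel from this commit, and the function name is hypothetical.

```python
import math
from typing import List


def causal_masked_softmax(scores: List[List[float]]) -> List[List[float]]:
    """Row-wise softmax where entries above the diagonal (column j > row i) get zero weight.

    Equivalent to softmax(scores + mask) with mask[i][j] = -inf for j > i.
    """
    out = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]                       # columns 0..i stay visible
        peak = max(visible)                          # subtract max for numerical stability
        exps = [math.exp(v - peak) for v in visible]
        total = sum(exps)
        masked = [0.0] * max(len(row) - i - 1, 0)    # exp(-inf) == 0 for masked columns
        out.append([e / total for e in exps] + masked)
    return out


probs = causal_masked_softmax([[1.0, 2.0, 3.0]] * 3)
print(probs[0])  # [1.0, 0.0, 0.0]
```

Fusing the mask into the kernel avoids materializing the `-inf` mask tensor and the intermediate `x + mask`; the reference version only shows what the output should be.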
15 Jun, 2021 · 1 commit
  [XPU] Update cmake options for xpu. (#33450) · e47c3f04
  Committed by Wilber on Jun 15, 2021
  
09 Jun, 2021 · 1 commit

[quant] Add quant wrap for functional api and refine the qat (#33162) · ddc95a01

Committed by cc on Jun 09, 2021

* Add wrap for functional api
* Refine the wrapped api
* Add unit test for quant functional layers
* Update all unit tests for dygraph qat

01 Jun, 2021 · 2 commits
  remove complex64 file (#33237) · 44dd918d
  Committed by chentianyu03 on Jun 01, 2021
  
  replace and remove complex64/128 types in custom OP and other files (#33195) · 06c63ca0
  Committed by chentianyu03 on Jun 01, 2021
```
* replace and remove complex64/128 types in custom OP and other files

* fix custom_tensor_test fail bug

* fix custom_conj_test fail bug

* fix dispatch_test_op build fail bug
```
27 May, 2021 · 1 commit
  Unify all external API error message mechanism and enhance third-party API error msg (#33003) · b425215a
  Committed by Zhou Wei on May 27, 2021
```
* Unify all external API error message mechanism and enhance third-party API error msg

* fix some comment

* fix some comment
```
25 May, 2021 · 1 commit
  Add Automatic SParsity Utilities (#32995) · f91e0f45
  Committed by Ming-Xu Huang on May 25, 2021
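ASP (Automatic SParsity) targets n:m structured sparsity, most commonly 2:4, where at most two of every four consecutive weights are non-zero so that the sparse Tensor Cores on NVIDIA Ampere GPUs can skip them. The sketch below shows the simplest magnitude-based, one-dimensional way to build such a mask in pure Python; the function names are hypothetical, and the utilities added in this commit may use different or additional mask algorithms.

```python
from typing import List, Sequence


def mask_n_of_m(weights: Sequence[float], n: int = 2, m: int = 4) -> List[int]:
    """0/1 mask keeping the n largest-magnitude weights in each group of m consecutive weights."""
    if len(weights) % m:
        raise ValueError("number of weights must be a multiple of m")
    mask: List[int] = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Positions of the n entries with the largest |w| in this group.
        keep = set(sorted(range(m), key=lambda k: abs(group[k]), reverse=True)[:n])
        mask.extend(1 if k in keep else 0 for k in range(m))
    return mask


def is_n_of_m_sparse(values: Sequence[float], n: int = 2, m: int = 4) -> bool:
    """True if every group of m consecutive values has at most n non-zeros."""
    return all(
        sum(1 for v in values[start:start + m] if v != 0) <= n
        for start in range(0, len(values), m)
    )


w = [0.1, -0.9, 0.5, 0.2, 3.0, -1.0, 0.0, 2.0]
mask = mask_n_of_m(w)
pruned = [x * k for x, k in zip(w, mask)]
print(mask)  # [0, 1, 1, 0, 1, 0, 0, 1]
```

In the usual ASP workflow the mask is computed once from a trained dense model and then held fixed while the remaining weights are fine-tuned; that training loop is outside this sketch.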
  
07 May, 2021 · 1 commit
  Remove paddle_custom_op dynamic libraries, and link to FLUID_CORE on Windows (#32583) · 7610c2b4
  Committed by Zhou Wei on May 07, 2021
```
* Remove paddle_custom_op dynamic libraries, change link to FLUID_CORE on windows, and check copy_to

* fix CI
```
30 Apr, 2021 · 1 commit
  revert data_generator __init__.py (#32670) · eb13c19f
  Committed by tianshuo78520a on Apr 30, 2021
```
* revert data_generator

* test

* add setup.py
```
25 Apr, 2021 · 2 commits
  add pipeline for dynamic graph (#32511) · 561dc719
  Committed by lilong12 on Apr 25, 2021
```
* add pp dygraph, test=develop
```

  [HybridParallel] Add pipeline layer in dygraph (#32449) · 7ef1de67
  Committed by ShenLiang on Apr 25, 2021
```
* add pipeline layer
```
23 Apr, 2021 · 1 commit
  [CustomOp] Remove useless extension headers for old custom op (#32463) · 7d4998ac
  Committed by Chen Weihang on Apr 23, 2021
```
* remove useless ext headers

* fix boost header compile failed
```
22 Apr, 2021 · 2 commits
  strip after compilation (#32145) · e727820d
  Committed by wuhuanzhou on Apr 22, 2021
  
  Delete WITH_GRPC flag and Distributed old code (#32383) · e58c705b
  Committed by tianshuo78520a on Apr 22, 2021
  
21 Apr, 2021 · 1 commit
  remove fluid for auto_checkpoint. (#32157) · 1593ee25
  Committed by xiemoyuan on Apr 21, 2021
```
* remove fluid for auto_checkpoint.

* fix bug.
```
19 Apr, 2021 · 1 commit
  [Hybrid Parallel] Support dp & mp in dygraph (#32323) · ffd40860
  Committed by ShenLiang on Apr 19, 2021
```
* support dp & mp
```
17 Apr, 2021 · 1 commit
  [Hybrid Parallel] Add model parallel support in dygraph (#32248) · 66d46221
  Committed by ShenLiang on Apr 17, 2021
```
* add model parallel support in dygraph
```

Crayon鑫 / Paddle is up to date with the upstream project it was forked from
