Commits · 8305ba378708cdde9529520f7579a1d8823f4e10 · Crayon鑫 / Paddle

03 Sep, 2021 1 commit
- fix bn_infer and optimize momentum for kunlun (#35250) · 8305ba37
  Committed by TTerror on Sep 03, 2021
02 Sep, 2021 1 commit

Add SVD Op and it's GPU and CPU kernel (#34953) · 7e5fb462

Committed by xiongkun on Sep 02, 2021

* Add SVD Op and it's GPU and CPU kernel

* Remove CUDAPlace in test_svd_op, make the test available in CPU package

* modify the file

* fix windows bug/ fix ROCM / fix test timeout

* for pass the CIs

* improve error report

* for code review

* some modification to test_svd_op

* change python code style

* expose the svd interface for document

01 Sep, 2021 1 commit
- support KL label smooth (#35177) · 7ca28bb6
  Committed by QingshuChen on Sep 01, 2021
```
* support KL label smooth

* update UT for KL label_smooth
```
31 Aug, 2021 4 commits

Revert "Revert "Add copy from tensor (#34406)" (#35173)" (#35256) · 6116f9af
Committed by Shang Zhizhou on Aug 31, 2021
```
* Revert "Revert "Add copy from tensor (#34406)" (#35173)"

This reverts commit 32c1ec42.

* add template instantiation
```
fix bug that cmake find python (#35304) · 00c9aeb0
Committed by zhouweiwei2014 on Aug 31, 2021

New whl release strategy with pruned nv_fatbin (#35239) · 2f3b393d

Committed by Zhanlue Yang on Aug 31, 2021

[Background]
Code size growth can be irreversible in the long run, leading to huge release packages that
not only hamper user experience but also exceed a hard size limit on PyPI.

In particular, the NV_FATBIN section accounts for 86% of the compiled dylib size, owing to the
large number of GPU architectures supported.

This PR aims to prune the NV_FATBIN section.

[Solution]
The new release strategy involves two types of whl packages:

Cubin PIP package:
This package supports a narrower range of GPU architectures. It contains
sm_60, sm_70, sm_75 and sm_80 cubins, covering the Pascal through Ampere architectures.

JIT release package:
This is a fallback for the Cubin PIP package. It contains compute_35, compute_50, compute_60,
compute_70, compute_75 and compute_80, offering the best performance and GPU architecture coverage.

However, it takes around 10 minutes to install because of JIT compilation.

[How to use]
The new release strategy is disabled by default.
To build the Cubin PIP package, add this to cmake: -DCUBIN_RELEASE_PIP
To build the JIT release package, add this to cmake: -DJIT_RELEASE_WHL
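
[Illustration]
As a rough sketch of the difference between the two package types, the Python snippet below turns the architecture lists quoted above into nvcc `-gencode` flags. The helper name `gencode_flags` and the two list constants are hypothetical and are not taken from the Paddle build scripts; how the PR actually wires these targets into CMake is not shown in this commit message. The sketch relies only on the standard nvcc convention that `code=sm_XX` embeds a precompiled cubin, while `code=compute_XX` embeds PTX that the driver JIT-compiles at load time.

```python
# Hypothetical illustration only: not part of the Paddle build system.
# Architecture lists are copied from the commit message above.
CUBIN_ARCHS = ["sm_60", "sm_70", "sm_75", "sm_80"]
JIT_ARCHS = [
    "compute_35", "compute_50", "compute_60",
    "compute_70", "compute_75", "compute_80",
]


def gencode_flags(targets):
    """Build one nvcc -gencode flag per target.

    A target of the form sm_XX yields a real-architecture cubin;
    a target of the form compute_XX yields PTX for JIT compilation.
    """
    flags = []
    for target in targets:
        kind, _, version = target.partition("_")
        if kind not in ("sm", "compute") or not version.isdigit():
            raise ValueError(f"unrecognized target: {target!r}")
        # The virtual architecture always uses the compute_XX spelling.
        flags.append(f"-gencode arch=compute_{version},code={target}")
    return flags


if __name__ == "__main__":
    print("Cubin PIP package:", " ".join(gencode_flags(CUBIN_ARCHS)))
    print("JIT release package:", " ".join(gencode_flags(JIT_ARCHS)))
```

Fewer embedded targets mean a smaller NV_FATBIN section, which is the trade-off the two package types are meant to expose.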

fix CI skip cc test error (#35264) · 3d76d003
Committed by wuhuanzhou on Aug 31, 2021
```
* fix CI skip cc test error, test=develop

* remove test code, test=develop
```
27 Aug, 2021 1 commit
- Revert "Add copy from tensor (#34406)" (#35173) · 32c1ec42
  Committed by zhangchunle on Aug 27, 2021
```
This reverts commit ac33c0ca.
```
26 Aug, 2021 1 commit

Add copy from tensor (#34406) · ac33c0ca

Committed by Shang Zhizhou on Aug 26, 2021

* add api

* temp save

* revert

* copytocpu async ok

* fix style

* copy sync ok

* fix compile error

* fix compile error

* api done

* update python async api

* fix compile

* remove async python api; add c++ async unittest

* remove python async api

* update unittest

* update unittest

* add C++ unittest for copytensor

* add unittest

* update namespace utils to class TensorUtils

* add unittest

* update unittest

* update unittest

* update code style

* update code style

* update unittest

25 Aug, 2021 2 commits
- strip inference.so and make link to mkldnn.so (#34895) · 086540cc
  Committed by Wilber on Aug 25, 2021
- update elementwise api in kunlun (#35021) · ff96a7d5
  Committed by taixiurong on Aug 25, 2021
23 Aug, 2021 1 commit
- upgrade oneDNN to v2.3.2 (#35040) · a047c139
  Committed by lidanqing on Aug 23, 2021
16 Aug, 2021 1 commit

Jetson nano bilinear (#34751) · 2a4ed087

Committed by feng_shuai on Aug 16, 2021

* change bilinear thread for nano and tx2

* change bilinear thread for nano and tx2

10 Aug, 2021 1 commit

copy boost/any.hpp to utils and replace boost::any with self defined any (#34613) · 12892929

Committed by chentianyu03 on Aug 10, 2021

* add any.hpp to utils and replace boost::any with self defined paddle::any

* add copy any.hpp to custom op depends

* modify any.hpp include path

* remove boost from setup.py.in

* add copy any.hpp to custom op depends

* move any.hpp to paddle/utils/ dirs

* move any.h to extension/include directory

* copy utils to right directories

09 Aug, 2021 1 commit
- Increase the speed of incremental compilation (#34616) · aab4d6e4
  Committed by zhouweiwei2014 on Aug 09, 2021
06 Aug, 2021 1 commit
- add get xpu version api (#34594) · 8a9dc5dc
  Committed by TTerror on Aug 06, 2021
03 Aug, 2021 1 commit
- support Kunlun2 (#34459) · 2d0f3d9b
  Committed by QingshuChen on Aug 03, 2021
```
* support Kunlun2

* support KL2

* support KL2
```
29 Jul, 2021 1 commit
- Improve sccache hit rate and avoid absolute path (#34435) · 92d8fed8
  Committed by zhouweiwei2014 on Jul 29, 2021
21 Jul, 2021 1 commit
- Polish windows compile for Ninja, fix UT random compile (#34237) · 05805d91
  Committed by zhouweiwei2014 on Jul 21, 2021
```
* polish windows compile for Ninja, fix random compile fail

* polish windows compile for Ninja, fix random compile fail
```
14 Jul, 2021 2 commits
- Support Mac M1 build (#34071) · ec0ea4c5
  Committed by tianshuo78520a on Jul 14, 2021
```
* Support Mac M1 make

* cmake version check
```
- Support sccache to speed up compilation on Windows (#34019) · 4ce66826
  Committed by zhouweiwei2014 on Jul 14, 2021
```
* Support sccache to speed up compilation on Windows

* Support sccache to speed up compilation on Windows
```
07 Jul, 2021 1 commit
- [xpu] add dropout & amp ops in xpu place (#33891) · 84e813e3
  Committed by taixiurong on Jul 07, 2021
06 Jul, 2021 1 commit

Add gpu implementation of shuffle_batch_op (#33938) · c6b6ba1f

Committed by Zeng Jinle on Jul 06, 2021

* add gpu implementation of shuffle batch
test=develop

* add thrust cuda patches
test=develop

* fix macro guard

* fix shuffle batch compile on windows/hip

* fix hip compilation error

* refine CMakeLists.txt

* fix windows compile error

* try to fix windows CI compilation error

* fix windows compilation again

* fix shuffle_batch op test on Windows

02 Jul, 2021 2 commits
- update of oneDNN to 2.3 final (#33923) · 15451c61
  Committed by Jacek Czaja on Jul 02, 2021
- update xpu cmake (#33906) · 4c352033
  Committed by TTerror on Jul 02, 2021
29 Jun, 2021 2 commits
- xpu support amp (#33809) · 4d4fb660
  Committed by taixiurong on Jun 29, 2021
- support Ninja, establish dependencies relationship between paddle with third_party (#33140) · 43c38c67
  Committed by Zhou Wei on Jun 29, 2021
```
* support Ninja and establish dependencies relationship between paddle with third_party

* fix CI

* support Ninja
```
24 Jun, 2021 1 commit
- fix unittest can't get cuda error message correctly (#33743) · 3946afc4
  Committed by Zhou Wei on Jun 24, 2021
22 Jun, 2021 1 commit
- Updated of oneDNN to 2.3 + bugfixes (#33702) · 246da751
  Committed by Jacek Czaja on Jun 22, 2021
21 Jun, 2021 1 commit
- update trt version from major to full (#33690) · 2d7ef7ad
  Committed by Wilber on Jun 21, 2021
18 Jun, 2021 2 commits
- polish windows ci (#32964) · 39556a44
  Committed by Zhou Wei on Jun 18, 2021
- [XPU] Add xpu include and so into inference third_party (#33641) · cca44c1d
  Committed by Wilber on Jun 18, 2021
17 Jun, 2021 1 commit
- Relax the constraint of installed openblas from version==0.3.7 to >=0.3.7 (#33626) · ab0272eb
  Committed by Leo Chen on Jun 17, 2021
16 Jun, 2021 1 commit
- Add bitwise_and/or/xor/not OP/API and unittest (#33524) · ecc05377
  Committed by Zhou Wei on Jun 16, 2021
15 Jun, 2021 1 commit
- [XPU] Update cmake options for xpu. (#33450) · e47c3f04
  Committed by Wilber on Jun 15, 2021
09 Jun, 2021 1 commit
- Check the installed openblas version in cmake (#33440) · 23290929
  Committed by Leo Chen on Jun 09, 2021
08 Jun, 2021 1 commit
- update xpu cmake for kunlun (#33328) · 64914ea4
  Committed by TTerror on Jun 08, 2021
07 Jun, 2021 1 commit
- bump up to oneDNN v2.3 (#33229) · 94e83606
  Committed by lidanqing on Jun 07, 2021
02 Jun, 2021 2 commits
- fix jetson arch when compiling with single arch (#33269) · 29dc439a
  Committed by Pei Yang on Jun 02, 2021
- [ROCM] update paddle inference cmake, test=develop (#33260) · e7541209
  Committed by Qi Li on Jun 02, 2021

Crayon鑫 / Paddle is in sync with the fork source project