Commits · 21b93c3dc68c616f12c360ebbbd9961fe379902f · BaiXuePrincess / Paddle

29 Sep, 2021 · 14 commits
- Add basic support for CUDA Graph (#36190) · 21b93c3d
  Committed by Zeng Jinle on Sep 29, 2021
```
* add basic support for CUDA Graph

* fix ci compile error

* fix LOG print, fix windows CI

* follow comments and update

* small fix for default ctor

* fix rocm compile error

* fix CPU compile error
```
- fix cusparse compile problem, test=develop (#36199) · 3eb50715
  Committed by Liu-xiandong on Sep 29, 2021
```
* fix cusparse compile problem, test=develop

* Modify file permissions
```
- Spinlock (#36030) · a9ea41c5
  Committed by liutiexing on Sep 29, 2021
```
* add align for WorkQueue

* add spinlock

* merge spinlock
```
- add slot record dataset (#36200) · 79bd5f90
  Committed by yaoxuefeng on Sep 29, 2021
- [npu] add box coder (#36171) · 83578cfa
  Committed by zhulei on Sep 29, 2021
```
* [npu] add box coder

* [npu] add box coder
```
- fix bug of top_k npu op (#36175) · 2b8fd704
  Committed by pangyoki on Sep 29, 2021
- [NPU] Add group norm (#35937) · c79de728
  Committed by zhulei on Sep 29, 2021
```
* [NPU] Add group norm

* [NPU] Add group norm

* [NPU] Add group norm

* [NPU] Add group norm

* [NPU] Add group_norm op
```
- [NPU] mod for model bert (#36165) · 7bddf2e8
  Committed by Aganlengzi on Sep 29, 2021
```
* merge conflict of paddle_gtest_main.cc

* modify FLAGS_npu_precision_mode and default not to call aclSetCompileopt
```
- Implement the grad and enhance the cache of norm_convolution fusion ops. (#36168) · 767050d9
  Committed by Yiqun Liu on Sep 29, 2021
- remove wait if no fetch (#36150) · b3d2dc7b
  Committed by Zeng Jinle on Sep 29, 2021
- fix nullptr block in op_teller (#36197) · 667bf188
  Committed by baoachun on Sep 29, 2021
- refine case when thread_num = 1 (#36201) · 7e60cc63
  Committed by Zeng Jinle on Sep 29, 2021
- Add fused_dropout wrapper to ease use. (#36185) · 092d45c3
  Committed by Li Min on Sep 29, 2021
- [ROCM] bugfix for bilinear_interp_v2_grad (#36160) · 5e1d0b5c
  Committed by ronnywang on Sep 29, 2021
28 Sep, 2021 · 13 commits

Add sparse_attention api, test=develop (#35676) · 6b587e93
Committed by Liu-xiandong on Sep 28, 2021
```
Add sparse_attention OPs, python api will be added in next pr
```
add API paddle.linalg.eig (#35674) · bc7e2b92
Committed by Lijunhui on Sep 28, 2021

* Add paddle.linalg.eig op

* remove comments

* remove comments

* extend batch_size to the origin

* add real times complex functor & destroy the backward complex output bug

* terminate output diff when input real tensors

* correct tiny doc errors

* move functions from eig_helper to svd_helper and remove eig_helper

* remove tensor.Resize

* remove no longer used code

* use existing lapack functions

* reply review comments 21/27

* remove .cu as this op is only executed on CPU

* remove const_cast & add const in argument list for read-only references

* fix sample code error in CI

* remove template typename Tbase and more

* remove eig exposure in paddle.*

* add 'name=None' in eig python implementation

* handle the unittest

* try to solve the unittest

* solve CI coverage

* remove no longer used code

* polish API doc and more

* reply review comments

* polish unittest, commit plan B

* polish unittest

[ROCM] bugfix for arg_min_max (#36098) · 36791fdd
Committed by ronnywang on Sep 28, 2021

[HeterPs]ps gpu dump (#36157) · 97d30602
Committed by Thunderbrook on Sep 28, 2021
```
* ps gpu dump

* remove log
```
[hybrid] seed and dropout op support force-cpu (#35820) · 58c8f6b3
Committed by xiayanming on Sep 28, 2021

* [HIP] fix op not support AMD GPU bug, the flag PADDLE_WITH_ROCM is invalid

* [HIP] fix op not support AMD GPU bug, the flag PADDLE_WITH_ROCM is invalid

* [HIP] fix op not support AMD GPU bug

* [hybrid] seed and dropout op support force-cpu

* [hybrid] seed and dropout op support force-cpu

* [hybrid] seed and dropout op support force-cpu

* [hybrid] seed and dropout op support force-cpu

* [hybrid] seed and dropout op support force-cpu

* [hybrid] fix seed ci failed issue

* add AsExtra for force_cpu of seed op

【Bug fix】Fix dygraph double grad dtype error (#36125) · af4f018a
Committed by Jiabin Yang on Sep 28, 2021

* fix dygraph double grad dtype error when calling for high differential senario

* reinvoke ci

* add test for partial_engine.cc

[re-submit] auto read all public envs from flags_map in paddle_gtest_main (#36121) · 53f9768d
Committed by Leo Chen on Sep 28, 2021
```
* read envs in flags_map

* add flags to undefok
```
reduce calls to SizeOfType (#36110) · c719add7
Committed by Leo Chen on Sep 28, 2021

fix bug of reduce_sum when src_dtype != dst_dtype and reduce_num == 1 (#36123) · d5268a6e
Committed by Guoxia Wang on Sep 28, 2021

rename scale loss grad (#36162) · ad128144
Committed by Zeng Jinle on Sep 28, 2021

Add paddle.device.cuda.get_device_properties (#35661) · 4cbed9e5
Committed by Yanxing Shi on Sep 28, 2021

* Initial Commit

* add unittest and add error information

* modify doc

* fix some error

* fix some word

* fix bug cudaDeviceProp* and modify error explanation

* fix cudaDeviceProp* error and unnitest samples

* fix hip error and PADDLE_WITH_HIP

* update style

* fix error is_compiled_with_cuda

* fix paddle.device.cuda.get_device_properties

* fix error for multi thread safe

* update style

* merge conflict

* modify after mentor review

* update style

* delete word

* fix unittest error for windows

* support string input and modify some code

* modify doc to support string input

* fix error for express information

* fix error for express information

* fix unnitest for windows

* fix device.startswith('gpu:')

* format error and doc

* fix after review

* format code

* fix error for doc compile

* fix error for doc compile

* fix error for doc compile

* fix error for doc compile

* fix error for doc compile

* fix py2 error

* fix wrong words and doc

* fix _gpuDeviceProperties

Add Basic CINN Runner Class (#35978) · 6f18b041
Committed by Huihuang Zheng on Sep 28, 2021

* Add Basic CINN Runner Class

* Add CinnCacheKey

* Add Cache logic and improve CinnCacheKey


* Modify as reviewer commented

* Implement hash_combine to fix MAC build.

dlpack fix (#35817) · 74ff59cf
Committed by Siming Dai on Sep 28, 2021

27 Sep, 2021 · 6 commits

gloo hdfs set check & gloo connect retry (#35750) · ae382d1f
Committed by xiaoxiao-luomu on Sep 27, 2021

* gloo hdfs set check & gloo connect retry

* add vlog

* print gloo connect addr & add vlog

* .

* modify vlof

* modify vlog

* modify vlog

fix zero tensor for unique, unstack (#36021) · efd35384
Committed by Jiawei Wang on Sep 27, 2021

* fix extra op for expand, expand_as, tile, unstack

* fix unique unstack dim 0

* Update expand_v2_op.cc

* fix unique_op format

Lars op optimiztion with cudaLaunchCooperativeKernel method (#35652) · a112ce42
Committed by limingshu on Sep 27, 2021

* A leap of try for cudaLaunchCooperativeKernel

* fix bugs

* Totally replace the lar cuda kernel

* Fix bugs

* fix code according to comments

* fix codes according to  review comments

* adding some function overload

* relocate the power operation.

Added flatten and flatten2 BF16/FP32 FWD/BWD kernels (#35892) · e427a0f1
Committed by jakpiase on Sep 27, 2021

* refactored reshape multiop kernel and added flatten1/2 kernels

* added formatting for flatten tests

* CI fix

* disabled reshape_kernel ops after succesful CI run

* minor fix

Polish multi-thread schedule strategy and Keep one task in current thread (#35928) · 0e5d81c7
Committed by Aurelius84 on Sep 27, 2021
```
* Polish multi-thread schedule strategy

* fix atomic_deps

* modify into lambda function

* add and run
```
Revert "auto read all public envs from flags_map in paddle_gtest_main (#36057)" (#36117) · 7803f403
Committed by Leo Chen on Sep 27, 2021
```
This reverts commit 3fabc808.
```

26 Sep, 2021 · 7 commits
- bugfix reshape -1 (#36087) · 2fe9ae71
  Committed by JZ-LIANG on Sep 26, 2021
- [new api] add func/class API psroi_pool and UT (#35352) · e45d64ec
  Committed by JYChen on Sep 26, 2021
```
* add func/class API psroi_pool and UT

* add UT in static mode

* Remove redundant type checks in static mode

* More detailed description for test_psroi_pool_op

* fix code format of UT

* fix en-doc
```
- set file_num in one shard (#35835) · 991dc67d
  Committed by Thunderbrook on Sep 26, 2021
```
* set file_num in one shard

* format
```
- modify adam to adamw in AdamW (#36028) · 49c8253f
  Committed by zhangbo9674 on Sep 26, 2021
```
* adam to adamw in AdamW

* add lr_ratio in adamw

* refine logic bug in cpu adamw

* delete fix bug for cpu adamw

* delete fix bug for cpu adamw
```
- auto read all public envs from flags_map in paddle_gtest_main (#36057) · 3fabc808
  Committed by Leo Chen on Sep 26, 2021
- Add a check for multiplex op (#34972) · b430f6a3
  Committed by Yulong Ao on Sep 26, 2021
- Fix FPE of label smooth op (#35861) · 628ff34b
  Committed by whs on Sep 26, 2021

BaiXuePrincess / Paddle: in sync with the fork's source project