Commits · c79de7286e4463119639f97143ef1f91cc70d6a9 · 机器未来 / Paddle

29 Sep, 2021 · 12 commits

[NPU] Add group norm (#35937) · c79de728

Committed by zhulei on Sep 29, 2021

* [NPU] Add group norm

* [NPU] Add group norm

* [NPU] Add group norm

* [NPU] Add group norm

* [NPU] Add group_norm op

[NPU] mod for model bert (#36165) · 7bddf2e8

Committed by Aganlengzi on Sep 29, 2021

* merge conflict of paddle_gtest_main.cc

* modify FLAGS_npu_precision_mode and default not to call aclSetCompileopt

[hybrid] Fix model parallel non-distributed param broadcast (#36186) · bec9fc9a
Committed by WangXi on Sep 29, 2021

Add op paddle.device.cuda.get_device_name and paddle.device.cuda.get_device_capability. (#35672) · f703558d

Committed by hlygit66666 on Sep 29, 2021

* add op paddle.device.cuda.get_device_name

* fix some bugs

* fix some bugs

* fix error message bugs

* fix en docs

* fix bugs

* fix bugs

* fix bugs

* add error message test case

* add get_device_name and get_device_capability

* fix review

* fix docs bug

* fix docs

* fix docs

fix paddle.device.cuda.get_device_properties doc (#36178) · 6d4435ac

Committed by Yanxing Shi on Sep 29, 2021

* Initial Commit

* add unittest and add error information

* modify doc

* fix some error

* fix some word

* fix bug cudaDeviceProp* and modify error explanation

* fix cudaDeviceProp* error and unittest samples

* fix hip error and PADDLE_WITH_HIP

* update style

* fix error is_compiled_with_cuda

* fix paddle.device.cuda.get_device_properties

* fix error for multi thread safe

* update style

* merge conflict

* modify after mentor review

* update style

* delete word

* fix unittest error for windows

* support string input and modify some code

* modify doc to support string input

* fix error for express information

* fix error for express information

* fix unittest for windows

* fix device.startswith('gpu:')

* format error and doc

* fix after review

* format code

* fix error for doc compile

* fix error for doc compile

* fix error for doc compile

* fix error for doc compile

* fix error for doc compile

* fix py2 error

* fix wrong words and doc

* fix _gpuDeviceProperties

* test=document_fix

Implement the grad and enhance the cache of norm_convolution fusion ops. (#36168) · 767050d9
Committed by Yiqun Liu on Sep 29, 2021

remove wait if no fetch (#36150) · b3d2dc7b
Committed by Zeng Jinle on Sep 29, 2021

fix nullptr block in op_teller (#36197) · 667bf188
Committed by baoachun on Sep 29, 2021

refine case when thread_num = 1 (#36201) · 7e60cc63
Committed by Zeng Jinle on Sep 29, 2021

Add fused_dropout wrapper to ease use. (#36185) · 092d45c3
Committed by Li Min on Sep 29, 2021

[ROCM] bugfix for bilinear_interp_v2_grad (#36160) · 5e1d0b5c
Committed by ronnywang on Sep 29, 2021

fix flags approval (#36192) · 1b1210ea
Committed by Zeng Jinle on Sep 29, 2021

28 Sep, 2021 · 17 commits

add roi_align (#35102) · f068e08d
Committed by Feng Ni on Sep 28, 2021

* add roi_align in vision/ops.py

Add sparse_attention api, test=develop (#35676) · 6b587e93
Committed by Liu-xiandong on Sep 28, 2021

Add sparse_attention OPs, python api will be added in next pr

add API paddle.linalg.eig (#35674) · bc7e2b92

Committed by Lijunhui on Sep 28, 2021

* Add paddle.linalg.eig op

* remove comments

* remove comments

* extend batch_size to the origin

* add real times complex functor & destroy the backward complex output bug

* terminate output diff when input real tensors

* correct tiny doc errors

* move functions from eig_helper to svd_helper and remove eig_helper

* remove tensor.Resize

* remove no longer used code

* use existing lapack functions

* reply review comments 21/27

* remove .cu as this op is only executed on CPU

* remove const_cast & add const in argument list for read-only references

* fix sample code error in CI

* remove template typename Tbase and more

* remove eig exposure in paddle.*

* add 'name=None' in eig python implementation

* handle the unittest

* try to solve the unittest

* solve CI coverage

* remove no longer used code

* polish API doc and more

* reply review comments

* polish unittest, commit plan B

* polish unittest

[ROCM] bugfix for arg_min_max (#36098) · 36791fdd
Committed by ronnywang on Sep 28, 2021

[HeterPs]ps gpu dump (#36157) · 97d30602
Committed by Thunderbrook on Sep 28, 2021

* ps gpu dump

* remove log

[hybrid] seed and dropout op support force-cpu (#35820) · 58c8f6b3

Committed by xiayanming on Sep 28, 2021

* [HIP] fix op not support AMD GPU bug, the flag PADDLE_WITH_ROCM is invalid

* [HIP] fix op not support AMD GPU bug, the flag PADDLE_WITH_ROCM is invalid

* [HIP] fix op not support AMD GPU bug

* [hybrid] seed and dropout op support force-cpu

* [hybrid] seed and dropout op support force-cpu

* [hybrid] seed and dropout op support force-cpu

* [hybrid] seed and dropout op support force-cpu

* [hybrid] seed and dropout op support force-cpu

* [hybrid] fix seed ci failed issue

* add AsExtra for force_cpu of seed op

remove new linalg api in paddle.__init__ (#36151) · 3bb4715e

Committed by zhiboniu on Sep 28, 2021

remove recent linalg api in paddle.init;
add args 'name' in some new linalg api interface
same change in develop branch to #36112

【Bug fix】Fix dygraph double grad dtype error (#36125) · af4f018a

Committed by Jiabin Yang on Sep 28, 2021

* fix dygraph double grad dtype error when calling for high differential scenario

* reinvoke ci

* add test for partial_engine.cc

py2 to py3 bug and iface fix for pslib (#36102) · 0e07f20e
Committed by kuizhiqing on Sep 28, 2021

[re-submit] auto read all public envs from flags_map in paddle_gtest_main (#36121) · 53f9768d
Committed by Leo Chen on Sep 28, 2021

* read envs in flags_map

* add flags to undefok

reduce calls to SizeOfType (#36110) · c719add7
Committed by Leo Chen on Sep 28, 2021

[hybrid] optimizer sharding support optimize cast (#35878) · eef0a943
Committed by WangXi on Sep 28, 2021

fix bug of reduce_sum when src_dtype != dst_dtype and reduce_num == 1 (#36123) · d5268a6e
Committed by Guoxia Wang on Sep 28, 2021

rename scale loss grad (#36162) · ad128144
Committed by Zeng Jinle on Sep 28, 2021

Add paddle.device.cuda.get_device_properties (#35661) · 4cbed9e5

Committed by Yanxing Shi on Sep 28, 2021

* Initial Commit

* add unittest and add error information

* modify doc

* fix some error

* fix some word

* fix bug cudaDeviceProp* and modify error explanation

* fix cudaDeviceProp* error and unittest samples

* fix hip error and PADDLE_WITH_HIP

* update style

* fix error is_compiled_with_cuda

* fix paddle.device.cuda.get_device_properties

* fix error for multi thread safe

* update style

* merge conflict

* modify after mentor review

* update style

* delete word

* fix unittest error for windows

* support string input and modify some code

* modify doc to support string input

* fix error for express information

* fix error for express information

* fix unittest for windows

* fix device.startswith('gpu:')

* format error and doc

* fix after review

* format code

* fix error for doc compile

* fix error for doc compile

* fix error for doc compile

* fix error for doc compile

* fix error for doc compile

* fix py2 error

* fix wrong words and doc

* fix _gpuDeviceProperties
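
The bullets above mention support for string input and a `device.startswith('gpu:')` check. As a rough illustration only, the sketch below normalizes a device argument to an integer index. The helper name `parse_gpu_device`, the default of index 0 when no device is given, and the error messages are assumptions made for this example; they are not taken from Paddle's source.

```python
def parse_gpu_device(device=None):
    """Return a GPU index from None, an int, or a string such as 'gpu:0'.

    Hypothetical helper for illustration; not Paddle's implementation.
    """
    if device is None:
        return 0  # assumption: fall back to the first device
    if isinstance(device, bool):
        # bool is a subclass of int, so reject it explicitly
        raise ValueError("device must be an int or a string like 'gpu:0'")
    if isinstance(device, int):
        if device < 0:
            raise ValueError("device index must be non-negative")
        return device
    if isinstance(device, str):
        if device.startswith("gpu:"):
            index = device[len("gpu:"):]
            if index.isdecimal():
                return int(index)
        raise ValueError("expected a string like 'gpu:0', got %r" % device)
    raise ValueError("device must be an int or a string like 'gpu:0'")


print(parse_gpu_device("gpu:1"))  # 1
```

Accepting both forms in one place keeps the validation and the error text consistent for every caller.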

Add Basic CINN Runner Class (#35978) · 6f18b041

Committed by Huihuang Zheng on Sep 28, 2021

* Add Basic CINN Runner Class

* Add CinnCacheKey

* Add Cache logic and improve CinnCacheKey


* Modify as reviewer commented

* Implement hash_combine to fix MAC build.
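
The log does not show how `hash_combine` or `CinnCacheKey` is written. For readers unfamiliar with the idea, here is a minimal Python sketch of the widely used Boost-style combining formula, folded over the parts of a key. The function `cache_key` and the 64-bit mask are assumptions for this example, and whether the commit uses this exact formula is not stated in the log.

```python
MASK64 = (1 << 64) - 1  # keep results in an unsigned 64-bit range


def hash_combine(seed, value):
    """Mix hash(value) into seed using the Boost-style formula."""
    h = hash(value) & MASK64
    return (seed ^ (h + 0x9E3779B9 + (seed << 6) + (seed >> 2))) & MASK64


def cache_key(parts):
    """Fold an ordered sequence of hashable parts into one integer key.

    Hypothetical helper for illustration only.
    """
    seed = 0
    for part in parts:
        seed = hash_combine(seed, part)
    return seed


# Example: a key built from a graph name, an input shape, and a dtype string.
key = cache_key(("graph_a", (2, 3), "float32"))
```

Because the running seed feeds into each step, the same parts in a different order generally produce a different key. Python randomizes string hashes per process, so keys from this sketch are only comparable within one run.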

dlpack fix (#35817) · 74ff59cf
Committed by Siming Dai on Sep 28, 2021

27 Sep, 2021 · 11 commits

gloo hdfs set check & gloo connect retry (#35750) · ae382d1f

Committed by xiaoxiao-luomu on Sep 27, 2021

* gloo hdfs set check & gloo connect retry

* add vlog

* print gloo connect addr & add vlog

* .

* modify vlog

* modify vlog

* modify vlog

fix zero tensor for unique, unstack (#36021) · efd35384

Committed by Jiawei Wang on Sep 27, 2021

* fix extra op for expand, expand_as, tile, unstack

* fix unique unstack dim 0

* Update expand_v2_op.cc

* fix unique_op format

Lars op optimization with cudaLaunchCooperativeKernel method (#35652) · a112ce42

Committed by limingshu on Sep 27, 2021

* A leap of try for cudaLaunchCooperativeKernel

* fix bugs

* Totally replace the lar cuda kernel

* Fix bugs

* fix code according to comments

* fix codes according to  review comments

* adding some function overload

* relocate the power operation.

Added flatten and flatten2 BF16/FP32 FWD/BWD kernels (#35892) · e427a0f1

Committed by jakpiase on Sep 27, 2021

* refactored reshape multiop kernel and added flatten1/2 kernels

* added formatting for flatten tests

* CI fix

* disabled reshape_kernel ops after successful CI run

* minor fix

Add functional autograd API: jacobian (#35917) · ec2f68e8

Committed by levi131 on Sep 27, 2021

* init functional jacobian api

* finish test with dtype float32

* add float64 test case

* polish code

* use atol=1e-5 with dtype float64

* fix for ci

* set timeout for test_jacobian

* polish API docstring

* modify docstring
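
The bullets above mention float32 and float64 test cases with an absolute tolerance. One common way to check a Jacobian routine is to compare it against a finite-difference estimate. The sketch below shows that idea in plain Python. The log does not say how `test_jacobian` is actually written, and the names `numerical_jacobian` and `f` are hypothetical.

```python
import math


def numerical_jacobian(func, x, eps=1e-6):
    """Central-difference Jacobian of func, which maps a list of floats
    to a list of floats. Entry [i][j] approximates d func(x)[i] / d x[j]."""
    n_out = len(func(x))
    jac = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        y_plus = func(x_plus)
        y_minus = func(x_minus)
        for i in range(n_out):
            jac[i][j] = (y_plus[i] - y_minus[i]) / (2 * eps)
    return jac


def f(v):
    """Example function: (x, y) -> (x * y, x + y)."""
    x, y = v
    return [x * y, x + y]


# The analytic Jacobian of f at (2, 3) is [[3, 2], [1, 1]].
estimate = numerical_jacobian(f, [2.0, 3.0])
expected = [[3.0, 2.0], [1.0, 1.0]]
assert all(
    math.isclose(estimate[i][j], expected[i][j], abs_tol=1e-5)
    for i in range(2)
    for j in range(2)
)
```

The comparison uses an absolute tolerance because a finite-difference estimate carries rounding and truncation error and will not match an analytic value exactly.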

Add roi pool (#35084) · 6d62769a
Committed by Wenyu on Sep 27, 2021

* add roi pool

* rename input as x

test=document_fix;paddle/testing need run all cases (#36138) · 6841d4d4
Committed by zhangchunle on Sep 27, 2021

Polish multi-thread schedule strategy and Keep one task in current thread (#35928) · 0e5d81c7
Committed by Aurelius84 on Sep 27, 2021

* Polish multi-thread schedule strategy

* fix atomic_deps

* modify into lambda function

* add and run

[Docker Images] Add cuda11.2 + cudnn8.2.1 + trt8.0.3.4 images (#35982) · 6c4a741a
Committed by JingZhuangzhuang on Sep 26, 2021

support saving model defined parameters without add scale_op (#36119) · 8db6d221

Committed by Haipeng Wang on Sep 27, 2021

* add scale_op in model save step is not necessary, just fix the prune method to support static graph and inplace op

* fix jit.save, no need to add scale_op to each outputvar anymore.
fix prune_with_input, now it supports inplace op

* temporarily disable test_trt_dynamic_shape.TRTDynamicShapeOutOfBound2Test

* allow user to export parameters defined in model

update externalErrorMsg.tar.gz md5 value (#36126) · 23ccbcb1
Committed by Xiaoxu Chen on Sep 27, 2021
