Commits · a0d465f80fa29e347331a7600bdff05ed18d1f2f · PaddlePaddle / Paddle

25 Nov, 2021 15 commits

【PTen】Add fill_constant kernel using ScalarArray in pten (#37481) · a0d465f8

Committed by zyfncg on Nov 25, 2021

* add scalar and scalar_array

* remove DenseTensor include from Scalar and ScalarArray

* remove inner header from scalar_array

* refactor the method of fill_constant and add some comment

* add fill_constant kernel using ScalarArray

* modify some prompt

* remove fill_constant kernel with no shape

[NPU] add int64 support for argsort op (#37434) · 3e088aaf
Committed by furnace on Nov 25, 2021
```
* [NPU] add int64 support for argsort op

* [NPU] delete debug codes
```
[NPU] add NPU kernel for prior_box op (#37519) · 1127fecb
Committed by furnace on Nov 25, 2021
```
* [NPU] add NPU kernel for prior_box op

* [NPU] delete debug codes
```
Disable the check of missing op benchmark script temporarily. (#37535) · 65056742
Committed by Yiqun Liu on Nov 25, 2021

Pass the stream created by Paddle to CINN. (#37337) · c249556d
Committed by Zhen Wang on Nov 25, 2021

fix pass_desc.proto compilation error, test=develop (#37536) · a4ef88ed
Committed by wuhuanzhou on Nov 25, 2021

[cherry-pick 2.2 heterps]bug fix for launch_utils.py (#37521) · 8bb1038c

Committed by zmx on Nov 25, 2021

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* [heterps]bug fix for _run_from_dataset

* fix heter_server.cc

* fix launch_utils.py

* fix heter_section_worker.cc

* fix. test=develop

* fix. test=develop

Support multi-stream allocation for CUDA place (#37290) · b9c464c3

Committed by From00 on Nov 25, 2021

* Support multi-stream allocation for CUDA place

* Do not notify the retrying from other streams when free CUDA allocation

* Fix compile error for CPU

* Fix compile error for HIP

* Release memory for StreamSafeCUDAAllocaRetry in malloc_test

* Add FLAGS_use_stream_safe_cuda_allocator

* Fix CI error for 'set_tests_properties'

* Invalidate stream safe CUDA allocator for naive_best_fit and thread_local strategy

* Performance improvement: insert allocation pair to outstanding_events_map when free but not alloc; replace recursive_mutex with SpinLock

* FLAGS priority changes: FLAGS_use_system_allocator > FLAGS_use_stream_safe_cuda_allocator

* Performance improvement: directly delete allocation when the recorded_streams is empty in FreeImpl of StreamSafeCUDAAllocator

* Add UT for alloc interface

* Changes multi-stream interface; move retry code from AllocatorFacadePrivate to StreamSafeCUDAAllocator

[fleet_executor] Compute Interceptor stop along data flow (#37531) · 50f75fb5
Committed by WangXi on Nov 25, 2021

Fix static-ci (#37504) · 992d4ebb
Committed by tianshuo78520a on Nov 25, 2021
```
* Fix static-ci
```
Added GradTensorHolder to Eager Dygraph (#37458) · bc9f9f43

Committed by Zhanlue Yang on Nov 25, 2021

* Added GradTensorHolder to Eager Dygraph

* Added accumulation codes to Eager Dygraph

* Fix windows-ci issue

* Fix NPU-CI issue

* Fixed CI-Coverage issue

Export task node to python (#37509) · 3f815e76
Committed by LiYuRio on Nov 25, 2021

Fix test rnn memory helper op (#37474) · e4791d88
Committed by xiongkun on Nov 25, 2021
```
* clear LoDTensorArray

* fix  bugs

* fix

* fix gpu
```
fix_matmul_op_int8_plugin (#37525) · 0fd70d71
Committed by Wangzheee on Nov 25, 2021

infershape func to infermeta (#37524) · 2a905f6b
Committed by Chen Weihang on Nov 24, 2021

24 Nov, 2021 16 commits

Changed second batch of deprecated mkldnn header and function names to new oneDNN names (#37351) · 7db7a0ec
Committed by piotrekobiIntel on Nov 24, 2021
```
* Add second batch of deprecated mkldnn namespace and macro changes

* Unlock CI

* Fix temporary namespace alias placing
```
[fleet_executor] fix message bus bug (#37507) · 10d8d6b6
Committed by Yuang Liu on Nov 24, 2021

Added EagerUtils to Eager Dygraph (#37479) · 7de99d8c

Committed by Zhanlue Yang on Nov 24, 2021

* Added EagerUtils to Eager Dygraph

* Purified include dependencies for global_utils

* Fixed merge conflicts

Fix lod in fetch_v2 (#37514) · acbf9974
Committed by Aurelius84 on Nov 24, 2021

[new-exec] support skipping infershape (#37510) · e76b601b
Committed by Leo Chen on Nov 24, 2021

elementwise_mul refactor (#37471) · c5e857d4

Committed by YuanRisheng on Nov 24, 2021

* elementwise_mul refactor

* perfect code in test

* delete redundant code

* fix bugs when run test_multiply

* adjust the location of macro

* fix bugs when run ci

【PTen】Add Scalar and ScalarArray in pten (#37409) · 0f24de83

Committed by zyfncg on Nov 24, 2021

* add scalar and scalar_array

* remove DenseTensor include from Scalar and ScalarArray

* remove inner header from scalar_array

* refactor the method of fill_constant and add some comment

[Paddle-Inference] Matmul_int8_convert: tensor*tensor (#37285) · 16590799

Committed by Wangzheee on Nov 24, 2021

* matmul_convert_int8

* matmul_convert_int8

* matmulconvert_int8

* Matmul_int8_convert: tensor*tensor

* Matmul_int8_convert: tensor*tensor

* Matmul_int8_convert: tensor*tensor

Adapt auto search (#37490) · 025053b4
Committed by zhaoyingli on Nov 24, 2021
```
* adapt auto search

* adapt auto search

* fix matmulv2 compatible

* del debug
```
[NewExe] Support HandleComplexGradToRealGrad to cast complex into Real (#37450) · 8b87d5eb
Committed by Aurelius84 on Nov 24, 2021

[PTen] Standardized unittest namespace (#37456) · 1c969d20
Committed by Chen Weihang on Nov 23, 2021
```
* standarded unittest namespace

* fix detail error
```
[Dy2stat]support pure fp16 for dy2stat (#36944) · 52edad6a

Committed by 0x45f on Nov 24, 2021

* run dy2stat pure fp16 in Linear model

* no use self._pure_fp16_inputs

* add test and fix Adam error in dy2stat pure fp16 training

* use paddle.optimizer.Adam

* run test in gpu

* change test time for CI

* enlarge atol for test_resnet_pure_fp16

* refine code and enlarge atol

* make custom_white_list and custom_black_list take effect for AMP and pure fp16

* check tracer is not None

* use default atol

* change filter_size

* change atol and add some NOTE

fix lite with xpu or nnadapter (#37449) · 93aefceb
Committed by zhupengyang on Nov 24, 2021

fix:transform the data from cpu to gpu when trt is used (#37427) · 49366a63
Committed by feng_shuai on Nov 24, 2021

[fleet_executor] Complete compute interceptor (#37485) · be3b7740
Committed by WangXi on Nov 24, 2021

Refactor dygraph to eager -- TensorWrapper, EagerUtils, GlobalUtils (#37466) · 1799c032

Committed by Jiabin Yang on Nov 24, 2021

* Add EagerTensor and tests

* remove useless enforce

* remove comment in cmake

* support autograd meta

* support grad node info test

* support grad_node_info

* add more edge test

* remove Python.h

* add tensor wrapper with tests

* support compute require grad and stop gradient

* support sync methods and global utils

* support pure cpu test

* refine error msg

* refine error msg

* refine error info

* fix npu error

23 Nov, 2021 9 commits

- fix inplace bug when the first grad_var(loss_grad) is inplace var (#37420) · ee1e1642
  Committed by pangyoki on Nov 23, 2021
```
* fix inplace bug

* fix custom grad input error

* add unittest

* fix inplace bug
```
- [XPU] Reorganize xpu device codes in platform, test=develop (#37428) · 79800978
  Committed by Qi Li on Nov 23, 2021
```
* [XPU] Reorganize xpu device codes in platform, test=develop

* fix xpu_header.h, test=develop
```
- Add support bias is none for fused_attention op. (#37411) · 1a8786cf
  Committed by Li Min on Nov 23, 2021
```
Add support for bias is none for fused_attention op.
```
- set feed var skip inplace, test=develop (#37467) · 4812eda5
  Committed by wanghuancoder on Nov 23, 2021
  
- [fleet_executor] Update with collective (#37462) · df14dbf0
  Committed by Yuang Liu on Nov 23, 2021
  
- test=document_fix (#37477) · 38f1ef50
  Committed by tianshuo78520a on Nov 23, 2021
  
- use ShareBufferWith instead of ShareDataWith for ops with view mechanism (#37464) · 81349970
  Committed by Feiyu Chan on Nov 23, 2021
  
- fix problem of dcnv2 trt (#37345) · e91141fb
  Committed by wangxinxin08 on Nov 23, 2021
```
* modify code about fp16 of dcnv2 trt
```
- Removed debug code (#37447) · 586bafbd
  Committed by Zhanlue Yang on Nov 23, 2021
  

PaddlePaddle / Paddle synced successfully about 1 year ago