Commits · a0d465f80fa29e347331a7600bdff05ed18d1f2f · BaiXuePrincess / Paddle

25 Nov, 2021 · 22 commits

【PTen】Add fill_constant kernel using ScalarArray in pten (#37481) · a0d465f8

Committed by zyfncg on Nov 25, 2021

* add scalar and scalar_array

* remove DenseTensor include from Scalar and ScalarArray

* remove inner header from scalar_array

* refactor the method of fill_constant and add some comment

* add fill_constant kernel using ScalarArray

* modify some prompt

* remove fill_constant kernel with no shape

[NPU] add int64 support for argsort op (#37434) · 3e088aaf
Committed by furnace on Nov 25, 2021
```
* [NPU] add int64 support for argsort op

* [NPU] delete debug codes
```
[NPU] add NPU kernel for prior_box op (#37519) · 1127fecb
Committed by furnace on Nov 25, 2021
```
* [NPU] add NPU kernel for prior_box op

* [NPU] delete debug codes
```
Disable the check of missing op benchmark script temporarily. (#37535) · 65056742
Committed by Yiqun Liu on Nov 25, 2021

Pass the stream created by Paddle to CINN. (#37337) · c249556d
Committed by Zhen Wang on Nov 25, 2021

fix pass_desc.proto compilation error, test=develop (#37536) · a4ef88ed
Committed by wuhuanzhou on Nov 25, 2021

Add InternalStorage and add ShardingOptimizerStage2 (#37489) · 5af64631
Committed by Baibaifan on Nov 25, 2021

[cherry-pick 2.2 heterps]bug fix for launch_utils.py (#37521) · 8bb1038c

Committed by zmx on Nov 25, 2021

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* [heterps]bug fix for _run_from_dataset

* fix heter_server.cc

* fix launch_utils.py

* fix heter_section_worker.cc

* fix. test=develop

* fix. test=develop

Support multi-stream allocation for CUDA place (#37290) · b9c464c3

Committed by From00 on Nov 25, 2021

* Support multi-stream allocation for CUDA place

* Do not notify the retrying from other streams when free CUDA allocation

* Fix compile error for CPU

* Fix compile error for HIP

* Release memory for StreamSafeCUDAAllocaRetry in malloc_test

* Add FLAGS_use_stream_safe_cuda_allocator

* Fix CI error for 'set_tests_properties'

* Invalidate stream safe CUDA allocator for naive_best_fit and thread_local strategy

* Performance improvement: insert allocation pair to outstanding_events_map when free but not alloc; replace recursive_mutex with SpinLock

* FLAGS priority changes: FLAGS_use_system_allocator > FLAGS_use_stream_safe_cuda_allocator

* Performance improvement: directly delete allocation when the recorded_streams is empty in FreeImpl of StreamSafeCUDAAllocator

* Add UT for alloc interface

* Changes multi-stream interface; move retry code from AllocatorFacadePrivate to StreamSafeCUDAAllocator

block unknown option /arch:SSE3 (#37439) · adb54eb0
Committed by Sing_chan on Nov 25, 2021
```
* block unknown option /arch:SSE3

* modify according to zhouwei's comment
```
add new API paddle.nn.initializer.Dirac (#37389) · bbb9b28a
Committed by zhouweiwei2014 on Nov 25, 2021
```
* add new API paddle.nn.initializer.Dirac

* fix doc
```
[new-exec] fix program cache key (#37500) · e64829e2
Committed by Leo Chen on Nov 25, 2021
```
* fix program cache key

* bug fix

* fix cache problem

* remove unused code
```
[fleet_executor] Compute Interceptor stop along data flow (#37531) · 50f75fb5
Committed by WangXi on Nov 25, 2021

Fix static-ci (#37504) · 992d4ebb
Committed by tianshuo78520a on Nov 25, 2021
```
* Fix static-ci
```
Added GradTensorHolder to Eager Dygraph (#37458) · bc9f9f43

Committed by Zhanlue Yang on Nov 25, 2021

* Added GradTensorHolder to Eager Dygraph

* Added accumulation codes to Eager Dygraph

* Fix windows-ci issue

* Fix NPU-CI issue

* Fixed CI-Coverage issue

Export task node to python (#37509) · 3f815e76
Committed by LiYuRio on Nov 25, 2021

Hot fix for dataloader thread error because of pten (#37520) · ed7a21de
Committed by Chen Weihang on Nov 24, 2021
```
* hot fix for dataloader thread error

* polish comment

* fix type in comment, test=document_fix
```
Fix test rnn memory helper op (#37474) · e4791d88
Committed by xiongkun on Nov 25, 2021
```
* clear LoDTensorArray

* fix  bugs

* fix

* fix gpu
```
[PaddlePaddle Hackathon] 6. Add ZeroPad2d to Paddle (#37151) · 81861f69

Committed by Matsumoto GAO on Nov 25, 2021

* add zeropad2d v0.1

* add zeropad2d v0.2

* add zeropad2d v0.3

* add zeropad2d v0.3

* add zeropad2d v0.3

* add zeropad2d v0.4

* add zeropad2d v0.5

* add zeropad2d v0.5 codestyle

* add zeropad2d v0.5 codestyle

* add zeropad2d v0.6 functional

* add zeropad2d v0.6 functional

* add zeropad2d v0.6 functional

fix_matmul_op_int8_plugin (#37525) · 0fd70d71
Committed by Wangzheee on Nov 25, 2021

infershape func to infermeta (#37524) · 2a905f6b
Committed by Chen Weihang on Nov 24, 2021

[new-exec] skip compiled program (#37512) · 171da2ce
Committed by Leo Chen on Nov 25, 2021
```
* skip compiled program

* fix ut
```

24 Nov, 2021 · 18 commits
- Changed second batch of deprecated mkldnn header and function names to new oneDNN names (#37351) · 7db7a0ec
  Committed by piotrekobiIntel on Nov 24, 2021
```
* Add second batch of deprecated mkldnn namespace and macro changes

* Unlock CI

* Fix temporary namespace alias placing
```
- [fleet_executor] fix message bus bug (#37507) · 10d8d6b6
  Committed by Yuang Liu on Nov 24, 2021
  
- Added EagerUtils to Eager Dygraph (#37479) · 7de99d8c
  Committed by Zhanlue Yang on Nov 24, 2021
```
* Added EagerUtils to Eager Dygraph

* Purified include dependencies for global_utils

* Fixed merge conflicts
```
- bring forward check added_ut (#37511) · 486b77f2
  Committed by Sing_chan on Nov 24, 2021
  
- [GpuPs]pybind core (#37287) · d69daed1
  Committed by Thunderbrook on Nov 24, 2021
```
* pybind core

* set use psgpu
```
- Fix lod in fetch_v2 (#37514) · acbf9974
  Committed by Aurelius84 on Nov 24, 2021
  
- fix range op (#37486) · d5c51e62
  Committed by Jiawei Wang on Nov 24, 2021
  
- [new-exec] support skipping infershape (#37510) · e76b601b
  Committed by Leo Chen on Nov 24, 2021
  
- elementwise_mul refactor (#37471) · c5e857d4
  Committed by YuanRisheng on Nov 24, 2021
```
* elementwise_mul refactor

* perfect code in test

* delete redundant code

* fix bugs when run test_multiply

* adjust the location of macro

* fix bugs when run ci
```
- 【PTen】Add Scalar and ScalarArray in pten (#37409) · 0f24de83
  Committed by zyfncg on Nov 24, 2021
```
* add scalar and scalar_array

* remove DenseTensor include from Scalar and ScalarArray

* remove inner header from scalar_array

* refactor the method of fill_constant and add some comment
```
- [Paddle-Inference] Matmul_int8_convert: tensor*tensor (#37285) · 16590799
  Committed by Wangzheee on Nov 24, 2021
```
* matmul_convert_int8

* matmul_convert_int8

* matmulconvert_int8

* Matmul_int8_convert: tensor*tensor

* Matmul_int8_convert: tensor*tensor

* Matmul_int8_convert: tensor*tensor
```
- Adapt auto search (#37490) · 025053b4
  Committed by zhaoyingli on Nov 24, 2021
```
* adapt auto search

* adapt auto search

* fix matmulv2 compatible

* del debug
```
- Fix op-benchmark CI (#37487) · 5ff1ff5a
  Committed by tianshuo78520a on Nov 24, 2021
```
Fix op-benchmark CI
```
- [Auto Parallel] Add the unified cluster representation (#37091) · db727551
  Committed by Yulong Ao on Nov 24, 2021
```
* [Auto Parallel]  Add the unified cluster representation

* Add the local id for devices

* Add some comments
```
- [NewExe] Support HandleComplexGradToRealGrad to cast complex into Real (#37450) · 8b87d5eb
  Committed by Aurelius84 on Nov 24, 2021
  
- [PTen] Standardized unittest namespace (#37456) · 1c969d20
  Committed by Chen Weihang on Nov 23, 2021
```
* standarded unittest namespace

* fix detail error
```
- [Dy2stat]support pure fp16 for dy2stat (#36944) · 52edad6a
  Committed by 0x45f on Nov 24, 2021
```
* run dy2stat pure fp16 in Linear model

* no use self._pure_fp16_inputs

* add test and fix Adam error in dy2stat pure fp16 training

* use paddle.optimizer.Adam

* run test in gpu

* change test time for CI

* enlarge atol for test_resnet_pure_fp16

* refine code and enlarge atol

* make custom_white_list and custom_black_list take effect for AMP and pure fp16

* check tracer is not None

* use default atol

* change filter_size

* change atol and add some NOTE
```
- fix lite with xpu or nnadapter (#37449) · 93aefceb
  Committed by zhupengyang on Nov 24, 2021
  