Commits · 6721376ba68177fa169d0ae8d306a09a6ac66da6 · 机器未来 / Paddle

Apr 25, 2022 · 22 commits

Optimize dygraph InferShape perf (#42155) · 6721376b

Committed by Chen Weihang on Apr 25, 2022

* init commit

* remove two hash impl

* fix bug

* polish details

* fix compile failed

* fix compile failed

* fix compile failed

* add default kernel sig cache

* fix get kernel arg defs error

* remove kernel arg defs cache

* fix origin op execute

6721376b

update test case output threshold (#41242) · 192a5af5

Committed by baoachun on Apr 25, 2022

* update test case output threshold

* update testcase

192a5af5

Fix compiling ort test cases error on Windows (#42186) · 3241cea2

Committed by heliqi on Apr 25, 2022

* fix windows compile test case error

* test windows ci

* cmake add onnxruntime

* cmake add onnxruntime

* test windows ci

* auto_code_generator add ort lib copy

* fallback modify windows ci bat

* ci notest;test=document_fix;test=windows_ci_inference;test=windows_ci;test=windows_op

3241cea2

Change small vector size (#42202) · 8df81f83

Committed by Chen Weihang on Apr 25, 2022

* change small vector size

* Update type_defs.h

8df81f83

Increase test_export_deploy_model tolerance for broadwell CPU (#42230) · 45572700

Committed by zlsh80826 on Apr 25, 2022

45572700

Fix dimension merge bug in broadcast (#42143) · 2562ad5a

Committed by limingshu on Apr 25, 2022

* change sequential logic

* change some quotes

* add some notations

* change wrong note style.

2562ad5a

merge all phi kernel lib to several big static lib, reduce link command (#42185) · e52e6d01

Committed by zhouweiwei2014 on Apr 25, 2022

* merge all phi lib to several big static lib

* merge all phi lib to several big static lib

e52e6d01

int8 clone issue fix (#42218) · 30f65c25

Committed by wenbin on Apr 25, 2022

30f65c25

fix en docs of some Apis (gradients, scope_guard, cuda_places, name_scope,... · 6dd9dd39

Committed by Yilingyelu on Apr 25, 2022

fix en docs of some Apis (gradients, scope_guard, cuda_places, name_scope, device_guard, load_program_state, scale, ParamAttr and WeightNormParamAttr) (#41604)

* Update scope_guard; test=document_fix

* gradients; test=document_fix

* gradients; test=document_fix

* name_scope; test=document_fix

* cpu_places; test=document_fix

* WeightNormParamAttr; test=document_fix

* cuda_places; test=document_fix

* load_program_state; test=document_fix

* device_guard; test=document_fix

* device_guard; test=document_fix

* ParamAttr; test=document_fix

* scale; test=document_fix

* scale; test=document_fix

* update code example; test=document_fix

Co-authored-by: Chen Long <1300851984@qq.com>

6dd9dd39

Downloading data for test_analyzer_vit_ocr (#42041) · 41852264

Committed by Sławomir Siwek on Apr 25, 2022

* Change server URL

* update config

* add test to parallel UT rule

* add checksum to ensure files are downloaded

* change downloading target

* reuse existing variable

* change target directory

41852264
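
The commit above mentions adding a checksum so that test data is known to be fully downloaded. The log does not show how this is done, so the following is only a generic sketch of the idea. It assumes an MD5 digest, and the helper names `file_md5` and `is_download_complete` are hypothetical, not taken from the Paddle build scripts.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_download_complete(path, expected_md5):
    """True only if the file exists and its digest matches the expected one."""
    try:
        return file_md5(path) == expected_md5.lower()
    except OSError:
        # A missing or unreadable file counts as "not downloaded".
        return False
```

A caller would typically re-download the file whenever `is_download_complete` returns `False`, which covers both missing and truncated files.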

remove redundant computation in Categorical.probs (#42114) · 9a0bfece

Committed by Feiyu Chan on Apr 25, 2022

9a0bfece

Do not reset default stream for StreamSafeCUDAAllocator (#42149) · 6553a9d7

Committed by Ruibiao Chen on Apr 25, 2022

6553a9d7

[PaddlePaddle Hackathon 2] No. 24: Add the nn.ChannelShuffle network-building API to Paddle (#40743) · bbaaf217

Committed by BrilliantYuKaimin on Apr 25, 2022

* Add infermeta for ChannelShuffle

* Create channel_shuffle_grad_kernel.h

* Create channel_shuffle_kernel.h

* Create channel_shuffle_sig.cc

* Create channel_shuffle_op.cc

Description of the ChannelShuffle operator

* Create channel_shuffle_kernel_impl.h

Implementation of the ChannelShuffle kernel

* Create channel_shuffle_grad_kernel_impl.h

Implementation of the ChannelShuffle backward kernel

* Add kernel register of channel shuffle and grad

Register the ChannelShuffle kernel and its backward kernel

* add nn.functional.channel_shuffle

* add nn.ChannelShuffle

* Create test_channel_shuffle.py

* Update example of ChannelShuffle in vision.py

* Update test_channel_shuffle.py

* Move the channel_shuffle kernel implementation

* Fix code formatting

* Remove extra whitespace

* Improve error checking in channel_shuffle

* Update unary.cc

* Update channel_shuffle_op.cc

* Update test_channel_shuffle.py

* Update unary.cc

* add channel_shuffle

* Update test_channel_shuffle.py

* Update vision.py

* Adjust code formatting

* Update channel_shuffle_sig.cc

* Update the ChannelShuffle documentation

* Update the channel_shuffle documentation

* remove ChannelShuffleOpArgumentMapping

* add ChannelShuffleGradInferMeta

* Update channel_shuffle_op.cc

* Relocate the kernels of channel_shuffle and its gradient

bbaaf217
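
For readers unfamiliar with the operation this commit adds, channel shuffle is usually described as: split the channels into groups, view them as a (groups, channels per group) grid, transpose the grid, and flatten it back. The sketch below shows only that index reordering on a plain Python list. It is not the Paddle kernel, and the function name and list-based signature are illustrative assumptions.

```python
def channel_shuffle(channels, groups):
    """Reorder a list of channels the way a channel-shuffle layer does.

    `channels` is any list whose items stand in for per-channel data.
    """
    num_channels = len(channels)
    if groups <= 0 or num_channels % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = num_channels // groups
    # View the list as a (groups, per_group) grid, transpose it to
    # (per_group, groups), and read it back row by row.
    return [
        channels[g * per_group + i]
        for i in range(per_group)
        for g in range(groups)
    ]


print(channel_shuffle([0, 1, 2, 3, 4, 5], groups=3))  # [0, 2, 4, 1, 3, 5]
```

With six channels and three groups, each group holds two channels, so the result interleaves the first element of every group and then the second element of every group.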

replace any by variant in infermeta (#42181) · c2a05a90

Committed by Chen Weihang on Apr 25, 2022

c2a05a90

add LICENSE in wheel dist-info package (#42187) · a3a6f0cf

Committed by pangyoki on Apr 25, 2022

a3a6f0cf

fix recompute (#42128) · f21824d9

Committed by Roc on Apr 25, 2022

* fix recompute

* modify return

f21824d9

[Eager] Support div(scalar) in eager mode (#42148) · f4ce8a92

Committed by Weilong Wu on Apr 25, 2022

* [Eager] Support div scalar in eager mode

* Updated and remove debug logs

* Remove list, use 'or' directly

* Remove useless statement

f4ce8a92

[Eager] Remove redundancy code, fix fp16 case (#42169) · 3b8f8b6c

Committed by Weilong Wu on Apr 25, 2022

3b8f8b6c

[Eager] Support numpy.ndarry in CastNumpy2Scalar (#42136) · 4a16d5c6

Committed by Weilong Wu on Apr 25, 2022

4a16d5c6

fix variant compile error (#42203) · 1178f153

Committed by Chen Weihang on Apr 25, 2022

1178f153

fix incorrect usages of std::move and other compile errors (#41045) · 05739d9e

Committed by tiancaishaonvjituizi on Apr 25, 2022

* fix bug of std::move and others

* fix a compile error in debug mode

* fix wrong copy assignment operator

Signed-off-by: tiancaishaonvjituizi <452565578@qq.com>

* reformat

Signed-off-by: tiancaishaonvjituizi <452565578@qq.com>

* reformat

Signed-off-by: tiancaishaonvjituizi <452565578@qq.com>

* fix ArrayRef constructor following llvm

* fix format

* fix conflict with master

05739d9e

Optimize dygraph GetExpectedKernelType perf (#42154) · 3a0d7bf0

Committed by Chen Weihang on Apr 25, 2022

* opt dygraph scheduling

* revert part impl

3a0d7bf0

Apr 24, 2022 · 8 commits

fix python3.10 compile bug on windows (#42140) · 13190707

Committed by zhouweiwei2014 on Apr 24, 2022

13190707

fix FlattenContiguousRangeOpConverter out dim error (#42087) · 2bcec75a

Committed by baoachun on Apr 24, 2022

* fix FlattenContiguousRangeOpConverter out dim error

* update code

2bcec75a

[CustomDevice] add eager mode support (#42034) · ccafd2e5

Committed by ronnywang on Apr 24, 2022

ccafd2e5

combine graph_table and feature_table in graph_engine (#42134) · 0e0f7da6

Committed by seemingwang on Apr 24, 2022

* extract sub-graph

* graph-engine merging

* fix

* fix

* fix heter-ps config

* test performance

* test performance

* test performance

* test

* test

* update bfs

* change cmake

* test

* test gpu speed

* gpu_graph_engine optimization

* add dsm sample method

* add graph_neighbor_sample_v2

* Add graph_neighbor_sample_v2

* fix for loop

* add cpu sample interface

* fix kernel judgement

* add ssd layer to graph_engine

* fix allocation

* fix syntax error

* fix syntax error

* fix pscore class

* fix

* change index settings

* recover test

* recover test

* fix spelling

* recover

* fix

* move cudamemcpy after cuda stream sync

* fix linking problem

* remove comment

* add cpu test

* test

* add cpu test

* change comment

* combine feature table and graph table

* test

* test

* pybind

* test

* test

* test

* test

* pybind

* pybind

* fix cmake

* pybind

* fix

* fix

* add pybind

* add pybind

Co-authored-by: DesmonDay <908660116@qq.com>

0e0f7da6

disable unittest failed in eager CI in temporary (#42101) · d6b66924

Committed by pangyoki on Apr 24, 2022

* test=py3-eager

* test=py3-eager

* test=py3-eager

d6b66924

Add paddle::variant and replace paddle::any (#42139) · 79f717d6

Committed by Chen Weihang on Apr 24, 2022

* add variant and replace any

* split attribute

79f717d6

Update Mac cmake version >=3.15 (#41456) · b1c6378d

Committed by tianshuo78520a on Apr 24, 2022

* Update Mac cmake version >=3.15

* notest;read test1

notest;read test2

notest;read test3

* fix inference link error

* fix inference link error

* fix windows link error

* fix cmake_policy

* fix build big size

b1c6378d

refine optest logic for bfloat16 (#42151) · 532c3b4c

Committed by zhangbo9674 on Apr 24, 2022

532c3b4c

Apr 23, 2022 · 5 commits

optimize performance of dygraph (#42137) · c56fffb4

Committed by zyfncg on Apr 23, 2022

c56fffb4

[Performance]Set ShapeKernel with ALL_BACKEND and ALL_LAYOUT (#42138) · 79ac8870

Committed by Aurelius84 on Apr 23, 2022

* [Performance]Set ShapeKernel with ALL_BACKEND and ALL_LAYOUT

* [Performance]Set ShapeKernel with ALL_BACKEND and ALL_LAYOUT

79ac8870

update reduce_max for kunlun, *test=kunlun (#42116) · 1587ad07

Committed by TTerror on Apr 23, 2022

1587ad07

[Performance]Remove CudaStreamSychornize in ClipGradByGlobalNorm (#42132) · 6700294c

Committed by Aurelius84 on Apr 23, 2022

6700294c

reuse ConvNormActivation in some vision models (#40431) · f6219dda

Committed by Nyakku Shigure on Apr 23, 2022

* reuse ConvNormActivation in some vision models

f6219dda

Apr 22, 2022 · 5 commits

Support triple grad check of op in Eager mode (#42131) · 34ac7b74

Committed by YuanRisheng on Apr 22, 2022

* support 3-rd order gradient

* change code format

34ac7b74

Add gpudnn yaml config for some OPs (#41773) · 4940a525

Committed by Ruibiao Chen on Apr 22, 2022

* Add gpudnn yaml config for some OPs

* Add grad gpudnn config

* Fix CI errors

* Fix CI errors

* Fix CI errors

* Fix conflicts

4940a525

Ssd sparse table (#41812) · cca57c4a

Committed by zhaocaibei123 on Apr 22, 2022

* [cherry-pick2.3]fix compile bug of windows cuda11.5 (#41464)

cherry-pick

fix compile bug of windows cuda11.5 #41433

* fix bug of missing boost when compile cache.cc (#41449)

[cherry-pick #41430] fix bug of random compile failure, due to incorrect compile order of dependencies

* Fix eager try catch (#41438) (#41477)

[Cherry-Pick]Fix eager try catch (#41438)

* Cherry-pick-PR41407, fix device_id bug for final_state op in multiprocess testcase (#41407) (#41475)

Cherry-pick PR #41407

* [BugFix] Add error hint for one_hot gpu version (#41335) (#41495)

* add one_hot gpu hint

* move allow_out_of_range judgement

* delete useless unittest

* fix bugs of reshape double grad infermeta (#41459) (#41493)

* [cherrypick-2.3] modify infer gpu memory strategy (#41427), remove cudnn_deterministic=True (#41341)  (#41491)

Co-authored-by: JingZhuangzhuang <75348594+JZZ-NOTE@users.noreply.github.com>

* [Cherry-pick][ROCm] fix dcu error in device event base, test=develop (#41523)

Cherry-pick of #41521

* [Cherry-Pick]Cherry pick PR41200, PR41474, PR41382 (#41509)

* Use `self` as a parameter of _hash_with_id function to avoid error caused by hash_id reuse (#41200)

* Add fill_constant_batch_size YAML and UT (#41474)

* Switch some dy2st UT to eager mode (#41382)

* Switch some dy2st UT to eager mode

* Fix test_lstm and remove test_transformer

* Run test_resnet_v2 in old dy mode

* Unittest recover (#41431)

* update name

* update name

* fix test

* fix fleet bind

* update name

* update name

* fix test

* fix gpups wrapper

* remove Push/Pull/Load/Save with context in client and wrapper base class

* fix

* fix

* remove some interface

* fix

* remove

* code style

* recover

* fix

* remove code unused

* remove some unused table & accessor & CommonDenseTable => MemoryDenseTable

* fix

* fix

* fix

* recover

* remove unused code

* recover unittest

* fix

* remove

* fix

* remove code unuseful

* remove

* fix

* recover

* remove

Co-authored-by: esythan <esythan@126.com>

* add ssd sparse table

* fix

* add cache shuffle

* fix

* fix

* fix

* fix

* fix

* fix

* add unit test

* fix
Co-authored-by: Zhou Wei <1183042833@qq.com>
Co-authored-by: Sing_chan <51314274+betterpig@users.noreply.github.com>
Co-authored-by: 0x45f <23097963+0x45f@users.noreply.github.com>
Co-authored-by: pangyoki <pangyoki@126.com>
Co-authored-by: Siming Dai <908660116@qq.com>
Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
Co-authored-by: Zhang Jun <ewalker@live.cn>
Co-authored-by: JingZhuangzhuang <75348594+JZZ-NOTE@users.noreply.github.com>
Co-authored-by: Qi Li <qili93@qq.com>
Co-authored-by: esythan <esythan@126.com>

cca57c4a

Reduce performance influence by record event in python (#42040) · 4fd190d5

Committed by chenjian on Apr 22, 2022

* optimize performance

* fix

* improve coverage

* fix

* fix

4fd190d5

[WIP] Algorithm Cache of cuBlasLt Epilogue (#41010) · 19650d72

Committed by Ming-Xu Huang on Apr 22, 2022

* Fix leading dimension setting error in fused_gemm_epilogue_grad_op.

* Add dyload to cuBlasLt functions.

* Added cublasLtMatmulAlgoGetHeuristic to improve performance.

* Added FLAGS_cublaslt_exhaustive_search_times to cublasLt epilogue

* Added UTs to FLAGS_cublaslt_exhaustive_search_times

* Added warmup runs in algo searching of Gemm epilogue.

* Update copyright and documents.

* Fixed error handling.

19650d72
