Commits · 754820fe9e78c922d09bb44f9dc2e5579c68fa20 · PaddlePaddle / Paddle

11 May, 2022 7 commits
- support custom operator run in double grad mode (#42653) · 00ecb98f
  Committed by Jiabin Yang on May 11, 2022
  
  00ecb98f
- [IPU] update to popart v2.5.0 (#42552) · 27acc6c3
  Committed by Allen Guo on May 11, 2022
```
* update to popart v2.5.0

* use a specific version of sdk2.5.0
```
  27acc6c3
- stride_slice don't support trt6 (#42639) · c4bed7e4
  Committed by feng_shuai on May 11, 2022
  
  c4bed7e4
- [Dygraph] Support diff batch for sparse of EagerReducer (#42646) · c5232b4b
  Committed by Haohongxiang on May 11, 2022
```
* support diff batch for sparse of eagerreducer

* fix
```
  c5232b4b
- remove old XDNN implementation test=kunlun (#42404) · 7b828f71
  Committed by taixiurong on May 11, 2022
  
  7b828f71
- swish refactor (#42610) · a1abb7c9
  Committed by wenbin on May 11, 2022
```
* swish refactor

* bug fix

* trt7 non-linear bug fix
```
  a1abb7c9
- update CompilationProgressLogger (#42665) · 29a6b8c9
  Committed by Allen Guo on May 11, 2022
  
  29a6b8c9
10 May, 2022 19 commits

[EinsumOp] Polish forward logic and backward logic for optimize (#42603) · cf198dc9
Committed by xiongkun on May 10, 2022
```
* change logic for optimize

* modifty
```
cf198dc9
[CustomDevice] add inference support (#42036) · 02e5c4be
Committed by ronnywang on May 10, 2022

02e5c4be

Readd conv_affine_channel fuse pass as oneDNN only pass (#41998) · 3540d33b

Committed by piotrekobi on May 10, 2022

* Readd conv_affine_channel fuse pass as mkldnn pass

* Fix formatting

* Add new test to parallel_UT_rule.py

* Fix Coverage and Windows CI issues

* Revert "Fix Coverage and Windows CI issues"

This reverts commit f33459846385c9fd51c07f9f44e7ff283a652637.

* Fix CI errors

* Remove unnecessary conv_eltwise_add_affine_channel fuse pass

* Remove test from parallel_UT_rule.py

3540d33b

add fp16 for reshape op on kunlun2, *test=kunlun (#42605) · 754edf6e
Committed by TTerror on May 10, 2022

754edf6e

fix switch client multithread bug (#42600) · e2540c17

Committed by ziyoujiyi on May 10, 2022

* back fl

* delete ssl cert

* .

* make warning

* .

* unittest paral degree

* solve unittest

* heter & multi cloud commm ready

* .

* .

* arm_brpc compile

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* only output is ok

* base is ok

* .

* .

* .

* .

* .

* .

* .

* .

* add switch server bin

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* adapt brpc ssl

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* fix heter_server & heter_client

* .

* .

* int->int64_t

* .

* safe map in multithread

* fix heter unitest

* .

* fix code_style

* .

* fix bug

* .

e2540c17

shape mkldnn kernel adapted to NHWC (#42548) · d47690b2

Committed by Jacek Czaja on May 10, 2022

* - shape mkldnn adapted to NHWC

- NHWC shape mkldnn ut

- fixes to UT

- Fix to UT

- Fixes to UT

- Fix of compilation

* - lint candidate fix

d47690b2

[Video detection] Added fill_constant FP32 FWD oneDNN kernel (#37216) · 66a10f36

Committed by jakpiase on May 10, 2022

* added fill_constant kernel

* CI fix

* ci fix

* switched from nan to zero memory

* CI FIX

* ci fixes

* CI rerun

* ci fix

* minor change

* CI rerun

66a10f36

[PaddlePaddle Hackathon 2] No. 18: Add paddle.heaviside and paddle.Tensor.heaviside APIs to Paddle (#41872) · 4892d592

Committed by BrilliantYuKaimin on May 10, 2022

* Create elementwise_heaviside_op.cc

* add ElementwiseHeavisideFunctor

* Create test_elementwise_heaviside_op.py

* Add the Python interface for heaviside

* add heaviside in white list

* Add the signature for heaviside

* Add the kernel for heaviside

* Add the gradient kernel for heaviside

* Register the heaviside gradient

* Adjust code formatting

* Update elementwise_sig.cc

* add heaviside in __all__

* Update heaviside docs

* Update math.py

* Update math.py

* Update math.py

4892d592

[Eager] print gpu mem info (#42616) · 81644145
Committed by wanghuancoder on May 10, 2022
```
* print mem

* refine

* refine

* refine

* refine
```
81644145
add int8 for cast (#42634) · 8a100774
Committed by lilong12 on May 10, 2022

8a100774
[MLU]add adam, adamw op of mlu device (#42557) · cc077693
Committed by qipengh on May 10, 2022

cc077693
[MLU] add layernorm mlu kernel (#42356) · ecd6db43
Committed by fwenguang on May 10, 2022

ecd6db43
[MLU]add assign op of mlu device (#42591) · 4e5fb733
Committed by qipengh on May 10, 2022

4e5fb733
pdnode_compare (#42597) · 30234dd7
Committed by JingZhuangzhuang on May 10, 2022
```
* pdnode_compare

* panode compare

* pdnode_compare
```
30234dd7
merge develop. test=develop (#42624) · 0ce42fb0
Committed by zmxdream on May 10, 2022

0ce42fb0
fix bug for heter (#42590) · 21b35167
Committed by lilong12 on May 10, 2022

21b35167
fix sample error (#42595) · df96d1ed
Committed by Siming Dai on May 10, 2022

df96d1ed
fix random cache (#723) (#42621) · be87caf2
Committed by Allen Guo on May 10, 2022
```
Co-authored-by: yaozhixin <522190855@qq.com>
```
be87caf2
broadcast_add kp performance optimization (#42097) · c7855125
Committed by shixingbo on May 10, 2022

c7855125

09 May, 2022 8 commits
- [Eager]Fix tensor.name is empty behavior (#42587) · 81078a88
  Committed by Aurelius84 on May 09, 2022
```
* [Eager]Fix tensor.name is empty behavior

* fix unittest
```
  81078a88
- refine pylayer (#42572) · c22c2c58
  Committed by wanghuancoder on May 09, 2022
```
* refine pylayer

* refine
```
  c22c2c58
- [Need approval] Add AdamW-CPU FP32 JIT assembly kernel (#42522) · 766c50ac
  Committed by joanna.wozna.intel on May 09, 2022
```
* Add AdamW jit kernel

* Second implementation

* Add missing header

* Correct number of jit kernels in the test
```
  766c50ac
- [Ready to merge] oneDNN NHWC matmul & elementwise kernels fixes (#42506) · bf481550
  Committed by Jacek Czaja on May 09, 2022
```
* - fix to crash

- more fixes

- added diagnostic

- matmul output fixes.

- compilation fix

- stop rotating too small shapes

* - Added enabling of matmul_V2 onednn test
```
  bf481550
- Modified reduce for xpu2 (#42439) · ae4d1ec1
  Committed by niuliling123 on May 09, 2022
  
  ae4d1ec1
- [Eager] Polish grad code details (#42536) · 778ea4ec
  Committed by Chen Weihang on May 09, 2022
```
* polish grad details

* polish detail by comment
```
  778ea4ec
- fix split stride_numel may be 0 (#42537) · d878f971
  Committed by chentianyu03 on May 09, 2022
  
  d878f971
- [Eager] Support Gradient Accumulation for sr (#42371) · 1cddcd70
  Committed by Jiabin Yang on May 09, 2022
```
* Support Gradient Accumulation for sr

* add ut

* change ut to fit small vector
```
  1cddcd70
07 May, 2022 5 commits
- fix bug of optional_tensor in amp logic (#42561) · 4e66010b
  Committed by zhangbo9674 on May 07, 2022
  
  4e66010b
- [Phi] Change sync copy to async for gpu_pinned to gpu place in data transform (#41966) · 6583a8d2
  Committed by zyfncg on May 07, 2022
```
* the copy type of data transform for gpu_pinned to gpu change from syna to async

* refactor code
```
  6583a8d2
- sync misc changes (#42534) · 37580838
  Committed by Allen Guo on May 07, 2022
  
  37580838
- support set cuda_arch_name in pipeline (#42498) · bb5a14dd
  Committed by Sing_chan on May 07, 2022
```
* set auto to reduce core_avx/noavx.pyd size

* set CUDA_ARCH_NAME in each case
```
  bb5a14dd
- Reduce the number of threads per block of deformable_psroi_pooling to solve... · 8c1b2fa6
  Committed by FlyingQianMM on May 07, 2022
```
Reduce the number of threads per block of deformable_psroi_pooling to solve the bug where too many resources requested for launch (#42531)
```
  8c1b2fa6
06 May, 2022 1 commit

bind elementwise_mod_op_xpu (#42175) · 6ea2f049

Committed by enzodechine on May 06, 2022

* bind elementwise_mod_op_xpu *test=kunlun

* add more supported dtypes and UTs *test=kunlun

* fix datatype error

* add op to in xpu1_op_list

* Update Mac cmake version >=3.15 (#41456)

* Update Mac cmake version >=3.15

* notest;read test1

notest;read test2

notest;read test3

* fix inference link error

* fix inference link error

* fix windows link error

* fix cmake_policy

* fix build big size

* Add paddle::variant and replace paddle::any (#42139)

* add variant and replace any

* split attribute

* disable unittest failed in eager CI in temporary (#42101)

* test=py3-eager

* test=py3-eager

* test=py3-eager

* combine graph_table and feature_table in graph_engine (#42134)

* extract sub-graph

* graph-engine merging

* fix

* fix

* fix heter-ps config

* test performance

* test performance

* test performance

* test

* test

* update bfs

* change cmake

* test

* test gpu speed

* gpu_graph_engine optimization

* add dsm sample method

* add graph_neighbor_sample_v2

* Add graph_neighbor_sample_v2

* fix for loop

* add cpu sample interface

* fix kernel judgement

* add ssd layer to graph_engine

* fix allocation

* fix syntax error

* fix syntax error

* fix pscore class

* fix

* change index settings

* recover test

* recover test

* fix spelling

* recover

* fix

* move cudamemcpy after cuda stream sync

* fix linking problem

* remove comment

* add cpu test

* test

* add cpu test

* change comment

* combine feature table and graph table

* test

* test

* pybind

* test

* test

* test

* test

* pybind

* pybind

* fix cmake

* pybind

* fix

* fix

* add pybind

* add pybind
Co-authored-by: DesmonDay <908660116@qq.com>

* [CustomDevice] add eager mode support (#42034)

* fix FlattenContiguousRangeOpConverter out dim error (#42087)

* fix FlattenContiguousRangeOpConverter out dim error

* update code

* fix python3.10 compile bug on windows (#42140)

* Optimize dygraph GetExpectedKernelType perf (#42154)

* opt dygraph scheduling

* revert part impl

* fix incorrect usages of std::move and other compile errors (#41045)

* fix bug of std::move and others

* fix an compile error in debug mode

* fix wrong copy assignment operator
Signed-off-by: tiancaishaonvjituizi <452565578@qq.com>

* reformat
Signed-off-by: tiancaishaonvjituizi <452565578@qq.com>

* reformat
Signed-off-by: tiancaishaonvjituizi <452565578@qq.com>

* fix ArrayRef constructor following llvm

* fix format

* fix conflict with master

* fix variant compile error (#42203)

* [Eager] Support numpy.ndarry in CastNumpy2Scalar (#42136)

* [Eager] Remove redundancy code, fix fp16 case (#42169)

* [Eager] Support div(scalar) in eager mode (#42148)

* [Eager] Support div scalar in eager mode

* Updated and remove debug logs

* Remove list, use 'or' directly

* Remove useless statement

* fix recompute (#42128)

* fix recompute

* modify return

* add LICENSE in wheel dist-info package (#42187)

* replace any by variant in infermeta (#42181)

* [PaddlePaddle Hackathon 2] No. 24: Add the nn.ChannelShuffle network-building API to Paddle (#40743)

* Add infermeta for ChannelShuffle

* Create channel_shuffle_grad_kernel.h

* Create channel_shuffle_kernel.h

* Create channel_shuffle_sig.cc

* Create channel_shuffle_op.cc

Description of the ChannelShuffle operator

* Create channel_shuffle_kernel_impl.h

Implementation of the ChannelShuffle kernel

* Create channel_shuffle_grad_kernel_impl.h

Implementation of the ChannelShuffle backward kernel

* Add kernel register of channel shuffle and grad

Register the ChannelShuffle kernel and its backward kernel

* add nn.functional.channel_shuffle

* add nn.ChannelShuffle

* Create test_channel_shuffle.py

* Update example of ChannelShuffle in vision.py

* Update test_channel_shuffle.py

* Move the location of the channel_shuffle kernel implementation

* Fix code formatting

* Remove extra spaces

* Improve error checking for channel_shuffle

* Update unary.cc

* Update channel_shuffle_op.cc

* Update test_channel_shuffle.py

* Update unary.cc

* add channel_shuffle

* Update test_channel_shuffle.py

* Update vision.py

* Adjust code formatting

* Update channel_shuffle_sig.cc

* Update the ChannelShuffle documentation

* Update the channel_shuffle documentation

* remove ChannelShuffleOpArgumentMapping

* add ChannelShuffleGradInferMeta

* Update channel_shuffle_op.cc

* Adjust the location of the channel_shuffle kernel and its gradient kernel

* Do not reset default stream for StreamSafeCUDAAllocator (#42149)

* remove redundant computation in Categorical.probs (#42114)

* Downloading data for test_analyzer_vit_ocr (#42041)

* Change server URL

* update config

* add test to parallel UT rule

* add checksum to ensure files are downloaded

* change downloading target

* reuse existing variable

* change target directory

* fix en docs of some Apis (gradients, scope_guard, cuda_places, name_scope, device_guard, load_program_state, scale, ParamAttr and WeightNormParamAttr) (#41604)

* Update scope_guard; test=document_fix

* gradients; test=document_fix

* gradients; test=document_fix

* name_scope; test=document_fix

* cpu_places; test=document_fix

* WeightNormParamAttr; test=document_fix

* cuda_places; test=document_fix

* load_program_state; test=document_fix

* device_guard; test=document_fix

* device_guard; test=document_fix

* ParamAttr; test=document_fix

* scale; test=document_fix

* scale; test=document_fix

* update code example; test=document_fix
Co-authored-by: Chen Long <1300851984@qq.com>

* fix datatype error

add op to in xpu1_op_list

*test=kunlun

* fix elementwise_mod op path error  *test=kunlun

* fix elementwise_mod UT error  *test=kunlun

* fix datatype error

add op to in xpu1_op_list

*test=kunlun

add op to in xpu1_op_list

fix elementwise_mod op path error  *test=kunlun

fix elementwise_mod UT error  *test=kunlun
Co-authored-by: tianshuo78520a <707759223@qq.com>
Co-authored-by: Chen Weihang <chenweihang@baidu.com>
Co-authored-by: pangyoki <pangyoki@126.com>
Co-authored-by: seemingwang <seemingwang@users.noreply.github.com>
Co-authored-by: DesmonDay <908660116@qq.com>
Co-authored-by: ronnywang <524019753@qq.com>
Co-authored-by: baoachun <962571062@qq.com>
Co-authored-by: Zhou Wei <1183042833@qq.com>
Co-authored-by: tiancaishaonvjituizi <452565578@qq.com>
Co-authored-by: Weilong Wu <veyron_wu@163.com>
Co-authored-by: Roc <30228238+sljlp@users.noreply.github.com>
Co-authored-by: BrilliantYuKaimin <91609464+BrilliantYuKaimin@users.noreply.github.com>
Co-authored-by: Ruibiao Chen <chenruibiao@baidu.com>
Co-authored-by: Feiyu Chan <chenfeiyu@baidu.com>
Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com>
Co-authored-by: Yilingyelu <103369238+Yilingyelu@users.noreply.github.com>
Co-authored-by: Chen Long <1300851984@qq.com>

6ea2f049

PaddlePaddle / Paddle: synced successfully about 1 year ago