Commits · 4892d5926cc9e4783c3278a5b5a5c689efcee736 · 机器未来 / Paddle

10 May, 2022 · 14 commits
- B
  [PaddlePaddle Hackathon 2] 18. Add paddle.heaviside and paddle.Tensor.heaviside API to Paddle (#41872) · 4892d592
  Committed by BrilliantYuKaimin on May 10, 2022
```
* Create elementwise_heaviside_op.cc

* add ElementwiseHeavisideFunctor

* Create test_elementwise_heaviside_op.py

* Add the Python interface for heaviside

* add heaviside in white list

* Add the signature for heaviside

* Add the kernel for heaviside

* Add the gradient kernel for heaviside

* Register the heaviside gradient

* Adjust code formatting

* Update elementwise_sig.cc

* add heaviside in __all__

* Update heaviside docs

* Update math.py

* Update math.py

* Update math.py
```
  4892d592
- W
  [Eager] print gpu mem info (#42616) · 81644145
  Committed by wanghuancoder on May 10, 2022
```
* print mem

* refine

* refine

* refine

* refine
```
  81644145
- L
  
  add int8 for cast (#42634) · 8a100774
  Committed by lilong12 on May 10, 2022
  
  8a100774
- C
  
  update base of cost model (#42601) · 6ac08db5
  Committed by caozhou on May 10, 2022
  
  6ac08db5
- Q
  
  [MLU]add adam, adamw op of mlu device (#42557) · cc077693
  Committed by qipengh on May 10, 2022
  
  cc077693
- F
  
  [MLU] add layernorm mlu kernel (#42356) · ecd6db43
  Committed by fwenguang on May 10, 2022
  
  ecd6db43
- Q
  
  [MLU]add assign op of mlu device (#42591) · 4e5fb733
  Committed by qipengh on May 10, 2022
  
  4e5fb733
- Z
  fix adamw unittest (#42593) · c6f49f0b
  Committed by zhaoyingli on May 10, 2022
```
* fix adamw unittest

* tiny fix

* fix param name
```
  c6f49f0b
- J
  pdnode_compare (#42597) · 30234dd7
  Committed by JingZhuangzhuang on May 10, 2022
```
* pdnode_compare

* panode compare

* pdnode_compare
```
  30234dd7
- Z
  
  merge develop. test=develop (#42624) · 0ce42fb0
  Committed by zmxdream on May 10, 2022
  
  0ce42fb0
- L
  
  fix bug for heter (#42590) · 21b35167
  Committed by lilong12 on May 10, 2022
  
  21b35167
- S
  
  fix sample error (#42595) · df96d1ed
  Committed by Siming Dai on May 10, 2022
  
  df96d1ed
- A
  fix random cache (#723) (#42621) · be87caf2
  Committed by Allen Guo on May 10, 2022
```
Co-authored-by: yaozhixin <522190855@qq.com>
```
  be87caf2
- S
  
  broadcast_add kp performance optimization (#42097) · c7855125
  Committed by shixingbo on May 10, 2022
  
  c7855125
09 May, 2022 · 13 commits

A
[Eager]Fix tensor.name is empty behavior (#42587) · 81078a88
Committed by Aurelius84 on May 09, 2022
```
* [Eager]Fix tensor.name is empty behavior

* fix unittest
```
81078a88
W
refine pylayer (#42572) · c22c2c58
Committed by wanghuancoder on May 09, 2022
```
* refine pylayer

* refine
```
c22c2c58
L
fix docs of auto_cast, cuda_places, static.save (#42107) · c3b7bc61
Committed by Liyulingyue on May 09, 2022
```
* auto_cast; test=document_fix

* static.save; test=document_fix

* cuda_places; test=document_fix
```
c3b7bc61
J
[Need approval] Add AdamW-CPU FP32 JIT assembly kernel (#42522) · 766c50ac
Committed by joanna.wozna.intel on May 09, 2022
```
* Add AdamW jit kernel

* Second implementation

* Add missing header

* Correct number of jit kernels in the test
```
766c50ac

[Ready to merge] oneDNN NHWC matmul & elementwise kernels fixes (#42506) · bf481550

Committed by Jacek Czaja on May 09, 2022

* - fix to crash

- more fixes

- added diagnostic

- matmul output fixes.

- compilation fix

- stop rotating too small shapes

* - Added enabling of matmul_V2 onednn test

bf481550

N

Modified reduce for xpu2 (#42439) · ae4d1ec1
Committed by niuliling123 on May 09, 2022

ae4d1ec1

double grad yaml and test case (#42553) · 8b546f1c

Committed by chentianyu03 on May 09, 2022

* add abs double grad yaml and test case

* add pool2d double grad yaml

* add pool2d dygraph double grad test case

8b546f1c

C
[Eager] Polish grad code details (#42536) · 778ea4ec
Committed by Chen Weihang on May 09, 2022
```
* polish grad details

* polish detail by comment
```
778ea4ec
W
[Eager] Fix several sharding test under eager mode (#42573) · 13bcb7cd
Committed by Weilong Wu on May 09, 2022
```
* [Eager] fix sharding under eager mode

* [Eager] fix several sharding test under eager mode
```
13bcb7cd
C

fix split stride_numel may be 0 (#42537) · d878f971
Committed by chentianyu03 on May 09, 2022

d878f971

[PaddlePaddle Hackathon 2] 3. Add corrcoef (Pearson product-moment correlation coefficient) API to Paddle (#40690) · 95a502a2

Committed by liqitong-a on May 09, 2022

* corrcoef commit

* corrcoef commit

* Update test_corr.py

* Update linalg.py

* Update test_corr.py

* Update test_corr.py

* Update test_corr.py

* Update test_corr.py

* Update test_corr.py

* Update test_corr.py

* Update test_corr.py

* Update test_corr.py

* Update test_corr.py

* Update test_corr.py

* Update linalg.py

* Update linalg.py

* Update linalg.py

* Update test_corr.py

* Update test_corr.py

* Update test_corr.py

* Update test_corr.py

* Update test_corr.py

* Update test_corr.py

* Update test_corr.py

* Update test_corr.py

* Update test_corr.py

* Update test_corr.py

* Update test_corr.py

* Update test_corr.py

* Update test_corr.py

95a502a2

Q

[ROCm] fix rocksdb on ROCm version 40020496, test=develop (#42563) · bba5e083
Committed by Qi Li on May 09, 2022

bba5e083
J
[Eager] Support Gradient Accumulation for sr (#42371) · 1cddcd70
Committed by Jiabin Yang on May 09, 2022
```
* Support Gradient Accumulation for sr

* add ut

* change ut to fit small vector
```
1cddcd70

07 May, 2022 · 12 commits

Z

fix bug of optional_tensor in amp logic (#42561) · 4e66010b
Committed by zhangbo9674 on May 07, 2022

4e66010b
C
put_record_event_in_python_on_timeline_python (#42555) · 80015c06
Committed by chenjian on May 07, 2022
```
* put_record_event_in_python_on_timeline_python

* fix
```
80015c06
Z

fix the problem of slice infer shape (#42568) · c1e45a11
Committed by zyfncg on May 07, 2022

c1e45a11
Q

[ROCm] add gfx908 support for AMD MI100, test=develop (#42560) · d1aedd58
Committed by Qi Li on May 07, 2022

d1aedd58
Z
[Phi] Change sync copy to async for gpu_pinned to gpu place in data transform (#41966) · 6583a8d2
Committed by zyfncg on May 07, 2022
```
* the copy type of data transform for gpu_pinned to gpu change from syna to async

* refactor code
```
6583a8d2
W

add some no need buff (#42556) · 3f372814
Committed by wanghuancoder on May 07, 2022

3f372814

[dockerfile] add cuda11.6, cuda11.5, cuda11.4, cuda11.3 manylinux docker (#41251) · efa21b12

Committed by pangyoki on May 07, 2022

* add cuda11.5 manylinux docker

* build.sh

* fix

* fix

* support cu113 and cu114

* add --no-check-certificate when wget sqlite-autoconf-3250300

* change cuda11.4.2 to cuda11.4.3

* add cu116

efa21b12

[Auto Parallel] Improve the codes of the completion and distributed context (#40671) · bed9aaea

Committed by Yulong Ao on May 07, 2022

* [Auto Parallel] Replace the old planner by the new partition tuner

* [Auto Parallel] Improve the completion and distributed context

* [Auto Parallel] Fix some bugs of the compatible check of some dist ops

* [Auto Parallel] Fix some bugs

bed9aaea

[dockerfile] update go version and delete useless package in dockerfile (#36809) · afcf6bd0

Committed by pangyoki on May 07, 2022

* add cuda11.4 develop docker

* change default python from 2.7 to 3.7

* change base image for cpu docker

* fix gcc bug

* fix whl package name

* update go version and delete useless package in dockerfile

* fix release18 error

* fix wget sqlite problem

* update go version

* update go version in dev dockerfile

* fix CI error

* install zstd

* fix CI error

* add --no-check-certificate when install go

* python2.7 do not install requirements

* fix CI Coverage error

* coverage==5.5

* fix test_activation ut

* let numpy < 1.22 to pass test_activation_op unittest

* fix test_python_bf16_numpy_datatype unittest

* change paddle-bfloat==0.1.3

* recover version of paddle-bfloat

afcf6bd0

A

sync misc changes (#42534) · 37580838
Committed by Allen Guo on May 07, 2022

37580838
S
support set cuda_arch_name in pipeline (#42498) · bb5a14dd
Committed by Sing_chan on May 07, 2022
```
* set auto to reduce core_avx/noavx.pyd size

* set CUDA_ARCH_NAME in each case
```
bb5a14dd

Reduce the number of threads per block of deformable_psroi_pooling to solve... · 8c1b2fa6

Committed by FlyingQianMM on May 07, 2022

Reduce the number of threads per block of deformable_psroi_pooling to solve the bug where too many resources requested for launch (#42531)

8c1b2fa6

06 May, 2022 · 1 commit

bind elementwise_mod_op_xpu (#42175) · 6ea2f049

Committed by enzodechine on May 06, 2022

* bind elementwise_mod_op_xpu *test=kunlun

* add more supported dtypes and UTs *test=kunlun

* fix datatype error

* add op to in xpu1_op_list

* Update Mac cmake version >=3.15 (#41456)

* Update Mac cmake version >=3.15

* notest;read test1

notest;read test2

notest;read test3

* fix inference link error

* fix inference link error

* fix windows link error

* fix cmake_policy

* fix build big size

* Add paddle::variant and replace paddle::any (#42139)

* add variant and replace any

* split attribute

* disable unittest failed in eager CI in temporary (#42101)

* test=py3-eager

* test=py3-eager

* test=py3-eager

* combine graph_table and feature_table in graph_engine (#42134)

* extract sub-graph

* graph-engine merging

* fix

* fix

* fix heter-ps config

* test performance

* test performance

* test performance

* test

* test

* update bfs

* change cmake

* test

* test gpu speed

* gpu_graph_engine optimization

* add dsm sample method

* add graph_neighbor_sample_v2

* Add graph_neighbor_sample_v2

* fix for loop

* add cpu sample interface

* fix kernel judgement

* add ssd layer to graph_engine

* fix allocation

* fix syntax error

* fix syntax error

* fix pscore class

* fix

* change index settings

* recover test

* recover test

* fix spelling

* recover

* fix

* move cudamemcpy after cuda stream sync

* fix linking problem

* remove comment

* add cpu test

* test

* add cpu test

* change comment

* combine feature table and graph table

* test

* test

* pybind

* test

* test

* test

* test

* pybind

* pybind

* fix cmake

* pybind

* fix

* fix

* add pybind

* add pybind
Co-authored-by: DesmonDay <908660116@qq.com>

* [CustomDevice] add eager mode support (#42034)

* fix FlattenContiguousRangeOpConverter out dim error (#42087)

* fix FlattenContiguousRangeOpConverter out dim error

* update code

* fix python3.10 compile bug on windows (#42140)

* Optimize dygraph GetExpectedKernelType perf (#42154)

* opt dygraph scheduling

* revert part impl

* fix incorrect usages of std::move and other compile errors (#41045)

* fix bug of std::move and others

* fix an compile error in debug mode

* fix wrong copy assignment operator
Signed-off-by: tiancaishaonvjituizi <452565578@qq.com>

* reformat
Signed-off-by: tiancaishaonvjituizi <452565578@qq.com>

* reformat
Signed-off-by: tiancaishaonvjituizi <452565578@qq.com>

* fix ArrayRef constructor following llvm

* fix format

* fix conflict with master

* fix variant compile error (#42203)

* [Eager] Support numpy.ndarry in CastNumpy2Scalar (#42136)

* [Eager] Remove redundancy code, fix fp16 case (#42169)

* [Eager] Support div(scalar) in eager mode (#42148)

* [Eager] Support div scalar in eager mode

* Updated and remove debug logs

* Remove list, use 'or' directly

* Remove useless statement

* fix recompute (#42128)

* fix recompute

* modify return

* add LICENSE in wheel dist-info package (#42187)

* replace any by variant in infermeta (#42181)

* [PaddlePaddle Hackathon 2] 24. Add nn.ChannelShuffle network-building API to Paddle (#40743)

* Add infermeta for ChannelShuffle

* Create channel_shuffle_grad_kernel.h

* Create channel_shuffle_kernel.h

* Create channel_shuffle_sig.cc

* Create channel_shuffle_op.cc

Description of the ChannelShuffle operator

* Create channel_shuffle_kernel_impl.h

Implementation of the ChannelShuffle kernel

* Create channel_shuffle_grad_kernel_impl.h

Implementation of the ChannelShuffle backward kernel

* Add kernel register of channel shuffle and grad

Register the ChannelShuffle kernel and its backward kernel

* add nn.functional.channel_shuffle

* add nn.ChannelShuffle

* Create test_channel_shuffle.py

* Update example of ChannelShuffle in vision.py

* Update test_channel_shuffle.py

* Move the implementation of the channel_shuffle kernel

* Fix code formatting

* Remove extra spaces

* Improve error checking for channel_shuffle

* Update unary.cc

* Update channel_shuffle_op.cc

* Update test_channel_shuffle.py

* Update unary.cc

* add channel_shuffle

* Update test_channel_shuffle.py

* Update vision.py

* Adjust code formatting

* Update channel_shuffle_sig.cc

* Update the documentation of ChannelShuffle

* Update the documentation of channel_shuffle

* remove ChannelShuffleOpArgumentMapping

* add ChannelShuffleGradInferMeta

* Update channel_shuffle_op.cc

* Relocate the kernels for channel_shuffle and its gradient

* Do not reset default stream for StreamSafeCUDAAllocator (#42149)

* remove redundant computation in Categorical.probs (#42114)

* Downloading data for test_analyzer_vit_ocr (#42041)

* Change server URL

* update config

* add test to parallel UT rule

* add checksum to ensure files are downloaded

* change downloading target

* reuse existing variable

* change target directory

* fix en docs of some Apis (gradients, scope_guard, cuda_places, name_scope, device_guard, load_program_state, scale, ParamAttr and WeightNormParamAttr) (#41604)

* Update scope_guard; test=document_fix

* gradients; test=document_fix

* gradients; test=document_fix

* name_scope; test=document_fix

* cpu_places; test=document_fix

* WeightNormParamAttr; test=document_fix

* cuda_places; test=document_fix

* load_program_state; test=document_fix

* device_guard; test=document_fix

* device_guard; test=document_fix

* ParamAttr; test=document_fix

* scale; test=document_fix

* scale; test=document_fix

* update code example; test=document_fix
Co-authored-by: Chen Long <1300851984@qq.com>

* fix datatype error

add op to in xpu1_op_list

*test=kunlun

* fix elementwise_mod op path error  *test=kunlun

* fix elementwise_mod UT error  *test=kunlun

* fix datatype error

add op to in xpu1_op_list

*test=kunlun

add op to in xpu1_op_list

fix elementwise_mod op path error  *test=kunlun

fix elementwise_mod UT error  *test=kunlun
Co-authored-by: tianshuo78520a <707759223@qq.com>
Co-authored-by: Chen Weihang <chenweihang@baidu.com>
Co-authored-by: pangyoki <pangyoki@126.com>
Co-authored-by: seemingwang <seemingwang@users.noreply.github.com>
Co-authored-by: DesmonDay <908660116@qq.com>
Co-authored-by: ronnywang <524019753@qq.com>
Co-authored-by: baoachun <962571062@qq.com>
Co-authored-by: Zhou Wei <1183042833@qq.com>
Co-authored-by: tiancaishaonvjituizi <452565578@qq.com>
Co-authored-by: Weilong Wu <veyron_wu@163.com>
Co-authored-by: Roc <30228238+sljlp@users.noreply.github.com>
Co-authored-by: BrilliantYuKaimin <91609464+BrilliantYuKaimin@users.noreply.github.com>
Co-authored-by: Ruibiao Chen <chenruibiao@baidu.com>
Co-authored-by: Feiyu Chan <chenfeiyu@baidu.com>
Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com>
Co-authored-by: Yilingyelu <103369238+Yilingyelu@users.noreply.github.com>
Co-authored-by: Chen Long <1300851984@qq.com>

6ea2f049

机器未来 / Paddle · In sync with the fork's source project