Commits · 52a0a6774178552afcb4c2b5f416ff9e384f2da6 · PaddlePaddle / Paddle

05 Sep, 2023 7 commits

[Fluid] move lars_momentum_op InferShape to phi (#56749) · 52a0a677
Committed by gouzil on Sep 05, 2023
```
* move to phi

* fix

* fix type
```

add informata for strided grad kernel (#56947) · 89b91021
Committed by wanghuancoder on Sep 05, 2023

[Auto Parallel]: Support std::vector<phi::Tensor> input and output for DistTensor. (#56602) · d2fedeac

Committed by Ghost Screaming on Sep 05, 2023

* [WIP] Support std::vector<phi::Tensor> input and output for DistTensor.
Concat forward and backward are verified.

* Polish code for new dist tensor implementation.

* Fix bug of DistTensor upgrade. Add support functions for std::vector<Tensor> -> std::vector<Tensor>.

* Add support for DistTensor type of std::vector<phi::Tensor> as input or output of operators.
Following testcases are passed.
1. concat: std::vector<phi::Tensor> -> phi::Tensor
2. unbind: phi::Tensor -> std::vector<phi::Tensor>
3. broadcast_tensors: std::vector<phi::Tensor> -> std::vector<phi::Tensor>

* Polish code. Remove useless comments.

* Add update_loss_scaling in skip_op_lists.

* Polish code.

[clang-tidy] NO.8 enable `cppcoreguidelines-narrowing-conversions`. step:2 (#56895) · c2f0e9c4
Committed by gouzil on Sep 05, 2023
```
* [clang-tidy] replenish cppcoreguidelines-narrowing-conversions

* fix

* fix
```

[Fluid] move lars_momentum_xpu to phi (#56751) · 54b247b1
Committed by gouzil on Sep 05, 2023
```
* [Fluid] move lars_momentum_xpu to phi

* Empty-Commit;test=kunlun;
```

[XPU] Add element_mul_add_fuse_pass and elementwise_madd_xpu kernel (#56629) · 5efaaaa3
Committed by jiangfan06 on Sep 05, 2023

[clang-tidy] No. 57,58 cppcoreguidelines-explicit-virtual-functions... · 6dd9a024
Committed by xiaoye on Sep 05, 2023
```
[clang-tidy] No. 57,58 cppcoreguidelines-explicit-virtual-functions clang-analyzer-core.NonNullParamChecker (#56649)
```

04 Sep, 2023 9 commits
- Add rotate_half implementation for fused_rope (#56401) · c089a2af
  Committed by tianhaodongbd on Sep 04, 2023
```
* add rotate_half in fused_rope

* add position_ids in fused_rope

* modified examples about fused_rope

* add set_device in examples
```
- multihead_matmul op support codegen and kernel remove to phi (#56846) · 79bfb184
  Committed by Yuanle Liu on Sep 04, 2023
  
- add num_splist to support deterministic for flash_attn_bwd and FlashAttnUnpaddedGradKernel (#56363) · 7fd6ffb8
  Committed by niuliling123 on Sep 04, 2023
```
* add num_splist for flash_attn_bwd and FlashAttnUnpaddedGradKernel

* Add assertTrue

* Update submodule to a specific commit
```
- disable strided split (#56882) · eddf6d05
  Committed by wanghuancoder on Sep 04, 2023
```
* disable strided split
```
- [NewIR]support c_allreduce_sum/c_identity/c_embedding/c_embedding_grad (#56836) · 0e74bf36
  Committed by zhaoyingli on Sep 04, 2023
```
* [NewIR]add c_allreduce_sum/c_identity/c_reduce_sum/c_embedding/c_embedding_grad

* rm VLOG

* rm c_identity from LegacyOpList

* rm VLOG

* rm c_reduce_sum
```
- fix compile errors when using shared phi on windows (#56915) · 8aa1772c
  Committed by huangjiyi on Sep 04, 2023
```
* update

* fix bug

* fix bug

* fix bug

* fix bug

* rerun ci

* turn off shared_phi
```
- fix paddle namespace conflict when using paddle_flags (#56913) · 7d8402a8
  Committed by huangjiyi on Sep 04, 2023
```
* update

* update

* update
```
- optimize softmax_mask_fuse (#56877) · 25a0b46d
  Committed by duanyanhui on Sep 04, 2023
  
- reshard r to p (#56833) · a28e6f63
  Committed by LiYuRio on Sep 04, 2023
  

01 Sep, 2023 7 commits

export flags defined in phi on windows (#56848) · 17003369
Committed by huangjiyi on Sep 01, 2023
```
* update

* update
```

【Complex op】add complex support for index_select and index_sample (#56457) · 0b608393

Committed by Scotty on Sep 01, 2023

* support index_select op

* index_sample in cpu

* support index_sample in gpu

* change data_transform

* fix api gen and use skip_transform in yaml


[NewIR]Part-2.1 Refactor NewIRCompiler to support Group Ops (#56762) · 7adb4703

Committed by Aurelius84 on Sep 01, 2023

* [NewIR]Part-2.1 Refactor NewIRCompiler to support Group Ops

* fix gflags link error

* fix include ir_printer.h

* fix unittest

* fix conflict

* fix flags

* fix comment

[clang-tidy] enable bugprone-incorrect-roundings check (#56747) · e8a96347
Committed by gouzil on Sep 01, 2023


[clang-tidy] No.34,36 enable... · 17e4be21

Committed by cyberslack_lee on Sep 01, 2023

[clang-tidy] No.34,36 enable performance-noexcept-move-constructor,modernize-use-transparent-functors (#56261)

* fix

* fix

* CI

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* CI

* fix

* CI


[IR] Generate pd_op.parsed.yaml from pd_op.yaml (#56674) · 962f67d2

Committed by chen2016013 on Sep 01, 2023

* Generate pd_op.parsed.yaml from pd_op.yaml

* Generate pd_op.parsed.yaml from pd_op.yaml

* fix bug

* bug fix

* bug fix

* bug fix

* add new operators to pd_ops.yaml & change the storage path of pd_ops.parsed.yaml

* fix path dependency bug & add .gitignore file

* fix bug - compat input args in save_combine op

* fix compat file

* fix set_value_with_tensor yaml

* split backward op in original yaml file

* add send_v2 & recv_v2

Fix custom device compile error caused by dist marco changing (#56760) · ddc81cc2
Committed by Chen Weihang on Sep 01, 2023
```
* fix custom device error by dist

* polish details
```

31 Aug, 2023 7 commits

【complex op】No.7 add complex support for isclose (#56723) · d53972fd

Committed by iSerendipity on Aug 31, 2023

* add complex support for isclose

* add complex test for isclose

* fix template compile issue

* fix cuda compilation error

* fix type typo

* fix error for complex's abs

* add complex dtype into input

* fix ut
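
As a rough illustration of what complex support for a closeness test involves, the sketch below applies the usual criterion |x - y| <= atol + rtol * |y|, taking the absolute value as the complex magnitude. It is plain Python, not the kernel from this commit; the name `isclose_complex` is hypothetical, and the default tolerances are assumptions chosen for the example.

```python
def isclose_complex(x: complex, y: complex,
                    rtol: float = 1e-05, atol: float = 1e-08) -> bool:
    """Closeness test using the complex magnitude (illustrative only).

    Assumption: closeness is defined as |x - y| <= atol + rtol * |y|,
    where |.| is the modulus of a complex number.
    """
    # abs() on a Python complex returns its magnitude sqrt(re^2 + im^2).
    return abs(x - y) <= atol + rtol * abs(y)


def isclose_elementwise(xs, ys, rtol: float = 1e-05, atol: float = 1e-08):
    """Apply the test pairwise over two equally long sequences."""
    return [isclose_complex(a, b, rtol, atol) for a, b in zip(xs, ys)]
```

Note that the test is asymmetric in `x` and `y`, because only `|y|` scales the relative tolerance.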


[NewIR]New ir using kernel registrer type (#56789) · a34bdb64

Committed by hong on Aug 31, 2023

* update

* fix batch norm grad args def

* fix bug

* fix combine slice bug

* fix slice bug

* update builtin split

* disable using kernel register dtype

* polish code

* disable some test


Add fused_scale_bias_relu_conv_bnstats OP (#55026) · 71e28b12

Committed by Tian Zheng on Aug 31, 2023

* Add fused_scale_bias_relu_conv_bnstats op

* Review changes

* Fix no CUDNN Frontend build

* Fix PADDLE_ENFORCE format

* Fix PADDLE_ENFORCE CI error

* Rename kernel filename

* Refactor unittest to use paddle eager_op_test

* Fix padding bugs

* Review changes

* test=cuda117

* test=cuda117

use macro instead of functor (#56726) · 5425ad7f
Committed by LiYuRio on Aug 31, 2023

[Fluid] Move distributed_fused_lamb_init to phi (#55993) · 0bc369ef
Committed by Zero Rains on Aug 31, 2023

[ROCM] Remove the constraint with a maximum number of threads per block of 256, P1 (#56699) · d7679426
Committed by ronnywang on Aug 31, 2023


[AutoParallel] Adapt static spmd rules for dynamic graph (#56367) · 54fcd9a9

Committed by Chen Weihang on Aug 31, 2023

* move matmul spmd rules into phi

* add basic infer spmd utils

* add spmd factory

* fix compile error

* add unittest

* refine infer spmd test and utils

* debug infer spmd test

* adapt python test

* polish details

* change to vector attr arg

* revert needless change

* update matmul spmd rule test

* remove original rule

* polish details

* fix macro error

* add comment

* pass backward test

* fix compile error

* add cmake rule for spmd_rules_test

* add dist meta tensor

* update pybind impl

* add macro for rules

30 Aug, 2023 6 commits

[NewIR] fix logical op infermeta (#56711) · 987cb97e
Committed by kangguangli on Aug 30, 2023
```
* fix logical op infermeta

* add test

* adapt inplace api
```

Add paddle custom flags support (#56256) · 2ef4ec71

Committed by huangjiyi on Aug 30, 2023

* update

* replace gflags header

* replace DEFINE_<type> with PD_DEFINE_<type>

* fix bug

* fix bug

* fix bug

* update cmake

* add :: before some paddle namespace

* fix link error

* fix CI-Py3

* allow commandline parse

* fix SetFlagsFromEnv

* fix bug

* fix bug

* fix CI-CINN

* fix CI-Coverage-build

* fix CI-Windows-build

* fix CI-Inference

* fix bug

* fix bug

* fix CI-CINN

* fix inference api test

* fix infer_ut test

* revert infer_ut gflags usage

* update

* fix inference

* remove flags export macro

* revert inference demo_ci gflags usage

* update

* update

* update

* update

* update

* update

* update

* update

* fix bug when turn on WITH_GFLAGS

* turn on WITH_GFLAGS

* fix bug when turn on WITH_GFLAGS

* fix bug when turn on WITH_GFLAGS

* update

* update and add unittest

* add unittest

* fix conflict

* rerun ci

* update

* resolve conflict

[ROCM] Remove the constraint with a maximum number of threads per block of 256, P4 (#56702) · 8c154880
Committed by ronnywang on Aug 30, 2023


[Auto Parallel] Compatible new comm library upgrade (#56604) · ade51aa5

Committed by Ghost Screaming on Aug 30, 2023

* for verify

fluid operator support new comm library

* u

* u

* u

* compatible new comm library upgrade for c_allgather, c_reduce, c_reduce_scatter and c_scatter.

* Remove useless comments in process_group.py

* Polish code style.

* Fix some problems.

* Remove use fluid api in phi comm_context_manager.

* Add PPADDLE_WITH_CUDA and PADDLE_WITH_NCCL micro judgement.

* Fix bug of HIP architecture.

* Fix some problems.
1. remove useless loggings.
2. Fix conditional compilation for HIP.
3. Fix problems of test_pass_generation_pipeline.py. It calls paddle.distributed.init_parallel_env() first,
then auto.Engine calls _init_comm(), which calls process_group.instantiate(). However, init_parallel_env() calls
paddle.distributed.barrier(), which calls CreateNCCLEnvCache and creates the corresponding NCCLCommContext. Because dev_id is not
set, NCCLCommContext's dev_ctx is not initialized.

* Fix some problems.

* Polish code.

* Polish code.

* Revert compatible upgrade for communication operators. Their upgrades
will be submitted in another PR.

* Remove StaticTCPStore.

* Remove useless modification.

* Remove useless set_cuda_device_id.

* Polish code.

* Remove fluid header files in phi files.

* Remove useless comments.

* Fix problems of hip arch.

* Fix some problems.

* Polish code.

* Polish code style.

---------
Co-authored-by: hitywt <yuwentao126@126.com>

[clang-tidy] enable clang-analyzer-optin.cplusplus.UninitializedObject check (#56648) · 6d19073a
Committed by gouzil on Aug 30, 2023


【complex op】No.6 add complex support for logical_and/or/xor/not (#56323) · 5cbf5bd4

Committed by iSerendipity on Aug 30, 2023

* 【complex op】No.6 add complex support for logical_and/or/xor/not

* fix dtype check

* modify the docs

* add special condition for not raise when x.dtype is complex

* add random generate for complex dtype

* fix generate for complex

* fix

* fix

* add corner case for complex type

* fix ut

* fix ut
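
To make the idea of logical operators on complex inputs concrete, here is a small pure-Python sketch. It assumes the common convention that a complex value counts as true when it is nonzero (real or imaginary part non-zero); whether the kernels in this commit follow exactly that rule is not stated in the commit message, and all function names below are hypothetical.

```python
def truthy(z: complex) -> bool:
    """Assumed convention: a complex number is 'true' iff it is nonzero."""
    return z != 0


def logical_and(xs, ys):
    return [truthy(a) and truthy(b) for a, b in zip(xs, ys)]


def logical_or(xs, ys):
    return [truthy(a) or truthy(b) for a, b in zip(xs, ys)]


def logical_xor(xs, ys):
    # Exactly one of the two operands is nonzero.
    return [truthy(a) != truthy(b) for a, b in zip(xs, ys)]


def logical_not(xs):
    return [not truthy(a) for a in xs]
```

The outputs are always boolean, regardless of the complex input dtype, which is why such operators typically need a dtype check relaxed rather than new arithmetic.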

29 Aug, 2023 4 commits

[NewIR] support c_sync_calc_stream/c_sync_comm_stream/send_v2/recv_v2 (#56557) · 0ce66c1c

Committed by zhaoyingli on Aug 29, 2023

* [AutoParallel][NewIR] support calc_sync/comm_sync/send_v2/recv_v2

* pre-commit

* rm unittest

* tiny fix

* api_gen support send_v2's output is empty

* fix format

* python_c_gen support send_v2


Remove need_move_to_phi (#56371) · daac3829

Committed by Sonder on Aug 29, 2023

* remove flag

* open static build flag

* add searchsorted to list

* add register info for fused layernorm

* fix fused_layernorm_kernel output register info

* fix stft register info

* add include

* fix register info

* add skip fake init for fused_layernorm:residual_out

* fix error

* add distributed_fused_lamb_init to StaticBuildBlackList

* set static_build flag to false

[DCU] support cum & multinomial for dcu (#56612) · 0c3e4cf6
Committed by duanyanhui on Aug 29, 2023
```
* support cum & multinomial for dcu

* rm comment
```

[ROCM] Remove the constraint with a maximum number of threads per block of 256, P2 (#56700) · 76b328bc
Committed by ronnywang on Aug 29, 2023
