Commits · 44855da3e30f1bad5c32d084fa7dd05c9cf76a7c · PaddlePaddle / Paddle

20 Jan, 2023 1 commit
- [KUNLUN] update xccl lib & use native Reduce in dygraph (#49941) · 073f7ced
  Committed by jameszhang on Jan 20, 2023
```
* update xccl lib & use native Reduce in dygraph

* minor
```
19 Jan, 2023 4 commits

add test for zero dimensional tensor for real, imag, angle, conj, as_real and sequence_pad (#49921) · 64b3f2f6
Committed by Feiyu Chan on Jan 19, 2023

【prim】Modify dygraph code_gen , add set_output (#49918) · 22b5241f
Committed by xiaoguoguo626807 on Jan 19, 2023
```
* modify name

* merge develop

* fix param

* fix exp gen bug

* fix sum_grad

* comment
```
[KUNLUN] add op: maxpool_with_index (#49505) · f71f77e9

Committed by jameszhang on Jan 19, 2023

* [KUNLUN] add op: maxpool_with_index

* use DeviceContext::Alloc() instead of DenseTensor::mutable_data()

* fix file format

* solve clip unittest failure

* minor fix

* Revert "solve clip unittest failure" since the issue is fixed
in #49535

This reverts commit 1127adc66e79afe35ac3c00bb34e6aaa7cd7d78b.

* align with xdnn on the definition of mask in max_pool_with_index

* minor

[Paddle Inference]Support PaddlePaddle Backend on Triton (#49758) · e3f39833
Committed by heliqi on Jan 19, 2023
```
* support PaddlePaddle Backend on Triton

* fix test cases

* fix Codestyle

* add test case

* add test case
```
18 Jan, 2023 6 commits

Handle repetitive code in oneDNN activation fuse passes (#49824) · a1b2e1e2

Committed by Sławomir Siwek on Jan 18, 2023

* extract fuse pass logic to header file

* adjust namespaces

* Update paddle/fluid/framework/ir/mkldnn/activation_onednn_fuse_pass.h

update date
Co-authored-by: Tomasz Socha <tomasz.socha@intel.com>

* add inline remove static
Co-authored-by: Tomasz Socha <tomasz.socha@intel.com>

fix cast issue (#49909) · 55ccb429
Committed by wenbin on Jan 18, 2023
```
* fix cast issue

* add ut
```
kunlun support p2p send/recv (#49896) · 7242f40b
Committed by jameszhang on Jan 18, 2023

[0 Tensor support] support the 0d tensor for the cumsum (#49518) · 5fca45ea

Committed by wawltor on Jan 18, 2023

* Add the cumsum 0d tensor

* xpu and cpu judge the 0d  tensor

* change to 2022 to 2023 in new commit

* fix the reverse logic

fix cinn compilation with py38 (#49883) · bc93452d
Committed by Leo Chen on Jan 18, 2023

use default XPU stream for computing (#49806) · f6b23d6d

Committed by jameszhang on Jan 18, 2023

* revert to use default XPU stream for computing

XPUContext now has a null stream by default. If you want to use a separate stream
(e.g. in async collective communication), you should create a dedicated XPUContext
and invoke its XPUContext::CreateStream().

* minor

17 Jan, 2023 10 commits

Add more dy2st ut2 (#49881) · 2242136a
Committed by Jiabin Yang on Jan 17, 2023
```
* add test for composite with dy2st

* add more log
```
Refine munmap freq for RefcountedMemoryMapAllocation (#49691) · 3fdc105f

Committed by zhangbo9674 on Jan 17, 2023

* refine munmap freq for ref_cnt_mmap_allocator

* add shm reuse logic

* fix compile bug

* fix compile bug

* fix bug of file refcount

* fix compile bug

* fix compile bug

* refine code for delete shm case

* polish code

* refine shm cache pool size setting logic

* set buffer is 2

* refine shm cache size logic

* refine max shm cache

* refine shm cache size

Rewrite mat reshape transpose testers (#49580) · d9d47dc6

Committed by Paulina Gacek on Jan 17, 2023

* reshape_transpose_matmul_pass_tester rewritten

* matmul_transpose_reshape_pass_tester rewritten

* mkldnn to onednn

support CUDA Graph for new executor (#49708) · 8e5ed04d

Committed by pangyoki on Jan 17, 2023

* new exe supports CUDA Graph

* fix

* fix

* fix

* fix FLAGS_use_stream_safe_cuda_allocator in unittest

* insert output of coalesce_tensor op to skip_gc_var

* fix

Prim api gen (#49654) · 813e27c9

Committed by xiaoguoguo626807 on Jan 17, 2023

* proto type of composite grad in paddle

* proto type of composite grad in paddle

* refactor composite api with phi

* fix compile error

* support static graph code-gen for squeeze op

* generate static graph code of unsqueeze

* refine op name

* fix compile error

* add extra output in op_compat

* remove debug log

* fix clang compile error

* support prim switch flag

* support prim switch flag

* fix dygraph error

* merge develop

* add code_gen

* add necessary files without codegen

* fix code_gen bug

* add deps

* modify igmnore

* add ignore

* delete std cout

* add composite logic for backward.py

* add tanh first order grad composite

* support enable_prim flag for static graph

* throw expection when both GrapOpMaker and GradCompOpMaker not been registered

* reorganize the directory of prim api tests

* fix windows error

* add eager_utils

* add eager_utils

* modify code gen

* add composite parse

* add unittest for get_grad_op_desc

* code optimize

* fix static test on windows

* support generate static graph code for imag and real op

* fix windows compile error in test_static_prim

* merge develop

* disable test eager in inference

* prim code gen

* disable eager compile in inference

* origin_yaml codegen success

* rm other file

* rm gitignore file

* code_style

* add eager test

* code_style

* clear #

* merge develop

* clear #

* remove useless files

* modify static test

* support bool flag from singlton

* merge develop

* recover git ignore

* fix conflict

* clear prim_gen

* recover git ignore for generated op

* parse_yaml success

* fix test compile error

* remove some tests

* add python test

* code_style

* revert parse_utils+ clear prim_gen

* fix some name issue

* add composite code gen

* modify backward yaml

* fix static composite grad maker code gen

* remove addtional files

* add some static funcs unit test

* fix some bugs

* fix composite grad maker register code gen

* optimize some functions

* modify gen cmake

* add more api gen

* add header

* modify static

* add static expand unsqueeze

* comments

* modify compopmaker

* revert

* modify gen name
Co-authored-by: JiabinYang <360788950@qq.com>
Co-authored-by: zyfncg <zhangyunfei07@baidu.com>
Co-authored-by: cxxly <chenxx_id@163.com>
Co-authored-by: charles-hit <wanghao107@baidu.com>

[PHI]Change feed_op to phi kernel (#49116) · f7f1dc03

Committed by YuanRisheng on Jan 17, 2023

* change feed_op to phi kernel

* fix ci bugs

* fix build bugs

* fix ci bugs

* fix compile bugs

* fix ci bugs

* perfect code

* perfect comment code

* fix install bugs

* modify code according comment

* remove visitor in feed_op

* modify according comment

* perfect code according comment

* add infershape

* fix py3 bugs

* fix getexpected kernel type

* fix getexpected kernel type

* fix ci bugs

* add registry for custom device

* fix py3 bugs

* fix floating point error

* fix py3 test bugs

add test for composite with dy2st (#49873) · b927ce81
Committed by Jiabin Yang on Jan 17, 2023

[Dy2St]Support call backward() without params in dy2st (#49812) · 2f24b2d8
Committed by WangZhen on Jan 17, 2023
```
* Support call backward() without params in dy2st
```
Modified compute and amplifier interceptor (#42044) · 989e39a5
Committed by LiYuRio on Jan 17, 2023

【Prim】Add multiply,expand,div vjp rules (#49831) · 39c6765a

Committed by Xiaoxu Chen on Jan 17, 2023

* support elementwise base func

* fix compiling error and add test

* support vjp for div using comp

* remove additional change

* fix dy2st error with magic num

* fix dy magic num

* another magic

* another magic

* another magic

* add skip rename strategy

* support add vjp

* support add with new axis cal

* support sub vjp

* [prim] add multiply vjp rules

* [prim] add multiply vjp rules

* [prim] fix no infershape with composite in _append_backward_ops

* [prim] add expand vjp rule

* [prim] add exp vjp rule

* uncomment infer shape for reshape/sum static prim api

* [prim] fix tanh nullptr error

* remove some print message

* fix magic number in run_program relative tests @JiaBinYang

* [prim] add expand,multiply,exp vjp rules

* fix only support single direction reduce error

* infer reduce dims using out dims
Co-authored-by: JiabinYang <360788950@qq.com>

16 Jan, 2023 12 commits
- Support the 'data_transform' for generating static graph ops (#49772) · 28864137
  Committed by HappyHeavyRain on Jan 16, 2023
```
* support the 'data_transform' for generating static graph ops

* reset 'pow' code

* change the 'GetKernelTypeForVar'
```
- CUDA12.0 integration (#49539) · 1885d55a
  Committed by zlsh80826 on Jan 16, 2023
```
* Update warpctc for cuda-12

* Deprecate cudaProfilerInitialize for CUDA > 11

* Deprecate CUSPARSE_MV_ALG_DEFAULT for CUDA_VERSION >= 11040

* Add the missing thrust header
```
- [inference] Use output var name to mark the NVTX flag (#49825) · ea2e2495
  Committed by Zhang Jun on Jan 16, 2023
```
* add outvar name for nvtx mark

* nly network created with kEXPLICIT_BATCH can setsetMaxBatchSize
```
- [CINN]Switch cinn GIT_TAG from v0.2 into develop (#49775) · c8187ac7
  Committed by Aurelius84 on Jan 16, 2023
```
* [CINN]Switch cinn GIT_TAG from v0.2 into develop

* fix branch name

* specify commit

* disable unittest

* disable unittest
```
- [Paddle-TRT] support nhwc (#49633) · e43f7102
  Committed by Yuanle Liu on Jan 16, 2023
```
* add trt_support_nhwc_pass
```
- [fix code style]fix cpplint code style (#49742) · a3f58b70
  Committed by wangxiaoning on Jan 16, 2023
```
* fix ctr_double_accessor.h

* fix graph_brpc_client.h non-const reference to pointer

* fix common_table.h

* fix graph_py_service.cc, server.cc, server.h
```
- [fix code style]fix cpplint code style (#49811) · b7d44eb9
  Committed by wangxiaoning on Jan 16, 2023
  
- Revert "[static code gen]Add phi and fluid info in static code gen (#49763)" (#49848) · 0355bb90
  Committed by Jiabin Yang on Jan 16, 2023
```
This reverts commit 4d5265b8.
```
- add gpu_cpu_map_matmul_to_mul_pass to kGpuLowerPrecisionPasses (#49753) · 07514139
  Committed by Yuanle Liu on Jan 16, 2023
```
* add gpu_cpu_map_matmul_to_mul_pass to kGpuLowerPrecisionPasses

* disable fc_elementwise_layernorm_fuse_pass in mixed precision
```
- [static code gen]Add phi and fluid info in static code gen (#49763) · 4d5265b8
  Committed by Charles-hit on Jan 16, 2023
```
* polish static grad op maker gen

* fix some bugs

* fix static code gen

* solve conflict

* modify composite grad maker name
```
- add sqrt_comp_grad composite rule (#49769) · 70378584
  Committed by zqw_1997 on Jan 16, 2023
  
- 【prim】vjp for reduce sum (#49736) · 292f3f77
  Committed by xiaoguoguo626807 on Jan 16, 2023
  
15 Jan, 2023 2 commits

support mp on xpu (#49815) · 6a56bce7

Committed by Roc on Jan 15, 2023

1. update xccl lib
2. when using comm_ctx, the allocator should be set manually.

【Prim】Enhance tests (#49814) · 090aa45d

Committed by Jiabin Yang on Jan 15, 2023

* support elementwise base func

* fix compiling error and add test

* remove additional param

* support vjp for div using comp

* remove additional change

* fix dy2st error with magic num

* fix dy magic num

* another magic

* another magic

* add more test

* fix windows problem

* another magic

* fix windows compile

* invoke ci

* add skip rename strategy

* support add vjp

* fix test_tanh

* support add with new axis cal

* fix resnet and some test

* add composite log

* support sub vjp

* enhance_tests

* support more dtype for full

13 Jan, 2023 5 commits

add oss flash fmha and fmhca support (#49438) · a48b8e2c
Committed by Wang Bojun on Jan 13, 2023
```
* add fmha_flashattention oss plugin
```
refine _grad_ivar (#49787) · 93cee48e
Committed by wanghuancoder on Jan 13, 2023

[inference][trt]set output data type of trt network (#49712) · 690d7a69

Committed by Zhang Jun on Jan 13, 2023

* update trt engine to set in/out data type

* update

* Update engine.cc

* Update engine.cc

* update

* set engine output type before freeze the network

* update

* update trt autoscan ut

* update

* update ut

* fix equal bug, update ut

* fix cast and equal ut

* update cast ut using TRT < 8.4

* set datatype from scope

* check output var is nullptr

* Update op_converter.h

* update tensorrt_engine_op_test ut

* update

[Custom Device] Clear ProcessGroup Manually (#49182) · a923a757

Committed by duanyanhui on Jan 13, 2023

* clear ProcessGroupCustom manually

* fix bug

* fix bug

* move destroy ProcessGroup to ProcessGroupIdMap

* enable destroy to all device

* remove unused comments

* change to internal api

* Update process_group.cc

* Update process_group.cc

[Custom Device] update get_device to custom and add custom_device api (#49721) · bd165b94
Committed by duanyanhui on Jan 13, 2023
```
* update get_device to custom

* add custom_device api

* rm is_compiled_with_custom_device from framework

* add todo comments
```

PaddlePaddle / Paddle synced successfully about 1 year ago