提交 · 88724a53b4cffb39ef9d5137c0352a0793455014 · Crayon鑫 / Paddle

16 8月, 2022 3 次提交

convert multihead to oss (#45019) · f706d95d

由 feng_shuai 提交于 8月 16, 2022

* convert multihead to oss

* fix:bug

* fix:delete const cast

* fix:don't support bias_qk

* add vit pass

* fix:convert bug and add preln_residual_bias

* support length=-1

* add UT for convert

* add no_bias_qk support for gpu_multihead_op

* delete infer_shape depends on bias_qk

* oss just can be used in T4 and A*

* fix:change api for ROCM CI

f706d95d

W

fix new quant (#45155) · 2fb65e44
由 Wangzheee 提交于 8月 16, 2022

2fb65e44
F

add strongly typed functions to set attributes to avoid unexpected type conversions. (#45107) · 307801d5
由 Feiyu Chan 提交于 8月 16, 2022

307801d5

15 8月, 2022 2 次提交

Y

fused_embedding_eltwise_layernorm_op and skip_layernorm_op support fp16 (#44969) · ac0553a0
由 Yuanle Liu 提交于 8月 15, 2022

ac0553a0

[Auto Parallel] Move the distributed info from python to c++ (#44510) · a52357fe

由 Yulong Ao 提交于 8月 15, 2022

* [Auto Parallel] Move the distributed info from python to c++

* [Auto Parallel] Add dist_attrs for VarDesc and OpDesc

* [Auto Parallel] Add the lost file

* [Auto Parallel] Make the dist attr be unique_ptr

* [Auto Parallel] Add the proto conversion

* [Auto Parallel] Improve the proto support

* [Auto Parallel] Fix the bugs for adding a device or a link

* [Auto Parallel] Add the C++ ProcessMesh and DistributedMapper

* [Auto Parallel] Improve the impl of these dist attrs

* [Auto Parallel] Pybind11 ProcessMesh and DeviceMesh

* [Auto Parallel] Fix the unittest problem

* [Auto Parallel] Explicitly add the src file for auto_parallel target

* [Auto Parallel] Add the proto depedency explicitly

* [Auto Parallel] Fix the cmake bug on windows and mac

* [Auto Parallel] Remove the pybind11 header file in process_mesh.h

* [Auto Parallel] Remove unused codes

* [Auto Parallel] Check whether the dist attr is null

* [Auto Parallel] Implement the assign operator for OpDesc explicitly

a52357fe

14 8月, 2022 1 次提交
- X
  Revert "[Paddle Inference] Support cuda_graph. (#44878)" (#45115) · b0e7681f
  由 xiaoxiaohehe001 提交于 8月 14, 2022
```
This reverts commit 84bf5c31.
```
  b0e7681f
13 8月, 2022 2 次提交

Refine program cache (#45005) · e96dae8b

由 Leo Chen 提交于 8月 13, 2022

* add cached_serialize_str_

* support program hash

* add sha

* add ut

* use hash_str only for new_exe

* fix attr order

e96dae8b

fl-ps: support split sparse params in local & remote (#44864) · 3f5c405f

由 ziyoujiyi 提交于 8月 13, 2022

* back fl

* delete ssl cert

* .

* make warning

* .

* unittest paral degree

* solve unittest

* heter & multi cloud commm ready

* .

* .

* fl-ps v1.0

* .

* support N + N mode

* .

* .

* .

* .

* delete print

* .

* .

* .

* .

* fix bug

* .

* .

* fl-ps with coordinator ready

* merge dev

* update message parse only

* update fl client scheduler

* fix bug

* update multithreads sync

* fix ci errors

* update role_maker.py

* update role_maker.py

* fix ci error: windows py import error

* fix ci error: windows py import error

* fix windows ci pylib import error

* add dump fields & params

* try to fix windows import fleet error

* fix ps FLAGS error

* fix logging risk

* fix logging possible risk

* write trainer_desc file

* support split sparse params in local & remote

* fix import paddle.fluid.core.PSGPU

* fix import paddle.fluid.core.PSGPU

* add remote_sparse & local_sparse config

* fix unittest

* fix test_dist_fleet_geo table error

* fix PADDLE_ENFORCE error

* fix other's pr conflict

3f5c405f

12 8月, 2022 2 次提交

Offload calculations from matmul op to fuse pass (#44941) · acb78ea2

由 Sławomir Siwek 提交于 8月 12, 2022

* remove v2_transpose_reshape

* matmul_transpose_reshape

* reshape_transpose_matmul

* Add int8 support for matmulV2

* restore ut

* adjust old ut

* restore parallel UT ruels

* remove mkldnn code from base ops

* move enforces to pass

* remove duplicated functions

* delete duplicated enforces

* feedback from review

* add comments to variables

* enable eltwise support

* dynamic attribute

* remove fusepass tests from op test

* remove fuse pass cases from op test

* revert introduction of dynamic attributes

* style
Co-authored-by: Nwozna <joanna.wozna@intel.com>

acb78ea2

transfer memcpy_h2d from fluid to phi (#44932) · 7bc57d35

由 kangguangli 提交于 8月 12, 2022

* transfer memcpy_h2d from fluid to phi

* use UnchangedInferMeta instead

* restore test_standalone_executor

* add newline to fix codestyle check

* rename pt -> phi

* simplify logic and add check

* make the comment more clear

* remove useless comment

* refine code

7bc57d35

11 8月, 2022 1 次提交
- W
  
  Change bias to persistable in preln_residual_bias_fuse_pass (#45037) · 26c573de
  由 whs 提交于 8月 11, 2022
  
  26c573de
10 8月, 2022 4 次提交
- X
  [Paddle Inference] Support cuda_graph. (#44878) · 84bf5c31
  由 xiaoxiaohehe001 提交于 8月 10, 2022
```
* cuda_graph

* cuda_graph_

* cuda_graph_

* cuda_graph_
```
  84bf5c31
- L
  [new-exec] set cuda device before run (#44985) · 68b06ba6
  由 Leo Chen 提交于 8月 10, 2022
```
* set cuda device before run

* add header file

* fix compile
```
  68b06ba6
- L
  fix proto consistency bug (#45017) · 9c98ee3e
  由 Leo Chen 提交于 8月 10, 2022
```
* fix proto bug

* add ut

* reset need_update for var_desc

* refine code

* fix var desc order issue
```
  9c98ee3e
- A
  [OpAttr]Support VarDesc* and vector<VarDesc*> in Attribute (#44737) · 81d6fa6c
  由 Aurelius84 提交于 8月 10, 2022
```
* [OpAttr]Support VarDesc* and vector<VarDesc*> in Attribute

* add unittest for inference predictor
```
  81d6fa6c
09 8月, 2022 2 次提交
- Y
  
  fix mkldnn conv add pass when the dims of res and out are not equel (#45018) · 42c694df
  由 yeliang2258 提交于 8月 09, 2022
  
  42c694df
- Y
  Fix a bug in transpose2 when run native cpu (#44659) · 8185cecd
  由 yeliang2258 提交于 8月 09, 2022
```
* fix a bug in transpose2 about mkldnn

* fix bug
```
  8185cecd
08 8月, 2022 1 次提交
- L
  clean includes of tensor.h (#44928) · ee9ea48d
  由 Leo Chen 提交于 8月 08, 2022
```
* clean tensor.h

* fix gather_nd
```
  ee9ea48d
05 8月, 2022 4 次提交

fix 5 operator makers with typos which pass string literal to argument... · ce9d2a9e

由 Feiyu Chan 提交于 8月 05, 2022

fix 5 operator makers with typos which pass string literal to argument 'generated', remove generated as parameter of AddAttr (#44935)

ce9d2a9e

[MKLDNN]Move mkldnn activation kernel to phi (#44365) · 2dfa88d2

由 YuanRisheng 提交于 8月 05, 2022

* move mkldnn activation kernel

* fix compile bugs

* fix compile bugs

* deal with conflict

* fix compile bugs

* fix windows compile bugs

* mkldnn unittest fix

* change mutable to alloc

* fix unittest bugs

* modify code according comment

2dfa88d2

Z

Add feed&fetch as default deny ops. (#44708) · d4ca7ffb
由 Zhen Wang 提交于 8月 05, 2022

d4ca7ffb

Merge matmul_v1 and matmul_v2 fuse passes (#44870) · d0cf9d9d

由 Sławomir Siwek 提交于 8月 05, 2022

* remove v2_transpose_reshape

* matmul_transpose_reshape

* reshape_transpose_matmul

* restore ut

* adjust old ut

* restore parallel UT ruels

* feedback from review

d0cf9d9d

04 8月, 2022 2 次提交

Matmuls with activation and elementwise_add fuses (#44655) · 0420d514

由 Sławomir Siwek 提交于 8月 04, 2022

* Add unit tests

* matmul_v2 + activation

* matmuls + elementwise_add

* matmul_v2 postops

* transform matmul to v2

* opcompat

* fix fusing matmul with multipe outs

* add shape constraints

* remove unused vars

* change pass order

* - Unit tests to be debugged

- fix

- refactor

- diagnostic

- more diagnostic

- fix

- Fix number two

- fix

- fix

- fix

- alpha added

- more fixes

- compilation fix

- removed diagnostic code

- cosmetic fixes

* lint

* add alpha constraint

* merge matmul refactor

* trigger CI

* - fix

* - another fix

* code style

* add support for matmul+elementwise_add+activation

* code style

* fix bfloat16 bugs

* change append_binary to append_sum
Co-authored-by: NJacek Czaja <jacek.czaja@intel.com>

0420d514

王

add xpu garbage collector for standalone executor. (#44572) · 0e26361c
由王明冬提交于 8月 04, 2022

0e26361c

03 8月, 2022 2 次提交
- H
  [jit] c++ property deserialization & Variable support vector of int, float (#44727) · 9735d1b8
  由 Hui Zhang 提交于 8月 03, 2022
```
* c++ property deserialization

* fix for comment

* more error info

* fix exception info

* fix ci

* fix compile

* fix layer test ci
```
  9735d1b8
- W
  
  fix trt and gpu pass: emb_elt_layn (#44842) · 2ea1c134
  由 Wangzheee 提交于 8月 03, 2022
  
  2ea1c134
02 8月, 2022 6 次提交
- L
  
  fix namespace of GPUContext (#44822) · 65f38869
  由 Leo Chen 提交于 8月 02, 2022
  
  65f38869
- W
  Multihead matmul fp16 (#44792) · 0fd8ee63
  由 Wilber 提交于 8月 02, 2022
```
* multihead matmul add fp16

* fix windows error

* fix rocm error

* fix rocm error
```
  0fd8ee63
- D
  
  fix gpups CUDADeviceContext to phi-GPUContext;test=develop (#44804) · 3491d183
  由 danleifeng 提交于 8月 02, 2022
  
  3491d183
- W
  [Phi] polish and rename, pt* -> phi* (#44697) · 942ff89f
  由 Weilong Wu 提交于 8月 02, 2022
```
* polish and rename, pt* -> phi*

* fix code format
```
  942ff89f
- R
  Skip inplace for coalesce_tensor_op outputs (#44795) · bb22e59c
  由 Ruibiao Chen 提交于 8月 02, 2022
```
* Skip inplace for coalesce_tensor_op outputs

* Fix typos

* Add UTs

* Fix typos
```
  bb22e59c
- R
  Refactor build_op_downstream_map for standalone executor (#44729) · 9b97ac70
  由 Ruibiao Chen 提交于 8月 02, 2022
```
* Refactor build_op_downstream_map for standalone executor

* Add some comments
```
  9b97ac70
01 8月, 2022 3 次提交

unify gpu context (#44740) · 86763023

由 Leo Chen 提交于 8月 01, 2022

* remove cudaDeviceContext

* remove more template

* fix rocm compile

* remove alias name CUDADeviceContext

* fix compile

* fix tests

* revert changes

86763023

GPUGraph merge to develop (#44594) · 798670bb

由 danleifeng 提交于 8月 01, 2022

798670bb

W
[Paddle Inference] add varlen_token_prune plugin, pass, convert (#44733) · 24187fcb
由 Wangzheee 提交于 8月 01, 2022
```
* add varlen_token_prune plugin, pass, convert
```
24187fcb

29 7月, 2022 4 次提交

L
unify fluid::CUDADeviceContext and phi::GpuContext (#44723) · 88490567
由 Leo Chen 提交于 7月 29, 2022
```
* remove cudaDeviceContext

* remove more template

* fix rocm compile
```
88490567

[Auto parallel] Optimization Tuning (#43782) · 72f2ed43

由 JZ-LIANG 提交于 7月 29, 2022

* fixed bug for pass & engine

* fixed bug for benchmark GPT-3

* add tuner & profiler

* add algorithms & config

72f2ed43

move CUDAStream to phi (#44529) · da3743fd

由 Leo Chen 提交于 7月 29, 2022

* init

* move CUDAStream to phi

* fix compilation

* merge develop

* add stream_owned_ member

* split cuda_stream.h

* fix cpu compile

* fix constructor

* fix bug

* fix windows compile

* fix inference test_levit

* fix windows tests

da3743fd

H

[XPU] add sampling_id op, add top_k op, update xdnn api. test=kunlun (#44704) · e61f48c1
由 houj04 提交于 7月 29, 2022

e61f48c1

27 7月, 2022 1 次提交
- P
  fix RemoveIntermediateOut in fuse_elewise_add_act_pass while converting graph to program (#44593) · be132719
  由 pangyoki 提交于 7月 27, 2022
```
* fix RemoveNode in fuse_elewise_add_act_pass

* fix

* change pointer to share_ptr

* fix

* fix

* fix format

* fix

* fix graph_safe_remove_nodes
```
  be132719

Crayon鑫 / Paddle 与 Fork 源项目一致

Crayon鑫 / Paddle
与 Fork 源项目一致