Commits · a694e679763ff13dbde2d25f0ff125d3bbf6c6fe · PaddlePaddle / Paddle

Aug 28, 2023 · 1 commit

[AutoParallel] Simplify PADDLE_WITH_DISTRIBUTE macro using (#56361) · 62c78e26

Committed by Chen Weihang on Aug 28, 2023

* simplify with dist macro

* polish error message format

* fix vtable error

* fix cmake error

* fix winsock redefined error

* fix windows compile error

* fix windows compile failed

* fix merge error

* fix vec compile error

* add port.h into test_cpu_vec

* fix merge error

* try to fix winsock error


Aug 25, 2023 · 2 commits
- [CustomDevice] add comm context support (#56301) · 62397cd2
  Committed by ronnywang on Aug 25, 2023
- [CustomDevice] Fix device id out of range in custom device resource pool (#56580) · e99b3cb2
  Committed by ronnywang on Aug 25, 2023

Aug 23, 2023 · 1 commit
- [CustomDevice] Fix device occupancy (#56556) · 5c6ae96b
  Committed by ronnywang on Aug 23, 2023

Aug 22, 2023 · 1 commit

[XPU][PHI Kernels] add index_put kernel for xpu (#56169) · 332a73b1

Committed by lijin23 on Aug 22, 2023

* add inverse kernel for xpu

* add more kernels

* add index_put kernel for xpu

* add index_put kernel for xpu

* remove unused headers

* refine test

* wait to avoid memory bugs for xpu

* refine inverse


Aug 18, 2023 · 1 commit
- fix fft bug in DCU (#56340) · d084a236
  Committed by yuguo on Aug 18, 2023

Aug 16, 2023 · 2 commits

[ROCM]:Delete the special target and fix compiler options (#55507) · 4d501872

Committed by onepick on Aug 16, 2023

The runtime compiler API will only build the special target if it is bound.

'--include-path' is not supported by hipcc; "-I/include/folder" is a better choice.

Fix unit tests:
* device_code_test
* test_code_generator
* test_fusion_group_pass
* test_fusion_group_op

Signed-off-by: jiajuku <jiajuku12@163.com>

[XPU] Add fast_layernorm_xpu_fuse_pass and fast_layernorm_xpu plugin (#56269) · f16e1869

Committed by jiangfan06 on Aug 16, 2023

Aug 14, 2023 · 2 commits
- [clang-tidy] Open cppcoreguidelines-avoid-c-arrays Check (#56208) · 0f5148fb
  Committed by gouzil on Aug 14, 2023
- [XPU] Add take_along_axis xpu kernel and plugin (#56125) · b2e06fc9
  Committed by jiangfan06 on Aug 14, 2023

Aug 11, 2023 · 1 commit
- [XPU]Add flip kernel (#55932) · ee003457
  Committed by wz1qqx on Aug 10, 2023

Aug 10, 2023 · 1 commit
- [XPU] Add gather_nd fp16 and add check_dtype_op_blacklist (#55860) · 307128d1
  Committed by jiangfan06 on Aug 10, 2023
Aug 09, 2023 · 3 commits
- [oneDNN]rename macro to PADDLE_WITH_DNNL (#52208) · 6ff4c130
  Committed by Xinyu Chen on Aug 09, 2023
  * onednn: rename macro to PADDLE_WITH_DNNL
  * onednn: rename macro to CINN_WITH_DNNL
- [clang-tidy] fix modernize-make-unique (#55764) · 9f04f2ac
  Committed by Ruibin Cheung on Aug 09, 2023
- [XPU] add fused_softmax_mask and fused_softmax_mask_grad. (#55914) · b982af4a
  Committed by houj04 on Aug 09, 2023
Aug 08, 2023 · 1 commit
- [XPU] register multiclass_nms3 and norm xpu kernel to optimize model (#56064) · ba992136
  Committed by leolishaohao on Aug 08, 2023

Aug 07, 2023 · 3 commits

[clang-tidy] NO.6 enable `modernize-avoid-c-arrays` step: 2 (#55954) · 5ada98b8

Committed by gouzil on Aug 07, 2023

[clang-tidy] enable modernize-use-equals-default (#55983) · 30a02d27

Committed by Ruibin Cheung on Aug 07, 2023

[WIP] Integration flash attention 2 (#55758) · 0473369f

Committed by umiswing on Aug 07, 2023

* Work for fa-2 padded fwd. Code to be cleaned.

* Work for fa2 unpadded fwd.

* Work for padded-bwd, dk get small diff on np.random.seed(0)

* Anyway I pass paddle's utest, except return softmax without dropout.

* Clean code.

* Modify interface.

* Clean code and add some check.

* Easy compile for dev.

* Fix ci.

* Fix ci-build.

* Add std c++17 option again.

* Limit max job when compiling fa2.

* Remove const_cast

* Add fwd params, to be cleaned.

* Clean code.

* Add bwd params.

* Clean code.

* Add enforce.

* Use v2.0.4

* Pass RNG state to fa2 capi

* Fix review.

* Add assert

* Skip compile for sm less than 80.


Aug 04, 2023 · 2 commits
- Fix a bug in VecAutomaticAddPerBlock (#55929) · 81511469
  Committed by niuliling123 on Aug 04, 2023
- [XPU] Add int support for elementwise_sub/elementwise_div (#55920) · 97ab6aa6
  Committed by jiangfan06 on Aug 04, 2023

Aug 03, 2023 · 2 commits
- [clang-tidy] [No.4] enable `modernize-loop-convert` (#55704) · 81ccd99e
  Committed by Wang Xin on Aug 03, 2023
- eliminate small pattern (#55843) · dc4b48f6
  Committed by wz1qqx on Aug 03, 2023

Aug 02, 2023 · 3 commits

[clang-tidy] NO.6 enable `modernize-avoid-c-arrays` check (#55774) · c000091e

Committed by gouzil on Aug 02, 2023

* [clang-tidy] modernize-avoid-c-arrays

* rollback

* [clang-tidy] fix

* close modernize-avoid-c-arrays

* fix PHI_DEFINE_string; add PHI_DEFINE_bool NOLINT

* fix PHI_DEFINE_string

* fix next_h_state and parity err

* fix win32

* fix cuda_graph

* fix accuracy_kernel

* fix math_function

* fix fused_softmax_mask_kernel.cu load_data and warp_reduce; rollback concat_and_split_functor ins_addr

* fix fused_dropout_add_grad_kernel

* fix

* rollback cu

* rollback concat_and_split_functor.cu

* rollback

[XPU]Add conv1d fuse pass (#55719) · 22c7a6eb

Committed by wz1qqx on Aug 02, 2023

[XPU] Add gather_squeeze_pass (#55605) · d13a49d6

Committed by jiangfan06 on Aug 02, 2023

Aug 01, 2023 · 1 commit
- [XPU] Add fast_where fusion op and XPU micro kernel (#55628) · 07e788f1
  Committed by hong19860320 on Aug 01, 2023

Jul 26, 2023 · 1 commit
- add modernize-redundant-void-arg check (#55652) · 12fb18dd
  Committed by gouzil on Jul 26, 2023

Jul 25, 2023 · 1 commit
- [XPU] Add FP16 support for arg_min_max (#55642) · 14094aad
  Committed by jiangfan06 on Jul 25, 2023

Jul 21, 2023 · 1 commit
- [clang-tidy] enable modernize-make-unique (#55506) · 45d49619
  Committed by Ruibin Cheung on Jul 21, 2023

Jul 20, 2023 · 1 commit

[XPU][PHI Kernels] bind reduce_max_int64 set_value_bool sin_grad_fp32... · ab00c96c

Committed by lijin23 on Jul 20, 2023

[XPU][PHI Kernels] bind reduce_max_int64 set_value_bool sin_grad_fp32 cos_grad_fp32 for XPU (#55375)

* bind kernels for xpu

* format code

* format code

* 0d support for set value

* refine set_value


Jul 13, 2023 · 2 commits
- [CustomDevice] fix device guard (#55351) · 0fd6efbb
  Committed by ronnywang on Jul 13, 2023
- fix bug on case with gpu driver but no gpu (#55335) · acf4a2ae
  Committed by ming1753 on Jul 13, 2023

Jul 12, 2023 · 3 commits

[CustomDevice] fix release error in process_group_custom (#55293) · 7a705727

Committed by ronnywang on Jul 12, 2023

* [CustomDevice] fix release error for process_group_custom

* update

[ONEDNN] Upgrade oneDNN version to v3.1 (#52463) · cfa513f7

Committed by YangQun on Jul 12, 2023

* squash pick the poc code
* fix build after rebase
* fix int8 conv and fc uts
* Fix and clean-up Get_SRC_Scale_Memory
* fix floating point fc uts
* fix test_analyzer_int8_googlenet
* test_analyzer_int8_mobilenetv1
* fix int8 mobilenet v2 and v3
* fix build error after rebase
* [oneDNN] rename library version
* fix conv bias datatype
* try to fix import error
* fix rebase error
* [oneDNN] pack library into python wheel
* add MKLDNN_SHARED_LIB_3 to env_dict
* fix test_analyzer_bert
* fix fill_constant op kernel
* fix ernie and matmul op ut
* fix softplus ut
* fix conv+relu6 fusion ut
* fix hardswish fusion
* fix quant+transpose fusion ut
* fix sgd ut
* fix int8 matmul with flatten
* fix fc+scale fusion
* fix conv/matmul+gelu fusion uts
* fix rebase error
* Revert "fix conv/matmul+gelu fusion uts"
This reverts commit 47eb5e49972bd8f7271a233def9bfb3e98ce78e1.
* upgrade to onednn v3.1
* remove older version onednn
* use densetensor::data() for achieving mean and var in layernorm impl
* comments for atol of integer tests
* fix clang-format
* Revert "remove older version onednn"
This reverts commit 783e57ddfd4401254596eae7d47adb9b03590c09.
* improve binary handle
* fix expand kernel
* Revert "use densetensor::data() for achieving mean and var in layernorm impl"
* always use forward_inference for conv
* remove activation scales
* rollback changes to mkldnn.cmake
* address comments
* port changes to dequantize kernel
* fix merge error
* fix fused_elementwise_kernel
* upgrade onednn version to v3.1.1
* fix some approval error
* fix error msg format
* remove old onednn libs
* try to fix symbolic link issue
* fix cinn test case segfault
* do not explicit link test with onednn
* remove unnecessary changes
* integrate CINN with onednn v3
* link with mkldnn project
* fix cinn build file

---------
Co-authored-by: Tomasz Socha <tomasz.socha@intel.com>
Co-authored-by: Chen, Xinyu1 <xinyu1.chen@intel.com>
Co-authored-by: tianshuo78520a <707759223@qq.com>


[clang-tidy] enable `readability-container-size-empty` check (#55279) · be3a6fa7

Committed by Wang Xin on Jul 12, 2023

* [clang-tidy] enable readability-container-size-empty check

* fix test_custom_kernel Failed

* add clang-tidy-10 in dockerfile

* add clang-tidy in dockerfile

* fix bug


Jul 07, 2023 · 2 commits
- [XPU] Add layernorm fuse pass (#55154) · eb12739e
  Committed by wz1qqx on Jul 07, 2023
- [CustomDevice] fix resource_pool release bug (#55229) · 6af85a81
  Committed by ronnywang on Jul 07, 2023

Jul 04, 2023 · 1 commit

[XPU] Add XPU plugin support (#55101) · 6d5d9f23

Committed by hong19860320 on Jul 04, 2023

* Add XPU plugin to support the customized ops or improve the performance of the fusion ops based on hand-written xpu micro kernels.

* refine README.md


Jul 03, 2023 · 1 commit
- [XPU] Fix the topk, set_value ops that using temporary tensors avoiding the... · cc2059a0
  Committed by jiangfan06 on Jul 03, 2023

  [XPU] Fix the topk, set_value ops that using temporary tensors avoiding the memory overlaps during multi-stream inference (#54851)

PaddlePaddle / Paddle synced successfully about 1 year ago
