1. 09 Aug 2023: 2 commits
  2. 08 Aug 2023: 1 commit
  3. 07 Aug 2023: 2 commits
    • [cherry-pick] Integration flash attention 2 (#56015) · cc9a7688
      umiswing committed
      * [FlashAttn] add flash randomness control (#52902)
      
      * add flash randomness control
      
      * fix VLOG undefined
      
      * [WIP] Integration flash attention 2 (#55758)
      
      * Works for fa-2 padded fwd; code to be cleaned.
      
      * Works for fa2 unpadded fwd.
      
      * Works for padded bwd; dk gets a small diff with np.random.seed(0).
      
      * Passes paddle's unit tests, except for returning softmax without dropout.
      
      * Clean code.
      
      * Modify interface.
      
      * Clean code and add some check.
      
      * Easy compile for dev.
      
      * Fix ci.
      
      * Fix ci-build.
      
      * Add std c++17 option again.
      
      * Limit max jobs when compiling fa2.
      
      * Remove const_cast
      
      * Add fwd params, to be cleaned.
      
      * Clean code.
      
      * Add bwd params.
      
      * Clean code.
      
      * Add enforce.
      
      * Use v2.0.4
      
      * Pass RNG state to fa2 capi
      
      * Fix review.
      
      * Add assert
      
      * Skip compile for sm less than 80.
      
      ---------
      Co-authored-by: Chitsing KUI <kuizhiqing@msn.com>
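      A rough usage sketch of the FlashAttention-2 path added by the cherry-pick above; the import path and signature (paddle.nn.functional.flash_attention.flash_attention with dropout/causal/return_softmax arguments returning an (out, softmax) pair) are assumptions about the Paddle API of this era, not code taken from the PR, and an SM80+ GPU is required.

      ```python
      import paddle
      from paddle.nn.functional.flash_attention import flash_attention

      # FlashAttention-2 runs on half-precision inputs of shape
      # [batch, seq_len, num_heads, head_dim]; the commit above skips
      # compilation for SM < 80, so an A100-class GPU is assumed.
      q = paddle.randn([2, 128, 8, 64]).astype("float16")
      k = paddle.randn([2, 128, 8, 64]).astype("float16")
      v = paddle.randn([2, 128, 8, 64]).astype("float16")

      # Dropout randomness is driven by Paddle's RNG state, which the
      # commit forwards to the fa2 C API ("Pass RNG state to fa2 capi").
      out, softmax = flash_attention(q, k, v, dropout=0.1, causal=True,
                                     return_softmax=False)
      print(out.shape)  # [2, 128, 8, 64]
      ```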
    • cherry-pick fused_rope from develop (#55931) · 8d3a9882
      niuliling123 committed
      * Add fused_rope forward op (#54351)
      
      * style
      
      * more
      
      * update ctest
      
      * Update legacy_backward.yaml
      
      * Update legacy_ops.yaml
      
      * Update legacy_ops.yaml
      
      * update
      
      * update
      
      * update for move
      
      * Update the rope op according to the comments (#54985)
      
      * Update multiary.cc
      
      * Update __init__.py
      
      * for int64_t and assert
      
      * more
      
      * remove useless assert first
      
      ---------
      Co-authored-by: sneaxiy <sneaxiy@126.com>
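      A minimal sketch of the fused rotary position embedding added by the cherry-pick above; the import path and signature (paddle.incubate.nn.functional.fused_rotary_position_embedding taking q/k/v and returning the rotated tensors) are assumptions and may differ in your Paddle build.

      ```python
      import paddle
      from paddle.incubate.nn.functional import fused_rotary_position_embedding

      # Rotary embedding is applied per head over the last dimension;
      # inputs follow the [batch, seq_len, num_heads, head_dim] convention.
      q = paddle.randn([2, 128, 8, 64]).astype("float16")
      k = paddle.randn([2, 128, 8, 64]).astype("float16")
      v = paddle.randn([2, 128, 8, 64]).astype("float16")

      out_q, out_k, out_v = fused_rotary_position_embedding(q, k, v)
      print(out_q.shape)  # [2, 128, 8, 64]
      ```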
  4. 18 Jul 2023: 1 commit
  5. 13 Jul 2023: 2 commits
  6. 29 Jun 2023: 1 commit
  7. 14 Jun 2023: 1 commit
    • support sharding stage1 (#54069) · 974676bc
      pangengzheng committed
      * support sharding stage1
      
      * fix unittest
      
      * format
      
      * pass sharded params_and_grads to inner_opt apply_optimize
      
      * change sharding gradient allreduce to reduce
      
      * support saving state_dict adaptively and support sharding with mp
      
      * fix sharding test
      
      * test set_state_dict
      
      * add more unit test
      
      * fix global norm of mp case
      
      * polish
      
      * hack the global-norm calculation to remove the diff between HybridParallelClipGrad and dp when computing global norm values
      
      * remove print
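      A rough sketch of enabling stage-1 sharding through fleet's hybrid configs, in the spirit of the commit above; the config keys and the toy training loop are illustrative assumptions, not code from this PR.

      ```python
      import paddle
      import paddle.distributed.fleet as fleet

      # Launch with: python -m paddle.distributed.launch --gpus "0,1" train.py
      strategy = fleet.DistributedStrategy()
      strategy.hybrid_configs = {
          "dp_degree": 1,
          "mp_degree": 1,
          "pp_degree": 1,
          "sharding_degree": 2,  # stage 1: optimizer states sharded across 2 ranks
      }
      fleet.init(is_collective=True, strategy=strategy)

      model = paddle.nn.Linear(1024, 1024)
      opt = paddle.optimizer.AdamW(parameters=model.parameters(),
                                   grad_clip=paddle.nn.ClipGradByGlobalNorm(1.0))

      # fleet wraps model and optimizer; gradients are reduced (not allreduced)
      # to the rank owning each shard, as the commit messages above describe.
      model = fleet.distributed_model(model)
      opt = fleet.distributed_optimizer(opt)

      x = paddle.randn([8, 1024])
      loss = model(x).mean()
      loss.backward()
      opt.step()
      opt.clear_grad()
      ```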
  8. 29 May 2023: 1 commit
  9. 26 May 2023: 1 commit
  10. 19 May 2023: 1 commit
  11. 15 May 2023: 3 commits
  12. 14 May 2023: 1 commit
  13. 12 May 2023: 1 commit
  14. 29 Apr 2023: 1 commit
  15. 26 Apr 2023: 2 commits
  16. 25 Apr 2023: 1 commit
  17. 21 Apr 2023: 1 commit
  18. 14 Apr 2023: 17 commits
    • [AMP OP&Test] Cumprod support fp16 and bf16 (#52919) · 8a850af6
      Zhang Zheng committed
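      A minimal check of the fp16 path added above, assuming a CUDA device with half-precision support:

      ```python
      import paddle

      # Cast to float16 and take the cumulative product along the last axis.
      x = paddle.rand([4, 8]).astype("float16")
      y = paddle.cumprod(x, dim=-1)
      print(y.dtype, y.shape)  # paddle.float16 [4, 8]
      ```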
    • 【Hackathon4 No58】logcumsum logsum (#51275) · 468869e4
      cyberslack_lee committed
    • 【Hackathon4 No58】kthvalue (#51615) · 43efb979
      cyberslack_lee committed
    • 【Hackathon No.62】Improve FP16/BF16 unit tests for the digamma and dirichlet operators (#52604) · 7ecbcc08
      chenxujun committed
      * Add digamma, dirichlet tests
      
      * Fix code
    • 【Hackathon No.55】add erf FP16 test and BF16 test (#52136) · eeb4d165
      superwinner1 committed
      * add erf FP16 test
    • Add angle, bmm tests (#52630) · 6d7ee668
      chenxujun committed
    • [Dcu]: Add rocsparse_spmm for dcu. (#52200) · 281ea2f4
      umiswing committed
    • [Zero-Dim] support 0-D tensor for reduce/reshape/stack/prelu/expand_v2/gaussian onednn kernels (#52185) · 6f41e177
      YangQun committed
      
      * support 0-D tensor for reduce/reshape/stack/prelu/expand_v2/gaussian ops
      
      * fix gaussian random mkldnn op ut
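      A small, generic illustration of the 0-D (scalar) tensor semantics these kernels now accept; it is not tied to the oneDNN backend.

      ```python
      import paddle

      x = paddle.to_tensor(2.0)   # 0-D tensor: shape []
      y = paddle.reshape(x, [1])  # reshape 0-D -> 1-D
      z = paddle.stack([x, x])    # stacking 0-D tensors yields a 1-D tensor
      print(x.shape, y.shape, z.shape)  # [] [1] [2]
      ```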
    • [Decouple enforce.h] Move LOG from enforce.h to enforce.cc (#52883) · b33f95b0
      HongyuJia committed
      * [Decouple enforce.h] Move LOG from enforce.h to enforce.cc
      
      * update cmake of device_context.cc, solve cuda_device_context_allocator.h compile error
      
      * add namespace inside macro
    • Modify set_value op, use Scalars to represent attr `values` (#52408) · dd2a749a
      Feiyu Chan committed
      1. Modify set_value op to use Scalars to represent the attr `values`, instead of a bunch of attributes of various types.
      
      2. Add a program converter, with set_value op as an example; it provides the functionality to convert `paddle::framework::ProgramDesc` between the old and new formats (the differences are mainly some operators with incompatible updates to their definitions).
      3. The program version and operator version map are now always saved when serializing `paddle::framework::ProgramDesc`, so the version can be identified.
      4. Provide an option `legacy_format=false` in the serialization of `paddle::framework::ProgramDesc`; it decides whether to convert the ProgramDesc back to the legacy format, which paddle 2.4.2 and earlier versions can load and execute.
      5. Deserialization of `paddle::framework::ProgramDesc` now automatically detects whether the bytes it receives are in the legacy format (i.e. contain any operator that has been incompatibly updated and has an attribute of type `Scalar`) and converts them to the new format. If you want a faithful deserialization without the automatic conversion, use protobuf's deserialization instead; it is not recommended, but it can be used for testing.
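      For context, slice assignment in a static-graph program is what lowers to the set_value op whose `values` attribute this commit reworks into Scalars. A minimal sketch (the exact op list printed depends on the Paddle version):

      ```python
      import paddle

      paddle.enable_static()
      main = paddle.static.Program()
      with paddle.static.program_guard(main):
          x = paddle.zeros([4, 4], dtype="float32")
          # Slice assignment with a Python scalar is lowered to a set_value op;
          # after #52408 the assigned values travel as Scalar attributes.
          x[1:3, :] = 1.5

      print([op.type for op in main.global_block().ops])
      # e.g. ['fill_constant', 'set_value']
      ```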
    • [phi] move sequence_pool to phi - Step 2: sequence_pool_op (#52750) · b281b221
      gouzil committed
      * [phi] move sequence_pool kernel to phi
      
      * [phi] mv sequence_pooling to phi funcs
      
      * [phi] mv sequence_pooling_test
      
      * [phi] RollBACK `paddle/fluid/operators/sequence_ops/sequence_pool_op.cc`
      
      * [phi][funcs] fix mutable_data
      
      * [phi][funcs] fix mutable_data
    • Move fused_attention op to phi [migrate backward GPU OpKernel] (#51909) · 3bac6264
      Sonder committed
      * add kernel functions
      
      * update kernel functions
      
      * update func parameters' name
      
      * create codes for gpu device
      
      * Adjust file locations
      
      * fix include error
      
      * remove dependent files to phi/
      
      * restore fused_attention_op.cu
      
      * fix dependence errors
      
      * fix dependence errors
      
      * fix include error
      
      * fix all dependence errors [build success]
      
      * remove useless include
      
      * recover useless include
      
      * use phi::ToNCCLDataType
      
      * fix namespace
      
      * update new register code
      
      * fix error in fused_gemm_epilogue_utils
      
      * fix error in FusedAttentionKernel params
      
      * finish fused_attention register code [build success]
      
      * add paddle::optional
      
      * add sig file
      
      * fix build error
      
      * fix a include error
      
      * Restore the forward code
      
      * update CMakeList
      
      * trans Compute function to phi [build success]
      
      * add register code and fix include error [build success]
      
      * fix parameter sequence
      
      * add include file
      
      * update #if before include
      
      * update #if before include
      
      * fix grammar error
      
      * update codes for DropoutParam
      
      * remove const cast
      
      * trans some fluid api to phi api
      
      * remove const cast
      
      * trans some fluid api to phi api
      
      * add #if
      
      * update test code
      
      * update test codes
      
      * recover test codes
      
      * fix namespace and remove fluid include
      
      * recover random seed
      
      * remove fluid quant_helper
      
      * fix include error
      
      * include utils in funcs
      
      * change include file
      
      * move grad codes back to fluid folder
      
      * move grad codes back to fluid folder
      
      * fix sig file error
      
      * update include
      
      * recover codes to develop
      
      * update register codes
      
      * fix build error
      
      * recover fluid include
      
      * remove some fluid include
      
      * remove some fluid include
      
      * Update fused_attention_op.cu
      
      * remove fluid include
      
      * add some fluid include
      
      * Update fused_attention_op.cu
      
      * Update fused_attention_op.cu
      
      * Update fused_attention_op.cu
      
      * Update fused_attention_op.cu
      
      * remove useless include
    • fix some [-Wunused-function] and [-Wunused-function] warnings (#52868) · ab163063
      Galaxy1458 committed
      * test,test=develop
      
      * test,test=develop
      
      * test,test=develop
    • add backend config to select kernel (#52907) · 1ab7e77a
      lzydev committed
    • fix win cu116 compile error (#52894) · 60ba559a
      sneaxiy committed
    • update (#52875) · ce6978c6
      huangjiyi committed
    • 54e4360a