Commits · 1882c49630678e3f89aaad0691e18b198b11abb5 · Crayon鑫 / Paddle

11 Mar, 2022 1 commit
- [hybrid] Support tensor parallel and cache structure for fused attention op. (#40101) · 1882c496
  Committed by Yuang Liu on Mar 11, 2022
  
  1882c496
01 Mar, 2022 1 commit
- Optimize the CUDA kernel in DistributedFusedLamb optimizer (#39972) · d17961ed
  Committed by sneaxiy on Mar 01, 2022
```
* vectorize lamb kernel

* remove flags, add ut

* remove useless codes

* refine code, add param order
```
  d17961ed
25 Feb, 2022 1 commit

Add MultiTensorApply to calculate L2-Norm in DistributedFusedLamb optimizer (#39900) · d32a0102

Committed by sneaxiy on Feb 25, 2022

* add multi tensor apply l2 norm

* add multi_tensor_apply code

* make sizeof(TensorMeta) smaller

* move code to distributed_fused_lamb_op.cu

* remove useless FLAGS

d32a0102

24 Feb, 2022 1 commit
- fix 'invalid escape sequence' (#39842) · 4e26fa57
  Committed by Leo Chen on Feb 24, 2022
```
* fix 'invalid escape sequence'

* fix assert error
```
  4e26fa57
19 Feb, 2022 1 commit

Add the DistributedFusedLamb optimizer (#39148) · 5df3cd61

Committed by sneaxiy on Feb 19, 2022

* add DistributedFusedLamb op

* polish code

* fix compile error

* make compatible with pten changes

* fix rocm compile error

* improve coverage

* update upstream/develop

* fix cast_with_ptr.h

* add FLAGS_distributed_lamb_divide_nranks_when_allreduce=1

* fix clip before allreduce

* add use_master_param_norm

* code polish

* fix bug

* fix ROCM ci

5df3cd61

28 Jan, 2022 1 commit
- recovery code (#39287) · 45f9c9eb
  Committed by zhangkaihuo on Jan 28, 2022
  
  45f9c9eb
27 Jan, 2022 2 commits

Add Khop Graph Sampler API (#39146) · 35f949b5

Committed by Siming Dai on Jan 27, 2022

* add the test case for the UVA

* add the context load for the uva

* Add graph_sample kernel

* Add graph_sample commit

* add new commit for graph_sample

* add unsigned long long int

* delete some remarks

* add cpu version

* add cuda eids

* add cpu eids

* delete _uva

* optimize speed: emplace_back, last_layer

* add to_uva_tensor

* add cpu return_eids choice

* add gpu return_eids choice

* add cpu reindex_nodes

* add gpu reindex_nodes

* rename op and add OMP for cpu

* add incubate api

* fix the compile problem for PADDLE_ENFORCE and different devices

* fix the rocm and windows compile problem

* add unittest for graph_sample_neighbors

* fix cpu unittest and unique problem

* fix uva unittest, fix cuda unique problem

* fix the windows compile problem

* fix the windows rand_r compile problem

* add correct unittest, add src_eids dispensable

* delete black

* combine uva unittest

* mv Sample_index to Sample_Index; check input shape; fix random sample func

* delete memset & cudaMemset

* fix according to PR comments

* fix rocm ci

* modify function names according to the specification

* fix windows_openblas ci

* refine annotations, fix windows unittest, add default value for uva device_id, fix bug for input nodes with empty neighbors

* fix rocm ci

* rename graph_sample_neighbors as graph_khop_sampler, add incubate api doc

* add data type

* fix conflict

Co-authored-by: Nwawltor <fangzeyang0904@hotmail.com>

35f949b5

Add SparseCooTensor and SparseCsrTensor (#38906) · a7edb3f3

Committed by zhangkaihuo on Jan 27, 2022

* fix bug:
1. atten: set the default value of attn_dropout_rate to None
2. ffn: add activation parameter

* for pure fp16

* Add a SparseCsrTensor

* remove unused functional

* remove const

* remove SetMemoberTensor

* remove non_zero_nums_, the number of non zero elements of each batch can be obtained from the crows

* SparseCooTensor

* add SetMember

* merge upstream; add SetMember

* merge upstream

* merge upstream; add newline at end of file

* add newline at end of file

* remove newline at end of file

* remove newline at end of file

* stash

* use pten::framework::make_ddim

* use pten::framework::make_ddim

* merge upstream; use the latest mutable_data

* merge upstream; use the latest mutable_data

* return mutable dense tensor

a7edb3f3
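
The commit note above that `non_zero_nums_` is redundant because the per-batch non-zero count "can be obtained from the crows" can be illustrated with a small sketch. It assumes that a batched CSR tensor stores one compressed row-pointer segment of `rows + 1` entries per batch, concatenated in `crows`; that layout and the helper name `nnz_per_batch` are assumptions of this example, not code taken from the commit.

```python
def nnz_per_batch(crows, rows):
    """Return the non-zero count of each batch from concatenated CSR row pointers.

    Assumption of this sketch: every batch contributes exactly rows + 1 pointers.
    """
    segment = rows + 1
    if rows < 0 or len(crows) % segment != 0:
        raise ValueError("crows length must be a multiple of rows + 1")
    counts = []
    for start in range(0, len(crows), segment):
        # Last pointer minus first pointer = number of stored values in this batch.
        counts.append(crows[start + rows] - crows[start])
    return counts


# Two batches of 2-row matrices: the first stores 3 values, the second stores 1.
crows = [0, 2, 3, 0, 1, 1]
print(nnz_per_batch(crows, rows=2))  # [3, 1]
```

Because the count is derivable this way, keeping a separate member for it would only duplicate state that has to be kept in sync.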

22 Dec, 2021 1 commit
- Replaced core.ops with _C_ops (#38337) · 242ef2b9
  Committed by Zhanlue Yang on Dec 22, 2021
  
  242ef2b9
26 Nov, 2021 1 commit
- Fix bugs when bias add none in static graph for fused_attention op. (#37566) · 097e098d
  Committed by Li Min on Nov 26, 2021
```
* Fix bugs when bias is none for static graph for fused_attention op.
```
  097e098d
23 Nov, 2021 1 commit
- Add support bias is none for fused_attention op. (#37411) · 1a8786cf
  Committed by Li Min on Nov 23, 2021
```
Add support for bias is none for fused_attention op.
```
  1a8786cf
19 Nov, 2021 2 commits

Add fuse_resnet_unit pass (#36818) · 3cd3bf29

Committed by wuhuanzhou on Nov 19, 2021

* GeneratePass support attr condition and mapping, test=develop

* fix coverage, test=develop

* Add fuse_resnet_unit pass, test=develop

* fix CI errors, test=develop

* fix CI errors, test=develop

* fix unittest error when compiling without CUDA, test=develop

* fix static ci error, test=develop

* limit kernel size must equal 1, test=develop

3cd3bf29

Add paddle.incubate.graph_send_recv API (#37205) · 39012536

Committed by Siming Dai on Nov 19, 2021

* add cpu version, using set: sum, min, max

* add cpu version: mean

* improve cpu code and fix dynamic memory allocation problem

* fix arg error, add index judge, delete fp16

* fix bug in CudaAtomicMax and CudaAtomicMin

* add CUDA version

* fix grad_op bug for index

* add op test, add correct cpu grad op

* Add correct CUDA Mean grad

* [Add] Successful MEAN and SUM

* [Add] Successful MIN and MAX in CPU

* [Add] Successful MIN and MAX in CUDA

* fix windows dtype ci

* fix ROCM ci by adding HIP flag

* rename fused_gather_scatter to send_recv

* unify name as send and recv

* change zero index return time

* add send_recv incubate api

* fix index data type, add unittest case for API

* delete redundant input tensor

* fix en example and docs, add default value in pool_type

* add shape judge and max grid judge

* fix comment

* fix index type bug

* add const &

* fix en docs

* delete numpy in examples

* add unittest for int input

* fix send_recv comment

* change send_recv to graph_send_recv

39012536

16 Nov, 2021 1 commit

Fix attn_bias_add bug. (#37147) · a9e7a854

Committed by Li Min on Nov 16, 2021

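The fused_attention_op implementation uses bias_add, which is built on the kernel primitives. The kernel primitive WriteData API and its internal implementation were later changed so that the out-of-bounds check moved into a template parameter. As a result the call took the wrong branch, performed out-of-bounds writes, and corrupted the contents of other GPU memory. Symptom: a single run of test_fused_attention_op_api.py usually does not fail, but running it repeatedly in a loop with inputs of different shapes produces incorrect results. The failure is intermittent, which makes the bug hard to notice.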

a9e7a854
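
The failure mode described in this commit (a boundary check selected by a flag, and a caller that picks the unchecked branch for a partial tile) can be mimicked in plain Python. This is only a simulation on a flat list standing in for device memory; `TILE`, `write_tile`, `store`, and `run` are hypothetical names for this sketch and do not reproduce the signatures of Paddle's kernel primitives.

```python
TILE = 4  # hypothetical number of elements written per tile


def write_tile(memory, offset, tile, num_valid, is_boundary):
    """Copy one tile into flat memory.

    With is_boundary=True only the first num_valid elements are written.
    With is_boundary=False the whole tile is written with no bounds check.
    """
    count = num_valid if is_boundary else len(tile)
    for i in range(count):
        memory[offset + i] = tile[i]


def store(memory, base, data, check_last_tile):
    """Write data tile by tile; the last tile may be only partially valid."""
    for start in range(0, len(data), TILE):
        chunk = data[start:start + TILE]
        num_valid = len(chunk)
        # Lanes past num_valid hold garbage, modeled here as -1.0.
        tile = chunk + [-1.0] * (TILE - num_valid)
        is_boundary = check_last_tile and num_valid < TILE
        write_tile(memory, base + start, tile, num_valid, is_boundary)


def run(check_last_tile):
    # Tensor A occupies [0, 6) and an unrelated tensor B occupies [6, 12).
    memory = [0.0] * 6 + [7.0] * 6
    store(memory, 0, [1.0] * 6, check_last_tile)
    return memory[:6], memory[6:]


print(run(check_last_tile=True))   # B is untouched
print(run(check_last_tile=False))  # the first two elements of B are overwritten
```

In this model the output tensor itself always looks correct; only the neighboring buffer is damaged, and only when the length is not a multiple of the tile size. That is consistent with the shape-dependent, intermittent failures reported above.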

12 Nov, 2021 1 commit
- [fix]fix the bug of fused_attention and fused_feedforward (#36972) · 6486e242
  Committed by zhangkaihuo on Nov 12, 2021
```
* fix bug:
1. atten: set the default value of attn_dropout_rate to None
2. ffn: add activation parameter
```
  6486e242
28 Oct, 2021 1 commit
- [fix-doc-bug] Fix fused_attention_op english doc test=document_fix (#36803) · 11c2874e
  Committed by Li Min on Oct 28, 2021
```
* Fix fused_attention english doc test=document_fix
```
  11c2874e
27 Oct, 2021 1 commit

Fused transformer encoder layer and fused feedforward layer (#36604) · 9f3613f3

Committed by zhangkaihuo on Oct 27, 2021

This PR adds the layer-level code for fused_transformer, covering the FusedFeedForward layer and the FusedTransformerEncoderLayer.

9f3613f3

26 Oct, 2021 2 commits

Add fused attention op backward and python layer. (#36498) · 5119428e

Committed by Li Min on Oct 26, 2021

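Purpose: this PR aims to improve the computational performance of the attention module.
To reduce the framework's op scheduling overhead, it implements the attention module by hand in C++ and exposes it as a single large attention op.
To reduce memory access overhead, it applies two optimizations:
(1) the q, k, v computation shares the input X, so the gemm, transpose, and bias add at that point are called once instead of three times;
(2) kernel fusion is used so that data is passed between different CUDA kernels through registers.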

5119428e
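
Optimization (1) in the description above rests on a simple identity: because q, k, and v are all projections of the same input X, their weight matrices can be concatenated column-wise and applied in one matrix multiply. The pure-Python sketch below checks that identity on toy data. The helper names, shapes, and values are made up for illustration and are not taken from the fused_attention op.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_bias(m, bias):
    """Add a bias vector to every row of a matrix."""
    return [[v + b for v, b in zip(row, bias)] for row in m]


# Toy sizes: 2 tokens, hidden size 3, projection size 2.
x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
w_q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_k = [[2.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
w_v = [[0.0, 1.0], [1.0, 0.0], [1.0, 2.0]]
b_q, b_k, b_v = [0.5, 0.5], [1.0, -1.0], [0.0, 2.0]

# Unfused: three GEMMs and three bias adds.
q = add_bias(matmul(x, w_q), b_q)
k = add_bias(matmul(x, w_k), b_k)
v = add_bias(matmul(x, w_v), b_v)

# Fused: concatenate weights and biases, then run one GEMM and one bias add.
w_qkv = [rq + rk + rv for rq, rk, rv in zip(w_q, w_k, w_v)]
b_qkv = b_q + b_k + b_v
qkv = add_bias(matmul(x, w_qkv), b_qkv)

# Slice the fused result back into q, k, v.
d = len(b_q)
q_fused = [row[:d] for row in qkv]
k_fused = [row[d:2 * d] for row in qkv]
v_fused = [row[2 * d:] for row in qkv]

print(q_fused == q and k_fused == k and v_fused == v)  # True
```

The arithmetic is identical in both paths; the saving comes from launching one larger operation and reading X once instead of three times.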

Move fused_attention and fused_feedforward functional api path to incubate (#36704) · 9aeca2f1
Committed by Li Min on Oct 26, 2021
```
Move the Python API interfaces added in PRs #35905 and #35843 to the incubate directory.
```
9aeca2f1

17 Oct, 2021 1 commit
- Revert "fix the initializer of resnet unit op (#36483)" (#36487) · 314cc495
  Committed by Zeng Jinle on Oct 17, 2021
```
This reverts commit 0452f27c.
```
  314cc495
16 Oct, 2021 1 commit
- fix the initializer of resnet unit op (#36483) · 0452f27c
  Committed by Zhang Zheng on Oct 16, 2021
```
* fix the initializer of resnet unit op

* fix the initializer of resnet unit op
```
  0452f27c
15 Oct, 2021 1 commit
- Add ResNetUnit Python API (#35426) · 12882b2f
  Committed by Zhang Zheng on Oct 15, 2021
  
  12882b2f
26 Sep, 2021 1 commit
- add doc for two softmax fuse api, test=document_fix (#35943) · 97922557
  Committed by Yuang Liu on Sep 26, 2021
  
  97922557
17 Sep, 2021 1 commit
- Fix segment api document. (#35818) · 6d5fc220
  Committed by Zhong Hui on Sep 17, 2021
  
  6d5fc220
16 Sep, 2021 1 commit
- Add segment apis to paddle.incubate (#35759) · 4b683887
  Committed by Zhong Hui on Sep 16, 2021
  
  4b683887
16 Jul, 2021 1 commit
- softmax mask fuse op, test=develop (#33841) · 44bdbe93
  Committed by Yuang Liu on Jul 16, 2021
  
  44bdbe93
15 Jul, 2021 1 commit
- cache core.ops (#34058) · f05098b5
  Committed by wanghuancoder on Jul 15, 2021
```
* cache core.ops, test=develop

* refine, test=develop
```
  f05098b5
14 Jul, 2021 1 commit
- rename the fuse op, test=allcase (#34120) · 6febe5fe
  Committed by Yuang Liu on Jul 14, 2021
  
  6febe5fe
12 Jul, 2021 1 commit
- softmax mask fuse upper triangle (#33981) · e2e1c57b
  Committed by Yuang Liu on Jul 12, 2021
```
* softmax mask fuse upper triangle

* cover not implemented cpu code
```
  e2e1c57b
11 Jun, 2021 1 commit
- update 2.0 public api in all left files (#33313) · 022198c5
  Committed by zhiboniu on Jun 11, 2021
```
* update 2.0 public api in all left files

* reverse device.py all list;
fix some flake8 errors
```
  022198c5
22 Apr, 2021 1 commit
- Delete WITH_GRPC flag and Distributed old code (#32383) · e58c705b
  Committed by tianshuo78520a on Apr 22, 2021
  
  e58c705b
21 Apr, 2021 1 commit
- remove fluid for auto_checkpoint. (#32157) · 1593ee25
  Committed by xiemoyuan on Apr 21, 2021
```
* remove fluid for auto_checkpoint.

* fix bug.
```
  1593ee25
30 Mar, 2021 1 commit
- [Custom OP]Remove old custom OP and reduce whl package volume (#31813) · 04a49b09
  Committed by Zhou Wei on Mar 30, 2021
```
* Remove old custom OP to reduce whl package volume

* [Custom OP]Remove old custom OP to reduce whl package volume
```
  04a49b09
25 Jan, 2021 1 commit
- test=develop, fix test_lookahead (#30677) · 06a3e311
  Committed by 123malin on Jan 25, 2021
```
* test=develop, fix test_lookahead
```
  06a3e311
13 Jan, 2021 1 commit
- move 'load_op_library','LayerHelper' to 'paddle/incubate' (#30339) · 5ff4f1ad
  Committed by WeiXin on Jan 13, 2021
  
  5ff4f1ad
07 Jan, 2021 1 commit
- Add Lookahead and ModelAverage Optimizer (#30004) · 198fbdfb
  Committed by 123malin on Jan 07, 2021
```
* test=develop, add model_average and lookahead
```
  198fbdfb
08 Dec, 2020 1 commit
- remove complex module direction (#29419) · acce9621
  Committed by chentianyu03 on Dec 08, 2020
  
  acce9621
28 Oct, 2020 1 commit

add + - * / @ [] operator to ComplexVariable (#28217) · 6cebd714

Committed by chentianyu03 on Oct 28, 2020

* add + - * / @ [] operator to ComplexVariable, also add unittest

* fix circular reference bug

* fit for py2.7

* remove reverse operators that are not supported now

6cebd714

12 Oct, 2020 1 commit

refine adam/strided_slice && fix doc for rmsprop/unstack (#27740) · 84d8e49d

Committed by MRXLT on Oct 12, 2020

* refine parameters order && doc

* update rmsprop doc

* refine adam/transpose/unstack/stride_slice

* fix bug && doc

* fix doc

* bug fix

* bug fix

* fix doc

* fix doc

* fix doc

* fix doc

* deprecate old strided_slice

* update doc

* set default value for name

* update doc

84d8e49d

31 Aug, 2020 1 commit

Move hapi to python/paddle root dir. (#26442) · f7fb4c22

Committed by qingqing01 on Aug 31, 2020

* Move hapi from paddle/incubate to paddle

* Remove vision/datasets/utils.py and clean code

* Add sample code for conll05

* Print pull path when saving model

* Fix sample code after parameter_list of SGD is changed to parameters

* Fix bug in wmt16 dataset

f7fb4c22

Crayon鑫 / Paddle is up to date with the fork source project
