Commits · a5ca2672ba240ce475759c0b30a90af1ee01f6fa · PaddlePaddle / Paddle

29 Mar, 2023 1 commit
- Fix the type conflicts against the openblas (#52187) · a5ca2672
  Authored by chenxujun on Mar 29, 2023
18 Nov, 2022 1 commit
- [PHI decoupling] remove "gpu_primitives.h" in fluid (#48063) · 9918bf9c
  Authored by Wang Xin on Nov 18, 2022
```
* remove "gpu_primitives.h" in fluid namespace

* fix PR-CI-GpuPS fail

* fix PR-CI-GpuPS fail
```
17 Nov, 2022 1 commit

Add vectorized bfloat16 atomicAdd (#48056) · ccbd03d5

Authored by sneaxiy on Nov 17, 2022

* add vectorized bfloat16 atomicAdd

* fix compile error

* fix compile error again

* fix V100 compile error

* fix V100 compile again

16 Nov, 2022 1 commit
- move "gpu_primitives.h" to phi (#48015) · 9adca1e7
  Authored by Wang Xin on Nov 16, 2022
30 Sep, 2022 1 commit

support pure bfloat16 for more ops (#46364) · b7b231a6

Authored by sneaxiy on Sep 30, 2022

* support pure bfloat16

* support bf16 linear

* update PR to pass CI

* tiny fix where_grad_kernel.cu

* add bfloat16 to selu_grad to pass CI

* fix selu grad compilation error

12 Aug, 2022 1 commit

[geometric]Add paddle.geometric.send_ue_recv API (#43174) · 615b15a3

Authored by Siming Dai on Aug 12, 2022

* add init file

* add op definition and infermeta

* add kernel definition funcs

* add broadcast infer shape

* add gpu forward kernel

* delete SUB and DIV

* add x_grad

* add template

* add e_grad for min and max

* fix small bug

* temp commit

* temp commit

* add e_grad for sum and mean

* fix some compile bug

* fix compile bugs

* fix compile problem

* add sum forward unittest

* fix broadcast error, add kernel sig, register e_grad, change unit test

* fix grad

* add temp grad fix

* temp commit

* add min max unittest

* add max, min unittest, fix mul bug

* add cpu forward sum and mean

* add forward min max, fix mean unittest

* add cpu backward min max

* fix code-style

* add backward sum mean

* fix rocm ci

* set unittest timeout

* fix bug of x broadcast to e, gpu grad

* fix bug of x broadcast to e, cpu grad

* rename BOOST_GET_CONST macro

* fix rocm ci

* mv graph_send_e_recv to graph_send_ue_recv

* move out_size to IntArray

* add eager op test

* fix max pool type bug, add unittest for api

* revise api doc

* add fp16 for atomic min and max, add unittest

* add unittest

* add fp16 support for graph_send_recv

* fix unittest fp16 bug

* change OutSizeTensor to Out_size

* move E to Y

* add copyright, fix comment

* review code

* fix thread block size

* fix thread block size

* change api attribute name: pool_type to reduce_op, compute_type to message_op

* change api attribute name, move pool_type to reduce_op, move compute_type to message_op

26 Jun, 2022 1 commit
- format all files in fluid using new config (#43776) · 576236a0
  Authored by Sing_chan on Jun 26, 2022
05 Jun, 2022 1 commit
- 【code format check upgrade】 step2：clang-format (#42840) · a3730dc8
  Authored by Sing_chan on Jun 05, 2022
01 Mar, 2022 1 commit

[bf16] add bf16 kernel: scale gather sum (#39683) · 6d26b332

Authored by zhangbo9674 on Mar 01, 2022

* add scale gather sum

* refine CUDA_ATOMIC_WRAPPER ADD for bf16

* add gather unittest

* solve conflict

* add scale unittest

* add sum unittest

* solve conflict

* refine gather unittest

* refine unittest

25 Feb, 2022 1 commit
- [Fix bug] fix fp16 atomicAdd compiler error on different cuda_arch. (#39886) · ef96ffb6
  Authored by Li Min on Feb 25, 2022
```
* Fix compile error on cuda_arch less than 700.
```
24 Feb, 2022 1 commit
- optimize performance of lookup_table_v2_op (#39856) · d6038c22
  Authored by Li Min on Feb 24, 2022
```
* optimize block config and fp16 atomicAdd perf for lookup_table_v2_grad.
```
09 Dec, 2021 1 commit
- Refine CUDA atomicAdd for FP16 by CUDA primitive methods (#37895) · 033ebe7e
  Authored by sneaxiy on Dec 09, 2021
```
* fix cuda atomicAdd for FP16

* try to fix ci
```
03 Dec, 2021 1 commit
- refine structure for cuda and rocm (#37202) · a6d2fddb
  Authored by ronnywang on Dec 03, 2021
```
* refine structure for cuda and rocm

* update

* update

* update

* update
```
19 Nov, 2021 1 commit

Add paddle.incubate.graph_send_recv API (#37205) · 39012536

Authored by Siming Dai on Nov 19, 2021

* add cpu version, using set: sum, min, max

* add cpu version: mean

* improve cpu code and fix dynamic memory allocation problem

* fix arg error, add index judge, delete fp16

* fix bug in CudaAtomicMax and CudaAtomicMin

* add CUDA version

* fix grad_op bug for index

* add op test, add correct cpu grad op

* Add correct CUDA Mean grad

* [Add] Successful MEAN and SUM

* [Add] Successful MIN and MAX in CPU

* [Add] Successful MIN and MAX in CUDA

* fix windows dtype ci

* fix ROCM ci by adding HIP flag

* rename fused_gather_scatter to send_recv

* unify name as send and recv

* change zero index return time

* add send_recv incubate api

* fix index data type, add unittest case for API

* delete redundant input tensor

* fix en example and docs, add default value in pool_type

* add shape judge and max grid judge

* fix comment

* fix index type bug

* add const &

* fix en docs

* delete numpy in examples

* add unittest for int input

* fix send_recv comment

* change send_recv to graph_send_recv

01 Jun, 2021 1 commit

replace and remove complex64/128 types in custom OP and other files (#33195) · 06c63ca0

Authored by chentianyu03 on Jun 01, 2021

* replace and remove complex64/128 types in custom OP and other files

* fix custom_tensor_test fail bug

* fix custom_conj_test fail bug

* fix dispatch_test_op build fail bug

07 Apr, 2021 1 commit
- bugfix for unit test test_segment_ops (#32116) · d91faf29
  Authored by furnace on Apr 07, 2021
08 Feb, 2021 1 commit
- [ROCM] update fluid platform for rocm39 (part3), test=develop (#30913) · 93c1d9e7
  Authored by Qi Li on Feb 08, 2021
25 Dec, 2020 1 commit

[Complex] Add support for complex grad accumulated (#29889) · 1a304e6c

Authored by Chen Weihang on Dec 25, 2020

* add support for complex grad accumulated

* add unittest for coverage

* update test dtype

* remove useless blank line

26 Sep, 2020 1 commit
- fix cpplint error for the atomic max/min · a85592bc
  Authored by Zhong Hui on Sep 26, 2020
```
fix cpplint error for the atomic max/min
```
25 Sep, 2020 1 commit
- fix cuda atomic for ARCH<350 for the atomic_max · 597345d1
  Authored by Zhong Hui on Sep 25, 2020
```
fix cuda atomic for ARCH<350 for the atomic_max
```
24 Sep, 2020 1 commit
- Add GPU Kernels of Segment Ops, support sum, max, min, mean · 4a9d21de
  Authored by Zhong Hui on Sep 24, 2020
```
Add GPU Kernels of Segment Ops, support sum, max, min, mean
```
31 Jul, 2018 1 commit
- Fix/float16 style (#12446) · 6d3da458
  Authored by dzhwinter on Jul 31, 2018
```
* "rewrite the test case"

* "follow comment"
```
30 Jul, 2018 1 commit
- float16 type support enhance (#12181) · 39ac9e39
  Authored by dzhwinter on Jul 30, 2018
```
* cherry picked

* "cherry picked platform"

* "add comment"

* "fix ci"
```
03 May, 2018 1 commit
- Fix __shfl_down_sync_ of cross_entropy (#10345) · 4fbde42c
  Authored by chengduo on May 03, 2018
```
* fix __shfl_down_sync_ of cross_entropy

* use reduceSum

* "fix ci"
```
02 May, 2018 2 commits
- replace __shfl with __shfl_sync · b8f7fa97
  Authored by chengduoZH on May 02, 2018
- fix shfl_sync for CUDA8.0 · 90d73c79
  Authored by chengduoZH on May 02, 2018
30 Apr, 2018 1 commit
- Feature/cuda9 cudnn7 (#10140) · eb6f9dd5
  Authored by dzhwinter on Apr 30, 2018
```
* "re-commit "

* "picked up"

* "fix ci"

* "fix pdb hang up issue in cuda 9"
```
10 Apr, 2018 2 commits
- Make cuda_helper.h Pass cpplint · 40e3fe17
  Authored by Yu Yang on Apr 10, 2018
- Move reduceSum to elementwise_op_function.h (#9773) · b1224da8
  Authored by chengduo on Apr 10, 2018
```
* add cuda_device_functions.h

* move reduceSum to elementwise_op_function.h
```
28 Feb, 2018 1 commit
- Add todo for reduceSum · 90dc33b5
  Authored by chengduoZH on Feb 28, 2018
26 Feb, 2018 1 commit
- refine Sum · b8938b44
  Authored by chengduoZH on Feb 24, 2018
24 Feb, 2018 1 commit
- follow comments · a8288392
  Authored by chengduoZH on Feb 24, 2018
12 Feb, 2018 1 commit
- Fix the grammar in copyright. (#8403) · 24509f4a
  Authored by qingqing01 on Feb 12, 2018
10 Feb, 2018 1 commit
- Move file to fluid/; Edit CMakeLists.txt · 90648f33
  Authored by Yi Wang on Feb 09, 2018
23 Nov, 2017 1 commit
- Feature/support int64 for sum (#5832) · c077a6d5
  Authored by Yu Yang on Nov 23, 2017
```
* Support int64 for sum op

* Refine code
```
18 Sep, 2017 1 commit
- Refine accuracy_op CUDA kernel (#4097) · 8580dce3
  Authored by 武毅 on Sep 18, 2017
```
* refine accuracy_op

* follow comments

* follow comments
```
23 Aug, 2017 1 commit
- Remove set functor and add compare_grad test · f188e22b
  Authored by dangqingqing on Aug 23, 2017
22 Aug, 2017 2 commits
- fix cuda_helper.h · 9bc1a1a1
  Authored by dangqingqing on Aug 22, 2017
- lookup table op, cuda helper and set functor · 0f3b9e41
  Authored by dangqingqing on Aug 22, 2017
```
1. finish lookup table CPU and GPU kernel
2. Add some cuda helper
3. Add some math functor
```

PaddlePaddle / Paddle: last synced successfully about 1 year ago
