Commits · d6038c22696e23dfc181643694e84f888e8001ae · BaiXuePrincess / Paddle

24 Feb, 2022 · 1 commit
- optimize performance of lookup_table_v2_op (#39856) · d6038c22
  Authored by Li Min on Feb 24, 2022
```
* optimize block config  and fp16 atomicAdd perf for lookup_table_v2_grad.
```
09 Dec, 2021 · 1 commit
- Refine CUDA atomicAdd for FP16 by CUDA primitive methods (#37895) · 033ebe7e
  Authored by sneaxiy on Dec 09, 2021
```
* fix cuda atomicAdd for FP16

* try to fix ci
```
03 Dec, 2021 · 1 commit
- refine structure for cuda and rocm (#37202) · a6d2fddb
  Authored by ronnywang on Dec 03, 2021
```
* refine structure for cuda and rocm

* update

* update

* update

* update
```
19 Nov, 2021 · 1 commit

Add paddle.incubate.graph_send_recv API (#37205) · 39012536

Authored by Siming Dai on Nov 19, 2021

* add cpu version, using set: sum, min, max

* add cpu version: mean

* improve cpu code and fix dynamic memory allcation problem

* fix arg error, add index judge, delete fp16

* fix bug in CudaAtomicMax and CudaAtomicMin

* add CUDA version

* fix grad_op bug for index

* add op test, add correct cpu grad op

* Add correct CUDA Mean grad

* [Add] Successful MEAN and SUM

* [Add] Successful MIN and MAX in CPU

* [Add] Successful MIN and MAX in CUDA

* fix windows dtype ci

* fix ROCM ci by adding HIP flag

* rename fused_gather_scatter to send_recv

* unify name as send and recv

* change zero index return time

* add send_recv incubate api

* fix index data type, add unittest case for API

* delete redundant input tensor

* fix en example and docs, add default value in pool_type

* add shape judge and max grid judge

* fix comment

* fix index type bug

* add const &

* fix en docs

* delete numpy in examples

* add unittest for int input

* fix send_recv comment

* change send_recv to graph_send_recv

01 Jun, 2021 · 1 commit

replace and remove complex64/128 types in custom OP and other files (#33195) · 06c63ca0

Authored by chentianyu03 on Jun 01, 2021

* replace and remove complex64/128 types in custom OP and other files

* fix custom_tensor_test fail bug

* fix custom_conj_test fail bug

* fix dispatch_test_op build fail bug

07 Apr, 2021 · 1 commit
- bugfix for unit test test_segment_ops (#32116) · d91faf29
  Authored by furnace on Apr 07, 2021
08 Feb, 2021 · 1 commit
- [ROCM] update fluid platform for rocm39 (part3), test=develop (#30913) · 93c1d9e7
  Authored by Qi Li on Feb 08, 2021
25 Dec, 2020 · 1 commit

[Complex] Add support for complex grad accumulated (#29889) · 1a304e6c

Authored by Chen Weihang on Dec 25, 2020

* add support for complex grad accumulated

* add unittest for coverage

* update test dtype

* remove useless blank line

26 Sep, 2020 · 1 commit
- fix cpplint error for the autmic max/min · a85592bc
  Authored by Zhong Hui on Sep 26, 2020
```
fix cpplint error for the autmic max/min
```
25 Sep, 2020 · 1 commit
- fix cuda atomic for ARCH<350 for the automic_max · 597345d1
  Authored by Zhong Hui on Sep 25, 2020
```
fix cuda atomic for ARCH<350 for the automic_max
```
24 Sep, 2020 · 1 commit
- Add GPU Kernels of Segment Ops, support, sum, max, min, mean · 4a9d21de
  Authored by Zhong Hui on Sep 24, 2020
```
Add GPU Kernels of Segment Ops,  support, sum, max, min, mean
```
31 Jul, 2018 · 1 commit
- Fix/float16 style (#12446) · 6d3da458
  Authored by dzhwinter on Jul 31, 2018
```
* "rewrite the test case"

* "follow comment"
```
30 Jul, 2018 · 1 commit
- float16 type support enhance (#12181) · 39ac9e39
  Authored by dzhwinter on Jul 30, 2018
```
* cherry picked

* "cherry picked platform"

* "add comment"

* "fix ci"
```
03 May, 2018 · 1 commit
- Fix __shfl_down_sync_ of cross_entropy (#10345) · 4fbde42c
  Authored by chengduo on May 03, 2018
```
* fix __shfl_down_sync_ of cross_entropy

* use reduceSum

* "fix ci"
```
02 May, 2018 · 2 commits
- replace __shfl with __shfl_sync · b8f7fa97
  Authored by chengduoZH on May 02, 2018
- fix shfl_sync for CUDA8.0 · 90d73c79
  Authored by chengduoZH on May 02, 2018
30 Apr, 2018 · 1 commit
- Feature/cuda9 cudnn7 (#10140) · eb6f9dd5
  Authored by dzhwinter on Apr 30, 2018
```
* "re-commit "

* "picked up"

* "fix ci"

* "fix pdb hang up issue in cuda 9"
```
10 Apr, 2018 · 2 commits
- Make cuda_helper.h Pass cpplint · 40e3fe17
  Authored by Yu Yang on Apr 10, 2018
- Move reduceSum to elementwise_op_function.h (#9773) · b1224da8
  Authored by chengduo on Apr 10, 2018
```
* add cuda_device_functions.h

* move reduceSum to elementwise_op_function.h
```
28 Feb, 2018 · 1 commit
- Add todo for reduceSum · 90dc33b5
  Authored by chengduoZH on Feb 28, 2018
26 Feb, 2018 · 1 commit
- refine Sum · b8938b44
  Authored by chengduoZH on Feb 24, 2018
24 Feb, 2018 · 1 commit
- follow comments · a8288392
  Authored by chengduoZH on Feb 24, 2018
12 Feb, 2018 · 1 commit
- Fix the grammar in copyright. (#8403) · 24509f4a
  Authored by qingqing01 on Feb 12, 2018
10 Feb, 2018 · 1 commit
- Move file to fluid/; Edit CMakeLists.txt · 90648f33
  Authored by Yi Wang on Feb 09, 2018
23 Nov, 2017 · 1 commit
- Feature/support int64 for sum (#5832) · c077a6d5
  Authored by Yu Yang on Nov 23, 2017
```
* Support int64 for sum op

* Refine code
```
18 Sep, 2017 · 1 commit
- Refine accuracy_op CUDA kernel (#4097) · 8580dce3
  Authored by 武毅 on Sep 18, 2017
```
* refind accuracy_op

* follow comments

* follow comments
```
23 Aug, 2017 · 1 commit
- Remove set functor and add comapre_grad test · f188e22b
  Authored by dangqingqing on Aug 23, 2017
22 Aug, 2017 · 2 commits
- fix cuda_helper.h · 9bc1a1a1
  Authored by dangqingqing on Aug 22, 2017
- lookup table op, cuda helper and set functor · 0f3b9e41
  Authored by dangqingqing on Aug 22, 2017
```
1. finish lookup table CPU and GPU kernel
2. Add some cuda helper
3. Add some math funtor
```

BaiXuePrincess / Paddle is in sync with its fork source project