Commits · da7d2f297ce15af23307d233ef8cfc479677a2c6 · BaiXuePrincess / Paddle

20 Oct, 2022 1 commit
- [Cherry-pick][Release/2.4] support pure bfloat16 for more ops · da7d2f29
  Committed by sneaxiy on Oct 20, 2022
```
support pure bfloat16 for more ops
```
  da7d2f29
17 Oct, 2022 2 commits

Optimize performance of depthwise_conv (#46896) · 976af0da

Committed by Zhang Zheng on Oct 17, 2022

Optimize performance of depthwise_conv

Config: input[2048, 1024, 4, 4], filter[1024, 1, 4, 4], stride=1, pad=0, dilation=1
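
For orientation, the benchmark configuration above implies a 1x1 spatial output per channel. The sketch below works that out with the standard convolution output-size formula. It assumes an NCHW input layout and a filter laid out as [out_channels, in_channels / groups, kH, kW], neither of which the commit message states, and `conv_out_size` is a hypothetical helper rather than a Paddle API.

```python
def conv_out_size(in_size, kernel, stride=1, pad=0, dilation=1):
    """Standard convolution output size along one spatial dimension."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (in_size + 2 * pad - effective_kernel) // stride + 1


# Shapes from the commit message (layout is an assumption: NCHW input).
input_shape = [2048, 1024, 4, 4]
filter_shape = [1024, 1, 4, 4]

n, c, h, w = input_shape
out_c, per_group_c, kh, kw = filter_shape
groups = c // per_group_c  # one filter per input channel, i.e. depthwise

out_shape = [n, out_c, conv_out_size(h, kh), conv_out_size(w, kw)]
print(groups, out_shape)
```

With a 4x4 kernel over a 4x4 input and no padding, each channel reduces to a single output value, so this configuration mostly stresses a large batch and channel count with very little spatial work per filter.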

976af0da

[Cherry-Pick]Move valid check from python to kernel (#46980) · 8bfd45ad

Committed by Zhang Zheng on Oct 17, 2022

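To improve performance, the label bounds check is moved from the Python side into the kernel, which avoids calls to extra ops such as min, max, and synchronous copies.
    The current template parameter IgnoreIndex only takes effect when ignore_index lies within [0, dim). However, when a label value is out of bounds and ignore_index equals that label, the computation should still proceed normally. Although the current logic does not produce wrong results, it is still logically flawed, and the template parameter IgnoreIndex is unnecessary.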
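
A rough illustration of the idea, in plain Python rather than a CUDA kernel: the range check runs inside the per-sample loop, and the comparison against ignore_index happens before the range check, so an out-of-range label that equals ignore_index is skipped rather than rejected. The function name, the negative log-likelihood loss, and the error type are assumptions made for illustration; the commit message does not name the op or show the kernel code.

```python
import math


def nll_with_in_kernel_check(probs, labels, ignore_index=-100):
    """Per-sample negative log-likelihood with the label check done in the loop.

    probs:  list of rows, each a list of class probabilities (length = dim)
    labels: list of integer class labels, one per row
    """
    losses = []
    for row, label in zip(probs, labels):
        dim = len(row)
        if label == ignore_index:
            # Checked first, so ignore_index may lie outside [0, dim).
            losses.append(0.0)
        elif not 0 <= label < dim:
            # Stand-in for the kernel-side assertion on an invalid label.
            raise ValueError(f"label {label} out of range [0, {dim})")
        else:
            losses.append(-math.log(row[label]))
    return losses


probs = [[0.5, 0.5], [0.25, 0.75], [0.9, 0.1]]
# The third label (7) is out of range but equals ignore_index, so it is skipped.
print(nll_with_in_kernel_check(probs, [0, 1, 7], ignore_index=7))
```

Because the check lives in the same loop as the loss computation, no separate min/max pass over the labels is needed before the call.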

8bfd45ad

11 Oct, 2022 1 commit
- set_value_op: add support for complex types (#46885) · b051455f
  Committed by Feiyu Chan on Oct 11, 2022
  
  b051455f
29 Sep, 2022 1 commit
- [cherry-pick] Add FP16 support for uniform in dygraph mode on Nvidia GPU (#46641) · a58663f3
  Committed by 傅剑寒 on Sep 29, 2022
```
Add FP16 support for uniform in dygraph mode on Nvidia GPU
Dev PR link PR46212
```
  a58663f3
27 Sep, 2022 1 commit
- fix shard_index kernel (#46491) (#46511) · 5711bbee
  Committed by zhaoyingli on Sep 27, 2022
  
  5711bbee
20 Sep, 2022 2 commits

[PolishComments] Polish some code comments (#46032) (#46261) · 42e56f65
Committed by HongyuJia on Sep 20, 2022
```
* polish code comments

* polish data_device_transform.cc
```
42e56f65

[Cherry-pick] Fix amp error cp (#46272) · da173c40

Committed by Jiabin Yang on Sep 20, 2022

* [Eager] Fix ocr (#46124)

* fix linspace error in amp

* fix log

* fix amp error

* fix ocr error which caused by amp

* add more check

* rename dtype ns

* [Eager Bug fix]Fix Detection (#46147)

* fix linspace error in amp

* fix log

* fix amp error

* Revert "Simplify size op impl (#45808)"

This reverts commit c252b1de.

* fix_seg

* fix detection
Co-authored-by: Chen Weihang <sunny_cwh@163.com>

da173c40

19 Sep, 2022 3 commits
- [vision.ops.nms] Fix return order error and duplicate results with specific... · be84cac7
  Committed by RichardWooSJTU on Sep 19, 2022
```
[vision.ops.nms] Fix return order error and duplicate results with specific inputs (#46148) (#46193)

* fix return order error and duplicate results with specific inputs
```
  be84cac7
- fix broadcast kernel (#46158) · 860f6077
  Committed by sneaxiy on Sep 19, 2022
  
  860f6077
- Revert "Simplify size op impl (#45808)" (#46168) · dabb8f23
  Committed by Chen Weihang on Sep 19, 2022
```
This reverts commit c252b1de.
```
  dabb8f23
14 Sep, 2022 1 commit
- [chery-pick] Fix namespace error (#45925) (#46029) · 925e84bf
  Committed by engineer1109 on Sep 14, 2022
```
Fix a compilation error with CUDA 11.7
```
  925e84bf
13 Sep, 2022 1 commit
- cherry pick softmax infer kernel (#45957) · 0903020d
  Committed by JingZhuangzhuang on Sep 13, 2022
  
  0903020d
09 Sep, 2022 1 commit
- Simplify size op impl (#45808) · c252b1de
  Committed by Chen Weihang on Sep 09, 2022
```
* simplify size op

* trans to cuda manuly

* fix copy error
```
  c252b1de
07 Sep, 2022 2 commits
- [OpAttr]Adapt tensor output_size for conv2d_transpose and depthwise_conv2d_transpose (#45620) · fe169bf1
  Committed by WangZhen on Sep 07, 2022
```
Adapt tensor output_size for conv2d_transpose and depthwise_conv2d_transpose
```
  fe169bf1
- Fix UpdateLossScalingKernel to prevent data transform error (#45809) · c084a7b1
  Committed by sneaxiy on Sep 07, 2022
```
* fix amp kernel

* update to remove PADDLE_WITH_XPU macro
```
  c084a7b1
06 Sep, 2022 4 commits
- migrate unsqueeze kernels to phi, test=kunlun (#45673) · 4acf1ef7
  Committed by ykkk2333 on Sep 06, 2022
  
  4acf1ef7
- elementwise op support fp16 (#45496) · f6d9ec27
  Committed by xiaohemaikoo on Sep 06, 2022
  
  f6d9ec27
- Fix grad error of groupnorm op when cuda version==11.7 (#45738) · b0a3638f
  Committed by LielinJiang on Sep 06, 2022
```
* fix grad error of grounorm op when cuda version==11.7
```
  b0a3638f
- Completes basic dtypes for collective api in eager mode (#45574) · 7a92e74b
  Committed by Wen Sun on Sep 06, 2022
  
  7a92e74b
05 Sep, 2022 1 commit
- fix some op int32 exceed range (#45711) · a1dbee23
  Committed by sneaxiy on Sep 05, 2022
  
  a1dbee23
02 Sep, 2022 2 commits
- interpolate (forward grad) op support fp16 on gpu (#45061) · b12c27eb
  Committed by Yuanle Liu on Sep 02, 2022
  
  b12c27eb
- [PaddlePaddle Hackathon 3 No.31] Optimize the GPU performance of the dist op for Paddle (#44946) · ad704715
  Committed by thunder95 on Sep 02, 2022
```
* add dist cuda kernel

* reuse some funcs in phi

* use pnorm

* fix code style - explicit

* fix code sytle

* fix bug

* remove unused headers
```
  ad704715
01 Sep, 2022 2 commits

[phi] Migrate uniform_random XPU kernel to PHI (#45583) · ded33b58

Committed by HongyuJia on Sep 01, 2022

* copy kernel file to phi

* delete some code

* migrate uniform_random, test=kunlun

* fix input error, test=kunlun

* fix gpu register error, test=kunlun

* add include file, test=kunlun

* try fix error from CI, test=kunlun

* polish other PR

* fix CI-coverage error, test=kunlun

ded33b58

remove circular dependency of device_context and allocator (#45455) · 934171ae
Committed by Leo Chen on Sep 01, 2022
```
* refine cmake of framework

* add deps for dense tensor

* fix deps

* remove alloc(ctx)

* add depends on mkldnn
```
934171ae

31 Aug, 2022 3 commits

[OpAttr]output_size of unpool support Tensor type (#45543) · 236ac0d0
Committed by Aurelius84 on Aug 31, 2022
```
* [OpAttr]output_size of unpool support Tensor type

* fix coverage

* fix contain_var

* fix coverage
```
236ac0d0

Fix split api bug (#45396) · 4a25b60d

Committed by Charles-hit on Aug 31, 2022

* fix split bug

* solve function redefine

* fix fluid.layers.split and add unit test

* delete splitInferMeta register in unary.cc

* modify test_split_op GPU unit test

* modify test_split_op GPU unit test place param

* refactor split op and fix infershape bugs

* add () in && and ||

* fix split C++ unit test

* fix split infershape

4a25b60d

Add index add API (#45176) · 45171911
Committed by Li Min on Aug 31, 2022

45171911

30 Aug, 2022 4 commits
- [OpAttr]Adapt tensor axis for argmin/max (#45453) · 6fc15986
  Committed by WangZhen on Aug 30, 2022
```
* Adapt tensor axis for argmin/max

* Add UT

* Polish UT
```
  6fc15986
- [OpAttr]Adapt tensor axis for reduce_min/max/mean/sum/prod (#45078) · 32f42e94
  Committed by WangZhen on Aug 30, 2022
```
* [OpAttr]Adapt tensor axis for reduce_min/max/mean/sum/prod
```
  32f42e94
- Adapt tensor num_samples for multinomial (#45522) · c857841e
  Committed by WangZhen on Aug 30, 2022
  
  c857841e
- strided_slice grad add fp16 support (#45504) · 51f4291c
  Committed by ming1753 on Aug 30, 2022
  
  51f4291c
29 Aug, 2022 1 commit

[geometric]Move graph-related incubate api to geometric (#44970) · 8f657f74

Committed by Siming Dai on Aug 29, 2022

* move incubate to geometric

* add paddle.geometric

* fix unittest bug

* add float16 support for segment op

* change reindex and sample neighbors flag name

* add heter graph reindex

* move sample_neighbors.py to neighbors.py

* delete khop_sampler in geometric

* delete unused code

* change sample_neighbors api input order

* fix en doc

* fix unittest

* fix unittest

* change reindex

* fix division by 0

* delete unnecessary input argument

* delete final_state

8f657f74

25 Aug, 2022 5 commits
- [OpAttr]min/max of uniform_random support Tensor type (#45417) · c8955d0d
  Committed by Aurelius84 on Aug 25, 2022
```
* [OpAttr]min/max of Uniform_rand support Tensor type

* fix typo
```
  c8955d0d
- make full_like support double_max in dygraph (#45385) · edd66f2e
  Committed by Sing_chan on Aug 25, 2022
```
* make full_like support double_max in dygraph

* fix bug
```
  edd66f2e
- [Eager] sync_batch_norm_grad delete mean and variance (#45411) · 5df464fe
  Committed by wanghuancoder on Aug 25, 2022
```
* sync_batch_norm_grad delete mean and variance
```
  5df464fe
- [triu_indices] add triu_indices_op (#45168) · a410c397
  Committed by Rayman on Aug 25, 2022
  
  a410c397
- Fix unique_kernel bugs (#45032) · ea1f4702
  Committed by sprouteer on Aug 25, 2022
```
* fix unique_kernel bugs

* fix unique kernel cu bugs
```
  ea1f4702
24 Aug, 2022 2 commits

make tensor_util contains no cuda code (#45256) · 78916a7a

Committed by Leo Chen on Aug 24, 2022

* make tensor_util contains no cuda code

* refine isfinite

* revert ut

* move isfinite function to its op

* fix test

* fix compile

* std::isnan is not defined for int type on windows

* fix windows compile

* fix fp16

* fix rocm compile

* revert gradient node

78916a7a

Adapt tensor axis for cumsum (#45372) · 7f49b9ba
Committed by WangZhen on Aug 24, 2022

7f49b9ba

BaiXuePrincess / Paddle (in sync with the fork's source project)
