Commits · 1e1c72751827281ec5b83b30ca17e11a8f64d227 · PaddlePaddle / Paddle

Oct 19, 2022 · 1 commit
- slice op supports uint8_t (#47067) · 1e1c7275
  Committed by will-jl944 on Oct 19, 2022
Oct 18, 2022 · 2 commits
- add embedding range check (#46991) · d68c38ef
  Committed by seemingwang on Oct 18, 2022
```
* add embedding range check

* change header file

* change header file

* fix
```
- Add value check & error message for gather_tree (#47051) · e5e3d5cf
  Committed by liu zhengxi on Oct 18, 2022
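
The embedding range check in d68c38ef is described only by its title. As a minimal sketch under stated assumptions (valid ids are taken to lie in `[0, num_embeddings)`, and `check_embedding_ids` / `embedding_lookup` are hypothetical helpers, not Paddle APIs), the Python below shows why such a check matters: in a native kernel an out-of-range id would read outside the table, and in Python a negative id would silently wrap around to the last row.

```python
from typing import List, Sequence


def check_embedding_ids(ids: Sequence[int], num_embeddings: int) -> None:
    """Raise IndexError if any id is outside the assumed valid range [0, num_embeddings)."""
    for pos, idx in enumerate(ids):
        if not 0 <= idx < num_embeddings:
            raise IndexError(
                f"ids[{pos}] = {idx} is out of range, expected 0 <= id < {num_embeddings}"
            )


def embedding_lookup(table: List[List[float]], ids: Sequence[int]) -> List[List[float]]:
    """Hypothetical lookup: validate every id first, then gather the matching rows."""
    check_embedding_ids(ids, len(table))
    return [table[i] for i in ids]


if __name__ == "__main__":
    table = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]]
    print(embedding_lookup(table, [2, 0]))  # [[2.0, 2.1], [0.0, 0.1]]
    try:
        embedding_lookup(table, [-1])  # plain table[-1] would silently return the last row
    except IndexError as err:
        print(err)
```

Validating all ids before the gather means a bad batch fails with a message naming the offending position instead of returning a wrong row.
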
Oct 17, 2022 · 2 commits

Support BF16 training for sharding (#46846) · 0b39b244

Committed by Ghost Screaming on Oct 17, 2022

* Fix a bug in the reduce_sum op: when input.numel() > INT32_MAX, its result is wrong.

* support pure bfloat16

* support bf16 linear

* update PR to pass CI

* tiny fix where_grad_kernel.cu

* Support bfloat16 type for reducer and sharding.

* Fix some bugs.

* Polish code.

* Polish code.

* Add bfloat16 datatype in fill_grad kernels.
Co-authored-by: sneaxiy <sneaxiy@126.com>

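The first bullet of 0b39b244 reports a wrong reduce_sum result once `input.numel()` exceeds `INT32_MAX`. The commit message does not name the root cause; one common way to hit this class of bug is keeping an element count or offset in a signed 32-bit integer. The Python sketch below assumes that cause (it is not Paddle's code; `wrap_int32` and `num_elements` are hypothetical helpers) and emulates int32 wraparound to show how the element count of a tensor with 2^31 elements turns negative.

```python
from typing import Sequence

INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Emulate storing an integer in a signed 32-bit (two's complement) variable."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


def num_elements(shape: Sequence[int], index_bits: int = 64) -> int:
    """Multiply out a shape; with index_bits=32 the running product wraps like an int32."""
    count = 1
    for dim in shape:
        count *= dim
        if index_bits == 32:
            count = wrap_int32(count)
    return count


if __name__ == "__main__":
    shape = (2, 1 << 30)  # hypothetical tensor with 2**31 elements, one more than INT32_MAX
    print(num_elements(shape))                 # 2147483648
    print(num_elements(shape, index_bits=32))  # -2147483648
```

Any loop bound or reduction size derived from the wrapped value would be wrong, which is consistent with the symptom described in the bullet.
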

[PHI] Modify DataLayout's namespace from paddle::experimental to phi (#46869) · ec749398
Committed by YuanRisheng on Oct 17, 2022
```
* namespace modify

* update by comment
```

Oct 13, 2022 · 4 commits
- logsumexp support fp16 (#45817) · 910e1b6a
  Committed by xiaohemaikoo on Oct 13, 2022
- [Zero-Dim] support 0D for paddle.transpose/reshape/stack/tile/unsqueeze (#46555) · 78add057
  Committed by zhouweiwei2014 on Oct 13, 2022
- Revert #46111 (#46961) · cf9ca61d
  Committed by Zhang Ting on Oct 13, 2022
```
* Revert "[Hackathon No.56&38] deformable_conv_v1 operator: float16 data type support & forward-pass speedup (#46111)"
```
- Correct the logic and remove unnecessary template param (#46623) · 450af30c
  Committed by Zhang Zheng on Oct 13, 2022
```
* Correct the logic and remove unnecessary template param

* fix error throw

* fix print format

* fix ci
```
Oct 12, 2022 · 1 commit
- Revert "remove comment (#46827)" (#46935) · 2ea3700a
  Committed by Zhang Ting on Oct 12, 2022
```
This reverts commit 8a5f17e8.
```
Oct 11, 2022 · 1 commit
- set_value_op: add support for complex types (#46884) · 34c7e3e3
  Committed by Feiyu Chan on Oct 11, 2022
Oct 10, 2022 · 3 commits
- remove comment (#46827) · 8a5f17e8
  Committed by Rayman on Oct 10, 2022
- [Hackathon No.36] Optimize the computational performance of lerp_grad op on GPU (#45946) · ef61df30
  Committed by Rayman on Oct 10, 2022
- [Hackathon No.56&38] deformable_conv_v1 operator: float16 data type support & forward-pass speedup (#46111) · 5e0614a1
  Committed by Rayman on Oct 10, 2022
```
support fp16 for deformable conv
```
Sep 30, 2022 · 3 commits

Optimize performance of depthwise_conv_bwd of filter (#46490) · 04eb211a
Committed by Zhang Zheng on Sep 30, 2022
```
* Optimize performance of depthwise_conv_bwd of filter

* op-benchmark

* fix

* op benchmark

* merge bwd
```

Optimize performance of depthwise_conv_bwd (#46362) · f17a73e9
Committed by Zhang Zheng on Sep 30, 2022
```
* Optimize performance of depthwise_conv_bwd

* fix
```

support pure bfloat16 for more ops (#46364) · b7b231a6

Committed by sneaxiy on Sep 30, 2022

* support pure bfloat16

* support bf16 linear

* update PR to pass CI

* tiny fix where_grad_kernel.cu

* add bfloat16 to selu_grad to pass CI

* fix selu grad compilation error

Sep 29, 2022 · 3 commits

Move valid check from python to kernel (#46412) · 37bc2d7b

Committed by Zhang Zheng on Sep 29, 2022

* Move valid check from python to kernel

* fix error throw

* fix

* invalid label check

* fix

* Revert "fix"

This reverts commit 79fad6799cfa4b30423dbc84e67d7d843d22b84a.

* Revert "invalid label check"

This reverts commit 402a9707390ad5386b3222e85844b92d2e9b9fa4.

* Revert "fix"

This reverts commit 09ba3080ee0587447f875c19cdf060485f15ae3b.

* Revert "fix error throw"

This reverts commit a901bfcc2179d5c120ec29af766f392b122dab52.

* Revert "Move valid check from python to kernel"

This reverts commit baa03cc4ef82d8d45516c30dfb52bf5aead30748.

* final fix

* fix

* fix

fix P40 topk: Make the optimized topk compatible with P40. (#46547) · 667082c0

Committed by carryyu on Sep 29, 2022

* fix P40 topk: Make the optimized topk compatible with P40.

* fix P40 topk: Make the optimized topk compatible with P40.

* fix P40 topk: Make the optimized topk compatible with P40.

fix uniform_rand_kernel FP16 support in dygraph mode (#46212) · ccab0e2a
Committed by 傅剑寒 on Sep 29, 2022

Sep 28, 2022 · 1 commit

Remove the declaration of using Tensor in framework/tensor.h (#46432) · e12a905e

Committed by Chen Weihang on Sep 28, 2022

* remove needless using tensor

* remove needless using tensor

* resolve conflict

* replace tensor using

* fix format error

* revert needless changing

* fix rocm and npu compile error

* fix cinn compile error

* fix format error

* fix mkldnn format error

* fix mkldnn format error

* fix cinn compile error

* fix cinn compile error

* fix cinn compile error

* resolve conflict

Sep 26, 2022 · 1 commit
- fix shard_index kernel (#46491) · 808bf2b4
  Committed by zhaoyingli on Sep 26, 2022
Sep 23, 2022 · 2 commits
- Optimize performance of depthwise_conv_fwd (#46287) · 330b1a0a
  Committed by Zhang Zheng on Sep 23, 2022
```
* Optimize performance of depthwise_conv_fwd

* fix
```
- move selected_rows_functor (#46373) · b6c6f4f9
  Committed by YuanRisheng on Sep 23, 2022
Sep 22, 2022 · 1 commit

Optimize topk's performance when k is small and input_width is large (#45312) · 2c687df0

Committed by carryyu on Sep 22, 2022

* Optimize topk's performance when k is small and input_width is large

* Modify the blockdim setting logic

* Update top_k_function_cuda.h

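Commit 2c687df0 targets the case where k is small and each row (input_width) is wide. The CUDA kernel itself is not shown here; as a CPU-side analogy only (an assumption, not the optimized kernel, and `topk_row` is a hypothetical helper), keeping a size-k min-heap per row makes one pass over an n-wide row cost O(n log k) instead of the O(n log n) of sorting the whole row, which is why small k over wide rows is worth special-casing.

```python
import heapq
from typing import List, Sequence, Tuple


def topk_row(row: Sequence[float], k: int) -> List[Tuple[float, int]]:
    """Return the k largest (value, index) pairs of one row, largest first.

    Keeps a min-heap of at most k entries, so each of the n elements costs O(log k).
    """
    if k <= 0:
        return []
    heap: List[Tuple[float, int]] = []
    for idx, val in enumerate(row):
        if len(heap) < k:
            heapq.heappush(heap, (val, idx))
        elif val > heap[0][0]:
            # The new value beats the smallest of the current top-k: swap it in.
            heapq.heapreplace(heap, (val, idx))
    return sorted(heap, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    print(topk_row([0.3, 2.5, -1.0, 7.2, 4.4, 0.0], 2))  # [(7.2, 3), (4.4, 4)]
```

The heap never grows beyond k entries, so its working set stays small no matter how wide the row is.
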

Sep 21, 2022 · 3 commits

add layer_norm trt fp16 support (#45043) · b7a1ae22

Committed by ccrrong on Sep 21, 2022

* add fp16 support

* update

* update half

* code format

* fix unittest

* fix rocm compile error

* code format

* code format

* fix rocm compile error

* fix rocm compile error

Enable PaddleInference to use CINN. (#45009) · 3aa6bd57

Committed by Zhen Wang on Sep 21, 2022

* use cinn in the paddle inference

* fix some cmake errors

* Avoid division by zero in the arange_kernel.

* Avoid dynamic ops.

* Remove some useless codes.

* Use OpTransInfo to encapsulate some codes used in the build_cinn_pass.

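One bullet of 3aa6bd57 avoids a division by zero in the arange_kernel. An arange over `[start, end)` with stride `step` has `ceil((end - start) / step)` elements (clamped at zero), so computing the output size divides by `step`. The sketch below uses a hypothetical `arange_size` helper (not Paddle's kernel code) and rejects a zero step before that division.

```python
import math


def arange_size(start: float, end: float, step: float) -> int:
    """Number of elements in [start, end) taken every `step` (hypothetical helper)."""
    if step == 0:
        # Guard the division below instead of producing inf/NaN or crashing.
        raise ValueError("step must be non-zero")
    # A step pointing away from `end` yields a negative quotient: clamp to an empty result.
    return max(0, math.ceil((end - start) / step))


if __name__ == "__main__":
    print(arange_size(0.0, 1.0, 0.25))  # 4
```
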

optimization of depthwise_conv2d grad (#46332) · 18650db3
Committed by 5u13 on Sep 21, 2022


Sep 20, 2022 · 4 commits
- Flip Kernel Optimization (#46119) · bcef8275
  Committed by 傅剑寒 on Sep 20, 2022
- move reduce func (#46248) · 6b47507d
  Committed by YuanRisheng on Sep 20, 2022
- [Eager] Fix ocr (#46124) · d13a4a25
  Committed by Jiabin Yang on Sep 20, 2022
```
* fix linspace error in amp

* fix log

* fix amp error

* fix ocr error which caused by amp

* add more check

* rename dtype ns
```
- [PolishComments] Polish some code comments (#46032) · 56f9452c
  Committed by HongyuJia on Sep 20, 2022
```
* polish code comments

* polish data_device_transform.cc
```
Sep 19, 2022 · 3 commits
- [PHI] Move sum op to PHI (#45860) · 4b3f2af1
  Committed by YuanRisheng on Sep 19, 2022
```
* move sum

* fix ci bugs

* fix ci bugs

* fix set_lod bugs

* fix infershape bugs

* fix ci bugs

* fix ci unittest bug

* fix ci bugs

* perfect code

* update code according to comment

* add unittest

* fix ci bugs
```
- Revert "Simplify size op impl (#45808)" (#46123) · d963e2e4
  Committed by Chen Weihang on Sep 19, 2022
```
This reverts commit c252b1de.
```
- [vision.ops.nms] Fix return order error and duplicate results with specific inputs (#46148) · 2b76db99
  Committed by RichardWooSJTU on Sep 19, 2022
```
* fix return order error and duplicate results with specific inputs
```
Sep 18, 2022 · 1 commit
- Delete redundant param in SoftmaxFunctor (#46003) · 7f346a76
  Committed by YuanRisheng on Sep 18, 2022
```
* perfect softmax functor

* fix compile bugs

* fix ci bugs
```
Sep 16, 2022 · 2 commits

Support broadcast elementwise operators with int64 index type (#45741) · 20b5bf84

Committed by sneaxiy on Sep 16, 2022

* support int64 non-broadcast

* support broadcast case for int64 index

* fix bug

* support more Arity

* remove some codes

* upgrade patchelf to v0.15.0 to pass CI build

* fix bug

* fix patchelf installation

* add debug flags

* remove useless codes

* fix viterbi_decode and set_value op uts

* remove always enable int64

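Commit 20b5bf84 lets broadcast elementwise kernels index with int64. The sketch below is a hypothetical Python model, not Paddle's kernel: it maps a flat output index to the flat index of a broadcast input using per-dimension strides in which a broadcast (size-1) dimension gets stride 0. Python integers never overflow, so `needs_int64_index` (also hypothetical) only flags the case where the output size no longer fits in a signed 32-bit index, which is when a native kernel would need 64-bit index arithmetic.

```python
from typing import List, Sequence

INT32_MAX = 2**31 - 1


def broadcast_strides(in_shape: Sequence[int], out_shape: Sequence[int]) -> List[int]:
    """Row-major strides of `in_shape` aligned to `out_shape`; broadcast dims get stride 0."""
    ndim = len(out_shape)
    padded = [1] * (ndim - len(in_shape)) + list(in_shape)  # right-align the two shapes
    strides = [0] * ndim
    stride = 1
    for d in range(ndim - 1, -1, -1):
        if padded[d] == out_shape[d]:
            strides[d] = stride
        elif padded[d] != 1:
            raise ValueError(f"dim {d}: size {padded[d]} cannot broadcast to {out_shape[d]}")
        stride *= padded[d]
    return strides


def input_offset(out_index: int, out_shape: Sequence[int], strides: Sequence[int]) -> int:
    """Map a flat output index to the flat index of the (broadcast) input."""
    offset = 0
    for d in range(len(out_shape) - 1, -1, -1):
        out_index, coord = divmod(out_index, out_shape[d])
        offset += coord * strides[d]
    return offset


def needs_int64_index(out_shape: Sequence[int]) -> bool:
    """True when the flat output size no longer fits in a signed 32-bit index."""
    numel = 1
    for dim in out_shape:
        numel *= dim
    return numel > INT32_MAX


if __name__ == "__main__":
    strides = broadcast_strides((3, 1), (3, 4))  # one input column broadcast across 4 columns
    print(strides)  # [1, 0]
    print(input_offset(7, (3, 4), strides))  # 1: output (row 1, col 3) reads input row 1
    print(needs_int64_index((2, 1 << 30)))  # True
```
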

Correct spelling errors (#46108) · 08186f14
Committed by Zhang Zheng on Sep 16, 2022


Sep 15, 2022 · 2 commits
- Optimize flip kernel by eliminating H2D data transfer, test=develop (#46046) · b3283f4c
  Committed by 傅剑寒 on Sep 15, 2022
- add determine action for embed_grad and index_add. (#46040) · 0c40d889
  Committed by Li Min on Sep 15, 2022

PaddlePaddle / Paddle · synced successfully about 1 year ago