Commits · 39012536762576eda7c72aa5413de8202dba8e7d · PaddlePaddle / Paddle

19 Nov, 2021 (5 commits)

Add paddle.incubate.graph_send_recv API (#37205) · 39012536

Committed by Siming Dai on Nov 19, 2021

* add cpu version, using set: sum, min, max

* add cpu version: mean

* improve cpu code and fix dynamic memory allocation problem

* fix arg error, add index judge, delete fp16

* fix bug in CudaAtomicMax and CudaAtomicMin

* add CUDA version

* fix grad_op bug for index

* add op test, add correct cpu grad op

* Add correct CUDA Mean grad

* [Add] Successful MEAN and SUM

* [Add] Successful MIN and MAX in CPU

* [Add] Successful MIN and MAX in CUDA

* fix windows dtype ci

* fix ROCM ci by adding HIP flag

* rename fused_gather_scatter to send_recv

* unify name as send and recv

* change zero index return time

* add send_recv incubate api

* fix index data type, add unittest case for API

* delete redundant input tensor

* fix en example and docs, add default value in pool_type

* add shape judge and max grid judge

* fix comment

* fix index type bug

* add const &

* fix en docs

* delete numpy in examples

* add unittest for int input

* fix send_recv comment

* change send_recv to graph_send_recv
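The bullets above describe an operator that gathers rows by a source index and reduces them onto a destination index with sum, mean, min, or max. A minimal pure-Python reference sketch of that idea follows. The function name `graph_send_recv_ref`, its parameter names, and the choice to leave rows that receive no message at zero are assumptions for illustration; this is not the Paddle kernel and does not claim to reproduce the real API's signature or dtype handling.

```python
def graph_send_recv_ref(x, src_index, dst_index, pool_type="sum"):
    """Reference sketch: gather rows of x at src_index, reduce them at dst_index."""
    if len(src_index) != len(dst_index):
        raise ValueError("src_index and dst_index must have the same length")
    if pool_type not in ("sum", "mean", "min", "max"):
        raise ValueError("unsupported pool_type: " + pool_type)

    num_rows = len(x)
    width = len(x[0]) if num_rows else 0
    # Assumption: rows that receive no message stay at zero.
    out = [[0] * width for _ in range(num_rows)]
    # Number of messages received per destination row (needed for mean).
    count = [0] * num_rows

    for src, dst in zip(src_index, dst_index):
        msg = x[src]
        if count[dst] == 0:
            out[dst] = list(msg)  # first message initializes the row
        elif pool_type in ("sum", "mean"):
            out[dst] = [a + b for a, b in zip(out[dst], msg)]
        elif pool_type == "min":
            out[dst] = [min(a, b) for a, b in zip(out[dst], msg)]
        else:  # "max"
            out[dst] = [max(a, b) for a, b in zip(out[dst], msg)]
        count[dst] += 1

    if pool_type == "mean":
        # Divide accumulated sums by the message count; untouched rows are kept.
        out = [[v / c for v in row] if c else row for row, c in zip(out, count)]
    return out


x = [[0, 2, 3], [1, 4, 5], [2, 6, 7]]
src = [0, 1, 2, 0]
dst = [1, 2, 1, 0]
summed = graph_send_recv_ref(x, src, dst, "sum")
```

Here row 1 of `summed` receives both `x[0]` and `x[2]`, so it holds their element-wise sum, while rows 0 and 2 each receive a single message.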

[fleet_executor] Parse pipeline config (#37319) · ca088f92
Committed by Yuang Liu on Nov 19, 2021

[fleet_executor] Add interceptor register (#37338) · f11e843a
Committed by WangXi on Nov 19, 2021

[PTen] Add compatible reshape method for Tensor (#37281) · 715fd051
Committed by Chen Weihang on Nov 18, 2021
```
* add reshape method for Tensor

* fix typo

* fix typo

* fix conflict with develop
```

fix cmake dependence error (#37304) · 6653ac5e
Committed by LiYuRio on Nov 19, 2021

18 Nov, 2021 (7 commits)

Fix for wrong results in segmentation models (#37310) · c1802f91
Committed by jakpiase on Nov 18, 2021
```
* fix

* ci rerun

* ci rerun

* ci Rerun
```

optimize the data structure to speed up sampling in graph engine. (#37315) · 521a274e
Committed by Webbley on Nov 18, 2021
```
* optimize the data structure from c++ to python to speed up sampling in graph engine

* update test
```

fix bug to support dropout eval grad computing. (#37305) · c3d3001f
Committed by Li Min on Nov 18, 2021
```
* fix bug to support dropout eval grad computing.

* Remove useless code.
```

[PTen]elementwise_sub kernel refactor (#37260) · 36a95654

Committed by YuanRisheng on Nov 18, 2021

* elementwise_add kernel refactor

* fix compile bugs in elementwise_add refactor

* fix compile bugs when run in npu/xpu

* fix bugs when run unit test

* fix bugs when run ci-windows

* modify code as recommended

* code format adjust

* fix bugs when run ci

* fix compile bug when run in ci-windows

* elementwise_sub refactor

* add PD_DLL_DECL for elementwise_sub

* fix bugs when compile

[fleet_executor] Parse runtime graph to start carrier (#37282) · f85bd5c9
Committed by Yuang Liu on Nov 18, 2021

Add the `GetFetchNames` method in CinnGraphSymbolization. (#37218) · 3ad495e8

Committed by Zhen Wang on Nov 18, 2021

* Add the `GetFetchNames` method in CinnGraphSymbolization.

* Use unordered_set instead of vector as the type of fetch_var_names.

* Reuse the definition of kCompilationKey.

* Use CompileOptions to set fetch_var_ids.

* Update the argument passing of GraphCompiler.Build.

* Fix some bugs in CinnGraphSymbolization::GetFetchIds.

Opt topk (#37256) · c4862d99

Committed by zhangkaihuo on Nov 18, 2021

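topk has two implementations: one based on cub and one hand-written kernel. cub obtains the top-k by sorting, and measurements on multiple data sets show that it outperforms the hand-written kernel only when input_width >= 128 and k exceeds 75% of input_width.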
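The selection rule stated in the commit message above can be sketched as a small dispatch function. This is a hedged illustration only: the names `prefer_sort_based` and `topk` are hypothetical, and `sorted` and `heapq.nlargest` are pure-Python stand-ins for the cub sort path and the hand-written kernel, not the actual CUDA code.

```python
import heapq


def prefer_sort_based(input_width, k):
    """True when the sort-based path is expected to be faster.

    Encodes the rule from the commit message: input_width >= 128 and
    k strictly greater than 75% of input_width (integer arithmetic).
    """
    return input_width >= 128 and 4 * k > 3 * input_width


def topk(row, k):
    """Return the k largest values of row in descending order."""
    if prefer_sort_based(len(row), k):
        # Stand-in for the cub path: sort everything, keep the first k.
        return sorted(row, reverse=True)[:k]
    # Stand-in for the hand-written kernel: partial selection.
    return heapq.nlargest(k, row)


largest_three = topk(list(range(200)), 3)
```

Both branches return the same values; only the strategy differs, which is why the threshold can be tuned purely on measured performance.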

17 Nov, 2021 (16 commits)
- Replace custom IOHW -> OIHW reorder with build-in oneDNN reorder (#37175) · 162ac048
  Committed by Sławomir Siwek on Nov 17, 2021
```
* Use oneDNN reorder instead of custom one

* Fix whitespace typo

* Fix Code format error

* Incorporating feedback

* Remove unnecessary reorder

* Support GIOHW format

* Fix code format error
```
- [new-exec] Refine standalone executor (#37278) · 6d6642c8
  Committed by Leo Chen on Nov 17, 2021
```
* init

* add feed ops in python side

* import LRScheduler

* update_feed

* refine code format
```
- Changed first batch of deprecated mkldnn headers and function names to new oneDNN names (#37040) · ce3ee9bb
  Committed by piotrekobiIntel on Nov 17, 2021
```
* Change first batch of mkldnn headers and namespace names to dnnl

* Revert changes to tensor.h, which require approval

* Format changes with pre-commit

* Add int32 tests

* Fix int32 tests and call GetDataFromTensor for int32

* Fix test
```
- Modify reduce_op.op.h for xpu2 with kernel primitive api (#36904) · 9c5d5665
  Committed by niuliling123 on Nov 17, 2021
```
* Modify reduce_op.op.h for xpu2 with kernel primitive api
```
- Fix data transform bug in new executor (#37280) · 1460b761
  Committed by Aurelius84 on Nov 17, 2021
  
- change the meta modification rules, test=develop (#37255) · 8c44ad47
  Committed by 石晓伟 on Nov 17, 2021
  
- [PTen] Add slice api implemention for Tensor (#37276) · 3328eb03
  Committed by Chen Weihang on Nov 17, 2021
```
* add slice api impl of Tensor

* fix test slice error
```
- update dataset (#37194) · ca8c4f3e
  Committed by zhaocaibei123 on Nov 17, 2021
  
- [heterps]Refactor heterogenous worker (#37244) · 54d2626a
  Committed by zmx on Nov 17, 2021
```
* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* refactor heter trainer. test=develop

* fix. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop
```
- fix compile error when pslib use cpu branch;test=develop (#37248) · 0057c12d
  Committed by danleifeng on Nov 17, 2021
  
- add ut parallel (#37211) · 1223238f
  Committed by zhangchunle on Nov 17, 2021
  
- copy beta pow to same place when skip_update=1 (#37245) · 5e4b419b
  Committed by Leo Chen on Nov 17, 2021
```
* copy beta pow to same place when skip_update=1

* fix xpu
```
- rename TensorBase interface data_type() to dtype() (#37257) · 1e9b3a3d
  Committed by zyfncg on Nov 17, 2021
  
- [Fleet Executor] Construct runtime graph (#37158) · 0daa69d4
  Committed by LiYuRio on Nov 17, 2021
  
- [npu][hybrid] support offload (#37224) · 762819a8
  Committed by WangXi on Nov 17, 2021
  
- Dependence analysis (#37231) · d943459b
  Committed by xiongkun on Nov 17, 2021
```
* add

* add BuildOperatorDependences

* fix bug

* add unittest for write after write

* fix merge bug

* fix
```
16 Nov, 2021 (11 commits)
- decrease pten log level (#37239) · d8982c52
  Committed by Chen Weihang on Nov 16, 2021
  
- Added BF16 Pool2d grad (#37081) · f95d44a2
  Committed by arlesniak on Nov 16, 2021
```
* Added BF16 Pool2d grad

* upstream pulled

* fix for CI

* fixes after review
```
- [psgpu]fix pipe bug:save and pull overlap; test=develop (#37233) · 62ec644f
  Committed by danleifeng on Nov 16, 2021
  
- Removed unnecessary ENFORCE statement (#37219) · 70b7c7ed
  Committed by Weilong Wu on Nov 16, 2021
  
- Add API and unit test for reshape (#37232) · 79b49c20
  Committed by YuanRisheng on Nov 16, 2021
```
* reshape kernel refactor

* fix compile bugs when run ci

* support xpu for reshape

* fix bugs when run unittest in kunlun ci

* fix compile bugs when run kunlun

* perfect code according to suggestion

* add api and unit test for reshape
```
- for pure fp16 (#37230) · 6ebc318e
  Committed by zhangkaihuo on Nov 16, 2021
```
Add pure fp16 support for fused transformer.
```
- Make FLAGS_determinstic effective in conv2d forward. (#37173) · ea47d211
  Committed by Yiqun Liu on Nov 16, 2021
```
* Make FLAGS_determinstic effective in conv2d forward.

* Add call of SetCinnCudnnDeterministic in cinn_launch op.
```
- added onednn elu kernel (#37149) · ae40ee32
  Committed by jakpiase on Nov 16, 2021
  
- Fix attn_bias_add bug. (#37147) · a9e7a854
  Committed by Li Min on Nov 16, 2021
```
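The fused_attention_op implementation uses bias_add, which is built on the kernel primitive API. The kernel primitive WriteData API and its internal implementation were later changed so that the out-of-bounds check moved into a template parameter. As a result the wrong branch was called, producing out-of-bounds writes that corrupted other GPU memory. In practice, a single run of test_fused_attention_op_api.py almost never fails, but looping over inputs with different shapes intermittently produces incorrect results, so the bug is hard to notice.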
```
- supports the slice of upper tensor, test=develop (#37215) · c5ccff73
  Committed by 石晓伟 on Nov 16, 2021
  
- [fleet_executor] Add sync method (#37167) · f49c2c23
  Committed by Yuang Liu on Nov 16, 2021
  
15 Nov, 2021 (1 commit)

[Pten] Refactor the implementation of custom operator (#37122) · 1e598f1a

Committed by Chen Weihang on Nov 15, 2021

* move extension into pten [no-verify]

* append tensor methods by ext_tensor [no-verify]

* append other tensor methods [no-verify]

* ext related files tidy [no-verify]

* include relation tidy [no-verify]

* add pten tensor test [no-verify]

* replace tensor in custom op & compile success

* refine tensor constructor for unittest

* custom relu jit run success

* fix all custom op unittests

* add inference cmake adapt [no-verify]

* fix failed unittests

* fix windows failed unittests

* try to fix kunlun and inference failed

* fix test_elementwise_api error

* try to fix win compile failed

* fix kunlun fp16 type error

* remove useless handle error macro

* add custom linear op test

* fix compile failed & add win symbols

* fix non pten kernel cast failed

* add dll decl for api

* polish several details

* polish details by review comment

* add dll_decl for register

PaddlePaddle / Paddle: synced successfully about 1 year ago