Commits · 9fc11db74fd47bff98b341af2e255d3dc0cb19ca · SummerGao. / Paddle

19 Nov, 2021 5 commits

optimize graph-engine sample api's data-transfer process (#37341) · 9fc11db7

Committed by seemingwang on Nov 19, 2021

* graph engine demo

* upload unsaved changes

* fix dependency error

* fix shard_num problem

* py client

* remove lock and graph-type

* add load direct graph

* add load direct graph

* add load direct graph

* batch random_sample

* batch_sample_k

* fix num_nodes size

* batch brpc

* batch brpc

* add test

* add test

* add load_nodes; change add_node function

* change sample return type to pair

* resolve conflict

* resolved conflict

* resolved conflict

* separate server and client

* merge pair type

* fix

* resolved conflict

* fixed segment fault; high-level VLOG for load edges and load nodes

* random_sample return 0

* rm useless loop

* test:load edge

* fix ret -1

* test: rm sample

* rm sample

* random_sample return future

* random_sample return int

* test fake node

* fixed here

* memory leak

* remove test code

* fix return problem

* add common_graph_table

* random sample node &test & change data-structure from linkedList to vector

* add common_graph_table

* sample with srand

* add node_types

* optimize nodes sample

* recover test

* random sample

* destruct weighted sampler

* GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* pybind sample nodes api

* pull nodes with step

* fixed pull_graph_list bug; add test for pull_graph_list by step

* add graph table;name

* add graph table;name

* add pybind

* add pybind

* add FeatureNode

* add FeatureNode

* add FeatureNode Serialize

* add FeatureNode Serialize

* get_feat_node

* avoid local rpc

* fix get_node_feat

* fix get_node_feat

* remove log

* get_node_feat return  py:bytes

* merge develop with graph_engine

* fix threadpool.h head

* fix

* fix typo

* resolve conflict

* fix conflict

* recover lost content

* fix pybind of FeatureNode

* recover cmake

* recover tools

* resolve conflict

* resolve linking problem

* code style

* change test_server port

* fix code problems

* remove shard_num config

* remove redundant threads

* optimize start server

* remove logs

* fix code problems by reviewers' suggestions

* move graph files into a folder

* code style change

* remove graph operations from base table

* optimize get_feat function of graph engine

* fix long long count problem

* remove redundant graph files

* remove unused shell

* recover dropout_op_pass.h

* fix potential stack overflow when request number is too large & node add & node clear & node remove

* when sample k is larger than neighbor num, return directly

* using random seed generator of paddle to speed up

* fix bug of random sample k

* fix code style

* fix code style

* add remove graph to fleet_py.cc

* fix blocking_queue problem

* fix style

* fix

* recover capacity check

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* fix distributed op combining problems

* optimize

* remove logs

* fix MultiSlotDataGenerator error

* cache for graph engine

* fix type compare error

* more test&fix thread terminating problem

* remove header

* change time interval of shrink

* use cache when sample nodes

* remove unused function

* change unique_ptr to shared_ptr

* simplify cache template

* cache api on client

* fix

* reduce sample threads when cache is not used

* reduce cache memory

* cache optimization

* remove test function

* remove extra fetch function

* graph-engine data transfer optimization
Co-authored-by: Huang Zhengjie <270018958@qq.com>
Co-authored-by: Weiyue Su <weiyue.su@gmail.com>
Co-authored-by: suweiyue <suweiyue@baidu.com>
Co-authored-by: luobin06 <luobin06@baidu.com>
Co-authored-by: liweibin02 <liweibin02@baidu.com>
Co-authored-by: tangwei12 <tangwei12@baidu.com>

9fc11db7
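
Several of the squashed messages in this commit concern neighbor sampling ("batch random_sample", "when sample k is larger than neighbor num, return directly"). A minimal Python sketch of that early-return behavior is given here. The names `sample_neighbors` and `batch_sample_neighbors`, the dictionary adjacency format, and the use of `random.Random` are assumptions for illustration; this is not the graph engine's C++ implementation.

```python
import random


def sample_neighbors(neighbors, k, rng=None):
    """Return up to k distinct neighbors chosen uniformly at random.

    Hypothetical helper: if k is at least the neighbor count, the whole
    list is returned directly and no random numbers are drawn.
    """
    if k >= len(neighbors):
        return list(neighbors)
    rng = rng or random.Random()
    return rng.sample(neighbors, k)


def batch_sample_neighbors(adjacency, node_ids, k, seed=0):
    """Sample neighbors for a batch of nodes in one call.

    adjacency maps a node id to its neighbor list (an assumed format);
    unknown nodes yield an empty result.
    """
    rng = random.Random(seed)
    return [sample_neighbors(adjacency.get(n, []), k, rng) for n in node_ids]
```

Handling a whole batch of node ids per call mirrors the batching idea named in the messages: one request carries many nodes instead of one request per node.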

Add paddle.incubate.graph_send_recv API (#37205) · 39012536

Committed by Siming Dai on Nov 19, 2021

* add cpu version, using set: sum, min, max

* add cpu version: mean

* improve cpu code and fix dynamic memory allocation problem

* fix arg error, add index judge, delete fp16

* fix bug in CudaAtomicMax and CudaAtomicMin

* add CUDA version

* fix grad_op bug for index

* add op test, add correct cpu grad op

* Add correct CUDA Mean grad

* [Add] Successful MEAN and SUM

* [Add] Successful MIN and MAX in CPU

* [Add] Successful MIN and MAX in CUDA

* fix windows dtype ci

* fix ROCM ci by adding HIP flag

* rename fused_gather_scatter to send_recv

* unify name as send and recv

* change zero index return time

* add send_recv incubate api

* fix index data type, add unittest case for API

* delete redundant input tensor

* fix en example and docs, add default value in pool_type

* add shape judge and max grid judge

* fix comment

* fix index type bug

* add const &

* fix en docs

* delete numpy in examples

* add unittest for int input

* fix send_recv comment

* change send_recv to graph_send_recv

39012536
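
The commit above names four pool types (sum, mean, min, max) for the send/recv operation. As a rough reference for what such a gather-then-reduce computes, here is a pure-Python sketch. It does not call Paddle; the function name `graph_send_recv_reference` is hypothetical, and leaving rows that receive no message at zero is an assumption of this sketch.

```python
def graph_send_recv_reference(x, src_index, dst_index, pool_type="sum"):
    """Gather rows x[src] and reduce them into rows out[dst].

    x is a list of equal-length rows. Rows of the result that receive no
    message stay zero (an assumption of this sketch).
    """
    if len(src_index) != len(dst_index):
        raise ValueError("src_index and dst_index must have the same length")
    if pool_type not in ("sum", "mean", "min", "max"):
        raise ValueError("unsupported pool_type: %r" % (pool_type,))

    width = len(x[0]) if x else 0
    # Group incoming messages by destination row.
    inbox = {}
    for s, d in zip(src_index, dst_index):
        inbox.setdefault(d, []).append(x[s])

    out = [[0] * width for _ in x]
    for d, rows in inbox.items():
        columns = list(zip(*rows))  # one tuple of values per feature column
        if pool_type == "sum":
            out[d] = [sum(c) for c in columns]
        elif pool_type == "mean":
            out[d] = [sum(c) / len(c) for c in columns]
        elif pool_type == "min":
            out[d] = [min(c) for c in columns]
        else:
            out[d] = [max(c) for c in columns]
    return out
```

For example, with `x = [[0, 2, 3], [1, 4, 5], [2, 6, 7]]`, `src_index = [0, 1, 2, 0]` and `dst_index = [1, 2, 1, 0]`, row 1 of the result combines rows 0 and 2 of `x`.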

[fleet_executor] Parse pipeline config (#37319) · ca088f92
Committed by Yuang Liu on Nov 19, 2021

ca088f92

[fleet_executor] Add interceptor register (#37338) · f11e843a
Committed by WangXi on Nov 19, 2021

f11e843a

fix cmake dependence error (#37304) · 6653ac5e
Committed by LiYuRio on Nov 19, 2021

6653ac5e

18 Nov, 2021 7 commits

Fix for wrong results in segmentation models (#37310) · c1802f91
Committed by jakpiase on Nov 18, 2021
```
* fix

* ci rerun

* ci rerun

* ci Rerun
```
c1802f91
optimize the data structure to speed up sampling in graph engine. (#37315) · 521a274e
Committed by Webbley on Nov 18, 2021
```
* optimize the data structure from c++ to python to speed up sampling in graph engine

* update test
```
521a274e
fix bug to support dropout eval grad computing. (#37305) · c3d3001f
Committed by Li Min on Nov 18, 2021
```
* fix bug to support dropout eval grad computing.

* Remove useless code.
```
c3d3001f

[PTen]elementwise_sub kernel refactor (#37260) · 36a95654

Committed by YuanRisheng on Nov 18, 2021

* elementwise_add kernel refactor

* fix compile bugs in elementwise_add refactor

* fix compile bugs when run in npu/xpu

* fix bugs when run unit test

* fix bugs when run ci-windows

* modify code as recommended

* code format adjust

* fix bugs when run ci

* fix compile bug when run in ci-windows

* elementwise_sub refactor

* add PD_DLL_DECL for elementwise_sub

* fix bugs when compiling

36a95654

[fleet_executor] Parse runtime graph to start carrier (#37282) · f85bd5c9
Committed by Yuang Liu on Nov 18, 2021

f85bd5c9

Add the `GetFetchNames` method in CinnGraphSymbolization. (#37218) · 3ad495e8

Committed by Zhen Wang on Nov 18, 2021

* Add the `GetFetchNames` method in CinnGraphSymbolization.

* Use unordered_set instead of vector as the type of fetch_var_names.

* Reuse the definition of kCompilationKey.

* Use CompileOptions to set fetch_var_ids.

* Update the argument passing of GraphCompiler.Build.

* Fix some bugs in CinnGraphSymbolization::GetFetchIds.

3ad495e8

Opt topk (#37256) · c4862d99

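Committed by zhangkaihuo on Nov 18, 2021

topk has two implementations: one based on cub and one hand-written kernel. The cub implementation obtains the top k by sorting. Measurements over multiple data sets show that it outperforms the hand-written kernel only when input_width >= 128 and k exceeds 75% of input_width.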

c4862d99
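
The selection rule stated in the topk commit message can be written as a small predicate. This is a sketch only: the function name `prefer_cub_topk` is hypothetical, and reading "exceeds 75%" as a strict comparison is an assumption; the actual dispatch lives in the CUDA kernel code.

```python
def prefer_cub_topk(input_width, k):
    """Return True when the sort-based cub path is expected to be faster.

    Encodes the rule of thumb from the commit message: input_width >= 128
    and k above 75% of input_width. Treating "exceeds" as a strict
    comparison is an assumption of this sketch.
    """
    return input_width >= 128 and k > 0.75 * input_width
```

In every other case the hand-written kernel would be chosen.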

17 Nov, 2021 12 commits

Replace custom IOHW -> OIHW reorder with build-in oneDNN reorder (#37175) · 162ac048

Committed by Sławomir Siwek on Nov 17, 2021

* Use oneDNN reorder instead of custom one

* Fix whitespace typo

* Fix Code format error

* Incorporating feedback

* Remove unnecessary reorder

* Support GIOHW format

* Fix code format error

162ac048
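
For readers unfamiliar with the layout names in this commit, an IOHW to OIHW reorder swaps the input-channel and output-channel axes of a 4-D weight tensor. The pure-Python sketch below only illustrates that index mapping on nested lists; the helper name `iohw_to_oihw` is hypothetical and it does not use the oneDNN reorder primitive that the commit switches to.

```python
def iohw_to_oihw(weights):
    """Swap the first two axes of a 4-D nested list.

    Input is indexed as weights[i][o][h][w]; the result is indexed as
    result[o][i][h][w]. The inner H x W blocks are reused, not copied.
    """
    num_in = len(weights)
    num_out = len(weights[0]) if num_in else 0
    return [[weights[i][o] for i in range(num_in)] for o in range(num_out)]
```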

[new-exec] Refine standalone executor (#37278) · 6d6642c8
Committed by Leo Chen on Nov 17, 2021
```
* init

* add feed ops in python side

* import LRScheduler

* update_feed

* refine code format
```
6d6642c8

Changed first batch of deprecated mkldnn headers and function names to new oneDNN names (#37040) · ce3ee9bb

Committed by piotrekobiIntel on Nov 17, 2021

* Change first batch of mkldnn headers and namespace names to dnnl

* Revert changes to tensor.h, which require approval

* Format changes with pre-commit

* Add int32 tests

* Fix int32 tests and call GetDataFromTensor for int32

* Fix test

ce3ee9bb

Modify reduce_op.op.h for xpu2 with kernel primitive api (#36904) · 9c5d5665
Committed by niuliling123 on Nov 17, 2021
```
* Modify reduce_op.op.h for xpu2 with kernel primitive api
```
9c5d5665

Fix data transform bug in new executor (#37280) · 1460b761
Committed by Aurelius84 on Nov 17, 2021

1460b761

update dataset (#37194) · ca8c4f3e
Committed by zhaocaibei123 on Nov 17, 2021

ca8c4f3e

[heterps]Refactor heterogenous worker (#37244) · 54d2626a

Committed by zmx on Nov 17, 2021

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* refactor heter trainer. test=develop

* fix. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

54d2626a

fix compile error when pslib use cpu branch;test=develop (#37248) · 0057c12d
Committed by danleifeng on Nov 17, 2021

0057c12d

copy beta pow to same place when skip_update=1 (#37245) · 5e4b419b
Committed by Leo Chen on Nov 17, 2021
```
* copy beta pow to same place when skip_update=1

* fix xpu
```
5e4b419b

[Fleet Executor] Construct runtime graph (#37158) · 0daa69d4
Committed by LiYuRio on Nov 17, 2021

0daa69d4

[npu][hybrid] support offload (#37224) · 762819a8
Committed by WangXi on Nov 17, 2021

762819a8

Dependence analysis (#37231) · d943459b

Committed by xiongkun on Nov 17, 2021

* add

* add BuildOperatorDependences

* fix bug

* add unittest for write after write

* fix merge bug

* fix

d943459b
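
The dependence-analysis commit mentions a write-after-write test. As background, ordering constraints between operators usually come from three kinds of hazards on shared variables: read after write, write after read, and write after write. The sketch below builds such edges from per-operator read and write sets. The Python name `build_operator_dependences` merely echoes the commit's `BuildOperatorDependences`; the `(reads, writes)` input format and the absence of transitive pruning are assumptions, not the executor's actual code.

```python
def build_operator_dependences(ops):
    """Return a set of (earlier, later) index pairs that must stay ordered.

    ops is a list of (reads, writes) pairs of variable-name collections,
    in program order. This hypothetical helper records every pairwise
    hazard and does not prune transitive edges.
    """
    edges = set()
    for later, (reads_l, writes_l) in enumerate(ops):
        reads_l, writes_l = set(reads_l), set(writes_l)
        for earlier in range(later):
            reads_e = set(ops[earlier][0])
            writes_e = set(ops[earlier][1])
            raw = writes_e & reads_l   # read after write
            war = reads_e & writes_l   # write after read
            waw = writes_e & writes_l  # write after write
            if raw or war or waw:
                edges.add((earlier, later))
    return edges
```

Operators with no edge between them, directly or transitively, are free to run in either order or concurrently.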

16 Nov, 2021 10 commits

decrease pten log level (#37239) · d8982c52
Committed by Chen Weihang on Nov 16, 2021

d8982c52

Added BF16 Pool2d grad (#37081) · f95d44a2
Committed by arlesniak on Nov 16, 2021
```
* Added BF16 Pool2d grad

* upstream pulled

* fix for CI

* fixes after review
```
f95d44a2

[psgpu]fix pipe bug:save and pull overlap; test=develop (#37233) · 62ec644f
Committed by danleifeng on Nov 16, 2021

62ec644f

Removed unnecessary ENFORCE statement (#37219) · 70b7c7ed
Committed by Weilong Wu on Nov 16, 2021

70b7c7ed

Add API and unit test for reshape (#37232) · 79b49c20

Committed by YuanRisheng on Nov 16, 2021

* reshape kernel refactor

* fix compile bugs when run ci

* support xpu for reshape

* fix bugs when run unittest in kunlun ci

* fix compile bugs when run kunlun

* perfect code according to suggestion

* add api and unit test for reshape

79b49c20

for pure fp16 (#37230) · 6ebc318e
Committed by zhangkaihuo on Nov 16, 2021
```
Add pure fp16 support for fused transformer.
```
6ebc318e

Make FLAGS_determinstic effective in conv2d forward. (#37173) · ea47d211
Committed by Yiqun Liu on Nov 16, 2021
```
* Make FLAGS_determinstic effective in conv2d forward.

* Add call of SetCinnCudnnDeterministic in cinn_launch op.
```
ea47d211

added onednn elu kernel (#37149) · ae40ee32
Committed by jakpiase on Nov 16, 2021

ae40ee32

Fix attn_bias_add bug. (#37147) · a9e7a854

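Committed by Li Min on Nov 16, 2021

The fused_attention_op implementation uses bias_add, which is implemented with kernel primitives. The WriteData API of the kernel primitives and its internal implementation were later changed so that the out-of-bounds check moved into a template parameter. As a result the wrong branch was called, producing out-of-bounds writes that corrupted the contents of other GPU memory. Symptom: a single run of test_fused_attention_op_api.py usually does not fail, but running it repeatedly in a loop with inputs of different shapes gives incorrect results intermittently, which makes the bug hard to notice.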

a9e7a854
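
To make the failure mode in this commit message concrete, here is a small host-side simulation in Python. It models a tile writer that skips its bounds check unless a boundary flag is set, and shows how writing a partial tail tile without that flag overwrites an adjacent buffer. The function `write_data`, its parameters, and the flat-list "memory" are simplifications invented for illustration; they are not the real kernel primitive signature.

```python
def write_data(memory, offset, src, num, nx, is_boundary):
    """Copy a tile of up to nx values from src into memory at offset.

    With is_boundary=True only the first num valid values are written.
    With is_boundary=False all nx slots are written, which is only safe
    for full tiles.
    """
    count = min(num, nx) if is_boundary else nx
    for i in range(count):
        memory[offset + i] = src[i]


def demo(is_boundary_for_tail):
    # One flat "device memory": 6 output slots followed by a neighbor buffer.
    memory = [0] * 6 + [7, 7]
    nx = 4
    tile = [1, 1, 1, 1]
    write_data(memory, 0, tile, 4, nx, is_boundary=False)  # full tile
    write_data(memory, 4, tile, 2, nx, is_boundary_for_tail)  # partial tail
    return memory
```

Calling `demo(True)` leaves the neighbor values intact, while `demo(False)` overwrites them. Whether such corruption becomes visible depends on what happens to sit next to the output, which is consistent with the intermittent failures described above.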

[fleet_executor] Add sync method (#37167) · f49c2c23
Committed by Yuang Liu on Nov 16, 2021

f49c2c23

15 Nov, 2021 6 commits

[Pten] Refactor the implementation of custom operator (#37122) · 1e598f1a

Committed by Chen Weihang on Nov 15, 2021

* move extension into pten [no-verify]

* append tensor methods by ext_tensor [no-verify]

* append other tensor methods [no-verify]

* ext related files tidy [no-verify]

* include relation tidy [no-verify]

* add pten tensor test [no-verify]

* replace tensor in custom op & compile success

* refine tensor constructor for unittest

* custom relu jit run success

* fix all custom op unittests

* add inference cmake adapt [no-verify]

* fix failed unittests

* fix windows failed unittests

* try to fix kunlun and inference failed

* fix test_elementwise_api error

* try to fix win compile failed

* fix kunlun fp16 type error

* remove useless handle error macro

* add custom linear op test

* fix compile failed & add win symbols

* fix non pten kernel cast failed

* add dll decl for api

* polish several details

* polish details by review comment

* add dll_decl for register

1e598f1a

[new-exec] fix stream analysis (#37161) · 584b4b24

Committed by Leo Chen on Nov 15, 2021

* fix record_event

* refine class Instruction

* refine Instruction and InterpreterCore

* make instruction and operator_base consistent

* support NoNeedBufferVar in stream_analyzer

* fix place of event

* add vlog before continue

584b4b24

remove input dim check in op_teller and update ut (#37097) · 6b21bb0b

Committed by baoachun on Nov 15, 2021

* remove input dim check of activation in op_teller

* remove input dim check of concat in op_teller

* remove input dim check of clip in op_teller

* remove input dim check of scale in op_teller

* remove input dim check in op_teller

* update attr check of slice in op_teller

6b21bb0b

fix ctest depent probs (#37203) · cf958f2f
Committed by Yuang Liu on Nov 15, 2021

cf958f2f

fix 3 bug of new_executor (#37142) · 8358d614
Committed by wanghuancoder on Nov 15, 2021
```
* fix 3 bug, test=develop

* refine, test=develop
```
8358d614

fix:delete macro INFERENCE (#37130) · b628c316
Committed by feng_shuai on Nov 15, 2021

b628c316

SummerGao. / Paddle is up to date with the fork source project