Commits · bb2733fa2f399187017e967f506816ab4a99d3b3 · PaddlePaddle / Paddle

19 Nov, 2021 · 21 commits

Add dygraph triple grad test, broadcast case (#37377) · bb2733fa
Committed by Weilong Wu on Nov 19, 2021

bug fix shard_index (#37042) · b505ff96
Committed by lilong12 on Nov 19, 2021

add new API paddle.nn.initializer.Orthogonal and calculate_gain (#37163) · 62ad3594
Committed by zhouweiwei2014 on Nov 19, 2021
```
* add new API paddle.nn.initializer.Orthogonal and calculate_gain

* fix comment

* fix comment
```

Fix runtime graph on gpt, add debug message (#37361) · af83e79a
Committed by LiYuRio on Nov 19, 2021

Optimize cinn_cache_key by replace GraphToProgram to Dot string (#37317) · edc3496f
Committed by jiangcheng on Nov 19, 2021
```
* optimize cache-key by replace GraphToProgram to Dot string

* fix compile failure bug
```

Fix CI bug caused by type of TensorMeta (#37373) · d29cc7b4

Committed by zyfncg on Nov 19, 2021

* rename TensorBase interface data_type() to dtype()

* rename type to dtype of TensorMeta

* merge the code

* merge the code

* fix the problem when merge conflict

* fix bug of ci caused by type of tensor_meta

* changes cmake to clear cache

make third_party's cmake get source code directly (#37332) · da5fb1d4
Committed by Sing_chan on Nov 19, 2021

Add fuse_resnet_unit pass (#36818) · 3cd3bf29

Committed by wuhuanzhou on Nov 19, 2021

* GeneratePass support attr condition and mapping, test=develop

* fix coverage, test=develop

* Add fuse_resnet_unit pass, test=develop

* fix CI errors, test=develop

* fix CI errors, test=develop

* fix unittest error when compiling without CUDA, test=develop

* fix static ci error, test=develop

* limit kernel size must equal 1, test=develop

fix for cufft: some early versions of cufft do not define CUFFT_VERSION in the header (#37312) · d8191d06
Committed by Feiyu Chan on Nov 19, 2021

fix bug in save_inference_model (#37362) · 77bca4de
Committed by wangguanqun on Nov 19, 2021

Update OP-benchmark CI scripts (#37360) · 2e758325
Committed by tianshuo78520a on Nov 19, 2021
```
Update OP-benchmark CI scripts
```

Refactor dygraph to eager (#37318) · b962f5fe

Committed by Jiabin Yang on Nov 19, 2021

* Add EagerTensor and tests

* remove useless enforce

* remove comment in cmake

* fix test_error

* add depends on python

optimize graph-engine sample api's data-transfer process (#37341) · 9fc11db7

Committed by seemingwang on Nov 19, 2021

* graph engine demo

* upload unsaved changes

* fix dependency error

* fix shard_num problem

* py client

* remove lock and graph-type

* add load direct graph

* add load direct graph

* add load direct graph

* batch random_sample

* batch_sample_k

* fix num_nodes size

* batch brpc

* batch brpc

* add test

* add test

* add load_nodes; change add_node function

* change sample return type to pair

* resolve conflict

* resolved conflict

* resolved conflict

* separate server and client

* merge pair type

* fix

* resolved conflict

* fixed segment fault; high-level VLOG for load edges and load nodes

* random_sample return 0

* rm useless loop

* test:load edge

* fix ret -1

* test: rm sample

* rm sample

* random_sample return future

* random_sample return int

* test fake node

* fixed here

* memory leak

* remove test code

* fix return problem

* add common_graph_table

* random sample node &test & change data-structure from linkedList to vector

* add common_graph_table

* sample with srand

* add node_types

* optimize nodes sample

* recover test

* random sample

* destruct weighted sampler

* GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* pybind sample nodes api

* pull nodes with step

* fixed pull_graph_list bug; add test for pull_graph_list by step

* add graph table;name

* add graph table;name

* add pybind

* add pybind

* add FeatureNode

* add FeatureNode

* add FeatureNode Serialize

* add FeatureNode Serialize

* get_feat_node

* avoid local rpc

* fix get_node_feat

* fix get_node_feat

* remove log

* get_node_feat return  py:bytes

* merge develop with graph_engine

* fix threadpool.h head

* fix

* fix typo

* resolve conflict

* fix conflict

* recover lost content

* fix pybind of FeatureNode

* recover cmake

* recover tools

* resolve conflict

* resolve linking problem

* code style

* change test_server port

* fix code problems

* remove shard_num config

* remove redundant threads

* optimize start server

* remove logs

* fix code problems by reviewers' suggestions

* move graph files into a folder

* code style change

* remove graph operations from base table

* optimize get_feat function of graph engine

* fix long long count problem

* remove redundant graph files

* remove unused shell

* recover dropout_op_pass.h

* fix potential stack overflow when request number is too large & node add & node clear & node remove

* when sample k is larger than neighbor num, return directly

* using random seed generator of paddle to speed up

* fix bug of random sample k

* fix code style

* fix code style

* add remove graph to fleet_py.cc

* fix blocking_queue problem

* fix style

* fix

* recover capacity check

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* fix distributed op combining problems

* optimize

* remove logs

* fix MultiSlotDataGenerator error

* cache for graph engine

* fix type compare error

* more test&fix thread terminating problem

* remove header

* change time interval of shrink

* use cache when sample nodes

* remove unused function

* change unique_ptr to shared_ptr

* simplify cache template

* cache api on client

* fix

* reduce sample threads when cache is not used

* reduce cache memory

* cache optimization

* remove test function

* remove extra fetch function

* graph-engine data transfer optimization
Co-authored-by: Huang Zhengjie <270018958@qq.com>
Co-authored-by: Weiyue Su <weiyue.su@gmail.com>
Co-authored-by: suweiyue <suweiyue@baidu.com>
Co-authored-by: luobin06 <luobin06@baidu.com>
Co-authored-by: liweibin02 <liweibin02@baidu.com>
Co-authored-by: tangwei12 <tangwei12@baidu.com>

[PTen] Rename TensorMeta member type to dtype (#37277) · c13edf66

Committed by zyfncg on Nov 19, 2021

* rename TensorBase interface data_type() to dtype()

* rename type to dtype of TensorMeta

* merge the code

* merge the code

* fix the problem when merge conflict

[PTen] Add copy_to and to method for Tensor (#37262) · 5a000900

Committed by Chen Weihang on Nov 18, 2021

* add copy_to and to method for Tensor

* polish msg format

* fix details error

* fix copy_to test compile failed

* fix typo

Add paddle.incubate.graph_send_recv API (#37205) · 39012536

Committed by Siming Dai on Nov 19, 2021

* add cpu version, using set: sum, min, max

* add cpu version: mean

* improve cpu code and fix dynamic memory allocation problem

* fix arg error, add index judge, delete fp16

* fix bug in CudaAtomicMax and CudaAtomicMin

* add CUDA version

* fix grad_op bug for index

* add op test, add correct cpu grad op

* Add correct CUDA Mean grad

* [Add] Successful MEAN and SUM

* [Add] Successful MIN and MAX in CPU

* [Add] Successful MIN and MAX in CUDA

* fix windows dtype ci

* fix ROCM ci by adding HIP flag

* rename fused_gather_scatter to send_recv

* unify name as send and recv

* change zero index return time

* add send_recv incubate api

* fix index data type, add unittest case for API

* delete redundant input tensor

* fix en example and docs, add default value in pool_type

* add shape judge and max grid judge

* fix comment

* fix index type bug

* add const &

* fix en docs

* delete numpy in examples

* add unittest for int input

* fix send_recv comment

* change send_recv to graph_send_recv
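
The gather-then-scatter idea behind this commit (originally named `fused_gather_scatter`, with `sum`, `mean`, `min` and `max` pooling) can be illustrated with a small pure-Python sketch. This is not Paddle's kernel: the function name `graph_send_recv_ref` is hypothetical, the argument names merely mirror the commit's wording, and the choice to leave rows that receive no message at zero is an assumption made for the illustration.

```python
from collections import defaultdict


def graph_send_recv_ref(x, src_index, dst_index, pool_type="sum"):
    """Hypothetical reference sketch: gather rows of x at src_index and
    reduce them into the rows named by dst_index."""
    if pool_type not in ("sum", "mean", "min", "max"):
        raise ValueError(f"unsupported pool_type: {pool_type}")
    if len(src_index) != len(dst_index):
        raise ValueError("src_index and dst_index must have the same length")

    width = len(x[0]) if x else 0

    # "Send": collect every message (a gathered source row) per destination.
    inbox = defaultdict(list)
    for src, dst in zip(src_index, dst_index):
        inbox[dst].append(x[src])

    # Assumption: destinations without messages keep an all-zero row.
    out = [[0] * width for _ in range(len(x))]

    # "Recv": reduce the collected messages column by column.
    for dst, messages in inbox.items():
        columns = list(zip(*messages))
        if pool_type == "sum":
            out[dst] = [sum(col) for col in columns]
        elif pool_type == "mean":
            out[dst] = [sum(col) / len(col) for col in columns]
        elif pool_type == "min":
            out[dst] = [min(col) for col in columns]
        else:  # "max"
            out[dst] = [max(col) for col in columns]
    return out


if __name__ == "__main__":
    features = [[0, 2, 3], [1, 4, 5], [2, 6, 7]]
    print(graph_send_recv_ref(features, [0, 1, 2, 0], [1, 2, 1, 0], "sum"))
```

Here node 1 receives the rows of nodes 0 and 2, so its output row is their element-wise reduction; the other nodes each receive a single row.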

[fleet_executor] Parse pipeline config (#37319) · ca088f92
Committed by Yuang Liu on Nov 19, 2021

[fleet_executor] Add interceptor register (#37338) · f11e843a
Committed by WangXi on Nov 19, 2021

[PTen] Add compatible reshape method for Tensor (#37281) · 715fd051
Committed by Chen Weihang on Nov 18, 2021
```
* add reshape method for Tensor

* fix typo

* fix typo

* fix conflict with develop
```

fix cmake dependence error (#37304) · 6653ac5e
Committed by LiYuRio on Nov 19, 2021

[Dy2stat]Support `for i in [1,2,3]` statements in dy2stat (#37259) · d772a9aa
Committed by 0x45f on Nov 19, 2021
```
* support `for i in [1,2,3]` statements in dy2stat

* add test case

* fix ci

* remove wrong code
```

18 Nov, 2021 · 17 commits

[heterps]change default executor for heter trainer (#37314) · c98d175d

Committed by zmx on Nov 18, 2021

* fix pslib. test=develop

* add device to train_from_dataset. test=develop

* refine fleet.stop_worker. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix executor & ut. test=develop

* fix executor & ut. test=develop

* fix executor & ut. test=develop

test=document_fix (#37330) · 8fd8780e
Committed by tianshuo78520a on Nov 18, 2021

Fix for wrong results in segmentation models (#37310) · c1802f91
Committed by jakpiase on Nov 18, 2021
```
* fix

* ci rerun

* ci rerun

* ci Rerun
```

optimize the data structure to speed up sampling in graph engine. (#37315) · 521a274e
Committed by Webbley on Nov 18, 2021
```
* optimize the data structure from c++ to python to speed up sampling in graph engine

* update test
```

fix bug to support dropout eval grad computing. (#37305) · c3d3001f
Committed by Li Min on Nov 18, 2021
```
* fix bug to support dropout eval grad computing.

* Remove useless code.
```

Optimize fleet elastic scale in/out (#37177) · 6d34d266

Committed by xiayanming on Nov 18, 2021

* fleet support elastic train

* fleet support elastic train

* support elastic

* add unittest

* fix unittest bug

* fix unittest bug

* fix unittest bug

* fix unittest coverage

* fix unittest coverage

* fix unittest coverage

* fix unittest coverage

* fix unittest coverage

* fix elastic bug

* fix ci fail

* fix ci fail

* fix elastic bug

* fix elastic bug

* fix joint debugging bug

* fix joint debugging bug

* fix windows ci failed

* fix windows ci failed

* Optimize fleet elastic scale in/out

* elastic support pre hook

* add prehook unittest

[PTen]elementwise_sub kernel refactor (#37260) · 36a95654

Committed by YuanRisheng on Nov 18, 2021

* elementwise_add kernel refactor

* fix compile bugs in elementwise_add refactor

* fix compile bugs when run in npu/xpu

* fix bugs when run unit test

* fix bugs when run ci-windows

* modify code as recommended

* code format adjust

* fix bugs when run ci

* fix compile bug when run in ci-windows

* elementwise_sub refactor

* add PD_DLL_DECL for elementwise_sub

* fix bugs when compiling

Fix Layer.to() of device bug (#37156) · 706a7897
Committed by zhangbo9674 on Nov 18, 2021

update unittest timeout (#37279) · 34a44d59
Committed by Shang Zhizhou on Nov 18, 2021

[heterps]add heterps mode judgement (#37298) · dd7189ff
Committed by zmx on Nov 18, 2021

[fleet_executor] Parse runtime graph to start carrier (#37282) · f85bd5c9
Committed by Yuang Liu on Nov 18, 2021

polish unittest of test_pretrained_model (#37307) · 38141036
Committed by LielinJiang on Nov 18, 2021
```
* fix cache

* Fix unittest
```

Fix the slow running speed of kl_div when option 'reduction' is set (#37283) · a6e9ff85
Committed by LielinJiang on Nov 18, 2021
```
* Fix the slow running speed of kl_div when option reduction is set

* fix unittest coverage
```

Fix the issue of disordered loading cifar data (#37272) · 99909520
Committed by LielinJiang on Nov 18, 2021

add benchmark ci (#37295) · 6a813d83
Committed by tianshuo78520a on Nov 18, 2021
```
* add benchmark
```

Add the `GetFetchNames` method in CinnGraphSymbolization. (#37218) · 3ad495e8

Committed by Zhen Wang on Nov 18, 2021

* Add the `GetFetchNames` method in CinnGraphSymbolization.

* Use unordered_set instead of vector as the type of fetch_var_names.

* Reuse the definition of kCompilationKey.

* Use CompileOptions to set fetch_var_ids.

* Update the argument passing of GraphCompiler.Build.

* Fix some bugs in CinnGraphSymbolization::GetFetchIds.

Opt topk (#37256) · c4862d99

Committed by zhangkaihuo on Nov 18, 2021

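topk has two implementations: one based on cub and one hand-written kernel. The cub version obtains the top k by sorting, and measurements over multiple data sets show that it outperforms the hand-written kernel only when input_width >= 128 and k exceeds 75% of input_width.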
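
The selection rule described in this commit message can be written down as a small predicate. The sketch below only restates the two thresholds quoted in the message (width at least 128, k above 75% of the width); the function name `prefer_sort_based_topk` is hypothetical, and the actual dispatch code inside the operator may well look different.

```python
def prefer_sort_based_topk(input_width: int, k: int) -> bool:
    """Hypothetical helper: True when the sort-based (cub) path is expected
    to beat the hand-written kernel, per the thresholds in the commit message."""
    if input_width <= 0 or not 0 < k <= input_width:
        raise ValueError("expected 0 < k <= input_width")
    # k > 75% of input_width, written with integers to avoid float rounding.
    return input_width >= 128 and 4 * k > 3 * input_width


if __name__ == "__main__":
    for width, k in [(64, 60), (128, 96), (128, 97), (1024, 10)]:
        path = "sort-based (cub)" if prefer_sort_based_topk(width, k) else "hand-written"
        print(f"input_width={width}, k={k}: {path}")
```

Sorting the whole row costs roughly the same regardless of k, which is consistent with it paying off only when most of a sufficiently wide row has to be returned anyway.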

17 Nov, 2021 · 2 commits

Replace custom IOHW -> OIHW reorder with built-in oneDNN reorder (#37175) · 162ac048

Committed by Sławomir Siwek on Nov 17, 2021

* Use oneDNN reorder instead of custom one

* Fix whitespace typo

* Fix Code format error

* Incorporating feedback

* Remove unnecessary reorder

* Support GIOHW format

* Fix code format error

[new-exec] Refine standalone executor (#37278) · 6d6642c8
Committed by Leo Chen on Nov 17, 2021
```
* init

* add feed ops in python side

* import LRScheduler

* update_feed

* refine code format
```

PaddlePaddle / Paddle · synced successfully about 1 year ago
