Commits · b628c316148b072b73b38a6820c5328b4a782c79 · Crayon鑫 / Paddle

15 Nov, 2021 · 12 commits

fix:delete macro INFERENCE (#37130) · b628c316
Committed by feng_shuai on Nov 15, 2021

b628c316

Added BF16 to mean op (#37104) · df7cc457
Committed by arlesniak on Nov 15, 2021
```
* Added BF16 to mean op

* fix for CI

* fix for CI

* fix for CI
```
df7cc457

fix cinn_compile_test not pass problem (#37190) · 83eef6d2
Committed by jiangcheng on Nov 15, 2021

83eef6d2

[New features] Add elementwise_mul triple grad kernel (#37152) · 59fdf4da
Committed by Weilong Wu on Nov 15, 2021
```
* Add elementwise_mul triple grad kernel

* Removed InplaceInferer and polished code
```
59fdf4da

Accessor 20211112 2 (#37181) · 84b0ec97
Committed by zhaocaibei123 on Nov 15, 2021

84b0ec97

Add distributed pass framework: including PassBase/PassTest/PassUtils (#36643) · 12339fa0

Committed by Zeng Jinle on Nov 15, 2021

* add split_program

* make ut faster

* increase ut timeout

* make result deterministic

* add fuse_all_reduce pass

* add ut framework, update

* fix ut framework

* remove useless code

* add coverage support

* update

* fix CI

* fix some bugs and fix ci coverage

* fix conflict

12339fa0

graph-engine cache optimization (#37168) · b44db69f

Committed by seemingwang on Nov 15, 2021

* graph engine demo

* upload unsaved changes

* fix dependency error

* fix shard_num problem

* py client

* remove lock and graph-type

* add load direct graph

* add load direct graph

* add load direct graph

* batch random_sample

* batch_sample_k

* fix num_nodes size

* batch brpc

* batch brpc

* add test

* add test

* add load_nodes; change add_node function

* change sample return type to pair

* resolve conflict

* resolved conflict

* resolved conflict

* separate server and client

* merge pair type

* fix

* resolved conflict

* fixed segment fault; high-level VLOG for load edges and load nodes

* random_sample return 0

* rm useless loop

* test:load edge

* fix ret -1

* test: rm sample

* rm sample

* random_sample return future

* random_sample return int

* test fake node

* fixed here

* memory leak

* remove test code

* fix return problem

* add common_graph_table

* random sample node &test & change data-structure from linkedList to vector

* add common_graph_table

* sample with srand

* add node_types

* optimize nodes sample

* recover test

* random sample

* destruct weighted sampler

* GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* pybind sample nodes api

* pull nodes with step

* fixed pull_graph_list bug; add test for pull_graph_list by step

* add graph table;name

* add graph table;name

* add pybind

* add pybind

* add FeatureNode

* add FeatureNode

* add FeatureNode Serialize

* add FeatureNode Serialize

* get_feat_node

* avoid local rpc

* fix get_node_feat

* fix get_node_feat

* remove log

* get_node_feat return  py:bytes

* merge develop with graph_engine

* fix threadpool.h head

* fix

* fix typo

* resolve conflict

* fix conflict

* recover lost content

* fix pybind of FeatureNode

* recover cmake

* recover tools

* resolve conflict

* resolve linking problem

* code style

* change test_server port

* fix code problems

* remove shard_num config

* remove redundant threads

* optimize start server

* remove logs

* fix code problems by reviewers' suggestions

* move graph files into a folder

* code style change

* remove graph operations from base table

* optimize get_feat function of graph engine

* fix long long count problem

* remove redundant graph files

* remove unused shell

* recover dropout_op_pass.h

* fix potential stack overflow when request number is too large & node add & node clear & node remove

* when sample k is larger than neighbor num, return directly

* using random seed generator of paddle to speed up

* fix bug of random sample k

* fix code style

* fix code style

* add remove graph to fleet_py.cc

* fix blocking_queue problem

* fix style

* fix

* recover capacity check

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* fix distributed op combining problems

* optimize

* remove logs

* fix MultiSlotDataGenerator error

* cache for graph engine

* fix type compare error

* more test&fix thread terminating problem

* remove header

* change time interval of shrink

* use cache when sample nodes

* remove unused function

* change unique_ptr to shared_ptr

* simplify cache template

* cache api on client

* fix

* reduce sample threads when cache is not used

* reduce cache memory

* cache optimization

* remove test function

* remove extra fetch function
Co-authored-by: Huang Zhengjie <270018958@qq.com>
Co-authored-by: Weiyue Su <weiyue.su@gmail.com>
Co-authored-by: suweiyue <suweiyue@baidu.com>
Co-authored-by: luobin06 <luobin06@baidu.com>
Co-authored-by: liweibin02 <liweibin02@baidu.com>
Co-authored-by: tangwei12 <tangwei12@baidu.com>

b44db69f

fix bug of indexing with ellipsis (#37182) · f2a56c6a
Committed by zyfncg on Nov 15, 2021

f2a56c6a

add fetch op for cinn graph output node of build_cinn_pass (#37172) · 10cc040d
Committed by jiangcheng on Nov 15, 2021

10cc040d

Optimize Matmul_v2 (#37037) · 444a7358
Committed by Linjie Chen on Nov 15, 2021
```
Optimize dot product of Matmul_v2 
```
444a7358

modify sparse_attention docs, test=document_fix (#36554) · 6b0cc2b1
Committed by Liu-xiandong on Nov 15, 2021
```
* modify sparse_attention docs, test=develop

* add warning

* add warning ,test=document_fix
```
6b0cc2b1

[heterps]bug fix for local training with --heter_worker_num (#37166) · 31cd9145

Committed by zmx on Nov 15, 2021

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix ut. test=develop

* fix ut. test=develop

* fix ut. test=develop

31cd9145

14 Nov, 2021 · 1 commit

[PTen]Reshape Kernel Refactor (#37164) · 895692e3

Committed by YuanRisheng on Nov 14, 2021

* reshape kernel refactor

* fix compile bugs when run ci

* support xpu for reshape

* fix bugs when run unittest in kunlun ci

* fix compile bugs when run kunlun

* perfect code according to suggestion

895692e3

13 Nov, 2021 · 1 commit

cinn_launch_op: skip checking input variables must be used (#37119) · 228eb898

Committed by CtfGo on Nov 13, 2021

Modify several implementations on CinnLaunchOp:
1. Skip checking input variables must be used
2. Move current helper functions to a CinnlaunchContext

228eb898

12 Nov, 2021 · 22 commits

[fix]fix the bug of fused_attention and fused_feedforward (#36972) · 6486e242
Committed by zhangkaihuo on Nov 12, 2021
```
* fix bug:
1. atten: set the default value of attn_dropout_rate to None
2. ffn: add activation parameter
```
6486e242

add_fc_convert_layers_name (#37157) · a76b77a5
Committed by Wangzheee on Nov 12, 2021

a76b77a5

reduce graph-engine cache memory (#37155) · f584d378

Committed by seemingwang on Nov 12, 2021

* graph engine demo

* upload unsaved changes

* fix dependency error

* fix shard_num problem

* py client

* remove lock and graph-type

* add load direct graph

* add load direct graph

* add load direct graph

* batch random_sample

* batch_sample_k

* fix num_nodes size

* batch brpc

* batch brpc

* add test

* add test

* add load_nodes; change add_node function

* change sample return type to pair

* resolve conflict

* resolved conflict

* resolved conflict

* separate server and client

* merge pair type

* fix

* resolved conflict

* fixed segment fault; high-level VLOG for load edges and load nodes

* random_sample return 0

* rm useless loop

* test:load edge

* fix ret -1

* test: rm sample

* rm sample

* random_sample return future

* random_sample return int

* test fake node

* fixed here

* memory leak

* remove test code

* fix return problem

* add common_graph_table

* random sample node &test & change data-structure from linkedList to vector

* add common_graph_table

* sample with srand

* add node_types

* optimize nodes sample

* recover test

* random sample

* destruct weighted sampler

* GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* pybind sample nodes api

* pull nodes with step

* fixed pull_graph_list bug; add test for pull_graph_list by step

* add graph table;name

* add graph table;name

* add pybind

* add pybind

* add FeatureNode

* add FeatureNode

* add FeatureNode Serialize

* add FeatureNode Serialize

* get_feat_node

* avoid local rpc

* fix get_node_feat

* fix get_node_feat

* remove log

* get_node_feat return  py:bytes

* merge develop with graph_engine

* fix threadpool.h head

* fix

* fix typo

* resolve conflict

* fix conflict

* recover lost content

* fix pybind of FeatureNode

* recover cmake

* recover tools

* resolve conflict

* resolve linking problem

* code style

* change test_server port

* fix code problems

* remove shard_num config

* remove redundant threads

* optimize start server

* remove logs

* fix code problems by reviewers' suggestions

* move graph files into a folder

* code style change

* remove graph operations from base table

* optimize get_feat function of graph engine

* fix long long count problem

* remove redundant graph files

* remove unused shell

* recover dropout_op_pass.h

* fix potential stack overflow when request number is too large & node add & node clear & node remove

* when sample k is larger than neighbor num, return directly

* using random seed generator of paddle to speed up

* fix bug of random sample k

* fix code style

* fix code style

* add remove graph to fleet_py.cc

* fix blocking_queue problem

* fix style

* fix

* recover capacity check

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* fix distributed op combining problems

* optimize

* remove logs

* fix MultiSlotDataGenerator error

* cache for graph engine

* fix type compare error

* more test&fix thread terminating problem

* remove header

* change time interval of shrink

* use cache when sample nodes

* remove unused function

* change unique_ptr to shared_ptr

* simplify cache template

* cache api on client

* fix

* reduce sample threads when cache is not used

* reduce cache memory
Co-authored-by: Huang Zhengjie <270018958@qq.com>
Co-authored-by: Weiyue Su <weiyue.su@gmail.com>
Co-authored-by: suweiyue <suweiyue@baidu.com>
Co-authored-by: luobin06 <luobin06@baidu.com>
Co-authored-by: liweibin02 <liweibin02@baidu.com>
Co-authored-by: tangwei12 <tangwei12@baidu.com>

f584d378

infershape to infermeta (#37107) · 0fc9919b
Committed by Chen Weihang on Nov 12, 2021

0fc9919b

fix test_scale_op skipped test (#37153) · ca7f1cd2
Committed by Chen Weihang on Nov 12, 2021

ca7f1cd2

block showincudes (#37071) · d7d22640
Committed by Sing_chan on Nov 12, 2021

d7d22640

[fleet_executor] handle empty addr for single card train (#37150) · 2c7870e0
Committed by Yuang Liu on Nov 12, 2021

2c7870e0

[fleet_executor] Add interceptor ping pong test (#37143) · 742378f4
Committed by WangXi on Nov 12, 2021

742378f4

add fetch to black list (#37123) · 63c8c8c2
Committed by JingZhuangzhuang on Nov 11, 2021

63c8c8c2

Fix Paddle-CINN CI (#37145) · e8ce3311
Committed by Huihuang Zheng on Nov 12, 2021
```
Fix Paddle-CINN CI
```
e8ce3311

brpc_ps_client upgrade (#36943) · bd79ae8a

Committed by zhaocaibei123 on Nov 12, 2021

* test

* rm test

* add memory_sparse_table and brpc communication upgrade dependency

* fix

* add dense optimizer & fix dump bug & add some strategy fields

* fix

* fix

* remove thread_pool thread_queue

* add memory sparse table

* update memory sparse table

* update memory sparse table

* update cmake

* upgrade brpc_ps_client

* remove show/click_const in ctr_accessor

* fix deconstructor

bd79ae8a

[PTen] Adjust the param of full_like API in pten (#37088) · abd4ab9c

Committed by zyfncg on Nov 12, 2021

* adjust the param of full_like api  in pten

* adjust the code format

* adjust the code format

* adjust the code format

abd4ab9c

Refine new executor (#37074) · 1fe4513c

Committed by Leo Chen on Nov 12, 2021

* split declaration and implementation

* remove initdevices

* refine VariableMetaInfo

* add ut

* fix compile

1fe4513c

[heterps]fix ut for heter_pipeline_trainer.cc (#37136) · 0a92c857
Committed by zmx on Nov 12, 2021
```
* fix ut. test=develop

* fix ut. test=develop
```
0a92c857

[Paddle-Inference] fix_qkv_plugin: fix half scale (#37096) · 36154ba9
Committed by Wangzheee on Nov 12, 2021
```
* fix_qkv_plugin: half_scale

* [Paddle-Inference] fix_qkv_plugin: fix half scale
```
36154ba9

[CPU-PSLIB] Fix bug for consistency insepection of op's embedding name and... · 9574bcd7

Committed by Fan Zhang on Nov 12, 2021

[CPU-PSLIB] Fix bug for consistency insepection of op's embedding name and sparse table name in config_fleet.py (#36753)

* [CPU-PSLIB] Fix bug for consistency insepection of op's embedding name and sparse table name in config_fleet.py

* [CPU-PSLIB] Fix bug for consistency insepection of op's embedding name and sparse table name in config_fleet.py

9574bcd7

add the shallow clone member func of the dense tensor, test=develop (#37146) · 9303b095
Committed by 石晓伟 on Nov 12, 2021

9303b095

adjust the COLUMNS=128; (#37120) · 4d536678
Committed by 石晓伟 on Nov 12, 2021

4d536678

[NPU] fix fill_constant and test_memcpy_op_npu (#37144) · 9396f286
Committed by Aganlengzi on Nov 12, 2021

9396f286

[AutoParallel] Add AutoConvert (#36958) · 1773afd7

Committed by zhaoyingli on Nov 12, 2021

* add AutoConvert

* add unitest

* amend merge&slice

* amend default dist_attr

* update doc&improve coverage

* add interface dist_context

* tiny modify

1773afd7

[Pten]Refactor the Elementwise_add Kernel (#37043) · c1310343

Committed by YuanRisheng on Nov 12, 2021

* elementwise_add kernel refactor

* fix compile bugs in elementwise_add refactor

* fix compile bugs when run in npu/xpu

* fix bugs when run unit test

* fix bugs when run ci-windows

* modify code as recommended

* code format adjust

* fix bugs when run ci

* fix compile bug when run in ci-windows

c1310343

[fleet_executor] Parse rank_to_ip map on cpp side and start message bus. (#37126) · 6bf208c3
Committed by Yuang Liu on Nov 12, 2021

6bf208c3

11 Nov, 2021 · 4 commits
- Update readme 2.2 (#37148) · 778a3630
  Committed by Chen Long on Nov 11, 2021
```
* update readme test=document_fix

* update readme;test=document_fix
```
  778a3630
- Fix unit test for send_and_recv_cpu & send_and_recv_gpu (#37129) · a41447f0
  Committed by zmx on Nov 11, 2021
  
  a41447f0
- remove repeated linalg in __all__ (#37117) · 357425d8
  Committed by zhouweiwei2014 on Nov 11, 2021
  
  357425d8
- [Bug fixes] Add default arg to enhance varbase ClearGradient func (#36837) · 63f5c2d4
  Committed by Weilong Wu on Nov 11, 2021
```
* Add default arg to enhance varbase ClearGradient func

* Removed default arg, use a Flag to enhance varbase ClearGradient func

* Renamed Flags to FLAGS_real_release

* Use default arg to enhance varbase ClearGradient func and expose two func to set/get gradient isEmpty

* Removed DECLARE_bool statement

* Polished Code
```
  63f5c2d4

Crayon鑫 / Paddle is up to date with the fork source project