Commits · 1223238f734664bc4d902a70d4d622ebd3591836 · PaddlePaddle / Paddle

17 Nov, 2021 7 commits
- add ut parallel (#37211) · 1223238f
  Committed by zhangchunle on Nov 17, 2021
- copy beta pow to same place when skip_update=1 (#37245) · 5e4b419b
  Committed by Leo Chen on Nov 17, 2021
```
* copy beta pow to same place when skip_update=1

* fix xpu
```
- rename TensorBase interface data_type() to dtype() (#37257) · 1e9b3a3d
  Committed by zyfncg on Nov 17, 2021
- [Fleet Executor] Construct runtime graph (#37158) · 0daa69d4
  Committed by LiYuRio on Nov 17, 2021
- [npu][hybrid] support offload (#37224) · 762819a8
  Committed by WangXi on Nov 17, 2021
- [Einsum] correct output dimension errors. (#37222) · 5237cc05
  Committed by Tongxin Bai on Nov 17, 2021
```
* [Einsum] correct output dimension errors due to single element tensors.

* [Einsum] format polish.
```
- Dependence analysis (#37231) · d943459b
  Committed by xiongkun on Nov 17, 2021
```
* add

* add BuildOperatorDependences

* fix bug

* add unittest for write after write

* fix merge bug

* fix
```
16 Nov, 2021 16 commits
- decrease pten log level (#37239) · d8982c52
  Committed by Chen Weihang on Nov 16, 2021
- Added BF16 Pool2d grad (#37081) · f95d44a2
  Committed by arlesniak on Nov 16, 2021
```
* Added BF16 Pool2d grad

* upstream pulled

* fix for CI

* fixes after review
```
- [psgpu]fix pipe bug:save and pull overlap; test=develop (#37233) · 62ec644f
  Committed by danleifeng on Nov 16, 2021
- Fix the logic of VarBase _to func (#37193) · f29a3c68
  Committed by Weilong Wu on Nov 16, 2021
- refine pass by removing CommOpt, CalcOpt, ParallelOpt (#37206) · 4c160be2
  Committed by Zeng Jinle on Nov 16, 2021
- Removed unnecessary ENFORCE statement (#37219) · 70b7c7ed
  Committed by Weilong Wu on Nov 16, 2021
- Add API and unit test for reshape (#37232) · 79b49c20
  Committed by YuanRisheng on Nov 16, 2021
```
* reshape kernel refactor

* fix compile bugs when run ci

* support xpu for reshape

* fix bugs when run unittest in kunlun ci

* fix compile bugs when run kunlun

* perfect code according to suggestion

* add api and unit test for reshape
```
- for pure fp16 (#37230) · 6ebc318e
  Committed by zhangkaihuo on Nov 16, 2021
```
Add pure fp16 support for fused transformer.
```
- test=document_fix (#37234) · 56810f45
  Committed by tianshuo78520a on Nov 16, 2021
- Make Distributed Pass UT Timeout Smaller (#37199) · a01e27cc
  Committed by Zeng Jinle on Nov 16, 2021
```
* make pass ut timeout smaller

* increase ut timeout
```
- Make FLAGS_determinstic effective in conv2d forward. (#37173) · ea47d211
  Committed by Yiqun Liu on Nov 16, 2021
```
* Make FLAGS_determinstic effective in conv2d forward.

* Add call of SetCinnCudnnDeterministic in cinn_launch op.
```
- modify long time ut list (#37220) · 5091fed7
  Committed by Sing_chan on Nov 16, 2021
- added onednn elu kernel (#37149) · ae40ee32
  Committed by jakpiase on Nov 16, 2021
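- Fix attn_bias_add bug. (#37147) · a9e7a854
  Committed by Li Min on Nov 16, 2021
```
The fused_attention_op implementation uses bias_add, which is implemented with kernel primitives. The kernel primitive WriteData API and its internal implementation were later changed: the out-of-bounds check was moved into a template parameter, so the call took the wrong branch and performed out-of-bounds writes that corrupted other GPU memory. Symptom: a single run of test_fused_attention_op_api.py almost never fails, but running it repeatedly in a loop with inputs of different shapes gives incorrect results intermittently, so the bug is hard to notice.
```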
- supports the slice of upper tensor, test=develop (#37215) · c5ccff73
  Committed by 石晓伟 on Nov 16, 2021
- [fleet_executor] Add sync method (#37167) · f49c2c23
  Committed by Yuang Liu on Nov 16, 2021
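The attn_bias_add fix (#37147) in the list above describes a tail-tile write that skipped its bounds check and overwrote neighbouring memory, but only for some input shapes. The following is a minimal Python simulation of that failure mode, not Paddle's kernel primitive code: `TILE`, `write_tile`, `bias_add`, `run`, and the flat `memory` list are all hypothetical names chosen for illustration.

```python
# Hypothetical simulation of an unchecked tail-tile write.
# None of these names come from Paddle; they only illustrate the bug class.
TILE = 4  # number of elements one simulated write call covers


def write_tile(memory, offset, values, remaining, is_boundary):
    """Write one tile into `memory` starting at `offset`.

    With is_boundary=True only `remaining` elements are written;
    with is_boundary=False the full tile is written unconditionally.
    """
    count = min(TILE, remaining) if is_boundary else TILE
    for i in range(count):
        memory[offset + i] = values[i]


def bias_add(memory, out_offset, n, bias, boundary_check_on_tail):
    """Fill n output elements tile by tile; the last tile may be partial."""
    for start in range(0, n, TILE):
        remaining = n - start
        is_tail = remaining < TILE
        write_tile(memory, out_offset + start, [bias] * TILE, remaining,
                   is_boundary=is_tail and boundary_check_on_tail)


def run(n, boundary_check_on_tail):
    """Return (output, neighbour) after writing n elements."""
    # Flat "device memory": n output slots followed by an unrelated buffer.
    memory = [0] * n + [7, 7, 7, 7]
    bias_add(memory, 0, n, 1, boundary_check_on_tail)
    return memory[:n], memory[n:]
```

In this sketch, `run(8, False)` leaves the neighbouring buffer untouched because 8 is a multiple of `TILE`, while `run(6, False)` overwrites its first two slots. That shape dependence is consistent with the commit's remark that a single test run rarely fails while looping over different shapes does.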
15 Nov, 2021 17 commits

[Pten] Refactor the implementation of custom operator (#37122) · 1e598f1a

Committed by Chen Weihang on Nov 15, 2021

* move extension into pten [no-verify]

* append tensor methods by ext_tensor [no-verify]

* append other tensor methods [no-verify]

* ext related files tidy [no-verify]

* include relation tidy [no-verify]

* add pten tensor test [no-verify]

* replace tensor in custom op & compile success

* refine tensor constructor for unittest

* custom relu jit run success

* fix all custom op unittests

* add inference cmake adapt [no-verify]

* fix failed unittests

* fix windows failed unittests

* try to fix kunlun and inference failed

* fix test_elementwise_api error

* try to fix win compile failed

* fix kunlun fp16 type error

* remove useless handle error macro

* add custom linear op test

* fix compile failed & add win symbols

* fix non pten kernel cast failed

* add dll decl for api

* polish several details

* polish details by review comment

* add dll_decl for register

[new-exec] fix stream analysis (#37161) · 584b4b24

Committed by Leo Chen on Nov 15, 2021

* fix record_event

* refine class Instruction

* refine Instruction and InterpreterCore

* make instruction and operator_base consistent

* support NoNeedBufferVar in stream_analyzer

* fix place of event

* add vlog before continue

remove needless declare (#37195) · 9c591703

Committed by Chen Weihang on Nov 15, 2021

remove input dim check in op_teller and update ut (#37097) · 6b21bb0b

Committed by baoachun on Nov 15, 2021

* remove input dim check of activation in op_teller

* remove input dim check of concat in op_teller

* remove input dim check of clip in op_teller

* remove input dim check of scale in op_teller

* remove input dim check in op_teller

* update attr check of slice in op_teller

fix ctest depent probs (#37203) · cf958f2f

Committed by Yuang Liu on Nov 15, 2021

fix 3 bug of new_executor (#37142) · 8358d614

Committed by wanghuancoder on Nov 15, 2021

```
* fix 3 bug, test=develop

* refine, test=develop
```

fix:delete macro INFERENCE (#37130) · b628c316

Committed by feng_shuai on Nov 15, 2021

Added BF16 to mean op (#37104) · df7cc457

Committed by arlesniak on Nov 15, 2021

```
* Added BF16 to mean op

* fix for CI

* fix for CI

* fix for CI
```

fix cinn_compile_test not pass problem (#37190) · 83eef6d2

Committed by jiangcheng on Nov 15, 2021

[New features] Add elementwise_mul triple grad kernel (#37152) · 59fdf4da

Committed by Weilong Wu on Nov 15, 2021

```
* Add elementwise_mul triple grad kernel

* Removed InplaceInferer and polished code
```

Accessor 20211112 2 (#37181) · 84b0ec97

Committed by zhaocaibei123 on Nov 15, 2021

Add distributed pass framework: including PassBase/PassTest/PassUtils (#36643) · 12339fa0

Committed by Zeng Jinle on Nov 15, 2021

* add split_program

* make ut faster

* increase ut timeout

* make result deterministic

* add fuse_all_reduce pass

* add ut framework, update

* fix ut framework

* remove useless code

* add coverage support

* update

* fix CI

* fix some bugs and fix ci coverage

* fix conflict

graph-engine cache optimization (#37168) · b44db69f

Committed by seemingwang on Nov 15, 2021

* graph engine demo

* upload unsaved changes

* fix dependency error

* fix shard_num problem

* py client

* remove lock and graph-type

* add load direct graph

* add load direct graph

* add load direct graph

* batch random_sample

* batch_sample_k

* fix num_nodes size

* batch brpc

* batch brpc

* add test

* add test

* add load_nodes; change add_node function

* change sample return type to pair

* resolve conflict

* resolved conflict

* resolved conflict

* separate server and client

* merge pair type

* fix

* resolved conflict

* fixed segment fault; high-level VLOG for load edges and load nodes

* random_sample return 0

* rm useless loop

* test:load edge

* fix ret -1

* test: rm sample

* rm sample

* random_sample return future

* random_sample return int

* test fake node

* fixed here

* memory leak

* remove test code

* fix return problem

* add common_graph_table

* random sample node &test & change data-structure from linkedList to vector

* add common_graph_table

* sample with srand

* add node_types

* optimize nodes sample

* recover test

* random sample

* destruct weighted sampler

* GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* pybind sample nodes api

* pull nodes with step

* fixed pull_graph_list bug; add test for pull_graph_list by step

* add graph table;name

* add graph table;name

* add pybind

* add pybind

* add FeatureNode

* add FeatureNode

* add FeatureNode Serialize

* add FeatureNode Serialize

* get_feat_node

* avoid local rpc

* fix get_node_feat

* fix get_node_feat

* remove log

* get_node_feat return  py:bytes

* merge develop with graph_engine

* fix threadpool.h head

* fix

* fix typo

* resolve conflict

* fix conflict

* recover lost content

* fix pybind of FeatureNode

* recover cmake

* recover tools

* resolve conflict

* resolve linking problem

* code style

* change test_server port

* fix code problems

* remove shard_num config

* remove redundant threads

* optimize start server

* remove logs

* fix code problems by reviewers' suggestions

* move graph files into a folder

* code style change

* remove graph operations from base table

* optimize get_feat function of graph engine

* fix long long count problem

* remove redundant graph files

* remove unused shell

* recover dropout_op_pass.h

* fix potential stack overflow when request number is too large & node add & node clear & node remove

* when sample k is larger than neighbor num, return directly

* using random seed generator of paddle to speed up

* fix bug of random sample k

* fix code style

* fix code style

* add remove graph to fleet_py.cc

* fix blocking_queue problem

* fix style

* fix

* recover capacity check

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* fix distributed op combining problems

* optimize

* remove logs

* fix MultiSlotDataGenerator error

* cache for graph engine

* fix type compare error

* more test&fix thread terminating problem

* remove header

* change time interval of shrink

* use cache when sample nodes

* remove unused function

* change unique_ptr to shared_ptr

* simplify cache template

* cache api on client

* fix

* reduce sample threads when cache is not used

* reduce cache memory

* cache optimization

* remove test function

* remove extra fetch function
Co-authored-by: Huang Zhengjie <270018958@qq.com>
Co-authored-by: Weiyue Su <weiyue.su@gmail.com>
Co-authored-by: suweiyue <suweiyue@baidu.com>
Co-authored-by: luobin06 <luobin06@baidu.com>
Co-authored-by: liweibin02 <liweibin02@baidu.com>
Co-authored-by: tangwei12 <tangwei12@baidu.com>

fix bug of indexing with ellipsis (#37182) · f2a56c6a

Committed by zyfncg on Nov 15, 2021

add fetch op for cinn graph output node of build_cinn_pass (#37172) · 10cc040d

Committed by jiangcheng on Nov 15, 2021

Optimize Matmul_v2 (#37037) · 444a7358

Committed by Linjie Chen on Nov 15, 2021

```
Optimize dot product of Matmul_v2
```

modify sparse_attention docs, test=document_fix (#36554) · 6b0cc2b1

Committed by Liu-xiandong on Nov 15, 2021

```
* modify sparse_attention docs, test=develop

* add warning

* add warning ,test=document_fix
```

PaddlePaddle / Paddle synced successfully about 1 year ago