Commits · eb7c211a762c0961915c0f9a5d7b0010cd2746e2 · PaddlePaddle / Paddle

01 Mar, 2022 20 commits
- Add mobilenetv3_large performance test for bf16 and int8 (#39738) · eb7c211a
  Committed by joanna.wozna.intel on Mar 01, 2022
```
* Add mobilenetv3_large performance test

* Disable the BF16 test if the device does not support BF16 computations

* Change test timeout
```
  eb7c211a
- [bf16] add bf16 kernel: layer_norm p_norm reduce_sum (#39843) · ce8ed978
  Committed by zhangbo9674 on Mar 01, 2022
```
* add layer norm

* add p norm

* add reduce sum

* refine layer norm register bf16 for cudnn811

* add bf16 cast for hip

* add unittest

* refine rocm

* refine layer_norm unittest

* refine reduce op

* refine unittest

* enhance atol for reduce unittest
```
  ce8ed978
- remove conv_affine_channel_fuse_pass (#39817) · fc06be9d
  Committed by wenbin on Mar 01, 2022
```
* remove

* pass

* more pass
```
  fc06be9d
- add test_warpctc_op in mac (#39983) · 25650774
  Committed by zhangchunle on Mar 01, 2022
  
  25650774
- [bf16] add bf16 kernel: scale gather sum (#39683) · 6d26b332
  Committed by zhangbo9674 on Mar 01, 2022
```
* add scale gather sum

* refine CUDA_ATOMIC_WRAPPER ADD for bf16

* add gather unittest

* solve conflict

* add scale unittest

* add sum unittest

* solve conflict

* refine gather unittest

* refine unittest
```
  6d26b332
- add MasterParam and MasterParamOut for sparse_momentum op (#39969) · 9de79892
  Committed by Guoxia Wang on Mar 01, 2022
  
  9de79892
- change tests_v2 to dynamic_tests_v2 in CI op benchmark (#39995) · 4204b97a
  Committed by pangyoki on Mar 01, 2022
  
  4204b97a
- update error_string when target is out of bound (#40001) · a7acfc5b
  Committed by HydrogenSulfate on Mar 01, 2022
  
  a7acfc5b
- [phi] migrate where kernel into phi (#39811) · 468a2a17
  Committed by ronnywang on Mar 01, 2022
  
  468a2a17
- [PHI] Remove resetting dtype, layout and allocation by arg_def for outputs in executor (#39781) · 4fbcf6f4
  Committed by zyfncg on Mar 01, 2022
```
* remove SetAllocationForOutputTenosr

* add place param for copy kernel

* recover SetAllocationForOutputTenosr

* polish code

* fix empty_dev api bug

* remove resetting dtype and layout for output in executor

* fix merge bug

* [Phi] Add ClearHolder when re-alloc on new place in DeviceContext

* fix hostAlloc

* remove setting output allocation

* remove full_kernel_impl.h

* fix bug of xpu full_like
Co-authored-by: Aurelius84 <zhangliujie@baidu.com>
```
  4fbcf6f4
- [phi] move uniform_random to phi (#39937) · b3466387
  Committed by Leo Chen on Mar 01, 2022
```
* move uniform_random to phi

* fit selected_rows

* replace mutable_data
```
  b3466387
- [Phi] Support kps backend and kernel registry (#39941) · 08b43cce
  Committed by Chen Weihang on Mar 01, 2022
```
* support kps backend and compile

* resolve conflict

* fix kps backend trans

* test in xpu2 device

* remove dummy kernel
```
  08b43cce
- optimize mergeadd for sparse_adam,*test=kunlun (#39966) · d4911594
  Committed by z8hanghuan on Mar 01, 2022
```
* optimize mergeadd for sparse_adam,*test=kunlun

* optimize mergeadd for sparse_adam,*test=kunlun

* optimize mergeadd for sparse_adam, *test=kunlun
```
  d4911594
- [PHI] Support Multi Input and Output for InferShape (#39870) · e8d45583
  Committed by zyfncg on Mar 01, 2022
```
* add multi input for infer_shape

* support multi output for infershape

* fix split bug

* fix bug of concat

* support vector<MetaTensor*> in infrt

* fix bug
```
  e8d45583
- [Phi] Migrate logical_and/or/not/xor into Phi (#39942) · 8c237973
  Committed by Aurelius84 on Mar 01, 2022
```
* [Phi] Migrate logical_and/or/not/xor into Phi

* fix unittest

* fix function name
```
  8c237973
- [DP] Construct reducer group (#39987) · 4da841e0
  Committed by ShenLiang on Mar 01, 2022
```
* add reducer
```
  4da841e0
- Optimize group_norm op forward (#39596) · 657dd5a9
  Committed by crystal on Mar 01, 2022
```
* optimize group norm forward

* use vectorized optimization

* add scalar calculation code

* optimize code
```
  657dd5a9
- remove dot infershape (#39945) · 75280d36
  Committed by chentianyu03 on Mar 01, 2022
  
  75280d36
- add type constraint for DenseTensor (#39967) · 4149cabe
  Committed by 王明冬 on Mar 01, 2022
  
  4149cabe
- Optimize the CUDA kernel in DistributedFusedLamb optimizer (#39972) · d17961ed
  Committed by sneaxiy on Mar 01, 2022
```
* vectorize lamb kernel

* remove flags, add ut

* remove useless codes

* refine code, add param order
```
  d17961ed
28 Feb, 2022 20 commits

Committed by seemingwang on Feb 28, 2022

* graph engine demo

* upload unsaved changes

* fix dependency error

* fix shard_num problem

* py client

* remove lock and graph-type

* add load direct graph

* add load direct graph

* add load direct graph

* batch random_sample

* batch_sample_k

* fix num_nodes size

* batch brpc

* batch brpc

* add test

* add test

* add load_nodes; change add_node function

* change sample return type to pair

* resolve conflict

* resolved conflict

* resolved conflict

* separate server and client

* merge pair type

* fix

* resolved conflict

* fixed segment fault; high-level VLOG for load edges and load nodes

* random_sample return 0

* rm useless loop

* test:load edge

* fix ret -1

* test: rm sample

* rm sample

* random_sample return future

* random_sample return int

* test fake node

* fixed here

* memory leak

* remove test code

* fix return problem

* add common_graph_table

* random sample node &test & change data-structure from linkedList to vector

* add common_graph_table

* sample with srand

* add node_types

* optimize nodes sample

* recover test

* random sample

* destruct weighted sampler

* GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* pybind sample nodes api

* pull nodes with step

* fixed pull_graph_list bug; add test for pull_graph_list by step

* add graph table;name

* add graph table;name

* add pybind

* add pybind

* add FeatureNode

* add FeatureNode

* add FeatureNode Serialize

* add FeatureNode Serialize

* get_feat_node

* avoid local rpc

* fix get_node_feat

* fix get_node_feat

* remove log

* get_node_feat return  py:bytes

* merge develop with graph_engine

* fix threadpool.h head

* fix

* fix typo

* resolve conflict

* fix conflict

* recover lost content

* fix pybind of FeatureNode

* recover cmake

* recover tools

* resolve conflict

* resolve linking problem

* code style

* change test_server port

* fix code problems

* remove shard_num config

* remove redundant threads

* optimize start server

* remove logs

* fix code problems by reviewers' suggestions

* move graph files into a folder

* code style change

* remove graph operations from base table

* optimize get_feat function of graph engine

* fix long long count problem

* remove redundant graph files

* remove unused shell

* recover dropout_op_pass.h

* fix potential stack overflow when request number is too large & node add & node clear & node remove

* when sample k is larger than neighbor num, return directly

* using random seed generator of paddle to speed up

* fix bug of random sample k

* fix code style

* fix code style

* add remove graph to fleet_py.cc

* fix blocking_queue problem

* fix style

* fix

* recover capacity check

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* fix distributed op combining problems

* optimize

* remove logs

* fix MultiSlotDataGenerator error

* cache for graph engine

* fix type compare error

* more test&fix thread terminating problem

* remove header

* change time interval of shrink

* use cache when sample nodes

* remove unused function

* change unique_ptr to shared_ptr

* simplify cache template

* cache api on client

* fix

* reduce sample threads when cache is not used

* reduce cache memory

* cache optimization

* remove test function

* remove extra fetch function

* graph-engine data transfer optimization

* support graph_split load&query

* remove logs

* change shards to pointer vector

* use inference

* remove test code

* renorm op

* simplify renorm op

* recover local changes

* recover renorm op kernel

* fix init

* add blanklines in renorm doc

* fix import

* fix import

* add renorm to init.py

* merge

* move index_sample op

* Delete api.h

* Delete api.cc

* fix

* remove logs

* recover infer shape of grad

* recover changes

* change shape

* fix label

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix
Co-authored-by: Huang Zhengjie <270018958@qq.com>
Co-authored-by: Weiyue Su <weiyue.su@gmail.com>
Co-authored-by: suweiyue <suweiyue@baidu.com>
Co-authored-by: luobin06 <luobin06@baidu.com>
Co-authored-by: liweibin02 <liweibin02@baidu.com>
Co-authored-by: tangwei12 <tangwei12@baidu.com>

1b585b28

[custom kernel] change kernel name judgement and remove macro control for selected_row (#39977) · 49677636
Committed by Aganlengzi on Feb 28, 2022

49677636
fix where api doc (#39980) · 5471d162
Committed by ronnywang on Feb 28, 2022

5471d162
[Phi]Move size, erfinv, pixel_shuffle infershape to phi (#39949) · a0cb3203
Committed by 0x45f on Feb 28, 2022
```
* move size, erfinv, pixel_shuffle infershape to phi

* fix erfinv infermeta
```
a0cb3203

Grid_sampler optimization (#39751) · 2c66775b

Committed by Lijunhui on Feb 28, 2022

* init grid_sampler with mode=bilinear

* solve error

* rm fill constant

* rm head

* change block size

* change block size

* optimize

* apply existing config

2c66775b

Trace level env (#39926) · f335d9e1

Committed by liutiexing on Feb 28, 2022

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* Add host_trace_level env variable

* Revert "Optimize perf of softmax_with_cross_entropy (#39553)"

This reverts commit bbe5228c.
Co-authored-by: liutiexing <liutiexing@google.com>
Co-authored-by: ZzSean <18818272991@163.com>

f335d9e1

Profile Executor (#39641) · 7ecefec3

Committed by liutiexing on Feb 28, 2022

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* add log for Executor

* Profile Allocators

* Profile Allocators

* adjust interface

* remove lock for set

* fix
Co-authored-by: liutiexing <liutiexing@google.com>

7ecefec3

Add sparse conv3d kernel (#39879) · bc99a76c

Committed by zhangkaihuo on Feb 28, 2022

* fix incorrect dims settings

* sparse conv3d

* fix out dims

* test performance

* test large shape success

* opt scatter, double performance

* test float16

* remove profiling code

* remove pten

* opt code lines

* correct boundary judgment

* only cpu

* test ci

* test ci

* remove the including paddle/fluid header; extract the common function

* opt code lines

* use DenseTensor::data() instead of mutable_data

* return rulebook for backward

* specify layout

* rename:conv -> sparse_conv3d

bc99a76c

Change CI-Build build develop (#39863) · 61443a0e
Committed by tianshuo78520a on Feb 28, 2022

61443a0e

[Phi] move truncated_gaussian_random kernel (#39971) · 23aa7a36

Committed by furnace on Feb 28, 2022

* [Phi] move truncated_gaussian_random, copy kernels

* [Phi] move truncated_gaussian_random, kernel register

* [Phi] move truncated_gaussian_random, delete useless codes

23aa7a36

[infrt] add TrtOpConverterPass (#39902) · 35471b1f

Committed by Shang Zhizhou on Feb 28, 2022

* add some trt layers

* trtOpConverter pass ok

* add comments

* add constraints to some attrs in the pd_lower_to_trt patterns

* update constraint

* fix code style

* update pass name

* update code style

* change .hpp.inc to .cc.inc in mlir_add_rewriter

35471b1f

PR-CI-Py3 change cpu test (#39659) · 3cb93edf
Committed by zhangchunle on Feb 28, 2022
```
* update;test=cpu-py3
```
3cb93edf

[Pten->Phi PR4] Rename pten in funcs to phi (#39961) · eb42dd52

Committed by Chen Weihang on Feb 28, 2022

* rename pten_utils to phi_utils

* rename pten_utils target

* rename Pten to Phi

* replace pten with phi

* resolve conflict

eb42dd52

Update host tracer (#39975) · 406f1b96

Committed by liutiexing on Feb 28, 2022

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* update HostTracer

* fix

* update

* update
Co-authored-by: liutiexing <liutiexing@google.com>

406f1b96

[bf16] Refine BF16 amp-o1 logic (#39815) · 18ee051e
Committed by zhangbo9674 on Feb 28, 2022
```
* refine bf16 amp-o1 logic

* refine amp GLOG

* refine unittest

* refine unittest
```
18ee051e

[PHI] adjust the empty kernel and dev_api (#39958) · d1595c26

Committed by zyfncg on Feb 28, 2022

* remove empty kernel in fluid and adjust the param of empty dev_api

* polish code

* revert fluid empty kernel

d1595c26

infrt add trt engine (#39885) · 27536a32
Committed by Wilber on Feb 28, 2022

27536a32

[Pten] Support optional param for C++ API (#39760) · aceb25e1

Committed by zyfncg on Feb 28, 2022

* fix selected_rows bug in C++ API

* add optional for C++ API

* data transform support optional

* remove data transform for optional vector<Tensor>

* adjust some format of function

* fix empty bug

aceb25e1

fix ps_gpu_wrapper (#39965) · bd9b9460
Committed by zmxdream on Feb 28, 2022

bd9b9460
add new profiler components (#39964) · d4ae1775
Committed by chenjian on Feb 28, 2022
```
* add new profiler components

* fix bug
```
d4ae1775

PaddlePaddle / Paddle synced successfully about 1 year ago
