Commits · 909d1e617c36cf19822cb3b96ea14783cda6dfff · BaiXuePrincess / Paddle

03 Mar, 2022 3 commits

Modified Reduce for XPU2 (#38918) · 909d1e61
niuliling123 authored Mar 03, 2022
```
1. set xpu2 block_size = 64
2. fix a bug when reduce_num is too large
```
909d1e61
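
The two notes in this commit are terse, and the log does not show the actual fix. As a rough illustration only, the sketch below simulates on the CPU one common way a block of 64 threads can sum an input whose length (`reduce_num`) far exceeds the block size: each thread accumulates a strided slice, then the per-thread partials are combined in a tree. The function name `block_reduce_sum` and the strided-loop strategy are assumptions for illustration, not Paddle's XPU2 code.

```python
# Illustrative CPU simulation only; not Paddle's XPU2 reduce kernel.
BLOCK_SIZE = 64  # xpu2 block size stated in the commit


def block_reduce_sum(values, block_size=BLOCK_SIZE):
    """Sum `values` with a fixed number of simulated threads.

    Assumption (not from the commit): each thread walks the input with a
    stride of block_size, so reduce_num may exceed block_size freely.
    block_size must be a power of two for the tree stage below.
    """
    # Stage 1: each thread accumulates a strided slice into its own partial.
    partial = [0] * block_size
    for tid in range(block_size):
        for i in range(tid, len(values), block_size):
            partial[tid] += values[i]
    # Stage 2: tree reduction over the per-thread partials.
    stride = block_size // 2
    while stride > 0:
        for tid in range(stride):
            partial[tid] += partial[tid + stride]
        stride //= 2
    return partial[0]
```

Because the strided loop bounds the working set by the block size rather than by `reduce_num`, the same 64 partial slots serve inputs of any length.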
Implement SparseConv3d kernel (#39784) · 6bf85eaf
zhangkaihuo authored Mar 03, 2022
```
* sparse conv3d: gpu code
```
6bf85eaf

hong authored Mar 03, 2022

* add bn cpu version; test=develop

* move batch norm to pten

* move batch norm to pten; test=develop

* fix bug; test=develop

* fix func::transpose depend bug; test=develop

* fix compile bugs; test=develop

* fix use_op batch_norm bug; test=develop

* fix cudnn bn add relu test; test=develop

* fix pten context build and double grad bug; test=develop

* remove useless code; test=develop

* add batch norm gpu fp16 support; test=develop

* fix test bn op bug; test=develop

* remove output dtype set; test=develop

* fix bug; test=develop

* fix bug; test=develop

* fix apply pass to program bug; test=develop

* revert to develop; test=develop

* fix rocm bug; test=develop

* revert operator to develop; test=develop

* fix pre_commit; test=develop

* fix static check error; test=develop

* resolve conflict; test=develop

* ana batch norm bug;

* revert batch norm op

* resolve conflict

* fix nan inf and speed bug; test=develop

* fix bug; test=develop

* fix error; test=develop

* test expand op; test=develop

* fix bug; test=develop

* resolve conflict

* resolve conflict; test=develop

* polish code; test=develop

* polish code; test=develop

* change mutable data to ctx alloc; test=develop

* make format same with ci; test=develop

* fix format error with ci; test=develop

ebd0f512

02 Mar, 2022 11 commits

Move sgd to phi (#40045) · f3d54e2e

hong authored Mar 02, 2022

* move sgd to phi; test=develop

* update

* add sgd kernel; test=develop

f3d54e2e

Move gather.h/gather.cu.h/scatter.h/scatter.cu.h to the phi library (#40043) · 09258040
sneaxiy authored Mar 02, 2022
```
* move gather.h gather.cu.h scatter.h scatter.cu.h to phi library

* fix CI

* fix rocm ci
```
09258040
[Phi] Move elementwise function to funcs directory (#39986) · 5898e9ab
YuanRisheng authored Mar 02, 2022
```
* move elementwise function to funcs directory

* fix compile bugs

* modify according to comment
```
5898e9ab

Move transpose to pten (#39327) · 7a857924

hong authored Mar 02, 2022

* immigrate_transpose_to_pten cpu kernel only; test=develop

* fix bug; test=develop

* add transpose cuda api

* bug fix;

* fix bugs

* fix bugs; test=develop

* bug fix;

* move transpose to pten; test=develop

* fix bug; test=develop

* fix bugs; test=develop

* add transpose grad fp16 support; test=develop

* fix bug; test=develop

* fix npu bug; test=develop

* fix numel = 0 bug; test=develop

* add fp16 support; test=develop

* fix data type register bug; test=develop

* fix transpose bug; test=develop

* update transpose

* fix transpose bug; test=develop

* remove useless code; test=develop

* remove useless code; test=develop

* fix transpose alias bug; test=develop

* polish code; test=develop

* resolve conflict; test=develop

* resolve conflict; test=develop

* recover prepared operator; test=develop

* fix bug; test=develop

* polish code; test=develop

* fix bug; test=develop

* fix bug; test=develop

7a857924

Move BroadcastTensors OP to phi (#40047) · 2a5590a1

From00 authored Mar 02, 2022

* Move BroadcastTensors OP to phi

* Remove mutable_data in impl

* Move BilinearTensorProductInferMeta to multiary.h/cc

2a5590a1

The backward code of Sparse Conv3d (#40054) · 8492d3bb
zhangkaihuo authored Mar 02, 2022
```
Sparse Conv3d backward code
```
8492d3bb
[bf16] add bf16 kernel: softmax & log_softmax (#39999) · 4a4215ff
zhangbo9674 authored Mar 02, 2022
```
* add softmax log_softmax

* refine rocm

* refine unittest
```
4a4215ff
[phi] migrate gather_tree, reduce_prod to phi (#39844) · 6af2729e
crystal authored Mar 02, 2022
```
* move to phi

* migrate gather_tree_op into phi

* move reduce_prod to phi

* optimize code
```
6af2729e
[Phi] Unify complex type trait and fix real imag bug (#40036) · 0764fda2
Chen Weihang authored Mar 02, 2022
```
* unify complex type trait and fix real imag bug

* add unittest for type traits
```
0764fda2
optimize CUDA implementation of randint OP (#39952) · fb635089
zhouweiwei2014 authored Mar 02, 2022
```
* change CUDA implementation of randint OP, move distribution common func to phi

* fix CI

* fix CI
```
fb635089

[Pten] Gru lstm migration (#39729) · e4dba69a

Feiyu Chan authored Mar 02, 2022

* move sequence2batch

* move lstm and gru

* Add phi/kernels directory into exclusion to stop using hipcc to compile non .cu files in it.

e4dba69a

01 Mar, 2022 10 commits

[Phi] rm reduce infershape (#39820) · 09039636

chentianyu03 authored Mar 01, 2022

* modify infershape utils and rm reduce infershape

* merge develop

* fix infermeta bug

* add IsForInferShape func in ArgumentMappingContext

* add reduce_mean infermeta

* modify annotation

* add default dims

09039636

[phi] transfer the selu_op and pass the CI (#39819) · 197da15a

xiongkun authored Mar 01, 2022

* transfer the selu_op and pass the CI

* add sig files

* fix code

* fix by code review

* remove TODO

* change the include position

* change the head position

197da15a

Add function description for Kernel Primitive API (#39884) · 255bf609

niuliling123 authored Mar 01, 2022

* Add function description for Kernel Primitive API
1. Set cumsum and sort shared memory size = 1024
2. sort and cumsum API limitation: blockDim.x must not exceed 512 (blockDim.x <= 512)

255bf609
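
The commit above states only two numbers: a shared-memory size of 1024 for cumsum and sort, and a `blockDim.x` limit of 512. One plausible reading, which is an assumption and not something the commit confirms, is that each thread owns two elements, so 512 threads fill a 1024-slot buffer. The sketch below simulates a block-level inclusive scan on the CPU under that assumption; `block_cumsum` is a hypothetical name and not part of the Kernel Primitive API.

```python
# Illustrative CPU simulation only; not Paddle's Kernel Primitive API.
SHARED_SIZE = 1024      # shared-memory element count stated in the commit
MAX_BLOCK_DIM_X = 512   # blockDim.x limit stated in the commit


def block_cumsum(values, block_dim_x):
    """Inclusive prefix sum over one block's tile, simulated on the CPU.

    Assumption (not from the commit): each thread owns two elements, so a
    block of `block_dim_x` threads covers 2 * block_dim_x shared slots.
    """
    if block_dim_x > MAX_BLOCK_DIM_X:
        raise ValueError("blockDim.x must be <= 512")
    tile = 2 * block_dim_x
    if len(values) > tile:
        raise ValueError("input does not fit in one block's tile")
    # Pad to the tile size, as a kernel might zero-fill unused shared slots.
    shared = list(values) + [0] * (tile - len(values))
    assert len(shared) <= SHARED_SIZE
    # Hillis-Steele scan: each sweep reads a snapshot of the previous sweep.
    offset = 1
    while offset < tile:
        prev = shared[:]
        for i in range(offset, tile):
            shared[i] = prev[i] + prev[i - offset]
        offset *= 2
    return shared[:len(values)]
```

Under this reading, a block with more than 512 threads would need more than 1024 shared slots, which is why the simulation rejects it up front.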

[bf16] add bf16 kernel: layer_norm p_norm reduce_sum (#39843) · ce8ed978

zhangbo9674 authored Mar 01, 2022

* add layer norm

* add p norm

* add reduce sum

* refine layer norm register bf16 for cudnn811

* add bf16 cast for hip

* add unittest

* refine rocm

* refine layer_norm unittest

* refine reduce op

* refine unittest

* enhance atol for reduce unittest

ce8ed978

[bf16] add bf16 kernel: scale gather sum (#39683) · 6d26b332

zhangbo9674 authored Mar 01, 2022

* add scale gather sum

* refine CUDA_ATOMIC_WRAPPER ADD for bf16

* add gather unittest

* solve conflict

* add scale unittest

* add sum unittest

* solve conflict

* refine gather unittest

* refine unittest

6d26b332

[phi] migrate where kernel into phi (#39811) · 468a2a17
ronnywang authored Mar 01, 2022

468a2a17

[PHI] Remove resetting dtype, layout and allocation by arg_def for outputs in executor (#39781) · 4fbcf6f4

zyfncg authored Mar 01, 2022

* remove SetAllocationForOutputTenosr

* add place param for copy kernel

* recover SetAllocationForOutputTenosr

* polish code

* fix empty_dev api bug

* remove resetting dtype and layout for output in executor

* fix merge bug

* [Phi] Add ClearHolder when re-alloc on new place in DeviceContext

* fix hostAlloc

* remove setting output allocation

* remove full_kernel_impl.h

* fix bug of xpu full_like
Co-authored-by: Aurelius84 <zhangliujie@baidu.com>

4fbcf6f4

[phi] move uniform_random to phi (#39937) · b3466387
Leo Chen authored Mar 01, 2022
```
* move uniform_random to phi

* fit selected_rows

* replace mutable_data
```
b3466387

[PHI] Support Multi Input and Output for InferShape (#39870) · e8d45583

zyfncg authored Mar 01, 2022

* add multi input for infer_shape

* support multi output for infershape

* fix split bug

* fix bug of concat

* support vector<MetaTensor*> in infrt

* fix bug

e8d45583

[Phi] Migrate logical_and/or/not/xor into Phi (#39942) · 8c237973
Aurelius84 authored Mar 01, 2022
```
* [Phi] Migrate logical_and/or/not/xor into Phi

* fix unittest

* fix function name
```
8c237973

28 Feb, 2022 6 commits

Move index sample (#39905) · 1b585b28

seemingwang authored Feb 28, 2022

* graph engine demo

* upload unsaved changes

* fix dependency error

* fix shard_num problem

* py client

* remove lock and graph-type

* add load direct graph

* add load direct graph

* add load direct graph

* batch random_sample

* batch_sample_k

* fix num_nodes size

* batch brpc

* batch brpc

* add test

* add test

* add load_nodes; change add_node function

* change sample return type to pair

* resolve conflict

* resolved conflict

* resolved conflict

* separate server and client

* merge pair type

* fix

* resolved conflict

* fixed segmentation fault; high-level VLOG for load edges and load nodes

* random_sample return 0

* rm useless loop

* test:load edge

* fix ret -1

* test: rm sample

* rm sample

* random_sample return future

* random_sample return int

* test fake node

* fixed here

* memory leak

* remove test code

* fix return problem

* add common_graph_table

* random sample node &test & change data-structure from linkedList to vector

* add common_graph_table

* sample with srand

* add node_types

* optimize nodes sample

* recover test

* random sample

* destruct weighted sampler

* GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* pybind sample nodes api

* pull nodes with step

* fixed pull_graph_list bug; add test for pull_graph_list by step

* add graph table;name

* add graph table;name

* add pybind

* add pybind

* add FeatureNode

* add FeatureNode

* add FeatureNode Serialize

* add FeatureNode Serialize

* get_feat_node

* avoid local rpc

* fix get_node_feat

* fix get_node_feat

* remove log

* get_node_feat return  py:bytes

* merge develop with graph_engine

* fix threadpool.h head

* fix

* fix typo

* resolve conflict

* fix conflict

* recover lost content

* fix pybind of FeatureNode

* recover cmake

* recover tools

* resolve conflict

* resolve linking problem

* code style

* change test_server port

* fix code problems

* remove shard_num config

* remove redundant threads

* optimize start server

* remove logs

* fix code problems by reviewers' suggestions

* move graph files into a folder

* code style change

* remove graph operations from base table

* optimize get_feat function of graph engine

* fix long long count problem

* remove redundant graph files

* remove unused shell

* recover dropout_op_pass.h

* fix potential stack overflow when request number is too large & node add & node clear & node remove

* when sample k is larger than neighbor num, return directly

* using random seed generator of paddle to speed up

* fix bug of random sample k

* fix code style

* fix code style

* add remove graph to fleet_py.cc

* fix blocking_queue problem

* fix style

* fix

* recover capacity check

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* fix distributed op combining problems

* optimize

* remove logs

* fix MultiSlotDataGenerator error

* cache for graph engine

* fix type compare error

* more test&fix thread terminating problem

* remove header

* change time interval of shrink

* use cache when sample nodes

* remove unused function

* change unique_ptr to shared_ptr

* simplify cache template

* cache api on client

* fix

* reduce sample threads when cache is not used

* reduce cache memory

* cache optimization

* remove test function

* remove extra fetch function

* graph-engine data transfer optimization

* support graph_split load&query

* remove logs

* change shards to pointer vector

* use inference

* remove test code

* renorm op

* simplify renorm op

* recover local changes

* recover renorm op kernel

* fix init

* add blanklines in renorm doc

* fix import

* fix import

* add renorm to init.py

* merge

* move index_sample op

* Delete api.h

* Delete api.cc

* fix

* remove logs

* recover infer shape of grad

* recover changes

* change shape

* fix label

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix
Co-authored-by: Huang Zhengjie <270018958@qq.com>
Co-authored-by: Weiyue Su <weiyue.su@gmail.com>
Co-authored-by: suweiyue <suweiyue@baidu.com>
Co-authored-by: luobin06 <luobin06@baidu.com>
Co-authored-by: liweibin02 <liweibin02@baidu.com>
Co-authored-by: tangwei12 <tangwei12@baidu.com>

1b585b28

Add sparse conv3d kernel (#39879) · bc99a76c

zhangkaihuo authored Feb 28, 2022

* fix incorrect dims settings

* sparse conv3d

* fix out dims

* test performance

* test large shape success

* opt scatter, double performance

* test float16

* remove profiling code

* remove pten

* opt code lines

* correct boundary judgment

* only cpu

* test ci

* test ci

* remove the paddle/fluid header include; extract the common function

* opt code lines

* use DenseTensor::data() instead of mutable_data

* return rulebook for backward

* specify layout

* rename:conv -> sparse_conv3d

bc99a76c

[Phi] move truncated_gaussian_random kernel (#39971) · 23aa7a36

furnace authored Feb 28, 2022

* [Phi] move truncated_gaussian_random, copy kernels

* [Phi] move truncated_gaussian_random, kernel register

* [Phi] move truncated_gaussian_random, delete useless codes

23aa7a36

[Pten->Phi PR4] Rename pten in funcs to phi (#39961) · eb42dd52

Chen Weihang authored Feb 28, 2022

* rename pten_utils to phi_utils

* rename pten_utils target

* rename Pten to Phi

* replace pten with phi

* resolve conflict

eb42dd52

[PHI] adjust the empty kernel and dev_api (#39958) · d1595c26

zyfncg authored Feb 28, 2022

* remove empty kernel in fluid and adjust the param of empty dev_api

* polish code

* revert fluid empty kernel

d1595c26

[Pten] Support optional param for C++ API (#39760) · aceb25e1

zyfncg authored Feb 28, 2022

* fix selected_rows bug in C++ API

* add optional for C++ API

* data transform support optional

* remove data transform for optional vector<Tensor>

* adjust some format of function

* fix empty bug

aceb25e1

26 Feb, 2022 4 commits

[Pten] Refactor the copy kernel (#39731) · 9a7b9eda

zyfncg authored Feb 26, 2022

* remove SetAllocationForOutputTenosr

* add place param for copy kernel

* recover SetAllocationForOutputTenosr

* polish code

* fix empty_dev api bug

* test=allcases

* test=allcases

* fix bug

* recover empty

* recover modify

9a7b9eda

Move GumbelSoftmax OP to phi (#39873) · 581b2c64

From00 authored Feb 26, 2022

* Move GumbelSoftmax OP to phi

* platform::errors -> phi::errors; GumbelSoftmaxGradInferMeta -> backend.h/cc

* Use axis util in kernel impl

* Remove namespace platform::errors

* Use GetCPUEngine in Device Context

581b2c64

Support custom implement for C++ API (#39521) · caea126c

zyfncg authored Feb 26, 2022

* Support custom implement for C++ API

* rename api_invoke_impl to api_custom_impl

* remove manual_api

* delete mutable_data in copy_to api

* fix problem of copy_to

* add unittest for infer_meta_fn_factory

* fix split config in yaml

* fix split config in yaml

* modify sum api yaml

* add copy_to wrapped infermeta

* rollback copy impl

caea126c

Move BilinearTensorProduct OP to phi (#39903) · de8f2748
From00 authored Feb 26, 2022
```
* Move BilinearTensorProduct OP to phi

* Set dtype for Infermeta
```
de8f2748

25 Feb, 2022 6 commits
- move for_range into phi (#39931) · 94d8f392
  Chen Weihang authored Feb 25, 2022
  
  94d8f392
- move eye, size, erfinv, pixel_shuffle OP to phi (#39712) · 639675de
  0x45f authored Feb 25, 2022
```
* move eye OP to pten

* move size OP to pten

* merge develop

* fix merge

* move files

* move erfinv OP to phi

* remove comment

* move pixel_shuffle OP to phi

* remove comment

* fix PT_REGISTER

* fix NPU

* fix CR

* remove size_sig.cc for PR-CI-Coverage
```
  639675de
- [phi] migrate increment addmm multinomial cholesky InferShapes to phi (#39913) · 87b903a3
  Aganlengzi authored Feb 25, 2022
```
* [phi]migrate increment addmm multinomial cholesky InferShapes to phi

* set_dtype and mod MultinomialFunctor
```
  87b903a3
- move diag_v2 to phi (#39914) · 783c4aba
  Linjie Chen authored Feb 25, 2022
  
  783c4aba
- replace implementation with cuda kernel (#39795) · 64f1485a
  Zhang Ting authored Feb 25, 2022
  
  64f1485a
- [bf16] add bf16 kernel: elementwise_add elementwise_mul elementwise_sub (#39716) · 2fedd39b
  zhangbo9674 authored Feb 25, 2022
```
* add ele_add

* add ele_mul

* add ele_sub

* solve conflict

* fix npu

* refine ele_add

* add ele_mul unittest

* refine ele_sub

* refine ci

* refine unittest
```
  2fedd39b
