Commits · 657dd5a97de6b54e59aa60a7d7afcab33bf36420 · 机器未来 / Paddle

01 Mar, 2022 · 3 commits
- Optimize group_norm op forward (#39596) · 657dd5a9
  Committed by crystal on Mar 01, 2022
```
* optimize group norm forward

* use vectorized optimization

* add scalar calculation code

* optimize code
```
- remove dot infershape (#39945) · 75280d36
  Committed by chentianyu03 on Mar 01, 2022
- Optimize the CUDA kernel in DistributedFusedLamb optimizer (#39972) · d17961ed
  Committed by sneaxiy on Mar 01, 2022
```
* vectorize lamb kernel

* remove flags, add ut

* remove useless codes

* refine code, add param order
```
28 Feb, 2022 · 14 commits

Committed by seemingwang on Feb 28, 2022

* graph engine demo

* upload unsaved changes

* fix dependency error

* fix shard_num problem

* py client

* remove lock and graph-type

* add load direct graph

* add load direct graph

* add load direct graph

* batch random_sample

* batch_sample_k

* fix num_nodes size

* batch brpc

* batch brpc

* add test

* add test

* add load_nodes; change add_node function

* change sample return type to pair

* resolve conflict

* resolved conflict

* resolved conflict

* separate server and client

* merge pair type

* fix

* resolved conflict

* fixed segment fault; high-level VLOG for load edges and load nodes

* random_sample return 0

* rm useless loop

* test:load edge

* fix ret -1

* test: rm sample

* rm sample

* random_sample return future

* random_sample return int

* test fake node

* fixed here

* memory leak

* remove test code

* fix return problem

* add common_graph_table

* random sample node &test & change data-structure from linkedList to vector

* add common_graph_table

* sample with srand

* add node_types

* optimize nodes sample

* recover test

* random sample

* destruct weighted sampler

* GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* pybind sample nodes api

* pull nodes with step

* fixed pull_graph_list bug; add test for pull_graph_list by step

* add graph table;name

* add graph table;name

* add pybind

* add pybind

* add FeatureNode

* add FeatureNode

* add FeatureNode Serialize

* add FeatureNode Serialize

* get_feat_node

* avoid local rpc

* fix get_node_feat

* fix get_node_feat

* remove log

* get_node_feat return  py:bytes

* merge develop with graph_engine

* fix threadpool.h head

* fix

* fix typo

* resolve conflict

* fix conflict

* recover lost content

* fix pybind of FeatureNode

* recover cmake

* recover tools

* resolve conflict

* resolve linking problem

* code style

* change test_server port

* fix code problems

* remove shard_num config

* remove redundent threads

* optimize start server

* remove logs

* fix code problems by reviewers' suggestions

* move graph files into a folder

* code style change

* remove graph operations from base table

* optimize get_feat function of graph engine

* fix long long count problem

* remove redandunt graph files

* remove unused shell

* recover dropout_op_pass.h

* fix potential stack overflow when request number is too large & node add & node clear & node remove

* when sample k is larger than neigbor num, return directly

* using random seed generator of paddle to speed up

* fix bug of random sample k

* fix code style

* fix code style

* add remove graph to fleet_py.cc

* fix blocking_queue problem

* fix style

* fix

* recover capacity check

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* fix distributed op combining problems

* optimize

* remove logs

* fix MultiSlotDataGenerator error

* cache for graph engine

* fix type compare error

* more test&fix thread terminating problem

* remove header

* change time interval of shrink

* use cache when sample nodes

* remove unused function

* change unique_ptr to shared_ptr

* simplify cache template

* cache api on client

* fix

* reduce sample threads when cache is not used

* reduce cache memory

* cache optimization

* remove test function

* remove extra fetch function

* graph-engine data transfer optimization

* support graph_split load&query

* remove logs

* change shards to pointer vector

* use inference

* remove test code

* renorm op

* simplify renorm op

* recover local changes

* recover renorm op kernel

* fix init

* add blanklines in renorm doc

* fix import

* fix import

* add renorm to init.py

* merge

* move index_sample op

* Delete api.h

* Delete api.cc

* fix

* remove logs

* recover infer shape of grad

* recover changes

* change shape

* fix label

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

Co-authored-by: Huang Zhengjie <270018958@qq.com>
Co-authored-by: Weiyue Su <weiyue.su@gmail.com>
Co-authored-by: suweiyue <suweiyue@baidu.com>
Co-authored-by: luobin06 <luobin06@baidu.com>
Co-authored-by: liweibin02 <liweibin02@baidu.com>
Co-authored-by: tangwei12 <tangwei12@baidu.com>

1b585b28

- [Phi]Move size, erfinv, pixel_shuffle infershape to phi (#39949) · a0cb3203
  Committed by 0x45f on Feb 28, 2022
```
* move size, erfinv, pixel_shuffle infershape to phi

* fix erfinv infermeta
```

- Grid_sampler optimization (#39751) · 2c66775b
  Committed by Lijunhui on Feb 28, 2022

* init grid_sampler with mode=bilinear

* solve error

* rm fill constant

* rm head

* change block size

* change block size

* optimize

* apply existing config


- Trace level env (#39926) · f335d9e1
  Committed by liutiexing on Feb 28, 2022

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* Add host_trace_level env variable

* Revert "Optimize perf of softmax_with_cross_entropy (#39553)"

This reverts commit bbe5228c.

Co-authored-by: liutiexing <liutiexing@google.com>
Co-authored-by: ZzSean <18818272991@163.com>


- Profile Executor (#39641) · 7ecefec3
  Committed by liutiexing on Feb 28, 2022

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* add log for Executor

* Profile Allocators

* Profile Allocators

* adjust interface

* remove lock for set

* fix

Co-authored-by: liutiexing <liutiexing@google.com>


- [Phi] move truncated_gaussian_random kernel (#39971) · 23aa7a36
  Committed by furnace on Feb 28, 2022

* [Phi] move truncated_gaussian_random, copy kernels

* [Phi] move truncated_gaussian_random, kernel register

* [Phi] move truncated_gaussian_random, delete useless codes

- PR-CI-Py3 change cpu test (#39659) · 3cb93edf
  Committed by zhangchunle on Feb 28, 2022
```
* update;test=cpu-py3
```

- [Pten->Phi PR4] Rename pten in funcs to phi (#39961) · eb42dd52
  Committed by Chen Weihang on Feb 28, 2022

* rename pten_utils to phi_utils

* rename pten_utils target

* rename Pten to Phi

* replace pten with phi

* resolve conflict

- Update host tracer (#39975) · 406f1b96
  Committed by liutiexing on Feb 28, 2022

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* update HostTracer

* fix

* update

* update

Co-authored-by: liutiexing <liutiexing@google.com>


- [bf16] Refine BF16 amp-o1 logic (#39815) · 18ee051e
  Committed by zhangbo9674 on Feb 28, 2022
```
* refine bf16 amp-o1 logic

* refine amp GLOG

* refine unittest

* refine unittest
```
- infrt add trt engine (#39885) · 27536a32
  Committed by Wilber on Feb 28, 2022
- fix ps_gpu_wrapper (#39965) · bd9b9460
  Committed by zmxdream on Feb 28, 2022
- add new profiler components (#39964) · d4ae1775
  Committed by chenjian on Feb 28, 2022
```
* add new profiler components

* fix bug
```

- [KP] Unify .cu and .xpu files with .kps files (#39917) · 0ff72e5d
  Committed by Liu-xiandong on Feb 28, 2022

* [KP] Unify .cu and .xpu files with .kps files

* fix CI bug in GPU and modify the list

* fix conflict

* modify the date

26 Feb, 2022 · 5 commits
- revert reshape op infershape (#39946) · b33a3c23
  Committed by YuanRisheng on Feb 26, 2022
- Move GumbelSoftmax OP to phi (#39873) · 581b2c64
  Committed by From00 on Feb 26, 2022
```
* Move GumbelSoftmax OP to phi

* platform::errors -> phi::errors; GumbelSoftmaxGradInferMeta -> backend.h/cc

* Use axis util in kernel impl

* Remove namespace platform::errors

* Use GetCPUEngine in Device Context
```
- Move BilinearTensorProduct OP to phi (#39903) · de8f2748
  Committed by From00 on Feb 26, 2022
```
* Move BilinearTensorProduct OP to phi

* Set dtype for Infermeta
```
- [Eager Hook] Support GradientHook and ReduceHook, expose related interface to python (#39893) · a456dda6
  Committed by Weilong Wu on Feb 26, 2022
```
* Support Eager Hook, expose interface to python

* Fix CI issue
```
- fix mkldnn softmax erro (#39951) · ab872efe
  Committed by Chen Weihang on Feb 26, 2022
25 Feb, 2022 · 18 commits
- move for_range into phi (#39931) · 94d8f392
  Committed by Chen Weihang on Feb 25, 2022
- [phi] update code for mkl based fft (#39889) · 687902fc
  Committed by Feiyu Chan on Feb 25, 2022
- added logsoftmax oneDNN kernel (#39793) · 584844ec
  Committed by jakpiase on Feb 25, 2022
- Add MultiTensorApply to calculate L2-Norm in DistributedFusedLamb optimizer (#39900) · d32a0102
  Committed by sneaxiy on Feb 25, 2022
```
* add multi tensor apply l2 norm

* add multi_tensor_apply code

* make sizeof(TensorMeta) smalller

* move code to distributed_fused_lamb_op.cu

* remove useless FLAGS
```
- move eye、size、erfinv、pixel_shuffle OP to phi (#39712) · 639675de
  Committed by 0x45f on Feb 25, 2022
```
* move eye OP to pten

* move size OP to pten

* merge develop

* fix merge

* move files

* move erfinv OP to phi

* remove comment

* move pixel_shuffle OP to phi

* remove comment

* fix PT_REGISTER

* fix NPU

* fix CR

* remove size_sig.cc for PR-CI-Coverage
```
- Disable dist ut cases (#39906) · 4fe465cb
  Committed by YUNSHEN XIE on Feb 25, 2022
```
* disable some distribute test case when in CPU test env

* disable some test case when in CPU test env

* fix
```
- Fix conflict caused by wrong namespace (#39930) · d8fc7211
  Committed by Zhang Zheng on Feb 25, 2022
- [phi]migrate increment addmm multinomial cholesky InferShapes to phi (#39913) · 87b903a3
  Committed by Aganlengzi on Feb 25, 2022
```
* [phi]migrate increment addmm multinomial cholesky InferShapes to phi

* set_dtype and mod MultinomialFunctor
```
- [ROCm] fix Managed Memory Alloc on HIP, test=develop (#39896) · 37cb6f32
  Committed by Qi Li on Feb 25, 2022
```
* [ROCm] fix Managed Memory Alloc on HIP, test=develop

* update, test=develop
```
- move diag_v2 to phi (#39914) · 783c4aba
  Committed by Linjie Chen on Feb 25, 2022
- replace implementation with cuda kernel (#39795) · 64f1485a
  Committed by Zhang Ting on Feb 25, 2022
- Optimize perf of softmax_with_cross_entropy (#39553) · bbe5228c
  Committed by Zhang Zheng on Feb 25, 2022
```
* Optimize perf of softmax_with_cross_entropy

* fix

* fix

* fix accuracy error
```
- [bf16] add bf16 kernel: elementwise_add elementwise_mul elementwise_sub (#39716) · 2fedd39b
  Committed by zhangbo9674 on Feb 25, 2022
```
* add ele_add

* add ele_mul

* add ele_sub

* sovle conflict

* fix npu

* refine ele_add

* add ele_mul unittest

* refine ele_sub

* refine ci

* refine unittest
```
- [Phi] mv kernel (#39861) · 2553af4f
  Committed by furnace on Feb 25, 2022
```
[Phi] mv kernel
```
- add reduce_min and reduce_max (#39899) · 44da9b42
  Committed by joeqiao12 on Feb 25, 2022
- [Phi] Support cudnn kernel moving & move softmax kernels (#39547) · 8895379a
  Committed by Chen Weihang on Feb 25, 2022
```
* support cudnn kernel moving

* polish cmake rules

* add unittest for coverage

* remove orig kernel

* remove softmax cudnn kernel

* fix softmax test failed

* fix npu func error

* resolve conflict

* rename gpu dnn kernels

* fix name rule error

* fix compile error

* update fp16 namespace
```
- [Bug Fixes]Fix Bugs when construct infermeta by using shape(Vector<Tensor>) (#39904) · fed6de40
  Committed by YuanRisheng on Feb 25, 2022
```
* fix bugs

* fix bugs
```
- [Fix bug] fix fp16 atomicAdd compiler error on different cuda_arch. (#39886) · ef96ffb6
  Committed by Li Min on Feb 25, 2022
```
* Fix compile error on cuda_arch less than 700.
```

机器未来 / Paddle · In sync with the fork's source project
