Commits · 48f061fb9c0185f269ab2208c03b2be7ecf7214c · 机器未来 / Paddle

28 Dec, 2021 8 commits
- Support multi-output feature for elementwise (#38410) · 48f061fb
  Committed by limingshu on Dec 28, 2021
```
* first commit

* pass ctest of  elementwise_div_grad
```
  48f061fb
- refactor matmul directory in pten (#38227) · 982bf444
  Committed by zyfncg on Dec 28, 2021
```
* refactor matmul directory in pten

* fix merge conflict
```
  982bf444
- Add API and op for take_along_axis (#38396) · 3310f519
  Committed by huangxu96 on Dec 28, 2021
```
* add API and op for take_along_axis

* fix compile dependency problem and add example code and doc

* add unittest

* delete some code for CI coverage

* fix code style problem

* fix as review
```
  3310f519
- fix adamw epsilon in cuda kernel (#37746) · 6f1bb3d6
  Committed by Guoxia Wang on Dec 28, 2021
  
  6f1bb3d6
- Add Amax and Amin API (#38417) · 340dfb26
  Committed by Tao Luo on Dec 28, 2021
```
* add amax/amin

* support axis is list
```
  340dfb26
- [pten] remove in_type arg in cast kernel (#38486) · 0637b9a6
  Committed by chentianyu03 on Dec 28, 2021
```
* remove intype arg in cast kernel

* modify conj config in api.yaml by dictionary order

* rm unused code in cast_kernel.cu
```
  0637b9a6
- add reduce_prod_xpu. fix reduce_mean_xpu bug. (#38481) · 78836bb7
  Committed by houj04 on Dec 28, 2021
```
* add reduce_prod_xpu. fix reduce_mean_xpu bug.

* add reduce_prod_xpu. fix reduce_mean_xpu bug. test=kunlun
```
  78836bb7
- Add constructor for fused dropout param to ease use. (#38475) · f9e8a775
  Committed by Li Min on Dec 28, 2021
  
  f9e8a775
27 Dec, 2021 6 commits
- update mkldnn matmul_transpose_reshape fuse pass ut (#38467) · 9cfdae91
  Committed by baoachun on Dec 27, 2021
  
  9cfdae91
- add matmulv2_transpose_reshape_pass ut (#37416) · f664a533
  Committed by baoachun on Dec 27, 2021
```
* update mkldnn matmul_v2_transpose_reshape_fuse_pass ut

* update mkldnn matmul_v2_transpose_reshape_fuse_pass ut

* update ut

* update ut
```
  f664a533
- add device-agnostic stream class (#38391) · 6b5e33b4
  Committed by Leo Chen on Dec 27, 2021
```
* add device-agnostic stream class

* add stream.h

* fix ut

* fix cpu compile
```
  6b5e33b4
- refine float16 implementation (#38439) · 78375990
  Committed by sneaxiy on Dec 27, 2021
  
  78375990
- Support multi-outputs feature for broadcast ops (#38329) · 89d38f55
  Committed by limingshu on Dec 27, 2021
```
* No harm to KP

* Pass the compile stage

* change the WriteData function

* fix template bugs and pass ctest of current elementwise

* for passing partial template specialization of template function in CI-ROCm

* To make 'WriteData' function flexible.

* a less harmful way to support multi-output

* a less harmful way to support multi-output
```
  89d38f55
- gelu using normcdf for cudnn (#38450) · 37022482
  Committed by Guoxia Wang on Dec 27, 2021
  
  37022482
26 Dec, 2021 3 commits
- improve forward performance (#38279) · acef85b2
  Committed by Zhang Ting on Dec 26, 2021
  
  acef85b2
- Fix renorm op include error and format error (#38451) · e6c3f64f
  Committed by Chen Weihang on Dec 25, 2021
```
* remove needless header

* remove needless header

* adjust header order
```
  e6c3f64f
- [Unify Tensors PR #2] Replaced pten::LoD with paddle::framework::LoD (#38275) · bbe879fc
  Committed by Zhanlue Yang on Dec 26, 2021
```
* Replaced pten::LoD with paddle::framework::LoD

* Overrode CPUVector with CUDAVector

* Refactored paddle::framework::Vector
```
  bbe879fc
24 Dec, 2021 8 commits

Committed by seemingwang on Dec 24, 2021

* graph engine demo

* upload unsaved changes

* fix dependency error

* fix shard_num problem

* py client

* remove lock and graph-type

* add load direct graph

* add load direct graph

* add load direct graph

* batch random_sample

* batch_sample_k

* fix num_nodes size

* batch brpc

* batch brpc

* add test

* add test

* add load_nodes; change add_node function

* change sample return type to pair

* resolve conflict

* resolved conflict

* resolved conflict

* separate server and client

* merge pair type

* fix

* resolved conflict

* fixed segment fault; high-level VLOG for load edges and load nodes

* random_sample return 0

* rm useless loop

* test:load edge

* fix ret -1

* test: rm sample

* rm sample

* random_sample return future

* random_sample return int

* test fake node

* fixed here

* memory leak

* remove test code

* fix return problem

* add common_graph_table

* random sample node &test & change data-structure from linkedList to vector

* add common_graph_table

* sample with srand

* add node_types

* optimize nodes sample

* recover test

* random sample

* destruct weighted sampler

* GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* pybind sample nodes api

* pull nodes with step

* fixed pull_graph_list bug; add test for pull_graph_list by step

* add graph table;name

* add graph table;name

* add pybind

* add pybind

* add FeatureNode

* add FeatureNode

* add FeatureNode Serialize

* add FeatureNode Serialize

* get_feat_node

* avoid local rpc

* fix get_node_feat

* fix get_node_feat

* remove log

* get_node_feat return  py:bytes

* merge develop with graph_engine

* fix threadpool.h head

* fix

* fix typo

* resolve conflict

* fix conflict

* recover lost content

* fix pybind of FeatureNode

* recover cmake

* recover tools

* resolve conflict

* resolve linking problem

* code style

* change test_server port

* fix code problems

* remove shard_num config

* remove redundant threads

* optimize start server

* remove logs

* fix code problems by reviewers' suggestions

* move graph files into a folder

* code style change

* remove graph operations from base table

* optimize get_feat function of graph engine

* fix long long count problem

* remove redundant graph files

* remove unused shell

* recover dropout_op_pass.h

* fix potential stack overflow when request number is too large & node add & node clear & node remove

* when sample k is larger than neighbor num, return directly

* using random seed generator of paddle to speed up

* fix bug of random sample k

* fix code style

* fix code style

* add remove graph to fleet_py.cc

* fix blocking_queue problem

* fix style

* fix

* recover capacity check

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* fix distributed op combining problems

* optimize

* remove logs

* fix MultiSlotDataGenerator error

* cache for graph engine

* fix type compare error

* more test&fix thread terminating problem

* remove header

* change time interval of shrink

* use cache when sample nodes

* remove unused function

* change unique_ptr to shared_ptr

* simplify cache template

* cache api on client

* fix

* reduce sample threads when cache is not used

* reduce cache memory

* cache optimization

* remove test function

* remove extra fetch function

* graph-engine data transfer optimization

* support graph_split load&query

* remove logs

* change shards to pointer vector

* use inference

* remove test code

* renorm op

* simplify renorm op

* recover local changes

* recover renorm op kernel

* fix init

* add blanklines in renorm doc

* fix import

* fix import
Co-authored-by: Huang Zhengjie <270018958@qq.com>
Co-authored-by: Weiyue Su <weiyue.su@gmail.com>
Co-authored-by: suweiyue <suweiyue@baidu.com>
Co-authored-by: luobin06 <luobin06@baidu.com>
Co-authored-by: liweibin02 <liweibin02@baidu.com>
Co-authored-by: tangwei12 <tangwei12@baidu.com>

6982871d

- [AMP] Add multi_precision for sgd (#38231) · a4d07bb9
  Committed by zhangbo9674 on Dec 24, 2021

  a4d07bb9

- [pten] combine reduce_cuda codes (#38328) · 08941eda
  Committed by chentianyu03 on Dec 24, 2021

* combine reduce_cuda codes

* support float16 in pten reduce_mean

* replace ReduceCudaKernel impl with pten reduce impl

* mv reduce funcs into reduce_cuda_impl

* rm unused codes and headers

* mv GetReduceDim into reduce_cuda_impl

* recover GetReduceDim in reduce_op.h

* add new dispatch macro

* fix pool op output not inited and cause transform to pten::denseTensor error

* fix output tensor not initialized error

* rename new dispatch macro and format code style

* rm reduce_functor_op.h file

08941eda

- add new API/OP:paddle.Tensor.exponential_ (#38256) · 33185000
  Committed by zhouweiwei2014 on Dec 24, 2021
```
* add new API/OP:paddle.Tensor.exponential_

* fix CI
```
  33185000
- [MLU]add mlu op interface (#38241) · c396ee65
  Committed by 努力努力在努力丶 on Dec 24, 2021
```
* [MLU]add mlu op interface

* [MLU]fix alpha of activation op
```
  c396ee65
- add pull gpups sparse op (#37124) · 572b3e90
  Committed by yaoxuefeng on Dec 24, 2021
```
 add pull gpups sparse op
```
  572b3e90
- Add new API cholesky_solve (#38167) · 39f7c41f
  Committed by zhiboniu on Dec 24, 2021

  39f7c41f
- add new API/OP: paddle.poisson (#38117) · bcf86e5c
  Committed by zhouweiwei2014 on Dec 24, 2021
```
* add new API/OP:paddle.poisson

* fix comment
```
  bcf86e5c
23 Dec, 2021 5 commits
- move conj kernel impl (#38365) · 8da9eff4
  Committed by Chen Weihang on Dec 23, 2021
  
  8da9eff4
- Make GetBlob assuming elements are cached (#38336) · 7da5368d
  Committed by Jacek Czaja on Dec 23, 2021
```
* First set of fixes

* - Make GetBlob more likely to find a blob

* - Lint
```
  7da5368d
- Add erfinv API (#38295) · 6b59b58c
  Committed by wuhuanzhou on Dec 23, 2021
```
* add erfinv API, test=develop

* fix gradient accuracy error, test=develop

* fix cuda compilation error on Windows, test=develop

* fix M_2_SQRTPI undeclared identifier on Windows, test=develop
```
  6b59b58c
- 【PTen】Add empty and empty_like kernel in pten (#38334) · 4221cd33
  Committed by zyfncg on Dec 23, 2021
```
* add empty and empty_like kernel in pten

* add empty dev_api
```
  4221cd33
- move sign kernel impl (#38363) · bb38b6aa
  Committed by Chen Weihang on Dec 22, 2021
  
  bb38b6aa
22 Dec, 2021 3 commits
- use elementwise to optimize gelu backward implementation on GPU (#38263) · 858e4358
  Committed by crystal on Dec 22, 2021
```
* optimize gelu backward

* optimize gelu backward

* optimize code

* Number to expression

* Replacement number
```
  858e4358
- [PTen]Move flatten kernel to new directory (#38255) · 4d1ce184
  Committed by YuanRisheng on Dec 22, 2021
```
* move flatten

* fix bugs of test

* modify header file

* add copy declare

* fix compile bugs
```
  4d1ce184
- Add nearest_interp/v2 int8 and uint8 support (#37985) · 56e2a6a6
  Committed by joanna.wozna.intel on Dec 22, 2021
  
  56e2a6a6
21 Dec, 2021 4 commits
- [PTen] Rename cuda dir and context to gpu (#38296) · dc7597e3
  Committed by Chen Weihang on Dec 21, 2021
```
* rename cuda to gpu

* revert CMake change

* resolve conflict

* rename other cuda to gpu

* polish details
```
  dc7597e3
- use elementwise to optimize gelu forward implementation on GPU (#38188) · aff43684
  Committed by crystal on Dec 21, 2021
```
* relu forward opt

* add gelu functor

* optimize code
```
  aff43684
- Fix for wrong conditions between forward and backward in elementwise_add_grad op (#38176) · d9780a22
  Committed by arlesniak on Dec 21, 2021
  
  d9780a22
- Support FP16 mean (#38289) · 643a268e
  Committed by sneaxiy on Dec 21, 2021
```
* mean first version

* fix scalar mean

* add fp16 dtype for api
```
  643a268e
20 Dec, 2021 3 commits
- [pten]add pten conj kernel (#38247) · a2793e5e
  Committed by chentianyu03 on Dec 20, 2021
```
* add pten conj kernel

* modify conj_kernel file path

* add defined cuda macro to cuda/conj_kernel.h
```
  a2793e5e
- add gelu pbtxt for conv+gelu mkldnn fuse pass (#38162) · 1b7f6ae9
  Committed by baoachun on Dec 20, 2021
  
  1b7f6ae9
- [MLU]add mlu backend (#38207) · 76514a1f
  Committed by fwenguang on Dec 20, 2021
  
  76514a1f

机器未来 / Paddle is up to date with its fork source project