Commits · 3c0a68cec492a3e5e624242b77f3b56cfc39463c · Crayon鑫 / Paddle

01 Nov, 2021 5 commits

add debug infomation for build_cinn_pass and graph symbolization (#36867) · 813e7526
Committed by jiangcheng on Nov 01, 2021

813e7526

memory sparse table & brpc communication upgrade dependency (#36734) · 29c6bcbf
Committed by zhaocaibei123 on Nov 01, 2021

29c6bcbf

cache for graph_engine (#36880) · 249081b6

Committed by seemingwang on Nov 01, 2021

* graph engine demo

* upload unsaved changes

* fix dependency error

* fix shard_num problem

* py client

* remove lock and graph-type

* add load direct graph

* add load direct graph

* add load direct graph

* batch random_sample

* batch_sample_k

* fix num_nodes size

* batch brpc

* batch brpc

* add test

* add test

* add load_nodes; change add_node function

* change sample return type to pair

* resolve conflict

* resolved conflict

* resolved conflict

* separate server and client

* merge pair type

* fix

* resolved conflict

* fixed segment fault; high-level VLOG for load edges and load nodes

* random_sample return 0

* rm useless loop

* test:load edge

* fix ret -1

* test: rm sample

* rm sample

* random_sample return future

* random_sample return int

* test fake node

* fixed here

* memory leak

* remove test code

* fix return problem

* add common_graph_table

* random sample node &test & change data-structure from linkedList to vector

* add common_graph_table

* sample with srand

* add node_types

* optimize nodes sample

* recover test

* random sample

* destruct weighted sampler

* GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* WeightedGraphEdgeBlob to GraphEdgeBlob

* pybind sample nodes api

* pull nodes with step

* fixed pull_graph_list bug; add test for pull_graph_list by step

* add graph table;name

* add graph table;name

* add pybind

* add pybind

* add FeatureNode

* add FeatureNode

* add FeatureNode Serialize

* add FeatureNode Serialize

* get_feat_node

* avoid local rpc

* fix get_node_feat

* fix get_node_feat

* remove log

* get_node_feat return  py:bytes

* merge develop with graph_engine

* fix threadpool.h head

* fix

* fix typo

* resolve conflict

* fix conflict

* recover lost content

* fix pybind of FeatureNode

* recover cmake

* recover tools

* resolve conflict

* resolve linking problem

* code style

* change test_server port

* fix code problems

* remove shard_num config

* remove redundent threads

* optimize start server

* remove logs

* fix code problems by reviewers' suggestions

* move graph files into a folder

* code style change

* remove graph operations from base table

* optimize get_feat function of graph engine

* fix long long count problem

* remove redandunt graph files

* remove unused shell

* recover dropout_op_pass.h

* fix potential stack overflow when request number is too large & node add & node clear & node remove

* when sample k is larger than neigbor num, return directly

* using random seed generator of paddle to speed up

* fix bug of random sample k

* fix code style

* fix code style

* add remove graph to fleet_py.cc

* fix blocking_queue problem

* fix style

* fix

* recover capacity check

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* add remove graph node; add set_feature

* fix distributed op combining problems

* optimize

* remove logs

* fix MultiSlotDataGenerator error

* cache for graph engine

* fix type compare error

* more test&fix thread terminating problem

* remove header

* change time interval of shrink

Co-authored-by: Huang Zhengjie <270018958@qq.com>
Co-authored-by: Weiyue Su <weiyue.su@gmail.com>
Co-authored-by: suweiyue <suweiyue@baidu.com>
Co-authored-by: luobin06 <luobin06@baidu.com>
Co-authored-by: liweibin02 <liweibin02@baidu.com>
Co-authored-by: tangwei12 <tangwei12@baidu.com>

249081b6

[NPU] fix lookup_table_v2_grad ACL error for model BoW (#36864) · 792d3d76
Committed by Aganlengzi on Nov 01, 2021
```
* [NPU] fix lookup_table_v2_grad ACL error for model BoW

* add more unit tests
```
792d3d76

add cinn_launch_op for using CINN to optimize graph (#36600) · 0a963ee9

Committed by CtfGo on Nov 01, 2021

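Adds CinnLaunchOp, which executes the result of compiling a CINN subgraph. Key points:
1. In BuildCinnPass, which partitions the graph into subgraphs, each subgraph is replaced in the original graph by a CinnLaunchOp, which calls CINN to compile and execute that subgraph.
2. The inputs and outputs of CinnLaunchOp are the inputs and outputs of the subgraph. The op also has a `compilation_key` attribute, which it uses as the key to fetch the subgraph object and the compilation result from the global cache. BuildCinnPass sets this attribute when it creates the op.
3. CinnLaunchOp works as follows:
        - Fetch the subgraph object from the global cache.
        - Fetch the subgraph's compilation result from the global cache, compiling just in time on a cache miss.
        - Initialize the runtime data from the variable information in the compilation result (data type, shape) and allocate host/device memory.
        - Pack the runtime data into arguments and call CINN's executable object, the runtime program, to run the computation.
        - Sync the subgraph's results to the Paddle-side tensors through the argument pointers.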

0a963ee9
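
The CinnLaunchOp flow described in this commit message can be sketched in plain Python. This is only an illustration of the lookup, compile-on-miss, and run steps. `GlobalCache`, `CompiledProgram`, `compile_subgraph`, and `cinn_launch` are hypothetical names, not Paddle or CINN APIs, and the "compiler" here only knows a toy `add_relu` subgraph.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CompiledProgram:
    """Hypothetical stand-in for a compiled subgraph (the 'runtime program')."""
    run: Callable[[List[List[float]], List[float]], None]  # writes into an output buffer


def compile_subgraph(subgraph: str) -> CompiledProgram:
    """Toy compiler: only understands a subgraph named 'add_relu'."""
    if subgraph != "add_relu":
        raise ValueError(f"unsupported subgraph: {subgraph}")

    def run(inputs: List[List[float]], out: List[float]) -> None:
        x, y = inputs
        for i in range(len(out)):
            out[i] = max(x[i] + y[i], 0.0)  # relu(x + y), written through the buffer

    return CompiledProgram(run=run)


@dataclass
class GlobalCache:
    """Hypothetical global cache keyed by compilation_key."""
    subgraphs: Dict[str, str] = field(default_factory=dict)
    compiled: Dict[str, CompiledProgram] = field(default_factory=dict)
    compile_count: int = 0

    def get_or_compile(self, key: str) -> CompiledProgram:
        program = self.compiled.get(key)
        if program is None:  # cache miss: compile just in time
            program = compile_subgraph(self.subgraphs[key])
            self.compiled[key] = program
            self.compile_count += 1
        return program


def cinn_launch(cache: GlobalCache, compilation_key: str,
                inputs: List[List[float]]) -> List[float]:
    program = cache.get_or_compile(compilation_key)  # fetch subgraph + compiled result
    out = [0.0] * len(inputs[0])                     # allocate the output buffer
    program.run(inputs, out)                         # run with packed arguments
    return out                                       # result handed back to the caller
```

Calling `cinn_launch` twice with the same key compiles once and reuses the cached program on the second call.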

29 Oct, 2021 7 commits
- add some ops support fp16 in kunlun2 (#36854) · 442688a8
  Committed by taixiurong on Oct 29, 2021
```
* aaaa

* add some ops support fp16 in kunlun2
```
  442688a8
- fix matmul error when input's dim is 3 (#36849) · f6b4ed22
  Committed by baoachun on Oct 29, 2021
  
  f6b4ed22
- Add io api and compute api for XPU (#36423) · 89a8989f
  Committed by niuliling123 on Oct 29, 2021
  
  89a8989f
- add new API/OP: paddle.linalg.triangular_solve (#36714) · 92d6a048
  Committed by zhouweiwei2014 on Oct 29, 2021
```
* add new API: paddle.linalg.triangular_solve

* add new API/OP: paddle.linalg.triangular_solve

* add new API/OP: paddle.linalg.triangular_solve

* fix comment
```
  92d6a048
- fix some bug in new executor (#36822) · b5af9575
  Committed by wanghuancoder on Oct 29, 2021
```
* fix some bug in new executor, test=develop

* fix error message, test=develop
```
  b5af9575
- [new-exec] enable check_nan_inf (#36802) · be55bac3
  Committed by Leo Chen on Oct 29, 2021
```
* enable check_nan_inf and fix variable scope

* add ut

* fix bug

* update ut

* revert doc change

* fix npu compile
```
  be55bac3
- fix dcnv2 trt8 compile error (#36850) · 82fb63eb
  Committed by wangxinxin08 on Oct 29, 2021
  
  82fb63eb

28 Oct, 2021 10 commits

Fix several bugs for enabling Paddle to train with CINN. (#36739) · c93331c5

Committed by Zhen Wang on Oct 28, 2021

* Update the content of `test_parallel_executor_run_cinn.py`.

* Fix some bugs in the topological sort and `CreateNewSubGraph`.

* Update the CINN commit id used by Paddle.

* Update the unit test to `add+relu`.

* Update according to reviewers' suggestion.

c93331c5

[NPU] Add int64 supporting for expand_v2, reduce_max, scale and tests (#36582) · c038cc7a

Committed by ronnywang on Oct 28, 2021

* add TypeAdapter method for npu_op_runner

* add int64 supporting for elementwise_mul and reduce_sum

* add int64 supporting and UT for expand_v2, scale and reduce_max

* fix bug

c038cc7a

ctc grad compute on gpu (#36756) · 54ef9d06

Committed by Hui Zhang on Oct 28, 2021

* Revert "Align CTC grad scale same with ESPNet (#34729)"

This reverts commit 10f9644c.

* ctc grad compute on gpu

54ef9d06

save/load in ps runtime(the_one_ps) (#36097) · e7842ba6

Committed by wangguanqun on Oct 28, 2021

* add trainer desc config to distributed strategy

* code style modified

* data_feed set lod

* fix bug

* code style

* fix bug

* save load

* save load

* save unittest

* add unittest of the_one_ps

* unittest

* add todo in communicator sendsparse

e7842ba6

Rewrite Softmax in Kernel Primitive API, test=develop (#36706) · ef76f664
Committed by Liu-xiandong on Oct 28, 2021

ef76f664

support inference for quantized matmul_v2 (#36594) · b151a451
Committed by XGZhang on Oct 28, 2021
```
* support inference for quantized matmul_v2

* undate code style

* code style
```
b151a451

Fix cancel (#36740) · 704e454f

Committed by liutiexing on Oct 28, 2021

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* update

* update

* update Error MSG

* update EventsWaiter

* Add Cancel For ThreadPool

* Add UT for Cancel

* fix Cancel

704e454f

Fix fused_attention_op and fused_feedforward_op bug when pre_layer_norm is false. (#36793) · ff3018d7
Committed by Li Min on Oct 28, 2021
```
* Fix bug when pre_layer_norm is false.
```
ff3018d7

Modify Struct into Class to improve encapsulation and Polish code exception (#36797) · 9516108a
Committed by Aurelius84 on Oct 28, 2021
```
* Refactor InterpreterCore code

* make tuple
```
9516108a

change api to support trt8 in pool3d_op_convert (#36783) · a7d8837b
Committed by feng_shuai on Oct 28, 2021
```
* change api for support trt8

* fix:change api
```
a7d8837b

27 Oct, 2021 14 commits

add unittest (#36511) · 51a33962
Committed by pangyoki on Oct 27, 2021

51a33962

[ROCM] add custom op support, test=develop (#36771) · dd1d3789
Committed by Qi Li on Oct 27, 2021
```
* [ROCM] add custom op support, test=develop

* remove debug codes, test=develop
```
dd1d3789

GeneratePass support attr condition and mapping (#36747) · 5c569aef
Committed by wuhuanzhou on Oct 27, 2021
```
* GeneratePass support attr condition and mapping, test=develop

* fix coverage, test=develop
```
5c569aef

add dcnv2 trt plugin (#36612) · 8c3decd8
Committed by wangxinxin08 on Oct 27, 2021
```
* add dcnv2 plugin
```
8c3decd8

fix ernie serialize problem (#36769) · d6b1beb0
Committed by zlsh80826 on Oct 27, 2021

d6b1beb0

Added fp32 / bf16 forward and backward elementwise_div_mkldnn operator (#36158) · e92e6b06

Committed by piotrekobiIntel on Oct 27, 2021

* Add WIP version of elementwise_div_mkldnn without working dy grad

* Add dy gradient calculation implementation, disable broadcast tests

* Readd removed tests from static_mode_white_list

* Add bfloat16 gradient tests, remove int8 and uint8 support

* - Change the way dy grad is calculated to improve performance
- Refactor BinaryMKLDNNHandler to use a default parameter

* Change copyright year

* Refactor as suggested

* Attempt to bypass CI Approval
not accepting max_relative_error

* Fix formatting issue

e92e6b06

Add LRUCache for fft plans (#36646) · 737992eb

Committed by Feiyu Chan on Oct 27, 2021

* WIP: add cache

* delete move constructor and operator= for CuFFTHandle and FFTConfig

* remove log from CuFFTHandle and FFTConfig

* add lrucache for fft rocm backend

* disable LRUCache when CUFFT_VERSION >= 10200

* disbale copy and move for hipFFTHandle; format code

* clean debug code

Co-authored-by: Xiaoxu Chen <chenxx_id@163.com>

737992eb
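
To illustrate the idea behind this commit, here is a minimal least-recently-used cache in Python. It is a sketch, not the C++ code from the PR: `LRUCache`, its `factory` argument, and the use of a shape tuple as the plan key are all assumptions made for the example.

```python
from collections import OrderedDict
from typing import Callable, Hashable, List


class LRUCache:
    """Keeps at most `capacity` values; evicts the least recently used one."""

    def __init__(self, capacity: int, factory: Callable[[Hashable], object]):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.factory = factory          # builds a value (e.g. an FFT plan) on a miss
        self._items: OrderedDict = OrderedDict()

    def get(self, key: Hashable) -> object:
        if key in self._items:
            self._items.move_to_end(key)    # mark as most recently used
            return self._items[key]
        value = self.factory(key)           # miss: build the expensive object once
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry
        return value

    def keys(self) -> List[Hashable]:
        """Keys ordered from least to most recently used."""
        return list(self._items)
```

A cache like this pays off when creating a plan is much more expensive than a dictionary lookup and the same configurations recur across calls.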

Fused transformer encoder layer and fused feedforward layer (#36604) · 9f3613f3

Committed by zhangkaihuo on Oct 27, 2021

This PR contains the layer-level code for fused_transformer: the FusedFeedForward layer and the FusedTransformerEncoderLayer.

9f3613f3

add matmul_v2 to v1 CPU pass and fix matmul dim error (#36731) · d5245a35
Committed by baoachun on Oct 27, 2021
```
* fix matmul dim error

* fix wrong dim check in matmul
```
d5245a35

fix fftshift/ifftshift on static mode (#36748) · 34b6860e

Committed by Feiyu Chan on Oct 27, 2021

* fix fftshift/ifftshift on static mode
* update roll_op version
* add more test cases for fftshift/ifftshift

34b6860e
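
Because this commit also updates the roll_op version, it may help to see how fftshift and ifftshift reduce to a circular roll. The sketch below works on 1-D Python lists and assumes the usual convention (shift by `n // 2` for fftshift, by `-(n // 2)` for ifftshift). The helper names are illustrative and are not Paddle functions.

```python
from typing import List, Sequence


def roll(xs: Sequence, shift: int) -> List:
    """Circularly shift xs to the right by `shift` positions."""
    n = len(xs)
    if n == 0:
        return []
    shift %= n  # normalizes negative and oversized shifts
    if shift == 0:
        return list(xs)
    return list(xs[-shift:]) + list(xs[:-shift])


def fftshift(xs: Sequence) -> List:
    """Move the zero-frequency element to the center of the sequence."""
    return roll(xs, len(xs) // 2)


def ifftshift(xs: Sequence) -> List:
    """Inverse of fftshift; differs from it only for odd lengths."""
    return roll(xs, -(len(xs) // 2))
```

For even lengths the two functions coincide. For odd lengths they differ by one position, which is why both exist.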

add fp16 unittests for kl2 (#36583) · 6838a187
Committed by taixiurong on Oct 27, 2021

6838a187

enable trt test check and fix trt ut error（3/3） (#36581) · 8c1c72af
Committed by Wilber on Oct 27, 2021

8c1c72af

add paddle.linalg.eigvalsh API (#35615) · 9f9ed3ae

Committed by huangjun12 on Oct 27, 2021

* add eigvalsh with is_test

* add eigvalsh op

* fix backward bug

* forward and backward, float and complex, unittest

* remove eigvalsh_helper.h

* remove changes of cusolver.h

* fix unittest

* fix unittest bug

* update code following eigh

* fix test

* update lapack

* pull develop

* update funcor

* fix unittest bug

* fix details

* add tensor_method_func

* fix notes

9f9ed3ae

Fix inverse in fake quant (#36762) · 542ba214
Committed by whs on Oct 27, 2021

542ba214

26 Oct, 2021 4 commits

Add fused attention op backward and python layer. (#36498) · 5119428e

Committed by Li Min on Oct 26, 2021

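Purpose: this PR aims to improve the computational performance of the attention module.
To reduce the framework's per-op scheduling overhead, this PR implements the attention module by hand in C++ and exposes it as a single large attention op.
To reduce memory-access overhead, this PR applies two optimizations:
(1) when computing q, k, and v, it shares the input X so that the gemm, transpose, and bias add at that point go from three calls to one;
(2) it uses kernel fusion, passing data between different CUDA kernels through registers.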

5119428e
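
The first optimization in this commit message, sharing the input X across the q, k, and v projections, can be checked with a small pure-Python sketch. It only demonstrates that one matrix multiply against column-concatenated weights gives the same result as three separate ones. The transpose step is left out, and all function names are hypothetical rather than taken from the PR.

```python
from typing import List

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add_bias(m: Matrix, bias: List[int]) -> Matrix:
    return [[v + bias[j] for j, v in enumerate(row)] for row in m]


def qkv_separate(x: Matrix, weights: List[Matrix], biases: List[List[int]]) -> List[Matrix]:
    """Three independent gemm + bias-add calls, one each for q, k, v."""
    return [add_bias(matmul(x, w), b) for w, b in zip(weights, biases)]


def qkv_fused(x: Matrix, weights: List[Matrix], biases: List[List[int]]) -> List[Matrix]:
    """One gemm + bias-add against concatenated weights, then split the columns."""
    w_cat = [sum((w[r] for w in weights), []) for r in range(len(weights[0]))]
    b_cat = sum(biases, [])
    out = add_bias(matmul(x, w_cat), b_cat)
    d = len(biases[0])  # width of each of q, k, v
    return [[row[i * d:(i + 1) * d] for row in out] for i in range(len(weights))]
```

The fused variant reads X once instead of three times, which is the memory-access saving the message refers to.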

roll_op: support Tensor as input for shifts (#36727) · 7b1e30fc
Committed by Feiyu Chan on Oct 26, 2021

7b1e30fc

Add roi_align grad (#36724) · 236ed94d
Committed by zhulei on Oct 26, 2021

236ed94d

[new-exec] cache exception in child thread (#36692) · 87fbbd36
Committed by Leo Chen on Oct 26, 2021
```
* cache exception in child thread

* add ut

* fix ut
```
87fbbd36
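
One plausible reading of this commit title is that an exception raised in a worker thread is stored and later rethrown in the parent thread. That reading is an assumption, since the page does not show the diff. Under it, the pattern looks roughly like this in Python, with `ExceptionHolder` and `run_in_children` as hypothetical names.

```python
import threading
from typing import Callable, List, Optional


class ExceptionHolder:
    """Stores the first exception raised by any worker thread."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._exc: Optional[BaseException] = None

    def catch(self, exc: BaseException) -> None:
        with self._lock:
            if self._exc is None:  # keep only the first failure
                self._exc = exc

    def rethrow(self) -> None:
        with self._lock:
            exc = self._exc
        if exc is not None:
            raise exc


def run_in_children(tasks: List[Callable[[], None]]) -> None:
    """Run each task in its own thread; surface a worker failure in the caller."""
    holder = ExceptionHolder()

    def worker(task: Callable[[], None]) -> None:
        try:
            task()
        except Exception as exc:  # would otherwise die silently with the thread
            holder.catch(exc)

    threads = [threading.Thread(target=worker, args=(t,)) for t in tasks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    holder.rethrow()  # raise in the parent thread, after all workers finish
```

Without the holder, an exception in a child thread ends only that thread, and the caller never learns that the task failed.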

Crayon鑫 / Paddle is in sync with the source project it was forked from