Commits · 7ee31a96b436de4b0701de2ba56bd0b2a653994c · Crayon鑫 / Paddle

17 Apr, 2022 · 1 commit

[Perf] Optimize dygraph scheduling performance (#41696) · 7ee31a96

Committed by Chen Weihang on Apr 17, 2022

* split phi and fluid infermeta context

* resolve conflict

* fix type error

* optimize scheduling perf

* spec small vector size

* replace all grad var name

* fix test failed

* move init default signature

* polish details

* polish details

* fix no init bug

* init sig for tests

* add init sig for infer

* fix infrt error

* fix infrt failed

* fix kunlun error

* fix infrt failed

7ee31a96

15 Apr, 2022 · 6 commits

solve brpc compile in arm-ubuntu18 (#41649) · 56dafc4f

Committed by ziyoujiyi on Apr 15, 2022

* back fl

* delete ssl cert

* .

* make warning

* .

* unittest paral degree

* solve unittest

* heter & multi cloud comm ready

* .

* .

* arm_brpc compile

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* only output is ok

* base is ok

* .

* .

* .

* .

* .

* .

* .

* .

* add switch server bin

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* adapt brpc ssl

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

56dafc4f

gpu_graph engine optimization+ (#41455) · ce72690c

Committed by seemingwang on Apr 15, 2022

* extract sub-graph

* graph-engine merging

* fix

* fix

* fix heter-ps config

* test performance

* test performance

* test performance

* test

* test

* update bfs

* change cmake

* test

* test gpu speed

* gpu_graph_engine optimization

* add ssd layer to graph_engine

* fix allocation

* fix syntax error

* fix syntax error

* fix pscore class

* fix

* recover test

* recover test

* fix spelling

* recover

* fix

ce72690c

[Phi]Reduce kernels into multiple files (#41747) · 1927aff9

Committed by chentianyu03 on Apr 15, 2022

* split reduce_kernel

* rm reduce_kernel in cmake

* split reduce_grad kernels

* fix cmake build error

* format code

* fix standalone_executor_test error

1927aff9

【GPUPS】add afsclient and gpupsutil (#41324) · 30a1213b
Committed by danleifeng on Apr 15, 2022
```
* add gpupsutil and afsclient; test=develop
```
30a1213b

[XPUPS]fix hashtable_kernel.kps (#41790) · ef6ff4ef

Committed by zmxdream on Apr 15, 2022

* refactor heter comm kernel

* update. test=develop

* update calc_shard_offset. test=develop

* update xpu kernel. test=develop

* update args of calc_shard_offset

* update. test=develop

* remove customGradMerger

* update. test=develop

* update. test=develop

* fix. test=develop

* update. test=develop

* update. test=develop

* update optimizer kernel

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* fix. test=develop

* fix. test=develop

* add optimizer kernel. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix kunlun not support size_t. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* update hashtable. test=develop

* update. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* update. test=develop

* update. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* template init. test=develop

* hashtable template init. test=develop

* fix. test=develop

* fix. test=devlop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix hashtable_kernel. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop
Co-authored-by: WorgenZhang <frank08081993@gmail.com>

ef6ff4ef

[IPU] add mixed-precision support for ipu (#41733) · d7224482
Committed by Allen Guo on Apr 15, 2022
```
* add mixed-precision support for ipu

* restore cast_model_to_fp16 api

* update UTs
```
d7224482

14 Apr, 2022 · 6 commits

[KP] Add registry for elementwise_add/max/min/sub/div/mul/floordiv on XPU2 with KP lib (#41494) · fbe2c311
Committed by Lijunhui on Apr 14, 2022
```
* register elementwise_xxx
```
fbe2c311

executor perf statistics (#41648) · cbe7466f
Committed by liutiexing on Apr 14, 2022
```
* executor perf statistics

* fix ut

* fix ut

* fix ut

* add ut

* add ut
```
cbe7466f

Fix to #38693 (minimal UT) (#41026) · d0f3296b

Committed by Jacek Czaja on Apr 14, 2022

* Add UT

- Added missed data_layout

- Added missing conversions

- NDHWC added

- NDHWC support in data_transform

- another fix

- candidate change

- fix

- fix

- fix

- fix

- fix

- fix

- fix to hack

- compilation fix

- fix to automatic merge

* - reduced UT

* - fix

* - lint

* - fix to lint

d0f3296b

FC+elementwise_add (residual connection) (#41776) · 92d8d0bc

Committed by Sławomir Siwek on Apr 14, 2022

* Change tensor name to match activation

* declare fc_eltwise_add pass

* merge conv_eltwise refactor PR

* first compilable draft

* unittest feedback tools

* Fuse pass tester

* Move IsReachable() to shared file

* 100% coverage of fuse_pass_tester.cc

* register pass

* Add bias node

* Improve unit tests / remove bias node from pattern

* improve fc_eltwiseadd_unittest

* cancel eltwise_add fuse if act is already fused

* Add elementwise_input scale

* Residual MVP

* Add new FC attrs

* Add more test cases

* Add missing op attrs

* Adapt code to new Elementwise pattern

* reuse existing fcpattern

* improve code style

* remove unused arguments

* fix typo

* remove whitespace

* remove int8 related code

* Remove attributes from base ops

* style

* style check

* Remove input from base op

* Set attribute during fuse

* ut timeout

* download and test model

* DRY

* apply feedback from review

* Style check

* fix typo

* cosmetic changes

* explicitly set residual as output

* VIT-OCR accuracy check

* trigger CI

* remove whitespaces

* fix missing data file

92d8d0bc

add mkldnn int8 pass [step3] (#41599) · 8e2d4d30

Committed by baoachun on Apr 14, 2022

* add mkldnn int8 pass [step3]

* Add test for compute_propagate_scales_mkldnn_pass

* update pass

* update api comment and python api
Co-authored-by: wozna <joanna.wozna@intel.com>

8e2d4d30

Added shuffle_channel BF16/FP32 FWD oneDNN kernel (#39756) · c7623d72

Committed by jakpiase on Apr 14, 2022

* added shuffle_channel bf16/fp32 fwd kernel

* added missing files

* CI fix

* changed from pten to phi

* tmp save

* added reviewers suggestions

* fix for test

c7623d72

13 Apr, 2022 · 5 commits
- the one ps proto (#41659) · b12af9e1
  Committed by wangguanqun on Apr 13, 2022
```
* the one ps proto

* the one ps proto

* fix

* fix

* fix

* fix windows ci

* fix windows ci

* add dependency

* add dependency
```
  b12af9e1
- [XPUPS]add support for kunlun2 (#40985) · c9c03e7b
  Committed by zmxdream on Apr 13, 2022
```
[XPUPS]add support for kunlun2
Co-authored-by: WorgenZhang <frank08081993@gmail.com>
```
  c9c03e7b
- Fix problem of infermeta with vector output (#41646) · b2390438
  Committed by zyfncg on Apr 13, 2022
```
* remove stack_grad infershape

* fix bug of output with null

* fix bug
```
  b2390438
- optimize hbm (#41623) · d95280c7
  Committed by Thunderbrook on Apr 13, 2022
```
* optimize hbm

* format

* format
```
  d95280c7
- [Phi&CustomOp] Remove deprecated enum PlaceType for custom op & add warning (#41647) · 78ef1071
  Committed by Chen Weihang on Apr 13, 2022
```
* remove old custom op placetype

* replace dist  placetype using

* add with gpu macro

* fix mutable_data error

* fix set value error

* add comment
```
  78ef1071
12 Apr, 2022 · 3 commits
- 【heterps】datafeed puttofeedvec performance (#40168) · c202a613
  Committed by danleifeng on Apr 12, 2022
```
* perform SlotRecordInMemoryDataFeed feedvec;test=develop
```
  c202a613
- add dependency for send/recv to support pp parallel (#41652) · a058b474
  Committed by Leo Chen on Apr 12, 2022
  
  a058b474
- [CustomOp]Add new method for custom double grad (#41538) · 362c7c80
  Committed by Chen Weihang on Apr 12, 2022
```
* add new method for custom double grad

* add tanh double grad unittest

* change year

* revert tensor init method
```
  362c7c80
10 Apr, 2022 · 3 commits
- [KP]fix bug when TruncatedNormal cannot fall back in cpu (#41565) · c1394c6a
  Committed by Liu-xiandong on Apr 10, 2022
```
* [KP]fix bug when TruncatedNormal cannot fall back in cpu

* delete useless comment

* delete useless comment
```
  c1394c6a
- add mkldnn compute_propagate_scales int8 pass (#41592) · c00d869b
  Committed by baoachun on Apr 10, 2022
  
  c00d869b
- add mkldnn int8 pass [step1] (#41579) · e68da187
  Committed by baoachun on Apr 10, 2022
```
* add mkldnn int8 pass

* add mkldnn int8 pass

* update pass
```
  e68da187
09 Apr, 2022 · 3 commits

Unittest recover (#41431) · 7a07c4a5

Committed by zhaocaibei123 on Apr 09, 2022

* update name

* update name

* fix test

* fix fleet bind

* update name

* update name

* fix test

* fix gpups wrapper

* remove Push/Pull/Load/Save with context in client and wrapper base class

* fix

* fix

* remove some interface

* fix

* remove

* code style

* recover

* fix

* remove code unused

* remove some unused table & accessor & CommonDenseTable => MemoryDenseTable

* fix

* fix

* fix

* recover

* remove unused code

* recover unittest

* fix

* remove

* fix

* remove code unuseful

* remove

* fix

* recover

* remove
Co-authored-by: esythan <esythan@126.com>

7a07c4a5

Autotune the workspace_size_limit in conv. (#40338) · b937cdc5

Committed by limingshu on Apr 09, 2022

* Using the maximum workspace_size of all algorithms to limit the workspace size in exhaustive search mode.

* Use the system cudaMalloc and cudaFree to allocate workspace during searching.

* Enable switch of two kinds of workspace setting methods.
Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>

b937cdc5

[new-exec] fix bug that no thread is waked up when adding task to threadpool (#41567) · f581f5bf
Committed by Leo Chen on Apr 09, 2022
```
* fix bug that no thread is waked up when adding task to threadpool

* fix typo
```
f581f5bf

07 Apr, 2022 · 3 commits
- [GPUPS] bind afs wrapper (#41227) · b3bcebbe
  Committed by Thunderbrook on Apr 07, 2022
```
* afs wrapper

* format

* format

* macro
```
  b3bcebbe
- Profile Executors (#41100) · dfb47986
  Committed by liutiexing on Apr 07, 2022
```
* Profile Executors

* update

* fix ut

* fix names

* update

* update
```
  dfb47986
- momentum support l2decay for xpu. test=kunlun (#41325) · 533c649f
  Committed by houj04 on Apr 07, 2022
```
* momentum support l2decay for xpu. test=kunlun

* fix include file. test=kunlun

* fix cmake for device_worker. test=kunlun
```
  533c649f
06 Apr, 2022 · 1 commit
- [IPU] remove paddle_ipu shared library (#41307) · 229e91bf
  Committed by Allen Guo on Apr 06, 2022
```
* remove paddle_ipu shared library

* fix unique_name
```
  229e91bf
05 Apr, 2022 · 2 commits

Fix bug of data transform in inference executor (#41349) · 91212104
Committed by zyfncg on Apr 05, 2022
```
* fix bug of data transform in inference executor

* fix bug
```
91212104

[new-exec] enable the new standalone executor by default (#41179) · 93ea1297

Committed by Leo Chen on Apr 05, 2022

* enable new executor by default

* enable stream safe allocator

* test=document_fix;test=coverage

* do not use scope in op kernel

* fit empty program for new executor

* fix communication depend

* fix test_sync_batch_norm

* skip unsupported place

* refine datatransfer

* fit for distributed program

* fix dependency

* fix some ut

93ea1297

04 Apr, 2022 · 2 commits

conv + elementwise_add refactor (#41286) · e5e0b726
Committed by Sławomir Siwek on Apr 04, 2022
```
* DRY

* change nodes names

* add const prefix

* change asX to as_x in all files
```
e5e0b726

Add dropout yaml (#41355) · 1c7001e7

Committed by hong on Apr 04, 2022

* add dropout slice yaml

* remove useless code

* fix infer shape error

* skip infrt compile for dropout

1c7001e7

03 Apr, 2022 · 1 commit

[Phi]Concat grad (#41112) · 3f57ef7a

Committed by chentianyu03 on Apr 03, 2022

* add concat_grad kernel

* fix error

* remove comment code

* fix outs nullptr error

* change to phi header

* add concat_grad declare for standalone_executor_test

3f57ef7a

02 Apr, 2022 · 4 commits
- [new-exec] fit empty program for new executor (#41328) · e0ccaeaf
  Committed by Leo Chen on Apr 02, 2022
  
  e0ccaeaf
- [Paddle inference] support new quant_model (#41049) · 1b58ce14
  Committed by Wangzheee on Apr 02, 2022
```
* paddle inference support new quant_model
```
  1b58ce14
- [KP] fix bug in phi static graph mode (#41269) · d0f46aac
  Committed by Liu-xiandong on Apr 02, 2022
```
* [KP] fix bug in phi static graph mode

* modify the useless code
```
  d0f46aac
- Unified PS refine (#41234) · b3270adf
  Committed by zhaocaibei123 on Apr 02, 2022
```
* update name

* update name

* fix test

* fix fleet bind

* update name

* update name

* fix test

* fix gpups wrapper

* remove Push/Pull/Load/Save with context in client and wrapper base class

* fix

* fix
Co-authored-by: esythan <esythan@126.com>
```
  b3270adf

Crayon鑫 / Paddle is in sync with its fork source project