Commits · 0a837cb224746e38e33e0676c566076f92027067 · wmsofts / Paddle

26 Dec, 2022 2 commits
- Add FLAGS for communication op dependency in standalone executor (#49291) · d6fef01c
  Committed by Ruibiao Chen on Dec 26, 2022
- Improve stream analyzer (#49314) · f0f4dd1e
  Committed by Ruibiao Chen on Dec 26, 2022
```
* Memory search for stream analyzer

* Shrink redundant waiters
```
23 Dec, 2022 1 commit
- [AutoParallel-Performance] AMP Flag Memcpy support newexe Overlap (#49219) · 2259ced1
  Committed by JZ-LIANG on Dec 23, 2022
```
* memcpy overlap

* memcpy newexe
```
19 Dec, 2022 2 commits
- simplify FallbackToCpu (#49124) · 7ffde4bc
  Committed by HongyuJia on Dec 19, 2022
- remove expected_kernel_key in interpreter (#49120) · ff79c144
  Committed by HongyuJia on Dec 19, 2022
12 Dec, 2022 1 commit
- Support cross-step stream synchronization for standalone executor (#48809) · 9455d146
  Committed by Ruibiao Chen on Dec 12, 2022
```
* Add UT

* Support cross-step stream synchronization for standalone executor

* Fix typos

* Fix typos

* Update UTs
```
09 Dec, 2022 1 commit
- [PHI decoupling] move "flags.h" from fluid to phi (#48696) · 39ffef0d
  Committed by PuQing on Dec 09, 2022
08 Dec, 2022 1 commit
- Set WaiterType of kGpuSync to kCPU (#48758) · a5999d83
  Committed by Ruibiao Chen on Dec 08, 2022
30 Nov, 2022 1 commit
- Add fuse_act_add_grad_pass (#48346) · ca552933
  Committed by zhangbo9674 on Nov 30, 2022
```
* add fuse act add grad pass

* polish code

* refine code

* add test

* refine code
```
29 Nov, 2022 2 commits

[Control Flow] replace executor in while op with InterpreterCore (#47573) · 6dbfbfa5

Committed by kangguangli on Nov 29, 2022

* fix:add no support for cuda_arch<700

* replace Executor in while op with InterpreterCore

* cache InterpreterCore as the member of WhileOp

* fix bug: tensor place changed because of assign op in while loop

* refine code

* refine code

* refine code

* hot fix

* fix compile

* merge develop

* follow comments

* add log for test

* remove LoDTensor

* set flag control_flow_use_new_executor false
Co-authored-by: fengshuai <fengshuai03@baidu.com>
Co-authored-by: zhiqiu <chenqiuliang@baidu.com>

[PHI decoupling] Move MKLDNN code (#48352) · fa051eec

Committed by Sławomir Siwek on Nov 29, 2022

28 Nov, 2022 2 commits
- Add trace mode for interpretercore (#48370) · bb1fffd6
  Committed by zhangbo9674 on Nov 28, 2022
```
* add trace mode for interpretercore

* fix bug

* add a ctrl flag

* add record for memcpyd2h

* polish code

* polish code
```
- Remove kSyncRun in StreamAnalyzer (#48425) · e7d459ac
  Committed by Ruibiao Chen on Nov 28, 2022
```
* Remove kSyncRun in StreamAnalyzer

* Update code
```
26 Nov, 2022 1 commit
- fix jit input var not ready error (#48351) · ab6a3dad
  Committed by Leo Chen on Nov 26, 2022
```
* hot fix

* fix compile

* merge develop

* follow comments
```
25 Nov, 2022 2 commits

[PROFILER] add flops for Profiler (#47766) · 3d1981ad

Committed by Chitsing KUI on Nov 25, 2022

* attr ready

* op ip ready

* start dynamic

* end2end ok

* input shape to map, stat by op

* layer wip

* first version ready

* fix proto depds

* fix profiler deps

* fix flops typo, rm tuple shape

Refactor stream analyzer (#48158) · 889318d8

Committed by Ruibiao Chen on Nov 25, 2022

* Move stream_anayzer to interpreter

* Refactor StreamAnalyzer

* Refactor RunNextInstructionList

* Remove no_data_transform_index

* Fix typos

* Fix data_transfer OpFuncType error

* Add event for depend_op

* Update transfer OpFuncType for heter place

17 Nov, 2022 1 commit
- fix new executor gc dep bug (#48068) · 04dcb9d7
  Committed by hong on Nov 17, 2022
15 Nov, 2022 1 commit

mkldnn directory cleanup (#47779) · 8a339d24

Committed by Sławomir Siwek on Nov 15, 2022

* cleanup unused code

* unify is_int8 is_bfloat16

* Simplify matmul_v2 FWD kernel

* remove RunKernel methods

* remove import namespace

* remove headers

* clean fluid/phi cross imports

* remove fluid axpy_handler

* delete fluid methods

* activations

* OneDNNMemDesc

* MKLDNNFormatForSize

* MatchShapeToLayout

* MKLDNNMemoryFormat

* MKLDNNFormat

* ReorderMKLDNNHandler

* to_void_cast

* review suggestions

* interpolate

* remove fluid dependency

14 Nov, 2022 1 commit
- Do not release memory cache after build_op_func_list in interpretercore (#47910) · 8347354d
  Committed by Ruibiao Chen on Nov 14, 2022
11 Nov, 2022 1 commit

Refine shape op launch method for standalone executor (#47843) · 981d1a10

Committed by zhangbo9674 on Nov 11, 2022

* refine shape op in new_exe

* Revert "refine shape op in new_exe"

This reverts commit 0e0336ddc5eede3da019b348a0bcc0ef0f3be64e.

* refine shape op in new_exe

* refine shape expected_kernel_type

* add SelectedRows check for shape op

* refine code

07 Nov, 2022 1 commit

[Restore PR] Remove hard code of PADDLE_WITH_CUDA (#47630) · 908a381d

Committed by HongyuJia on Nov 07, 2022

* move cudnn hardcode outside GetExpectedKernelType

* add header file

* debug

* update interpreter_util with hardcode

* update interpreter_util headerfile

* solve activation hardcode

* debug with CI

* add mkldnn_op_list header file

* temporarily uncomment mkldnn

* temporarily uncomment mkldnn

* delete sequence_softmax cudnn hardcode

* add hardcode to data_transfer.cc

* update data_transfer headerfile

* try fix segment fault

* update cudnn&miopen_helper

* reset HasAttr of DygraphExctnCtx

* debug, this commit should pass all CI

* debug should pass CI, temporarily disable activation

* debug should pass CI

* fix default_attr=nullptr bug

* clean debug code

* Call SetDnnFallback function in the base class

* activation fallback to plain kernel

* fix default GetExpectedKernelType find wrong kernel

* search cudnn kernel instead of fallback

* fix cudnn_handle bug

* remove tanh use_cudnn

* restore tanh use_cudnn

* debug tanh

* fix tanh bug

* delete activation cudnn kernel

* polish code

03 Nov, 2022 1 commit

Improve performance of coalesce_tensor and depend op in standalone executor (#47606) · 5fb1e824

Committed by Ruibiao Chen on Nov 03, 2022

* Dispatch computation OPs before communication in standalone executor

* Update code

* Fix CI errors

* Improve performance of coalesce_tensor and depend OP in standalone executor

* pre-commit check

02 Nov, 2022 2 commits
- Revert "[Kernel Selection] Remove hard code of PADDLE_WITH_CUDA (#47325)" (#47582) · a57a19ea
  Committed by HongyuJia on Nov 02, 2022
```
This reverts commit f9134045.
```
- Dispatch computation OPs before communication in standalone executor (#47471) · 5ed487bf
  Committed by Ruibiao Chen on Nov 02, 2022
```
* Dispatch computation OPs before communication in standalone executor

* Update code

* Fix CI errors
```
01 Nov, 2022 3 commits

[Kernel Selection] Remove hard code of PADDLE_WITH_CUDA (#47325) · f9134045

Committed by HongyuJia on Nov 01, 2022

* move cudnn hardcode outside GetExpectedKernelType

* add header file

* debug

* update interpreter_util with hardcode

* update interpreter_util headerfile

* solve activation hardcode

* debug with CI

* add mkldnn_op_list header file

* temporarily uncomment mkldnn

* temporarily uncomment mkldnn

* delete sequence_softmax cudnn hardcode

* add hardcode to data_transfer.cc

* update data_transfer headerfile

* try fix segment fault

* update cudnn&miopen_helper

* reset HasAttr of DygraphExctnCtx

* debug, this commit should pass all CI

* debug should pass CI, temporarily disable activation

* debug should pass CI

* fix default_attr=nullptr bug

* clean debug code

[PHI]Standardise some C++ API (Part2) (#47510) · 399047d7
Committed by YuanRisheng on Nov 01, 2022
```
* standard_api

* add hardtanh
```

Support custom stream for standalone executor (#47411) · e12b6c04

Committed by Ruibiao Chen on Nov 01, 2022

* [Auto Parallel] Improve the c++ dist attr

* [Auto Parallel] Modify test_program.py

* Support custom stream for standalone executor
Co-authored-by: Yulong Ao <aoyulong@baidu.com>

31 Oct, 2022 1 commit

[ControlFlow] replace executor in run method of control flow ops with standalone_executor (#45696) · 3b219e5e

Committed by kangguangli on Oct 31, 2022

* replace executor in conditional_block_op.run with standalone_executor

* add block_id as the argument of standalone executor's method run; add print for program

* fix scope bug about conditional block op

* fix bug: unnecessary return of fetch value

* fix typo

* fix: quantization will set variable persistable, and these variables must exist in global scope

* add interpretercore cache for conditional block op but not activate in default

* fix bug: local scope reuse for conditional block op

* reset scope when conditional block op runs

* fix typo

* fix typo and code style

* add build scope for conditional block op

* add skip for transfer_layout kernel

* refine code

* fix reset_scope

* fix reset_scope

* refine code

* refine code

* refine code

1. remove flag use in conditional_block_op
2. pass execution_config to BuildOpFuncList instead of individual parameter

* refine code

* remove the use of FLAGS_control_flow_use_new_executor_cache

* change FLAGS_control_flow_use_new_executor to false

27 Oct, 2022 1 commit

make all cpp tests dynamic linked to libpaddle.so [except windows] (#47088) · 2096448b

Committed by Leo Chen on Oct 27, 2022

* make all cpp tests dynamic linked to libpaddle.so

* add comments

* keep old cc_test for some tests

* fix some ut

* make some ut use cc_test_old

* fix typos and fit for win32

* fix lib path

* fix some tests

* skip lite test

* fit for rocm

* fit for cinn

* fit for mac

* fit for win32

* skip inference ut

* skip  windows

* fix coverage

26 Oct, 2022 2 commits
- fix uninitialized, tautological-constant-out-of-range-compare and... · 076c41ef
  Committed by Wang Xin on Oct 26, 2022
```
fix uninitialized, tautological-constant-out-of-range-compare and literal-conversion warning on macos (#47341)
```
- Remove the declaration of using LoDTensor in framework/lod_tensor.h (Part2) (#46953) · 1cb12ff5
  Committed by Chen Weihang on Oct 25, 2022
```
* remove using lodtensor part2

* resolve code format error

* resolve conflict

* resolve conflict

* replace added framework tensor
```
19 Oct, 2022 1 commit

Support stream overlap for c_allreduce_sum (#47030) · d00b7d83

Committed by Ruibiao Chen on Oct 19, 2022

* Support stream overlap for c_allreduce_sum

* Test CI

* Add notes

* Add SingleStreamGuard for BuildOpFuncList

17 Oct, 2022 1 commit
- [PHI]Modify DataLayout's namespace from paddle::experimental to phi (#46869) · ec749398
  Committed by YuanRisheng on Oct 17, 2022
```
* namespace modify

* update by comment
```
13 Oct, 2022 2 commits

[new-exec] remove variable scope, stage2 (#43936) · 1230a3f4
Committed by Leo Chen on Oct 13, 2022
```
* remove class ScopeBase

* reopen test
```

[Kernel Selection] Remove hard code of PADDLE_WITH_MKLDNN (#46606) · ef1c8759

Committed by HongyuJia on Oct 13, 2022

* remove PADDLE_WITH_MKLDNN, test white_list=abs

* fix unique_ptr

* fix op.Type()

* remove TODO in kernel_dispatch.h

* remove IndicateVarDataType function, update white_list

* remove mkldnn hard code

* add comments

* fix ==

* update mkldnn_op_list

* delete hard code of OPs

* update mkldnn_op_list

* update mkldnn_op_list, remove interp

* add error check for ExecutionContext

* update mkldnn_op_list, remove transpose2_grad

* remove interpolate mkldnn

* remove fill_constant mkldnn

* opt HasAttr in DygraphExecutionContext

* deprecated commit, test mkldnn_white_list

* deprecated commit, test mkldnn_white_list

* deprecated commit, test mkldnn_black_list

* update mkldnn_op_list, add assert error op

* solve cudnn related op

* fix error

* add mkldnn fallback in phi_utils.cc

* remove mkldnn fallback in phi_utils.cc

* opt code implementation

* polish Copyright License

12 Oct, 2022 1 commit
- clean code of interpretercore (#46891) · 5303b66b
  Committed by Leo Chen on Oct 12, 2022
```
* refactor

* refine code
```
11 Oct, 2022 1 commit
- Remove LoDTensor using in fluid (Part 1) (#46663) · 940d8f25
  Committed by Chen Weihang on Oct 11, 2022
```
* remove using lodtensor part1

* polish history code format
```
10 Oct, 2022 2 commits

[PHI]Add RNN yaml (#46812) · ab60fd8b

Committed by YuanRisheng on Oct 10, 2022

* add yaml entry for rnn and rnn_grad, move infershape function for rnn_grad to phi infer meta

* WIP: move rnn kernel to phi

* Change the code generation to avoid converting from initializer list to tuple of heterogeneous types.
This is only triggered when an api has intermediate outputs, and the results of the outputs are of heterogeneous types.

* fix the bug that when none in a vector of tensors requires gradient, the conversion from InferShapeContext to InferMetaContext (a.k.a. BuildInferMetaContext) produces erroneous results.

* fix ci bugs

* fix ci bugs

* fix ci bugs

* modify code according to comment
Co-authored-by: chenfeiyu <chenfeiyu@baidu.com>

reduce time cost on atomic in interpretercore (#46688) · dd3d45de

Committed by Leo Chen on Oct 10, 2022

* reduce time cost on atomic in interpretercore

* clear code of PrepareAtomic in interpretercore

* refine threadpool cache

09 Oct, 2022 1 commit
- interpretercore thread not always spin (#46687) · 2e217dbb
  Committed by zhangbo9674 on Oct 09, 2022
