提交 · e12a905e9999b87315f27ca99ade38f5fdf22a74 · PaddlePaddle / Paddle

28 9月, 2022 2 次提交

Remove the declaration of using Tensor in framework/tensor.h (#46432) · e12a905e

由 Chen Weihang 提交于 9月 28, 2022

* remove needless using tensor

* remove needless using tensor

* resolve conflict

* replace tensor using

* fix format error

* revert needless changing

* fix rocm and npu compile error

* fix cinn compile error

* fix format error

* fix mkldnn format error

* fix mkldnn format error

* fix cinn compile error

* fix cinn compile error

* fix cinn compile error

* resolve conflict

e12a905e

L

remove const qualifier in function return (#46546) · 8c5b9cf8
由 Leo Chen 提交于 9月 28, 2022

8c5b9cf8

22 9月, 2022 1 次提交
- L
  
  convert grad_merge_all_reduce in graph to program (#46353) · 0a144ca1
  由 Leo Chen 提交于 9月 22, 2022
  
  0a144ca1
19 9月, 2022 1 次提交

Fix wrong eigen header include (#46082) · 59a2a987

由 zyfncg 提交于 9月 19, 2022

* fix wrong eigen header include

* fix complie bug

* fix nan_inf_utils_detail

* fix resource_manager

* fix conv_miopen_helper

59a2a987

09 9月, 2022 1 次提交

[new-exe] convert fused_all_reduce_op_handle to program (#45774) · e755c07e

由 Leo Chen 提交于 9月 09, 2022

* add operator<< for BuildStrategy

* add fake_coalesce

* fit allreduce mode for new_exe

* remove dubeg code

* follow comments

e755c07e

06 9月, 2022 1 次提交
- H
  [jit] pe engine with mkldnn (#45728) · 0a7e6f90
  由 Hui Zhang 提交于 9月 06, 2022
```
* using mkldnn

* using with mkldnn macro

* fix use mkldnn
```
  0a7e6f90
01 9月, 2022 1 次提交
- L
  remove circular dependency of device_context and allocator (#45455) · 934171ae
  由 Leo Chen 提交于 9月 01, 2022
```
* refine cmake of framework

* add deps for dense tensor

* fix deps

* remove alloc(ctx)

* add depends on mkldnn
```
  934171ae
31 8月, 2022 1 次提交
- H
  add del dropout op pass to jit pe enigne (#45439) · 46bc06b5
  由 Hui Zhang 提交于 8月 31, 2022
```
* add del dropout op pass to jit pe enigne

* add delete dropout test
```
  46bc06b5
30 8月, 2022 1 次提交

Remove extra attribute in OpMaker (#44310) · fe321f9a

由 zyfncg 提交于 8月 30, 2022

* add runtime config in phi

* add runtime attr for op desc and op

* fix no proto error

* adjust opdesc set_attr impl

* try to remove conv_op extra attrs

* add init runtime attr map

* change extra header path

* fix runtime_attr

* fix trace_op

* fix bug of pass

* fix merge conflict

* fix dygraph attrs

* fix bug of pass

* fix dygraph bug

* fix unittest module

* delete extra attr default

* fix dropout kernel

* polish code

* fix extra output of instance_norm

* fix merge confilct

* fix op_desc bug

* add extra attr in yaml for conv3d_transpose

* don't remove extra input and output

* fix save_inference_model

* fix bug of batch_norm

* revert some change

* polish log

* polish code

* add code comment
Co-authored-by: NChen Weihang <chenweihang@baidu.com>

fe321f9a

25 8月, 2022 1 次提交
- D
  update brpc version to 1.2.0 (#45351) · 9b5b005e
  由 danleifeng 提交于 8月 25, 2022
```
* update brpc version;test=develop
```
  9b5b005e
18 8月, 2022 1 次提交

change to async mode for xpu multi-card training in static graph mode, test=kunlun (#45024) · 41bdf41d

由 zhangxiaoci 提交于 8月 18, 2022

* change to async mode for xpu multi-card training in static graph mode

* minor bugfix

* irrelevant. move to another pr

* move change to other pr

* fix stream issue

* fix 'stream not meet with current context' error

* fix branch diverge, test=kunlun

41bdf41d

08 8月, 2022 1 次提交
- L
  clean includes of tensor.h (#44928) · ee9ea48d
  由 Leo Chen 提交于 8月 08, 2022
```
* clean tensor.h

* fix gather_nd
```
  ee9ea48d
02 8月, 2022 1 次提交
- L
  
  fix namespace of GPUContext (#44822) · 65f38869
  由 Leo Chen 提交于 8月 02, 2022
  
  65f38869
01 8月, 2022 1 次提交

unify gpu context (#44740) · 86763023

由 Leo Chen 提交于 8月 01, 2022

* remove cudaDeviceContext

* remove more template

* fix rocm compile

* remove alias name CUDADeviceContext

* fix compile

* fix tests

* revert changes

86763023

29 7月, 2022 1 次提交
- L
  unify fluid::CUDADeviceContext and phi::GpuContext (#44723) · 88490567
  由 Leo Chen 提交于 7月 29, 2022
```
* remove cudaDeviceContext

* remove more template

* fix rocm compile
```
  88490567
26 7月, 2022 1 次提交
- R
  Set more attrs in ReplaceScaleLossGradOp (#44576) · ab198b45
  由 Ruibiao Chen 提交于 7月 26, 2022
```
* Set more attrs in ReplaceScaleLossGradOp

* Fix typos

* Fix CI errors

* Add UT
```
  ab198b45
19 7月, 2022 1 次提交
- R
  Rename BOOST_GET macros (#44368) · 4b085c57
  由 Ruibiao Chen 提交于 7月 19, 2022
```
* Rename BOOST_GET macros

* Fix conflicts
```
  4b085c57
15 7月, 2022 1 次提交
- R
  
  Remove boost library (#44092) · d2e59e15
  由 Ruibiao Chen 提交于 7月 15, 2022
  
  d2e59e15
02 7月, 2022 1 次提交

unify cpu context, part2 (#44012) · 755438a7

由 Leo Chen 提交于 7月 02, 2022

* fix init()

* delete test_device_context

* replace CPUDeviceContext with CPUContext

* fix test_scalar

* remove dot_op.cc

* fix compile

755438a7

30 6月, 2022 1 次提交
- R
  Remove boost::variant for FetchResultType (#43932) · f720e231
  由 Ruibiao Chen 提交于 6月 30, 2022
```
* Remove boost::variant for FetchResultType

* Fix pybind errors
```
  f720e231
28 6月, 2022 1 次提交

Remove boost::variant (#43100) · b3cf28f8

由 Ruibiao Chen 提交于 6月 28, 2022

* boost::variant -> paddle::variant

* boost::variant.apply_visit -> paddle::visit

* Update pybind_boost_hraders.h

* Fix CINN compilation errors

* Revert FetchResultType

b3cf28f8

26 6月, 2022 1 次提交
- S
  
  format all files in fluid using new config (#43776) · 576236a0
  由 Sing_chan 提交于 6月 26, 2022
  
  576236a0
10 6月, 2022 1 次提交
- R
  Refactor DeviceContextPool (#42901) · 114723c9
  由 Ruibiao Chen 提交于 6月 10, 2022
```
* Refactor DeviceContextPool

* Adjust header file order
```
  114723c9
05 6月, 2022 1 次提交
- S
  
  【code format check upgrade】 step2：clang-format (#42840) · a3730dc8
  由 Sing_chan 提交于 6月 05, 2022
  
  a3730dc8
04 6月, 2022 1 次提交
- S
  
  【code format check upgrade】 step2：cmake-format (#43057) · 92568edb
  由 Sing_chan 提交于 6月 04, 2022
  
  92568edb
02 6月, 2022 1 次提交

Support CUDA Graph for partial graph in dygraph mode (#42786) · d05b940a

由 sneaxiy 提交于 6月 02, 2022

* support CUDAGraph for partial graph

* add ut

* fix ci

* fix ut again because of eager mode

* fix kunlun ci

* fix win ci

d05b940a

17 5月, 2022 1 次提交
- C
  [Eager] Add nan and inf check utils (#42763) · a51c492c
  由 Chen Weihang 提交于 5月 17, 2022
```
* add nan_inf_utils for eager

* support check nan and inf

* add unittest for coverage
```
  a51c492c
12 5月, 2022 1 次提交
- S
  
  Fix some typos in paddle/. (#42408) · 2012672c
  由 Shuangchi He 提交于 5月 12, 2022
  
  2012672c
07 4月, 2022 1 次提交
- L
  Profile Executors (#41100) · dfb47986
  由 liutiexing 提交于 4月 07, 2022
```
* Profile Executors

* update

* fix ut

* fix names

* update

* update
```
  dfb47986
30 3月, 2022 1 次提交
- L
  
  fix bug that some op has no op_role attr (#41040) · 040d3386
  由 Leo Chen 提交于 3月 30, 2022
  
  040d3386
07 3月, 2022 1 次提交

cuBlasLt Epilogue To Fuse Linear + ReLU|GeLU (#39437) · 2a3d9eca

由 Ming-Xu Huang 提交于 3月 07, 2022

* Added cuBlasLtHandle_t to device context.

* Added fused_gemm_epilogue op.

1. Added fused_gemm_epilogue op to leverage cuBlastLt Epilogue.
2. Support fusion Act(X*Y + bias), X'dims >=2 and Y'dims shoule be 2.
2. Act currently only be supported ReLU. (Will add GeLU in the future).

* Added UT to fused_gemm_epilogue op.

* Added LinearAct Pattern

1. Added LinearAct into graph_pattern_detector.* to define (2.)'s
pattern.
2. LinearAct is used to detect act(element_add(matmul_v2(x, w), bias)).
3. act currently only support ReLU (Will support GeLU in the future).

* Added FuseGemmEpiloguePass

1, Added FuseGemmEpiloguePass to handle nn.Linear + Act{ReLU}
fusion (GeLU will be supported in the future).
2. Only support matmul_v2 from nn.Linear.

* Added pybind to BuildStrageter.fuse_gemm_epilogue_.

* Added UT for fuse_gemm_epilogue_pass.

* GeLU support and EpilogueSingleton

1. Added GeLU support to fused_gemm_epilogue op.
2. Added EpilogueSingleton to cache auxiliary pointer.
3. Added related UTs.

* Rename cublaslt_epilogue_opto gemm_epilogue_op.*.

* Added both train and infer pattern to LinearAct.

1. Added support of fwd graph with grap_ops linking to LinearAct.
2. Added related changes to fuse_gemm_epilogue_pass for above
modification.

* Changed CUDA requirement from 11.4 to 11.6 for fuse_gemm_epilogue_pass.

* Added identity activation support to gemm_epilogue_op.

* Added Linear Fusion (matmul_v2 + ele_add)

1. Added matmul_v2 + ele_add pattern to LinearActPattern.
2. Added matmul_v2 + ele_add support to fuse_gemm_epilogue_pass.

* Rename gemm_epilogue_op.* to fused_gemm_epilogue_op.*

* Add fused_gemm_epilogue_grad op.

1. Added fused_gemm_epilogue_grad to support backward epilogue fusion.

* Add UTs to fused_gemm_epilogue_grad_op.

* Change attribute name in fused_gemm_epilogue_grad_op for clearing.

* Allow DX and DBias be dispensable to fused_gemm_epilogue_grad op.

* Added ElementwiseAdd+Matmul+Act graph pattern detection.

* Fuse backward of Linear( Act(x))

1. Added backward fusion pass to Linear( Act(x)).
2. Added backward fusion pass to Linear(x).

* Added UTs to backward fusion of Linear(Act(x)).

* Complete document of arguments to fused_gemm_epilogue_op.

* Made arguments of some functions pass by reference.

* Modify code with review comments.

1. Made arguments of some function pass by reference.
2. Removed redundant code.
3. Followed Google code style to change code.

* Made 'const' code style be consistent

* Fixed random seed of python UTs.

* Set Compiling constrains to cuBlasLt

1. Require CUDA 11.6+
2. Remove fuse_gemm_epilogue related tests when CUDA < 11.6.

* Code Reivew from Paddle

1. Changed arguments name is_first_gemm to without_x_gradient for
clearing.
2. Applied PADDLE_THROW in fused_gemm_epilogue_op.

* Remove EpilogueSingleton

1. Applied ReserveSpace to replace Epilogue for passing auxiliary
pointers between FWD and BWD.

* Fix a logical error and enhance UTs.

1. Added act op count checking in UTs.
2. Fix issue to fuse backward or ReLU(Linear(X)).
3. TODO: solve GELU fusion issues.

* Fix Linear and GeLU fusion issues.

1. Modified graph_detech_pattern to fit with both linear wiht gelu or
relu.
2. Modified data range in Uts to allow negative values.

* Removed fused_gemm_epilogue_op.h.

* Rename namespace pten to phi.

* Rename name of arguments in fused_gemm_epilogue_op

1. bias -> Bias.
2. out -> Out.
3. reserve_space -> ReserveSpace.

* Change EpiloguePassActivationCache as local variable.

1. Removed singleton in EpiloguePassActivationCache.
2. Made EpiloguePassActivationCache as an argument to each pass
functions.

2a3d9eca

28 2月, 2022 1 次提交

Update host tracer (#39975) · 406f1b96

由 liutiexing 提交于 2月 28, 2022

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* update HostTracer

* fix

* update

* update
Co-authored-by: Nliutiexing <liutiexing@google.com>

406f1b96

23 2月, 2022 1 次提交

Update record interface using part3 (#39695) · 1fcaab45

由 chenjian 提交于 2月 23, 2022

* fix RecordEvent interface

* modify default level to 4

* update interface use

* add const default trace level

* update record event interface using

* update record event interface using

* update record event interface using

* update operator.cc

* update part2

* update part1

* update part3

* fix include profiler.h header in ps server

* fix include profiler.h header in ps server

* fix profiler.h header

* fix profiler.h header

* fix merge buf

* update

* fix bug

* fix bug

1fcaab45

21 2月, 2022 1 次提交

Update record interface using part2 (#39694) · c984cd85

由 chenjian 提交于 2月 21, 2022

* fix RecordEvent interface

* modify default level to 4

* update interface use

* add const default trace level

* update record event interface using

* update record event interface using

* update operator.cc

* update part2

* update part1

* fix include profiler.h header in ps server

* fix include profiler.h header in ps server

* fix profiler.h header

c984cd85

20 2月, 2022 1 次提交

[PTen->Phi PR1] Change pten dirname and namespace to phi (#39748) · dcfe1986

由 Chen Weihang 提交于 2月 20, 2022

* rename pten dir to phi

* rename namespace to phi

* rename infrt pten dir to phi

* resolve conflict

* rename pten to phi in cmake

* revert all infrt change

* change needed files

* fix infrt failed

* fix inference failed

dcfe1986

19 2月, 2022 2 次提交

[Pten]Unify paddle/pten::framework::ddim into pten::ddim (#39614) · 2fe04264

由 Aurelius84 提交于 2月 19, 2022

* Unify paddle/pten::framework::ddim into pten::ddim

* fix paddle namespace

* compile sucessfully

* fix npu src file

* fix conflict

* fix conflict

* fix tensorrt compiler error

* fix conflict

* fix conflict

* fix tesst file conflict

* fix conflict

* fix mlu file conflict

* fix mlu file conflict

* fix cinn header file conflict

* fix conflict

* fix conflict

* fix conflict

* fix conflict

2fe04264

Update record interface using part1 (#39693) · eec6ef81

由 chenjian 提交于 2月 19, 2022

* fix RecordEvent interface

* modify default level to 4

* update interface use

* add const default trace level

* update record event interface using

* update operator.cc

* update part1

* fix include profiler.h header in ps server

* fix include profiler.h header in ps server

eec6ef81

15 2月, 2022 1 次提交

[PTen]Migrate proto::VarType outside of Pten (#39411) · 7e7e9404

由 Aurelius84 提交于 2月 15, 2022

* #1 migrate dist-related type()-> dtype()

* move datatype function from pten -> fluid/framework

* change type() in imperative into convert(dtype())

* modify xx_tensor->type into xx_tensor->dtype

* change the set_type interface and the caller

* modify xx_tensor.type into xx_tensor.dtype

* fix mutable_data(place, dtype())

* change caller of mutable_data in pten and distributed

* change the caller of mutable_data in fluid/framework

* change the caller of mutable_data in imperative directory

* mutable_data: inference

* update the call of mutable_data

* transfer MakePenScalarArray MakePtenScalar ResetHolderWithType

* pass the compile. the next step is remove VarType in Pten

* fix all and remove VarType from pten. success in linux. Next task is other platform

* fix conflict with develop

* fix compiled error

* Fix reset conversion

* fix conflict

* fix compiled problem

* fix typo

* Fix << in tensor_utils.cc

* fix type->dtype

* fix unittest

* fix tensor init constructor

* fix DataTypeSize for BFloat16

* fix code style

* fix npu compiled error

* fix npu

* compile npu sucessfully

* fix conflict

* fix conflict
Co-authored-by: Nxiongkun <xiongkun03@baidu.com>

7e7e9404

10 2月, 2022 1 次提交

share MemOptVarInfos of external variables into cinn_launch subgraph (#39209) · 35b03e1c

由 TeFeng Chen 提交于 2月 10, 2022

* add a graph pass to share MemOptVarInfos of external variables into subgraph

* update pass name

* fix compile failed

* add share_mem_opt_info_to_subgraph_pass test

* share_mem_opt_info_to_subgraph_pass_test pass

* modify some codes for better style and more robust

* update cmake

35b03e1c

02 2月, 2022 1 次提交
- J
  
  Merge legacy to fluid (#39318) · 34cce62f
  由 Jiabin Yang 提交于 2月 02, 2022
  
  34cce62f

PaddlePaddle / Paddle 大约 1 年 前同步成功

PaddlePaddle / Paddle
大约 1 年前同步成功