Commits · b9ee846e463a9b9ea2a67e3af08b52593799e6a3 · 机器未来 / Paddle

05 Apr, 2022 · 2 commits
- Add roi_align yaml and unittest (#41402) · b9ee846e
  Committed by zyfncg on Apr 05, 2022
```
* add roi_align yaml

* fix bug
```
  b9ee846e
- Add nms op and batched_nms api (#40962) · 7554f428
  Committed by RichardWooSJTU on Apr 05, 2022
```
* add nms op and batched_nms api
```
  7554f428
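
For readers unfamiliar with the operation named in the commit above, the following is a minimal pure-Python sketch of greedy non-maximum suppression and a per-category ("batched") variant. It is not Paddle's implementation or API: the function names, the `(x1, y1, x2, y2)` box layout, and the rule that a box is dropped when its IoU with a kept box exceeds the threshold are all assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold):
    """Greedy NMS (illustrative): visit boxes by descending score and keep
    a box only if it does not overlap an already kept box too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


def batched_nms(boxes, scores, category_ids, iou_threshold):
    """Hypothetical per-category NMS: boxes in different categories never
    suppress each other. Returns kept indices sorted by descending score."""
    keep = []
    for category in sorted(set(category_ids)):
        idx = [i for i, c in enumerate(category_ids) if c == category]
        kept = nms([boxes[i] for i in idx], [scores[i] for i in idx], iou_threshold)
        keep.extend(idx[k] for k in kept)
    return sorted(keep, key=lambda i: scores[i], reverse=True)
```

With two heavily overlapping boxes and one distant box, plain `nms` drops the lower-scored overlapping box, while `batched_nms` keeps it if it belongs to a different category.
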
04 Apr, 2022 · 3 commits

Add expand as sigmoid api (#41311) · fa250aa1

Committed by hong on Apr 04, 2022

* update expand and sigmoid with cross entropy

* skip expand as infrt check

* fix sigmoid cross entropy bug

* remove no grad set white list

* remove no grad set

* fix bug

* fix sigmoid error

* fix bug

fa250aa1

Add dropout yaml (#41355) · 1c7001e7

Committed by hong on Apr 04, 2022

* add dropout slice yaml

* remove useless code

* fix infer shape error

* skip infrt compile for dropout

1c7001e7

Add yaml for flatten_contiguous_range OP (#41345) · c5285cc5

Committed by From00 on Apr 04, 2022

* Add yaml for flatten_contiguous_range OP

* update

* Fix typos
Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

c5285cc5

02 Apr, 2022 · 2 commits

[Infrt] skip grad kernel in infrt frame (#41315) · 2a01a157
Committed by huzhiqiang on Apr 02, 2022
```
* code

* code
```
2a01a157

Enhance vjp/jvp/Jacobian/Hessian API for supporting dynamic, static graph and... · 9e764d82

Committed by Xiaoxu Chen on Apr 02, 2022

Enhance vjp/jvp/Jacobian/Hessian API for supporting dynamic, static graph and batched, unbatched mode (#40692)

* modify vjp/jvp for both dynamic and static graph

* enforce jacobian class for supporting first/last batch

* add unittest for jvp, jacobian with last batch, jacobian with first batch

* fix the incorrect shape when multi-index Jacobian

* enforce Hessian class for supporting dynamic graph

* add Hessian class unittest

* bugfix, jvp double_backward_trick zeros_like return stop_gradient=True in static graph

* add API beta warnings

* add white_list for cuda11.x ci windows.

* optimize some code snippets and documents

* set unittest timeout to 100 seconds

* move vjp,jvp,Jacobian,Hessian to incubate

* fix vjp,vjp import path of sample code

* fix code style error of autograd/__init__ file

9e764d82
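
To make the vjp/jvp/Jacobian terminology in the commit above concrete, here is a minimal pure-Python sketch. It is not the Paddle API touched by that commit: every function name is hypothetical, and the Jacobian is approximated with forward finite differences rather than automatic differentiation.

```python
def example_fn(x):
    """Illustrative function R^2 -> R^2: (x0 * x1, x0 + x1)."""
    return [x[0] * x[1], x[0] + x[1]]


def jacobian(f, x, eps=1e-6):
    """Approximate J[i][j] = d f_i / d x_j by forward differences."""
    y0 = f(x)
    columns = []
    for j in range(len(x)):
        shifted = list(x)
        shifted[j] += eps
        yj = f(shifted)
        columns.append([(a - b) / eps for a, b in zip(yj, y0)])
    # columns[j][i] holds d f_i / d x_j; transpose into row-major J.
    return [[columns[j][i] for j in range(len(x))] for i in range(len(y0))]


def jvp(J, v):
    """Jacobian-vector product J @ v (forward mode: pushes an input tangent)."""
    return [sum(Jij * vj for Jij, vj in zip(row, v)) for row in J]


def vjp(J, u):
    """Vector-Jacobian product u^T @ J (reverse mode: pulls back an output cotangent)."""
    return [sum(J[i][j] * u[i] for i in range(len(J))) for j in range(len(J[0]))]
```

For `example_fn` at `[2.0, 3.0]` the exact Jacobian is `[[3, 2], [1, 1]]`, so `jvp` with `v = [1, 0]` selects its first column and `vjp` with `u = [1, 0]` selects its first row, up to finite-difference error.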

01 Apr, 2022 · 1 commit

Add nll_loss yaml (#41126) · 8e032db8

Committed by zyfncg on Apr 01, 2022

* add nll_loss yaml

* fix nll loss

* fix nll loss bug

* fix bug

* fix bug

* fix infrt problem
Co-authored-by: xiongkun <xiongkun03@baidu.com>

8e032db8

31 Mar, 2022 · 1 commit

heter & multi-cloud brpc communication (#40965) · 2f41f389

Committed by ziyoujiyi on Mar 31, 2022

* back fl

* delete ssl cert

* .

* make warning

* .

* unittest paral degree

* solve unittest

* heter & multi cloud comm ready

* .

* .

2f41f389

30 Mar, 2022 · 2 commits
- [Infrt] add infer shape cache for kernel. (#41104) · 60c4c9cd
  Committed by 王明冬 on Mar 30, 2022
  
  60c4c9cd
- [Infrt] add skip method for inferShape codegen (#41014) · 1840349a
  Committed by huzhiqiang on Mar 30, 2022
  
  1840349a
29 Mar, 2022 · 1 commit
- [Infrt] delete custom_pdop.td and move op to infrt dialect. (#41021) · 9bb3744f
  Committed by 王明冬 on Mar 29, 2022
  
  9bb3744f
28 Mar, 2022 · 2 commits
- [infrt] move graph op from pd dialect to infrt dialect. (#41003) · bf93050c
  Committed by 王明冬 on Mar 28, 2022
  
  bf93050c
- [Phi] Fix assign kernel bug (#40927) · 822a2d1f
  Committed by Chen Weihang on Mar 28, 2022
```
* fix assign kernel bug

* fix xpu kernel select error

* add cuda pinned place

* fix copy error

* fix infrt error
```
  822a2d1f
27 Mar, 2022 · 1 commit

Add StringTensor (#39830) · 0695e1ac

Committed by Jack Zhou on Mar 27, 2022

* add string tensor and case convert kernels

* Add strings empty kernel; Reorganize the structure of case convert kernel

* Add string infermeta

* Update mutable_data of string tensor

* rename kernel name

* add string copy tmp

* Fix strings copy device bug

* add utf8 gpu converter

* add string tensor c++ api

* Remove mutable_data of string tensor

* update string tensor interface

* remove charcases_flag.h

* remove some fluid headers

* Add make_ddim

* __HIPCC__ -> PADDLE_WITH_HIP

* remove fluid headers

* fix cpu compile

* remove std::hash

* Fix cudaMalloc

* Remove strings/impl directory

* Fix infrt/get_phi_kernel_info.py;Add custom_kernels deps

* Add empty kernel test

* Remove some comments

* Modify lower/upper api encoding type: string->bool

* STRING->PSTRING; Add CreateInferLikeMeta

* Add code gen for C++ String API

* remove strings_api_utils.h

* Add ignore file (strings_api.h, strings_api.cc)

* update strings gen script

* change args order of case convert kernels

* Add comments for pstring, StringTensor

* cpstring_internal.h -> cpstring_impl.h

* Update according to comments:

1. Remove fluid headers
2. paddle::platform::errors -> phi::errors
3. Use 'place.GetType() == phi::AllocationType::GPU' instead of 'paddle::platform::is_cpu_space()'
4. Use camel code style

* Remove all singletons in strings kernels

* fix rocm compile

* Fix py3 compile

* Fix c++ coverage

* 1. Add pstring proto type
2. Add StringTensor debug info
3. Rename case_convert_kernel to strings_lower_upper
4. Remove serialize deserialize strings kernel

* DataLayout::PSTRING -> DataLayout::PSTRING_UNION

* Register pstring data type

* Fix strings api gen

* Fix dense tensor register pstring dtype

* Fix error messages

* remove line

* add pstring unittest

* remove test string api unittest

* remove empty line

* Remove some headers to decrease the size of executable file

0695e1ac

25 Mar, 2022 · 3 commits

infrt update phi gpu register. (#40866) · 5f6038ff
Committed by Wilber on Mar 25, 2022
```
* update register every make.

* fix

* update
```
5f6038ff

[Phi] Migrate Adam and AdamW into Phi (#40351) · 56cd3407

Committed by Aurelius84 on Mar 25, 2022

* [Phi] Migrate Adam and Adamw into Phi

* fix compile error and unittest ok

* fix compile error and unittest ok

* fix undefined reference to fLI::FLAGS

* test depend on operator

* fix cmake

* fix xpu compile

* fix infrt

* fix amp_type_traits

* fix amp_type_traits

* modify according reviewer

* modify according reviewer

* fix dtype float16

* fix typo

* fix Cmake

* fix code style

56cd3407

[infrt] add phi_dt.create_inited_dense_tensor.cpu.f32 kernel. (#40902) · 65478332
Committed by 王明冬 on Mar 25, 2022

65478332

24 Mar, 2022 · 4 commits

[AMP] Support amp for Intermediate_dygraph (#40623) · c12f7d48

Committed by zhangbo9674 on Mar 24, 2022

* approve amp for intermediate_dygraph

* add amp_utils for intermediate_dygraph

* add amp needcast check for mlu & npu

* test unittest

* add SetGradNode for set_stop_gradient && add checktensor for GradientHooks

* refine code

* refine unittest of imperative_amp for new dygraph

* inplace api skip amp

* add test_imperative_qat_amp for intermediate amp

* refine code

* refine test_amp ci strategy

* refine unittest code

* refine amp_utils code

* refine amp getpromotetype for some special op

* refine unittest code

c12f7d48

the `defaults` in FullArgSpec may be `None` (#40882) · 99541895
Committed by Ren Wei (任卫) on Mar 24, 2022

99541895
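
The situation named in the commit title above can be reproduced with the standard library alone: `inspect.getfullargspec` reports `defaults` as `None`, not an empty tuple, when a function has no default arguments. The helper below is a hypothetical sketch of a `None`-safe way to pair argument names with defaults; it is not the code changed by that commit.

```python
import inspect


def no_defaults(a, b):
    return a + b


def with_defaults(a, b=2, c=3):
    return a + b + c


def default_map(func):
    """Map trailing positional argument names to their default values.

    `spec.defaults` is None when the function declares no defaults, so it
    is normalised to an empty tuple before taking its length.
    """
    spec = inspect.getfullargspec(func)
    defaults = spec.defaults or ()
    names = spec.args[len(spec.args) - len(defaults):]
    return dict(zip(names, defaults))
```

Calling `len(spec.defaults)` directly would raise `TypeError` for `no_defaults`; normalising with `or ()` avoids that without special-casing the caller.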
[Infrt] add method for automatically scanning pass and kernel info (#40822) · f51a5791
Committed by huzhiqiang on Mar 24, 2022

f51a5791
[Infrt] upgrade kernel launcher fun generator (#40826) · 7fa3a724
Committed by huzhiqiang on Mar 24, 2022

7fa3a724

23 Mar, 2022 · 2 commits
- [infrt] add ir support for phi kernel batch_norm_infer. (#40755) · c751e405
  Committed by 王明冬 on Mar 23, 2022
  
  c751e405
- Removed redundant use of declarations.h (#40703) · 2a1b4c07
  Committed by Zhanlue Yang on Mar 23, 2022
```
* Removed redundant use of declarations.h

* Fixed minor bug
```
  2a1b4c07
21 Mar, 2022 · 1 commit
- add the map for dense tensor, test=develop (#40665) · b77e20ac
  Committed by 石晓伟 on Mar 21, 2022
  
  b77e20ac
18 Mar, 2022 · 4 commits
- update infrt script (#40670) · 50fad3ed
  Committed by huzhiqiang on Mar 18, 2022
  
  50fad3ed
- set +x to close showing command, update check_change code with linux (#40456) · 161d27dc
  Committed by Sing_chan on Mar 18, 2022
  
  161d27dc
- support register with attr (#40564) · 755a6c53
  Committed by Wilber on Mar 18, 2022
```
* support register with attr

* add infrt_with_gpu macro
```
  755a6c53
- [infrt] rename pd dialect from mlir to infrt. (#40651) · ef4ef154
  Committed by 王明冬 on Mar 18, 2022
```
* [infrt] rename pd dialect from mlir to infrt. test=develop

* [infrt] fix the kernel signature generator bug.
```
  ef4ef154
17 Mar, 2022 · 2 commits
- [infrt] move pd_ops.td to pd folder. test=develop (#40613) · 4c01763c
  Committed by 王明冬 on Mar 17, 2022
  
  4c01763c
- [infrt] move pd dialect position. test=develop (#40616) · 3a256637
  Committed by 王明冬 on Mar 17, 2022
  
  3a256637
15 Mar, 2022 · 2 commits

Skip infrt when checking log fatal (#40529) · c9f3ad03

Committed by Chen Weihang on Mar 15, 2022

* skip infrt when checking log fatal, test=document_fix

* remove test=document_fix

* update commit

c9f3ad03

[Phi]Move Tanh/BRelu/LeakyRelu/ThresholdedRelu Kernels to Phi (#40385) · d7112180

Committed by YuanRisheng on Mar 15, 2022

* move activation op

* adjust code format

* fix compile bugs

* fix ci bugs

* code format adjust

* code format adjust2

* activate ci status

* modify according to comment

* move activation kernel

* revert relu6

* reduce add code

* perfect use_phi_functor

* completing func name

* fix bugs when run ci

* fix bugs when run infrt

* modify infrt get kernel signature

d7112180

14 Mar, 2022 · 1 commit
- [infrt] add skip list (#40450) · 95a526b2
  Committed by huzhiqiang on Mar 13, 2022
  
  95a526b2
10 Mar, 2022 · 1 commit

Add trt execute (#40224) · e72ef603

Committed by Shang Zhizhou on Mar 10, 2022

* add trt.execute

* merge trt.engine type

* update return op

* update comments

* fix style

* fix style

e72ef603

09 Mar, 2022 · 2 commits

[Infrt]Update kernel dialect (#40141) · 767647ce
Committed by huzhiqiang on Mar 09, 2022

767647ce

build documents if public apis modified, meanwhile their samplecodes should be tested (#39728) · 041c4bca

Committed by Ren Wei (任卫) on Mar 09, 2022

* run document_preview when samplecodes be tested

* run document_preview when samplecodes be tested

* sphinx-build symbol link; and build-doc default

* FLUIDDOCDIR typo

* download the required configurations and some other scripts

* install required python packages.

* clone specified branch of docs repo, and if failed, clone the default branch

* clean workspace for docs repo

* use the conf.py imported by https://github.com/PaddlePaddle/docs/pull/4222/

* download and install the boscmd

* Optimize the code comments.

* specify the pypi index server

* only do doc-build when running in cpu mode

* pull docs pr

git log

paddle_pr_info

* install jq

* force using sphinx-build under py3.7

* using our new domain name for preview

* install python package error

* don't build doc default

041c4bca

07 Mar, 2022 · 2 commits

[infrt] fold the infrt.cvtTensorOp. test=develop (#40214) · b798fb07
Committed by 王明冬 on Mar 07, 2022

b798fb07

cuBlasLt Epilogue To Fuse Linear + ReLU|GeLU (#39437) · 2a3d9eca

Committed by Ming-Xu Huang on Mar 07, 2022

* Added cuBlasLtHandle_t to device context.

* Added fused_gemm_epilogue op.

1. Added fused_gemm_epilogue op to leverage cuBlasLt Epilogue.
2. Support fusion Act(X*Y + bias), X's dims >= 2 and Y's dims should be 2.
3. Act currently only supports ReLU. (Will add GeLU in the future).

* Added UT to fused_gemm_epilogue op.

* Added LinearAct Pattern

1. Added LinearAct into graph_pattern_detector.* to define (2.)'s
pattern.
2. LinearAct is used to detect act(element_add(matmul_v2(x, w), bias)).
3. act currently only support ReLU (Will support GeLU in the future).

* Added FuseGemmEpiloguePass

1, Added FuseGemmEpiloguePass to handle nn.Linear + Act{ReLU}
fusion (GeLU will be supported in the future).
2. Only support matmul_v2 from nn.Linear.

* Added pybind to BuildStrategy.fuse_gemm_epilogue_.

* Added UT for fuse_gemm_epilogue_pass.

* GeLU support and EpilogueSingleton

1. Added GeLU support to fused_gemm_epilogue op.
2. Added EpilogueSingleton to cache auxiliary pointer.
3. Added related UTs.

* Rename cublaslt_epilogue_op to gemm_epilogue_op.*.

* Added both train and infer pattern to LinearAct.

1. Added support of fwd graph with grad_ops linking to LinearAct.
2. Added related changes to fuse_gemm_epilogue_pass for above
modification.

* Changed CUDA requirement from 11.4 to 11.6 for fuse_gemm_epilogue_pass.

* Added identity activation support to gemm_epilogue_op.

* Added Linear Fusion (matmul_v2 + ele_add)

1. Added matmul_v2 + ele_add pattern to LinearActPattern.
2. Added matmul_v2 + ele_add support to fuse_gemm_epilogue_pass.

* Rename gemm_epilogue_op.* to fused_gemm_epilogue_op.*

* Add fused_gemm_epilogue_grad op.

1. Added fused_gemm_epilogue_grad to support backward epilogue fusion.

* Add UTs to fused_gemm_epilogue_grad_op.

* Change attribute name in fused_gemm_epilogue_grad_op for clearing.

* Allow DX and DBias be dispensable to fused_gemm_epilogue_grad op.

* Added ElementwiseAdd+Matmul+Act graph pattern detection.

* Fuse backward of Linear( Act(x))

1. Added backward fusion pass to Linear( Act(x)).
2. Added backward fusion pass to Linear(x).

* Added UTs to backward fusion of Linear(Act(x)).

* Complete document of arguments to fused_gemm_epilogue_op.

* Made arguments of some functions pass by reference.

* Modify code with review comments.

1. Made arguments of some function pass by reference.
2. Removed redundant code.
3. Followed Google code style to change code.

* Made 'const' code style be consistent

* Fixed random seed of python UTs.

* Set Compiling constrains to cuBlasLt

1. Require CUDA 11.6+
2. Remove fuse_gemm_epilogue related tests when CUDA < 11.6.

* Code Review from Paddle

1. Changed arguments name is_first_gemm to without_x_gradient for
clearing.
2. Applied PADDLE_THROW in fused_gemm_epilogue_op.

* Remove EpilogueSingleton

1. Applied ReserveSpace to replace Epilogue for passing auxiliary
pointers between FWD and BWD.

* Fix a logical error and enhance UTs.

1. Added act op count checking in UTs.
2. Fix issue to fuse backward of ReLU(Linear(X)).
3. TODO: solve GELU fusion issues.

* Fix Linear and GeLU fusion issues.

1. Modified graph_detech_pattern to fit with both linear with gelu or
relu.
2. Modified data range in Uts to allow negative values.

* Removed fused_gemm_epilogue_op.h.

* Rename namespace pten to phi.

* Rename name of arguments in fused_gemm_epilogue_op

1. bias -> Bias.
2. out -> Out.
3. reserve_space -> ReserveSpace.

* Change EpiloguePassActivationCache as local variable.

1. Removed singleton in EpiloguePassActivationCache.
2. Made EpiloguePassActivationCache as an argument to each pass
functions.

2a3d9eca
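
The commit above describes fusing `Act(X*Y + bias)` into a single op, with ReLU, GeLU, and identity as activations. As an unfused reference for what that expression computes, here is a pure-Python sketch. It is not the `fused_gemm_epilogue` op or any cuBlasLt call: the function names are hypothetical, inputs are plain nested lists, and the erf-based GeLU is an assumption, since the log does not say which GeLU variant the op uses.

```python
import math


def relu(v):
    return max(0.0, v)


def gelu(v):
    # erf-based GeLU; assumed here, the commit log does not specify the variant.
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def linear_act(x, y, bias, act=None):
    """Reference for Act(X*Y + bias).

    x: M x K matrix, y: K x N matrix, bias: length-N vector.
    act=None means identity, i.e. a plain Linear without activation.
    """
    out = []
    for x_row in x:
        row = []
        for j in range(len(bias)):
            s = sum(x_row[k] * y[k][j] for k in range(len(y))) + bias[j]
            row.append(act(s) if act is not None else s)
        out.append(row)
    return out
```

A fused kernel would produce the same values while applying the bias and activation in the GEMM epilogue instead of in separate passes over the output.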

04 Mar, 2022 · 1 commit
- [infrt] add ir for convert pd dialect to phi dialect. test=develop (#40104) · 3ac9bc95
  Committed by 王明冬 on Mar 04, 2022
  
  3ac9bc95
