Commits · 96bcf2dfaba6145bbe4e7a58ffe7137b9946872f · 机器未来 / Paddle

18 Jan, 2022 · 2 commits
- change CUDA implementaion of uniform/gaussian OP (#38611) · bbbd75e4
  Committed by zhouweiwei2014 on Jan 18, 2022
```
* change CUDA implementaion of uniform/gaussian OP

* fix unittest
```
- Speedup FP16 Gelu op using fast math and vectorized 8 kernel (#38980) · 8c20d668
  Committed by sneaxiy on Jan 18, 2022
```
* speedup gelu using fast math

* add bwd part
```
30 Dec, 2021 · 1 commit

flags to choose kp kernel (#38455) · ed2cfecf

Committed by Feng Xing on Dec 30, 2021

This PR adds the runtime flag `run_kp_kernel`, which selects which op implementation runs on xpu2. There are two options: the dynamically linked one and the one built from kp.

20 Dec, 2021 · 1 commit
- [MLU]add mlu backend (#38207) · 76514a1f
  Committed by fwenguang on Dec 20, 2021
03 Nov, 2021 · 1 commit

Add FLAGS_allow_cinn_ops & FLAGS_deny_cinn_ops for controlling op types used in training with CINN. (#36842) · 2479664a

Committed by Zhen Wang on Nov 03, 2021

* Update UT test_parallel_executor_run_cinn.py.

* Add FLAGS_allow_cinn_ops & FLAGS_deny_cinn_ops & FLAGS_cinn_ops_delim.

* Use the custom StringSplit function and remove the FLAGS_cinn_ops_delim flag.

* Add FlagController test.

* Apply lock to the cache_ only in CinnCompiler.

* Add VizGraph & ReadableKey method for CinnCompiler.

* Update the dot style of VizGraph in CinnCompiler.
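
As a rough illustration of how an allow list and a deny list of op types might interact, the sketch below parses two delimiter-separated strings and decides whether a given op type is eligible. The function names, the `;` delimiter, and the precedence rule (a non-empty allow list is authoritative, otherwise the deny list applies) are assumptions made for this example. It is not Paddle's or CINN's actual implementation.

```python
def split_ops(value, delim=";"):
    """Split a delimiter-separated flag value into a set of op type names.

    The ';' default is an assumption for illustration only.
    """
    return {item.strip() for item in value.split(delim) if item.strip()}


def is_op_allowed(op_type, allow_ops="", deny_ops=""):
    """Return True if op_type passes the hypothetical allow/deny filter.

    Assumed rule: when the allow list is non-empty, only listed ops pass;
    otherwise every op passes unless it appears in the deny list.
    """
    allow = split_ops(allow_ops)
    deny = split_ops(deny_ops)
    if allow:
        return op_type in allow
    return op_type not in deny


if __name__ == "__main__":
    print(is_op_allowed("relu", allow_ops="relu;elementwise_add"))
    print(is_op_allowed("mul", allow_ops="relu;elementwise_add"))
    print(is_op_allowed("mul", deny_ops="mul"))
```

Keeping the parsing in one small helper means a custom split function is the only place that knows about the delimiter, so no separate delimiter flag is needed.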

01 Nov, 2021 · 1 commit

Paddle Tensor Operation Library initial implementation (#34425) · b9fdd3bc

Committed by Chen Weihang on Nov 01, 2021

* initial tensor design & sign kernel demo

* add move constructor for meta & add lodtensor

* add dirs & sign xpu kernel

* add mean cpu&cuda kernel impl

* move sign & mean xpu & npu kernel

* add selected_rows basic impl

* refactor design, BaseTensor to DenseTensor, etc.

* add scale mkldnn kernel

* polish xpu & npu impl details

* fix mkldnn reuse compile failed

* change tensor operation lib name

* rename util filename

* add more comments

* change TensorImplInterface to TensorInterface

* add kernel key and factory

* remove MKLDNNTensorMeta, add MKLDNNDenseTensor

* change XXDeviceContext to XXContext

* add base kernel registrar utils & test on sign

* replace boost::any by paddle::any

* fix several ci failed

* fix npu compile error

* add ordered map util

* fix multiple ordered_map compile errors

* move dev into include dir

* support sign op in static op run

* fix static op run error

* fix new executor compile failed

* add dygraph branch & remove sign_op.h

* fix test_infer_no_need_buffer_slots

* fix rocm compile link error

* fix unitybuild error & clear glog

* fix npu compile failed

* skip quant trans test

* fix part windows compile problem

* fix xpu enforce error

* fix inference test failed

* remove ordered_map to solve quant failed

* fix part of rcom compile faild

* add more register kernels

* revert scale kernel temporarily

* fix code format error

* add new kernel registrar marco

* rename top to tcmpt

* revert xpu, npu, mkldnn impl & remove op def

* add kernel args parse functor to auto parse args

* revert some change & add scale kernels

* add op proto in dygraph kernelcontext building

* polish kernel dispatch logic & nameing rule

* fix scale kernel match error

* fix scale test failed

* add mean API and unittest

* test mean api success

* add branch to solve compiled error

* skip clang format error

* add mean skip rule in op_library

* add dot kernel, api and unittest (#6)

* remove old kernel and add symbol link

* fix dot compiled failed

* add merco for module declare

* fix npu and xpu compile error

* revert sign, mean, scale, dot kernel removing

* add comment for keeping old kernel impl

* fix mutable_data error

* fix bfloat16 conflit

* fix inference undef error

* adapt to msvc compile rules

* polish comment for template inst

* add cmake template instantiation for win

* fix backend to place device id bug

* fix ifdef error

* Op2functor (#7)

* add kernel args maker class

* make args maker non-const

* remove debug log

* modify codes by review options

* split constructPrKernelContext function

* fix output name bug

* fix test_mean_op test_sign_op failed

* fill_any_like kernel refactor (#10)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* skip dtype for fill_any_like

* add attrs for kernel key constrcut

* add use_pt_kernel Flags to control whether to use pt kernel (#13)

* add use_pt_kernel Flags to control whether to use pt kernel

* change the default value to true for cheking pt kernels

* fix mutable_data cuda place error

* move high level apis into hapi

* remove selectedrows adapting temporarily

* Support Scalar in Tensor Compute Library (#14)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* remove mkldnn tensor & polish details

* use flat_hash_map and small_vector in kernel factory

* Refactor flatten kernel (#12)

* refactor flatten kernel

* update infershape function

* fix compile bugs

* fix bugs when merge

* fix compiler bugs

* fix bugs when run test_flatten_api

* fix bugs when run test

* Revert "use flat_hash_map and small_vector in kernel factory"

This reverts commit 23091495cfdd3df8cc1be592d30f09ea66a7c72b.

* Move cpu, cuda and other device code into kernels (#15)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* start refactor matmul

* move cpu, cuda and other device modules into kernels

* merge code

* polish code in operator.cc

* Perfect unitests (#16)

* perfect unittest

* update license

* replace with flat_hash_map, small_vector (#19)

* fix small_vector build error on windows platform

* replace with flat_hash_map, small_vector

* remove todo

* Perfect unitests (#20)

* perfect unittest

* update license

* fix bug when run tcmpt_utils_test

* refactor execution adapting impl

* fix insert conflit

* Fix CI bug of test_yolov3 (#21)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* start refactor matmul

* move cpu, cuda and other device modules into kernels

* merge code

* polish code in operator.cc

* Fix CI bug of test_yolov3

* add the tensor base class, test=develop (#17)

* update the tensor base class, test=develop

* remove two funcs, test=develop

* update the error msg, test=develop
Co-authored-by: Chen Weihang <chenweihang@baidu.com>

* [no-verify] commit backend and tensor signature changes

* Rename tcmpt to pten (#23)

* rename tcmpt to pten

* update omitted files for rename to pten

* update omitted file for rename to pten

* remove k of all enum var

* remove kernel_instantiate (#26)

* remove symbols and spatial_tensor

* change common to functions

* readd share tensor impl methods

* add a candidate dense tensor class, test=develop (#28)

* change all Pt to Pten

* resolve conflit with xiaowei

* Op2functor opt1 (#27)

* replace to small vector and change to const &

* add std::move
Co-authored-by: Chen Weihang <chenweihang@baidu.com>

* polish kernel factory and kernel registry

* fix operator test error msg mismatch

* remove tensor signature and backend set member

* move scalar and polish enforce

* revert dtype layout change to fix error

* fix enum operator override error

* add several base unittests

* add pten utils tests

* polish some details

* Dev/op2func refactor 3 (#30)

* add a candidate dense tensor class, test=develop

* remove TensorBase::backend(), test=develop

* remove some ops, test=develop

* cherry-pick the pr of tensor meta, test=develop

* moves the dense tensor and some ops, test=develop

* update the linalg operator, test=develop

* update other operators, test=develop

* fix errors, test=develop

* fix bugs, test=develop

* try to resolve the problem of windows ci, test=develop

* updates codes, test=develop

* fix the tensor_utils.cc, test=develop

* modify the dense tensor, test=develop

* fix the data type, test=develop
Co-authored-by: shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

* polish some details

* polish kernel signature details

* fix a bug about offsets of the tensor, test=develop (#31)
Co-authored-by: shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

* polish some details
Co-authored-by: chentianyu03 <ctychentianyu@gmail.com>
Co-authored-by: zyfncg <1370305206@qq.com>
Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>

24 Oct, 2021 · 1 commit
- Add the macro `-DPADDLE_WITH_CINN`. (#36660) · e2173b68
  Committed by Zhen Wang on Oct 24, 2021
11 Oct, 2021 · 2 commits

Add FLAGS_allreduce_record_one_event to remove event waiting number (#36263) · 7b45a46e
Committed by Zeng Jinle on Oct 11, 2021
```
* add FLAGS_allreduce_record_one_event

* add more comments

* fix ut

* improve coverage

* fix ut, improve coverage
```

Add use_cinn Flag and RunFromCinn in PE (#36107) · 5690666c

Committed by Huihuang Zheng on Oct 11, 2021

Add the use_cinn flag and use it to control whether PaddlePaddle runs using CINN.

Also add:

* A pass that replaces the PaddlePaddle graph with a CINN graph
* A PE method to feed data and run the graph by CINN
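
Paddle flags can generally be supplied through environment variables named `FLAGS_<name>`. The sketch below shows one way a boolean switch of that kind could be read. The helper `read_bool_flag`, the variable name `FLAGS_use_cinn`, and the set of accepted spellings are assumptions for illustration. The code does not call Paddle and does not reflect how Paddle itself parses flags.

```python
import os

# Spellings treated as "enabled"; this set is an assumption for the example.
_TRUE_VALUES = {"1", "true", "yes", "on"}


def read_bool_flag(name, default=False, environ=None):
    """Read a boolean-style flag from an environment mapping.

    `environ` defaults to os.environ; passing a dict keeps the example
    testable without touching the real process environment.
    """
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUE_VALUES


if __name__ == "__main__":
    fake_env = {"FLAGS_use_cinn": "true"}  # hypothetical variable name
    if read_bool_flag("FLAGS_use_cinn", environ=fake_env):
        print("would hand the graph to the CINN path")
    else:
        print("would run the regular executor path")
```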

30 Sep, 2021 · 1 commit
- add slotrecord datafeed (#36099) · 0a3dbe8a
  Committed by yaoxuefeng on Sep 30, 2021
29 Sep, 2021 · 2 commits
- add slot record dataset (#36200) · 79bd5f90
  Committed by yaoxuefeng on Sep 29, 2021
- [NPU] mod for model bert (#36165) · 7bddf2e8
  Committed by Aganlengzi on Sep 29, 2021
```
* merge conflict of paddle_gtest_main.cc

* modify FLAGS_npu_precision_mode and default not to call aclSetCompileopt
```
17 Sep, 2021 · 1 commit

Make flag adding easier (#35823) · 2c781455

Committed by Zeng Jinle on Sep 17, 2021

* make flag setter easier

* update

* rename macro name

* fix bug of public/writable

* update to pass CI

* polish

* fix CPU link error

08 Sep, 2021 · 1 commit

Enable program passes on Fleet APIs (#34955) · 5f369881

Committed by Zeng Jinle on Sep 08, 2021

* add fleet api for program pass

* turn on apply pass for CI test

* fix disable fuse_all_optimizer bug

* try to test ci

* fix CI

* fill unspecified op role

* fix fuse_allreduce

* add ut to improve coverage

* remove useless change

* improve c++ coverage

* follow some comments

* test ir pass pipeline

* update doc

* reduce ut time again

06 Sep, 2021 · 1 commit
- Revert hccl check nan (#35438) · c3ad7775
  Committed by Yuang Liu on Sep 06, 2021
02 Sep, 2021 · 1 commit
- [npu] add update_loss_scaling npu min value (#35270) · 280d7421
  Committed by Baibaifan on Sep 02, 2021
24 Aug, 2021 · 1 commit
- Add flags to control whether to check Nan value of hccl_allreduce_sum. (#35093) · 5b737834
  Committed by gongweibao on Aug 24, 2021
13 Aug, 2021 · 1 commit
- add retry for gethostbyname (#34855) · e92f0388
  Committed by Baibaifan on Aug 13, 2021
30 Jul, 2021 · 1 commit
- [NPU] support npu config on aclinit (#34500) · 6c09496a
  Committed by Leo Chen on Jul 30, 2021
30 Apr, 2021 · 1 commit
- add flag to check_kernel launch (#32692) · 109fdf14
  Committed by XiangGao on Apr 30, 2021
09 Apr, 2021 · 1 commit

[NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d

Committed by Leo Chen on Apr 09, 2021

* [feature] support npu allocator (#30840)

[feature] support npu allocator

* [feature] support npu operator (#30951)

[feature] support npu operator

* [feature] support npu allocator, part 2 (#30972)

* support npu allocator

* add npu device context

* fix some compile problem

* fix some compile problem

* add npu info

* compile ok

* fix include dir

* support naive_best_fit_allocator

* run ut ok, bug failed to exit

* call aclrtResetDevice before exit

* fix aclFinilize

* add system allocatot test

* add selected_gpus in gtest

* add tensor_test for npu

* support npu op, initial commit

* add npu stream

* add elementwise_add_op

* compile ok

* fix typo

* fix elementwise_add_op_npu_test

* support op run

* test can run but failed

* change aclopExecuteV2 to aclopCompileAndExecute

* support parsing ascend rank table file (#31000)

support parsing ascend rank table file

* Fix reshape on GE graph. (#31084)

Fix reshape on GE graph

* add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)

* add npu sub op

* fix typo

* rename test

* fix bug

* fix bug

* add fp16 kernel

* fix typo

* support sub grad op

* support elementwise_sub_grad op
Co-authored-by: frankwhzhang <frankwhzhang@126.com>

* Fix compilation problem (#31100)

Fix compilation problem (#31100)

* fix compile

* fix code stype

* remove const_cast

* support adding correct npu op in pybind.h (#31143)

* support adding correct npu op in pybind.h

* refine code

* [NPU] Support executor with NPU (#31057)

* [NPU] Support executor with NPU

* Fix code according to reviews

* Fix code

* Add unittest for sub op npu

* refactor npu device manager (#31154)

refactor npu device manager (#31154)

* fix selected npus

* fix compile

* fix reading flags from env

* format
Co-authored-by: xiayanming <41795079@qq.com>
Co-authored-by: gongweibao <weibao.gong@gmail.com>
Co-authored-by: frankwhzhang <frankwhzhang@126.com>
Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>

29 Mar, 2021 · 1 commit
- [ROCM] added a cudnn switch of conv2d for rocm platform (#31836) · 123949eb
  Committed by ronnywang on Mar 29, 2021
22 Feb, 2021 · 1 commit
- [ROCM] update fluid platform for rocm39 (part4), test=develop (#30936) · 33429630
  Committed by Qi Li on Feb 22, 2021
24 Dec, 2020 · 1 commit
- call_statck is turned on default when ON_INFER=ON (#29798) · 2c0a4a34
  Committed by Wilber on Dec 24, 2020
25 Nov, 2020 · 1 commit
- Hide the C++ stack by default and add hints (#29042) · fea0e294
  Committed by Chen Weihang on Nov 25, 2020
```
* default not show cpp statck & add hint

* fix failed unittest

* fix failed unittests
```
28 Sep, 2020 · 1 commit
- Add support for mkldnn ops types selection with FLAGS in dygraph (#27482) · 0ecf441a
  Committed by arlesniak on Sep 28, 2020
```
* Add support for mkldnn ops types selection with FLAGS in dygraph

* use regex to match DNNL verbose

* python3 encoding fix
```
21 Sep, 2020 · 1 commit

[Feature] Enhance inplace addto strategy for gradient accumulation in static graph (#27112) · aba759ba

Committed by Leo Chen on Sep 21, 2020

* support use add instead of sum to do gradient accumulation

* add inplace addto pass

* add grad_add op and inplace addto pass

* remove debug code

* code refine

* fix bug when sereral sum ops inserts at same op_idx

* fix Flags type

* add addto attribute for conv3d

* fix ut

* code clean

* fix type

28 Aug, 2020 · 1 commit

Update the demo code and the doc of varbase.backward. (#26506) · f9066e6a

Committed by Zhen Wang on Aug 28, 2020

* update the demo code and the doc of varbase.backward.

* update the doc of the fake interface `paddle.fluid.Variable`.

* remove BackwardStrategy.

07 Aug, 2020 · 1 commit
- Add flags to control call stack of error message (#25997) · 751305ec
  Committed by Leo Chen on Aug 07, 2020
```
* add flags_call_stack_level

* update

* refine code
```
28 Jul, 2020 · 1 commit

Added DNNL cache management for DyGraph (#25624) · e52df3b1

Committed by arlesniak on Jul 28, 2020

* Added DNNL cache management for DyGraph

* move FLAGS_use_mkldnn to more general CMakeLists, getu use of the flag in ClearGradients

* missing file

* Fixes after review

* Bringing back original idea of place for 'use_mkldnn' flag to be accessible from platform nad imperative.

* Removed duplicate and added docs

* Fixes for CI

21 Apr, 2020 · 1 commit

New feature: thread local allocator, test=develop (#23989) · d2584a70

Committed by 石晓伟 on Apr 21, 2020

* add the thread_local_allocator, test=develop

* refactor the thread_local_allocator, test=develop

* provides option setting strategy, test=develop

15 Apr, 2020 · 1 commit

Correct the wrong name in the flag comment (#22977) · c2a60bb1

Committed by guofei on Apr 15, 2020

Correct the name [`FLAGS_sync_nccl_allreduce`](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/flags/others_cn.html#flags-sync-nccl-allreduce) based on the information from our official website.

04 Mar, 2020 · 1 commit

Add flags to limit gpu memory (#22793) · d41d802b

Committed by Zeng Jinle on Mar 04, 2020

* add recorded cuda memory apis, fix typo, test=develop

* add more ut, test=develop

* follow comments, test=develop

* fix py35 incompatible issues, test=develop

08 Jan, 2020 · 1 commit
- fix allocator strategy comment, test=develop, test=document_fix (#22121) · 4c2df8e4
  Committed by Zeng Jinle on Jan 08, 2020
06 Jan, 2020 · 2 commits
- polish allocator strategy doc, test=develop, test=document_fix (#22095) · 95872494
  Committed by Zeng Jinle on Jan 06, 2020
- ag allocator by default, test=develop (#21837) · d9f5d1eb
  Committed by Zeng Jinle on Jan 06, 2020
20 Oct, 2019 · 1 commit
- test=develop, add communicator_is_sgd_optimizer flag (#20677) · 95e90aa1
  Committed by 123malin on Oct 20, 2019
```
* test=develop, communicator_is_sgd_optimizer flags
```
11 Oct, 2019 · 1 commit
- refine allocator_flag, test=develop, test=document_fix (#20400) · 1d1d221f
  Committed by Zeng Jinle on Oct 11, 2019
23 Sep, 2019 · 1 commit
- Delete local execution scopes (#19749) · d7251a8e
  Committed by chengduo on Sep 23, 2019
```
* Add RecordHistoryLocalExecScopes
test=develop
```
18 Sep, 2019 · 1 commit
- remove some flags and add comments to some flags, test=develop (#19813) · 13ca364c
  Committed by Zeng Jinle on Sep 18, 2019

机器未来 / Paddle is in sync with the fork's source project