提交 · a88791481484ab6a61540a737336d79c65d021dc · PaddlePaddle / Paddle

14 1月, 2022 1 次提交

[MLU]Add mean and reduce_mean op (#38872) · 7f8d5bc8

由 qipengh 提交于 1月 14, 2022

* [MLU]: add mean and reduce mean op

* [MLU]add mlu pytest dir in CMakeLists.txt

* [MLU]fix tensor data

* [MLU]fix TensorToPyArray and license

7f8d5bc8

13 1月, 2022 1 次提交
- 石
  
  splits allocation for pten, test=develop (#38853) · 277cf900
  由石晓伟提交于 1月 13, 2022
  
  277cf900
10 1月, 2022 2 次提交

T

1.fix elementwise_add_grad bug. 2. add dropout kernel in kl2 (#38726) · 7b860a23
由 taixiurong 提交于 1月 10, 2022

7b860a23

[Unify Tensors PR #5] framework::Tensor inherits from DenseTensor,test=allcases (#38632) · 5c73a6ea

由 Zhanlue Yang 提交于 1月 10, 2022

* Added shared_ptr<Allocation> member & corresponding interfaces to Storage

* Removed original pten::Allocation from Storage and adjusted the interfaces accordingly

* Fixed issues with storage offset

* Used place to malloc allocation for TensorStorage

* [Unify Tensors PR #3]Ported framework::Tensor interfaces to pten::DenseTensor

* Fixed issues with place

* Added comments

* Moved mutable_data with stream argument to DenseTensor

* Added set_offset interface

* Fixed CI issues,test=allcases

* [Unify Tensors PR #4] Port LoDTensor interfaces to DenseTensor

* Removed friend class EigenTensor/EigenMatrix/EigenVector from Tensor

* Modified framework::Tensor to inherit from DenseTensor

* Reverted changes too pten_layout() interface

* Removed friend classes

* Rearranged cfunction calls from tensor.data<void>() to tensor.data()

* Fixed CI issues

* Fixed lite issues

* Fixed data() interface issues,test=allcases

* Resolved IsInitialized() issues

* Fixed ResetHolder() issues

* Fixed MKLDNN & Storage issues

* Resolved ShareBufferWith() issues

* Fixed LoD issues

5c73a6ea

04 1月, 2022 1 次提交

[Unify Tensors PR #3]Port framework::Tensor members & interfaces to... · dfdc9960

由 Zhanlue Yang 提交于 1月 04, 2022

[Unify Tensors PR #3]Port framework::Tensor members & interfaces to pten::DenseTensor, test=allcases (#38473)

* Added shared_ptr<Allocation> member & corresponding interfaces to Storage

* Removed original pten::Allocation from Storage and adjusted the interfaces accordingly

* Fixed issues with storage offset

* Used place to malloc allocation for TensorStorage

* [Unify Tensors PR #3]Ported framework::Tensor interfaces to pten::DenseTensor

* Fixed issues with place

* Added comments

* Moved mutable_data with stream argument to DenseTensor

* Added set_offset interface

* Fixed CI issues,test=allcases

* [Unify Tensors PR #4] Port LoDTensor interfaces to DenseTensor

* Reverted changes too pten_layout() interface

* Removed friend classes

dfdc9960

31 12月, 2021 1 次提交

[MLU]support calling mlu op from python interface (#38292) · b6bf650a

由 fwenguang 提交于 12月 31, 2021

* [MLU]support calling mlu op from python interface

* [MLU]fix

* fix

* [mlu]fix mlu_places

* [mlu]fix required mlu

* fix

* [MLU]fix tensor copy

* [mlu] fix MLUPlace call path

b6bf650a

20 12月, 2021 1 次提交
- F
  
  [MLU]add mlu backend (#38207) · 76514a1f
  由 fwenguang 提交于 12月 20, 2021
  
  76514a1f
09 12月, 2021 1 次提交
- J
  
  add ipu device p2 (#37840) · cb636a48
  由 jianghaicheng 提交于 12月 09, 2021
  
  cb636a48
17 11月, 2021 1 次提交
- W
  
  [npu][hybrid] support offload (#37224) · 762819a8
  由 WangXi 提交于 11月 17, 2021
  
  762819a8
20 10月, 2021 1 次提交

Add FasterTokenizer Operator (#34491) · 3f2d6a3f

由 Steffy-zxf 提交于 10月 20, 2021

Add Tokenizer related functionalities for Transformer model in order that the process of training and predicting is consistent.

* support the text string as an input Tensor
* support the "VOCAB"unordered_map<wstring, int> as an input Tensor to lookup tokens
* Tokenizer used for BERT. This tokenizer applies an end-to-end, text string to wordpiece tokenization.
* It first applies basic tokenization, followed by wordpiece tokenization.

3f2d6a3f

28 9月, 2021 1 次提交
- S
  
  dlpack fix (#35817) · 74ff59cf
  由 Siming Dai 提交于 9月 28, 2021
  
  74ff59cf
29 7月, 2021 1 次提交
- L
  
  [NPU] Avoid cpu tensor freed before copying to npu completed (#34475) · d71b9ba7
  由 Leo Chen 提交于 7月 29, 2021
  
  d71b9ba7
13 7月, 2021 1 次提交
- J
  Added printing tensor's format if oneDNN is used (#34021) · d9d45ba7
  由 jakpiase 提交于 7月 13, 2021
```
* added printing tensor's format

* added suggested changes
```
  d9d45ba7
24 6月, 2021 1 次提交

[NPU] support dygraph execution on npu place(#33579) · 6aea6be2

由 houj04 提交于 6月 24, 2021

* in NPU environment, use CPUPlace for missing operators.

* in NPU environment, use CPUPlace for missing operators.

* fix TensorCopy bug and add unit test.

* fix code style.

* add more unit tests.

6aea6be2

21 6月, 2021 1 次提交

[NPU] flatten params and grads, fuse grad_clip and optimizer op (#33461) · c269a160

由 Leo Chen 提交于 6月 21, 2021

* enable npu alignment

* support flatten_params/grads

* support clip by global norm

* remove memset in coalesce_tensor_op

* fix npu kernel of sum op when input is one tensor

* add ut for flatten_param_grads+regularizer

* fix ut

* fix typo

c269a160

01 6月, 2021 1 次提交

replace and remove complex64/128 types in custom OP and other files (#33195) · 06c63ca0

由 chentianyu03 提交于 6月 01, 2021

* replace and remove complex64/128 types in custom OP and other files

* fix custom_tensor_test fail bug

* fix custom_conj_test fail bug

* fix dispatch_test_op build fail bug

06c63ca0

12 5月, 2021 1 次提交
- L
  
  [NPU] Support npu pinned allocator and manage Tensor on NPUPinnedPlace (#32840) · 6b3bb796
  由 liym27 提交于 5月 12, 2021
  
  6b3bb796
19 4月, 2021 1 次提交

[NPU] cherry-pick gc/dataloader/save&load/optimization from ascendrc to develop (#32294) · cbe5c9f8

由 Leo Chen 提交于 4月 19, 2021

* [NPU] support GarbageCollector for npu (#31874)

* support GarbageCollector for npu

* fix typo

* fix gather_grad

* disable NPUDefaultStreamGarbageCollector on NPU

* [NPU] support npu for memcpy op (#31808)

* support npu for memcpy op

* add ut

* fix ut

* fix typo

* 【NPU】fix bug of using temp vector (#31963)

* fix bug when beta1_pow on cpu (#31995)

* [NPU] support npu profiler (#31684)

* support npu profiler

* add python api

* fix bugs

* add wrapper for incomplete type

* update profile proto

* record npu wait

* add xpu placeholder

* fix adam (#32016)

* [NPU] enable async copy and  add wait before sync operation (#31956)

* enable async copy and  add wait before sync operation

* remove unneccessary wait

* add FillNpuTensorWithConstant

* refine

* fix fill_constant

* make TensorFromVector/TensorToVector sync

* [NPU] Support dataloader on npu place. (#31867)

* [NPU] Wait on NPUPlace (#32086)

* [NPU] fix cast op (#32121)

* fix npu kernel of cast op to handle casting to same dtype

* add comments

* [NPU] support cann 20.3 (#32044)

* fix compile problem on cann 20.3

* fix ut

* fix test_mul

* fix check_finite_and_scale

* fix lookup_table_v2_grad

* fix cmake

* support print op

* [NPU] Support npu save load (#31893)

* support save load for NPU

* add save load npu unittest

* support np.array transform in NPU

* fix errors

* delete dygraph in unittest

* add Wait

* fix unittest

* fix review comment

* fix unittest problem

* fix little problem

* change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance (#32196)

* change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performace

* refine code

* fix NPUDeviceContext in all c++ unittest (#32198)

* fix NPUDeviceContext in all c++ unittest

* refine log
Co-authored-by: Npangyoki <pangyoki@126.com>

* [NPU] Remove TensorFromVector and avoid sync copy in npu op kernel for better performance (#31994)

* enable async copy and  add wait before sync operation

* remove unneccessary wait

* add FillNpuTensorWithConstant

* refine

* fix fill_constant

* change TensorFromVector to FillNpuTensorWithConstant

* fix ignored api

* delete extra unittest

* fix little error

* fix update_loss_scaling_op_npu and check_finite_and_unscale_op_npu

* change TensorCopySync to TensorCopy

* delete useless Wait and add StreamWait

* fix npu_stream error

* fix check_finite_and_unscale_op_npu TensorCopy

* only save stream wait

* fix NPUDeviceContext in all c++ unittest

* delete wait
Co-authored-by: Nzhiqiu <chenqiuliang@baidu.com>

* delete useless unittest file (#32206)

* Fix op test (#32231)

* fix conditional block (#32243)

* fix adam bug again (#32246)

* fix compile

* fix ut

* fix ut
Co-authored-by: Nliym27 <33742067+liym27@users.noreply.github.com>
Co-authored-by: Npangyoki <pangyoki@126.com>

cbe5c9f8

09 4月, 2021 1 次提交

[NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d

由 Leo Chen 提交于 4月 09, 2021

* [feature] support npu allocator (#30840)

[feature] support npu allocator

* [feature] support npu operator (#30951)

[feature] support npu operator

* [feature] support npu allocator, part 2 (#30972)

* support npu allocator

* add npu device context

* fix some compile problem

* fix some compile problem

* add npu info

* compile ok

* fix include dir

* support naive_best_fit_allocator

* run ut ok, bug failed to exit

* call aclrtResetDevice before exit

* fix aclFinilize

* add system allocatot test

* add selected_gpus in gtest

* add tensor_test for npu

* support npu op, initial commit

* add npu stream

* add elementwise_add_op

* compile ok

* fix typo

* fix elementwise_add_op_npu_test

* support op run

* test can run but failed

* change aclopExecuteV2 to aclopCompileAndExecute

* support parsing ascend rank table file (#31000)

support parsing ascend rank table file

* Fix reshape on GE graph. (#31084)

Fix reshape on GE graph

* add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)

* add npu sub op

* fix typo

* rename test

* fix bug

* fix bug

* add fp16 kernel

* fix typo

* support sub grad op

* support elementwise_sub_grad op
Co-authored-by: Nfrankwhzhang <frankwhzhang@126.com>

* Fix compilation problem (#31100)

Fix compilation problem (#31100)

* fix compile

* fix code stype

* remove const_cast

* support adding correct npu op in pybind.h (#31143)

* support adding correct npu op in pybind.h

* refine code

* [NPU] Support executor with NPU (#31057)

* [NPU] Support executor with NPU

* Fix code according to reviews

* Fix code

* Add unittest for sub op npu

* refactor npu device manager (#31154)

refactor npu device manager (#31154)

* fix selected npus

* fix compile

* fix reading flags from env

* format
Co-authored-by: Nxiayanming <41795079@qq.com>
Co-authored-by: Ngongweibao <weibao.gong@gmail.com>
Co-authored-by: Nfrankwhzhang <frankwhzhang@126.com>
Co-authored-by: Nliym27 <33742067+liym27@users.noreply.github.com>

ccf5709d

26 2月, 2021 1 次提交
- Q
  
  [ROCM] update fluid framework for rocm (part6), test=develop (#31015) · 28b356b9
  由 Qi Li 提交于 2月 26, 2021
  
  28b356b9
22 12月, 2020 1 次提交
- J
  [oneDNN] Tensor copy fix to oneDNN tensors (#29771) · 7b33720c
  由 Jacek Czaja 提交于 12月 22, 2020
```
* - Tensor copy fix to oneDNN tensors

* - Fixes after review
```
  7b33720c
04 12月, 2020 1 次提交

Support type promote for basic math ops (quantum required) (#29265) · 9ad800eb

由 Chen Weihang 提交于 12月 04, 2020

* basic impl of type promote

* add comment & another testcase

* fix complex bugs & support python op promote type

* fix failed unittests & polish code

* add unittest for coverage

* change to only promote complex type

* polish code details

* polish several comments

9ad800eb

01 12月, 2020 1 次提交

add complex64 and complex128 type; add +-*/@ and slice opreator for c… (#29199) · 8f45d142

由 chentianyu03 提交于 12月 01, 2020

* add complex64 and complex128 type; add +-*/@ and slice opreator for complex types

* add test cases for complex elementwise, matmul and getitem unittest

* add test cases for complex types

* add test cases for complex matmul unittest

8f45d142

15 10月, 2020 1 次提交
- Z
  
  fix bug of tensor copy of CUDAPinnedPlace (#27966) · 2ac6c6c3
  由 Zhou Wei 提交于 10月 15, 2020
  
  2ac6c6c3
13 10月, 2020 1 次提交

Refine the format of printing tensor (#27673) · 049696bf

由 Leo Chen 提交于 10月 13, 2020

* add sumary feature

* refine printting tensor

* add sci_mode

* add sample code

* fix indent error

* fix _format_item

* polish code

* support item indent

* add ut

* set place for ut

* fix py2 issue

* fix ut

049696bf

09 10月, 2020 1 次提交
- J
  - Fix to 27398 (#27770) · 631c1f30
  由 Jacek Czaja 提交于 10月 09, 2020
```
test=develop

- compilation fix

test=develop
```
  631c1f30
24 9月, 2020 1 次提交

use iwyu clean include (#27267) · df43905f

由 wanghuancoder 提交于 9月 24, 2020

* use iwyu clean include, test=develop, test=win

* compilation error, test=develop

* fix compilation error2, test=develop

* fix compilation error3, test=develop

* fix compilation error4, test=develop

* fix compilation error5, test=develop

* fix compilation error6, test=develop

* fix compilation error7, test=develop

* fix compilation error8, test=develop

* fix compilation error8, test=develop

* fix compilation error10, test=develop

* fix compilation error11, test=develop

df43905f

16 9月, 2020 1 次提交
- C
  Polish framework error message part 7 (#27266) · 4f9d6529
  由 Chen Weihang 提交于 9月 16, 2020
```
* polish framework error message part 7

* fix typo

* polish by reviewes comment
```
  4f9d6529
27 8月, 2020 1 次提交
- Z
  fix bug that can't print int8_t (#26712) · 8071d230
  由 Zhou Wei 提交于 8月 27, 2020
```
fix bug that can't print int8_t 
```
  8071d230
24 8月, 2020 1 次提交
- J
  Add isfinite v2 op (#26344) · 199b0c7c
  由 Jack Zhou 提交于 8月 24, 2020
```
add the isnan, isfinite, isinf api for the paddle 2.0
```
  199b0c7c
21 8月, 2020 1 次提交

support Baidu Kunlun AI Accelerator (#25959) · 138ecf24

由 QingshuChen 提交于 8月 21, 2020

* support Baidu AI Accelerator
  * test=kunlun

* minor
 * test=kunlun

* support xpu op in separate file
 * test=kunlun

* update XPU error message and remove duplicated code

 * test=kunlun

* minor
 * test=kunlun

* minor
 * test=kunlun

138ecf24

15 8月, 2020 1 次提交

expose and unify the Tensor concepts to the user (#25978) · 6de463d3

由 Zhou Wei 提交于 8月 15, 2020

* expose and unify the Tensor concepts to the user

* expose tensor to user

* add copy place for Tensor

* add copy place for Tensor

* add note

* add macro PADDLE_WITH_CUDA

* remove RUN_TYPE=DIST

* fix some error

6de463d3

29 7月, 2020 1 次提交

Simplify BufferedReader to improve DataLoader performance (#25648) · 1b3081b1

由 Chen Weihang 提交于 7月 29, 2020

* simplify buffered reader to improve DataLoader performance

* fix 22 failed unittests

* fix cuda pinned context condition

* fix test_reader_reset failed

* fix two failed unittests

* change unittest place

* polish error messaage

* polish cast op GetExpecctedKernelType

* remove debug info in unittest

1b3081b1

11 5月, 2020 1 次提交

Add macro BOOST_GET to enrich the error information of boost :: get (#24175) · aa0f254f

由 Chen Weihang 提交于 5月 11, 2020

* add new macro BOOST_GET_SAFELY & unittests, test=develop

* add different macro type, test=develop

* fix get macro type in executor, test=develop

* four macro part change backup

* using one macro for all case, test=develop

* revert attribute change, test=develop

* change to three func to solve gcc4.8 bug, test=develop

* polish some details, test=develop

aa0f254f

27 4月, 2020 2 次提交

[dy2static] Add print transformer and unify print format (#24068) · 9b851ba2

由 Chen Weihang 提交于 4月 27, 2020

* add print transformer & unify print format, test=develop

* remove using of dygraph_to_static_func, test=develop

* remove python stdout capture, test=develop

* fix compatibility problems for PY2, test=develop

* fix detail error, test=develop

* fix type analysis bug, test=develop

* fix print tuple compatible error in PY2, test=develop

* replace get_func to declarative, test=develop

* fix detail bug, test=develop

* fix some detail problems, test=develop

* change visit_call in print transformer, test=develop

9b851ba2

Y

Add the implementation of inverse (#23310) · ecfddebb
由 Yiqun Liu 提交于 4月 27, 2020

ecfddebb

17 3月, 2020 1 次提交
- A
  
  Revert "Change ShareDataWith() to TensorCopy() in conv_mkldnn (#22695)" (#22985) · 5842ae67
  由 Adam 提交于 3月 17, 2020
  
  5842ae67
11 3月, 2020 1 次提交
- A
  
  Change ShareDataWith() to TensorCopy() in conv_mkldnn (#22695) · 056edf39
  由 Adam 提交于 3月 11, 2020
  
  056edf39
12 12月, 2019 1 次提交

memory leak for cpu (#21174) · 9ad940fd

由 tangwei12 提交于 12月 12, 2019

* add fake init for the trainer, fix large memory hold in the trainer
* do not merge recv vars from a remote endpoint, test=develop
* add recv and save op, merge slice var in one op, save memory
* remove hsigmoid with pull sparse, test=develop

9ad940fd

02 12月, 2019 1 次提交
- W
  
  fix the correctness of memcpy profiling result test=develop (#21458) · d4776ec0
  由 wangchaochaohu 提交于 12月 02, 2019
  
  d4776ec0

PaddlePaddle / Paddle 1 年多 前同步成功

PaddlePaddle / Paddle
1 年多前同步成功