Commits · d9884e2077d024a2439b8864b21885402f228af7 · Crayon鑫 / Paddle

23 Feb, 2022 1 commit
- [MLU] add cncl parallel context and mlu resource pool (#39803) · 6241913b
  Committed by mhhhh1 on Feb 23, 2022
```
* [MLU] add cncl parallel context and mlu resource pool

* [MLU] fix the cncl_context_test
```
  6241913b
20 Feb, 2022 1 commit

[PTen->Phi PR1] Change pten dirname and namespace to phi (#39748) · dcfe1986

Committed by Chen Weihang on Feb 20, 2022

* rename pten dir to phi

* rename namespace to phi

* rename infrt pten dir to phi

* resolve conflict

* rename pten to phi in cmake

* revert all infrt change

* change needed files

* fix infrt failed

* fix inference failed

dcfe1986

19 Feb, 2022 1 commit

[Pten]Unify paddle/pten::framework::ddim into pten::ddim (#39614) · 2fe04264

Committed by Aurelius84 on Feb 19, 2022

* Unify paddle/pten::framework::ddim into pten::ddim

* fix paddle namespace

* compile successfully

* fix npu src file

* fix conflict

* fix conflict

* fix tensorrt compiler error

* fix conflict

* fix conflict

* fix test file conflict

* fix conflict

* fix mlu file conflict

* fix mlu file conflict

* fix cinn header file conflict

* fix conflict

* fix conflict

* fix conflict

* fix conflict

2fe04264

18 Feb, 2022 1 commit

[AMP] support GPU BF16 amp for dygraph (#39029) · 7d6d3848

Committed by zhangbo9674 on Feb 18, 2022

* support dtype param for auto_cast

* add amp_dtype for tracer

* add unsupported bf16 list

* support bf16 amp for O2

* refine python interface for bfloat16

* refine code

* refine code

* refine unittest

* refine code

* refine code

* add bf16 o1

* refine code by comment

* add gradient accumulator

* add recompute

7d6d3848

15 Feb, 2022 2 commits

[PluggableDevice] Add custom runtime support (#38740) · 3e7825f3

Committed by ronnywang on Feb 15, 2022

* [CustomRuntime] Add DeviceManager

* [CustomRuntime] Add DeviceInterface

* [CustomRuntime] Add Stream, Event, DeviceGuard, CallbackManager

* [CustomRuntime] Add plug-in device

* [CustomRuntime] Memory module support PluggableDevice

* [CustomRuntime] Add WITH_PLUGGABLE_DEVICE cmake option

* update

* [API] update API doc based on comments, test=develop
Co-authored-by: qili93 <qili93@qq.com>

3e7825f3

[PTen]Migrate proto::VarType outside of Pten (#39411) · 7e7e9404

Committed by Aurelius84 on Feb 15, 2022

* #1 migrate dist-related type()-> dtype()

* move datatype function from pten -> fluid/framework

* change type() in imperative into convert(dtype())

* modify xx_tensor->type into xx_tensor->dtype

* change the set_type interface and the caller

* modify xx_tensor.type into xx_tensor.dtype

* fix mutable_data(place, dtype())

* change caller of mutable_data in pten and distributed

* change the caller of mutable_data in fluid/framework

* change the caller of mutable_data in imperative directory

* mutable_data: inference

* update the call of mutable_data

* transfer MakePenScalarArray MakePtenScalar ResetHolderWithType

* pass the compile. the next step is remove VarType in Pten

* fix all and remove VarType from pten. success in linux. Next task is other platform

* fix conflict with develop

* fix compiled error

* Fix reset conversion

* fix conflict

* fix compiled problem

* fix typo

* Fix << in tensor_utils.cc

* fix type->dtype

* fix unittest

* fix tensor init constructor

* fix DataTypeSize for BFloat16

* fix code style

* fix npu compiled error

* fix npu

* compile npu successfully

* fix conflict

* fix conflict
Co-authored-by: xiongkun <xiongkun03@baidu.com>

7e7e9404

02 Feb, 2022 1 commit
- Merge legacy to fluid (#39318) · 34cce62f
  Committed by Jiabin Yang on Feb 02, 2022
  
  34cce62f
27 Jan, 2022 1 commit

Add Khop Graph Sampler API (#39146) · 35f949b5

Committed by Siming Dai on Jan 27, 2022

* add the test case for the UVA

* add the context load for the uva

* Add graph_sample kernel

* Add graph_sample commit

* add new commit for graph_sample

* add unsigned long long int

* delete some remarks

* add cpu version

* add cuda eids

* add cpu eids

* delete _uva

* optimize speed: emplace_back, last_layer

* add to_uva_tensor

* add cpu return_eids choice

* add gpu return_eids choice

* add cpu reindex_nodes

* add gpu reindex_nodes

* rename op and add OMP for cpu

* add incubate api

* fix the compile problem for the PADDLE_ENFORCE and different device

* fix the rocm and windows compile problem

* add unittest for graph_sample_neighbors

* fix cpu unittest and unique problem

* fix uva unittest, fix cuda unique problem

* fix the windows compile problem

* fix the windows rand_r compile problem

* add correct unittest, add src_eids dispensable

* delete black

* combine uva unittest

* mv Sample_index to Sample_Index; check input shape; fix random sample func

* delete memset & cudaMemset

* fix according to PR comments

* fix rocm ci

* modify function names according to the specification

* fix windows_openblas ci

* refine annotations, fix windows unittest, add default value for uva device_id, fix bug for input nodes with empty neighbors

* fix rocm ci

* rename graph_sample_neighbors as graph_khop_sampler, add incubate api doc

* add data type

* fix conflict
Co-authored-by: wawltor <fangzeyang0904@hotmail.com>

35f949b5

25 Jan, 2022 1 commit

[Move selected_rows PR #3] Change the relationship of [include/Cmake]. (#39128) · 2bafd338

Committed by Weilong Wu on Jan 25, 2022

* Added selected_rows and rw_lock to pten

* Renamed the unit test target to fix CI

* Removed Class SelectedRows in Fluid, changed include/cmake relationship, use pten::SelectedRows in Fluid

* Remove rw_lock.h,rw_lock_test.cc in fluid

* Use pten::RWLock and pten::AutoRDLock, fix CI

* Use pten::SelectedRows

* Use pten::SelectedRows

* Fix to pass NPU CI

* Use pten::SelectedRows, to pass NPU CI

* To fix NPU CI

* To fix NPU CI again

2bafd338

20 Jan, 2022 1 commit

[Eager] Support Eager mode for some testcase (#38783) · d21074cd

Committed by wanghuancoder on Jan 20, 2022

* Rearranged Eager AutoCodeGen directory structure

* Removed USE_OP in Eager AutoCodeGen

* Enabled generation for Operators without Grad/Inputs/Outputs

* Resolved operators without input

* Fixed merge conflicts

* Enabled Eager AutoCodeGen for 10+ more operators

* Refactored Eager AutoCodeGen with more organized helper objects

* Enabled Eager AutoCodeGen for operators with multiple OpBases

* Adjusted Eager AutoCodeGen to Enable Passing Output Tensor as Input Argument

* Handled Dispensable Inputs/Outputs in Eager AutoCodeGen

* Adjusted function generation/call between Python-C API & Dygraph API

* Synchronized auto-generated Python-C API with Dygraph Forward Functions

* support more eager tensor api

* fix merge compile error

* fix compile error and fit develop code

* support pure CPU

* fix some logic error in eager_mode

* support _varbase_creator in eager mode

* Added safe_initialized interface to EagerTensor for use in processing dispensable inputs

* for eager mode

* refine

* support multiple constructor for eager tensor

* add place related code

* polish code

* specific randint with dtype of int64

* Support pure cpu test

* eager logic

* refine test in pure cpu

* eager logic

* eager logic

* eager logic, test=develop

* skip core.eager when in inference, test=develop

* refine, test=develop

* refine, test=develop

* call RetainGrad after run forward kernel, test=develop

* refine, test=develop

* support dygraph util, meta, guard test

* eager test case

* support inference test

* refine test and fix initializer failed

* modify eagertensor patch method

* add eagertensor.clear_grandint, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* support create varbase and fix retain grad error

* call monkey_patch_varbase in _test_eager_guard, test=develop

* fix windows error

* split clear_gradient to clear_gradient and zero_grads, test=develop

* refine, test=develop

* refine, test=develop

* support test_imperative_basic test in eager mode

* remove additional log in variable.h

* remove additional log in variable.h

* remove additional code create in merge

* eager

* fix some eager logic, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* patch_tensor_method_func, test=develop

* refine, test=develop

* eager test case, test=develop

* refine, test=develop

* eager, test=develop

* eager, test=develop

* eager optimizer, test=develop

* eager optimizer, test=develop

* eager test_imperative_optimizer_v2, test=develop

* eager, test=develop

* refine, test=develop

* refine, test=develop

* eager, test=develop

* add resize in share buffer to, test=develop

* eager, test=develop

* fix _share_buffer_to, test=develop

* refine, test=develop

* refine, test=develop

* support eager for dataloader,test=develop
Co-authored-by: jim19930609 <jim19930609@gmail.com>
Co-authored-by: JiabinYang <360788950@qq.com>

d21074cd

18 Jan, 2022 2 commits

[Unify Tensors PR #8] Merged Tensor into DenseTensor, test=allcases (#38914) · 2052f1e3

Committed by Zhanlue Yang on Jan 18, 2022

* Merged LoDTensor with Tensor,test=allcases

* Patched python level LoDTensor

* Patched python level LoDTensor

* Merge Tensor into DenseTensor

* Fixed namespace issues,test=allcases

* Fixed merge issues

* Fixed inference issues

* Fixed NPU test issues

* Fixed merge issues

2052f1e3

add the uva function for the Tensor (#38950) · bfacd706

Committed by wawltor on Jan 18, 2022

* add the uva api for the tensor

* fix the compiler problem for the uva

* fix the example for the _uva

* fix the compile problem in the pten library

* update the environment support for the uva

* use the make_shared replace the shared_ptr

bfacd706

17 Jan, 2022 1 commit

[Pten] Replace platform::Place to pten::Place. (#38899) · c48a9ad5

Committed by Wilber on Jan 17, 2022

* add pten::Place data structure.

* update ci problem

* fix ci problem

* update

* using platform::Place=pten::Place

* remove BOOST_GET_CONST for CPUPlace and GPUPlace

* compile pass 25%.

* compile pass 45%

* compile pass 60%

* remove boost_get for xpu npu mlu and ipu

* compile pass on cpu and gpu.

* fix compile problem

* fix compile error.

* update

* fix ci problem

* update

* ci approve

* fix ci problem

* fix ci eager test problem

* remove BOOST_GET_CONST

* fix npu compile

c48a9ad5

10 Jan, 2022 1 commit

[Unify Tensors PR #5] framework::Tensor inherits from DenseTensor,test=allcases (#38632) · 5c73a6ea

Committed by Zhanlue Yang on Jan 10, 2022

* Added shared_ptr<Allocation> member & corresponding interfaces to Storage

* Removed original pten::Allocation from Storage and adjusted the interfaces accordingly

* Fixed issues with storage offset

* Used place to malloc allocation for TensorStorage

* [Unify Tensors PR #3]Ported framework::Tensor interfaces to pten::DenseTensor

* Fixed issues with place

* Added comments

* Moved mutable_data with stream argument to DenseTensor

* Added set_offset interface

* Fixed CI issues,test=allcases

* [Unify Tensors PR #4] Port LoDTensor interfaces to DenseTensor

* Removed friend class EigenTensor/EigenMatrix/EigenVector from Tensor

* Modified framework::Tensor to inherit from DenseTensor

* Reverted changes to pten_layout() interface

* Removed friend classes

* Rearranged cfunction calls from tensor.data<void>() to tensor.data()

* Fixed CI issues

* Fixed lite issues

* Fixed data() interface issues,test=allcases

* Resolved IsInitialized() issues

* Fixed ResetHolder() issues

* Fixed MKLDNN & Storage issues

* Resolved ShareBufferWith() issues

* Fixed LoD issues

5c73a6ea

28 Dec, 2021 1 commit

Support test basic of Var and Layer (#38426) · 1fb80a6a

Committed by Jiabin Yang on Dec 28, 2021

* Rearranged Eager AutoCodeGen directory structure

* Removed USE_OP in Eager AutoCodeGen

* Enabled generation for Operators without Grad/Inputs/Outputs

* Resolved operators without input

* Fixed merge conflicts

* Enabled Eager AutoCodeGen for 10+ more operators

* Refactored Eager AutoCodeGen with more organized helper objects

* Enabled Eager AutoCodeGen for operators with multiple OpBases

* Adjusted Eager AutoCodeGen to Enable Passing Output Tensor as Input Argument

* Handled Dispensable Inputs/Outputs in Eager AutoCodeGen

* Adjusted function generation/call between Python-C API & Dygraph API

* Synchronized auto-generated Python-C API with Dygraph Forward Functions

* support more eager tensor api

* fix merge compile error

* fix compile error and fit develop code

* support pure CPU

* fix some logic error in eager_mode

* support _varbase_creator in eager mode

* Added safe_initialized interface to EagerTensor for use in processing dispensable inputs

* for eager mode

* refine

* support multiple constructor for eager tensor

* add place related code

* polish code

* specific randint with dtype of int64

* Support pure cpu test

* eager logic

* refine test in pure cpu

* eager logic

* eager logic

* eager logic, test=develop

* skip core.eager when in inference, test=develop

* refine, test=develop

* refine, test=develop

* call RetainGrad after run forward kernel, test=develop

* refine, test=develop

* support dygraph util, meta, guard test

* support inference test

* refine test and fix initializer failed

* support create varbase and fix retain grad error

* fix windows error

* support test code coverage

* support test code coverage

* support test code coverage
Co-authored-by: jim19930609 <jim19930609@gmail.com>
Co-authored-by: Wang Huan <wanghuan29@baidu.com>

1fb80a6a

24 Dec, 2021 2 commits

fix share buffer to (#38407) · 9409ff6b
Committed by Baibaifan on Dec 24, 2021

9409ff6b

Support test imperative basic in eager (#38313) · d48f7c89

Committed by Jiabin Yang on Dec 24, 2021

* Rearranged Eager AutoCodeGen directory structure

* Removed USE_OP in Eager AutoCodeGen

* Enabled generation for Operators without Grad/Inputs/Outputs

* Resolved operators without input

* Fixed merge conflicts

* Enabled Eager AutoCodeGen for 10+ more operators

* Refactored Eager AutoCodeGen with more organized helper objects

* Enabled Eager AutoCodeGen for operators with multiple OpBases

* Adjusted Eager AutoCodeGen to Enable Passing Output Tensor as Input Argument

* Handled Dispensable Inputs/Outputs in Eager AutoCodeGen

* Adjusted function generation/call between Python-C API & Dygraph API

* Synchronized auto-generated Python-C API with Dygraph Forward Functions

* support more eager tensor api

* fix merge compile error

* fix compile error and fit develop code

* support pure CPU

* fix some logic error in eager_mode

* support _varbase_creator in eager mode

* Added safe_initialized interface to EagerTensor for use in processing dispensable inputs

* for eager mode

* refine

* support multiple constructor for eager tensor

* add place related code

* polish code

* specific randint with dtype of int64

* Support pure cpu test

* eager logic

* refine test in pure cpu

* eager logic

* eager logic

* eager logic, test=develop

* skip core.eager when in inference, test=develop

* refine, test=develop

* refine, test=develop

* call RetainGrad after run forward kernel, test=develop

* refine, test=develop

* support dygraph util, meta, guard test

* support inference test

* refine test and fix initializer failed
Co-authored-by: jim19930609 <jim19930609@gmail.com>
Co-authored-by: Wang Huan <wanghuan29@baidu.com>

d48f7c89

23 Dec, 2021 1 commit
- add new API: paddle.clone;Tensor.element_size;nn.utils.parameters_to_vector (#38020) · 0eb03ed7
  Committed by zhouweiwei2014 on Dec 23, 2021
```
* add new API: paddle.clone;Tensor.element_size;nn.utils.parameters_to_vector

* fix comment
```
  0eb03ed7
21 Dec, 2021 1 commit

Fix inplace problem of setitem (#38298) · da61df5c

Committed by zyfncg on Dec 21, 2021

* add inplace_map for trace_op in pybind

* fix inplace problem of setitem

* refactor the param format of trace_op
Co-authored-by: pangyoki <pangyoki@126.com>

da61df5c

20 Dec, 2021 1 commit
- [MLU]add mlu backend (#38207) · 76514a1f
  Committed by fwenguang on Dec 20, 2021
  
  76514a1f
16 Dec, 2021 1 commit
- support eager switch system (#38170) · 8305c2be
  Committed by Jiabin Yang on Dec 16, 2021
```
* support eager switch system

* polish code
```
  8305c2be
14 Dec, 2021 1 commit

fix memory leak problem of set_value op (#38098) · f8202941

Committed by zyfncg on Dec 14, 2021

* fix bug of set_value op

* fix BumpInplaceVersion

* polish some comments

* revert change of full_like

f8202941

09 Dec, 2021 1 commit
- Add varbase init name (#37947) · fdf62e1e
  Committed by Baibaifan on Dec 09, 2021
  
  fdf62e1e
07 Dec, 2021 1 commit

Bug fix for reset grad inplace version (#37811) · cf586021

Committed by Zhanlue Yang on Dec 07, 2021

* Debug

* Fixed issue with reset_grad_inplace_version when used with clear_gradient & cross-batch accumulation

* Rearranged interfaces

* Fixed ci issues

cf586021

06 Dec, 2021 1 commit
- heter for collective (#37613) · 1bdb8578
  Committed by kuizhiqing on Dec 06, 2021
  
  1bdb8578
03 Dec, 2021 2 commits
- Fix _numel func logic and add test (#37810) · 075a02d2
  Committed by Weilong Wu on Dec 03, 2021
  
  075a02d2
- refine structure for cuda and rocm (#37202) · a6d2fddb
  Committed by ronnywang on Dec 03, 2021
```
* refine structure for cuda and rocm

* update

* update

* update

* update
```
  a6d2fddb
01 Dec, 2021 1 commit

Remove cpp layer (#37730) · 44def66a

Committed by Jiabin Yang on Dec 01, 2021

* optimizer __call__ to make dygraph faster

* fix return type

* remove cpp Layer

44def66a

26 Nov, 2021 1 commit

Added interface reset_grad_inplace_version (#37573) · dcb91fd7

Committed by Zhanlue Yang on Nov 26, 2021

reset_inplace_version removes all inplace-related records from VarBase/VariableWrapper. Its purpose is to let you use inplace operations as if they were their non-inplace versions, which can cause unexpected consequences if not used with care.

This is essentially a hack interface to satisfy one specific request.
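
To illustrate why dropping inplace records is risky, here is a minimal toy sketch in plain Python. It is not Paddle code: `ToyTensor`, `SavedForBackward`, and their methods are hypothetical names, and the sketch only assumes the common autograd design in which a tensor carries an inplace version counter that is compared against a snapshot taken when the tensor was saved for the backward pass.

```python
class ToyTensor:
    """Hypothetical stand-in for a tensor that counts in-place modifications."""

    def __init__(self, value):
        self.value = value
        self.inplace_version = 0  # bumped by every in-place operation

    def add_(self, delta):
        # In-place update: mutate the value and record that it happened.
        self.value += delta
        self.inplace_version += 1
        return self

    def reset_inplace_version(self):
        # Forget all in-place history (the "hack" described above).
        self.inplace_version = 0


class SavedForBackward:
    """Remembers a tensor together with its version at save time."""

    def __init__(self, tensor):
        self.tensor = tensor
        self.version_at_save = tensor.inplace_version

    def get(self):
        # The guard: refuse to hand back a tensor that changed after saving.
        if self.tensor.inplace_version != self.version_at_save:
            raise RuntimeError("tensor was modified in place after being saved")
        return self.tensor.value


def demo():
    x = ToyTensor(2.0)
    saved = SavedForBackward(x)
    x.add_(1.0)  # in-place change after saving
    try:
        saved.get()
        guarded = False
    except RuntimeError:
        guarded = True  # the version check caught the modification
    x.reset_inplace_version()  # erase the record
    value_after_reset = saved.get()  # check passes, but the value has changed
    return guarded, value_after_reset
```

In this toy model, once the counter is reset the guard no longer fires and the backward pass would silently read the modified value, which is the kind of unexpected consequence the commit message warns about.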

dcb91fd7

23 Nov, 2021 2 commits
- Removed debug code (#37447) · 586bafbd
  Committed by Zhanlue Yang on Nov 23, 2021
  
  586bafbd
- [NPU] Added HCCL backend support in dygraph mode (#36285) · 83e55cff
  Committed by ronnywang on Nov 23, 2021
```
* Added HCCL backend support in dynamic graph mode

* fix segmentation fault

* add ut
```
  83e55cff
22 Nov, 2021 3 commits
- fix bug of indexing tensor with None (#37400) · de0cb386
  Committed by zyfncg on Nov 22, 2021
  
  de0cb386
- Add backward function hook to dygraph (#37141) · 31344ab7
  Committed by Zhanlue Yang on Nov 22, 2021
  
  31344ab7
- Renamed Func and removed ENFORCE statement (#37348) · 2702af21
  Committed by Weilong Wu on Nov 22, 2021
```
* Removed one ENFORCE statement

* Changed func name to _share_buffer_to

* Improve error reporting information

* Updated the logic of _is_share_buffer_to func
```
  2702af21
15 Nov, 2021 1 commit
- fix bug of indexing with ellipsis (#37182) · f2a56c6a
  Committed by zyfncg on Nov 15, 2021
  
  f2a56c6a
11 Nov, 2021 2 commits

[Bug fixes] Add default arg to enhance varbase ClearGradient func (#36837) · 63f5c2d4

Committed by Weilong Wu on Nov 11, 2021

* Add default arg to enhance varbase ClearGradient func

* Removed default arg, use a Flag to enhance varbase ClearGradient func

* Renamed Flags to FLAGS_real_release

* Use default arg to enhance varbase ClearGradient func and expose two func to set/get gradient isEmpty

* Removed DECLARE_bool statement

* Polished Code

63f5c2d4

[New features] Support VarBase to expose func (#36965) · 52645667

Committed by Weilong Wu on Nov 11, 2021

* Expose func for varbase

* Expose func for varbase and enhance varbase init func

* Change func name and add test case for _CopyGradientWith

* Rename func

* Add test cases to increase coverage

* Refine the logic of _to func

* Replace numel() with _numel(), Add test code

52645667

08 Nov, 2021 1 commit
- setitem support passing stop_gradient from value to tensor (#37023) · aef8bf2a
  Committed by zyfncg on Nov 08, 2021
  
  aef8bf2a
20 Oct, 2021 1 commit

Add FasterTokenizer Operator (#34491) · 3f2d6a3f

Committed by Steffy-zxf on Oct 20, 2021

Add tokenizer-related functionality for Transformer models so that tokenization is consistent between training and prediction.

* Support text strings as an input Tensor
* Support the "VOCAB" unordered_map<wstring, int> as an input Tensor for looking up tokens
* The tokenizer targets BERT and performs end-to-end tokenization, from text string to wordpieces.
* It first applies basic tokenization, followed by wordpiece tokenization.
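
The two-stage flow described above (basic tokenization, then wordpiece tokenization with a vocabulary lookup) can be sketched in plain Python as follows. This is an illustrative approximation, not the FasterTokenizer operator itself: the function names, the `##` continuation prefix, the `[UNK]` token, and the tiny vocabulary are assumptions following the usual BERT convention, and a Python `dict` stands in for the `unordered_map<wstring, int>` vocabulary.

```python
import re


def basic_tokenize(text):
    """Lowercase, then split into words and standalone punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is found in the vocab.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a word-internal piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece matches at this position: the whole word is unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


def tokenize(text, vocab):
    """Text string -> list of token ids via basic + wordpiece tokenization."""
    ids = []
    for word in basic_tokenize(text):
        for piece in wordpiece_tokenize(word, vocab):
            ids.append(vocab[piece])
    return ids


# Hypothetical toy vocabulary, for illustration only.
VOCAB = {"[UNK]": 0, "un": 1, "##aff": 2, "##able": 3, "hello": 4, "!": 5}
```

With this toy vocabulary, `tokenize("Hello unaffable!", VOCAB)` splits `unaffable` into `un`, `##aff`, `##able` and maps each piece to its id. Running the same function at training and prediction time is what keeps the two consistent.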

3f2d6a3f

18 Oct, 2021 1 commit

Add operators for async read & async write (#36333) · 3845afff

Committed by Siming Dai on Oct 18, 2021

* fix async_read bug

* change index place to cpu

* add tensor size judge

* add async_read & async_write test

* fix bug in async_write

* fix mac py3 ci

* fix bug for cpu version paddle

* fix windows ci bug

* change input argument error type

* change const_cast to mutable_data

* add async_write out-of-bound check and improve error hint

* fix a small bug for dst_tensor

* add docs and refine codes

* refine docs

* notest,test=windows_ci

* fix windows ci

* fix require

* fix code-block

* add core.is_compiled_with_cuda()

3845afff

Crayon鑫 / Paddle · in sync with the fork's source project