提交 · d611e48c90d1a9145f97956ca2e5faea7a4a16bd · PaddlePaddle / Paddle

28 4月, 2023 1 次提交

Dropout optimize & clean broadcast inT and ElementwiseType (#52969) · d611e48c

由 Bo Zhang 提交于 4月 28, 2023

* change judgement for DropoutGradGPUKernelDriver

* add UnrollerWithoutVecSize and after this Loaddata to be refined

* pass unittest

* use same unroller with XPU

* BroadcastWithInt64Index

* BroadcastDataLoader template partial specialization

* fix compile errs in ROCms

* clean ElementwiseT and InT for BroadcastKernel

* default axis and clean inT

* remove redundant fast divmod computation

* optimize drop_nd & drop_nd_grad

* optimize BroadcastDataLoader bf16 fp16

* rm InT etc. after merge develop

* delete constexpr for windows ci

* fix conflict

* fix conflic with develop

* fix conflic

* new clean

* clean

d611e48c

20 4月, 2023 1 次提交

move_elementwise_raw (#53010) · 7a72f7a2

由 zhangyuqin1998 提交于 4月 20, 2023

* setup

* Update elementwise_kernel.cc

* Update elementwise_kernel.cc

* fix

* fix

* Update elementwise_kernel.cu

* fix

* Update elementwise_kernel.cc

* Update elementwise_kernel.cc

* Update elementwise_kernel.cc

* Update elementwise_kernel.cc

* Update elementwise_kernel.cc

* Update elementwise_kernel.cc

7a72f7a2

23 3月, 2023 1 次提交
- L
  [AMP] Add bfloat16 Support for `elementwise_pow` Op (#51888) · 288ad844
  由 Lin Manhui 提交于 3月 23, 2023
```
* Add bf16 support for elementwise_pow

* Update ut
```
  288ad844
09 3月, 2023 1 次提交
- Z
  
  delete axis of fmax (#51264) · 7d138402
  由 zhangyuqin1998 提交于 3月 09, 2023
  
  7d138402
25 2月, 2023 1 次提交
- Z
  Rename elementwise_heaviside to heaviside (#50821) · 8129c22e
  由 zyfncg 提交于 2月 25, 2023
```
* rename elementwise_heaviside to heaviside

* delete __init__.py

* fix bug
```
  8129c22e
13 2月, 2023 1 次提交
- Z
  Delete axis of fmin kernel (#50358) · 8df8cb10
  由 zyfncg 提交于 2月 13, 2023
```
* delete axis of fmin

* fix bug
```
  8df8cb10
10 11月, 2022 1 次提交

[PHI]Standardise some C++ API (Part4) (#47702) · 594bd723

由 YuanRisheng 提交于 11月 10, 2022

* standard api

* fix sparse bugs

* fix xpu bugs, test=kunlun

* remove hard code for custom unittest

* open ci, test=kunlun

* deal with conflict

594bd723

26 10月, 2022 1 次提交
- L
  [Fix] Fix paddle.pow() Gets Incorrect Result When Broadcasting Is Triggered (#47307) · d8314ff5
  由 Lin Manhui 提交于 10月 26, 2022
```
* Fix paddle.pow() bugs

* Add unittest cases

* Fix ut cases

* Add ut cases on multiple devices
```
  d8314ff5
30 8月, 2022 1 次提交
- C
  
  rename mod c api name (#45476) · ad96fe2c
  由 Chen Weihang 提交于 8月 29, 2022
  
  ad96fe2c
05 6月, 2022 1 次提交
- S
  
  【code format check upgrade】 step2：clang-format (#42840) · a3730dc8
  由 Sing_chan 提交于 6月 05, 2022
  
  a3730dc8
10 5月, 2022 1 次提交

【PaddlePaddle Hackathon 2】18、为 Paddle 新增 paddle.heaviside 和 paddle.Tensor.heaviside API (#41872) · 4892d592

由 BrilliantYuKaimin 提交于 5月 10, 2022

* Create elementwise_heaviside_op.cc

* add ElementwiseHeavisideFunctor

* Create test_elementwise_heaviside_op.py

* 增加heaviside的python接口

* add heaviside in white list

* 增加heaviside的签名

* 增加heaviside的核函数

* 增加heaviside梯度的核函数

* 增加heaviside梯度的注册

* 调整代码格式

* Update elementwise_sig.cc

* add heaviside in __all__

* Update heaviside docs

* Update math.py

* Update math.py

* Update math.py

4892d592

19 4月, 2022 1 次提交

[Phi]Separate AddKernel/DivideKernel/SubtractKernel/MultiplyKernel from... · 2cb19d8f

由 YuanRisheng 提交于 4月 19, 2022

[Phi]Separate AddKernel/DivideKernel/SubtractKernel/MultiplyKernel from ElementwiseKernel（Part1） (#41806)

* seperate add/div/sub/mul from elementwise

* delete code

* fix compile bugs

* deal with conflict

* fix bugs when compile

* fix windows unit test bug

* fix ci converage bugs

2cb19d8f

14 4月, 2022 1 次提交
- C
  [Phi] Unify dispatch macros to visit (#41653) · 2ab986ae
  由 Chen Weihang 提交于 4月 14, 2022
```
* chnage dispatch to visit

* resolve conflict
```
  2ab986ae
30 3月, 2022 1 次提交

Revert "Revert "[Phi] Move elementwise_floordiv and elementwise_pow to phi... · eef46770

由 Chen Weihang 提交于 3月 30, 2022

Revert "Revert "[Phi] Move elementwise_floordiv and elementwise_pow to phi (#40993)" (#41065)" (#41110)

This reverts commit 3a6f1135.

eef46770

29 3月, 2022 2 次提交
- T
  Revert "[Phi] Move elementwise_floordiv and elementwise_pow to phi (#40993)" (#41065) · 3a6f1135
  由 tianshuo78520a 提交于 3月 29, 2022
```
This reverts commit b532315d.
```
  3a6f1135
- W
  [Phi] Move elementwise_floordiv and elementwise_pow to phi (#40993) · b532315d
  由 wuyefeilin 提交于 3月 29, 2022
```
* mv floordiv to phi

* mv elementwise_pow to phi

* fix as review
```
  b532315d
25 3月, 2022 1 次提交
- F
  
  move elementwise_max/min/mod into phi (#40590) · cfadf61b
  由 FlyingQianMM 提交于 3月 25, 2022
  
  cfadf61b
23 3月, 2022 1 次提交
- Y
  
  rename elementwise fmax (#40810) · 36492bc5
  由 YuanRisheng 提交于 3月 23, 2022
  
  36492bc5
17 3月, 2022 1 次提交
- Y
  
  rename math (#40641) · 883a8eea
  由 YuanRisheng 提交于 3月 17, 2022
  
  883a8eea
14 3月, 2022 1 次提交
- X
  
  [phi]migrate fmax,fmin kernel to phi (#40140) · bb801960
  由 Xiaoxu Chen 提交于 3月 14, 2022
  
  bb801960
09 3月, 2022 1 次提交
- Z
  [PHI] Move set_value kernel to phi (#40195) · cd28cddb
  由 zyfncg 提交于 3月 09, 2022
```
* save code

* fix bug of set_value

* add coverage test
```
  cd28cddb
22 2月, 2022 1 次提交
- C
  [PTen->Phi PR2] Rename PT_REGISTER macro to PD_REGISTER (#39790) · 4a338796
  由 Chen Weihang 提交于 2月 22, 2022
```
* unify register macro

* rename declare macro

* fix infrt error
```
  4a338796
20 2月, 2022 1 次提交

[PTen->Phi PR1] Change pten dirname and namespace to phi (#39748) · dcfe1986

由 Chen Weihang 提交于 2月 20, 2022

* rename pten dir to phi

* rename namespace to phi

* rename infrt pten dir to phi

* resolve conflict

* rename pten to phi in cmake

* revert all infrt change

* change needed files

* fix infrt failed

* fix inference failed

dcfe1986

16 2月, 2022 1 次提交
- F
  [Pten] move complex_functors.h (#39558) · 5b5656d0
  由 Feiyu Chan 提交于 2月 16, 2022
```
* move complex_functors.h and update all references to symbols within it
```
  5b5656d0
15 2月, 2022 1 次提交

Move Abs OP to pten (#39492) · fb473067

由 From00 提交于 2月 15, 2022

* Move Abs op to pten

* Fix NPU compilation error

* Fix CI error

* Use LaunchSameDimsElementwiseCudaKernel in pten

fb473067

20 1月, 2022 1 次提交
- A
  [Pten] Migrate bfloat16/float16/complex from paddle::platform into pten::common (#39044) · f1143f0c
  由 Aurelius84 提交于 1月 20, 2022
```
* Migrate bfloat16/float16/complex from platform into pten::common

* fix typo

* fix code style
```
  f1143f0c
13 1月, 2022 1 次提交
- C
  [PTen] Rename kernel register marco (#38861) · 158bf13f
  由 Chen Weihang 提交于 1月 13, 2022
```
* rename register marco

* fix error changing

* fix format error
```
  158bf13f
11 1月, 2022 1 次提交

【PTen】Add dot and matmul grad kernel in pten (#38713) · be817719

由 zyfncg 提交于 1月 11, 2022

* refactor matmul directory in pten

* fix merge conflict

* add dot_grad kernel

* add dot_grad kernel in pten

* add matmul_grad kernel

* update the code

* delete useless code in fluid

* fix some bug of running matmul grad kernel

* fix merge conflict

* refactor some code

* refactor code

be817719

23 12月, 2021 1 次提交
- C
  
  move conj kernel impl (#38365) · 8da9eff4
  由 Chen Weihang 提交于 12月 23, 2021
  
  8da9eff4
20 12月, 2021 1 次提交

[pten]add pten conj kernel (#38247) · a2793e5e

由 chentianyu03 提交于 12月 20, 2021

* add pten conj kernel

* modify conj_kernel file path

* add defined cuda macro to cuda/conj_kernel.h

a2793e5e

16 12月, 2021 1 次提交

[PTen] Unify device context entrance in pten part 2 (#38182) · e02537f9

由 Chen Weihang 提交于 12月 16, 2021

* unify device context entrance

* move all_context include to header

* polish cmake relay for device_context

* fix npu compile failed

* fix npu compile failed

e02537f9

02 11月, 2021 1 次提交

Add matmul_v2 kernel in pten (#36844) · e11ecfce

由 zyfncg 提交于 11月 02, 2021

* initial tensor design & sign kernel demo

* add move constructor for meta & add lodtensor

* add dirs & sign xpu kernel

* add mean cpu&cuda kernel impl

* move sign & mean xpu & npu kernel

* add selected_rows basic impl

* refactor design, BaseTensor to DenseTensor, etc.

* add scale mkldnn kernel

* polish xpu & npu impl details

* fix mkldnn reuse compile failed

* change tensor operation lib name

* rename util filename

* add more comments

* change TensorImplInterface to TensorInterface

* add kernel key and factory

* remove MKLDNNTensorMeta, add MKLDNNDenseTensor

* change XXDeviceContext to XXContext

* add base kernel registrar utils & test on sign

* replace boost::any by paddle::any

* fix several ci failed

* fix npu compile error

* add ordered map util

* fix multiple ordered_map compile errors

* move dev into include dir

* support sign op in static op run

* fix static op run error

* fix new executor compile failed

* add dygraph branch & remove sign_op.h

* fix test_infer_no_need_buffer_slots

* fix rocm compile link error

* fix unitybuild error & clear glog

* fix npu compile failed

* skip quant trans test

* fix part windows compile problem

* fix xpu enforce error

* fix inference test failed

* remove ordered_map to solve quant failed

* fix part of rcom compile faild

* add more register kernels

* revert scale kernel temporarily

* fix code format error

* add new kernel registrar marco

* rename top to tcmpt

* revert xpu, npu, mkldnn impl & remove op def

* add kernel args parse functor to auto parse args

* revert some change & add scale kernels

* add op proto in dygraph kernelcontext building

* polish kernel dispatch logic & nameing rule

* fix scale kernel match error

* fix scale test failed

* add mean API and unittest

* test mean api success

* add branch to solve compiled error

* skip clang format error

* add mean skip rule in op_library

* add dot kernel, api and unittest (#6)

* remove old kernel and add symbol link

* fix dot compiled failed

* add merco for module declare

* fix npu and xpu compile error

* revert sign, mean, scale, dot kernel removing

* add comment for keeping old kernel impl

* fix mutable_data error

* fix bfloat16 conflit

* fix inference undef error

* adapt to msvc compile rules

* polish comment for template inst

* add cmake template instantiation for win

* fix backend to place device id bug

* fix ifdef error

* Op2functor (#7)

* add kernel args maker class

* make args maker non-const

* remove debug log

* modify codes by review options

* split constructPrKernelContext function

* fix output name bug

* fix test_mean_op test_sign_op failed

* fill_any_like kernel refactor (#10)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* skip dtype for fill_any_like

* add attrs for kernel key constrcut

* add use_pt_kernel Flags to control whether to use pt kernel (#13)

* add use_pt_kernel Flags to control whether to use pt kernel

* change the default value to true for cheking pt kernels

* fix mutable_data cuda place error

* move high level apis into hapi

* remove selectedrows adapting temporarily

* Support Scalar in Tensor Compute Library (#14)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* remove mkldnn tensor & polish details

* use flat_hash_map and small_vector in kernel factory

* Refactor flatten kernel (#12)

* refactor flatten kernel

* update infershape function

* fix compile bugs

* fix bugs when merge

* fix compiler bugs

* fix bugs when run test_flatten_api

* fix bugs when run test

* Revert "use flat_hash_map and small_vector in kernel factory"

This reverts commit 23091495cfdd3df8cc1be592d30f09ea66a7c72b.

* Move cpu, cuda and other device code into kernels (#15)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* start refactor matmul

* move cpu, cuda and other device modules into kernels

* merge code

* polish code in operator.cc

* Perfect unitests (#16)

* perfect unittest

* update license

* replace with flat_hash_map, small_vector (#19)

* fix small_vector build error on windows platform

* replace with flat_hash_map, small_vector

* remove todo

* Perfect unitests (#20)

* perfect unittest

* update license

* fix bug when run tcmpt_utils_test

* refactor execution adapting impl

* fix insert conflit

* Fix CI bug of test_yolov3 (#21)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* start refactor matmul

* move cpu, cuda and other device modules into kernels

* merge code

* polish code in operator.cc

* Fix CI bug of test_yolov3

* add the tensor base class, test=develop (#17)

* update the tensor base class, test=develop

* remove two funcs, test=develop

* update the error msg, test=develop
Co-authored-by: NChen Weihang <chenweihang@baidu.com>

* [no-verify] commit backend and tensor signature changes

* Rename tcmpt to pten (#23)

* rename tcmpt to pten

* update omitted files for rename to pten

* update omitted file for rename to pten

* remove k of all enum var

* remove kernel_instantiate (#26)

* remove symbols and spatial_tensor

* change common to functions

* readd share tensor impl methods

* add a candidate dense tensor class, test=develop (#28)

* change all Pt to Pten

* resolve conflit with xiaowei

* Op2functor opt1 (#27)

* replace to small vector and change to const &

* add std::move
Co-authored-by: NChen Weihang <chenweihang@baidu.com>

* polish kernel factory and kernel registry

* fix operator test error msg mismatch

* remove tensor signature and backend set member

* move scalar and polish enforce

* revert dtype layout change to fix error

* fix enum operator override error

* add several base unittests

* add pten utils tests

* polish some details

* Dev/op2func refactor 3 (#30)

* add a candidate dense tensor class, test=develop

* remove TensorBase::backend(), test=develop

* remove some ops, test=develop

* cherry-pick the pr of tensor meta, test=develop

* moves the dense tensor and some ops, test=develop

* update the linalg operator, test=develop

* update other operators, test=develop

* fix errors, test=develop

* fix bugs, test=develop

* try to resolve the problem of windows ci, test=develop

* updates codes, test=develop

* fix the tensor_utils.cc, test=develop

* modify the dense tensor, test=develop

* fix the data type, test=develop
Co-authored-by: Nshixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

* polish some details

* polish kernel signature details

* fix a bug about offsets of the tensor, test=develop (#31)
Co-authored-by: Nshixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

* add matmul kernel in pten

* add unittest for new matmul_v2 kernel

* fix bug of CI compile

* fix bug of CI compile

* merge conflict

* remove useless file
Co-authored-by: NChen Weihang <chenweihang@baidu.com>
Co-authored-by: Nchentianyu03 <ctychentianyu@gmail.com>
Co-authored-by: NYuanRisheng <yuanrisheng@baidu.com>
Co-authored-by: N石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>

e11ecfce

01 11月, 2021 1 次提交

Paddle Tensor Operation Library initial implementation (#34425) · b9fdd3bc

由 Chen Weihang 提交于 11月 01, 2021

* initial tensor design & sign kernel demo

* add move constructor for meta & add lodtensor

* add dirs & sign xpu kernel

* add mean cpu&cuda kernel impl

* move sign & mean xpu & npu kernel

* add selected_rows basic impl

* refactor design, BaseTensor to DenseTensor, etc.

* add scale mkldnn kernel

* polish xpu & npu impl details

* fix mkldnn reuse compile failed

* change tensor operation lib name

* rename util filename

* add more comments

* change TensorImplInterface to TensorInterface

* add kernel key and factory

* remove MKLDNNTensorMeta, add MKLDNNDenseTensor

* change XXDeviceContext to XXContext

* add base kernel registrar utils & test on sign

* replace boost::any by paddle::any

* fix several ci failed

* fix npu compile error

* add ordered map util

* fix multiple ordered_map compile errors

* move dev into include dir

* support sign op in static op run

* fix static op run error

* fix new executor compile failed

* add dygraph branch & remove sign_op.h

* fix test_infer_no_need_buffer_slots

* fix rocm compile link error

* fix unitybuild error & clear glog

* fix npu compile failed

* skip quant trans test

* fix part windows compile problem

* fix xpu enforce error

* fix inference test failed

* remove ordered_map to solve quant failed

* fix part of rcom compile faild

* add more register kernels

* revert scale kernel temporarily

* fix code format error

* add new kernel registrar marco

* rename top to tcmpt

* revert xpu, npu, mkldnn impl & remove op def

* add kernel args parse functor to auto parse args

* revert some change & add scale kernels

* add op proto in dygraph kernelcontext building

* polish kernel dispatch logic & nameing rule

* fix scale kernel match error

* fix scale test failed

* add mean API and unittest

* test mean api success

* add branch to solve compiled error

* skip clang format error

* add mean skip rule in op_library

* add dot kernel, api and unittest (#6)

* remove old kernel and add symbol link

* fix dot compiled failed

* add merco for module declare

* fix npu and xpu compile error

* revert sign, mean, scale, dot kernel removing

* add comment for keeping old kernel impl

* fix mutable_data error

* fix bfloat16 conflit

* fix inference undef error

* adapt to msvc compile rules

* polish comment for template inst

* add cmake template instantiation for win

* fix backend to place device id bug

* fix ifdef error

* Op2functor (#7)

* add kernel args maker class

* make args maker non-const

* remove debug log

* modify codes by review options

* split constructPrKernelContext function

* fix output name bug

* fix test_mean_op test_sign_op failed

* fill_any_like kernel refactor (#10)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* skip dtype for fill_any_like

* add attrs for kernel key constrcut

* add use_pt_kernel Flags to control whether to use pt kernel (#13)

* add use_pt_kernel Flags to control whether to use pt kernel

* change the default value to true for cheking pt kernels

* fix mutable_data cuda place error

* move high level apis into hapi

* remove selectedrows adapting temporarily

* Support Scalar in Tensor Compute Library (#14)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* remove mkldnn tensor & polish details

* use flat_hash_map and small_vector in kernel factory

* Refactor flatten kernel (#12)

* refactor flatten kernel

* update infershape function

* fix compile bugs

* fix bugs when merge

* fix compiler bugs

* fix bugs when run test_flatten_api

* fix bugs when run test

* Revert "use flat_hash_map and small_vector in kernel factory"

This reverts commit 23091495cfdd3df8cc1be592d30f09ea66a7c72b.

* Move cpu, cuda and other device code into kernels (#15)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* start refactor matmul

* move cpu, cuda and other device modules into kernels

* merge code

* polish code in operator.cc

* Perfect unitests (#16)

* perfect unittest

* update license

* replace with flat_hash_map, small_vector (#19)

* fix small_vector build error on windows platform

* replace with flat_hash_map, small_vector

* remove todo

* Perfect unitests (#20)

* perfect unittest

* update license

* fix bug when run tcmpt_utils_test

* refactor execution adapting impl

* fix insert conflit

* Fix CI bug of test_yolov3 (#21)

* fill_any_like kernel refactor

* remove useless code of full_like c++ api

* Support Scalar in Tensor Compute Library

* add scalar in dygraph and static graph mode

* keep the basic type for attr, instead of using scalar for all

* merge the code

* start refactor matmul

* move cpu, cuda and other device modules into kernels

* merge code

* polish code in operator.cc

* Fix CI bug of test_yolov3

* add the tensor base class, test=develop (#17)

* update the tensor base class, test=develop

* remove two funcs, test=develop

* update the error msg, test=develop
Co-authored-by: NChen Weihang <chenweihang@baidu.com>

* [no-verify] commit backend and tensor signature changes

* Rename tcmpt to pten (#23)

* rename tcmpt to pten

* update omitted files for rename to pten

* update omitted file for rename to pten

* remove k of all enum var

* remove kernel_instantiate (#26)

* remove symbols and spatial_tensor

* change common to functions

* readd share tensor impl methods

* add a candidate dense tensor class, test=develop (#28)

* change all Pt to Pten

* resolve conflit with xiaowei

* Op2functor opt1 (#27)

* replace to small vector and change to const &

* add std::move
Co-authored-by: NChen Weihang <chenweihang@baidu.com>

* polish kernel factory and kernel registry

* fix operator test error msg mismatch

* remove tensor signature and backend set member

* move scalar and polish enforce

* revert dtype layout change to fix error

* fix enum operator override error

* add several base unittests

* add pten utils tests

* polish some details

* Dev/op2func refactor 3 (#30)

* add a candidate dense tensor class, test=develop

* remove TensorBase::backend(), test=develop

* remove some ops, test=develop

* cherry-pick the pr of tensor meta, test=develop

* moves the dense tensor and some ops, test=develop

* update the linalg operator, test=develop

* update other operators, test=develop

* fix errors, test=develop

* fix bugs, test=develop

* try to resolve the problem of windows ci, test=develop

* updates codes, test=develop

* fix the tensor_utils.cc, test=develop

* modify the dense tensor, test=develop

* fix the data type, test=develop
Co-authored-by: Nshixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

* polish some details

* polish kernel signature details

* fix a bug about offsets of the tensor, test=develop (#31)
Co-authored-by: Nshixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>

* polish some details
Co-authored-by: Nchentianyu03 <ctychentianyu@gmail.com>
Co-authored-by: Nzyfncg <1370305206@qq.com>
Co-authored-by: NYuanRisheng <yuanrisheng@baidu.com>
Co-authored-by: N石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>

b9fdd3bc

24 4月, 2021 1 次提交
- W
  
  refator paddle inference c api.test=develop (#32225) · 18d3e2ca
  由 winter-wang 提交于 4月 24, 2021
  
  18d3e2ca
21 4月, 2021 1 次提交

【NPU】Merge NPU ccl code (#32381) · c3158527

由 zhang wenhui 提交于 4月 21, 2021

* add allreduce and broadcast without test (#31024)

add allreduce and broadcast without test

* Refactor HCCLCommContext to be compatible with Paddle (#31359)

Refactor HCCLCommContext to be compatible with Paddle (#31359)

* [NPU] add npu kernel for communication op (#31437)

* add allreduce and broadcast without test

* add c_broadcast_test case

* build c_comm_init and c_create_group operators

* make the whole thing compile

* add broadcast and init op test case but run failed

* make unit test compile

* fix broadcast test bug and change into hcom for ccl

* change c_comm_init and c_create_group ops accordingly

* make tests compile

* transfer code to 27

* compiled successfully in 28, but run failed

* test broadcast in 28, but failed

* make hcom primitives work

* change hccl data type for base.h

* fix broadcast bug

* make attributes work

* fix group name bug

* add allreduce but test failed

* allreduce bug for qiuliang

* allreduce finished

* add allgather and reducescatter

* merge all op code

* add allgather test

* finish run all ccl op test exclude send/recv

* all all op and test exclude send/recv

* send_v2_npu.cc recv_v2_npiu.cc compiled

* fix ccl core dump bug and test allgather, reducescatter, broadcast op

* fix allreduce bug just for test

* hcom send&recv test pass, without hcom_destroy

* for qiuliang test

* Ascend Send&Recv Test Pass

* all op (ex send/recv) ok

* fix bug

* merge all ccl op

* style merge to PaddlePaddle

* merge style

* new merge style

* merge style 2

* insert an empty at the end

* disable ctest for hcom to pass ci
Co-authored-by: Nvoid-main <voidmain1313113@gmail.com>
Co-authored-by: Nf2hkop <f2huestc@outlook.com>

* Add auto-increasing tag id for Hcom OPs (#31702)

* add c_reduce_sum op (#31793)

add c_reduce_sum op

* update Ascendrc hccl to 20.3 (#32126)

update Ascendrc hccl to 20.3 (#32126)

* fix merge code

* change cmake.txt1

* [NPU] Support npu kernel for c sync stream op (#31386)

* sync stream npu op

* add with_ascend_acl

* update c++ unittest

* compile all failed

* try to pre commit

* after pre commit

* merge&compile&test hccl successfully!

* fix code style

* fix code style

* fix bugs about hccl

* fix some bugs

* fix code style

* fix style

* fix style

* fix

* fixed

* merge develop
Co-authored-by: Nlw921014 <liuwei921014@yeah.net>
Co-authored-by: NVoid Main <voidmain1313113@gmail.com>
Co-authored-by: Nf2hkop <f2huestc@outlook.com>
Co-authored-by: Nxiayanming <41795079@qq.com>

c3158527

07 4月, 2021 1 次提交

【NPU】Merge ascend GE&distributed code by 0208 from ascendrc (#31957) · 8c7c53b3

由 zhang wenhui 提交于 4月 07, 2021

* Ascend rc (#30483)

* Fix compilcation on CANN20.1 and older (#30494)

Fix compilcation on CANN20.1 and older

* Add distribution supported (#30578)

Add distribution supported

* Build praser for Hcom* operators (#30627)

Build praser for Hcom* operators

* Pass device_ids info from launch to trainer. (#30632)

Pass device_ids info from launch to trainer

* Add Hccl program group (#30642)

Add Hccl program group

* Add startup bash files of test_ascend_group. (#30645)

Add startup bash files of test_ascend_group

* cleanup (#30646)

cleanup test_ascend_group.py

* [Feature] Build parser to support distributed training (#30658)

[Feature] Build parser to support distributed training

* fix compilation on ascend-20.1 (#30722)

fix compilation on ascend-20.1

* Dev/fix ascend string (#30749)

Dev/fix ascend string

* code style (#30781)

code style

* Merge ascend_optimizer and ascend_parser. (#30776)

Merge ascend_optimizer and ascend_parser.

* Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug  (#30797)

Ascendrc add converted op : [range/equal/range/uniform_random/expand/squeeze], fix cast op bug

* Add paddle ascend distribution training supported (#30796)

Add paddle ascend distribution training supported

* pass cxx_flags to gloo cmake (#30857)

* Destroy session first. (#30954)

Destroy session first.

* merge

* fix, test=develop

* fix, test=develop

* fix style, test=develop

* fix, test=develop

* fix

* fix log fatal, test=develop

* fix enforce style, test=develop

* fix, test=develop

* fix, test=develop

* fix rccl, test=develop

* fix test, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix node_num, test=develop

* fix ids str, test=develop

* fix ids str, test=develop

* fix ids str, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix, test=develop

* fix style code, test=develop

* fix style code, test=develop

* fix style code, test=develop

* fix style code, test=develop
Co-authored-by: Nhutuxian <hutuxian2011@sina.cn>
Co-authored-by: Ngongweibao <weibao.gong@gmail.com>
Co-authored-by: NVoid Main <voidmain1313113@gmail.com>
Co-authored-by: NLeo Chen <chenqiuliang@baidu.com>
Co-authored-by: Ndingsiyu <18369187719@163.com>
Co-authored-by: NOleNet <olenet@126.com>

8c7c53b3

18 1月, 2021 1 次提交
- H
  
  Ascend Framework Part2: pybind files (#30410) · e207fe63
  由 hutuxian 提交于 1月 18, 2021
  
  e207fe63
25 2月, 2020 1 次提交

PaddleBox Framework Part2 (#22466) · 175954d8

由 hutuxian 提交于 2月 25, 2020

* Add two types of Metric Calculator: MultiTaskCalculator & CmatchRankCalculator.
* Add a config for DynamicAdjustChannelNum function to denote whether we will discard the remaining instances when they are not be distributed evenly.
* Remove CPU code in Pull/PushSparse and we will add it back when testing it fully.
* Fix some known issues: such as copying persistable vars after one epoch running.

175954d8

31 8月, 2019 1 次提交

Paddlebox Framework (#18982) · c756b5d2

由 hutuxian 提交于 8月 31, 2019

* Support looking up embeddings from BoxPS.
* Add a _pull_box_sparse op, for now this op is not exposed to users.
* Add a BoxHelper class, providing 'BeginPass', 'EndPass', 'FeedPass' functions and so on.
* Add 'BoxPSDataset' in python code.
* Add a compile options WITH_BOX_PS and a MACRO PADDLE_WITH_BOX_PS.
* Add UT.
* More concrete information pls refer to: https://github.com/PaddlePaddle/Paddle/pull/18982

c756b5d2

29 3月, 2019 1 次提交
- X
  
  add DataSet and InMemoryDataFeed, support load data into memory and shuffle data · 824b84d1
  由 xjqbest 提交于 3月 06, 2019
  
  824b84d1

PaddlePaddle / Paddle 大约 1 年 前同步成功

PaddlePaddle / Paddle
大约 1 年前同步成功