Commits · e3f68f426cbd38e1bfe344730c54a441007cf609 · BaiXuePrincess / Paddle

08 Dec, 2021 · 1 commit
- implementation of broadcast sub backward by reduce (#37754) · 567e6bbc
  Committed by crystal on Dec 08, 2021
```
* add boardcast_sub

* add boardcast_sub
```
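The commit above implements the backward pass of broadcast subtraction using a reduction. The identity behind this: for `out = x - y` with broadcasting, `dx` is `dout` summed over every axis along which `x` was broadcast, and `dy` is the negation of `dout` summed over every axis along which `y` was broadcast. The NumPy sketch below illustrates only that identity. It is not Paddle's implementation, and `reduce_to_shape` and `sub_backward` are hypothetical helper names.

```python
import numpy as np

def reduce_to_shape(grad, shape):
    """Sum `grad` over the axes that broadcasting expanded, so the result has `shape`.
    Hypothetical helper illustrating the math, not Paddle code."""
    # Leading axes that broadcasting prepended are summed away entirely.
    extra = grad.ndim - len(shape)
    if extra > 0:
        grad = grad.sum(axis=tuple(range(extra)))
    # Axes where the input had size 1 but the output did not are summed, keeping the dim.
    axes = tuple(i for i, d in enumerate(shape) if d == 1 and grad.shape[i] != 1)
    if axes:
        grad = grad.sum(axis=axes, keepdims=True)
    return grad

def sub_backward(dout, x_shape, y_shape):
    """Gradients of out = x - y under NumPy-style broadcasting (hypothetical helper)."""
    dx = reduce_to_shape(dout, x_shape)    # d(out)/dx = +1
    dy = -reduce_to_shape(dout, y_shape)   # d(out)/dy = -1
    return dx, dy

dout = np.arange(6, dtype=float).reshape(2, 3)
dx, dy = sub_backward(dout, (2, 3), (3,))
print(dx.shape, dy)  # (2, 3) [-3. -5. -7.]
```

Summing over broadcast axes is the adjoint of broadcasting: each element of `y` was copied into several output positions, so its gradient collects the contributions from all of them.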
03 Dec, 2021 · 1 commit
- refine structure for cuda and rocm (#37202) · a6d2fddb
  Committed by ronnywang on Dec 03, 2021
```
* refine structure for cuda and rocm

* update

* update

* update

* update
```
29 Nov, 2021 · 2 commits

[Pten] Add reduce mean kernel, replace with mean API (#37559) · f9e9fd19

Committed by chentianyu03 on Nov 29, 2021

* add pten reduce kernel

* add reduce_sum kernel

* update attribute args and order

* make out dtype undefined

* fix empty input error

* merge develop branch

* rename sum as reduce function

* rename sum as reduce function

* fix reducekernelImpl args error

* add reduce cuda kernel

* modify dims type to const &

* remove unsed log

* fix reduce_all out eigen function error

* remove unused codes

* add the missing sum api define and testcase

* merge develop branch

* fix sum test axis value error

* replace pten mean kernel with reduce_mean

* revcover meam cuda to original implement



Add third batch of deprecated mkldnn namespace name changes (#37558) · 1ba81500
Committed by piotrekobiIntel on Nov 29, 2021


27 Nov, 2021 · 1 commit

[NPU] reorganization for device API abstraction (#37110) · 72241a6a

Committed by Aganlengzi on Nov 27, 2021

* [NPU] reorganization for device API abstraction

* [NPU] delete old files

* [NPU] fix npu_collective_helper

* [NPU] fix collective_helper

* [NPU] fix ut

* [NPU] mod memory allocation and hccl_helper

* [NPU] fix place_type

* [NPU] split enfoce.h

* move acl* call into npu_info

* merge conflict

* fix merge

* merge conflict

* merge conflict


23 Nov, 2021 · 1 commit
- [XPU] Reorganize xpu device codes in platform, test=develop (#37428) · 79800978
  Committed by Qi Li on Nov 23, 2021
```
* [XPU] Reorganize xpu device codes in platform, test=develop

* fix xpu_header.h, test=develop
```
17 Nov, 2021 · 1 commit
- Modify reduce_op.op.h for xpu2 with kernel primitive api (#36904) · 9c5d5665
  Committed by niuliling123 on Nov 17, 2021
```
* Modify reduce_op.op.h for xpu2 with kernel primitive api
```
28 Oct, 2021 · 1 commit

[NPU] Add int64 supporting for expand_v2, reduce_max, scale and tests (#36582) · c038cc7a

Committed by ronnywang on Oct 28, 2021

* add TypeAdapter method for npu_op_runner

* add int64 supporting for elementwise_mul and reduce_sum

* add int64 supporting and UT for expand_v2, scale and reduce_max

* fix bug


26 Oct, 2021 · 1 commit

[NPU] fix argsort op, test=develop (#36576) · 3523bbe8

Committed by Qi Li on Oct 26, 2021

* [NPU] fix argsort op, test=develop

* remove debug files, test=develop

* fix typo, test=develop

* address review comments, test=develop


21 Oct, 2021 · 1 commit

Fix a bug in ReadData, ReadDataBc and ReadDataReduce when NX != 1 (#36373) · 921c0917

Committed by niuliling123 on Oct 21, 2021

* Update the implement of reduceAnyKernel according to kernel primitive api
* Fix a bug in ReadData, ReadDataBc and ReadDataReduce when NX != 1


18 Oct, 2021 · 1 commit
- [XPU AMP] 1. xpu support gradient acc 2. xpu support create tensor in dygraph... · d19a9b39
  Committed by taixiurong on Oct 18, 2021
```
[XPU AMP] 1. xpu support gradient acc 2. xpu support create tensor in dygraph 3. xpu support update weight params in amp (#36439)
```
28 Sep, 2021 · 1 commit
- fix bug of reduce_sum when src_dtype != dst_dtype and reduce_num == 1 (#36123) · d5268a6e
  Committed by Guoxia Wang on Sep 28, 2021
18 Sep, 2021 · 1 commit

[oneDNN] Disable caching of Reorder operation (#35664) · e4c2a854

Committed by Jacek Czaja on Sep 18, 2021

* - REorder disabling caching

* - compilation fix

* - another compilation fix

* - another compilation fix

* - compilation fix

* - Fix

* - yet another compilation fix

* - suppresingly another compilation fix

* - lint

* - fix after review

* - fix


08 Sep, 2021 · 2 commits
- Modify the reduce op according to the kernel primitive api (#35282) · 82b33be3
  Committed by niuliling123 on Sep 08, 2021
- Add op define extra for norm and frobenius norm op. (#35329) · 3dab2e20
  Committed by Zhong Hui on Sep 08, 2021
26 Aug, 2021 · 1 commit

[oneDNN] disable caching oneDNN primitives in matmul v2, Reduce grad and... · 31f0221f

Committed by Jacek Czaja on Aug 26, 2021

[oneDNN] disable caching oneDNN primitives in  matmul v2, Reduce grad and elementwise_add grad, expand_v2 (#35132)

* - grad caching disabled of matmul_v1

- compilation fix

- compilation fix

* - reduction removed

* - Matmul v2 disabled caching

* Draft of further changes

* - workaround for reducegrad

* - fixes to UT

* - fix to compilation

* - another fix

* - fix


17 Aug, 2021 · 1 commit
- fix a bug in nlp: text_matching/sentence_transformers when last dim is 1 and... · 181f7cec
  Committed by niuliling123 on Aug 17, 2021
```
fix a bug in nlp: text_matching/sentence_transformers when last dim is 1 and reduce mid dim (#34941)
```
11 Aug, 2021 · 2 commits
- [NPU] add reduce_mean_op_npu and test (#34053) · f6fab559
  Committed by ronnywang on Aug 11, 2021
```
* add reduce_mean_op_npu and test

* remove skip.If

* update
```
- modified reduce_sum_op and reduce_mean_op for higher_performance (#32885) · 6a9fac14
  Committed by niuliling123 on Aug 11, 2021
06 Aug, 2021 · 1 commit

[NPU]add reduce_prod (#34182) · 47d81b09

Committed by furnace on Aug 06, 2021

* [NPU] add reduce_prod

* [NPU] delete check_dygraph=False

* [NPU] delete skipIf

* add attrs support or check

* [NPU] delete extra codes for test_reduce_max_op_npu

* [NPU] add attr out_dtype


05 Aug, 2021 · 2 commits

New executor dev (#34407) · 012d12b5

Committed by hong on Aug 05, 2021

* first test version

* add test exec;

* add data transfer; test=develop

* add new exec head;

* add memcpy; test=develop

* add python fetch

* add new test

* add graph node; test=develop

* remove useless new executor test; test=develop

* remove gperf dependency; test=develop

* fix compile bugs; test=develop

* remove useless code; test=develop

* remove useless code; test=develop

* add uni test; test=develop

* polish code; test=develop

* polish code; test=develop

* add interpreter cmakefile; test=develop

* remove useless code; test=develop



Support Ternary ops in elmentwise and broadcast (#33976) · 1d7b75dd
Committed by limingshu on Aug 05, 2021

03 Aug, 2021 · 1 commit
- support Kunlun2 (#34459) · 2d0f3d9b
  Committed by QingshuChen on Aug 03, 2021
```
* support Kunlun2

* support KL2

* support KL2
```
02 Aug, 2021 · 2 commits
- Unify the block/grid strategy and implementation of ReduceLastDim and ReduceAny (#34436) · c7cc5ac2
  Committed by Zhang Zheng on Aug 02, 2021
- [NPU] add reduce_max (#34179) · de53f2bf
  Committed by furnace on Aug 02, 2021
```
* [NPU] add reduce_max

* [NPU] delete skipIf

* [NPU] add atrrs support or check

* [NPU] add attr out_dtype

* [NPU] delete debug codes
```
30 Jul, 2021 · 1 commit

Added expand_v2 BF16/FP32 FWD/BWD kernels (#34284) · 41c4f723

Committed by jakpiase on Jul 30, 2021

* added expand_v2 bf16/fp32 kernel

* minor change

* CI fix

* added missing test file

* added formatting

* reduced binary size

* CI fix


12 Jul, 2021 · 1 commit
- optimize perfermance of multiple-dimension reduce (#33761) · 2dde0eb0
  Committed by Zhang Zheng on Jul 12, 2021
05 Jul, 2021 · 1 commit
- Reduce build time by deleting the template param BlockDim (#33901) · 7a476608
  Committed by Zhang Zheng on Jul 05, 2021
02 Jul, 2021 · 1 commit
- modified reduce_all_op reduce_any_op for higher performance (#33267) · 9b48199a
  Committed by niuliling123 on Jul 02, 2021
22 Jun, 2021 · 1 commit
- modified reduce_max, reduce_min, reduce_prod to higher_performance implementation. (#32974) · 480b284c
  Committed by niuliling123 on Jun 22, 2021
15 Jun, 2021 · 1 commit

Support reduce_sum_op float16 (#32966) · 606939de

Committed by jiangcheng on Jun 15, 2021

* add reduce_sum_op by add self-kernel

* set all ReduceKernel MPType for accuracy

* add float16 test script which input is integer number

* solve reduce sum float16 check_grad problem

* solve conflict and change test script for CI

* change kernel register for CI

* remove all useless template

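The "set all ReduceKernel MPType for accuracy" item concerns accumulation precision. Assuming `MPType` denotes a wider accumulator type (for example float32 when the input is float16), the NumPy sketch below shows why it matters: summing 4096 float16 ones sequentially in a float16 accumulator stalls at 2048, because above 2048 adjacent float16 values are 2 apart and 2048 + 1 rounds back to 2048 (round half to even). Accumulating in float32 and casting once at the end gives 4096. `sum_fp16` is a hypothetical helper, not a Paddle API.

```python
import numpy as np

def sum_fp16(values, acc_dtype):
    """Sequentially sum float16 `values` in an accumulator of type `acc_dtype`,
    then cast the result back to float16 (hypothetical helper, not Paddle code)."""
    acc = acc_dtype(0)
    for v in values:
        # Each partial sum is rounded to acc_dtype before the next addition.
        acc = acc_dtype(acc + acc_dtype(v))
    return np.float16(acc)

values = [np.float16(1.0)] * 4096
print(sum_fp16(values, np.float16))  # 2048.0: float16 accumulator stalls
print(sum_fp16(values, np.float32))  # 4096.0: float32 accumulator is exact
```

An explicit loop is used on purpose: `np.sum` uses pairwise summation internally, which would hide most of the error this example is meant to show.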

28 May, 2021 · 2 commits
- modify to complex template types for fill_constant op (#33179) · 1187c610
  Committed by chentianyu03 on May 28, 2021
```
* modify to complex template types for fill_constant op

* modify to complex template types for py_layer, strided_slice and reduce_sum_op.part
```
- modify to complex template types in reduce_sum OP and rewrite it's IdentityFunctor struct (#33164) · 5756d3e5
  Committed by chentianyu03 on May 28, 2021
26 May, 2021 · 1 commit

[NPU] refine NpuOpRunner (#32869) · 8259d9bf

Committed by Leo Chen on May 26, 2021

* refine ~npuOpRunner

* implement destructor and forbid copy

* use reference to avoid copy

* use const reference

* relax adam precision

* fix top_k

25 May, 2021 · 1 commit
- Add a new high performance framework for reduce ops (#32697) · 88b43b51
  Committed by niuliling123 on May 25, 2021
20 May, 2021 · 1 commit

fix gather op and add logsumexp op on kunlun (#32931) · a96e8bc9

Committed by TTerror on May 20, 2021

* fix gather op and add logsumexp op on kunlun

* update xpu depence

* update tests and fix elementwise_add

18 May, 2021 · 1 commit
- add unit8 for concat (#32850) · 53580bb4
  Committed by liuyuhui on May 18, 2021
30 Apr, 2021 · 1 commit
- Reduce grad fix (#32592) · 43527a2b
  Committed by jakpiase on Apr 30, 2021
21 Apr, 2021 · 1 commit
- Added oneDNN reduce_op GRAD kernel (#32280) · ead83422
  Committed by jakpiase on Apr 21, 2021
19 Apr, 2021 · 1 commit

[NPU] cherry-pick gc/dataloader/save&load/optimization from ascendrc to develop (#32294) · cbe5c9f8

Committed by Leo Chen on Apr 19, 2021

* [NPU] support GarbageCollector for npu (#31874)

* support GarbageCollector for npu

* fix typo

* fix gather_grad

* disable NPUDefaultStreamGarbageCollector on NPU

* [NPU] support npu for memcpy op (#31808)

* support npu for memcpy op

* add ut

* fix ut

* fix typo

* 【NPU】fix bug of using temp vector (#31963)

* fix bug when beta1_pow on cpu (#31995)

* [NPU] support npu profiler (#31684)

* support npu profiler

* add python api

* fix bugs

* add wrapper for incomplete type

* update profile proto

* record npu wait

* add xpu placeholder

* fix adam (#32016)

* [NPU] enable async copy and  add wait before sync operation (#31956)

* enable async copy and  add wait before sync operation

* remove unneccessary wait

* add FillNpuTensorWithConstant

* refine

* fix fill_constant

* make TensorFromVector/TensorToVector sync

* [NPU] Support dataloader on npu place. (#31867)

* [NPU] Wait on NPUPlace (#32086)

* [NPU] fix cast op (#32121)

* fix npu kernel of cast op to handle casting to same dtype

* add comments

* [NPU] support cann 20.3 (#32044)

* fix compile problem on cann 20.3

* fix ut

* fix test_mul

* fix check_finite_and_scale

* fix lookup_table_v2_grad

* fix cmake

* support print op

* [NPU] Support npu save load (#31893)

* support save load for NPU

* add save load npu unittest

* support np.array transform in NPU

* fix errors

* delete dygraph in unittest

* add Wait

* fix unittest

* fix review comment

* fix unittest problem

* fix little problem

* change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance (#32196)

* change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performace

* refine code

* fix NPUDeviceContext in all c++ unittest (#32198)

* fix NPUDeviceContext in all c++ unittest

* refine log
Co-authored-by: Npangyoki <pangyoki@126.com>

* [NPU] Remove TensorFromVector and avoid sync copy in npu op kernel for better performance (#31994)

* enable async copy and  add wait before sync operation

* remove unneccessary wait

* add FillNpuTensorWithConstant

* refine

* fix fill_constant

* change TensorFromVector to FillNpuTensorWithConstant

* fix ignored api

* delete extra unittest

* fix little error

* fix update_loss_scaling_op_npu and check_finite_and_unscale_op_npu

* change TensorCopySync to TensorCopy

* delete useless Wait and add StreamWait

* fix npu_stream error

* fix check_finite_and_unscale_op_npu TensorCopy

* only save stream wait

* fix NPUDeviceContext in all c++ unittest

* delete wait
Co-authored-by: Nzhiqiu <chenqiuliang@baidu.com>

* delete useless unittest file (#32206)

* Fix op test (#32231)

* fix conditional block (#32243)

* fix adam bug again (#32246)

* fix compile

* fix ut

* fix ut
Co-authored-by: Nliym27 <33742067+liym27@users.noreply.github.com>
Co-authored-by: Npangyoki <pangyoki@126.com>


BaiXuePrincess / Paddle is up to date with the upstream project it was forked from
