Commits · fe214af2733fd7cb14c2adc6bca3251917472039 · PaddlePaddle / Paddle

13 Apr, 2022 4 commits
- [Phi] Support construct Scalar by using Non-CPU Tensosr (#41528) · fe214af2
  Committed by YuanRisheng on Apr 13, 2022
```
* support construct scalar using non-cpu tensor

* fix bugs when run unittest

* fix compile bugs

* fix bugs when run ci

* fix compile bugs

* fix bugs when move copy

* perfect unit test

* perfect unittest

* update according to comment

* add target dependency
```
  fe214af2
- concat and relu sopport FP16 in XPU, test=kunlun (#41631) · c4d5a77f
  Committed by zhangyikun02 on Apr 13, 2022
  
  c4d5a77f
- support bce_loss and bce_loss_grad in XPU, test=kunlun (#41610) · 468c1ad7
  Committed by zhangyikun02 on Apr 13, 2022
  
  468c1ad7
- Update sign op xpu (#41685) · a4d4c116
  Committed by houj04 on Apr 13, 2022
```
* update sign op on xpu. test=kunlun

* fix typo. test=kunlun
```
  a4d4c116
12 Apr, 2022 3 commits

[KP] Add Logical/compare/bitwise registry & UT (#40802) · 3749198e

Committed by Lijunhui on Apr 12, 2022

* init commit no push

* collect compile errors

* bitwise UT

* fix compile problem

* cancel comments

* restore miss deletion

* fix compilation

* fix UT

* NO stash in multiple branch at the same times

* fix error

* combine .cu from gpu and kps

* replace gpu by kps

* fix by Chen-weihang

* Revert "Fix kps compile error in Junhui logic compare bitwise"

* fix backend test

* rm comments
Co-authored-by: Chen Weihang <chenweihang@baidu.com>

3749198e

fix_paddle_numel_check (#41607) · 51cae7f7
Committed by JingZhuangzhuang on Apr 12, 2022
```
* fix_paddle_numel_check

* fix_paddle_numel_check
```
51cae7f7

Update Profiler (#41638) · c3e1d257
Committed by liutiexing on Apr 12, 2022

c3e1d257

11 Apr, 2022 2 commits
- fix dynamic flag bug on mac (#41571) · b026840a
  Committed by zhouweiwei2014 on Apr 11, 2022
  
  b026840a
- support more ops (#41421) · fc621dfe
  Committed by Allen Guo on Apr 11, 2022
  
  fc621dfe
09 Apr, 2022 1 commit

Autotune the workspace_size_limit in conv. (#40338) · b937cdc5

Committed by limingshu on Apr 09, 2022

* Using the maximum workspace_size of all algorithms to limit the workspace size in exhaustive search mode.

* Use the system cudaMalloc and cudaFree to allocate workspace during searching.

* Enable switch of two kinds of workspace setting methods.
Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>

b937cdc5

08 Apr, 2022 2 commits
- [ROCm] fix dcu error in device event base, test=develop (#41521) · 14dba636
  Committed by Qi Li on Apr 08, 2022
```
* [ROCm] fix dcu error in device event base, test=develop

* fix, test=develop
```
  14dba636
- xpu mul unittest *test=kunlun (#41140) · 770ce7cf
  Committed by taixiurong on Apr 08, 2022
  
  770ce7cf
07 Apr, 2022 5 commits
- remove FLAGS_use_curand and change all random op CUDA implementation (#41308) · 9714878c
  Committed by zhouweiwei2014 on Apr 07, 2022
  
  9714878c
- Fix dygraph record event position (#41445) · 8fba68d3
  Committed by chenjian on Apr 07, 2022
```
* no

* maintain old profiler

* fix old dygraph record event
```
  8fba68d3
- ignore some failed test for KL2 (#41342) · 81389c51
  Committed by QingshuChen on Apr 07, 2022
```
* ignore some failed test for KL2
*test=kunlun

* minor
*test=kunlun

* minor
*test=kunlun
```
  81389c51
- modify infer gpu memory strategy (#41427) · 56e72b20
  Committed by JingZhuangzhuang on Apr 07, 2022
```
* modify infer gpu memory strategy

* modify infer gpu memory strategy
```
  56e72b20
- Add GPU memory usage information in the print of profiler. (#41440) · 516160a4
  Committed by Yiqun Liu on Apr 07, 2022
```
* Add GPU memory usage information in the print of profiler.

* Add ifdef.
```
  516160a4
06 Apr, 2022 1 commit
- [IPU] remove paddle_ipu shared library (#41307) · 229e91bf
  Committed by Allen Guo on Apr 06, 2022
```
* remove paddle_ipu shared library

* fix unique_name
```
  229e91bf
03 Apr, 2022 1 commit

add maximum limit for grid of index_select (#41127) · af8d2482

Committed by FlyingQianMM on Apr 03, 2022

* limit grid dim for index select

* mv LimitGridDim into gpu_launch_config.h

* fix conflicts

* fix conflicts

* fix code style

* set block to 256

* fix grid setting

* set dtype of block_dim to unsigned int

af8d2482
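
The bullets of #41127 describe capping the CUDA grid size for index_select with the block size set to 256. The arithmetic behind such a cap can be sketched as follows. This is only an illustration: `limit_grid_dim`, `elements_per_thread`, and the cap value `max_grid_dim=65535` are hypothetical names and numbers chosen for this sketch, not the signature or limit used by `LimitGridDim` in gpu_launch_config.h, which this log does not show.

```python
def limit_grid_dim(numel: int, block_dim: int = 256, max_grid_dim: int = 65535) -> int:
    """Number of blocks needed to cover `numel` elements, capped at `max_grid_dim`.

    Hypothetical helper for illustration; the cap value is an assumption.
    """
    if numel <= 0:
        return 1
    # Ceiling division: blocks required if each thread handled one element.
    needed = (numel + block_dim - 1) // block_dim
    return min(needed, max_grid_dim)


def elements_per_thread(numel: int, block_dim: int = 256, max_grid_dim: int = 65535) -> int:
    """How many elements each thread must process once the grid is capped."""
    total_threads = limit_grid_dim(numel, block_dim, max_grid_dim) * block_dim
    return max(1, (numel + total_threads - 1) // total_threads)


if __name__ == "__main__":
    for n in (256, 257, 1000, 10**9):
        print(n, limit_grid_dim(n), elements_per_thread(n))
```

Once the grid is capped, a kernel can no longer assume one element per thread, so it typically iterates with a grid-stride loop; `elements_per_thread` shows how many iterations that loop would need.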

01 Apr, 2022 2 commits

[Eager] Support pinned (#41035) · f3270fc8

Committed by wanghuancoder on Apr 01, 2022

* support pinned, test=develop

* support async_write, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine,test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

f3270fc8

support multi_layer of bilstm,*test=kunlun (#41151) · 00d23897

Committed by z8hanghuan on Apr 01, 2022

* support multi_layer of bilstm,*test=kunlun

* support multi_layer of bilstm, *test=kunlun

* support multi_layer of bilstm, *test=kunlun

* support multi_layer of bilstm, *test=kunlun

00d23897

31 Mar, 2022 3 commits

[new-exec] fit mkldnn op (#41058) · 02cf6764

Committed by Leo Chen on Mar 31, 2022

* fix bug that some op has no op_role attr

* add mkldnn support for new executor

* fit for mkldnn data_transfer

* fit for mkldnn data_transfer

02cf6764

Maintain old profiler (#41132) · a6bf2218

Committed by chenjian on Mar 31, 2022

* no

* maintain old profiler

* exclude new python record events for old profiler

* maintain old profiler

* maintain

* maintain old profiler

* maintain

* fix cmakes

a6bf2218

Add time range duration display (#41029) · 6744754f

Committed by chenjian on Mar 31, 2022

* no

* fix bugs

* fix doc according to review

* fix api doc format

* fix api doc according to review

* fix bug and add unit test

* fix record event bug

* optimize chrome tracing display

* fix bug

* add comment

* add unit test

* fix a bug

* fix

* fix

* fix format

6744754f

30 Mar, 2022 3 commits

Add new APIs for GPU memory monitoring (max_memory_allocated,... · afe02e9d

Committed by From00 on Mar 30, 2022

Add new APIs for GPU memory monitoring (max_memory_allocated, max_memory_reserved, memory_allocated, memory_reserved) (#38657)

* Add new API memory_reserved

* Add memory_allocated, max_memory_reserved and max_memory_allocated

* Fix CI error

* Fix CI error

* Enhance UT

* Add FLAGS_memory_stats_opt

* Add STATS macro functions

* Add StatAllocator

* Fix CI errors

* Add UT

* Fix CI errors

afe02e9d
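
The four API names in #38657 pair a current value (`memory_allocated`, `memory_reserved`) with a running peak (`max_memory_allocated`, `max_memory_reserved`). A minimal sketch of that bookkeeping is shown below. The `MemoryStat` class and its methods are hypothetical and written only to illustrate the current/peak relationship; this is not the StatAllocator or the STATS macros added by the commit.

```python
class MemoryStat:
    """Tracks a current byte count and the highest value it has reached.

    Hypothetical illustration; not Paddle's implementation.
    """

    def __init__(self) -> None:
        self.current = 0  # bytes in use right now
        self.peak = 0     # largest value `current` has reached so far

    def update(self, delta: int) -> None:
        """Apply an allocation (positive delta) or a free (negative delta)."""
        self.current += delta
        if self.current > self.peak:
            self.peak = self.current


if __name__ == "__main__":
    allocated = MemoryStat()
    allocated.update(100)   # allocate 100 bytes
    allocated.update(50)    # allocate 50 more
    allocated.update(-100)  # free the first block
    print(allocated.current, allocated.peak)
```

Freeing memory lowers the current value but never the peak, which is why a "max" query can report more than the amount in use at the time of the call.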

add bilinear interpolate v2 to xpu list and unitteset, *test=kunlun (#41037) · 4e86dff2

Committed by ykkk2333 on Mar 30, 2022

* add bilinear interpolate v2 to xpu list and unitteset, *test=kunlun

* Delete ps_usr_print_log

* Delete ps_usr_print_log

* Delete xpu_op_test

4e86dff2

swish and pow op for xpu test=kunlun (#40654) · d951f3af

Committed by houj04 on Mar 30, 2022

* swish and pow op for xpu. test=kunlun

* fix code style. test=kunlun.

* use pow_grad xdnn api. test=kunlun.

d951f3af

29 Mar, 2022 1 commit
- softmax_with_cross_entropy support fp16 on xpu, test=kunlun (#40869) · 649948a6
  Committed by zhangyikun02 on Mar 29, 2022
  
  649948a6
28 Mar, 2022 1 commit

Fix profiler package bug (#40888) · 77a455c7

Committed by chenjian on Mar 28, 2022

* no

* fix bugs

* fix doc according to review

* fix api doc format

* fix api doc according to review

* fix bug and add unit test

* fix record event bug

77a455c7

27 Mar, 2022 1 commit

[new-exec] fit for mkldnn and inplace op (#40955) · afa0e82c

Committed by Leo Chen on Mar 27, 2022

* fit for mkldnn and inplace op

* fix compile

* refine ut

* register op version

* fix inplace op

* fix transfer_layout

afa0e82c

25 Mar, 2022 2 commits

[Phi] Migrate Adam and AdamW into Phi (#40351) · 56cd3407

Committed by Aurelius84 on Mar 25, 2022

* [Phi] Migrate Adam and Adamw into Phi

* fix compile error and unittest ok

* fix compile error and unittest ok

* fix undefined reference to fLI::FLAGS

* test depend on operator

* fix cmake

* fix xpu compile

* fix infrt

* fix amp_type_traits

* fix amp_type_traits

* modify according reviewer

* modify according reviewer

* fix dtype float16

* fix typo

* fix Cmake

* fix code style

56cd3407

add maximum limit for grid of reduce, elementwise, gather and scatter (#40813) · 608a5f55
Committed by FlyingQianMM on Mar 25, 2022
```
* add maximum limit for grid of reduce, elementwise and gather

* add {} after if
```
608a5f55

23 Mar, 2022 3 commits

[NPU] add npu support for conv3d and conv3d_grad (#38480) · ff568afa

Committed by furnace on Mar 23, 2022

* [NPU] add npu support for conv3d and conv3d_grad

* [NPU] delete failed unittests due to Ascend not support

* [NPU] delete debug codes

* [NPU] optimize codes, notest

* [NPU] remove const_cast

* [NPU] optimize for remove const_cast

* [NPU] fix written errors

ff568afa

Performance optimization for StreamSafeCudaAllocator (#40718) · d8bff988

Committed by From00 on Mar 23, 2022

* Performance optimize

* Optimize GetAllocator, RWLock and ProcessUnfreedAllocation

* Remove test file

* Fix CI error

* Fix CI errors

* Fix CI errors

d8bff988

Add profiler features (#40357) · c15e3823

Committed by chenjian on Mar 23, 2022

* add event record for model profiling

* fix format

* fix format

* fix code example bug

* no

* add profiler statistic

* add profiler feature

* fix bug

* fix bug

* fix bug

* fix bug

* required: gpu

* required: gpu

* fix bug

* required: gpu

* fix ci bug

* fix ci error

* fix ci error

* upgrade document

* fix doc

* fix ci bug

* add doc and fix bug

* nothing

* fix bug

* fix format bug

* modify format

* add deprecated description for old profiler

* fix bug

* fix bug

* fix

* add load_profiler_result doc

* add load_profiler_result doc

* add load_profiler_result doc

* help fix old profiler sample code

* add api doc

* fix format

* fix api doc

* fix api doc format

* fix api doc format

* fix api doc c format

* fix api doc format

c15e3823

21 Mar, 2022 4 commits

[Phi] Add phi device context pool (#40635) · 0e1191f4

Committed by Chen Weihang on Mar 21, 2022

* add phi device context pool

* change year

* fix compile error

* fix operator = error

* refine init impl

* polish details

* refine init impl

0e1191f4

conv2d support FP16 on xpu and update unittest for conv2d, test=kunlun (#40395) · 276017bb
Committed by zhangyikun02 on Mar 21, 2022

276017bb

[IPU] add more ops (#40691) · df3ae18a

Committed by Allen Guo on Mar 21, 2022

* add more ops

* add authors
Co-authored-by: Xiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: Allen Guo <alleng@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>
Co-authored-by: Han Zhao <hanzhao@graphcore.ai>

* rm ipu_strategy.check()

* fix UT fail

* fix typo
Co-authored-by: Xiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>
Co-authored-by: Han Zhao <hanzhao@graphcore.ai>

df3ae18a

[IPU] update ipu_backend (#40685) · d67fe921

Committed by Allen Guo on Mar 21, 2022

* sync changes

* copy sOpNamescope

* fix UTs

* add authors
Co-authored-by: Xiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: Allen Guo <alleng@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>
Co-authored-by: Han Zhao <hanzhao@graphcore.ai>

* fix code-format

* fix compile error

* add comments for feed_op
Co-authored-by: Xiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>
Co-authored-by: Han Zhao <hanzhao@graphcore.ai>

d67fe921

15 Mar, 2022 1 commit

oneDNN NHWC fixes (#40049) · dde9cec0

Committed by Jacek Czaja on Mar 15, 2022

* - Prototype of third solution

- fix

- compilation fixes

- fix

- fixe

- fix

- fix

- compilation fix

- comment fix

- lint

update mkldnn conv_elementwise_add_fuse_pass ut

- NHWC changes to prelu

- alhpa dims

- UT fix

- fix to UT

- lint

- Some fixes

- added to BWD of prelu NHWC support

- reverted removal of resetting cu_layout in clearing of caching

* - Small changes

* - compilation fix

* - fix

* - fix

* lint

* - fixes after internal review

* - compilation fix

* - lint

dde9cec0

PaddlePaddle / Paddle synced successfully about 1 year ago