Commits · 9056cc8b12faa4beb037dab1646ac2dc71428292 · BaiXuePrincess / Paddle

18 Jan, 2023 5 commits

[PHI] remove bitwise and, or, xor (#49916) · 9056cc8b

Committed by RuohengMa on Jan 18, 2023

* add reduce_sum_int64 and reduce_sum_int8 xpu kernels

* [PHI] add clip grad kernel with support type float32 and int32

* [PHI unittest] add clip_grad unit test

* adapt code to clang-format

* update xpu api output with clip_grad api

* remove int8 support of reduce_sum xpu kernel since it can not pass unit tests

* adapt license date, add code for XPUDataType conversion

* add int8 support of reduce_sum

* add reduce_sum unit tests for dtype int64, int8, and add more test cases

* update license date

* remove buggy bitwise and, or and xor xpu kernels, refine bitwise not xpu kernel

* change license date

9056cc8b

[XPU] add logical_not op. (#49911) · 60d1199a
Committed by houj04 on Jan 18, 2023

60d1199a

[0 Tensor support] support the 0d tensor for the cumsum (#49518) · 5fca45ea

Committed by wawltor on Jan 18, 2023

* Add the cumsum 0d tensor

* xpu and cpu judge the 0d tensor

* change to 2022 to 2023 in new commit

* fix the reverse logic

5fca45ea

[Zero-Dim] Fix bug in masked_select for XPU (#49904) · 1a8be158
Committed by Zhang Zheng on Jan 18, 2023

1a8be158

use default XPU stream for computing (#49806) · f6b23d6d

Committed by jameszhang on Jan 18, 2023

* revert to use default XPU stream for computing

XPUContext now has a null stream by default. If you want to use a separate stream
(e.g. in async collective communication), you should create a dedicated XPUContext
and invoke its XPUContext::CreateStream().

* minor

f6b23d6d

17 Jan, 2023 6 commits

Refine munmap freq for RefcountedMemoryMapAllocation (#49691) · 3fdc105f

Committed by zhangbo9674 on Jan 17, 2023

* refine munmap freq for ref_cnt_mmap_allocator

* add shm reuse logic

* fix compile bug

* fix compile bug

* fix bug of file refcount

* fix compile bug

* fix compile bug

* refine code for delete shm case

* polish code

* refine shm cache pool size setting logic

* set buffer is 2

* refine shm cache size logic

* refine max shm cache

* refine shm cache size

3fdc105f

[Zero-Dim] support input 0D Tensor for equal_all (#49845) · f287b1e9
Committed by yeliang2258 on Jan 17, 2023
```
* add zero dims test

* update code

* fix zero dims

* update code
```
f287b1e9

support CUDA Graph for new executor (#49708) · 8e5ed04d

Committed by pangyoki on Jan 17, 2023

* new exe supports CUDA Graph

* fix

* fix

* fix

* fix FLAGS_use_stream_safe_cuda_allocator in unittest

* insert output of coalesce_tensor op to skip_gc_var

* fix

8e5ed04d

[PHI]Change feed_op to phi kernel (#49116) · f7f1dc03

Committed by YuanRisheng on Jan 17, 2023

* change feed_op to phi kernel

* fix ci bugs

* fix build bugs

* fix ci bugs

* fix compile bugs

* fix ci bugs

* perfect code

* perfect comment code

* fix install bugs

* modify code according comment

* remove visitor in feed_op

* modify according comment

* perfect code according comment

* add infershape

* fix py3 bugs

* fix getexpected kernel type

* fix getexpected kernel type

* fix ci bugs

* add registry for custom device

* fix py3 bugs

* fix floating point error

* fix py3 test bugs

f7f1dc03

SetDevice when parse TensorBase (#49860) · 4c576870
Committed by HongyuJia on Jan 17, 2023

4c576870

【Prim】Add multiply,expand,div vjp rules (#49831) · 39c6765a

Committed by Xiaoxu Chen on Jan 17, 2023

* support elementwise base func

* fix compiling error and add test

* support vjp for div using comp

* remove additional change

* fix dy2st error with magic num

* fix dy magic num

* another magic

* another magic

* another magic

* add skip rename strategy

* support add vjp

* support add with new axis cal

* support sub vjp

* [prim] add multiply vjp rules

* [prim] add multiply vjp rules

* [prim] fix no infershape with composite in _append_backward_ops

* [prim] add expand vjp rule

* [prim] add exp vjp rule

* uncomment infer shape for reshape/sum static prim api

* [prim] fix tanh nullptr error

* remove some print message

* fix magic number in run_program relative tests @JiaBinYang

* [prim] add expand,multiply,exp vjp rules

* fix only support single direction reduce error

* infer reduce dims using out dims
Co-authored-by: NJiabinYang <360788950@qq.com>

39c6765a
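For readers unfamiliar with the term, a VJP (vector-Jacobian product) rule maps an upstream cotangent `g` back to gradients of the inputs. The sketch below illustrates the rules named in the commit above (multiply, divide, expand) for plain Python scalars. The function names are hypothetical and this is not Paddle's implementation, which also has to handle broadcasting and tensor shapes.

```python
# Illustrative scalar VJP rules. All names here are hypothetical helpers,
# not Paddle APIs; real rules also reduce over broadcast dimensions.

def multiply_vjp(x, y, g):
    """out = x * y. Returns (dx, dy) for the upstream cotangent g."""
    return g * y, g * x


def divide_vjp(x, y, g):
    """out = x / y. Returns (dx, dy) for the upstream cotangent g."""
    return g / y, -g * x / (y * y)


def expand_vjp(g_expanded):
    """out = [x, x, ..., x] (a scalar broadcast to n copies).

    Every copy depends on x, so dx is the sum of the incoming cotangents.
    """
    return sum(g_expanded)


def numeric_grad(f, x, eps=1e-6):
    """Central finite difference, used only to sanity-check the rules."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)
```

A rule can be checked against `numeric_grad`, for example by comparing `divide_vjp(6.0, 3.0, 1.0)[1]` with `numeric_grad(lambda y: 6.0 / y, 3.0)`.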

16 Jan, 2023 7 commits
- Support the 'data_transform' for generating static graph ops (#49772) · 28864137
  Committed by HappyHeavyRain on Jan 16, 2023
```
* support the 'data_transform' for generating static graph ops

* reset 'pow' code

* change the 'GetKernelTypeForVar'
```
  28864137
- CUDA12.0 integration (#49539) · 1885d55a
  Committed by zlsh80826 on Jan 16, 2023
```
* Update warpctc for cuda-12

* Deprecate cudaProfilerInitialize for CUDA > 11

* Deprecate CUSPARSE_MV_ALG_DEFAULT for CUDA_VERSION >= 11040

* Add the missing thrust header
```
  1885d55a
- [PHI] channel_shuffle add yaml (#49808) · 56dbe426
  Committed by Weilong Wu on Jan 16, 2023
  
  56dbe426
- add add_n for the 0d tensor (#49854) · 65b0181e
  Committed by wawltor on Jan 16, 2023
  
  65b0181e
- add prod for kunlun (#49816) · bd03652f
  Committed by QingshuChen on Jan 16, 2023
  
  bd03652f
- add sqrt_comp_grad composite rule (#49769) · 70378584
  Committed by zqw_1997 on Jan 16, 2023
  
  70378584
- 【prim】vjp for reduce sum (#49736) · 292f3f77
  Committed by xiaoguoguo626807 on Jan 16, 2023
  
  292f3f77
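The last two commits in this group concern gradient rules for `sqrt` and for a sum reduction. As a rough illustration of what such rules compute, here is a minimal scalar/list sketch in plain Python. The helper names are assumptions made for this example and do not come from the Paddle source.

```python
import math


def sqrt_vjp(x, g):
    """out = sqrt(x). d(out)/dx = 1 / (2 * sqrt(x)), so dx = g / (2 * out)."""
    out = math.sqrt(x)
    return g / (2.0 * out)


def sum_vjp(xs, g):
    """out = sum(xs). Each element contributes with weight 1,
    so every input position receives the same cotangent g."""
    return [g for _ in xs]
```

Expressing the `sqrt` gradient through the forward output `out` is what lets a composite rule be built from simpler primitives (a multiply and a divide) instead of a dedicated backward kernel.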
13 Jan, 2023 16 commits
- [Phi] heaviside add yaml (#49807) · 4b7aeba4
  Committed by Weilong Wu on Jan 13, 2023
  
  4b7aeba4
- Move GeneralDivMod from segmented_array.h to fast_divmod.h (#48934) · ad4824e5
  Committed by limingshu on Jan 13, 2023
```
* first commit

* add some changes in stack kernel.

* move the location of GeneralDivMod

* fix code format error according to ci
```
  ad4824e5
- New feature: add register composite rule of ops (#49605) · 6ed8221a
  Committed by cyber-pioneer on Jan 13, 2023
  
  6ed8221a
- [Zero-Dim] add where, atan2, median 0-Dim ut (#49692) · 1508cae7
  Committed by ronnywang on Jan 13, 2023
```
* add where, atan2, median 0d ut

* add where, atan2, median 0d ut

* update

* update

* update
```
  1508cae7
- [Custom Device] Clear ProcessGroup Manually (#49182) · a923a757
  Committed by duanyanhui on Jan 13, 2023
```
* clear ProcessGroupCustom manually

* fix bug

* fix bug

* move destroy ProcessGroup to ProcessGroupIdMap

* enable destroy to all device

* remove unused comments

* change to internal api

* Update process_group.cc

* Update process_group.cc
```
  a923a757
- 【Prim】Support elementwise related VJP with primitives (#49784) · 561f9013
  Committed by Jiabin Yang on Jan 13, 2023
```
* support elementwise base func

* fix compiling error and add test

* remove additional param

* support vjp for div using comp

* remove additional change

* fix dy2st error with magic num

* fix dy magic num

* another magic

* another magic

* add more test

* fix windows problem

* another magic

* fix windows compile

* invoke ci

* add skip rename strategy

* support add vjp

* fix test_tanh

* support add with new axis cal

* fix resnet and some test

* add composite log

* support sub vjp
```
  561f9013
- kunlun add support for c_concat and c_split (#49757) · a09b9a3f
  Committed by jameszhang on Jan 13, 2023
```
* kunlun add support for c_concat and c_split

* replace mutable_data() and ShareDataWith()
```
  a09b9a3f
- add xpu adagrad and where_grad kernels (#49701) · a99c3cd4
  Committed by ykkk2333 on Jan 13, 2023
  
  a99c3cd4
- fix xpu unittest issue (#49760) · ddc8a726
  Committed by jameszhang on Jan 13, 2023
```
* fix xpu unittest issue: zero_dim_tensor

* deal with leftout issue introduced by #49470
```
  ddc8a726
- Add unitest for set_value, set_value_grad. test=kunlun (#49773) · 5e722245
  Committed by Leo Guo on Jan 13, 2023
  
  5e722245
- add prelu & prelu_grad op for xpu (#49672) · 8d512b8f
  Committed by wangshengxiang on Jan 13, 2023
  
  8d512b8f
- Generate static graph code of stack, unbind, unique_consecutive op (#49726) · ac9debee
  Committed by zyfncg on Jan 13, 2023
```
* generate static graph code of stack, unbind, unique_consecutive op

* fix bug
```
  ac9debee
- [cpplint fix] under ps (#49759) · d5c5bbc3
  Committed by wangzhen38 on Jan 13, 2023
```
* [cpplint fix] under ps
```
  d5c5bbc3
- [PHI] rrelu add yaml (#49779) · 8447f876
  Committed by Weilong Wu on Jan 13, 2023
```
* [PHI] rrelu add yaml

* polish

* polish
```
  8447f876
- Update threshold of bn1d (#49734) · 0294ab41
  Committed by zhangkaihuo on Jan 13, 2023
  
  0294ab41
- fix fc and fused_fc_elementwise_layernorm kernel diff (#49778) · 0b24d167
  Committed by Yuanle Liu on Jan 13, 2023
  
  0b24d167
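One commit in the group above relocates `GeneralDivMod` into `fast_divmod.h`. The log does not show the code, so the following is only a sketch of the classic multiply-and-shift technique for dividing by a divisor that stays fixed across many calls, which is the general idea behind "fast divmod" helpers. The function names and the 32-bit unsigned range are assumptions for this example, not details taken from the Paddle header.

```python
def make_fast_divmod(d):
    """Precompute (shift, multiplier) for a fixed divisor 1 <= d < 2**32."""
    if not 1 <= d < (1 << 32):
        raise ValueError("divisor must be in [1, 2**32)")
    shift = (d - 1).bit_length()  # equals ceil(log2(d)); 0 when d == 1
    multiplier = ((1 << 32) * ((1 << shift) - d)) // d + 1
    return shift, multiplier


def fast_divmod(n, d, shift, multiplier):
    """Return (n // d, n % d) for 0 <= n < 2**32 without a division.

    On hardware, the first line is a single 'multiply high' instruction.
    """
    t = (n * multiplier) >> 32   # high 32 bits of the 64-bit product
    q = (t + n) >> shift         # quotient
    return q, n - q * d          # remainder from one multiply and subtract
```

The precomputation runs once per divisor (for example once per tensor dimension), after which each index calculation costs a multiply, an add, and a shift instead of an integer division.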
12 Jan, 2023 6 commits
- lerp support 0 Tensor (#49667) · 8cd0d5b3
  Committed by sunli on Jan 12, 2023
```
* lerp support 0 Tensor

* fix lerp grad

* fix lerp zero test

* fix 0D + ND/ND + 0D

* fix check

* update code

* fix lerp infer shape

* static backward test

* update static graph test
```
  8cd0d5b3
- Migrate collective communication checks to PHI (#49754) · c24e7fe1
  Committed by Wen Sun on Jan 12, 2023
```
* refactor: migrate comm checks

* refactor: add check in comm context

* feat: add gloo static check

* refactor: add place param in static check
```
  c24e7fe1
- deal with conflict (#49766) · 27aec62b
  Committed by YuanRisheng on Jan 12, 2023
  
  27aec62b
- fix_split (#49743) · 3fb4a08c
  Committed by xiaoxiaohehe001 on Jan 12, 2023
  
  3fb4a08c
- Fix the bugs of set_value and set_value_grad ops and add register in (#49750) · 438975fd
  Committed by Leo Guo on Jan 12, 2023
```
xpu2_op_list.cc. test=kunlun
```
  438975fd
- [PHI]Rename some PHI Kernel (#49470) · 30f5e39b
  Committed by YuanRisheng on Jan 12, 2023
```
* rename kernel

* delete sig

* modify code according comment

* fix ci bugs
```
  30f5e39b

BaiXuePrincess / Paddle is in sync with the fork's source project