Commits · cb0b53cbbc1c52643238a48e3c2dca873c4d169b · 机器未来 / Paddle

Aug 25, 2022 · 10 commits

[OpAttr]min/max of uniform_random support Tensor type (#45417) · c8955d0d
Committed by Aurelius84 on Aug 25, 2022
```
* [OpAttr]min/max of Uniform_rand support Tensor type

* fix typo
```

Transfer memcpy d2h from fluid to phi (#45150) · 0d14e74a
Committed by kangguangli on Aug 25, 2022
```
* transfer memcpy_d2h from fluid to phi

* refine arg check and add comment

* fix cannot fallback to phi kernel

* fix gpu_context host alloc when tensor size = 0

* add kernel for std::vector<DenseTensor> args

* fix bugs in MemcpyD2HMultiIOKernel

* remove useless header file

* polish format

* fix typo

* add testcase for cudapinned place

* refine check condition in test

* polish error message

* polish error message

* remove header in fluid directory

* merge memcpy_h2d and memcpy_d2h into one file, change register method to simplify implementation

* fix code style check
```

[NPU] add run_program_op_npu (#45349) · 64afa638
Committed by ronnywang on Aug 25, 2022
```
* [NPU] add run_program_op_npu

* add run_program_op_npu ut
```

make full_like support double_max in dygraph (#45385) · edd66f2e
Committed by Sing_chan on Aug 25, 2022
```
* make full_like support double_max in dygraph

* fix bug
```

optimize conv algo cache (#41891) · 1cd7e68b
Committed by hong on Aug 25, 2022
```
* optimize conv algo speed

* code polish

* remove useless code

* fix compile error

* fix cpu compile error

* not use cudnn alog t

* add search cache max number

* polish code

* fix cache test bug

* add groups data format to conv args

* fix cache test bug

* fix cudnn_deterministic bug

* fix test switch auto tune bug

* fix test switch autotune bug;

* fix conv cache bug

* fix cache test error

* fix cache test bug

* fix windows mac compile error

* fix workspace search error

* update cudnn cache

* fix cache test bug; test=develop

* fix autotune switch test error

* polish code

* polish code
```

[triu_indices] add triu_indices_op (#45168) · a410c397
Committed by Rayman on Aug 25, 2022

fix params sync multi times problem (#45406) · 20d38664
Committed by Wilber on Aug 25, 2022

fix roi_align_op_npu to pass the unittest (#45310) · 256bf6ff
Committed by USTCKAY on Aug 25, 2022

add temporal shift and grad *test=kunlun (#45300) · 63d9a175
Committed by haosicheng on Aug 25, 2022

enforce_reshape (#45386) · 0bf40070
Committed by zhoutianzi666 on Aug 25, 2022

Aug 24, 2022 · 13 commits
- fix mean/variance shape infer bug during loop call of dynamic trt enqueue (#45387) · 4e3f0b95
  Committed by Wang Bojun on Aug 24, 2022
  ```
  * fix bug fix
  ```
- Solve the random state serialization (#45327) · 73e41c89
  Committed by ShenLiang on Aug 24, 2022
  ```
  * fix utest

  * fix utest

  * fix utest

  * fix log

  * fix random utest
  ```
- make tensor_util contains no cuda code (#45256) · 78916a7a
  Committed by Leo Chen on Aug 24, 2022
  ```
  * make tensor_util contains no cuda code

  * refine isfinite

  * revert ut

  * move isfinite function to its op

  * fix test

  * fix compile

  * std::isnan is not defined for int type on windows

  * fix windows compile

  * fix fp16

  * fix rocm compile

  * revert gradient node
  ```
- fix op_teller with_dynamic_shape judge bug (#45384) · 9e0baf6e
  Committed by Yuanle Liu on Aug 24, 2022
- fix argsort grad not fill zero (#45371) · c70d79a0
  Committed by Jiabin Yang on Aug 24, 2022
- [phi] Transfer merged_momentum yaml to phi (#45359) · 09acc860
  Committed by HongyuJia on Aug 24, 2022
  ```
  * add legacy_api.yaml

  * set merged_momentum inplace only

  * support inplace optional<vector<tensor>>

  * add dygraph_mode api

  * optimize TensorToConstDenseTensorPtr
  ```
- Adapt tensor axis for cumsum (#45372) · 7f49b9ba
  Committed by WangZhen on Aug 24, 2022
- conv_eltwiseadd_bn_fuse support fp16 (#45379) · 62b5452d
  Committed by Wilber on Aug 24, 2022
- Support fp16 of adam operator in xpu environment (#45292) · a012d426
  Committed by mengqingchun02 on Aug 24, 2022
  ```
  * support beam_search operator on xpu. test=kunlun

  * support beam_search operator on xpu. test=kunlun

  * support beam_search operator on xpu. test=kunlun

  * support beam_search operator on xpu. test=kunlun

  * support beam_search operator on xpu. test=kunlun

  * support fp16 of adam operator in xpu environment. test=kunlun

  * support fp16 of adam operator in xpu environment. test=kunlun

  * support fp16 of adam operator in xpu environment. test=kunlun
  ```
- [OpAttr]Adapt tensor minlength for bincount (#45342) · 12917c8c
  Committed by WangZhen on Aug 24, 2022
  ```
  * Adapt minlength attr for bincount
  ```
- trt input tensor issue (#45358) · 55393172
  Committed by wenbin on Aug 24, 2022
  ```
  * fix

  * optimize
  ```
- [MLU]: compile prior box CPU kernel when WITH_MLU is ON for SSD (#45238) · 596a9405
  Committed by zhaoying9105 on Aug 24, 2022
- fix convert weight failed. (#45346) · 3d514e48
  Committed by Wilber on Aug 24, 2022

Aug 23, 2022 · 9 commits
- print log while use new exe (#45335) · fac8a260
  Committed by pangyoki on Aug 23, 2022
- [AutoParallel] Add Quant Pass (#44877) · 61bc016c
  Committed by zhaoyingli on Aug 23, 2022
  ```
  * add quant pass
  ```
- [FleetExecutor] Using program to be the only interface of TaskNode (#43869) · 9ccdb5fa
  Committed by LiYuRio on Aug 23, 2022
- Delete the template parameter BLockSize in Kernel Primitive API (#45220) · 1a0cd447
  Committed by niuliling123 on Aug 23, 2022
- Update scope.h (#45270) · 60e072d3
  Committed by OccupyMars2025 on Aug 23, 2022
- Add store_barrier to prevent master exit (#44964) · 1734bc6f
  Committed by LiYuRio on Aug 23, 2022
- modify something unimportant when I read source code (#45273) · 5edc96e6
  Committed by OccupyMars2025 on Aug 23, 2022
  ```
  * Update scope.h

  * typo

  * Update dense_tensor.inl
  ```
- [Phi]Move distribute_fpn_proposals to PHI (#45212) · 8f8ed7de
  Committed by YuanRisheng on Aug 23, 2022
  ```
  * move distribute_fpn_proposals

  * fix some code

  * fix yaml bugs

  * add set dtype

  * move proposal_impl to funcs

  * fix compile bugs
  ```
- [CustomDevice] add profiler apis (#45130) · da51baf2
  Committed by ronnywang on Aug 23, 2022
  ```
  * [CustomDevice] add profiler apis

  * migrate CalculateEstOccupancy into cuda_tracer

  * update

  * add ut
  ```

Aug 22, 2022 · 7 commits
- Add int8 support for matmul+elementwise_add fuse pass (#45077) · 9e5f3a38
  Committed by joanna.wozna.intel on Aug 22, 2022
  ```
  * Add int8 support for matmul+elementwise_add fuse

  * Corrections after review and ernie test fix
  ```
- Extend conv_concat_relu to support all activations (#45089) · d03ef054
  Committed by Sławomir Siwek on Aug 22, 2022
  ```
  * merge conv_concat_relu to conv_act

  * fix typo

  * extend unit test

  * reuse existing gpd

  * codestyle

  * enforce mkldnn conv
  ```
- [Paddle-TRT] support output_padding in conv2d_transpose and conv3d_transpose (#45004) · 25d25b00
  Committed by zhoutianzi666 on Aug 22, 2022
- [Eager] some python c api use final state (#45221) · d2ef888b
  Committed by wanghuancoder on Aug 22, 2022
  ```
  some python c api use final state
  ```
- remove trt_skip_layernorm_fuse_pass from gpu passes (#45293) · 25d58db6
  Committed by Yuanle Liu on Aug 22, 2022
- [jit] add jit layer function default constructor (#45169) · e3574f72
  Committed by Hui Zhang on Aug 22, 2022
  ```
  * fix jit layer function

  * fix comment

  * fix comment
  ```
- [CustomDevice] fix custom ccl (#45276) · 307ad60d
  Committed by ronnywang on Aug 22, 2022

Aug 20, 2022 · 1 commit
- [Eager] pylayer detach output tensor if it is equal with input (#45065) · bba13e21
  Committed by wanghuancoder on Aug 20, 2022
  ```
  * pylayer detach output tensor if it is equal with input

  * pylayer detach output tensor if it is equal with input
  ```

机器未来 / Paddle is in sync with the upstream project it was forked from.
