Commits · 23d9e947be1e274291a54f3b13980fe252144a6b · Crayon鑫 / Paddle

20 Dec, 2021 5 commits
- fix use of implicitly deleted constructor (#38225) · 23d9e947
  Committed by Sylwester Fraczek on Dec 20, 2021
  
  23d9e947
- Add multi_tensor for momentum optimizer and clear_grads (#37564) · 0cc5e22c
  Committed by zhangbo9674 on Dec 20, 2021
```
* add multi_tensor for momentum and clear_grads for optimizer

* fix bug for dygraph

* add unittest

* refine comment

* add param_group

* refine regularization logic

* del clear_grads

* add clear_grads

* add dispensable check of None

* refine clear_grad

* fix build bug

* refine code by comment

* refine code

* add multi tensor check

* refine param_group update

* add multi tensor for static mode

* refine comments

* delete useless comma for momentum

* refine comment for momentum

* refine code by comment
```
  0cc5e22c
- [fleet_executor] Remove runtime graph, all scheduler on python side (#38261) · 2f188341
  Committed by Yuang Liu on Dec 20, 2021
  
  2f188341
- Fix bugs that copy occurs when tensor "in" and tensor "out" is same in reshape kernel (#38249) · a615002a
  Committed by YuanRisheng on Dec 20, 2021
```
* fix bugs when run reshape

* fix ci bug
```
  a615002a
- move the directory of fill kernels in pten (#38219) · 06128b9f
  Committed by zyfncg on Dec 20, 2021
  
  06128b9f
18 Dec, 2021 3 commits
- [pnorm] fix bug in pnorm (#38215) · 9e42fe9a
  Committed by Noel on Dec 18, 2021
  
  9e42fe9a
- fix seed for class_center_sample using paddle.seed (#38248) · 59be8e0e
  Committed by Guoxia Wang on Dec 18, 2021
  
  59be8e0e
- add complex op (#37918) · 31e874b1
  Committed by Feiyu Chan on Dec 18, 2021
```
* add complex op and `paddle.complex`.
```
  31e874b1
17 Dec, 2021 17 commits

Support multi place constructor (#38171) · 6f439e5a

Committed by Jiabin Yang on Dec 17, 2021

* support more eager tensor api

* support multiple constructor for eager tensor

* add place related code

* polish code

* specific randint with dtype of int64

* Support pure cpu test

* refine test in pure cpu

* refine test in pure cpu

6f439e5a

fit CI_SKIP_CPP_TEST (#38242) · b613c31e
Committed by Leo Chen on Dec 17, 2021

b613c31e

Refine some AMP operators for BERT (#37923) · d80fe268

Committed by sneaxiy on Dec 17, 2021

* support multi precision update for LAMB

* hide some api

* fix ci uts

* fix lamb output of dygraph

* remove some changes to some PR

* try to fix Py3 CI compile error

* fix test_imperative_optimizer, add lars ut, add layer_norm ut

* fix ut, fix format

* fix ut

* fix windows ci

d80fe268

add test for conv_transpose_bn_fuse_pass (#38203) · 672d94b2
Committed by feng_shuai on Dec 17, 2021

672d94b2
fix bug when build inference lib without tensorrt (#38156) · 6d1b8c52
Committed by Sing_chan on Dec 17, 2021

6d1b8c52

[pten] modify reduce_sum reduce_mean args (#38216) · eaa2363e

Committed by chentianyu03 on Dec 17, 2021

* modify sum mean args

* add GetExpectedPtenKernelArgs for reduce_op

* modify kernel args number

* modify kernel args number

eaa2363e

[fleet_executor] Fix the problem in fleet executor stop (#38114) · 843435ff
Committed by LiYuRio on Dec 17, 2021

843435ff

Generated CoreOpsInfos for potential use in append_op API (#38085) · e3b033f9

Committed by Zhanlue Yang on Dec 17, 2021

* Rearranged Eager AutoCodeGen directory structure

* Removed USE_OP in Eager AutoCodeGen

* Enabled generation for Operators without Grad/Inputs/Outputs

* Resolved operators without input

* Fixed merge conflicts

* Enabled Eager AutoCodeGen for 10+ more operators

* Refactored Eager AutoCodeGen with more organized helper objects

* Enabled Eager AutoCodeGen for operators with multiple OpBases

* Adjusted Eager AutoCodeGen to Enable Passing Output Tensor as Input Argument

* Handled Dispensable Inputs/Outputs in Eager AutoCodeGen

* Adjusted function generation/call between Python-C API & Dygraph API

* Synchronized auto-generated Python-C API with Dygraph Forward Functions

* Generated CoreOpsInfos for potential use in append_op API

* Fixed CI problem

e3b033f9

add op/api repeat/interleave (#37981) · a7de0e66
Committed by kuizhiqing on Dec 17, 2021

a7de0e66
[Paddle-TRT] Use TRT inspector to show the information inside an engine to verbose log (#38200) · 237c1fe6
Committed by Leo Chen on Dec 17, 2021
```
* Inspect the information inside a TRT engine.

* Follow up the google code style.

* Fix code error.
```
237c1fe6

add launch bound to limit the registers usage for volta architecture (#38113) · 18a59822

Committed by zlsh80826 on Dec 17, 2021

According to `--ptxas-options=-v`, SegmentOpsKernel uses 66 registers per thread.
There are two ways to resolve this problem:
1. Reduce the threads-per-block launch configuration.
2. Add `__launch_bounds__` so that the nvcc compiler has the information it needs to reduce register usage.

This PR chooses the `__launch_bounds__` solution because changing gpu_launch_config may affect other ops.
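The register pressure described in this commit message can be checked with simple arithmetic. The sketch below is illustrative only: it assumes the 65,536-register per-block limit documented for compute capability 7.0 (Volta) and a hypothetical launch of 1024 threads per block, neither of which is stated in the commit message. It also ignores the hardware's register allocation granularity.

```python
# Assumption: 64K 32-bit registers available to one thread block (compute capability 7.0).
REGISTERS_PER_BLOCK_LIMIT = 65536


def fits(registers_per_thread, threads_per_block, limit=REGISTERS_PER_BLOCK_LIMIT):
    """Return True if the block's total register demand stays within the limit."""
    return registers_per_thread * threads_per_block <= limit


def max_threads_per_block(registers_per_thread, limit=REGISTERS_PER_BLOCK_LIMIT):
    """Largest block size whose total register demand fits within the limit."""
    return limit // registers_per_thread


reported_registers = 66   # figure quoted from --ptxas-options=-v
hypothetical_block = 1024  # assumed block size, for illustration only

# Option 1: shrink the block until 66 registers per thread fit.
print(fits(reported_registers, hypothetical_block))
print(max_threads_per_block(reported_registers))

# Option 2: cap registers per thread (what a launch bound asks the compiler to do).
print(fits(64, hypothetical_block))
```

Under these assumptions, 66 registers per thread do not fit a 1024-thread block, while a cap of 64 registers per thread does, which is the kind of constraint a launch bound communicates to the compiler.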

18a59822

Delete cub_reduce.h and modified the TensorReduce to TensorReduceFunctorImpl (#38197) · 9a8a4c77
Committed by niuliling123 on Dec 17, 2021

9a8a4c77

Get base pointer from Allocation (#37978) · 431a2d6a

Committed by From00 on Dec 17, 2021

* Get GPU BasePtr from CUDA allocation

* Fix compile error for ROCm

* Add BasePtr function for IPUPlace in naive_best_fit_allocator.cc

* Add alignment for BuddyAllocator

* Set address alignment of BuddyAllocator to 32 bytes

* Fix CI error

* Remove code for naive_best_fit strategy
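As a rough illustration of what a 32-byte address alignment means, the sketch below rounds an address up to the next multiple of the alignment while keeping the original base address available. The function names are hypothetical and this is not the BuddyAllocator code; it only shows the general rounding technique.

```python
def align_up(address, alignment=32):
    """Round address up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (address + alignment - 1) & ~(alignment - 1)


def aligned_view(base_address, alignment=32):
    """Return (base pointer, aligned pointer, padding) for a raw allocation."""
    aligned = align_up(base_address, alignment)
    return base_address, aligned, aligned - base_address


# A raw address of 1000 is padded by 24 bytes to reach the 32-byte boundary at 1024.
print(aligned_view(1000))
```

Keeping the base address alongside the aligned one matters because the pointer handed to the user may differ from the pointer that must later be returned to the underlying allocator.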

431a2d6a

Add GetStream Interface for StreamSafeCUDAAllocator (#38195) · b0d12d99
Committed by From00 on Dec 17, 2021

b0d12d99
[fleet_executor] run time graph on python side (#38164) · fc701369
Committed by Yuang Liu on Dec 17, 2021

fc701369
[BugFix]: Elementwise branch selection and Broadcast dimension merge (#38204) · e097a748
Committed by limingshu on Dec 17, 2021
```
* fix_bugs_for_elementwise_branch_selection

* fix merge_dims bugs

* fix all influenced file
```
e097a748
update xpu1 op list, for train ResNet50 using PaddleClas. (#38201) · 3a0e0b6f
Committed by houj04 on Dec 17, 2021

3a0e0b6f

16 Dec, 2021 15 commits

[new-exec] skip add_dependency when cc_test skipped because of CI_SKIP_CPP_TEST=ON (#38191) · 30973183
Committed by Leo Chen on Dec 16, 2021
```
* fix cmake

* not check execution time
```
30973183
add grad maker debug log (#38183) · a43d8e59
Committed by chentianyu03 on Dec 16, 2021

a43d8e59

Faster implementation of CPU kernel for ROI Align operator (#37848) · 023ff4f5

Committed by Tomasz Socha on Dec 16, 2021

* Faster implementation of CPU kernel for ROI_ALIGN Operator

* Add missing variable to CUDA roi_align_op

* Style

* Fix boundaries

* Rename variables for indexes calculation

* Remove unnecessary emplace

* Revert "Remove unnecessary emplace"

This reverts commit c10e87f7fb812f1a672fde32f2690a97d47e2f5a.

* Style

023ff4f5

pylayer support HIP (#38184) · 2e76d5ad
Committed by chentianyu03 on Dec 16, 2021

2e76d5ad

Fixed LD_LIBRARY_PATH for eager_code_generator (#38160) · af30f545

Committed by Zhanlue Yang on Dec 16, 2021

* Rearranged Eager AutoCodeGen directory structure

* Removed USE_OP in Eager AutoCodeGen

* Enabled generation for Operators without Grad/Inputs/Outputs

* Resolved operators without input

* Fixed merge conflicts

* Enabled Eager AutoCodeGen for 10+ more operators

* Refactored Eager AutoCodeGen with more organized helper objects

* Enabled Eager AutoCodeGen for operators with multiple OpBases

* Adjusted Eager AutoCodeGen to Enable Passing Output Tensor as Input Argument

* Handled Dispensable Inputs/Outputs in Eager AutoCodeGen

* Enabled Eager AutoCodeGen for All Existing Operators & Possible Future Operators

* Fixed CI issues

* Fixed LD_LIBRARY_PATH for eager_code_generator

af30f545

Add arc hyperbolic function op (#37076) · 36b7368d

Committed by xiaoting on Dec 16, 2021

* add activation

* update activation_op

* add unittest for activation

* fix acosh for init, test=develop

36b7368d

Conv transpose eltwiseadd bn fuse pass (#37800) · e64f0997

Committed by feng_shuai on Dec 16, 2021

* conv_transpose_eltwiseadd_bn_fuse_pass

* change timeout

* add TIMEOUT

* add random num for group and dilation

* change PassCompat

e64f0997

Revert "modify the fix_seed attribute in dropout op is a def... · 464f2af8

Committed by 王明冬 on Dec 16, 2021

Revert "modify the fix_seed attribute in dropout op is a def attribute.test=develop (#38100)" (#38127)

This reverts commit f44add7b.

464f2af8

Add tests for PaddleInference Pass (#37676) · 96597a85

Committed by yeliang2258 on Dec 16, 2021

* add test for conv_elementwise_add2_act_fuse_pass and conv_elementwise_add_act_fuse_pass

* Add conv_eltwiseadd_bn_fuse_pass test and fix test_conv_elementwise_addX_act_fuse_pass

* add tests for conv_act_mkldnn_fuse_pass

* add test for conv_bias_mkldnn_fuse_pass

* update code

* add conv_act_mkldnn_fuse_pass for relu, relu6, swish, leaky_relu

* update test

* update

* update bug

* update

* update pattern_detector

* fix test_conv_eltwiseadd_bn_fuse_pass

* add diff display notest;test=windows_ci_inference

* fix

* remove test_conv_act_mkldnn_fuse_pass.py

* ifix

96597a85

[PTen] Add register_ctx_kernel macro and move scale kernel (#38121) · af498677

Committed by Chen Weihang on Dec 16, 2021

* add register_ctx_kernel and move scale kernel

* polish details by reviewer comment

* fix xpu compile failed

* fix cmake error

af498677

support eager switch system (#38170) · 8305c2be
Committed by Jiabin Yang on Dec 16, 2021
```
* support eager switch system

* polish code
```
8305c2be
[psgpu]add checknan print and fix trainer device (#38131) · 092839d6
Committed by danleifeng on Dec 16, 2021
```
* trainer_device fix and checknan tool for psgpu;test=develop

* disable show_one_table;test=develop
```
092839d6

Adapt host event recorder to profiler (#37766) · 5b6be4d7

Committed by liutiexing on Dec 16, 2021

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* add os_info

* update

* update

* update

* update

* update

* update for bugfix

* update

* update

* update
Co-authored-by: Nliutiexing <liutiexing@google.com>

5b6be4d7

Add fmax and fmin operators (#37826) · dd3afc9d
Committed by LJQ❤️ on Dec 16, 2021
```
Add elementwise_fmax and elementwise_fmin operators
```
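The commit message does not spell out the semantics of the new operators. The pure-Python sketch below assumes they follow the convention of C's `fmax` and `fmin`, where a NaN operand is ignored in favour of the other operand; the helper names are hypothetical and this is not Paddle's implementation.

```python
import math


def fmax(x, y):
    """Maximum of x and y, preferring the non-NaN operand when one is NaN."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return max(x, y)


def fmin(x, y):
    """Minimum of x and y, preferring the non-NaN operand when one is NaN."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return min(x, y)


def elementwise(op, xs, ys):
    """Apply a binary op position by position over two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same length")
    return [op(a, b) for a, b in zip(xs, ys)]


nan = float("nan")
xs = [1.0, nan, 3.0]
ys = [2.0, 5.0, nan]
print(elementwise(fmax, xs, ys))
print(elementwise(fmin, xs, ys))
```

Under this assumption the result is NaN only when both operands at a position are NaN, which is the main difference from a plain elementwise maximum or minimum.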
dd3afc9d

Add sparse_attention mask ,test=develop (#37973) · fa463b90

Committed by Liu-xiandong on Dec 16, 2021

Add key_padding_mask and attn_mask to the sparse_attention API.

1. The key padding mask is a tensor with dimensions [batch_size, seq_len], and the attention mask is a tensor with dimensions [seq_len, seq_len]. The data types of the two masks are consistent with Q, K, and V, which are float32 or float64. A value of 0 in a mask means that the position is masked.

2. The changed files are mainly paddle/fluid/operators/sparse_attention_op.cu and python/paddle/fluid/tests/unittests/test_sparse_attention_op.py. sparse_attention has three parts: sddmm, softmax, and dsd. Adding the mask operation only requires modifying the softmax; it has no effect on the other two parts. In addition, related tests have been added to cover the mask function.
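The mask convention described above (a value of 0 marks a masked position, applied in the softmax step) can be sketched in plain Python. This is a dense, single-batch illustration with hypothetical names; the real operator works on a sparse layout in CUDA. Returning all zeros for a fully masked row is a choice made for this sketch, not something the commit message specifies.

```python
import math


def masked_softmax(scores, key_padding_mask, attn_mask):
    """Row-wise softmax over scores[seq_len][seq_len] for one batch item.

    key_padding_mask has shape [seq_len] (this batch item's row of the
    [batch_size, seq_len] mask); attn_mask has shape [seq_len][seq_len].
    A mask value of 0 means the position is masked.
    """
    result = []
    for i, row in enumerate(scores):
        keep = [key_padding_mask[j] != 0 and attn_mask[i][j] != 0
                for j in range(len(row))]
        visible = [s for s, k in zip(row, keep) if k]
        if not visible:
            # Sketch-only choice: a fully masked row yields all zeros.
            result.append([0.0] * len(row))
            continue
        row_max = max(visible)  # subtract the max for numerical stability
        exps = [math.exp(s - row_max) if k else 0.0 for s, k in zip(row, keep)]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


scores = [[1.0, 2.0, 3.0], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
key_padding_mask = [1.0, 1.0, 0.0]  # the last key is padding
attn_mask = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 1.0, 1.0]]  # causal
probs = masked_softmax(scores, key_padding_mask, attn_mask)
print(probs)
```

Because masked positions contribute zero to both the numerator and the normalising sum, the surrounding matrix products need no changes, which matches the commit's note that only the softmax part is modified.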

fa463b90
