Commits · aadc867467adfb16044c35c0ca6dcac4344d1e90 · Crayon鑫 / Paddle

21 Dec, 2021 15 commits
- B
  update squared_mat_sub_fuse_pass ut (#37838) · aadc8674
  Committed by baoachun on Dec 21, 2021
```
* update squared_mat_sub_fuse_pass ut

* update ut

* update ut
```
  aadc8674
- C
  [PTen] Rename cuda dir and context to gpu (#38296) · dc7597e3
  Committed by Chen Weihang on Dec 21, 2021
```
* rename cuda to gpu

* revert CMake change

* resolve conflict

* rename other cuda to gpu

* polish details
```
  dc7597e3
- C
  use elementwise to optimize gelu forward implementation on GPU (#38188) · aff43684
  Committed by crystal on Dec 21, 2021
```
* relu forward opt

* add gelu functor

* optimize code
```
  aff43684
- A
  
  Fix for wrong conditions between forward and backward in elementwise_add_grad op (#38176) · d9780a22
  Committed by arlesniak on Dec 21, 2021
  
  d9780a22
- Y
  
  [fleet_executor] Python side fleet executor and task node (#38290) · a4afb97a
  Committed by Yuang Liu on Dec 21, 2021
  
  a4afb97a
- G
  
  fix recompute no grad warning (#38293) · 2005b98b
  Committed by Guoxia Wang on Dec 21, 2021
  
  2005b98b
- B
  add seqpool_cvm_concat_fuse_pass ut (#37902) · 06cf314a
  Committed by baoachun on Dec 21, 2021
```
* add seqpool_cvm_concat_fuse_pass ut

* rename ut name
```
  06cf314a
- C
  [pten] fix when out_dtype is same with x.dtype and still transform type error (#38285) · e0fd3bbf
  Committed by chentianyu03 on Dec 21, 2021
```
* fix when out_dtype is same with x.dtype and still transform type error

* fix spell error
```
  e0fd3bbf
- S
  Support FP16 mean (#38289) · 643a268e
  Committed by sneaxiy on Dec 21, 2021
```
* mean first version

* fix scalar mean

* add fp16 dtype for api
```
  643a268e
- Y
  Fix test_conv_eltwiseadd_bn_fuse_pass timeout bug (#38302) · c197d73b
  Committed by yeliang2258 on Dec 21, 2021
```
* fix timeout bug

* update
```
  c197d73b
- B
  update repeated_fc_relu_fuse_pass ut (#37845) · a896d1ce
  Committed by baoachun on Dec 21, 2021
```
* update repeated_fc_relu_fuse_pass ut

* update ut
```
  a896d1ce
- H
  optimize performance of offload in dygraph sharding stage2 (#38064) · f74ebd8a
  Committed by Haohongxiang on Dec 21, 2021
```
* update

* fix bugs

* modify code style

* fix bugs of _get_global_group
```
  f74ebd8a
- H
  PassAutoScan: use the same config for the baseline as for the test cases (#38252) · 61ef56a1
  Committed by heliqi on Dec 21, 2021
```
* add timeout

* add timeout

* PassAutoScan base_line use same config

* try run base_line

* fix dropout Mask of output attr error

* fix dropout Mask of output attr error
```
  61ef56a1
- 石
  updates the check_file_diff_approvals for allocation refactor (#38257) · 88c2cba1
  Committed by 石晓伟 on Dec 21, 2021
```
* updates the check_file_diff_approvals for allocation refactor, test=develop

* fix a bug, test=develop
```
  88c2cba1
- C
  [PTen] Remove eigen and blas directory (#38291) · d9fcdc3a
  Committed by Chen Weihang on Dec 20, 2021
```
* remove eigen and blas dir

* fix declare error
```
  d9fcdc3a
20 Dec, 2021 20 commits
- S
  
  add check pass conflict tools (#38276) · 0d12aa64
  Committed by sneaxiy on Dec 20, 2021
  
  0d12aa64
- B
  add mkldnn conv_transpose_bias fuse pass ut (#37508) · ac696941
  Committed by baoachun on Dec 20, 2021
```
* add mkldnn conv_transpose_bias fuse pass ut

* update conv_transpose_bias_mkldnn_fuse_pass ut

* update conv_transpose_bias_mkldnn_fuse_pass ut

* update conv_transpose_bias_mkldnn_fuse_pass ut

* restrict conv2d data_format in conv_transpose_bias_mkldnn_fuse_pass

* update ut timeout setting

* update ut
```
  ac696941
- C
  [pten]add pten conj kernel (#38247) · a2793e5e
  Committed by chentianyu03 on Dec 20, 2021
```
* add pten conj kernel

* modify conj_kernel file path

* add defined cuda macro to cuda/conj_kernel.h
```
  a2793e5e
- B
  
  add gelu pbtxt for conv+gelu mkldnn fuse pass (#38162) · 1b7f6ae9
  Committed by baoachun on Dec 20, 2021
  
  1b7f6ae9
- F
  
  [MLU]add mlu backend (#38207) · 76514a1f
  Committed by fwenguang on Dec 20, 2021
  
  76514a1f
- F
  
  Skip zero-size Allocation in RecordStream (#38264) · 48937020
  Committed by From00 on Dec 20, 2021
  
  48937020
- S
  Support FP16 for more ops (#38123) · 1f445bf3
  Committed by sneaxiy on Dec 20, 2021
```
* support FP16 for more ops

* add amp list tests

* refine reduce_mean_grad

* fix OP benchmark ci

* fix fp16 reduce_mean

* update ut, but still have some problems

* remove mean/reduce_mean fp16 kernel
```
  1f445bf3
- F
  optimize softmax with cross entropy soft label (#32387) · f8955602
  Committed by Feng Xing on Dec 20, 2021
```
softmax_with_cross_entropy optimization with soft label. This PR includes optimization of
    "SoftmaxWithCrossEntropySoftLabel" : compute log_softmax and then compute loss.
    "CrossEntropySoftLabel" : compute loss with softmax as input.
These optimizations include the following techniques:
    read data to buffer with vectorization
    compute max and sum in warp
    fixed loop size with macro
Performance (computation time):
    softmax_with_cross_entropy_0 (forward) : -40.1%
    softmax_with_cross_entropy_0 (backward): -41%
```
  f8955602
- 石
  
  changes the call AllocShared to Alloc, test=develop (#38258) · bb0713b2
  Committed by 石晓伟 on Dec 20, 2021
  
  bb0713b2
- F
  
  fix typos in header inclusion in complex_op.cc (#38272) · 2635cc86
  Committed by Feiyu Chan on Dec 20, 2021
  
  2635cc86
- H
  add matmul_scale_fuse_pass (#37962) · ce335c23
  Committed by heliqi on Dec 20, 2021
```
* add matmul_scale matmul_v2_scale fuse pass

* add scaletensor judge

* modify var name

* add timeout notest;test=coverag

* fix error commit

* fix use_mkldnn attr

* fix use_mkldnn attr
```
  ce335c23
- S
  
  fix use of implicitly deleted constructor (#38225) · 23d9e947
  Committed by Sylwester Fraczek on Dec 20, 2021
  
  23d9e947
- S
  Remove windows requirement numpy <=1.19 (#38104) · 0e9597d5
  Committed by Sing_chan on Dec 20, 2021
```
* test if windows still need numpy <=1.19

* modify according to zhouwei's comment
```
  0e9597d5
- K
  
  fix repeat doc, test=document_fix (#38238) · 2fc479c0
  Committed by kuizhiqing on Dec 20, 2021
  
  2fc479c0
- 0
  
  [Dy2St]Skip windows for test_mnist_pure_fp16 (#38214) · 69cfb7a2
  Committed by 0x45f on Dec 20, 2021
  
  69cfb7a2
- Z
  Add multi_tensor for momentum optimizer and clear_grads (#37564) · 0cc5e22c
  Committed by zhangbo9674 on Dec 20, 2021
```
* add multi_tensor for momentum and clear_grads for optimizer

* fix bug for dygraph

* add unittest

* refine comment

* add param_group

* refine regularization logic

* del clear_grads

* add clear_grads

* add dispensable check of None

* refine clear_grad

* fix build bug

* refine code by comment

* refine code

* add multi tensor check

* refine param_group update

* add multi tensor for static mode

* refine comments

* delete useless comma for momentum

* refine comment for momentum

* refine code by comment
```
  0cc5e22c
- Y
  
  [fleet_executor] Remove runtime graph, all scheduler on python side (#38261) · 2f188341
  Committed by Yuang Liu on Dec 20, 2021
  
  2f188341
- F
  
  add doc for is_complex and is_integer and expose them as public APIs (#38158) · 8c9c81cc
  Committed by Feiyu Chan on Dec 20, 2021
  
  8c9c81cc
- Y
  Fix bugs that copy occurs when tensor "in" and tensor "out" is same in reshape kernel (#38249) · a615002a
  Committed by YuanRisheng on Dec 20, 2021
```
* fix bugs when run reshape

* fix ci bug
```
  a615002a
- Z
  
  move the directory of fill kernels in pten (#38219) · 06128b9f
  Committed by zyfncg on Dec 20, 2021
  
  06128b9f
19 Dec, 2021 1 commit
- B
  
  Integration sharding stage2 function (#38151) · 327e5050
  Committed by Baibaifan on Dec 19, 2021
  
  327e5050
18 Dec, 2021 4 commits
- N
  
  [pnorm] fix bug in pnorm (#38215) · 9e42fe9a
  Committed by Noel on Dec 18, 2021
  
  9e42fe9a
- G
  
  fix seed for class_center_sample using paddle.seed (#38248) · 59be8e0e
  Committed by Guoxia Wang on Dec 18, 2021
  
  59be8e0e
- Y
  add test_conv_act_mkldnn_fuse_pass (#38153) · 6418bc75
  Committed by yeliang2258 on Dec 18, 2021
```
* add test_conv_act_mkldnn_fuse_pass

* update cmakelist

* fix cmakelist

* fix timeout

* fix timeout

* fix timeout

* fix
```
  6418bc75
- F
  add complex op (#37918) · 31e874b1
  Committed by Feiyu Chan on Dec 18, 2021
```
* add complex op and `paddle.complex`.
```
  31e874b1

Crayon鑫 / Paddle: in sync with the fork's source project