Commits · 48937020dc75bd2ede45ff2f1ef9d33386b0934a · 机器未来 / Paddle

Dec 20, 2021 · 15 commits
- Skip zero-size Allocation in RecordStream (#38264) · 48937020
  Committed by From00 on Dec 20, 2021
- Support FP16 for more ops (#38123) · 1f445bf3
  Committed by sneaxiy on Dec 20, 2021
```
* support FP16 for more ops

* add amp list tests

* refine reduce_mean_grad

* fix OP benchmark ci

* fix fp16 reduce_mean

* update ut, but still have some problems

* remove mean/reduce_mean fp16 kernel
```
- optimize softmax with cross entropy soft label (#32387) · f8955602
  Committed by Feng Xing on Dec 20, 2021
```
softmax_with_cross_entropy optimization with soft label. This PR optimizes:
    "SoftmaxWithCrossEntropySoftLabel": computes log_softmax and then the loss.
    "CrossEntropySoftLabel": computes the loss with softmax as input.
The optimizations use the following techniques:
    read data into a buffer with vectorization
    compute max and sum within a warp
    fix the loop size with a macro
Performance (computation time):
    softmax_with_cross_entropy_0 (forward) : -40.1%
    softmax_with_cross_entropy_0 (backward): -41%
```
- changes the call AllocShared to Alloc, test=develop (#38258) · bb0713b2
  Committed by 石晓伟 on Dec 20, 2021
- fix typos in header inclusion in complex_op.cc (#38272) · 2635cc86
  Committed by Feiyu Chan on Dec 20, 2021
- add matmul_scale_fuse_pass (#37962) · ce335c23
  Committed by heliqi on Dec 20, 2021
```
* add matmul_scale matmul_v2_scale fuse pass

* add scaletensor judge

* modify var name

* add timeout notest;test=coverag

* fix error commit

* fix use_mkldnn attr

* fix use_mkldnn attr
```
- fix use of implicitly deleted constructor (#38225) · 23d9e947
  Committed by Sylwester Fraczek on Dec 20, 2021
- Remove windows requirement numpy <=1.19 (#38104) · 0e9597d5
  Committed by Sing_chan on Dec 20, 2021
```
* test if windows still need numpy <=1.19

* modify according to zhouwei's comment
```
- fix repeat doc, test=document_fix (#38238) · 2fc479c0
  Committed by kuizhiqing on Dec 20, 2021
- [Dy2St]Skip windows for test_mnist_pure_fp16 (#38214) · 69cfb7a2
  Committed by 0x45f on Dec 20, 2021
- Add multi_tensor for momentum optimizer and clear_grads (#37564) · 0cc5e22c
  Committed by zhangbo9674 on Dec 20, 2021
```
* add multi_tensor for momentum and clear_grads for optimizer

* fix bug for dygraph

* add unittest

* refine comment

* add param_group

* refine regularization logic

* del clear_grads

* add clear_grads

* add dispensable check of None

* refine clear_grad

* fix build bug

* refine code by comment

* refine code

* add multi tensor check

* refine param_group update

* add multi tensor for static mode

* refine comments

* delete useless comma for momentum

* refine comment for momentum

* refine code by comment
```
- [fleet_executor] Remove runtime graph, all scheduler on python side (#38261) · 2f188341
  Committed by Yuang Liu on Dec 20, 2021
- add doc for is_complex and is_integer and expose them as public APIs (#38158) · 8c9c81cc
  Committed by Feiyu Chan on Dec 20, 2021
- Fix bugs that copy occurs when tensor "in" and tensor "out" is same in reshape kernel (#38249) · a615002a
  Committed by YuanRisheng on Dec 20, 2021
```
* fix bugs when run reshape

* fix ci bug
```
- move the directory of fill kernels in pten (#38219) · 06128b9f
  Committed by zyfncg on Dec 20, 2021
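
The soft-label optimization in f8955602 (#32387) names two code paths: one that computes log_softmax and then the loss, and one that takes the softmax output as input. The following is a minimal pure-Python sketch of that distinction for a single row, not the CUDA kernel from the PR. The function names `log_softmax`, `loss_from_logits`, and `loss_from_softmax` are hypothetical and chosen only for illustration; the vectorized reads and warp-level reductions mentioned in the commit are not modeled here.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one row of logits."""
    row_max = max(logits)  # subtracting the max avoids overflow in exp()
    log_sum = math.log(sum(math.exp(x - row_max) for x in logits))
    return [x - row_max - log_sum for x in logits]


def loss_from_logits(logits, soft_label):
    """Path 1 (hypothetical name): compute log_softmax first, then the loss."""
    return -sum(t * lp for t, lp in zip(soft_label, log_softmax(logits)))


def loss_from_softmax(probs, soft_label):
    """Path 2 (hypothetical name): the softmax output is already the input."""
    # Classes with zero label weight contribute nothing, so they are skipped.
    return -sum(t * math.log(p) for t, p in zip(soft_label, probs) if t > 0.0)


logits = [1.0, 2.0, 3.0]
soft_label = [0.2, 0.3, 0.5]  # a distribution over classes, sums to 1
probs = [math.exp(lp) for lp in log_softmax(logits)]

loss_a = loss_from_logits(logits, soft_label)
loss_b = loss_from_softmax(probs, soft_label)
print(loss_a, loss_b)  # the two paths should agree up to rounding
```

Computing the loss from log_softmax directly avoids taking the logarithm of a probability that has already been rounded, which is one common reason to keep the two paths separate.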
Dec 19, 2021 · 1 commit
- Integration sharding stage2 function (#38151) · 327e5050
  Committed by Baibaifan on Dec 19, 2021
Dec 18, 2021 · 5 commits
- [pnorm] fix bug in pnorm (#38215) · 9e42fe9a
  Committed by Noel on Dec 18, 2021
- fix seed for class_center_sample using paddle.seed (#38248) · 59be8e0e
  Committed by Guoxia Wang on Dec 18, 2021
- add test_conv_act_mkldnn_fuse_pass (#38153) · 6418bc75
  Committed by yeliang2258 on Dec 18, 2021
```
* add test_conv_act_mkldnn_fuse_pass

* update cmakelist

* fix cmakelist

* fix timeout

* fix timeout

* fix timeout

* fix
```
- add complex op (#37918) · 31e874b1
  Committed by Feiyu Chan on Dec 18, 2021
```
* add complex op and `paddle.complex`.
```
- [infrt] add unit test script for infrt. test=develop (#38232) · a3bd6fc0
  Committed by 王明冬 on Dec 18, 2021
Dec 17, 2021 · 19 commits

- Add mcmc of planner, of update cost model and relaunch (#38177) · 1bb2c68a
  Committed by caozhou on Dec 17, 2021

  * add planner
  * add planner
  * add cost model update
  * add relaunch update
  * update process_group
  * fix error
  * add unittest
  * update unittest
  * update cost model
  * avoid api problem

- Support multi place constructor (#38171) · 6f439e5a
  Committed by Jiabin Yang on Dec 17, 2021

  * support more eager tensor api
  * support multiple constructor for eager tensor
  * add place related code
  * polish code
  * specific randint with dtype of int64
  * Support pure cpu test
  * refine test in pure cpu
  * refine test in pure cpu

- fit CI_SKIP_CPP_TEST (#38242) · b613c31e
  Committed by Leo Chen on Dec 17, 2021
- Add _compile_dir argument for custom ops compilation (#38211) · 411d64ad
  Committed by sneaxiy on Dec 17, 2021
```
* add compile_dir

* follow comments
```
- add scale lost deps (#38237) · 66a9d71a
  Committed by Chen Weihang on Dec 17, 2021

- Refine some AMP operators for BERT (#37923) · d80fe268
  Committed by sneaxiy on Dec 17, 2021

  * support multi precision update for LAMB
  * hide some api
  * fix ci uts
  * fix lamb output of dygraph
  * remove some changes to some PR
  * try to fix Py3 CI compile error
  * fix test_imperative_optimizer, add lars ut, add layer_norm ut
  * fix ut, fix format
  * fix ut
  * fix windows ci

- write includes.txt (#38210) · cff03734
  Committed by sneaxiy on Dec 17, 2021
- add test for conv_transpose_bn_fuse_pass (#38203) · 672d94b2
  Committed by feng_shuai on Dec 17, 2021
- fix bug when build inference lib without tensorrt (#38156) · 6d1b8c52
  Committed by Sing_chan on Dec 17, 2021

- [pten] modify reduce_sum reduce_mean args (#38216) · eaa2363e
  Committed by chentianyu03 on Dec 17, 2021

  * modify sum mean args
  * add GetExpectedPtenKernelArgs for reduce_op
  * modify kernel args number
  * modify kernel args number

- [fleet_executor] Fix the problem in fleet executor stop (#38114) · 843435ff
  Committed by LiYuRio on Dec 17, 2021

- Generated CoreOpsInfos for potential use in append_op API (#38085) · e3b033f9
  Committed by Zhanlue Yang on Dec 17, 2021

  * Rearranged Eager AutoCodeGen directory structure
  * Removed USE_OP in Eager AutoCodeGen
  * Enabled generation for Operators without Grad/Inputs/Outputs
  * Resolved operators without input
  * Fixed merge conflicts
  * Enabled Eager AutoCodeGen for 10+ more operators
  * Refactored Eager AutoCodeGen with more organized helper objects
  * Enabled Eager AutoCodeGen for operators with multiple OpBases
  * Adjusted Eager AutoCodeGen to Enable Passing Output Tensor as Input Argument
  * Handled Dispensable Inputs/Outputs in Eager AutoCodeGen
  * Adjusted function generation/call between Python-C API & Dygraph API
  * Synchronized auto-generated Python-C API with Dygraph Forward Functions
  * Generated CoreOpsInfos for potential use in append_op API
  * Fixed CI problem

- add op/api repeat/interleave (#37981) · a7de0e66
  Committed by kuizhiqing on Dec 17, 2021
- Increase timeout for test_adaptive_pool2d_convert_global_pass (#38220) · 885767e3
  Committed by heliqi on Dec 17, 2021
```
* add timeout

* add timeout
```
- [Paddle-TRT] Use TRT inspector to show the information inside an engine to verbose log (#38200) · 237c1fe6
  Committed by Leo Chen on Dec 17, 2021
```
* Inspect the information inside a TRT engine.

* Follow up the google code style.

* Fix code error.
```
- [CustomOp]Add RWLock to protect loading module under multi-thread and multi-process (#38128) · 8bc27015
  Committed by Aurelius84 on Dec 17, 2021
```
* Add RWLock to protect loading module under multi-thread

* refine code

* remove import statement
```

- add launch bound to limit the registers usage for volta architecture (#38113) · 18a59822
  Committed by zlsh80826 on Dec 17, 2021

  According to `--ptxas-options=-v`, SegmentOpsKernel uses 66 registers per thread.
  There are two ways to resolve this problem:

  * Reduce the threads per block in the launch configuration.
  * Add `__launch_bounds__` to give the nvcc compiler the information it needs to reduce register usage.

  This PR chooses the `__launch_bounds__` solution because changing gpu_launch_config may affect other ops.

- [AutoParallel] add gpt model for unittest (#38202) · 76eb371e
  Committed by zhaoyingli on Dec 17, 2021
```
* add gpt modeling

* update file name
```
- Delete cub_reduce.h and modified the TensorReduce to TensorReduceFunctorImpl (#38197) · 9a8a4c77
  Committed by niuliling123 on Dec 17, 2021
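
The register pressure described in 18a59822 (#38113) can be checked with simple arithmetic. The sketch below assumes the Volta (compute capability 7.0) limit of 64K 32-bit registers per thread block and a hypothetical launch of 1024 threads per block; the commit does not state the block size. It also ignores the hardware's register allocation granularity, so the real limits may be slightly tighter. The helper names are illustrative only.

```python
# Assumption: Volta (compute capability 7.0) allows at most 64K 32-bit
# registers per thread block. Allocation granularity is ignored here.
REGISTERS_PER_BLOCK = 64 * 1024
WARP_SIZE = 32


def block_fits(regs_per_thread, threads_per_block, limit=REGISTERS_PER_BLOCK):
    """True if a block of this size stays within the register budget."""
    return regs_per_thread * threads_per_block <= limit


def max_block_size(regs_per_thread, limit=REGISTERS_PER_BLOCK):
    """Option 1: largest block size (a multiple of the warp size) that fits."""
    threads = limit // regs_per_thread
    return threads - threads % WARP_SIZE


def max_regs_per_thread(threads_per_block, limit=REGISTERS_PER_BLOCK):
    """Option 2: per-thread register cap the compiler must reach for this block size."""
    return limit // threads_per_block


print(block_fits(66, 1024))       # False: 66 * 1024 = 67584 > 65536
print(max_block_size(66))         # 992
print(max_regs_per_thread(1024))  # 64
```

Under these assumptions, a kernel compiled to 66 registers per thread cannot launch with 1024 threads per block. Declaring the intended block size with `__launch_bounds__` lets the compiler cap register usage at 64 per thread instead of forcing a smaller launch configuration on every op that shares it.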

机器未来 / Paddle · Up to date with the fork's source project
