提交 · e6310dbd03413a00c4a6d71ee89fe1ba0ca81bc5 · 机器未来 / Paddle

22 12月, 2021 14 次提交
- C
  [PTen] Add cmake function for kernels (#38311) · e6310dbd
  由 Chen Weihang 提交于 3年前
```
* add pten kernel cmake

* add pten kernel cmake function

* fix compile error

* add enforce include for full kernel

* fix compile failed

* change cuda to gpu

* fix cmake function error
```
  e6310dbd
- Z
  
  Replaced core.ops with _C_ops (#38337) · 242ef2b9
  由 Zhanlue Yang 提交于 3年前
  
  242ef2b9
- B
  add mkldnn reshape_transpose_matmul fuse pass ut and op version check (#37468) · 274b135b
  由 baoachun 提交于 3年前
```
* add mkldnn reshape_transpose_matmul fuse pass ut and op version check

* update reshape_transpose_matmul_mkldnn_fuse_pass ut

* update ut
```
  274b135b
- B
  update mkldnn batch_norm_activation fuse pass ut (#37402) · 3d7e737c
  由 baoachun 提交于 3年前
```
* update mkldnn batch_norm_activation fuse pass ut

* update ut

* update mkldnn batch_norm_act_fuse_pass ut

* update batch_norm_act_fuse_pass ut

* update ut
```
  3d7e737c
- 王
  
  [infrt] add tensorrt op teller pass. test=develop (#38304) · 44112817
  由王明冬提交于 3年前
  
  44112817
- L
  
  [fleet_executor] Move IntraSend to Carrier. Using blocking queue (#38322) · ddc15a18
  由 LiYuRio 提交于 3年前
  
  ddc15a18
- G
  
  fix clip extra when QAT export model (#38323) · 142ea171
  由 Guanghua Yu 提交于 3年前
  
  142ea171
- G
  
  fix prelu weight shape for NHWC of static mode (#38310) · 0a79499c
  由 Guoxia Wang 提交于 3年前
  
  0a79499c
- C
  
  add copy constructor for densetensor (#38319) · fabc058b
  由 Chen Weihang 提交于 3年前
  
  fabc058b
- Y
  [PTen]Move flatten kernel to new directory (#38255) · 4d1ce184
  由 YuanRisheng 提交于 3年前
```
* move flatten

* fix bugs of test

* modify header file

* add copy declare

* fix compile bugs
```
  4d1ce184
- J
  
  Add nearest_interp/v2 int8 and uint8 support (#37985) · 56e2a6a6
  由 joanna.wozna.intel 提交于 3年前
  
  56e2a6a6
- Z
  Rename full infer_meta (#38332) · abb07f35
  由 zyfncg 提交于 3年前
```
* rename full infer_meta

* fix merge problem
```
  abb07f35
- W
  CE fix (#38324) · 90e9a486
  由 wenbin 提交于 3年前
```
* CE fix

* more format
```
  90e9a486
- Z
  
  Added check_file_diff_approvals for tensor and lod_tensor (#38314) · c9bc2758
  由 Zhanlue Yang 提交于 3年前
  
  c9bc2758
21 12月, 2021 17 次提交
- Z
  Fix inplace problem of setitem (#38298) · da61df5c
  由 zyfncg 提交于 3年前
```
* add inplace_map for trace_op in pybind

* fix inplace problem of setitem

* refactor the param format  of trace_op
Co-authored-by: Npangyoki <pangyoki@126.com>
```
  da61df5c
- B
  update seqconv_eltadd_relu_fuse_pass ut (#37907) · 4e578c2b
  由 baoachun 提交于 3年前
```
* update seqconv_eltadd_relu_fuse_pass ut

* update ut

* update ut

* update ut
```
  4e578c2b
- B
  update squared_mat_sub_fuse_pass ut (#37838) · aadc8674
  由 baoachun 提交于 3年前
```
* update squared_mat_sub_fuse_pass ut

* update ut

* update ut
```
  aadc8674
- C
  [PTen] Rename cuda dir and context to gpu (#38296) · dc7597e3
  由 Chen Weihang 提交于 3年前
```
* rename cuda to gpu

* revert CMake change

* resolve conflit

* rename other cuda to gpu

* poish details
```
  dc7597e3
- C
  use elementwise to optimize gelu forward implementation on GPU (#38188) · aff43684
  由 crystal 提交于 3年前
```
* relu forward opt

* add gelu functor

* optimize code
```
  aff43684
- A
  
  Fix for wrong conditions between forward and backward in elementwise_add_grad op (#38176) · d9780a22
  由 arlesniak 提交于 3年前
  
  d9780a22
- Y
  
  [fleet_executor] Python side fleet executor and task node (#38290) · a4afb97a
  由 Yuang Liu 提交于 3年前
  
  a4afb97a
- G
  
  fix recompute no grad warning (#38293) · 2005b98b
  由 Guoxia Wang 提交于 3年前
  
  2005b98b
- B
  add seqpool_cvm_concat_fuse_pass ut (#37902) · 06cf314a
  由 baoachun 提交于 3年前
```
* add seqpool_cvm_concat_fuse_pass ut

* rename ut name
```
  06cf314a
- C
  [pten] fix when out_dtype is same with x.dtype and still transform type error (#38285) · e0fd3bbf
  由 chentianyu03 提交于 3年前
```
* fix when out_dtype is same with x.dtype and still transform type error

* fix spell error
```
  e0fd3bbf
- S
  Support FP16 mean (#38289) · 643a268e
  由 sneaxiy 提交于 3年前
```
* mean first version

* fix scalar mean

* add fp16 dtype for api
```
  643a268e
- Y
  Fix test_conv_eltwiseadd_bn_fuse_pass timeout bug (#38302) · c197d73b
  由 yeliang2258 提交于 3年前
```
* fix timeout bug

* update
```
  c197d73b
- B
  update repeated_fc_relu_fuse_pass ut (#37845) · a896d1ce
  由 baoachun 提交于 3年前
```
* update repeated_fc_relu_fuse_pass ut

* update ut
```
  a896d1ce
- H
  optimize performance of offload in dygraph sharding stage2 (#38064) · f74ebd8a
  由 Haohongxiang 提交于 3年前
```
* update

* fix bugs

* modify code style

* fix bugs of _get_global_group
```
  f74ebd8a
- H
  PassAutoScan 基线跟测试用例使用一样配置的config (#38252) · 61ef56a1
  由 heliqi 提交于 3年前
```
* add timeout

* add timeout

* PassAutoScan base_line use same config

* try run base_line

* fix dropout Mask of output attr error

* fix dropout Mask of output attr error
```
  61ef56a1
- 石
  updates the check_file_diff_approvals for allocation refactor (#38257) · 88c2cba1
  由石晓伟提交于 3年前
```
* updates the check_file_diff_approvals for allocation refactor, test=develop

* fix a bug, test=develop
```
  88c2cba1
- C
  [PTen] Remove eigen and blas directory (#38291) · d9fcdc3a
  由 Chen Weihang 提交于 3年前
```
* remove eigen and blas dir

* fix declare error
```
  d9fcdc3a
20 12月, 2021 9 次提交

S

add check pass conflict tools (#38276) · 0d12aa64
由 sneaxiy 提交于 3年前

0d12aa64

add mkldnn conv_transpose_bias fuse pass ut (#37508) · ac696941

由 baoachun 提交于 3年前

* add mkldnn conv_transpose_bias fuse pass ut

* update conv_transpose_bias_mkldnn_fuse_pass ut

* update conv_transpose_bias_mkldnn_fuse_pass ut

* update conv_transpose_bias_mkldnn_fuse_pass ut

* restrict conv2d data_format in conv_transpose_bias_mkldnn_fuse_pass

* update ut timeout setting

* update ut

ac696941

[pten]add pten conj kernel (#38247) · a2793e5e

由 chentianyu03 提交于 3年前

* add pten conj kernel

* modify conj_kernel file path

* add defined cuda macro to cuda/conj_kernel.h

a2793e5e

B

add gelu pbtxt for conv+gelu mkldnn fuse pass (#38162) · 1b7f6ae9
由 baoachun 提交于 3年前

1b7f6ae9
F

[MLU]add mlu backend (#38207) · 76514a1f
由 fwenguang 提交于 3年前

76514a1f
F

Skip zero-size Allocation in RecordStream (#38264) · 48937020
由 From00 提交于 3年前

48937020

Support FP16 for more ops (#38123) · 1f445bf3

由 sneaxiy 提交于 3年前

* support FP16 for more ops

* add amp list tests

* refine reduce_mean_grad

* fix OP benchmark ci

* fix fp16 reduce_mean

* updat ut, but still have some problems

* remove mean/reduce_mean fp16 kernel

1f445bf3

optimize softmax with cross entropy soft label (#32387) · f8955602

由 Feng Xing 提交于 3年前

softmax_with_cross_entropy optimization with soft label. This PR includes optimization of
    "SoftmaxWithCrossEntropySoftLabel" : compute log_softmax and then compute loss.
    "CrossEntropySoftLabel" : compute loss with softmax as input.
These optimization includes following technics:
    read data to buffer with vectorization
    compute max and sum in warp
    fixed loop size with macro
Performance (computation time):
    softmax_with_cross_entropy_0 (forward) : -40.1%
    softmax_with_cross_entropy_0 (backward): -41%

f8955602

石

changes the call AllocShared to Alloc, test=develop (#38258) · bb0713b2
由石晓伟提交于 3年前

bb0713b2

机器未来 / Paddle 与 Fork 源项目一致

机器未来 / Paddle
与 Fork 源项目一致