Commits · c48a9ad56e69a5d27d1b36df8c731c9c32f84d78 · Crayon鑫 / Paddle

17 Jan, 2022 7 commits

[Pten] Replace platform::Place to pten::Place. (#38899) · c48a9ad5

Committed by Wilber on Jan 17, 2022

* add pten::Place data structure.

* update ci problem

* fix ci problem

* update

* using platform::Place=pten::Place

* remove BOOST_GET_CONST for CPUPlace and GPUPlace

* compile pass 25%.

* compile pass 45%

* compile pass 60%

* remove boost_get for xpu npu mlu and ipu

* compile pass on cpu and gpu.

* fix compile problem

* fix compile error.

* update

* fix ci problem

* update

* ci approve

* fix ci problem

* fix ci eager test problem

* remove BOOST_GET_CONST

* fix npu compile

c48a9ad5

fix benchmark in paddlerec (#38278) · 1dbc8632
Committed by wangguanqun on Jan 17, 2022

1dbc8632

Add NoReduce mode for ParallelExecutor (#38969) · e50d883e
Committed by sneaxiy on Jan 17, 2022
```
* add no reduce mode for pe

* add NoReduce ut
```
e50d883e

add squared_l2_norm (#38968) · 6eeb16b8
Committed by sneaxiy on Jan 17, 2022

6eeb16b8

fix paddle.where torch diff (#38870) · 096afbe1
Committed by ronnywang on Jan 17, 2022
```
* fix paddle.where torch diff

* update
```
096afbe1

[Dy2St]close enable_inplace PASS for PE and open test_mnist_pure_fp16.py for windows (#38752) · 724d49da
Committed by 0x45f on Jan 17, 2022
```
* close enable_inplace PASS for PE, and test dy2st pure fp16 training stability

* add some comment

* enlarge atol
```
724d49da

Support auto prune logic in eager mode (#38960) · f81569e3
Committed by Jiabin Yang on Jan 17, 2022
```
* support test_auto_prune_partial

* support rest of autoprune strategy in eager mode
```
f81569e3

15 Jan, 2022 2 commits

[PTen] Remove cached kernel context (#38953) · 35d2b71a
Committed by Chen Weihang on Jan 15, 2022
```
* remove cached kernel context

* revert dataloader format change
```
35d2b71a

[Unify Tensors PR #7] Merged LoDTensor with Tensor, test=allcases (#38880) · 88966b28

Committed by Zhanlue Yang on Jan 15, 2022

* Merged LoDTensor with Tensor,test=allcases

* Patched python level LoDTensor

* Fixed example code failure

* Polished function names, removed duplicated forward declarations

88966b28

14 Jan, 2022 4 commits

add flatten_contiguous_range OpConvert for Paddle-TRT (#38922) · 050aa6fe

Committed by heliqi on Jan 14, 2022

* add trt_convert_flatten_contiguous_rang op

* trt version >7,support trt_convert_flatten_contiguous_rang

* trt version >7,support trt_convert_flatten_contiguous_rang

* trt version >7,support trt_convert_flatten_contiguous_rang

* test case add trt version >=7 skip

050aa6fe

[XPU]add stack_grad op for kunlun2,*test=kunlun (#38674) · 87ee3e4f

Committed by Zhangjingyu06 on Jan 14, 2022

* [XPU]add split op for kunlun2,*test=kunlun

* [XPU]add split op for kunlun2,*test=kunlun

* [XPU]add split op for kunlun,*test=kunlun

* [XPU]add stack_grad op for kunlun2,*test=kunlun
Co-authored-by: QingshuChen <chenqingshu@baidu.com>

87ee3e4f

Add dygraph sharding stage3 (#38052) · 4c77a908
Committed by Baibaifan on Jan 14, 2022

4c77a908

[MLU]Add mean and reduce_mean op (#38872) · 7f8d5bc8

Committed by qipengh on Jan 14, 2022

* [MLU]: add mean and reduce mean op

* [MLU]add mlu pytest dir in CMakeLists.txt

* [MLU]fix tensor data

* [MLU]fix TensorToPyArray and license

7f8d5bc8

13 Jan, 2022 8 commits

[NPU] fix tril_triu (#38864) · eaccdc71
Committed by furnace on Jan 13, 2022
```
[NPU] fix tril_triu
```
eaccdc71

[NPU] fix expand op (#38526) · 7a5af630
Committed by furnace on Jan 13, 2022
```
* [NPU] fix expand op

* [NPU] optimize codes

* [NPU] optimize codes
```
7a5af630

[pten]Remove pten/include dir files (#38878) · 7e0292ea

Committed by chentianyu03 on Jan 13, 2022

* move dot_dev api into dot_kernel.h

* add infermeta header

* modify to dot kernel in dot_op.h

* move conj dev api into complex_kernel.h

* move sign dev api into sign_kernel.h

* move scale dev api into kernel.h and remove infermeta.h

* rm paddle/pten/include/math.h

* rm paddle/pten/include/math.h

* rm include dir

* rm paddle/pten/include/math.h

* fix conflict with develop branch

* rm devContext in conj_op.h

* add the missing complex_kernel header

7e0292ea

[Dist Pass] AMP pass add dist_update_loss_scaling op (#38902) · 53783e1e
Committed by JZ-LIANG on Jan 13, 2022

53783e1e

roi_align aligned supported (#38905) · 08dcea18
Committed by wenbin on Jan 13, 2022
```
roi_align aligned supported
```
08dcea18

Added mul BF16/FP32 FWD/BWD oneDNN kernel (#38552) · fc6eed5b

Committed by jakpiase on Jan 13, 2022

* base changes for mul reimplementation

* empty commit

* tmp save

* full implementation of mul bf16/fp32 fwd bwd

* CI fix

* CI rerun

* changed unity build cmake to avoid gpu issues

* removed mul mkldnn from unity build

* added skipping tests if not cpu_bf16

* CI fix

* CI fix

* CI fix

fc6eed5b

Fix mkldnn invalid infershape impl (#38837) · 281644cd
Committed by Chen Weihang on Jan 13, 2022
```
* fix mkldnn invalid infershape

* add unittest for mkldnn in new executor

* add import os
```
281644cd

Support test_imperative using_non_zero_gpu with _test_eager_guard() (#38881) · 5e515781

Committed by Weilong Wu on Jan 13, 2022

* Support test_imperative using_non_zero_gpu and Add a TODO comment

* Change GPU number to 0

* Modify the cuda device selection method

5e515781

12 Jan, 2022 7 commits

the_one_ps dirs reconstruct (#38804) · 50609214

Committed by ziyoujiyi on Jan 12, 2022

* delete gloo connect retry

* the_one_ps dirs reconstruct

* .

* .

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* create the_one_ps dirs

* the one ps dirs modify

* the one ps dirs modify

* the one ps dirs modify

* the one ps dirs modify

50609214

Fix conv act int8 scale (#38331) · 4825addd
Committed by Sylwester Fraczek on Jan 12, 2022
```
* fix conv act int8 scale

* add unit test for conv+hard_swish
```
4825addd

support 5d for nearest interp (#38868) · d296456c

Committed by xiaoting on Jan 12, 2022

* support 5d for nearest

* update nearest3d unittest, test=develop

* fix approve ci, test=develop

* fix approve ci, test=develop

d296456c

[Dist Pass] Amp Pass (#38764) · cc24427e

Committed by JZ-LIANG on Jan 12, 2022

* auto parallel sharding base

* chmod

* add unittest

* set unittest cmake dist label

* revise code according to review

* chmod

* bugfix for grad_clip and param broadcast

* chmod

* update unittest

* chmod

* add clip

* chmod

* add amp pass

* chmod

* add unittest

* remove grad update

* fixed bug

* fixed bug

* fixed typos

* fixed typos

cc24427e

support test_auto_prune_partial (#38871) · 4640955c
Committed by Jiabin Yang on Jan 12, 2022

4640955c

Fix api docs (#38882) · 572ba24e

Committed by Chen Long on Jan 12, 2022

* update readme test=document_fix

* update conll05 docs

* update conll05 docs test=document_fix

572ba24e

add args check and comment for exp,polynomy decay (#38782) · b7bae939
Committed by Sing_chan on Jan 12, 2022
```
* add args check and comment for exp,polynomy decay

* modify according to zhouwei's comment
```
b7bae939

11 Jan, 2022 5 commits

Support test_numpy_bridge and thread_local_has_grad (#38835) · 29c211ee
Committed by Weilong Wu on Jan 11, 2022

29c211ee

[Auto Parallel] New local tensor (#38747) · d3ba1895

Committed by caozhou on Jan 11, 2022

* update dist tensor

* add unittest

* update unittest

* refactor dist tensor

* update dist tensor and unittest

d3ba1895

[AMP] Check call order of paddle.amp.decorate and paddle.DataParallel (#38785) · fbb40281
Committed by zhangbo9674 on Jan 11, 2022
```
* check amp.decorate and DataParallel

* refine coverage

* fix layer dtype

* refine code
```
fbb40281

Jit pre save hook (#38186) · e91f7c02

Committed by Ming-Xu Huang on Jan 11, 2022

* Pre-save hooks of jit.save

1. Added pre_save_hooks features to jit.save.
2. Added related unittests

* Added jit pre_save_hooks functions' alias to paddle.jit and copyright.

* Make jit.save_pre_hook style consistent with Paddle's rule.

* Fixed arguments passing bug in run_save_pre_hooks

* Added API Documents

* Move clear and run_pre_save_hooks as internal methods only.

* Made register_save_pre_hook as an internal function.

e91f7c02

[Eager] fix some eager logic (#38576) · d3686471

Committed by wanghuancoder on Jan 11, 2022

* Rearranged Eager AutoCodeGen directory structure

* Removed USE_OP in Eager AutoCodeGen

* Enabled generation for Operators without Grad/Inputs/Outputs

* Resolved operators without input

* Fixed merge conflicts

* Enabled Eager AutoCodeGen for 10+ more operators

* Refactored Eager AutoCodeGen with more organized helper objects

* Enabled Eager AutoCodeGen for operators with multiple OpBases

* Adjusted Eager AutoCodeGen to Enable Passing Output Tensor as Input Argument

* Handled Dispensable Inputs/Outputs in Eager AutoCodeGen

* Adjusted function generation/call between Python-C API & Dygraph API

* Synchronized auto-generated Python-C API with Dygraph Forward Functions

* support more eager tensor api

* fix merge compile error

* fix compile error and fit develop code

* support pure CPU

* fix some logic error in eager_mode

* support _varbase_creator in eager mode

* Added safe_initialized interface to EagerTensor for use in processing dispensable inputs

* for eager mode

* refine

* support multiple constructor for eager tensor

* add place related code

* polish code

* specific randint with dtype of int64

* Support pure cpu test

* eager logic

* refine test in pure cpu

* eager logic

* eager logic

* eager logic, test=develop

* skip core.eager when in inference, test=develop

* refine, test=develop

* refine, test=develop

* call RetainGrad after run forward kernel, test=develop

* refine, test=develop

* support dygraph util, meta, guard test

* eager test case

* support inference test

* refine test and fix initializer failed

* modify eagertensor patch method

* add eagertensor.clear_grandint, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* support create varbase and fix retain grad error

* call monkey_patch_varbase in _test_eager_guard, test=develop

* fix windows error

* split clear_gradient to clear_gradient and zero_grads, test=develop

* refine, test=develop

* refine, test=develop

* support test_imperative_basic test in eager mode

* remove additional log in variable.h

* remove additional log in variable.h

* remove additional code create in merge

* eager

* fix some eager logic, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop
Co-authored-by: jim19930609 <jim19930609@gmail.com>
Co-authored-by: JiabinYang <360788950@qq.com>

d3686471

10 Jan, 2022 7 commits

update mul_gru_fuse_pass ut timeout setting (#38763) · 1f8fe035
Committed by baoachun on Jan 10, 2022

1f8fe035

Add gpu kernel for new api : linalg.lstsq (#38621) · 405103d8
Committed by Haohongxiang on Jan 10, 2022
```
* add lstsq gpu kernel

* update

* add docs_en

* modify ut

* fix bugs

* modify example in docs_en

* remove lstsq_op.cu from ROCM cmake

* modify docs_en

* modify docs_en

* modify docs_en

* remove unnecessary TensorCopy
```
405103d8

[Fleet Executor] Modified python cache strategy to support multi carriers (#38839) · c50c22b0
Committed by LiYuRio on Jan 10, 2022

c50c22b0

fix bug of fp16 (#38838) · 7d4ce5b3
Committed by ShenLiang on Jan 10, 2022

7d4ce5b3

Add the backward support for QR (#38824) · 657b6742
Committed by Yulong Ao on Jan 10, 2022
```
* Add the backward support for QR

* Remove unnecessary comments
```
657b6742

replace where with min and max · e30150dd
Committed by HydrogenSulfate on Jan 10, 2022

e30150dd

update code · 3ab9ace5
Committed by HydrogenSulfate on Dec 28, 2021

3ab9ace5

Crayon鑫 / Paddle is up to date with the fork source project
