Commits · 6a15d407b043630420dcbf528990dc915e39a3e0 · PaddlePaddle / Paddle

Aug 16, 2022 · 11 commits
- [Auto Parallel] Add reshard cost and update estimator (#45118) · 6a15d407
  Authored by caozhou on Aug 16, 2022
```
* update reshard cost and cost estimator

* add unittest

* add dropout cost

* fix import error

* fix reshard code style error

* improve unittest coverage
```
- convert multihead to oss (#45019) · f706d95d
  Authored by feng_shuai on Aug 16, 2022
```
* convert multihead to oss

* fix:bug

* fix:delete const cast

* fix:don't support bias_qk

* add vit pass

* fix:convert bug and add preln_residual_bias

* support length=-1

* add UT for convert

* add no_bias_qk support for gpu_multihead_op

* delete infer_shape depends on bias_qk

* oss just can be used in T4 and A*

* fix:change api for ROCM CI
```
- Enable Profiler to add nvtx tag. (#45162) · ff2f1373
  Authored by Yiqun Liu on Aug 16, 2022
- fix the document for the add api (#45101) · d85cf4d4
  Authored by wawltor on Aug 16, 2022
```
* fix the api for the add

* update the document for the api add

* update add docs; test=document_fix
Co-authored-by: Ligoml <39876205+Ligoml@users.noreply.github.com>
```
- [Fleet] Reconstruct of Fleet API in Dygraph Mode (#44922) · c17e6af8
  Authored by Haohongxiang on Aug 16, 2022
```
* reconstruct_of_fleet_api

* update
```
- transfer nearest_interp op to phi, change name from nearest_interp_v2 to nearest_interp (#45148) · 6452ab3b
  Authored by HongyuJia on Aug 16, 2022
- [XPU] add truncated_gaussian_random op. (#45152) · 5bcabf78
  Authored by houj04 on Aug 16, 2022
- [AutoParallel] Prune D2H memcpy for fp16 pass (#45159) · e2b924bf
  Authored by JZ-LIANG on Aug 16, 2022
```
* prune d2h memcpy for fp16 pass
```
- [autograd] add select_p, eq_p, pow_p primitive operators for new autograd (#44813) · b681c88c
  Authored by Sing_chan on Aug 16, 2022
```
* add select_p

* fix bugs

* add custom test for select_p; modify select_p primrules

* modify according to xiaoxu's comment

* add eq_p, select_p, pow_p, use autograd to test grad of high order

* add requirement of autograd, modify expected type of eq

* modify according to xiaoxu's comment

* import primops to use primops.pow
```
- add strongly typed functions to set attributes to avoid unexpected type conversions. (#45107) · 307801d5
  Authored by Feiyu Chan on Aug 16, 2022
- support momentum op auto generation (#45163) · 642f6df9
  Authored by Charles-hit on Aug 16, 2022
Aug 15, 2022 · 12 commits

support adamw generation (#45149) · 1353761a
Authored by Charles-hit on Aug 15, 2022

Authored by wuhuachaocoding on Aug 15, 2022

* refactor fleet.

* refactor fleet.py.

* update fleet/__init__.py.

* update fleet.py

* update code style.

* update fleet

* update fleet

* update fleet

* update fleet

* update model.py

* update fleet.

* update __init__.py

* update fleet.

* update fleet.

* update fleet

* update fleet

* update fleet

* update fleet.

* update optimizer.py

* update optimizer

* update fleet.py

* update scaler.py

* update setup.py.in

8636d2a2

modify atol and rtol to solve unittest failure (#45139) · f30c7bd6
Authored by RichardWooSJTU on Aug 15, 2022
```
Co-authored-by: minghaoBD <liminghao03@baidu.com>
```

[phi] change op name linear_interp_v2 to linear_interp (#45128) · 6de3bdb3

Authored by HongyuJia on Aug 15, 2022

* change name linear_interp_v2 to linear_interp

* fix deprecated_op_names

* deprecated_op_names add linear_interp_grad

Refine TRT unit test (#45102) · 3512bf11

Authored by zlsh80826 on Aug 15, 2022

* Reduce pool2d test configuration

* Reduce depthwise_conv2d test configuration

* Reduce trt_convert_conv2d_fusion test configuration

* Reduce trt_convert_conv2d test configuration

* Reduce trt_convert_conv2d_transpose test configuration

* Reduce trt_convert_hard_swish test configuration

* Enhance trt auto scan test error message and mechanism

* Increase FP16 trt ut tolerance

[Eager] fix sync batch norm to inplace (#45028) · c75b091b
Authored by wanghuancoder on Aug 15, 2022
```
* fix sync batch norm to inplace
```

add mish and mish_grad for XPU, test=kunlun (#45098) · 6815c8ab
Authored by zhangyikun02 on Aug 15, 2022

[AutoParallel] add collate_fn for dist_loader (#45053) · 3649099f
Authored by zhaoyingli on Aug 15, 2022
```
* add collate_fn

* fix number of inputs
```

[jit] rm useless property pybind (#44962) · 8788513b
Authored by Hui Zhang on Aug 15, 2022
```
* rm useless pybind

* rm useless ut
```

[Auto Parallel] Move the distributed info from python to c++ (#44510) · a52357fe

Authored by Yulong Ao on Aug 15, 2022

* [Auto Parallel] Move the distributed info from python to c++

* [Auto Parallel] Add dist_attrs for VarDesc and OpDesc

* [Auto Parallel] Add the lost file

* [Auto Parallel] Make the dist attr be unique_ptr

* [Auto Parallel] Add the proto conversion

* [Auto Parallel] Improve the proto support

* [Auto Parallel] Fix the bugs for adding a device or a link

* [Auto Parallel] Add the C++ ProcessMesh and DistributedMapper

* [Auto Parallel] Improve the impl of these dist attrs

* [Auto Parallel] Pybind11 ProcessMesh and DeviceMesh

* [Auto Parallel] Fix the unittest problem

* [Auto Parallel] Explicitly add the src file for auto_parallel target

* [Auto Parallel] Add the proto dependency explicitly

* [Auto Parallel] Fix the cmake bug on windows and mac

* [Auto Parallel] Remove the pybind11 header file in process_mesh.h

* [Auto Parallel] Remove unused codes

* [Auto Parallel] Check whether the dist attr is null

* [Auto Parallel] Implement the assign operator for OpDesc explicitly

[XPU] add some collective ops. (#45049) · 7e2a20d5

Authored by houj04 on Aug 15, 2022

* [XPU] add some collective ops. test=kunlun

* use XPUOpTestWrapper. test=kunlun

* skip kl1 for collective ops. fix typo: deivce -> device. test=kunlun

Update FLAGS for standalone executor (#45127) · 566bbf0c
Authored by Ruibiao Chen on Aug 15, 2022
```
* Update FLAGS for standalone executor

* Update FLAGS_FORCE_USE_PROGRAM_CACHE
```

Aug 13, 2022 · 2 commits

Refine program cache (#45005) · e96dae8b

Authored by Leo Chen on Aug 13, 2022

* add cached_serialize_str_

* support program hash

* add sha

* add ut

* use hash_str only for new_exe

* fix attr order

fl-ps: support split sparse params in local & remote (#44864) · 3f5c405f

Authored by ziyoujiyi on Aug 13, 2022

* back fl

* delete ssl cert

* .

* make warning

* .

* unittest paral degree

* solve unittest

* heter & multi cloud comm ready

* .

* .

* fl-ps v1.0

* .

* support N + N mode

* .

* .

* .

* .

* delete print

* .

* .

* .

* .

* fix bug

* .

* .

* fl-ps with coordinator ready

* merge dev

* update message parse only

* update fl client scheduler

* fix bug

* update multithreads sync

* fix ci errors

* update role_maker.py

* update role_maker.py

* fix ci error: windows py import error

* fix ci error: windows py import error

* fix windows ci pylib import error

* add dump fields & params

* try to fix windows import fleet error

* fix ps FLAGS error

* fix logging risk

* fix logging possible risk

* write trainer_desc file

* support split sparse params in local & remote

* fix import paddle.fluid.core.PSGPU

* fix import paddle.fluid.core.PSGPU

* add remote_sparse & local_sparse config

* fix unittest

* fix test_dist_fleet_geo table error

* fix PADDLE_ENFORCE error

* fix other's pr conflict

Aug 12, 2022 · 11 commits

Offload calculations from matmul op to fuse pass (#44941) · acb78ea2

Authored by Sławomir Siwek on Aug 12, 2022

* remove v2_transpose_reshape

* matmul_transpose_reshape

* reshape_transpose_matmul

* Add int8 support for matmulV2

* restore ut

* adjust old ut

* restore parallel UT rules

* remove mkldnn code from base ops

* move enforces to pass

* remove duplicated functions

* delete duplicated enforces

* feedback from review

* add comments to variables

* enable eltwise support

* dynamic attribute

* remove fusepass tests from op test

* remove fuse pass cases from op test

* revert introduction of dynamic attributes

* style
Co-authored-by: wozna <joanna.wozna@intel.com>

[phi] Transfer linear_interp_v2 yaml to phi (#45072) · c737232f

Authored by HongyuJia on Aug 12, 2022

* support optional<vector<Tensor>> in yaml and eager

* delete useless comments in eager_gen.py

* fix api_base.py support optional<vector<TTensor>>

* python_c_gen.py support optional<vector<tensor>>

* transfer linear_interp_v2 yaml from fluid to phi

* fix op_test typo error

* change linear_interp_v2 testcase

* fix args in final_state_linear_interp_v2

* fix zeropad2d typo. test=document_fix

[Auto Parallel] Update reshard for auto search (#45002) · 8624f3b1

Authored by caozhou on Aug 12, 2022

* update reshard for auto search

* fix unittest bug

* update dist tensor

* update reshard output

* fix unittests bug

* merge develop

Add Quant Row&Column ParallelLinear (#44869) · 236ad4fc
Authored by Chang Xu on Aug 12, 2022

Fix concat and tile attribute for 2ONNX (#44658) · bb8203cd
Authored by Aurelius84 on Aug 12, 2022
```
* Fix concat and tile attribute for ONNX

* disable unittest
```

[Auto Parallel] Data Parallel Optimization Pass 1 (#44882) · 7aeec4ed
Authored by JZ-LIANG on Aug 12, 2022
```
* bugfix

* remove scaling

* support rescale_grad opt
```

[Eager] Support more final_state code (#44986) · cf17ae8a

Authored by Jiabin Yang on Aug 12, 2022

* support more final_state code

* support more final_state code

* fix api error

* fix norm error

* fix pool3d error

* revert pool3d and max_pool_3d_adaptive

* fix code check error

* fix norm problem

change default log level (#45093) · 34234282
Authored by hong on Aug 12, 2022

[Auto Parallel] Pybind ProcessMesh and DeviceMesh (#45013) · 5bf3dec9

Authored by Yulong Ao on Aug 12, 2022

* [Auto Parallel] Pybind11 ProcessMesh and DeviceMesh

* [Auto Parallel] Fix the unittest problem

* [Auto Parallel] Explicitly add the src file for auto_parallel target

* [Auto Parallel] Add the proto dependency explicitly

* [Auto Parallel] Fix the cmake bug on windows and mac

* [Auto Parallel] Remove the pybind11 header file in process_mesh.h

enhance grid_sampler to support 3d input (#45015) · 1773fbba
Authored by duanyanhui on Aug 12, 2022
```
* enhance grid_sampler to support 3d input
```

[geometric] Add paddle.geometric.send_ue_recv API (#43174) · 615b15a3

Authored by Siming Dai on Aug 12, 2022

* add init file

* add op definition and infermeta

* add kernel definition funcs

* add broadcast infer shape

* add gpu forward kernel

* delete SUB and DIV

* add x_grad

* add template

* add e_grad for min and max

* fix small bug

* temp commit

* temp commit

* add e_grad for sum and mean

* fix some compile bug

* fix compile bugs

* fix compile problem

* add sum forward unittest

* fix broadcast error, add kernel sig, register e_grad, change unit test

* fix grad

* add temp grad fix

* temp commit

* add min max unittest

* add max, min unittest, fix mul bug

* add cpu forward sum and mean

* add forward min max, fix mean unittest

* add cpu backward min max

* fix code-style

* add backward sum mean

* fix rocm ci

* set unittest timeout

* fix bug of x broadcast to e, gpu grad

* fix bug of x broadcast to e, cpu grad

* rename BOOST_GET_CONST macro

* fix rocm ci

* mv graph_send_e_recv to graph_send_ue_recv

* move out_size to IntArray

* add eager op test

* fix max pool type bug, add unittest for api

* revise api doc

* add fp16 for atomic min and max, add unittest

* add unittest

* add fp16 support for graph_send_recv

* fix unittest fp16 bug

* change OutSizeTensor to Out_size

* move E to Y

* add copyright, fix comment

* review code

* fix thread block size

* fix thread block size

* change api attribute name: pool_type to reduce_op, compute_type to message_op

* change api attribute name, move pool_type to reduce_op, move compute_type to message_op

Aug 11, 2022 · 4 commits
- [XPU] fix typo in unit tests. (#45081) · 9b35f035
  Authored by houj04 on Aug 11, 2022
- make affine_grid_op support 5d input_dim on cpu and gpu (#45012) · 7812522c
  Authored by carryyu on Aug 11, 2022
```
* make affine_grid_op support 5d_input on cpu and gpu
```
- add new ptq method ptf (#44246) · f4bc69ec
  Authored by handiz on Aug 11, 2022
```
* add new ptq method ptf

* add post training quantization mobilenetv1 test for ptf

* add post training quantization mobilenetv1 test for ptf test=allcases
```
- Disable ExecutionStrategy UTs for standalone executor (#45067) · f901f020
  Authored by Ruibiao Chen on Aug 11, 2022

PaddlePaddle / Paddle: synced successfully about 1 year ago
