Commits · 3f2a665a0a23a0b0e0d472ef0699322e5e80a9a8 · PaddlePaddle / Paddle

30 Nov, 2021 · 4 commits

support data_format='NHWC' for prelu channel mode (#37019) · 3f2a665a
Committed by Guoxia Wang on Nov 30, 2021
```
* support data_format='NHWC' for prelu channel mode
```

[Auto Parallel] Do the physical mapping between the process graph and the cluster graph (#37094) · b0dff05d

Committed by Yulong Ao on Nov 30, 2021

* [Auto Parallel]  Add the unified cluster representation

* [Auto Parallel] Add the graph class for physical mapping

* [Auto Parallel] Add the simple physical mapper

* Set the timeout of the mapper

* Merge the upstream develop unittests cmake files

* Fix a bug of the process group

* Remove mapper unittest from platforms which is not GPU

* Move the instantiation of process group after resharding

* Add the local id for devices

* Update the rank mapping format

* Add some comments

* Remove the related files about mapping

* Update the unittest for auto mapping

* Remove unused rank_mapping unittest

* Improve the unittest coverage

* Improve the unittest coverage

[Fleet_Executor] Passing runtime scope and place (#37603) · 87e65a99
Committed by LiYuRio on Nov 30, 2021

Fix test calc gradient (#37672) · a0631364
Committed by xiongkun on Nov 30, 2021
```
* add scope_guard

* 1. fix control flow cases 2. fix calc_gradient
```

29 Nov, 2021 · 7 commits

add expand_v2/expand_as_v2 for kunlun (#37592) · dae4e7f2

Committed by TTerror on Nov 29, 2021

* add expand_v2/expand_as_v2 for kunlun

* update expand_as_v2

* update expand_as_v2

* support float16/bool

* update xpu.cmake


[AMP] For `amp.decorate()` optimizers set to None is ok (#37541) · 2bb3f0b5

Committed by zhangbo9674 on Nov 29, 2021

* amp.decorate optimizers set to None is ok

* refine unittest

* add unittest and refine example code

* refine unittest

[fleet_executor] Hold the carrier while running for one micro step. (#37605) · 74ca89ef
Committed by Yuang Liu on Nov 29, 2021


[New features] Support batch_jacobian and batch_hessian (#37547) · 4d24d352

Committed by Weilong Wu on Nov 29, 2021

* native commit for triple grad of sigmoid

* Updated unittests files

* init functional jacobian api

* Updated trible_test func

* Updated gradient_checker & test_script

* finish test with dtype float32

* add float64 test case

* polish code

* use atol=1e-5 with dtype float64

* fix for ci

* set timeout for test_jacobian

* fix dygraph grad to support high differential

* polish API docstring

* Updated gradient checker and some related files

* fix double grad strip error for high differential

* fix double grad strip error for high differential

* Add Sigmoid triple grad tests

* fix dygraph double grad dtype error when calling for high differential scenario

* Updated triple grad tests func

* Use np.random to initialize ddx

* Updated triple_grad_check func

* add todo for gradient checker and refine some comments

* remove additional code

* add test for warning in backward.py

* format python code

* support multi input in triple gradient checker

* Add matmul triple grad kernel

* Updated comments of TODO

* Supported some special tests

* Change code-format to follow CI std

* Updated gradient_checker.py

* Fix conflicts

* Removed unnecessary printing log

* Change code style to follow CI std

* support batch in jacobian and hessian

* add batch jacobian and batch hessian

* Add batch_jacobian test, draft version

* [New features] Add elementwise_mul triple grad kernel (#37152)

* Add elementwise_mul triple grad kernel

* Removed InplaceInferer and polished code

* Add numerical_batch_jacobian,numerical_batch_hessian and tests

* Support batch_jacobian and batch_numerical

* Use pre-commit to check code format

* Update doc, polish code, add unit test

* Reset the TIMEOUT properties of test_jacobian to pass CI

Co-authored-by: levi131 <limaolin01@baidu.com>
Co-authored-by: Jiabin Yang <360788950@qq.com>

fix_InternalStorage (#37568) · d0a89744
Committed by Baibaifan on Nov 29, 2021

Complement the collective communication English docs (#37030) · 51804e4d
Committed by 李季 on Nov 29, 2021
```
Co-authored-by: Chen Long <1300851984@qq.com>
```
[ut] Update skip concept to ignore. (#37635) · ae544242
Committed by Wilber on Nov 29, 2021


27 Nov, 2021 · 2 commits

[Auto Parallel] Add the graph class for the process and cluster (#37482) · 48faf638

Committed by Yulong Ao on Nov 27, 2021

* [Auto Parallel]  Add the unified cluster representation

* [Auto Parallel] Add the graph class for physical mapping

* [Auto Parallel] Add the simple physical mapper

* Set the timeout of the mapper

* Merge the upstream develop unittests cmake files

* Fix a bug of the process group

* Remove mapper unittest from platforms which is not GPU

* Move the instantiation of process group after resharding

* Add the local id for devices

* Update the rank mapping format

* Add some comments

* Remove the related files about mapping

* Remove unused rank_mapping unittest

* Improve the unittest coverage

fix save inference model conditional op (#37579) · fd41456f
Committed by JingZhuangzhuang on Nov 27, 2021


26 Nov, 2021 · 7 commits

fix data parallel when VOCAB var in program (#37543) · e05540f7
Committed by Steffy-zxf on Nov 26, 2021
```
* fix data parallel when VOCAB var in program
```
use prune program in new executor, test=develop (#37591) · 4201c94a
Committed by wanghuancoder on Nov 26, 2021

upgrade async distributed training in pscore (#37515) · 74605fc2
Committed by zhaocaibei123 on Nov 26, 2021
```
* test

* test

* rm test

* update

* update

* update

* add unittest

* update

* update save
```
Fix bugs when bias add none in static graph for fused_attention op. (#37566) · 097e098d
Committed by Li Min on Nov 26, 2021
```
* Fix bugs when bias is none for static graph for fused_attention op.
```

Added interface reset_grad_inplace_version (#37573) · dcb91fd7

Committed by Zhanlue Yang on Nov 26, 2021

reset_inplace_version removes all inplace-related records from VarBase/VariableWrapper. Its essential purpose is to let you use an inplace operation as if it were the non-inplace version, which will of course cause unexpected consequences if not used with care.

This is essentially a hack interface to satisfy one specific request.
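
The idea of an inplace version record can be illustrated with a toy model. The sketch below is plain Python and does not use Paddle; `ToyTensor`, `SavedForBackward`, and their methods are hypothetical names invented for illustration, and the real VarBase/VariableWrapper bookkeeping is likely to differ. It only shows why clearing the record lets a stale-input check pass, and why that can silently yield a different value.

```python
class ToyTensor:
    """Hypothetical stand-in for a tensor that counts inplace writes."""

    def __init__(self, value):
        self.value = value
        self.inplace_version = 0  # bumped on every inplace write

    def add_(self, delta):
        """Inplace add: mutates the value and records the write."""
        self.value += delta
        self.inplace_version += 1
        return self

    def reset_inplace_version(self):
        """Forget all recorded inplace writes (the 'hack' in this toy)."""
        self.inplace_version = 0


class SavedForBackward:
    """Remembers a tensor and the version it had when it was saved."""

    def __init__(self, tensor):
        self.tensor = tensor
        self.saved_version = tensor.inplace_version

    def load(self):
        # Refuse to hand back a tensor that was modified after saving.
        if self.tensor.inplace_version != self.saved_version:
            raise RuntimeError("tensor was modified inplace after being saved")
        return self.tensor.value


x = ToyTensor(1.0)
saved = SavedForBackward(x)  # "forward" saves x at version 0
x.add_(2.0)                  # inplace write, version becomes 1

try:
    saved.load()
    check_failed = False
except RuntimeError:
    check_failed = True      # the version check catches the modified input

x.reset_inplace_version()          # clear the record
value_after_reset = saved.load()   # check passes, but the value is the modified one
```

In this toy, the reset makes the check pass while the loaded value is the mutated one, which is the kind of unexpected consequence the commit message warns about.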


TDM2 (#37044) · 4826167c

Committed by wangzhen38 on Nov 26, 2021

* add tdm sample

* add tdm sample in c++

* update tdm sample

* modify sample count

* fix conflict

* add set_date

* fix cmake error

* fix bug of proto

* update index_dataset proto

* update cmake

* fix error cmake

* fix cmake mkldnn

* fix cmake proto

* update cmake proto

* update cmake

* update rec

* update dataset

* update dataset

* update dataset

* update dataset

* update dataset

* update coverage

* update ci

* goback4

* fix npu ci

* add xxhash dep


Fix dropout static when axis != None (#37223) · f25fda37

Committed by smallv0221 on Nov 26, 2021

* fix dropout static when axis != None

* update dropout test

* add dropout test

* fix test

* Update test_dropout_op.py

* Update test_dropout_op.py

* fix testcase

* fix testcase

* Update test_dropout_op.py

* fix testcase

* fix testcase

* optimize perf

* add new test

* fix testcase


25 Nov, 2021 · 10 commits
- [NPU] add int64 support for argsort op (#37434) · 3e088aaf
  Committed by furnace on Nov 25, 2021
```
* [NPU] add int64 support for argsort op

* [NPU] delete debug codes
```
- [NPU] add NPU kernel for prior_box op (#37519) · 1127fecb
  Committed by furnace on Nov 25, 2021
```
* [NPU] add NPU kernel for prior_box op

* [NPU] delete debug codes
```
- Add InternalStorage and add ShardingOptimizerStage2 (#37489) · 5af64631
  Committed by Baibaifan on Nov 25, 2021
  
- [cherry-pick 2.2 heterps]bug fix for launch_utils.py (#37521) · 8bb1038c
  Committed by zmx on Nov 25, 2021
```
* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* [heterps]bug fix for _run_from_dataset

* fix heter_server.cc

* fix launch_utils.py

* fix heter_section_worker.cc

* fix. test=develop

* fix. test=develop
```
- add new API paddle.nn.initializer.Dirac (#37389) · bbb9b28a
  Committed by zhouweiwei2014 on Nov 25, 2021
```
* add new API paddle.nn.initializer.Dirac

* fix doc
```
- [new-exec] fix program cache key (#37500) · e64829e2
  Committed by Leo Chen on Nov 25, 2021
```
* fix program cache key

* bug fix

* fix cache problem

* remove unused code
```
- Export task node to python (#37509) · 3f815e76
  Committed by LiYuRio on Nov 25, 2021
  
- Hot fix for dataloader thread error because of pten (#37520) · ed7a21de
  Committed by Chen Weihang on Nov 24, 2021
```
* hot fix for dataloader thread error

* polish comment

* fix type in comment, test=document_fix
```
- [PaddlePaddle Hackathon] 6. Add ZeroPad2d to Paddle (#37151) · 81861f69
  Committed by Matsumoto GAO on Nov 25, 2021
```
* add zeropad2d v0.1

* add zeropad2d v0.2

* add zeropad2d v0.3

* add zeropad2d v0.3

* add zeropad2d v0.3

* add zeropad2d v0.4

* add zeropad2d v0.5

* add zeropad2d v0.5 codestyle

* add zeropad2d v0.5 codestyle

* add zeropad2d v0.6 functional

* add zeropad2d v0.6 functional

* add zeropad2d v0.6 functional
```
- [new-exec] skip compiled program (#37512) · 171da2ce
  Committed by Leo Chen on Nov 25, 2021
```
* skip compiled program

* fix ut
```
24 Nov, 2021 · 6 commits

[GpuPs]pybind core (#37287) · d69daed1
Committed by Thunderbrook on Nov 24, 2021
```
* pybind core

* set use psgpu
```
fix range op (#37486) · d5c51e62
Committed by Jiawei Wang on Nov 24, 2021


[Paddle-Inference] Matmul_int8_convert: tensor*tensor (#37285) · 16590799

Committed by Wangzheee on Nov 24, 2021

* matmul_convert_int8

* matmul_convert_int8

* matmulconvert_int8

* Matmul_int8_convert: tensor*tensor

* Matmul_int8_convert: tensor*tensor

* Matmul_int8_convert: tensor*tensor

Adapt auto search (#37490) · 025053b4
Committed by zhaoyingli on Nov 24, 2021
```
* adapt auto search

* adapt auto search

* fix matmulv2 compatible

* del debug
```
[Auto Parallel] Add the unified cluster representation (#37091) · db727551
Committed by Yulong Ao on Nov 24, 2021
```
* [Auto Parallel]  Add the unified cluster representation

* Add the local id for devices

* Add some comments
```

[Dy2stat]support pure fp16 for dy2stat (#36944) · 52edad6a

Committed by 0x45f on Nov 24, 2021

* run dy2stat pure fp16 in Linear model

* no use self._pure_fp16_inputs

* add test and fix Adam error in dy2stat pure fp16 training

* use paddle.optimizer.Adam

* run test in gpu

* change test time for CI

* enlarge atol for test_resnet_pure_fp16

* refine code and enlarge atol

* make custom_white_list and custom_black_list take effect for AMP and pure fp16

* check tracer is not None

* use default atol

* change filter_size

* change atol and add some NOTE


23 Nov, 2021 · 4 commits

fix inplace bug when the first grad_var(loss_grad) is inplace var (#37420) · ee1e1642
Committed by pangyoki on Nov 23, 2021
```
* fix inplace bug

* fix custom grad input error

* add unittest

* fix inplace bug
```
Add support bias is none for fused_attention op. (#37411) · 1a8786cf
Committed by Li Min on Nov 23, 2021
```
Add support for bias is none for fused_attention op.
```

Speedup download uncompress function (#37311) · 467099f0

Committed by CtfGo on Nov 23, 2021

`paddle.utils.download`: call `extractall` on tar/zip compressed files to speed up decompression when the archive contains many files.

--- result of decompression speed comparison ---
1. dataset: https://paddlenlp.bj.bcebos.com/datasets/cnn_dailymail/cnn_stories.tgz, decompression time: 5m50s vs 20s
2. dataset: https://paddlenlp.bj.bcebos.com/datasets/cnn_dailymail/dailymail_stories.tgz, decompression time: 33m20s vs 47s
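
The change described above can be sketched with the Python standard library alone. This is a minimal illustration, not the code from `paddle.utils.download`: the commit text does not show the previous implementation, so the per-member loop in `extract_one_by_one` is an assumed baseline, and all function names and the in-memory sample archive are hypothetical.

```python
import io
import tarfile
import tempfile
from pathlib import Path

# Newer Python versions accept an extraction filter; pass it only when available.
EXTRACT_KWARGS = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}


def build_sample_tar(n_files: int) -> bytes:
    """Build a small in-memory .tgz with n_files text members (illustrative data)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for i in range(n_files):
            data = f"story {i}\n".encode()
            info = tarfile.TarInfo(name=f"stories/{i}.txt")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def extract_one_by_one(archive: bytes, dest: str) -> int:
    """Assumed baseline: extract each member with a separate call, by name."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        names = tar.getnames()
        for name in names:
            tar.extract(name, path=dest, **EXTRACT_KWARGS)
    return len(names)


def extract_all(archive: bytes, dest: str) -> int:
    """Approach named in the commit: a single extractall call."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        tar.extractall(path=dest, **EXTRACT_KWARGS)
        return len(tar.getnames())


def demo(n_files: int = 50) -> bool:
    """Check that both approaches produce the same set of files."""
    archive = build_sample_tar(n_files)
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        extract_one_by_one(archive, a)
        extract_all(archive, b)
        files_a = sorted(p.name for p in Path(a, "stories").iterdir())
        files_b = sorted(p.name for p in Path(b, "stories").iterdir())
    return files_a == files_b and len(files_b) == n_files


print(demo())
```

To compare speed on a real archive, both functions could be wrapped with `time.perf_counter()`; the sample archive here is too small to reproduce the timings reported in the commit message.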

[new-exec] skip compiled program with places > 1 (#37457) · 2dfcdf21
Committed by Leo Chen on Nov 23, 2021
```
* skip compiled program with places > 1

* fix corner case and add ut
```

PaddlePaddle / Paddle · synced successfully about 1 year ago