Commits · d31d597fb73376c5f57c7007215cedcafda8e67f · PaddlePaddle / Paddle

25 Nov, 2021 1 commit

Cherry-pick PR 37420, fix inplace bug when the first grad_var(loss_grad) is... · d31d597f

Committed by pangyoki on Nov 25, 2021

Cherry-pick PR 37420, fix inplace bug when the first grad_var(loss_grad) is inplace var (#37420) (#37488)

Fix inplace bug; cherry-pick of PR #37420

d31d597f

16 Nov, 2021 1 commit

[cherry-pick-2.2.1]fix fused_transformer_encoder_layer bug (#37229) · 36dd295e

Committed by zhangkaihuo on Nov 16, 2021

Fixes several issues found while fine-tuning with fused_transformer_encoder_layer:

    fused_attention_op: add support for attn_mask=None: PR
    pre_layer_norm handling issue: PR
    parameter handling that led to incorrect computation: PR
    add_bias computation error: PR
    add support for pure fp16: PR

36dd295e

27 Oct, 2021 1 commit
- [cherry-pick]Fused transformer encoder layer and fused feedforward layer #36776 · e1b5b1da
  Committed by zhangkaihuo on Oct 27, 2021
```
This PR contains the layer-level code for fused_transformer: the FusedFeedForward layer and the FusedTransformerEncoderLayer.
```
  e1b5b1da
26 Oct, 2021 4 commits

[Cherry-pick] Add FasterTokenizer Operator (#36716) · edff5b79

Committed by Steffy-zxf on Oct 26, 2021

* Add FasterTokenizer Operator (#34491)

Add tokenizer-related functionality for Transformer models so that training and prediction process text in the same way.

* support the text string as an input Tensor
* support the "VOCAB" unordered_map<wstring, int> as an input Tensor to look up tokens
* Tokenizer used for BERT. This tokenizer applies an end-to-end, text string to wordpiece tokenization.
* It first applies basic tokenization, followed by wordpiece tokenization.

* optimize fast tokenizer

* remove const_cast
Co-authored-by: zhoushunjie <zhoushunjie@baidu.com>
Co-authored-by: wawltor <fangzeyang0904@hotmail.com>

edff5b79
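
The flow described in this commit message (basic tokenization first, then wordpiece tokenization with a vocabulary lookup) can be sketched in plain Python. This is only an illustration of the general technique, not the operator's C++ code: the function names, the toy `VOCAB`, the `##` continuation prefix and the `[UNK]` token follow common BERT conventions and are assumptions here.

```python
import re


def basic_tokenize(text):
    """Lowercase the text, then split it into words and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Split one word into vocabulary pieces, greedy longest match first."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is found in the vocabulary.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # marks a piece that continues a word
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            # No piece fits at this position: the whole word is unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


def tokenize(text, vocab):
    """End-to-end: text string -> list of token ids."""
    ids = []
    for word in basic_tokenize(text):
        for piece in wordpiece_tokenize(word, vocab):
            ids.append(vocab[piece])
    return ids


# Toy vocabulary (hypothetical), standing in for the "VOCAB" map.
VOCAB = {"[UNK]": 0, "un": 1, "##aff": 2, "##able": 3, "paddle": 4, "!": 5}

print(tokenize("Unaffable Paddle!", VOCAB))
```

Running the same function at training and at prediction time is what keeps the two paths consistent; the operator achieves this by moving the step into the model graph.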

Support various length support for SelectedRows in GLOO::AllGather (#36637) (#36722) · fced11bd

Committed by xiongkun on Oct 26, 2021

Support various length support for SelectedRows in GLOO::AllGather (#36637)

    For CPU parallel training with Gloo, add variable-length support for SelectedRows

fced11bd

[Amp] refine code of amp level (#36362) (#36726) · 1ee4fc32
Committed by Leo Chen on Oct 26, 2021
```
* refine amp level

* fix typo

* update tracer._amp_level
```
1ee4fc32

[cherry-pick] Support CPU Parallel in DataParallel Interface by GLOO to speed... · beb920cd

Committed by xiongkun on Oct 26, 2021

[cherry-pick] Support CPU Parallel in DataParallel Interface by GLOO to speed up training (#35745) (#36605)

* User specified backend (#35745)

* remove tensordot

beb920cd

18 Sep, 2021 1 commit
- fix flags dep (#35859) · 6d45d8da
  Committed by Zeng Jinle on Sep 18, 2021
  
  6d45d8da
17 Sep, 2021 2 commits

[AMP] Support pure fp16 training mode for dygraph (#35521) · adaeee4d

Committed by zhangbo9674 on Sep 17, 2021

* add pure fp16 major function in auto_cast & tracer

* support master weight in dygraph for pure fp16

* check mix dtype of fp16&fp32 for check_finite_and_unscale op

* change pure fp16 function name

* fix some bugs in auto_cast

* refine auto_cast interface logic

* add param _casted_by_pure_fp16 for class Layer

* support state_dict hook for save model by user appointed dtype in pure_fp16_decorator

* refine pure_fp16_decorator as decorator

* add unittest

* add comment

* add comment

* support recompute

* add comment for auto_cast and decorator

* support to_static_state_dict for paddle.jit.save

* remove the limit on the number of models and optimizers

* add lookup_table in black_list

* fix momentum and layer state_dict

* fix bug in layer state_dict

* fix bug in layer state_dict_helper

* refine unittest

* refine test_momentun_op

* refine interface and some code

* refine amp_decorator interface

* refine pure fp16 interface

* refine master weight interface

adaeee4d

Make flag adding easier (#35823) · 2c781455

Committed by Zeng Jinle on Sep 17, 2021

* make flag setter easier

* update

* rename macro name

* fix bug of public/writable

* update to pass CI

* polish

* fix CPU link error

2c781455

14 Sep, 2021 1 commit

Add solutions to PyLayer which is unsupported in DataParallel (#35401) · d483b8c0

Committed by Haohongxiang on Sep 14, 2021

* Add solutions to PyLayer which is unsupported in DataParallel

* modify note format for parallel.py

* modify docs of dataparallel

* add docs of dp with pylayer

* modify docs format

* modify example format

* change example of dp with pylayer

* add unittest for dp with pylayer

* modify ut

* merge latest codes

* update

* modify for CI-Coverage

* modify text-indent

d483b8c0

10 Sep, 2021 1 commit
- [NPU] support gradient_accumulator (#35044) · 0b6623d7
  Committed by ronnywang on Sep 10, 2021
  
  0b6623d7
08 Sep, 2021 2 commits

add backward inplace for dygraph (#35412) · 0cb413d3
Committed by Leo Chen on Sep 08, 2021
```
* add backward inplace for dygraph

* fix bug

* support gradient accumulation
```
0cb413d3

Integrate GLOOParallelContext to support Multi-CPU Core for Dygraph DataParallel (#35154) · 51cc73f0

Committed by xiongkun on Sep 08, 2021

* can pass the fake test

* add files

* modify cmake to pass windows-ci

* for ci pass

* WITH_GLOO=ON

* for pass coverage test

* add cpuonly testcase

* add

* disable nccl when compile with cuda

* change python version in cpuonly

* add backend argument

* add required gpu

* add required:gpu

51cc73f0

01 Sep, 2021 1 commit
- support KL label smooth (#35177) · 7ca28bb6
  Committed by QingshuChen on Sep 01, 2021
```
* support KL label smooth

* update UT for KL label_smooth
```
  7ca28bb6
24 Aug, 2021 1 commit

Add no_sync in data parallel for dynamic graph (#34740) · b09f4d7f

Committed by Haohongxiang on Aug 24, 2021

* Add no_sync in data parallel for dynamic graph

* modify UT of no_sync

* delete test_parallel_dygraph_dataparallel_no_sync.py

* add test_parallel_dygraph_no_sync.py

* modify run_trainer_with_spawn in UTs

* Add UT of complex control flow in no_sync

* add specific descriptions and notes for no_sync

* check code style

* modify UT's TIMEOUT in CMakeLists.txt

b09f4d7f

12 Aug, 2021 1 commit
- fix set_grad_ivar bug of Tensor.backward (#34819) · dffb0b22
  Committed by zhouweiwei2014 on Aug 12, 2021
  
  dffb0b22
06 Aug, 2021 1 commit
- support kunlun black list and add kl1 op (#34605) · 21beef91
  Committed by QingshuChen on Aug 06, 2021
```
* support kunlun black list and add kl1 op

* xpu_op_list add device_context dependence
```
  21beef91
05 Aug, 2021 1 commit
- fix dygraph has_grad (#34649) · 68377b44
  Committed by Zeng Jinle on Aug 05, 2021
  
  68377b44
04 Aug, 2021 1 commit

Fix backward bug (#34582) · a7c38367

Committed by chentianyu03 on Aug 04, 2021

* fix backward bug

* format code style

* add test case for grad tensor accumulator

a7c38367

03 Aug, 2021 3 commits
- [hybrid] remove the using of global ring in hybrid parallel (#34525) · 56b7ebbc
  Committed by WangXi on Aug 03, 2021
  
  56b7ebbc
- support Kunlun2 (#34459) · 2d0f3d9b
  Committed by QingshuChen on Aug 03, 2021
```
* support Kunlun2

* support KL2

* support KL2
```
  2d0f3d9b
- fix attr can not find in mkldnn, test=develop (#34567) · e7dcdb79
  Committed by wanghuancoder on Aug 03, 2021
  
  e7dcdb79
09 Jul, 2021 1 commit
- fix double grad hang bug (#34023) · 8768ffb7
  Committed by Zeng Jinle on Jul 09, 2021
  
  8768ffb7
02 Jul, 2021 1 commit
- Enhance npu/xpu log when kernel fallback to cpu, and fix cmake warnings. (#33927) · a74e01ab
  Committed by houj04 on Jul 02, 2021
  
  a74e01ab
30 Jun, 2021 1 commit
- [NPU] support set_device (#33815) · 8225a6a1
  Committed by houj04 on Jun 30, 2021
```
* support set_device for NPU.

* minor update doc and add more unit test.
```
  8225a6a1
29 Jun, 2021 1 commit
- xpu support amp (#33809) · 4d4fb660
  Committed by taixiurong on Jun 29, 2021
  
  4d4fb660
24 Jun, 2021 1 commit

[NPU] support dygraph execution on npu place(#33579) · 6aea6be2

Committed by houj04 on Jun 24, 2021

* in NPU environment, use CPUPlace for missing operators.

* in NPU environment, use CPUPlace for missing operators.

* fix TensorCopy bug and add unit test.

* fix code style.

* add more unit tests.

6aea6be2

23 Jun, 2021 1 commit

optimize attr default value (#33357) · 5d2eb678

Committed by wanghuancoder on Jun 23, 2021

* optimize attr default value, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* fix bug in AttrReader, test=develop

* fix bug, test=develop

* fix double_grad, test=develop

* refine, test=develop

* refine, test=develop

* fix checker null, test=develop

* for test, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

5d2eb678

21 Jun, 2021 1 commit
- Combine amp and qat (#33484) · f88af205
  Committed by cc on Jun 21, 2021
```
* Combine amp and qat
* add unit test
```
  f88af205
10 Jun, 2021 1 commit
- [Debug] Add nan& inf check FLAG for dygraph (#32635) · df4a978c
  Committed by Chen Weihang on Jun 10, 2021
```
* add nan and inf check for dygraph

* add unittest for dygraph

* revert error change
```
  df4a978c
08 Jun, 2021 1 commit
- replace 'InnerSetOverridedStopGradient' with 'SetOverridedStopGradient'. (#33303) · 37385f63
  Committed by WeiXin on Jun 08, 2021
```
* replace 'InnerSetOverridedStopGradient' with 'SetOverridedStopGradient'.

* improve coverage.

* polish error message.
```
  37385f63
26 May, 2021 1 commit
- modify matmul Op to complex template types (#33130) · 6c07cd7e
  Committed by chentianyu03 on May 26, 2021
```
* modify matmul Op to complex template types

* remove complex64/128 header files
```
  6c07cd7e
12 May, 2021 1 commit
- [NPU] Support npu pinned allocator and manage Tensor on NPUPinnedPlace (#32840) · 6b3bb796
  Committed by liym27 on May 12, 2021
  
  6b3bb796
11 May, 2021 1 commit
- Support control flow in DataParallel (#32826) · 298f210d
  Committed by ShenLiang on May 11, 2021
```
* fix find_unused_parameters default value
```
  298f210d
10 May, 2021 1 commit
- Dynamic amp support sync_batch_norm op (#32770) · 23ab01e3
  Committed by Roc on May 10, 2021
  
  23ab01e3
01 May, 2021 1 commit
- fix traverse graph in reducer (#32715) · f4a3f85b
  Committed by ShenLiang on May 01, 2021
  
  f4a3f85b
30 Apr, 2021 2 commits

pylayer_op:release context after compute. (#32707) · 3cc11a3d
Committed by WeiXin on Apr 30, 2021

3cc11a3d

Add 12 inplace APIs including auto generated (#32573) · 308073de

Committed by pangyoki on Apr 30, 2021

* add relu6_ hardsigmoid_ leaky_relu_ Inplace APIs

* add softmax_with_cross_entropy_ Inplace API

* add clip_ scale_ add_ subtract_ Inplace APIs

* add wlist

* fix parameter of scale api

* add add_n_ Inplace API and remove log_ Inplace API

* fix elementwise_add_ and elementwise_sub_ broadcast problem

* elementwise inplace APIs report an error before running the op

* use broadcast_shape in elementwise inplace op

* add 8 inplace APIs that are auto-generated

* add unittest for all inplace apis

* add decorator for inplace apis in static mode

* fix windows blas fail of exp inplace api, change array_equal to allclose

* add flatten inplace api

* add flatten unittest

* fix flatten unittest

* add decorator

* fix grad.numpy in test_pylayer_op

* drop support for softmax_with_cross_entropy_

* add test_inplace_softmax_with_cross_entropy to static_mode_white_list

* delete __all__ in inplace_utils

* delete activation inplace function and add Tensor.inplace_func

* change paddle.inplace_ to Tensor.inplace_

* fix minor problem

* add paddle in inplace_utils

308073de
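
The broadcast-related items in this commit message rest on one rule: an in-place elementwise op writes its result back into `x`, so broadcasting `x` with `y` must yield exactly the shape of `x`, and a mismatch is best reported before the op runs. The sketch below illustrates that check under NumPy-style broadcasting rules. Both helpers are pure-Python stand-ins written for this example (assumptions), not Paddle's implementation.

```python
from itertools import zip_longest


def broadcast_shape(x_shape, y_shape):
    """Return the broadcast result shape, aligning dimensions from the right."""
    result = []
    for a, b in zip_longest(reversed(x_shape), reversed(y_shape), fillvalue=1):
        if a == b or b == 1:
            result.append(a)
        elif a == 1:
            result.append(b)
        else:
            raise ValueError(
                "shapes %s and %s cannot be broadcast" % (list(x_shape), list(y_shape))
            )
    return list(reversed(result))


def check_inplace_shape(x_shape, y_shape):
    """Fail early if the broadcast output would not fit into x itself."""
    out_shape = broadcast_shape(x_shape, y_shape)
    if out_shape != list(x_shape):
        raise ValueError(
            "broadcast output shape %s differs from in-place tensor shape %s"
            % (out_shape, list(x_shape))
        )
    return out_shape


# y is broadcast up to x: allowed, x keeps its shape.
print(check_inplace_shape([2, 3], [1, 3]))

# x would have to grow from [3] to [2, 3]: rejected before any computation.
try:
    check_inplace_shape([3], [2, 3])
except ValueError as err:
    print("rejected:", err)
```

Checking the shapes up front gives the caller a readable message instead of a failure from inside the kernel.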

29 Apr, 2021 1 commit
- [Kunlun]fix multi xpu dygraph hang, test=kunlun (#32662) · a3e77197
  Committed by liuyuhui on Apr 29, 2021
  
  a3e77197

PaddlePaddle / Paddle: last synced successfully more than 1 year ago
