Commits · 4877bd59448f0c15df7383a14a47e59195d984b7 · PaddlePaddle / Paddle

23 Oct, 2020 · 1 commit

Fix test_parallel_executor_test_while_train Random Failure by Decreasing GPU Usage (#28213) · a1e7fd4a

Committed by Huihuang Zheng on Oct 23, 2020

Recently, test_parallel_executor_test_while_train has been failing randomly on CI. Every failing CI log shows that either NCCL initialization or cuSOLVER initialization failed. Reports online indicate that such failures are usually caused by a shortage of GPU memory. Because those libraries call CUDA APIs directly, the allocator should not be the cause; more likely, something elsewhere in PaddlePaddle is increasing GPU usage.

However, I ran this test 1000 times on both my machine and the CI machine, and neither reproduced the random failure. Perhaps some environmental factor occurs only in the test environment.

To verify the hypothesis that something in PaddlePaddle increases GPU usage, and to fix this CI failure at the same time, I decreased batch_size to see whether the random failure disappears in the test environment.

a1e7fd4a

22 Oct, 2020 · 4 commits

fix strided_slice_op's GetExpectedKernelType (#28192) · efe6e284

Committed by Feiyu Chan on Oct 22, 2020

* fix strided_slice_op's GetExpectedKernelType when input tensor is at CUDAPinnedPlace

* add unittest for tensors in cuda pinned place

* skip test for cuda pinned place on cpu machines

efe6e284

Fix bug of fetch_async_op_handle when fetching the feed variable (#28194) · 1f3be859

Committed by Leo Chen on Oct 22, 2020

* fix bug of fetch_async_op_handle

* revert some changes of test_buffer_shared_memory_reuse_pass

* revert some changes of test_buffer_shared_memory_reuse_pass

1f3be859

[Dy2stat] Refine return mechanism in @to_static (#28116) · e7305160

Committed by Aurelius84 on Oct 22, 2020

* remove some judgement

* fix len(outputs) == 1

e7305160

Update hapi predict interface (#28180) · 68449d19

Committed by LielinJiang on Oct 22, 2020

* update hapi predict interface

* fix code style

* fix docs

* fix docs

* fix docs

* update docs

* fix codes style

* fix unittest

* fix unittest

* fix coverage

68449d19

21 Oct, 2020 · 7 commits

fix test_weight_decay_extend error (#28178) · 5d73bfdb

Committed by Chen Weihang on Oct 21, 2020

5d73bfdb

modify ut cmakefile (#28140) · 4873c20d

Committed by lilong12 on Oct 21, 2020

* modify ut cmakefile, test=develop

4873c20d

2.0rc api rename (#28088) · 7c1aa0d6

Committed by cnn on Oct 21, 2020

* rename manual_seed to seed

* rename xxx1d-->xxx1D, xxx2d-->xxx2D, xxx3d-->xxx3D

* rename manual_seed --> seed

* do not rename .cc, .cu and .h file

* rename manual_seed --> seed

* rename manual_seed --> seed

* rename manual_seed --> seed

* rename manual_seed --> seed

* disable_static on doc example code

* do not change manual_seed on generator

* add enable_static on sample code

* convert python/paddle/fluid/layers/nn.py to bak

* fix typo

* fix code style

* fix seed to manual_seed when call functions of Generator()

* fix bug

7c1aa0d6
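The xxx1d --> xxx1D rename rule in this commit is mechanical enough to express as a regular expression. The sketch below uses a hypothetical helper, `to_2_0_name`, which is not part of Paddle and is not the script used in #28088; it assumes the rule targets CamelCase class names such as `Conv2d` and leaves lowercase names untouched.

```python
import re

# Hypothetical helper, not part of Paddle: illustrates the
# xxx1d -> xxx1D, xxx2d -> xxx2D, xxx3d -> xxx3D rule from #28088.
# A "1d"/"2d"/"3d" suffix is matched when it ends the name or is followed
# by another CamelCase word (e.g. "Conv2dTranspose").
_DIM_SUFFIX = re.compile(r"([123])d(?=[A-Z]|$)")


def to_2_0_name(name: str) -> str:
    """Return the 2.0-style spelling of a CamelCase layer class name."""
    if not name[:1].isupper():
        # Assumption: lowercase (functional) names are left unchanged.
        return name
    return _DIM_SUFFIX.sub(lambda m: m.group(1) + "D", name)


for old in ["Conv2d", "BatchNorm1d", "Conv2dTranspose", "MaxPool3d", "conv2d"]:
    print(old, "->", to_2_0_name(old))
```

The lookahead keeps the match from touching digits that are not a dimension suffix, such as a "2d" in the middle of an unrelated lowercase word.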

fix dynamic decode imperative (#28160) · bc460692

Committed by liu zhengxi on Oct 21, 2020

bc460692

add static_mode_white_list (#28112) · 2d45d9a0

Committed by pangyoki on Oct 21, 2020

* add static_mode_white_list

* add Mac CI static list

* add Win CI white_list

* add Coverage and Py3 CI white_list, add test_unittest

2d45d9a0

support multiclass nms for multi-batch, test=develop (#28154) · 5cd97a1c

Committed by wangguanzhong on Oct 21, 2020

5cd97a1c
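Supporting multiclass NMS for a multi-batch input amounts to running per-class non-maximum suppression independently on each image of the batch. The following is a minimal pure-Python sketch of that idea, not Paddle's kernel: the function names and the nested-list input layout are assumptions, and options that Paddle's `multiclass_nms` exposes, such as `nms_top_k`, `keep_top_k` and `background_label`, are omitted.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        score_threshold: float, iou_threshold: float) -> List[int]:
    """Greedy NMS: keep the best box, drop boxes overlapping it, repeat."""
    order = sorted((i for i, s in enumerate(scores) if s > score_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


def multiclass_nms_batch(batch_boxes, batch_scores,
                         score_threshold=0.05, iou_threshold=0.5):
    """Run per-class NMS independently for every image in the batch.

    batch_boxes[n][i] is box i of image n; batch_scores[n][c][i] is the
    score of box i for class c. Returns, per image, a list of
    (class, box_index, score) tuples sorted by descending score.
    """
    results = []
    for boxes, class_scores in zip(batch_boxes, batch_scores):
        dets = []
        for c, scores in enumerate(class_scores):
            for i in nms(boxes, scores, score_threshold, iou_threshold):
                dets.append((c, i, scores[i]))
        dets.sort(key=lambda d: d[2], reverse=True)
        results.append(dets)
    return results
```

For instance, with an IoU threshold of 0.5, the boxes (0, 0, 10, 10) and (1, 1, 10, 10) overlap with IoU 0.81, so only the higher-scoring one survives for a given class, while detections from different images never suppress each other.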

Add new api: is_tensor (#28111) · 446d184e

Committed by zhulei on Oct 21, 2020

* Add new api: is_tensor

* Add new api: is_tensor

* Add new api: is_tensor

* Add new api: is_tensor

446d184e

20 Oct, 2020 · 8 commits

disable test_dist_mnist_hallreduce, test=develop (#28129) · cd372447

Committed by lilong12 on Oct 20, 2020

cd372447

fix generate_proposal_labels in cascade-rcnn series model, test=develop (#27892) · d1e1f174

Committed by wangguanzhong on Oct 20, 2020

* fix generate_proposal_labels in cascade-rcnn series model, test=develop

* fix example code & unittest, test=develop

* update code from review comments, test=develop

d1e1f174

fill_constant op supports NaN and Inf (#28109) · a911c19e

Committed by Leo Chen on Oct 20, 2020

* fill_constant supports nan and inf

* add ut

a911c19e

[Dy2stat] Refine code of DygraphToStaticAst (#28103) · 135b62a4

Committed by Aurelius84 on Oct 20, 2020

* refine code of DygraphToStaticAst

* add __init__ function

135b62a4

reduce imperative ocr attention config; test=develop (#28079) · 5a589b2f

Committed by hong on Oct 20, 2020

5a589b2f

fix test_group_norm_op_v2.py, test=develop (#28104) · af709240

Committed by zhang wenhui on Oct 20, 2020

af709240

add rois_num for roi_align xpu OP (#28077) · d43f75e4

Committed by Double_V on Oct 20, 2020

* add stack pool2d roi_align xpu op,test=kunlun

* error message opt, test=kunlun

* add xpu unittest,test=kunlun

* skip check grad,test=kunlun

* fix boostget, test=kunlun

* error message opt for XPU, test=kunlun

* add rois_num for roi_align xpu OP, test=develop

d43f75e4

Fix dataloader when stack input data with different type (#27950) · 8327accc

Committed by LielinJiang on Oct 20, 2020

* fix dataloader

8327accc

19 Oct, 2020 · 10 commits

xpu adam op (#28031) · 6f0c3d1f

Committed by yinhaofeng on Oct 19, 2020

* lookup_table_xpu op report errors;test=kunlun

* add adam xpu op;test=kunlun

* reset lookup

* change adam wrong;test=kunlun

6f0c3d1f

Add xpu transpose2 op.test=kunlun (#28086) · a5c95cd5

Committed by TeslaZhao on Oct 19, 2020

a5c95cd5

Fix diag OP bug on Windows Python3.8 · c8d32c8c

Committed by LutaoChu on Oct 19, 2020

Fix diag OP bug on Windows Python3.8, remove the std::min

c8d32c8c

fleet support paddle.optimizer (#28026) · 55098b97

Committed by MRXLT on Oct 19, 2020

fleet support paddle.optimizer

* bug fix

* fix fleet_base

* bug fix

* fix coverage

55098b97

[API 2.0: doc] transfer from paddle.fluid.layers.assign() into creation.py (#27999) · e21b13fb

Committed by liuyuhui on Oct 19, 2020

* transfer from paddle.fluid.layers.assign() into creation.py,test=develop

* fix ut fail,add support for paddle.assign,test=develop

* fix,test=develop

* fix UT coverage,test=coverage

* fix UT fail,test=coverage

* fix doc,test=develop

e21b13fb

Allclose op (#27891) · d4668938

Committed by huangxu96 on Oct 19, 2020

* Still has bugs.

* Fixed allclose_op bug, which cannot deal with some cases of fp64 inputs.

* improved CUDA kernel performance.

* Changed CUDA code.

* Fixed a bug in the CUDA kernel that could not handle large-dimension input, and added a unittest for it.

* Add a test case for float32 input.

d4668938
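The check that allclose performs can be sketched in pure Python. The sketch assumes the NumPy-style rule |x - y| <= atol + rtol * |y| with NumPy's default tolerances (rtol=1e-05, atol=1e-08) and an `equal_nan` switch; it illustrates the semantics only and is not the CUDA kernel from this commit.

```python
import math
from typing import Sequence


def allclose(x: Sequence[float], y: Sequence[float],
             rtol: float = 1e-05, atol: float = 1e-08,
             equal_nan: bool = False) -> bool:
    """NumPy-style allclose on two equal-length float sequences.

    Each pair must satisfy |x - y| <= atol + rtol * |y|; the rule is
    asymmetric because the tolerance scales with |y| only.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    for a, b in zip(x, y):
        if math.isnan(a) or math.isnan(b):
            # NaN is close only to NaN, and only when equal_nan is set.
            if not (equal_nan and math.isnan(a) and math.isnan(b)):
                return False
            continue
        if math.isinf(a) or math.isinf(b):
            # Infinities are close only when they are identical.
            if a != b:
                return False
            continue
        if abs(a - b) > atol + rtol * abs(b):
            return False
    return True
```

Handling infinities separately matters: without that branch, comparing inf with -inf would evaluate inf > inf, which is False, so the pair would wrongly count as close.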

Fix error message of multinomial op (#27946) · 975bd887

Committed by pangyoki on Oct 19, 2020

* fix multinomial doc

* fix multinomial error message

* little doc change

* fix Categorical class doc

* optimize format of error message

* fix CPU Kernel error message format

* fix isinf and isnan error in WindowsOPENBLAS CI

* delete inf and nan

* add manual_seed in sample code

* little error message change

* change error message to InvalidArgument

* add full point for error message and add manual_seed in CPU environment

975bd887

Add truncated_gaussian_random XPU kernel (#27861) · 4c5b779a

Committed by pangyoki on Oct 19, 2020

* Add truncated_gaussian_random_op XPU kernel

* Add truncated_gaussian_random_op XPU kernel, test=kunlun

* little change, test=kunlun

* change boost_get to BOOST_GET_CONST

* change boost_get to BOOST_GET_CONST, test=kunlun

* little change, test=kunlun

* use Generator to generate random number and optimize format, test=kunlun

* little change, test=kunlun

* add TODO, test=kunlun

4c5b779a
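A truncated Gaussian can be sampled by rejection: draw from the normal distribution and redraw any value outside the truncation interval. The sketch below is a hypothetical pure-Python illustration, not the XPU kernel; the two-standard-deviation bound is an assumption (it matches the common truncated-normal convention), and the seeded `random.Random` instance stands in for the Generator that this commit uses.

```python
import random
from typing import List


def truncated_gaussian(n: int, mean: float = 0.0, std: float = 1.0,
                       bound: float = 2.0, seed: int = 0) -> List[float]:
    """Draw n samples from N(mean, std^2) restricted to mean +/- bound*std.

    Rejection sampling: any draw outside the interval is discarded and
    redrawn. bound=2.0 (two standard deviations) is an assumption here.
    """
    if std <= 0 or bound <= 0:
        raise ValueError("std and bound must be positive")
    rng = random.Random(seed)  # seeded generator, so results are reproducible
    lo, hi = mean - bound * std, mean + bound * std
    out: List[float] = []
    while len(out) < n:
        v = rng.gauss(mean, std)
        if lo <= v <= hi:
            out.append(v)
    return out
```

With a two-sigma bound, roughly 95% of normal draws already fall inside the interval, so the loop rarely has to redraw.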

Add gaussian_random XPU kernels (#27853) · 5b8e5001

Committed by pangyoki on Oct 19, 2020

* Add gaussian_random XPU kernels

* commit kunlun, test=kunlun

* new version, test=kunlun

* change boost_get to BOOST_GET_CONST, test=kunlun

* use Generator to generate random number and optimize format, test=kunlun

* add TODO, test=kunlun

5b8e5001

Add uniform_random XPU kernel (#27846) · 74ce0397

Committed by pangyoki on Oct 19, 2020

* support uniform_random op on Baidu Kunlun

* change dtype of attr shape from int to int64_t

* kunlun ci, test=kunlun

* new version, test=kunlun

* change boost_get to BOOST_GET_CONST

* change boost_get to BOOST_GET_CONST, test=kunlun

* use Generator to generate random number and optimize format

* run Kunlun CI, test=kunlun

* add TODO, test=kunlun

74ce0397

18 Oct, 2020 · 1 commit

add cast/concat/assign xpu op (#27911) · 3e956865

Committed by liuyuhui on Oct 18, 2020

* addd

* add cast_op_xpu, test=kunlun

* fix bug for cast_op_xpu,test=kunlun

* add concat_op_xpu, test=kunlun

* solve conflicts, test=kunlun

* fix bug,test=kunlun

* add assign_op_xpu, test=kunlun

* fix bug,test=kunlun

* test=kunlun;test=develop

* fix concat bug,test=kunlun

* fix check_dygraph set in test_concat_op_xpu.py,test=kunlun

* fix error message,test=kunlun
Co-authored-by: mapingshuo <mps2012@yeah.net>

3e956865

17 Oct, 2020 · 2 commits

Add API for pad op. (#27943) · 2ed84a67

Committed by littletomatodonkey on Oct 17, 2020

* add pad apis

* rm pad2d test_layer

* fix code example

2ed84a67

Fix test_lstm unittest failed and Add more unittest (#28029) · 3718b2e7

Committed by Aurelius84 on Oct 17, 2020

* fix test_lstm unittest failed

* add more unittest

* modify cmakelist

* fix judgement

3718b2e7

16 Oct, 2020 · 7 commits

disable test_lstm,test=document_fix (#28030) · bf5325f3

Committed by YUNSHEN XIE on Oct 16, 2020

* disable test_lstm,test=document_fix

* fix some error,test=document_fix

bf5325f3

【paddle.fleet】fleet add _get_applied_meta_list and _get_applied_graph_list (#27952) · fb641c91

Committed by WangXi on Oct 16, 2020

fb641c91

Incorporate cudnn_lstm into LSTM api (#27217) · fa9d3fa5

Committed by Guo Sheng on Oct 16, 2020

* Incorporate cudnn_lstm into LSTM api.
test=develop

* Make coalesce_tensor support alignment optionally.
test=develop

* Reorganize RNN apis. test=develop

* Fix cudnn rnn layout conversion.
test=develop

* Add sequence_length support for RNN cudnn implement.
Add optional init_h and init_c gradient for cudnn_lstm_op.
test=develop

* Use create_parameter for rnn cudnn impl.
test=develop

* Move `self._flat_weight = self.create_parameter()` in RNNBase to main_program.
test=develop

* Update RNN api unittest to use set_device.
test=develop

* Fix set_place for unit tests of RNN apis.
test=develop

* Fix use_align in coalesce_tensor_op.
test=develop

* Adjust RNN apis arguments according to comments.
test=develop

* Polish documents for SimpleRNN apis.
test=develop

* Refine random seed in cudnn_lstm_op.
Expose rnn params from sublayers to RNN.
test=develop

* Fix RNN saving for jit.save.
Refine cudnn_lstm dropout behavior.
test=develop

* Fix doc of GRU. test=develop

* Use ShareDataWith to avoid copying for cudnn_lstm_op test.
test=develop

* Remove updates on cudnn_lstm temporarily.
test=develop

* Use ShareDataWith to avoid copying for cudnn_lstm_op test.
test=develop

* Refine random seed in cudnn_lstm_op.
test=develop

* Fix test_lstm by adjust ConcreteProgram buffer getter.
test=develop

* Use create_parameter instead of create_var for rnn._flat_weight for static graph usage.
test=develop

* Remove W input for cudnn_lstm to pass unused_var_check.
test=develop

* Add test_predict for RNN unit tests coverage.
test=develop

* Fix code style of rnn.
test=develop

* Fix F.rnn usage in rnn.py.
test=develop

fa9d3fa5

fix random failure (#27996) · 78b1026f

Committed by Leo Chen on Oct 16, 2020

78b1026f

[Dy2Stat] Fix Error when generating train_program in eval mode (#27975) · ffcc1175

Committed by Aurelius84 on Oct 16, 2020

* Fix save in eval mode

* remove assert statement

* fix test_partial_program failed

* add more test

* modify back into _train_program

ffcc1175

change paddle.fluid.data to paddle.static.data in sample code (#27992) · 57a9c272

Committed by chentianyu03 on Oct 16, 2020

57a9c272

Fix xpu enforce (#27978) · d330cf66

Committed by Jack Zhou on Oct 16, 2020

* test=kunlun;

Add elementwise XPU OP kernel for KUNLUN core, including (but still cannot process common broadcast):

    * elementwise_div op
    * elementwise_max op
    * elementwise_mul op (with grad op)
    * elementwise_sub op (with grad op)

* 0.05->0.01

* add xpu error message description;test=kunlun

d330cf66
