Commits · 41ab76e55baad03af2e57a160137b555b366812e · BaiXuePrincess / Paddle

02 Jul, 2019 2 commits

supports collective training with programs (#18392) · a873fa84

Committed by Yi Liu on Jul 02, 2019

1. Since the allreduce op has 4 reduce types, we split these four reduce types into four ops.
2. We also refined the collective op code, e.g. we separated the collective op kernel into CPUKernel and CUDAKernel, and removed the device-specific DeviceContext template parameter, since the target DeviceContext is already known.
3. We removed the newly added Collective op role to reduce the complexity of program and graph analysis.

a873fa84
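
As a rough illustration of item 1 in the commit message above, the sketch below simulates an allreduce across ranks in plain Python, with one function per reduce type. It assumes the four types are sum, prod, max, and min (the commit message does not name them), and every function name here is hypothetical rather than a Paddle operator name.

```python
from functools import reduce
import operator


def _allreduce(rank_buffers, combine):
    """Reduce the per-rank buffers element-wise, then give every rank a copy."""
    reduced = [reduce(combine, column) for column in zip(*rank_buffers)]
    return [list(reduced) for _ in rank_buffers]


# One entry point per reduce type, instead of one op with a "type" switch.
def allreduce_sum(rank_buffers):
    return _allreduce(rank_buffers, operator.add)


def allreduce_prod(rank_buffers):
    return _allreduce(rank_buffers, operator.mul)


def allreduce_max(rank_buffers):
    return _allreduce(rank_buffers, max)


def allreduce_min(rank_buffers):
    return _allreduce(rank_buffers, min)


# Three simulated ranks, each holding a two-element buffer.
buffers = [[1, 5], [2, 4], [3, 6]]
print(allreduce_sum(buffers)[0])   # [6, 15]
print(allreduce_max(buffers)[0])   # [3, 6]
```

Splitting by reduce type means each function (or op) has a fixed behaviour, so no reduce-type attribute needs to be inspected at run time.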

Add find_no_grad_vars in backward.py (#17942) · e0d8c6ac
Committed by chengduo on Jul 02, 2019
```
* add not_been_used_vars to no_grad_set
test=develop
```
e0d8c6ac

01 Jul, 2019 2 commits

Make roi_perspective_transform op return mask and transform matrix (#18371) · 449c7a9f

Committed by LielinJiang on Jul 01, 2019

* modify roi_perspective_transform_op to output mask and transform matrix

* modify comment

* modify comment

* modify API.spec

* update API.spec

* remove no use header, test=develop

* resolve conflict

449c7a9f

Fix bug in quantize kernel which causes a crash in vgg16/19 models (#17964) · 4bc2987d

Committed by Brian Liu on Jul 01, 2019

* Fix bug in quantize kernel which causes a crash in vgg16/19 models

test=develop

* refine the code to reduce verbose code; test=develop

* remove useless code; test=develop

4bc2987d

28 Jun, 2019 2 commits

Fix potential mkldnn concat/pool/conv kernel issues (#18393) · 681d3553

Committed by Leo Zhao on Jun 28, 2019

1. some key generation methods are not aligned with PR#17965
2. extend the ptr lifetime to avoid memory release if SetBlob fails;
   otherwise it will cause a core dump.

test=develop

681d3553

Add a unittest to inplace elementwise_add (#18385) · f5641000
Committed by Zeng Jinle on Jun 28, 2019
```
* add_elementwise_add_inplace_test,test=develop

* rename file, test=develop
```
f5641000

27 Jun, 2019 4 commits

fix communicator with pyreader (#18350) · 999d9a59
Committed by tangwei12 on Jun 27, 2019
```
* add is_runnning in communicator, test=develop
```
999d9a59

supports collective communicated training (#18175) · b7128bac

Committed by HaoRen on Jun 27, 2019

* fix prepare context redundant code problem, optimize executor by caching create_varaiables
test=develop

* supports collective training in executor

* make fetch_list runnable with variables, add more unittests for use_program_cache
test=develop

* fix comment
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plug in more strategies

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* fix prepare context redundant code problem, optimize executor by caching create_varaiables
test=develop

* supports collective training in executor

* make fetch_list runnable with variables, add more unittests for use_program_cache
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* fix comment
test=develop

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plug in more strategies

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* test=develop
add collective op unittest standard

* test=develop
remove the test_collective directory

* test=develop
remove the test_collective directory

* remove slicegather test

* code format for reducescatter

* update attr of shard_index_op

* Modify macro nccl_helper

* remove test without distribute

* macro collective_helper

* macro update

* test=develop
update support python3.5

* test=develop change gpu memory use to 0.1 when test

* test=develop
update ut equal func

* test=develop
set flags to 1.5

* test=develop fix pickle dump for py35

* test=develop
fix divide in slice and add sync_comm_stream
update atol and rtol to 1e-05
rm shard_index op and test
modify read input from file to read from memory
remove origin_program in framework and add i/o in c_sync_calc_stream

* test=develop update unittest sync operator I/O

b7128bac

add int8 mkldnn prior_box (#17242) · 9252e8fa
Committed by Sylwester Fraczek on Jun 27, 2019
```
add prior_box quantization code

add scale algo rules for prior box

test=develop
```
9252e8fa

[MKL-DNN] Extending reusing to Elementwise_add_mkldnn op (#18146) · c2efdfd5

Committed by Jacek Czaja on Jun 27, 2019

* - Reusing of reorder used in elementwise_add_mkldnn

- Added MKL-DNN sum prim reusing

test=develop

- Compilation fixes

test=develop

- Yet another compilation fix

test=develop

- Yet another compilation fix

test=develo

- Yet another linking fix

test=develop

- Final compilation fix

test=develop

- lint fixes

test=develop

- Lint fixes

test=develop

* - Fixes after review

test=develop

c2efdfd5

26 Jun, 2019 2 commits
- Simplify multi_box_head API in detection.py and remove assign op. (#18310) · 9047ac68
  Committed by qingqing01 on Jun 26, 2019
```
* Simplify multi_box_head API in detection.py and remove assign op.
```
  9047ac68
- Update lamb optimizer (#18333) · 23941e43
  Committed by Yibing Liu on Jun 26, 2019
```
* Update lamb optimizer

test=develop, test=document_preview

* Regenerate api spec

test=develop, test=document_preview
```
  23941e43
25 Jun, 2019 3 commits

fix softrelu doc (#18324) · 81ec5382
Committed by tensor-tang on Jun 25, 2019
```
* fix softrelu doc

test=develop

* update API doc

test=develop
```
81ec5382

Sequence mask support tensor (#18249) · df2eee71

Committed by Hongyu Liu on Jun 25, 2019

* sequence mask supports max length tensor input; test=develop

* add rnn_impl.py; test=develop

* add basic gru lstm unittest; test=develop

* fix api spec; test=develop

* fix sequence_mask op bug;
test=develop
test=document_preview

* change +-*x to elementwise_op; test=develop

* add mkl flag; test=develop

* fix rnn impl bug; test=develop

* update api spec; test=develop

* fix doc bug; test=develop

* fix lstm bugs; test=develop

df2eee71
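
For readers unfamiliar with the operation, here is a minimal sketch of what a sequence mask computes, assuming the usual definition: position `j` of row `i` is 1 when `j` is smaller than the `i`-th length, and 0 otherwise. The function name and the plain-list representation are illustrative only and are not Paddle's API. In the commit, the maximum length can additionally be supplied as a tensor, which corresponds here to passing `maxlen` as a runtime value.

```python
def sequence_mask(lengths, maxlen=None):
    """Build a 0/1 mask with one row per sequence length.

    If maxlen is omitted, the longest sequence decides the mask width.
    """
    if maxlen is None:
        maxlen = max(lengths)
    return [[1 if j < n else 0 for j in range(maxlen)] for n in lengths]


print(sequence_mask([1, 3, 2]))
# [[1, 0, 0], [1, 1, 1], [1, 1, 0]]

# A maximum length that is only known at run time is simply passed in.
runtime_maxlen = 4
print(sequence_mask([1, 3, 2], maxlen=runtime_maxlen))
# [[1, 0, 0, 0], [1, 1, 1, 0], [1, 1, 0, 0]]
```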

optimize communicator merge sparse gradient test=develop (#18159) · 0e08e91c

Committed by Qiao Longfei on Jun 25, 2019

* optimize communicator merge sparse gradient test=develop

* revert multithread selected rows merge add test=develop

* follow comment test=develop

0e08e91c

24 Jun, 2019 2 commits

Fix the bug of sequence_unpad op (#18290) · f57ee369

Committed by Yibing Liu on Jun 24, 2019

* Use TensorCopySync for sequence_unpad op

test=develop

* Fix the tensor memory alloc bug

test=develop

f57ee369

Clean build strategy (#18148) · 5489216e

Committed by chengduo on Jun 24, 2019

* clean build_strategy
test=develop

* DataBalanceOpHandle has been removed
test=develop

* debug

* update build_strategy.
test=develop

5489216e

21 Jun, 2019 2 commits

fix some bugs when merging sparse embedding parameters, test=develop (#18223) · 6b3d9625

Committed by songhao on Jun 21, 2019

1. fix the bug that out_put_var in SaveSelectedRows would be empty string
2. use merge_sparse_lookup_table to replace sum op for load_persistables_for_inference
3. fix the bug in _clone_var_in_block_ when the var is SELECTED_ROWS.

6b3d9625

set src_idx > 0 for bilinear_interp_op (#18238) · b58bb802
Committed by xiaoting on Jun 21, 2019
```
* set src_idx > 0, test=develop

* add unittest and cu, test=develop
```
b58bb802

20 Jun, 2019 1 commit

Fix slice op shape=-1 bug (#18107) · cefd0fb5

Committed by Hongyu Liu on Jun 20, 2019

* fix slice op bug; test=develop

* fix variable test bug; test=develop

* remove slice while true; test=develop

cefd0fb5

19 Jun, 2019 2 commits

fix spelling errors (#17941) · 802ea509

Committed by 翟飞跃 on Jun 19, 2019

* fix spelling errors; test=develop

* Update API.spec

update md5

* Update API.spec

* change the order of api;test=develop

802ea509

fix type error of std::pow in sigmoid_focal_loss_op.cu and sigmoid_focal_loss_op.h (#18152) · 944c3165

Committed by FlyingQianMM on Jun 19, 2019

* test=develop
fix type error of std::pow in sigmoid_focal_loss_op.cu and sigmoid_focal_loss_op.h

* test=develop
fix wrong code style in sigmoid_focal_loss_op.cu and sigmoid_focal_loss_op.h

944c3165

17 Jun, 2019 1 commit
- Fix py_reader iterable bug (#18108) · 6eec66a1
  Committed by Zeng Jinle on Jun 17, 2019
```
* fix py_reader iterable bug, test=develop

* move data from buffered_reader,test=develop
```
  6eec66a1
16 Jun, 2019 4 commits

Update backward appending strategy to support double backward and fix some bugs. (#18104) · 80d2e66f

Committed by qingqing01 on Jun 16, 2019

* Update backward.py:
     - If there is no input grad var in all outputs of previous ops, do not append this op into graph.
     - Only apply this strategy when double backward.
* Update some double backward op.
* Update sum_op to judge whether a tensor is empty by numel or IsInitialized().

80d2e66f

add detection output operator for supporting retinanet (#17896) · ff83655f

Committed by FlyingQianMM on Jun 16, 2019

* test=develop
add detection output for supporting retinanet

* test=develop
add test_layers.py

* test=develop
add API.spec

* test=develop
alter test_retinanet_detection_output.py

* test=develop
alter round 2

* test=develop
alter retinanet_detection_output

* test=develop
alter paddle/fluid/API.spec

* test=devlop
alter detection.py

* test=develop
alter retinanet_detection_output

* test=develop
alter paddle/fluid/API.spec

* test=develop
alter detection.py

* test=develop
alter API.spec

* test=develop
alter retinanet_detection_output

* test=develop
alter paddle/fluid/API.spec

* test=develop
alter python/paddle/fluid/tests/unittests/test_retinanet_detection_output.py

* test=develop
alter python/paddle/fluid/tests/unittests/test_retinanet_detection_output.py

* test=develop
fix grammar error

* test=develop
fix grammar error

* test=develop
fix grammar error

* test=develop
alter python/paddle/fluid/tests/unittests/test_layers.py

* test=develop
alter paddle/fluid/API.spec

ff83655f

add sigmoid focal loss operator for supporting retinanet (#17895) · 0aee1f00

Committed by FlyingQianMM on Jun 16, 2019

* test=develop
add sigmoid_focal_loss for supporting retinanet

* test=develop
add test_layers

* test=develop
add API.spec

* test=develop
alter sigmoid_focal_loss_op.cc

* test=develop
alter detection.py

* test=develop
alter API.spec

* test=develop
alter round 1

* test=develop
alter sigmoid_focal_loss

* test=develop
alter sigmoid_focal_loss_op.cc

* test=develop
alter test_layers.py

* test=develop
alter paddle/fluid/API.spec

* test=develop
alter sigmoid_focal_loss_op.cu

* test=develop
alter paddle/fluid/operators/detection/sigmoid_focal_loss_op.cc

0aee1f00
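
For context, the sketch below evaluates the per-element sigmoid focal loss as it is commonly defined for RetinaNet, with a focusing parameter `gamma` and a class-balance weight `alpha`. It is an assumption-laden illustration in plain Python: the function names, the binary `target`, and the default values are chosen here for the example, and the operator added in this commit may differ in details such as label encoding and normalization, which are not reproduced.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sigmoid_focal_loss(logit, target, gamma=2.0, alpha=0.25):
    """Focal loss for one logit with a binary target (1 = positive, 0 = negative)."""
    log_p = log_sigmoid(logit)        # log(p), where p = sigmoid(logit)
    log_not_p = log_sigmoid(-logit)   # log(1 - p)
    p = math.exp(log_p)
    if target == 1:
        # Well-classified positives (p close to 1) are down-weighted by (1 - p)^gamma.
        return -alpha * (1.0 - p) ** gamma * log_p
    # Well-classified negatives (p close to 0) are down-weighted by p^gamma.
    return -(1.0 - alpha) * p ** gamma * log_not_p


# A confidently correct positive contributes far less than a confidently wrong one.
print(sigmoid_focal_loss(4.0, 1) < sigmoid_focal_loss(-4.0, 1))  # True
```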

Update generate_proposal_labels_op to support CascadeRCNN. (#17200) · 9e4b9d97
Committed by FDInSky on Jun 16, 2019
```
* Update generate_proposal_labels_op to support CascadeRCNN.
```
9e4b9d97

15 Jun, 2019 2 commits

add target assign operator for supporting retinanet (#17893) · 9ed2f936

Committed by FlyingQianMM on Jun 15, 2019

* test=develop add target assign for retinanet

* test=develop
run ci

* test=developp
add test_layers

* test=develop
add API.spec

* test=develop
alter round 1

* test=develop
alter rpn_target_assign_op.cc

* test=develop
alter test_rpn_target_assign_op.py

* test=develop
alter rpn_target_assign_op.cc

* test=develop

alter API.spec

* test=develop
alter paddle/fluid/operators/detection/rpn_target_assign_op.cc

* test=develop
alter rpn_target_assign_op.cc

* test=develop
alter python/paddle/fluid/layers/detection.py

* test=develop
alter paddle/fluid/API.spec

9ed2f936

Fix bug of scope_buffered_ssa_graph_executor (#18100) · 24e988a4
Committed by chengduo on Jun 15, 2019
```
* fix code bug
test=develop
```
24e988a4

14 Jun, 2019 2 commits
- Add warning for cudnn warpctc kernel in CUDA9\CUDA10. (#18046) · 354643d8
  Committed by whs on Jun 14, 2019
```
test=develop
```
  354643d8
- Optimize fused_elewise_activation_grad op. (#18041) · 660c1a65
  Committed by Yiqun Liu on Jun 14, 2019
```
test=develop
```
  660c1a65
13 Jun, 2019 3 commits

refactor the function ConvFwdPrimitiveDesc (#17897) · f8ecc3de

Committed by lidanqing on Jun 13, 2019

* refactor the function ConvFwdPrimitiveDesc
test=develop

* change according to review
test=develop

* use pointer way without boost::optional
test=develop

* pass vector to function by reference instead of raw vector
test=develop

* change pointer to shared_ptr
test=develop

f8ecc3de

Added unit test for QAT FP32 & INT8 comparison (#17814) · 78e93286

Committed by Wojciech Uss on Jun 13, 2019

* added unit test for QAT FP32 & INT8 comparison

test=develop

* enabled other models and updated filenames

test=develop

* added accuracy check and multiple batch handling

test=develop

* removed quantization_mkldnn_pass.py

test=develop

* cleanup

test=develop

* updated model paths

test=develop

* renamed tests without MKL-DNN

test=develop

* fix reusing mkldnn pool2d primitive

test=develop

* add performance measuring

test=develop

* fix accuracy statistics

test=develop

* removed non-mkldnn tests

test=develop

* added conv2d_depthwise->conv2d mkldnn transformation

test=develop

* format update

test=develop

* fixed creating key for pool2d grad

test=develop

* added pass

* Fix the accuracy issue while using float precision to get the scale.

test=develop

* Fix the format issue when 'X' is not nchw.

test=develop

* removed output comparing and changed number of images

test=develop

* cmake and comment fix

test=develop

* updated acc threshold for QAT comparison tests

test=develop

* added OMP_NUM_THREADS setting

test=develop

* enable all QAT INT8 tests

test=develop

* restored upstream version of a file

test=develop

* modified directory names

test=develop

78e93286

concat op support negative axis (#18045) · 566bf2ec
Committed by tensor-tang on Jun 13, 2019
```
test=develop
```
566bf2ec
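
Supporting a negative axis usually means mapping it onto the equivalent non-negative axis before doing anything else. The sketch below shows that normalization on plain shape lists, under the common convention that axis `-1` is the last dimension. Both helper names are hypothetical, and the code only computes the output shape rather than concatenating data.

```python
def normalize_axis(axis, rank):
    """Map an axis in [-rank, rank) onto its non-negative equivalent."""
    if not -rank <= axis < rank:
        raise ValueError(f"axis {axis} is out of range for rank {rank}")
    return axis + rank if axis < 0 else axis


def concat_shape(shapes, axis):
    """Return the shape produced by concatenating tensors of the given shapes."""
    rank = len(shapes[0])
    axis = normalize_axis(axis, rank)
    out = list(shapes[0])
    for shape in shapes[1:]:
        if len(shape) != rank:
            raise ValueError("all inputs must have the same rank")
        for dim in range(rank):
            # Every dimension except the concat axis has to match.
            if dim != axis and shape[dim] != out[dim]:
                raise ValueError(f"dimension {dim} differs between inputs")
        out[axis] += shape[axis]
    return out


print(concat_shape([[2, 3], [2, 5]], axis=-1))  # [2, 8]
print(concat_shape([[2, 3], [4, 3]], axis=-2))  # [6, 3]
```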

12 Jun, 2019 6 commits

Optimize the concat and split cuda implementation for cases when the number of... · 7e463c84
Committed by Yiqun Liu on Jun 12, 2019
```
Optimize the concat and split cuda implementation for cases when the number of inputs/outputs is less than 5. (#17979)

test=develop
```
7e463c84
fix save/load in fleet (#17675) · 101f74cb
Committed by tangwei12 on Jun 12, 2019
```
* fix save/load in Fleet
* add UT framework of Fleet
```
101f74cb

Fix GetExpectedKernelType of add_position_encoding_op (#17935) · a06b316b

Committed by Guo Sheng on Jun 12, 2019

* Fix the GetExpectedKernelType of add_position_encoding_op.
test=develop

* Fix the doc of lstm_unit outputs in nn.py.
test=develop

a06b316b

Fix scatter and gather op when has duplicate index (#17952) · 8eb134c3

Committed by wawltor on Jun 12, 2019

* test=develop
The scatter op has a calculation bug when the indices contain duplicates: it uses overwrite mode for a repeated index. Fix this bug by using accumulate mode for a repeated index. The gather op has the same bug when it computes the gradient. We also use the OpenBLAS and Eigen libraries to reduce the time cost of accumulate mode.

* test=develop
Fix some code format problems, and add test cases for the gather and scatter ops

8eb134c3
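
The duplicate-index problem described in the commit message above is easiest to see in the gradient of gather: when the same input position is gathered more than once, its gradient must be the sum of all matching upstream gradients. The sketch below contrasts the two behaviours in plain Python. The function names are hypothetical, and the code illustrates the idea only, not the kernel changed in the commit.

```python
def gather(x, index):
    """Forward pass: pick x[i] for every i in index (duplicates allowed)."""
    return [x[i] for i in index]


def gather_grad_overwrite(grad_out, index, size):
    """Buggy variant: a later duplicate silently replaces an earlier one."""
    grad_x = [0.0] * size
    for g, i in zip(grad_out, index):
        grad_x[i] = g
    return grad_x


def gather_grad_accumulate(grad_out, index, size):
    """Correct variant: contributions to the same index are summed."""
    grad_x = [0.0] * size
    for g, i in zip(grad_out, index):
        grad_x[i] += g
    return grad_x


index = [0, 2, 0]            # position 0 is gathered twice
grad_out = [1.0, 2.0, 3.0]   # upstream gradient, one value per gathered element

print(gather_grad_overwrite(grad_out, index, size=3))   # [3.0, 0.0, 2.0]
print(gather_grad_accumulate(grad_out, index, size=3))  # [4.0, 0.0, 2.0]
```

With unique indices the two variants agree, which is why the bug only shows up when an index repeats.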

update load_error_info, test=develop (#18000) · 75fcd292

Committed by lujun on Jun 12, 2019

Fix the error prompt: when loading parameters fails, users are prompted to check whether the model or parameter files are damaged.

75fcd292

test=develop (#17984) · 2ae8decc

Committed by wawltor on Jun 12, 2019

Fix a bug in the sequence_unpad op: the allocated output memory did not match the actual memory, so the memory check failed. Fix this bug by allocating the output memory at the correct position in the code.

2ae8decc

BaiXuePrincess / Paddle is in sync with the fork's source project