Commits · b7128bac5f12138062ec2518a0f856915c752a69 · 兽拳 / Paddle

27 Jun, 2019 · 4 commits

supports collective communicated training (#18175) · b7128bac

Committed by HaoRen on Jun 27, 2019

* fix prepare context redundant code problem, optimize executor by caching create_variables
test=develop

* supports collective training in executor

* make fetch_list runnable with variables, add more unit tests for use_program_cache
test=develop

* fix comment
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plug in more strategies

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* test=develop
add collective op unittest standard

* test=develop
remove the test_collective directory

* remove slicegather test

* code format for reducescatter

* update attr of shard_index_op

* Modify macro nccl_helper

* remove test without distribute

* macro collective_helper

* macro update

* test=develop
update support python3.5

* test=develop change gpu memory use to 0.1 when test

* test=develop
update ut equal func

* test=develop
set flags to 1.5

* test=develop fix pickle dump on py35

* test=develop
fix divide in slice and add sync_comm_stream
update atol and rtol to 1e-05
rm shard_index op and test
modify read input from file to read from memory
remove origin_program in framework and add i/o in c_sync_calc_stream

* test=develop update unittest sync operator I/O

b7128bac
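
The bullet "scale loss grad in collective sgd transpiler" in the commit above is easier to follow with a toy example. The sketch below is plain Python, not Paddle code: `allreduce_sum` and `scale_for_average` are hypothetical helpers, and it assumes the usual data-parallel convention that an all-reduce sums gradients across ranks, so scaling each rank's gradient by `1 / nranks` beforehand makes the sum equal the average. The commit message does not spell out the exact scaling used.

```python
def allreduce_sum(per_rank_grads):
    """Element-wise sum across ranks, standing in for an NCCL all-reduce."""
    return [sum(values) for values in zip(*per_rank_grads)]


def scale_for_average(per_rank_grads):
    """Scale every rank's gradient by 1 / nranks (assumed convention).

    Backpropagation is linear in the loss gradient, so scaling the loss
    gradient has the same effect on every parameter gradient.
    """
    nranks = len(per_rank_grads)
    return [[g / nranks for g in grads] for grads in per_rank_grads]


# Two ranks, two parameters each.
grads = [[2.0, 4.0], [6.0, 8.0]]
averaged = allreduce_sum(scale_for_average(grads))
print(averaged)  # [4.0, 6.0]
```

Without the scaling step the summed gradient grows with the number of ranks, which changes the effective learning rate.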

add int8 mkldnn prior_box (#17242) · 9252e8fa
Committed by Sylwester Fraczek on Jun 27, 2019
```
add prior_box quantization code

add scale algo rules for prior box

test=develop
```
9252e8fa

some fixes for int8 mobilenet_ssd tester (#18112) · 5fd68ac1

Committed by lidanqing on Jun 27, 2019

* some fixes for int8 mobilenet_ssd tester
test=develop

* change wrong data file name
test=develop

* change test images bin file from 200 images to 100 images

* change directory existence to file existence during downloading
test=develop

* reuse download_data
test=develop

* run full dataset when iterations=0
test=develop

5fd68ac1

[MKL-DNN] Extending reusing to Elementwise_add_mkldnn op (#18146) · c2efdfd5

Committed by Jacek Czaja on Jun 27, 2019

* - Reusing of reorder used in elementwise_add_mkldnn

- Added MKL-DNN sum prim reusing

test=develop

- Compilation fixes

test=develop

- Yet another compilation fix

test=develop

- Yet another compilation fix

test=develop

- Yet another linking fix

test=develop

- Final compilation fix

test=develop

- lint fixes

test=develop

- Lint fixes

test=develop

* - Fixes after review

test=develop

c2efdfd5

26 Jun, 2019 · 9 commits
- Simplify multi_box_head API in detection.py and remove assign op. (#18310) · 9047ac68
  Committed by qingqing01 on Jun 26, 2019
```
* Simplify multi_box_head API in detection.py and remove assign op.
```
  9047ac68
- add ut for pipeline training (#18289) · e42057cd
  Committed by hutuxian on Jun 26, 2019
  
  e42057cd
- Refine CUDAPlace error message. (#18343) · 5826b72e
  Committed by Zeng Jinle on Jun 26, 2019
```
* refine cuda place error msg, test=develop

* use LOG(ERROR)+exit(-1), test=develop
```
  5826b72e
- remove unused jemalloc option (#18314) · 3c9755bb
  Committed by Tao Luo on Jun 26, 2019
```
test=develop
```
  3c9755bb
- test=develop, recover ocr ut on dygraph (#18166) · bd61d899
  Committed by Jiabin Yang on Jun 26, 2019
  
  bd61d899
- Update lamb optimizer (#18333) · 23941e43
  Committed by Yibing Liu on Jun 26, 2019
```
* Update lamb optimizer

test=develop, test=document_preview

* Regenerate api spec

test=develop, test=document_preview
```
  23941e43
- update reduce config (#18334) · 135a59ed
  Committed by chengduo on Jun 26, 2019
```
test=develop
```
  135a59ed
- Fix checkpoint of Light-NAS (#18330) · 1bdfd2eb
  Committed by whs on Jun 26, 2019
```
Socket can't be pickled.
test=develop
```
  1bdfd2eb
- test=develop, disable basic gru related ut (#18329) · 79bcdbbf
  Committed by Jiabin Yang on Jun 26, 2019
  
  79bcdbbf
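
The Light-NAS checkpoint fix listed above (#18330) gives "Socket can't be pickled" as its reason. The sketch below reproduces that constraint with the standard library and shows one common workaround: dropping the socket from the pickled state. `SearchAgent` and `checkpoint_roundtrip` are hypothetical names for illustration; the commit message does not say how Paddle's fix is implemented.

```python
import pickle
import socket


class SearchAgent:
    """Hypothetical object that keeps a live socket next to picklable state."""

    def __init__(self):
        self.step = 0
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    def __getstate__(self):
        # Copy the instance dict and drop the socket before pickling.
        state = self.__dict__.copy()
        state.pop("sock", None)
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # The connection is not restored here; the caller must reconnect.
        self.sock = None


def checkpoint_roundtrip(agent):
    """Serialize and restore the agent the way a checkpoint would."""
    return pickle.loads(pickle.dumps(agent))
```

Calling `pickle.dumps` on the socket itself raises `TypeError`, which is why the socket has to be excluded from the state and re-created after loading.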
25 Jun, 2019 · 9 commits

fix softrelu doc (#18324) · 81ec5382
Committed by tensor-tang on Jun 25, 2019
```
* fix softrelu doc

test=develop

* update API doc

test=develop
```
81ec5382

Add install check for multigpu (#18323) · 831a3e62

Committed by Jiabin Yang on Jun 25, 2019

* test=develop, add_install_check_for_multigpu

* test=develop, refine code to use cuda_devices

831a3e62

fix lod_tensor.py grammar error, test=develop (#18308) · f88e07a0
Committed by Zeng Jinle on Jun 25, 2019

f88e07a0

Sequence mask support tensor (#18249) · df2eee71

Committed by Hongyu Liu on Jun 25, 2019

* sequence mask support max length tensor input; test=develop

* add rnn_impl.py; test=develop

* add basic gru lstm unittest; test=develop

* fix api spec; test=develop

* fix sequence_mask op bug;
test=develop
test=document_preview

* change +-*x to elementwise_op; test=develop

* add mkl flag; test=develop

* fix rnn impl bug; test=develop

* update api spec; test=develop

* fix doc bug; test=develop

* fix lstm bugs; test=develop

df2eee71
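
For readers unfamiliar with the operator touched by the commit above, this is a minimal pure-Python sketch of what a sequence mask computes: entry `[i][j]` is 1 when position `j` lies inside sequence `i`, otherwise 0. The function is an illustration only and does not call Paddle; it assumes that `maxlen` defaults to the longest length when omitted. The commit itself concerns letting the maximum length be supplied as a tensor at run time rather than as a fixed attribute.

```python
def sequence_mask(lengths, maxlen=None):
    """Return a 0/1 mask with one row per sequence.

    mask[i][j] is 1 when j < lengths[i], otherwise 0.
    """
    if maxlen is None:
        # Assumed default: pad to the longest sequence in the batch.
        maxlen = max(lengths)
    return [[1 if j < n else 0 for j in range(maxlen)] for n in lengths]


print(sequence_mask([2, 3, 1]))
# [[1, 1, 0], [1, 1, 1], [1, 0, 0]]
print(sequence_mask([1, 2], maxlen=4))
# [[1, 0, 0, 0], [1, 1, 0, 0]]
```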

test=develop, Revert "Add multi gpu install check" (#18313) · 9cb799be
Committed by Jiabin Yang on Jun 25, 2019
```
* Revert "Add multi gpu install check (#18229)"

This reverts commit 61ed06b2.

* test=develop, start ci
```
9cb799be

optimize communicator merge sparse gradient test=develop (#18159) · 0e08e91c

Committed by Qiao Longfei on Jun 25, 2019

* optimize communicator merge sparse gradient test=develop

* revert multithread selected rows merge add test=develop

* follow comment test=develop

0e08e91c

init black/white lists (#17847) · 172c2fac
Committed by Jie Fang on Jun 25, 2019
```
test=develop
```
172c2fac
Fix default value of fluid.memory_optimize (#18295) · e06c69c7
Committed by chengduo on Jun 25, 2019
```
* fix default value of fluid.memory_optimize
test=develop

* fix api.spec
test=develop
```
e06c69c7
fix split and sampled softmax (#18280) · 6978b2e4
Committed by Zhaolong Xing on Jun 25, 2019
```
test=develop
```
6978b2e4

24 Jun, 2019 · 6 commits
- Fix the bug of sequence_unpad op (#18290) · f57ee369
  Committed by Yibing Liu on Jun 24, 2019
```
* Use TensorCopySync for sequence_unpad op

test=develop

* Fix the tensor memory alloc bug

test=develop
```
  f57ee369
- add api desc for pipeline training (#18293) · 6ed73830
  Committed by hutuxian on Jun 24, 2019
  
  6ed73830
- Clean build strategy (#18148) · 5489216e
  Committed by chengduo on Jun 24, 2019
```
* clean build_strategy
test=develop

* DataBalanceOpHandle has been removed
test=develop

* debug

* update build_strategy.
test=develop
```
  5489216e
- update alloc_continuous_space_for_grad_pass (#18287) · 14e1e165
  Committed by chengduo on Jun 24, 2019
```
test=develop
```
  14e1e165
- add Dygraph api to api.spec (#18235) · 7e61baaa
  Committed by lujun on Jun 24, 2019
```
add Dygraph api to api.spec
```
  7e61baaa
- improve doc of lstm, sequence_enumerate, softmax_with_cross_entropy, space_to_depth APIs (#18261) · a736c03b
  Committed by liuwei1031 on Jun 24, 2019
```
* improve doc of lstm, sequence_enumerate, softmax_with_cross_entropy, space_to_depth APIs, test=develop

* update API.spec, test=develop
```
  a736c03b
23 Jun, 2019 · 4 commits
- add random seed for recurrent op test (#18274) · d54e13bb
  Committed by chengduo on Jun 23, 2019
```
test=develop
```
  d54e13bb
- improve the hint message of memory optimize, test=develop (#18260) · 4151d90c
  Committed by liuwei1031 on Jun 23, 2019
  
  4151d90c
- fix paddle cloud role maker bug (#18269) · ff399fd7
  Committed by guru4elephant on Jun 23, 2019
```
* fix paddle cloud role maker bug
```
  ff399fd7
- Fix ema's example & fp16 update (#18273) · 412951d7
  Committed by Yibing Liu on Jun 23, 2019
```
test=develop, test=document_preview
```
  412951d7
22 Jun, 2019 · 2 commits
- fix double buffer example (#18169) · fdf798f9
  Committed by flame on Jun 22, 2019
```
test=develop
test=document_preview
```
  fdf798f9
- fix api doc example, test=develop (#18266) · 23b8b18e
  Committed by Bai Yifan on Jun 22, 2019
  
  23b8b18e
21 Jun, 2019 · 6 commits

fix a bug in examples of metrics.Acc · cd9d57f5
Committed by pkpk on Jun 21, 2019

cd9d57f5
refine core cmake warning and print more info (#18248) · 68da8b2a
Committed by tensor-tang on Jun 21, 2019
```
* refine core cmake warning and print more info

test=develop

* fix comments

test=develop
```
68da8b2a
Add StaticRNN.output code example (#18251) · 32c95f17
Committed by zhaoyuchen2018 on Jun 21, 2019
```
refine StaticRNN api doc
test=develop
test=document_preview
```
32c95f17
fix yolo_box example,test=develop (#18247) · 2f0d6826
Committed by xiaoting on Jun 21, 2019

2f0d6826

fix some bugs when merging sparse embedding parameters, test=develop (#18223) · 6b3d9625

Committed by songhao on Jun 21, 2019

1. fix the bug that out_put_var in SaveSelectedRows would be an empty string
2. use merge_sparse_lookup_table to replace sum op for load_persistables_for_inference
3. fix the bug in _clone_var_in_block_ when the var is SELECTED_ROWS.

6b3d9625

dataset (#17973) · 3f8031e2

Committed by jiaqi on Jun 21, 2019

(1) use channel instead of vector/BlockingQueue in Dataset, to stay consistent with the existing implementation and make the code more readable and flexible (dataset with a single output channel or multiple output channels). One previous out-of-memory problem was caused by not releasing memory after training.
(2) add Record because MultiSlotType costs too much memory (80B); this fixes the out-of-memory problem.
(3) add Channel, Archive in paddle/fluid/framework
(4) change dataset from shared_ptr to unique_ptr in pybind
(5) move create/destroy readers from trainer to dataset
(6) move shuffle from datafeed to dataset. The dataset holds memory; datafeed only loads data and feeds it to the network.
(7) fix thread num bug of Dataset when filelist size < thread num
(8) support set_queue_num in InMemoryDataset

3f8031e2
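
Item (7) of the dataset commit above concerns the case where there are fewer input files than reader threads. The commit message does not describe the fix, so the following is only one plausible approach, sketched in plain Python with hypothetical helper names: clamp the thread count to the number of files, then distribute files round-robin so that no thread is left without work.

```python
def effective_thread_num(filelist, thread_num):
    """Clamp the reader thread count to the number of files (at least 1)."""
    return max(1, min(thread_num, len(filelist)))


def assign_files(filelist, thread_num):
    """Distribute files round-robin over the effective number of threads."""
    n = effective_thread_num(filelist, thread_num)
    shards = [[] for _ in range(n)]
    for index, name in enumerate(filelist):
        shards[index % n].append(name)
    return shards


print(assign_files(["part-0", "part-1"], thread_num=4))
# [['part-0'], ['part-1']]
print(assign_files(["part-0", "part-1", "part-2"], thread_num=2))
# [['part-0', 'part-2'], ['part-1']]
```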
