提交 · 08fa98f7cc75187d7a2030e07ae1e349979f27cc · BaiXuePrincess / Paddle

30 7月, 2019 1 次提交

Revert "use static variable to do cache instead of thread local in thread... · 10eeed93

由 Leo Zhao 提交于 7月 30, 2019

Revert "use static variable to do cache instead of thread local in thread frequent switching case (#18428)" (#18879)

This reverts commit ce38bb53.

test=develop

10eeed93

29 7月, 2019 2 次提交

Remove legacy C++ memory optimization codes (#18834) · 8008ab4e

由 Zeng Jinle 提交于 7月 29, 2019

* remove legacy memory optimization codes, test=develop

* follow huihuang's comments,test=develop

* follow luotao's comments, test=develop

8008ab4e

add clear_model interface in fleetwrapper (#18815) · 52c1431e

由 Thunderbrook 提交于 7月 29, 2019

* dump slot

* test

* proto

* dump slot

* test

* proto

* code style

* code style

* code style

* style

* add delete after unseen days

* add unseen days

* code style

* conflict solve
test=develop

* add clear model

* code style
test=develop

* code style
test=develop

52c1431e

27 7月, 2019 1 次提交
- C
  Open fuse optimization ops (#18741) · 4140fe11
  由 chengduo 提交于 7月 27, 2019
```
* open fuse optimization ops
test=develop
```
  4140fe11
26 7月, 2019 1 次提交

Feature/mem opt pass refactor (#18735) · a802da65

由 Zeng Jinle 提交于 7月 26, 2019

* first version memory optimize pass, test=develop

* remove move_tensor_sharing_pass, test=develop

* refine code comments, add unittests, test=develop

* turn off memory_optimize by default, test=develop

* follow huihuang's comments, test=develop

* follow chengduoZH's comments, test=develop

* fix grammar error, add const qualifier, fix pass_test exception message, test=develop

* follow chengduoZH's comments 2nd, test=develop

a802da65

25 7月, 2019 1 次提交

Fix shrink-dense and add scale-datanorm (#18746) · c167a4b4

由 fuyinno4 提交于 7月 25, 2019

Fix FleetWrapper:
1. fix shrink dense: just scale show
2. add datanorm scale: divide datanorm's gradient by batch_size

c167a4b4

24 7月, 2019 2 次提交

Update trt5 for paddle-trt (#18645) · 26ae6d49

由 Zhaolong Xing 提交于 7月 24, 2019

* update paddle-trt for:
    1. fix bug: when batch > 2, core in split plugin.
    2. add leaky_relu trt5.0 support (yolov3 from 65ms to 42ms.)
    3. add new attr to dropout.
    4. shuffle channel, swish, relu6 support
    test=develop

* 1. fix ci
test=develop

26ae6d49

add slot to sparse table (#18686) · d8396281

由 Thunderbrook 提交于 7月 24, 2019

The change includes 2 things:

1. save delta model and shrink table are control by the same parameter before, now add delete_after_unseen_days to control shrink table.
2. value in sparse table has no slot before, now add slot in sparse table, and add DownpureCtrAccessor to support the new meta.
test=develop

d8396281

23 7月, 2019 2 次提交

support patch data, add load_one_table, fix bug (#18509) · d18aabb4

由 jiaqi 提交于 7月 23, 2019

（1）support patch data （merge slots of instances of same line id, modify dense layer which
changes its size）
（2）add fleet load_one_table interface, support load from paddle model and load from pslib model
（3）fix push sparse bug which cause push sparse cost more time（about 10% in my testcase）
（4）when some slots are not in one of your network (join/update, etc.)，data feed、collect label info、push/pull sparse will skip these slots， instead of throw error.
（5）add more debug info in TrainFilesWithProfiler

d18aabb4

C
Make fuse_optimizer_op_pass also work when the model contains sparse gradients. (#18664) · fd3aad6c
由 chengduo 提交于 7月 23, 2019
```
* support sparse gradients
test=develop
```
fd3aad6c

19 7月, 2019 1 次提交

Support memory eager deletion on recurrent OP (#17710) · 89bc3fd8

由 Huihuang Zheng 提交于 7月 19, 2019

Test PaddingRNN on V100 GPU device.

Test configuration: large model, padding mode (which is the mode using recurrentOp), one GPU.

GPU memory (MiB): 6414 (this PR) vs 6837 (without this PR)
Speed (steps/s): 10.28 (this PR) vs 9.89 (without this PR)

89bc3fd8

18 7月, 2019 1 次提交

Feature/auto_growth_allocator (#18561) · ae58afc5

由 Zeng Jinle 提交于 7月 18, 2019

* feature/auto_growth_allocator, test=develop

* add unittest of AlignedAllocator, test=develop

* try to turn on auto_growth to test on CI, test=develop

* fix segmentation fault in mixed_vector.h, test=develop

* add unittests, test=develop

ae58afc5

17 7月, 2019 1 次提交
- G
  remove async executor and add data_feed.proto to the deps of train demo (#18659) · d714bf03
  由 guru4elephant 提交于 7月 17, 2019
```
* remove async executor and add data_feed.proto to the deps of train demo
```
  d714bf03
16 7月, 2019 1 次提交
- C
  fix PE fetch bug (#18644) · a6d468a2
  由 chengduo 提交于 7月 16, 2019
```
test=develop
```
  a6d468a2
12 7月, 2019 2 次提交

not use transferscope cache in cpu case (#18578) · ff77dea9

由 Leo Zhao 提交于 7月 12, 2019

* not use transferscope cache in cpu case

test=develop

* adjust variable name and add comments

test=develop

* use correct format for class member in operator.h

* use correct format for class member in operator.cc

test=develop

ff77dea9

1
fix #17430: int64类型的attr训练非预期 (#18264) · b414645a
由 123malin 提交于 7月 12, 2019
```
* fix int64_t

* update fill constant op unittest

* add empty line
```
b414645a

11 7月, 2019 2 次提交

G

Polish backwards optimizer dependency codes and use more default values. (#18255) · c0a82748
由 gongweibao 提交于 7月 11, 2019

c0a82748

Feature/buffer_shared_inplace (#17911) · d3003a16

由 Zeng Jinle 提交于 7月 11, 2019

* feature/buffer_shared_inplace, test=develop

* refine code, test=develop

* fix elementwise_add op cpu inplace and sum inplace bug, test=develop

* add unittest and debug log, test=develop

* fix parallel_executor scope bug, polish code, test=develop

* fix sum op, activation op, single_in_place_inference bug, test=develop

* remove kLocalExecScopeName, test=develop

* fix unittest,test=develop

* fix out_var first version bug, test=develop

* follow comments,test=develop

d3003a16

10 7月, 2019 1 次提交
- Z
  Clean unused code of dim and place (#18565) · be24e5b3
  由 Zeng Jinle 提交于 7月 10, 2019
```
* clean code of dim and place, test=develop

* fix failed unittests, test=develop
```
  be24e5b3
09 7月, 2019 1 次提交

Fix/gcc 4.8 ubt link error (#18558) · 667f88f9

由 Jiabin Yang 提交于 7月 09, 2019

* test=develop, fix docker with paddle nccl problem

* test=develop, fix/gcc_4.8_ubt_link_error

* test=develop, fix code format

667f88f9

08 7月, 2019 3 次提交
- Z
  Inference: fix mask rcnn model diff, optim memory usage, memory leak. (#18532) · 88b52a27
  由 Zhaolong Xing 提交于 7月 08, 2019
```
* Fix Mask rcnn predictor
    1. refine memory optim algorithm to support the model with the block op.
    2. output diff : modify the affine channel fuse
    3. add condition_block_infer op
add interface for setting trt calib table dir
test=develop

* add the missing files.
test=develop
```
  88b52a27
- L
  
  use static variable to do cache instead of thread local in thread frequent switching case (#18428) · ce38bb53
  由 Leo Zhao 提交于 7月 08, 2019
  
  ce38bb53
- G
  
  Regroup fusion by date type. (#18496) · 160ddc98
  由 gongweibao 提交于 7月 08, 2019
  
  160ddc98
04 7月, 2019 1 次提交
- C
  
  Make fuse_all_reduce_op_pass support mix_precision (#17652) · 74538573
  由 chengduo 提交于 7月 04, 2019
  
  74538573
03 7月, 2019 2 次提交
- P
  Nan debugger init (#18401) · e9c7e218
  由 pkpk 提交于 7月 03, 2019
```
test=develop
```
  e9c7e218
- T
  add transfer_scope_cache unit-test (#18467) · d234aa02
  由 Tao Luo 提交于 7月 03, 2019
```
test=develop
```
  d234aa02
02 7月, 2019 1 次提交

supports collective training with programs (#18392) · a873fa84

由 Yi Liu 提交于 7月 02, 2019

1. Since allreduce op has 4 reduce types, We split these four reduce types into four ops
2. We also refined the collective op code, e.g. we separated the collective op kernel into CPUKernel and CUDAKernel, and remove the device specified DeviceContext parameter in template as we already knew the target DeviceContext
3. We remove the newly added Collective op role to reduce the complexity of program and graph analysis

a873fa84

01 7月, 2019 1 次提交

Fix Pooling output scale (#18186) · 7023a86c

由 Michał Gallus 提交于 7月 01, 2019

* Int8: Fix Pooling output scale

test=develop

* Update scales quantization for certain operators

These include: concat, transpose, pool and reshape. test=develop

* Move concat minimum scale finding to quantizer

test=develop

7023a86c

29 6月, 2019 1 次提交

fix data feed ptr error (#18419) · 93a2b317

由 jiaqi 提交于 6月 29, 2019

fix data feed ptr runtime error, pipeline trainer will core in some cases, so set it nullptr as default value.

93a2b317

27 6月, 2019 4 次提交

Fix Bug-prone code of PE (#18354) · 8ed33bf9

由 chengduo 提交于 6月 27, 2019

* update pe reduce config
test=develop

*  drop the local_exe_scopes of the previous parallel_executor
test=develop

8ed33bf9

T
fix communicator with pyreader (#18350) · 999d9a59
由 tangwei12 提交于 6月 27, 2019
```
* add is_runnning in communicator, test=develop
```
999d9a59

supports collective communicated training (#18175) · b7128bac

由 HaoRen 提交于 6月 27, 2019

* fix prepare context redundant code problem, optimize executor by caching create_varaiables
test=develop

* supports collective training in executor

* make fetch_list runable with variables, add more unittest for use_program_cache
test=develop

* fix comment
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plugin more strategy

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* fix prepare context redundant code problem, optimize executor by caching create_varaiables
test=develop

* supports collective training in executor

* make fetch_list runable with variables, add more unittest for use_program_cache
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* fix comment
test=develop

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plugin more strategy

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* test=develop
add collective op unittest standard

* test=develop
remove the test_collective directory

* test=develop
remove the test_collective directory

* remove slicegather test

* code format for reducescatter

* update attr of shard_index_op

* Modify macro nccl_helper

* remove test without distribute

* macro collective_helper

* marcro update

* test=develop
update support python3.5

* test=develop change gpu memory use to 0.1 when test

* test=develop
update ut equal func

* test=develop
set flags to 1.5

* test=develop fix pickle dumple  py35

* test=develop
fix divide in slice and add sync_comm_stream
update atol and rtol to 1e-05
rm shard_index op and test
modify read input from file to read from memory
remove origin_program in framework and add i/o in c_sync_calc_stream

* test=develop update unittest sync operator I/O

b7128bac

S
add int8 mkldnn prior_box (#17242) · 9252e8fa
由 Sylwester Fraczek 提交于 6月 27, 2019
```
add prior_box quantization code

add scale algo rules for prior box

test=develop
```
9252e8fa

26 6月, 2019 1 次提交
- C
  update reduce config (#18334) · 135a59ed
  由 chengduo 提交于 6月 26, 2019
```
test=develop
```
  135a59ed
24 6月, 2019 2 次提交
- C
  Clean build strategy (#18148) · 5489216e
  由 chengduo 提交于 6月 24, 2019
```
* clean build_strategy
test=develop

* DataBalanceOpHandle has been removed
test=develop

* debug

* update build_strategy.
test=develop
```
  5489216e
- C
  update alloc_continuous_space_for_grad_pass (#18287) · 14e1e165
  由 chengduo 提交于 6月 24, 2019
```
test=develop
```
  14e1e165
21 6月, 2019 1 次提交

dataset (#17973) · 3f8031e2

由 jiaqi 提交于 6月 21, 2019

(1) use channel instead of vector/BlockingQueue in Dataset，to keep same with existing implementation, and make code more readable and flexible (dataset single output channel or multi output channel). one previous memory out of limit problem is cause by not release memory after training.
(2) add Record because MultiSlotType costs too much memory (80B)，fix memory out of limit problem.
(3) add Channel, Archive in paddle/fluid/framework
(4) change dataset from shared_ptr to unique_ptr in pybind
(5) move create/destroy readers from trainer to dataset
(6) move shuffle from datafeed to dataset. dataset holds memory, datafeed is only for load data and feed data to network.
(7) fix thread num bug of Dataset when filelist size < thread num
(8) support set_queue_num in InMemoryDataset

3f8031e2

19 6月, 2019 1 次提交
- C
  Update execution_strategy option default value (#18183) · 25f3cd64
  由 chengduo 提交于 6月 19, 2019
```
* update execution_strategy option default value
test=develop

* fix doc error
test=develop
```
  25f3cd64
18 6月, 2019 1 次提交
- C
  Remove nccl dep when the number of GPU is 1 (#18158) · 4978db2c
  由 chengduo 提交于 6月 18, 2019
```
* remove nccl dep when the number of GPU is 1
test=develop
```
  4978db2c
15 6月, 2019 1 次提交
- C
  Fix bug of scope_buffered_ssa_graph_executor (#18100) · 24e988a4
  由 chengduo 提交于 6月 15, 2019
```
* fix code bug
test=develop
```
  24e988a4

BaiXuePrincess / Paddle 与 Fork 源项目一致

BaiXuePrincess / Paddle
与 Fork 源项目一致