1. 19 Sep 2019, 1 commit
  2. 11 Sep 2019, 1 commit
    • Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989) · 12542320
      Committed by Huihuang Zheng
      TemporaryAllocator is a singleton used to allocate memory for cuDNN. Because it is a singleton, we can remove it to get better memory performance.
      
      We replace TemporaryAllocator with CUDADeviceContextAllocator and CUDADeviceContextAllocation, which use a stream callback to free the memory allocated for a stream, avoiding the singleton (a sketch of the idea follows below).
      
      Also added data_feed_proto to the operator target to fix CI for CPU-only compilation.
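      The stream-callback idea can be sketched as follows. This is only an illustration of the pattern, not the CUDADeviceContextAllocation code from this PR; the names StreamAlloc, StreamRelease, MarkReadyToFree, and ReleaseReadyAllocations are invented for the example. Note that CUDA forbids CUDA API calls inside a stream callback, so the callback only records that the buffer is reclaimable; the actual cudaFree happens later from ordinary host code.

      ```cpp
      // Sketch (not Paddle's implementation): free a stream-scoped allocation only
      // after the work already enqueued on that stream has consumed it, using a
      // stream callback instead of a global singleton allocator.
      #include <cuda_runtime.h>
      #include <cstddef>
      #include <mutex>
      #include <vector>

      namespace {
      std::mutex g_mu;
      std::vector<void*> g_ready_to_free;  // device pointers whose stream work is done

      // Stream callbacks may not call CUDA APIs, so only record the pointer here.
      void CUDART_CB MarkReadyToFree(cudaStream_t /*stream*/, cudaError_t /*status*/,
                                     void* dev_ptr) {
        std::lock_guard<std::mutex> lock(g_mu);
        g_ready_to_free.push_back(dev_ptr);
      }
      }  // namespace

      // Allocate a scratch buffer that kernels on some stream will use.
      void* StreamAlloc(size_t bytes) {
        void* ptr = nullptr;
        cudaMalloc(&ptr, bytes);
        return ptr;
      }

      // Called once no more work using `ptr` will be enqueued on `stream`; the
      // buffer becomes reclaimable only after the already-enqueued work finishes.
      void StreamRelease(void* ptr, cudaStream_t stream) {
        cudaStreamAddCallback(stream, MarkReadyToFree, ptr, 0);
      }

      // Invoked periodically from ordinary host code; actually frees the buffers.
      void ReleaseReadyAllocations() {
        std::lock_guard<std::mutex> lock(g_mu);
        for (void* p : g_ready_to_free) cudaFree(p);
        g_ready_to_free.clear();
      }
      ```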
  3. 08 Sep 2019, 1 commit
  4. 31 Aug 2019, 1 commit
    • Paddlebox Framework (#18982) · c756b5d2
      Committed by hutuxian
      * Support looking up embeddings from BoxPS.
      * Add a _pull_box_sparse op; for now this op is not exposed to users.
      * Add a BoxHelper class, providing 'BeginPass', 'EndPass', 'FeedPass' functions, and so on.
      * Add 'BoxPSDataset' in the Python code.
      * Add a compile option WITH_BOX_PS and a macro PADDLE_WITH_BOX_PS (see the sketch after this list).
      * Add unit tests.
      * For more details, please refer to: https://github.com/PaddlePaddle/Paddle/pull/18982
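      The compile option and macro in the list above typically pair up as in the short sketch below: CMake turns -DWITH_BOX_PS=ON into the PADDLE_WITH_BOX_PS preprocessor definition, and the C++ sources guard all BoxPS-specific code with it. The function here is a placeholder, not the real BoxHelper code from the PR.

      ```cpp
      // Placeholder sketch of the macro-guard pattern implied by WITH_BOX_PS /
      // PADDLE_WITH_BOX_PS; the body merely stands in for the real BoxHelper calls.
      #include <cstdio>

      void RunOnePass() {
      #ifdef PADDLE_WITH_BOX_PS
        // Built with -DWITH_BOX_PS=ON: the real code would wrap each training pass
        // with BoxHelper's BeginPass / FeedPass / EndPass here.
        std::printf("BoxPS pass\n");
      #else
        // Built without BoxPS: embedding lookup from BoxPS is unavailable.
        std::printf("BoxPS support not compiled in\n");
      #endif
      }
      ```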
  5. 19 Aug 2019, 1 commit
  6. 09 Aug 2019, 1 commit
  7. 02 Aug 2019, 1 commit
    • Open gc by default (#18836) · 7ac748ad
      Committed by Zeng Jinle
      * enable GC by default, test=develop (see the sketch after this list)
      
      * fix test_train_recognize_digits and disable GC when nGraph is enabled, test=develop
      
      * fix an eager-deletion bug in the conditional_block op, test=develop
      
      * add some comments for reviewers, test=develop
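      If memory serves (this is recalled from the Paddle 1.x sources, not taken from this PR), the switch behind "enable GC by default" is the eager_delete_tensor_gb gflag: a negative value keeps garbage collection off, while 0.0 frees a variable as soon as it is no longer needed. A minimal sketch of that gflags pattern, under those assumed semantics:

      ```cpp
      // Sketch only: flag name, default change, and semantics are assumptions,
      // not copied from this PR. A negative threshold keeps GC off; 0.0 frees a
      // variable eagerly once its last reader has finished.
      #include <gflags/gflags.h>

      DEFINE_double(eager_delete_tensor_gb, 0.0,  // assumed: was -1.0 (GC off) before
                    "Memory threshold (GB) that triggers garbage collection; "
                    "negative disables GC, 0.0 enables fully eager deletion.");

      bool IsGarbageCollectionEnabled() {
        return FLAGS_eager_delete_tensor_gb >= 0.0;
      }
      ```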
  8. 29 Jul 2019, 1 commit
  9. 19 Jul 2019, 1 commit
    • Support memory eager deletion on recurrent OP (#17710) · 89bc3fd8
      Committed by Huihuang Zheng
      Tested PaddingRNN on a V100 GPU.
      
      Test configuration: large model, padding mode (the mode that uses RecurrentOp), one GPU.
      
      GPU memory (MiB):  6414 (this PR)   vs  6837 (without this PR)
      Speed (steps/s):   10.28 (this PR)  vs  9.89 (without this PR)
  10. 17 Jul 2019, 1 commit
  11. 27 Jun 2019, 1 commit
    • Support collective communication training (#18175) · b7128bac
      Committed by HaoRen
      * fix redundant code problem in prepare context, optimize executor by caching created variables
      test=develop
      
      * supports collective training in executor
      
      * make fetch_list runnable with variables, add more unit tests for use_program_cache
      test=develop
      
      * fix comment
      test=develop
      
      * use unique name for nccl_id
      
      * supports output to stream in program_to_code
      
      * insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code
      
      * set op role in collective training
      
      * add collective op role
      
      * remove orig file
      
      * add build optimizer by strategy
      
      * add collective strategy
      
      * refine collective strategy
      
      * add multi-process role maker
      
      * refine strategy building factory so that we can easily plug in more strategies
      
      * scale loss grad in collective sgd transpiler (see the sketch at the end of this entry)
      
      * add support for distributed fc
      
      * code format
      
      * revert some features for dist fc
      
      * add support for distributed fc training
      
      * test=develop
      add collective op unittest standard
      
      * test=develop
      remove the test_collective directory
      
      * remove slicegather test
      
      * code format for reducescatter
      
      * update attr of shard_index_op
      
      * Modify macro nccl_helper
      
      * remove test without distribute
      
      * macro collective_helper
      
      * macro update
      
      * test=develop
      update to support Python 3.5
      
      * test=develop change GPU memory usage to 0.1 when testing
      
      * test=develop
      update ut equal func
      
      * test=develop
      set flags to 1.5
      
      * test=develop fix pickle dump on Python 3.5
      
      * test=develop
      fix divide in slice and add sync_comm_stream
      update atol and rtol to 1e-05
      remove shard_index op and its test
      read input from memory instead of from a file
      remove origin_program in framework and add I/O in c_sync_calc_stream
      
      * test=develop update unittest sync operator I/O
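      The "scale loss grad" step in the collective SGD transpiler can be illustrated conceptually (this is not the transpiler code; the helper below is invented for the example): with nranks workers each computing gradients on their own shard of the batch and a sum-allreduce combining them, scaling the loss gradient by 1/nranks makes the summed result equal to the average gradient, so hyperparameters behave as in single-device training.

      ```cpp
      // Conceptual sketch (not the PR's code): pre-scaling the loss gradient by
      // 1/nranks turns a later sum-allreduce of parameter gradients (e.g. an NCCL
      // allreduce with ncclSum) into an average over all ranks.
      #include <vector>

      void ScaleLossGrad(std::vector<float>* loss_grad, int nranks) {
        const float scale = 1.0f / static_cast<float>(nranks);
        for (float& g : *loss_grad) g *= scale;
      }
      ```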
  12. 12 Jun 2019, 1 commit
  13. 11 Jun 2019, 1 commit
    • Pipeline Concurrency (#17402) · 969e6378
      Committed by hutuxian
      Add a pipeline concurrency train mode (see the conceptual sketch after this list):
      - C++: pipeline_trainer & section_worker
      - Python: PipelineOptimizer
      - Add a new data_feed type: PrivateInstantDataFeed
      - Add a test demo of the pipeline trainer; the test model is a GNN
      - Win32 is not supported yet
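      The pipeline idea can be illustrated with a small, self-contained C++ sketch: each "section" runs in its own worker thread and hands mini-batches downstream through a blocking queue, so data feeding and training overlap. This is a conceptual illustration only, not the pipeline_trainer / section_worker code added by the PR.

      ```cpp
      // Conceptual sketch of pipelined sections connected by a blocking queue;
      // names and structure are illustrative, not Paddle's implementation.
      #include <condition_variable>
      #include <iostream>
      #include <mutex>
      #include <optional>
      #include <queue>
      #include <thread>
      #include <utility>

      template <typename T>
      class BlockingQueue {
       public:
        void Push(T v) {
          { std::lock_guard<std::mutex> lk(mu_); q_.push(std::move(v)); }
          cv_.notify_one();
        }
        void Close() {
          { std::lock_guard<std::mutex> lk(mu_); closed_ = true; }
          cv_.notify_all();
        }
        std::optional<T> Pop() {  // blocks until an item arrives or the queue closes
          std::unique_lock<std::mutex> lk(mu_);
          cv_.wait(lk, [&] { return !q_.empty() || closed_; });
          if (q_.empty()) return std::nullopt;
          T v = std::move(q_.front());
          q_.pop();
          return v;
        }
       private:
        std::mutex mu_;
        std::condition_variable cv_;
        std::queue<T> q_;
        bool closed_ = false;
      };

      int main() {
        BlockingQueue<int> feed_to_train;  // section 1 -> section 2
        std::thread feeder([&] {           // section 1: data feed
          for (int batch = 0; batch < 4; ++batch) feed_to_train.Push(batch);
          feed_to_train.Close();
        });
        std::thread trainer([&] {          // section 2: training, overlapped with feeding
          while (auto batch = feed_to_train.Pop())
            std::cout << "train on batch " << *batch << "\n";
        });
        feeder.join();
        trainer.join();
      }
      ```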
  14. 23 May 2019, 1 commit
    • Fix allocator bug (#16712) · c6189637
      Committed by Zeng Jinle
      * Revert "Revert "Fix allocator bug""
      
      This reverts commit 174d0d0b.
      
      * Revert "fix travis ci"
      
      This reverts commit 5656fa9f.
      
      test=develop
      
      * add inlined_vector.h, test=develop (see the sketch after this entry)
      
      * add inlined_vector_test, test=develop
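      The small-buffer idea behind an "inlined vector" can be sketched as follows; the interface and name are illustrative, not the inlined_vector.h added by this PR. The first N elements live in an in-object array, and storage spills to the heap only when that capacity is exceeded, which avoids heap allocations in the common small case.

      ```cpp
      // Illustrative small-buffer-optimized vector, not Paddle's inlined_vector.h.
      #include <cstddef>
      #include <vector>

      template <typename T, size_t N>
      class InlinedVector {
       public:
        void PushBack(const T& v) {
          if (size_ < N) {
            inline_buf_[size_] = v;   // stays in the in-object buffer, no heap alloc
          } else {
            heap_buf_.push_back(v);   // spills to the heap once N is exceeded
          }
          ++size_;
        }
        const T& operator[](size_t i) const {
          return i < N ? inline_buf_[i] : heap_buf_[i - N];
        }
        size_t Size() const { return size_; }

       private:
        T inline_buf_[N]{};
        std::vector<T> heap_buf_;
        size_t size_ = 0;
      };
      ```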
  15. 29 Mar 2019, 22 commits
  16. 27 Mar 2019, 1 commit
  17. 26 Mar 2019, 2 commits
  18. 25 Mar 2019, 1 commit
    • split PR · c20db635
      Committed by sneaxiy
      test=develop