Commits · 4ef6b8457aa29995f1850ee4c9c6ff41739e64c9 · BaiXuePrincess / Paddle

28 Aug, 2019 2 commits

adapt fleet api for localsgd and support nccl comm configuration in executor (#19443) · 4ef6b845
Committed by Yi Liu on Aug 28, 2019
```
test=develop
```
4ef6b845

Fix the correctness of async mode at distributed training (#18863) · 65c73684

Committed by tangwei12 on Aug 28, 2019

* fix correctness of the communicator

* fix a bug in send thread when sending var context is empty, test=develop

* add lookup_table_prefetch_op and prefetch optimize, test=develop

* remove remote prefetch GPU supported

* word2vec force with CPU, test=develop

* test dist remote lookup table force with CPU, test=develop

65c73684

26 Aug, 2019 1 commit
- fix distribute transpiler GRPC error code 4, RPC Deadline (#18984) · 19dac67e
  Committed by tangwei12 on Aug 26, 2019
```
* fix sync mode hang in transpiler
* remove sync mode in send/recv
* replace PADDLE_ENFORCE with PADDLE_ENFORCE_NE
```
  19dac67e
16 Aug, 2019 1 commit

remove unused inference_transpiler unit-tests (#19130) · 2f8c7e02

Committed by Tao Luo on Aug 16, 2019

* remove unused inference_transpiler unit-tests

test=develop

* remove InferenceTranspiler usage in quantize_transpiler.py

test=develop

2f8c7e02

12 Aug, 2019 1 commit
- Polish fleet API to support cuda collective mode and nccl2 mode. (#18966) · 29d87812
  Committed by gongweibao on Aug 12, 2019
```
Polish fleet API to support cuda collective mode and nccl2 mode
```
  29d87812
10 Aug, 2019 1 commit

Try to deprecate unstable python memory optimize (#18983) · c194b0c8

Committed by Zeng Jinle on Aug 10, 2019

* deprecate python memory optimize, test=develop

* remove memory_optimize in unittests, test=develop

* add unittests to deprecated interfaces, test=develop

c194b0c8

29 Jul, 2019 1 commit

Remove legacy C++ memory optimization codes (#18834) · 8008ab4e

Committed by Zeng Jinle on Jul 29, 2019

* remove legacy memory optimization codes, test=develop

* follow huihuang's comments, test=develop

* follow luotao's comments, test=develop

8008ab4e

23 Jul, 2019 1 commit

supports distributed classification (#18690) · 157211c4

Committed by Yi Liu on Jul 23, 2019

* supports distributed classification training
* update API.spec
* fix even division in python3
* change "index_range" to "index_num" in shard_index operator
test=document_preview
test=develop

157211c4
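
The `shard_index` and Python 3 division items in this commit can be pictured with a minimal sketch. It is not the operator code from the commit: it assumes the semantics given in Paddle's public `shard_index` documentation (a ceil-divided shard size, and an ignore value for indices outside the current shard), and it reads the division fix as a switch to integer division, which is only a plausible interpretation. The name `shard_index_sketch` is hypothetical.

```python
def shard_index_sketch(indices, index_num, nshards, shard_id, ignore_value=-1):
    """Pure-Python illustration of shard_index semantics (hypothetical helper)."""
    if not 0 <= shard_id < nshards:
        raise ValueError("shard_id must be in [0, nshards)")
    # Floor division (//) keeps shard_size an int on Python 3;
    # a plain / would produce a float there.
    shard_size = (index_num + nshards - 1) // nshards
    out = []
    for idx in indices:
        if idx // shard_size == shard_id:
            # Index belongs to this shard: map it to a shard-local offset.
            out.append(idx % shard_size)
        else:
            # Index belongs to another shard: mark it as ignored.
            out.append(ignore_value)
    return out
```

With `index_num=20` and `nshards=2`, each shard covers 10 indices, so shard 0 keeps indices 0 to 9 and shard 1 keeps 10 to 19 re-based to 0 to 9.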

22 Jul, 2019 1 commit
- do some odd jobs (#18641) · d8458483
  Committed by tangwei12 on Jul 22, 2019
```
do some odd jobs, test=develop
```
  d8458483
11 Jul, 2019 1 commit
- Polish backwards optimizer dependency codes and use more default values. (#18255) · c0a82748
  Committed by gongweibao on Jul 11, 2019
  
  c0a82748
02 Jul, 2019 1 commit

supports collective training with programs (#18392) · a873fa84

Committed by Yi Liu on Jul 02, 2019

1. Since the allreduce op has 4 reduce types, we split these four reduce types into four ops
2. We also refined the collective op code, e.g. we separated the collective op kernel into CPUKernel and CUDAKernel, and removed the device-specific DeviceContext parameter from the template, as the target DeviceContext is already known
3. We removed the newly added Collective op role to reduce the complexity of program and graph analysis

a873fa84
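
The first item of this commit message (one op per reduce type instead of one op with a type attribute) can be pictured with a small simulation. This is plain Python, not Paddle code: it assumes the four reduce types are sum, max, min and prod, and the function names are hypothetical stand-ins for the per-type ops. Each inner list plays the role of one rank's tensor.

```python
from functools import reduce
import operator


def _allreduce(rank_values, combine):
    """Combine values element-wise across ranks; every rank gets the same result."""
    result = [reduce(combine, column) for column in zip(*rank_values)]
    return [list(result) for _ in rank_values]


# One thin entry point per reduce type, mirroring the "four ops" split.
def allreduce_sum(rank_values):
    return _allreduce(rank_values, operator.add)


def allreduce_max(rank_values):
    return _allreduce(rank_values, max)


def allreduce_min(rank_values):
    return _allreduce(rank_values, min)


def allreduce_prod(rank_values):
    return _allreduce(rank_values, operator.mul)
```

Splitting by reduce type keeps each entry point free of a runtime type switch, at the cost of four near-identical wrappers around the shared helper.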

27 Jun, 2019 1 commit

supports collective communicated training (#18175) · b7128bac

Committed by HaoRen on Jun 27, 2019

* fix prepare context redundant code problem, optimize executor by caching create_varaiables
test=develop

* supports collective training in executor

* make fetch_list runnable with variables, add more unittest for use_program_cache
test=develop

* fix comment
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plug in more strategies

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* fix prepare context redundant code problem, optimize executor by caching create_varaiables
test=develop

* supports collective training in executor

* make fetch_list runnable with variables, add more unittest for use_program_cache
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* fix comment
test=develop

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plug in more strategies

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* test=develop
add collective op unittest standard

* test=develop
remove the test_collective directory

* test=develop
remove the test_collective directory

* remove slicegather test

* code format for reducescatter

* update attr of shard_index_op

* Modify macro nccl_helper

* remove test without distribute

* macro collective_helper

* macro update

* test=develop
update support python3.5

* test=develop change gpu memory use to 0.1 when test

* test=develop
update ut equal func

* test=develop
set flags to 1.5

* test=develop fix pickle dump in py35

* test=develop
fix divide in slice and add sync_comm_stream
update atol and rtol to 1e-05
rm shard_index op and test
modify read input from file to read from memory
remove origin_program in framework and add i/o in c_sync_calc_stream

* test=develop update unittest sync operator I/O

b7128bac

25 Jun, 2019 1 commit
- Fix default value of fluid.memory_optimize (#18295) · e06c69c7
  Committed by chengduo on Jun 25, 2019
```
* fix default value of fluid.memory_optimize
test=develop

* fix api.spec
test=develop
```
  e06c69c7
31 May, 2019 1 commit
- fix document of python api get_startup_program() (#17764) · 659b72a9
  Committed by tangwei12 on May 31, 2019
```
* add example to get_startup_program()
* fix example to get_startup_program()
```
  659b72a9
30 May, 2019 1 commit
- fix distributed_transpiler.py api test=develop (#17668) · ac92e4c0
  Committed by yaoxuefeng on May 30, 2019
  
  ac92e4c0
29 May, 2019 2 commits
- fix 2dconn test=develop (#17681) · 0d561ef4
  Committed by gongweibao on May 29, 2019
  
  0d561ef4
- fix doc in transpiler, test=develop (#17313) · 0d3c48e0
  Committed by tangwei12 on May 29, 2019
```
* fix doc in transpiler, test=develop
```
  0d3c48e0
27 May, 2019 1 commit
- Add multi-ncclcomm and 2D ncclallreduce support. (#17263) · 65bbf950
  Committed by gongweibao on May 27, 2019
  
  65bbf950
24 May, 2019 1 commit

[MKL-DNN] Add Fully Connected Op for inference only (#15226) · 0c39b97b

Committed by Michał Gallus on May 24, 2019

* fuse mul and elementwise add to fc

* Reimplement the FC forward operator

* Fix FC MKLDNN integration by transposing weights

* Add FC MKLDNN Pass

test=develop

* FC MKLDNN Pass: change memcpy to std::copy

* Fix MKLDNN FC handling of mismatch input and weights dims

* Lower tolerance for MKL-DNN in resnet50 test

test=develop

* Adjust FC to support MKLDNN Op placement

test=develop

* Adjust Placement Op to set use_mkldnn attribute for graph

test=develop

* MKLDNN FC: fix weights format so that gemm version is called

test=develop

* FC MKLDNN: Remove tolerance decrease from tester_helper

* FC MKL-DNN: Refactor the code, change input reorder to weight reorder

* MKL-DNN FC: Introduce operator caching

test=develop

* FC MKL-DNN: Fix the tensor type in ExpectedKernelType

test=develop

* FC MKL-DNN: fix style changes

test=develop

* FC MKL-DNN: fallback to native on non-supported dim sizes

test=develop

* FC MKLDNN: fix CMake paths

test=develop

* FC MKLDNN: Refine placement pass graph mkldnn attribute

test=develop

* Fix Transpiler error for fuse_conv_eltwise

test=develop

* Fix missing STL includes in files

test=develop

* FC MKL-DNN: Enable new output size computation

Also, refine pass to comply with newest interface.
test=develop

* FC MKL-DNN: enable only when fc_mkldnn_pass is enabled

* FC MKL-DNN: Allow Weights to use oi or io format

* FC MKL-DNN: Adjust UT to work with correct dims

test=develop

* Enable MKL DEBUG for resnet50 analyzer

test=develop

* FC MKL-DNN: Improve Hashing function

test=develop

* FC MKL-DNN: Fix shape for fc weights in transpiler

* FC MKL-DNN: Update input pointer in re-used fc primitive

* Add log for not handling fc fuse for unsupported dims

test=develop

* FC MKL-DNN: Move transpose from pass to Op Kernel

test=develop

* FC MKL-DNN: Disable transpose in unit test

test=develop

* FC MKL-DNN: Remove fc_mkldnn_pass from default list

* Correct Flag for fake data analyzer tests

test=develop

* FC MKL-DNN: Add comment about fc mkldnn pass disablement

test=develop

* FC MKL-DNN: Disable fc in int8 tests

test=develop

0c39b97b

23 May, 2019 2 commits
- fix distribute doc test=develop (#17318) · 92e7d5d7
  Committed by Qiao Longfei on May 23, 2019
```
* fix distribute doc
```
  92e7d5d7
- Async exe support communicator (#17386) · 58f7695a
  Committed by Qiao Longfei on May 23, 2019
```
Async exe support communicator
```
  58f7695a
20 May, 2019 1 commit
- improve the doc of paddle.fluid.memory_optimize, test=develop (#17473) · f82e4d75
  Committed by liuwei1031 on May 20, 2019
```
* improve the doc of paddle.fluid.memory_optimize, test=develop

* fix typo, test=develop
```
  f82e4d75
16 May, 2019 1 commit

improve the API Sample of DataFeeder, memory_optimize and release_memory (#17374) · 6a53fa95

Committed by liuwei1031 on May 16, 2019

* improve the API Sample of DataFeeder, memory_optimize and release_memory, test=develop

* update API.spec, test=develop, test=document_preview

* tweak the code format of feed API, test=develop

* update API.spec, test=develop

* improve doc for DataFeeder and default_main_program, test=develop

6a53fa95

26 Apr, 2019 1 commit
- truncated_gaussian_random supported in distributed training, test=develop (#17091) · 7330cd63
  Committed by tangwei12 on Apr 26, 2019
  
  7330cd63
25 Apr, 2019 1 commit
- Fleet unify distributed training (#16791) · 1a4a51db
  Committed by tangwei12 on Apr 25, 2019
```
* implement distributed transpiler with fleet
```
  1a4a51db
27 Mar, 2019 1 commit
- fix pylint · d640c6cf
  Committed by Qiao Longfei on Mar 27, 2019
  
  d640c6cf
25 Mar, 2019 1 commit
- fix trainer_id · 542b52fa
  Committed by Qiao Longfei on Mar 25, 2019
  
  542b52fa
23 Mar, 2019 1 commit
- update transpiler and listen and serv op · de65398c
  Committed by Qiao Longfei on Mar 23, 2019
  
  de65398c
04 Mar, 2019 2 commits
- polish · 8e094f71
  Committed by Xin Pan on Feb 27, 2019
```
test=develop
```
  8e094f71
- add deprecation warning. · 9f3a3252
  Committed by Xin Pan on Feb 27, 2019
```
test=develop
```
  9f3a3252
27 Feb, 2019 2 commits
- polish · 0c277ac6
  Committed by Xin Pan on Feb 27, 2019
```
test=develop
```
  0c277ac6
- add deprecation warning. · 840cf780
  Committed by Xin Pan on Feb 27, 2019
```
test=develop
```
  840cf780
20 Feb, 2019 1 commit
- fix params with only 1 dim (#15828) · 971f3bc9
  Committed by tangwei12 on Feb 20, 2019
```
* fix params with only 1 dim
* test=develop
```
  971f3bc9
14 Feb, 2019 2 commits
- update. test=develop · 84f067be
  Committed by dzhwinter on Feb 14, 2019
```
test=develop
```
  84f067be
- add details. test=develop · d453b0dc
  Committed by dzhwinter on Feb 14, 2019
  
  d453b0dc
08 Feb, 2019 2 commits
- parameter recv can run · 8bda4ab2
  Committed by Qiao Longfei on Feb 08, 2019
  
  8bda4ab2
- complete recv op · fbd186bd
  Committed by Qiao Longfei on Feb 08, 2019
  
  fbd186bd
06 Feb, 2019 1 commit
- complete parameter_send · 4356f186
  Committed by Qiao Longfei on Feb 06, 2019
  
  4356f186
31 Jan, 2019 1 commit
- follow comments. test=develop · 0a63234c
  Committed by dzhwinter on Jan 31, 2019
  
  0a63234c
30 Jan, 2019 1 commit
- rerun ci. test=develop · 8b97a3a4
  Committed by dzhwinter on Jan 30, 2019
  
  8b97a3a4

BaiXuePrincess / Paddle is in sync with the fork's source project
