Commits · 7fb817d4471494e1b1ee2168811cbd15308785ab · 机器未来 / Paddle

06 Jan, 2020 · 1 commit
- add distributed_strategy (#21710) · 7fb817d4
  Committed by 123malin on Jan 06, 2020
```
* add distributed_strategy
```
  7fb817d4
12 Dec, 2019 · 1 commit

Committed by tangwei12 on Dec 12, 2019

* add fake init for the trainer, fix large memory hold in the trainer
* do not merge recv vars from a remote endpoint, test=develop
* add recv and save op, merge slice var in one op, save memory
* remove hsigmoid with pull sparse, test=develop

9ad940fd

06 Dec, 2019 · 1 commit
- Paddlebox Related to Framework (#21586) · c5aec2fe
  Committed by hutuxian on Dec 06, 2019
```
* Add a single_process_multi_thread transpiler.
* Add some UTs.
* Fix some API description.
```
  c5aec2fe
28 Nov, 2019 · 1 commit
- add Adam beta1/beta2 support Variable (#21234) · ebfb720a
  Committed by Kaipeng Deng on Nov 28, 2019
```
* add Adam beta1/beta2 support Variable. test=develop
```
  ebfb720a
01 Nov, 2019 · 1 commit
- Optimize decay (#20816) · 20cdff0e
  Committed by 123malin on Nov 01, 2019
```
* update pserver decay blocks

* update distributed notify handler
```
  20cdff0e
17 Oct, 2019 · 1 commit
- fix fetch handler error with pslib (#20679) · 1d925440
  Committed by tangwei12 on Oct 17, 2019
```
* fix fetch handler error with pslib
* fix distributed lookup table op with 1 pserver
```
  1d925440
15 Oct, 2019 · 2 commits

Fix communicator slow bug & fix communicator stop bug (#20366) · 940c6ff1

Committed by Chengmo on Oct 15, 2019

* test=develop,Fix communicator slow bug

* test=develop, delete if() in stop_worker()

* test=develop

* fix UT, test=develop

* fix bug in fetch handler, test=develop

* fix bug in fetch handler, test=develop

* test=develop, fix fetch barrier bug

* test=develop, bug fix

* test=develop, bug fix

* test=develop, fix bug

940c6ff1

bug fix: invalid learning rate decay in pserver async mode (#20325) · b4a3b750
Committed by 123malin on Oct 15, 2019
```
* bug fix: invalid learning rate decay in pserver async mode
```
b4a3b750

11 Oct, 2019 · 1 commit
- doc fix, test=develop, test=document_fix (#20239) · a010d883
  Committed by tangwei12 on Oct 11, 2019
```
* doc fix, test=develop, test=document_fix
```
  a010d883
09 Oct, 2019 · 1 commit
- Fix transpiler en doc (#20149) · 494d6cf2
  Committed by Chengmo on Oct 09, 2019
```
* test=develop,test=document_fix,fix transpiler doc,add API.spec
```
  494d6cf2
07 Oct, 2019 · 2 commits
- Speed GEO-SGD (#20158) · eb05db71
  Committed by Chengmo on Oct 07, 2019
```
* delete debug vlog & add rpc function & fix word2vec bug & speed GEO-SGD
```
  eb05db71
- Trainer heartbeat for async mode (#19600) · b5a41046
  Committed by tangwei12 on Oct 07, 2019
```
Heartbeat for distributed async training.
```
  b5a41046
30 Sep, 2019 · 2 commits
- Add GEO-SGD distribute training algorithm (#20018) · 728ec1b4
  Committed by Chengmo on Sep 30, 2019
```
* refactor geo sgd & communicator
```
  728ec1b4
- Add deprecated memory optimize doc (#20111) · 5f2290ab
  Committed by Zeng Jinle on Sep 30, 2019
```
* add deprecated memory optimize doc, test=develop, test=document_fix

* merge develop to solve conflict, test=develop, test=document_fix
```
  5f2290ab
26 Sep, 2019 · 1 commit
- fix APIs, test=document_preview (#19954) · 6c74e738
  Committed by 123malin on Sep 26, 2019
```
* fix DistributeTranspilerConfig document, test=develop
```
  6c74e738
16 Sep, 2019 · 1 commit
- fix sync_with_distributed_lookup_table, test=develop (#19737) · 6a1db204
  Committed by tangwei12 on Sep 16, 2019
```
fix wrong place with distributed_lookup_table
```
  6a1db204
06 Sep, 2019 · 1 commit
- Optimize fleet API: add input check for some interfaces (#18971) · a25a716e
  Committed by 123malin on Sep 06, 2019
```
* fleet api add input check, test=develop
```
  a25a716e
28 Aug, 2019 · 2 commits

adapt fleet api for localsgd and support nccl comm configuration in executor (#19443) · 4ef6b845
Committed by Yi Liu on Aug 28, 2019
```
test=develop
```
4ef6b845

Fix the correctness of async mode at distributed training (#18863) · 65c73684

Committed by tangwei12 on Aug 28, 2019

* fix correctness of the communicator

* fix a bug in send thread when sending var context is empty, test=develop

* add lookup_table_prefetch_op and prefetch optimize, test=develop

* remove remote prefetch GPU supported

* word2vec force with CPU, test=develop

* test dist remote lookup table force with CPU, test=develop

65c73684

26 Aug, 2019 · 1 commit
- fix distribute transpiler GRPC error code 4, RPC Deadline (#18984) · 19dac67e
  Committed by tangwei12 on Aug 26, 2019
```
* fix sync mode hang in transpiler
* remove sync mode in send/recv
* replace PADDLE_ENFORCE with PADDLE_ENFORCE_NE
```
  19dac67e
16 Aug, 2019 · 1 commit

remove unused inference_transpiler unit-tests (#19130) · 2f8c7e02

Committed by Tao Luo on Aug 16, 2019

* remove unused inference_transpiler unit-tests

test=develop

* remove InferenceTranspiler usage in quantize_transpiler.py

test=develop

2f8c7e02

12 Aug, 2019 · 1 commit
- Polish fleet API to support cuda collective mode and nccl2 mode. (#18966) · 29d87812
  Committed by gongweibao on Aug 12, 2019
```
Polish fleet API to support cuda collective mode and nccl2 mode
```
  29d87812
10 Aug, 2019 · 1 commit

Try to deprecate unstable python memory optimize (#18983) · c194b0c8

Committed by Zeng Jinle on Aug 10, 2019

* deprecate python memory optimize, test=develop

* remove memory_optimize in unittests, test=develop

* add unittests to deprecated interfaces, test=develop

c194b0c8

29 Jul, 2019 · 1 commit

Remove legacy C++ memory optimization codes (#18834) · 8008ab4e

Committed by Zeng Jinle on Jul 29, 2019

* remove legacy memory optimization codes, test=develop

* follow huihuang's comments,test=develop

* follow luotao's comments, test=develop

8008ab4e

23 Jul, 2019 · 1 commit

supports distributed classification (#18690) · 157211c4

Committed by Yi Liu on Jul 23, 2019

* supports distributed classification training
* update API.spec
* fix evenly division in python3
* change "index_range" to "index_num" in shard_index operator
test=document_preview
test=develop

157211c4
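
The "fix evenly division in python3" item in the commit above (#18690) most likely refers to the fact that `/` returns a float in Python 3, so sizes computed from it stop being integers. The following is a minimal sketch of that pitfall in a sharding context. The helper names `shard_size` and `shard_index`, their signatures, and the ceiling-division rule are assumptions made for illustration; they are not taken from the Paddle source.

```python
def shard_size(index_num, nshards):
    """Number of indices per shard (hypothetical helper).

    Uses integer ceiling division. In Python 3, `index_num / nshards`
    would return a float, which is the kind of bug the commit title hints at.
    """
    return (index_num + nshards - 1) // nshards


def shard_index(value, index_num, nshards, shard_id, ignore_value=-1):
    """Map a global index to a shard-local index (illustrative only).

    Returns the local offset if `value` belongs to `shard_id`,
    otherwise `ignore_value`.
    """
    size = shard_size(index_num, nshards)
    if value // size == shard_id:
        return value % size
    return ignore_value


if __name__ == "__main__":
    # 20 indices split over 2 shards: indices 0-9 on shard 0, 10-19 on shard 1.
    print(shard_size(20, 2))
    print(shard_index(16, 20, 2, shard_id=1))
    print(shard_index(16, 20, 2, shard_id=0))
```

Keeping every intermediate value an `int` via `//` means the result can be used directly as a tensor dimension or a range bound under both Python 2 and Python 3.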

22 Jul, 2019 · 1 commit
- do some odd jobs (#18641) · d8458483
  Committed by tangwei12 on Jul 22, 2019
```
do some odd jobs, test=develop
```
  d8458483
11 Jul, 2019 · 1 commit
- Polish backwards optimizer dependency codes and use more default values. (#18255) · c0a82748
  Committed by gongweibao on Jul 11, 2019
  
  c0a82748
02 Jul, 2019 · 1 commit

supports collective training with programs (#18392) · a873fa84

Committed by Yi Liu on Jul 02, 2019

1. Since the allreduce op has 4 reduce types, we split these four reduce types into four ops.
2. We also refined the collective op code, e.g. we separated the collective op kernel into CPUKernel and CUDAKernel, and removed the device-specific DeviceContext parameter from the template, as the target DeviceContext is already known.
3. We removed the newly added Collective op role to reduce the complexity of program and graph analysis.

a873fa84
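
The first point of the commit message above (#18392) describes replacing one allreduce op that takes a reduce-type parameter with four dedicated ops. The sketch below simulates that design in plain Python, with no real communication. It assumes the four reduce types are sum, product, max and min, which the commit message does not spell out, and all function names are hypothetical rather than Paddle op names.

```python
from functools import reduce
import operator


def _allreduce(rank_values, combine):
    """Combine one value per rank and hand every rank the same result."""
    result = reduce(combine, rank_values)
    return [result for _ in rank_values]


# One dedicated entry point per reduce type, instead of a single op
# that switches on a reduce-type attribute at run time.
def allreduce_sum(rank_values):
    return _allreduce(rank_values, operator.add)


def allreduce_prod(rank_values):
    return _allreduce(rank_values, operator.mul)


def allreduce_max(rank_values):
    return _allreduce(rank_values, max)


def allreduce_min(rank_values):
    return _allreduce(rank_values, min)


if __name__ == "__main__":
    per_rank = [1, 2, 3]  # one scalar contributed by each of three ranks
    print(allreduce_sum(per_rank))
    print(allreduce_max(per_rank))
```

With one op per reduce type, the operation is visible from the op name alone, so a program or graph pass does not need to inspect an attribute to know which reduction is performed.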

27 Jun, 2019 · 1 commit

supports collective communicated training (#18175) · b7128bac

Committed by HaoRen on Jun 27, 2019

* fix prepare context redundant code problem, optimize executor by caching create_varaiables
test=develop

* supports collective training in executor

* make fetch_list runnable with variables, add more unittest for use_program_cache
test=develop

* fix comment
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plugin more strategy

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* fix prepare context redundant code problem, optimize executor by caching create_varaiables
test=develop

* supports collective training in executor

* make fetch_list runnable with variables, add more unittest for use_program_cache
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* fix comment
test=develop

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plugin more strategy

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* test=develop
add collective op unittest standard

* test=develop
remove the test_collective directory

* test=develop
remove the test_collective directory

* remove slicegather test

* code format for reducescatter

* update attr of shard_index_op

* Modify macro nccl_helper

* remove test without distribute

* macro collective_helper

* macro update

* test=develop
update support python3.5

* test=develop change gpu memory use to 0.1 when test

* test=develop
update ut equal func

* test=develop
set flags to 1.5

* test=develop fix pickle dump py35

* test=develop
fix divide in slice and add sync_comm_stream
update atol and rtol to 1e-05
rm shard_index op and test
modify read input from file to read from memory
remove origin_program in framework and add i/o in c_sync_calc_stream

* test=develop update unittest sync operator I/O

b7128bac

25 Jun, 2019 · 1 commit
- Fix default value of fluid.memory_optimize (#18295) · e06c69c7
  Committed by chengduo on Jun 25, 2019
```
* fix default value of fluid.memory_optimize
test=develop

* fix api.spec
test=develop
```
  e06c69c7
31 May, 2019 · 1 commit
- fix document of python api get_startup_program() (#17764) · 659b72a9
  Committed by tangwei12 on May 31, 2019
```
* add example to get_startup_program()
* fix example to get_startup_program()
```
  659b72a9
30 May, 2019 · 1 commit
- fix distributed_transpiler.py api test=develop (#17668) · ac92e4c0
  Committed by yaoxuefeng on May 30, 2019
  
  ac92e4c0
29 May, 2019 · 2 commits
- fix 2dconn test=develop (#17681) · 0d561ef4
  Committed by gongweibao on May 29, 2019
  
  0d561ef4
- fix doc in transpiler, test=develop (#17313) · 0d3c48e0
  Committed by tangwei12 on May 29, 2019
```
* fix doc in transpiler, test=develop
```
  0d3c48e0
27 May, 2019 · 1 commit
- Add multi-ncclcomm and 2D ncclallreduce support. (#17263) · 65bbf950
  Committed by gongweibao on May 27, 2019
  
  65bbf950
24 May, 2019 · 1 commit

[MKL-DNN] Add Fully Connected Op for inference only (#15226) · 0c39b97b

Committed by Michał Gallus on May 24, 2019

* fuse mul and elementwise add to fc

* Reimplement the FC forward operator

* Fix FC MKLDNN integration by transposing weights

* Add FC MKLDNN Pass

test=develop

* FC MKLDNN Pass: change memcpy to std::copy

* Fix MKLDNN FC handling of mismatch input and weights dims

* Lower tolerance for MKL-DNN in resnet50 test

test=develop

* Adjust FC to support MKLDNN Op placement

test=develop

* Adjust Placement Op to set use_mkldnn attribute for graph

test=develop

* MKLDNN FC: fix weights format so that gemm version is called

test=develop

* FC MKLDNN: Remove tolerance decrease from tester_helper

* FC MKL-DNN: Refactor the code, change input reorder to weight reorder

* MKL-DNN FC: Introduce operator caching

test=develop

* FC MKL-DNN: Fix the tensor type in ExpectedKernelType

test=develop

* FC MKL-DNN: fix style changes

test=develop

* FC MKL-DNN: fallback to native on non-supported dim sizes

test=develop

* FC MKLDNN: fix CMake paths

test=develop

* FC MKLDNN: Refine placement pass graph mkldnn attribute

test=develop

* Fix Transpiler error for fuse_conv_eltwise

test=develop

* Fix missing STL includes in files

test=develop

* FC MKL-DNN: Enable new output size computation

Also, refine pass to comply with newest interface.
test=develop

* FC MKL-DNN: enable only when fc_mkldnn_pass is enabled

* FC MKL-DNN: Allow Weights to use oi or io format

* FC MKL-DNN: Adjust UT to work with correct dims

test=develop

* Enable MKL DEBUG for resnet50 analyzer

test=develop

* FC MKL-DNN: Improve Hashing function

test=develop

* FC MKL-DNN: Fix shape for fc weights in transpiler

* FC MKL-DNN: Update input pointer in re-used fc primitive

* Add log for not handling fc fuse for unsupported dims

test=develop

* FC MKL-DNN: Move transpose from pass to Op Kernel

test=develop

* FC MKL-DNN: Disable transpose in unit test

test=develop

* FC MKL-DNN: Remove fc_mkldnn_pass from default list

* Correct Flag for fake data analyzer tests

test=develop

* FC MKL-DNN: Add comment about fc mkldnn pass disablement

test=develop

* FC MKL-DNN: Disable fc in int8 tests

test=develop

0c39b97b

23 May, 2019 · 2 commits
- fix distribute doc test=develop (#17318) · 92e7d5d7
  Committed by Qiao Longfei on May 23, 2019
```
* fix distribute doc
```
  92e7d5d7
- Async exe support communicator (#17386) · 58f7695a
  Committed by Qiao Longfei on May 23, 2019
```
Async exe support communicator
```
  58f7695a
20 May, 2019 · 1 commit
- improve the doc of paddle.fluid.memory_optimize, test=develop (#17473) · f82e4d75
  Committed by liuwei1031 on May 20, 2019
```
* improve the doc of paddle.fluid.memory_optimize, test=develop

* fix typo, test=develop
```
  f82e4d75
16 May, 2019 · 1 commit

improve the API Sample of DataFeeder, memory_optimize and release_memory (#17374) · 6a53fa95

Committed by liuwei1031 on May 16, 2019

* improve the API Sample of DataFeeder, memory_optimize and release_memory, test=develop

* update API.spec, test=develop, test=document_preview

* tweak the code format of feed API, test=develop

*  update API.spec, test=develop

* improve doc for DataFeeder and default_main_program, test=develop

6a53fa95

机器未来 / Paddle · up to date with the fork's source project