Commits · a99bc64c63c1035108255a165b51a3e5b8febf6f · PaddlePaddle / Paddle

08 Aug, 2019 2 commits

add fleet util, add some interface in hdfs util (#18752) · a99bc64c

Committed by jiaqi on Aug 08, 2019

* add fleet util (fleet/utils/fleet_util.py): functions for users' convenience
* add some interfaces in hdfs util: hdfs is_file, hdfs cat

a99bc64c

Fix memory overwriting of tensors returned by executor (#19030) · 8f537354

Committed by Leo Chen on Aug 08, 2019

* fix memory overlapping of fetch var (return of executor.run), test=develop

* fix wrong usage of ParallelExecutor in op_test, test=develop

* remove useless parameter and simplify code

* avoid tensor destruct untimely, test=develop

* add testcase independent of OpTest, test=develop

8f537354

05 Aug, 2019 2 commits

fix warpctc.dll not found issue (#18761) · a43a763b

Committed by liuwei1031 on Aug 05, 2019

* fix warpctc.dll not found issue, test=develop

* revert the linux platform change, test=develop

* delete warpctc_lib_path.h.in, test=develop

* add SetPySitePackagePath function

* fix warpctc.dylib not found issue on Mac, test=develop

* improve the paddle lib path setting logic, test=develop

* fix mac ci issue caused by test_warpctc_op unittest, test=develop

* tweak code, test=develop

a43a763b

python inference enable_memory_optim (#18817) · 65d98752

Committed by flame on Aug 05, 2019

```
python inference API support enable_memory_optim
```

65d98752

31 Jul, 2019 2 commits

Trt fp16 support (#18860) · 61238d31

Committed by Zhaolong Xing on Jul 31, 2019

* Fix Mask rcnn predictor
    1. refine memory optim algorithm to support the model with the block op.
    2. output diff : modify the affine channel fuse
    3. add condition_block_infer op
add interface for setting trt calib table dir
test=develop

* add the missing files.
test=develop

* 1 add trt fp16 support
test=develop

61238d31

[DyGraph] Make multi-card program faster (#18892) · 20859c08

Committed by chengduo on Jul 31, 2019

```
* update parallel.py
test=develop
```

20859c08

29 Jul, 2019 2 commits

Remove legacy C++ memory optimization codes (#18834) · 8008ab4e

Committed by Zeng Jinle on Jul 29, 2019

* remove legacy memory optimization codes, test=develop

* follow huihuang's comments,test=develop

* follow luotao's comments, test=develop

8008ab4e

add clear_model interface in fleetwrapper (#18815) · 52c1431e

Committed by Thunderbrook on Jul 29, 2019

* dump slot

* test

* proto

* dump slot

* test

* proto

* code style

* code style

* code style

* style

* add delete after unseen days

* add unseen days

* code style

* conflict solve
test=develop

* add clear model

* code style
test=develop

* code style
test=develop

52c1431e

25 Jul, 2019 1 commit

fix build strategy doc (#18725) · 292dfbce

Committed by chengduo on Jul 25, 2019

```
test=develop
```

292dfbce

23 Jul, 2019 2 commits

support patch data, add load_one_table, fix bug (#18509) · d18aabb4

Committed by jiaqi on Jul 23, 2019

(1) support patch data (merge slots of instances with the same line id, modify dense layers whose size changes)
(2) add fleet load_one_table interface, support loading from paddle model and from pslib model
(3) fix push sparse bug which caused push sparse to cost more time (about 10% in my testcase)
(4) when some slots are not in one of your networks (join/update, etc.), data feed, collect label info, and push/pull sparse will skip these slots instead of throwing an error.
(5) add more debug info in TrainFilesWithProfiler

d18aabb4
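
Item (1) of this commit describes merging the slots of instances that share a line id. A minimal sketch of that idea follows. It assumes each instance is a dict with a hypothetical `line_id` key and a `slots` mapping from slot name to a list of feature ids; these names are illustrative only and are not taken from the Paddle source.

```python
from collections import OrderedDict


def merge_by_line_id(instances):
    """Merge instances that share a line id by concatenating their slot features.

    `instances` is an iterable of dicts shaped like
    {"line_id": str, "slots": {slot_name: [feature_id, ...]}}.
    This layout is hypothetical and only illustrates the merge step.
    """
    merged = OrderedDict()  # keeps first-seen order of line ids
    for ins in instances:
        target = merged.setdefault(ins["line_id"], {})
        for slot, features in ins["slots"].items():
            target.setdefault(slot, []).extend(features)
    return [{"line_id": k, "slots": v} for k, v in merged.items()]


records = [
    {"line_id": "a", "slots": {"s1": [1, 2]}},
    {"line_id": "b", "slots": {"s1": [7]}},
    {"line_id": "a", "slots": {"s1": [3], "s2": [9]}},
]
merged_records = merge_by_line_id(records)
```

Here the two instances with line id `a` collapse into one record whose slots hold the features of both, while `b` is left unchanged.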

Make fuse_optimizer_op_pass also work when the model contains sparse gradients. (#18664) · fd3aad6c

Committed by chengduo on Jul 23, 2019

```
* support sparse gradients
test=develop
```

fd3aad6c

18 Jul, 2019 1 commit

Feature/auto_growth_allocator (#18561) · ae58afc5

Committed by Zeng Jinle on Jul 18, 2019

* feature/auto_growth_allocator, test=develop

* add unittest of AlignedAllocator, test=develop

* try to turn on auto_growth to test on CI, test=develop

* fix segmentation fault in mixed_vector.h, test=develop

* add unittests, test=develop

ae58afc5

17 Jul, 2019 1 commit

remove async executor and add data_feed.proto to the deps of train demo (#18659) · d714bf03

Committed by guru4elephant on Jul 17, 2019

```
* remove async executor and add data_feed.proto to the deps of train demo
```

d714bf03

12 Jul, 2019 1 commit

fix #17430: unexpected training behavior with int64-type attr (#18264) · b414645a

Committed by 123malin on Jul 12, 2019

```
* fix int64_t

* update fill constant op unittest

* add empty line
```

b414645a

11 Jul, 2019 2 commits

Polish backwards optimizer dependency codes and use more default values. (#18255) · c0a82748

Committed by gongweibao on Jul 11, 2019

c0a82748

Feature/buffer_shared_inplace (#17911) · d3003a16

Committed by Zeng Jinle on Jul 11, 2019

* feature/buffer_shared_inplace, test=develop

* refine code, test=develop

* fix elementwise_add op cpu inplace and sum inplace bug, test=develop

* add unittest and debug log, test=develop

* fix parallel_executor scope bug, polish code, test=develop

* fix sum op, activation op, single_in_place_inference bug, test=develop

* remove kLocalExecScopeName, test=develop

* fix unittest,test=develop

* fix out_var first version bug, test=develop

* follow comments,test=develop

d3003a16

08 Jul, 2019 1 commit

Inference: fix mask rcnn model diff, optim memory usage, memory leak. (#18532) · 88b52a27

Committed by Zhaolong Xing on Jul 08, 2019

* Fix Mask rcnn predictor
    1. refine memory optim algorithm to support the model with the block op.
    2. output diff : modify the affine channel fuse
    3. add condition_block_infer op
add interface for setting trt calib table dir
test=develop

* add the missing files.
test=develop

88b52a27

02 Jul, 2019 1 commit

supports collective training with programs (#18392) · a873fa84

Committed by Yi Liu on Jul 02, 2019

1. Since the allreduce op has 4 reduce types, we split these four reduce types into four ops
2. We also refined the collective op code, e.g. we separated the collective op kernel into CPUKernel and CUDAKernel, and removed the device-specific DeviceContext parameter in the template, as the target DeviceContext is already known
3. We removed the newly added Collective op role to reduce the complexity of program and graph analysis

a873fa84
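
Item 1 of this commit can be pictured with a single-process simulation: instead of one allreduce that takes a reduce-type attribute, each reduce type gets its own entry point. The function names below are hypothetical stand-ins for illustration; the code does not call Paddle or NCCL, and it models each rank's tensor as a plain Python list.

```python
from functools import reduce
import operator


def _allreduce(per_rank, combine):
    """Combine values element-wise across ranks; every rank gets the same result."""
    result = [reduce(combine, column) for column in zip(*per_rank)]
    return [list(result) for _ in per_rank]


def allreduce_sum(per_rank):
    return _allreduce(per_rank, operator.add)


def allreduce_prod(per_rank):
    return _allreduce(per_rank, operator.mul)


def allreduce_max(per_rank):
    return _allreduce(per_rank, max)


def allreduce_min(per_rank):
    return _allreduce(per_rank, min)


# Two simulated ranks, each holding a two-element tensor.
ranks = [[1, 5], [4, 2]]
summed = allreduce_sum(ranks)
```

One op per reduce type means the reduction is fixed by the op itself, so nothing has to branch on an attribute at run time; whether that was the commit's motivation is not stated in the message.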

01 Jul, 2019 1 commit

add "import paddle.fluid as fluid" to examples lack of it · 47e2ef38

Committed by xsrobin on Jul 01, 2019

47e2ef38

27 Jun, 2019 3 commits

Fix dygraph show style (#18297) · fd6631ef

Committed by lujun on Jun 27, 2019

```
Fix dygraph show style for FluidDoc.
```

fd6631ef

fix communicator with pyreader (#18350) · 999d9a59

Committed by tangwei12 on Jun 27, 2019

```
* add is_runnning in communicator, test=develop
```

999d9a59

supports collective communicated training (#18175) · b7128bac

Committed by HaoRen on Jun 27, 2019

* fix prepare context redundant code problem, optimize executor by caching create_varaiables
test=develop

* supports collective training in executor

* make fetch_list runable with variables, add more unittest for use_program_cache
test=develop

* fix comment
test=develop

* use unique name for nccl_id

* supports output to stream in program_to_code

* insert sync_comm_stream before regularization; add skip_op_callstack capability in program_to_code

* set op role in collective training

* add collective op role

* remove orig file

* add build optimizer by strategy

* add collective strategy

* refine collective strategy

* add multi-process role maker

* refine strategy building factory so that we can easily plugin more strategy

* scale loss grad in collective sgd transpiler

* add support for distributed fc

* code format

* revert some features for dist fc

* add support for distributed fc training

* test=develop
add collective op unittest standard

* test=develop
remove the test_collective directory

* remove slicegather test

* code format for reducescatter

* update attr of shard_index_op

* Modify macro nccl_helper

* remove test without distribute

* macro collective_helper

* macro update

* test=develop
update support python3.5

* test=develop change gpu memory use to 0.1 when test

* test=develop
update ut equal func

* test=develop
set flags to 1.5

* test=develop fix pickle dumple  py35

* test=develop
fix divide in slice and add sync_comm_stream
update atol and rtol to 1e-05
rm shard_index op and test
modify read input from file to read from memory
remove origin_program in framework and add i/o in c_sync_calc_stream

* test=develop update unittest sync operator I/O

b7128bac

26 Jun, 2019 1 commit

Refine CUDAPlace error message. (#18343) · 5826b72e

Committed by Zeng Jinle on Jun 26, 2019

```
* refine cuda place error msg, test=develop

* use LOG(ERROR)+exit(-1), test=develop
```

5826b72e

21 Jun, 2019 1 commit

dataset (#17973) · 3f8031e2

Committed by jiaqi on Jun 21, 2019

(1) use channel instead of vector/BlockingQueue in Dataset, to stay consistent with the existing implementation and make the code more readable and flexible (dataset with a single output channel or multiple output channels). One previous out-of-memory problem was caused by not releasing memory after training.
(2) add Record because MultiSlotType costs too much memory (80B), which fixes the out-of-memory problem.
(3) add Channel, Archive in paddle/fluid/framework
(4) change dataset from shared_ptr to unique_ptr in pybind
(5) move create/destroy readers from trainer to dataset
(6) move shuffle from datafeed to dataset. dataset holds memory; datafeed only loads data and feeds it to the network.
(7) fix thread num bug of Dataset when filelist size < thread num
(8) support set_queue_num in InMemoryDataset

3f8031e2
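
Item (1) of this commit replaces vector/BlockingQueue with a channel. As a rough illustration of what a channel offers over a plain queue (an explicit close that lets consumers drain the remaining items and then stop), here is a minimal Python sketch. It is not Paddle's C++ `Channel`; the class, its methods, and the `drain` helper are assumptions made for this example.

```python
import threading
from collections import deque


class Channel:
    """A minimal closable FIFO channel (unbounded, thread-safe)."""

    def __init__(self):
        self._items = deque()
        self._closed = False
        self._cond = threading.Condition()

    def put(self, item):
        with self._cond:
            if self._closed:
                raise RuntimeError("put on closed channel")
            self._items.append(item)
            self._cond.notify()

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()  # wake every waiting consumer

    def get(self):
        """Return (True, item), or (False, None) once closed and drained."""
        with self._cond:
            while not self._items and not self._closed:
                self._cond.wait()
            if self._items:
                return True, self._items.popleft()
            return False, None


def drain(channel):
    """Read from the channel until it is closed and empty."""
    out = []
    while True:
        ok, item = channel.get()
        if not ok:
            return out
        out.append(item)


def produce(channel, n):
    for i in range(n):
        channel.put(i)
    channel.close()


ch = Channel()
producer = threading.Thread(target=produce, args=(ch, 5))
producer.start()
consumed = drain(ch)
producer.join()
```

Because the consumer stops only when the channel is both closed and empty, no items are lost if the producer closes the channel before the consumer has caught up.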

19 Jun, 2019 1 commit

Update execution_strategy option default value (#18183) · 25f3cd64

Committed by chengduo on Jun 19, 2019

```
* update execution_strategy option default value
test=develop

* fix doc error
test=develop
```

25f3cd64

18 Jun, 2019 1 commit

Fix dygraph mem leak (#18082) · 25ab23be

Committed by Zeng Jinle on Jun 18, 2019

```
* fix dygraph mem leak, test=develop

* polish msg, test=develop
```

25ab23be

15 Jun, 2019 1 commit

fix slim int8 mkldnn multithreading issue (#18009) · accb132f

Committed by Sylwester Fraczek on Jun 15, 2019

accb132f

12 Jun, 2019 1 commit

combine noavx and avx package (#17889) · 5c06bff2

Committed by tensor-tang on Jun 12, 2019

* support avx and noavx core

* add catch and give some log

test=develop

* fix build

test=develop

* add missing package

test=develop

* fix pybind name

test=develop

* fix import error

test=develop

* combine noavx core

test=develop

* add requirements

test=develop

* fix unknown message

test=develop

* fix api spec

test=develop

* refine and clean

test=develop

* update

* pass dist ut

* follow comments

test=develop

* refine scripts

test=develop

5c06bff2

10 Jun, 2019 1 commit

Feature/refine api for dygraph (#17907) · 4d5f6937

Committed by Jiabin Yang on Jun 10, 2019

```
* WIP

* WIP

* test=develop, add api doc and example code for dygraph
```

4d5f6937

06 Jun, 2019 4 commits

Add backward and optimizer operator dependency pass. (#17746) · fbbdc9cc

Committed by gongweibao on Jun 06, 2019

fbbdc9cc

Make ParallelExecutor support Windows GPU (#17787) · 453a49b1

Committed by wopeizl on Jun 06, 2019

```
* fix the ParallelExecutor on Windows
test=develop
* restrict to use one GPU only under windows
```

453a49b1

INT8 MKL-DNN v2 integrate to slim (#17634) · 993c703b

Committed by 翟飞跃 on Jun 06, 2019

* refactor PR 16865

* delete mergetool files

* test=develop

* test=develop

* test=develop

* test=develop

* create dir for int8 model before call SaveOptimModel

* test=develop

* mkldnn int8 only support linux; test=develop

* refine code; test=develop

* remove comment; test=develop

* refine code; test=develop

* fix bug; test=develop

* add exception for mkldnn_post_training_strategy

* reuse int8v2 CAPI dataset; test=develop

* fix accuracy check bug; test=develop

* remove tab

* convert files to unix format

* test=develop

* reduce CI time;test=develop

* reduce CI time and refine code;test=develop

* refine comment; test=develop

* add cmake FLAGS;test=develop

* remove predict_num;test=develop

993c703b

use pyreader to read data in dygraph mode (#17314) · 841553e1

Committed by wopeizl on Jun 06, 2019

* use pyreader to read data

* add return_list to PyReader to support return value represented as list

841553e1

05 Jun, 2019 1 commit

Use Python C-API to speed up dygraph trace (#17837) · 674e0ce2

Committed by Zeng Jinle on Jun 05, 2019

* use python api to reduce python time cost, test=develop

* fix travis ci, test=develop

* fix Py_None error,test=develop

674e0ce2

04 Jun, 2019 1 commit

Using Smart pointer to optimizer memory usage of dyGraph (#17768) · 3b70f870

Committed by Jiabin Yang on Jun 04, 2019

* for debug

* test=develop, memory optimize for dygraph using shared_ptr

* test=develop, fix travis ci showed error

* test=develop, fix bug for recurrent usage of varbase

* test=develop, init varbase when it need to be Add

3b70f870

31 May, 2019 1 commit

fix prepare context redundant code problem, optimize executor by cach… (#17743) · d5239109

Committed by guru4elephant on May 31, 2019

* fix prepare context redundant code problem, optimize executor by caching create_varaiables
test=develop

* cache sub_scope, program, var when use_program_cache=True is set

* make fetch_list runable with variables, add more unittest for use_program_cache

d5239109

27 May, 2019 2 commits

clean code of py_layer in dygraph mode,test=develop (#17661) · 432ac701

Committed by Zeng Jinle on May 27, 2019

432ac701

Add multi-ncclcomm and 2D ncclallreduce support. (#17263) · 65bbf950

Committed by gongweibao on May 27, 2019

65bbf950

25 May, 2019 1 commit

TRT: Support set dynamic range in int8 mode. (#17524) · 61221ebc

Committed by Zhaolong Xing on May 25, 2019

* fluid int8 train and trt int8 predict align.
trt int8 predict init
op converter

* 2. align fluid int8 train and trt int8 inference.
enhance quant dequant fuse pass
enhance op converter, trt engine, trt engine op, trt subgraph pass.

* 3. add delete_quant_dequant_pass for trt

test=develop

* 4. add the missing file
test=develop

* 5. i modify the c++ interface, but forget to modify the pybind code
fix the IS_TRT_VERSION_GE bug, and fix elementwise op converter
test=develop

61221ebc

24 May, 2019 1 commit

add __str__ method for tensor and lodtensor to support print test=dev… (#17588) · 6724a652

Committed by wopeizl on May 24, 2019

```
* add __str__ method for tensor and lodtensor to support print test=develop
```

6724a652

PaddlePaddle / Paddle: synced successfully about 1 year ago