Commits · 2fdbc1ce65be36770f840e405222fc6f222d0d50 · BaiXuePrincess / Paddle

19 Jun, 2018 1 commit

Committed by Qiyang Min on Jun 19, 2018

* Add sub_blocks of lr_decay_op to pserver_prog after distribute_transpiler

* Remove unused logs and logics

* 1. Add ops to new block (considering the nested block condition)
2. Follow the original hierarchy of blocks
3. Change the function's name and remove debug lines

a29cb4be

16 Jun, 2018 2 commits
- update comment · 2b1ecdf5
  Committed by qiaolongfei on Jun 16, 2018
  
  2b1ecdf5
- add keep_kids flag for executor · daa0fbd5
  Committed by qiaolongfei on Jun 16, 2018
  
  daa0fbd5
15 Jun, 2018 1 commit

Modify Pybind LoDTensor API according to length-based LoD (#11106) · 417fcf4f

Committed by Kexin Zhao on Jun 15, 2018

* add lod_tensor util and modify pybind

* refine pybind LoDTensor API and modify LoDTensor and DataFeeder test

* fix test error

* fix detection map op test

* fix reorder_lod_tensor test

* fix seq_concat_op

* fix chunk eval op test

* fix target assign op

* fix warp ctc op

* address comments step 1: reverse reset_lod op

* step 2: modify op test

* add warning message

* remove has_valid_lod

* add back has_valid_lod

* address comments

* add exception catching trial

417fcf4f
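
The commit above moves the Python-facing LoDTensor API to a length-based LoD. As a rough illustration of the difference between the two representations, the sketch below converts between per-sequence lengths and cumulative offsets. The helper names `lengths_to_offsets` and `offsets_to_lengths` are hypothetical and are not part of the Paddle API; this is plain Python, not the code from the commit.

```python
from itertools import accumulate


def lengths_to_offsets(length_lod):
    """Turn each LoD level of sequence lengths into cumulative offsets starting at 0."""
    return [[0] + list(accumulate(level)) for level in length_lod]


def offsets_to_lengths(offset_lod):
    """Turn each LoD level of cumulative offsets back into sequence lengths."""
    return [[b - a for a, b in zip(level, level[1:])] for level in offset_lod]


# Two-level example: 2 top-level sequences holding 2 and 3 sub-sequences,
# and 5 sub-sequences holding 1, 2, 1, 2 and 2 elements.
length_lod = [[2, 3], [1, 2, 1, 2, 2]]
offset_lod = lengths_to_offsets(length_lod)
print(offset_lod)  # [[0, 2, 5], [0, 1, 3, 4, 6, 8]]
```

The length-based form states how long each sequence is, while the offset-based form states where each sequence starts and ends in the flattened data; the two carry the same information.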

14 Jun, 2018 3 commits

initial with only 1 mkl/openblas threads for each pthreads · 3e58df20
Committed by tensor-tang on Jun 14, 2018

3e58df20

Fix NCCLBcast hang up bug in Parallel Executor (#11377) · 046bb5c8

Committed by Qiyang Min on Jun 13, 2018

* 1. Create buddy allocator in each places before NcclBcast the variables
2. Check the memory usage of ALL gpus rather than the first one

* 1. Make NCCLGroupGuard guard only the ncclBcast part, which avoids ncclGroupEnd blocking exception throwing
2. NOTE the usage of NCCLGroupGuard

* Remove the memory usage check of gpus

* Fix code style

046bb5c8

Dynamic Graph first prototype (#11415) · d827c6e8
Committed by Yang Yang(Tony) on Jun 13, 2018

d827c6e8

13 Jun, 2018 3 commits
- add row_size for selected rows in DebugStringEx · 7ebef493
  Committed by qiaolongfei on Jun 13, 2018
  
  7ebef493
- fix concurrency_test build error on mac · 82416f18
  Committed by qiaolongfei on Jun 13, 2018
  
  82416f18
- fix build on mac · 9ebbfa6b
  Committed by qiaolongfei on Jun 13, 2018
  
  9ebbfa6b
12 Jun, 2018 2 commits
- throw warning if try to use mkldnn while not compiled · 6602db5b
  Committed by tensor-tang on Jun 12, 2018
  
  6602db5b
- Trainer send term signal (#11220) · 34865f2d
  Committed by Wu Yi on Jun 12, 2018
```
* wip

* use executor.complete to end trainer

* fix build

* fix build with distribute off

* fix typo

* fix cmake typo

* fix build
```
  34865f2d
11 Jun, 2018 11 commits
- fix build problem · 83a577e8
  Committed by qiaolongfei on Jun 11, 2018
  
  83a577e8
- add inplace attribute to op_proto_maker (#10665) · bfa3fd6f
  Committed by dzhwinter on Jun 11, 2018
```
* "add inplace attribute"

* "register inplace attribute"

* "change se-next model for memory-reuse"

* "fix typo"

* repick

* fix merge conflict

* "fix stupid error"
```
  bfa3fd6f
- polish (#11363) · 9087c668
  Committed by gongweibao on Jun 11, 2018
  
  9087c668
- replace use_event with use_cuda, because use_event means the program running... · aadaadf7
  Committed by chengduoZH on Jun 11, 2018
```
replace use_event with use_cuda, because use_event means the program is running with CUDA, so use_cuda may be more intuitive.
```
  aadaadf7
- Clean `sendop` `recv` operator. (#11309) · 627d7a64
  Committed by gongweibao on Jun 11, 2018
  
  627d7a64
- follow comments · 961fbce8
  Committed by chengduoZH on Jun 11, 2018
  
  961fbce8
- Add cpu test for parallel_executor_crf executor_fetch_feed, and enable these tests · 7b723839
  Committed by chengduoZH on Jun 11, 2018
  
  7b723839
- fix allReduce bug · d24e046c
  Committed by chengduoZH on Jun 11, 2018
  
  d24e046c
- add cpu test · a57e8a43
  Committed by chengduoZH on Jun 11, 2018
  
  a57e8a43
- add more debug string · 0485405b
  Committed by qiaolongfei on Jun 11, 2018
  
  0485405b
- Add comments to a singleton. (#11333) · 062d5a56
  Committed by gongweibao on Jun 11, 2018
  
  062d5a56
10 Jun, 2018 5 commits
- small fix · 1e731f59
  Committed by chengduoZH on Jun 10, 2018
  
  1e731f59
- ADD CPU_NUM · 495368c2
  Committed by chengduoZH on Jun 10, 2018
  
  495368c2
- nccl_all_reduce_op_handle => all_reduce_op_handle · 27073c28
  Committed by chengduoZH on Jun 10, 2018
  
  27073c28
- code refine · 2d94697a
  Committed by chengduoZH on Jun 10, 2018
  
  2d94697a
- fix in c++ side · 5a3c8bf8
  Committed by chengduoZH on Jun 09, 2018
  
  5a3c8bf8
08 Jun, 2018 6 commits
- add FLAGS_use_mkldnn to global control use_mkldnn · c6d230e0
  Committed by Luo Tao on Jun 08, 2018
  
  c6d230e0
- fix a small compile error on Mac · d745840a
  Committed by fengjiayi on Jun 08, 2018
  
  d745840a
- polish docs · 5be454bf
  Committed by yi.wu on Jun 08, 2018
  
  5be454bf
- add SSA graph checker · 0c851cab
  Committed by chengduoZH on Jun 08, 2018
  
  0c851cab
- refine logic · 1076e851
  Committed by chengduoZH on Jun 07, 2018
  
  1076e851
- fix dist train error (#11281) · 0aa9546e
  Committed by Yancey on Jun 08, 2018
```
* fix dist train error

* update by comment
```
  0aa9546e
07 Jun, 2018 6 commits

make scope thread safe · b8d315fb
Committed by tensor-tang on Jun 07, 2018

b8d315fb

split reduce op into multiple libraries, accelerate the compiling (#11029) · d48172f2

Committed by dzhwinter on Jun 07, 2018

* "split into multiple .ccl"

* "refine file structure"

* "refine files"

* "remove the cmakelist"

* "fix typo"

* "fix typo"

* fix ci

d48172f2

Big data op_test benchmark, for checking output consistent in different runs. (#10646) · f7c96f07

Committed by dzhwinter on Jun 07, 2018

* "init benchmark ops"

* "untrack outputs"

* "delete some unused code"

* "benchmark"

* "fix ci"

* "fix op test"

* "fix uint16 missing"

* "fix ci"

* "follow comments"

* "fix ci"

* "follow comments"

* "conflicts. merge develop branch"

* repick

* "merge develop branch"

f7c96f07

fix bugs in the implementation of 'HasInput' and 'HasOutput' · dc8e0b49
Committed by fengjiayi on Jun 07, 2018

dc8e0b49

Mkldnn layout (#11040) · 3ff9ba0e

Committed by mozga-intel on Jun 07, 2018

* Add MKLDNN layout support in Paddle

Add an MKLDNN layout to Paddle so that an MKLDNN-friendly memory layout
can be used in MKLDNN-enabled OP kernels. Before this commit, NCHW
was hardcoded in all MKLDNN op kernels. As a result, a
non-optimized execution path was selected in the MKLDNN primitives,
which gave worse performance.
Besides the framework change, three MKLDNN OP kernels were updated
to use the new MKLDNN layout: conv, pool2d, and batch_norm.
Other MKLDNN OP kernels also need to be updated in a similar way to
achieve the best performance.

* Add MKLDNN layout support in activation OP

* Don't populate layout from input to output when kMKLDNN in

* Refine pool mkldnn op kernel

* MKLDNN layout

* Remove the inheritance from tensor file

* MKLDNN layout: refactoring

* Remove additional #define to register new operator

* Prepare mkldnn tests to work with layout

3ff9ba0e
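
The commit message above contrasts plain NCHW with an MKLDNN-friendly layout but does not name a specific format. As an illustration only, the sketch below assumes a channel-blocked layout with blocks of 8 channels (the kind MKL-DNN calls nChw8c) and compares how a logical index (n, c, h, w) maps to a flat memory offset in each layout. The function names are hypothetical, and the sketch assumes the channel count is a multiple of the block size.

```python
def nchw_offset(n, c, h, w, C, H, W):
    """Flat offset of element (n, c, h, w) in a plain NCHW tensor."""
    return ((n * C + c) * H + h) * W + w


def blocked_offset(n, c, h, w, C, H, W, block=8):
    """Flat offset in an assumed channel-blocked layout (n, C/block, h, w, block).

    Assumes C is a multiple of `block`; no padding is handled here.
    """
    c_blocks = C // block
    return (((n * c_blocks + c // block) * H + h) * W + w) * block + c % block


# Shape chosen only for illustration.
N, C, H, W = 1, 16, 2, 2

# Distance in memory between channel 0 and channel 1 at the same pixel.
nchw_stride = nchw_offset(0, 1, 0, 0, C, H, W) - nchw_offset(0, 0, 0, 0, C, H, W)
blocked_stride = blocked_offset(0, 1, 0, 0, C, H, W) - blocked_offset(0, 0, 0, 0, C, H, W)
print(nchw_stride, blocked_stride)  # 4 1
```

In the blocked layout, neighbouring channels of the same pixel sit next to each other in memory, which is what lets a vectorized kernel load a group of channels at once.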

replace graph_builder_factory with ssa_graph_builder_factory · 8291b916
Committed by chengduoZH on Jun 07, 2018

8291b916

BaiXuePrincess / Paddle is up to date with the fork source project