Commits · a914d9b116af76142a20059f91068ed9c4835f57 · 机器未来 / Paddle

07 May, 2019 7 commits

Committed by Zhen Wang on May 07, 2019

* Add MovingAverageAbsMaxScale operator which is only used for calculating the quantization scale.

* test=develop

* change the output into inplace. test=develop

* Revert "test=develop"

This reverts commit 696cf62699ba1e1c98f61f7345ac7060010eb29a.

* Revert "change the output into inplace. test=develop"

This reverts commit a19acd20f07eee82622701a3015e6e9c073a5e0b.

* test=develop.

* update the MovingAverageAbsMaxScaleOp test. test=develop

a914d9b1
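
As a rough illustration of what a moving-average abs-max scale computes, the sketch below keeps a decayed sum of per-batch absolute maxima and a decayed batch count, and reports their ratio as the scale. The class name, the `moving_rate` default, and the accumulator/state formulation are assumptions made for illustration; the commit message does not spell out the operator's exact formula.

```python
class MovingAverageAbsMaxScale:
    """Hypothetical helper (not Paddle's operator): smoothed abs-max tracker."""

    def __init__(self, moving_rate=0.9):
        self.moving_rate = moving_rate  # assumed decay factor
        self.accum = 0.0  # decayed sum of per-batch abs-max values
        self.state = 0.0  # decayed count of batches seen so far

    def update(self, batch):
        """Fold one batch of values into the running scale and return it."""
        abs_max = max(abs(v) for v in batch)
        self.accum = self.moving_rate * self.accum + abs_max
        self.state = self.moving_rate * self.state + 1.0
        return self.accum / self.state


tracker = MovingAverageAbsMaxScale(moving_rate=0.9)
first_scale = tracker.update([1.0, -2.0])
second_scale = tracker.update([0.5, -4.0])
```

Only the scale is produced here; the input values are not quantized, which matches the commit's description of an operator used solely for calculating the quantization scale.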

optimize sum op (#16820) · 32b62c25

Committed by zhaoyuchen2018 on May 07, 2019

* optimize sum op

fuse multi eigen kernel calls into one cuda kernel.
refine code

test=develop.
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine code.

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine code according to comments.

test=develop

* refine code

delete sum_op_gpu.h
test=develop

* Fix test error.

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* refine code in format.

test=develop.

* refine code

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* refine code

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

32b62c25
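
The idea of fusing several per-input calls into one pass can be sketched in plain Python. This is not CUDA and not the code from this commit; both function names are hypothetical. The first version walks the whole output once per input, roughly like launching one kernel per addition, while the second reads every input for a given element in a single pass.

```python
def sum_pairwise(inputs):
    """One full pass over the output per input (one 'launch' per addition)."""
    out = list(inputs[0])
    for tensor in inputs[1:]:
        out = [a + b for a, b in zip(out, tensor)]
    return out


def sum_fused(inputs):
    """Single pass: each output element reads all inputs once."""
    return [sum(column) for column in zip(*inputs)]


inputs = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
pairwise_result = sum_pairwise(inputs)
fused_result = sum_fused(inputs)
```

Both functions return the same values; the fused form simply avoids re-reading and re-writing the intermediate output for every input.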

Cherry-pick benchmark related changes from release/1.4 (#17156) · a72dbe9a

Committed by 石晓伟 on May 07, 2019

* cherry-pick commit from 88770542

* cherry-pick commit from 3f0b97df

* cherry-pick from 16691:Anakin subgraph support yolo_v3 and faster-rcnn

(cherry picked from commit 8643dbc2)

* Cherry-Pick from 16662 : Anakin subgraph cpu support

(cherry picked from commit 7ad182e1)

* Cherry-pick from 1662, 16797.. : add anakin int8 support

(cherry picked from commit e14ab180)

* Cherry-pick from 16813 : change singleton to graph RegistBlock
test=release/1.4

(cherry picked from commit 4b9fa423)

* Cherry Pick : 16837 Support ShuffleNet and MobileNet-v2

Support ShuffleNet and MobileNet-v2, test=release/1.4

(cherry picked from commit a6fb066f)

* Cherry-pick : anakin subgraph add opt config layout argument #16846
test=release/1.4

(cherry picked from commit 8121b3ec)

* 1. add shuffle_channel_detect

(cherry picked from commit 6efdea89)

* update shuffle_channel op convert, test=release/1.4

(cherry picked from commit e4726a06)

* Modify symbol export rules

test=develop

a72dbe9a

fix api_example of tree_conv (#17239) · 16922e00
Committed by Tao Luo on May 07, 2019
```
test=develop
```
16922e00
Refine api doc (#17230) · ef66baed
Committed by jerrywgz on May 07, 2019
```
* refine api comment, test=develop
```
ef66baed

call SetNumThreads everytime to avoid missing omp thread setting (#17224) · 54636a19

Committed by Leo Zhao on May 07, 2019

* call SetNumThreads everytime to avoid missing omp thread setting

resolve #17153
test=develop

* add paddle_num_threads into config for test_analyzer_pyramid_dnn

resolve #17153
test=develop

54636a19

Fix some APIs' example (#17214) · 6b0f27e8
Committed by Yibing Liu on May 07, 2019

6b0f27e8

06 May, 2019 6 commits
- Fix unexecutable API examples (#17218) · 5817077c
  Committed by ruri on May 06, 2019
```
* fix unexecutable API comments, test=develop

* add API.spec,test=develop
```
  5817077c
- fix distribute fpn proposals, test=develop (#16152) · cc95a751
  Committed by jerrywgz on May 06, 2019
```
* fix distribute fpn proposals, test=develop
```
  cc95a751
- fix profiler and name_scope API examples (#17212) · 9ec4615d
  Committed by Tao Luo on May 06, 2019
```
* fix profiler and name_scope API examples

test=develop

* update API.spec

test=develop
```
  9ec4615d
- Fix tensor_py.h (#17195) · c5eeecca
  Committed by Zeng Jinle on May 06, 2019
```
* fix tensor_py,test=develop

* change class name,test=develop
```
  c5eeecca
- Add use_cuda to inplace pass (#17205) · ee2028a1
  Committed by Zeng Jinle on May 05, 2019
```
* add use_cuda to inplace pass,test=develop

* add test softmax_with_xe_inplace test,test=develop
```
  ee2028a1
- It doesn't need sync when fetch_list is not empty (#17201) · 950aec55
  Committed by chengduo on May 06, 2019
```
test=develop
```
  950aec55
05 May, 2019 3 commits
- Enhance concat op to support empty input. (#17015) · a72907bb
  Committed by jerrywgz on May 05, 2019
```
* enhance_concat, test=develop
```
  a72907bb
- use two GPUs to run the exclusive test test=develop (#17187) · 83c4f772
  Committed by wopeizl on May 05, 2019
  
  83c4f772
- Remove unnecessary set_devices (#17158) · 3c6ab799
  Committed by chengduo on May 05, 2019
```
* remove unnecessary set_devices
```
  3c6ab799
01 May, 2019 1 commit

remove async executor python api to fix document (#17174) · f938ccec

Committed by guru4elephant on May 01, 2019

* remove async executor python api
test=develop

* remove test_async_executor.py
add executor train_from_dataset demo
test=develop

* fix import bug
test=develop

f938ccec

30 Apr, 2019 7 commits
- Fix mem leak when converting Tensor to numpy array (#17182) · 5dfe2ab9
  Committed by Zeng Jinle on Apr 30, 2019
```
* fix mem leak when converting Tensor to numpy array
test=develop

* remove unused unittest,test=develop

* follow comments, test=develop

* fix dygraph bug,test=develop
```
  5dfe2ab9
- Fix a typo in gpu_info.cc (#17175) · e4a53324
  Committed by Huihuang Zheng on Apr 30, 2019
```
test=develop
```
  e4a53324
- fix bn fuse vardesc and add model saver (#17143) · 79ed1c76
  Committed by tensor-tang on Apr 30, 2019
```
* fix bn fuse vardesc and add model saver

test=develop

* unify save model in test helper

test=develop

* fix mkdir on windows

test=develop

* remove magic number use bn bias var desc

test=develop
```
  79ed1c76
- Rewrite inplace pass and fix gc bug (#17126) · 4e1bc6e8
  Committed by Zeng Jinle on Apr 29, 2019
```
* fix op graph view
test=develop

* rewrite inplace pass and fix reference count pass bug
test=develop

* fix unittest failed
test=develop

* follow comments, test=develop
```
  4e1bc6e8
- fix reader default stream,test=develop (#17106) · 08773b60
  Committed by Zeng Jinle on Apr 29, 2019
  
  08773b60
- polish the label_smooth (#17138) · bc48453b
  Committed by xiaoting on Apr 30, 2019
```
* polish the label_smooth

test=develop

* polish code

test=develop
```
  bc48453b
- fix assertion failure issue when test_analyzer_bert uses ngraph (#17148) · bf4b21fa
  Committed by Leo Zhao on Apr 30, 2019
```
resolve #17147
test=develop
```
  bf4b21fa
29 Apr, 2019 3 commits
- cvm op feature (#17081) · deb510d4
  Committed by tangwei12 on Apr 29, 2019
```
cvm without LoD.
```
  deb510d4
- 1. move the API check into CPU process (#17110) · 3acb3635
  Committed by wopeizl on Apr 29, 2019
```
* 1. move the API check into CPU process
2. adjust the check order
```
  3acb3635
- Supplementary monitoring file reason explanation (#17131) · 92ce4452
  Committed by tianshuo78520a on Apr 29, 2019
  
  92ce4452
28 Apr, 2019 2 commits

Refine dropout gpu memory (#17095) · 28d69d71

Committed by Zeng Jinle on Apr 28, 2019

* refine_dropout_mem,test=develop

* # This is a combination of 14 commits.
# The first commit's message is:
remove ut test_dist_word2vec in mac ci, will fix it in private, test=develop (#17066)

# This is the 2nd commit message:

Fleet unify distributed training (#16791)

* implement distributed transpiler with fleet
# This is the 3rd commit message:

ParallelDyGraph with GPU collective mode (#16827)

implement dygraph.parallel.DataParallel to hook reduce op.

# This is the 4th commit message:

Init mixed precision training interface (#16856)

* Init mixed precision training interface

* Add fp16 test script

test=develop

* All initializers support float16

test=develop

* Code cleanup & add more code annotations

test=develop

* Update API spec

test=develop

* Add usage example in doc

test=develop

# This is the 5th commit message:

fix reference_count_pass,test=develop (#17060)

test=develop
# This is the 6th commit message:

Speedup roi_perspective_transform op by caching the information of linear interpolation in forward (#17090)

* Cache the information of linear interpolation in forward and use it in backward.
test=develop

* Fix cuda kernel.
test=develop

# This is the 7th commit message:

remove unnecessary prepare_data (#17080)

test=develop
# This is the 8th commit message:

fix interpolate cu. test=develop (#17101)

# This is the 9th commit message:

test=develop, double backward leaky_relu (#17067)

backward of backward: leaky_relu
# This is the 10th commit message:

fix fuse optimizer ops (#17102)

test=develop
# This is the 11th commit message:

truncated_gaussian_random supported in distributed training, test=develop (#17091)

# This is the 12th commit message:

 Detailed coordinate description for yolov3 loss (#17007)

* Detailed coordinate description for yolov3 loss

test=develop

* modified api.spec

test=develop

* modified loss name

* fix api.spec

test=develop

* polish description

test=develop

* modified api.spec

test=develop

# This is the 13th commit message:

fix test_weight_decay (#17109)

test=develop
# This is the 14th commit message:

Path flag (#17105)

* fix python/paddle/fluid/__init__.py detecting problems

28d69d71

Use CudnnWorkspaceHandle in exhaustive search (#17082) · b9494058

Committed by Huihuang Zheng on Apr 28, 2019

1. Use CudnnWorkspaceHandle in exhaustive search of conv_cudnn.
2. For Ops using CudnnWorkspaceHandle in exhaustive search, release their GPU memory after exhaustive search.

test=develop

b9494058

27 Apr, 2019 1 commit
- Path flag (#17105) · 2192e7bb
  Committed by tianshuo78520a on Apr 27, 2019
```
* fix python/paddle/fluid/__init__.py detecting problems
```
  2192e7bb
26 Apr, 2019 5 commits
- Detailed coordinate description for yolov3 loss (#17007) · 7da7881c
  Committed by xiaoting on Apr 26, 2019
```
* Detailed coordinate description for yolov3 loss

test=develop

* modified api.spec

test=develop

* modified loss name

* fix api.spec

test=develop

* polish description

test=develop

* modified api.spec

test=develop
```
  7da7881c
- fix fuse optimizer ops (#17102) · 794a1958
  Committed by chengduo on Apr 26, 2019
```
test=develop
```
  794a1958
- test=develop, double backward leaky_relu (#17067) · 258e000b
  Committed by ceci3 on Apr 26, 2019
```
backward of backward: leaky_relu
```
  258e000b
- fix interpolate cu. test=develop (#17101) · 10c487eb
  Committed by Kaipeng Deng on Apr 26, 2019
  
  10c487eb
- remove unnecessary prepare_data (#17080) · aca60e9a
  Committed by Tao Luo on Apr 26, 2019
```
test=develop
```
  aca60e9a
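
For the leaky_relu double-backward commit (258e000b) listed above, the "backward of backward" can be sketched in plain Python. This illustrates the math only and is not the code from that commit; the function names and the `alpha` default are hypothetical. Because the first-order gradient `dx` is linear in `dout`, with slope 1 where `x > 0` and `alpha` elsewhere, the second-order output gradient applies the same slope to `ddx`.

```python
def leaky_relu(x, alpha=0.02):
    """Forward: identity for positive inputs, slope alpha otherwise."""
    return [v if v > 0 else alpha * v for v in x]


def leaky_relu_grad(x, dout, alpha=0.02):
    """Backward: dx = dout * slope(x)."""
    return [g if v > 0 else alpha * g for v, g in zip(x, dout)]


def leaky_relu_double_grad(x, ddx, alpha=0.02):
    """Backward of backward: ddout = ddx * slope(x), the same mask as dx."""
    return [d if v > 0 else alpha * d for v, d in zip(x, ddx)]


x = [-2.0, 3.0]
out = leaky_relu(x, alpha=0.5)
dx = leaky_relu_grad(x, [1.0, 1.0], alpha=0.5)
ddout = leaky_relu_double_grad(x, [4.0, 4.0], alpha=0.5)
```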
25 Apr, 2019 4 commits

Speedup roi_perspective_transform op by caching the information of linear... · 55ce36e9

Committed by whs on Apr 25, 2019

Speedup roi_perspective_transform op by caching the information of linear interpolation in forward (#17090)

* Cache the information of linear interpolation in forward and use it in backward.
test=develop

* Fix cuda kernel.
test=develop

55ce36e9

fix reference_count_pass,test=develop (#17060) · 842ded14
Committed by Zeng Jinle on Apr 25, 2019
```
test=develop
```
842ded14

Init mixed precision training interface (#16856) · beda7825

Committed by Yibing Liu on Apr 25, 2019

* Init mixed precision training interface

* Add fp16 test script

test=develop

* All initializers support float16

test=develop

* Code cleanup & add more code annotations

test=develop

* Update API spec

test=develop

* Add usage example in doc

test=develop

beda7825

ParallelDyGraph with GPU collective mode (#16827) · 0b07eef1
Committed by Yan Xu on Apr 25, 2019
```
implement dygraph.parallel.DataParallel to hook reduce op.
```
0b07eef1

24 Apr, 2019 1 commit

specify the cuda arch name and bin to decrease the compile time for i… (#17020) · f5d6937f

Committed by wopeizl on Apr 24, 2019

1. specify the cuda arch name and bin to decrease the compile time for inference test=develop
2. simplify the script and add comments
3. remove the fluid process from cicheck

f5d6937f

机器未来 / Paddle (in sync with the fork source project)