Commits · 792443ef233c91458796305821ca351bc0ddbff7 · PaddlePaddle / PaddleDetection

08 May, 2019 · 7 commits

Refine elementwise kernel. (#16952) · 792443ef

Committed by zhaoyuchen2018 on May 08, 2019

* Refine elementwise kernel.

Add a simple cuda kernel if grad x and y both exist
Use 2D block cuda kernel to do broadcast.

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* refine code.

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* refine code.

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

792443ef

Optimize the cuda implementation of sum_op (#17283) · 6b84688b

Committed by Yiqun Liu on May 08, 2019

* Optimize the cuda implementation of sum_op, which add two lod_tensors inplace.
test=develop

* Use eigen to add to tensors.
test=develop

6b84688b

update assert (#17282) · db5e74ab
Committed by chengduo on May 08, 2019
```
test=develop
```
db5e74ab

Fix concat shape check (#17247) · c3195de5

Committed by Hongyu Liu on May 08, 2019

* fix shape_check; test=develop

* fix format; test=develop

* fix format; test=develop

* fix ddim bug; test=develop

* fix c++ format; test=develop

* change function name; test=develop

c3195de5

Fix bp of roi perspective transform op. (#17216) · 7d7e2995
Committed by whs on May 08, 2019

7d7e2995

Adding lrn op for ngraph engine (#17189) · 7bd1d03e

Committed by baojun on May 07, 2019

* added lrn op test=develop

* Added CreateConstant method test=develop

* avoid duplicates test=develop

7bd1d03e

Fix code in document. (#17237) · 91784f8e
Committed by gongweibao on May 08, 2019

91784f8e

07 May, 2019 · 7 commits

Enhance inplace/mem-opt pass and enhance softmax_with_cross_entropy op inplace (#17225) · 4f859408

Committed by Zeng Jinle on May 07, 2019

* add use_cuda to inplace pass,test=develop

* add test softmax_with_xe_inplace test,test=develop

* fix potential inplace bug
test=develop

* add more skip vars in mem opt pass,test=develop

* follow comment,test=develop

* follow comments,move duplicate out arg check to program->graph,test=develop

4f859408

update sofmax with axis arg test=develop (#17190) · e782b54b
Committed by baojun on May 07, 2019

e782b54b

Softmax_cross_entropy op add axis (#16806) · a71d8fdb

Committed by Kaipeng Deng on May 07, 2019

* add attr axis infershape. test=develop

* add CUDA kernel. test=develop

* fix unittest. test=develop

* fix unittest for soft_label. test=develop

* fix fp16 unittest. test=develop

* remove comment code. test=develop

* refine test for axis. test=develop

* add python api. test=develop

* fix doc. test=develop

* fix fp16 unittest. test=develop

* fix ngraph test. test=develop

* fix ENFORCE for test_imperative_transformer. test=develop

* fit for ngraph test. test=develop

* fix after rebase develop. test=develop

* fix doc. test=develop

* fix API.spec. test=develop

* fix test_layers. test=develop

* fix format. test=develop

a71d8fdb

Quant output scale (#17215) · a914d9b1

Committed by Zhen Wang on May 07, 2019

* Add MovingAverageAbsMaxScale operator which is only used for calculating the quantization scale.

* test=develop

* change the output into inplace. test=develop

* Revert "test=develop"

This reverts commit 696cf626.

* Revert "change the output into inplace. test=develop"

This reverts commit a19acd20.

* test=develop.

* update the MovingAverageAbsMaxScaleOp test. test=develop

a914d9b1
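
The commit above (a914d9b1) describes MovingAverageAbsMaxScale only as an operator "used for calculating the quantization scale". As a rough illustration, the sketch below shows one common way a moving-average abs-max scale can be tracked. The function name, the `moving_rate` parameter, and the `accum`/`state` update rule are assumptions made for this example; they are not taken from the commit and may not match the operator's actual kernel.

```python
def moving_average_abs_max_scale(batches, moving_rate=0.9):
    """Track a scale as a decayed average of each batch's max |x|.

    `batches` is an iterable of non-empty lists of floats.
    All names here are hypothetical, chosen for illustration only.
    """
    accum = 0.0  # decayed sum of per-batch abs-max values
    state = 0.0  # decayed count of batches, used to normalize accum
    scale = 0.0  # stays 0.0 if no batch is seen
    for batch in batches:
        abs_max = max(abs(v) for v in batch)
        accum = moving_rate * accum + abs_max
        state = moving_rate * state + 1.0
        scale = accum / state
    return scale
```

Normalizing by `state` rather than using a plain exponential average keeps the first few batches from being biased toward the zero initial value.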

optimize sum op (#16820) · 32b62c25

Committed by zhaoyuchen2018 on May 07, 2019

* optimize sum op

fuse multi eigen kernel calls into one cuda kernel.
refine code

test=develop.
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine code.

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine code according to comments.

test=develop

* refine code

delete sum_op_gpu.h
test=develop

* Fix test error.

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* refine code in format.

test=develop.

* refine code

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* refine code

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

32b62c25

Cherry-pick benchmark related changes from release/1.4 (#17156) · a72dbe9a

Committed by 石晓伟 on May 07, 2019

* cherry-pick commit from 88770542

* cherry-pick commit from 3f0b97df

* cherry-pick from 16691:Anakin subgraph support yolo_v3 and faster-rcnn

(cherry picked from commit 8643dbc2)

* Cherry-Pick from 16662 : Anakin subgraph cpu support

(cherry picked from commit 7ad182e1)

* Cherry-pick from 1662, 16797.. : add anakin int8 support

(cherry picked from commit e14ab180)

* Cherry-pick from 16813 : change singleton to graph RegistBlock
test=release/1.4

(cherry picked from commit 4b9fa423)

* Cherry Pick : 16837 Support ShuffleNet and MobileNet-v2

Support ShuffleNet and MobileNet-v2, test=release/1.4

(cherry picked from commit a6fb066f)

* Cherry-pick : anakin subgraph add opt config layout argument #16846
test=release/1.4

(cherry picked from commit 8121b3ec)

* 1. add shuffle_channel_detect

(cherry picked from commit 6efdea89)

* update shuffle_channel op convert, test=release/1.4

(cherry picked from commit e4726a06)

* Modify symbol export rules

test=develop

a72dbe9a

Refine api doc (#17230) · ef66baed
Committed by jerrywgz on May 07, 2019
```
* refine api comment, test=develop
```
ef66baed

06 May, 2019 · 2 commits
- fix distribute fpn proposals, test=develop (#16152) · cc95a751
  Committed by jerrywgz on May 06, 2019
```
* fix distribute fpn proposals, test=develop
```
  cc95a751
- Add use_cuda to inplace pass (#17205) · ee2028a1
  Committed by Zeng Jinle on May 05, 2019
```
* add use_cuda to inplace pass,test=develop

* add test softmax_with_xe_inplace test,test=develop
```
  ee2028a1
05 May, 2019 · 1 commit
- Enhance concat op to support empty input. (#17015) · a72907bb
  Committed by jerrywgz on May 05, 2019
```
* enhance_concat, test=develop
```
  a72907bb
30 Apr, 2019 · 4 commits
- Rewrite inplace pass and fix gc bug (#17126) · 4e1bc6e8
  Committed by Zeng Jinle on Apr 29, 2019
```
* fix op graph view
test=develop

* rewrite inplace pass and fix reference count pass bug
test=develop

* fix unittest failed
test=develop

* follow comments, test=develop
```
  4e1bc6e8
- fix reader default stream,test=develop (#17106) · 08773b60
  Committed by Zeng Jinle on Apr 29, 2019
  
  08773b60
- polish the label_smooth (#17138) · bc48453b
  Committed by xiaoting on Apr 30, 2019
```
* polish the label_smooth

test=develop

* polish code

test=develop
```
  bc48453b
- fix assertion failure issue when test_analyzer_bert uses ngraph (#17148) · bf4b21fa
  Committed by Leo Zhao on Apr 30, 2019
```
resolve #17147
test=develop
```
  bf4b21fa
29 Apr, 2019 · 1 commit
- cvm op feature (#17081) · deb510d4
  Committed by tangwei12 on Apr 29, 2019
```
cvm without LoD.
```
  deb510d4
28 Apr, 2019 · 2 commits

Refine dropout gpu memory (#17095) · 28d69d71

Committed by Zeng Jinle on Apr 28, 2019

* refine_dropout_mem,test=develop

* # This is a combination of 14 commits.
# The first commit's message is:
remove ut test_dist_word2vec in mac ci, will fix it in private, test=develop (#17066)

# This is the 2nd commit message:

Fleet unify distributed training (#16791)

* implement distributed transpiler with fleet
# This is the 3rd commit message:

ParallelDyGraph with GPU collective mode (#16827)

implement dygraph.parallel.DataParallel to hook reduce op.

# This is the 4th commit message:

Init mixed precision training interface (#16856)

* Init mixed precision training interface

* Add fp16 test script

test=develop

* All initializers support float16

test=develop

* Code cleanup & add more code annotations

test=develop

* Update API spec

test=develop

* Add usage example in doc

test=develop

# This is the 5th commit message:

fix reference_count_pass,test=develop (#17060)

test=develop
# This is the 6th commit message:

Speedup roi_perspective_transform op by caching the information of linear interpolation in forward (#17090)

* Cache the information of linear interpolation in forward and use it in backward.
test=develop

* Fix cuda kernel.
test=develop

# This is the 7th commit message:

remove unnecessary prepare_data (#17080)

test=develop
# This is the 8th commit message:

fix interpolate cu. test=develop (#17101)

# This is the 9th commit message:

test=develop, double backward leaky_relu (#17067)

backward of backward: leaky_relu
# This is the 10th commit message:

fix fuse optimizer ops (#17102)

test=develop
# This is the 11th commit message:

truncated_gaussian_random supported in distributed training, test=develop (#17091)

# This is the 12th commit message:

 Detailed coordinate description for yolov3 loss (#17007)

* Detailed coordinate description for yolov3 loss

test=develop

* modified api.spec

test=develop

* modified loss name

* fix api.spec

test=develop

* polish description

test=develop

* modified api.spec

test=develop

# This is the 13th commit message:

fix test_weight_decay (#17109)

test=develop
# This is the 14th commit message:

Path flag (#17105)

* fix python/paddle/fluid/__init__.py detecting problems

28d69d71

Use CudnnWorkspaceHandle in exhaustive search (#17082) · b9494058

Committed by Huihuang Zheng on Apr 28, 2019

1. Use CudnnWorkspaceHandle in exhaustive search of conv_cudnn.
2. For Ops using CudnnWorkspaceHandle in exhaustive search, release their GPU memory after exhaustive search.

test=develop

b9494058

26 Apr, 2019 · 3 commits
- Detailed coordinate description for yolov3 loss (#17007) · 7da7881c
  Committed by xiaoting on Apr 26, 2019
```
* Detailed coordinate description for yolov3 loss

test=develop

* modified api.spec

test=develop

* modified loss name

* fix api.spec

test=develop

* polish description

test=develop

* modified api.spec

test=develop
```
  7da7881c
- test=develop, double backward leaky_relu (#17067) · 258e000b
  Committed by ceci3 on Apr 26, 2019
```
backward of backward: leaky_relu
```
  258e000b
- fix interpolate cu. test=develop (#17101) · 10c487eb
  Committed by Kaipeng Deng on Apr 26, 2019
  
  10c487eb
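
Commit 258e000b in the list above is summarized only as "backward of backward: leaky_relu". The sketch below illustrates what a double-backward rule for leaky_relu can look like on scalars. The function names and the default `alpha` are hypothetical and are not taken from the commit; the math relies only on leaky_relu being piecewise linear.

```python
def leaky_relu(x, alpha=0.02):
    # Forward: identity for positive inputs, a small slope otherwise.
    return x if x > 0 else alpha * x


def leaky_relu_grad(x, dout, alpha=0.02):
    # First-order backward: dx = dout * f'(x), with f'(x) in {1, alpha}.
    return dout * (1.0 if x > 0 else alpha)


def leaky_relu_double_grad(x, ddx, alpha=0.02):
    # Backward of backward: dx is linear in dout with slope f'(x),
    # so a perturbation ddx flowing into dx maps to ddout = ddx * f'(x).
    # f''(x) is zero away from x = 0, so no extra term flows back to x.
    return ddx * (1.0 if x > 0 else alpha)
```

For example, `leaky_relu_double_grad(-2.0, 3.0, alpha=0.5)` applies the negative-side slope and scales `ddx` by `alpha`.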
25 Apr, 2019 · 2 commits

Speedup roi_perspective_transform op by caching the information of linear... · 55ce36e9

Committed by whs on Apr 25, 2019

Speedup roi_perspective_transform op by caching the information of linear interpolation in forward (#17090)

* Cache the information of linear interpolation in forward and use it in backward.
test=develop

* Fix cuda kernel.
test=develop

55ce36e9

ParallelDyGraph with GPU collective mode (#16827) · 0b07eef1
Committed by Yan Xu on Apr 25, 2019
```
implement dygraph.parallel.DataParallel to hook reduce op.
```
0b07eef1

23 Apr, 2019 · 2 commits

Make conv cudnn workspace size configurable (#17036) · 0c335dcd
Committed by Zeng Jinle on Apr 23, 2019
```
* make_conv_cudnn_ws_size_configurable, test=develop

* change std::max to std::min
test=develop
```
0c335dcd

Support backward of backward for Relu and add a new gradient checker by... · c1c2633a

Committed by qingqing01 on Apr 23, 2019

Support backward of backward for Relu and add a new gradient checker by comparing theoretical and numerical Jacobian. (#16862)

* Support backward of backward and a new gradient checker
* Rename decorators.py to decorator_helper.py, since Python on Windows CI has decorators package.

1. Add ReluDoubleGradMaker when register relu_grad.
2. Add a new gradient checker by comparing theoretical and numerical Jacobian.  Check double gradients by double_grad_check.

c1c2633a
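
The commit above (c1c2633a) mentions a gradient checker that compares theoretical and numerical Jacobians. The following is a minimal pure-Python sketch of that general idea for ReLU, using central differences. All function names, the step `eps`, and the tolerance `atol` are assumptions for illustration; this is not the checker added by the commit.

```python
def relu(xs):
    return [max(0.0, x) for x in xs]


def analytic_jacobian_relu(xs):
    # ReLU acts elementwise, so its Jacobian is diagonal:
    # 1 where the input is positive, 0 elsewhere.
    n = len(xs)
    return [[(1.0 if xs[i] > 0 else 0.0) if i == j else 0.0
             for j in range(n)] for i in range(n)]


def numerical_jacobian(f, xs, eps=1e-6):
    # Central differences: perturb one input at a time and record
    # how every output changes.
    n, m = len(xs), len(f(xs))
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus, minus = list(xs), list(xs)
        plus[j] += eps
        minus[j] -= eps
        f_plus, f_minus = f(plus), f(minus)
        for i in range(m):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * eps)
    return jac


def jacobians_close(a, b, atol=1e-4):
    # Entry-by-entry comparison; assumes both matrices have the same shape.
    return all(abs(x - y) <= atol
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))
```

The test inputs should stay away from 0, where ReLU is not differentiable and the finite difference would not agree with either one-sided slope.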

22 Apr, 2019 · 4 commits
- fix bug in save, test=develop · 45136b1b
  Committed by tangwei12 on Apr 22, 2019
  
  45136b1b
- fix potential hung in generate proposals, test=develop · b2df6de8
  Committed by jerrywgz on Apr 22, 2019
  
  b2df6de8
- Speed unit testing. (#16978) · ea42e431
  Committed by qingqing01 on Apr 22, 2019
```
* Speed affine_channel_op unit testing
* Add check in tensor_py
* Fix ONLY_CPU Compiling
```
  ea42e431
- enhance generate proposal labels, test=develop · d3a66fc6
  Committed by jerrywgz on Apr 22, 2019
  
  d3a66fc6
21 Apr, 2019 · 1 commit

Refine model gpu memory (#16993) · 1202d3fc

Committed by Zeng Jinle on Apr 21, 2019

* speedup gc and inplace softmax_with_cross_entropy_grad
test=develop

* refine models gpu mem
Merge skip vars and warning messages of mem opt
remove relu mem opt
test=develop

* follow comments
test=develop

1202d3fc

20 Apr, 2019 · 1 commit

Support seq len equal to 0 in sequence ops (#16935) · 3c375751

Committed by Yibing Liu on Apr 20, 2019

* Support seq len equal to 0 in sequence ops

test=develop

* Add more test cases

* Fix some comments

test=develop

* Fix py3 error

test=develop

3c375751

19 Apr, 2019 · 1 commit

Check some shapes only in runtime (#16919) · 36c05d36

Committed by Yibing Liu on Apr 19, 2019

* Check some shapes only in runtime

test=develop

* Follow review comments

test=develop

* Update API spec

36c05d36

18 Apr, 2019 · 1 commit
- Polish DGC code (#16818) · cbdb8a17
  Committed by gongweibao on Apr 18, 2019
  
  cbdb8a17
17 Apr, 2019 · 1 commit
- fix sampling id op bug (#16909) · 2b61db07
  Committed by tangwei12 on Apr 17, 2019
```
* fix sampling id op bug, test=develop
```
  2b61db07

PaddlePaddle / PaddleDetection · last synced successfully about 1 year ago