Commits · c670058a8dddd2db49dd49b2b4138b1e2b63d5f9 · Crayon鑫 / Paddle

Sep 25, 2019 · 6 commits

add support of matmul with multiple head even different width and height (#19708) · c670058a

Committed by Bob Zhu on Sep 25, 2019

* add support of matmul with multiple head even different width and height

The original matmul with multiple heads supports only mat_a.width == mat_b.height;
in that case, mat_b is split horizontally. This patch extends the
support to the case where mat_a.width != mat_b.height but mat_a.width / head_number == mat_b.height;
in this case, mat_b is split vertically.

For example, let A be [3, 8], B be [2, 16], and head_number be 4. In this
case, A is split into 4 blocks of [3, 2] and B is (vertically) split into 4 blocks of
[2, 4]. The final result is 4 matrices of [3, 4], i.e. [3, 16].

test=develop
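
The shapes in the example above can be checked with a small pure-Python sketch. This is only an illustration, not Paddle's kernel: the helper names (`matmul`, `split_columns`, `multihead_matmul`) are hypothetical, and it assumes the per-head products are concatenated column-wise in head order.

```python
def matmul(a, b):
    """Plain row-by-column product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def split_columns(m, parts):
    """Split a matrix into `parts` column blocks of equal width."""
    width = len(m[0]) // parts
    return [[row[i * width:(i + 1) * width] for row in m]
            for i in range(parts)]


def multihead_matmul(a, b, head_number):
    """Illustrative sketch of the mat_a.width / head_number == mat_b.height case."""
    a_width, b_height, b_width = len(a[0]), len(b), len(b[0])
    if a_width % head_number != 0 or a_width // head_number != b_height:
        raise ValueError("expected mat_a.width / head_number == mat_b.height")
    if b_width % head_number != 0:
        raise ValueError("mat_b.width must be divisible by head_number")

    a_heads = split_columns(a, head_number)  # e.g. 4 blocks of [3, 2]
    b_heads = split_columns(b, head_number)  # e.g. 4 blocks of [2, 4]
    products = [matmul(x, y) for x, y in zip(a_heads, b_heads)]

    # Assumption: head outputs are laid side by side, head 0 first.
    return [sum((p[r] for p in products), []) for r in range(len(a))]


if __name__ == "__main__":
    A = [[1] * 8 for _ in range(3)]    # shape [3, 8]
    B = [[1] * 16 for _ in range(2)]   # shape [2, 16]
    out = multihead_matmul(A, B, 4)
    print(len(out), len(out[0]))       # rows and columns of the result
```

With A of shape [3, 8], B of shape [2, 16] and head_number 4, the sketch yields a result with 3 rows and 16 columns, matching the [3, 16] shape stated in the commit message.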

* refactor the code of matmul with multiple head even different width and height

test=develop

c670058a

refine ctc align op with padding (#19926) · 6884dc80
Committed by Liufang Sang on Sep 25, 2019
```
* refine ctc align op with padding 
* refine api sample code
```
6884dc80

Fix C++ inference BUG: When open memory optim and enable trt subgraph at the... · e89b1288

Committed by Zhaolong Xing on Sep 25, 2019

Fix C++ inference BUG: when memory optim and the trt subgraph are enabled at the same time, there is a bug (#19969)

* fix memory optimization type
test=develop

* 1. fix BUG: open trt and memory optim will trigger bug.
2. Clean memory optim bug.
test=develop

e89b1288

Add support for new QAT models (#18970) · 4286a627

Committed by Wojciech Uss on Sep 25, 2019

* Add support for new QAT models

test=develop
Co-Authored-By: Michał Gallus <michal.gallus@intel.com>
Co-Authored-By: Wojciech Uss <wojciech.uss@intel.com>

* fixed fps results

test=develop

* fix top5 accuracy drop problem

* updated for new QAT models

* skip quantizing average pooling - dirty but working

* add missing pass

* added missing conv+brelu fuse pass

* removed a call to non-existent pass

test=develop

* renamed pass

test=develop

* Adjust finding pooling scale to newest QAT models

* Remove unnecessary code from quantization_mkldnn_pass

* Copy Pooling input scale to output scale in QAT

* Refactor & remove unused code in QAT

* Incorporate fp32 FC into QAT

test=develop

* Enable graph drawing with debug flag

test=develop

* Add tests for QATv2

* Fix paths for QATv2 models

test=develop

* Add option to save transformed int8 qat model

test=develop

* Remove redundant lines from qat mkldnn pass

test=develop

* Delegate disablement of avg pooling to qat

test=develop

* fix CI bug, test=develop

* Follow Wangzhen's Review, test=develop

* Update API.spec

test=develop

* Name False in (is_unsigned, TensorScale) tuple

test=develop

4286a627

Removing length dims constraints of seq_pad and seq_unpad (#19497) · 99a9615a

Committed by Aurelius84 on Sep 25, 2019

* Removing last dims constraints of seq_pad and seq_unpad test=develop

* fix test_layer api code test=develop

* fix sequence_pad_op.cc conflict test=develop

* remove test_analyzer_mm_dnn test=develop

* fix vectorize bug test=develop

* fix vectorize<int> test=develop

99a9615a

polish multi process warning info (#19961) · cca26f5c
Committed by chengduo on Sep 25, 2019
```
test=develop
```
cca26f5c

Sep 24, 2019 · 14 commits

update en document of shard_index_op (#19963) · 2efdf0ef
Committed by Yi Liu on Sep 24, 2019
```
test=develop
test=document_fix
```
2efdf0ef

add optimizer:dpsgd,test=develop (#19915) · 766bd529
Committed by jhjiangcs on Sep 24, 2019

766bd529

fix cuda dev_ctx allocator cmake deps, test=develop (#19953) · 37f76407
Committed by Zeng Jinle on Sep 24, 2019

37f76407

Add float16 support to `sync_batch_norm_op` (#19681) · ebff68fa

Committed by Yang Zhang on Sep 24, 2019

* Add float16 support to `sync_batch_norm_op`

test=develop

* Add test for sync_bn with FP16 input

test=develop

ebff68fa

Remove constraint that last dimension is forced to be 1 by adding lookup_table_v2 (#19735) · 039b9710

Committed by Aurelius84 on Sep 24, 2019

* Remove constraint that last dimension is forced to be 1 by add
lookup_table_v2 test=develop

* modify into PADDLE_ENFORCE_CUDA_SUCCESS test=develop

* Revert "modify into PADDLE_ENFORCE_CUDA_SUCCESS test=develop"

This reverts commit 8a960bfc61e51aa27c3c529df8fb90b93ebd19f9.

* move api into fluid.embedding test=develop

* fix example code test=develop

* move one_hot into fluid.one_hot

* modify api.spec test=develop

* fix loss shape test=develop

039b9710

fix allocator ut,test=develop (#19945) · 80e0f547
Committed by Zeng Jinle on Sep 24, 2019

80e0f547

[PaddleSlim] Enhance compressor api in PaddleSlim (#19894) · bdb3e376

Committed by whs on Sep 24, 2019

1. Support a custom eval function instead of an eval program.
2. Fix loading checkpoint in quantization strategy.
3. Support saving eval model when saving a checkpoint.
4. Fix decoder of loading context in PaddleSlim.
5. Fix restoring from the checkpoint of uniform prune strategy.
6. Support saving eval model and infer model during training.
7. Add unit tests for saving eval model, saving infer model and uniform pruning restoring from the checkpoint.
8. Fix pruning of depthwise_conv_grad op by updating the groups.

bdb3e376

support change shuffle and train thread num (#19841) · cedc0477

Committed by xujiaqi01 on Sep 24, 2019

* support change shuffle thread num
* support change train thread num
* fix receive shuffle data of each channel
* data norm stop gradient
* add check thread_tensor type and root_tensor type when merge metric
* remove sleep in shuffle, add config
* add config of pslib client to client communication
* fix xbox str
* add data norm op testcase
* add flush in trainer finalize

cedc0477

add elementwise mod support float/double. test=develop (#19570) · 14625ffe
Committed by Kaipeng Deng on Sep 24, 2019

14625ffe

- ReImplemented pooling fwd mkldnn (#19911) · 5b07ca9c

Committed by Jacek Czaja on Sep 24, 2019

- First implementation of BWD and FWD of pooling mkl-dnn

- Compilation fix

- Fix

- Fix

- Fix

- Fix to crash

- Compilation fix

- Combined AcquireBacward with Fwd

test=develop

5b07ca9c

fix huber loss op attr type, test=develop (#19937) · b1e83b33
Committed by Zeng Jinle on Sep 24, 2019

b1e83b33

add inplace to assign op, test=develop (#19927) · cc157d59
Committed by Zeng Jinle on Sep 24, 2019

cc157d59

clean tensor array (#19930) · 55ce6969
Committed by chengduo on Sep 24, 2019
```
test=develop
```
55ce6969

Make OpTest check grad inplace even if forward has no inplace (#19847) · 57606205

Committed by Leo Chen on Sep 24, 2019

* make OpTest check grad inplace even if forward has no inplace, test=develop

* do not run PE when enable_inplace is False, test=develop

* add conv3d cuda kernel for float16 type, test=develop

* refactor OpTest for inplace, test=develop

* add comments, test=develop

57606205

Sep 23, 2019 · 10 commits

resize Ops support data_layout:channel_last, test=develop, test=document_preview (#19914) · cb8f3c03
Committed by Zhang Ting on Sep 23, 2019

cb8f3c03

Forward recompute3 (#19913) · 9901f696

Committed by mapingshuo on Sep 23, 2019

* add recompute based checkpoints methods for large batch training
test=develop

* add append_backward_with_forward_recomputation
test=develop

* refine optimizer
test=develop

* update backward and optimizer
test=develop

* make Variable usable
test=develop

* add recompute code

* refine optimizer
test=develop

* refine addup _append_backward_ops_with_checkpoints_
1) for recompute part, just cache the grad_op_desc without appending to block
2) before appending grad_op_desc to backward part, addup_repetitive_vars, remove unused branch
test=develop

* make method private

* add recompute strategy into DistributedStrategy
test=develop

* checkpoint version3
test=develop

* remove some print information
test=develop

* remove unused sumop
test=develop

* try to fix recompute with graph building modules

* add input names to vars should be held

* add memory debug tool

* backup backward

* Fix bugs

* add backward desc for op not in any segments

* add exception info for sub_block

test=develop

* modify code style

test=develop

* modify code style

test=develop

* remove print functions

test=develop

* add API spec

test=develop
test=document_preview

* make Recompute a child class of Optimizer

test=develop
test=document_preview

* add API spec

test=develop
test=document_preview

* modify API spec

test=develop
test=document_preview

* add document for Recompute

test=develop
test=document_preview

* change API doc of Recompute

test=develop
test=document_preview

* code cleaning

test=develop
test=document_preview

* modify API spec

* fix bugs when segments hold no element

* add testcase for Recompute Optimizer

test=develop
test=document_preview

* add test for apply_gradient, and code cleaning

test=develop
test=document_preview

* add test case for load function

* enable CI

test=develop
test=document

* add test case

test=develop
test=document_preview

* add sample code for 4 functions of recompute optimizer

test=develop
test=document_preview

9901f696

Delete local execution scopes (#19749) · d7251a8e
Committed by chengduo on Sep 23, 2019
```
* Add RecordHistoryLocalExecScopes
test=develop
```
d7251a8e

remove the useless warning for user to avoid confusion test=develop (#19871) · 5452b6a1
Committed by wopeizl on Sep 23, 2019
```
* remove the useless warning for user to avoid confusion test=develop
```
5452b6a1

add mse_loss (#19759) · d31c92a2
Committed by ruri on Sep 23, 2019
```
* add mse_loss op
```
d31c92a2

Add op compatible information (#19910) · 85b398f1

Committed by hong on Sep 23, 2019

* add op compatible information; test=develop

* add enum type

* add enum type; test=develop

85b398f1

fix softmax CE time limit check failed (#19846) · 3f021781
Committed by Kaipeng Deng on Sep 23, 2019
```
* fix softmax ce time limit check failed. test=develop

* refine softmax calc. test=develop
```
3f021781

move tree_conv to fluid.contrib.layers (#19918) · a4919d36

Committed by Tao Luo on Sep 23, 2019

* move tree_conv to fluid.contrib.layers

test=develop

* update API.spec for tree_conv

test=develop

* update tree_conv api to increase unit coverage

test=develop

a4919d36

tensor_array_to_tensor_op.cc, test=develop (#19289) · 30adea0a
Committed by 石晓伟 on Sep 23, 2019

30adea0a

Unify DataLoader APIs (#19305) · 0436efd6

Committed by Zeng Jinle on Sep 23, 2019

* unify DataLoader APIs, test=develop

* integrate iterable CPU Dataset, test=develop
add GPU dataset supporting, test=develop

* add unittests for dataset, test=develop

* add more docs to dataloader apis, test=develop, test=document_preview

* refine doc, test=develop

* refine doc again, test=develop

* increase coverage, test=develop

0436efd6

Sep 22, 2019 · 2 commits

add instance norm (#19500) · 4155e625
Committed by lvmengsi on Sep 22, 2019
```
* add instance norm op
```
4155e625

Add lock to cudnn handle calls (#19845) · c7f36e7c
Committed by Zeng Jinle on Sep 22, 2019
```
* refine reallocate of workspace size, test=develop

* add lock to cudnn handle calls, test=develop
```
c7f36e7c

Sep 21, 2019 · 6 commits

Add two extra flags for test_analyzer_int8_image_classification to disable fp32/int8 (#19840) · 2c5c6365
Committed by pawelpiotrowicz on Sep 21, 2019
```
test=develop
```
2c5c6365

Add support for other axes in MKLDNN softmax op (#19907) · cb65439d
Committed by Adam on Sep 21, 2019
```
* Initial, functional commit

* Clean commit related files
test=develop
```
cb65439d

Feature/auto prune in dygraph (#19757) · 45425411

Committed by Jiabin Yang on Sep 21, 2019

* refactor dygraph,test=develop

* fix failed unittest,test=develop

* polish code,test=develop

* check windows ci error,test=develop
try to fix windows ci error by np.allclose,test=develop

* polish vlog and profiler, test=develop

* try to fix preceding ops order,test=develop

* test transformer in windows ci, test=develop

* use python c-api to speed up tracer.trace,test=develop

* test=develop, fix docker with paddle nccl problem

* test=develop, add ut for debug string and gradient_accumulator

* test=develop, add tests for layer/gradient_accumulator/prepared_op

* test=develop, fix compile error for test_prepared_op

* test=develop, add more ut for dygraph

* test=develop, create API.spec for dygraph api change

* test=develop, refactor name to make it easier to understand

* test=develop, refactor name to make it easier to understand

* test=develop, fix multi-gpu failed problem, add Tracer tests, change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ

* test=develop, fix ut failed on parallel se-resnext

* test=develop, change one more PADDLE_ENFORCE

* support auto prune in dygraph mode

* test=develop, support auto prune

* test=develop, merge develop conflict

* test=develop, fix test_layer and test_tracer ut

* test=develop, fix bug which may cause stop_gradient disabled with a list of backward inputs

45425411

move match_matrix var_conv2d et al. api into fluid.contrib test=develop (#19859) · 418a0967
Committed by Aurelius84 on Sep 21, 2019

418a0967

Add TRT input shape check between model and runtime (#19864) · baccd7e2
Committed by Pei Yang on Sep 21, 2019
```
* add TRT shape check, test=develop

* model_input_shape == runtime_input_shape, refine message, test=develop
```
baccd7e2

Fix BUGS: paddle-TRT repeatedly sets weight_map and overdeletes repetitive_params (#19825) · 74812d1c
Committed by Pei Yang on Sep 21, 2019
```
* fix trt bugs when sharing params, test=develop

* add unittest for cascade_rcnn
```
74812d1c

Sep 20, 2019 · 2 commits

Refine err msg of out of gpu memory (#19779) · 747d4498

Committed by Zeng Jinle on Sep 20, 2019

* refine err msg of out of gpu memory, test=develop

* refine err msg again, test=develop

* refine error message again, test=develop

* follow reviewer's comments, test=develop

747d4498

support 2-level lod of input in sequence_pool (#19839) · fcf53e55
Committed by Aurelius84 on Sep 20, 2019
```
* support 2-level lod of input in sequence_pool test=develop

* fix lod level bug in .cu test=develop
```
fcf53e55
