Commits · 8d6dc102febfd017fa1de440aa0cd1713b2958d3 · 机器未来 / Paddle

11 Mar, 2020 1 commit

[Ernie GPU Optimize]: Embedding_eltwise_layernorm Fuse (#22494) · 8d6dc102

Authored by Zhaolong Xing on Mar 11, 2020

* 1. add embedding eltwise layernorm fuse
2. add embedding eltwise layernorm op
3. refine inplace_add_relu
4. refine fc_eltwise_layernorm
test=develop

* 1. refine fc
test=develop

* fix comments
test=develop

* fix comments

test=develop

8d6dc102

10 Jan, 2020 1 commit

Add bn and relu fuse pass (#22048) · 46189b16

Authored by Zhen Wang on Jan 10, 2020

* add bn and relu fuse pass

* add op attr assert and dtype assert

* fix some inputs&&outputs bugs for the fused op and pattern.

* add the unittest for fuse_bn_act_pass. test=develop

* use normative enforce statements. test=develop

* add the cpu test. test=develop

* add the support of batch_size=1 for the bn with relu op. test=develop

* add the error type for paddle throws. test=develop

* add fused_batch_norm_act and fused_batch_norm_act_grad to op_has_unsed_vars_white_list. test=develop

46189b16

03 Jan, 2020 1 commit

Add the first implementation of fusion_group op (#19621) · d4832077

Authored by Yiqun Liu on Jan 03, 2020

* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
test=develop

* Call CUDA driver api to launch the kernel compiled by nvrtc.
test=develop

* Disable for mac and windows.
test=develop

* Refine the codes to support manually specified num_threads and workload_per_thread.
test=develop

* Refine the CUDA kernel to support large dims.
test=develop

* Add DeviceCodePool to manage all device codes.

* Add the first implementation fusion_group op.

* Add unit-test for fusion_group op.

* Add the check of result.

* Add the check of nvrtc in unit-test.
test=develop

* Add comment to explain the inputs, outputs and features of fusion_group op.
test=develop

* Disable fusion_group op for mac and windows.
test=develop

* Make the compiling of device code return status instead of hanging up.
test=develop

* Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.

* Unify fusion_group_op's input and output names.
test=develop

* Add the check of CUDA driver library in unittest.
test=develop

* Refine the calling of PADDLE_ENFORCE.
test=develop

d4832077

09 Dec, 2019 1 commit

dygraph_grad_maker supports varbase without grad_var (#21524) · 84b72671

Authored by Leo Chen on Dec 09, 2019

* dygraph_grad_maker supports varbase without grad_var, test=develop

* fix compile, test=develop

* fix test_tracer, test=develop

* follow comments, test=develop

84b72671

27 Nov, 2019 1 commit

INT8 Fully-connected (#17641) · 5d7d5482

Authored by Michał Gallus on Nov 27, 2019

* Implement Int8 FC

* Integrate FC into INT8v2

test=develop

* int8 FC: transpose weights before computing scales

test=develop

* Add support for activation_type string in FC

test=develop

* Disable MKL-DNN's FC in VGG16 and 19

test=develop

* Disable FC quantization when mkldnn FC is disabled

test=develop

* Solve PADDLE_ENFORCES in FC int8

* Fix Paddle enforces and remove const cast

test=develop

* Fix style changes

test=develop

* Fix quantizer_tester test and add fc quantization

test=develop

* Fix FC test fail on CUDA

* Remove unnecessary log from quantize placement pass

test=develop

* Add Thread ID to FC hash key

test=develop

* Add comments to MKL-DNN FC Kernel

test=develop

* Refactor quantizer

test=develop

* Fix linter issues

test=develop

* Fix crash in slim googlenet

test=develop

* Fix PADDLE_ENFORCE messages

test=develop

5d7d5482

08 Nov, 2019 1 commit

Add transpose2 INT8 for mkl-dnn (#19424) · 77c20835

Authored by joanna.wozna.intel on Nov 08, 2019

* Add transpose2 INT8 for mkl-dnn

test=develop

* Fix test_transpose_int8_mkldnn

test=develop

* Revert "Merge branch 'develop' into transpose_int8_mkldnn_2"

This reverts commit 34011bdb, reversing
changes made to 2ce6473f.

* Revert "Revert "Merge branch 'develop' into transpose_int8_mkldnn_2""

This reverts commit 23754dd7.

* Add template to TransposeMKLDNNHandler

test=develop

* Resolve conflict

test=develop

* Restore get_size and refactor

test=develop

77c20835

02 Oct, 2019 1 commit

Add multihead op for ernie opt (#19933) · e8673668

Authored by zhaoyuchen2018 on Oct 02, 2019

* Add multihead op for ernie opt

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine code

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine code

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine code

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine softmax

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine kernel.

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine code

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine code

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine code

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine cuda kernel

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine cuda version

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine code

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

* Refine cmake

test=develop
Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>

e8673668

29 Sep, 2019 1 commit

fix conv2d and conv3d: (#20042) · 3aa331d9

Authored by liym27 on Sep 29, 2019

1. support asymmetric padding;
2. support padding algorithm: "SAME" and "VALID";
3. support channel_last: data_format NHWC and NDHWC;
4. change doc of python API and c++;

test=develop, test=document_preview
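
The "SAME" and "VALID" padding algorithms named in this commit can be illustrated for a single spatial dimension. The sketch below assumes the common convention for these names and a dilation of 1; the helper `conv_padding_1d` is hypothetical and is not Paddle's API. It shows why "SAME" may require asymmetric padding:

```python
import math


def conv_padding_1d(in_size, kernel, stride, algorithm):
    """Return (pad_before, pad_after, out_size) for one spatial dimension.

    Illustrative helper only: assumes dilation 1 and the common
    "SAME"/"VALID" convention, not Paddle's actual implementation.
    """
    if algorithm == "VALID":
        # No padding: keep only positions where the kernel fits entirely.
        out_size = (in_size - kernel) // stride + 1
        return 0, 0, out_size
    if algorithm == "SAME":
        # The output has one element per stride step over the input.
        out_size = math.ceil(in_size / stride)
        pad_total = max((out_size - 1) * stride + kernel - in_size, 0)
        pad_before = pad_total // 2
        # An odd total leaves the extra element at the end (asymmetric).
        pad_after = pad_total - pad_before
        return pad_before, pad_after, out_size
    raise ValueError("unknown padding algorithm: %r" % (algorithm,))
```

With an input of size 5, kernel 2 and stride 2, "SAME" under these assumptions pads only at the end, which a single symmetric padding value cannot express.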

3aa331d9

19 Sep, 2019 1 commit

Add a pass to fuse fc+elementwise_add+layernorm (#19776) · 3cd985a6

Authored by Yiqun Liu on Sep 19, 2019

* Add fc_elementwise_layernorm_fuse pass and unittest.

* Add fused_fc_elementwise_layernorm op and its GPU kernel.
test=develop

* Apply fc_elementwise_layernorm_fuse_pass to GPU inference.

* Add the setting of attrs in the definition of binary_op.
test=develop

* Add comment.

* Implement the unittest.
test=develop

* Change the unittest name of layer_norm.
test=develop
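
As a rough illustration of what the fused pattern computes, the following is an unfused pure-Python reference for fc + elementwise_add + layer_norm. It is a sketch under stated assumptions, not the GPU kernel from this commit: layer_norm is assumed to normalize over the last dimension, the elementwise_add operand `y` is assumed to have the same shape as the fc output, and all function names are hypothetical.

```python
import math


def fc(x, weight, bias):
    """x: [rows][in], weight: [in][out], bias: [out] -> [rows][out]."""
    return [
        [sum(xi * weight[i][j] for i, xi in enumerate(row)) + bias[j]
         for j in range(len(bias))]
        for row in x
    ]


def layer_norm(row, scale, shift, epsilon=1e-5):
    """Normalize one row to zero mean and unit variance, then scale and shift."""
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)
    inv_std = 1.0 / math.sqrt(var + epsilon)
    return [(v - mean) * inv_std * s + b for v, s, b in zip(row, scale, shift)]


def fc_elementwise_layernorm(x, weight, bias, y, scale, shift, epsilon=1e-5):
    """Unfused reference: layer_norm(fc(x) + y), applied row by row."""
    summed = [
        [a + b for a, b in zip(fc_row, y_row)]
        for fc_row, y_row in zip(fc(x, weight, bias), y)
    ]
    return [layer_norm(row, scale, shift, epsilon) for row in summed]
```

A fused kernel would produce the same result in one pass without materializing the intermediate fc and sum tensors, which is the usual motivation for this kind of fuse pass.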

3cd985a6

17 Sep, 2019 1 commit

add deformable conv v1 op and cpu version of deformable conv v2 (#18500) · 00efd1d8

Authored by chengjuntao on Sep 17, 2019

* add deformable conv v1 op, test=develop

00efd1d8

11 Sep, 2019 1 commit

Implement the GPU kernel of fc operator (#19687) · a65c728e

Authored by Yiqun Liu on Sep 11, 2019

* Refine the codes related to fc op.

* Add GPU implementation for fc functor.

* Apply fc_fuse_pass in GPU inference.
test=develop

* Change the cmake for fc op.

* Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.

* Add an attribute to set the activation type in fc_op.

* Enhance the unittest of fc_op.
test=develop

* Remove the declaration of FCOpGrad back to the header file.
test=develop

* Set default value for newly added arguments in test_fc_op.
test=develop

a65c728e

30 May, 2019 1 commit

Add deformable conv v2 op, test=develop (#17145) · bba57cdd

Authored by Bai Yifan on May 30, 2019

* unit commits, test=develop

* update API.spec, test=develop

bba57cdd

28 Mar, 2019 1 commit

Add DGC(Deep Gradient Compression) interface. (#15841) · eb83abea

Authored by gongweibao on Mar 28, 2019

eb83abea

15 Mar, 2019 1 commit

Support sync batch norm. (#16121) · 8ad672a2

Authored by qingqing01 on Mar 15, 2019

* Support Sync Batch Norm.
* Note: do not enable it on a single device.

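Usage:

import paddle.fluid as fluid

# `tp` and `loss_mean` are user-defined; they are assumed here to be the
# training program and its loss variable.
build_strategy = fluid.BuildStrategy()
build_strategy.sync_batch_norm = True
binary = fluid.compiler.CompiledProgram(tp).with_data_parallel(
        loss_name=loss_mean.name,
        build_strategy=build_strategy)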

8ad672a2

04 Mar, 2019 1 commit

Authored by dzhwinter on Feb 27, 2019

* staged.

* polish code

* polish code. test=develop

* polish code. test=develop

* api change. test=develop

* fix default value. test=develop

* fix default value. test=develop

4449e855

27 Feb, 2019 1 commit

Authored by dzhwinter on Feb 27, 2019

* staged.

* polish code

* polish code. test=develop

* polish code. test=develop

* api change. test=develop

* fix default value. test=develop

* fix default value. test=develop

225c11a9

25 Feb, 2019 1 commit

Enable function coverage for U8/S8 ConvMKLDNNOpKernel · 4acc5220

Authored by liangan1 on Feb 25, 2019

test=develop

4acc5220

29 Jan, 2019 1 commit

Make separate folders for mkldnn codes · b1bdcd4d

Authored by Krzysztof Binias on Jan 28, 2019

test=develop

b1bdcd4d

28 Dec, 2018 1 commit

Inception fusion operator. (#14968) · 6f0a1d7b

Authored by qingqing01 on Dec 28, 2018

* Inception fusion operator.
* Support horizontal layer fusion in conv_fusion_op.
* Search conv algo strategy for variable-length input.
   Search N times and cache the searched algos. For any other input, choose the algo of the cached input whose area is closest to it.
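
The search-and-cache strategy described in the last item can be sketched as follows. This is one hedged reading of the commit message, not the actual implementation: "search N times" is interpreted as searching for the first N distinct input areas, the cache is assumed to be keyed by area (height times width), and the class name `ConvAlgoCache` and the `search_fn` callback are hypothetical.

```python
class ConvAlgoCache:
    """Cache of searched conv algorithms keyed by input area (height * width)."""

    def __init__(self, search_fn, max_searches):
        self._search_fn = search_fn        # expensive algorithm search
        self._max_searches = max_searches  # the "N" in "search N times"
        self._algos = {}                   # area -> searched algorithm

    def get(self, height, width):
        area = height * width
        if area in self._algos:
            return self._algos[area]
        if len(self._algos) < self._max_searches:
            # Still within the search budget: search once and cache the result.
            self._algos[area] = self._search_fn(height, width)
            return self._algos[area]
        # Budget exhausted: reuse the algo of the cached input whose area
        # is closest to this input's area.
        nearest = min(self._algos, key=lambda cached: abs(cached - area))
        return self._algos[nearest]
```

The point of the nearest-area fallback is that inputs of similar size tend to favor the same algorithm, so an unbounded stream of distinct shapes does not trigger an unbounded number of searches.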

6f0a1d7b

18 Dec, 2018 3 commits

add ctc support for windows · 19ebd8b4

Authored by peizhilin on Dec 18, 2018

19ebd8b4

include the mkl fix only · b601f2de

Authored by peizhilin on Dec 18, 2018

test=develop

b601f2de

add mkl,ctc support for windows · 5a6d7fe2

Authored by peizhilin on Dec 18, 2018

5a6d7fe2

05 Dec, 2018 1 commit

allow customize kernel selection · 41c28d54

Authored by Xin Pan on Dec 05, 2018

test=develop

41c28d54

26 Nov, 2018 1 commit

Transpose-Flatten-Concat fusion operator. (#14568) · 6224e61f

Authored by qingqing01 on Nov 26, 2018

* Transpose-Flatten-Concat fusion operator.
* Add unit testing and fix bug.

6224e61f

22 Nov, 2018 1 commit

Windows/online (#14474) · d9a1f3e5

Authored by wopeizl on Nov 22, 2018

* add recordio support

* disable the openblas multi-thread on windows since no support
adjust the python script

* code style

* code style
test=develop

* add create_recordio_file_reader back

* fix code style
test=develop

* fix the gtest.cmake on windows

* fix cc_test on windows

* fix the win build
test=develop

* remove fused compile support on windows
test=develop

* add the jit support
test=develop

* add the jit support, test=develop

* add the jit support, test=develop

* add the jit back
fix compile error on windows

* rollback test=develop

* test case fix

* disable DSO by default on windows

* exclude warpctc_op on windows

* exclude the dynload_warpctc out on windows
test=develop

* fix the scripts error
test=develop

* disable avx on windows by default
test=develop

* re-organize the cmake file

* disable mkl on windows by default

* add warp_ctc back

* fix the dependency

* fix the dependency

* fix the build issue on windows

* remove unsupported flag on windows

* code style

* code style
test=develop

* fix issue

* add profiler, parallel_executor back

* clean up the pre-definitions on windows

* fix build issue

* test=develop

d9a1f3e5

21 Nov, 2018 1 commit

clean up the pre-definitions on windows · 6e66fadb

Authored by peizhilin on Nov 21, 2018

6e66fadb

19 Nov, 2018 2 commits

Convolution fusion operator. (#14449) · fd7e6431

Authored by qingqing01 on Nov 19, 2018

* Convolution fusion operator.
* Clean code
test=develop

fd7e6431

fix dist deps (#14471) · d7bd0361

Authored by Wu Yi on Nov 19, 2018

* fix dist deps test=develop

* update test=develop

* update test=develop

* update test=develop

* update test=develop

d7bd0361

18 Nov, 2018 1 commit

add the jit back · a3e952f4

Authored by peizhilin on Nov 18, 2018

fix compile error on windows

a3e952f4

17 Nov, 2018 1 commit

add the jit support, test=develop · 928efeed

Authored by peizhilin on Nov 17, 2018

928efeed

16 Nov, 2018 3 commits

Refine operator cmake (#14413) · a2d9b344

Authored by Wu Yi on Nov 16, 2018

* wip simplify operator framework

* wip

* wip

* done test=develop

* clean test=develop

* fix test=develop

* fix deps test=develop

* fix cpu build test=develop

* fix tensorrt build test=develop

* fix tests test=develop

* fix test=develop

* fix cpu build test=develop

a2d9b344

Make nce support more distribution. (#13549) · 17226782

Authored by whs on Nov 16, 2018

* Fix truncated normal.

* Fix.

* Make nce support more distribution.

* Fix API.spec.

* Fix python API.

* Fix.
test=develop

* Fix API.spec
test=develop

* Fix sampler.

* Fix order of arguments in python API.
test=develop

17226782

Add cudnn ctc loss (#12366) · b32c13dc

Authored by Wu Yi on Nov 16, 2018

* add cudnn ctc loss

* wip add test test=develop

* wip

* wip

* done test=develop

* move include cudnn test=develop

* test test=develop

* fix build test=develop

* fix build test=develop

* fix build on cudnn5 test=develop

* fix cudnn5 build test=develop

* fix cudnn5 build test=develop

* merge develop softmax functor change test=develop

b32c13dc

15 Nov, 2018 1 commit

add recordio support · d1429ac4

Authored by peizhilin on Nov 15, 2018

d1429ac4

13 Nov, 2018 1 commit

Add CMake deps · b0afdc4e

Authored by minqiyang on Nov 13, 2018

b0afdc4e

12 Nov, 2018 1 commit

perf(compile): speed up reduce_op compile by splitting files (#14294) · 8f9bfad2

Authored by Yu Yang on Nov 12, 2018

test=develop

8f9bfad2

09 Nov, 2018 2 commits

simplify the logic · 7638f0af

Authored by peizhilin on Nov 09, 2018

7638f0af

Add lod tensor array to tensor op (#13990) · 688ed601

Authored by li099 on Nov 09, 2018

* add lod tensor array concat

* add lod tensor array concat

* test=develop

* add lod tensor array concat
test=develop

* Fix API.spec
test=develop

* add lod tensor array concat
test=develop

* revise some bug of lod tensor array concat
test=develop

* add unittest for tensor array concat
test=develop

* change to tensor array to tensor
test=develop

* revise bug
test=develop

* revise a bug
test=develop

* revise a bug
test=develop

* revise a bug of python3
test=develop

688ed601

08 Nov, 2018 2 commits

Fix input<tensor> (#14208) · c5b6573a

Authored by chengduo on Nov 08, 2018

* fix input<tensor>
test=develop

* fix split_ids
test=develop

* ElementwiseMul should not support SelectedRows

* fix scale op
test=develop

* change GetTensorFromVar() method to GetTensorOrSelectedRowsFromVar()

* fix operator

* refine MultiOutput

* fix MultiOutput
test=develop

* disable test_dist_save_load
test=develop

* fix elementwise_op
test=develop

* add get_sparse_as_op
test=develop

* add info for check
test=develop

* rename get_sparse_as_op with extract_rows_as_op.
test=develop

* elementwise doesn't support selected_rows

* fix regularizer

* remove extract_rows_as
test=develop

* fix ci
test=develop

* add test for sum_op

* fix regularizer
test=develop

*  test=develop

* fix pserver weight decay multi inputs test=develop

c5b6573a

Revert "cherry picked windows patches." · ba8b5619

Authored by Zhaolong Xing on Nov 08, 2018

ba8b5619
