- 15 11月, 2020 1 次提交
-
-
由 Shang Zhizhou 提交于
* 裁剪transformer模型trt支持;修复tensorRT不支持DeletePass的bug (#28517) * skip_layernorm_op done * add unittest * slice op convertor support trt < 6 * skip_layernorm only work in ernie * fix unittest
-
- 23 9月, 2020 1 次提交
-
-
由 Zhang Ting 提交于
* add fused_bn_add_relu op
-
- 06 8月, 2020 1 次提交
-
-
由 Adam 提交于
* Add oneDNN fusion_gru kernel and fix fc+gru pass test=develop * Formatting changes test=develop * Lint fixes test=develop * Add memory::format_tag::any to GRU weights test=develop * Fix build with CUDA * Fix build with CUDA v2
-
- 11 3月, 2020 1 次提交
-
-
由 Zhaolong Xing 提交于
* 1. add embedding eltwise layernorm fuse 2. add embedding eltwise layernorm op 3. refine inplace_add_relu 4. refine fc_eltwise_layernorm test=develop * 1. refine fc test=develop * fix comments test=develop * fix comments test=develop
-
- 10 1月, 2020 1 次提交
-
-
由 Zhen Wang 提交于
* add bn and relu fuse pass * add op attr assert and dtype assert * fix some inputs&&outputs bugs for the fused op and pattern. * add the unittest for fuse_bn_act_pass. test=develop * use normative enforce statements. test=develop * add the cpu test. test=develop * add the support of batch_size=1 for the bn with relu op. test=develop * add the error type for paddle throws. test=develop * add fused_batch_norm_act and fused_batch_norm_act_grad to op_has_unsed_vars_white_list. test=develop
-
- 03 1月, 2020 1 次提交
-
-
由 Yiqun Liu 提交于
* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop * Call CUDA driver api to launch the kernel compiled by nvrtc. test=develop * Disable for mac and windows. test=develop * Refine the codes to support manually specified num_threads and workload_per_thread. test=develop * Refine the CUDA kernel to support large dims. test=develop * Add DeviceCodePool to manage all device codes. * Add the first implementation fusion_group op. * Add unit-test for fusion_group op. * Add the check of result. * Add the check of nvrtc in unit-test. test=develop * Add comment to explain the inputs, outputs and features of fusion_group op. test=develop * Disable fusion_group op for mac and windows. test=develop * Make the compiling of device code return status instead of hanging up. test=develop * Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API. * Unify fusion_group_op's input and output names. test=develop * Add the check of CUDA driver library in unittest. test=develop * Refine the calling of PADDLE_ENFORCE. test=develop
-
- 30 10月, 2019 1 次提交
-
-
由 Yiqun Liu 提交于
* Move the codes of fused operators to operators/fused directory. test=develop * Correct the op name in cmake. * Change the use of PADDLE_ENFORCE. test=develop
-
- 19 9月, 2019 1 次提交
-
-
由 Yiqun Liu 提交于
* Add fc_elementwise_layernorm_fuse pass and unittest. * Add fused_fc_elementwise_layernorm op and its GPU kernel. test=develop * Apply fc_elementwise_layernorm_fuse_pass to GPU inference. * Add the setting of attrs in the definition of binary_op. test=develop * Add comment. * Implement the unittest. test=develop * Change the unittest name of layer_norm. test=develop
-
- 03 1月, 2019 1 次提交
-
-
由 qingqing01 提交于
test=develop
-
- 28 12月, 2018 1 次提交
-
-
由 qingqing01 提交于
* Inception fusion operator. * Support horizontal layer fusion in conv_fusion_op. * Search conv algo strategy for variable-length input. search N times and cache the searched algos. For other input, choose the algo of input whose area is closest to this input.
-
- 26 11月, 2018 1 次提交
-
-
由 qingqing01 提交于
* Transpose-Flatten-Concat fusion operator. * Add unit testing and fix bug.
-
- 16 11月, 2018 1 次提交
-
-
由 Wu Yi 提交于
* wip simplify operator framework * wip * wip * done test=develop * clean test=develop * fix test=develop * fix deps test=develop * fix cpu build test=develop * fix tensorrt build test=develop * fix tests test=develop * fix test=develop * fix cpu build test=develop
-