- 15 11月, 2020 1 次提交
-
-
由 Shang Zhizhou 提交于
* 裁剪transformer模型trt支持;修复tensorRT不支持DeletePass的bug (#28517) * skip_layernorm_op done * add unittest * slice op convertor support trt < 6 * skip_layernorm only work in ernie * fix unittest
-
- 27 9月, 2020 1 次提交
-
-
由 QingshuChen 提交于
* support elementwise add, activation, matmul on Baidu Kunlun * test=kunlun * minor * test=kunlun * reconstuct the xpu directory * test=kunlun * minor * test=kunlun * minor * test=kunlun * minor * test=kunlun * minor * test=kunlun * minor * test=kunlun
-
- 23 9月, 2020 1 次提交
-
-
由 Zhang Ting 提交于
* add fused_bn_add_relu op
-
- 16 9月, 2020 1 次提交
-
-
由 Leo Chen 提交于
-
- 21 8月, 2020 1 次提交
-
-
由 QingshuChen 提交于
* support Baidu AI Accelerator * test=kunlun * minor * test=kunlun * support xpu op in separate file * test=kunlun * update XPU error message and remove duplicated code * test=kunlun * minor * test=kunlun * minor * test=kunlun
-
- 13 8月, 2020 1 次提交
-
-
由 Leo Chen 提交于
* add unchaged infershape function * add broadcast infershape function * fix bug * rename infershape functions * add UnaryOpUnchangedInferShapeCheckAxis * add error message * add test for common infer shape functions * dont update existed ops * dont update op_desc.h * add more test * add error check, refine error message
-
- 06 8月, 2020 1 次提交
-
-
由 Adam 提交于
* Add oneDNN fusion_gru kernel and fix fc+gru pass test=develop * Formatting changes test=develop * Lint fixes test=develop * Add memory::format_tag::any to GRU weights test=develop * Fix build with CUDA * Fix build with CUDA v2
-
- 30 7月, 2020 1 次提交
-
-
由 wawltor 提交于
Update the code for the compare_ops, update the api and doc
-
- 03 4月, 2020 1 次提交
-
-
由 channings 提交于
* update linspace, equal operators to API 2.0, test=develop * equal support higher performance CUDA kernel, test=develop * update comment of equal&linspace operator, test=develop * update comment of equal&linspace operator, test=develop
-
- 11 3月, 2020 1 次提交
-
-
由 Zhaolong Xing 提交于
* 1. add embedding eltwise layernorm fuse 2. add embedding eltwise layernorm op 3. refine inplace_add_relu 4. refine fc_eltwise_layernorm test=develop * 1. refine fc test=develop * fix comments test=develop * fix comments test=develop
-
- 10 1月, 2020 1 次提交
-
-
由 Zhen Wang 提交于
* add bn and relu fuse pass * add op attr assert and dtype assert * fix some inputs&&outputs bugs for the fused op and pattern. * add the unittest for fuse_bn_act_pass. test=develop * use normative enforce statements. test=develop * add the cpu test. test=develop * add the support of batch_size=1 for the bn with relu op. test=develop * add the error type for paddle throws. test=develop * add fused_batch_norm_act and fused_batch_norm_act_grad to op_has_unsed_vars_white_list. test=develop
-
- 03 1月, 2020 1 次提交
-
-
由 Yiqun Liu 提交于
* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop * Call CUDA driver api to launch the kernel compiled by nvrtc. test=develop * Disable for mac and windows. test=develop * Refine the codes to support manually specified num_threads and workload_per_thread. test=develop * Refine the CUDA kernel to support large dims. test=develop * Add DeviceCodePool to manage all device codes. * Add the first implementation fusion_group op. * Add unit-test for fusion_group op. * Add the check of result. * Add the check of nvrtc in unit-test. test=develop * Add comment to explain the inputs, outputs and features of fusion_group op. test=develop * Disable fusion_group op for mac and windows. test=develop * Make the compiling of device code return status instead of hanging up. test=develop * Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API. * Unify fusion_group_op's input and output names. test=develop * Add the check of CUDA driver library in unittest. test=develop * Refine the calling of PADDLE_ENFORCE. test=develop
-
- 09 12月, 2019 1 次提交
-
-
由 Leo Chen 提交于
* dygraph_grad_maker supports varbase without grad_var, test=develop * fix compile, test=develop * fix test_tracer, test=develop * follow comments, test=develop
-
- 27 11月, 2019 1 次提交
-
-
由 Michał Gallus 提交于
* Implement Int8 FC * Integrate FC into INT8v2 test=develop * int8 FC: transpose weights before computing scales test=develop * Add support for activation_type string in FC test=develop * Disable MKL-DNN's FC in VGG16 and 19 test=develop * Disable FC quantization when mkldnn FC is disabled test=develop * Solve PADDLE_ENFORCES in FC int8 * Fix Paddle enforces and remove const cast test=develop * Fix style changes test=develop * Fix quantizer_tester test and add fc quantization test=develop * Fix FC test fail on CUDA * Remove unnecessary log from quantize placement pass test=develop * Add Thread ID to FC hash key test=develop * Add comments to MKL-DNN FC Kernel test=develop * Refactor quantizer test=develop * Fix linter issues test=develop * Fix crash in slim googlenet test=develop * Fix PADDLE_ENFORCE messages test=develop
-
- 08 11月, 2019 1 次提交
-
-
由 joanna.wozna.intel 提交于
* Add transpose2 INT8 for mkl-dnn test=develop * Fix test_transpose_int8_mkldnn test=develop * Revert "Merge branch 'develop' into transpose_int8_mkldnn_2" This reverts commit 34011bdb, reversing changes made to 2ce6473f. * Revert "Revert "Merge branch 'develop' into transpose_int8_mkldnn_2"" This reverts commit 23754dd7. * Add template to TransposeMKLDNNHandler test=develop * Resolve conflict test=develop * Restore get_size and refactor test=develop
-
- 02 10月, 2019 1 次提交
-
-
由 zhaoyuchen2018 提交于
* Add multihead op for ernie opt test=develop Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com> * Refine code test=develop Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com> * Refine code test=develop Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com> * Refine code test=develop Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com> * Refine softmax test=develop Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com> * Refine kernel. test=develop Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com> * Refine code test=develop Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com> * Refine code test=develop Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com> * Refine code test=develop Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com> * Refine cuda kernel test=develop Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com> * Refine cuda version test=develop Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com> * Refine code test=develop Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com> * Refine cmake test=develop Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
-
- 29 9月, 2019 1 次提交
-
-
由 liym27 提交于
1.support asymmetric padding; 2.support padding algorithm:"SAME" and "VALID"; 3.support channel_last: data_format NHWC and NDHWC; 4.change doc of python API and c++; test=develop, test=document_preview
-
- 19 9月, 2019 1 次提交
-
-
由 Yiqun Liu 提交于
* Add fc_elementwise_layernorm_fuse pass and unittest. * Add fused_fc_elementwise_layernorm op and its GPU kernel. test=develop * Apply fc_elementwise_layernorm_fuse_pass to GPU inference. * Add the setting of attrs in the definition of binary_op. test=develop * Add comment. * Implement the unittest. test=develop * Change the unittest name of layer_norm. test=develop
-
- 17 9月, 2019 1 次提交
-
-
由 chengjuntao 提交于
* add deformable conv v1 op, test=develop
-
- 11 9月, 2019 1 次提交
-
-
由 Yiqun Liu 提交于
* Refine the codes related to fc op. * Add GPU implementation for fc functor. * Apply fc_fuse_pass in GPU inference. test=develop * Change the cmake for fc op. * Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ. * Add an attribute to set the activation type in fc_op. * Enhance the unittest of fc_op. test=develop * Remove the declaration of FCOpGrad back to the header file. test=develop * Set default value for newly added arguments in test_fc_op. test=develop
-
- 30 5月, 2019 1 次提交
-
-
由 Bai Yifan 提交于
* unit commits, test=develop * update API.spec, test=develop
-
- 28 3月, 2019 1 次提交
-
-
由 gongweibao 提交于
-
- 15 3月, 2019 1 次提交
-
-
由 qingqing01 提交于
* Support Sync Batch Norm. * Note, do not enable it in one device. Usage: build_strategy = fluid.BuildStrategy() build_strategy.sync_batch_norm = True binary = fluid.compiler.CompiledProgram(tp).with_data_parallel( loss_name=loss_mean.name, build_strategy=build_strategy)
-
- 04 3月, 2019 1 次提交
-
-
由 dzhwinter 提交于
* staged. * polish code * polish code. test=develop * polish code. test=develop * api change. test=develop * fix default value. test=develop * fix default value. test=develop
-
- 27 2月, 2019 1 次提交
-
-
由 dzhwinter 提交于
* staged. * polish code * polish code. test=develop * polish code. test=develop * api change. test=develop * fix default value. test=develop * fix default value. test=develop
-
- 25 2月, 2019 1 次提交
-
-
由 liangan1 提交于
test=develop
-
- 29 1月, 2019 1 次提交
-
-
由 Krzysztof Binias 提交于
test=develop
-
- 28 12月, 2018 1 次提交
-
-
由 qingqing01 提交于
* Inception fusion operator. * Support horizontal layer fusion in conv_fusion_op. * Search conv algo strategy for variable-length input. search N times and cache the searched algos. For other input, choose the algo of input whose area is closest to this input.
-
- 18 12月, 2018 3 次提交
- 05 12月, 2018 1 次提交
-
-
由 Xin Pan 提交于
test=develop
-
- 26 11月, 2018 1 次提交
-
-
由 qingqing01 提交于
* Transpose-Flatten-Concat fusion operator. * Add unit testing and fix bug.
-
- 22 11月, 2018 1 次提交
-
-
由 wopeizl 提交于
* add recordio support * disable the openblas multi-thread on windows since no support adjust the python script * code style * code style test=develop * add create_recordio_file_reader back * fix code style test=develop * fix the gtest.cmake on windows * fix cc_test on windows * fix the win build test=develop * remove fused compile support on windows test=develop * add the jit support test=develop * add the jit support, test=develop * add the jit support, test=develop * add the jit back fix compile error on windows * rollback test=develop * test case fix * disable DSO by default on windows * exclude warpctc_op on windows * exclude the dynload_warpctc out on windows test=develop * fix the scripts error test=develop * disable avx on windows by default test=develop * re-organize the cmake file * disable mkl on windows by default * add warp_ctc back * fix the dependency * fix the dependency * fix the build issue on windows * remove unsupported flag on windows * code style * code style test=develop * fix issue * add profiler, parallel_executor back * clean up the pre-definitions on windows * fix build issue * test=develop
-
- 21 11月, 2018 1 次提交
-
-
由 peizhilin 提交于
-
- 19 11月, 2018 2 次提交
-
-
由 qingqing01 提交于
* Convolution fusion operator. * Clean code test=develop
-
由 Wu Yi 提交于
* fix dist deps test=develop * update test=develop * update test=develop * update test=develop * update test=develop
-
- 18 11月, 2018 1 次提交
-
-
由 peizhilin 提交于
fix compile error on windows
-
- 17 11月, 2018 1 次提交
-
-
由 peizhilin 提交于
-
- 16 11月, 2018 1 次提交
-
-
由 Wu Yi 提交于
* wip simplify operator framework * wip * wip * done test=develop * clean test=develop * fix test=develop * fix deps test=develop * fix cpu build test=develop * fix tensorrt build test=develop * fix tests test=develop * fix test=develop * fix cpu build test=develop
-