- 10 Jan, 2020 2 commits
-
-
Committed by GaoWei8
* Optimize the kernel implementation of layernorm with openmp (#20895) * Add ernie c++ inference test (#21015) * Add ernie unit test test=develop * Add ernie unit test test=develop * Add ernie unit test test=develop * remove ngraph * optimize gpu test test=develop * optimize codes test=develop * fix cmake fails on inference_download_and_uncompress (#21185) * solve cmake fails on inference_download_and_uncompress test=develop * solve cmake fails on inference_download_and_uncompress test=develop * Add fc padding to improve mkl GEMM's performance when N and K are multiple of 128. (#20972) * Add fc padding to solve mkl performance test=develop * fix gpu pass and error information test=develop * fix fc_fuse_pass_test test=develop * fix error information test=develop * fix error information test=develop * fix name and add fc op padding test test=develop * fix attributes test=develop * optimize fc padding test=develop * fix test test=develop * Polish the codes of fc when needs padding (#21378) test=develop * Add ernie large c++ inference test (#21365) * add ernie-large test test=develop * add ernie large c++ inference test test=develop * Modify padding strategy: remove weight copy in fc padding (#21650) test=develop * optimize fc jit (#21878) test=develop Co-authored-by: Yihua Xu <yihuaxu@hotmail.com>
-
Committed by 石晓伟
* fix multi-thread error of fc_gru_fuse_pass.cc, test=develop * export FLAGS and GLOG symbols, test=develop
-
- 08 Jan, 2020 1 commit
-
-
Committed by liu zhengxi
* fix seqconv_eltadd_relu pass during multi-threads predictor, test=develop * fix attention_lstm_fuse_pass during multi-threads inference, test=develop * fix embedding_fc_lstm_fuse_pass during multi-threads inference, test=develop * fix fc_lstm_fuse_pass during multi-threads inference, test=develop * fix seq_concat_fc_fuse_pass during multi-threads inference, test=develop
-
- 07 Jan, 2020 1 commit
-
-
Committed by Pei Yang
-
- 09 Dec, 2019 1 commit
-
-
Committed by Zhaolong Xing
This reverts commit 0473cdb8.
-
- 02 Dec, 2019 1 commit
-
-
Committed by Zhaolong Xing
-
- 26 Nov, 2019 1 commit
-
-
Committed by WangXi
-
- 25 Nov, 2019 1 commit
-
-
Committed by Chen Weihang
* add pre condition check for fuse optimizer op pass, test=develop * add log & set init to zero, test=develop * fix test_fuse_all_reduce_pass failed, test=develop * polish details, test=develop * refine PADDLE_ENFORCE & remove needless VLOG, test=develop * refactor op check method, test=develop
-
- 07 Nov, 2019 1 commit
-
-
Committed by Wilber
[cherry-pick] fix squared_mat_sub_fuse_pass bug when elementwise_op input is persistable param test=develop test=release/1.6 (#21044) fix squared_mat_sub_fuse_pass bug when elementwise_op input is persistable param
-
- 30 Oct, 2019 1 commit
-
-
Committed by liu zhengxi
* add support to gcc8, add docker env * remove the warning issue
-
- 21 Oct, 2019 1 commit
-
-
Committed by WangXi
-
- 20 Oct, 2019 1 commit
-
-
Committed by Zhaolong Xing
test=release/1.6
-
- 14 Oct, 2019 2 commits
-
-
Committed by Pei Yang
-
Committed by zhaoyuchen2018
* Add Multihead matmul fuse pass (#20167) * Add multihead fuse pass for ernie opt * Refine softmax test=develop * Refine cuda kernel * Refine cuda version * Refine cmake test=develop * refine header file * refine test case and pass * refine comments * Delete useless code. test=develop Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
-
- 28 Sep, 2019 1 commit
-
-
Committed by bingyanghuang
* Follow Wangzhen's comment in PR 18970, test=develop * Review comments, test=develop * Leave fake quantization around mul test=develop * Replace Fake with Real Quantized Mul test=develop * Fix bug in quantize placement pass Nodes in the graph now have checked type instead of node name when they are to be marked for quantization test=develop
-
- 27 Sep, 2019 2 commits
-
-
Committed by joanna.wozna.intel
* Fix conv2d+dequantize squash for residual fusion test=develop * Disable conv-requant squash test=develop
-
Committed by wangchaochaohu
* codegen code for reconstruction test=develop * fix the cmake test=develop * fix review advice test=develop
-
- 26 Sep, 2019 1 commit
-
-
Committed by chengduo
Add dtype for coalesce_tensor_op
-
- 19 Sep, 2019 2 commits
-
-
Committed by joanna.wozna.intel
* Fix conv2d+dequantize squash for residual fusion test=develop * Change condition test=develop
-
Committed by Yiqun Liu
* Add fc_elementwise_layernorm_fuse pass and unittest. * Add fused_fc_elementwise_layernorm op and its GPU kernel. test=develop * Apply fc_elementwise_layernorm_fuse_pass to GPU inference. * Add the setting of attrs in the definition of binary_op. test=develop * Add comment. * Implement the unittest. test=develop * Change the unittest name of layer_norm. test=develop
-
- 18 Sep, 2019 2 commits
-
-
Committed by Zeng Jinle
-
Committed by Zeng Jinle
* fix memory reuse bug on feeding variables, test=develop * add comments to reference count members, test=develop
-
- 16 Sep, 2019 2 commits
-
-
Committed by chengduo
* fix warning info test=develop * fix bug of all_reduce_deps_pass test=develop
-
Committed by Yiqun Liu
* Refine the codes related to fc op. * Add GPU implementation for fc functor. * Apply fc_fuse_pass in GPU inference. test=develop * Change the cmake for fc op. * Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ. * Add an attribute to set the activation type in fc_op. * Enhance the unittest of fc_op. test=develop * Remove the declaration of FCOpGrad back to the header file. test=develop * Set default value for newly added arguments in test_fc_op. test=develop * Enhance fc_fuse_pass to enable fusing relu. * Allow print the shapes of var_desc in graph. test=develop * Enhance fc_fuse_pass_tester. * Remove the use of PADDLE_ENFORCE. test=develop * Correct the number of ops after fusing. test=develop * Fix a typo. test=develop * Set activation_type to null when there is no relu in fc. test=develop * Refine fc_fuse_pass's codes. * Enable the set of shape for tensor. * Refine repeated_fc_relu_pass and add unittest. test=develop
-
- 13 Sep, 2019 1 commit
-
-
Committed by chengduo
* Open fuse all reduce op test=develop * Add Fuse optimization op log * Add log in fuse_optimizer op pass and fuse all_reduce op pass * replace with boost::optional<bool> test=develop * Polish code test=develop * fix code coverage test=develop
-
- 11 Sep, 2019 3 commits
-
-
Committed by chengduo
* fix vlog level and fuse option type test=develop
-
Committed by Yiqun Liu
* Refine the codes related to fc op. * Add GPU implementation for fc functor. * Apply fc_fuse_pass in GPU inference. test=develop * Change the cmake for fc op. * Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ. * Add an attribute to set the activation type in fc_op. * Enhance the unittest of fc_op. test=develop * Remove the declaration of FCOpGrad back to the header file. test=develop * Set default value for newly added arguments in test_fc_op. test=develop
-
Committed by chengduo
* Enable fused_all_reduce_op_handle support GPU and CPU Gradients
-
- 06 Sep, 2019 1 commit
-
-
Committed by wangchaochaohu
* test=develop codegen for fused elementwise operation * fix test=develop
-
- 04 Sep, 2019 1 commit
-
-
Committed by baojun
* enable ngraph through build_strategy test=develop * add unittest test=develop * put use_ngraph unconditional test=develop * remove paddle_enforce test=develop * remove paddle_enforce test=develop * fix copyright test=develop * limit for ngraph only test=develop
-
- 03 Sep, 2019 2 commits
-
-
Committed by Tao Luo
test=develop
-
Committed by Yiqun Liu
* Add a interface to enable cudnn for inference. * Add cudnn_placement_pass. test=develop * Set the default value of cudnn_enabled_op_types to null. test=develop * Write the common basic class, placement_pass_base, to refine the codes. test=develop * Call EnableCUDNN in unittest. test=develop * Refine cudnn_placement_pass tester. * Enable the testing of cudnn_placement_pass in inference's unittest. test=develop * Add the check of op kernels. test=develop
-
- 30 Aug, 2019 1 commit
-
-
Committed by Yiqun Liu
* Add simplify_with_basic_ops_pass to replace dropout_op with scale_op when is_test is true. test=develop * Delete dropout_op directly when upscale_in_train is true. test=develop * Improve the debug string, adding the print of op_desc information. * Fix the case when dropout's input x is reused as the next op's output. * Add the pass to inference. test=develop * Change the log level. test=develop * Add unittest for inplace case. * Add comment to explain the pass. * Apply the pass for CPU inference. test=develop * Fix the typo. test=develop * Add the check of AttrType. test=develop
-
- 28 Aug, 2019 1 commit
-
-
Committed by tangwei12
* fix correctness of the communicator * fix a bug in send thread when sending var context is empty, test=develop * add lookup_table_prefetch_op and prefetch optimize, test=develop * remove remote prefetch GPU supported * word2vec force with CPU, test=develop * test dist remote lookup table force with CPU, test=develop
-
- 27 Aug, 2019 1 commit
-
-
Committed by joanna.wozna.intel
-
- 23 Aug, 2019 1 commit
-
-
Committed by Tao Luo
test=develop
-
- 21 Aug, 2019 1 commit
-
-
Committed by Adam
* Add generalized Conv+Activation MKLDNN fuse pass creation Part2 test=develop * Undefined behaviour of GetAttrIfExists<> FIX test=develop
-
- 19 Aug, 2019 3 commits
-
-
Committed by Zhaolong Xing
* fix mask rcnn bug: 1. affine channel fuse (diff) 2. condition block op (memory leak) 3. merge lod tensor op (diff) 4. memory optim (diff) test=develop * fix ci about PADDLE_ENFORCE fix merge lod infer op ut test=develop
-
Committed by liuwei1031
* fix compilation issue in windows vs2017, test=develop * fix gtest lib not found issue, test=develop
-
Committed by juncaipeng
remove the warning for reminding user to avoid using the OriginProgram method, test=develop (#19244) This log information may annoy users who don't need to care about it.
-