- 19 9月, 2019 1 次提交
-
-
由 Yiqun Liu 提交于
* Add fc_elementwise_layernorm_fuse pass and unittest. * Add fused_fc_elementwise_layernorm op and its GPU kernel. test=develop * Apply fc_elementwise_layernorm_fuse_pass to GPU inference. * Add the setting of attrs in the definition of binary_op. test=develop * Add comment. * Implement the unittest. test=develop * Change the unittest name of layer_norm. test=develop
-
- 18 9月, 2019 2 次提交
-
-
由 Zeng Jinle 提交于
-
由 Zeng Jinle 提交于
* fix memory reuse bug on feeding variables, test=develop * add comments to reference count members, test=develop
-
- 16 9月, 2019 2 次提交
-
-
由 chengduo 提交于
* fix warning info test=develop * fix bug of all_reduce_deps_pass test=develop
-
由 Yiqun Liu 提交于
* Refine the codes related to fc op. * Add GPU implementation for fc functor. * Apply fc_fuse_pass in GPU inference. test=develop * Change the cmake for fc op. * Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ. * Add an attribute to set the activation type in fc_op. * Enhance the unittest of fc_op. test=develop * Remove the declaration of FCOpGrad back to the header file. test=develop * Set default value for newly added arguments in test_fc_op. test=develop * Enhance fc_fuse_pass to enable fusing relu. * Allow print the shapes of var_desc in graph. test=develop * Enhance fc_fuse_pass_tester. * Remove the use of PADDLE_ENFORCE. test=develop * Correct the number of ops after fusing. test=develop * Fix a typo. test=develop * Set activation_type to null when there is no relu in fc. test=develop * Refine fc_fuse_pass's codes. * Enable the set of shape for tensor. * Refine repeated_fc_relu_pass and add unittest. test=develop
-
- 13 9月, 2019 1 次提交
-
-
由 chengduo 提交于
* Open fuse all reduce op test=develop * Add Fuse optimization op log * Add log in fuse_optimizer op pass and fuse all_reduce op pass * replace with boost::optional<bool> test=develop * Polish code test=develop * fix code coverage test=develop
-
- 11 9月, 2019 3 次提交
-
-
由 chengduo 提交于
* fix vlog level and fuse option type test=develop
-
由 Yiqun Liu 提交于
* Refine the codes related to fc op. * Add GPU implementation for fc functor. * Apply fc_fuse_pass in GPU inference. test=develop * Change the cmake for fc op. * Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ. * Add an attribute to set the activation type in fc_op. * Enhance the unittest of fc_op. test=develop * Remove the declaration of FCOpGrad back to the header file. test=develop * Set default value for newly added arguments in test_fc_op. test=develop
-
由 chengduo 提交于
* Enable fused_all_reduce_op_handle support GPU and CPU Gradients
-
- 06 9月, 2019 1 次提交
-
-
由 wangchaochaohu 提交于
* test=develop codegen for fused elementwise operation * fix test=develop
-
- 04 9月, 2019 1 次提交
-
-
由 baojun 提交于
* enable ngraph throught build_strategy test=develop * add unittest test=develop * put use_ngraph unconditional test=develop * remove paddle_enforce test=develop * remove paddle_enforce test=develop * fix copyright test=develop * limit for ngraph only test=develop
-
- 03 9月, 2019 2 次提交
-
-
由 Tao Luo 提交于
test=develop
-
由 Yiqun Liu 提交于
* Add a interface to enable cudnn for inference. * Add cudnn_placement_pass. test=develop * Set the default value of cudnn_enabled_op_types to null. test=develop * Write the common basic class, placement_pass_base, to refine the codes. test=develop * Call EnableCUDNN in unittest. test=develop * Refine cudnn_placement_pass tester. * Enable the testing of cudnn_placement_pass in inference's unittest. test=develop * Add the check of op kernels. test=develop
-
- 30 8月, 2019 1 次提交
-
-
由 Yiqun Liu 提交于
* Add simplify_with_basic_ops_pass to replace dropout_op with scale_op when is_test is true. test=develop * Delete dropout_op directly when upscale_in_train is true. test=develop * Improve the debug string, adding the print of op_desc information. * Fix the case when dropout's input x is reused as the next op's output. * Add the pass to inference. test=develop * Change the log level. test=develop * Add unittest for inplace case. * Add comment to explain the pass. * Apply the pass for CPU inference. test=develop * Fix the typo. test=develop * Add the check of AttrType. test=develop
-
- 28 8月, 2019 1 次提交
-
-
由 tangwei12 提交于
* fix correctness of the communicator * fix a bug in send thread when sending var context is empty, test=develop * add lookup_table_prefetch_op and prefetch optimize, test=develop * remove remote prefetch GPU supported * word2vec force with CPU, test=develop * test dist remote lookup table force with CPU, test=develop
-
- 27 8月, 2019 1 次提交
-
-
由 joanna.wozna.intel 提交于
-
- 23 8月, 2019 1 次提交
-
-
由 Tao Luo 提交于
test=develop
-
- 21 8月, 2019 1 次提交
-
-
由 Adam 提交于
* Add generalized Conv+Activation MKLDNN fuse pass creation Part2 test=develop * Undefined behaviour of GetAttrIfExists<> FIX test=develop
-
- 19 8月, 2019 3 次提交
-
-
由 Zhaolong Xing 提交于
* fix mask rcnn bug: 1. affine channel fuse (diff) 2. condition block op (memory leak) 3. merge lod tensor op (diff) 4. memroy optim (diff) test=develop * fix ci aboud PADDLE_ENFOCE fix merge lod infer op ut test=develop
-
由 liuwei1031 提交于
* fix compilation issue in windows vs2017, test=develop * fix gtest lib not found issue, test=develop
-
由 juncaipeng 提交于
remove the warning for reminding user to avoid using the OriginProgram method, test=develop (#19244) This log information may annoy users who don't need to care about it.
-
- 15 8月, 2019 1 次提交
-
-
由 Adam 提交于
test=develop
-
- 13 8月, 2019 1 次提交
-
-
由 joanna.wozna.intel 提交于
* Add requantize squash test=develop * Add more precise tests test=develop * REname and REfactor tester test=develop
-
- 12 8月, 2019 2 次提交
-
-
由 joanna.wozna.intel 提交于
test=develop
-
由 chengduo 提交于
test=develop
-
- 09 8月, 2019 1 次提交
-
-
由 chengduo 提交于
* Enhance fuse optimization op pass test=develop
-
- 06 8月, 2019 1 次提交
-
-
由 Zeng Jinle 提交于
-
- 02 8月, 2019 2 次提交
-
-
由 Zeng Jinle 提交于
* open gc by default, test=develop * fix test_train_recognize_digits and disable gc when ngraph is enabled, test=develop * fix conditional_block op eager deletion bug, test=develop * add some comments to reviewers, test=develop
-
由 石晓伟 提交于
* add fusion_seqpool_cvm_concat test=develop * simplify pass, test=develop * fix code style, test=develop
-
- 29 7月, 2019 1 次提交
-
-
由 Zeng Jinle 提交于
* remove legacy memory optimization codes, test=develop * follow huihuang's comments,test=develop * follow luotao's comments, test=develop
-
- 27 7月, 2019 1 次提交
-
-
由 chengduo 提交于
* open fuse optimization ops test=develop
-
- 26 7月, 2019 1 次提交
-
-
由 Zeng Jinle 提交于
* first version memory optimize pass, test=develop * remove move_tensor_sharing_pass, test=develop * refine code comments, add unittests, test=develop * turn off memory_optimize by default, test=develop * follow huihuang's comments, test=develop * follow chengduoZH's comments, test=develop * fix grammar error, add const qualifier, fix pass_test exception message, test=develop * follow chengduoZH's comments 2nd, test=develop
-
- 24 7月, 2019 1 次提交
-
-
由 Zhaolong Xing 提交于
* update paddle-trt for: 1. fix bug: when batch > 2, core in split plugin. 2. add leaky_relu trt5.0 support (yolov3 from 65ms to 42ms.) 3. add new attr to dropout. 4. shuffle channel, swish, relu6 support test=develop * 1. fix ci test=develop
-
- 23 7月, 2019 1 次提交
-
-
由 chengduo 提交于
* support sparse gradients test=develop
-
- 19 7月, 2019 1 次提交
-
-
由 Huihuang Zheng 提交于
Test PaddingRNN on V100 GPU device. Test configuration: large model, padding mode (which is the mode using recurrentOp), one GPU. GPU memory (MiB): 6414 (this PR) vs 6837 (without this PR) Speed (steps/s): 10.28 (this PR) vs 9.89 (without this PR)
-
- 11 7月, 2019 1 次提交
-
-
由 Zeng Jinle 提交于
* feature/buffer_shared_inplace, test=develop * refine code, test=develop * fix elementwise_add op cpu inplace and sum inplace bug, test=develop * add unittest and debug log, test=develop * fix parallel_executor scope bug, polish code, test=develop * fix sum op, activation op, single_in_place_inference bug, test=develop * remove kLocalExecScopeName, test=develop * fix unittest,test=develop * fix out_var first version bug, test=develop * follow comments,test=develop
-
- 08 7月, 2019 2 次提交
-
-
由 Zhaolong Xing 提交于
* Fix Mask rcnn predictor 1. refine memory optim algorithm to support the model with the block op. 2. output diff : modify the affine channel fuse 3. add condition_block_infer op add interface for setting trt calib table dir test=develop * add the missing files. test=develop
-
由 gongweibao 提交于
-
- 04 7月, 2019 1 次提交
-
-
由 chengduo 提交于
-
- 01 7月, 2019 1 次提交
-
-
由 Michał Gallus 提交于
* Int8: Fix Pooling output scale test=develop * Update scales quantization for certain operators These include: concat, transpose, pool and reshape. test=develop * Move concat minimum scale finding to quantizer test=develop
-