- 13 Feb, 2020 1 commit
-
-
Committed by Yiqun Liu
test=develop
-
- 07 Feb, 2020 1 commit
-
-
Committed by Yiqun Liu
* Add the first implementation of fusion_group op #19621 (#3)
* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop
* Call CUDA driver api to launch the kernel compiled by nvrtc. test=develop
* Disable for mac and windows. test=develop
* Refine the codes to support manually specified num_threads and workload_per_thread. test=develop
* Refine the CUDA kernel to support large dims. test=develop
* Add DeviceCodePool to manage all device codes.
* Add the first implementation of fusion_group op.
* Add unit-test for fusion_group op.
* Add the check of result.
* Add the check of nvrtc in unit-test. test=develop
* Add comment to explain the inputs, outputs and features of fusion_group op. test=develop
* Disable fusion_group op for mac and windows. test=develop
* Make the compiling of device code return status instead of hanging up. test=develop
* Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.
* Unify fusion_group_op's input and output names. test=develop
* Add the check of CUDA driver library in unittest. test=develop
* Enable generating code for a given subgraph. #21126 (#4)
* Enable generating code for a given subgraph.
* Support sorting the subgraph.
* Remove the rearrangement of expressions because we use the sorted subgraph directly.
* Enable generating code for a subgraph which is composed of grad ops.
* Use expression information to check the accuracy in unittest.
* Separate load and store from computation expressions. test=develop
* Improve the loading statements in generated codes. test=develop
* Remove unused arguments from formal list. test=develop
* Enable the detection of subgraph of grad ops.
* Generate code for detected subgraph in fusion_group_pass.
* Add an option in BuildStrategy to enable fusion_group_pass and add unittest. test=develop
* Fix a bug when checking whether the shapes of all inputs are the same.
* Add debug information.
* Move subgraph_detector from inference/analysis to the common framework/ir directory. (#5) test=develop
* Call subgraph_detector in fusion_group pass. test=develop
* Disable fusion_group when WITH_GPU is OFF. test=develop
* Refine all PADDLE_ENFORCE messages. test=develop
* Fix the case that some inputs are not defined in grad ops, and set op_role for fused op. test=develop
* Follow review comments. test=develop
-
- 10 Jan, 2020 1 commit
-
-
Committed by Zhen Wang
* add bn and relu fuse pass
* add op attr assert and dtype assert
* fix some inputs&&outputs bugs for the fused op and pattern.
* add the unittest for fuse_bn_act_pass. test=develop
* use normative enforce statements. test=develop
* add the cpu test. test=develop
* add the support of batch_size=1 for the bn with relu op. test=develop
* add the error type for paddle throws. test=develop
* add fused_batch_norm_act and fused_batch_norm_act_grad to op_has_unsed_vars_white_list. test=develop
-
- 12 Dec, 2019 1 commit
-
-
Committed by WangXi
-
- 28 Nov, 2019 1 commit
-
-
Committed by Zeng Jinle
* fix ref_cnt pass, test=develop
* add cpp unittests to reference_count_pass, test=develop
* follow comments, test=develop
-
- 25 Nov, 2019 1 commit
-
-
Committed by zhouwei25
-
- 30 Sep, 2019 1 commit
-
-
Committed by chengduo
test=develop
-
- 23 Sep, 2019 1 commit
-
-
Committed by chengduo
* Add RecordHistoryLocalExecScopes test=develop
-
- 04 Sep, 2019 1 commit
-
-
Committed by baojun
* enable ngraph through build_strategy test=develop
* add unittest test=develop
* put use_ngraph unconditional test=develop
* remove paddle_enforce test=develop
* remove paddle_enforce test=develop
* fix copyright test=develop
* limit for ngraph only test=develop
-
- 29 Jul, 2019 1 commit
-
-
Committed by Zeng Jinle
* remove legacy memory optimization codes, test=develop
* follow huihuang's comments, test=develop
* follow luotao's comments, test=develop
-
- 26 Jul, 2019 1 commit
-
-
Committed by Zeng Jinle
* first version memory optimize pass, test=develop
* remove move_tensor_sharing_pass, test=develop
* refine code comments, add unittests, test=develop
* turn off memory_optimize by default, test=develop
* follow huihuang's comments, test=develop
* follow chengduoZH's comments, test=develop
* fix grammar error, add const qualifier, fix pass_test exception message, test=develop
* follow chengduoZH's comments 2nd, test=develop
-
- 23 Jul, 2019 1 commit
-
-
Committed by chengduo
* support sparse gradients test=develop
-
- 11 Jul, 2019 1 commit
-
-
Committed by Zeng Jinle
* feature/buffer_shared_inplace, test=develop
* refine code, test=develop
* fix elementwise_add op cpu inplace and sum inplace bug, test=develop
* add unittest and debug log, test=develop
* fix parallel_executor scope bug, polish code, test=develop
* fix sum op, activation op, single_in_place_inference bug, test=develop
* remove kLocalExecScopeName, test=develop
* fix unittest, test=develop
* fix out_var first version bug, test=develop
* follow comments, test=develop
-
- 10 Jul, 2019 1 commit
-
-
Committed by Zeng Jinle
* clean code of dim and place, test=develop
* fix failed unittests, test=develop
-
- 06 Jun, 2019 1 commit
-
-
Committed by gongweibao
-
- 08 May, 2019 1 commit
-
-
Committed by chengduo
* move pass to ir
* polish code test=develop
* fix dependency test=develop
-
- 23 Apr, 2019 1 commit
-
-
Committed by chengduo
* Add fuse momentum ops
-
- 21 Apr, 2019 1 commit
-
-
Committed by Zeng Jinle
* speedup gc and inplace softmax_with_cross_entropy_grad test=develop
* refine models gpu mem Merge skip vars and warning messages of mem opt remove relu mem opt test=develop
* follow comments test=develop
-
- 18 Apr, 2019 1 commit
-
-
Committed by gongweibao
-
- 30 Mar, 2019 1 commit
-
-
Committed by gongweibao
* fix compiled test=develop
* follow comments test=develop
-
- 28 Mar, 2019 2 commits
-
-
Committed by chengduo
* fuse optimizer
-
Committed by gongweibao
-
- 27 Mar, 2019 2 commits
-
-
Committed by Qiao Longfei
-
Committed by Wu Yi
* test fix fetch bar place for ce
* fix ps mode dist train in develop test=develop
* fix style check test=develop
* update test=develop
-
- 22 Mar, 2019 1 commit
-
-
Committed by chengduo
* refine parallelExecutor test=develop
* Polish op_handle test=develop
* Remove unnecessary op_handle test=develop
* Fix Travis CI test=develop
* Fix fetch bug test=develop
* Remove WaitInputVarGenerated
* Fix OpHandleBase::Run test=develop
* debug test=develop
* use origin fetch_op_handle test=develop
* Revert op_handle_base.cc test=develop
* Polish code test=develop
* Fix OpHandleBase::Run test=develop
* code refine
* test CI and CE test=develop
* fix OpHandle::Run test=develop
* refine AllReduceOpHandle test=develop
* Polish code test=develop
-
- 20 Mar, 2019 1 commit
-
-
Committed by chengduo
* fuse all_reduce test=develop
* add fuse_parameter_groups_size test=develop
* Polish code test=develop
* Fix travis-ci test=develop
* Add SetGroupAccordingToLayers and SetGroupAccordingToGroupSize test=develop
* Add SetGroupAccordingToMemorySize test=develop
* fix multi_devices_graph test=develop
* reset params_grads test=develop
* Polish code test=develop
-
- 05 Mar, 2019 2 commits
-
-
Committed by sneaxiy
test=develop
-
Committed by Qiao Longfei
-
- 18 Feb, 2019 1 commit
-
-
Committed by dzhwinter
-
- 14 Feb, 2019 1 commit
-
-
Committed by dzhwinter
-
- 13 Feb, 2019 1 commit
-
-
Committed by dzhwinter
-
- 11 Feb, 2019 1 commit
-
-
Committed by dzhwinter
-
- 31 Jan, 2019 2 commits
- 27 Jan, 2019 1 commit
-
-
Committed by dzhwinter
-
- 21 Jan, 2019 2 commits
-
-
Committed by dzhwinter
-
Committed by Dun
* mem opt
* test=develop
* test=develop
* test=develop
* test=develop
* test=develop
* test=develop
* test=develop
* refine code test=develop
* refine code test=develop
* refine code test=develop
* refine code test=develop
* refine with cub test=develop
* fix mkldnn test && remove comments && test=develop
* polish code && test=develop
* add only_forward test && test=develop
-
- 17 Jan, 2019 1 commit
-
-
Committed by Qiao Longfei
-
- 10 Jan, 2019 1 commit
-
-
Committed by sneaxiy
test=develop
-
- 07 Jan, 2019 1 commit
-
-
Committed by minqiyang
test=develop
-