- 02 Sep, 2020 1 commit
Committed by wanghuancoder
* optimized transformation from tensor to numpy, test=develop
* Modify fetch op handle, from memcpy Sync to memcpy Async, test=develop
* modify CUDAPinnedPlace to CPUPlace, test=develop
* modify CPUPlace to CUDAPinnedPlace, and set default inplace to false, test=develop
* revert fetch_op_handle, add fetch_async_op_handle, test=develop
* revert fetch_op_handle, add fetch_async_op_handle, test=develop
* fix error msg report, test=develop
* fix bug in cpuplace, test=develop
* fix bug in unmerge and tensorarray mode, test=develop
* fix bug, double copy gpu memory, test=develop
* fix chenweihang's review advice, test=develop
-
- 07 Jul, 2020 1 commit
Committed by hong
* catch bad alloc exception; test=develop
* add unittest; test=develop
* move bad alloc catch to the first place; test=develop
* polish error message; test=develop
* polish error message; test=develop
* add mutex header; test=develop
-
- 14 Apr, 2020 1 commit
Committed by Zeng Jinle
* correct reader device index, test=develop
* fix async executor scope var initialization, test=develop
-
- 09 Apr, 2020 1 commit
Committed by mozga-intel
* Remove the NGraph engine from PDPD repository
  1. Each operator was removed from the operator's directory
  2. Each test was removed from the unittest directory
  3. The parallel executor support was removed from the PDPD
  4. The CMake file was removed from the PDPD
  5. The NG flags were removed from the repository

  test=develop
* Remove ngraph from:
  1. Cmake file
  2. Python file

  test=develop
-
- 01 Apr, 2020 1 commit
Committed by Zeng Jinle
-
- 20 Mar, 2020 1 commit
Committed by Zeng Jinle
* sequential reader stage 1, test=develop
* fix ut, test=develop
* fix iterable=False reset bug, add some logs and polish code, test=develop
* inference feed partial data, test=develop
* Turn on keep_order=True for test, test=develop
* enhance ut to test more cases, test=develop
* test commit for reverting
* Revert "test commit for reverting", test=develop This reverts commit 80aef42e.
* add ut of merged and unmerged results, test=develop
* add more uts for coverages and add en doc of api, test=develop
* follow comments, test=develop
* change note style, test=develop
-
- 13 Feb, 2020 1 commit
Committed by Yiqun Liu
test=develop
-
- 07 Feb, 2020 1 commit
Committed by Yiqun Liu
* Add the first implementation of fusion_group op #19621 (#3)
* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop
* Call CUDA driver api to launch the kernel compiled by nvrtc. test=develop
* Disable for mac and windows. test=develop
* Refine the codes to support manually specified num_threads and workload_per_thread. test=develop
* Refine the CUDA kernel to support large dims. test=develop
* Add DeviceCodePool to manage all device codes.
* Add the first implementation fusion_group op.
* Add unit-test for fusion_group op.
* Add the check of result.
* Add the check of nvrtc in unit-test. test=develop
* Add comment to explain the inputs, outputs and features of fusion_group op. test=develop
* Disable fusion_group op for mac and windows. test=develop
* Make the compiling of device code return status instead of hanging up. test=develop
* Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.
* Unify fusion_group_op's input and output names. test=develop
* Add the check of CUDA driver library in unittest. test=develop
* Enable generating code for a given subgraph. #21126 (#4)
* Enable generating code for a given subgraph.
* Support sorting the subgraph.
* Remove the rearrangement of expressions because we use the sorted subgraph directly.
* Enable generating code for a subgraph which is composed of grad ops.
* Use expression information to check the accuracy in unittest.
* Separate load and store from computation expressions. test=develop
* Improve the loading statements in generated codes. test=develop
* Remove unused arguments from formal list. test=develop
* Enable the detection of subgraph of grad ops.
* Generate code for detected subgraph in fusion_group_pass.
* Add an option in BuildStrategy to enable fusion_group_pass and add unittest. test=develop
* Fix a bug when checking whether the shapes of all inputs are the same.
* Add debug information.
* Move subgraph_detector from inference/analysis to the common framework/ir directory. (#5) test=develop
* Call subgraph_detector in fusion_group pass. test=develop
* Disable fusion_group when WITH_GPU is OFF. test=develop
* Refine all PADDLE_ENFORCE message. test=develop
* Fix the case that some inputs are not defined in grad ops, and set op_role for fused op. test=develop
* Follow review comments. test=develop
-
- 10 Jan, 2020 1 commit
Committed by Zhen Wang
* add bn and relu fuse pass
* add op attr assert and dtype assert
* fix some inputs&&outputs bugs for the fused op and pattern.
* add the unittest for fuse_bn_act_pass. test=develop
* use normative enforce statements. test=develop
* add the cpu test. test=develop
* add the support of batch_size=1 for the bn with relu op. test=develop
* add the error type for paddle throws. test=develop
* add fused_batch_norm_act and fused_batch_norm_act_grad to op_has_unsed_vars_white_list. test=develop
-
- 12 Dec, 2019 1 commit
Committed by WangXi
-
- 28 Nov, 2019 1 commit
Committed by Zeng Jinle
* fix ref_cnt pass, test=develop
* add cpp unittests to reference_count_pass, test=develop
* follow comments, test=develop
-
- 25 Nov, 2019 1 commit
Committed by zhouwei25
-
- 30 Sep, 2019 1 commit
Committed by chengduo
test=develop
-
- 23 Sep, 2019 1 commit
Committed by chengduo
* Add RecordHistoryLocalExecScopes test=develop
-
- 04 Sep, 2019 1 commit
Committed by baojun
* enable ngraph through build_strategy test=develop
* add unittest test=develop
* put use_ngraph unconditional test=develop
* remove paddle_enforce test=develop
* remove paddle_enforce test=develop
* fix copyright test=develop
* limit for ngraph only test=develop
-
- 29 Jul, 2019 1 commit
Committed by Zeng Jinle
* remove legacy memory optimization codes, test=develop
* follow huihuang's comments, test=develop
* follow luotao's comments, test=develop
-
- 26 Jul, 2019 1 commit
Committed by Zeng Jinle
* first version memory optimize pass, test=develop
* remove move_tensor_sharing_pass, test=develop
* refine code comments, add unittests, test=develop
* turn off memory_optimize by default, test=develop
* follow huihuang's comments, test=develop
* follow chengduoZH's comments, test=develop
* fix grammar error, add const qualifier, fix pass_test exception message, test=develop
* follow chengduoZH's comments 2nd, test=develop
-
- 23 Jul, 2019 1 commit
Committed by chengduo
* support sparse gradients test=develop
-
- 11 Jul, 2019 1 commit
Committed by Zeng Jinle
* feature/buffer_shared_inplace, test=develop
* refine code, test=develop
* fix elementwise_add op cpu inplace and sum inplace bug, test=develop
* add unittest and debug log, test=develop
* fix parallel_executor scope bug, polish code, test=develop
* fix sum op, activation op, single_in_place_inference bug, test=develop
* remove kLocalExecScopeName, test=develop
* fix unittest, test=develop
* fix out_var first version bug, test=develop
* follow comments, test=develop
-
- 10 Jul, 2019 1 commit
Committed by Zeng Jinle
* clean code of dim and place, test=develop
* fix failed unittests, test=develop
-
- 06 Jun, 2019 1 commit
Committed by gongweibao
-
- 08 May, 2019 1 commit
Committed by chengduo
* move pass to ir
* polish code test=develop
* fix dependency test=develop
-
- 23 Apr, 2019 1 commit
Committed by chengduo
* Add fuse momentum ops
-
- 21 Apr, 2019 1 commit
Committed by Zeng Jinle
* speedup gc and inplace softmax_with_cross_entropy_grad test=develop
* refine models gpu mem Merge skip vars and warning messages of mem opt remove relu mem opt test=develop
* follow comments test=develop
-
- 18 Apr, 2019 1 commit
Committed by gongweibao
-
- 30 Mar, 2019 1 commit
Committed by gongweibao
* fix compiled test=develop
* follow comments test=develop
-
- 28 Mar, 2019 2 commits
Committed by chengduo
* fuse optimizer
-
Committed by gongweibao
-
- 27 Mar, 2019 2 commits
Committed by Qiao Longfei
-
Committed by Wu Yi
* test fix fetch bar place for ce
* fix ps mode dist train in develop test=develop
* fix style check test=develop
* update test=develop
-
- 22 Mar, 2019 1 commit
Committed by chengduo
* refine parallelExecutor test=develop
* Polish op_handle test=develop
* Remove unnecessary op_handle test=develop
* Fix Travis CI test=develop
* Fix fetch bug test=develop
* Remove WaitInputVarGenerated
* Fix OpHandleBase::Run test=develop
* debug test=develop
* use origin fetch_op_handle test=develop
* Revert op_handle_base.cc test=develop
* Polish code test=develop
* Fix OpHandleBase::Run test=develop
* code refine
* test CI and CE test=develop
* fix OpHandle::Run test=develop
* refine AllReduceOpHandle test=develop
* Polish code test=develop
-
- 20 Mar, 2019 1 commit
Committed by chengduo
* fuse all_reduce test=develop
* add fuse_parameter_groups_size test=develop
* Polish code test=develop
* Fix travis-ci test=develop
* Add SetGroupAccordingToLayers and SetGroupAccordingToGroupSize test=develop
* Add SetGroupAccordingToMemorySize test=develop
* fix multi_devices_graph test=develop
* reset params_grads test=develop
* Polish code test=develop
-
- 05 Mar, 2019 2 commits
Committed by sneaxiy
test=develop
-
Committed by Qiao Longfei
-
- 18 Feb, 2019 1 commit
Committed by dzhwinter
-
- 14 Feb, 2019 1 commit
Committed by dzhwinter
-
- 13 Feb, 2019 1 commit
Committed by dzhwinter
-
- 11 Feb, 2019 1 commit
Committed by dzhwinter
-
- 31 Jan, 2019 2 commits