- 20 3月, 2020 3 次提交
-
-
由 Zeng Jinle 提交于
* sequential reader stage 1, test=develop * fix ut, test=develop * fix iterable=False reset bug, add some logs and polish code, test=develop * inference feed partial data, test=develop * Turn on keep_order=True for test, test=develop * enhance ut to test more cases, test=develop * test commit for reverting * Revert "test commit for reverting", test=develop This reverts commit 80aef42e. * add ut of merged and unmerged results, test=develop * add more uts for coverages and add en doc of api, test=develop * follow comments, test=develop * change note style, test=develop
-
由 Wilber 提交于
update embedding_eltwise_layernorm fuse pass and fused kernel, to support multi input
-
由 Yiqun Liu 提交于
-
- 19 3月, 2020 1 次提交
-
-
由 Sylwester Fraczek 提交于
-
- 13 3月, 2020 1 次提交
-
-
由 wangchaochaohu 提交于
* add fusion group test for backward and refine code
-
- 12 3月, 2020 1 次提交
-
-
由 wangchaochaohu 提交于
* add support for expression type convert and add cast Op support in fusion group
-
- 11 3月, 2020 2 次提交
-
-
由 Wilber 提交于
* add skip_layernorm pass. test=develop
-
由 Zhaolong Xing 提交于
* 1. add embedding eltwise layernorm fuse 2. add embedding eltwise layernorm op 3. refine inplace_add_relu 4. refine fc_eltwise_layernorm test=develop * 1. refine fc test=develop * fix comments test=develop * fix comments test=develop
-
- 09 3月, 2020 1 次提交
-
-
由 liu zhengxi 提交于
* fix fc padding during fusion, test=develop * fix optim model inference after SaveOptimModel, test=develop
-
- 01 3月, 2020 1 次提交
-
-
由 wangchaochaohu 提交于
* Add the codegen and auto fusion for sum Op in fusion group
-
- 28 2月, 2020 1 次提交
-
-
由 tianshuo78520a 提交于
-
- 24 2月, 2020 1 次提交
-
-
由 GaoWei8 提交于
* Add an interface of disabling FC padding * fix bert regression * polish fc padding interface * recover pass function * fix argument error * fix mkldnn error
-
- 23 2月, 2020 1 次提交
-
-
由 tianshuo78520a 提交于
-
- 21 2月, 2020 1 次提交
-
-
由 Yiqun Liu 提交于
-
- 14 2月, 2020 1 次提交
-
-
由 Wilber 提交于
当一个模型中有多个fc_lstm子图的时候,且其中fc共用了同一个persistable的bias,此时不应该将bias节点删除,只将非persistable的节点去除即可。
-
- 13 2月, 2020 1 次提交
-
-
由 Zhaolong Xing 提交于
* 1. optim multihead matmul: fuse three fc to multihtead matmul test=develop * fix conflict test=develop * fix comments test=develop
-
- 07 2月, 2020 1 次提交
-
-
由 Yiqun Liu 提交于
* Add the first implememtation of fusion_group op #19621 (#3) * Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop * Call CUDA driver api to launch the kernel compiled by nvrtc. test=develop * Disable for mac and windows. test=develop * Refine the codes to support manually specified num_threads and workload_per_thread. test=develop * Refine the CUDA kernel to support large dims. test=develop * Add DeviceCodePool to manage all device codes. * Add the first implementation fusion_group op. * Add unit-test for fusion_group op. * Add the check of result. * Add the check of nvrtc in unit-test. test=develop * Add comment to explain the inputs, outputs and features of fusion_group op. test=develop * Disable fusion_group op for mac and windows. test=develop * Make the compiling of device code return status instead of hanging up. test=develop * Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API. * Unify fusion_group_op's input and output names. test=develop * Add the check of CUDA driver library in unittest. test=develop * Enable generating code for a given subgraph. #21126 (#4) * Enable generating code for a given subgraph. * Support sorting the subgraph. * Remove the rearange of expressions because we use the sorted subgraph directly. * Enable generating code for a subgraph which is composed of grad ops. * Use expression information to check the accuracy in unittest. * Separate load and store from computation expressions. test=develop * Improve the loading statements in generated codes. test=develop * Remove unused arguments from formal list. test=develop * Enable the detection of subgraph of grad ops. * Generate code for detected subgraph in fusion_group_pass. * Add an option in BuildStrategy to enable fusion_group_pass and add unittest. test=develop * Fix a bug when checking whether the shape of all inputs are the same. * Add debug information. * Remove subgraph_detector from inference/analysis to the common framework/ir directory. (#5) test=develop * Call subgraph_detector in fusion_group pass. test=develop * Disable fusion_group when WITH_GPU is OFF. test=develop * Refine all PADDLE_ENFORCE message. test=develop * Fix the case that some inputs are not defined in grad ops, and set op_role for fused op. test=develop * Follow review comments. test=develop
-
- 06 2月, 2020 1 次提交
-
-
由 joanna.wozna.intel 提交于
* Add dequant scale squash test=develop * Correct dequant-scale squash test test=develop
-
- 05 2月, 2020 1 次提交
-
-
由 Wilber 提交于
cmake选项中添加了WITH_NCCL,显示指定是否编译NCCL的部分代码,WITH_NCCL默认打开,但如果WITH_GPU为OFF,则关闭WITH_NCCL 添加了PADDLE_WITH_NCCL定义 单机单卡能够关闭NCCL编译,多卡的话需要默认打开NCCL,如果关闭NCCL,则只能使用单卡 Co-authored-by: N石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
-
- 04 2月, 2020 1 次提交
-
-
由 石晓伟 提交于
-
- 31 1月, 2020 1 次提交
-
-
由 Michał Gallus 提交于
* Enable quantize to reorder to nchw as well * Correct FC MKL-DNN input dim requirements to accept 3D * Improve DNNL FC format, error and 3D input handling test=develop * Improve error checking in FC test=develop * Improve PADDLE_ENFORCE messages in fc-related files * Remove data layout attribute from obligatory pass args test=develop * Fix message in fc_mkldnn_pass to be logically correct test=develop
-
- 25 1月, 2020 1 次提交
-
-
由 joanna.wozna.intel 提交于
-
- 17 1月, 2020 1 次提交
-
-
由 Yiqun Liu 提交于
* Implement a common python unittest to test the ir passes. test=develop * Save the results in np.array and support to startup on CPU. test=develop * Fix the unittest. test=develop * Add check_program to check whether the optimized program is different from the origin one. test=develop * Remove the inferface all_ops. test=develop * Add exception test in pass_test. test=develop
-
- 16 1月, 2020 1 次提交
-
-
由 lidanqing 提交于
-
- 15 1月, 2020 1 次提交
-
-
由 Zhen Wang 提交于
-
- 14 1月, 2020 1 次提交
-
-
由 Wojciech Uss 提交于
-
- 10 1月, 2020 1 次提交
-
-
由 Zhen Wang 提交于
* add bn and relu fuse pass * add op attr assert and dtype assert * fix some inputs&&outputs bugs for the fused op and pattern. * add the unittest for fuse_bn_act_pass. test=develop * use normative enforce statements. test=develop * add the cpu test. test=develop * add the support of batch_size=1 for the bn with relu op. test=develop * add the error type for paddle throws. test=develop * add fused_batch_norm_act and fused_batch_norm_act_grad to op_has_unsed_vars_white_list. test=develop
-
- 09 1月, 2020 2 次提交
-
-
由 joanna.wozna.intel 提交于
-
由 Yiqun Liu 提交于
* Polish the PADDLE_ENFORCE in fusion_group pass related codes. test=develop * Correct the unittest because of the change relu_grad's formula. test=develop
-
- 07 1月, 2020 2 次提交
-
-
由 liu zhengxi 提交于
-
由 Yiqun Liu 提交于
test=develop
-
- 03 1月, 2020 2 次提交
-
-
由 Yiqun Liu 提交于
* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop * Call CUDA driver api to launch the kernel compiled by nvrtc. test=develop * Disable for mac and windows. test=develop * Refine the codes to support manually specified num_threads and workload_per_thread. test=develop * Refine the CUDA kernel to support large dims. test=develop * Add DeviceCodePool to manage all device codes. * Add the first implementation fusion_group op. * Add unit-test for fusion_group op. * Add the check of result. * Add the check of nvrtc in unit-test. test=develop * Add comment to explain the inputs, outputs and features of fusion_group op. test=develop * Disable fusion_group op for mac and windows. test=develop * Make the compiling of device code return status instead of hanging up. test=develop * Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API. * Unify fusion_group_op's input and output names. test=develop * Add the check of CUDA driver library in unittest. test=develop * Refine the calling of PADDLE_ENFORCE. test=develop
-
由 Michał Gallus 提交于
-
- 29 12月, 2019 1 次提交
-
-
由 liu zhengxi 提交于
* fix seqconv_eltadd_relu pass during multi-threads predictor, test=develop * fix attention_lstm_fuse_pass during multi-threads inference, test=develop * fix embedding_fc_lstm_fuse_pass during multi-threads inference, test=develop * fix fc_lstm_fuse_pass during multi-threads inference, test=develop * fix seq_concat_fc_fuse_pass during multi-threads inference, test=develop
-
- 27 12月, 2019 1 次提交
-
-
由 石晓伟 提交于
* fix multi-thread error of fc_gru_fuse_pass.cc, test=develop * export FLAGS and GLOG symbols, test=develop
-
- 25 12月, 2019 1 次提交
-
-
由 Pei Yang 提交于
-
- 24 12月, 2019 1 次提交
-
-
由 Aurelius84 提交于
* optimize adam speed by removing _finish_update test=develop * fix SparseAdamFunctor param list test=develop * Remove scale_op in expect_list of adam_op test=develop * fix test optimizer loss assert error test=develop * fix test optimizer loss assert error test=develop * modify PADDLE_ENFORCE usage test=develop * fix op_type in lamb_op.cc test=develop * fix errors ostream format bug test=develop * add betaPowOut in ngraph op test=develop * fix ngraph::op api for gcc8 test=develop * clean code test=develop * modify struct into class test=develop * remove code of beta1Tensor in lamb_op test=develop
-
- 16 12月, 2019 1 次提交
-
-
由 lidanqing 提交于
* fc-dequantize squash test=develop * change according to reviews test=develop * change PADDLE_ENFORCE test=develop * add second test when fc-dequant do not fuse test=develop * change all related PADDLE_ENFORCE test=develop
-
- 12 12月, 2019 1 次提交
-
-
由 joanna.wozna.intel 提交于
* Add reshape int8 op test=develop * Change test to CPUPlace test=develop * Correct tests test=develop
-
- 02 12月, 2019 1 次提交
-
-
由 Tao Luo 提交于
* fix -Wno-error=sign-compare warning in gcc8 test=develop * fix warning in distributed codes test=develop
-