- 19 2月, 2020 1 次提交
-
-
由 wangchaochaohu 提交于
* fix the profile print error test=develop
-
- 18 2月, 2020 2 次提交
-
-
由 lidanqing 提交于
* improve the mul_mkldnn_op line coverage test=develop * remove fp32 mul mkldnn kernel test=develop * locally refactoring test=develop * change according to reviews test=develop
-
由 wangchaochaohu 提交于
* add python flag to control profile level test=develop
-
- 17 2月, 2020 5 次提交
-
-
由 123malin 提交于
-
由 Zhaolong Xing 提交于
* fix trt log test=develop * fix comments test=develop
-
由 Adam 提交于
-
由 Adam 提交于
-
由 Jiawei Wang 提交于
* Add TopK Op Grad CPU&GPU Kernel test=develop * Add TopK Op Grad, modify grad op maker test=develop * Add TopK Op Grad, modify grad op maker test=develop * Add TopK Op Grad, modify PADDLE_ENFORCE test=develop * Add TopK Op Grad, modify PADDLE_THROW test=develop * Add TopK Op Grad, modify unittest test=develop * fix ngraph top k op unittest test=develop
-
- 15 2月, 2020 2 次提交
-
-
由 Steffy-zxf 提交于
* update ops's unittest of elementwise_pow, elementwise_max, elementwise_min, scale and sqrt 1. update elementwise_pow, elementwise_max and scale's unitests with input data type (float32 -> float64) 2. fix bug that the elementwise_pow doesn't meet threshold requirements with tackling float64 data 3. remove sqrt from op_accuracy_white_list.py 4. update the unittests of elementwise_pow, elementwise_max and elementwise_min ops that their input data shape over 100 5. test=develop * modify the writing style according suggestions test=develop
-
由 flame 提交于
-
- 14 2月, 2020 4 次提交
-
-
由 Chen Weihang 提交于
-
由 Wilber 提交于
当一个模型中有多个fc_lstm子图的时候,且其中fc共用了同一个persistable的bias,此时不应该将bias节点删除,只将非persistable的节点去除即可。
-
由 Chen Weihang 提交于
* reproduce match error, test=develop, test=document_fix * fix mismatch error, test=develop, test=document_fix
-
由 flame 提交于
* support golang inference
-
- 13 2月, 2020 3 次提交
-
-
由 Zhaolong Xing 提交于
* 1. optim multihead matmul: fuse three fc to multihtead matmul test=develop * fix conflict test=develop * fix comments test=develop
-
由 Yiqun Liu 提交于
test=develop
-
由 Zeng Jinle 提交于
-
- 12 2月, 2020 4 次提交
-
-
由 Guo Sheng 提交于
* Add support for dynamic_decode(while) training. test=develop * Fix assign_op and tensor_array_read_write_op after solving conflict. test=develop * Fix test_rnn_decode_api.py. test=develop * Refine docs for apis in rnn.py. test=develop * Adjust outputs of dynamic_decode. test=develop * Remove the force_cpu update in assign_op. test=develop * Remove the force_cpu update in assign_op. test=develop * Make RNNCell.get_initial_states support batch_dim_idx argument. test=develop * Rename _create_array_outof_while as _create_array_out_of_while in rnn.py. test=develop
-
由 tangwei12 提交于
* add thread barrier for the compiled program
-
由 Wojciech Uss 提交于
* a test for Ernie QAT INT8 accuracy check test=develop * Remove NLP comparison test to split PRs test=develop * Fix typo and tabs, delete commented lines test=develop * re-combine the 2 PRs, test=develop Co-authored-by: NMichał Gallus <sand3r@interia.eu> Co-authored-by: Nbingyanghuang <33643817+bingyanghuang@users.noreply.github.com>
-
由 Double_V 提交于
* support slice double grad, test=develop * merge two doublegradopmaker to one doublegradopmaker,test=develop * change the shape of slice_OP's unittest, test=develop
-
- 11 2月, 2020 6 次提交
-
-
由 hutuxian 提交于
Refine PaddleBox Framework, Main functions: * Add MetricMsg util class, which can calculate metrics like AUC, bucket_error, COPC. * Replace FeedPass with new interface: BeginFeedPass & EndFeedPass * Refactor Pull/Push Sparse Function in box_wrapper. * Use CUDA Kernel to copy keys and copy feasign between tensor and boxps struct. * Cache copied keys in pull sparse in order to reuse it in push period.
-
由 huzhiqiang 提交于
-
由 yaoxuefeng 提交于
* update * update test=develop * update compile set test=develop * update compile set test=develop * update test=develop * update test=develop * update test=develop * update compile setting test=develop * update compile setting test=develop * update run demo test=develop * update test=develop * update test=develop * fix test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update format test=develop * update format test=develop * update style test=develop * update style test=develop * change style test=develop * change style test=develop * change style test=develop * add dataset unittest test=develop * update test=develop * update for record test=develop * udpate style for record test=develop * update for record test=develop * update for record test=develop * update for record test=develop * fix format test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop
-
由 zhaoyuchen2018 提交于
* Refine code, fix select tile error,test=develop * Refine element type and some comments, test=develop * Refine comments and gpu utils, test=develop * Remove some useless condition * Refine floor and ceil, test=develop * refine for loop. test=develop Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
-
由 Wilber 提交于
支持不依赖nccl进行编译。[1/2] 多卡下,如果没有打开WITH_NCCL开关编译,多卡不能通信,则只能选择一张卡使用。 Co-authored-by: N石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
-
由 guofei 提交于
This PR makes assign op support LoDTensorArray and enable the loop_vars in while_loop to support tuple or list.
-
- 10 2月, 2020 4 次提交
-
-
由 Zhaolong Xing 提交于
[Refine Paddle-TRT INT8]: Support PaddleSlim's Resnet50, Mobilenetv1, Yolov3 models for Inference. (#22483) * add int8 op teller for trt. * refine trt int8 * add int8 op teller for trt. test=develop
-
由 Wilber 提交于
Compile without nccl deps. [1/2] Co-authored-by: N石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
-
由 Yiqun Liu 提交于
test=develop
-
由 Wilber 提交于
-
- 07 2月, 2020 5 次提交
-
-
由 Zhong Hui 提交于
Fix the integer overflow problem in the op of sequence2batch, change the int32_t to size_t, In the /paddle/fluid/operators/math/sequence2batch.h#L122.
-
由 cc 提交于
* support weight quantization in post_training_quanzitaion, test=develop * add test for weight quantization, test=develop
-
由 Yiqun Liu 提交于
* Add the first implememtation of fusion_group op #19621 (#3) * Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop * Call CUDA driver api to launch the kernel compiled by nvrtc. test=develop * Disable for mac and windows. test=develop * Refine the codes to support manually specified num_threads and workload_per_thread. test=develop * Refine the CUDA kernel to support large dims. test=develop * Add DeviceCodePool to manage all device codes. * Add the first implementation fusion_group op. * Add unit-test for fusion_group op. * Add the check of result. * Add the check of nvrtc in unit-test. test=develop * Add comment to explain the inputs, outputs and features of fusion_group op. test=develop * Disable fusion_group op for mac and windows. test=develop * Make the compiling of device code return status instead of hanging up. test=develop * Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API. * Unify fusion_group_op's input and output names. test=develop * Add the check of CUDA driver library in unittest. test=develop * Enable generating code for a given subgraph. #21126 (#4) * Enable generating code for a given subgraph. * Support sorting the subgraph. * Remove the rearange of expressions because we use the sorted subgraph directly. * Enable generating code for a subgraph which is composed of grad ops. * Use expression information to check the accuracy in unittest. * Separate load and store from computation expressions. test=develop * Improve the loading statements in generated codes. test=develop * Remove unused arguments from formal list. test=develop * Enable the detection of subgraph of grad ops. * Generate code for detected subgraph in fusion_group_pass. * Add an option in BuildStrategy to enable fusion_group_pass and add unittest. test=develop * Fix a bug when checking whether the shape of all inputs are the same. * Add debug information. * Remove subgraph_detector from inference/analysis to the common framework/ir directory. (#5) test=develop * Call subgraph_detector in fusion_group pass. test=develop * Disable fusion_group when WITH_GPU is OFF. test=develop * Refine all PADDLE_ENFORCE message. test=develop * Fix the case that some inputs are not defined in grad ops, and set op_role for fused op. test=develop * Follow review comments. test=develop
-
由 Tao Luo 提交于
test=develop
-
由 LielinJiang 提交于
* optimize interpolate op, test=develop
-
- 06 2月, 2020 4 次提交
-
-
由 wangchaochaohu 提交于
-
由 Yiqun Liu 提交于
Correct the use of DeviceContext in unittest sequence_pooling_test and sequence_padding_test (#22456) * Add log in memory::Copy for debug purpose. * Change to use context in DeviceContextPool directly in sequence_pooling_test, instead to new one. * Change to use context in DeviceContextPool directly in sequence_padding_test, instead to new one. test=develop * Change the type of second_dim from size_t to int64_t. test=develop
-
由 joanna.wozna.intel 提交于
* Add dequant scale squash test=develop * Correct dequant-scale squash test test=develop
-
由 mapingshuo 提交于
* update readme * test=develop
-