- 13 2月, 2020 6 次提交
-
-
由 Zhaolong Xing 提交于
* 1. optim multihead matmul: fuse three fc to multihtead matmul test=develop * fix conflict test=develop * fix comments test=develop
-
由 Huihuang Zheng 提交于
As the title
-
由 joanna.wozna.intel 提交于
test=develop
-
由 Yiqun Liu 提交于
test=develop
-
由 石晓伟 提交于
-
由 Zeng Jinle 提交于
-
- 12 2月, 2020 5 次提交
-
-
由 Guo Sheng 提交于
* Add support for dynamic_decode(while) training. test=develop * Fix assign_op and tensor_array_read_write_op after solving conflict. test=develop * Fix test_rnn_decode_api.py. test=develop * Refine docs for apis in rnn.py. test=develop * Adjust outputs of dynamic_decode. test=develop * Remove the force_cpu update in assign_op. test=develop * Remove the force_cpu update in assign_op. test=develop * Make RNNCell.get_initial_states support batch_dim_idx argument. test=develop * Rename _create_array_outof_while as _create_array_out_of_while in rnn.py. test=develop
-
由 tangwei12 提交于
* add thread barrier for the compiled program
-
由 Wojciech Uss 提交于
* a test for Ernie QAT INT8 accuracy check test=develop * Remove NLP comparison test to split PRs test=develop * Fix typo and tabs, delete commented lines test=develop * re-combine the 2 PRs, test=develop Co-authored-by: NMichał Gallus <sand3r@interia.eu> Co-authored-by: Nbingyanghuang <33643817+bingyanghuang@users.noreply.github.com>
-
由 Pei Yang 提交于
-
由 Double_V 提交于
* support slice double grad, test=develop * merge two doublegradopmaker to one doublegradopmaker,test=develop * change the shape of slice_OP's unittest, test=develop
-
- 11 2月, 2020 8 次提交
-
-
由 hutuxian 提交于
Refine PaddleBox Framework, Main functions: * Add MetricMsg util class, which can calculate metrics like AUC, bucket_error, COPC. * Replace FeedPass with new interface: BeginFeedPass & EndFeedPass * Refactor Pull/Push Sparse Function in box_wrapper. * Use CUDA Kernel to copy keys and copy feasign between tensor and boxps struct. * Cache copied keys in pull sparse in order to reuse it in push period.
-
由 huzhiqiang 提交于
-
由 littletomatodonkey 提交于
-
由 WangXi 提交于
-
由 yaoxuefeng 提交于
* update * update test=develop * update compile set test=develop * update compile set test=develop * update test=develop * update test=develop * update test=develop * update compile setting test=develop * update compile setting test=develop * update run demo test=develop * update test=develop * update test=develop * fix test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update format test=develop * update format test=develop * update style test=develop * update style test=develop * change style test=develop * change style test=develop * change style test=develop * add dataset unittest test=develop * update test=develop * update for record test=develop * udpate style for record test=develop * update for record test=develop * update for record test=develop * update for record test=develop * fix format test=develop * update test=develop * update test=develop * update test=develop * update test=develop * update test=develop
-
由 zhaoyuchen2018 提交于
* Refine code, fix select tile error,test=develop * Refine element type and some comments, test=develop * Refine comments and gpu utils, test=develop * Remove some useless condition * Refine floor and ceil, test=develop * refine for loop. test=develop Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
-
由 Wilber 提交于
支持不依赖nccl进行编译。[1/2] 多卡下,如果没有打开WITH_NCCL开关编译,多卡不能通信,则只能选择一张卡使用。 Co-authored-by: N石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
-
由 guofei 提交于
This PR makes assign op support LoDTensorArray and enable the loop_vars in while_loop to support tuple or list.
-
- 10 2月, 2020 9 次提交
-
-
由 Zhaolong Xing 提交于
[Refine Paddle-TRT INT8]: Support PaddleSlim's Resnet50, Mobilenetv1, Yolov3 models for Inference. (#22483) * add int8 op teller for trt. * refine trt int8 * add int8 op teller for trt. test=develop
-
由 liu zhengxi 提交于
* add InterencePassTest for testing precision of inference passes, test=develop
-
由 zhongpu 提交于
add cp27-cp27m-gcc82 and cp27-cp27mu-gcc82 branch to support gcc8.2 compile for paddle, test=develop (#22504)
-
由 Guo Sheng 提交于
-
由 cc 提交于
* post_training_quantization support set bits, test=develop * up, test=develop
-
由 Wilber 提交于
Compile without nccl deps. [1/2] Co-authored-by: N石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
-
由 Yiqun Liu 提交于
test=develop
-
由 Wilber 提交于
-
由 Huihuang Zheng 提交于
This PR provides very basic and simple framework for transforming Dygraph to Static Graph. API names, final outputs are not determined yet. Feel free to modify or add class/function/type when you think the framework is not extendable for you.
-
- 07 2月, 2020 6 次提交
-
-
由 Zhong Hui 提交于
Fix the integer overflow problem in the op of sequence2batch, change the int32_t to size_t, In the /paddle/fluid/operators/math/sequence2batch.h#L122.
-
由 cc 提交于
* support weight quantization in post_training_quanzitaion, test=develop * add test for weight quantization, test=develop
-
由 Yiqun Liu 提交于
* Add the first implememtation of fusion_group op #19621 (#3) * Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop * Call CUDA driver api to launch the kernel compiled by nvrtc. test=develop * Disable for mac and windows. test=develop * Refine the codes to support manually specified num_threads and workload_per_thread. test=develop * Refine the CUDA kernel to support large dims. test=develop * Add DeviceCodePool to manage all device codes. * Add the first implementation fusion_group op. * Add unit-test for fusion_group op. * Add the check of result. * Add the check of nvrtc in unit-test. test=develop * Add comment to explain the inputs, outputs and features of fusion_group op. test=develop * Disable fusion_group op for mac and windows. test=develop * Make the compiling of device code return status instead of hanging up. test=develop * Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API. * Unify fusion_group_op's input and output names. test=develop * Add the check of CUDA driver library in unittest. test=develop * Enable generating code for a given subgraph. #21126 (#4) * Enable generating code for a given subgraph. * Support sorting the subgraph. * Remove the rearange of expressions because we use the sorted subgraph directly. * Enable generating code for a subgraph which is composed of grad ops. * Use expression information to check the accuracy in unittest. * Separate load and store from computation expressions. test=develop * Improve the loading statements in generated codes. test=develop * Remove unused arguments from formal list. test=develop * Enable the detection of subgraph of grad ops. * Generate code for detected subgraph in fusion_group_pass. * Add an option in BuildStrategy to enable fusion_group_pass and add unittest. test=develop * Fix a bug when checking whether the shape of all inputs are the same. * Add debug information. * Remove subgraph_detector from inference/analysis to the common framework/ir directory. (#5) test=develop * Call subgraph_detector in fusion_group pass. test=develop * Disable fusion_group when WITH_GPU is OFF. test=develop * Refine all PADDLE_ENFORCE message. test=develop * Fix the case that some inputs are not defined in grad ops, and set op_role for fused op. test=develop * Follow review comments. test=develop
-
由 Aurelius84 提交于
* polish backward api doc test=develop, test=document_preview, test=document_fix * polish backward api doc test=develop, test=document_preview, test=document_fix * no_grad supports set of Variable test=develop, test=document_preview * polish sample code of append_backward test=develop, test=document_preview * modify assert into Raise TypeError test=develop,test=document_preview * fix unittest failed test=develop * rm useless file test=develop * polish en doc test=develop * polish code of no_grad_set test=develop * polish code of no_grad_set test=develop
-
由 Tao Luo 提交于
test=develop
-
由 LielinJiang 提交于
* optimize interpolate op, test=develop
-
- 06 2月, 2020 6 次提交
-
-
由 wangchaochaohu 提交于
-
由 Yiqun Liu 提交于
Correct the use of DeviceContext in unittest sequence_pooling_test and sequence_padding_test (#22456) * Add log in memory::Copy for debug purpose. * Change to use context in DeviceContextPool directly in sequence_pooling_test, instead to new one. * Change to use context in DeviceContextPool directly in sequence_padding_test, instead to new one. test=develop * Change the type of second_dim from size_t to int64_t. test=develop
-
由 flame 提交于
* R-language inference support
-
由 Tao Luo 提交于
-
由 joanna.wozna.intel 提交于
* Add dequant scale squash test=develop * Correct dequant-scale squash test test=develop
-
由 mapingshuo 提交于
* update readme * test=develop
-