- 03 Jan, 2020 3 commits
-
-
Committed by Yiqun Liu
* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop
* Call CUDA driver API to launch the kernel compiled by nvrtc. test=develop
* Disable for mac and windows. test=develop
* Refine the codes to support manually specified num_threads and workload_per_thread. test=develop
* Refine the CUDA kernel to support large dims. test=develop
* Add DeviceCodePool to manage all device codes.
* Add the first implementation of fusion_group op.
* Add unit-test for fusion_group op.
* Add the check of result.
* Add the check of nvrtc in unit-test. test=develop
* Add comment to explain the inputs, outputs and features of fusion_group op. test=develop
* Disable fusion_group op for mac and windows. test=develop
* Make the compiling of device code return status instead of hanging up. test=develop
* Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.
* Unify fusion_group_op's input and output names. test=develop
* Add the check of CUDA driver library in unittest. test=develop
* Refine the calling of PADDLE_ENFORCE. test=develop
-
Committed by Michał Gallus
-
Committed by FDInSky
* test=develop fix generate_proposal_labels op
-
- 02 Jan, 2020 2 commits
-
-
Committed by ceci3
* update error information about batch_norm_grad
* update bn,test=develop
-
Committed by Aurelius84
* fix integer overflow in match_matrix test=develop
* fix integer overflow in match_matrix test=develop
* fix typo test=develop
-
- 31 Dec, 2019 1 commit
-
-
Committed by wangchaochaohu
-
- 30 Dec, 2019 1 commit
-
-
Committed by danleifeng
-
- 27 Dec, 2019 3 commits
-
-
Committed by zhaoyuchen2018
* Refine multihead kernel, align block to 32 test=develop Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
* Refine log comments test=develop Signed-off-by: Nzhaoyuchen <zhaoyuchen01@baidu.com>
-
Committed by zhoushiyu
* add shuffle batch op, test=develop, test=document_preview
* fix size_t conflict and check_output test=develop, test=document_preview
* fix bug test=develop, test=document_preview
* add unittest of shuffle_batch layer test=develop, test=document_preview
* fix py coverage and op input type, test=develop, test=document_preview
* fix py coverage, test=develop
* fix en doc, test=develop
* move to contrib test=develop
* add unique_name test=develop
* invoke shuffle_batch in contrib.layers test=develop
-
Committed by mapingshuo
* make reverse op support negative axis
-
- 26 Dec, 2019 2 commits
-
-
Committed by Aurelius84
* fix compile error in CUDA10 test=develop
* remove double in pad2d test=develop
-
Committed by hutuxian
* fix stat shape back in global auc scenario
* add UT to cover global auc
-
- 25 Dec, 2019 3 commits
-
-
Committed by Aurelius84
* add register op_data_type test=develop
* fix register bug in isfinite op test=develop
* rm int int64_t in pad2d gradKernel test=develop
-
Committed by hong
-
Committed by zhouwei25
-
- 24 Dec, 2019 3 commits
-
-
Committed by Aurelius84
* optimize adam speed by removing _finish_update test=develop
* fix SparseAdamFunctor param list test=develop
* Remove scale_op in expect_list of adam_op test=develop
* fix test optimizer loss assert error test=develop
* fix test optimizer loss assert error test=develop
* modify PADDLE_ENFORCE usage test=develop
* fix op_type in lamb_op.cc test=develop
* fix errors ostream format bug test=develop
* add betaPowOut in ngraph op test=develop
* fix ngraph::op api for gcc8 test=develop
* clean code test=develop
* modify struct into class test=develop
* remove code of beta1Tensor in lamb_op test=develop
-
Committed by FDInSky
Update iou_similarity op to support non-normalized bbox
-
Committed by guofei
-
- 23 Dec, 2019 2 commits
- 20 Dec, 2019 1 commit
-
-
Committed by Chen Weihang
-
- 19 Dec, 2019 4 commits
-
-
Committed by Chengmo
* test=develop, speed dense calc & communication
-
Committed by Wojciech Uss
test=develop
-
Committed by guofei
1. Make while_op accept GPU conditional data
2. Add more complex test cases for while_loop API
-
Committed by WangXi
-
- 17 Dec, 2019 1 commit
-
-
Committed by Huihuang Zheng
-
- 16 Dec, 2019 3 commits
-
-
Committed by zhaoyuchen2018
* Fix softmax cuda bug
* Refine multihead log and softmax logic
-
Committed by Kaipeng Deng
* yolo_box OP add Attr(clip_bbox). test=develop
-
Committed by Leo Chen
* fix elementwise_pow bug on integer, test=develop
* use llrint to support elementwise_pow_grad, test=develop
* add some tests, test=develop
* revert grad functor, test=develop
-
- 15 Dec, 2019 1 commit
-
-
Committed by Chen Weihang
* rename paddle throw error macro, test=develop
* fix new error use case, test=develop
-
- 12 Dec, 2019 2 commits
-
-
Committed by joanna.wozna.intel
* Add reshape int8 op test=develop
* Change test to CPUPlace test=develop
* Correct tests test=develop
-
Committed by tangwei12
* add fake init for the trainer, fix large memory hold in the trainer
* do not merge recv vars from a remote endpoint, test=develop
* add recv and save op, merge slice var in one op, save memory
* remove hsigmoid with pull sparse, test=develop
-
- 11 Dec, 2019 1 commit
-
-
Committed by GaoWei8
test=develop
-
- 10 Dec, 2019 5 commits
-
-
Committed by wangchaochaohu
-
Committed by Zeng Jinle
-
Committed by mapingshuo
* add seed op
-
Committed by Adam
* MKLDNN v1.0 rebase to Paddle 1.6 test=develop
* Add hacky paddle::string::to_string() implementation
* vectorize<int64_t>() -> vectorize() cleanup test=develop
* PADDLE_ENFORCE and void_cast fixes test=develop
* Rebase changes test=develop
* Cosmetics test=develop
* Delete MKL from mkldnn.cmake test=develop
* CMake debug commands test=develop
* Delete MKLDNN_VERBOSE and rebase fixes test=develop
* Rebase fixes test=develop
* Temporarily disable int8 resnet101 vgg16 and vgg19 tests test=develop
* Add libmkldnn.so.1 to python setup test=develop
* Add libmkldnn.so.1 to inference_lib cmake after rebase test=develop
* Post rebase fixes + FC int8 changes test=develop
* Fix LRN NHWC test=develop
* Fix NHWC conv3d test=develop
* Windows build fix + next conv3d fix test=develop
* Fix conv2d on AVX2 machines test=develop
-
Committed by wangchaochaohu
* accelerate mean op test=develop
-
- 06 Dec, 2019 2 commits
-
-
Committed by Zeng Jinle
* polish infer shape registry, test=develop
* modify some operators registry, test=develop
-
Committed by Aurelius84
-