- 10 2月, 2020 6 次提交
-
-
由 Guo Sheng 提交于
-
由 cc 提交于
* post_training_quantization support set bits, test=develop * up, test=develop
-
由 Wilber 提交于
Compile without nccl deps. [1/2] Co-authored-by: N石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
-
由 Yiqun Liu 提交于
test=develop
-
由 Wilber 提交于
-
由 Huihuang Zheng 提交于
This PR provides very basic and simple framework for transforming Dygraph to Static Graph. API names, final outputs are not determined yet. Feel free to modify or add class/function/type when you think the framework is not extendable for you.
-
- 07 2月, 2020 6 次提交
-
-
由 Zhong Hui 提交于
Fix the integer overflow problem in the op of sequence2batch, change the int32_t to size_t, In the /paddle/fluid/operators/math/sequence2batch.h#L122.
-
由 cc 提交于
* support weight quantization in post_training_quanzitaion, test=develop * add test for weight quantization, test=develop
-
由 Yiqun Liu 提交于
* Add the first implememtation of fusion_group op #19621 (#3) * Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop * Call CUDA driver api to launch the kernel compiled by nvrtc. test=develop * Disable for mac and windows. test=develop * Refine the codes to support manually specified num_threads and workload_per_thread. test=develop * Refine the CUDA kernel to support large dims. test=develop * Add DeviceCodePool to manage all device codes. * Add the first implementation fusion_group op. * Add unit-test for fusion_group op. * Add the check of result. * Add the check of nvrtc in unit-test. test=develop * Add comment to explain the inputs, outputs and features of fusion_group op. test=develop * Disable fusion_group op for mac and windows. test=develop * Make the compiling of device code return status instead of hanging up. test=develop * Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API. * Unify fusion_group_op's input and output names. test=develop * Add the check of CUDA driver library in unittest. test=develop * Enable generating code for a given subgraph. #21126 (#4) * Enable generating code for a given subgraph. * Support sorting the subgraph. * Remove the rearange of expressions because we use the sorted subgraph directly. * Enable generating code for a subgraph which is composed of grad ops. * Use expression information to check the accuracy in unittest. * Separate load and store from computation expressions. test=develop * Improve the loading statements in generated codes. test=develop * Remove unused arguments from formal list. test=develop * Enable the detection of subgraph of grad ops. * Generate code for detected subgraph in fusion_group_pass. * Add an option in BuildStrategy to enable fusion_group_pass and add unittest. test=develop * Fix a bug when checking whether the shape of all inputs are the same. * Add debug information. * Remove subgraph_detector from inference/analysis to the common framework/ir directory. (#5) test=develop * Call subgraph_detector in fusion_group pass. test=develop * Disable fusion_group when WITH_GPU is OFF. test=develop * Refine all PADDLE_ENFORCE message. test=develop * Fix the case that some inputs are not defined in grad ops, and set op_role for fused op. test=develop * Follow review comments. test=develop
-
由 Aurelius84 提交于
* polish backward api doc test=develop, test=document_preview, test=document_fix * polish backward api doc test=develop, test=document_preview, test=document_fix * no_grad supports set of Variable test=develop, test=document_preview * polish sample code of append_backward test=develop, test=document_preview * modify assert into Raise TypeError test=develop,test=document_preview * fix unittest failed test=develop * rm useless file test=develop * polish en doc test=develop * polish code of no_grad_set test=develop * polish code of no_grad_set test=develop
-
由 Tao Luo 提交于
test=develop
-
由 LielinJiang 提交于
* optimize interpolate op, test=develop
-
- 06 2月, 2020 7 次提交
-
-
由 wangchaochaohu 提交于
-
由 Yiqun Liu 提交于
Correct the use of DeviceContext in unittest sequence_pooling_test and sequence_padding_test (#22456) * Add log in memory::Copy for debug purpose. * Change to use context in DeviceContextPool directly in sequence_pooling_test, instead to new one. * Change to use context in DeviceContextPool directly in sequence_padding_test, instead to new one. test=develop * Change the type of second_dim from size_t to int64_t. test=develop
-
由 flame 提交于
* R-language inference support
-
由 Tao Luo 提交于
-
由 joanna.wozna.intel 提交于
* Add dequant scale squash test=develop * Correct dequant-scale squash test test=develop
-
由 mapingshuo 提交于
* update readme * test=develop
-
由 Aurelius84 提交于
* add skip_check_grad_ci of var_conv_2d test=develop * modify check_shape_white_list test=develop
-
- 05 2月, 2020 7 次提交
-
-
由 Zhaolong Xing 提交于
* add mutex for trt engine test=develop * add the test for copy_to_cpu test=develop
-
由 XiaoguangHu 提交于
更新文档错误
-
由 Daniel Yang 提交于
* these are the revised readme/readme_cn pull-request, test=develop * this is a readme pull request, test = develop * these are the revised readme/readme_cn pull request,test = develop
-
由 Wilber 提交于
cmake选项中添加了WITH_NCCL,显示指定是否编译NCCL的部分代码,WITH_NCCL默认打开,但如果WITH_GPU为OFF,则关闭WITH_NCCL 添加了PADDLE_WITH_NCCL定义 单机单卡能够关闭NCCL编译,多卡的话需要默认打开NCCL,如果关闭NCCL,则只能使用单卡 Co-authored-by: N石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
-
由 Bai Yifan 提交于
-
由 Tao Luo 提交于
* Sigmoid bug fix, test=develop * fix code format test=develop Co-authored-by: NManjunath Bhat <manjunathbhat9920@gmail.com>
-
由 xujiaqi01 提交于
* add hdfs ls retry time and sleep time, fix save inference * test=develop
-
- 04 2月, 2020 4 次提交
-
-
由 xujiaqi01 提交于
* fix copy table bug of lost some feasign * test=develop
-
由 Leo Chen 提交于
* add int16 support, test=develop * add test, test=develop * fix typo, test=develop * fix dtype error in slice, test=develop
-
由 石晓伟 提交于
-
由 juncaipeng 提交于
* fix chain doc, test=develop, test=document_preview
-
- 03 2月, 2020 1 次提交
-
-
由 tangwei12 提交于
* fix bug with half communicator
-
- 02 2月, 2020 2 次提交
-
-
由 liu zhengxi 提交于
* update the ut precision of pad pad2d pad_constant_like from fp32 to fp64, test=develop
-
由 xujiaqi01 提交于
* add GeneralRoleMaker which is for general usage * test=develop
-
- 31 1月, 2020 2 次提交
-
-
由 Michał Gallus 提交于
* Enable quantize to reorder to nchw as well * Correct FC MKL-DNN input dim requirements to accept 3D * Improve DNNL FC format, error and 3D input handling test=develop * Improve error checking in FC test=develop * Improve PADDLE_ENFORCE messages in fc-related files * Remove data layout attribute from obligatory pass args test=develop * Fix message in fc_mkldnn_pass to be logically correct test=develop
-
由 joanna.wozna.intel 提交于
-
- 25 1月, 2020 2 次提交
-
-
由 lidanqing 提交于
-
由 joanna.wozna.intel 提交于
-
- 23 1月, 2020 1 次提交
-
-
由 Wojciech Uss 提交于
-
- 22 1月, 2020 2 次提交
-
-
由 wangchaochaohu 提交于
-
由 ceci3 提交于
-