- 07 Feb, 2020 2 commits
-
-
Committed by Yiqun Liu
* Add the first implementation of fusion_group op #19621 (#3)
* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop
* Call CUDA driver api to launch the kernel compiled by nvrtc. test=develop
* Disable for mac and windows. test=develop
* Refine the code to support manually specified num_threads and workload_per_thread. test=develop
* Refine the CUDA kernel to support large dims. test=develop
* Add DeviceCodePool to manage all device codes.
* Add the first implementation of fusion_group op.
* Add unit-test for fusion_group op.
* Add the check of result.
* Add the check of nvrtc in unit-test. test=develop
* Add comment to explain the inputs, outputs and features of fusion_group op. test=develop
* Disable fusion_group op for mac and windows. test=develop
* Make the compiling of device code return status instead of hanging up. test=develop
* Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.
* Unify fusion_group_op's input and output names. test=develop
* Add the check of CUDA driver library in unittest. test=develop
* Enable generating code for a given subgraph. #21126 (#4)
* Enable generating code for a given subgraph.
* Support sorting the subgraph.
* Remove the rearrangement of expressions because we use the sorted subgraph directly.
* Enable generating code for a subgraph which is composed of grad ops.
* Use expression information to check the accuracy in unittest.
* Separate load and store from computation expressions. test=develop
* Improve the loading statements in generated codes. test=develop
* Remove unused arguments from formal list. test=develop
* Enable the detection of subgraph of grad ops.
* Generate code for detected subgraph in fusion_group_pass.
* Add an option in BuildStrategy to enable fusion_group_pass and add unittest. test=develop
* Fix a bug when checking whether the shapes of all inputs are the same.
* Add debug information.
* Move subgraph_detector from inference/analysis to the common framework/ir directory. (#5) test=develop
* Call subgraph_detector in fusion_group pass. test=develop
* Disable fusion_group when WITH_GPU is OFF. test=develop
* Refine all PADDLE_ENFORCE messages. test=develop
* Fix the case that some inputs are not defined in grad ops, and set op_role for fused op. test=develop
* Follow review comments. test=develop
-
Committed by Aurelius84
* polish backward api doc test=develop, test=document_preview, test=document_fix
* polish backward api doc test=develop, test=document_preview, test=document_fix
* no_grad supports set of Variable test=develop, test=document_preview
* polish sample code of append_backward test=develop, test=document_preview
* replace assert with raising TypeError test=develop,test=document_preview
* fix failing unittest test=develop
* rm useless file test=develop
* polish en doc test=develop
* polish code of no_grad_set test=develop
* polish code of no_grad_set test=develop
-
- 06 Feb, 2020 2 commits
-
-
Committed by Tao Luo
-
Committed by Aurelius84
* add skip_check_grad_ci of var_conv_2d test=develop
* modify check_shape_white_list test=develop
-
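- 05 Feb, 2020 3 commits
-
-
Committed by Wilber
Added WITH_NCCL to the cmake options to explicitly specify whether the NCCL-related code is compiled. WITH_NCCL is ON by default, but if WITH_GPU is OFF, WITH_NCCL is turned OFF. Added the PADDLE_WITH_NCCL definition. Single-machine, single-GPU builds can disable NCCL compilation; multi-GPU use requires NCCL to stay enabled (the default). If NCCL is disabled, only a single GPU can be used.
Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
-
Committed by Bai Yifan
-
Committed by xujiaqi01
* add hdfs ls retry time and sleep time, fix save inference
* test=develop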
-
- 04 Feb, 2020 2 commits
-
-
Committed by Leo Chen
* add int16 support, test=develop
* add test, test=develop
* fix typo, test=develop
* fix dtype error in slice, test=develop
-
Committed by juncaipeng
* fix chain doc, test=develop, test=document_preview
-
- 03 Feb, 2020 1 commit
-
-
Committed by tangwei12
* fix bug with half communicator
-
- 02 Feb, 2020 2 commits
-
-
Committed by liu zhengxi
* update the ut precision of pad, pad2d, and pad_constant_like from fp32 to fp64, test=develop
-
Committed by xujiaqi01
* add GeneralRoleMaker which is for general usage
* test=develop
-
- 25 Jan, 2020 2 commits
-
-
Committed by lidanqing
-
Committed by joanna.wozna.intel
-
- 23 Jan, 2020 1 commit
-
-
Committed by Wojciech Uss
-
- 22 Jan, 2020 1 commit
-
-
Committed by wangchaochaohu
-
- 21 Jan, 2020 7 commits
-
-
Committed by lilong12
-
Committed by 石晓伟
* add no_grad_set value check in op_test, test=develop
* update ops list, test=develop
-
Committed by Chengmo
* test=develop, fix geo Send & Init
-
Committed by songyouwei
test=develop
-
Committed by gongweibao
-
Committed by juncaipeng
* remove skip_check in test_activation_mkldnn_op, test=develop
-
Committed by whs
-
- 20 Jan, 2020 1 commit
-
-
Committed by Zeng Jinle
* polish backward prune, test=develop
* fix control flow op bug, test=develop
* add some unittests, test=develop
* fix unittest args, test=develop
* follow huihuang's comments, test=develop
-
- 19 Jan, 2020 3 commits
-
-
Committed by Zeng Jinle
* fix dataloader early reset bug, test=develop
* change implementation, test=develop
* fix ut, test=develop
-
Committed by zhupengyang
-
Committed by juncaipeng
-
- 18 Jan, 2020 1 commit
-
-
Committed by Chengmo
fix timeout of test_dist_fleet_geo
-
- 17 Jan, 2020 5 commits
-
-
Committed by Yiqun Liu
* Implement a common python unittest to test the ir passes. test=develop
* Save the results in np.array and support starting up on CPU. test=develop
* Fix the unittest. test=develop
* Add check_program to check whether the optimized program is different from the original one. test=develop
* Remove the interface all_ops. test=develop
* Add exception test in pass_test. test=develop
-
Committed by songyouwei
* fix circular dependency
* try import layers.nn from dygraph test=develop
-
Committed by songyouwei
* allow sublayer or param shadow attrs
* add unittest test=develop
* change remove fn name test=develop
-
Committed by zhupengyang
-
Committed by tangwei12
* add half_async in the communicator
* fix DistributedStrategy
-
- 16 Jan, 2020 7 commits
-
-
Committed by hong
* add learning rate api; test=develop
* fix unit test coverage; test=develop
* fix travis ci error; test=develop
* fix comment; test=develop
* fix example error; test=develop
* polish the api description, test=develop
Co-authored-by: zhongpu <2013000149@qq.com>
-
Committed by Li Fuchen
* Fixed warpctc, test=develop
* Set lod level of sequence_unpad's output to 1 in compile time test=develop
* fix the en doc and example code of warpctc, test=develop, test=document_fix
-
Committed by zhangchunle
-
Committed by juncaipeng
-
Committed by zhupengyang
-
Committed by zhongpu
-
Committed by hong
* fix test_layers compare static graph and dygraph result; test=develop
* fix test_layers random error; test=develop
-