- 22 Feb, 2020 1 commit
-
-
Committed by tangwei12
* add sync communicator and implement
-
- 16 Feb, 2020 1 commit
-
-
Committed by Chen Weihang
* split unittests in data loader test, test=release/1.7
* split unittests to different files, test=develop
* remove repeat unittest, test=develop
-
- 07 Feb, 2020 1 commit
-
-
Committed by Yiqun Liu
* Add the first implementation of fusion_group op #19621 (#3)
* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop
* Call CUDA driver api to launch the kernel compiled by nvrtc. test=develop
* Disable for mac and windows. test=develop
* Refine the codes to support manually specified num_threads and workload_per_thread. test=develop
* Refine the CUDA kernel to support large dims. test=develop
* Add DeviceCodePool to manage all device codes.
* Add the first implementation of fusion_group op.
* Add unit-test for fusion_group op.
* Add the check of result.
* Add the check of nvrtc in unit-test. test=develop
* Add comment to explain the inputs, outputs and features of fusion_group op. test=develop
* Disable fusion_group op for mac and windows. test=develop
* Make the compiling of device code return status instead of hanging up. test=develop
* Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.
* Unify fusion_group_op's input and output names. test=develop
* Add the check of CUDA driver library in unittest. test=develop
* Enable generating code for a given subgraph. #21126 (#4)
* Enable generating code for a given subgraph.
* Support sorting the subgraph.
* Remove the rearrangement of expressions because we use the sorted subgraph directly.
* Enable generating code for a subgraph which is composed of grad ops.
* Use expression information to check the accuracy in unittest.
* Separate load and store from computation expressions. test=develop
* Improve the loading statements in generated codes. test=develop
* Remove unused arguments from formal list. test=develop
* Enable the detection of subgraph of grad ops.
* Generate code for detected subgraph in fusion_group_pass.
* Add an option in BuildStrategy to enable fusion_group_pass and add unittest. test=develop
* Fix a bug when checking whether the shapes of all inputs are the same.
* Add debug information.
* Move subgraph_detector from inference/analysis to the common framework/ir directory. (#5) test=develop
* Call subgraph_detector in fusion_group pass. test=develop
* Disable fusion_group when WITH_GPU is OFF. test=develop
* Refine all PADDLE_ENFORCE messages. test=develop
* Fix the case that some inputs are not defined in grad ops, and set op_role for fused op. test=develop
* Follow review comments. test=develop
-
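- 05 Feb, 2020 1 commit
-
-
Committed by Wilber
Added the WITH_NCCL cmake option to explicitly specify whether the NCCL-related code is compiled. WITH_NCCL is ON by default, but it is turned off if WITH_GPU is OFF. Added the PADDLE_WITH_NCCL definition. NCCL compilation can be disabled for single-machine, single-GPU use; multi-GPU use requires NCCL to stay enabled (the default), and if NCCL is disabled only a single GPU can be used. Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>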
-
- 21 Jan, 2020 1 commit
-
-
Committed by gongweibao
-
- 17 Jan, 2020 2 commits
-
-
Committed by Yiqun Liu
* Implement a common python unittest to test the ir passes. test=develop
* Save the results in np.array and support to startup on CPU. test=develop
* Fix the unittest. test=develop
* Add check_program to check whether the optimized program is different from the original one. test=develop
* Remove the interface all_ops. test=develop
* Add exception test in pass_test. test=develop
-
Committed by tangwei12
* add half_async in the communicator
* fix DistributedStrategy
-
- 16 Jan, 2020 1 commit
-
-
Committed by Chen Weihang
* add multiprocess for dygraph data loader, test=develop
* polish code & add safe guard, test=develop
* refactor dygraph dataloader & add signal handler, test=develop
* fix member initializer compile error on ci, test=develop
* fix member initializer compile error once more, test=develop
* remove useless config, test=develop
* skip windows incompatible problem, test=develop
* add unittest for coverage, test=coverage
* add more exception unittest case, test=develop
* deal with signal handler coverage, test=develop
* polish code & add signal handler tests, test=develop
* deal with coverage ci problem, test=develop
* split data loader test & coverage ci fix, test=develop
* remove test_imperative_data_loader_with_exception, test=develop
* remove singal process except test case, test=develop
* add exception tests again & remove sample list test, test=develop
* split normal and exception unittests to diff class, test=develop
* polish doc for use_multiprocess effect in static mode, test=develop
-
- 14 Jan, 2020 1 commit
-
- 11 Jan, 2020 1 commit
-
-
Committed by liym27
* add NotImplementedError for multi optimizers used on multi-places. test=develop
* assert error only if num_devices>1. test=develop
* set test_optimizer_in_control_flow in CMakeLists for using multi-GPU. test=develop
-
- 10 Jan, 2020 2 commits
-
-
Committed by Zhen Wang
* add bn and relu fuse pass
* add op attr assert and dtype assert
* fix some inputs&&outputs bugs for the fused op and pattern.
* add the unittest for fuse_bn_act_pass. test=develop
* use normative enforce statements. test=develop
* add the cpu test. test=develop
* add the support of batch_size=1 for the bn with relu op. test=develop
* add the error type for paddle throws. test=develop
* add fused_batch_norm_act and fused_batch_norm_act_grad to op_has_unsed_vars_white_list. test=develop
-
Committed by songyouwei
-
- 08 Jan, 2020 1 commit
-
-
Committed by Zeng Jinle
-
- 07 Jan, 2020 1 commit
-
-
Committed by Chengmo
* add special way to add distribute vars, Update Pyramid hash op
-
- 02 Jan, 2020 1 commit
-
-
Committed by liym27
-
- 31 Dec, 2019 1 commit
-
-
Committed by WangXi
-
- 30 Dec, 2019 1 commit
-
-
Committed by Kaipeng Deng
-
- 25 Dec, 2019 2 commits
-
-
Committed by WangXi
-
Committed by songyouwei
* move sequence op unittests to a separate folder test=develop
* add missing CMakeList file test=develop
* fix relative path import test=develop
* fix relative import test=develop
* use sys.path.append test=develop
-
- 23 Dec, 2019 1 commit
-
-
Committed by WangXi
-
- 21 Dec, 2019 1 commit
-
-
Committed by gongweibao
-
- 17 Dec, 2019 1 commit
-
-
Committed by WangXi
-
- 15 Dec, 2019 1 commit
-
-
Committed by gongweibao
-
- 14 Dec, 2019 1 commit
-
-
Committed by juncaipeng
-
- 12 Dec, 2019 1 commit
-
-
Committed by tangwei12
* add fake init for the trainer, fix large memory hold in the trainer
* do not merge recv vars from a remote endpoint, test=develop
* add recv and save op, merge slice var in one op, save memory
* remove hsigmoid with pull sparse, test=develop
-
- 03 Dec, 2019 1 commit
-
-
Committed by lilong12
* set dim[0] to -1 if dim[0] < 0 and remove assertion to runtime, test=develop
* modify ENFORCE message, test=develop
* add validation for x.shape[0] > 0, test=develop
* add ut, test=develop
-
- 27 Nov, 2019 1 commit
-
-
Committed by hutuxian
* support data_norm_op run in CUDA
* add two parameters sync_stats & summary_decay_rate
* add UT
-
- 18 Nov, 2019 1 commit
-
-
Committed by Zeng Jinle
* fix warnings of gcc 8 compilation, test=develop
* fix boost::bad_get, test=develop
* refine PADDLE_ENFORCE, test=develop
-
- 02 Nov, 2019 1 commit
-
-
Committed by Dong Daxiang
* add launch_ps module so that we can launch a parameter server training job
  1) a user can specify worker_num and server_num
  2) parameter server can be killed after all workers exit
  3) unit test is added
  test=develop
-
- 28 Oct, 2019 1 commit
-
-
Committed by Aurelius84
-
- 24 Oct, 2019 2 commits
- 22 Oct, 2019 1 commit
-
-
Committed by gongweibao
-
- 17 Oct, 2019 1 commit
-
-
Committed by gongweibao
-
- 09 Oct, 2019 2 commits
-
-
Committed by gongweibao
-
Committed by chengduo
test=develop
-
- 30 Sep, 2019 1 commit
-
-
Committed by Chengmo
* refactor geo sgd & communicator
-
- 28 Sep, 2019 1 commit
-
-
Committed by gongweibao
-
- 26 Sep, 2019 1 commit
-
-
Committed by gongweibao
-
- 25 Sep, 2019 1 commit
-
-
Committed by ShenLiang
* treat broadcast as non-initial, test=develop
* rename the class name
* rename the class name, test=develop
-