1. 22 Feb, 2020 1 commit
  2. 16 Feb, 2020 1 commit
  3. 07 Feb, 2020 1 commit
    • Enable the detection of subgraph composed of grad ops (#21223) · dcfb6038
      Yiqun Liu committed
      * Add the first implementation of fusion_group op #19621 (#3)
      
      * Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
      test=develop
      
      * Call CUDA driver api to launch the kernel compiled by nvrtc.
      test=develop
      
      * Disable for mac and windows.
      test=develop
      
      * Refine the code to support manually specified num_threads and workload_per_thread.
      test=develop
      
      * Refine the CUDA kernel to support large dims.
      test=develop
      
      * Add DeviceCodePool to manage all device codes.
      
      * Add the first implementation fusion_group op.
      
      * Add unit-test for fusion_group op.
      
      * Add the check of result.
      
      * Add the check of nvrtc in unit-test.
      test=develop
      
      * Add comment to explain the inputs, outputs and features of fusion_group op.
      test=develop
      
      * Disable fusion_group op for mac and windows.
      test=develop
      
      * Make the compiling of device code return status instead of hanging up.
      test=develop
      
      * Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.
      
      * Unify fusion_group_op's input and output names.
      test=develop
      
      * Add the check of CUDA driver library in unittest.
      test=develop
      
      * Enable generating code for a given subgraph. #21126 (#4)
      
      * Enable generating code for a given subgraph.
      
      * Support sorting the subgraph.
      
      * Remove the rearrangement of expressions because we use the sorted subgraph directly.
      
      * Enable generating code for a subgraph which is composed of grad ops.
      
      * Use expression information to check the accuracy in unittest.
      
      * Separate load and store from computation expressions.
      test=develop
      
      * Improve the loading statements in generated codes.
      test=develop
      
      * Remove unused arguments from formal list.
      test=develop
      
      * Enable the detection of subgraph of grad ops.
      
      * Generate code for detected subgraph in fusion_group_pass.
      
      * Add an option in BuildStrategy to enable fusion_group_pass and add unittest.
      test=develop
      
      * Fix a bug when checking whether the shapes of all inputs are the same.
      
      * Add debug information.
      
      * Move subgraph_detector from inference/analysis to the common framework/ir directory. (#5)
      
      test=develop
      
      * Call subgraph_detector in fusion_group pass.
      test=develop
      
      * Disable fusion_group when WITH_GPU is OFF.
      test=develop
      
      * Refine all PADDLE_ENFORCE messages.
      test=develop
      
      * Fix the case that some inputs are not defined in grad ops, and set op_role for fused op.
      test=develop
      
      * Follow review comments.
      test=develop
  4. 05 Feb, 2020 1 commit
  5. 21 Jan, 2020 1 commit
  6. 17 Jan, 2020 2 commits
    • Implement a common python unittest to test the ir passes. (#22209) · b7cac50b
      Yiqun Liu committed
      * Implement a common python unittest to test the ir passes.
      test=develop
      
      * Save the results in np.array and support starting up on CPU.
      test=develop
      
      * Fix the unittest.
      test=develop
      
      * Add check_program to check whether the optimized program is different from the original one.
      test=develop
      
      * Remove the interface all_ops.
      test=develop
      
      * Add exception test in pass_test.
      test=develop
    • Integrated HALF_ASYNC into communicator (#21869) · 82bc814a
      tangwei12 committed
      * add half_async in the communicator
      * fix DistributedStrategy
  7. 16 Jan, 2020 1 commit
    • Speeding up dygraph DataLoader with multiprocessing (#21762) · 35efbe6d
      Chen Weihang committed
      * add multiprocess for dygraph data loader, test=develop
      
      * polish code & add safe guard, test=develop
      
      * refactor dygraph dataloader & add signal handler, test=develop
      
      * fix member initializer compile error on ci, test=develop
      
      * fix member initializer compile error once more, test=develop
      
      * remove useless config, test=develop
      
      * skip windows incompatible problem, test=develop
      
      * add unittest for coverage, test=coverage
      
      * add more exception unittest case, test=develop
      
      * deal with signal handler coverage, test=develop
      
      * polish code & add signal handler tests, test=develop
      
      * deal with coverage ci problem, test=develop
      
      * split data loader test & coverage ci fix, test=develop
      
      * remove test_imperative_data_loader_with_exception, test=develop
      
      * remove signal process exception test case, test=develop
      
      * add exception tests again & remove sample list test, test=develop
      
      * split normal and exception unittests into different classes, test=develop
      
      * polish doc for use_multiprocess effect in static mode, test=develop
  8. 14 Jan, 2020 1 commit
  9. 11 Jan, 2020 1 commit
    • Add NotImplementedError for multi optimizers (#22181) · 8de33f41
      liym27 committed
      * add NotImplementedError for multi optimizers used on multi-places. test=develop
      
      * assert error only if num_devices>1. test=develop
      
      * set test_optimizer_in_control_flow in CMakeLists for using multi-GPU. test=develop
  10. 10 Jan, 2020 2 commits
    • Add bn and relu fuse pass (#22048) · 46189b16
      Zhen Wang committed
      * add bn and relu fuse pass
      
      * add op attr assert and dtype assert
      
      * fix some input and output bugs for the fused op and pattern.
      
      * add the unittest for fuse_bn_act_pass. test=develop
      
      * use normative enforce statements. test=develop
      
      * add the cpu test. test=develop
      
      * add the support of batch_size=1 for the bn with relu op. test=develop
      
      * add the error type for paddle throws. test=develop
      
      * add fused_batch_norm_act and fused_batch_norm_act_grad to op_has_unsed_vars_white_list. test=develop
  11. 08 Jan, 2020 1 commit
  12. 07 Jan, 2020 1 commit
  13. 02 Jan, 2020 1 commit
  14. 31 Dec, 2019 1 commit
  15. 30 Dec, 2019 1 commit
  16. 25 Dec, 2019 2 commits
  17. 23 Dec, 2019 1 commit
  18. 21 Dec, 2019 1 commit
  19. 17 Dec, 2019 1 commit
  20. 15 Dec, 2019 1 commit
  21. 14 Dec, 2019 1 commit
  22. 12 Dec, 2019 1 commit
    • memory leak for cpu (#21174) · 9ad940fd
      tangwei12 committed
      * add fake init for the trainer, fix large memory hold in the trainer
      * do not merge recv vars from a remote endpoint, test=develop
      * add recv and save op, merge slice var in one op, save memory
      * remove hsigmoid with pull sparse, test=develop
  23. 03 Dec, 2019 1 commit
  24. 27 Nov, 2019 1 commit
  25. 18 Nov, 2019 1 commit
    • Fix warn of gcc8 (#21205) · cdb3d279
      Zeng Jinle committed
      * fix warnings of gcc 8 compilation, test=develop
      
      * fix boost::bad_get, test=develop
      
      * refine PADDLE_ENFORCE, test=develop
  26. 02 Nov, 2019 1 commit
  27. 28 Oct, 2019 1 commit
  28. 24 Oct, 2019 2 commits
  29. 22 Oct, 2019 1 commit
  30. 17 Oct, 2019 1 commit
  31. 09 Oct, 2019 2 commits
  32. 30 Sep, 2019 1 commit
  33. 28 Sep, 2019 1 commit
  34. 26 Sep, 2019 1 commit
  35. 25 Sep, 2019 1 commit