1. 27 Oct, 2020 1 commit
  2. 24 Sep, 2020 1 commit
    • W
      use iwyu clean include (#27267) · df43905f
      Committed by wanghuancoder
      * use iwyu clean include, test=develop, test=win
      
      * compilation error, test=develop
      
      * fix compilation error2, test=develop
      
      * fix compilation error3, test=develop
      
      * fix compilation error4, test=develop
      
      * fix compilation error5, test=develop
      
      * fix compilation error6, test=develop
      
      * fix compilation error7, test=develop
      
      * fix compilation error8, test=develop
      
      * fix compilation error8, test=develop
      
      * fix compilation error10, test=develop
      
      * fix compilation error11, test=develop
  3. 21 Sep, 2020 1 commit
  4. 23 Feb, 2020 1 commit
  5. 11 Feb, 2020 1 commit
  6. 07 Feb, 2020 1 commit
    • Y
      Enable the detection of subgraph composed of grad ops (#21223) · dcfb6038
      Committed by Yiqun Liu
      * Add the first implementation of fusion_group op #19621 (#3)
      
      * Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
      test=develop
      
      * Call CUDA driver api to launch the kernel compiled by nvrtc.
      test=develop
      
      * Disable for mac and windows.
      test=develop
      
      * Refine the codes to support manually specified num_threads and workload_per_thread.
      test=develop
      
      * Refine the CUDA kernel to support large dims.
      test=develop
      
      * Add DeviceCodePool to manage all device codes.
      
      * Add the first implementation fusion_group op.
      
      * Add unit-test for fusion_group op.
      
      * Add the check of result.
      
      * Add the check of nvrtc in unit-test.
      test=develop
      
      * Add comment to explain the inputs, outputs and features of fusion_group op.
      test=develop
      
      * Disable fusion_group op for mac and windows.
      test=develop
      
      * Make the compiling of device code return status instead of hanging up.
      test=develop
      
      * Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.
      
      * Unify fusion_group_op's input and output names.
      test=develop
      
      * Add the check of CUDA driver library in unittest.
      test=develop
      
      * Enable generating code for a given subgraph. #21126 (#4)
      
      * Enable generating code for a given subgraph.
      
      * Support sorting the subgraph.
      
      * Remove the rearrangement of expressions because we use the sorted subgraph directly.
      
      * Enable generating code for a subgraph which is composed of grad ops.
      
      * Use expression information to check the accuracy in unittest.
      
      * Separate load and store from computation expressions.
      test=develop
      
      * Improve the loading statements in generated codes.
      test=develop
      
      * Remove unused arguments from formal list.
      test=develop
      
      * Enable the detection of subgraph of grad ops.
      
      * Generate code for detected subgraph in fusion_group_pass.
      
      * Add an option in BuildStrategy to enable fusion_group_pass and add unittest.
      test=develop
      
      * Fix a bug when checking whether the shapes of all inputs are the same.
      
      * Add debug information.
      
      * Move subgraph_detector from inference/analysis to the common framework/ir directory. (#5)
      
      test=develop
      
      * Call subgraph_detector in fusion_group pass.
      test=develop
      
      * Disable fusion_group when WITH_GPU is OFF.
      test=develop
      
      * Refine all PADDLE_ENFORCE message.
      test=develop
      
      * Fix the case that some inputs are not defined in grad ops, and set op_role for fused op.
      test=develop
      
      * Follow review comments.
      test=develop
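The "Support sorting the subgraph" step above implies that the ops of a detected subgraph are put into dependency order before code is generated. The following is a minimal sketch of that idea using Kahn's topological sort. It assumes a plain dict in place of Paddle's graph classes; `topo_sort` and the op names are hypothetical and are not taken from the commit.

```python
from collections import deque


def topo_sort(deps):
    """Return ops in dependency order.

    deps maps each op name to the list of op names whose outputs it reads.
    Producers that are not keys of deps are treated as external inputs.
    """
    indegree = {op: 0 for op in deps}
    users = {op: [] for op in deps}
    for op, inputs in deps.items():
        for src in inputs:
            if src in deps:  # ignore producers outside the subgraph
                indegree[op] += 1
                users[src].append(op)

    # Start from ops whose inputs all come from outside the subgraph.
    ready = deque(sorted(op for op, d in indegree.items() if d == 0))
    order = []
    while ready:
        op = ready.popleft()
        order.append(op)
        for user in users[op]:
            indegree[user] -= 1
            if indegree[user] == 0:
                ready.append(user)

    if len(order) != len(deps):
        raise ValueError("subgraph contains a cycle")
    return order
```

With the ops sorted this way, each generated statement only refers to values that earlier statements have already produced.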
  7. 10 Jan, 2020 1 commit
    • Z
      Add bn and relu fuse pass (#22048) · 46189b16
      Committed by Zhen Wang
      * add bn and relu fuse pass
      
      * add op attr assert and dtype assert
      
      * fix some inputs&&outputs bugs for the fused op and pattern.
      
      * add the unittest for fuse_bn_act_pass. test=develop
      
      * use normative enforce statements. test=develop
      
      * add the cpu test. test=develop
      
      * add the support of batch_size=1 for the bn with relu op. test=develop
      
      * add the error type for paddle throws. test=develop
      
      * add fused_batch_norm_act and fused_batch_norm_act_grad to op_has_unsed_vars_white_list. test=develop
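A fuse pass of this kind looks for a `batch_norm` op whose output feeds a `relu` op and replaces the pair with one fused op. The sketch below illustrates the pattern match on a flat op list. It is a simplification under stated assumptions: ops are plain dicts, the two ops are adjacent, and the batch norm output has no other consumer. `fuse_bn_act` and the `act_type` key are hypothetical; only the op name `fused_batch_norm_act` comes from the commit message.

```python
def fuse_bn_act(ops):
    """Replace each adjacent batch_norm -> relu pair with one fused op.

    ops is a list of dicts with keys "type", "inputs" and "outputs".
    """
    fused = []
    i = 0
    while i < len(ops):
        op = ops[i]
        nxt = ops[i + 1] if i + 1 < len(ops) else None
        is_pair = (
            op["type"] == "batch_norm"
            and nxt is not None
            and nxt["type"] == "relu"
            and nxt["inputs"] == op["outputs"]  # relu consumes the bn output
        )
        if is_pair:
            fused.append({
                "type": "fused_batch_norm_act",
                "inputs": op["inputs"],
                "outputs": nxt["outputs"],
                "act_type": "relu",  # hypothetical attribute name
            })
            i += 2  # skip the relu that was absorbed
        else:
            fused.append(op)
            i += 1
    return fused
```

A real pass would also have to confirm that no other op reads the intermediate batch norm output before removing it.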
  8. 26 Sep, 2019 1 commit
  9. 13 Sep, 2019 1 commit
    • C
      Open fuse all reduce option (#19765) · 056fdedd
      Committed by chengduo
      * Open fuse all reduce op
      test=develop
      
      * Add Fuse optimization op log
      
      * Add log in fuse_optimizer op pass and fuse all_reduce op pass
      
      * replace with boost::optional<bool>
      test=develop
      
      * Polish code
      test=develop
      
      * fix code coverage
      test=develop
  10. 11 Sep, 2019 1 commit
  11. 12 Aug, 2019 1 commit
  12. 02 Aug, 2019 1 commit
  13. 29 Jul, 2019 1 commit
  14. 27 Jul, 2019 1 commit
  15. 26 Jul, 2019 1 commit
    • Z
      Feature/mem opt pass refactor (#18735) · a802da65
      Committed by Zeng Jinle
      * first version memory optimize pass, test=develop
      
      * remove move_tensor_sharing_pass, test=develop
      
      * refine code comments, add unittests, test=develop
      
      * turn off memory_optimize by default, test=develop
      
      * follow huihuang's comments, test=develop
      
      * follow chengduoZH's comments, test=develop
      
      * fix grammar error, add const qualifier, fix pass_test exception message, test=develop
      
      * follow chengduoZH's comments 2nd, test=develop
  16. 11 Jul, 2019 2 commits
    • G
    • Z
      Feature/buffer_shared_inplace (#17911) · d3003a16
      Committed by Zeng Jinle
      * feature/buffer_shared_inplace, test=develop
      
      * refine code, test=develop
      
      * fix elementwise_add op cpu inplace and sum inplace bug, test=develop
      
      * add unittest and debug log, test=develop
      
      * fix parallel_executor scope bug, polish code, test=develop
      
      * fix sum op, activation op, single_in_place_inference bug, test=develop
      
      * remove kLocalExecScopeName, test=develop
      
      * fix unittest,test=develop
      
      * fix out_var first version bug, test=develop
      
      * follow comments,test=develop
  17. 24 Jun, 2019 1 commit
    • C
      Clean build strategy (#18148) · 5489216e
      Committed by chengduo
      * clean build_strategy
      test=develop
      
      * DataBalanceOpHandle has been removed
      test=develop
      
      * debug
      
      * update build_strategy.
      test=develop
  18. 14 Jun, 2019 1 commit
  19. 06 Jun, 2019 1 commit
  20. 27 May, 2019 1 commit
  21. 20 May, 2019 1 commit
  22. 14 May, 2019 1 commit
  23. 11 Apr, 2019 1 commit
  24. 10 Apr, 2019 1 commit
  25. 08 Apr, 2019 1 commit
  26. 02 Apr, 2019 1 commit
  27. 28 Mar, 2019 2 commits
  28. 20 Mar, 2019 1 commit
    • C
      Fuse AllReduce (#15921) · f26ba5bd
      Committed by chengduo
      * fuse all_reduce
      test=develop
      
      * add fuse_parameter_groups_size
      test=develop
      
      * Polish code
      test=develop
      
      * Fix travis-ci
      test=develop
      
      * Add SetGroupAccordingToLayers and SetGroupAccordingToGroupSize
      test=develop
      
      * Add SetGroupAccordingToMemorySize
      test=develop
      
      * fix multi_devices_graph
      test=develop
      
      * reset params_grads
      test=develop
      
      * Polish code
      test=develop
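The `SetGroupAccordingToMemorySize` step named above suggests that gradients are collected into groups until a size threshold is reached, so that each group can be reduced in a single all-reduce call. The sketch below shows one way such grouping could work. It assumes a greedy, in-order policy; `group_by_memory_size` and `limit_bytes` are hypothetical names, and this is not Paddle's implementation.

```python
def group_by_memory_size(grads, limit_bytes):
    """Greedily bucket gradients in order until each bucket reaches limit_bytes.

    grads is a list of (name, size_in_bytes) pairs in the order the
    gradients are produced. Returns a list of lists of names.
    """
    groups = []
    current = []
    current_bytes = 0
    for name, size in grads:
        current.append(name)
        current_bytes += size
        if current_bytes >= limit_bytes:
            groups.append(current)  # bucket is full, start a new one
            current = []
            current_bytes = 0
    if current:
        groups.append(current)  # leftover gradients form a final bucket
    return groups
```

Larger groups mean fewer communication calls, at the cost of waiting longer before the first reduction can start.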
  29. 15 Mar, 2019 1 commit
    • Q
      Support sync batch norm. (#16121) · 8ad672a2
      Committed by qingqing01
      * Support Sync Batch Norm.
      * Note: do not enable it when running on a single device.
      
      Usage:
      
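      import paddle.fluid as fluid

      # tp is the training Program and loss_mean its loss variable (both defined elsewhere).
      build_strategy = fluid.BuildStrategy()
      build_strategy.sync_batch_norm = True  # share batch norm statistics across devices
      binary = fluid.compiler.CompiledProgram(tp).with_data_parallel(
          loss_name=loss_mean.name,
          build_strategy=build_strategy)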
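Synchronized batch norm normalizes with statistics computed over the whole global batch instead of each device's local slice. The sketch below shows how per-device partial sums can be combined into a global mean and variance. It is pure Python for illustration only: `local_stats` and `global_mean_var` are hypothetical helpers, and the actual cross-device communication is not shown.

```python
def local_stats(values):
    """Per-device partial statistics: (count, sum, sum of squares)."""
    return len(values), sum(values), sum(v * v for v in values)


def global_mean_var(per_device_stats):
    """Combine (count, sum, sum of squares) triples from all devices.

    Returns the mean and the biased variance of the global batch.
    """
    total_n = sum(n for n, _, _ in per_device_stats)
    total_sum = sum(s for _, s, _ in per_device_stats)
    total_sq = sum(q for _, _, q in per_device_stats)
    mean = total_sum / total_n
    var = total_sq / total_n - mean * mean  # E[x^2] - (E[x])^2
    return mean, var


# Two devices, each holding half of the global batch.
stats = [local_stats([1.0, 2.0]), local_stats([3.0, 4.0])]
mean, var = global_mean_var(stats)
```

On a single device the combined statistics equal the local ones, which is consistent with the note that the option should not be enabled there.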
  30. 06 Mar, 2019 1 commit
    • L
      add IfElse test case for ir memory optimize (#15998) · 9cc6f400
      Committed by liuwei1031
      * add ir memory optimize test case for IfElse op, test=develop
      
      * fix some unittest failures by forcing use of the python memory_optimize, test=develop
      
      * tweak comments, test=develop
      
      * fix unittest, test=develop
      
      * fix unittest, test=develop
  31. 05 Mar, 2019 2 commits
  32. 21 Feb, 2019 1 commit
  33. 11 Feb, 2019 1 commit
  34. 31 Jan, 2019 1 commit
  35. 22 Jan, 2019 1 commit
  36. 21 Jan, 2019 2 commits
    • D
      squash commits. test=develop · 8f3b2523
      Committed by dzhwinter
    • D
      Memory optimization of depthwise conv op and group norm op (#15313) · 9f8f0fc2
      Committed by Dun
      * mem opt
      
      * test=develop
      
      * test=develop
      
      * test=develop
      
      * test=develop
      
      * test=develop
      
      * test=develop
      
      * test=develop
      
      * refine code  test=develop
      
      * refine code  test=develop
      
      * refine code  test=develop
      
      * refine code  test=develop
      
      * refine with cub test=develop
      
      * fix mkldnn test && remove comments && test=develop
      
      * polish code && test=develop
      
      * add only_forward test && test=develop