1. 03 Mar, 2020 1 commit
  2. 02 Mar, 2020 2 commits
    • Z
      Unmerged fetch list (#22635) · 89cfa491
      Zhen Wang committed
      * update ScopeBufferedSSAGraphExecutor, AsyncSSAGraphExecutor, ThreadedSSAGraphExecutor, FastThreadedSSAGraphExecutor, ParallelSSAGraphExecutor and ParallelExecutor to support fetching unmerged results.
      
      * add the unit test for fetch_unmerged.
      
      * update ut for multi-card and multi-cpu.
      
      * add the error message and the user suggestion in FetchOpHandle. test=develop
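To illustrate what fetching unmerged results means in the commit above, here is a minimal sketch in plain Python. It assumes each device hands back its own list of fetched values; the function name `fetch_results` and the `merged` flag are names made up for this illustration and are not taken from the commit.

```python
def fetch_results(per_device_results, merged=True):
    """Combine values fetched from several devices.

    per_device_results: one list of values per device (assumed layout).
    merged=True  -> a single flat list, device 0 first.
    merged=False -> one list per device, kept separate.
    """
    if merged:
        flat = []
        for device_values in per_device_results:
            flat.extend(device_values)
        return flat
    # Copy each per-device list so callers cannot mutate the inputs.
    return [list(device_values) for device_values in per_device_results]


losses = [[0.9, 0.8], [0.7, 0.6]]  # two devices, two samples each
print(fetch_results(losses, merged=True))
print(fetch_results(losses, merged=False))
```

Keeping the per-device lists separate is useful when the caller needs to know which device produced which value, for example when the devices see batches of different sizes.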
    • H
      support customized download command in dataset (#22782) · 53a2b68f
      hutuxian committed
      * user can call dataset.set_download_cmd to set its customized download cmd
      * add UT to cover this scenario
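The idea of a user-supplied download command can be sketched as follows. `ToyDataset`, its default command and `build_command` are hypothetical stand-ins written for this illustration; only the method name `set_download_cmd` comes from the commit message, and the real dataset class may behave differently.

```python
import shlex


class ToyDataset:
    """Hypothetical stand-in for a dataset; not Paddle's class."""

    def __init__(self):
        # The default command is an assumption made for this sketch.
        self.download_cmd = "cat"

    def set_download_cmd(self, download_cmd):
        """Store the user-supplied command used to fetch each file."""
        self.download_cmd = download_cmd

    def build_command(self, path):
        """Return the argv list that would be run to fetch `path`."""
        return shlex.split(self.download_cmd) + [path]


dataset = ToyDataset()
dataset.set_download_cmd("my_downloader --retries 3")  # placeholder command
print(dataset.build_command("part-00000"))
```

The sketch only builds the argument list and never executes it, so it is safe to run anywhere.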
  3. 01 Mar, 2020 1 commit
  4. 28 Feb, 2020 1 commit
  5. 26 Feb, 2020 1 commit
  6. 25 Feb, 2020 1 commit
    • H
      PaddleBox Framework Part2 (#22466) · 175954d8
      hutuxian committed
      * Add two types of Metric Calculator: MultiTaskCalculator & CmatchRankCalculator.
      * Add a config for the DynamicAdjustChannelNum function to denote whether the remaining instances are discarded when they cannot be distributed evenly.
      * Remove the CPU code in Pull/PushSparse; it will be added back once it has been fully tested.
      * Fix some known issues, such as copying persistable vars after running one epoch.
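The discard option mentioned for DynamicAdjustChannelNum can be pictured with a small sketch. Everything here is an assumption made for illustration: the function name `split_instances`, the flag name `discard_remaining`, and in particular the choice to hand leftover instances to the first channels when they are not discarded.

```python
def split_instances(instances, channel_num, discard_remaining):
    """Distribute instances over channel_num channels.

    Each channel first receives len(instances) // channel_num items.
    The leftovers are either dropped or appended one by one to the
    first channels (the latter policy is an assumption of this sketch).
    """
    if channel_num <= 0:
        raise ValueError("channel_num must be positive")
    per_channel = len(instances) // channel_num
    channels = [
        list(instances[i * per_channel:(i + 1) * per_channel])
        for i in range(channel_num)
    ]
    if not discard_remaining:
        leftovers = instances[per_channel * channel_num:]
        # There are always fewer leftovers than channels.
        for i, instance in enumerate(leftovers):
            channels[i].append(instance)
    return channels


print(split_instances(list(range(7)), 3, discard_remaining=True))
print(split_instances(list(range(7)), 3, discard_remaining=False))
```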
  7. 24 Feb, 2020 1 commit
  8. 23 Feb, 2020 1 commit
  9. 22 Feb, 2020 1 commit
  10. 21 Feb, 2020 1 commit
  11. 18 Feb, 2020 1 commit
  12. 17 Feb, 2020 1 commit
  13. 15 Feb, 2020 1 commit
  14. 14 Feb, 2020 1 commit
  15. 13 Feb, 2020 2 commits
  16. 12 Feb, 2020 1 commit
  17. 11 Feb, 2020 5 commits
    • H
      Paddlebox about box_wrapper (#22497) · 1a7962be
      hutuxian committed
      Refine the PaddleBox framework. Main changes:
      * Add MetricMsg util class, which can calculate metrics like AUC, bucket_error, COPC.
      * Replace FeedPass with new interface: BeginFeedPass & EndFeedPass
      * Refactor Pull/Push Sparse Function in box_wrapper.
      * Use a CUDA kernel to copy keys and feasigns between tensors and the boxps struct.
      * Cache the keys copied during pull sparse so that they can be reused during the push phase.
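Two of the metrics named above, AUC and COPC, can be computed as in the following sketch. This is plain Python written for illustration and is not the MetricMsg implementation; it assumes binary labels (0 or 1), uses the simple pairwise definition of AUC, and takes COPC to be the ratio of observed positives to the sum of predicted probabilities. bucket_error is left out because the commit message does not define it.

```python
def auc(labels, scores):
    """Pairwise AUC: chance that a positive outranks a negative."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


def copc(labels, scores):
    """Observed positives divided by the sum of predictions."""
    return sum(labels) / sum(scores)


labels = [1, 0, 1, 0]
scores = [0.9, 0.1, 0.8, 0.3]
print(auc(labels, scores), copc(labels, scores))
```

The pairwise loop is quadratic in the number of samples, so it is only meant for small inputs; production code normally buckets or sorts the scores first.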
    • Y
      multi-loss optimization by adding a DownpourOpt worker (#22025) · 2235ee1a
      yaoxuefeng committed
      * update
      
      * update test=develop
      
      * update compile set test=develop
      
      * update compile set test=develop
      
      * update test=develop
      
      * update test=develop
      
      * update test=develop
      
      * update compile setting test=develop
      
      * update compile setting test=develop
      
      * update run demo test=develop
      
      * update test=develop
      
      * update test=develop
      
      * fix test=develop
      
      * update test=develop
      
      * update test=develop
      
      * update test=develop
      
      * update test=develop
      
      * update test=develop
      
      * update test=develop
      
      * update test=develop
      
      * update test=develop
      
      * update test=develop
      
      * update format test=develop
      
      * update format test=develop
      
      * update style test=develop
      
      * update style test=develop
      
      * change style test=develop
      
      * change style test=develop
      
      * change style test=develop
      
      * add dataset unittest test=develop
      
      * update test=develop
      
      * update for record test=develop
      
      * update style for record test=develop
      
      * update for record test=develop
      
      * update for record test=develop
      
      * update for record test=develop
      
      * fix format test=develop
      
      * update test=develop
      
      * update test=develop
      
      * update test=develop
      
      * update test=develop
      
      * update test=develop
    • Z
      Improve transpose performance with tile sm copy, test=develop (#22311) · 54970444
      zhaoyuchen2018 committed
      
      * Refine code, fix select tile error,test=develop
      
      * Refine element type and some comments, test=develop
      
      * Refine comments and gpu utils, test=develop
      
      * Remove some useless condition
      
      * Refine floor and ceil, test=develop
      
      * refine for loop. test=develop
      Signed-off-by: zhaoyuchen <zhaoyuchen01@baidu.com>
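The tiling idea behind the transpose commit above can be sketched in plain Python. This is not the CUDA kernel from the commit: the local `buffer` list merely plays the role that shared memory plays on a GPU, and the function name and default tile size are assumptions made for illustration.

```python
def tiled_transpose(matrix, tile=2):
    """Transpose a list-of-lists matrix one tile at a time."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    out = [[None] * rows for _ in range(cols)]
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            # Stage one tile in a small buffer (stand-in for shared memory).
            buffer = [
                matrix[r][c0:min(c0 + tile, cols)]
                for r in range(r0, min(r0 + tile, rows))
            ]
            # Write the staged tile back with row and column swapped.
            for i, tile_row in enumerate(buffer):
                for j, value in enumerate(tile_row):
                    out[c0 + j][r0 + i] = value
    return out


print(tiled_transpose([[1, 2, 3], [4, 5, 6]], tile=2))
```

On a GPU, the benefit of staging a tile is that both the read from the input and the write to the output can touch contiguous memory; the Python version only shows the index arithmetic, including the partial tiles at the edges.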
    • W
      Compile without nccl deps. [1/2] (#22509) · a90fa540
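      Wilber committed
      Support compiling without the NCCL dependency. [1/2]
      
      With multiple cards, if the build did not enable the WITH_NCCL switch, the cards cannot communicate with each other, so only a single card can be selected for use.
      Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>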
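The fallback described in this commit message can be pictured with a short sketch. The function name `usable_devices` and the choice of keeping the first requested card are assumptions made for illustration; the commit message only says that a single card can be used when the build lacks NCCL.

```python
def usable_devices(requested_devices, with_nccl):
    """Return the devices that can actually be used.

    Without NCCL the cards cannot communicate, so at most one card
    is kept (here the first one, which is an assumption).
    """
    devices = list(requested_devices)
    if not with_nccl and len(devices) > 1:
        return devices[:1]
    return devices


print(usable_devices([0, 1, 2], with_nccl=False))
print(usable_devices([0, 1, 2], with_nccl=True))
```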
    • G
      Make assign op support LoDTensorArray and modify while_loop API (#22309) · 3a59a7a1
      guofei committed
      This PR makes the assign op support LoDTensorArray and enables the loop_vars in
      while_loop to be a tuple or a list.
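What it means for loop_vars to accept a tuple or a list can be shown with a pure-Python analogue. This sketch only mimics the calling convention with ordinary Python values; it is not Paddle's while_loop, which builds graph operations rather than running an eager loop.

```python
def while_loop(cond, body, loop_vars):
    """Eager analogue: repeat body while cond holds.

    loop_vars may be a list or a tuple; the result has the same type.
    """
    if not isinstance(loop_vars, (list, tuple)):
        raise TypeError("loop_vars must be a list or a tuple")
    if len(loop_vars) == 0:
        raise ValueError("loop_vars must not be empty")
    container = type(loop_vars)
    state = list(loop_vars)
    while cond(*state):
        result = body(*state)
        if not isinstance(result, (list, tuple)):
            result = [result]  # a single returned value is wrapped
        if len(result) != len(state):
            raise ValueError("body must return one value per loop var")
        state = list(result)
    return container(state)


# Sum 0 + 1 + 2 with a counter and an accumulator.
print(while_loop(lambda i, s: i < 3, lambda i, s: (i + 1, s + i), (0, 0)))
```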
  18. 07 Feb, 2020 1 commit
    • Y
      Enable the detection of subgraph composed of grad ops (#21223) · dcfb6038
      Yiqun Liu committed
      * Add the first implementation of fusion_group op #19621 (#3)
      
      * Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
      test=develop
      
      * Call CUDA driver api to launch the kernel compiled by nvrtc.
      test=develop
      
      * Disable for mac and windows.
      test=develop
      
      * Refine the codes to support manually specified num_threads and workload_per_thread.
      test=develop
      
      * Refine the CUDA kernel to support large dims.
      test=develop
      
      * Add DeviceCodePool to manage all device codes.
      
      * Add the first implementation of the fusion_group op.
      
      * Add unit-test for fusion_group op.
      
      * Add the check of result.
      
      * Add the check of nvrtc in unit-test.
      test=develop
      
      * Add comment to explain the inputs, outputs and features of fusion_group op.
      test=develop
      
      * Disable fusion_group op for mac and windows.
      test=develop
      
      * Make the compiling of device code return status instead of hanging up.
      test=develop
      
      * Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.
      
      * Unify fusion_group_op's input and output names.
      test=develop
      
      * Add the check of CUDA driver library in unittest.
      test=develop
      
      * Enable generating code for a given subgraph. #21126 (#4)
      
      * Enable generating code for a given subgraph.
      
      * Support sorting the subgraph.
      
      * Remove the rearrangement of expressions because we use the sorted subgraph directly.
      
      * Enable generating code for a subgraph which is composed of grad ops.
      
      * Use expression information to check the accuracy in unittest.
      
      * Separate load and store from computation expressions.
      test=develop
      
      * Improve the loading statements in generated codes.
      test=develop
      
      * Remove unused arguments from formal list.
      test=develop
      
      * Enable the detection of subgraph of grad ops.
      
      * Generate code for detected subgraph in fusion_group_pass.
      
      * Add an option in BuildStrategy to enable fusion_group_pass and add unittest.
      test=develop
      
      * Fix a bug when checking whether the shapes of all inputs are the same.
      
      * Add debug information.
      
      * Move subgraph_detector from inference/analysis to the common framework/ir directory. (#5)
      
      test=develop
      
      * Call subgraph_detector in fusion_group pass.
      test=develop
      
      * Disable fusion_group when WITH_GPU is OFF.
      test=develop
      
      * Refine all PADDLE_ENFORCE message.
      test=develop
      
      * Fix the case that some inputs are not defined in grad ops, and set op_role for fused op.
      test=develop
      
      * Follow review comments.
      test=develop
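The detection step that this commit builds on can be illustrated with a heavily simplified sketch. The real pass works on a graph; here the program is reduced to a linear list of op type names, and a group is any run of at least two consecutive fusible ops. The set `FUSIBLE_OPS`, the function name `detect_groups` and the minimum size are assumptions made for illustration.

```python
# Op types treated as fusible in this sketch (an assumption).
FUSIBLE_OPS = {"elementwise_add", "elementwise_mul", "relu", "sigmoid"}


def detect_groups(op_types, min_size=2):
    """Return index lists of consecutive fusible ops."""
    groups = []
    current = []
    for index, op_type in enumerate(op_types):
        if op_type in FUSIBLE_OPS:
            current.append(index)
            continue
        # A non-fusible op closes the current run.
        if len(current) >= min_size:
            groups.append(current)
        current = []
    if len(current) >= min_size:
        groups.append(current)
    return groups


ops = ["mul", "elementwise_add", "relu", "softmax", "sigmoid"]
print(detect_groups(ops))
```

A run of length one is skipped because fusing a single op into a generated kernel would gain nothing.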
  19. 06 Feb, 2020 1 commit
  20. 05 Feb, 2020 1 commit
  21. 04 Feb, 2020 2 commits
  22. 02 Feb, 2020 1 commit
  23. 31 Jan, 2020 1 commit
    • M
      [DNNL] Fix accuracy in INT8 FC (#22404) · 269db0d1
      Michał Gallus committed
      * Enable quantize to reorder to nchw as well
      
      * Correct FC MKL-DNN input dim requirements to accept 3D
      
      * Improve DNNL FC format, error and 3D input handling
      
      test=develop
      
      * Improve error checking in FC
      
      test=develop
      
      * Improve PADDLE_ENFORCE messages in fc-related files
      
      * Remove data layout attribute from obligatory pass args
      
      test=develop
      
      * Fix message in fc_mkldnn_pass to be logically correct
      
      test=develop
  24. 25 Jan, 2020 1 commit
  25. 19 Jan, 2020 1 commit
  26. 17 Jan, 2020 2 commits
    • Y
      Implement a common python unittest to test the ir passes. (#22209) · b7cac50b
      Yiqun Liu committed
      * Implement a common python unittest to test the ir passes.
      test=develop
      
      * Save the results in np.array and support starting up on CPU.
      test=develop
      
      * Fix the unittest.
      test=develop
      
      * Add check_program to check whether the optimized program is different from the original one.
      test=develop
      
      * Remove the interface all_ops.
      test=develop
      
      * Add exception test in pass_test.
      test=develop
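The two checks such a pass test performs (the pass changed the program, and the results stayed the same) can be sketched with a toy program representation. Everything below is hypothetical: programs are lists of `(op, operand)` pairs, `fuse_adds` is an invented example pass, and none of the names come from the actual unittest.

```python
def evaluate(program, x):
    """Run a toy program of ("add", k) / ("mul", k) ops on x."""
    for op, operand in program:
        if op == "add":
            x = x + operand
        elif op == "mul":
            x = x * operand
        else:
            raise ValueError("unknown op: " + op)
    return x


def fuse_adds(program):
    """Toy pass: merge consecutive add ops into one."""
    fused = []
    for op, operand in program:
        if op == "add" and fused and fused[-1][0] == "add":
            fused[-1] = ("add", fused[-1][1] + operand)
        else:
            fused.append((op, operand))
    return fused


def check_pass(program, ir_pass, x):
    """Return (program_changed, outputs_match) for one input."""
    optimized = ir_pass(program)
    changed = optimized != list(program)
    same_output = evaluate(program, x) == evaluate(optimized, x)
    return changed, same_output


program = [("add", 1), ("add", 2), ("mul", 3)]
print(check_pass(program, fuse_adds, 5))
```

Checking that the program actually changed guards against a test that passes only because the pass silently did nothing.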
    • T
      integrated HALF_ASYNC to communicator (#21869) · 82bc814a
      tangwei12 committed
      * add half_async in the communicator
      * fix DistributedStrategy
  27. 16 Jan, 2020 2 commits
  28. 15 Jan, 2020 1 commit
  29. 14 Jan, 2020 3 commits