1. 25 Mar, 2020 1 commit
  2. 19 Mar, 2020 1 commit
  3. 18 Mar, 2020 1 commit
  4. 13 Mar, 2020 1 commit
  5. 12 Mar, 2020 1 commit
  6. 07 Mar, 2020 2 commits
  7. 04 Mar, 2020 1 commit
    • Add flags to limit gpu memory (#22793) · d41d802b
      Zeng Jinle committed
      * add recorded cuda memory apis, fix typo, test=develop
      
      * add more ut, test=develop
      
      * follow comments, test=develop
      
      * fix py35 incompatible issues, test=develop
  8. 03 Mar, 2020 1 commit
  9. 02 Mar, 2020 2 commits
  10. 26 Feb, 2020 1 commit
  11. 25 Feb, 2020 1 commit
  12. 24 Feb, 2020 1 commit
  13. 23 Feb, 2020 1 commit
  14. 21 Feb, 2020 1 commit
  15. 19 Feb, 2020 1 commit
  16. 18 Feb, 2020 1 commit
  17. 14 Feb, 2020 2 commits
  18. 10 Feb, 2020 1 commit
  19. 07 Feb, 2020 1 commit
  20. 06 Feb, 2020 1 commit
  21. 05 Feb, 2020 1 commit
  22. 31 Jan, 2020 1 commit
    • [DNNL] Fix accuracy in INT8 FC (#22404) · 269db0d1
      Michał Gallus committed
      * Enable quantize to reorder to nchw as well
      
      * Correct FC MKL-DNN input dim requirements to accept 3D
      
      * Improve DNNL FC format, error and 3D input handling
      
      test=develop
      
      * Improve error checking in FC
      
      test=develop
      
      * Improve PADDLE_ENFORCE messages in fc-related files
      
      * Remove data layout attribute from obligatory pass args
      
      test=develop
      
      * Fix message in fc_mkldnn_pass to be logically correct
      
      test=develop
  23. 10 Jan, 2020 1 commit
  24. 09 Jan, 2020 3 commits
  25. 08 Jan, 2020 2 commits
  26. 07 Jan, 2020 2 commits
  27. 06 Jan, 2020 3 commits
  28. 05 Jan, 2020 1 commit
  29. 03 Jan, 2020 1 commit
    • Add the first implementation of fusion_group op (#19621) · d4832077
      Yiqun Liu committed
      * Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
      test=develop
      
      * Call CUDA driver api to launch the kernel compiled by nvrtc.
      test=develop
      
      * Disable for mac and windows.
      test=develop
      
      * Refine the codes to support manually specified num_threads and workload_per_thread.
      test=develop
      
      * Refine the CUDA kernel to support large dims.
      test=develop
      
      * Add DeviceCodePool to manage all device codes.
      
      * Add the first implementation of fusion_group op.
      
      * Add unit-test for fusion_group op.
      
      * Add the check of result.
      
      * Add the check of nvrtc in unit-test.
      test=develop
      
      * Add comment to explain the inputs, outputs and features of fusion_group op.
      test=develop
      
      * Disable fusion_group op for mac and windows.
      test=develop
      
      * Make the compiling of device code return status instead of hanging up.
      test=develop
      
      * Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.
      
      * Unify fusion_group_op's input and output names.
      test=develop
      
      * Add the check of CUDA driver library in unittest.
      test=develop
      
      * Refine the calling of PADDLE_ENFORCE.
      test=develop
  30. 01 Jan, 2020 1 commit
  31. 30 Dec, 2019 1 commit