1. 09 Jul, 2020 1 commit
  2. 03 Jan, 2020 1 commit
    • Add the first implementation of fusion_group op (#19621) · d4832077
      Committed by Yiqun Liu
      * Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
      test=develop
      
      * Call CUDA driver api to launch the kernel compiled by nvrtc.
      test=develop
      
      * Disable for mac and windows.
      test=develop
      
      * Refine the codes to support manually specified num_threads and workload_per_thread.
      test=develop
      
      * Refine the CUDA kernel to support large dims.
      test=develop
      
      * Add DeviceCodePool to manage all device codes.
      
      * Add the first implementation of fusion_group op.
      
      * Add unit-test for fusion_group op.
      
      * Add the check of result.
      
      * Add the check of nvrtc in unit-test.
      test=develop
      
      * Add comment to explain the inputs, outputs and features of fusion_group op.
      test=develop
      
      * Disable fusion_group op for mac and windows.
      test=develop
      
      * Make the compiling of device code return a status instead of hanging.
      test=develop
      
      * Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.
      
      * Unify fusion_group_op's input and output names.
      test=develop
      
      * Add the check of CUDA driver library in unittest.
      test=develop
      
      * Refine the calling of PADDLE_ENFORCE.
      test=develop
  3. 05 Sep, 2019 1 commit
    • Integrate NVRTC to support compiling CUDA kernel at runtime (#19422) · 42b5bec6
      Committed by Yiqun Liu
      * Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
      test=develop
      
      * Call CUDA driver api to launch the kernel compiled by nvrtc.
      test=develop
      
      * Disable for mac and windows.
      test=develop
      
      * Refine the codes to support manually specified num_threads and workload_per_thread.
      test=develop
      
      * Refine the CUDA kernel to support large dims.
      test=develop
  4. 20 Jun, 2018 1 commit
  5. 08 Apr, 2018 1 commit
  6. 12 Feb, 2018 1 commit
  7. 10 Feb, 2018 2 commits
  8. 26 Dec, 2017 1 commit
  9. 15 Dec, 2017 1 commit
  10. 24 Oct, 2017 1 commit
    • Feature/nccl dso (#5001) · 43c6ff21
      Committed by Yu Yang
      * "add nccl enforce"
      
      * Dev
      
      * Update comment
      
      * Add nccl test
      
      * Follow comments
  11. 20 Aug, 2017 2 commits
  12. 16 Aug, 2017 1 commit
  13. 01 Aug, 2017 1 commit
  14. 26 Jul, 2017 2 commits
  15. 25 Jul, 2017 1 commit
  16. 17 Jul, 2017 2 commits
  17. 11 Jul, 2017 2 commits
  18. 06 Jul, 2017 2 commits
  19. 05 Jul, 2017 1 commit
  20. 04 Jul, 2017 4 commits
  21. 03 Jul, 2017 1 commit
  22. 28 Jun, 2017 2 commits