    New whl release strategy with pruned nv_fatbin (#35239) · 2f3b393d
    Committed by Zhanlue Yang
    [Background]
    Growth in code size tends to be irreversible in the long run, leading to huge release packages that
    not only hamper the user experience but also exceed PyPI's hard size limit.
    
    In particular, the NV_FATBIN section takes up 86% of the compiled dylib size, owing to the large
    number of GPU arches supported.
    
    This PR aims to prune the NV_FATBIN section.
    
    [Solution]
    In the new release strategy, two types of whl packages will be involved:
    
    Cubin PIP package:
    The PIP package supports a narrower window of GPU arches. It contains
    sm_60, sm_70, sm_75 and sm_80 cubins, covering the Pascal through Ampere arches.
    
    JIT release package:
    This is a backup for the Cubin PIP package. It contains compute_35, compute_50, compute_60,
    compute_70, compute_75 and compute_80, and offers the best performance and the widest GPU arch coverage.
    
    However, it takes around 10 minutes to install because of the JIT compilation.
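
The difference between the two packages can be illustrated with a minimal Python sketch that builds the nvcc `-gencode` flags each one would roughly correspond to. The arch lists come from the description above; the function and variable names are hypothetical and are not taken from cuda.cmake, and the actual flags the build generates may differ.

```python
from typing import List

# Hypothetical names for illustration only; not taken from cuda.cmake.
CUBIN_ARCHS = ["60", "70", "75", "80"]            # sm_XX targets of the Cubin PIP package
JIT_ARCHS = ["35", "50", "60", "70", "75", "80"]  # compute_XX targets of the JIT package


def gencode_flags(package: str) -> List[str]:
    """Return nvcc -gencode flags for a package type ("cubin" or "jit")."""
    if package == "cubin":
        # code=sm_XX embeds precompiled machine code (cubin) for a real arch.
        return [f"-gencode=arch=compute_{a},code=sm_{a}" for a in CUBIN_ARCHS]
    if package == "jit":
        # code=compute_XX embeds PTX for a virtual arch, which is
        # JIT-compiled later for the GPU that is actually present.
        return [f"-gencode=arch=compute_{a},code=compute_{a}" for a in JIT_ARCHS]
    raise ValueError(f"unknown package type: {package!r}")


if __name__ == "__main__":
    for kind in ("cubin", "jit"):
        print(kind, " ".join(gencode_flags(kind)))
```

Under this reading, the Cubin package is smaller because it ships machine code for only four arches, while the JIT package ships PTX only and defers compilation, which is consistent with the long install time noted above.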
    
    [How to use]
    The new release strategy is disabled by default.
    To build the Cubin PIP package, add this to the cmake command: -DCUBIN_RELEASE_PIP
    To build the JIT release package, add this to the cmake command: -DJIT_RELEASE_WHL
cuda.cmake 10.1 KB