1. 31 Aug, 2021 5 commits
    • New whl release strategy with pruned nv_fatbin (#35239) · 2f3b393d
      Zhanlue Yang committed
      [Background]
      Code size growth is hard to reverse in the long run and leads to huge release packages, which
      not only hamper the user experience but also exceed PyPI's hard size limit.
      
      In particular, the NV_FATBIN section takes up 86% of the compiled dylib size, owing to the large
      number of GPU architectures supported.
      
      This PR aims to prune the NV_FATBIN section.
      
      [Solution]
      The new release strategy involves two types of whl packages:
      
      Cubin PIP package:
      This package supports a narrower window of GPU architectures. It contains
      sm_60, sm_70, sm_75, and sm_80 cubins, covering the Pascal through Ampere architectures.
      
      JIT release package:
      This is a fallback for the Cubin PIP package. It contains compute_35, compute_50, compute_60,
      compute_70, compute_75, and compute_80, offering the best performance and GPU architecture coverage.
      
      However, it takes around 10 minutes to install because of JIT compilation.
      
      [How to use]
      The new release strategy is disabled by default.
      To build the Cubin PIP package, pass this option to cmake: -DCUBIN_RELEASE_PIP
      To build the JIT release package, pass this option to cmake: -DJIT_RELEASE_WHL
    • update infer trt ut. (#35261) · 96e7d903
      Wilber committed
    • support fuse layers for ptq (#35015) · ef536250
      XGZhang committed
    • NPU add elementwise_mod (#35245) · 561841d2
      Aganlengzi committed
    • NPU add fill_zeros_like kernel (#35246) · aaaa9965
      Aganlengzi committed
  2. 30 Aug, 2021 3 commits
  3. 29 Aug, 2021 1 commit
  4. 27 Aug, 2021 31 commits
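
The two package types described in the #35239 commit message differ in what nvcc embeds into the fatbin: precompiled cubins (sm_XX) versus PTX (compute_XX) that the driver JIT-compiles on first use. The sketch below is an illustration only, not the PR's actual CMake logic. It shows how the two architecture lists from the commit message could map onto standard nvcc `-gencode` flags. The helper name `gencode_flags` and the assumption that the JIT package embeds PTX only are hypothetical.

```python
# Architecture lists taken from the commit message of #35239.
CUBIN_ARCHS = [60, 70, 75, 80]          # sm_XX: precompiled machine code (cubin)
JIT_ARCHS = [35, 50, 60, 70, 75, 80]    # compute_XX: PTX, JIT-compiled by the driver


def gencode_flags(archs, embed_ptx):
    """Build nvcc -gencode flags (hypothetical helper, not part of the PR).

    embed_ptx=False -> code=sm_XX, which embeds a cubin for that architecture.
    embed_ptx=True  -> code=compute_XX, which embeds PTX for later JIT compilation.
    """
    target = "compute" if embed_ptx else "sm"
    return [f"-gencode arch=compute_{a},code={target}_{a}" for a in archs]


# Assumption: the Cubin PIP package carries cubins only, the JIT package PTX only.
cubin_flags = gencode_flags(CUBIN_ARCHS, embed_ptx=False)
jit_flags = gencode_flags(JIT_ARCHS, embed_ptx=True)

print(" ".join(cubin_flags))
print(" ".join(jit_flags))
```

A cubin runs only on GPUs of the same major architecture it was built for, which is why the Cubin PIP package covers a fixed window, while PTX can be compiled by the driver for newer architectures at the cost of the install-time delay noted in the commit message.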