    Add launch bounds to limit register usage on the Volta architecture (#38113) · 18a59822
    zlsh80826 committed
    According to --ptxas-options=-v, SegmentOpsKernel uses 66 registers per thread.
    There are two ways to resolve this problem:
        Reduce the number of threads per block in the launch configuration.
        Add __launch_bounds__ so that nvcc has the information it needs to reduce register usage.
    This PR uses the __launch_bounds__ approach because changing gpu_launch_config may affect other ops.
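
    The __launch_bounds__ approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: ScaleKernel and kMaxThreadsPerBlock are hypothetical names, and the bound of 1024 is an assumed value, since the commit message does not state the block size actually used.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed upper bound on threads per block (hypothetical; the value used by
// the PR is not given in the commit message).
constexpr int kMaxThreadsPerBlock = 1024;

// __launch_bounds__ promises nvcc that this kernel is never launched with more
// than kMaxThreadsPerBlock threads per block, so the compiler can limit the
// number of registers per thread to keep such a launch valid.
__global__ void __launch_bounds__(kMaxThreadsPerBlock)
ScaleKernel(const float* in, float* out, float alpha, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = alpha * in[i];
}

int main() {
  const int n = 1 << 20;
  float* in = nullptr;
  float* out = nullptr;
  cudaMalloc(&in, n * sizeof(float));
  cudaMalloc(&out, n * sizeof(float));
  cudaMemset(in, 0, n * sizeof(float));

  // Report the register count the compiler settled on for this kernel.
  cudaFuncAttributes attr;
  cudaFuncGetAttributes(&attr, ScaleKernel);
  printf("registers per thread: %d\n", attr.numRegs);

  // Launch with the largest block size allowed by the declared bound.
  const int threads = kMaxThreadsPerBlock;
  const int blocks = (n + threads - 1) / threads;
  ScaleKernel<<<blocks, threads>>>(in, out, 2.0f, n);

  // A kernel that needs too many registers for its block size fails at launch.
  cudaError_t err = cudaGetLastError();
  if (err == cudaSuccess) err = cudaDeviceSynchronize();
  printf("launch status: %s\n", cudaGetErrorString(err));

  cudaFree(in);
  cudaFree(out);
  return err == cudaSuccess ? 0 : 1;
}
```

    Compiling a file like this with nvcc and --ptxas-options=-v prints the per-thread register count at build time, which is how the figure quoted in the commit message was obtained. The alternative of lowering the block size would instead change the launch configuration shared with other kernels.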
segment_pooling.cu 16.4 KB