    net-zerocopy: Defer vm zap unless actually needed. · 94ab9eb9
    Committed by Arjun Roy
    Zapping pages is required only if we are calling vm_insert_page into a
    region where pages had previously been mapped. Receive zerocopy allows
    reusing such regions, and hitherto called zap_page_range() before
    calling vm_insert_page() in that range.
    
    zap_page_range() can also be triggered from userspace with
    madvise(MADV_DONTNEED). If userspace is configured to call this before
    reusing a segment, or if there was nothing mapped at this virtual
    address to begin with, we can avoid calling zap_page_range() under the
    socket lock. That said, if userspace does not do that, then we are
    still responsible for calling zap_page_range().
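
As an illustration of the userspace side described above, here is a minimal sketch of releasing a region with madvise(MADV_DONTNEED) before it is reused. The helper names release_region() and demo_release() are hypothetical, and the anonymous private mapping only stands in for the region a receive-zerocopy application would map on its TCP socket.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: drop whatever pages currently back
 * [addr, addr + len) so the range is empty before it is reused.
 * addr must be page-aligned. Returns 0 on success, -1 with errno set. */
static int release_region(void *addr, size_t len)
{
    return madvise(addr, len, MADV_DONTNEED);
}

/* Demonstration on an anonymous private mapping (an assumption made so
 * the sketch runs without a socket). Returns the first byte read back
 * after the release, or -1 on error. */
static int demo_release(void)
{
    size_t len = 16 * 4096;
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    memset(buf, 0xAB, len);          /* fault the pages in */

    if (release_region(buf, len) != 0) {
        munmap(buf, len);
        return -1;
    }

    int first = buf[0];              /* refaults a fresh zero page */
    munmap(buf, len);
    return first;
}
```

On Linux, an anonymous private page that has been released this way reads back as zero on the next access, which is how the sketch confirms the old pages are gone.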
    
    This patch adds a flag that the user can use to hint to the kernel
    that a zap is not required. If the flag is not set, or if an older
    user application does not have a flags field at all, then the kernel
    calls zap_page_range as before. Also, if the flag is set but a zap is
    still required, the kernel performs that zap as necessary. Thus
    incorrectly indicating that a zap can be avoided does not change the
    correctness of operation. This patch also increases the batch size for
    vm_insert_pages() and prefetches the page struct for each page in the
    batch, since we're about to bump the refcount.
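
The decision logic described above can be modelled with a small userspace toy. This is only a sketch under stated assumptions, not the kernel code: HINT_NO_ZAP_NEEDED, struct region, zap(), try_insert() and receive_into() are all hypothetical names, and page insertion is modelled as simply failing when the region still has pages mapped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the user-supplied hint bit. */
#define HINT_NO_ZAP_NEEDED 0x1u

/* Toy model of a receive region: whether pages are currently mapped,
 * and how many zaps have been performed on it. */
struct region {
    bool has_mappings;
    unsigned zaps;
};

/* Models the zap: clears existing mappings and counts the call. */
static void zap(struct region *r)
{
    r->has_mappings = false;
    r->zaps++;
}

/* Models page insertion: refuses to overwrite existing mappings. */
static bool try_insert(struct region *r)
{
    if (r->has_mappings)
        return false;
    r->has_mappings = true;
    return true;
}

/* Without the hint, zap up front as before. With the hint, try the
 * insert first and zap only if the hint turned out to be wrong. */
static bool receive_into(struct region *r, unsigned flags)
{
    if (!(flags & HINT_NO_ZAP_NEEDED))
        zap(r);

    if (try_insert(r))
        return true;

    zap(r);                 /* hint was incorrect: fall back */
    return try_insert(r);
}
```

In this model a correct hint on an empty region costs no zap, an incorrect hint costs exactly one, and omitting the hint always costs one, which mirrors the correctness argument in the text.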
    
    An alternative mechanism could be to not have a flag, assume by
    default a zap is not needed, and fall back to zapping if needed.
    However, this would harm performance for older applications for which
    a zap is necessary, and thus we implement it with an explicit flag
    so newer applications can opt in.
    
    When using RPC-style traffic with medium-sized (tens of KB) RPCs, this
    change yields an efficiency improvement of about 30% for QPS/CPU usage.

    Signed-off-by: Arjun Roy <arjunroy@google.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
tcp.c 116.1 KB