    net: avoid double accounting for pure zerocopy skbs · f1a456f8
    Committed by Talal Ahmad
    Track skbs with only zerocopy data and avoid charging them to kernel
    memory to correctly account the memory utilization for msg_zerocopy.
    All of the data in such skbs is held in user pages which are already
    accounted to user. Before this change, they are charged again in
    kernel in __zerocopy_sg_from_iter. The charging in kernel is
    excessive because data is not being copied into skb frags. This
    excessive charging can lead to kernel going into memory pressure
    state which impacts all sockets in the system adversely. Mark pure
    zerocopy skbs with a SKBFL_PURE_ZEROCOPY flag and remove
    charge/uncharge for data in such skbs.
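The accounting rule described above can be sketched as a small userland model. This is only an illustration under stated assumptions: `zc_sock`, `zc_skb` and `zc_append_zerocopy` are hypothetical stand-ins, not kernel types or functions, and the boolean field merely models the SKBFL_PURE_ZEROCOPY flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for struct sock and struct sk_buff; not the kernel types. */
struct zc_sock {
    long charged;        /* bytes charged to kernel socket memory */
};

struct zc_skb {
    size_t len;          /* payload bytes referenced by the skb */
    bool pure_zerocopy;  /* models the SKBFL_PURE_ZEROCOPY flag */
};

/*
 * Append zerocopy payload. An empty skb becomes pure zerocopy, and payload
 * added to a pure zerocopy skb is not charged, because the user pages are
 * already accounted to the user. A mixed skb is still charged.
 */
static void zc_append_zerocopy(struct zc_sock *sk, struct zc_skb *skb,
                               size_t bytes)
{
    assert(sk != NULL && skb != NULL);
    if (skb->len == 0)
        skb->pure_zerocopy = true;
    skb->len += bytes;
    if (!skb->pure_zerocopy)
        sk->charged += (long)bytes;
}
```

In this model, appending 4096 zerocopy bytes to an empty skb leaves `charged` untouched, while appending the same bytes to an skb that already holds copied data adds 4096 to it.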
    
    Initially, an skb is marked pure zerocopy when it is empty and in
    zerocopy path. skb can then change from a pure zerocopy skb to mixed
    data skb (zerocopy and copy data) if it is at tail of write queue and
    there is room available in it and non-zerocopy data is being sent in
    the next sendmsg call. At this time sk_mem_charge is done for the pure
    zerocopied data and the pure zerocopy flag is unmarked. We found that
    this happens very rarely on workloads that pass MSG_ZEROCOPY.
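The downgrade from pure zerocopy to mixed can be modeled in the same spirit. The sketch below assumes a simplified skb that tracks zerocopy and copied bytes separately; `dg_sock`, `dg_skb`, `dg_downgrade` and `dg_append_copy` are hypothetical names chosen for illustration and do not reflect the kernel's actual helpers or its truesize arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; not the kernel's struct sock / struct sk_buff. */
struct dg_sock {
    long charged;        /* bytes charged to kernel socket memory */
};

struct dg_skb {
    size_t zc_len;       /* bytes held in user pages (zerocopy) */
    size_t copy_len;     /* bytes copied into the skb */
    bool pure_zerocopy;  /* models the SKBFL_PURE_ZEROCOPY flag */
};

/* Charge the zerocopy bytes that were skipped so far and clear the flag. */
static void dg_downgrade(struct dg_sock *sk, struct dg_skb *skb)
{
    assert(sk != NULL && skb != NULL);
    if (skb->pure_zerocopy) {
        sk->charged += (long)skb->zc_len;
        skb->pure_zerocopy = false;
    }
}

/* Copy non-zerocopy data into the tail skb, downgrading it first. */
static void dg_append_copy(struct dg_sock *sk, struct dg_skb *skb,
                           size_t bytes)
{
    dg_downgrade(sk, skb);
    skb->copy_len += bytes;
    sk->charged += (long)bytes;
}
```

The deferred charge is applied once, at the moment the skb stops being pure, so later copies into the same skb only add their own bytes.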
    
    A pure zerocopy skb could otherwise be coalesced into a normal skb
    when the two are adjacent in the queue, but this patch prevents that
    coalescing. This avoids the complexity of charging when an skb
    downgrades from pure zerocopy to mixed. This case is also rare.
    
    In sk_wmem_free_skb, if the skb is pure zerocopy, an sk_mem_uncharge
    for SKB_TRUESIZE(MAX_TCP_HEADER) is done to balance the sk_mem_charge
    made in tcp_skb_entail for an skb without data.
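The balance between the charge at entail time and the uncharge at free time can be checked with a toy model. Everything here is an assumption made for illustration: `FR_HEADER_TRUESIZE` is a placeholder value rather than the real SKB_TRUESIZE(MAX_TCP_HEADER), and `fr_entail`, `fr_fill` and `fr_free` are hypothetical functions that only mimic the roles of tcp_skb_entail and sk_wmem_free_skb.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder; NOT the real value of SKB_TRUESIZE(MAX_TCP_HEADER). */
#define FR_HEADER_TRUESIZE 512L

struct fr_sock {
    long charged;        /* bytes charged to kernel socket memory */
};

struct fr_skb {
    long truesize;       /* header overhead plus payload bytes */
    bool pure_zerocopy;  /* models the SKBFL_PURE_ZEROCOPY flag */
};

/* Queue an empty skb: only the header overhead is charged. */
static void fr_entail(struct fr_sock *sk, struct fr_skb *skb)
{
    skb->truesize = FR_HEADER_TRUESIZE;
    skb->pure_zerocopy = false;
    sk->charged += FR_HEADER_TRUESIZE;
}

/* Fill an empty skb with payload of a single kind (simplification). */
static void fr_fill(struct fr_sock *sk, struct fr_skb *skb, long bytes,
                    bool zerocopy)
{
    assert(skb->truesize == FR_HEADER_TRUESIZE);
    skb->pure_zerocopy = zerocopy;
    skb->truesize += bytes;
    if (!zerocopy)
        sk->charged += bytes;  /* copied data is charged to the kernel */
}

/* Free the skb: uncharge exactly what was charged for it. */
static void fr_free(struct fr_sock *sk, struct fr_skb *skb)
{
    if (skb->pure_zerocopy)
        sk->charged -= FR_HEADER_TRUESIZE;  /* header charge only */
    else
        sk->charged -= skb->truesize;       /* header plus payload */
    skb->truesize = 0;
    skb->pure_zerocopy = false;
}
```

In both the pure zerocopy and the copied case, `charged` returns to its starting value after `fr_free`, which is the invariant the header-only uncharge is meant to preserve.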
    
    Testing with the msg_zerocopy.c benchmark between two hosts (100G NICs)
    with zerocopy showed that before this patch the 'sock' variable in
    memory.stat for cgroup2 that tracks sum of sk_forward_alloc,
    sk_rmem_alloc and sk_wmem_queued is around 1822720 and with this
    change it is 0. This is due to no charge to sk_forward_alloc for
    zerocopy data and shows memory utilization for kernel is lowered.
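To repeat this measurement, the 'sock' value has to be extracted from the cgroup's memory.stat contents. The helper below is a minimal sketch that assumes the usual one "key value" pair per line layout; `stat_lookup` is a hypothetical name, and the sample text in the usage note is made up except for the 1822720 figure quoted above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Find "<key> <value>" at the start of a line in memory.stat-style text.
 * Returns 0 and stores the value in *out on success, -1 if the key is
 * absent or its value cannot be parsed.
 */
static int stat_lookup(const char *text, const char *key,
                       unsigned long long *out)
{
    assert(text != NULL && key != NULL && out != NULL);
    size_t klen = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        /* Require a space after the key so "sock" does not match "sockx". */
        if (strncmp(line, key, klen) == 0 && line[klen] == ' ')
            return sscanf(line + klen + 1, "%llu", out) == 1 ? 0 : -1;
        line = strchr(line, '\n');
        if (line != NULL)
            line++;  /* step past the newline to the next line */
    }
    return -1;
}
```

Given the text "anon 4096\nsock 1822720\n", looking up "sock" stores 1822720 in `*out`; in practice the text would first be read from the memory.stat file of the cgroup under test.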

    Signed-off-by: Talal Ahmad <talalahmad@google.com>
    Acked-by: Arjun Roy <arjunroy@google.com>
    Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
    Signed-off-by: Willem de Bruijn <willemb@google.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
tcp_output.c 118.2 KB