1. 16 Nov, 2021 2 commits
    • E
      tcp: defer skb freeing after socket lock is released · f35f8219
      Committed by Eric Dumazet
      tcp recvmsg() (or rx zerocopy) spends a fair amount of time
      freeing skbs after their payload has been consumed.
      
      A typical ~64KB GRO packet has to release ~45 page
      references, eventually going to page allocator
      for each of them.
      
      Currently, this freeing is performed while the socket lock
      is held, meaning that there is a high chance that the
      BH handler has to queue incoming packets to the TCP socket backlog.
      
      This can cause additional latencies, because the user
      thread has to process the backlog at release_sock() time,
      and while doing so, additional frames can be added
      by BH handler.
      
      This patch adds logic to defer these frees until after the socket
      lock is released, or to perform them directly from the BH handler
      if possible.
      
      Being able to free these skbs from the BH handler helps a lot,
      because this avoids the usual alloc/free asymmetry
      when the BH handler and user thread do not run on the same CPU or
      NUMA node.
      
      One CPU can now be fully utilized for the kernel->user copy,
      while another CPU handles BH processing and skb/page
      allocs/frees (assuming RFS is not forcing use of a single CPU).
      
      Tested:
       100Gbit NIC
       Max throughput for one TCP_STREAM flow, over 10 runs
      
      MTU : 1500
      Before: 55 Gbit
      After:  66 Gbit
      
      MTU : 4096+(headers)
      Before: 82 Gbit
      After:  95 Gbit
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
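
The deferral idea described in this commit can be illustrated with a small userspace analogy. This is only a sketch under stated assumptions, not the kernel code: `struct buf`, `struct toy_sock`, and every `toy_*` function are hypothetical names, a pthread mutex stands in for the socket lock, and the counters exist purely so the effect can be observed. With `defer` set, consumed buffers are parked on a list while the lock is held and freed only after it is released.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an skb: a buffer on a singly linked list. */
struct buf {
    struct buf *next;
    size_t len;
};

/* Hypothetical stand-in for a socket. */
struct toy_sock {
    pthread_mutex_t lock;
    bool locked;               /* tracked only so frees can be classified */
    struct buf *rx_head;       /* buffers waiting to be consumed */
    struct buf *defer_head;    /* consumed buffers whose free is postponed */
    size_t frees_under_lock;
    size_t frees_after_unlock;
};

static void toy_sock_init(struct toy_sock *sk)
{
    pthread_mutex_init(&sk->lock, NULL);
    sk->locked = false;
    sk->rx_head = NULL;
    sk->defer_head = NULL;
    sk->frees_under_lock = 0;
    sk->frees_after_unlock = 0;
}

/* Queue one buffer of the given length; returns 0 on success, -1 on OOM. */
static int toy_enqueue(struct toy_sock *sk, size_t len)
{
    struct buf *b = malloc(sizeof(*b));

    if (!b)
        return -1;
    b->len = len;
    pthread_mutex_lock(&sk->lock);
    b->next = sk->rx_head;
    sk->rx_head = b;
    pthread_mutex_unlock(&sk->lock);
    return 0;
}

/* Free a buffer and record whether the lock was held at that moment. */
static void toy_free(struct toy_sock *sk, struct buf *b)
{
    if (sk->locked)
        sk->frees_under_lock++;
    else
        sk->frees_after_unlock++;
    free(b);
}

/* Consume all queued buffers; returns the number of bytes "copied". */
static size_t toy_recv(struct toy_sock *sk, bool defer)
{
    struct buf *b, *to_free;
    size_t copied = 0;

    pthread_mutex_lock(&sk->lock);
    sk->locked = true;
    while ((b = sk->rx_head) != NULL) {
        sk->rx_head = b->next;
        copied += b->len;
        if (defer) {
            b->next = sk->defer_head;   /* park it, do not free yet */
            sk->defer_head = b;
        } else {
            toy_free(sk, b);            /* old behaviour: free under lock */
        }
    }
    assert(sk->rx_head == NULL);
    to_free = sk->defer_head;           /* detach the list before unlocking */
    sk->defer_head = NULL;
    sk->locked = false;
    pthread_mutex_unlock(&sk->lock);

    while (to_free) {                   /* expensive frees run unlocked */
        struct buf *next = to_free->next;

        toy_free(sk, to_free);
        to_free = next;
    }
    return copied;
}
```

Calling `toy_recv(&sk, true)` after queuing a few buffers leaves `frees_under_lock` at zero, which is the property the commit is after: the lock hold time no longer includes the cost of releasing the buffers. The sketch does not model the second path the commit mentions, freeing directly from the BH handler.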
    • E
      net: use sk_is_tcp() in more places · 42f67eea
      Committed by Eric Dumazet
      Move sk_is_tcp() to include/net/sock.h and use it where we can.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
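
For readers unfamiliar with the helper, a predicate of this kind can be sketched in userspace as follows. The commit message does not show the helper's body, so this assumes it checks the socket type and protocol; `struct toy_sock_id` and `toy_sk_is_tcp()` are hypothetical names, and the two fields merely mimic the naming style of `struct sock`.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical stand-in holding only the two fields the check needs. */
struct toy_sock_id {
    int sk_type;       /* e.g. SOCK_STREAM or SOCK_DGRAM */
    int sk_protocol;   /* e.g. IPPROTO_TCP or IPPROTO_UDP */
};

/* Assumed shape of the test: a stream socket whose protocol is TCP. */
static bool toy_sk_is_tcp(const struct toy_sock_id *sk)
{
    assert(sk != NULL);
    return sk->sk_type == SOCK_STREAM && sk->sk_protocol == IPPROTO_TCP;
}
```

Keeping a check like this in one shared header means call sites do not each repeat the two comparisons, which appears to be the point of using it "where we can".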
  2. 15 Nov, 2021 3 commits
  3. 13 Nov, 2021 2 commits
  4. 12 Nov, 2021 4 commits
  5. 11 Nov, 2021 2 commits
    • D
      libata: fix read log timeout value · 68dbbe7d
      Committed by Damien Le Moal
      Some ATA drives are very slow to respond to READ_LOG_EXT and
      READ_LOG_DMA_EXT commands issued from ata_dev_configure() when the
      device is revalidated right after resuming a system or inserting the
      ATA adapter driver (e.g. ahci). The default 5s timeout
      (ATA_EH_CMD_DFL_TIMEOUT) used for these commands is too short, causing
      errors during the device configuration. Ex:
      
      ...
      ata9: SATA max UDMA/133 abar m524288@0x9d200000 port 0x9d200400 irq 209
      ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
      ata9.00: ATA-9: XXX  XXXXXXXXXXXXXXX, XXXXXXXX, max UDMA/133
      ata9.00: qc timeout (cmd 0x2f)
      ata9.00: Read log page 0x00 failed, Emask 0x4
      ata9.00: Read log page 0x00 failed, Emask 0x40
      ata9.00: NCQ Send/Recv Log not supported
      ata9.00: Read log page 0x08 failed, Emask 0x40
      ata9.00: 27344764928 sectors, multi 16: LBA48 NCQ (depth 32), AA
      ata9.00: Read log page 0x00 failed, Emask 0x40
      ata9.00: ATA Identify Device Log not supported
      ata9.00: failed to set xfermode (err_mask=0x40)
      ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
      ata9.00: configured for UDMA/133
      ...
      
      The timeout error causes a soft reset of the drive link, followed in
      most cases by a successful revalidation, as that gives the drive
      enough time to become fully ready to quickly process the read log
      commands. However, in some cases, this also fails, resulting in the
      device being dropped.
      
      Fix this by adding ata_eh_revalidate_timeouts entries for the
      READ_LOG_EXT and READ_LOG_DMA_EXT commands. This defines a timeout
      increased to 15s, retriable one time.
      Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: stable@vger.kernel.org
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
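
The fix amounts to a per-command timeout table consulted before issuing a command. The following is a simplified userspace sketch of that idea, not the libata implementation: all `TOY_*` and `toy_*` names are hypothetical, the table layout and the zero terminators are assumptions, and the 0x47 opcode for READ_LOG_DMA_EXT is taken from my understanding of the ATA command set (0x2f matches the `cmd 0x2f` in the log above). The 5s default and the two 15s tries come from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CMD_READ_LOG_EXT      0x2f   /* opcode seen in the log above */
#define TOY_CMD_READ_LOG_DMA_EXT  0x47   /* assumed opcode */
#define TOY_DFL_TIMEOUT_MS        5000u  /* default 5s timeout */
#define TOY_END                   0u     /* hypothetical list terminator */

/* Two tries at 15s each: the initial attempt plus one retry. */
static const unsigned int toy_revalidate_timeouts_ms[] = {
    15000u, 15000u, TOY_END
};

struct toy_timeout_entry {
    uint8_t commands[4];              /* zero-terminated opcode list */
    const unsigned int *timeouts_ms;  /* TOY_END-terminated timeouts */
};

static const struct toy_timeout_entry toy_table[] = {
    { { TOY_CMD_READ_LOG_EXT, TOY_CMD_READ_LOG_DMA_EXT, 0 },
      toy_revalidate_timeouts_ms },
};

/*
 * Return the timeout in milliseconds for the given opcode and attempt
 * number (0 = first try). Commands without a table entry get the default.
 * Attempts past the end of a list reuse its last real value.
 */
static unsigned int toy_cmd_timeout_ms(uint8_t cmd, unsigned int attempt)
{
    assert(sizeof(toy_table) > 0);
    for (size_t i = 0; i < sizeof(toy_table) / sizeof(toy_table[0]); i++) {
        const struct toy_timeout_entry *e = &toy_table[i];

        for (size_t j = 0; j < 4 && e->commands[j] != 0; j++) {
            unsigned int idx = 0;

            if (e->commands[j] != cmd)
                continue;
            while (idx < attempt && e->timeouts_ms[idx + 1] != TOY_END)
                idx++;
            return e->timeouts_ms[idx];
        }
    }
    return TOY_DFL_TIMEOUT_MS;
}
```

With this table, `toy_cmd_timeout_ms(0x2f, 0)` yields 15000 while an unlisted opcode such as 0xec falls back to 5000, so only the slow read log commands get the longer wait.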
    • M
      bpf: Add ingress_ifindex to bpf_sk_lookup · f8931565
      Committed by Mark Pashmfouroush
      It may be helpful to have access to the ifindex during bpf socket
      lookup. An example may be to scope certain socket lookup logic to
      specific interfaces, i.e. an interface may be made exempt from custom
      lookup code.
      
      Add the ifindex of the arriving connection to the bpf_sk_lookup API.
      Signed-off-by: Mark Pashmfouroush <markpash@cloudflare.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211110111016.5670-2-markpash@cloudflare.com
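
The interface-exemption use case mentioned in this commit can be sketched as plain C decision logic. This is not a BPF program and does not use the real context type: `struct toy_sk_lookup_ctx` and `toy_use_custom_lookup()` are hypothetical, and only the field name `ingress_ifindex` is taken from the commit title. An actual sk_lookup program would read the field from its context argument and would need to be built with the BPF toolchain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the lookup context seen by the program. */
struct toy_sk_lookup_ctx {
    uint32_t local_port;        /* destination port of the packet */
    uint32_t ingress_ifindex;   /* interface the packet arrived on */
};

/*
 * Decide whether custom lookup logic should run. Traffic arriving on the
 * exempt interface skips it and falls through to the regular lookup.
 */
static bool toy_use_custom_lookup(const struct toy_sk_lookup_ctx *ctx,
                                  uint32_t exempt_ifindex)
{
    assert(ctx != NULL);
    return ctx->ingress_ifindex != exempt_ifindex;
}
```

For example, with interface index 2 made exempt, a context whose `ingress_ifindex` is 2 returns false and every other interface returns true.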
  6. 10 Nov, 2021 27 commits