1. 25 Sep, 2020 1 commit
  2. 15 Sep, 2020 1 commit
    • tcp: remove SOCK_QUEUE_SHRUNK · 0cbe6a8f
      Eric Dumazet committed
      SOCK_QUEUE_SHRUNK is currently used by TCP as a temporary state
      that remembers if some room has been made in the rtx queue
      by an incoming ACK packet.
      
      This is later used from tcp_check_space() before
      considering to send EPOLLOUT.
      
      The problem: if we receive SACK packets and no packet is removed
      from the rtx queue, we can still send fresh packets, moving them
      from the write queue to the rtx queue and eventually emptying the
      write queue. Because no room was recorded in the rtx queue,
      EPOLLOUT is not sent and the sender can stall.
      
      This stall can happen if TCP_NOTSENT_LOWAT is used.
      
      With this fix, we no longer risk stalling sends while holes
      are repaired, and we can fully use socket sndbuf.
      
      This also removes a cache line dirtying for typical RPC
      workloads.
      
      Fixes: c9bee3b7 ("tcp: TCP_NOTSENT_LOWAT socket option")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
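      The workload affected by this commit is a sender that caps its unsent
      data with TCP_NOTSENT_LOWAT and then waits for a writable event. A
      minimal userspace sketch of that pattern follows; the helper names
      (set_notsent_lowat, get_notsent_lowat, wait_writable) are hypothetical,
      and the fallback value 25 is an assumption for Linux headers that lack
      the constant.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef TCP_NOTSENT_LOWAT
#define TCP_NOTSENT_LOWAT 25 /* assumption: Linux value, for older headers */
#endif

/* Hypothetical helper: limit the unsent bytes queued in the socket.
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
int set_notsent_lowat(int fd, unsigned int bytes)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_NOTSENT_LOWAT,
			  &bytes, sizeof(bytes));
}

/* Hypothetical helper: read the limit back, or -1 on error. */
long get_notsent_lowat(int fd)
{
	unsigned int bytes = 0;
	socklen_t len = sizeof(bytes);

	if (getsockopt(fd, IPPROTO_TCP, TCP_NOTSENT_LOWAT, &bytes, &len) < 0)
		return -1;
	return (long)bytes;
}

/* Hypothetical helper: block until the socket reports POLLOUT.
 * Returns 1 if writable, 0 on timeout, -1 on error or hangup. */
int wait_writable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLOUT };
	int rc;

	do {
		rc = poll(&pfd, 1, timeout_ms);
	} while (rc < 0 && errno == EINTR);

	if (rc <= 0)
		return rc;
	return (pfd.revents & POLLOUT) ? 1 : -1;
}
```

      A sender would call set_notsent_lowat() once after connecting, then
      alternate between send() and wait_writable(). If the writable event is
      never delivered, such a loop sits in poll() with nothing left to send,
      which is the stall described above.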
  3. 01 Sep, 2020 1 commit
  4. 26 Aug, 2020 1 commit
    • bpf: Renames in preparation for bpf_local_storage · 1f00d375
      KP Singh committed
      A purely mechanical change to split the renaming from the actual
      generalization.
      
      Flags/consts:
      
        SK_STORAGE_CREATE_FLAG_MASK	BPF_LOCAL_STORAGE_CREATE_FLAG_MASK
        BPF_SK_STORAGE_CACHE_SIZE	BPF_LOCAL_STORAGE_CACHE_SIZE
        MAX_VALUE_SIZE		BPF_LOCAL_STORAGE_MAX_VALUE_SIZE
      
      Structs:
      
        bucket			bpf_local_storage_map_bucket
        bpf_sk_storage_map		bpf_local_storage_map
        bpf_sk_storage_data		bpf_local_storage_data
        bpf_sk_storage_elem		bpf_local_storage_elem
        bpf_sk_storage		bpf_local_storage
      
      The "sk" member in bpf_local_storage is also updated to "owner"
      in preparation for changing the type to void * in a subsequent patch.
      
      Functions:
      
        selem_linked_to_sk			selem_linked_to_storage
        selem_alloc				bpf_selem_alloc
        __selem_unlink_sk			bpf_selem_unlink_storage_nolock
        __selem_link_sk			bpf_selem_link_storage_nolock
        selem_unlink_sk			__bpf_selem_unlink_storage
        sk_storage_update			bpf_local_storage_update
        __sk_storage_lookup			bpf_local_storage_lookup
        bpf_sk_storage_map_free		bpf_local_storage_map_free
        bpf_sk_storage_map_alloc		bpf_local_storage_map_alloc
        bpf_sk_storage_map_alloc_check	bpf_local_storage_map_alloc_check
        bpf_sk_storage_map_check_btf		bpf_local_storage_map_check_btf
      Signed-off-by: KP Singh <kpsingh@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20200825182919.1118197-2-kpsingh@chromium.org
  5. 06 Aug, 2020 1 commit
  6. 25 Jul, 2020 2 commits
  7. 20 Jul, 2020 3 commits
  8. 14 Jul, 2020 1 commit
  9. 10 Jul, 2020 1 commit
  10. 25 Jun, 2020 1 commit
  11. 24 Jun, 2020 1 commit
  12. 02 Jun, 2020 1 commit
  13. 30 May, 2020 1 commit
  14. 29 May, 2020 9 commits
  15. 08 Apr, 2020 1 commit
  16. 31 Mar, 2020 3 commits
  17. 22 Feb, 2020 1 commit
    • net, sk_msg: Clear sk_user_data pointer on clone if tagged · f1ff5ce2
      Jakub Sitnicki committed
      sk_user_data can hold a pointer to an object that is not intended to be
      shared between the parent socket and the child that gets a pointer copy on
      clone. This is the case when sk_user_data points at a reference-counted
      object, like struct sk_psock.
      
      One way to resolve it is to tag the pointer with a no-copy flag by
      repurposing its lowest bit. Based on the bit-flag value we clear the child
      sk_user_data pointer after cloning the parent socket.
      
      The no-copy flag is stored in the pointer itself as opposed to externally,
      say in socket flags, to guarantee that the pointer and the flag are copied
      from parent to child socket in an atomic fashion. Parent socket state is
      subject to change while copying, because we don't hold any locks at that
      time.
      
      This approach relies on an assumption that sk_user_data holds a pointer to
      an object aligned to at least 2 bytes. A manual audit of existing users of
      the rcu_dereference_sk_user_data helper confirms our assumption.
      
      Also, an RCU-protected sk_user_data is not likely to hold a pointer to a
      char value or a pathological case of "struct { char c; }". To be safe, warn
      when the flag-bit is set when setting sk_user_data to catch any future
      misuses.
      
      It is worth considering why clearing sk_user_data unconditionally is not an
      option. There exist users, DRBD, NVMe, and Xen drivers being among them,
      that rely on the pointer being copied when cloning the listening socket.
      
      Potentially we could distinguish these users by checking if the listening
      socket has been created in kernel-space via sock_create_kern, and hence has
      sk_kern_sock flag set. However, this is not the case for NVMe and Xen
      drivers, which create sockets without marking them as belonging to the
      kernel.
      Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20200218171023.844439-3-jakub@cloudflare.com
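      The pointer-tagging idea in this commit can be sketched in userspace C.
      This is an illustration under stated assumptions, not the kernel code:
      struct toy_sock, the TOY_* macros and the toy_* functions are all
      hypothetical names, and the pointer is stored as a uintptr_t so the low
      bit can be manipulated portably.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag: lowest pointer bit means "do not copy on clone". */
#define TOY_NOCOPY  ((uintptr_t)1)
#define TOY_PTRMASK (~TOY_NOCOPY)

struct toy_sock {
	uintptr_t user_data; /* pointer and flag share one word */
};

/* Store ptr, optionally tagged. Rejects pointers whose low bit is already
 * set (not 2-byte aligned); the commit warns in the analogous situation. */
int toy_set_user_data(struct toy_sock *sk, void *ptr, int nocopy)
{
	uintptr_t val = (uintptr_t)ptr;

	if (val & TOY_NOCOPY)
		return -1;
	sk->user_data = val | (nocopy ? TOY_NOCOPY : 0);
	return 0;
}

/* Return the pointer with the flag bit masked off. */
void *toy_get_user_data(const struct toy_sock *sk)
{
	return (void *)(sk->user_data & TOY_PTRMASK);
}

int toy_user_data_is_nocopy(const struct toy_sock *sk)
{
	return (sk->user_data & TOY_NOCOPY) != 0;
}

/* Clone: the word is copied once, so pointer and flag always travel
 * together; the child then drops the pointer if it was tagged. */
void toy_clone(struct toy_sock *child, const struct toy_sock *parent)
{
	*child = *parent;
	if (toy_user_data_is_nocopy(child))
		child->user_data = 0;
}
```

      Keeping the flag inside the pointer word is the design choice the commit
      message argues for: a separate flag field could be observed in a
      different state than the pointer while the parent is being copied.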
  18. 17 Feb, 2020 1 commit
    • net/sock.h: fix all kernel-doc warnings · 66256e0b
      Randy Dunlap committed
      Fix all kernel-doc warnings for <net/sock.h>.
      Fixes these warnings:
      
      ../include/net/sock.h:232: warning: Function parameter or member 'skc_addrpair' not described in 'sock_common'
      ../include/net/sock.h:232: warning: Function parameter or member 'skc_portpair' not described in 'sock_common'
      ../include/net/sock.h:232: warning: Function parameter or member 'skc_ipv6only' not described in 'sock_common'
      ../include/net/sock.h:232: warning: Function parameter or member 'skc_net_refcnt' not described in 'sock_common'
      ../include/net/sock.h:232: warning: Function parameter or member 'skc_v6_daddr' not described in 'sock_common'
      ../include/net/sock.h:232: warning: Function parameter or member 'skc_v6_rcv_saddr' not described in 'sock_common'
      ../include/net/sock.h:232: warning: Function parameter or member 'skc_cookie' not described in 'sock_common'
      ../include/net/sock.h:232: warning: Function parameter or member 'skc_listener' not described in 'sock_common'
      ../include/net/sock.h:232: warning: Function parameter or member 'skc_tw_dr' not described in 'sock_common'
      ../include/net/sock.h:232: warning: Function parameter or member 'skc_rcv_wnd' not described in 'sock_common'
      ../include/net/sock.h:232: warning: Function parameter or member 'skc_tw_rcv_nxt' not described in 'sock_common'
      
      ../include/net/sock.h:498: warning: Function parameter or member 'sk_rx_skb_cache' not described in 'sock'
      ../include/net/sock.h:498: warning: Function parameter or member 'sk_wq_raw' not described in 'sock'
      ../include/net/sock.h:498: warning: Function parameter or member 'tcp_rtx_queue' not described in 'sock'
      ../include/net/sock.h:498: warning: Function parameter or member 'sk_tx_skb_cache' not described in 'sock'
      ../include/net/sock.h:498: warning: Function parameter or member 'sk_route_forced_caps' not described in 'sock'
      ../include/net/sock.h:498: warning: Function parameter or member 'sk_txtime_report_errors' not described in 'sock'
      ../include/net/sock.h:498: warning: Function parameter or member 'sk_validate_xmit_skb' not described in 'sock'
      ../include/net/sock.h:498: warning: Function parameter or member 'sk_bpf_storage' not described in 'sock'
      
      ../include/net/sock.h:2024: warning: No description found for return value of 'sk_wmem_alloc_get'
      ../include/net/sock.h:2035: warning: No description found for return value of 'sk_rmem_alloc_get'
      ../include/net/sock.h:2046: warning: No description found for return value of 'sk_has_allocations'
      ../include/net/sock.h:2082: warning: No description found for return value of 'skwq_has_sleeper'
      ../include/net/sock.h:2244: warning: No description found for return value of 'sk_page_frag'
      ../include/net/sock.h:2444: warning: Function parameter or member 'tcp_rx_skb_cache_key' not described in 'DECLARE_STATIC_KEY_FALSE'
      ../include/net/sock.h:2444: warning: Excess function parameter 'sk' description in 'DECLARE_STATIC_KEY_FALSE'
      ../include/net/sock.h:2444: warning: Excess function parameter 'skb' description in 'DECLARE_STATIC_KEY_FALSE'
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
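      The warnings listed above come from kernel-doc comments that leave a
      member or a return value undescribed. A minimal sketch of the comment
      shape kernel-doc expects, using a hypothetical struct doc_sock and
      helper doc_wmem_alloc_get() rather than the real <net/sock.h>
      definitions:

```c
/**
 * struct doc_sock - minimal socket-like object (hypothetical)
 * @wmem_alloc: bytes committed to the transmit queue, biased by one
 * @rmem_alloc: bytes committed to the receive queue
 *
 * Every member needs an "@name:" line, otherwise kernel-doc reports
 * "Function parameter or member ... not described".
 */
struct doc_sock {
	int wmem_alloc;
	int rmem_alloc;
};

/**
 * doc_wmem_alloc_get - read the write allocations
 * @sk: socket to inspect
 *
 * Return: wmem_alloc minus the initial offset of one
 */
int doc_wmem_alloc_get(const struct doc_sock *sk)
{
	return sk->wmem_alloc - 1;
}
```

      The "Return:" section is what silences the "No description found for
      return value" class of warning.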
  19. 22 Jan, 2020 1 commit
  20. 10 Jan, 2020 3 commits
  21. 18 Dec, 2019 1 commit
  22. 14 Dec, 2019 1 commit
  23. 10 Dec, 2019 1 commit
  24. 07 Nov, 2019 2 commits
    • net: silence data-races on sk_backlog.tail · 9ed498c6
      Eric Dumazet committed
      sk->sk_backlog.tail might be read without holding the socket spinlock,
      so we need to add proper READ_ONCE()/WRITE_ONCE() to silence the warnings.
      
      KCSAN reported:
      
      BUG: KCSAN: data-race in tcp_add_backlog / tcp_recvmsg
      
      write to 0xffff8881265109f8 of 8 bytes by interrupt on cpu 1:
       __sk_add_backlog include/net/sock.h:907 [inline]
       sk_add_backlog include/net/sock.h:938 [inline]
       tcp_add_backlog+0x476/0xce0 net/ipv4/tcp_ipv4.c:1759
       tcp_v4_rcv+0x1a70/0x1bd0 net/ipv4/tcp_ipv4.c:1947
       ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
       ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:442 [inline]
       ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:4929
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5043
       netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5133
       napi_skb_finish net/core/dev.c:5596 [inline]
       napi_gro_receive+0x28f/0x330 net/core/dev.c:5629
       receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061
       virtnet_receive drivers/net/virtio_net.c:1323 [inline]
       virtnet_poll+0x436/0x7d0 drivers/net/virtio_net.c:1428
       napi_poll net/core/dev.c:6311 [inline]
       net_rx_action+0x3ae/0xa90 net/core/dev.c:6379
       __do_softirq+0x115/0x33f kernel/softirq.c:292
       invoke_softirq kernel/softirq.c:373 [inline]
       irq_exit+0xbb/0xe0 kernel/softirq.c:413
       exiting_irq arch/x86/include/asm/apic.h:536 [inline]
       do_IRQ+0xa6/0x180 arch/x86/kernel/irq.c:263
       ret_from_intr+0x0/0x19
       native_safe_halt+0xe/0x10 arch/x86/kernel/paravirt.c:71
       arch_cpu_idle+0x1f/0x30 arch/x86/kernel/process.c:571
       default_idle_call+0x1e/0x40 kernel/sched/idle.c:94
       cpuidle_idle_call kernel/sched/idle.c:154 [inline]
       do_idle+0x1af/0x280 kernel/sched/idle.c:263
       cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:355
       start_secondary+0x208/0x260 arch/x86/kernel/smpboot.c:264
       secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:241
      
      read to 0xffff8881265109f8 of 8 bytes by task 8057 on cpu 0:
       tcp_recvmsg+0x46e/0x1b40 net/ipv4/tcp.c:2050
       inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
       sock_recvmsg_nosec net/socket.c:871 [inline]
       sock_recvmsg net/socket.c:889 [inline]
       sock_recvmsg+0x92/0xb0 net/socket.c:885
       sock_read_iter+0x15f/0x1e0 net/socket.c:967
       call_read_iter include/linux/fs.h:1889 [inline]
       new_sync_read+0x389/0x4f0 fs/read_write.c:414
       __vfs_read+0xb1/0xc0 fs/read_write.c:427
       vfs_read fs/read_write.c:461 [inline]
       vfs_read+0x143/0x2c0 fs/read_write.c:446
       ksys_read+0xd5/0x1b0 fs/read_write.c:587
       __do_sys_read fs/read_write.c:597 [inline]
       __se_sys_read fs/read_write.c:595 [inline]
       __x64_sys_read+0x4c/0x60 fs/read_write.c:595
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 8057 Comm: syz-fuzzer Not tainted 5.4.0-rc6+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
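      The report above pairs a locked writer with a lockless reader of the
      same pointer. A userspace sketch of the annotation pattern, assuming
      GCC or Clang (for __typeof__): the READ_ONCE/WRITE_ONCE macros here are
      simplified volatile-cast stand-ins, not the kernel definitions, and the
      toy_* types and functions are hypothetical.

```c
#include <stddef.h>

/* Simplified stand-ins: force exactly one access through a volatile
 * lvalue. The kernel macros do more checking than this. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

struct toy_skb {
	struct toy_skb *next;
};

struct toy_backlog {
	struct toy_skb *head;
	struct toy_skb *tail;
};

/* Writer side. In this sketch the caller is assumed to hold the lock
 * that serializes writers; only the stores that a lockless reader may
 * observe are annotated. */
void toy_add_backlog(struct toy_backlog *b, struct toy_skb *skb)
{
	skb->next = NULL;
	if (!b->tail)
		WRITE_ONCE(b->head, skb);
	else
		b->tail->next = skb;
	WRITE_ONCE(b->tail, skb);
}

/* Reader side: peeks at tail without taking the lock. */
int toy_backlog_empty(struct toy_backlog *b)
{
	return READ_ONCE(b->tail) == NULL;
}
```

      The annotations do not add ordering or mutual exclusion; they only mark
      the accesses as intentionally lockless and keep the compiler from
      splitting or repeating them.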
    • net: annotate lockless accesses to sk->sk_max_ack_backlog · 099ecf59
      Eric Dumazet committed
      sk->sk_max_ack_backlog can be read without any lock being held
      at least in TCP/DCCP cases.
      
      We need to use READ_ONCE()/WRITE_ONCE() to avoid load/store tearing
      and/or potential KCSAN warnings.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
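      Outside the kernel, the closest portable tool for a field that is
      written in one place and read without a lock elsewhere is a C11 relaxed
      atomic. The sketch below is a userspace analogue only, not the kernel
      change: struct toy_listener and the toy_* functions are hypothetical,
      and relaxed atomics are swapped in for READ_ONCE()/WRITE_ONCE().

```c
#include <stdatomic.h>
#include <stdbool.h>

struct toy_listener {
	_Atomic unsigned int max_ack_backlog; /* set by toy_listen() */
	_Atomic unsigned int ack_backlog;     /* current queue length */
};

/* Writer: may run while other threads check the limit. */
void toy_listen(struct toy_listener *l, unsigned int backlog)
{
	atomic_store_explicit(&l->max_ack_backlog, backlog,
			      memory_order_relaxed);
}

/* Account for one more queued connection. */
void toy_acceptq_added(struct toy_listener *l)
{
	atomic_fetch_add_explicit(&l->ack_backlog, 1, memory_order_relaxed);
}

/* Lockless reader: each field is loaded exactly once, with no tearing. */
bool toy_acceptq_is_full(struct toy_listener *l)
{
	return atomic_load_explicit(&l->ack_backlog, memory_order_relaxed) >
	       atomic_load_explicit(&l->max_ack_backlog, memory_order_relaxed);
}
```

      The two loads are not a consistent snapshot of both fields; as with the
      annotated kernel accesses, the check is a best-effort test against a
      limit that may change concurrently.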