1. 04 Aug, 2021 1 commit
    • sock: allow reading and changing sk_userlocks with setsockopt · 04190bf8
      Pavel Tikhomirov committed
      SOCK_SNDBUF_LOCK and SOCK_RCVBUF_LOCK flags disable automatic socket
      buffers adjustment done by kernel (see tcp_fixup_rcvbuf() and
      tcp_sndbuf_expand()). If we've just created a new socket this adjustment
      is enabled on it, but if one changes the socket buffer size by
      setsockopt(SO_{SND,RCV}BUF*) it becomes disabled.
      
      CRIU needs to call setsockopt(SO_{SND,RCV}BUF*) on each socket on
      restore: first to increase buffer sizes so that packet queues can be
      restored, and then to restore the original buffer sizes. So after a
      CRIU restore all sockets become non-auto-adjustable, which can
      significantly decrease network performance of restored applications.
      
      CRIU needs to be able to restore sockets with enabled/disabled adjustment
      to the same state they were in before dump, so let's add a special
      setsockopt for it.
      
      Let's also export SOCK_SNDBUF_LOCK and SOCK_RCVBUF_LOCK flags to uAPI so
      that using this interface one can reenable automatic socket buffer
      adjustment on their sockets.
      Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
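As a rough illustration of the interface described in this commit, the sketch below reads and rewrites the buffer-lock flags of a socket from user space. The option name `SO_BUF_LOCK` and the numeric fallbacks are taken from the upstream kernel uAPI headers as an assumption, not from the commit text above; the helper names `get_buf_locks()` and `set_buf_locks()` are hypothetical.

```c
#include <errno.h>
#include <sys/socket.h>

/* Fallbacks for older userspace headers (assumption: 72 is the
 * asm-generic value; some architectures define a different number). */
#ifndef SO_BUF_LOCK
#define SO_BUF_LOCK 72
#endif
#ifndef SOCK_SNDBUF_LOCK
#define SOCK_SNDBUF_LOCK 1
#endif
#ifndef SOCK_RCVBUF_LOCK
#define SOCK_RCVBUF_LOCK 2
#endif

/* Hypothetical helper: returns the current lock flags (>= 0) or -errno. */
static int get_buf_locks(int fd)
{
    int locks = 0;
    socklen_t len = sizeof(locks);

    if (getsockopt(fd, SOL_SOCKET, SO_BUF_LOCK, &locks, &len) < 0)
        return -errno;
    return locks;
}

/* Hypothetical helper: replaces the lock flags; returns 0 or -errno.
 * Passing 0 clears both locks, which re-enables automatic adjustment. */
static int set_buf_locks(int fd, int locks)
{
    if (setsockopt(fd, SOL_SOCKET, SO_BUF_LOCK, &locks, sizeof(locks)) < 0)
        return -errno;
    return 0;
}
```

A restore tool in the spirit of CRIU could save the result of `get_buf_locks()` at dump time, resize the buffers on restore, and then call `set_buf_locks()` with the saved value. On kernels without this commit the calls are expected to fail, so callers should treat a negative return as "not supported".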
  2. 29 Jul, 2021 1 commit
    • skbuff: allow 'slow_gro' for skb carrying sock reference · 5e10da53
      Paolo Abeni committed
      This change leverages the infrastructure introduced by the previous
      patches to allow soft devices to pass owned skbs to the GRO engine
      without impacting the fast-path.
      
      It's up to the GRO caller to ensure the slow_gro bit validity before
      invoking the GRO engine. The new helper skb_prepare_for_gro() is
      introduced for that goal.
      
      On slow_gro, skbs are aggregated only with equal sk.
      Additionally, skb truesize on GRO recycle and free is correctly
      updated so that sk wmem is not changed by the GRO processing.
      
      rfc-> v1:
       - fixed bad truesize on dev_gro_receive NAPI_FREE
       - use the existing state bit
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 02 Jul, 2021 1 commit
    • net: sock: extend SO_TIMESTAMPING for PHC binding · d463126e
      Yangbo Lu committed
      Since PTP virtual clock support is added, there can be
      several PTP virtual clocks based on one PTP physical
      clock for timestamping.
      
      This patch extends the SO_TIMESTAMPING API to support
      PHC (PTP Hardware Clock) binding by adding a new flag
      SOF_TIMESTAMPING_BIND_PHC. When PTP virtual clocks are
      in use, user space can bind one of them for
      timestamping; binding the PTP physical clock is not
      supported and not needed.
      
      This patch is preparation for timestamp conversion from
      raw timestamp to a specific PTP virtual clock time in
      core net.
      Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
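The following sketch shows how user space might pass the new flag together with a clock index through `SO_TIMESTAMPING`. It assumes the upstream uAPI layout (a two-int structure carrying the flags and the index of the clock to bind) and mirrors it locally under `EX_`/`ex_` names so that it builds against older headers; those mirrored names and both helper functions are illustrative, not part of the kernel API.

```c
#include <errno.h>
#include <sys/socket.h>

/* Local mirrors of uAPI values (assumption: they match linux/net_tstamp.h). */
#define EX_SOF_TIMESTAMPING_RAW_HARDWARE (1 << 6)
#define EX_SOF_TIMESTAMPING_BIND_PHC     (1 << 15)

/* Local mirror of the structure accepted by SO_TIMESTAMPING when
 * the bind flag is set: timestamping flags plus the clock index. */
struct ex_so_timestamping {
    int flags;
    int bind_phc;
};

/* Build a request that adds the bind flag to the caller's flags. */
static struct ex_so_timestamping make_phc_binding(int flags, int vclock_index)
{
    struct ex_so_timestamping cfg = {
        .flags = flags | EX_SOF_TIMESTAMPING_BIND_PHC,
        .bind_phc = vclock_index,
    };
    return cfg;
}

/* Apply the request to a socket; returns 0 or -errno. */
static int apply_phc_binding(int fd, const struct ex_so_timestamping *cfg)
{
    if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, cfg, sizeof(*cfg)) < 0)
        return -errno;
    return 0;
}
```

Whether `apply_phc_binding()` succeeds depends on the kernel version and on a PTP virtual clock actually existing for the interface in use, so a negative return should be expected on systems without one.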
  4. 30 Jun, 2021 1 commit
  5. 11 Jun, 2021 2 commits
    • inet: annotate data races around sk->sk_txhash · b71eaed8
      Eric Dumazet committed
      The UDP sendmsg() path can be lockless, so it is possible for another
      thread to re-connect and change sk->sk_txhash under us.
      
      There is no serious impact, but we can use READ_ONCE()/WRITE_ONCE()
      pair to document the race.
      
      BUG: KCSAN: data-race in __ip4_datagram_connect / skb_set_owner_w
      
      write to 0xffff88813397920c of 4 bytes by task 30997 on cpu 1:
       sk_set_txhash include/net/sock.h:1937 [inline]
       __ip4_datagram_connect+0x69e/0x710 net/ipv4/datagram.c:75
       __ip6_datagram_connect+0x551/0x840 net/ipv6/datagram.c:189
       ip6_datagram_connect+0x2a/0x40 net/ipv6/datagram.c:272
       inet_dgram_connect+0xfd/0x180 net/ipv4/af_inet.c:580
       __sys_connect_file net/socket.c:1837 [inline]
       __sys_connect+0x245/0x280 net/socket.c:1854
       __do_sys_connect net/socket.c:1864 [inline]
       __se_sys_connect net/socket.c:1861 [inline]
       __x64_sys_connect+0x3d/0x50 net/socket.c:1861
       do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff88813397920c of 4 bytes by task 31039 on cpu 0:
       skb_set_hash_from_sk include/net/sock.h:2211 [inline]
       skb_set_owner_w+0x118/0x220 net/core/sock.c:2101
       sock_alloc_send_pskb+0x452/0x4e0 net/core/sock.c:2359
       sock_alloc_send_skb+0x2d/0x40 net/core/sock.c:2373
       __ip6_append_data+0x1743/0x21a0 net/ipv6/ip6_output.c:1621
       ip6_make_skb+0x258/0x420 net/ipv6/ip6_output.c:1983
       udpv6_sendmsg+0x160a/0x16b0 net/ipv6/udp.c:1527
       inet6_sendmsg+0x5f/0x80 net/ipv6/af_inet6.c:642
       sock_sendmsg_nosec net/socket.c:654 [inline]
       sock_sendmsg net/socket.c:674 [inline]
       ____sys_sendmsg+0x360/0x4d0 net/socket.c:2350
       ___sys_sendmsg net/socket.c:2404 [inline]
       __sys_sendmmsg+0x315/0x4b0 net/socket.c:2490
       __do_sys_sendmmsg net/socket.c:2519 [inline]
       __se_sys_sendmmsg net/socket.c:2516 [inline]
       __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2516
       do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0xbca3c43d -> 0xfdb309e0
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 31039 Comm: syz-executor.2 Not tainted 5.13.0-rc3-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
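To make the READ_ONCE()/WRITE_ONCE() pairing concrete, here is a simplified user-space sketch. The macros below are volatile-cast approximations of the kernel ones, and `struct toy_sock` with its accessors is a hypothetical stand-in for the real socket structure.

```c
#include <stdint.h>

/* Simplified approximations of the kernel macros: force exactly one
 * access of the full width, so the compiler cannot tear, fuse or
 * re-read the value. They document the race; they do not remove it. */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical stand-in for the field discussed in the commit. */
struct toy_sock {
    uint32_t txhash;
};

/* Writer side, e.g. a thread re-connecting the socket. */
static void toy_set_txhash(struct toy_sock *sk, uint32_t hash)
{
    WRITE_ONCE(sk->txhash, hash);
}

/* Reader side, e.g. a lockless send path copying the hash. */
static uint32_t toy_get_txhash(struct toy_sock *sk)
{
    return READ_ONCE(sk->txhash);
}
```

The reader may still observe either the old or the new hash, which matches the commit's statement that the race has no serious impact; the annotation only guarantees that the observed value is one of the two.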
    • net: annotate data race in sock_error() · f13ef100
      Eric Dumazet committed
      sock_error() is known to be racy. The code avoids
      an atomic operation if sk_err is zero, and this field
      could be changed under us; this is fine.
      
      Syzbot reported:
      
      BUG: KCSAN: data-race in sock_alloc_send_pskb / unix_release_sock
      
      write to 0xffff888131855630 of 4 bytes by task 9365 on cpu 1:
       unix_release_sock+0x2e9/0x6e0 net/unix/af_unix.c:550
       unix_release+0x2f/0x50 net/unix/af_unix.c:859
       __sock_release net/socket.c:599 [inline]
       sock_close+0x6c/0x150 net/socket.c:1258
       __fput+0x25b/0x4e0 fs/file_table.c:280
       ____fput+0x11/0x20 fs/file_table.c:313
       task_work_run+0xae/0x130 kernel/task_work.c:164
       tracehook_notify_resume include/linux/tracehook.h:189 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:174 [inline]
       exit_to_user_mode_prepare+0x156/0x190 kernel/entry/common.c:208
       __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline]
       syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:301
       do_syscall_64+0x56/0x90 arch/x86/entry/common.c:57
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff888131855630 of 4 bytes by task 9385 on cpu 0:
       sock_error include/net/sock.h:2269 [inline]
       sock_alloc_send_pskb+0xe4/0x4e0 net/core/sock.c:2336
       unix_dgram_sendmsg+0x478/0x1610 net/unix/af_unix.c:1671
       unix_seqpacket_sendmsg+0xc2/0x100 net/unix/af_unix.c:2055
       sock_sendmsg_nosec net/socket.c:654 [inline]
       sock_sendmsg net/socket.c:674 [inline]
       ____sys_sendmsg+0x360/0x4d0 net/socket.c:2350
       __sys_sendmsg_sock+0x25/0x30 net/socket.c:2416
       io_sendmsg fs/io_uring.c:4367 [inline]
       io_issue_sqe+0x231a/0x6750 fs/io_uring.c:6135
       __io_queue_sqe+0xe9/0x360 fs/io_uring.c:6414
       __io_req_task_submit fs/io_uring.c:2039 [inline]
       io_async_task_func+0x312/0x590 fs/io_uring.c:5074
       __tctx_task_work fs/io_uring.c:1910 [inline]
       tctx_task_work+0x1d4/0x3d0 fs/io_uring.c:1924
       task_work_run+0xae/0x130 kernel/task_work.c:164
       tracehook_notify_signal include/linux/tracehook.h:212 [inline]
       handle_signal_work kernel/entry/common.c:145 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
       exit_to_user_mode_prepare+0xf8/0x190 kernel/entry/common.c:208
       __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline]
       syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:301
       do_syscall_64+0x56/0x90 arch/x86/entry/common.c:57
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x00000000 -> 0x00000068
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 9385 Comm: syz-executor.3 Not tainted 5.13.0-rc4-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
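The pattern described here, a cheap check that skips the atomic operation when no error is pending, can be modelled in user space roughly as follows. This is a sketch using C11 atomics, where a relaxed load stands in for the kernel's unlocked read; `struct toy_sock` and `toy_sock_error()` are hypothetical names, not the kernel implementation.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the socket's pending-error field. */
struct toy_sock {
    atomic_int err;
};

/* Returns 0 if no error is pending, otherwise the negated error,
 * clearing it so that it is reported only once. */
static int toy_sock_error(struct toy_sock *sk)
{
    /* Fast path: a cheap read. Another thread may set the field
     * right after this check; the error is then picked up by the
     * next call, which is why the race is considered harmless. */
    if (atomic_load_explicit(&sk->err, memory_order_relaxed) == 0)
        return 0;

    /* Slow path: fetch and clear in a single atomic step. */
    int err = atomic_exchange(&sk->err, 0);
    return -err;
}
```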
  6. 05 Jun, 2021 2 commits
  7. 13 May, 2021 1 commit
  8. 12 Apr, 2021 1 commit
  9. 02 Apr, 2021 1 commit
  10. 01 Apr, 2021 1 commit
  11. 31 Mar, 2021 1 commit
    • net: let skb_orphan_partial wake-up waiters. · 9adc89af
      Paolo Abeni committed
      Currently the mentioned helper can end up freeing the socket wmem
      without waking up any processes waiting for more write memory.
      
      If the partially orphaned skb is attached to a UDP (or raw) socket,
      the lack of wake-up can hang user space.
      
      Even for TCP sockets, not calling the sk destructor could have bad
      effects on TSQ.
      
      Address the issue using skb_orphan to release the sk wmem before
      setting the new sock_efree destructor. Additionally bundle the
      whole ownership update in a new helper, so that later other
      potential users could avoid duplicate code.
      
      v1 -> v2:
       - use skb_orphan() instead of sort of open-coding it (Eric)
       - provide a helper for the ownership change (Eric)
      
      Fixes: f6ba8d33 ("netem: fix skb_orphan_partial()")
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
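The ordering this fix relies on (run the old destructor first, then install the new one) can be illustrated with a toy user-space model. None of the types or functions below are kernel code; they are hypothetical stand-ins that only mimic the accounting: the write-memory destructor uncharges the socket and wakes writers, while the replacement destructor merely drops a reference.

```c
#include <stddef.h>

struct toy_sock {
    int wmem_alloc;  /* bytes of write memory charged to the socket */
    int refcnt;      /* references keeping the socket alive */
    int wakeups;     /* how many times waiting writers were woken */
};

struct toy_skb {
    struct toy_sock *sk;
    int truesize;
    void (*destructor)(struct toy_skb *skb);
};

/* Write-memory destructor: uncharge the socket and wake writers. */
static void toy_wfree(struct toy_skb *skb)
{
    skb->sk->wmem_alloc -= skb->truesize;
    skb->sk->wakeups++;
}

/* Reference-only destructor: just drop the socket reference. */
static void toy_efree(struct toy_skb *skb)
{
    skb->sk->refcnt--;
}

/* Detach the buffer from its owner by running the destructor. */
static void toy_orphan(struct toy_skb *skb)
{
    if (skb->destructor)
        skb->destructor(skb);
    skb->destructor = NULL;
    skb->sk = NULL;
}

/* Ownership change bundled in one helper: orphan first so the old
 * destructor releases wmem and wakes waiters, then keep only a
 * reference to the socket. */
static void toy_set_owner(struct toy_skb *skb, struct toy_sock *sk)
{
    toy_orphan(skb);
    sk->refcnt++;
    skb->sk = sk;
    skb->destructor = toy_efree;
}
```

Simply overwriting `destructor` without calling `toy_orphan()` first would leave `wakeups` unchanged in this model, which corresponds to the missed wake-up the commit describes.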
  12. 13 Mar, 2021 1 commit
  13. 12 Feb, 2021 1 commit
  14. 04 Feb, 2021 1 commit
  15. 21 Jan, 2021 1 commit
  16. 20 Jan, 2021 1 commit
  17. 01 Dec, 2020 3 commits
  18. 21 Nov, 2020 2 commits
  19. 15 Nov, 2020 1 commit
  20. 03 Nov, 2020 1 commit
    • net: Un-hide lockdep_sock_is_held() for !LOCKDEP · d97f3bdf
      Jakub Kicinski committed
      Currently, variables used only within lockdep expressions are flagged
      as unused, requiring that these variables' declarations be decorated
      with either #ifdef or __maybe_unused.  This results in ugly code.
      This commit therefore causes the lockdep_sock_is_held() function to be
      visible even when lockdep is not enabled, thus removing the need for
      these decorations.  This approach further relies on dead-code elimination
      to remove any references to functions or variables that are not available
      in non-lockdep kernels.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
  21. 25 Sep, 2020 1 commit
  22. 15 Sep, 2020 1 commit
    • tcp: remove SOCK_QUEUE_SHRUNK · 0cbe6a8f
      Eric Dumazet committed
      SOCK_QUEUE_SHRUNK is currently used by TCP as a temporary state
      that remembers if some room has been made in the rtx queue
      by an incoming ACK packet.
      
      This is later used from tcp_check_space() before
      considering to send EPOLLOUT.
      
      Problem is: if we receive SACK packets, and no packet
      is removed from the RTX queue, we can send fresh packets, thus
      moving them from the write queue to the rtx queue and eventually
      emptying the write queue.
      
      This stall can happen if TCP_NOTSENT_LOWAT is used.
      
      With this fix, we no longer risk stalling sends while holes
      are repaired, and we can fully use socket sndbuf.
      
      This also removes a cache line dirtying for typical RPC
      workloads.
      
      Fixes: c9bee3b7 ("tcp: TCP_NOTSENT_LOWAT socket option")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
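For context on the configuration under which the stall could occur, this sketch shows how an application might enable TCP_NOTSENT_LOWAT on a socket. The helper names are hypothetical and the fallback value 25 is assumed from the Linux uAPI headers; the threshold used by a real application is its own choice.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Fallback for older libc headers (assumption: Linux value). */
#ifndef TCP_NOTSENT_LOWAT
#define TCP_NOTSENT_LOWAT 25
#endif

/* Limit the amount of not-yet-sent data queued in the socket;
 * returns 0 or -errno. */
static int set_notsent_lowat(int fd, unsigned int bytes)
{
    if (setsockopt(fd, IPPROTO_TCP, TCP_NOTSENT_LOWAT,
                   &bytes, sizeof(bytes)) < 0)
        return -errno;
    return 0;
}

/* Read the threshold back; returns 0 or -errno. */
static int get_notsent_lowat(int fd, unsigned int *bytes)
{
    socklen_t len = sizeof(*bytes);

    if (getsockopt(fd, IPPROTO_TCP, TCP_NOTSENT_LOWAT, bytes, &len) < 0)
        return -errno;
    return 0;
}
```

With the option set, the application typically waits for EPOLLOUT before writing more data, which is why a missed writability notification shows up as a stalled sender.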
  23. 01 Sep, 2020 1 commit
  24. 26 Aug, 2020 1 commit
    • bpf: Renames in preparation for bpf_local_storage · 1f00d375
      KP Singh committed
      A purely mechanical change to split the renaming from the actual
      generalization.
      
      Flags/consts:
      
        SK_STORAGE_CREATE_FLAG_MASK	BPF_LOCAL_STORAGE_CREATE_FLAG_MASK
        BPF_SK_STORAGE_CACHE_SIZE	BPF_LOCAL_STORAGE_CACHE_SIZE
        MAX_VALUE_SIZE		BPF_LOCAL_STORAGE_MAX_VALUE_SIZE
      
      Structs:
      
        bucket			bpf_local_storage_map_bucket
        bpf_sk_storage_map		bpf_local_storage_map
        bpf_sk_storage_data		bpf_local_storage_data
        bpf_sk_storage_elem		bpf_local_storage_elem
        bpf_sk_storage		bpf_local_storage
      
      The "sk" member in bpf_local_storage is also updated to "owner"
      in preparation for changing the type to void * in a subsequent patch.
      
      Functions:
      
        selem_linked_to_sk			selem_linked_to_storage
        selem_alloc				bpf_selem_alloc
        __selem_unlink_sk			bpf_selem_unlink_storage_nolock
        __selem_link_sk			bpf_selem_link_storage_nolock
        selem_unlink_sk			__bpf_selem_unlink_storage
        sk_storage_update			bpf_local_storage_update
        __sk_storage_lookup			bpf_local_storage_lookup
        bpf_sk_storage_map_free		bpf_local_storage_map_free
        bpf_sk_storage_map_alloc		bpf_local_storage_map_alloc
        bpf_sk_storage_map_alloc_check	bpf_local_storage_map_alloc_check
        bpf_sk_storage_map_check_btf		bpf_local_storage_map_check_btf
      Signed-off-by: KP Singh <kpsingh@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20200825182919.1118197-2-kpsingh@chromium.org
  25. 06 Aug, 2020 1 commit
  26. 25 Jul, 2020 2 commits
  27. 20 Jul, 2020 3 commits
  28. 14 Jul, 2020 1 commit
  29. 10 Jul, 2020 1 commit
  30. 25 Jun, 2020 1 commit
  31. 24 Jun, 2020 1 commit
  32. 02 Jun, 2020 1 commit