1. 11 Aug, 2022 1 commit
    • net: fix refcount bug in sk_psock_get (2) · 2a013372
      Committed by Hawkins Jiawei
      Syzkaller reports refcount bug as follows:
      ------------[ cut here ]------------
      refcount_t: saturated; leaking memory.
      WARNING: CPU: 1 PID: 3605 at lib/refcount.c:19 refcount_warn_saturate+0xf4/0x1e0 lib/refcount.c:19
      Modules linked in:
      CPU: 1 PID: 3605 Comm: syz-executor208 Not tainted 5.18.0-syzkaller-03023-g7e062cda #0
       <TASK>
       __refcount_add_not_zero include/linux/refcount.h:163 [inline]
       __refcount_inc_not_zero include/linux/refcount.h:227 [inline]
       refcount_inc_not_zero include/linux/refcount.h:245 [inline]
       sk_psock_get+0x3bc/0x410 include/linux/skmsg.h:439
       tls_data_ready+0x6d/0x1b0 net/tls/tls_sw.c:2091
       tcp_data_ready+0x106/0x520 net/ipv4/tcp_input.c:4983
       tcp_data_queue+0x25f2/0x4c90 net/ipv4/tcp_input.c:5057
       tcp_rcv_state_process+0x1774/0x4e80 net/ipv4/tcp_input.c:6659
       tcp_v4_do_rcv+0x339/0x980 net/ipv4/tcp_ipv4.c:1682
       sk_backlog_rcv include/net/sock.h:1061 [inline]
       __release_sock+0x134/0x3b0 net/core/sock.c:2849
       release_sock+0x54/0x1b0 net/core/sock.c:3404
       inet_shutdown+0x1e0/0x430 net/ipv4/af_inet.c:909
       __sys_shutdown_sock net/socket.c:2331 [inline]
       __sys_shutdown_sock net/socket.c:2325 [inline]
       __sys_shutdown+0xf1/0x1b0 net/socket.c:2343
       __do_sys_shutdown net/socket.c:2351 [inline]
       __se_sys_shutdown net/socket.c:2349 [inline]
       __x64_sys_shutdown+0x50/0x70 net/socket.c:2349
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
       </TASK>
      
      During the SMC fallback process in the connect syscall, the kernel
      replaces TCP with SMC. To forward wakeups to the smc socket
      waitqueue after fallback, the kernel sets clcsk->sk_user_data to
      the original smc socket in smc_fback_replace_callbacks().
      
      Later, in the shutdown syscall, the kernel calls sk_psock_get(),
      which treats clcsk->sk_user_data as a psock, triggering the
      refcnt warning.
      
      So the root cause is that smc and psock both use the sk_user_data
      field, and each can easily misinterpret a value stored by the
      other.
      
      This patch solves it by using another bit (defined as
      SK_USER_DATA_PSOCK) in PTRMASK to mark whether
      sk_user_data points to a psock object.
      This patch depends on the PTRMASK introduced in commit f1ff5ce2
      ("net, sk_msg: Clear sk_user_data pointer on clone if tagged").
      
      Because more flags may be added to the sk_user_data field,
      this patch also refactors the sk_user_data flags code to be more
      generic and easier to maintain.
      
      Reported-and-tested-by: syzbot+5f26f85569bd179c18ce@syzkaller.appspotmail.com
      Suggested-by: Jakub Kicinski <kuba@kernel.org>
      Acked-by: Wen Gu <guwen@linux.alibaba.com>
      Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
      Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
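The flag-bit idea described in the commit above can be sketched in userspace C. This is only an illustration, not the kernel patch: SK_USER_DATA_PSOCK is the flag named in the commit message, while `fake_sock`, the `UD_*` names, the bit values, and both helper functions are assumptions made for this example. It relies on the stored pointers being at least 8-byte aligned so that the low three bits are free.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag bits stored in the low bits of an aligned pointer.
 * All names and values below are hypothetical stand-ins. */
#define UD_FLAG_NOCOPY ((uintptr_t)1)
#define UD_FLAG_BPF    ((uintptr_t)2)
#define UD_FLAG_PSOCK  ((uintptr_t)4) /* plays the role of SK_USER_DATA_PSOCK */
#define UD_FLAGS_MASK  (UD_FLAG_NOCOPY | UD_FLAG_BPF | UD_FLAG_PSOCK)
#define UD_PTRMASK     (~UD_FLAGS_MASK)

/* Minimal stand-in for a socket: only the tagged user-data word. */
struct fake_sock {
    uintptr_t user_data;
};

/* Store a pointer together with its owner flags. */
static void user_data_assign(struct fake_sock *sk, void *ptr, uintptr_t flags)
{
    sk->user_data = (uintptr_t)ptr | (flags & UD_FLAGS_MASK);
}

/* Return the pointer only if the requested flag is set, else NULL.
 * A caller that wants a psock therefore never sees another owner's object. */
static void *user_data_get_if(const struct fake_sock *sk, uintptr_t flag)
{
    if (!(sk->user_data & flag))
        return NULL;
    return (void *)(sk->user_data & UD_PTRMASK);
}
```

With this layout, a lookup such as `user_data_get_if(sk, UD_FLAG_PSOCK)` returns NULL for a socket whose user data was stored by a different owner, instead of reinterpreting that object as a psock.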
  2. 03 Jun, 2022 1 commit
  3. 15 Mar, 2022 1 commit
    • bpf, sockmap: Fix memleak in sk_psock_queue_msg · 938d3480
      Committed by Wang Yufen
      If tcp_bpf_sendmsg is running during a tear down operation we may enqueue
      data on the ingress msg queue while tear down is trying to free it.
      
       sk1 (redirect sk2)                         sk2
       -------------------                      ---------------
      tcp_bpf_sendmsg()
       tcp_bpf_send_verdict()
        tcp_bpf_sendmsg_redir()
         bpf_tcp_ingress()
                                                sock_map_close()
                                                 lock_sock()
          lock_sock() ... blocking
                                                 sk_psock_stop
                                                  sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED);
                                                 release_sock(sk);
          lock_sock()
          sk_mem_charge()
          get_page()
          sk_psock_queue_msg()
           sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED);
            drop_sk_msg()
          release_sock()
      
      When drop_sk_msg() runs, the msg has already charged memory from the
      sk via sk_mem_charge() and holds sg pages that need to be put. To fix
      this, we use sk_msg_free() and then kfree() the msg.
      
      This issue can cause the following warnings:
      WARNING: CPU: 0 PID: 9202 at net/core/stream.c:205 sk_stream_kill_queues+0xc8/0xe0
      Call Trace:
       <IRQ>
       inet_csk_destroy_sock+0x55/0x110
       tcp_rcv_state_process+0xe5f/0xe90
       ? sk_filter_trim_cap+0x10d/0x230
       ? tcp_v4_do_rcv+0x161/0x250
       tcp_v4_do_rcv+0x161/0x250
       tcp_v4_rcv+0xc3a/0xce0
       ip_protocol_deliver_rcu+0x3d/0x230
       ip_local_deliver_finish+0x54/0x60
       ip_local_deliver+0xfd/0x110
       ? ip_protocol_deliver_rcu+0x230/0x230
       ip_rcv+0xd6/0x100
       ? ip_local_deliver+0x110/0x110
       __netif_receive_skb_one_core+0x85/0xa0
       process_backlog+0xa4/0x160
       __napi_poll+0x29/0x1b0
       net_rx_action+0x287/0x300
       __do_softirq+0xff/0x2fc
       do_softirq+0x79/0x90
       </IRQ>
      
      WARNING: CPU: 0 PID: 531 at net/ipv4/af_inet.c:154 inet_sock_destruct+0x175/0x1b0
      Call Trace:
       <TASK>
       __sk_destruct+0x24/0x1f0
       sk_psock_destroy+0x19b/0x1c0
       process_one_work+0x1b3/0x3c0
       ? process_one_work+0x3c0/0x3c0
       worker_thread+0x30/0x350
       ? process_one_work+0x3c0/0x3c0
       kthread+0xe6/0x110
       ? kthread_complete_and_exit+0x20/0x20
       ret_from_fork+0x22/0x30
       </TASK>
      
      Fixes: 9635720b ("bpf, sockmap: Fix memleak on ingress msg enqueue")
      Signed-off-by: Wang Yufen <wangyufen@huawei.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20220304081145.2037182-2-wangyufen@huawei.com
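The leak in the commit above comes down to a drop path that frees the message structure without undoing the memory charge and page reference taken earlier. A minimal userspace sketch, assuming made-up types (`fake_sk`, `fake_page`, `fake_msg`) and helpers that only imitate the accounting done by sk_mem_charge() and get_page():

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the socket, a page, and an sk_msg. */
struct fake_sk   { long mem_charged; };
struct fake_page { int refcnt; };
struct fake_msg  { struct fake_page *page; long size; };

/* Build a message: charge the socket and take a page reference. */
static struct fake_msg *msg_build(struct fake_sk *sk, struct fake_page *pg,
                                  long size)
{
    struct fake_msg *m = malloc(sizeof(*m));

    if (!m)
        return NULL;
    m->page = pg;
    m->size = size;
    sk->mem_charged += size; /* imitates sk_mem_charge() */
    pg->refcnt++;            /* imitates get_page() */
    return m;
}

/* Leaky drop: frees only the struct, so the charge and the page
 * reference stay behind. */
static void drop_msg_leaky(struct fake_msg *m)
{
    free(m);
}

/* Fixed drop: undo the charge and the page reference, then free. */
static void drop_msg_fixed(struct fake_sk *sk, struct fake_msg *m)
{
    sk->mem_charged -= m->size;
    m->page->refcnt--;
    free(m);
}
```

After `drop_msg_leaky()` the socket still shows charged memory at teardown, which is the kind of imbalance the warnings in the commit message report; `drop_msg_fixed()` leaves both counters where they started.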
  4. 05 Feb, 2022 1 commit
  5. 27 Jan, 2022 1 commit
  6. 16 Nov, 2021 1 commit
  7. 09 Nov, 2021 1 commit
  8. 02 Nov, 2021 1 commit
  9. 27 Oct, 2021 1 commit
  10. 28 Jul, 2021 1 commit
  11. 30 Jun, 2021 1 commit
  12. 21 Jun, 2021 1 commit
  13. 18 May, 2021 1 commit
  14. 12 Apr, 2021 1 commit
  15. 07 Apr, 2021 1 commit
    • bpf, sockmap: Fix sk->prot unhash op reset · 1c84b331
      Committed by John Fastabend
      In '4da6a196' we fixed a potential unhash loop caused when
      a TLS socket in a sockmap was removed from the sockmap. This
      happened because the unhash operation on the TLS ctx continued
      to point at the sockmap implementation of unhash even though the
      psock has already been removed. The sockmap unhash handler when a
      psock is removed does the following,
      
       void sock_map_unhash(struct sock *sk)
       {
      	void (*saved_unhash)(struct sock *sk);
      	struct sk_psock *psock;
      
      	rcu_read_lock();
      	psock = sk_psock(sk);
      	if (unlikely(!psock)) {
      		rcu_read_unlock();
      		if (sk->sk_prot->unhash)
      			sk->sk_prot->unhash(sk);
      		return;
      	}
              [...]
       }
      
      The unlikely() case is there to handle the case where psock is detached
      but the proto ops have not been updated yet. But, in the above case
      with TLS and removed psock we never fixed sk_prot->unhash() and unhash()
      points back to sock_map_unhash resulting in a loop. To fix this we added
      this bit of code,
      
       static inline void sk_psock_restore_proto(struct sock *sk,
                                                struct sk_psock *psock)
       {
             sk->sk_prot->unhash = psock->saved_unhash;
      
      This will set the sk_prot->unhash back to its saved value. This is the
      correct callback for a TLS socket that has been removed from the sock_map.
      Unfortunately, this also overwrites the unhash pointer for all psocks.
      We effectively break sockmap unhash handling for any future socks.
      Omitting the unhash operation will leave stale entries in the map if
      a socket transitions through unhash but does not do a close() op.
      
      To fix this, set unhash correctly before calling into tls_update. This
      way the TLS enabled socket will point to the saved unhash() handler.
      
      Fixes: 4da6a196 ("bpf: Sockmap/tls, during free we may call tcp_bpf_unhash() in loop")
      Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
      Reported-by: Lorenz Bauer <lmb@cloudflare.com>
      Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/161731441904.68884.15593917809745631972.stgit@john-XPS-13-9370
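The side effect described in the commit above ("overwrites the unhash pointer for all psocks") is what happens when a handler is restored by writing through an ops table that several sockets share. The sketch below is a userspace illustration under that assumption, not the actual fix: `fake_proto`, `fake_sock`, and both restore helpers are hypothetical names.

```c
struct fake_sock;

/* Ops table; many sockets may point at the same instance. */
struct fake_proto {
    void (*unhash)(struct fake_sock *sk);
};

struct fake_sock {
    struct fake_proto *prot;
    int base_calls; /* times the original handler ran */
    int map_calls;  /* times the sockmap-style handler ran */
};

static void base_unhash(struct fake_sock *sk) { sk->base_calls++; }
static void map_unhash(struct fake_sock *sk)  { sk->map_calls++; }

static struct fake_proto base_proto = { base_unhash };
static struct fake_proto map_proto  = { map_unhash }; /* shared table */

/* Problematic restore: writes into the shared table, so every socket
 * still using map_proto loses its map_unhash handler as well. */
static void restore_by_field(struct fake_sock *sk)
{
    sk->prot->unhash = base_unhash;
}

/* Contained restore: repoints only this socket at another table. */
static void restore_by_pointer(struct fake_sock *sk)
{
    sk->prot = &base_proto;
}
```

With two sockets attached to `map_proto`, calling `restore_by_field()` on one silently changes what `unhash` does for the other, while `restore_by_pointer()` affects only the socket being detached.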
  16. 02 Apr, 2021 6 commits
  17. 27 Feb, 2021 6 commits
  18. 28 Jan, 2021 1 commit
  19. 12 Oct, 2020 1 commit
  20. 22 Aug, 2020 1 commit
  21. 01 Jul, 2020 1 commit
  22. 02 Jun, 2020 1 commit
    • bpf: Fix running sk_skb program types with ktls · e91de6af
      Committed by John Fastabend
      KTLS uses a stream parser to collect TLS messages and send them to
      the upper layer tls receive handler. This ensures the tls receiver
      has a full TLS header to parse when it is run. However, when a
      socket has BPF_SK_SKB_STREAM_VERDICT program attached before KTLS
      is enabled we end up with two stream parsers running on the same
      socket.
      
      The result is both try to run on the same socket. First the KTLS
      stream parser runs and calls read_sock(), which calls tcp_read_sock(),
      which in turn calls tcp_rcv_skb(). This dequeues the skb from the
      sk_receive_queue. When this is done, the KTLS code then calls the
      data_ready() callback, which, because we stacked KTLS on top of the
      bpf stream verdict program, has been replaced with
      sk_psock_start_strp(). This will in turn kick the stream parser again
      and eventually do the same thing KTLS did above, calling into
      tcp_rcv_skb() and dequeuing a skb from the sk_receive_queue.
      
      At this point the data stream is broken. Part of the stream was
      handled by the KTLS side, and some other bytes may have been handled
      by the BPF side. Generally this results in either missing data
      or more likely a "Bad Message" complaint from the kTLS receive
      handler as the BPF program steals some bytes meant to be in a
      TLS header and/or the TLS header length is no longer correct.
      
      We've already broken the idealized model where we can stack ULPs
      in any order with generic callbacks on the TX side to handle this.
      So in this patch we do the same thing but for RX side. We add
      a sk_psock_strp_enabled() helper so TLS can learn a BPF verdict
      program is running and add a tls_sw_has_ctx_rx() helper so BPF
      side can learn there is a TLS ULP on the socket.
      
      Then on BPF side we omit calling our stream parser to avoid
      breaking the data stream for the KTLS receiver. Then on the
      KTLS side we call BPF_SK_SKB_STREAM_VERDICT once the KTLS
      receiver is done with the packet but before it posts the
      msg to userspace. This gives us symmetry between the TX and
      RX halves and IMO makes it usable again. On the TX side we
      process packets in this order BPF -> TLS -> TCP and on
      the receive side in the reverse order TCP -> TLS -> BPF.
      
      Discovered while testing OpenSSL 3.0 Alpha2.0 release.
      
      Fixes: d829e9c4 ("tls: convert to generic sk_msg interface")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/159079361946.5745.605854335665044485.stgit@john-Precision-5820-Tower
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
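Why two parsers on one receive queue corrupt the stream can be shown with a toy record format. The sketch below is an assumption-laden illustration, not kTLS code: records are a magic byte, a one-byte length, and a payload, and `byte_queue`, `queue_take()`, `parse_record()`, and `REC_MAGIC` are all hypothetical names.

```c
#include <stddef.h>

#define REC_MAGIC 0x17 /* toy header marker, chosen for this sketch */

/* A receive queue reduced to a byte array with a read position. */
struct byte_queue {
    const unsigned char *data;
    size_t len;
    size_t pos;
};

/* Dequeue up to n bytes; returns how many were actually taken. */
static size_t queue_take(struct byte_queue *q, unsigned char *out, size_t n)
{
    size_t i = 0;

    while (i < n && q->pos < q->len)
        out[i++] = q->data[q->pos++];
    return i;
}

/* Parse one record: [REC_MAGIC][len][len payload bytes].
 * Returns 1 for a well-formed record, 0 if the queue is empty,
 * and -1 for a malformed header or a short payload. */
static int parse_record(struct byte_queue *q)
{
    unsigned char hdr[2], payload[255];
    size_t got = queue_take(q, hdr, 2);

    if (got == 0)
        return 0;
    if (got != 2 || hdr[0] != REC_MAGIC)
        return -1;
    if (queue_take(q, payload, hdr[1]) != hdr[1])
        return -1;
    return 1;
}
```

If `parse_record()` is the only consumer, every record parses cleanly. If anything else calls `queue_take()` between records, the next header starts at the wrong offset and the parser reports a bad record, which mirrors the "Bad Message" symptom described in the commit message.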
  23. 06 May, 2020 1 commit
    • bpf, sockmap: bpf_tcp_ingress needs to subtract bytes from sg.size · 81aabbb9
      Committed by John Fastabend
      In bpf_tcp_ingress we used apply_bytes to subtract bytes from sg.size
      which is used to track total bytes in a message. But this is not
      correct because apply_bytes is itself modified in the main loop doing
      the mem_charge.
      
      Then at the end of this we have sg.size incorrectly set and out of
      sync with actual sk values. Then we can get a splat if we try to
      cork the data later and again try to redirect the msg to ingress. To
      fix this, instead of trying to track msg.size, do the easy thing and
      include it as part of the sk_msg_xfer logic so that when the msg is
      moved the sg.size is always correct.
      
      To reproduce the splat below, users will need ingress + cork and must
      hit an error path that will then try to 'free' the skmsg.
      
      [  173.699981] BUG: KASAN: null-ptr-deref in sk_msg_free_elem+0xdd/0x120
      [  173.699987] Read of size 8 at addr 0000000000000008 by task test_sockmap/5317
      
      [  173.700000] CPU: 2 PID: 5317 Comm: test_sockmap Tainted: G          I       5.7.0-rc1+ #43
      [  173.700005] Hardware name: Dell Inc. Precision 5820 Tower/002KVM, BIOS 1.9.2 01/24/2019
      [  173.700009] Call Trace:
      [  173.700021]  dump_stack+0x8e/0xcb
      [  173.700029]  ? sk_msg_free_elem+0xdd/0x120
      [  173.700034]  ? sk_msg_free_elem+0xdd/0x120
      [  173.700042]  __kasan_report+0x102/0x15f
      [  173.700052]  ? sk_msg_free_elem+0xdd/0x120
      [  173.700060]  kasan_report+0x32/0x50
      [  173.700070]  sk_msg_free_elem+0xdd/0x120
      [  173.700080]  __sk_msg_free+0x87/0x150
      [  173.700094]  tcp_bpf_send_verdict+0x179/0x4f0
      [  173.700109]  tcp_bpf_sendpage+0x3ce/0x5d0
      
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/158861290407.14306.5327773422227552482.stgit@john-Precision-5820-Tower
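The accounting error in the commit above is plain arithmetic: a loop consumes `apply_bytes` as it moves data, so subtracting `apply_bytes` from the total afterwards subtracts whatever is left over rather than what was moved. A minimal sketch with a made-up `toy_msg` type and two hypothetical helpers:

```c
/* Toy message: a total size plus per-element lengths. */
struct toy_msg {
    unsigned int size;   /* plays the role of sg.size */
    unsigned int sge[4]; /* element lengths */
    unsigned int nsge;   /* number of elements in use */
};

/* Buggy variant: adjusts size after the loop, using a counter that
 * the loop has already decremented. Returns the bytes moved. */
static unsigned int move_buggy(struct toy_msg *msg, unsigned int apply_bytes)
{
    unsigned int i, moved = 0;

    for (i = 0; i < msg->nsge && apply_bytes; i++) {
        unsigned int n = msg->sge[i] < apply_bytes ? msg->sge[i] : apply_bytes;

        moved += n;
        apply_bytes -= n; /* the loop consumes apply_bytes */
    }
    msg->size -= apply_bytes; /* wrong: this is the remainder */
    return moved;
}

/* Fixed variant: adjusts size at the point each chunk is moved. */
static unsigned int move_fixed(struct toy_msg *msg, unsigned int apply_bytes)
{
    unsigned int i, moved = 0;

    for (i = 0; i < msg->nsge && apply_bytes; i++) {
        unsigned int n = msg->sge[i] < apply_bytes ? msg->sge[i] : apply_bytes;

        moved += n;
        apply_bytes -= n;
        msg->size -= n; /* size tracks every transfer */
    }
    return moved;
}
```

For a 300-byte message with elements of 100 and 200 bytes and `apply_bytes` of 150, both variants move 150 bytes, but only `move_fixed()` leaves `size` at 150; `move_buggy()` leaves it at 300.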
  24. 10 Mar, 2020 3 commits
  25. 22 Feb, 2020 1 commit
    • net, sk_msg: Annotate lockless access to sk_prot on clone · b8e202d1
      Committed by Jakub Sitnicki
      sk_msg and ULP frameworks override protocol callbacks pointer in
      sk->sk_prot, while tcp accesses it locklessly when cloning the listening
      socket, that is with neither sk_lock nor sk_callback_lock held.
      
      Once we enable use of listening sockets with sockmap (and hence sk_msg),
      there will be shared access to sk->sk_prot if socket is getting cloned
      while being inserted/deleted to/from the sockmap from another CPU:
      
      Read side:
      
      tcp_v4_rcv
        sk = __inet_lookup_skb(...)
        tcp_check_req(sk)
          inet_csk(sk)->icsk_af_ops->syn_recv_sock
            tcp_v4_syn_recv_sock
              tcp_create_openreq_child
                inet_csk_clone_lock
                  sk_clone_lock
                    READ_ONCE(sk->sk_prot)
      
      Write side:
      
      sock_map_ops->map_update_elem
        sock_map_update_elem
          sock_map_update_common
            sock_map_link_no_progs
              tcp_bpf_init
                tcp_bpf_update_sk_prot
                  sk_psock_update_proto
                    WRITE_ONCE(sk->sk_prot, ops)
      
      sock_map_ops->map_delete_elem
        sock_map_delete_elem
          __sock_map_delete
           sock_map_unref
             sk_psock_put
               sk_psock_drop
                 sk_psock_restore_proto
                   tcp_update_ulp
                     WRITE_ONCE(sk->sk_prot, proto)
      
      Mark the shared access with READ_ONCE/WRITE_ONCE annotations.
      
      Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20200218171023.844439-2-jakub@cloudflare.com
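The annotation pattern from the commit above can be approximated in userspace with volatile casts. This is a simplified sketch: the kernel's real READ_ONCE/WRITE_ONCE macros are more elaborate, the versions here rely on the GCC/Clang `__typeof__` extension, and `fake_proto`, `fake_sock`, `sock_set_proto()`, and `sock_clone()` are hypothetical. The casts only stop the compiler from re-reading, caching, or splitting the access; they impose no ordering between CPUs.

```c
/* Simplified userspace stand-ins for the kernel macros. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

struct fake_proto {
    int id;
};

struct fake_sock {
    const struct fake_proto *prot; /* may be swapped by another context */
};

/* Write side: publish a new ops table with a single store. */
static void sock_set_proto(struct fake_sock *sk, const struct fake_proto *p)
{
    WRITE_ONCE(sk->prot, p);
}

/* Read side: load the pointer once and use only that snapshot,
 * so the clone cannot mix values from before and after an update. */
static struct fake_sock sock_clone(const struct fake_sock *sk)
{
    struct fake_sock copy;
    const struct fake_proto *prot = READ_ONCE(sk->prot);

    copy.prot = prot;
    return copy;
}
```

A clone taken before a later `sock_set_proto()` call keeps the table it observed, while the original socket sees the new one.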
  26. 19 Feb, 2020 2 commits
  27. 16 Jan, 2020 1 commit