1. 09 Aug, 2022 1 commit
  2. 12 Jul, 2022 1 commit
  3. 23 Jun, 2022 1 commit
  4. 20 Jun, 2022 3 commits
  5. 03 Jun, 2022 1 commit
  6. 29 Apr, 2022 1 commit
  7. 15 Mar, 2022 1 commit
    • bpf, sockmap: Fix memleak in tcp_bpf_sendmsg while sk msg is full · 9c34e38c
      Wang Yufen authored
      If tcp_bpf_sendmsg() is running while the sk msg is full and sk_msg_alloc()
      returns -ENOMEM, tcp_bpf_sendmsg() jumps to wait_for_memory. If partial
      memory has been allocated by sk_msg_alloc(), that is, msg_tx->sg.size is
      greater than osize after sk_msg_alloc() returns, a memleak occurs. To fix
      this, use sk_msg_trim() to release the allocated memory, then go to
      wait_for_memory.
      
      Other call paths of sk_msg_alloc() have the same issue, such as
      tls_sw_sendmsg(), so handle the sk_msg_trim() logic inside sk_msg_alloc(),
      as Cong Wang suggested.
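      
      A minimal sketch of the resulting pattern inside sk_msg_alloc()
      (simplified, not the verbatim upstream diff; the page-charging loop
      body is condensed):
      
       int sk_msg_alloc(struct sock *sk, struct sk_msg *msg, int len,
                        int elem_first_coalesce)
       {
               u32 osize = msg->sg.size;       /* sg size on entry */
               int ret = 0;
       
               while (len) {
                       int use = min_t(int, len, PAGE_SIZE);
       
                       if (!sk_wmem_schedule(sk, use)) {
                               ret = -ENOMEM;
                               goto msg_trim;  /* undo partial alloc */
                       }
                       /* Charge and append up to 'use' bytes; the real
                        * code fills scatterlist entries here.
                        */
                       sk_mem_charge(sk, use);
                       msg->sg.size += use;
                       len -= use;
               }
               return ret;
       
       msg_trim:
               sk_msg_trim(sk, msg, osize);    /* trim back to entry size */
               return ret;
       }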
      
      This issue can cause the following info:
      WARNING: CPU: 3 PID: 7950 at net/core/stream.c:208 sk_stream_kill_queues+0xd4/0x1a0
      Call Trace:
       <TASK>
       inet_csk_destroy_sock+0x55/0x110
       __tcp_close+0x279/0x470
       tcp_close+0x1f/0x60
       inet_release+0x3f/0x80
       __sock_release+0x3d/0xb0
       sock_close+0x11/0x20
       __fput+0x92/0x250
       task_work_run+0x6a/0xa0
       do_exit+0x33b/0xb60
       do_group_exit+0x2f/0xa0
       get_signal+0xb6/0x950
       arch_do_signal_or_restart+0xac/0x2a0
       exit_to_user_mode_prepare+0xa9/0x200
       syscall_exit_to_user_mode+0x12/0x30
       do_syscall_64+0x46/0x80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
       </TASK>
      
      WARNING: CPU: 3 PID: 2094 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x13c/0x260
      Call Trace:
       <TASK>
       __sk_destruct+0x24/0x1f0
       sk_psock_destroy+0x19b/0x1c0
       process_one_work+0x1b3/0x3c0
       kthread+0xe6/0x110
       ret_from_fork+0x22/0x30
       </TASK>
      
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Signed-off-by: Wang Yufen <wangyufen@huawei.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20220304081145.2037182-3-wangyufen@huawei.com
  8. 03 Mar, 2022 1 commit
  9. 20 Nov, 2021 1 commit
    • bpf, sockmap: Re-evaluate proto ops when psock is removed from sockmap · c0d95d33
      John Fastabend authored
      When a sock is added to a sock map, we evaluate which proto op hooks need
      to be used. However, when a program is removed from the sock map, we have
      not been re-evaluating whether that changes the required proto ops layout.
      
      Before the patch listed in the 'Fixes' tag this did not cause failures,
      because the base program set handles all cases; specifically, the cases
      with and without a stream parser are both handled. With the fix below we
      identified a race when running with a proto op that attempts to read skbs
      off both the stream parser and the skb->receive_queue: when the stream
      parser is empty, checking the skb->receive_queue from recvmsg at the
      precise moment the parser is paused while the receive_queue is not empty
      could result in skipping the stream parser. This may break an RX policy
      that depends on the parser running.
      
      The commit in the 'Fixes' tag then loads a specific proto ops that
      resolved this race. But we missed removing that proto ops recv hook when
      the sock is removed from the sockmap. The result is that the stream
      parser is stopped, so no more skbs will be aggregated there, but the hook
      and the BPF program remain attached to the psock. User space will then
      get an EBUSY when trying to read the socket, because the recvmsg()
      handler is now waiting on a stopped stream parser.
      
      To fix this, we rerun the proto ops init() function, which looks at the
      new set of progs attached to the psock and resets the proto ops hooks to
      the correct handlers. In the above case, where we remove the sock from
      the sock map, the RX prog will no longer be listed, so the proto ops
      recv hook is removed.
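      
      A hedged sketch of the idea (the wrapper name here is hypothetical; the
      psock_update_sk_prot callback is the proto-ops init hook that picks the
      handlers based on the attached progs):
      
       /* Hypothetical wrapper for illustration only. */
       static void sk_psock_reeval_proto(struct sock *sk, struct sk_psock *psock)
       {
               if (!sk->sk_prot->psock_update_sk_prot)
                       return;
               /* Re-pick the proto ops from the progs still attached to
                * the psock; with no RX prog left, the default recvmsg
                * handler is restored.
                */
               sk->sk_prot->psock_update_sk_prot(sk, psock, false);
       }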
      
      Fixes: c5d2177a ("bpf, sockmap: Fix race in ingress receive verdict with redirect to self")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20211119181418.353932-3-john.fastabend@gmail.com
  10. 02 Nov, 2021 1 commit
  11. 27 Oct, 2021 1 commit
  12. 28 Jul, 2021 3 commits
    • bpf, sockmap: Fix memleak on ingress msg enqueue · 9635720b
      John Fastabend authored
      If the backlog handler is running during a tear-down operation, we may
      enqueue data on the ingress msg queue while tear down is trying to free
      it:
      
       sk_psock_backlog()
         sk_psock_handle_skb()
           skb_psock_skb_ingress()
             sk_psock_skb_ingress_enqueue()
               sk_psock_queue_msg(psock,msg)
                                                 spin_lock(ingress_lock)
                                                  sk_psock_zap_ingress()
                                                    _sk_psock_purge_ingress_msg()
                                                  -- free ingress_msg list --
                                                 spin_unlock(ingress_lock)
                 spin_lock(ingress_lock)
                 list_add_tail(msg,ingress_msg) <- entry on list with no one
                                                   left to free it.
                 spin_unlock(ingress_lock)
      
      To fix this, we only enqueue from the backlog if the ENABLED bit is set.
      The tear-down logic clears the bit while holding the ingress_lock, so we
      won't enqueue the msg in the last step.
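      
      A sketch of the enqueue side after the fix (simplified; based on the
      sk_psock_queue_msg() helper named in the diagram above):
      
       static inline void sk_psock_queue_msg(struct sk_psock *psock,
                                             struct sk_msg *msg)
       {
               spin_lock_bh(&psock->ingress_lock);
               if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) {
                       list_add_tail(&msg->list, &psock->ingress_msg);
               } else {
                       /* Tear down already cleared the bit under
                        * ingress_lock; free here instead of adding an
                        * entry no one is left to free.
                        */
                       sk_msg_free(psock->sk, msg);
                       kfree(msg);
               }
               spin_unlock_bh(&psock->ingress_lock);
       }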
      
      Fixes: 799aa7f9 ("skmsg: Avoid lock_sock() in sk_psock_backlog()")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20210727160500.1713554-4-john.fastabend@gmail.com
    • bpf, sockmap: On cleanup we additionally need to remove cached skb · 476d9801
      John Fastabend authored
      It's possible, if a socket is closed while the receive thread is under
      memory pressure, that the thread has cached an skb. We need to ensure
      these skbs are freed along with the normal ingress_skb queue.
      
      Before 799aa7f9 ("skmsg: Avoid lock_sock() in sk_psock_backlog()"), tear
      down and backlog processing both held the sock lock for the common case
      of socket close or unhash, so it was not possible to have both running
      in parallel; in those kernels all we would need is the kfree.
      
      But the latest kernels include commit 799aa7f98d5e, and this requires a
      bit more work. Without the ingress_lock guarding reads and writes of
      state->skb, the tear down could run before the state update, leaking
      memory; worse, the backlog could read the state interleaved with the
      tear down, and we might end up freeing state->skb from the tear-down
      side while the backlog side still holds a reference. To resolve such
      races we wrap the accesses in ingress_lock on both sides, serializing
      the tear-down and backlog cases. In both cases this only happens after
      an EAGAIN error, so having an extra lock in place is likely fine. The
      normal path skips the locks.
      
      Note, we check state->skb before grabbing the lock. This works because
      we can only enqueue while holding the mutex we already hold, avoiding a
      race on adding state->skb after the check. If the tear-down path is
      running, that is also fine: if tear down removes state->skb, we simply
      get skb = NULL and the subsequent goto is skipped. This slight
      complication avoids locking in the normal case.
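      
      A sketch of the backlog side of this locking (simplified; the field
      names follow the work_state introduced by 799aa7f98d5e, and the skb
      processing itself is elided):
      
       static void sk_psock_backlog(struct work_struct *work)
       {
               struct sk_psock *psock = container_of(work, struct sk_psock, work);
               struct sk_psock_work_state *state = &psock->work_state;
               struct sk_buff *skb = NULL;
       
               mutex_lock(&psock->work_mutex);
               if (unlikely(state->skb)) {     /* unlocked check, see note above */
                       spin_lock_bh(&psock->ingress_lock);
                       skb = state->skb;       /* take over the cached skb */
                       state->skb = NULL;      /* tear down can no longer see it */
                       spin_unlock_bh(&psock->ingress_lock);
               }
               /* If tear down freed and cleared state->skb first, skb is
                * NULL here and we fall through to the normal ingress_skb
                * processing loop (elided).
                */
               mutex_unlock(&psock->work_mutex);
       }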
      
      With this fix we no longer see this warning splat from the TCP side on
      socket close when we hit the above case with redirect to ingress self:
      
      [224913.935822] WARNING: CPU: 3 PID: 32100 at net/core/stream.c:208 sk_stream_kill_queues+0x212/0x220
      [224913.935841] Modules linked in: fuse overlay bpf_preload x86_pkg_temp_thermal intel_uncore wmi_bmof squashfs sch_fq_codel efivarfs ip_tables x_tables uas xhci_pci ixgbe mdio xfrm_algo xhci_hcd wmi
      [224913.935897] CPU: 3 PID: 32100 Comm: fgs-bench Tainted: G          I       5.14.0-rc1alu+ #181
      [224913.935908] Hardware name: Dell Inc. Precision 5820 Tower/002KVM, BIOS 1.9.2 01/24/2019
      [224913.935914] RIP: 0010:sk_stream_kill_queues+0x212/0x220
      [224913.935923] Code: 8b 83 20 02 00 00 85 c0 75 20 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 89 df e8 2b 11 fe ff eb c3 0f 0b e9 7c ff ff ff 0f 0b eb ce <0f> 0b 5b 5d 41 5c 41 5d 41 5e 41 5f c3 90 0f 1f 44 00 00 41 57 41
      [224913.935932] RSP: 0018:ffff88816271fd38 EFLAGS: 00010206
      [224913.935941] RAX: 0000000000000ae8 RBX: ffff88815acd5240 RCX: dffffc0000000000
      [224913.935948] RDX: 0000000000000003 RSI: 0000000000000ae8 RDI: ffff88815acd5460
      [224913.935954] RBP: ffff88815acd5460 R08: ffffffff955c0ae8 R09: fffffbfff2e6f543
      [224913.935961] R10: ffffffff9737aa17 R11: fffffbfff2e6f542 R12: ffff88815acd5390
      [224913.935967] R13: ffff88815acd5480 R14: ffffffff98d0c080 R15: ffffffff96267500
      [224913.935974] FS:  00007f86e6bd1700(0000) GS:ffff888451cc0000(0000) knlGS:0000000000000000
      [224913.935981] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [224913.935988] CR2: 000000c0008eb000 CR3: 00000001020e0005 CR4: 00000000003706e0
      [224913.935994] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [224913.936000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [224913.936007] Call Trace:
      [224913.936016]  inet_csk_destroy_sock+0xba/0x1f0
      [224913.936033]  __tcp_close+0x620/0x790
      [224913.936047]  tcp_close+0x20/0x80
      [224913.936056]  inet_release+0x8f/0xf0
      [224913.936070]  __sock_release+0x72/0x120
      [224913.936083]  sock_close+0x14/0x20
      
      Fixes: a136678c ("bpf: sk_msg, zap ingress queue on psock down")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20210727160500.1713554-3-john.fastabend@gmail.com
    • bpf, sockmap: Zap ingress queues after stopping strparser · 343597d5
      John Fastabend authored
      We don't want the strparser to run and pass skbs into the skmsg handlers
      when the psock is NULL; we just sk_drop them in that case. When removing
      a live socket from a map, this means extra drops that we do not need to
      incur. Move the zap below the strparser close to avoid this condition.
      
      This way we stop the stream parser first, preventing it from processing
      packets, and then delete the psock.
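      
      A sketch of the tear-down ordering after the fix (simplified; the
      wrapper name is hypothetical, only the ordering matters):
      
       /* Hypothetical wrapper showing the ordering only. */
       static void sk_psock_teardown(struct sock *sk, struct sk_psock *psock)
       {
               /* 1. Stop the stream parser so no new skbs are fed into
                *    the skmsg handlers (and dropped against a NULL psock).
                */
               if (psock->progs.stream_parser)
                       sk_psock_stop_strp(sk, psock);
       
               /* 2. Only now zap the ingress queues. */
               sk_psock_zap_ingress(psock);
       }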
      
      Fixes: a136678c ("bpf: sk_msg, zap ingress queue on psock down")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20210727160500.1713554-2-john.fastabend@gmail.com
  13. 16 Jul, 2021 1 commit
  14. 21 Jun, 2021 6 commits
  15. 18 May, 2021 1 commit
  16. 07 Apr, 2021 1 commit
    • bpf, sockmap: Fix incorrect fwd_alloc accounting · 144748eb
      John Fastabend authored
      Incorrect fwd_alloc accounting can result in a warning when the socket
      is torn down:
      
       [18455.319240] WARNING: CPU: 0 PID: 24075 at net/core/stream.c:208 sk_stream_kill_queues+0x21f/0x230
       [...]
       [18455.319543] Call Trace:
       [18455.319556]  inet_csk_destroy_sock+0xba/0x1f0
       [18455.319577]  tcp_rcv_state_process+0x1b4e/0x2380
       [18455.319593]  ? lock_downgrade+0x3a0/0x3a0
       [18455.319617]  ? tcp_finish_connect+0x1e0/0x1e0
       [18455.319631]  ? sk_reset_timer+0x15/0x70
       [18455.319646]  ? tcp_schedule_loss_probe+0x1b2/0x240
       [18455.319663]  ? lock_release+0xb2/0x3f0
       [18455.319676]  ? __release_sock+0x8a/0x1b0
       [18455.319690]  ? lock_downgrade+0x3a0/0x3a0
       [18455.319704]  ? lock_release+0x3f0/0x3f0
       [18455.319717]  ? __tcp_close+0x2c6/0x790
       [18455.319736]  ? tcp_v4_do_rcv+0x168/0x370
       [18455.319750]  tcp_v4_do_rcv+0x168/0x370
       [18455.319767]  __release_sock+0xbc/0x1b0
       [18455.319785]  __tcp_close+0x2ee/0x790
       [18455.319805]  tcp_close+0x20/0x80
      
      This currently happens because in the redirect case we do
      skb_set_owner_r() with the original sock, which increments the fwd_alloc
      memory accounting on that sock. On redirect we may then push this skb
      onto the queue of the psock we are redirecting to. When the skb is
      flushed from the queue, we give the memory back to the original sock.
      The problem is that if the original sock is destroyed/closed while skbs
      sit on another psock's queue, the original sock has no way to reclaim
      the memory before being destroyed, and the above warning is thrown:
      
        sockA                          sockB
      
        sk_psock_strp_read()
         sk_psock_verdict_apply()
           -- SK_REDIRECT --
           sk_psock_skb_redirect()
                                      skb_queue_tail(psock_other->ingress_skb..)
      
        sk_close()
         sock_map_unref()
           sk_psock_put()
             sk_psock_drop()
               sk_psock_zap_ingress()
      
      At this point we have torn down our own psock, but still have the
      outstanding skb in psock_other. Note that SK_PASS does not have this
      problem, because there the sk_psock_drop() logic releases the skb; it
      is still associated with our own psock.
      
      To resolve this, let's only account for skbs on the ingress queue that
      are still associated with the current socket. In the redirect case we
      will check memory limits per 6fa9201a, but will omit the fwd_alloc
      accounting until the skb is actually enqueued. When the skb is sent via
      skb_send_sock_locked() or received with sk_psock_skb_ingress(), the
      memory will be claimed on psock_other.
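      
      A hedged sketch of the accounting rule (simplified; the sk_msg
      conversion and enqueue internals are elided):
      
       static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb)
       {
               struct sock *sk = psock->sk;
       
               /* Memory-limit check per 6fa9201a; no fwd_alloc charge yet. */
               if (!sk_rmem_schedule(sk, skb, skb->truesize))
                       return -EAGAIN;
       
               /* Charge fwd_alloc only on the socket whose ingress queue
                * the skb actually lands on, so closing the original sock
                * never strands accounting it cannot reclaim.
                */
               skb_set_owner_r(skb, sk);
       
               /* ... convert to sk_msg and enqueue on psock->ingress_msg ... */
               return 0;
       }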
      
      Fixes: 6fa9201a ("bpf, sockmap: Avoid returning unneeded EAGAIN when redirecting to self")
      Reported-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/161731444013.68884.4021114312848535993.stgit@john-XPS-13-9370
  17. 02 Apr, 2021 8 commits
  18. 27 Feb, 2021 7 commits