1. 10 3月, 2021 2 次提交
    • Y
      bpf, x86: Use kvmalloc_array instead kmalloc_array in bpf_jit_comp · de920fc6
      Yonghong Song 提交于
      x86 bpf_jit_comp.c used kmalloc_array to store jited addresses
      for each bpf insn. With a large bpf program, we have see the
      following allocation failures in our production server:
      
          page allocation failure: order:5, mode:0x40cc0(GFP_KERNEL|__GFP_COMP),
                                   nodemask=(null),cpuset=/,mems_allowed=0"
          Call Trace:
          dump_stack+0x50/0x70
          warn_alloc.cold.120+0x72/0xd2
          ? __alloc_pages_direct_compact+0x157/0x160
          __alloc_pages_slowpath+0xcdb/0xd00
          ? get_page_from_freelist+0xe44/0x1600
          ? vunmap_page_range+0x1ba/0x340
          __alloc_pages_nodemask+0x2c9/0x320
          kmalloc_order+0x18/0x80
          kmalloc_order_trace+0x1d/0xa0
          bpf_int_jit_compile+0x1e2/0x484
          ? kmalloc_order_trace+0x1d/0xa0
          bpf_prog_select_runtime+0xc3/0x150
          bpf_prog_load+0x480/0x720
          ? __mod_memcg_lruvec_state+0x21/0x100
          __do_sys_bpf+0xc31/0x2040
          ? close_pdeo+0x86/0xe0
          do_syscall_64+0x42/0x110
          entry_SYSCALL_64_after_hwframe+0x44/0xa9
          RIP: 0033:0x7f2f300f7fa9
          Code: Bad RIP value.
      
      Dumped assembly:
      
          ffffffff810b6d70 <bpf_int_jit_compile>:
          ; {
          ffffffff810b6d70: e8 eb a5 b4 00        callq   0xffffffff81c01360 <__fentry__>
          ffffffff810b6d75: 41 57                 pushq   %r15
          ...
          ffffffff810b6f39: e9 72 fe ff ff        jmp     0xffffffff810b6db0 <bpf_int_jit_compile+0x40>
          ;       addrs = kmalloc_array(prog->len + 1, sizeof(*addrs), GFP_KERNEL);
          ffffffff810b6f3e: 8b 45 0c              movl    12(%rbp), %eax
          ;       return __kmalloc(bytes, flags);
          ffffffff810b6f41: be c0 0c 00 00        movl    $3264, %esi
          ;       addrs = kmalloc_array(prog->len + 1, sizeof(*addrs), GFP_KERNEL);
          ffffffff810b6f46: 8d 78 01              leal    1(%rax), %edi
          ;       if (unlikely(check_mul_overflow(n, size, &bytes)))
          ffffffff810b6f49: 48 c1 e7 02           shlq    $2, %rdi
          ;       return __kmalloc(bytes, flags);
          ffffffff810b6f4d: e8 8e 0c 1d 00        callq   0xffffffff81287be0 <__kmalloc>
          ;       if (!addrs) {
          ffffffff810b6f52: 48 85 c0              testq   %rax, %rax
      
      Change kmalloc_array() to kvmalloc_array() to avoid potential
      allocation error for big bpf programs.
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210309015647.3657852-1-yhs@fb.com
      de920fc6
    • Y
      bpf: Don't do bpf_cgroup_storage_set() for kuprobe/tp programs · 05a68ce5
      Yonghong Song 提交于
      For kuprobe and tracepoint bpf programs, kernel calls
      trace_call_bpf() which calls BPF_PROG_RUN_ARRAY_CHECK()
      to run the program array. Currently, BPF_PROG_RUN_ARRAY_CHECK()
      also calls bpf_cgroup_storage_set() to set percpu
      cgroup local storage with NULL value. This is
      due to Commit 394e40a2 ("bpf: extend bpf_prog_array to store
      pointers to the cgroup storage") which modified
      __BPF_PROG_RUN_ARRAY() to call bpf_cgroup_storage_set()
      and this macro is also used by BPF_PROG_RUN_ARRAY_CHECK().
      
      kuprobe and tracepoint programs are not allowed to call
      bpf_get_local_storage() helper hence does not
      access percpu cgroup local storage. Let us
      change BPF_PROG_RUN_ARRAY_CHECK() not to
      modify percpu cgroup local storage.
      
      The issue is observed when I tried to debug [1] where
      percpu data is overwritten due to
        preempt_disable -> migration_disable
      change. This patch does not completely fix the above issue,
      which will be addressed separately, e.g., multiple cgroup
      prog runs may preempt each other. But it does fix
      any potential issue caused by tracing program
      overwriting percpu cgroup storage:
       - in a busy system, a tracing program is to run between
         bpf_cgroup_storage_set() and the cgroup prog run.
       - a kprobe program is triggered by a helper in cgroup prog
         before bpf_get_local_storage() is called.
      
       [1] https://lore.kernel.org/bpf/CAKH8qBuXCfUz=w8L+Fj74OaUpbosO29niYwTki7e3Ag044_aww@mail.gmail.com/T
      
      Fixes: 394e40a2 ("bpf: extend bpf_prog_array to store pointers to the cgroup storage")
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NRoman Gushchin <guro@fb.com>
      Link: https://lore.kernel.org/bpf/20210309185028.3763817-1-yhs@fb.com
      05a68ce5
  2. 09 3月, 2021 3 次提交
  3. 08 3月, 2021 2 次提交
  4. 06 3月, 2021 14 次提交
  5. 05 3月, 2021 19 次提交
    • B
      bpf: Explicitly zero-extend R0 after 32-bit cmpxchg · 39491867
      Brendan Jackman 提交于
      As pointed out by Ilya and explained in the new comment, there's a
      discrepancy between x86 and BPF CMPXCHG semantics: BPF always loads
      the value from memory into r0, while x86 only does so when r0 and the
      value in memory are different. The same issue affects s390.
      
      At first this might sound like pure semantics, but it makes a real
      difference when the comparison is 32-bit, since the load will
      zero-extend r0/rax.
      
      The fix is to explicitly zero-extend rax after doing such a
      CMPXCHG. Since this problem affects multiple archs, this is done in
      the verifier by patching in a BPF_ZEXT_REG instruction after every
      32-bit cmpxchg. Any archs that don't need such manual zero-extension
      can do a look-ahead with insn_is_zext to skip the unnecessary mov.
      
      Note this still goes on top of Ilya's patch:
      
      https://lore.kernel.org/bpf/20210301154019.129110-1-iii@linux.ibm.com/T/#u
      
      Differences v5->v6[1]:
       - Moved is_cmpxchg_insn and ensured it can be safely re-used. Also renamed it
         and removed 'inline' to match the style of the is_*_function helpers.
       - Fixed up comments in verifier test (thanks for the careful review, Martin!)
      
      Differences v4->v5[1]:
       - Moved the logic entirely into opt_subreg_zext_lo32_rnd_hi32, thanks to Martin
         for suggesting this.
      
      Differences v3->v4[1]:
       - Moved the optimization against pointless zext into the correct place:
         opt_subreg_zext_lo32_rnd_hi32 is called _after_ fixup_bpf_calls.
      
      Differences v2->v3[1]:
       - Moved patching into fixup_bpf_calls (patch incoming to rename this function)
       - Added extra commentary on bpf_jit_needs_zext
       - Added check to avoid adding a pointless zext(r0) if there's already one there.
      
      Difference v1->v2[1]: Now solved centrally in the verifier instead of
        specifically for the x86 JIT. Thanks to Ilya and Daniel for the suggestions!
      
      [1] v5: https://lore.kernel.org/bpf/CA+i-1C3ytZz6FjcPmUg5s4L51pMQDxWcZNvM86w4RHZ_o2khwg@mail.gmail.com/T/#t
          v4: https://lore.kernel.org/bpf/CA+i-1C3ytZz6FjcPmUg5s4L51pMQDxWcZNvM86w4RHZ_o2khwg@mail.gmail.com/T/#t
          v3: https://lore.kernel.org/bpf/08669818-c99d-0d30-e1db-53160c063611@iogearbox.net/T/#t
          v2: https://lore.kernel.org/bpf/08669818-c99d-0d30-e1db-53160c063611@iogearbox.net/T/#t
          v1: https://lore.kernel.org/bpf/d7ebaefb-bfd6-a441-3ff2-2fdfe699b1d2@iogearbox.net/T/#tReported-by: NIlya Leoshkevich <iii@linux.ibm.com>
      Fixes: 5ffa2550 ("bpf: Add instructions for atomic_[cmp]xchg")
      Signed-off-by: NBrendan Jackman <jackmanb@google.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NIlya Leoshkevich <iii@linux.ibm.com>
      Tested-by: NIlya Leoshkevich <iii@linux.ibm.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      39491867
    • P
      cipso,calipso: resolve a number of problems with the DOI refcounts · ad5d07f4
      Paul Moore 提交于
      The current CIPSO and CALIPSO refcounting scheme for the DOI
      definitions is a bit flawed in that we:
      
      1. Don't correctly match gets/puts in netlbl_cipsov4_list().
      2. Decrement the refcount on each attempt to remove the DOI from the
         DOI list, only removing it from the list once the refcount drops
         to zero.
      
      This patch fixes these problems by adding the missing "puts" to
      netlbl_cipsov4_list() and introduces a more conventional, i.e.
      not-buggy, refcounting mechanism to the DOI definitions.  Upon the
      addition of a DOI to the DOI list, it is initialized with a refcount
      of one, removing a DOI from the list removes it from the list and
      drops the refcount by one; "gets" and "puts" behave as expected with
      respect to refcounts, increasing and decreasing the DOI's refcount by
      one.
      
      Fixes: b1edeb10 ("netlabel: Replace protocol/NetLabel linking with refrerence counts")
      Fixes: d7cce015 ("netlabel: Add support for removing a CALIPSO DOI.")
      Reported-by: syzbot+9ec037722d2603a9f52e@syzkaller.appspotmail.com
      Signed-off-by: NPaul Moore <paul@paul-moore.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ad5d07f4
    • J
      ibmvnic: always store valid MAC address · 67eb2114
      Jiri Wiesner 提交于
      The last change to ibmvnic_set_mac(), 8fc3672a, meant to prevent
      users from setting an invalid MAC address on an ibmvnic interface
      that has not been brought up yet. The change also prevented the
      requested MAC address from being stored by the adapter object for an
      ibmvnic interface when the state of the ibmvnic interface is
      VNIC_PROBED - that is after probing has finished but before the
      ibmvnic interface is brought up. The MAC address stored by the
      adapter object is used and sent to the hypervisor for checking when
      an ibmvnic interface is brought up.
      
      The ibmvnic driver ignoring the requested MAC address when in
      VNIC_PROBED state caused LACP bonds (bonds in 802.3ad mode) with more
      than one slave to malfunction. The bonding code must be able to
      change the MAC address of its slaves before they are brought up
      during enslaving. The inability of kernels with 8fc3672a to set
      the MAC addresses of bonding slaves is observable in the output of
      "ip address show". The MAC addresses of the slaves are the same as
      the MAC address of the bond on a working system whereas the slaves
      retain their original MAC addresses on a system with a malfunctioning
      LACP bond.
      
      Fixes: 8fc3672a ("ibmvnic: fix ibmvnic_set_mac")
      Signed-off-by: NJiri Wiesner <jwiesner@suse.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      67eb2114
    • H
      netdevsim: init u64 stats for 32bit hardware · 863a42b2
      Hillf Danton 提交于
      Init the u64 stats in order to avoid the lockdep prints on the 32bit
      hardware like
      
       INFO: trying to register non-static key.
       the code is fine but needs lockdep annotation.
       turning off the locking correctness validator.
       CPU: 0 PID: 4695 Comm: syz-executor.0 Not tainted 5.11.0-rc5-syzkaller #0
       Hardware name: ARM-Versatile Express
       Backtrace:
       [<826fc5b8>] (dump_backtrace) from [<826fc82c>] (show_stack+0x18/0x1c arch/arm/kernel/traps.c:252)
       [<826fc814>] (show_stack) from [<8270d1f8>] (__dump_stack lib/dump_stack.c:79 [inline])
       [<826fc814>] (show_stack) from [<8270d1f8>] (dump_stack+0xa8/0xc8 lib/dump_stack.c:120)
       [<8270d150>] (dump_stack) from [<802bf9c0>] (assign_lock_key kernel/locking/lockdep.c:935 [inline])
       [<8270d150>] (dump_stack) from [<802bf9c0>] (register_lock_class+0xabc/0xb68 kernel/locking/lockdep.c:1247)
       [<802bef04>] (register_lock_class) from [<802baa2c>] (__lock_acquire+0x84/0x32d4 kernel/locking/lockdep.c:4711)
       [<802ba9a8>] (__lock_acquire) from [<802be840>] (lock_acquire.part.0+0xf0/0x554 kernel/locking/lockdep.c:5442)
       [<802be750>] (lock_acquire.part.0) from [<802bed10>] (lock_acquire+0x6c/0x74 kernel/locking/lockdep.c:5415)
       [<802beca4>] (lock_acquire) from [<81560548>] (seqcount_lockdep_reader_access include/linux/seqlock.h:103 [inline])
       [<802beca4>] (lock_acquire) from [<81560548>] (__u64_stats_fetch_begin include/linux/u64_stats_sync.h:164 [inline])
       [<802beca4>] (lock_acquire) from [<81560548>] (u64_stats_fetch_begin include/linux/u64_stats_sync.h:175 [inline])
       [<802beca4>] (lock_acquire) from [<81560548>] (nsim_get_stats64+0xdc/0xf0 drivers/net/netdevsim/netdev.c:70)
       [<8156046c>] (nsim_get_stats64) from [<81e2efa0>] (dev_get_stats+0x44/0xd0 net/core/dev.c:10405)
       [<81e2ef5c>] (dev_get_stats) from [<81e53204>] (rtnl_fill_stats+0x38/0x120 net/core/rtnetlink.c:1211)
       [<81e531cc>] (rtnl_fill_stats) from [<81e59d58>] (rtnl_fill_ifinfo+0x6d4/0x148c net/core/rtnetlink.c:1783)
       [<81e59684>] (rtnl_fill_ifinfo) from [<81e5ceb4>] (rtmsg_ifinfo_build_skb+0x9c/0x108 net/core/rtnetlink.c:3798)
       [<81e5ce18>] (rtmsg_ifinfo_build_skb) from [<81e5d0ac>] (rtmsg_ifinfo_event net/core/rtnetlink.c:3830 [inline])
       [<81e5ce18>] (rtmsg_ifinfo_build_skb) from [<81e5d0ac>] (rtmsg_ifinfo_event net/core/rtnetlink.c:3821 [inline])
       [<81e5ce18>] (rtmsg_ifinfo_build_skb) from [<81e5d0ac>] (rtmsg_ifinfo+0x44/0x70 net/core/rtnetlink.c:3839)
       [<81e5d068>] (rtmsg_ifinfo) from [<81e45c2c>] (register_netdevice+0x664/0x68c net/core/dev.c:10103)
       [<81e455c8>] (register_netdevice) from [<815608bc>] (nsim_create+0xf8/0x124 drivers/net/netdevsim/netdev.c:317)
       [<815607c4>] (nsim_create) from [<81561184>] (__nsim_dev_port_add+0x108/0x188 drivers/net/netdevsim/dev.c:941)
       [<8156107c>] (__nsim_dev_port_add) from [<815620d8>] (nsim_dev_port_add_all drivers/net/netdevsim/dev.c:990 [inline])
       [<8156107c>] (__nsim_dev_port_add) from [<815620d8>] (nsim_dev_probe+0x5cc/0x750 drivers/net/netdevsim/dev.c:1119)
       [<81561b0c>] (nsim_dev_probe) from [<815661dc>] (nsim_bus_probe+0x10/0x14 drivers/net/netdevsim/bus.c:287)
       [<815661cc>] (nsim_bus_probe) from [<811724c0>] (really_probe+0x100/0x50c drivers/base/dd.c:554)
       [<811723c0>] (really_probe) from [<811729c4>] (driver_probe_device+0xf8/0x1c8 drivers/base/dd.c:740)
       [<811728cc>] (driver_probe_device) from [<81172fe4>] (__device_attach_driver+0x8c/0xf0 drivers/base/dd.c:846)
       [<81172f58>] (__device_attach_driver) from [<8116fee0>] (bus_for_each_drv+0x88/0xd8 drivers/base/bus.c:431)
       [<8116fe58>] (bus_for_each_drv) from [<81172c6c>] (__device_attach+0xdc/0x1d0 drivers/base/dd.c:914)
       [<81172b90>] (__device_attach) from [<8117305c>] (device_initial_probe+0x14/0x18 drivers/base/dd.c:961)
       [<81173048>] (device_initial_probe) from [<81171358>] (bus_probe_device+0x90/0x98 drivers/base/bus.c:491)
       [<811712c8>] (bus_probe_device) from [<8116e77c>] (device_add+0x320/0x824 drivers/base/core.c:3109)
       [<8116e45c>] (device_add) from [<8116ec9c>] (device_register+0x1c/0x20 drivers/base/core.c:3182)
       [<8116ec80>] (device_register) from [<81566710>] (nsim_bus_dev_new drivers/net/netdevsim/bus.c:336 [inline])
       [<8116ec80>] (device_register) from [<81566710>] (new_device_store+0x178/0x208 drivers/net/netdevsim/bus.c:215)
       [<81566598>] (new_device_store) from [<8116fcb4>] (bus_attr_store+0x2c/0x38 drivers/base/bus.c:122)
       [<8116fc88>] (bus_attr_store) from [<805b4b8c>] (sysfs_kf_write+0x48/0x54 fs/sysfs/file.c:139)
       [<805b4b44>] (sysfs_kf_write) from [<805b3c90>] (kernfs_fop_write_iter+0x128/0x1ec fs/kernfs/file.c:296)
       [<805b3b68>] (kernfs_fop_write_iter) from [<804d22fc>] (call_write_iter include/linux/fs.h:1901 [inline])
       [<805b3b68>] (kernfs_fop_write_iter) from [<804d22fc>] (new_sync_write fs/read_write.c:518 [inline])
       [<805b3b68>] (kernfs_fop_write_iter) from [<804d22fc>] (vfs_write+0x3dc/0x57c fs/read_write.c:605)
       [<804d1f20>] (vfs_write) from [<804d2604>] (ksys_write+0x68/0xec fs/read_write.c:658)
       [<804d259c>] (ksys_write) from [<804d2698>] (__do_sys_write fs/read_write.c:670 [inline])
       [<804d259c>] (ksys_write) from [<804d2698>] (sys_write+0x10/0x14 fs/read_write.c:667)
       [<804d2688>] (sys_write) from [<80200060>] (ret_fast_syscall+0x0/0x2c arch/arm/mm/proc-v7.S:64)
      
      Fixes: 83c9e13a ("netdevsim: add software driver for testing offloads")
      Reported-by: syzbot+e74a6857f2d0efe3ad81@syzkaller.appspotmail.com
      Tested-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NHillf Danton <hdanton@sina.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      863a42b2
    • D
      Merge branch 'mptcp-fixes' · bdda7dfa
      David S. Miller 提交于
      Mat Martineau says:
      
      ====================
      mptcp: Fixes for v5.12
      
      These patches from the MPTCP tree fix a few multipath TCP issues:
      
      Patches 1 and 5 clear some stale pointers when subflows close.
      
      Patches 2, 4, and 9 plug some memory leaks.
      
      Patch 3 fixes a memory accounting error identified by syzkaller.
      
      Patches 6 and 7 fix a race condition that slowed data transmission.
      
      Patch 8 adds missing wakeups when write buffer space is freed.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bdda7dfa
    • G
      mptcp: free resources when the port number is mismatched · 9238e900
      Geliang Tang 提交于
      When the port number is mismatched with the announced ones, use
      'goto dispose_child' to free the resources instead of using 'goto out'.
      
      This patch also moves the port number checking code in
      subflow_syn_recv_sock before mptcp_finish_join, otherwise subflow_drop_ctx
      will fail in dispose_child.
      
      Fixes: 5bc56388 ("mptcp: add port number check for MP_JOIN")
      Reported-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NGeliang Tang <geliangtang@gmail.com>
      Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9238e900
    • P
      mptcp: fix missing wakeup · 417789df
      Paolo Abeni 提交于
      __mptcp_clean_una() can free write memory and should wake-up
      user-space processes when needed.
      
      When such function is invoked by the MPTCP receive path, the wakeup
      is not needed, as the TCP stack will later trigger subflow_write_space
      which will do the wakeup as needed.
      
      Other __mptcp_clean_una() call sites need an additional wakeup check
      Let's bundle the relevant code in a new helper and use it.
      
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/165
      Fixes: 6e628cd3 ("mptcp: use mptcp release_cb for delayed tasks")
      Fixes: 64b9cea7 ("mptcp: fix spurious retransmissions")
      Tested-by: NMatthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      417789df
    • P
      mptcp: fix race in release_cb · c2e6048f
      Paolo Abeni 提交于
      If we receive a MPTCP_PUSH_PENDING even from a subflow when
      mptcp_release_cb() is serving the previous one, the latter
      will be delayed up to the next release_sock(msk).
      
      Address the issue implementing a test/serve loop for such
      event.
      
      Additionally rename the push helper to __mptcp_push_pending()
      to be more consistent with the existing code.
      
      Fixes: 6e628cd3 ("mptcp: use mptcp release_cb for delayed tasks")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c2e6048f
    • P
      mptcp: factor out __mptcp_retrans helper() · 2948d0a1
      Paolo Abeni 提交于
      Will simplify the following patch, no functional change
      intended.
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2948d0a1
    • F
      mptcp: reset 'first' and ack_hint on subflow close · c8fe62f0
      Florian Westphal 提交于
      Just like with last_snd, we have to NULL 'first' on subflow close.
      
      ack_hint isn't strictly required (its never dereferenced), but better to
      clear this explicitly as well instead of making it an exception.
      
      msk->first is dereferenced unconditionally at accept time, but
      at that point the ssk is not on the conn_list yet -- this means
      worker can't see it when iterating the conn_list.
      Reported-by: NPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: NMatthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c8fe62f0
    • F
      mptcp: dispose initial struct socket when its subflow is closed · 17aee05d
      Florian Westphal 提交于
      Christoph Paasch reported following crash:
      dst_release underflow
      WARNING: CPU: 0 PID: 1319 at net/core/dst.c:175 dst_release+0xc1/0xd0 net/core/dst.c:175
      CPU: 0 PID: 1319 Comm: syz-executor217 Not tainted 5.11.0-rc6af8e85128b4d0d24083c5cac646e891227052e0c #70
      Call Trace:
       rt_cache_route+0x12e/0x140 net/ipv4/route.c:1503
       rt_set_nexthop.constprop.0+0x1fc/0x590 net/ipv4/route.c:1612
       __mkroute_output net/ipv4/route.c:2484 [inline]
      ...
      
      The worker leaves msk->subflow alone even when it
      happened to close the subflow ssk associated with it.
      
      Fixes: 866f26f2 ("mptcp: always graft subflow socket to parent")
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/157Reported-by: NChristoph Paasch <cpaasch@apple.com>
      Suggested-by: NPaolo Abeni <pabeni@redhat.com>
      Acked-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      17aee05d
    • P
      mptcp: fix memory accounting on allocation error · eaeef1ce
      Paolo Abeni 提交于
      In case of memory pressure the MPTCP xmit path keeps
      at most a single skb in the tx cache, eventually freeing
      additional ones.
      
      The associated counter for forward memory is not update
      accordingly, and that causes the following splat:
      
      WARNING: CPU: 0 PID: 12 at net/core/stream.c:208 sk_stream_kill_queues+0x3ca/0x530 net/core/stream.c:208
      Modules linked in:
      CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.11.0-rc2 #59
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      Workqueue: events mptcp_worker
      RIP: 0010:sk_stream_kill_queues+0x3ca/0x530 net/core/stream.c:208
      Code: 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e 63 01 00 00 8b ab 00 01 00 00 e9 60 ff ff ff e8 2f 24 d3 fe 0f 0b eb 97 e8 26 24 d3 fe <0f> 0b eb a0 e8 1d 24 d3 fe 0f 0b e9 a5 fe ff ff 4c 89 e7 e8 0e d0
      RSP: 0018:ffffc900000c7bc8 EFLAGS: 00010293
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: ffff88810030ac40 RSI: ffffffff8262ca4a RDI: 0000000000000003
      RBP: 0000000000000d00 R08: 0000000000000000 R09: ffffffff85095aa7
      R10: ffffffff8262c9ea R11: 0000000000000001 R12: ffff888108908100
      R13: ffffffff85095aa0 R14: ffffc900000c7c48 R15: 1ffff92000018f85
      FS:  0000000000000000(0000) GS:ffff88811b200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fa7444baef8 CR3: 0000000035ee9005 CR4: 0000000000170ef0
      Call Trace:
       __mptcp_destroy_sock+0x4a7/0x6c0 net/mptcp/protocol.c:2547
       mptcp_worker+0x7dd/0x1610 net/mptcp/protocol.c:2272
       process_one_work+0x896/0x1170 kernel/workqueue.c:2275
       worker_thread+0x605/0x1350 kernel/workqueue.c:2421
       kthread+0x344/0x410 kernel/kthread.c:292
       ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:296
      
      At close time, as reported by syzkaller/Christoph.
      
      This change address the issue properly updating the fwd
      allocated memory counter in the error path.
      Reported-by: NChristoph Paasch <cpaasch@apple.com>
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/136
      Fixes: 724cfd2e ("mptcp: allocate TX skbs in msk context")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eaeef1ce
    • F
      mptcp: put subflow sock on connect error · f0715779
      Florian Westphal 提交于
      mptcp_add_pending_subflow() performs a sock_hold() on the subflow,
      then adds the subflow to the join list.
      
      Without a sock_put the subflow sk won't be freed in case connect() fails.
      
      unreferenced object 0xffff88810c03b100 (size 3000):
      [..]
          sk_prot_alloc.isra.0+0x2f/0x110
          sk_alloc+0x5d/0xc20
          inet6_create+0x2b7/0xd30
          __sock_create+0x17f/0x410
          mptcp_subflow_create_socket+0xff/0x9c0
          __mptcp_subflow_connect+0x1da/0xaf0
          mptcp_pm_nl_work+0x6e0/0x1120
          mptcp_worker+0x508/0x9a0
      
      Fixes: 5b950ff4 ("mptcp: link MPC subflow into msk only after accept")
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f0715779
    • F
      mptcp: reset last_snd on subflow close · e0be4931
      Florian Westphal 提交于
      Send logic caches last active subflow in the msk, so it needs to be
      cleared when the cached subflow is closed.
      
      Fixes: d5f49190 ("mptcp: allow picking different xmit subflows")
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/155Reported-by: NChristoph Paasch <cpaasch@apple.com>
      Acked-by: NPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: NMatthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e0be4931
    • M
      net: sched: avoid duplicates in classes dump · bfc25605
      Maximilian Heyne 提交于
      This is a follow up of commit ea327469 ("net: sched: avoid
      duplicates in qdisc dump") which has fixed the issue only for the qdisc
      dump.
      
      The duplicate printing also occurs when dumping the classes via
        tc class show dev eth0
      
      Fixes: 59cc1f61 ("net: sched: convert qdisc linked list to hashtable")
      Signed-off-by: NMaximilian Heyne <mheyne@amazon.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bfc25605
    • D
      net: usb: qmi_wwan: allow qmimux add/del with master up · 6c59cff3
      Daniele Palmas 提交于
      There's no reason for preventing the creation and removal
      of qmimux network interfaces when the underlying interface
      is up.
      
      This makes qmi_wwan mux implementation more similar to the
      rmnet one, simplifying userspace management of the same
      logical interfaces.
      
      Fixes: c6adf779 ("net: usb: qmi_wwan: add qmap mux protocol support")
      Reported-by: NAleksander Morgado <aleksander@aleksander.es>
      Signed-off-by: NDaniele Palmas <dnlplm@gmail.com>
      Acked-by: NBjørn Mork <bjorn@mork.no>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6c59cff3
    • V
      net: dsa: sja1105: fix ucast/bcast flooding always remaining enabled · 6a5166e0
      Vladimir Oltean 提交于
      In the blamed patch I managed to introduce a bug while moving code
      around: the same logic is applied to the ucast_egress_floods and
      bcast_egress_floods variables both on the "if" and the "else" branches.
      
      This is clearly an unintended change compared to how the code used to be
      prior to that bugfix, so restore it.
      
      Fixes: 7f7ccdea ("net: dsa: sja1105: fix leakage of flooded frames outside bridging domain")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6a5166e0
    • V
      net: dsa: sja1105: fix SGMII PCS being forced to SPEED_UNKNOWN instead of SPEED_10 · 053d8ad1
      Vladimir Oltean 提交于
      When using MLO_AN_PHY or MLO_AN_FIXED, the MII_BMCR of the SGMII PCS is
      read before resetting the switch so it can be reprogrammed afterwards.
      This works for the speeds of 1Gbps and 100Mbps, but not for 10Mbps,
      because SPEED_10 is actually 0, so AND-ing anything with 0 is false,
      therefore that last branch is dead code.
      
      Do what others do (genphy_read_status_fixed, phy_mii_ioctl) and just
      remove the check for SPEED_10, let it fall into the default case.
      
      Fixes: ffe10e67 ("net: dsa: sja1105: Add support for the SGMII port")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      053d8ad1
    • V
      net: mscc: ocelot: properly reject destination IP keys in VCAP IS1 · f1becbed
      Vladimir Oltean 提交于
      An attempt is made to warn the user about the fact that VCAP IS1 cannot
      offload keys matching on destination IP (at least given the current half
      key format), but sadly that warning fails miserably in practice, due to
      the fact that it operates on an uninitialized "match" variable. We must
      first decode the keys from the flow rule.
      
      Fixes: 75944fda ("net: mscc: ocelot: offload ingress skbedit and vlan actions to VCAP IS1")
      Reported-by: NColin Ian King <colin.king@canonical.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NColin Ian King <colin.king@canonical.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f1becbed
新手
引导
客服 返回
顶部