1. 14 Jul, 2020 1 commit
    • ip6_gre: fix null-ptr-deref in ip6gre_init_net() · 46ef5b89
      Wei Yongjun committed
      KASAN reports a null-ptr-deref error when register_netdev() fails:
      
      KASAN: null-ptr-deref in range [0x00000000000003c0-0x00000000000003c7]
      CPU: 2 PID: 422 Comm: ip Not tainted 5.8.0-rc4+ #12
      Call Trace:
       ip6gre_init_net+0x4ab/0x580
       ? ip6gre_tunnel_uninit+0x3f0/0x3f0
       ops_init+0xa8/0x3c0
       setup_net+0x2de/0x7e0
       ? rcu_read_lock_bh_held+0xb0/0xb0
       ? ops_init+0x3c0/0x3c0
       ? kasan_unpoison_shadow+0x33/0x40
       ? __kasan_kmalloc.constprop.0+0xc2/0xd0
       copy_net_ns+0x27d/0x530
       create_new_namespaces+0x382/0xa30
       unshare_nsproxy_namespaces+0xa1/0x1d0
       ksys_unshare+0x39c/0x780
       ? walk_process_tree+0x2a0/0x2a0
       ? trace_hardirqs_on+0x4a/0x1b0
       ? _raw_spin_unlock_irq+0x1f/0x30
       ? syscall_trace_enter+0x1a7/0x330
       ? do_syscall_64+0x1c/0xa0
       __x64_sys_unshare+0x2d/0x40
       do_syscall_64+0x56/0xa0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      ip6gre_tunnel_uninit() has already set 'ign->fb_tunnel_dev' to NULL, so a
      later access to ign->fb_tunnel_dev causes a null-ptr-deref. Fix it by saving
      'ign->fb_tunnel_dev' to the local variable ndev.
      
      Fixes: dafabb65 ("ip6_gre: fix use-after-free in ip6gre_tunnel_lookup()")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 10 Jul, 2020 3 commits
    • tcp: make sure listeners don't initialize congestion-control state · ce69e563
      Christoph Paasch committed
      syzkaller found its way into setsockopt with TCP_CONGESTION "cdg".
      tcp_cdg_init() does a kcalloc to store the gradients. As sk_clone_lock
      just copies all the memory, the allocated pointer will be copied as
      well, if the app called setsockopt(..., TCP_CONGESTION) on the listener.
      If the socket is then destroyed before congestion control
      has been properly initialized (through a call to tcp_init_transfer), we
      will end up freeing memory that does not belong to that particular
      socket, opening the door to a double-free:
      
      [   11.413102] ==================================================================
      [   11.414181] BUG: KASAN: double-free or invalid-free in tcp_cleanup_congestion_control+0x58/0xd0
      [   11.415329]
      [   11.415560] CPU: 3 PID: 4884 Comm: syz-executor.5 Not tainted 5.8.0-rc2 #80
      [   11.416544] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
      [   11.418148] Call Trace:
      [   11.418534]  <IRQ>
      [   11.418834]  dump_stack+0x7d/0xb0
      [   11.419297]  print_address_description.constprop.0+0x1a/0x210
      [   11.422079]  kasan_report_invalid_free+0x51/0x80
      [   11.423433]  __kasan_slab_free+0x15e/0x170
      [   11.424761]  kfree+0x8c/0x230
      [   11.425157]  tcp_cleanup_congestion_control+0x58/0xd0
      [   11.425872]  tcp_v4_destroy_sock+0x57/0x5a0
      [   11.426493]  inet_csk_destroy_sock+0x153/0x2c0
      [   11.427093]  tcp_v4_syn_recv_sock+0xb29/0x1100
      [   11.427731]  tcp_get_cookie_sock+0xc3/0x4a0
      [   11.429457]  cookie_v4_check+0x13d0/0x2500
      [   11.433189]  tcp_v4_do_rcv+0x60e/0x780
      [   11.433727]  tcp_v4_rcv+0x2869/0x2e10
      [   11.437143]  ip_protocol_deliver_rcu+0x23/0x190
      [   11.437810]  ip_local_deliver+0x294/0x350
      [   11.439566]  __netif_receive_skb_one_core+0x15d/0x1a0
      [   11.441995]  process_backlog+0x1b1/0x6b0
      [   11.443148]  net_rx_action+0x37e/0xc40
      [   11.445361]  __do_softirq+0x18c/0x61a
      [   11.445881]  asm_call_on_stack+0x12/0x20
      [   11.446409]  </IRQ>
      [   11.446716]  do_softirq_own_stack+0x34/0x40
      [   11.447259]  do_softirq.part.0+0x26/0x30
      [   11.447827]  __local_bh_enable_ip+0x46/0x50
      [   11.448406]  ip_finish_output2+0x60f/0x1bc0
      [   11.450109]  __ip_queue_xmit+0x71c/0x1b60
      [   11.451861]  __tcp_transmit_skb+0x1727/0x3bb0
      [   11.453789]  tcp_rcv_state_process+0x3070/0x4d3a
      [   11.456810]  tcp_v4_do_rcv+0x2ad/0x780
      [   11.457995]  __release_sock+0x14b/0x2c0
      [   11.458529]  release_sock+0x4a/0x170
      [   11.459005]  __inet_stream_connect+0x467/0xc80
      [   11.461435]  inet_stream_connect+0x4e/0xa0
      [   11.462043]  __sys_connect+0x204/0x270
      [   11.465515]  __x64_sys_connect+0x6a/0xb0
      [   11.466088]  do_syscall_64+0x3e/0x70
      [   11.466617]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   11.467341] RIP: 0033:0x7f56046dc469
      [   11.467844] Code: Bad RIP value.
      [   11.468282] RSP: 002b:00007f5604dccdd8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
      [   11.469326] RAX: ffffffffffffffda RBX: 000000000068bf00 RCX: 00007f56046dc469
      [   11.470379] RDX: 0000000000000010 RSI: 0000000020000000 RDI: 0000000000000004
      [   11.471311] RBP: 00000000ffffffff R08: 0000000000000000 R09: 0000000000000000
      [   11.472286] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      [   11.473341] R13: 000000000041427c R14: 00007f5604dcd5c0 R15: 0000000000000003
      [   11.474321]
      [   11.474527] Allocated by task 4884:
      [   11.475031]  save_stack+0x1b/0x40
      [   11.475548]  __kasan_kmalloc.constprop.0+0xc2/0xd0
      [   11.476182]  tcp_cdg_init+0xf0/0x150
      [   11.476744]  tcp_init_congestion_control+0x9b/0x3a0
      [   11.477435]  tcp_set_congestion_control+0x270/0x32f
      [   11.478088]  do_tcp_setsockopt.isra.0+0x521/0x1a00
      [   11.478744]  __sys_setsockopt+0xff/0x1e0
      [   11.479259]  __x64_sys_setsockopt+0xb5/0x150
      [   11.479895]  do_syscall_64+0x3e/0x70
      [   11.480395]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   11.481097]
      [   11.481321] Freed by task 4872:
      [   11.481783]  save_stack+0x1b/0x40
      [   11.482230]  __kasan_slab_free+0x12c/0x170
      [   11.482839]  kfree+0x8c/0x230
      [   11.483240]  tcp_cleanup_congestion_control+0x58/0xd0
      [   11.483948]  tcp_v4_destroy_sock+0x57/0x5a0
      [   11.484502]  inet_csk_destroy_sock+0x153/0x2c0
      [   11.485144]  tcp_close+0x932/0xfe0
      [   11.485642]  inet_release+0xc1/0x1c0
      [   11.486131]  __sock_release+0xc0/0x270
      [   11.486697]  sock_close+0xc/0x10
      [   11.487145]  __fput+0x277/0x780
      [   11.487632]  task_work_run+0xeb/0x180
      [   11.488118]  __prepare_exit_to_usermode+0x15a/0x160
      [   11.488834]  do_syscall_64+0x4a/0x70
      [   11.489326]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Wei Wang fixed a part of these CDG-malloc issues with commit c1201444
      ("tcp: memset ca_priv data to 0 properly").
      
      This patch here fixes the listener-scenario: We make sure that listeners
      setting the congestion-control through setsockopt won't initialize it
      (thus CDG never allocates on listeners). For those who use AF_UNSPEC to
      reuse a socket, tcp_disconnect() is changed to cleanup afterwards.
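
      The listener scenario can be modelled in user space. The sketch below is
      an illustration only, not the kernel code: struct sock_model,
      model_set_cc() and model_clone() are hypothetical names, and the
      skip_init_on_listener flag merely stands in for the behaviour this patch
      introduces. It shows how a byte-wise clone makes parent and child share
      one allocation, and how skipping initialization on listeners avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a socket with private congestion-control data. */
struct sock_model {
	bool listening;
	void *ca_priv;	/* owned allocation, e.g. a gradients array */
};

/* Models setsockopt(TCP_CONGESTION): optionally skip init on listeners. */
static void model_set_cc(struct sock_model *sk, bool skip_init_on_listener)
{
	assert(sk != NULL);
	if (skip_init_on_listener && sk->listening)
		return;		/* behaviour after the fix: no allocation */
	sk->ca_priv = calloc(8, sizeof(int));
}

/* Models a clone that copies all memory, including the owned pointer. */
static struct sock_model model_clone(const struct sock_model *parent)
{
	struct sock_model child;

	memcpy(&child, parent, sizeof(child));
	child.listening = false;
	return child;
}
```

      Without the flag, the child ends up holding the same ca_priv pointer as
      the listener, so destroying both would free it twice. With the flag, the
      listener never allocates and the child starts with a NULL pointer.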
      
      (The issue can be reproduced at least down to v4.4.x.)
      
      Cc: Wei Wang <weiwan@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Fixes: 2b0a8c9e ("tcp: add CDG congestion control")
      Signed-off-by: Christoph Paasch <cpaasch@apple.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ethtool: fix genlmsg_put() failure handling in ethnl_default_dumpit() · 365f9ae4
      Michal Kubecek committed
      If the genlmsg_put() call in ethnl_default_dumpit() fails, we bail out
      without checking if we already have some messages in current skb like we do
      with ethnl_default_dump_one() failure later. Therefore if existing messages
      almost fill up the buffer so that there is not enough space even for
      netlink and genetlink header, we lose all prepared messages and return an
      error.
      
      Rather than duplicating the skb->len check, move the genlmsg_put(),
      genlmsg_cancel() and genlmsg_end() calls into ethnl_default_dump_one().
      This is also more logical as all message composition will be in
      ethnl_default_dump_one() and only iteration logic will be left in
      ethnl_default_dumpit().
      
      Fixes: 728480f1 ("ethtool: default handlers for GET requests")
      Reported-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: fix a memory leak in atm_tc_init() · 306381ae
      Cong Wang committed
      When tcf_block_get() fails inside atm_tc_init(),
      atm_tc_put() is called to release the qdisc p->link.q.
      But the flow->ref prevents it from doing so, as the flow->ref
      is still zero.
      
      Fix this by moving the p->link.ref initialization before
      tcf_block_get().
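
      The ordering problem can be illustrated with a small user-space model.
      This is a sketch under assumptions, not the qdisc code: struct
      flow_model and the helpers are hypothetical, and the put helper is
      assumed to decrement the count and release only when it reaches zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a flow that owns a qdisc. */
struct flow_model {
	int ref;		/* reference count */
	bool qdisc_released;	/* set once the owned qdisc is released */
};

/* Assumed put semantics: release only when the decremented count is zero. */
static void flow_model_put(struct flow_model *flow)
{
	assert(flow != NULL);
	if (--flow->ref)
		return;		/* non-zero (including -1): nothing released */
	flow->qdisc_released = true;
}

/* Error path with the count still at 0 when put is called. */
static bool init_fails_ref_late(void)
{
	struct flow_model link = { .ref = 0, .qdisc_released = false };

	/* ... the block setup step fails here, before ref is set ... */
	flow_model_put(&link);
	return link.qdisc_released;
}

/* Error path with the count initialized before the failing step. */
static bool init_fails_ref_early(void)
{
	struct flow_model link = { .ref = 0, .qdisc_released = false };

	link.ref = 1;
	/* ... the block setup step fails here, after ref is set ... */
	flow_model_put(&link);
	return link.qdisc_released;
}
```

      In this model init_fails_ref_late() leaves the qdisc unreleased because
      the count drops from 0 to -1, while init_fails_ref_early() releases it.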
      
      Fixes: 6529eaba ("net: sched: introduce tcf block infractructure")
      Reported-and-tested-by: syzbot+d411cff6ab29cc2c311b@syzkaller.appspotmail.com
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 09 Jul, 2020 8 commits
    • bpf: Check correct cred for CAP_SYSLOG in bpf_dump_raw_ok() · 63960260
      Kees Cook committed
      When evaluating access control over kallsyms visibility, credentials at
      open() time need to be used, not the "current" creds (though in BPF's
      case, this has likely always been the same). Plumb access to associated
      file->f_cred down through bpf_dump_raw_ok() and its callers now that
      kallsyms_show_value() has been refactored to take struct cred.
      
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: bpf@vger.kernel.org
      Cc: stable@vger.kernel.org
      Fixes: 7105e828 ("bpf: allow for correlation of maps and helpers in dump")
      Signed-off-by: Kees Cook <keescook@chromium.org>
    • tipc: fix retransmission on unicast links · a34f8291
      Hamish Martin committed
      A scenario has been observed where a 'bc_init' message for a link is not
      retransmitted if it fails to be received by the peer. This leads to the
      peer never establishing the link fully and it discarding all other data
      received on the link. In this scenario the message is lost in transit to
      the peer.
      
      The issue is traced to the 'nxt_retr' field of the skb not being
      initialised for links that aren't a bc_sndlink. This leads to the
      comparison in tipc_link_advance_transmq() that gates whether to attempt
      retransmission of a message performing in an undesirable way.
      Depending on the relative value of 'jiffies', this comparison:
          time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr)
      may return true or false given that 'nxt_retr' remains at the
      uninitialised value of 0 for non bc_sndlinks.
      
      This is most noticeable shortly after boot when jiffies is initialised
      to a high value (to flush out rollover bugs) and we compare a jiffies of,
      say, 4294940189 to zero. In that case time_before returns 'true' leading
      to the skb not being retransmitted.
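
      The arithmetic can be checked with a small user-space model. The sketch
      below assumes a 32-bit jiffies counter; time_before32() and
      retransmit_suppressed() are hypothetical stand-ins that mimic the
      wraparound-safe signed comparison, not the kernel macro itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical 32-bit model of time_before(a, b): true when a is
 * earlier than b, using a signed view of the unsigned difference.
 * The unsigned-to-signed conversion is implementation-defined in
 * ISO C but wraps on common two's-complement compilers.
 */
static bool time_before32(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Models the gate: retransmission is skipped while now < nxt_retr. */
static bool retransmit_suppressed(uint32_t now, uint32_t nxt_retr)
{
	return time_before32(now, nxt_retr);
}

/* Worked example using the jiffies value quoted in the text. */
static void sketch_selfcheck(void)
{
	/* Unset nxt_retr (0) looks like a future deadline. */
	assert(retransmit_suppressed(4294940189u, 0u));
	/* A deadline 10 ticks in the past allows retransmission. */
	assert(!retransmit_suppressed(4294940189u, 4294940189u - 10u));
}
```

      With a jiffies value of 4294940189 and nxt_retr left at 0, the signed
      difference is negative, so the model treats 0 as a time still in the
      future and suppresses the retransmission.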
      
      The fix is to ensure that all skbs have a valid 'nxt_retr' time set for
      them and this is achieved by refactoring the setting of this value into
      a central function.
      With this fix, transmission losses of 'bc_init' messages do not stall
      the link establishment forever because the 'bc_init' message is
      retransmitted and the link eventually establishes correctly.
      
      Fixes: 382f598f ("tipc: reduce duplicate packets for unicast traffic")
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Hamish Martin <hamish.martin@alliedtelesis.co.nz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • l2tp: remove skb_dst_set() from l2tp_xmit_skb() · 27d53323
      Xin Long committed
      In the tx path of l2tp, l2tp_xmit_skb() calls skb_dst_set() to set
      skb's dst. However, it will eventually call inet6_csk_xmit() or
      ip_queue_xmit() where skb's dst will be overwritten by:
      
         skb_dst_set_noref(skb, dst);
      
      without releasing the old dst in skb. Then it causes dst/dev refcnt leak:
      
        unregister_netdevice: waiting for eth0 to become free. Usage count = 1
      
      This can be reproduced by simply running:
      
        # modprobe l2tp_eth && modprobe l2tp_ip
        # sh ./tools/testing/selftests/net/l2tp.sh
      
      So before going to inet6_csk_xmit() or ip_queue_xmit(), skb's dst
      should be dropped. This patch is to fix it by removing skb_dst_set()
      from l2tp_xmit_skb() and moving skb_dst_drop() into l2tp_xmit_core().
      
      Fixes: 3557baab ("[L2TP]: PPP over L2TP driver core")
      Reported-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: James Chapman <jchapman@katalix.com>
      Tested-by: James Chapman <jchapman@katalix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: tolerate future SMCD versions · fb4f7926
      Ursula Braun committed
      CLC proposal messages of future SMCD versions could be larger than SMCD
      V1 CLC proposal messages.
      To tolerate such messages in SMC V1, the reception of CLC proposal
      messages is adapted:
      * accept larger length values in CLC proposal
      * check trailing eye catcher for incoming CLC proposal with V1 length only
      * receive the whole CLC proposal even in cases it does not fit into the
        V1 buffer
      
      Fixes: e7b7a64a ("smc: support variable CLC proposal messages")
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: switch smcd_dev_list spinlock to mutex · 82087c03
      Ursula Braun committed
      The similar smc_ib_devices spinlock has been converted to a mutex.
      Protecting the smcd_dev_list by a mutex is possible as well. This
      patch converts the smcd_dev_list spinlock to a mutex.
      
      Fixes: c6ba7c9b ("net/smc: add base infrastructure for SMC-D and ISM")
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: fix sleep bug in smc_pnet_find_roce_resource() · 92f3cb0e
      Ursula Braun committed
      Tests showed this BUG:
      [572555.252867] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935
      [572555.252876] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 131031, name: smcapp
      [572555.252879] INFO: lockdep is turned off.
      [572555.252883] CPU: 1 PID: 131031 Comm: smcapp Tainted: G           O      5.7.0-rc3uschi+ #356
      [572555.252885] Hardware name: IBM 3906 M03 703 (LPAR)
      [572555.252887] Call Trace:
      [572555.252896]  [<00000000ac364554>] show_stack+0x94/0xe8
      [572555.252901]  [<00000000aca1f400>] dump_stack+0xa0/0xe0
      [572555.252906]  [<00000000ac3c8c10>] ___might_sleep+0x260/0x280
      [572555.252910]  [<00000000acdc0c98>] __mutex_lock+0x48/0x940
      [572555.252912]  [<00000000acdc15c2>] mutex_lock_nested+0x32/0x40
      [572555.252975]  [<000003ff801762d0>] mlx5_lag_get_roce_netdev+0x30/0xc0 [mlx5_core]
      [572555.252996]  [<000003ff801fb3aa>] mlx5_ib_get_netdev+0x3a/0xe0 [mlx5_ib]
      [572555.253007]  [<000003ff80063848>] smc_pnet_find_roce_resource+0x1d8/0x310 [smc]
      [572555.253011]  [<000003ff800602f0>] __smc_connect+0x1f0/0x3e0 [smc]
      [572555.253015]  [<000003ff80060634>] smc_connect+0x154/0x190 [smc]
      [572555.253022]  [<00000000acbed8d4>] __sys_connect+0x94/0xd0
      [572555.253025]  [<00000000acbef620>] __s390x_sys_socketcall+0x170/0x360
      [572555.253028]  [<00000000acdc6800>] system_call+0x298/0x2b8
      [572555.253030] INFO: lockdep is turned off.
      
      Function smc_pnet_find_rdma_dev() might be called from
      smc_pnet_find_roce_resource(). It holds the smc_ib_devices list
      spinlock while calling infiniband op get_netdev(). At least for mlx5
      the get_netdev operation wants mutex serialization, which conflicts
      with the smc_ib_devices spinlock.
      This patch switches the smc_ib_devices spinlock into a mutex to
      allow sleeping when calling get_netdev().
      
      Fixes: a4cf0443 ("smc: introduce SMC as an IB-client")
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: fix work request handling · b7eede75
      Karsten Graul committed
      Wait for pending sends only when smc_switch_conns() found a link to move
      the connections to. Do not wait during link freeing, this can lead to
      permanent hang situations. And refuse to provide a new tx slot on an
      unusable link.
      
      Fixes: c6f02ebe ("net/smc: switch connections to alternate link")
      Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: separate LLC wait queues for flow and messages · 6778a6be
      Karsten Graul committed
      There might be races in scenarios where both SMC link groups are on the
      same system. Prevent that by creating separate wait queues for LLC flows
      and messages. Switch to non-interruptible versions of wait_event() and
      wake_up() for the llc flow waiter to make sure the waiters get control
      sequentially. Fine tune the llc_flow_lock to include the assignment of
      the message. Write to system log when an unexpected message was
      dropped. And remove an extra indirection and use the existing local
      variable lgr in smc_llc_enqueue().
      
      Fixes: 555da9af ("net/smc: add event-based llc_flow framework")
      Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 08 Jul, 2020 7 commits
  5. 07 Jul, 2020 1 commit
    • ipv6: fib6_select_path can not use out path for nexthop objects · 34fe5a1c
      David Ahern committed
      Brian reported a crash in IPv6 code when using rpfilter with a setup
      running FRR and external nexthop objects. The root cause of the crash
      is fib6_select_path setting fib6_nh in the result to NULL because of
      an improper check for nexthop objects.
      
      More specifically, rpfilter invokes ip6_route_lookup with flowi6_oif
      set causing fib6_select_path to be called with have_oif_match set.
      fib6_select_path has early check on have_oif_match and jumps to the
      out label which presumes a builtin fib6_nh. This path is invalid for
      nexthop objects; for external nexthops fib6_select_path needs to just
      return if the fib6_nh has already been set in the result otherwise it
      returns after the call to nexthop_path_fib6_result. Update the check
      on have_oif_match to not bail on external nexthops.
      
      Update selftests for this problem.
      
      Fixes: f88d8ea6 ("ipv6: Plumb support for nexthop object in a fib6_info")
      Reported-by: Brian Rak <brak@choopa.com>
      Signed-off-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 05 Jul, 2020 1 commit
    • hsr: fix interface leak in error path of hsr_dev_finalize() · ccfc9df1
      Taehee Yoo committed
      To release hsr(upper) interface, it should release
      its own lower interfaces first.
      Then, hsr(upper) interface can be released safely.
      In the current code of error path of hsr_dev_finalize(), it releases hsr
      interface before releasing a lower interface.
      So, a warning occurs, which warns about the leak of lower interfaces.
      In order to fix this problem, changing the ordering of the error path of
      hsr_dev_finalize() is needed.
      
      Test commands:
          ip link add dummy0 type dummy
          ip link add dummy1 type dummy
          ip link add dummy2 type dummy
          ip link add hsr0 type hsr slave1 dummy0 slave2 dummy1
          ip link add hsr1 type hsr slave1 dummy2 slave2 dummy0
      
      Splat looks like:
      [  214.923127][    C2] WARNING: CPU: 2 PID: 1093 at net/core/dev.c:8992 rollback_registered_many+0x986/0xcf0
      [  214.923129][    C2] Modules linked in: hsr dummy openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipx
      [  214.923154][    C2] CPU: 2 PID: 1093 Comm: ip Not tainted 5.8.0-rc2+ #623
      [  214.923156][    C2] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [  214.923157][    C2] RIP: 0010:rollback_registered_many+0x986/0xcf0
      [  214.923160][    C2] Code: 41 8b 4e cc 45 31 c0 31 d2 4c 89 ee 48 89 df e8 e0 47 ff ff 85 c0 0f 84 cd fc ff ff 5
      [  214.923162][    C2] RSP: 0018:ffff8880c5156f28 EFLAGS: 00010287
      [  214.923165][    C2] RAX: ffff8880d1dad458 RBX: ffff8880bd1b9000 RCX: ffffffffb929d243
      [  214.923167][    C2] RDX: 1ffffffff77e63f0 RSI: 0000000000000008 RDI: ffffffffbbf31f80
      [  214.923168][    C2] RBP: dffffc0000000000 R08: fffffbfff77e63f1 R09: fffffbfff77e63f1
      [  214.923170][    C2] R10: ffffffffbbf31f87 R11: 0000000000000001 R12: ffff8880c51570a0
      [  214.923172][    C2] R13: ffff8880bd1b90b8 R14: ffff8880c5157048 R15: ffff8880d1dacc40
      [  214.923174][    C2] FS:  00007fdd257a20c0(0000) GS:ffff8880da200000(0000) knlGS:0000000000000000
      [  214.923175][    C2] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  214.923177][    C2] CR2: 00007ffd78beb038 CR3: 00000000be544005 CR4: 00000000000606e0
      [  214.923179][    C2] Call Trace:
      [  214.923180][    C2]  ? netif_set_real_num_tx_queues+0x780/0x780
      [  214.923182][    C2]  ? dev_validate_mtu+0x140/0x140
      [  214.923183][    C2]  ? synchronize_rcu.part.79+0x85/0xd0
      [  214.923185][    C2]  ? synchronize_rcu_expedited+0xbb0/0xbb0
      [  214.923187][    C2]  rollback_registered+0xc8/0x170
      [  214.923188][    C2]  ? rollback_registered_many+0xcf0/0xcf0
      [  214.923190][    C2]  unregister_netdevice_queue+0x18b/0x240
      [  214.923191][    C2]  hsr_dev_finalize+0x56e/0x6e0 [hsr]
      [  214.923192][    C2]  hsr_newlink+0x36b/0x450 [hsr]
      [  214.923194][    C2]  ? hsr_dellink+0x70/0x70 [hsr]
      [  214.923195][    C2]  ? rtnl_create_link+0x2e4/0xb00
      [  214.923197][    C2]  ? __netlink_ns_capable+0xc3/0xf0
      [  214.923198][    C2]  __rtnl_newlink+0xbdb/0x1270
      [ ... ]
      
      Fixes: e0a4b997 ("hsr: use upper/lower device infrastructure")
      Reported-by: syzbot+7f1c020f68dab95aab59@syzkaller.appspotmail.com
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 04 Jul, 2020 1 commit
    • sched: consistently handle layer3 header accesses in the presence of VLANs · d7bf2ebe
      Toke Høiland-Jørgensen committed
      There are a couple of places in net/sched/ that check skb->protocol and act
      on the value there. However, in the presence of VLAN tags, the value stored
      in skb->protocol can be inconsistent based on whether VLAN acceleration is
      enabled. The commit quoted in the Fixes tag below fixed the users of
      skb->protocol to use a helper that will always see the VLAN ethertype.
      
      However, most of the callers don't actually handle the VLAN ethertype, but
      expect to find the IP header type in the protocol field. This means that
      things like changing the ECN field, or parsing diffserv values, stops
      working if there's a VLAN tag, or if there are multiple nested VLAN
      tags (QinQ).
      
      To fix this, change the helper to take an argument that indicates whether
      the caller wants to skip the VLAN tags or not. When skipping VLAN tags, we
      make sure to skip all of them, so behaviour is consistent even in QinQ
      mode.
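
      The idea of skipping every VLAN tag can be sketched in user space. This
      is an illustration under assumptions, not the skb_protocol() helper:
      frame_protocol() and the SKETCH_* constants are hypothetical, and the
      input is assumed to be a raw Ethernet frame starting at the destination
      MAC address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum {
	SKETCH_TYPE_IPV4   = 0x0800,
	SKETCH_TYPE_8021Q  = 0x8100,	/* customer VLAN tag */
	SKETCH_TYPE_8021AD = 0x88A8,	/* service VLAN tag (outer QinQ) */
};

static bool is_vlan_ethertype(uint16_t type)
{
	return type == SKETCH_TYPE_8021Q || type == SKETCH_TYPE_8021AD;
}

/* Read a big-endian 16-bit field. */
static uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Return the EtherType in host order, or 0 if the frame is truncated.
 * With skip_vlan set, walk past all stacked VLAN tags first.
 */
static uint16_t frame_protocol(const uint8_t *frame, size_t len, bool skip_vlan)
{
	size_t off = 12;	/* destination MAC + source MAC */
	uint16_t type;

	assert(frame != NULL);
	if (len < off + 2)
		return 0;
	type = read_be16(frame + off);

	while (skip_vlan && is_vlan_ethertype(type)) {
		off += 4;	/* 2-byte tag type + 2-byte tag control info */
		if (len < off + 2)
			return 0;
		type = read_be16(frame + off);
	}
	return type;
}
```

      For a QinQ frame carrying IPv4, the model returns the outer tag type
      when skip_vlan is false and the IPv4 EtherType when it is true, which is
      the distinction callers such as the ECN and diffserv code rely on.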
      
      To make the helper usable from the ECN code, move it to if_vlan.h instead
      of pkt_sched.h.
      
      v3:
      - Remove empty lines
      - Move vlan variable definitions inside loop in skb_protocol()
      - Also use skb_protocol() helper in IP{,6}_ECN_decapsulate() and
        bpf_skb_ecn_set_ce()
      
      v2:
      - Use eth_type_vlan() helper in skb_protocol()
      - Also fix code that reads skb->protocol directly
      - Change a couple of 'if/else if' statements to switch constructs to avoid
        calling the helper twice
      Reported-by: Ilya Ponetayev <i.ponetaev@ndmsystems.com>
      Fixes: d8b9605d ("net: sched: fix skb->protocol use in case of accelerated vlan path")
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 03 Jul, 2020 2 commits
    • netfilter: conntrack: refetch conntrack after nf_conntrack_update() · d005fbb8
      Pablo Neira Ayuso committed
      __nf_conntrack_update() might refresh the conntrack object that is
      attached to the skbuff. Otherwise, this triggers UAF.
      
      [  633.200434] ==================================================================
      [  633.200472] BUG: KASAN: use-after-free in nf_conntrack_update+0x34e/0x770 [nf_conntrack]
      [  633.200478] Read of size 1 at addr ffff888370804c00 by task nfqnl_test/6769
      
      [  633.200487] CPU: 1 PID: 6769 Comm: nfqnl_test Not tainted 5.8.0-rc2+ #388
      [  633.200490] Hardware name: LENOVO 23259H1/23259H1, BIOS G2ET32WW (1.12 ) 05/30/2012
      [  633.200491] Call Trace:
      [  633.200499]  dump_stack+0x7c/0xb0
      [  633.200526]  ? nf_conntrack_update+0x34e/0x770 [nf_conntrack]
      [  633.200532]  print_address_description.constprop.6+0x1a/0x200
      [  633.200539]  ? _raw_write_lock_irqsave+0xc0/0xc0
      [  633.200568]  ? nf_conntrack_update+0x34e/0x770 [nf_conntrack]
      [  633.200594]  ? nf_conntrack_update+0x34e/0x770 [nf_conntrack]
      [  633.200598]  kasan_report.cold.9+0x1f/0x42
      [  633.200604]  ? call_rcu+0x2c0/0x390
      [  633.200633]  ? nf_conntrack_update+0x34e/0x770 [nf_conntrack]
      [  633.200659]  nf_conntrack_update+0x34e/0x770 [nf_conntrack]
      [  633.200687]  ? nf_conntrack_find_get+0x30/0x30 [nf_conntrack]
      
      Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1436
      Fixes: ee04805f ("netfilter: conntrack: make conntrack userspace helpers work again")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • tcp: md5: allow changing MD5 keys in all socket states · 1ca0fafd
      Eric Dumazet committed
      This essentially reverts commit 72123032 ("tcp: md5: reject TCP_MD5SIG
      or TCP_MD5SIG_EXT on established sockets")
      
      Mathieu reported that many vendors' BGP implementations can
      actually switch TCP MD5 on established flows.
      
      Quoting Mathieu :
         Here is a list of a few network vendors along with their behavior
         with respect to TCP MD5:
      
         - Cisco: Allows for password to be changed, but within the hold-down
           timer (~180 seconds).
         - Juniper: When password is initially set on active connection it will
           reset, but after that any subsequent password changes no network
           resets.
         - Nokia: No notes on if they flap the tcp connection or not.
         - Ericsson/RedBack: Allows for 2 password (old/new) to co-exist until
           both sides are ok with new passwords.
         - Meta-Switch: Expects the password to be set before a connection is
           attempted, but no further info on whether they reset the TCP
           connection on a change.
         - Avaya: Disable the neighbor, then set password, then re-enable.
         - Zebos: Would normally allow the change when socket connected.
      
      We can revert my prior change because commit 9424e2e7 ("tcp: md5: fix potential
      overestimation of TCP option space") removed the leak of 4 kernel bytes to
      the wire that was the main reason for my patch.
      
      While doing my investigations, I found a bug when a MD5 key is changed, leading
      to these commits that stable teams want to consider before backporting this revert :
      
       Commit 6a2febec ("tcp: md5: add missing memory barriers in tcp_md5_do_add()/tcp_md5_hash_key()")
       Commit e6ced831 ("tcp: md5: refine tcp_md5_do_add()/tcp_md5_hash_key() barriers")
      
      Fixes: 72123032 "tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets"
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 02 Jul, 2020 6 commits
    • tcp: fix SO_RCVLOWAT possible hangs under high mem pressure · ba3bb0e7
      Eric Dumazet committed
      Whenever tcp_try_rmem_schedule() returns an error, we are in
      trouble and should make sure to wake up readers so that they
      can drain socket queues and eventually make room.
      
      Fixes: 03f45c88 ("tcp: avoid extra wakeups for SO_RCVLOWAT users")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip: Fix SO_MARK in RST, ACK and ICMP packets · 0da7536f
      Willem de Bruijn committed
      When no full socket is available, skbs are sent over a per-netns
      control socket. Its sk_mark is temporarily adjusted to match that
      of the real (request or timewait) socket or to reflect an incoming
      skb, so that the outgoing skb inherits this in __ip_make_skb.
      
      Introduction of the socket cookie mark field broke this. Now the
      skb is set through the cookie and cork:
      
      <caller>		# init sockc.mark from sk_mark or cmsg
      ip_append_data
        ip_setup_cork		# convert sockc.mark to cork mark
      ip_push_pending_frames
        ip_finish_skb
          __ip_make_skb	# set skb->mark to cork mark
      
      But I missed these special control sockets. Update all callers of
      __ip(6)_make_skb that were originally missed.
      
      For IPv6, the same two icmp(v6) paths are affected. The third
      case is not, as commit 92e55f41 ("tcp: don't annotate
      mark on control socket from tcp_v6_send_response()") replaced
      the ctl_sk->sk_mark with passing the mark field directly as a
      function argument. That commit predates the commit that
      introduced the bug.
      
      Fixes: c6af0c22 ("ip: support SO_MARK cmsg")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Reported-by: Martin KaFai Lau <kafai@fb.com>
      Reviewed-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: md5: do not send silly options in SYNCOOKIES · e114e1e8
      Eric Dumazet committed
      Whenever cookie_init_timestamp() has been used to encode
      ECN,SACK,WSCALE options, we can not remove the TS option in the SYNACK.
      
      Otherwise, tcp_synack_options() will still advertise options like WSCALE
      that we can not deduce later when receiving the packet from the client
      to complete 3WHS.
      
      Note that modern linux TCP stacks wont use MD5+TS+SACK in a SYN packet,
      but we can not know for sure that all TCP stacks have the same logic.
      
      Before the fix a tcpdump would exhibit this wrong exchange :
      
      10:12:15.464591 IP C > S: Flags [S], seq 4202415601, win 65535, options [nop,nop,md5 valid,mss 1400,sackOK,TS val 456965269 ecr 0,nop,wscale 8], length 0
      10:12:15.464602 IP S > C: Flags [S.], seq 253516766, ack 4202415602, win 65535, options [nop,nop,md5 valid,mss 1400,nop,nop,sackOK,nop,wscale 8], length 0
      10:12:15.464611 IP C > S: Flags [.], ack 1, win 256, options [nop,nop,md5 valid], length 0
      10:12:15.464678 IP C > S: Flags [P.], seq 1:13, ack 1, win 256, options [nop,nop,md5 valid], length 12
      10:12:15.464685 IP S > C: Flags [.], ack 13, win 65535, options [nop,nop,md5 valid], length 0
      
      After this patch, the exchange looks saner:
      
      11:59:59.882990 IP C > S: Flags [S], seq 517075944, win 65535, options [nop,nop,md5 valid,mss 1400,sackOK,TS val 1751508483 ecr 0,nop,wscale 8], length 0
      11:59:59.883002 IP S > C: Flags [S.], seq 1902939253, ack 517075945, win 65535, options [nop,nop,md5 valid,mss 1400,sackOK,TS val 1751508479 ecr 1751508483,nop,wscale 8], length 0
      11:59:59.883012 IP C > S: Flags [.], ack 1, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508483 ecr 1751508479], length 0
      11:59:59.883114 IP C > S: Flags [P.], seq 1:13, ack 1, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508483 ecr 1751508479], length 12
      11:59:59.883122 IP S > C: Flags [.], ack 13, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508483 ecr 1751508483], length 0
      11:59:59.883152 IP S > C: Flags [P.], seq 1:13, ack 13, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508484 ecr 1751508483], length 12
      11:59:59.883170 IP C > S: Flags [.], ack 13, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508484 ecr 1751508484], length 0
      
      Of course, no SACK block will ever be added later, but nothing should break.
      Technically, we could remove the 4 nops included in the MD5+TS options,
      but again some stacks could break on seeing unconventional alignment.
      
      Fixes: 4957faad ("TCPCT part 1g: Responder Cookie => Initiator")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e114e1e8
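The rule in the SYNCOOKIES commit above can be illustrated with a small decision model. It is a sketch, not the kernel code: the `model_*` functions are hypothetical, and the pre-fix behaviour is assumed to be "drop TS whenever MD5 and SACK are both present", which matches the first tcpdump trace.

```c
#include <stdbool.h>

/*
 * Decide whether the SYNACK carries the TS option.
 *   md5       - an MD5 signature option is present
 *   ts_req    - the client's SYN asked for timestamps
 *   sack_req  - the client's SYN asked for SACK
 *   cookie_ts - a syncookie timestamp was used to encode ECN/SACK/WSCALE
 *   fixed     - apply the rule described in the commit message
 */
static bool model_synack_has_ts(bool md5, bool ts_req, bool sack_req,
                                bool cookie_ts, bool fixed)
{
    bool ts = ts_req;

    /* Assumed older behaviour: MD5 + SACK leaves no room, so TS goes. */
    if (md5 && sack_req) {
        /* With the fix, TS must survive when it carries cookie state. */
        if (!(fixed && cookie_ts))
            ts = false;
    }
    return ts;
}

/*
 * A syncookie server keeps no per-connection state, so in this model it
 * can only recover WSCALE from the timestamp echoed by the client.
 */
static bool model_can_recover_wscale(bool wscale_sent, bool ts_sent,
                                     bool cookie_ts)
{
    if (!wscale_sent)
        return true;            /* nothing to recover */
    return !cookie_ts || ts_sent;
}
```

With all options requested and syncookies active, the unfixed model advertises WSCALE without TS, which the server cannot later recover; the fixed model keeps TS.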
    • R
      rds: If one path needs re-connection, check all and re-connect · 9ef845f8
      Rao Shoaib committed
      In testing with mprds enabled, Oracle Cluster nodes were not able to
      communicate with other nodes after a reboot and so failed to rejoin
      the cluster. Peers with a lower IP address initiated the connection, but
      the rebooted node could not respond because it chose a different path, and
      it could not initiate a connection itself because it had a higher IP address.
      
      With this patch, when a node sends out a packet and the selected path
      is down, all other paths are also checked and any down paths are
      re-connected.
      Reviewed-by: Ka-cheong Poon <ka-cheong.poon@oracle.com>
      Reviewed-by: David Edmondson <david.edmondson@oracle.com>
      Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
      Signed-off-by: Rao Shoaib <rao.shoaib@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9ef845f8
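The behaviour described in the rds commit above (if the selected path is down, check every path and reconnect the ones that are down) can be sketched as follows. The structure and function names are hypothetical and the model ignores all locking and workqueue details.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduced view of one path of a multipath connection. */
struct model_path {
    bool up;                /* path currently connected */
    bool reconnect_queued;  /* a reconnect has already been requested */
};

/*
 * Called when a packet is about to be sent on paths[selected].
 * If that path is down, walk all paths and queue a reconnect for each
 * one that is down. Returns the number of reconnects queued.
 */
static size_t model_send_check_paths(struct model_path *paths,
                                     size_t npaths, size_t selected)
{
    size_t queued = 0;

    assert(selected < npaths);
    if (paths[selected].up)
        return 0;           /* selected path is usable, nothing to do */

    for (size_t i = 0; i < npaths; i++) {
        if (!paths[i].up && !paths[i].reconnect_queued) {
            paths[i].reconnect_queued = true;
            queued++;
        }
    }
    return queued;
}
```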
    • E
      tcp: md5: refine tcp_md5_do_add()/tcp_md5_hash_key() barriers · e6ced831
      Eric Dumazet committed
      My prior fix went a bit too far, according to Herbert and Mathieu.
      
      Since we accept that concurrent TCP MD5 lookups might see inconsistent
      keys, we can use READ_ONCE()/WRITE_ONCE() instead of smp_rmb()/smp_wmb().
      
      Clearing all key->key[] is needed to avoid possible KMSAN reports,
      if key->keylen is increased. Since tcp_md5_do_add() is not fast path,
      using __GFP_ZERO to clear all struct tcp_md5sig_key is simpler.
      
      data_race() was added in linux-5.8 and will prevent KCSAN reports;
      it can safely be removed in stable backports if data_race() is
      not yet backported.
      
      v2: use data_race() both in tcp_md5_hash_key() and tcp_md5_do_add()
      
      Fixes: 6a2febec ("tcp: md5: add missing memory barriers in tcp_md5_do_add()/tcp_md5_hash_key()")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Marco Elver <elver@google.com>
      Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e6ced831
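The READ_ONCE()/WRITE_ONCE() pattern mentioned in the md5 barriers commit above can be approximated in userspace as below. This is a sketch under assumptions: the `MODEL_*` macros are simplified imitations of the kernel macros (scalar types only, relying on the GCC/Clang `__typeof__` extension), and the struct is a hypothetical reduction of the real key structure. As the commit message says, a concurrent reader may still observe an inconsistent key; the macros only ensure the length is accessed exactly once.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified userspace imitations of the kernel macros (scalars only). */
#define MODEL_READ_ONCE(x)     (*(volatile __typeof__(x) *)&(x))
#define MODEL_WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

#define MODEL_KEYLEN_MAX 80

/* Hypothetical reduced key structure. */
struct model_md5_key {
    uint8_t keylen;
    uint8_t key[MODEL_KEYLEN_MAX];
};

/* Writer: replace the key bytes, then publish the new length once. */
static void model_key_update(struct model_md5_key *k,
                             const uint8_t *newkey, uint8_t newlen)
{
    assert(newlen <= MODEL_KEYLEN_MAX);
    memcpy(k->key, newkey, newlen);
    MODEL_WRITE_ONCE(k->keylen, newlen);
}

/*
 * Reader: load the length exactly once and use that snapshot, so the
 * compiler cannot re-read keylen between the bounds and the copy.
 * Returns the number of key bytes copied into out.
 */
static size_t model_key_snapshot(const struct model_md5_key *k, uint8_t *out)
{
    uint8_t len = MODEL_READ_ONCE(k->keylen);

    memcpy(out, k->key, len);
    return len;
}
```

Zero-initialising the whole structure before first use plays the role that `__GFP_ZERO` plays in the commit: bytes beyond the current key length are never uninitialised.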
    • S
      genetlink: remove genl_bind · 1e82a62f
      Sean Tranchetti committed
      A potential deadlock can occur during registering or unregistering a
      new generic netlink family between the main nl_table_lock and the
      cb_lock where each thread wants the lock held by the other, as
      demonstrated below.
      
      1) Thread 1 is performing a netlink_bind() operation on a socket. As part
         of this call, it will call netlink_lock_table(), incrementing the
         nl_table_users count to 1.
      2) Thread 2 is registering (or unregistering) a genl_family via the
         genl_(un)register_family() API. The cb_lock semaphore will be taken for
         writing.
      3) Thread 1 will call genl_bind() as part of the bind operation to handle
         subscribing to GENL multicast groups at the request of the user. It will
         attempt to take the cb_lock semaphore for reading, but it will fail and
         be scheduled away, waiting for Thread 2 to finish the write.
      4) Thread 2 will call netlink_table_grab() during the (un)registration
         call. However, as Thread 1 has incremented nl_table_users, it will not
         be able to proceed, and both threads will be stuck waiting for the
         other.
      
      genl_bind() is a no-op unless a genl_family implements the mcast_bind()
      function to handle setting up family-specific multicast operations. Since
      no one in-tree uses this functionality, as Cong pointed out, simply removing
      the genl_bind() function removes the possibility of deadlock, as
      Thread 1 above no longer attempts to take the cb_lock semaphore.
      
      Fixes: c380d9a7 ("genetlink: pass multicast bind/unbind to families")
      Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Johannes Berg <johannes.berg@intel.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1e82a62f
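The four-step interleaving in the genl_bind commit above can be replayed deterministically in a tiny state model. This is a sketch only: it uses no real locks, all `model_*` names are hypothetical, and it assumes Thread 1 releases the table as soon as its bind completes.

```c
#include <stdbool.h>

/* Hypothetical reduced lock state. */
struct model_state {
    int  nl_table_users;      /* raised by netlink_lock_table() */
    bool cb_lock_write_held;  /* cb_lock taken for writing */
};

/* Step 3: genl_bind() wants cb_lock for reading; a writer blocks it. */
static bool model_thread1_blocked(const struct model_state *s,
                                  bool genl_bind_present)
{
    return genl_bind_present && s->cb_lock_write_held;
}

/* Step 4: netlink_table_grab() waits until nl_table_users is zero. */
static bool model_thread2_blocked(const struct model_state *s)
{
    return s->nl_table_users > 0;
}

/* Replay steps 1-4 and report whether both threads end up waiting. */
static bool model_deadlocked(bool genl_bind_present)
{
    struct model_state s = { 0, false };

    s.nl_table_users++;           /* 1) Thread 1: netlink_lock_table() */
    s.cb_lock_write_held = true;  /* 2) Thread 2: cb_lock for writing */

    bool t1 = model_thread1_blocked(&s, genl_bind_present);  /* 3) */
    if (!t1)
        s.nl_table_users--;       /* bind finished, table released */

    bool t2 = model_thread2_blocked(&s);                     /* 4) */

    return t1 && t2;
}
```

In this model `model_deadlocked(true)` reproduces the circular wait, and `model_deadlocked(false)`, corresponding to genl_bind() being removed, does not.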
  10. 01 Jul, 2020 10 commits