1. 18 6月, 2019 2 次提交
  2. 04 6月, 2019 17 次提交
    • G
      net/tcp: Support tunable tcp timeout value in TIME-WAIT state · ebfbd2cf
      George Zhang 提交于
      By default the tcp_tw_timeout value is 60 seconds. The minimum is
      1 second and the maximum is 600. This setting is useful on system under
      heavy tcp load.
      
      NOTE: set the tcp_tw_timeout below 60 seconds voilates the "quiet time"
      restriction, and make your system into the risk of causing some old data
      to be accepted as new or new data rejected as old duplicated by some
      receivers.
      
      Link: http://web.archive.org/web/20150102003320/http://tools.ietf.org/html/rfc793Signed-off-by: NGeorge Zhang <georgezhang@linux.alibaba.com>
      Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
      Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      ebfbd2cf
    • K
      net/hookers: fix link error with ipv6 disabled · 7ab038c3
      kbuild test robot 提交于
      lkp-build bot reported the following link error with ipv6 disabled:
      
      ld: net/hookers/hookers.o:(.data+0x40): undefined reference to `ipv6_specific'
      ld: net/hookers/hookers.o:(.data+0x78): undefined reference to `ipv6_mapped'
      ld: net/hookers/hookers.o:(.data+0xe8): undefined reference to `inet6_stream_ops'
      
      Fixed this issue by adding IS_ENABLED(CONFIG_IPV6) check.
      Reported-by: Nkbuild test robot <lkp@intel.com>
      Signed-off-by: NCaspar Zhang <caspar@linux.alibaba.com>
      Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      7ab038c3
    • C
      net/hookers: only enable on x86 platform · 91277586
      Caspar Zhang 提交于
      read/write_cr0() are used in net/hookers.c, but they are only available
      on x86 platform. Adding a depend-on fields in Kconfig to disable this
      feature in other platforms.
      Reported-by: Nkbuild test robot <lkp@intel.com>
      Signed-off-by: NCaspar Zhang <caspar@linux.alibaba.com>
      Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      91277586
    • G
      net: kernel hookers service for toa module · 3f52596d
      George Zhang 提交于
      LVS fullnat will replace network traffic's source ip with its local ip,
      and thus the backend servers cannot obtain the real client ip.
      
      To solve this, LVS has introduced the tcp option address (TOA) to store
      the essential ip address information in the last tcp ack packet of the
      3-way handshake, and the backend servers need to retrieve it from the
      packet header.
      
      In this patch, we have introduced the sk_toa_data member in the sock
      structure to hold the TOA information. There used to be an in-tree
      module for TOA managing, whereas it has now been maintained as an
      standalone module.
      
      In this case, the toa module should register its hook function(s) using
      the provided interfaces in the hookers module.
      
      TOA in sock structure:
      
      	__be32 sk_toa_data[16];
      
      The hookers module only provides the sk_toa_data placeholder, and the
      toa module can use this variable through the layout it needs.
      
      Hook interfaces:
      
      The hookers module replaces the kernel's syn_recv_sock and getname
      handler with a stub that chains the toa module's hook function(s) to the
      original handling function. The hookers module allows hook functions to
      be installed and uninstalled in any order.
      
      toa module:
      
      The external toa module will be provided in separate RPM package.
      
      [xuyu@linux.alibaba.com: amend commit log]
      Signed-off-by: NGeorge Zhang <georgezhang@linux.alibaba.com>
      Signed-off-by: NXu Yu <xuyu@linux.alibaba.com>
      Reviewed-by: NCaspar Zhang <caspar@linux.alibaba.com>
      3f52596d
    • J
      tipc: fix modprobe tipc failed after switch order of device registration · ca75a9fc
      Junwei Hu 提交于
      commit 526f5b851a96566803ee4bee60d0a34df56c77f8 upstream.
      
      Error message printed:
      modprobe: ERROR: could not insert 'tipc': Address family not
      supported by protocol.
      when modprobe tipc after the following patch: switch order of
      device registration, commit 7e27e8d6130c
      ("tipc: switch order of device registration to fix a crash")
      
      Because sock_create_kern(net, AF_TIPC, ...) called by
      tipc_topsrv_create_listener() in the initialization process
      of tipc_init_net(), so tipc_socket_init() must be execute before that.
      Meanwhile, tipc_net_id need to be initialized when sock_create()
      called, and tipc_socket_init() is no need to be called for each namespace.
      
      I add a variable tipc_topsrv_net_ops, and split the
      register_pernet_subsys() of tipc into two parts, and split
      tipc_socket_init() with initialization of pernet params.
      
      By the way, I fixed resources rollback error when tipc_bcast_init()
      failed in tipc_init_net().
      
      Fixes: 7e27e8d6130c ("tipc: switch order of device registration to fix a crash")
      Signed-off-by: NJunwei Hu <hujunwei4@huawei.com>
      Reported-by: NWang Wang <wangwang2@huawei.com>
      Reported-by: syzbot+1e8114b61079bfe9cbc5@syzkaller.appspotmail.com
      Reviewed-by: NKang Zhou <zhoukang7@huawei.com>
      Reviewed-by: NSuanming Mou <mousuanming@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ca75a9fc
    • D
      Revert "tipc: fix modprobe tipc failed after switch order of device registration" · ab69a230
      David S. Miller 提交于
      commit 5593530e56943182ebb6d81eca8a3be6db6dbba4 upstream.
      
      This reverts commit 532b0f7ece4cb2ffd24dc723ddf55242d1188e5e.
      
      More revisions coming up.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ab69a230
    • M
      jump_label: move 'asm goto' support test to Kconfig · 0276ebf1
      Masahiro Yamada 提交于
      commit e9666d10a5677a494260d60d1fa0b73cc7646eb3 upstream.
      
      Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label".
      
      The jump label is controlled by HAVE_JUMP_LABEL, which is defined
      like this:
      
        #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
        # define HAVE_JUMP_LABEL
        #endif
      
      We can improve this by testing 'asm goto' support in Kconfig, then
      make JUMP_LABEL depend on CC_HAS_ASM_GOTO.
      
      Ugly #ifdef HAVE_JUMP_LABEL will go away, and CONFIG_JUMP_LABEL will
      match to the real kernel capability.
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Tested-by: NSedat Dilek <sedat.dilek@gmail.com>
      [nc: Fix trivial conflicts in 4.19
           arch/xtensa/kernel/jump_label.c doesn't exist yet
           Ensured CC_HAVE_ASM_GOTO and HAVE_JUMP_LABEL were sufficiently
           eliminated]
      Signed-off-by: NNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      0276ebf1
    • J
      net/tls: don't ignore netdev notifications if no TLS features · fb6cf4f3
      Jakub Kicinski 提交于
      [ Upstream commit c3f4a6c39cf269a40d45f813c05fa830318ad875 ]
      
      On device surprise removal path (the notifier) we can't
      bail just because the features are disabled.  They may
      have been enabled during the lifetime of the device.
      This bug leads to leaking netdev references and
      use-after-frees if there are active connections while
      device features are cleared.
      
      Fixes: e8f69799 ("net/tls: Add generic NIC offload infrastructure")
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: NDirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fb6cf4f3
    • J
      net/tls: fix state removal with feature flags off · fb69403e
      Jakub Kicinski 提交于
      [ Upstream commit 3686637e507b48525fcea6fb91e1988bdbc14530 ]
      
      TLS offload drivers shouldn't (and currently don't) block
      the TLS offload feature changes based on whether there are
      active offloaded connections or not.
      
      This seems to be a good idea, because we want the admin to
      be able to disable the TLS offload at any time, and there
      is no clean way of disabling it for active connections
      (TX side is quite problematic).  So if features are cleared
      existing connections will stay offloaded until they close,
      and new connections will not attempt offload to a given
      device.
      
      However, the offload state removal handling is currently
      broken if feature flags get cleared while there are
      active TLS offloads.
      
      RX side will completely bail from cleanup, even on normal
      remove path, leaving device state dangling, potentially
      causing issues when the 5-tuple is reused.  It will also
      fail to release the netdev reference.
      
      Remove the RX-side warning message, in next release cycle
      it should be printed when features are disabled, rather
      than when connection dies, but for that we need a more
      efficient method of finding connection of a given netdev
      (a'la BPF offload code).
      
      Fixes: 4799ac81 ("tls: Add rx inline crypto offload")
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: NDirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fb69403e
    • V
      net: sched: don't use tc_action->order during action dump · 6ab96847
      Vlad Buslov 提交于
      [ Upstream commit 4097e9d250fb17958c1d9b94538386edd3f20144 ]
      
      Function tcf_action_dump() relies on tc_action->order field when starting
      nested nla to send action data to userspace. This approach breaks in
      several cases:
      
      - When multiple filters point to same shared action, tc_action->order field
        is overwritten each time it is attached to filter. This causes filter
        dump to output action with incorrect attribute for all filters that have
        the action in different position (different order) from the last set
        tc_action->order value.
      
      - When action data is displayed using tc action API (RTM_GETACTION), action
        order is overwritten by tca_action_gd() according to its position in
        resulting array of nl attributes, which will break filter dump for all
        filters attached to that shared action that expect it to have different
        order value.
      
      Don't rely on tc_action->order when dumping actions. Set nla according to
      action position in resulting array of actions instead.
      Signed-off-by: NVlad Buslov <vladbu@mellanox.com>
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6ab96847
    • E
      net-gro: fix use-after-free read in napi_gro_frags() · 39fd0dc4
      Eric Dumazet 提交于
      [ Upstream commit a4270d6795b0580287453ea55974d948393e66ef ]
      
      If a network driver provides to napi_gro_frags() an
      skb with a page fragment of exactly 14 bytes, the call
      to gro_pull_from_frag0() will 'consume' the fragment
      by calling skb_frag_unref(skb, 0), and the page might
      be freed and reused.
      
      Reading eth->h_proto at the end of napi_frags_skb() might
      read mangled data, or crash under specific debugging features.
      
      BUG: KASAN: use-after-free in napi_frags_skb net/core/dev.c:5833 [inline]
      BUG: KASAN: use-after-free in napi_gro_frags+0xc6f/0xd10 net/core/dev.c:5841
      Read of size 2 at addr ffff88809366840c by task syz-executor599/8957
      
      CPU: 1 PID: 8957 Comm: syz-executor599 Not tainted 5.2.0-rc1+ #32
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
       __kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
       kasan_report+0x12/0x20 mm/kasan/common.c:614
       __asan_report_load_n_noabort+0xf/0x20 mm/kasan/generic_report.c:142
       napi_frags_skb net/core/dev.c:5833 [inline]
       napi_gro_frags+0xc6f/0xd10 net/core/dev.c:5841
       tun_get_user+0x2f3c/0x3ff0 drivers/net/tun.c:1991
       tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2037
       call_write_iter include/linux/fs.h:1872 [inline]
       do_iter_readv_writev+0x5f8/0x8f0 fs/read_write.c:693
       do_iter_write fs/read_write.c:970 [inline]
       do_iter_write+0x184/0x610 fs/read_write.c:951
       vfs_writev+0x1b3/0x2f0 fs/read_write.c:1015
       do_writev+0x15b/0x330 fs/read_write.c:1058
      
      Fixes: a50e233c ("net-gro: restore frag0 optimization")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      39fd0dc4
    • E
      llc: fix skb leak in llc_build_and_send_ui_pkt() · 2d04f32c
      Eric Dumazet 提交于
      [ Upstream commit 8fb44d60d4142cd2a440620cd291d346e23c131e ]
      
      If llc_mac_hdr_init() returns an error, we must drop the skb
      since no llc_build_and_send_ui_pkt() caller will take care of this.
      
      BUG: memory leak
      unreferenced object 0xffff8881202b6800 (size 2048):
        comm "syz-executor907", pid 7074, jiffies 4294943781 (age 8.590s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          1a 00 07 40 00 00 00 00 00 00 00 00 00 00 00 00  ...@............
        backtrace:
          [<00000000e25b5abe>] kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline]
          [<00000000e25b5abe>] slab_post_alloc_hook mm/slab.h:439 [inline]
          [<00000000e25b5abe>] slab_alloc mm/slab.c:3326 [inline]
          [<00000000e25b5abe>] __do_kmalloc mm/slab.c:3658 [inline]
          [<00000000e25b5abe>] __kmalloc+0x161/0x2c0 mm/slab.c:3669
          [<00000000a1ae188a>] kmalloc include/linux/slab.h:552 [inline]
          [<00000000a1ae188a>] sk_prot_alloc+0xd6/0x170 net/core/sock.c:1608
          [<00000000ded25bbe>] sk_alloc+0x35/0x2f0 net/core/sock.c:1662
          [<000000002ecae075>] llc_sk_alloc+0x35/0x170 net/llc/llc_conn.c:950
          [<00000000551f7c47>] llc_ui_create+0x7b/0x140 net/llc/af_llc.c:173
          [<0000000029027f0e>] __sock_create+0x164/0x250 net/socket.c:1430
          [<000000008bdec225>] sock_create net/socket.c:1481 [inline]
          [<000000008bdec225>] __sys_socket+0x69/0x110 net/socket.c:1523
          [<00000000b6439228>] __do_sys_socket net/socket.c:1532 [inline]
          [<00000000b6439228>] __se_sys_socket net/socket.c:1530 [inline]
          [<00000000b6439228>] __x64_sys_socket+0x1e/0x30 net/socket.c:1530
          [<00000000cec820c1>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:301
          [<000000000c32554f>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      BUG: memory leak
      unreferenced object 0xffff88811d750d00 (size 224):
        comm "syz-executor907", pid 7074, jiffies 4294943781 (age 8.600s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 f0 0c 24 81 88 ff ff 00 68 2b 20 81 88 ff ff  ...$.....h+ ....
        backtrace:
          [<0000000053026172>] kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline]
          [<0000000053026172>] slab_post_alloc_hook mm/slab.h:439 [inline]
          [<0000000053026172>] slab_alloc_node mm/slab.c:3269 [inline]
          [<0000000053026172>] kmem_cache_alloc_node+0x153/0x2a0 mm/slab.c:3579
          [<00000000fa8f3c30>] __alloc_skb+0x6e/0x210 net/core/skbuff.c:198
          [<00000000d96fdafb>] alloc_skb include/linux/skbuff.h:1058 [inline]
          [<00000000d96fdafb>] alloc_skb_with_frags+0x5f/0x250 net/core/skbuff.c:5327
          [<000000000a34a2e7>] sock_alloc_send_pskb+0x269/0x2a0 net/core/sock.c:2225
          [<00000000ee39999b>] sock_alloc_send_skb+0x32/0x40 net/core/sock.c:2242
          [<00000000e034d810>] llc_ui_sendmsg+0x10a/0x540 net/llc/af_llc.c:933
          [<00000000c0bc8445>] sock_sendmsg_nosec net/socket.c:652 [inline]
          [<00000000c0bc8445>] sock_sendmsg+0x54/0x70 net/socket.c:671
          [<000000003b687167>] __sys_sendto+0x148/0x1f0 net/socket.c:1964
          [<00000000922d78d9>] __do_sys_sendto net/socket.c:1976 [inline]
          [<00000000922d78d9>] __se_sys_sendto net/socket.c:1972 [inline]
          [<00000000922d78d9>] __x64_sys_sendto+0x2a/0x30 net/socket.c:1972
          [<00000000cec820c1>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:301
          [<000000000c32554f>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2d04f32c
    • D
      ipv6: Fix redirect with VRF · 44217666
      David Ahern 提交于
      [ Upstream commit 31680ac265802397937d75461a2809a067b9fb93 ]
      
      IPv6 redirect is broken for VRF. __ip6_route_redirect walks the FIB
      entries looking for an exact match on ifindex. With VRF the flowi6_oif
      is updated by l3mdev_update_flow to the l3mdev index and the
      FLOWI_FLAG_SKIP_NH_OIF set in the flags to tell the lookup to skip the
      device match. For redirects the device match is requires so use that
      flag to know when the oif needs to be reset to the skb device index.
      
      Fixes: ca254490 ("net: Add VRF support to IPv6 stack")
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      44217666
    • M
      ipv6: Consider sk_bound_dev_if when binding a raw socket to an address · ed753b39
      Mike Manning 提交于
      [ Upstream commit 72f7cfab6f93a8ea825fab8ccfb016d064269f7f ]
      
      IPv6 does not consider if the socket is bound to a device when binding
      to an address. The result is that a socket can be bound to eth0 and
      then bound to the address of eth1. If the device is a VRF, the result
      is that a socket can only be bound to an address in the default VRF.
      
      Resolve by considering the device if sk_bound_dev_if is set.
      Signed-off-by: NMike Manning <mmanning@vyatta.att-mail.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Tested-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ed753b39
    • E
      ipv4/igmp: fix build error if !CONFIG_IP_MULTICAST · 46702dd5
      Eric Dumazet 提交于
      [ Upstream commit 903869bd10e6719b9df6718e785be7ec725df59f ]
      
      ip_sf_list_clear_all() needs to be defined even if !CONFIG_IP_MULTICAST
      
      Fixes: 3580d04aa674 ("ipv4/igmp: fix another memory leak in igmpv3_del_delrec()")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nkbuild test robot <lkp@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      46702dd5
    • E
      ipv4/igmp: fix another memory leak in igmpv3_del_delrec() · e9f94e48
      Eric Dumazet 提交于
      [ Upstream commit 3580d04aa674383c42de7b635d28e52a1e5bc72c ]
      
      syzbot reported memory leaks [1] that I have back tracked to
      a missing cleanup from igmpv3_del_delrec() when
      (im->sfmode != MCAST_INCLUDE)
      
      Add ip_sf_list_clear_all() and kfree_pmc() helpers to explicitely
      handle the cleanups before freeing.
      
      [1]
      
      BUG: memory leak
      unreferenced object 0xffff888123e32b00 (size 64):
        comm "softirq", pid 0, jiffies 4294942968 (age 8.010s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 e0 00 00 01 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<000000006105011b>] kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline]
          [<000000006105011b>] slab_post_alloc_hook mm/slab.h:439 [inline]
          [<000000006105011b>] slab_alloc mm/slab.c:3326 [inline]
          [<000000006105011b>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
          [<000000004bba8073>] kmalloc include/linux/slab.h:547 [inline]
          [<000000004bba8073>] kzalloc include/linux/slab.h:742 [inline]
          [<000000004bba8073>] ip_mc_add1_src net/ipv4/igmp.c:1961 [inline]
          [<000000004bba8073>] ip_mc_add_src+0x36b/0x400 net/ipv4/igmp.c:2085
          [<00000000a46a65a0>] ip_mc_msfilter+0x22d/0x310 net/ipv4/igmp.c:2475
          [<000000005956ca89>] do_ip_setsockopt.isra.0+0x1795/0x1930 net/ipv4/ip_sockglue.c:957
          [<00000000848e2d2f>] ip_setsockopt+0x3b/0xb0 net/ipv4/ip_sockglue.c:1246
          [<00000000b9db185c>] udp_setsockopt+0x4e/0x90 net/ipv4/udp.c:2616
          [<000000003028e438>] sock_common_setsockopt+0x38/0x50 net/core/sock.c:3130
          [<0000000015b65589>] __sys_setsockopt+0x98/0x120 net/socket.c:2078
          [<00000000ac198ef0>] __do_sys_setsockopt net/socket.c:2089 [inline]
          [<00000000ac198ef0>] __se_sys_setsockopt net/socket.c:2086 [inline]
          [<00000000ac198ef0>] __x64_sys_setsockopt+0x26/0x30 net/socket.c:2086
          [<000000000a770437>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:301
          [<00000000d3adb93b>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 9c8bb163 ("igmp, mld: Fix memory leak in igmpv3/mld_del_delrec()")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Hangbin Liu <liuhangbin@gmail.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e9f94e48
    • E
      inet: switch IP ID generator to siphash · 07480da0
      Eric Dumazet 提交于
      [ Upstream commit df453700e8d81b1bdafdf684365ee2b9431fb702 ]
      
      According to Amit Klein and Benny Pinkas, IP ID generation is too weak
      and might be used by attackers.
      
      Even with recent net_hash_mix() fix (netns: provide pure entropy for net_hash_mix())
      having 64bit key and Jenkins hash is risky.
      
      It is time to switch to siphash and its 128bit keys.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NAmit Klein <aksecurity@gmail.com>
      Reported-by: NBenny Pinkas <benny@pinkas.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      07480da0
  3. 31 5月, 2019 5 次提交
    • L
      batman-adv: allow updating DAT entry timeouts on incoming ARP Replies · 25204fe6
      Linus Lüssing 提交于
      [ Upstream commit 099e6cc1582dc2903fecb898bbeae8f7cf4262c7 ]
      
      Currently incoming ARP Replies, for example via a DHT-PUT message, do
      not update the timeout for an already existing DAT entry. These ARP
      Replies are dropped instead.
      
      This however defeats the purpose of the DHCPACK snooping, for instance.
      Right now, a DAT entry in the DHT will be purged every five minutes,
      likely leading to a mesh-wide ARP Request broadcast after this timeout.
      Which then recreates the entry. The idea of the DHCPACK snooping is to
      be able to update an entry before a timeout happens, to avoid ARP Request
      flooding.
      
      This patch fixes this issue by updating a DAT entry on incoming
      ARP Replies even if a matching DAT entry already exists. While still
      filtering the ARP Reply towards the soft-interface, to avoid duplicate
      messages on the client device side.
      Signed-off-by: NLinus Lüssing <linus.luessing@c0d3.blue>
      Acked-by: NAntonio Quartulli <a@unstable.cc>
      Signed-off-by: NSven Eckelmann <sven@narfation.org>
      Signed-off-by: NSimon Wunderlich <sw@simonwunderlich.de>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      25204fe6
    • S
      mac80211/cfg80211: update bss channel on channel switch · ca5b9d63
      Sergey Matyukevich 提交于
      [ Upstream commit 5dc8cdce1d722c733f8c7af14c5fb595cfedbfa8 ]
      
      FullMAC STAs have no way to update bss channel after CSA channel switch
      completion. As a result, user-space tools may provide inconsistent
      channel info. For instance, consider the following two commands:
      $ sudo iw dev wlan0 link
      $ sudo iw dev wlan0 info
      The latter command gets channel info from the hardware, so most probably
      its output will be correct. However the former command gets channel info
      from scan cache, so its output will contain outdated channel info.
      In fact, current bss channel info will not be updated until the
      next [re-]connect.
      
      Note that mac80211 STAs have a workaround for this, but it requires
      access to internal cfg80211 data, see ieee80211_chswitch_work:
      
      	/* XXX: shouldn't really modify cfg80211-owned data! */
      	ifmgd->associated->channel = sdata->csa_chandef.chan;
      
      This patch suggests to convert mac80211 workaround into cfg80211 behavior
      and to update current bss channel in cfg80211_ch_switch_notify.
      Signed-off-by: NSergey Matyukevich <sergey.matyukevich.os@quantenna.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      ca5b9d63
    • J
      Bluetooth: Ignore CC events not matching the last HCI command · 8603d499
      João Paulo Rechi Vita 提交于
      [ Upstream commit f80c5dad7b6467b884c445ffea45985793b4b2d0 ]
      
      This commit makes the kernel not send the next queued HCI command until
      a command complete arrives for the last HCI command sent to the
      controller. This change avoids a problem with some buggy controllers
      (seen on two SKUs of QCA9377) that send an extra command complete event
      for the previous command after the kernel had already sent a new HCI
      command to the controller.
      
      The problem was reproduced when starting an active scanning procedure,
      where an extra command complete event arrives for the LE_SET_RANDOM_ADDR
      command. When this happends the kernel ends up not processing the
      command complete for the following commmand, LE_SET_SCAN_PARAM, and
      ultimately behaving as if a passive scanning procedure was being
      performed, when in fact controller is performing an active scanning
      procedure. This makes it impossible to discover BLE devices as no device
      found events are sent to userspace.
      
      This problem is reproducible on 100% of the attempts on the affected
      controllers. The extra command complete event can be seen at timestamp
      27.420131 on the btmon logs bellow.
      
      Bluetooth monitor ver 5.50
      = Note: Linux version 5.0.0+ (x86_64)                                  0.352340
      = Note: Bluetooth subsystem version 2.22                               0.352343
      = New Index: 80:C5:F2:8F:87:84 (Primary,USB,hci0)               [hci0] 0.352344
      = Open Index: 80:C5:F2:8F:87:84                                 [hci0] 0.352345
      = Index Info: 80:C5:F2:8F:87:84 (Qualcomm)                      [hci0] 0.352346
      @ MGMT Open: bluetoothd (privileged) version 1.14             {0x0001} 0.352347
      @ MGMT Open: btmon (privileged) version 1.14                  {0x0002} 0.352366
      @ MGMT Open: btmgmt (privileged) version 1.14                {0x0003} 27.302164
      @ MGMT Command: Start Discovery (0x0023) plen 1       {0x0003} [hci0] 27.302310
              Address type: 0x06
                LE Public
                LE Random
      < HCI Command: LE Set Random Address (0x08|0x0005) plen 6   #1 [hci0] 27.302496
              Address: 15:60:F2:91:B2:24 (Non-Resolvable)
      > HCI Event: Command Complete (0x0e) plen 4                 #2 [hci0] 27.419117
            LE Set Random Address (0x08|0x0005) ncmd 1
              Status: Success (0x00)
      < HCI Command: LE Set Scan Parameters (0x08|0x000b) plen 7  #3 [hci0] 27.419244
              Type: Active (0x01)
              Interval: 11.250 msec (0x0012)
              Window: 11.250 msec (0x0012)
              Own address type: Random (0x01)
              Filter policy: Accept all advertisement (0x00)
      > HCI Event: Command Complete (0x0e) plen 4                 #4 [hci0] 27.420131
            LE Set Random Address (0x08|0x0005) ncmd 1
              Status: Success (0x00)
      < HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2      #5 [hci0] 27.420259
              Scanning: Enabled (0x01)
              Filter duplicates: Enabled (0x01)
      > HCI Event: Command Complete (0x0e) plen 4                 #6 [hci0] 27.420969
            LE Set Scan Parameters (0x08|0x000b) ncmd 1
              Status: Success (0x00)
      > HCI Event: Command Complete (0x0e) plen 4                 #7 [hci0] 27.421983
            LE Set Scan Enable (0x08|0x000c) ncmd 1
              Status: Success (0x00)
      @ MGMT Event: Command Complete (0x0001) plen 4        {0x0003} [hci0] 27.422059
            Start Discovery (0x0023) plen 1
              Status: Success (0x00)
              Address type: 0x06
                LE Public
                LE Random
      @ MGMT Event: Discovering (0x0013) plen 2             {0x0003} [hci0] 27.422067
              Address type: 0x06
                LE Public
                LE Random
              Discovery: Enabled (0x01)
      @ MGMT Event: Discovering (0x0013) plen 2             {0x0002} [hci0] 27.422067
              Address type: 0x06
                LE Public
                LE Random
              Discovery: Enabled (0x01)
      @ MGMT Event: Discovering (0x0013) plen 2             {0x0001} [hci0] 27.422067
              Address type: 0x06
                LE Public
                LE Random
              Discovery: Enabled (0x01)
      Signed-off-by: NJoão Paulo Rechi Vita <jprvita@endlessm.com>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      8603d499
    • L
      batman-adv: mcast: fix multicast tt/tvlv worker locking · 363aa80a
      Linus Lüssing 提交于
      commit a3c7cd0cdf1107f891aff847ad481e34df727055 upstream.
      
      Syzbot has reported some issues with the locking assumptions made for
      the multicast tt/tvlv worker: It was able to trigger the WARN_ON() in
      batadv_mcast_mla_tt_retract() and batadv_mcast_mla_tt_add().
      While hard/not reproduceable for us so far it seems that the
      delayed_work_pending() we use might not be quite safe from reordering.
      
      Therefore this patch adds an explicit, new spinlock to protect the
      update of the mla_list and flags in bat_priv and then removes the
      WARN_ON(delayed_work_pending()).
      
      Reported-by: syzbot+83f2d54ec6b7e417e13f@syzkaller.appspotmail.com
      Reported-by: syzbot+050927a651272b145a5d@syzkaller.appspotmail.com
      Reported-by: syzbot+979ffc89b87309b1b94b@syzkaller.appspotmail.com
      Reported-by: syzbot+f9f3f388440283da2965@syzkaller.appspotmail.com
      Fixes: cbebd363 ("batman-adv: Use own timer for multicast TT and TVLV updates")
      Signed-off-by: NLinus Lüssing <linus.luessing@c0d3.blue>
      Signed-off-by: NSven Eckelmann <sven@narfation.org>
      Signed-off-by: NSimon Wunderlich <sw@simonwunderlich.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      363aa80a
    • D
      bpf: add bpf_jit_limit knob to restrict unpriv allocations · 43caa29c
      Daniel Borkmann 提交于
      commit ede95a63b5e84ddeea6b0c473b36ab8bfd8c6ce3 upstream.
      
      Rick reported that the BPF JIT could potentially fill the entire module
      space with BPF programs from unprivileged users which would prevent later
      attempts to load normal kernel modules or privileged BPF programs, for
      example. If JIT was enabled but unsuccessful to generate the image, then
      before commit 290af866 ("bpf: introduce BPF_JIT_ALWAYS_ON config")
      we would always fall back to the BPF interpreter. Nowadays in the case
      where the CONFIG_BPF_JIT_ALWAYS_ON could be set, then the load will abort
      with a failure since the BPF interpreter was compiled out.
      
      Add a global limit and enforce it for unprivileged users such that in case
      of BPF interpreter compiled out we fail once the limit has been reached
      or we fall back to BPF interpreter earlier w/o using module mem if latter
      was compiled in. In a next step, fair share among unprivileged users can
      be resolved in particular for the case where we would fail hard once limit
      is reached.
      
      Fixes: 290af866 ("bpf: introduce BPF_JIT_ALWAYS_ON config")
      Fixes: 0a14842f ("net: filter: Just In Time compiler for x86-64")
      Co-Developed-by: NRick Edgecombe <rick.p.edgecombe@intel.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: LKML <linux-kernel@vger.kernel.org>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      43caa29c
  4. 26 5月, 2019 16 次提交
    • B
      mac80211: Fix kernel panic due to use of txq after free · 9c045d8c
      Bhagavathi Perumal S 提交于
      [ Upstream commit f1267cf3c01b12e0f843fb6a7450a7f0b2efab8a ]
      
      The txq of vif is added to active_txqs list for ATF TXQ scheduling
      in the function ieee80211_queue_skb(), but it was not properly removed
      before freeing the txq object. It was causing use after free of the txq
      objects from the active_txqs list, result was kernel panic
      due to invalid memory access.
      
      Fix kernel invalid memory access by properly removing txq object
      from active_txqs list before free the object.
      Signed-off-by: NBhagavathi Perumal S <bperumal@codeaurora.org>
      Acked-by: NToke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      9c045d8c
    • S
      xfrm4: Fix uninitialized memory read in _decode_session4 · a654a73d
      Steffen Klassert 提交于
      [ Upstream commit 8742dc86d0c7a9628117a989c11f04a9b6b898f3 ]
      
      We currently don't reload pointers pointing into skb header
      after doing pskb_may_pull() in _decode_session4(). So in case
      pskb_may_pull() changed the pointers, we read from random
      memory. Fix this by putting all the needed infos on the
      stack, so that we don't need to access the header pointers
      after doing pskb_may_pull().
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      a654a73d
    • M
      xfrm: Honor original L3 slave device in xfrmi policy lookup · 6faa6206
      Martin Willi 提交于
      [ Upstream commit 025c65e119bf58b610549ca359c9ecc5dee6a8d2 ]
      
      If an xfrmi is associated to a vrf layer 3 master device,
      xfrm_policy_check() fails after traffic decapsulation. The input
      interface is replaced by the layer 3 master device, and hence
      xfrmi_decode_session() can't match the xfrmi anymore to satisfy
      policy checking.
      
      Extend ingress xfrmi lookup to honor the original layer 3 slave
      device, allowing xfrm interfaces to operate within a vrf domain.
      
      Fixes: f203b76d ("xfrm: Add virtual xfrm interfaces")
      Signed-off-by: NMartin Willi <martin@strongswan.org>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      6faa6206
    • S
      esp4: add length check for UDP encapsulation · 3716c262
      Sabrina Dubroca 提交于
      [ Upstream commit 8dfb4eba4100e7cdd161a8baef2d8d61b7a7e62e ]
      
      esp_output_udp_encap can produce a length that doesn't fit in the 16
      bits of a UDP header's length field. In that case, we'll send a
      fragmented packet whose length is larger than IP_MAX_MTU (resulting in
      "Oversized IP packet" warnings on receive) and with a bogus UDP
      length.
      
      To prevent this, add a length check to esp_output_udp_encap and return
       -EMSGSIZE on failure.
      
      This seems to be older than git history.
      Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      3716c262
    • C
      xfrm: clean up xfrm protocol checks · d410ef75
      Cong Wang 提交于
      [ Upstream commit dbb2483b2a46fbaf833cfb5deb5ed9cace9c7399 ]
      
      In commit 6a53b759 ("xfrm: check id proto in validate_tmpl()")
      I introduced a check for xfrm protocol, but according to Herbert
      IPSEC_PROTO_ANY should only be used as a wildcard for lookup, so
      it should be removed from validate_tmpl().
      
      And, IPSEC_PROTO_ANY is expected to only match 3 IPSec-specific
      protocols, this is why xfrm_state_flush() could still miss
      IPPROTO_ROUTING, which leads that those entries are left in
      net->xfrm.state_all before exit net. Fix this by replacing
      IPSEC_PROTO_ANY with zero.
      
      This patch also extracts the check from validate_tmpl() to
      xfrm_id_proto_valid() and uses it in parse_ipsecrequest().
      With this, no other protocols should be added into xfrm.
      
      Fixes: 6a53b759 ("xfrm: check id proto in validate_tmpl()")
      Reported-by: syzbot+0bf0519d6e0de15914fe@syzkaller.appspotmail.com
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      d410ef75
    • J
      vti4: ipip tunnel deregistration fixes. · 159269cc
      Jeremy Sowden 提交于
      [ Upstream commit 5483844c3fc18474de29f5d6733003526e0a9f78 ]
      
      If tunnel registration failed during module initialization, the module
      would fail to deregister the IPPROTO_COMP protocol and would attempt to
      deregister the tunnel.
      
      The tunnel was not deregistered during module-exit.
      
      Fixes: dd9ee3444014e ("vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel")
      Signed-off-by: NJeremy Sowden <jeremy@azazel.net>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      159269cc
    • S
      xfrm6_tunnel: Fix potential panic when unloading xfrm6_tunnel module · 64f214ce
      Su Yanjun 提交于
      [ Upstream commit 6ee02a54ef990a71bf542b6f0a4e3321de9d9c66 ]
      
      When unloading xfrm6_tunnel module, xfrm6_tunnel_fini directly
      frees the xfrm6_tunnel_spi_kmem. Maybe someone has gotten the
      xfrm6_tunnel_spi, so need to wait it.
      
      Fixes: 91cc3bb0("xfrm6_tunnel: RCU conversion")
      Signed-off-by: NSu Yanjun <suyj.fnst@cn.fujitsu.com>
      Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      64f214ce
    • Y
      xfrm: policy: Fix out-of-bound array accesses in __xfrm_policy_unlink · c9516503
      YueHaibing 提交于
      [ Upstream commit b805d78d300bcf2c83d6df7da0c818b0fee41427 ]
      
      UBSAN report this:
      
      UBSAN: Undefined behaviour in net/xfrm/xfrm_policy.c:1289:24
      index 6 is out of range for type 'unsigned int [6]'
      CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.4.162-514.55.6.9.x86_64+ #13
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
       0000000000000000 1466cf39b41b23c9 ffff8801f6b07a58 ffffffff81cb35f4
       0000000041b58ab3 ffffffff83230f9c ffffffff81cb34e0 ffff8801f6b07a80
       ffff8801f6b07a20 1466cf39b41b23c9 ffffffff851706e0 ffff8801f6b07ae8
      Call Trace:
       <IRQ>  [<ffffffff81cb35f4>] __dump_stack lib/dump_stack.c:15 [inline]
       <IRQ>  [<ffffffff81cb35f4>] dump_stack+0x114/0x1a0 lib/dump_stack.c:51
       [<ffffffff81d94225>] ubsan_epilogue+0x12/0x8f lib/ubsan.c:164
       [<ffffffff81d954db>] __ubsan_handle_out_of_bounds+0x16e/0x1b2 lib/ubsan.c:382
       [<ffffffff82a25acd>] __xfrm_policy_unlink+0x3dd/0x5b0 net/xfrm/xfrm_policy.c:1289
       [<ffffffff82a2e572>] xfrm_policy_delete+0x52/0xb0 net/xfrm/xfrm_policy.c:1309
       [<ffffffff82a3319b>] xfrm_policy_timer+0x30b/0x590 net/xfrm/xfrm_policy.c:243
       [<ffffffff813d3927>] call_timer_fn+0x237/0x990 kernel/time/timer.c:1144
       [<ffffffff813d8e7e>] __run_timers kernel/time/timer.c:1218 [inline]
       [<ffffffff813d8e7e>] run_timer_softirq+0x6ce/0xb80 kernel/time/timer.c:1401
       [<ffffffff8120d6f9>] __do_softirq+0x299/0xe10 kernel/softirq.c:273
       [<ffffffff8120e676>] invoke_softirq kernel/softirq.c:350 [inline]
       [<ffffffff8120e676>] irq_exit+0x216/0x2c0 kernel/softirq.c:391
       [<ffffffff82c5edab>] exiting_irq arch/x86/include/asm/apic.h:652 [inline]
       [<ffffffff82c5edab>] smp_apic_timer_interrupt+0x8b/0xc0 arch/x86/kernel/apic/apic.c:926
       [<ffffffff82c5c985>] apic_timer_interrupt+0xa5/0xb0 arch/x86/entry/entry_64.S:735
       <EOI>  [<ffffffff81188096>] ? native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:52
       [<ffffffff810834d7>] arch_safe_halt arch/x86/include/asm/paravirt.h:111 [inline]
       [<ffffffff810834d7>] default_idle+0x27/0x430 arch/x86/kernel/process.c:446
       [<ffffffff81085f05>] arch_cpu_idle+0x15/0x20 arch/x86/kernel/process.c:437
       [<ffffffff8132abc3>] default_idle_call+0x53/0x90 kernel/sched/idle.c:92
       [<ffffffff8132b32d>] cpuidle_idle_call kernel/sched/idle.c:156 [inline]
       [<ffffffff8132b32d>] cpu_idle_loop kernel/sched/idle.c:251 [inline]
       [<ffffffff8132b32d>] cpu_startup_entry+0x60d/0x9a0 kernel/sched/idle.c:299
       [<ffffffff8113e119>] start_secondary+0x3c9/0x560 arch/x86/kernel/smpboot.c:245
      
      The issue is triggered as this:
      
      xfrm_add_policy
          -->verify_newpolicy_info  //check the index provided by user with XFRM_POLICY_MAX
      			      //In my case, the index is 0x6E6BB6, so it pass the check.
          -->xfrm_policy_construct  //copy the user's policy and set xfrm_policy_timer
          -->xfrm_policy_insert
      	--> __xfrm_policy_link //use the orgin dir, in my case is 2
      	--> xfrm_gen_index   //generate policy index, there is 0x6E6BB6
      
      then xfrm_policy_timer be fired
      
      xfrm_policy_timer
         --> xfrm_policy_id2dir  //get dir from (policy index & 7), in my case is 6
         --> xfrm_policy_delete
            --> __xfrm_policy_unlink //access policy_count[dir], trigger out of range access
      
      Add xfrm_policy_id2dir check in verify_newpolicy_info, make sure the computed dir is
      valid, to fix the issue.
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Fixes: e682adf0 ("xfrm: Try to honor policy index if it's supplied by user")
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      c9516503
    • J
      vsock/virtio: Initialize core virtio vsock before registering the driver · 9f12f4c9
      Jorge E. Moreira 提交于
      [ Upstream commit ba95e5dfd36647622d8897a2a0470dde60e59ffd ]
      
      Avoid a race in which static variables in net/vmw_vsock/af_vsock.c are
      accessed (while handling interrupts) before they are initialized.
      
      [    4.201410] BUG: unable to handle kernel paging request at ffffffffffffffe8
      [    4.207829] IP: vsock_addr_equals_addr+0x3/0x20
      [    4.211379] PGD 28210067 P4D 28210067 PUD 28212067 PMD 0
      [    4.211379] Oops: 0000 [#1] PREEMPT SMP PTI
      [    4.211379] Modules linked in:
      [    4.211379] CPU: 1 PID: 30 Comm: kworker/1:1 Not tainted 4.14.106-419297-gd7e28cc1f241 #1
      [    4.211379] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
      [    4.211379] Workqueue: virtio_vsock virtio_transport_rx_work
      [    4.211379] task: ffffa3273d175280 task.stack: ffffaea1800e8000
      [    4.211379] RIP: 0010:vsock_addr_equals_addr+0x3/0x20
      [    4.211379] RSP: 0000:ffffaea1800ebd28 EFLAGS: 00010286
      [    4.211379] RAX: 0000000000000002 RBX: 0000000000000000 RCX: ffffffffb94e42f0
      [    4.211379] RDX: 0000000000000400 RSI: ffffffffffffffe0 RDI: ffffaea1800ebdd0
      [    4.211379] RBP: ffffaea1800ebd58 R08: 0000000000000001 R09: 0000000000000001
      [    4.211379] R10: 0000000000000000 R11: ffffffffb89d5d60 R12: ffffaea1800ebdd0
      [    4.211379] R13: 00000000828cbfbf R14: 0000000000000000 R15: ffffaea1800ebdc0
      [    4.211379] FS:  0000000000000000(0000) GS:ffffa3273fd00000(0000) knlGS:0000000000000000
      [    4.211379] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    4.211379] CR2: ffffffffffffffe8 CR3: 000000002820e001 CR4: 00000000001606e0
      [    4.211379] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [    4.211379] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [    4.211379] Call Trace:
      [    4.211379]  ? vsock_find_connected_socket+0x6c/0xe0
      [    4.211379]  virtio_transport_recv_pkt+0x15f/0x740
      [    4.211379]  ? detach_buf+0x1b5/0x210
      [    4.211379]  virtio_transport_rx_work+0xb7/0x140
      [    4.211379]  process_one_work+0x1ef/0x480
      [    4.211379]  worker_thread+0x312/0x460
      [    4.211379]  kthread+0x132/0x140
      [    4.211379]  ? process_one_work+0x480/0x480
      [    4.211379]  ? kthread_destroy_worker+0xd0/0xd0
      [    4.211379]  ret_from_fork+0x35/0x40
      [    4.211379] Code: c7 47 08 00 00 00 00 66 c7 07 28 00 c7 47 08 ff ff ff ff c7 47 04 ff ff ff ff c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 8b 47 08 <3b> 46 08 75 0a 8b 47 04 3b 46 04 0f 94 c0 c3 31 c0 c3 90 66 2e
      [    4.211379] RIP: vsock_addr_equals_addr+0x3/0x20 RSP: ffffaea1800ebd28
      [    4.211379] CR2: ffffffffffffffe8
      [    4.211379] ---[ end trace f31cc4a2e6df3689 ]---
      [    4.211379] Kernel panic - not syncing: Fatal exception in interrupt
      [    4.211379] Kernel Offset: 0x37000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      [    4.211379] Rebooting in 5 seconds..
      
      Fixes: 22b5c0b63f32 ("vsock/virtio: fix kernel panic after device hot-unplug")
      Cc: Stefan Hajnoczi <stefanha@redhat.com>
      Cc: Stefano Garzarella <sgarzare@redhat.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: kvm@vger.kernel.org
      Cc: virtualization@lists.linux-foundation.org
      Cc: netdev@vger.kernel.org
      Cc: kernel-team@android.com
      Cc: stable@vger.kernel.org [4.9+]
      Signed-off-by: NJorge E. Moreira <jemoreira@google.com>
      Reviewed-by: NStefano Garzarella <sgarzare@redhat.com>
      Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
      Acked-by: NStefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9f12f4c9
    • J
      tipc: fix modprobe tipc failed after switch order of device registration · 4b900077
      Junwei Hu 提交于
      [ Upstream commit 532b0f7ece4cb2ffd24dc723ddf55242d1188e5e ]
      
      Error message printed:
      modprobe: ERROR: could not insert 'tipc': Address family not
      supported by protocol.
      when modprobe tipc after the following patch: switch order of
      device registration, commit 7e27e8d6130c
      ("tipc: switch order of device registration to fix a crash")
      
      Because sock_create_kern(net, AF_TIPC, ...) is called by
      tipc_topsrv_create_listener() in the initialization process
      of tipc_net_ops, tipc_socket_init() must be execute before that.
      
      I move tipc_socket_init() into function tipc_init_net().
      
      Fixes: 7e27e8d6130c
      ("tipc: switch order of device registration to fix a crash")
      Signed-off-by: NJunwei Hu <hujunwei4@huawei.com>
      Reported-by: NWang Wang <wangwang2@huawei.com>
      Reviewed-by: NKang Zhou <zhoukang7@huawei.com>
      Reviewed-by: NSuanming Mou <mousuanming@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4b900077
    • S
      vsock/virtio: free packets during the socket release · 4af8a327
      Stefano Garzarella 提交于
      [ Upstream commit ac03046ece2b158ebd204dfc4896fd9f39f0e6c8 ]
      
      When the socket is released, we should free all packets
      queued in the per-socket list in order to avoid a memory
      leak.
      Signed-off-by: NStefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4af8a327
    • J
      tipc: switch order of device registration to fix a crash · 2f7025b0
      Junwei Hu 提交于
      [ Upstream commit 7e27e8d6130c5e88fac9ddec4249f7f2337fe7f8 ]
      
      When tipc is loaded while many processes try to create a TIPC socket,
      a crash occurs:
       PANIC: Unable to handle kernel paging request at virtual
       address "dfff20000000021d"
       pc : tipc_sk_create+0x374/0x1180 [tipc]
       lr : tipc_sk_create+0x374/0x1180 [tipc]
         Exception class = DABT (current EL), IL = 32 bits
       Call trace:
        tipc_sk_create+0x374/0x1180 [tipc]
        __sock_create+0x1cc/0x408
        __sys_socket+0xec/0x1f0
        __arm64_sys_socket+0x74/0xa8
       ...
      
      This is due to race between sock_create and unfinished
      register_pernet_device. tipc_sk_insert tries to do
      "net_generic(net, tipc_net_id)".
      but tipc_net_id is not initialized yet.
      
      So switch the order of the two to close the race.
      
      This can be reproduced with multiple processes doing socket(AF_TIPC, ...)
      and one process doing module removal.
      
      Fixes: a62fbcce ("tipc: make subscriber server support net namespace")
      Signed-off-by: NJunwei Hu <hujunwei4@huawei.com>
      Reported-by: NWang Wang <wangwang2@huawei.com>
      Reviewed-by: NXiaogang Wang <wangxiaogang3@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2f7025b0
    • S
      rtnetlink: always put IFLA_LINK for links with a link-netnsid · 2636da60
      Sabrina Dubroca 提交于
      [ Upstream commit feadc4b6cf42a53a8a93c918a569a0b7e62bd350 ]
      
      Currently, nla_put_iflink() doesn't put the IFLA_LINK attribute when
      iflink == ifindex.
      
      In some cases, a device can be created in a different netns with the
      same ifindex as its parent. That device will not dump its IFLA_LINK
      attribute, which can confuse some userspace software that expects it.
      For example, if the last ifindex created in init_net and foo are both
      8, these commands will trigger the issue:
      
          ip link add parent type dummy                   # ifindex 9
          ip link add link parent netns foo type macvlan  # ifindex 9 in ns foo
      
      So, in case a device puts the IFLA_LINK_NETNSID attribute in a dump,
      always put the IFLA_LINK attribute as well.
      
      Thanks to Dan Winship for analyzing the original OpenShift bug down to
      the missing netlink attribute.
      
      v2: change Fixes tag, it's been here forever, as Nicolas Dichtel said
          add Nicolas' ack
      v3: change Fixes tag
          fix subject typo, spotted by Edward Cree
      Analyzed-by: NDan Winship <danw@redhat.com>
      Fixes: d8a5ec67 ("[NET]: netlink support for moving devices between network namespaces.")
      Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
      Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2636da60
    • E
      net: avoid weird emergency message · 746f8cd5
      Eric Dumazet 提交于
      [ Upstream commit d7c04b05c9ca14c55309eb139430283a45c4c25f ]
      
      When host is under high stress, it is very possible thread
      running netdev_wait_allrefs() returns from msleep(250)
      10 seconds late.
      
      This leads to these messages in the syslog :
      
      [...] unregister_netdevice: waiting for syz_tun to become free. Usage count = 0
      
      If the device refcount is zero, the wait is over.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      746f8cd5
    • E
      ipv6: prevent possible fib6 leaks · 6bc3240a
      Eric Dumazet 提交于
      [ Upstream commit 61fb0d01680771f72cc9d39783fb2c122aaad51e ]
      
      At ipv6 route dismantle, fib6_drop_pcpu_from() is responsible
      for finding all percpu routes and set their ->from pointer
      to NULL, so that fib6_ref can reach its expected value (1).
      
      The problem right now is that other cpus can still catch the
      route being deleted, since there is no rcu grace period
      between the route deletion and call to fib6_drop_pcpu_from()
      
      This can leak the fib6 and associated resources, since no
      notifier will take care of removing the last reference(s).
      
      I decided to add another boolean (fib6_destroying) instead
      of reusing/renaming exception_bucket_flushed to ease stable backports,
      and properly document the memory barriers used to implement this fix.
      
      This patch has been co-developped with Wei Wang.
      
      Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: Wei Wang <weiwan@google.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Martin Lau <kafai@fb.com>
      Acked-by: NWei Wang <weiwan@google.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6bc3240a
    • W
      ipv6: fix src addr routing with the exception table · 81a61a95
      Wei Wang 提交于
      [ Upstream commit 510e2ceda031eed97a7a0f9aad65d271a58b460d ]
      
      When inserting route cache into the exception table, the key is
      generated with both src_addr and dest_addr with src addr routing.
      However, current logic always assumes the src_addr used to generate the
      key is a /128 host address. This is not true in the following scenarios:
      1. When the route is a gateway route or does not have next hop.
         (rt6_is_gw_or_nonexthop() == false)
      2. When calling ip6_rt_cache_alloc(), saddr is passed in as NULL.
      This means, when looking for a route cache in the exception table, we
      have to do the lookup twice: first time with the passed in /128 host
      address, second time with the src_addr stored in fib6_info.
      
      This solves the pmtu discovery issue reported by Mikael Magnusson where
      a route cache with a lower mtu info is created for a gateway route with
      src addr. However, the lookup code is not able to find this route cache.
      
      Fixes: 2b760fcf ("ipv6: hook up exception table to store dst cache")
      Reported-by: NMikael Magnusson <mikael.kernel@lists.m7n.se>
      Bisected-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NWei Wang <weiwan@google.com>
      Cc: Martin Lau <kafai@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      81a61a95