1. 31 May, 2019 5 commits
    • net-gro: fix use-after-free read in napi_gro_frags() · a4270d67
      Eric Dumazet committed
      If a network driver provides to napi_gro_frags() an
      skb with a page fragment of exactly 14 bytes, the call
      to gro_pull_from_frag0() will 'consume' the fragment
      by calling skb_frag_unref(skb, 0), and the page might
      be freed and reused.
      
      Reading eth->h_proto at the end of napi_frags_skb() might
      read mangled data, or crash under specific debugging features.
      
      BUG: KASAN: use-after-free in napi_frags_skb net/core/dev.c:5833 [inline]
      BUG: KASAN: use-after-free in napi_gro_frags+0xc6f/0xd10 net/core/dev.c:5841
      Read of size 2 at addr ffff88809366840c by task syz-executor599/8957
      
      CPU: 1 PID: 8957 Comm: syz-executor599 Not tainted 5.2.0-rc1+ #32
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
       __kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
       kasan_report+0x12/0x20 mm/kasan/common.c:614
       __asan_report_load_n_noabort+0xf/0x20 mm/kasan/generic_report.c:142
       napi_frags_skb net/core/dev.c:5833 [inline]
       napi_gro_frags+0xc6f/0xd10 net/core/dev.c:5841
       tun_get_user+0x2f3c/0x3ff0 drivers/net/tun.c:1991
       tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2037
       call_write_iter include/linux/fs.h:1872 [inline]
       do_iter_readv_writev+0x5f8/0x8f0 fs/read_write.c:693
       do_iter_write fs/read_write.c:970 [inline]
       do_iter_write+0x184/0x610 fs/read_write.c:951
       vfs_writev+0x1b3/0x2f0 fs/read_write.c:1015
       do_writev+0x15b/0x330 fs/read_write.c:1058
      
      Fixes: a50e233c ("net-gro: restore frag0 optimization")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
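The hazard described in this commit can be illustrated outside the kernel. The following is a minimal userspace sketch, not the kernel code: `toy_skb`, `toy_pull_from_frag0()` and `toy_frags_proto()` are hypothetical names, and the fragment is modelled as a plain heap buffer. It shows the general idea of reading the protocol field from the copy owned by the packet itself rather than from a fragment that the pull may already have released.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ETH_HLEN 14

/* Hypothetical stand-in for an skb: a small linear area plus one fragment. */
struct toy_skb {
    unsigned char linear[ETH_HLEN];
    unsigned char *frag0;   /* heap buffer, may be released by the pull */
    size_t frag0_len;
};

/* Copies the Ethernet header into the linear area and releases the
 * fragment when it has been fully consumed (the 14-byte case above). */
static void toy_pull_from_frag0(struct toy_skb *skb)
{
    memcpy(skb->linear, skb->frag0, ETH_HLEN);
    if (skb->frag0_len == ETH_HLEN) {
        free(skb->frag0);
        skb->frag0 = NULL;
    }
}

/* Returns the EtherType, read from the linear copy and never from frag0. */
uint16_t toy_frags_proto(struct toy_skb *skb)
{
    toy_pull_from_frag0(skb);
    /* Safe: linear[] belongs to the skb and is still valid here. */
    return (uint16_t)((skb->linear[12] << 8) | skb->linear[13]);
}
```

Reading through a pointer into `frag0` after the pull would be the use-after-free pattern that KASAN reports in the trace above.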
    • net: dsa: tag_8021q: Create a stable binary format · 0471dd42
      Vladimir Oltean committed
      Tools like tcpdump need to be able to decode the significance of fake
      VLAN headers that DSA uses to separate switch ports.
      
      But currently these have no global significance - they are simply an
      ordered list of DSA_MAX_SWITCHES x DSA_MAX_PORTS numbers ending at 4095.
      
      The reason why this is submitted as a fix is that the existing mapping
      of VIDs should not enter into a stable kernel, so we can pretend that
      only the new format exists. This way tcpdump won't need to try to make
      something out of the VLAN tags on 5.2 kernels.
      
      Fixes: f9bbe447 ("net: dsa: Optional VLAN-based port separation for switches without tagging")
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: tag_8021q: Change order of rx_vid setup · d34d2baa
      Ioana Ciornei committed
      The 802.1Q tagging performs an unbalanced setup in terms of RX VIDs on
      the CPU port. For the ingress path of an 802.1Q switch to work, the RX
      VID of a port needs to be seen as tagged egress on the CPU port.
      
      While configuring the other front-panel ports to be part of this VID,
      for bridge scenarios, the untagged flag is applied even on the CPU port
      in dsa_switch_vlan_add.  This happens because DSA applies the same flags
      on the CPU port as on the (bridge-controlled) slave ports, and the
      effect in this case is that the CPU port tagged settings get deleted.
      
      Instead of fixing DSA by introducing a way to control VLAN flags on the
      CPU port (and hence stop inheriting from the slave ports) - a hard,
      perhaps intractable problem - avoid this situation by moving the setup
      part of the RX VID on the CPU port after all the other front-panel ports
      have been added to the VID.
      
      Fixes: f9bbe447 ("net: dsa: Optional VLAN-based port separation for switches without tagging")
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: tcp_input: fix stack out of bounds when parsing TCP options. · 9609dad2
      Young Xiao committed
      The TCP option parsing routines in tcp_parse_options function could
      read one byte out of the buffer of the TCP options.
      
      1         while (length > 0) {
      2                 int opcode = *ptr++;
      3                 int opsize;
      4
      5                 switch (opcode) {
      6                 case TCPOPT_EOL:
      7                         return;
      8                 case TCPOPT_NOP:        /* Ref: RFC 793 section 3.1 */
      9                         length--;
      10                        continue;
      11                default:
      12                        opsize = *ptr++; //out of bound access
      
      If length = 1, line 2 reads the only remaining byte, and line 12
      then reads one more byte, which lies outside the buffer.
      This leads to an out-of-bounds access.
      
      Therefore, the patch checks that the available data length is
      large enough to parse both the TCP option code and size.
      Signed-off-by: Young Xiao <92siuyang@gmail.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
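The length check described in this commit can be sketched in a standalone form. This is a simplified userspace illustration, not the kernel function: `count_tcp_options()` is a hypothetical helper that only counts well-formed options, and the two option constants are redefined locally. The point is the `length < 2` test before the size byte is read.

```c
#define TCPOPT_EOL 0
#define TCPOPT_NOP 1

/* Hypothetical helper: counts options that carry a valid size byte and
 * never reads past ptr[length - 1]. */
int count_tcp_options(const unsigned char *ptr, int length)
{
    int count = 0;

    while (length > 0) {
        int opcode = *ptr++;
        int opsize;

        switch (opcode) {
        case TCPOPT_EOL:
            return count;
        case TCPOPT_NOP:
            length--;
            continue;
        default:
            /* One byte for the opcode plus one for the size is needed. */
            if (length < 2)
                return count;
            opsize = *ptr++;
            if (opsize < 2)         /* malformed size */
                return count;
            if (opsize > length)    /* option truncated by the buffer */
                return count;
            count++;
            ptr += opsize - 2;
            length -= opsize;
        }
    }
    return count;
}
```

With a one-byte buffer holding only an option code, the function returns before touching the byte after the buffer.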
    • net: core: support XDP generic on stacked devices. · 458bf2f2
      Stephen Hemminger committed
      When a device is stacked (as with team, bonding, failsafe or netvsc),
      the XDP generic program for the parent device was not called.
      
      Move the call to XDP generic inside __netif_receive_skb_core, where
      it can be done multiple times for the stacked case.
      
      Fixes: d4455169 ("net: xdp: support xdp generic on virtual devices")
      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 29 May, 2019 1 commit
    • llc: fix skb leak in llc_build_and_send_ui_pkt() · 8fb44d60
      Eric Dumazet committed
      If llc_mac_hdr_init() returns an error, we must drop the skb
      since no llc_build_and_send_ui_pkt() caller will take care of this.
      
      BUG: memory leak
      unreferenced object 0xffff8881202b6800 (size 2048):
        comm "syz-executor907", pid 7074, jiffies 4294943781 (age 8.590s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          1a 00 07 40 00 00 00 00 00 00 00 00 00 00 00 00  ...@............
        backtrace:
          [<00000000e25b5abe>] kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline]
          [<00000000e25b5abe>] slab_post_alloc_hook mm/slab.h:439 [inline]
          [<00000000e25b5abe>] slab_alloc mm/slab.c:3326 [inline]
          [<00000000e25b5abe>] __do_kmalloc mm/slab.c:3658 [inline]
          [<00000000e25b5abe>] __kmalloc+0x161/0x2c0 mm/slab.c:3669
          [<00000000a1ae188a>] kmalloc include/linux/slab.h:552 [inline]
          [<00000000a1ae188a>] sk_prot_alloc+0xd6/0x170 net/core/sock.c:1608
          [<00000000ded25bbe>] sk_alloc+0x35/0x2f0 net/core/sock.c:1662
          [<000000002ecae075>] llc_sk_alloc+0x35/0x170 net/llc/llc_conn.c:950
          [<00000000551f7c47>] llc_ui_create+0x7b/0x140 net/llc/af_llc.c:173
          [<0000000029027f0e>] __sock_create+0x164/0x250 net/socket.c:1430
          [<000000008bdec225>] sock_create net/socket.c:1481 [inline]
          [<000000008bdec225>] __sys_socket+0x69/0x110 net/socket.c:1523
          [<00000000b6439228>] __do_sys_socket net/socket.c:1532 [inline]
          [<00000000b6439228>] __se_sys_socket net/socket.c:1530 [inline]
          [<00000000b6439228>] __x64_sys_socket+0x1e/0x30 net/socket.c:1530
          [<00000000cec820c1>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:301
          [<000000000c32554f>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      BUG: memory leak
      unreferenced object 0xffff88811d750d00 (size 224):
        comm "syz-executor907", pid 7074, jiffies 4294943781 (age 8.600s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 f0 0c 24 81 88 ff ff 00 68 2b 20 81 88 ff ff  ...$.....h+ ....
        backtrace:
          [<0000000053026172>] kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline]
          [<0000000053026172>] slab_post_alloc_hook mm/slab.h:439 [inline]
          [<0000000053026172>] slab_alloc_node mm/slab.c:3269 [inline]
          [<0000000053026172>] kmem_cache_alloc_node+0x153/0x2a0 mm/slab.c:3579
          [<00000000fa8f3c30>] __alloc_skb+0x6e/0x210 net/core/skbuff.c:198
          [<00000000d96fdafb>] alloc_skb include/linux/skbuff.h:1058 [inline]
          [<00000000d96fdafb>] alloc_skb_with_frags+0x5f/0x250 net/core/skbuff.c:5327
          [<000000000a34a2e7>] sock_alloc_send_pskb+0x269/0x2a0 net/core/sock.c:2225
          [<00000000ee39999b>] sock_alloc_send_skb+0x32/0x40 net/core/sock.c:2242
          [<00000000e034d810>] llc_ui_sendmsg+0x10a/0x540 net/llc/af_llc.c:933
          [<00000000c0bc8445>] sock_sendmsg_nosec net/socket.c:652 [inline]
          [<00000000c0bc8445>] sock_sendmsg+0x54/0x70 net/socket.c:671
          [<000000003b687167>] __sys_sendto+0x148/0x1f0 net/socket.c:1964
          [<00000000922d78d9>] __do_sys_sendto net/socket.c:1976 [inline]
          [<00000000922d78d9>] __se_sys_sendto net/socket.c:1972 [inline]
          [<00000000922d78d9>] __x64_sys_sendto+0x2a/0x30 net/socket.c:1972
          [<00000000cec820c1>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:301
          [<000000000c32554f>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
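The ownership rule behind this fix (a function that consumes a buffer must also release it on its error path) can be shown with a small userspace sketch. All names here (`toy_buf`, `toy_hdr_init()`, `toy_build_and_send()`, `live_buffers`) are hypothetical, and the `fail` flag merely simulates a header-initialisation error.

```c
#include <stdlib.h>

int live_buffers;   /* buffers allocated and not yet freed, for illustration */

struct toy_buf { int len; };

struct toy_buf *toy_alloc(void)
{
    struct toy_buf *b = malloc(sizeof(*b));

    if (b) {
        b->len = 0;
        live_buffers++;
    }
    return b;
}

void toy_free(struct toy_buf *b)
{
    if (b) {
        free(b);
        live_buffers--;
    }
}

/* Hypothetical header setup that can fail. */
static int toy_hdr_init(struct toy_buf *b, int fail)
{
    if (fail)
        return -1;
    b->len += 14;
    return 0;
}

/* Takes ownership of b in every case, including the error path. */
int toy_build_and_send(struct toy_buf *b, int fail)
{
    int rc;

    if (!b)
        return -1;
    rc = toy_hdr_init(b, fail);
    if (rc) {
        toy_free(b);    /* no caller will release it for us */
        return rc;
    }
    toy_free(b);        /* stands in for the transmit path consuming b */
    return 0;
}
```

Without the `toy_free()` call in the error branch, `live_buffers` would stay above zero after a failed send, which is the userspace analogue of the kmemleak reports above.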
  3. 27 May, 2019 2 commits
  4. 26 May, 2019 2 commits
  5. 25 May, 2019 1 commit
    • net: sched: don't use tc_action->order during action dump · 4097e9d2
      Vlad Buslov committed
      Function tcf_action_dump() relies on tc_action->order field when starting
      nested nla to send action data to userspace. This approach breaks in
      several cases:
      
      - When multiple filters point to same shared action, tc_action->order field
        is overwritten each time it is attached to filter. This causes filter
        dump to output action with incorrect attribute for all filters that have
        the action in different position (different order) from the last set
        tc_action->order value.
      
      - When action data is displayed using tc action API (RTM_GETACTION), action
        order is overwritten by tca_action_gd() according to its position in
        resulting array of nl attributes, which will break filter dump for all
        filters attached to that shared action that expect it to have different
        order value.
      
      Don't rely on tc_action->order when dumping actions. Set nla according to
      action position in resulting array of actions instead.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
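The shared-action problem can be reproduced with a toy model. This is a sketch under simplifying assumptions: `toy_action`, `toy_attach()` and `toy_dump()` are hypothetical, and a plain integer stands in for the nested netlink attribute type. It shows why a per-action `order` field is unreliable once an action is shared, and how deriving the value from the array position avoids that.

```c
#include <stddef.h>

struct toy_action {
    int order;  /* last position this action was attached at */
};

struct toy_dump_entry {
    int attr_type;                    /* nested attribute type, 1-based */
    const struct toy_action *action;
};

/* Attaching records the position on the action itself, so a shared
 * action keeps only the value from the most recent attach. */
void toy_attach(struct toy_action *actions[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        actions[i]->order = (int)i + 1;
}

/* Dump: take each attribute type from the array position instead. */
void toy_dump(struct toy_action *actions[], size_t n,
              struct toy_dump_entry out[])
{
    for (size_t i = 0; i < n; i++) {
        out[i].attr_type = (int)i + 1;  /* not actions[i]->order */
        out[i].action = actions[i];
    }
}
```

If one action sits first in one filter and second in another, its `order` field ends up holding whichever position was written last, while the position-based dump stays correct for both filters.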
  6. 24 May, 2019 1 commit
  7. 23 May, 2019 7 commits
    • ipv4/igmp: fix build error if !CONFIG_IP_MULTICAST · 903869bd
      Eric Dumazet committed
      ip_sf_list_clear_all() needs to be defined even if !CONFIG_IP_MULTICAST
      
      Fixes: 3580d04a ("ipv4/igmp: fix another memory leak in igmpv3_del_delrec()")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4/igmp: fix another memory leak in igmpv3_del_delrec() · 3580d04a
      Eric Dumazet committed
      syzbot reported memory leaks [1] that I have back tracked to
      a missing cleanup from igmpv3_del_delrec() when
      (im->sfmode != MCAST_INCLUDE)
      
      Add ip_sf_list_clear_all() and kfree_pmc() helpers to explicitly
      handle the cleanups before freeing.
      
      [1]
      
      BUG: memory leak
      unreferenced object 0xffff888123e32b00 (size 64):
        comm "softirq", pid 0, jiffies 4294942968 (age 8.010s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 e0 00 00 01 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<000000006105011b>] kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline]
          [<000000006105011b>] slab_post_alloc_hook mm/slab.h:439 [inline]
          [<000000006105011b>] slab_alloc mm/slab.c:3326 [inline]
          [<000000006105011b>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
          [<000000004bba8073>] kmalloc include/linux/slab.h:547 [inline]
          [<000000004bba8073>] kzalloc include/linux/slab.h:742 [inline]
          [<000000004bba8073>] ip_mc_add1_src net/ipv4/igmp.c:1961 [inline]
          [<000000004bba8073>] ip_mc_add_src+0x36b/0x400 net/ipv4/igmp.c:2085
          [<00000000a46a65a0>] ip_mc_msfilter+0x22d/0x310 net/ipv4/igmp.c:2475
          [<000000005956ca89>] do_ip_setsockopt.isra.0+0x1795/0x1930 net/ipv4/ip_sockglue.c:957
          [<00000000848e2d2f>] ip_setsockopt+0x3b/0xb0 net/ipv4/ip_sockglue.c:1246
          [<00000000b9db185c>] udp_setsockopt+0x4e/0x90 net/ipv4/udp.c:2616
          [<000000003028e438>] sock_common_setsockopt+0x38/0x50 net/core/sock.c:3130
          [<0000000015b65589>] __sys_setsockopt+0x98/0x120 net/socket.c:2078
          [<00000000ac198ef0>] __do_sys_setsockopt net/socket.c:2089 [inline]
          [<00000000ac198ef0>] __se_sys_setsockopt net/socket.c:2086 [inline]
          [<00000000ac198ef0>] __x64_sys_setsockopt+0x26/0x30 net/socket.c:2086
          [<000000000a770437>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:301
          [<00000000d3adb93b>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 9c8bb163 ("igmp, mld: Fix memory leak in igmpv3/mld_del_delrec()")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Hangbin Liu <liuhangbin@gmail.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Fix redirect with VRF · 31680ac2
      David Ahern committed
      IPv6 redirect is broken for VRF. __ip6_route_redirect walks the FIB
      entries looking for an exact match on ifindex. With VRF the flowi6_oif
      is updated by l3mdev_update_flow to the l3mdev index and the
      FLOWI_FLAG_SKIP_NH_OIF set in the flags to tell the lookup to skip the
      device match. For redirects the device match is required, so use that
      flag to know when the oif needs to be reset to the skb device index.
      
      Fixes: ca254490 ("net: Add VRF support to IPv6 stack")
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/tls: don't ignore netdev notifications if no TLS features · c3f4a6c3
      Jakub Kicinski committed
      On device surprise removal path (the notifier) we can't
      bail just because the features are disabled.  They may
      have been enabled during the lifetime of the device.
      This bug leads to leaking netdev references and
      use-after-frees if there are active connections while
      device features are cleared.
      
      Fixes: e8f69799 ("net/tls: Add generic NIC offload infrastructure")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/tls: fix state removal with feature flags off · 3686637e
      Jakub Kicinski committed
      TLS offload drivers shouldn't (and currently don't) block
      the TLS offload feature changes based on whether there are
      active offloaded connections or not.
      
      This seems to be a good idea, because we want the admin to
      be able to disable the TLS offload at any time, and there
      is no clean way of disabling it for active connections
      (TX side is quite problematic).  So if features are cleared
      existing connections will stay offloaded until they close,
      and new connections will not attempt offload to a given
      device.
      
      However, the offload state removal handling is currently
      broken if feature flags get cleared while there are
      active TLS offloads.
      
      RX side will completely bail from cleanup, even on normal
      remove path, leaving device state dangling, potentially
      causing issues when the 5-tuple is reused.  It will also
      fail to release the netdev reference.
      
      Remove the RX-side warning message. In the next release cycle
      it should be printed when features are disabled, rather
      than when the connection dies, but for that we need a more
      efficient method of finding the connections of a given netdev
      (à la BPF offload code).
      
      Fixes: 4799ac81 ("tls: Add rx inline crypto offload")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/tls: avoid NULL-deref on resync during device removal · 38030d7c
      Jakub Kicinski committed
      When a netdev with active kTLS sockets is unregistered, the
      notifier callback walks the offloaded sockets and
      cleans up offload state.  RX data may still be processed,
      however, and if resync was requested prior to device
      removal we would hit a NULL pointer dereference on
      ctx->netdev use.
      
      Make sure resync is under the device offload lock
      and NULL-check the netdev pointer.
      
      This should be safe, because the pointer is set to
      NULL either in the netdev notifier (under said lock)
      or when socket is completely dead and no resync can
      happen.
      
      The other access to ctx->netdev in tls_validate_xmit_skb()
      does not dereference the pointer, it just checks it against
      other device pointer, so it should be pretty safe (perhaps
      we can add a READ_ONCE/WRITE_ONCE there, if paranoid).
      
      Fixes: 4799ac81 ("tls: Add rx inline crypto offload")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Validate required parameters in inet6_validate_link_af · 7dc2bcca
      Maxim Mikityanskiy committed
      inet6_set_link_af requires that at least one of IFLA_INET6_TOKEN or
      IFLA_INET6_ADDR_GET_MODE is passed. If none of them is passed, it
      returns -EINVAL, which may cause do_setlink() to fail in the middle of
      processing other commands and give the following warning message:
      
        A link change request failed with some changes committed already.
        Interface eth0 may have been left with an inconsistent configuration,
        please check.
      
      Check the presence of at least one of them in inet6_validate_link_af to
      detect invalid parameters at an early stage, before do_setlink does
      anything. Also validate the address generation mode at an early stage.
      Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
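The validate-before-apply idea in this commit can be sketched as a standalone check. This is an illustrative simplification, not the kernel function: `toy_inet6_attrs` is a hypothetical flattened view of the parsed attributes, and `TOY_ADDR_GEN_MODE_MAX` is an assumed upper bound chosen for the example.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified view of the parsed IFLA_INET6_* attributes. */
struct toy_inet6_attrs {
    bool has_token;
    bool has_addr_gen_mode;
    int addr_gen_mode;
};

#define TOY_ADDR_GEN_MODE_MAX 3  /* assumed upper bound, for illustration */

/* Rejects a request up front so that nothing is applied partially. */
int toy_validate_link_af(const struct toy_inet6_attrs *a)
{
    /* At least one of the two attributes must be present. */
    if (!a->has_token && !a->has_addr_gen_mode)
        return -EINVAL;

    /* If a mode is given, it must be within the known range. */
    if (a->has_addr_gen_mode &&
        (a->addr_gen_mode < 0 || a->addr_gen_mode > TOY_ADDR_GEN_MODE_MAX))
        return -EINVAL;

    return 0;
}
```

Running a check like this before any change is committed means an invalid request fails as a whole, instead of failing midway and leaving the interface in the inconsistent state the warning message describes.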
  8. 22 May, 2019 7 commits
    • netfilter: nft_flow_offload: IPCB is only valid for ipv4 family · 69aeb538
      Florian Westphal committed
      Guard this with a check vs. ipv4, IPCB isn't valid in ipv6 case.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nft_flow_offload: don't offload when sequence numbers need adjustment · 91a9048f
      Florian Westphal committed
      We can't deal with tcp sequence number rewrite in flow_offload.
      While at it, simplify helper check, we only need to know if the extension
      is present, we don't need the helper data.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nft_flow_offload: set liberal tracking mode for tcp · 8437a620
      Florian Westphal committed
      Without it, whenever a packet has to be pushed up the stack (e.g. because
      of mtu mismatch), then conntrack will flag packets as invalid, which in
      turn breaks NAT.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_flow_table: ignore DF bit setting · e75b3e1c
      Florian Westphal committed
      It's irrelevant whether the DF bit is set or not; we must pass the
      packet to the stack in either case.
      
      If the DF bit is set, we must pass it to stack so the appropriate
      ICMP error can be generated.
      
      If the DF is not set, we must pass it to stack for fragmentation.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
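The reasoning in this commit can be written down as a small decision function. This is a sketch with hypothetical names (`toy_verdict`, `toy_exceeds_mtu()`, `toy_classify()`), not the flow table code: it only shows that an oversized packet leaves the fast path in both DF cases, and that the DF bit merely determines what the regular stack does with it afterwards.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_verdict {
    TOY_FAST_PATH,          /* packet fits, may stay on the offload path */
    TOY_STACK_ICMP_ERROR,   /* too big, DF set: stack sends an ICMP error */
    TOY_STACK_FRAGMENT      /* too big, DF clear: stack fragments it */
};

/* The offload decision depends on the size only, not on DF. */
bool toy_exceeds_mtu(size_t pkt_len, size_t mtu)
{
    return pkt_len > mtu;
}

enum toy_verdict toy_classify(size_t pkt_len, size_t mtu, bool df_set)
{
    if (!toy_exceeds_mtu(pkt_len, mtu))
        return TOY_FAST_PATH;
    return df_set ? TOY_STACK_ICMP_ERROR : TOY_STACK_FRAGMENT;
}
```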
    • ipv6: Consider sk_bound_dev_if when binding a raw socket to an address · 72f7cfab
      Mike Manning committed
      IPv6 does not consider if the socket is bound to a device when binding
      to an address. The result is that a socket can be bound to eth0 and
      then bound to the address of eth1. If the device is a VRF, the result
      is that a socket can only be bound to an address in the default VRF.
      
      Resolve by considering the device if sk_bound_dev_if is set.
      Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Tested-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: nat: fix udp checksum corruption · 6bac76db
      Florian Westphal committed
      Due to copy&paste error nf_nat_mangle_udp_packet passes IPPROTO_TCP,
      resulting in incorrect udp checksum when payload had to be mangled.
      
      Fixes: dac3fe72 ("netfilter: nat: remove csum_recalc hook")
      Reported-by: Marc Haber <mh+netdev@zugschlus.de>
      Tested-by: Marc Haber <mh+netdev@zugschlus.de>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
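Why the wrong protocol number corrupts the checksum can be seen from the IPv4 pseudo-header, which includes the protocol field. The following is a userspace sketch using the standard Internet checksum (one's-complement sum); `sum_bytes()` and `l4_checksum()` are hypothetical helpers, not kernel APIs, and addresses are passed as host-order integers for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

/* Adds the 16-bit big-endian words of a buffer to a running sum. */
static uint32_t sum_bytes(const uint8_t *p, size_t len, uint32_t acc)
{
    while (len > 1) {
        acc += (uint32_t)((p[0] << 8) | p[1]);
        p += 2;
        len -= 2;
    }
    if (len)
        acc += (uint32_t)(p[0] << 8);   /* pad the odd trailing byte */
    return acc;
}

/* Internet checksum over the IPv4 pseudo-header plus the L4 segment. */
uint16_t l4_checksum(uint32_t saddr, uint32_t daddr, uint8_t proto,
                     const uint8_t *seg, size_t len)
{
    uint32_t acc = 0;

    acc += (saddr >> 16) + (saddr & 0xffff);
    acc += (daddr >> 16) + (daddr & 0xffff);
    acc += proto;               /* 17 for UDP, 6 for TCP */
    acc += (uint32_t)len;
    acc = sum_bytes(seg, len, acc);
    while (acc >> 16)           /* fold carries back into 16 bits */
        acc = (acc & 0xffff) + (acc >> 16);
    return (uint16_t)~acc;
}
```

Computing the checksum of a UDP segment with `proto` set to 6 instead of 17 gives a different value, so a receiver that verifies with the correct protocol number sees the datagram as corrupted.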
    • ipvs: Fix use-after-free in ip_vs_in · 719c7d56
      YueHaibing committed
      BUG: KASAN: use-after-free in ip_vs_in.part.29+0xe8/0xd20 [ip_vs]
      Read of size 4 at addr ffff8881e9b26e2c by task sshd/5603
      
      CPU: 0 PID: 5603 Comm: sshd Not tainted 4.19.39+ #30
      Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      Call Trace:
       dump_stack+0x71/0xab
       print_address_description+0x6a/0x270
       kasan_report+0x179/0x2c0
       ip_vs_in.part.29+0xe8/0xd20 [ip_vs]
       ip_vs_in+0xd8/0x170 [ip_vs]
       nf_hook_slow+0x5f/0xe0
       __ip_local_out+0x1d5/0x250
       ip_local_out+0x19/0x60
       __tcp_transmit_skb+0xba1/0x14f0
       tcp_write_xmit+0x41f/0x1ed0
       ? _copy_from_iter_full+0xca/0x340
       __tcp_push_pending_frames+0x52/0x140
       tcp_sendmsg_locked+0x787/0x1600
       ? tcp_sendpage+0x60/0x60
       ? inet_sk_set_state+0xb0/0xb0
       tcp_sendmsg+0x27/0x40
       sock_sendmsg+0x6d/0x80
       sock_write_iter+0x121/0x1c0
       ? sock_sendmsg+0x80/0x80
       __vfs_write+0x23e/0x370
       vfs_write+0xe7/0x230
       ksys_write+0xa1/0x120
       ? __ia32_sys_read+0x50/0x50
       ? __audit_syscall_exit+0x3ce/0x450
       do_syscall_64+0x73/0x200
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x7ff6f6147c60
      Code: 73 01 c3 48 8b 0d 28 12 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 5d 73 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83
      RSP: 002b:00007ffd772ead18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 0000000000000034 RCX: 00007ff6f6147c60
      RDX: 0000000000000034 RSI: 000055df30a31270 RDI: 0000000000000003
      RBP: 000055df30a31270 R08: 0000000000000000 R09: 0000000000000000
      R10: 00007ffd772ead70 R11: 0000000000000246 R12: 00007ffd772ead74
      R13: 00007ffd772eae20 R14: 00007ffd772eae24 R15: 000055df2f12ddc0
      
      Allocated by task 6052:
       kasan_kmalloc+0xa0/0xd0
       __kmalloc+0x10a/0x220
       ops_init+0x97/0x190
       register_pernet_operations+0x1ac/0x360
       register_pernet_subsys+0x24/0x40
       0xffffffffc0ea016d
       do_one_initcall+0x8b/0x253
       do_init_module+0xe3/0x335
       load_module+0x2fc0/0x3890
       __do_sys_finit_module+0x192/0x1c0
       do_syscall_64+0x73/0x200
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Freed by task 6067:
       __kasan_slab_free+0x130/0x180
       kfree+0x90/0x1a0
       ops_free_list.part.7+0xa6/0xc0
       unregister_pernet_operations+0x18b/0x1f0
       unregister_pernet_subsys+0x1d/0x30
       ip_vs_cleanup+0x1d/0xd2f [ip_vs]
       __x64_sys_delete_module+0x20c/0x300
       do_syscall_64+0x73/0x200
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The buggy address belongs to the object at ffff8881e9b26600 which belongs to the cache kmalloc-4096 of size 4096
      The buggy address is located 2092 bytes inside of 4096-byte region [ffff8881e9b26600, ffff8881e9b27600)
      The buggy address belongs to the page:
      page:ffffea0007a6c800 count:1 mapcount:0 mapping:ffff888107c0e600 index:0x0 compound_mapcount: 0
      flags: 0x17ffffc0008100(slab|head)
      raw: 0017ffffc0008100 dead000000000100 dead000000000200 ffff888107c0e600
      raw: 0000000000000000 0000000080070007 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      While unregistering the ipvs module, ops_free_list calls
      __ip_vs_cleanup, and then nf_unregister_net_hooks is called to
      remove the nf hook entries. This needs an RCU grace period to finish;
      however, net->ipvs is set to NULL immediately, which will
      trigger a NULL pointer dereference when a packet is hooked
      and handled by ip_vs_in, where net->ipvs is dereferenced.
      
      Another scenario is that ops_free_list calls ops_free to free the
      net_generic directly once __ip_vs_cleanup has finished; a subsequent
      call to ip_vs_in then triggers a use-after-free.
      
      This patch moves nf_unregister_net_hooks from __ip_vs_cleanup()
      to __ip_vs_dev_cleanup(), where rcu_barrier() is called by
      unregister_pernet_device -> unregister_pernet_operations,
      which provides the needed grace period.
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Fixes: efe41606 ("ipvs: convert to use pernet nf_hook api")
      Suggested-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Acked-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  9. 21 May, 2019 14 commits