1. 04 Oct, 2017 3 commits
  2. 03 Oct, 2017 4 commits
    • socket, bpf: fix possible use after free · eefca20e
      Eric Dumazet committed
      Starting from linux-4.4, the TCP three-way handshake (3WHS) no longer
      takes the listener lock.
      
      Since then, we might hit a use-after-free in sk_filter_charge()
      if the filter we got in the memcpy() of the listener content
      just happened to be replaced by a thread changing the listener's BPF filter.
      
      To fix this, we need to make sure the filter refcount is not already
      zero before incrementing it again.
      
      Fixes: e994b2f0 ("tcp: do not lock listener to process SYN packets")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
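The fix above relies on an "increment only if not already zero" pattern. The following is a minimal userspace sketch of that pattern with C11 atomics. `struct filter` and `filter_get_not_zero()` are hypothetical names chosen for illustration, not the kernel's actual code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted filter object. */
struct filter {
    atomic_uint refcnt;
};

/*
 * Take a reference only if the count has not already dropped to zero.
 * Returns false when the object is already being released, in which
 * case the caller must not use it.
 */
static bool filter_get_not_zero(struct filter *f)
{
    unsigned int old;

    assert(f != NULL);
    old = atomic_load(&f->refcnt);
    while (old != 0) {
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&f->refcnt, &old, old + 1))
            return true;
    }
    return false;
}
```

A caller that copied a filter pointer without holding a lock would call `filter_get_not_zero()` and treat a `false` result as "no filter", rather than blindly incrementing a count that may already be zero.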
    • Merge branch 'mlxsw-gre-fixes' · 4ee4553e
      David S. Miller committed
      Jiri Pirko says:
      
      ====================
      mlxsw: Fixes in GRE offloading
      
      Petr says:
      
      This patchset fixes a couple of unrelated problems in offloading IP-in-IP
      tunnels in the mlxsw driver.
      
      - The first patch fixes a potential reference-counting problem that might lead
        to a kernel crash.
      
      - The second patch associates IPIP next hops with their loopback RIFs. Besides
        being the right thing to do, it also fixes a problem where offloaded IPv6
        routes that forward to IP-in-IP netdevices were not flagged as such.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_router: Track RIF of IPIP next hops · de0f43c0
      Petr Machata committed
      When considering whether to set RTNH_F_OFFLOAD flag on an IPv6 route,
      mlxsw_sp_fib6_entry_offload_set() looks up the mlxsw_sp_nexthop
      corresponding to a given route, and decides based on whether the next
      hop's offloaded flag was set. When looking for the matching next hop, it
      also takes into account the device of the route, which must match next
      hop's RIF.
      
      Until now, however, IPIP next hops didn't set the RIF. As a result, IPv6
      routes forwarding traffic to IP-in-IP netdevices are never marked as
      offloaded, even when they actually are.
      
      Thus track RIF of IPIP next hops the same way as that of ETHERNET next
      hops.
      
      Fixes: 8f28a309 ("mlxsw: spectrum_router: Support IPv6 overlay encap")
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_router: Move VRF refcounting · 28a04c7b
      Petr Machata committed
      When creating a new RIF, bumping RIF count of the containing VR is the
      last thing to be done. Symmetrically, when destroying a RIF, RIF count
      is first dropped and only then the rest of the cleanup proceeds.
      
      That's a problem for loopback RIFs. Those hold two VR references: one
      for overlay and one for underlay. mlxsw_sp_rif_destroy() releases the
      overlay one, and the deconfigure() callback the underlay one. But if
      both overlay and underlay are the same, and if there are no other
      artifacts holding the VR alive, this put actually destroys the VR. Later
      on, when mlxsw_sp_rif_destroy() calls mlxsw_sp_vr_put() for the same VR,
      the VR will already have been released and the kernel crashes with a NULL
      pointer dereference.
      
      The underlying problem is that the RIF under destruction ends up
      referencing the overlay VR much longer than it claims: all the way until
      the call to mlxsw_sp_vr_put(). So line up the reference counting
      properly to reflect this. Make corresponding changes in
      mlxsw_sp_rif_create() as well for symmetry.
      
      Fixes: 6ddb7426 ("mlxsw: spectrum_router: Introduce loopback RIFs")
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
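The ordering problem described in the VRF refcounting commit can be illustrated with a heavily simplified model. This is only a sketch under stated assumptions: `struct vr`, `struct rif`, `vr_put()`, `rif_create()` and `rif_destroy()` are hypothetical stand-ins, and the real driver tracks considerably more state. The point is that the RIF count is dropped only at the very end of destruction, so an earlier put of the underlay VR cannot tear down a VR that is also the overlay.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical model of a virtual router (VR). */
struct vr {
    int rif_count;   /* number of RIFs counted against this VR */
    bool in_use;     /* false once the VR has been torn down */
};

/* Tear the VR down if nothing counts against it any more. */
static void vr_put(struct vr *vr)
{
    assert(vr->in_use);           /* putting a dead VR is the bug */
    if (vr->rif_count == 0)
        vr->in_use = false;
}

/* Hypothetical loopback RIF holding an overlay and an underlay VR. */
struct rif {
    struct vr *overlay;
    struct vr *underlay;
};

static void rif_create(struct rif *rif, struct vr *ol, struct vr *ul)
{
    ol->in_use = true;
    ul->in_use = true;
    rif->overlay = ol;
    ol->rif_count++;              /* count the overlay VR up front */
    rif->underlay = ul;           /* stands in for the configure step */
}

static void rif_destroy(struct rif *rif)
{
    struct vr *ol = rif->overlay;

    vr_put(rif->underlay);        /* deconfigure step: overlay VR survives */
    ol->rif_count--;              /* drop the count only at the very end */
    vr_put(ol);                   /* now the VR may really go away */
}
```

If `rif_destroy()` decremented `rif_count` first, as in the ordering the commit describes as broken, the first `vr_put()` would already tear down a VR shared by overlay and underlay, and the second `vr_put()` would trip the assertion.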
  3. 02 Oct, 2017 9 commits
  4. 01 Oct, 2017 8 commits
    • tipc: use only positive error codes in messages · aad06212
      Parthasarathy Bhuvaragan committed
      In commit e3a77561 ("tipc: split up function tipc_msg_eval()"),
      we updated the function tipc_msg_lookup_dest() to set the error
      codes to negative values at destination lookup failures. Thus when
      the function sets the error code to -TIPC_ERR_NO_NAME, it is inserted
      into the 4-bit error field of the message header as 0xf instead of
      TIPC_ERR_NO_NAME (1). The value 0xf is an unknown error code.
      
      In this commit, we set only positive error codes.
      
      Fixes: e3a77561 ("tipc: split up function tipc_msg_eval()")
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
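The arithmetic behind the 0xf value in the tipc commit can be checked in isolation. In this small sketch, `errcode_field()` is a hypothetical helper that keeps only the low four bits of a value, as a 4-bit header field would. The value of `TIPC_ERR_NO_NAME` is the one given in the commit message.

```c
#define TIPC_ERR_NO_NAME 1   /* value stated in the commit message */

/*
 * Hypothetical helper: truncate an error code to the low 4 bits,
 * mimicking what happens when it is stored in a 4-bit header field.
 */
static unsigned int errcode_field(int err)
{
    return (unsigned int)err & 0xfu;
}
```

Here `errcode_field(-TIPC_ERR_NO_NAME)` yields 0xf, because -1 converted to unsigned has all bits set, while `errcode_field(TIPC_ERR_NO_NAME)` yields 1 as intended.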
    • ppp: fix __percpu annotation · 5a59a3a0
      Guillaume Nault committed
      Move sparse annotation right after pointer type.
      
      Fixes sparse warning:
          drivers/net/ppp/ppp_generic.c:1422:13: warning: incorrect type in initializer (different address spaces)
          drivers/net/ppp/ppp_generic.c:1422:13:    expected void const [noderef] <asn:3>*__vpp_verify
          drivers/net/ppp/ppp_generic.c:1422:13:    got int *<noident>
          ...
      
      Fixes: e5dadc65 ("ppp: Fix false xmit recursion detect with two ppp devices")
      Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'udp-fix-early-demux-for-mcast-packets' · 230583c1
      David S. Miller committed
      Paolo Abeni says:
      
      ====================
      udp: fix early demux for mcast packets
      
      Currently the early demux callbacks do not perform source address validation.
      This is not an issue for TCP or UDP unicast, where the early demux
      is only allowed for connected sockets and the source address is validated
      for the first packet and never changes.
      
      The UDP protocol currently also allows early demux for unconnected multicast
      sockets, and we do not perform any validation for them after the first
      packet lands on the socket: beyond ignoring the rp_filter, if enabled,
      any kind of martian source is also allowed.
      
      This series addresses the issue by allowing the early demux callback to return
      an error code, and by performing the proper checks for unconnected UDP multicast
      sockets before leveraging the rx dst cache.
      
      Alternatively we could disable the early demux for unconnected mcast sockets,
      but that would cause a significant performance regression, around 50%, while
      with this series, with full rp_filter in place, we keep the regression to a
      more moderate level.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: perform source validation for mcast early demux · bc044e8d
      Paolo Abeni committed
      The UDP early demux can leverage the rx dst cache even for
      multicast unconnected sockets.
      
      In such a scenario the ipv4 source address is validated only on
      the first packet in the given flow. After that, when we fetch
      the dst entry from the socket rx cache, we stop enforcing
      the rp_filter and we even start accepting any kind of martian
      addresses.
      
      Disabling the dst cache for unconnected multicast sockets would
      cause a large performance regression, reducing the max ingress
      throughput by nearly half.
      
      Instead we factor out a route helper to completely validate an
      skb source address for multicast packets and we call it from
      the UDP early demux for mcast packets landing on unconnected
      sockets, after successfully fetching the related cached dst entry.
      
      This still gives a measurable, but limited performance
      regression:
      
                        rp_filter = 0    rp_filter = 1
      edmux disabled:   1182 Kpps        1127 Kpps
      edmux before:     2238 Kpps        2238 Kpps
      edmux after:      2037 Kpps        2019 Kpps
      
      The above figures are on top of the current net tree.
      With the net-next commit 6e617de8 ("net: avoid a full
      fib lookup when rp_filter is disabled.") applied, the delta with
      rp_filter == 0 will decrease even more.
      
      Fixes: 421b3885 ("udp: ipv4: Add udp early demux")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • IPv4: early demux can return an error code · 7487449c
      Paolo Abeni committed
      Currently no error is emitted, but this infrastructure will be
      used by the next patch to allow source address validation
      for mcast sockets.
      Since early demux can do a route lookup and an ipv4 route
      lookup can return an error code, this is consistent with the
      current ipv4 route infrastructure.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
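The idea of letting an early demux hook report failure can be sketched outside the kernel as follows. All names here (`struct packet`, `early_demux_fn`, `demo_early_demux()`, `receive_packet()`) are hypothetical and only illustrate the control flow: a hook that returns an int, and a caller that stops processing when the hook reports an error.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical packet with a flag standing in for a failed source check. */
struct packet {
    int martian_source;
};

/* The hook returns 0 on success or a negative errno value. */
typedef int (*early_demux_fn)(struct packet *pkt);

/* Example hook: reject packets whose source address fails validation. */
static int demo_early_demux(struct packet *pkt)
{
    return pkt->martian_source ? -EINVAL : 0;
}

/* Caller: propagate the hook's error instead of ignoring it. */
static int receive_packet(struct packet *pkt, early_demux_fn early_demux)
{
    if (early_demux != NULL) {
        int err = early_demux(pkt);

        if (err)
            return err;   /* drop the packet, skip further processing */
    }
    return 0;             /* continue with normal lookup and delivery */
}
```

With a hook that returns void, the caller has no way to learn that validation failed, which is why the return type has to change before any check can be added.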
    • ip6_tunnel: update mtu properly for ARPHRD_ETHER tunnel device in tx path · d41bb33b
      Xin Long committed
      Currently, when updating the mtu in the tx path, ip6_tunnel doesn't account
      for ARPHRD_ETHER tunnel devices such as ip6gre_tap, for which it should also
      subtract the ether header to get the correct mtu.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
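The MTU adjustment described in the ip6_tunnel commit amounts to one extra subtraction for Ethernet-type tunnel devices. The sketch below is an illustration only: `tunnel_payload_mtu()` and its parameters are hypothetical, and the kernel's actual computation involves more terms. `ETH_HLEN` is defined locally with the standard Ethernet header length of 14 bytes.

```c
#include <stdbool.h>

#define ETH_HLEN 14   /* Ethernet header length in bytes */

/*
 * Hypothetical helper: MTU left for the inner payload after removing
 * the tunnel encapsulation overhead and, for Ethernet-type tunnel
 * devices, the inner Ethernet header as well.
 */
static int tunnel_payload_mtu(int path_mtu, int tunnel_hlen, bool is_ether_dev)
{
    int mtu = path_mtu - tunnel_hlen;

    if (is_ether_dev)
        mtu -= ETH_HLEN;
    return mtu;
}
```

For example, with a path MTU of 1500 and an assumed 44 bytes of tunnel overhead, the result is 1456 for a plain tunnel device and 1442 for an Ethernet-type one.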
    • ip6_gre: ip6gre_tap device should keep dst · 2d40557c
      Xin Long committed
      The patch 'ip_gre: ipgre_tap device should keep dst' fixed
      an issue where the ipgre_tap mtu couldn't be updated in the tx path.
      
      The same fix is needed for ip6gre_tap as well.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip_gre: ipgre_tap device should keep dst · d51711c0
      Xin Long committed
      Without keeping dst, the tunnel will not update any mtu/pmtu info,
      since it does not have a dst on the skb.
      
      Reproducer:
        client(ipgre_tap1 - eth1) <-----> (eth1 - ipgre_tap1)server
      
      After reducing eth1's mtu on the client, performance dropped to 0.
      
      This patch calls netif_keep_dst() in gre_tap_init, as ipgre does.
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 30 Sep, 2017 1 commit
    • netlink: do not proceed if dump's start() errs · fef0035c
      Jason A. Donenfeld committed
      Drivers that use the start method for netlink dumping rely on dumpit not
      being called if start fails. For example, ila_xlat.c allocates memory
      and assigns it to cb->args[0] in its start() function. It might fail to
      do that and return -ENOMEM instead. However, even when returning an
      error, dumpit will be called, which, in the example above, quickly
      dereferences the memory in cb->args[0], which will OOPS the kernel. This
      is but one example of how this goes wrong.
      
      Since start() has always been a function with an int return type, it
      therefore makes sense to use it properly, rather than ignoring it. This
      patch thus returns early and does not call dumpit() when start() fails.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
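The control flow that the netlink commit asks for can be sketched in a few lines of userspace C. The structure and function names below (`struct dump_cb`, `run_dump()`, `failing_start()`, `counting_dumpit()`) are hypothetical and only mirror the shape of the problem: `dumpit` depends on state that `start` sets up, so it must not run when `start` fails.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical dump control block with optional start() and a dumpit(). */
struct dump_cb {
    int (*start)(struct dump_cb *cb);
    int (*dumpit)(struct dump_cb *cb);
    void *arg;   /* state set up by start() and consumed by dumpit() */
};

static int dumpit_calls;   /* counts dumpit() invocations for the demo */

/* Example start() that fails as if an allocation had failed. */
static int failing_start(struct dump_cb *cb)
{
    cb->arg = NULL;
    return -ENOMEM;
}

/* Example dumpit() that only records that it ran. */
static int counting_dumpit(struct dump_cb *cb)
{
    (void)cb;
    dumpit_calls++;
    return 0;
}

/* Return early when start() fails, so dumpit() never sees bad state. */
static int run_dump(struct dump_cb *cb)
{
    assert(cb != NULL && cb->dumpit != NULL);
    if (cb->start != NULL) {
        int err = cb->start(cb);

        if (err)
            return err;
    }
    return cb->dumpit(cb);
}
```

Calling `run_dump()` with `failing_start` returns `-ENOMEM` and leaves `dumpit_calls` at zero, whereas ignoring the return value of `start()` would hand `dumpit` a NULL `arg`.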
  6. 29 Sep, 2017 14 commits
  7. 28 Sep, 2017 1 commit
    • tun: bail out from tun_get_user() if the skb is empty · 2580c4c1
      Alexander Potapenko committed
      KMSAN (https://github.com/google/kmsan) reported accessing uninitialized
      skb->data[0] in the case the skb is empty (i.e. skb->len is 0):
      
      ================================================
      BUG: KMSAN: use of uninitialized memory in tun_get_user+0x19ba/0x3770
      CPU: 0 PID: 3051 Comm: probe Not tainted 4.13.0+ #3140
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Call Trace:
      ...
       __msan_warning_32+0x66/0xb0 mm/kmsan/kmsan_instr.c:477
       tun_get_user+0x19ba/0x3770 drivers/net/tun.c:1301
       tun_chr_write_iter+0x19f/0x300 drivers/net/tun.c:1365
       call_write_iter ./include/linux/fs.h:1743
       new_sync_write fs/read_write.c:457
       __vfs_write+0x6c3/0x7f0 fs/read_write.c:470
       vfs_write+0x3e4/0x770 fs/read_write.c:518
       SYSC_write+0x12f/0x2b0 fs/read_write.c:565
       SyS_write+0x55/0x80 fs/read_write.c:557
       do_syscall_64+0x242/0x330 arch/x86/entry/common.c:284
       entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:245
      ...
      origin:
      ...
       kmsan_poison_shadow+0x6e/0xc0 mm/kmsan/kmsan.c:211
       slab_alloc_node mm/slub.c:2732
       __kmalloc_node_track_caller+0x351/0x370 mm/slub.c:4351
       __kmalloc_reserve net/core/skbuff.c:138
       __alloc_skb+0x26a/0x810 net/core/skbuff.c:231
       alloc_skb ./include/linux/skbuff.h:903
       alloc_skb_with_frags+0x1d7/0xc80 net/core/skbuff.c:4756
       sock_alloc_send_pskb+0xabf/0xfe0 net/core/sock.c:2037
       tun_alloc_skb drivers/net/tun.c:1144
       tun_get_user+0x9a8/0x3770 drivers/net/tun.c:1274
       tun_chr_write_iter+0x19f/0x300 drivers/net/tun.c:1365
       call_write_iter ./include/linux/fs.h:1743
       new_sync_write fs/read_write.c:457
       __vfs_write+0x6c3/0x7f0 fs/read_write.c:470
       vfs_write+0x3e4/0x770 fs/read_write.c:518
       SYSC_write+0x12f/0x2b0 fs/read_write.c:565
       SyS_write+0x55/0x80 fs/read_write.c:557
       do_syscall_64+0x242/0x330 arch/x86/entry/common.c:284
       return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:245
      ================================================
      
      Make sure tun_get_user() doesn't touch skb->data[0] unless there is
      actual data.
      
      C reproducer below:
      ==========================
          // autogenerated by syzkaller (http://github.com/google/syzkaller)
      
          #define _GNU_SOURCE
      
          #include <fcntl.h>
          #include <linux/if_tun.h>
          #include <netinet/ip.h>
          #include <net/if.h>
          #include <string.h>
          #include <sys/ioctl.h>
          #include <sys/socket.h>
          #include <unistd.h>
      
          int main()
          {
            int sock = socket(PF_INET, SOCK_STREAM, IPPROTO_IP);
            int tun_fd = open("/dev/net/tun", O_RDWR);
            struct ifreq req;
            memset(&req, 0, sizeof(struct ifreq));
            strcpy((char*)&req.ifr_name, "gre0");
            req.ifr_flags = IFF_UP | IFF_MULTICAST;
            ioctl(tun_fd, TUNSETIFF, &req);
            ioctl(sock, SIOCSIFFLAGS, "gre0");
            write(tun_fd, "hi", 0);
            return 0;
          }
      ==========================
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
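The rule stated in the tun commit, not touching the first data byte unless there is actual data, can be sketched as a length check placed before the access. `ip_version_of()` below is a hypothetical userspace helper, not the driver's code. It assumes the caller wants the IP version from the high nibble of the first byte.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return 4 or 6 based on the high nibble of the
 * first byte, or -EINVAL when the buffer is empty or the version is
 * not recognized. The length check comes before any read of data[0].
 */
static int ip_version_of(const uint8_t *data, size_t len)
{
    if (len == 0)
        return -EINVAL;   /* nothing to inspect: bail out first */

    switch (data[0] >> 4) {
    case 4:
        return 4;
    case 6:
        return 6;
    default:
        return -EINVAL;
    }
}
```

A zero-length write like the one in the reproducer would reach this helper with `len == 0` and be rejected before `data[0]` is ever read.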