1. 03 Oct, 2017: 1 commit
    • mlxsw: spectrum_router: Move VRF refcounting · 28a04c7b
      Petr Machata committed
      When creating a new RIF, bumping the RIF count of the containing VR is the
      last thing to be done. Symmetrically, when destroying a RIF, the RIF count
      is dropped first and only then does the rest of the cleanup proceed.
      
      That's a problem for loopback RIFs. Those hold two VR references: one
      for overlay and one for underlay. mlxsw_sp_rif_destroy() releases the
      overlay one, and the deconfigure() callback the underlay one. But if
      both overlay and underlay are the same, and if there are no other
      artifacts holding the VR alive, this put actually destroys the VR. Later
      on, when mlxsw_sp_rif_destroy() calls mlxsw_sp_vr_put() for the same VR,
      the VR will already have been released and the kernel crashes with a NULL
      pointer dereference.
      
      The underlying problem is that the RIF under destruction ends up
      referencing the overlay VR much longer than it claims: all the way until
      the call to mlxsw_sp_vr_put(). So line up the reference counting
      properly to reflect this. Make corresponding changes in
      mlxsw_sp_rif_create() as well for symmetry.
      
      Fixes: 6ddb7426 ("mlxsw: spectrum_router: Introduce loopback RIFs")
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
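The ordering problem above can be reproduced with a small userspace model. This is a sketch under stated assumptions, not the mlxsw code. `struct vr`, `vr_put()`, `struct lb_rif`, `lb_deconfigure()`, `rif_destroy_old()` and `rif_destroy_fixed()` are hypothetical stand-ins, and destruction of a VR is modeled with an `alive` flag rather than by freeing memory. A loopback RIF whose overlay and underlay VR are the same puts two counts on one VR. If the overlay count is dropped first, the underlay put tears the VR down before the final put runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a virtual router (VR), not the mlxsw struct.
 * "alive" stands in for the VR still being allocated. */
struct vr {
	int rif_count;
	bool alive;
};

/* Drop a VR reference; the VR is torn down once no RIF counts it.
 * Returns -1 if the VR was already destroyed (the crash the commit
 * describes), 0 otherwise. */
static int vr_put(struct vr *vr)
{
	if (!vr->alive)
		return -1;
	if (vr->rif_count == 0)
		vr->alive = false;
	return 0;
}

/* Loopback RIF: counted once in the overlay VR and once in the
 * underlay VR, which may be the same VR. */
struct lb_rif {
	struct vr *ol_vr;
	struct vr *ul_vr;
};

/* Stand-in for the deconfigure() callback: releases the underlay VR. */
static int lb_deconfigure(struct lb_rif *rif)
{
	rif->ul_vr->rif_count--;
	return vr_put(rif->ul_vr);
}

/* Old order: overlay count dropped first, overlay put done last. */
static int rif_destroy_old(struct lb_rif *rif)
{
	int err;

	rif->ol_vr->rif_count--;
	err = lb_deconfigure(rif);
	if (err)
		return err;
	return vr_put(rif->ol_vr);
}

/* Fixed order: overlay count dropped right before the overlay put. */
static int rif_destroy_fixed(struct lb_rif *rif)
{
	int err = lb_deconfigure(rif);

	if (err)
		return err;
	rif->ol_vr->rif_count--;
	return vr_put(rif->ol_vr);
}
```

Take one VR used as both overlay and underlay, with `rif_count == 2`. `rif_destroy_old()` returns -1 because its final put hits an already destroyed VR. `rif_destroy_fixed()` returns 0 and destroys the VR exactly once.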
  2. 02 Oct, 2017: 9 commits
  3. 01 Oct, 2017: 8 commits
    • tipc: use only positive error codes in messages · aad06212
      Parthasarathy Bhuvaragan committed
      In commit e3a77561 ("tipc: split up function tipc_msg_eval()"),
      we updated the function tipc_msg_lookup_dest() to set the error
      codes to negative values at destination lookup failures. Thus when
      the function sets the error code to -TIPC_ERR_NO_NAME, it is inserted
      into the 4-bit error field of the message header as 0xf instead of
      TIPC_ERR_NO_NAME (1). The value 0xf is an unknown error code.
      
      In this commit, we set only positive error codes.
      
      Fixes: e3a77561 ("tipc: split up function tipc_msg_eval()")
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
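This example shows why a negative code turns into 0xf when it is stored in a 4-bit field. It is an illustrative sketch only. `set_errcode()`, `get_errcode()` and the bit position `ERR_POS` are hypothetical and do not reproduce the TIPC header layout. Only the value `TIPC_ERR_NO_NAME` = 1 comes from the commit message. Masking keeps the low four bits, and in two's complement -1 has all of its bits set.

```c
#include <assert.h>
#include <stdint.h>

/* From the commit message: TIPC_ERR_NO_NAME is 1. */
#define TIPC_ERR_NO_NAME 1

/* Hypothetical position of a 4-bit error field in a 32-bit header
 * word, chosen for illustration; not the real TIPC layout. */
#define ERR_POS  25
#define ERR_MASK 0xfu

/* Store err in the 4-bit field; only its low four bits survive. */
static uint32_t set_errcode(uint32_t word, int err)
{
	word &= ~(ERR_MASK << ERR_POS);
	return word | (((uint32_t)err & ERR_MASK) << ERR_POS);
}

/* Read the 4-bit error field back. */
static unsigned int get_errcode(uint32_t word)
{
	return (word >> ERR_POS) & ERR_MASK;
}
```

Storing `-TIPC_ERR_NO_NAME` reads back as 0xf. Storing the positive `TIPC_ERR_NO_NAME` reads back as 1.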
    • ppp: fix __percpu annotation · 5a59a3a0
      Guillaume Nault committed
      Move sparse annotation right after pointer type.
      
      Fixes sparse warning:
          drivers/net/ppp/ppp_generic.c:1422:13: warning: incorrect type in initializer (different address spaces)
          drivers/net/ppp/ppp_generic.c:1422:13:    expected void const [noderef] <asn:3>*__vpp_verify
          drivers/net/ppp/ppp_generic.c:1422:13:    got int *<noident>
          ...
      
      Fixes: e5dadc65 ("ppp: Fix false xmit recursion detect with two ppp devices")
      Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'udp-fix-early-demux-for-mcast-packets' · 230583c1
      David S. Miller committed
      Paolo Abeni says:
      
      ====================
      udp: fix early demux for mcast packets
      
      Currently the early demux callbacks do not perform source address validation.
      This is not an issue for TCP or UDP unicast, where the early demux
      is only allowed for connected sockets and the source address is validated
      for the first packet and never changes.

      The UDP protocol currently allows early demux also for unconnected multicast
      sockets, and we do not perform any validation for them after the first
      packet lands on the socket: besides ignoring rp_filter (if enabled), we
      also accept any kind of martian source address.

      This series addresses the issue by allowing the early demux callback to
      return an error code, and by performing the proper checks for unconnected
      UDP multicast sockets before leveraging the rx dst cache.

      Alternatively we could disable the early demux for unconnected mcast sockets,
      but that would cause a significant performance regression (around 50%), while
      with this series, with full rp_filter in place, the regression is kept to a
      more moderate level.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: perform source validation for mcast early demux · bc044e8d
      Paolo Abeni committed
      The UDP early demux can leverage the rx dst cache even for
      unconnected multicast sockets.

      In such a scenario the ipv4 source address is validated only on
      the first packet in the given flow. After that, when we fetch
      the dst entry from the socket rx cache, we stop enforcing
      the rp_filter and we even start accepting any kind of martian
      addresses.

      Disabling the dst cache for unconnected multicast sockets would
      cause a large performance regression, nearly halving the
      max ingress throughput.

      Instead we factor out a route helper to completely validate an
      skb source address for multicast packets and we call it from
      the UDP early demux for mcast packets landing on unconnected
      sockets, after successfully fetching the related cached dst entry.
      
      This still gives a measurable, but limited performance
      regression:
      
                         rp_filter = 0    rp_filter = 1
      edmux disabled:    1182 Kpps        1127 Kpps
      edmux before:      2238 Kpps        2238 Kpps
      edmux after:       2037 Kpps        2019 Kpps
      
      The above figures are on top of the current net tree.
      With the net-next commit 6e617de8 ("net: avoid a full
      fib lookup when rp_filter is disabled.") applied, the delta
      with rp_filter == 0 will decrease even more.
      
      Fixes: 421b3885 ("udp: ipv4: Add udp early demux")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
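The throughput table can be turned into relative regressions. The helper below is a trivial sketch. Its inputs are the Kpps figures quoted in the commit, and "edmux before" (2238 Kpps) serves as the baseline.

```c
#include <assert.h>

/* Percentage drop of `after` relative to `before` (both in Kpps). */
static double regression_pct(double before, double after)
{
	return (before - after) * 100.0 / before;
}
```

The patched early demux costs about 9.0% (rp_filter = 0) and 9.8% (rp_filter = 1). Disabling early demux costs about 47.2% and 49.6%, which matches the "around 50%" estimate in the merge message.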
    • IPv4: early demux can return an error code · 7487449c
      Paolo Abeni committed
      Currently no error is emitted, but this infrastructure will be
      used by the next patch to allow source address validation
      for mcast sockets.
      Since early demux can do a route lookup, and an ipv4 route
      lookup can return an error code, this is consistent with the
      current ipv4 route infrastructure.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
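The sketch below models two changes together: the early demux callback is allowed to fail, and the multicast source check runs on unconnected sockets. Everything in it is hypothetical. `mc_validate_source()` checks only an assumed subset of martian sources (multicast, limited broadcast, loopback) and ignores rp_filter, which the real route helper also enforces. The socket and packet structs are toy models. What matters is the control flow: the callback returns a negative errno, and the receive path drops the packet instead of trusting the cached dst.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed subset of martian-source checks (host byte order).
 * The kernel helper does more, e.g. rp_filter via a fib lookup. */
static int mc_validate_source(uint32_t saddr)
{
	if ((saddr & 0xf0000000u) == 0xe0000000u) /* 224.0.0.0/4 */
		return -EINVAL;
	if (saddr == 0xffffffffu)                 /* 255.255.255.255 */
		return -EINVAL;
	if ((saddr & 0xff000000u) == 0x7f000000u) /* 127.0.0.0/8 */
		return -EINVAL;
	return 0;
}

/* Toy socket and packet models. */
struct sock_model { bool connected; bool has_cached_dst; };
struct pkt_model { uint32_t saddr; bool mcast; };

/* Early demux may now fail: 0 on success, negative errno otherwise. */
static int early_demux_model(const struct sock_model *sk,
			     const struct pkt_model *p)
{
	if (!sk->has_cached_dst)
		return 0; /* no cached dst: the normal route lookup runs */
	if (p->mcast && !sk->connected)
		return mc_validate_source(p->saddr); /* checked per packet */
	return 0;
}

enum rx_verdict { RX_DELIVER, RX_DROP };

/* Receive path: an early demux error drops the packet. */
static enum rx_verdict rcv_model(const struct sock_model *sk,
				 const struct pkt_model *p)
{
	return early_demux_model(sk, p) ? RX_DROP : RX_DELIVER;
}
```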
    • ip6_tunnel: update mtu properly for ARPHRD_ETHER tunnel device in tx path · d41bb33b
      Xin Long committed
      Currently, when updating the mtu in the tx path, ARPHRD_ETHER tunnel
      devices such as ip6gre_tap are not taken into account; for them the
      Ethernet header must also be subtracted to get the correct mtu.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
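Here is a worked example of the MTU arithmetic. It is a sketch: `tunnel_inner_mtu()` is hypothetical and ignores tunnel options and encapsulation limits. The 44-byte overhead is an assumption: a 40-byte IPv6 header plus a 4-byte GRE header with no key, sequence number or checksum.

```c
#include <assert.h>
#include <stdbool.h>

#define ETH_HLEN_BYTES 14 /* Ethernet header size (ETH_HLEN) */

/* Hypothetical, simplified: MTU left for the inner packet once the
 * encapsulation overhead, and for ARPHRD_ETHER-type tunnels the inner
 * Ethernet header, are subtracted from the path MTU. */
static int tunnel_inner_mtu(int path_mtu, int encap_overhead,
			    bool ether_tunnel)
{
	int mtu = path_mtu - encap_overhead;

	if (ether_tunnel)
		mtu -= ETH_HLEN_BYTES;
	return mtu;
}
```

With a 1500-byte path MTU, an L3 tunnel leaves 1456 bytes. An ARPHRD_ETHER device such as ip6gre_tap leaves 1442, because the inner Ethernet header has to fit as well.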
    • ip6_gre: ip6gre_tap device should keep dst · 2d40557c
      Xin Long committed
      The patch 'ip_gre: ipgre_tap device should keep dst' fixed
      an issue where the ipgre_tap mtu couldn't be updated in the tx path.
      
      The same fix is needed for ip6gre_tap as well.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip_gre: ipgre_tap device should keep dst · d51711c0
      Xin Long committed
      Without keeping dst, the tunnel will not update any mtu/pmtu info,
      since it does not have a dst on the skb.
      
      Reproducer:
        client(ipgre_tap1 - eth1) <-----> (eth1 - ipgre_tap1)server
      
      After reducing eth1's mtu on the client, performance dropped to 0.
      
      This patch calls netif_keep_dst() in gre_tap_init(), as ipgre does.
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
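The mechanism can be modeled in userspace. In the kernel, netif_keep_dst() clears the IFF_XMIT_DST_RELEASE flag in dev->priv_flags, and the core transmit path drops the skb's dst while that flag is set. The model below mirrors that idea. Its names (`XMIT_DST_RELEASE`, `keep_dst()`, `core_xmit()`, `tunnel_update_pmtu()`) are hypothetical, and the flag value is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag standing in for IFF_XMIT_DST_RELEASE; value arbitrary. */
#define XMIT_DST_RELEASE 0x1u

struct dst_model { int pmtu; };
struct skb_model { struct dst_model *dst; };
struct dev_model { unsigned int priv_flags; };

/* Counterpart of netif_keep_dst(): stop releasing the dst on xmit. */
static void keep_dst(struct dev_model *dev)
{
	dev->priv_flags &= ~XMIT_DST_RELEASE;
}

/* Core transmit step: drop the dst unless the device keeps it. */
static void core_xmit(const struct dev_model *dev, struct skb_model *skb)
{
	if (dev->priv_flags & XMIT_DST_RELEASE)
		skb->dst = NULL;
}

/* Tunnel transmit step: PMTU can only be lowered via an attached dst. */
static bool tunnel_update_pmtu(struct skb_model *skb, int mtu)
{
	if (!skb->dst)
		return false;
	if (mtu < skb->dst->pmtu)
		skb->dst->pmtu = mtu;
	return true;
}
```

Without `keep_dst()`, the dst is gone by the time the tunnel wants to record the smaller MTU, so the stale PMTU stays in place.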
  4. 30 Sep, 2017: 1 commit
    • netlink: do not proceed if dump's start() errs · fef0035c
      Jason A. Donenfeld committed
      Drivers that use the start method for netlink dumping rely on dumpit not
      being called if start fails. For example, ila_xlat.c allocates memory
      and assigns it to cb->args[0] in its start() function. It might fail to
      do that and return -ENOMEM instead. However, even when returning an
      error, dumpit will be called, which, in the example above, quickly
      dereferences the memory in cb->args[0], which will OOPS the kernel. This
      is but one example of how this goes wrong.
      
      Since start() has always been a function with an int return type, it
      therefore makes sense to use it properly, rather than ignoring it. This
      patch thus returns early and does not call dumpit() when start() fails.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
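The fixed control flow can be sketched with a toy callback structure. `struct cb_model`, `failing_start()`, `example_dump()` and `dump_start_model()` are hypothetical, and the `state` field plays the role of `cb->args[0]` in the ila_xlat.c example. The essential step is returning start()'s error before dump() is ever reached.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct netlink_callback. */
struct cb_model {
	int (*start)(struct cb_model *cb);
	int (*dump)(struct cb_model *cb);
	void *state;    /* set up by start(), like cb->args[0] */
	int dump_calls; /* how often dump() ran */
};

/* Example start() failing the way an allocation failure would. */
static int failing_start(struct cb_model *cb)
{
	cb->state = NULL;
	return -ENOMEM;
}

/* Example dump() that depends on start() having set up state. */
static int example_dump(struct cb_model *cb)
{
	cb->dump_calls++;
	return cb->state ? 0 : -EFAULT; /* models the NULL dereference */
}

/* Run start() first and bail out on error, as the fix does. */
static int dump_start_model(struct cb_model *cb)
{
	if (cb->start) {
		int err = cb->start(cb);

		if (err)
			return err;
	}
	return cb->dump(cb);
}
```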
  5. 29 Sep, 2017: 14 commits
  6. 28 Sep, 2017: 7 commits
    • tun: bail out from tun_get_user() if the skb is empty · 2580c4c1
      Alexander Potapenko committed
      KMSAN (https://github.com/google/kmsan) reported accessing uninitialized
      skb->data[0] in the case the skb is empty (i.e. skb->len is 0):
      
      ================================================
      BUG: KMSAN: use of uninitialized memory in tun_get_user+0x19ba/0x3770
      CPU: 0 PID: 3051 Comm: probe Not tainted 4.13.0+ #3140
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Call Trace:
      ...
       __msan_warning_32+0x66/0xb0 mm/kmsan/kmsan_instr.c:477
       tun_get_user+0x19ba/0x3770 drivers/net/tun.c:1301
       tun_chr_write_iter+0x19f/0x300 drivers/net/tun.c:1365
       call_write_iter ./include/linux/fs.h:1743
       new_sync_write fs/read_write.c:457
       __vfs_write+0x6c3/0x7f0 fs/read_write.c:470
       vfs_write+0x3e4/0x770 fs/read_write.c:518
       SYSC_write+0x12f/0x2b0 fs/read_write.c:565
       SyS_write+0x55/0x80 fs/read_write.c:557
       do_syscall_64+0x242/0x330 arch/x86/entry/common.c:284
       entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:245
      ...
      origin:
      ...
       kmsan_poison_shadow+0x6e/0xc0 mm/kmsan/kmsan.c:211
       slab_alloc_node mm/slub.c:2732
       __kmalloc_node_track_caller+0x351/0x370 mm/slub.c:4351
       __kmalloc_reserve net/core/skbuff.c:138
       __alloc_skb+0x26a/0x810 net/core/skbuff.c:231
       alloc_skb ./include/linux/skbuff.h:903
       alloc_skb_with_frags+0x1d7/0xc80 net/core/skbuff.c:4756
       sock_alloc_send_pskb+0xabf/0xfe0 net/core/sock.c:2037
       tun_alloc_skb drivers/net/tun.c:1144
       tun_get_user+0x9a8/0x3770 drivers/net/tun.c:1274
       tun_chr_write_iter+0x19f/0x300 drivers/net/tun.c:1365
       call_write_iter ./include/linux/fs.h:1743
       new_sync_write fs/read_write.c:457
       __vfs_write+0x6c3/0x7f0 fs/read_write.c:470
       vfs_write+0x3e4/0x770 fs/read_write.c:518
       SYSC_write+0x12f/0x2b0 fs/read_write.c:565
       SyS_write+0x55/0x80 fs/read_write.c:557
       do_syscall_64+0x242/0x330 arch/x86/entry/common.c:284
       return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:245
      ================================================
      
      Make sure tun_get_user() doesn't touch skb->data[0] unless there is
      actual data.
      
      C reproducer below:
      ==========================
          // autogenerated by syzkaller (http://github.com/google/syzkaller)
      
          #define _GNU_SOURCE
      
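          #include <fcntl.h>
          #include <linux/if_tun.h>
          #include <netinet/ip.h>
          #include <net/if.h>
          #include <string.h>
          #include <sys/ioctl.h>
          #include <sys/socket.h>
          #include <unistd.h>
      
          int main(void)
          {
            int sock = socket(PF_INET, SOCK_STREAM, IPPROTO_IP);
            int tun_fd = open("/dev/net/tun", O_RDWR);
            struct ifreq req;
            memset(&req, 0, sizeof(struct ifreq));
            strcpy((char*)&req.ifr_name, "gre0");
            /* The generated IFF_UP | IFF_MULTICAST has the same value as
             * IFF_TUN | IFF_NO_PI: an L3 tun device without packet info,
             * the mode in which tun_get_user() reads skb->data[0]. */
            req.ifr_flags = IFF_TUN | IFF_NO_PI;
            ioctl(tun_fd, TUNSETIFF, &req);
            /* SIOCSIFFLAGS expects a struct ifreq, not a name string. */
            req.ifr_flags = IFF_UP;
            ioctl(sock, SIOCSIFFLAGS, &req);
            /* Zero-length write: the kernel ends up with an empty skb. */
            write(tun_fd, "hi", 0);
            return 0;
          }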
      ==========================
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
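The guard can be sketched in userspace. `guess_proto()` is a hypothetical helper, not the tun.c code. An IFF_TUN device with IFF_NO_PI has no packet-information header, so the protocol is inferred from the IP version nibble of the first byte. The length check keeps an empty buffer from being read at index 0. The two constants carry the values of ETH_P_IP and ETH_P_IPV6.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define PROTO_IP   0x0800 /* value of ETH_P_IP */
#define PROTO_IPV6 0x86DD /* value of ETH_P_IPV6 */

/* Hypothetical helper: infer the L3 protocol of a packet written to an
 * IFF_TUN | IFF_NO_PI device from the IP version nibble. The length
 * check keeps an empty buffer from being read at index 0. */
static int guess_proto(const unsigned char *data, size_t len)
{
	unsigned int version = len ? (unsigned int)(data[0] >> 4) : 0;

	switch (version) {
	case 4:
		return PROTO_IP;
	case 6:
		return PROTO_IPV6;
	default:
		return -EINVAL; /* empty or unknown: reject */
	}
}
```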
    • net/mlx5: Fix wrong indentation in enable SRIOV code · 353f59f4
      Or Gerlitz committed
      Smatch is screaming:
      
      drivers/net/ethernet/mellanox/mlx5/core/sriov.c:112
      	mlx5_device_enable_sriov() warn: inconsistent indenting
      
      Fix that.
      
      Fixes: 7ecf6d8f ('IB/mlx5: Restore IB guid/policy for virtual functions')
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Fix static checker warning on steering tracepoints code · 480df991
      Matan Barak committed
      Fix this sparse complaint:
      
      drivers/net/ethernet/mellanox/mlx5/core/./diag/fs_tracepoint.h:172:1:
      	warning: odd constant _Bool cast (ffffffffffffffff becomes 1)
      
      Fixes: d9fea79171ee ('net/mlx5: Add tracepoints')
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Fix calculated checksum offloads counters · 603e1f5b
      Gal Pressman committed
      Instead of calculating the offloads counters, count them explicitly.
      The calculations done for these counters would result in bugs in some
      cases. For example, when running TCP traffic over a VXLAN tunnel with
      TSO enabled, the following counters would increase:
             tx_csum_partial: 1,333,284
             tx_csum_partial_inner: 29,286
             tx4_csum_partial_inner: 384
             tx7_csum_partial_inner: 8
             tx9_csum_partial_inner: 34
             tx10_csum_partial_inner: 26,807
             tx11_csum_partial_inner: 287
             tx12_csum_partial_inner: 27
             tx16_csum_partial_inner: 6
             tx25_csum_partial_inner: 1,733
      
      tx_csum_partial appears to increase out of nowhere.
      The issue is in the following calculation in mlx5e_update_sw_counters:
      s->tx_csum_partial = s->tx_packets - tx_offload_none - s->tx_csum_partial_inner;
      
      While tx_packets increases by the number of GSO segments for each SKB,
      tx_csum_partial_inner only increases by one, resulting in a wrong
      tx_csum_partial counter.
      
      Fixes: bfe6d8d1 ("net/mlx5e: Reorganize ethtool statistics")
      Signed-off-by: Gal Pressman <galp@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
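The unit mismatch can be shown with a toy counter model. `struct sw_stats`, `account_skb()` and `derived_csum_partial()` are hypothetical and cover only the counters named above. `tx_csum_none` stands in for `tx_offload_none` in the quoted formula. One TSO skb that the NIC splits into 10 segments adds 10 to `tx_packets` but only 1 to `tx_csum_partial_inner`.

```c
#include <assert.h>

/* Hypothetical subset of the software counters named in the commit. */
struct sw_stats {
	unsigned long tx_packets;            /* += GSO segments per skb */
	unsigned long tx_csum_none;          /* += 1 per skb */
	unsigned long tx_csum_partial;       /* += 1 per skb, counted explicitly */
	unsigned long tx_csum_partial_inner; /* += 1 per skb */
};

enum csum_kind { CSUM_NONE, CSUM_PARTIAL, CSUM_PARTIAL_INNER };

/* Account one transmitted skb that goes out as gso_segs packets. */
static void account_skb(struct sw_stats *s, unsigned int gso_segs,
			enum csum_kind kind)
{
	s->tx_packets += gso_segs;
	switch (kind) {
	case CSUM_NONE:
		s->tx_csum_none++;
		break;
	case CSUM_PARTIAL:
		s->tx_csum_partial++;
		break;
	case CSUM_PARTIAL_INNER:
		s->tx_csum_partial_inner++;
		break;
	}
}

/* The old derivation, mixing per-segment and per-skb units. */
static unsigned long derived_csum_partial(const struct sw_stats *s)
{
	return s->tx_packets - s->tx_csum_none - s->tx_csum_partial_inner;
}
```

After that single skb, the explicit `tx_csum_partial` count is 0, while the derived value is 10 - 0 - 1 = 9, which is the phantom growth seen in the counter dump.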
    • net/mlx5e: Don't add/remove 802.1ad rules when changing 802.1Q VLAN filter · 1456f69f
      Gal Pressman committed
      Toggling of C-tag VLAN filter should not affect the "any S-tag" steering rule.
      
      Fixes: 8a271746 ("net/mlx5e: Receive s-tagged packets in promiscuous mode")
      Signed-off-by: Gal Pressman <galp@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Print netdev features correctly in error message · b20eab15
      Gal Pressman committed
      Use the correct formatting for netdev features.
      
      Fixes: 0e405443 ("net/mlx5e: Improve set features ndo resiliency")
      Signed-off-by: Gal Pressman <galp@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Check encap entry state when offloading tunneled flows · b2812089
      Vlad Buslov committed
      Encap entries cached by the driver could be invalidated due to
      tunnel destination neighbour state changes.
      When attempting to offload a flow that uses a cached encap entry,
      we must check the entry validity and defer the offloading
      if the entry exists but is not valid.

      When EAGAIN is returned, the flow is offloaded to hardware
      by the neigh update code once the tunnel destination neighbour
      becomes connected.
      
      Fixes: 232c0013 ("net/mlx5e: Add support to neighbour update flow")
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
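The defer-and-retry pattern can be sketched with a single toy encap entry. `struct encap_entry`, `offload_flow()` and `neigh_connected()` are hypothetical and stand in for the driver's encap cache and its neighbour update handler. A flow that hits an invalid entry is parked and reported with -EAGAIN. The neighbour update then completes its offload.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model, not the mlx5e structures. */
struct encap_entry {
	bool valid;        /* tunnel destination neighbour is connected */
	int pending_flows; /* flows waiting for the entry to become valid */
	int offloaded;     /* flows currently programmed in "hardware" */
};

/* Offload a flow using the cached entry, or defer it. */
static int offload_flow(struct encap_entry *e)
{
	if (!e->valid) {
		e->pending_flows++;
		return -EAGAIN; /* finished later by the neighbour update */
	}
	e->offloaded++;
	return 0;
}

/* Neighbour update: the destination became connected. */
static void neigh_connected(struct encap_entry *e)
{
	e->valid = true;
	e->offloaded += e->pending_flows;
	e->pending_flows = 0;
}
```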