1. 09 Jan, 2019 22 commits
    • xfrm: policy: fix reinsertion on node merge · 1d38900c
      Florian Westphal authored
      "newpos" has the wrong scope.  It must be NULL on each iteration of the loop.
      Otherwise, when a policy is to be inserted at the start, we would
      insert at the point found by the previous loop iteration instead.
      
      Also, we need to unlink the policy before we reinsert it to the new node,
      else we can get next-points-to-self loops.
      
      Because policies are only ordered by priority, it is irrelevant which policy
      is "more recent" except when two policies have the same priority
      (in that case the more recent one is placed after the older one).
      
      In these cases, we can use the ->pos id number to know which one is the
      'older': the higher the id, the more recent the policy.
      
      So we only need to unlink all policies from the node that is about to be
      removed, and insert them to the replacement node.
      
      Fixes: 9cf545eb ("xfrm: policy: store inexact policies in a tree ordered by destination address")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
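The reinsertion rule described above can be sketched in a few lines. This is a toy model, not the kernel code: `Policy`, `reinsert`, and the list-based nodes are hypothetical names chosen for illustration. It shows the two points the message makes: the insertion point is reset on every iteration, and each policy is unlinked from the old node before it is linked into the new one.

```python
from dataclasses import dataclass

@dataclass
class Policy:
    priority: int
    pos: int  # insertion id: a higher value means a more recent policy

def reinsert(src: list, dst: list) -> None:
    """Move every policy from src into dst, keeping dst ordered."""
    while src:
        policy = src.pop(0)   # unlink from the old node first
        newpos = None         # must be reset on every iteration
        for i, p in enumerate(dst):
            if (p.priority, p.pos) <= (policy.priority, policy.pos):
                newpos = i    # last entry that sorts before 'policy'
            else:
                break
        # No predecessor found: insert at the start of the list.
        dst.insert(0 if newpos is None else newpos + 1, policy)

old_node = [Policy(10, 1), Policy(20, 5)]
new_node = [Policy(10, 3), Policy(20, 2)]
reinsert(old_node, new_node)
order = [(p.priority, p.pos) for p in new_node]
```

With equal priorities, the policy with the higher `pos` ends up after the older one, which matches the ordering rule stated in the message.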
    • xfrm: policy: delete inexact policies from inexact list on hash rebuild · 1548bc4e
      Florian Westphal authored
      An xfrm hash rebuild has to reset the inexact policy list before the
      policies get re-inserted: a change of hash thresholds will result in
      policies being moved from the inexact tree to the policy hash table.
      
      If the thresholds are increased again later, they get moved from hash
      table to inexact tree.
      
      We must unlink all policies from the inexact tree before re-insertion.
      
      Otherwise 'migrate' may find policies that are in main hash table a
      second time, when it searches the inexact lists.
      
      Furthermore, re-insertion without deletion can cause an element's ->next to
      point back to itself, causing soft lockups or double-frees.
      
      Reported-by: syzbot+9d971dd21eb26567036b@syzkaller.appspotmail.com
      Fixes: 9cf545eb ("xfrm: policy: store inexact policies in a tree ordered by destination address")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • xfrm: policy: increment xfrm_hash_generation on hash rebuild · 7a474c36
      Florian Westphal authored
      A hash rebuild resets all the inexact entries, then re-inserts them.
      Lookups running in parallel may therefore fail to find any policies.
      
      This was safe when lookups were still guarded by rwlock.
      After rcu-ification, lookups check the hash_generation seqcount to detect
      when a hash resize takes place.  Hash rebuild missed the needed increment.
      
      Hash resizes and hash rebuilds cannot occur in parallel (both acquire
      hash_resize_mutex), so just increment xfrm_hash_generation, like resize.
      
      Fixes: a7c44247 ("xfrm: policy: make xfrm_policy_lookup_bytype lockless")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
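The generation check mentioned above follows the usual sequence-counter pattern: a reader samples the counter, performs the lookup, and retries if the counter changed in the meantime. The sketch below is a single-threaded toy model under that assumption. `SeqCount`, `lookup`, and `rebuild` are hypothetical names, not kernel APIs, and the real kernel primitives also deal with memory ordering, which this model ignores.

```python
class SeqCount:
    """Toy sequence counter: the value is odd while a writer is active."""
    def __init__(self):
        self.value = 0

    def write_begin(self):
        self.value += 1

    def write_end(self):
        self.value += 1

    def read_begin(self):
        return self.value

    def read_retry(self, start):
        # Retry if a write was in progress or has happened since 'start'.
        return (start & 1) != 0 or self.value != start

def lookup(table, key, seq, max_tries=3):
    """Lockless-style lookup: only trust a result seen under a stable counter."""
    for _ in range(max_tries):
        start = seq.read_begin()
        result = table.get(key)
        if not seq.read_retry(start):
            return result
    return None

def rebuild(table, entries, seq):
    """Rebuild that bumps the counter so concurrent readers know to retry."""
    seq.write_begin()
    table.clear()           # readers would see an empty table here
    table.update(entries)
    seq.write_end()
```

A rebuild that skipped the two counter updates would leave readers with no way to tell that an empty result was only transient, which is the gap this commit describes.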
    • xfrm: policy: use hlist rcu variants on inexact insert, part 2 · 355b00d1
      Florian Westphal authored
      This function was modeled on the 'exact' insert one, which did not use
      the rcu variant either.
      
      When I fixed the 'exact' insert I forgot to propagate this to my
      development tree, so the inexact variant retained the bug.
      
      Fixes: 9cf545eb ("xfrm: policy: store inexact policies in a tree ordered by destination address")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • selftests: xfrm: add block rules with adjacent/overlapping subnets · 0977b238
      Florian Westphal authored
      The existing script lacks a policy pattern that triggers 'tree node
      merges' in the kernel.
      
      Consider adding policies affecting the following subnets:
      pol1: dst 10.0.0.0/22
      pol2: dst 10.0.0.0/23 # adds to existing 10.0.0.0/22 node
      
      -> no problems here.  But now, let's consider the reverse order:
      pol1: dst 10.0.0.0/24
      pol2: dst 10.0.0.0/23 # CANNOT add to existing node
      
      When the second policy gets added, the kernel must check that the new node
      ("10.0.0.0/23") doesn't overlap with any existing subnet.
      
      Example:
      dst 10.0.0.0/24
      dst 10.0.1.0/24
      dst 10.0.0.0/23
      
      When the third policy gets added, the kernel must replace the nodes for
      the 10.0.0.0/24 and 10.0.1.0/24 policies with a single one and must merge
      all the subtrees/lists stored in those nodes into the new node.
      
      The existing test cases only have overlaps with a single node, so no
      merging takes place (we can always remove the 'old' node and replace
      it with the new subnet prefix).
      
      Add a few 'block policies' in a pattern that triggers this, with a priority
      that will make kernel prefer the 'esp' rules.
      
      Make sure the 'tunnel ping' tests still pass after they have been added.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
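The overlap check that decides between "replace one node" and "merge several nodes" can be illustrated with Python's standard `ipaddress` module. This is only a model of the idea: `overlapping_nodes` is a hypothetical helper, and the prefixes are two adjacent /24 networks picked for illustration, not taken from the test script.

```python
import ipaddress

def overlapping_nodes(existing, new_prefix):
    """Return the existing prefixes that overlap with new_prefix."""
    new_net = ipaddress.ip_network(new_prefix)
    return [p for p in existing
            if new_net.overlaps(ipaddress.ip_network(p))]

nodes = ["10.0.0.0/24", "10.0.1.0/24", "10.0.4.0/24"]
to_merge = overlapping_nodes(nodes, "10.0.0.0/23")
```

When more than one prefix is returned, the contents of all those nodes would have to be merged into the node for the new, wider prefix; a single hit corresponds to the simpler case the existing tests already cover.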
    • packet: Do not leak dev refcounts on error exit · d972f3dc
      Jason Gunthorpe authored
      'dev' is non-NULL when the addr_len check triggers, so the error path must
      go to a label that calls dev_put(); otherwise the dev refcount is leaked.
      
      This bug prevents the ib_ipoib module from being unloaded when using
      systemd-network, as it triggers this check on InfiniBand links.
      
      Fixes: 99137b78 ("packet: validate address length")
      Reported-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
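The shape of the fix, routing every exit taken after the reference is acquired through the release, can be modelled as follows. `Device` and `send` are hypothetical stand-ins; in the kernel the same effect comes from jumping to the label that calls dev_put(), and `try`/`finally` plays that role here.

```python
import errno

class Device:
    """Toy device with a reference counter."""
    def __init__(self):
        self.refcnt = 0

    def hold(self):
        self.refcnt += 1

    def put(self):
        self.refcnt -= 1

def send(dev, addr_len, required_len):
    dev.hold()                        # reference taken before validation
    try:
        if addr_len < required_len:
            return -errno.EINVAL      # error exit still drops the reference
        return 0
    finally:
        dev.put()                     # single release point for all exits

dev = Device()
short_result = send(dev, addr_len=4, required_len=20)
ok_result = send(dev, addr_len=20, required_len=20)
```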
    • Merge branch 'mlxsw-fixes' · 4314b1f6
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2019-01-08
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix BSD'ism in sendmsg(2) to rewrite unspecified IPv6 dst for
         unconnected UDP sockets with [::1] _after_ cgroup BPF invocation,
         from Andrey.
      
      2) Follow-up fix to the speculation fix where we need to reject a
         corner case for sanitation when ptr and scalars are mixed in the
         same alu op. Also, some unrelated minor doc fixes, from Daniel.
      
      3) Fix BPF kselftest's incorrect uses of create_and_get_cgroup()
         by not assuming fd of zero value to be the result of an error
         case, from Stanislav.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: forwarding: Add a test for VLAN deletion · 4fabf3bf
      Ido Schimmel authored
      Add a VLAN on a bridge port, delete it and make sure the PVID VLAN is
      not affected.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_switchdev: Set PVID correctly during VLAN deletion · 674bed5d
      Ido Schimmel authored
      When a VLAN is deleted from a bridge port we should not change the PVID
      unless the deleted VLAN is the PVID.
      
      Fixes: fe9ccc78 ("mlxsw: spectrum_switchdev: Don't batch VLAN operations")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
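The rule stated above is small enough to write down directly. The sketch below is a toy model with a hypothetical `delete_vlan` helper: the PVID is cleared only when the deleted VLAN is the PVID itself.

```python
def delete_vlan(vlans, pvid, vid):
    """Remove vid from the port's VLAN set and return the resulting PVID."""
    vlans.discard(vid)
    # Only deleting the PVID VLAN itself may change the PVID.
    return None if vid == pvid else pvid

port_vlans = {1, 10, 20}
pvid_after_other = delete_vlan(port_vlans, pvid=1, vid=10)
pvid_after_self = delete_vlan(port_vlans, pvid=1, vid=1)
```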
    • selftests: forwarding: Fix test for different devices · 289fb44d
      Ido Schimmel authored
      When running the test on the Spectrum ASIC the generated packets are
      counted on the ingress filter and injected back to the pipeline because
      of the 'pass' action. The router block then drops the packets due to
      checksum error, as the test generates packets with zero checksum.
      
      When running the test on an emulator that is not as strict about
      checksum errors the test fails since packets are counted twice. Once by
      the emulated ASIC on its ingress filter and again by the kernel as the
      emulator does not perform checksum validation and allows the packets to
      be trapped by a matching host route.
      
      Fix this by changing the action to 'drop', which will prevent the packet
      from continuing further in the pipeline to the router block.
      
      For veth pairs this change is essentially a NOP given packets are only
      processed once (by the kernel).
      
      Fixes: a0b61f3d ("selftests: forwarding: vxlan_bridge_1d: Add an ECN decap test")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: Fix VLANs memory leak · 27973793
      Ido Schimmel authored
      When adding / deleting VLANs to / from a bridge port, the bridge driver
      first tries to propagate the information via switchdev and falls back to
      the 8021q driver in case the underlying driver does not support
      switchdev. This can result in a memory leak [1] when VXLAN and mlxsw
      ports are enslaved to the bridge:
      
      $ ip link set dev vxlan0 master br0
      # No mlxsw ports are enslaved to 'br0', so mlxsw ignores the switchdev
      # notification and the bridge driver adds the VLAN on 'vxlan0' via the
      # 8021q driver
      $ bridge vlan add vid 10 dev vxlan0 pvid untagged
      # mlxsw port is enslaved to the bridge
      $ ip link set dev swp1 master br0
      # mlxsw processes the switchdev notification and the 8021q driver is
      # skipped
      $ bridge vlan del vid 10 dev vxlan0
      
      This results in 'struct vlan_info' and 'struct vlan_vid_info' being
      leaked, as they were allocated by the 8021q driver during VLAN addition,
      but never freed as the 8021q driver was skipped during deletion.
      
      Fix this by introducing a new VLAN private flag that indicates whether
      the VLAN was added on the port by switchdev or the 8021q driver. If the
      VLAN was added by the 8021q driver, then we make sure to delete it via
      the 8021q driver as well.
      
      [1]
      unreferenced object 0xffff88822d20b1e8 (size 256):
        comm "bridge", pid 2532, jiffies 4295216998 (age 1188.830s)
        hex dump (first 32 bytes):
          e0 42 97 ce 81 88 ff ff 00 00 00 00 00 00 00 00  .B..............
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000f82d851d>] kmem_cache_alloc_trace+0x1be/0x330
          [<00000000e0178b02>] vlan_vid_add+0x661/0x920
          [<00000000218ebd5f>] __vlan_add+0x1be9/0x3a00
          [<000000006eafa1ca>] nbp_vlan_add+0x8b3/0xd90
          [<000000003535392c>] br_vlan_info+0x132/0x410
          [<00000000aedaa9dc>] br_afspec+0x75c/0x870
          [<00000000f5716133>] br_setlink+0x3dc/0x6d0
          [<00000000aceca5e2>] rtnl_bridge_setlink+0x615/0xb30
          [<00000000a2f2d23e>] rtnetlink_rcv_msg+0x3a3/0xa80
          [<0000000064097e69>] netlink_rcv_skb+0x152/0x3c0
          [<000000008be8d614>] rtnetlink_rcv+0x21/0x30
          [<000000009ab2ca25>] netlink_unicast+0x52f/0x740
          [<00000000e7d9ac96>] netlink_sendmsg+0x9c7/0xf50
          [<000000005d1e2050>] sock_sendmsg+0xbe/0x120
          [<00000000d51426bc>] ___sys_sendmsg+0x778/0x8f0
          [<00000000b9d7b2cc>] __sys_sendmsg+0x112/0x270
      unreferenced object 0xffff888227454308 (size 32):
        comm "bridge", pid 2532, jiffies 4295216998 (age 1188.882s)
        hex dump (first 32 bytes):
          88 b2 20 2d 82 88 ff ff 88 b2 20 2d 82 88 ff ff  .. -...... -....
          81 00 0a 00 01 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000f82d851d>] kmem_cache_alloc_trace+0x1be/0x330
          [<0000000018050631>] vlan_vid_add+0x3e6/0x920
          [<00000000218ebd5f>] __vlan_add+0x1be9/0x3a00
          [<000000006eafa1ca>] nbp_vlan_add+0x8b3/0xd90
          [<000000003535392c>] br_vlan_info+0x132/0x410
          [<00000000aedaa9dc>] br_afspec+0x75c/0x870
          [<00000000f5716133>] br_setlink+0x3dc/0x6d0
          [<00000000aceca5e2>] rtnl_bridge_setlink+0x615/0xb30
          [<00000000a2f2d23e>] rtnetlink_rcv_msg+0x3a3/0xa80
          [<0000000064097e69>] netlink_rcv_skb+0x152/0x3c0
          [<000000008be8d614>] rtnetlink_rcv+0x21/0x30
          [<000000009ab2ca25>] netlink_unicast+0x52f/0x740
          [<00000000e7d9ac96>] netlink_sendmsg+0x9c7/0xf50
          [<000000005d1e2050>] sock_sendmsg+0xbe/0x120
          [<00000000d51426bc>] ___sys_sendmsg+0x778/0x8f0
          [<00000000b9d7b2cc>] __sys_sendmsg+0x112/0x270
      
      Fixes: d70e42b2 ("mlxsw: spectrum: Enable VxLAN enslavement to VLAN-aware bridges")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Petr Machata <petrm@mellanox.com>
      Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
      Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Cc: bridge@lists.linux-foundation.org
      Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
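The leak and its fix can be modelled by remembering, per VLAN, which path performed the addition and using the same path for deletion. Everything in this sketch is hypothetical (`Port`, `vlan_add`, `vlan_del`, and the `switchdev_handles` argument); it only mirrors the sequence of commands shown in the message.

```python
class Port:
    def __init__(self):
        self.added_by_switchdev = {}  # vid -> True if switchdev added it
        self.allocs_8021q = set()     # models memory held by the 8021q driver

def vlan_add(port, vid, switchdev_handles):
    if not switchdev_handles:
        port.allocs_8021q.add(vid)    # fallback path allocates VLAN state
    port.added_by_switchdev[vid] = switchdev_handles

def vlan_del(port, vid, switchdev_handles):
    # Decide by how the VLAN was added, not by who would handle it now.
    if not port.added_by_switchdev.pop(vid):
        port.allocs_8021q.discard(vid)

vxlan0 = Port()
vlan_add(vxlan0, 10, switchdev_handles=False)  # no mlxsw port enslaved yet
vlan_del(vxlan0, 10, switchdev_handles=True)   # mlxsw port enslaved meanwhile
```

Without the per-VLAN flag, the deletion would follow the switchdev path and the entry in `allocs_8021q` would never be released.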
    • selftests: mlxsw: Add a test case for VLAN addition error flow · 16dc42e4
      Ido Schimmel authored
      Add a test case for the issue fixed by the previous commit. If the
      offloading of an unsupported VxLAN tunnel is triggered by adding the
      mapped VLAN to a local port, an error should be returned to the user.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_nve: Replace error code with EINVAL · 412283ee
      Ido Schimmel authored
      Adding a VLAN on a port can trigger the offload of a VXLAN tunnel which
      is already a member of the VLAN. In case the configuration of the VXLAN
      is not supported, the driver would return -EOPNOTSUPP.
      
      This is problematic since the bridge code does not interpret this as an
      error, but rather as a hint that it should try to set up the VLAN using
      the 8021q driver instead of switchdev.
      
      Fixes: d70e42b2 ("mlxsw: spectrum: Enable VxLAN enslavement to VLAN-aware bridges")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
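Why the error code matters can be seen in a small model of the caller. The helper below is hypothetical and only encodes the behaviour described in the message: -EOPNOTSUPP from the driver means "fall back to the 8021q path", while any other error is reported to the user.

```python
import errno

def bridge_vlan_add(switchdev_add, fallback_8021q):
    """Try switchdev first; fall back only when it reports 'not supported'."""
    err = switchdev_add()
    if err == -errno.EOPNOTSUPP:
        return fallback_8021q()   # treated as "no offload", not as a failure
    return err

calls = []

def fallback():
    calls.append("8021q")
    return 0

masked = bridge_vlan_add(lambda: -errno.EOPNOTSUPP, fallback)
reported = bridge_vlan_add(lambda: -errno.EINVAL, fallback)
```

With -EOPNOTSUPP the unsupported tunnel configuration is silently masked by the fallback; with -EINVAL the failure reaches the caller.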
    • mlxsw: spectrum_switchdev: Avoid returning errors in commit phase · 457e20d6
      Ido Schimmel authored
      Drivers are not supposed to return errors in switchdev commit phase if
      they returned OK in prepare phase. Otherwise, a WARNING is emitted.
      However, when the offloading of a VXLAN tunnel is triggered by the
      addition of a VLAN on a local port, it is not possible to guarantee that
      the commit phase will succeed without doing a lot of work.
      
      In these cases, the artificial division between prepare and commit phase
      does not make sense, so simply do the work in the prepare phase.
      
      Fixes: d70e42b2 ("mlxsw: spectrum: Enable VxLAN enslavement to VLAN-aware bridges")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Add VXLAN dependency for spectrum · 143a8e03
      Ido Schimmel authored
      When VXLAN is a loadable module, MLXSW_SPECTRUM must not be built-in:
      
      drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c:2547: undefined
      reference to `vxlan_fdb_find_uc'
      
      Add Kconfig dependency to enforce usable configurations.
      
      Fixes: 1231e04f ("mlxsw: spectrum_switchdev: Add support for VxLAN encapsulation")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reported-by: kbuild test robot <lkp@intel.com>
      Reviewed-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Disable lag port TX before removing it · 8adbe212
      Jiri Pirko authored
      Make sure that lag port TX is disabled before mlxsw_sp_port_lag_leave()
      is called, to prevent a possible EMAD error.
      
      Fixes: 0d65fc13 ("mlxsw: spectrum: Implement LAG port join/leave")
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_acl: Remove ASSERT_RTNL()s in module removal flow · 04d075b7
      Nir Dotan authored
      Removal of the mlxsw driver on Spectrum-2 platforms hits an ASSERT_RTNL()
      in Spectrum-2 ACL Bloom filter and in ERP removal paths. This happens
      because the multicast router implementation in Spectrum-2 relies on ACLs.
      Taking the RTNL lock upon driver removal is useless since the driver first
      removes its ports and unregisters from notifiers so concurrent writes
      cannot happen at that time. The assertions were originally put as a
      reminder for future work involving ERP background optimization, but having
      these assertions only during addition serves this purpose as well.
      
      Therefore remove the ASSERT_RTNL() in both places related to ERP and Bloom
      filter removal.
      
      Fixes: cf7221a4 ("mlxsw: spectrum_router: Add Multicast routing support for Spectrum-2")
      Signed-off-by: Nir Dotan <nird@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_acl: Add cleanup after C-TCAM update error condition · ff0db43c
      Nir Dotan authored
      When writing to C-TCAM, mlxsw driver uses cregion->ops->entry_insert().
      In case of C-TCAM HW insertion error, the opposite action should take
      place.
      Add error handling case in which the C-TCAM region entry is removed, by
      calling cregion->ops->entry_remove().
      
      Fixes: a0a777b9 ("mlxsw: spectrum_acl: Start using A-TCAM")
      Signed-off-by: Nir Dotan <nird@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
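The cleanup described above is the usual "undo the first step if the second fails" pattern. In this sketch, `region` is a plain list standing in for the C-TCAM region bookkeeping and `hw_write` is a hypothetical callback for the hardware update; neither is the driver's real interface.

```python
def ctcam_entry_insert(region, entry, hw_write):
    """Insert an entry, rolling the region back if the hardware write fails."""
    region.append(entry)        # stands in for cregion->ops->entry_insert()
    try:
        hw_write(entry)
    except OSError:
        region.remove(entry)    # stands in for cregion->ops->entry_remove()
        raise

def failing_write(entry):
    raise OSError("simulated C-TCAM insertion error")

region = []
ctcam_entry_insert(region, "rule-a", lambda entry: None)
try:
    ctcam_entry_insert(region, "rule-b", failing_write)
    failed = False
except OSError:
    failed = True
```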
    • r8169: load Realtek PHY driver module before r8169 · 11287b69
      Heiner Kallweit authored
      This soft dependency works around an issue where the genphy driver is
      sometimes used instead of the dedicated PHY driver. The root cause of
      the issue isn't clear yet. People reported that unloading and re-loading
      the r8169 module helps, as does configuring this soft dependency in
      the modprobe config files. What seems to matter is that the
      realtek module is loaded before r8169.
      
      Once this has been applied, the preliminary fix 38af4b90 ("net: phy:
      add workaround for issue where PHY driver doesn't bind to the device")
      will be removed.
      
      Fixes: f1e911d5 ("r8169: add basic phylib support")
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • lan743x: Remove phy_read from link status change function · a0071840
      Bryan Whitehead authored
      It has been noticed that some phys do not have the registers
      required by the previous implementation.
      
      To fix this, instead of using phy_read, the required information
      is extracted from the phy_device structure.
      
      Fixes: 23f0703c ("lan743x: Add main source files for new lan743x driver")
      Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ptp: uapi: change _IOW to IOWR in PTP_SYS_OFFSET_EXTENDED definition · b7ea4894
      Eugene Syromiatnikov authored
      The ioctl command is read/write (or just read, if the fact that user space
      writes n_samples field is ignored).
      Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ptp: check that rsv field is zero in struct ptp_sys_offset_extended · 895ac137
      Eugene Syromiatnikov authored
      Otherwise it is impossible to use it for something else, as it will break
      userspace that puts garbage there.
      
      The same check should be done in other structures, but the fact that
      data in reserved fields is ignored is already part of the kernel ABI.
      Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
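Rejecting non-zero reserved fields keeps them available for future extensions. The sketch below shows the idea with a hypothetical `check_extended_request` helper; the field names follow the message, and the sample limit of 25 is passed as a parameter rather than read from the header.

```python
import errno

def check_extended_request(n_samples, rsv, max_samples=25):
    """Validate a request: reserved words must be zero, sample count bounded."""
    if any(rsv):
        return -errno.EINVAL   # garbage in reserved fields is rejected
    if n_samples > max_samples:
        return -errno.EINVAL
    return 0

clean = check_extended_request(5, [0, 0, 0])
dirty = check_extended_request(5, [0, 0xdead, 0])
```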
  2. 08 Jan, 2019 10 commits
  3. 07 Jan, 2019 5 commits
  4. 06 Jan, 2019 3 commits