1. 01 3月, 2022 1 次提交
    • R
      rtnetlink: add new rtm tunnel api for tunnel id filtering · 7b8135f4
      Roopa Prabhu 提交于
      This patch adds new rtm tunnel msg and api for tunnel id
      filtering in dst_metadata devices. First dst_metadata
      device to use the api is vxlan driver with AF_BRIDGE
      family.
      
      This and later changes add ability in vxlan driver to do
      tunnel id filtering (or vni filtering) on dst_metadata
      devices. This is similar to vlan api in the vlan filtering bridge.
      
      this patch includes selinux nlmsg_route_perms support for RTM_*TUNNEL
      api from Benjamin Poirier.
      Signed-off-by: NRoopa Prabhu <roopa@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7b8135f4
  2. 28 2月, 2022 3 次提交
  3. 27 2月, 2022 8 次提交
    • V
      net: mscc: ocelot: enforce FDB isolation when VLAN-unaware · 54c31984
      Vladimir Oltean 提交于
      Currently ocelot uses a pvid of 0 for standalone ports and ports under a
      VLAN-unaware bridge, and the pvid of the bridge for ports under a
      VLAN-aware bridge. Standalone ports do not perform learning, but packets
      received on them are still subject to FDB lookups. So if the MAC DA that
      a standalone port receives has been also learned on a VLAN-unaware
      bridge port, ocelot will attempt to forward to that port, even though it
      can't, so it will drop packets.
      
      So there is a desire to avoid that, and isolate the FDBs of different
      bridges from one another, and from standalone ports.
      
      The ocelot switch library has two distinct entry points: the felix DSA
      driver and the ocelot switchdev driver.
      
      We need to code up a minimal bridge_num allocation in the ocelot
      switchdev driver too, this is copied from DSA with the exception that
      ocelot does not care about DSA trees, cross-chip bridging etc. So it
      only looks at its own ports that are already in the same bridge.
      
      The ocelot switchdev driver uses the bridge_num it has allocated itself,
      while the felix driver uses the bridge_num allocated by DSA. They are
      both stored inside ocelot_port->bridge_num by the common function
      ocelot_port_bridge_join() which receives the bridge_num passed by value.
      
      Once we have a bridge_num, we can only use it to enforce isolation
      between VLAN-unaware bridges. As far as I can see, ocelot does not have
      anything like a FID that further makes VLAN 100 from a port be different
      to VLAN 100 from another port with regard to FDB lookup. So we simply
      deny multiple VLAN-aware bridges.
      
      For VLAN-unaware bridges, we crop the 4000-4095 VLAN region and we
      allocate a VLAN for each bridge_num. This will be used as the pvid of
      each port that is under that VLAN-unaware bridge, for as long as that
      bridge is VLAN-unaware.
      
      VID 0 remains only for standalone ports. It is okay if all standalone
      ports use the same VID 0, since they perform no address learning, the
      FDB will contain no entry in VLAN 0, so the packets will always be
      flooded to the only possible destination, the CPU port.
      
      The CPU port module doesn't need to be member of the VLANs to receive
      packets, but if we use the DSA tag_8021q protocol, those packets are
      part of the data plane as far as ocelot is concerned, so there it needs
      to. Just ensure that the DSA tag_8021q CPU port is a member of all
      reserved VLANs when it is created, and is removed when it is deleted.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      54c31984
    • V
      net: dsa: pass extack to .port_bridge_join driver methods · 06b9cce4
      Vladimir Oltean 提交于
      As FDB isolation cannot be enforced between VLAN-aware bridges in lack
      of hardware assistance like extra FID bits, it seems plausible that many
      DSA switches cannot do it. Therefore, they need to reject configurations
      with multiple VLAN-aware bridges from the two code paths that can
      transition towards that state:
      
      - joining a VLAN-aware bridge
      - toggling VLAN awareness on an existing bridge
      
      The .port_vlan_filtering method already propagates the netlink extack to
      the driver, let's propagate it from .port_bridge_join too, to make sure
      that the driver can use the same function for both.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      06b9cce4
    • V
      net: dsa: request drivers to perform FDB isolation · c2693363
      Vladimir Oltean 提交于
      For DSA, to encourage drivers to perform FDB isolation simply means to
      track which bridge does each FDB and MDB entry belong to. It then
      becomes the driver responsibility to use something that makes the FDB
      entry from one bridge not match the FDB lookup of ports from other
      bridges.
      
      The top-level functions where the bridge is determined are:
      - dsa_port_fdb_{add,del}
      - dsa_port_host_fdb_{add,del}
      - dsa_port_mdb_{add,del}
      - dsa_port_host_mdb_{add,del}
      
      aka the pre-crosschip-notifier functions.
      
      Changing the API to pass a reference to a bridge is not superfluous, and
      looking at the passed bridge argument is not the same as having the
      driver look at dsa_to_port(ds, port)->bridge from the ->port_fdb_add()
      method.
      
      DSA installs FDB and MDB entries on shared (CPU and DSA) ports as well,
      and those do not have any dp->bridge information to retrieve, because
      they are not in any bridge - they are merely the pipes that serve the
      user ports that are in one or multiple bridges.
      
      The struct dsa_bridge associated with each FDB/MDB entry is encapsulated
      in a larger "struct dsa_db" database. Although only databases associated
      to bridges are notified for now, this API will be the starting point for
      implementing IFF_UNICAST_FLT in DSA. There, the idea is to install FDB
      entries on the CPU port which belong to the corresponding user port's
      port database. These are supposed to match only when the port is
      standalone.
      
      It is better to introduce the API in its expected final form than to
      introduce it for bridges first, then to have to change drivers which may
      have made one or more assumptions.
      
      Drivers can use the provided bridge.num, but they can also use a
      different numbering scheme that is more convenient.
      
      DSA must perform refcounting on the CPU and DSA ports by also taking
      into account the bridge number. So if two bridges request the same local
      address, DSA must notify the driver twice, once for each bridge.
      
      In fact, if the driver supports FDB isolation, DSA must perform
      refcounting per bridge, but if the driver doesn't, DSA must refcount
      host addresses across all bridges, otherwise it would be telling the
      driver to delete an FDB entry for a bridge and the driver would delete
      it for all bridges. So introduce a bool fdb_isolation in drivers which
      would make all bridge databases passed to the cross-chip notifier have
      the same number (0). This makes dsa_mac_addr_find() -> dsa_db_equal()
      say that all bridge databases are the same database - which is
      essentially the legacy behavior.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c2693363
    • V
      net: dsa: tag_8021q: rename dsa_8021q_bridge_tx_fwd_offload_vid · b6362bdf
      Vladimir Oltean 提交于
      The dsa_8021q_bridge_tx_fwd_offload_vid is no longer used just for
      bridge TX forwarding offload, it is the private VLAN reserved for
      VLAN-unaware bridging in a way that is compatible with FDB isolation.
      
      So just rename it dsa_tag_8021q_bridge_vid.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b6362bdf
    • V
      net: dsa: tag_8021q: merge RX and TX VLANs · 04b67e18
      Vladimir Oltean 提交于
      In the old Shared VLAN Learning mode of operation that tag_8021q
      previously used for forwarding, we needed to have distinct concepts for
      an RX and a TX VLAN.
      
      An RX VLAN could be installed on all ports that were members of a given
      bridge, so that autonomous forwarding could still work, while a TX VLAN
      was dedicated for precise packet steering, so it just contained the CPU
      port and one egress port.
      
      Now that tag_8021q uses Independent VLAN Learning and imprecise RX/TX
      all over, those lines have been blurred and we no longer have the need
      to do precise TX towards a port that is in a bridge. As for standalone
      ports, it is fine to use the same VLAN ID for both RX and TX.
      
      This patch changes the tag_8021q format by shifting the VLAN range it
      reserves, and halving it. Previously, our DIR bits were encoding the
      VLAN direction (RX/TX) and were set to either 1 or 2. This meant that
      tag_8021q reserved 2K VLANs, or 50% of the available range.
      
      Change the DIR bits to a hardcoded value of 3 now, which makes tag_8021q
      reserve only 1K VLANs, and a different range now (the last 1K). This is
      done so that we leave the old format in place in case we need to return
      to it.
      
      In terms of code, the vid_is_dsa_8021q_rxvlan and vid_is_dsa_8021q_txvlan
      functions go away. Any vid_is_dsa_8021q is both a TX and an RX VLAN, and
      they are no longer distinct. For example, felix which did different
      things for different VLAN types, now needs to handle the RX and the TX
      logic for the same VLAN.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      04b67e18
    • V
      net: dsa: tag_8021q: add support for imprecise RX based on the VBID · d7f9787a
      Vladimir Oltean 提交于
      The sja1105 switch can't populate the PORT field of the tag_8021q header
      when sending a frame to the CPU with a non-zero VBID.
      
      Similar to dsa_find_designated_bridge_port_by_vid() which performs
      imprecise RX for VLAN-aware bridges, let's introduce a helper in
      tag_8021q for performing imprecise RX based on the VLAN that it has
      allocated for a VLAN-unaware bridge.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d7f9787a
    • V
      net: dsa: tag_8021q: replace the SVL bridging with VLAN-unaware IVL bridging · 91495f21
      Vladimir Oltean 提交于
      For VLAN-unaware bridging, tag_8021q uses something perhaps a bit too
      tied with the sja1105 switch: each port uses the same pvid which is also
      used for standalone operation (a unique one from which the source port
      and device ID can be retrieved when packets from that port are forwarded
      to the CPU). Since each port has a unique pvid when performing
      autonomous forwarding, the switch must be configured for Shared VLAN
      Learning (SVL) such that the VLAN ID itself is ignored when performing
      FDB lookups. Without SVL, packets would always be flooded, since FDB
      lookup in the source port's VLAN would never find any entry.
      
      First of all, to make tag_8021q more palatable to switches which might
      not support Shared VLAN Learning, let's just use a common VLAN for all
      ports that are under the same bridge.
      
      Secondly, using Shared VLAN Learning means that FDB isolation can never
      be enforced. But if all ports under the same VLAN-unaware bridge share
      the same VLAN ID, it can.
      
      The disadvantage is that the CPU port can no longer perform precise
      source port identification for these packets. But at least we have a
      mechanism which has proven to be adequate for that situation: imprecise
      RX (dsa_find_designated_bridge_port_by_vid), which is what we use for
      termination on VLAN-aware bridges.
      
      The VLAN ID that VLAN-unaware bridges will use with tag_8021q is the
      same one as we were previously using for imprecise TX (bridge TX
      forwarding offload). It is already allocated, it is just a matter of
      using it.
      
      Note that because now all ports under the same bridge share the same
      VLAN, the complexity of performing a tag_8021q bridge join decreases
      dramatically. We no longer have to install the RX VLAN of a newly
      joining port into the port membership of the existing bridge ports.
      The newly joining port just becomes a member of the VLAN corresponding
      to that bridge, and the other ports are already members of it from when
      they joined the bridge themselves. So forwarding works properly.
      
      This means that we can unhook dsa_tag_8021q_bridge_{join,leave} from the
      cross-chip notifier level dsa_switch_bridge_{join,leave}. We can put
      these calls directly into the sja1105 driver.
      
      With this new mode of operation, a port controlled by tag_8021q can have
      two pvids whereas before it could only have one. The pvid for standalone
      operation is different from the pvid used for VLAN-unaware bridging.
      This is done, again, so that FDB isolation can be enforced.
      Let tag_8021q manage this by deleting the standalone pvid when a port
      joins a bridge, and restoring it when it leaves it.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      91495f21
    • D
      PCI: Add Fungible Vendor ID to pci_ids.h · e8eb9e32
      Dimitris Michailidis 提交于
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: linux-pci@vger.kernel.org
      Signed-off-by: NDimitris Michailidis <dmichail@fungible.com>
      Acked-by: NBjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e8eb9e32
  4. 26 2月, 2022 2 次提交
  5. 25 2月, 2022 9 次提交
    • T
      net: openvswitch: IPv6: Add IPv6 extension header support · 28a3f060
      Toms Atteka 提交于
      This change adds a new OpenFlow field OFPXMT_OFB_IPV6_EXTHDR and
      packets can be filtered using ipv6_ext flag.
      Signed-off-by: NToms Atteka <cpp.code.lv@gmail.com>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      28a3f060
    • D
      net/tcp: Merge TCP-MD5 inbound callbacks · 7bbb765b
      Dmitry Safonov 提交于
      The functions do essentially the same work to verify TCP-MD5 sign.
      Code can be merged into one family-independent function in order to
      reduce copy'n'paste and generated code.
      Later with TCP-AO option added, this will allow to create one function
      that's responsible for segment verification, that will have all the
      different checks for MD5/AO/non-signed packets, which in turn will help
      to see checks for all corner-cases in one function, rather than spread
      around different families and functions.
      
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Signed-off-by: NDmitry Safonov <dima@arista.com>
      Reviewed-by: NDavid Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20220223175740.452397-1-dima@arista.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      7bbb765b
    • V
      net: dsa: felix: support FDB entries on offloaded LAG interfaces · 961d8b69
      Vladimir Oltean 提交于
      This adds the logic in the Felix DSA driver and Ocelot switch library.
      For Ocelot switches, the DEST_IDX that is the output of the MAC table
      lookup is a logical port (equal to physical port, if no LAG is used, or
      a dynamically allocated number otherwise). The allocation we have in
      place for LAG IDs is different from DSA's, so we can't use that:
      - DSA allocates a continuous range of LAG IDs starting from 1
      - Ocelot appears to require that physical ports and LAG IDs are in the
        same space of [0, num_phys_ports), and additionally, ports that aren't
        in a LAG must have physical port id == logical port id
      
      The implication is that an FDB entry towards a LAG might need to be
      deleted and reinstalled when the LAG ID changes.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      961d8b69
    • V
      net: dsa: support FDB events on offloaded LAG interfaces · e212fa7c
      Vladimir Oltean 提交于
      This change introduces support for installing static FDB entries towards
      a bridge port that is a LAG of multiple DSA switch ports, as well as
      support for filtering towards the CPU local FDB entries emitted for LAG
      interfaces that are bridge ports.
      
      Conceptually, host addresses on LAG ports are identical to what we do
      for plain bridge ports. Whereas FDB entries _towards_ a LAG can't simply
      be replicated towards all member ports like we do for multicast, or VLAN.
      Instead we need new driver API. Hardware usually considers a LAG to be a
      "logical port", and sets the entire LAG as the forwarding destination.
      The physical egress port selection within the LAG is made by hashing
      policy, as usual.
      
      To represent the logical port corresponding to the LAG, we pass by value
      a copy of the dsa_lag structure to all switches in the tree that have at
      least one port in that LAG.
      
      To illustrate why a refcounted list of FDB entries is needed in struct
      dsa_lag, it is enough to say that:
      - a LAG may be a bridge port and may therefore receive FDB events even
        while it isn't yet offloaded by any DSA interface
      - DSA interfaces may be removed from a LAG while that is a bridge port;
        we don't want FDB entries lingering around, but we don't want to
        remove entries that are still in use, either
      
      For all the cases below to work, the idea is to always keep an FDB entry
      on a LAG with a reference count equal to the DSA member ports. So:
      - if a port joins a LAG, it requests the bridge to replay the FDB, and
        the FDB entries get created, or their refcount gets bumped by one
      - if a port leaves a LAG, the FDB replay deletes or decrements refcount
        by one
      - if an FDB is installed towards a LAG with ports already present, that
        entry is created (if it doesn't exist) and its refcount is bumped by
        the amount of ports already present in the LAG
      
      echo "Adding FDB entry to bond with existing ports"
      ip link del bond0
      ip link add bond0 type bond mode 802.3ad
      ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up
      ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up
      ip link del br0
      ip link add br0 type bridge
      ip link set bond0 master br0
      bridge fdb add dev bond0 00:01:02:03:04:05 master static
      
      ip link del br0
      ip link del bond0
      
      echo "Adding FDB entry to empty bond"
      ip link del bond0
      ip link add bond0 type bond mode 802.3ad
      ip link del br0
      ip link add br0 type bridge
      ip link set bond0 master br0
      bridge fdb add dev bond0 00:01:02:03:04:05 master static
      ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up
      ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up
      
      ip link del br0
      ip link del bond0
      
      echo "Adding FDB entry to empty bond, then removing ports one by one"
      ip link del bond0
      ip link add bond0 type bond mode 802.3ad
      ip link del br0
      ip link add br0 type bridge
      ip link set bond0 master br0
      bridge fdb add dev bond0 00:01:02:03:04:05 master static
      ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up
      ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up
      
      ip link set swp1 nomaster
      ip link set swp2 nomaster
      ip link del br0
      ip link del bond0
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      e212fa7c
    • V
      net: switchdev: remove lag_mod_cb from switchdev_handle_fdb_event_to_device · ec638740
      Vladimir Oltean 提交于
      When the switchdev_handle_fdb_event_to_device() event replication helper
      was created, my original thought was that FDB events on LAG interfaces
      should most likely be special-cased, not just replicated towards all
      switchdev ports beneath that LAG. So this replication helper currently
      does not recurse through switchdev lower interfaces of LAG bridge ports,
      but rather calls the lag_mod_cb() if that was provided.
      
      No switchdev driver uses this helper for FDB events on LAG interfaces
      yet, so that was an assumption which was yet to be tested. It is
      certainly usable for that purpose, as my RFC series shows:
      
      https://patchwork.kernel.org/project/netdevbpf/cover/20220210125201.2859463-1-vladimir.oltean@nxp.com/
      
      however this approach is slightly convoluted because:
      
      - the switchdev driver gets a "dev" that isn't its own net device, but
        rather the LAG net device. It must call switchdev_lower_dev_find(dev)
        in order to get a handle of any of its own net devices (the ones that
        pass check_cb).
      
      - in order for FDB entries on LAG ports to be correctly refcounted per
        the number of switchdev ports beneath that LAG, we haven't escaped the
        need to iterate through the LAG's lower interfaces. Except that is now
        the responsibility of the switchdev driver, because the replication
        helper just stopped half-way.
      
      So, even though yes, FDB events on LAG bridge ports must be
      special-cased, in the end it's simpler to let switchdev_handle_fdb_*
      just iterate through the LAG port's switchdev lowers, and let the
      switchdev driver figure out that those physical ports are under a LAG.
      
      The switchdev_handle_fdb_event_to_device() helper takes a
      "foreign_dev_check" callback so it can figure out whether @dev can
      autonomously forward to @foreign_dev. DSA fills this method properly:
      if the LAG is offloaded by another port in the same tree as @dev, then
      it isn't foreign. If it is a software LAG, it is foreign - forwarding
      happens in software.
      
      Whether an interface is foreign or not decides whether the replication
      helper will go through the LAG's switchdev lowers or not. Since the
      lan966x doesn't properly fill this out, FDB events on software LAG
      uppers will get called. By changing lan966x_foreign_dev_check(), we can
      suppress them.
      
      Whereas DSA will now start receiving FDB events for its offloaded LAG
      uppers, so we need to return -EOPNOTSUPP, since we currently don't do
      the right thing for them.
      
      Cc: Horatiu Vultur <horatiu.vultur@microchip.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      ec638740
    • V
      net: dsa: create a dsa_lag structure · dedd6a00
      Vladimir Oltean 提交于
      The main purpose of this change is to create a data structure for a LAG
      as seen by DSA. This is similar to what we have for bridging - we pass a
      copy of this structure by value to ->port_lag_join and ->port_lag_leave.
      For now we keep the lag_dev, id and a reference count in it. Future
      patches will add a list of FDB entries for the LAG (these also need to
      be refcounted to work properly).
      
      The LAG structure is created using dsa_port_lag_create() and destroyed
      using dsa_port_lag_destroy(), just like we have for bridging.
      
      Because now, the dsa_lag itself is refcounted, we can simplify
      dsa_lag_map() and dsa_lag_unmap(). These functions need to keep a LAG in
      the dst->lags array only as long as at least one port uses it. The
      refcounting logic inside those functions can be removed now - they are
      called only when we should perform the operation.
      
      dsa_lag_dev() is renamed to dsa_lag_by_id() and now returns the dsa_lag
      structure instead of the lag_dev net_device.
      
      dsa_lag_foreach_port() now takes the dsa_lag structure as argument.
      
      dst->lags holds an array of dsa_lag structures.
      
      dsa_lag_map() now also saves the dsa_lag->id value, so that linear
      walking of dst->lags in drivers using dsa_lag_id() is no longer
      necessary. They can just look at lag.id.
      
      dsa_port_lag_id_get() is a helper, similar to dsa_port_bridge_num_get(),
      which can be used by drivers to get the LAG ID assigned by DSA to a
      given port.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      dedd6a00
    • V
      net: dsa: make LAG IDs one-based · 3d4a0a2a
      Vladimir Oltean 提交于
      The DSA LAG API will be changed to become more similar with the bridge
      data structures, where struct dsa_bridge holds an unsigned int num,
      which is generated by DSA and is one-based. We have a similar thing
      going with the DSA LAG, except that isn't stored anywhere, it is
      calculated dynamically by dsa_lag_id() by iterating through dst->lags.
      
      The idea of encoding an invalid (or not requested) LAG ID as zero for
      the purpose of simplifying checks in drivers means that the LAG IDs
      passed by DSA to drivers need to be one-based too. So back-and-forth
      conversion is needed when indexing the dst->lags array, as well as in
      drivers which assume a zero-based index.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      3d4a0a2a
    • V
      net: dsa: rename references to "lag" as "lag_dev" · 46a76724
      Vladimir Oltean 提交于
      In preparation of converting struct net_device *dp->lag_dev into a
      struct dsa_lag *dp->lag, we need to rename, for consistency purposes,
      all occurrences of the "lag" variable in the DSA core to "lag_dev".
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      46a76724
    • P
      openvswitch: Fix setting ipv6 fields causing hw csum failure · d9b5ae5c
      Paul Blakey 提交于
      Ipv6 ttl, label and tos fields are modified without first
      pulling/pushing the ipv6 header, which would have updated
      the hw csum (if available). This might cause csum validation
      when sending the packet to the stack, as can be seen in
      the trace below.
      
      Fix this by updating skb->csum if available.
      
      Trace resulted by ipv6 ttl dec and then sending packet
      to conntrack [actions: set(ipv6(hlimit=63)),ct(zone=99)]:
      [295241.900063] s_pf0vf2: hw csum failure
      [295241.923191] Call Trace:
      [295241.925728]  <IRQ>
      [295241.927836]  dump_stack+0x5c/0x80
      [295241.931240]  __skb_checksum_complete+0xac/0xc0
      [295241.935778]  nf_conntrack_tcp_packet+0x398/0xba0 [nf_conntrack]
      [295241.953030]  nf_conntrack_in+0x498/0x5e0 [nf_conntrack]
      [295241.958344]  __ovs_ct_lookup+0xac/0x860 [openvswitch]
      [295241.968532]  ovs_ct_execute+0x4a7/0x7c0 [openvswitch]
      [295241.979167]  do_execute_actions+0x54a/0xaa0 [openvswitch]
      [295242.001482]  ovs_execute_actions+0x48/0x100 [openvswitch]
      [295242.006966]  ovs_dp_process_packet+0x96/0x1d0 [openvswitch]
      [295242.012626]  ovs_vport_receive+0x6c/0xc0 [openvswitch]
      [295242.028763]  netdev_frame_hook+0xc0/0x180 [openvswitch]
      [295242.034074]  __netif_receive_skb_core+0x2ca/0xcb0
      [295242.047498]  netif_receive_skb_internal+0x3e/0xc0
      [295242.052291]  napi_gro_receive+0xba/0xe0
      [295242.056231]  mlx5e_handle_rx_cqe_mpwrq_rep+0x12b/0x250 [mlx5_core]
      [295242.062513]  mlx5e_poll_rx_cq+0xa0f/0xa30 [mlx5_core]
      [295242.067669]  mlx5e_napi_poll+0xe1/0x6b0 [mlx5_core]
      [295242.077958]  net_rx_action+0x149/0x3b0
      [295242.086762]  __do_softirq+0xd7/0x2d6
      [295242.090427]  irq_exit+0xf7/0x100
      [295242.093748]  do_IRQ+0x7f/0xd0
      [295242.096806]  common_interrupt+0xf/0xf
      [295242.100559]  </IRQ>
      [295242.102750] RIP: 0033:0x7f9022e88cbd
      [295242.125246] RSP: 002b:00007f9022282b20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffda
      [295242.132900] RAX: 0000000000000005 RBX: 0000000000000010 RCX: 0000000000000000
      [295242.140120] RDX: 00007f9022282ba8 RSI: 00007f9022282a30 RDI: 00007f9014005c30
      [295242.147337] RBP: 00007f9014014d60 R08: 0000000000000020 R09: 00007f90254a8340
      [295242.154557] R10: 00007f9022282a28 R11: 0000000000000246 R12: 0000000000000000
      [295242.161775] R13: 00007f902308c000 R14: 000000000000002b R15: 00007f9022b71f40
      
      Fixes: 3fdbd1ce ("openvswitch: add ipv6 'set' action")
      Signed-off-by: NPaul Blakey <paulb@nvidia.com>
      Link: https://lore.kernel.org/r/20220223163416.24096-1-paulb@nvidia.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      d9b5ae5c
  6. 24 2月, 2022 10 次提交
  7. 23 2月, 2022 5 次提交
    • V
      nvme-tcp: send H2CData PDUs based on MAXH2CDATA · c2700d28
      Varun Prakash 提交于
      As per NVMe/TCP specification (revision 1.0a, section 3.6.2.3)
      Maximum Host to Controller Data length (MAXH2CDATA): Specifies the
      maximum number of PDU-Data bytes per H2CData PDU in bytes. This value
      is a multiple of dwords and should be no less than 4,096.
      
      Current code sets H2CData PDU data_length to r2t_length,
      it does not check MAXH2CDATA value. Fix this by setting H2CData PDU
      data_length to min(req->h2cdata_left, queue->maxh2cdata).
      
      Also validate MAXH2CDATA value returned by target in ICResp PDU,
      if it is not a multiple of dword or if it is less than 4096 return
      -EINVAL from nvme_tcp_init_connection().
      Signed-off-by: NVarun Prakash <varun@chelsio.com>
      Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      c2700d28
    • H
      net: bridge: Add support for bridge port in locked mode · a21d9a67
      Hans Schultz 提交于
      In a 802.1X scenario, clients connected to a bridge port shall not
      be allowed to have traffic forwarded until fully authenticated.
      A static fdb entry of the clients MAC address for the bridge port
      unlocks the client and allows bidirectional communication.
      
      This scenario is facilitated with setting the bridge port in locked
      mode, which is also supported by various switchcore chipsets.
      Signed-off-by: NHans Schultz <schultz.hans+netdev@gmail.com>
      Acked-by: NNikolay Aleksandrov <nikolay@nvidia.com>
      Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a21d9a67
    • E
      drop_monitor: remove quadratic behavior · b26ef81c
      Eric Dumazet 提交于
      drop_monitor is using an unique list on which all netdevices in
      the host have an element, regardless of their netns.
      
      This scales poorly, not only at device unregister time (what I
      caught during my netns dismantle stress tests), but also at packet
      processing time whenever trace_napi_poll_hit() is called.
      
      If the intent was to avoid adding one pointer in 'struct net_device'
      then surely we prefer O(1) behavior.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b26ef81c
    • E
      net: preserve skb_end_offset() in skb_unclone_keeptruesize() · 2b88cba5
      Eric Dumazet 提交于
      syzbot found another way to trigger the infamous WARN_ON_ONCE(delta < len)
      in skb_try_coalesce() [1]
      
      I was able to root cause the issue to kfence.
      
      When kfence is in action, the following assertion is no longer true:
      
      int size = xxxx;
      void *ptr1 = kmalloc(size, gfp);
      void *ptr2 = kmalloc(size, gfp);
      
      if (ptr1 && ptr2)
      	ASSERT(ksize(ptr1) == ksize(ptr2));
      
      We attempted to fix these issues in the blamed commits, but forgot
      that TCP was possibly shifting data after skb_unclone_keeptruesize()
      has been used, notably from tcp_retrans_try_collapse().
      
      So we not only need to keep same skb->truesize value,
      we also need to make sure TCP wont fill new tailroom
      that pskb_expand_head() was able to get from a
      addr = kmalloc(...) followed by ksize(addr)
      
      Split skb_unclone_keeptruesize() into two parts:
      
      1) Inline skb_unclone_keeptruesize() for the common case,
         when skb is not cloned.
      
      2) Out of line __skb_unclone_keeptruesize() for the 'slow path'.
      
      WARNING: CPU: 1 PID: 6490 at net/core/skbuff.c:5295 skb_try_coalesce+0x1235/0x1560 net/core/skbuff.c:5295
      Modules linked in:
      CPU: 1 PID: 6490 Comm: syz-executor161 Not tainted 5.17.0-rc4-syzkaller-00229-g4f12b742 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:skb_try_coalesce+0x1235/0x1560 net/core/skbuff.c:5295
      Code: bf 01 00 00 00 0f b7 c0 89 c6 89 44 24 20 e8 62 24 4e fa 8b 44 24 20 83 e8 01 0f 85 e5 f0 ff ff e9 87 f4 ff ff e8 cb 20 4e fa <0f> 0b e9 06 f9 ff ff e8 af b2 95 fa e9 69 f0 ff ff e8 95 b2 95 fa
      RSP: 0018:ffffc900063af268 EFLAGS: 00010293
      RAX: 0000000000000000 RBX: 00000000ffffffd5 RCX: 0000000000000000
      RDX: ffff88806fc05700 RSI: ffffffff872abd55 RDI: 0000000000000003
      RBP: ffff88806e675500 R08: 00000000ffffffd5 R09: 0000000000000000
      R10: ffffffff872ab659 R11: 0000000000000000 R12: ffff88806dd554e8
      R13: ffff88806dd9bac0 R14: ffff88806dd9a2c0 R15: 0000000000000155
      FS:  00007f18014f9700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020002000 CR3: 000000006be7a000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       tcp_try_coalesce net/ipv4/tcp_input.c:4651 [inline]
       tcp_try_coalesce+0x393/0x920 net/ipv4/tcp_input.c:4630
       tcp_queue_rcv+0x8a/0x6e0 net/ipv4/tcp_input.c:4914
       tcp_data_queue+0x11fd/0x4bb0 net/ipv4/tcp_input.c:5025
       tcp_rcv_established+0x81e/0x1ff0 net/ipv4/tcp_input.c:5947
       tcp_v4_do_rcv+0x65e/0x980 net/ipv4/tcp_ipv4.c:1719
       sk_backlog_rcv include/net/sock.h:1037 [inline]
       __release_sock+0x134/0x3b0 net/core/sock.c:2779
       release_sock+0x54/0x1b0 net/core/sock.c:3311
       sk_wait_data+0x177/0x450 net/core/sock.c:2821
       tcp_recvmsg_locked+0xe28/0x1fd0 net/ipv4/tcp.c:2457
       tcp_recvmsg+0x137/0x610 net/ipv4/tcp.c:2572
       inet_recvmsg+0x11b/0x5e0 net/ipv4/af_inet.c:850
       sock_recvmsg_nosec net/socket.c:948 [inline]
       sock_recvmsg net/socket.c:966 [inline]
       sock_recvmsg net/socket.c:962 [inline]
       ____sys_recvmsg+0x2c4/0x600 net/socket.c:2632
       ___sys_recvmsg+0x127/0x200 net/socket.c:2674
       __sys_recvmsg+0xe2/0x1a0 net/socket.c:2704
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Fixes: c4777efa ("net: add and use skb_unclone_keeptruesize() helper")
      Fixes: 097b9146 ("net: fix up truesize of cloned skb in skb_prepare_for_shift()")
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Marco Elver <elver@google.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      2b88cba5
    • E
      net: add skb_set_end_offset() helper · 763087da
      Eric Dumazet 提交于
      We have multiple places where this helper is convenient,
      and plan using it in the following patch.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      763087da
  8. 21 2月, 2022 2 次提交