1. 26 Jul, 2021 3 commits
  2. 25 Jul, 2021 15 commits
  3. 24 Jul, 2021 9 commits
  4. 23 Jul, 2021 13 commits
    • D
      Merge branch 'bridge-tx-fwd' · 356ae88f
      David S. Miller committed
      Vladimir Oltean says:
      
      ====================
      Allow TX forwarding for the software bridge data path to be offloaded to capable devices
      
      On RX, switchdev drivers have the ability to mark packets for the
      software bridge as "already forwarded in hardware" via
      skb->offload_fwd_mark. This instructs the nbp_switchdev_allowed_egress()
      function to perform software forwarding of that packet only to the bridge
      ports that are not in the same hardware domain as the source packet.
      
      This series expands the concept for TX, in the sense that we can trust
      the accelerator to:
      (a) look up its FDB (which is more or less in sync with the software
          bridge FDB) for selecting the destination ports for a packet
      (b) replicate the frame in hardware in case it's a multicast/broadcast,
          instead of the software bridge having to clone it and send the
          clones to each net device one at a time. This reduces the bandwidth
          needed between the CPU and the accelerator, as well as the CPU time
          spent.
      
      This is done by augmenting nbp_switchdev_allowed_egress() to also
      exclude the bridge ports which have the tx_fwd_offload capability if the
      skb has already been transmitted to one port from their hardware domain.
      
      In reality, the software bridge still looks up the FDB/MDB for every
      frame and only the skb clones are suppressed. Even so, this offload
      specifically requires that the switchdev accelerator look up its
      FDB/MDB again. It is intended to be used to inject "data plane
      packets" into the hardware as opposed to "control plane packets" which
      target a precise destination port.
      
      Towards that goal, the bridge always provides the TX packets with
      skb->offload_fwd_mark = true with the VLAN tag always present, so that
      the accelerator can forward according to that VLAN broadcast domain.
      
      This work is not intended to cater to switches which can inject control
      plane packets to a bit mask of destination ports. I see that as a more
      difficult task to accomplish with potentially fewer benefits (it provides
      only replication offload). The reason it is more difficult is that
      struct sk_buff would probably need to be extended to contain a list of
      struct net_devices that the packet must be replicated to. Sending data
      plane packets avoids that issue by keeping the hardware and software FDB
      more or less in sync and looking it up twice.
      
      Additionally, the ability for the software bridge to request data plane
      packets to be sent brings the opportunity for "dumb switches" to support
      traffic termination to/from the bridge. Such switches (DSA or otherwise)
      typically only use control packets for link-local traps, and sending or
      receiving a control packet is an expensive operation.
      
      For this class of switches, this patch series makes the difference
      between supporting and not supporting local IP termination through a
      VLAN-aware bridge, bridging with a foreign interface, bridging with
      software upper interfaces like LAG, etc. So instead of telling them
      "oh, what a dumb switch you are!", we can now tell them "oh, what a
      stark contrast you have between the control and data plane!".
      
      Patches 1-3 tested on Turris MOX (3 mv88e6xxx switches in a daisy chain
      topology) and a second DSA driver to be added soon. Patches 4-5 tested
      only on Turris MOX.
      
      ===========================================================
      
      Changes in v5:
      - make sure the static key is decremented on bridge port unoffload
      - rename functions and variables so that the "tx_fwd_offload" string is
        easy to grep across the git tree
      - simplify DSA core bookkeeping of the bridge_num
      
      ===========================================================
      
      Changes in v4:
      
      The biggest change compared to the previous series is not present in the
      patches, but is rather a lack of them. Previously we were replaying
      switchdev objects on the public notifier chain, but that was a mistake
      in my reasoning and it was reverted for v4. Therefore, we are now
      passing the notifier blocks as arguments to switchdev_bridge_port_offload()
      for all drivers. This alone gets rid of 7 patches compared to v3.
      
      Other changes are:
      - Take more care for the case where mlxsw leaves a VLAN or LAG upper
        that is a bridge port, make sure that switchdev_bridge_port_unoffload()
        gets called for that case
      - A couple of DSA bug fixes
      - Add change logs for all patches
      - Copy all switchdev driver maintainers on the changes relevant to them
      
      ===========================================================
      
      Message for v3:
      https://patchwork.kernel.org/project/netdevbpf/cover/20210712152142.800651-1-vladimir.oltean@nxp.com/
      
      In this submission I have introduced a "native switchdev" driver API to
      signal whether the TX forwarding offload is supported or not. This comes
      after a third person has said that the macvlan offload framework used
      for v2 and v1 was simply too convoluted.
      
      This large patch set is submitted for discussion purposes (it is
      provided in its entirety so it can be applied & tested on net-next).
      It is only minimally tested, and yet I will not copy all switchdev
      driver maintainers until we agree on the viability of this approach.
      
      The major changes compared to v2:
      - The introduction of switchdev_bridge_port_offload() and
        switchdev_bridge_port_unoffload() as two major API changes from the
        perspective of a switchdev driver. All drivers were converted to call
        these.
      - Augment switchdev_bridge_port_{,un}offload to also handle the
        switchdev object replays on port join/leave.
      - Augment switchdev_bridge_port_offload to also signal whether the TX
        forwarding offload is supported.
      
      ===========================================================
      
      Message for v2:
      https://patchwork.kernel.org/project/netdevbpf/cover/20210703115705.1034112-1-vladimir.oltean@nxp.com/
      
      For this series I have taken Tobias' work from here:
      https://patchwork.kernel.org/project/netdevbpf/cover/20210426170411.1789186-1-tobias@waldekranz.com/
      and made the following changes:
      - I collected and integrated (hopefully all of) Nikolay's, Ido's and my
        feedback on the bridge driver changes. Otherwise, the structure of the
        bridge changes is pretty much the same as Tobias left it.
      - I basically rewrote the DSA infrastructure for the data plane
        forwarding offload, based on the commonalities with another switch
        driver for which I implemented this feature (not submitted here)
      - I adapted mv88e6xxx to use the new infrastructure; hopefully it still
        works, but I didn't test that
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
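The egress rule described in the 'bridge-tx-fwd' cover letter can be modelled in a few lines. This is only a simplified Python sketch, not the kernel code: the `Port` and `Skb` classes, the `flood()` helper, and the convention that hardware domain 0 means "plain software port" are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Port:
    name: str
    hwdom: int                    # assumed: 0 means "no hardware domain"
    tx_fwd_offload: bool = False  # port can replicate TX frames in hardware


@dataclass
class Skb:
    src_hwdom: int = 0              # domain the frame was received from (0 = local)
    offload_fwd_mark: bool = False  # RX: already forwarded in hardware
    fwd_hwdoms: set = field(default_factory=set)  # domains already given a copy


def allowed_egress(port: Port, skb: Skb) -> bool:
    """Model of the check that decides whether the bridge sends skb to port."""
    # TX side: a port from this hardware domain already got the frame.
    if port.hwdom in skb.fwd_hwdoms:
        return False
    # RX side: hardware already forwarded within the source domain.
    if skb.offload_fwd_mark and skb.src_hwdom == port.hwdom:
        return False
    return True


def flood(ports, skb: Skb):
    """Return the names of the ports that receive a software copy."""
    sent = []
    for port in ports:
        if not allowed_egress(port, skb):
            continue
        sent.append(port.name)
        if port.tx_fwd_offload:
            # The accelerator replicates to the rest of its domain.
            skb.fwd_hwdoms.add(port.hwdom)
    return sent


ports = [
    Port("swp0", 1, True),
    Port("swp1", 1, True),
    Port("swp2", 1, True),
    Port("eth0", 0),
]
local_flood = flood(ports, Skb())
```

With three offloading ports in one hardware domain and one foreign interface, a locally originated broadcast is handed to a single switch port plus the foreign interface, instead of being cloned four times.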
    • T
      net: dsa: tag_dsa: offload the bridge forwarding process · d82f8ab0
      Tobias Waldekranz committed
      Allow the DSA tagger to generate FORWARD frames for offloaded skbs
      sent from a bridge that we offload, allowing the switch to handle any
      frame replication that may be required. This also means that source
      address learning takes place on packets sent from the CPU, meaning
      that return traffic no longer needs to be flooded as unknown unicast.
      Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
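The tagger decision described in the tag_dsa commit above can be sketched as follows. This is a hypothetical Python model, not the tag_dsa.c code: the `DsaTag` class, the `build_tx_tag()` function, and the choice of source port 0 for FORWARD frames are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class DsaTag:
    cmd: str     # "FORWARD" (data plane) or "FROM_CPU" (control plane)
    device: int  # switch ID carried in the tag
    port: int    # port number carried in the tag


def build_tx_tag(offload_fwd_mark: bool, switch_index: int,
                 port_index: int, bridge_device: int) -> DsaTag:
    """Pick the tag type for a frame transmitted by the CPU."""
    if offload_fwd_mark:
        # Data plane: the switch looks up its FDB and replicates as needed.
        # The source is the (virtual) device standing in for the bridge;
        # port 0 is an assumed placeholder.
        return DsaTag("FORWARD", bridge_device, 0)
    # Control plane: the frame targets exactly one destination port.
    return DsaTag("FROM_CPU", switch_index, port_index)
```

Because a FORWARD frame carries a source device and port, the switch can learn the source MAC address from it, which is what removes the need to flood return traffic.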
    • V
      net: dsa: mv88e6xxx: map virtual bridges with forwarding offload in the PVT · ce5df689
      Vladimir Oltean committed
      The mv88e6xxx switches have the ability to receive FORWARD (data plane)
      frames from the CPU port and route them according to the FDB. We can use
      this to offload the forwarding process of packets sent by the software
      bridge.
      
      Because DSA supports bridge domain isolation between user ports, just
      sending FORWARD frames is not enough, as they might leak outside the
      intended broadcast domain of the bridge on behalf of which the packets
      are sent.
      
      It should be noted that FORWARD frames are also (and typically) used to
      forward data plane packets on DSA links in cross-chip topologies. The
      FORWARD frame header contains the source port and switch ID, and
      switches receiving this frame header forward the packet according to
      their cross-chip port-based VLAN table (PVT).
      
      To address the bridging domain isolation in the context of offloading
      the forwarding on TX, the idea is that we can reuse the parts of the PVT
      that don't have any physical switch mapped to them, one entry for each
      software bridge. The switches will therefore think that behind their
      upstream port lie many switches, all in fact backed up by software
      bridges through tag_dsa.c, which constructs FORWARD packets with the
      right switch ID corresponding to each bridge.
      
      The mapping we use is absolutely trivial: DSA gives us a unique bridge
      number, and we add the number of the physical switches in the DSA switch
      tree to that, to obtain a unique virtual bridge device number to use in
      the PVT.
      Co-developed-by: Tobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
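The mapping described in the mv88e6xxx commit above is plain addition. A minimal Python sketch, assuming physical switches are numbered contiguously from 0 (the function name is hypothetical):

```python
def virtual_bridge_device(num_physical_switches: int, bridge_num: int) -> int:
    """Return the PVT device number that stands in for a software bridge."""
    if bridge_num < 0:
        raise ValueError("port is not offloading the TX data plane of a bridge")
    # Virtual devices are counted from the first unused physical index.
    return num_physical_switches + bridge_num


# Example: a tree with 3 physical switches (indices 0..2) and two bridges.
first_bridge = virtual_bridge_device(3, 0)
second_bridge = virtual_bridge_device(3, 1)
```

Under the stated assumption, the virtual device numbers never collide with the indices of the physical switches.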
    • V
      net: dsa: add support for bridge TX forwarding offload · 123abc06
      Vladimir Oltean committed
      For a DSA switch, to offload the forwarding process of a bridge device
      means to send the packets coming from the software bridge as data plane
      packets. This is contrary to everything that DSA has done so far,
      because the current taggers only know to send control packets (ones that
      target a specific destination port), whereas data plane packets are
      supposed to be forwarded according to the FDB lookup, much like packets
      ingressing on any regular ingress port. If the FDB lookup process
      returns multiple destination ports (flooding, multicast), then
      replication is also handled by the switch hardware - the bridge only
      sends a single packet and avoids the skb_clone().
      
      DSA keeps for each bridge port a zero-based index (the number of the
      bridge). Multiple ports performing TX forwarding offload to the same
      bridge have the same dp->bridge_num value, and ports not offloading the
      TX data plane of a bridge have dp->bridge_num = -1.
      
      The tagger can check whether the packet being transmitted has
      skb->offload_fwd_mark = true. If it does, it can be sure that the
      packet belongs to the data plane of a bridge, further information about
      which can be obtained based on dp->bridge_dev and dp->bridge_num.
      It can then compose a DSA tag for injecting a data plane packet into
      that bridge number.
      
      For the switch driver side, we offer two new dsa_switch_ops methods,
      called .port_bridge_fwd_offload_{add,del}, which are modeled after
      .port_bridge_{join,leave}.
      These methods are provided in case the driver needs to configure the
      hardware to treat packets coming from that bridge software interface as
      data plane packets. The switchdev <-> bridge interaction happens during
      the netdev_master_upper_dev_link() call, so to switch drivers, the
      effect is that the .port_bridge_fwd_offload_add() method is called
      immediately after .port_bridge_join().
      
      If the bridge number exceeds the number of bridges for which the switch
      driver can offload the TX data plane (and this includes the case where
      the driver can offload none), DSA falls back to simply returning
      tx_fwd_offload = false in the switchdev_bridge_port_offload() call.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
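The bridge_num bookkeeping described in the DSA core commit above can be modelled like this. It is a simplified Python sketch under assumptions: the `BridgeNumAllocator` class, its method names, and the "lowest free index" policy are illustrative and not taken from the DSA core.

```python
class BridgeNumAllocator:
    """Hands out zero-based bridge numbers shared by ports of one bridge."""

    def __init__(self, max_offloaded_bridges: int):
        self.max = max_offloaded_bridges
        self.nums = {}   # bridge name -> bridge_num
        self.users = {}  # bridge name -> number of offloading ports

    def port_join(self, bridge: str):
        """Return (bridge_num, tx_fwd_offload) for a port joining bridge."""
        if bridge in self.nums:
            self.users[bridge] += 1
            return self.nums[bridge], True
        used = set(self.nums.values())
        candidate = 0
        while candidate in used:
            candidate += 1
        if candidate >= self.max:
            # Driver cannot offload this many bridges: fall back to software.
            return -1, False
        self.nums[bridge] = candidate
        self.users[bridge] = 1
        return candidate, True

    def port_leave(self, bridge: str) -> None:
        """Drop one user and free the number when the last port leaves."""
        if bridge not in self.users:
            return  # the port was never offloading this bridge
        self.users[bridge] -= 1
        if self.users[bridge] == 0:
            del self.users[bridge]
            del self.nums[bridge]
```

A driver that can offload no bridges at all corresponds to `max_offloaded_bridges = 0`, in which case every join returns `(-1, False)`.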
    • V
      net: dsa: track the number of switches in a tree · 5b22d366
      Vladimir Oltean committed
      In preparation for supporting data plane forwarding on behalf of a
      software bridge, some drivers might need to view bridges as virtual
      switches behind the CPU port in a cross-chip topology.
      
      Give them some help and let them know how many physical switches there
      are in the tree, so that they can count the virtual switches starting
      from that number on.
      
      Note that the first dsa_switch_ops method where this information is
      reliably available is .setup(). This is because of how DSA works:
      in a tree with 3 switches, each calling dsa_register_switch(), the first
      2 will advance until dsa_tree_setup() -> dsa_tree_setup_routing_table()
      and exit with error code 0 because the topology is not complete. Since
      probing is parallel at this point, one switch does not know about the
      existence of the other. Then the third switch comes, and for it,
      dsa_tree_setup_routing_table() returns complete = true. This switch goes
      ahead and calls dsa_tree_setup_switches() for everybody else, calling
      their .setup() methods too. This acts as the synchronization point.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
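The synchronization point described in the commit above can be illustrated with a toy model. This Python sketch is an assumption-laden simplification: the `Tree` class and its explicit `expected` count are hypothetical, whereas the commit message says completeness is determined by the routing table setup.

```python
class Tree:
    """Toy model of a switch tree whose members register one at a time."""

    def __init__(self, expected: int):
        self.expected = expected  # assumed stand-in for "topology complete"
        self.registered = []
        self.setup_log = []       # (switch name, switch count seen in setup)

    def register_switch(self, name: str) -> int:
        self.registered.append(name)
        if len(self.registered) < self.expected:
            # Topology incomplete: not an error, just nothing to set up yet.
            return 0
        # The last switch to arrive runs setup for every switch in the tree,
        # so the switch count is known to all of them at this point.
        for switch in self.registered:
            self.setup_log.append((switch, len(self.registered)))
        return 0


tree = Tree(expected=3)
tree.register_switch("sw0")
tree.register_switch("sw1")
log_before_last = list(tree.setup_log)
tree.register_switch("sw2")
```

In this model no switch sees a setup call until the third registration, after which all three observe the same switch count.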
    • T
      net: bridge: switchdev: allow the TX data plane forwarding to be offloaded · 47211192
      Tobias Waldekranz committed
      Allow switchdevs to forward frames from the CPU in accordance with the
      bridge configuration in the same way as is done between bridge
      ports. This means that the bridge will only send a single skb towards
      one of the ports under the switchdev's control, and expects the driver
      to deliver the packet to all eligible ports in its domain.
      
      Primarily this improves the performance of multicast flows with
      multiple subscribers, as it allows the hardware to perform the frame
      replication.
      
      The basic flow between the driver and the bridge is as follows:
      
      - When joining a bridge port, the switchdev driver calls
        switchdev_bridge_port_offload() with tx_fwd_offload = true.
      
      - The bridge sends offloadable skbs to one of the ports under the
        switchdev's control using skb->offload_fwd_mark = true.
      
      - The switchdev driver checks the skb->offload_fwd_mark field and lets
        its FDB lookup select the destination port mask for this packet.
      
      v1->v2:
      - convert br_input_skb_cb::fwd_hwdoms to a plain unsigned long
      - introduce a static key "br_switchdev_fwd_offload_used" to minimize the
        impact of the newly introduced feature on all the setups which don't
        have hardware that can make use of it
      - introduce a check for nbp->flags & BR_FWD_OFFLOAD to optimize cache
        line access
      - reorder nbp_switchdev_frame_mark_accel() and br_handle_vlan() in
        __br_forward()
      - do not strip VLAN on egress if forwarding offload on VLAN-aware bridge
        is being used
      - propagate errors from .ndo_dfwd_add_station() if not EOPNOTSUPP
      
      v2->v3:
      - replace the solution based on .ndo_dfwd_add_station with a solution
        based on switchdev_bridge_port_offload
      - rename BR_FWD_OFFLOAD to BR_TX_FWD_OFFLOAD
      v3->v4: rebase
      v4->v5:
      - make sure the static key is decremented on bridge port unoffload
      - more function and variable renaming and comments for them:
        br_switchdev_fwd_offload_used to br_switchdev_tx_fwd_offload
        br_switchdev_accels_skb to br_switchdev_frame_uses_tx_fwd_offload
        nbp_switchdev_frame_mark_tx_fwd to nbp_switchdev_frame_mark_tx_fwd_to_hwdom
        nbp_switchdev_frame_mark_accel to nbp_switchdev_frame_mark_tx_fwd_offload
        fwd_accel to tx_fwd_offload
      Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 5af84df9
      David S. Miller committed
      Conflicts are simple overlapping changes.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      Merge branch 'net-remove-compat-alloc-user-space' · 090597b4
      David S. Miller committed
      Arnd Bergmann says:
      
      ====================
      remove compat_alloc_user_space()
      
      This is the fifth version of my series, now spanning four patches
      instead of two, with a new approach for handling struct ifreq
      compatibility after I realized that my earlier approach introduces
      additional problems.
      
      The idea here is to always push down the compat conversion
      deeper into the call stack: rather than pretending to be
      native mode with a modified copy of the original data on
      the user space stack, have the code that actually works on
      the data understand the difference between native and compat
      versions.
      
      I have spent a long time looking at all drivers that implement
      an ndo_do_ioctl callback to verify that my assumptions are
      correct. This has led to a series of ~30 additional patches
      that I am not including here but will post separately, fixing
      a number of bugs in SIOCDEVPRIVATE ioctls, removing dead
      code, and splitting ndo_do_ioctl into multiple new ndo callbacks
      for private and ethernet specific commands.
      
            Arnd
      
      Link: https://lore.kernel.org/netdev/20201124151828.169152-1-arnd@kernel.org/
      
      Changes in v6:
       - Split out and expand linux/compat.h rework
       - Split ifconf change into two patches
       - Rebase on latest net-next/master
      
      Changes in v5:
       - Rebase to v5.14-rc2
       - Fix a few build issues
      
      Changes in v4:
       - build fix without CONFIG_INET
       - build fix without CONFIG_COMPAT
       - style fixes pointed out by hch
      
      Changes in v3:
       - complete rewrite of the series
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • A
      net: socket: rework compat_ifreq_ioctl() · 29c49648
      Arnd Bergmann committed
      compat_ifreq_ioctl() is one of the last users of copy_in_user() and
      compat_alloc_user_space(), as it attempts to convert the 'struct ifreq'
      arguments from 32-bit to 64-bit format as used by dev_ioctl() and a
      couple of socket family specific interpretations.
      
      The current implementation works correctly when calling dev_ioctl(),
      inet_ioctl(), ieee802154_sock_ioctl(), atalk_ioctl(), qrtr_ioctl()
      and packet_ioctl(). The ioctl handlers for ax25, netrom, rose and x25 do
      not interpret the arguments and only block the corresponding commands,
      so they do not care.
      
      For af_inet6 and af_decnet however, the compat conversion is slightly
      incorrect, as it will copy more data than the native handler accesses:
      both of them use a structure that is shorter than ifreq.
      
      Replace the copy_in_user() conversion with a pair of accessor functions
      to read and write the ifreq data in place with the correct length where
      needed, while leaving the other ones to copy the (already compatible)
      structures directly.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
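The "pair of accessor functions" idea from the compat_ifreq_ioctl() commit above can be sketched on raw byte buffers. This Python model is illustrative only: the function names are hypothetical, and the sizes assume a 64-bit kernel where the native structure is 40 bytes (16-byte name plus 24-byte union) and the 32-bit layout is 32 bytes.

```python
NATIVE_IFREQ_SIZE = 40  # assumed 64-bit layout: 16-byte name + 24-byte union
COMPAT_IFREQ_SIZE = 32  # assumed 32-bit layout: 16-byte name + 16-byte union


def read_ifreq(user_buf: bytes, compat: bool) -> bytearray:
    """Copy only as many bytes as the caller's ABI actually provides."""
    size = COMPAT_IFREQ_SIZE if compat else NATIVE_IFREQ_SIZE
    if len(user_buf) < size:
        raise ValueError("user buffer too short")
    kernel_ifr = bytearray(NATIVE_IFREQ_SIZE)  # zero-filled kernel copy
    kernel_ifr[:size] = user_buf[:size]
    return kernel_ifr


def write_ifreq(kernel_ifr: bytearray, compat: bool) -> bytes:
    """Write back no more than the caller's structure can hold."""
    size = COMPAT_IFREQ_SIZE if compat else NATIVE_IFREQ_SIZE
    return bytes(kernel_ifr[:size])
```

The point of the design is that the handler works on one kernel-side copy in place, and the length difference is handled only at the two boundaries rather than by building a converted structure on the user stack.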
    • A
      net: socket: simplify dev_ifconf handling · 876f0bf9
      Arnd Bergmann committed
      The dev_ifconf() calling conventions make compat handling
      more complicated than necessary. Simplify this by moving
      the in_compat_syscall() check into the function.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • A
      net: socket: remove register_gifconf · b0e99d03
      Arnd Bergmann committed
      Since dynamic registration of the gifconf() helper is only used for
      IPv4, and this cannot be in a loadable module, this can be simplified
      noticeably by turning it into a direct function call as a preparation
      for cleaning up the compat handling.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • A
      net: socket: rework SIOC?IFMAP ioctls · 709566d7
      Arnd Bergmann committed
      SIOCGIFMAP and SIOCSIFMAP currently require compat_alloc_user_space()
      and copy_in_user() for compat mode.
      
      Move the compat handling into the location where the structures are
      actually used, to avoid using those interfaces and get a clearer
      implementation.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • A
      ethtool: improve compat ioctl handling · dd98d289
      Arnd Bergmann committed
      The ethtool compat ioctl handling is hidden away in net/socket.c,
      which introduces a couple of minor oddities:
      
      - The implementation may end up diverging, as seen in the RXNFC
        extension in commit 84a1d9c4 ("net: ethtool: extend RXNFC
        API to support RSS spreading of filter matches") that does not work
        in compat mode.
      
      - Most architectures do not need the compat handling at all
        because u64 and compat_u64 have the same alignment.
      
      - On x86, the conversion is done for both x32 and i386 user space,
        but it's actually wrong to do it for x32 and cannot work there.
      
      - On 32-bit Arm, it never worked for compat oabi user space, since
        that needs to do the same conversion but does not.
      
      - It would be nice to get rid of both compat_alloc_user_space()
        and copy_in_user() throughout the kernel.
      
      None of these actually seems to be a serious problem that real
      users are likely to encounter, but fixing all of them actually
      leads to code that is both shorter and more readable.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
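The alignment point in the ethtool commit above (compat handling is only needed where u64 and compat_u64 are aligned differently) can be shown with a small layout calculation. This Python sketch uses a hypothetical two-field structure and assumes a 32-bit ABI that aligns 64-bit integers to 4 bytes, as i386 does; it does not describe any real ethtool structure.

```python
def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def layout(fields, u64_align: int):
    """Compute member offsets and total size for (name, size) fields.

    Members are naturally aligned, except 8-byte members, which use
    u64_align so that both ABIs can be compared.
    """
    offsets = {}
    offset = 0
    max_align = 1
    for name, size in fields:
        alignment = u64_align if size == 8 else size
        offset = align_up(offset, alignment)
        offsets[name] = offset
        offset += size
        max_align = max(max_align, alignment)
    return offsets, align_up(offset, max_align)


# Hypothetical structure: a 32-bit command followed by a 64-bit value.
fields = [("cmd", 4), ("value", 8)]
native_offsets, native_size = layout(fields, u64_align=8)
compat_offsets, compat_size = layout(fields, u64_align=4)
```

When both ABIs use 8-byte alignment for 64-bit members, the two layouts are identical and no conversion is required, which is the case the commit message describes for most architectures.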