1. 30 3月, 2021 1 次提交
    • I
      mlxsw: spectrum: Fix ECN marking in tunnel decapsulation · 66167c31
      Ido Schimmel 提交于
      Cited commit changed the behavior of the software data path with regards
      to the ECN marking of decapsulated packets. However, the commit did not
      change other callers of __INET_ECN_decapsulate(), namely mlxsw. The
      driver is using the function in order to ensure that the hardware and
      software data paths act the same with regards to the ECN marking of
      decapsulated packets.
      
      The discrepancy was uncovered by commit 5aa3c334 ("selftests:
      forwarding: vxlan_bridge_1d: Fix vxlan ecn decapsulate value") that
      aligned the selftest to the new behavior. Without this patch the
      selftest passes when used with veth pairs, but fails when used with
      mlxsw netdevs.
      
      Fix this by instructing the device to propagate the ECT(1) mark from the
      outer header to the inner header when the inner header is ECT(0), for
      both NVE and IP-in-IP tunnels.
      
      A helper is added in order not to duplicate the code between both tunnel
      types.
      
      Fixes: b7237487 ("tunnel: Propagate ECT(1) when decapsulating as recommended by RFC6040")
      Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
      Reviewed-by: NPetr Machata <petrm@nvidia.com>
      Acked-by: NToke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      66167c31
  2. 27 2月, 2021 2 次提交
  3. 13 2月, 2021 2 次提交
  4. 09 2月, 2021 3 次提交
    • A
      mlxsw: spectrum_router: Set offload_failed flag · a4cb1c02
      Amit Cohen 提交于
      When FIB_EVENT_ENTRY_{REPLACE, APPEND} are triggered and route insertion
      fails, FIB abort is triggered.
      
      After aborting, set the appropriate hardware flag to make the kernel emit
      RTM_NEWROUTE notification with RTM_F_OFFLOAD_FAILED flag.
      Signed-off-by: NAmit Cohen <amcohen@nvidia.com>
      Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a4cb1c02
    • A
      IPv6: Add "offload failed" indication to routes · 0c5fcf9e
      Amit Cohen 提交于
      After installing a route to the kernel, user space receives an
      acknowledgment, which means the route was installed in the kernel, but not
      necessarily in hardware.
      
      The asynchronous nature of route installation in hardware can lead to a
      routing daemon advertising a route before it was actually installed in
      hardware. This can result in packet loss or mis-routed packets until the
      route is installed in hardware.
      
      To avoid such cases, previous patch set added the ability to emit
      RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags
      are changed, this behavior is controlled by sysctl.
      
      With the above mentioned behavior, it is possible to know from user-space
      if the route was offloaded, but if the offload fails there is no indication
      to user-space. Following a failure, a routing daemon will wait indefinitely
      for a notification that will never come.
      
      This patch adds an "offload_failed" indication to IPv6 routes, so that
      users will have better visibility into the offload process.
      
      'struct fib6_info' is extended with new field that indicates if route
      offload failed. Note that the new field is added using unused bit and
      therefore there is no need to increase struct size.
      Signed-off-by: NAmit Cohen <amcohen@nvidia.com>
      Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0c5fcf9e
    • A
      IPv4: Add "offload failed" indication to routes · 36c5100e
      Amit Cohen 提交于
      After installing a route to the kernel, user space receives an
      acknowledgment, which means the route was installed in the kernel, but not
      necessarily in hardware.
      
      The asynchronous nature of route installation in hardware can lead to a
      routing daemon advertising a route before it was actually installed in
      hardware. This can result in packet loss or mis-routed packets until the
      route is installed in hardware.
      
      To avoid such cases, previous patch set added the ability to emit
      RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags
      are changed, this behavior is controlled by sysctl.
      
      With the above mentioned behavior, it is possible to know from user-space
      if the route was offloaded, but if the offload fails there is no indication
      to user-space. Following a failure, a routing daemon will wait indefinitely
      for a notification that will never come.
      
      This patch adds an "offload_failed" indication to IPv4 routes, so that
      users will have better visibility into the offload process.
      
      'struct fib_alias', and 'struct fib_rt_info' are extended with new field
      that indicates if route offload failed. Note that the new field is added
      using unused bit and therefore there is no need to increase structs size.
      Signed-off-by: NAmit Cohen <amcohen@nvidia.com>
      Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      36c5100e
  5. 04 2月, 2021 3 次提交
    • D
      mlxsw: ethtool: Pass link mode in use to ethtool · 25a96f05
      Danielle Ratson 提交于
      Currently, when user space queries the link's parameters, as speed and
      duplex, each parameter is passed from the driver to ethtool.
      
      Instead, pass the link mode bit in use.
      In Spectrum-1, simply pass the bit that is set to '1' from PTYS register.
      In Spectrum-2, pass the first link mode bit in the mask of the used
      link mode.
      Signed-off-by: NDanielle Ratson <danieller@nvidia.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      25a96f05
    • D
      mlxsw: ethtool: Add support for setting lanes when autoneg is off · 763ece86
      Danielle Ratson 提交于
      Currently, when auto negotiation is set to off, the user can force a
      specific speed or both speed and duplex. The user cannot influence the
      number of lanes that will be forced.
      
      Add support for setting speed along with lanes so one would be able
      to choose how many lanes will be forced.
      
      When lanes parameter is passed from user space, choose the link mode
      that its actual width equals to it.
      Otherwise, the default link mode will be the one that supports the width
      of the port.
      Signed-off-by: NDanielle Ratson <danieller@nvidia.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      763ece86
    • D
      mlxsw: ethtool: Remove max lanes filtering · 5fc4053d
      Danielle Ratson 提交于
      Currently, when a speed can be supported by different number of lanes,
      the supported link modes bitmask contains only link modes with a single
      number of lanes.
      
      This was done in order to prevent auto negotiation on number of
      lanes after 50G-1-lane and 100G-2-lanes link modes were introduced.
      
      For example, if a port's max width is 4, only link modes with 4 lanes
      will be presented as supported by that port, so 100G is always achieved by
      4 lanes of 25G.
      
      After the previous patches that allow selection of the number of lanes,
      auto negotiation on number of lanes becomes practical.
      
      Remove that filtering of the maximum number of lanes supported link modes,
      so indeed all the supported and advertised link modes will be shown.
      Signed-off-by: NDanielle Ratson <danieller@nvidia.com>
      Reviewed-by: NJiri Pirko <jiri@nvidia.com>
      Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      5fc4053d
  6. 03 2月, 2021 2 次提交
  7. 29 1月, 2021 2 次提交
  8. 23 1月, 2021 1 次提交
  9. 15 1月, 2021 1 次提交
    • C
      mlxsw: pci: switch from 'pci_' to 'dma_' API · bb5c64c8
      Christophe JAILLET 提交于
      The wrappers in include/linux/pci-dma-compat.h should go away.
      
      The patch has been generated with the coccinelle script below and has been
      hand modified to replace GFP_ with a correct flag.
      It has been compile tested.
      
      When memory is allocated in 'mlxsw_pci_queue_init()' and
      'mlxsw_pci_fw_area_init()' GFP_KERNEL can be used because both functions
      are already using this flag and no lock is acquired.
      
      When memory is allocated in 'mlxsw_pci_mbox_alloc()' GFP_KERNEL can be used
      because it is only called from the probe function and no lock is acquired
      in the between.
      The call chain is:
        --> mlxsw_pci_probe()
          --> mlxsw_pci_cmd_init()
            --> mlxsw_pci_mbox_alloc()
      
      While at it, also replace the 'dma_set_mask/dma_set_coherent_mask' sequence
      by a less verbose 'dma_set_mask_and_coherent() call.
      
      @@
      @@
      -    PCI_DMA_BIDIRECTIONAL
      +    DMA_BIDIRECTIONAL
      
      @@
      @@
      -    PCI_DMA_TODEVICE
      +    DMA_TO_DEVICE
      
      @@
      @@
      -    PCI_DMA_FROMDEVICE
      +    DMA_FROM_DEVICE
      
      @@
      @@
      -    PCI_DMA_NONE
      +    DMA_NONE
      
      @@
      expression e1, e2, e3;
      @@
      -    pci_alloc_consistent(e1, e2, e3)
      +    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
      
      @@
      expression e1, e2, e3;
      @@
      -    pci_zalloc_consistent(e1, e2, e3)
      +    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_free_consistent(e1, e2, e3, e4)
      +    dma_free_coherent(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_map_single(e1, e2, e3, e4)
      +    dma_map_single(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_single(e1, e2, e3, e4)
      +    dma_unmap_single(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4, e5;
      @@
      -    pci_map_page(e1, e2, e3, e4, e5)
      +    dma_map_page(&e1->dev, e2, e3, e4, e5)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_page(e1, e2, e3, e4)
      +    dma_unmap_page(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_map_sg(e1, e2, e3, e4)
      +    dma_map_sg(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_sg(e1, e2, e3, e4)
      +    dma_unmap_sg(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
      +    dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_single_for_device(e1, e2, e3, e4)
      +    dma_sync_single_for_device(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
      +    dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_sg_for_device(e1, e2, e3, e4)
      +    dma_sync_sg_for_device(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2;
      @@
      -    pci_dma_mapping_error(e1, e2)
      +    dma_mapping_error(&e1->dev, e2)
      
      @@
      expression e1, e2;
      @@
      -    pci_set_dma_mask(e1, e2)
      +    dma_set_mask(&e1->dev, e2)
      
      @@
      expression e1, e2;
      @@
      -    pci_set_consistent_dma_mask(e1, e2)
      +    dma_set_coherent_mask(&e1->dev, e2)
      Signed-off-by: NChristophe JAILLET <christophe.jaillet@wanadoo.fr>
      Tested-by: NIdo Schimmel <idosch@nvidia.com>
      Link: https://lore.kernel.org/r/20210114084757.490540-1-christophe.jaillet@wanadoo.frSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      bb5c64c8
  10. 12 1月, 2021 4 次提交
    • V
      mlxsw: spectrum_switchdev: remove transactional logic for VLAN objects · 4b400fea
      Vladimir Oltean 提交于
      As of commit 457e20d6 ("mlxsw: spectrum_switchdev: Avoid returning
      errors in commit phase"), the mlxsw driver performs the VLAN object
      offloading during the prepare phase. So conversion just seems to be a
      matter of removing the code that was running in the commit phase.
      
      Ido Schimmel explains that the reason why mlxsw_sp_span_respin is called
      unconditionally is because the bridge driver will ignore -EOPNOTSUPP and
      actually add the VLAN on the bridge device - see commit 9c86ce2c
      ("net: bridge: Notify about bridge VLANs") and commit ea472175
      ("mlxsw: spectrum_switchdev: Ignore bridge VLAN events"). Since the VLAN
      was successfully added on the bridge device, mlxsw_sp_span_respin_work()
      should be able to resolve the egress port for a packet that is mirrored
      to a gre tap and passes through the bridge device. Therefore keep the
      logic as it is.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: NLinus Walleij <linus.walleij@linaro.org>
      Acked-by: NJiri Pirko <jiri@nvidia.com>
      Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      4b400fea
    • V
      net: switchdev: remove the transaction structure from port attributes · bae33f2b
      Vladimir Oltean 提交于
      Since the introduction of the switchdev API, port attributes were
      transmitted to drivers for offloading using a two-step transactional
      model, with a prepare phase that was supposed to catch all errors, and a
      commit phase that was supposed to never fail.
      
      Some classes of failures can never be avoided, like hardware access, or
      memory allocation. In the latter case, merely attempting to move the
      memory allocation to the preparation phase makes it impossible to avoid
      memory leaks, since commit 91cf8ece ("switchdev: Remove unused
      transaction item queue") which has removed the unused mechanism of
      passing on the allocated memory between one phase and another.
      
      It is time we admit that separating the preparation from the commit
      phase is something that is best left for the driver to decide, and not
      something that should be baked into the API, especially since there are
      no switchdev callers that depend on this.
      
      This patch removes the struct switchdev_trans member from switchdev port
      attribute notifier structures, and converts drivers to not look at this
      member.
      
      In part, this patch contains a revert of my previous commit 2e554a7a
      ("net: dsa: propagate switchdev vlan_filtering prepare phase to
      drivers").
      
      For the most part, the conversion was trivial except for:
      - Rocker's world implementation based on Broadcom OF-DPA had an odd
        implementation of ofdpa_port_attr_bridge_flags_set. The conversion was
        done mechanically, by pasting the implementation twice, then only
        keeping the code that would get executed during prepare phase on top,
        then only keeping the code that gets executed during the commit phase
        on bottom, then simplifying the resulting code until this was obtained.
      - DSA's offloading of STP state, bridge flags, VLAN filtering and
        multicast router could be converted right away. But the ageing time
        could not, so a shim was introduced and this was left for a further
        commit.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: NLinus Walleij <linus.walleij@linaro.org>
      Acked-by: NJiri Pirko <jiri@nvidia.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org> # RTL8366RB
      Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      bae33f2b
    • V
      net: switchdev: remove the transaction structure from port object notifiers · ffb68fc5
      Vladimir Oltean 提交于
      Since the introduction of the switchdev API, port objects were
      transmitted to drivers for offloading using a two-step transactional
      model, with a prepare phase that was supposed to catch all errors, and a
      commit phase that was supposed to never fail.
      
      Some classes of failures can never be avoided, like hardware access, or
      memory allocation. In the latter case, merely attempting to move the
      memory allocation to the preparation phase makes it impossible to avoid
      memory leaks, since commit 91cf8ece ("switchdev: Remove unused
      transaction item queue") which has removed the unused mechanism of
      passing on the allocated memory between one phase and another.
      
      It is time we admit that separating the preparation from the commit
      phase is something that is best left for the driver to decide, and not
      something that should be baked into the API, especially since there are
      no switchdev callers that depend on this.
      
      This patch removes the struct switchdev_trans member from switchdev port
      object notifier structures, and converts drivers to not look at this
      member.
      
      Where driver conversion is trivial (like in the case of the Marvell
      Prestera driver, NXP DPAA2 switch, TI CPSW, and Rocker drivers), it is
      done in this patch.
      
      Where driver conversion needs more attention (DSA, Mellanox Spectrum),
      the conversion is left for subsequent patches and here we only fake the
      prepare/commit phases at a lower level, just not in the switchdev
      notifier itself.
      
      Where the code has a natural structure that is best left alone as a
      preparation and a commit phase (as in the case of the Ocelot switch),
      that structure is left in place, just made to not depend upon the
      switchdev transactional model.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Acked-by: NLinus Walleij <linus.walleij@linaro.org>
      Acked-by: NJiri Pirko <jiri@nvidia.com>
      Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      ffb68fc5
    • V
      net: switchdev: remove vid_begin -> vid_end range from VLAN objects · b7a9e0da
      Vladimir Oltean 提交于
      The call path of a switchdev VLAN addition to the bridge looks something
      like this today:
      
              nbp_vlan_init
              |  __br_vlan_set_default_pvid
              |  |                       |
              |  |    br_afspec          |
              |  |        |              |
              |  |        v              |
              |  | br_process_vlan_info  |
              |  |        |              |
              |  |        v              |
              |  |   br_vlan_info        |
              |  |       / \            /
              |  |      /   \          /
              |  |     /     \        /
              |  |    /       \      /
              v  v   v         v    v
            nbp_vlan_add   br_vlan_add ------+
             |              ^      ^ |       |
             |             /       | |       |
             |            /       /  /       |
             \ br_vlan_get_master/  /        v
              \        ^        /  /  br_vlan_add_existing
               \       |       /  /          |
                \      |      /  /          /
                 \     |     /  /          /
                  \    |    /  /          /
                   \   |   /  /          /
                    v  |   | v          /
                    __vlan_add         /
                       / |            /
                      /  |           /
                     v   |          /
         __vlan_vid_add  |         /
                     \   |        /
                      v  v        v
            br_switchdev_port_vlan_add
      
      The ranges UAPI was introduced to the bridge in commit bdced7ef
      ("bridge: support for multiple vlans and vlan ranges in setlink and
      dellink requests") (Jan 10 2015). But the VLAN ranges (parsed in br_afspec)
      have always been passed one by one, through struct bridge_vlan_info
      tmp_vinfo, to br_vlan_info. So the range never went too far in depth.
      
      Then Scott Feldman introduced the switchdev_port_bridge_setlink function
      in commit 47f8328b ("switchdev: add new switchdev bridge setlink").
      That marked the introduction of the SWITCHDEV_OBJ_PORT_VLAN, which made
      full use of the range. But switchdev_port_bridge_setlink was called like
      this:
      
      br_setlink
      -> br_afspec
      -> switchdev_port_bridge_setlink
      
      Basically, the switchdev and the bridge code were not tightly integrated.
      Then commit 41c498b9 ("bridge: restore br_setlink back to original")
      came, and switchdev drivers were required to implement
      .ndo_bridge_setlink = switchdev_port_bridge_setlink for a while.
      
      In the meantime, commits such as 0944d6b5 ("bridge: try switchdev op
      first in __vlan_vid_add/del") finally made switchdev penetrate the
      br_vlan_info() barrier and start to develop the call path we have today.
      But remember, br_vlan_info() still receives VLANs one by one.
      
      Then Arkadi Sharshevsky refactored the switchdev API in 2017 in commit
      29ab586c ("net: switchdev: Remove bridge bypass support from
      switchdev") so that drivers would not implement .ndo_bridge_setlink any
      longer. The switchdev_port_bridge_setlink also got deleted.
      This refactoring removed the parallel bridge_setlink implementation from
      switchdev, and left the only switchdev VLAN objects to be the ones
      offloaded from __vlan_vid_add (basically RX filtering) and  __vlan_add
      (the latter coming from commit 9c86ce2c ("net: bridge: Notify about
      bridge VLANs")).
      
      That is to say, today the switchdev VLAN object ranges are not used in
      the kernel. Refactoring the above call path is a bit complicated, when
      the bridge VLAN call path is already a bit complicated.
      
      Let's go off and finish the job of commit 29ab586c by deleting the
      bogus iteration through the VLAN ranges from the drivers. Some aspects
      of this feature never made too much sense in the first place. For
      example, what is a range of VLANs all having the BRIDGE_VLAN_INFO_PVID
      flag supposed to mean, when a port can obviously have a single pvid?
      This particular configuration _is_ denied as of commit 6623c60d
      ("bridge: vlan: enforce no pvid flag in vlan ranges"), but from an API
      perspective, the driver still has to play pretend, and only offload the
      vlan->vid_end as pvid. And the addition of a switchdev VLAN object can
      modify the flags of another, completely unrelated, switchdev VLAN
      object! (a VLAN that is PVID will invalidate the PVID flag from whatever
      other VLAN had previously been offloaded with switchdev and had that
      flag. Yet switchdev never notifies about that change, drivers are
      supposed to guess).
      
      Nonetheless, having a VLAN range in the API makes error handling look
      scarier than it really is - unwinding on errors and all of that.
      When in reality, no one really calls this API with more than one VLAN.
      It is all unnecessary complexity.
      
      And despite appearing pretentious (two-phase transactional model and
      all), the switchdev API is really sloppy because the VLAN addition and
      removal operations are not paired with one another (you can add a VLAN
      100 times and delete it just once). The bridge notifies through
      switchdev of a VLAN addition not only when the flags of an existing VLAN
      change, but also when nothing changes. There are switchdev drivers out
      there who don't like adding a VLAN that has already been added, and
      those checks don't really belong at driver level. But the fact that the
      API contains ranges is yet another factor that prevents this from being
      addressed in the future.
      
      Of the existing switchdev pieces of hardware, it appears that only
      Mellanox Spectrum supports offloading more than one VLAN at a time,
      through mlxsw_sp_port_vlan_set. I have kept that code internal to the
      driver, because there is some more bookkeeping that makes use of it, but
      I deleted it from the switchdev API. But since the switchdev support for
      ranges has already been de facto deleted by a Mellanox employee and
      nobody noticed for 4 years, I'm going to assume it's not a biggie.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com> # switchdev and mlxsw
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      b7a9e0da
  11. 10 1月, 2021 2 次提交
  12. 15 12月, 2020 15 次提交
  13. 09 12月, 2020 2 次提交