1. 30 March 2021 (1 commit)
    • mlxsw: spectrum: Fix ECN marking in tunnel decapsulation · 66167c31
      Committed by Ido Schimmel
      Cited commit changed the behavior of the software data path with regards
      to the ECN marking of decapsulated packets. However, the commit did not
      change other callers of __INET_ECN_decapsulate(), namely mlxsw. The
      driver is using the function in order to ensure that the hardware and
      software data paths act the same with regards to the ECN marking of
      decapsulated packets.
      
      The discrepancy was uncovered by commit 5aa3c334 ("selftests:
      forwarding: vxlan_bridge_1d: Fix vxlan ecn decapsulate value") that
      aligned the selftest to the new behavior. Without this patch the
      selftest passes when used with veth pairs, but fails when used with
      mlxsw netdevs.
      
      Fix this by instructing the device to propagate the ECT(1) mark from the
      outer header to the inner header when the inner header is ECT(0), for
      both NVE and IP-in-IP tunnels.
      
      A helper is added in order not to duplicate the code between both tunnel
      types.
      
      Fixes: b7237487 ("tunnel: Propagate ECT(1) when decapsulating as recommended by RFC6040")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
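
      The RFC 6040 rule being mirrored in hardware here can be summarized in a
      few lines. Below is a minimal standalone C sketch of the decapsulation
      decision; it is illustrative only, not the mlxsw or
      __INET_ECN_decapsulate() code.

      #include <stdbool.h>
      #include <stdint.h>

      /* ECN codepoints as carried in the two ECN bits of the IP header. */
      enum ecn { ECN_NOT_ECT = 0, ECN_ECT_1 = 1, ECN_ECT_0 = 2, ECN_CE = 3 };

      /* Returns true if the packet must be dropped. Otherwise *new_inner holds
       * the ECN codepoint to set on the decapsulated (inner) header. */
      static bool rfc6040_decap(uint8_t outer, uint8_t inner, uint8_t *new_inner)
      {
          *new_inner = inner;

          if (outer == ECN_CE) {
              if (inner == ECN_NOT_ECT)
                  return true;            /* inner not ECN-capable: drop */
              *new_inner = ECN_CE;        /* propagate the congestion mark */
          } else if (outer == ECN_ECT_1 && inner == ECN_ECT_0) {
              *new_inner = ECN_ECT_1;     /* the ECT(1) propagation this patch
                                             teaches the hardware to perform */
          }
          return false;
      }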
  2. 04 February 2021 (3 commits)
    • mlxsw: ethtool: Pass link mode in use to ethtool · 25a96f05
      Committed by Danielle Ratson
      Currently, when user space queries the link's parameters, such as speed
      and duplex, each parameter is passed separately from the driver to
      ethtool.
      
      Instead, pass the link mode bit in use.
      In Spectrum-1, simply pass the bit that is set to '1' in the PTYS
      register.
      In Spectrum-2, pass the first link mode bit in the mask of the link
      mode in use.
      Signed-off-by: Danielle Ratson <danieller@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
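
      The "bit in use" lookup is essentially a first-set-bit operation over
      the active link mode mask; a trivial standalone illustration follows
      (the helper name is hypothetical, not the driver's code).

      #include <stdint.h>

      /* Return the index of the link mode bit in use, or -1 when there is no
       * active link mode. On Spectrum-1 the mask has exactly one bit set; on
       * Spectrum-2 the first set bit of the mask is reported. */
      static int link_mode_in_use(uint32_t active_mask)
      {
          return active_mask ? __builtin_ctz(active_mask) : -1;
      }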
    • mlxsw: ethtool: Add support for setting lanes when autoneg is off · 763ece86
      Committed by Danielle Ratson
      Currently, when auto negotiation is set to off, the user can force a
      specific speed or both speed and duplex. The user cannot influence the
      number of lanes that will be forced.
      
      Add support for setting speed along with lanes, so that the user can
      choose how many lanes will be forced.
      
      When the lanes parameter is passed from user space, choose the link
      mode whose actual width equals the requested number of lanes.
      Otherwise, default to the link mode that supports the width of the
      port.
      Signed-off-by: Danielle Ratson <danieller@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
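
      The selection rule described above could look roughly like the sketch
      below; the table layout and names are hypothetical, not the driver's
      actual data structures.

      #include <stdint.h>

      struct link_mode_info {
          uint32_t speed_mbps;
          uint8_t  lanes;
      };

      /* Pick a link mode of the requested speed. If the user passed a lanes
       * value, require that the mode's width equals it; otherwise fall back
       * to the mode that matches the port's own width. Returns an index into
       * modes[], or -1 if nothing fits. */
      static int pick_link_mode(const struct link_mode_info *modes, int n,
                                uint32_t speed_mbps, uint8_t req_lanes,
                                uint8_t port_width)
      {
          uint8_t want = req_lanes ? req_lanes : port_width;

          for (int i = 0; i < n; i++)
              if (modes[i].speed_mbps == speed_mbps && modes[i].lanes == want)
                  return i;
          return -1;
      }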
    • mlxsw: ethtool: Remove max lanes filtering · 5fc4053d
      Committed by Danielle Ratson
      Currently, when a speed can be supported by different numbers of lanes,
      the supported link modes bitmask contains only link modes with a single
      number of lanes.
      
      This was done in order to prevent auto negotiation on number of
      lanes after 50G-1-lane and 100G-2-lanes link modes were introduced.
      
      For example, if a port's max width is 4, only link modes with 4 lanes
      will be presented as supported by that port, so 100G is always achieved by
      4 lanes of 25G.
      
      After the previous patches that allow selection of the number of lanes,
      auto negotiation on number of lanes becomes practical.
      
      Remove the filtering by maximum number of lanes, so that all the
      supported and advertised link modes are shown.
      Signed-off-by: Danielle Ratson <danieller@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
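
      The effect of dropping the filter can be sketched as follows: instead
      of advertising only modes whose width equals the port's maximum width,
      advertise every supported mode whose width fits into the port. Names
      and types here are illustrative only.

      #include <stdint.h>

      struct link_mode_info {
          uint32_t speed_mbps;
          uint8_t  lanes;
      };

      /* Build the advertised bitmask from device capabilities. Before this
       * change the condition was effectively "lanes == max_width"; now any
       * mode the port is wide enough to carry is kept (n is assumed <= 32). */
      static uint32_t advertised_mask(const struct link_mode_info *modes, int n,
                                      uint32_t dev_cap_mask, uint8_t max_width)
      {
          uint32_t mask = 0;

          for (int i = 0; i < n; i++)
              if ((dev_cap_mask & (1u << i)) && modes[i].lanes <= max_width)
                  mask |= 1u << i;
          return mask;
      }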
  3. 23 January 2021 (1 commit)
  4. 09 December 2020 (2 commits)
  5. 07 December 2020 (1 commit)
  6. 02 December 2020 (2 commits)
  7. 24 November 2020 (1 commit)
  8. 27 October 2020 (1 commit)
    • mlxsw: Only advertise link modes supported by both driver and device · 1601559b
      Committed by Amit Cohen
      During port creation the driver instructs the device to advertise all
      the supported link modes queried from the device.
      
      Since the cited commit, not all the link modes supported by the device are
      supported by the driver. This can result in the device negotiating a
      link mode that is not recognized by the driver causing ethtool to show
      an unsupported speed:
      
      $ ethtool swp1
      ...
      Speed: Unknown!
      
      This is especially problematic when the netdev is enslaved to a bond, as
      the bond driver uses unknown speed as an indication that the link is
      down:
      
      [13048.900895] net_ratelimit: 86 callbacks suppressed
      [13048.900902] t_bond0: (slave swp52): failed to get link speed/duplex
      [13048.912160] t_bond0: (slave swp49): failed to get link speed/duplex
      
      Fix this by making sure that only link modes that are supported by both
      the device and the driver are advertised.
      
      Fixes: b97cd891 ("mlxsw: Remove 56G speed support")
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
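
      The fix boils down to intersecting two masks before advertising,
      roughly as in this sketch (mask names are illustrative, not the
      driver's own):

      #include <stdint.h>

      /* Advertise only link modes that both the device reports as supported
       * (e.g. queried via the PTYS register) and the driver knows how to map. */
      static uint32_t modes_to_advertise(uint32_t dev_supported_mask,
                                         uint32_t drv_supported_mask)
      {
          return dev_supported_mask & drv_supported_mask;
      }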
  9. 28 September 2020 (1 commit)
  10. 18 September 2020 (1 commit)
    • mlxsw: spectrum_buffers: Support two headroom modes · 69e408a2
      Committed by Petr Machata
      There are two interfaces to configure ETS: qdiscs and DCB. Historically,
      DCB ETS configuration was projected to ingress as well, and configured port
      buffers. Qdisc was not.
      
      So as not to break clients that today use DCB ETS and PFC and rely on
      getting a reasonable ingress buffer priomap, keep the ETS mirroring in
      effect.
      
      Since qdiscs have not done this mirroring historically, it is reasonable
      not to introduce it, but rather permit manual ingress configuration through
      dcbnl_setbuffer only in the qdisc mode.
      
      This will require a toggle to indicate whether buffer sizes should be
      autocomputed or taken from dcbnl_setbuffer, and likewise for priomaps.
      Introduce such a toggle and initialize it, and guard port buffer size
      configuration as appropriate. The toggle is currently left in the DCB
      position. In a following patch, qdisc code will switch it.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
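
      A minimal sketch of the toggle described above, with hypothetical
      names: buffer sizes are autocomputed in the DCB mode and taken from
      dcbnl_setbuffer in the qdisc (TC) mode.

      #include <stdint.h>

      enum hdroom_mode {
          HDROOM_MODE_DCB,    /* sizes and priomap autocomputed from ETS/PFC */
          HDROOM_MODE_TC,     /* sizes accepted from dcbnl_setbuffer */
      };

      struct hdroom_buf {
          uint32_t size_cells;        /* size currently configured */
          uint32_t set_size_cells;    /* size requested via dcbnl_setbuffer */
      };

      /* Decide the size of one port buffer according to the active mode. */
      static uint32_t hdroom_buf_size(enum hdroom_mode mode,
                                      const struct hdroom_buf *buf,
                                      uint32_t autocomputed_cells)
      {
          return mode == HDROOM_MODE_TC ? buf->set_size_cells
                                        : autocomputed_cells;
      }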
  11. 17 September 2020 (10 commits)
    • mlxsw: spectrum_buffers: Manage internal buffer in the hdroom code · 22881adf
      Committed by Petr Machata
      Traffic mirroring modes that are implemented in-chip on egress need an
      internal buffer to work. As the only client, the SPAN module was managing
      the buffer so far. However logically it belongs to the buffers module. E.g.
      buffer size validation needs to take the size of the internal buffer into
      account.
      
      Therefore move the related code from SPAN to spectrum_buffers. Move over
      the callbacks that determine the minimum buffer size as a function of
      maximum speed and MTU. Add a field describing the internal buffer to struct
      mlxsw_sp_hdroom. Extend mlxsw_sp_hdroom_bufs_reset_sizes() to take care of
      sizing the internal buffer as well. Change the SPAN module to invoke that
      function and mlxsw_sp_hdroom_configure() like all the other hdroom clients.
      Drop the now-unnecessary mlxsw_sp_span_port_buffer_disable().
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_buffers: Introduce shared buffer ops · a41b9626
      Committed by Petr Machata
      The size of the internal buffer is currently calculated in the SPAN module.
      Logically it belongs to the spectrum_buffers module, where it should be
      moved. However, that being a chip-specific operation, it needs dynamic
      dispatch. There currently is a chip-specific structure for description of
      shared buffer values, struct mlxsw_sp_sb_vals. However placing ops into
      this structure would be confusing. Therefore introduce a new per-chip
      structure, currently empty, and initialize the ops pointer as appropriate.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
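
      The dynamic dispatch described above amounts to a per-ASIC ops
      structure holding chip-specific callbacks, along the lines of this
      sketch. All names are illustrative and the formula is a placeholder,
      not the driver's actual computation.

      #include <stdint.h>

      /* Per-chip shared buffer operations; each ASIC generation provides its
       * own instance with the appropriate callbacks. */
      struct sb_ops {
          /* Size of the internal mirroring buffer for a given maximum port
           * speed (Mb/s) and maximum MTU (bytes), in bytes. */
          uint32_t (*int_buf_size_get)(uint32_t max_speed_mbps, uint32_t max_mtu);
      };

      static uint32_t spectrum1_int_buf_size_get(uint32_t max_speed_mbps,
                                                 uint32_t max_mtu)
      {
          /* Placeholder formula; the real computation is chip-specific. */
          return max_mtu + max_speed_mbps / 8;
      }

      static const struct sb_ops spectrum1_sb_ops = {
          .int_buf_size_get = spectrum1_int_buf_size_get,
      };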
    • mlxsw: spectrum_buffers: Inline mlxsw_sp_sb_max_headroom_cells() · bd3e86a5
      Committed by Petr Machata
      This function is now only used from the buffers module, and is a trivial
      field reference. Just inline it and drop the related artifacts.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Split headroom autoresize out of buffer configuration · 2d9f703f
      Committed by Petr Machata
      Split mlxsw_sp_port_headroom_set() into three functions.
      mlxsw_sp_hdroom_bufs_reset_sizes() changes the sizes of the individual PG
      buffers, and mlxsw_sp_hdroom_configure_buffers() will actually apply the
      configuration. A third function, mlxsw_sp_hdroom_bufs_fit(), verifies that
      the requested buffer configuration matches total headroom size
      requirements.
      
      Add wrappers, mlxsw_sp_hdroom_configure() and __..., that will eventually
      perform full headroom configuration, but for now, only have them verify the
      configured headroom size, and invoke mlxsw_sp_hdroom_configure_buffers().
      Have them take the `force` argument to prepare for a later patch, even
      though it is currently unused.
      
      Note that the loop in mlxsw_sp_hdroom_configure_buffers() only goes through
      DCBX_MAX_BUFFERS. Since there is no logic to configure the control buffer,
      it needs to keep the values queried from the FW. Eventually this function
      should configure all the PGs.
      
      Note that conversion of __mlxsw_sp_dcbnl_ieee_setets() is not trivial. That
      function performs the headroom configuration in three steps: first it
      resizes the buffers and adds any new ones. Then it redirects priorities to
      the new buffers. And finally it sets the size of the now-unused buffers to
      zero. This way no packet drops are introduced.
      
      So after invoking mlxsw_sp_hdroom_bufs_reset_sizes(), tweak the
      configuration to keep the old sizes of PG buffers for those buffers whose
      size was set to zero.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
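
      The resulting flow can be sketched as three steps behind a single
      entry point, with hypothetical names and stubbed bodies standing in
      for the real chip configuration.

      #include <stdbool.h>
      #include <stdint.h>

      #define HDROOM_BUF_COUNT 10             /* stands in for DCBX_MAX_BUFFERS */

      struct hdroom_buf { uint32_t size_cells; };

      struct hdroom {
          struct hdroom_buf bufs[HDROOM_BUF_COUNT];
          uint32_t max_headroom_cells;        /* total headroom the chip allows */
      };

      /* Step 1: recompute the sizes of the individual PG buffers (stubbed). */
      static void hdroom_bufs_reset_sizes(struct hdroom *h) { (void)h; }

      /* Step 2: verify the requested buffers fit into the total headroom. */
      static bool hdroom_bufs_fit(const struct hdroom *h)
      {
          uint32_t total = 0;

          for (int i = 0; i < HDROOM_BUF_COUNT; i++)
              total += h->bufs[i].size_cells;
          return total <= h->max_headroom_cells;
      }

      /* Step 3: push the buffer configuration to the chip (stubbed). */
      static int hdroom_configure_buffers(const struct hdroom *h, bool force)
      {
          (void)h; (void)force;
          return 0;
      }

      /* Wrapper: verify the configured headroom size, then apply. */
      static int hdroom_configure(struct hdroom *h, bool force)
      {
          if (!force && !hdroom_bufs_fit(h))
              return -1;
          return hdroom_configure_buffers(h, force);
      }

      /* Typical client flow: recompute sizes, then validate and apply. */
      static int client_update(struct hdroom *h)
      {
          hdroom_bufs_reset_sizes(h);
          return hdroom_configure(h, false);
      }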
    • mlxsw: spectrum: Track buffer sizes in struct mlxsw_sp_hdroom · aa7c0621
      Committed by Petr Machata
      So far, port buffers were always autoconfigured. When the
      dcbnl_setbuffer callback is implemented, it will allow the user to
      change the buffer size configuration by hand. The sizes therefore need
      to be a configuration parameter rather than always deduced, and thus
      belong in struct mlxsw_sp_hdroom, from where the configuration routine
      should take them.
      
      Update mlxsw_sp_port_headroom_set() to update these sizes. Have the
      function update the sizes even for the case that a given buffer is not
      used.
      
      Additionally, change the loop iteration end to DCBX_MAX_BUFFERS instead of
      IEEE_8021QAZ_MAX_TCS. The value is the same, but the semantics differ.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Track lossiness in struct mlxsw_sp_hdroom · ca21e84e
      Committed by Petr Machata
      Client-side configuration has lossiness as an attribute of a priority.
      Therefore add a "lossy" attribute to struct mlxsw_sp_hdroom_prio.
      
      To a Spectrum ASIC, lossiness is a feature of a port buffer. Therefore add
      struct mlxsw_sp_hdroom_buf, which in the following patches will get more
      attributes, but right now only use it to track port buffer lossiness.
      
      Instead of passing around the primary indicators of PFC and pause_en, add a
      function mlxsw_sp_hdroom_bufs_reset_lossiness() to compute the buffer
      lossiness from the priority map and priority lossiness. Change
      mlxsw_sp_port_headroom_set() to take the buffer lossy flag from the
      headroom configuration. Have the PFC and pause handlers configure priority
      lossiness in mlxsw_sp_hdroom, from where it will propagate.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
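
      The derivation of buffer lossiness from the priority configuration, as
      described above, can be sketched like this (array sizes and field
      names are illustrative, not the driver's definitions):

      #include <stdbool.h>
      #include <stdint.h>

      #define PRIO_COUNT 8
      #define BUF_COUNT  10

      struct hdroom_prio { uint8_t buf_idx; bool lossy; };
      struct hdroom_buf  { bool lossy; };

      struct hdroom {
          struct hdroom_prio prios[PRIO_COUNT];
          struct hdroom_buf  bufs[BUF_COUNT];
      };

      /* A port buffer is lossy only if every priority mapped to it is lossy;
       * one lossless (PFC- or pause-protected) priority makes it lossless. */
      static void hdroom_bufs_reset_lossiness(struct hdroom *h)
      {
          for (int i = 0; i < BUF_COUNT; i++)
              h->bufs[i].lossy = true;

          for (int prio = 0; prio < PRIO_COUNT; prio++)
              if (!h->prios[prio].lossy)
                  h->bufs[h->prios[prio].buf_idx].lossy = false;
      }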
    • mlxsw: spectrum: Track priorities in struct mlxsw_sp_hdroom · 5df825ed
      Committed by Petr Machata
      The mapping from priorities to buffers determines which buffers should be
      configured. Lossiness of these priorities combined with the mapping
      determines whether a given buffer should be lossy.
      
      Currently this configuration is stored implicitly in DCB ETS, PFC and
      ethtool PAUSE configuration. Keeping it together with the rest of the
      headroom configuration and deriving it as needed from PFC / ETS / PAUSE
      will make things clearer. To that end, add a field "prios" to struct
      mlxsw_sp_hdroom.
      
      Previously, __mlxsw_sp_port_headroom_set() took prio_tc as an argument, and
      assumed that the same mapping as we use on the egress should be used on
      ingress as well. Instead, track this configuration at each priority, so
      that it can be adjusted flexibly.
      
      In the following patches, as dcbnl_setbuffer is implemented, it will need
      to store its own mapping, and it will also be sometimes necessary to revert
      back to the original ETS mapping. Therefore track two buffer indices: the
      one for chip configuration (buf_idx), and the source one (ets_buf_idx).
      Introduce a function to configure the chip-level buffer index, and for now
      have it simply copy the ETS mapping over to the chip mapping.
      
      Update the ETS handler to project prio_tc to the ets_buf_idx and invoke the
      buf_idx recomputation.
      
      Now that there is a canonical place to look for this configuration,
      mlxsw_sp_port_headroom_set() does not need to invent def_prio_tc to use if
      DCB is compiled out.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
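
      The two buffer indices described above, and the current rule of simply
      copying the ETS mapping to the chip mapping, might look roughly like
      this (field names are illustrative only):

      #include <stdint.h>

      #define PRIO_COUNT 8

      struct hdroom_prio {
          uint8_t buf_idx;        /* buffer used for chip configuration */
          uint8_t ets_buf_idx;    /* buffer dictated by the DCB ETS mapping */
      };

      struct hdroom {
          struct hdroom_prio prios[PRIO_COUNT];
      };

      /* For now the chip-level mapping simply mirrors the ETS mapping; a later
       * dcbnl_setbuffer implementation can diverge from it and revert back. */
      static void hdroom_prios_reset_buf_idx(struct hdroom *h)
      {
          for (int prio = 0; prio < PRIO_COUNT; prio++)
              h->prios[prio].buf_idx = h->prios[prio].ets_buf_idx;
      }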
    • mlxsw: spectrum: Track MTU in struct mlxsw_sp_hdroom · 0103a3e4
      Committed by Petr Machata
      MTU influences sizes of auto-allocated buffers. Make it a part of port
      buffer configuration and have __mlxsw_sp_port_headroom_set() take it from
      there, instead of as an argument.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Unify delay handling between PFC and pause · b7e07bbd
      Committed by Petr Machata
      When a priority is marked as lossless using DCB PFC, or when pause frames
      are enabled on a port, mlxsw adds to port buffers an extra space to cover
      the traffic that will arrive between the time that a pause or PFC frame is
      emitted, and the time traffic actually stops. This is called the delay. The
      concept is the same in PFC and pause, however the way the extra buffer
      space is calculated differs.
      
      In this patch, unify this handling. Delay is to be measured in bytes of
      extra space, and will not include MTU. The PFC handler sets the delay
      directly from the parameter it gets through the DCB interface.
      
      To convert the pause handler, move MLXSW_SP_PAUSE_DELAY to the ethtool
      module, convert it to bytes, reduce it by the maximum MTU, and divide
      it by two. It then has the same meaning as the delay_bytes set by the
      PFC handler.
      
      Keep the delay_bytes value in struct mlxsw_sp_hdroom introduced in the
      previous patch. Change PFC and pause handlers to store the new delay value
      there and have __mlxsw_sp_port_headroom_set() take it from there.
      
      Instead of mlxsw_sp_pfc_delay_get() and mlxsw_sp_pg_buf_delay_get(),
      introduce mlxsw_sp_hdroom_buf_delay_get() to calculate the delay provision.
      Drop the unnecessary MLXSW_SP_CELL_FACTOR, and instead add an explanatory
      comment describing the formula used.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
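
      The conversion of the pause delay into the unified delay_bytes value,
      as described above, amounts to something like the following; the
      constant value and names are placeholders, only the arithmetic follows
      the text.

      #include <stdint.h>

      /* Placeholder for the pause delay constant after conversion to bytes. */
      #define PAUSE_DELAY_BYTES 102400u

      /* Unified delay provision: measured in bytes of extra space, excluding
       * the MTU. The PFC handler stores its DCB-provided delay directly; the
       * pause handler derives an equivalent value as sketched here. */
      static uint32_t pause_delay_bytes(uint32_t max_mtu)
      {
          return (PAUSE_DELAY_BYTES - max_mtu) / 2;
      }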
    • mlxsw: spectrum_buffers: Add struct mlxsw_sp_hdroom · 3a77f5a2
      Committed by Petr Machata
      The port headroom handling is currently strewn across several modules and
      tricky to follow: MTU, DCB PFC, DCB ETS and ethtool pause all influence the
      settings, and then there is the completely separate initial configuration in
      spectrum_buffers. A following patch will implement the dcbnl_setbuffer
      callback, which is going to further complicate the landscape.
      
      In order to simplify work with port buffers, the following patches are
      going to centralize all port-buffer handling in spectrum_buffers. As a
      first step, introduce a (currently empty) struct mlxsw_sp_hdroom that will
      keep the configuration parameters, and allocate and free it in appropriate
      places.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 16 September 2020 (1 commit)
  13. 15 September 2020 (4 commits)
    • mlxsw: spectrum_span: Derive SBIB from maximum port speed & MTU · 532b49e4
      Committed by Petr Machata
      The SBIB register configures the size of an internal buffer that the
      Spectrum ASICs use when mirroring traffic on egress. This size should be
      taken into account when validating that the port headroom buffers are not
      larger than the chip can handle. Up until now this was not done, which is
      incidentally not a problem, because the priority group buffers that mlxsw
      auto-configures are small enough that the boundary condition could not be
      violated.
      
      However, when dcbnl_setbuffer is implemented, the user has control over
      the sizes of PG buffers and might overshoot the headroom capacity. The
      size of the SBIB buffer, on the other hand, depends on port speed, and
      that cannot be vetoed. Therefore the SBIB size should be deduced from
      the maximum port speed.
      
      Additionally, once the buffers are configured by hand, the user could get
      into an uncomfortable situation where their MTU change requests get vetoed,
      because the SBIB does not fit anymore. Therefore derive SBIB size from
      maximum permissible MTU as well.
      
      Remove all the code that adjusted the SBIB size whenever speed or MTU
      changed.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Keep maximum speed around · 3232e8c6
      Committed by Petr Machata
      The maximum port speed depends on link modes supported by the port, and for
      Ethernet ports is constant. The maximum speed will be handy when setting
      SBIB, the internal buffer used for traffic mirroring. Therefore, keep it in
      struct mlxsw_sp_port for easy access.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Keep maximum MTU around · 2ecf87ae
      Committed by Petr Machata
      The maximum port MTU depends on port type. On Spectrum, mlxsw configures
      all ports as Ethernet ports, and the maximum MTU therefore never changes.
      Besides checking MTU configuration, maximum MTU will also be handy when
      setting SBIB, the internal buffer used for traffic mirroring. Therefore,
      keep it in struct mlxsw_sp_port for easy access.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_ethtool: Introduce ptys_max_speed callback · 60fbc521
      Committed by Petr Machata
      The SBIB register configures the size of an internal buffer that the
      Spectrum ASICs use when mirroring traffic on egress. This size should be
      taken into account when validating that the port headroom buffers are not
      larger than the chip can handle. Up until now this was not done, which is
      incidentally not a problem, because the priority group buffers that mlxsw
      auto-configures are small enough that the boundary condition could not be
      violated.
      
      When dcbnl_setbuffer is implemented, the user gets control over sizes of PG
      buffers, and they might overshoot the headroom capacity. However the size
      of the SBIB buffer depends on port speed, which cannot be vetoed. There is
      obviously no way to retroactively push back on requests for overlarge PG
      buffers, or reject an overlarge MTU, or cancel losslessness of a certain
      PG.
      
      Therefore, instead of taking into account the current speed when
      calculating SBIB buffer size, take into account the maximum speed that a
      port with given Ethernet protocol capabilities can have.
      
      To that end, add a new ethtool callback, ptys_max_speed, which determines
      this maximum speed.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
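
      Determining the maximum speed from the Ethernet protocol capabilities
      boils down to scanning the supported link mode bits and keeping the
      largest speed, roughly as below. The table contents and names are
      illustrative, not the driver's chip-specific tables.

      #include <stdint.h>

      struct link_mode_info { uint32_t speed_mbps; };

      /* Hypothetical per-bit speeds; the real table is chip-specific. */
      static const struct link_mode_info link_modes[] = {
          { 1000 }, { 10000 }, { 25000 }, { 40000 },
          { 50000 }, { 100000 }, { 200000 },
      };

      /* Return the highest speed among the link modes the port can support
       * (eth_proto_cap is the capability bitmask queried from the device). */
      static uint32_t ptys_max_speed(uint32_t eth_proto_cap)
      {
          uint32_t max = 0;

          for (uint32_t i = 0; i < sizeof(link_modes) / sizeof(link_modes[0]); i++)
              if ((eth_proto_cap & (1u << i)) && link_modes[i].speed_mbps > max)
                  max = link_modes[i].speed_mbps;
          return max;
      }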
  14. 24 August 2020 (1 commit)
  15. 04 August 2020 (3 commits)
  16. 16 July 2020 (3 commits)
  17. 14 July 2020 (4 commits)