1. 05 Jul, 2016 7 commits
    • mlxsw: spectrum: Introduce support for router interfaces · 99724c18
      Committed by Ido Schimmel
      Up until now we only supported bridged interfaces. Packets ingressing
      through the switch ports were either classified to FIDs (in the case of
      the VLAN-aware bridge) or vFIDs (in the case of VLAN-unaware bridges).
      The packets were then forwarded according to the FDB. Routing was done
      entirely in slowpath, by splitting the vFID range in two and using the
      lower 0.5K vFIDs as dummy bridges that simply flooded all incoming
      traffic to the CPU.
      
      Instead, allow packets to be routed in the device by creating router
      interfaces (RIFs) that will direct them to the router block.
      Specifically, the RIFs introduced here are Sub-port RIFs used for VLAN
      devices and port netdevs. Packets ingressing from the {Port / LAG ID, VID}
      with which the RIF was programmed will be assigned to a special kind of
      FID called an rFID and from there directed to the router.

      Create a RIF whenever the first IPv4 address is programmed on a VLAN /
      LAG / port netdev. Destroy it upon removal of the last IPv4 address.
      Receive these notifications by registering for the 'inetaddr'
      notification chain. A non-zero (10) priority is used for the
      notification block, so that RIFs will be created before routes are
      offloaded via FIB code.
      
      Note that another trigger for RIF destruction is a CHANGEUPPER
      notification that causes the underlying FID's reference count to drop to
      zero. This can happen, for example, when a VLAN netdev with an IP address
      is put under a bridge. While this configuration doesn't make sense, it
      does cause the device and the kernel to get out of sync when the netdev is
      unbridged. We intend to address this in the future, hopefully in the
      current cycle.

      Finally, remove the lower 0.5K vFIDs, as they are superseded by the RIFs,
      which will trap packets according to their DIP.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
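
      The create-on-first-address, destroy-on-last-address rule from the
      commit above can be sketched as a simple counter. This is a userspace
      illustration only, not the driver's code: toy_netdev, toy_inetaddr_event
      and the ADDR_* event codes are hypothetical names standing in for the
      netdev, the notifier callback and the events delivered on the inetaddr
      chain.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event codes standing in for address add/remove
 * notifications delivered on the inetaddr notification chain. */
enum addr_event { ADDR_ADD, ADDR_DEL };

/* Toy stand-in for a VLAN / LAG / port netdev that may be backed by a RIF. */
struct toy_netdev {
	unsigned int ipv4_addr_count; /* IPv4 addresses currently configured */
	bool has_rif;                 /* true while a RIF exists for the netdev */
};

/* Create the RIF on the first address, destroy it on the last one. */
static void toy_inetaddr_event(struct toy_netdev *dev, enum addr_event ev)
{
	if (ev == ADDR_ADD) {
		if (dev->ipv4_addr_count++ == 0)
			dev->has_rif = true;  /* first address: create RIF */
	} else {
		assert(dev->ipv4_addr_count > 0);
		if (--dev->ipv4_addr_count == 0)
			dev->has_rif = false; /* last address: destroy RIF */
	}
}
```

      Adding a second address leaves the existing RIF untouched, and only the
      removal of the final address tears it down.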
    • mlxsw: spectrum: Edit RIF properties based on netdev events · 6e095fd4
      Committed by Ido Schimmel
      We are just about to introduce router interfaces (RIFs), but before that
      we need to be able to update the device with the correct RIF attributes
      whenever they change for the netdev the RIF is backing. Two such
      attributes are MTU and MAC.
      
      The MAC is used both to set the source MAC of packets egressing from the
      RIF and also to program an FDB rule that will direct packets to the
      router block.
      
      Use the existing netdevice notification block and respond to CHANGEADDR
      and CHANGEMTU accordingly. Store both attributes in the RIF struct
      in case we need to revert to old attributes following a failed update.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
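
      A minimal sketch of why the MAC and MTU are cached in the RIF struct:
      if the device rejects an update, the cached values are what gets written
      back. All names here (toy_rif, toy_hw_rif_write, toy_rif_update) and the
      'fail' switch are hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_ETH_ALEN 6

/* Cached copy of the attributes last programmed successfully. */
struct toy_rif {
	unsigned char mac[TOY_ETH_ALEN];
	unsigned int mtu;
};

/* Hypothetical device write; 'fail' simulates a rejected update. */
static int toy_hw_rif_write(const unsigned char *mac, unsigned int mtu,
			    bool fail)
{
	(void)mac;
	(void)mtu;
	return fail ? -1 : 0;
}

/* Try the new attributes; on failure re-program the cached ones. */
static int toy_rif_update(struct toy_rif *rif, const unsigned char *new_mac,
			  unsigned int new_mtu, bool fail)
{
	int err = toy_hw_rif_write(new_mac, new_mtu, fail);

	if (err) {
		/* Revert the device to the attributes stored in the struct. */
		toy_hw_rif_write(rif->mac, rif->mtu, false);
		return err;
	}
	/* Success: the cache now reflects what the device holds. */
	memcpy(rif->mac, new_mac, TOY_ETH_ALEN);
	rif->mtu = new_mtu;
	return 0;
}
```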
    • mlxsw: spectrum: Add couple of lower device helper functions · 7ce856aa
      Committed by Jiri Pirko
      Add functions that iterate over lower devices and find the port device.
      As a dependency, add the netdev_for_each_all_lower_dev and
      netdev_for_each_all_lower_dev_rcu macros with the
      netdev_all_lower_get_next and netdev_all_lower_get_next_rcu helpers.

      Also, add functions to return the mlxsw struct according to the lower
      device found and the mlxsw_port struct with a reference to the lower
      device.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
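
      The idea of walking all lower devices until a port device is found can
      be sketched with a toy device tree. This does not use the kernel macros
      named above; toy_dev and toy_port_dev_lower_find are hypothetical, and
      the fixed-size 'lower' array is an assumption made to keep the example
      self-contained.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_LOWER 4

/* Toy stand-in for a netdev and the devices it is stacked on. */
struct toy_dev {
	const char *name;
	bool is_port;                         /* true for a switch port netdev */
	struct toy_dev *lower[TOY_MAX_LOWER]; /* directly lower devices */
	size_t n_lower;
};

/* Depth-first search through all lower devices, direct and indirect.
 * Returns the first port device found, or NULL if there is none. */
static struct toy_dev *toy_port_dev_lower_find(struct toy_dev *dev)
{
	size_t i;

	if (dev->is_port)
		return dev;
	for (i = 0; i < dev->n_lower; i++) {
		struct toy_dev *port = toy_port_dev_lower_find(dev->lower[i]);

		if (port)
			return port;
	}
	return NULL;
}
```

      For a VLAN device on top of a LAG on top of a port, the search starting
      at the VLAN device returns the port two levels down.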
    • mlxsw: spectrum_router: Implement fib4 add/del switchdev obj ops · 61c503f9
      Committed by Jiri Pirko
      Implement IPv4 FIB entry addition and removal. Initially, we support
      local and broadcast routes using the "ip2me" trap action.
      Unicast routes without a nexthop are also supported, using the "local"
      action.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
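
      The route-type to action mapping described above can be written as a
      small decision function. The enum and function names are hypothetical;
      only the "ip2me" and "local" actions and the cases they cover come from
      the commit message. Everything else is reported as unsupported here,
      which is an assumption about this initial stage.

```c
#include <stdbool.h>

enum toy_route_type {
	TOY_RT_LOCAL,
	TOY_RT_BROADCAST,
	TOY_RT_UNICAST,
	TOY_RT_OTHER,
};

enum toy_action {
	TOY_ACT_IP2ME,       /* trap the packet to the CPU */
	TOY_ACT_LOCAL,       /* unicast route without a nexthop */
	TOY_ACT_UNSUPPORTED, /* not handled in this sketch */
};

/* Pick the action for an IPv4 FIB entry from its type and nexthop. */
static enum toy_action toy_fib4_action(enum toy_route_type type,
				       bool has_nexthop)
{
	switch (type) {
	case TOY_RT_LOCAL:
	case TOY_RT_BROADCAST:
		return TOY_ACT_IP2ME;
	case TOY_RT_UNICAST:
		return has_nexthop ? TOY_ACT_UNSUPPORTED : TOY_ACT_LOCAL;
	default:
		return TOY_ACT_UNSUPPORTED;
	}
}
```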
    • mlxsw: spectrum_router: Add virtual router management · 6b75c480
      Committed by Jiri Pirko
      A virtual router is a construct used inside the HW. In this implementation
      we map kernel tables to virtual routers one to one. Introduce management
      logic to create virtual routers when needed and destroy them when they are
      no longer in use. Accordingly, call into LPM tree management.
      Each virtual router is always bound to one LPM tree.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
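
      A rough sketch of the create-when-needed, destroy-when-unused logic,
      with one virtual router per kernel table ID. The names (toy_vr,
      toy_vr_get, toy_vr_put), the pool size and the reference-count scheme
      are assumptions for illustration; the LPM tree binding is left out.

```c
#include <stddef.h>

#define TOY_MAX_VRS 4 /* assumed pool size, for illustration only */

struct toy_vr {
	unsigned int tb_id;     /* kernel table this VR mirrors */
	unsigned int ref_count; /* users of the VR; 0 means the slot is free */
};

static struct toy_vr toy_vrs[TOY_MAX_VRS];

/* Return the VR mapped to tb_id, creating it in a free slot if needed. */
static struct toy_vr *toy_vr_get(unsigned int tb_id)
{
	struct toy_vr *free_vr = NULL;
	size_t i;

	for (i = 0; i < TOY_MAX_VRS; i++) {
		struct toy_vr *vr = &toy_vrs[i];

		if (vr->ref_count && vr->tb_id == tb_id) {
			vr->ref_count++; /* table already has a VR: reuse it */
			return vr;
		}
		if (!vr->ref_count && !free_vr)
			free_vr = vr;    /* remember the first free slot */
	}
	if (!free_vr)
		return NULL;             /* all VRs are in use */
	free_vr->tb_id = tb_id;
	free_vr->ref_count = 1;
	return free_vr;
}

/* Drop one reference; the slot becomes reusable when the count hits zero. */
static void toy_vr_put(struct toy_vr *vr)
{
	if (vr->ref_count)
		vr->ref_count--;
}
```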
    • mlxsw: spectrum_router: Implement LPM trees management · 53342023
      Committed by Jiri Pirko
      Introduce basic LPM tree management that allows trees to be shared
      between tables if the prefixes used in the tables are the same.
      Build the tree structure according to the used prefixes. Although it is
      not optimal for many use cases, this initial implementation builds only
      a simple linear left-tree. More advanced structures will be introduced
      later on, possibly including mechanisms to change trees on the fly.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
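
      One possible reading of "linear left-tree" is a chain in which the root
      is the longest used prefix length and every node's left child is the
      next shorter used length, with no right children. The sketch below
      builds that chain from a prefix-usage bitmap and treats two tables as
      able to share a tree when their bitmaps are equal. The layout and all
      names are assumptions, not the driver's actual data structures.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_NO_CHILD 0xff
#define TOY_MAX_PREFIX 32

/* Bit n set means at least one route with prefix length n is in use. */
typedef uint64_t toy_prefix_usage;

struct toy_lpm_tree {
	uint8_t root;                           /* longest used prefix length */
	uint8_t left_child[TOY_MAX_PREFIX + 1]; /* indexed by prefix length */
};

/* Chain the used prefix lengths from longest (root) down to shortest. */
static void toy_left_tree_build(struct toy_lpm_tree *tree,
				toy_prefix_usage usage)
{
	uint8_t last = TOY_NO_CHILD;
	unsigned int len;

	tree->root = TOY_NO_CHILD;
	for (len = 0; len <= TOY_MAX_PREFIX; len++) {
		tree->left_child[len] = TOY_NO_CHILD;
		if (!(usage & ((toy_prefix_usage)1 << len)))
			continue;
		tree->left_child[len] = last; /* next shorter used length */
		last = (uint8_t)len;
		tree->root = (uint8_t)len;    /* ends up as the longest one */
	}
}

/* Tables can share a tree when they use exactly the same prefix lengths. */
static bool toy_tree_can_share(toy_prefix_usage a, toy_prefix_usage b)
{
	return a == b;
}
```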
    • mlxsw: spectrum_router: Implement private fib · 5e9c16cc
      Committed by Jiri Pirko
      A shadow FIB is needed in order to hold additional information for FIB
      entries and keep track of used prefixes. That is needed for the LPM tree
      construction to be introduced later on in this set.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
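
      Tracking which prefix lengths are in use can be sketched with one
      counter per length: a length counts as used while at least one entry
      with that length exists. The toy_fib struct and helpers are
      hypothetical and hold none of the per-entry information a real shadow
      FIB would keep.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PREFIX_COUNT 33 /* IPv4 prefix lengths 0..32 */

struct toy_fib {
	unsigned int prefix_ref_count[TOY_PREFIX_COUNT];
};

/* Account for a new entry with the given prefix length. */
static void toy_fib_entry_add(struct toy_fib *fib, unsigned int prefix_len)
{
	assert(prefix_len < TOY_PREFIX_COUNT);
	fib->prefix_ref_count[prefix_len]++;
}

/* Account for the removal of an entry with the given prefix length. */
static void toy_fib_entry_del(struct toy_fib *fib, unsigned int prefix_len)
{
	assert(prefix_len < TOY_PREFIX_COUNT);
	assert(fib->prefix_ref_count[prefix_len] > 0);
	fib->prefix_ref_count[prefix_len]--;
}

/* A prefix length is "used" while any entry with that length remains. */
static bool toy_fib_prefix_used(const struct toy_fib *fib,
				unsigned int prefix_len)
{
	return prefix_len < TOY_PREFIX_COUNT &&
	       fib->prefix_ref_count[prefix_len] > 0;
}
```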
  2. 03 Jul, 2016 3 commits
  3. 21 Jun, 2016 10 commits
  4. 10 Jun, 2016 1 commit
  5. 15 Apr, 2016 3 commits
  6. 09 Apr, 2016 1 commit
  7. 07 Apr, 2016 6 commits
    • mlxsw: spectrum: Add IEEE 802.1Qbb PFC support · d81a6bdb
      Committed by Ido Schimmel
      Implement the appropriate DCB ops and allow a user to configure certain
      traffic classes as lossless.
      
      The operation configures PFC for both the egress (respecting PFC frames)
      and ingress (sending PFC frames) parts of the port.
      
      At egress, when a PFC frame is received for a PFC-enabled priority, all
      the priorities mapped to the same TC are stopped.
      
      At ingress, the priority group (PG) buffers to which the enabled PFC
      priorities are mapped are configured to be lossless. PFC frames will be
      transmitted when the Xoff threshold is crossed.
      
      The user-supplied delay parameter is used to determine the PG's size
      according to the following formula:
      
      PG_SIZE = PG_SIZE_LOSSY + delay * CELL_FACTOR + MTU
      
      In the worst case scenario the delay will be made up of packets that
      are all of size CELL_SIZE + 1, which means each packet will require
      almost twice its true size when buffered in the switch. We therefore
      multiply this value by the "cell factor", which is close to 2.
      
      Another MTU is added in case the transmitting host already started
      transmitting a maximum length frame when the PFC packet was received.
      
      As with PAUSE-enabled ports, when the port's MTU is changed both the
      PGs' size and threshold are adjusted accordingly.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
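
      The PG size formula from the PFC commit above, as a worked sketch. It
      assumes every quantity is already expressed in buffer cells and uses a
      cell factor of exactly 2, whereas the commit message only says the
      factor is "close to 2". The function and macro names are hypothetical.

```c
/* Assumed value; the commit message only says "close to 2". */
#define TOY_CELL_FACTOR 2

/* PG_SIZE = PG_SIZE_LOSSY + delay * CELL_FACTOR + MTU, all in cells. */
static unsigned int toy_pfc_pg_size(unsigned int pg_size_lossy,
				    unsigned int delay, unsigned int mtu)
{
	return pg_size_lossy + delay * TOY_CELL_FACTOR + mtu;
}
```

      With a lossy size of 32 cells, a delay of 100 cells and an MTU of 16
      cells, the result is 32 + 100 * 2 + 16 = 248 cells.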
    • mlxsw: spectrum: Add support for PAUSE frames · 9f7ec052
      Committed by Ido Schimmel
      When a packet ingresses the switch it's placed in its assigned priority
      group (PG) buffer in the port's headroom buffer while it goes through
      the switch's pipeline. After going through the pipeline - which
      determines its egress port(s) and traffic class - it's moved to the
      switch's shared buffer awaiting transmission.
      
      However, some packets are not eligible to enter the shared buffer due to
      exceeded quotas or insufficient space. Marking their associated PGs as
      lossless will cause the packets to accumulate in the PG buffer. Another
      reason for packet accumulation is a complicated pipeline (e.g. one
      involving a lot of ACLs).

      To prevent packets from being dropped, a user can enable PAUSE frames on
      the port. This will mark all the active PGs as lossless and set their
      size according to the maximum delay, as it's not configured by the user.
      
                               +----------------+   +
                               |                |   |
                               |                |   |
                               |                |   |
                               |                |   |
                               |                |   |
                               |                |   | Delay
                               |                |   |
                               |                |   |
                               |                |   |
                               |                |   |
                               |                |   |
          Xon/Xoff threshold   +----------------+   +
                               |                |   |
                               |                |   | 2 * MTU
                               |                |   |
                               +----------------+   +
      
      The delay (612 [Cells]) was calculated according to a worst-case scenario
      involving the maximum MTU and 100 m cables.

      After marking the PGs as lossless the device is configured to respect
      incoming PAUSE frames (Rx PAUSE) and generate PAUSE frames (Tx PAUSE)
      according to the user's settings.
      
      Whenever the port's headroom configuration changes we take into account
      the PAUSE configuration, so that we correctly set the PG's type (lossy /
      lossless), size and threshold. This can happen when:
      
      a) The port's MTU changes, as it directly affects the PG's size.
      
      b) A PG is created following user configuration, by binding a priority
      to it.
      
      Note that the relevant SUPPORTED flags were already mistakenly set by
      the driver before this commit.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
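
      Reading the diagram in the PAUSE commit above as "2 * MTU below the
      Xon/Xoff threshold, the delay above it", the PG configuration could be
      sketched as follows. The 612-cell delay is the figure quoted in the
      commit message; the struct, the function name and the assumption that
      the MTU is already given in cells are illustrative only.

```c
#define TOY_PAUSE_DELAY_CELLS 612 /* delay quoted in the commit message */

struct toy_pg_cfg {
	unsigned int size;      /* total PG buffer size, in cells */
	unsigned int threshold; /* Xon/Xoff threshold, in cells */
};

/* Threshold sits at 2 * MTU; the delay is reserved on top of it. */
static struct toy_pg_cfg toy_pause_pg_cfg(unsigned int mtu_cells)
{
	struct toy_pg_cfg cfg;

	cfg.threshold = 2 * mtu_cells;
	cfg.size = cfg.threshold + TOY_PAUSE_DELAY_CELLS;
	return cfg;
}
```

      For an MTU of 16 cells this gives a threshold of 32 cells and a total
      size of 644 cells, which also shows why an MTU change has to be
      followed by a headroom update.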
    • mlxsw: spectrum: Allow setting maximum rate for a TC · cc7cf517
      Committed by Ido Schimmel
      Allow a user to set the maximum rate for a particular TC using DCB ops.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Add IEEE 802.1Qaz ETS support · 8e8dfe9f
      Committed by Ido Schimmel
      Implement the appropriate DCB ops and allow a user to configure:
      	* Priority to traffic class (TC) mapping with a total of 8
      	  supported TCs
      	* Transmission selection algorithm (TSA) for each TC and the
      	  corresponding weights in case of weighted round robin (WRR)
      
      As previously explained, we treat the priority group (PG) buffer in the
      port's headroom as the ingress counterpart of the egress TC. Therefore,
      when a certain priority to TC mapping is configured, we also configure
      the port's headroom buffer.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Introduce support for Data Center Bridging (DCB) · f00817df
      Committed by Ido Schimmel
      Introduce basic infrastructure for DCB and add the missing ops in the
      following patches.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Add bytes to cells helper · 1a198449
      Committed by Ido Schimmel
      Buffers in the switch store packets in units called buffer cells. Add a
      helper to convert from bytes to cells, so that the actual number of
      cells required (the result is rounded up) is returned.
      
      Also, drop the SB (shared buffer) acronym from the BYTES_PER_CELL macro,
      as this unit is also used in the ports' buffers and not only the
      switch's shared buffer.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
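
      A round-up conversion of this kind is usually a ceiling division. The
      sketch below assumes a cell size of 96 bytes purely for illustration;
      the commit message does not state the value, and the names are
      hypothetical.

```c
#define TOY_BYTES_PER_CELL 96 /* assumed cell size, for illustration only */

/* Number of whole cells needed to hold 'bytes' bytes (rounded up). */
static unsigned int toy_bytes_to_cells(unsigned int bytes)
{
	return (bytes + TOY_BYTES_PER_CELL - 1) / TOY_BYTES_PER_CELL;
}
```

      Under that assumption a 97-byte packet needs 2 cells and a 1500-byte
      packet needs 16.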
  8. 06 Apr, 2016 1 commit
  9. 12 Mar, 2016 1 commit
  10. 02 Mar, 2016 3 commits
    • mlxsw: spectrum: Introduce port splitting · 18f1e70c
      Committed by Ido Schimmel
      Allow a user to split or unsplit a port using the newly introduced
      devlink ops.
      
      Once split, the original netdev is destroyed and 2 or 4 others are
      created, according to user configuration. The new ports are like any
      other port, with the sole difference of supporting a lower maximum
      speed. When unsplit, the reverse process takes place.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Store local port to module mapping during init · 558c2d5e
      Committed by Ido Schimmel
      The port netdevs are each associated with a different local port number
      in the device. These local ports are grouped into groups of 4 (e.g.
      (1-4), (5-8)) called clusters. The ports in a cluster can be mapped to
      one of two possible modules. This mapping is board-specific
      and done by the device's firmware during init.
      
      When splitting a port by 4, the device requires us to first unmap all
      the ports in the cluster and then map each to a single lane in the module
      associated with the port netdev used as the handle for the operation.
      This means that two port netdevs will disappear, as only 100Gb/s (4
      lanes) ports can be split and we are guaranteed to have two of these
      ((1, 3), (5, 7) etc.) in a cluster.
      
      When unsplit occurs we need to reinstantiate the two original 100Gb/s
      ports and map each to its original module. Therefore, during driver init
      store the initial local port to module mapping, so it can be used later
      during unsplitting.
      
      Note that a by 2 split doesn't require us to store the mapping, as we
      only need to reinstantiate one port whose module is known.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: Implement devlink interface · c4745500
      Committed by Jiri Pirko
      Implement the newly introduced devlink interface. Add devlink port
      instances for every port and set the port types accordingly.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 18 Feb, 2016 1 commit
  12. 29 Jan, 2016 2 commits
  13. 11 Jan, 2016 1 commit
    • mlxsw: spectrum: Add FDB lock to prevent session interleaving · 366ce603
      Committed by Ido Schimmel
      Dumping the FDB (invoked in process context) or handling FDB
      notifications (polled periodically in delayed work) might each entail
      multiple EMAD transactions due to the number of entries.
      
      While we only allow one EMAD transaction at a time, there is nothing
      stopping the dump and notification processing sessions from
      interleaving. However, this is forbidden by the hardware, so we need to
      make sure only one of these sessions can run at a time.
      
      Solve this by adding a mutex ('fdb_lock'), as both kernel threads can
      sleep while waiting for the response EMAD.
      
      Fixes: 56ade8fe ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
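
      The locking idea in the FDB commit above, sketched in userspace with a
      pthread mutex standing in for the kernel mutex: the lock is held for
      the whole multi-transaction session, not per transaction, so a dump and
      a notification-processing session cannot interleave. All names are
      hypothetical, and toy_transaction merely counts calls where the driver
      would exchange an EMAD.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t toy_fdb_lock = PTHREAD_MUTEX_INITIALIZER;
static bool toy_session_active;            /* protected by toy_fdb_lock */
static unsigned int toy_transactions_done; /* protected by toy_fdb_lock */

/* One EMAD-like transaction; it must only run inside a session. */
static void toy_transaction(void)
{
	assert(toy_session_active);
	toy_transactions_done++;
}

/* A session (FDB dump or notification processing) that spans several
 * transactions. The lock covers the whole session. */
static void toy_fdb_session(unsigned int n_transactions)
{
	unsigned int i;

	pthread_mutex_lock(&toy_fdb_lock);
	assert(!toy_session_active); /* no other session may be in flight */
	toy_session_active = true;
	for (i = 0; i < n_transactions; i++)
		toy_transaction();
	toy_session_active = false;
	pthread_mutex_unlock(&toy_fdb_lock);
}
```

      A sleeping lock fits here because, as the commit message notes, both
      callers may sleep while waiting for the response.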