1. 07 2月, 2021 8 次提交
  2. 04 2月, 2021 2 次提交
  3. 30 1月, 2021 5 次提交
    • V
      net: dsa: felix: perform switch setup for tag_8021q · e21268ef
      Vladimir Oltean 提交于
      Unlike sja1105, the only other user of the software-defined tag_8021q.c
      tagger format, the implementation we choose for the Felix DSA switch
      driver preserves full functionality under a vlan_filtering bridge
      (i.e. IP termination works through the DSA user ports under all
      circumstances).
      
      The tag_8021q protocol just wants:
      - Identifying the ingress switch port based on the RX VLAN ID, as seen
        by the CPU. We achieve this by using the TCAM engines (which are also
        used for tc-flower offload) to push the RX VLAN as a second, outer
        tag, on egress towards the CPU port.
      - Steering traffic injected into the switch from the network stack
        towards the correct front port based on the TX VLAN, and consuming
        (popping) that header on the switch's egress.
      
      A tc-flower pseudocode of the static configuration done by the driver
      would look like this:
      
      $ tc qdisc add dev <cpu-port> clsact
      $ for eth in swp0 swp1 swp2 swp3; do \
      	tc filter add dev <cpu-port> egress flower indev ${eth} \
      		action vlan push id <rxvlan> protocol 802.1ad; \
      	tc filter add dev <cpu-port> ingress protocol 802.1Q flower
      		vlan_id <txvlan> action vlan pop \
      		action mirred egress redirect dev ${eth}; \
      done
      
      but of course since DSA does not register network interfaces for the CPU
      port, this configuration would be impossible for the user to do. Also,
      due to the same reason, it is impossible for the user to inadvertently
      delete these rules using tc. These rules do not collide in any way with
      tc-flower, they just consume some TCAM space, which is something we can
      live with.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      e21268ef
    • V
      net: mscc: ocelot: don't use NPI tag prefix for the CPU port module · cacea62f
      Vladimir Oltean 提交于
      Context: Ocelot switches put the injection/extraction frame header in
      front of the Ethernet header. When used in NPI mode, a DSA master would
      see junk instead of the destination MAC address, and it would most
      likely drop the packets. So the Ocelot frame header can have an optional
      prefix, which is just "ff:ff:ff:ff:ff:fe > ff:ff:ff:ff:ff:ff" padding
      put before the actual tag (still before the real Ethernet header) such
      that the DSA master thinks it's looking at a broadcast frame with a
      strange EtherType.
      
      Unfortunately, a lesson learned in commit 69df578c ("net: mscc:
      ocelot: eliminate confusion between CPU and NPI port") seems to have
      been forgotten in the meanwhile.
      
      The CPU port module and the NPI port have independent settings for the
      length of the tag prefix. However, the driver is using the same variable
      to program both of them.
      
      There is no reason really to use any tag prefix with the CPU port
      module, since that is not connected to any Ethernet port. So this patch
      makes the inj_prefix and xtr_prefix variables apply only to the NPI
      port (which the switchdev ocelot_vsc7514 driver does not use).
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      cacea62f
    • V
      net: mscc: ocelot: reapply bridge forwarding mask on bonding join/leave · 9b521250
      Vladimir Oltean 提交于
      Applying the bridge forwarding mask currently is done only on the STP
      state changes for any port. But it depends on both STP state changes,
      and bonding interface state changes. Export the bit that recalculates
      the forwarding mask so that it could be reused, and call it when a port
      starts and stops offloading a bonding interface.
      
      Now that the logic is split into a separate function, we can rename "p"
      into "port", since the "port" variable was already taken in
      ocelot_bridge_stp_state_set. Also, we can rename "i" into "lag", to make
      it more clear what is it that we're iterating through.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NAlexandre Belloni <alexandre.belloni@bootlin.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      9b521250
    • V
      net: mscc: ocelot: store a namespaced VCAP filter ID · 50c6cc5b
      Vladimir Oltean 提交于
      We will be adding some private VCAP filters that should not interfere in
      any way with the filters added using tc-flower. So we need to allocate
      some IDs which will not be used by tc.
      
      Currently ocelot uses an u32 id derived from the flow cookie, which in
      itself is an unsigned long. This is a problem in itself, since on 64 bit
      systems, sizeof(unsigned long)=8, so the driver is already truncating
      these.
      
      Create a struct ocelot_vcap_id which contains the full unsigned long
      cookie from tc, as well as a boolean that is supposed to namespace the
      filters added by tc with the ones that aren't.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      50c6cc5b
    • V
      net: mscc: ocelot: export VCAP structures to include/soc/mscc · 0e9bb4e9
      Vladimir Oltean 提交于
      The Felix driver will need to preinstall some VCAP filters for its
      tag_8021q implementation (outside of the tc-flower offload logic), so
      these need to be exported to the common includes.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      0e9bb4e9
  4. 21 1月, 2021 1 次提交
    • A
      net: mscc: ocelot: Fix multicast to the CPU port · 584b7cfc
      Alban Bedel 提交于
      Multicast entries in the MAC table use the high bits of the MAC
      address to encode the ports that should get the packets. But this port
      mask does not work for the CPU port, to receive these packets on the
      CPU port the MAC_CPU_COPY flag must be set.
      
      Because of this IPv6 was effectively not working because neighbor
      solicitations were never received. This was not apparent before commit
      9403c158 (net: mscc: ocelot: support IPv4, IPv6 and plain Ethernet mdb
      entries) as the IPv6 entries were broken so all incoming IPv6
      multicast was then treated as unknown and flooded on all ports.
      
      To fix this problem rework the ocelot_mact_learn() to set the
      MAC_CPU_COPY flag when a multicast entry that target the CPU port is
      added. For this we have to read back the ports endcoded in the pseudo
      MAC address by the caller. It is not a very nice design but that avoid
      changing the callers and should make backporting easier.
      Signed-off-by: NAlban Bedel <alban.bedel@aerq.com>
      Fixes: 9403c158 ("net: mscc: ocelot: support IPv4, IPv6 and plain Ethernet mdb entries")
      Link: https://lore.kernel.org/r/20210119140638.203374-1-alban.bedel@aerq.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      584b7cfc
  5. 19 1月, 2021 1 次提交
  6. 17 1月, 2021 1 次提交
  7. 16 1月, 2021 6 次提交
    • V
      net: mscc: ocelot: configure watermarks using devlink-sb · f59fd9ca
      Vladimir Oltean 提交于
      Using devlink-sb, we can configure 12/16 (the important 75%) of the
      switch's controlling watermarks for congestion drops, and we can monitor
      50% of the watermark occupancies (we can monitor the reservation
      watermarks, but not the sharing watermarks, which are exposed as pool
      sizes).
      
      The following definitions can be made:
      
      SB_BUF=0 # The devlink-sb for frame buffers
      SB_REF=1 # The devlink-sb for frame references
      POOL_ING=0 # The pool for ingress traffic. Both devlink-sb instances
                 # have one of these.
      POOL_EGR=1 # The pool for egress traffic. Both devlink-sb instances
                 # have one of these.
      
      Editing the hardware watermarks is done in the following way:
      BUF_xxxx_I is accessed when sb=$SB_BUF and pool=$POOL_ING
      REF_xxxx_I is accessed when sb=$SB_REF and pool=$POOL_ING
      BUF_xxxx_E is accessed when sb=$SB_BUF and pool=$POOL_EGR
      REF_xxxx_E is accessed when sb=$SB_REF and pool=$POOL_EGR
      
      Configuring the sharing watermarks for COL_SHR(dp=0) is done implicitly
      by modifying the corresponding pool size. By default, the pool size has
      maximum size, so this can be skipped.
      
      devlink sb pool set pci/0000:00:00.5 sb $SB_BUF pool $POOL_ING \
      	size 129840 thtype static
      
      Since by default there is no buffer reservation, the above command has
      maxed out BUF_COL_SHR_I(dp=0).
      
      Configuring the per-port reservation watermark (P_RSRV) is done in the
      following way:
      
      devlink sb port pool set pci/0000:00:00.5/0 sb $SB_BUF \
      	pool $POOL_ING th 1000
      
      The above command sets BUF_P_RSRV_I(port 0) to 1000 bytes. After this
      command, the sharing watermarks are internally reconfigured with 1000
      bytes less, i.e. from 129840 bytes to 128840 bytes.
      
      Configuring the per-port-tc reservation watermarks (Q_RSRV) is done in
      the following way:
      
      for tc in {0..7}; do
      	devlink sb tc bind set pci/0000:00:00.5/0 sb 0 tc $tc \
      		type ingress pool $POOL_ING \
      		th 3000
      done
      
      The above command sets BUF_Q_RSRV_I(port 0, tc 0..7) to 3000 bytes.
      The sharing watermarks are again reconfigured with 24000 bytes less.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      f59fd9ca
    • V
      net: mscc: ocelot: initialize watermarks to sane defaults · a4ae997a
      Vladimir Oltean 提交于
      This is meant to be a gentle introduction into the world of watermarks
      on ocelot. The code is placed in ocelot_devlink.c because it will be
      integrated with devlink, even if it isn't right now.
      
      My first step was intended to be to replicate the default configuration
      of the congestion watermarks programatically, since they are now going
      to be tuned by the user.
      
      But after studying and understanding through trial and error how they
      work, I now believe that the configuration used out of reset does not do
      justice to the word "reservation", since the sum of all reservations
      exceeds the total amount of resources (otherwise said, all reservations
      cannot be fulfilled at the same time, which means that, contrary to the
      reference manual, they don't guarantee anything).
      
      As an example, here's a dump of the reservation watermarks for frame
      buffers, for port 0 (for brevity, the ports 1-6 were omitted, but they
      have the same configuration):
      
      BUF_Q_RSRV_I(port 0, prio 0) = max 3000 bytes
      BUF_Q_RSRV_I(port 0, prio 1) = max 3000 bytes
      BUF_Q_RSRV_I(port 0, prio 2) = max 3000 bytes
      BUF_Q_RSRV_I(port 0, prio 3) = max 3000 bytes
      BUF_Q_RSRV_I(port 0, prio 4) = max 3000 bytes
      BUF_Q_RSRV_I(port 0, prio 5) = max 3000 bytes
      BUF_Q_RSRV_I(port 0, prio 6) = max 3000 bytes
      BUF_Q_RSRV_I(port 0, prio 7) = max 3000 bytes
      
      Otherwise said, every port-tc has an ingress reservation of 3000 bytes,
      and there are 7 ports in VSC9959 Felix (6 user ports and 1 CPU port).
      Concentrating only on the ingress reservations, there are, in total,
      8 [traffic classes] x 7 [ports] x 3000 [bytes] = 168,000 bytes of memory
      reserved on ingress.
      But, surprise, Felix only has 128 KB of packet buffer in total...
      A similar thing happens with Seville, which has a larger packet buffer,
      but also more ports, and the default configuration is also overcommitted.
      
      This patch disables the (apparently) bogus reservations and moves all
      resources to the shared area. This way, real reservations can be set up
      by the user, using devlink-sb.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      a4ae997a
    • V
      net: mscc: ocelot: register devlink ports · 6c30384e
      Vladimir Oltean 提交于
      Add devlink integration into the mscc_ocelot switchdev driver. All
      physical ports (i.e. the unused ones as well) except the CPU port module
      at ocelot->num_phys_ports are registered with devlink, and that requires
      keeping the devlink_port structure outside struct ocelot_port_private,
      since the latter has a 1:1 mapping with a struct net_device (which does
      not exist for unused ports).
      
      Since we use devlink_port_type_eth_set to link the devlink port to the
      net_device, we can as well remove the .ndo_get_phys_port_name and
      .ndo_get_port_parent_id implementations, since devlink takes care of
      retrieving the port name and number automatically, once
      .ndo_get_devlink_port is implemented.
      
      Note that the felix DSA driver is already integrated with devlink by
      default, since that is a thing that the DSA core takes care of. This is
      the reason why these devlink stubs were put in ocelot_net.c and not in
      the common library. It is also the reason why ocelot::devlink is a
      pointer and not a full structure embedded inside struct ocelot: because
      the mscc_ocelot driver allocates that by itself (as the container of
      struct ocelot, in fact), but in the case of felix, it is DSA who
      allocates the devlink, and felix just propagates the pointer towards
      struct ocelot.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      6c30384e
    • V
      net: mscc: ocelot: delete unused ocelot_set_cpu_port prototype · c6c65d47
      Vladimir Oltean 提交于
      This is a leftover of commit 69df578c ("net: mscc: ocelot: eliminate
      confusion between CPU and NPI port") which renamed that function.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      c6c65d47
    • V
      net: mscc: ocelot: add ops for decoding watermark threshold and occupancy · 703b7621
      Vladimir Oltean 提交于
      We'll need to read back the watermark thresholds and occupancy from
      hardware (for devlink-sb integration), not only to write them as we did
      so far in ocelot_port_set_maxlen. So introduce 2 new functions in struct
      ocelot_ops, similar to wm_enc, and implement them for the 3 supported
      mscc_ocelot switches.
      
      Remove the INUSE and MAXUSE unpacking helpers for the QSYS_RES_STAT
      register, because that doesn't scale with the number of switches that
      mscc_ocelot supports now. They have different bit widths for the
      watermarks, and we need function pointers to abstract that difference
      away.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      703b7621
    • V
      net: mscc: ocelot: auto-detect packet buffer size and number of frame references · f6fe01d6
      Vladimir Oltean 提交于
      Instead of reading these values from the reference manual and writing
      them down into the driver, it appears that the hardware gives us the
      option of detecting them dynamically.
      
      The number of frame references corresponds to what the reference manual
      notes, however it seems that the frame buffers are reported as slightly
      less than the books would indicate. On VSC9959 (Felix), the books say it
      should have 128KB of packet buffer, but the registers indicate only
      129840 bytes (126.79 KB). Also, the unit of measurement for FREECNT from
      the documentation of all these devices is incorrect (taken from an older
      generation). This was confirmed by Younes Leroul from Microchip support.
      
      Not having anything better to do with these values at the moment* (this
      will change soon), let's just print them.
      
      *The frame buffer size is, in fact, used to calculate the tail dropping
      watermarks.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      f6fe01d6
  8. 12 1月, 2021 3 次提交
    • V
      net: switchdev: remove the transaction structure from port attributes · bae33f2b
      Vladimir Oltean 提交于
      Since the introduction of the switchdev API, port attributes were
      transmitted to drivers for offloading using a two-step transactional
      model, with a prepare phase that was supposed to catch all errors, and a
      commit phase that was supposed to never fail.
      
      Some classes of failures can never be avoided, like hardware access, or
      memory allocation. In the latter case, merely attempting to move the
      memory allocation to the preparation phase makes it impossible to avoid
      memory leaks, since commit 91cf8ece ("switchdev: Remove unused
      transaction item queue") which has removed the unused mechanism of
      passing on the allocated memory between one phase and another.
      
      It is time we admit that separating the preparation from the commit
      phase is something that is best left for the driver to decide, and not
      something that should be baked into the API, especially since there are
      no switchdev callers that depend on this.
      
      This patch removes the struct switchdev_trans member from switchdev port
      attribute notifier structures, and converts drivers to not look at this
      member.
      
      In part, this patch contains a revert of my previous commit 2e554a7a
      ("net: dsa: propagate switchdev vlan_filtering prepare phase to
      drivers").
      
      For the most part, the conversion was trivial except for:
      - Rocker's world implementation based on Broadcom OF-DPA had an odd
        implementation of ofdpa_port_attr_bridge_flags_set. The conversion was
        done mechanically, by pasting the implementation twice, then only
        keeping the code that would get executed during prepare phase on top,
        then only keeping the code that gets executed during the commit phase
        on bottom, then simplifying the resulting code until this was obtained.
      - DSA's offloading of STP state, bridge flags, VLAN filtering and
        multicast router could be converted right away. But the ageing time
        could not, so a shim was introduced and this was left for a further
        commit.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: NLinus Walleij <linus.walleij@linaro.org>
      Acked-by: NJiri Pirko <jiri@nvidia.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org> # RTL8366RB
      Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      bae33f2b
    • V
      net: switchdev: remove the transaction structure from port object notifiers · ffb68fc5
      Vladimir Oltean 提交于
      Since the introduction of the switchdev API, port objects were
      transmitted to drivers for offloading using a two-step transactional
      model, with a prepare phase that was supposed to catch all errors, and a
      commit phase that was supposed to never fail.
      
      Some classes of failures can never be avoided, like hardware access, or
      memory allocation. In the latter case, merely attempting to move the
      memory allocation to the preparation phase makes it impossible to avoid
      memory leaks, since commit 91cf8ece ("switchdev: Remove unused
      transaction item queue") which has removed the unused mechanism of
      passing on the allocated memory between one phase and another.
      
      It is time we admit that separating the preparation from the commit
      phase is something that is best left for the driver to decide, and not
      something that should be baked into the API, especially since there are
      no switchdev callers that depend on this.
      
      This patch removes the struct switchdev_trans member from switchdev port
      object notifier structures, and converts drivers to not look at this
      member.
      
      Where driver conversion is trivial (like in the case of the Marvell
      Prestera driver, NXP DPAA2 switch, TI CPSW, and Rocker drivers), it is
      done in this patch.
      
      Where driver conversion needs more attention (DSA, Mellanox Spectrum),
      the conversion is left for subsequent patches and here we only fake the
      prepare/commit phases at a lower level, just not in the switchdev
      notifier itself.
      
      Where the code has a natural structure that is best left alone as a
      preparation and a commit phase (as in the case of the Ocelot switch),
      that structure is left in place, just made to not depend upon the
      switchdev transactional model.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Acked-by: NLinus Walleij <linus.walleij@linaro.org>
      Acked-by: NJiri Pirko <jiri@nvidia.com>
      Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      ffb68fc5
    • V
      net: switchdev: remove vid_begin -> vid_end range from VLAN objects · b7a9e0da
      Vladimir Oltean 提交于
      The call path of a switchdev VLAN addition to the bridge looks something
      like this today:
      
              nbp_vlan_init
              |  __br_vlan_set_default_pvid
              |  |                       |
              |  |    br_afspec          |
              |  |        |              |
              |  |        v              |
              |  | br_process_vlan_info  |
              |  |        |              |
              |  |        v              |
              |  |   br_vlan_info        |
              |  |       / \            /
              |  |      /   \          /
              |  |     /     \        /
              |  |    /       \      /
              v  v   v         v    v
            nbp_vlan_add   br_vlan_add ------+
             |              ^      ^ |       |
             |             /       | |       |
             |            /       /  /       |
             \ br_vlan_get_master/  /        v
              \        ^        /  /  br_vlan_add_existing
               \       |       /  /          |
                \      |      /  /          /
                 \     |     /  /          /
                  \    |    /  /          /
                   \   |   /  /          /
                    v  |   | v          /
                    __vlan_add         /
                       / |            /
                      /  |           /
                     v   |          /
         __vlan_vid_add  |         /
                     \   |        /
                      v  v        v
            br_switchdev_port_vlan_add
      
      The ranges UAPI was introduced to the bridge in commit bdced7ef
      ("bridge: support for multiple vlans and vlan ranges in setlink and
      dellink requests") (Jan 10 2015). But the VLAN ranges (parsed in br_afspec)
      have always been passed one by one, through struct bridge_vlan_info
      tmp_vinfo, to br_vlan_info. So the range never went too far in depth.
      
      Then Scott Feldman introduced the switchdev_port_bridge_setlink function
      in commit 47f8328b ("switchdev: add new switchdev bridge setlink").
      That marked the introduction of the SWITCHDEV_OBJ_PORT_VLAN, which made
      full use of the range. But switchdev_port_bridge_setlink was called like
      this:
      
      br_setlink
      -> br_afspec
      -> switchdev_port_bridge_setlink
      
      Basically, the switchdev and the bridge code were not tightly integrated.
      Then commit 41c498b9 ("bridge: restore br_setlink back to original")
      came, and switchdev drivers were required to implement
      .ndo_bridge_setlink = switchdev_port_bridge_setlink for a while.
      
      In the meantime, commits such as 0944d6b5 ("bridge: try switchdev op
      first in __vlan_vid_add/del") finally made switchdev penetrate the
      br_vlan_info() barrier and start to develop the call path we have today.
      But remember, br_vlan_info() still receives VLANs one by one.
      
      Then Arkadi Sharshevsky refactored the switchdev API in 2017 in commit
      29ab586c ("net: switchdev: Remove bridge bypass support from
      switchdev") so that drivers would not implement .ndo_bridge_setlink any
      longer. The switchdev_port_bridge_setlink also got deleted.
      This refactoring removed the parallel bridge_setlink implementation from
      switchdev, and left the only switchdev VLAN objects to be the ones
      offloaded from __vlan_vid_add (basically RX filtering) and  __vlan_add
      (the latter coming from commit 9c86ce2c ("net: bridge: Notify about
      bridge VLANs")).
      
      That is to say, today the switchdev VLAN object ranges are not used in
      the kernel. Refactoring the above call path is a bit complicated, when
      the bridge VLAN call path is already a bit complicated.
      
      Let's go off and finish the job of commit 29ab586c by deleting the
      bogus iteration through the VLAN ranges from the drivers. Some aspects
      of this feature never made too much sense in the first place. For
      example, what is a range of VLANs all having the BRIDGE_VLAN_INFO_PVID
      flag supposed to mean, when a port can obviously have a single pvid?
      This particular configuration _is_ denied as of commit 6623c60d
      ("bridge: vlan: enforce no pvid flag in vlan ranges"), but from an API
      perspective, the driver still has to play pretend, and only offload the
      vlan->vid_end as pvid. And the addition of a switchdev VLAN object can
      modify the flags of another, completely unrelated, switchdev VLAN
      object! (a VLAN that is PVID will invalidate the PVID flag from whatever
      other VLAN had previously been offloaded with switchdev and had that
      flag. Yet switchdev never notifies about that change, drivers are
      supposed to guess).
      
      Nonetheless, having a VLAN range in the API makes error handling look
      scarier than it really is - unwinding on errors and all of that.
      When in reality, no one really calls this API with more than one VLAN.
      It is all unnecessary complexity.
      
      And despite appearing pretentious (two-phase transactional model and
      all), the switchdev API is really sloppy because the VLAN addition and
      removal operations are not paired with one another (you can add a VLAN
      100 times and delete it just once). The bridge notifies through
      switchdev of a VLAN addition not only when the flags of an existing VLAN
      change, but also when nothing changes. There are switchdev drivers out
      there who don't like adding a VLAN that has already been added, and
      those checks don't really belong at driver level. But the fact that the
      API contains ranges is yet another factor that prevents this from being
      addressed in the future.
      
      Of the existing switchdev pieces of hardware, it appears that only
      Mellanox Spectrum supports offloading more than one VLAN at a time,
      through mlxsw_sp_port_vlan_set. I have kept that code internal to the
      driver, because there is some more bookkeeping that makes use of it, but
      I deleted it from the switchdev API. But since the switchdev support for
      ranges has already been de facto deleted by a Mellanox employee and
      nobody noticed for 4 years, I'm going to assume it's not a biggie.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com> # switchdev and mlxsw
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      b7a9e0da
  9. 17 12月, 2020 1 次提交
  10. 15 12月, 2020 1 次提交
  11. 06 12月, 2020 1 次提交
    • V
      net: mscc: ocelot: fix dropping of unknown IPv4 multicast on Seville · edd2410b
      Vladimir Oltean 提交于
      The current assumption is that the felix DSA driver has flooding knobs
      per traffic class, while ocelot switchdev has a single flooding knob.
      This was correct for felix VSC9959 and ocelot VSC7514, but with the
      introduction of seville VSC9953, we see a switch driven by felix.c which
      has a single flooding knob.
      
      So it is clear that we must do what should have been done from the
      beginning, which is not to overwrite the configuration done by ocelot.c
      in felix, but instead to teach the common ocelot library about the
      differences in our switches, and set up the flooding PGIDs centrally.
      
      The effect that the bogus iteration through FELIX_NUM_TC has upon
      seville is quite dramatic. ANA_FLOODING is located at 0x00b548, and
      ANA_FLOODING_IPMC is located at 0x00b54c. So the bogus iteration will
      actually overwrite ANA_FLOODING_IPMC when attempting to write
      ANA_FLOODING[1]. There is no ANA_FLOODING[1] in sevile, just ANA_FLOODING.
      
      And when ANA_FLOODING_IPMC is overwritten with a bogus value, the effect
      is that ANA_FLOODING_IPMC gets the value of 0x0003CF7D:
      	MC6_DATA = 61,
      	MC6_CTRL = 61,
      	MC4_DATA = 60,
      	MC4_CTRL = 0.
      Because MC4_CTRL is zero, this means that IPv4 multicast control packets
      are not flooded, but dropped. An invalid configuration, and this is how
      the issue was actually spotted.
      Reported-by: NEldar Gasanov <eldargasanov2@gmail.com>
      Reported-by: NMaxim Kochetkov <fido_max@inbox.ru>
      Tested-by: NEldar Gasanov <eldargasanov2@gmail.com>
      Fixes: 84705fc1 ("net: dsa: felix: introduce support for Seville VSC9953 switch")
      Fixes: 3c7b51bd ("net: dsa: felix: allow flooding for all traffic classes")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NAlexandre Belloni <alexandre.belloni@bootlin.com>
      Link: https://lore.kernel.org/r/20201204175416.1445937-1-vladimir.oltean@nxp.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      edd2410b
  12. 03 11月, 2020 6 次提交
    • V
      net: mscc: ocelot: deny changing the native VLAN from the prepare phase · 2f0402fe
      Vladimir Oltean 提交于
      Put the preparation phase of switchdev VLAN objects to some good use,
      and move the check we already had, for preventing the existence of more
      than one egress-untagged VLAN per port, to the preparation phase of the
      addition.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      2f0402fe
    • V
      net: mscc: ocelot: move the logic to drop 802.1p traffic to the pvid deletion · be0576fe
      Vladimir Oltean 提交于
      Currently, the ocelot_port_set_native_vlan() function starts dropping
      untagged and prio-tagged traffic when the native VLAN is removed?
      
      What is the native VLAN? It is the only egress-untagged VLAN that ocelot
      supports on a port. If the port is a trunk with 100 VLANs, one of those
      VLANs can be transmitted as egress-untagged, and that's the native VLAN.
      
      Is it wrong to drop untagged and prio-tagged traffic if there's no
      native VLAN? Yes and no.
      
      In this case, which is more typical, it's ok to apply that drop
      configuration:
      $ bridge vlan add dev swp0 vid 1 pvid untagged <- this is the native VLAN
      $ bridge vlan add dev swp0 vid 100
      $ bridge vlan add dev swp0 vid 101
      $ bridge vlan del dev swp0 vid 1 <- delete the native VLAN
      But only because the pvid and the native VLAN have the same ID.
      
      In this case, it isn't:
      $ bridge vlan add dev swp0 vid 1 pvid
      $ bridge vlan add dev swp0 vid 100 untagged <- this is the native VLAN
      $ bridge vlan del dev swp0 vid 101
      $ bridge vlan del dev swp0 vid 100 <- delete the native VLAN
      
      It's wrong, because the switch will drop untagged and prio-tagged
      traffic now, despite having a valid pvid of 1.
      
      The confusion seems to stem from the fact that the native VLAN is an
      egress setting, while the PVID is an ingress setting. It would be
      correct to drop untagged and prio-tagged traffic only if there was no
      pvid on the port. So let's do just that.
      
      Background:
      https://lore.kernel.org/netdev/CA+h21hrRMrLH-RjBGhEJSTZd6_QPRSd3RkVRQF-wNKkrgKcRSA@mail.gmail.com/#tSigned-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      be0576fe
    • V
      net: mscc: ocelot: add a "valid" boolean to struct ocelot_vlan · e2b2e83e
      Vladimir Oltean 提交于
      Currently we are checking in some places whether the port has a native
      VLAN on egress or not, by comparing the ocelot_port->vid value with zero.
      
      That works, because VID 0 can never be a native VLAN configured by the
      bridge, but now we want to make similar checks for the pvid. That won't
      work, because there are cases when we do have the pvid set to 0 (not by
      the bridge, by ourselves, but still.. it's confusing). And we can't
      encode a negative value into an u16, so add a bool to the structure.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      e2b2e83e
    • V
      net: mscc: ocelot: transform the pvid and native vlan values into a structure · c3e58a75
      Vladimir Oltean 提交于
      This is a mechanical patch only.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      c3e58a75
    • V
      net: mscc: ocelot: don't reset the pvid to 0 when deleting it · 110e847c
      Vladimir Oltean 提交于
      I have no idea why this code is here, but I have 2 hypotheses:
      
      1.
      A desperate attempt to keep untagged traffic working when the bridge
      deletes the pvid on a port.
      
      There was a fairly okay discussion here:
      https://lore.kernel.org/netdev/CA+h21hrRMrLH-RjBGhEJSTZd6_QPRSd3RkVRQF-wNKkrgKcRSA@mail.gmail.com/#t
      which established that in vlan_filtering=1 mode, the absence of a pvid
      should denote that the ingress port should drop untagged and priority
      tagged traffic. While in vlan_filtering=0 mode, nothing should change.
      
      So in vlan_filtering=1 mode, we should simply let things happen, and not
      attempt to save the day. And in vlan_filtering=0 mode, the pvid is 0
      anyway, no need to do anything.
      
      2.
      The driver encodes the native VLAN (ocelot_port->vid) value of 0 as
      special, meaning "not valid". There are checks based on that. But there
      are no such checks for the ocelot_port->pvid value of 0. In fact, that's
      a perfectly valid value, which is used in standalone mode. Maybe there
      was some confusion and the author thought that 0 means "invalid" here as
      well.
      
      In conclusion, delete the code*.
      
      *in fact we'll add it back later, in a slightly different form, but for
      an entirely different reason than the one for which this exists now.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      110e847c
    • V
      net: mscc: ocelot: use the pvid of zero when bridged with vlan_filtering=0 · 75e5a554
      Vladimir Oltean 提交于
      Currently, mscc_ocelot ports configure pvid=0 in standalone mode, and
      inherit the pvid from the bridge when one is present.
      
      When the bridge has vlan_filtering=0, the software semantics are that
      packets should be received regardless of whether there's a pvid
      configured on the ingress port or not. However, ocelot does not observe
      those semantics today.
      
      Moreover, changing the PVID is also a problem with vlan_filtering=0.
      We are privately remapping the VID of FDB, MDB entries to the port's
      PVID when those are VLAN-unaware (i.e. when the VID of these entries
      comes to us as 0). But we have no logic of adjusting that remapping when
      the user changes the pvid and vlan_filtering is 0. So stale entries
      would be left behind, and untagged traffic will stop matching on them.
      
      And even if we were to solve that, there's an even bigger problem. If
      swp0 has pvid 1, and swp1 has pvid 2, and both are under a vlan_filtering=0
      bridge, they should be able to forward traffic between one another.
      However, with ocelot they wouldn't do that.
      
      The simplest way of fixing this is to never configure the pvid based on
      what the bridge is asking for, when vlan_filtering is 0. Only if there
      was a VLAN that the bridge couldn't mangle, that we could use as pvid....
      So, turns out, there's 0 just for that. And for a reason: IEEE
      802.1Q-2018, page 247, Table 9-2-Reserved VID values says:
      
      	The null VID. Indicates that the tag header contains only
      	priority information; no VID is present in the frame.
      	This VID value shall not be configured as a PVID or a member
      	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      	of a VID Set, or configured in any FDB entry, or used in any
      	Management operation.
      
      So, aren't we doing exactly what 802.1Q says not to? Well, in a way, but
      what we're doing here is just driver-level bookkeeping, all for the
      better. The fact that we're using a pvid of 0 is not observable behavior
      from the outside world: the network stack does not see the classified
      VLAN that the switch uses, in vlan_filtering=0 mode. And we're also more
      consistent with the standalone mode now.
      
      And now that we use the pvid of 0 in this mode, there's another advantage:
      we don't need to perform any VID remapping for FDB and MDB entries either,
      we can just use the VID of 0 that the bridge is passing to us.
      
      The only gotcha is that every time we change the vlan_filtering setting,
      we need to reapply the pvid (either to 0, or to the value from the bridge).
      A small side-effect visible in the patch is that ocelot_port_set_pvid
      needs to be moved above ocelot_port_vlan_filtering, so that it can be
      called from there without forward-declarations.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      75e5a554
  13. 31 10月, 2020 4 次提交