1. 21 Apr, 2021 1 commit
  2. 14 Apr, 2021 1 commit
    • of: net: pass the dst buffer to of_get_mac_address() · 83216e39
      Michael Walle committed
      of_get_mac_address() returns a "const void*" pointer to a MAC address.
      Lately, support to fetch the MAC address by an NVMEM provider was added.
      But this will only work with platform devices. It will not work with
      PCI devices (e.g. of an integrated root complex) and esp. not with DSA
      ports.
      
      There is an of_* variant of the nvmem binding which works without
      devices. The returned data of a nvmem_cell_read() has to be freed after
      use. On the other hand, the return value of of_get_mac_address() points
      to some static data without a lifetime. The trick for now was to
      allocate a device-resource-managed buffer which is then returned. This
      will only work if we have an actual device.
      
      Change it so that the caller of of_get_mac_address() has to supply a
      buffer to which the MAC address is written. Unfortunately, this will
      touch all drivers which use of_get_mac_address().
      
      Usually the code looks like:
      
        const char *addr;
        addr = of_get_mac_address(np);
        if (!IS_ERR(addr))
          ether_addr_copy(ndev->dev_addr, addr);
      
      This can then be simply rewritten as:
      
        of_get_mac_address(np, ndev->dev_addr);
      
      Sometimes is_valid_ether_addr() is used to test the MAC address.
      of_get_mac_address() already makes sure that it only returns a valid MAC
      address. Thus we can just test its return code. But we have to be
      careful if there are still other sources for the MAC address before the
      of_get_mac_address(). In this case we have to keep the
      is_valid_ether_addr() call.
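The calling convention described above can be sketched in plain userspace C. This is only an illustration under stated assumptions: `fake_node`, `fake_get_mac_address()`, `mac_is_valid()` and `pick_mac()` are hypothetical stand-ins, not the kernel functions, and the "hardware" address is an invented earlier MAC source.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical stand-in for a device tree node that may carry a MAC. */
struct fake_node {
	bool has_mac;
	uint8_t mac[ETH_ALEN];
};

/* Valid here means: not multicast and not all-zero. */
static bool mac_is_valid(const uint8_t *addr)
{
	static const uint8_t zero[ETH_ALEN];

	return !(addr[0] & 0x01) && memcmp(addr, zero, ETH_ALEN) != 0;
}

/* New-style lookup: write into the caller's buffer, return 0 or -errno. */
static int fake_get_mac_address(const struct fake_node *np, uint8_t *addr)
{
	assert(np && addr);
	if (!np->has_mac || !mac_is_valid(np->mac))
		return -ENODEV;
	memcpy(addr, np->mac, ETH_ALEN);
	return 0;
}

/*
 * A caller with an earlier MAC source (a hypothetical value read from
 * hardware) still has to validate that source itself. Only the lookup's
 * return code can be trusted without a separate validity check.
 */
static int pick_mac(const uint8_t *hw, const struct fake_node *np, uint8_t *out)
{
	if (hw && mac_is_valid(hw)) {
		memcpy(out, hw, ETH_ALEN);
		return 0;
	}
	return fake_get_mac_address(np, out);
}
```

Because the buffer belongs to the caller, nothing has to be freed afterwards and no device is needed to own the memory, which is the point of the change.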
      
      The following coccinelle patch was used to convert common cases to the
      new style. Afterwards, I manually went over the drivers and fixed the
      return code variable: either adding a new one or, if one was already
      available, using that. Mansour Moufid, thanks for that coccinelle patch!
      
      <smpl>
      @a@
      identifier x;
      expression y, z;
      @@
      - x = of_get_mac_address(y);
      + x = of_get_mac_address(y, z);
        <...
      - ether_addr_copy(z, x);
        ...>
      
      @@
      identifier a.x;
      @@
      - if (<+... x ...+>) {}
      
      @@
      identifier a.x;
      @@
        if (<+... x ...+>) {
            ...
        }
      - else {}
      
      @@
      identifier a.x;
      expression e;
      @@
      - if (<+... x ...+>@e)
      -     {}
      - else
      + if (!(e))
            {...}
      
      @@
      expression x, y, z;
      @@
      - x = of_get_mac_address(y, z);
      + of_get_mac_address(y, z);
        ... when != x
      </smpl>
      
      All drivers, except drivers/net/ethernet/aeroflex/greth.c, were
      compile-time tested.
      Suggested-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Michael Walle <michael@walle.cc>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 19 Mar, 2021 1 commit
  4. 18 Mar, 2021 1 commit
  5. 17 Feb, 2021 1 commit
  6. 15 Feb, 2021 3 commits
    • net: dsa: propagate extack to .port_vlan_filtering · 89153ed6
      Vladimir Oltean committed
      Some drivers can't dynamically change the VLAN filtering option, or
      impose some restrictions. It would be nice to propagate this info
      through netlink instead of printing it to a kernel log that might never
      be read. Also, netlink extack includes the module that emitted the
      message, which means that it's easier to figure out which ones are
      driver-generated errors as opposed to command misuse.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: propagate extack to .port_vlan_add · 31046a5f
      Vladimir Oltean committed
      Allow drivers to communicate their restrictions to user space directly,
      instead of printing to the kernel log. Where the conversion would have
      been lossy and things like VLAN ID could no longer be conveyed (due to
      the lack of support for printf format specifier in netlink extack), I
      chose to keep the messages in full form to the kernel log only, and
      leave it up to individual driver maintainers to move more messages to
      extack.
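The reporting pattern described above can be modeled in userspace as a rough sketch. Everything here is hypothetical: `fake_extack`, `fake_set_err_msg()`, `fake_port_vlan_add()` and the reserved VID limit are invented for illustration and are not the kernel's netlink API. The message is a constant string, which mirrors the limitation mentioned above that values such as the VLAN ID cannot be formatted into it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's netlink extended ACK. */
struct fake_extack {
	const char *msg;	/* points at a constant string, never formatted */
};

/* Record a constant message if the caller supplied an extack. */
static void fake_set_err_msg(struct fake_extack *extack, const char *msg)
{
	assert(msg != NULL);
	if (extack)
		extack->msg = msg;
}

#define FAKE_RESERVED_VID_MIN 4000u	/* invented hardware restriction */

/* Hypothetical driver callback: refuse VLANs the hardware reserves. */
static int fake_port_vlan_add(unsigned int vid, struct fake_extack *extack)
{
	if (vid >= FAKE_RESERVED_VID_MIN) {
		fake_set_err_msg(extack, "VLAN ID is in the range reserved by hardware");
		return -EBUSY;
	}
	return 0;
}
```

The caller gets both an error code and a human-readable reason in the same reply, instead of having to search a log for it.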
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: tag_ocelot: create separate tagger for Seville · 7c4bb540
      Vladimir Oltean committed
      The ocelot tagger is a hot mess currently: it relies on memory
      initialized by the attached driver for basic frame transmission.
      This is against all that DSA tagging protocols stand for, which is that
      the transmission and reception of a DSA-tagged frame, the data path,
      should be independent from the switch control path, because the tag
      protocol is in principle hot-pluggable and reusable across switches
      (even if in practice it wasn't until very recently). But if another
      driver like dsa_loop wanted to make use of tag_ocelot, it couldn't.
      
      This was done to have common code between Felix and Ocelot, which have
      one bit difference in the frame header format. Quoting from commit
      67c24049 ("net: dsa: felix: create a template for the DSA tags on
      xmit"):
      
          Other alternatives have been analyzed, such as:
          - Create a separate tag_seville.c: too much code duplication for just 1
            bit field difference.
          - Create a separate DSA_TAG_PROTO_SEVILLE under tag_ocelot.c, just like
            tag_brcm.c, which would have a separate .xmit function. Again, too
            much code duplication for just 1 bit field difference.
          - Allocate the template from the init function of the tag_ocelot.c
            module, instead of from the driver: couldn't figure out a method of
            accessing the correct port template corresponding to the correct
            tagger in the .xmit function.
      
      The really interesting part is that Seville should have had its own
      tagging protocol defined - it is not compatible on the wire with Ocelot,
      even for that single bit. In principle, a packet generated by
      DSA_TAG_PROTO_OCELOT when booted on NXP LS1028A would look in a certain
      way, but when booted on NXP T1040 it would look differently. The reverse
      is also true: a packet generated by a Seville switch would be
      interpreted incorrectly by Wireshark if it was told it was generated by
      an Ocelot switch.
      
      Actually things are a bit more nuanced. If we concentrate only on the
      DSA tag, what I said above is true, but Ocelot/Seville also support an
      optional DSA tag prefix, which can be short or long, and it is possible
      to distinguish the two taggers based on an integer constant put in that
      prefix. Nonetheless, creating a separate tagger is still justified,
      since the tag prefix is optional, and without it, there is again no way
      to distinguish.
      
      Claiming backwards binary compatibility is a bit tougher, since I've
      already changed the format of tag_ocelot once, in commit 5124197c
      ("net: dsa: tag_ocelot: use a short prefix on both ingress and egress").
      Therefore I am not very concerned with treating this as a bugfix and
      backporting it to stable kernels (which would be another mess due to the
      fact that there would be lots of conflicts with the other DSA_TAG_PROTO*
      definitions). It's just simpler to say that the string values of the
      taggers have ABI value starting with kernel 5.12, which will be when the
      changing of tag protocol via /sys/class/net/<dsa-master>/dsa/tagging
      goes live.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 13 Feb, 2021 1 commit
    • net: dsa: act as passthrough for bridge port flags · a8b659e7
      Vladimir Oltean committed
      There are multiple ways in which a PORT_BRIDGE_FLAGS attribute can be
      expressed by the bridge through switchdev, and not all of them can be
      emulated by DSA mid-layer API at the same time.
      
      One possible configuration is when the bridge offloads the port flags
      using a mask that has a single bit set - therefore only one feature
      should change. However, DSA currently groups together unicast and
      multicast flooding in the .port_egress_floods method, which limits our
      options when we try to add support for turning off broadcast flooding:
      do we extend .port_egress_floods with a third parameter which b53 and
      mv88e6xxx will ignore? But that means that the DSA layer, which
      currently implements the PRE_BRIDGE_FLAGS attribute all by itself, will
      see that .port_egress_floods is implemented, and will report that all 3
      types of flooding are supported - not necessarily true.
      
      Another configuration is when the user specifies more than one flag at
      the same time, in the same netlink message. If we were to create one
      individual function per offloadable bridge port flag, we would limit the
      expressiveness of the switch driver of refusing certain combinations of
      flag values. For example, a switch may not have an explicit knob for
      flooding of unknown multicast, just for flooding in general. In that
      case, the only correct thing to do is to allow changes to BR_FLOOD and
      BR_MCAST_FLOOD in tandem, and never allow mismatched values. But having
      a separate .port_set_unicast_flood and .port_set_multicast_flood would
      not allow the driver to possibly reject that.
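To make the "in tandem" argument concrete, here is a small userspace sketch of a driver that sees the whole mask/value pair at once and can therefore reject combinations it cannot express. The flag bit positions, `fake_brport_flags` and `fake_pre_bridge_flags()` are hypothetical and do not reproduce the kernel's definitions. The imaginary switch has a single flooding knob shared by unknown unicast and unknown multicast, and for simplicity it insists that both flags are changed in the same request.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative bit positions only; these are not the kernel's BR_* values. */
#define FAKE_BR_FLOOD		(1UL << 0)
#define FAKE_BR_MCAST_FLOOD	(1UL << 1)
#define FAKE_BR_BCAST_FLOOD	(1UL << 2)

struct fake_brport_flags {
	unsigned long mask;	/* which flags this request changes */
	unsigned long val;	/* the new value of those flags */
};

/*
 * Hypothetical switch with one knob covering both unknown unicast and
 * unknown multicast flooding. Returns 0 if the request can be offloaded.
 */
static int fake_pre_bridge_flags(const struct fake_brport_flags *flags)
{
	const unsigned long both = FAKE_BR_FLOOD | FAKE_BR_MCAST_FLOOD;
	unsigned long val;

	assert(flags);
	/* Anything outside the two supported flags is refused. */
	if (flags->mask & ~both)
		return -EINVAL;
	/* An empty request changes nothing. */
	if (!flags->mask)
		return 0;
	/* The two flags may only change together... */
	if (flags->mask != both)
		return -EINVAL;
	/* ...and only to matching values. */
	val = flags->val & both;
	return (val == 0 || val == both) ? 0 : -EINVAL;
}
```

With one callback per flag, the driver would see each half of the request in isolation and could not perform the last two checks.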
      
      Also, DSA doesn't consider it necessary to inform the driver that a
      SWITCHDEV_ATTR_ID_BRIDGE_MROUTER attribute was offloaded, because it
      just calls .port_egress_floods for the CPU port. When we'll add support
      for the plain SWITCHDEV_ATTR_ID_PORT_MROUTER, that will become a real
      problem because the flood settings will need to be held statefully in
      the DSA middle layer, otherwise changing the mrouter port attribute will
      impact the flooding attribute. And that's _assuming_ that the underlying
      hardware doesn't have anything else to do when a multicast router
      attaches to a port than flood unknown traffic to it.  If it does, there
      will need to be a dedicated .port_set_mrouter anyway.
      
      So we need to let the DSA drivers see the exact form that the bridge
      passes this switchdev attribute in, otherwise we are standing in the
      way. Therefore we also need to use this form of language when
      communicating to the driver that it needs to configure its initial
      (before bridge join) and final (after bridge leave) port flags.
      
      The b53 and mv88e6xxx drivers are converted to the passthrough API and
      their implementation of .port_egress_floods is split into two: a
      function that configures unicast flooding and another for multicast.
      The mv88e6xxx implementation is quite hairy, and it turns out that
      the implementations of unknown unicast flooding are actually the same
      for 6185 and for 6352:
      
      behind the confusing names actually lie two individual bits:
      NO_UNKNOWN_MC -> FLOOD_UC = 0x4 = BIT(2)
      NO_UNKNOWN_UC -> FLOOD_MC = 0x8 = BIT(3)
      
      so there was no reason to entangle them in the first place.
      
      Whereas the 6185 writes to MV88E6185_PORT_CTL0_FORWARD_UNKNOWN of
      PORT_CTL0, which has the exact same bit index. I have left the
      implementations separate though, for the only reason that the names are
      different enough to confuse me, since I am not able to double-check with
      a user manual. The multicast flooding setting for 6185 is in a different
      register than for 6352 though.
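As a minimal illustration of why two individual bits need not be entangled, the sketch below toggles each one independently in a 16-bit register image. The bit values come from the text above; the register variable and the helper names `update_bits()`, `set_ucast_flood()` and `set_mcast_flood()` are hypothetical and do not model the real mv88e6xxx register access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FLOOD_UC	(1u << 2)	/* 0x4 */
#define FLOOD_MC	(1u << 3)	/* 0x8 */

/* Set or clear only the bits in mask, leaving the rest of *reg alone. */
static void update_bits(uint16_t *reg, uint16_t mask, bool set)
{
	assert(reg);
	if (set)
		*reg |= mask;
	else
		*reg &= (uint16_t)~mask;
}

/* One function per traffic class, each touching a single bit. */
static void set_ucast_flood(uint16_t *reg, bool on)
{
	update_bits(reg, FLOOD_UC, on);
}

static void set_mcast_flood(uint16_t *reg, bool on)
{
	update_bits(reg, FLOOD_MC, on);
}
```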
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 12 Feb, 2021 1 commit
  9. 30 Jan, 2021 3 commits
    • net: dsa: add a second tagger for Ocelot switches based on tag_8021q · 7c83a7c5
      Vladimir Oltean committed
      There are use cases for which the existing tagger, based on the NPI
      (Node Processor Interface) functionality, is insufficient.
      
      Namely:
      - Frames injected through the NPI port bypass the frame analyzer, so no
        source address learning is performed, no TSN stream classification,
        etc.
      - Flow control is not functional over an NPI port (PAUSE frames are
        encapsulated in the same Extraction Frame Header as all other frames)
      - There can be at most one NPI port configured for an Ocelot switch. But
        in NXP LS1028A and T1040 there are two Ethernet CPU ports. The non-NPI
        port is currently either disabled, or operated as a plain user port
        (albeit an internally-facing one). Having the ability to configure the
        two CPU ports symmetrically could pave the way for e.g. creating a LAG
        between them, to increase bandwidth seamlessly for the system.
      
      So there is a desire to have an alternative to the NPI mode. This change
      keeps the default tagger for the Seville and Felix switches as "ocelot",
      but it can be changed via the following device attribute:
      
      echo ocelot-8021q > /sys/class/net/<dsa-master>/dsa/tagging
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: allow changing the tag protocol via the "tagging" device attribute · 53da0eba
      Vladimir Oltean committed
      Currently DSA exposes the following sysfs attribute:
      $ cat /sys/class/net/eno2/dsa/tagging
      ocelot
      
      which is a read-only device attribute, introduced in the kernel as
      commit 98cdb480 ("net: dsa: Expose tagging protocol to user-space"),
      and used by libpcap since its commit 993db3800d7d ("Add support for DSA
      link-layer types").
      
      It would be nice if we could extend this device attribute by making it
      writable:
      $ echo ocelot-8021q > /sys/class/net/eno2/dsa/tagging
      
      This is useful with DSA switches that can make use of more than one
      tagging protocol. It may be useful in dsa_loop in the future too, to
      perform offline testing of various taggers, or for changing between dsa
      and edsa on Marvell switches, if that is desirable.
      
      In terms of implementation, drivers can support this feature by
      implementing .change_tag_protocol, which should always leave the switch
      in a consistent state: either with the new protocol if things went well,
      or with the old one if something failed. Teardown of the old protocol,
      if necessary, must be handled by the driver.
      
      Some things remain as before:
      - The .get_tag_protocol is currently only called at probe time, to load
        the initial tagging protocol driver. Nonetheless, new drivers should
        report the tagging protocol in current use now.
      - The driver should manage by itself the initial setup of tagging
        protocol, no later than the .setup() method, as well as destroying
        resources used by the last tagger in use, no earlier than the
        .teardown() method.
      
      For multi-switch DSA trees, error handling is a bit more complicated,
      since e.g. the 5th out of 7 switches may fail to change the tag
      protocol. When that happens, a revert to the original tag protocol is
      attempted, but that may fail too, leaving the tree in an inconsistent
      state despite each individual switch implementing .change_tag_protocol
      transactionally. Since the intersection between drivers that implement
      .change_tag_protocol and drivers that support D in DSA is currently the
      empty set, the possibility for this error to happen is ignored for now.
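The per-switch and per-tree behavior described above can be sketched in userspace as follows. All names (`fake_switch`, `fake_change_tag_protocol()`, `fake_tree_change_tag_protocol()`, the `fail_on_change` test knob and the two protocol values) are hypothetical; this is not the DSA core code, only a model of "each switch changes transactionally, the tree reverts on a best-effort basis".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum fake_tag_proto { FAKE_PROTO_A, FAKE_PROTO_B };

struct fake_switch {
	enum fake_tag_proto proto;
	int fail_on_change;	/* test knob: make every change attempt fail */
};

/* Per-switch change: on failure the old protocol stays in place. */
static int fake_change_tag_protocol(struct fake_switch *sw,
				    enum fake_tag_proto proto)
{
	assert(sw);
	if (sw->fail_on_change)
		return -EOPNOTSUPP;
	sw->proto = proto;
	return 0;
}

/*
 * Tree-wide change: walk the switches in order and, if one refuses,
 * try to revert the ones already changed. The revert is best effort:
 * its return value is ignored, so a failure there would leave the
 * tree inconsistent, as the text above points out.
 */
static int fake_tree_change_tag_protocol(struct fake_switch *sw, size_t n,
					 enum fake_tag_proto old_proto,
					 enum fake_tag_proto new_proto)
{
	size_t i;
	int err = 0;

	for (i = 0; i < n; i++) {
		err = fake_change_tag_protocol(&sw[i], new_proto);
		if (err)
			break;
	}
	if (!err)
		return 0;

	while (i-- > 0)
		fake_change_tag_protocol(&sw[i], old_proto);
	return err;
}
```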
      
      Testing:
      
      $ insmod mscc_felix.ko
      [   79.549784] mscc_felix 0000:00:00.5: Adding to iommu group 14
      [   79.565712] mscc_felix 0000:00:00.5: Failed to register DSA switch: -517
      $ insmod tag_ocelot.ko
      $ rmmod mscc_felix.ko
      $ insmod mscc_felix.ko
      [   97.261724] libphy: VSC9959 internal MDIO bus: probed
      [   97.267363] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 0
      [   97.274998] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 1
      [   97.282561] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 2
      [   97.289700] mscc_felix 0000:00:00.5: Found PCS at internal MDIO address 3
      [   97.599163] mscc_felix 0000:00:00.5 swp0 (uninitialized): PHY [0000:00:00.3:10] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
      [   97.862034] mscc_felix 0000:00:00.5 swp1 (uninitialized): PHY [0000:00:00.3:11] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
      [   97.950731] mscc_felix 0000:00:00.5 swp0: configuring for inband/qsgmii link mode
      [   97.964278] 8021q: adding VLAN 0 to HW filter on device swp0
      [   98.146161] mscc_felix 0000:00:00.5 swp2 (uninitialized): PHY [0000:00:00.3:12] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
      [   98.238649] mscc_felix 0000:00:00.5 swp1: configuring for inband/qsgmii link mode
      [   98.251845] 8021q: adding VLAN 0 to HW filter on device swp1
      [   98.433916] mscc_felix 0000:00:00.5 swp3 (uninitialized): PHY [0000:00:00.3:13] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
      [   98.485542] mscc_felix 0000:00:00.5: configuring for fixed/internal link mode
      [   98.503584] mscc_felix 0000:00:00.5: Link is Up - 2.5Gbps/Full - flow control rx/tx
      [   98.527948] device eno2 entered promiscuous mode
      [   98.544755] DSA: tree 0 setup
      
      $ ping 10.0.0.1
      PING 10.0.0.1 (10.0.0.1): 56 data bytes
      64 bytes from 10.0.0.1: seq=0 ttl=64 time=2.337 ms
      64 bytes from 10.0.0.1: seq=1 ttl=64 time=0.754 ms
      ^C
      --- 10.0.0.1 ping statistics ---
      2 packets transmitted, 2 packets received, 0% packet loss
      round-trip min/avg/max = 0.754/1.545/2.337 ms
      
      $ cat /sys/class/net/eno2/dsa/tagging
      ocelot
      $ cat ./test_ocelot_8021q.sh
              #!/bin/bash
      
              ip link set swp0 down
              ip link set swp1 down
              ip link set swp2 down
              ip link set swp3 down
              ip link set swp5 down
              ip link set eno2 down
              echo ocelot-8021q > /sys/class/net/eno2/dsa/tagging
              ip link set eno2 up
              ip link set swp0 up
              ip link set swp1 up
              ip link set swp2 up
              ip link set swp3 up
              ip link set swp5 up
      $ ./test_ocelot_8021q.sh
      ./test_ocelot_8021q.sh: line 9: echo: write error: Protocol not available
      $ rmmod tag_ocelot.ko
      rmmod: can't unload module 'tag_ocelot': Resource temporarily unavailable
      $ insmod tag_ocelot_8021q.ko
      $ ./test_ocelot_8021q.sh
      $ cat /sys/class/net/eno2/dsa/tagging
      ocelot-8021q
      $ rmmod tag_ocelot.ko
      $ rmmod tag_ocelot_8021q.ko
      rmmod: can't unload module 'tag_ocelot_8021q': Resource temporarily unavailable
      $ ping 10.0.0.1
      PING 10.0.0.1 (10.0.0.1): 56 data bytes
      64 bytes from 10.0.0.1: seq=0 ttl=64 time=0.953 ms
      64 bytes from 10.0.0.1: seq=1 ttl=64 time=0.787 ms
      64 bytes from 10.0.0.1: seq=2 ttl=64 time=0.771 ms
      $ rmmod mscc_felix.ko
      [  645.544426] mscc_felix 0000:00:00.5: Link is Down
      [  645.838608] DSA: tree 0 torn down
      $ rmmod tag_ocelot_8021q.ko
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: keep a copy of the tagging protocol in the DSA switch tree · 357f203b
      Vladimir Oltean committed
      Cascading DSA switches can be done in multiple ways. There is the brute
      force approach / tag stacking, where one upstream switch, located
      between leaf switches and the host Ethernet controller, will just
      happily transport the DSA header of those leaf switches as payload.
      For this kind of setup, DSA works without any special kind of treatment
      compared to a single switch - the switches just aren't aware of each other.
      Then there's the approach where the upstream switch understands the tags
      it transports from its leaves below, as it doesn't push a tag of its own,
      but it routes based on the source port & switch id information present
      in that tag (as opposed to DMAC & VID) and it strips the tag when
      egressing a front-facing port. Currently only Marvell implements the
      latter, and Marvell DSA trees contain only Marvell switches.
      
      So it is safe to say that DSA trees already have a single tag protocol
      shared by all switches, and in fact this is what makes the switches able
      to understand each other. This is also implied by the fact that
      currently, the tagging protocol is reported as part of a sysfs attribute
      installed on the DSA master and not per port, so it must be the same for
      all the ports connected to that DSA master regardless of the switch that
      they belong to.
      
      It's time to make this official and enforce it (yes, this also means we
      won't have any "switch understands tag to some extent but is not able to
      speak it" hardware oddities that we'll support in the future).
      
      This is needed due to the imminent introduction of the dsa_switch_ops::
      change_tag_protocol driver API. When that is introduced, we'll have
      to notify switches of the tagging protocol that they're configured to
      use. Currently the tag_ops structure pointer is held only for CPU ports.
      But there are switches which don't have CPU ports and nonetheless still
      need to be configured. These would be Marvell leaf switches whose
      upstream port is just a DSA link. How do we inform these of their
      tagging protocol setup/deletion?
      
      One answer to the above would be: iterate through the DSA switch tree's
      ports once, list the CPU ports, get their tag_ops, then iterate again
      now that we have it, and notify everybody of that tag_ops. But what to
      do if conflicts appear between one cpu_dp->tag_ops and another? There's
      no escaping the fact that conflict resolution needs to be done, so we
      can be upfront about it.
      
      Ease our work and just keep the master copy of the tag_ops inside the
      struct dsa_switch_tree. Reference counting is now moved to be per-tree
      too, instead of per-CPU port.
      
      There are many places in the data path that access master->dsa_ptr->tag_ops
      and we would introduce an unnecessary performance penalty going through yet
      another indirection, so keep those right where they are.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  10. 16 Jan, 2021 2 commits
  11. 15 Jan, 2021 1 commit
    • net: dsa: Link aggregation support · 058102a6
      Tobias Waldekranz committed
      Monitor the following events and notify the driver when:
      
      - A DSA port joins/leaves a LAG.
      - A LAG, made up of DSA ports, joins/leaves a bridge.
      - A DSA port in a LAG is enabled/disabled (enabled meaning
        "distributing" in 802.3ad LACP terms).
      
      When a LAG joins a bridge, the DSA subsystem will treat that as each
      individual port joining the bridge. The driver may look at the port's
      LAG device pointer to see if it is associated with any LAG, if that is
      required. This is analogous to how switchdev events are replicated out
      to all lower devices when reaching e.g. a LAG.
      
      Drivers can optionally request that DSA maintain a linear mapping from
      a LAG ID to the corresponding netdev by setting ds->num_lag_ids to the
      desired size.
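A linear LAG ID mapping of the kind mentioned above could look roughly like the following userspace sketch. The types and helpers (`fake_netdev`, `fake_tree`, `fake_lag_id()`, `fake_lag_map()`, `fake_lag_unmap()`) and the table size are hypothetical; the constant `FAKE_NUM_LAG_IDS` merely plays the role that `ds->num_lag_ids` plays in the text.

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_NUM_LAG_IDS 2	/* plays the role of ds->num_lag_ids */

struct fake_netdev {
	const char *name;
};

struct fake_tree {
	struct fake_netdev *lags[FAKE_NUM_LAG_IDS];	/* index = LAG ID */
};

/* Return the ID already mapped to this LAG device, or -1 if none. */
static int fake_lag_id(const struct fake_tree *t, const struct fake_netdev *lag)
{
	assert(t && lag);
	for (int id = 0; id < FAKE_NUM_LAG_IDS; id++)
		if (t->lags[id] == lag)
			return id;
	return -1;
}

/* Map the LAG to the first free ID; return that ID, or -1 if the table is full. */
static int fake_lag_map(struct fake_tree *t, struct fake_netdev *lag)
{
	int id = fake_lag_id(t, lag);

	if (id >= 0)
		return id;	/* already mapped */
	for (id = 0; id < FAKE_NUM_LAG_IDS; id++) {
		if (!t->lags[id]) {
			t->lags[id] = lag;
			return id;
		}
	}
	return -1;
}

/* Release the ID held by this LAG, if any. */
static void fake_lag_unmap(struct fake_tree *t, const struct fake_netdev *lag)
{
	int id = fake_lag_id(t, lag);

	if (id >= 0)
		t->lags[id] = NULL;
}
```

In this model, a full table is one reason an offload attempt could be declined, at which point the LAG would simply remain a software construct.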
      
      In the event that the hardware is not capable of offloading a
      particular LAG for any reason (the typical case being use of exotic
      modes like broadcast), DSA will take a hands-off approach, allowing
      the LAG to be formed as a pure software construct. This is reported
      back through the extended ACK, but is otherwise transparent to the
      user.
      Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Tested-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  12. 13 Jan, 2021 1 commit
  13. 12 Jan, 2021 3 commits
    • net: dsa: remove the transactional logic from VLAN objects · 1958d581
      Vladimir Oltean committed
      It should be the driver's business to logically separate its VLAN
      offloading into a preparation and a commit phase, and some drivers don't
      need / can't do this.
      
      So remove the transactional shim from DSA and let drivers propagate
      errors directly from the .port_vlan_add callback.
      
      It would appear that the code has worse error handling now than it had
      before. DSA is the only in-kernel user of switchdev that offloads one
      switchdev object to more than one port: for every VLAN object offloaded
      to a user port, that VLAN is also offloaded to the CPU port. So the
      "prepare for user port -> check for errors -> prepare for CPU port ->
      check for errors -> commit for user port -> commit for CPU port"
      sequence appears to make more sense than the one we are using now:
      "offload to user port -> check for errors -> offload to CPU port ->
      check for errors", but it is really a compromise. In the new way, we can
      catch errors from the commit phase that we previously had to ignore.
      But we have our hands tied and cannot do any rollback now: if we add a
      VLAN on the CPU port and it fails, we can't do the rollback by simply
      deleting it from the user port, because the switchdev API is not so nice
      with us: it could have simply been there already, even with the same
      flags. So we don't even attempt to rollback anything on addition error,
      just leave whatever VLANs managed to get offloaded right where they are.
      This should not be a problem at all in practice.
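The new single-step sequence ("offload to user port, check for errors, offload to CPU port, check for errors", with no rollback) can be modeled in userspace like this. `fake_port`, `fake_port_vlan_add()`, `fake_vlan_add()` and the `broken` test knob are hypothetical and are not the DSA implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define FAKE_MAX_VID 4096

struct fake_port {
	bool vlan[FAKE_MAX_VID];
	bool broken;	/* test knob: simulate a hardware access failure */
};

/* Driver-level add: can fail at "commit" time and now reports it. */
static int fake_port_vlan_add(struct fake_port *p, unsigned int vid)
{
	assert(p);
	if (vid >= FAKE_MAX_VID)
		return -ERANGE;
	if (p->broken)
		return -EIO;
	p->vlan[vid] = true;
	return 0;
}

/* Offload one VLAN object to the user port, then to the CPU port. */
static int fake_vlan_add(struct fake_port *user, struct fake_port *cpu,
			 unsigned int vid)
{
	int err = fake_port_vlan_add(user, vid);

	if (err)
		return err;
	/*
	 * No rollback on the user port if this fails: the VLAN may have
	 * been present there before this call, so deleting it could
	 * remove state that this call did not create.
	 */
	return fake_port_vlan_add(cpu, vid);
}
```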
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Acked-by: Linus Walleij <linus.walleij@linaro.org>
      Acked-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: remove the transactional logic from MDB entries · a52b2da7
      Vladimir Oltean committed
      For many drivers, the .port_mdb_prepare callback was not a good opportunity
      to avoid any error condition, and they would suppress errors found during
      the actual commit phase.
      
      Where a logical separation between the prepare and the commit phase
      existed, the function that used to implement the .port_mdb_prepare
      callback still exists, but now it is called directly from .port_mdb_add,
      which was modified to return an int code.
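The shape of that conversion can be sketched in userspace as below. The names `fake_switch`, `fake_mdb_prepare()` and `fake_port_mdb_add()` and the table size are hypothetical: the former prepare check survives as an ordinary helper, and the add function calls it first and returns an error code instead of void.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define FAKE_MDB_SIZE 2	/* invented hardware table size */

struct fake_switch {
	unsigned int mdb[FAKE_MDB_SIZE];	/* stand-in for multicast entries */
	size_t used;
};

/* What used to back the prepare callback: a pure check, no side effects. */
static int fake_mdb_prepare(const struct fake_switch *sw)
{
	assert(sw);
	return sw->used < FAKE_MDB_SIZE ? 0 : -ENOSPC;
}

/* Single-step add: run the old check, then commit, and report any error. */
static int fake_port_mdb_add(struct fake_switch *sw, unsigned int group)
{
	int err = fake_mdb_prepare(sw);

	if (err)
		return err;
	sw->mdb[sw->used++] = group;
	return 0;
}
```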
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Acked-by: Linus Walleij <linus.walleij@linaro.org>
      Acked-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org> # RTL8366
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: switchdev: remove the transaction structure from port attributes · bae33f2b
      Vladimir Oltean committed
      Since the introduction of the switchdev API, port attributes were
      transmitted to drivers for offloading using a two-step transactional
      model, with a prepare phase that was supposed to catch all errors, and a
      commit phase that was supposed to never fail.
      
      Some classes of failures can never be avoided, like hardware access, or
      memory allocation. In the latter case, merely attempting to move the
      memory allocation to the preparation phase makes it impossible to avoid
      memory leaks, since commit 91cf8ece ("switchdev: Remove unused
      transaction item queue") which has removed the unused mechanism of
      passing on the allocated memory between one phase and another.
      
      It is time we admit that separating the preparation from the commit
      phase is something that is best left for the driver to decide, and not
      something that should be baked into the API, especially since there are
      no switchdev callers that depend on this.
      
      This patch removes the struct switchdev_trans member from switchdev port
      attribute notifier structures, and converts drivers to not look at this
      member.
      
      In part, this patch contains a revert of my previous commit 2e554a7a
      ("net: dsa: propagate switchdev vlan_filtering prepare phase to
      drivers").
      
      For the most part, the conversion was trivial except for:
      - Rocker's world implementation based on Broadcom OF-DPA had an odd
        implementation of ofdpa_port_attr_bridge_flags_set. The conversion was
        done mechanically, by pasting the implementation twice, then only
        keeping the code that would get executed during prepare phase on top,
        then only keeping the code that gets executed during the commit phase
        on bottom, then simplifying the resulting code until this was obtained.
      - DSA's offloading of STP state, bridge flags, VLAN filtering and
        multicast router could be converted right away. But the ageing time
        could not, so a shim was introduced and this was left for a further
        commit.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Linus Walleij <linus.walleij@linaro.org>
      Acked-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org> # RTL8366RB
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  14. 08 Jan, 2021 4 commits
    • net: dsa: remove the DSA specific notifiers · 1dbb1302
      Vladimir Oltean committed
      This effectively reverts commit 60724d4b ("net: dsa: Add support for
      DSA specific notifiers"). The reason is that since commit 2f1e8ea7
      ("net: dsa: link interfaces with the DSA master to get rid of lockdep
      warnings"), it appears that there is a generic way to achieve the same
      purpose. The only user thus far, the Broadcom SYSTEMPORT driver, was
      converted to use the generic notifiers.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: export dsa_slave_dev_check · a5e3c9ba
      Vladimir Oltean committed
      Using the NETDEV_CHANGEUPPER notifications, drivers can be aware when
      they are enslaved to e.g. a bridge by calling netif_is_bridge_master().
      
      Export this helper from DSA to get the equivalent functionality of
      determining whether the upper interface of a CHANGEUPPER notifier is a
      DSA switch interface or not.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      a5e3c9ba
    • V
      net: dsa: move the Broadcom tag information in a separate header file · f46b9b8e
      Vladimir Oltean authored
      It is a bit strange to see something as specific as Broadcom SYSTEMPORT
      bits in the main DSA include file. Move these away into a separate
      header, and have the tagger and the SYSTEMPORT driver include them.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f46b9b8e
    • V
      net: dsa: listen for SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE on foreign bridge neighbors · d5f19486
      Vladimir Oltean authored
      Some DSA switches (and not only those) cannot learn source MAC addresses from
      packets injected from the CPU. They only perform hardware address
      learning from inbound traffic.
      
      This can be problematic when we have a bridge spanning some DSA switch
      ports and some non-DSA ports (which we'll call "foreign interfaces" from
      DSA's perspective).
      
      There are 2 classes of problems created by the lack of learning on
      CPU-injected traffic:
      - excessive flooding, due to the fact that DSA treats those addresses as
        unknown
      - the risk of stale routes, which can lead to temporary packet loss
      
      To illustrate the second class, consider the following situation, which
      is common in production equipment (wireless access points, where there
      is a WLAN interface and an Ethernet switch, and these form a single
      bridging domain).
      
       AP 1:
       +------------------------------------------------------------------------+
       |                                          br0                           |
       +------------------------------------------------------------------------+
       +------------+ +------------+ +------------+ +------------+ +------------+
       |    swp0    | |    swp1    | |    swp2    | |    swp3    | |    wlan0   |
       +------------+ +------------+ +------------+ +------------+ +------------+
             |                                                       ^        ^
             |                                                       |        |
             |                                                       |        |
             |                                                    Client A  Client B
             |
             |
             |
       +------------+ +------------+ +------------+ +------------+ +------------+
       |    swp0    | |    swp1    | |    swp2    | |    swp3    | |    wlan0   |
       +------------+ +------------+ +------------+ +------------+ +------------+
       +------------------------------------------------------------------------+
       |                                          br0                           |
       +------------------------------------------------------------------------+
       AP 2
      
      - br0 of AP 1 will know that Clients A and B are reachable via wlan0
      - the hardware fdb of a DSA switch driver today is not kept in sync with
        the software entries on other bridge ports, so it will not know that
        clients A and B are reachable via the CPU port UNLESS the hardware
        switch itself performs SA learning from traffic injected from the CPU.
        Nonetheless, a substantial number of switches don't.
      - the hardware fdb of the DSA switch on AP 2 may autonomously learn that
        Client A and B are reachable through swp0. Therefore, the software br0
        of AP 2 also may or may not learn this. In the example we're
        illustrating, some Ethernet traffic has been going on, and br0 from AP
        2 has indeed learnt that it can reach Client B through swp0.
      
      One of the wireless clients, say Client B, disconnects from AP 1 and
      roams to AP 2. The topology now looks like this:
      
       AP 1:
       +------------------------------------------------------------------------+
       |                                          br0                           |
       +------------------------------------------------------------------------+
       +------------+ +------------+ +------------+ +------------+ +------------+
       |    swp0    | |    swp1    | |    swp2    | |    swp3    | |    wlan0   |
       +------------+ +------------+ +------------+ +------------+ +------------+
             |                                                            ^
             |                                                            |
             |                                                         Client A
             |
             |
             |                                                         Client B
             |                                                            |
             |                                                            v
       +------------+ +------------+ +------------+ +------------+ +------------+
       |    swp0    | |    swp1    | |    swp2    | |    swp3    | |    wlan0   |
       +------------+ +------------+ +------------+ +------------+ +------------+
       +------------------------------------------------------------------------+
       |                                          br0                           |
       +------------------------------------------------------------------------+
       AP 2
      
      - br0 of AP 1 still knows that Client A is reachable via wlan0 (no change)
      - br0 of AP 1 will (possibly) know that Client B has left wlan0. There
        are cases where it might never find out though. Either way, DSA today
        does not process that notification in any way.
      - the hardware FDB of the DSA switch on AP 1 may learn autonomously that
        Client B can be reached via swp0, if it receives any packet with
        Client B's source MAC address over Ethernet.
      - the hardware FDB of the DSA switch on AP 2 still thinks that Client B
        can be reached via swp0. It does not know that it has roamed to wlan0,
        because it doesn't perform SA learning from the CPU port.
      
      Now Client A contacts Client B.
      AP 1 routes the packet fine towards swp0 and delivers it on the Ethernet
      segment.
      AP 2 sees a frame on swp0 and its fdb says that the destination is swp0.
      Hairpinning is disabled => drop.
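      
      The drop on AP 2 can be reproduced with a small userspace model. This
      is only a sketch, not kernel code: struct fdb_entry, the PORT_*
      numbering, fdb_lookup() and frame_is_dropped() are hypothetical names
      made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical port numbering for the DSA switch on AP 2. */
enum { PORT_NONE = -1, PORT_SWP0 = 0, PORT_CPU = 4 };

/* One hardware FDB entry: destination MAC -> egress port. */
struct fdb_entry {
	unsigned char mac[6];
	int port;
};

/* Return the egress port for a destination MAC, or PORT_NONE if unknown. */
static int fdb_lookup(const struct fdb_entry *fdb, size_t n,
		      const unsigned char *mac)
{
	for (size_t i = 0; i < n; i++)
		if (memcmp(fdb[i].mac, mac, 6) == 0)
			return fdb[i].port;
	return PORT_NONE;
}

/*
 * A frame whose egress port equals its ingress port is dropped unless
 * hairpinning is enabled. Unknown destinations are not dropped here
 * (a real switch would flood them).
 */
static bool frame_is_dropped(const struct fdb_entry *fdb, size_t n,
			     const unsigned char *dst, int ingress,
			     bool hairpin)
{
	int egress = fdb_lookup(fdb, n, dst);

	return egress == ingress && !hairpin;
}
```

      With a stale entry pointing Client B at PORT_SWP0, a frame arriving
      on PORT_SWP0 is dropped. Once the entry points at PORT_CPU, the same
      frame is forwarded to the CPU and from there to wlan0.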
      
      This problem comes from the fact that these switches have a 'blind spot'
      for addresses coming from software bridging. The generic solution is not
      to assume that hardware learning can be enabled somehow, but to listen
      to more bridge learning events. It turns out that the bridge driver does
      learn in software from all inbound frames, in __br_handle_local_finish.
      A proper SWITCHDEV_FDB_ADD_TO_DEVICE notification is emitted for the
      addresses serviced by the bridge on 'foreign' interfaces. The software
      bridge also does the right thing on migration, by notifying that the old
      entry is deleted, so that does not need to be special-cased in DSA. When
      it is deleted, we just need to delete our static FDB entry towards the
      CPU too, and wait.
      
      The problem is that DSA currently only cares about SWITCHDEV_FDB_ADD_TO_DEVICE
      events received on its own interfaces, such as static FDB entries.
      
      Luckily we can change that, and DSA can listen to all switchdev FDB
      add/del events in the system and figure out if those events were emitted
      by a bridge that spans at least one of DSA's own ports. In case that is
      true, DSA will also offload that address towards its own CPU port, in
      the eventuality that there might be bridge clients attached to the DSA
      switch who want to talk to the station connected to the foreign
      interface.
      
      In terms of implementation, we need to keep the fdb_info->added_by_user
      check for the case where the switchdev event was targeted directly at a
      DSA switch port. But we don't need to look at that flag for snooped
      events. So the check currently comes too late and needs to move earlier.
      This also simplifies the code a bit, since we avoid uselessly allocating
      and freeing switchdev_work.
      
      We could probably do some improvements in the future. For example,
      multi-bridge support is rudimentary at the moment. If there are two
      bridges spanning a DSA switch's ports, and both of them need to service
      the same MAC address, then what will happen is that the migration of one
      of those stations will trigger the deletion of the FDB entry from the
      CPU port while it is still used by the other bridge. That could be improved
      with reference counting but is left for another time.
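      
      As a rough idea of what such reference counting could look like, here
      is a userspace sketch. It is not part of this patch, and every name in
      it (struct cpu_fdb_ref, cpu_fdb_get(), cpu_fdb_put(), MAX_CPU_FDB) is
      hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_CPU_FDB 16 /* arbitrary table size for the sketch */

/* One address installed towards the CPU port, shared by several bridges. */
struct cpu_fdb_ref {
	unsigned char mac[6];
	unsigned int refcount; /* 0 means the slot is free */
};

static struct cpu_fdb_ref *cpu_fdb_find(struct cpu_fdb_ref *t,
					const unsigned char *mac)
{
	for (size_t i = 0; i < MAX_CPU_FDB; i++)
		if (t[i].refcount && memcmp(t[i].mac, mac, 6) == 0)
			return &t[i];
	return NULL;
}

/* Returns 1 if hardware must be programmed, 0 if already present, -1 if full. */
static int cpu_fdb_get(struct cpu_fdb_ref *t, const unsigned char *mac)
{
	struct cpu_fdb_ref *e = cpu_fdb_find(t, mac);

	if (e) {
		e->refcount++;
		return 0;
	}
	for (size_t i = 0; i < MAX_CPU_FDB; i++) {
		if (!t[i].refcount) {
			memcpy(t[i].mac, mac, 6);
			t[i].refcount = 1;
			return 1;
		}
	}
	return -1;
}

/* Returns 1 if hardware must be cleaned up, 0 if still in use, -1 if unknown. */
static int cpu_fdb_put(struct cpu_fdb_ref *t, const unsigned char *mac)
{
	struct cpu_fdb_ref *e = cpu_fdb_find(t, mac);

	if (!e)
		return -1;
	return --e->refcount == 0 ? 1 : 0;
}
```

      With two bridges servicing the same address, the first add programs
      the hardware, the second only bumps the count, and the entry is
      removed from the CPU port only when the last bridge deletes it.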
      
      This behavior needs to be enabled at driver level by setting
      ds->assisted_learning_on_cpu_port = true. This is because we don't want
      to inflict a potential performance penalty (accesses through
      MDIO/I2C/SPI are expensive) on hardware that really doesn't need it
      because address learning on the CPU port works there.
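      
      The decision described in this message can be summarized in a
      simplified userspace model. This is a sketch under stated assumptions,
      not the code in the patch: struct model_switch, struct model_netdev,
      bridge_find_dsa_switch() and classify_fdb_event() are hypothetical
      stand-ins, and bridges are reduced to an integer id.

```c
#include <stdbool.h>
#include <stddef.h>

enum fdb_target { FDB_IGNORE, FDB_OFFLOAD_USER_PORT, FDB_OFFLOAD_CPU_PORT };

/* Stand-in for the driver-level opt-in flag named in the message. */
struct model_switch {
	bool assisted_learning_on_cpu_port;
};

struct model_netdev {
	const struct model_switch *ds; /* non-NULL only for DSA user ports */
	int bridge_id;                 /* -1 when not enslaved to a bridge */
};

/* Find a DSA switch that has at least one port in the given bridge. */
static const struct model_switch *
bridge_find_dsa_switch(const struct model_netdev *ports, size_t n, int bridge_id)
{
	for (size_t i = 0; i < n; i++)
		if (ports[i].ds && ports[i].bridge_id == bridge_id)
			return ports[i].ds;
	return NULL;
}

/* Decide what to do with an FDB add/del event emitted for netdev "dev". */
static enum fdb_target
classify_fdb_event(const struct model_netdev *ports, size_t n,
		   const struct model_netdev *dev, bool added_by_user)
{
	const struct model_switch *ds;

	/* Event targeted at a DSA port: only user-added entries count. */
	if (dev->ds)
		return added_by_user ? FDB_OFFLOAD_USER_PORT : FDB_IGNORE;

	/* Foreign interface: it must share a bridge with a DSA port... */
	if (dev->bridge_id < 0)
		return FDB_IGNORE;
	ds = bridge_find_dsa_switch(ports, n, dev->bridge_id);

	/* ...and the switch driver must have opted in. */
	if (!ds || !ds->assisted_learning_on_cpu_port)
		return FDB_IGNORE;

	return FDB_OFFLOAD_CPU_PORT;
}
```

      Note that added_by_user is consulted only in the first branch, which
      is why the check has to run before any work item is allocated.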
      Reported-by: DENG Qingfang <dqfext@gmail.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d5f19486
  15. 06 Nov, 2020 2 commits
  16. 05 Oct, 2020 4 commits
  17. 03 Oct, 2020 1 commit
  18. 27 Sep, 2020 4 commits
  19. 19 Sep, 2020 3 commits
  20. 24 Jul, 2020 1 commit
    • V
      net: dsa: stop overriding master's ndo_get_phys_port_name · 5df5661a
      Vladimir Oltean authored
      The purpose of this override is to give the user an indication of what
      the number of the CPU port is (in DSA, the CPU port is a hardware
      implementation detail and not a network interface capable of traffic).
      
      However, it has always failed (by design) at providing this information
      to the user in a reliable fashion.
      
      Prior to commit 3369afba ("net: Call into DSA netdevice_ops
      wrappers"), the behavior was to only override this callback if it was
      not provided by the DSA master.
      
      That was its first failure: if the DSA master itself was a DSA port or a
      switchdev, then the user would not see the number of the CPU port in
      /sys/class/net/eth0/phys_port_name, but the number of the DSA master
      port within its respective physical switch.
      
      But that was actually ok in a way. The commit mentioned above changed
      that behavior, and now overrides the master's ndo_get_phys_port_name
      unconditionally. That comes with problems of its own, which are worse in
      a way.
      
      The idea is that it's typical for switchdev users to have udev rules for
      consistent interface naming. These are based, among other things, on
      the phys_port_name attribute. If we let the DSA switch at the bottom
      start randomly overriding ndo_get_phys_port_name with its own CPU
      port, we basically lose any predictability in interface naming, or even
      uniqueness, for that matter.
      
      So, there are reasons to let DSA override the master's callback (to
      provide a consistent interface, a number which has a clear meaning and
      must not be interpreted according to context), and there are reasons to
      not let DSA override it (it breaks udev matching for the DSA master).
      
      But, there is an alternative method for users to retrieve the number of
      the CPU port of each DSA switch in the system:
      
        $ devlink port
        pci/0000:00:00.5/0: type eth netdev swp0 flavour physical port 0
        pci/0000:00:00.5/2: type eth netdev swp2 flavour physical port 2
        pci/0000:00:00.5/4: type notset flavour cpu port 4
        spi/spi2.0/0: type eth netdev sw0p0 flavour physical port 0
        spi/spi2.0/1: type eth netdev sw0p1 flavour physical port 1
        spi/spi2.0/2: type eth netdev sw0p2 flavour physical port 2
        spi/spi2.0/4: type notset flavour cpu port 4
        spi/spi2.1/0: type eth netdev sw1p0 flavour physical port 0
        spi/spi2.1/1: type eth netdev sw1p1 flavour physical port 1
        spi/spi2.1/2: type eth netdev sw1p2 flavour physical port 2
        spi/spi2.1/3: type eth netdev sw1p3 flavour physical port 3
        spi/spi2.1/4: type notset flavour cpu port 4
      
      So remove this duplicated, unreliable and troublesome method. From this
      patch on, the phys_port_name attribute of the DSA master will only
      contain information about itself (if at all). If the users need reliable
      information about the CPU port, they're probably using devlink anyway.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5df5661a
  21. 21 Jul, 2020 1 commit
    • F
      net: dsa: Setup dsa_netdev_ops · 9c0c7014
      Florian Fainelli authored
      Now that we have all the infrastructure in place for calling into the
      dsa_ptr->netdev_ops function pointers, install them when we configure
      the DSA CPU/management interface and tear them down. The flow is
      unchanged from before, but now we preserve equality of tests when
      network device drivers do tests like dev->netdev_ops == &foo_ops which
      was not the case before since we were allocating an entirely new
      structure.
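      
      The pointer-equality point can be shown with a small userspace model.
      This is a sketch only: struct model_netdev, struct model_netdev_ops,
      struct model_dsa_ops and the helpers below are hypothetical stand-ins
      and do not mirror the real kernel structures.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_netdev_ops { int (*ndo_do_ioctl)(int cmd); };
struct model_dsa_ops    { int (*ndo_do_ioctl)(int cmd); };

struct model_netdev {
	const struct model_netdev_ops *netdev_ops;
	const struct model_dsa_ops *dsa_ops; /* side pointer for DSA hooks */
};

static int foo_ioctl(int cmd) { return cmd; }
static int dsa_ioctl(int cmd) { return -cmd; }

static const struct model_netdev_ops foo_ops = { .ndo_do_ioctl = foo_ioctl };

/* The kind of identity test a driver might perform on a netdev. */
static bool is_foo_device(const struct model_netdev *dev)
{
	return dev->netdev_ops == &foo_ops;
}

/* Old style: clone the ops, patch one member, repoint the device. */
static void override_by_copy(struct model_netdev *dev,
			     struct model_netdev_ops *copy,
			     int (*hook)(int cmd))
{
	*copy = *dev->netdev_ops;
	copy->ndo_do_ioctl = hook;
	dev->netdev_ops = copy; /* dev->netdev_ops != &foo_ops from now on */
}

/* New style: leave netdev_ops alone and install hooks next to it. */
static void override_by_side_pointer(struct model_netdev *dev,
				     const struct model_dsa_ops *ops)
{
	dev->dsa_ops = ops;
}
```

      After override_by_copy(), is_foo_device() returns false even though
      the device is still driven by foo. After override_by_side_pointer(),
      it keeps returning true.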
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9c0c7014