1. 05 5月, 2020 1 次提交
  2. 28 3月, 2020 1 次提交
    • V
      net: dsa: implement auto-normalization of MTU for bridge hardware datapath · bff33f7e
      Vladimir Oltean 提交于
      Many switches don't have an explicit knob for configuring the MTU
      (maximum transmission unit per interface).  Instead, they do the
      length-based packet admission checks on the ingress interface, for
      reasons that are easy to understand (why would you accept a packet in
      the queuing subsystem if you know you're going to drop it anyway).
      
      So it is actually the MRU that these switches permit configuring.
      
      In Linux there only exists the IFLA_MTU netlink attribute and the
      associated dev_set_mtu function. The comments like to play blind and say
      that it's changing the "maximum transfer unit", which is to say that
      there isn't any directionality in the meaning of the MTU word. So that
      is the interpretation that this patch is giving to things: MTU == MRU.
      
      When 2 interfaces having different MTUs are bridged, the bridge driver
      MTU auto-adjustment logic kicks in: what br_mtu_auto_adjust() does is it
      adjusts the MTU of the bridge net device itself (and not that of the
      slave net devices) to the minimum value of all slave interfaces, in
      order for forwarded packets to not exceed the MTU regardless of the
      interface they are received and send on.
      
      The idea behind this behavior, and why the slave MTUs are not adjusted,
      is that normal termination from Linux over the L2 forwarding domain
      should happen over the bridge net device, which _is_ properly limited by
      the minimum MTU. And termination over individual slave devices is
      possible even if those are bridged. But that is not "forwarding", so
      there's no reason to do normalization there, since only a single
      interface sees that packet.
      
      The problem with those switches that can only control the MRU is with
      the offloaded data path, where a packet received on an interface with
      MRU 9000 would still be forwarded to an interface with MRU 1500. And the
      br_mtu_auto_adjust() function does not really help, since the MTU
      configured on the bridge net device is ignored.
      
      In order to enforce the de-facto MTU == MRU rule for these switches, we
      need to do MTU normalization, which means: in order for no packet larger
      than the MTU configured on this port to be sent, then we need to limit
      the MRU on all ports that this packet could possibly come from. AKA
      since we are configuring the MRU via MTU, it means that all ports within
      a bridge forwarding domain should have the same MTU.
      
      And that is exactly what this patch is trying to do.
      
      >From an implementation perspective, we try to follow the intent of the
      user, otherwise there is a risk that we might livelock them (they try to
      change the MTU on an already-bridged interface, but we just keep
      changing it back in an attempt to keep the MTU normalized). So the MTU
      that the bridge is normalized to is either:
      
       - The most recently changed one:
      
         ip link set dev swp0 master br0
         ip link set dev swp1 master br0
         ip link set dev swp0 mtu 1400
      
         This sequence will make swp1 inherit MTU 1400 from swp0.
      
       - The one of the most recently added interface to the bridge:
      
         ip link set dev swp0 master br0
         ip link set dev swp1 mtu 1400
         ip link set dev swp1 master br0
      
         The above sequence will make swp0 inherit MTU 1400 as well.
      Suggested-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bff33f7e
  3. 27 1月, 2020 1 次提交
    • V
      net: dsa: Fix use-after-free in probing of DSA switch tree · 6dc43cd3
      Vladimir Oltean 提交于
      DSA sets up a switch tree little by little. Every switch of the N
      members of the tree calls dsa_register_switch, and (N - 1) will just
      touch the dst->ports list with their ports and quickly exit. Only the
      last switch that calls dsa_register_switch will find all DSA links
      complete in dsa_tree_setup_routing_table, and not return zero as a
      result but instead go ahead and set up the entire DSA switch tree
      (practically on behalf of the other switches too).
      
      The trouble is that the (N - 1) switches don't clean up after themselves
      after they get an error such as EPROBE_DEFER. Their footprint left in
      dst->ports by dsa_switch_touch_ports is still there. And switch N, the
      one responsible with actually setting up the tree, is going to work with
      those stale dp, dp->ds and dp->ds->dev pointers. In particular ds and
      ds->dev might get freed by the device driver.
      
      Be there a 2-switch tree and the following calling order:
      - Switch 1 calls dsa_register_switch
        - Calls dsa_switch_touch_ports, populates dst->ports
        - Calls dsa_port_parse_cpu, gets -EPROBE_DEFER, exits.
      - Switch 2 calls dsa_register_switch
        - Calls dsa_switch_touch_ports, populates dst->ports
        - Probe doesn't get deferred, so it goes ahead.
        - Calls dsa_tree_setup_routing_table, which returns "complete == true"
          due to Switch 1 having called dsa_switch_touch_ports before.
        - Because the DSA links are complete, it calls dsa_tree_setup_switches
          now.
        - dsa_tree_setup_switches iterates through dst->ports, initializing
          the Switch 1 ds structure (invalid) and the Switch 2 ds structure
          (valid).
        - Undefined behavior (use after free, sometimes NULL pointers, etc).
      
      Real example below (debugging prints added by me, as well as guards
      against NULL pointers):
      
      [    5.477947] dsa_tree_setup_switches: Setting up port 0 of switch ffffff803df0b980 (dev ffffff803f775c00)
      [    6.313002] dsa_tree_setup_switches: Setting up port 1 of switch ffffff803df0b980 (dev ffffff803f775c00)
      [    6.319932] dsa_tree_setup_switches: Setting up port 2 of switch ffffff803df0b980 (dev ffffff803f775c00)
      [    6.329693] dsa_tree_setup_switches: Setting up port 3 of switch ffffff803df0b980 (dev ffffff803f775c00)
      [    6.339458] dsa_tree_setup_switches: Setting up port 4 of switch ffffff803df0b980 (dev ffffff803f775c00)
      [    6.349226] dsa_tree_setup_switches: Setting up port 5 of switch ffffff803df0b980 (dev ffffff803f775c00)
      [    6.358991] dsa_tree_setup_switches: Setting up port 6 of switch ffffff803df0b980 (dev ffffff803f775c00)
      [    6.368758] dsa_tree_setup_switches: Setting up port 7 of switch ffffff803df0b980 (dev ffffff803f775c00)
      [    6.378524] dsa_tree_setup_switches: Setting up port 8 of switch ffffff803df0b980 (dev ffffff803f775c00)
      [    6.388291] dsa_tree_setup_switches: Setting up port 9 of switch ffffff803df0b980 (dev ffffff803f775c00)
      [    6.398057] dsa_tree_setup_switches: Setting up port 10 of switch ffffff803df0b980 (dev ffffff803f775c00)
      [    6.407912] dsa_tree_setup_switches: Setting up port 0 of switch ffffff803da02f80 (dev 0000000000000000)
      [    6.417682] dsa_tree_setup_switches: Setting up port 1 of switch ffffff803da02f80 (dev 0000000000000000)
      [    6.427446] dsa_tree_setup_switches: Setting up port 2 of switch ffffff803da02f80 (dev 0000000000000000)
      [    6.437212] dsa_tree_setup_switches: Setting up port 3 of switch ffffff803da02f80 (dev 0000000000000000)
      [    6.446979] dsa_tree_setup_switches: Setting up port 4 of switch ffffff803da02f80 (dev 0000000000000000)
      [    6.456744] dsa_tree_setup_switches: Setting up port 5 of switch ffffff803da02f80 (dev 0000000000000000)
      [    6.466512] dsa_tree_setup_switches: Setting up port 6 of switch ffffff803da02f80 (dev 0000000000000000)
      [    6.476277] dsa_tree_setup_switches: Setting up port 7 of switch ffffff803da02f80 (dev 0000000000000000)
      [    6.486043] dsa_tree_setup_switches: Setting up port 8 of switch ffffff803da02f80 (dev 0000000000000000)
      [    6.495810] dsa_tree_setup_switches: Setting up port 9 of switch ffffff803da02f80 (dev 0000000000000000)
      [    6.505577] dsa_tree_setup_switches: Setting up port 10 of switch ffffff803da02f80 (dev 0000000000000000)
      [    6.515433] dsa_tree_setup_switches: Setting up port 0 of switch ffffff803db15b80 (dev ffffff803d8e4800)
      [    7.354120] dsa_tree_setup_switches: Setting up port 1 of switch ffffff803db15b80 (dev ffffff803d8e4800)
      [    7.361045] dsa_tree_setup_switches: Setting up port 2 of switch ffffff803db15b80 (dev ffffff803d8e4800)
      [    7.370805] dsa_tree_setup_switches: Setting up port 3 of switch ffffff803db15b80 (dev ffffff803d8e4800)
      [    7.380571] dsa_tree_setup_switches: Setting up port 4 of switch ffffff803db15b80 (dev ffffff803d8e4800)
      [    7.390337] dsa_tree_setup_switches: Setting up port 5 of switch ffffff803db15b80 (dev ffffff803d8e4800)
      [    7.400104] dsa_tree_setup_switches: Setting up port 6 of switch ffffff803db15b80 (dev ffffff803d8e4800)
      [    7.409872] dsa_tree_setup_switches: Setting up port 7 of switch ffffff803db15b80 (dev ffffff803d8e4800)
      [    7.419637] dsa_tree_setup_switches: Setting up port 8 of switch ffffff803db15b80 (dev ffffff803d8e4800)
      [    7.429403] dsa_tree_setup_switches: Setting up port 9 of switch ffffff803db15b80 (dev ffffff803d8e4800)
      [    7.439169] dsa_tree_setup_switches: Setting up port 10 of switch ffffff803db15b80 (dev ffffff803d8e4800)
      
      The solution is to recognize that the functions that call
      dsa_switch_touch_ports (dsa_switch_parse_of, dsa_switch_parse) have side
      effects, and therefore one should clean up their side effects on error
      path. The cleanup of dst->ports was taken from dsa_switch_remove and
      moved into a dedicated dsa_switch_release_ports function, which should
      really be per-switch (free only the members of dst->ports that are also
      members of ds, instead of all switch ports).
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6dc43cd3
  4. 09 1月, 2020 1 次提交
    • F
      net: dsa: Get information about stacked DSA protocol · 4d776482
      Florian Fainelli 提交于
      It is possible to stack multiple DSA switches in a way that they are not
      part of the tree (disjoint) but the DSA master of a switch is a DSA
      slave of another. When that happens switch drivers may have to know this
      is the case so as to determine whether their tagging protocol has a
      remove chance of working.
      
      This is useful for specific switch drivers such as b53 where devices
      have been known to be stacked in the wild without the Broadcom tag
      protocol supporting that feature. This allows b53 to continue supporting
      those devices by forcing the disabling of Broadcom tags on the outermost
      switches if necessary.
      
      The get_tag_protocol() function is therefore updated to gain an
      additional enum dsa_tag_protocol argument which denotes the current
      tagging protocol used by the DSA master we are attached to, else
      DSA_TAG_PROTO_NONE for the top of the dsa_switch_tree.
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4d776482
  5. 18 12月, 2019 1 次提交
  6. 06 11月, 2019 1 次提交
  7. 01 11月, 2019 6 次提交
  8. 29 10月, 2019 2 次提交
  9. 23 10月, 2019 10 次提交
  10. 20 10月, 2019 1 次提交
  11. 16 9月, 2019 1 次提交
    • A
      net: dsa: Fix load order between DSA drivers and taggers · 23426a25
      Andrew Lunn 提交于
      The DSA core, DSA taggers and DSA drivers all make use of
      module_init(). Hence they get initialised at device_initcall() time.
      The ordering is non-deterministic. It can be a DSA driver is bound to
      a device before the needed tag driver has been initialised, resulting
      in the message:
      
      No tagger for this switch
      
      Rather than have this be fatal, return -EPROBE_DEFER so that it is
      tried again later once all the needed drivers have been loaded.
      
      Fixes: d3b8c049 ("dsa: Add boilerplate helper to register DSA tag driver modules")
      Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      23426a25
  12. 03 9月, 2019 1 次提交
    • V
      net: dsa: Fix off-by-one number of calls to devlink_port_unregister · 4ba0ebbc
      Vladimir Oltean 提交于
      When a function such as dsa_slave_create fails, currently the following
      stack trace can be seen:
      
      [    2.038342] sja1105 spi0.1: Probed switch chip: SJA1105T
      [    2.054556] sja1105 spi0.1: Reset switch and programmed static config
      [    2.063837] sja1105 spi0.1: Enabled switch tagging
      [    2.068706] fsl-gianfar soc:ethernet@2d90000 eth2: error -19 setting up slave phy
      [    2.076371] ------------[ cut here ]------------
      [    2.080973] WARNING: CPU: 1 PID: 21 at net/core/devlink.c:6184 devlink_free+0x1b4/0x1c0
      [    2.088954] Modules linked in:
      [    2.092005] CPU: 1 PID: 21 Comm: kworker/1:1 Not tainted 5.3.0-rc6-01360-g41b52e38d2b6-dirty #1746
      [    2.100912] Hardware name: Freescale LS1021A
      [    2.105162] Workqueue: events deferred_probe_work_func
      [    2.110287] [<c03133a4>] (unwind_backtrace) from [<c030d8cc>] (show_stack+0x10/0x14)
      [    2.117992] [<c030d8cc>] (show_stack) from [<c10b08d8>] (dump_stack+0xb4/0xc8)
      [    2.125180] [<c10b08d8>] (dump_stack) from [<c0349d04>] (__warn+0xe0/0xf8)
      [    2.132018] [<c0349d04>] (__warn) from [<c0349e34>] (warn_slowpath_null+0x40/0x48)
      [    2.139549] [<c0349e34>] (warn_slowpath_null) from [<c0f19d74>] (devlink_free+0x1b4/0x1c0)
      [    2.147772] [<c0f19d74>] (devlink_free) from [<c1064fc0>] (dsa_switch_teardown+0x60/0x6c)
      [    2.155907] [<c1064fc0>] (dsa_switch_teardown) from [<c1065950>] (dsa_register_switch+0x8e4/0xaa8)
      [    2.164821] [<c1065950>] (dsa_register_switch) from [<c0ba7fe4>] (sja1105_probe+0x21c/0x2ec)
      [    2.173216] [<c0ba7fe4>] (sja1105_probe) from [<c0b35948>] (spi_drv_probe+0x80/0xa4)
      [    2.180920] [<c0b35948>] (spi_drv_probe) from [<c0a4c1cc>] (really_probe+0x108/0x400)
      [    2.188711] [<c0a4c1cc>] (really_probe) from [<c0a4c694>] (driver_probe_device+0x78/0x1bc)
      [    2.196933] [<c0a4c694>] (driver_probe_device) from [<c0a4a3dc>] (bus_for_each_drv+0x58/0xb8)
      [    2.205414] [<c0a4a3dc>] (bus_for_each_drv) from [<c0a4c024>] (__device_attach+0xd0/0x168)
      [    2.213637] [<c0a4c024>] (__device_attach) from [<c0a4b1d0>] (bus_probe_device+0x84/0x8c)
      [    2.221772] [<c0a4b1d0>] (bus_probe_device) from [<c0a4b72c>] (deferred_probe_work_func+0x84/0xc4)
      [    2.230686] [<c0a4b72c>] (deferred_probe_work_func) from [<c03650a4>] (process_one_work+0x218/0x510)
      [    2.239772] [<c03650a4>] (process_one_work) from [<c03660d8>] (worker_thread+0x2a8/0x5c0)
      [    2.247908] [<c03660d8>] (worker_thread) from [<c036b348>] (kthread+0x148/0x150)
      [    2.255265] [<c036b348>] (kthread) from [<c03010e8>] (ret_from_fork+0x14/0x2c)
      [    2.262444] Exception stack(0xea965fb0 to 0xea965ff8)
      [    2.267466] 5fa0:                                     00000000 00000000 00000000 00000000
      [    2.275598] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      [    2.283729] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
      [    2.290333] ---[ end trace ca5d506728a0581a ]---
      
      devlink_free is complaining right here:
      
      	WARN_ON(!list_empty(&devlink->port_list));
      
      This happens because devlink_port_unregister is no longer done right
      away in dsa_port_setup when a DSA_PORT_TYPE_USER has failed.
      Vivien said about this change that:
      
          Also no need to call devlink_port_unregister from within dsa_port_setup
          as this step is inconditionally handled by dsa_port_teardown on error.
      
      which is not really true. The devlink_port_unregister function _is_
      being called unconditionally from within dsa_port_setup, but not for
      this port that just failed, just for the previous ones which were set
      up.
      
      ports_teardown:
      	for (i = 0; i < port; i++)
      		dsa_port_teardown(&ds->ports[i]);
      
      Initially I was tempted to fix this by extending the "for" loop to also
      cover the port that failed during setup. But this could have potentially
      unforeseen consequences unrelated to devlink_port or even other types of
      ports than user ports, which I can't really test for. For example, if
      for some reason devlink_port_register itself would fail, then
      unconditionally unregistering it in dsa_port_teardown would not be a
      smart idea. The list might go on.
      
      So just make dsa_port_setup undo the setup it had done upon failure, and
      let the for loop undo the work of setting up the previous ports, which
      are guaranteed to be brought up to a consistent state.
      
      Fixes: 955222ca ("net: dsa: use a single switch statement for port setup")
      Signed-off-by: NVladimir Oltean <olteanv@gmail.com>
      Reviewed-by: NVivien Didelot <vivien.didelot@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4ba0ebbc
  13. 28 8月, 2019 1 次提交
    • V
      net: dsa: remove bitmap operations · e65d45cc
      Vivien Didelot 提交于
      The bitmap operations were introduced to simplify the switch drivers
      in the future, since most of them could implement the common VLAN and
      MDB operations (add, del, dump) with simple functions taking all target
      ports at once, and thus limiting the number of hardware accesses.
      
      Programming an MDB or VLAN this way in a single operation would clearly
      simplify the drivers a lot but would require a new get-set interface
      in DSA. The usage of such bitmap from the stack also raised concerned
      in the past, leading to the dynamic allocation of a new ds->_bitmap
      member in the dsa_switch structure. So let's get rid of them for now.
      
      This commit nicely wraps the ds->ops->port_{mdb,vlan}_{prepare,add}
      switch operations into new dsa_switch_{mdb,vlan}_{prepare,add}
      variants not using any bitmap argument anymore.
      
      New dsa_switch_{mdb,vlan}_match helpers have been introduced to make
      clear which local port of a switch must be programmed with the target
      object. While the targeted user port is an obvious candidate, the
      DSA links must also be programmed, as well as the CPU port for VLANs.
      
      While at it, also remove local variables that are only used once.
      Signed-off-by: NVivien Didelot <vivien.didelot@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e65d45cc
  14. 21 8月, 2019 2 次提交
  15. 09 6月, 2019 1 次提交
  16. 31 5月, 2019 2 次提交
  17. 06 5月, 2019 2 次提交
    • V
      net: dsa: Allow drivers to filter packets they can decode source port from · cc1939e4
      Vladimir Oltean 提交于
      Frames get processed by DSA and redirected to switch port net devices
      based on the ETH_P_XDSA multiplexed packet_type handler found by the
      network stack when calling eth_type_trans().
      
      The running assumption is that once the DSA .rcv function is called, DSA
      is always able to decode the switch tag in order to change the skb->dev
      from its master.
      
      However there are tagging protocols (such as the new DSA_TAG_PROTO_SJA1105,
      user of DSA_TAG_PROTO_8021Q) where this assumption is not completely
      true, since switch tagging piggybacks on the absence of a vlan_filtering
      bridge. Moreover, management traffic (BPDU, PTP) for this switch doesn't
      rely on switch tagging, but on a different mechanism. So it would make
      sense to at least be able to terminate that.
      
      Having DSA receive traffic it can't decode would put it in an impossible
      situation: the eth_type_trans() function would invoke the DSA .rcv(),
      which could not change skb->dev, then eth_type_trans() would be invoked
      again, which again would call the DSA .rcv, and the packet would never
      be able to exit the DSA filter and would spiral in a loop until the
      whole system dies.
      
      This happens because eth_type_trans() doesn't actually look at the skb
      (so as to identify a potential tag) when it deems it as being
      ETH_P_XDSA. It just checks whether skb->dev has a DSA private pointer
      installed (therefore it's a DSA master) and that there exists a .rcv
      callback (everybody except DSA_TAG_PROTO_NONE has that). This is
      understandable as there are many switch tags out there, and exhaustively
      checking for all of them is far from ideal.
      
      The solution lies in introducing a filtering function for each tagging
      protocol. In the absence of a filtering function, all traffic is passed
      to the .rcv DSA callback. The tagging protocol should see the filtering
      function as a pre-validation that it can decode the incoming skb. The
      traffic that doesn't match the filter will bypass the DSA .rcv callback
      and be left on the master netdevice, which wasn't previously possible.
      Signed-off-by: NVladimir Oltean <olteanv@gmail.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cc1939e4
    • V
      net: dsa: Call driver's setup callback after setting up its switchdev notifier · b2243b36
      Vladimir Oltean 提交于
      This allows the driver to perform some manipulations of its own during
      setup, using generic switchdev calls. Having the notifiers registered at
      setup time is important because otherwise any switchdev transaction
      emitted during this time would be ignored (dispatched to an empty call
      chain).
      
      One current usage scenario is for the driver to request DSA to set up
      802.1Q based switch tagging for its ports.
      
      There is no danger for the driver setup code to start racing now with
      switchdev events emitted from the network stack (such as bridge core)
      even if the notifier is registered earlier. This is because the network
      stack needs a net_device as a vehicle to perform switchdev operations,
      and the slave net_devices are registered later than the core driver
      setup anyway (ds->ops->setup in dsa_switch_setup vs dsa_port_setup).
      
      Luckily DSA doesn't need a net_device to carry out switchdev callbacks,
      and therefore drivers shouldn't assume either that net_devices are
      available at the time their switchdev callbacks get invoked.
      Signed-off-by: NVladimir Oltean <olteanv@gmail.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>-
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b2243b36
  18. 29 4月, 2019 2 次提交
  19. 05 4月, 2019 2 次提交
  20. 02 4月, 2019 1 次提交