1. 06 7月, 2020 1 次提交
  2. 01 7月, 2020 1 次提交
  3. 25 6月, 2020 1 次提交
    • D
      dsa: Allow forwarding of redirected IGMP traffic · 1ed9ec9b
      Daniel Mack 提交于
      The driver for Marvell switches puts all ports in IGMP snooping mode
      which results in all IGMP/MLD frames that ingress on the ports to be
      forwarded to the CPU only.
      
      The bridge code in the kernel can then interpret these frames and act
      upon them, for instance by updating the mdb in the switch to reflect
      multicast memberships of stations connected to the ports. However,
      the IGMP/MLD frames must then also be forwarded to other ports of the
      bridge so external IGMP queriers can track membership reports, and
      external multicast clients can receive query reports from foreign IGMP
      queriers.
      
      Currently, this is impossible as the EDSA tagger sets offload_fwd_mark
      on the skb when it unwraps the tagged frames, and that will make the
      switchdev layer prevent the skb from egressing on any other port of
      the same switch.
      
      To fix that, look at the To_CPU code in the DSA header and make
      forwarding of the frame possible for trapped IGMP packets.
      
      Introduce some #defines for the frame types to make the code a bit more
      comprehensive.
      
      This was tested on a Marvell 88E6352 variant.
      Signed-off-by: NDaniel Mack <daniel@zonque.org>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Tested-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1ed9ec9b
  4. 14 6月, 2020 1 次提交
    • M
      treewide: replace '---help---' in Kconfig files with 'help' · a7f7f624
      Masahiro Yamada 提交于
      Since commit 84af7a61 ("checkpatch: kconfig: prefer 'help' over
      '---help---'"), the number of '---help---' has been gradually
      decreasing, but there are still more than 2400 instances.
      
      This commit finishes the conversion. While I touched the lines,
      I also fixed the indentation.
      
      There are a variety of indentation styles found.
      
        a) 4 spaces + '---help---'
        b) 7 spaces + '---help---'
        c) 8 spaces + '---help---'
        d) 1 space + 1 tab + '---help---'
        e) 1 tab + '---help---'    (correct indentation)
        f) 1 tab + 1 space + '---help---'
        g) 1 tab + 2 spaces + '---help---'
      
      In order to convert all of them to 1 tab + 'help', I ran the
      following commend:
      
        $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'
      Signed-off-by: NMasahiro Yamada <masahiroy@kernel.org>
      a7f7f624
  5. 10 6月, 2020 1 次提交
    • C
      net: change addr_list_lock back to static key · 845e0ebb
      Cong Wang 提交于
      The dynamic key update for addr_list_lock still causes troubles,
      for example the following race condition still exists:
      
      CPU 0:				CPU 1:
      (RCU read lock)			(RTNL lock)
      dev_mc_seq_show()		netdev_update_lockdep_key()
      				  -> lockdep_unregister_key()
       -> netif_addr_lock_bh()
      
      because lockdep doesn't provide an API to update it atomically.
      Therefore, we have to move it back to static keys and use subclass
      for nest locking like before.
      
      In commit 1a33e10e ("net: partially revert dynamic lockdep key
      changes"), I already reverted most parts of commit ab92d68f
      ("net: core: add generic lockdep keys").
      
      This patch reverts the rest and also part of commit f3b0a18b
      ("net: remove unnecessary variables and callback"). After this
      patch, addr_list_lock changes back to using static keys and
      subclasses to satisfy lockdep. Thanks to dev->lower_level, we do
      not have to change back to ->ndo_get_lock_subclass().
      
      And hopefully this reduces some syzbot lockdep noises too.
      
      Reported-by: syzbot+f3a0e80c34b3fc28ac5e@syzkaller.appspotmail.com
      Cc: Taehee Yoo <ap420073@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      845e0ebb
  6. 30 5月, 2020 1 次提交
  7. 28 5月, 2020 1 次提交
    • V
      net: dsa: declare lockless TX feature for slave ports · 2b86cb82
      Vladimir Oltean 提交于
      Be there a platform with the following layout:
      
            Regular NIC
             |
             +----> DSA master for switch port
                     |
                     +----> DSA master for another switch port
      
      After changing DSA back to static lockdep class keys in commit
      1a33e10e ("net: partially revert dynamic lockdep key changes"), this
      kernel splat can be seen:
      
      [   13.361198] ============================================
      [   13.366524] WARNING: possible recursive locking detected
      [   13.371851] 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988 Not tainted
      [   13.377874] --------------------------------------------
      [   13.383201] swapper/0/0 is trying to acquire lock:
      [   13.388004] ffff0000668ff298 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
      [   13.397879]
      [   13.397879] but task is already holding lock:
      [   13.403727] ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
      [   13.413593]
      [   13.413593] other info that might help us debug this:
      [   13.420140]  Possible unsafe locking scenario:
      [   13.420140]
      [   13.426075]        CPU0
      [   13.428523]        ----
      [   13.430969]   lock(&dsa_slave_netdev_xmit_lock_key);
      [   13.435946]   lock(&dsa_slave_netdev_xmit_lock_key);
      [   13.440924]
      [   13.440924]  *** DEADLOCK ***
      [   13.440924]
      [   13.446860]  May be due to missing lock nesting notation
      [   13.446860]
      [   13.453668] 6 locks held by swapper/0/0:
      [   13.457598]  #0: ffff800010003de0 ((&idev->mc_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0x0/0x400
      [   13.466593]  #1: ffffd4d3fb478700 (rcu_read_lock){....}-{1:2}, at: mld_sendpack+0x0/0x560
      [   13.474803]  #2: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: ip6_finish_output2+0x64/0xb10
      [   13.483886]  #3: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0
      [   13.492793]  #4: ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
      [   13.503094]  #5: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0
      [   13.512000]
      [   13.512000] stack backtrace:
      [   13.516369] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988
      [   13.530421] Call trace:
      [   13.532871]  dump_backtrace+0x0/0x1d8
      [   13.536539]  show_stack+0x24/0x30
      [   13.539862]  dump_stack+0xe8/0x150
      [   13.543271]  __lock_acquire+0x1030/0x1678
      [   13.547290]  lock_acquire+0xf8/0x458
      [   13.550873]  _raw_spin_lock+0x44/0x58
      [   13.554543]  __dev_queue_xmit+0x84c/0xbe0
      [   13.558562]  dev_queue_xmit+0x24/0x30
      [   13.562232]  dsa_slave_xmit+0xe0/0x128
      [   13.565988]  dev_hard_start_xmit+0xf4/0x448
      [   13.570182]  __dev_queue_xmit+0x808/0xbe0
      [   13.574200]  dev_queue_xmit+0x24/0x30
      [   13.577869]  neigh_resolve_output+0x15c/0x220
      [   13.582237]  ip6_finish_output2+0x244/0xb10
      [   13.586430]  __ip6_finish_output+0x1dc/0x298
      [   13.590709]  ip6_output+0x84/0x358
      [   13.594116]  mld_sendpack+0x2bc/0x560
      [   13.597786]  mld_ifc_timer_expire+0x210/0x390
      [   13.602153]  call_timer_fn+0xcc/0x400
      [   13.605822]  run_timer_softirq+0x588/0x6e0
      [   13.609927]  __do_softirq+0x118/0x590
      [   13.613597]  irq_exit+0x13c/0x148
      [   13.616918]  __handle_domain_irq+0x6c/0xc0
      [   13.621023]  gic_handle_irq+0x6c/0x160
      [   13.624779]  el1_irq+0xbc/0x180
      [   13.627927]  cpuidle_enter_state+0xb4/0x4d0
      [   13.632120]  cpuidle_enter+0x3c/0x50
      [   13.635703]  call_cpuidle+0x44/0x78
      [   13.639199]  do_idle+0x228/0x2c8
      [   13.642433]  cpu_startup_entry+0x2c/0x48
      [   13.646363]  rest_init+0x1ac/0x280
      [   13.649773]  arch_call_rest_init+0x14/0x1c
      [   13.653878]  start_kernel+0x490/0x4bc
      
      Lockdep keys themselves were added in commit ab92d68f ("net: core:
      add generic lockdep keys"), and it's very likely that this splat existed
      since then, but I have no real way to check, since this stacked platform
      wasn't supported by mainline back then.
      
      >From Taehee's own words:
      
        This patch was considered that all stackable devices have LLTX flag.
        But the dsa doesn't have LLTX, so this splat happened.
        After this patch, dsa shares the same lockdep class key.
        On the nested dsa interface architecture, which you illustrated,
        the same lockdep class key will be used in __dev_queue_xmit() because
        dsa doesn't have LLTX.
        So that lockdep detects deadlock because the same lockdep class key is
        used recursively although actually the different locks are used.
        There are some ways to fix this problem.
      
        1. using NETIF_F_LLTX flag.
        If possible, using the LLTX flag is a very clear way for it.
        But I'm so sorry I don't know whether the dsa could have LLTX or not.
      
        2. using dynamic lockdep again.
        It means that each interface uses a separate lockdep class key.
        So, lockdep will not detect recursive locking.
        But this way has a problem that it could consume lockdep class key
        too many.
        Currently, lockdep can have 8192 lockdep class keys.
         - you can see this number with the following command.
           cat /proc/lockdep_stats
           lock-classes:                         1251 [max: 8192]
           ...
           The [max: 8192] means that the maximum number of lockdep class keys.
        If too many lockdep class keys are registered, lockdep stops to work.
        So, using a dynamic(separated) lockdep class key should be considered
        carefully.
        In addition, updating lockdep class key routine might have to be existing.
        (lockdep_register_key(), lockdep_set_class(), lockdep_unregister_key())
      
        3. Using lockdep subclass.
        A lockdep class key could have 8 subclasses.
        The different subclass is considered different locks by lockdep
        infrastructure.
        But "lock-classes" is not counted by subclasses.
        So, it could avoid stopping lockdep infrastructure by an overflow of
        lockdep class keys.
        This approach should also have an updating lockdep class key routine.
        (lockdep_set_subclass())
      
        4. Using nonvalidate lockdep class key.
        The lockdep infrastructure supports nonvalidate lockdep class key type.
        It means this lockdep is not validated by lockdep infrastructure.
        So, the splat will not happen but lockdep couldn't detect real deadlock
        case because lockdep really doesn't validate it.
        I think this should be used for really special cases.
        (lockdep_set_novalidate_class())
      
      Further discussion here:
      https://patchwork.ozlabs.org/project/netdev/patch/20200503052220.4536-2-xiyou.wangcong@gmail.com/
      
      There appears to be no negative side-effect to declaring lockless TX for
      the DSA virtual interfaces, which means they handle their own locking.
      So that's what we do to make the splat go away.
      
      Patch tested in a wide variety of cases: unicast, multicast, PTP, etc.
      
      Fixes: ab92d68f ("net: core: add generic lockdep keys")
      Suggested-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2b86cb82
  8. 17 5月, 2020 1 次提交
    • D
      net: dsa: mt7530: fix roaming from DSA user ports · 5e5502e0
      DENG Qingfang 提交于
      When a client moves from a DSA user port to a software port in a bridge,
      it cannot reach any other clients that connected to the DSA user ports.
      That is because SA learning on the CPU port is disabled, so the switch
      ignores the client's frames from the CPU port and still thinks it is at
      the user port.
      
      Fix it by enabling SA learning on the CPU port.
      
      To prevent the switch from learning from flooding frames from the CPU
      port, set skb->offload_fwd_mark to 1 for unicast and broadcast frames,
      and let the switch flood them instead of trapping to the CPU port.
      Multicast frames still need to be trapped to the CPU port for snooping,
      so set the SA_DIS bit of the MTK tag to 1 when transmitting those frames
      to disable SA learning.
      
      Fixes: b8f126a8 ("net-next: dsa: add dsa support for Mediatek MT7530 switch")
      Signed-off-by: NDENG Qingfang <dqfext@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5e5502e0
  9. 13 5月, 2020 7 次提交
    • V
      net: dsa: tag_sja1105: appease sparse checks for ethertype accessors · fb9f2e92
      Vladimir Oltean 提交于
      A comparison between a value from the packet and an integer constant
      value needs to be done by converting the value from the packet from
      net->host, or the constant from host->net. Not the other way around.
      Even though it makes no practical difference, correct that.
      
      Fixes: 38b5beea ("net: dsa: sja1105: prepare tagger for handling DSA tags and VLAN simultaneously")
      Signed-off-by: NVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fb9f2e92
    • V
      net: dsa: tag_sja1105: implement sub-VLAN decoding · 84eeb5d4
      Vladimir Oltean 提交于
      Create a subvlan_map as part of each port's tagger private structure.
      This keeps reverse mappings of bridge-to-dsa_8021q VLAN retagging rules.
      
      Note that as of this patch, this piece of code is never engaged, due to
      the fact that the driver hasn't installed any retagging rule, so we'll
      always see packets with a subvlan code of 0 (untagged).
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      84eeb5d4
    • V
      net: dsa: tag_8021q: support up to 8 VLANs per port using sub-VLANs · 3eaae1d0
      Vladimir Oltean 提交于
      For switches that support VLAN retagging, such as sja1105, we extend
      dsa_8021q by encoding a "sub-VLAN" into the remaining 3 free bits in the
      dsa_8021q tag.
      
      A sub-VLAN is nothing more than a number in the range 0-7, which serves
      as an index into a per-port driver lookup table. The sub-VLAN value of
      zero means that traffic is untagged (this is also backwards-compatible
      with dsa_8021q without retagging).
      
      The switch should be configured to retag VLAN-tagged traffic that gets
      transmitted towards the CPU port (and towards the CPU only). Example:
      
      bridge vlan add dev sw1p0 vid 100
      
      The switch retags frames received on port 0, going to the CPU, and
      having VID 100, to the VID of 1104 (0x0450). In dsa_8021q language:
      
       | 11  | 10  |  9  |  8  |  7  |  6  |  5  |  4  |  3  |  2  |  1  |  0  |
       +-----------+-----+-----------------+-----------+-----------------------+
       |    DIR    | SVL |    SWITCH_ID    |  SUBVLAN  |          PORT         |
       +-----------+-----+-----------------+-----------+-----------------------+
      
      0x0450 means:
       - DIR = 0b01: this is an RX VLAN
       - SUBVLAN = 0b001: this is subvlan #1
       - SWITCH_ID = 0b001: this is switch 1 (see the name "sw1p0")
       - PORT = 0b0000: this is port 0 (see the name "sw1p0")
      
      The driver also remembers the "1 -> 100" mapping. In the hotpath, if the
      sub-VLAN from the tag encodes a non-untagged frame, this mapping is used
      to create a VLAN hwaccel tag, with the value of 100.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3eaae1d0
    • V
      net: dsa: sja1105: prepare tagger for handling DSA tags and VLAN simultaneously · 38b5beea
      Vladimir Oltean 提交于
      In VLAN-unaware mode, sja1105 uses VLAN tags with a custom TPID of
      0xdadb. While in the yet-to-be introduced best_effort_vlan_filtering
      mode, it needs to work with normal VLAN TPID values.
      
      A complication arises when we must transmit a VLAN-tagged packet to the
      switch when it's in VLAN-aware mode. We need to construct a packet with
      2 VLAN tags, and the switch will use the outer header for routing and
      pop it on egress. But sadly, here the 2 hardware generations don't
      behave the same:
      
      - E/T switches won't pop an ETH_P_8021AD tag on egress, it seems
        (packets will remain double-tagged).
      - P/Q/R/S switches will drop a packet with 2 ETH_P_8021Q tags (it looks
        like it tries to prevent VLAN hopping).
      
      But looks like the reverse is also true:
      
      - E/T switches have no problem popping the outer tag from packets with
        2 ETH_P_8021Q tags.
      - P/Q/R/S will have no problem popping a single tag even if that is
        ETH_P_8021AD.
      
      So it is clear that if we want the hardware to work with dsa_8021q
      tagging in VLAN-aware mode, we need to send different TPIDs depending on
      revision. Keep that information in priv->info->qinq_tpid.
      
      The per-port tagger structure will hold an xmit_tpid value that depends
      not only upon the qinq_tpid, but also upon the VLAN awareness state
      itself (in case we must transmit using 0xdadb).
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      38b5beea
    • V
      net: dsa: sja1105: save/restore VLANs using a delta commit method · ec5ae610
      Vladimir Oltean 提交于
      Managing the VLAN table that is present in hardware will become very
      difficult once we add a third operating state
      (best_effort_vlan_filtering). That is because correct cleanup (not too
      little, not too much) becomes virtually impossible, when VLANs can be
      added from the bridge layer, from dsa_8021q for basic tagging, for
      cross-chip bridging, as well as retagging rules for sub-VLANs and
      cross-chip sub-VLANs. So we need to rethink VLAN interaction with the
      switch in a more scalable way.
      
      In preparation for that, use the priv->expect_dsa_8021q boolean to
      classify any VLAN request received through .port_vlan_add or
      .port_vlan_del towards either one of 2 internal lists: bridge VLANs and
      dsa_8021q VLANs.
      
      Then, implement a central sja1105_build_vlan_table method that creates a
      VLAN configuration from scratch based on the 2 lists of VLANs kept by
      the driver, and based on the VLAN awareness state. Currently, if we are
      VLAN-unaware, install the dsa_8021q VLANs, otherwise the bridge VLANs.
      
      Then, implement a delta commit procedure that identifies which VLANs
      from this new configuration are actually different from the config
      previously committed to hardware. We apply the delta through the dynamic
      configuration interface (we don't reset the switch). The result is that
      the hardware should see the exact sequence of operations as before this
      patch.
      
      This also helps remove the "br" argument passed to
      dsa_8021q_crosschip_bridge_join, which it was only using to figure out
      whether it should commit the configuration back to us or not, based on
      the VLAN awareness state of the bridge. We can simplify that, by always
      allowing those VLANs inside of our dsa_8021q_vlans list, and committing
      those to hardware when necessary.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ec5ae610
    • V
      net: dsa: tag_8021q: introduce a vid_is_dsa_8021q helper · 1f66b0f0
      Vladimir Oltean 提交于
      This function returns a boolean denoting whether the VLAN passed as
      argument is part of the 1024-3071 range that the dsa_8021q tagging
      scheme uses.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1f66b0f0
    • R
      net: dsa: provide an option for drivers to always receive bridge VLANs · 54a0ed0d
      Russell King 提交于
      DSA assumes that a bridge which has vlan filtering disabled is not
      vlan aware, and ignores all vlan configuration. However, the kernel
      software bridge code allows configuration in this state.
      
      This causes the kernel's idea of the bridge vlan state and the
      hardware state to disagree, so "bridge vlan show" indicates a correct
      configuration but the hardware lacks all configuration. Even worse,
      enabling vlan filtering on a DSA bridge immediately blocks all traffic
      which, given the output of "bridge vlan show", is very confusing.
      
      Provide an option that drivers can set to indicate they want to receive
      vlan configuration even when vlan filtering is disabled. At the very
      least, this is safe for Marvell DSA bridges, which do not look up
      ingress traffic in the VTU if the port is in 8021Q disabled state. It is
      also safe for the Ocelot switch family. Whether this change is suitable
      for all DSA bridges is not known.
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      54a0ed0d
  10. 12 5月, 2020 2 次提交
  11. 11 5月, 2020 3 次提交
    • V
      net: dsa: sja1105: implement cross-chip bridging operations · ac02a451
      Vladimir Oltean 提交于
      sja1105 uses dsa_8021q for DSA tagging, a format which is VLAN at heart
      and which is compatible with cascading. A complete description of this
      tagging format is in net/dsa/tag_8021q.c, but a quick summary is that
      each external-facing port tags incoming frames with a unique pvid, and
      this special VLAN is transmitted as tagged towards the inside of the
      system, and as untagged towards the exterior. The tag encodes the switch
      id and the source port index.
      
      This means that cross-chip bridging for dsa_8021q only entails adding
      the dsa_8021q pvids of one switch to the RX filter of the other
      switches. Everything else falls naturally into place, as long as the
      bottom-end of ports (the leaves in the tree) is comprised exclusively of
      dsa_8021q-compatible (i.e. sja1105 switches). Otherwise, there would be
      a chance that a front-panel switch transmits a packet tagged with a
      dsa_8021q header, header which it wouldn't be able to remove, and which
      would hence "leak" out.
      
      The only use case I tested (due to lack of board availability) was when
      the sja1105 switches are part of disjoint trees (however, this doesn't
      change the fact that multiple sja1105 switches still need unique switch
      identifiers in such a system). But in principle, even "true" single-tree
      setups (with DSA links) should work just as fine, except for a small
      change which I can't test: dsa_towards_port should be used instead of
      dsa_upstream_port (I made the assumption that the routing port that any
      sja1105 should use towards its neighbours is the CPU port. That might
      not hold true in other setups).
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      ac02a451
    • V
      net: dsa: introduce a dsa_switch_find function · 3b7bc1f0
      Vladimir Oltean 提交于
      Somewhat similar to dsa_tree_find, dsa_switch_find returns a dsa_switch
      structure pointer by searching for its tree index and switch index (the
      parameters from dsa,member). To be used, for example, by drivers who
      implement .crosschip_bridge_join and need a reference to the other
      switch indicated to by the tree_index and sw_index arguments.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      3b7bc1f0
    • V
      net: dsa: permit cross-chip bridging between all trees in the system · f66a6a69
      Vladimir Oltean 提交于
      One way of utilizing DSA is by cascading switches which do not all have
      compatible taggers. Consider the following real-life topology:
      
            +---------------------------------------------------------------+
            | LS1028A                                                       |
            |               +------------------------------+                |
            |               |      DSA master for Felix    |                |
            |               |(internal ENETC port 2: eno2))|                |
            |  +------------+------------------------------+-------------+  |
            |  | Felix embedded L2 switch                                |  |
            |  |                                                         |  |
            |  | +--------------+   +--------------+   +--------------+  |  |
            |  | |DSA master for|   |DSA master for|   |DSA master for|  |  |
            |  | |  SJA1105 1   |   |  SJA1105 2   |   |  SJA1105 3   |  |  |
            |  | |(Felix port 1)|   |(Felix port 2)|   |(Felix port 3)|  |  |
            +--+-+--------------+---+--------------+---+--------------+--+--+
      
      +-----------------------+ +-----------------------+ +-----------------------+
      |   SJA1105 switch 1    | |   SJA1105 switch 2    | |   SJA1105 switch 3    |
      +-----+-----+-----+-----+ +-----+-----+-----+-----+ +-----+-----+-----+-----+
      |sw1p0|sw1p1|sw1p2|sw1p3| |sw2p0|sw2p1|sw2p2|sw2p3| |sw3p0|sw3p1|sw3p2|sw3p3|
      +-----+-----+-----+-----+ +-----+-----+-----+-----+ +-----+-----+-----+-----+
      
      The above can be described in the device tree as follows (obviously not
      complete):
      
      mscc_felix {
      	dsa,member = <0 0>;
      	ports {
      		port@4 {
      			ethernet = <&enetc_port2>;
      		};
      	};
      };
      
      sja1105_switch1 {
      	dsa,member = <1 1>;
      	ports {
      		port@4 {
      			ethernet = <&mscc_felix_port1>;
      		};
      	};
      };
      
      sja1105_switch2 {
      	dsa,member = <2 2>;
      	ports {
      		port@4 {
      			ethernet = <&mscc_felix_port2>;
      		};
      	};
      };
      
      sja1105_switch3 {
      	dsa,member = <3 3>;
      	ports {
      		port@4 {
      			ethernet = <&mscc_felix_port3>;
      		};
      	};
      };
      
      Basically we instantiate one DSA switch tree for every hardware switch
      in the system, but we still give them globally unique switch IDs (will
      come back to that later). Having 3 disjoint switch trees makes the
      tagger drivers "just work", because net devices are registered for the
      3 Felix DSA master ports, and they are also DSA slave ports to the ENETC
      port. So packets received on the ENETC port are stripped of their
      stacked DSA tags one by one.
      
      Currently, hardware bridging between ports on the same sja1105 chip is
      possible, but switching between sja1105 ports on different chips is
      handled by the software bridge. This is fine, but we can do better.
      
      In fact, the dsa_8021q tag used by sja1105 is compatible with cascading.
      In other words, a sja1105 switch can correctly parse and route a packet
      containing a dsa_8021q tag. So if we could enable hardware bridging on
      the Felix DSA master ports, cross-chip bridging could be completely
      offloaded.
      
      Such as system would be used as follows:
      
      ip link add dev br0 type bridge && ip link set dev br0 up
      for port in sw0p0 sw0p1 sw0p2 sw0p3 \
      	    sw1p0 sw1p1 sw1p2 sw1p3 \
      	    sw2p0 sw2p1 sw2p2 sw2p3; do
      	ip link set dev $port master br0
      done
      
      The above makes switching between ports on the same row be performed in
      hardware, and between ports on different rows in software. Now assume
      the Felix switch ports are called swp0, swp1, swp2. By running the
      following extra commands:
      
      ip link add dev br1 type bridge && ip link set dev br1 up
      for port in swp0 swp1 swp2; do
      	ip link set dev $port master br1
      done
      
      the CPU no longer sees packets which traverse sja1105 switch boundaries
      and can be forwarded directly by Felix. The br1 bridge would not be used
      for any sort of traffic termination.
      
      For this to work, we need to give drivers an opportunity to listen for
      bridging events on DSA trees other than their own, and pass that other
      tree index as argument. I have made the assumption, for the moment, that
      the other existing DSA notifiers don't need to be broadcast to other
      trees. That assumption might turn out to be incorrect. But in the
      meantime, introduce a dsa_broadcast function, similar in purpose to
      dsa_port_notify, which is used only by the bridging notifiers.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      f66a6a69
  12. 08 5月, 2020 3 次提交
    • E
      netpoll: accept NULL np argument in netpoll_send_skb() · f78ed220
      Eric Dumazet 提交于
      netpoll_send_skb() callers seem to leak skb if
      the np pointer is NULL. While this should not happen, we
      can make the code more robust.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f78ed220
    • J
      net: remove newlines in NL_SET_ERR_MSG_MOD · c75a33c8
      Jacob Keller 提交于
      The NL_SET_ERR_MSG_MOD macro is used to report a string describing an
      error message to userspace via the netlink extended ACK structure. It
      should not have a trailing newline.
      
      Add a cocci script which catches cases where the newline marker is
      present. Using this script, fix the handful of cases which accidentally
      included a trailing new line.
      
      I couldn't figure out a way to get a patch mode working, so this script
      only implements context, report, and org.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Andy Whitcroft <apw@canonical.com>
      Cc: Joe Perches <joe@perches.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c75a33c8
    • V
      net: dsa: introduce a dsa_port_from_netdev public helper · e1eea811
      Vladimir Oltean 提交于
      As its implementation shows, this is synonimous with calling
      dsa_slave_dev_check followed by dsa_slave_to_port, so it is quite simple
      already and provides functionality which is already there.
      
      However there is now a need for these functions outside dsa_priv.h, for
      example in drivers that perform mirroring and redirection through
      tc-flower offloads (they are given raw access to the flow_cls_offload
      structure), where they need to call this function on act->dev.
      
      But simply exporting dsa_slave_to_port would make it non-inline and
      would result in an extra function call in the hotpath, as can be seen
      for example in sja1105:
      
      Before:
      
      000006dc <sja1105_xmit>:
      {
       6dc:	e92d4ff0 	push	{r4, r5, r6, r7, r8, r9, sl, fp, lr}
       6e0:	e1a04000 	mov	r4, r0
       6e4:	e591958c 	ldr	r9, [r1, #1420]	; 0x58c <- Inline dsa_slave_to_port
       6e8:	e1a05001 	mov	r5, r1
       6ec:	e24dd004 	sub	sp, sp, #4
      	u16 tx_vid = dsa_8021q_tx_vid(dp->ds, dp->index);
       6f0:	e1c901d8 	ldrd	r0, [r9, #24]
       6f4:	ebfffffe 	bl	0 <dsa_8021q_tx_vid>
      			6f4: R_ARM_CALL	dsa_8021q_tx_vid
      	u8 pcp = netdev_txq_to_tc(netdev, queue_mapping);
       6f8:	e1d416b0 	ldrh	r1, [r4, #96]	; 0x60
      	u16 tx_vid = dsa_8021q_tx_vid(dp->ds, dp->index);
       6fc:	e1a08000 	mov	r8, r0
      
      After:
      
      000006e4 <sja1105_xmit>:
      {
       6e4:	e92d4ff0 	push	{r4, r5, r6, r7, r8, r9, sl, fp, lr}
       6e8:	e1a04000 	mov	r4, r0
       6ec:	e24dd004 	sub	sp, sp, #4
      	struct dsa_port *dp = dsa_slave_to_port(netdev);
       6f0:	e1a00001 	mov	r0, r1
      {
       6f4:	e1a05001 	mov	r5, r1
      	struct dsa_port *dp = dsa_slave_to_port(netdev);
       6f8:	ebfffffe 	bl	0 <dsa_slave_to_port>
      			6f8: R_ARM_CALL	dsa_slave_to_port
       6fc:	e1a09000 	mov	r9, r0
      	u16 tx_vid = dsa_8021q_tx_vid(dp->ds, dp->index);
       700:	e1c001d8 	ldrd	r0, [r0, #24]
       704:	ebfffffe 	bl	0 <dsa_8021q_tx_vid>
      			704: R_ARM_CALL	dsa_8021q_tx_vid
      
      Because we want to avoid possible performance regressions, introduce
      this new function which is designed to be public.
      Suggested-by: NVivien Didelot <vivien.didelot@gmail.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NVivien Didelot <vivien.didelot@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e1eea811
  13. 07 5月, 2020 2 次提交
  14. 05 5月, 2020 2 次提交
  15. 25 4月, 2020 1 次提交
  16. 24 4月, 2020 1 次提交
    • A
      net: dsa: add GRO support via gro_cells · e131a563
      Alexander Lobakin 提交于
      gro_cells lib is used by different encapsulating netdevices, such as
      geneve, macsec, vxlan etc. to speed up decapsulated traffic processing.
      CPU tag is a sort of "encapsulation", and we can use the same mechs to
      greatly improve overall DSA performance.
      skbs are passed to the GRO layer after removing CPU tags, so we don't
      need any new packet offload types as it was firstly proposed by me in
      the first GRO-over-DSA variant [1].
      
      The size of struct gro_cells is sizeof(void *), so hot struct
      dsa_slave_priv becomes only 4/8 bytes bigger, and all critical fields
      remain in one 32-byte cacheline.
      The other positive side effect is that drivers for network devices
      that can be shipped as CPU ports of DSA-driven switches can now use
      napi_gro_frags() to pass skbs to kernel. Packets built that way are
      completely non-linear and are likely being dropped without GRO.
      
      This was tested on to-be-mainlined-soon Ethernet driver that uses
      napi_gro_frags(), and the overall performance was on par with the
      variant from [1], sometimes even better due to minimal overhead.
      net.core.gro_normal_batch tuning may help to push it to the limit
      on particular setups and platforms.
      
      iperf3 IPoE VLAN NAT TCP forwarding (port1.218 -> port0) setup
      on 1.2 GHz MIPS board:
      
      5.7-rc2 baseline:
      
      [ID]  Interval         Transfer     Bitrate        Retr
      [ 5]  0.00-120.01 sec  9.00 GBytes  644 Mbits/sec  413  sender
      [ 5]  0.00-120.00 sec  8.99 GBytes  644 Mbits/sec       receiver
      
      Iface      RX packets  TX packets
      eth0       7097731     7097702
      port0      426050      6671829
      port1      6671681     425862
      port1.218  6671677     425851
      
      With this patch:
      
      [ID]  Interval         Transfer     Bitrate        Retr
      [ 5]  0.00-120.01 sec  12.2 GBytes  870 Mbits/sec  122  sender
      [ 5]  0.00-120.00 sec  12.2 GBytes  870 Mbits/sec       receiver
      
      Iface      RX packets  TX packets
      eth0       9474792     9474777
      port0      455200      353288
      port1      9019592     455035
      port1.218  353144      455024
      
      v2:
       - Add some performance examples in the commit message;
       - No functional changes.
      
      [1] https://lore.kernel.org/netdev/20191230143028.27313-1-alobakin@dlink.ru/Signed-off-by: NAlexander Lobakin <bloodyreaper@yandex.ru>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e131a563
  17. 23 4月, 2020 1 次提交
  18. 15 4月, 2020 1 次提交
    • A
      net: dsa: Down cpu/dsa ports phylink will control · 3be98b2d
      Andrew Lunn 提交于
      DSA and CPU ports can be configured in two ways. By default, the
      driver should configure such ports to there maximum bandwidth. For
      most use cases, this is sufficient. When this default is insufficient,
      a phylink instance can be bound to such ports, and phylink will
      configure the port, e.g. based on fixed-link properties. phylink
      assumes the port is initially down. Given that the driver should have
      already configured it to its maximum speed, ask the driver to down
      the port before instantiating the phylink instance.
      
      Fixes: 30c4a5b0 ("net: mv88e6xxx: use resolved link config in mac_link_up()")
      Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3be98b2d
  19. 02 4月, 2020 1 次提交
  20. 01 4月, 2020 1 次提交
    • R
      net: dsa: fix oops while probing Marvell DSA switches · 765bda93
      Russell King 提交于
      Fix an oops in dsa_port_phylink_mac_change() caused by a combination
      of a20f9970 ("net: dsa: Don't instantiate phylink for CPU/DSA
      ports unless needed") and the net-dsa-improve-serdes-integration
      series of patches 65b7a2c8 ("Merge branch
      'net-dsa-improve-serdes-integration'").
      
      Unable to handle kernel NULL pointer dereference at virtual address 00000124
      pgd = c0004000
      [00000124] *pgd=00000000
      Internal error: Oops: 805 [#1] SMP ARM
      Modules linked in: tag_edsa spi_nor mtd xhci_plat_hcd mv88e6xxx(+) xhci_hcd armada_thermal marvell_cesa dsa_core ehci_orion libdes phy_armada38x_comphy at24 mcp3021 sfp evbug spi_orion sff mdio_i2c
      CPU: 1 PID: 214 Comm: irq/55-mv88e6xx Not tainted 5.6.0+ #470
      Hardware name: Marvell Armada 380/385 (Device Tree)
      PC is at phylink_mac_change+0x10/0x88
      LR is at mv88e6352_serdes_irq_status+0x74/0x94 [mv88e6xxx]
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      Reviewed-by: NVivien Didelot <vivien.didelot@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      765bda93
  21. 31 3月, 2020 3 次提交
  22. 28 3月, 2020 2 次提交
    • V
      net: dsa: implement auto-normalization of MTU for bridge hardware datapath · bff33f7e
      Vladimir Oltean 提交于
      Many switches don't have an explicit knob for configuring the MTU
      (maximum transmission unit per interface).  Instead, they do the
      length-based packet admission checks on the ingress interface, for
      reasons that are easy to understand (why would you accept a packet in
      the queuing subsystem if you know you're going to drop it anyway).
      
      So it is actually the MRU that these switches permit configuring.
      
      In Linux there only exists the IFLA_MTU netlink attribute and the
      associated dev_set_mtu function. The comments like to play blind and say
      that it's changing the "maximum transfer unit", which is to say that
      there isn't any directionality in the meaning of the MTU word. So that
      is the interpretation that this patch is giving to things: MTU == MRU.
      
      When 2 interfaces having different MTUs are bridged, the bridge driver
      MTU auto-adjustment logic kicks in: what br_mtu_auto_adjust() does is it
      adjusts the MTU of the bridge net device itself (and not that of the
      slave net devices) to the minimum value of all slave interfaces, in
      order for forwarded packets to not exceed the MTU regardless of the
      interface they are received and send on.
      
      The idea behind this behavior, and why the slave MTUs are not adjusted,
      is that normal termination from Linux over the L2 forwarding domain
      should happen over the bridge net device, which _is_ properly limited by
      the minimum MTU. And termination over individual slave devices is
      possible even if those are bridged. But that is not "forwarding", so
      there's no reason to do normalization there, since only a single
      interface sees that packet.
      
      The problem with those switches that can only control the MRU is with
      the offloaded data path, where a packet received on an interface with
      MRU 9000 would still be forwarded to an interface with MRU 1500. And the
      br_mtu_auto_adjust() function does not really help, since the MTU
      configured on the bridge net device is ignored.
      
      In order to enforce the de-facto MTU == MRU rule for these switches, we
      need to do MTU normalization, which means: in order for no packet larger
      than the MTU configured on this port to be sent, then we need to limit
      the MRU on all ports that this packet could possibly come from. AKA
      since we are configuring the MRU via MTU, it means that all ports within
      a bridge forwarding domain should have the same MTU.
      
      And that is exactly what this patch is trying to do.
      
      >From an implementation perspective, we try to follow the intent of the
      user, otherwise there is a risk that we might livelock them (they try to
      change the MTU on an already-bridged interface, but we just keep
      changing it back in an attempt to keep the MTU normalized). So the MTU
      that the bridge is normalized to is either:
      
       - The most recently changed one:
      
         ip link set dev swp0 master br0
         ip link set dev swp1 master br0
         ip link set dev swp0 mtu 1400
      
         This sequence will make swp1 inherit MTU 1400 from swp0.
      
       - The one of the most recently added interface to the bridge:
      
         ip link set dev swp0 master br0
         ip link set dev swp1 mtu 1400
         ip link set dev swp1 master br0
      
         The above sequence will make swp0 inherit MTU 1400 as well.
      Suggested-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bff33f7e
    • V
      net: dsa: configure the MTU for switch ports · bfcb8132
      Vladimir Oltean 提交于
      It is useful be able to configure port policers on a switch to accept
      frames of various sizes:
      
      - Increase the MTU for better throughput from the default of 1500 if it
        is known that there is no 10/100 Mbps device in the network.
      - Decrease the MTU to limit the latency of high-priority frames under
        congestion, or work around various network segments that add extra
        headers to packets which can't be fragmented.
      
      For DSA slave ports, this is mostly a pass-through callback, called
      through the regular ndo ops and at probe time (to ensure consistency
      across all supported switches).
      
      The CPU port is called with an MTU equal to the largest configured MTU
      of the slave ports. The assumption is that the user might want to
      sustain a bidirectional conversation with a partner over any switch
      port.
      
      The DSA master is configured the same as the CPU port, plus the tagger
      overhead. Since the MTU is by definition L2 payload (sans Ethernet
      header), it is up to each individual driver to figure out if it needs to
      do anything special for its frame tags on the CPU port (it shouldn't
      except in special cases). So the MTU does not contain the tagger
      overhead on the CPU port.
      However the MTU of the DSA master, minus the tagger overhead, is used as
      a proxy for the MTU of the CPU port, which does not have a net device.
      This is to avoid uselessly calling the .change_mtu function on the CPU
      port when nothing should change.
      
      So it is safe to assume that the DSA master and the CPU port MTUs are
      apart by exactly the tagger's overhead in bytes.
      
      Some changes were made around dsa_master_set_mtu(), function which was
      now removed, for 2 reasons:
        - dev_set_mtu() already calls dev_validate_mtu(), so it's redundant to
          do the same thing in DSA
        - __dev_set_mtu() returns 0 if ops->ndo_change_mtu is an absent method
      That is to say, there's no need for this function in DSA, we can safely
      call dev_set_mtu() directly, take the rtnl lock when necessary, and just
      propagate whatever errors get reported (since the user probably wants to
      be informed).
      
      Some inspiration (mainly in the MTU DSA notifier) was taken from a
      vaguely similar patch from Murali and Florian, who are credited as
      co-developers down below.
      Co-developed-by: NMurali Krishna Policharla <murali.policharla@broadcom.com>
      Signed-off-by: NMurali Krishna Policharla <murali.policharla@broadcom.com>
      Co-developed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bfcb8132
  23. 25 3月, 2020 1 次提交
    • V
      net: dsa: tag_8021q: replace dsa_8021q_remove_header with __skb_vlan_pop · e80f40cb
      Vladimir Oltean 提交于
      Not only did this wheel did not need reinventing, but there is also
      an issue with it: It doesn't remove the VLAN header in a way that
      preserves the L2 payload checksum when that is being provided by the DSA
      master hw.  It should recalculate checksum both for the push, before
      removing the header, and for the pull afterwards. But the current
      implementation is quite dizzying, with pulls followed immediately
      afterwards by pushes, the memmove is done before the push, etc.  This
      makes a DSA master with RX checksumming offload to print stack traces
      with the infamous 'hw csum failure' message.
      
      So remove the dsa_8021q_remove_header function and replace it with
      something that actually works with inet checksumming.
      
      Fixes: d4619336 ("net: dsa: tag_8021q: Create helper function for removing VLAN header")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e80f40cb
  24. 24 3月, 2020 1 次提交