1. 19 Oct, 2021 3 commits
  2. 18 Oct, 2021 2 commits
    • net: sched: Remove Qdisc::running sequence counter · 29cbcd85
      Committed by Ahmed S. Darwish
      The Qdisc::running sequence counter has two uses:
      
        1. Reliably reading qdisc's tc statistics while the qdisc is running
           (a seqcount read/retry loop at gnet_stats_add_basic()).
      
        2. As a flag, indicating whether the qdisc in question is running
           (without any retry loops).
      
      For the first usage, the Qdisc::running sequence counter write section,
      qdisc_run_begin() => qdisc_run_end(), covers a much wider area than what
      is actually needed: the raw qdisc's bstats update. A u64_stats sync
      point was thus introduced (in previous commits) inside the bstats
      structure itself. A local u64_stats write section is then started and
      stopped for the bstats updates.
      
      Use that u64_stats sync point mechanism for the bstats read/retry loop
      at gnet_stats_add_basic().
      
      For the second qdisc->running usage, a __QDISC_STATE_RUNNING bit flag,
      accessed with atomic bitops, is sufficient. Using a bit flag instead of
      a sequence counter at qdisc_run_begin/end() and qdisc_is_running() leads
      to the SMP barriers implicitly added through raw_read_seqcount() and
      write_seqcount_begin/end() getting removed. All call sites have been
      surveyed though, and no required ordering was identified.
      
      Now that the qdisc->running sequence counter is no longer used, remove
      it.
      
      Note, using u64_stats implies no sequence counter protection for 64-bit
      architectures. This can lead to the qdisc tc statistics "packets" vs.
      "bytes" values getting out of sync on rare occasions. The individual
      values will still be valid.
      Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
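The flag-based "is the qdisc running" state described in this commit can be sketched in userspace C11. This is only an illustration, not the kernel code: `sketch_qdisc`, `SKETCH_STATE_RUNNING` and the `sketch_*` helpers are hypothetical names, and C11 atomics stand in for the kernel's atomic bitops.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the __QDISC_STATE_RUNNING bit. */
#define SKETCH_STATE_RUNNING (1UL << 0)

struct sketch_qdisc {
	atomic_ulong state;
};

/* Try to become the runner: succeeds only if the bit was previously clear. */
static bool sketch_run_begin(struct sketch_qdisc *q)
{
	unsigned long old = atomic_fetch_or(&q->state, SKETCH_STATE_RUNNING);

	return !(old & SKETCH_STATE_RUNNING);
}

/* Clear the running bit so that another caller may start running. */
static void sketch_run_end(struct sketch_qdisc *q)
{
	atomic_fetch_and(&q->state, ~SKETCH_STATE_RUNNING);
}

/* Plain flag check, with no retry loop involved. */
static bool sketch_is_running(struct sketch_qdisc *q)
{
	return (atomic_load(&q->state) & SKETCH_STATE_RUNNING) != 0;
}
```

A second `sketch_run_begin()` call fails until `sketch_run_end()` has cleared the bit, which is all the "flag" usage needs. The default C11 operations used here are sequentially consistent, so this sketch says nothing about which barriers the kernel call sites actually require.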
    • u64_stats: Introduce u64_stats_set() · f2efdb17
      Committed by Ahmed S. Darwish
      Allow a u64_stats_t value to be set directly. This is used to provide an
      init function that sets the value to zero instead of calling memset() on it.
      
      Add u64_stats_set() to the u64_stats API.
      
      [bigeasy: commit message. ]
      Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
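As a rough userspace sketch of the idea, a setter lets an init function assign zero through the counter API rather than clearing the whole structure with memset(). The type and helpers below (`sketch_u64_stats_t`, `sketch_bstats`, `sketch_*`) are hypothetical stand-ins and do not reproduce the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for an opaque 64-bit statistics counter. */
typedef struct {
	uint64_t v;
} sketch_u64_stats_t;

/* Assign a value directly (the operation this commit adds to the API). */
static inline void sketch_u64_stats_set(sketch_u64_stats_t *p, uint64_t val)
{
	p->v = val;
}

static inline void sketch_u64_stats_add(sketch_u64_stats_t *p, uint64_t val)
{
	p->v += val;
}

static inline uint64_t sketch_u64_stats_read(const sketch_u64_stats_t *p)
{
	return p->v;
}

/* Hypothetical pair of counters, loosely modelled on byte/packet statistics. */
struct sketch_bstats {
	sketch_u64_stats_t bytes;
	sketch_u64_stats_t packets;
};

/* Init through the accessor API instead of memset() on the struct. */
static inline void sketch_bstats_init(struct sketch_bstats *b)
{
	sketch_u64_stats_set(&b->bytes, 0);
	sketch_u64_stats_set(&b->packets, 0);
}
```

Going through a setter keeps callers independent of how the counter is represented internally, which is the point of having an opaque counter type.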
  3. 16 Oct, 2021 5 commits
  4. 15 Oct, 2021 12 commits
    • page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA · d00e60ee
      Committed by Yunsheng Lin
      32-bit architectures with 64-bit DMA seem to be rare these days,
      and page pool might carry a lot of code and complexity for
      systems that are possibly rare.
      
      So disable DMA mapping support for such systems. If drivers
      really want to work on such systems, they have to implement
      their own DMA-mapping fallback tracking outside page_pool.
      Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: of: fix stub of_net helpers for CONFIG_NET=n · 8b017fbe
      Committed by Arnd Bergmann
      Moving the of_net code from drivers/of/ to net/core means we
      no longer stub out the helpers when networking is disabled,
      which leads to a randconfig build failure with at least one
      ARM platform that calls this from non-networking code:
      
      arm-linux-gnueabi-ld: arch/arm/mach-mvebu/kirkwood.o: in function `kirkwood_dt_eth_fixup':
      kirkwood.c:(.init.text+0x54): undefined reference to `of_get_mac_address'
      
      Restore the way this worked before by changing that #ifdef
      check back to testing for both CONFIG_OF and CONFIG_NET.
      
      Fixes: e330fb14 ("of: net: move of_net under net/")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Link: https://lore.kernel.org/r/20211014090055.2058949-1-arnd@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
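The stubbing pattern this fix restores can be sketched as follows. The `SKETCH_CONFIG_*` macros and `sketch_of_get_mac_address()` are hypothetical stand-ins for the real configuration symbols and helper, and the `-ENODEV` return value is an assumption made for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_device_node;

/*
 * Neither SKETCH_CONFIG_OF nor SKETCH_CONFIG_NET is defined in this sketch,
 * so the stub branch is the one that gets compiled.
 */
#if defined(SKETCH_CONFIG_OF) && defined(SKETCH_CONFIG_NET)
/* Real implementation lives in networking code and is only declared here. */
int sketch_of_get_mac_address(struct sketch_device_node *np, uint8_t *mac);
#else
/* Stub: callers in non-networking code still link, and get an error back. */
static inline int sketch_of_get_mac_address(struct sketch_device_node *np,
					    uint8_t *mac)
{
	(void)np;
	(void)mac;
	return -ENODEV;
}
#endif
```

Testing both symbols matters because the implementation is only built when networking is enabled. Checking a single symbol would leave callers with a declaration but no definition, which matches the linker error quoted in the commit message.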
    • netfilter: ebtables: allow use of ebt_do_table as hookfn · f0d6764f
      Committed by Florian Westphal
      This is possible now that the xt_table structure is passed via *priv.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: ip6tables: allow use of ip6t_do_table as hookfn · 44b5990e
      Committed by Florian Westphal
      This is possible now that the xt_table structure is passed via *priv.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: arp_tables: allow use of arpt_do_table as hookfn · e8d225b6
      Committed by Florian Westphal
      This is possible now that the xt_table structure is passed in via *priv.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: iptables: allow use of ipt_do_table as hookfn · 8844e010
      Committed by Florian Westphal
      This is possible now that the xt_table structure is passed in via *priv.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: Introduce egress hook · 42df6e1d
      Committed by Lukas Wunner
      Support classifying packets with netfilter on egress to satisfy user
      requirements such as:
      * outbound security policies for containers (Laura)
      * filtering and mangling intra-node Direct Server Return (DSR) traffic
        on a load balancer (Laura)
      * filtering locally generated traffic coming in through AF_PACKET,
        such as local ARP traffic generated for clustering purposes or DHCP
        (Laura; the AF_PACKET plumbing is contained in a follow-up commit)
      * L2 filtering from ingress and egress for AVB (Audio Video Bridging)
        and gPTP with nftables (Pablo)
      * in the future: in-kernel NAT64/NAT46 (Pablo)
      
      The egress hook introduced herein complements the ingress hook added by
      commit e687ad60 ("netfilter: add netfilter ingress hook after
      handle_ing() under unique static key").  A patch for nftables to hook up
      egress rules from user space has been submitted separately, so users may
      immediately take advantage of the feature.
      
      Alternatively or in addition to netfilter, packets can be classified
      with traffic control (tc).  On ingress, packets are classified first by
      tc, then by netfilter.  On egress, the order is reversed for symmetry.
      Conceptually, tc and netfilter can be thought of as layers, with
      netfilter layered above tc.
      
      Traffic control is capable of redirecting packets to another interface
      (man 8 tc-mirred).  E.g., an ingress packet may be redirected from the
      host namespace to a container via a veth connection:
      tc ingress (host) -> tc egress (veth host) -> tc ingress (veth container)
      
      In this case, netfilter egress classifying is not performed when leaving
      the host namespace!  That's because the packet is still on the tc layer.
      If tc redirects the packet to a physical interface in the host namespace
      such that it leaves the system, the packet is never subjected to
      netfilter egress classifying.  That is only logical since it hasn't
      passed through netfilter ingress classifying either.
      
      Packets can alternatively be redirected at the netfilter layer using
      nft fwd.  Such a packet *is* subjected to netfilter egress classifying
      since it has reached the netfilter layer.
      
      Internally, the skb->nf_skip_egress flag controls whether netfilter is
      invoked on egress by __dev_queue_xmit().  Because __dev_queue_xmit() may
      be called recursively by tunnel drivers such as vxlan, the flag is
      reverted to false after sch_handle_egress().  This ensures that
      netfilter is applied both on the overlay and underlying network.
      
      Interaction between tc and netfilter is possible by setting and querying
      skb->mark.
      
      If netfilter egress classifying is not enabled on any interface, it is
      patched out of the data path by way of a static_key and doesn't make a
      performance difference that is discernible from noise:
      
      Before:             1537 1538 1538 1537 1538 1537 Mb/sec
      After:              1536 1534 1539 1539 1539 1540 Mb/sec
      Before + tc accept: 1418 1418 1418 1419 1419 1418 Mb/sec
      After  + tc accept: 1419 1424 1418 1419 1422 1420 Mb/sec
      Before + tc drop:   1620 1619 1619 1619 1620 1620 Mb/sec
      After  + tc drop:   1616 1624 1625 1624 1622 1619 Mb/sec
      
      When netfilter egress classifying is enabled on at least one interface,
      a minimal performance penalty is incurred for every egress packet, even
      if the interface it's transmitted over doesn't have any netfilter egress
      rules configured.  That is caused by checking dev->nf_hooks_egress
      against NULL.
      
      Measurements were performed on a Core i7-3615QM.  Commands to reproduce:
      ip link add dev foo type dummy
      ip link set dev foo up
      modprobe pktgen
      echo "add_device foo" > /proc/net/pktgen/kpktgend_3
      samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -i foo -n 400000000 -m "11:11:11:11:11:11" -d 1.1.1.1
      
      Accept all traffic with tc:
      tc qdisc add dev foo clsact
      tc filter add dev foo egress bpf da bytecode '1,6 0 0 0,'
      
      Drop all traffic with tc:
      tc qdisc add dev foo clsact
      tc filter add dev foo egress bpf da bytecode '1,6 0 0 2,'
      
      Apply this patch when measuring packet drops to avoid errors in dmesg:
      https://lore.kernel.org/netdev/a73dda33-57f4-95d8-ea51-ed483abd6a7a@iogearbox.net/
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Cc: Laura García Liébana <nevola@gmail.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
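The interplay between the egress hook, tc redirection and the skb->nf_skip_egress flag can be modelled with a small userspace sketch. It is a simplification under stated assumptions: all `sketch_*` names are hypothetical, the redirect is modelled as a nested transmit from tc egress, and the real `__dev_queue_xmit()` does considerably more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_skb {
	bool nf_skip_egress;	/* mirrors the flag named in the commit message */
	int nf_egress_runs;	/* how often the netfilter egress hook ran */
};

struct sketch_dev {
	bool has_nf_egress_hooks;	 /* stand-in for dev->nf_hooks_egress != NULL */
	struct sketch_dev *tc_redirect; /* tc redirect target, or NULL */
};

static void sketch_dev_queue_xmit(struct sketch_dev *dev, struct sketch_skb *skb);

/* Netfilter egress: skipped while the packet is still on the tc layer. */
static void sketch_nf_hook_egress(struct sketch_dev *dev, struct sketch_skb *skb)
{
	if (skb->nf_skip_egress)
		return;
	if (dev->has_nf_egress_hooks)
		skb->nf_egress_runs++;
}

/* tc egress: may redirect the packet to another device. */
static void sketch_sch_handle_egress(struct sketch_dev *dev, struct sketch_skb *skb)
{
	if (dev->tc_redirect)
		sketch_dev_queue_xmit(dev->tc_redirect, skb);
}

/* On egress, netfilter runs first and tc second; the flag is reset afterwards. */
static void sketch_dev_queue_xmit(struct sketch_dev *dev, struct sketch_skb *skb)
{
	sketch_nf_hook_egress(dev, skb);
	skb->nf_skip_egress = true;
	sketch_sch_handle_egress(dev, skb);
	skb->nf_skip_egress = false;
}
```

In this model a packet redirected by tc reaches the second device with the flag still set, so netfilter egress is not applied there. A packet transmitted directly on that device is classified, because the flag is back to false once tc handling has finished.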
    • netfilter: Generalize ingress hook include file · 17d20784
      Committed by Lukas Wunner
      Prepare for addition of a netfilter egress hook by generalizing the
      ingress hook include file.
      
      No functional change intended.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: Rename ingress hook include file · 7463acfb
      Committed by Lukas Wunner
      Prepare for addition of a netfilter egress hook by renaming
      <linux/netfilter_ingress.h> to <linux/netfilter_netdev.h>.
      
      The egress hook also necessitates a refactoring of the include file,
      but that is done in a separate commit to ease reviewing.
      
      No functional change intended.
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • ethernet: remove random_ether_addr() · ba530fea
      Committed by Jakub Kicinski
      random_ether_addr() was the original name of the helper which
      was kept for backward compatibility (?) after the rename in
      commit 0a4dd594 ("etherdevice: Rename random_ether_addr
      to eth_random_addr").
      
      We have a single random_ether_addr() caller left in tree
      while there are 70 callers of eth_random_addr().
      Time to drop this define.
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20211013205450.328092-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ethernet: make eth_hw_addr_random() use dev_addr_set() · 54f2d8d6
      Committed by Jakub Kicinski
      Commit 406f42fa ("net-next: When a bond have a massive amount
      of VLANs...") introduced an rbtree for faster Ethernet address
      lookup. To maintain netdev->dev_addr in this tree we need to make all
      the writes to it go through appropriate helpers.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
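A minimal userspace sketch of the pattern: generate a random address in a local buffer, then store it through one setter so every write to the device address passes a single point. The `sketch_*` names and `SKETCH_ETH_ALEN` are hypothetical, and `rand()` merely stands in for a proper random source.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_ETH_ALEN 6

struct sketch_netdev {
	uint8_t dev_addr[SKETCH_ETH_ALEN];
};

/* Fill addr with a random unicast, locally administered MAC address. */
static void sketch_eth_random_addr(uint8_t *addr)
{
	for (int i = 0; i < SKETCH_ETH_ALEN; i++)
		addr[i] = (uint8_t)(rand() & 0xff);
	addr[0] &= 0xfe;	/* clear the multicast bit */
	addr[0] |= 0x02;	/* set the locally administered bit */
}

/* Single write path for dev_addr; a lookup structure could be updated here. */
static void sketch_dev_addr_set(struct sketch_netdev *dev, const uint8_t *addr)
{
	memcpy(dev->dev_addr, addr, SKETCH_ETH_ALEN);
}

/* Build the address in a local buffer, then store it via the setter. */
static void sketch_eth_hw_addr_random(struct sketch_netdev *dev)
{
	uint8_t addr[SKETCH_ETH_ALEN];

	sketch_eth_random_addr(addr);
	sketch_dev_addr_set(dev, addr);
}
```

Writing into a temporary buffer first means the device structure is never modified behind the setter's back, which is what allows an address index to stay consistent.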
    • ethernet: constify references to netdev->dev_addr in drivers · 76660757
      Committed by Jakub Kicinski
      This big patch sprinkles const on local variables and
      function arguments which may refer to netdev->dev_addr.
      
      Commit 406f42fa ("net-next: When a bond have a massive amount
      of VLANs...") introduced an rbtree for faster Ethernet address
      lookup. To maintain netdev->dev_addr in this tree we need to make all
      the writes to it go through appropriate helpers.
      
      Some of the changes here are not strictly required - const
      is sometimes cast off but pointer is not used for writing.
      It seems like it's still better to add the const in case
      the code changes later or relevant -W flags get enabled
      for the build.
      
      No functional changes.
      
      Link: https://lore.kernel.org/r/20211014142432.449314-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  5. 14 Oct, 2021 3 commits
  6. 13 Oct, 2021 5 commits
    • net: dsa: tag_ocelot_8021q: break circular dependency with ocelot switch lib · 49f885b2
      Committed by Vladimir Oltean
      Michael reported that when using the "ocelot-8021q" tagging protocol,
      the switch driver module must be manually loaded before the tagging
      protocol can be loaded/is available.
      
      This appears to be the same problem described here:
      https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
      where due to the fact that DSA tagging protocols make use of symbols
      exported by the switch drivers, circular dependencies appear and this
      breaks module autoloading.
      
      The ocelot_8021q driver needs the ocelot_can_inject() and
      ocelot_port_inject_frame() functions from the switch library. Previously
      the wrong approach was taken to solve that dependency: shims were
      provided for the case where the ocelot switch library was compiled out,
      but that turns out to be insufficient, because the dependency when the
      switch lib _is_ compiled is problematic too.
      
      We cannot declare ocelot_can_inject() and ocelot_port_inject_frame() as
      static inline functions, because these access I/O functions like
      __ocelot_write_ix() which is called by ocelot_write_rix(). Making those
      static inline basically means exposing the whole guts of the ocelot
      switch library, not ideal...
      
      We already have one tagging protocol driver which calls into the switch
      driver during xmit but not using any exported symbol: sja1105_defer_xmit.
      We can do the same thing here: create a kthread worker and one work item
      per skb, and let the switch driver itself do the register accesses to
      send the skb, and then consume it.
      
      Fixes: 0a6f17c6 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping")
      Reported-by: Michael Walle <michael@walle.cc>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: tag_ocelot: break circular dependency with ocelot switch lib driver · deab6b1c
      Committed by Vladimir Oltean
      As explained here:
      https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
      DSA tagging protocol drivers cannot depend on symbols exported by switch
      drivers, because this creates a circular dependency that breaks module
      autoloading.
      
      The tag_ocelot.c file depends on the ocelot_ptp_rew_op() function
      exported by the common ocelot switch lib. This function looks at
      OCELOT_SKB_CB(skb) and computes how to populate the REW_OP field of the
      DSA tag, for PTP timestamping (the command: one-step/two-step, and the
      TX timestamp identifier).
      
      None of that requires deep insight into the driver, it is quite
      stateless, as it only depends upon the skb->cb. So let's make it a
      static inline function and put it in include/linux/dsa/ocelot.h, a
      file that despite its name is used by the ocelot switch driver for
      populating the injection header too - since commit 40d3f295 ("net:
      mscc: ocelot: use common tag parsing code with DSA").
      
      With that function declared as static inline, its body is expanded
      inside each call site, so the dependency is broken and the DSA tagger
      can be built without the switch library, upon which the felix driver
      depends.
      
      Fixes: 39e5308b ("net: mscc: ocelot: support PTP Sync one-step timestamping")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: sja1105: break dependency between dsa_port_is_sja1105 and switch driver · 4ac0567e
      Committed by Vladimir Oltean
      It's nice to be able to test a tagging protocol with dsa_loop, but not
      at the cost of losing the ability of building the tagging protocol and
      switch driver as modules, because as things stand, there is a circular
      dependency between the two. Tagging protocol drivers cannot depend on
      switch drivers, that is a hard fact.
      
      The reasoning behind the blamed patch was that accessing dp->priv should
      first make sure that the structure behind that pointer is what we really
      think it is.
      
      Currently the "sja1105" and "sja1110" tagging protocols only operate
      with the sja1105 switch driver, just like any other tagging protocol and
      switch combination. The only way to mix and match them is by modifying
      the code, and this applies to dsa_loop as well (by default that uses
      DSA_TAG_PROTO_NONE). So while in principle there is an issue, in
      practice there isn't one.
      
      Until we extend dsa_loop to allow user space configuration, treat the
      problem as a non-issue and just say that DSA ports found by tag_sja1105
      are always sja1105 ports, which is in fact true. But keep the
      dsa_port_is_sja1105 function so that it's easy to patch it during
      testing, and rely on dead code elimination.
      
      Fixes: 994d2cbb ("net: dsa: tag_sja1105: be dsa_loop-safe")
      Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: move sja1110_process_meta_tstamp inside the tagging protocol driver · 28da0555
      Committed by Vladimir Oltean
      The problem is that DSA tagging protocols really must not depend on the
      switch driver, because this creates a circular dependency at insmod
      time, and the switch driver will effectively not load when the tagging
      protocol driver is missing.
      
      The code was structured in the way it was for a reason, though. The DSA
      driver-facing API for PTP timestamping relies on the assumption that
      two-step TX timestamps are provided by the hardware in an out-of-band
      manner, typically by raising an interrupt and making that timestamp
      available inside some sort of FIFO which is to be accessed over
      SPI/MDIO/etc.
      
      So the API puts .port_txtstamp into dsa_switch_ops, because it is
      expected that the switch driver needs to save some state (like put the
      skb into a queue until its TX timestamp arrives).
      
      On SJA1110, TX timestamps are provided by the switch as Ethernet
      packets, so this makes them be received and processed by the tagging
      protocol driver. This in itself is great, because the timestamps are
      full 64-bit and do not require reconstruction, and since Ethernet is the
      fastest I/O method available to/from the switch, PTP timestamps arrive
      very quickly, no matter how bottlenecked the SPI connection is, because
      SPI interaction is not needed at all.
      
      DSA's code structure and strict isolation between the tagging protocol
      driver and the switch driver break the natural code organization.
      
      When the tagging protocol driver receives a packet which is classified
      as a metadata packet containing timestamps, it passes those timestamps
      one by one to the switch driver, which then proceeds to compare them
      based on the recorded timestamp ID that was generated in .port_txtstamp.
      
      The communication between the tagging protocol and the switch driver is
      done through a method exported by the switch driver, sja1110_process_meta_tstamp.
      To satisfy build requirements, we force a dependency to build the
      tagging protocol driver as a module when the switch driver is a module.
      However, as explained in the first paragraph, that causes the circular
      dependency.
      
      To solve this, move the skb queue from struct sja1105_private :: struct
      sja1105_ptp_data to struct sja1105_private :: struct sja1105_tagger_data.
      The latter is a data structure for which hacks have already been put
      into place to be able to create persistent storage per switch that is
      accessible from the tagging protocol driver (see sja1105_setup_ports).
      
      With the skb queue directly accessible from the tagging protocol driver,
      we can now move sja1110_process_meta_tstamp into the tagging driver
      itself, and avoid exporting a symbol.
      
      Fixes: 566b18c8 ("net: dsa: sja1105: implement TX timestamping for SJA1110")
      Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/mlx5e: Mutually exclude RX-FCS and RX-port-timestamp · 0bc73ad4
      Committed by Aya Levin
      Due to current HW arch limitations, RX-FCS (scattering FCS frame field
      to software) and RX-port-timestamp (improved timestamp accuracy on the
      receive side) can't work together.
      RX-port-timestamp is not controlled by the user and it is enabled by
      default when supported by the HW/FW.
      This patch sets RX-port-timestamp opposite to RX-FCS configuration.
      
      Fixes: 102722fc ("net/mlx5e: Add support for RXFCS feature flag")
      Signed-off-by: Aya Levin <ayal@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
  7. 10 Oct, 2021 1 commit
  8. 09 Oct, 2021 2 commits
    • net: dsa: mv88e6xxx: isolate the ATU databases of standalone and bridged ports · 5bded825
      Committed by Vladimir Oltean
      Similar to commit 6087175b ("net: dsa: mt7530: use independent VLAN
      learning on VLAN-unaware bridges"), software forwarding between an
      unoffloaded LAG port (a bonding interface with an unsupported policy)
      and a mv88e6xxx user port directly under a bridge is broken.
      
      We adopt the same strategy, which is to make the standalone ports not
      find any ATU entry learned on a bridge port.
      
      Theory: the mv88e6xxx ATU is looked up by FID and MAC address. There are
      as many FIDs as VIDs (4096). The FID is derived from the VID when
      possible (the VTU maps a VID to a FID), with a fallback to the port
      based default FID value when not (802.1Q Mode is disabled on the port,
      or the classified VID isn't present in the VTU).
      
      The mv88e6xxx driver makes the following use of FIDs and VIDs:
      
      - the port's DefaultVID (to which untagged & pvid-tagged packets get
        classified) is 0 and is absent from the VTU, so this kind of packets is
        processed in FID 0, the default FID assigned by mv88e6xxx_setup_port.
      
      - every time a bridge VLAN is created, mv88e6xxx_port_vlan_join() ->
        mv88e6xxx_atu_new() associates a FID with that VID which increases
        linearly starting from 1. Like this:
      
        bridge vlan add dev lan0 vid 100 # FID 1
        bridge vlan add dev lan1 vid 100 # still FID 1
        bridge vlan add dev lan2 vid 1024 # FID 2
      
      The FID allocation made by the driver is sub-optimal for the following
      reasons:
      
      (a) A standalone port has a DefaultPVID of 0 and a default FID of 0 too.
          A VLAN-unaware bridged port has a DefaultPVID of 0 and a default FID
          of 0 too. The difference is that the bridged ports may learn ATU
          entries, while the standalone port has the requirement that it must
          not, and must not find them either. Standalone ports must not use
          the same FID as ports belonging to a bridge. All standalone ports
          can use the same FID, since the ATU will never have an entry in
          that FID.
      
      (b) Multiple VLAN-unaware bridges will all use a DefaultPVID of 0 and a
          default FID of 0 on all their ports. The FDBs will not be isolated
          between these bridges. Every VLAN-unaware bridge must use the same
          FID on all its ports, different from the FID of other bridge ports.
      
      (c) Each bridge VLAN uses a unique FID which is useful for Independent
          VLAN Learning, but the same VLAN ID on multiple VLAN-aware bridges
          will result in the same FID being used by mv88e6xxx_atu_new().
          The correct behavior is for VLAN 1 in br0 to have a different FID
          compared to VLAN 1 in br1.
      
      This patch cannot fix all the above. Traditionally the DSA framework did
      not care about this, and the reality is that DSA core involvement is
      needed for the aforementioned issues to be solved. The only thing we can
      solve here is an issue which does not require API changes, and that is
      issue (a), aka use a different FID for standalone ports vs ports under
      VLAN-unaware bridges.
      
      The first step is deciding what VID and FID to use for standalone ports,
      and what VID and FID for bridged ports. The 0/0 pair for standalone
      ports is what they used up till now, let's keep using that. For bridged
      ports, there are 2 cases:
      
      - VLAN-aware ports will never end up using the port default FID, because
        packets will always be classified to a VID in the VTU or dropped
        otherwise. The FID is the one associated with the VID in the VTU.
      
      - On VLAN-unaware ports, we _could_ leave their DefaultVID (pvid) at
        zero (just as in the case of standalone ports), and just change the
        port's default FID from 0 to a different number (say 1).
      
      However, Tobias points out that there is one more requirement to cater to:
      cross-chip bridging. The Marvell DSA header does not carry the FID in
      it, only the VID. So once a packet crosses a DSA link, if it has a VID
      of zero it will get classified to the default FID of that cascade port.
      Relying on a port default FID for upstream cascade ports results in
      contradictions: a default FID of 0 breaks ATU isolation of bridged ports
      on the downstream switch, a default FID of 1 breaks standalone ports on
      the downstream switch.
      
      So not only must standalone ports have different FIDs compared to
      bridged ports, they must also have different DefaultVID values.
      IEEE 802.1Q defines two reserved VID values: 0 and 4095. So we simply
      choose 4095 as the DefaultVID of ports belonging to VLAN-unaware
      bridges, and VID 4095 maps to FID 1.
      
      For the xmit operation to look up the same ATU database, we need to put
      VID 4095 in DSA tags sent to ports belonging to VLAN-unaware bridges
      too. All shared ports are configured to map this VID to the bridging
      FID, because they are members of that VLAN in the VTU. Shared ports
      don't need to have 802.1QMode enabled in any way, they always parse the
      VID from the DSA header, they don't need to look at the 802.1Q header.
      
      We install VID 4095 to the VTU in mv88e6xxx_setup_port(), with the
      mention that mv88e6xxx_vtu_setup() which was located right below that
      call was flushing the VTU so those entries wouldn't be preserved.
      So we need to relocate the VTU flushing prior to the port initialization
      during ->setup(). Also note that this is why it is safe to assume that
      VID 4095 will get associated with FID 1: the user ports haven't been
      created, so there is no avenue for the user to create a bridge VLAN
      which could otherwise race with the creation of another FID which would
      otherwise use up the non-reserved FID value of 1.
      
      [ Currently mv88e6xxx_port_vlan_join() doesn't have the option of
        specifying a preferred FID, it always calls mv88e6xxx_atu_new(). ]
      
      mv88e6xxx_port_db_load_purge() is the function to access the ATU for
      FDB/MDB entries, and it used to determine the FID to use for
      VLAN-unaware FDB entries (VID=0) using mv88e6xxx_port_get_fid().
      But the driver only called mv88e6xxx_port_set_fid() once, during probe,
      so no surprises, the port FID was always 0, the call to get_fid() was
      redundant. As much as I would have wanted to not touch that code, the
      logic is broken when we add a new FID which is not the port-based
      default. Now the port-based default FID only corresponds to standalone
      ports, and FDB/MDB entries belong to the bridging service. So while in
      the future, when the DSA API will support FDB isolation, we will have to
      figure out the FID based on the bridge number, for now there's a single
      bridging FID, so hardcode that.
      
      Lastly, the tagger needs to check, when it is transmitting a VLAN
      untagged skb, whether it is sending it towards a bridged or a standalone
      port. When we see it is bridged we assume the bridge is VLAN-unaware.
      Not because it cannot be VLAN-aware but:
      
      - if we are transmitting from a VLAN-aware bridge we are likely doing so
        using TX forwarding offload. That code path guarantees that skbs have
        a vlan hwaccel tag in them, so we would not enter the "else" branch
        of the "if (skb->protocol == htons(ETH_P_8021Q))" condition.
      
      - if we are transmitting on behalf of a VLAN-aware bridge but with no TX
        forwarding offload (no PVT support, out of space in the PVT, whatever),
        we would indeed be transmitting with VLAN 4095 instead of the bridge
        device's pvid. However we would be injecting a "From CPU" frame, and
        the switch won't learn from that - it only learns from "Forward" frames.
        So it is inconsequential for address learning. And VLAN 4095 is
        absolutely enough for the frame to exit the switch, since we never
        remove that VLAN from any port.
      
      Fixes: 57e661aa ("net: dsa: mv88e6xxx: Link aggregation support")
      Reported-by: Tobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
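The VID/FID choice for untagged traffic that this commit settles on can be summarised in a short sketch. The constant and function names are hypothetical; only the values (VID 0 / FID 0 for standalone ports, VID 4095 / FID 1 for ports under a VLAN-unaware bridge) come from the commit message. VLAN-aware bridges are left out because their FID is taken from the VTU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the commit message; the macro names are hypothetical. */
#define SKETCH_VID_STANDALONE 0
#define SKETCH_FID_STANDALONE 0
#define SKETCH_VID_BRIDGED    4095	/* reserved VID, mapped to the bridging FID */
#define SKETCH_FID_BRIDGED    1

struct sketch_vid_fid {
	uint16_t vid;
	uint16_t fid;
};

/*
 * Defaults for untagged traffic. Like the tagger described above, a bridged
 * port is assumed to belong to a VLAN-unaware bridge.
 */
static struct sketch_vid_fid sketch_untagged_defaults(bool bridged)
{
	struct sketch_vid_fid r;

	if (bridged) {
		r.vid = SKETCH_VID_BRIDGED;
		r.fid = SKETCH_FID_BRIDGED;
	} else {
		r.vid = SKETCH_VID_STANDALONE;
		r.fid = SKETCH_FID_STANDALONE;
	}
	return r;
}
```

Because the two port kinds differ in VID as well as in FID, the distinction survives a hop over a DSA link, where only the VID is carried in the tag.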
    • net: introduce a function to check if a netdev name is in use · 75ea27d0
      Committed by Antoine Tenart
      __dev_get_by_name is currently used to either retrieve a net device
      reference using its name or to check if a name is already used by a
      registered net device (per ns). In the latter case there is no need to
      return a reference to a net device.
      
      Introduce a new helper, netdev_name_in_use, to check if a name is
      currently used by a registered net device without leaking a reference
      to the corresponding net device. This helper uses netdev_name_node_lookup
      instead of __dev_get_by_name as we don't need the extra logic retrieving
      a reference to the corresponding net device.
      Signed-off-by: Antoine Tenart <atenart@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
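The split between "look up a device" and "is this name taken" can be sketched in userspace as below. Every type and function here is a hypothetical stand-in: a linear array replaces the kernel's name lookup structure, and the helpers only mirror the shape of the API described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct sketch_netdev {
	int ifindex;
};

/* A name entry pointing at its device. */
struct sketch_name_node {
	const char *name;
	struct sketch_netdev *dev;
};

/* Per-namespace table of registered names. */
struct sketch_net {
	struct sketch_name_node *nodes;
	size_t count;
};

/* Shared lookup: returns the name node, or NULL when the name is unknown. */
static struct sketch_name_node *sketch_name_node_lookup(struct sketch_net *net,
							const char *name)
{
	for (size_t i = 0; i < net->count; i++)
		if (strcmp(net->nodes[i].name, name) == 0)
			return &net->nodes[i];
	return NULL;
}

/* Returns the device itself; for callers that need to use it. */
static struct sketch_netdev *sketch_dev_get_by_name(struct sketch_net *net,
						    const char *name)
{
	struct sketch_name_node *node = sketch_name_node_lookup(net, name);

	return node ? node->dev : NULL;
}

/* Only answers whether the name is taken; no device pointer escapes. */
static bool sketch_name_in_use(struct sketch_net *net, const char *name)
{
	return sketch_name_node_lookup(net, name) != NULL;
}
```

Callers that only validate a proposed name get a boolean and never hold a device pointer they have no use for.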
  9. 08 Oct, 2021 1 commit
  10. 07 Oct, 2021 6 commits