1. 14 May, 2021 5 commits
  2. 30 Apr, 2021 1 commit
    • Z
      bridge: Fix possible races between assigning rx_handler_data and setting IFF_BRIDGE_PORT bit · 59259ff7
      Zhang Zhengming committed
      There is a crash in br_get_link_af_size_filtered():
      br_port_exists(dev) is true while the rx_handler_data of dev is NULL.
      Yet the rx_handler_data of dev is correctly saved in the vmcore.
      
      The oops looks something like:
       ...
       pc : br_get_link_af_size_filtered+0x28/0x1c8 [bridge]
       ...
       Call trace:
        br_get_link_af_size_filtered+0x28/0x1c8 [bridge]
        if_nlmsg_size+0x180/0x1b0
        rtnl_calcit.isra.12+0xf8/0x148
        rtnetlink_rcv_msg+0x334/0x370
        netlink_rcv_skb+0x64/0x130
        rtnetlink_rcv+0x28/0x38
        netlink_unicast+0x1f0/0x250
        netlink_sendmsg+0x310/0x378
        sock_sendmsg+0x4c/0x70
        __sys_sendto+0x120/0x150
        __arm64_sys_sendto+0x30/0x40
        el0_svc_common+0x78/0x130
        el0_svc_handler+0x38/0x78
        el0_svc+0x8/0xc
      
      In br_add_if(), we found there is no guarantee that
      assigning rx_handler_data to dev->rx_handler_data
      happens before setting the IFF_BRIDGE_PORT bit of priv_flags.
      So there is a possible data race:
      
      CPU 0:                                                        CPU 1:
      (RCU read lock)                                               (RTNL lock)
      rtnl_calcit()                                                 br_add_slave()
        if_nlmsg_size()                                               br_add_if()
          br_get_link_af_size_filtered()                              -> netdev_rx_handler_register
                                                                          ...
                                                                          // The order is not guaranteed
            ...                                                           -> dev->priv_flags |= IFF_BRIDGE_PORT;
            // The IFF_BRIDGE_PORT bit of priv_flags has been set
            -> if (br_port_exists(dev)) {
              // The dev->rx_handler_data has NOT been assigned
              -> p = br_port_get_rcu(dev);
              ....
                                                                          -> rcu_assign_pointer(dev->rx_handler_data, rx_handler_data);
                                                                           ...
      
      Fix it in br_get_link_af_size_filtered, using br_port_get_check_rcu() and checking the return value.
      Signed-off-by: Zhang Zhengming <zhangzhengming@huawei.com>
      Reviewed-by: Zhao Lei <zhaolei69@huawei.com>
      Reviewed-by: Wang Xiaogang <wangxiaogang3@huawei.com>
      Suggested-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      59259ff7
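The fix described above can be illustrated with a small user-space sketch. It is not the kernel code: `demo_dev`, `demo_port`, `demo_port_get_check()` and the flag value are hypothetical stand-ins, and C11 atomics take the place of the kernel's RCU primitives. The point it shows is that a reader must tolerate the window where the flag is already set but the pointer has not been published yet.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Illustrative flag value only; not the kernel's IFF_BRIDGE_PORT. */
#define DEMO_IFF_BRIDGE_PORT 0x1u

struct demo_port {
    int port_no;
};

/* Hypothetical model of the two net_device fields involved in the race. */
struct demo_dev {
    atomic_uint priv_flags;                      /* models dev->priv_flags */
    _Atomic(struct demo_port *) rx_handler_data; /* models dev->rx_handler_data */
};

static void demo_dev_init(struct demo_dev *dev)
{
    atomic_init(&dev->priv_flags, 0u);
    atomic_init(&dev->rx_handler_data, (struct demo_port *)NULL);
}

/* Checked lookup: NULL if the flag is clear OR the pointer is not set yet. */
static struct demo_port *demo_port_get_check(struct demo_dev *dev)
{
    if (!(atomic_load(&dev->priv_flags) & DEMO_IFF_BRIDGE_PORT))
        return NULL;
    return atomic_load_explicit(&dev->rx_handler_data, memory_order_acquire);
}

/* Reader: contributes nothing to the message size when no port is visible. */
static size_t demo_link_af_size(struct demo_dev *dev)
{
    struct demo_port *p = demo_port_get_check(dev);

    if (!p)
        return 0; /* flag may be set before the pointer; do not dereference */
    return sizeof(p->port_no);
}
```

With this shape, a reader that runs between the two writer steps sees a NULL port and returns 0 instead of dereferencing a NULL pointer.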
  3. 28 Apr, 2021 1 commit
    • L
      net: bridge: mcast: fix broken length + header check for MRDv6 Adv. · 99014088
      Linus Lüssing committed
      The IPv6 Multicast Router Advertisements parsing has the following two
      issues:
      
      For one thing, ICMPv6 MRD Advertisements are smaller than ICMPv6 MLD
      messages (ICMPv6 MRD Adv.: 8 bytes vs. ICMPv6 MLDv1/2: >= 24 bytes,
      assuming MLDv2 Reports with at least one multicast address entry).
      When ipv6_mc_check_mld_msg() tries to parse a Multicast Router
      Advertisement its MLD length check will fail - and it will wrongly
      return -EINVAL, even if we have a valid MRD Advertisement. With the
      returned -EINVAL the bridge code will assume a broken packet and will
      wrongly discard it, potentially leading to multicast packet loss towards
      multicast routers.
      
      The second issue is the MRD header parsing in
      br_ip6_multicast_mrd_rcv(): It wrongly checks for an ICMPv6 header
      immediately after the IPv6 header (IPv6 next header type). However
      according to RFC4286, section 2 all MRD messages contain a Router Alert
      option (just like MLD). So instead there is an IPv6 Hop-by-Hop option
      for the Router Alert between the IPv6 and ICMPv6 header, again leading
      to the bridge wrongly discarding Multicast Router Advertisements.
      
      To fix these two issues, introduce a new return value -ENODATA to
      ipv6_mc_check_mld() to indicate a valid ICMPv6 packet with a hop-by-hop
      option which is not an MLD but potentially an MRD packet. This also
      simplifies further parsing in the bridge code, as ipv6_mc_check_mld()
      already fully checks the ICMPv6 header and hop-by-hop option.
      
      These issues were found and fixed with the help of the mrdisc tool
      (https://github.com/troglobit/mrdisc).
      
      Fixes: 4b3087c7 ("bridge: Snoop Multicast Router Advertisements")
      Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      99014088
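A rough sketch of how a caller might branch on the return values named in this commit message. `demo_classify()` and the verdict names are hypothetical; only the meanings of 0, -EINVAL and -ENODATA are taken from the text above, and any other value the real ipv6_mc_check_mld() may return is deliberately left unmodeled.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical verdicts for the result of an MLD check. */
enum demo_verdict {
    DEMO_HANDLE_MLD,  /* valid MLD message */
    DEMO_TRY_MRD,     /* valid ICMPv6 + hop-by-hop, not MLD: maybe MRD */
    DEMO_DROP_BROKEN, /* malformed packet */
    DEMO_NOT_MODELED  /* return value not covered by this sketch */
};

/* check_ret stands in for the value returned by ipv6_mc_check_mld(). */
static enum demo_verdict demo_classify(int check_ret)
{
    /* The sketch assumes 0 or a negative errno, as is usual for such checks. */
    assert(check_ret <= 0);

    switch (check_ret) {
    case 0:
        return DEMO_HANDLE_MLD;
    case -ENODATA:
        return DEMO_TRY_MRD;
    case -EINVAL:
        return DEMO_DROP_BROKEN;
    default:
        return DEMO_NOT_MODELED;
    }
}
```

Before the fix, a valid MRD Advertisement would have ended up in the -EINVAL branch; the separate -ENODATA value is what lets the bridge go on to MRD parsing instead of discarding the packet.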
  4. 27 Apr, 2021 1 commit
  5. 26 Apr, 2021 1 commit
    • F
      netfilter: ebtables: remove the 3 ebtables pointers from struct net · 4c95e072
      Florian Westphal committed
      ebtables stores the table internal data (what gets passed to the
      ebt_do_table() interpreter) in struct net.
      
      nftables keeps the internal interpreter format in pernet lists
      and passes it via the netfilter core infrastructure (priv pointer).
      
      Do the same for ebtables: the nf_hook_ops are duplicated via kmemdup,
      then the ops->priv pointer is set to the table that is being registered.
      
      After that, the netfilter core passes this table info to the hookfn.
      
      This allows removing the pointers from struct net.
      
      Same pattern can be applied to ip/ip6/arptables.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      4c95e072
  6. 22 Apr, 2021 1 commit
  7. 17 Apr, 2021 2 commits
  8. 15 Apr, 2021 1 commit
  9. 11 Apr, 2021 1 commit
    • F
      netfilter: bridge: add pre_exit hooks for ebtable unregistration · 7ee3c61d
      Florian Westphal committed
      Just like ip/ip6/arptables, the hooks have to be removed, then
      synchronize_rcu() has to be called to make sure no more packets are being
      processed before the ruleset data is released.
      
      Place the hook unregistration in the pre_exit hook, then call the new
      ebtables pre_exit function from there.
      
      Years ago, when netns support was first added for netfilter+ebtables,
      this used an older (now removed) netfilter hook unregister API that did
      an unconditional synchronize_rcu().
      
      Now that all is done with call_rcu, ebtable_{filter,nat,broute} pernet exit
      handlers may free the ebtable ruleset while packets are still in flight.
      
      This can only happen on module removal, not during netns exit.
      
      The new function expects the table name, not the table struct.
      
      This is because an upcoming patch set (targeting -next) will remove all
      net->xt.{nat,filter,broute}_table instances, which makes it necessary
      to avoid external references to those member variables.
      
      The existing APIs will be converted, so follow the upcoming scheme of
      passing name + hook type instead.
      
      Fixes: aee12a0a ("ebtables: remove nf_hook_register usage")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      7ee3c61d
  10. 06 Apr, 2021 1 commit
  11. 01 Apr, 2021 1 commit
  12. 25 Mar, 2021 5 commits
  13. 24 Mar, 2021 6 commits
    • V
      net: bridge: add helper to replay VLANs installed on port · 22f67cdf
      Vladimir Oltean committed
      Currently this simple setup with DSA:
      
      ip link add br0 type bridge vlan_filtering 1
      ip link add bond0 type bond
      ip link set bond0 master br0
      ip link set swp0 master bond0
      
      will not work because the bridge has created the PVID in br_add_if ->
      nbp_vlan_init, and it has notified switchdev of the existence of VLAN 1,
      but that was too early, since swp0 was not yet a lower of bond0, so it
      had no reason to act upon that notification.
      
      We need a helper in the bridge to replay the switchdev VLAN objects that
      were notified since the bridge port creation, because some of them may
      have been missed.
      
      As opposed to the br_mdb_replay function, the vg->vlan_list write side
      protection is offered by the rtnl_mutex which is sleepable, so we don't
      need to queue up the objects in atomic context, we can replay them right
      away.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      22f67cdf
    • V
      net: bridge: add helper to replay port and local fdb entries · 04846f90
      Vladimir Oltean committed
      When a switchdev port starts offloading a LAG that is already in a
      bridge and has an FDB entry pointing to it:
      
      ip link set bond0 master br0
      bridge fdb add dev bond0 00:01:02:03:04:05 master static
      ip link set swp0 master bond0
      
      the switchdev driver will have no idea that this FDB entry is there,
      because it missed the switchdev event emitted at its creation.
      
      Ido Schimmel pointed this out during a discussion about challenges with
      switchdev offloading of stacked interfaces between the physical port and
      the bridge, and recommended to just catch that condition and deny the
      CHANGEUPPER event:
      https://lore.kernel.org/netdev/20210210105949.GB287766@shredder.lan/
      
      But in fact, we might need to deal with the hard thing anyway, which is
      to replay all FDB addresses relevant to this port, because it isn't just
      static FDB entries, but also local addresses (ones that are not
      forwarded but terminated by the bridge). There, we can't just say 'oh
      yeah, there was an upper already so I'm not joining that'.
      
      So, similar to the logic for replaying MDB entries, add a function that
      must be called by individual switchdev drivers and replays local FDB
      entries as well as ones pointing towards a bridge port. This time, we
      use the atomic switchdev notifier block, since that's what FDB entries
      expect for some reason.
      Reported-by: Ido Schimmel <idosch@idosch.org>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      04846f90
    • V
      net: bridge: add helper to replay port and host-joined mdb entries · 4f2673b3
      Vladimir Oltean committed
      I have a system with DSA ports, and udhcpcd is configured to bring
      interfaces up as soon as they are created.
      
      I create a bridge as follows:
      
      ip link add br0 type bridge
      
      As soon as I create the bridge and udhcpcd brings it up, I also have
      avahi which automatically starts sending IPv6 packets to advertise some
      local services, and because of that, the br0 bridge joins the following
      IPv6 groups due to the code path detailed below:
      
      33:33:ff:6d:c1:9c vid 0
      33:33:00:00:00:6a vid 0
      33:33:00:00:00:fb vid 0
      
      br_dev_xmit
      -> br_multicast_rcv
         -> br_ip6_multicast_add_group
            -> __br_multicast_add_group
               -> br_multicast_host_join
                  -> br_mdb_notify
      
      This is all fine, but inside br_mdb_notify we have br_mdb_switchdev_host
      hooked up, and switchdev will attempt to offload the host joined groups
      to an empty list of ports. Of course nobody offloads them.
      
      Then when we add a port to br0:
      
      ip link set swp0 master br0
      
      the bridge doesn't replay the host-joined MDB entries from br_add_if,
      and eventually the host joined addresses expire, and a switchdev
      notification for deleting it is emitted, but surprise, the original
      addition was already completely missed.
      
      The strategy to address this problem is to replay the MDB entries (both
      the port ones and the host joined ones) when the new port joins the
      bridge, similar to what vxlan_fdb_replay does (in that case, its FDB can
      be populated and only then attached to a bridge that you offload).
      However there are 2 possibilities: the addresses can be 'pushed' by the
      bridge into the port, or the port can 'pull' them from the bridge.
      
      Considering that in the general case, the new port can be really late to
      the party, and there may have been many other switchdev ports that
      already received the initial notification, we would like to avoid
      delivering duplicate events to them, since they might misbehave. And
      currently, the bridge calls the entire switchdev notifier chain, whereas
      for replaying it should just call the notifier block of the new guy.
      But the bridge doesn't know what is the new guy's notifier block, it
      just knows where the switchdev notifier chain is. So for simplification,
      we make this a driver-initiated pull for now, and the notifier block is
      passed as an argument.
      
      To emulate the calling context for mdb objects (deferred and put on the
      blocking notifier chain), we must iterate under RCU protection through
      the bridge's mdb entries, queue them, and only call them once we're out
      of the RCU read-side critical section.
      
      There was some opportunity for reuse between br_mdb_switchdev_host_port,
      br_mdb_notify and the newly added br_mdb_queue_one in how the switchdev
      mdb object is created, so a helper was created.
      Suggested-by: Ido Schimmel <idosch@idosch.org>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4f2673b3
    • V
      net: bridge: add helper to retrieve the current ageing time · f1d42ea1
      Vladimir Oltean committed
      The SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute is only emitted from:
      
      sysfs/ioctl/netlink
      -> br_set_ageing_time
         -> __set_ageing_time
      
      therefore not at bridge port creation time, so:
      (a) switchdev drivers have to hardcode the initial value for the address
          ageing time, because they didn't get any notification
      (b) that hardcoded value can be out of sync, if the user changes the
          ageing time before enslaving the port to the bridge
      
      We need a helper in the bridge, such that switchdev drivers can query
      the current value of the bridge ageing time when they start offloading
      it.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
      Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f1d42ea1
    • V
      net: bridge: add helper for retrieving the current bridge port STP state · c0e715bb
      Vladimir Oltean committed
      It may happen that we have the following topology with DSA or any other
      switchdev driver with LAG offload:
      
      ip link add br0 type bridge stp_state 1
      ip link add bond0 type bond
      ip link set bond0 master br0
      ip link set swp0 master bond0
      ip link set swp1 master bond0
      
      STP decides that it should put bond0 into the BLOCKING state, and
      that's that. The ports that are actively listening for the switchdev
      port attributes emitted for the bond0 bridge port (because they are
      offloading it) and have the honor of seeing that switchdev port
      attribute can react to it, so we can program swp0 and swp1 into the
      BLOCKING state.
      
      But if then we do:
      
      ip link set swp2 master bond0
      
      then as far as the bridge is concerned, nothing has changed: it still
      has one bridge port. But this new bridge port will not see any STP state
      change notification and will remain FORWARDING, which is how the
      standalone code leaves it in.
      
      We need a function in the bridge driver which retrieves the current STP
      state, such that drivers can synchronize to it when they may have missed
      switchdev events.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
      Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c0e715bb
    • V
      net: bridge: don't notify switchdev for local FDB addresses · 6ab4c311
      Vladimir Oltean committed
      As explained in this discussion:
      https://lore.kernel.org/netdev/20210117193009.io3nungdwuzmo5f7@skbuf/
      
      the switchdev notifiers for FDB entries managed to have a zero-day bug.
      The bridge would not say that this entry is local:
      
      ip link add br0 type bridge
      ip link set swp0 master br0
      bridge fdb add dev swp0 00:01:02:03:04:05 master local
      
      and the switchdev driver would be more than happy to offload it as a
      normal static FDB entry. This is despite the fact that 'local' and
      non-'local' entries have completely opposite directions: a local entry
      is locally terminated and not forwarded, whereas a static entry is
      forwarded and not locally terminated. So, for example, DSA would install
      this entry on swp0 instead of installing it on the CPU port as it should.
      
      There is an even sadder part, which is that the 'local' flag is implicit
      if 'static' is not specified, meaning that this command produces the
      same result of adding a 'local' entry:
      
      bridge fdb add dev swp0 00:01:02:03:04:05 master
      
      I've updated the man pages for 'bridge', and after reading it now, it
      should be pretty clear to any user that the commands above were broken
      and should have never resulted in the 00:01:02:03:04:05 address being
      forwarded (this behavior is coherent with non-switchdev interfaces):
      https://patchwork.kernel.org/project/netdevbpf/cover/20210211104502.2081443-1-olteanv@gmail.com/
      If you're a user reading this and this is what you want, just use:
      
      bridge fdb add dev swp0 00:01:02:03:04:05 master static
      
      Because switchdev should have given drivers the means from day one to
      classify FDB entries as local/non-local, but didn't, it means that all
      drivers are currently broken. So we can just as well omit the switchdev
      notifications for local FDB entries, which is exactly what this patch
      does to close the bug in stable trees. For further development work
      where drivers might want to trap the local FDB entries to the host, we
      can add a 'bool is_local' to br_switchdev_fdb_call_notifiers(), and
      selectively make drivers act upon that bit, while all the others ignore
      those entries if the 'is_local' bit is set.
      
      Fixes: 6b26b51b ("net: bridge: Add support for notifying devices about FDB add/del")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6ab4c311
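The behavior this patch describes, omitting switchdev notifications for local FDB entries, might be sketched as below. The struct and function names are hypothetical and the "notification" is reduced to a counter; the real bridge code works on its own FDB entry type and notifier chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified FDB entry. */
struct demo_fdb_entry {
    unsigned char addr[6];
    bool is_local; /* locally terminated, not forwarded */
};

/* Local entries are never announced to switchdev drivers. */
static bool demo_should_notify(const struct demo_fdb_entry *fdb)
{
    return !fdb->is_local;
}

/* Counts how many entries would produce a switchdev notification. */
static size_t demo_count_notifications(const struct demo_fdb_entry *entries,
                                       size_t n)
{
    size_t i, count = 0;

    assert(entries != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (demo_should_notify(&entries[i]))
            count++;
    }
    return count;
}
```

A driver that never sees a local entry cannot mistakenly offload it as a forwarding (static) entry, which is the failure mode the commit message describes.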
  14. 23 Mar, 2021 2 commits
  15. 17 Mar, 2021 2 commits
    • N
      net: bridge: mcast: factor out common allow/block EHT handling · e09cf582
      Nikolay Aleksandrov committed
      We handle EHT state change for ALLOW messages in INCLUDE mode and for
      BLOCK messages in EXCLUDE mode similarly - create the new set entries
      with the proper filter mode. We also handle EHT state change for ALLOW
      messages in EXCLUDE mode and for BLOCK messages in INCLUDE mode in a
      similar way - delete the common entries (current set and new set).
      Factor out all the common code as follows:
       - ALLOW/INCLUDE, BLOCK/EXCLUDE: call __eht_create_set_entries()
       - ALLOW/EXCLUDE, BLOCK/INCLUDE: call __eht_del_common_set_entries()
      
      The set entries creation can be reused in __eht_inc_exc() as well.
      Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e09cf582
    • N
      net: bridge: mcast: remove unreachable EHT code · 6aa2c371
      Nikolay Aleksandrov committed
      In the initial EHT versions there were common functions which handled
      allow/block messages for both INCLUDE and EXCLUDE modes, but later they
      were separated. It seems I've left some common code which cannot be
      reached because the filter mode is checked before calling the respective
      functions, i.e. the host filter is always in EXCLUDE mode when using
      __eht_allow_excl() and __eht_block_excl() thus we can drop the host_excl
      checks inside and simplify the code a bit.
      Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6aa2c371
  16. 11 Mar, 2021 1 commit
  17. 17 Feb, 2021 3 commits
    • H
      bridge: mrp: Update br_mrp to use new return values of br_mrp_switchdev · cd605d45
      Horatiu Vultur committed
      Check the return values of the br_mrp_switchdev function.
      In case of:
      - BR_MRP_NONE, return the error to userspace,
      - BR_MRP_SW, continue with SW implementation,
      - BR_MRP_HW, continue without SW implementation.
      Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cd605d45
    • H
      bridge: mrp: Extend br_mrp_switchdev to detect better the errors · 1a3ddb0b
      Horatiu Vultur committed
      This patch extends the br_mrp_switchdev functions to give a
      better understanding of what caused the issue and whether the SW needs
      to be used as a backup.
      
      There are the following cases:
      - when the code is compiled without CONFIG_NET_SWITCHDEV. In this case
        return success so the SW can continue with the protocol. Depending
        on the function, it returns 0 or BR_MRP_SW.
      - when code is compiled with CONFIG_NET_SWITCHDEV and the driver doesn't
        implement any MRP callbacks. In this case the HW can't run MRP so it
        just returns -EOPNOTSUPP, and the SW stops configuring the
        node.
      - when code is compiled with CONFIG_NET_SWITCHDEV and the driver fully
        supports any MRP functionality. In this case the SW doesn't need to do
        anything. The functions will return 0 or BR_MRP_HW.
      - when code is compiled with CONFIG_NET_SWITCHDEV and the HW can't run
        completely the protocol but it can help the SW to run it. For
        example, the HW can't support completely MRM role(can't detect when it
        stops receiving MRP Test frames) but it can redirect these frames to
        CPU. In this case it is possible to have a SW fallback. The SW will
        try initially to call the driver with sw_backup set to false, meaning
        that the HW should implement completely the role. If the driver returns
        -EOPNOTSUPP, the SW will try again with sw_backup set to true,
        meaning that the SW will detect when it stops receiving the frames but
        it needs HW support to redirect the frames to CPU. In case the driver
        returns 0 then the SW will continue to configure the node accordingly.
      Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1a3ddb0b
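The two-step fallback described in the last case can be sketched as follows. This is a simplified user-space model: the enum mirrors the BR_MRP_NONE/BR_MRP_SW/BR_MRP_HW values named in these commits, while `demo_set_role_fn`, `demo_set_ring_role()` and the three demo drivers are hypothetical. It assumes the second attempt is made with sw_backup set to true.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Local copy of the values named in the commit messages. */
enum br_mrp_hw_support {
    BR_MRP_NONE, /* HW can't run MRP: report the error to userspace */
    BR_MRP_SW,   /* SW runs the protocol, HW only assists */
    BR_MRP_HW    /* HW fully implements the role */
};

/* Hypothetical driver callback: 0 on success or a negative errno. */
typedef int (*demo_set_role_fn)(bool sw_backup);

static enum br_mrp_hw_support demo_set_ring_role(demo_set_role_fn drv)
{
    int err;

    assert(drv != NULL);

    /* First ask the HW to implement the role completely. */
    err = drv(false);
    if (!err)
        return BR_MRP_HW;
    if (err != -EOPNOTSUPP)
        return BR_MRP_NONE;

    /* Retry, this time asking the HW only to assist the SW. */
    err = drv(true);
    if (!err)
        return BR_MRP_SW;
    return BR_MRP_NONE;
}

/* Three hypothetical drivers covering the cases listed above. */
static int demo_drv_full(bool sw_backup)   { (void)sw_backup; return 0; }
static int demo_drv_assist(bool sw_backup) { return sw_backup ? 0 : -EOPNOTSUPP; }
static int demo_drv_none(bool sw_backup)   { (void)sw_backup; return -EOPNOTSUPP; }
```

The caller can then map BR_MRP_NONE to an error for userspace, BR_MRP_SW to the software implementation, and BR_MRP_HW to skipping the software path.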
    • H
      bridge: mrp: Add 'enum br_mrp_hw_support' · e1bd99d0
      Horatiu Vultur committed
      Add the enum br_mrp_hw_support that is used by the br_mrp_switchdev
      functions to allow the SW to detect the cases where HW can't implement
      the functionality or when SW is used as a backup.
      Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e1bd99d0
  18. 16 Feb, 2021 1 commit
  19. 15 Feb, 2021 3 commits
  20. 13 Feb, 2021 1 commit
    • V
      net: switchdev: pass flags and mask to both {PRE_,}BRIDGE_FLAGS attributes · e18f4c18
      Vladimir Oltean committed
      This switchdev attribute offers a counterproductive API for a driver
      writer, because although br_switchdev_set_port_flag gets passed a
      "flags" and a "mask", those are passed piecemeal to the driver, so while
      the PRE_BRIDGE_FLAGS listener knows what changed because it has the
      "mask", the BRIDGE_FLAGS listener doesn't, because it only has the final
      value. But certain drivers can offload only certain combinations of
      settings, like for example they cannot change unicast flooding
      independently of multicast flooding - they must be both on or both off.
      The way the information is passed to switchdev makes drivers not
      expressive enough, and unable to reject this request ahead of time, in
      the PRE_BRIDGE_FLAGS notifier, so they are forced to reject it during
      the deferred BRIDGE_FLAGS attribute, where the rejection is currently
      ignored.
      
      This patch also changes drivers to make use of the "mask" field for edge
      detection when possible.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e18f4c18
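To illustrate why having both "flags" and "mask" in the PRE_BRIDGE_FLAGS stage matters, here is a hedged sketch of a driver-side pre-check. All names and bit values are hypothetical (they are not the kernel's definitions), and the driver is assumed to know its currently programmed flags. The hardware modeled here can only enable or disable unicast and multicast flooding together.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative bit values only; not the kernel's flag definitions. */
#define DEMO_BR_FLOOD       (1UL << 0) /* unicast flooding */
#define DEMO_BR_MCAST_FLOOD (1UL << 1) /* multicast flooding */
#define DEMO_BR_LEARNING    (1UL << 2)

/* Hypothetical pair handed to both the PRE_ and the final notifier. */
struct demo_brport_flags {
    unsigned long val;  /* requested values of the bits in mask */
    unsigned long mask; /* which bits the user is changing */
};

/* Returns 0 if the request can be offloaded, -EINVAL otherwise. */
static int demo_pre_bridge_flags(unsigned long cur, struct demo_brport_flags f)
{
    const unsigned long supported =
        DEMO_BR_FLOOD | DEMO_BR_MCAST_FLOOD | DEMO_BR_LEARNING;
    const unsigned long both = DEMO_BR_FLOOD | DEMO_BR_MCAST_FLOOD;
    unsigned long next;

    /* Reject attempts to change bits this hardware knows nothing about. */
    if (f.mask & ~supported)
        return -EINVAL;

    /* Resulting state: untouched bits keep their value, masked bits change. */
    next = (cur & ~f.mask) | (f.val & f.mask);

    /* Edge detection via the mask: only validate flooding if it changes. */
    if ((f.mask & both) && (next & both) != 0 && (next & both) != both)
        return -EINVAL;

    return 0;
}
```

Because the check runs in the "pre" stage, an unsupported combination can be refused before anything is committed, rather than during the deferred stage where the rejection would be ignored.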