1. 09 Aug, 2021 3 commits
    • net: dsa: flush the dynamic FDB of the software bridge when fast ageing a port · 9264e4ad
      Vladimir Oltean committed
      Currently, when DSA performs fast ageing on a port, 'bridge fdb' shows
      us that the 'self' entries (corresponding to the hardware bridge, as
      printed by dsa_slave_fdb_dump) are deleted, but the 'master' entries
      (corresponding to the software bridge) aren't.
      
      Indeed, searching through the bridge driver, neither the
      brport_attr_learning handler nor the IFLA_BRPORT_LEARNING handler call
      br_fdb_delete_by_port. However, br_stp_disable_port does, which is one
      of the paths which DSA uses to trigger a fast ageing process anyway.
      
      There is, however, one other very promising caller of
      br_fdb_delete_by_port, and that is the bridge driver's handler of the
      SWITCHDEV_FDB_FLUSH_TO_BRIDGE atomic notifier. Currently the s390/qeth
      HiperSockets card driver is the only user of this.
      
      I can't say I understand that driver's architecture or interaction with
      the bridge, but it appears to not be a switchdev driver in the traditional
      sense of the word. Nonetheless, the mechanism it provides is a useful
      way for DSA to express the fact that it performs fast ageing too, in a
      way that does not change the existing behavior for other drivers.
      
      Cc: Alexandra Winter <wintera@linux.ibm.com>
      Cc: Julian Wiedmann <jwi@linux.ibm.com>
      Cc: Roopa Prabhu <roopa@nvidia.com>
      Cc: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
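The flush described in the commit above can be pictured with a toy model. This is only a sketch under stated assumptions: the `Fdb` class, `fast_age()` and the listener list are hypothetical stand-ins for the hardware FDB ('self' entries), the software bridge FDB ('master' entries) and the SWITCHDEV_FDB_FLUSH_TO_BRIDGE notifier; none of it is kernel code.

```python
from dataclasses import dataclass, field


@dataclass
class Fdb:
    """Toy forwarding database mapping (mac, port) -> is_static."""
    entries: dict = field(default_factory=dict)

    def learn(self, mac, port):
        # Dynamically learned entry; never downgrades an existing static one.
        self.entries.setdefault((mac, port), False)

    def add_static(self, mac, port):
        self.entries[(mac, port)] = True

    def flush_dynamic(self, port):
        # Drop dynamic entries on the given port, keep everything else.
        self.entries = {
            key: static
            for key, static in self.entries.items()
            if static or key[1] != port
        }


def fast_age(port, hw_fdb, flush_to_bridge_listeners):
    """Hypothetical fast ageing: flush hardware, then notify the bridge."""
    hw_fdb.flush_dynamic(port)              # 'self' entries
    for notify in flush_to_bridge_listeners:
        notify(port)                        # 'master' entries, via notifier
```

Registering the software bridge's `flush_dynamic` as a listener is what keeps the two tables in step; static entries survive in both.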
    • net: dsa: don't fast age bridge ports with learning turned off · 4eab90d9
      Vladimir Oltean committed
      On topology changes, stations that were dynamically learned on ports
      that are no longer part of the active topology must be flushed - this is
      described by clause "17.11 Updating learned station location information"
      of IEEE 802.1D-2004.
      
      However, when address learning on the bridge port is turned off in the
      first place, there is nothing to flush, so skip a potentially expensive
      operation.
      
      We can finally do this now since DSA is aware of the learning state of
      its bridged ports.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: centralize fast ageing when address learning is turned off · 045c45d1
      Vladimir Oltean committed
      Currently DSA leaves it down to device drivers to fast age the FDB on a
      port when address learning is disabled on it. There are 2 reasons for
      doing that in the first place:
      
      - when address learning is disabled by user space, through
        IFLA_BRPORT_LEARNING or the brport_attr_learning sysfs, what user
        space typically wants to achieve is to operate in a mode with no
        dynamic FDB entry on that port. But if the port is already up, some
        addresses might have been already learned on it, and it seems silly to
        wait for 5 minutes for them to expire until something useful can be
        done.
      
      - when a port leaves a bridge and becomes standalone, DSA turns off
        address learning on it. This also has the nice side effect of flushing
        the dynamically learned bridge FDB entries on it, which is a good idea
        because standalone ports should not have bridge FDB entries on them.
      
      We let drivers manage fast ageing under this condition because if DSA
      were to do it, it would need to track each port's learning state, and
      act upon the transition, which it currently doesn't.
      
      But there are 2 reasons why doing it is better after all:
      
      - drivers might get it wrong and not do it (see b53_port_set_learning)
      
      - we would like to flush the dynamic entries from the software bridge
        too, and letting drivers do that would be another pain point
      
      So track the port learning state and trigger a fast age process
      automatically within DSA.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
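The idea of tracking the learning state and acting on the transition can be sketched as follows. The `Port` class and `set_learning()` are hypothetical names chosen for illustration; the real DSA code is C and structured differently.

```python
class Port:
    """Toy switch port that remembers its address learning state."""

    def __init__(self, name):
        self.name = name
        self.learning = False
        self.fast_age_calls = 0

    def fast_age(self):
        # Stand-in for flushing dynamically learned FDB entries.
        self.fast_age_calls += 1


def set_learning(port, enabled):
    """Update the learning state; fast age only on an on -> off transition."""
    was_learning = port.learning
    port.learning = enabled
    if was_learning and not enabled:
        port.fast_age()
```

Because the core records the previous state itself, individual drivers no longer need to remember to flush when learning is switched off.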
  2. 08 Aug, 2021 3 commits
    • atm: horizon: Fix spelling mistakes in TX comment · 64ec13ec
      Jun Miao committed
      It's "must not", not "musn't", meaning "shall not".
      Let's fix that.
      Suggested-by: Joe Perches <joe@perches.com>
      Signed-off-by: Jun Miao <jun.miao@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Simplify devlink port API calls · 82564f6c
      Leon Romanovsky committed
      Devlink port already has a pointer to the devlink instance, and all API
      calls that forward these devlink ports to the drivers perform the same
      "devlink_port->devlink" assignment before the actual call.
      
      This patch removes the useless parameter and allows us in the future
      to create specific devlink_port_ops to manage user space access with
      reliable ops assignment.
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: don't fast age standalone ports · 39f32101
      Vladimir Oltean committed
      DSA drives the procedure to flush dynamic FDB entries from a port based
      on the change of STP state: whenever we go from a state where address
      learning is enabled (LEARNING, FORWARDING) to a state where it isn't
      (LISTENING, BLOCKING, DISABLED), we need to flush the existing dynamic
      entries.
      
      However, there are cases when this is not needed. Internally, when a
      DSA switch interface is not under a bridge, DSA still keeps it in the
      "FORWARDING" STP state. And when that interface joins a bridge, the
      bridge will meticulously iterate that port through all STP states,
      starting with BLOCKING and ending with FORWARDING. Because there is a
      state transition from the standalone version of FORWARDING into the
      temporary BLOCKING bridge port state, DSA calls the fast age procedure.
      
      Since commit 5e38c158 ("net: dsa: configure better brport flags when
      ports leave the bridge"), DSA asks standalone ports to disable address
      learning. Therefore, there can be no dynamic FDB entries on a standalone
      port. Therefore, it does not make sense to flush dynamic FDB entries on
      one.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
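The rule in the commit above (flush when moving from a learning-enabled STP state to one where learning is off, unless the port was standalone) can be written as a small predicate. This is a simplified sketch: the enum, the `bridged` flag and `needs_fast_age()` are illustrative names, not the kernel's.

```python
from enum import Enum, auto


class StpState(Enum):
    DISABLED = auto()
    BLOCKING = auto()
    LISTENING = auto()
    LEARNING = auto()
    FORWARDING = auto()


# States in which address learning is enabled, per the commit message.
LEARNING_STATES = {StpState.LEARNING, StpState.FORWARDING}


def needs_fast_age(old, new, bridged):
    """Return True if dynamic FDB entries should be flushed.

    `bridged` says whether the port was a bridge port while in `old`;
    a standalone port has learning disabled, so there is nothing to flush.
    """
    if not bridged:
        return False
    return old in LEARNING_STATES and new not in LEARNING_STATES
```

A standalone port sitting in FORWARDING that joins a bridge and is moved to BLOCKING therefore no longer triggers a flush, while a bridged port leaving FORWARDING still does.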
  3. 07 Aug, 2021 9 commits
  4. 06 Aug, 2021 25 commits
    • net: dsa: mt7530: drop untagged frames on VLAN-aware ports without PVID · 8fbebef8
      DENG Qingfang committed
      The driver currently still accepts untagged frames on VLAN-aware ports
      without PVID. Use PVC.ACC_FRM to drop untagged frames in that case.
      Signed-off-by: DENG Qingfang <dqfext@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'dsa-cpu-flood' · 9b9311af
      David S. Miller committed
      Vladimir Oltean says:
      
      ====================
      Always flood multicast to the DSA CPU port
      
      Discussing with Qingfang, it became obvious that DSA is not prepared to
      disable multicast flooding towards the CPU port under any circumstance
      right now, and this in fact breaks traffic quite blatantly.
      
      This series is a revert done in reverse chronological order. These
      should be propagated to stable trees up to commit a8b659e7 ("net:
      dsa: act as passthrough for bridge port flags") which is in v5.12.
      For older kernels, that commit blocks further backporting, so I need to
      send a modified version of patch 3 separately to Greg after these go
      into "net".
      
      v1->v2: delete unused b53_set_mrouter function prototype
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: don't disable multicast flooding to the CPU even without an IGMP querier · c73c5708
      Vladimir Oltean committed
      Commit 08cc83cc ("net: dsa: add support for BRIDGE_MROUTER
      attribute") added an option for users to turn off multicast flooding
      towards the CPU if they turn off the IGMP querier on a bridge which
      already has enslaved ports (echo 0 > /sys/class/net/br0/bridge/multicast_router).
      
      And commit a8b659e7 ("net: dsa: act as passthrough for bridge port flags")
      simply papered over that issue, because it moved the decision to flood
      the CPU with multicast (or not) from the DSA core down to individual drivers,
      instead of taking a more radical position then.
      
      The truth is that disabling multicast flooding to the CPU is simply
      something we are not prepared to do now, if at all. Some reasons:
      
      - ICMP6 neighbor solicitation messages are unregistered multicast
        packets as far as the bridge is concerned. So if we stop flooding
        multicast, the outside world cannot ping the bridge device's IPv6
        link-local address.
      
      - There might be foreign interfaces bridged with our DSA switch ports
        (sending a packet towards the host does not necessarily equal
        termination, but maybe software forwarding). So if there is no one
        interested in that multicast traffic in the local network stack, that
        doesn't mean nobody is.
      
      - PTP over L4 (IPv4, IPv6) is multicast, but is unregistered as far as
        the bridge is concerned. This should reach the CPU port.
      
      - The switch driver might not do FDB partitioning. And since we don't
        even bother to do more fine-grained flood disabling (such as "disable
        flooding _from_port_N_ towards the CPU port" as opposed to "disable
        flooding _from_any_port_ towards the CPU port"), this breaks standalone
        ports, or even multiple bridges where one has an IGMP querier and one
        doesn't.
      
      Reverting the logic makes all of the above work.
      
      Fixes: a8b659e7 ("net: dsa: act as passthrough for bridge port flags")
      Fixes: 08cc83cc ("net: dsa: add support for BRIDGE_MROUTER attribute")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mt7530: remove the .port_set_mrouter implementation · cbbf09b5
      Vladimir Oltean committed
      DSA's idea of optimizing out multicast flooding to the CPU port leaves
      quite a few holes open, so it should be reverted.
      
      The mt7530 driver is the only new driver which added a .port_set_mrouter
      implementation after the reorg from commit a8b659e7 ("net: dsa: act
      as passthrough for bridge port flags"), so it needs to be reverted
      separately so that the other revert commit can go a bit further down the
      git history.
      
      Fixes: 5a30833b ("net: dsa: mt7530: support MDB and bridge flag operations")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: stop syncing the bridge mcast_router attribute at join time · 7df4e744
      Vladimir Oltean committed
      Qingfang points out that when a bridge with the default settings is
      created and a port joins it:
      
      ip link add br0 type bridge
      ip link set swp0 master br0
      
      DSA calls br_multicast_router() on the bridge to see if the br0 device
      is a multicast router port, and if it is, it enables multicast flooding
      to the CPU port, otherwise it disables it.
      
      If we look through the multicast_router_show() sysfs or at the
      IFLA_BR_MCAST_ROUTER netlink attribute, we see that the default mrouter
      attribute for the bridge device is "1" (MDB_RTR_TYPE_TEMP_QUERY).
      
      However, br_multicast_router() will return "0" (MDB_RTR_TYPE_DISABLED),
      because an mrouter port in the MDB_RTR_TYPE_TEMP_QUERY state may not be
      actually _active_ until it receives an actual IGMP query. So, the
      br_multicast_router() function should really have been called
      br_multicast_router_active() perhaps.
      
      When/if an IGMP query is received, the bridge device will transition via
      br_multicast_mark_router() into the active state until the
      ip4_mc_router_timer expires after a multicast_querier_interval.
      
      Of course, this does not happen if the bridge is created with an
      mcast_router attribute of "2" (MDB_RTR_TYPE_PERM).
      
      The point is that in the absence of any IGMP query messages, and in the
      default bridge configuration, unregistered multicast packets will not be
      able to reach the CPU port through flooding, and this breaks many use cases
      (most obviously, IPv6 ND, with its ICMP6 neighbor solicitation multicast
      messages).
      
      Leave the multicast flooding setting towards the CPU port down to a driver
      level decision.
      
      Fixes: 010e269f ("net: dsa: sync up switchdev objects and port attributes when joining the bridge")
      Reported-by: DENG Qingfang <dqfext@gmail.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
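The gap between the configured mrouter attribute and what br_multicast_router() reports can be illustrated with a toy model. The numeric values 0, 1 and 2 come from the commit message above; the `Bridge` class, its method names and the boolean timer flag are hypothetical simplifications, not the bridge driver's implementation.

```python
MDB_RTR_TYPE_DISABLED = 0
MDB_RTR_TYPE_TEMP_QUERY = 1
MDB_RTR_TYPE_PERM = 2


class Bridge:
    """Toy bridge device with an mcast_router attribute."""

    def __init__(self, mcast_router=MDB_RTR_TYPE_TEMP_QUERY):
        self.mcast_router = mcast_router
        self.router_timer_pending = False  # armed by an IGMP query

    def receive_igmp_query(self):
        # Only the temporary mode depends on seeing queries.
        if self.mcast_router == MDB_RTR_TYPE_TEMP_QUERY:
            self.router_timer_pending = True

    def router_timer_expired(self):
        self.router_timer_pending = False

    def is_active_router(self):
        """What the commit suggests could be named br_multicast_router_active()."""
        if self.mcast_router == MDB_RTR_TYPE_PERM:
            return True
        if self.mcast_router == MDB_RTR_TYPE_TEMP_QUERY:
            return self.router_timer_pending
        return False
```

With the default attribute and no IGMP query seen, `is_active_router()` is false, which is why syncing this value at join time turned off flooding towards the CPU port.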
    • net: ethernet: ti: am65-cpsw: use napi_complete_done() in TX completion · 3bacbe04
      Grygorii Strashko committed
      This patch enables support for the hard IRQ deferral feature from Eric Dumazet
      [1] in the TI K3 CPSW driver by using napi_complete_done() in the TX completion
      path.
      
      Depending on gro_flush_timeout and napi_defer_hard_irqs, it gives up to a 30%
      CPU utilization reduction:
      
      gro_flush_timeout=50000
      napi_defer_hard_irqs=2
      
      netperf -l 10 -H 192.168.1.1  -t UDP_STREAM -c -C -- -m 1470
      MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 () port 0 AF_INET
      Socket  Message  Elapsed      Messages                   CPU      Service
      Size    Size     Time         Okay Errors   Throughput   Util     Demand
      bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB
      
      before:
      212992    1470   10.00      809632      0      952.0     42.98    14.792
      212992           10.00      809630             952.0     50.66    8.719
      
      after:
      212992    1470   10.00      813686      0      956.8     32.14    11.009
      212992           10.00      813686             956.8     50.05    8.570
      
      [1] https://lore.kernel.org/netdev/20200422161329.56026-1-edumazet@google.com/
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
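For the specific netperf run quoted above, the relative reduction can be worked out from the first row of each result (42.98 vs 32.14 for CPU utilization, 14.792 vs 11.009 us/KB for service demand). The helper below is plain arithmetic; `relative_reduction` is just an illustrative name.

```python
def relative_reduction(before, after):
    """Percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100.0


# First-row figures taken from the "before" and "after" netperf output.
cpu_util_drop = relative_reduction(42.98, 32.14)
service_demand_drop = relative_reduction(14.792, 11.009)

if __name__ == "__main__":
    print(f"CPU utilization: {cpu_util_drop:.1f}% lower")
    print(f"Service demand:  {service_demand_drop:.1f}% lower")
```

Both figures come out at roughly 25% for this run, which is consistent with the "up to 30%" claim being dependent on the chosen gro_flush_timeout and napi_defer_hard_irqs values.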
    • net: ti: am65-cpsw-nuss: fix RX IRQ state after .ndo_stop() · 47bfc4d1
      Vignesh Raghavendra committed
      On the TI K3 am64x platform an issue with the RX IRQ is observed: it becomes
      disabled forever after .ndo_stop(). The K3 CPSW driver manipulates the RX IRQ
      using the standard Linux enable_irq()/disable_irq_nosync() API, as there are
      no IRQ enable/disable options in the CPSW HW itself. As a result, the
      following sequence happens during .ndo_stop():
      
        phy_stop()
        teardown TX/RX channels
        wait for TX tdown complete
        napi_disable(TX)
        clean up TX channels
      
        (a)
      
        napi_disable(RX)
      
      At point (a) it's not possible to predict whether the RX IRQ was triggered or
      not. If the RX IRQ was triggered, then it is also not possible to say for sure
      whether the RX NAPI was run or only scheduled and immediately canceled by
      napi_disable(RX). It is this last case that causes the RX IRQ to be
      permanently disabled.
      
      Another observed issue is that the RX IRQ enable counter becomes unbalanced if
      (gro_flush_timeout != 0) while (napi_defer_hard_irqs == 0):
      
      Unbalanced enable for IRQ 44
      WARNING: CPU: 0 PID: 10 at ../kernel/irq/manage.c:776 __enable_irq+0x38/0x80
      __enable_irq+0x38/0x80
      enable_irq+0x54/0xb0
      am65_cpsw_nuss_rx_poll+0x2f4/0x368
      __napi_poll+0x34/0x1b8
      net_rx_action+0xe4/0x220
      _stext+0x11c/0x284
      run_ksoftirqd+0x4c/0x60
      
      To avoid the above issues, introduce a flag indicating whether the RX IRQ was
      actually disabled before enabling it in am65_cpsw_nuss_rx_poll(), and restore
      the RX IRQ state in .ndo_open().
      
      Fixes: 4f7cce27 ("net: ethernet: ti: am65-cpsw: add support for am64x cpsw3g")
      Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
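The flag-based fix can be modelled in a few lines. This is a sketch under stated assumptions: `Irq` mimics a disable-depth counter, and `RxPath` with its `irq_disabled` flag and method names is a hypothetical stand-in for the driver's RX handling, not the am65-cpsw code itself.

```python
class Irq:
    """Toy interrupt line with a disable depth (0 means enabled)."""

    def __init__(self):
        self.depth = 0
        self.unbalanced_enables = 0

    def disable(self):
        self.depth += 1

    def enable(self):
        if self.depth == 0:
            # The kernel would warn "Unbalanced enable for IRQ N" here.
            self.unbalanced_enables += 1
            return
        self.depth -= 1


class RxPath:
    """RX handling that remembers whether it disabled the IRQ."""

    def __init__(self, irq):
        self.irq = irq
        self.irq_disabled = False

    def hard_irq(self):
        # IRQ handler: mask the line and leave the work to NAPI poll.
        self.irq_disabled = True
        self.irq.disable()

    def _reenable_if_needed(self):
        if self.irq_disabled:
            self.irq_disabled = False
            self.irq.enable()

    def poll_done(self):
        # Poll may run without a preceding hard IRQ (timer-driven polling).
        self._reenable_if_needed()

    def open(self):
        # Restore the IRQ if a scheduled poll was cancelled during stop.
        self._reenable_if_needed()
```

The flag covers both symptoms: a poll that was never preceded by `hard_irq()` does not enable the line a second time, and an interrupt whose poll was cancelled is re-enabled on the next open.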
    • Merge branch 'ptp-ocp-fixes' · 370cb73a
      David S. Miller committed
      Jonathan Lemon says:
      
      ====================
      ptp: ocp: assorted fixes.
      
      Assorted fixes for the ocp timecard.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ptp: ocp: Remove pending_image indicator from devlink · 8ef8ccbc
      Jonathan Lemon committed
      After writing an image blob to the flash memory, a reboot is required
      to reload the FPGA.  There is no versioning present in the FPGA image
      file, so only a running version is available.  The 'stored version'
      was set to 'pending' in order to indicate a reboot was needed.
      
      This isn't reliable, as the module could be unloaded/loaded, losing
      the "reboot needed" indicator.  Also, the devlink 'stored version'
      information is designed to refer to the actual image version.
      
      Unfortunately, there is no method to determine the flash image version
      other than booting it, so remove the devlink stored version setting.
      Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ptp: ocp: Rename version string shown by devlink. · 1a052da9
      Jonathan Lemon committed
      The TimeCard has two FPGA images in the flash: the actual firmware,
      and a manufacturing fallback version which is intended to act as a
      loader in case the flash update failed.
      
      Name these "fw" and "loader", which are reflected in devlink:
      
          [root@timecard drv]# devlink dev info
          pci/0000:04:00.0:
            driver ptp_ocp
            serial_number fc:c2:3d:2e:d7:c0
            versions:
                running:
                  fw 5
      Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ptp: ocp: Use 'gnss' naming instead of 'gps' · ef0cfb34
      Jonathan Lemon committed
      GPS is not the only available positioning system.  Use the generic
      naming of "GNSS" instead.
      Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ptp: ocp: Remove devlink health and unused parameters. · 37a156ba
      Jonathan Lemon committed
      "devlink health" was used as a way to monitor the GNSS signal
      status.  This isn't really the intended use, and the same
      functionality can be achieved by monitoring the status file.
      
      Remove the devlink health support entirely, and also remove the
      currently unused devlink parameters.
      Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ptp: ocp: Add the mapping for the external PPS registers. · 0d43d4f2
      Jonathan Lemon committed
      There are two PPS blocks: one handles the external PPS signal output,
      with the other handling the PPS signal input to the internal clock.
      Add controls for the external PPS block.
      
      Rename the fields so they match their function.
      
      Add cable_delay to the register definitions.
      Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ptp: ocp: Fix the error handling path for the class device. · d12f23fa
      Jonathan Lemon committed
      Move the put_device() call to the error handling path, so the
      device is released after the .release callback, avoiding a
      use-after-free.
      Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ethtool: return error from ethnl_ops_begin if dev is NULL · 596690e9
      Heiner Kallweit committed
      Julian reported that after d43c65b0 Coverity complains about a
      missing check whether dev is NULL in ethnl_ops_complete().
      There doesn't seem to be any valid case where dev could be NULL when
      calling ethnl_ops_begin(), therefore return an error if dev is NULL.
      
      Fixes: d43c65b0 ("ethtool: runtime-resume netdev parent in ethnl_ops_begin")
      Reported-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netdevsim: Protect both reload_down and reload_up paths · 5c0418ed
      Leon Romanovsky committed
      Don't progress with adding and deleting ports as long as devlink
      reload is running.
      
      Fixes: 23809a72 ("netdevsim: Forbid devlink reload when adding or deleting ports")
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'cpsw-emac-skb_put_padto' · a5516053
      David S. Miller committed
      Grygorii Strashko says:
      
      ====================
      net: ethernet: ti: cpsw/emac: switch to use skb_put_padto()
      
      Currently, frame padding in TI CPSW/EMAC is implemented in a somewhat
      entangled way: the frame SKB is padded in the drivers (without updating
      skb->len), while the frame length is fixed up in CPDMA. Things became even
      more confusing because the CPSW switchdev driver needs to perform min
      TX frame length correction in switch mode [1].
      
      To avoid further confusion, make the xmit path clearer and more linear, and
      avoid updating the CPDMA configuration interface for min TX frame length
      correction (which is not CPDMA's job in general), this series switches the
      TI CPSW/EMAC drivers to skb_put_padto() instead of skb_padto() in their xmit
      path, so that skb->len is also updated properly, and then removes the TX
      frame length fixup from the CPDMA code.
      
      [1] https://patchwork.kernel.org/project/netdevbpf/patch/20210611132732.10690-1-grygorii.strashko@ti.com/
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: ti: davinci_cpdma: drop frame padding · 9ffc513f
      Grygorii Strashko committed
      Since all users of davinci_cpdma have switched to skb_put_padto(), the
      frame padding can be removed from it.
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: ti: davinci_emac: switch to use skb_put_padto() · 61e7a22d
      Grygorii Strashko committed
      Use skb_put_padto() instead of skb_padto() so that skb->len also gets
      updated, in preparation for removing frame padding from cpdma.
      It also makes the xmit path clearer and more linear.
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: ti: cpsw: switch to use skb_put_padto() · 1f88d5d5
      Grygorii Strashko committed
      Use skb_put_padto() instead of skb_padto() so that skb->len also gets
      updated, in preparation for removing frame padding from cpdma.
      It also makes the xmit path clearer and more linear.
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
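The difference the series relies on, namely that skb_padto() pads the buffer without touching skb->len while skb_put_padto() pads and also extends skb->len, can be shown with a toy buffer. `Skb`, `padto_model()` and `put_padto_model()` are hypothetical Python stand-ins named after the kernel helpers; they model only the length bookkeeping.

```python
class Skb:
    """Toy socket buffer: backing bytes plus the advertised length."""

    def __init__(self, payload: bytes):
        self.data = bytearray(payload)  # may later include tail padding
        self.len = len(payload)         # what the rest of the stack sees


def padto_model(skb, min_len):
    """Zero-pad the tail up to min_len but leave skb.len unchanged."""
    missing = min_len - len(skb.data)
    if missing > 0:
        skb.data.extend(bytes(missing))


def put_padto_model(skb, min_len):
    """Zero-pad the tail and extend skb.len to cover the padding."""
    if skb.len < min_len:
        padto_model(skb, min_len)
        skb.len = min_len
```

With the first helper a later stage (CPDMA, in the original code) has to correct the length itself; with the second, the length is already right when the frame leaves the driver's xmit path.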
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 0ca8d3ca
      Jakub Kicinski committed
      Build failure in drivers/net/wwan/mhi_wwan_mbim.c:
      add missing parameter (0, assuming we don't want buffer pre-alloc).
      
      Conflict in drivers/net/dsa/sja1105/sja1105_main.c between:
        589918df ("net: dsa: sja1105: be stateless with FDB entries on SJA1105P/Q/R/S/SJA1110 too")
        0fac6aa0 ("net: dsa: sja1105: delete the best_effort_vlan_filtering mode")
      
      Follow the instructions from the commit message of the former commit
      - removed the if conditions. When looking at commit 589918df ("net:
      dsa: sja1105: be stateless with FDB entries on SJA1105P/Q/R/S/SJA1110 too")
      note that the mask_iotag fields get removed by the following patch.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'net-5.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 902e7f37
      Linus Torvalds committed
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from ipsec.
      
        Current release - regressions:
      
         - sched: taprio: fix init procedure to avoid inf loop when dumping
      
         - sctp: move the active_key update after sh_keys is added
      
        Current release - new code bugs:
      
         - sparx5: fix build with old GCC & bitmask on 32-bit targets
      
        Previous releases - regressions:
      
         - xfrm: redo the PREEMPT_RT RCU vs hash_resize_mutex deadlock fix
      
         - xfrm: fixes for the compat netlink attribute translator
      
         - phy: micrel: Fix detection of ksz87xx switch
      
        Previous releases - always broken:
      
         - gro: set inner transport header offset in tcp/udp GRO hook to avoid
           crashes when such packets reach GSO
      
         - vsock: handle VIRTIO_VSOCK_OP_CREDIT_REQUEST, as required by spec
      
         - dsa: sja1105: fix static FDB entries on SJA1105P/Q/R/S and SJA1110
      
         - bridge: validate the NUD_PERMANENT bit when adding an extern_learn
           FDB entry
      
         - usb: lan78xx: don't modify phy_device state concurrently
      
         - usb: pegasus: check for errors of IO routines"
      
      * tag 'net-5.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (48 commits)
        net: vxge: fix use-after-free in vxge_device_unregister
        net: fec: fix use-after-free in fec_drv_remove
        net: pegasus: fix uninit-value in get_interrupt_interval
        net: ethernet: ti: am65-cpsw: fix crash in am65_cpsw_port_offload_fwd_mark_update()
        bnx2x: fix an error code in bnx2x_nic_load()
        net: wwan: iosm: fix recursive lock acquire in unregister
        net: wwan: iosm: correct data protocol mask bit
        net: wwan: iosm: endianness type correction
        net: wwan: iosm: fix lkp buildbot warning
        net: usb: lan78xx: don't modify phy_device state concurrently
        docs: networking: netdevsim rules
        net: usb: pegasus: Remove the changelog and DRIVER_VERSION.
        net: usb: pegasus: Check the return value of get_geristers() and friends;
        net/prestera: Fix devlink groups leakage in error flow
        net: sched: fix lockdep_set_class() typo error for sch->seqlock
        net: dsa: qca: ar9331: reorder MDIO write sequence
        VSOCK: handle VIRTIO_VSOCK_OP_CREDIT_REQUEST
        mptcp: drop unused rcu member in mptcp_pm_addr_entry
        net: ipv6: fix returned variable type in ip6_skb_dst_mtu
        nfp: update ethtool reporting of pauseframe control
        ...
    • Bluetooth: defer cleanup of resources in hci_unregister_dev() · e0448092
      Tetsuo Handa committed
      syzbot is hitting might_sleep() warning at hci_sock_dev_event() due to
      calling lock_sock() with rw spinlock held [1].
      
      It seems that the history of this locking problem is one of trial and error.
      
      Commit b40df574 ("[PATCH] bluetooth: fix socket locking in
      hci_sock_dev_event()") in 2.6.21-rc4 changed bh_lock_sock() to
      lock_sock() as an attempt to fix lockdep warning.
      
      Then, commit 4ce61d1c ("[BLUETOOTH]: Fix locking in
      hci_sock_dev_event().") in 2.6.22-rc2 changed lock_sock() to
      local_bh_disable() + bh_lock_sock_nested() as an attempt to fix the
      sleep in atomic context warning.
      
      Then, commit 4b5dd696 ("Bluetooth: Remove local_bh_disable() from
      hci_sock.c") in 3.3-rc1 removed local_bh_disable().
      
      Then, commit e305509e ("Bluetooth: use correct lock to prevent UAF
      of hdev object") in 5.13-rc5 again changed bh_lock_sock_nested() to
      lock_sock() as an attempt to fix CVE-2021-3573.
      
      This difficulty comes from current implementation that
      hci_sock_dev_event(HCI_DEV_UNREG) is responsible for dropping all
      references from sockets because hci_unregister_dev() immediately
      reclaims resources as soon as returning from
      hci_sock_dev_event(HCI_DEV_UNREG).
      
      But the history suggests that hci_sock_dev_event(HCI_DEV_UNREG) was not
      doing what it should do.
      
      Therefore, instead of trying to detach sockets from device, let's accept
      not detaching sockets from device at hci_sock_dev_event(HCI_DEV_UNREG),
      by moving actual cleanup of resources from hci_unregister_dev() to
      hci_cleanup_dev() which is called by bt_host_release() when all
      references to this unregistered device (which is a kobject) are gone.
      
      Since hci_sock_dev_event(HCI_DEV_UNREG) no longer resets
      hci_pi(sk)->hdev, we need to check whether this device was unregistered
      and return an error based on HCI_UNREGISTER flag.  There might be subtle
      behavioral difference in "monitor the hdev" functionality; please report
      if you found something went wrong due to this patch.
      
      Link: https://syzkaller.appspot.com/bug?extid=a5df189917e79d5e59c9 [1]
      Reported-by: syzbot <syzbot+a5df189917e79d5e59c9@syzkaller.appspotmail.com>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Fixes: e305509e ("Bluetooth: use correct lock to prevent UAF of hdev object")
      Acked-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
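The approach in the commit above (leave sockets attached at unregister time, mark the device as unregistered, and free resources only when the last reference is dropped) can be sketched with a reference-counted toy object. `Device`, `Sock`, `DeviceGone` and `send_command()` are hypothetical names; the specific error the real code returns is not stated in the message, so a generic exception is used here.

```python
class DeviceGone(Exception):
    """Raised when a socket touches an already unregistered device."""


class Device:
    def __init__(self):
        self.refcount = 1          # reference held by the registering driver
        self.unregistered = False  # plays the role of the HCI_UNREGISTER flag
        self.cleaned_up = False

    def get(self):
        self.refcount += 1

    def put(self):
        self.refcount -= 1
        if self.refcount == 0:
            # Release callback: actual cleanup happens only here.
            self.cleaned_up = True

    def unregister(self):
        # Mark the device and drop the driver's reference; no forced detach.
        self.unregistered = True
        self.put()


class Sock:
    def __init__(self, dev):
        self.dev = dev
        dev.get()

    def send_command(self):
        # Sockets stay attached, so every operation must check the flag.
        if self.dev.unregistered:
            raise DeviceGone()
        return "ok"

    def close(self):
        self.dev.put()
```

Cleanup then no longer depends on walking and locking every socket during unregistration, which is the step the earlier fixes kept getting wrong.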
    • Merge tag 'selinux-pr-20210805' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux · 0b53abfc
      Linus Torvalds committed
      Pull selinux fix from Paul Moore:
       "One small SELinux fix for a problem where an error code was not being
        propagated back up to userspace when a bogus SELinux policy is loaded
        into the kernel"
      
      * tag 'selinux-pr-20210805' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
        selinux: correct the return value when loads initial sids
    • Merge branch 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace · 6209049e
      Linus Torvalds committed
      Pull ucounts fix from Eric Biederman:
       "Fix a subtle locking versus reference counting bug in the ucount
        changes, found by syzbot"
      
      * 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
        ucounts: Fix race condition between alloc_ucounts and put_ucounts