1. 09 Apr, 2021 40 commits
    • B
      net: ethernet: mtk-star-emac: fix wrong unmap in RX handling · 8f07278d
      Biao Huang committed
      stable inclusion
      from stable-5.10.24
      commit fa0bc09db49bf4875d9a8c88813fe2b87c1059bb
      bugzilla: 51348
      
      --------------------------------
      
      commit 95b39f07 upstream.
      
      mtk_star_dma_unmap_rx() should unmap the dma_addr of old skb rather than
      that of new skb.
      Assign new_dma_addr to desc_data.dma_addr after all handling of old skb
      ends to avoid unexpected receive side error.
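
The ordering described above can be sketched in user-space C. This is only a model of the idea, not the driver code: `struct desc_data` is reduced to a single field, and `fake_unmap()` and `rx_replace_buffer()` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's per-descriptor bookkeeping. */
struct desc_data {
	uint64_t dma_addr;	/* DMA address of the buffer currently in the slot */
};

/* Records the last address handed to the (fake) unmap routine. */
static uint64_t last_unmapped;

static void fake_unmap(const struct desc_data *d)
{
	last_unmapped = d->dma_addr;
}

/*
 * Swap the buffer in a ring slot. The old buffer is unmapped while
 * desc_data still holds its address; the new address is stored only
 * after all handling of the old buffer has finished.
 * Returns the address that was unmapped.
 */
static uint64_t rx_replace_buffer(struct desc_data *d, uint64_t new_dma_addr)
{
	assert(d);
	fake_unmap(d);			/* unmaps the OLD buffer */
	d->dma_addr = new_dma_addr;	/* assign last */
	return last_unmapped;
}
```

Swapping the two statements in `rx_replace_buffer()` would unmap the freshly mapped buffer instead, which is the kind of mistake the commit message describes.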
      
      Fixes: f96e9641 ("net: ethernet: mtk-star-emac: fix error path in RX handling")
      Signed-off-by: Biao Huang <biao.huang@mediatek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • V
      net: enetc: keep RX ring consumer index in sync with hardware · 684db2b4
      Vladimir Oltean committed
      stable inclusion
      from stable-5.10.24
      commit 1cdd008902d4e32f270e8fdb3239db6412f0a90b
      bugzilla: 51348
      
      --------------------------------
      
      commit 3a5d12c9 upstream.
      
      The RX rings have a producer index owned by hardware, where newly
      received frame buffers are placed, and a consumer index owned by
      software, where newly allocated buffers are placed, in expectation of
      hardware being able to place frame data in them.
      
      Hardware increments the producer index when a frame is received, however
      it is not allowed to increment the producer index to match the consumer
      index (RBCIR) since the ring can hold at most RBLENR[LENGTH]-1 received
      BDs. Whenever the producer index matches the value of the consumer
      index, the ring has no unprocessed received frames and all BDs in the
      ring have been initialized/prepared by software, i.e. hardware owns all
      BDs in the ring.
      
      The code uses the next_to_clean variable to keep track of the producer
      index, and the next_to_use variable to keep track of the consumer index.
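
The ownership rule in the paragraphs above can be written down as two small predicates. This is a sketch under the assumption of a 512-entry ring (the size used later in the text); `hw_free_bds()` and `hw_can_receive()` are hypothetical helper names, not driver functions.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_LEN 512u	/* RBLENR[LENGTH] in the example from the text */

/*
 * Number of BDs hardware may still fill before it has to stop.
 * pi == ci      -> hardware owns all BDs, RING_LEN - 1 frames fit.
 * pi + 1 == ci  -> hardware may not advance onto the consumer index.
 */
static unsigned int hw_free_bds(unsigned int pi, unsigned int ci)
{
	assert(pi < RING_LEN && ci < RING_LEN);
	return (ci + RING_LEN - pi - 1) % RING_LEN;
}

/* True while hardware is still allowed to place a frame in the ring. */
static bool hw_can_receive(unsigned int pi, unsigned int ci)
{
	return hw_free_bds(pi, ci) != 0;
}
```

With both indices at their reset value of zero, `hw_free_bds(0, 0)` is 511, which is why the unsynchronized initial seeding appears to work.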
      
      The RX rings are seeded from enetc_refill_rx_ring, which is called from
      two places:
      
      1. initially the ring is seeded until full with enetc_bd_unused(rx_ring),
         i.e. with 511 buffers. This will make next_to_clean=0 and next_to_use=511:
      
      .ndo_open
      -> enetc_open
         -> enetc_setup_bdrs
            -> enetc_setup_rxbdr
               -> enetc_refill_rx_ring
      
      2. then during the data path processing, it is refilled with 16 buffers
         at a time:
      
      enetc_msix
      -> napi_schedule
         -> enetc_poll
            -> enetc_clean_rx_ring
               -> enetc_refill_rx_ring
      
      There is just one problem: the initial seeding done during .ndo_open
      updates just the producer index (ENETC_RBPIR) with 0, and the software
      next_to_clean and next_to_use variables. Notably, it will not update the
      consumer index to make the hardware aware of the newly added buffers.
      
      Wait, what? So how does it work?
      
      Well, the reset values of the producer index and of the consumer index
      of a ring are both zero. As per the description in the second paragraph,
      it means that the ring is full of buffers waiting for hardware to put
      frames in them, which by coincidence is almost true, because we have in
      fact seeded 511 buffers into the ring.
      
      But will the hardware attempt to access the 512th entry of the ring,
      which has an invalid BD in it? Well, no, because in order to do that, it
      would have to first populate the first 511 entries, and the NAPI
      enetc_poll will kick in by then. Eventually, after 16 processed slots
      have become available in the RX ring, enetc_clean_rx_ring will call
      enetc_refill_rx_ring and then will [ finally ] update the consumer index
      with the new software next_to_use variable. From now on, the
      next_to_clean and next_to_use variables are in sync with the producer
      and consumer ring indices.
      
      So the day is saved, right? Well, not quite. Freeing the memory
      allocated for the rings is done in:
      
      enetc_close
      -> enetc_clear_bdrs
         -> enetc_clear_rxbdr
            -> this just disables the ring
      -> enetc_free_rxtx_rings
         -> enetc_free_rx_ring
            -> sets next_to_clean and next_to_use to 0
      
      but again, nothing is committed to the hardware producer and consumer
      indices (yay!). The assumption is that the ring is disabled, so the
      indices don't matter anyway, and it's the responsibility of the "open"
      code path to set those up.
      
      .. Except that the "open" code path does not set those up properly.
      
      While initially, things almost work, during subsequent enetc_close ->
      enetc_open sequences, we have problems. To be precise, the enetc_open
      that is subsequent to enetc_close will again refill the ring with 511
      entries, but it will leave the consumer index untouched. Untouched
      means, of course, equal to the value it had before disabling the ring
      and draining the old buffers in enetc_close.
      
      But as mentioned, enetc_setup_rxbdr will at least update the producer
      index though, through this line of code:
      
      	enetc_rxbdr_wr(hw, idx, ENETC_RBPIR, 0);
      
      so at this stage we'll have:
      
      next_to_clean=0 (in hardware 0)
      next_to_use=511 (in hardware we'll have the refill index prior to enetc_close)
      
      Again, the next_to_clean and producer index are in sync and set to
      correct values, so the driver manages to limp on. Eventually, 16 ring
      entries will be consumed by enetc_poll, and the savior
      enetc_clean_rx_ring will come and call enetc_refill_rx_ring, and then
      update the hardware consumer ring based upon the new next_to_use.
      
      So.. it works?
      Well, by coincidence, it almost does, but there's a circumstance where
      enetc_clean_rx_ring won't be there to save us. If the previous value of
      the consumer index was 15, there's a problem, because the NAPI poll
      sequence will only issue a refill when 16 or more buffers have been
      consumed.
      
      It's easiest to illustrate this with an example:
      
      ip link set eno0 up
      ip addr add 192.168.100.1/24 dev eno0
      ping 192.168.100.1 -c 20 # ping this port from another board
      ip link set eno0 down
      ip link set eno0 up
      ping 192.168.100.1 -c 20 # ping it again from the same other board
      
      One by one:
      
      1. ip link set eno0 up
      -> calls enetc_setup_rxbdr:
         -> calls enetc_refill_rx_ring(511 buffers)
         -> next_to_clean=0 (in hw 0)
         -> next_to_use=511 (in hw 0)
      
      2. ping 192.168.100.1 -c 20 # ping this port from another board
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=1 next_to_clean 0 (in hw 1) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=2 next_to_clean 1 (in hw 2) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=3 next_to_clean 2 (in hw 3) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=4 next_to_clean 3 (in hw 4) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=5 next_to_clean 4 (in hw 5) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=6 next_to_clean 5 (in hw 6) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=7 next_to_clean 6 (in hw 7) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=8 next_to_clean 7 (in hw 8) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=9 next_to_clean 8 (in hw 9) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=10 next_to_clean 9 (in hw 10) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=11 next_to_clean 10 (in hw 11) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=12 next_to_clean 11 (in hw 12) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=13 next_to_clean 12 (in hw 13) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=14 next_to_clean 13 (in hw 14) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=15 next_to_clean 14 (in hw 15) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: enetc_refill_rx_ring(16) increments next_to_use by 16 (mod 512) and writes it to hw
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=0 next_to_clean 15 (in hw 16) next_to_use 15 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=1 next_to_clean 16 (in hw 17) next_to_use 15 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=2 next_to_clean 17 (in hw 18) next_to_use 15 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=3 next_to_clean 18 (in hw 19) next_to_use 15 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=4 next_to_clean 19 (in hw 20) next_to_use 15 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=5 next_to_clean 20 (in hw 21) next_to_use 15 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=6 next_to_clean 21 (in hw 22) next_to_use 15 (in hw 15)
      
      20 packets transmitted, 20 packets received, 0% packet loss
      
      3. ip link set eno0 down
      enetc_free_rx_ring: next_to_clean 0 (in hw 22), next_to_use 0 (in hw 15)
      
      4. ip link set eno0 up
      -> calls enetc_setup_rxbdr:
         -> calls enetc_refill_rx_ring(511 buffers)
         -> next_to_clean=0 (in hw 0)
         -> next_to_use=511 (in hw 15)
      
      5. ping 192.168.100.1 -c 20 # ping it again from the same other board
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=1 next_to_clean 0 (in hw 1) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=2 next_to_clean 1 (in hw 2) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=3 next_to_clean 2 (in hw 3) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=4 next_to_clean 3 (in hw 4) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=5 next_to_clean 4 (in hw 5) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=6 next_to_clean 5 (in hw 6) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=7 next_to_clean 6 (in hw 7) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=8 next_to_clean 7 (in hw 8) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=9 next_to_clean 8 (in hw 9) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=10 next_to_clean 9 (in hw 10) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=11 next_to_clean 10 (in hw 11) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=12 next_to_clean 11 (in hw 12) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=13 next_to_clean 12 (in hw 13) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=14 next_to_clean 13 (in hw 14) next_to_use 511 (in hw 15)
      
      20 packets transmitted, 12 packets received, 40% packet loss
      
      And there it dies. No enetc_refill_rx_ring (because cleaned_cnt must be equal
      to 15 for that to happen), no nothing. The hardware enters the condition where
      the producer (14) + 1 is equal to the consumer (15) index, which makes it
      believe it has no more free buffers to put packets in, so it starts discarding
      them:
      
      ip netns exec ns0 ethtool -S eno0 | grep -v ': 0'
      NIC statistics:
           Rx ring  0 discarded frames: 8
      
      In summary, if the interface receives between 16 and 32 (mod 512) frames
      and then there is a link flap, then the port will eventually die with no
      way to recover. If it receives fewer than 16 (mod 512) frames, then the
      initial NAPI poll [ before the link flap ] will not update the consumer
      index in hardware (it will remain zero) which will be ok when the buffers
      are later reinitialized. If more than 32 (mod 512) frames are received,
      the initial NAPI poll has the chance to refill the ring twice, updating
      the consumer index to at least 32. So after the link flap, the consumer
      index is still wrong, but the post-flap NAPI poll gets a chance to
      refill the ring once (because it passes through cleaned_cnt=15) and
      makes the consumer index be again back in sync with next_to_use.
      
      The solution to this problem is actually simple: we just need to write
      next_to_use into the hardware consumer index at enetc_open time, which
      always brings it back in sync after an initial buffer seeding process.
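
The failure and the fix can be reproduced with a small user-space model of the ring indices. It is a simplification, not the driver: one frame is processed per poll, a refill of 16 buffers happens as soon as 16 buffers have been consumed, and all names (`struct rx_ring_model`, `ring_open()`, `ring_close()`, `ring_receive()`, `frames_after_flap()`) are hypothetical. The `sync_ci` flag stands for the proposed write of next_to_use to the consumer index at open time.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_LEN	512u	/* BDs per RX ring, as in the text */
#define REFILL_BUNDLE	16u	/* buffers handed back per refill */

struct rx_ring_model {
	unsigned int next_to_clean;	/* software view of the producer index */
	unsigned int next_to_use;	/* software view of the consumer index */
	unsigned int hw_pi;		/* hardware producer index (RBPIR) */
	unsigned int hw_ci;		/* hardware consumer index (RBCIR) */
	unsigned int cleaned_cnt;	/* consumed buffers not yet refilled */
};

static void ring_open(struct rx_ring_model *r, bool sync_ci)
{
	r->next_to_clean = 0;
	r->next_to_use = RING_LEN - 1;	/* 511 buffers seeded */
	r->cleaned_cnt = 0;
	r->hw_pi = 0;			/* the existing RBPIR write */
	if (sync_ci)
		r->hw_ci = r->next_to_use;	/* the fix: commit next_to_use */
}

static void ring_close(struct rx_ring_model *r)
{
	/* Only the software indices are reset; hardware keeps stale values. */
	r->next_to_clean = 0;
	r->next_to_use = 0;
}

/* Offer `frames` frames to the ring; returns how many were delivered. */
static unsigned int ring_receive(struct rx_ring_model *r, unsigned int frames)
{
	unsigned int delivered = 0;

	assert(r->hw_pi < RING_LEN && r->hw_ci < RING_LEN);
	for (unsigned int i = 0; i < frames; i++) {
		/* Hardware may not advance the producer onto the consumer. */
		if ((r->hw_pi + 1) % RING_LEN == r->hw_ci)
			continue;	/* frame discarded */
		r->hw_pi = (r->hw_pi + 1) % RING_LEN;

		/* The NAPI poll consumes the frame. */
		r->next_to_clean = (r->next_to_clean + 1) % RING_LEN;
		delivered++;
		if (++r->cleaned_cnt >= REFILL_BUNDLE) {
			r->next_to_use = (r->next_to_use + REFILL_BUNDLE) % RING_LEN;
			r->hw_ci = r->next_to_use;	/* refill writes RBCIR */
			r->cleaned_cnt -= REFILL_BUNDLE;
		}
	}
	return delivered;
}

/* Up, receive `before` frames, link flap, then offer `after` frames. */
static unsigned int frames_after_flap(unsigned int before, unsigned int after,
				      bool sync_ci)
{
	struct rx_ring_model r = { 0 };	/* reset values: both hw indices 0 */

	ring_open(&r, sync_ci);
	(void)ring_receive(&r, before);
	ring_close(&r);
	ring_open(&r, sync_ci);
	return ring_receive(&r, after);
}
```

In this model, `frames_after_flap(22, 22, false)` delivers 14 frames and discards 8, matching the trace and the ethtool counter above, while `frames_after_flap(22, 22, true)` delivers all 22.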
      
      The simpler thing would be to put the write to the consumer index into
      enetc_refill_rx_ring directly, but there are issues with the MDIO
      locking: in the NAPI poll code we have the enetc_lock_mdio() taken from
      top-level and we use the unlocked enetc_wr_reg_hot, whereas in
      enetc_open, the enetc_lock_mdio() is not taken at the top level, but
      instead by each individual enetc_wr_reg, so we are forced to put an
      additional enetc_wr_reg in enetc_setup_rxbdr. Better organization of
      the code is left as a refactoring exercise.
      
      Fixes: d4fd0404 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • V
      net: enetc: remove bogus write to SIRXIDR from enetc_setup_rxbdr · 3829c6cf
      Vladimir Oltean committed
      stable inclusion
      from stable-5.10.24
      commit 5317365401119e88268d61691d298704ca7286c4
      bugzilla: 51348
      
      --------------------------------
      
      commit 96a5223b upstream.
      
      The Station Interface Receive Interrupt Detect Register (SIRXIDR)
      contains a 16-bit wide mask of 'interrupt detected' events for each ring
      associated with a port. Bit i is write-1-to-clear for RX ring i.
      
      I have no explanation whatsoever how this line of code came to be
      inserted in the blamed commit. I checked the downstream versions of that
      patch and none of them have it.
      
      The somewhat comical aspect of it is that we're writing a binary number
      to the SIRXIDR register, which is derived from enetc_bd_unused(rx_ring).
      Since the RX rings have 512 buffer descriptors, we end up writing 511 to
      this register, which is 0x1ff, so we are effectively clearing the
      'interrupt detected' event for rings 0-8.
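
The arithmetic is easy to check: treating the written value as a 16-bit per-ring mask, 511 has bits 0 through 8 set. The helper below is a hypothetical illustration and does not touch any register.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Interpret a 16-bit value as a per-ring bit mask.
 * Returns how many ring bits are set and stores the highest set ring
 * index in *highest (-1 if no bit is set).
 */
static int rings_cleared(uint16_t value, int *highest)
{
	int count = 0;

	assert(highest);
	*highest = -1;
	for (int ring = 0; ring < 16; ring++) {
		if (value & (1u << ring)) {
			count++;
			*highest = ring;
		}
	}
	return count;
}
```

For the value 511 this reports nine rings, with ring 8 as the highest one affected.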
      
      This register is not what is used for interrupt handling though - it
      only provides a summary for the entire SI. The hardware provides one
      separate Interrupt Detect Register per RX ring, which auto-clears upon
      read. So there doesn't seem to be any adverse effect caused by this
      bogus write.
      
      There is, however, one reason why this should be handled as a bugfix:
      next_to_clean _should_ be committed to hardware, just not to that
      register, and this was obscuring the fact that it wasn't. This is fixed
      in the next patch, and removing the bogus line now allows the fix patch
      to be backported beyond that point.
      
      Fixes: fd5736bf ("enetc: Workaround for MDIO register access issue")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • V
      net: enetc: force the RGMII speed and duplex instead of operating in inband mode · d0a90192
      Vladimir Oltean committed
      stable inclusion
      from stable-5.10.24
      commit 63876df5615edfe94291409eb862f4570e2f4ffc
      bugzilla: 51348
      
      --------------------------------
      
      commit c76a9721 upstream.
      
      The ENETC port 0 MAC supports in-band status signaling coming from a PHY
      when operating in RGMII mode, and this feature is enabled by default.
      
      It has been reported that RGMII is broken in fixed-link, and that is not
      surprising considering the fact that no PHY is attached to the MAC in
      that case, but a switch.
      
      This brings us to the topic of the patch: the enetc driver should not
      have enabled the optional in-band status signaling for RGMII unconditionally,
      but should have forced the speed and duplex to what was resolved by
      phylink.
      
      Note that phylink does not accept the RGMII modes as valid for in-band
      signaling, and these operate a bit differently than 1000base-x and SGMII
      (notably there is no clause 37 state machine so no ACK required from the
      MAC, instead the PHY sends extra code words on RXD[3:0] whenever it is
      not transmitting something else, so it should be safe to leave a PHY
      with this option unconditionally enabled even if we ignore it). The spec
      talks about this here:
      https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/138/RGMIIv1_5F00_3.pdf
      
      Fixes: 71b77a7a ("enetc: Migrate to PHYLINK and PCS_LYNX")
      Cc: Florian Fainelli <f.fainelli@gmail.com>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • V
      net: enetc: don't disable VLAN filtering in IFF_PROMISC mode · 4b000964
      Vladimir Oltean committed
      stable inclusion
      from stable-5.10.24
      commit 5732688c8411b1d29a3676819c279236b0a0ec5b
      bugzilla: 51348
      
      --------------------------------
      
      commit a74dbce9 upstream.
      
      Quoting from the blamed commit:
      
          In promiscuous mode, it is more intuitive that all traffic is received,
          including VLAN tagged traffic. It appears that it is necessary to set
          the flag in PSIPVMR for that to be the case, so VLAN promiscuous mode is
          also temporarily enabled. On exit from promiscuous mode, the setting
          made by ethtool is restored.
      
      Intuitive or not, there isn't any definition issued by a standards body
      which says that promiscuity has anything to do with VLAN filtering - it
      only has to do with accepting packets regardless of destination MAC address.
      
      In fact people are already trying to use this misunderstanding/bug of
      the enetc driver as a justification to transform promiscuity into
      something it never was about: accepting every packet (maybe that would
      be the "rx-all" netdev feature?):
      https://lore.kernel.org/netdev/20201110153958.ci5ekor3o2ekg3ky@ipetronik.com/
      
      This is relevant because there are use cases in the kernel (such as
      tc-flower rules with the protocol 802.1Q and a vlan_id key) which do not
      (yet) use the vlan_vid_add API to be compatible with VLAN-filtering NICs
      such as enetc, so for those, disabling rx-vlan-filter is currently the
      only right solution to make these setups work:
      https://lore.kernel.org/netdev/CA+h21hoxwRdhq4y+w8Kwgm74d4cA0xLeiHTrmT-VpSaM7obhkg@mail.gmail.com/
      The blamed patch has unintentionally introduced one more way for this to
      work, which is to enable IFF_PROMISC, however this is non-portable
      because port promiscuity is not meant to disable VLAN filtering.
      Therefore, it could invite people to write broken scripts for enetc, and
      then wonder why they are broken when migrating to other drivers that
      don't handle promiscuity in the same way.
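
The distinction argued for above can be modelled as two independent checks. This is a deliberately simplified sketch (unicast only, no multicast or broadcast handling), and `struct rx_filter` and `rx_accept()` are hypothetical names, not part of the enetc driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct rx_filter {
	uint8_t mac[6];			/* the port's own MAC address */
	bool promisc;			/* IFF_PROMISC: ignore destination MAC */
	bool vlan_filtering;		/* rx-vlan-filter enabled */
	bool vlan_allowed[4096];	/* VLAN IDs installed in the filter */
};

/* Decide whether a frame is accepted; vid < 0 means the frame is untagged. */
static bool rx_accept(const struct rx_filter *f, const uint8_t dmac[6], int vid)
{
	assert(vid < 4096);

	/* Promiscuity only relaxes the destination MAC check ... */
	bool mac_ok = f->promisc || memcmp(f->mac, dmac, 6) == 0;
	/* ... while VLAN filtering is decided independently of it. */
	bool vlan_ok = vid < 0 || !f->vlan_filtering || f->vlan_allowed[vid];

	return mac_ok && vlan_ok;
}
```

In this model a promiscuous port with rx-vlan-filter enabled still drops frames for VLAN IDs that were never installed, which is the behavior the patch restores.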
      
      Fixes: 7070eea5 ("enetc: permit configuration of rx-vlan-filter with ethtool")
      Cc: Markus Blöchl <Markus.Bloechl@ipetronik.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • V
      net: enetc: fix incorrect TPID when receiving 802.1ad tagged packets · fded4910
      Vladimir Oltean committed
      stable inclusion
      from stable-5.10.24
      commit d56e3f8d289bdc70378f84efab166ad38022532e
      bugzilla: 51348
      
      --------------------------------
      
      commit 827b6fd0 upstream.
      
      When the enetc ports have rx-vlan-offload enabled, they report a TPID of
      ETH_P_8021Q regardless of what was actually in the packet. When
      rx-vlan-offload is disabled, packets have the proper TPID. Fix this
      inconsistency by finishing the TODO left in the code.
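
A minimal sketch of the idea follows: report the TPID that the stripped tag actually carried rather than a fixed one. The selector enum is a hypothetical encoding made up for illustration and is not the layout of the enetc RX descriptor; the two TPID values are the standard ones for 802.1Q and 802.1ad.

```c
#include <assert.h>
#include <stdint.h>

#define ETH_P_8021Q	0x8100	/* IEEE 802.1Q (C-VLAN) TPID */
#define ETH_P_8021AD	0x88A8	/* IEEE 802.1ad (S-VLAN) TPID */

/* Hypothetical descriptor field telling which TPID the stripped tag had. */
enum rx_tpid_sel {
	RX_TPID_8021Q = 0,
	RX_TPID_8021AD = 1,
};

/* Map the selector to the TPID reported together with the VLAN tag. */
static uint16_t rx_tag_tpid(enum rx_tpid_sel sel)
{
	assert(sel == RX_TPID_8021Q || sel == RX_TPID_8021AD);

	switch (sel) {
	case RX_TPID_8021AD:
		return ETH_P_8021AD;
	case RX_TPID_8021Q:
	default:
		return ETH_P_8021Q;
	}
}
```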
      
      Fixes: d4fd0404 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • V
      net: enetc: take the MDIO lock only once per NAPI poll cycle · cfce9813
      Vladimir Oltean committed
      stable inclusion
      from stable-5.10.24
      commit bf9c564716a13dde6a990d3b02c27cd6e39608bf
      bugzilla: 51348
      
      --------------------------------
      
      commit 6d36ecdb upstream.
      
      The workaround for the ENETC MDIO erratum caused a performance
      degradation of 82 Kpps (seen with IP forwarding of two 1Gbps streams of
      64B packets). This is due to excessive locking and unlocking in the fast
      path, which can be avoided.
      
      By taking the MDIO read-side lock only once per NAPI poll cycle, we are
      able to regain 54 Kpps (65%) of the performance hit. The rest of the
      performance degradation comes from the TX data path, but unfortunately
      it doesn't look like we can optimize that away easily, even with
      netdev_xmit_more(), there just isn't any skb batching done, to help with
      taking the MDIO lock less often than once per packet.
      
      We need to change the register accessor type for enetc_get_tx_tstamp,
      because it now runs under the enetc_lock_mdio as per the new call path
      detailed below:
      
      enetc_msix
      -> napi_schedule
         -> enetc_poll
            -> enetc_lock_mdio
            -> enetc_clean_tx_ring
               -> enetc_get_tx_tstamp
            -> enetc_clean_rx_ring
            -> enetc_unlock_mdio
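
The difference between locking per register access and locking once per poll can be illustrated with a counter. This is a toy model: the lock is a plain flag rather than the real read-side lock, and `mdio_lock()`, `reg_access()`, `reg_access_hot()` and the two poll functions are hypothetical names.

```c
#include <assert.h>

static unsigned long lock_acquisitions;
static int lock_held;

static void mdio_lock(void)
{
	assert(!lock_held);	/* the model lock is not recursive */
	lock_held = 1;
	lock_acquisitions++;
}

static void mdio_unlock(void)
{
	assert(lock_held);
	lock_held = 0;
}

/* "Hot" accessor: the caller must already hold the lock. */
static void reg_access_hot(void)
{
	assert(lock_held);
}

/* Self-locking accessor: takes and drops the lock around each access. */
static void reg_access(void)
{
	mdio_lock();
	reg_access_hot();
	mdio_unlock();
}

/* Poll that locks around every register access. */
static unsigned long poll_lock_per_access(int frames)
{
	lock_acquisitions = 0;
	for (int i = 0; i < frames; i++)
		reg_access();
	return lock_acquisitions;
}

/* Poll that takes the lock once and uses hot accessors inside. */
static unsigned long poll_lock_once(int frames)
{
	lock_acquisitions = 0;
	mdio_lock();
	for (int i = 0; i < frames; i++)
		reg_access_hot();
	mdio_unlock();
	return lock_acquisitions;
}
```

The model also hints at why an accessor type has to change: calling the self-locking `reg_access()` while the poll already holds the lock would trip the assertion in `mdio_lock()`.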
      
      Fixes: fd5736bf ("enetc: Workaround for MDIO register access issue")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • V
      net: enetc: don't overwrite the RSS indirection table when initializing · 88db41d4
      Vladimir Oltean committed
      stable inclusion
      from stable-5.10.24
      commit dfaf418dfff819aaa5e6a945bb8efd38d53b6eb9
      bugzilla: 51348
      
      --------------------------------
      
      commit c646d10d upstream.
      
      After the blamed patch, all RX traffic gets hashed to CPU 0 because the
      hashing indirection table set up in:
      
      enetc_pf_probe
      -> enetc_alloc_si_resources
         -> enetc_configure_si
            -> enetc_setup_default_rss_table
      
      is overwritten later in:
      
      enetc_pf_probe
      -> enetc_init_port_rss_memory
      
      which zero-initializes the entire port RSS table in order to avoid ECC errors.
      
      The trouble really is that enetc_init_port_rss_memory really needs
      enetc_alloc_si_resources to be called, because it depends upon
      enetc_alloc_cbdr and enetc_setup_cbdr. But that whole enetc_configure_si
      thing could have been better thought out, it has nothing to do in a
      function called "alloc_si_resources", especially since its counterpart,
      "free_si_resources", does nothing to unwind the configuration of the SI.
      
      The point is, we need to pull enetc_configure_si out of
      enetc_alloc_si_resources, and move it after enetc_init_port_rss_memory.
      This allows us to set up the default RSS indirection table after
      initializing the memory.
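
The ordering problem can be shown with a toy indirection table. The table size of 64 entries and all function names below are assumptions made for this sketch; only the order of the two steps mirrors the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define RSS_ENTRIES 64	/* hypothetical table size */

static unsigned char rss_table[RSS_ENTRIES];

/* Zero the whole table (done in the text to avoid ECC errors). */
static void init_port_rss_memory(void)
{
	memset(rss_table, 0, sizeof(rss_table));
}

/* Spread the entries round-robin over the RX rings. */
static void setup_default_rss_table(int num_rx_rings)
{
	assert(num_rx_rings > 0 && num_rx_rings <= 256);
	for (size_t i = 0; i < RSS_ENTRIES; i++)
		rss_table[i] = (unsigned char)(i % (size_t)num_rx_rings);
}

/* Count how many distinct rings the table currently points at. */
static int rings_in_use(void)
{
	bool seen[256] = { false };
	int distinct = 0;

	for (size_t i = 0; i < RSS_ENTRIES; i++) {
		if (!seen[rss_table[i]]) {
			seen[rss_table[i]] = true;
			distinct++;
		}
	}
	return distinct;
}

/* Run the two steps in either order and report the resulting spread. */
static int probe(int num_rx_rings, bool configure_last)
{
	if (!configure_last)
		setup_default_rss_table(num_rx_rings);	/* old order */
	init_port_rss_memory();
	if (configure_last)
		setup_default_rss_table(num_rx_rings);	/* new order */
	return rings_in_use();
}
```

With four rings, the old order leaves every entry pointing at ring 0, while the new order uses all four.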
      
      Fixes: 07bf34a5 ("net: enetc: initialize the RFS and RSS memories")
      Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • S
      sh_eth: fix TRSCER mask for SH771x · 2b4b2b80
      Sergey Shtylyov committed
      stable inclusion
      from stable-5.10.24
      commit 4ea379733555d652acadb05112a3365e5059f6f4
      bugzilla: 51348
      
      --------------------------------
      
      commit 8c91bc3d upstream.
      
      According to the SH7710, SH7712, SH7713 Group User's Manual: Hardware,
      Rev. 3.00, the TRSCER register actually has only bit 7 valid (and named
      differently), with all the other bits reserved. Apparently, this was not
      the case with some early revisions of the manual as we have the other
      bits declared (and set) in the original driver. Follow suit and add
      the explicit sh_eth_cpu_data::trscer_err_mask initializer for SH771x...
      
      Fixes: 86a74ff2 ("net: sh_eth: add support for Renesas SuperH Ethernet")
      Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • D
      net: dsa: tag_rtl4_a: fix egress tags · 8de05457
      DENG Qingfang committed
      stable inclusion
      from stable-5.10.24
      commit 68277f69a8734a444a05dce9f78ce79c1225d08d
      bugzilla: 51348
      
      --------------------------------
      
      commit 9eb8bc59 upstream.
      
      Commit 86dd9868 has several issues, but was accepted too soon
      before anyone could take a look.
      
      - Double free. dsa_slave_xmit() will free the skb if the xmit function
        returns NULL, but the skb is already freed by eth_skb_pad(). Use
        __skb_put_padto() to avoid that.
      - Unnecessary allocation. It has been done by DSA core since commit
        a3b0b647.
      - A u16 pointer points to skb data. It should be __be16 for network
        byte order.
      - Typo in comments. "numer" -> "number".
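
The byte-order point in the list above can be illustrated outside the kernel. The sketch below uses the POSIX `htons()` in place of the kernel's __be16 helpers, and `write_tag16()` is a hypothetical function, not code from tag_rtl4_a.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Store a 16-bit value into packet data in network (big-endian) byte
 * order, independent of the host's endianness.
 */
static void write_tag16(uint8_t *dst, uint16_t host_value)
{
	uint16_t be = htons(host_value);	/* host -> network byte order */

	assert(dst);
	memcpy(dst, &be, sizeof(be));		/* also safe for unaligned dst */
}
```

Writing the host-order value through a plain u16 pointer would produce swapped bytes on little-endian machines, which is why the type annotation matters.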
      
      Fixes: 86dd9868 ("net: dsa: tag_rtl4_a: Support also egress tags")
      Signed-off-by: DENG Qingfang <dqfext@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • J
      docs: networking: drop special stable handling · 8157b471
      Jakub Kicinski committed
      stable inclusion
      from stable-5.10.24
      commit 389055e7b97048c7ecd6066cdac2c703bae493bc
      bugzilla: 51348
      
      --------------------------------
      
      commit dbbe7c96 upstream.
      
      Leave it to Greg.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • L
      Revert "mm, slub: consider rest of partial list if acquire_slab() fails" · b36ca11e
      Linus Torvalds committed
      stable inclusion
      from stable-5.10.24
      commit e1759160877a06082a9323dfb9437abfbe4af2d3
      bugzilla: 51348
      
      --------------------------------
      
      commit 9b1ea29b upstream.
      
      This reverts commit 8ff60eb0.
      
      The kernel test robot reports a huge performance regression due to the
      commit, and the reason seems fairly straightforward: when there is
      contention on the page list (which is what causes acquire_slab() to
      fail), we do _not_ want to just loop and try again, because that will
      transfer the contention to the 'n->list_lock' spinlock we hold, and
      just make things even worse.
      
      This is admittedly likely a problem only on big machines - the kernel
      test robot report comes from a 96-thread dual socket Intel Xeon Gold
      6252 setup, but the regression there really is quite noticeable:
      
         -47.9% regression of stress-ng.rawpkt.ops_per_sec
      
      and the commit that was marked as being fixed (7ced3719: "slub:
      Acquire_slab() avoid loop") actually did the loop exit early very
      intentionally (the hint being that "avoid loop" part of that commit
      message), exactly to avoid this issue.
      
      The correct thing to do may be to pick some kind of reasonable middle
      ground: instead of breaking out of the loop on the very first sign of
      contention, or trying over and over and over again, the right thing may
      be to re-try _once_, and then give up on the second failure (or pick
      your favorite value for "once"..).
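
The three policies discussed above (give up at the first contended entry, scan the whole list, or tolerate a bounded number of failures) differ only in one threshold. The sketch below models the partial list as an array of "contended" flags; `pick_partial()` and its parameters are hypothetical and are not SLUB code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Walk a list of candidate entries and return the index of the first
 * one that can be acquired, or -1 after giving up.
 * max_failures = 0 stops at the first contended entry, 1 retries once,
 * and a value >= n scans the whole list.
 */
static int pick_partial(const bool *contended, int n, int max_failures)
{
	int failures = 0;

	assert(n == 0 || contended);
	for (int i = 0; i < n; i++) {
		if (!contended[i])
			return i;		/* acquired */
		if (++failures > max_failures)
			break;			/* stop adding to the contention */
	}
	return -1;
}
```

The revert corresponds to `max_failures = 0`; the middle ground suggested in the message would be a small positive value.
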
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Link: https://lore.kernel.org/lkml/20210301080404.GF12822@xsang-OptiPlex-9020/
      Cc: Jann Horn <jannh@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      b36ca11e
    • P
      cifs: return proper error code in statfs(2) · 9ae15b79
      Paulo Alcantara committed
      stable inclusion
      from stable-5.10.24
      commit 3d0bbd97eb6f32bcc1365252aa04a8984bab5007
      bugzilla: 51348
      
      --------------------------------
      
      commit 14302ee3 upstream.
      
      In cifs_statfs(), if server->ops->queryfs is not NULL, then we should
      use its return value rather than always returning 0.  Instead, use the
      rc variable, which is properly set to 0 when there is no
      server->ops->queryfs.
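
      The pattern described above can be sketched in C as follows. The
      struct fs_ops type and the statfs_sketch(), queryfs_fail() and
      queryfs_ok() functions are hypothetical simplifications for
      illustration, not the CIFS code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified operations table with an optional callback. */
struct fs_ops {
	int (*queryfs)(void *buf);
};

static int statfs_sketch(const struct fs_ops *ops, void *buf)
{
	int rc = 0;	/* stays 0 when there is no queryfs callback */

	if (ops->queryfs)
		rc = ops->queryfs(buf);

	return rc;	/* propagate the result instead of a hard-coded 0 */
}

/* Test doubles for the optional callback. */
static int queryfs_fail(void *buf)
{
	(void)buf;
	return -EIO;
}

static int queryfs_ok(void *buf)
{
	(void)buf;
	return 0;
}
```
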
      Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
      Reviewed-by: Aurelien Aptel <aaptel@suse.com>
      Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
      CC: <stable@vger.kernel.org>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      9ae15b79
    • C
      mount: fix mounting of detached mounts onto targets that reside on shared mounts · 3ef215f4
      Christian Brauner committed
      stable inclusion
      from stable-5.10.24
      commit 36e1efcdc54274d03e67ed6a9d5c1c2a2e77e947
      bugzilla: 51348
      
      --------------------------------
      
      commit ee2e3f50 upstream.
      
      Creating a series of detached mounts, attaching them to the filesystem,
      and unmounting them can be used to trigger an integer overflow in
      ns->mounts causing the kernel to block any new mounts in count_mounts()
      and returning ENOSPC because it falsely assumes that the maximum number
      of mounts in the mount namespace has been reached, i.e. it thinks it
      can't fit the new mounts into the mount namespace anymore.
      
      Depending on the number of mounts in your system, this can be reproduced
      on any kernel that supports open_tree() and move_mount() by compiling
      and running the following program:
      
        /* SPDX-License-Identifier: LGPL-2.1+ */
      
        #define _GNU_SOURCE
        #include <errno.h>
        #include <fcntl.h>
        #include <getopt.h>
        #include <limits.h>
        #include <stdbool.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>
        #include <sys/mount.h>
        #include <sys/stat.h>
        #include <sys/syscall.h>
        #include <sys/types.h>
        #include <unistd.h>
      
        /* open_tree() */
        #ifndef OPEN_TREE_CLONE
        #define OPEN_TREE_CLONE 1
        #endif
      
        #ifndef OPEN_TREE_CLOEXEC
        #define OPEN_TREE_CLOEXEC O_CLOEXEC
        #endif
      
        #ifndef __NR_open_tree
                #if defined __alpha__
                        #define __NR_open_tree 538
                #elif defined _MIPS_SIM
                        #if _MIPS_SIM == _MIPS_SIM_ABI32        /* o32 */
                                #define __NR_open_tree 4428
                        #endif
                        #if _MIPS_SIM == _MIPS_SIM_NABI32       /* n32 */
                                #define __NR_open_tree 6428
                        #endif
                        #if _MIPS_SIM == _MIPS_SIM_ABI64        /* n64 */
                                #define __NR_open_tree 5428
                        #endif
                #elif defined __ia64__
                        #define __NR_open_tree (428 + 1024)
                #else
                        #define __NR_open_tree 428
                #endif
        #endif
      
        /* move_mount() */
        #ifndef MOVE_MOUNT_F_EMPTY_PATH
        #define MOVE_MOUNT_F_EMPTY_PATH 0x00000004 /* Empty from path permitted */
        #endif
      
        #ifndef __NR_move_mount
                #if defined __alpha__
                        #define __NR_move_mount 539
                #elif defined _MIPS_SIM
                        #if _MIPS_SIM == _MIPS_SIM_ABI32        /* o32 */
                                #define __NR_move_mount 4429
                        #endif
                        #if _MIPS_SIM == _MIPS_SIM_NABI32       /* n32 */
                                #define __NR_move_mount 6429
                        #endif
                        #if _MIPS_SIM == _MIPS_SIM_ABI64        /* n64 */
                                #define __NR_move_mount 5429
                        #endif
                #elif defined __ia64__
                        #define __NR_move_mount (429 + 1024)
                #else
                        #define __NR_move_mount 429
                #endif
        #endif
      
        static inline int sys_open_tree(int dfd, const char *filename, unsigned int flags)
        {
                return syscall(__NR_open_tree, dfd, filename, flags);
        }
      
        static inline int sys_move_mount(int from_dfd, const char *from_pathname, int to_dfd,
                                         const char *to_pathname, unsigned int flags)
        {
                return syscall(__NR_move_mount, from_dfd, from_pathname, to_dfd, to_pathname, flags);
        }
      
        static bool is_shared_mountpoint(const char *path)
        {
                bool shared = false;
                FILE *f = NULL;
                char *line = NULL;
                int i;
                size_t len = 0;
      
                f = fopen("/proc/self/mountinfo", "re");
                if (!f)
                        return false;
      
                while (getline(&line, &len, f) > 0) {
                        char *slider1, *slider2;
      
                        for (slider1 = line, i = 0; slider1 && i < 4; i++)
                                slider1 = strchr(slider1 + 1, ' ');
      
                        if (!slider1)
                                continue;
      
                        slider2 = strchr(slider1 + 1, ' ');
                        if (!slider2)
                                continue;
      
                        *slider2 = '\0';
                        if (strcmp(slider1 + 1, path) == 0) {
                                /* This is the path. Is it shared? */
                                slider1 = strchr(slider2 + 1, ' ');
                                if (slider1 && strstr(slider1, "shared:")) {
                                        shared = true;
                                        break;
                                }
                        }
                }
                fclose(f);
                free(line);
      
                return shared;
        }
      
        static void usage(void)
        {
                const char *text = "mount-new [--recursive] <base-dir>\n";
                fprintf(stderr, "%s", text);
                _exit(EXIT_SUCCESS);
        }
      
        #define exit_usage(format, ...)                              \
                ({                                                   \
                        fprintf(stderr, format "\n", ##__VA_ARGS__); \
                        usage();                                     \
                })
      
        #define exit_log(format, ...)                                \
                ({                                                   \
                        fprintf(stderr, format "\n", ##__VA_ARGS__); \
                        exit(EXIT_FAILURE);                          \
                })
      
        static const struct option longopts[] = {
                {"help",        no_argument,            0,      'a'},
                { NULL,         no_argument,            0,       0 },
        };
      
        int main(int argc, char *argv[])
        {
                int exit_code = EXIT_SUCCESS, index = 0;
                int dfd, fd_tree, new_argc, ret;
                char *base_dir;
                char *const *new_argv;
                char target[PATH_MAX];
      
                while ((ret = getopt_long_only(argc, argv, "", longopts, &index)) != -1) {
                        switch (ret) {
                        case 'a':
                                /* fallthrough */
                        default:
                                usage();
                        }
                }
      
                new_argv = &argv[optind];
                new_argc = argc - optind;
                if (new_argc < 1)
                        exit_usage("Missing base directory\n");
                base_dir = new_argv[0];
      
                if (*base_dir != '/')
                        exit_log("Please specify an absolute path");
      
                /* Ensure that target is a shared mountpoint. */
                if (!is_shared_mountpoint(base_dir))
                        exit_log("Please ensure that \"%s\" is a shared mountpoint", base_dir);
      
                dfd = open(base_dir, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
                if (dfd < 0)
                        exit_log("%m - Failed to open base directory \"%s\"", base_dir);
      
                ret = mkdirat(dfd, "detached-move-mount", 0755);
                if (ret < 0)
                        exit_log("%m - Failed to create required temporary directories");
      
                ret = snprintf(target, sizeof(target), "%s/detached-move-mount", base_dir);
                if (ret < 0 || (size_t)ret >= sizeof(target))
                        exit_log("%m - Failed to assemble target path");
      
                /*
                 * Having a mount table with 10000 mounts is already quite excessive
                 * and should account even for weird test systems.
                 */
                for (size_t i = 0; i < 10000; i++) {
                        fd_tree = sys_open_tree(dfd, "detached-move-mount",
                                                OPEN_TREE_CLONE |
                                                OPEN_TREE_CLOEXEC |
                                                AT_EMPTY_PATH);
                        if (fd_tree < 0) {
                                fprintf(stderr, "%m - Failed to open %d(detached-move-mount)", dfd);
                                exit_code = EXIT_FAILURE;
                                break;
                        }
      
                        ret = sys_move_mount(fd_tree, "", dfd, "detached-move-mount", MOVE_MOUNT_F_EMPTY_PATH);
                        if (ret < 0) {
                                if (errno == ENOSPC)
                                        fprintf(stderr, "%m - Buggy mount counting");
                                else
                                        fprintf(stderr, "%m - Failed to attach mount to %d(detached-move-mount)", dfd);
                                exit_code = EXIT_FAILURE;
                                break;
                        }
                        close(fd_tree);
      
                        ret = umount2(target, MNT_DETACH);
                        if (ret < 0) {
                                fprintf(stderr, "%m - Failed to unmount %s", target);
                                exit_code = EXIT_FAILURE;
                                break;
                        }
                }
      
                (void)unlinkat(dfd, "detached-move-mount", AT_REMOVEDIR);
                close(dfd);
      
                exit(exit_code);
        }
      
      and wait for the kernel to refuse any new mounts by returning ENOSPC.
      How many iterations are needed depends on the number of mounts in your
      system. Assuming you have something like 50 mounts on a standard system
      it should be almost instantaneous.
      
      The root cause is that detached mounts aren't handled correctly when
      the source and target mount are identical and reside on a shared
      mount. This results in a broken mount tree in which the detached
      source itself is propagated, something that propagation prevents for
      regular bind-mounts and new mounts. This ultimately leads to a
      miscalculation of the number of mounts in the mount namespace.
      
      Detached mounts created via
      open_tree(fd, path, OPEN_TREE_CLONE)
      are essentially like an unattached new mount, or an unattached
      bind-mount. They can then later on be attached to the filesystem via
      move_mount() which calls into attach_recursive_mount(). Part of
      attaching it to the filesystem is making sure that mounts get correctly
      propagated in case the destination mountpoint is MS_SHARED, i.e. is a
      shared mountpoint. This is done by calling into propagate_mnt() which
      walks the list of peers calling propagate_one() on each mount in this
      list making sure it receives the propagation event.
      The propagate_one() function thereby skips both new mounts and bind
      mounts to not propagate them "into themselves". Both are identified by
      checking whether the mount is already attached to any mount namespace in
      mnt->mnt_ns. This is what the IS_MNT_NEW() helper is responsible for.
      
      However, detached mounts have an anonymous mount namespace attached to
      them, stashed in mnt->mnt_ns, which means that IS_MNT_NEW() doesn't
      realize they need to be skipped. The mount therefore propagates "into
      itself", breaking the mount table and causing a disconnect between the
      number of mounts recorded as being beneath or reachable from the target
      mountpoint and the number of mounts actually recorded/counted in
      ns->mounts. This ultimately causes an overflow, which in turn prevents
      any new mounts via the ENOSPC issue.
      
      So teach propagation to handle detached mounts by making it aware of
      them. I spent the last couple of days tracking this issue down and then
      verified that the fix is correct by unmounting everything in my current
      mount table, leaving only /proc and /sys mounted, and running the
      reproducer above overnight while checking the number of mounts counted
      in ns->mounts. With this fix the counts are correct and the ENOSPC
      issue can't be reproduced.
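
      To make the distinction concrete, here is a small C model of the
      "is this mount new?" check. The struct ns_model and struct
      mount_model types and both helper functions are hypothetical
      simplifications. Treating an anonymous namespace as "not attached"
      is one plausible way to make the check aware of detached mounts;
      that it mirrors the actual patch is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical models; the kernel structures carry far more state. */
struct ns_model {
	bool anonymous;		/* true for the namespace of a detached mount */
};

struct mount_model {
	struct ns_model *ns;	/* NULL when the mount has no namespace at all */
};

/* Check as described above: "new" only when no namespace is attached. */
static bool is_new_by_ns_only(const struct mount_model *m)
{
	return m->ns == NULL;
}

/* Detached-aware variant: an anonymous namespace also counts as unattached. */
static bool is_new_detached_aware(const struct mount_model *m)
{
	return m->ns == NULL || m->ns->anonymous;
}
```

      In this model a detached mount is missed by the first check but
      caught by the second, so propagation could skip it like any other
      new mount.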
      
      This change will only have an effect on mounts created with the new
      mount API since detached mounts cannot be created with the old mount API
      so regressions are extremely unlikely.
      
      Link: https://lore.kernel.org/r/20210306101010.243666-1-christian.brauner@ubuntu.com
      Fixes: 2db154b3 ("vfs: syscall: Add move_mount(2) to move mounts around")
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      3ef215f4
    • C
      powerpc/603: Fix protection of user pages mapped with PROT_NONE · f0404980
      Christophe Leroy committed
      stable inclusion
      from stable-5.10.24
      commit aa1258d91455a75474d0541f746537c9bb0484c3
      bugzilla: 51348
      
      --------------------------------
      
      commit c119565a upstream.
      
      On book3s/32, page protection is defined by the PP bits in the PTE
      which provide the following protection depending on the access
      keys defined in the matching segment register:
      - PP 00 means RW with key 0 and N/A with key 1.
      - PP 01 means RW with key 0 and RO with key 1.
      - PP 10 means RW with both key 0 and key 1.
      - PP 11 means RO with both key 0 and key 1.
      
      Since the implementation of kernel userspace access protection,
      PP bits have been set as follows:
      - PP00 for pages without _PAGE_USER
      - PP01 for pages with _PAGE_USER and _PAGE_RW
      - PP11 for pages with _PAGE_USER and without _PAGE_RW
      
      For kernelspace segments, kernel accesses are performed with key 0
      and user accesses are performed with key 1. As PP00 is used for
      non _PAGE_USER pages, user can't access kernel pages not flagged
      _PAGE_USER while kernel can.
      
      For userspace segments, both kernel and user accesses are performed
      with key 0, therefore pages not flagged _PAGE_USER are still
      accessible to the user.
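
      The two tables above can be combined into a small C model that shows
      why a page without _PAGE_USER stays reachable in a userspace segment.
      The enum, the FLAG_* values and both function names are hypothetical
      and exist only for this sketch; the real logic lives in the MMU and
      the PTE setup code.

```c
enum access { ACCESS_NONE, ACCESS_RO, ACCESS_RW };

/* Hypothetical flag values, chosen only for this sketch. */
#define FLAG_USER 0x1u
#define FLAG_RW   0x2u

/* PP encoding chosen from the PTE flags, as listed above. */
static unsigned int pp_for_flags(unsigned int flags)
{
	if (!(flags & FLAG_USER))
		return 0;			/* PP00 */
	return (flags & FLAG_RW) ? 1 : 3;	/* PP01 or PP11 */
}

/* Access granted by a PP value under a given segment key. */
static enum access pp_access(unsigned int pp, unsigned int key)
{
	switch (pp & 3) {
	case 0:
		return key ? ACCESS_NONE : ACCESS_RW;
	case 1:
		return key ? ACCESS_RO : ACCESS_RW;
	case 2:
		return ACCESS_RW;
	default:
		return ACCESS_RO;
	}
}
```

      In this model pp_access(pp_for_flags(0), 0) evaluates to ACCESS_RW:
      a page without _PAGE_USER, accessed with key 0 as happens in a
      userspace segment, is fully accessible.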
      
      This shouldn't be an issue, because userspace is expected to be
      accessible to the user. But unlike most other architectures, powerpc
      implements PROT_NONE protection by removing _PAGE_USER flag instead of
      flagging the page as not valid. This means that pages in userspace
      that are not flagged _PAGE_USER shall remain inaccessible.
      
      To get the expected behaviour, just mimic other architectures in the
      TLB miss handler by checking _PAGE_USER permission on userspace
      accesses as if it was the _PAGE_PRESENT bit.
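
      As a rough C model of that check (the real handler is written in
      assembly, and the flag values and function name below are
      hypothetical): for a userspace address the user flag is required in
      addition to the present flag before a translation is loaded.

```c
#include <stdbool.h>

/* Hypothetical flag values, chosen only for this sketch. */
#define FLAG_PRESENT 0x1u
#define FLAG_USER    0x2u

/*
 * Decide whether a TLB miss may load the translation.
 * For userspace addresses the user flag is required in addition to the
 * present flag, so a page lacking it is handled like a missing page.
 */
static bool tlb_miss_may_load(unsigned int pte_flags, bool userspace_address)
{
	unsigned int required = FLAG_PRESENT;

	if (userspace_address)
		required |= FLAG_USER;

	return (pte_flags & required) == required;
}
```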
      
      Note that this problem exists only on 603 cores. The 604+ have
      a hash table, and the hash_page() function already implements the
      verification of _PAGE_USER permission on userspace pages.
      
      Fixes: f342adca ("powerpc/32s: Prepare Kernel Userspace Access Protection")
      Cc: stable@vger.kernel.org # v5.2+
      Reported-by: Christoph Plattner <christoph.plattner@thalesgroup.com>
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/4a0c6e3bb8f0c162457bf54d9bc6fd8d7b55129f.1612160907.git.christophe.leroy@csgroup.eu
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      f0404980
    • L
      mt76: dma: do not report truncated frames to mac80211 · de914b17
      Lorenzo Bianconi committed
      stable inclusion
      from stable-5.10.24
      commit e36d276dd4be6085b2f830dbb24e4746ec4a042b
      bugzilla: 51348
      
      --------------------------------
      
      commit d0bd52c5 upstream.
      
      Commit b102f0c5 ("mt76: fix array overflow on receiving too many
      fragments for a packet") fixes a possible OOB access but it introduces a
      memory leak since the pending frame is not released to page_frag_cache
      if the frag array of skb_shared_info is full. Commit 93a1d479
      ("mt76: dma: fix a possible memory leak in mt76_add_fragment()") fixes
      the issue but does not free the truncated skb that is forwarded to
      mac80211 layer. Fix the leftover issue by discarding truncated skbs too.
      
      Fixes: 93a1d479 ("mt76: dma: fix a possible memory leak in mt76_add_fragment()")
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Link: https://lore.kernel.org/r/a03166fcc8214644333c68674a781836e0f57576.1612697217.git.lorenzo@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      de914b17
    • J
      ibmvnic: always store valid MAC address · adf9ab95
      Jiri Wiesner committed
      stable inclusion
      from stable-5.10.24
      commit 1e343b2e7b9678f199df9693a3548e9a4ab98488
      bugzilla: 51348
      
      --------------------------------
      
      commit 67eb2114 upstream.
      
      The last change to ibmvnic_set_mac(), 8fc3672a, meant to prevent
      users from setting an invalid MAC address on an ibmvnic interface
      that has not been brought up yet. The change also prevented the
      requested MAC address from being stored by the adapter object for an
      ibmvnic interface when the state of the ibmvnic interface is
      VNIC_PROBED - that is after probing has finished but before the
      ibmvnic interface is brought up. The MAC address stored by the
      adapter object is used and sent to the hypervisor for checking when
      an ibmvnic interface is brought up.
      
      The ibmvnic driver ignoring the requested MAC address when in
      VNIC_PROBED state caused LACP bonds (bonds in 802.3ad mode) with more
      than one slave to malfunction. The bonding code must be able to
      change the MAC address of its slaves before they are brought up
      during enslaving. The inability of kernels with 8fc3672a to set
      the MAC addresses of bonding slaves is observable in the output of
      "ip address show". The MAC addresses of the slaves are the same as
      the MAC address of the bond on a working system whereas the slaves
      retain their original MAC addresses on a system with a malfunctioning
      LACP bond.
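
      The intended behaviour can be modelled in C roughly as below. This is
      a sketch under assumptions, not the driver: struct adapter_model, the
      state enum, the change_requests counter (standing in for a request
      to the hypervisor) and both functions are hypothetical. The point is
      only that a valid address is stored in every state, including the
      probed-but-not-open one.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define MAC_LEN 6

enum vnic_state_model { STATE_PROBED, STATE_OPEN };

/* Hypothetical adapter model. */
struct adapter_model {
	enum vnic_state_model state;
	unsigned char mac[MAC_LEN];
	int change_requests;	/* counts requests "sent" while the link is up */
};

/* A usable unicast address: not multicast and not all zeroes. */
static bool mac_is_valid(const unsigned char *mac)
{
	static const unsigned char zero[MAC_LEN];

	return !(mac[0] & 1) && memcmp(mac, zero, MAC_LEN) != 0;
}

static int set_mac_sketch(struct adapter_model *a, const unsigned char *mac)
{
	if (!mac_is_valid(mac))
		return -EADDRNOTAVAIL;

	memcpy(a->mac, mac, MAC_LEN);	/* always remember a valid address */

	if (a->state != STATE_PROBED)
		a->change_requests++;	/* only talk to the device once it is up */

	return 0;
}
```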
      
      Fixes: 8fc3672a ("ibmvnic: fix ibmvnic_set_mac")
      Signed-off-by: Jiri Wiesner <jwiesner@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      adf9ab95
    • M
      ibmvnic: Fix possibly uninitialized old_num_tx_queues variable warning. · f963c2cf
      Michal Suchanek committed
      stable inclusion
      from stable-5.10.24
      commit 57ac75f8d241b3d13b77d223214be025f18df8a1
      bugzilla: 51348
      
      --------------------------------
      
      commit 6881b07f upstream.
      
      GCC 7.5 reports:
      ../drivers/net/ethernet/ibm/ibmvnic.c: In function 'ibmvnic_reset_init':
      ../drivers/net/ethernet/ibm/ibmvnic.c:5373:51: warning: 'old_num_tx_queues' may be used uninitialized in this function [-Wmaybe-uninitialized]
      ../drivers/net/ethernet/ibm/ibmvnic.c:5373:6: warning: 'old_num_rx_queues' may be used uninitialized in this function [-Wmaybe-uninitialized]
      
      The variable is initialized only if(reset) and used only if(reset &&
      something) so this is a false positive. However, there is no reason to
      not initialize the variables unconditionally avoiding the warning.
      
      Fixes: 635e442f ("ibmvnic: merge ibmvnic_reset_init and ibmvnic_init")
      Signed-off-by: Michal Suchanek <msuchanek@suse.de>
      Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      f963c2cf
    • M
      libbpf: Clear map_info before each bpf_obj_get_info_by_fd · 5929b475
      Maciej Fijalkowski committed
      stable inclusion
      from stable-5.10.24
      commit 2f6f72ee9a98811f80b604f54b00dd3dd7fa75eb
      bugzilla: 51348
      
      --------------------------------
      
      commit 2b2aedab upstream.
      
      xsk_lookup_bpf_maps, based on prog_fd, checks whether the current prog
      has a reference to XSKMAP. A BPF prog can include insns that work on
      various BPF maps, and this is covered by iterating through map_ids.

      The bpf_map_info that is passed to bpf_obj_get_info_by_fd for filling
      needs to be cleared at each iteration so that it doesn't contain any
      outdated fields; that clearing is currently missing in the function of
      interest.
      
      To fix that, zero-init map_info via memset before each
      bpf_obj_get_info_by_fd call.
      
      Also, since the area of this code is touched, in general strcmp is
      considered harmful, so let's convert it to strncmp and provide the
      size of the array name for current map_info.
      
      While at it, do s/continue/break/ once we have found the xsks_map to
      terminate the search.
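
      The three changes described above (clear before each query, bounded
      name comparison, stop at the first match) can be sketched in C as
      follows. struct map_info_model, fill_fn, fill_model() and find_map()
      are hypothetical stand-ins; no libbpf call is used or implied.

```c
#include <string.h>

#define NAME_LEN 16	/* hypothetical size of the name field */

/* Hypothetical, reduced info structure. */
struct map_info_model {
	char name[NAME_LEN];
	unsigned int key_size;
};

/* Stand-in for an info query that may fill only some of the fields. */
typedef void (*fill_fn)(int id, struct map_info_model *info);

/* Return the first id whose name matches wanted, or -1 if none does. */
static int find_map(const int *ids, int n, fill_fn fill, const char *wanted)
{
	struct map_info_model info;

	for (int i = 0; i < n; i++) {
		/* Drop whatever the previous iteration left behind. */
		memset(&info, 0, sizeof(info));
		fill(ids[i], &info);

		/* Bounded comparison, limited to the size of the name array. */
		if (strncmp(info.name, wanted, sizeof(info.name)) == 0)
			return ids[i];	/* found: end the search */
	}
	return -1;
}

/* Test double: only id 1 reports a name; other ids leave it untouched. */
static void fill_model(int id, struct map_info_model *info)
{
	if (id == 1)
		strncpy(info->name, "xsks_map", sizeof(info->name) - 1);
	else
		info->key_size = 4;
}
```

      Without the memset(), an id whose query leaves the name untouched
      would still carry the name written for an earlier id.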
      
      Fixes: 5750902a ("libbpf: proper XSKMAP cleanup")
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Björn Töpel <bjorn.topel@intel.com>
      Link: https://lore.kernel.org/bpf/20210303185636.18070-4-maciej.fijalkowski@intel.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      5929b475
    • M
      samples, bpf: Add missing munmap in xdpsock · 3a91b98c
      Maciej Fijalkowski committed
      stable inclusion
      from stable-5.10.24
      commit f126147970a11eb4a686d30bd0740de3de2cd6c8
      bugzilla: 51348
      
      --------------------------------
      
      commit 6bc66998 upstream.
      
      We mmap the umem region, but we never munmap it.
      Add the missing call at the end of the cleanup.
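
      For reference, the map/use/unmap pairing looks like this in plain C.
      This is a generic sketch using an anonymous mapping, not the xdpsock
      sample; map_use_unmap() is a hypothetical helper name.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE	/* for MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map len bytes, touch the region, then unmap it again.
 * Returns 0 on success and -1 if either mmap() or munmap() fails.
 */
static int map_use_unmap(size_t len)
{
	void *area = mmap(NULL, len, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (area == MAP_FAILED)
		return -1;

	((unsigned char *)area)[0] = 1;	/* stand-in for real use of the buffer */

	return munmap(area, len);	/* the cleanup step that is easy to forget */
}
```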
      
      Fixes: 3945b37a ("samples/bpf: use hugepages in xdpsock app")
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Björn Töpel <bjorn.topel@intel.com>
      Link: https://lore.kernel.org/bpf/20210303185636.18070-3-maciej.fijalkowski@intel.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      3a91b98c
    • Y
      selftests/bpf: Mask bpf_csum_diff() return value to 16 bits in test_verifier · c95dcb7b
      Yauheni Kaliuta committed
      stable inclusion
      from stable-5.10.24
      commit 4d2cdb2ded60a6aae748ac61ae3919a3b037f26c
      bugzilla: 51348
      
      --------------------------------
      
      commit 6185266c upstream.
      
      The verifier test labelled "valid read map access into a read-only array
      2" calls the bpf_csum_diff() helper and checks its return value. However,
      architecture implementations of csum_partial() (which is what the helper
      uses) differ in whether they fold the return value to 16 bit or not. For
      example, x86 version has ...
      
      	if (unlikely(odd)) {
      		result = from32to16(result);
      		result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);
      	}
      
      ... while generic lib/checksum.c does:
      
      	result = from32to16(result);
      	if (odd)
      		result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);
      
      This makes the helper return different values on different architectures,
      breaking the test on non-x86. To fix this, add an additional instruction
      to always mask the return value to 16 bits, and update the expected return
      value accordingly.
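
      A small C illustration of why an unfolded and a folded checksum can
      differ as raw return values yet agree once masked. from32to16() here
      follows the usual fold-with-carry definition; mask16() and the sample
      value 0xffffffe3 are assumptions made for this sketch, and the
      agreement after masking is shown for that value only, not claimed in
      general.

```c
#include <stdint.h>

/* Fold a 32-bit ones' complement sum into 16 bits, adding the carry back. */
static uint16_t from32to16(uint32_t x)
{
	x = (x & 0xffff) + (x >> 16);	/* 16 bits + 16 bits, may carry */
	x = (x & 0xffff) + (x >> 16);	/* add the carry back in */
	return (uint16_t)x;
}

/* The kind of masking a test can apply to whatever value it receives. */
static uint32_t mask16(uint32_t ret)
{
	return ret & 0xffff;
}
```

      For the sample value, the unfolded result 0xffffffe3 and the folded
      result 0xffe3 are different numbers, but both become 0xffe3 after
      masking.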
      
      Fixes: fb2abb73 ("bpf, selftest: test {rd, wr}only flags and direct value access")
      Signed-off-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210228103017.320240-1-yauheni.kaliuta@redhat.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      c95dcb7b
    • H
      selftests/bpf: No need to drop the packet when there is no geneve opt · 15c659c2
      Hangbin Liu committed
      stable inclusion
      from stable-5.10.24
      commit 4fa0ece2e0eb3740c6bfbf4f8121068248bb4295
      bugzilla: 51348
      
      --------------------------------
      
      commit 557c223b upstream.
      
      In the bpf geneve tunnel test we set the geneve option on the tx side.
      On the rx side we only call bpf_skb_get_tunnel_opt(). Since commit
      9c2e14b4 ("ip_tunnels: Set tunnel option flag when tunnel metadata is
      present") geneve_rx() will not add the TUNNEL_GENEVE_OPT flag if there
      is no geneve option, which causes bpf_skb_get_tunnel_opt() to return
      ENOENT and _geneve_get_tunnel() in test_tunnel_kern.c to drop the
      packet.
      
      As it should be valid for bpf_skb_get_tunnel_opt() to return an error
      when there is no tunnel option, there is no need to drop the packet and
      break all geneve rx traffic. Just set opt_class to 0 in this test and
      keep returning TC_ACT_OK.
      
      Fixes: 933a741e ("selftests/bpf: bpf tunnel test.")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: William Tu <u9012063@gmail.com>
      Link: https://lore.kernel.org/bpf/20210224081403.1425474-1-liuhangbin@gmail.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      15c659c2
    • I
      selftests/bpf: Use the last page in test_snprintf_btf on s390 · 67f8c3c8
      Ilya Leoshkevich committed
      stable inclusion
      from stable-5.10.24
      commit 7653656be252abd7d2d3f16152188623de5be4f8
      bugzilla: 51348
      
      --------------------------------
      
      commit 42a382a4 upstream.
      
      test_snprintf_btf fails on s390, because NULL points to a readable
      struct lowcore there. Fix by using the last page instead.
      
      Error message example:
      
          printing fffffffffffff000 should generate error, got (361)
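
      The address in that message is what the start of the last page looks
      like when 64-bit addresses and 4096-byte pages are assumed (both are
      assumptions of this sketch, as is the helper name):

```c
#include <stdint.h>

/* Start address of the last page of a 64-bit address space. */
static uint64_t last_page_address(uint64_t page_size)
{
	return (uint64_t)0 - page_size;	/* wraps around to the top of the range */
}
```

      With a page size of 4096 this yields 0xfffffffffffff000.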
      
      Fixes: 076a95f5 ("selftests/bpf: Add bpf_snprintf_btf helper tests")
      Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Heiko Carstens <hca@linux.ibm.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20210227051726.121256-1-iii@linux.ibm.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      67f8c3c8
    • G
      net: phy: fix save wrong speed and duplex problem if autoneg is on · 541e3645
      Guangbin Huang committed
      stable inclusion
      from stable-5.10.24
      commit 6aa23829949c2c0912e82866aeab4fd591595235
      bugzilla: 51348
      
      --------------------------------
      
      commit d9032dba upstream.
      
      If a phy uses the generic driver and autoneg is on, entering the command
      "ethtool -s eth0 speed 50" will not actually change the phy speed, but
      the command "ethtool eth0" shows the speed as 50Mb/s because
      phydev->speed has been set to 50 and is not updated later.
      
      The duplex setting has the same problem.
      
      However, if autoneg is on, the phy only changes speed and duplex
      according to phydev->advertising, not phydev->speed and phydev->duplex.
      So in this case, phydev->speed and phydev->duplex don't need to be set
      in function phy_ethtool_ksettings_set().
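
      A reduced C model of that rule: forced speed and duplex are recorded
      only when autonegotiation is off. struct phy_model and
      ksettings_set_sketch() are hypothetical; the real function handles
      advertising masks, validation and much more.

```c
#include <stdbool.h>

/* Hypothetical, reduced PHY state. */
struct phy_model {
	bool autoneg;
	int speed;	/* last known or forced speed in Mb/s */
	int duplex;	/* 0 = half, 1 = full */
};

static void ksettings_set_sketch(struct phy_model *phy, bool autoneg,
				 int speed, int duplex)
{
	phy->autoneg = autoneg;

	/* With autoneg on, the link result decides speed and duplex. */
	if (!autoneg) {
		phy->speed = speed;
		phy->duplex = duplex;
	}
}
```

      In this model a request for speed 50 with autoneg enabled leaves the
      recorded speed untouched, so a later query does not report a value
      the link never used.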
      
      Fixes: 51e2a384 ("PHY: Avoid unnecessary aneg restarts")
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      541e3645
    • J
      net: always use icmp{,v6}_ndo_send from ndo_start_xmit · 0165d324
      Jason A. Donenfeld committed
      stable inclusion
      from stable-5.10.24
      commit 91796b65563bd3fd0efe4fb56d6ee1c5c6006eb0
      bugzilla: 51348
      
      --------------------------------
      
      commit 4372339e upstream.
      
      There were a few remaining tunnel drivers that didn't receive the prior
      conversion to icmp{,v6}_ndo_send. Knowing now that this could lead to
      memory corruption (see ee576c47 ("net: icmp: pass zeroed opts from
      icmp{,v6}_ndo_send before sending") for details), it is even more
      imperative to have these all converted. So this commit goes through the
      remaining cases that I could find and does a boring translation to the
      ndo variety.
      
      The Fixes: line below is the merge that originally added
      icmp{,v6}_ndo_send and converted the first batch of icmp{,v6}_send
      users. The rationale then for the change applies equally to this patch.
      It's just that these drivers were left out of the initial conversion
      because these network devices are hiding in net/ rather than in
      drivers/net/.
      
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: David Ahern <dsahern@kernel.org>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Fixes: 803381f9 ("Merge branch 'icmp-account-for-NAT-when-sending-icmps-from-ndo-layer'")
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      0165d324
    • V
      netfilter: x_tables: gpf inside xt_find_revision() · 034017c8
      Vasily Averin committed
      stable inclusion
      from stable-5.10.24
      commit 8abbf7e53e179b16dc48c40cecc6c86240ca026c
      bugzilla: 51348
      
      --------------------------------
      
      commit 8e24eddd upstream.
      
      Nested target/match_revfn() calls work with xt[NFPROTO_UNSPEC] lists
      without taking xt[NFPROTO_UNSPEC].mutex. This can race with module
      unload and cause the host to crash:
      
      general protection fault: 0000 [#1]
      Modules linked in: ... [last unloaded: xt_cluster]
      CPU: 0 PID: 542455 Comm: iptables
      RIP: 0010:[<ffffffff8ffbd518>]  [<ffffffff8ffbd518>] strcmp+0x18/0x40
      RDX: 0000000000000003 RSI: ffff9a5a5d9abe10 RDI: dead000000000111
      R13: ffff9a5a5d9abe10 R14: ffff9a5a5d9abd8c R15: dead000000000100
      (VvS: %R15 -- &xt_match,  %RDI -- &xt_match.name,
      xt_cluster unregister match in xt[NFPROTO_UNSPEC].match list)
      Call Trace:
       [<ffffffff902ccf44>] match_revfn+0x54/0xc0
       [<ffffffff902ccf9f>] match_revfn+0xaf/0xc0
       [<ffffffff902cd01e>] xt_find_revision+0x6e/0xf0
       [<ffffffffc05a5be0>] do_ipt_get_ctl+0x100/0x420 [ip_tables]
       [<ffffffff902cc6bf>] nf_getsockopt+0x4f/0x70
       [<ffffffff902dd99e>] ip_getsockopt+0xde/0x100
       [<ffffffff903039b5>] raw_getsockopt+0x25/0x50
       [<ffffffff9026c5da>] sock_common_getsockopt+0x1a/0x20
       [<ffffffff9026b89d>] SyS_getsockopt+0x7d/0xf0
       [<ffffffff903cbf92>] system_call_fastpath+0x25/0x2a
      
      Fixes: 656caff2 ("netfilter 04/09: x_tables: fix match/target revision lookup")
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Reviewed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      034017c8
    • F
      netfilter: nf_nat: undo erroneous tcp edemux lookup · 6c980f9a
      Florian Westphal committed
      stable inclusion
      from stable-5.10.24
      commit 42402bd84530d3761b97775c10762fde28d5b2f9
      bugzilla: 51348
      
      --------------------------------
      
      commit 03a3ca37 upstream.
      
      Under extremely rare conditions TCP early demux will retrieve the wrong
      socket.
      
      1. local machine establishes a connection to a remote server, S, on port
         p.
      
         This gives:
         laddr:lport -> S:p
         ... both in tcp and conntrack.
      
      2. local machine establishes a connection to host H, on port p2.
         2a. TCP stack chooses the same laddr:lport, so we have
             laddr:lport -> H:p2 from the TCP point of view.
         2b. There is a destination NAT rewrite in place, translating
             H:p2 to S:p.  This results in the following conntrack entries:
      
         I)  laddr:lport -> S:p  (origin)  S:p -> laddr:lport (reply)
         II) laddr:lport -> H:p2 (origin)  S:p -> laddr:lport2 (reply)
      
         NAT engine has rewritten laddr:lport to laddr:lport2 to map
         the reply packet to the correct origin.
      
         When the server sends SYN/ACK to laddr:lport2, the PREROUTING hook
         will undo the SNAT transformation, rewriting the IP header to
         S:p -> laddr:lport
      
         This causes TCP early demux to associate the skb with the TCP socket
         of the first connection.
      
         The INPUT hook will then reverse the DNAT transformation, rewriting
         the IP header to H:p2 -> laddr:lport.
      
      Because packet ends up with the wrong socket, the new connection
      never completes: originator stays in SYN_SENT and conntrack entry
      remains in SYN_RECV until timeout, and responder retransmits SYN/ACK
      until it gives up.
      
      To resolve this, orphan the skb after the input rewrite:
      because the source IP address changed, the socket must be incorrect.
      We can't move the DNAT undo to prerouting due to backwards
      compatibility; doing so would cause iptables/nftables rules to no
      longer match the way they did.

      After the orphan, the packet will be handed to the next protocol layer
      (tcp, udp, ...) and that will repeat the socket lookup just as if
      early demux was disabled.
      
      Fixes: 41063e9d ("ipv4: Early TCP socket demux.")
      Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1427
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      6c980f9a
    • E
      tcp: add sanity tests to TCP_QUEUE_SEQ · bc4f1468
      Eric Dumazet committed
      stable inclusion
      from stable-5.10.24
      commit 046f3c1c2ff450fb7ae53650e9a95e0074a61f3e
      bugzilla: 51348
      
      --------------------------------
      
      commit 8811f4a9 upstream.
      
      Qingyu Li reported a syzkaller bug where the repro
      changes RCV SEQ _after_ restoring data in the receive queue.
      
      mprotect(0x4aa000, 12288, PROT_READ)    = 0
      mmap(0x1ffff000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x1ffff000
      mmap(0x20000000, 16777216, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x20000000
      mmap(0x21000000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x21000000
      socket(AF_INET6, SOCK_STREAM, IPPROTO_IP) = 3
      setsockopt(3, SOL_TCP, TCP_REPAIR, [1], 4) = 0
      connect(3, {sa_family=AF_INET6, sin6_port=htons(0), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_scope_id=0}, 28) = 0
      setsockopt(3, SOL_TCP, TCP_REPAIR_QUEUE, [1], 4) = 0
      sendmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="0x0000000000000003\0\0", iov_len=20}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20
      setsockopt(3, SOL_TCP, TCP_REPAIR, [0], 4) = 0
      setsockopt(3, SOL_TCP, TCP_QUEUE_SEQ, [128], 4) = 0
      recvfrom(3, NULL, 20, 0, NULL, NULL)    = -1 ECONNRESET (Connection reset by peer)
      
      syslog shows:
      [  111.205099] TCP recvmsg seq # bug 2: copied 80, seq 0, rcvnxt 80, fl 0
      [  111.207894] WARNING: CPU: 1 PID: 356 at net/ipv4/tcp.c:2343 tcp_recvmsg_locked+0x90e/0x29a0
      
      This should not be allowed. TCP_QUEUE_SEQ should only be used
      when queues are empty.
      
      This patch fixes this case, and the tx path as well.
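
      A minimal sketch of such a sanity test, assuming a reduced socket
      model. struct repair_sock, the RQ_* constants, set_queue_seq() and the
      chosen error codes are hypothetical names for illustration; this is
      not the kernel's setsockopt() handler.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical queue selector, loosely modelled on TCP_REPAIR_QUEUE. */
enum { RQ_NONE, RQ_RECV, RQ_SEND };

/* Hypothetical, reduced socket state used only for this sketch. */
struct repair_sock {
    int repair_queue;        /* which queue the sequence applies to */
    uint32_t rcv_nxt;        /* next sequence number expected */
    uint32_t copied_seq;     /* first byte not yet read by the user */
    uint32_t write_seq;      /* next sequence number to send */
    unsigned int tx_queued;  /* packets still sitting in the tx queue */
};

/* Change a queue sequence number only while that queue is empty. */
int set_queue_seq(struct repair_sock *sk, uint32_t seq)
{
    switch (sk->repair_queue) {
    case RQ_RECV:
        /* Unread data means rcv_nxt has moved past copied_seq. */
        if (sk->rcv_nxt != sk->copied_seq)
            return -EPERM;
        sk->rcv_nxt = seq;
        sk->copied_seq = seq;
        return 0;
    case RQ_SEND:
        if (sk->tx_queued != 0)
            return -EPERM;
        sk->write_seq = seq;
        return 0;
    default:
        return -EINVAL;
    }
}
```

      In the reproducer above, 20 bytes are restored into the receive queue
      before TCP_QUEUE_SEQ is set, which corresponds to the rejected case.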
      
      Fixes: ee995283 ("tcp: Initial repair mode")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=212005
      Reported-by: Qingyu Li <ieatmuttonchuan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      bc4f1468
    • A
      tcp: Fix sign comparison bug in getsockopt(TCP_ZEROCOPY_RECEIVE) · fe30893d
      Arjun Roy committed
      stable inclusion
      from stable-5.10.24
      commit e95ebe1ed6abc259b897abc1f92622504750747c
      bugzilla: 51348
      
      --------------------------------
      
      commit 2107d45f upstream.
      
      getsockopt(TCP_ZEROCOPY_RECEIVE) has a bug where we read a
      user-provided "len" field of type signed int, and then compare the
      value to the result of an "offsetofend" operation, which is unsigned.
      
      Negative values provided by the user will be promoted to large
      positive numbers; thus checking that len < offsetofend() will return
      false when the intention was that it return true.
      
      Note that while len is originally checked for negative values earlier
      on in do_tcp_getsockopt(), subsequent calls to get_user() re-read the
      value from userspace which may have changed in the meantime.
      
      Therefore, re-add the check for negative values after the call to
      get_user in the handler code for TCP_ZEROCOPY_RECEIVE.
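
      The conversion problem can be reproduced in a few lines of user-space
      C. In this sketch MIN_LEN is a hypothetical stand-in for the
      offsetofend() result, and the explicit (size_t) cast spells out the
      conversion that the compiler otherwise performs implicitly when an
      int is compared with an unsigned size.

```c
#include <stddef.h>

/* Hypothetical minimum length; stands in for the offsetofend() value. */
#define MIN_LEN ((size_t)16)

/*
 * Buggy check: the signed length is converted to size_t, so -1 becomes
 * SIZE_MAX and is never reported as too short.
 */
int len_too_short_buggy(int len)
{
    return (size_t)len < MIN_LEN;
}

/* Fixed check: reject negative values before the unsigned comparison. */
int len_too_short_fixed(int len)
{
    return len < 0 || (size_t)len < MIN_LEN;
}
```

      For len == -1 the buggy variant returns 0 ("long enough"), while the
      fixed variant returns 1.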
      
      Fixes: c8856c05 ("tcp-zerocopy: Return inq along with tcp receive zerocopy.")
      Reported-by: kernel test robot <lkp@intel.com>
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Arjun Roy <arjunroy@google.com>
      Link: https://lore.kernel.org/r/20210225232628.4033281-1-arjunroy.kdev@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      fe30893d
    • T
      can: tcan4x5x: tcan4x5x_init(): fix initialization - clear MRAM before entering Normal Mode · eb2be85c
      Torin Cooper-Bennun committed
      stable inclusion
      from stable-5.10.24
      commit 473bce9b9393a3a990ed7c9708af38df553f2712
      bugzilla: 51348
      
      --------------------------------
      
      commit 27126252 upstream.
      
      This patch prevents a potentially destructive race condition. The
      device is fully operational on the bus after entering Normal Mode, so
      zeroing the MRAM after entering this mode may lead to loss of
      information, e.g. newly received messages.

      This patch fixes the problem by first initializing the MRAM, then
      bringing the device into Normal Mode.
      
      Fixes: 5443c226 ("can: tcan4x5x: Add tcan4x5x driver to the kernel")
      Link: https://lore.kernel.org/r/20210226163440.313628-1-torin@maxiluxsystems.com
      Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Torin Cooper-Bennun <torin@maxiluxsystems.com>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      eb2be85c
    • J
      can: flexcan: invoke flexcan_chip_freeze() to enter freeze mode · cc83b0f6
      Joakim Zhang committed
      stable inclusion
      from stable-5.10.24
      commit c537011c99abc9d1e1e9bc2a3bb32fda1cda4583
      bugzilla: 51348
      
      --------------------------------
      
      commit c6382004 upstream.
      
      Invoke flexcan_chip_freeze() to enter freeze mode, since the freeze
      mode acknowledge needs to be polled.
      
      Fixes: e955cead ("CAN: Add Flexcan CAN controller driver")
      Link: https://lore.kernel.org/r/20210218110037.16591-4-qiangqing.zhang@nxp.com
      Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      cc83b0f6
    • J
      can: flexcan: enable RX FIFO after FRZ/HALT valid · b08e4dd1
      Joakim Zhang committed
      stable inclusion
      from stable-5.10.24
      commit e24c53182850abce8c7fe3423f843ccb62581e6f
      bugzilla: 51348
      
      --------------------------------
      
      commit ec15e27c upstream.
      
      Enabling the RX FIFO can fail during a system reboot stress test:
      
      [    0.303958] flexcan 5a8d0000.can: 5a8d0000.can supply xceiver not found, using dummy regulator
      [    0.304281] flexcan 5a8d0000.can (unnamed net_device) (uninitialized): Could not enable RX FIFO, unsupported core
      [    0.314640] flexcan 5a8d0000.can: registering netdev failed
      [    0.320728] flexcan 5a8e0000.can: 5a8e0000.can supply xceiver not found, using dummy regulator
      [    0.320991] flexcan 5a8e0000.can (unnamed net_device) (uninitialized): Could not enable RX FIFO, unsupported core
      [    0.331360] flexcan 5a8e0000.can: registering netdev failed
      [    0.337444] flexcan 5a8f0000.can: 5a8f0000.can supply xceiver not found, using dummy regulator
      [    0.337716] flexcan 5a8f0000.can (unnamed net_device) (uninitialized): Could not enable RX FIFO, unsupported core
      [    0.348117] flexcan 5a8f0000.can: registering netdev failed
      
      RX FIFO should be enabled after FRZ/HALT are valid. But the current
      code enables RX FIFO and FRZ/HALT at the same time.
      
      Fixes: e955cead ("CAN: Add Flexcan CAN controller driver")
      Link: https://lore.kernel.org/r/20210218110037.16591-3-qiangqing.zhang@nxp.com
      Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      b08e4dd1
    • J
      can: flexcan: assert FRZ bit in flexcan_chip_freeze() · 5c89e681
      Joakim Zhang committed
      stable inclusion
      from stable-5.10.24
      commit 98b7f969116df96c57e9a8572620d71e92fcb725
      bugzilla: 51348
      
      --------------------------------
      
      commit 449052cf upstream.
      
      Asserting the HALT bit enters freeze mode only on the premise that the
      FRZ bit is asserted. This patch asserts the FRZ bit in
      flexcan_chip_freeze(), although its reset value is 1'b1. This is a
      preparatory patch; a later patch will invoke flexcan_chip_freeze() to
      enter freeze mode, which polls the freeze mode acknowledge.
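
      The freeze sequence can be sketched against a simulated register so
      that it runs without hardware. Everything here is an assumption made
      for illustration: struct fake_chip, chip_read_mcr(), chip_freeze(),
      the poll budget and the bit positions are not taken from the driver
      or a datasheet.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative bit positions; treat them as assumptions. */
#define MCR_FRZ     (UINT32_C(1) << 30)
#define MCR_HALT    (UINT32_C(1) << 28)
#define MCR_FRZ_ACK (UINT32_C(1) << 24)

/* Hypothetical simulated controller. */
struct fake_chip {
    uint32_t mcr;            /* module configuration register */
    unsigned int ack_delay;  /* reads before the acknowledge shows up */
};

/* Simulated read: the acknowledge appears only if FRZ and HALT are set. */
static uint32_t chip_read_mcr(struct fake_chip *chip)
{
    if ((chip->mcr & MCR_FRZ) && (chip->mcr & MCR_HALT)) {
        if (chip->ack_delay == 0)
            chip->mcr |= MCR_FRZ_ACK;
        else
            chip->ack_delay--;
    }
    return chip->mcr;
}

/* Assert FRZ together with HALT, then poll for the acknowledge. */
int chip_freeze(struct fake_chip *chip, unsigned int max_polls)
{
    chip->mcr |= MCR_FRZ | MCR_HALT;
    while (max_polls--) {
        if (chip_read_mcr(chip) & MCR_FRZ_ACK)
            return 0;
    }
    return -ETIMEDOUT;
}
```

      Setting FRZ explicitly means the sequence does not depend on the
      register still holding its reset value.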
      
      Fixes: b1aa1c7a ("can: flexcan: fix transition from and to freeze mode in chip_{,un}freeze")
      Link: https://lore.kernel.org/r/20210218110037.16591-2-qiangqing.zhang@nxp.com
      Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      5c89e681
    • O
      can: skb: can_skb_set_owner(): fix ref counting if socket was closed before setting skb ownership · 7401aa66
      Oleksij Rempel committed
      stable inclusion
      from stable-5.10.24
      commit 4224890edff1b4679dc8ddeaa69b43efce5366ba
      bugzilla: 51348
      
      --------------------------------
      
      commit e940e089 upstream.
      
      There are two ref count variables controlling the free()ing of a socket:
      - struct sock::sk_refcnt - which is changed by sock_hold()/sock_put()
      - struct sock::sk_wmem_alloc - which accounts the memory allocated by
        the skbs in the send path.
      
      If there are still TX skbs in flight when the socket is closed, the
      struct sock::sk_refcnt reaches 0. In the TX-path the CAN stack
      clones an "echo" skb, calls sock_hold() on the original socket and
      references it. This produces the following back trace:
      
      | WARNING: CPU: 0 PID: 280 at lib/refcount.c:25 refcount_warn_saturate+0x114/0x134
      | refcount_t: addition on 0; use-after-free.
      | Modules linked in: coda_vpu(E) v4l2_jpeg(E) videobuf2_vmalloc(E) imx_vdoa(E)
      | CPU: 0 PID: 280 Comm: test_can.sh Tainted: G            E     5.11.0-04577-gf8ff6603c617 #203
      | Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
      | Backtrace:
      | [<80bafea4>] (dump_backtrace) from [<80bb0280>] (show_stack+0x20/0x24) r7:00000000 r6:600f0113 r5:00000000 r4:81441220
      | [<80bb0260>] (show_stack) from [<80bb593c>] (dump_stack+0xa0/0xc8)
      | [<80bb589c>] (dump_stack) from [<8012b268>] (__warn+0xd4/0x114) r9:00000019 r8:80f4a8c2 r7:83e4150c r6:00000000 r5:00000009 r4:80528f90
      | [<8012b194>] (__warn) from [<80bb09c4>] (warn_slowpath_fmt+0x88/0xc8) r9:83f26400 r8:80f4a8d1 r7:00000009 r6:80528f90 r5:00000019 r4:80f4a8c2
      | [<80bb0940>] (warn_slowpath_fmt) from [<80528f90>] (refcount_warn_saturate+0x114/0x134) r8:00000000 r7:00000000 r6:82b44000 r5:834e5600 r4:83f4d540
      | [<80528e7c>] (refcount_warn_saturate) from [<8079a4c8>] (__refcount_add.constprop.0+0x4c/0x50)
      | [<8079a47c>] (__refcount_add.constprop.0) from [<8079a57c>] (can_put_echo_skb+0xb0/0x13c)
      | [<8079a4cc>] (can_put_echo_skb) from [<8079ba98>] (flexcan_start_xmit+0x1c4/0x230) r9:00000010 r8:83f48610 r7:0fdc0000 r6:0c080000 r5:82b44000 r4:834e5600
      | [<8079b8d4>] (flexcan_start_xmit) from [<80969078>] (netdev_start_xmit+0x44/0x70) r9:814c0ba0 r8:80c8790c r7:00000000 r6:834e5600 r5:82b44000 r4:82ab1f00
      | [<80969034>] (netdev_start_xmit) from [<809725a4>] (dev_hard_start_xmit+0x19c/0x318) r9:814c0ba0 r8:00000000 r7:82ab1f00 r6:82b44000 r5:00000000 r4:834e5600
      | [<80972408>] (dev_hard_start_xmit) from [<809c6584>] (sch_direct_xmit+0xcc/0x264) r10:834e5600 r9:00000000 r8:00000000 r7:82b44000 r6:82ab1f00 r5:834e5600 r4:83f27400
      | [<809c64b8>] (sch_direct_xmit) from [<809c6c0c>] (__qdisc_run+0x4f0/0x534)
      
      To fix this problem, only set skb ownership to sockets which still
      have a ref count > 0.
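
      The "take a reference only if the count is still above zero" pattern
      can be sketched with C11 atomics. ref_get_unless_zero() is a
      hypothetical user-space helper written for this illustration; it is
      not the kernel's refcount API.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Take a reference only while the object is still alive. Returns false
 * when the count has already dropped to zero, in which case the caller
 * must not attach the object to anything.
 */
bool ref_get_unless_zero(atomic_int *refcnt)
{
    int old = atomic_load(refcnt);

    while (old > 0) {
        /* On failure, old is refreshed with the current value. */
        if (atomic_compare_exchange_weak(refcnt, &old, old + 1))
            return true;
    }
    return false;
}
```

      A plain increment would turn 0 into 1 and revive an object that is
      already being freed, which is the "addition on 0" warning seen in the
      back trace.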
      
      Fixes: 0ae89beb ("can: add destructor for self generated skbs")
      Cc: Oliver Hartkopp <socketcan@hartkopp.net>
      Cc: Andre Naujoks <nautsch2@gmail.com>
      Link: https://lore.kernel.org/r/20210226092456.27126-1-o.rempel@pengutronix.de
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Reviewed-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      7401aa66
    • M
      net: l2tp: reduce log level of messages in receive path, add counter instead · 7a2d5a15
      Matthias Schiffer committed
      stable inclusion
      from stable-5.10.24
      commit fa5d019c56e78e0b33f585d23149f2553568b998
      bugzilla: 51348
      
      --------------------------------
      
      commit 3e59e885 upstream.
      
      Commit 5ee759cd ("l2tp: use standard API for warning log messages")
      changed a number of warnings about invalid packets in the receive path
      so that they are always shown, instead of only when a special L2TP debug
      flag is set. Even with rate limiting these warnings can easily cause
      significant log spam - potentially triggered by a malicious party
      sending invalid packets on purpose.
      
      In addition these warnings were noticed by projects like Tunneldigger [1],
      which uses L2TP for its data path, but implements its own control
      protocol (which is sufficiently different from L2TP data packets that it
      would always be passed up to userspace even with future extensions of
      L2TP).
      
      Some of the warnings were already redundant, as l2tp_stats has a counter
      for these packets. This commit adds one additional counter for invalid
      packets that are passed up to userspace. Packets with unknown session are
      not counted as invalid, as there is nothing wrong with the format of
      these packets.
      
      With the additional counter, all of these messages are either redundant
      or benign, so we reduce them to pr_debug_ratelimited().
      
      [1] https://github.com/wlanslovenija/tunneldigger/issues/160
      
      Fixes: 5ee759cd ("l2tp: use standard API for warning log messages")
      Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      7a2d5a15
    • B
      net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0 · f908d27a
      Balazs Nemeth committed
      stable inclusion
      from stable-5.10.24
      commit 453fff24f52eeb62ab65582848498097273df269
      bugzilla: 51348
      
      --------------------------------
      
      commit d348ede3 upstream.
      
      A packet with skb_inner_network_header(skb) == skb_network_header(skb)
      and ETH_P_MPLS_UC will prevent mpls_gso_segment from pulling any headers
      from the packet. Subsequently, the call to skb_mac_gso_segment will
      again call mpls_gso_segment with the same packet leading to an infinite
      loop. In addition, ensure that the header length is a multiple of four,
      which should hold irrespective of the number of stacked labels.
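
      The two conditions can be written as a single predicate. This is a
      sketch: mpls_hlen_valid() and MPLS_LABEL_LEN are hypothetical names,
      and hlen is assumed to be the distance in bytes between the outer and
      inner network headers.

```c
#include <stdbool.h>

/* One MPLS label stack entry is 4 bytes long. */
#define MPLS_LABEL_LEN 4u

/*
 * A header length of zero would make segmentation re-enter with the same
 * packet forever; a length that is not a multiple of 4 cannot be a whole
 * number of stacked labels.
 */
bool mpls_hlen_valid(unsigned int hlen)
{
    return hlen != 0 && hlen % MPLS_LABEL_LEN == 0;
}
```

      A caller would stop segmenting and report an error when the predicate
      is false, instead of pulling zero bytes and recursing.
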
      Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      f908d27a
    • B
      net: check if protocol extracted by virtio_net_hdr_set_proto is correct · 857ee3c4
      Balazs Nemeth committed
      stable inclusion
      from stable-5.10.24
      commit faa3baa2828c5e1c4374f3e60041f75c64f5fcb6
      bugzilla: 51348
      
      --------------------------------
      
      commit 924a9bc3 upstream.
      
      For gso packets, virtio_net_hdr_set_proto sets the protocol (if it isn't
      set) based on the type in the virtio net hdr, but the skb could contain
      anything since it could come from packet_snd through a raw socket. If
      there is a mismatch between what virtio_net_hdr_set_proto sets and
      the actual protocol, then the skb could be handled incorrectly later
      on.
      
      An example where this poses an issue is with the subsequent call to
      skb_flow_dissect_flow_keys_basic which relies on skb->protocol being set
      correctly. A specially crafted packet could fool
      skb_flow_dissect_flow_keys_basic, preventing EINVAL from being returned.
      
      Avoid blindly trusting the information provided by the virtio net header
      by checking that the protocol in the packet actually matches the
      protocol set by virtio_net_hdr_set_proto. Note that since the protocol
      is only checked if skb->dev implements header_ops->parse_protocol,
      packets from devices without the implementation are not checked at this
      stage.
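
      The idea of cross-checking a claimed protocol against the packet
      itself can be sketched for plain Ethernet frames. parse_ethertype()
      and claimed_proto_matches() are hypothetical helpers for this
      illustration. Unlike the kernel, which skips the check when the
      device cannot parse the protocol, this sketch treats an unparsable
      frame as a mismatch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN 14u

/* Read the EtherType (bytes 12 and 13, network byte order). */
static bool parse_ethertype(const uint8_t *frame, size_t len, uint16_t *proto)
{
    if (len < ETH_HDR_LEN)
        return false;
    *proto = (uint16_t)((frame[12] << 8) | frame[13]);
    return true;
}

/* Accept the claimed protocol only if the frame carries the same one. */
bool claimed_proto_matches(const uint8_t *frame, size_t len, uint16_t claimed)
{
    uint16_t actual;

    if (!parse_ethertype(frame, len, &actual))
        return false;
    return actual == claimed;
}
```

      A frame whose EtherType is 0x86dd (IPv6) is rejected when the
      metadata claims 0x0800 (IPv4), so later code does not parse it with
      the wrong protocol.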
      
      Fixes: 9274124f ("net: stricter validation of untrusted gso packets")
      Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      857ee3c4
    • D
      net: Fix gro aggregation for udp encaps with zero csum · 8365fd56
      Daniel Borkmann committed
      stable inclusion
      from stable-5.10.24
      commit 09af4362ba47c805347840c2bb9719c0458925ca
      bugzilla: 51348
      
      --------------------------------
      
      commit 89e5c58f upstream.
      
      We noticed a GRO issue for UDP-based encaps such as vxlan/geneve when the
      csum for the UDP header itself is 0. In that case, GRO aggregation does
      not take place on the phys dev, but instead is deferred to the vxlan/geneve
      driver (see trace below).
      
      The reason is essentially that GRO aggregation bails out in udp_gro_receive()
      for such case when drivers marked the skb with CHECKSUM_UNNECESSARY (ice, i40e,
      others) where for non-zero csums 2abb7cdc ("udp: Add support for doing
      checksum unnecessary conversion") promotes those skbs to CHECKSUM_COMPLETE
      and napi context has csum_valid set. This is however not the case for zero
      UDP csum (here: csum_cnt is still 0 and csum_valid continues to be false).
      
      At the same time 57c67ff4 ("udp: additional GRO support") added matches
      on !uh->check ^ !uh2->check as part of determining candidates for
      aggregation, so it certainly is expected to handle zero csums in
      udp_gro_receive(). The purpose of the check added via 662880f4 ("net:
      Allow GRO to use and set levels of checksum unnecessary") seems to be
      to catch bad csums and stop aggregation right away.
      
      One way to fix aggregation in the zero case is to only perform the !csum_valid
      check in udp_gro_receive() if uh->check is in fact non-zero.
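
      A reduced sketch of that condition follows. struct gro_csum_state and
      gro_should_flush() are hypothetical, and other inputs of the real
      check (for example the encapsulation mark) are left out on purpose.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced view of the per-packet checksum state. */
struct gro_csum_state {
    uint16_t udp_check;     /* UDP checksum field; 0 means "not used" */
    unsigned int csum_cnt;  /* checksums already verified by the device */
    bool csum_valid;        /* checksum verified in software */
};

/*
 * Return true when aggregation should be skipped. The validity test only
 * applies if the sender actually supplied a UDP checksum.
 */
bool gro_should_flush(const struct gro_csum_state *s)
{
    return s->udp_check != 0 && s->csum_cnt == 0 && !s->csum_valid;
}
```

      With a zero UDP checksum the predicate is false regardless of the
      other fields, so such packets stay eligible for aggregation on the
      physical device.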
      
      Before:
      
        [...]
        swapper     0 [008]   731.946506: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100400 len=1500   (1)
        swapper     0 [008]   731.946507: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100200 len=1500
        swapper     0 [008]   731.946507: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101100 len=1500
        swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101700 len=1500
        swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101b00 len=1500
        swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100600 len=1500
        swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100f00 len=1500
        swapper     0 [008]   731.946509: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100a00 len=1500
        swapper     0 [008]   731.946516: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100500 len=1500
        swapper     0 [008]   731.946516: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100700 len=1500
        swapper     0 [008]   731.946516: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101d00 len=1500   (2)
        swapper     0 [008]   731.946517: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101000 len=1500
        swapper     0 [008]   731.946517: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101c00 len=1500
        swapper     0 [008]   731.946517: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101400 len=1500
        swapper     0 [008]   731.946518: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100e00 len=1500
        swapper     0 [008]   731.946518: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101600 len=1500
        swapper     0 [008]   731.946521: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100800 len=774
        swapper     0 [008]   731.946530: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff966497100400 len=14032 (1)
        swapper     0 [008]   731.946530: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff966497101d00 len=9112  (2)
        [...]
      
        # netperf -H 10.55.10.4 -t TCP_STREAM -l 20
        MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.55.10.4 () port 0 AF_INET : demo
        Recv   Send    Send
        Socket Socket  Message  Elapsed
        Size   Size    Size     Time     Throughput
        bytes  bytes   bytes    secs.    10^6bits/sec
      
         87380  16384  16384    20.01    13129.24
      
      After:
      
        [...]
        swapper     0 [026]   521.862641: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff93ab0d479000 len=11286 (1)
        swapper     0 [026]   521.862643: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d479000 len=11236 (1)
        swapper     0 [026]   521.862650: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff93ab0d478500 len=2898  (2)
        swapper     0 [026]   521.862650: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff93ab0d479f00 len=8490  (3)
        swapper     0 [026]   521.862653: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d478500 len=2848  (2)
        swapper     0 [026]   521.862653: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d479f00 len=8440  (3)
        [...]
      
        # netperf -H 10.55.10.4 -t TCP_STREAM -l 20
        MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.55.10.4 () port 0 AF_INET : demo
        Recv   Send    Send
        Socket Socket  Message  Elapsed
        Size   Size    Size     Time     Throughput
        bytes  bytes   bytes    secs.    10^6bits/sec
      
         87380  16384  16384    20.01    24576.53
      
      Fixes: 57c67ff4 ("udp: additional GRO support")
      Fixes: 662880f4 ("net: Allow GRO to use and set levels of checksum unnecessary")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Cc: Tom Herbert <tom@herbertland.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/r/20210226212248.8300-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • F
      ath9k: fix transmitting to stations in dynamic SMPS mode · bd1c87fa
      Felix Fietkau authored
      stable inclusion
      from stable-5.10.24
      commit d2fb1911a7a8f655440d613fc8946df384d83ee5
      bugzilla: 51348
      
      --------------------------------
      
      commit 3b9ea720 upstream.
      
      When transmitting to a receiver in dynamic SMPS mode, all transmissions that
      use multiple spatial streams need to be sent using CTS-to-self or RTS/CTS to
      give the receiver's extra chains some time to wake up.
      This fixes the tx rate getting stuck at <= MCS7 for some clients, especially
      Intel ones, which make aggressive use of SMPS.
      
      Cc: stable@vger.kernel.org
      Reported-by: Martin Kennedy <hurricos@gmail.com>
      Signed-off-by: Felix Fietkau <nbd@nbd.name>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Link: https://lore.kernel.org/r/20210214184911.96702-1-nbd@nbd.name
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
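
The rule described in the ath9k fix above can be sketched in C. This is not the driver's code: `enum smps_mode`, `ht_mcs_streams()`, `needs_rts_protection()` and `smps_self_check()` are hypothetical names introduced only for illustration. The sketch assumes HT rates with MCS indices 0 to 31, where each group of eight indices adds one spatial stream, so MCS 0 to 7 are single-stream.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical receiver power-save states; not the ath9k or mac80211 types. */
enum smps_mode { SMPS_OFF, SMPS_STATIC, SMPS_DYNAMIC };

/* Number of spatial streams for an HT MCS index in 0..31 (assumed range):
 * 0-7 -> 1 stream, 8-15 -> 2 streams, 16-23 -> 3, 24-31 -> 4. */
static int ht_mcs_streams(int mcs)
{
    return mcs / 8 + 1;
}

/* A multi-stream frame sent to a receiver in dynamic SMPS mode has to be
 * preceded by RTS/CTS or CTS-to-self so that the receiver's extra chains
 * have time to wake up. Static SMPS is outside the scope of this sketch. */
static bool needs_rts_protection(enum smps_mode mode, int mcs)
{
    return mode == SMPS_DYNAMIC && ht_mcs_streams(mcs) > 1;
}

/* Small self-check showing the expected decisions. */
static void smps_self_check(void)
{
    assert(!needs_rts_protection(SMPS_DYNAMIC, 7));  /* single stream */
    assert(needs_rts_protection(SMPS_DYNAMIC, 8));   /* two streams */
    assert(!needs_rts_protection(SMPS_OFF, 15));     /* SMPS disabled */
}
```

Under these assumptions, only rates above MCS7 trigger the protection, which is consistent with the commit's observation that unprotected clients got stuck at MCS7 or below.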
    • M
      crypto: mips/poly1305 - enable for all MIPS processors · ab4d5a39
      Maciej W. Rozycki authored
      stable inclusion
      from stable-5.10.24
      commit b0454a28f60878539a55439436ea9ad29728d366
      bugzilla: 51348
      
      --------------------------------
      
      commit 6c810cf2 upstream.
      
      The MIPS Poly1305 implementation is generic MIPS code written to support
      ISAs down to the original MIPS I and MIPS III for the 32-bit and 64-bit
      variants respectively.  Lift the current limitation, which enables the
      code for MIPSr1 ISA or newer processors only, and make it available for
      all MIPS processors.

      Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
      Fixes: a11d055e ("crypto: mips/poly1305 - incorporate OpenSSL/CRYPTOGAMS optimized implementation")
      Cc: stable@vger.kernel.org # v5.5+
      Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>