1. 09 Apr, 2021, 40 commits
    • net: stmmac: stop each tx channel independently · c4088439
      Joakim Zhang committed
      stable inclusion
      from stable-5.10.24
      commit 3c1b58261ff81fba9f6bd8f8d9bcec85920c2ff0
      bugzilla: 51348
      
      --------------------------------
      
      commit a3e860a8 upstream.
      
      Clearing the GMAC_CONFIG_TE bit stops all tx channels, but users may
      only want to stop a specific tx channel.
      
      Fixes: 48863ce5 ("stmmac: add DMA support for GMAC 4.xx")
      Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • perf build: Fix ccache usage in $(CC) when generating arch errno table · af999be5
      Antonio Terceiro committed
      stable inclusion
      from stable-5.10.24
      commit 640492cf1732a9f97b062c3eec2c5ee37f73dada
      bugzilla: 51348
      
      --------------------------------
      
      commit dacfc08d upstream.
      
      This was introduced by commit e4ffd066 ("perf: Normalize gcc
      parameter when generating arch errno table").
      
      Assuming the first word of $(CC) is the actual compiler breaks usage
      like CC="ccache gcc": the script ends up calling ccache directly with
      gcc arguments, which fails. Instead of getting the first word, just
      remove from $(CC) any word that starts with a "-". This maintains the
      spirit of the original patch, while not breaking ccache users.
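
      The difference between the two approaches can be sketched in a few lines.
      This is only a Python illustration of the word filtering described above,
      not the Makefile change itself; the helper names first_word and
      strip_flags are hypothetical.

```python
def first_word(cc: str) -> str:
    """Old approach: assume the first word of $(CC) is the compiler."""
    return cc.split()[0]


def strip_flags(cc: str) -> str:
    """New approach: drop every word that starts with '-', keep the rest."""
    return " ".join(word for word in cc.split() if not word.startswith("-"))


for cc in ("gcc", "gcc -m32", "ccache gcc", "ccache gcc -m32"):
    print(f"{cc!r}: first word {first_word(cc)!r}, flags stripped {strip_flags(cc)!r}")
```

      With CC="ccache gcc", the first-word approach yields only "ccache", while
      the filtering approach keeps "ccache gcc" intact.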
      
      Fixes: e4ffd066 ("perf: Normalize gcc parameter when generating arch errno table")
      Signed-off-by: Antonio Terceiro <antonio.terceiro@linaro.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: He Zhe <zhe.he@windriver.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20210224130046.346977-1-antonio.terceiro@linaro.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • tools/resolve_btfids: Fix build error with older host toolchains · 37735835
      Kun-Chuan Hsieh committed
      stable inclusion
      from stable-5.10.24
      commit 8493877b58b6f35d643da307ee1ae920b3ddd1d8
      bugzilla: 51348
      
      --------------------------------
      
      commit 41462c6e upstream.
      
      Older libelf.h and glibc elf.h might not yet define the ELF compression
      types.
      
      Checking for and defining SHF_COMPRESSED fixes the build error when
      compiling with older toolchains. Also, the resolve_btfids tool is compiled
      with the host toolchain, which is more likely to be older than the
      cross-compile toolchain.
      
      Fixes: 51f6463a ("tools/resolve_btfids: Fix sections with wrong alignment")
      Signed-off-by: Kun-Chuan Hsieh <jetswayss@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Link: https://lore.kernel.org/bpf/20210224052752.5284-1-jetswayss@gmail.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • ixgbe: fail to create xfrm offload of IPsec tunnel mode SA · b7d26338
      Antony Antony committed
      stable inclusion
      from stable-5.10.24
      commit ee7eac24b5b495f484034cdb2bd7152edc5a985f
      bugzilla: 51348
      
      --------------------------------
      
      commit d785e1fe upstream.
      
      Based on talks and indirect references, ixgbe IPsec offload does not
      support IPsec tunnel mode offload. It can only support IPsec transport
      mode offload. Now explicitly fail when creating a non-transport mode SA
      with offload to avoid false performance expectations.
      
      Fixes: 63a67fe2 ("ixgbe: add ipsec offload add and remove SA")
      Signed-off-by: Antony Antony <antony@phenome.org>
      Acked-by: Shannon Nelson <snelson@pensando.io>
      Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • r8169: fix r8168fp_adjust_ocp_cmd function · 1c236597
      Hayes Wang committed
      stable inclusion
      from stable-5.10.24
      commit cab735320fe92b40d1f69b93a20694117aa2d5a5
      bugzilla: 51348
      
      --------------------------------
      
      commit abbf9a0e upstream.
      
      The (0xBAF70000 & 0x00FFF000) << 6 should be (0xf70 << 18).
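
      The arithmetic behind that statement can be checked directly. The snippet
      below is only a worked calculation in Python; the helper name
      shift_masked_field is hypothetical and does not appear in the driver.

```python
def shift_masked_field(value: int) -> int:
    """Keep bits 12..23 of value, then shift the result left by 6 bits."""
    return (value & 0x00FFF000) << 6


lhs = shift_masked_field(0xBAF70000)  # (0xBAF70000 & 0x00FFF000) << 6
rhs = 0xF70 << 18

print(hex(0xBAF70000 & 0x00FFF000))  # 0xf70000, i.e. 0xf70 << 12
print(hex(lhs), hex(rhs))            # 0x3dc00000 0x3dc00000
```

      Masking leaves 0xf70 positioned at bit 12, so the extra shift by 6 places
      it at bit 18, which is the constant the commit message asks for.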
      
      Fixes: 561535b0 ("r8169: fix OCP access on RTL8117")
      Signed-off-by: Hayes Wang <hayeswang@realtek.com>
      Acked-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • s390/qeth: fix memory leak after failed TX Buffer allocation · 979f13f5
      Julian Wiedmann committed
      stable inclusion
      from stable-5.10.24
      commit 84ef8a8cb7894a71f61a46ebdd7bf0c53773cecf
      bugzilla: 51348
      
      --------------------------------
      
      commit e7a36d27 upstream.
      
      When qeth_alloc_qdio_queues() fails to allocate one of the buffers that
      back an Output Queue, the 'out_freeoutqbufs' path will free all
      previously allocated buffers for this queue. But it fails to free the
      half-finished queue struct itself.
      
      Move the buffer allocation into qeth_alloc_output_queue(), and deal with
      such errors internally.
      
      Fixes: 0da9581d ("qeth: exploit asynchronous delivery of storage blocks")
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • net: qrtr: fix error return code of qrtr_sendmsg() · 5a7361a8
      Jia-Ju Bai committed
      stable inclusion
      from stable-5.10.24
      commit 345d90cd741a6b0c5d98920ee1c6fc6549dc0415
      bugzilla: 51348
      
      --------------------------------
      
      commit 179d0ba0 upstream.
      
      When sock_alloc_send_skb() returns NULL to skb, no error return code of
      qrtr_sendmsg() is assigned.
      To fix this bug, rc is assigned with -ENOMEM in this case.
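
      The pattern can be illustrated with a small model. This is a Python
      sketch, not the kernel code: send_buggy, send_fixed, and the stale rc
      value of 0 are assumptions made for illustration, and alloc_skb stands in
      for sock_alloc_send_skb().

```python
import errno


def send_buggy(alloc_skb, rc=0):
    """Allocation failure leaves rc at whatever it held before (assumed 0 here)."""
    skb = alloc_skb()
    if skb is None:
        return rc  # no error code assigned: the caller cannot tell it failed
    return len(skb)


def send_fixed(alloc_skb):
    """Allocation failure is reported as -ENOMEM."""
    skb = alloc_skb()
    if skb is None:
        return -errno.ENOMEM
    return len(skb)


def failing_alloc():
    return None  # models an allocator that is out of memory


print(send_buggy(failing_alloc), send_fixed(failing_alloc))
```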
      
      Fixes: 194ccc88 ("net: qrtr: Support decoding incoming v2 packets")
      Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
      Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • net: enetc: allow hardware timestamping on TX queues with tc-etf enabled · 2e3a86fe
      Vladimir Oltean committed
      stable inclusion
      from stable-5.10.24
      commit 4f8e71a770dd460c9e8ac40baceefef6d8853b13
      bugzilla: 51348
      
      --------------------------------
      
      commit 29d98f54 upstream.
      
      The txtime is passed to the driver in skb->skb_mstamp_ns, which is
      actually in a union with skb->tstamp (the place where software
      timestamps are kept).
      
      Since commit b50a5c70 ("net: allow simultaneous SW and HW transmit
      timestamping"), __sock_recv_timestamp has some logic for making sure
      that the two calls to skb_tstamp_tx:
      
      skb_tx_timestamp(skb) # Software timestamp in the driver
      -> skb_tstamp_tx(skb, NULL)
      
      and
      
      skb_tstamp_tx(skb, &shhwtstamps) # Hardware timestamp in the driver
      
      will both do the right thing and in a race-free manner, meaning that
      skb_tx_timestamp will deliver a cmsg with the software timestamp only,
      and skb_tstamp_tx with a non-NULL hwtstamps argument will deliver a cmsg
      with the hardware timestamp only.
      
      Why are races even possible? Well, because although the software timestamp
      skb->tstamp is private per skb, the hardware timestamp skb_hwtstamps(skb)
      lives in skb_shinfo(skb), an area which is shared between skbs and their
      clones. And skb_tstamp_tx works by cloning the packets when timestamping
      them, therefore attempting to perform hardware timestamping on an skb's
      clone will also change the hardware timestamp of the original skb. And
      the original skb might have been yet again cloned for software
      timestamping, at an earlier stage.
      
      So the logic in __sock_recv_timestamp can't be as simple as saying
      "does this skb have a hardware timestamp? if yes I'll send the hardware
      timestamp to the socket, otherwise I'll send the software timestamp",
      precisely because the hardware timestamp is shared.
      Instead, it's quite the other way around: __sock_recv_timestamp says
      "does this skb have a software timestamp? if yes, I'll send the software
      timestamp, otherwise the hardware one". This works because the software
      timestamp is not shared with clones.
      
      But that means we have a problem when we attempt hardware timestamping
      with skbs whose skb->tstamp is not 0. __sock_recv_timestamp
      will say "oh, yeah, this must be some sort of odd clone" and will not
      deliver the hardware timestamp to the socket. And this is exactly what
      is happening when we have txtime enabled on the socket: as mentioned,
      that is put in a union with skb->tstamp, so it is quite easy to mistake
      it.
      
      Do what other drivers do (intel igb/igc) and write zero to skb->tstamp
      before taking the hardware timestamp. It's of no use to us now (we're
      already on the TX confirmation path).
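
      The selection logic and the effect of the fix can be modelled roughly as
      follows. This is a simplified Python sketch, not kernel code: the Skb
      class, timestamp_for_socket, and tx_confirm are hypothetical names, and
      the sharing of the hardware timestamp between clones is not modelled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Skb:
    # Software timestamp; in this model it shares storage with the txtime,
    # so a socket using txtime leaves a non-zero value here.
    tstamp: int = 0
    # Hardware timestamp (shared between an skb and its clones in the kernel).
    hwtstamp: Optional[int] = None


def timestamp_for_socket(skb: Skb) -> Optional[Tuple[str, int]]:
    """The software timestamp wins if present, otherwise the hardware one."""
    if skb.tstamp != 0:
        return ("software", skb.tstamp)
    if skb.hwtstamp is not None:
        return ("hardware", skb.hwtstamp)
    return None


def tx_confirm(skb: Skb, hw_time: int, clear_tstamp: bool) -> Optional[Tuple[str, int]]:
    """Record a hardware timestamp on the TX confirmation path."""
    if clear_tstamp:
        skb.tstamp = 0  # the fix: the txtime is of no use any more
    skb.hwtstamp = hw_time
    return timestamp_for_socket(skb)


print(tx_confirm(Skb(tstamp=5000), hw_time=7000, clear_tstamp=False))  # ('software', 5000)
print(tx_confirm(Skb(tstamp=5000), hw_time=7000, clear_tstamp=True))   # ('hardware', 7000)
```

      Without the clearing step, the leftover txtime is mistaken for a software
      timestamp and the hardware timestamp never reaches the socket.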
      
      Fixes: 0d08c9ec ("enetc: add support time specific departure base on the qos etf")
      Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • net: davicom: Fix regulator not turned off on driver removal · f93d1f3c
      Paul Cercueil committed
      stable inclusion
      from stable-5.10.24
      commit 4fd0654b8f2129b68203974ddee15f804ec011c2
      bugzilla: 51348
      
      --------------------------------
      
      commit cf9e60aa upstream.
      
      We must disable the regulator that was enabled in the probe function.
      
      Fixes: 7994fe55 ("dm9000: Add regulator and reset support to dm9000")
      Signed-off-by: Paul Cercueil <paul@crapouillou.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • net: davicom: Fix regulator not turned off on failed probe · 69ae4b29
      Paul Cercueil committed
      stable inclusion
      from stable-5.10.24
      commit e334c401f3fc9fdf2321e14a5204d591a0713e7f
      bugzilla: 51348
      
      --------------------------------
      
      commit ac88c531 upstream.
      
      When the probe fails or requests to be deferred, we must disable the
      regulator that was previously enabled.
      
      Fixes: 7994fe55 ("dm9000: Add regulator and reset support to dm9000")
      Signed-off-by: Paul Cercueil <paul@crapouillou.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • net: lapbether: Remove netif_start_queue / netif_stop_queue · a5937062
      Xie He committed
      stable inclusion
      from stable-5.10.24
      commit 6342ccdfdf2bfb47e80037508dc75bd4e7ace184
      bugzilla: 51348
      
      --------------------------------
      
      commit f7d9d485 upstream.
      
      For the devices in this driver, the default qdisc is "noqueue",
      because their "tx_queue_len" is 0.
      
      In function "__dev_queue_xmit" in "net/core/dev.c", devices with the
      "noqueue" qdisc are specially handled. Packets are transmitted without
      being queued after a "dev->flags & IFF_UP" check. However, it's possible
      that even if this check succeeds, "ops->ndo_stop" may still have already
      been called. This is because in "__dev_close_many", "ops->ndo_stop" is
      called before clearing the "IFF_UP" flag.
      
      If we call "netif_stop_queue" in "ops->ndo_stop", then it's possible that
      "__dev_queue_xmit" sees that the "IFF_UP" flag is present, then checks
      "netif_xmit_stopped" and finds that the queue is already stopped.
      In this case, it will complain that:
      "Virtual device ... asks to queue packet!"
      
      To prevent "__dev_queue_xmit" from generating this complaint, we should
      not call "netif_stop_queue" in "ops->ndo_stop".
      
      We also don't need to call "netif_start_queue" in "ops->ndo_open",
      because after a netdev is allocated and registered, the
      "__QUEUE_STATE_DRV_XOFF" flag is initially not set, so there is no need
      to call "netif_start_queue" to clear it.
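
      The ordering problem can be reproduced in a toy model. This is a Python
      sketch of the sequence described above, not kernel code: the Dev class
      and the function names are hypothetical, and the "race" is modelled as a
      transmit that runs between ndo_stop and the clearing of IFF_UP.

```python
IFF_UP = 0x1  # interface "up" flag


class Dev:
    def __init__(self) -> None:
        self.flags = IFF_UP
        self.queue_stopped = False


def xmit_noqueue(dev: Dev) -> str:
    """Model of the "noqueue" transmit path: check IFF_UP, then the queue state."""
    if not (dev.flags & IFF_UP):
        return "dropped: device is down"
    if dev.queue_stopped:
        return "Virtual device ... asks to queue packet!"
    return "handed to driver"


def ndo_stop_old(dev: Dev) -> None:
    dev.queue_stopped = True  # models netif_stop_queue() in ops->ndo_stop


def ndo_stop_new(dev: Dev) -> None:
    pass  # the queue state is left alone


def close_with_racing_xmit(dev: Dev, ndo_stop) -> str:
    ndo_stop(dev)               # ops->ndo_stop runs first
    result = xmit_noqueue(dev)  # a transmit that races with the close
    dev.flags &= ~IFF_UP        # IFF_UP is cleared only afterwards
    return result


print(close_with_racing_xmit(Dev(), ndo_stop_old))
print(close_with_racing_xmit(Dev(), ndo_stop_new))
```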
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Xie He <xie.he.0141@gmail.com>
      Acked-by: Martin Schiller <ms@dev.tdt.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • stmmac: intel: Fixes clock registration error seen for multiple interfaces · 96e1cedc
      Wong Vee Khee committed
      stable inclusion
      from stable-5.10.24
      commit 9c4136081cc2076ca981e68001b3cb8f53800a94
      bugzilla: 51348
      
      --------------------------------
      
      commit 8eb37ab7 upstream.
      
      Issue seen when enumerating multiple Intel mGbE interfaces in EHL.
      
      [    6.898141] intel-eth-pci 0000:00:1d.2: enabling device (0000 -> 0002)
      [    6.900971] intel-eth-pci 0000:00:1d.2: Fail to register stmmac-clk
      [    6.906434] intel-eth-pci 0000:00:1d.2: User ID: 0x51, Synopsys ID: 0x52
      
      We fix it by making the clock name unique, following the format
      stmmac-pci_name(pci_dev), so that we can differentiate the clocks for
      these Intel mGbE interfaces on the EHL platform as follows:
      
        /sys/kernel/debug/clk/stmmac-0000:00:1d.1
        /sys/kernel/debug/clk/stmmac-0000:00:1d.2
        /sys/kernel/debug/clk/stmmac-0000:00:1e.4
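
      Why a per-device name avoids the failure can be shown with a toy
      registry. This is a Python sketch under stated assumptions: the registry
      rejects duplicate names, the shared name is taken to be the "stmmac-clk"
      seen in the log above, and register_clocks and unique_clk_name are
      hypothetical helpers.

```python
def register_clocks(names):
    """Register clock names in order; a name that already exists is rejected."""
    registered, rejected = [], []
    for name in names:
        (rejected if name in registered else registered).append(name)
    return registered, rejected


def unique_clk_name(pci_name: str) -> str:
    """Build a per-device clock name in the stmmac-<pci name> format."""
    return f"stmmac-{pci_name}"


pci_names = ["0000:00:1d.1", "0000:00:1d.2", "0000:00:1e.4"]
shared = register_clocks(["stmmac-clk"] * len(pci_names))
unique = register_clocks([unique_clk_name(n) for n in pci_names])
print(shared)
print(unique)
```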
      
      Fixes: 58da0cfa ("net: stmmac: create dwmac-intel.c to contain all Intel platform")
      Signed-off-by: Wong Vee Khee <vee.khee.wong@intel.com>
      Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
      Co-developed-by: Ong Boon Leong <boon.leong.ong@intel.com>
      Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • net: stmmac: Fix VLAN filter delete timeout issue in Intel mGBE SGMII · f4995f98
      Ong Boon Leong committed
      stable inclusion
      from stable-5.10.24
      commit d78f23ef304060608bc9b1627f303842f75d7029
      bugzilla: 51348
      
      --------------------------------
      
      commit 9a7b3950 upstream.
      
      For the Intel mGbE controller, the MAC VLAN filter delete operation times
      out if the serdes power-down sequence happens first during driver remove(),
      with the message below.
      
      [82294.764958] intel-eth-pci 0000:00:1e.4 eth2: stmmac_dvr_remove: removing driver
      [82294.778677] intel-eth-pci 0000:00:1e.4 eth2: Timeout accessing MAC_VLAN_Tag_Filter
      [82294.779997] intel-eth-pci 0000:00:1e.4 eth2: failed to kill vid 0081/0
      [82294.947053] intel-eth-pci 0000:00:1d.2 eth1: stmmac_dvr_remove: removing driver
      [82295.002091] intel-eth-pci 0000:00:1d.1 eth0: stmmac_dvr_remove: removing driver
      
      Therefore, we delay the serdes power-down to be after unregister_netdev()
      which triggers the VLAN filter delete.
      
      Fixes: b9663b7c ("net: stmmac: Enable SERDES power up/down sequence")
      Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • cipso,calipso: resolve a number of problems with the DOI refcounts · a2ebebf3
      Paul Moore committed
      stable inclusion
      from stable-5.10.24
      commit 85178d76febd30a745b7d947dbd9751919d0fa5b
      bugzilla: 51348
      
      --------------------------------
      
      commit ad5d07f4 upstream.
      
      The current CIPSO and CALIPSO refcounting scheme for the DOI
      definitions is a bit flawed in that we:
      
      1. Don't correctly match gets/puts in netlbl_cipsov4_list().
      2. Decrement the refcount on each attempt to remove the DOI from the
         DOI list, only removing it from the list once the refcount drops
         to zero.
      
      This patch fixes these problems by adding the missing "puts" to
      netlbl_cipsov4_list() and introducing a more conventional, i.e.
      not-buggy, refcounting mechanism to the DOI definitions.  Upon the
      addition of a DOI to the DOI list, it is initialized with a refcount
      of one; removing a DOI from the list removes it from the list and
      drops the refcount by one; "gets" and "puts" behave as expected with
      respect to refcounts, increasing and decreasing the DOI's refcount by
      one.
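
      The conventional scheme described in this paragraph can be sketched as a
      small model. This is a Python illustration, not the NetLabel code: the
      DoiDef and DoiList classes and their method names are hypothetical, and
      "freed" stands in for releasing the definition's memory.

```python
class DoiDef:
    def __init__(self, doi: int) -> None:
        self.doi = doi
        self.refcount = 1   # the list's own reference, taken on addition
        self.freed = False


class DoiList:
    def __init__(self) -> None:
        self._defs = {}

    def add(self, doi: int) -> DoiDef:
        self._defs[doi] = DoiDef(doi)
        return self._defs[doi]

    def get(self, doi: int):
        """Look up a DOI and take a reference on it."""
        doi_def = self._defs.get(doi)
        if doi_def is not None:
            doi_def.refcount += 1
        return doi_def

    def put(self, doi_def: DoiDef) -> None:
        """Drop one reference; release the definition when none remain."""
        doi_def.refcount -= 1
        if doi_def.refcount == 0:
            doi_def.freed = True

    def remove(self, doi: int) -> bool:
        """Unlink the DOI immediately and drop the list's reference."""
        doi_def = self._defs.pop(doi, None)
        if doi_def is None:
            return False
        self.put(doi_def)
        return True


dois = DoiList()
dois.add(3)
user_ref = dois.get(3)   # refcount 2: the list plus one user
dois.remove(3)           # unlinked at once, refcount 1
dois.put(user_ref)       # last reference gone, definition released
print(user_ref.refcount, user_ref.freed)  # 0 True
```

      A second remove() of the same DOI finds nothing in the list and does not
      touch the refcount, unlike the old scheme that decremented on every
      removal attempt.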
      
      Fixes: b1edeb10 ("netlabel: Replace protocol/NetLabel linking with refrerence counts")
      Fixes: d7cce015 ("netlabel: Add support for removing a CALIPSO DOI.")
      Reported-by: syzbot+9ec037722d2603a9f52e@syzkaller.appspotmail.com
      Signed-off-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • netdevsim: init u64 stats for 32bit hardware · db715b39
      Hillf Danton committed
      stable inclusion
      from stable-5.10.24
      commit e03ed1190d56f983c582c0853499d4c3a2cfa410
      bugzilla: 51348
      
      --------------------------------
      
      commit 863a42b2 upstream.
      
      Initialize the u64 stats to avoid lockdep reports on 32-bit hardware,
      such as:
      
       INFO: trying to register non-static key.
       the code is fine but needs lockdep annotation.
       turning off the locking correctness validator.
       CPU: 0 PID: 4695 Comm: syz-executor.0 Not tainted 5.11.0-rc5-syzkaller #0
       Hardware name: ARM-Versatile Express
       Backtrace:
       [<826fc5b8>] (dump_backtrace) from [<826fc82c>] (show_stack+0x18/0x1c arch/arm/kernel/traps.c:252)
       [<826fc814>] (show_stack) from [<8270d1f8>] (__dump_stack lib/dump_stack.c:79 [inline])
       [<826fc814>] (show_stack) from [<8270d1f8>] (dump_stack+0xa8/0xc8 lib/dump_stack.c:120)
       [<8270d150>] (dump_stack) from [<802bf9c0>] (assign_lock_key kernel/locking/lockdep.c:935 [inline])
       [<8270d150>] (dump_stack) from [<802bf9c0>] (register_lock_class+0xabc/0xb68 kernel/locking/lockdep.c:1247)
       [<802bef04>] (register_lock_class) from [<802baa2c>] (__lock_acquire+0x84/0x32d4 kernel/locking/lockdep.c:4711)
       [<802ba9a8>] (__lock_acquire) from [<802be840>] (lock_acquire.part.0+0xf0/0x554 kernel/locking/lockdep.c:5442)
       [<802be750>] (lock_acquire.part.0) from [<802bed10>] (lock_acquire+0x6c/0x74 kernel/locking/lockdep.c:5415)
       [<802beca4>] (lock_acquire) from [<81560548>] (seqcount_lockdep_reader_access include/linux/seqlock.h:103 [inline])
       [<802beca4>] (lock_acquire) from [<81560548>] (__u64_stats_fetch_begin include/linux/u64_stats_sync.h:164 [inline])
       [<802beca4>] (lock_acquire) from [<81560548>] (u64_stats_fetch_begin include/linux/u64_stats_sync.h:175 [inline])
       [<802beca4>] (lock_acquire) from [<81560548>] (nsim_get_stats64+0xdc/0xf0 drivers/net/netdevsim/netdev.c:70)
       [<8156046c>] (nsim_get_stats64) from [<81e2efa0>] (dev_get_stats+0x44/0xd0 net/core/dev.c:10405)
       [<81e2ef5c>] (dev_get_stats) from [<81e53204>] (rtnl_fill_stats+0x38/0x120 net/core/rtnetlink.c:1211)
       [<81e531cc>] (rtnl_fill_stats) from [<81e59d58>] (rtnl_fill_ifinfo+0x6d4/0x148c net/core/rtnetlink.c:1783)
       [<81e59684>] (rtnl_fill_ifinfo) from [<81e5ceb4>] (rtmsg_ifinfo_build_skb+0x9c/0x108 net/core/rtnetlink.c:3798)
       [<81e5ce18>] (rtmsg_ifinfo_build_skb) from [<81e5d0ac>] (rtmsg_ifinfo_event net/core/rtnetlink.c:3830 [inline])
       [<81e5ce18>] (rtmsg_ifinfo_build_skb) from [<81e5d0ac>] (rtmsg_ifinfo_event net/core/rtnetlink.c:3821 [inline])
       [<81e5ce18>] (rtmsg_ifinfo_build_skb) from [<81e5d0ac>] (rtmsg_ifinfo+0x44/0x70 net/core/rtnetlink.c:3839)
       [<81e5d068>] (rtmsg_ifinfo) from [<81e45c2c>] (register_netdevice+0x664/0x68c net/core/dev.c:10103)
       [<81e455c8>] (register_netdevice) from [<815608bc>] (nsim_create+0xf8/0x124 drivers/net/netdevsim/netdev.c:317)
       [<815607c4>] (nsim_create) from [<81561184>] (__nsim_dev_port_add+0x108/0x188 drivers/net/netdevsim/dev.c:941)
       [<8156107c>] (__nsim_dev_port_add) from [<815620d8>] (nsim_dev_port_add_all drivers/net/netdevsim/dev.c:990 [inline])
       [<8156107c>] (__nsim_dev_port_add) from [<815620d8>] (nsim_dev_probe+0x5cc/0x750 drivers/net/netdevsim/dev.c:1119)
       [<81561b0c>] (nsim_dev_probe) from [<815661dc>] (nsim_bus_probe+0x10/0x14 drivers/net/netdevsim/bus.c:287)
       [<815661cc>] (nsim_bus_probe) from [<811724c0>] (really_probe+0x100/0x50c drivers/base/dd.c:554)
       [<811723c0>] (really_probe) from [<811729c4>] (driver_probe_device+0xf8/0x1c8 drivers/base/dd.c:740)
       [<811728cc>] (driver_probe_device) from [<81172fe4>] (__device_attach_driver+0x8c/0xf0 drivers/base/dd.c:846)
       [<81172f58>] (__device_attach_driver) from [<8116fee0>] (bus_for_each_drv+0x88/0xd8 drivers/base/bus.c:431)
       [<8116fe58>] (bus_for_each_drv) from [<81172c6c>] (__device_attach+0xdc/0x1d0 drivers/base/dd.c:914)
       [<81172b90>] (__device_attach) from [<8117305c>] (device_initial_probe+0x14/0x18 drivers/base/dd.c:961)
       [<81173048>] (device_initial_probe) from [<81171358>] (bus_probe_device+0x90/0x98 drivers/base/bus.c:491)
       [<811712c8>] (bus_probe_device) from [<8116e77c>] (device_add+0x320/0x824 drivers/base/core.c:3109)
       [<8116e45c>] (device_add) from [<8116ec9c>] (device_register+0x1c/0x20 drivers/base/core.c:3182)
       [<8116ec80>] (device_register) from [<81566710>] (nsim_bus_dev_new drivers/net/netdevsim/bus.c:336 [inline])
       [<8116ec80>] (device_register) from [<81566710>] (new_device_store+0x178/0x208 drivers/net/netdevsim/bus.c:215)
       [<81566598>] (new_device_store) from [<8116fcb4>] (bus_attr_store+0x2c/0x38 drivers/base/bus.c:122)
       [<8116fc88>] (bus_attr_store) from [<805b4b8c>] (sysfs_kf_write+0x48/0x54 fs/sysfs/file.c:139)
       [<805b4b44>] (sysfs_kf_write) from [<805b3c90>] (kernfs_fop_write_iter+0x128/0x1ec fs/kernfs/file.c:296)
       [<805b3b68>] (kernfs_fop_write_iter) from [<804d22fc>] (call_write_iter include/linux/fs.h:1901 [inline])
       [<805b3b68>] (kernfs_fop_write_iter) from [<804d22fc>] (new_sync_write fs/read_write.c:518 [inline])
       [<805b3b68>] (kernfs_fop_write_iter) from [<804d22fc>] (vfs_write+0x3dc/0x57c fs/read_write.c:605)
       [<804d1f20>] (vfs_write) from [<804d2604>] (ksys_write+0x68/0xec fs/read_write.c:658)
       [<804d259c>] (ksys_write) from [<804d2698>] (__do_sys_write fs/read_write.c:670 [inline])
       [<804d259c>] (ksys_write) from [<804d2698>] (sys_write+0x10/0x14 fs/read_write.c:667)
       [<804d2688>] (sys_write) from [<80200060>] (ret_fast_syscall+0x0/0x2c arch/arm/mm/proc-v7.S:64)
      
      Fixes: 83c9e13a ("netdevsim: add software driver for testing offloads")
      Reported-by: syzbot+e74a6857f2d0efe3ad81@syzkaller.appspotmail.com
      Tested-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Hillf Danton <hdanton@sina.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • net: usb: qmi_wwan: allow qmimux add/del with master up · 19b7b69a
      Daniele Palmas committed
      stable inclusion
      from stable-5.10.24
      commit 6ed0a2cafd1f08a243123df094aa8479590112bf
      bugzilla: 51348
      
      --------------------------------
      
      commit 6c59cff3 upstream.
      
      There's no reason for preventing the creation and removal
      of qmimux network interfaces when the underlying interface
      is up.
      
      This makes qmi_wwan mux implementation more similar to the
      rmnet one, simplifying userspace management of the same
      logical interfaces.
      
      Fixes: c6adf779 ("net: usb: qmi_wwan: add qmap mux protocol support")
      Reported-by: Aleksander Morgado <aleksander@aleksander.es>
      Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
      Acked-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • net: dsa: sja1105: fix SGMII PCS being forced to SPEED_UNKNOWN instead of SPEED_10 · 1b757456
      Vladimir Oltean committed
      stable inclusion
      from stable-5.10.24
      commit 565b2d3ae20256be43df960206cdd1d8d479c325
      bugzilla: 51348
      
      --------------------------------
      
      commit 053d8ad1 upstream.
      
      When using MLO_AN_PHY or MLO_AN_FIXED, the MII_BMCR of the SGMII PCS is
      read before resetting the switch so it can be reprogrammed afterwards.
      This works for the speeds of 1Gbps and 100Mbps, but not for 10Mbps,
      because SPEED_10 is actually 0, so AND-ing anything with 0 is false,
      therefore that last branch is dead code.
      
      Do what others do (genphy_read_status_fixed, phy_mii_ioctl) and just
      remove the check for SPEED_10, let it fall into the default case.
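
      The dead branch is easy to demonstrate. The sketch below is a Python
      model, not the driver code: it assumes the check tested the BMCR speed
      masks from include/uapi/linux/mii.h (where the 10 Mbps mask is 0), and
      decode_speed_old and decode_speed_fixed are hypothetical names.

```python
# Speed bits of the MII basic mode control register
BMCR_SPEED1000 = 0x0040
BMCR_SPEED100 = 0x2000
BMCR_SPEED10 = 0x0000  # 10 Mbps is encoded as "neither speed bit set"

SPEED_UNKNOWN = -1


def decode_speed_old(bmcr: int) -> int:
    if bmcr & BMCR_SPEED1000:
        return 1000
    elif bmcr & BMCR_SPEED100:
        return 100
    elif bmcr & BMCR_SPEED10:  # AND with 0 is always false: dead code
        return 10
    return SPEED_UNKNOWN


def decode_speed_fixed(bmcr: int) -> int:
    if bmcr & BMCR_SPEED1000:
        return 1000
    elif bmcr & BMCR_SPEED100:
        return 100
    return 10  # default case: no speed bit set means 10 Mbps


for bmcr in (BMCR_SPEED1000, BMCR_SPEED100, BMCR_SPEED10):
    print(hex(bmcr), decode_speed_old(bmcr), decode_speed_fixed(bmcr))
```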
      
      Fixes: ffe10e67 ("net: dsa: sja1105: Add support for the SGMII port")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • net: mscc: ocelot: properly reject destination IP keys in VCAP IS1 · 64c28022
      Vladimir Oltean committed
      stable inclusion
      from stable-5.10.24
      commit 719611e806deea598088541bd4509a3735d29c92
      bugzilla: 51348
      
      --------------------------------
      
      commit f1becbed upstream.
      
      An attempt is made to warn the user about the fact that VCAP IS1 cannot
      offload keys matching on destination IP (at least given the current half
      key format), but sadly that warning fails miserably in practice, due to
      the fact that it operates on an uninitialized "match" variable. We must
      first decode the keys from the flow rule.
      
      Fixes: 75944fda ("net: mscc: ocelot: offload ingress skbedit and vlan actions to VCAP IS1")
      Reported-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • net: sched: avoid duplicates in classes dump · 3bfa215f
      Maximilian Heyne committed
      stable inclusion
      from stable-5.10.24
      commit 2809a5ca962e96397d9504414a1140a69fe5e138
      bugzilla: 51348
      
      --------------------------------
      
      commit bfc25605 upstream.
      
      This is a follow-up to commit ea327469 ("net: sched: avoid
      duplicates in qdisc dump"), which fixed the issue only for the qdisc
      dump.
      
      The duplicate printing also occurs when dumping the classes via
        tc class show dev eth0
      
      Fixes: 59cc1f61 ("net: sched: convert qdisc linked list to hashtable")
      Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      3bfa215f
    • I
      nexthop: Do not flush blackhole nexthops when loopback goes down · 9edc9d71
      Ido Schimmel committed
      stable inclusion
      from stable-5.10.24
      commit 9c61f1e1c40e85e5db4154eba36711d166a38d34
      bugzilla: 51348
      
      --------------------------------
      
      commit 76c03bf8 upstream.
      
      As far as user space is concerned, blackhole nexthops do not have a
      nexthop device and therefore should not be affected by the
      administrative or carrier state of any netdev.
      
      However, when the loopback netdev goes down all the blackhole nexthops
      are flushed. This happens because internally the kernel associates
      blackhole nexthops with the loopback netdev.
      
      This behavior is both confusing to those not familiar with kernel
      internals and also diverges from the legacy API where blackhole IPv4
      routes are not flushed when the loopback netdev goes down:
      
       # ip route add blackhole 198.51.100.0/24
       # ip link set dev lo down
       # ip route show 198.51.100.0/24
       blackhole 198.51.100.0/24
      
      Blackhole IPv6 routes are flushed, but at least user space knows that
      they are associated with the loopback netdev:
      
       # ip -6 route show 2001:db8:1::/64
       blackhole 2001:db8:1::/64 dev lo metric 1024 pref medium
      
      Fix this by only flushing blackhole nexthops when the loopback netdev is
      unregistered.
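
      The resulting policy can be sketched as a small decision function.
      This is a simplified model, not the kernel code: enum dev_event,
      struct nexthop_model and nexthop_should_flush() are hypothetical
      names, and the model assumes the nexthop is tied to the device that
      generated the event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of device events relevant to this sketch. */
enum dev_event {
	DEV_EVENT_DOWN,
	DEV_EVENT_UNREGISTER,
};

/* Hypothetical, simplified nexthop: only the field this sketch needs. */
struct nexthop_model {
	bool is_blackhole;
};

/*
 * Decide whether a nexthop tied to the device should be flushed for
 * this event. Blackhole nexthops survive a plain "down" and are only
 * flushed when the device is unregistered.
 */
static bool nexthop_should_flush(const struct nexthop_model *nh,
				 enum dev_event event)
{
	assert(nh != NULL);

	if (event == DEV_EVENT_UNREGISTER)
		return true;

	return !nh->is_blackhole;
}
```

      In this model a blackhole nexthop is kept on DEV_EVENT_DOWN, while
      an ordinary nexthop is flushed on either event.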
      
      Fixes: ab84be7e ("net: Initial nexthop code")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reported-by: Donald Sharp <sharpd@nvidia.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      9edc9d71
    • O
      net: stmmac: fix incorrect DMA channel intr enable setting of EQoS v4.10 · 025d472a
      Ong Boon Leong committed
      stable inclusion
      from stable-5.10.24
      commit 87b7b19d6e1dabbd12344b2784b78ea8b4992f6f
      bugzilla: 51348
      
      --------------------------------
      
      commit 879c348c upstream.
      
      Introduce dwmac410_dma_init_channel() for EQoS v4.10 and above, which
      use different DMA_CH(n)_Interrupt_Enable bit definitions for NIE and
      AIE.
      
      Fixes: 48863ce5 ("stmmac: add DMA support for GMAC 4.xx")
      Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
      Signed-off-by: Ramesh Babu B <ramesh.babu.b@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      025d472a
    • K
      net/mlx4_en: update moderation when config reset · 24d7b86f
      Kevin(Yudong) Yang committed
      stable inclusion
      from stable-5.10.24
      commit 6b0d3ae1051bdca4acf91a66be64d42d1c0f577b
      bugzilla: 51348
      
      --------------------------------
      
      commit 00ff801b upstream.
      
      This patch fixes a bug where the moderation config is not applied
      when mlx4_en_reset_config() is called. For example, turning on rx
      timestamping calls mlx4_en_reset_config(), which causes the NIC to
      forget the previous moderation config.
      
      This fix is in line with a previous fix:
      commit 79c54b6b ("net/mlx4_en: Fix TX moderation info loss
      after set_ringparam is called")
      
      Tested: Before this patch, on a host with NIC using mlx4, run
      netserver and stream TCP to the host at full utilization.
      $ sar -I SUM 1
                       INTR    intr/s
      14:03:56          sum  48758.00
      
      After rx hwtstamp is enabled:
      $ sar -I SUM 1
      14:10:38          sum 317771.00
      We see that moderation is not working properly: about 6.5x as many
      interrupts are issued (317771 vs. 48758 per second).
      
      After the patch, with rx hwtstamp turned on, the rate of interrupts
      is as expected:
      $ sar -I SUM 1
      14:52:11          sum  49332.00
      
      Fixes: 79c54b6b ("net/mlx4_en: Fix TX moderation info loss after set_ringparam is called")
      Signed-off-by: Kevin(Yudong) Yang <yyd@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Neal Cardwell <ncardwell@google.com>
      CC: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      24d7b86f
    • B
      net: ethernet: mtk-star-emac: fix wrong unmap in RX handling · 8f07278d
      Biao Huang committed
      stable inclusion
      from stable-5.10.24
      commit fa0bc09db49bf4875d9a8c88813fe2b87c1059bb
      bugzilla: 51348
      
      --------------------------------
      
      commit 95b39f07 upstream.
      
      mtk_star_dma_unmap_rx() should unmap the dma_addr of the old skb rather
      than that of the new skb.
      Assign new_dma_addr to desc_data.dma_addr only after all handling of the
      old skb has finished, to avoid unexpected receive-side errors.
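
      The required ordering can be sketched as follows. This is a
      simplified model rather than the driver code: struct desc_data_model,
      fake_dma_unmap_rx() and rx_swap_buffer() are hypothetical, and an
      integer id stands in for the skb pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified per-descriptor bookkeeping. */
struct desc_data_model {
	uint64_t dma_addr;	/* DMA address of the buffer in the ring slot */
	int skb_id;		/* stand-in for the skb pointer */
};

/* Records the last address handed to the fake unmap helper. */
static uint64_t last_unmapped_addr;

static void fake_dma_unmap_rx(const struct desc_data_model *desc_data)
{
	last_unmapped_addr = desc_data->dma_addr;
}

/*
 * Replace the buffer of one RX slot and return the id of the old skb.
 * The old buffer is unmapped while desc_data still describes it; the
 * new address is stored only after the old skb has been dealt with.
 */
static int rx_swap_buffer(struct desc_data_model *desc_data,
			  int new_skb_id, uint64_t new_dma_addr)
{
	int old_skb_id;

	assert(desc_data != NULL);

	fake_dma_unmap_rx(desc_data);		/* unmaps the OLD address */
	old_skb_id = desc_data->skb_id;		/* old skb goes up the stack */

	desc_data->skb_id = new_skb_id;
	desc_data->dma_addr = new_dma_addr;	/* assigned last */

	return old_skb_id;
}
```

      Storing new_dma_addr before the unmap call would make the helper
      unmap the freshly mapped buffer instead of the old one, which is the
      failure the message describes.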
      
      Fixes: f96e9641 ("net: ethernet: mtk-star-emac: fix error path in RX handling")
      Signed-off-by: Biao Huang <biao.huang@mediatek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      8f07278d
    • V
      net: enetc: keep RX ring consumer index in sync with hardware · 684db2b4
      Vladimir Oltean committed
      stable inclusion
      from stable-5.10.24
      commit 1cdd008902d4e32f270e8fdb3239db6412f0a90b
      bugzilla: 51348
      
      --------------------------------
      
      commit 3a5d12c9 upstream.
      
      The RX rings have a producer index owned by hardware, where newly
      received frame buffers are placed, and a consumer index owned by
      software, where newly allocated buffers are placed, in expectation of
      hardware being able to place frame data in them.
      
      Hardware increments the producer index when a frame is received, however
      it is not allowed to increment the producer index to match the consumer
      index (RBCIR) since the ring can hold at most RBLENR[LENGTH]-1 received
      BDs. Whenever the producer index matches the value of the consumer
      index, the ring has no unprocessed received frames and all BDs in the
      ring have been initialized/prepared by software, i.e. hardware owns all
      BDs in the ring.
      
      The code uses the next_to_clean variable to keep track of the producer
      index, and the next_to_use variable to keep track of the consumer index.
      
      The RX rings are seeded from enetc_refill_rx_ring, which is called from
      two places:
      
      1. initially the ring is seeded until full with enetc_bd_unused(rx_ring),
         i.e. with 511 buffers. This will make next_to_clean=0 and next_to_use=511:
      
      .ndo_open
      -> enetc_open
         -> enetc_setup_bdrs
            -> enetc_setup_rxbdr
               -> enetc_refill_rx_ring
      
      2. then during the data path processing, it is refilled with 16 buffers
         at a time:
      
      enetc_msix
      -> napi_schedule
         -> enetc_poll
            -> enetc_clean_rx_ring
               -> enetc_refill_rx_ring
      
      There is just one problem: the initial seeding done during .ndo_open
      updates just the producer index (ENETC_RBPIR) with 0, and the software
      next_to_clean and next_to_use variables. Notably, it will not update the
      consumer index to make the hardware aware of the newly added buffers.
      
      Wait, what? So how does it work?
      
      Well, the reset values of the producer index and of the consumer index
      of a ring are both zero. As per the description in the second paragraph,
      it means that the ring is full of buffers waiting for hardware to put
      frames in them, which by coincidence is almost true, because we have in
      fact seeded 511 buffers into the ring.
      
      But will the hardware attempt to access the 512th entry of the ring,
      which has an invalid BD in it? Well, no, because in order to do that, it
      would have to first populate the first 511 entries, and the NAPI
      enetc_poll will kick in by then. Eventually, after 16 processed slots
      have become available in the RX ring, enetc_clean_rx_ring will call
      enetc_refill_rx_ring and then will [ finally ] update the consumer index
      with the new software next_to_use variable. From now on, the
      next_to_clean and next_to_use variables are in sync with the producer
      and consumer ring indices.
      
      So the day is saved, right? Well, not quite. Freeing the memory
      allocated for the rings is done in:
      
      enetc_close
      -> enetc_clear_bdrs
         -> enetc_clear_rxbdr
            -> this just disables the ring
      -> enetc_free_rxtx_rings
         -> enetc_free_rx_ring
            -> sets next_to_clean and next_to_use to 0
      
      but again, nothing is committed to the hardware producer and consumer
      indices (yay!). The assumption is that the ring is disabled, so the
      indices don't matter anyway, and it's the responsibility of the "open"
      code path to set those up.
      
      .. Except that the "open" code path does not set those up properly.
      
      While initially, things almost work, during subsequent enetc_close ->
      enetc_open sequences, we have problems. To be precise, the enetc_open
      that is subsequent to enetc_close will again refill the ring with 511
      entries, but it will leave the consumer index untouched. Untouched
      means, of course, equal to the value it had before disabling the ring
      and draining the old buffers in enetc_close.
      
      But as mentioned, enetc_setup_rxbdr will at least update the producer
      index though, through this line of code:
      
      	enetc_rxbdr_wr(hw, idx, ENETC_RBPIR, 0);
      
      so at this stage we'll have:
      
      next_to_clean=0 (in hardware 0)
      next_to_use=511 (in hardware we'll have the refill index prior to enetc_close)
      
      Again, the next_to_clean and producer index are in sync and set to
      correct values, so the driver manages to limp on. Eventually, 16 ring
      entries will be consumed by enetc_poll, and the savior
      enetc_clean_rx_ring will come and call enetc_refill_rx_ring, and then
      update the hardware consumer ring based upon the new next_to_use.
      
      So.. it works?
      Well, by coincidence, it almost does, but there's a circumstance where
      enetc_clean_rx_ring won't be there to save us. If the previous value of
      the consumer index was 15, there's a problem, because the NAPI poll
      sequence will only issue a refill when 16 or more buffers have been
      consumed.
      
      It's easiest to illustrate this with an example:
      
      ip link set eno0 up
      ip addr add 192.168.100.1/24 dev eno0
      ping 192.168.100.1 -c 20 # ping this port from another board
      ip link set eno0 down
      ip link set eno0 up
      ping 192.168.100.1 -c 20 # ping it again from the same other board
      
      One by one:
      
      1. ip link set eno0 up
      -> calls enetc_setup_rxbdr:
         -> calls enetc_refill_rx_ring(511 buffers)
         -> next_to_clean=0 (in hw 0)
         -> next_to_use=511 (in hw 0)
      
      2. ping 192.168.100.1 -c 20 # ping this port from another board
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=1 next_to_clean 0 (in hw 1) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=2 next_to_clean 1 (in hw 2) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=3 next_to_clean 2 (in hw 3) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=4 next_to_clean 3 (in hw 4) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=5 next_to_clean 4 (in hw 5) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=6 next_to_clean 5 (in hw 6) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=7 next_to_clean 6 (in hw 7) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=8 next_to_clean 7 (in hw 8) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=9 next_to_clean 8 (in hw 9) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=10 next_to_clean 9 (in hw 10) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=11 next_to_clean 10 (in hw 11) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=12 next_to_clean 11 (in hw 12) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=13 next_to_clean 12 (in hw 13) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=14 next_to_clean 13 (in hw 14) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=15 next_to_clean 14 (in hw 15) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: enetc_refill_rx_ring(16) increments next_to_use by 16 (mod 512) and writes it to hw
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=0 next_to_clean 15 (in hw 16) next_to_use 15 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=1 next_to_clean 16 (in hw 17) next_to_use 15 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=2 next_to_clean 17 (in hw 18) next_to_use 15 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=3 next_to_clean 18 (in hw 19) next_to_use 15 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=4 next_to_clean 19 (in hw 20) next_to_use 15 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=5 next_to_clean 20 (in hw 21) next_to_use 15 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=6 next_to_clean 21 (in hw 22) next_to_use 15 (in hw 15)
      
      20 packets transmitted, 20 packets received, 0% packet loss
      
      3. ip link set eno0 down
      enetc_free_rx_ring: next_to_clean 0 (in hw 22), next_to_use 0 (in hw 15)
      
      4. ip link set eno0 up
      -> calls enetc_setup_rxbdr:
         -> calls enetc_refill_rx_ring(511 buffers)
         -> next_to_clean=0 (in hw 0)
         -> next_to_use=511 (in hw 15)
      
      5. ping 192.168.100.1 -c 20 # ping it again from the same other board
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=1 next_to_clean 0 (in hw 1) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=2 next_to_clean 1 (in hw 2) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=3 next_to_clean 2 (in hw 3) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=4 next_to_clean 3 (in hw 4) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=5 next_to_clean 4 (in hw 5) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=6 next_to_clean 5 (in hw 6) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=7 next_to_clean 6 (in hw 7) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=8 next_to_clean 7 (in hw 8) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=9 next_to_clean 8 (in hw 9) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=10 next_to_clean 9 (in hw 10) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=11 next_to_clean 10 (in hw 11) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=12 next_to_clean 11 (in hw 12) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=13 next_to_clean 12 (in hw 13) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=14 next_to_clean 13 (in hw 14) next_to_use 511 (in hw 15)
      
      20 packets transmitted, 12 packets received, 40% packet loss
      
      And there it dies. No enetc_refill_rx_ring (because cleaned_cnt must be equal
      to 15 for that to happen), no nothing. The hardware enters the condition where
      the producer (14) + 1 is equal to the consumer (15) index, which makes it
      believe it has no more free buffers to put packets in, so it starts discarding
      them:
      
      ip netns exec ns0 ethtool -S eno0 | grep -v ': 0'
      NIC statistics:
           Rx ring  0 discarded frames: 8
      
      Summarized, if the interface receives between 16 and 32 (mod 512) frames
      and then there is a link flap, then the port will eventually die with no
      way to recover. If it receives fewer than 16 (mod 512) frames, then the
      initial NAPI poll [ before the link flap ] will not update the consumer
      index in hardware (it will remain zero) which will be ok when the buffers
      are later reinitialized. If more than 32 (mod 512) frames are received,
      the initial NAPI poll has the chance to refill the ring twice, updating
      the consumer index to at least 32. So after the link flap, the consumer
      index is still wrong, but the post-flap NAPI poll gets a chance to
      refill the ring once (because it passes through cleaned_cnt=15) and
      makes the consumer index be again back in sync with next_to_use.
      
      The solution to this problem is actually simple: we just need to write
      next_to_use into the hardware consumer index at enetc_open time, which
      always brings it back in sync after an initial buffer seeding process.
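
      The failure and the fix can be reproduced with a small software model
      of the ring. This is a sketch under stated assumptions, not the
      driver: struct rx_ring_model and the ring_*() helpers are
      hypothetical, the hardware is assumed to discard a frame whenever
      producer + 1 equals the consumer index (mod 512), and software is
      assumed to refill 16 buffers once 16 have been cleaned. The model
      counts raw frames, so its numbers are not the ping statistics shown
      in the log above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE	512	/* BDs per RX ring, as in the text above */
#define REFILL_STEP	16	/* buffers handed back per refill */

struct rx_ring_model {
	unsigned int hw_pi;		/* hardware producer index */
	unsigned int hw_ci;		/* hardware consumer index */
	unsigned int next_to_clean;	/* software view of the producer */
	unsigned int next_to_use;	/* software view of the consumer */
	unsigned int cleaned_cnt;	/* cleaned since the last refill */
};

/* Model of the "open" path: seed 511 buffers and reset the indices. */
static void ring_open(struct rx_ring_model *r, bool write_consumer_index)
{
	assert(r != NULL);

	r->next_to_clean = 0;
	r->next_to_use = RING_SIZE - 1;
	r->cleaned_cnt = 0;
	r->hw_pi = 0;				/* always written */
	if (write_consumer_index)
		r->hw_ci = r->next_to_use;	/* the fix */
	/* otherwise hw_ci keeps its previous (stale) value */
}

/* Offer one frame: true if it was received, false if discarded. */
static bool ring_receive(struct rx_ring_model *r)
{
	if ((r->hw_pi + 1) % RING_SIZE == r->hw_ci)
		return false;			/* hardware sees no free BD */

	r->hw_pi = (r->hw_pi + 1) % RING_SIZE;

	/* Poll: clean the frame, refill once enough have been cleaned. */
	r->next_to_clean = (r->next_to_clean + 1) % RING_SIZE;
	if (++r->cleaned_cnt >= REFILL_STEP) {
		r->next_to_use = (r->next_to_use + REFILL_STEP) % RING_SIZE;
		r->hw_ci = r->next_to_use;	/* commit to hardware */
		r->cleaned_cnt = 0;
	}
	return true;
}

/* Offer a burst of frames and return how many were received. */
static unsigned int ring_receive_many(struct rx_ring_model *r,
				      unsigned int frames)
{
	unsigned int i, received = 0;

	for (i = 0; i < frames; i++)
		if (ring_receive(r))
			received++;
	return received;
}
```

      Starting from a zero-initialized model (the hardware reset values),
      opening without the consumer index write, receiving 20 frames and
      reopening leaves hw_ci at 15, so the second burst stalls after 14
      frames. With write_consumer_index set, both bursts are received in
      full.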
      
      The simpler thing would be to put the write to the consumer index into
      enetc_refill_rx_ring directly, but there are issues with the MDIO
      locking: in the NAPI poll code we have the enetc_lock_mdio() taken from
      top-level and we use the unlocked enetc_wr_reg_hot, whereas in
      enetc_open, the enetc_lock_mdio() is not taken at the top level, but
      instead by each individual enetc_wr_reg, so we are forced to put an
      additional enetc_wr_reg in enetc_setup_rxbdr. Better organization of
      the code is left as a refactoring exercise.
      
      Fixes: d4fd0404 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      684db2b4
    • V
      net: enetc: remove bogus write to SIRXIDR from enetc_setup_rxbdr · 3829c6cf
      Vladimir Oltean committed
      stable inclusion
      from stable-5.10.24
      commit 5317365401119e88268d61691d298704ca7286c4
      bugzilla: 51348
      
      --------------------------------
      
      commit 96a5223b upstream.
      
      The Station Interface Receive Interrupt Detect Register (SIRXIDR)
      contains a 16-bit wide mask of 'interrupt detected' events for each ring
      associated with a port. Bit i is write-1-to-clear for RX ring i.
      
      I have no explanation whatsoever how this line of code came to be
      inserted in the blamed commit. I checked the downstream versions of that
      patch and none of them have it.
      
      The somewhat comical aspect of it is that we're writing a binary number
      to the SIRXIDR register, which is derived from enetc_bd_unused(rx_ring).
      Since the RX rings have 512 buffer descriptors, we end up writing 511 to
      this register, which is 0x1ff, so we are effectively clearing the
      'interrupt detected' event for rings 0-8.
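
      The arithmetic can be checked with a short helper pair. The function
      names count_ring_bits() and highest_ring_bit() are hypothetical and
      exist only for this illustration.

```c
#include <assert.h>

/* Number of bits set in a 16-bit ring mask. */
static unsigned int count_ring_bits(unsigned int mask)
{
	unsigned int count = 0;

	assert(mask <= 0xffffu);	/* the mask is 16 bits wide */

	while (mask) {
		count += mask & 1u;
		mask >>= 1;
	}
	return count;
}

/* Index of the highest bit set in the mask, or -1 if none is set. */
static int highest_ring_bit(unsigned int mask)
{
	int bit = -1;

	assert(mask <= 0xffffu);

	while (mask) {
		bit++;
		mask >>= 1;
	}
	return bit;
}
```

      For the value 511 (0x1ff) these helpers report nine bits set with
      bit 8 as the highest, i.e. rings 0 through 8.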
      
      This register is not what is used for interrupt handling though - it
      only provides a summary for the entire SI. The hardware provides one
      separate Interrupt Detect Register per RX ring, which auto-clears upon
      read. So there doesn't seem to be any adverse effect caused by this
      bogus write.
      
      There is, however, one reason why this should be handled as a bugfix:
      next_to_clean _should_ be committed to hardware, just not to that
      register, and this was obscuring the fact that it wasn't. This is fixed
      in the next patch, and removing the bogus line now allows the fix patch
      to be backported beyond that point.
      
      Fixes: fd5736bf ("enetc: Workaround for MDIO register access issue")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      3829c6cf
    • V
      net: enetc: force the RGMII speed and duplex instead of operating in inband mode · d0a90192
      Vladimir Oltean committed
      stable inclusion
      from stable-5.10.24
      commit 63876df5615edfe94291409eb862f4570e2f4ffc
      bugzilla: 51348
      
      --------------------------------
      
      commit c76a9721 upstream.
      
      The ENETC port 0 MAC supports in-band status signaling coming from a PHY
      when operating in RGMII mode, and this feature is enabled by default.
      
      It has been reported that RGMII is broken in fixed-link, and that is not
      surprising considering the fact that no PHY is attached to the MAC in
      that case, but a switch.
      
      This brings us to the topic of the patch: the enetc driver should not
      have enabled the optional in-band status signaling for RGMII
      unconditionally, but should have forced the speed and duplex to what was
      resolved by phylink.
      
      Note that phylink does not accept the RGMII modes as valid for in-band
      signaling, and these operate a bit differently than 1000base-x and SGMII
      (notably there is no clause 37 state machine so no ACK required from the
      MAC, instead the PHY sends extra code words on RXD[3:0] whenever it is
      not transmitting something else, so it should be safe to leave a PHY
      with this option unconditionally enabled even if we ignore it). The spec
      talks about this here:
      https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/138/RGMIIv1_5F00_3.pdf
      
      Fixes: 71b77a7a ("enetc: Migrate to PHYLINK and PCS_LYNX")
      Cc: Florian Fainelli <f.fainelli@gmail.com>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      d0a90192
    • V
      net: enetc: don't disable VLAN filtering in IFF_PROMISC mode · 4b000964
      Vladimir Oltean committed
      stable inclusion
      from stable-5.10.24
      commit 5732688c8411b1d29a3676819c279236b0a0ec5b
      bugzilla: 51348
      
      --------------------------------
      
      commit a74dbce9 upstream.
      
      Quoting from the blamed commit:
      
          In promiscuous mode, it is more intuitive that all traffic is received,
          including VLAN tagged traffic. It appears that it is necessary to set
          the flag in PSIPVMR for that to be the case, so VLAN promiscuous mode is
          also temporarily enabled. On exit from promiscuous mode, the setting
          made by ethtool is restored.
      
      Intuitive or not, there isn't any definition issued by a standards body
      which says that promiscuity has anything to do with VLAN filtering - it
      only has to do with accepting packets regardless of destination MAC address.
      
      In fact people are already trying to use this misunderstanding/bug of
      the enetc driver as a justification to transform promiscuity into
      something it never was about: accepting every packet (maybe that would
      be the "rx-all" netdev feature?):
      https://lore.kernel.org/netdev/20201110153958.ci5ekor3o2ekg3ky@ipetronik.com/
      
      This is relevant because there are use cases in the kernel (such as
      tc-flower rules with the protocol 802.1Q and a vlan_id key) which do not
      (yet) use the vlan_vid_add API to be compatible with VLAN-filtering NICs
      such as enetc, so for those, disabling rx-vlan-filter is currently the
      only right solution to make these setups work:
      https://lore.kernel.org/netdev/CA+h21hoxwRdhq4y+w8Kwgm74d4cA0xLeiHTrmT-VpSaM7obhkg@mail.gmail.com/
      The blamed patch has unintentionally introduced one more way for this to
      work, which is to enable IFF_PROMISC, however this is non-portable
      because port promiscuity is not meant to disable VLAN filtering.
      Therefore, it could invite people to write broken scripts for enetc, and
      then wonder why they are broken when migrating to other drivers that
      don't handle promiscuity in the same way.
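
      The intended separation of the two settings can be sketched like
      this. It is an illustrative model, not the enetc code: struct
      rx_config_model, struct rx_filter_state and compute_rx_filters() are
      hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the interface configuration. */
struct rx_config_model {
	bool iff_promisc;	/* IFF_PROMISC set on the interface */
	bool rx_vlan_filter;	/* rx-vlan-filter feature state */
};

/* Hypothetical hardware filter state derived from the configuration. */
struct rx_filter_state {
	bool mac_promisc;	/* accept any destination MAC address */
	bool vlan_promisc;	/* accept any VLAN ID */
};

/*
 * Promiscuity only affects destination MAC filtering. VLAN filtering
 * follows the rx-vlan-filter feature and nothing else.
 */
static struct rx_filter_state
compute_rx_filters(const struct rx_config_model *cfg)
{
	struct rx_filter_state state;

	assert(cfg != NULL);

	state.mac_promisc = cfg->iff_promisc;
	state.vlan_promisc = !cfg->rx_vlan_filter;

	return state;
}
```

      With IFF_PROMISC set and rx-vlan-filter still enabled, the model
      accepts any destination MAC address but keeps filtering on VLAN ID.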
      
      Fixes: 7070eea5 ("enetc: permit configuration of rx-vlan-filter with ethtool")
      Cc: Markus Blöchl <Markus.Bloechl@ipetronik.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      4b000964
    • V
      net: enetc: fix incorrect TPID when receiving 802.1ad tagged packets · fded4910
      Vladimir Oltean committed
      stable inclusion
      from stable-5.10.24
      commit d56e3f8d289bdc70378f84efab166ad38022532e
      bugzilla: 51348
      
      --------------------------------
      
      commit 827b6fd0 upstream.
      
      When the enetc ports have rx-vlan-offload enabled, they report a TPID of
      ETH_P_8021Q regardless of what was actually in the packet. When
      rx-vlan-offload is disabled, packets have the proper TPID. Fix this
      inconsistency by finishing the TODO left in the code.
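
      The kind of mapping involved can be sketched as below. The
      enum rx_tag_type encoding and rx_tag_to_tpid() are hypothetical; how
      the hardware actually reports the tag type is not described here.
      Only the two TPID values are standard: 0x8100 is ETH_P_8021Q and
      0x88A8 is ETH_P_8021AD.

```c
#include <assert.h>
#include <stdint.h>

#define TPID_8021Q	0x8100u	/* value of ETH_P_8021Q */
#define TPID_8021AD	0x88A8u	/* value of ETH_P_8021AD */

/* Hypothetical encoding of the tag type reported on receive. */
enum rx_tag_type {
	RX_TAG_CVLAN,	/* 802.1Q customer tag */
	RX_TAG_SVLAN,	/* 802.1ad service tag */
};

/* Report the TPID that was actually in the packet. */
static uint16_t rx_tag_to_tpid(enum rx_tag_type type)
{
	assert(type == RX_TAG_CVLAN || type == RX_TAG_SVLAN);

	return type == RX_TAG_SVLAN ? TPID_8021AD : TPID_8021Q;
}
```

      Always returning TPID_8021Q, regardless of the tag type, corresponds
      to the inconsistent behavior described above.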
      
      Fixes: d4fd0404 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      fded4910
    • V
      net: enetc: take the MDIO lock only once per NAPI poll cycle · cfce9813
      Vladimir Oltean committed
      stable inclusion
      from stable-5.10.24
      commit bf9c564716a13dde6a990d3b02c27cd6e39608bf
      bugzilla: 51348
      
      --------------------------------
      
      commit 6d36ecdb upstream.
      
      The workaround for the ENETC MDIO erratum caused a performance
      degradation of 82 Kpps (seen with IP forwarding of two 1Gbps streams of
      64B packets). This is due to excessive locking and unlocking in the fast
      path, which can be avoided.
      
      By taking the MDIO read-side lock only once per NAPI poll cycle, we are
      able to regain 54 Kpps (65%) of the performance hit. The rest of the
      performance degradation comes from the TX data path, but unfortunately
      it doesn't look like we can optimize that away easily. Even with
      netdev_xmit_more(), there just isn't any skb batching done that would
      help with taking the MDIO lock less often than once per packet.
      
      We need to change the register accessor type for enetc_get_tx_tstamp,
      because it now runs under the enetc_lock_mdio as per the new call path
      detailed below:
      
      enetc_msix
      -> napi_schedule
         -> enetc_poll
            -> enetc_lock_mdio
            -> enetc_clean_tx_ring
               -> enetc_get_tx_tstamp
            -> enetc_clean_rx_ring
            -> enetc_unlock_mdio
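
      The effect of moving the lock to the top of the poll cycle can be
      illustrated with a counting model. Everything here is hypothetical
      (model_lock(), reg_access_locking(), reg_access_hot() and the two
      poll functions); it only mirrors the distinction between accessors
      that lock internally and accessors that expect the caller to hold
      the lock.

```c
#include <assert.h>
#include <stdbool.h>

static bool lock_held;			/* true while the model lock is taken */
static unsigned int lock_acquisitions;	/* how often it has been taken */

static void model_lock(void)
{
	assert(!lock_held);
	lock_held = true;
	lock_acquisitions++;
}

static void model_unlock(void)
{
	assert(lock_held);
	lock_held = false;
}

/* Accessor that takes the lock itself on every call. */
static void reg_access_locking(void)
{
	model_lock();
	/* the register access would happen here */
	model_unlock();
}

/* "Hot" accessor: the caller must already hold the lock. */
static void reg_access_hot(void)
{
	assert(lock_held);
	/* the register access would happen here */
}

/* Poll cycle that locks around every register access. */
static unsigned int poll_lock_per_access(unsigned int accesses)
{
	unsigned int i;

	lock_acquisitions = 0;
	for (i = 0; i < accesses; i++)
		reg_access_locking();
	return lock_acquisitions;
}

/* Poll cycle that takes the lock once at the top level. */
static unsigned int poll_lock_once(unsigned int accesses)
{
	unsigned int i;

	lock_acquisitions = 0;
	model_lock();
	for (i = 0; i < accesses; i++)
		reg_access_hot();
	model_unlock();
	return lock_acquisitions;
}
```

      For a poll cycle with 64 register accesses, the first variant takes
      the lock 64 times and the second takes it once. The assertion in
      reg_access_hot() shows why an accessor that moves under the
      top-level lock has to switch to the unlocked variant.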
      
      Fixes: fd5736bf ("enetc: Workaround for MDIO register access issue")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      cfce9813
    • V
      net: enetc: don't overwrite the RSS indirection table when initializing · 88db41d4
      Vladimir Oltean committed
      stable inclusion
      from stable-5.10.24
      commit dfaf418dfff819aaa5e6a945bb8efd38d53b6eb9
      bugzilla: 51348
      
      --------------------------------
      
      commit c646d10d upstream.
      
      After the blamed patch, all RX traffic gets hashed to CPU 0 because the
      hashing indirection table set up in:
      
      enetc_pf_probe
      -> enetc_alloc_si_resources
         -> enetc_configure_si
            -> enetc_setup_default_rss_table
      
      is overwritten later in:
      
      enetc_pf_probe
      -> enetc_init_port_rss_memory
      
      which zero-initializes the entire port RSS table in order to avoid ECC errors.
      
      The trouble really is that enetc_init_port_rss_memory really needs
      enetc_alloc_si_resources to be called, because it depends upon
      enetc_alloc_cbdr and enetc_setup_cbdr. But that whole enetc_configure_si
      thing could have been better thought out, it has nothing to do in a
      function called "alloc_si_resources", especially since its counterpart,
      "free_si_resources", does nothing to unwind the configuration of the SI.
      
      The point is, we need to pull enetc_configure_si out of
      enetc_alloc_si_resources, and move it after enetc_init_port_rss_memory.
      This allows us to set up the default RSS indirection table after
      initializing the memory.
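
      The ordering problem can be modeled in a few lines. This is a sketch
      with assumptions, not the driver: the table size of 64 entries, the
      round-robin default (entry i maps to i % num_groups) and all function
      names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RSS_TABLE_SIZE	64	/* hypothetical table size for this sketch */

/* Zero the whole table, as the memory initialization does. */
static void init_rss_memory(unsigned char *table)
{
	assert(table != NULL);
	memset(table, 0, RSS_TABLE_SIZE);
}

/* Assumed default: spread the entries round-robin over the groups. */
static void setup_default_rss_table(unsigned char *table,
				    unsigned int num_groups)
{
	unsigned int i;

	assert(table != NULL && num_groups > 0);
	for (i = 0; i < RSS_TABLE_SIZE; i++)
		table[i] = (unsigned char)(i % num_groups);
}

/* Order before the fix: the default table is wiped afterwards. */
static void probe_buggy(unsigned char *table, unsigned int num_groups)
{
	setup_default_rss_table(table, num_groups);
	init_rss_memory(table);
}

/* Order after the fix: initialize the memory first, then configure. */
static void probe_fixed(unsigned char *table, unsigned int num_groups)
{
	init_rss_memory(table);
	setup_default_rss_table(table, num_groups);
}

/* Number of entries that point somewhere other than group 0. */
static unsigned int count_nonzero_entries(const unsigned char *table)
{
	unsigned int i, count = 0;

	for (i = 0; i < RSS_TABLE_SIZE; i++)
		if (table[i] != 0)
			count++;
	return count;
}
```

      With four groups, probe_buggy() leaves every entry pointing at group
      0, while probe_fixed() leaves 48 of the 64 entries pointing at the
      other groups.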
      
      Fixes: 07bf34a5 ("net: enetc: initialize the RFS and RSS memories")
      Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      88db41d4
    • S
      sh_eth: fix TRSCER mask for SH771x · 2b4b2b80
      Sergey Shtylyov committed
      stable inclusion
      from stable-5.10.24
      commit 4ea379733555d652acadb05112a3365e5059f6f4
      bugzilla: 51348
      
      --------------------------------
      
      commit 8c91bc3d upstream.
      
      According to the SH7710, SH7712, SH7713 Group User's Manual: Hardware,
      Rev. 3.00, the TRSCER register actually has only bit 7 valid (and named
      differently), with all the other bits reserved. Apparently, this was not
      the case with some early revisions of the manual as we have the other
      bits declared (and set) in the original driver. Follow suit and add
      the explicit sh_eth_cpu_data::trscer_err_mask initializer for SH771x...
      
      Fixes: 86a74ff2 ("net: sh_eth: add support for Renesas SuperH Ethernet")
      Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      2b4b2b80
    • D
      net: dsa: tag_rtl4_a: fix egress tags · 8de05457
      Committed by DENG Qingfang
      stable inclusion
      from stable-5.10.24
      commit 68277f69a8734a444a05dce9f78ce79c1225d08d
      bugzilla: 51348
      
      --------------------------------
      
      commit 9eb8bc59 upstream.
      
      Commit 86dd9868 has several issues, but was accepted too soon
      before anyone could take a look.
      
      - Double free. dsa_slave_xmit() will free the skb if the xmit function
        returns NULL, but the skb is already freed by eth_skb_pad(). Use
        __skb_put_padto() to avoid that.
      - Unnecessary allocation. It has been done by DSA core since commit
        a3b0b647.
      - A u16 pointer points to skb data. It should be __be16 for network
        byte order.
      - Typo in comments. "numer" -> "number".
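The byte-order point in the list above can be shown with a minimal user-space sketch. The 4-byte tag layout and the write_tag() name are assumptions made for illustration, not the driver's actual tag format; in the kernel the same intent is expressed with the __be16 type, while the sketch uses htons() and memcpy().

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical 4-byte tag: a 16-bit ethertype followed by a 16-bit
 * port mask, both stored in network (big-endian) byte order.
 */
static void write_tag(uint8_t *dst, uint16_t ethertype, uint16_t port_mask)
{
        uint16_t be[2];

        be[0] = htons(ethertype);       /* host order -> network order */
        be[1] = htons(port_mask);
        memcpy(dst, be, sizeof(be));    /* avoids unaligned u16 stores */
}
```

Because the conversion is explicit, the bytes on the wire are the same on little-endian and big-endian hosts.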
      
      Fixes: 86dd9868 ("net: dsa: tag_rtl4_a: Support also egress tags")
      Signed-off-by: DENG Qingfang <dqfext@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      8de05457
    • J
      docs: networking: drop special stable handling · 8157b471
      Committed by Jakub Kicinski
      stable inclusion
      from stable-5.10.24
      commit 389055e7b97048c7ecd6066cdac2c703bae493bc
      bugzilla: 51348
      
      --------------------------------
      
      commit dbbe7c96 upstream.
      
      Leave it to Greg.

      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      8157b471
    • L
      Revert "mm, slub: consider rest of partial list if acquire_slab() fails" · b36ca11e
      Committed by Linus Torvalds
      stable inclusion
      from stable-5.10.24
      commit e1759160877a06082a9323dfb9437abfbe4af2d3
      bugzilla: 51348
      
      --------------------------------
      
      commit 9b1ea29b upstream.
      
      This reverts commit 8ff60eb0.
      
      The kernel test robot reports a huge performance regression due to the
      commit, and the reason seems fairly straightforward: when there is
      contention on the page list (which is what causes acquire_slab() to
      fail), we do _not_ want to just loop and try again, because that will
      transfer the contention to the 'n->list_lock' spinlock we hold, and
      just make things even worse.
      
      This is admittedly likely a problem only on big machines - the kernel
      test robot report comes from a 96-thread dual socket Intel Xeon Gold
      6252 setup, but the regression there really is quite noticeable:
      
         -47.9% regression of stress-ng.rawpkt.ops_per_sec
      
      and the commit that was marked as being fixed (7ced3719: "slub:
      Acquire_slab() avoid loop") actually did the loop exit early very
      intentionally (the hint being that "avoid loop" part of that commit
      message), exactly to avoid this issue.
      
      The correct thing to do may be to pick some kind of reasonable middle
      ground: instead of breaking out of the loop on the very first sign of
      contention, or trying over and over and over again, the right thing may
      be to re-try _once_, and then give up on the second failure (or pick
      your favorite value for "once"..).
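The middle ground described above, retrying a bounded number of times and then giving up, can be sketched in user space as follows. This is only an illustration under stated assumptions: fake_slab, try_acquire() and acquire_bounded() are hypothetical names, not SLUB code, and no locking is modelled.

```c
#include <stdbool.h>

/* Hypothetical contended resource: the next `failures_left` attempts fail. */
struct fake_slab {
        int failures_left;
        int attempts;           /* how many times acquisition was tried */
};

static bool try_acquire(struct fake_slab *s)
{
        s->attempts++;
        if (s->failures_left > 0) {
                s->failures_left--;
                return false;
        }
        return true;
}

/*
 * Try once, then retry at most `max_retries` more times. With
 * max_retries == 0 this gives up on the first sign of contention;
 * with max_retries == 1 it retries once and gives up on the second
 * failure.
 */
static bool acquire_bounded(struct fake_slab *s, int max_retries)
{
        for (int tries = 0; tries <= max_retries; tries++) {
                if (try_acquire(s))
                        return true;
        }
        return false;
}
```

The bound keeps the time spent under the held lock predictable, which is the concern the revert is about.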
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Link: https://lore.kernel.org/lkml/20210301080404.GF12822@xsang-OptiPlex-9020/
      Cc: Jann Horn <jannh@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      b36ca11e
    • P
      cifs: return proper error code in statfs(2) · 9ae15b79
      Committed by Paulo Alcantara
      stable inclusion
      from stable-5.10.24
      commit 3d0bbd97eb6f32bcc1365252aa04a8984bab5007
      bugzilla: 51348
      
      --------------------------------
      
      commit 14302ee3 upstream.
      
      In cifs_statfs(), if server->ops->queryfs is not NULL, then we should
      use its return value rather than always returning 0. To do so, return
      the rc variable, which is already set to 0 when there is no
      server->ops->queryfs.
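The pattern can be sketched outside the kernel like this. It is a simplified illustration, not the cifs code: struct fake_ops, fake_statfs() and failing_queryfs() are hypothetical names, and the callback takes no arguments here.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical ops table with an optional queryfs callback. */
struct fake_ops {
        int (*queryfs)(void);
};

static int fake_statfs(const struct fake_ops *ops)
{
        int rc = 0;             /* stays 0 when there is no callback */

        if (ops->queryfs)
                rc = ops->queryfs();

        return rc;              /* propagate the callback's result */
}

/* Example callback that reports an I/O error. */
static int failing_queryfs(void)
{
        return -EIO;
}
```

Returning a literal 0 at the end of fake_statfs() would hide the -EIO from the caller, which is the kind of bug this commit fixes.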
      Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
      Reviewed-by: Aurelien Aptel <aaptel@suse.com>
      Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
      CC: <stable@vger.kernel.org>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      9ae15b79
    • C
      mount: fix mounting of detached mounts onto targets that reside on shared mounts · 3ef215f4
      Committed by Christian Brauner
      stable inclusion
      from stable-5.10.24
      commit 36e1efcdc54274d03e67ed6a9d5c1c2a2e77e947
      bugzilla: 51348
      
      --------------------------------
      
      commit ee2e3f50 upstream.
      
      Creating a series of detached mounts, attaching them to the filesystem,
      and unmounting them can be used to trigger an integer overflow in
      ns->mounts causing the kernel to block any new mounts in count_mounts()
      and returning ENOSPC because it falsely assumes that the maximum number
      of mounts in the mount namespace has been reached, i.e. it thinks it
      can't fit the new mounts into the mount namespace anymore.
      
      Depending on the number of mounts in your system, this can be reproduced
      on any kernel that supports open_tree() and move_mount() by compiling
      and running the following program:
      
        /* SPDX-License-Identifier: LGPL-2.1+ */
      
        #define _GNU_SOURCE
        #include <errno.h>
        #include <fcntl.h>
        #include <getopt.h>
        #include <limits.h>
        #include <stdbool.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>
        #include <sys/mount.h>
        #include <sys/stat.h>
        #include <sys/syscall.h>
        #include <sys/types.h>
        #include <unistd.h>
      
        /* open_tree() */
        #ifndef OPEN_TREE_CLONE
        #define OPEN_TREE_CLONE 1
        #endif
      
        #ifndef OPEN_TREE_CLOEXEC
        #define OPEN_TREE_CLOEXEC O_CLOEXEC
        #endif
      
        #ifndef __NR_open_tree
                #if defined __alpha__
                        #define __NR_open_tree 538
                #elif defined _MIPS_SIM
                        #if _MIPS_SIM == _MIPS_SIM_ABI32        /* o32 */
                                #define __NR_open_tree 4428
                        #endif
                        #if _MIPS_SIM == _MIPS_SIM_NABI32       /* n32 */
                                #define __NR_open_tree 6428
                        #endif
                        #if _MIPS_SIM == _MIPS_SIM_ABI64        /* n64 */
                                #define __NR_open_tree 5428
                        #endif
                #elif defined __ia64__
                        #define __NR_open_tree (428 + 1024)
                #else
                        #define __NR_open_tree 428
                #endif
        #endif
      
        /* move_mount() */
        #ifndef MOVE_MOUNT_F_EMPTY_PATH
        #define MOVE_MOUNT_F_EMPTY_PATH 0x00000004 /* Empty from path permitted */
        #endif
      
        #ifndef __NR_move_mount
                #if defined __alpha__
                        #define __NR_move_mount 539
                #elif defined _MIPS_SIM
                        #if _MIPS_SIM == _MIPS_SIM_ABI32        /* o32 */
                                #define __NR_move_mount 4429
                        #endif
                        #if _MIPS_SIM == _MIPS_SIM_NABI32       /* n32 */
                                #define __NR_move_mount 6429
                        #endif
                        #if _MIPS_SIM == _MIPS_SIM_ABI64        /* n64 */
                                #define __NR_move_mount 5429
                        #endif
                #elif defined __ia64__
                        #define __NR_move_mount (429 + 1024)
                #else
                        #define __NR_move_mount 429
                #endif
        #endif
      
        static inline int sys_open_tree(int dfd, const char *filename, unsigned int flags)
        {
                return syscall(__NR_open_tree, dfd, filename, flags);
        }
      
        static inline int sys_move_mount(int from_dfd, const char *from_pathname, int to_dfd,
                                         const char *to_pathname, unsigned int flags)
        {
                return syscall(__NR_move_mount, from_dfd, from_pathname, to_dfd, to_pathname, flags);
        }
      
        static bool is_shared_mountpoint(const char *path)
        {
                bool shared = false;
                FILE *f = NULL;
                char *line = NULL;
                int i;
                size_t len = 0;
      
                f = fopen("/proc/self/mountinfo", "re");
                if (!f)
                        return 0;
      
                while (getline(&line, &len, f) > 0) {
                        char *slider1, *slider2;
      
                        for (slider1 = line, i = 0; slider1 && i < 4; i++)
                                slider1 = strchr(slider1 + 1, ' ');
      
                        if (!slider1)
                                continue;
      
                        slider2 = strchr(slider1 + 1, ' ');
                        if (!slider2)
                                continue;
      
                        *slider2 = '\0';
                        if (strcmp(slider1 + 1, path) == 0) {
                                /* This is the path. Is it shared? */
                                slider1 = strchr(slider2 + 1, ' ');
                                if (slider1 && strstr(slider1, "shared:")) {
                                        shared = true;
                                        break;
                                }
                        }
                }
                fclose(f);
                free(line);
      
                return shared;
        }
      
        static void usage(void)
        {
                const char *text = "mount-new [--recursive] <base-dir>\n";
                fprintf(stderr, "%s", text);
                _exit(EXIT_SUCCESS);
        }
      
        #define exit_usage(format, ...)                              \
                ({                                                   \
                        fprintf(stderr, format "\n", ##__VA_ARGS__); \
                        usage();                                     \
                })
      
        #define exit_log(format, ...)                                \
                ({                                                   \
                        fprintf(stderr, format "\n", ##__VA_ARGS__); \
                        exit(EXIT_FAILURE);                          \
                })
      
        static const struct option longopts[] = {
                {"help",        no_argument,            0,      'a'},
                { NULL,         no_argument,            0,       0 },
        };
      
        int main(int argc, char *argv[])
        {
                int exit_code = EXIT_SUCCESS, index = 0;
                int dfd, fd_tree, new_argc, ret;
                char *base_dir;
                char *const *new_argv;
                char target[PATH_MAX];
      
                while ((ret = getopt_long_only(argc, argv, "", longopts, &index)) != -1) {
                        switch (ret) {
                        case 'a':
                                /* fallthrough */
                        default:
                                usage();
                        }
                }
      
                new_argv = &argv[optind];
                new_argc = argc - optind;
                if (new_argc < 1)
                        exit_usage("Missing base directory\n");
                base_dir = new_argv[0];
      
                if (*base_dir != '/')
                        exit_log("Please specify an absolute path");
      
                /* Ensure that target is a shared mountpoint. */
                if (!is_shared_mountpoint(base_dir))
                        exit_log("Please ensure that \"%s\" is a shared mountpoint", base_dir);
      
                dfd = open(base_dir, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
                if (dfd < 0)
                        exit_log("%m - Failed to open base directory \"%s\"", base_dir);
      
                ret = mkdirat(dfd, "detached-move-mount", 0755);
                if (ret < 0)
                        exit_log("%m - Failed to create required temporary directories");
      
                ret = snprintf(target, sizeof(target), "%s/detached-move-mount", base_dir);
                if (ret < 0 || (size_t)ret >= sizeof(target))
                        exit_log("%m - Failed to assemble target path");
      
                /*
                 * Having a mount table with 10000 mounts is already quite excessive
                 * and should account even for weird test systems.
                 */
                for (size_t i = 0; i < 10000; i++) {
                        fd_tree = sys_open_tree(dfd, "detached-move-mount",
                                                OPEN_TREE_CLONE |
                                                OPEN_TREE_CLOEXEC |
                                                AT_EMPTY_PATH);
                        if (fd_tree < 0) {
                                fprintf(stderr, "%m - Failed to open %d(detached-move-mount)\n", dfd);
                                exit_code = EXIT_FAILURE;
                                break;
                        }
      
                        ret = sys_move_mount(fd_tree, "", dfd, "detached-move-mount", MOVE_MOUNT_F_EMPTY_PATH);
                        if (ret < 0) {
                                if (errno == ENOSPC)
                                        fprintf(stderr, "%m - Buggy mount counting\n");
                                else
                                        fprintf(stderr, "%m - Failed to attach mount to %d(detached-move-mount)\n", dfd);
                                exit_code = EXIT_FAILURE;
                                break;
                        }
                        close(fd_tree);
      
                        ret = umount2(target, MNT_DETACH);
                        if (ret < 0) {
                                fprintf(stderr, "%m - Failed to unmount %s\n", target);
                                exit_code = EXIT_FAILURE;
                                break;
                        }
                }
      
                (void)unlinkat(dfd, "detached-move-mount", AT_REMOVEDIR);
                close(dfd);
      
                exit(exit_code);
        }
      
      and wait for the kernel to refuse any new mounts by returning ENOSPC.
      How many iterations are needed depends on the number of mounts in your
      system. Assuming you have something like 50 mounts on a standard system
      it should be almost instantaneous.
      
      The root cause of this is that detached mounts aren't handled correctly
      when the source and target mounts are identical and reside on a shared
      mount. This results in a broken mount tree in which the detached source
      itself is propagated, something that propagation prevents for regular
      bind-mounts and new mounts. This ultimately leads to a miscalculation of
      the number of mounts in the mount namespace.
      
      Detached mounts created via
      open_tree(fd, path, OPEN_TREE_CLONE)
      are essentially like an unattached new mount, or an unattached
      bind-mount. They can then later on be attached to the filesystem via
      move_mount() which calls into attach_recursive_mount(). Part of
      attaching it to the filesystem is making sure that mounts get correctly
      propagated in case the destination mountpoint is MS_SHARED, i.e. is a
      shared mountpoint. This is done by calling into propagate_mnt() which
      walks the list of peers calling propagate_one() on each mount in this
      list making sure it receives the propagation event.
      The propagate_one() function thereby skips both new mounts and bind
      mounts to not propagate them "into themselves". Both are identified by
      checking whether the mount is already attached to any mount namespace in
      mnt->mnt_ns. This is what the IS_MNT_NEW() helper is responsible for.
      
      However, detached mounts have an anonymous mount namespace attached to
      them, stashed in mnt->mnt_ns, which means that IS_MNT_NEW() doesn't
      realize they need to be skipped. The mount therefore propagates "into
      itself", breaking the mount table and causing a disconnect between the
      number of mounts recorded as being beneath or reachable from the target
      mountpoint and the number of mounts actually recorded/counted in
      ns->mounts. This ultimately causes an overflow, which in turn prevents any new
      mounts via the ENOSPC issue.
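How a small bookkeeping mismatch turns into ENOSPC can be shown with a simplified user-space sketch. Everything here is an assumption made for illustration: struct fake_ns, the limit value, the simplified limit check, and in particular the idea that each attach/detach cycle uncounts exactly one mount more than it counted. It is not the kernel's count_mounts() implementation.

```c
#include <errno.h>

#define FAKE_MOUNT_MAX 100000u  /* hypothetical per-namespace limit */

struct fake_ns {
        unsigned int mounts;    /* mounts currently counted */
};

/* Simplified limit check: refuse when the count would exceed the limit. */
static int fake_count_mounts(const struct fake_ns *ns, unsigned int new_mounts)
{
        if (ns->mounts > FAKE_MOUNT_MAX ||
            new_mounts > FAKE_MOUNT_MAX - ns->mounts)
                return -ENOSPC;
        return 0;
}

/* One attach/detach cycle whose bookkeeping is off by one (assumed). */
static void fake_buggy_cycle(struct fake_ns *ns)
{
        ns->mounts += 1;        /* attach counts one mount */
        ns->mounts -= 2;        /* detach uncounts two: one too many */
}
```

Starting from a small count, a handful of cycles drives the unsigned counter through zero to a huge value, after which every new mount is refused even though the namespace is nearly empty. That would be consistent with the reproducer failing quickly on a system with few mounts.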
      
      So teach propagation to handle detached mounts by making it aware of
      them. I tracked this issue down over the last couple of days and then
      verified that the fix is correct by unmounting everything in my current
      mount table, leaving only /proc and /sys mounted, and running the
      reproducer above overnight while checking the number of mounts counted
      in ns->mounts. With this fix the counts are correct and the ENOSPC issue
      can't be reproduced.
      
      This change will only have an effect on mounts created with the new
      mount API since detached mounts cannot be created with the old mount API
      so regressions are extremely unlikely.
      
      Link: https://lore.kernel.org/r/20210306101010.243666-1-christian.brauner@ubuntu.com
      Fixes: 2db154b3 ("vfs: syscall: Add move_mount(2) to move mounts around")
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      3ef215f4
    • C
      powerpc/603: Fix protection of user pages mapped with PROT_NONE · f0404980
      Committed by Christophe Leroy
      stable inclusion
      from stable-5.10.24
      commit aa1258d91455a75474d0541f746537c9bb0484c3
      bugzilla: 51348
      
      --------------------------------
      
      commit c119565a upstream.
      
      On book3s/32, page protection is defined by the PP bits in the PTE
      which provide the following protection depending on the access
      keys defined in the matching segment register:
      - PP 00 means RW with key 0 and N/A with key 1.
      - PP 01 means RW with key 0 and RO with key 1.
      - PP 10 means RW with both key 0 and key 1.
      - PP 11 means RO with both key 0 and key 1.
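The table above can be written down as a small decoder. This is a sketch for illustration only; the enum and the pp_decode() name are hypothetical and do not appear in the kernel.

```c
#include <assert.h>

enum pp_access { PP_NO_ACCESS, PP_READ_ONLY, PP_READ_WRITE };

/*
 * Map the 2-bit PP field and the 1-bit access key to the resulting
 * access rights, following the four cases listed above.
 */
static enum pp_access pp_decode(unsigned int pp, unsigned int key)
{
        assert(pp <= 3 && key <= 1);

        switch (pp) {
        case 0:         /* PP 00: RW with key 0, no access with key 1 */
                return key ? PP_NO_ACCESS : PP_READ_WRITE;
        case 1:         /* PP 01: RW with key 0, RO with key 1 */
                return key ? PP_READ_ONLY : PP_READ_WRITE;
        case 2:         /* PP 10: RW with both keys */
                return PP_READ_WRITE;
        default:        /* PP 11: RO with both keys */
                return PP_READ_ONLY;
        }
}
```

Note that PP 00 only denies access when the key is 1; with key 0 it still grants read-write access.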
      
      Since the implementation of kernel userspace access protection,
      PP bits have been set as follows:
      - PP00 for pages without _PAGE_USER
      - PP01 for pages with _PAGE_USER and _PAGE_RW
      - PP11 for pages with _PAGE_USER and without _PAGE_RW
      
      For kernelspace segments, kernel accesses are performed with key 0
      and user accesses are performed with key 1. As PP00 is used for
      non _PAGE_USER pages, user can't access kernel pages not flagged
      _PAGE_USER while kernel can.
      
      For userspace segments, both kernel and user accesses are performed
      with key 0, therefore pages not flagged _PAGE_USER are still
      accessible to the user.
      
      This shouldn't be an issue, because userspace is expected to be
      accessible to the user. But unlike most other architectures, powerpc
      implements PROT_NONE protection by removing _PAGE_USER flag instead of
      flagging the page as not valid. This means that pages in userspace
      that are not flagged _PAGE_USER shall remain inaccessible.
      
      To get the expected behaviour, just mimic other architectures in the
      TLB miss handler by checking _PAGE_USER permission on userspace
      accesses as if it was the _PAGE_PRESENT bit.
      
      Note that this problem only affects 603 cores. The 604+ have
      a hash table, and the hash_page() function already implements the
      verification of _PAGE_USER permission on userspace pages.
      
      Fixes: f342adca ("powerpc/32s: Prepare Kernel Userspace Access Protection")
      Cc: stable@vger.kernel.org # v5.2+
      Reported-by: Christoph Plattner <christoph.plattner@thalesgroup.com>
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/4a0c6e3bb8f0c162457bf54d9bc6fd8d7b55129f.1612160907.git.christophe.leroy@csgroup.eu
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      f0404980
    • L
      mt76: dma: do not report truncated frames to mac80211 · de914b17
      Committed by Lorenzo Bianconi
      stable inclusion
      from stable-5.10.24
      commit e36d276dd4be6085b2f830dbb24e4746ec4a042b
      bugzilla: 51348
      
      --------------------------------
      
      commit d0bd52c5 upstream.
      
      Commit b102f0c5 ("mt76: fix array overflow on receiving too many
      fragments for a packet") fixes a possible OOB access but it introduces a
      memory leak since the pending frame is not released to page_frag_cache
      if the frag array of skb_shared_info is full. Commit 93a1d479
      ("mt76: dma: fix a possible memory leak in mt76_add_fragment()") fixes
      the issue but does not free the truncated skb that is forwarded to
      mac80211 layer. Fix the leftover issue by discarding truncated skbs as well.
      
      Fixes: 93a1d479 ("mt76: dma: fix a possible memory leak in mt76_add_fragment()")
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Link: https://lore.kernel.org/r/a03166fcc8214644333c68674a781836e0f57576.1612697217.git.lorenzo@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      de914b17
    • J
      ibmvnic: always store valid MAC address · adf9ab95
      Committed by Jiri Wiesner
      stable inclusion
      from stable-5.10.24
      commit 1e343b2e7b9678f199df9693a3548e9a4ab98488
      bugzilla: 51348
      
      --------------------------------
      
      commit 67eb2114 upstream.
      
      The last change to ibmvnic_set_mac(), 8fc3672a, meant to prevent
      users from setting an invalid MAC address on an ibmvnic interface
      that has not been brought up yet. The change also prevented the
      requested MAC address from being stored by the adapter object for an
      ibmvnic interface when the state of the ibmvnic interface is
      VNIC_PROBED - that is after probing has finished but before the
      ibmvnic interface is brought up. The MAC address stored by the
      adapter object is used and sent to the hypervisor for checking when
      an ibmvnic interface is brought up.
      
      The ibmvnic driver ignoring the requested MAC address when in
      VNIC_PROBED state caused LACP bonds (bonds in 802.3ad mode) with more
      than one slave to malfunction. The bonding code must be able to
      change the MAC address of its slaves before they are brought up
      during enslaving. The inability of kernels with 8fc3672a to set
      the MAC addresses of bonding slaves is observable in the output of
      "ip address show". The MAC addresses of the slaves are the same as
      the MAC address of the bond on a working system whereas the slaves
      retain their original MAC addresses on a system with a malfunctioning
      LACP bond.
      
      Fixes: 8fc3672a ("ibmvnic: fix ibmvnic_set_mac")
      Signed-off-by: Jiri Wiesner <jwiesner@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      adf9ab95
    • M
      ibmvnic: Fix possibly uninitialized old_num_tx_queues variable warning. · f963c2cf
      Committed by Michal Suchanek
      stable inclusion
      from stable-5.10.24
      commit 57ac75f8d241b3d13b77d223214be025f18df8a1
      bugzilla: 51348
      
      --------------------------------
      
      commit 6881b07f upstream.
      
      GCC 7.5 reports:
      ../drivers/net/ethernet/ibm/ibmvnic.c: In function 'ibmvnic_reset_init':
      ../drivers/net/ethernet/ibm/ibmvnic.c:5373:51: warning: 'old_num_tx_queues' may be used uninitialized in this function [-Wmaybe-uninitialized]
      ../drivers/net/ethernet/ibm/ibmvnic.c:5373:6: warning: 'old_num_rx_queues' may be used uninitialized in this function [-Wmaybe-uninitialized]
      
      The variables are initialized only if (reset) and used only if (reset &&
      something), so this is a false positive. However, there is no reason not
      to initialize the variables unconditionally, which avoids the warning.
      
      Fixes: 635e442f ("ibmvnic: merge ibmvnic_reset_init and ibmvnic_init")
      Signed-off-by: Michal Suchanek <msuchanek@suse.de>
      Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      f963c2cf