1. 09 Apr, 2021 40 commits
    • net: phy: fix save wrong speed and duplex problem if autoneg is on · 541e3645
      Guangbin Huang committed
      stable inclusion
      from stable-5.10.24
      commit 6aa23829949c2c0912e82866aeab4fd591595235
      bugzilla: 51348
      
      --------------------------------
      
      commit d9032dba upstream.
      
      If the phy uses the generic driver and autoneg is on, entering the command
      "ethtool -s eth0 speed 50" will not actually change the phy speed, but
      the command "ethtool eth0" shows the speed as 50Mb/s because phydev->speed
      has been set to 50 and is not updated later.
      
      The duplex setting has the same problem.
      
      However, if autoneg is on, phy only changes speed and duplex according to
      phydev->advertising, but not phydev->speed and phydev->duplex. So in this
      case, phydev->speed and phydev->duplex don't need to be set in function
      phy_ethtool_ksettings_set() if autoneg is on.
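
The rule described above can be sketched in plain C. This is a minimal illustration, not the kernel code: `struct phy_state` and `apply_ksettings()` are hypothetical stand-ins for `struct phy_device` and `phy_ethtool_ksettings_set()`.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the kernel's struct phy_device. */
struct phy_state {
    bool autoneg;   /* true when autonegotiation is enabled */
    int speed;      /* cached link speed in Mb/s */
    int duplex;     /* 0 = half, 1 = full */
};

/*
 * Apply a user request. The forced speed/duplex values are stored only
 * when autonegotiation is off; with autoneg on, the link result comes
 * from negotiation, so the cached values are left untouched.
 */
static void apply_ksettings(struct phy_state *phy, bool autoneg,
                            int speed, int duplex)
{
    phy->autoneg = autoneg;
    if (!autoneg) {
        phy->speed = speed;
        phy->duplex = duplex;
    }
}
```

With autoneg on, a request for speed 50 leaves the cached speed unchanged, so a later query does not report a speed the link never used.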
      
      Fixes: 51e2a384 ("PHY: Avoid unnecessary aneg restarts")
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • net: always use icmp{,v6}_ndo_send from ndo_start_xmit · 0165d324
      Jason A. Donenfeld committed
      stable inclusion
      from stable-5.10.24
      commit 91796b65563bd3fd0efe4fb56d6ee1c5c6006eb0
      bugzilla: 51348
      
      --------------------------------
      
      commit 4372339e upstream.
      
      There were a few remaining tunnel drivers that didn't receive the prior
      conversion to icmp{,v6}_ndo_send. Knowing now that this could lead to
      memory corruption (see ee576c47 ("net: icmp: pass zeroed opts from
      icmp{,v6}_ndo_send before sending") for details), it is even more
      imperative to have these all converted. So this commit goes through the
      remaining cases that I could find and does a boring translation to the
      ndo variety.
      
      The Fixes: line below is the merge that originally added
      icmp{,v6}_ndo_send and converted the first batch of icmp{,v6}_send users.
      The rationale then for the change applies equally to this patch. It's just
      that these drivers were left out of the initial conversion because these
      network devices are hiding in net/ rather than in drivers/net/.
      
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: David Ahern <dsahern@kernel.org>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Fixes: 803381f9 ("Merge branch 'icmp-account-for-NAT-when-sending-icmps-from-ndo-layer'")
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • netfilter: x_tables: gpf inside xt_find_revision() · 034017c8
      Vasily Averin committed
      stable inclusion
      from stable-5.10.24
      commit 8abbf7e53e179b16dc48c40cecc6c86240ca026c
      bugzilla: 51348
      
      --------------------------------
      
      commit 8e24eddd upstream.
      
      Nested target/match_revfn() calls work with xt[NFPROTO_UNSPEC] lists
      without taking xt[NFPROTO_UNSPEC].mutex. This can race with module unload
      and cause the host to crash:
      
      general protection fault: 0000 [#1]
      Modules linked in: ... [last unloaded: xt_cluster]
      CPU: 0 PID: 542455 Comm: iptables
      RIP: 0010:[<ffffffff8ffbd518>]  [<ffffffff8ffbd518>] strcmp+0x18/0x40
      RDX: 0000000000000003 RSI: ffff9a5a5d9abe10 RDI: dead000000000111
      R13: ffff9a5a5d9abe10 R14: ffff9a5a5d9abd8c R15: dead000000000100
      (VvS: %R15 -- &xt_match,  %RDI -- &xt_match.name,
      xt_cluster unregister match in xt[NFPROTO_UNSPEC].match list)
      Call Trace:
       [<ffffffff902ccf44>] match_revfn+0x54/0xc0
       [<ffffffff902ccf9f>] match_revfn+0xaf/0xc0
       [<ffffffff902cd01e>] xt_find_revision+0x6e/0xf0
       [<ffffffffc05a5be0>] do_ipt_get_ctl+0x100/0x420 [ip_tables]
       [<ffffffff902cc6bf>] nf_getsockopt+0x4f/0x70
       [<ffffffff902dd99e>] ip_getsockopt+0xde/0x100
       [<ffffffff903039b5>] raw_getsockopt+0x25/0x50
       [<ffffffff9026c5da>] sock_common_getsockopt+0x1a/0x20
       [<ffffffff9026b89d>] SyS_getsockopt+0x7d/0xf0
       [<ffffffff903cbf92>] system_call_fastpath+0x25/0x2a
      
      Fixes: 656caff2 ("netfilter 04/09: x_tables: fix match/target revision lookup")
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Reviewed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • netfilter: nf_nat: undo erroneous tcp edemux lookup · 6c980f9a
      Florian Westphal committed
      stable inclusion
      from stable-5.10.24
      commit 42402bd84530d3761b97775c10762fde28d5b2f9
      bugzilla: 51348
      
      --------------------------------
      
      commit 03a3ca37 upstream.
      
      Under extremely rare conditions TCP early demux will retrieve the wrong
      socket.
      
      1. local machine establishes a connection to a remote server, S, on port
         p.
      
         This gives:
         laddr:lport -> S:p
         ... both in tcp and conntrack.
      
      2. local machine establishes a connection to host H, on port p2.
         2a. TCP stack chooses same laddr:lport, so we have
         laddr:lport -> H:p2 from TCP point of view.
         2b. There is a destination NAT rewrite in place, translating
             H:p2 to S:p.  This results in following conntrack entries:
      
         I)  laddr:lport -> S:p  (origin)  S:p -> laddr:lport (reply)
         II) laddr:lport -> H:p2 (origin)  S:p -> laddr:lport2 (reply)
      
         NAT engine has rewritten laddr:lport to laddr:lport2 to map
         the reply packet to the correct origin.
      
         When server sends SYN/ACK to laddr:lport2, the PREROUTING hook
         will undo the SNAT transformation, rewriting IP header to
         S:p -> laddr:lport
      
         This causes TCP early demux to associate the skb with the TCP socket
         of the first connection.
      
         The INPUT hook will then reverse the DNAT transformation, rewriting
         the IP header to H:p2 -> laddr:lport.
      
      Because packet ends up with the wrong socket, the new connection
      never completes: originator stays in SYN_SENT and conntrack entry
      remains in SYN_RECV until timeout, and responder retransmits SYN/ACK
      until it gives up.
      
      To resolve this, orphan the skb after the input rewrite:
      Because the source IP address changed, the socket must be incorrect.
      We can't move the DNAT undo to prerouting due to backwards
      compatibility; doing so would make iptables/nftables rules no longer
      match the way they did.
      
      After orphan, the packet will be handed to the next protocol layer
      (tcp, udp, ...) and that will repeat the socket lookup just as if
      early demux was disabled.
      
      Fixes: 41063e9d ("ipv4: Early TCP socket demux.")
      Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1427
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • tcp: add sanity tests to TCP_QUEUE_SEQ · bc4f1468
      Eric Dumazet committed
      stable inclusion
      from stable-5.10.24
      commit 046f3c1c2ff450fb7ae53650e9a95e0074a61f3e
      bugzilla: 51348
      
      --------------------------------
      
      commit 8811f4a9 upstream.
      
      Qingyu Li reported a syzkaller bug where the repro
      changes RCV SEQ _after_ restoring data in the receive queue.
      
      mprotect(0x4aa000, 12288, PROT_READ)    = 0
      mmap(0x1ffff000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x1ffff000
      mmap(0x20000000, 16777216, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x20000000
      mmap(0x21000000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x21000000
      socket(AF_INET6, SOCK_STREAM, IPPROTO_IP) = 3
      setsockopt(3, SOL_TCP, TCP_REPAIR, [1], 4) = 0
      connect(3, {sa_family=AF_INET6, sin6_port=htons(0), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_scope_id=0}, 28) = 0
      setsockopt(3, SOL_TCP, TCP_REPAIR_QUEUE, [1], 4) = 0
      sendmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="0x0000000000000003\0\0", iov_len=20}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20
      setsockopt(3, SOL_TCP, TCP_REPAIR, [0], 4) = 0
      setsockopt(3, SOL_TCP, TCP_QUEUE_SEQ, [128], 4) = 0
      recvfrom(3, NULL, 20, 0, NULL, NULL)    = -1 ECONNRESET (Connection reset by peer)
      
      syslog shows:
      [  111.205099] TCP recvmsg seq # bug 2: copied 80, seq 0, rcvnxt 80, fl 0
      [  111.207894] WARNING: CPU: 1 PID: 356 at net/ipv4/tcp.c:2343 tcp_recvmsg_locked+0x90e/0x29a0
      
      This should not be allowed. TCP_QUEUE_SEQ should only be used
      when queues are empty.
      
      This patch fixes this case, and the tx path as well.
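
The sanity test can be pictured as a guard that refuses to move a sequence number while data is still queued. The sketch below is a simplified model under stated assumptions: `struct repair_queue` and `set_queue_seq()` are hypothetical names, and the `-EPERM` return value is an illustrative choice rather than something stated in this message.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one repair queue (send or receive side). */
struct repair_queue {
    size_t queued;   /* bytes currently sitting in the queue */
    uint32_t seq;    /* sequence number associated with the queue */
};

/*
 * Change the queue sequence number only when the queue is empty.
 * Changing it with data queued would leave the sequence accounting
 * inconsistent with the queued bytes, so the request is rejected.
 */
static int set_queue_seq(struct repair_queue *q, uint32_t seq)
{
    if (q->queued != 0)
        return -EPERM;   /* assumed error code, for illustration only */
    q->seq = seq;
    return 0;
}
```

In the reproducer above, 20 bytes were restored into the receive queue before TCP_QUEUE_SEQ was set, which is the case such a guard rejects.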
      
      Fixes: ee995283 ("tcp: Initial repair mode")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=212005
      Reported-by: Qingyu Li <ieatmuttonchuan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • tcp: Fix sign comparison bug in getsockopt(TCP_ZEROCOPY_RECEIVE) · fe30893d
      Arjun Roy committed
      stable inclusion
      from stable-5.10.24
      commit e95ebe1ed6abc259b897abc1f92622504750747c
      bugzilla: 51348
      
      --------------------------------
      
      commit 2107d45f upstream.
      
      getsockopt(TCP_ZEROCOPY_RECEIVE) has a bug where we read a
      user-provided "len" field of type signed int, and then compare the
      value to the result of an "offsetofend" operation, which is unsigned.
      
      Negative values provided by the user will be promoted to large
      positive numbers; thus checking that len < offsetofend() will return
      false when the intention was that it return true.
      
      Note that while len is originally checked for negative values earlier
      on in do_tcp_getsockopt(), subsequent calls to get_user() re-read the
      value from userspace which may have changed in the meantime.
      
      Therefore, re-add the check for negative values after the call to
      get_user in the handler code for TCP_ZEROCOPY_RECEIVE.
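
The pitfall is easy to reproduce outside the kernel. In this sketch, `too_short_buggy()` and `too_short_fixed()` are hypothetical helpers; the explicit cast in the buggy version spells out the conversion that C performs implicitly when an `int` is compared with a `size_t`.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Buggy form: comparing a signed len with an unsigned size converts len
 * to size_t first, so a negative len becomes a very large positive value
 * and the "too short" test is false.
 */
static bool too_short_buggy(int len, size_t min_len)
{
    return (size_t)len < min_len;
}

/* Fixed form: reject negative values before the unsigned comparison. */
static bool too_short_fixed(int len, size_t min_len)
{
    return len < 0 || (size_t)len < min_len;
}
```

A length of -1 passes the buggy check as if it were large enough, while the fixed check reports it as too short.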
      
      Fixes: c8856c05 ("tcp-zerocopy: Return inq along with tcp receive zerocopy.")
      Reported-by: kernel test robot <lkp@intel.com>
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Arjun Roy <arjunroy@google.com>
      Link: https://lore.kernel.org/r/20210225232628.4033281-1-arjunroy.kdev@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • can: tcan4x5x: tcan4x5x_init(): fix initialization - clear MRAM before entering Normal Mode · eb2be85c
      Torin Cooper-Bennun committed
      stable inclusion
      from stable-5.10.24
      commit 473bce9b9393a3a990ed7c9708af38df553f2712
      bugzilla: 51348
      
      --------------------------------
      
      commit 27126252 upstream.
      
      This patch prevents a potentially destructive race condition. The
      device is fully operational on the bus after entering Normal Mode, so
      zeroing the MRAM after entering this mode may lead to loss of
      information, e.g. new received messages.
      
      This patch fixes the problem by first initializing the MRAM, then
      bringing the device into Normal Mode.
      
      Fixes: 5443c226 ("can: tcan4x5x: Add tcan4x5x driver to the kernel")
      Link: https://lore.kernel.org/r/20210226163440.313628-1-torin@maxiluxsystems.com
      Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Torin Cooper-Bennun <torin@maxiluxsystems.com>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • can: flexcan: invoke flexcan_chip_freeze() to enter freeze mode · cc83b0f6
      Joakim Zhang committed
      stable inclusion
      from stable-5.10.24
      commit c537011c99abc9d1e1e9bc2a3bb32fda1cda4583
      bugzilla: 51348
      
      --------------------------------
      
      commit c6382004 upstream.
      
      Invoke flexcan_chip_freeze() to enter freeze mode, since the freeze
      mode acknowledge needs to be polled.
      
      Fixes: e955cead ("CAN: Add Flexcan CAN controller driver")
      Link: https://lore.kernel.org/r/20210218110037.16591-4-qiangqing.zhang@nxp.com
      Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • can: flexcan: enable RX FIFO after FRZ/HALT valid · b08e4dd1
      Joakim Zhang committed
      stable inclusion
      from stable-5.10.24
      commit e24c53182850abce8c7fe3423f843ccb62581e6f
      bugzilla: 51348
      
      --------------------------------
      
      commit ec15e27c upstream.
      
      Enabling the RX FIFO can fail during a system reboot stress test:
      
      [    0.303958] flexcan 5a8d0000.can: 5a8d0000.can supply xceiver not found, using dummy regulator
      [    0.304281] flexcan 5a8d0000.can (unnamed net_device) (uninitialized): Could not enable RX FIFO, unsupported core
      [    0.314640] flexcan 5a8d0000.can: registering netdev failed
      [    0.320728] flexcan 5a8e0000.can: 5a8e0000.can supply xceiver not found, using dummy regulator
      [    0.320991] flexcan 5a8e0000.can (unnamed net_device) (uninitialized): Could not enable RX FIFO, unsupported core
      [    0.331360] flexcan 5a8e0000.can: registering netdev failed
      [    0.337444] flexcan 5a8f0000.can: 5a8f0000.can supply xceiver not found, using dummy regulator
      [    0.337716] flexcan 5a8f0000.can (unnamed net_device) (uninitialized): Could not enable RX FIFO, unsupported core
      [    0.348117] flexcan 5a8f0000.can: registering netdev failed
      
      RX FIFO should be enabled after the FRZ/HALT bits are valid. But the
      current code enables RX FIFO and FRZ/HALT at the same time.
      
      Fixes: e955cead ("CAN: Add Flexcan CAN controller driver")
      Link: https://lore.kernel.org/r/20210218110037.16591-3-qiangqing.zhang@nxp.com
      Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • can: flexcan: assert FRZ bit in flexcan_chip_freeze() · 5c89e681
      Joakim Zhang committed
      stable inclusion
      from stable-5.10.24
      commit 98b7f969116df96c57e9a8572620d71e92fcb725
      bugzilla: 51348
      
      --------------------------------
      
      commit 449052cf upstream.
      
      Asserting the HALT bit to enter freeze mode presupposes that the FRZ bit
      is asserted. This patch asserts the FRZ bit in flexcan_chip_freeze(),
      although its reset value is 1b'1. This is a preparatory patch; a later
      patch will invoke flexcan_chip_freeze() to enter freeze mode, which polls
      the freeze mode acknowledge.
      
      Fixes: b1aa1c7a ("can: flexcan: fix transition from and to freeze mode in chip_{,un}freeze")
      Link: https://lore.kernel.org/r/20210218110037.16591-2-qiangqing.zhang@nxp.com
      Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • can: skb: can_skb_set_owner(): fix ref counting if socket was closed before setting skb ownership · 7401aa66
      Oleksij Rempel committed
      stable inclusion
      from stable-5.10.24
      commit 4224890edff1b4679dc8ddeaa69b43efce5366ba
      bugzilla: 51348
      
      --------------------------------
      
      commit e940e089 upstream.
      
      There are two ref count variables controlling the free()ing of a socket:
      - struct sock::sk_refcnt - which is changed by sock_hold()/sock_put()
      - struct sock::sk_wmem_alloc - which accounts the memory allocated by
        the skbs in the send path.
      
      In case there are still TX skbs on the fly and the socket() is closed,
      the struct sock::sk_refcnt reaches 0. In the TX-path the CAN stack
      clones an "echo" skb, calls sock_hold() on the original socket and
      references it. This produces the following back trace:
      
      | WARNING: CPU: 0 PID: 280 at lib/refcount.c:25 refcount_warn_saturate+0x114/0x134
      | refcount_t: addition on 0; use-after-free.
      | Modules linked in: coda_vpu(E) v4l2_jpeg(E) videobuf2_vmalloc(E) imx_vdoa(E)
      | CPU: 0 PID: 280 Comm: test_can.sh Tainted: G            E     5.11.0-04577-gf8ff6603c617 #203
      | Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
      | Backtrace:
      | [<80bafea4>] (dump_backtrace) from [<80bb0280>] (show_stack+0x20/0x24) r7:00000000 r6:600f0113 r5:00000000 r4:81441220
      | [<80bb0260>] (show_stack) from [<80bb593c>] (dump_stack+0xa0/0xc8)
      | [<80bb589c>] (dump_stack) from [<8012b268>] (__warn+0xd4/0x114) r9:00000019 r8:80f4a8c2 r7:83e4150c r6:00000000 r5:00000009 r4:80528f90
      | [<8012b194>] (__warn) from [<80bb09c4>] (warn_slowpath_fmt+0x88/0xc8) r9:83f26400 r8:80f4a8d1 r7:00000009 r6:80528f90 r5:00000019 r4:80f4a8c2
      | [<80bb0940>] (warn_slowpath_fmt) from [<80528f90>] (refcount_warn_saturate+0x114/0x134) r8:00000000 r7:00000000 r6:82b44000 r5:834e5600 r4:83f4d540
      | [<80528e7c>] (refcount_warn_saturate) from [<8079a4c8>] (__refcount_add.constprop.0+0x4c/0x50)
      | [<8079a47c>] (__refcount_add.constprop.0) from [<8079a57c>] (can_put_echo_skb+0xb0/0x13c)
      | [<8079a4cc>] (can_put_echo_skb) from [<8079ba98>] (flexcan_start_xmit+0x1c4/0x230) r9:00000010 r8:83f48610 r7:0fdc0000 r6:0c080000 r5:82b44000 r4:834e5600
      | [<8079b8d4>] (flexcan_start_xmit) from [<80969078>] (netdev_start_xmit+0x44/0x70) r9:814c0ba0 r8:80c8790c r7:00000000 r6:834e5600 r5:82b44000 r4:82ab1f00
      | [<80969034>] (netdev_start_xmit) from [<809725a4>] (dev_hard_start_xmit+0x19c/0x318) r9:814c0ba0 r8:00000000 r7:82ab1f00 r6:82b44000 r5:00000000 r4:834e5600
      | [<80972408>] (dev_hard_start_xmit) from [<809c6584>] (sch_direct_xmit+0xcc/0x264) r10:834e5600 r9:00000000 r8:00000000 r7:82b44000 r6:82ab1f00 r5:834e5600 r4:83f27400
      | [<809c64b8>] (sch_direct_xmit) from [<809c6c0c>] (__qdisc_run+0x4f0/0x534)
      
      To fix this problem, only set skb ownership to sockets which still have
      a ref count > 0.
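
The "take a reference only if the count is still above zero" pattern can be sketched with C11 atomics. This is a user-space illustration, not the kernel code: `ref_get_unless_zero()` is a hypothetical helper that mirrors the idea behind the kernel's refcount_inc_not_zero().

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Increment the counter only if it is currently non-zero.
 * Returns true when a reference was taken, false when the object is
 * already on its way to being freed and must not be referenced.
 */
static bool ref_get_unless_zero(atomic_int *ref)
{
    int old = atomic_load(ref);

    while (old != 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(ref, &old, old + 1))
            return true;
    }
    return false;
}
```

A caller would attach the socket to the echo skb only when this returns true, which avoids the "addition on 0" warning shown in the trace.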
      
      Fixes: 0ae89beb ("can: add destructor for self generated skbs")
      Cc: Oliver Hartkopp <socketcan@hartkopp.net>
      Cc: Andre Naujoks <nautsch2@gmail.com>
      Link: https://lore.kernel.org/r/20210226092456.27126-1-o.rempel@pengutronix.de
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Reviewed-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • net: l2tp: reduce log level of messages in receive path, add counter instead · 7a2d5a15
      Matthias Schiffer committed
      stable inclusion
      from stable-5.10.24
      commit fa5d019c56e78e0b33f585d23149f2553568b998
      bugzilla: 51348
      
      --------------------------------
      
      commit 3e59e885 upstream.
      
      Commit 5ee759cd ("l2tp: use standard API for warning log messages")
      changed a number of warnings about invalid packets in the receive path
      so that they are always shown, instead of only when a special L2TP debug
      flag is set. Even with rate limiting these warnings can easily cause
      significant log spam - potentially triggered by a malicious party
      sending invalid packets on purpose.
      
      In addition these warnings were noticed by projects like Tunneldigger [1],
      which uses L2TP for its data path, but implements its own control
      protocol (which is sufficiently different from L2TP data packets that it
      would always be passed up to userspace even with future extensions of
      L2TP).
      
      Some of the warnings were already redundant, as l2tp_stats has a counter
      for these packets. This commit adds one additional counter for invalid
      packets that are passed up to userspace. Packets with unknown session are
      not counted as invalid, as there is nothing wrong with the format of
      these packets.
      
      With the additional counter, all of these messages are either redundant
      or benign, so we reduce them to pr_debug_ratelimited().
      
      [1] https://github.com/wlanslovenija/tunneldigger/issues/160
      
      Fixes: 5ee759cd ("l2tp: use standard API for warning log messages")
      Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0 · f908d27a
      Balazs Nemeth committed
      stable inclusion
      from stable-5.10.24
      commit 453fff24f52eeb62ab65582848498097273df269
      bugzilla: 51348
      
      --------------------------------
      
      commit d348ede3 upstream.
      
      A packet with skb_inner_network_header(skb) == skb_network_header(skb)
      and ETH_P_MPLS_UC will prevent mpls_gso_segment from pulling any headers
      from the packet. Subsequently, the call to skb_mac_gso_segment will
      again call mpls_gso_segment with the same packet leading to an infinite
      loop. In addition, ensure that the header length is a multiple of four,
      which should hold irrespective of the number of stacked labels.
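
The two conditions named above (a non-zero header length that is a multiple of four) can be expressed as a small predicate. This is a sketch only; `mpls_hlen_valid()` is a hypothetical helper, not a function from the kernel source.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * An MPLS label stack entry is 4 bytes, so a valid stacked header length
 * is non-zero and a multiple of 4. A zero length means no header would be
 * pulled and segmentation would re-enter with the same packet.
 */
static bool mpls_hlen_valid(size_t mpls_hlen)
{
    return mpls_hlen != 0 && mpls_hlen % 4 == 0;
}
```

Segmentation would bail out early whenever the predicate is false instead of calling back into the MAC-level GSO path.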
      
      Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • net: check if protocol extracted by virtio_net_hdr_set_proto is correct · 857ee3c4
      Balazs Nemeth committed
      stable inclusion
      from stable-5.10.24
      commit faa3baa2828c5e1c4374f3e60041f75c64f5fcb6
      bugzilla: 51348
      
      --------------------------------
      
      commit 924a9bc3 upstream.
      
      For gso packets, virtio_net_hdr_set_proto sets the protocol (if it isn't
      set) based on the type in the virtio net hdr, but the skb could contain
      anything since it could come from packet_snd through a raw socket. If
      there is a mismatch between what virtio_net_hdr_set_proto sets and
      the actual protocol, then the skb could be handled incorrectly later
      on.
      
      An example where this poses an issue is with the subsequent call to
      skb_flow_dissect_flow_keys_basic which relies on skb->protocol being set
      correctly. A specially crafted packet could fool
      skb_flow_dissect_flow_keys_basic, preventing EINVAL from being returned.
      
      Avoid blindly trusting the information provided by the virtio net header
      by checking that the protocol in the packet actually matches the
      protocol set by virtio_net_hdr_set_proto. Note that since the protocol
      is only checked if skb->dev implements header_ops->parse_protocol,
      packets from devices without the implementation are not checked at this
      stage.
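
The cross-check can be modelled as a comparison between the protocol implied by the virtio net header and the protocol parsed from the packet itself. This is a simplified sketch: `check_gso_proto()` is a hypothetical helper, and a parsed value of 0 stands for "the device provides no parse_protocol implementation".

```c
#include <errno.h>
#include <stdint.h>

/*
 * hdr_proto:    protocol derived from the virtio net header GSO type.
 * parsed_proto: protocol read from the packet's link-layer header,
 *               or 0 when the device cannot parse it.
 */
static int check_gso_proto(uint16_t hdr_proto, uint16_t parsed_proto)
{
    /* Without a parsed protocol there is nothing to compare against. */
    if (parsed_proto != 0 && parsed_proto != hdr_proto)
        return -EINVAL;
    return 0;
}
```

For example, a header that claims IPv4 (0x0800) on a packet whose link-layer header says IPv6 (0x86DD) is rejected, while a device that cannot parse the protocol is let through unchecked.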
      
      Fixes: 9274124f ("net: stricter validation of untrusted gso packets")
      Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • net: Fix gro aggregation for udp encaps with zero csum · 8365fd56
      Daniel Borkmann committed
      stable inclusion
      from stable-5.10.24
      commit 09af4362ba47c805347840c2bb9719c0458925ca
      bugzilla: 51348
      
      --------------------------------
      
      commit 89e5c58f upstream.
      
      We noticed a GRO issue for UDP-based encaps such as vxlan/geneve when the
      csum for the UDP header itself is 0. In that case, GRO aggregation does
      not take place on the phys dev, but instead is deferred to the vxlan/geneve
      driver (see trace below).
      
      The reason is essentially that GRO aggregation bails out in udp_gro_receive()
      for such case when drivers marked the skb with CHECKSUM_UNNECESSARY (ice, i40e,
      others) where for non-zero csums 2abb7cdc ("udp: Add support for doing
      checksum unnecessary conversion") promotes those skbs to CHECKSUM_COMPLETE
      and napi context has csum_valid set. This is however not the case for zero
      UDP csum (here: csum_cnt is still 0 and csum_valid continues to be false).
      
      At the same time 57c67ff4 ("udp: additional GRO support") added matches
      on !uh->check ^ !uh2->check as part of determining candidates for aggregation,
      so udp_gro_receive() is certainly expected to handle zero csums. The
      purpose of the check added via 662880f4 ("net: Allow GRO to use and set
      levels of checksum unnecessary") seems to be to catch bad csums and stop
      aggregation right away.
      
      One way to fix aggregation in the zero case is to only perform the !csum_valid
      check in udp_gro_receive() if uh->check is in fact non-zero.
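
The change in the bail-out condition can be sketched as two predicates. This is a deliberately reduced model: `struct gro_csum_state` and both functions are hypothetical, and other terms of the real condition in udp_gro_receive() are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the per-packet GRO checksum state. */
struct gro_csum_state {
    uint16_t check;      /* UDP checksum field from the header */
    unsigned csum_cnt;   /* remaining "checksum unnecessary" levels */
    bool csum_valid;     /* checksum already verified */
};

/* Before: any packet without a validated checksum skips aggregation. */
static bool must_flush_before(const struct gro_csum_state *s)
{
    return s->csum_cnt == 0 && !s->csum_valid;
}

/* After: the validation requirement applies only to non-zero checksums. */
static bool must_flush_after(const struct gro_csum_state *s)
{
    return s->check != 0 && s->csum_cnt == 0 && !s->csum_valid;
}
```

A zero-checksum packet is flushed by the first predicate and aggregated under the second, while a non-zero checksum that has not been validated is still flushed.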
      
      Before:
      
        [...]
        swapper     0 [008]   731.946506: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100400 len=1500   (1)
        swapper     0 [008]   731.946507: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100200 len=1500
        swapper     0 [008]   731.946507: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101100 len=1500
        swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101700 len=1500
        swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101b00 len=1500
        swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100600 len=1500
        swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100f00 len=1500
        swapper     0 [008]   731.946509: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100a00 len=1500
        swapper     0 [008]   731.946516: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100500 len=1500
        swapper     0 [008]   731.946516: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100700 len=1500
        swapper     0 [008]   731.946516: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101d00 len=1500   (2)
        swapper     0 [008]   731.946517: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101000 len=1500
        swapper     0 [008]   731.946517: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101c00 len=1500
        swapper     0 [008]   731.946517: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101400 len=1500
        swapper     0 [008]   731.946518: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100e00 len=1500
        swapper     0 [008]   731.946518: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101600 len=1500
        swapper     0 [008]   731.946521: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100800 len=774
        swapper     0 [008]   731.946530: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff966497100400 len=14032 (1)
        swapper     0 [008]   731.946530: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff966497101d00 len=9112  (2)
        [...]
      
        # netperf -H 10.55.10.4 -t TCP_STREAM -l 20
        MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.55.10.4 () port 0 AF_INET : demo
        Recv   Send    Send
        Socket Socket  Message  Elapsed
        Size   Size    Size     Time     Throughput
        bytes  bytes   bytes    secs.    10^6bits/sec
      
         87380  16384  16384    20.01    13129.24
      
      After:
      
        [...]
        swapper     0 [026]   521.862641: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff93ab0d479000 len=11286 (1)
        swapper     0 [026]   521.862643: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d479000 len=11236 (1)
        swapper     0 [026]   521.862650: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff93ab0d478500 len=2898  (2)
        swapper     0 [026]   521.862650: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff93ab0d479f00 len=8490  (3)
        swapper     0 [026]   521.862653: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d478500 len=2848  (2)
        swapper     0 [026]   521.862653: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d479f00 len=8440  (3)
        [...]
      
        # netperf -H 10.55.10.4 -t TCP_STREAM -l 20
        MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.55.10.4 () port 0 AF_INET : demo
        Recv   Send    Send
        Socket Socket  Message  Elapsed
        Size   Size    Size     Time     Throughput
        bytes  bytes   bytes    secs.    10^6bits/sec
      
         87380  16384  16384    20.01    24576.53
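
      The gain implied by the two netperf results above can be worked out
      directly. This is plain arithmetic on the reported throughput figures
      (in 10^6 bits/sec); the variable names are chosen here for illustration.

```python
# Throughput figures copied from the two netperf runs (10^6 bits/sec).
before_mbps = 13129.24
after_mbps = 24576.53

# Ratio of the two results and the relative gain in percent.
speedup = after_mbps / before_mbps
gain_percent = (speedup - 1.0) * 100.0

print(f"speedup: {speedup:.2f}x, gain: {gain_percent:.0f}%")
```

      On these numbers the throughput is a little under twice the earlier
      result.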
      
      Fixes: 57c67ff4 ("udp: additional GRO support")
      Fixes: 662880f4 ("net: Allow GRO to use and set levels of checksum unnecessary")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Cc: Tom Herbert <tom@herbertland.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/r/20210226212248.8300-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      8365fd56
    • F
      ath9k: fix transmitting to stations in dynamic SMPS mode · bd1c87fa
      Felix Fietkau committed
      stable inclusion
      from stable-5.10.24
      commit d2fb1911a7a8f655440d613fc8946df384d83ee5
      bugzilla: 51348
      
      --------------------------------
      
      commit 3b9ea720 upstream.
      
      When transmitting to a receiver in dynamic SMPS mode, all transmissions that
      use multiple spatial streams need to be sent using CTS-to-self or RTS/CTS to
      give the receiver's extra chains some time to wake up.
      This fixes the tx rate getting stuck at <= MCS7 for some clients, especially
      Intel ones, which make aggressive use of SMPS.
      
      Cc: stable@vger.kernel.org
      Reported-by: Martin Kennedy <hurricos@gmail.com>
      Signed-off-by: Felix Fietkau <nbd@nbd.name>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Link: https://lore.kernel.org/r/20210214184911.96702-1-nbd@nbd.name
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      bd1c87fa
    • M
      crypto: mips/poly1305 - enable for all MIPS processors · ab4d5a39
      Maciej W. Rozycki committed
      stable inclusion
      from stable-5.10.24
      commit b0454a28f60878539a55439436ea9ad29728d366
      bugzilla: 51348
      
      --------------------------------
      
      commit 6c810cf2 upstream.
      
      The MIPS Poly1305 implementation is generic MIPS code written to
      support ISAs down to the original MIPS I and MIPS III for the 32-bit
      and 64-bit variants respectively.  Lift the current limitation, which
      enables the code for MIPSr1 ISA or newer processors only, and make it
      available for all MIPS processors.

      Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
      Fixes: a11d055e ("crypto: mips/poly1305 - incorporate OpenSSL/CRYPTOGAMS optimized implementation")
      Cc: stable@vger.kernel.org # v5.5+
      Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      ab4d5a39
    • J
      ethernet: alx: fix order of calls on resume · 09c31584
      Jakub Kicinski committed
      stable inclusion
      from stable-5.10.24
      commit a0df424a863aa6a2e8bd57ef5e0928da5d5b797f
      bugzilla: 51348
      
      --------------------------------
      
      commit a4dcfbc4 upstream.
      
      netif_device_attach() will unpause the queues so we can't call
      it before __alx_open(). This went undetected until
      commit b0999223 ("alx: add ability to allocate and free
      alx_napi structures") but now if stack tries to xmit immediately
      on resume before __alx_open() we'll crash on the NAPI being null:
      
       BUG: kernel NULL pointer dereference, address: 0000000000000198
       CPU: 0 PID: 12 Comm: ksoftirqd/0 Tainted: G           OE 5.10.0-3-amd64 #1 Debian 5.10.13-1
       Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H77-D3H, BIOS F15 11/14/2013
       RIP: 0010:alx_start_xmit+0x34/0x650 [alx]
       Code: 41 56 41 55 41 54 55 53 48 83 ec 20 0f b7 57 7c 8b 8e b0
      0b 00 00 39 ca 72 06 89 d0 31 d2 f7 f1 89 d2 48 8b 84 df
       RSP: 0018:ffffb09240083d28 EFLAGS: 00010297
       RAX: 0000000000000000 RBX: ffffa04d80ae7800 RCX: 0000000000000004
       RDX: 0000000000000000 RSI: ffffa04d80afa000 RDI: ffffa04e92e92a00
       RBP: 0000000000000042 R08: 0000000000000100 R09: ffffa04ea3146700
       R10: 0000000000000014 R11: 0000000000000000 R12: ffffa04e92e92100
       R13: 0000000000000001 R14: ffffa04e92e92a00 R15: ffffa04e92e92a00
       FS:  0000000000000000(0000) GS:ffffa0508f600000(0000) knlGS:0000000000000000
       i915 0000:00:02.0: vblank wait timed out on crtc 0
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000198 CR3: 000000004460a001 CR4: 00000000001706f0
       Call Trace:
        dev_hard_start_xmit+0xc7/0x1e0
        sch_direct_xmit+0x10f/0x310
      
      Cc: <stable@vger.kernel.org> # 4.9+
      Fixes: bc2bebe8 ("alx: remove WoL support")
      Reported-by: Zbynek Michl <zbynek.michl@gmail.com>
      Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=983595
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Tested-by: Zbynek Michl <zbynek.michl@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      09c31584
    • G
      powerpc/pseries: Don't enforce MSI affinity with kdump · 58233a58
      Greg Kurz committed
      stable inclusion
      from stable-5.10.24
      commit a9c55f22a0b978d636204509c4edaf511cb20f62
      bugzilla: 51348
      
      --------------------------------
      
      commit f9619d5e upstream.
      
      Depending on the number of online CPUs in the original kernel, it is
      likely for CPU #0 to be offline in a kdump kernel. The associated IRQs
      in the affinity mappings provided by irq_create_affinity_masks() are
      thus not started by irq_startup(), as per-design with managed IRQs.
      
      This can be a problem with multi-queue block devices driven by blk-mq :
      such a non-started IRQ is very likely paired with the single queue
      enforced by blk-mq during kdump (see blk_mq_alloc_tag_set()). This
      causes the device to remain silent and likely hangs the guest at
      some point.
      
      This is a regression caused by commit 9ea69a55 ("powerpc/pseries:
      Pass MSI affinity to irq_create_mapping()"). Note that this only happens
      with the XIVE interrupt controller because XICS has a workaround to bypass
      affinity, which is activated during kdump with the "noirqdistrib" kernel
      parameter.
      
      The issue comes from a combination of factors:
      - discrepancy between the number of queues detected by the multi-queue
        block driver, that was used to create the MSI vectors, and the single
        queue mode enforced later on by blk-mq because of kdump (i.e. keeping
        all queues fixes the issue)
      - CPU#0 offline (i.e. kdump always succeed with CPU#0)
      
      Given that I couldn't reproduce on x86, which seems to always have CPU#0
      online even during kdump, I'm not sure where this should be fixed. Hence
      going for another approach : fine-grained affinity is for performance
      and we don't really care about that during kdump. Simply revert to the
      previous working behavior of ignoring affinity masks in this case only.
      
      Fixes: 9ea69a55 ("powerpc/pseries: Pass MSI affinity to irq_create_mapping()")
      Cc: stable@vger.kernel.org # v5.10+
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
      Reviewed-by: Cédric Le Goater <clg@kaod.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210215094506.1196119-1-groug@kaod.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      58233a58
    • A
      powerpc/perf: Fix handling of privilege level checks in perf interrupt context · b8fb12dc
      Athira Rajeev committed
      stable inclusion
      from stable-5.10.24
      commit ac022fbee6855dc6304a9e63e481859b2589836d
      bugzilla: 51348
      
      --------------------------------
      
      commit 5ae5fbd2 upstream.
      
      Running "perf mem record" on powerpc platforms with selinux enabled
      resulted in soft lockups. The call trace below was seen in the logs:
      
        CPU: 58 PID: 3751 Comm: sssd_nss Not tainted 5.11.0-rc7+ #2
        NIP:  c000000000dff3d4 LR: c000000000dff3d0 CTR: 0000000000000000
        REGS: c000007fffab7d60 TRAP: 0100   Not tainted  (5.11.0-rc7+)
        ...
        NIP _raw_spin_lock_irqsave+0x94/0x120
        LR  _raw_spin_lock_irqsave+0x90/0x120
        Call Trace:
          0xc00000000fd47260 (unreliable)
          skb_queue_tail+0x3c/0x90
          audit_log_end+0x6c/0x180
          common_lsm_audit+0xb0/0xe0
          slow_avc_audit+0xa4/0x110
          avc_has_perm+0x1c4/0x260
          selinux_perf_event_open+0x74/0xd0
          security_perf_event_open+0x68/0xc0
          record_and_restart+0x6e8/0x7f0
          perf_event_interrupt+0x22c/0x560
          performance_monitor_exception+0x4c/0x60
          performance_monitor_common_virt+0x1c8/0x1d0
        interrupt: f00 at _raw_spin_lock_irqsave+0x38/0x120
        NIP:  c000000000dff378 LR: c000000000b5fbbc CTR: c0000000007d47f0
        REGS: c00000000fd47860 TRAP: 0f00   Not tainted  (5.11.0-rc7+)
        ...
        NIP _raw_spin_lock_irqsave+0x38/0x120
        LR  skb_queue_tail+0x3c/0x90
        interrupt: f00
          0x38 (unreliable)
          0xc00000000aae6200
          audit_log_end+0x6c/0x180
          audit_log_exit+0x344/0xf80
          __audit_syscall_exit+0x2c0/0x320
          do_syscall_trace_leave+0x148/0x200
          syscall_exit_prepare+0x324/0x390
          system_call_common+0xfc/0x27c
      
      The above trace shows that while the CPU was handling a performance
      monitor exception, there was a call to security_perf_event_open()
      function. In powerpc core-book3s, this function is called from
      perf_allow_kernel() check during recording of data address in the
      sample via perf_get_data_addr().
      
      Commit da97e184 ("perf_event: Add support for LSM and SELinux
      checks") introduced security enhancements to perf. As part of this
      commit, the new security hook for perf_event_open() was added in all
      places where perf paranoid check was previously used. The powerpc
      core-book3s code originally had paranoid checks in
      perf_get_data_addr() and power_pmu_bhrb_read(), so the
      perf_paranoid_kernel() checks were replaced with perf_allow_kernel()
      in these PMU helper functions as well.
      
      The intention of paranoid checks in core-book3s was to verify
      privilege access before capturing some of the sample data. Along with
      paranoid checks, perf_allow_kernel() also does a
      security_perf_event_open(). Since these functions are accessed while
      recording a sample, we end up calling selinux_perf_event_open() in PMI
      context. Some of the security functions, such as
      sidtab_sid2str_put(), take a spinlock. If a perf interrupt hits while
      such a spinlock is held and the PMI handler ends up calling selinux
      hook functions, this could cause a deadlock.
      
      Since the purpose of this security hook is to control access to
      perf_event_open(), it is not right to call this in interrupt context.
      
      The paranoid checks in powerpc core-book3s were done at interrupt time
      which is also not correct.
      
      Reference commits:
        Commit cd1231d7 ("powerpc/perf: Prevent kernel address leak via perf_get_data_addr()")
        Commit bb19af81 ("powerpc/perf: Prevent kernel address leak to userspace via BHRB buffer")
      
      We only allow creation of events that have already passed the
      privilege checks in perf_event_open(). So these paranoid checks are
      not needed at event time. As a fix, the patch uses the
      'event->attr.exclude_kernel' check to avoid exposing kernel addresses
      for userspace-only sampling.
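
      The replacement check can be modelled roughly as follows. This is a
      small Python sketch, not the kernel code: `sample_data_addr`,
      `is_kernel_addr`, and `KERNEL_START` are illustrative names, the
      kernel address boundary is an assumption based on the addresses seen
      in the trace above, and reporting 0 for a hidden address is also an
      assumption.

```python
# Assumption: 64-bit powerpc kernel addresses start at this boundary,
# matching the c000... addresses in the trace above.
KERNEL_START = 0xC000000000000000


def is_kernel_addr(addr: int) -> bool:
    """Return True if addr falls in the assumed kernel address range."""
    return addr >= KERNEL_START


def sample_data_addr(raw_addr: int, exclude_kernel: bool) -> int:
    """Decide which data address a sample may report.

    No security hook is called here: the decision only depends on the
    exclude_kernel attribute fixed when the event was created.
    """
    if exclude_kernel and is_kernel_addr(raw_addr):
        return 0  # hide the kernel address from userspace-only sampling
    return raw_addr


hidden = sample_data_addr(0xC000000000DFF3D4, exclude_kernel=True)
shown = sample_data_addr(0xC000000000DFF3D4, exclude_kernel=False)
user = sample_data_addr(0x00007FFF12345678, exclude_kernel=True)
```

      The point of the design is that the check is a plain attribute
      comparison, so it is safe to evaluate in interrupt context.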
      
      Fixes: cd1231d7 ("powerpc/perf: Prevent kernel address leak via perf_get_data_addr()")
      Cc: stable@vger.kernel.org # v4.17+
      Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/1614247839-1428-1-git-send-email-atrajeev@linux.vnet.ibm.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      b8fb12dc
    • D
      uapi: nfnetlink_cthelper.h: fix userspace compilation error · a577087a
      Dmitry V. Levin committed
      stable inclusion
      from stable-5.10.24
      commit 7732f57f0f523509b0b405ad2a0271f4016a4b45
      bugzilla: 51348
      
      --------------------------------
      
      commit c33cb002 upstream.
      
      Apparently, <linux/netfilter/nfnetlink_cthelper.h> and
      <linux/netfilter/nfnetlink_acct.h> could not be included into the same
      compilation unit because of a cut-and-paste typo in the former header.
      
      Fixes: 12f7a505 ("netfilter: add user-space connection tracking helper infrastructure")
      Cc: <stable@vger.kernel.org> # v3.6
      Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      a577087a
    • Z
      arm64/mpam: fix a memleak in add_schema · 2571b979
      Zhang Ming committed
      openEuler inclusion
      category: bugfix
      bugzilla: 48265
      CVE: NA
      Reference: https://gitee.com/openeuler/kernel/issues/I3BPPX
      
      ---------------------------------------------------
      
      The default branch in switch will not run at present,
      but there may be related extensions in the future,
      which may lead to memory leakage.
      
      Signed-off-by: Zhang Ming <154842638@qq.com>
      Reported-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
      Suggested-by: Jian Cheng <cj.chengjian@huawei.com>
      Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
      [Zheng Zengkai: adjust commit message]
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      2571b979
    • W
      cacheinfo: workaround cacheinfo's info_list uninitialized error · daf583dc
      Wang ShaoBo committed
      hulk inclusion
      category: bugfix
      bugzilla: 48265
      CVE: NA
      
      --------------------------------
      
      Work around cacheinfo's uninitialized info_list error in some special
      cases, such as when free_cache_attributes() frees info_list but does
      not set num_leaves to zero when PPTT is not supported. This workaround
      stays until the upstream issue is resolved.
      
      Fixes: 950e5edb ("drivers: base: cacheinfo: Add helper to search cacheinfo by of_node")
      Fixes: 709c4362 ("cacheinfo: Move resctrl's get_cache_id() to the cacheinfo header file")
      Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
      Reviewed-by: Jian Cheng <cj.chengjian@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      daf583dc
    • W
      openeuler_defconfig: Enable MPAM by default · 4fd69d66
      Wang ShaoBo committed
      hulk inclusion
      category: feature
      feature: ARM MPAM support
      bugzilla: 48265
      CVE: NA
      
      --------------------------------
      
      Enable MPAM by default.

      Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
      Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      4fd69d66
    • W
      arm64/mpam: Sort domains when cpu online · fa837999
      Wang ShaoBo committed
      hulk inclusion
      category: bugfix
      bugzilla: 48265
      CVE: NA
      
      --------------------------------
      
      When a CPU comes online, the domains inserted into the resctrl_resource
      structure's domains list may be out of order, so sort them by domain id.
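
      The ordering requirement can be sketched as an insertion that keeps the
      list sorted by domain id. This is a minimal Python model, not the kernel
      code: `Domain`, `insert_sorted`, and the list layout are hypothetical
      stand-ins for the domains list in the resctrl_resource structure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Domain:
    """Hypothetical stand-in for one resctrl domain."""
    id: int


def insert_sorted(domains: List[Domain], new: Domain) -> None:
    """Insert `new` in front of the first domain with a larger id."""
    for pos, dom in enumerate(domains):
        if dom.id > new.id:
            domains.insert(pos, new)
            return
    domains.append(new)  # largest id so far, so it goes to the tail


# CPUs can come online in any order, so domains may arrive unordered.
domains: List[Domain] = []
for dom_id in (2, 0, 3, 1):
    insert_sorted(domains, Domain(dom_id))

ordered_ids = [d.id for d in domains]
```

      Inserting at the right position on each arrival keeps the list ordered
      at all times, so readers never observe an unsorted list.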
      
      Fixes: 2e2c511ff49d ("arm64/mpam: resctrl: Handle cpuhp and resctrl_dom allocation")
      Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
      Reviewed-by: Jian Cheng <cj.chengjian@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      fa837999
    • W
      arm64/mpam: resctrl: Refresh cpu mask for handling cpuhp · 867ae5b2
      Wang ShaoBo committed
      hulk inclusion
      category: bugfix
      bugzilla: 48265
      CVE: NA
      
      --------------------------------
      
      This fixes two problems:
      
      1) When a CPU goes offline, we should clear it from the cpu mask of all
         associated resctrl groups, not only the default group.

      2) When a CPU comes online, we should set it in the default group's cpu
         mask and, if cdp is on, update the default group's CPUs to the
         default state. This fills the code and data fields of the mpam
         sysregs with appropriate values.
      
      Fixes: 2e2c511ff49d ("arm64/mpam: resctrl: Handle cpuhp and resctrl_dom allocation")
      Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
      Reviewed-by: Jian Cheng <cj.chengjian@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      867ae5b2
    • W
      arm64/mpam: resctrl: Allow setting register MPAMCFG_MBW_MIN to 0 · 649e23fb
      Wang ShaoBo committed
      hulk inclusion
      category: bugfix
      bugzilla: 48265
      CVE: NA
      
      --------------------------------
      
      Unlike mbw max (Memory Bandwidth Maximum), we sometimes do not want to use
      the mbw min feature (which restricts memory bandwidth partitioning through
      MPAMCFG_MBW_MIN, shown as the MBMIN row in schemata), so allow
      MPAMCFG_MBW_MIN to be set to 0.
      
      e.g.
          > mount -t resctrl resctrl /sys/fs/resctrl/ -o mbMin
          > cd resctrl/ && cat schemata
            L3:0=7fff;1=7fff;2=7fff;3=7fff
            MBMIN:0=0;1=0;2=0;3=0
      
          # before revision
          > echo 'MBMIN:0=0;1=0;2=0;3=0' > schemata
          > cat schemata
            L3:0=7fff;1=7fff;2=7fff;3=7fff
            MBMIN:0=2;1=2;2=2;3=2
      
          # after revision
          > echo 'MBMIN:0=0;1=0;2=0;3=0' > schemata
          > cat schemata
            L3:0=7fff;1=7fff;2=7fff;3=7fff
            MBMIN:0=0;1=0;2=0;3=0
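
      The schemata rows shown above follow a simple `NAME:domain=value;...`
      text layout. The sketch below only parses and re-formats that text as
      it appears in this example; it does not touch /sys/fs/resctrl, and the
      function names are made up for illustration.

```python
from typing import Dict, Tuple


def parse_schemata_line(line: str) -> Tuple[str, Dict[int, str]]:
    """Split 'NAME:dom=val;dom=val' into the row name and a per-domain map."""
    name, _, body = line.strip().partition(":")
    values: Dict[int, str] = {}
    for field in body.split(";"):
        dom, _, val = field.partition("=")
        values[int(dom)] = val  # keep values as text: rows mix hex and decimal
    return name, values


def format_schemata_line(name: str, values: Dict[int, str]) -> str:
    """Inverse of parse_schemata_line, with domains in ascending order."""
    return name + ":" + ";".join(f"{d}={v}" for d, v in sorted(values.items()))


name, values = parse_schemata_line("MBMIN:0=0;1=0;2=0;3=0")
roundtrip = format_schemata_line(name, values)
```

      A helper like this makes the before/after comparison easy to script:
      write a row, read it back, and compare the per-domain values.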
      
      Fixes: 5a49c4f1983d ("arm64/mpam: Supplement additional useful ctrl features for mount options")
      Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
      Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      649e23fb
    • W
      arm64/mpam: resctrl: Use resctrl_group_init_alloc() for default group · 96a27f9d
      Wang ShaoBo committed
      hulk inclusion
      category: bugfix
      bugzilla: 48265
      CVE: NA
      
      --------------------------------
      
      When we support configuring different control types for a resource, a
      stale historical value is written back to the default group after
      remounting.
      
      e.g.
          > mount -t resctrl resctrl /sys/fs/resctrl/ -o mbMax,mbMin && cd resctrl/
          > echo 'MBMIN:0=2;1=2;2=2;3=2' > schemata
          > cat schemata
            L3:0=7fff;1=7fff;2=7fff;3=7fff
            MBMAX:0=100;1=100;2=100;3=100
            MBMIN:0=2;1=2;2=2;3=2
          > cd .. && umount /sys/fs/resctrl/
          > mount -t resctrl resctrl /sys/fs/resctrl/ -o mbMax,mbMin && cd resctrl/ && cat schemata
            L3:0=7fff;1=7fff;2=7fff;3=7fff
            MBMAX:0=100;1=100;2=100;3=100
            MBMIN:0=0;1=0;2=0;3=0
          > echo 'MBMAX:0=10;1=10;2=10;3=10' > schemata
          > cat schemata
            L3:0=7fff;1=7fff;2=7fff;3=7fff
            MBMAX:0=10;1=10;2=10;3=10
            MBMIN:0=2;1=2;2=2;3=2  # stale historical value written back
      
      When the schemata file is written, the call path looks like this:
      
      resctrl_group_schemata_write()
        -=> resctrl_update_groups_config()
               -=> resctrl_group_update_domains()
                     -=> resctrl_group_update_domain_ctrls()
                      { .../*refresh new_ctrl array of supported conf type once for each resource*/ }
      
      We should refresh new_ctrl field in struct resctrl_staged_config by
      resctrl_group_init_alloc() before calling resctrl_group_update_domain_ctrls().
      
      Fixes: 6b2471f089be ("arm64/mpam: resctrl: Support priority and hardlimit(Memory bandwidth) configuration")
      Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
      Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      96a27f9d
    • W
      arm64/mpam: resctrl: Add proper error handling to resctrl_mount() · 10e4e43b
      Wang ShaoBo committed
      hulk inclusion
      category: bugfix
      bugzilla: 48265
      CVE: NA
      
      --------------------------------
      
      This function is called only when we mount resctrl sysfs. For error
      handling, we need to destroy the schemata list if any of the following
      steps fails after the list has been created.
      
      Fixes: 7e9b5caeefff ("arm64/mpam: resctrl: Add helpers for init and destroy schemata list")
      Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
      Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      10e4e43b
    • W
      arm64/mpam: Use fs_context to parse mount options · 100e2317
      Wang ShaoBo committed
      hulk inclusion
      category: bugfix
      bugzilla: 48265
      CVE: NA
      
      --------------------------------
      
      Use fs_context to parse mount options. The old parsing path through
      parse_rdtgroupfs_options() becomes obsolete and is removed.

      Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
      Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      100e2317
    • W
      arm64/mpam: Supplement additional useful ctrl features for mount options · 228fa64a
      Wang ShaoBo committed
      hulk inclusion
      category: feature
      feature: ARM MPAM support
      bugzilla: 48265
      CVE: NA
      
      --------------------------------
      
      Based on 61fa56e1dd8a ("arm64/mpam: Add resctrl_ctrl_feature structure to manage
      ctrl features"), we add several ctrl features and supply the corresponding
      mount options: mbPbm, mbMax, mbMin, mbPrio, caMax, caPrio, and caPbm.
      If the MPAM system supports the relevant features, we can mount resctrl like this:
      
      e.g.
         > mount -t resctrl resctrl /sys/fs/resctrl -o mbMax,mbMin,caPrio
         > cd /sys/fs/resctrl && cat schemata
           L3:0=0x7fff;1=0x7fff;2=0x7fff;3=0x7fff #default select cpbm as basic ctrl feature
           L3PRI:0=3;1=3;2=3;3=3
           MBMAX:0=100;1=100;2=100;3=100
           MBMIN:0=0;1=0;2=0;3=0
      
         > mount -t resctrl resctrl /sys/fs/resctrl
         > cd /sys/fs/resctrl && cat schemata
           L3:0=0x7fff;1=0x7fff;2=0x7fff;3=0x7fff #default select cpbm as basic ctrl feature
           MB:0=100;1=100;2=100;3=100  #default select mbw max as basic ctrl feature
      
         > mount -t resctrl resctrl /sys/fs/resctrl -o caMax
         > cd /sys/fs/resctrl && cat schemata
           L3:0=33554432;1=33554432;2=33554432;3=33554432 #use cmax ctrl feature
           MB:0=100;1=100;2=100;3=100  #default select mbw max as basic ctrl feature
      
      For Cache MSCs, the basic ctrl features are cmax (Cache Maximum Capacity)
      and cpbm (Cache Portion Bitmap) partitioning. If no mount option is
      specified, cpbm is selected by default.

      For Memory MSCs, the basic ctrl features are max (Memory Bandwidth Maximum)
      and pbm (Memory Bandwidth Portion Bitmap) partitioning. If no mount option
      is specified, max is selected by default.

      The mount options above can also be combined with the cdp options.
      
      e.g.
         > mount -t resctrl resctrl /sys/fs/resctrl -o caMax,caPrio,cdpl3
         > cd /sys/fs/resctrl && cat schemata
           L3CODE:0=33554432;1=33554432;2=33554432;3=33554432 #code use cmax ctrl feature
           L3DATA:0=33554432;1=33554432;2=33554432;3=33554432 #data use cmax ctrl feature
           L3CODEPRI:0=3;1=3;2=3;3=3 #code use intpriority ctrl feature
           L3DATAPRI:0=3;1=3;2=3;3=3 #data use intpriority ctrl feature
           MB:0=100;1=100;2=100;3=100  #default select mbw max as basic ctrl feature
      
      Combining these mount options lets us make fuller use of MPAM.

      Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
      Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      228fa64a
    • W
      arm64/mpam: Set per-cpu's closid to none zero for cdp · cae569b3
      Wang ShaoBo committed
      hulk inclusion
      category: feature
      feature: ARM MPAM support
      bugzilla: 48265
      CVE: NA
      
      --------------------------------
      
      Sometimes monitoring will have such anomalies:
      
      e.g.
          > cd /sys/fs/resctrl/ && grep . mon_data/*
            mon_data/mon_L3CODE_00:14336
            mon_data/mon_L3CODE_01:344064
            mon_data/mon_L3CODE_02:2048
            mon_data/mon_L3CODE_03:27648
            mon_data/mon_L3DATA_00:0  # L3DATA's monitoring data is always 0
            mon_data/mon_L3DATA_01:0
            mon_data/mon_L3DATA_02:0
            mon_data/mon_L3DATA_03:0
            mon_data/mon_MB_00:392
            mon_data/mon_MB_01:552
            mon_data/mon_MB_02:160
            mon_data/mon_MB_03:0
      
      If cdp is on, tasks in the resctrl default group with closid=0 and rmid=0
      do not get the proper partid_i/pmg_i and partid_d/pmg_d written into the
      MPAMx_ELx sysregs by mpam_sched_in(), which is called from __switch_to().
      This is because the current CPU's default closid and rmid are also 0, so
      the operation that updates the configuration is skipped.

      Set each CPU's default closid to a non-zero value and call
      update_closid_rmid() to program partid and pmg into each CPU's MPAMx_ELx
      sysregs when resctrl sysfs is mounted. This looks like a practical method.

      Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
      Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      cae569b3
    • W
      arm64/mpam: Simplify mpamid cdp mapping process · 092d98c3
      Wang ShaoBo committed
      hulk inclusion
      category: feature
      feature: ARM MPAM support
      bugzilla: 48265
      CVE: NA
      
      --------------------------------
      
      MPAM uses partid, pmg, and monitor ids, which we collectively call
      mpam ids. If cdp is on, we allocate a new id, mpamid_new, equal to
      mpamid + 1. In some places the mpamid does not need to be wrapped in
      struct { u16 val; }, so for simplicity we use a macro,
      resctrl_cdp_mpamid_map_val(), to perform this cdp mapping.
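
      The mapping described above can be modelled in a few lines. This is a
      hedged Python sketch rather than the macro itself: `ConfType`,
      `cdp_mpamid_map_val`, and the `cdp_on` parameter are hypothetical, and
      the choice that the data side takes mpamid + 1 while the code side keeps
      the base id is an assumption, since the text only states that the new id
      equals mpamid + 1.

```python
from enum import Enum


class ConfType(Enum):
    """Hypothetical configuration types; the names are assumptions."""
    CDP_BOTH = 0  # cdp off: code and data share one id
    CDP_CODE = 1
    CDP_DATA = 2


def cdp_mpamid_map_val(mpamid: int, conf_type: ConfType, cdp_on: bool) -> int:
    """Return the id to use for the given configuration type.

    With cdp on, the data side gets the paired id mpamid + 1 (assumed);
    otherwise the id is returned unchanged as a plain integer, with no
    wrapper structure around it.
    """
    if cdp_on and conf_type is ConfType.CDP_DATA:
        return mpamid + 1
    return mpamid


code_id = cdp_mpamid_map_val(4, ConfType.CDP_CODE, cdp_on=True)
data_id = cdp_mpamid_map_val(4, ConfType.CDP_DATA, cdp_on=True)
plain_id = cdp_mpamid_map_val(4, ConfType.CDP_DATA, cdp_on=False)
```

      Working on plain integers is what removes the need for the wrapper
      struct at the call sites.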

      Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
      Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      092d98c3
    • W
      arm64/mpam: Filter schema control type with ctrl features · b3a23e33
      Wang ShaoBo committed
      hulk inclusion
      category: feature
      feature: ARM MPAM support
      bugzilla: 48265
      CVE: NA
      
      --------------------------------
      
      The ctrl_features array, introduced by 61fa56e1dd8a ("arm64/mpam: Add
      resctrl_ctrl_feature structure to manage ctrl features"), lives in the
      raw_resctrl_resource structure and lists every ctrl feature type
      supported for that resource. It is used to filter out illegal parameters
      passed in through mount options, and it provides add_schema() with the
      information needed to register a new control type node in the schema list.

      This makes it easier to add new ctrl features later.

      Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
      Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      b3a23e33
    • W
      arm64/mpam: resctrl: Add rmid file in resctrl sysfs · 2ae8305b
      Wang ShaoBo committed
      hulk inclusion
      category: feature
      feature: ARM MPAM support
      bugzilla: 48265
      CVE: NA
      
      --------------------------------
      
      rmid marks each resctrl group for monitoring and also follows the
      corresponding resctrl group's configuration. We export an rmid file in
      resctrl sysfs so that it can be used elsewhere, for example for SMMU
      I/O: a user can read the rmid from a resctrl group and, if SMMU MPAM is
      implemented, assign it to a target I/O device through the SMMU driver,
      so that the related I/O devices can be monitored or follow the intended
      resource usage configuration.

      Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
      Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      2ae8305b
    • W
      arm64/mpam: Split header files into suitable location · 0c564931
      Wang ShaoBo committed
      hulk inclusion
      category: feature
      feature: ARM MPAM support
      bugzilla: 48265
      CVE: NA
      
      --------------------------------
      
      So far, the declarations shared by resctrlfs.c and the mpam core
      module files under the kernel/mpam directory are scattered across
      mpam.h and resctrl.h. The current layout is:
      
      -- asm/
         +-- resctrl.h        +
         +-- mpam.h           |    +
         +-- mpam_resource.h  |    |    +
                              |    |    |
      -- fs/                  |    |    +-> mpam/
         +-- resctrlfs.c <----+----+------> +-- mpam_resctrl.c ...
      
      Move the declarations shared by resctrlfs.c and mpam/ into resctrl.h,
      split the remaining declarations into mpam_internal.h, and move
      mpam_resource.h into the mpam/ directory. The new layout is:
      
      -- asm/
         +-- mpam.h           +----> export to other modules(e.g. SMMU master io)
         +-- resctrl.h        +
                              |
      -- mpam/                |
         +-- mpam_internal.h  |    +
         +-- mpam_resource.h  |    |    +
                              |    |    |
      -- fs/                  |    +----+-> mpam/
         +-- resctrlfs.c <----+-----------> +-- mpam_resctrl.c ...
      
      This gives us a clearer framework for MPAM usage.
      Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
      Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      0c564931
    • W
      arm64/mpam: resctrl: Export resource's properties to info directory · 9d39dad1
      Wang ShaoBo committed
      hulk inclusion
      category: feature
      feature: ARM MPAM support
      bugzilla: 48265
      CVE: NA
      
      --------------------------------
      
      Some resource properties, such as closid and rmid, are exported in the
      same way as in Intel-RDT in our resctrl design, but there are two main
      differences. First, MB (Memory Bandwidth) is split into two
      directories, MB and MB_MON, which show the control and monitor
      properties respectively, in the same way as LxCache. Second, we add a
      features file under each resource's directory, which indicates the
      properties of the control types of that resource, for instance the MB
      hardlimit.
      
      e.g.
          > mount -t resctrl resctrl /sys/fs/resctrl -o mbHdl
          > cd /sys/fs/resctrl/ && cat info/MB/features
            mbHdl@1  # indicates that the upper bound of the MBHDL setting is 1
          > cat schemata
            L3:0=7fff;1=7fff;2=7fff;3=7fff
            MB:0=100;1=100;2=100;3=100
            MBHDL:0=1;1=1;2=1;3=1
      Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
      Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      9d39dad1
    • W
      arm64/mpam: Add resctrl_ctrl_feature structure to manage ctrl features · cf92ebfa
      Wang ShaoBo committed
      hulk inclusion
      category: feature
      feature: ARM MPAM support
      bugzilla: 48265
      CVE: NA
      
      --------------------------------
      
      The resctrl_ctrl_feature structure, held by each resource, is
      introduced to manage ctrl features and their characteristics, such as
      the max width accepted from outside input and the base used when
      parsing it.
      
      This makes it more practical to declare a new ctrl feature. For
      example, the SCHEMA_PRI feature is associated only with the internal
      priority setting exported by mpam devices: its information is
      collected in mpam_resctrl_resource_init(), and the user then chooses
      through options whether to enable or disable it.
      
      The resctrl_ctrl_feature structure contains a flags field to avoid
      duplicated control types. For instance, the SCHEMA_COMM feature
      selects cpbm (Cache portion bitmap) as the default control type of the
      Cache resource, so this feature should not be enabled again if the
      user manually selects the cpbm control type through mount options.
      
      The evt field in the resctrl_ctrl_feature structure is an enum
      rdt_event_id variable, which works as eee4ad2a36e6 ("arm64/mpam: Add
      hook-events id for ctrl features") illustrates.
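      
      A minimal sketch of how such a per-feature descriptor could be used,
      assuming a simplified layout: the struct ctrl_feature_sketch type, its
      field names, the FEAT_ENABLED flag and both helpers are hypothetical
      and do not reproduce the real resctrl_ctrl_feature definition. It only
      shows a value being parsed with a per-feature base and checked against
      a per-feature upper bound, and a flag preventing double enabling.

```c
#include <errno.h>
#include <stdlib.h>

#define FEAT_ENABLED 0x1u   /* hypothetical flag: feature already selected */

/* Hypothetical, simplified descriptor for one ctrl feature. */
struct ctrl_feature_sketch {
    const char *name;
    unsigned long max;      /* largest value accepted from outside input */
    int base;               /* numeric base used when parsing the input */
    unsigned int flags;
};

/* Enable a feature once; a second request is reported as a duplicate. */
int feature_enable(struct ctrl_feature_sketch *f)
{
    if (f->flags & FEAT_ENABLED)
        return -EEXIST;
    f->flags |= FEAT_ENABLED;
    return 0;
}

/* Parse buf in the feature's base and reject out-of-range values. */
int feature_parse(const struct ctrl_feature_sketch *f, const char *buf,
                  unsigned long *out)
{
    char *end;
    unsigned long v;

    errno = 0;
    v = strtoul(buf, &end, f->base);
    if (errno || end == buf || *end != '\0' || v > f->max)
        return -EINVAL;
    *out = v;
    return 0;
}
```

      With a hexadecimal descriptor bounded by 0x7fff and a decimal one
      bounded by 100, this sketch accepts "7fff" and "100" respectively and
      rejects "8000" for the former.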
      Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
      Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      cf92ebfa
    • W
      arm64/mpam: Add wait queue for monitor alloc and free · 7d3cd1a2
      Wang ShaoBo committed
      hulk inclusion
      category: feature
      feature: ARM MPAM support
      bugzilla: 48265
      CVE: NA
      
      --------------------------------
      
      For MPAM, an rmid can only do monitoring work once a monitor resource
      has been allocated to it, so we adopt a mechanism for dynamic
      allocation and recycling of monitor resources. This differs from
      Intel-RDT, which creates a kworker thread that dynamically monitors
      Cache usage and checks whether it has dropped below an adjustable
      threshold before an rmid is freed. We have found that this method
      affects cpu utilization in many cases, sometimes to an unacceptable
      degree.
      
      Our method is simple. Because the number of monitors varies between
      resources, we provide two lists: one stores the rmids that hold
      exclusive monitor resources, and the other stores the rmids that share
      a monitor resource, whose monitor id is always 0. It works like this:
      if a new rmid applies for a resource monitor that is in use, the rmid
      is put at the tail of the latter list and temporarily given the
      default monitor id 0 until someone releases an available monitor
      resource. If the new rmid obtains the monitor resources it needs for
      all resources, it is put into the exclusive list.
      
      This implements LRU allocation of monitor resources and gives users
      partial control over allocation and release. If the number of resctrl
      groups can be guaranteed, or the user does not need to monitor too
      many groups at the same time, this is a more appropriate approach for
      user deployment. It also avoids the risk of inaccurate monitoring when
      too many groups are monitored at the same time.
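      
      The two-list idea can be sketched in C roughly as follows. This is a
      simplified, single-resource illustration and not the code from the
      patch: struct mon_pool, NR_MON, MAX_RMID and the three helpers are
      hypothetical names, the exclusive list is modelled as an owner table,
      the shared list as a FIFO array, and the wait queue itself is left out.

```c
#include <stddef.h>

#define NR_MON   4    /* hypothetical monitor count; id 0 is the shared one */
#define MAX_RMID 16   /* hypothetical capacity of the shared list */

struct mon_pool {
    int owner[NR_MON];      /* owner[m]: rmid holding monitor m, -1 if free */
    int waiting[MAX_RMID];  /* rmids sharing monitor 0, oldest first */
    size_t nr_waiting;
};

void mon_pool_init(struct mon_pool *p)
{
    for (int m = 0; m < NR_MON; m++)
        p->owner[m] = -1;
    p->nr_waiting = 0;
}

/*
 * Give rmid an exclusive monitor if one is free. Otherwise append it to
 * the tail of the shared list and return the default monitor id 0.
 * Returns -1 when the shared list is full.
 */
int mon_alloc(struct mon_pool *p, int rmid)
{
    for (int m = 1; m < NR_MON; m++) {
        if (p->owner[m] < 0) {
            p->owner[m] = rmid;
            return m;
        }
    }
    if (p->nr_waiting >= MAX_RMID)
        return -1;
    p->waiting[p->nr_waiting++] = rmid;
    return 0;
}

/*
 * Release an exclusive monitor and hand it to the oldest waiting rmid.
 * Returns the promoted rmid, or -1 if nobody was waiting or mon is invalid.
 */
int mon_free(struct mon_pool *p, int mon)
{
    if (mon <= 0 || mon >= NR_MON || p->owner[mon] < 0)
        return -1;
    p->owner[mon] = -1;
    if (p->nr_waiting == 0)
        return -1;

    int rmid = p->waiting[0];
    for (size_t i = 1; i < p->nr_waiting; i++)
        p->waiting[i - 1] = p->waiting[i];
    p->nr_waiting--;
    p->owner[mon] = rmid;
    return rmid;
}
```

      In this sketch the rmid that has waited longest is promoted first when
      a monitor is released, and no background thread is needed to reclaim
      monitors.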
      Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
      Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      7d3cd1a2
    • W
      arm64/mpam: Remap reqpartid,pmg to rmid and intpartid to closid · 0b16164d
      Wang ShaoBo committed
      hulk inclusion
      category: feature
      feature: ARM MPAM support
      bugzilla: 48265
      CVE: NA
      
      --------------------------------
      
      So far we have used sd_closid, consisting of {reqpartid, intpartid},
      to label each resctrl group, both ctrlgroups and mongroups. This
      handles the case where the number of reqpartids exceeds the number of
      intpartids, which always happens when intpartid narrowing is
      supported; otherwise the two numbers are equal. The excess reqpartids
      are therefore used (1) to indicate how configurations are synchronized
      from the configuration indexed by intpartid, and (2) to take part in
      the monitoring role.
      
      But the reqpartid in (2) and the pmg are still scattered, and so far
      there has been no proper way to explain how the two should be used
      together. To ensure that both can be fully utilized, and following
      Intel-RDT's design, which uses an rmid for monitoring, an rmid remap
      matrix is provided for transforming partid and pmg into an rmid. The
      matrix is organized like this:
      
                       [bitmap entry indexed by partid]
                             [col pos is partid]
      
                           [0]  [1]  [2]  [3]  [4]  [5]
         occ->bitmap[:0]    1    0    0    1    1    1
              bitmap[:1]    1    0    0    1    1    1
              bitmap[:2]    1    1    1    1    1    1
              bitmap[:3]    1    1    1    1    1    1
      [row pos-1 is pmg]
      
      Calculate rmid = partid + NR_partid * pmg
      
      occ indicates whether this bitmap has been used by a partid, because
      a given partid must not be paired with a duplicated pmg for
      monitoring. This design saves a lot of space and also reduces the time
      complexity of allocating and freeing an rmid from O(NR_partid) *
      O(NR_pmg) to O(NR_partid) + O(log(NR_pmg)) compared with using a list.
      
      In this way we get a continuous rmid set with upper bound (NR_pmg *
      NR_partid - 1); given an rmid, we can judge whether it is valid by
      checking whether it falls within this range.
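      
      The formula and the range check above can be written out as a small C
      sketch. The struct rmid_space type and the rmid_encode(), rmid_valid()
      and rmid_decode() helpers are hypothetical names chosen for
      illustration; they are not the helpers added by this patch, and the
      bitmap bookkeeping is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of the partid/pmg space of one system. */
struct rmid_space {
    uint32_t nr_partid;
    uint32_t nr_pmg;
};

/* rmid = partid + NR_partid * pmg */
uint32_t rmid_encode(const struct rmid_space *s, uint32_t partid, uint32_t pmg)
{
    return partid + s->nr_partid * pmg;
}

/* Valid rmids form the continuous range [0, NR_pmg * NR_partid - 1]. */
bool rmid_valid(const struct rmid_space *s, uint32_t rmid)
{
    return rmid < s->nr_partid * s->nr_pmg;
}

/* Invert the formula; returns false for an rmid outside the valid range. */
bool rmid_decode(const struct rmid_space *s, uint32_t rmid,
                 uint32_t *partid, uint32_t *pmg)
{
    if (!rmid_valid(s, rmid))
        return false;
    *partid = rmid % s->nr_partid;
    *pmg = rmid / s->nr_partid;
    return true;
}
```

      For example, with 6 partids and 3 pmgs, partid 4 with pmg 2 maps to
      rmid 4 + 6 * 2 = 16, and the largest valid rmid is 17.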
      
      The rmid implies the reqpartid, so we can use the relevant helpers to
      get the reqpartid for sd_closid@reqpartid and accomplish the
      configuration sync. This also makes the closid simpler, since it can
      consist of the intpartid index only, and each resctrl group owns
      consecutive rmids.
      
      This has further benefits. For instance, MPAM also supports SMMU io
      using partid and pmg, so a single helper, mpam_rmid_to_partid_pmg(),
      can be used in the SMMU driver to complete this remap for an rmid
      passed in from user space.
      Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
      Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      0b16164d