Commits · 541e36454da95a48d52eb1874e46812f1c5af10b · openeuler / Kernel

09 Apr, 2021 40 commits

net: phy: fix save wrong speed and duplex problem if autoneg is on · 541e3645

Committed by Guangbin Huang on Mar 27, 2021

stable inclusion
from stable-5.10.24
commit 6aa23829949c2c0912e82866aeab4fd591595235
bugzilla: 51348

--------------------------------

commit d9032dba upstream.

If the PHY uses the generic driver and autoneg is on, entering the command
"ethtool -s eth0 speed 50" does not actually change the PHY speed, but
the command "ethtool eth0" reports a speed of 50Mb/s because phydev->speed
has been set to 50 and is not updated afterwards.

The duplex setting has the same problem.

When autoneg is on, the PHY changes speed and duplex only according to
phydev->advertising, not according to phydev->speed and phydev->duplex.
So in this case, phydev->speed and phydev->duplex do not need to be set
in phy_ethtool_ksettings_set().
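
The rule described above can be illustrated with a small userspace model.
This is a hedged sketch, not the kernel code: struct phy_model,
struct link_request and model_ksettings_set() are hypothetical names made
up for illustration. Only the rule itself (store speed and duplex only
when autoneg is off) comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few PHY fields the message mentions. */
struct phy_model {
    bool autoneg;   /* true when autonegotiation is enabled */
    int speed;      /* cached link speed in Mb/s */
    int duplex;     /* 0 = half, 1 = full */
};

/* Hypothetical stand-in for the settings requested through ethtool. */
struct link_request {
    bool autoneg;
    int speed;
    int duplex;
};

/*
 * Apply a request the way the commit message describes: with autoneg on,
 * the link parameters come from negotiation, so the cached speed and
 * duplex are left alone; they are stored only for a forced link.
 */
void model_ksettings_set(struct phy_model *dev, const struct link_request *req)
{
    assert(dev != NULL && req != NULL);

    dev->autoneg = req->autoneg;
    if (!req->autoneg) {
        dev->speed = req->speed;
        dev->duplex = req->duplex;
    }
}
```

With autoneg on, a request for speed 50 leaves the cached speed untouched
in this model, so a later status query would not report a value the link
never used.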

Fixes: 51e2a384 ("PHY: Avoid unnecessary aneg restarts")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

net: always use icmp{,v6}_ndo_send from ndo_start_xmit · 0165d324

Committed by Jason A. Donenfeld on Mar 27, 2021

stable inclusion
from stable-5.10.24
commit 91796b65563bd3fd0efe4fb56d6ee1c5c6006eb0
bugzilla: 51348

--------------------------------

commit 4372339e upstream.

There were a few remaining tunnel drivers that didn't receive the prior
conversion to icmp{,v6}_ndo_send. Knowing now that this could lead to
memory corruption (see ee576c47 ("net: icmp: pass zeroed opts from
icmp{,v6}_ndo_send before sending") for details), it is even more
imperative to have these all converted. So this commit goes through the
remaining cases that I could find and does a boring translation to the
ndo variety.

The Fixes: line below is the merge that originally added
icmp{,v6}_ndo_send and converted the first batch of icmp{,v6}_send users.
The rationale then for the change applies equally to this patch. It's just
that these drivers were left out of the initial conversion because these
network devices are hiding in net/ rather than in drivers/net/.

Cc: Florian Westphal <fw@strlen.de>
Cc: Willem de Bruijn <willemb@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Fixes: 803381f9 ("Merge branch 'icmp-account-for-NAT-when-sending-icmps-from-ndo-layer'")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

netfilter: x_tables: gpf inside xt_find_revision() · 034017c8

Committed by Vasily Averin on Mar 27, 2021

stable inclusion
from stable-5.10.24
commit 8abbf7e53e179b16dc48c40cecc6c86240ca026c
bugzilla: 51348

--------------------------------

commit 8e24eddd upstream.

Nested target/match_revfn() calls work with xt[NFPROTO_UNSPEC] lists
without taking xt[NFPROTO_UNSPEC].mutex. This can race with module unload
and cause the host to crash:

general protection fault: 0000 [#1]
Modules linked in: ... [last unloaded: xt_cluster]
CPU: 0 PID: 542455 Comm: iptables
RIP: 0010:[<ffffffff8ffbd518>]  [<ffffffff8ffbd518>] strcmp+0x18/0x40
RDX: 0000000000000003 RSI: ffff9a5a5d9abe10 RDI: dead000000000111
R13: ffff9a5a5d9abe10 R14: ffff9a5a5d9abd8c R15: dead000000000100
(VvS: %R15 -- &xt_match,  %RDI -- &xt_match.name,
xt_cluster unregister match in xt[NFPROTO_UNSPEC].match list)
Call Trace:
 [<ffffffff902ccf44>] match_revfn+0x54/0xc0
 [<ffffffff902ccf9f>] match_revfn+0xaf/0xc0
 [<ffffffff902cd01e>] xt_find_revision+0x6e/0xf0
 [<ffffffffc05a5be0>] do_ipt_get_ctl+0x100/0x420 [ip_tables]
 [<ffffffff902cc6bf>] nf_getsockopt+0x4f/0x70
 [<ffffffff902dd99e>] ip_getsockopt+0xde/0x100
 [<ffffffff903039b5>] raw_getsockopt+0x25/0x50
 [<ffffffff9026c5da>] sock_common_getsockopt+0x1a/0x20
 [<ffffffff9026b89d>] SyS_getsockopt+0x7d/0xf0
 [<ffffffff903cbf92>] system_call_fastpath+0x25/0x2a

Fixes: 656caff2 ("netfilter 04/09: x_tables: fix match/target revision lookup")
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

netfilter: nf_nat: undo erroneous tcp edemux lookup · 6c980f9a

Committed by Florian Westphal on Mar 27, 2021

stable inclusion
from stable-5.10.24
commit 42402bd84530d3761b97775c10762fde28d5b2f9
bugzilla: 51348

--------------------------------

commit 03a3ca37 upstream.

Under extremely rare conditions TCP early demux will retrieve the wrong
socket.

1. local machine establishes a connection to a remote server, S, on port
   p.

   This gives:
   laddr:lport -> S:p
   ... both in tcp and conntrack.

2. local machine establishes a connection to host H, on port p2.
   2a. TCP stack chooses the same laddr:lport, so we have
   laddr:lport -> H:p2 from TCP point of view.
   2b. There is a destination NAT rewrite in place, translating
       H:p2 to S:p.  This results in the following conntrack entries:

   I)  laddr:lport -> S:p  (origin)  S:p -> laddr:lport (reply)
   II) laddr:lport -> H:p2 (origin)  S:p -> laddr:lport2 (reply)

   NAT engine has rewritten laddr:lport to laddr:lport2 to map
   the reply packet to the correct origin.

   When server sends SYN/ACK to laddr:lport2, the PREROUTING hook
   will undo the SNAT transformation, rewriting IP header to
   S:p -> laddr:lport

   This causes TCP early demux to associate the skb with the TCP socket
   of the first connection.

   The INPUT hook will then reverse the DNAT transformation, rewriting
   the IP header to H:p2 -> laddr:lport.

Because packet ends up with the wrong socket, the new connection
never completes: originator stays in SYN_SENT and conntrack entry
remains in SYN_RECV until timeout, and responder retransmits SYN/ACK
until it gives up.

To resolve this, orphan the skb after the input rewrite:
because the source IP address changed, the socket must be incorrect.
We can't move the DNAT undo to prerouting due to backwards
compatibility; doing so would make iptables/nftables rules no longer
match the way they did.

After orphan, the packet will be handed to the next protocol layer
(tcp, udp, ...) and that will repeat the socket lookup just as if
early demux was disabled.

Fixes: 41063e9d ("ipv4: Early TCP socket demux.")
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1427
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

tcp: add sanity tests to TCP_QUEUE_SEQ · bc4f1468

Committed by Eric Dumazet on Mar 27, 2021

stable inclusion
from stable-5.10.24
commit 046f3c1c2ff450fb7ae53650e9a95e0074a61f3e
bugzilla: 51348

--------------------------------

commit 8811f4a9 upstream.

Qingyu Li reported a syzkaller bug where the repro
changes RCV SEQ _after_ restoring data in the receive queue.

mprotect(0x4aa000, 12288, PROT_READ)    = 0
mmap(0x1ffff000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x1ffff000
mmap(0x20000000, 16777216, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x20000000
mmap(0x21000000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x21000000
socket(AF_INET6, SOCK_STREAM, IPPROTO_IP) = 3
setsockopt(3, SOL_TCP, TCP_REPAIR, [1], 4) = 0
connect(3, {sa_family=AF_INET6, sin6_port=htons(0), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_scope_id=0}, 28) = 0
setsockopt(3, SOL_TCP, TCP_REPAIR_QUEUE, [1], 4) = 0
sendmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="0x0000000000000003\0\0", iov_len=20}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20
setsockopt(3, SOL_TCP, TCP_REPAIR, [0], 4) = 0
setsockopt(3, SOL_TCP, TCP_QUEUE_SEQ, [128], 4) = 0
recvfrom(3, NULL, 20, 0, NULL, NULL)    = -1 ECONNRESET (Connection reset by peer)

syslog shows:
[  111.205099] TCP recvmsg seq # bug 2: copied 80, seq 0, rcvnxt 80, fl 0
[  111.207894] WARNING: CPU: 1 PID: 356 at net/ipv4/tcp.c:2343 tcp_recvmsg_locked+0x90e/0x29a0

This should not be allowed. TCP_QUEUE_SEQ should only be used
when queues are empty.

This patch fixes this case, and the tx path as well.
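
The "queues must be empty" rule can be sketched as a small userspace
model. This is only an illustration under stated assumptions:
struct model_sock, enum model_queue and model_set_queue_seq() are
hypothetical names, the byte counters stand in for the real queues, and
returning -EPERM on refusal is an assumption of this sketch rather than
something the commit message states.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of which queue a repair-mode socket is editing. */
enum model_queue { MODEL_RECV_QUEUE, MODEL_SEND_QUEUE };

/* Hypothetical, heavily reduced socket state. */
struct model_sock {
    enum model_queue repair_queue;
    size_t recv_queued;   /* bytes already restored into the receive queue */
    size_t send_queued;   /* bytes already restored into the send queue */
    uint32_t rcv_seq;
    uint32_t snd_seq;
};

/*
 * Set the sequence number of the selected queue, but only while that
 * queue is empty. Returning -EPERM is an assumption of this sketch.
 */
int model_set_queue_seq(struct model_sock *sk, uint32_t seq)
{
    assert(sk != NULL);

    if (sk->repair_queue == MODEL_RECV_QUEUE) {
        if (sk->recv_queued != 0)
            return -EPERM;
        sk->rcv_seq = seq;
        return 0;
    }

    if (sk->send_queued != 0)
        return -EPERM;
    sk->snd_seq = seq;
    return 0;
}
```

In terms of the reproducer above, the 20 bytes restored with sendmsg()
correspond to a non-zero recv_queued, so the later sequence change would
be refused in this model.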

Fixes: ee995283 ("tcp: Initial repair mode")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=212005
Reported-by: Qingyu Li <ieatmuttonchuan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

tcp: Fix sign comparison bug in getsockopt(TCP_ZEROCOPY_RECEIVE) · fe30893d

Committed by Arjun Roy on Mar 27, 2021

stable inclusion
from stable-5.10.24
commit e95ebe1ed6abc259b897abc1f92622504750747c
bugzilla: 51348

--------------------------------

commit 2107d45f upstream.

getsockopt(TCP_ZEROCOPY_RECEIVE) has a bug where we read a
user-provided "len" field of type signed int, and then compare the
value to the result of an "offsetofend" operation, which is unsigned.

Negative values provided by the user will be promoted to large
positive numbers; thus checking that len < offsetofend() will return
false when the intention was that it return true.

Note that while len is originally checked for negative values earlier
on in do_tcp_getsockopt(), subsequent calls to get_user() re-read the
value from userspace which may have changed in the meantime.

Therefore, re-add the check for negative values after the call to
get_user in the handler code for TCP_ZEROCOPY_RECEIVE.
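
The conversion pitfall can be reproduced in plain userspace C. This is a
minimal sketch, not the kernel handler: struct zc_model and MODEL_MIN_LEN
are hypothetical stand-ins for the real structure and its offsetofend()
bound, and the re-read from userspace is not modelled. Only the
signed/unsigned comparison is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The sketch relies on int being converted to size_t in comparisons. */
static_assert(sizeof(size_t) >= sizeof(int),
              "sketch assumes int converts to size_t");

/* Hypothetical structure; only its layout matters for the size bound. */
struct zc_model {
    unsigned long address;
    unsigned int length;
    unsigned int recv_skip_hint;
};

/* Hypothetical minimum: bytes up to and including the 'length' field. */
#define MODEL_MIN_LEN \
    (offsetof(struct zc_model, length) + sizeof(unsigned int))

/* Buggy form: 'len' is converted to size_t, so -1 becomes a huge value. */
bool too_short_buggy(int len)
{
    return len < MODEL_MIN_LEN;
}

/* Fixed form: reject negative lengths before the mixed-sign comparison. */
bool too_short_fixed(int len)
{
    return len < 0 || (size_t)len < MODEL_MIN_LEN;
}
```

Here too_short_buggy(-1) evaluates to false because -1 is converted to the
largest size_t value, while too_short_fixed(-1) evaluates to true.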

Fixes: c8856c05 ("tcp-zerocopy: Return inq along with tcp receive zerocopy.")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Arjun Roy <arjunroy@google.com>
Link: https://lore.kernel.org/r/20210225232628.4033281-1-arjunroy.kdev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

can: tcan4x5x: tcan4x5x_init(): fix initialization - clear MRAM before entering Normal Mode · eb2be85c

Committed by Torin Cooper-Bennun on Mar 27, 2021

stable inclusion
from stable-5.10.24
commit 473bce9b9393a3a990ed7c9708af38df553f2712
bugzilla: 51348

--------------------------------

commit 27126252 upstream.

This patch prevents a potentially destructive race condition. The
device is fully operational on the bus after entering Normal Mode, so
zeroing the MRAM after entering this mode may lead to loss of
information, e.g. newly received messages.

This patch fixes the problem by first initializing the MRAM, then
bringing the device into Normal Mode.

Fixes: 5443c226 ("can: tcan4x5x: Add tcan4x5x driver to the kernel")
Link: https://lore.kernel.org/r/20210226163440.313628-1-torin@maxiluxsystems.com
Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Torin Cooper-Bennun <torin@maxiluxsystems.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

can: flexcan: invoke flexcan_chip_freeze() to enter freeze mode · cc83b0f6

Committed by Joakim Zhang on Mar 27, 2021

stable inclusion
from stable-5.10.24
commit c537011c99abc9d1e1e9bc2a3bb32fda1cda4583
bugzilla: 51348

--------------------------------

commit c6382004 upstream.

Invoke flexcan_chip_freeze() to enter freeze mode, since the freeze mode
acknowledge needs to be polled.

Fixes: e955cead ("CAN: Add Flexcan CAN controller driver")
Link: https://lore.kernel.org/r/20210218110037.16591-4-qiangqing.zhang@nxp.com
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

can: flexcan: enable RX FIFO after FRZ/HALT valid · b08e4dd1

Committed by Joakim Zhang on Mar 27, 2021

stable inclusion
from stable-5.10.24
commit e24c53182850abce8c7fe3423f843ccb62581e6f
bugzilla: 51348

--------------------------------

commit ec15e27c upstream.

Enabling the RX FIFO can fail during a system reboot stress test:

[    0.303958] flexcan 5a8d0000.can: 5a8d0000.can supply xceiver not found, using dummy regulator
[    0.304281] flexcan 5a8d0000.can (unnamed net_device) (uninitialized): Could not enable RX FIFO, unsupported core
[    0.314640] flexcan 5a8d0000.can: registering netdev failed
[    0.320728] flexcan 5a8e0000.can: 5a8e0000.can supply xceiver not found, using dummy regulator
[    0.320991] flexcan 5a8e0000.can (unnamed net_device) (uninitialized): Could not enable RX FIFO, unsupported core
[    0.331360] flexcan 5a8e0000.can: registering netdev failed
[    0.337444] flexcan 5a8f0000.can: 5a8f0000.can supply xceiver not found, using dummy regulator
[    0.337716] flexcan 5a8f0000.can (unnamed net_device) (uninitialized): Could not enable RX FIFO, unsupported core
[    0.348117] flexcan 5a8f0000.can: registering netdev failed

The RX FIFO should be enabled after FRZ/HALT are valid, but the current
code enables the RX FIFO and FRZ/HALT at the same time.

Fixes: e955cead ("CAN: Add Flexcan CAN controller driver")
Link: https://lore.kernel.org/r/20210218110037.16591-3-qiangqing.zhang@nxp.com
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

can: flexcan: assert FRZ bit in flexcan_chip_freeze() · 5c89e681

Committed by Joakim Zhang on Mar 27, 2021

stable inclusion
from stable-5.10.24
commit 98b7f969116df96c57e9a8572620d71e92fcb725
bugzilla: 51348

--------------------------------

commit 449052cf upstream.

Asserting the HALT bit enters freeze mode only on the premise that the FRZ
bit is asserted. This patch asserts the FRZ bit in flexcan_chip_freeze(),
although its reset value is 1'b1. This is a preparatory patch; a later
patch will invoke flexcan_chip_freeze() to enter freeze mode, which polls
the freeze mode acknowledge.

Fixes: b1aa1c7a ("can: flexcan: fix transition from and to freeze mode in chip_{,un}freeze")
Link: https://lore.kernel.org/r/20210218110037.16591-2-qiangqing.zhang@nxp.com
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

can: skb: can_skb_set_owner(): fix ref counting if socket was closed before setting skb ownership · 7401aa66

Committed by Oleksij Rempel on Mar 27, 2021

stable inclusion
from stable-5.10.24
commit 4224890edff1b4679dc8ddeaa69b43efce5366ba
bugzilla: 51348

--------------------------------

commit e940e089 upstream.

There are two ref count variables controlling the free()ing of a socket:
- struct sock::sk_refcnt - which is changed by sock_hold()/sock_put()
- struct sock::sk_wmem_alloc - which accounts the memory allocated by
  the skbs in the send path.

In case there are still TX skbs in flight and the socket is closed,
the struct sock::sk_refcnt reaches 0. In the TX-path the CAN stack
clones an "echo" skb, calls sock_hold() on the original socket and
references it. This produces the following back trace:

| WARNING: CPU: 0 PID: 280 at lib/refcount.c:25 refcount_warn_saturate+0x114/0x134
| refcount_t: addition on 0; use-after-free.
| Modules linked in: coda_vpu(E) v4l2_jpeg(E) videobuf2_vmalloc(E) imx_vdoa(E)
| CPU: 0 PID: 280 Comm: test_can.sh Tainted: G            E     5.11.0-04577-gf8ff6603c617 #203
| Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
| Backtrace:
| [<80bafea4>] (dump_backtrace) from [<80bb0280>] (show_stack+0x20/0x24) r7:00000000 r6:600f0113 r5:00000000 r4:81441220
| [<80bb0260>] (show_stack) from [<80bb593c>] (dump_stack+0xa0/0xc8)
| [<80bb589c>] (dump_stack) from [<8012b268>] (__warn+0xd4/0x114) r9:00000019 r8:80f4a8c2 r7:83e4150c r6:00000000 r5:00000009 r4:80528f90
| [<8012b194>] (__warn) from [<80bb09c4>] (warn_slowpath_fmt+0x88/0xc8) r9:83f26400 r8:80f4a8d1 r7:00000009 r6:80528f90 r5:00000019 r4:80f4a8c2
| [<80bb0940>] (warn_slowpath_fmt) from [<80528f90>] (refcount_warn_saturate+0x114/0x134) r8:00000000 r7:00000000 r6:82b44000 r5:834e5600 r4:83f4d540
| [<80528e7c>] (refcount_warn_saturate) from [<8079a4c8>] (__refcount_add.constprop.0+0x4c/0x50)
| [<8079a47c>] (__refcount_add.constprop.0) from [<8079a57c>] (can_put_echo_skb+0xb0/0x13c)
| [<8079a4cc>] (can_put_echo_skb) from [<8079ba98>] (flexcan_start_xmit+0x1c4/0x230) r9:00000010 r8:83f48610 r7:0fdc0000 r6:0c080000 r5:82b44000 r4:834e5600
| [<8079b8d4>] (flexcan_start_xmit) from [<80969078>] (netdev_start_xmit+0x44/0x70) r9:814c0ba0 r8:80c8790c r7:00000000 r6:834e5600 r5:82b44000 r4:82ab1f00
| [<80969034>] (netdev_start_xmit) from [<809725a4>] (dev_hard_start_xmit+0x19c/0x318) r9:814c0ba0 r8:00000000 r7:82ab1f00 r6:82b44000 r5:00000000 r4:834e5600
| [<80972408>] (dev_hard_start_xmit) from [<809c6584>] (sch_direct_xmit+0xcc/0x264) r10:834e5600 r9:00000000 r8:00000000 r7:82b44000 r6:82ab1f00 r5:834e5600 r4:83f27400
| [<809c64b8>] (sch_direct_xmit) from [<809c6c0c>] (__qdisc_run+0x4f0/0x534)

To fix this problem, only set skb ownership to sockets that still have
a ref count > 0.
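
The "take a reference only if the count is still above zero" idea can be
sketched with C11 atomics. This is a userspace illustration, similar in
spirit to the kernel's refcount_inc_not_zero() helper but not the kernel
code: struct model_sock, struct model_skb, model_ref_inc_not_zero() and
model_skb_set_owner() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical socket model: only the reference count is kept. */
struct model_sock {
    atomic_uint refcnt;
};

/* Hypothetical skb model: only the owning socket is kept. */
struct model_skb {
    struct model_sock *owner;
};

/* Take a reference only if the count is still above zero. */
bool model_ref_inc_not_zero(atomic_uint *ref)
{
    unsigned int old = atomic_load(ref);

    while (old != 0) {
        /* On failure 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(ref, &old, old + 1))
            return true;
    }
    return false;
}

/* Attach the skb to the socket only when a reference could be taken. */
void model_skb_set_owner(struct model_skb *skb, struct model_sock *sk)
{
    assert(skb != NULL);

    if (sk != NULL && model_ref_inc_not_zero(&sk->refcnt))
        skb->owner = sk;
    else
        skb->owner = NULL;
}
```

A socket whose count already dropped to zero is never referenced again in
this model, which avoids the "addition on 0" situation reported in the
warning above.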

Fixes: 0ae89beb ("can: add destructor for self generated skbs")
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Andre Naujoks <nautsch2@gmail.com>
Link: https://lore.kernel.org/r/20210226092456.27126-1-o.rempel@pengutronix.de
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

net: l2tp: reduce log level of messages in receive path, add counter instead · 7a2d5a15

Committed by Matthias Schiffer on Mar 27, 2021

stable inclusion
from stable-5.10.24
commit fa5d019c56e78e0b33f585d23149f2553568b998
bugzilla: 51348

--------------------------------

commit 3e59e885 upstream.

Commit 5ee759cd ("l2tp: use standard API for warning log messages")
changed a number of warnings about invalid packets in the receive path
so that they are always shown, instead of only when a special L2TP debug
flag is set. Even with rate limiting these warnings can easily cause
significant log spam - potentially triggered by a malicious party
sending invalid packets on purpose.

In addition these warnings were noticed by projects like Tunneldigger [1],
which uses L2TP for its data path, but implements its own control
protocol (which is sufficiently different from L2TP data packets that it
would always be passed up to userspace even with future extensions of
L2TP).

Some of the warnings were already redundant, as l2tp_stats has a counter
for these packets. This commit adds one additional counter for invalid
packets that are passed up to userspace. Packets with unknown session are
not counted as invalid, as there is nothing wrong with the format of
these packets.

With the additional counter, all of these messages are either redundant
or benign, so we reduce them to pr_debug_ratelimited().

[1] https://github.com/wlanslovenija/tunneldigger/issues/160

Fixes: 5ee759cd ("l2tp: use standard API for warning log messages")
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0 · f908d27a

Committed by Balazs Nemeth on Mar 27, 2021

stable inclusion
from stable-5.10.24
commit 453fff24f52eeb62ab65582848498097273df269
bugzilla: 51348

--------------------------------

commit d348ede3 upstream.

A packet with skb_inner_network_header(skb) == skb_network_header(skb)
and ETH_P_MPLS_UC will prevent mpls_gso_segment from pulling any headers
from the packet. Subsequently, the call to skb_mac_gso_segment will
again call mpls_gso_segment with the same packet leading to an infinite
loop. In addition, ensure that the header length is a multiple of four,
which should hold irrespective of the number of stacked labels.
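
The two sanity conditions described above (non-zero header length, and a
length that is a multiple of four) can be sketched as a standalone check.
This is a hedged illustration, not the kernel function:
model_mpls_hlen_ok() and MODEL_MPLS_ENTRY_LEN are hypothetical names, and
plain byte offsets stand in for skb_network_header() and
skb_inner_network_header().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One MPLS label stack entry is 4 bytes long. */
#define MODEL_MPLS_ENTRY_LEN 4u

/*
 * Decide whether segmentation may proceed. Both arguments are byte
 * offsets of the outer and inner network headers within the packet.
 */
bool model_mpls_hlen_ok(size_t network_off, size_t inner_network_off)
{
    size_t mpls_hlen;

    /* The inner header cannot start before the outer one. */
    assert(inner_network_off >= network_off);

    mpls_hlen = inner_network_off - network_off;

    /* Zero means nothing would be pulled: the same packet comes back. */
    if (mpls_hlen == 0)
        return false;

    /* A label stack is a whole number of 4-byte entries. */
    return mpls_hlen % MODEL_MPLS_ENTRY_LEN == 0;
}
```

Rejecting the zero-length case is what breaks the loop: a packet from
which no header is pulled would otherwise be handed back to the same
handler unchanged.
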
Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

net: check if protocol extracted by virtio_net_hdr_set_proto is correct · 857ee3c4

Committed by Balazs Nemeth on Mar 27, 2021

stable inclusion
from stable-5.10.24
commit faa3baa2828c5e1c4374f3e60041f75c64f5fcb6
bugzilla: 51348

--------------------------------

commit 924a9bc3 upstream.

For gso packets, virtio_net_hdr_set_proto sets the protocol (if it isn't
set) based on the type in the virtio net hdr, but the skb could contain
anything since it could come from packet_snd through a raw socket. If
there is a mismatch between what virtio_net_hdr_set_proto sets and
the actual protocol, then the skb could be handled incorrectly later
on.

An example where this poses an issue is with the subsequent call to
skb_flow_dissect_flow_keys_basic which relies on skb->protocol being set
correctly. A specially crafted packet could fool
skb_flow_dissect_flow_keys_basic, preventing EINVAL from being returned.

Avoid blindly trusting the information provided by the virtio net header
by checking that the protocol in the packet actually matches the
protocol set by virtio_net_hdr_set_proto. Note that since the protocol
is only checked if skb->dev implements header_ops->parse_protocol,
packets from devices without the implementation are not checked at this
stage.
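
The check can be pictured as "compare the claimed protocol with what the
device's parser finds, and skip the comparison when no parser exists".
The following is a minimal userspace sketch under stated assumptions:
model_parse_fn, model_eth_parse() and model_check_proto() are hypothetical
names, protocol values are kept in host byte order for simplicity, and the
example parser only understands untagged Ethernet II frames.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parser type: returns the protocol found, 0 if unknown. */
typedef uint16_t (*model_parse_fn)(const unsigned char *frame, size_t len);

/* Example parser: read the EtherType of an untagged Ethernet II frame. */
uint16_t model_eth_parse(const unsigned char *frame, size_t len)
{
    if (len < 14)
        return 0;
    return (uint16_t)((frame[12] << 8) | frame[13]);
}

/*
 * Compare the protocol claimed by the header with the one found in the
 * frame. Without a parser there is nothing to compare, so the frame is
 * accepted, mirroring the note about devices lacking parse_protocol.
 */
int model_check_proto(uint16_t claimed, model_parse_fn parse,
                      const unsigned char *frame, size_t len)
{
    assert(frame != NULL);

    if (parse == NULL)
        return 0;
    return parse(frame, len) == claimed ? 0 : -EINVAL;
}
```

A frame whose EtherType says IPv4 (0x0800) while the header claims IPv6
(0x86DD) is rejected in this model, but the same mismatch goes unnoticed
when no parser is available.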

Fixes: 9274124f ("net: stricter validation of untrusted gso packets")
Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

net: Fix gro aggregation for udp encaps with zero csum · 8365fd56

Committed by Daniel Borkmann on Mar 27, 2021

stable inclusion
from stable-5.10.24
commit 09af4362ba47c805347840c2bb9719c0458925ca
bugzilla: 51348

--------------------------------

commit 89e5c58f upstream.

We noticed a GRO issue for UDP-based encaps such as vxlan/geneve when the
csum for the UDP header itself is 0. In that case, GRO aggregation does
not take place on the phys dev, but instead is deferred to the vxlan/geneve
driver (see trace below).

The reason is essentially that GRO aggregation bails out in udp_gro_receive()
in that case when drivers marked the skb with CHECKSUM_UNNECESSARY (ice, i40e,
others), whereas for non-zero csums 2abb7cdc ("udp: Add support for doing
checksum unnecessary conversion") promotes those skbs to CHECKSUM_COMPLETE
and napi context has csum_valid set. This is however not the case for zero
UDP csum (here: csum_cnt is still 0 and csum_valid continues to be false).

At the same time 57c67ff4 ("udp: additional GRO support") added matches
on !uh->check ^ !uh2->check as part of determining candidates for aggregation,
so it certainly is expected to handle zero csums in udp_gro_receive(). The
purpose of the check added via 662880f4 ("net: Allow GRO to use and set
levels of checksum unnecessary") seems to be to catch bad csums and stop
aggregation right away.

One way to fix aggregation in the zero case is to only perform the !csum_valid
check in udp_gro_receive() if uh->check is in fact non-zero.
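
The change in the bail-out condition can be reduced to two predicates.
This is a deliberately simplified sketch: struct model_gro_state and both
functions are hypothetical, and the other conditions the real
udp_gro_receive() evaluates are left out. Only the relation between
uh->check and csum_valid described above is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduction of the state the message talks about. */
struct model_gro_state {
    uint16_t udp_check;   /* UDP checksum field, 0 means "no checksum" */
    bool csum_valid;      /* checksum already validated for this skb */
};

/* Before: any skb without a validated checksum stops aggregation. */
bool model_bail_before(const struct model_gro_state *st)
{
    assert(st != NULL);
    return !st->csum_valid;
}

/* After: the validity check only applies when a checksum is present. */
bool model_bail_after(const struct model_gro_state *st)
{
    assert(st != NULL);
    return st->udp_check != 0 && !st->csum_valid;
}
```

For a packet with a zero UDP checksum and csum_valid false, the first
predicate gives true (no aggregation) and the second gives false
(aggregation may proceed), while non-zero checksums are treated the same
by both.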

Before:

  [...]
  swapper     0 [008]   731.946506: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100400 len=1500   (1)
  swapper     0 [008]   731.946507: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100200 len=1500
  swapper     0 [008]   731.946507: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101100 len=1500
  swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101700 len=1500
  swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101b00 len=1500
  swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100600 len=1500
  swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100f00 len=1500
  swapper     0 [008]   731.946509: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100a00 len=1500
  swapper     0 [008]   731.946516: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100500 len=1500
  swapper     0 [008]   731.946516: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100700 len=1500
  swapper     0 [008]   731.946516: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101d00 len=1500   (2)
  swapper     0 [008]   731.946517: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101000 len=1500
  swapper     0 [008]   731.946517: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101c00 len=1500
  swapper     0 [008]   731.946517: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101400 len=1500
  swapper     0 [008]   731.946518: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100e00 len=1500
  swapper     0 [008]   731.946518: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101600 len=1500
  swapper     0 [008]   731.946521: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100800 len=774
  swapper     0 [008]   731.946530: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff966497100400 len=14032 (1)
  swapper     0 [008]   731.946530: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff966497101d00 len=9112  (2)
  [...]

  # netperf -H 10.55.10.4 -t TCP_STREAM -l 20
  MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.55.10.4 () port 0 AF_INET : demo
  Recv   Send    Send
  Socket Socket  Message  Elapsed
  Size   Size    Size     Time     Throughput
  bytes  bytes   bytes    secs.    10^6bits/sec

   87380  16384  16384    20.01    13129.24

After:

  [...]
  swapper     0 [026]   521.862641: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff93ab0d479000 len=11286 (1)
  swapper     0 [026]   521.862643: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d479000 len=11236 (1)
  swapper     0 [026]   521.862650: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff93ab0d478500 len=2898  (2)
  swapper     0 [026]   521.862650: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff93ab0d479f00 len=8490  (3)
  swapper     0 [026]   521.862653: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d478500 len=2848  (2)
  swapper     0 [026]   521.862653: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d479f00 len=8440  (3)
  [...]

  # netperf -H 10.55.10.4 -t TCP_STREAM -l 20
  MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.55.10.4 () port 0 AF_INET : demo
  Recv   Send    Send
  Socket Socket  Message  Elapsed
  Size   Size    Size     Time     Throughput
  bytes  bytes   bytes    secs.    10^6bits/sec

   87380  16384  16384    20.01    24576.53

Fixes: 57c67ff4 ("udp: additional GRO support")
Fixes: 662880f4 ("net: Allow GRO to use and set levels of checksum unnecessary")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tom Herbert <tom@herbertland.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20210226212248.8300-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

ath9k: fix transmitting to stations in dynamic SMPS mode · bd1c87fa

Committed by Felix Fietkau on Mar 27, 2021

stable inclusion
from stable-5.10.24
commit d2fb1911a7a8f655440d613fc8946df384d83ee5
bugzilla: 51348

--------------------------------

commit 3b9ea720 upstream.

When transmitting to a receiver in dynamic SMPS mode, all transmissions that
use multiple spatial streams need to be sent using CTS-to-self or RTS/CTS to
give the receiver's extra chains some time to wake up.
This fixes the tx rate getting stuck at <= MCS7 for some clients, especially
Intel ones, which make aggressive use of SMPS.

Cc: stable@vger.kernel.org
Reported-by: Martin Kennedy <hurricos@gmail.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210214184911.96702-1-nbd@nbd.name
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

crypto: mips/poly1305 - enable for all MIPS processors · ab4d5a39

Committed by Maciej W. Rozycki on Mar 27, 2021

stable inclusion
from stable-5.10.24
commit b0454a28f60878539a55439436ea9ad29728d366
bugzilla: 51348

--------------------------------

commit 6c810cf2 upstream.

The MIPS Poly1305 implementation is generic MIPS code written so as to
support down to the original MIPS I and MIPS III ISA for the 32-bit and
64-bit variant respectively.  Lift the current limitation that enables the
code for MIPSr1 ISA or newer processors only, and make it available for
all MIPS processors.
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Fixes: a11d055e ("crypto: mips/poly1305 - incorporate OpenSSL/CRYPTOGAMS optimized implementation")
Cc: stable@vger.kernel.org # v5.5+
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

ab4d5a39

ethernet: alx: fix order of calls on resume · 09c31584

Committed by Jakub Kicinski on Mar 27, 2021

stable inclusion
from stable-5.10.24
commit a0df424a863aa6a2e8bd57ef5e0928da5d5b797f
bugzilla: 51348

--------------------------------

commit a4dcfbc4 upstream.

netif_device_attach() will unpause the queues so we can't call
it before __alx_open(). This went undetected until
commit b0999223 ("alx: add ability to allocate and free
alx_napi structures") but now if stack tries to xmit immediately
on resume before __alx_open() we'll crash on the NAPI being null:

 BUG: kernel NULL pointer dereference, address: 0000000000000198
 CPU: 0 PID: 12 Comm: ksoftirqd/0 Tainted: G           OE 5.10.0-3-amd64 #1 Debian 5.10.13-1
 Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H77-D3H, BIOS F15 11/14/2013
 RIP: 0010:alx_start_xmit+0x34/0x650 [alx]
 Code: 41 56 41 55 41 54 55 53 48 83 ec 20 0f b7 57 7c 8b 8e b0 0b 00 00 39 ca 72 06 89 d0 31 d2 f7 f1 89 d2 48 8b 84 df
 RSP: 0018:ffffb09240083d28 EFLAGS: 00010297
 RAX: 0000000000000000 RBX: ffffa04d80ae7800 RCX: 0000000000000004
 RDX: 0000000000000000 RSI: ffffa04d80afa000 RDI: ffffa04e92e92a00
 RBP: 0000000000000042 R08: 0000000000000100 R09: ffffa04ea3146700
 R10: 0000000000000014 R11: 0000000000000000 R12: ffffa04e92e92100
 R13: 0000000000000001 R14: ffffa04e92e92a00 R15: ffffa04e92e92a00
 FS:  0000000000000000(0000) GS:ffffa0508f600000(0000) knlGS:0000000000000000
 i915 0000:00:02.0: vblank wait timed out on crtc 0
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000198 CR3: 000000004460a001 CR4: 00000000001706f0
 Call Trace:
  dev_hard_start_xmit+0xc7/0x1e0
  sch_direct_xmit+0x10f/0x310

Cc: <stable@vger.kernel.org> # 4.9+
Fixes: bc2bebe8 ("alx: remove WoL support")
Reported-by: Zbynek Michl <zbynek.michl@gmail.com>
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=983595
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Zbynek Michl <zbynek.michl@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

09c31584

powerpc/pseries: Don't enforce MSI affinity with kdump · 58233a58

Committed by Greg Kurz on Mar 27, 2021

stable inclusion
from stable-5.10.24
commit a9c55f22a0b978d636204509c4edaf511cb20f62
bugzilla: 51348

--------------------------------

commit f9619d5e upstream.

Depending on the number of online CPUs in the original kernel, it is
likely for CPU #0 to be offline in a kdump kernel. The associated IRQs
in the affinity mappings provided by irq_create_affinity_masks() are
thus not started by irq_startup(), as designed for managed IRQs.

This can be a problem with multi-queue block devices driven by blk-mq:
such a non-started IRQ is very likely paired with the single queue
enforced by blk-mq during kdump (see blk_mq_alloc_tag_set()). This
causes the device to remain silent and likely hangs the guest at
some point.

This is a regression caused by commit 9ea69a55 ("powerpc/pseries:
Pass MSI affinity to irq_create_mapping()"). Note that this only happens
with the XIVE interrupt controller because XICS has a workaround to bypass
affinity, which is activated during kdump with the "noirqdistrib" kernel
parameter.

The issue comes from a combination of factors:
- a discrepancy between the number of queues detected by the multi-queue
  block driver, which was used to create the MSI vectors, and the single
  queue mode enforced later on by blk-mq because of kdump (i.e. keeping
  all queues fixes the issue)
- CPU#0 being offline (i.e. kdump always succeeds with CPU#0 online)

Given that I couldn't reproduce on x86, which seems to always have CPU#0
online even during kdump, I'm not sure where this should be fixed. Hence
going for another approach: fine-grained affinity is for performance
and we don't really care about that during kdump. Simply revert to the
previous working behavior of ignoring affinity masks in this case only.

Fixes: 9ea69a55 ("powerpc/pseries: Pass MSI affinity to irq_create_mapping()")
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210215094506.1196119-1-groug@kaod.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

58233a58

powerpc/perf: Fix handling of privilege level checks in perf interrupt context · b8fb12dc

Committed by Athira Rajeev on Mar 27, 2021

stable inclusion
from stable-5.10.24
commit ac022fbee6855dc6304a9e63e481859b2589836d
bugzilla: 51348

--------------------------------

commit 5ae5fbd2 upstream.

Running "perf mem record" on powerpc platforms with selinux enabled
resulted in soft lockups. The call trace below was seen in the logs:

  CPU: 58 PID: 3751 Comm: sssd_nss Not tainted 5.11.0-rc7+ #2
  NIP:  c000000000dff3d4 LR: c000000000dff3d0 CTR: 0000000000000000
  REGS: c000007fffab7d60 TRAP: 0100   Not tainted  (5.11.0-rc7+)
  ...
  NIP _raw_spin_lock_irqsave+0x94/0x120
  LR  _raw_spin_lock_irqsave+0x90/0x120
  Call Trace:
    0xc00000000fd47260 (unreliable)
    skb_queue_tail+0x3c/0x90
    audit_log_end+0x6c/0x180
    common_lsm_audit+0xb0/0xe0
    slow_avc_audit+0xa4/0x110
    avc_has_perm+0x1c4/0x260
    selinux_perf_event_open+0x74/0xd0
    security_perf_event_open+0x68/0xc0
    record_and_restart+0x6e8/0x7f0
    perf_event_interrupt+0x22c/0x560
    performance_monitor_exception+0x4c/0x60
    performance_monitor_common_virt+0x1c8/0x1d0
  interrupt: f00 at _raw_spin_lock_irqsave+0x38/0x120
  NIP:  c000000000dff378 LR: c000000000b5fbbc CTR: c0000000007d47f0
  REGS: c00000000fd47860 TRAP: 0f00   Not tainted  (5.11.0-rc7+)
  ...
  NIP _raw_spin_lock_irqsave+0x38/0x120
  LR  skb_queue_tail+0x3c/0x90
  interrupt: f00
    0x38 (unreliable)
    0xc00000000aae6200
    audit_log_end+0x6c/0x180
    audit_log_exit+0x344/0xf80
    __audit_syscall_exit+0x2c0/0x320
    do_syscall_trace_leave+0x148/0x200
    syscall_exit_prepare+0x324/0x390
    system_call_common+0xfc/0x27c

The above trace shows that while the CPU was handling a performance
monitor exception, there was a call to security_perf_event_open()
function. In powerpc core-book3s, this function is called from
perf_allow_kernel() check during recording of data address in the
sample via perf_get_data_addr().

Commit da97e184 ("perf_event: Add support for LSM and SELinux
checks") introduced security enhancements to perf. As part of this
commit, the new security hook for perf_event_open() was added in all
places where the perf paranoid check was previously used. The powerpc
core-book3s code originally had paranoid checks in
perf_get_data_addr() and power_pmu_bhrb_read(), so the
perf_paranoid_kernel() checks were replaced with perf_allow_kernel()
in these PMU helper functions as well.

The intention of the paranoid checks in core-book3s was to verify
privilege access before capturing some of the sample data. Along with
the paranoid checks, perf_allow_kernel() also calls
security_perf_event_open(). Since these functions are accessed while
recording a sample, we end up calling selinux_perf_event_open() in PMI
context. Some of the security functions, such as sidtab_sid2str_put(),
take a spinlock. If a perf interrupt hits while a spinlock is held and
we end up calling selinux hook functions in the PMI handler, this
could cause a deadlock.

Since the purpose of this security hook is to control access to
perf_event_open(), it is not right to call this in interrupt context.

The paranoid checks in powerpc core-book3s were done at interrupt time
which is also not correct.

Reference commits:
  Commit cd1231d7 ("powerpc/perf: Prevent kernel address leak via perf_get_data_addr()")
  Commit bb19af81 ("powerpc/perf: Prevent kernel address leak to userspace via BHRB buffer")

We only allow creation of events that have already passed the
privilege checks in perf_event_open(). So these paranoid checks are
not needed at event time. As a fix, the patch uses the
'event->attr.exclude_kernel' check to prevent exposing kernel addresses
for userspace-only sampling.
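
The idea of deciding from the event attribute alone, without calling any security hook at sample time, can be modelled in user space as follows. This is only a sketch: the names `is_kernel_addr` and `sampled_data_addr`, the address boundary, and the choice to report a hidden address as 0 are assumptions made for illustration, not the code of the patch.

```python
# Assumed boundary: powerpc64 kernel addresses start at 0xc000000000000000,
# which matches the NIP/LR values in the trace above.
KERNEL_START = 0xC000000000000000


def is_kernel_addr(addr: int) -> bool:
    """Hypothetical stand-in for the kernel's own address check."""
    return addr >= KERNEL_START


def sampled_data_addr(raw_addr: int, exclude_kernel: bool) -> int:
    """Return the address to store in the sample.

    No LSM hook or paranoid check runs here: the decision depends only on
    the attribute that was already validated when the event was created.
    """
    if exclude_kernel and is_kernel_addr(raw_addr):
        return 0  # assumption: a hidden kernel address is reported as 0
    return raw_addr
```

Because the function takes no locks and calls no hooks, it would be safe to evaluate in interrupt context, which is the property the fix relies on.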

Fixes: cd1231d7 ("powerpc/perf: Prevent kernel address leak via perf_get_data_addr()")
Cc: stable@vger.kernel.org # v4.17+
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1614247839-1428-1-git-send-email-atrajeev@linux.vnet.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

b8fb12dc

uapi: nfnetlink_cthelper.h: fix userspace compilation error · a577087a

Committed by Dmitry V. Levin on Mar 27, 2021

stable inclusion
from stable-5.10.24
commit 7732f57f0f523509b0b405ad2a0271f4016a4b45
bugzilla: 51348

--------------------------------

commit c33cb002 upstream.

Apparently, <linux/netfilter/nfnetlink_cthelper.h> and
<linux/netfilter/nfnetlink_acct.h> could not be included into the same
compilation unit because of a cut-and-paste typo in the former header.

Fixes: 12f7a505 ("netfilter: add user-space connection tracking helper infrastructure")
Cc: <stable@vger.kernel.org> # v3.6
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

a577087a

arm64/mpam: fix a memleak in add_schema · 2571b979

Committed by Zhang Ming on Mar 17, 2021

openEuler inclusion
category: bugfix
bugzilla: 48265
CVE: NA
Reference: https://gitee.com/openeuler/kernel/issues/I3BPPX

---------------------------------------------------

The default branch of the switch statement is never taken at present,
but future extensions could reach it, which may lead to a memory leak.

Signed-off-by: Zhang Ming <154842638@qq.com>
Reported-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Suggested-by: Jian Cheng <cj.chengjian@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
[Zheng Zengkai: adjust commit message]
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

2571b979

cacheinfo: workaround cacheinfo's info_list uninitialized error · daf583dc

Committed by Wang ShaoBo on Mar 12, 2021

hulk inclusion
category: bugfix
bugzilla: 48265
CVE: NA

--------------------------------

Work around cacheinfo's info_list uninitialized error in some special
cases, such as when free_cache_attributes() frees info_list but does not
set num_leaves to zero when PPTT is not supported. This workaround stays
until the upstream issue is resolved.

Fixes: 950e5edb ("drivers: base: cacheinfo: Add helper to search cacheinfo by of_node")
Fixes: 709c4362 ("cacheinfo: Move resctrl's get_cache_id() to the cacheinfo header file")
Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Jian Cheng <cj.chengjian@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

daf583dc

openeuler_defconfig: Enable MPAM by default · 4fd69d66

Committed by Wang ShaoBo on Feb 27, 2021

hulk inclusion
category: feature
feature: ARM MPAM support
bugzilla: 48265
CVE: NA

--------------------------------

Enable MPAM by default.

Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

4fd69d66

arm64/mpam: Sort domains when cpu online · fa837999

Committed by Wang ShaoBo on Feb 26, 2021

hulk inclusion
category: bugfix
bugzilla: 48265
CVE: NA

--------------------------------

When a CPU comes online, the domains inserted into the resctrl_resource
structure's domains list may be out of order, so sort them by domain id.
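
One way to keep such a list ordered is to insert each new domain at its sorted position instead of appending it. The sketch below models this in user space; the `Domain` class and `insert_domain_sorted` are hypothetical names, and the kernel's list handling is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    """Hypothetical stand-in for a resctrl domain; only the id matters here."""
    id: int


def insert_domain_sorted(domains: list, new: Domain) -> None:
    """Insert `new` so that `domains` stays ordered by ascending id."""
    pos = 0
    # Walk past every domain with a smaller id, then insert in front of
    # the first domain whose id is not smaller.
    while pos < len(domains) and domains[pos].id < new.id:
        pos += 1
    domains.insert(pos, new)
```

Whatever order the CPUs come online in, and therefore whatever order the domains are discovered in, the list ends up sorted by id.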

Fixes: 2e2c511ff49d ("arm64/mpam: resctrl: Handle cpuhp and resctrl_dom allocation")
Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Jian Cheng <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

fa837999

arm64/mpam: resctrl: Refresh cpu mask for handling cpuhp · 867ae5b2

Committed by Wang ShaoBo on Feb 26, 2021

hulk inclusion
category: bugfix
bugzilla: 48265
CVE: NA

--------------------------------

This fixes two problems:

1) When a CPU goes offline, we should clear it from the CPU mask of every
   associated resctrl group, not only the default group.

2) When a CPU comes online, we should set it in the default group's CPU
   mask and, if cdp is on, reset the default group's CPUs to the default
   state. This operation fills the code and data fields of the mpam
   sysregs with appropriate values.
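
The two rules can be modelled with plain sets standing in for the per-group CPU masks. This is a user-space sketch only: the `groups` dictionary and both function names are assumptions, and the sysreg update in point 2 is not modelled.

```python
def cpu_offline(groups: dict, cpu: int) -> None:
    """Rule 1: drop the CPU from every group's mask, not just the default."""
    for cpus in groups.values():
        cpus.discard(cpu)  # discard() is a no-op if the CPU is not in the mask


def cpu_online(groups: dict, cpu: int, default_group: str = "default") -> None:
    """Rule 2: a CPU that comes online always starts in the default group."""
    groups[default_group].add(cpu)
```

For example, a CPU that belonged to a non-default group before going offline returns in the default group when it comes back online.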

Fixes: 2e2c511ff49d ("arm64/mpam: resctrl: Handle cpuhp and resctrl_dom allocation")
Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Jian Cheng <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

867ae5b2

arm64/mpam: resctrl: Allow setting register MPAMCFG_MBW_MIN to 0 · 649e23fb

Committed by Wang ShaoBo on Feb 26, 2021

hulk inclusion
category: bugfix
bugzilla: 48265
CVE: NA

--------------------------------

Unlike mbw max (Memory Bandwidth Maximum), we sometimes do not want to
use the mbw min feature (which restricts the memory bandwidth partition
through MPAMCFG_MBW_MIN, shown as the MBMIN row in schemata), so it must
be possible to set MPAMCFG_MBW_MIN to 0.

e.g.
    > mount -t resctrl resctrl /sys/fs/resctrl/ -o mbMin
    > cd resctrl/ && cat schemata
      L3:0=7fff;1=7fff;2=7fff;3=7fff
      MBMIN:0=0;1=0;2=0;3=0

    # before revision
    > echo 'MBMIN:0=0;1=0;2=0;3=0' > schemata
    > cat schemata
      L3:0=7fff;1=7fff;2=7fff;3=7fff
      MBMIN:0=2;1=2;2=2;3=2

    # after revision
    > echo 'MBMIN:0=0;1=0;2=0;3=0' > schemata
    > cat schemata
      L3:0=7fff;1=7fff;2=7fff;3=7fff
      MBMIN:0=0;1=0;2=0;3=0
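
For scripts that compare schemata contents before and after a write, a row in the format shown above can be split into its resource name and per-domain values. This is a minimal sketch that assumes only the `NAME:id=value;id=value` layout seen in these transcripts; `parse_schemata_line` is a hypothetical helper, and values are kept as strings because their base differs between rows.

```python
def parse_schemata_line(line: str):
    """Split one schemata row into (name, {domain_id: value_string})."""
    name, _, body = line.strip().partition(":")
    if not name or not body:
        raise ValueError(f"not a schemata row: {line!r}")
    domains = {}
    for item in body.split(";"):
        dom, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in {item!r}")
        domains[int(dom)] = value
    return name, domains
```

With this helper, the behavior after the revision corresponds to every value of the `MBMIN` row reading back as `"0"` once zeros have been written.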

Fixes: 5a49c4f1983d ("arm64/mpam: Supplement additional useful ctrl features for mount options")
Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

649e23fb

arm64/mpam: resctrl: Use resctrl_group_init_alloc() for default group · 96a27f9d

Committed by Wang ShaoBo on Feb 26, 2021

hulk inclusion
category: bugfix
bugzilla: 48265
CVE: NA

--------------------------------

When several control types can be configured for one resource, a stale
value from before the remount is written back to the default group after
remounting.

e.g.
    > mount -t resctrl resctrl /sys/fs/resctrl/ -o mbMax,mbMin && cd resctrl/
    > echo 'MBMIN:0=2;1=2;2=2;3=2' > schemata
    > cat schemata
      L3:0=7fff;1=7fff;2=7fff;3=7fff
      MBMAX:0=100;1=100;2=100;3=100
      MBMIN:0=2;1=2;2=2;3=2
    > cd .. && umount /sys/fs/resctrl/
    > mount -t resctrl resctrl /sys/fs/resctrl/ -o mbMax,mbMin && cd resctrl/ && cat schemata
      L3:0=7fff;1=7fff;2=7fff;3=7fff
      MBMAX:0=100;1=100;2=100;3=100
      MBMIN:0=0;1=0;2=0;3=0
    > echo 'MBMAX:0=10;1=10;2=10;3=10' > schemata
    > cat schemata
      L3:0=7fff;1=7fff;2=7fff;3=7fff
      MBMAX:0=10;1=10;2=10;3=10
      MBMIN:0=2;1=2;2=2;3=2  # wrong: the stale value from before the remount is restored

When the schemata sysfile is written, the call path looks like this:

resctrl_group_schemata_write()
  -=> resctrl_update_groups_config()
         -=> resctrl_group_update_domains()
               -=> resctrl_group_update_domain_ctrls()
                { .../*refresh new_ctrl array of supported conf type once for each resource*/ }

We should refresh the new_ctrl field in struct resctrl_staged_config by
calling resctrl_group_init_alloc() before calling
resctrl_group_update_domain_ctrls().

Fixes: 6b2471f089be ("arm64/mpam: resctrl: Support priority and hardlimit(Memory bandwidth) configuration")
Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

96a27f9d

arm64/mpam: resctrl: Add proper error handling to resctrl_mount() · 10e4e43b

Committed by Wang ShaoBo on Feb 26, 2021

hulk inclusion
category: bugfix
bugzilla: 48265
CVE: NA

--------------------------------

This function is called only when we mount resctrl sysfs. For error
handling, we need to destroy the schemata list if any of the steps after
its creation fails.
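
The cleanup rule described here, undoing the schemata list when a later step fails, can be sketched as below. The callables are placeholders supplied by the caller, and `resctrl_mount_model` is a hypothetical name: this models the control flow only, not the kernel function.

```python
def resctrl_mount_model(init_schemata, later_steps, destroy_schemata):
    """Create the schemata list, run the remaining steps, undo on failure."""
    init_schemata()
    try:
        for step in later_steps:
            step()
    except Exception:
        # A later step failed: release what init_schemata() created,
        # then let the caller see the original error.
        destroy_schemata()
        raise
```

On the success path `destroy_schemata` is never called, so the list stays alive for the lifetime of the mount.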

Fixes: 7e9b5caeefff ("arm64/mpam: resctrl: Add helpers for init and destroy schemata list")
Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

10e4e43b

arm64/mpam: Use fs_context to parse mount options · 100e2317

Committed by Wang ShaoBo on Feb 26, 2021

hulk inclusion
category: bugfix
bugzilla: 48265
CVE: NA

--------------------------------

Use fs_context to parse mount options. The old parsing path through
parse_rdtgroupfs_options() becomes obsolete and will be removed.

Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

100e2317

arm64/mpam: Supplement additional useful ctrl features for mount options · 228fa64a

Committed by Wang ShaoBo on Feb 26, 2021

hulk inclusion
category: feature
feature: ARM MPAM support
bugzilla: 48265
CVE: NA

--------------------------------

Based on 61fa56e1dd8a ("arm64/mpam: Add resctrl_ctrl_feature structure to manage
ctrl features"), we add several ctrl features and supply the corresponding
mount options: mbPbm, mbMax, mbMin, mbPrio, caMax, caPrio and caPbm.
If the MPAM system supports the relevant features, we can mount resctrl like this:

e.g.
   > mount -t resctrl resctrl /sys/fs/resctrl -o mbMax,mbMin,caPrio
   > cd /sys/fs/resctrl && cat schemata
     L3:0=0x7fff;1=0x7fff;2=0x7fff;3=0x7fff #default select cpbm as basic ctrl feature
     L3PRI:0=3;1=3;2=3;3=3
     MBMAX:0=100;1=100;2=100;3=100
     MBMIN:0=0;1=0;2=0;3=0

   > mount -t resctrl resctrl /sys/fs/resctrl
   > cd /sys/fs/resctrl && cat schemata
     L3:0=0x7fff;1=0x7fff;2=0x7fff;3=0x7fff #default select cpbm as basic ctrl feature
     MB:0=100;1=100;2=100;3=100  #default select mbw max as basic ctrl feature

   > mount -t resctrl resctrl /sys/fs/resctrl -o caMax
   > cd /sys/fs/resctrl && cat schemata
     L3:0=33554432;1=33554432;2=33554432;3=33554432 #use cmax ctrl feature
     MB:0=100;1=100;2=100;3=100  #default select mbw max as basic ctrl feature

For Cache MSCs, the basic ctrl features are cmax (Cache Maximum Capacity)
and cpbm (Cache Portion Bitmap) partitioning. If no mount option is
specified, cpbm is selected by default.

For Memory MSCs, the basic ctrl features are max (Memory Bandwidth Maximum)
and pbm (Memory Bandwidth Portion Bitmap) partitioning. If no mount option
is specified, max is selected by default.

The mount options above can also be combined with the cdp options.

e.g.
   > mount -t resctrl resctrl /sys/fs/resctrl -o caMax,caPrio,cdpl3
   > cd /sys/fs/resctrl && cat schemata
     L3CODE:0=33554432;1=33554432;2=33554432;3=33554432 #code use cmax ctrl feature
     L3DATA:0=33554432;1=33554432;2=33554432;3=33554432 #data use cmax ctrl feature
     L3CODEPRI:0=3;1=3;2=3;3=3 #code use intpriority ctrl feature
     L3DATAPRI:0=3;1=3;2=3;3=3 #data use intpriority ctrl feature
     MB:0=100;1=100;2=100;3=100  #default select mbw max as basic ctrl feature

Combining these mount parameters lets us use more of MPAM's capabilities.

Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

228fa64a

arm64/mpam: Set per-cpu's closid to none zero for cdp · cae569b3

Committed by Wang ShaoBo on Feb 26, 2021

hulk inclusion
category: feature
feature: ARM MPAM support
bugzilla: 48265
CVE: NA

--------------------------------

Sometimes monitoring will have such anomalies:

e.g.
    > cd /sys/fs/resctrl/ && grep . mon_data/*
      mon_data/mon_L3CODE_00:14336
      mon_data/mon_L3CODE_01:344064
      mon_data/mon_L3CODE_02:2048
      mon_data/mon_L3CODE_03:27648
      mon_data/mon_L3DATA_00:0  # L3DATA's monitoring data is always 0
      mon_data/mon_L3DATA_01:0
      mon_data/mon_L3DATA_02:0
      mon_data/mon_L3DATA_03:0
      mon_data/mon_MB_00:392
      mon_data/mon_MB_01:552
      mon_data/mon_MB_02:160
      mon_data/mon_MB_03:0

With cdp on, for tasks in the resctrl default group (closid=0 and rmid=0),
mpam_sched_in(), called from __switch_to(), does not fill the proper
partid_i/pmg_i and partid_d/pmg_d into the MPAMx_ELx sysregs. This is
because the current CPU's default closid and rmid are also 0, so the
operation that modifies the configuration is skipped.

A practical fix is to set the per-CPU default closid to a non-zero value
and call update_closid_rmid() when mounting resctrl sysfs, so that each
CPU's MPAMx_ELx sysregs are updated with the proper partid and pmg.

Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

cae569b3

arm64/mpam: Simplify mpamid cdp mapping process · 092d98c3

Committed by Wang ShaoBo on Feb 26, 2021

hulk inclusion
category: feature
feature: ARM MPAM support
bugzilla: 48265
CVE: NA

--------------------------------

MPAM includes partid, pmg and monitor, which we collectively call the
mpam id. If cdp is on, we allocate a new mpamid_new equal to
mpamid + 1. In some places the mpamid does not need to be encapsulated
in struct { u16 val; }, so for simplicity we use a simpler macro,
resctrl_cdp_mpamid_map_val(), to complete this cdp mapping process.
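
The mapping can be pictured as a function from a base id and a configuration type to the id that is actually used. The sketch below is a user-space model, not the macro itself: the type constants, the function name `cdp_mpamid_map`, and in particular the assumption that the data side is the one that receives `mpamid + 1` are illustrative choices that the text above does not confirm.

```python
# Hypothetical configuration types; the real enum is not shown in this log.
CDP_BOTH, CDP_CODE, CDP_DATA = "both", "code", "data"


def cdp_mpamid_map(mpamid: int, conf_type: str, cdp_enabled: bool) -> int:
    """Map a base mpam id to the id used for the given configuration type.

    Assumption: with cdp on, code keeps the base id and data uses base + 1.
    """
    if cdp_enabled and conf_type == CDP_DATA:
        return mpamid + 1
    return mpamid
```

Returning a plain integer rather than a wrapper structure mirrors the simplification this commit describes.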

Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

092d98c3

arm64/mpam: Filter schema control type with ctrl features · b3a23e33

Committed by Wang ShaoBo on Feb 26, 2021

hulk inclusion
category: feature
feature: ARM MPAM support
bugzilla: 48265
CVE: NA

--------------------------------

The ctrl_features array, introduced by 61fa56e1dd8a ("arm64/mpam: Add
resctrl_ctrl_feature structure to manage ctrl features"), lives in the
raw_resctrl_resource structure and lists every ctrl feature type that is
supported for this resource. It filters illegal parameters out of the
mount options and provides add_schema() with the information needed to
register a new control type node in the schema list.

This makes it easier to add new ctrl features later.

Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

b3a23e33

arm64/mpam: resctrl: Add rmid file in resctrl sysfs · 2ae8305b

Committed by Wang ShaoBo on Feb 26, 2021

hulk inclusion
category: feature
feature: ARM MPAM support
bugzilla: 48265
CVE: NA

--------------------------------

rmid is used to mark each resctrl group for monitoring, and it also
follows the corresponding resctrl group's configuration. We export an rmid
sysfile in resctrl sysfs so that it can be used elsewhere, for example for
SMMU io: a user can read the rmid of a resctrl group and assign it to a
target io through the SMMU driver, if SMMU MPAM is implemented, so that the
related io devices can be monitored or follow the intended resource usage
configuration.

Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

2ae8305b

arm64/mpam: Split header files into suitable location · 0c564931

Committed by Wang ShaoBo on Feb 26, 2021

hulk inclusion
category: feature
feature: ARM MPAM support
bugzilla: 48265
CVE: NA

--------------------------------

So far, some declarations shared by resctrlfs.c and the mpam core module
files under the kernel/mpam directory are scattered across mpam.h and
resctrl.h. They are organized like this:

-- asm/
   +-- resctrl.h        +
   +-- mpam.h           |    +
   +-- mpam_resource.h  |    |    +
                        |    |    |
-- fs/                  |    |    +-> mpam/
   +-- resctrlfs.c <----+----+------> +-- mpam_resctrl.c ...

We move the declarations shared by resctrlfs.c and mpam/ to resctrl.h,
split the other declarations into mpam_internal.h, and move
mpam_resource.h to the mpam/ directory. The result is organized
like this:

-- asm/
   +-- mpam.h           +----> export to other modules(e.g. SMMU master io)
   +-- resctrl.h        +
                        |
-- mpam/                |
   +-- mpam_internal.h  |    +
   +-- mpam_resource.h  |    |    +
                        |    |    |
-- fs/                  |    +----+-> mpam/
   +-- resctrlfs.c <----+-----------> +-- mpam_resctrl.c ...

This gives us a clearer framework for MPAM usage.

Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

0c564931

arm64/mpam: resctrl: Export resource's properties to info directory · 9d39dad1

Committed by Wang ShaoBo on Feb 26, 2021

hulk inclusion
category: feature
feature: ARM MPAM support
bugzilla: 48265
CVE: NA

--------------------------------

Some resource properties, such as closid and rmid, are exported as in
Intel-RDT in our resctrl design, but there are two main differences.
First, MB (Memory Bandwidth) is also divided into two directories,
MB and MB_MON, which show the properties of the control and monitor types
respectively, the same as for LxCache. Second, we add a features sysfile
under each resource's directory, which indicates the properties of the
control types of that resource, for instance the MB hardlimit.

e.g.
    > mount -t resctrl resctrl /sys/fs/resctrl -o mbHdl
    > cd /sys/fs/resctrl/ && cat info/MB/features
      mbHdl@1  # the upper bound of the MBHDL setting is 1
    > cat schemata
      L3:0=7fff;1=7fff;2=7fff;3=7fff
      MB:0=100;1=100;2=100;3=100
      MBHDL:0=1;1=1;2=1;3=1

Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

9d39dad1

arm64/mpam: Add resctrl_ctrl_feature structure to manage ctrl features · cf92ebfa

Committed by Wang ShaoBo on Feb 26, 2021

hulk inclusion
category: feature
feature: ARM MPAM support
bugzilla: 48265
CVE: NA

--------------------------------

The resctrl_ctrl_feature structure, held by each resource, is introduced to
manage ctrl features. It records characteristics such as the maximum width
of the value accepted from outside input and the base used to parse it.

This makes it more practical to declare a new ctrl feature, such as the
SCHEMA_PRI feature, which is only associated with the internal priority
setting exported by mpam devices. The information is collected in
mpam_resctrl_resource_init(), and the feature is then enabled or disabled
through user options.

The resctrl_ctrl_feature structure contains a flags field to avoid duplicated
control types. For instance, the SCHEMA_COMM feature selects cpbm (Cache
portion bitmap) as the default control type for the Cache resource, so we
should not enable this feature again if the user manually selects the cpbm
control type through mount options.

The evt field in the resctrl_ctrl_feature structure is a variable of type
enum rdt_event_id, which works as eee4ad2a36e6 ("arm64/mpam: Add hook-events
id for ctrl features") illustrates.

Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

cf92ebfa

arm64/mpam: Add wait queue for monitor alloc and free · 7d3cd1a2

Committed by Wang ShaoBo on Feb 26, 2021

hulk inclusion
category: feature
feature: ARM MPAM support
bugzilla: 48265
CVE: NA

--------------------------------

For MPAM, an rmid can do monitoring work only when a monitor resource is
allocated to it, so we adopt a mechanism for dynamic allocation and
recycling of monitor resources. This differs from Intel-RDT, which creates
a kworker thread that dynamically monitors Cache usage and frees an rmid
once its usage falls below an adjustable threshold. We have found that
method to affect CPU utilization in many cases, sometimes to an
unacceptable degree.

Our method is simple. As the number of monitors varies between resources,
we provide two lists: one stores the rmids that own an exclusive monitor
resource, and the other stores the rmids that share a monitor resource,
whose monitor id is always 0. It works like this: if a new rmid applies
for a resource monitor that is in use, we put this rmid at the tail of
the latter list and temporarily give it the default monitor id 0 until
someone releases an available monitor resource. If the new rmid gets the
monitor resources it needs for all resources, it is put into the
exclusive list.
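
The two-list scheme can be modelled for a single resource as follows. This is a sketch under stated assumptions: `MonitorPool` and its methods are hypothetical names, monitor 0 is assumed to be reserved for sharing, and the requirement that an rmid obtain a monitor on every resource before moving to the exclusive list is left out.

```python
class MonitorPool:
    """Single-resource model of exclusive and shared monitor allocation."""

    def __init__(self, nr_monitors: int):
        # Assumption: monitor 0 is the shared monitor, so only ids
        # 1..nr_monitors-1 can be owned exclusively.
        self.free = list(range(1, nr_monitors))
        self.exclusive = {}  # rmid -> exclusively owned monitor id
        self.shared = []     # rmids on shared monitor 0, oldest first

    def alloc(self, rmid: int) -> int:
        """Give rmid an exclusive monitor if one is free, else share 0."""
        if self.free:
            mon = self.free.pop(0)
            self.exclusive[rmid] = mon
            return mon
        self.shared.append(rmid)  # wait at the tail of the shared list
        return 0

    def release(self, rmid: int) -> None:
        """Release rmid; hand a freed monitor to the oldest waiter."""
        if rmid in self.shared:
            self.shared.remove(rmid)
            return
        mon = self.exclusive.pop(rmid)
        if self.shared:
            self.exclusive[self.shared.pop(0)] = mon
        else:
            self.free.append(mon)

    def monitor_of(self, rmid: int) -> int:
        """Monitor currently used by rmid (0 while it is sharing)."""
        return self.exclusive.get(rmid, 0)
```

No background thread is needed in this model: monitors only move between groups when a group is created or released, which is the CPU-usage advantage the text claims over a polling kworker.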

This implements LRU allocation of monitor resources and gives users
partial control over allocation and release. If the number of resctrl
groups can be bounded, or the user does not need to monitor too many
groups at the same time, this is a more appropriate way to deploy. It
also avoids the risk of inaccurate monitoring when too many groups are
monitored at the same time.

Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

7d3cd1a2

arm64/mpam: Remap reqpartid,pmg to rmid and intpartid to closid · 0b16164d

Committed by Wang ShaoBo on Feb 26, 2021

hulk inclusion
category: feature
feature: ARM MPAM support
bugzilla: 48265
CVE: NA

--------------------------------

So far we use sd_closid, which consists of {reqpartid, intpartid}, to label
each resctrl group, both ctrlgroup and mongroup. This handles the case
where the number of reqpartid exceeds the number of intpartid, which always
happens when intpartid narrowing is supported; otherwise the two numbers
are equal. So we use the excess reqpartid (1) to indicate how configurations
are synchronized from the configuration indexed by intpartid, and (2) to
take part in the monitoring role.

But the reqpartid in (2), together with pmg, is still scattered, and so far
we have had no proper way to explain how to use the two together. To
ensure that these resources can be fully utilized, and following the idea
from Intel-RDT's design, which uses rmid for monitoring, an rmid remap
matrix is provided for transforming partid and pmg into rmid. The matrix
is organized like this:

                 [bitmap entry indexed by partid]
                       [col pos is partid]

                     [0]  [1]  [2]  [3]  [4]  [5]
   occ->bitmap[:0]    1    0    0    1    1    1
        bitmap[:1]    1    0    0    1    1    1
        bitmap[:2]    1    1    1    1    1    1
        bitmap[:3]    1    1    1    1    1    1
[row pos-1 is pmg]

Calculate rmid = partid + NR_partid * pmg

occ records whether this bitmap has been used by a partid, because a
given partid should not be paired with a duplicated pmg for monitoring.
This design saves a lot of space, and compared with using a list it also
reduces the time complexity of allocating and freeing an rmid from
O(NR_partid) * O(NR_pmg) to O(NR_partid) + O(log(NR_pmg)).

In this way we get a continuous rmid set with the upper bound (NR_pmg *
NR_partid - 1). Given an rmid, we can judge whether it is valid by
checking whether it falls within this range.
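
The formula and the range check can be written out directly. The sketch below is a user-space illustration with hypothetical function names; it uses the 6 x 4 shape of the matrix drawn above as its example and does not model the occupancy bitmaps.

```python
def partid_pmg_to_rmid(partid: int, pmg: int, nr_partid: int) -> int:
    """rmid = partid + NR_partid * pmg, as stated above."""
    return partid + nr_partid * pmg


def rmid_to_partid_pmg(rmid: int, nr_partid: int):
    """Inverse mapping: the column is the partid, the row is the pmg."""
    return rmid % nr_partid, rmid // nr_partid


def rmid_in_range(rmid: int, nr_partid: int, nr_pmg: int) -> bool:
    """Range check only: 0 <= rmid <= NR_pmg * NR_partid - 1."""
    return 0 <= rmid <= nr_partid * nr_pmg - 1
```

With NR_partid = 6 and NR_pmg = 4, partid 3 and pmg 2 give rmid 3 + 6 * 2 = 15, and the largest rmid in range is 23.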

The rmid implies the reqpartid, so we can use the relevant helpers to get
the reqpartid for sd_closid@reqpartid and accomplish the configuration
sync. This also makes closid simpler, as it can consist of the intpartid
index only, and each resctrl group owns consecutive rmids.

This has further consequences. For instance, MPAM also supports SMMU io
using partid and pmg, so the SMMU driver can use a single helper,
mpam_rmid_to_partid_pmg(), to complete this remap process for an rmid
passed in from user space.

Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

0b16164d
