- 09 Apr, 2021 40 commits
-
-
Committed by Guangbin Huang
stable inclusion
from stable-5.10.24
commit 6aa23829949c2c0912e82866aeab4fd591595235
bugzilla: 51348

--------------------------------

commit d9032dba upstream.

If the PHY uses the generic driver and autoneg is on, entering the command "ethtool -s eth0 speed 50" does not actually change the PHY speed, but the command "ethtool eth0" shows a speed of 50Mb/s because phydev->speed has been set to 50 and is not updated later. The duplex setting has the same problem.

However, if autoneg is on, the PHY only changes speed and duplex according to phydev->advertising, not phydev->speed and phydev->duplex. So in this case, phydev->speed and phydev->duplex don't need to be set in function phy_ethtool_ksettings_set().

Fixes: 51e2a384 ("PHY: Avoid unnecessary aneg restarts")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
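The behaviour described in this commit message can be illustrated with a small model. This is only a sketch in Python, not the kernel code: `PhyState` and `ksettings_set` are hypothetical stand-ins for `struct phy_device` and `phy_ethtool_ksettings_set()`, and the assumption is simply that forced speed and duplex are stored only when autoneg is off.

```python
from dataclasses import dataclass

AUTONEG_DISABLE = 0
AUTONEG_ENABLE = 1


@dataclass
class PhyState:
    """Toy stand-in for the phy_device fields named in the commit message."""
    autoneg: int
    speed: int
    duplex: str
    advertising: frozenset


def ksettings_set(phy, autoneg, speed, duplex, advertising):
    """Apply user settings; keep speed/duplex untouched while autoneg is on."""
    phy.autoneg = autoneg
    if autoneg == AUTONEG_DISABLE:
        # Forced mode: the requested values really are the link settings.
        phy.speed = speed
        phy.duplex = duplex
    # With autoneg on, only the advertised modes influence the link.
    phy.advertising = frozenset(advertising)
    return phy
```

With autoneg enabled, a request such as "speed 50" leaves the stored speed unchanged in this model, so a later query no longer reports a value the link never used.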
-
Committed by Jason A. Donenfeld
stable inclusion
from stable-5.10.24
commit 91796b65563bd3fd0efe4fb56d6ee1c5c6006eb0
bugzilla: 51348

--------------------------------

commit 4372339e upstream.

There were a few remaining tunnel drivers that didn't receive the prior conversion to icmp{,v6}_ndo_send. Knowing now that this could lead to memory corruption (see ee576c47 ("net: icmp: pass zeroed opts from icmp{,v6}_ndo_send before sending") for details), there's even more imperative to have these all converted. So this commit goes through the remaining cases that I could find and does a boring translation to the ndo variety.

The Fixes: line below is the merge that originally added icmp{,v6}_ndo_send and converted the first batch of icmp{,v6}_send users. The rationale then for the change applies equally to this patch. It's just that these drivers were left out of the initial conversion because these network devices are hiding in net/ rather than in drivers/net/.

Cc: Florian Westphal <fw@strlen.de>
Cc: Willem de Bruijn <willemb@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Fixes: 803381f9 ("Merge branch 'icmp-account-for-NAT-when-sending-icmps-from-ndo-layer'")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Vasily Averin
stable inclusion
from stable-5.10.24
commit 8abbf7e53e179b16dc48c40cecc6c86240ca026c
bugzilla: 51348

--------------------------------

commit 8e24eddd upstream.

nested target/match_revfn() calls work with xt[NFPROTO_UNSPEC] lists without taking xt[NFPROTO_UNSPEC].mutex. This can race with module unload and cause host to crash:

  general protection fault: 0000 [#1]
  Modules linked in: ... [last unloaded: xt_cluster]
  CPU: 0 PID: 542455 Comm: iptables
  RIP: 0010:[<ffffffff8ffbd518>] [<ffffffff8ffbd518>] strcmp+0x18/0x40
  RDX: 0000000000000003 RSI: ffff9a5a5d9abe10 RDI: dead000000000111
  R13: ffff9a5a5d9abe10 R14: ffff9a5a5d9abd8c R15: dead000000000100
  (VvS: %R15 -- &xt_match, %RDI -- &xt_match.name,
  xt_cluster unregister match in xt[NFPROTO_UNSPEC].match list)
  Call Trace:
   [<ffffffff902ccf44>] match_revfn+0x54/0xc0
   [<ffffffff902ccf9f>] match_revfn+0xaf/0xc0
   [<ffffffff902cd01e>] xt_find_revision+0x6e/0xf0
   [<ffffffffc05a5be0>] do_ipt_get_ctl+0x100/0x420 [ip_tables]
   [<ffffffff902cc6bf>] nf_getsockopt+0x4f/0x70
   [<ffffffff902dd99e>] ip_getsockopt+0xde/0x100
   [<ffffffff903039b5>] raw_getsockopt+0x25/0x50
   [<ffffffff9026c5da>] sock_common_getsockopt+0x1a/0x20
   [<ffffffff9026b89d>] SyS_getsockopt+0x7d/0xf0
   [<ffffffff903cbf92>] system_call_fastpath+0x25/0x2a

Fixes: 656caff2 ("netfilter 04/09: x_tables: fix match/target revision lookup")
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Florian Westphal
stable inclusion
from stable-5.10.24
commit 42402bd84530d3761b97775c10762fde28d5b2f9
bugzilla: 51348

--------------------------------

commit 03a3ca37 upstream.

Under extremely rare conditions TCP early demux will retrieve the wrong socket.

1. local machine establishes a connection to a remote server, S, on port p.

   This gives:
   laddr:lport -> S:p
   ... both in tcp and conntrack.

2. local machine establishes a connection to host H, on port p2.
2a. TCP stack chooses same laddr:lport, so we have
    laddr:lport -> H:p2 from TCP point of view.
2b. There is a destination NAT rewrite in place, translating
    H:p2 to S:p. This results in following conntrack entries:

    I)  laddr:lport -> S:p (origin)   S:p -> laddr:lport (reply)
    II) laddr:lport -> H:p2 (origin)  S:p -> laddr:lport2 (reply)

    NAT engine has rewritten laddr:lport to laddr:lport2 to map
    the reply packet to the correct origin.

    When server sends SYN/ACK to laddr:lport2, the PREROUTING hook
    will undo the SNAT transformation, rewriting IP header to
    S:p -> laddr:lport

    This causes TCP early demux to associate the skb with the TCP socket
    of the first connection.

    The INPUT hook will then reverse the DNAT transformation, rewriting
    the IP header to H:p2 -> laddr:lport.

Because packet ends up with the wrong socket, the new connection never completes: originator stays in SYN_SENT and conntrack entry remains in SYN_RECV until timeout, and responder retransmits SYN/ACK until it gives up.

To resolve this, orphan the skb after the input rewrite: Because the source IP address changed, the socket must be incorrect. We can't move the DNAT undo to prerouting due to backwards compatibility, doing so will make iptables/nftables rules no longer match the way they did.

After orphan, the packet will be handed to the next protocol layer (tcp, udp, ...) and that will repeat the socket lookup just as if early demux was disabled.

Fixes: 41063e9d ("ipv4: Early TCP socket demux.")
Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1427
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Eric Dumazet
stable inclusion
from stable-5.10.24
commit 046f3c1c2ff450fb7ae53650e9a95e0074a61f3e
bugzilla: 51348

--------------------------------

commit 8811f4a9 upstream.

Qingyu Li reported a syzkaller bug where the repro changes RCV SEQ _after_ restoring data in the receive queue.

  mprotect(0x4aa000, 12288, PROT_READ) = 0
  mmap(0x1ffff000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x1ffff000
  mmap(0x20000000, 16777216, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x20000000
  mmap(0x21000000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x21000000
  socket(AF_INET6, SOCK_STREAM, IPPROTO_IP) = 3
  setsockopt(3, SOL_TCP, TCP_REPAIR, [1], 4) = 0
  connect(3, {sa_family=AF_INET6, sin6_port=htons(0), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_scope_id=0}, 28) = 0
  setsockopt(3, SOL_TCP, TCP_REPAIR_QUEUE, [1], 4) = 0
  sendmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="0x0000000000000003\0\0", iov_len=20}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20
  setsockopt(3, SOL_TCP, TCP_REPAIR, [0], 4) = 0
  setsockopt(3, SOL_TCP, TCP_QUEUE_SEQ, [128], 4) = 0
  recvfrom(3, NULL, 20, 0, NULL, NULL) = -1 ECONNRESET (Connection reset by peer)

syslog shows:

  [ 111.205099] TCP recvmsg seq # bug 2: copied 80, seq 0, rcvnxt 80, fl 0
  [ 111.207894] WARNING: CPU: 1 PID: 356 at net/ipv4/tcp.c:2343 tcp_recvmsg_locked+0x90e/0x29a0

This should not be allowed. TCP_QUEUE_SEQ should only be used when queues are empty.

This patch fixes this case, and the tx path as well.

Fixes: ee995283 ("tcp: Initial repair mode")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=212005
Reported-by: Qingyu Li <ieatmuttonchuan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
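The rule stated in this commit message, that TCP_QUEUE_SEQ should only be used when queues are empty, can be sketched as a guard. This is a toy Python model rather than the kernel change itself: `RepairSocket`, `set_queue_seq` and the choice of `EPERM` as the error are assumptions made for illustration.

```python
import errno


class RepairSocket:
    """Toy model of a socket in repair mode (all names are hypothetical)."""

    def __init__(self):
        self.queues = {"recv": [], "send": []}
        self.seq = {"recv": 0, "send": 0}


def set_queue_seq(sock, queue, seq):
    """Set the sequence number of `queue`, refusing if it still holds data."""
    if sock.queues[queue]:
        # Changing the sequence under queued data would leave the queue
        # inconsistent, as in the reported repro, so reject the request.
        return -errno.EPERM
    sock.seq[queue] = seq
    return 0
```

In this model the repro sequence (restore data into the receive queue, then change its sequence number) is rejected instead of leaving the socket in an inconsistent state.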
-
Committed by Arjun Roy
stable inclusion
from stable-5.10.24
commit e95ebe1ed6abc259b897abc1f92622504750747c
bugzilla: 51348

--------------------------------

commit 2107d45f upstream.

getsockopt(TCP_ZEROCOPY_RECEIVE) has a bug where we read a user-provided "len" field of type signed int, and then compare the value to the result of an "offsetofend" operation, which is unsigned.

Negative values provided by the user will be promoted to large positive numbers; thus checking that len < offsetofend() will return false when the intention was that it return true.

Note that while len is originally checked for negative values earlier on in do_tcp_getsockopt(), subsequent calls to get_user() re-read the value from userspace which may have changed in the meantime.

Therefore, re-add the check for negative values after the call to get_user in the handler code for TCP_ZEROCOPY_RECEIVE.

Fixes: c8856c05 ("tcp-zerocopy: Return inq along with tcp receive zerocopy.")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Arjun Roy <arjunroy@google.com>
Link: https://lore.kernel.org/r/20210225232628.4033281-1-arjunroy.kdev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
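The signed/unsigned pitfall described in this commit message can be reproduced outside C. The sketch below uses Python's `ctypes` to mimic the conversion of a negative `int` to an unsigned type; `OFFSETOFEND` is a made-up constant standing in for the result of the real `offsetofend()` expression, and both helper functions are hypothetical.

```python
import ctypes

OFFSETOFEND = 16  # hypothetical stand-in for the offsetofend() result


def as_unsigned(value):
    """Reinterpret an integer as size_t, as the C comparison would."""
    return ctypes.c_size_t(value).value


def too_short_buggy(length):
    """Mimic `len < offsetofend(...)` with len converted to unsigned."""
    return as_unsigned(length) < OFFSETOFEND


def too_short_fixed(length):
    """Reject negative lengths first, then do the size comparison."""
    return length < 0 or as_unsigned(length) < OFFSETOFEND
```

For a length of -1 the buggy check reports that the buffer is large enough, because -1 becomes a very large unsigned number, while the fixed check rejects it.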
-
Committed by Torin Cooper-Bennun
stable inclusion
from stable-5.10.24
commit 473bce9b9393a3a990ed7c9708af38df553f2712
bugzilla: 51348

--------------------------------

commit 27126252 upstream.

This patch prevents a potentially destructive race condition. The device is fully operational on the bus after entering Normal Mode, so zeroing the MRAM after entering this mode may lead to loss of information, e.g. newly received messages.

This patch fixes the problem by first initializing the MRAM, then bringing the device into Normal Mode.

Fixes: 5443c226 ("can: tcan4x5x: Add tcan4x5x driver to the kernel")
Link: https://lore.kernel.org/r/20210226163440.313628-1-torin@maxiluxsystems.com
Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Torin Cooper-Bennun <torin@maxiluxsystems.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Joakim Zhang
stable inclusion
from stable-5.10.24
commit c537011c99abc9d1e1e9bc2a3bb32fda1cda4583
bugzilla: 51348

--------------------------------

commit c6382004 upstream.

Invoke flexcan_chip_freeze() to enter freeze mode, since the freeze mode acknowledge needs to be polled.

Fixes: e955cead ("CAN: Add Flexcan CAN controller driver")
Link: https://lore.kernel.org/r/20210218110037.16591-4-qiangqing.zhang@nxp.com
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Joakim Zhang
stable inclusion
from stable-5.10.24
commit e24c53182850abce8c7fe3423f843ccb62581e6f
bugzilla: 51348

--------------------------------

commit ec15e27c upstream.

Enabling the RX FIFO can fail during a system reboot stress test:

  [ 0.303958] flexcan 5a8d0000.can: 5a8d0000.can supply xceiver not found, using dummy regulator
  [ 0.304281] flexcan 5a8d0000.can (unnamed net_device) (uninitialized): Could not enable RX FIFO, unsupported core
  [ 0.314640] flexcan 5a8d0000.can: registering netdev failed
  [ 0.320728] flexcan 5a8e0000.can: 5a8e0000.can supply xceiver not found, using dummy regulator
  [ 0.320991] flexcan 5a8e0000.can (unnamed net_device) (uninitialized): Could not enable RX FIFO, unsupported core
  [ 0.331360] flexcan 5a8e0000.can: registering netdev failed
  [ 0.337444] flexcan 5a8f0000.can: 5a8f0000.can supply xceiver not found, using dummy regulator
  [ 0.337716] flexcan 5a8f0000.can (unnamed net_device) (uninitialized): Could not enable RX FIFO, unsupported core
  [ 0.348117] flexcan 5a8f0000.can: registering netdev failed

The RX FIFO should be enabled after FRZ/HALT are valid, but the current code enables the RX FIFO and FRZ/HALT at the same time.

Fixes: e955cead ("CAN: Add Flexcan CAN controller driver")
Link: https://lore.kernel.org/r/20210218110037.16591-3-qiangqing.zhang@nxp.com
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Joakim Zhang
stable inclusion
from stable-5.10.24
commit 98b7f969116df96c57e9a8572620d71e92fcb725
bugzilla: 51348

--------------------------------

commit 449052cf upstream.

Asserting the HALT bit to enter freeze mode presupposes that the FRZ bit is asserted. This patch asserts the FRZ bit in flexcan_chip_freeze(), although the reset value is 1b'1. This is a preparatory patch; a later patch will invoke flexcan_chip_freeze() to enter freeze mode, which polls the freeze mode acknowledge.

Fixes: b1aa1c7a ("can: flexcan: fix transition from and to freeze mode in chip_{,un}freeze")
Link: https://lore.kernel.org/r/20210218110037.16591-2-qiangqing.zhang@nxp.com
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Oleksij Rempel
stable inclusion
from stable-5.10.24
commit 4224890edff1b4679dc8ddeaa69b43efce5366ba
bugzilla: 51348

--------------------------------

commit e940e089 upstream.

There are two ref count variables controlling the free()ing of a socket:
- struct sock::sk_refcnt - which is changed by sock_hold()/sock_put()
- struct sock::sk_wmem_alloc - which accounts the memory allocated by the skbs in the send path.

In case there are still TX skbs on the fly and the socket() is closed, the struct sock::sk_refcnt reaches 0. In the TX-path the CAN stack clones an "echo" skb, calls sock_hold() on the original socket and references it. This produces the following back trace:

| WARNING: CPU: 0 PID: 280 at lib/refcount.c:25 refcount_warn_saturate+0x114/0x134
| refcount_t: addition on 0; use-after-free.
| Modules linked in: coda_vpu(E) v4l2_jpeg(E) videobuf2_vmalloc(E) imx_vdoa(E)
| CPU: 0 PID: 280 Comm: test_can.sh Tainted: G E 5.11.0-04577-gf8ff6603c617 #203
| Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
| Backtrace:
| [<80bafea4>] (dump_backtrace) from [<80bb0280>] (show_stack+0x20/0x24) r7:00000000 r6:600f0113 r5:00000000 r4:81441220
| [<80bb0260>] (show_stack) from [<80bb593c>] (dump_stack+0xa0/0xc8)
| [<80bb589c>] (dump_stack) from [<8012b268>] (__warn+0xd4/0x114) r9:00000019 r8:80f4a8c2 r7:83e4150c r6:00000000 r5:00000009 r4:80528f90
| [<8012b194>] (__warn) from [<80bb09c4>] (warn_slowpath_fmt+0x88/0xc8) r9:83f26400 r8:80f4a8d1 r7:00000009 r6:80528f90 r5:00000019 r4:80f4a8c2
| [<80bb0940>] (warn_slowpath_fmt) from [<80528f90>] (refcount_warn_saturate+0x114/0x134) r8:00000000 r7:00000000 r6:82b44000 r5:834e5600 r4:83f4d540
| [<80528e7c>] (refcount_warn_saturate) from [<8079a4c8>] (__refcount_add.constprop.0+0x4c/0x50)
| [<8079a47c>] (__refcount_add.constprop.0) from [<8079a57c>] (can_put_echo_skb+0xb0/0x13c)
| [<8079a4cc>] (can_put_echo_skb) from [<8079ba98>] (flexcan_start_xmit+0x1c4/0x230) r9:00000010 r8:83f48610 r7:0fdc0000 r6:0c080000 r5:82b44000 r4:834e5600
| [<8079b8d4>] (flexcan_start_xmit) from [<80969078>] (netdev_start_xmit+0x44/0x70) r9:814c0ba0 r8:80c8790c r7:00000000 r6:834e5600 r5:82b44000 r4:82ab1f00
| [<80969034>] (netdev_start_xmit) from [<809725a4>] (dev_hard_start_xmit+0x19c/0x318) r9:814c0ba0 r8:00000000 r7:82ab1f00 r6:82b44000 r5:00000000 r4:834e5600
| [<80972408>] (dev_hard_start_xmit) from [<809c6584>] (sch_direct_xmit+0xcc/0x264) r10:834e5600 r9:00000000 r8:00000000 r7:82b44000 r6:82ab1f00 r5:834e5600 r4:83f27400
| [<809c64b8>] (sch_direct_xmit) from [<809c6c0c>] (__qdisc_run+0x4f0/0x534)

To fix this problem, only set skb ownership to sockets which have still a ref count > 0.

Fixes: 0ae89beb ("can: add destructor for self generated skbs")
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Andre Naujoks <nautsch2@gmail.com>
Link: https://lore.kernel.org/r/20210226092456.27126-1-o.rempel@pengutronix.de
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
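The fix described in this commit message, taking a reference only while the count is still above zero, follows the usual "increment unless zero" pattern. The Python sketch below models that pattern; `RefCount`, `Sock` and `set_owner` are hypothetical names for illustration and are not the kernel helpers.

```python
import threading


class RefCount:
    """Minimal reference counter with an increment-unless-zero operation."""

    def __init__(self, value=1):
        self.value = value
        self._lock = threading.Lock()

    def inc_not_zero(self):
        """Take a reference only if the object is still alive."""
        with self._lock:
            if self.value == 0:
                return False
            self.value += 1
            return True

    def dec_and_test(self):
        """Drop a reference; return True when the count reaches zero."""
        with self._lock:
            self.value -= 1
            return self.value == 0


class Sock:
    """Toy socket holding only a reference count."""

    def __init__(self):
        self.refcnt = RefCount(1)


def set_owner(skb, sock):
    """Attach sock to skb only if a reference could still be taken."""
    if sock is not None and sock.refcnt.inc_not_zero():
        skb["sk"] = sock
        return True
    skb["sk"] = None
    return False
```

An unconditional increment on a counter that already reached zero is what triggers the "addition on 0; use-after-free" warning in the trace; the conditional variant simply leaves the echo skb without an owner in that case.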
-
Committed by Matthias Schiffer
stable inclusion
from stable-5.10.24
commit fa5d019c56e78e0b33f585d23149f2553568b998
bugzilla: 51348

--------------------------------

commit 3e59e885 upstream.

Commit 5ee759cd ("l2tp: use standard API for warning log messages") changed a number of warnings about invalid packets in the receive path so that they are always shown, instead of only when a special L2TP debug flag is set. Even with rate limiting these warnings can easily cause significant log spam - potentially triggered by a malicious party sending invalid packets on purpose.

In addition these warnings were noticed by projects like Tunneldigger [1], which uses L2TP for its data path, but implements its own control protocol (which is sufficiently different from L2TP data packets that it would always be passed up to userspace even with future extensions of L2TP).

Some of the warnings were already redundant, as l2tp_stats has a counter for these packets. This commit adds one additional counter for invalid packets that are passed up to userspace. Packets with unknown session are not counted as invalid, as there is nothing wrong with the format of these packets.

With the additional counter, all of these messages are either redundant or benign, so we reduce them to pr_debug_ratelimited().

[1] https://github.com/wlanslovenija/tunneldigger/issues/160

Fixes: 5ee759cd ("l2tp: use standard API for warning log messages")
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Balazs Nemeth
stable inclusion
from stable-5.10.24
commit 453fff24f52eeb62ab65582848498097273df269
bugzilla: 51348

--------------------------------

commit d348ede3 upstream.

A packet with skb_inner_network_header(skb) == skb_network_header(skb) and ETH_P_MPLS_UC will prevent mpls_gso_segment from pulling any headers from the packet. Subsequently, the call to skb_mac_gso_segment will again call mpls_gso_segment with the same packet leading to an infinite loop. In addition, ensure that the header length is a multiple of four, which should hold irrespective of the number of stacked labels.

Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
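The two conditions named in this commit message (a non-empty MPLS header and a length that is a multiple of four) can be sketched as a small validity check. This is a Python illustration under stated assumptions: `mpls_hlen_ok` and its offset parameters are hypothetical, and the only fact relied on is that each MPLS label stack entry is 4 bytes long.

```python
MPLS_LABEL_BYTES = 4  # size of one MPLS label stack entry


def mpls_hlen_ok(network_offset, inner_network_offset):
    """Return True if the MPLS header between the two offsets is plausible.

    A zero-length header means no progress would be made when pulling
    headers (the infinite-loop case); a length that is not a multiple of
    four cannot be a whole number of stacked labels.
    """
    hlen = inner_network_offset - network_offset
    return hlen > 0 and hlen % MPLS_LABEL_BYTES == 0
```

For example, offsets 14 and 14 describe the looping packet from the report and are rejected, while offsets 14 and 22 correspond to two stacked labels and pass.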
-
Committed by Balazs Nemeth
stable inclusion
from stable-5.10.24
commit faa3baa2828c5e1c4374f3e60041f75c64f5fcb6
bugzilla: 51348

--------------------------------

commit 924a9bc3 upstream.

For gso packets, virtio_net_hdr_set_proto sets the protocol (if it isn't set) based on the type in the virtio net hdr, but the skb could contain anything since it could come from packet_snd through a raw socket. If there is a mismatch between what virtio_net_hdr_set_proto sets and the actual protocol, then the skb could be handled incorrectly later on.

An example where this poses an issue is with the subsequent call to skb_flow_dissect_flow_keys_basic which relies on skb->protocol being set correctly. A specially crafted packet could fool skb_flow_dissect_flow_keys_basic, preventing EINVAL from being returned.

Avoid blindly trusting the information provided by the virtio net header by checking that the protocol in the packet actually matches the protocol set by virtio_net_hdr_set_proto. Note that since the protocol is only checked if skb->dev implements header_ops->parse_protocol, packets from devices without the implementation are not checked at this stage.

Fixes: 9274124f ("net: stricter validation of untrusted gso packets")
Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Daniel Borkmann
stable inclusion
from stable-5.10.24
commit 09af4362ba47c805347840c2bb9719c0458925ca
bugzilla: 51348

--------------------------------

commit 89e5c58f upstream.

We noticed a GRO issue for UDP-based encaps such as vxlan/geneve when the csum for the UDP header itself is 0. In that case, GRO aggregation does not take place on the phys dev, but instead is deferred to the vxlan/geneve driver (see trace below).

The reason is essentially that GRO aggregation bails out in udp_gro_receive() for such case when drivers marked the skb with CHECKSUM_UNNECESSARY (ice, i40e, others) where for non-zero csums 2abb7cdc ("udp: Add support for doing checksum unnecessary conversion") promotes those skbs to CHECKSUM_COMPLETE and napi context has csum_valid set. This is however not the case for zero UDP csum (here: csum_cnt is still 0 and csum_valid continues to be false).

At the same time 57c67ff4 ("udp: additional GRO support") added matches on !uh->check ^ !uh2->check as part to determine candidates for aggregation, so it certainly is expected to handle zero csums in udp_gro_receive(). The purpose of the check added via 662880f4 ("net: Allow GRO to use and set levels of checksum unnecessary") seems to catch bad csum and stop aggregation right away.

One way to fix aggregation in the zero case is to only perform the !csum_valid check in udp_gro_receive() if uh->check is in fact non-zero.

Before:

  [...]
  swapper 0 [008] 731.946506: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497100400 len=1500 (1)
  swapper 0 [008] 731.946507: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497100200 len=1500
  swapper 0 [008] 731.946507: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497101100 len=1500
  swapper 0 [008] 731.946508: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497101700 len=1500
  swapper 0 [008] 731.946508: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497101b00 len=1500
  swapper 0 [008] 731.946508: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497100600 len=1500
  swapper 0 [008] 731.946508: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497100f00 len=1500
  swapper 0 [008] 731.946509: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497100a00 len=1500
  swapper 0 [008] 731.946516: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497100500 len=1500
  swapper 0 [008] 731.946516: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497100700 len=1500
  swapper 0 [008] 731.946516: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497101d00 len=1500 (2)
  swapper 0 [008] 731.946517: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497101000 len=1500
  swapper 0 [008] 731.946517: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497101c00 len=1500
  swapper 0 [008] 731.946517: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497101400 len=1500
  swapper 0 [008] 731.946518: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497100e00 len=1500
  swapper 0 [008] 731.946518: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497101600 len=1500
  swapper 0 [008] 731.946521: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff966497100800 len=774
  swapper 0 [008] 731.946530: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff966497100400 len=14032 (1)
  swapper 0 [008] 731.946530: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff966497101d00 len=9112 (2)
  [...]

  # netperf -H 10.55.10.4 -t TCP_STREAM -l 20
  MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.55.10.4 () port 0 AF_INET : demo
  Recv   Send    Send
  Socket Socket  Message  Elapsed
  Size   Size    Size     Time     Throughput
  bytes  bytes   bytes    secs.    10^6bits/sec

   87380  16384  16384    20.01    13129.24

After:

  [...]
  swapper 0 [026] 521.862641: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff93ab0d479000 len=11286 (1)
  swapper 0 [026] 521.862643: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d479000 len=11236 (1)
  swapper 0 [026] 521.862650: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff93ab0d478500 len=2898 (2)
  swapper 0 [026] 521.862650: net:netif_receive_skb: dev=enp10s0f0 skbaddr=0xffff93ab0d479f00 len=8490 (3)
  swapper 0 [026] 521.862653: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d478500 len=2848 (2)
  swapper 0 [026] 521.862653: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d479f00 len=8440 (3)
  [...]

  # netperf -H 10.55.10.4 -t TCP_STREAM -l 20
  MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.55.10.4 () port 0 AF_INET : demo
  Recv   Send    Send
  Socket Socket  Message  Elapsed
  Size   Size    Size     Time     Throughput
  bytes  bytes   bytes    secs.    10^6bits/sec

   87380  16384  16384    20.01    24576.53

Fixes: 57c67ff4 ("udp: additional GRO support")
Fixes: 662880f4 ("net: Allow GRO to use and set levels of checksum unnecessary")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tom Herbert <tom@herbertland.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20210226212248.8300-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
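The change in the bail-out condition described in this commit message can be expressed as a pair of predicates. The Python sketch below is a deliberate simplification: it models only the checksum-related part of the decision, the function names are hypothetical, and the other conditions that udp_gro_receive() evaluates are left out.

```python
def bails_out_before(uh_check, csum_cnt, csum_valid):
    """Old behaviour: skip GRO whenever the checksum was not validated."""
    return csum_cnt == 0 and not csum_valid


def bails_out_after(uh_check, csum_cnt, csum_valid):
    """New behaviour: apply the same check only for non-zero UDP checksums."""
    return uh_check != 0 and csum_cnt == 0 and not csum_valid
```

A packet with a zero UDP checksum (`uh_check == 0`, `csum_cnt == 0`, `csum_valid == False`) is excluded from aggregation by the first predicate but not by the second, which matches the larger aggregated lengths on the physical device in the "After" trace.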
-
Committed by Felix Fietkau
stable inclusion
from stable-5.10.24
commit d2fb1911a7a8f655440d613fc8946df384d83ee5
bugzilla: 51348

--------------------------------

commit 3b9ea720 upstream.

When transmitting to a receiver in dynamic SMPS mode, all transmissions that use multiple spatial streams need to be sent using CTS-to-self or RTS/CTS to give the receiver's extra chains some time to wake up. This fixes the tx rate getting stuck at <= MCS7 for some clients, especially Intel ones, which make aggressive use of SMPS.

Cc: stable@vger.kernel.org
Reported-by: Martin Kennedy <hurricos@gmail.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210214184911.96702-1-nbd@nbd.name
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Maciej W. Rozycki
stable inclusion
from stable-5.10.24
commit b0454a28f60878539a55439436ea9ad29728d366
bugzilla: 51348

--------------------------------

commit 6c810cf2 upstream.

The MIPS Poly1305 implementation is generic MIPS code written such as to support down to the original MIPS I and MIPS III ISA for the 32-bit and 64-bit variant respectively. Lift the current limitation then to enable code for MIPSr1 ISA or newer processors only and have it available for all MIPS processors.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Fixes: a11d055e ("crypto: mips/poly1305 - incorporate OpenSSL/CRYPTOGAMS optimized implementation")
Cc: stable@vger.kernel.org # v5.5+
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Jakub Kicinski
stable inclusion
from stable-5.10.24
commit a0df424a863aa6a2e8bd57ef5e0928da5d5b797f
bugzilla: 51348

--------------------------------

commit a4dcfbc4 upstream.

netif_device_attach() will unpause the queues so we can't call it before __alx_open(). This went undetected until commit b0999223 ("alx: add ability to allocate and free alx_napi structures") but now if stack tries to xmit immediately on resume before __alx_open() we'll crash on the NAPI being null:

  BUG: kernel NULL pointer dereference, address: 0000000000000198
  CPU: 0 PID: 12 Comm: ksoftirqd/0 Tainted: G OE 5.10.0-3-amd64 #1 Debian 5.10.13-1
  Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H77-D3H, BIOS F15 11/14/2013
  RIP: 0010:alx_start_xmit+0x34/0x650 [alx]
  Code: 41 56 41 55 41 54 55 53 48 83 ec 20 0f b7 57 7c 8b 8e b0 0b 00 00 39 ca 72 06 89 d0 31 d2 f7 f1 89 d2 48 8b 84 df
  RSP: 0018:ffffb09240083d28 EFLAGS: 00010297
  RAX: 0000000000000000 RBX: ffffa04d80ae7800 RCX: 0000000000000004
  RDX: 0000000000000000 RSI: ffffa04d80afa000 RDI: ffffa04e92e92a00
  RBP: 0000000000000042 R08: 0000000000000100 R09: ffffa04ea3146700
  R10: 0000000000000014 R11: 0000000000000000 R12: ffffa04e92e92100
  R13: 0000000000000001 R14: ffffa04e92e92a00 R15: ffffa04e92e92a00
  FS: 0000000000000000(0000) GS:ffffa0508f600000(0000) knlGS:0000000000000000
  i915 0000:00:02.0: vblank wait timed out on crtc 0
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000198 CR3: 000000004460a001 CR4: 00000000001706f0
  Call Trace:
   dev_hard_start_xmit+0xc7/0x1e0
   sch_direct_xmit+0x10f/0x310

Cc: <stable@vger.kernel.org> # 4.9+
Fixes: bc2bebe8 ("alx: remove WoL support")
Reported-by: Zbynek Michl <zbynek.michl@gmail.com>
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=983595
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Zbynek Michl <zbynek.michl@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Greg Kurz
stable inclusion
from stable-5.10.24
commit a9c55f22a0b978d636204509c4edaf511cb20f62
bugzilla: 51348

--------------------------------

commit f9619d5e upstream.

Depending on the number of online CPUs in the original kernel, it is likely for CPU #0 to be offline in a kdump kernel. The associated IRQs in the affinity mappings provided by irq_create_affinity_masks() are thus not started by irq_startup(), as per design with managed IRQs.

This can be a problem with multi-queue block devices driven by blk-mq: such a non-started IRQ is very likely paired with the single queue enforced by blk-mq during kdump (see blk_mq_alloc_tag_set()). This causes the device to remain silent and likely hangs the guest at some point.

This is a regression caused by commit 9ea69a55 ("powerpc/pseries: Pass MSI affinity to irq_create_mapping()"). Note that this only happens with the XIVE interrupt controller because XICS has a workaround to bypass affinity, which is activated during kdump with the "noirqdistrib" kernel parameter.

The issue comes from a combination of factors:
- discrepancy between the number of queues detected by the multi-queue block driver, that was used to create the MSI vectors, and the single queue mode enforced later on by blk-mq because of kdump (i.e. keeping all queues fixes the issue)
- CPU#0 offline (i.e. kdump always succeeds with CPU#0)

Given that I couldn't reproduce on x86, which seems to always have CPU#0 online even during kdump, I'm not sure where this should be fixed. Hence going for another approach: fine-grained affinity is for performance and we don't really care about that during kdump. Simply revert to the previous working behavior of ignoring affinity masks in this case only.

Fixes: 9ea69a55 ("powerpc/pseries: Pass MSI affinity to irq_create_mapping()")
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210215094506.1196119-1-groug@kaod.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Athira Rajeev
stable inclusion
from stable-5.10.24
commit ac022fbee6855dc6304a9e63e481859b2589836d
bugzilla: 51348

--------------------------------

commit 5ae5fbd2 upstream.

Running "perf mem record" on powerpc platforms with selinux enabled resulted in soft lockups. The call trace below was seen in the logs:

 CPU: 58 PID: 3751 Comm: sssd_nss Not tainted 5.11.0-rc7+ #2
 NIP: c000000000dff3d4 LR: c000000000dff3d0 CTR: 0000000000000000
 REGS: c000007fffab7d60 TRAP: 0100 Not tainted (5.11.0-rc7+)
 ...
 NIP _raw_spin_lock_irqsave+0x94/0x120
 LR _raw_spin_lock_irqsave+0x90/0x120
 Call Trace:
  0xc00000000fd47260 (unreliable)
  skb_queue_tail+0x3c/0x90
  audit_log_end+0x6c/0x180
  common_lsm_audit+0xb0/0xe0
  slow_avc_audit+0xa4/0x110
  avc_has_perm+0x1c4/0x260
  selinux_perf_event_open+0x74/0xd0
  security_perf_event_open+0x68/0xc0
  record_and_restart+0x6e8/0x7f0
  perf_event_interrupt+0x22c/0x560
  performance_monitor_exception+0x4c/0x60
  performance_monitor_common_virt+0x1c8/0x1d0
 interrupt: f00 at _raw_spin_lock_irqsave+0x38/0x120
 NIP: c000000000dff378 LR: c000000000b5fbbc CTR: c0000000007d47f0
 REGS: c00000000fd47860 TRAP: 0f00 Not tainted (5.11.0-rc7+)
 ...
 NIP _raw_spin_lock_irqsave+0x38/0x120
 LR skb_queue_tail+0x3c/0x90
 interrupt: f00
  0x38 (unreliable)
  0xc00000000aae6200
  audit_log_end+0x6c/0x180
  audit_log_exit+0x344/0xf80
  __audit_syscall_exit+0x2c0/0x320
  do_syscall_trace_leave+0x148/0x200
  syscall_exit_prepare+0x324/0x390
  system_call_common+0xfc/0x27c

The above trace shows that while the CPU was handling a performance monitor exception, there was a call to the security_perf_event_open() function. In powerpc core-book3s, this function is called from the perf_allow_kernel() check during recording of the data address in the sample via perf_get_data_addr().

Commit da97e184 ("perf_event: Add support for LSM and SELinux checks") introduced security enhancements to perf. As part of this commit, the new security hook for perf_event_open() was added in all places where the perf paranoid check was previously used. The powerpc core-book3s code originally had paranoid checks in perf_get_data_addr() and power_pmu_bhrb_read(). So the perf_paranoid_kernel() checks were replaced with perf_allow_kernel() in these PMU helper functions as well.

The intention of the paranoid checks in core-book3s was to verify privileged access before capturing some of the sample data. Along with the paranoid checks, perf_allow_kernel() also does a security_perf_event_open(). Since these functions are accessed while recording a sample, we end up calling selinux_perf_event_open() in PMI context. Some of the security functions, like sidtab_sid2str_put(), use a spinlock. If a perf interrupt hits under a spin lock and we end up calling selinux hook functions in the PMI handler, this could cause a deadlock.

Since the purpose of this security hook is to control access to perf_event_open(), it is not right to call it in interrupt context.

The paranoid checks in powerpc core-book3s were done at interrupt time, which is also not correct.

Reference commits:
Commit cd1231d7 ("powerpc/perf: Prevent kernel address leak via perf_get_data_addr()")
Commit bb19af81 ("powerpc/perf: Prevent kernel address leak to userspace via BHRB buffer")

We only allow creation of events that have already passed the privilege checks in perf_event_open(). So these paranoid checks are not needed at event time. As a fix, the patch uses the 'event->attr.exclude_kernel' check to prevent exposing kernel addresses for userspace-only sampling.

Fixes: cd1231d7 ("powerpc/perf: Prevent kernel address leak via perf_get_data_addr()")
Cc: stable@vger.kernel.org # v4.17+
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1614247839-1428-1-git-send-email-atrajeev@linux.vnet.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
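The exclude_kernel check described in this message can be pictured with a small Python model. This is only a sketch, not the kernel code: the KERNEL_START constant, the helper names, and the choice to report 0 for a hidden address are assumptions made for illustration.

```python
# Assumption for this sketch: kernel addresses on 64-bit powerpc start here.
KERNEL_START = 0xC000000000000000


def is_kernel_addr(addr: int) -> bool:
    """Hypothetical stand-in for the kernel's address-range test."""
    return addr >= KERNEL_START


def sampled_data_addr(raw_addr: int, exclude_kernel: bool) -> int:
    """Return the address to store in a sample.

    No security hook is consulted here: the decision depends only on the
    event attribute that was already validated when the event was opened.
    """
    if exclude_kernel and is_kernel_addr(raw_addr):
        return 0  # assumption: a hidden kernel address is reported as 0
    return raw_addr
```

The point of the design is that the check is a plain attribute test, so it takes no locks and is safe to run from the performance monitor interrupt.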
-
Committed by Dmitry V. Levin
stable inclusion
from stable-5.10.24
commit 7732f57f0f523509b0b405ad2a0271f4016a4b45
bugzilla: 51348

--------------------------------

commit c33cb002 upstream.

Apparently, <linux/netfilter/nfnetlink_cthelper.h> and <linux/netfilter/nfnetlink_acct.h> could not be included into the same compilation unit because of a cut-and-paste typo in the former header.

Fixes: 12f7a505 ("netfilter: add user-space connection tracking helper infrastructure")
Cc: <stable@vger.kernel.org> # v3.6
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Zhang Ming
openEuler inclusion
category: bugfix
bugzilla: 48265
CVE: NA
Reference: https://gitee.com/openeuler/kernel/issues/I3BPPX

---------------------------------------------------

The default branch of the switch is not reached at present, but related extensions in the future could reach it, which may lead to a memory leak.

Signed-off-by: Zhang Ming <154842638(a)qq.com>
Reported-by: Wang ShaoBo <bobo.shaobowang(a)huawei.com>
Suggested-by: Jian Cheng <cj.chengjian(a)huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
[Zheng Zengkai: adjust commit message]
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Wang ShaoBo
hulk inclusion
category: bugfix
bugzilla: 48265
CVE: NA

--------------------------------

Work around cacheinfo's uninitialized info_list error in some special cases, such as free_cache_attributes() freeing info_list but not setting num_leaves to zero when PPTT is not supported. This workaround stays until the upstream issue is resolved.

Fixes: 950e5edb ("drivers: base: cacheinfo: Add helper to search cacheinfo by of_node")
Fixes: 709c4362 ("cacheinfo: Move resctrl's get_cache_id() to the cacheinfo header file")
Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Jian Cheng <cj.chengjian@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Wang ShaoBo
hulk inclusion
category: feature
feature: ARM MPAM support
bugzilla: 48265
CVE: NA

--------------------------------

Enable MPAM by default.

Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Wang ShaoBo
hulk inclusion
category: bugfix
bugzilla: 48265
CVE: NA

--------------------------------

When a cpu comes online, domains inserted into the resctrl_resource structure's domains list may be out of order, so sort them by domain id.

Fixes: 2e2c511ff49d ("arm64/mpam: resctrl: Handle cpuhp and resctrl_dom allocation")
Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Jian Cheng <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
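Keeping the domains list ordered by id can be sketched as an ordered insert, shown here as a toy Python model. The Domain class and the function name are hypothetical; the kernel walks a linked list rather than a Python list.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    """Hypothetical stand-in for a resctrl domain; only the id matters here."""
    id: int


def insert_domain_sorted(domains: list, new: Domain) -> None:
    """Insert `new` before the first domain with a larger id.

    Walking the list and inserting in place keeps it sorted no matter in
    which order the cpus (and hence the domains) come online.
    """
    for pos, dom in enumerate(domains):
        if dom.id > new.id:
            domains.insert(pos, new)
            return
    domains.append(new)
```

Inserting ids in the order 2, 0, 3, 1 leaves the list ordered 0, 1, 2, 3.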
-
Committed by Wang ShaoBo
hulk inclusion
category: bugfix
bugzilla: 48265
CVE: NA

--------------------------------

This fixes two problems:

1) When a cpu goes offline, we should clear it from the cpu mask of every associated resctrl group, not only the default group.

2) When a cpu comes online, we should set it in the default group's cpu mask and, if cdp is on, update the default group's cpus to the default state; this fills the code and data fields of the mpam sysregs with appropriate values.

Fixes: 2e2c511ff49d ("arm64/mpam: resctrl: Handle cpuhp and resctrl_dom allocation")
Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Jian Cheng <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
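The cpu mask handling on hotplug can be illustrated with a toy Python model. This is a sketch under stated assumptions: groups are modelled as a dict from a group name to a set of cpu numbers, the names are hypothetical, and the cdp sysreg update is left out.

```python
def cpu_offline(groups: dict, cpu: int) -> None:
    """Clear the cpu from every group's mask, not only the default group's."""
    for cpus in groups.values():
        cpus.discard(cpu)


def cpu_online(groups: dict, cpu: int, default_group: str = "default") -> None:
    """A cpu that comes online starts out in the default group."""
    groups[default_group].add(cpu)
```

With this model, a cpu that was assigned to a non-default group, goes offline, and comes back online ends up in the default group only.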
-
Committed by Wang ShaoBo
hulk inclusion
category: bugfix
bugzilla: 48265
CVE: NA

--------------------------------

Unlike mbw max (Memory Bandwidth Maximum), sometimes we don't want to make use of the mbw min feature (which restricts the memory bandwidth partition through MPAMCFG_MBW_MIN, shown as the MBMIN row in schemata) and want to set MPAMCFG_MBW_MIN to 0.

e.g.
 > mount -t resctrl resctrl /sys/fs/resctrl/ -o mbMin
 > cd resctrl/ && cat schemata
   L3:0=7fff;1=7fff;2=7fff;3=7fff
   MBMIN:0=0;1=0;2=0;3=0

 # before revision
 > echo 'MBMIN:0=0;1=0;2=0;3=0' > schemata
 > cat schemata
   L3:0=7fff;1=7fff;2=7fff;3=7fff
   MBMIN:0=2;1=2;2=2;3=2

 # after revision
 > echo 'MBMIN:0=0;1=0;2=0;3=0' > schemata
 > cat schemata
   L3:0=7fff;1=7fff;2=7fff;3=7fff
   MBMIN:0=0;1=0;2=0;3=0

Fixes: 5a49c4f1983d ("arm64/mpam: Supplement additional useful ctrl features for mount options")
Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Wang ShaoBo
hulk inclusion
category: bugfix
bugzilla: 48265
CVE: NA

--------------------------------

When we support configuring different control types for a resource, a wrong history value is written back to the default group after remounting.

e.g.
 > mount -t resctrl resctrl /sys/fs/resctrl/ -o mbMax,mbMin && cd resctrl/
 > echo 'MBMIN:0=2;1=2;2=2;3=2' > schemata
 > cat schemata
   L3:0=7fff;1=7fff;2=7fff;3=7fff
   MBMAX:0=100;1=100;2=100;3=100
   MBMIN:0=2;1=2;2=2;3=2
 > cd .. && umount /sys/fs/resctrl/
 > mount -t resctrl resctrl /sys/fs/resctrl/ -o mbMax,mbMin && cd resctrl/ && cat schemata
   L3:0=7fff;1=7fff;2=7fff;3=7fff
   MBMAX:0=100;1=100;2=100;3=100
   MBMIN:0=0;1=0;2=0;3=0
 > echo 'MBMAX:0=10;1=10;2=10;3=10' > schemata
 > cat schemata
   L3:0=7fff;1=7fff;2=7fff;3=7fff
   MBMAX:0=10;1=10;2=10;3=10
   MBMIN:0=2;1=2;2=2;3=2 #wrong history value is updated

When writing the schemata sysfile, the call path looks like this:

 resctrl_group_schemata_write()
   -=> resctrl_update_groups_config()
     -=> resctrl_group_update_domains()
       -=> resctrl_group_update_domain_ctrls()
           {
               .../*refresh new_ctrl array of supported conf type once for each resource*/
           }

We should refresh the new_ctrl field in struct resctrl_staged_config through resctrl_group_init_alloc() before calling resctrl_group_update_domain_ctrls().

Fixes: 6b2471f089be ("arm64/mpam: resctrl: Support priority and hardlimit(Memory bandwidth) configuration")
Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
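The stale-value problem in this message can be reproduced with a toy Python model. It is a sketch only: the two dicts stand in for the applied configuration and the staged new_ctrl values, and the function name is hypothetical.

```python
def write_schemata(applied: dict, staged: dict, updates: dict) -> None:
    """Apply a schemata write that touches only some control types.

    `staged` plays the role of the staged new_ctrl values. Refreshing it
    from `applied` first means control types that are not part of this
    write keep their current values instead of stale history.
    """
    staged.clear()
    staged.update(applied)   # the refresh step this fix adds
    staged.update(updates)   # values from the line written to schemata
    applied.update(staged)   # push every staged control type
```

In the scenario from the transcript, the staged copy still holds MBMIN=2 from before the remount while the applied value is 0. Writing only MBMAX=10 then leaves MBMIN at 0, because the staged copy is refreshed before it is pushed.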
-
Committed by Wang ShaoBo
hulk inclusion
category: bugfix
bugzilla: 48265
CVE: NA

--------------------------------

This function is called only when we mount resctrl sysfs. For error handling, we need to destroy the schemata list when one of the following steps fails after the schemata list has been created.

Fixes: 7e9b5caeefff ("arm64/mpam: resctrl: Add helpers for init and destroy schemata list")
Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Wang ShaoBo
hulk inclusion
category: bugfix
bugzilla: 48265
CVE: NA

--------------------------------

Use fs_context to parse mount options; the old parsing done by parse_rdtgroupfs_options() becomes obsolete and is removed.

Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Wang ShaoBo
hulk inclusion
category: feature
feature: ARM MPAM support
bugzilla: 48265
CVE: NA

--------------------------------

Based on 61fa56e1dd8a ("arm64/mpam: Add resctrl_ctrl_feature structure to manage ctrl features"), we add several ctrl features and supply corresponding mount options: mbPbm, mbMax, mbMin, mbPrio, caMax, caPrio and caPbm. If the MPAM system supports the relevant features, we can mount resctrl like this:

e.g.
 > mount -t resctrl resctrl /sys/fs/resctrl -o mbMax,mbMin,caPrio
 > cd /sys/fs/resctrl && cat schemata
   L3:0=0x7fff;1=0x7fff;2=0x7fff;3=0x7fff #default select cpbm as basic ctrl feature
   L3PRI:0=3;1=3;2=3;3=3
   MBMAX:0=100;1=100;2=100;3=100
   MBMIN:0=0;1=0;2=0;3=0

 > mount -t resctrl resctrl /sys/fs/resctrl
 > cd /sys/fs/resctrl && cat schemata
   L3:0=0x7fff;1=0x7fff;2=0x7fff;3=0x7fff #default select cpbm as basic ctrl feature
   MB:0=100;1=100;2=100;3=100 #default select mbw max as basic ctrl feature

 > mount -t resctrl resctrl /sys/fs/resctrl -o caMax
 > cd /sys/fs/resctrl && cat schemata
   L3:0=33554432;1=33554432;2=33554432;3=33554432 #use cmax ctrl feature
   MB:0=100;1=100;2=100;3=100 #default select mbw max as basic ctrl feature

For Cache MSCs, the basic ctrl features include cmax (Cache Maximum Capacity) and cpbm (Cache portion bitmap) partition; if mount options are not specified, cpbm is selected by default.

For Memory MSCs, the basic ctrl features include max (Memory Bandwidth Maximum) and pbm (Memory Bandwidth Portion Bitmap) partition; if mount options are not specified, max is selected by default.

The above mount options can also be used together with cdp options.

e.g.
 > mount -t resctrl resctrl /sys/fs/resctrl -o caMax,caPrio,cdpl3
 > cd /sys/fs/resctrl && cat schemata
   L3CODE:0=33554432;1=33554432;2=33554432;3=33554432 #code use cmax ctrl feature
   L3DATA:0=33554432;1=33554432;2=33554432;3=33554432 #data use cmax ctrl feature
   L3CODEPRI:0=3;1=3;2=3;3=3 #code use intpriority ctrl feature
   L3DATAPRI:0=3;1=3;2=3;3=3 #data use intpriority ctrl feature
   MB:0=100;1=100;2=100;3=100 #default select mbw max as basic ctrl feature

By combining these mount parameters we can use MPAM more powerfully.

Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
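How a comma-separated option string such as the ones above might be split into ctrl features and cdp options can be sketched in Python. This is a toy model, not the kernel's fs_context parser: the function name is hypothetical, only the option names that appear in this message are accepted, and rejecting unknown options with an error is an assumption.

```python
# Option names taken from the commit message; nothing else is assumed valid.
CTRL_FEATURE_OPTIONS = {
    "mbPbm", "mbMax", "mbMin", "mbPrio", "caMax", "caPrio", "caPbm",
}
CDP_OPTIONS = {"cdpl3"}


def parse_mount_options(option_string: str):
    """Split a '-o' option string into (ctrl features, cdp options)."""
    features, cdp = set(), set()
    for opt in filter(None, option_string.split(",")):
        if opt in CTRL_FEATURE_OPTIONS:
            features.add(opt)
        elif opt in CDP_OPTIONS:
            cdp.add(opt)
        else:
            raise ValueError(f"unsupported mount option: {opt}")
    return features, cdp
```

An empty option string yields two empty sets, which corresponds to the case where the defaults (cpbm for cache, max for memory bandwidth) apply.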
-
Committed by Wang ShaoBo
hulk inclusion
category: feature
feature: ARM MPAM support
bugzilla: 48265
CVE: NA

--------------------------------

Sometimes monitoring shows anomalies like this:

e.g.
 > cd /sys/fs/resctrl/ && grep . mon_data/*
   mon_data/mon_L3CODE_00:14336
   mon_data/mon_L3CODE_01:344064
   mon_data/mon_L3CODE_02:2048
   mon_data/mon_L3CODE_03:27648
   mon_data/mon_L3DATA_00:0 #L3DATA's monitoring data always be 0
   mon_data/mon_L3DATA_01:0
   mon_data/mon_L3DATA_02:0
   mon_data/mon_L3DATA_03:0
   mon_data/mon_MB_00:392
   mon_data/mon_MB_01:552
   mon_data/mon_MB_02:160
   mon_data/mon_MB_03:0

If cdp is on, tasks in the resctrl default group with closid=0 and rmid=0 do not get proper partid_i/pmg_i and partid_d/pmg_d filled into the MPAMx_ELx sysregs by mpam_sched_in(), which is called from __switch_to(). This is because the current cpu's default closid and rmid are also equal to 0, so the operation that modifies the configuration is passed over.

Updating the per-cpu default closid to a non-zero value and calling update_closid_rmid() to set the proper partid and pmg in each cpu's MPAMx_ELx sysregs when mounting resctrl sysfs looks like a practical method.

Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Wang ShaoBo
hulk inclusion
category: feature
feature: ARM MPAM support
bugzilla: 48265
CVE: NA

--------------------------------

MPAM includes partid, pmg and monitor, which we collectively call the mpam id. If cdp is on, we allocate a new mpamid_new equal to mpamid + 1. In some places mpamid does not need to be encapsulated into struct { u16 val; }, so for simplicity we use a simpler macro, resctrl_cdp_mpamid_map_val(), to complete this cdp mapping process.

Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Wang ShaoBo
hulk inclusion
category: feature
feature: ARM MPAM support
bugzilla: 48265
CVE: NA

--------------------------------

The ctrl_features array, introduced by 61fa56e1dd8a ("arm64/mpam: Add resctrl_ctrl_feature structure to manage ctrl features"), lives in the raw_resctrl_resource structure and lists all ctrl feature types we support for this resource. It filters out illegal parameters from mount options and provides useful info to add_schema() for registering a new control type node in the schema list.

This makes it easier to add new ctrl features later.

Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Wang ShaoBo
hulk inclusion
category: feature
feature: ARM MPAM support
bugzilla: 48265
CVE: NA

--------------------------------

rmid is used to mark each resctrl group for monitoring, and it also follows the corresponding resctrl group's configuration. We export an rmid sysfile to resctrl sysfs for use elsewhere, such as SMMU io: a user can get the rmid from a resctrl group and set it on a target io through the SMMU driver if SMMU MPAM is implemented, so that the related io devices can be monitored or given the intended configuration for resource usage.

Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Wang ShaoBo
hulk inclusion
category: feature
feature: ARM MPAM support
bugzilla: 48265
CVE: NA

--------------------------------

So far, some declarations shared by resctrlfs.c and the mpam core module files under the kernel/mpam directory are scattered across mpam.h and resctrl.h. They are organized like this:

-- asm/
                       +-- resctrl.h
                       +    +-- mpam.h
                       |    +    +-- mpam_resource.h
                       |    |    +
                       |    |    |
-- fs/                 |    |    +-> mpam/
  +-- resctrlfs.c <----+----+------> +-- mpam_resctrl.c
                                     ...

We move the declarations shared by resctrlfs.c and mpam/ to resctrl.h and split the other declarations into mpam_internal.h; we also move mpam_resource.h to the mpam/ directory. They are now organized like this:

-- asm/
                       +-- mpam.h +----> export to other modules(e.g. SMMU master io)
                       +-- resctrl.h
                       +
                       | -- mpam/
                       |    +-- mpam_internal.h
                       |    +    +-- mpam_resource.h
                       |    |    +
                       |    |    |
-- fs/                 |    +----+-> mpam/
  +-- resctrlfs.c <----+-----------> +-- mpam_resctrl.c
                                     ...

In this way we can build a clearer framework for MPAM usage.

Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Wang ShaoBo
hulk inclusion
category: feature
feature: ARM MPAM support
bugzilla: 48265
CVE: NA

--------------------------------

Some resource properties such as closid and rmid are exported as in Intel-RDT in our resctrl design, but there are two main differences. One is MB (Memory Bandwidth): for us MB is also divided into two directories, MB and MB_MON, to show the respective control and monitor properties, the same as LxCache. The other is that we adopt a features sysfile under each resource's directory, which indicates the properties of the control types of the corresponding resource, for instance MB hardlimit.

e.g.
 > mount -t resctrl resctrl /sys/fs/resctrl -o mbHdl
 > cd /sys/fs/resctrl/ && cat info/MB/features
   mbHdl@1 #indicate MBHDL setting's upper bound is 1
 > cat schemata
   L3:0=7fff;1=7fff;2=7fff;3=7fff
   MB:0=100;1=100;2=100;3=100
   MBHDL:0=1;1=1;2=1;3=1

Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Wang ShaoBo
hulk inclusion
category: feature
feature: ARM MPAM support
bugzilla: 48265
CVE: NA

--------------------------------

The resctrl_ctrl_feature structure, held by resources, is introduced to manage ctrl features and their characteristics, such as the max width of external input and the base we parse it in. This makes it more practical to declare a new ctrl feature, such as the SCHEMA_PRI feature, which is only associated with the internal priority setting exported by mpam devices; its information is collected in mpam_resctrl_resource_init() and it is then enabled or disabled through user options.

The ctrl_ctrl_feature structure contains a flags field to avoid duplicated control types. For instance, the SCHEMA_COMM feature selects cpbm (Cache portion bitmap) as the default control type of the Cache resource, so we should no longer enable this feature if the user manually selects the cpbm control type through mount options.

The evt field in the ctrl_ctrl_feature structure is an enum rdt_event_id variable, which works as eee4ad2a36e6 ("arm64/mpam: Add hook-events id for ctrl features") illustrates.

Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Wang ShaoBo
hulk inclusion
category: feature
feature: ARM MPAM support
bugzilla: 48265
CVE: NA

--------------------------------

For MPAM, an rmid can do monitoring work only with a monitor resource allocated. We adopt a mechanism for dynamic allocation and recycling of monitor resources. It differs from Intel-RDT, which creates a kworker thread that dynamically monitors Cache usage and checks whether it is below an adjustable threshold before freeing an rmid; we have found that this method affects cpu utilization in many cases, and sometimes this influence cannot be accepted.

Our method is simple. As the number of monitors varies between resources, we deliver two lists: one stores rmids that have an exclusive monitor resource, and the other stores rmids that share a monitor resource; this shared monitor id is always 0. It works like this: if a new rmid applies for a resource monitor that is in use, we put the rmid at the tail of the latter list and temporarily give it the default monitor id 0 until someone releases an available monitor resource. If the new rmid has the monitor resources it needs on all resources, it is put into the exclusive list.

This implements LRU allocation of monitor resources and gives users partial control over allocation and release. If the number of resctrl groups can be guaranteed, or the user does not need to monitor too many groups simultaneously, this is a more appropriate way for user deployment. It also avoids the risk of inaccurate monitoring when monitoring happens on too many groups at the same time.

Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
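The two-list scheme can be sketched as a small Python model for a single resource. This is a simplified illustration under stated assumptions, not the kernel implementation: the class and method names are hypothetical, monitor 0 is treated as reserved for sharing, and promoting the oldest waiting rmid when a monitor is released is an assumption about how the waiting list is drained.

```python
from collections import deque


class MonitorPool:
    """Toy model of exclusive and shared monitor lists for one resource."""

    SHARED_MON = 0  # the shared monitor id is always 0

    def __init__(self, nr_monitors: int):
        self.free = deque(range(1, nr_monitors))  # assumption: 0 is reserved
        self.exclusive = {}    # rmid -> exclusive monitor id
        self.shared = deque()  # rmids on the shared monitor, oldest first

    def monitor_of(self, rmid: int) -> int:
        return self.exclusive.get(rmid, self.SHARED_MON)

    def alloc(self, rmid: int) -> int:
        """Give rmid an exclusive monitor if one is free, else share 0."""
        if self.free:
            self.exclusive[rmid] = self.free.popleft()
        else:
            self.shared.append(rmid)  # tail of the shared list
        return self.monitor_of(rmid)

    def release(self, rmid: int) -> None:
        """Free rmid; hand its monitor to the oldest waiter, if any."""
        if rmid in self.exclusive:
            mon = self.exclusive.pop(rmid)
            if self.shared:
                self.exclusive[self.shared.popleft()] = mon
            else:
                self.free.append(mon)
        else:
            self.shared.remove(rmid)
```

With three monitors, the first two rmids get monitors 1 and 2 and the third falls back to monitor 0 until one of the first two is released. No background thread is involved; allocation state changes only when groups are created or removed.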
-
Committed by Wang ShaoBo
hulk inclusion
category: feature
feature: ARM MPAM support
bugzilla: 48265
CVE: NA

--------------------------------

So far we use sd_closid, which includes {reqpartid, intpartid}, to label each resctrl group, both ctrlgroup and mongroup. This handles the case where the number of reqpartid exceeds the number of intpartid, which always happens when intpartid narrowing is supported; otherwise the two are equal in number.

So we use the excess reqpartid to (1) indicate how configurations can be synchronized from the configuration indexed by intpartid, and (2) take part in the monitor role.

But the reqpartid in (2), together with pmg, is still scattered, and so far we have no proper way to explain how to use the two together. To ensure their resources can be fully utilized, and following the idea from Intel-RDT's design of using rmid for monitoring, an rmid remap matrix is delivered for transforming partid and pmg to rmid. This matrix is organized like this:

                 [bitmap entry indexed by partid]

                 [col pos is partid]

                 [0]  [1]  [2]  [3]  [4]  [5]
 occ->bitmap[:0]  1    0    0    1    1    1
      bitmap[:1]  1    0    0    1    1    1
      bitmap[:2]  1    1    1    1    1    1
      bitmap[:3]  1    1    1    1    1    1
 [row pos-1 is pmg]

 Calculate rmid = partid + NR_partid * pmg

occ represents whether this bitmap has been used by a partid; this is because a given partid should not be paired with a duplicated pmg for monitoring. This design saves a lot of space, and compared with using a list it also decreases the time complexity of allocating and freeing an rmid from O(NR_partid) * O(NR_pmg) to O(NR_partid) + O(log(NR_pmg)).

In this way we get a continuous rmid set with upper bound (NR_pmg * NR_partid - 1); given an rmid, we can tell whether it is valid by checking whether it falls within this range.

rmid implies the reqpartid info, so we can use the relevant helpers to get this reqpartid for sd_closid@reqpartid and accomplish the configuration sync. This also makes closid simpler, as it can consist of the intpartid index only, and each resctrl group owns consecutive rmids.

This also has further consequences. For instance, MPAM also supports SMMU io using partid and pmg, so we can use a single helper, mpam_rmid_to_partid_pmg(), in the SMMU driver to complete this remap process for an rmid input from user space.

Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
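The formula rmid = partid + NR_partid * pmg and its inverse can be checked with a few lines of Python. This is a sketch of the arithmetic only: the function names are hypothetical and the occupancy bitmaps of the remap matrix are not modelled.

```python
def partid_pmg_to_rmid(partid: int, pmg: int, nr_partid: int) -> int:
    """rmid = partid + NR_partid * pmg, as stated in the commit message."""
    return partid + nr_partid * pmg


def rmid_to_partid_pmg(rmid: int, nr_partid: int):
    """Invert the mapping: the remainder is partid, the quotient is pmg."""
    pmg, partid = divmod(rmid, nr_partid)
    return partid, pmg


def rmid_is_valid(rmid: int, nr_partid: int, nr_pmg: int) -> bool:
    """Valid rmids form the continuous range 0 .. NR_pmg * NR_partid - 1."""
    return 0 <= rmid < nr_partid * nr_pmg
```

For the matrix drawn above, with 6 partids and 3 pmgs, partid 3 with pmg 2 maps to rmid 15, and the largest valid rmid is 17. Because the mapping is a bijection on that range, an rmid received from user space can be decoded back to exactly one (partid, pmg) pair.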
-