- 12 11月, 2021 40 次提交
-
-
由 Eric Dumazet 提交于
stable inclusion from linux-4.19.207 commit 44ba281510190e2915506016407ba5b374a3add2 -------------------------------- commit 04f08eb4 upstream. syzbot reported another data-race in af_unix [1] Lets change __skb_insert() to use WRITE_ONCE() when changing skb head qlen. Also, change unix_dgram_poll() to use lockless version of unix_recvq_full() It is verry possible we can switch all/most unix_recvq_full() to the lockless version, this will be done in a future kernel version. [1] HEAD commit: 8596e589 BUG: KCSAN: data-race in skb_queue_tail / unix_dgram_poll write to 0xffff88814eeb24e0 of 4 bytes by task 25815 on cpu 0: __skb_insert include/linux/skbuff.h:1938 [inline] __skb_queue_before include/linux/skbuff.h:2043 [inline] __skb_queue_tail include/linux/skbuff.h:2076 [inline] skb_queue_tail+0x80/0xa0 net/core/skbuff.c:3264 unix_dgram_sendmsg+0xff2/0x1600 net/unix/af_unix.c:1850 sock_sendmsg_nosec net/socket.c:703 [inline] sock_sendmsg net/socket.c:723 [inline] ____sys_sendmsg+0x360/0x4d0 net/socket.c:2392 ___sys_sendmsg net/socket.c:2446 [inline] __sys_sendmmsg+0x315/0x4b0 net/socket.c:2532 __do_sys_sendmmsg net/socket.c:2561 [inline] __se_sys_sendmmsg net/socket.c:2558 [inline] __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2558 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff88814eeb24e0 of 4 bytes by task 25834 on cpu 1: skb_queue_len include/linux/skbuff.h:1869 [inline] unix_recvq_full net/unix/af_unix.c:194 [inline] unix_dgram_poll+0x2bc/0x3e0 net/unix/af_unix.c:2777 sock_poll+0x23e/0x260 net/socket.c:1288 vfs_poll include/linux/poll.h:90 [inline] ep_item_poll fs/eventpoll.c:846 [inline] ep_send_events fs/eventpoll.c:1683 [inline] ep_poll fs/eventpoll.c:1798 [inline] do_epoll_wait+0x6ad/0xf00 fs/eventpoll.c:2226 __do_sys_epoll_wait fs/eventpoll.c:2238 [inline] __se_sys_epoll_wait fs/eventpoll.c:2233 [inline] __x64_sys_epoll_wait+0xf6/0x120 fs/eventpoll.c:2233 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0x0000001b -> 0x00000001 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 25834 Comm: syz-executor.1 Tainted: G W 5.14.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Fixes: 86b18aaa ("skbuff: fix a data race in skb_queue_len()") Cc: Qian Cai <cai@lca.pw> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Baptiste Lepers 提交于
stable inclusion from linux-4.19.207 commit c09a84aea0d3902f955cc6504e1c25cddb7c48c2 -------------------------------- commit b89a05b2 upstream. In perf_event_addr_filters_apply, the task associated with the event (event->ctx->task) is read using READ_ONCE at the beginning of the function, checked, and then re-read from event->ctx->task, voiding all guarantees of the checks. Reuse the value that was read by READ_ONCE to ensure the consistency of the task struct throughout the function. Fixes: 375637bc ("perf/core: Introduce address range filtering") Signed-off-by: NBaptiste Lepers <baptiste.lepers@gmail.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210906015310.12802-1-baptiste.lepers@gmail.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Mike Rapoport 提交于
stable inclusion from linux-4.19.207 commit 2717db72f74c8e51068801b6327559241c54b86e -------------------------------- commit 34b1999d upstream. Jiri Olsa reported a fault when running: # cat /proc/kallsyms | grep ksys_read ffffffff8136d580 T ksys_read # objdump -d --start-address=0xffffffff8136d580 --stop-address=0xffffffff8136d590 /proc/kcore /proc/kcore: file format elf64-x86-64 Segmentation fault general protection fault, probably for non-canonical address 0xf887ffcbff000: 0000 [#1] SMP PTI CPU: 12 PID: 1079 Comm: objdump Not tainted 5.14.0-rc5qemu+ #508 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-4.fc34 04/01/2014 RIP: 0010:kern_addr_valid Call Trace: read_kcore ? rcu_read_lock_sched_held ? rcu_read_lock_sched_held ? rcu_read_lock_sched_held ? trace_hardirqs_on ? rcu_read_lock_sched_held ? lock_acquire ? lock_acquire ? rcu_read_lock_sched_held ? lock_acquire ? rcu_read_lock_sched_held ? rcu_read_lock_sched_held ? rcu_read_lock_sched_held ? lock_release ? _raw_spin_unlock ? __handle_mm_fault ? rcu_read_lock_sched_held ? lock_acquire ? rcu_read_lock_sched_held ? lock_release proc_reg_read ? vfs_read vfs_read ksys_read do_syscall_64 entry_SYSCALL_64_after_hwframe The fault happens because kern_addr_valid() dereferences existent but not present PMD in the high kernel mappings. Such PMDs are created when free_kernel_image_pages() frees regions larger than 2Mb. In this case, a part of the freed memory is mapped with PMDs and the set_memory_np_noalias() -> ... -> __change_page_attr() sequence will mark the PMD as not present rather than wipe it completely. Have kern_addr_valid() check whether higher level page table entries are present before trying to dereference them to fix this issue and to avoid similar issues in the future. Stable backporting note: ------------------------ Note that the stable marking is for all active stable branches because there could be cases where pagetable entries exist but are not valid - see 9a14aefc ("x86: cpa, fix lookup_address"), for example. So make sure to be on the safe side here and use pXY_present() accessors rather than pXY_none() which could #GP when accessing pages in the direct map. Also see: c40a56a7 ("x86/mm/init: Remove freed kernel image areas from alias mapping") for more info. Reported-by: NJiri Olsa <jolsa@redhat.com> Signed-off-by: NMike Rapoport <rppt@linux.ibm.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Acked-by: NDave Hansen <dave.hansen@intel.com> Tested-by: NJiri Olsa <jolsa@redhat.com> Cc: <stable@vger.kernel.org> # 4.4+ Link: https://lkml.kernel.org/r/20210819132717.19358-1-rppt@kernel.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Mark Brown 提交于
stable inclusion from linux-4.19.207 commit aadf3115f86c6df4308e2734b5c39eadbe9573e0 -------------------------------- commit e35ac9d0 upstream. When we need a buffer for SVE register state we call sve_alloc() to make sure that one is there. In order to avoid repeated allocations and frees we keep the buffer around unless we change vector length and just memset() it to ensure a clean register state. The function that deals with this takes the task to operate on as an argument, however in the case where we do a memset() we initialise using the SVE state size for the current task rather than the task passed as an argument. This is only an issue in the case where we are setting the register state for a task via ptrace and the task being configured has a different vector length to the task tracing it. In the case where the buffer is larger in the traced process we will leak old state from the traced process to itself, in the case where the buffer is smaller in the traced process we will overflow the buffer and corrupt memory. Fixes: bc0ee476 ("arm64/sve: Core task context handling") Cc: <stable@vger.kernel.org> # 4.15.x Signed-off-by: NMark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20210909165356.10675-1-broonie@kernel.orgSigned-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Liu Zixian 提交于
stable inclusion from linux-4.19.207 commit 2fed7f8eda3211190a27caddd6ba8fd728f7b17b -------------------------------- commit 13db8c50 upstream. After fork, the child process will get incorrect (2x) hugetlb_usage. If a process uses 5 2MB hugetlb pages in an anonymous mapping, HugetlbPages: 10240 kB and then forks, the child will show, HugetlbPages: 20480 kB The reason for double the amount is because hugetlb_usage will be copied from the parent and then increased when we copy page tables from parent to child. Child will have 2x actual usage. Fix this by adding hugetlb_count_init in mm_init. Link: https://lkml.kernel.org/r/20210826071742.877-1-liuzixian4@huawei.com Fixes: 5d317b2b ("mm: hugetlb: proc: add HugetlbPages field to /proc/PID/status") Signed-off-by: NLiu Zixian <liuzixian4@huawei.com> Reviewed-by: NNaoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Maciej W. Rozycki 提交于
stable inclusion from linux-4.19.207 commit 33b8fcdedbde2b2a508c3d78d4bb34e4067391a0 -------------------------------- commit 44d01fc8 upstream. Update BusLogic driver's messaging system to use pr_cont() for continuation lines, bringing messy output: pci 0000:00:13.0: PCI->APIC IRQ transform: INT A -> IRQ 17 scsi: ***** BusLogic SCSI Driver Version 2.1.17 of 12 September 2013 ***** scsi: Copyright 1995-1998 by Leonard N. Zubkoff <lnz@dandelion.com> scsi0: Configuring BusLogic Model BT-958 PCI Wide Ultra SCSI Host Adapter scsi0: Firmware Version: 5.07B, I/O Address: 0x7000, IRQ Channel: 17/Level scsi0: PCI Bus: 0, Device: 19, Address: 0xE0012000, Host Adapter SCSI ID: 7 scsi0: Parity Checking: Enabled, Extended Translation: Enabled scsi0: Synchronous Negotiation: Ultra, Wide Negotiation: Enabled scsi0: Disconnect/Reconnect: Enabled, Tagged Queuing: Enabled scsi0: Scatter/Gather Limit: 128 of 8192 segments, Mailboxes: 211 scsi0: Driver Queue Depth: 211, Host Adapter Queue Depth: 192 scsi0: Tagged Queue Depth: Automatic , Untagged Queue Depth: 3 scsi0: SCSI Bus Termination: Both Enabled , SCAM: Disabled scsi0: *** BusLogic BT-958 Initialized Successfully *** scsi host0: BusLogic BT-958 back to order: pci 0000:00:13.0: PCI->APIC IRQ transform: INT A -> IRQ 17 scsi: ***** BusLogic SCSI Driver Version 2.1.17 of 12 September 2013 ***** scsi: Copyright 1995-1998 by Leonard N. Zubkoff <lnz@dandelion.com> scsi0: Configuring BusLogic Model BT-958 PCI Wide Ultra SCSI Host Adapter scsi0: Firmware Version: 5.07B, I/O Address: 0x7000, IRQ Channel: 17/Level scsi0: PCI Bus: 0, Device: 19, Address: 0xE0012000, Host Adapter SCSI ID: 7 scsi0: Parity Checking: Enabled, Extended Translation: Enabled scsi0: Synchronous Negotiation: Ultra, Wide Negotiation: Enabled scsi0: Disconnect/Reconnect: Enabled, Tagged Queuing: Enabled scsi0: Scatter/Gather Limit: 128 of 8192 segments, Mailboxes: 211 scsi0: Driver Queue Depth: 211, Host Adapter Queue Depth: 192 scsi0: Tagged Queue Depth: Automatic, Untagged Queue Depth: 3 scsi0: SCSI Bus Termination: Both Enabled, SCAM: Disabled scsi0: *** BusLogic BT-958 Initialized Successfully *** scsi host0: BusLogic BT-958 Also diagnostic output such as with the BusLogic=TraceConfiguration parameter is affected and becomes vertical and therefore hard to read. This has now been corrected, e.g.: pci 0000:00:13.0: PCI->APIC IRQ transform: INT A -> IRQ 17 blogic_cmd(86) Status = 30: 4 ==> 4: FF 05 93 00 blogic_cmd(95) Status = 28: (Modify I/O Address) blogic_cmd(91) Status = 30: 1 ==> 1: 01 blogic_cmd(04) Status = 30: 4 ==> 4: 41 41 35 30 blogic_cmd(8D) Status = 30: 14 ==> 14: 45 DC 00 20 00 00 00 00 00 40 30 37 42 1D scsi: ***** BusLogic SCSI Driver Version 2.1.17 of 12 September 2013 ***** scsi: Copyright 1995-1998 by Leonard N. Zubkoff <lnz@dandelion.com> blogic_cmd(04) Status = 30: 4 ==> 4: 41 41 35 30 blogic_cmd(0B) Status = 30: 3 ==> 3: 00 08 07 blogic_cmd(0D) Status = 30: 34 ==> 34: 03 01 07 04 00 00 00 00 00 00 00 00 00 00 00 00 FF 42 44 46 FF 00 00 00 00 00 00 00 00 00 FF 00 FF 00 blogic_cmd(8D) Status = 30: 14 ==> 14: 45 DC 00 20 00 00 00 00 00 40 30 37 42 1D blogic_cmd(84) Status = 30: 1 ==> 1: 37 blogic_cmd(8B) Status = 30: 5 ==> 5: 39 35 38 20 20 blogic_cmd(85) Status = 30: 1 ==> 1: 42 blogic_cmd(86) Status = 30: 4 ==> 4: FF 05 93 00 blogic_cmd(91) Status = 30: 64 ==> 64: 41 46 3E 20 39 35 38 20 20 00 C4 00 04 01 07 2F 07 04 35 FF FF FF FF FF FF FF FF FF FF 01 00 FE FF 08 FF FF 00 00 00 00 00 00 00 01 00 01 00 00 FF FF 00 00 00 00 00 00 00 00 00 00 00 00 00 FC scsi0: Configuring BusLogic Model BT-958 PCI Wide Ultra SCSI Host Adapter etc. Link: https://lore.kernel.org/r/alpine.DEB.2.21.2104201940430.44318@angie.orcam.me.uk Fixes: 4bcc595c ("printk: reinstate KERN_CONT for printing continuation lines") Cc: stable@vger.kernel.org # v4.9+ Acked-by: NKhalid Aziz <khalid@gonehiking.org> Signed-off-by: NMaciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 chenying 提交于
stable inclusion from linux-4.19.207 commit 1bc41e47e0ee05367b2efbdd95bdc2d4e7fcc164 -------------------------------- commit 52d5a0c6 upstream. If function ovl_instantiate() returns an error, ovl_cleanup will be called and try to remove newdentry from wdir, but the newdentry has been moved to udir at this time. This will causes BUG_ON(victim->d_parent->d_inode != dir) in fs/namei.c:may_delete. Signed-off-by: Nchenying <chenying.kernel@bytedance.com> Fixes: 01b39dcc ("ovl: use inode_insert5() to hash a newly created inode") Link: https://lore.kernel.org/linux-unionfs/e6496a94-a161-dc04-c38a-d2544633acb4@bytedance.com/ Cc: <stable@vger.kernel.org> # v4.18 Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Ding Hui 提交于
stable inclusion from linux-4.19.207 commit 1c409595b46bc405e9d4761d449d9307e009fc6f -------------------------------- [ Upstream commit d72c7419 ] smb_buf is allocated by small_smb_init_no_tc(), and buf type is CIFS_SMALL_BUFFER, so we should use cifs_small_buf_release() to release it in failed path. Signed-off-by: NDing Hui <dinghui@sangfor.com.cn> Reviewed-by: NPaulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: NSteve French <stfrench@microsoft.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Yufeng Mo 提交于
stable inclusion from linux-4.19.207 commit 25a35dd83627d33ff4304b656f17928efa339a09 -------------------------------- [ Upstream commit 220ade77 ] Some time ago, I reported a calltrace issue "did not find a suitable aggregator", please see[1]. After a period of analysis and reproduction, I find that this problem is caused by concurrency. Before the problem occurs, the bond structure is like follows: bond0 - slaver0(eth0) - agg0.lag_ports -> port0 - port1 \ port0 \ slaver1(eth1) - agg1.lag_ports -> NULL \ port1 If we run 'ifenslave bond0 -d eth1', the process is like below: excuting __bond_release_one() | bond_upper_dev_unlink()[step1] | | | | | bond_3ad_lacpdu_recv() | | ->bond_3ad_rx_indication() | | spin_lock_bh() | | ->ad_rx_machine() | | ->__record_pdu()[step2] | | spin_unlock_bh() | | | | bond_3ad_state_machine_handler() | spin_lock_bh() | ->ad_port_selection_logic() | ->try to find free aggregator[step3] | ->try to find suitable aggregator[step4] | ->did not find a suitable aggregator[step5] | spin_unlock_bh() | | | | bond_3ad_unbind_slave() | spin_lock_bh() spin_unlock_bh() step1: already removed slaver1(eth1) from list, but port1 remains step2: receive a lacpdu and update port0 step3: port0 will be removed from agg0.lag_ports. The struct is "agg0.lag_ports -> port1" now, and agg0 is not free. At the same time, slaver1/agg1 has been removed from the list by step1. So we can't find a free aggregator now. step4: can't find suitable aggregator because of step2 step5: cause a calltrace since port->aggregator is NULL To solve this concurrency problem, put bond_upper_dev_unlink() after bond_3ad_unbind_slave(). In this way, we can invalid the port first and skip this port in bond_3ad_state_machine_handler(). This eliminates the situation that the slaver has been removed from the list but the port is still valid. [1]https://lore.kernel.org/netdev/10374.1611947473@famine/Signed-off-by: NYufeng Mo <moyufeng@huawei.com> Acked-by: NJay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Rafael J. Wysocki 提交于
stable inclusion from linux-4.19.207 commit 6d941bd6366a08bf5c660dbd4f8345427d56aefe -------------------------------- [ Upstream commit 14858dcc ] Updating the current_state field of struct pci_dev the way it is done in pci_enable_device_flags() before calling do_pci_enable_device() may not work. For example, if the given PCI device depends on an ACPI power resource whose _STA method initially returns 0 ("off"), but the config space of the PCI device is accessible and the power state retrieved from the PCI_PM_CTRL register is D0, the current_state field in the struct pci_dev representing that device will get out of sync with the power.state of its ACPI companion object and that will lead to power management issues going forward. To avoid such issues, make pci_enable_device_flags() call pci_update_current_state() which takes ACPI device power management into account, if present, to retrieve the current power state of the device. Link: https://lore.kernel.org/lkml/20210314000439.3138941-1-luzmaximilian@gmail.com/Reported-by: NMaximilian Luz <luzmaximilian@gmail.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: NMaximilian Luz <luzmaximilian@gmail.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Nadav Amit 提交于
stable inclusion from linux-4.19.207 commit e826df5864e449f89c2d056aab998c813d40c932 -------------------------------- [ Upstream commit 22e5fe2a ] userfaultfd assumes that the enabled features are set once and never changed after UFFDIO_API ioctl succeeded. However, currently, UFFDIO_API can be called concurrently from two different threads, succeed on both threads and leave userfaultfd's features in non-deterministic state. Theoretically, other uffd operations (ioctl's and page-faults) can be dispatched while adversely affected by such changes of features. Moreover, the writes to ctx->state and ctx->features are not ordered, which can - theoretically, again - let userfaultfd_ioctl() think that userfaultfd API completed, while the features are still not initialized. To avoid races, it is arguably best to get rid of ctx->state. Since there are only 2 states, record the API initialization in ctx->features as the uppermost bit and remove ctx->state. Link: https://lkml.kernel.org/r/20210808020724.1022515-3-namit@vmware.com Fixes: 9cd75c3c ("userfaultfd: non-cooperative: add ability to report non-PF events from uffd descriptor") Signed-off-by: NNadav Amit <namit@vmware.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Krzysztof Wilczyński 提交于
stable inclusion from linux-4.19.207 commit 60242386c7ef9b084fc138e3d2b7bb99042fd779 -------------------------------- commit a8bd29bd upstream. The pciconfig_read() syscall reads PCI configuration space using hardware-dependent config accessors. If the read fails on PCI, most accessors don't return an error; they pretend the read was successful and got ~0 data from the device, so the syscall returns success with ~0 data in the buffer. When the accessor does return an error, pciconfig_read() normally fills the user's buffer with ~0 and returns an error in errno. But after e4585da2 ("pci syscall.c: Switch to refcounting API"), we don't fill the buffer with ~0 for the EPERM "user lacks CAP_SYS_ADMIN" error. Userspace may rely on the ~0 data to detect errors, but after e4585da2, that would not detect CAP_SYS_ADMIN errors. Restore the original behaviour of filling the buffer with ~0 when the CAP_SYS_ADMIN check fails. [bhelgaas: commit log, fold in Nathan's fix https://lore.kernel.org/r/20210803200836.500658-1-nathan@kernel.org] Fixes: e4585da2 ("pci syscall.c: Switch to refcounting API") Link: https://lore.kernel.org/r/20210729233755.1509616-1-kw@linux.comSigned-off-by: NKrzysztof Wilczyński <kw@linux.com> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Damien Le Moal 提交于
stable inclusion from linux-4.19.207 commit 35a276f173356448cd07d83238d2bbc30796b267 -------------------------------- commit a680dd72 upstream. For a request that has a priority level equal to or larger than IOPRIO_BE_NR, bfq_set_next_ioprio_data() prints a critical warning but defaults to setting the request new_ioprio field to IOPRIO_BE_NR. This is not consistent with the warning and the allowed values for priority levels. Fix this by setting the request new_ioprio field to IOPRIO_BE_NR - 1, the lowest priority level allowed. Cc: <stable@vger.kernel.org> Fixes: aee69d78 ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler") Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com> Reviewed-by: NHannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20210811033702.368488-2-damien.lemoal@wdc.comSigned-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Mark Rutland 提交于
stable inclusion from linux-4.19.207 commit 79d15fc5646f79e769b90957370a84c0ba981c85 -------------------------------- commit 90268574 upstream. The `compute_indices` and `populate_entries` macros operate on inclusive bounds, and thus the `map_memory` macro which uses them also operates on inclusive bounds. We pass `_end` and `_idmap_text_end` to `map_memory`, but these are exclusive bounds, and if one of these is sufficiently aligned (as a result of kernel configuration, physical placement, and KASLR), then: * In `compute_indices`, the computed `iend` will be in the page/block *after* the final byte of the intended mapping. * In `populate_entries`, an unnecessary entry will be created at the end of each level of table. At the leaf level, this entry will map up to SWAPPER_BLOCK_SIZE bytes of physical addresses that we did not intend to map. As we may map up to SWAPPER_BLOCK_SIZE bytes more than intended, we may violate the boot protocol and map physical address past the 2MiB-aligned end address we are permitted to map. As we map these with Normal memory attributes, this may result in further problems depending on what these physical addresses correspond to. The final entry at each level may require an additional table at that level. As EARLY_ENTRIES() calculates an inclusive bound, we allocate enough memory for this. Avoid the extraneous mapping by having map_memory convert the exclusive end address to an inclusive end address by subtracting one, and do likewise in EARLY_ENTRIES() when calculating the number of required tables. For clarity, comments are updated to more clearly document which boundaries the macros operate on. For consistency with the other macros, the comments in map_memory are also updated to describe `vstart` and `vend` as virtual addresses. Fixes: 0370b31e ("arm64: Extend early page table code to allow for larger kernels") Cc: <stable@vger.kernel.org> # 4.16.x Signed-off-by: NMark Rutland <mark.rutland@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Steve Capper <steve.capper@arm.com> Cc: Will Deacon <will@kernel.org> Acked-by: NWill Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210823101253.55567-1-mark.rutland@arm.comSigned-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Daniel Borkmann 提交于
stable inclusion from linux-4.19.207 commit a386fceed74e9f1125104eea7e66cff187edeff7 -------------------------------- commit e042aa53 upstream. In 7fedb63a ("bpf: Tighten speculative pointer arithmetic mask") we narrowed the offset mask for unprivileged pointer arithmetic in order to mitigate a corner case where in the speculative domain it is possible to advance, for example, the map value pointer by up to value_size-1 out-of- bounds in order to leak kernel memory via side-channel to user space. The verifier's state pruning for scalars leaves one corner case open where in the first verification path R_x holds an unknown scalar with an aux->alu_limit of e.g. 7, and in a second verification path that same register R_x, here denoted as R_x', holds an unknown scalar which has tighter bounds and would thus satisfy range_within(R_x, R_x') as well as tnum_in(R_x, R_x') for state pruning, yielding an aux->alu_limit of 3: Given the second path fits the register constraints for pruning, the final generated mask from aux->alu_limit will remain at 7. While technically not wrong for the non-speculative domain, it would however be possible to craft similar cases where the mask would be too wide as in 7fedb63a. One way to fix it is to detect the presence of unknown scalar map pointer arithmetic and force a deeper search on unknown scalars to ensure that we do not run into a masking mismatch. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> [OP: adjusted context for 4.19] Signed-off-by: NOvidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Lorenz Bauer 提交于
stable inclusion from linux-4.19.207 commit a4fe956b03316516ce0eca037d480d270b9bed4c -------------------------------- commit c9e73e3d upstream. func_states_equal makes a very short lived allocation for idmap, probably because it's too large to fit on the stack. However the function is called quite often, leading to a lot of alloc / free churn. Replace the temporary allocation with dedicated scratch space in struct bpf_verifier_env. Signed-off-by: NLorenz Bauer <lmb@cloudflare.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NEdward Cree <ecree.xilinx@gmail.com> Link: https://lore.kernel.org/bpf/20210429134656.122225-4-lmb@cloudflare.com [OP: adjusted context for 4.19] Signed-off-by: NOvidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Alexei Starovoitov 提交于
stable inclusion from linux-4.19.207 commit fc578c64398df4f93fd7a6143e218281cc7c4348 -------------------------------- commit fc559a70 upstream. fix tests that incorrectly assumed that the verifier cannot track constants through stack. Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NAndrii Nakryiko <andriin@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> [OP: backport to 4.19] Signed-off-by: NOvidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Andrey Ignatov 提交于
stable inclusion from linux-4.19.207 commit 2a28675d4b5cff2bfa0b3475c53edba1f5ec8401 -------------------------------- commit 8ff80e96 upstream. Test different scenarios of indirect variable-offset stack access: out of bound access (>0), min_off below initialized part of the stack, max_off+size above initialized part of the stack, initialized stack. Example of output: ... #856/p indirect variable-offset stack access, out of bound OK #857/p indirect variable-offset stack access, max_off+size > max_initialized OK #858/p indirect variable-offset stack access, min_off < min_initialized OK #859/p indirect variable-offset stack access, ok OK ... Signed-off-by: NAndrey Ignatov <rdna@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> [OP: backport to 4.19] Signed-off-by: NOvidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Andrey Ignatov 提交于
stable inclusion from linux-4.19.207 commit 7667818ef188832f69e2cf9cfed56e03a8f7436b -------------------------------- stable inclusion from linux-4.19.207 commit 7667818ef188832f69e2cf9cfed56e03a8f7436b -------------------------------- commit 107c26a7 upstream. As discussed in [1] max value of variable offset has to be checked for overflow on stack access otherwise verifier would accept code like this: 0: (b7) r2 = 6 1: (b7) r3 = 28 2: (7a) *(u64 *)(r10 -16) = 0 3: (7a) *(u64 *)(r10 -8) = 0 4: (79) r4 = *(u64 *)(r1 +168) 5: (c5) if r4 s< 0x0 goto pc+4 R1=ctx(id=0,off=0,imm=0) R2=inv6 R3=inv28 R4=inv(id=0,umax_value=9223372036854775807,var_off=(0x0; 0x7fffffffffffffff)) R10=fp0,call_-1 fp-8=mmmmmmmm fp-16=mmmmmmmm 6: (17) r4 -= 16 7: (0f) r4 += r10 8: (b7) r5 = 8 9: (85) call bpf_getsockopt#57 10: (b7) r0 = 0 11: (95) exit , where R4 obviosly has unbounded max value. Fix it by checking that reg->smax_value is inside (-BPF_MAX_VAR_OFF; BPF_MAX_VAR_OFF) range. reg->smax_value is used instead of reg->umax_value because stack pointers are calculated using negative offset from fp. This is opposite to e.g. map access where offset must be non-negative and where umax_value is used. Also dedicated verbose logs are added for both min and max bound check failures to have diagnostics consistent with variable offset handling in check_map_access(). [1] https://marc.info/?l=linux-netdev&m=155433357510597&w=2 Fixes: 2011fccf ("bpf: Support variable offset stack access from helpers") Reported-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NAndrey Ignatov <rdna@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NOvidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Andrey Ignatov 提交于
stable inclusion from linux-4.19.207 commit 14cf676ba6a0fb5e495baf843d750070f627d7e8 -------------------------------- commit 088ec26d upstream. Proper support of indirect stack access with variable offset in unprivileged mode (!root) requires corresponding support in Spectre masking for stack ALU in retrieve_ptr_limit(). There are no use-case for variable offset in unprivileged mode though so make verifier reject such accesses for simplicity. Pointer arithmetics is one (and only?) way to cause variable offset and it's already rejected in unpriv mode so that verifier won't even get to helper function whose argument contains variable offset, e.g.: 0: (7a) *(u64 *)(r10 -16) = 0 1: (7a) *(u64 *)(r10 -8) = 0 2: (61) r2 = *(u32 *)(r1 +0) 3: (57) r2 &= 4 4: (17) r2 -= 16 5: (0f) r2 += r10 variable stack access var_off=(0xfffffffffffffff0; 0x4) off=-16 size=1R2 stack pointer arithmetic goes out of range, prohibited for !root Still it looks like a good idea to reject variable offset indirect stack access for unprivileged mode in check_stack_boundary() explicitly. Fixes: 2011fccf ("bpf: Support variable offset stack access from helpers") Reported-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NAndrey Ignatov <rdna@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> [OP: drop comment in retrieve_ptr_limit()] Signed-off-by: NOvidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Andrey Ignatov 提交于
stable inclusion from linux-4.19.207 commit 148339cb18ed38a4b1de2269d1a4227fd51376ef -------------------------------- commit f2bcd05e upstream. It's hard to guarantee that whole memory is marked as initialized on helper return if uninitialized stack is accessed with variable offset since specific bounds are unknown to verifier. This may cause uninitialized stack leaking. Reject such an access in check_stack_boundary to prevent possible leaking. There are no known use-cases for indirect uninitialized stack access with variable offset so it shouldn't break anything. Fixes: 2011fccf ("bpf: Support variable offset stack access from helpers") Reported-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NAndrey Ignatov <rdna@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NOvidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Andrey Ignatov 提交于
stable inclusion from linux-4.19.207 commit caee4103f09e3dcce2ec0d0faf6ff245a6cffbad -------------------------------- commit 2011fccf upstream. Currently there is a difference in how verifier checks memory access for helper arguments for PTR_TO_MAP_VALUE and PTR_TO_STACK with regard to variable part of offset. check_map_access, that is used for PTR_TO_MAP_VALUE, can handle variable offsets just fine, so that BPF program can call a helper like this: some_helper(map_value_ptr + off, size); , where offset is unknown at load time, but is checked by program to be in a safe rage (off >= 0 && off + size < map_value_size). But it's not the case for check_stack_boundary, that is used for PTR_TO_STACK, and same code with pointer to stack is rejected by verifier: some_helper(stack_value_ptr + off, size); For example: 0: (7a) *(u64 *)(r10 -16) = 0 1: (7a) *(u64 *)(r10 -8) = 0 2: (61) r2 = *(u32 *)(r1 +0) 3: (57) r2 &= 4 4: (17) r2 -= 16 5: (0f) r2 += r10 6: (18) r1 = 0xffff888111343a80 8: (85) call bpf_map_lookup_elem#1 invalid variable stack read R2 var_off=(0xfffffffffffffff0; 0x4) Add support for variable offset access to check_stack_boundary so that if offset is checked by program to be in a safe range it's accepted by verifier. Signed-off-by: NAndrey Ignatov <rdna@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> [OP: replace reg_state(env, regno) helper with "cur_regs(env) + regno"] Signed-off-by: NOvidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Conflicts: kernel/bpf/verifier.c [yyl: adjust context] Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Jiong Wang 提交于
stable inclusion from linux-4.19.207 commit 79aba0ac3df1a604e843780b17c37646e175b4f8 -------------------------------- commit 0bae2d4d upstream. Verifier is supposed to support sharing stack slot allocated to ptr with SCALAR_VALUE for privileged program. However this doesn't happen for some cases. The reason is verifier is not clearing slot_type STACK_SPILL for all bytes, it only clears part of them, while verifier is using: slot_type[0] == STACK_SPILL as a convention to check one slot is ptr type. So, the consequence of partial clearing slot_type is verifier could treat a partially overridden ptr slot, which should now be a SCALAR_VALUE slot, still as ptr slot, and rejects some valid programs. Before this patch, test_xdp_noinline.o under bpf selftests, bpf_lxc.o and bpf_netdev.o under Cilium bpf repo, when built with -mattr=+alu32 are rejected due to this issue. After this patch, they all accepted. There is no processed insn number change before and after this patch on Cilium bpf programs. Reviewed-by: NJakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: NJiong Wang <jiong.wang@netronome.com> Reviewed-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> [OP: adjusted context for 4.19] Signed-off-by: NOvidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
stable inclusion from linux-4.19.207 commit 34b23fc32c05f14fd21caa32544f3720e1527a8d -------------------------------- commit 1a519dc7 upstream. When running as Xen PV guest, masking MSI-X is a responsibility of the hypervisor. The guest has no write access to the relevant BAR at all - when it tries to, it results in a crash like this: BUG: unable to handle page fault for address: ffffc9004069100c #PF: supervisor write access in kernel mode #PF: error_code(0x0003) - permissions violation RIP: e030:__pci_enable_msix_range.part.0+0x26b/0x5f0 e1000e_set_interrupt_capability+0xbf/0xd0 [e1000e] e1000_probe+0x41f/0xdb0 [e1000e] local_pci_probe+0x42/0x80 (...) The recently introduced function msix_mask_all() does not check the global variable pci_msi_ignore_mask which is set by XEN PV to bypass the masking of MSI[-X] interrupts. Add the check to make this function XEN PV compatible. Fixes: 7d5ec3d3 ("PCI/MSI: Mask all unused MSI-X entries") Signed-off-by: NMarek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NBjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210826170342.135172-1-marmarek@invisiblethingslab.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Nguyen Dinh Phi 提交于
stable inclusion from linux-4.19.207 commit f7bffefa322a3d5a292c0b7a9b93302b392928f6 -------------------------------- commit bb2853a6 upstream. The ops->receive_buf() may be accessed concurrently from these two functions. If the driver flushes data to the line discipline receive_buf() method while tiocsti() is waiting for the ops->receive_buf() to finish its work, the data race will happen. For example: tty_ioctl |tty_ldisc_receive_buf ->tioctsi | ->tty_port_default_receive_buf | ->tty_ldisc_receive_buf ->hci_uart_tty_receive | ->hci_uart_tty_receive ->h4_recv | ->h4_recv In this case, the h4 receive buffer will be overwritten by the latecomer, and we will lost the data. Hence, change tioctsi() function to use the exclusive lock interface from tty_buffer to avoid the data race. Reported-by: syzbot+97388eb9d31b997fe1d0@syzkaller.appspotmail.com Reviewed-by: NJiri Slaby <jirislaby@kernel.org> Signed-off-by: NNguyen Dinh Phi <phind.uet@gmail.com> Link: https://lore.kernel.org/r/20210823000641.2082292-1-phind.uet@gmail.com Cc: stable <stable@vger.kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Xiyu Yang 提交于
stable inclusion from linux-4.19.207 commit 5129766a9797ea212087314fafc68268f1bf8515 -------------------------------- [ Upstream commit c6607012 ] The reference counting issue happens in one exception handling path of cbq_change_class(). When failing to get tcf_block, the function forgets to decrease the refcount of "rtab" increased by qdisc_put_rtab(), causing a refcount leak. Fix this issue by jumping to "failure" label when get tcf_block failed. Fixes: 6529eaba ("net: sched: introduce tcf block infractructure") Signed-off-by: NXiyu Yang <xiyuyang19@fudan.edu.cn> Reviewed-by: NCong Wang <cong.wang@bytedance.com> Link: https://lore.kernel.org/r/1630252681-71588-1-git-send-email-xiyuyang19@fudan.edu.cnSigned-off-by: NJakub Kicinski <kuba@kernel.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Andy Duan 提交于
stable inclusion from linux-4.19.207 commit 600624ecbe35e0359806b107f247811b1f79ad25 -------------------------------- [ Upstream commit d5c38948 ] Register offset needs to be applied on mapbase also. dma_tx/rx_request use the physical address of UARTDATA. Register offset is currently only applied to membase (the corresponding virtual addr) but not on mapbase. Fixes: 24b1e5f0 ("tty: serial: lpuart: add imx7ulp support") Reviewed-by: NLeonard Crestez <leonard.crestez@nxp.com> Signed-off-by: NAdriana Reus <adriana.reus@nxp.com> Signed-off-by: NSherry Sun <sherry.sun@nxp.com> Signed-off-by: NAndy Duan <fugang.duan@nxp.com> Link: https://lore.kernel.org/r/20210819021033.32606-1-sherry.sun@nxp.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Len Baker 提交于
stable inclusion from linux-4.19.207 commit bea655491daf39f1934a71bf576bf3499092d3a4 -------------------------------- [ Upstream commit f980d055 ] strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated. Also, the strnlen() call does not avoid the read overflow in the strlcpy function when a not NUL-terminated string is passed. So, replace this block by a call to kstrndup() that avoids this type of overflow and does the same. Fixes: 066ce689 ("cifs: rename cifs_strlcpy_to_host and make it use new functions") Signed-off-by: NLen Baker <len.baker@gmx.com> Reviewed-by: NPaulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: NJeff Layton <jlayton@kernel.org> Signed-off-by: NSteve French <stfrench@microsoft.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Rafael J. Wysocki 提交于
stable inclusion from linux-4.19.207 commit d4bcd29b50d1a33111d3bacd7974f0d0bbec484f -------------------------------- [ Upstream commit 0e00392a ] PME signaling is only enabled by __pci_enable_wake() if the target device can signal PME from the given target power state (to avoid pointless reconfiguration of the device), but if the hierarchy above the device goes into D3cold, the device itself will end up in D3cold too, so if it can signal PME from D3cold, it should be enabled to do so in __pci_enable_wake(). [Note that if the device does not end up in D3cold and it cannot signal PME from the original target power state, it will not signal PME, so in that case the behavior does not change.] Link: https://lore.kernel.org/linux-pm/3149540.aeNJFYEL58@kreacher/ Fixes: 5bcc2fb4 ("PCI PM: Simplify PCI wake-up code") Reported-by: NMika Westerberg <mika.westerberg@linux.intel.com> Reported-by: NUtkarsh H Patel <utkarsh.h.patel@intel.com> Reported-by: NKoba Ko <koba.ko@canonical.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: NMika Westerberg <mika.westerberg@linux.intel.com> Tested-by: NMika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Rafael J. Wysocki 提交于
stable inclusion from linux-4.19.207 commit c5c62f4c936407fb734cae64700f20b68273b059 -------------------------------- [ Upstream commit da9f2150 ] It is inconsistent to return PCI_D0 from pci_target_state() instead of the original target state if 'wakeup' is true and the device cannot signal PME from D0. This only happens when the device cannot signal PME from the original target state and any shallower power states (including D0) and that case is effectively equivalent to the one in which PME singaling is not supported at all. Since the original target state is returned in the latter case, make the function do that in the former one too. Link: https://lore.kernel.org/linux-pm/3149540.aeNJFYEL58@kreacher/ Fixes: 666ff6f8 ("PCI/PM: Avoid using device_may_wakeup() for runtime PM") Reported-by: NMika Westerberg <mika.westerberg@linux.intel.com> Reported-by: NUtkarsh H Patel <utkarsh.h.patel@intel.com> Reported-by: NKoba Ko <koba.ko@canonical.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: NMika Westerberg <mika.westerberg@linux.intel.com> Tested-by: NMika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Martin KaFai Lau 提交于
stable inclusion from linux-4.19.207 commit 15bf24c5fb7948564397b450ead0441143d42af2 -------------------------------- [ Upstream commit 525e2f9f ] st->bucket stores the current bucket number. st->offset stores the offset within this bucket that is the sk to be seq_show(). Thus, st->offset only makes sense within the same st->bucket. These two variables are an optimization for the common no-lseek case. When resuming the seq_file iteration (i.e. seq_start()), tcp_seek_last_pos() tries to continue from the st->offset at bucket st->bucket. However, it is possible that the bucket pointed by st->bucket has changed and st->offset may end up skipping the whole st->bucket without finding a sk. In this case, tcp_seek_last_pos() currently continues to satisfy the offset condition in the next (and incorrect) bucket. Instead, regardless of the offset value, the first sk of the next bucket should be returned. Thus, "bucket == st->bucket" check is added to tcp_seek_last_pos(). The chance of hitting this is small and the issue is a decade old, so targeting for the next tree. Fixes: a8b690f9 ("tcp: Fix slowness in read /proc/net/tcp") Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NAndrii Nakryiko <andrii@kernel.org> Reviewed-by: NEric Dumazet <edumazet@google.com> Acked-by: NKuniyuki Iwashima <kuniyu@amazon.co.jp> Acked-by: NYonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210701200541.1033917-1-kafai@fb.comSigned-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Desmond Cheong Zhi Xi 提交于
stable inclusion from linux-4.19.207 commit 64dd1fbb0bb743ccd2fb420c441f4ca9732f598f -------------------------------- [ Upstream commit 2f488f69 ] There is an existing lock hierarchy of &dev->event_lock --> &fasync_struct.fa_lock --> &f->f_owner.lock from the following call chain: input_inject_event(): spin_lock_irqsave(&dev->event_lock,...); input_handle_event(): input_pass_values(): input_to_handler(): evdev_events(): evdev_pass_values(): spin_lock(&client->buffer_lock); __pass_event(): kill_fasync(): kill_fasync_rcu(): read_lock(&fa->fa_lock); send_sigio(): read_lock_irqsave(&fown->lock,...); &dev->event_lock is HARDIRQ-safe, so interrupts have to be disabled while grabbing &fasync_struct.fa_lock, otherwise we invert the lock hierarchy. However, since kill_fasync which calls kill_fasync_rcu is an exported symbol, it may not necessarily be called with interrupts disabled. As kill_fasync_rcu may be called with interrupts disabled (for example, in the call chain above), we replace calls to read_lock/read_unlock on &fasync_struct.fa_lock in kill_fasync_rcu with read_lock_irqsave/read_unlock_irqrestore. Signed-off-by: NDesmond Cheong Zhi Xi <desmondcheongzx@gmail.com> Signed-off-by: NJeff Layton <jlayton@kernel.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Thomas Gleixner 提交于
stable inclusion from linux-4.19.207 commit e6c3fefc6bb11bef1bd8adfe37e0f317303ab751 -------------------------------- [ Upstream commit 627ef5ae ] If __hrtimer_start_range_ns() is invoked with an already armed hrtimer then the timer has to be canceled first and then added back. If the timer is the first expiring timer then on removal the clockevent device is reprogrammed to the next expiring timer to avoid that the pending expiry fires needlessly. If the new expiry time ends up to be the first expiry again then the clock event device has to reprogrammed again. Avoid this by checking whether the timer is the first to expire and in that case, keep the timer on the current CPU and delay the reprogramming up to the point where the timer has been enqueued again. Reported-by: NLorenzo Colitti <lorenzo@google.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210713135157.873137732@linutronix.deSigned-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Dietmar Eggemann 提交于
stable inclusion from linux-4.19.207 commit 293fe77dbfe68c5794be4e76c9212003e9304d24 -------------------------------- [ Upstream commit b4da13aa ] A missing clock update is causing the following warning: rq->clock_update_flags < RQCF_ACT_SKIP WARNING: CPU: 112 PID: 2041 at kernel/sched/sched.h:1453 sub_running_bw.isra.0+0x190/0x1a0 ... CPU: 112 PID: 2041 Comm: sugov:112 Tainted: G W 5.14.0-rc1 #1 Hardware name: WIWYNN Mt.Jade Server System B81.030Z1.0007/Mt.Jade Motherboard, BIOS 1.6.20210526 (SCP: 1.06.20210526) 2021/05/26 ... Call trace: sub_running_bw.isra.0+0x190/0x1a0 migrate_task_rq_dl+0xf8/0x1e0 set_task_cpu+0xa8/0x1f0 try_to_wake_up+0x150/0x3d4 wake_up_q+0x64/0xc0 __up_write+0xd0/0x1c0 up_write+0x4c/0x2b0 cppc_set_perf+0x120/0x2d0 cppc_cpufreq_set_target+0xe0/0x1a4 [cppc_cpufreq] __cpufreq_driver_target+0x74/0x140 sugov_work+0x64/0x80 kthread_worker_fn+0xe0/0x230 kthread+0x138/0x140 ret_from_fork+0x10/0x18 The task causing this is the `cppc_fie` DL task introduced by commit 1eb5dde6 ("cpufreq: CPPC: Add support for frequency invariance"). With CONFIG_ACPI_CPPC_CPUFREQ_FIE=y and schedutil cpufreq governor on slow-switching system (like on this Ampere Altra WIWYNN Mt. Jade Arm Server): DL task `curr=sugov:112` lets `p=cppc_fie` migrate and since the latter is in `non_contending` state, migrate_task_rq_dl() calls sub_running_bw()->__sub_running_bw()->cpufreq_update_util()-> rq_clock()->assert_clock_updated() on p. Fix this by updating the clock for a non_contending task in migrate_task_rq_dl() before calling sub_running_bw(). Reported-by: NBruno Goncalves <bgoncalv@redhat.com> Signed-off-by: NDietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NDaniel Bristot de Oliveira <bristot@kernel.org> Acked-by: NJuri Lelli <juri.lelli@redhat.com> Link: https://lore.kernel.org/r/20210804135925.3734605-1-dietmar.eggemann@arm.comSigned-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Quentin Perret 提交于
stable inclusion from linux-4.19.207 commit bc0d543f53e0d73059cf10733be0109984dadd44 -------------------------------- [ Upstream commit f9509153 ] It is possible for sched_getattr() to incorrectly report the state of the reset_on_fork flag when called on a deadline task. Indeed, if the flag was set on a deadline task using sched_setattr() with flags (SCHED_FLAG_RESET_ON_FORK | SCHED_FLAG_KEEP_PARAMS), then p->sched_reset_on_fork will be set, but __setscheduler() will bail out early, which means that the dl_se->flags will not get updated by __setscheduler_params()->__setparam_dl(). Consequently, if sched_getattr() is then called on the task, __getparam_dl() will override kattr.sched_flags with the now out-of-date copy in dl_se->flags and report the stale value to userspace. To fix this, make sure to only copy the flags that are relevant to sched_deadline to and from the dl_se->flags field. Signed-off-by: NQuentin Perret <qperret@google.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210727101103.2729607-2-qperret@google.comSigned-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Peter Zijlstra 提交于
stable inclusion from linux-4.19.207 commit b7a041073f4e72083dcd688a003a59bf9d9c9124 -------------------------------- [ Upstream commit 048661a1 ] Yanfei reported that setting HANDOFF should not depend on recomputing @first, only on @first state. Which would then give: if (ww_ctx || !first) first = __mutex_waiter_is_first(lock, &waiter); if (first) __mutex_set_flag(lock, MUTEX_FLAG_HANDOFF); But because 'ww_ctx || !first' is basically 'always' and the test for first is relatively cheap, omit that first branch entirely. Reported-by: NYanfei Xu <yanfei.xu@windriver.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NWaiman Long <longman@redhat.com> Reviewed-by: NYanfei Xu <yanfei.xu@windriver.com> Link: https://lore.kernel.org/r/20210630154114.896786297@infradead.orgSigned-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Mathieu Desnoyers 提交于
stable inclusion from linux-4.19.207 commit 69178f2d652e64991dc812be24d0776a731f447d -------------------------------- commit e1e84eb5 upstream. As per RFC792, ICMP errors should be sent to the source host. However, in configurations with Virtual Routing and Forwarding tables, looking up which routing table to use is currently done by using the destination net_device. commit 9d1a6c4e ("net: icmp_route_lookup should use rt dev to determine L3 domain") changes the interface passed to l3mdev_master_ifindex() and inet_addr_type_dev_table() from skb_in->dev to skb_dst(skb_in)->dev. This effectively uses the destination device rather than the source device for choosing which routing table should be used to lookup where to send the ICMP error. Therefore, if the source and destination interfaces are within separate VRFs, or one in the global routing table and the other in a VRF, looking up the source host in the destination interface's routing table will fail if the destination interface's routing table contains no route to the source host. One observable effect of this issue is that traceroute does not work in the following cases: - Route leaking between global routing table and VRF - Route leaking between VRFs Preferably use the source device routing table when sending ICMP error messages. If no source device is set, fall-back on the destination device routing table. Else, use the main routing table (index 0). [ It has been pointed out that a similar issue may exist with ICMP errors triggered when forwarding between network namespaces. It would be worthwhile to investigate, but is outside of the scope of this investigation. ] [ It has also been pointed out that a similar issue exists with unreachable / fragmentation needed messages, which can be triggered by changing the MTU of eth1 in r1 to 1400 and running: ip netns exec h1 ping -s 1450 -Mdo -c1 172.16.2.2 Some investigation points to raw_icmp_error() and raw_err() as being involved in this last scenario. The focus of this patch is TTL expired ICMP messages, which go through icmp_route_lookup. Investigation of failure modes related to raw_icmp_error() is beyond this investigation's scope. ] Fixes: 9d1a6c4e ("net: icmp_route_lookup should use rt dev to determine L3 domain") Link: https://tools.ietf.org/html/rfc792Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Xiaoyao Li 提交于
stable inclusion from linux-4.19.207 commit f109e8a1678ce920b7f0df7865f2a31754ec5d1c -------------------------------- [ Upstream commit c53c6b74 ] Per SDM, bit 2:0 of CPUID(0x14,1).EAX[2:0] reports the number of configurable address ranges for filtering, not bit 1:0. Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Link: https://lkml.kernel.org/r/20210824040622.4081502-1-xiaoyao.li@intel.comSigned-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 zhangguijiang 提交于
ascend inclusion category: bugfix feature: Ascend emmc adaption bugzilla: https://gitee.com/openeuler/kernel/issues/I4F4LL CVE: NA -------------------- This reverts commit bb21deda. Signed-off-by: Nzhangguijiang <zhangguijiang@huawei.com> Reviewed-by: NWeilong Chen <chenweilong@huawei.com> Reviewed-by: NYANHONG LU <luyanhong.luyanhong@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 zhangguijiang 提交于
ascend inclusion category: bugfix feature: Ascend emmc adaption bugzilla: https://gitee.com/openeuler/kernel/issues/I4F4LL CVE: NA -------------------- This reverts commit 76b1b2e0. Signed-off-by: Nzhangguijiang <zhangguijiang@huawei.com> Reviewed-by: NWeilong Chen <chenweilong@huawei.com> Reviewed-by: NYANHONG LU <luyanhong.luyanhong@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-