- 12 11月, 2021 37 次提交
-
-
由 Nadav Amit 提交于
stable inclusion from linux-4.19.207 commit e826df5864e449f89c2d056aab998c813d40c932 -------------------------------- [ Upstream commit 22e5fe2a ] userfaultfd assumes that the enabled features are set once and never changed after UFFDIO_API ioctl succeeded. However, currently, UFFDIO_API can be called concurrently from two different threads, succeed on both threads and leave userfaultfd's features in non-deterministic state. Theoretically, other uffd operations (ioctl's and page-faults) can be dispatched while adversely affected by such changes of features. Moreover, the writes to ctx->state and ctx->features are not ordered, which can - theoretically, again - let userfaultfd_ioctl() think that userfaultfd API completed, while the features are still not initialized. To avoid races, it is arguably best to get rid of ctx->state. Since there are only 2 states, record the API initialization in ctx->features as the uppermost bit and remove ctx->state. Link: https://lkml.kernel.org/r/20210808020724.1022515-3-namit@vmware.com Fixes: 9cd75c3c ("userfaultfd: non-cooperative: add ability to report non-PF events from uffd descriptor") Signed-off-by: NNadav Amit <namit@vmware.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Krzysztof Wilczyński 提交于
stable inclusion from linux-4.19.207 commit 60242386c7ef9b084fc138e3d2b7bb99042fd779 -------------------------------- commit a8bd29bd upstream. The pciconfig_read() syscall reads PCI configuration space using hardware-dependent config accessors. If the read fails on PCI, most accessors don't return an error; they pretend the read was successful and got ~0 data from the device, so the syscall returns success with ~0 data in the buffer. When the accessor does return an error, pciconfig_read() normally fills the user's buffer with ~0 and returns an error in errno. But after e4585da2 ("pci syscall.c: Switch to refcounting API"), we don't fill the buffer with ~0 for the EPERM "user lacks CAP_SYS_ADMIN" error. Userspace may rely on the ~0 data to detect errors, but after e4585da2, that would not detect CAP_SYS_ADMIN errors. Restore the original behaviour of filling the buffer with ~0 when the CAP_SYS_ADMIN check fails. [bhelgaas: commit log, fold in Nathan's fix https://lore.kernel.org/r/20210803200836.500658-1-nathan@kernel.org] Fixes: e4585da2 ("pci syscall.c: Switch to refcounting API") Link: https://lore.kernel.org/r/20210729233755.1509616-1-kw@linux.comSigned-off-by: NKrzysztof Wilczyński <kw@linux.com> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Damien Le Moal 提交于
stable inclusion from linux-4.19.207 commit 35a276f173356448cd07d83238d2bbc30796b267 -------------------------------- commit a680dd72 upstream. For a request that has a priority level equal to or larger than IOPRIO_BE_NR, bfq_set_next_ioprio_data() prints a critical warning but defaults to setting the request new_ioprio field to IOPRIO_BE_NR. This is not consistent with the warning and the allowed values for priority levels. Fix this by setting the request new_ioprio field to IOPRIO_BE_NR - 1, the lowest priority level allowed. Cc: <stable@vger.kernel.org> Fixes: aee69d78 ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler") Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com> Reviewed-by: NHannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20210811033702.368488-2-damien.lemoal@wdc.comSigned-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Mark Rutland 提交于
stable inclusion from linux-4.19.207 commit 79d15fc5646f79e769b90957370a84c0ba981c85 -------------------------------- commit 90268574 upstream. The `compute_indices` and `populate_entries` macros operate on inclusive bounds, and thus the `map_memory` macro which uses them also operates on inclusive bounds. We pass `_end` and `_idmap_text_end` to `map_memory`, but these are exclusive bounds, and if one of these is sufficiently aligned (as a result of kernel configuration, physical placement, and KASLR), then: * In `compute_indices`, the computed `iend` will be in the page/block *after* the final byte of the intended mapping. * In `populate_entries`, an unnecessary entry will be created at the end of each level of table. At the leaf level, this entry will map up to SWAPPER_BLOCK_SIZE bytes of physical addresses that we did not intend to map. As we may map up to SWAPPER_BLOCK_SIZE bytes more than intended, we may violate the boot protocol and map physical address past the 2MiB-aligned end address we are permitted to map. As we map these with Normal memory attributes, this may result in further problems depending on what these physical addresses correspond to. The final entry at each level may require an additional table at that level. As EARLY_ENTRIES() calculates an inclusive bound, we allocate enough memory for this. Avoid the extraneous mapping by having map_memory convert the exclusive end address to an inclusive end address by subtracting one, and do likewise in EARLY_ENTRIES() when calculating the number of required tables. For clarity, comments are updated to more clearly document which boundaries the macros operate on. For consistency with the other macros, the comments in map_memory are also updated to describe `vstart` and `vend` as virtual addresses. Fixes: 0370b31e ("arm64: Extend early page table code to allow for larger kernels") Cc: <stable@vger.kernel.org> # 4.16.x Signed-off-by: NMark Rutland <mark.rutland@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Steve Capper <steve.capper@arm.com> Cc: Will Deacon <will@kernel.org> Acked-by: NWill Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210823101253.55567-1-mark.rutland@arm.comSigned-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Daniel Borkmann 提交于
stable inclusion from linux-4.19.207 commit a386fceed74e9f1125104eea7e66cff187edeff7 -------------------------------- commit e042aa53 upstream. In 7fedb63a ("bpf: Tighten speculative pointer arithmetic mask") we narrowed the offset mask for unprivileged pointer arithmetic in order to mitigate a corner case where in the speculative domain it is possible to advance, for example, the map value pointer by up to value_size-1 out-of- bounds in order to leak kernel memory via side-channel to user space. The verifier's state pruning for scalars leaves one corner case open where in the first verification path R_x holds an unknown scalar with an aux->alu_limit of e.g. 7, and in a second verification path that same register R_x, here denoted as R_x', holds an unknown scalar which has tighter bounds and would thus satisfy range_within(R_x, R_x') as well as tnum_in(R_x, R_x') for state pruning, yielding an aux->alu_limit of 3: Given the second path fits the register constraints for pruning, the final generated mask from aux->alu_limit will remain at 7. While technically not wrong for the non-speculative domain, it would however be possible to craft similar cases where the mask would be too wide as in 7fedb63a. One way to fix it is to detect the presence of unknown scalar map pointer arithmetic and force a deeper search on unknown scalars to ensure that we do not run into a masking mismatch. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> [OP: adjusted context for 4.19] Signed-off-by: NOvidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Lorenz Bauer 提交于
stable inclusion from linux-4.19.207 commit a4fe956b03316516ce0eca037d480d270b9bed4c -------------------------------- commit c9e73e3d upstream. func_states_equal makes a very short lived allocation for idmap, probably because it's too large to fit on the stack. However the function is called quite often, leading to a lot of alloc / free churn. Replace the temporary allocation with dedicated scratch space in struct bpf_verifier_env. Signed-off-by: NLorenz Bauer <lmb@cloudflare.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NEdward Cree <ecree.xilinx@gmail.com> Link: https://lore.kernel.org/bpf/20210429134656.122225-4-lmb@cloudflare.com [OP: adjusted context for 4.19] Signed-off-by: NOvidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Alexei Starovoitov 提交于
stable inclusion from linux-4.19.207 commit fc578c64398df4f93fd7a6143e218281cc7c4348 -------------------------------- commit fc559a70 upstream. fix tests that incorrectly assumed that the verifier cannot track constants through stack. Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NAndrii Nakryiko <andriin@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> [OP: backport to 4.19] Signed-off-by: NOvidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Andrey Ignatov 提交于
stable inclusion from linux-4.19.207 commit 2a28675d4b5cff2bfa0b3475c53edba1f5ec8401 -------------------------------- commit 8ff80e96 upstream. Test different scenarios of indirect variable-offset stack access: out of bound access (>0), min_off below initialized part of the stack, max_off+size above initialized part of the stack, initialized stack. Example of output: ... #856/p indirect variable-offset stack access, out of bound OK #857/p indirect variable-offset stack access, max_off+size > max_initialized OK #858/p indirect variable-offset stack access, min_off < min_initialized OK #859/p indirect variable-offset stack access, ok OK ... Signed-off-by: NAndrey Ignatov <rdna@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> [OP: backport to 4.19] Signed-off-by: NOvidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Andrey Ignatov 提交于
stable inclusion from linux-4.19.207 commit 7667818ef188832f69e2cf9cfed56e03a8f7436b -------------------------------- stable inclusion from linux-4.19.207 commit 7667818ef188832f69e2cf9cfed56e03a8f7436b -------------------------------- commit 107c26a7 upstream. As discussed in [1] max value of variable offset has to be checked for overflow on stack access otherwise verifier would accept code like this: 0: (b7) r2 = 6 1: (b7) r3 = 28 2: (7a) *(u64 *)(r10 -16) = 0 3: (7a) *(u64 *)(r10 -8) = 0 4: (79) r4 = *(u64 *)(r1 +168) 5: (c5) if r4 s< 0x0 goto pc+4 R1=ctx(id=0,off=0,imm=0) R2=inv6 R3=inv28 R4=inv(id=0,umax_value=9223372036854775807,var_off=(0x0; 0x7fffffffffffffff)) R10=fp0,call_-1 fp-8=mmmmmmmm fp-16=mmmmmmmm 6: (17) r4 -= 16 7: (0f) r4 += r10 8: (b7) r5 = 8 9: (85) call bpf_getsockopt#57 10: (b7) r0 = 0 11: (95) exit , where R4 obviosly has unbounded max value. Fix it by checking that reg->smax_value is inside (-BPF_MAX_VAR_OFF; BPF_MAX_VAR_OFF) range. reg->smax_value is used instead of reg->umax_value because stack pointers are calculated using negative offset from fp. This is opposite to e.g. map access where offset must be non-negative and where umax_value is used. Also dedicated verbose logs are added for both min and max bound check failures to have diagnostics consistent with variable offset handling in check_map_access(). [1] https://marc.info/?l=linux-netdev&m=155433357510597&w=2 Fixes: 2011fccf ("bpf: Support variable offset stack access from helpers") Reported-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NAndrey Ignatov <rdna@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NOvidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Andrey Ignatov 提交于
stable inclusion from linux-4.19.207 commit 14cf676ba6a0fb5e495baf843d750070f627d7e8 -------------------------------- commit 088ec26d upstream. Proper support of indirect stack access with variable offset in unprivileged mode (!root) requires corresponding support in Spectre masking for stack ALU in retrieve_ptr_limit(). There are no use-case for variable offset in unprivileged mode though so make verifier reject such accesses for simplicity. Pointer arithmetics is one (and only?) way to cause variable offset and it's already rejected in unpriv mode so that verifier won't even get to helper function whose argument contains variable offset, e.g.: 0: (7a) *(u64 *)(r10 -16) = 0 1: (7a) *(u64 *)(r10 -8) = 0 2: (61) r2 = *(u32 *)(r1 +0) 3: (57) r2 &= 4 4: (17) r2 -= 16 5: (0f) r2 += r10 variable stack access var_off=(0xfffffffffffffff0; 0x4) off=-16 size=1R2 stack pointer arithmetic goes out of range, prohibited for !root Still it looks like a good idea to reject variable offset indirect stack access for unprivileged mode in check_stack_boundary() explicitly. Fixes: 2011fccf ("bpf: Support variable offset stack access from helpers") Reported-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NAndrey Ignatov <rdna@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> [OP: drop comment in retrieve_ptr_limit()] Signed-off-by: NOvidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Andrey Ignatov 提交于
stable inclusion from linux-4.19.207 commit 148339cb18ed38a4b1de2269d1a4227fd51376ef -------------------------------- commit f2bcd05e upstream. It's hard to guarantee that whole memory is marked as initialized on helper return if uninitialized stack is accessed with variable offset since specific bounds are unknown to verifier. This may cause uninitialized stack leaking. Reject such an access in check_stack_boundary to prevent possible leaking. There are no known use-cases for indirect uninitialized stack access with variable offset so it shouldn't break anything. Fixes: 2011fccf ("bpf: Support variable offset stack access from helpers") Reported-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NAndrey Ignatov <rdna@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NOvidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Andrey Ignatov 提交于
stable inclusion from linux-4.19.207 commit caee4103f09e3dcce2ec0d0faf6ff245a6cffbad -------------------------------- commit 2011fccf upstream. Currently there is a difference in how verifier checks memory access for helper arguments for PTR_TO_MAP_VALUE and PTR_TO_STACK with regard to variable part of offset. check_map_access, that is used for PTR_TO_MAP_VALUE, can handle variable offsets just fine, so that BPF program can call a helper like this: some_helper(map_value_ptr + off, size); , where offset is unknown at load time, but is checked by program to be in a safe rage (off >= 0 && off + size < map_value_size). But it's not the case for check_stack_boundary, that is used for PTR_TO_STACK, and same code with pointer to stack is rejected by verifier: some_helper(stack_value_ptr + off, size); For example: 0: (7a) *(u64 *)(r10 -16) = 0 1: (7a) *(u64 *)(r10 -8) = 0 2: (61) r2 = *(u32 *)(r1 +0) 3: (57) r2 &= 4 4: (17) r2 -= 16 5: (0f) r2 += r10 6: (18) r1 = 0xffff888111343a80 8: (85) call bpf_map_lookup_elem#1 invalid variable stack read R2 var_off=(0xfffffffffffffff0; 0x4) Add support for variable offset access to check_stack_boundary so that if offset is checked by program to be in a safe range it's accepted by verifier. Signed-off-by: NAndrey Ignatov <rdna@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> [OP: replace reg_state(env, regno) helper with "cur_regs(env) + regno"] Signed-off-by: NOvidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Conflicts: kernel/bpf/verifier.c [yyl: adjust context] Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Jiong Wang 提交于
stable inclusion from linux-4.19.207 commit 79aba0ac3df1a604e843780b17c37646e175b4f8 -------------------------------- commit 0bae2d4d upstream. Verifier is supposed to support sharing stack slot allocated to ptr with SCALAR_VALUE for privileged program. However this doesn't happen for some cases. The reason is verifier is not clearing slot_type STACK_SPILL for all bytes, it only clears part of them, while verifier is using: slot_type[0] == STACK_SPILL as a convention to check one slot is ptr type. So, the consequence of partial clearing slot_type is verifier could treat a partially overridden ptr slot, which should now be a SCALAR_VALUE slot, still as ptr slot, and rejects some valid programs. Before this patch, test_xdp_noinline.o under bpf selftests, bpf_lxc.o and bpf_netdev.o under Cilium bpf repo, when built with -mattr=+alu32 are rejected due to this issue. After this patch, they all accepted. There is no processed insn number change before and after this patch on Cilium bpf programs. Reviewed-by: NJakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: NJiong Wang <jiong.wang@netronome.com> Reviewed-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> [OP: adjusted context for 4.19] Signed-off-by: NOvidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
stable inclusion from linux-4.19.207 commit 34b23fc32c05f14fd21caa32544f3720e1527a8d -------------------------------- commit 1a519dc7 upstream. When running as Xen PV guest, masking MSI-X is a responsibility of the hypervisor. The guest has no write access to the relevant BAR at all - when it tries to, it results in a crash like this: BUG: unable to handle page fault for address: ffffc9004069100c #PF: supervisor write access in kernel mode #PF: error_code(0x0003) - permissions violation RIP: e030:__pci_enable_msix_range.part.0+0x26b/0x5f0 e1000e_set_interrupt_capability+0xbf/0xd0 [e1000e] e1000_probe+0x41f/0xdb0 [e1000e] local_pci_probe+0x42/0x80 (...) The recently introduced function msix_mask_all() does not check the global variable pci_msi_ignore_mask which is set by XEN PV to bypass the masking of MSI[-X] interrupts. Add the check to make this function XEN PV compatible. Fixes: 7d5ec3d3 ("PCI/MSI: Mask all unused MSI-X entries") Signed-off-by: NMarek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NBjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210826170342.135172-1-marmarek@invisiblethingslab.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Nguyen Dinh Phi 提交于
stable inclusion from linux-4.19.207 commit f7bffefa322a3d5a292c0b7a9b93302b392928f6 -------------------------------- commit bb2853a6 upstream. The ops->receive_buf() may be accessed concurrently from these two functions. If the driver flushes data to the line discipline receive_buf() method while tiocsti() is waiting for the ops->receive_buf() to finish its work, the data race will happen. For example: tty_ioctl |tty_ldisc_receive_buf ->tioctsi | ->tty_port_default_receive_buf | ->tty_ldisc_receive_buf ->hci_uart_tty_receive | ->hci_uart_tty_receive ->h4_recv | ->h4_recv In this case, the h4 receive buffer will be overwritten by the latecomer, and we will lost the data. Hence, change tioctsi() function to use the exclusive lock interface from tty_buffer to avoid the data race. Reported-by: syzbot+97388eb9d31b997fe1d0@syzkaller.appspotmail.com Reviewed-by: NJiri Slaby <jirislaby@kernel.org> Signed-off-by: NNguyen Dinh Phi <phind.uet@gmail.com> Link: https://lore.kernel.org/r/20210823000641.2082292-1-phind.uet@gmail.com Cc: stable <stable@vger.kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Xiyu Yang 提交于
stable inclusion from linux-4.19.207 commit 5129766a9797ea212087314fafc68268f1bf8515 -------------------------------- [ Upstream commit c6607012 ] The reference counting issue happens in one exception handling path of cbq_change_class(). When failing to get tcf_block, the function forgets to decrease the refcount of "rtab" increased by qdisc_put_rtab(), causing a refcount leak. Fix this issue by jumping to "failure" label when get tcf_block failed. Fixes: 6529eaba ("net: sched: introduce tcf block infractructure") Signed-off-by: NXiyu Yang <xiyuyang19@fudan.edu.cn> Reviewed-by: NCong Wang <cong.wang@bytedance.com> Link: https://lore.kernel.org/r/1630252681-71588-1-git-send-email-xiyuyang19@fudan.edu.cnSigned-off-by: NJakub Kicinski <kuba@kernel.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Andy Duan 提交于
stable inclusion from linux-4.19.207 commit 600624ecbe35e0359806b107f247811b1f79ad25 -------------------------------- [ Upstream commit d5c38948 ] Register offset needs to be applied on mapbase also. dma_tx/rx_request use the physical address of UARTDATA. Register offset is currently only applied to membase (the corresponding virtual addr) but not on mapbase. Fixes: 24b1e5f0 ("tty: serial: lpuart: add imx7ulp support") Reviewed-by: NLeonard Crestez <leonard.crestez@nxp.com> Signed-off-by: NAdriana Reus <adriana.reus@nxp.com> Signed-off-by: NSherry Sun <sherry.sun@nxp.com> Signed-off-by: NAndy Duan <fugang.duan@nxp.com> Link: https://lore.kernel.org/r/20210819021033.32606-1-sherry.sun@nxp.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Len Baker 提交于
stable inclusion from linux-4.19.207 commit bea655491daf39f1934a71bf576bf3499092d3a4 -------------------------------- [ Upstream commit f980d055 ] strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated. Also, the strnlen() call does not avoid the read overflow in the strlcpy function when a not NUL-terminated string is passed. So, replace this block by a call to kstrndup() that avoids this type of overflow and does the same. Fixes: 066ce689 ("cifs: rename cifs_strlcpy_to_host and make it use new functions") Signed-off-by: NLen Baker <len.baker@gmx.com> Reviewed-by: NPaulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: NJeff Layton <jlayton@kernel.org> Signed-off-by: NSteve French <stfrench@microsoft.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Rafael J. Wysocki 提交于
stable inclusion from linux-4.19.207 commit d4bcd29b50d1a33111d3bacd7974f0d0bbec484f -------------------------------- [ Upstream commit 0e00392a ] PME signaling is only enabled by __pci_enable_wake() if the target device can signal PME from the given target power state (to avoid pointless reconfiguration of the device), but if the hierarchy above the device goes into D3cold, the device itself will end up in D3cold too, so if it can signal PME from D3cold, it should be enabled to do so in __pci_enable_wake(). [Note that if the device does not end up in D3cold and it cannot signal PME from the original target power state, it will not signal PME, so in that case the behavior does not change.] Link: https://lore.kernel.org/linux-pm/3149540.aeNJFYEL58@kreacher/ Fixes: 5bcc2fb4 ("PCI PM: Simplify PCI wake-up code") Reported-by: NMika Westerberg <mika.westerberg@linux.intel.com> Reported-by: NUtkarsh H Patel <utkarsh.h.patel@intel.com> Reported-by: NKoba Ko <koba.ko@canonical.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: NMika Westerberg <mika.westerberg@linux.intel.com> Tested-by: NMika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Rafael J. Wysocki 提交于
stable inclusion from linux-4.19.207 commit c5c62f4c936407fb734cae64700f20b68273b059 -------------------------------- [ Upstream commit da9f2150 ] It is inconsistent to return PCI_D0 from pci_target_state() instead of the original target state if 'wakeup' is true and the device cannot signal PME from D0. This only happens when the device cannot signal PME from the original target state and any shallower power states (including D0) and that case is effectively equivalent to the one in which PME singaling is not supported at all. Since the original target state is returned in the latter case, make the function do that in the former one too. Link: https://lore.kernel.org/linux-pm/3149540.aeNJFYEL58@kreacher/ Fixes: 666ff6f8 ("PCI/PM: Avoid using device_may_wakeup() for runtime PM") Reported-by: NMika Westerberg <mika.westerberg@linux.intel.com> Reported-by: NUtkarsh H Patel <utkarsh.h.patel@intel.com> Reported-by: NKoba Ko <koba.ko@canonical.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: NMika Westerberg <mika.westerberg@linux.intel.com> Tested-by: NMika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Martin KaFai Lau 提交于
stable inclusion from linux-4.19.207 commit 15bf24c5fb7948564397b450ead0441143d42af2 -------------------------------- [ Upstream commit 525e2f9f ] st->bucket stores the current bucket number. st->offset stores the offset within this bucket that is the sk to be seq_show(). Thus, st->offset only makes sense within the same st->bucket. These two variables are an optimization for the common no-lseek case. When resuming the seq_file iteration (i.e. seq_start()), tcp_seek_last_pos() tries to continue from the st->offset at bucket st->bucket. However, it is possible that the bucket pointed by st->bucket has changed and st->offset may end up skipping the whole st->bucket without finding a sk. In this case, tcp_seek_last_pos() currently continues to satisfy the offset condition in the next (and incorrect) bucket. Instead, regardless of the offset value, the first sk of the next bucket should be returned. Thus, "bucket == st->bucket" check is added to tcp_seek_last_pos(). The chance of hitting this is small and the issue is a decade old, so targeting for the next tree. Fixes: a8b690f9 ("tcp: Fix slowness in read /proc/net/tcp") Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NAndrii Nakryiko <andrii@kernel.org> Reviewed-by: NEric Dumazet <edumazet@google.com> Acked-by: NKuniyuki Iwashima <kuniyu@amazon.co.jp> Acked-by: NYonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210701200541.1033917-1-kafai@fb.comSigned-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Desmond Cheong Zhi Xi 提交于
stable inclusion from linux-4.19.207 commit 64dd1fbb0bb743ccd2fb420c441f4ca9732f598f -------------------------------- [ Upstream commit 2f488f69 ] There is an existing lock hierarchy of &dev->event_lock --> &fasync_struct.fa_lock --> &f->f_owner.lock from the following call chain: input_inject_event(): spin_lock_irqsave(&dev->event_lock,...); input_handle_event(): input_pass_values(): input_to_handler(): evdev_events(): evdev_pass_values(): spin_lock(&client->buffer_lock); __pass_event(): kill_fasync(): kill_fasync_rcu(): read_lock(&fa->fa_lock); send_sigio(): read_lock_irqsave(&fown->lock,...); &dev->event_lock is HARDIRQ-safe, so interrupts have to be disabled while grabbing &fasync_struct.fa_lock, otherwise we invert the lock hierarchy. However, since kill_fasync which calls kill_fasync_rcu is an exported symbol, it may not necessarily be called with interrupts disabled. As kill_fasync_rcu may be called with interrupts disabled (for example, in the call chain above), we replace calls to read_lock/read_unlock on &fasync_struct.fa_lock in kill_fasync_rcu with read_lock_irqsave/read_unlock_irqrestore. Signed-off-by: NDesmond Cheong Zhi Xi <desmondcheongzx@gmail.com> Signed-off-by: NJeff Layton <jlayton@kernel.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Thomas Gleixner 提交于
stable inclusion from linux-4.19.207 commit e6c3fefc6bb11bef1bd8adfe37e0f317303ab751 -------------------------------- [ Upstream commit 627ef5ae ] If __hrtimer_start_range_ns() is invoked with an already armed hrtimer then the timer has to be canceled first and then added back. If the timer is the first expiring timer then on removal the clockevent device is reprogrammed to the next expiring timer to avoid that the pending expiry fires needlessly. If the new expiry time ends up to be the first expiry again then the clock event device has to reprogrammed again. Avoid this by checking whether the timer is the first to expire and in that case, keep the timer on the current CPU and delay the reprogramming up to the point where the timer has been enqueued again. Reported-by: NLorenzo Colitti <lorenzo@google.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210713135157.873137732@linutronix.deSigned-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Dietmar Eggemann 提交于
stable inclusion from linux-4.19.207 commit 293fe77dbfe68c5794be4e76c9212003e9304d24 -------------------------------- [ Upstream commit b4da13aa ] A missing clock update is causing the following warning: rq->clock_update_flags < RQCF_ACT_SKIP WARNING: CPU: 112 PID: 2041 at kernel/sched/sched.h:1453 sub_running_bw.isra.0+0x190/0x1a0 ... CPU: 112 PID: 2041 Comm: sugov:112 Tainted: G W 5.14.0-rc1 #1 Hardware name: WIWYNN Mt.Jade Server System B81.030Z1.0007/Mt.Jade Motherboard, BIOS 1.6.20210526 (SCP: 1.06.20210526) 2021/05/26 ... Call trace: sub_running_bw.isra.0+0x190/0x1a0 migrate_task_rq_dl+0xf8/0x1e0 set_task_cpu+0xa8/0x1f0 try_to_wake_up+0x150/0x3d4 wake_up_q+0x64/0xc0 __up_write+0xd0/0x1c0 up_write+0x4c/0x2b0 cppc_set_perf+0x120/0x2d0 cppc_cpufreq_set_target+0xe0/0x1a4 [cppc_cpufreq] __cpufreq_driver_target+0x74/0x140 sugov_work+0x64/0x80 kthread_worker_fn+0xe0/0x230 kthread+0x138/0x140 ret_from_fork+0x10/0x18 The task causing this is the `cppc_fie` DL task introduced by commit 1eb5dde6 ("cpufreq: CPPC: Add support for frequency invariance"). With CONFIG_ACPI_CPPC_CPUFREQ_FIE=y and schedutil cpufreq governor on slow-switching system (like on this Ampere Altra WIWYNN Mt. Jade Arm Server): DL task `curr=sugov:112` lets `p=cppc_fie` migrate and since the latter is in `non_contending` state, migrate_task_rq_dl() calls sub_running_bw()->__sub_running_bw()->cpufreq_update_util()-> rq_clock()->assert_clock_updated() on p. Fix this by updating the clock for a non_contending task in migrate_task_rq_dl() before calling sub_running_bw(). Reported-by: NBruno Goncalves <bgoncalv@redhat.com> Signed-off-by: NDietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NDaniel Bristot de Oliveira <bristot@kernel.org> Acked-by: NJuri Lelli <juri.lelli@redhat.com> Link: https://lore.kernel.org/r/20210804135925.3734605-1-dietmar.eggemann@arm.comSigned-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Quentin Perret 提交于
stable inclusion from linux-4.19.207 commit bc0d543f53e0d73059cf10733be0109984dadd44 -------------------------------- [ Upstream commit f9509153 ] It is possible for sched_getattr() to incorrectly report the state of the reset_on_fork flag when called on a deadline task. Indeed, if the flag was set on a deadline task using sched_setattr() with flags (SCHED_FLAG_RESET_ON_FORK | SCHED_FLAG_KEEP_PARAMS), then p->sched_reset_on_fork will be set, but __setscheduler() will bail out early, which means that the dl_se->flags will not get updated by __setscheduler_params()->__setparam_dl(). Consequently, if sched_getattr() is then called on the task, __getparam_dl() will override kattr.sched_flags with the now out-of-date copy in dl_se->flags and report the stale value to userspace. To fix this, make sure to only copy the flags that are relevant to sched_deadline to and from the dl_se->flags field. Signed-off-by: NQuentin Perret <qperret@google.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210727101103.2729607-2-qperret@google.comSigned-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Peter Zijlstra 提交于
stable inclusion from linux-4.19.207 commit b7a041073f4e72083dcd688a003a59bf9d9c9124 -------------------------------- [ Upstream commit 048661a1 ] Yanfei reported that setting HANDOFF should not depend on recomputing @first, only on @first state. Which would then give: if (ww_ctx || !first) first = __mutex_waiter_is_first(lock, &waiter); if (first) __mutex_set_flag(lock, MUTEX_FLAG_HANDOFF); But because 'ww_ctx || !first' is basically 'always' and the test for first is relatively cheap, omit that first branch entirely. Reported-by: NYanfei Xu <yanfei.xu@windriver.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NWaiman Long <longman@redhat.com> Reviewed-by: NYanfei Xu <yanfei.xu@windriver.com> Link: https://lore.kernel.org/r/20210630154114.896786297@infradead.orgSigned-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Mathieu Desnoyers 提交于
stable inclusion from linux-4.19.207 commit 69178f2d652e64991dc812be24d0776a731f447d -------------------------------- commit e1e84eb5 upstream. As per RFC792, ICMP errors should be sent to the source host. However, in configurations with Virtual Routing and Forwarding tables, looking up which routing table to use is currently done by using the destination net_device. commit 9d1a6c4e ("net: icmp_route_lookup should use rt dev to determine L3 domain") changes the interface passed to l3mdev_master_ifindex() and inet_addr_type_dev_table() from skb_in->dev to skb_dst(skb_in)->dev. This effectively uses the destination device rather than the source device for choosing which routing table should be used to lookup where to send the ICMP error. Therefore, if the source and destination interfaces are within separate VRFs, or one in the global routing table and the other in a VRF, looking up the source host in the destination interface's routing table will fail if the destination interface's routing table contains no route to the source host. One observable effect of this issue is that traceroute does not work in the following cases: - Route leaking between global routing table and VRF - Route leaking between VRFs Preferably use the source device routing table when sending ICMP error messages. If no source device is set, fall-back on the destination device routing table. Else, use the main routing table (index 0). [ It has been pointed out that a similar issue may exist with ICMP errors triggered when forwarding between network namespaces. It would be worthwhile to investigate, but is outside of the scope of this investigation. ] [ It has also been pointed out that a similar issue exists with unreachable / fragmentation needed messages, which can be triggered by changing the MTU of eth1 in r1 to 1400 and running: ip netns exec h1 ping -s 1450 -Mdo -c1 172.16.2.2 Some investigation points to raw_icmp_error() and raw_err() as being involved in this last scenario. The focus of this patch is TTL expired ICMP messages, which go through icmp_route_lookup. Investigation of failure modes related to raw_icmp_error() is beyond this investigation's scope. ] Fixes: 9d1a6c4e ("net: icmp_route_lookup should use rt dev to determine L3 domain") Link: https://tools.ietf.org/html/rfc792Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Xiaoyao Li 提交于
stable inclusion from linux-4.19.207 commit f109e8a1678ce920b7f0df7865f2a31754ec5d1c -------------------------------- [ Upstream commit c53c6b74 ] Per SDM, bit 2:0 of CPUID(0x14,1).EAX[2:0] reports the number of configurable address ranges for filtering, not bit 1:0. Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Link: https://lkml.kernel.org/r/20210824040622.4081502-1-xiaoyao.li@intel.comSigned-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 zhangguijiang 提交于
ascend inclusion category: bugfix feature: Ascend emmc adaption bugzilla: https://gitee.com/openeuler/kernel/issues/I4F4LL CVE: NA -------------------- This reverts commit bb21deda. Signed-off-by: Nzhangguijiang <zhangguijiang@huawei.com> Reviewed-by: NWeilong Chen <chenweilong@huawei.com> Reviewed-by: NYANHONG LU <luyanhong.luyanhong@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 zhangguijiang 提交于
ascend inclusion category: bugfix feature: Ascend emmc adaption bugzilla: https://gitee.com/openeuler/kernel/issues/I4F4LL CVE: NA -------------------- This reverts commit 76b1b2e0. Signed-off-by: Nzhangguijiang <zhangguijiang@huawei.com> Reviewed-by: NWeilong Chen <chenweilong@huawei.com> Reviewed-by: NYANHONG LU <luyanhong.luyanhong@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 zhangguijiang 提交于
ascend inclusion category: bugfix feature: Ascend emmc adaption bugzilla: https://gitee.com/openeuler/kernel/issues/I4F4LL CVE: NA -------------------- This reverts commit 41236b73. Signed-off-by: Nzhangguijiang <zhangguijiang@huawei.com> Reviewed-by: NWeilong Chen <chenweilong@huawei.com> Reviewed-by: NYANHONG LU <luyanhong.luyanhong@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 zhangguijiang 提交于
ascend inclusion category: bugfix feature: Ascend emmc adaption bugzilla: https://gitee.com/openeuler/kernel/issues/I4F4LL CVE: NA -------------------- This reverts commit 52b5a666. Signed-off-by: Nzhangguijiang <zhangguijiang@huawei.com> Reviewed-by: NWeilong Chen <chenweilong@huawei.com> Reviewed-by: NYANHONG LU <luyanhong.luyanhong@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 zhangguijiang 提交于
ascend inclusion category: bugfix feature: Ascend emmc adaption bugzilla: https://gitee.com/openeuler/kernel/issues/I4F4LL CVE: NA -------------------- This reverts commit 37cdb660. Signed-off-by: Nzhangguijiang <zhangguijiang@huawei.com> Reviewed-by: NWeilong Chen <chenweilong@huawei.com> Reviewed-by: NYANHONG LU <luyanhong.luyanhong@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 zhangguijiang 提交于
ascend inclusion category: bugfix feature: Ascend emmc adaption bugzilla: https://gitee.com/openeuler/kernel/issues/I4F4LL CVE: NA -------------------- This reverts commit 3aa5fa87. Signed-off-by: Nzhangguijiang <zhangguijiang@huawei.com> Reviewed-by: NWeilong Chen <chenweilong@huawei.com> Reviewed-by: NYANHONG LU <luyanhong.luyanhong@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 zhangguijiang 提交于
ascend inclusion category: bugfix feature: Ascend emmc adaption bugzilla: https://gitee.com/openeuler/kernel/issues/I4F4LL CVE: NA -------------------- This reverts commit c9620822. Signed-off-by: Nzhangguijiang <zhangguijiang@huawei.com> Reviewed-by: NWeilong Chen <chenweilong@huawei.com> Reviewed-by: NYANHONG LU <luyanhong.luyanhong@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 zhangguijiang 提交于
ascend inclusion category: bugfix feature: Ascend emmc adaption bugzilla: https://gitee.com/openeuler/kernel/issues/I4F4LL CVE: NA -------------------- This reverts commit deaf01a6. Signed-off-by: Nzhangguijiang <zhangguijiang@huawei.com> Reviewed-by: NYANHONG LU <luyanhong.luyanhong@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 zhangguijiang 提交于
ascend inclusion category: bugfix feature: Ascend emmc adaption bugzilla: https://gitee.com/openeuler/kernel/issues/I4F4LL CVE: NA -------------------- This reverts commit 80e65cb9. Signed-off-by: Nzhangguijiang <zhangguijiang@huawei.com> Reviewed-by: NWeilong Chen <chenweilong@huawei.com> Reviewed-by: NYANHONG LU <luyanhong.luyanhong@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
- 11 11月, 2021 2 次提交
-
-
由 Mao HongBo 提交于
phytium inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I41AUQ ----------------------------------- To fix iommu issue of device access in virtualization scenario for ft2000plus and S2500. Convert to new cputype macros naming of phytium. Signed-off-by: NMao HongBo <maohongbo@phytium.com.cn> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Weilong Chen 提交于
ascend inclusion category: feature bugzilla: 46922 CVE: NA ------------------------------------- Patch "cache: Workaround HiSilicon Taishan DC CVAU" breaks the kabi symbols: cpu_hwcaps cpu_hwcap_keys Patch "arm64: Errata: fix kabi changed by cpu_errata" try to fix it but incomplete. Eable IDC on platform TSV{110,200}. Signed-off-by: NWeilong Chen <chenweilong@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
- 10 11月, 2021 1 次提交
-
-
由 Ye Bin 提交于
mainline inclusion from mainline-v5.16 commit a846a8e6 category: bugfix bugzilla: 185668 CVE: NA ----------------------------------------------- We got UAF report on v5.10 as follows: [ 1446.674930] ================================================================== [ 1446.675970] BUG: KASAN: use-after-free in blk_mq_get_driver_tag+0x9a4/0xa90 [ 1446.676902] Read of size 8 at addr ffff8880185afd10 by task kworker/1:2/12348 [ 1446.677851] [ 1446.678073] CPU: 1 PID: 12348 Comm: kworker/1:2 Not tainted 5.10.0-10177-gc9c81b1e346a #2 [ 1446.679168] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 1446.680692] Workqueue: kthrotld blk_throtl_dispatch_work_fn [ 1446.681448] Call Trace: [ 1446.681800] dump_stack+0x9b/0xce [ 1446.682916] print_address_description.constprop.6+0x3e/0x60 [ 1446.685999] kasan_report.cold.9+0x22/0x3a [ 1446.687186] blk_mq_get_driver_tag+0x9a4/0xa90 [ 1446.687785] blk_mq_dispatch_rq_list+0x21a/0x1d40 [ 1446.692576] __blk_mq_do_dispatch_sched+0x394/0x830 [ 1446.695758] __blk_mq_sched_dispatch_requests+0x398/0x4f0 [ 1446.698279] blk_mq_sched_dispatch_requests+0xdf/0x140 [ 1446.698967] __blk_mq_run_hw_queue+0xc0/0x270 [ 1446.699561] __blk_mq_delay_run_hw_queue+0x4cc/0x550 [ 1446.701407] blk_mq_run_hw_queue+0x13b/0x2b0 [ 1446.702593] blk_mq_sched_insert_requests+0x1de/0x390 [ 1446.703309] blk_mq_flush_plug_list+0x4b4/0x760 [ 1446.705408] blk_flush_plug_list+0x2c5/0x480 [ 1446.708471] blk_finish_plug+0x55/0xa0 [ 1446.708980] blk_throtl_dispatch_work_fn+0x23b/0x2e0 [ 1446.711236] process_one_work+0x6d4/0xfe0 [ 1446.711778] worker_thread+0x91/0xc80 [ 1446.713400] kthread+0x32d/0x3f0 [ 1446.714362] ret_from_fork+0x1f/0x30 [ 1446.714846] [ 1446.715062] Allocated by task 1: [ 1446.715509] kasan_save_stack+0x19/0x40 [ 1446.716026] __kasan_kmalloc.constprop.1+0xc1/0xd0 [ 1446.716673] blk_mq_init_tags+0x6d/0x330 [ 1446.717207] blk_mq_alloc_rq_map+0x50/0x1c0 [ 1446.717769] __blk_mq_alloc_map_and_request+0xe5/0x320 [ 1446.718459] blk_mq_alloc_tag_set+0x679/0xdc0 [ 1446.719050] scsi_add_host_with_dma.cold.3+0xa0/0x5db [ 1446.719736] virtscsi_probe+0x7bf/0xbd0 [ 1446.720265] virtio_dev_probe+0x402/0x6c0 [ 1446.720808] really_probe+0x276/0xde0 [ 1446.721320] driver_probe_device+0x267/0x3d0 [ 1446.721892] device_driver_attach+0xfe/0x140 [ 1446.722491] __driver_attach+0x13a/0x2c0 [ 1446.723037] bus_for_each_dev+0x146/0x1c0 [ 1446.723603] bus_add_driver+0x3fc/0x680 [ 1446.724145] driver_register+0x1c0/0x400 [ 1446.724693] init+0xa2/0xe8 [ 1446.725091] do_one_initcall+0x9e/0x310 [ 1446.725626] kernel_init_freeable+0xc56/0xcb9 [ 1446.726231] kernel_init+0x11/0x198 [ 1446.726714] ret_from_fork+0x1f/0x30 [ 1446.727212] [ 1446.727433] Freed by task 26992: [ 1446.727882] kasan_save_stack+0x19/0x40 [ 1446.728420] kasan_set_track+0x1c/0x30 [ 1446.728943] kasan_set_free_info+0x1b/0x30 [ 1446.729517] __kasan_slab_free+0x111/0x160 [ 1446.730084] kfree+0xb8/0x520 [ 1446.730507] blk_mq_free_map_and_requests+0x10b/0x1b0 [ 1446.731206] blk_mq_realloc_hw_ctxs+0x8cb/0x15b0 [ 1446.731844] blk_mq_init_allocated_queue+0x374/0x1380 [ 1446.732540] blk_mq_init_queue_data+0x7f/0xd0 [ 1446.733155] scsi_mq_alloc_queue+0x45/0x170 [ 1446.733730] scsi_alloc_sdev+0x73c/0xb20 [ 1446.734281] scsi_probe_and_add_lun+0x9a6/0x2d90 [ 1446.734916] __scsi_scan_target+0x208/0xc50 [ 1446.735500] scsi_scan_channel.part.3+0x113/0x170 [ 1446.736149] scsi_scan_host_selected+0x25a/0x360 [ 1446.736783] store_scan+0x290/0x2d0 [ 1446.737275] dev_attr_store+0x55/0x80 [ 1446.737782] sysfs_kf_write+0x132/0x190 [ 1446.738313] kernfs_fop_write_iter+0x319/0x4b0 [ 1446.738921] new_sync_write+0x40e/0x5c0 [ 1446.739429] vfs_write+0x519/0x720 [ 1446.739877] ksys_write+0xf8/0x1f0 [ 1446.740332] do_syscall_64+0x2d/0x40 [ 1446.740802] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1446.741462] [ 1446.741670] The buggy address belongs to the object at ffff8880185afd00 [ 1446.741670] which belongs to the cache kmalloc-256 of size 256 [ 1446.743276] The buggy address is located 16 bytes inside of [ 1446.743276] 256-byte region [ffff8880185afd00, ffff8880185afe00) [ 1446.744765] The buggy address belongs to the page: [ 1446.745416] page:ffffea0000616b00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x185ac [ 1446.746694] head:ffffea0000616b00 order:2 compound_mapcount:0 compound_pincount:0 [ 1446.747719] flags: 0x1fffff80010200(slab|head) [ 1446.748337] raw: 001fffff80010200 ffffea00006a3208 ffffea000061bf08 ffff88801004f240 [ 1446.749404] raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000 [ 1446.750455] page dumped because: kasan: bad access detected [ 1446.751227] [ 1446.751445] Memory state around the buggy address: [ 1446.752102] ffff8880185afc00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 1446.753090] ffff8880185afc80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 1446.754079] >ffff8880185afd00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 1446.755065] ^ [ 1446.755589] ffff8880185afd80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 1446.756574] ffff8880185afe00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 1446.757566] ================================================================== Flag 'BLK_MQ_F_TAG_QUEUE_SHARED' will be set if the second device on the same host initializes it's queue successfully. However, if the second device failed to allocate memory in blk_mq_alloc_and_init_hctx() from blk_mq_realloc_hw_ctxs() from blk_mq_init_allocated_queue(), __blk_mq_free_map_and_rqs() will be called on error path, and if 'BLK_MQ_TAG_HCTX_SHARED' is not set, 'tag_set->tags' will be freed while it's still used by the first device. To fix this issue we move release newly allocated hardware context from blk_mq_realloc_hw_ctxs to __blk_mq_update_nr_hw_queues. As there is needn't to release hardware context in blk_mq_init_allocated_queue. Fixes: 868f2f0b ("blk-mq: dynamic h/w context count") Signed-off-by: NYe Bin <yebin10@huawei.com> Signed-off-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NMing Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211108074019.1058843-1-yebin10@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk> conflicts: block/blk-mq.c Signed-off-by: NYe Bin <yebin10@huawei.com> Reviewed-by: NJason Yan <yanaijie@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-