- 27 12月, 2019 40 次提交
-
-
由 Yuchung Cheng 提交于
mainline inclusion from mainline-5.0 commit 31aa6503 category: bugfix bugzilla: 6982 CVE: NA ------------------------------------------------- The existing BPF TCP initial congestion window (TCP_BPF_IW) does not to work on (active) Fast Open sender. This is because it changes the (initial) window only if data_segs_out is zero -- but data_segs_out is also incremented on SYN-data. This patch fixes the issue by proerly accounting for SYN-data additionally. Fixes: fc747810 ("bpf: Adds support for setting initial cwnd") Signed-off-by: NYuchung Cheng <ycheng@google.com> Reviewed-by: NNeal Cardwell <ncardwell@google.com> Acked-by: NLawrence Brakmo <brakmo@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NMao Wenan <maowenan@huawei.com> Reviewed-by: NWei Yongjun <weiyongjun1@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Andrei Vagin 提交于
mainline inclusion from mainline-4.20 commit c34c1287 category: bugfix bugzilla: 6053 CVE: NA ------------------------------------------------- IPPROTO_RAW isn't registred as an inet protocol, so inet_protos[protocol] is always NULL for it. Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Xin Long <lucien.xin@gmail.com> Fixes: bf2ae2e4 ("sock_diag: request _diag module only when the family or proto has been registered") Signed-off-by: NAndrei Vagin <avagin@gmail.com> Reviewed-by: NCyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NMao Wenan <maowenan@huawei.com> Reviewed-by: NWei Yongjun <weiyongjun1@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Florian Westphal 提交于
mainline inclusion from mainline-5.0 commit 2035f3ff category: bugfix bugzilla: 7269 CVE: NA ------------------------------------------------- Unlike ip(6)tables ebtables only counts user-defined chains. The effect is that a 32bit ebtables binary on a 64bit kernel can do 'ebtables -N FOO' only after adding at least one rule, else the request fails with -EINVAL. This is a similar fix as done in 3f1e53ab ("netfilter: ebtables: don't attempt to allocate 0-sized compat array"). Fixes: 7d7d7e02 ("netfilter: compat: reject huge allocation requests") Reported-by: NFrancesco Ruggeri <fruggeri@arista.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: NMao Wenan <maowenan@huawei.com> Reviewed-by: NWei Yongjun <weiyongjun1@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
mainline inclusion from mainline-5.0 commit 1a6a0951 category: bugfix bugzilla: 7252 CVE: NA ------------------------------------------------- When we check the tcp options of a packet and it doesn't match the current fingerprint, the tcp packet option pointer must be restored to its initial value in order to do the proper tcp options check for the next fingerprint. Here we can see an example. Assumming the following fingerprint base with two lines: S10:64:1:60:M*,S,T,N,W6: Linux:3.0::Linux 3.0 S20:64:1:60:M*,S,T,N,W7: Linux:4.19:arch:Linux 4.1 Where TCP options are the last field in the OS signature, all of them overlap except by the last one, ie. 'W6' versus 'W7'. In case a packet for Linux 4.19 kicks in, the osf finds no matching because the TCP options pointer is updated after checking for the TCP options in the first line. Therefore, reset pointer back to where it should be. Fixes: 11eeef41 ("netfilter: passive OS fingerprint xtables match") Signed-off-by: NFernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: NMao Wenan <maowenan@huawei.com> Reviewed-by: NWei Yongjun <weiyongjun1@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Benedict Wong 提交于
mainline inclusion from mainline-5.0 commit e2612cd4 category: bugfix bugzilla: 7260 CVE: NA ------------------------------------------------- Fixes 9b42c1f1, which changed the default route lookup behavior for tunnel mode SAs in the outbound direction to use the skb mark, whereas previously mark=0 was used if the output mark was unspecified. In mark-based routing schemes such as Android’s, this change in default behavior causes routing loops or lookup failures. This patch restores the default behavior of using a 0 mark while still incorporating the skb mark if the SET_MARK (and SET_MARK_MASK) is specified. Tested with additions to Android's kernel unit test suite: https://android-review.googlesource.com/c/kernel/tests/+/860150 Fixes: 9b42c1f1 ("xfrm: Extend the output_mark to support input direction and masking") Signed-off-by: NBenedict Wong <benedictwong@google.com> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NMao Wenan <maowenan@huawei.com> Reviewed-by: NWei Yongjun <weiyongjun1@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Taehee Yoo 提交于
mainline inclusion from mainline-4.20 commit f24d2d4f category: bugfix bugzilla: 6238 CVE: NA ------------------------------------------------- TEE netdevice notifier handler checks only interface name. however each netns can have same interface name. hence other netns's interface could be selected. test commands: %ip netns add vm1 %iptables -I INPUT -p icmp -j TEE --gateway 192.168.1.1 --oif enp2s0 %ip link set enp2s0 netns vm1 Above rule is in the root netns. but that rule could get enp2s0 ifindex of vm1 by notifier handler. After this patch, TEE rule is added to the per-netns list. Fixes: 9e2f6c5d ("netfilter: Rework xt_TEE netdevice notifier") Signed-off-by: NTaehee Yoo <ap420073@gmail.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: NMao Wenan <maowenan@huawei.com> Reviewed-by: NWei Yongjun <weiyongjun1@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Taehee Yoo 提交于
mainline inclusion from mainline-4.20 commit 18c0ab87 category: bugfix bugzilla: 6233 CVE: NA ------------------------------------------------- checkentry(tee_tg_check) should initialize priv->oif from dev if possible. But only netdevice notifier handler can set that. Hence priv->oif is always -1 until notifier handler is called. Fixes: 9e2f6c5d ("netfilter: Rework xt_TEE netdevice notifier") Signed-off-by: NTaehee Yoo <ap420073@gmail.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: NMao Wenan <maowenan@huawei.com> Reviewed-by: NWei Yongjun <weiyongjun1@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Florian Westphal 提交于
mainline inclusion from mainline-5.0 commit b2e3d68d category: bugfix bugzilla: 7256 CVE: NA ------------------------------------------------- The nft_compat destroy function deletes the nft_xt object from a list. This isn't allowed anymore. Destroy functions are called asynchronously, i.e. next batch can find the object that has a pending ->destroy() invocation: cpu0 cpu1 worker ->destroy for_each_entry() if (x == ... return x->ops; list_del(x) kfree_rcu(x) expr->ops->... // ops was free'd To resolve this, the list_del needs to occur before the transaction mutex gets released. nf_tables has a 'deactivate' hook for this purpose, so use that to unlink the object from the list. Fixes: 0935d558 ("netfilter: nf_tables: asynchronous release") Reported-by: NTaehee Yoo <ap420073@gmail.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: NMao Wenan <maowenan@huawei.com> Reviewed-by: NWei Yongjun <weiyongjun1@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Florian Westphal 提交于
mainline inclusion from mainline-5.0 commit cf52572e category: bugfix bugzilla: 7256 CVE: NA ------------------------------------------------- There are two problems with nft_compat since the netlink config plane uses a per-netns mutex: 1. Concurrent add/del accesses to the same list 2. accesses to a list element after it has been free'd already. This patch fixes the first problem. Freeing occurs from a work queue, after transaction mutexes have been released, i.e., it still possible for a new transaction (even from same net ns) to find the to-be-deleted expression in the list. The ->destroy functions are not allowed to have any such side effects, i.e. the list_del() in the destroy function is not allowed. This part of the problem is solved in the next patch. I tried to make this work by serializing list access via mutex and by moving list_del() to a deactivate callback, but Taehee spotted following race on this approach: NET #0 NET #1 >select_ops() ->init() ->select_ops() ->deactivate() ->destroy() nft_xt_put() kfree_rcu(xt, rcu_head); ->init() <-- use-after-free occurred. Unfortunately, we can't increment reference count in select_ops(), because we can't undo the refcount increase in case a different expression fails in the same batch. (The destroy hook will only be called in case the expression was initialized successfully). Fixes: f102d66b ("netfilter: nf_tables: use dedicated mutex to guard transactions") Reported-by: NTaehee Yoo <ap420073@gmail.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: NMao Wenan <maowenan@huawei.com> Reviewed-by: NWei Yongjun <weiyongjun1@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Florian Westphal 提交于
mainline inclusion from mainline-5.0 commit 12c44aba category: bugfix bugzilla: 7256 CVE: NA ------------------------------------------------- Using standard integer type was fine while all operations on it were guarded by the nftnl subsys mutex. This isn't true anymore: 1. transactions are guarded only by a pernet mutex, so concurrent rule manipulation in different netns is racy 2. the ->destroy hook runs from a work queue after the transaction mutex has been released already. cpu0 cpu1 (net 1) cpu2 (net 2) kworker nft_compat->destroy nft_compat->init nft_compat->init if (--nft_xt->ref == 0) nft_xt->ref++ nft_xt->ref++ Switch to refcount_t. Doing this however only fixes a minor aspect, nft_compat also performs linked-list operations in an unsafe way. This is addressed in the next two patches. Fixes: f102d66b ("netfilter: nf_tables: use dedicated mutex to guard transactions") Fixes: 0935d558 ("netfilter: nf_tables: asynchronous release") Reported-by: NTaehee Yoo <ap420073@gmail.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: NMao Wenan <maowenan@huawei.com> Reviewed-by: NWei Yongjun <weiyongjun1@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Suzuki K Poulose 提交于
mainline inclusion from mainline-4.20 commit 1602df02 category: bugfix bugzilla: 5645 CVE: NA ------------------------------------------------- CTR_EL0.IDC reports the data cache clean requirements for instruction to data coherence. However, if the field is 0, we need to check the CLIDR_EL1 fields to detect the status of the feature. Currently we don't do this and generate a warning with tainting the kernel, when there is a mismatch in the field among the CPUs. Also the userspace doesn't have a reliable way to check the CLIDR_EL1 register to check the status. This patch fixes the problem by checking the CLIDR_EL1 fields, when (CTR_EL0.IDC == 0) and updates the kernel's copy of the CTR_EL0 for the CPU with the actual status of the feature. This would allow the sanity check infrastructure to do the proper checking of the fields and also allow the CTR_EL0 emulation code to supply the real status of the feature. Now, if a CPU has raw CTR_EL0.IDC == 0 and effective IDC == 1 (with overall system wide IDC == 1), we need to expose the real value to the user. So, we trap CTR_EL0 access on the CPU which reports incorrect CTR_EL0.IDC. Conflicts: arch/arm64/kernel/cpu_errata.c Fixes: commit 6ae4b6e0 ("arm64: Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC") Cc: Shanker Donthineni <shankerd@codeaurora.org> Cc: Philip Elcan <pelcan@codeaurora.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: NSuzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> [yyl: adjust context] Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Reviewed-by: NXuefeng Wang <wxf.wang@hisilicon.com>
-
由 Jan Kara 提交于
mainline inclusion from mainline-5.0 commit 8d20d6e9 category: bugfix bugzilla: 5765 CVE: NA ------------------------------------------------- Currently chunk hash key (which is in fact pointer to the inode) is derived as chunk->mark.conn->obj. It is tricky to make this dereference reliable for hash table lookups only under RCU as mark can get detached from the connector and connector gets freed independently of the running lookup. Thus there is a possible use after free / NULL ptr dereference issue: CPU1 CPU2 untag_chunk() ... audit_tree_lookup() list_for_each_entry_rcu(p, list, hash) { list_del_rcu(&chunk->hash); fsnotify_destroy_mark(entry); fsnotify_put_mark(entry) chunk_to_key(p) if (!chunk->mark.connector) ... hlist_del_init_rcu(&mark->obj_list); if (hlist_empty(&conn->list)) { inode = fsnotify_detach_connector_from_object(conn); mark->connector = NULL; ... frees connector from workqueue chunk->mark.connector->obj This race is probably impossible to hit in practice as the race window on CPU1 is very narrow and CPU2 has a lot of code to execute. Still it's better to have this fixed. Since the inode the chunk is attached to is constant during chunk's lifetime it is easy to cache the key in the chunk itself and thus avoid these issues. Reviewed-by: NRichard Guy Briggs <rgb@redhat.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NPaul Moore <paul@paul-moore.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Reviewed-by: NCheng Jian <cj.chengjian@huawei.com>
-
由 James Morse 提交于
mainline inclusion from mainline-v5.0-rc4 commit f2b3d856 category: bugfix bugzilla: NA CVE: NA ------------------------------------------------- On systems with VHE the kernel and KVM's world-switch code run at the same exception level. Code that is only used on a VHE system does not need to be annotated as __hyp_text as it can reside anywhere in the kernel text. __hyp_text was also used to prevent kprobes from patching breakpoint instructions into this region, as this code runs at a different exception level. While this is no longer true with VHE, KVM still switches VBAR_EL1, meaning a kprobe's breakpoint executed in the world-switch code will cause a hyp-panic. Move the __hyp_text check in the kprobes blacklist so it applies on VHE systems too, to cover the common code and guest enter/exit assembly. Fixes: 888b3c87 ("arm64: Treat all entry code as non-kprobe-able") Reviewed-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NJames Morse <james.morse@arm.com> Acked-by: NMasami Hiramatsu <mhiramat@kernel.org> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Reviewed-by: NCheng Jian <cj.chengjian@huawei.com>
-
由 Bjorn Helgaas 提交于
mainline inclusion from mainline-4.20 commit a98959fd category: bugfix bugzilla: 5773 CVE: NA ------------------------------------------------- find_next_iomem_res() finds an iomem resource that covers part of a range described by "start, end". All callers expect that range to be inclusive, i.e., both start and end are included, but find_next_iomem_res() doesn't handle the end address correctly. If it finds an iomem resource that contains exactly the end address, it skips it, e.g., if "start, end" is [0x0-0x10000] and there happens to be an iomem resource [mem 0x10000-0x10000] (the single byte at 0x10000), we skip it: find_next_iomem_res(...) { start = 0x0; end = 0x10000; for (p = next_resource(...)) { # p->start = 0x10000; # p->end = 0x10000; # we *should* return this resource, but this condition is false: if ((p->end >= start) && (p->start < end)) break; Adjust find_next_iomem_res() so it allows a resource that includes the single byte at the end of the range. This is a corner case that we probably don't see in practice. Fixes: 58c1b5b0 ("[PATCH] memory hotadd fixes: find_next_system_ram catch range fix") Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Signed-off-by: NBorislav Petkov <bp@suse.de> CC: Andrew Morton <akpm@linux-foundation.org> CC: Brijesh Singh <brijesh.singh@amd.com> CC: Dan Williams <dan.j.williams@intel.com> CC: H. Peter Anvin <hpa@zytor.com> CC: Lianbo Jiang <lijiang@redhat.com> CC: Takashi Iwai <tiwai@suse.de> CC: Thomas Gleixner <tglx@linutronix.de> CC: Tom Lendacky <thomas.lendacky@amd.com> CC: Vivek Goyal <vgoyal@redhat.com> CC: Yaowei Bai <baiyaowei@cmss.chinamobile.com> CC: bhe@redhat.com CC: dan.j.williams@intel.com CC: dyoung@redhat.com CC: kexec@lists.infradead.org CC: mingo@redhat.com CC: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/153805812254.1157.16736368485811773752.stgit@bhelgaas-glaptop.roam.corp.google.comSigned-off-by: NYang Yingliang <yangyingliang@huawei.com> Reviewed-by: NCheng Jian <cj.chengjian@huawei.com>
-
由 Dan Carpenter 提交于
mainline inclusion from mainline-5.0 commit db7ddeab category: bugfix bugzilla: 7418 CVE: NA ------------------------------------------------- There is a copy and paste bug so we set "config->test_driver" to NULL twice instead of setting "config->test_fs". Smatch complains that it leads to a double free: lib/test_kmod.c:840 __kmod_config_init() warn: 'config->test_fs' double freed Link: http://lkml.kernel.org/r/20190121140011.GA14283@kadam Fixes: d9c6a72d ("kmod: add test driver to stress test the module loader") Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Acked-by: NLuis Chamberlain <mcgrof@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit db7ddeab) Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com> Reviewed-by: NCheng Jian <cj.chengjian@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Andrey Ryabinin 提交于
mainline inclusion from mainline-4.20 commit 0c96350a category: bugfix bugzilla: 5689 CVE: NA ------------------------------------------------- Arch code may have asm implementation of string/memory API functions instead of using generic one from lib/string.c. KASAN don't see memory accesses in asm code, thus can miss many bugs. E.g. on ARM64 KASAN don't see bugs in memchr(), memcmp(), str[r]chr(), str[n]cmp(), str[n]len(). Add tests for these functions to be sure that we notice the problem on other architectures. Link: http://lkml.kernel.org/r/20180920135631.23833-3-aryabinin@virtuozzo.comSigned-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Kyeongdon Kim <kyeongdon.kim@lge.com> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 0c96350a) Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com> Reviewed-by: NCheng Jian <cj.chengjian@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Andrey Ryabinin 提交于
mainline inclusion from mainline-4.20 commit 19a2ca0f category: bugfix bugzilla: 5689 CVE: NA ------------------------------------------------- ARM64 has asm implementation of memchr(), memcmp(), str[r]chr(), str[n]cmp(), str[n]len(). KASAN don't see memory accesses in asm code, thus it can potentially miss many bugs. Ifdef out __HAVE_ARCH_* defines of these functions when KASAN is enabled, so the generic implementations from lib/string.c will be used. We can't just remove the asm functions because efistub uses them. And we can't have two non-weak functions either, so declare the asm functions as weak. Link: http://lkml.kernel.org/r/20180920135631.23833-2-aryabinin@virtuozzo.comSigned-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com> Reported-by: NKyeongdon Kim <kyeongdon.kim@lge.com> Cc: Alexander Potapenko <glider@google.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 19a2ca0f) Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com> Reviewed-by: NCheng Jian <cj.chengjian@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Andrey Ryabinin 提交于
mainline inclusion from mainline-4.20 commit 74f213ea category: bugfix bugzilla: 5689 CVE: NA ------------------------------------------------- Since WEAK() supposed to be used instead of ENTRY() to define weak symbols, but unlike ENTRY() it doesn't have ALIGN directive. It seems there is no actual reason to not have, so let's add ALIGN to WEAK() too. Link: http://lkml.kernel.org/r/20180920135631.23833-1-aryabinin@virtuozzo.comSigned-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com> Will Deacon <will.deacon@arm.com>, Catalin Marinas <catalin.marinas@arm.com> Cc: Kyeongdon Kim <kyeongdon.kim@lge.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 74f213ea) Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com> Reviewed-by: NCheng Jian <cj.chengjian@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Mattias Jacobsson 提交于
mainline inclusion from mainline-5.0 commit 099be748 category: bugfix bugzilla: 6914 CVE: NA ------------------------------------------------- Each call to va_copy() should have one, and only one, corresponding call to va_end(). In strbuf_addv() some code paths result in va_end() getting called multiple times. Remove the superfluous va_end(). Signed-off-by: NMattias Jacobsson <2pi@mok.nu> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sanskriti Sharma <sansharm@redhat.com> Link: http://lkml.kernel.org/r/20181229141750.16945-1-2pi@mok.nu Fixes: ce49d843 ("perf strbuf: Match va_{add,copy} with va_end") Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com> (cherry picked from commit 099be748) Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com> Reviewed-by: NCheng Jian <cj.chengjian@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Hongxu Jia 提交于
mainline inclusion from mainline-4.20 commit 389373d3 category: bugfix bugzilla: 6798 CVE: NA ------------------------------------------------- When /tmp is mounted with noexec, mksyscalltbl fails. [snip] |perf-1.0/tools/perf/arch/arm64/entry/syscalls//mksyscalltbl: /tmp/create-table-6VGPSt: Permission denied [snip] Add variable TMPDIR as prefix dir of the temporary file, if it is set, replace default /tmp. Signed-off-by: NHongxu Jia <hongxu.jia@windriver.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kim Phillips <kim.phillips@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Sébastien Boisvert <sboisvert@gydle.com> Cc: Thomas Richter <tmricht@linux.vnet.ibm.com> Fixes: 2b588243 ("perf arm64: Generate system call table from asm/unistd.h") LPU-Reference: 1539851173-14959-1-git-send-email-hongxu.jia@windriver.com Link: https://lkml.kernel.org/n/tip-1qrgq840ci0c5cy4oww957ge@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com> (cherry picked from commit 389373d3) Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com> Reviewed-by: NCheng Jian <cj.chengjian@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Arnaldo Carvalho de Melo 提交于
mainline inclusion from mainline-5.0 commit 5192bde7 category: bugfix bugzilla: 6786 CVE: NA ------------------------------------------------- The strncpy() function may leave the destination string buffer unterminated, better use strlcpy() that we have a __weak fallback implementation for systems without it. This fixes this warning on an Alpine Linux Edge system with gcc 8.2: util/header.c: In function 'perf_event__synthesize_event_update_name': util/header.c:3625:2: error: 'strncpy' output truncated before terminating nul copying as many bytes from a string as its length [-Werror=stringop-truncation] strncpy(ev->data, evsel->name, len); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ util/header.c:3618:15: note: length computed here size_t len = strlen(evsel->name); ^~~~~~~~~~~~~~~~~~~ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Fixes: a6e52817 ("perf tools: Add event_update event unit type") Link: https://lkml.kernel.org/n/tip-wycz66iy8dl2z3yifgqf894p@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com> (cherry picked from commit 5192bde7) Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com> Reviewed-by: NCheng Jian <cj.chengjian@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Ravi Bangoria 提交于
mainline inclusion from mainline-5.0 commit 57ddf091 category: bugfix bugzilla: 6781 CVE: NA ------------------------------------------------- Commit 0aa802a7 ("perf stat: Get rid of extra clock display function") introduced scale and unit for clock events. Thus, perf_stat__update_shadow_stats() now saves scaled values of clock events in msecs, instead of original nsecs. But while calculating values of shadow stats we still consider clock event values in nsecs. This results in a wrong shadow stat values. Ex, # ./perf stat -e task-clock,cycles ls <SNIP> 2.60 msec task-clock:u # 0.877 CPUs utilized 2,430,564 cycles:u # 1215282.000 GHz Fix this by saving original nsec values for clock events in perf_stat__update_shadow_stats(). After patch: # ./perf stat -e task-clock,cycles ls <SNIP> 3.14 msec task-clock:u # 0.839 CPUs utilized 3,094,528 cycles:u # 0.985 GHz Suggested-by: NJiri Olsa <jolsa@redhat.com> Reported-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NRavi Bangoria <ravi.bangoria@linux.ibm.com> Reviewed-by: NJiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Richter <tmricht@linux.vnet.ibm.com> Cc: yuzhoujian@didichuxing.com Fixes: 0aa802a7 ("perf stat: Get rid of extra clock display function") Link: http://lkml.kernel.org/r/20181116042843.24067-1-ravi.bangoria@linux.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com> (cherry picked from commit 57ddf091) Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com> Reviewed-by: NCheng Jian <cj.chengjian@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Ravi Bangoria 提交于
mainline inclusion from mainline-5.0 commit eb08d006 category: prepare bugzilla: 6781 CVE: NA ------------------------------------------------- We already have function to check if a given event is either SW_CPU_CLOCK or SW_TASK_CLOCK. Utilize it. Signed-off-by: NRavi Bangoria <ravi.bangoria@linux.ibm.com> Acked-by: NJiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anton Blanchard <anton@samba.org> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Richter <tmricht@linux.vnet.ibm.com> Cc: yuzhoujian@didichuxing.com Link: http://lkml.kernel.org/r/20181115095533.16930-1-ravi.bangoria@linux.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com> (cherry picked from commit eb08d006) Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com> Reviewed-by: NCheng Jian <cj.chengjian@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Eric W. Biederman 提交于
mainline inclusion from mainline-4.20 commit 55a3235f category: bugfix bugzilla: 5741 CVE: NA ------------------------------------------------- For userspace to tell the difference between a random signal and an exception, the exception must include siginfo information. Using SEND_SIG_FORCED for SIGILL is thus wrong, and it will result in userspace seeing si_code == SI_USER (like a random signal) instead of si_code == SI_KERNEL or a more specific si_code as all exceptions deliver. Therefore replace force_sig_info(SIGILL, SEND_SIG_FORCE, current) with force_sig(SIG_ILL, current) which gets this right and is shorter and easier to type. Fixes: 014940ba ("uprobes/x86: Send SIGILL if arch_uprobe_post_xol() fails") Fixes: 0b5256c7 ("uprobes: Send SIGILL if handle_trampoline() fails") Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> (cherry picked from commit 55a3235f) Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com> Reviewed-by: NCheng Jian <cj.chengjian@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Will Deacon 提交于
mainline inclusion from mainline-4.20 commit 8a60419d category: bugfix bugzilla: 5607 CVE: NA ------------------------------------------------- force_signal_inject() is designed to send a fatal signal to userspace, so WARN if the current pt_regs indicates a kernel context. This can currently happen for the undefined instruction trap, so patch that up so we always BUG() if we didn't have a handler. Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> (cherry picked from commit 8a60419d) Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com> Reviewed-by: NCheng Jian <cj.chengjian@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Prateek Sood 提交于
mainline inclusion from mainline-5.0 commit 6dc080ee category: bugfix bugzilla: 7209 CVE: NA ------------------------------------------------- For some peculiar reason rcuwait_wake_up() has the right barrier in the comment, but not in the code. This mistake has been observed to cause a deadlock in the following situation: P1 P2 percpu_up_read() percpu_down_write() rcu_sync_is_idle() // false rcu_sync_enter() ... __percpu_up_read() [S] ,- __this_cpu_dec(*sem->read_count) | smp_rmb(); [L] | task = rcu_dereference(w->task) // NULL | | [S] w->task = current | smp_mb(); | [L] readers_active_check() // fail `-> <store happens here> Where the smp_rmb() (obviously) fails to constrain the store. [ peterz: Added changelog. ] Signed-off-by: NPrateek Sood <prsood@codeaurora.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NAndrea Parri <andrea.parri@amarulasolutions.com> Acked-by: NDavidlohr Bueso <dbueso@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 8f95c90c ("sched/wait, RCU: Introduce rcuwait machinery") Link: https://lkml.kernel.org/r/1543590656-7157-1-git-send-email-prsood@codeaurora.orgSigned-off-by: NIngo Molnar <mingo@kernel.org> (cherry picked from commit 6dc080ee) Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com> Reviewed-by: NCheng Jian <cj.chengjian@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Xie Yongji 提交于
mainline inclusion from mainline-5.0 commit e158488b category: bugfix bugzilla: 7210 CVE: NA ------------------------------------------------- Because wake_q_add() can imply an immediate wakeup (cmpxchg failure case), we must not rely on the wakeup being delayed. However, commit: e3851390 ("locking/rwsem: Rework zeroing reader waiter->task") relies on exactly that behaviour in that the wakeup must not happen until after we clear waiter->task. [ peterz: Added changelog. ] Signed-off-by: NXie Yongji <xieyongji@baidu.com> Signed-off-by: NZhang Yu <zhangyu31@baidu.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: e3851390 ("locking/rwsem: Rework zeroing reader waiter->task") Link: https://lkml.kernel.org/r/1543495830-2644-1-git-send-email-xieyongji@baidu.comSigned-off-by: NIngo Molnar <mingo@kernel.org> (cherry picked from commit e158488b) Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com> Reviewed-by: NCheng Jian <cj.chengjian@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Wei Li 提交于
hulk inclusion category: feature bugzilla: 9290 CVE: NA Add interrupt statistics for NMIs, as we can get NMI counters on the x86 machine in /proc/interrputs. ------------------------------------------------- Signed-off-by: NWei Li <liwei391@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Julien Thierry 提交于
hulk inclusion category: feature bugzilla: 9290 CVE: NA ported from https://lore.kernel.org/patchwork/patch/1037462/ ------------------------------------------------- NMI handling code should be executed between calls to nmi_enter and nmi_exit. Add a separate domain handler to properly setup NMI context when handling an interrupt requested as NMI. Signed-off-by: NJulien Thierry <julien.thierry@arm.com> Acked-by: NMarc Zyngier <marc.zyngier@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: NWei Li <liwei391@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Julien Thierry 提交于
hulk inclusion category: feature bugzilla: 9290 CVE: NA ported from https://lore.kernel.org/patchwork/patch/1037463/ ------------------------------------------------- Provide flow handlers that are NMI safe for interrupts and percpu_devid interrupts. Signed-off-by: NJulien Thierry <julien.thierry@arm.com> Acked-by: NMarc Zyngier <marc.zyngier@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: NWei Li <liwei391@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Julien Thierry 提交于
hulk inclusion category: feature bugzilla: 9290 CVE: NA ported from https://lore.kernel.org/patchwork/patch/1037461/ ------------------------------------------------- Add support for percpu_devid interrupts treated as NMIs. Percpu_devid NMIs need to be setup/torn down on each CPU they target. The same restrictions as for global NMIs still apply for percpu_devid NMIs. Signed-off-by: NJulien Thierry <julien.thierry@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: NWei Li <liwei391@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Julien Thierry 提交于
hulk inclusion category: feature bugzilla: 9290 CVE: NA ported from https://lore.kernel.org/patchwork/patch/1037460/ ------------------------------------------------- Add functionality to allocate interrupt lines that will deliver IRQs as Non-Maskable Interrupts. These allocations are only successful if the irqchip provides the necessary support and allows NMI delivery for the interrupt line. Interrupt lines allocated for NMI delivery must be enabled/disabled through enable_nmi/disable_nmi_nosync to keep their state consistent. To treat a PERCPU IRQ as NMI, the interrupt must not be shared nor threaded, the irqchip directly managing the IRQ must be the root irqchip and the irqchip cannot be behind a slow bus. Signed-off-by: NJulien Thierry <julien.thierry@arm.com> Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: NWei Li <liwei391@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Ming Lei 提交于
mainline inclusion from mainline-4.20-rc5 commit 2a5cf35c category: bugfix bugzilla: 5903 CVE: NA --------------------------- There are actually two kinds of discard merge: - one is the normal discard merge, just like normal read/write request, and call it single-range discard - another is the multi-range discard, queue_max_discard_segments(rq->q) > 1 For the former case, queue_max_discard_segments(rq->q) is 1, and we should handle this kind of discard merge like the normal read/write request. This patch fixes the following kernel panic issue[1], which is caused by not removing the single-range discard request from elevator queue. Guangwu has one raid discard test case, in which this issue is a bit easier to trigger, and I verified that this patch can fix the kernel panic issue in Guangwu's test case. [1] kernel panic log from Jens's report BUG: unable to handle kernel NULL pointer dereference at 0000000000000148 PGD 0 P4D 0. Oops: 0000 [#1] SMP PTI CPU: 37 PID: 763 Comm: kworker/37:1H Not tainted \ 4.20.0-rc3-00649-ge64d9a554a91-dirty #14 Hardware name: Wiwynn \ Leopard-Orv2/Leopard-DDR BW, BIOS LBM08 03/03/2017 Workqueue: kblockd \ blk_mq_run_work_fn RIP: \ 0010:blk_mq_get_driver_tag+0x81/0x120 Code: 24 \ 10 48 89 7c 24 20 74 21 83 fa ff 0f 95 c0 48 8b 4c 24 28 65 48 33 0c 25 28 00 00 00 \ 0f 85 96 00 00 00 48 83 c4 30 5b 5d c3 <48> 8b 87 48 01 00 00 8b 40 04 39 43 20 72 37 \ f6 87 b0 00 00 00 02 RSP: 0018:ffffc90004aabd30 EFLAGS: 00010246 \ RAX: 0000000000000003 RBX: ffff888465ea1300 RCX: ffffc90004aabde8 RDX: 00000000ffffffff RSI: ffffc90004aabde8 RDI: 0000000000000000 RBP: 0000000000000000 R08: ffff888465ea1348 R09: 0000000000000000 R10: 0000000000001000 R11: 00000000ffffffff R12: ffff888465ea1300 R13: 0000000000000000 R14: ffff888465ea1348 R15: ffff888465d10000 FS: 0000000000000000(0000) GS:ffff88846f9c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000148 CR3: 000000000220a003 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: blk_mq_dispatch_rq_list+0xec/0x480 ? elv_rb_del+0x11/0x30 blk_mq_do_dispatch_sched+0x6e/0xf0 blk_mq_sched_dispatch_requests+0xfa/0x170 __blk_mq_run_hw_queue+0x5f/0xe0 process_one_work+0x154/0x350 worker_thread+0x46/0x3c0 kthread+0xf5/0x130 ? process_one_work+0x350/0x350 ? kthread_destroy_worker+0x50/0x50 ret_from_fork+0x1f/0x30 Modules linked in: sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel \ kvm switchtec irqbypass iTCO_wdt iTCO_vendor_support efivars cdc_ether usbnet mii \ cdc_acm i2c_i801 lpc_ich mfd_core ipmi_si ipmi_devintf ipmi_msghandler acpi_cpufreq \ button sch_fq_codel nfsd nfs_acl lockd grace auth_rpcgss oid_registry sunrpc nvme \ nvme_core fuse sg loop efivarfs autofs4 CR2: 0000000000000148 \ ---[ end trace 340a1fb996df1b9b ]--- RIP: 0010:blk_mq_get_driver_tag+0x81/0x120 Code: 24 10 48 89 7c 24 20 74 21 83 fa ff 0f 95 c0 48 8b 4c 24 28 65 48 33 0c 25 28 \ 00 00 00 0f 85 96 00 00 00 48 83 c4 30 5b 5d c3 <48> 8b 87 48 01 00 00 8b 40 04 39 43 \ 20 72 37 f6 87 b0 00 00 00 02 Fixes: 445251d0 ("blk-mq: fix discard merge with scheduler attached") Reported-by: NJens Axboe <axboe@kernel.dk> Cc: Guangwu Zhang <guazhang@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jianchao Wang <jianchao.w.wang@oracle.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NYufen Yu <yuyufen@huawei.com> Reviewed-by: NMiao Xie <miaoxie@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Jianchao Wang 提交于
mainline inclusion from mainline-4.20-rc1 commit 69840466 category: bugfix bugzilla: 5903 CVE: NA --------------------------- There are two cases when handle DISCARD merge. If max_discard_segments == 1, the bios/requests need to be contiguous to merge. If max_discard_segments > 1, it takes every bio as a range and different range needn't to be contiguous. But now, attempt_merge screws this up. It always consider contiguity for DISCARD for the case max_discard_segments > 1 and cannot merge contiguous DISCARD for the case max_discard_segments == 1, because rq_attempt_discard_merge always returns false in this case. This patch fixes both of the two cases above. Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJianchao Wang <jianchao.w.wang@oracle.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NYufen Yu <yuyufen@huawei.com> Reviewed-by: NMiao Xie <miaoxie@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Ming Lei 提交于
mainline inclusion from mainline-5.0-rc1 commit 1db4909e category: bugfix bugzilla: 5901 CVE: NA --------------------------- Even though .mq_kobj, ctx->kobj and q->kobj share same lifetime from block layer's view, actually they don't because userspace may grab one kobject anytime via sysfs. This patch fixes the issue by the following approach: 1) introduce 'struct blk_mq_ctxs' for holding .mq_kobj and managing all ctxs 2) free all allocated ctxs and the 'blk_mq_ctxs' instance in release handler of .mq_kobj 3) grab one ref of .mq_kobj before initializing each ctx->kobj, so that .mq_kobj is always released after all ctxs are freed. This patch fixes kernel panic issue during booting when DEBUG_KOBJECT_RELEASE is enabled. Reported-by: NGuenter Roeck <linux@roeck-us.net> Cc: "jianchao.wang" <jianchao.w.wang@oracle.com> Tested-by: NGuenter Roeck <linux@roeck-us.net> Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NYufen Yu <yuyufen@huawei.com> Reviewed-by: NMiao Xie <miaoxie@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Ming Lei 提交于
mainline inclusion from mainline-5.0-rc1 commit 59388702 category: bugfix bugzilla: 5914 CVE: NA --------------------------- Now almost all .map_queues() implementation based on managed irq affinity doesn't update queue mapping and it just retrieves the old built mapping, so if nr_hw_queues is changed, the mapping talbe includes stale mapping. And only blk_mq_map_queues() may rebuild the mapping talbe. One case is that we limit .nr_hw_queues as 1 in case of kdump kernel. However, drivers often builds queue mapping before allocating tagset via pci_alloc_irq_vectors_affinity(), but set->nr_hw_queues can be set as 1 in case of kdump kernel, so wrong queue mapping is used, and kernel panic[1] is observed during booting. This patch fixes the kernel panic triggerd on nvme by rebulding the mapping table via blk_mq_map_queues(). [1] kernel panic log [ 4.438371] nvme nvme0: 16/0/0 default/read/poll queues [ 4.443277] BUG: unable to handle kernel NULL pointer dereference at 0000000000000098 [ 4.444681] PGD 0 P4D 0 [ 4.445367] Oops: 0000 [#1] SMP NOPTI [ 4.446342] CPU: 3 PID: 201 Comm: kworker/u33:10 Not tainted 4.20.0-rc5-00664-g5eb02f7ee1eb-dirty #459 [ 4.447630] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014 [ 4.448689] Workqueue: nvme-wq nvme_scan_work [nvme_core] [ 4.449368] RIP: 0010:blk_mq_map_swqueue+0xfb/0x222 [ 4.450596] Code: 04 f5 20 28 ef 81 48 89 c6 39 55 30 76 93 89 d0 48 c1 e0 04 48 03 83 f8 05 00 00 48 8b 00 42 8b 3c 28 48 8b 43 58 48 8b 04 f8 <48> 8b b8 98 00 00 00 4c 0f a3 37 72 42 f0 4c 0f ab 37 66 8b b8 f6 [ 4.453132] RSP: 0018:ffffc900023b3cd8 EFLAGS: 00010286 [ 4.454061] RAX: 0000000000000000 RBX: ffff888174448000 RCX: 0000000000000001 [ 4.456480] RDX: 0000000000000001 RSI: ffffe8feffc506c0 RDI: 0000000000000001 [ 4.458750] RBP: ffff88810722d008 R08: ffff88817647a880 R09: 0000000000000002 [ 4.464580] R10: ffffc900023b3c10 R11: 0000000000000004 R12: ffff888174448538 [ 4.467803] R13: 0000000000000004 R14: 0000000000000001 R15: 0000000000000001 [ 4.469220] FS: 0000000000000000(0000) GS:ffff88817bac0000(0000) knlGS:0000000000000000 [ 4.471554] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4.472464] CR2: 0000000000000098 CR3: 0000000174e4e001 CR4: 0000000000760ee0 [ 4.474264] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 4.476007] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 4.477061] PKRU: 55555554 [ 4.477464] Call Trace: [ 4.478731] blk_mq_init_allocated_queue+0x36a/0x3ad [ 4.479595] blk_mq_init_queue+0x32/0x4e [ 4.480178] nvme_validate_ns+0x98/0x623 [nvme_core] [ 4.480963] ? nvme_submit_sync_cmd+0x1b/0x20 [nvme_core] [ 4.481685] ? nvme_identify_ctrl.isra.8+0x70/0xa0 [nvme_core] [ 4.482601] nvme_scan_work+0x23a/0x29b [nvme_core] [ 4.483269] ? _raw_spin_unlock_irqrestore+0x25/0x38 [ 4.483930] ? try_to_wake_up+0x38d/0x3b3 [ 4.484478] ? process_one_work+0x179/0x2fc [ 4.485118] process_one_work+0x1d3/0x2fc [ 4.485655] ? rescuer_thread+0x2ae/0x2ae [ 4.486196] worker_thread+0x1e9/0x2be [ 4.486841] kthread+0x115/0x11d [ 4.487294] ? kthread_park+0x76/0x76 [ 4.487784] ret_from_fork+0x3a/0x50 [ 4.488322] Modules linked in: nvme nvme_core qemu_fw_cfg virtio_scsi ip_tables [ 4.489428] Dumping ftrace buffer: [ 4.489939] (ftrace buffer empty) [ 4.490492] CR2: 0000000000000098 [ 4.491052] ---[ end trace 03cd268ad5a86ff7 ]--- Conflicts: block/blk-mq.c [yuyufen: 'b3c661b1 blk-mq: support multiple hctx maps' have not been bankported] Cc: Christoph Hellwig <hch@lst.de> Cc: linux-nvme@lists.infradead.org Cc: David Milburn <dmilburn@redhat.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NYufen Yu <yuyufen@huawei.com> Reviewed-by: NMiao Xie <miaoxie@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Jianchao Wang 提交于
mainline inclusion from mainline-5.0-rc5 commit 85bd6e61 category: bugfix bugzilla: 7421 CVE: NA --------------------------- Florian reported a io hung issue when fsync(). It should be triggered by following race condition. data + post flush a flush blk_flush_complete_seq case REQ_FSEQ_DATA blk_flush_queue_rq issued to driver blk_mq_dispatch_rq_list try to issue a flush req failed due to NON-NCQ command .queue_rq return BLK_STS_DEV_RESOURCE request completion req->end_io // doesn't check RESTART mq_flush_data_end_io case REQ_FSEQ_POSTFLUSH blk_kick_flush do nothing because previous flush has not been completed blk_mq_run_hw_queue insert rq to hctx->dispatch due to RESTART is still set, do nothing To fix this, replace the blk_mq_run_hw_queue in mq_flush_data_end_io with blk_mq_sched_restart to check and clear the RESTART flag. Fixes: bd166ef1 (blk-mq-sched: add framework for MQ capable IO schedulers) Reported-by: NFlorian Stecker <m19@florianstecker.de> Tested-by: NFlorian Stecker <m19@florianstecker.de> Signed-off-by: NJianchao Wang <jianchao.w.wang@oracle.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NYufen Yu <yuyufen@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Jianchao Wang 提交于
mainline inclusion from mainline-4.20-rc1 commit e01ad46d category: bugfix bugzilla: 5833 CVE: NA --------------------------- When we try to increate the nr_hw_queues, we may fail due to shortage of memory or other reason, then blk_mq_realloc_hw_ctxs stops and some entries in q->queue_hw_ctx are left with NULL. However, because queue map has been updated with new nr_hw_queues, some cpus have been mapped to hw queue which just encounters allocation failure, thus blk_mq_map_queue could return NULL. This will cause panic in following blk_mq_map_swqueue. To fix it, when increase nr_hw_queues fails, fallback to previous nr_hw_queues and post warning. At the same time, driver's .map_queues usually use completion irq affinity to map hw and cpu, fallback nr_hw_queues will cause lack of some cpu's map to hw, so use default blk_mq_map_queues to do that. Reported-by: syzbot+83e8cbe702263932d9d4@syzkaller.appspotmail.com Signed-off-by: NJianchao Wang <jianchao.w.wang@oracle.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NYufen Yu <yuyufen@huawei.com> Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Jianchao Wang 提交于
mainline inclusion from mainline-4.20-rc1 commit 34d11ffa category: bugfix bugzilla: 5833 CVE: NA --------------------------- When the hw queues and mq_map are updated, a hctx could be mapped to a different numa node. At this moment, we need to realloc the hctx. If fail to do that, go on using previous hctx. Signed-off-by: NJianchao Wang <jianchao.w.wang@oracle.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NYufen Yu <yuyufen@huawei.com> Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Jianchao Wang 提交于
mainline inclusion from mainline-4.20-rc1 commit 5b202853 category: bugfix bugzilla: 5833 CVE: NA --------------------------- blk_mq_realloc_hw_ctxs could be invoked during update hw queues. At the momemt, IO is blocked. Change the gfp flags from GFP_KERNEL to GFP_NOIO to avoid forever hang during memory allocation in blk_mq_realloc_hw_ctxs. Signed-off-by: NJianchao Wang <jianchao.w.wang@oracle.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NYufen Yu <yuyufen@huawei.com> Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-