- 03 6月, 2020 7 次提交
-
-
由 Joerg Roedel 提交于
Implement the function to sync changes in vmalloc and ioremap ranges to all page-tables. Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NAndy Lutomirski <luto@kernel.org> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Link: http://lkml.kernel.org/r/20200515140023.25469-5-joro@8bytes.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Christoph Hellwig 提交于
The pgprot argument to __vmalloc is always PAGE_KERNEL now, so remove it. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> [hyperv] Acked-by: Gao Xiang <xiang@kernel.org> [erofs] Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NWei Liu <wei.liu@kernel.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-22-hch@lst.deSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Christoph Hellwig 提交于
To help enforcing the W^X protection don't allow remapping existing pages as executable. x86 bits from Peter Zijlstra, arm64 bits from Mark Rutland. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Mark Rutland <mark.rutland@arm.com>. Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-20-hch@lst.deSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Christoph Hellwig 提交于
vmap does not take a gfp_t, the flags argument is for VM_* flags. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Kelley <mikelley@microsoft.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200414131348.444715-3-hch@lst.deSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Christoph Hellwig 提交于
Patch series "decruft the vmalloc API", v2. Peter noticed that with some dumb luck you can toast the kernel address space with exported vmalloc symbols. I used this as an opportunity to decruft the vmalloc.c API and make it much more systematic. This also removes any chance to create vmalloc mappings outside the designated areas or using executable permissions from modules. Besides that it removes more than 300 lines of code. This patch (of 29): Use the designated helper for allocating executable kernel memory, and remove the now unused PAGE_KERNEL_RX define. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NMichael Kelley <mikelley@microsoft.com> Acked-by: NWei Liu <wei.liu@kernel.org> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Gao Xiang <xiang@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: http://lkml.kernel.org/r/20200414131348.444715-1-hch@lst.de Link: http://lkml.kernel.org/r/20200414131348.444715-2-hch@lst.deSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Steven Price 提交于
The page table entry is passed in the 'val' argument to note_page(), however this was previously an "unsigned long" which is fine on 64-bit platforms. But for 32 bit x86 it is not always big enough to contain a page table entry which may be 64 bits. Change the type to u64 to ensure that it is always big enough. [akpm@linux-foundation.org: fix riscv] Reported-by: NJan Beulich <jbeulich@suse.com> Signed-off-by: NSteven Price <steven.price@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200521152308.33096-3-steven.price@arm.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Steven Price 提交于
Patch series "Fix W+X debug feature on x86" Jan alerted me[1] that the W+X detection debug feature was broken in x86 by my change[2] to switch x86 to use the generic ptdump infrastructure. Fundamentally the approach of trying to move the calculation of effective permissions into note_page() was broken because note_page() is only called for 'leaf' entries and the effective permissions are passed down via the internal nodes of the page tree. The solution I've taken here is to create a new (optional) callback which is called for all nodes of the page tree and therefore can calculate the effective permissions. Secondly on some configurations (32 bit with PAE) "unsigned long" is not large enough to store the table entries. The fix here is simple - let's just use a u64. [1] https://lore.kernel.org/lkml/d573dc7e-e742-84de-473d-f971142fa319@suse.com/ [2] 2ae27137 ("x86: mm: convert dump_pagetables to use walk_page_range") This patch (of 2): By switching the x86 page table dump code to use the generic code the effective permissions are no longer calculated correctly because the note_page() function is only called for *leaf* entries. To calculate the actual effective permissions it is necessary to observe the full hierarchy of the page tree. Introduce a new callback for ptdump which is called for every entry and can therefore update the prot_levels array correctly. note_page() can then simply access the appropriate element in the array. [steven.price@arm.com: make the assignment conditional on val != 0] Link: http://lkml.kernel.org/r/430c8ab4-e7cd-6933-dde6-087fac6db872@arm.com Fixes: 2ae27137 ("x86: mm: convert dump_pagetables to use walk_page_range") Reported-by: NJan Beulich <jbeulich@suse.com> Signed-off-by: NSteven Price <steven.price@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Qian Cai <cai@lca.pw> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200521152308.33096-1-steven.price@arm.com Link: http://lkml.kernel.org/r/20200521152308.33096-2-steven.price@arm.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 29 5月, 2020 2 次提交
-
-
由 Jay Lang 提交于
In the copy_process() routine called by _do_fork(), failure to allocate a PID (or further along in the function) will trigger an invocation to exit_thread(). This is done to clean up from an earlier call to copy_thread_tls(). Naturally, the child task is passed into exit_thread(), however during the process, io_bitmap_exit() nullifies the parent's io_bitmap rather than the child's. As copy_thread_tls() has been called ahead of the failure, the reference count on the calling thread's io_bitmap is incremented as we would expect. However, io_bitmap_exit() doesn't accept any arguments, and thus assumes it should trash the current thread's io_bitmap reference rather than the child's. This is pretty sneaky in practice, because in all instances but this one, exit_thread() is called with respect to the current task and everything works out. A determined attacker can issue an appropriate ioctl (i.e. KDENABIO) to get a bitmap allocated, and force a clone3() syscall to fail by passing in a zeroed clone_args structure. The kernel handles the erroneous struct and the buggy code path is followed, and even though the parent's reference to the io_bitmap is trashed, the child still holds a reference and thus the structure will never be freed. Fix this by tweaking io_bitmap_exit() and its subroutines to accept a task_struct argument which to operate on. Fixes: ea5f1cd7 ("x86/ioperm: Remove bitmap if all permissions dropped") Signed-off-by: NJay Lang <jaytlang@mit.edu> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: stable#@vger.kernel.org Link: https://lkml.kernel.org/r/20200524162742.253727-1-jaytlang@mit.edu
-
由 Alexander Dahl 提交于
The intermediate result of the old term (4UL * 1024 * 1024 * 1024) is 4 294 967 296 or 0x100000000 which is no problem on 64 bit systems. The patch does not change the later overall result of 0x100000 for MAX_DMA32_PFN (after it has been shifted by PAGE_SHIFT). The new calculation yields the same result, but does not require 64 bit arithmetic. On 32 bit systems the old calculation suffers from an arithmetic overflow in that intermediate term in braces: 4UL aka unsigned long int is 4 byte wide and an arithmetic overflow happens (the 0x100000000 does not fit in 4 bytes), the in braces result is truncated to zero, the following right shift does not alter that, so MAX_DMA32_PFN evaluates to 0 on 32 bit systems. That wrong value is a problem in a comparision against MAX_DMA32_PFN in the init code for swiotlb in pci_swiotlb_detect_4gb() to decide if swiotlb should be active. That comparison yields the opposite result, when compiling on 32 bit systems. This was not possible before 1b7e03ef ("x86, NUMA: Enable emulation on 32bit too") when that MAX_DMA32_PFN was first made visible to x86_32 (and which landed in v3.0). In practice this wasn't a problem, unless CONFIG_SWIOTLB is active on x86-32. However if one has set CONFIG_IOMMU_INTEL, since c5a5dc4c ("iommu/vt-d: Don't switch off swiotlb if bounce page is used") there's a dependency on CONFIG_SWIOTLB, which was not necessarily active before. That landed in v5.4, where we noticed it in the fli4l Linux distribution. We have CONFIG_IOMMU_INTEL active on both 32 and 64 bit kernel configs there (I could not find out why, so let's just say historical reasons). The effect is at boot time 64 MiB (default size) were allocated for bounce buffers now, which is a noticeable amount of memory on small systems like pcengines ALIX 2D3 with 256 MiB memory, which are still frequently used as home routers. We noticed this effect when migrating from kernel v4.19 (LTS) to v5.4 (LTS) in fli4l and got that kernel messages for example: Linux version 5.4.22 (buildroot@buildroot) (gcc version 7.3.0 (Buildroot 2018.02.8)) #1 SMP Mon Nov 26 23:40:00 CET 2018 … Memory: 183484K/261756K available (4594K kernel code, 393K rwdata, 1660K rodata, 536K init, 456K bss , 78272K reserved, 0K cma-reserved, 0K highmem) … PCI-DMA: Using software bounce buffering for IO (SWIOTLB) software IO TLB: mapped [mem 0x0bb78000-0x0fb78000] (64MB) The initial analysis and the suggested fix was done by user 'sourcejedi' at stackoverflow and explicitly marked as GPLv2 for inclusion in the Linux kernel: https://unix.stackexchange.com/a/520525/50007 The new calculation, which does not suffer from that overflow, is the same as for arch/mips now as suggested by Robin Murphy. The fix was tested by fli4l users on round about two dozen different systems, including both 32 and 64 bit archs, bare metal and virtualized machines. [ bp: Massage commit message. ] Fixes: 1b7e03ef ("x86, NUMA: Enable emulation on 32bit too") Reported-by: NAlan Jenkins <alan.christopher.jenkins@gmail.com> Suggested-by: NRobin Murphy <robin.murphy@arm.com> Signed-off-by: NAlexander Dahl <post@lespocky.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org Link: https://unix.stackexchange.com/q/520065/50007 Link: https://web.nettworks.org/bugs/browse/FFL-2560 Link: https://lkml.kernel.org/r/20200526175749.20742-1-post@lespocky.de
-
- 28 5月, 2020 1 次提交
-
-
由 Al Viro 提交于
copy the corresponding pieces of init_fpstate into the gaps instead. Cc: stable@kernel.org Tested-by: NAlexander Potapenko <glider@google.com> Acked-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 27 5月, 2020 1 次提交
-
-
由 Krzysztof Kozlowski 提交于
There is a generic, kernel wide configuration symbol for enabling the IOMMU specific bits: CONFIG_IOMMU_API. Implementations (including INTEL_IOMMU and AMD_IOMMU driver) select it so use it here as well. This makes the conditional archdata.iommu field consistent with other platforms and also fixes any compile test builds of other IOMMU drivers, when INTEL_IOMMU or AMD_IOMMU are not selected). For the case when INTEL_IOMMU/AMD_IOMMU and COMPILE_TEST are not selected, this should create functionally equivalent code/choice. With COMPILE_TEST this field could appear if other IOMMU drivers are chosen but neither INTEL_IOMMU nor AMD_IOMMU are not. Reported-by: Nkbuild test robot <lkp@intel.com> Fixes: e93a1695 ("iommu: Enable compile testing for some of drivers") Signed-off-by: NKrzysztof Kozlowski <krzk@kernel.org> Acked-by: NBorislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20200518120855.27822-2-krzk@kernel.orgSigned-off-by: NJoerg Roedel <jroedel@suse.de>
-
- 26 5月, 2020 1 次提交
-
-
由 Andy Lutomirski 提交于
Revert 45e29d11 ("x86/syscalls: Make __X32_SYSCALL_BIT be unsigned long") and add a comment to discourage someone else from making the same mistake again. It turns out that some user code fails to compile if __X32_SYSCALL_BIT is unsigned long. See, for example [1] below. [ bp: Massage and do the same thing in the respective tools/ header. ] Fixes: 45e29d11 ("x86/syscalls: Make __X32_SYSCALL_BIT be unsigned long") Reported-by: NThorsten Glaser <t.glaser@tarent.de> Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: stable@kernel.org Link: [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=954294 Link: https://lkml.kernel.org/r/92e55442b744a5951fdc9cfee10badd0a5f7f828.1588983892.git.luto@kernel.org
-
- 24 5月, 2020 1 次提交
-
-
由 Nick Desaulniers 提交于
This is easily reproducible via CC=clang + CONFIG_STAGING=y + CONFIG_VT6656=m. It turns out that if your config tickles __builtin_constant_p via differences in choices to inline or not, these statements produce invalid assembly: $ cat foo.c long a(long b, long c) { asm("orb %1, %0" : "+q"(c): "r"(b)); return c; } $ gcc foo.c foo.c: Assembler messages: foo.c:2: Error: `%rax' not allowed with `orb' Use the `%b` "x86 Operand Modifier" to instead force register allocation to select a lower-8-bit GPR operand. The "q" constraint only has meaning on -m32 otherwise is treated as "r". Not all GPRs have low-8-bit aliases for -m32. Fixes: 1651e700 ("x86: Fix bitops.h warning with a moved cast") Reported-by: Nkernelci.org bot <bot@kernelci.org> Suggested-by: NAndy Shevchenko <andriy.shevchenko@intel.com> Suggested-by: NBrian Gerst <brgerst@gmail.com> Suggested-by: NH. Peter Anvin <hpa@zytor.com> Suggested-by: NIlie Halip <ilie.halip@gmail.com> Signed-off-by: NNick Desaulniers <ndesaulniers@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Tested-by: NSedat Dilek <sedat.dilek@gmail.com> Tested-by: Nathan Chancellor <natechancellor@gmail.com> [build, clang-11] Reviewed-by: NNathan Chancellor <natechancellor@gmail.com> Reviewed-By: NBrian Gerst <brgerst@gmail.com> Reviewed-by: NJesse Brandeburg <jesse.brandeburg@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Marco Elver <elver@google.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Daniel Axtens <dja@axtens.net> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Link: http://lkml.kernel.org/r/20200508183230.229464-1-ndesaulniers@google.com Link: https://github.com/ClangBuiltLinux/linux/issues/961 Link: https://lore.kernel.org/lkml/20200504193524.GA221287@google.com/ Link: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#x86OperandmodifiersSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 23 5月, 2020 1 次提交
-
-
由 Josh Poimboeuf 提交于
Normally, show_trace_log_lvl() scans the stack, looking for text addresses to print. In parallel, it unwinds the stack with unwind_next_frame(). If the stack address matches the pointer returned by unwind_get_return_address_ptr() for the current frame, the text address is printed normally without a question mark. Otherwise it's considered a breadcrumb (potentially from a previous call path) and it's printed with a question mark to indicate that the address is unreliable and typically can be ignored. Since the following commit: f1d9a2ab ("x86/unwind/orc: Don't skip the first frame for inactive tasks") ... for inactive tasks, show_trace_log_lvl() prints *only* unreliable addresses (prepended with '?'). That happens because, for the first frame of an inactive task, unwind_get_return_address_ptr() returns the wrong return address pointer: one word *below* the task stack pointer. show_trace_log_lvl() starts scanning at the stack pointer itself, so it never finds the first 'reliable' address, causing only guesses to being printed. The first frame of an inactive task isn't a normal stack frame. It's actually just an instance of 'struct inactive_task_frame' which is left behind by __switch_to_asm(). Now that this inactive frame is actually exposed to callers, fix unwind_get_return_address_ptr() to interpret it properly. Fixes: f1d9a2ab ("x86/unwind/orc: Don't skip the first frame for inactive tasks") Reported-by: NTetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200522135435.vbxs7umku5pyrdbk@treble
-
- 20 5月, 2020 1 次提交
-
-
由 Nathan Chancellor 提交于
When building with Clang + -Wtautological-compare and CONFIG_CPUMASK_OFFSTACK unset: arch/x86/mm/mmio-mod.c:375:6: warning: comparison of array 'downed_cpus' equal to a null pointer is always false [-Wtautological-pointer-compare] if (downed_cpus == NULL && ^~~~~~~~~~~ ~~~~ arch/x86/mm/mmio-mod.c:405:6: warning: comparison of array 'downed_cpus' equal to a null pointer is always false [-Wtautological-pointer-compare] if (downed_cpus == NULL || cpumask_weight(downed_cpus) == 0) ^~~~~~~~~~~ ~~~~ 2 warnings generated. Commit f7e30f01 ("cpumask: Add helper cpumask_available()") added cpumask_available() to fix warnings of this nature. Use that here so that clang does not warn regardless of CONFIG_CPUMASK_OFFSTACK's value. Reported-by: NSedat Dilek <sedat.dilek@gmail.com> Signed-off-by: NNathan Chancellor <natechancellor@gmail.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NNick Desaulniers <ndesaulniers@google.com> Acked-by: NSteven Rostedt (VMware) <rostedt@goodmis.org> Link: https://github.com/ClangBuiltLinux/linux/issues/982 Link: https://lkml.kernel.org/r/20200408205323.44490-1-natechancellor@gmail.com
-
- 16 5月, 2020 1 次提交
-
-
由 Jim Mattson 提交于
Bank_num is a one-based count of banks, not a zero-based index. It overflows the allocated space only when strictly greater than KVM_MAX_MCE_BANKS. Fixes: a9e38c3e ("KVM: x86: Catch potential overrun in MCE setup") Signed-off-by: NJue Wang <juew@google.com> Signed-off-by: NJim Mattson <jmattson@google.com> Reviewed-by: NPeter Shier <pshier@google.com> Message-Id: <20200511225616.19557-1-jmattson@google.com> Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 15 5月, 2020 3 次提交
-
-
由 Daniel Borkmann 提交于
Given the legacy bpf_probe_read{,str}() BPF helpers are broken on archs with overlapping address ranges, we should really take the next step to disable them from BPF use there. To generally fix the situation, we've recently added new helper variants bpf_probe_read_{user,kernel}() and bpf_probe_read_{user,kernel}_str(). For details on them, see 6ae08ae3 ("bpf: Add probe_read_{user, kernel} and probe_read_{user,kernel}_str helpers"). Given bpf_probe_read{,str}() have been around for ~5 years by now, there are plenty of users at least on x86 still relying on them today, so we cannot remove them entirely w/o breaking the BPF tracing ecosystem. However, their use should be restricted to archs with non-overlapping address ranges where they are working in their current form. Therefore, move this behind a CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE and have x86, arm64, arm select it (other archs supporting it can follow-up on it as well). For the remaining archs, they can workaround easily by relying on the feature probe from bpftool which spills out defines that can be used out of BPF C code to implement the drop-in replacement for old/new kernels via: bpftool feature probe macro Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Reviewed-by: NMasami Hiramatsu <mhiramat@kernel.org> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/bpf/20200515101118.6508-2-daniel@iogearbox.net
-
由 Borislav Petkov 提交于
... or the odyssey of trying to disable the stack protector for the function which generates the stack canary value. The whole story started with Sergei reporting a boot crash with a kernel built with gcc-10: Kernel panic — not syncing: stack-protector: Kernel stack is corrupted in: start_secondary CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.6.0-rc5—00235—gfffb08b3 #139 Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H77M—D3H, BIOS F12 11/14/2013 Call Trace: dump_stack panic ? start_secondary __stack_chk_fail start_secondary secondary_startup_64 -—-[ end Kernel panic — not syncing: stack—protector: Kernel stack is corrupted in: start_secondary This happens because gcc-10 tail-call optimizes the last function call in start_secondary() - cpu_startup_entry() - and thus emits a stack canary check which fails because the canary value changes after the boot_init_stack_canary() call. To fix that, the initial attempt was to mark the one function which generates the stack canary with: __attribute__((optimize("-fno-stack-protector"))) ... start_secondary(void *unused) however, using the optimize attribute doesn't work cumulatively as the attribute does not add to but rather replaces previously supplied optimization options - roughly all -fxxx options. The key one among them being -fno-omit-frame-pointer and thus leading to not present frame pointer - frame pointer which the kernel needs. The next attempt to prevent compilers from tail-call optimizing the last function call cpu_startup_entry(), shy of carving out start_secondary() into a separate compilation unit and building it with -fno-stack-protector, was to add an empty asm(""). This current solution was short and sweet, and reportedly, is supported by both compilers but we didn't get very far this time: future (LTO?) optimization passes could potentially eliminate this, which leads us to the third attempt: having an actual memory barrier there which the compiler cannot ignore or move around etc. That should hold for a long time, but hey we said that about the other two solutions too so... Reported-by: NSergei Trofimovich <slyfox@gentoo.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Tested-by: NKalle Valo <kvalo@codeaurora.org> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200314164451.346497-1-slyfox@gentoo.org
-
由 Josh Poimboeuf 提交于
The unwind_state 'error' field is used to inform the reliable unwinding code that the stack trace can't be trusted. Set this field for all errors in __unwind_start(). Also, move the zeroing out of the unwind_state struct to before the ORC table initialization check, to prevent the caller from reading uninitialized data if the ORC table is corrupted. Fixes: af085d90 ("stacktrace/x86: add function for detecting reliable stack traces") Fixes: d3a09104 ("x86/unwinder/orc: Dont bail on stack overflow") Fixes: 98d0c8eb ("x86/unwind/orc: Prevent unwinding before ORC initialization") Reported-by: NPavel Machek <pavel@denx.de> Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/d6ac7215a84ca92b895fdd2e1aa546729417e6e6.1589487277.git.jpoimboe@redhat.com
-
- 14 5月, 2020 1 次提交
-
-
由 Arvind Sankar 提交于
Mike Lothian reports that after commit 964124a9 ("efi/x86: Remove extra headroom for setup block") gcc 10.1.0 fails with HOSTCC arch/x86/boot/tools/build /usr/lib/gcc/x86_64-pc-linux-gnu/10.1.0/../../../../x86_64-pc-linux-gnu/bin/ld: error: linker defined: multiple definition of '_end' /usr/lib/gcc/x86_64-pc-linux-gnu/10.1.0/../../../../x86_64-pc-linux-gnu/bin/ld: /tmp/ccEkW0jM.o: previous definition here collect2: error: ld returned 1 exit status make[1]: *** [scripts/Makefile.host:103: arch/x86/boot/tools/build] Error 1 make: *** [arch/x86/Makefile:303: bzImage] Error 2 The issue is with the _end variable that was added, to hold the end of the compressed kernel from zoffsets.h (ZO__end). The name clashes with the linker-defined _end symbol that indicates the end of the build program itself. Even when there is no compile-time error, this causes build to use memory past the end of its .bss section. To solve this, mark _end as static, and for symmetry, mark the rest of the variables that keep track of symbols from the compressed kernel as static as well. Fixes: 964124a9 ("efi/x86: Remove extra headroom for setup block") Reported-by: NMike Lothian <mike@fireburn.co.uk> Tested-by: NMike Lothian <mike@fireburn.co.uk> Signed-off-by: NArvind Sankar <nivedita@alum.mit.edu> Link: https://lore.kernel.org/r/20200511225849.1311869-1-nivedita@alum.mit.eduSigned-off-by: NArd Biesheuvel <ardb@kernel.org>
-
- 13 5月, 2020 3 次提交
-
-
由 Babu Moger 提交于
Though rdpkru and wrpkru are contingent upon CR4.PKE, the PKRU resource isn't. It can be read with XSAVE and written with XRSTOR. So, if we don't set the guest PKRU value here(kvm_load_guest_xsave_state), the guest can read the host value. In case of kvm_load_host_xsave_state, guest with CR4.PKE clear could potentially use XRSTOR to change the host PKRU value. While at it, move pkru state save/restore to common code and the host_pkru field to kvm_vcpu_arch. This will let SVM support protection keys. Cc: stable@vger.kernel.org Reported-by: NJim Mattson <jmattson@google.com> Signed-off-by: NBabu Moger <babu.moger@amd.com> Message-Id: <158932794619.44260.14508381096663848853.stgit@naples-babu.amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Vitaly Kuznetsov 提交于
Errors during hibernation with reenlightenment notifications enabled were reported: [ 51.730435] PM: hibernation entry [ 51.737435] PM: Syncing filesystems ... ... [ 54.102216] Disabling non-boot CPUs ... [ 54.106633] smpboot: CPU 1 is now offline [ 54.110006] unchecked MSR access error: WRMSR to 0x40000106 (tried to write 0x47c72780000100ee) at rIP: 0xffffffff90062f24 native_write_msr+0x4/0x20) [ 54.110006] Call Trace: [ 54.110006] hv_cpu_die+0xd9/0xf0 ... Normally, hv_cpu_die() just reassigns reenlightenment notifications to some other CPU when the CPU receiving them goes offline. Upon hibernation, there is no other CPU which is still online so cpumask_any_but(cpu_online_mask) returns >= nr_cpu_ids and using it as hv_vp_index index is incorrect. Disable the feature when cpumask_any_but() fails. Also, as we now disable reenlightenment notifications upon hibernation we need to restore them on resume. Check if hv_reenlightenment_cb was previously set and restore from hv_resume(). Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: NDexuan Cui <decui@microsoft.com> Reviewed-by: NTianyu Lan <Tianyu.Lan@microsoft.com> Link: https://lore.kernel.org/r/20200512160153.134467-1-vkuznets@redhat.comSigned-off-by: NWei Liu <wei.liu@kernel.org>
-
由 Steven Rostedt (VMware) 提交于
Booting one of my machines, it triggered the following crash: Kernel/User page tables isolation: enabled ftrace: allocating 36577 entries in 143 pages Starting tracer 'function' BUG: unable to handle page fault for address: ffffffffa000005c #PF: supervisor write access in kernel mode #PF: error_code(0x0003) - permissions violation PGD 2014067 P4D 2014067 PUD 2015063 PMD 7b253067 PTE 7b252061 Oops: 0003 [#1] PREEMPT SMP PTI CPU: 0 PID: 0 Comm: swapper Not tainted 5.4.0-test+ #24 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007 RIP: 0010:text_poke_early+0x4a/0x58 Code: 34 24 48 89 54 24 08 e8 bf 72 0b 00 48 8b 34 24 48 8b 4c 24 08 84 c0 74 0b 48 89 df f3 a4 48 83 c4 10 5b c3 9c 58 fa 48 89 df <f3> a4 50 9d 48 83 c4 10 5b e9 d6 f9 ff ff 0 41 57 49 RSP: 0000:ffffffff82003d38 EFLAGS: 00010046 RAX: 0000000000000046 RBX: ffffffffa000005c RCX: 0000000000000005 RDX: 0000000000000005 RSI: ffffffff825b9a90 RDI: ffffffffa000005c RBP: ffffffffa000005c R08: 0000000000000000 R09: ffffffff8206e6e0 R10: ffff88807b01f4c0 R11: ffffffff8176c106 R12: ffffffff8206e6e0 R13: ffffffff824f2440 R14: 0000000000000000 R15: ffffffff8206eac0 FS: 0000000000000000(0000) GS:ffff88807d400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffa000005c CR3: 0000000002012000 CR4: 00000000000006b0 Call Trace: text_poke_bp+0x27/0x64 ? mutex_lock+0x36/0x5d arch_ftrace_update_trampoline+0x287/0x2d5 ? ftrace_replace_code+0x14b/0x160 ? ftrace_update_ftrace_func+0x65/0x6c __register_ftrace_function+0x6d/0x81 ftrace_startup+0x23/0xc1 register_ftrace_function+0x20/0x37 func_set_flag+0x59/0x77 __set_tracer_option.isra.19+0x20/0x3e trace_set_options+0xd6/0x13e apply_trace_boot_options+0x44/0x6d register_tracer+0x19e/0x1ac early_trace_init+0x21b/0x2c9 start_kernel+0x241/0x518 ? load_ucode_intel_bsp+0x21/0x52 secondary_startup_64+0xa4/0xb0 I was able to trigger it on other machines, when I added to the kernel command line of both "ftrace=function" and "trace_options=func_stack_trace". The cause is the "ftrace=function" would register the function tracer and create a trampoline, and it will set it as executable and read-only. Then the "trace_options=func_stack_trace" would then update the same trampoline to include the stack tracer version of the function tracer. But since the trampoline already exists, it updates it with text_poke_bp(). The problem is that text_poke_bp() called while system_state == SYSTEM_BOOTING, it will simply do a memcpy() and not the page mapping, as it would think that the text is still read-write. But in this case it is not, and we take a fault and crash. Instead, lets keep the ftrace trampolines read-write during boot up, and then when the kernel executable text is set to read-only, the ftrace trampolines get set to read-only as well. Link: https://lkml.kernel.org/r/20200430202147.4dc6e2de@oasis.local.home Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: stable@vger.kernel.org Fixes: 768ae440 ("x86/ftrace: Use text_poke()") Acked-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
- 08 5月, 2020 7 次提交
-
-
由 Suravee Suthikulpanit 提交于
The commit 64b5bd27 ("KVM: nSVM: ignore L1 interrupt window while running L2 with V_INTR_MASKING=1") introduced a WARN_ON, which checks if AVIC is enabled when trying to set V_IRQ in the VMCB for enabling irq window. The following warning is triggered because the requesting vcpu (to deactivate AVIC) does not get to process APICv update request for itself until the next #vmexit. WARNING: CPU: 0 PID: 118232 at arch/x86/kvm/svm/svm.c:1372 enable_irq_window+0x6a/0xa0 [kvm_amd] RIP: 0010:enable_irq_window+0x6a/0xa0 [kvm_amd] Call Trace: kvm_arch_vcpu_ioctl_run+0x6e3/0x1b50 [kvm] ? kvm_vm_ioctl_irq_line+0x27/0x40 [kvm] ? _copy_to_user+0x26/0x30 ? kvm_vm_ioctl+0xb3e/0xd90 [kvm] ? set_next_entity+0x78/0xc0 kvm_vcpu_ioctl+0x236/0x610 [kvm] ksys_ioctl+0x8a/0xc0 __x64_sys_ioctl+0x1a/0x20 do_syscall_64+0x58/0x210 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes by sending APICV update request to all other vcpus, and immediately update APIC for itself. Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lkml.org/lkml/2020/5/2/167 Fixes: 64b5bd27 ("KVM: nSVM: ignore L1 interrupt window while running L2 with V_INTR_MASKING=1") Message-Id: <1588818939-54264-1-git-send-email-suravee.suthikulpanit@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Suravee Suthikulpanit 提交于
This allows making request to all other vcpus except the one specified in the parameter. Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <1588771076-73790-2-git-send-email-suravee.suthikulpanit@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
When KVM_EXIT_DEBUG is raised for the disabled-breakpoints case (DR7.GD), DR6 was incorrectly copied from the value in the VM. Instead, DR6.BD should be set in order to catch this case. On AMD this does not need any special code because the processor triggers a #DB exception that is intercepted. However, the testcase would fail without the previous patch because both DR6.BS and DR6.BD would be set. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
There are two issues with KVM_EXIT_DEBUG on AMD, whose root cause is the different handling of DR6 on intercepted #DB exceptions on Intel and AMD. On Intel, #DB exceptions transmit the DR6 value via the exit qualification field of the VMCS, and the exit qualification only contains the description of the precise event that caused a vmexit. On AMD, instead the DR6 field of the VMCB is filled in as if the #DB exception was to be injected into the guest. This has two effects when guest debugging is in use: * the guest DR6 is clobbered * the kvm_run->debug.arch.dr6 field can accumulate more debug events, rather than just the last one that happened (the testcase in the next patch covers this issue). This patch fixes both issues by emulating, so to speak, the Intel behavior on AMD processors. The important observation is that (after the previous patches) the VMCB value of DR6 is only ever observable from the guest is KVM_DEBUGREG_WONT_EXIT is set. Therefore we can actually set vmcb->save.dr6 to any value we want as long as KVM_DEBUGREG_WONT_EXIT is clear, which it will be if guest debugging is enabled. Therefore it is possible to enter the guest with an all-zero DR6, reconstruct the #DB payload from the DR6 we get at exit time, and let kvm_deliver_exception_payload move the newly set bits into vcpu->arch.dr6. Some extra bits may be included in the payload if KVM_DEBUGREG_WONT_EXIT is set, but this is harmless. This may not be the most optimized way to deal with this, but it is simple and, being confined within SVM code, it gets rid of the set_dr6 callback and kvm_update_dr6. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
kvm_x86_ops.set_dr6 is only ever called with vcpu->arch.dr6 as the second argument. Ensure that the VMCB value is synchronized to vcpu->arch.dr6 on #DB (both "normal" and nested) and nested vmentry, so that the current value of DR6 is always available in vcpu->arch.dr6. The get_dr6 callback can just access vcpu->arch.dr6 and becomes redundant. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Eric Biggers 提交于
<linux/cryptohash.h> sounds very generic and important, like it's the header to include if you're doing cryptographic hashing in the kernel. But actually it only includes the library implementation of the SHA-1 compression function (not even the full SHA-1). This should basically never be used anymore; SHA-1 is no longer considered secure, and there are much better ways to do cryptographic hashing in the kernel. Most files that include this header don't actually need it. So in preparation for removing it, remove all these unneeded includes of it. Signed-off-by: NEric Biggers <ebiggers@google.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Janakarajan Natarajan 提交于
When trying to lock read-only pages, sev_pin_memory() fails because FOLL_WRITE is used as the flag for get_user_pages_fast(). Commit 73b0140b ("mm/gup: change GUP fast to use flags rather than a write 'bool'") updated the get_user_pages_fast() call sites to use flags, but incorrectly updated the call in sev_pin_memory(). As the original coding of this call was correct, revert the change made by that commit. Fixes: 73b0140b ("mm/gup: change GUP fast to use flags rather than a write 'bool'") Signed-off-by: NJanakarajan Natarajan <Janakarajan.Natarajan@amd.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NIra Weiny <ira.weiny@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Jim Mattson <jmattson@google.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Mike Marshall <hubcap@omnibond.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Link: http://lkml.kernel.org/r/20200423152419.87202-1-Janakarajan.Natarajan@amd.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 07 5月, 2020 8 次提交
-
-
由 Paolo Bonzini 提交于
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Peter Xu 提交于
When single-step triggered with KVM_SET_GUEST_DEBUG, we should fill in the pc value with current linear RIP rather than the cached singlestep address. Signed-off-by: NPeter Xu <peterx@redhat.com> Message-Id: <20200505205000.188252-3-peterx@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Peter Xu 提交于
RTM should always been set even with KVM_EXIT_DEBUG on #DB. Signed-off-by: NPeter Xu <peterx@redhat.com> Message-Id: <20200505205000.188252-2-peterx@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Go through kvm_queue_exception_p so that the payload is correctly delivered through the exit qualification, and add a kvm_update_dr6 call to kvm_deliver_exception_payload that is needed on AMD. Reported-by: NPeter Xu <peterx@redhat.com> Reviewed-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Peter Xu 提交于
KVM_CAP_SET_GUEST_DEBUG should be supported for x86 however it's not declared as supported. My wild guess is that userspaces like QEMU are using "#ifdef KVM_CAP_SET_GUEST_DEBUG" to check for the capability instead, but that could be wrong because the compilation host may not be the runtime host. The userspace might still want to keep the old "#ifdef" though to not break the guest debug on old kernels. Signed-off-by: NPeter Xu <peterx@redhat.com> Message-Id: <20200505154750.126300-1-peterx@redhat.com> [Do the same for PPC and s390. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Reinette Chatre 提交于
The original Memory Bandwidth Monitoring (MBM) architectural definition defines counters of up to 62 bits in the IA32_QM_CTR MSR while the first-generation MBM implementation uses statically defined 24 bit counters. The MBM CPUID enumeration properties have been expanded to include the MBM counter width, encoded as an offset from 24 bits. While eight bits are available for the counter width offset IA32_QM_CTR MSR only supports 62 bit counters. Add a sanity check, with warning printed when encountered, to ensure counters cannot exceed the 62 bit limit. Signed-off-by: NReinette Chatre <reinette.chatre@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/69d52abd5b14794d3a0f05ba7c755ed1f4c0d5ed.1588715690.git.reinette.chatre@intel.com
-
由 Reinette Chatre 提交于
The original Memory Bandwidth Monitoring (MBM) architectural definition defines counters of up to 62 bits in the IA32_QM_CTR MSR while the first-generation MBM implementation uses statically defined 24 bit counters. Expand the MBM CPUID enumeration properties to include the MBM counter width. The previously undefined EAX output register contains, in bits [7:0], the MBM counter width encoded as an offset from 24 bits. Enumerating this property is only specified for Intel CPUs. Suggested-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NReinette Chatre <reinette.chatre@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/afa3af2f753f6bc301fb743bc8944e749cb24afa.1588715690.git.reinette.chatre@intel.com
-
由 Reinette Chatre 提交于
The original Memory Bandwidth Monitoring (MBM) architectural definition defines counters of up to 62 bits in the IA32_QM_CTR MSR, and the first-generation MBM implementation uses 24 bit counters. Software is required to poll at 1 second or faster to ensure that data is retrieved before a counter rollover occurs more than once under worst conditions. As system bandwidths scale the software requirement is maintained with the introduction of a per-resource enumerable MBM counter width. In preparation for supporting hardware with an enumerable MBM counter width the current globally static MBM counter width is moved to a per-resource MBM counter width. Currently initialized to 24 always to result in no functional change. In essence there is one function, mbm_overflow_count() that needs to know the counter width to handle rollovers. The static value used within mbm_overflow_count() will be replaced with a value discovered from the hardware. Support for learning the MBM counter width from hardware is added in the change that follows. Signed-off-by: NReinette Chatre <reinette.chatre@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/e36743b9800f16ce600f86b89127391f61261f23.1588715690.git.reinette.chatre@intel.com
-
- 06 5月, 2020 2 次提交
-
-
由 Reinette Chatre 提交于
Cache and memory bandwidth monitoring are features that are part of x86 CPU resource control that is supported by the resctrl subsystem. The monitoring properties are obtained via CPUID from every CPU and only used within the resctrl subsystem where the properties are only read from boot_cpu_data. Obtain the monitoring properties once, placed in boot_cpu_data, via the ->c_bsp_init() helpers of the vendors that support X86_FEATURE_CQM_LLC. Suggested-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NReinette Chatre <reinette.chatre@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/6d74a6ac3e69f4b7a8b4115835f9455faf0f468d.1588715690.git.reinette.chatre@intel.com
-
由 Reinette Chatre 提交于
The cache and memory bandwidth monitoring properties are read using CPUID on every CPU. After the information is read from the system a sanity check is run to (1) ensure that the RMID data is initialized for the boot CPU in case the information was not available on the boot CPU and (2) the boot CPU's RMID is set to the minimum of RMID obtained from all CPUs. Every known platform that supports resctrl has the same maximum RMID on all CPUs. Both sanity checks found in x86_init_cache_qos() can thus safely be removed. Suggested-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NReinette Chatre <reinette.chatre@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/c9a3b60d34091840c8b0bd1c6fab15e5ba92cb17.1588715690.git.reinette.chatre@intel.com
-