- 24 7月, 2017 1 次提交
-
-
由 Andy Shevchenko 提交于
As a preparatory to use generic IO accessor helpers we need to define architecture dependent functions via preprocessor to let world know we have them. Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: NWolfram Sang <wsa@the-dreams.de> Cc: Baolin Wang <baolin.wang@spreadtrum.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: intel-gfx@lists.freedesktop.org Cc: linux-i2c@vger.kernel.org Link: http://lkml.kernel.org/r/20170630170934.83028-2-andriy.shevchenko@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 18 7月, 2017 7 次提交
-
-
由 Josh Poimboeuf 提交于
This enables objtool to grok the iret in the middle of a C function. Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: live-patching@vger.kernel.org Link: http://lkml.kernel.org/r/b057be26193c11d2ed3337b2107bc7adcba42c99.1499786555.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Josh Poimboeuf 提交于
Add unwind hint annotations to entry_64.S. This will enable the ORC unwinder to unwind through any location in the entry code including syscalls, interrupts, and exceptions. Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: live-patching@vger.kernel.org Link: http://lkml.kernel.org/r/b9f6d478aadf68ba57c739dcfac34ec0dc021c4c.1499786555.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Josh Poimboeuf 提交于
Some asm (and inline asm) code does special things to the stack which objtool can't understand. (Nor can GCC or GNU assembler, for that matter.) In such cases we need a facility for the code to provide annotations, so the unwinder can unwind through it. This provides such a facility, in the form of unwind hints. They're similar to the GNU assembler .cfi* directives, but they give more information, and are needed in far fewer places, because objtool can fill in the blanks by following branches and adjusting the stack pointer for pushes and pops. Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: live-patching@vger.kernel.org Link: http://lkml.kernel.org/r/0f5f3c9104fca559ff4088bece1d14ae3bca52d5.1499786555.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Josh Poimboeuf 提交于
On x86_64, the double fault exception stack is located immediately after the interrupt stack in memory. This causes confusion in the unwinder when it tries to unwind through an empty interrupt stack, where the stack pointer points to the address bordering the two stacks. The unwinder incorrectly thinks it's running on the double fault stack. Fix this kind of stack border confusion by never considering the beginning address of an exception or interrupt stack to be part of the stack. Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: live-patching@vger.kernel.org Fixes: 5fe599e0 ("x86/dumpstack: Add support for unwinding empty IRQ stacks") Link: http://lkml.kernel.org/r/bcc142160a5104de5c354c21c394c93a0173943f.1499786555.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Josh Poimboeuf 提交于
If two consecutive stack frames have pt_regs, the oops dump code fails to print the second frame's registers. Fix that. Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: live-patching@vger.kernel.org Fixes: 3b3fa11b ("x86/dumpstack: Print any pt_regs found on the stack") Link: http://lkml.kernel.org/r/269c5c00c7d45c699f3dcea42a3a594c6cf7a9a3.1499786555.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andy Lutomirski 提交于
The OOPS unwinder wants the word at the top of the IRQ stack to point back to the previous stack at all times when the IRQ stack is in use. There's currently a one-instruction window in ENTER_IRQ_STACK during which this isn't the case. Fix it by writing the old RSP to the top of the IRQ stack before jumping. This currently writes the pointer to the stack twice, which is a bit ugly. We could get rid of this by replacing irq_stack_ptr with irq_stack_ptr_minus_eight (better name welcome). OTOH, there may be all kinds of odd microarchitectural considerations in play that affect performance by a few cycles here. Reported-by: NMike Galbraith <efault@gmx.de> Reported-by: NJosh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: live-patching@vger.kernel.org Link: http://lkml.kernel.org/r/aae7e79e49914808440ad5310ace138ced2179ca.1499786555.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andy Lutomirski 提交于
This will allow IRQ stacks to nest inside NMIs or similar entries that can happen during IRQ stack setup or teardown. The new macros won't work correctly if they're invoked with IRQs on. Add a check under CONFIG_DEBUG_ENTRY to detect that. Signed-off-by: NAndy Lutomirski <luto@kernel.org> [ Use %r10 instead of %r11 in xen_do_hypervisor_callback to make objtool and ORC unwinder's lives a little easier. ] Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: live-patching@vger.kernel.org Link: http://lkml.kernel.org/r/b0b2ff5fb97d2da2e1d7e1f380190c92545c8bb5.1499786555.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 14 7月, 2017 5 次提交
-
-
由 Roman Kagan 提交于
Hyper-V identifies vCPUs by Virtual Processor Index, which can be queried via HV_X64_MSR_VP_INDEX msr. It is defined by the spec as a sequential number which can't exceed the maximum number of vCPUs per VM. APIC ids can be sparse and thus aren't a valid replacement for VP indices. Current KVM uses its internal vcpu index as VP_INDEX. However, to make it predictable and persistent across VM migrations, the userspace has to control the value of VP_INDEX. This patch achieves that, by storing vp_index explicitly on vcpu, and allowing HV_X64_MSR_VP_INDEX to be set from the host side. For compatibility it's initialized to KVM vcpu index. Also a few variables are renamed to make clear distinction betweed this Hyper-V vp_index and KVM vcpu_id (== APIC id). Besides, a new capability, KVM_CAP_HYPERV_VP_INDEX, is added to allow the userspace to skip attempting msr writes where unsupported, to avoid spamming error logs. Signed-off-by: NRoman Kagan <rkagan@virtuozzo.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Wanpeng Li 提交于
Adds another flag bit (bit 2) to MSR_KVM_ASYNC_PF_EN. If bit 2 is 1, async page faults are delivered to L1 as #PF vmexits; if bit 2 is 0, kvm_can_do_async_pf returns 0 if in guest mode. This is similar to what svm.c wanted to do all along, but it is only enabled for Linux as L1 hypervisor. Foreign hypervisors must never receive async page faults as vmexits, because they'd probably be very confused about that. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Wanpeng Li 提交于
Add an nested_apf field to vcpu->arch.exception to identify an async page fault, and constructs the expected vm-exit information fields. Force a nested VM exit from nested_vmx_check_exception() if the injected #PF is async page fault. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Wanpeng Li 提交于
This patch adds the L1 guest async page fault #PF vmexit handler, such by L1 similar to ordinary async page fault. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> [Passed insn parameters to kvm_mmu_page_fault().] Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Wanpeng Li 提交于
This patch removes all arguments except the first in kvm_x86_ops->queue_exception since they can extract the arguments from vcpu->arch.exception themselves. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 13 7月, 2017 20 次提交
-
-
由 Roman Kagan 提交于
There is a flaw in the Hyper-V SynIC implementation in KVM: when message page or event flags page is enabled by setting the corresponding msr, KVM zeroes it out. This is problematic because on migration the corresponding MSRs are loaded on the destination, so the content of those pages is lost. This went unnoticed so far because the only user of those pages was in-KVM hyperv synic timers, which could continue working despite that zeroing. Newer QEMU uses those pages for Hyper-V VMBus implementation, and zeroing them breaks the migration. Besides, in newer QEMU the content of those pages is fully managed by QEMU, so zeroing them is undesirable even when writing the MSRs from the guest side. To support this new scheme, introduce a new capability, KVM_CAP_HYPERV_SYNIC2, which, when enabled, makes sure that the synic pages aren't zeroed out in KVM. Signed-off-by: NRoman Kagan <rkagan@virtuozzo.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Ladi Prosek 提交于
The backwards_tsc_observed global introduced in commit 16a96021 is never reset to false. If a VM happens to be running while the host is suspended (a common source of the TSC jumping backwards), master clock will never be enabled again for any VM. In contrast, if no VM is running while the host is suspended, master clock is unaffected. This is inconsistent and unnecessarily strict. Let's track the backwards_tsc_observed variable separately and let each VM start with a clean slate. Real world impact: My Windows VMs get slower after my laptop undergoes a suspend/resume cycle. The only way to get the perf back is unloading and reloading the kvm module. Signed-off-by: NLadi Prosek <lprosek@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Joe Perches 提交于
Make the code like the rest of the kernel. Link: http://lkml.kernel.org/r/1cd3d401626e51ea0e2333a860e76e80bc560a4c.1499284835.git.joe@perches.comSigned-off-by: NJoe Perches <joe@perches.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Rik van Riel 提交于
When RLIMIT_STACK is, for example, 256MB, the current code results in a gap between the top of the task and mmap_base of 256MB, failing to take into account the amount by which the stack address was randomized. In other words, the stack gets less than RLIMIT_STACK space. Ensure that the gap between the stack and mmap_base always takes stack randomization and the stack guard gap into account. Obtained from Daniel Micay's linux-hardened tree. Link: http://lkml.kernel.org/r/20170622200033.25714-2-riel@redhat.comSigned-off-by: NDaniel Micay <danielmicay@gmail.com> Signed-off-by: NRik van Riel <riel@redhat.com> Reported-by: NFlorian Weimer <fweimer@redhat.com> Acked-by: NIngo Molnar <mingo@kernel.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Daniel Micay <danielmicay@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hugh Dickins <hughd@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Rik van Riel 提交于
Use the ascii-armor canary to prevent unterminated C string overflows from being able to successfully overwrite the canary, even if they somehow obtain the canary value. Inspired by execshield ascii-armor and Daniel Micay's linux-hardened tree. Link: http://lkml.kernel.org/r/20170524155751.424-4-riel@redhat.comSigned-off-by: NRik van Riel <riel@redhat.com> Acked-by: NKees Cook <keescook@chromium.org> Cc: Daniel Micay <danielmicay@gmail.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ingo Molnar <mingo@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Daniel Micay 提交于
This adds support for compiling with a rough equivalent to the glibc _FORTIFY_SOURCE=1 feature, providing compile-time and runtime buffer overflow checks for string.h functions when the compiler determines the size of the source or destination buffer at compile-time. Unlike glibc, it covers buffer reads in addition to writes. GNU C __builtin_*_chk intrinsics are avoided because they would force a much more complex implementation. They aren't designed to detect read overflows and offer no real benefit when using an implementation based on inline checks. Inline checks don't add up to much code size and allow full use of the regular string intrinsics while avoiding the need for a bunch of _chk functions and per-arch assembly to avoid wrapper overhead. This detects various overflows at compile-time in various drivers and some non-x86 core kernel code. There will likely be issues caught in regular use at runtime too. Future improvements left out of initial implementation for simplicity, as it's all quite optional and can be done incrementally: * Some of the fortified string functions (strncpy, strcat), don't yet place a limit on reads from the source based on __builtin_object_size of the source buffer. * Extending coverage to more string functions like strlcat. * It should be possible to optionally use __builtin_object_size(x, 1) for some functions (C strings) to detect intra-object overflows (like glibc's _FORTIFY_SOURCE=2), but for now this takes the conservative approach to avoid likely compatibility issues. * The compile-time checks should be made available via a separate config option which can be enabled by default (or always enabled) once enough time has passed to get the issues it catches fixed. Kees said: "This is great to have. While it was out-of-tree code, it would have blocked at least CVE-2016-3858 from being exploitable (improper size argument to strlcpy()). I've sent a number of fixes for out-of-bounds-reads that this detected upstream already" [arnd@arndb.de: x86: fix fortified memcpy] Link: http://lkml.kernel.org/r/20170627150047.660360-1-arnd@arndb.de [keescook@chromium.org: avoid panic() in favor of BUG()] Link: http://lkml.kernel.org/r/20170626235122.GA25261@beast [keescook@chromium.org: move from -mm, add ARCH_HAS_FORTIFY_SOURCE, tweak Kconfig help] Link: http://lkml.kernel.org/r/20170526095404.20439-1-danielmicay@gmail.com Link: http://lkml.kernel.org/r/1497903987-21002-8-git-send-email-keescook@chromium.orgSigned-off-by: NDaniel Micay <danielmicay@gmail.com> Signed-off-by: NKees Cook <keescook@chromium.org> Signed-off-by: NArnd Bergmann <arnd@arndb.de> Acked-by: NKees Cook <keescook@chromium.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Daniel Axtens <dja@axtens.net> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Nicholas Piggin 提交于
Split SOFTLOCKUP_DETECTOR from LOCKUP_DETECTOR, and split HARDLOCKUP_DETECTOR_PERF from HARDLOCKUP_DETECTOR. LOCKUP_DETECTOR implies the general boot, sysctl, and programming interfaces for the lockup detectors. An architecture that wants to use a hard lockup detector must define HAVE_HARDLOCKUP_DETECTOR_PERF or HAVE_HARDLOCKUP_DETECTOR_ARCH. Alternatively an arch can define HAVE_NMI_WATCHDOG, which provides the minimum arch_touch_nmi_watchdog, and it otherwise does its own thing and does not implement the LOCKUP_DETECTOR interfaces. sparc is unusual in that it has started to implement some of the interfaces, but not fully yet. It should probably be converted to a full HAVE_HARDLOCKUP_DETECTOR_ARCH. [npiggin@gmail.com: fix] Link: http://lkml.kernel.org/r/20170617223522.66c0ad88@roar.ozlabs.ibm.com Link: http://lkml.kernel.org/r/20170616065715.18390-4-npiggin@gmail.comSigned-off-by: NNicholas Piggin <npiggin@gmail.com> Reviewed-by: NDon Zickus <dzickus@redhat.com> Reviewed-by: NBabu Moger <babu.moger@oracle.com> Tested-by: Babu Moger <babu.moger@oracle.com> [sparc] Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Xunlei Pang 提交于
As Eric said, "what we need to do is move the variable vmcoreinfo_note out of the kernel's .bss section. And modify the code to regenerate and keep this information in something like the control page. Definitely something like this needs a page all to itself, and ideally far away from any other kernel data structures. I clearly was not watching closely the data someone decided to keep this silly thing in the kernel's .bss section." This patch allocates extra pages for these vmcoreinfo_XXX variables, one advantage is that it enhances some safety of vmcoreinfo, because vmcoreinfo now is kept far away from other kernel data structures. Link: http://lkml.kernel.org/r/1493281021-20737-1-git-send-email-xlpang@redhat.comSigned-off-by: NXunlei Pang <xlpang@redhat.com> Tested-by: NMichael Holzheu <holzheu@linux.vnet.ibm.com> Reviewed-by: NJuergen Gross <jgross@suse.com> Suggested-by: NEric Biederman <ebiederm@xmission.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Young <dyoung@redhat.com> Cc: Hari Bathini <hbathini@linux.vnet.ibm.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Janakarajan Natarajan 提交于
Enable the Virtual VMLOAD VMSAVE feature. This is done by setting bit 1 at position B8h in the vmcb. The processor must have nested paging enabled, be in 64-bit mode and have support for the Virtual VMLOAD VMSAVE feature for the bit to be set in the vmcb. Signed-off-by: NJanakarajan Natarajan <Janakarajan.Natarajan@amd.com> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Janakarajan Natarajan 提交于
Define a new cpufeature definition for Virtual VMLOAD VMSAVE. Signed-off-by: NJanakarajan Natarajan <Janakarajan.Natarajan@amd.com> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Janakarajan Natarajan 提交于
Rename the lbr_ctl variable to better reflect the purpose of the field - provide support for virtualization extensions. Signed-off-by: NJanakarajan Natarajan <Janakarajan.Natarajan@amd.com> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Janakarajan Natarajan 提交于
The lbr_ctl variable in the vmcb control area is used to enable or disable Last Branch Record (LBR) virtualization. However, this is to be done using only bit 0 of the variable. To correct this and to prepare for a new feature, change the current usage to work only on a particular bit. Signed-off-by: NJanakarajan Natarajan <Janakarajan.Natarajan@amd.com> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Ladi Prosek 提交于
kvm_skip_emulated_instruction handles the singlestep debug exception which is something we almost always want. This commit (specifically the change in rdmsr_interception) makes the debug.flat KVM unit test pass on AMD. Two call sites still call skip_emulated_instruction directly: * In svm_queue_exception where it's used only for moving the rip forward * In task_switch_interception which is analogous to handle_task_switch in VMX Signed-off-by: NLadi Prosek <lprosek@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Radim Krčmář 提交于
kvm_vm_release() did not have slots_lock when calling kvm_io_bus_unregister_dev() and this went unnoticed until 4a12f951 ("KVM: mark kvm->busses as rcu protected") added dynamic checks. Luckily, there should be no race at that point: ============================= WARNING: suspicious RCU usage 4.12.0.kvm+ #0 Not tainted ----------------------------- ./include/linux/kvm_host.h:479 suspicious rcu_dereference_check() usage! lockdep_rcu_suspicious+0xc5/0x100 kvm_io_bus_unregister_dev+0x173/0x190 [kvm] kvm_free_pit+0x28/0x80 [kvm] kvm_arch_sync_events+0x2d/0x30 [kvm] kvm_put_kvm+0xa7/0x2a0 [kvm] kvm_vm_release+0x21/0x30 [kvm] Reviewed-by: NDavid Hildenbrand <david@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Jim Mattson 提交于
vmx_complete_atomic_exit should call kvm_machine_check for any VM-entry failure due to a machine-check event. Such an exit should be recognized solely by its basic exit reason (i.e. the low 16 bits of the VMCS exit reason field). None of the other VMCS exit information fields contain valid information when the VM-exit is due to "VM-entry failure due to machine-check event". Signed-off-by: NJim Mattson <jmattson@google.com> Reviewed-by: NXiao Guangrong <xiaoguangrong@tencent.com> [Changed VM_EXIT_INTR_INFO condition to better describe its reason.] Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Radim Krčmář 提交于
kvm master clock usually has a different frequency than the kernel boot clock. This is not a problem until the master clock is updated; update uses the current kernel boot clock to compute new kvm clock, which erases any kvm clock cycles that might have built up due to frequency difference over a long period. KVM_SET_CLOCK is one of places where we can safely update master clock as the guest-visible clock is going to be shifted anyway. The problem with current code is that it updates the kvm master clock after updating the offset. If the master clock was enabled before calling KVM_SET_CLOCK, then it might have built up a significant delta from kernel boot clock. In the worst case, the time set by userspace would be shifted by so much that it couldn't have been set at any point during KVM_SET_CLOCK. To fix this, move kvm_gen_update_masterclock() before computing kvmclock_offset, which means that the master clock and kernel boot clock will be sufficiently close together. Another solution would be to replace get_kvmclock_ns() with "ktime_get_boot_ns() + ka->kvmclock_offset", which is marginally more accurate, but would break symmetry with KVM_GET_CLOCK. Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Jim Mattson 提交于
Inconsistencies result from shadowing only accesses to the full 64-bits of a 64-bit VMCS field, but not shadowing accesses to the high 32-bits of the field. The "high" part of a 64-bit field should be shadowed whenever the full 64-bit field is shadowed. Signed-off-by: NJim Mattson <jmattson@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jim Mattson 提交于
Allow the L1 guest to specify the last page of addressable guest physical memory for an L2 MSR permission bitmap. Also remove the vmcs12_read_any() check that should never fail. Fixes: 3af18d9c ("KVM: nVMX: Prepare for using hardware MSR bitmap") Signed-off-by: NJim Mattson <jmattson@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jim Mattson 提交于
According to the SDM, if the "use I/O bitmaps" VM-execution control is 1, bits 11:0 of each I/O-bitmap address must be 0. Neither address should set any bits beyond the processor's physical-address width. Signed-off-by: NJim Mattson <jmattson@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jim Mattson 提交于
The VMCS launch state is not set to "launched" unless the VMLAUNCH actually succeeds. VMLAUNCH failure includes VM-exits with bit 31 set. Note that this change does not address the general problem that a failure to launch/resume vmcs02 (i.e. vmx->fail) is not handled correctly. Signed-off-by: NJim Mattson <jmattson@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 11 7月, 2017 3 次提交
-
-
由 Kees Cook 提交于
The ELF_ET_DYN_BASE position was originally intended to keep loaders away from ET_EXEC binaries. (For example, running "/lib/ld-linux.so.2 /bin/cat" might cause the subsequent load of /bin/cat into where the loader had been loaded.) With the advent of PIE (ET_DYN binaries with an INTERP Program Header), ELF_ET_DYN_BASE continued to be used since the kernel was only looking at ET_DYN. However, since ELF_ET_DYN_BASE is traditionally set at the top 1/3rd of the TASK_SIZE, a substantial portion of the address space is unused. For 32-bit tasks when RLIMIT_STACK is set to RLIM_INFINITY, programs are loaded above the mmap region. This means they can be made to collide (CVE-2017-1000370) or nearly collide (CVE-2017-1000371) with pathological stack regions. Lowering ELF_ET_DYN_BASE solves both by moving programs below the mmap region in all cases, and will now additionally avoid programs falling back to the mmap region by enforcing MAP_FIXED for program loads (i.e. if it would have collided with the stack, now it will fail to load instead of falling back to the mmap region). To allow for a lower ELF_ET_DYN_BASE, loaders (ET_DYN without INTERP) are loaded into the mmap region, leaving space available for either an ET_EXEC binary with a fixed location or PIE being loaded into mmap by the loader. Only PIE programs are loaded offset from ELF_ET_DYN_BASE, which means architectures can now safely lower their values without risk of loaders colliding with their subsequently loaded programs. For 64-bit, ELF_ET_DYN_BASE is best set to 4GB to allow runtimes to use the entire 32-bit address space for 32-bit pointers. Thanks to PaX Team, Daniel Micay, and Rik van Riel for inspiration and suggestions on how to implement this solution. Fixes: d1fd836d ("mm: split ET_DYN ASLR from mmap ASLR") Link: http://lkml.kernel.org/r/20170621173201.GA114489@beastSigned-off-by: NKees Cook <keescook@chromium.org> Acked-by: NRik van Riel <riel@redhat.com> Cc: Daniel Micay <danielmicay@gmail.com> Cc: Qualys Security Advisory <qsa@qualys.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Pratyush Anand <panand@redhat.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrey Ryabinin 提交于
We used to read several bytes of the shadow memory in advance. Therefore additional shadow memory mapped to prevent crash if speculative load would happen near the end of the mapped shadow memory. Now we don't have such speculative loads, so we no longer need to map additional shadow memory. Link: http://lkml.kernel.org/r/20170601162338.23540-2-aryabinin@virtuozzo.comSigned-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alexander Potapenko <glider@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Richard Weinberger 提交于
When checking for PTRACE_GETRESET/SETREGSET, make sure that the correct header file is included. We need linux/ptrace.h which contains all ptrace UAPI related defines. Otherwise #if defined(PTRACE_GETRESET) is always false. Cc: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: NRichard Weinberger <richard@nod.at>
-
- 10 7月, 2017 1 次提交
-
-
由 Paolo Bonzini 提交于
This exit ended up being reported, but the currently exposed data does not provide much of a starting point for debugging. In the reported case, the vmexit was an EPT misconfiguration (MMIO access). Let userspace report ethe exit qualification and, if relevant, the GPA. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 07 7月, 2017 3 次提交
-
-
由 Punit Agrawal 提交于
A poisoned or migrated hugepage is stored as a swap entry in the page tables. On architectures that support hugepages consisting of contiguous page table entries (such as on arm64) this leads to ambiguity in determining the page table entry to return in huge_pte_offset() when a poisoned entry is encountered. Let's remove the ambiguity by adding a size parameter to convey additional information about the requested address. Also fixup the definition/usage of huge_pte_offset() throughout the tree. Link: http://lkml.kernel.org/r/20170522133604.11392-4-punit.agrawal@arm.comSigned-off-by: NPunit Agrawal <punit.agrawal@arm.com> Acked-by: NSteve Capper <steve.capper@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: James Hogan <james.hogan@imgtec.com> (odd fixer:METAG ARCHITECTURE) Cc: Ralf Baechle <ralf@linux-mips.org> (supporter:MIPS) Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Aneesh Kumar K.V 提交于
This moves the #ifdef in C code to a Kconfig dependency. Also we move the gigantic_page_supported() function to be arch specific. This allows architectures to conditionally enable runtime allocation of gigantic huge page. Architectures like ppc64 supports different gigantic huge page size (16G and 1G) based on the translation mode selected. This provides an opportunity for ppc64 to enable runtime allocation only w.r.t 1G hugepage. No functional change in this patch. Link: http://lkml.kernel.org/r/1494995292-4443-1-git-send-email-aneesh.kumar@linux.vnet.ibm.comSigned-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michal Hocko 提交于
arch_add_memory gets for_device argument which then controls whether we want to create memblocks for created memory sections. Simplify the logic by telling whether we want memblocks directly rather than going through pointless negation. This also makes the api easier to understand because it is clear what we want rather than nothing telling for_device which can mean anything. This shouldn't introduce any functional change. Link: http://lkml.kernel.org/r/20170515085827.16474-13-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com> Tested-by: NDan Williams <dan.j.williams@intel.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Daniel Kiper <daniel.kiper@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Tobias Regnery <tobias.regnery@gmail.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-