- 03 10月, 2013 4 次提交
-
-
由 Paolo Bonzini 提交于
The new_cr3 MMU callback has been a wrapper for mmu_free_roots since commit e676505a (KVM: MMU: Force cr3 reload with two dimensional paging on mov cr3 emulation, 2012-07-08). The commit message mentioned that "mmu_free_roots() is somewhat of an overkill, but fixing that is more complicated and will be done after this minimal fix". One year has passed, and no one really felt the need to do a different fix. Wrap the call with a kvm_mmu_new_cr3 function for clarity, but remove the callback. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Paolo Bonzini 提交于
The free MMU callback has been a wrapper for mmu_free_roots since mmu_free_roots itself was introduced (commit 17ac10ad, [PATCH] KVM: MU: Special treatment for shadow pae root pages, 2007-01-05), and has always been the same for all MMU cases. Remove the indirection as it is useless. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Paolo Bonzini 提交于
This makes the interface more deterministic for userspace, which can expect (after configuring only the features it supports) to get exactly the same state from the kernel, independent of the host CPU and kernel version. Suggested-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
由 Paolo Bonzini 提交于
A guest can still attempt to save and restore XSAVE states even if they have been masked in CPUID leaf 0Dh. This usually is not visible to the guest, but is still wrong: "Any attempt to set a reserved bit (as determined by the contents of EAX and EDX after executing CPUID with EAX=0DH, ECX= 0H) in XCR0 for a given processor will result in a #GP exception". The patch also performs the same checks as __kvm_set_xcr in KVM_SET_XSAVE. This catches migration from newer to older kernel/processor before the guest starts running. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
- 12 9月, 2013 2 次提交
-
-
由 Cyrill Gorcunov 提交于
_PAGE_SOFT_DIRTY bit should never be set on present pte so add VM_BUG_ON to catch any potential future abuse. Also add a comment on _PAGE_SWP_SOFT_DIRTY definition explaining scope of its usage. Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org> Acked-by: NPavel Emelyanov <xemul@parallels.com> Acked-by: NJan Beulich <jbeulich@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dave Hansen 提交于
The previous patch doing vmstats for TLB flushes ("mm: vmstats: tlb flush counters") effectively missed UP since arch/x86/mm/tlb.c is only compiled for SMP. UP systems do not do remote TLB flushes, so compile those counters out on UP. arch/x86/kernel/cpu/mtrr/generic.c calls __flush_tlb() directly. This is probably an optimization since both the mtrr code and __flush_tlb() write cr4. It would probably be safe to make that a flush_tlb_all() (and then get these statistics), but the mtrr code is ancient and I'm hesitant to touch it other than to just stick in the counters. [akpm@linux-foundation.org: tweak comments] Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 03 9月, 2013 1 次提交
-
-
由 Linus Torvalds 提交于
Instead of taking the spinlock, the lockless versions atomically check that the lock is not taken, and do the reference count update using a cmpxchg() loop. This is semantically identical to doing the reference count update protected by the lock, but avoids the "wait for lock" contention that you get when accesses to the reference count are contended. Note that a "lockref" is absolutely _not_ equivalent to an atomic_t. Even when the lockref reference counts are updated atomically with cmpxchg, the fact that they also verify the state of the spinlock means that the lockless updates can never happen while somebody else holds the spinlock. So while "lockref_put_or_lock()" looks a lot like just another name for "atomic_dec_and_lock()", and both optimize to lockless updates, they are fundamentally different: the decrement done by atomic_dec_and_lock() is truly independent of any lock (as long as it doesn't decrement to zero), so a locked region can still see the count change. The lockref structure, in contrast, really is a *locked* reference count. If you hold the spinlock, the reference count will be stable and you can modify the reference count without using atomics, because even the lockless updates will see and respect the state of the lock. In order to enable the cmpxchg lockless code, the architecture needs to do three things: (1) Make sure that the "arch_spinlock_t" and an "unsigned int" can fit in an aligned u64, and have a "cmpxchg()" implementation that works on such a u64 data type. (2) define a helper function to test for a spinlock being unlocked ("arch_spin_value_unlocked()") (3) select the "ARCH_USE_CMPXCHG_LOCKREF" config variable in its Kconfig file. This enables it for x86-64 (but not 32-bit, we'd need to make sure cmpxchg() turns into the proper cmpxchg8b in order to enable it for 32-bit mode). Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 02 9月, 2013 1 次提交
-
-
由 H. Peter Anvin 提交于
Add SMAP annotations to csum_partial_copy_to/from_user(). These functions legitimately access user space and thus need to set the AC flag. TODO: add explicit checks that the side with the kernel space pointer really points into kernel space. Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/n/tip-2aps0u00eer658fd5xyanan7@git.kernel.org Cc: <stable@vger.kernel.org> # v3.7+
-
- 30 8月, 2013 3 次提交
-
-
由 H. Peter Anvin 提交于
Update comment in uaccess.h to reflect the changes for clang support: gcc only cares about the base register (most architectures don't encode the size of the operation in the operands like x86 does, and so it is treated effectively like a register number), whereas clang tries to enforce the size -- but not for register pairs. Link: http://lkml.kernel.org/r/1377803585-5913-3-git-send-email-dl9pf@gmx.deSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com> Cc: Jan-Simon Möller <dl9pf@gmx.de>
-
由 Jan-Simon Möller 提交于
Clang does not support the "shortcut" we're taking here for gcc (see below). The patch uses the macro _ASM_DX to do the job. From arch/x86/include/asm/uaccess.h: /* * Careful: we have to cast the result to the type of the pointer * for sign reasons. * * The use of %edx as the register specifier is a bit of a * simplification, as gcc only cares about it as the starting point * and not size: for a 64-bit value it will use %ecx:%edx on 32 bits * (%ecx being the next register in gcc's x86 register sequence), and * %rdx on 64 bits. */ [ hpa: I consider this a compatibility bug in clang as this reflects a bit of a misunderstanding about how register strings are used by gcc, but the workaround is straightforward and there is no particular reason to not do it. ] Signed-off-by: NJan-Simon Möller <dl9pf@gmx.de> Link: http://lkml.kernel.org/r/1377803585-5913-3-git-send-email-dl9pf@gmx.deSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Jan-Simon Möller 提交于
The __ASM_* macros (e.g. __ASM_DX) are used to return the proper register name (e.g. edx for 32bit / rdx for 64bit). We want to use this also in arch/x86/include/asm/uaccess.h / get_user() . For this to work, we need a raw form as both gcc and clang choke on the whitespace in a register asm() statement, and the __ASM_FORM macro surrounds the argument with blanks. A new macro, __ASM_FORM_RAW was added and we change __ASM_REG to use the new RAW form. Signed-off-by: NJan-Simon Möller <dl9pf@gmx.de> Link: http://lkml.kernel.org/r/1377803585-5913-2-git-send-email-dl9pf@gmx.deSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 27 8月, 2013 1 次提交
-
-
由 Marek Szyprowski 提交于
This patch cleans the initialization of dma contiguous framework. The all-in-one dma_declare_contiguous() function is now separated into dma_contiguous_reserve_area() which only steals the the memory from memblock allocator and dma_contiguous_add_device() function, which assigns given device to the specified reserved memory area. This improves the flexibility in defining contiguous memory areas and assigning device to them, because now it is possible to assign more than one device to the given contiguous memory area. Such split in initialization procedure is also required for upcoming device tree support. Signed-off-by: NMarek Szyprowski <m.szyprowski@samsung.com> Acked-by: NKyungmin Park <kyungmin.park@samsung.com> Acked-by: NMichal Nazarewicz <mina86@mina86.com> Acked-by: NTomasz Figa <t.figa@samsung.com>
-
- 26 8月, 2013 1 次提交
-
-
由 Srivatsa Vaddagiri 提交于
kvm_hc_kick_cpu allows the calling vcpu to kick another vcpu out of halt state. the presence of these hypercalls is indicated to guest via kvm_feature_pv_unhalt. Fold pv_unhalt flag into GET_MP_STATE ioctl to aid migration During migration, any vcpu that got kicked but did not become runnable (still in halted state) should be runnable after migration. Signed-off-by: NSrivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: NSuzuki Poulose <suzuki@in.ibm.com> [Raghu: Apic related changes, folding pvunhalted into vcpu_runnable Added flags for future use (suggested by Gleb)] [ Raghu: fold pv_unhalt flag as suggested by Eric Northup] Signed-off-by: NRaghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: NGleb Natapov <gleb@redhat.com> Acked-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NGleb Natapov <gleb@redhat.com>
-
- 21 8月, 2013 1 次提交
-
-
由 John Haxby 提交于
This affects xen pv guests with sufficiently old versions of xen and sufficiently new hardware. On such a system, a guest with a btrfs root won't even boot. Signed-off-by: NJohn Haxby <john.haxby@oracle.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
- 20 8月, 2013 1 次提交
-
-
由 Yoshihiro YUNOMAE 提交于
Prevent crash_kexec() from deadlocking on ioapic_lock. When crash_kexec() is executed on a CPU, the CPU will take ioapic_lock in disable_IO_APIC(). So if the cpu gets an NMI while locking ioapic_lock, a deadlock will happen. In this patch, ioapic_lock is zapped/initialized before disable_IO_APIC(). You can reproduce this deadlock the following way: 1. Add mdelay(1000) after raw_spin_lock_irqsave() in native_ioapic_set_affinity()@arch/x86/kernel/apic/io_apic.c Although the deadlock can occur without this modification, it will increase the potential of the deadlock problem. 2. Build and install the kernel 3. Set up the OS which will run panic() and kexec when NMI is injected # echo "kernel.unknown_nmi_panic=1" >> /etc/sysctl.conf # vim /etc/default/grub add "nmi_watchdog=0 crashkernel=256M" in GRUB_CMDLINE_LINUX line # grub2-mkconfig 4. Reboot the OS 5. Run following command for each vcpu on the guest # while true; do echo <CPU num> > /proc/irq/<IO-APIC-edge or IO-APIC-fasteoi>/smp_affinitity; done; By running this command, cpus will get ioapic_lock for setting affinity. 6. Inject NMI (push a dump button or execute 'virsh inject-nmi <domain>' if you use VM). After injecting NMI, panic() is called in an nmi-handler context. Then, kexec will normally run in panic(), but the operation will be stopped by deadlock on ioapic_lock in crash_kexec()->machine_crash_shutdown()-> native_machine_crash_shutdown()->disable_IO_APIC()->clear_IO_APIC()-> clear_IO_APIC_pin()->ioapic_read_entry(). Signed-off-by: NYoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: yrl.pp-manager.tt@hitachi.com Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/20130820070107.28245.83806.stgit@yunodevelSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 14 8月, 2013 4 次提交
-
-
由 Srivatsa Vaddagiri 提交于
During smp_boot_cpus paravirtualied KVM guest detects if the hypervisor has required feature (KVM_FEATURE_PV_UNHALT) to support pv-ticketlocks. If so, support for pv-ticketlocks is registered via pv_lock_ops. Use KVM_HC_KICK_CPU hypercall to wakeup waiting/halted vcpu. Signed-off-by: NSrivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20130810193849.GA25260@linux.vnet.ibm.comSigned-off-by: NSuzuki Poulose <suzuki@in.ibm.com> [Raghu: check_zero race fix, enum for kvm_contention_stat, jumplabel related changes, addition of safe_halt for irq enabled case, bailout spinning in nmi case(Gleb)] Signed-off-by: NRaghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: NGleb Natapov <gleb@redhat.com> Acked-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Linn Crosetto 提交于
Type SETUP_PCI, added by setup_efi_pci(), may advertise a ROM size larger than early_memremap() is able to handle, which is currently limited to 256kB. If this occurs it leads to a NULL dereference in parse_setup_data(). To avoid this, remap the setup_data header and allow parsing functions for individual types to handle their own data remapping. Signed-off-by: NLinn Crosetto <linn@hp.com> Link: http://lkml.kernel.org/r/1376430401-67445-1-git-send-email-linn@hp.comAcked-by: NYinghai Lu <yinghai@kernel.org> Reviewed-by: NPekka Enberg <penberg@kernel.org> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Cyrill Gorcunov 提交于
Andy reported that if file page get reclaimed we lose the soft-dirty bit if it was there, so save _PAGE_BIT_SOFT_DIRTY bit when page address get encoded into pte entry. Thus when #pf happens on such non-present pte we can restore it back. Reported-by: NAndy Lutomirski <luto@amacapital.net> Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org> Acked-by: NPavel Emelyanov <xemul@parallels.com> Cc: Matt Mackall <mpm@selenic.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cyrill Gorcunov 提交于
Andy Lutomirski reported that if a page with _PAGE_SOFT_DIRTY bit set get swapped out, the bit is getting lost and no longer available when pte read back. To resolve this we introduce _PTE_SWP_SOFT_DIRTY bit which is saved in pte entry for the page being swapped out. When such page is to be read back from a swap cache we check for bit presence and if it's there we clear it and restore the former _PAGE_SOFT_DIRTY bit back. One of the problem was to find a place in pte entry where we can save the _PTE_SWP_SOFT_DIRTY bit while page is in swap. The _PAGE_PSE was chosen for that, it doesn't intersect with swap entry format stored in pte. Reported-by: NAndy Lutomirski <luto@amacapital.net> Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org> Acked-by: NPavel Emelyanov <xemul@parallels.com> Cc: Matt Mackall <mpm@selenic.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: NMinchan Kim <minchan@kernel.org> Reviewed-by: NWanpeng Li <liwanp@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 8月, 2013 2 次提交
-
-
由 Oleg Nesterov 提交于
This is only theoretical, but after try_to_wake_up(p) was changed to check p->state under p->pi_lock the code like __set_current_state(TASK_INTERRUPTIBLE); schedule(); can miss a signal. This is the special case of wait-for-condition, it relies on try_to_wake_up/schedule interaction and thus it does not need mb() between __set_current_state() and if(signal_pending). However, this __set_current_state() can move into the critical section protected by rq->lock, now that try_to_wake_up() takes another lock we need to ensure that it can't be reordered with "if (signal_pending(current))" check inside that section. The patch is actually one-liner, it simply adds smp_wmb() before spin_lock_irq(rq->lock). This is what try_to_wake_up() already does by the same reason. We turn this wmb() into the new helper, smp_mb__before_spinlock(), for better documentation and to allow the architectures to change the default implementation. While at it, kill smp_mb__after_lock(), it has no callers. Perhaps we can also add smp_mb__before/after_spinunlock() for prepare_to_wait(). Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Torsten Kaiser 提交于
load_microcode_amd() (and the helper it is using) should not have an cpu parameter. The microcode loading does not depend on the CPU wrt the patches loaded since they will end up in a global list for all CPUs anyway. The change from cpu to x86family in load_microcode_amd() now allows to drop the code messing with cpu_data(cpu) from collect_cpu_info_amd_early(), which is wrong anyway because at that point the per-cpu cpu_info is not yet setup (These values would later be overwritten by smp_store_boot_cpu_info() / smp_store_cpu_info()). Fold the rest of collect_cpu_info_amd_early() into load_ucode_amd_ap(), because its only used at one place and without the cpuinfo_x86 accesses it was not much left. Signed-off-by: NTorsten Kaiser <just.for.lkml@googlemail.com> [ Fengguang: build fix ] Signed-off-by: NFengguang Wu <fengguang.wu@intel.com> [ Boris: adapt it to current tree. ] Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 12 8月, 2013 1 次提交
-
-
由 Thomas Petazzoni 提交于
Until now, the MSI architecture-specific functions could be overloaded using a fairly complex set of #define and compile-time conditionals. In order to prepare for the introduction of the msi_chip infrastructure, it is desirable to switch all those functions to use the 'weak' mechanism. This commit converts all the architectures that were overidding those MSI functions to use the new strategy. Note that we keep two separate, non-weak, functions default_teardown_msi_irqs() and default_restore_msi_irqs() for the default behavior of the arch_teardown_msi_irqs() and arch_restore_msi_irqs(), as the default behavior is needed by x86 PCI code. Signed-off-by: NThomas Petazzoni <thomas.petazzoni@free-electrons.com> Acked-by: NBjorn Helgaas <bhelgaas@google.com> Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Tested-by: NDaniel Price <daniel.price@gmail.com> Tested-by: NThierry Reding <thierry.reding@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux390@de.ibm.com Cc: linux-s390@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: x86@kernel.org Cc: Russell King <linux@arm.linux.org.uk> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: linux-ia64@vger.kernel.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: David S. Miller <davem@davemloft.net> Cc: sparclinux@vger.kernel.org Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: NJason Cooper <jason@lakedaemon.net>
-
- 10 8月, 2013 1 次提交
-
-
由 Daniel Drake 提交于
OpenFirmware wasn't quite following the protocol described in boot.txt and the kernel has detected this through use of the sentinel value in boot_params. OFW does zero out almost all of the stuff that it should do, but not the sentinel. This causes the kernel to clear olpc_ofw_header, which breaks x86 OLPC support. OpenFirmware has now been fixed. However, it would be nice if we could maintain Linux compatibility with old firmware versions. To do that, we just have to avoid zeroing out olpc_ofw_header. OFW does not write to any other parts of the header that are being zapped by the sentinel-detection code, and all users of olpc_ofw_header are somewhat protected through checking for the OLPC_OFW_SIG magic value before using it. So this should not cause any problems for anyone. Signed-off-by: NDaniel Drake <dsd@laptop.org> Link: http://lkml.kernel.org/r/20130809221420.618E6FAB03@dev.laptop.orgAcked-by: NYinghai Lu <yinghai@kernel.org> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org> # v3.9+
-
- 09 8月, 2013 6 次提交
-
-
由 Konrad Rzeszutek Wilk 提交于
This is based on a patch that Zhenzhong Duan had sent - which was missing some of the remaining pieces. The kernel has the logic to handle Xen-type-exceptions using the paravirt interface in the assembler code (see PARAVIRT_ADJUST_EXCEPTION_FRAME - pv_irq_ops.adjust_exception_frame and and INTERRUPT_RETURN - pv_cpu_ops.iret). That means the nmi handler (and other exception handlers) use the hypervisor iret. The other changes that would be neccessary for this would be to translate the NMI_VECTOR to one of the entries on the ipi_vector and make xen_send_IPI_mask_allbutself use different events. Fortunately for us commit 1db01b49 (xen: Clean up apic ipi interface) implemented this and we piggyback on the cleanup such that the apic IPI interface will pass the right vector value for NMI. With this patch we can trigger NMIs within a PV guest (only tested x86_64). For this to work with normal PV guests (not initial domain) we need the domain to be able to use the APIC ops - they are already implemented to use the Xen event channels. For that to be turned on in a PV domU we need to remove the masking of X86_FEATURE_APIC. Incidentally that means kgdb will also now work within a PV guest without using the 'nokgdbroundup' workaround. Note that the 32-bit version is different and this patch does not enable that. CC: Lisa Nguyen <lisa@xenapiadmin.com> CC: Ben Guthro <benjamin.guthro@citrix.com> CC: Zhenzhong Duan <zhenzhong.duan@oracle.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> [v1: Fixed up per David Vrabel comments] Reviewed-by: NBen Guthro <benjamin.guthro@citrix.com> Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
-
由 Jeremy Fitzhardinge 提交于
Maintain a flag in the LSB of the ticket lock tail which indicates whether anyone is in the lock slowpath and may need kicking when the current holder unlocks. The flags are set when the first locker enters the slowpath, and cleared when unlocking to an empty queue (ie, no contention). In the specific implementation of lock_spinning(), make sure to set the slowpath flags on the lock just before blocking. We must do this before the last-chance pickup test to prevent a deadlock with the unlocker: Unlocker Locker test for lock pickup -> fail unlock test slowpath -> false set slowpath flags block Whereas this works in any ordering: Unlocker Locker set slowpath flags test for lock pickup -> fail block unlock test slowpath -> true, kick If the unlocker finds that the lock has the slowpath flag set but it is actually uncontended (ie, head == tail, so nobody is waiting), then it clears the slowpath flag. The unlock code uses a locked add to update the head counter. This also acts as a full memory barrier so that its safe to subsequently read back the slowflag state, knowing that the updated lock is visible to the other CPUs. If it were an unlocked add, then the flag read may just be forwarded from the store buffer before it was visible to the other CPUs, which could result in a deadlock. Unfortunately this means we need to do a locked instruction when unlocking with PV ticketlocks. However, if PV ticketlocks are not enabled, then the old non-locked "add" is the only unlocking code. Note: this code relies on gcc making sure that unlikely() code is out of line of the fastpath, which only happens when OPTIMIZE_SIZE=n. If it doesn't the generated code isn't too bad, but its definitely suboptimal. Thanks to Srivatsa Vaddagiri for providing a bugfix to the original version of this change, which has been folded in. Thanks to Stephan Diestelhorst for commenting on some code which relied on an inaccurate reading of the x86 memory ordering rules. Signed-off-by: NJeremy Fitzhardinge <jeremy@goop.org> Link: http://lkml.kernel.org/r/1376058122-8248-11-git-send-email-raghavendra.kt@linux.vnet.ibm.comSigned-off-by: NSrivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Stephan Diestelhorst <stephan.diestelhorst@amd.com> Signed-off-by: NRaghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Jeremy Fitzhardinge 提交于
Increment ticket head/tails by 2 rather than 1 to leave the LSB free to store a "is in slowpath state" bit. This halves the number of possible CPUs for a given ticket size, but this shouldn't matter in practice - kernels built for 32k+ CPU systems are probably specially built for the hardware rather than a generic distro kernel. Signed-off-by: NJeremy Fitzhardinge <jeremy@goop.org> Link: http://lkml.kernel.org/r/1376058122-8248-9-git-send-email-raghavendra.kt@linux.vnet.ibm.comReviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: NAttilio Rao <attilio.rao@citrix.com> Signed-off-by: NRaghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Jeremy Fitzhardinge 提交于
Although the lock_spinning calls in the spinlock code are on the uncommon path, their presence can cause the compiler to generate many more register save/restores in the function pre/postamble, which is in the fast path. To avoid this, convert it to using the pvops callee-save calling convention, which defers all the save/restores until the actual function is called, keeping the fastpath clean. Signed-off-by: NJeremy Fitzhardinge <jeremy@goop.org> Link: http://lkml.kernel.org/r/1376058122-8248-8-git-send-email-raghavendra.kt@linux.vnet.ibm.comReviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: NAttilio Rao <attilio.rao@citrix.com> Signed-off-by: NRaghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Jeremy Fitzhardinge 提交于
Now that the paravirtualization layer doesn't exist at the spinlock level any more, we can collapse the __ticket_ functions into the arch_ functions. Signed-off-by: NJeremy Fitzhardinge <jeremy@goop.org> Link: http://lkml.kernel.org/r/1376058122-8248-4-git-send-email-raghavendra.kt@linux.vnet.ibm.comReviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: NAttilio Rao <attilio.rao@citrix.com> Signed-off-by: NRaghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Jeremy Fitzhardinge 提交于
Rather than outright replacing the entire spinlock implementation in order to paravirtualize it, keep the ticket lock implementation but add a couple of pvops hooks on the slow patch (long spin on lock, unlocking a contended lock). Ticket locks have a number of nice properties, but they also have some surprising behaviours in virtual environments. They enforce a strict FIFO ordering on cpus trying to take a lock; however, if the hypervisor scheduler does not schedule the cpus in the correct order, the system can waste a huge amount of time spinning until the next cpu can take the lock. (See Thomas Friebel's talk "Prevent Guests from Spinning Around" http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.) To address this, we add two hooks: - __ticket_spin_lock which is called after the cpu has been spinning on the lock for a significant number of iterations but has failed to take the lock (presumably because the cpu holding the lock has been descheduled). The lock_spinning pvop is expected to block the cpu until it has been kicked by the current lock holder. - __ticket_spin_unlock, which on releasing a contended lock (there are more cpus with tail tickets), it looks to see if the next cpu is blocked and wakes it if so. When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub functions causes all the extra code to go away. Results: ======= setup: 32 core machine with 32 vcpu KVM guest (HT off) with 8GB RAM base = 3.11-rc patched = base + pvspinlock V12 +-----------------+----------------+--------+ dbench (Throughput in MB/sec. Higher is better) +-----------------+----------------+--------+ | base (stdev %)|patched(stdev%) | %gain | +-----------------+----------------+--------+ | 15035.3 (0.3) |15150.0 (0.6) | 0.8 | | 1470.0 (2.2) | 1713.7 (1.9) | 16.6 | | 848.6 (4.3) | 967.8 (4.3) | 14.0 | | 652.9 (3.5) | 685.3 (3.7) | 5.0 | +-----------------+----------------+--------+ pvspinlock shows benefits for overcommit ratio > 1 for PLE enabled cases, and undercommits results are flat Signed-off-by: NJeremy Fitzhardinge <jeremy@goop.org> Link: http://lkml.kernel.org/r/1376058122-8248-2-git-send-email-raghavendra.kt@linux.vnet.ibm.comReviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: NAttilio Rao <attilio.rao@citrix.com> [ Raghavendra: Changed SPIN_THRESHOLD, fixed redefinition of arch_spinlock_t] Signed-off-by: NRaghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Acked-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 08 8月, 2013 1 次提交
-
-
由 Kees Cook 提交于
Moves the relocation handling into C, after decompression. This requires that the decompressed size is passed to the decompression routine as well so that relocations can be found. Only kernels that need relocation support will use the code (currently just x86_32), but this is laying the ground work for 64-bit using it in support of KASLR. Based on work by Neill Clift and Michael Davidson. Signed-off-by: NKees Cook <keescook@chromium.org> Link: http://lkml.kernel.org/r/20130708161517.GA4832@www.outflux.netAcked-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 07 8月, 2013 10 次提交
-
-
由 Nadav Har'El 提交于
If we let L1 use EPT, we should probably also support the INVEPT instruction. In our current nested EPT implementation, when L1 changes its EPT table for L2 (i.e., EPT12), L0 modifies the shadow EPT table (EPT02), and in the course of this modification already calls INVEPT. But if last level of shadow page is unsync not all L1's changes to EPT12 are intercepted, which means roots need to be synced when L1 calls INVEPT. Global INVEPT should not be different since roots are synced by kvm_mmu_load() each time EPTP02 changes. Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NNadav Har'El <nyh@il.ibm.com> Signed-off-by: NJun Nakajima <jun.nakajima@intel.com> Signed-off-by: NXinhao Xu <xinhao.xu@intel.com> Signed-off-by: NYang Zhang <yang.z.zhang@Intel.com> Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Yang Zhang 提交于
Inject nEPT fault to L1 guest. This patch is original from Xinhao. Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: NJun Nakajima <jun.nakajima@intel.com> Signed-off-by: NXinhao Xu <xinhao.xu@intel.com> Signed-off-by: NYang Zhang <yang.z.zhang@Intel.com> Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Steven Rostedt 提交于
As specified by H. Peter Anvin, the best nops for x86 without knowing the running computer is: 32bit: 0x3e, 0x8d, 0x74, 0x26, 0x00 also known as GENERIC_NOP5_ATOMIC 64bit: 0x0f, 0x1f, 0x44, 0x00, 0x00 also known as P6_NOP5_ATOMIC Currently the default nop that is used by jump label is: 0xe9 0x00 0x00 0x00 0x00 Which is really a 5byte jump to the next position. It's better to use a real nop than a jmp. Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Jason Baron <jbaron@redhat.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Andi Kleen 提交于
Signed-off-by: NAndi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-17-git-send-email-andi@firstfloor.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Andi Kleen 提交于
They are implemented in assembler. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-14-git-send-email-andi@firstfloor.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Andi Kleen 提交于
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: NAndi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-13-git-send-email-andi@firstfloor.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Andi Kleen 提交于
Plus one function, load_gs_index(). Signed-off-by: NAndi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-10-git-send-email-andi@firstfloor.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Andi Kleen 提交于
- Make all the external assembler template symbols __visible - Move the templates inline assembler code into a top level assembler statement, not inside a function. This avoids it being optimized away or cloned. Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: NAndi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-8-git-send-email-andi@firstfloor.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Andi Kleen 提交于
FWIW I suspect sys_rt_sigreturn/sys_sigreturn should use standard SYSCALL wrappers. But I didn't do that change in this patch. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-7-git-send-email-andi@firstfloor.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Andi Kleen 提交于
This function is called from inline assembler, so has to be visible. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1375740170-7446-6-git-send-email-andi@firstfloor.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-