- 11 Oct, 2013 1 commit
-
-
Committed by Ingo Molnar
Fengguang Wu, Oleg Nesterov and Peter Zijlstra tracked down a kernel crash to a GCC bug: GCC miscompiles certain 'asm goto' constructs, as outlined here:

    http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670

Implement a workaround suggested by Jakub Jelinek.

Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Suggested-by: Jakub Jelinek <jakub@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 10 Oct, 2013 2 commits
-
-
Committed by Frediano Ziglio
Due to the way the kernel is initialized under Xen, it is possible that the ring1 selector used by the kernel for the boot CPU ends up being copied to userspace, leading to a segmentation fault in userspace.

The Xen code in the kernel initializes non-boot CPUs with correct selectors (ds and es set to __USER_DS), but the boot CPU keeps the ring1 selector (passed by Xen). On task context switch (switch_to) we assume that ds, es and cs already point to __USER_DS and __KERNEL_CS, so these selectors are not changed.

If the processor is an Intel one that supports the sysenter instruction, sysenter/sysexit is used, so ds and es are not restored when switching back from kernel to userspace. If the selectors point to ring1 instead of __USER_DS, the userspace code will crash on its first memory access attempt (to be precise, Xen, on the emulated iret used to do sysexit, will detect this and set ds and es to zero, which leads to a GPF anyway).

Now if a userspace process calls the kernel using sysenter and gets rescheduled (for me it happens on a specific init calling wait4), it can happen that the ring1 selector is loaded into ds and es.

This is quite hard to detect because after a while these selectors are fixed (__USER_DS seems sticky).

Bisecting the code, commit 7076aada appears to be the first one that has this issue.

Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
-
Committed by Gleb Natapov
72f85795 broke shadow on EPT. This patch reverts it and fixes PAE on nEPT (which the reverted commit fixed) in another way.

Shadow on EPT is now broken because while L1 builds the shadow page table for L2 (which is PAE while L2 is in real mode) it never loads L2's GUEST_PDPTR[0-3]. They do not need to be loaded because without nested virtualization the HW does this during guest entry if EPT is disabled, but in our case L0 emulates L2's vmentry while EPT is enabled, so we cannot rely on vmcs12->guest_pdptr[0-3] to contain up-to-date values and need to re-read the PDPTEs from L2 memory. This is what kvm_set_cr3() is doing, but by clearing cache bits during L2 vmentry we drop the values that kvm_set_cr3() read from memory.

So why does the same code not work for PAE on nEPT? kvm_set_cr3() reads pdptes into vcpu->arch.walk_mmu->pdptrs[]. walk_mmu points to vcpu->arch.nested_mmu while the nested guest is running, but ept_load_pdptrs() uses vcpu->arch.mmu, which contains incorrect values. Fix that by using walk_mmu in ept_(load|save)_pdptrs.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Tested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 06 Oct, 2013 1 commit
-
-
Committed by Ville Syrjälä
Dell Latitude E5410 needs reboot=pci to actually reboot.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://lkml.kernel.org/r/1380888964-14517-1-git-send-email-ville.syrjala@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 05 Oct, 2013 2 commits
-
-
Committed by Bjorn Helgaas
This reverts commit 07f9b61c.

07f9b61c was intended to be a cleanup that didn't change anything, but in fact, for systems without _CBA (which is almost everything), it broke extended config space for domain 0 and all config space for other domains.

Reference: http://lkml.kernel.org/r/20131004011806.GE20450@dangermouse.emea.sgi.com
Reported-by: Hedi Berriche <hedi@sgi.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-
Committed by Thomas Petazzoni
Commit ebd97be6 ('PCI: remove ARCH_SUPPORTS_MSI kconfig option') removed the ARCH_SUPPORTS_MSI option which architectures could select to indicate that they support MSI. Now, all architectures are supposed to build fine when MSI support is enabled: instead of having the architecture tell *when* MSI support can be used, it's up to the architecture code to ensure that MSI support can be enabled.

On x86, commit ebd97be6 removed the following line:

    select ARCH_SUPPORTS_MSI if (X86_LOCAL_APIC && X86_IO_APIC)

which meant that MSI support was only available when the local APIC and I/O APIC were enabled. While this is always true on SMP or x86-64, it is not necessarily the case on i386 !SMP.

The below patch makes sure that the local APIC and I/O APIC support is always enabled when MSI support is enabled. To do so, it:

 * Ensures the X86_UP_APIC option is not visible when PCI_MSI is enabled. This is the option that allows, on UP machines, enabling or disabling APIC support. It is already not visible on SMP systems, or x86-64 systems, for example. We're simply also making it invisible on i386 MSI systems.

 * Ensures that the X86_LOCAL_APIC and X86_IO_APIC options are 'y' when PCI_MSI is enabled.

Notice that this change requires a change in drivers/iommu/Kconfig to avoid a recursive Kconfig dependency. The AMD_IOMMU option selects PCI_MSI, but was depending on X86_IO_APIC. This dependency is no longer needed: as soon as PCI_MSI is selected, the presence of X86_IO_APIC is guaranteed. Moreover, AMD_IOMMU already depended on X86_64, which already guaranteed that X86_IO_APIC was enabled, so this dependency was redundant anyway.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Link: http://lkml.kernel.org/r/1380794354-9079-1-git-send-email-thomas.petazzoni@free-electrons.com
Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-
- 04 Oct, 2013 1 commit
-
-
Committed by Peter Zijlstra
Currently the cap_user_time_zero capability has different tests than cap_user_time, even though they expose the exact same data.

Switch from CONSTANT && NONSTOP to sched_clock_stable to also deal with multi-cabinet machines, and drop the tsc_disabled() check; none of this will work sanely without TSC anyway.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-nmgn0j0muo1r4c94vlfh23xy@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 03 Oct, 2013 1 commit
-
-
Committed by David Herrmann
IORESOURCE_BUSY is used to mark temporary driver mem-resources instead of global regions. This suppresses warnings if regions overlap with a region marked as BUSY.

This was always the case for VESA/VGA/EFI framebuffer regions, so do the same for simplefb regions. The reason we do this is to allow device handover to real GPU drivers like i915/radeon/nouveau, which get the same regions via PCI BARs.

Maybe at some point we will be able to unregister platform devices properly during the handover. In this case the simplefb region would get removed before the new region is created. However, this is currently not the case and would require rather huge changes in remove_conflicting_framebuffers(). Add the BUSY marker now and try to eventually rewrite the handover for a later release.

Also see kernel/resource.c for more information:

    /*
     * if a resource is "BUSY", it's not a hardware resource
     * but a driver mapping of such a resource; we don't want
     * to warn for those; some drivers legitimately map only
     * partial hardware resources. (example: vesafb)
     */

This suppresses warnings like:

    ------------[ cut here ]------------
    WARNING: CPU: 2 PID: 199 at arch/x86/mm/ioremap.c:171 __ioremap_caller+0x2e3/0x390()
    Info: mapping multiple BARs. Your kernel is fine.
    Call Trace:
      dump_stack+0x54/0x8d
      warn_slowpath_common+0x7d/0xa0
      warn_slowpath_fmt+0x4c/0x50
      iomem_map_sanity_check+0xac/0xe0
      __ioremap_caller+0x2e3/0x390
      ioremap_wc+0x32/0x40
      i915_driver_load+0x670/0xf50 [i915]
      ...

Reported-by: Tom Gundersen <teg@jklm.no>
Tested-by: Tom Gundersen <teg@jklm.no>
Tested-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Link: http://lkml.kernel.org/r/1380724864-1757-1-git-send-email-dh.herrmann@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 02 Oct, 2013 1 commit
-
-
Committed by Tom Gundersen
On my MacBook Air lfb_size is 4M, which makes the bitshift overflow (to 256GB, larger than 32 bits), meaning we fall back to efifb unnecessarily.

Cast to u64 to avoid the overflow.

Signed-off-by: Tom Gundersen <teg@jklm.no>
Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Stephen Warren <swarren@nvidia.com>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Link: http://lkml.kernel.org/r/1380644320-1026-1-git-send-email-teg@jklm.no
Signed-off-by: Ingo Molnar <mingo@kernel.org>
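The overflow described in this message can be reproduced outside the kernel. The following is a minimal sketch, not the kernel code: the function names are hypothetical, and only the 4M value and the shift by 16 are taken from the message. It contrasts a shift carried out in 32 bits with one that widens the operand first.

```c
#include <stdint.h>

/* Hypothetical helper, buggy variant: the shift is evaluated in 32 bits,
 * so every bit shifted past bit 31 is lost before the comparison. */
int fits_32bit_shift(uint64_t len, uint32_t lfb_size)
{
	return len <= (uint32_t)(lfb_size << 16);
}

/* Hypothetical helper, fixed variant: widen to 64 bits, then shift. */
int fits_64bit_shift(uint64_t len, uint32_t lfb_size)
{
	return len <= ((uint64_t)lfb_size << 16);
}
```

With lfb_size equal to 4M, the 32-bit shift wraps to 0, so any non-zero framebuffer length appears not to fit; the widened shift yields 2^38 (256GB) and the check passes.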
-
- 28 Sep, 2013 1 commit
-
-
Committed by Ingo Molnar
Ran into this cryptic PMU bootup log recently:

    [ 0.124047] Performance Events:
    [ 0.125000] smpboot: ...

Turns out we print this if no PMU is detected. Fall back to the right condition so that the following is printed:

    [ 0.122381] Performance Events: no PMU driver, software events only.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/n/tip-u2fwaUffakjp0qkpRfqljgsn@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 27 Sep, 2013 1 commit
-
-
Committed by Suravee Suthikulpanit
On AMD family 14h, applying a microcode patch on a core (core0) would also affect the other core (core1) in the same compute unit. The driver would skip applying the patch on core1, but it still needs to update kernel structures to reflect the proper patch level.

The current logic is not updating the struct ucode_cpu_info.cpu_sig.rev of the skipped core. This causes /sys/devices/system/cpu/cpu1/microcode/version to report an incorrect patch level, as shown below:

    $ grep . cpu?/microcode/version
    cpu0/microcode/version:0x600063d
    cpu1/microcode/version:0x6000626
    cpu2/microcode/version:0x600063d
    cpu3/microcode/version:0x6000626
    cpu4/microcode/version:0x600063d

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: <bp@alien8.de>
Cc: <jacob.w.shin@gmail.com>
Cc: <herrmann.der.user@googlemail.com>
Link: http://lkml.kernel.org/r/1285806432-1995-1-git-send-email-suravee.suthikulpanit@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 25 Sep, 2013 4 commits
-
-
Committed by David Vrabel
On hosts with more than 168 GB of memory, a 32-bit guest may attempt to grant map an MFN that it cannot look up in its mapping of the m2p table. There is an m2p lookup as part of m2p_add_override() and m2p_remove_override(). The lookup falls off the end of the mapped portion of the m2p and (because the mapping is at the highest virtual address) wraps around, and the lookup causes a fault on what appears to be a user space address.

do_page_fault() (thinking it's a fault on a userspace address) tries to lock mm->mmap_sem. If the gntdev device is used for the grant map, m2p_add_override() is called from gnttab_mmap() with mm->mmap_sem already locked. do_page_fault() then deadlocks.

The deadlock would most commonly occur when a 64-bit guest is started and xenconsoled attempts to grant map its console ring.

Introduce mfn_to_pfn_no_overrides(), which checks that the MFN is within the mapped portion of the m2p table before accessing the table, and use this in m2p_add_override(), m2p_remove_override(), and mfn_to_pfn() (which already had the correct range check).

All faults caused by accessing the non-existent parts of the m2p are thus within the kernel address space and exception_fixup() is called without trying to lock mm->mmap_sem.

This means that for MFNs that are outside the mapped range of the m2p, mfn_to_pfn() will always look in the m2p overrides. This is correct because it must be a foreign MFN (and the PFN in the m2p in this case is only relevant for the other domain).

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@citrix.com>
Cc: Jan Beulich <JBeulich@suse.com>
--
v3: check for auto_translated_physmap in mfn_to_pfn_no_overrides()
v2: in mfn_to_pfn() look in m2p_overrides if the MFN is out of range as it's probably foreign.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
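The range check this message describes can be illustrated with a small stand-alone sketch. This is not the Xen implementation: the function name, the sentinel, and the plain array standing in for the mapped part of the m2p table are all assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical sentinel returned when the index is outside the table. */
#define INVALID_PFN (~0UL)

/* Look up mfn in a table of m2p_entries elements, refusing any index
 * that lies beyond the mapped portion instead of reading past the end. */
unsigned long lookup_checked(const unsigned long *m2p, size_t m2p_entries,
			     unsigned long mfn)
{
	if (mfn >= m2p_entries)
		return INVALID_PFN;	/* caller falls back to another path */
	return m2p[mfn];
}
```

The point of checking before the access is that an out-of-range index never produces a memory reference at all, so no wrapped-around address can be touched.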
-
Committed by Gleb Natapov
Bit 12 is undefined in any of the following cases:

 - If the "NMI exiting" VM-execution control is 1 and the "virtual NMIs" VM-execution control is 0.
 - If the VM exit sets the valid bit in the IDT-vectoring information field.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
[Add parentheses around & within && - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
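The two cases listed in this message translate directly into a guard that must pass before bit 12 is consulted. The sketch below assumes hypothetical names throughout (the macro, both functions, and the boolean parameters standing in for the VM-execution controls and the IDT-vectoring valid bit); it is not the KVM code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for bit 12 of the value being inspected. */
#define EXIT_QUAL_BIT12 (1u << 12)

/* Returns false in either of the two cases where bit 12 is undefined. */
bool bit12_is_defined(bool nmi_exiting, bool virtual_nmis,
		      bool idt_vectoring_valid)
{
	if (nmi_exiting && !virtual_nmis)
		return false;
	if (idt_vectoring_valid)
		return false;
	return true;
}

/* Only trust bit 12 when it is defined. Note the parentheses around the
 * bitwise & inside the logical &&. */
bool bit12_set_and_valid(uint64_t exit_qualification, bool nmi_exiting,
			 bool virtual_nmis, bool idt_vectoring_valid)
{
	return bit12_is_defined(nmi_exiting, virtual_nmis, idt_vectoring_valid) &&
	       (exit_qualification & EXIT_QUAL_BIT12);
}
```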
-
Committed by Dave Jones
This seems to have been copied from the Optiplex 990 entry above, but someone forgot to change the ident text.

Signed-off-by: Dave Jones <davej@fedoraproject.org>
Link: http://lkml.kernel.org/r/20130925001344.GA13554@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Konrad Rzeszutek Wilk
xen_init_spinlocks() currently calls static_key_slow_inc() before jump_label_init() is invoked. When CONFIG_JUMP_LABEL is set (which usually is the case) the effect of this static_key_slow_inc() is deferred until after jump_label_init(). This is different from when CONFIG_JUMP_LABEL is not set, in which case the key is set immediately. Thus, depending on the value of the config option, we may observe different behavior.

In addition, when we come to __jump_label_transform() from jump_label_init(), the key (paravirt_ticketlocks_enabled) is already enabled. On processors where ideal_nop is not the same as default_nop this will cause a BUG(), since it is expected that before a key is enabled the latter is replaced by the former during initialization.

To address this problem we need to move static_key_slow_inc(&paravirt_ticketlocks_enabled) so that it is called after jump_label_init(). We also need to make sure that this is done before other CPUs start to boot. early_initcall appears to be a good place to do so. (Note that we cannot move the whole of xen_init_spinlocks() there since pv_lock_ops needs to be set before alternative_instructions() runs.)

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v2: Added extra comments in the code]
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
-
- 23 Sep, 2013 2 commits
-
-
Committed by Masoud Sharbiani
Dell PowerEdge C6100 machines fail to completely reboot about 20% of the time.

Signed-off-by: Masoud Sharbiani <msharbiani@twitter.com>
Signed-off-by: Vinson Lee <vlee@twitter.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1379717947-18042-1-git-send-email-vlee@freedesktop.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Yan, Zheng
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Cc: a.p.zijlstra@chello.nl
Cc: eranian@google.com
Cc: ak@linux.intel.com
Link: http://lkml.kernel.org/r/1379837953-17755-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 20 Sep, 2013 2 commits
-
-
Committed by Peter Zijlstra
Solve the problems around the broken definition of the perf_event_mmap_page::cap_usr_time and cap_usr_rdpmc fields, which used to overlap, partially fixed by:

    860f085b ("perf: Fix broken union in 'struct perf_event_mmap_page'")

The problem with the fix (merged in v3.12-rc1 and not yet released officially), noticed by Vince Weaver, is that the new behavior is not detectable by new user-space, and that due to the reuse of the field names it's easy to mis-compile a binary if old headers are used on a new kernel or new headers are used on an old kernel.

To solve all that, make this change explicit, detectable and self-contained, by iterating the ABI the following way:

 - Always clear bit 0, and rename it to usrpage->cap_bit0, to at least not confuse old user-space binaries. RDPMC will be marked as unavailable to old binaries but that's within the ABI, this is a capability bit.

 - Rename bit 1 to ->cap_bit0_is_deprecated and always set it to 1, so new libraries can reliably detect that bit 0 is deprecated and perma-zero without having to check the kernel version.

 - Use bits 2, 3, 4 for the newly defined, correct functionality:

       cap_user_rdpmc     : 1, /* The RDPMC instruction can be used to read counts */
       cap_user_time      : 1, /* The time_* fields are used */
       cap_user_time_zero : 1, /* The time_zero field is used */

 - Rename all the bitfield names in perf_event.h to be different from the old names, to make sure it's not possible to mis-compile it accidentally with old assumptions.

The 'size' field can then be used in the future to add new fields and it will act as a natural ABI version indicator as well.

Also adjust tools/perf/ userspace for the new definitions, noticed by Adrian Hunter.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Also-Fixed-by: Adrian Hunter <adrian.hunter@intel.com>
Link: http://lkml.kernel.org/n/tip-zr03yxjrpXesOzzupszqglbv@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
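How a new library might use these bits can be sketched as follows. The struct below is a simplified stand-in that only borrows the field names from the message; it is not the real perf_event_mmap_page layout, and the enum and function names are hypothetical.

```c
/* Simplified stand-in for the capability bits named in the message.
 * This is NOT the real header layout; it only mirrors the field names. */
struct caps_sketch {
	unsigned int cap_bit0               : 1; /* always 0 on new kernels */
	unsigned int cap_bit0_is_deprecated : 1; /* always 1 on new kernels */
	unsigned int cap_user_rdpmc         : 1;
	unsigned int cap_user_time          : 1;
	unsigned int cap_user_time_zero     : 1;
};

enum rdpmc_state {
	RDPMC_UNKNOWN_OLD_ABI,	/* marker bit absent: bit 0 is ambiguous */
	RDPMC_AVAILABLE,
	RDPMC_UNAVAILABLE
};

/* Decide whether RDPMC may be used without checking the kernel version. */
enum rdpmc_state probe_rdpmc(const struct caps_sketch *caps)
{
	if (!caps->cap_bit0_is_deprecated)
		return RDPMC_UNKNOWN_OLD_ABI;
	return caps->cap_user_rdpmc ? RDPMC_AVAILABLE : RDPMC_UNAVAILABLE;
}
```

The marker bit is what makes the new layout self-describing: a reader that sees it set can trust bits 2 to 4 and ignore bit 0 entirely.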
-
Committed by Yan, Zheng
uncore_validate_group() can't call smp_processor_id() because it is in preemptible context. Pass NUMA_NO_NODE to the allocator instead.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1379400493-11505-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 18 Sep, 2013 2 commits
-
-
Committed by Josh Boyer
Add patch to fix 32bit EFI service mapping (rhbz 726701)

Multiple people are reporting hitting the following WARNING on i386:

    WARNING: at arch/x86/mm/ioremap.c:102 __ioremap_caller+0x3d3/0x440()
    Modules linked in:
    Pid: 0, comm: swapper Not tainted 3.9.0-rc7+ #95
    Call Trace:
     [<c102b6af>] warn_slowpath_common+0x5f/0x80
     [<c1023fb3>] ? __ioremap_caller+0x3d3/0x440
     [<c1023fb3>] ? __ioremap_caller+0x3d3/0x440
     [<c102b6ed>] warn_slowpath_null+0x1d/0x20
     [<c1023fb3>] __ioremap_caller+0x3d3/0x440
     [<c106007b>] ? get_usage_chars+0xfb/0x110
     [<c102d937>] ? vprintk_emit+0x147/0x480
     [<c1418593>] ? efi_enter_virtual_mode+0x1e4/0x3de
     [<c102406a>] ioremap_cache+0x1a/0x20
     [<c1418593>] ? efi_enter_virtual_mode+0x1e4/0x3de
     [<c1418593>] efi_enter_virtual_mode+0x1e4/0x3de
     [<c1407984>] start_kernel+0x286/0x2f4
     [<c1407535>] ? repair_env_string+0x51/0x51
     [<c1407362>] i386_start_kernel+0x12c/0x12f

Due to the workaround described in commit 916f676f ("x86, efi: Retain boot service code until after switching to virtual mode") EFI Boot Service regions are mapped for a period during boot. Unfortunately, with the limited size of the i386 direct kernel map it's possible that some of the Boot Service regions will not be directly accessible, which causes them to be ioremap()'d, triggering the above warning as the regions are marked as E820_RAM in the e820 memmap.

There are currently only two situations where we need to map EFI Boot Service regions:

  1. To work around the firmware bug described in 916f676f
  2. To access the ACPI BGRT image

but since we haven't seen an i386 implementation that requires either, this simple fix should suffice for now.

[ Added to changelog - Matt ]

Reported-by: Bryan O'Donoghue <bryan.odonoghue.lkml@nexus-software.ie>
Acked-by: Tom Zanussi <tom.zanussi@intel.com>
Acked-by: Darren Hart <dvhart@linux.intel.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
-
Committed by Gleb Natapov
Set the "blocked by NMI" flag if an EPT violation happens during IRET from NMI; otherwise the NMI can be called recursively, causing stack corruption.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
-
- 17 Sep, 2013 3 commits
-
-
Committed by Gleb Natapov
After nested vmentry, a stale cache can be used to reload the L2 PDPTR pointers, which will cause the L2 guest to fail. Fix it by invalidating the cache on nested vmentry emulation.

https://bugzilla.kernel.org/show_bug.cgi?id=60830

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini
Page tables in a read-only memory slot will currently cause a triple fault because the page walker uses gfn_to_hva and it fails on such a slot.

OVMF uses such a page table; however, real hardware seems to be fine with that as long as the accessed/dirty bits are set. Save whether the slot is readonly, and later check it when updating the accessed and dirty bits.

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Bruce Rogers
Opcode CA

This gets used by a DOS based NetWare guest.

Signed-off-by: Bruce Rogers <brogers@suse.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 14 Sep, 2013 2 commits
-
-
Committed by Mathias Nyman
x86 chips with LPSS (low power subsystem) such as Lynxpoint and Baytrail have SoC-like peripheral support and controllable pins.

At the moment, Baytrail needs the pinctrl-baytrail driver to let peripherals control their gpio resources, but more pin control functions such as pin muxing and grouping can be added later.

Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: http://lkml.kernel.org/r/1379080949-21734-1-git-send-email-mathias.nyman@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Stephane Eranian
On Intel SNB (SNB, SNB-EP), the event MEM_LOAD_UOPS_MISS_RETIRED supports PEBS. It was missing from the SNB PEBS event constraint table, thereby preventing any measurement with PEBS for it.

This patch adds the event to the PEBS table for SNB.

WARNING: it should be noted that this event, like a few others, is subject to the erratum BT241 for Xeon E5 (SNB-EP). As such, the event may undercount when used with PEBS unless the workaround is implemented. But without this patch and just the workaround, the kernel would not allow precise sampling on this event.

BT241 is documented in:

    http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e5-family-spec-update.pdf

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: ak@linux.intel.com
Cc: zheng.z.yan@intel.com
Link: http://lkml.kernel.org/r/20130913201646.GA23981@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 13 Sep, 2013 4 commits
-
-
Committed by Martin Schwidefsky
After the last architecture switched to generic hard irqs the config options HAVE_GENERIC_HARDIRQS & GENERIC_HARDIRQS and the related code for !CONFIG_GENERIC_HARDIRQS can be removed.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Committed by Johannes Weiner
The x86 fault handler bails in the middle of error handling when the task has a fatal signal pending. For a subsequent patch this is a problem in OOM situations because it relies on pagefault_out_of_memory() being called even when the task has been killed, to perform proper per-task OOM state unwinding.

Shortcutting the fault like this is a rather minor optimization that saves a few instructions in rare cases. Just remove it for user-triggered faults.

Use the opportunity to split the fault retry handling from actual fault errors and add locking documentation that reads surprisingly similar to ARM's.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Johannes Weiner
Unlike global OOM handling, memory cgroup code will invoke the OOM killer in any OOM situation because it has no way of telling faults occurring in kernel context (which could be handled more gracefully) from user-triggered faults.

Pass a flag that identifies faults originating in user space from the architecture-specific fault handlers to generic code so that memcg OOM handling can be improved.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Peter Zijlstra
Fengguang Wu reported:

> sparse warnings: (new ones prefixed by >>)
>
> >> arch/x86/kernel/cpu/perf_event_intel.c:901:9: sparse: constant 0x768005ffff is so big it is long
> >> arch/x86/kernel/cpu/perf_event_intel.c:902:9: sparse: constant 0x768005ffff is so big it is long
>
> vim +901 arch/x86/kernel/cpu/perf_event_intel.c
>
>    895  },
>    896  };
>    897
>    898  static struct extra_reg intel_slm_extra_regs[] __read_mostly =
>    899  {
>    900          /* must define OFFCORE_RSP_X first, see intel_fixup_er() */
>  > 901          INTEL_UEVENT_EXTRA_REG(0x01b7, MSR_OFFCORE_RSP_0, 0x768005ffff, RSP_0),
>  > 902          INTEL_UEVENT_EXTRA_REG(0x02b7, MSR_OFFCORE_RSP_1, 0x768005ffff, RSP_1),
>    903          EVENT_EXTRA_END
>    904  };
>    905

Extend those constants to 64 bits.

Reported-by: fengguang.wu@intel.com
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130909112636.GQ31370@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
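One way to read "extend those constants to 64 bits" is to give the literal an explicit 64-bit suffix so its width no longer depends on the compiler's implicit typing. The sketch below assumes that reading; the macro and function names are hypothetical, and only the value 0x768005ffff comes from the report.

```c
#include <stdint.h>

/* Hypothetical macro name; the ULL suffix makes the constant explicitly
 * 64 bits wide on both 32-bit and 64-bit builds. */
#define RSP_VALID_MASK_SKETCH 0x768005ffffULL

/* Keep only the bits covered by the mask. */
uint64_t apply_valid_mask(uint64_t requested)
{
	return requested & RSP_VALID_MASK_SKETCH;
}

/* Shows what would be lost if the mask were ever squeezed into 32 bits:
 * the bits above bit 31 (0x76 here) disappear. */
uint32_t truncated_mask(void)
{
	return (uint32_t)RSP_VALID_MASK_SKETCH;
}
```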
-
- 12 Sep, 2013 6 commits
-
-
Committed by Stephane Eranian
There was a bug in the handling of SNB-EP/IVB-EP uncore PCI fixed counters, e.g., IMC. It would cause erratic values to be returned for the IMC clockticks event. This was due to a bogus hwc->config value which was then written to PCI config space.

The erratic values can be seen via:

    $ perf stat -a -C 0 -e uncore_imc_0/clockticks/ -I 1000 sleep 10

The fixed counter has most fields marked as reserved with hw reset values of 0. Yet the kernel was defaulting to hwc->config = ~0 and that was causing the issues.

This patch sets the hwc->config value for fixed uncore events to 0. Now, the values of IMC clockticks are correct.

Signed-off-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: peterz@infradead.org
Cc: zheng.z.yan@intel.com
Link: http://lkml.kernel.org/r/20130909195350.GA17643@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Stephane Eranian
The IvyBridge event CYCLE_ACTIVITY:CYCLES_LDM_PENDING can only be measured on counters 0-3 when HT is off. When HT is on, you only have counters 0-3.

If you program it on the eight counters for 1s on a 3GHz IVB laptop running a noploop, you see:

        2 747 527 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
        2 747 527 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
        2 747 527 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
        2 747 527 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
    3 280 563 608 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
    3 280 563 608 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
    3 280 563 608 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
    3 280 563 608 CYCLE_ACTIVITY:CYCLES_LDM_PENDING

Clearly the last 4 values are bogus.

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: ak@linux.intel.com
Cc: zheng.z.yan@intel.com
Cc: dhsharp@google.com
Link: http://lkml.kernel.org/r/20130911152222.GA28761@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
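Restricting an event to counters 0-3 is commonly expressed as a bitmask with one bit per counter. The sketch below illustrates that idea only; the macro and function names are hypothetical and this is not how the kernel's constraint tables are written.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mask: bits 0-3 set, meaning counters 0, 1, 2 and 3. */
#define LDM_PENDING_COUNTER_MASK 0xfu

/* True if counter number idx is permitted by the given mask. */
bool counter_allowed(uint64_t mask, unsigned int idx)
{
	return idx < 64 && ((mask >> idx) & 1u);
}
```

With such a mask, scheduling the event on counters 4 to 7 is rejected up front instead of producing the bogus readings shown in the message.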
-
Committed by Cyrill Gorcunov
The _PAGE_SOFT_DIRTY bit should never be set on a present pte, so add a VM_BUG_ON to catch any potential future abuse.

Also add a comment on the _PAGE_SWP_SOFT_DIRTY definition explaining the scope of its usage.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Naoya Horiguchi
Currently hugepage migration works well only for pmd-based hugepages (mainly due to lack of testing), so we had better not enable migration of other levels of hugepages until we are ready for it.

Some users of hugepage migration (mbind, move_pages, and migrate_pages) do a page table walk and check pud/pmd_huge() there, so they are safe. But the other users (softoffline and memory hotremove) don't do this, so without this patch they can try to migrate unexpected types of hugepages.

To prevent this, we introduce hugepage_migration_support() as an architecture-dependent check of whether hugepages are implemented on a pmd basis or not. And on some architectures multiple sizes of hugepages are available, so hugepage_migration_support() also checks hugepage size.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Dave Hansen
The previous patch doing vmstats for TLB flushes ("mm: vmstats: tlb flush counters") effectively missed UP since arch/x86/mm/tlb.c is only compiled for SMP.

UP systems do not do remote TLB flushes, so compile those counters out on UP.

arch/x86/kernel/cpu/mtrr/generic.c calls __flush_tlb() directly. This is probably an optimization since both the mtrr code and __flush_tlb() write cr4. It would probably be safe to make that a flush_tlb_all() (and then get these statistics), but the mtrr code is ancient and I'm hesitant to touch it other than to just stick in the counters.

[akpm@linux-foundation.org: tweak comments]
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Dave Hansen
I was investigating some TLB flush scaling issues and realized that we do not have any good methods for figuring out how many TLB flushes we are doing.

It would be nice to be able to do these in generic code, but the arch-independent calls don't explicitly specify whether we actually need to do remote flushes or not. In the end, we really need to know if we actually _did_ global vs. local invalidations, so that leaves us with few options other than to muck with the counters from arch-specific code.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 11 Sep, 2013 1 commit
-
-
Committed by Dave Chinner
Convert the remaining couple of random shrinkers in the tree to the new API.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@openvz.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 10 Sep, 2013 3 commits
-
-
Committed by Borislav Petkov
b3af11af ("x86: get rid of pt_regs argument of iopl(2)") dropped PTREGSCALL which was also the last user of save_rest. Drop that now-unused function too.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: http://lkml.kernel.org/r/1378546750-19727-1-git-send-email-bp@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Konrad Rzeszutek Wilk
As we get compile warnings about .init.data being used by non-init functions.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
Committed by Konrad Rzeszutek Wilk
This reverts commit 70dd4998.

Now that the bugs have been resolved we can re-enable the PV ticketlock implementation under PVHVM Xen guests.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
-