提交 · 36704263f1f466f7eb6f237e97f4ec15473980fc · openeuler / raspberrypi-kernel

11 10月, 2013 1 次提交

compiler/gcc4: Add quirk for 'asm goto' miscompilation bug · 3f0116c3

由 Ingo Molnar 提交于 10月 10, 2013

Fengguang Wu, Oleg Nesterov and Peter Zijlstra tracked down
a kernel crash to a GCC bug: GCC miscompiles certain 'asm goto'
constructs, as outlined here:

  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670

Implement a workaround suggested by Jakub Jelinek.
Reported-and-tested-by: NFengguang Wu <fengguang.wu@intel.com>
Reported-by: NOleg Nesterov <oleg@redhat.com>
Reported-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Suggested-by: NJakub Jelinek <jakub@redhat.com>
Reviewed-by: NRichard Henderson <rth@twiddle.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@kernel.org>
Signed-off-by: NIngo Molnar <mingo@kernel.org>

3f0116c3

10 10月, 2013 2 次提交

xen: Fix possible user space selector corruption · 7cde9b27

由 Frediano Ziglio 提交于 10月 10, 2013

Due to the way kernel is initialized under Xen is possible that the
ring1 selector used by the kernel for the boot cpu end up to be copied
to userspace leading to segmentation fault in the userspace.

Xen code in the kernel initialize no-boot cpus with correct selectors (ds
and es set to __USER_DS) but the boot one keep the ring1 (passed by Xen).
On task context switch (switch_to) we assume that ds, es and cs already
point to __USER_DS and __KERNEL_CSso these selector are not changed.

If processor is an Intel that support sysenter instruction sysenter/sysexit
is used so ds and es are not restored switching back from kernel to
userspace. In the case the selectors point to a ring1 instead of __USER_DS
the userspace code will crash on first memory access attempt (to be
precise Xen on the emulated iret used to do sysexit will detect and set ds
and es to zero which lead to GPF anyway).

Now if an userspace process call kernel using sysenter and get rescheduled
(for me it happen on a specific init calling wait4) could happen that the
ring1 selector is set to ds and es.

This is quite hard to detect cause after a while these selectors are fixed
(__USER_DS seems sticky).

Bisecting the code commit 7076aada appears
to be the first one that have this issue.
Signed-off-by: NFrediano Ziglio <frediano.ziglio@citrix.com>
Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: NAndrew Cooper <andrew.cooper3@citrix.com>

7cde9b27

KVM: nVMX: fix shadow on EPT · d0d538b9

由 Gleb Natapov 提交于 10月 09, 2013

72f85795 broke shadow on EPT. This patch reverts it and fixes PAE
on nEPT (which reverted commit fixed) in other way.

Shadow on EPT is now broken because while L1 builds shadow page table
for L2 (which is PAE while L2 is in real mode) it never loads L2's
GUEST_PDPTR[0-3].  They do not need to be loaded because without nested
virtualization HW does this during guest entry if EPT is disabled,
but in our case L0 emulates L2's vmentry while EPT is enables, so we
cannot rely on vmcs12->guest_pdptr[0-3] to contain up-to-date values
and need to re-read PDPTEs from L2 memory. This is what kvm_set_cr3()
is doing, but by clearing cache bits during L2 vmentry we drop values
that kvm_set_cr3() read from memory.

So why the same code does not work for PAE on nEPT? kvm_set_cr3()
reads pdptes into vcpu->arch.walk_mmu->pdptrs[]. walk_mmu points to
vcpu->arch.nested_mmu while nested guest is running, but ept_load_pdptrs()
uses vcpu->arch.mmu which contain incorrect values. Fix that by using
walk_mmu in ept_(load|save)_pdptrs.
Signed-off-by: NGleb Natapov <gleb@redhat.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Tested-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d0d538b9

06 10月, 2013 1 次提交

x86/reboot: Add reboot quirk for Dell Latitude E5410 · 8412da75

由 Ville Syrjälä 提交于 10月 04, 2013

Dell Latitude E5410 needs reboot=pci to actually reboot.
Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
Link: http://lkml.kernel.org/r/1380888964-14517-1-git-send-email-ville.syrjala@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

8412da75

05 10月, 2013 2 次提交

Revert "x86/PCI: MMCONFIG: Check earlier for MMCONFIG region at address zero" · 67d470e0

由 Bjorn Helgaas 提交于 10月 04, 2013

This reverts commit 07f9b61c.

07f9b61c was intended to be a cleanup that didn't change anything, but in
fact, for systems without _CBA (which is almost everything), it broke
extended config space for domain 0 and all config space for other domains.

Reference: http://lkml.kernel.org/r/20131004011806.GE20450@dangermouse.emea.sgi.comReported-by: NHedi Berriche <hedi@sgi.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>

67d470e0

x86, build, pci: Fix PCI_MSI build on !SMP · 0dbc6078

由 Thomas Petazzoni 提交于 10月 03, 2013

Commit ebd97be6 ('PCI: remove ARCH_SUPPORTS_MSI kconfig option')
removed the ARCH_SUPPORTS_MSI option which architectures could select
to indicate that they support MSI. Now, all architectures are supposed
to build fine when MSI support is enabled: instead of having the
architecture tell *when* MSI support can be used, it's up to the
architecture code to ensure that MSI support can be enabled.

On x86, commit ebd97be6 removed the following line:

select ARCH_SUPPORTS_MSI if (X86_LOCAL_APIC && X86_IO_APIC)

Which meant that MSI support was only available when the local APIC
and I/O APIC were enabled. While this is always true on SMP or x86-64,
it is not necessarily the case on i386 !SMP.

The below patch makes sure that the local APIC and I/O APIC support is
always enabled when MSI support is enabled. To do so, it:

* Ensures the X86_UP_APIC option is not visible when PCI_MSI is
enabled. This is the option that allows, on UP machines, to enable
or not the APIC support. It is already not visible on SMP systems,
or x86-64 systems, for example. We're simply also making it
invisible on i386 MSI systems.

* Ensures that the X86_LOCAL_APIC and X86_IO_APIC options are 'y'
when PCI_MSI is enabled.

Notice that this change requires a change in drivers/iommu/Kconfig to
avoid a recursive Kconfig dependencey. The AMD_IOMMU option selects
PCI_MSI, but was depending on X86_IO_APIC. This dependency is no
longer needed: as soon as PCI_MSI is selected, the presence of
X86_IO_APIC is guaranteed. Moreover, the AMD_IOMMU already depended on
X86_64, which already guaranteed that X86_IO_APIC was enabled, so this
dependency was anyway redundant.
Signed-off-by: NThomas Petazzoni <thomas.petazzoni@free-electrons.com>
Link: http://lkml.kernel.org/r/1380794354-9079-1-git-send-email-thomas.petazzoni@free-electrons.comReported-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: NBjorn Helgaas <bhelgaas@google.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

0dbc6078

04 10月, 2013 1 次提交

perf/x86: Clean up cap_user_time* setting · d8b11a0c

由 Peter Zijlstra 提交于 10月 03, 2013

Currently the cap_user_time_zero capability has different tests than
cap_user_time; even though they expose the exact same data.

Switch from CONSTANT && NONSTOP to sched_clock_stable to also deal
with multi cabinet machines and drop the tsc_disabled() check.. non of
this will work sanely without tsc anyway.
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-nmgn0j0muo1r4c94vlfh23xy@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

d8b11a0c

03 10月, 2013 1 次提交

x86/simplefb: Mark framebuffer mem-resources as IORESOURCE_BUSY to avoid bootup warning · 29d274b8

由 David Herrmann 提交于 10月 02, 2013

IORESOURCE_BUSY is used to mark temporary driver mem-resources
instead of global regions. This suppresses warnings if regions
overlap with a region marked as BUSY.

This was always the case for VESA/VGA/EFI framebuffer regions so
do the same for simplefb regions. The reason we do this is to
allow device handover to real GPU drivers like
i915/radeon/nouveau which get the same regions via PCI BARs.

Maybe at some point we will be able to unregister platform
devices properly during the handover. In this case the simplefb
region would get removed before the new region is created.
However, this is currently not the case and would require rather
huge changes in remove_conflicting_framebuffers(). Add the BUSY
marker now and try to eventually rewrite the handover for a next release.

Also see kernel/resource.c for more information:

/*
* if a resource is "BUSY", it's not a hardware resource
* but a driver mapping of such a resource; we don't want
* to warn for those; some drivers legitimately map only
* partial hardware resources. (example: vesafb)
*/

This suppresses warnings like:

------------[ cut here ]------------
WARNING: CPU: 2 PID: 199 at arch/x86/mm/ioremap.c:171 __ioremap_caller+0x2e3/0x390()
Info: mapping multiple BARs. Your kernel is fine.
Call Trace:
dump_stack+0x54/0x8d
warn_slowpath_common+0x7d/0xa0
warn_slowpath_fmt+0x4c/0x50
iomem_map_sanity_check+0xac/0xe0
__ioremap_caller+0x2e3/0x390
ioremap_wc+0x32/0x40
i915_driver_load+0x670/0xf50 [i915]
...
Reported-by: NTom Gundersen <teg@jklm.no>
Tested-by: NTom Gundersen <teg@jklm.no>
Tested-by: NPavel Roskin <proski@gnu.org>
Signed-off-by: NDavid Herrmann <dh.herrmann@gmail.com>
Link: http://lkml.kernel.org/r/1380724864-1757-1-git-send-email-dh.herrmann@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

29d274b8

02 10月, 2013 1 次提交

x86/simplefb: Fix overflow causing bogus fall-back · e33a29a5

由 Tom Gundersen 提交于 10月 01, 2013

On my MacBook Air lfb_size is 4M, which makes the bitshit
overflow (to 256GB - larger than 32 bits), meaning we fall
back to efifb unnecessarily.

Cast to u64 to avoid the overflow.
Signed-off-by: NTom Gundersen <teg@jklm.no>
Reviewed-by: NDavid Herrmann <dh.herrmann@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Stephen Warren <swarren@nvidia.com>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Link: http://lkml.kernel.org/r/1380644320-1026-1-git-send-email-teg@jklm.noSigned-off-by: NIngo Molnar <mingo@kernel.org>

e33a29a5

28 9月, 2013 1 次提交

perf/x86: Fix PMU detection printout when no PMU is detected · 8a3da6c7

由 Ingo Molnar 提交于 9月 28, 2013

Ran into this cryptic PMU bootup log recently:

[    0.124047] Performance Events:
[    0.125000] smpboot: ...

Turns out we print this if no PMU is detected. Fall back to
the right condition so that the following is printed:

[    0.122381] Performance Events: no PMU driver, software events only.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/n/tip-u2fwaUffakjp0qkpRfqljgsn@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

8a3da6c7

27 9月, 2013 1 次提交

x86/microcode/AMD: Fix patch level reporting for family 15h · accd1e82

由 Suravee Suthikulpanit 提交于 9月 29, 2010

On AMD family 14h, applying microcode patch on the a core (core0)
would also affect the other core (core1) in the same compute
unit. The driver would skip applying the patch on core1, but it
still need to update kernel structures to reflect the proper
patch level.

The current logic is not updating the struct
ucode_cpu_info.cpu_sig.rev of the skipped core. This causes the
/sys/devices/system/cpu/cpu1/microcode/version to report
incorrect patch level as shown below:

  $ grep . cpu?/microcode/version
  cpu0/microcode/version:0x600063d
  cpu1/microcode/version:0x6000626
  cpu2/microcode/version:0x600063d
  cpu3/microcode/version:0x6000626
  cpu4/microcode/version:0x600063d
Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Acked-by: NBorislav Petkov <bp@suse.de>
Cc: <bp@alien8.de>
Cc: <jacob.w.shin@gmail.com>
Cc: <herrmann.der.user@googlemail.com>
Link: http://lkml.kernel.org/r/1285806432-1995-1-git-send-email-suravee.suthikulpanit@amd.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

accd1e82

25 9月, 2013 4 次提交

xen/p2m: check MFN is in range before using the m2p table · 0160676b

由 David Vrabel 提交于 9月 13, 2013

On hosts with more than 168 GB of memory, a 32-bit guest may attempt
to grant map an MFN that is error cannot lookup in its mapping of the
m2p table.  There is an m2p lookup as part of m2p_add_override() and
m2p_remove_override().  The lookup falls off the end of the mapped
portion of the m2p and (because the mapping is at the highest virtual
address) wraps around and the lookup causes a fault on what appears to
be a user space address.

do_page_fault() (thinking it's a fault to a userspace address), tries
to lock mm->mmap_sem.  If the gntdev device is used for the grant map,
m2p_add_override() is called from from gnttab_mmap() with mm->mmap_sem
already locked.  do_page_fault() then deadlocks.

The deadlock would most commonly occur when a 64-bit guest is started
and xenconsoled attempts to grant map its console ring.

Introduce mfn_to_pfn_no_overrides() which checks the MFN is within the
mapped portion of the m2p table before accessing the table and use
this in m2p_add_override(), m2p_remove_override(), and mfn_to_pfn()
(which already had the correct range check).

All faults caused by accessing the non-existant parts of the m2p are
thus within the kernel address space and exception_fixup() is called
without trying to lock mm->mmap_sem.

This means that for MFNs that are outside the mapped range of the m2p
then mfn_to_pfn() will always look in the m2p overrides.  This is
correct because it must be a foreign MFN (and the PFN in the m2p in
this case is only relevant for the other domain).
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@citrix.com>
Cc: Jan Beulich <JBeulich@suse.com>
--
v3: check for auto_translated_physmap in mfn_to_pfn_no_overrides()
v2: in mfn_to_pfn() look in m2p_overrides if the MFN is out of
    range as it's probably foreign.
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>

0160676b

KVM: VMX: do not check bit 12 of EPT violation exit qualification when undefined · bcd1c294

由 Gleb Natapov 提交于 9月 25, 2013

Bit 12 is undefined in any of the following cases:
- If the "NMI exiting" VM-execution control is 1 and the "virtual NMIs"
  VM-execution control is 0.
- If the VM exit sets the valid bit in the IDT-vectoring information field
Signed-off-by: NGleb Natapov <gleb@redhat.com>
[Add parentheses around & within && - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

bcd1c294

x86/reboot: Fix apparent cut-n-paste mistake in Dell reboot workaround · 7a20c2fa

由 Dave Jones 提交于 9月 24, 2013

This seems to have been copied from the Optiplex 990 entry
above, but somoene forgot to change the ident text.
Signed-off-by: NDave Jones <davej@fedoraproject.org>
Link: http://lkml.kernel.org/r/20130925001344.GA13554@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

7a20c2fa

xen: Do not enable spinlocks before jump_label_init() has executed · a945928e

由 Konrad Rzeszutek Wilk 提交于 9月 12, 2013

xen_init_spinlocks() currently calls static_key_slow_inc() before
jump_label_init() is invoked. When CONFIG_JUMP_LABEL is set (which usually is
the case) the effect of this static_key_slow_inc() is deferred until after
jump_label_init(). This is different from when CONFIG_JUMP_LABEL is not set, in
which case the key is set immediately. Thus, depending on the value of config
option, we may observe different behavior.

In addition, when we come to __jump_label_transform() from jump_label_init(),
the key (paravirt_ticketlocks_enabled) is already enabled. On processors where
ideal_nop is not the same as default_nop this will cause a BUG() since it is
expected that before a key is enabled the latter is replaced by the former
during initialization.

To address this problem we need to move
static_key_slow_inc(&paravirt_ticketlocks_enabled) so that it is called
after jump_label_init(). We also need to make sure that this is done before
other cpus start to boot. early_initcall appears to be  a good place to do so.
(Note that we cannot move whole xen_init_spinlocks() there since pv_lock_ops
need to be set before alternative_instructions() runs.)
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v2: Added extra comments in the code]
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>

a945928e

23 9月, 2013 2 次提交

x86/reboot: Add quirk to make Dell C6100 use reboot=pci automatically · 4f0acd31

由 Masoud Sharbiani 提交于 9月 20, 2013

Dell PowerEdge C6100 machines fail to completely reboot about 20% of the time.
Signed-off-by: NMasoud Sharbiani <msharbiani@twitter.com>
Signed-off-by: NVinson Lee <vlee@twitter.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1379717947-18042-1-git-send-email-vlee@freedesktop.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

4f0acd31

perf/x86/intel: Add model number for Avoton Silvermont · cf3b425d

由 Yan, Zheng 提交于 9月 22, 2013

Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
Cc: a.p.zijlstra@chello.nl
Cc: eranian@google.com
Cc: ak@linux.intel.com
Link: http://lkml.kernel.org/r/1379837953-17755-1-git-send-email-zheng.z.yan@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

cf3b425d

20 9月, 2013 2 次提交

perf: Fix capabilities bitfield compatibility in 'struct perf_event_mmap_page' · fa731587

由 Peter Zijlstra 提交于 9月 19, 2013

Solve the problems around the broken definition of perf_event_mmap_page::
cap_usr_time and cap_usr_rdpmc fields which used to overlap, partially
fixed by:

  860f085b ("perf: Fix broken union in 'struct perf_event_mmap_page'")

The problem with the fix (merged in v3.12-rc1 and not yet released
officially), noticed by Vince Weaver is that the new behavior is
not detectable by new user-space, and that due to the reuse of the
field names it's easy to mis-compile a binary if old headers are used
on a new kernel or new headers are used on an old kernel.

To solve all that make this change explicit, detectable and self-contained,
by iterating the ABI the following way:

 - Always clear bit 0, and rename it to usrpage->cap_bit0, to at least not
   confuse old user-space binaries. RDPMC will be marked as unavailable
   to old binaries but that's within the ABI, this is a capability bit.

 - Rename bit 1 to ->cap_bit0_is_deprecated and always set it to 1, so new
   libraries can reliably detect that bit 0 is deprecated and perma-zero
   without having to check the kernel version.

 - Use bits 2, 3, 4 for the newly defined, correct functionality:

	cap_user_rdpmc		: 1, /* The RDPMC instruction can be used to read counts */
	cap_user_time		: 1, /* The time_* fields are used */
	cap_user_time_zero	: 1, /* The time_zero field is used */

 - Rename all the bitfield names in perf_event.h to be different from the
   old names, to make sure it's not possible to mis-compile it
   accidentally with old assumptions.

The 'size' field can then be used in the future to add new fields and it
will act as a natural ABI version indicator as well.

Also adjust tools/perf/ userspace for the new definitions, noticed by
Adrian Hunter.
Reported-by: NVince Weaver <vincent.weaver@maine.edu>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Also-Fixed-by: NAdrian Hunter <adrian.hunter@intel.com>
Link: http://lkml.kernel.org/n/tip-zr03yxjrpXesOzzupszqglbv@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

fa731587

perf/x86/intel/uncore: Don't use smp_processor_id() in validate_group() · 73c4427c

由 Yan, Zheng 提交于 9月 17, 2013

uncore_validate_group() can't call smp_processor_id() because it is
in preemptible context. Pass NUMA_NO_NODE to the allocator instead.
Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1379400493-11505-1-git-send-email-zheng.z.yan@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

73c4427c

18 9月, 2013 2 次提交

x86, efi: Don't map Boot Services on i386 · 70087011

由 Josh Boyer 提交于 4月 18, 2013

Add patch to fix 32bit EFI service mapping (rhbz 726701)

Multiple people are reporting hitting the following WARNING on i386,

  WARNING: at arch/x86/mm/ioremap.c:102 __ioremap_caller+0x3d3/0x440()
  Modules linked in:
  Pid: 0, comm: swapper Not tainted 3.9.0-rc7+ #95
  Call Trace:
   [<c102b6af>] warn_slowpath_common+0x5f/0x80
   [<c1023fb3>] ? __ioremap_caller+0x3d3/0x440
   [<c1023fb3>] ? __ioremap_caller+0x3d3/0x440
   [<c102b6ed>] warn_slowpath_null+0x1d/0x20
   [<c1023fb3>] __ioremap_caller+0x3d3/0x440
   [<c106007b>] ? get_usage_chars+0xfb/0x110
   [<c102d937>] ? vprintk_emit+0x147/0x480
   [<c1418593>] ? efi_enter_virtual_mode+0x1e4/0x3de
   [<c102406a>] ioremap_cache+0x1a/0x20
   [<c1418593>] ? efi_enter_virtual_mode+0x1e4/0x3de
   [<c1418593>] efi_enter_virtual_mode+0x1e4/0x3de
   [<c1407984>] start_kernel+0x286/0x2f4
   [<c1407535>] ? repair_env_string+0x51/0x51
   [<c1407362>] i386_start_kernel+0x12c/0x12f

Due to the workaround described in commit 916f676f ("x86, efi: Retain
boot service code until after switching to virtual mode") EFI Boot
Service regions are mapped for a period during boot. Unfortunately, with
the limited size of the i386 direct kernel map it's possible that some
of the Boot Service regions will not be directly accessible, which
causes them to be ioremap()'d, triggering the above warning as the
regions are marked as E820_RAM in the e820 memmap.

There are currently only two situations where we need to map EFI Boot
Service regions,

  1. To workaround the firmware bug described in 916f676f
  2. To access the ACPI BGRT image

but since we haven't seen an i386 implementation that requires either,
this simple fix should suffice for now.

[ Added to changelog - Matt ]
Reported-by: NBryan O'Donoghue <bryan.odonoghue.lkml@nexus-software.ie>
Acked-by: NTom Zanussi <tom.zanussi@intel.com>
Acked-by: NDarren Hart <dvhart@linux.intel.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: NJosh Boyer <jwboyer@redhat.com>
Signed-off-by: NMatt Fleming <matt.fleming@intel.com>

70087011

KVM: VMX: set "blocked by NMI" flag if EPT violation happens during IRET from NMI · 0be9c7a8

由 Gleb Natapov 提交于 9月 15, 2013

Set "blocked by NMI" flag if EPT violation happens during IRET from NMI
otherwise NMI can be called recursively causing stack corruption.
Signed-off-by: NGleb Natapov <gleb@redhat.com>

0be9c7a8

17 9月, 2013 3 次提交

KVM: nEPT: reset PDPTR register cache on nested vmentry emulation · 72f85795

由 Gleb Natapov 提交于 9月 02, 2013

After nested vmentry stale cache can be used to reload L2 PDPTR pointers
which will cause L2 guest to fail. Fix it by invalidating cache on nested
vmentry emulation.

https://bugzilla.kernel.org/show_bug.cgi?id=60830Signed-off-by: NGleb Natapov <gleb@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

72f85795

KVM: mmu: allow page tables to be in read-only slots · ba6a3541

由 Paolo Bonzini 提交于 9月 09, 2013

Page tables in a read-only memory slot will currently cause a triple
fault because the page walker uses gfn_to_hva and it fails on such a slot.

OVMF uses such a page table; however, real hardware seems to be fine with
that as long as the accessed/dirty bits are set. Save whether the slot
is readonly, and later check it when updating the accessed and dirty bits.
Reviewed-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: NGleb Natapov <gleb@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ba6a3541

KVM: x86 emulator: emulate RETF imm · 3261107e

由 Bruce Rogers 提交于 9月 09, 2013

Opcode CA

This gets used by a DOS based NetWare guest.
Signed-off-by: NBruce Rogers <brogers@suse.com>
Reviewed-by: NGleb Natapov <gleb@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

3261107e

14 9月, 2013 2 次提交

x86/intel/lpss: Add pin control support to Intel low power subsystem · 0f531431

由 Mathias Nyman 提交于 9月 13, 2013

x86 chips with LPSS (low power subsystem) such as Lynxpoint and
Baytrail have SoC like peripheral support and controllable pins.

At the moment, Baytrail needs the pinctrl-baytrail driver to let
peripherals control their gpio resources, but more pincontrol
functions such as pin muxing and grouping are possible to add
later.
Signed-off-by: NMathias Nyman <mathias.nyman@linux.intel.com>
Reviewed-by: NMika Westerberg <mika.westerberg@linux.intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: http://lkml.kernel.org/r/1379080949-21734-1-git-send-email-mathias.nyman@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

0f531431

perf/x86/intel: Mark MEM_LOAD_UOPS_MISS_RETIRED as precise on SNB · 9d8e3f96

由 Stephane Eranian 提交于 9月 13, 2013

On Intel SNB (SNB, SNB-EP), the event MEM_LOAD_UOPS_MISS_RETIRED
supports PEBS. It was missing for the SNB PEBS event constraint
table thereby preventing any measurement with PEBS for it.

This patch adds the event to the PEBS table for SNB.

WARNING: it should be noted that this event like a few others
are subject to the erratum BT241 for Xeon E5 (SNB-EP). As such,
the event may undercount when used with PEBS unless the
workaround is implemented. But without this patch and just the
workaround, the kernel would not allow precise sampling on this
event. BT241 is documented in:

http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e5-family-spec-update.pdfSigned-off-by: NStephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: ak@linux.intel.com
Cc: zheng.z.yan@intel.com
Link: http://lkml.kernel.org/r/20130913201646.GA23981@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

9d8e3f96

13 9月, 2013 4 次提交

Remove GENERIC_HARDIRQ config option · 0244ad00

由 Martin Schwidefsky 提交于 8月 30, 2013

After the last architecture switched to generic hard irqs the config
options HAVE_GENERIC_HARDIRQS & GENERIC_HARDIRQS and the related code
for !CONFIG_GENERIC_HARDIRQS can be removed.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

0244ad00

x86: finish user fault error path with fatal signal · 3a13c4d7

由 Johannes Weiner 提交于 9月 12, 2013

The x86 fault handler bails in the middle of error handling when the
task has a fatal signal pending.  For a subsequent patch this is a
problem in OOM situations because it relies on pagefault_out_of_memory()
being called even when the task has been killed, to perform proper
per-task OOM state unwinding.

Shortcutting the fault like this is a rather minor optimization that
saves a few instructions in rare cases.  Just remove it for
user-triggered faults.

Use the opportunity to split the fault retry handling from actual fault
errors and add locking documentation that reads suprisingly similar to
ARM's.
Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Reviewed-by: NMichal Hocko <mhocko@suse.cz>
Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3a13c4d7

arch: mm: pass userspace fault flag to generic fault handler · 759496ba

由 Johannes Weiner 提交于 9月 12, 2013

Unlike global OOM handling, memory cgroup code will invoke the OOM killer
in any OOM situation because it has no way of telling faults occuring in
kernel context - which could be handled more gracefully - from
user-triggered faults.

Pass a flag that identifies faults originating in user space from the
architecture-specific fault handlers to generic code so that memcg OOM
handling can be improved.
Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Reviewed-by: NMichal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

759496ba

perf/x86/intel: Fix Silvermont offcore masks · 06c939c1

由 Peter Zijlstra 提交于 9月 09, 2013

Fengguang Wu reported:

> sparse warnings: (new ones prefixed by >>)
>
> >> arch/x86/kernel/cpu/perf_event_intel.c:901:9: sparse: constant 0x768005ffff is so big it is long
> >> arch/x86/kernel/cpu/perf_event_intel.c:902:9: sparse: constant 0x768005ffff is so big it is long
>
> vim +901 arch/x86/kernel/cpu/perf_event_intel.c
>
>    895	 },
>    896	};
>    897
>    898	static struct extra_reg intel_slm_extra_regs[] __read_mostly =
>    899	{
>    900		/* must define OFFCORE_RSP_X first, see intel_fixup_er() */
>  > 901		INTEL_UEVENT_EXTRA_REG(0x01b7, MSR_OFFCORE_RSP_0, 0x768005ffff, RSP_0),
>  > 902		INTEL_UEVENT_EXTRA_REG(0x02b7, MSR_OFFCORE_RSP_1, 0x768005ffff, RSP_1),
>    903		EVENT_EXTRA_END
>    904	};
>    905

Extend those constants to 64 bits.

Reported-by: fengguang.wu@intel.com
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130909112636.GQ31370@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>

06c939c1

12 9月, 2013 6 次提交

perf/x86: Fix uncore PCI fixed counter handling · dbc33f70

由 Stephane Eranian 提交于 9月 09, 2013

There was a bug in the handling of SNB-EP/IVB-EP uncore PCI
fixed counters, e.g., IMC.

It would cause erratic values to be returned for the IMC
clockticks event. This was due to a bogus hwc->config value
which was then written to PCI config space.

The erratic values can be seen via:

  $ perf stat -a -C 0 -e uncore_imc_0/clockticks/ -I 1000 sleep 10

The fixed counter has most fields marked as reserved with
hw reset values of 0. Yet the kernel was defaulting to a
hwc->config = ~0 and that was causing the issues.

This patch sets the hwc->config values for fixed uncore event
to 0. Now, the values of IMC clockticks is correct.
Signed-off-by: NStephane Eranian <eranian@google.com>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Cc: peterz@infradead.org
Cc: zheng.z.yan@intel.com
Link: http://lkml.kernel.org/r/20130909195350.GA17643@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

dbc33f70

perf/x86: Add constraint for IVB CYCLE_ACTIVITY:CYCLES_LDM_PENDING · 6113af14

由 Stephane Eranian 提交于 9月 11, 2013

The IvyBridge event CYCLE_ACTIVITY:CYCLES_LDM_PENDING can only
be measured on counters 0-3 when HT is off. When HT is on, you
only have counters 0-3.

If you program it on the eight counters for 1s on a 3GHz
IVB laptop running a noploop, you see:

           2 747 527 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
           2 747 527 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
           2 747 527 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
           2 747 527 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
       3 280 563 608 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
       3 280 563 608 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
       3 280 563 608 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
       3 280 563 608 CYCLE_ACTIVITY:CYCLES_LDM_PENDING

Clearly the last 4 values are bogus.
Signed-off-by: NStephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: ak@linux.intel.com
Cc: zheng.z.yan@intel.com
Cc: dhsharp@google.com
Link: http://lkml.kernel.org/r/20130911152222.GA28761@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

6113af14

mm: make sure _PAGE_SWP_SOFT_DIRTY bit is not set on present pte · fa0f281c

由 Cyrill Gorcunov 提交于 9月 11, 2013

_PAGE_SOFT_DIRTY bit should never be set on present pte so add VM_BUG_ON
to catch any potential future abuse.

Also add a comment on _PAGE_SWP_SOFT_DIRTY definition explaining scope of
its usage.
Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
Acked-by: NPavel Emelyanov <xemul@parallels.com>
Acked-by: NJan Beulich <jbeulich@suse.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fa0f281c

mm: migrate: check movability of hugepage in unmap_and_move_huge_page() · 83467efb

由 Naoya Horiguchi 提交于 9月 11, 2013

Currently hugepage migration works well only for pmd-based hugepages
(mainly due to lack of testing,) so we had better not enable migration of
other levels of hugepages until we are ready for it.

Some users of hugepage migration (mbind, move_pages, and migrate_pages) do
page table walk and check pud/pmd_huge() there, so they are safe.  But the
other users (softoffline and memory hotremove) don't do this, so without
this patch they can try to migrate unexpected types of hugepages.

To prevent this, we introduce hugepage_migration_support() as an
architecture dependent check of whether hugepage are implemented on a pmd
basis or not.  And on some architecture multiple sizes of hugepages are
available, so hugepage_migration_support() also checks hugepage size.
Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

83467efb

mm: vmstats: track TLB flush stats on UP too · 6df46865

由 Dave Hansen 提交于 9月 11, 2013

The previous patch doing vmstats for TLB flushes ("mm: vmstats: tlb flush
counters") effectively missed UP since arch/x86/mm/tlb.c is only compiled
for SMP.

UP systems do not do remote TLB flushes, so compile those counters out on
UP.

arch/x86/kernel/cpu/mtrr/generic.c calls __flush_tlb() directly.  This is
probably an optimization since both the mtrr code and __flush_tlb() write
cr4.  It would probably be safe to make that a flush_tlb_all() (and then
get these statistics), but the mtrr code is ancient and I'm hesitant to
touch it other than to just stick in the counters.

[akpm@linux-foundation.org: tweak comments]
Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6df46865

mm: vmstats: tlb flush counters · 9824cf97

由 Dave Hansen 提交于 9月 11, 2013

I was investigating some TLB flush scaling issues and realized that we do
not have any good methods for figuring out how many TLB flushes we are
doing.

It would be nice to be able to do these in generic code, but the
arch-independent calls don't explicitly specify whether we actually need
to do remote flushes or not.  In the end, we really need to know if we
actually _did_ global vs.  local invalidations, so that leaves us with few
options other than to muck with the counters from arch-specific code.
Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9824cf97

11 9月, 2013 1 次提交

shrinker: convert remaining shrinkers to count/scan API · 70534a73

由 Dave Chinner 提交于 8月 28, 2013

Convert the remaining couple of random shrinkers in the tree to the new
API.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NGlauber Costa <glommer@openvz.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

70534a73

10 9月, 2013 3 次提交

x86: Remove now-unused save_rest() · c0da0fa1

由 Borislav Petkov 提交于 9月 07, 2013

b3af11af ("x86: get rid of pt_regs argument of iopl(2)")
dropped PTREGSCALL which was also the last user of save_rest.
Drop that now-unused function too.
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: http://lkml.kernel.org/r/1378546750-19727-1-git-send-email-bp@suse.deSigned-off-by: NIngo Molnar <mingo@kernel.org>

c0da0fa1

xen/spinlock: Don't use __initdate for xen_pv_spin · c3b7cb1f

由 Konrad Rzeszutek Wilk 提交于 9月 09, 2013

As we get compile warnings about .init.data being
used by non-init functions.
Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

c3b7cb1f

Revert "xen/spinlock: Disable IRQ spinlock (PV) allocation on PVHVM" · fb78e58c

由 Konrad Rzeszutek Wilk 提交于 8月 26, 2013

This reverts commit 70dd4998.

Now that the bugs have been resolved we can re-enable the
PV ticketlock implementation under PVHVM Xen guests.
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>

fb78e58c