提交 · 2039e6acaf94d83ec6b6d9f3d0bce7ea1f099918 · openeuler / raspberrypi-kernel

05 1月, 2016 2 次提交

x86/mm/pat: Change free_memtype() to support shrinking case · 2039e6ac

由 Toshi Kani 提交于 12月 22, 2015

Using mremap() to shrink the map size of a VM_PFNMAP range causes
the following error message, and leaves the pfn range allocated.

 x86/PAT: test:3493 freeing invalid memtype [mem 0x483200000-0x4863fffff]

This is because rbt_memtype_erase(), called from free_memtype()
with spin_lock held, only supports to free a whole memtype node in
memtype_rbroot.  Therefore, this patch changes rbt_memtype_erase()
to support a request that shrinks the size of a memtype node for
mremap().

memtype_rb_exact_match() is renamed to memtype_rb_match(), and
is enhanced to support EXACT_MATCH and END_MATCH in @match_type.
Since the memtype_rbroot tree allows overlapping ranges,
rbt_memtype_erase() checks with EXACT_MATCH first, i.e. free
a whole node for the munmap case.  If no such entry is found,
it then checks with END_MATCH, i.e. shrink the size of a node
from the end for the mremap case.

On the mremap case, rbt_memtype_erase() proceeds in two steps,
1) remove the node, and then 2) insert the updated node.  This
allows proper update of augmented values, subtree_max_end, in
the tree.
Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: stsp@list.ru
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1450832064-10093-3-git-send-email-toshi.kani@hpe.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

2039e6ac

x86/mm/pat: Add untrack_pfn_moved for mremap · d9fe4fab

由 Toshi Kani 提交于 12月 22, 2015

mremap() with MREMAP_FIXED on a VM_PFNMAP range causes the following
WARN_ON_ONCE() message in untrack_pfn().

  WARNING: CPU: 1 PID: 3493 at arch/x86/mm/pat.c:985 untrack_pfn+0xbd/0xd0()
  Call Trace:
  [<ffffffff817729ea>] dump_stack+0x45/0x57
  [<ffffffff8109e4b6>] warn_slowpath_common+0x86/0xc0
  [<ffffffff8109e5ea>] warn_slowpath_null+0x1a/0x20
  [<ffffffff8106a88d>] untrack_pfn+0xbd/0xd0
  [<ffffffff811d2d5e>] unmap_single_vma+0x80e/0x860
  [<ffffffff811d3725>] unmap_vmas+0x55/0xb0
  [<ffffffff811d916c>] unmap_region+0xac/0x120
  [<ffffffff811db86a>] do_munmap+0x28a/0x460
  [<ffffffff811dec33>] move_vma+0x1b3/0x2e0
  [<ffffffff811df113>] SyS_mremap+0x3b3/0x510
  [<ffffffff817793ee>] entry_SYSCALL_64_fastpath+0x12/0x71

MREMAP_FIXED moves a pfnmap from old vma to new vma.  untrack_pfn() is
called with the old vma after its pfnmap page table has been removed,
which causes follow_phys() to fail.  The new vma has a new pfnmap to
the same pfn & cache type with VM_PAT set.  Therefore, we only need to
clear VM_PAT from the old vma in this case.

Add untrack_pfn_moved(), which clears VM_PAT from a given old vma.
move_vma() is changed to call this function with the old vma when
VM_PFNMAP is set.  move_vma() then calls do_munmap(), and untrack_pfn()
is a no-op since VM_PAT is cleared.
Reported-by: NStas Sergeev <stsp@list.ru>
Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1450832064-10093-2-git-send-email-toshi.kani@hpe.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

d9fe4fab

29 12月, 2015 2 次提交

x86/mm: Drop WARN from multi-BAR check · 9abb0ecd

由 Laura Abbott 提交于 12月 21, 2015

ioremapping multiple BARs produces a warning with a message "Your kernel is
fine". This message mostly serves to comfort kernel developers. Users do
not read the message, they only see the big scary warning which means
something must be horribly broken with their system. Less dramatically, the
warn also sets the taint flag which makes it difficult to differentiate
problems. If the kernel is actually fine as the warning claims it doesn't
make sense for it to be tainted. Change the WARN_ONCE to a pr_warn with the
caller of the ioremap.
Signed-off-by: NLaura Abbott <labbott@fedoraproject.org>
Link: http://lkml.kernel.org/r/1450728074-31029-1-git-send-email-labbott@fedoraproject.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

9abb0ecd

x86/LDT: Print the real LDT base address · 0d430e3f

由 Jan Beulich 提交于 12月 22, 2015

This was meant to print base address and entry count; make it do so
again.

Fixes: 37868fe1 "x86/ldt: Make modify_ldt synchronous"
Signed-off-by: NJan Beulich <jbeulich@suse.com>
Acked-by: NAndy Lutomirski <luto@kernel.org>
Link: http://lkml.kernel.org/r/56797D8402000078000C24F0@prv-mh.provo.novell.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

0d430e3f

06 12月, 2015 2 次提交

x86/mm/64: Enable SWIOTLB if system has SRAT memory regions above MAX_DMA32_PFN · ec941c5f

由 Igor Mammedov 提交于 12月 04, 2015

when memory hotplug enabled system is booted with less
than 4GB of RAM and then later more RAM is hotplugged
32-bit devices stop functioning with following error:

 nommu_map_single: overflow 327b4f8c0+1522 of device mask ffffffff

the reason for this is that if x86_64 system were booted
with RAM less than 4GB, it doesn't enable SWIOTLB and
when memory is hotplugged beyond MAX_DMA32_PFN, devices
that expect 32-bit addresses can't handle 64-bit addresses.

Fix it by tracking max possible PFN when parsing
memory affinity structures from SRAT ACPI table and
enable SWIOTLB if there is hotpluggable memory
regions beyond MAX_DMA32_PFN.

It fixes KVM guests when they use emulated devices
(reproduces with ata_piix, e1000 and usb devices,
 RHBZ: 1275941, 1275977, 1271527)

It also fixes the HyperV, VMWare with emulated devices
which are affected by this issue as well.
Signed-off-by: NIgor Mammedov <imammedo@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akataria@vmware.com
Cc: fujita.tomonori@lab.ntt.co.jp
Cc: konrad.wilk@oracle.com
Cc: pbonzini@redhat.com
Cc: revers@redhat.com
Cc: riel@redhat.com
Link: http://lkml.kernel.org/r/1449234426-273049-3-git-send-email-imammedo@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

ec941c5f

x86/mm: Introduce max_possible_pfn · 8dd33030

由 Igor Mammedov 提交于 12月 04, 2015

max_possible_pfn will be used for tracking max possible
PFN for memory that isn't present in E820 table and
could be hotplugged later.

By default max_possible_pfn is initialized with max_pfn,
but later it could be updated with highest PFN of
hotpluggable memory ranges declared in ACPI SRAT table
if any present.
Signed-off-by: NIgor Mammedov <imammedo@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akataria@vmware.com
Cc: fujita.tomonori@lab.ntt.co.jp
Cc: konrad.wilk@oracle.com
Cc: pbonzini@redhat.com
Cc: revers@redhat.com
Cc: riel@redhat.com
Link: http://lkml.kernel.org/r/1449234426-273049-2-git-send-email-imammedo@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

8dd33030

04 12月, 2015 2 次提交

x86/mm/ptdump: Make (debugfs)/kernel_page_tables read-only · 071ac0c4

由 Borislav Petkov 提交于 11月 30, 2015

File should be created with S_IRUSR and not with S_IWUSR too
because writing to it doesn't make any sense. I mean, we don't
have a ->write method anyway but let's have the permissions
correct too.
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1448885579-32506-1-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>

071ac0c4

x86/mm/mtrr: Mark the 'range_new' static variable in mtrr_calc_range_state() as __initdata · c332813b

由 Rasmus Villemoes 提交于 12月 01, 2015

'range_new' doesn't seem to be used after init. It is only passed
to memset(), sum_ranges(), memcmp() and x86_get_mtrr_mem_range(), the
latter of which also only passes it on to various *range*
library functions.

So mark it __initdata to free up an extra page after init.

Its contents are wiped at every call to mtrr_calc_range_state(),
so it being static is not about preserving state between calls,
but simply to avoid a 4k+ stack frame. While there, add a
comment explaining this and why it's safe.

We could also mark nr_range_new as __initdata, but since it's
just a single int and also doesn't carry state between calls (it
is unconditionally assigned to before it is read), we might as
well make it an ordinary automatic variable.
Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/1449002691-20783-1-git-send-email-linux@rasmusvillemoes.dkSigned-off-by: NIngo Molnar <mingo@kernel.org>

c332813b

23 11月, 2015 1 次提交

x86/mm: Turn CONFIG_X86_PTDUMP into a module · 8609d1b5

由 Kees Cook 提交于 11月 19, 2015

Being able to examine page tables is handy, so make this a
module that can be loaded as needed.
Signed-off-by: NKees Cook <keescook@chromium.org>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/20151120010755.GA9060@www.outflux.netSigned-off-by: NIngo Molnar <mingo@kernel.org>

8609d1b5

19 11月, 2015 2 次提交

x86/cpu: Fix SMAP check in PVOPS environments · 581b7f15

由 Andrew Cooper 提交于 6月 03, 2015

There appears to be no formal statement of what pv_irq_ops.save_fl() is
supposed to return precisely.  Native returns the full flags, while lguest and
Xen only return the Interrupt Flag, and both have comments by the
implementations stating that only the Interrupt Flag is looked at.  This may
have been true when initially implemented, but no longer is.

To make matters worse, the Xen PVOP leaves the upper bits undefined, making
the BUG_ON() undefined behaviour.  Experimentally, this now trips for 32bit PV
guests on Broadwell hardware.  The BUG_ON() is consistent for an individual
build, but not consistent for all builds.  It has also been a sitting timebomb
since SMAP support was introduced.

Use native_save_fl() instead, which will obtain an accurate view of the AC
flag.
Signed-off-by: NAndrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
Tested-by: NRusty Russell <rusty@rustcorp.com.au>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: <lguest@lists.ozlabs.org>
Cc: Xen-devel <xen-devel@lists.xen.org>
CC: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1433323874-6927-1-git-send-email-andrew.cooper3@citrix.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

581b7f15

x86/ftrace: Add comment on static function tracing · 112677d6

由 Namhyung Kim 提交于 11月 17, 2015

There was a confusion between update_ftrace_function() and static
function tracing trampoline regarding 3rd parameter (ftrace_ops).
Add a comment for clarification.
Suggested-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1447721004-2551-1-git-send-email-namhyung@kernel.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

112677d6

14 11月, 2015 1 次提交

x86: remove unused definition of MSR_NHM_PLATFORM_INFO · 5369a21e

由 Len Brown 提交于 11月 12, 2015

MSR_NHM_PLATFORM_INFO has been replaced by...
MSR_PLATFORM_INFO
Signed-off-by: NLen Brown <len.brown@intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

5369a21e

12 11月, 2015 5 次提交

perf/x86/intel/rapl: Remove the unused RAPL_EVENT_DESC() macro · 41ac18eb

由 Huang Rui 提交于 11月 04, 2015

Signed-off-by: NHuang Rui <ray.huang@amd.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Li <tony.li@amd.com>
Link: http://lkml.kernel.org/r/1446630233-3166-1-git-send-email-ray.huang@amd.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

41ac18eb

x86/fpu: Fix get_xsave_addr() behavior under virtualization · a05917b6

由 Huaitong Han 提交于 11月 06, 2015

KVM uses the get_xsave_addr() function in a different fashion from
the native kernel, in that the 'xsave' parameter belongs to guest vcpu,
not the currently running task.

But 'xsave' is replaced with current task's (host) xsave structure, so
get_xsave_addr() will incorrectly return the bad xsave address to KVM.

Fix it so that the passed in 'xsave' address is used - as intended
originally.
Signed-off-by: NHuaitong Han <huaitong.han@intel.com>
Reviewed-by: NDave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@intel.com
Link: http://lkml.kernel.org/r/1446800423-21622-1-git-send-email-huaitong.han@intel.com
[ Tidied up the changelog. ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>

a05917b6

x86/fpu: Fix 32-bit signal frame handling · ab6b5294

由 Dave Hansen 提交于 11月 10, 2015

(This should have gone to LKML originally. Sorry for the extra
 noise, folks on the cc.)

Background:

Signal frames on x86 have two formats:

  1. For 32-bit executables (whether on a real 32-bit kernel or
     under 32-bit emulation on a 64-bit kernel) we have a
    'fpregset_t' that includes the "FSAVE" registers.

  2. For 64-bit executables (on 64-bit kernels obviously), the
     'fpregset_t' is smaller and does not contain the "FSAVE"
     state.

When creating the signal frame, we have to be aware of whether
we are running a 32 or 64-bit executable so we create the
correct format signal frame.

Problem:

save_xstate_epilog() uses 'fx_sw_reserved_ia32' whenever it is
called for a 32-bit executable.  This is for real 32-bit and
ia32 emulation.

But, fpu__init_prepare_fx_sw_frame() only initializes
'fx_sw_reserved_ia32' when emulation is enabled, *NOT* for real
32-bit kernels.

This leads to really wierd situations where 32-bit programs
lose their extended state when returning from a signal handler.
The kernel copies the uninitialized (zero) 'fx_sw_reserved_ia32'
out to userspace in save_xstate_epilog().  But when returning
from the signal, the kernel errors out in check_for_xstate()
when it does not see FP_XSTATE_MAGIC1 present (because it was
zeroed).  This leads to the FPU/XSAVE state being initialized.

For MPX, this leads to the most permissive state and means we
silently lose bounds violations.  I think this would also mean
that we could lose *ANY* FPU/SSE/AVX state.  I'm not sure why
no one has spotted this bug.

I believe this was broken by:

	72a671ce ("x86, fpu: Unify signal handling code paths for x86 and x86_64 kernels")

way back in 2012.
Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@sr71.net
Cc: fenghua.yu@intel.com
Cc: yu-cheng.yu@intel.com
Link: http://lkml.kernel.org/r/20151111002354.A0799571@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

ab6b5294

x86/mpx: Fix 32-bit address space calculation · f3119b83

由 Dave Hansen 提交于 11月 11, 2015

I received a bug report that running 32-bit MPX binaries on
64-bit kernels was broken.  I traced it down to this little code
snippet.  We were switching our "number of bounds directory
entries" calculation correctly.  But, we didn't switch the other
side of the calculation: the virtual space size.

This meant that we were calculating an absurd size for
bd_entry_virt_space() on 32-bit because we used the 64-bit
virt_space.

This was _also_ broken for 32-bit kernels running on 64-bit
hardware since boot_cpu_data.x86_virt_bits=48 even when running
in 32-bit mode.

Correct that and properly handle all 3 possible cases:

 1. 32-bit binary on 64-bit kernel
 2. 64-bit binary on 64-bit kernel
 3. 32-bit binary on 32-bit kernel

This manifested in having bounds tables not properly unmapped.
It "leaked" memory but had no functional impact otherwise.
Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20151111181934.FA7FAC34@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

f3119b83

x86/mpx: Do proper get_user() when running 32-bit binaries on 64-bit kernels · 46561c39

由 Dave Hansen 提交于 11月 11, 2015

When you call get_user(foo, bar), you effectively do a

	copy_from_user(&foo, bar, sizeof(*bar));

Note that the sizeof() is implicit.

When we reach out to userspace to try to zap an entire "bounds
table" we need to go read a "bounds directory entry" in order to
locate the table's address.  The size of a "directory entry"
depends on the binary being run and is always the size of a
pointer.

But, when we have a 64-bit kernel and a 32-bit application, the
directory entry is still only 32-bits long, but we fetch it with
a 64-bit pointer which makes get_user() does a 64-bit fetch.
Reading 4 extra bytes isn't harmful, unless we are at the end of
and run off the table.  It might also cause the zero page to get
faulted in unnecessarily even if you are not at the end.

Fix it up by doing a special 32-bit get_user() via a cast when
we have 32-bit userspace.
Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20151111181931.3ACF6822@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

46561c39

10 11月, 2015 18 次提交

KVM: x86: rename update_db_bp_intercept to update_bp_intercept · a96036b8

由 Paolo Bonzini 提交于 11月 10, 2015

Because #DB is now intercepted unconditionally, this callback
only operates on #BP for both VMX and SVM.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a96036b8

KVM: svm: unconditionally intercept #DB · cbdb967a

由 Paolo Bonzini 提交于 11月 10, 2015

This is needed to avoid the possibility that the guest triggers
an infinite stream of #DB exceptions (CVE-2015-8104).

VMX is not affected: because it does not save DR6 in the VMCS,
it already intercepts #DB unconditionally.
Reported-by: NJan Beulich <jbeulich@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

cbdb967a

KVM: x86: work around infinite loop in microcode when #AC is delivered · 54a20552

由 Eric Northup 提交于 11月 03, 2015

It was found that a guest can DoS a host by triggering an infinite
stream of "alignment check" (#AC) exceptions.  This causes the
microcode to enter an infinite loop where the core never receives
another interrupt.  The host kernel panics pretty quickly due to the
effects (CVE-2015-5307).
Signed-off-by: NEric Northup <digitaleric@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

54a20552

KVM: VMX: Dump TSC multiplier in dump_vmcs() · 8cfe9866

由 Haozhong Zhang 提交于 10月 20, 2015

This patch enhances dump_vmcs() to dump the value of TSC multiplier
field in VMCS.
Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8cfe9866

KVM: VMX: Use a scaled host TSC for guest readings of MSR_IA32_TSC · be7b263e

由 Haozhong Zhang 提交于 10月 20, 2015

This patch makes kvm-intel to return a scaled host TSC plus the TSC
offset when handling guest readings to MSR_IA32_TSC.
Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

be7b263e

KVM: VMX: Setup TSC scaling ratio when a vcpu is loaded · ff2c3a18

由 Haozhong Zhang 提交于 10月 20, 2015

This patch makes kvm-intel module to load TSC scaling ratio into TSC
multiplier field of VMCS when a vcpu is loaded, so that TSC scaling
ratio can take effect if VMX TSC scaling is enabled.
Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ff2c3a18

KVM: VMX: Enable and initialize VMX TSC scaling · 64903d61

由 Haozhong Zhang 提交于 10月 20, 2015

This patch exhances kvm-intel module to enable VMX TSC scaling and
collects information of TSC scaling ratio during initialization.
Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

64903d61

KVM: x86: Use the correct vcpu's TSC rate to compute time scale · 27cca94e

由 Haozhong Zhang 提交于 10月 20, 2015

This patch makes KVM use virtual_tsc_khz rather than the host TSC rate
as vcpu's TSC rate to compute the time scale if TSC scaling is enabled.
Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

27cca94e

KVM: x86: Move TSC scaling logic out of call-back read_l1_tsc() · 4ba76538

由 Haozhong Zhang 提交于 10月 20, 2015

Both VMX and SVM scales the host TSC in the same way in call-back
read_l1_tsc(), so this patch moves the scaling logic from call-back
read_l1_tsc() to a common function kvm_read_l1_tsc().
Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4ba76538

KVM: x86: Move TSC scaling logic out of call-back adjust_tsc_offset() · 58ea6767

由 Haozhong Zhang 提交于 10月 20, 2015

For both VMX and SVM, if the 2nd argument of call-back
adjust_tsc_offset() is the host TSC, then adjust_tsc_offset() will scale
it first. This patch moves this common TSC scaling logic to its caller
adjust_tsc_offset_host() and rename the call-back adjust_tsc_offset() to
adjust_tsc_offset_guest().
Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

58ea6767

KVM: x86: Replace call-back compute_tsc_offset() with a common function · 07c1419a

由 Haozhong Zhang 提交于 10月 20, 2015

Both VMX and SVM calculate the tsc-offset in the same way, so this
patch removes the call-back compute_tsc_offset() and replaces it with a
common function kvm_compute_tsc_offset().
Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

07c1419a

KVM: x86: Replace call-back set_tsc_khz() with a common function · 381d585c

由 Haozhong Zhang 提交于 10月 20, 2015

Both VMX and SVM propagate virtual_tsc_khz in the same way, so this
patch removes the call-back set_tsc_khz() and replaces it with a common
function.
Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

381d585c

KVM: x86: Add a common TSC scaling function · 35181e86

由 Haozhong Zhang 提交于 10月 20, 2015

VMX and SVM calculate the TSC scaling ratio in a similar logic, so this
patch generalizes it to a common TSC scaling function.
Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com>
[Inline the multiplication and shift steps into mul_u64_u64_shr.  Remove
 BUG_ON.  - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

35181e86

KVM: x86: Add a common TSC scaling ratio field in kvm_vcpu_arch · ad721883

由 Haozhong Zhang 提交于 10月 20, 2015

This patch moves the field of TSC scaling ratio from the architecture
struct vcpu_svm to the common struct kvm_vcpu_arch.
Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ad721883

KVM: x86: Collect information for setting TSC scaling ratio · bc9b961b

由 Haozhong Zhang 提交于 10月 20, 2015

The number of bits of the fractional part of the 64-bit TSC scaling
ratio in VMX and SVM is different. This patch makes the architecture
code to collect the number of fractional bits and other related
information into variables that can be accessed in the common code.
Signed-off-by: NHaozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

bc9b961b

KVM: x86: declare a few variables as __read_mostly · 893590c7

由 Paolo Bonzini 提交于 11月 06, 2015

These include module parameters and variables that are set by
kvm_x86_ops->hardware_setup.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

893590c7

KVM: x86: merge handle_mmio_page_fault and handle_mmio_page_fault_common · 450869d6

由 Paolo Bonzini 提交于 11月 04, 2015

They are exactly the same, except that handle_mmio_page_fault
has an unused argument and a call to WARN_ON.  Remove the unused
argument from the callers, and move the warning to (the former)
handle_mmio_page_fault_common.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

450869d6

kmap_atomic_to_page() has no users, remove it · 77c5b5da

由 Nicolas Pitre 提交于 11月 09, 2015

Removal started in commit 5bbeed12 ("sparc32: drop unused
kmap_atomic_to_page").  Let's do it across the whole tree.
Signed-off-by: NNicolas Pitre <nico@linaro.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

77c5b5da

07 11月, 2015 5 次提交

x86/cpu: Call verify_cpu() after having entered long mode too · 04633df0

由 Borislav Petkov 提交于 11月 05, 2015

When we get loaded by a 64-bit bootloader, kernel entry point is
startup_64 in head_64.S. We don't trust any and all bootloaders because
some will fiddle with CPU configuration so we go ahead and massage each
CPU into sanity again.

For example, some dell BIOSes have this XD disable feature which set
IA32_MISC_ENABLE[34] and disable NX. This might be some dumb workaround
for other OSes but Linux sure doesn't need it.

A similar thing is present in the Surface 3 firmware - see
https://bugzilla.kernel.org/show_bug.cgi?id=106051 - which sets this bit
only on the BSP:

  # rdmsr -a 0x1a0
  400850089
  850089
  850089
  850089

I know, right?!

There's not even an off switch in there.

So fix all those cases by sanitizing the 64-bit entry point too. For
that, make verify_cpu() callable in 64-bit mode also.
Requested-and-debugged-by: N"H. Peter Anvin" <hpa@zytor.com>
Reported-and-tested-by: NBastien Nocera <bugzilla@hadess.net>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1446739076-21303-1-git-send-email-bp@alien8.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

04633df0

x86/setup: Fix low identity map for >= 2GB kernel range · 68accac3

由 Krzysztof Mazur 提交于 11月 06, 2015

The commit f5f3497c extended the low identity mapping. However, if
the kernel uses more than 2 GB (VMSPLIT_2G_OPT or VMSPLIT_1G memory
split), the normal memory mapping is overwritten by the low identity
mapping causing a crash. To avoid overwritting, limit the low identity
map to cover only memory before kernel range (PAGE_OFFSET).

Fixes: f5f3497c "x86/setup: Extend low identity map to cover whole kernel range
Signed-off-by: NKrzysztof Mazur <krzysiek@podlesie.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1446815916-22105-1-git-send-email-krzysiek@podlesie.netSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

68accac3

x86/mm: Skip the hypervisor range when walking PGD · f4e342c8

由 Boris Ostrovsky 提交于 11月 05, 2015

The range between 0xffff800000000000 and 0xffff87ffffffffff is reserved
for hypervisor and therefore we should not try to follow PGD's indexes
corresponding to those addresses.

While this has always been a problem, with the new W+X warning
mechanism ptdump_walk_pgd_level_core() can now be called during boot,
causing a PV Xen guest to crash.

[ tglx: Replaced the macro with a readable inline ]

Fixes: e1a58320 "x86/mm: Warn on W^X mappings"
Reported-by: NSander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: xen-devel@lists.xen.org
Link: http://lkml.kernel.org/r/1446749795-27764-1-git-send-email-boris.ostrovsky@oracle.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

f4e342c8

x86/AMD: Fix last level cache topology for AMD Fam17h systems · 3849e91f

由 Aravind Gopalakrishnan 提交于 11月 04, 2015

On AMD Fam17h systems, the last level cache is not resident in the
northbridge. Therefore, we cannot assign cpu_llc_id to the same value as
Node ID as we have been doing until now.

We should rather look at the ApicID bits of the core to provide us the
last level cache ID info.
Signed-off-by: NAravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jacob Shin <jacob.w.shin@gmail.com>
Link: http://lkml.kernel.org/r/1446582899-9378-1-git-send-email-Aravind.Gopalakrishnan@amd.comSigned-off-by: NBorislav Petkov <bp@suse.de>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

3849e91f

x86/irq: Probe for PIC presence before allocating descs for legacy IRQs · 8c058b0b

由 Vitaly Kuznetsov 提交于 11月 03, 2015

Commit d32932d0 ("x86/irq: Convert IOAPIC to use hierarchical irqdomain
interfaces") brought a regression for Hyper-V Gen2 instances. These
instances don't have i8259 legacy PIC but they use legacy IRQs for serial
port, rtc, and acpi. With this commit included we end up with these IRQs
not initialized. Earlier, there was a special workaround for legacy IRQs
in mp_map_pin_to_irq() doing mp_irqdomain_map() without looking at
nr_legacy_irqs() and now we fail in __irq_domain_alloc_irqs() when
irq_domain_alloc_descs() returns -EEXIST.

The essence of the issue seems to be that early_irq_init() calls
arch_probe_nr_irqs() to figure out the number of legacy IRQs before
we probe for i8259 and gets 16. Later when init_8259A() is called we switch
to NULL legacy PIC and nr_legacy_irqs() starts to return 0 but we already
have 16 descs allocated.

Solve the issue by separating i8259 probe from init and calling it in
arch_probe_nr_irqs() before we actually use nr_legacy_irqs() information.

Fixes: d32932d0 ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces")
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1446543614-3621-1-git-send-email-vkuznets@redhat.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

8c058b0b