提交 · 630c1863bc1c6c0ac33e5134bec7bdf7941c28c8 · openanolis / cloud-kernel

30 7月, 2017 2 次提交

x86/traps: Don't clear segment high bits in early_idt_handler_common() · 630c1863

由 Andy Lutomirski 提交于 7月 28, 2017

Now that pt_regs defines the segment fields as 16-bit, there's no
need to sanitize the values.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NIngo Molnar <mingo@kernel.org>

630c1863

x86/asm/32: Make pt_regs's segment registers be 16 bits · 385eca8f

由 Andy Lutomirski 提交于 7月 28, 2017

Many 32-bit x86 CPUs do 16-bit writes when storing segment registers to
memory.  This can cause the high word of regs->[cdefgs]s to
occasionally contain garbage.

Rather than making the entry code more complicated to fix up the
garbage, just change pt_regs to reflect reality.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NIngo Molnar <mingo@kernel.org>

385eca8f

27 7月, 2017 1 次提交

x86/ldt/64: Refresh DS and ES when modify_ldt changes an entry · a6323757

由 Andy Lutomirski 提交于 7月 26, 2017

On x86_32, modify_ldt() implicitly refreshes the cached DS and ES
segments because they are refreshed on return to usermode.

On x86_64, they're not refreshed on return to usermode.  To improve
determinism and match x86_32's behavior, refresh them when we update
the LDT.

This avoids a situation in which the DS points to a descriptor that is
changed but the old cached segment persists until the next reschedule.
If this happens, then the user-visible state will change
nondeterministically some time after modify_ldt() returns, which is
unfortunate.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Chang Seok <chang.seok.bae@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NIngo Molnar <mingo@kernel.org>

a6323757

26 7月, 2017 3 次提交

x86/kconfig: Consolidate unwinders into multiple choice selection · 81d38719

由 Josh Poimboeuf 提交于 7月 25, 2017

There are three mutually exclusive unwinders.  Make that more obvious by
combining them into a multiple-choice selection:

  CONFIG_FRAME_POINTER_UNWINDER
  CONFIG_ORC_UNWINDER
  CONFIG_GUESS_UNWINDER (if CONFIG_EXPERT=y)

Frame pointers are still the default (for now).

The old CONFIG_FRAME_POINTER option is still used in some
arch-independent places, so keep it around, but make it
invisible to the user on x86 - it's now selected by
CONFIG_FRAME_POINTER_UNWINDER=y.
Suggested-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/20170725135424.zukjmgpz3plf5pmt@trebleSigned-off-by: NIngo Molnar <mingo@kernel.org>

81d38719

x86/kconfig: Make it easier to switch to the new ORC unwinder · a34a766f

由 Josh Poimboeuf 提交于 7月 24, 2017

A couple of Kconfig changes which make it much easier to switch to the
new CONFIG_ORC_UNWINDER:

1) Remove x86 dependencies on CONFIG_FRAME_POINTER for lockdep,
   latencytop, and fault injection.  x86 has a 'guess' unwinder which
   just scans the stack for kernel text addresses.  It's not 100%
   accurate but in many cases it's good enough.  This allows those users
   who don't want the text overhead of the frame pointer or ORC
   unwinders to still use these features.  More importantly, this also
   makes it much more straightforward to disable frame pointers.

2) Make CONFIG_ORC_UNWINDER depend on !CONFIG_FRAME_POINTER.  While it
   would be possible to have both enabled, it doesn't really make sense
   to do so.  So enforce a sane configuration to prevent the user from
   making a dumb mistake.

With these changes, when you disable CONFIG_FRAME_POINTER, "make
oldconfig" will ask if you want to enable CONFIG_ORC_UNWINDER.
Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/9985fb91ce5005fe33ea5cc2a20f14bd33c61d03.1500938583.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

a34a766f

x86/unwind: Add the ORC unwinder · ee9f8fce

由 Josh Poimboeuf 提交于 7月 24, 2017

Add the new ORC unwinder which is enabled by CONFIG_ORC_UNWINDER=y.
It plugs into the existing x86 unwinder framework.

It relies on objtool to generate the needed .orc_unwind and
.orc_unwind_ip sections.

For more details on why ORC is used instead of DWARF, see
Documentation/x86/orc-unwinder.txt - but the short version is
that it's a simplified, fundamentally more robust debugninfo
data structure, which also allows up to two orders of magnitude
faster lookups than the DWARF unwinder - which matters to
profiling workloads like perf.

Thanks to Andy Lutomirski for the performance improvement ideas:
splitting the ORC unwind table into two parallel arrays and creating a
fast lookup table to search a subset of the unwind table.
Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/0a6cbfb40f8da99b7a45a1a8302dc6aef16ec812.1500938583.git.jpoimboe@redhat.com
[ Extended the changelog. ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>

ee9f8fce

25 7月, 2017 1 次提交

x86/asm: Add suffix macro for GEN_*_RMWcc() · df340524

由 Kees Cook 提交于 7月 24, 2017

The coming x86 refcount protection needs to be able to add trailing
instructions to the GEN_*_RMWcc() operations. This extracts the
difference between the goto/non-goto cases so the helper macros
can be defined outside the #ifdef cases. Additionally adds argument
naming to the resulting asm for referencing from suffixed
instructions, and adds clobbers for "cc", and "cx" to let suffixes
use _ASM_CX, and retain any set flags.
Signed-off-by: NKees Cook <keescook@chromium.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Hans Liljestrand <ishkamiel@gmail.com>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Jann Horn <jannh@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arozansk@redhat.com
Cc: axboe@kernel.dk
Cc: kernel-hardening@lists.openwall.com
Cc: linux-arch <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/1500921349-10803-2-git-send-email-keescook@chromium.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

df340524

24 7月, 2017 5 次提交

x86/io: Make readq() / writeq() API consistent · 9683a64f

由 Andy Shevchenko 提交于 6月 30, 2017

Despite the following commit:

  93093d09 ("x86: provide readq()/writeq() on 32-bit too, complete")

which says:

  ...Also, map all the APIs to the strongest ordering variant. It's way
  too easy to mess such details up in drivers and the difference between
  "memory" and "" constrained asm() constructs is in the noise range.

... we have for now only one user of this API (i.e. writeq_relaxed() in
drivers/hwtracing/intel_th/sth.c) on x86 and it does care about
"relaxed" part of it.

Moreover 32-bit support has been removed from that header, though appeared
later in specific headers that emphasizes its non-atomic context.

The rest should keep in mind a consistent picture of the __raw_IO() vs. IO()
vs. IO_relaxed() API.
Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Baolin Wang <baolin.wang@spreadtrum.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: intel-gfx@lists.freedesktop.org
Cc: linux-i2c@vger.kernel.org
Cc: wsa@the-dreams.de
Link: http://lkml.kernel.org/r/20170630170934.83028-6-andriy.shevchenko@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

9683a64f

x86/io: Remove xlate_dev_kmem_ptr() duplication · eabc2a7c

由 Andy Shevchenko 提交于 6月 30, 2017

Generic header defines xlate_dev_kmem_ptr().

Reuse it from generic header and remove in x86 code.
Move a description to the generic header as well.
Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Baolin Wang <baolin.wang@spreadtrum.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: intel-gfx@lists.freedesktop.org
Cc: linux-i2c@vger.kernel.org
Cc: wsa@the-dreams.de
Link: http://lkml.kernel.org/r/20170630170934.83028-5-andriy.shevchenko@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

eabc2a7c

x86/io: Remove mem*io() duplications · c2327da0

由 Andy Shevchenko 提交于 6月 30, 2017

Generic header defines memset_io, memcpy_fromio(). and memcpy_toio().

Reuse them from generic header and remove in x86 code.
Move the descriptions to the generic header as well.
Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Baolin Wang <baolin.wang@spreadtrum.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: intel-gfx@lists.freedesktop.org
Cc: linux-i2c@vger.kernel.org
Cc: wsa@the-dreams.de
Link: http://lkml.kernel.org/r/20170630170934.83028-4-andriy.shevchenko@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

c2327da0

x86/io: Include asm-generic/io.h to architectural code · 31952011

由 Andy Shevchenko 提交于 6月 30, 2017

asm-generic/io.h defines few helpers which would be useful in the drivers,
such as writesb() and readsb().

Include it to the asm/io.h in architectural folder.
Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: NWolfram Sang <wsa@the-dreams.de>
Cc: Baolin Wang <baolin.wang@spreadtrum.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: intel-gfx@lists.freedesktop.org
Cc: linux-i2c@vger.kernel.org
Link: http://lkml.kernel.org/r/20170630170934.83028-3-andriy.shevchenko@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

31952011

x86/io: Define IO accessors by preprocessor · 80b9ece1

由 Andy Shevchenko 提交于 6月 30, 2017

As a preparatory to use generic IO accessor helpers we need to define
architecture dependent functions via preprocessor to let world know we
have them.
Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: NWolfram Sang <wsa@the-dreams.de>
Cc: Baolin Wang <baolin.wang@spreadtrum.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: intel-gfx@lists.freedesktop.org
Cc: linux-i2c@vger.kernel.org
Link: http://lkml.kernel.org/r/20170630170934.83028-2-andriy.shevchenko@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

80b9ece1

18 7月, 2017 7 次提交

x86/asm: Add unwind hint annotations to sync_core() · 76846bf3

由 Josh Poimboeuf 提交于 7月 11, 2017

This enables objtool to grok the iret in the middle of a C function.
Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/b057be26193c11d2ed3337b2107bc7adcba42c99.1499786555.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

76846bf3

x86/entry/64: Add unwind hint annotations · 8c1f7558

由 Josh Poimboeuf 提交于 7月 11, 2017

Add unwind hint annotations to entry_64.S.  This will enable the ORC
unwinder to unwind through any location in the entry code including
syscalls, interrupts, and exceptions.
Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/b9f6d478aadf68ba57c739dcfac34ec0dc021c4c.1499786555.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

8c1f7558

objtool, x86: Add facility for asm code to provide unwind hints · 39358a03

由 Josh Poimboeuf 提交于 7月 11, 2017

Some asm (and inline asm) code does special things to the stack which
objtool can't understand.  (Nor can GCC or GNU assembler, for that
matter.)  In such cases we need a facility for the code to provide
annotations, so the unwinder can unwind through it.

This provides such a facility, in the form of unwind hints.  They're
similar to the GNU assembler .cfi* directives, but they give more
information, and are needed in far fewer places, because objtool can
fill in the blanks by following branches and adjusting the stack pointer
for pushes and pops.
Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/0f5f3c9104fca559ff4088bece1d14ae3bca52d5.1499786555.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

39358a03

x86/dumpstack: Fix interrupt and exception stack boundary checks · 5a3cf869

由 Josh Poimboeuf 提交于 7月 11, 2017

On x86_64, the double fault exception stack is located immediately after
the interrupt stack in memory.  This causes confusion in the unwinder
when it tries to unwind through an empty interrupt stack, where the
stack pointer points to the address bordering the two stacks.  The
unwinder incorrectly thinks it's running on the double fault stack.

Fix this kind of stack border confusion by never considering the
beginning address of an exception or interrupt stack to be part of the
stack.
Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Fixes: 5fe599e0 ("x86/dumpstack: Add support for unwinding empty IRQ stacks")
Link: http://lkml.kernel.org/r/bcc142160a5104de5c354c21c394c93a0173943f.1499786555.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

5a3cf869

x86/dumpstack: Fix occasionally missing registers · b0529bec

由 Josh Poimboeuf 提交于 7月 11, 2017

If two consecutive stack frames have pt_regs, the oops dump code fails
to print the second frame's registers.  Fix that.
Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Fixes: 3b3fa11b ("x86/dumpstack: Print any pt_regs found on the stack")
Link: http://lkml.kernel.org/r/269c5c00c7d45c699f3dcea42a3a594c6cf7a9a3.1499786555.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

b0529bec

x86/entry/64: Initialize the top of the IRQ stack before switching stacks · 29955909

由 Andy Lutomirski 提交于 7月 11, 2017

The OOPS unwinder wants the word at the top of the IRQ stack to
point back to the previous stack at all times when the IRQ stack
is in use.  There's currently a one-instruction window in ENTER_IRQ_STACK
during which this isn't the case.  Fix it by writing the old RSP to the
top of the IRQ stack before jumping.

This currently writes the pointer to the stack twice, which is a bit
ugly.  We could get rid of this by replacing irq_stack_ptr with
irq_stack_ptr_minus_eight (better name welcome).  OTOH, there may be
all kinds of odd microarchitectural considerations in play that
affect performance by a few cycles here.
Reported-by: NMike Galbraith <efault@gmx.de>
Reported-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/aae7e79e49914808440ad5310ace138ced2179ca.1499786555.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

29955909

x86/entry/64: Refactor IRQ stacks and make them NMI-safe · 1d3e53e8

由 Andy Lutomirski 提交于 7月 11, 2017

This will allow IRQ stacks to nest inside NMIs or similar entries
that can happen during IRQ stack setup or teardown.

The new macros won't work correctly if they're invoked with IRQs on.
Add a check under CONFIG_DEBUG_ENTRY to detect that.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
[ Use %r10 instead of %r11 in xen_do_hypervisor_callback to make objtool
  and ORC unwinder's lives a little easier. ]
Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/b0b2ff5fb97d2da2e1d7e1f380190c92545c8bb5.1499786555.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

1d3e53e8

14 7月, 2017 5 次提交

kvm: x86: hyperv: make VP_INDEX managed by userspace · d3457c87

由 Roman Kagan 提交于 7月 14, 2017

Hyper-V identifies vCPUs by Virtual Processor Index, which can be
queried via HV_X64_MSR_VP_INDEX msr.  It is defined by the spec as a
sequential number which can't exceed the maximum number of vCPUs per VM.
APIC ids can be sparse and thus aren't a valid replacement for VP
indices.

Current KVM uses its internal vcpu index as VP_INDEX.  However, to make
it predictable and persistent across VM migrations, the userspace has to
control the value of VP_INDEX.

This patch achieves that, by storing vp_index explicitly on vcpu, and
allowing HV_X64_MSR_VP_INDEX to be set from the host side.  For
compatibility it's initialized to KVM vcpu index.  Also a few variables
are renamed to make clear distinction betweed this Hyper-V vp_index and
KVM vcpu_id (== APIC id).  Besides, a new capability,
KVM_CAP_HYPERV_VP_INDEX, is added to allow the userspace to skip
attempting msr writes where unsupported, to avoid spamming error logs.
Signed-off-by: NRoman Kagan <rkagan@virtuozzo.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

d3457c87

KVM: async_pf: Let guest support delivery of async_pf from guest mode · 52a5c155

由 Wanpeng Li 提交于 7月 13, 2017

Adds another flag bit (bit 2) to MSR_KVM_ASYNC_PF_EN. If bit 2 is 1,
async page faults are delivered to L1 as #PF vmexits; if bit 2 is 0,
kvm_can_do_async_pf returns 0 if in guest mode.

This is similar to what svm.c wanted to do all along, but it is only
enabled for Linux as L1 hypervisor.  Foreign hypervisors must never
receive async page faults as vmexits, because they'd probably be very
confused about that.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

52a5c155

KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf · adfe20fb

由 Wanpeng Li 提交于 7月 13, 2017

Add an nested_apf field to vcpu->arch.exception to identify an async page
fault, and constructs the expected vm-exit information fields. Force a
nested VM exit from nested_vmx_check_exception() if the injected #PF is
async page fault.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

adfe20fb

KVM: async_pf: Add L1 guest async_pf #PF vmexit handler · 1261bfa3

由 Wanpeng Li 提交于 7月 13, 2017

This patch adds the L1 guest async page fault #PF vmexit handler, such
by L1 similar to ordinary async page fault.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
[Passed insn parameters to kvm_mmu_page_fault().]
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

1261bfa3

KVM: x86: Simplify kvm_x86_ops->queue_exception parameter list · cfcd20e5

由 Wanpeng Li 提交于 7月 13, 2017

This patch removes all arguments except the first in
kvm_x86_ops->queue_exception since they can extract the arguments from
vcpu->arch.exception themselves.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

cfcd20e5

13 7月, 2017 16 次提交

kvm: x86: hyperv: add KVM_CAP_HYPERV_SYNIC2 · efc479e6

由 Roman Kagan 提交于 6月 22, 2017

There is a flaw in the Hyper-V SynIC implementation in KVM: when message
page or event flags page is enabled by setting the corresponding msr,
KVM zeroes it out.  This is problematic because on migration the
corresponding MSRs are loaded on the destination, so the content of
those pages is lost.

This went unnoticed so far because the only user of those pages was
in-KVM hyperv synic timers, which could continue working despite that
zeroing.

Newer QEMU uses those pages for Hyper-V VMBus implementation, and
zeroing them breaks the migration.

Besides, in newer QEMU the content of those pages is fully managed by
QEMU, so zeroing them is undesirable even when writing the MSRs from the
guest side.

To support this new scheme, introduce a new capability,
KVM_CAP_HYPERV_SYNIC2, which, when enabled, makes sure that the synic
pages aren't zeroed out in KVM.
Signed-off-by: NRoman Kagan <rkagan@virtuozzo.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

efc479e6

KVM: x86: make backwards_tsc_observed a per-VM variable · a826faf1

由 Ladi Prosek 提交于 6月 26, 2017

The backwards_tsc_observed global introduced in commit 16a96021 is never
reset to false. If a VM happens to be running while the host is suspended
(a common source of the TSC jumping backwards), master clock will never
be enabled again for any VM. In contrast, if no VM is running while the
host is suspended, master clock is unaffected. This is inconsistent and
unnecessarily strict. Let's track the backwards_tsc_observed variable
separately and let each VM start with a clean slate.

Real world impact: My Windows VMs get slower after my laptop undergoes a
suspend/resume cycle. The only way to get the perf back is unloading and
reloading the kvm module.
Signed-off-by: NLadi Prosek <lprosek@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

a826faf1

x86/efi: move asmlinkage before return type · 0825f49f

由 Joe Perches 提交于 7月 12, 2017

Make the code like the rest of the kernel.

Link: http://lkml.kernel.org/r/1cd3d401626e51ea0e2333a860e76e80bc560a4c.1499284835.git.joe@perches.comSigned-off-by: NJoe Perches <joe@perches.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0825f49f

x86/mmap: properly account for stack randomization in mmap_base · c204d21f

由 Rik van Riel 提交于 7月 12, 2017

When RLIMIT_STACK is, for example, 256MB, the current code results in a
gap between the top of the task and mmap_base of 256MB, failing to take
into account the amount by which the stack address was randomized.  In
other words, the stack gets less than RLIMIT_STACK space.

Ensure that the gap between the stack and mmap_base always takes stack
randomization and the stack guard gap into account.

Obtained from Daniel Micay's linux-hardened tree.

Link: http://lkml.kernel.org/r/20170622200033.25714-2-riel@redhat.comSigned-off-by: NDaniel Micay <danielmicay@gmail.com>
Signed-off-by: NRik van Riel <riel@redhat.com>
Reported-by: NFlorian Weimer <fweimer@redhat.com>
Acked-by: NIngo Molnar <mingo@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Daniel Micay <danielmicay@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c204d21f

x86: ascii armor the x86_64 boot init stack canary · bf9eb544

由 Rik van Riel 提交于 7月 12, 2017

Use the ascii-armor canary to prevent unterminated C string overflows
from being able to successfully overwrite the canary, even if they
somehow obtain the canary value.

Inspired by execshield ascii-armor and Daniel Micay's linux-hardened
tree.

Link: http://lkml.kernel.org/r/20170524155751.424-4-riel@redhat.comSigned-off-by: NRik van Riel <riel@redhat.com>
Acked-by: NKees Cook <keescook@chromium.org>
Cc: Daniel Micay <danielmicay@gmail.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bf9eb544

include/linux/string.h: add the option of fortified string.h functions · 6974f0c4

由 Daniel Micay 提交于 7月 12, 2017

This adds support for compiling with a rough equivalent to the glibc
_FORTIFY_SOURCE=1 feature, providing compile-time and runtime buffer
overflow checks for string.h functions when the compiler determines the
size of the source or destination buffer at compile-time.  Unlike glibc,
it covers buffer reads in addition to writes.

GNU C __builtin_*_chk intrinsics are avoided because they would force a
much more complex implementation.  They aren't designed to detect read
overflows and offer no real benefit when using an implementation based
on inline checks.  Inline checks don't add up to much code size and
allow full use of the regular string intrinsics while avoiding the need
for a bunch of _chk functions and per-arch assembly to avoid wrapper
overhead.

This detects various overflows at compile-time in various drivers and
some non-x86 core kernel code.  There will likely be issues caught in
regular use at runtime too.

Future improvements left out of initial implementation for simplicity,
as it's all quite optional and can be done incrementally:

* Some of the fortified string functions (strncpy, strcat), don't yet
  place a limit on reads from the source based on __builtin_object_size of
  the source buffer.

* Extending coverage to more string functions like strlcat.

* It should be possible to optionally use __builtin_object_size(x, 1) for
  some functions (C strings) to detect intra-object overflows (like
  glibc's _FORTIFY_SOURCE=2), but for now this takes the conservative
  approach to avoid likely compatibility issues.

* The compile-time checks should be made available via a separate config
  option which can be enabled by default (or always enabled) once enough
  time has passed to get the issues it catches fixed.

Kees said:
 "This is great to have. While it was out-of-tree code, it would have
  blocked at least CVE-2016-3858 from being exploitable (improper size
  argument to strlcpy()). I've sent a number of fixes for
  out-of-bounds-reads that this detected upstream already"

[arnd@arndb.de: x86: fix fortified memcpy]
  Link: http://lkml.kernel.org/r/20170627150047.660360-1-arnd@arndb.de
[keescook@chromium.org: avoid panic() in favor of BUG()]
  Link: http://lkml.kernel.org/r/20170626235122.GA25261@beast
[keescook@chromium.org: move from -mm, add ARCH_HAS_FORTIFY_SOURCE, tweak Kconfig help]
Link: http://lkml.kernel.org/r/20170526095404.20439-1-danielmicay@gmail.com
Link: http://lkml.kernel.org/r/1497903987-21002-8-git-send-email-keescook@chromium.orgSigned-off-by: NDaniel Micay <danielmicay@gmail.com>
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Acked-by: NKees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6974f0c4

kernel/watchdog: split up config options · 05a4a952

由 Nicholas Piggin 提交于 7月 12, 2017

Split SOFTLOCKUP_DETECTOR from LOCKUP_DETECTOR, and split
HARDLOCKUP_DETECTOR_PERF from HARDLOCKUP_DETECTOR.

LOCKUP_DETECTOR implies the general boot, sysctl, and programming
interfaces for the lockup detectors.

An architecture that wants to use a hard lockup detector must define
HAVE_HARDLOCKUP_DETECTOR_PERF or HAVE_HARDLOCKUP_DETECTOR_ARCH.

Alternatively an arch can define HAVE_NMI_WATCHDOG, which provides the
minimum arch_touch_nmi_watchdog, and it otherwise does its own thing and
does not implement the LOCKUP_DETECTOR interfaces.

sparc is unusual in that it has started to implement some of the
interfaces, but not fully yet.  It should probably be converted to a full
HAVE_HARDLOCKUP_DETECTOR_ARCH.

[npiggin@gmail.com: fix]
  Link: http://lkml.kernel.org/r/20170617223522.66c0ad88@roar.ozlabs.ibm.com
Link: http://lkml.kernel.org/r/20170616065715.18390-4-npiggin@gmail.comSigned-off-by: NNicholas Piggin <npiggin@gmail.com>
Reviewed-by: NDon Zickus <dzickus@redhat.com>
Reviewed-by: NBabu Moger <babu.moger@oracle.com>
Tested-by: Babu Moger <babu.moger@oracle.com>	[sparc]
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

05a4a952

kexec: move vmcoreinfo out of the kernel's .bss section · 203e9e41

由 Xunlei Pang 提交于 7月 12, 2017

As Eric said,
 "what we need to do is move the variable vmcoreinfo_note out of the
  kernel's .bss section. And modify the code to regenerate and keep this
  information in something like the control page.

  Definitely something like this needs a page all to itself, and ideally
  far away from any other kernel data structures. I clearly was not
  watching closely the data someone decided to keep this silly thing in
  the kernel's .bss section."

This patch allocates extra pages for these vmcoreinfo_XXX variables, one
advantage is that it enhances some safety of vmcoreinfo, because
vmcoreinfo now is kept far away from other kernel data structures.

Link: http://lkml.kernel.org/r/1493281021-20737-1-git-send-email-xlpang@redhat.comSigned-off-by: NXunlei Pang <xlpang@redhat.com>
Tested-by: NMichael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: NJuergen Gross <jgross@suse.com>
Suggested-by: NEric Biederman <ebiederm@xmission.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

203e9e41

KVM: SVM: Enable Virtual VMLOAD VMSAVE feature · 89c8a498

由 Janakarajan Natarajan 提交于 7月 06, 2017

Enable the Virtual VMLOAD VMSAVE feature. This is done by setting bit 1
at position B8h in the vmcb.

The processor must have nested paging enabled, be in 64-bit mode and
have support for the Virtual VMLOAD VMSAVE feature for the bit to be set
in the vmcb.
Signed-off-by: NJanakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

89c8a498

KVM: SVM: Add Virtual VMLOAD VMSAVE feature definition · 76ff3592

由 Janakarajan Natarajan 提交于 7月 06, 2017

Define a new cpufeature definition for Virtual VMLOAD VMSAVE.
Signed-off-by: NJanakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

76ff3592

KVM: SVM: Rename lbr_ctl field in the vmcb control area · 0dc92119

由 Janakarajan Natarajan 提交于 7月 06, 2017

Rename the lbr_ctl variable to better reflect the purpose of the field -
provide support for virtualization extensions.
Signed-off-by: NJanakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

0dc92119

KVM: SVM: Prepare for new bit definition in lbr_ctl · 8a77e909

由 Janakarajan Natarajan 提交于 7月 06, 2017

The lbr_ctl variable in the vmcb control area is used to enable or
disable Last Branch Record (LBR) virtualization. However, this is to be
done using only bit 0 of the variable. To correct this and to prepare
for a new feature, change the current usage to work only on a particular
bit.
Signed-off-by: NJanakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

8a77e909

KVM: SVM: handle singlestep exception when skipping emulated instructions · b742c1e6

由 Ladi Prosek 提交于 6月 22, 2017

kvm_skip_emulated_instruction handles the singlestep debug exception
which is something we almost always want. This commit (specifically
the change in rdmsr_interception) makes the debug.flat KVM unit test
pass on AMD.

Two call sites still call skip_emulated_instruction directly:

* In svm_queue_exception where it's used only for moving the rip forward

* In task_switch_interception which is analogous to handle_task_switch
  in VMX
Signed-off-by: NLadi Prosek <lprosek@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

b742c1e6

KVM: x86: take slots_lock in kvm_free_pit · fb530729

由 Radim Krčmář 提交于 7月 10, 2017

kvm_vm_release() did not have slots_lock when calling
kvm_io_bus_unregister_dev() and this went unnoticed until 4a12f951
("KVM: mark kvm->busses as rcu protected") added dynamic checks.
Luckily, there should be no race at that point:

  =============================
  WARNING: suspicious RCU usage
  4.12.0.kvm+ #0 Not tainted
  -----------------------------
  ./include/linux/kvm_host.h:479 suspicious rcu_dereference_check() usage!

   lockdep_rcu_suspicious+0xc5/0x100
   kvm_io_bus_unregister_dev+0x173/0x190 [kvm]
   kvm_free_pit+0x28/0x80 [kvm]
   kvm_arch_sync_events+0x2d/0x30 [kvm]
   kvm_put_kvm+0xa7/0x2a0 [kvm]
   kvm_vm_release+0x21/0x30 [kvm]
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

fb530729

kvm: vmx: Properly handle machine check during VM-entry · 48ae0fb4

由 Jim Mattson 提交于 5月 22, 2017

vmx_complete_atomic_exit should call kvm_machine_check for any
VM-entry failure due to a machine-check event. Such an exit should be
recognized solely by its basic exit reason (i.e. the low 16 bits of
the VMCS exit reason field). None of the other VMCS exit information
fields contain valid information when the VM-exit is due to "VM-entry
failure due to machine-check event".
Signed-off-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NXiao Guangrong <xiaoguangrong@tencent.com>
[Changed VM_EXIT_INTR_INFO condition to better describe its reason.]
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

48ae0fb4

KVM: x86: update master clock before computing kvmclock_offset · 0bc48bea

由 Radim Krčmář 提交于 5月 16, 2017

kvm master clock usually has a different frequency than the kernel boot
clock.  This is not a problem until the master clock is updated;
update uses the current kernel boot clock to compute new kvm clock,
which erases any kvm clock cycles that might have built up due to
frequency difference over a long period.

KVM_SET_CLOCK is one of places where we can safely update master clock
as the guest-visible clock is going to be shifted anyway.

The problem with current code is that it updates the kvm master clock
after updating the offset.  If the master clock was enabled before
calling KVM_SET_CLOCK, then it might have built up a significant delta
from kernel boot clock.
In the worst case, the time set by userspace would be shifted by so much
that it couldn't have been set at any point during KVM_SET_CLOCK.

To fix this, move kvm_gen_update_masterclock() before computing
kvmclock_offset, which means that the master clock and kernel boot clock
will be sufficiently close together.
Another solution would be to replace get_kvmclock_ns() with
"ktime_get_boot_ns() + ka->kvmclock_offset", which is marginally more
accurate, but would break symmetry with KVM_GET_CLOCK.
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

0bc48bea

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功