1. 30 September 2017 (1 commit)
  2. 29 September 2017 (1 commit)
    • x86/asm: Fix inline asm call constraints for GCC 4.4 · 520a13c5
      Committed by Josh Poimboeuf
      The kernel test bot (run by Xiaolong Ye) reported that the following commit:
      
        f5caf621 ("x86/asm: Fix inline asm call constraints for Clang")
      
      is causing double faults in a kernel compiled with GCC 4.4.
      
      Linus subsequently diagnosed the crash pattern, identified the buggy
      commit, and found that the issue is with this code:
      
        register unsigned int __asm_call_sp asm("esp");
        #define ASM_CALL_CONSTRAINT "+r" (__asm_call_sp)
      
      Even on a 64-bit kernel, it's using ESP instead of RSP.  That causes GCC
      to produce the following bogus code:
      
        ffffffff8147461d:       89 e0                   mov    %esp,%eax
        ffffffff8147461f:       4c 89 f7                mov    %r14,%rdi
        ffffffff81474622:       4c 89 fe                mov    %r15,%rsi
        ffffffff81474625:       ba 20 00 00 00          mov    $0x20,%edx
        ffffffff8147462a:       89 c4                   mov    %eax,%esp
        ffffffff8147462c:       e8 bf 52 05 00          callq  ffffffff814c98f0 <copy_user_generic_unrolled>
      
      Despite the absurdity of the compiler backing up and restoring the
      stack pointer for no reason, the actual bug is that it only backs up
      and restores the lower 32 bits of the stack pointer.  The upper 32
      bits get cleared out, corrupting the stack pointer.
      
      So change the '__asm_call_sp' register variable to be associated with
      the actual full-size stack pointer.
      
      This also requires changing the __ASM_SEL() macro to be based on the
      actual compiled arch size, rather than the CONFIG value, because
      CONFIG_X86_64 compiles some files with '-m32' (e.g., realmode and vdso).
      Otherwise Clang fails to build the kernel because it complains about the
      use of a 64-bit register (RSP) in a 32-bit file.
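      
      A hedged sketch of the resulting definitions (names follow the upstream
      patch, but treat the details as illustrative):
      
        /* _ASM_SP now selects "esp"/"rsp" from __x86_64__, i.e. the arch
         * actually being compiled, so -m32 objects built under
         * CONFIG_X86_64 (realmode, vdso) still get the 32-bit register. */
        register unsigned long current_stack_pointer asm(_ASM_SP);
        #define ASM_CALL_CONSTRAINT "+r" (current_stack_pointer)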
      Reported-and-Bisected-and-Tested-by: kernel test robot <xiaolong.ye@intel.com>
      Diagnosed-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dmitriy Vyukov <dvyukov@google.com>
      Cc: LKP <lkp@01.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthias Kaehlcke <mka@chromium.org>
      Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: f5caf621 ("x86/asm: Fix inline asm call constraints for Clang")
      Link: http://lkml.kernel.org/r/20170928215826.6sdpmwtkiydiytim@treble
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  3. 25 September 2017 (1 commit)
  4. 23 September 2017 (1 commit)
    • x86/asm: Fix inline asm call constraints for Clang · f5caf621
      Committed by Josh Poimboeuf
      For inline asm statements which have a CALL instruction, we list the
      stack pointer as a constraint to convince GCC to ensure the frame
      pointer is set up first:
      
        static inline void foo()
        {
      	register void *__sp asm(_ASM_SP);
      	asm("call bar" : "+r" (__sp));
        }
      
      Unfortunately, that pattern causes Clang to corrupt the stack pointer.
      
      The fix is easy: convert the stack pointer register variable to a global
      variable.
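      
      As a hedged sketch, the pattern introduced by this patch looks roughly
      like this (the register variable and the constraint macro live in a
      header at file scope; this is the same code quoted in the GCC 4.4 fix
      above):
      
        register unsigned int __asm_call_sp asm("esp");
        #define ASM_CALL_CONSTRAINT "+r" (__asm_call_sp)
      
        static inline void foo(void)
        {
                asm("call bar" : ASM_CALL_CONSTRAINT);
        }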
      
      It should be noted that the end result is different based on the GCC
      version.  With GCC 6.4, this patch has exactly the same result as
      before:
      
      	defconfig	defconfig-nofp	distro		distro-nofp
       before	9820389		9491555		8816046		8516940
       after	9820389		9491555		8816046		8516940
      
      With GCC 7.2, however, GCC's behavior has changed.  It now changes its
      behavior based on the conversion of the register variable to a global.
      That somehow convinces it to *always* set up the frame pointer before
      inserting *any* inline asm.  (Therefore, listing the variable as an
      output constraint is a no-op and is no longer necessary.)  It's a bit
      overkill, but the performance impact should be negligible.  And in fact,
      there's a nice improvement with frame pointers disabled:
      
      	defconfig	defconfig-nofp	distro		distro-nofp
       before	9796316		9468236		9076191		8790305
       after	9796957		9464267		9076381		8785949
      
      So in summary, while listing the stack pointer as an output constraint
      is no longer necessary for newer versions of GCC, it's still needed for
      older versions.
      Suggested-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Reported-by: Matthias Kaehlcke <mka@chromium.org>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dmitriy Vyukov <dvyukov@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/3db862e970c432ae823cf515c52b54fec8270e0e.1505942196.git.jpoimboe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  5. 18 September 2017 (2 commits)
    • x86/mm/64: Stop using CR3.PCID == 0 in ASID-aware code · 52a2af40
      Committed by Andy Lutomirski
      Putting the logical ASID into CR3's PCID bits directly means that we
      have two cases to consider separately: ASID == 0 and ASID != 0.
      This means that bugs that only hit in one of these cases trigger
      nondeterministically.
      
      There were some bugs like this in the past, and I think there's
      still one in current kernels.  In particular, we have a number of
      ASID-unaware code paths that save CR3, write some special value, and
      then restore CR3.  This includes suspend/resume, hibernate, kexec,
      EFI, and maybe other things I've missed.  This is currently
      dangerous: if ASID != 0, then this code sequence will leave garbage
      in the TLB tagged for ASID 0.  We could potentially see corruption
      when switching back to ASID 0.  In principle, an
      initialize_tlbstate_and_flush() call after these sequences would
      solve the problem, but EFI, at least, does not call this.  (And it
      probably shouldn't -- initialize_tlbstate_and_flush() is rather
      expensive.)
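      
      A hedged sketch of the approach (paraphrasing the patch; names and
      feature checks are illustrative):
      
        /* ASID-aware code puts ASID+1 into CR3's PCID bits, reserving
         * hardware PCID 0 for ASID-unaware save/modify/restore sequences.
         * Bugs that mix the two now trigger deterministically. */
        static inline unsigned long build_cr3(pgd_t *pgd, u16 asid)
        {
                if (static_cpu_has(X86_FEATURE_PCID))
                        return __sme_pa(pgd) | (asid + 1);
                VM_WARN_ON_ONCE(asid != 0);  /* without PCID, only ASID 0 */
                return __sme_pa(pgd);
        }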
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/cdc14bbe5d3c3ef2a562be09a6368ffe9bd947a6.1505663533.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mm: Factor out CR3-building code · 47061a24
      Committed by Andy Lutomirski
      Currently, the code that assembles a value to load into CR3 is
      open-coded everywhere.  Factor it out into helpers build_cr3() and
      build_cr3_noflush().
      
      This makes one semantic change: __get_current_cr3_fast() was wrong
      on SME systems.  No one noticed because the only caller is in the
      VMX code, and there are no CPUs with both SME and VMX.
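      
      A hedged sketch of the factored-out helpers (the noflush constant name
      is illustrative; the hardware bit is bit 63 of CR3):
      
        #define CR3_NOFLUSH     (1UL << 63)
      
        static inline unsigned long build_cr3(pgd_t *pgd, u16 asid)
        {
                return __sme_pa(pgd) | asid;
        }
      
        static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid)
        {
                return __sme_pa(pgd) | asid | CR3_NOFLUSH;
        }
      
      Note __sme_pa() rather than __pa(): folding the SME encryption mask in
      here is the semantic change that fixes __get_current_cr3_fast() on SME
      systems.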
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
      Link: http://lkml.kernel.org/r/ce350cf11e93e2842d14d0b95b0199c7d881f527.1505663533.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  6. 14 September 2017 (1 commit)
  7. 13 September 2017 (3 commits)
  8. 11 September 2017 (1 commit)
  9. 09 September 2017 (4 commits)
    • x86: implement memset16, memset32 & memset64 · 4c512485
      Committed by Matthew Wilcox
      These are single instructions on x86.  There's no 64-bit instruction for
      x86-32, but we don't yet have any user for memset64() on 32-bit
      architectures, so don't bother to implement it.
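      
      A hedged sketch of the x86-64 flavor: each variant is a single rep-stos
      of the element width (the in-kernel versions differ in minor details):
      
        static inline void *memset16(uint16_t *s, uint16_t v, size_t n)
        {
                long d0, d1;
      
                /* rcx = count, rdi = dest, ax = fill value */
                asm volatile("rep stosw"
                             : "=&c" (d0), "=&D" (d1)
                             : "a" (v), "1" (s), "0" (n)
                             : "memory");
                return s;
        }
      
      memset32() and memset64() are identical except for "rep stosl" /
      "rep stosq" and the pointer/value types.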
      
      Link: http://lkml.kernel.org/r/20170720184539.31609-4-willy@infradead.org
      Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Russell King <rmk+kernel@armlinux.org.uk>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: soft-dirty: keep soft-dirty bits over thp migration · ab6e3d09
      Committed by Naoya Horiguchi
      The soft-dirty bit is designed to be preserved across page migration.  This
      patch makes it work the same way for thp migration too.
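      
      A hedged sketch of the x86 helpers this adds, mirroring the existing
      PTE variants (the soft-dirty bit is carried in the non-present PMD
      format used by migration entries):
      
        static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd)
        {
                return pmd_set_flags(pmd, _PAGE_SWP_SOFT_DIRTY);
        }
      
        static inline int pmd_swp_soft_dirty(pmd_t pmd)
        {
                return pmd_flags(pmd) & _PAGE_SWP_SOFT_DIRTY;
        }
      
        static inline pmd_t pmd_swp_clear_soft_dirty(pmd_t pmd)
        {
                return pmd_clear_flags(pmd, _PAGE_SWP_SOFT_DIRTY);
        }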
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Nellans <dnellans@nvidia.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: thp: enable thp migration in generic path · 616b8371
      Committed by Zi Yan
      Add thp migration's core code, including conversions between a PMD
      entry and a swap entry, setting a PMD migration entry, removing a PMD
      migration entry, and waiting on PMD migration entries.
      
      This patch makes it possible to support thp migration.  If you fail to
      allocate a destination page as a thp, you just split the source thp as
      we do now and then enter normal page migration.  If you succeed in
      allocating a destination thp, you enter thp migration.  Subsequent
      patches actually enable thp migration for each caller of page migration
      by allowing its get_new_page() callback to allocate thps.
      
      [zi.yan@cs.rutgers.edu: fix gcc-4.9.0 -Wmissing-braces warning]
        Link: http://lkml.kernel.org/r/A0ABA698-7486-46C3-B209-E95A9048B22C@cs.rutgers.edu
      [akpm@linux-foundation.org: fix x86_64 allnoconfig warning]
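      
      A hedged sketch of the PMD <-> swap-entry conversions at the core of
      this change (paraphrased; the arch-level __pmd_to_swp_entry() and
      friends do the actual bit packing):
      
        static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
        {
                swp_entry_t arch_entry = __pmd_to_swp_entry(pmd);
      
                return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
        }
      
        static inline pmd_t swp_entry_to_pmd(swp_entry_t entry)
        {
                swp_entry_t arch_entry = __swp_entry(swp_type(entry),
                                                     swp_offset(entry));
      
                return __swp_entry_to_pmd(arch_entry);
        }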
      Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Nellans <dnellans@nvidia.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: x86: move _PAGE_SWP_SOFT_DIRTY from bit 7 to bit 1 · eee4818b
      Committed by Naoya Horiguchi
      _PAGE_PSE is used to distinguish between a truly non-present
      (_PAGE_PRESENT=0) PMD, and a PMD which is undergoing a THP split and
      should be treated as present.
      
      But _PAGE_SWP_SOFT_DIRTY currently uses the _PAGE_PSE bit, which would
      cause confusion between one of those PMDs undergoing a THP split and a
      soft-dirty PMD.  Dropping the _PAGE_PSE check in pmd_present() does not
      work well, because it would hurt the TLB-handling optimization in THP
      split.
      
      Thus, we need to move the bit.
      
      In the current kernel, bits 1-4 are not used in the non-present format
      since commit 00839ee3 ("x86/mm: Move swap offset/type up in PTE to work
      around erratum").  So let's move _PAGE_SWP_SOFT_DIRTY to bit 1.  Bit 7
      is kept reserved (always clear), so please don't use it for any other
      purpose.
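      
      A hedged sketch of the x86 definition after the move (comments are
      editorial):
      
        #ifdef CONFIG_MEM_SOFT_DIRTY
        /* bit 1 (_PAGE_RW) is free in the non-present format; bit 7
         * (_PAGE_PSE) stays reserved and always clear. */
        #define _PAGE_SWP_SOFT_DIRTY    _PAGE_RW
        #else
        #define _PAGE_SWP_SOFT_DIRTY    (_AT(pteval_t, 0))
        #endif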
      
      Link: http://lkml.kernel.org/r/20170717193955.20207-3-zi.yan@sent.com
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
      Acked-by: Dave Hansen <dave.hansen@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: David Nellans <dnellans@nvidia.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 07 September 2017 (3 commits)
  11. 01 September 2017 (3 commits)
    • KVM: update to new mmu_notifier semantic v2 · fb1522e0
      Committed by Jérôme Glisse
      Calls to mmu_notifier_invalidate_page() were replaced by calls to
      mmu_notifier_invalidate_range() and are now bracketed by calls to
      mmu_notifier_invalidate_range_start()/end().
      
      Remove the now-useless invalidate_page callback.
      
      Changed since v1 (Linus Torvalds)
          - remove the now-useless kvm_arch_mmu_notifier_invalidate_page()
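      
      A hedged sketch of the new calling convention at an invalidation site
      (the function name is hypothetical; locking and the actual teardown
      are elided):
      
        static void example_invalidate(struct mm_struct *mm,
                                       unsigned long start, unsigned long end)
        {
                mmu_notifier_invalidate_range_start(mm, start, end);
                /* ... tear down the mappings for [start, end) ... */
                mmu_notifier_invalidate_range(mm, start, end);
                mmu_notifier_invalidate_range_end(mm, start, end);
        }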
      Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      Tested-by: Mike Galbraith <efault@gmx.de>
      Tested-by: Adam Borowski <kilobyte@angband.pl>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: kvm@vger.kernel.org
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • libnvdimm, nd_blk: remove mmio_flush_range() · 5deb67f7
      Committed by Robin Murphy
      mmio_flush_range() suffers from a lack of clearly-defined semantics and
      is ambiguous to port to other architectures, where the scope of the
      writeback implied by "flush", and its ordering, might matter; "MMIO"
      would tend to imply non-cacheable anyway. Per the rationale
      in 67a3e8fe ("nd_blk: change aperture mapping from WC to WB"), the
      only existing use is actually to invalidate clean cache lines for
      ARCH_MEMREMAP_PMEM type mappings *without* writeback. Since the recent
      cleanup of the pmem API, that also now happens to be the exact purpose
      of arch_invalidate_pmem(), which would be a far more well-defined tool
      for the job.
      
      Rather than risk potentially inconsistent implementations of
      mmio_flush_range() for the sake of one callsite, streamline things by
      removing it entirely and instead move the ARCH_MEMREMAP_PMEM related
      definitions up to the libnvdimm level, so they can be shared by NFIT
      as well. This allows NFIT to be enabled for arm64.
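      
      A hedged sketch of the substitution at the lone call site (the wrapper
      name is hypothetical):
      
        static void nd_blk_invalidate(void *addr, size_t size)
        {
                /* was: mmio_flush_range(addr, size); invalidate clean
                 * lines for the WB-mapped aperture, no writeback needed */
                arch_invalidate_pmem(addr, size);
        }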
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • x86/xen: Get rid of paravirt op adjust_exception_frame · 5878d5d6
      Committed by Juergen Gross
      When running as a Xen pv-guest, the exception frame on the stack
      contains %r11 and %rcx in addition to the other data pushed by the
      processor.
      
      Instead of having a paravirt op called for each exception type, prepend
      the Xen-specific code to each exception entry.  When running as a Xen
      pv-guest, use the exception entry with the prepended instructions;
      otherwise use the entry without the Xen-specific code.
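      
      A hedged sketch of one prepended entry stub (file-scope asm standing in
      for the assembler macro upstream uses to generate these): Xen pushes
      %rcx and %r11 on top of the hardware exception frame, so pop them and
      fall through to the normal entry point.
      
        asm(".globl xen_divide_error\n"
            "xen_divide_error:\n\t"
            "pop %rcx\n\t"
            "pop %r11\n\t"
            "jmp divide_error\n");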
      
      [ tglx: Merged through tip to avoid ugly merge conflict ]
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: xen-devel@lists.xenproject.org
      Cc: boris.ostrovsky@oracle.com
      Cc: luto@amacapital.net
      Link: http://lkml.kernel.org/r/20170831174249.26853-1-jg@pfupf.net
  12. 31 August 2017 (6 commits)
  13. 29 August 2017 (13 commits)