1. 12 Apr, 2018 1 commit
    • x86/mm: Do not auto-massage page protections · fb43d6cb
      Dave Hansen authored
      A PTE is constructed from a physical address and a pgprotval_t.
      __PAGE_KERNEL, for instance, is a pgprot_t and must be converted
      into a pgprotval_t before it can be used to create a PTE.  This is
      done implicitly within functions like pfn_pte() by massage_pgprot().
      
      However, this makes it very challenging to set bits (and keep them
      set) if your bit is being filtered out by massage_pgprot().
      
      This moves the bit filtering out of pfn_pte() and friends.  Users of
      PAGE_KERNEL* get filtering automatically inside those macros, but
      users of __PAGE_KERNEL* now need to do their own filtering.
      
      Note that we also just move pfn_pte/pmd/pud() over to check_pgprot()
      instead of massage_pgprot().  This way, we still *look* for
      unsupported bits and properly warn about them if we find them.  This
      might happen if an unfiltered __PAGE_KERNEL* value was passed in,
      for instance.
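
The split between filtering and checking described above can be sketched in plain C. This is a simplified user-space model, not the kernel's implementation: the `ex_`/`EX_` names are hypothetical, the supported mask is passed as a parameter rather than read from a global, and the real helpers may handle more cases than shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for the kernel's pgprotval_t (assumption). */
typedef uint64_t ex_pgprotval_t;

/* x86 PTE flag bit positions used for the example. */
#define EX_PAGE_PRESENT ((ex_pgprotval_t)1 << 0)
#define EX_PAGE_RW      ((ex_pgprotval_t)1 << 1)
#define EX_PAGE_GLOBAL  ((ex_pgprotval_t)1 << 8)
#define EX_PAGE_NX      ((ex_pgprotval_t)1 << 63)

/* Explicit filtering: what a user of a raw value now has to do itself. */
static inline ex_pgprotval_t ex_filter_prot(ex_pgprotval_t raw,
                                            ex_pgprotval_t supported_mask)
{
    return raw & supported_mask;
}

/* Check only: report whether an unsupported bit is still set, which is
 * the situation the commit says should produce a warning. */
static inline bool ex_has_unsupported_bits(ex_pgprotval_t prot,
                                           ex_pgprotval_t supported_mask)
{
    return (prot & ~supported_mask) != 0;
}
```

With a supported mask that lacks the NX bit, a raw value carrying NX is flagged by the check, while the same value passed through the filter first is not.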
      
      - printk format warning fix from: Arnd Bergmann <arnd@arndb.de>
      - boot crash fix from:            Tom Lendacky <thomas.lendacky@amd.com>
      - crash bisected by:              Mike Galbraith <efault@gmx.de>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reported-and-fixed-by: Arnd Bergmann <arnd@arndb.de>
      Fixed-by: Tom Lendacky <thomas.lendacky@amd.com>
      Bisected-by: Mike Galbraith <efault@gmx.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20180406205509.77E1D7F6@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 10 Apr, 2018 1 commit
    • x86/mm: Introduce "default" kernel PTE mask · 8a57f484
      Dave Hansen authored
      The __PAGE_KERNEL_* page permissions are "raw".  They contain bits
      that may or may not be supported on the current processor.  They need
      to be filtered by a mask (currently __supported_pte_mask) to turn them
      into a value that we can actually set in a PTE.
      
      These __PAGE_KERNEL_* values all contain _PAGE_GLOBAL.  But, with PTI,
      we want to be able to support _PAGE_GLOBAL (have the bit set in
      __supported_pte_mask) but not have it appear in any of these masks by
      default.
      
      This patch creates a new mask, __default_kernel_pte_mask, and applies
      it when creating all of the PAGE_KERNEL_* masks.  This makes
      PAGE_KERNEL_* safe to use anywhere (they only contain supported bits).
      It also ensures that PAGE_KERNEL_* contains _PAGE_GLOBAL on PTI=n
      kernels but clears _PAGE_GLOBAL when PTI=y.
      
      We also make __default_kernel_pte_mask a non-GPL exported symbol
      because there are plenty of driver-available interfaces that take
      PAGE_KERNEL_* permissions.
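
As a rough illustration of the idea, the default mask can be modeled as the supported mask with the global bit cleared when PTI is on. The sketch below is a user-space model under that assumption; the `demo_`/`DEMO_` names and the `pti_enabled` parameter are hypothetical and do not reflect how the kernel actually computes or stores `__default_kernel_pte_mask`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for a PTE value type (assumption). */
typedef uint64_t demo_pteval_t;

#define DEMO_PAGE_PRESENT ((demo_pteval_t)1 << 0)
#define DEMO_PAGE_RW      ((demo_pteval_t)1 << 1)
#define DEMO_PAGE_GLOBAL  ((demo_pteval_t)1 << 8)

/* "Raw" permission set: always contains the global bit. */
#define DEMO_RAW_PAGE_KERNEL \
    (DEMO_PAGE_PRESENT | DEMO_PAGE_RW | DEMO_PAGE_GLOBAL)

/* Derive the default mask: supported bits, minus global when PTI is on. */
static inline demo_pteval_t demo_default_kernel_mask(demo_pteval_t supported_mask,
                                                     bool pti_enabled)
{
    demo_pteval_t mask = supported_mask;

    if (pti_enabled)
        mask &= ~DEMO_PAGE_GLOBAL;
    return mask;
}

/* Filtered permission set: the raw value with the default mask applied. */
static inline demo_pteval_t demo_page_kernel(demo_pteval_t default_mask)
{
    return DEMO_RAW_PAGE_KERNEL & default_mask;
}
```

The global bit stays supported in both configurations; only the default applied to the filtered permission sets changes.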
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20180406205506.030DB6B6@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  3. 05 Apr, 2018 1 commit
    • x86/uapi: Fix asm/bootparam.h userspace compilation errors · 9820e1c3
      Dmitry V. Levin authored
      Consistently use types provided by <linux/types.h> to fix the following
      asm/bootparam.h userspace compilation errors:
      
      	/usr/include/asm/bootparam.h:140:2: error: unknown type name 'u16'
      	  u16 version;
      	/usr/include/asm/bootparam.h:141:2: error: unknown type name 'u16'
      	  u16 compatible_version;
      	/usr/include/asm/bootparam.h:142:2: error: unknown type name 'u16'
      	  u16 pm_timer_address;
      	/usr/include/asm/bootparam.h:143:2: error: unknown type name 'u16'
      	  u16 num_cpus;
      	/usr/include/asm/bootparam.h:144:2: error: unknown type name 'u64'
      	  u64 pci_mmconfig_base;
      	/usr/include/asm/bootparam.h:145:2: error: unknown type name 'u32'
      	  u32 tsc_khz;
      	/usr/include/asm/bootparam.h:146:2: error: unknown type name 'u32'
      	  u32 apic_khz;
      	/usr/include/asm/bootparam.h:147:2: error: unknown type name 'u8'
      	  u8 standard_ioapic;
      	/usr/include/asm/bootparam.h:148:2: error: unknown type name 'u8'
      	  u8 cpu_ids[255];
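
For illustration, a userspace-visible struct that avoids these errors would use the double-underscore types from <linux/types.h>. The struct name below is hypothetical; the field names simply mirror the ones in the error output above, and the layout is a sketch rather than a copy of the kernel header.

```c
#include <linux/types.h>

/* Hypothetical UAPI-style struct. The bare u8/u16/u32/u64 names exist
 * only inside the kernel; the __u* variants are the exported ones. */
struct example_setup_data {
    __u16 version;
    __u16 compatible_version;
    __u16 pm_timer_address;
    __u16 num_cpus;
    __u64 pci_mmconfig_base;
    __u32 tsc_khz;
    __u32 apic_khz;
    __u8  standard_ioapic;
    __u8  cpu_ids[255];
};
```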
      Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
      Acked-by: Jan Kiszka <jan.kiszka@siemens.com>
      Cc: <stable@vger.kernel.org> # v4.16
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 4a362601 ("x86/jailhouse: Add infrastructure for running in non-root cell")
      Link: http://lkml.kernel.org/r/20180405043210.GA13254@altlinux.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  4. 29 Mar, 2018 1 commit
  5. 28 Mar, 2018 1 commit
  6. 27 Mar, 2018 2 commits
  7. 26 Mar, 2018 1 commit
  8. 24 Mar, 2018 1 commit
  9. 21 Mar, 2018 1 commit
    • kvm/x86: fix icebp instruction handling · 32d43cd3
      Linus Torvalds authored
      The undocumented 'icebp' instruction (aka 'int1') works pretty much like
      'int3' in the absence of in-circuit probing equipment (except,
      obviously, that it raises #DB instead of raising #BP), and is used by
      some validation test-suites as such.
      
      But Andy Lutomirski noticed that his test suite acted differently in kvm
      than on bare hardware.
      
      The reason is that kvm used an inexact test for the icebp instruction:
      it just assumed that an all-zero VM exit qualification value meant that
      the VM exit was due to icebp.
      
      That is not unlike the guess that do_debug() does for the actual
      exception handling case, but it's purely a heuristic, not an absolute
      rule.  do_debug() does it because it wants to ascribe _some_ reasons to
      the #DB that happened, and an empty %dr6 value means that 'icebp' is the
      most likely cause and we have no better information.
      
      But kvm can just do it right, because unlike the do_debug() case, kvm
      actually sees the real reason for the #DB in the VM-exit interruption
      information field.
      
      So instead of relying on an inexact heuristic, just use the actual VM
      exit information that says "it was 'icebp'".
      
      Right now the 'icebp' instruction isn't technically documented by Intel,
      but that will hopefully change.  The special "privileged software
      exception" information _is_ actually mentioned in the Intel SDM, even
      though the cause of it isn't enumerated.
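
A minimal sketch of such an exact test follows, assuming the interruption-information layout given in the Intel SDM (vector in bits 7:0, type in bits 10:8, valid in bit 31, type 5 being "privileged software exception"). The `ICEBP_EX_` macro names and the function name are hypothetical and are not taken from the kvm sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* VM-exit interruption information fields (layout per the Intel SDM). */
#define ICEBP_EX_VECTOR_MASK 0x000000ffu  /* bits 7:0:  vector           */
#define ICEBP_EX_TYPE_MASK   0x00000700u  /* bits 10:8: interruption type */
#define ICEBP_EX_VALID       0x80000000u  /* bit 31:    field is valid    */

#define ICEBP_EX_TYPE_HW_EXCEPTION      (3u << 8)
#define ICEBP_EX_TYPE_PRIV_SW_EXCEPTION (5u << 8)

/* True only when the exit reports a valid privileged software exception,
 * instead of guessing from an all-zero exit qualification. */
static inline bool icebp_ex_exit_is_icebp(uint32_t intr_info)
{
    return (intr_info & (ICEBP_EX_TYPE_MASK | ICEBP_EX_VALID)) ==
           (ICEBP_EX_TYPE_PRIV_SW_EXCEPTION | ICEBP_EX_VALID);
}
```

A #DB delivered as an ordinary hardware exception carries type 3 and is therefore not mistaken for 'icebp', even when the exit qualification happens to be zero.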
      Reported-by: Andy Lutomirski <luto@kernel.org>
      Tested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 20 Mar, 2018 8 commits
  11. 17 Mar, 2018 1 commit
  12. 16 Mar, 2018 1 commit
    • x86/tsc: Convert ART in nanoseconds to TSC · fc804f65
      Rajvi Jingar authored
      Device drivers use get_device_system_crosststamp() to produce precise
      system/device cross-timestamps. The PHC clock and ALSA interfaces, for
      example, make the cross-timestamps available to user applications.  On
      Intel platforms, get_device_system_crosststamp() requires a TSC value
      derived from ART (Always Running Timer) to compute the monotonic raw and
      realtime system timestamps.
      
      Starting with Intel Goldmont platforms, the PCIe root complex supports the
      PTM time sync protocol. PTM requires all timestamps to be in units of
      nanoseconds. The Intel root complex hardware propagates system time derived
      from ART in units of nanoseconds performing the conversion as follows:
      
           ART_NS = ART * 1e9 / <crystal frequency>
      
      When user software requests a cross-timestamp, the system timestamps
      (generally read from device registers) must be converted to TSC by the
      driver software as follows:
      
          TSC = ART_NS * TSC_KHZ / 1e6
      
      This is valid when CPU feature flag X86_FEATURE_TSC_KNOWN_FREQ is set
      indicating that tsc_khz is derived from CPUID[15H]. Drivers should check
      whether this flag is set before conversion to TSC is attempted.
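
The second formula can be sketched in C as follows. This is an illustrative user-space version with hypothetical names, not the kernel function: it splits the nanosecond value into a quotient and remainder so the intermediate product cannot overflow 64 bits, and it leaves the X86_FEATURE_TSC_KNOWN_FREQ check to the caller.

```c
#include <stdint.h>

#define TSC_EX_USEC_PER_SEC 1000000ull

/* Compute TSC = ART_NS * TSC_KHZ / 1e6 without a 128-bit multiply.
 * With art_ns = quot * 1e6 + rem, the result is
 * quot * tsc_khz + (rem * tsc_khz) / 1e6, which is exact. */
static inline uint64_t tsc_ex_art_ns_to_tsc(uint64_t art_ns, uint32_t tsc_khz)
{
    uint64_t quot = art_ns / TSC_EX_USEC_PER_SEC;
    uint64_t rem  = art_ns % TSC_EX_USEC_PER_SEC;

    /* rem < 1e6 and tsc_khz < 2^32, so rem * tsc_khz fits in 64 bits. */
    return quot * tsc_khz + (rem * tsc_khz) / TSC_EX_USEC_PER_SEC;
}
```

For example, one second of ART time (1e9 ns) at a tsc_khz of 2400000 corresponds to 2.4e9 TSC cycles.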
      Suggested-by: Christopher S. Hall <christopher.s.hall@intel.com>
      Signed-off-by: Rajvi Jingar <rajvi.jingar@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: peterz@infradead.org
      Link: https://lkml.kernel.org/r/1520530116-4925-1-git-send-email-rajvi.jingar@intel.com
  13. 14 Mar, 2018 1 commit
  14. 12 Mar, 2018 11 commits
  15. 09 Mar, 2018 1 commit
    • x86/kprobes: Fix kernel crash when probing .entry_trampoline code · c07a8f8b
      Francis Deslauriers authored
      Disable the kprobe probing of the entry trampoline:
      
      .entry_trampoline is a code area that is used to ensure page table
      isolation between userspace and kernelspace.
      
      At the beginning of the execution of the trampoline, we load the
      kernel's CR3 register. This has the effect of enabling the translation
      of the kernel virtual addresses to physical addresses. Before this
      happens, most kernel addresses cannot be translated because the running
      process's CR3 is still in use.
      
      If a kprobe is placed on the trampoline code before that change of the
      CR3 register happens, the kernel crashes because the int3 handling pages
      are not accessible.
      
      To fix this, add the .entry_trampoline section to the kprobe blacklist
      to prohibit the probing of code before all the kernel pages are
      accessible.
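
Conceptually, a blacklist of this kind is an address-range check performed before a probe is installed. The sketch below models that in user-space C; the struct, the function names, and the flat array of ranges are assumptions made for illustration and are not the kernel's kprobe data structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a code region; 'end' is exclusive. */
struct probe_ex_range {
    uintptr_t start;
    uintptr_t end;
};

/* True when addr lies inside the given region. */
static inline bool probe_ex_in_range(uintptr_t addr,
                                     const struct probe_ex_range *r)
{
    return addr >= r->start && addr < r->end;
}

/* Refuse to probe any address that falls in a blacklisted region,
 * such as a section that runs before kernel pages are accessible. */
static inline bool probe_ex_allowed(uintptr_t addr,
                                    const struct probe_ex_range *blacklist,
                                    size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (probe_ex_in_range(addr, &blacklist[i]))
            return false;
    }
    return true;
}
```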
      Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: mathieu.desnoyers@efficios.com
      Cc: mhiramat@kernel.org
      Link: http://lkml.kernel.org/r/1520565492-4637-2-git-send-email-francis.deslauriers@efficios.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  16. 08 Mar, 2018 7 commits