1. 11 January 2020, 5 commits
    • efi/x86: Simplify mixed mode call wrapper · ea5e1919
      By Ard Biesheuvel
      Calling 32-bit EFI runtime services from a 64-bit OS involves
      switching back to the flat mapping with a stack carved out of
      memory that is 32-bit addressable.
      
      There is no need to actually execute the 64-bit part of this
      routine from the flat mapping as well, as long as the entry
      and return address fit in 32 bits. There is also no need to
      preserve part of the calling context in global variables: we
      can simply push the old stack pointer value to the new stack,
      and keep the return address from the code32 section in EBX.
      
      While at it, move the conditional check whether to invoke
      the mixed mode version of SetVirtualAddressMap() into the
      64-bit implementation of the wrapper routine.
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arvind Sankar <nivedita@alum.mit.edu>
      Cc: Matthew Garrett <mjg59@google.com>
      Cc: linux-efi@vger.kernel.org
      Link: https://lkml.kernel.org/r/20200103113953.9571-11-ardb@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ea5e1919
    • efi/x86: Simplify i386 efi_call_phys() firmware call wrapper · a46d6740
      By Ard Biesheuvel
      The variadic efi_call_phys() wrapper that exists on i386 was
      originally created to call into any EFI firmware runtime service,
      but in practice, we only use it once, to call SetVirtualAddressMap()
      during early boot.
      The flexibility provided by the variadic nature also makes it
      type unsafe, and makes the assembler code more complicated than
      needed, since it has to deal with an unknown number of arguments
      living on the stack.
      
      So clean this up, by renaming the helper to efi_call_svam(), and
      dropping the unneeded complexity. Let's also drop the reference
      to the efi_phys struct and grab the address from the EFI system
      table directly.
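      
      For illustration, a rough sketch of the shape of this change (not the
      actual kernel prototypes; the parameter names and types below are
      assumptions, with the EFI types coming from <linux/efi.h>):
      
        /* before: variadic and untyped, arguments live on the stack */
        efi_status_t efi_call_phys(void *fp, ...);
      
        /* after: one fixed, typed signature for the single caller */
        efi_status_t efi_call_svam(efi_set_virtual_address_map_t *fp,
                                   unsigned long memory_map_size,
                                   unsigned long descriptor_size,
                                   u32 descriptor_version,
                                   efi_memory_desc_t *virtual_map);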
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arvind Sankar <nivedita@alum.mit.edu>
      Cc: Matthew Garrett <mjg59@google.com>
      Cc: linux-efi@vger.kernel.org
      Link: https://lkml.kernel.org/r/20200103113953.9571-9-ardb@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a46d6740
    • efi/x86: Split SetVirtualAddresMap() wrappers into 32 and 64 bit versions · 69829470
      By Ard Biesheuvel
      Split the phys_efi_set_virtual_address_map() routine into 32 and 64 bit
      versions, so we can simplify them individually in subsequent patches.
      
      There is very little overlap between the logic anyway, and this has
      already been factored out in prolog/epilog routines which are completely
      different between 32 bit and 64 bit. So let's take it one step further,
      and get rid of the overlap completely.
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arvind Sankar <nivedita@alum.mit.edu>
      Cc: Matthew Garrett <mjg59@google.com>
      Cc: linux-efi@vger.kernel.org
      Link: https://lkml.kernel.org/r/20200103113953.9571-8-ardb@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      69829470
    • efi/x86: Avoid redundant cast of EFI firmware service pointer · 89ed4865
      By Ard Biesheuvel
      All EFI firmware call prototypes have been annotated as __efiapi,
      permitting us to attach attributes regarding the calling convention
      by overriding __efiapi to an architecture specific value.
      
      On 32-bit x86, EFI firmware calls use the plain calling convention
      where all arguments are passed via the stack, and cleaned up by the
      caller. Let's add this to the __efiapi definition so we no longer
      need to cast the function pointers before invoking them.
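      
      As an illustration, the resulting definition looks roughly like this
      (a sketch rather than the verbatim header, with the 64-bit branch shown
      for contrast):
      
        #ifdef CONFIG_X86_64
        # define __efiapi __attribute__((ms_abi))      /* 64-bit: MS calling convention */
        #else
        # define __efiapi __attribute__((regparm(0)))  /* 32-bit: all args on the stack */
        #endif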
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arvind Sankar <nivedita@alum.mit.edu>
      Cc: Matthew Garrett <mjg59@google.com>
      Cc: linux-efi@vger.kernel.org
      Link: https://lkml.kernel.org/r/20200103113953.9571-6-ardb@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      89ed4865
    • efi/x86: Re-disable RT services for 32-bit kernels running on 64-bit EFI · 6cfcd6f0
      By Ard Biesheuvel
      Commit a8147dba ("efi/x86: Rename efi_is_native() to efi_is_mixed()")
      renamed and refactored efi_is_native() into efi_is_mixed(), but failed
      to take into account that these are not diametrical opposites.
      
      Mixed mode is a construct that permits 64-bit kernels to boot on 32-bit
      firmware, but there is another non-native combination which is supported,
      i.e., 32-bit kernels booting on 64-bit firmware, but only for boot and not
      for runtime services. Also, mixed mode can be disabled in Kconfig, in
      which case the 64-bit kernel can still be booted from 32-bit firmware,
      but without access to runtime services.
      
      Due to this oversight, efi_runtime_supported() now incorrectly returns
      true for such configurations, resulting in crashes at boot. So fix this
      by making efi_runtime_supported() aware of the distinction.
      
      As a side effect, some efi_thunk_xxx() stubs have become obsolete, so
      remove them as well.
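      
      For reference, the corrected check boils down to something like the
      following (a simplified sketch, not the exact kernel code):
      
        static inline bool efi_runtime_supported(void)
        {
                /* native: kernel and firmware bitness match */
                if (IS_ENABLED(CONFIG_X86_64) == efi_enabled(EFI_64BIT))
                        return true;
      
                /* 64-bit kernel on 32-bit firmware needs mixed mode for runtime */
                return IS_ENABLED(CONFIG_EFI_MIXED);
        }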
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arvind Sankar <nivedita@alum.mit.edu>
      Cc: Matthew Garrett <mjg59@google.com>
      Cc: linux-efi@vger.kernel.org
      Link: https://lkml.kernel.org/r/20200103113953.9571-4-ardb@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6cfcd6f0
  2. 25 December 2019, 8 commits
  3. 10 December 2019, 7 commits
    • mm, x86/mm: Untangle address space layout definitions from basic pgtable type definitions · 186525bd
      By Ingo Molnar
      - Untangle the somewhat incestuous way in which VMALLOC_START is used all across the
        kernel, but is, on x86, defined deep inside one of the lowest level page table headers.
        It doesn't help that vmalloc.h only includes a single asm header:
      
           #include <asm/page.h>           /* pgprot_t */
      
        So there was no existing cross-arch way to decouple address layout
        definitions from page.h details. I used this:
      
         #ifndef VMALLOC_START
         # include <asm/vmalloc.h>
         #endif
      
        This way every architecture that wants to simplify page.h can do so.
      
      - Also on x86 we had a couple of LDT related inline functions that used
        the late-stage address space layout positions - but these could be
        uninlined without real trouble - the end result is cleaner this way as
        well.
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mm@kvack.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      186525bd
    • mm/vmalloc: Add empty <asm/vmalloc.h> headers and use them from <linux/vmalloc.h> · 1f059dfd
      By Ingo Molnar
      In the x86 MM code we'd like to untangle various types of historic
      header dependency spaghetti, but for this we'd need to pass to
      the generic vmalloc code various vmalloc related defines that
      customarily come via the <asm/page.h> low level arch header.
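      
      The new per-architecture headers start out empty, roughly like this sketch:
      
        /* arch/x86/include/asm/vmalloc.h */
        #ifndef _ASM_X86_VMALLOC_H
        #define _ASM_X86_VMALLOC_H
        #endif /* _ASM_X86_VMALLOC_H */
      
      ... and <linux/vmalloc.h> includes <asm/vmalloc.h>, giving each architecture
      a natural place for its vmalloc related definitions later on.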
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      1f059dfd
    • x86/mm: Tabulate the page table encoding definitions · 4efb5664
      By Ingo Molnar
      I got lost in trying to figure out which bits were enabled
      in one of the PTE masks, so let's make it pretty
      obvious at the definition site already:
      
       #define PAGE_NONE            __pg(   0|   0|   0|___A|   0|   0|   0|___G)
       #define PAGE_SHARED          __pg(__PP|__RW|_USR|___A|__NX|   0|   0|   0)
       #define PAGE_SHARED_EXEC     __pg(__PP|__RW|_USR|___A|   0|   0|   0|   0)
       #define PAGE_COPY_NOEXEC     __pg(__PP|   0|_USR|___A|__NX|   0|   0|   0)
       #define PAGE_COPY_EXEC       __pg(__PP|   0|_USR|___A|   0|   0|   0|   0)
       #define PAGE_COPY            __pg(__PP|   0|_USR|___A|__NX|   0|   0|   0)
       #define PAGE_READONLY        __pg(__PP|   0|_USR|___A|__NX|   0|   0|   0)
       #define PAGE_READONLY_EXEC   __pg(__PP|   0|_USR|___A|   0|   0|   0|   0)
      
       #define __PAGE_KERNEL            (__PP|__RW|   0|___A|__NX|___D|   0|___G)
       #define __PAGE_KERNEL_EXEC       (__PP|__RW|   0|___A|   0|___D|   0|___G)
       #define _KERNPG_TABLE_NOENC      (__PP|__RW|   0|___A|   0|___D|   0|   0)
       #define _KERNPG_TABLE            (__PP|__RW|   0|___A|   0|___D|   0|   0| _ENC)
       #define _PAGE_TABLE_NOENC        (__PP|__RW|_USR|___A|   0|___D|   0|   0)
       #define _PAGE_TABLE              (__PP|__RW|_USR|___A|   0|___D|   0|   0| _ENC)
       #define __PAGE_KERNEL_RO         (__PP|   0|   0|___A|__NX|___D|   0|___G)
       #define __PAGE_KERNEL_RX         (__PP|   0|   0|___A|   0|___D|   0|___G)
       #define __PAGE_KERNEL_NOCACHE    (__PP|__RW|   0|___A|__NX|___D|   0|___G| __NC)
       #define __PAGE_KERNEL_VVAR       (__PP|   0|_USR|___A|__NX|___D|   0|___G)
       #define __PAGE_KERNEL_LARGE      (__PP|__RW|   0|___A|__NX|___D|_PSE|___G)
       #define __PAGE_KERNEL_LARGE_EXEC (__PP|__RW|   0|___A|   0|___D|_PSE|___G)
       #define __PAGE_KERNEL_WP         (__PP|__RW|   0|___A|__NX|___D|   0|___G| __WP)
      
      Especially security relevant bits like 'NX' or coherence related bits like 'G'
      are now super easy to read based on a single grep.
      
      We do the underscore gymnastics to not pollute the kernel's symbol namespace,
      and the longest line still fits into 80 columns, so this should be readable
      for everyone.
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      4efb5664
    • x86/mm/pat: Clean up <asm/memtype.h> externs · 533d49b3
      By Ingo Molnar
      Half of the declarations have an 'extern', half of them do not;
      use 'extern' consistently.
      
      This makes grepping for APIs easier, such as:
      
        dagon:~/tip> git grep -E '\<memtype_.*\(' arch/x86/ | grep extern
        arch/x86/include/asm/memtype.h:extern int memtype_reserve(u64 start, u64 end,
        arch/x86/include/asm/memtype.h:extern int memtype_free(u64 start, u64 end);
        arch/x86/include/asm/memtype.h:extern int memtype_kernel_map_sync(u64 base, unsigned long size,
        arch/x86/include/asm/memtype.h:extern int memtype_reserve_io(resource_size_t start, resource_size_t end,
        arch/x86/include/asm/memtype.h:extern void memtype_free_io(resource_size_t start, resource_size_t end);
        arch/x86/mm/pat/memtype.h:extern int memtype_check_insert(struct memtype *entry_new,
        arch/x86/mm/pat/memtype.h:extern struct memtype *memtype_erase(u64 start, u64 end);
        arch/x86/mm/pat/memtype.h:extern struct memtype *memtype_lookup(u64 addr);
        arch/x86/mm/pat/memtype.h:extern int memtype_copy_nth_element(struct memtype *entry_out, loff_t pos);
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      533d49b3
    • x86/mm/pat: Rename <asm/pat.h> => <asm/memtype.h> · eb243d1d
      By Ingo Molnar
      pat.h is a file whose main purpose is to provide the memtype_*() APIs.
      
      PAT is the low level hardware mechanism - but the high level abstraction
      is memtype.
      
      So name the header <memtype.h> as well - this goes hand in hand with memtype.c
      and memtype_interval.c.
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      eb243d1d
    • x86/mm/pat: Standardize on memtype_*() prefix for APIs · ecdd6ee7
      By Ingo Molnar
      Half of our memtype APIs are memtype_ prefixed, the other half are _memtype suffixed:
      
      	reserve_memtype()
      	free_memtype()
      	kernel_map_sync_memtype()
      	io_reserve_memtype()
      	io_free_memtype()
      
      	memtype_check_insert()
      	memtype_erase()
      	memtype_lookup()
      	memtype_copy_nth_element()
      
      Use prefixes consistently, like most other modern kernel APIs:
      
      	reserve_memtype()		=> memtype_reserve()
      	free_memtype()			=> memtype_free()
      	kernel_map_sync_memtype()	=> memtype_kernel_map_sync()
      	io_reserve_memtype()		=> memtype_reserve_io()
      	io_free_memtype()		=> memtype_free_io()
      
      	memtype_check_insert()		=> memtype_check_insert()
      	memtype_erase()			=> memtype_erase()
      	memtype_lookup()		=> memtype_lookup()
      	memtype_copy_nth_element()	=> memtype_copy_nth_element()
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ecdd6ee7
    • x86/mm/pat: Disambiguate PAT-disabled boot messages · 5557e831
      By Ingo Molnar
      Right now we have these four types of PAT-disabled boot messages:
      
        x86/PAT: PAT support disabled.
        x86/PAT: PAT MSR is 0, disabled.
        x86/PAT: MTRRs disabled, skipping PAT initialization too.
        x86/PAT: PAT not supported by CPU.
      
      The first message is ambiguous in that it doesn't signal that PAT is off
      due to a boot option.
      
      The second message doesn't really make it clear that this is the MSR value
      during early bootup and it's the firmware environment that disabled PAT
      support.
      
      The third message doesn't really make it clear that we disable PAT support
      because CONFIG_MTRR is off in the kernel.
      
      Clarify, harmonize and fix the spelling in these user-visible messages:
      
        x86/PAT: PAT support disabled via boot option.
        x86/PAT: PAT support disabled by the firmware.
        x86/PAT: PAT support disabled because CONFIG_MTRR is disabled in the kernel.
        x86/PAT: PAT not supported by the CPU.
      
      Also add a fifth message, in case PAT support is disabled at build time:
      
        x86/PAT: PAT support disabled because CONFIG_X86_PAT is disabled in the kernel.
      
      Previously we'd just silently return from pat_init() without giving any indication
      that PAT support is off.
      
      Finally, clarify/extend some of the comments related to PAT initialization.
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      5557e831
  4. 28 November 2019, 1 commit
    • x86/fpu: Don't cache access to fpu_fpregs_owner_ctx · 59c4bd85
      By Sebastian Andrzej Siewior
      The state/owner of the FPU is saved to fpu_fpregs_owner_ctx by pointing
      to the context that is currently loaded. It never changed during the
      lifetime of a task - it remained stable/constant.
      
      After deferred loading of the FPU registers until return to userland was
      implemented, the content of fpu_fpregs_owner_ctx may change during
      preemption and must not be cached.
      
      This went unnoticed for some time, but has now surfaced, in particular
      because gcc 9 caches that load in copy_fpstate_to_sigframe() and reuses it
      in the retry loop:
      
        copy_fpstate_to_sigframe()
          load fpu_fpregs_owner_ctx and save on stack
          fpregs_lock()
          copy_fpregs_to_sigframe() /* failed */
          fpregs_unlock()
               *** PREEMPTION, another task uses the FPU, changes fpu_fpregs_owner_ctx ***
      
          fault_in_pages_writeable() /* succeed, retry */
      
          fpregs_lock()
      	__fpregs_load_activate()
      	  fpregs_state_valid() /* uses fpu_fpregs_owner_ctx from stack */
          copy_fpregs_to_sigframe() /* succeeds, random FPU content */
      
      This is a comparison of the assembly produced by gcc 9, without vs with this
      patch:
      
      | # arch/x86/kernel/fpu/signal.c:173:      if (!access_ok(buf, size))
      |        cmpq    %rdx, %rax      # tmp183, _4
      |        jb      .L190   #,
      |-# arch/x86/include/asm/fpu/internal.h:512:       return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
      |-#APP
      |-# 512 "arch/x86/include/asm/fpu/internal.h" 1
      |-       movq %gs:fpu_fpregs_owner_ctx,%rax      #, pfo_ret__
      |-# 0 "" 2
      |-#NO_APP
      |-       movq    %rax, -88(%rbp) # pfo_ret__, %sfp
      …
      |-# arch/x86/include/asm/fpu/internal.h:512:       return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
      |-       movq    -88(%rbp), %rcx # %sfp, pfo_ret__
      |-       cmpq    %rcx, -64(%rbp) # pfo_ret__, %sfp
      |+# arch/x86/include/asm/fpu/internal.h:512:       return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
      |+#APP
      |+# 512 "arch/x86/include/asm/fpu/internal.h" 1
      |+       movq %gs:fpu_fpregs_owner_ctx(%rip),%rax        # fpu_fpregs_owner_ctx, pfo_ret__
      |+# 0 "" 2
      |+# arch/x86/include/asm/fpu/internal.h:512:       return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
      |+#NO_APP
      |+       cmpq    %rax, -64(%rbp) # pfo_ret__, %sfp
      
      Use this_cpu_read() instead of this_cpu_read_stable() to avoid caching of
      fpu_fpregs_owner_ctx across preemption points.
      
      The Fixes: tag points to the commit where deferred FPU loading was
      added. Since this commit, the compiler is no longer allowed to move the
      load of fpu_fpregs_owner_ctx somewhere else / outside of the locked
      section. A task preemption will change its value and stale content will
      be observed.
      
       [ bp: Massage. ]
      Debugged-by: Austin Clements <austin@google.com>
      Debugged-by: David Chase <drchase@golang.org>
      Debugged-by: Ian Lance Taylor <ian@airs.com>
      Fixes: 5f409e20 ("x86/fpu: Defer FPU state load until return to userspace")
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Rik van Riel <riel@surriel.com>
      Tested-by: Borislav Petkov <bp@suse.de>
      Cc: Aubrey Li <aubrey.li@intel.com>
      Cc: Austin Clements <austin@google.com>
      Cc: Barret Rhoden <brho@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Chase <drchase@golang.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: ian@airs.com
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Bleecher Snyder <josharian@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20191128085306.hxfa2o3knqtu4wfn@linutronix.de
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=205663
      59c4bd85
  5. 27 November 2019, 5 commits
    • perf/x86: Implement immediate enforcement of /sys/devices/cpu/rdpmc value of 0 · 405b4537
      By Anthony Steinhauser
      According to the documentation, when you successfully write 0 to
      /sys/devices/cpu/rdpmc, the RDPMC instruction should be disabled
      unconditionally and immediately (as soon as you close the sysfs file).
      
      Instead, with the current implementation the PMU has to be reloaded, which
      only happens eventually at some point in the future. Only after that does
      the RDPMC instruction become disabled (in ring 3) on the respective core.
      
      This change makes the handling of the value 0 as immediate and as
      unconditional as the current handling of the value 2, except that the
      CR4.PCE bit is naturally cleared instead of set.
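      
      A rough sketch of the intended behaviour (not the actual perf code; the
      predicate below is a hypothetical stand-in):
      
        static void refresh_pce(void *ignored)
        {
                /* hypothetical predicate for "user RDPMC currently allowed" */
                if (rdpmc_allowed())
                        cr4_set_bits(X86_CR4_PCE);
                else
                        cr4_clear_bits(X86_CR4_PCE);
        }
      
        /* in the sysfs store handler, for the value 0 just like for 2: */
        on_each_cpu(refresh_pce, NULL, 1);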
      Signed-off-by: Anthony Steinhauser <asteinhauser@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Link: https://lkml.kernel.org/r/20191125054838.137615-1-asteinhauser@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      405b4537
    • x86/doublefault/32: Rewrite the x86_32 #DF handler and unify with 64-bit · 7d8d8cfd
      By Andy Lutomirski
      The old x86_32 doublefault_fn() was old and crufty, and it did not
      even try to recover.  do_double_fault() is much nicer.  Rewrite the
      32-bit double fault code to sanitize CPU state and call
      do_double_fault().  This is mostly an exercise in i386 archaeology.
      
      With this patch applied, 32-bit double faults get a real stack trace,
      just like 64-bit double faults.
      
      [ mingo: merged the patch to a later kernel base. ]
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      7d8d8cfd
    • x86/doublefault/32: Move #DF stack and TSS to cpu_entry_area · dc4e0021
      By Andy Lutomirski
      There are three problems with the current layout of the doublefault
      stack and TSS.  First, the TSS is only cacheline-aligned, which is
      not enough -- if the hardware portion of the TSS (struct x86_hw_tss)
      crosses a page boundary, horrible things happen [0].  Second, the
      stack and TSS are global, so simultaneous double faults on different
      CPUs will cause massive corruption.  Third, the whole mechanism
      won't work if user CR3 is loaded, resulting in a triple fault [1].
      
      Let the doublefault stack and TSS share a page (which prevents the
      TSS from spanning a page boundary), make it percpu, and move it into
      cpu_entry_area.  Teach the stack dump code about the doublefault
      stack.
      
      [0] Real hardware will read past the end of the page onto the next
          *physical* page if a task switch happens.  Virtual machines may
          have any number of bugs, and I would consider it reasonable for
          a VM to summarily kill the guest if it tries to task-switch to
          a page-spanning TSS.
      
      [1] Real hardware triple faults.  At least some VMs seem to hang.
          I'm not sure what's going on.
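      
      The resulting layout is roughly the following (a sketch, with the field
      sizes illustrative):
      
        struct doublefault_stack {
                unsigned long stack[(PAGE_SIZE - sizeof(struct x86_hw_tss)) /
                                    sizeof(unsigned long)];
                struct x86_hw_tss tss;
        } __aligned(PAGE_SIZE);
      
      One instance of this lives per CPU inside cpu_entry_area, so the TSS can
      never cross a page boundary and every CPU gets its own #DF stack.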
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      dc4e0021
    • x86/traps: Disentangle the 32-bit and 64-bit doublefault code · 93efbde2
      By Andy Lutomirski
      The 64-bit doublefault handler is much nicer than the 32-bit one.
      As a first step toward unifying them, make the 64-bit handler
      self-contained.  This should have no functional effect except in the odd
      case of x86_64 with CONFIG_DOUBLEFAULT=n, in which case it will change the
      logging a bit.
      
      This also gets rid of CONFIG_DOUBLEFAULT configurability on 64-bit
      kernels.  It didn't do anything useful -- CONFIG_DOUBLEFAULT=n
      didn't actually disable doublefault handling on x86_64.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      93efbde2
    • x86/iopl: Make 'struct tss_struct' constant size again · 0bcd7762
      By Ingo Molnar
      After the following commit:
      
        05b042a1: ("x86/pti/32: Calculate the various PTI cpu_entry_area sizes correctly, make the CPU_ENTRY_AREA_PAGES assert precise")
      
      'struct cpu_entry_area' has to be Kconfig invariant, so that we always
      have a matching CPU_ENTRY_AREA_PAGES size.
      
      This commit added a CONFIG_X86_IOPL_IOPERM dependency to tss_struct:
      
        111e7b15: ("x86/ioperm: Extend IOPL config to control ioperm() as well")
      
      Which, if CONFIG_X86_IOPL_IOPERM is turned off, reduces the size of
      cpu_entry_area by two pages, triggering the assert:
      
        ./include/linux/compiler.h:391:38: error: call to ‘__compiletime_assert_202’ declared with attribute error: BUILD_BUG_ON failed: (CPU_ENTRY_AREA_PAGES+1)*PAGE_SIZE != CPU_ENTRY_AREA_MAP_SIZE
      
      Simplify the Kconfig dependencies and make cpu_entry_area constant
      size on 32-bit kernels again.
      
      Fixes: 05b042a1: ("x86/pti/32: Calculate the various PTI cpu_entry_area sizes correctly, make the CPU_ENTRY_AREA_PAGES assert precise")
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0bcd7762
  6. 25 November 2019, 2 commits
    • locking/refcount: Consolidate implementations of refcount_t · fb041bb7
      By Will Deacon
      The generic implementation of refcount_t should be good enough for
      everybody, so remove ARCH_HAS_REFCOUNT and REFCOUNT_FULL entirely,
      leaving the generic implementation enabled unconditionally.
      Signed-off-by: Will Deacon <will@kernel.org>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Acked-by: Kees Cook <keescook@chromium.org>
      Tested-by: Hanjun Guo <guohanjun@huawei.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Elena Reshetova <elena.reshetova@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20191121115902.2551-9-will@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      fb041bb7
    • x86/pti/32: Calculate the various PTI cpu_entry_area sizes correctly, make the CPU_ENTRY_AREA_PAGES assert precise · 05b042a1
      By Ingo Molnar
      
      When two recent commits that increased the size of the 'struct cpu_entry_area'
      were merged in -tip, the 32-bit defconfig build started failing on the following
      build time assert:
      
        ./include/linux/compiler.h:391:38: error: call to ‘__compiletime_assert_189’ declared with attribute error: BUILD_BUG_ON failed: CPU_ENTRY_AREA_PAGES * PAGE_SIZE < CPU_ENTRY_AREA_MAP_SIZE
        arch/x86/mm/cpu_entry_area.c:189:2: note: in expansion of macro ‘BUILD_BUG_ON’
        In function ‘setup_cpu_entry_area_ptes’,
      
      Which corresponds to the following build time assert:
      
      	BUILD_BUG_ON(CPU_ENTRY_AREA_PAGES * PAGE_SIZE < CPU_ENTRY_AREA_MAP_SIZE);
      
      The purpose of this assert is to sanity check the fixed-value definition of
      CPU_ENTRY_AREA_PAGES in arch/x86/include/asm/pgtable_32_types.h:
      
      	#define CPU_ENTRY_AREA_PAGES    (NR_CPUS * 41)
      
      The '41' is supposed to match sizeof(struct cpu_entry_area)/PAGE_SIZE, which value
      we didn't want to define in such a low level header, because it would cause
      dependency hell.
      
      Every time the size of cpu_entry_area is changed, we have to adjust CPU_ENTRY_AREA_PAGES
      accordingly - and this assert is checking that constraint.
      
      But the assert is both imprecise and buggy, primarily because it doesn't
      include the single readonly IDT page that is mapped at CPU_ENTRY_AREA_BASE
      (which begins at a PMD boundary).
      
      This bug was hidden by the fact that by accident CPU_ENTRY_AREA_PAGES is defined
      too large upstream (v5.4-rc8):
      
      	#define CPU_ENTRY_AREA_PAGES    (NR_CPUS * 40)
      
      ... while 'struct cpu_entry_area' is 155648 bytes, or 38 pages. So we had two
      extra pages, which hid the bug.
      
      The following commit (not yet upstream) increased the size to 40 pages:
      
        x86/iopl: ("Restrict iopl() permission scope")
      
      ... but increased CPU_ENTRY_AREA_PAGES only to 41 - i.e. shortening the gap
      to just 1 extra page.
      
      Then another not-yet-upstream commit changed the size again:
      
        880a98c3: ("x86/cpu_entry_area: Add guard page for entry stack on 32bit")
      
      Which increased the cpu_entry_area size from 38 to 39 pages, but
      didn't change CPU_ENTRY_AREA_PAGES (kept it at 40). This worked
      fine, because we still had a page left from the accidental 'reserve'.
      
      But when these two commits were merged into the same tree, the
      combined size of cpu_entry_area grew from 38 to 40 pages, while
      CPU_ENTRY_AREA_PAGES finally caught up to 40 as well.
      
      Which is fine in terms of functionality, but the assert broke:
      
      	BUILD_BUG_ON(CPU_ENTRY_AREA_PAGES * PAGE_SIZE < CPU_ENTRY_AREA_MAP_SIZE);
      
      because CPU_ENTRY_AREA_MAP_SIZE is the total size of the area,
      which is 1 page larger due to the IDT page.
      
      To fix all this, change the assert to two precise asserts:
      
      	BUILD_BUG_ON((CPU_ENTRY_AREA_PAGES+1)*PAGE_SIZE != CPU_ENTRY_AREA_MAP_SIZE);
      	BUILD_BUG_ON(CPU_ENTRY_AREA_TOTAL_SIZE != CPU_ENTRY_AREA_MAP_SIZE);
      
      This takes the IDT page into account, and also connects the size-based
      define of CPU_ENTRY_AREA_TOTAL_SIZE with the address-subtraction based
      define of CPU_ENTRY_AREA_MAP_SIZE.
      
      Also clean up some of the names which made it rather confusing:
      
       - 'CPU_ENTRY_AREA_TOT_SIZE' wasn't actually the 'total' size of
         the cpu-entry-area, but the per-cpu array size, so rename this
         to CPU_ENTRY_AREA_ARRAY_SIZE.
      
       - Introduce CPU_ENTRY_AREA_TOTAL_SIZE that _is_ the total mapping
         size, with the IDT included.
      
       - Add comments where '+1' denotes the IDT mapping - it wasn't
         obvious and took me about 3 hours to decode...
      
      Finally, because this particular commit is actually applied after
      this patch:
      
        880a98c3: ("x86/cpu_entry_area: Add guard page for entry stack on 32bit")
      
      Fix the CPU_ENTRY_AREA_PAGES value from 40 pages to the correct 39 pages.
      
      All future commits that change cpu_entry_area will have to adjust
      this value precisely.
      
      As a side note, we should probably attempt to remove CPU_ENTRY_AREA_PAGES
      and derive its value directly from the structure, without causing
      header hell - but that is an adventure for another day! :-)
      
      Fixes: 880a98c3: ("x86/cpu_entry_area: Add guard page for entry stack on 32bit")
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: stable@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      05b042a1
  7. 22 November 2019, 2 commits
  8. 21 November 2019, 1 commit
  9. 20 November 2019, 1 commit
  10. 18 November 2019, 1 commit
  11. 16 November 2019, 7 commits
    • x86/ioperm: Extend IOPL config to control ioperm() as well · 111e7b15
      By Thomas Gleixner
      If iopl() is disabled, then providing ioperm() does not make much sense.
      
      Rename the config option and disable/enable both syscalls with it. Guard
      the code with #ifdefs where appropriate.
      Suggested-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      111e7b15
    • x86/iopl: Remove legacy IOPL option · a24ca997
      By Thomas Gleixner
      The IOPL emulation via the I/O bitmap is sufficient. Remove the legacy
      cruft dealing with the (e)flags based IOPL mechanism.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Juergen Gross <jgross@suse.com> (Paravirt and Xen parts)
      Acked-by: Andy Lutomirski <luto@kernel.org>
      a24ca997
    • x86/iopl: Restrict iopl() permission scope · c8137ace
      By Thomas Gleixner
      Access to the full I/O port range can also be provided by the TSS I/O
      bitmap, but that would require copying 8k of data when scheduling in the
      task. As shown with the sched-out optimization, TSS.io_bitmap_base can be
      used to switch the incoming task to a preallocated I/O bitmap which has all
      bits zero, i.e. allows access to all I/O ports.
      
      Implementing this makes it possible to provide an iopl() emulation mode
      which restricts the IOPL level 3 permissions to I/O port access but removes
      the STI/CLI permission that comes with the hardware IOPL mechanism.
      
      Provide a config option to switch IOPL to emulation mode, make it the
      default, and while at it also provide an option to disable IOPL completely.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      c8137ace
    • x86/ioperm: Share I/O bitmap if identical · 4804e382
      By Thomas Gleixner
      The I/O bitmap is duplicated on fork. That wastes memory and slows down
      fork. There is no point in doing so. As long as the bitmap is not modified
      it can be shared between threads and processes.
      
      Add a refcount and just share the bitmap on fork. If a task modifies the
      bitmap, then it has to duplicate it if and only if it is shared.
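      
      A minimal sketch of the fork-time sharing (the helper name and exact
      fields are assumptions, not the actual kernel functions):
      
        /* on fork: take a reference instead of duplicating the bitmap */
        static int io_bitmap_share(struct task_struct *child)
        {
                struct io_bitmap *iobm = current->thread.io_bitmap;
      
                if (iobm)
                        refcount_inc(&iobm->refcnt);
                child->thread.io_bitmap = iobm;
                return 0;
        }
      
      Only a task that calls ioperm() on a bitmap with a refcount greater than
      one has to allocate and copy a private instance first (copy on write).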
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      4804e382
    • x86/ioperm: Remove bitmap if all permissions dropped · ea5f1cd7
      By Thomas Gleixner
      If ioperm() results in a bitmap with all bits set (no permissions to any
      I/O port), then handling that bitmap on context switch and exit to user
      mode is pointless. Drop it.
      
      Move the bitmap exit handling to the ioport code and reuse it for both the
      thread exit path and dropping the bitmap. This allows the code to be reused
      for the upcoming iopl() emulation.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      ea5f1cd7
    • x86/ioperm: Move TSS bitmap update to exit to user work · 22fe5b04
      By Thomas Gleixner
      There is no point in updating the TSS bitmap for tasks which use I/O
      bitmaps on every context switch. It's enough to update it right before
      exiting to user space.
      
      That reduces the context switch bitmap handling to invalidating the io
      bitmap base offset in the TSS when the outgoing task has TIF_IO_BITMAP
      set. The invalidation is done on purpose when a task with an IO bitmap
      switches out, to prevent any possible leakage of an activated IO bitmap.
      
      It also removes the requirement to update the task's bitmap atomically in
      ioperm().
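      
      The switch-out side then reduces to something like this sketch (the
      constant and field names approximate the description above, not the
      exact code):
      
        /* context switch, outgoing task: only invalidate, do not copy */
        if (test_tsk_thread_flag(prev, TIF_IO_BITMAP))
                tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_INVALID;
      
      The actual copy of the (up to 8k) bitmap into the TSS happens only on the
      exit-to-user-mode path, and only for tasks that really have a bitmap.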
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      22fe5b04
    • x86/ioperm: Add bitmap sequence number · 060aa16f
      By Thomas Gleixner
      Add a globally unique sequence number which is incremented whenever
      ioperm() changes the I/O bitmap of a task. Store the new sequence number in
      the io_bitmap structure and compare it with the sequence number of the I/O
      bitmap which was last loaded on a CPU. Only update the bitmap if the
      sequence numbers differ.
      
      That should further reduce the overhead of I/O bitmap scheduling when there
      are only a few I/O bitmap users on the system.
      
      The 64bit sequence counter is sufficient. A wraparound of the sequence
      counter assuming an ioperm() call every nanosecond would require about 584
      years of uptime.
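      
      Roughly, the idea looks like this (a sketch; the copy helper below is
      hypothetical):
      
        struct io_bitmap {
                u64             sequence;       /* bumped on every ioperm() change */
                refcount_t      refcnt;
                unsigned int    max;            /* valid portion of the bitmap */
                unsigned long   bitmap[IO_BITMAP_LONGS];
        };
      
        /* on exit to user mode, copy only when the bitmap actually changed: */
        if (tss->io_bitmap.prev_sequence != iobm->sequence) {
                copy_bitmap_to_tss(tss, iobm);          /* hypothetical helper */
                tss->io_bitmap.prev_sequence = iobm->sequence;
        }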
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      060aa16f