1. 30 8月, 2018 1 次提交
  2. 24 8月, 2018 5 次提交
  3. 23 8月, 2018 5 次提交
    • P
      x86/mm/tlb: Revert the recent lazy TLB patches · 52a288c7
      Peter Zijlstra 提交于
      Revert commits:
      
        95b0e635 x86/mm/tlb: Always use lazy TLB mode
        64482aaf x86/mm/tlb: Only send page table free TLB flush to lazy TLB CPUs
        ac031589 x86/mm/tlb: Make lazy TLB mode lazier
        61d0beb5 x86/mm/tlb: Restructure switch_mm_irqs_off()
        2ff6ddf1 x86/mm/tlb: Leave lazy TLB mode at page table free time
      
      In order to simplify the TLB invalidate fixes for x86 and unify the
      parts that need backporting.  We'll try again later.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NRik van Riel <riel@surriel.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      52a288c7
    • A
      module: use relative references for __ksymtab entries · 7290d580
      Ard Biesheuvel 提交于
      An ordinary arm64 defconfig build has ~64 KB worth of __ksymtab entries,
      each consisting of two 64-bit fields containing absolute references, to
      the symbol itself and to a char array containing its name, respectively.
      
      When we build the same configuration with KASLR enabled, we end up with an
      additional ~192 KB of relocations in the .init section, i.e., one 24 byte
      entry for each absolute reference, which all need to be processed at boot
      time.
      
      Given how the struct kernel_symbol that describes each entry is completely
      local to module.c (except for the references emitted by EXPORT_SYMBOL()
      itself), we can easily modify it to contain two 32-bit relative references
      instead.  This reduces the size of the __ksymtab section by 50% for all
      64-bit architectures, and gets rid of the runtime relocations entirely for
      architectures implementing KASLR, either via standard PIE linking (arm64)
      or using custom host tools (x86).
      
      Note that the binary search involving __ksymtab contents relies on each
      section being sorted by symbol name.  This is implemented based on the
      input section names, not the names in the ksymtab entries, so this patch
      does not interfere with that.
      
      Given that the use of place-relative relocations requires support both in
      the toolchain and in the module loader, we cannot enable this feature for
      all architectures.  So make it dependent on whether
      CONFIG_HAVE_ARCH_PREL32_RELOCATIONS is defined.
      
      Link: http://lkml.kernel.org/r/20180704083651.24360-4-ard.biesheuvel@linaro.orgSigned-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: NJessica Yu <jeyu@kernel.org>
      Acked-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: NWill Deacon <will.deacon@arm.com>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morris <james.morris@microsoft.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Nicolas Pitre <nico@linaro.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7290d580
    • A
      module: allow symbol exports to be disabled · f922c4ab
      Ard Biesheuvel 提交于
      To allow existing C code to be incorporated into the decompressor or the
      UEFI stub, introduce a CPP macro that turns all EXPORT_SYMBOL_xxx
      declarations into nops, and #define it in places where such exports are
      undesirable.  Note that this gets rid of a rather dodgy redefine of
      linux/export.h's header guard.
      
      Link: http://lkml.kernel.org/r/20180704083651.24360-3-ard.biesheuvel@linaro.orgSigned-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: NNicolas Pitre <nico@linaro.org>
      Acked-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: NWill Deacon <will.deacon@arm.com>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morris <james.morris@microsoft.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Jessica Yu <jeyu@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f922c4ab
    • A
      arch: enable relative relocations for arm64, power and x86 · 271ca788
      Ard Biesheuvel 提交于
      Patch series "add support for relative references in special sections", v10.
      
      This adds support for emitting special sections such as initcall arrays,
      PCI fixups and tracepoints as relative references rather than absolute
      references.  This reduces the size by 50% on 64-bit architectures, but
      more importantly, it removes the need for carrying relocation metadata for
      these sections in relocatable kernels (e.g., for KASLR) that needs to be
      fixed up at boot time.  On arm64, this reduces the vmlinux footprint of
      such a reference by 8x (8 byte absolute reference + 24 byte RELA entry vs
      4 byte relative reference)
      
      Patch #3 was sent out before as a single patch.  This series supersedes
      the previous submission.  This version makes relative ksymtab entries
      dependent on the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS rather
      than trying to infer from kbuild test robot replies for which
      architectures it should be blacklisted.
      
      Patch #1 introduces the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS,
      and sets it for the main architectures that are expected to benefit the
      most from this feature, i.e., 64-bit architectures or ones that use
      runtime relocations.
      
      Patch #2 add support for #define'ing __DISABLE_EXPORTS to get rid of
      ksymtab/kcrctab sections in decompressor and EFI stub objects when
      rebuilding existing C files to run in a different context.
      
      Patches #4 - #6 implement relative references for initcalls, PCI fixups
      and tracepoints, respectively, all of which produce sections with order
      ~1000 entries on an arm64 defconfig kernel with tracing enabled.  This
      means we save about 28 KB of vmlinux space for each of these patches.
      
      [From the v7 series blurb, which included the jump_label patches as well]:
      
        For the arm64 kernel, all patches combined reduce the memory footprint
        of vmlinux by about 1.3 MB (using a config copied from Ubuntu that has
        KASLR enabled), of which ~1 MB is the size reduction of the RELA section
        in .init, and the remaining 300 KB is reduction of .text/.data.
      
      This patch (of 6):
      
      Before updating certain subsystems to use place relative 32-bit
      relocations in special sections, to save space and reduce the number of
      absolute relocations that need to be processed at runtime by relocatable
      kernels, introduce the Kconfig symbol and define it for some architectures
      that should be able to support and benefit from it.
      
      Link: http://lkml.kernel.org/r/20180704083651.24360-2-ard.biesheuvel@linaro.orgSigned-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: NWill Deacon <will.deacon@arm.com>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Nicolas Pitre <nico@linaro.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
      Cc: James Morris <james.morris@microsoft.com>
      Cc: Jessica Yu <jeyu@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      271ca788
    • M
      mm, oom: distinguish blockable mode for mmu notifiers · 93065ac7
      Michal Hocko 提交于
      There are several blockable mmu notifiers which might sleep in
      mmu_notifier_invalidate_range_start and that is a problem for the
      oom_reaper because it needs to guarantee a forward progress so it cannot
      depend on any sleepable locks.
      
      Currently we simply back off and mark an oom victim with blockable mmu
      notifiers as done after a short sleep.  That can result in selecting a new
      oom victim prematurely because the previous one still hasn't torn its
      memory down yet.
      
      We can do much better though.  Even if mmu notifiers use sleepable locks
      there is no reason to automatically assume those locks are held.  Moreover
      majority of notifiers only care about a portion of the address space and
      there is absolutely zero reason to fail when we are unmapping an unrelated
      range.  Many notifiers do really block and wait for HW which is harder to
      handle and we have to bail out though.
      
      This patch handles the low hanging fruit.
      __mmu_notifier_invalidate_range_start gets a blockable flag and callbacks
      are not allowed to sleep if the flag is set to false.  This is achieved by
      using trylock instead of the sleepable lock for most callbacks and
      continue as long as we do not block down the call chain.
      
      I think we can improve that even further because there is a common pattern
      to do a range lookup first and then do something about that.  The first
      part can be done without a sleeping lock in most cases AFAICS.
      
      The oom_reaper end then simply retries if there is at least one notifier
      which couldn't make any progress in !blockable mode.  A retry loop is
      already implemented to wait for the mmap_sem and this is basically the
      same thing.
      
      The simplest way for driver developers to test this code path is to wrap
      userspace code which uses these notifiers into a memcg and set the hard
      limit to hit the oom.  This can be done e.g.  after the test faults in all
      the mmu notifier managed memory and set the hard limit to something really
      small.  Then we are looking for a proper process tear down.
      
      [akpm@linux-foundation.org: coding style fixes]
      [akpm@linux-foundation.org: minor code simplification]
      Link: http://lkml.kernel.org/r/20180716115058.5559-1-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: Christian König <christian.koenig@amd.com> # AMD notifiers
      Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx and umem_odp
      Reported-by: NDavid Rientjes <rientjes@google.com>
      Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Cc: Sudeep Dutt <sudeep.dutt@intel.com>
      Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Felix Kuehling <felix.kuehling@amd.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      93065ac7
  4. 22 8月, 2018 6 次提交
    • P
      KVM: VMX: fixes for vmentry_l1d_flush module parameter · 0027ff2a
      Paolo Bonzini 提交于
      Two bug fixes:
      
      1) missing entries in the l1d_param array; this can cause a host crash
      if an access attempts to reach the missing entry. Future-proof the get
      function against any overflows as well.  However, the two entries
      VMENTER_L1D_FLUSH_EPT_DISABLED and VMENTER_L1D_FLUSH_NOT_REQUIRED must
      not be accepted by the parse function, so disable them there.
      
      2) invalid values must be rejected even if the CPU does not have the
      bug, so test for them before checking boot_cpu_has(X86_BUG_L1TF)
      
      ... and a small refactoring, since the .cmd field is redundant with
      the index in the array.
      Reported-by: NBandan Das <bsd@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: a7b9020bSigned-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      0027ff2a
    • T
      KVM: x86: SVM: Call x86_spec_ctrl_set_guest/host() with interrupts disabled · 024d83ca
      Thomas Gleixner 提交于
      Mikhail reported the following lockdep splat:
      
      WARNING: possible irq lock inversion dependency detected
      CPU 0/KVM/10284 just changed the state of lock:
        000000000d538a88 (&st->lock){+...}, at:
        speculative_store_bypass_update+0x10b/0x170
      
      but this lock was taken by another, HARDIRQ-safe lock
      in the past:
      
      (&(&sighand->siglock)->rlock){-.-.}
      
         and interrupts could create inverse lock ordering between them.
      
      Possible interrupt unsafe locking scenario:
      
          CPU0                    CPU1
          ----                    ----
         lock(&st->lock);
                                 local_irq_disable();
                                 lock(&(&sighand->siglock)->rlock);
                                 lock(&st->lock);
          <Interrupt>
           lock(&(&sighand->siglock)->rlock);
           *** DEADLOCK ***
      
      The code path which connects those locks is:
      
         speculative_store_bypass_update()
         ssb_prctl_set()
         do_seccomp()
         do_syscall_64()
      
      In svm_vcpu_run() speculative_store_bypass_update() is called with
      interupts enabled via x86_virt_spec_ctrl_set_guest/host().
      
      This is actually a false positive, because GIF=0 so interrupts are
      disabled even if IF=1; however, we can easily move the invocations of
      x86_virt_spec_ctrl_set_guest/host() into the interrupt disabled region to
      cure it, and it's a good idea to keep the GIF=0/IF=1 area as small
      and self-contained as possible.
      
      Fixes: 1f50ddb4 ("x86/speculation: Handle HT correctly on AMD")
      Reported-by: NMikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NMikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: kvm@vger.kernel.org
      Cc: x86@kernel.org
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      024d83ca
    • S
      KVM: vmx: Inject #UD for SGX ENCLS instruction in guest · 0b665d30
      Sean Christopherson 提交于
      Virtualization of Intel SGX depends on Enclave Page Cache (EPC)
      management that is not yet available in the kernel, i.e. KVM support
      for exposing SGX to a guest cannot be added until basic support
      for SGX is upstreamed, which is a WIP[1].
      
      Until SGX is properly supported in KVM, ensure a guest sees expected
      behavior for ENCLS, i.e. all ENCLS #UD.  Because SGX does not have a
      true software enable bit, e.g. there is no CR4.SGXE bit, the ENCLS
      instruction can be executed[1] by the guest if SGX is supported by the
      system.  Intercept all ENCLS leafs (via the ENCLS- exiting control and
      field) and unconditionally inject #UD.
      
      [1] https://www.spinics.net/lists/kvm/msg171333.html or
          https://lkml.org/lkml/2018/7/3/879
      
      [2] A guest can execute ENCLS in the sense that ENCLS will not take
          an immediate #UD, but no ENCLS will ever succeed in a guest
          without explicit support from KVM (map EPC memory into the guest),
          unless KVM has a *very* egregious bug, e.g. accidentally mapped
          EPC memory into the guest SPTEs.  In other words this patch is
          needed only to prevent the guest from seeing inconsistent behavior,
          e.g. #GP (SGX not enabled in Feature Control MSR) or #PF (leaf
          operand(s) does not point at EPC memory) instead of #UD on ENCLS.
          Intercepting ENCLS is not required to prevent the guest from truly
          utilizing SGX.
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20180814163334.25724-3-sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      0b665d30
    • S
      KVM: vmx: Add defines for SGX ENCLS exiting · 802ec461
      Sean Christopherson 提交于
      Hardware support for basic SGX virtualization adds a new execution
      control (ENCLS_EXITING), VMCS field (ENCLS_EXITING_BITMAP) and exit
      reason (ENCLS), that enables a VMM to intercept specific ENCLS leaf
      functions, e.g. to inject faults when the VMM isn't exposing SGX to
      a VM.  When ENCLS_EXITING is enabled, the VMM can set/clear bits in
      the bitmap to intercept/allow ENCLS leaf functions in non-root, e.g.
      setting bit 2 in the ENCLS_EXITING_BITMAP will cause ENCLS[EINIT]
      to VMExit(ENCLS).
      
      Note: EXIT_REASON_ENCLS was previously added by commit 1f519992
      ("KVM: VMX: add missing exit reasons").
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20180814163334.25724-2-sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      802ec461
    • Y
      x86/kvm/vmx: Fix coding style in vmx_setup_l1d_flush() · d806afa4
      Yi Wang 提交于
      Substitute spaces with tab. No functional changes.
      Signed-off-by: NYi Wang <wang.yi59@zte.com.cn>
      Reviewed-by: NJiang Biao <jiang.biao2@zte.com.cn>
      Message-Id: <1534398159-48509-1-git-send-email-wang.yi59@zte.com.cn>
      Cc: stable@vger.kernel.org # L1TF
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      d806afa4
    • A
      x86: kvm: avoid unused variable warning · 7288bde1
      Arnd Bergmann 提交于
      Removing one of the two accesses of the maxphyaddr variable led to
      a harmless warning:
      
      arch/x86/kvm/x86.c: In function 'kvm_set_mmio_spte_mask':
      arch/x86/kvm/x86.c:6563:6: error: unused variable 'maxphyaddr' [-Werror=unused-variable]
      
      Removing the #ifdef seems to be the nicest workaround, as it
      makes the code look cleaner than adding another #ifdef.
      
      Fixes: 28a1f3ac ("kvm: x86: Set highest physical address bits in non-present/reserved SPTEs")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Cc: stable@vger.kernel.org # L1TF
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      7288bde1
  5. 21 8月, 2018 12 次提交
  6. 18 8月, 2018 2 次提交
    • S
      mm: convert return type of handle_mm_fault() caller to vm_fault_t · 50a7ca3c
      Souptick Joarder 提交于
      Use new return type vm_fault_t for fault handler.  For now, this is just
      documenting that the function returns a VM_FAULT value rather than an
      errno.  Once all instances are converted, vm_fault_t will become a
      distinct type.
      
      Ref-> commit 1c8f4220 ("mm: change return type to vm_fault_t")
      
      In this patch all the caller of handle_mm_fault() are changed to return
      vm_fault_t type.
      
      Link: http://lkml.kernel.org/r/20180617084810.GA6730@jordon-HP-15-Notebook-PCSigned-off-by: NSouptick Joarder <jrdr.linux@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Levin, Alexander (Sasha Levin)" <alexander.levin@verizon.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      50a7ca3c
    • S
      x86/speculation/l1tf: Exempt zeroed PTEs from inversion · f19f5c49
      Sean Christopherson 提交于
      It turns out that we should *not* invert all not-present mappings,
      because the all zeroes case is obviously special.
      
      clear_page() does not undergo the XOR logic to invert the address bits,
      i.e. PTE, PMD and PUD entries that have not been individually written
      will have val=0 and so will trigger __pte_needs_invert(). As a result,
      {pte,pmd,pud}_pfn() will return the wrong PFN value, i.e. all ones
      (adjusted by the max PFN mask) instead of zero. A zeroed entry is ok
      because the page at physical address 0 is reserved early in boot
      specifically to mitigate L1TF, so explicitly exempt them from the
      inversion when reading the PFN.
      
      Manifested as an unexpected mprotect(..., PROT_NONE) failure when called
      on a VMA that has VM_PFNMAP and was mmap'd to as something other than
      PROT_NONE but never used. mprotect() sends the PROT_NONE request down
      prot_none_walk(), which walks the PTEs to check the PFNs.
      prot_none_pte_entry() gets the bogus PFN from pte_pfn() and returns
      -EACCES because it thinks mprotect() is trying to adjust a high MMIO
      address.
      
      [ This is a very modified version of Sean's original patch, but all
        credit goes to Sean for doing this and also pointing out that
        sometimes the __pte_needs_invert() function only gets the protection
        bits, not the full eventual pte.  But zero remains special even in
        just protection bits, so that's ok.   - Linus ]
      
      Fixes: f22cc87f ("x86/speculation/l1tf: Invert all not present mappings")
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Acked-by: NAndi Kleen <ak@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f19f5c49
  7. 17 8月, 2018 1 次提交
  8. 16 8月, 2018 2 次提交
  9. 15 8月, 2018 6 次提交