1. 04 Aug 2022: 1 commit
  2. 08 Jul 2022: 1 commit
    • x86/mm: Signal SIGSEGV with PF_SGX · 84ddf6b0
      Authored by Sean Christopherson
      mainline inclusion
      from mainline-5.11
      commit 74faeee0
      category: feature
      bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5EZEK
      CVE: NA
      
      Intel-SIG: commit 74faeee0 x86/mm: Signal SIGSEGV with PF_SGX.
      Backport for SGX Foundations support
      
      --------------------------------
      
      The x86 architecture has a set of page fault error codes.  These indicate
      things like whether the fault occurred from a write, or whether it
      originated in userspace.
      
      The SGX hardware architecture has its own per-page memory management
      metadata (EPCM) [*] and hardware which is separate from the normal x86 MMU.
      The architecture has a new page fault error code: PF_SGX.  This new error
      code bit is set whenever a page fault occurs as the result of the SGX MMU.
      
      These faults occur for a variety of reasons.  For instance, an access
      attempt to enclave memory from outside the enclave causes a PF_SGX fault.
      PF_SGX would also be set for permission conflicts, such as if a write to an
      enclave page occurs and the page is marked read-write in the x86 page
      tables but is read-only in the EPCM.
      
      These faults do not always indicate errors, though.  SGX pages are
      encrypted with a key that is destroyed at hardware reset, including
      suspend. Throwing a SIGSEGV allows user space software to react and recover
      when these events occur.
      
      Include PF_SGX in the PF error codes list and throw SIGSEGV when it is
      encountered.
      
      [*] Intel SDM: 36.5.1 Enclave Page Cache Map (EPCM)
      
       [ bp: Add bit 15 to the comment above enum x86_pf_error_code too. ]
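      
      A minimal sketch of what the commit describes, assuming the bit
      position documented in the SDM and the access_error() path touched by
      upstream 74faeee0 (details of the surrounding kernel code may differ):
      
      	/* arch/x86/include/asm/trap_pf.h (sketch) */
      	enum x86_pf_error_code {
      		X86_PF_PROT	= 1 << 0,
      		X86_PF_WRITE	= 1 << 1,
      		X86_PF_USER	= 1 << 2,
      		X86_PF_RSVD	= 1 << 3,
      		X86_PF_INSTR	= 1 << 4,
      		X86_PF_PK	= 1 << 5,
      		X86_PF_SGX	= 1 << 15,	/* fault originated from the SGX EPCM */
      	};
      
      	/* arch/x86/mm/fault.c, in access_error() (sketch): an EPCM-induced
      	 * fault cannot be resolved by the kernel, so report a bad access
      	 * and let a SIGSEGV be delivered. */
      	if (error_code & X86_PF_SGX)
      		return 1;
      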
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Jethro Beekman <jethro@fortanix.com>
      Link: https://lkml.kernel.org/r/20201112220135.165028-7-jarkko@kernel.org
      Signed-off-by: Fan Du <fan.du@intel.com>
      Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
      84ddf6b0
  3. 22 Feb 2022: 2 commits
  4. 21 Oct 2021: 3 commits
  5. 09 Apr 2021: 1 commit
  6. 22 Oct 2020: 1 commit
    • x86/kvm: Update the comment about asynchronous page fault in exc_page_fault() · 66af4f5c
      Authored by Vitaly Kuznetsov
      KVM was switched to an interrupt-based mechanism for 'page ready' event
      delivery in Linux 5.8 (see commit 2635b5c4 ("KVM: x86: interrupt based
      APF 'page ready' event delivery")) and the #PF (ab)use for 'page ready'
      event delivery was removed. Linux guests switched to this new mechanism
      exclusively in 5.9 (see commit b1d40575 ("KVM: x86: Switch KVM guest to
      using interrupts for page ready APF delivery")), so it is not possible
      to get a #PF for a 'page ready' event even when the guest is running on
      top of an older KVM (the APF mechanism won't be enabled). Update the
      comment in exc_page_fault() to reflect the new reality.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20201002154313.1505327-1-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      66af4f5c
  7. 07 Oct 2020: 1 commit
  8. 03 Sep 2020: 1 commit
    • x86/mm/32: Bring back vmalloc faulting on x86_32 · 4819e15f
      Authored by Joerg Roedel
      One cannot simply remove vmalloc faulting on x86-32. Upstream
      
      	commit: 7f0a002b ("x86/mm: remove vmalloc faulting")
      
      removed it on x86 altogether because the arch_sync_kernel_mappings()
      interface had previously been introduced. This interface added
      synchronization of vmalloc/ioremap page-table updates to all
      page-tables in the system at creation time and was thought to make
      vmalloc faulting obsolete.
      
      But that assumption was incredibly naive.
      
      It turned out that there is a race window between the time the vmalloc
      or ioremap code establishes a mapping and the time it synchronizes
      this change to other page-tables in the system.
      
      During this race window another CPU or thread can establish a vmalloc
      mapping which uses the same intermediate page-table entries (e.g. PMD
      or PUD) and does no synchronization in the end, because it found all
      necessary mappings already present in the kernel reference page-table.
      
      But when these intermediate page-table entries are not yet
      synchronized, the other CPU or thread will continue with a vmalloc
      address that is not yet mapped in the page-table it currently uses,
      causing an unhandled page fault and oops like below:
      
      	BUG: unable to handle page fault for address: fe80c000
      	#PF: supervisor write access in kernel mode
      	#PF: error_code(0x0002) - not-present page
      	*pde = 33183067 *pte = a8648163
      	Oops: 0002 [#1] SMP
      	CPU: 1 PID: 13514 Comm: cve-2017-17053 Tainted: G
      	...
      	Call Trace:
      	 ldt_dup_context+0x66/0x80
      	 dup_mm+0x2b3/0x480
      	 copy_process+0x133b/0x15c0
      	 _do_fork+0x94/0x3e0
      	 __ia32_sys_clone+0x67/0x80
      	 __do_fast_syscall_32+0x3f/0x70
      	 do_fast_syscall_32+0x29/0x60
      	 do_SYSENTER_32+0x15/0x20
      	 entry_SYSENTER_32+0x9f/0xf2
      	EIP: 0xb7eef549
      
      So the arch_sync_kernel_mappings() interface is racy, but removing it
      would mean re-introducing the vmalloc_sync_all() interface, which is
      even more awful. Keep arch_sync_kernel_mappings() in place and catch
      the race condition in the page-fault handler instead.
      
      Do a partial revert of the above commit to get vmalloc faulting on
      x86-32 back in place.
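      
      A simplified sketch of the restored fault path, assuming 2-level
      paging (the real x86-32 code also handles PAE, and vmalloc_sync_one()
      does the actual entry copy from the init_mm reference page table):
      
      	static noinline int vmalloc_fault(unsigned long address)
      	{
      		unsigned long pgd_paddr;
      		pmd_t *pmd_k;
      
      		if (!(address >= VMALLOC_START && address < VMALLOC_END))
      			return -1;	/* not a vmalloc address */
      
      		/*
      		 * Copy the PMD entry for 'address' from the kernel
      		 * reference page table into the page table that the
      		 * faulting task is currently using.
      		 */
      		pgd_paddr = read_cr3_pa();
      		pmd_k = vmalloc_sync_one(__va(pgd_paddr), address);
      		if (!pmd_k)
      			return -1;
      
      		if (!pte_present(*pte_offset_kernel(pmd_k, address)))
      			return -1;
      
      		return 0;
      	}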
      
      Fixes: 7f0a002b ("x86/mm: remove vmalloc faulting")
      Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20200902155904.17544-1-joro@8bytes.org
      4819e15f
  9. 13 Aug 2020: 2 commits
    • mm/x86: use general page fault accounting · 968614fc
      Authored by Peter Xu
      Use the general page fault accounting by passing regs into
      handle_mm_fault().
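      
      The x86 side of the change is essentially one argument at the call
      site (a sketch in diff form; the real patch also drops the per-arch
      accounting that the generic code now performs):
      
      	/* arch/x86/mm/fault.c */
      	- fault = handle_mm_fault(vma, address, flags, NULL);
      	+ fault = handle_mm_fault(vma, address, flags, regs);
      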
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Link: http://lkml.kernel.org/r/20200707225021.200906-23-peterx@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      968614fc
    • mm: do page fault accounting in handle_mm_fault · bce617ed
      Authored by Peter Xu
      Patch series "mm: Page fault accounting cleanups", v5.
      
      This is v5 of the pf accounting cleanup series.  It originates from Gerald
      Schaefer's report a week ago of an issue regarding incorrect page fault
      accounting for retried page faults after commit 4064b982 ("mm: allow
      VM_FAULT_RETRY for multiple times"):
      
        https://lore.kernel.org/lkml/20200610174811.44b94525@thinkpad/
      
      What this series did:
      
        - Correct page fault accounting: we do accounting for a page fault
          (no matter whether it's from #PF handling, or gup, or anything else)
          only with the attempt that completes the fault.  For example, page
          fault retries should not be counted in the page fault counters.  The
          same applies to the perf events.
      
        - Unify the definition of PERF_COUNT_SW_PAGE_FAULTS: currently this perf
          event is used in an ad hoc way across different archs.
      
          Case (1): for many archs it's done at the entry of a page fault
          handler, so that it will also cover e.g. erroneous faults.
      
          Case (2): for some other archs, it is only accounted when the page
          fault is resolved successfully.
      
          Case (3): there are still quite a few archs that have not enabled
          this perf event.
      
          Since this series touches nearly all the archs, we unify this
          perf event to always follow case (1), which is the one that makes the
          most sense.  And since we moved the accounting into handle_mm_fault,
          the other two MAJ/MIN perf events are well taken care of naturally.
      
        - Unify definition of "major faults": the definition of "major
          fault" is slightly changed when used in accounting (not
          VM_FAULT_MAJOR).  More information in patch 1.
      
        - Always account the page fault onto the one that triggered the page
          fault.  This does not matter much for #PF handlings, but mostly for
          gup.  More information on this in patch 25.
      
      Patchset layout:
      
      Patch 1:     Introduced the accounting in handle_mm_fault(), not enabled.
      Patch 2-23:  Enable the new accounting for arch #PF handlers one by one.
      Patch 24:    Enable the new accounting for the rest outliers (gup, iommu, etc.)
      Patch 25:    Cleanup GUP task_struct pointer since it's not needed any more
      
      This patch (of 25):
      
      This is a preparation patch to move page fault accountings into the
      general code in handle_mm_fault().  This includes both the per task
      flt_maj/flt_min counters, and the major/minor page fault perf events.  To
      do this, the pt_regs pointer is passed into handle_mm_fault().
      
      PERF_COUNT_SW_PAGE_FAULTS should still be kept in per-arch page fault
      handlers.
      
      So far, all the pt_regs pointers passed into handle_mm_fault() are
      NULL, which means this patch should introduce no functional change.
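      
      For orientation, the generic accounting that the series builds up in
      mm/memory.c ends up looking roughly like this (a sketch following the
      rules stated above; the helper name and details may differ from
      mainline):
      
      	static inline void mm_account_fault(struct pt_regs *regs,
      					    unsigned long address,
      					    unsigned int flags, vm_fault_t ret)
      	{
      		bool major;
      
      		/* Count a fault only once, when it actually completes. */
      		if (ret & (VM_FAULT_ERROR | VM_FAULT_RETRY))
      			return;
      
      		/* A completed fault that did I/O or was retried is major. */
      		major = (ret & VM_FAULT_MAJOR) || (flags & FAULT_FLAG_TRIED);
      
      		if (major)
      			current->maj_flt++;
      		else
      			current->min_flt++;
      
      		/* Callers such as gup pass regs == NULL: no perf events. */
      		if (!regs)
      			return;
      
      		if (major)
      			perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address);
      		else
      			perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address);
      	}
      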
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200707225021.200906-1-peterx@redhat.com
      Link: http://lkml.kernel.org/r/20200707225021.200906-2-peterx@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bce617ed
  10. 08 Aug 2020: 1 commit
    • mm: remove unneeded includes of <asm/pgalloc.h> · ca15ca40
      Authored by Mike Rapoport
      Patch series "mm: cleanup usage of <asm/pgalloc.h>"
      
      Most architectures have very similar versions of pXd_alloc_one() and
      pXd_free_one() for intermediate levels of page table.  These patches add
      generic versions of these functions in <asm-generic/pgalloc.h> and enable
      use of the generic functions where appropriate.
      
      In addition, functions declared and defined in <asm/pgalloc.h> headers are
      used mostly by core mm and early mm initialization in arch and there is no
      actual reason to have the <asm/pgalloc.h> included all over the place.
      The first patch in this series removes unneeded includes of
      <asm/pgalloc.h>
      
      In the end it didn't work out as neatly as I hoped and moving
      pXd_alloc_track() definitions to <asm-generic/pgalloc.h> would require
      unnecessary changes to arches that have custom page table allocations, so
      I've decided to move lib/ioremap.c to mm/ and make pgalloc-track.h local
      to mm/.
      
      This patch (of 8):
      
      In most cases <asm/pgalloc.h> header is required only for allocations of
      page table memory.  Most of the .c files that include that header do not
      use symbols declared in <asm/pgalloc.h> and do not require that header.
      
      As for the other header files that used to include <asm/pgalloc.h>, it is
      possible to move that include into the .c file that actually uses symbols
      from <asm/pgalloc.h> and drop the include from the header file.
      
      The process was somewhat automated using
      
      	sed -i -E '/[<"]asm\/pgalloc\.h/d' \
                      $(grep -L -w -f /tmp/xx \
                              $(git grep -E -l '[<"]asm/pgalloc\.h'))
      
      where /tmp/xx contains all the symbols defined in
      arch/*/include/asm/pgalloc.h.
      
      [rppt@linux.ibm.com: fix powerpc warning]
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Pekka Enberg <penberg@kernel.org>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
      Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Link: http://lkml.kernel.org/r/20200627143453.31835-1-rppt@kernel.org
      Link: http://lkml.kernel.org/r/20200627143453.31835-2-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ca15ca40
  11. 24 Jul 2020: 1 commit
  12. 07 Jul 2020: 1 commit
  13. 19 Jun 2020: 1 commit
  14. 18 Jun 2020: 1 commit
  15. 11 Jun 2020: 4 commits
  16. 10 Jun 2020: 3 commits
  17. 03 Jun 2020: 3 commits
  18. 19 May 2020: 1 commit
  19. 08 Apr 2020: 1 commit
  20. 03 Apr 2020: 3 commits
    • mm: allow VM_FAULT_RETRY for multiple times · 4064b982
      Authored by Peter Xu
      The idea comes from a discussion between Linus and Andrea [1].
      
      Before this patch we only allowed a page fault to retry once.  We achieved
      this by clearing the FAULT_FLAG_ALLOW_RETRY flag when doing
      handle_mm_fault() the second time.  This was mainly used to avoid
      unexpected starvation of the system by looping forever to handle the
      page fault on a single page.  However, that should hardly ever happen;
      after all, each code path that returns VM_FAULT_RETRY first waits for a
      condition to happen (during which time we should possibly yield the CPU)
      before VM_FAULT_RETRY is actually returned.
      
      This patch removes the restriction by keeping the FAULT_FLAG_ALLOW_RETRY
      flag when we receive VM_FAULT_RETRY.  It means that the page fault handler
      now can retry the page fault for multiple times if necessary without the
      need to generate another page fault event.  Meanwhile we still keep the
      FAULT_FLAG_TRIED flag so page fault handler can still identify whether a
      page fault is the first attempt or not.
      
      Then we'll have these combinations of fault flags (only considering
      ALLOW_RETRY flag and TRIED flag):
      
        - ALLOW_RETRY and !TRIED:  this means the page fault allows to
                                   retry, and this is the first try
      
        - ALLOW_RETRY and TRIED:   this means the page fault allows to
                                   retry, and this is not the first try
      
        - !ALLOW_RETRY and !TRIED: this means the page fault does not allow
                                   to retry at all
      
        - !ALLOW_RETRY and TRIED:  this is forbidden and should never be used
      
      In existing code we have multiple places that have taken special care of
      the first condition above by checking against (fault_flags &
      FAULT_FLAG_ALLOW_RETRY).  This patch introduces a simple helper to detect
      the first retry of a page fault by checking against both (fault_flags &
      FAULT_FLAG_ALLOW_RETRY) and !(fault_flags & FAULT_FLAG_TRIED), because
      now even the 2nd try will have ALLOW_RETRY set; then use that helper in
      all existing special paths.  One example is in __lock_page_or_retry():
      now we'll drop the mmap_sem only in the first attempt of a page fault and
      we'll keep it in follow-up retries, so the old locking behavior is retained.
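      
      The helper is small; a sketch matching the description above (upstream
      names it fault_flag_allow_retry_first(), in include/linux/mm.h):
      
      	static inline bool fault_flag_allow_retry_first(unsigned int flags)
      	{
      		return (flags & FAULT_FLAG_ALLOW_RETRY) &&
      		       (!(flags & FAULT_FLAG_TRIED));
      	}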
      
      This will be a nice enhancement for the current code [2] and at the same
      time supporting material for the future userfaultfd-writeprotect work,
      since in that work there will always be an explicit userfault writeprotect
      retry for protected pages, and if that cannot resolve the page fault
      (e.g., when userfaultfd-writeprotect is used in conjunction with swapped
      pages) then we'll possibly need a 3rd retry of the page fault.  It might
      also benefit other potential users who have a similar requirement, like
      userfault write-protection.
      
      GUP code is not touched yet and will be covered in follow up patch.
      
      Please read the thread below for more information.
      
      [1] https://lore.kernel.org/lkml/20171102193644.GB22686@redhat.com/
      [2] https://lore.kernel.org/lkml/20181230154648.GB9832@redhat.com/
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Brian Geffon <bgeffon@google.com>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
      Cc: Martin Cracauer <cracauer@cons.org>
      Cc: Marty McFadden <mcfadden8@llnl.gov>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Maya Gokhale <gokhale2@llnl.gov>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Link: http://lkml.kernel.org/r/20200220160246.9790-1-peterx@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4064b982
    • mm: introduce FAULT_FLAG_DEFAULT · dde16072
      Authored by Peter Xu
      Although there are tons of arch-specific page fault handlers, most of them
      still share the same initial value of the page fault flags.  For example,
      nearly all of the page fault handlers allow the fault to be retried, and
      they also allow the fault to respond to SIGKILL.
      
      Let's define a default value for the fault flags to replace those initial
      page fault flags that were copied over.  With this, it'll be far easier to
      introduce a new fault flag that can be used by all the architectures
      instead of touching all the archs.
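      
      A sketch of the resulting default as this patch defines it (later
      patches in the series extend the set, e.g. with
      FAULT_FLAG_INTERRUPTIBLE, so the composition in mainline may differ):
      
      	/* include/linux/mm.h (sketch) */
      	#define FAULT_FLAG_DEFAULT  (FAULT_FLAG_ALLOW_RETRY | \
      				     FAULT_FLAG_KILLABLE)
      
      	/* Arch fault handlers then start from the shared default: */
      	unsigned int flags = FAULT_FLAG_DEFAULT;
      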
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Brian Geffon <bgeffon@google.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
      Cc: Martin Cracauer <cracauer@cons.org>
      Cc: Marty McFadden <mcfadden8@llnl.gov>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Maya Gokhale <gokhale2@llnl.gov>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Link: http://lkml.kernel.org/r/20200220160238.9694-1-peterx@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dde16072
    • x86/mm: use helper fault_signal_pending() · 39678191
      Authored by Peter Xu
      Let's move the fatal signal check even earlier so that we can directly use
      the new fault_signal_pending() in x86 mm code.
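      
      A sketch of the helper and the resulting x86 check, per this series
      (identifier spellings in the surrounding handler may differ):
      
      	/* include/linux/sched/signal.h (sketch) */
      	static inline bool fault_signal_pending(vm_fault_t fault_flags,
      						struct pt_regs *regs)
      	{
      		return unlikely((fault_flags & VM_FAULT_RETRY) &&
      				(fatal_signal_pending(current) ||
      				 (user_mode(regs) && signal_pending(current))));
      	}
      
      	/* arch/x86/mm/fault.c (sketch): bail out right after
      	 * handle_mm_fault() if a signal interrupted the retry. */
      	if (fault_signal_pending(fault, regs)) {
      		if (!user_mode(regs))
      			no_context(regs, hw_error_code, address,
      				   SIGBUS, BUS_ADRERR);
      		return;
      	}
      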
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Brian Geffon <bgeffon@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
      Cc: Martin Cracauer <cracauer@cons.org>
      Cc: Marty McFadden <mcfadden8@llnl.gov>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Maya Gokhale <gokhale2@llnl.gov>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Link: http://lkml.kernel.org/r/20200220155353.8676-5-peterx@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      39678191
  21. 22 Mar 2020: 1 commit
  22. 07 Jan 2020: 1 commit
    • x86/context-tracking: Remove exception_enter/exit() from do_page_fault() · ee6352b2
      Authored by Frederic Weisbecker
      do_page_fault(), like other exceptions, is already covered by
      user_enter() and user_exit() when the exception triggers in userspace.
      
      As explained in:
      
        8c84014f ("x86/entry: Remove exception_enter() from most trap handlers")
      
      exception_enter/exit() only remained to handle a possible page fault from
      kernel mode while context tracking is in CONTEXT_USER mode, i.e. on
      kernel entry before we manage to call user_exit(). The only known
      offender was do_fast_syscall_32() fetching the EBP register from where
      the vDSO stashed it.
      
      Meanwhile this got fixed in:
      
        9999c8c0 ("x86/entry: Call enter_from_user_mode() with IRQs off")
      
      that moved enter_from_user_mode() before the call to get_user().
      
      So we can safely remove it now.
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Wanpeng Li <wanpengli@tencent.com>
      Link: https://lkml.kernel.org/r/20191227163612.10039-2-frederic@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ee6352b2
  23. 10 Dec 2019: 1 commit
    • mm, x86/mm: Untangle address space layout definitions from basic pgtable type definitions · 186525bd
      Authored by Ingo Molnar
      - Untangle the somewhat incestuous way in which VMALLOC_START is used all across the
        kernel, while being, on x86, defined deep inside one of the lowest-level page table headers.
        It doesn't help that vmalloc.h only includes a single asm header:
      
           #include <asm/page.h>           /* pgprot_t */
      
        So there was no existing cross-arch way to decouple address layout
        definitions from page.h details. I used this:
      
         #ifndef VMALLOC_START
         # include <asm/vmalloc.h>
         #endif
      
        This way every architecture that wants to simplify page.h can do so.
      
      - Also on x86 we had a couple of LDT related inline functions that used
        the late-stage address space layout positions - but these could be
        uninlined without real trouble - the end result is cleaner this way as
        well.
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mm@kvack.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      186525bd
  24. 27 Nov 2019: 1 commit
    • x86/mm/32: Sync only to VMALLOC_END in vmalloc_sync_all() · 9a62d200
      Authored by Joerg Roedel
      The job of vmalloc_sync_all() is to help the lazy freeing of vmalloc()
      ranges: before such vmap ranges are reused we make sure that they are
      unmapped from every task's page tables.
      
      This is really easy on pagetable setups where the kernel page tables
      are shared between all tasks - this is the case on 32-bit kernels
      with SHARED_KERNEL_PMD = 1.
      
      But on !SHARED_KERNEL_PMD 32-bit kernels this involves iterating
      over the pgd_list and clearing all pmd entries in the pgds that
      are cleared in the init_mm.pgd, which is the reference pagetable
      that the vmalloc() code uses.
      
      In that context the current practice of vmalloc_sync_all() iterating
      until FIXADDR_TOP is buggy:
      
              for (address = VMALLOC_START & PMD_MASK;
                   address >= TASK_SIZE_MAX && address < FIXADDR_TOP;
                   address += PMD_SIZE) {
                      struct page *page;
      
      Because iterating up to FIXADDR_TOP will involve a lot of non-vmalloc
      address ranges:
      
      	VMALLOC -> PKMAP -> LDT -> CPU_ENTRY_AREA -> FIX_ADDR
      
      This is mostly harmless for the FIX_ADDR and CPU_ENTRY_AREA ranges
      that don't clear their pmds, but it's lethal for the LDT range,
      which relies on having different mappings in different processes,
      and 'synchronizing' them in the vmalloc sense corrupts those
      pagetable entries (clearing them).
      
      This got particularly prominent with PTI, which turns SHARED_KERNEL_PMD
      off and makes this the dominant mapping mode on 32-bit.
      
      To make LDT working again vmalloc_sync_all() must only iterate over
      the volatile parts of the kernel address range that are identical
      between all processes.
      
      So the correct check in vmalloc_sync_all() is "address < VMALLOC_END"
      to make sure the VMALLOC areas are synchronized and the LDT
      mapping is not falsely overwritten.
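      
      In code terms this is a one-condition change to the loop quoted above
      (sketch):
      
      	for (address = VMALLOC_START & PMD_MASK;
      	     address >= TASK_SIZE_MAX && address < VMALLOC_END;
      	     address += PMD_SIZE) {
      		struct page *page;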
      
      The CPU_ENTRY_AREA and the FIXMAP area are no longer synced either,
      but this is not really a problem since their PMDs get established
      during bootup and never change.
      
      This change fixes the ldt_gdt selftest in my setup.
      
      [ mingo: Fixed up the changelog to explain the logic and modified the
               copying to only happen up until VMALLOC_END. ]
      Reported-by: Borislav Petkov <bp@suse.de>
      Tested-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Cc: <stable@vger.kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: hpa@zytor.com
      Fixes: 7757d607 ("x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32")
      Link: https://lkml.kernel.org/r/20191126111119.GA110513@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9a62d200
  25. 22 Jul 2019: 2 commits
  26. 18 Jul 2019: 1 commit