1. 19 Feb 2016 (2 commits)
    • mm/core, x86/mm/pkeys: Differentiate instruction fetches · d61172b4
      Dave Hansen authored
      As discussed earlier, we attempt to enforce protection keys in
      software.
      
      However, the code checks all faults to ensure that they are not
      violating protection key permissions.  It was assumed that all
      faults are either write faults where we check PKRU[key].WD (write
      disable) or read faults where we check the AD (access disable)
      bit.
      
      But, there is a third category of faults for protection keys:
      instruction faults.  Instruction faults never run afoul of
      protection keys because they do not affect instruction fetches.
      
      So, plumb the PF_INSTR bit down in to the
      arch_vma_access_permitted() function where we do the protection
      key checks.
      
      We also add a new FAULT_FLAG_INSTRUCTION.  This is because
      handle_mm_fault() is not passed the architecture-specific
      error_code where we keep PF_INSTR, so we need to encode the
      instruction fetch information in to the arch-generic fault
      flags.
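
      For illustration only, a minimal sketch of that plumbing (simplified,
      not the literal diff; the handle_mm_fault() call shape is assumed from
      kernels of that era):

        /* x86 fault handler: translate the arch error code into generic flags. */
        unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;

        if (error_code & PF_WRITE)
                flags |= FAULT_FLAG_WRITE;
        if (error_code & PF_INSTR)
                flags |= FAULT_FLAG_INSTRUCTION;        /* new arch-generic flag */

        /* handle_mm_fault() never sees PF_INSTR, only the generic flag. */
        fault = handle_mm_fault(mm, vma, address, flags);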
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210224.96928009@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      d61172b4
    • x86/mm/pkeys: Optimize fault handling in access_error() · 07f146f5
      Dave Hansen authored
      We might not strictly have to make modifications to
      access_error() to check the VMA here.
      
      If we do not, we will do this:
      
       1. app sets VMA pkey to K
       2. app touches a !present page
       3. do_page_fault(), allocates and maps page, sets pte.pkey=K
       4. return to userspace
       5. touch instruction reexecutes, but triggers PF_PK
       6. do PKEY signal
      
      What happens with this patch applied:
      
       1. app sets VMA pkey to K
       2. app touches a !present page
       3. do_page_fault() notices that K is inaccessible
       4. do PKEY signal
      
      We basically skip the fault that does an allocation.
      
      So what this lets us do is protect areas from even being
      *populated* unless it is accessible according to protection
      keys.  That seems handy to me and makes protection keys work
      more like an mprotect()'d mapping.
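
      A rough sketch of the kind of check this enables in access_error()
      (helper names follow the neighbouring commits; treat it as
      illustrative, not the exact diff):

        static inline int access_error(unsigned long error_code,
                                       struct vm_area_struct *vma)
        {
                /* A hardware-reported protection-key fault is always bad. */
                if (error_code & PF_PK)
                        return 1;

                /*
                 * Even when the page is not present yet (so no PF_PK is
                 * possible), refuse accesses the current PKRU would not
                 * permit, so the fault that would only populate the page
                 * is skipped.
                 */
                if (!arch_vma_access_permitted(vma, error_code & PF_WRITE))
                        return 1;

                /* ... existing read/write permission checks ... */
                return 0;
        }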
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210222.EBB63D8C@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      07f146f5
  2. 18 Feb 2016 (4 commits)
    • mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys · 33a709b2
      Dave Hansen authored
      Today, for normal faults and page table walks, we check the VMA
      and/or PTE to ensure that it is compatible with the action.  For
      instance, if we get a write fault on a non-writeable VMA, we
      SIGSEGV.
      
      We try to do the same thing for protection keys.  Basically, we
      try to make sure that if a user does this:
      
      	mprotect(ptr, size, PROT_NONE);
      	*ptr = foo;
      
      they see the same effects with protection keys when they do this:
      
      	mprotect(ptr, size, PROT_READ|PROT_WRITE);
      	set_pkey(ptr, size, 4);
      	wrpkru(0xffffff3f); // access disable pkey 4
      	*ptr = foo;
      
      The state to do that checking is in the VMA, but we also
      sometimes have to do it on the page tables only, like when doing
      a get_user_pages_fast() where we have no VMA.
      
      We add two functions and expose them to generic code:
      
      	arch_pte_access_permitted(pte_flags, write)
      	arch_vma_access_permitted(vma, write)
      
      These are, of course, backed up in x86 arch code with checks
      against the PTE or VMA's protection key.
      
      But, there are also cases where we do not want to respect
      protection keys.  When we ptrace(), for instance, we do not want
      to apply the tracer's PKRU permissions to the PTEs from the
      process being traced.
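
      A simplified sketch of how the x86 side can back these hooks (PKRU
      holds two bits per key: AD in the low bit of the pair, WD in the high
      bit; the exact helper and bit-field names here are assumptions):

        static inline bool pkru_allows_access(u32 pkru, u16 pkey, bool write)
        {
                int bit = pkey * 2;

                if (pkru & (1U << bit))                  /* AD: access disabled */
                        return false;
                if (write && (pkru & (1U << (bit + 1)))) /* WD: writes disabled */
                        return false;
                return true;
        }

        bool arch_pte_access_permitted(unsigned long pte_flags, bool write)
        {
                u16 pkey = (pte_flags >> _PAGE_BIT_PKEY_BIT0) & 0xf;

                return pkru_allows_access(read_pkru(), pkey, write);
        }

        bool arch_vma_access_permitted(struct vm_area_struct *vma, bool write)
        {
                /* e.g. ptrace: a foreign mm must not be judged by our PKRU */
                if (current->mm != vma->vm_mm)
                        return true;

                return pkru_allows_access(read_pkru(), vma_pkey(vma), write);
        }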
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Boaz Harrosh <boaz@plexistor.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Cc: Dominik Vogt <vogt@linux.vnet.ibm.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Shachar Raindel <raindel@mellanox.com>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: linux-s390@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/20160212210219.14D5D715@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      33a709b2
    • x86/mm/pkeys: Fill in pkey field in siginfo · 019132ff
      Dave Hansen authored
      This fills in the new siginfo field: si_pkey to indicate to
      userspace which protection key was set on the PTE that we faulted
      on.
      
      Note though that *ALL* protection key faults have to be generated
      by a valid, present PTE at some point.  But this code does no PTE
      lookups, which seems odd.  The reason is that we take advantage of
      the way we generate PTEs from VMAs.  All PTEs under a VMA share
      some attributes.  For instance, they are _all_ either PROT_READ
      *OR* PROT_NONE.  They also always share a protection key, so we
      never have to walk the page tables; we just use the VMA.
      
      Note that _pkey is a 64-bit value.  The current hardware only
      supports 4-bit protection keys.  We do this because there is
      _plenty_ of space in _sigfault and it is possible that future
      processors would support more than 4 bits of protection keys.
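
      For reference, a hypothetical userspace consumer of the new field
      (assumes a libc that exposes si_pkey and SEGV_PKUERR; shown only for
      illustration):

        #include <signal.h>
        #include <stdio.h>
        #include <unistd.h>

        static void segv_handler(int sig, siginfo_t *si, void *ctx)
        {
                if (si->si_code == SEGV_PKUERR)
                        fprintf(stderr, "pkey fault: key %d at %p\n",
                                (int)si->si_pkey, si->si_addr);
                _exit(1);
        }

        int main(void)
        {
                struct sigaction sa = {
                        .sa_sigaction = segv_handler,
                        .sa_flags     = SA_SIGINFO,
                };

                sigaction(SIGSEGV, &sa, NULL);
                /* ... set up a pkey-protected mapping and touch it here ... */
                return 0;
        }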
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210213.ABC488FA@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      019132ff
    • x86/mm/pkeys: Pass VMA down in to fault signal generation code · 7b2d0dba
      Dave Hansen authored
      During a page fault, we look up the VMA to ensure that the fault
      is in a region with a valid mapping.  But, in the top-level page
      fault code we don't need the VMA for much else.  Once we have
      decided that an access is bad, we are going to send a signal no
      matter what and do not need the VMA any more.  So we do not pass
      it down in to the signal generation code.
      
      But, for protection keys, we need the VMA.  It tells us *which*
      protection key we violated if we get a PF_PK.  So, we need to
      pass the VMA down and fill in siginfo->si_pkey.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210211.AD3B36A3@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      7b2d0dba
    • x86/mm/pkeys: Add new 'PF_PK' page fault error code bit · b3ecd515
      Dave Hansen authored
      Note: "PK" is how the Intel SDM refers to this bit, so we also
      use that nomenclature.
      
      This only defines the bit, it does not plumb it anywhere to be
      handled.
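
      For context, the x86 page fault error code bits, with the new bit at
      the end (bit positions per the SDM; sketch of the definition):

        enum x86_pf_error_code {
                PF_PROT  = 1 << 0,      /* fault was on a present page */
                PF_WRITE = 1 << 1,      /* access was a write */
                PF_USER  = 1 << 2,      /* fault happened in user mode */
                PF_RSVD  = 1 << 3,      /* reserved bit was set in a PTE */
                PF_INSTR = 1 << 4,      /* fault was an instruction fetch */
                PF_PK    = 1 << 5,      /* protection-keys violation (new) */
        };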
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210207.DA7B43E6@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b3ecd515
  3. 31 Jul 2015 (2 commits)
  4. 19 May 2015 (1 commit)
    • mm/fault, arch: Use pagefault_disable() to check for disabled pagefaults in the handler · 70ffdb93
      David Hildenbrand authored
      Introduce faulthandler_disabled() and use it to check for irq context and
      disabled pagefaults (via pagefault_disable()) in the pagefault handlers.
      
      Please note that we keep the in_atomic() checks in place - to detect
      whether we are in irq context (in which case preemption is always
      properly disabled).
      
      In contrast, preempt_disable() should never be used to disable pagefaults.
      With !CONFIG_PREEMPT_COUNT, preempt_disable() doesn't modify the preempt
      counter, and therefore the result of in_atomic() differs.
      We validate that condition by using might_fault() checks when calling
      might_sleep().
      
      Therefore, add a comment to faulthandler_disabled(), describing why this
      is needed.
      
      faulthandler_disabled() and pagefault_disable() are defined in
      linux/uaccess.h, so let's properly add that include to all relevant files.
      
      This patch is based on a patch from Thomas Gleixner.
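
      The helper itself boils down to a one-liner, and the fault handlers
      use it roughly like this (sketch; the fixup-path name below is
      illustrative and varies per architecture):

        /* linux/uaccess.h */
        #define faulthandler_disabled() (pagefault_disabled() || in_atomic())

        /* typical arch pagefault handler */
        if (unlikely(faulthandler_disabled() || !mm)) {
                /*
                 * No user context, or pagefaults are disabled: take the
                 * exception-fixup/oops path instead of handling the fault.
                 */
                bad_page_fault(regs, address);  /* illustrative name */
                return;
        }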
      Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: David.Laight@ACULAB.COM
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: airlied@linux.ie
      Cc: akpm@linux-foundation.org
      Cc: benh@kernel.crashing.org
      Cc: bigeasy@linutronix.de
      Cc: borntraeger@de.ibm.com
      Cc: daniel.vetter@intel.com
      Cc: heiko.carstens@de.ibm.com
      Cc: herbert@gondor.apana.org.au
      Cc: hocko@suse.cz
      Cc: hughd@google.com
      Cc: mst@redhat.com
      Cc: paulus@samba.org
      Cc: ralf@linux-mips.org
      Cc: schwidefsky@de.ibm.com
      Cc: yang.shi@windriver.com
      Link: http://lkml.kernel.org/r/1431359540-32227-7-git-send-email-dahi@linux.vnet.ibm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      70ffdb93
  5. 23 Mar 2015 (2 commits)
  6. 04 Feb 2015 (1 commit)
  7. 30 Jan 2015 (1 commit)
    • vm: add VM_FAULT_SIGSEGV handling support · 33692f27
      Linus Torvalds authored
      The core VM already knows about VM_FAULT_SIGBUS, but cannot return a
      "you should SIGSEGV" error, because the SIGSEGV case was generally
      handled by the caller - usually the architecture fault handler.
      
      That results in lots of duplication - all the architecture fault
      handlers end up doing very similar "look up vma, check permissions, do
      retries etc" - but it generally works.  However, there are cases where
      the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV.
      
      In particular, when accessing the stack guard page, libsigsegv expects a
      SIGSEGV.  And it usually got one, because the stack growth is handled by
      that duplicated architecture fault handler.
      
      However, when the generic VM layer started propagating the error return
      from the stack expansion in commit fee7e49d ("mm: propagate error
      from stack expansion even for guard page"), that now exposed the
      existing VM_FAULT_SIGBUS result to user space.  And user space really
      expected SIGSEGV, not SIGBUS.
      
      To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those
      duplicate architecture fault handlers about it.  They all already have
      the code to handle SIGSEGV, so it's about just tying that new return
      value to the existing code, but it's all a bit annoying.
      
      This is the mindless minimal patch to do this.  A more extensive patch
      would be to try to gather up the mostly shared fault handling logic into
      one generic helper routine, and long-term we really should do that
      cleanup.
      
      Just from this patch, you can generally see that most architectures just
      copied (directly or indirectly) the old x86 way of doing things, but in
      the meantime that original x86 model has been improved to hold the VM
      semaphore for shorter times etc and to handle VM_FAULT_RETRY and other
      "newer" things, so it would be a good idea to bring all those
      improvements to the generic case and teach other architectures about
      them too.
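
      The per-architecture change is mechanical; a typical fault handler
      ends up with something of this shape (sketch):

        fault = handle_mm_fault(mm, vma, address, flags);
        if (unlikely(fault & VM_FAULT_ERROR)) {
                if (fault & VM_FAULT_OOM)
                        goto out_of_memory;
                else if (fault & VM_FAULT_SIGSEGV)
                        goto bad_area;          /* new: raise SIGSEGV, not SIGBUS */
                else if (fault & VM_FAULT_SIGBUS)
                        goto do_sigbus;
                BUG();
        }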
      Reported-and-tested-by: Takashi Iwai <tiwai@suse.de>
      Tested-by: Jan Engelhardt <jengelh@inai.de>
      Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots"
      Cc: linux-arch@vger.kernel.org
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      33692f27
  8. 18 Dec 2014 (1 commit)
  9. 16 Dec 2014 (2 commits)
    • x86: mm: consolidate VM_FAULT_RETRY handling · 26178ec1
      Linus Torvalds authored
      The VM_FAULT_RETRY handling was confusing and incorrect for the case of
      returning to kernel mode.  We need to handle the exception table fixup
      if we return to kernel mode due to a fatal signal - it will basically
      look to the kernel user mode access like the access failed due to the VM
      going away from under it.  Which is correct - the process is dying - and
      avoids the whole "repeat endless kernel page faults" case.
      
      Handling the VM_FAULT_RETRY early and in just one place also simplifies
      the mmap_sem handling, since once we've taken care of VM_FAULT_RETRY we
      know that we can just drop the lock.  The remaining accounting and
      possible error handling is thread-local and does not need the mmap_sem.
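
      A sketch of the consolidated shape this describes (not the literal
      diff; helper names as used elsewhere in the x86 fault path):

        fault = handle_mm_fault(mm, vma, address, flags);

        if (unlikely(fault & VM_FAULT_RETRY)) {
                if (flags & FAULT_FLAG_ALLOW_RETRY) {
                        /* Retry at most once, unless a fatal signal arrived. */
                        flags &= ~FAULT_FLAG_ALLOW_RETRY;
                        flags |= FAULT_FLAG_TRIED;
                        if (!fatal_signal_pending(tsk))
                                goto retry;
                }

                /* User mode: just return and let the signal be delivered. */
                if (flags & FAULT_FLAG_USER)
                        return;

                /* Kernel mode: treat it as a failed access (fixup or oops). */
                no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
                return;
        }

        up_read(&mm->mmap_sem);   /* the rest is thread-local accounting */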
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      26178ec1
    • x86: mm: move mmap_sem unlock from mm_fault_error() to caller · 7fb08eca
      Linus Torvalds authored
      This replaces four copies in various stages of mm_fault_error() handling
      with just a single one.  It will also allow for more natural placement
      of the unlocking after some further cleanup.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7fb08eca
  10. 23 Sep 2014 (1 commit)
    • x86: skip check for spurious faults for non-present faults · 31668511
      David Vrabel authored
      If a fault on a kernel address is due to a non-present page, then it
      cannot be the result of stale TLB entry from a protection change (RO
      to RW or NX to X).  Thus the pagetable walk in spurious_fault() can be
      skipped.
      
      See the initial if in spurious_fault() and the tests in
      spurious_fault_check() for the set of possible error codes checked
      for spurious faults.  These are:
      
               IRUWP
      Before   x00xx && ( 1xxxx || xxx1x )
      After  ( 10001 || 00011 ) && ( 1xxxx || xxx1x )
      
      Thus the new condition is a subset of the previous one, excluding only
      non-present faults (I == 1 and W == 1 are mutually exclusive).
      
      This avoids spurious_fault() oopsing in some cases if the pagetables
      it attempts to walk are not accessible; such an oops would obscure the
      location of the original fault.
      
      This also fixes a crash with Xen PV guests when they access entries in
      the M2P corresponding to device MMIO regions.  The M2P is mapped
      (read-only) by Xen into the kernel address space of the guest and this
      mapping may contains holes for non-RAM regions.  Read faults will
      result in calls to spurious_fault(), but because the page tables for
      the M2P mappings are not accessible by the guest the pagetable walk
      would fault.
      
      This was not normally a problem as MMIO mappings would not normally
      result in an M2P lookup because of the use of the _PAGE_IOMAP bit in the
      PTE.  However, removing the _PAGE_IOMAP bit requires M2P lookups for
      MMIO mappings as well.
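
      The added filter is essentially the following (sketch; error-code bits
      as in the table above):

        /* At the top of spurious_fault(), before any pagetable walk: */
        if (error_code != (PF_WRITE | PF_PROT) &&
            error_code != (PF_INSTR | PF_PROT))
                return 0;       /* cannot be a spurious (stale-TLB) fault */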
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: Dave Hansen <dave.hansen@intel.com>
      31668511
  11. 19 Sep 2014 (2 commits)
    • sched: Add helper for task stack page overrun checking · a70857e4
      Aaron Tomlin authored
      This facility is used in a few places so let's introduce
      a helper function to improve code readability.
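
      The helper is essentially a one-line wrapper around the existing
      canary check (sketch, with the name used by this series):

        #define task_stack_end_corrupted(task) \
                (*(end_of_stack(task)) != STACK_END_MAGIC)

        /* e.g. in the x86 fault path, while an oops is already underway: */
        if (task_stack_end_corrupted(tsk))
                printk(KERN_EMERG "Thread overran stack, or stack corrupted\n");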
      Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: aneesh.kumar@linux.vnet.ibm.com
      Cc: dzickus@redhat.com
      Cc: bmr@redhat.com
      Cc: jcastillo@redhat.com
      Cc: oleg@redhat.com
      Cc: riel@redhat.com
      Cc: prarit@redhat.com
      Cc: jgh@redhat.com
      Cc: minchan@kernel.org
      Cc: mpe@ellerman.id.au
      Cc: tglx@linutronix.de
      Cc: hannes@cmpxchg.org
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/1410527779-8133-3-git-send-email-atomlin@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a70857e4
    • init/main.c: Give init_task a canary · d4311ff1
      Aaron Tomlin authored
      Tasks get their end of stack set to STACK_END_MAGIC with the
      aim to catch stack overruns. Currently this feature does not
      apply to init_task. This patch removes this restriction.
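
      In essence, the canary setup is factored into a helper that can also
      be applied to init_task early during boot (sketch; assumed shape):

        void set_task_stack_end_magic(struct task_struct *tsk)
        {
                unsigned long *stackend;

                stackend = end_of_stack(tsk);
                *stackend = STACK_END_MAGIC;    /* for stack-overrun detection */
        }

        /* In start_kernel(), before real work runs on init_task's stack: */
        set_task_stack_end_magic(&init_task);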
      
      Note that a similar patch was posted by Prarit Bhargava
      some time ago but was never merged:
      
        http://marc.info/?l=linux-kernel&m=127144305403241&w=2
      Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Cc: aneesh.kumar@linux.vnet.ibm.com
      Cc: dzickus@redhat.com
      Cc: bmr@redhat.com
      Cc: jcastillo@redhat.com
      Cc: jgh@redhat.com
      Cc: minchan@kernel.org
      Cc: tglx@linutronix.de
      Cc: hannes@cmpxchg.org
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Daeseok Youn <daeseok.youn@gmail.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Fabian Frederick <fabf@skynet.be>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Michael Opdenacker <michael.opdenacker@free-electrons.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/1410527779-8133-2-git-send-email-atomlin@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      d4311ff1
  12. 16 Sep 2014 (1 commit)
    • x86/mm/hotplug: Modify PGD entry when removing memory · 9661d5bc
      Yasuaki Ishimatsu authored
      When hot-adding/removing memory, sync_global_pgds() is called to
      synchronize the kernel PGD entries into the PGDs of every process's MM.  But
      when hot-removing memory, sync_global_pgds() does not work
      correctly.
      
      First, sync_global_pgds() checks whether the target PGD is none, and
      if so it skips that PGD.  But when hot-removing memory, a PGD may be
      none because it may already have been cleared by free_pud_table().  So
      when sync_global_pgds() is called after hot-removing memory, it must
      not skip a PGD merely because it is none; it must also clear the
      corresponding PGD entries in every process's MM.
      
      Currently sync_global_pgds() does not clear the PGD entries of every
      process's MM when hot-removing memory.  So when memory covering the
      same range as previously removed memory is hot-added again, the
      following call trace is seen:
      
       kernel BUG at arch/x86/mm/init_64.c:206!
       ...
       [<ffffffff815e0c80>] kernel_physical_mapping_init+0x1b2/0x1d2
       [<ffffffff815ced94>] init_memory_mapping+0x1d4/0x380
       [<ffffffff8104aebd>] arch_add_memory+0x3d/0xd0
       [<ffffffff815d03d9>] add_memory+0xb9/0x1b0
       [<ffffffff81352415>] acpi_memory_device_add+0x1af/0x28e
       [<ffffffff81325dc4>] acpi_bus_device_attach+0x8c/0xf0
       [<ffffffff813413b9>] acpi_ns_walk_namespace+0xc8/0x17f
       [<ffffffff81325d38>] ? acpi_bus_type_and_status+0xb7/0xb7
       [<ffffffff81325d38>] ? acpi_bus_type_and_status+0xb7/0xb7
       [<ffffffff813418ed>] acpi_walk_namespace+0x95/0xc5
       [<ffffffff81326b4c>] acpi_bus_scan+0x9a/0xc2
       [<ffffffff81326bff>] acpi_scan_bus_device_check+0x8b/0x12e
       [<ffffffff81326cb5>] acpi_scan_device_check+0x13/0x15
       [<ffffffff81320122>] acpi_os_execute_deferred+0x25/0x32
       [<ffffffff8107e02b>] process_one_work+0x17b/0x460
       [<ffffffff8107edfb>] worker_thread+0x11b/0x400
       [<ffffffff8107ece0>] ? rescuer_thread+0x400/0x400
       [<ffffffff81085aef>] kthread+0xcf/0xe0
       [<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140
       [<ffffffff815fc76c>] ret_from_fork+0x7c/0xb0
       [<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140
      
      This patch makes sync_global_pgds() clear the PGD entries of every
      process's MM when it is called after hot-removing memory.
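
      A rough sketch of the resulting logic (heavily simplified; the extra
      parameter name and loop details are assumptions, and locking is
      omitted):

        void sync_global_pgds(unsigned long start, unsigned long end, int removed)
        {
                unsigned long addr;

                for (addr = start; addr <= end; addr += PGDIR_SIZE) {
                        pgd_t *pgd_ref = pgd_offset_k(addr);
                        struct page *page;

                        /* Previously: always skip when the reference PGD is none. */
                        if (pgd_none(*pgd_ref) && !removed)
                                continue;

                        list_for_each_entry(page, &pgd_list, lru) {
                                pgd_t *pgd;

                                pgd = (pgd_t *)page_address(page) + pgd_index(addr);
                                if (pgd_none(*pgd_ref) && !pgd_none(*pgd))
                                        pgd_clear(pgd);         /* propagate removal */
                                else if (pgd_none(*pgd))
                                        set_pgd(pgd, *pgd_ref); /* propagate addition */
                        }
                }
        }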
      Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Acked-by: Toshi Kani <toshi.kani@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9661d5bc
  13. 07 Aug 2014 (1 commit)
  14. 12 Jun 2014 (1 commit)
  15. 06 May 2014 (1 commit)
  16. 24 Apr 2014 (1 commit)
    • kprobes, x86: Use NOKPROBE_SYMBOL() instead of __kprobes annotation · 9326638c
      Masami Hiramatsu authored
      Use NOKPROBE_SYMBOL macro for protecting functions
      from kprobes instead of __kprobes annotation under
      arch/x86.
      
      This applies the nokprobe_inline annotation in some cases, because
      NOKPROBE_SYMBOL() would otherwise inhibit inlining by referring to the
      symbol's address.
      
      This just folds a bunch of previous NOKPROBE_SYMBOL() cleanup
      patches for x86 into one patch.
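
      The conversion follows a mechanical pattern; roughly (illustrative,
      using the fault path as the example):

        /* Before: the annotation travels with the definition. */
        static int __kprobes kprobes_fault(struct pt_regs *regs);

        /* After: small helpers that must stay out of kprobes are inlined, */
        static nokprobe_inline int kprobes_fault(struct pt_regs *regs);

        /* and out-of-line functions get an explicit blacklist entry: */
        dotraplinkage void do_page_fault(struct pt_regs *regs, unsigned long error_code);
        NOKPROBE_SYMBOL(do_page_fault);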
      Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Link: http://lkml.kernel.org/r/20140417081814.26341.51656.stgit@ltc230.yrl.intra.hitachi.co.jp
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fernando Luis Vázquez Cao <fernando_b1@lab.ntt.co.jp>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Jonathan Lebon <jlebon@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Matt Fleming <matt.fleming@intel.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9326638c
  17. 07 Mar 2014 (1 commit)
  18. 05 Mar 2014 (2 commits)
  19. 14 Feb 2014 (1 commit)
  20. 16 Jan 2014 (1 commit)
  21. 13 Nov 2013 (1 commit)
    • x86/dumpstack: Fix printk_address for direct addresses · 5f01c988
      Jiri Slaby authored
      Consider a kernel crash in a module, simulated in the following way:
      
       static int my_init(void)
       {
               char *map = (void *)0x5;
               *map = 3;
               return 0;
       }
       module_init(my_init);
      
      When we turn off FRAME_POINTERs, the very first instruction in
      that function causes a BUG. The problem is that we print IP in
      the BUG report using %pB (from printk_address). And %pB
      decrements the pointer by one to fix printing addresses of
      functions with tail calls.
      
      This was added in commit 71f9e598 ("x86, dumpstack: Use
      %pB format specifier for stack trace") to fix the call stack
      printouts.
      
      So instead of correct output:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000005
        IP: [<ffffffffa01ac000>] my_init+0x0/0x10 [pb173]
      
      We get:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000005
        IP: [<ffffffffa0152000>] 0xffffffffa0151fff
      
      To fix that, we use %pB only for stack trace printouts (via the newly
      added printk_stack_address) and %pS for regs->ip (via printk_address).
      I.e. we revert to the old behaviour for everything except call stacks.
      And since reliable is 1 at all of printk_address()'s call sites, we
      remove that parameter from printk_address.
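
      The resulting helpers look roughly like this (simplified sketch; the
      real ones also carry a log-level argument):

        /* Stack traces keep %pB: back off by one byte so the symbol of the
         * call site is resolved correctly even across tail calls. */
        static void printk_stack_address(unsigned long address, int reliable)
        {
                printk(" [<%p>] %s%pB\n",
                       (void *)address, reliable ? "" : "? ", (void *)address);
        }

        /* Direct addresses such as regs->ip use %pS: no back-off. */
        void printk_address(unsigned long address)
        {
                printk(" [<%p>] %pS\n", (void *)address, (void *)address);
        }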
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: joe@perches.com
      Cc: jirislaby@gmail.com
      Link: http://lkml.kernel.org/r/1382706418-8435-1-git-send-email-jslaby@suse.cz
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      5f01c988
  22. 12 Nov 2013 (1 commit)
  23. 09 Nov 2013 (2 commits)
  24. 29 Oct 2013 (1 commit)
    • perf/x86: Further optimize copy_from_user_nmi() · e00b12e6
      Peter Zijlstra authored
      Now that we can deal with nested NMIs (due to IRET re-enabling NMIs)
      and can deal with faults from NMI by making sure we preserve CR2 over
      NMIs, we can in fact simply access user-space memory from NMI context.
      
      So rewrite copy_from_user_nmi() to use __copy_from_user_inatomic() and
      rework the fault path to do the minimal required work before taking
      the in_atomic() fault handler.
      
      In particular avoid perf_sw_event() which would make perf recurse on
      itself (it should be harmless as our recursion protections should be
      able to deal with this -- but why tempt fate).
      
      Also rename notify_page_fault() to kprobes_fault(), as that is a much
      better name; there is no notifier in it and it is specific to kprobes.
      
      Don measured that his worst case NMI path shrunk from ~300K cycles to
      ~150K cycles.
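
      The rewritten helper is roughly the following (simplified sketch; the
      exact return convention of the real function may differ, here it is
      the number of bytes left uncopied, as __copy_from_user_inatomic()
      reports):

        unsigned long
        copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
        {
                unsigned long ret;

                if (__range_not_ok(from, n, TASK_SIZE))
                        return n;       /* nothing copied */

                /*
                 * CR2 is preserved over NMIs and nested NMIs are handled,
                 * so a fault here is fixed up via the in_atomic() fault
                 * path instead of being handled in full.
                 */
                pagefault_disable();
                ret = __copy_from_user_inatomic(to, from, n);
                pagefault_enable();

                return ret;             /* bytes that could not be copied */
        }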
      
      Cc: Stephane Eranian <eranian@google.com>
      Cc: jmario@redhat.com
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: dave.hansen@linux.intel.com
      Tested-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20131024105206.GM2490@laptop.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e00b12e6
  25. 13 Sep 2013 (2 commits)
  26. 11 Apr 2013 (1 commit)
    • x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates · 1160c277
      Samu Kallio authored
      In paravirtualized x86_64 kernels, vmalloc_fault may cause an oops
      when lazy MMU updates are enabled, because set_pgd effects are being
      deferred.
      
      One instance of this problem is during process mm cleanup with memory
      cgroups enabled. The chain of events is as follows:
      
      - zap_pte_range enables lazy MMU updates
      - zap_pte_range eventually calls mem_cgroup_charge_statistics,
        which accesses the vmalloc'd mem_cgroup per-cpu stat area
      - vmalloc_fault is triggered which tries to sync the corresponding
        PGD entry with set_pgd, but the update is deferred
      - vmalloc_fault oopses due to a mismatch in the PUD entries
      
      The oops usually looks like this:
      
      ------------[ cut here ]------------
      kernel BUG at arch/x86/mm/fault.c:396!
      invalid opcode: 0000 [#1] SMP
      .. snip ..
      CPU 1
      Pid: 10866, comm: httpd Not tainted 3.6.10-4.fc18.x86_64 #1
      RIP: e030:[<ffffffff816271bf>]  [<ffffffff816271bf>] vmalloc_fault+0x11f/0x208
      .. snip ..
      Call Trace:
       [<ffffffff81627759>] do_page_fault+0x399/0x4b0
       [<ffffffff81004f4c>] ? xen_mc_extend_args+0xec/0x110
       [<ffffffff81624065>] page_fault+0x25/0x30
       [<ffffffff81184d03>] ? mem_cgroup_charge_statistics.isra.13+0x13/0x50
       [<ffffffff81186f78>] __mem_cgroup_uncharge_common+0xd8/0x350
       [<ffffffff8118aac7>] mem_cgroup_uncharge_page+0x57/0x60
       [<ffffffff8115fbc0>] page_remove_rmap+0xe0/0x150
       [<ffffffff8115311a>] ? vm_normal_page+0x1a/0x80
       [<ffffffff81153e61>] unmap_single_vma+0x531/0x870
       [<ffffffff81154962>] unmap_vmas+0x52/0xa0
       [<ffffffff81007442>] ? pte_mfn_to_pfn+0x72/0x100
       [<ffffffff8115c8f8>] exit_mmap+0x98/0x170
       [<ffffffff810050d9>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
       [<ffffffff81059ce3>] mmput+0x83/0xf0
       [<ffffffff810624c4>] exit_mm+0x104/0x130
       [<ffffffff8106264a>] do_exit+0x15a/0x8c0
       [<ffffffff810630ff>] do_group_exit+0x3f/0xa0
       [<ffffffff81063177>] sys_exit_group+0x17/0x20
       [<ffffffff8162bae9>] system_call_fastpath+0x16/0x1b
      
      Calling arch_flush_lazy_mmu_mode immediately after set_pgd makes the
      changes visible to the consistency checks.
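
      The fix, in essence (x86_64 vmalloc_fault(); sketch):

        pgd = pgd_offset(current->active_mm, address);
        pgd_ref = pgd_offset_k(address);
        if (pgd_none(*pgd_ref))
                return -1;

        if (pgd_none(*pgd)) {
                set_pgd(pgd, *pgd_ref);
                arch_flush_lazy_mmu_mode();     /* flush the deferred update now */
        } else {
                BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_ref));
        }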
      
      Cc: <stable@vger.kernel.org>
      RedHat-Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=914737
      Tested-by: Josh Boyer <jwboyer@redhat.com>
      Reported-and-Tested-by: Krishna Raman <kraman@redhat.com>
      Signed-off-by: Samu Kallio <samu.kallio@aberdeencloud.com>
      Link: http://lkml.kernel.org/r/1364045796-10720-1-git-send-email-konrad.wilk@oracle.com
      Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      1160c277
  27. 03 Apr 2013 (1 commit)
  28. 08 Mar 2013 (2 commits)
    • context_tracking: Restore correct previous context state on exception exit · 6c1e0256
      Frederic Weisbecker authored
      On exception exit, we restore the previous context tracking state based on
      the regs of the interrupted frame. Iff that frame is in user mode as
      stated by user_mode() helper, we restore the context tracking user mode.
      
      However there is a tiny chunk of low level arch code after we pass
      through user_enter() and until the CPU eventually resumes userspace.
      If an exception happens in this tiny area, exception_enter() correctly
      exits the context tracking user mode but exception_exit() won't restore
      it because of the value returned by user_mode(regs).
      
      As a result we may return to userspace with the wrong context tracking
      state.
      
      To fix this, change exception_enter() to return the context tracking state
      prior to its call and pass this saved state to exception_exit(). This restores
      the real context tracking state of the interrupted frame.
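
      The resulting calling convention in an exception handler looks like
      this (sketch, using the x86 page fault wrapper as the example):

        dotraplinkage void __kprobes
        do_page_fault(struct pt_regs *regs, unsigned long error_code)
        {
                enum ctx_state prev_state;

                prev_state = exception_enter();  /* returns the prior state */
                __do_page_fault(regs, error_code);
                exception_exit(prev_state);      /* restore it on the way out */
        }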
      
      (Maybe this patch was suggested to me; I don't recall exactly.  If so,
      sorry for the missing credit.)
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Mats Liljegren <mats.liljegren@enea.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      6c1e0256
    • context_tracking: Move exception handling to generic code · 56dd9470
      Frederic Weisbecker authored
      Exception handling with context tracking should share a common
      treatment: on entry we exit user mode if the exception triggered
      in that context.  Then on exception exit we return to that previous
      context.
      
      Generalize this to avoid duplication across archs.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Mats Liljegren <mats.liljegren@enea.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      56dd9470