1. 19 2月, 2016 10 次提交
    • D
      mm/core, x86/mm/pkeys: Add arch_validate_pkey() · 66d37570
      Dave Hansen 提交于
      The syscall-level code is passed a protection key and need to
      return an appropriate error code if the protection key is bogus.
      We will be using this in subsequent patches.
      
      Note that this also begins a series of arch-specific calls that
      we need to expose in otherwise arch-independent code.  We create
      a linux/pkeys.h header where we will put *all* the stubs for
      these functions.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210232.774EEAAB@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      66d37570
    • D
      mm/core, arch, powerpc: Pass a protection key in to calc_vm_flag_bits() · e6bfb709
      Dave Hansen 提交于
      This plumbs a protection key through calc_vm_flag_bits().  We
      could have done this in calc_vm_prot_bits(), but I did not feel
      super strongly which way to go.  It was pretty arbitrary which
      one to use.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chen Gang <gang.chen.5i5j@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Geliang Tang <geliangtang@163.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Leon Romanovsky <leon@leon.nu>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Riley Andrews <riandrews@android.com>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: devel@driverdev.osuosl.org
      Cc: linux-api@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/20160212210231.E6F1F0D6@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e6bfb709
    • D
      x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU · 06976945
      Dave Hansen 提交于
      This sets the bit in 'cr4' to actually enable the protection
      keys feature.  We also include a boot-time disable for the
      feature "nopku".
      
      Seting X86_CR4_PKE will cause the X86_FEATURE_OSPKE cpuid
      bit to appear set.  At this point in boot, identify_cpu()
      has already run the actual CPUID instructions and populated
      the "cpu features" structures.  We need to go back and
      re-run identify_cpu() to make sure it gets updated values.
      
      We *could* simply re-populate the 11th word of the cpuid
      data, but this is probably quick enough.
      
      Also note that with the cpu_has() check and X86_FEATURE_PKU
      present in disabled-features.h, we do not need an #ifdef
      for setup_pku().
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210229.6708027C@viggo.jf.intel.com
      [ Small readability edits. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      06976945
    • D
      x86/mm/pkeys: Add Kconfig prompt to existing config option · 284244a9
      Dave Hansen 提交于
      I don't have a strong opinion on whether we need this or not.
      Protection Keys has relatively little code associated with it,
      and it is not a heavyweight feature to keep enabled.  However,
      I can imagine that folks would still appreciate being able to
      disable it.
      
      Here's the option if folks want it.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210228.7E79386C@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      284244a9
    • D
      x86/mm/pkeys: Dump pkey from VMA in /proc/pid/smaps · c1192f84
      Dave Hansen 提交于
      The protection key can now be just as important as read/write
      permissions on a VMA.  We need some debug mechanism to help
      figure out if it is in play.  smaps seems like a logical
      place to expose it.
      
      arch/x86/kernel/setup.c is a bit of a weirdo place to put
      this code, but it already had seq_file.h and there was not
      a much better existing place to put it.
      
      We also use no #ifdef.  If protection keys is .config'd out we
      will effectively get the same function as if we used the weak
      generic function.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Mark Williamson <mwilliamson@undo-software.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210227.4F8EB3F8@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c1192f84
    • D
      x86/mm/pkeys: Dump PKRU with other kernel registers · c0b17b5b
      Dave Hansen 提交于
      Protection Keys never affect kernel mappings.  But, they can
      affect whether the kernel will fault when it touches a user
      mapping.  The kernel doesn't touch user mappings without some
      careful choreography and these accesses don't generally result in
      oopses.  But, if one does, we definitely want to have PKRU
      available so we can figure out if protection keys played a role.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210225.BF0D4482@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c0b17b5b
    • D
      mm/core, x86/mm/pkeys: Differentiate instruction fetches · d61172b4
      Dave Hansen 提交于
      As discussed earlier, we attempt to enforce protection keys in
      software.
      
      However, the code checks all faults to ensure that they are not
      violating protection key permissions.  It was assumed that all
      faults are either write faults where we check PKRU[key].WD (write
      disable) or read faults where we check the AD (access disable)
      bit.
      
      But, there is a third category of faults for protection keys:
      instruction faults.  Instruction faults never run afoul of
      protection keys because they do not affect instruction fetches.
      
      So, plumb the PF_INSTR bit down in to the
      arch_vma_access_permitted() function where we do the protection
      key checks.
      
      We also add a new FAULT_FLAG_INSTRUCTION.  This is because
      handle_mm_fault() is not passed the architecture-specific
      error_code where we keep PF_INSTR, so we need to encode the
      instruction fetch information in to the arch-generic fault
      flags.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210224.96928009@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d61172b4
    • D
      x86/mm/pkeys: Optimize fault handling in access_error() · 07f146f5
      Dave Hansen 提交于
      We might not strictly have to make modifictions to
      access_error() to check the VMA here.
      
      If we do not, we will do this:
      
       1. app sets VMA pkey to K
       2. app touches a !present page
       3. do_page_fault(), allocates and maps page, sets pte.pkey=K
       4. return to userspace
       5. touch instruction reexecutes, but triggers PF_PK
       6. do PKEY signal
      
      What happens with this patch applied:
      
       1. app sets VMA pkey to K
       2. app touches a !present page
       3. do_page_fault() notices that K is inaccessible
       4. do PKEY signal
      
      We basically skip the fault that does an allocation.
      
      So what this lets us do is protect areas from even being
      *populated* unless it is accessible according to protection
      keys.  That seems handy to me and makes protection keys work
      more like an mprotect()'d mapping.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210222.EBB63D8C@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      07f146f5
    • D
      mm/core: Do not enforce PKEY permissions on remote mm access · 1b2ee126
      Dave Hansen 提交于
      We try to enforce protection keys in software the same way that we
      do in hardware.  (See long example below).
      
      But, we only want to do this when accessing our *own* process's
      memory.  If GDB set PKRU[6].AD=1 (disable access to PKEY 6), then
      tried to PTRACE_POKE a target process which just happened to have
      some mprotect_pkey(pkey=6) memory, we do *not* want to deny the
      debugger access to that memory.  PKRU is fundamentally a
      thread-local structure and we do not want to enforce it on access
      to _another_ thread's data.
      
      This gets especially tricky when we have workqueues or other
      delayed-work mechanisms that might run in a random process's context.
      We can check that we only enforce pkeys when operating on our *own* mm,
      but delayed work gets performed when a random user context is active.
      We might end up with a situation where a delayed-work gup fails when
      running randomly under its "own" task but succeeds when running under
      another process.  We want to avoid that.
      
      To avoid that, we use the new GUP flag: FOLL_REMOTE and add a
      fault flag: FAULT_FLAG_REMOTE.  They indicate that we are
      walking an mm which is not guranteed to be the same as
      current->mm and should not be subject to protection key
      enforcement.
      
      Thanks to Jerome Glisse for pointing out this scenario.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Boaz Harrosh <boaz@plexistor.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Cc: Dominik Vogt <vogt@linux.vnet.ibm.com>
      Cc: Eric B Munson <emunson@akamai.com>
      Cc: Geliang Tang <geliangtang@163.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Shachar Raindel <raindel@mellanox.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Xie XiuQi <xiexiuqi@huawei.com>
      Cc: iommu@lists.linux-foundation.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: linux-s390@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      1b2ee126
    • D
      um, pkeys: Add UML arch_*_access_permitted() methods · 9d95b175
      Dave Hansen 提交于
      UML has a special mmu_context.h and needs updates whenever the generic one
      is updated.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: user-mode-linux-devel@lists.sourceforge.net
      Cc: user-mode-linux-user@lists.sourceforge.net
      Link: http://lkml.kernel.org/r/20160218183557.AE1DB383@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9d95b175
  2. 18 2月, 2016 12 次提交
    • D
      mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys · 33a709b2
      Dave Hansen 提交于
      Today, for normal faults and page table walks, we check the VMA
      and/or PTE to ensure that it is compatible with the action.  For
      instance, if we get a write fault on a non-writeable VMA, we
      SIGSEGV.
      
      We try to do the same thing for protection keys.  Basically, we
      try to make sure that if a user does this:
      
      	mprotect(ptr, size, PROT_NONE);
      	*ptr = foo;
      
      they see the same effects with protection keys when they do this:
      
      	mprotect(ptr, size, PROT_READ|PROT_WRITE);
      	set_pkey(ptr, size, 4);
      	wrpkru(0xffffff3f); // access disable pkey 4
      	*ptr = foo;
      
      The state to do that checking is in the VMA, but we also
      sometimes have to do it on the page tables only, like when doing
      a get_user_pages_fast() where we have no VMA.
      
      We add two functions and expose them to generic code:
      
      	arch_pte_access_permitted(pte_flags, write)
      	arch_vma_access_permitted(vma, write)
      
      These are, of course, backed up in x86 arch code with checks
      against the PTE or VMA's protection key.
      
      But, there are also cases where we do not want to respect
      protection keys.  When we ptrace(), for instance, we do not want
      to apply the tracer's PKRU permissions to the PTEs from the
      process being traced.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Boaz Harrosh <boaz@plexistor.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Cc: Dominik Vogt <vogt@linux.vnet.ibm.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Shachar Raindel <raindel@mellanox.com>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: linux-s390@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/20160212210219.14D5D715@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      33a709b2
    • D
      x86/mm/gup: Simplify get_user_pages() PTE bit handling · 1874f689
      Dave Hansen 提交于
      The current get_user_pages() code is a wee bit more complicated
      than it needs to be for pte bit checking.  Currently, it establishes
      a mask of required pte _PAGE_* bits and ensures that the pte it
      goes after has all those bits.
      
      This consolidates the three identical copies of this code.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210218.3A2D4045@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1874f689
    • D
      mm/gup: Factor out VMA fault permission checking · d4925e00
      Dave Hansen 提交于
      This code matches a fault condition up with the VMA and ensures
      that the VMA allows the fault to be handled instead of just
      erroring out.
      
      We will be extending this in a moment to comprehend protection
      keys.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Cc: Eric B Munson <emunson@akamai.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210216.C3824032@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d4925e00
    • D
      x86/mm/pkeys: Add functions to fetch PKRU · a927cb83
      Dave Hansen 提交于
      This adds the raw instruction to access PKRU as well as some
      accessor functions that correctly handle when the CPU does not
      support the instruction.  We don't use it here, but we will use
      read_pkru() in the next patch.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210215.15238D34@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a927cb83
    • D
      x86/mm/pkeys: Fill in pkey field in siginfo · 019132ff
      Dave Hansen 提交于
      This fills in the new siginfo field: si_pkey to indicate to
      userspace which protection key was set on the PTE that we faulted
      on.
      
      Note though that *ALL* protection key faults have to be generated
      by a valid, present PTE at some point.  But this code does no PTE
      lookups which seeds odd.  The reason is that we take advantage of
      the way we generate PTEs from VMAs.  All PTEs under a VMA share
      some attributes.  For instance, they are _all_ either PROT_READ
      *OR* PROT_NONE.  They also always share a protection key, so we
      never have to walk the page tables; we just use the VMA.
      
      Note that _pkey is a 64-bit value.  The current hardware only
      supports 4-bit protection keys.  We do this because there is
      _plenty_ of space in _sigfault and it is possible that future
      processors would support more than 4 bits of protection keys.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210213.ABC488FA@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      019132ff
    • D
      signals, pkeys: Notify userspace about protection key faults · cd0ea35f
      Dave Hansen 提交于
      A protection key fault is very similar to any other access error.
      There must be a VMA, etc...  We even want to take the same action
      (SIGSEGV) that we do with a normal access fault.
      
      However, we do need to let userspace know that something is
      different.  We do this the same way what we did with SEGV_BNDERR
      with Memory Protection eXtensions (MPX): define a new SEGV code:
      SEGV_PKUERR.
      
      We add a siginfo field: si_pkey that reveals to userspace which
      protection key was set on the PTE that we faulted on.  There is
      no other easy way for userspace to figure this out.  They could
      parse smaps but that would be a bit cruel.
      
      We share space with in siginfo with _addr_bnd.  #BR faults from
      MPX are completely separate from page faults (#PF) that trigger
      from protection key violations, so we never need both at the same
      time.
      
      Note that _pkey is a 64-bit value.  The current hardware only
      supports 4-bit protection keys.  We do this because there is
      _plenty_ of space in _sigfault and it is possible that future
      processors would support more than 4 bits of protection keys.
      
      The x86 code to actually fill in the siginfo is in the next
      patch.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Amanieu d'Antras <amanieu@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210212.3A9B83AC@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      cd0ea35f
    • D
      signals, ia64, mips: Update arch-specific siginfos with pkeys field · b376cd02
      Dave Hansen 提交于
      ia64 and mips have separate definitions for siginfo from the
      generic one.  Patch them to have the pkey fields.
      
      Note that this is exactly what we did for MPX as well.
      
      [ This fixes a compile error that Ingo was hitting with MIPS when the
        x86 pkeys patch set is applied. ]
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Malat <oss@malat.biz>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: linux-ia64@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160217181703.E99B6656@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b376cd02
    • D
      x86/mm/pkeys: Pass VMA down in to fault signal generation code · 7b2d0dba
      Dave Hansen 提交于
      During a page fault, we look up the VMA to ensure that the fault
      is in a region with a valid mapping.  But, in the top-level page
      fault code we don't need the VMA for much else.  Once we have
      decided that an access is bad, we are going to send a signal no
      matter what and do not need the VMA any more.  So we do not pass
      it down in to the signal generation code.
      
      But, for protection keys, we need the VMA.  It tells us *which*
      protection key we violated if we get a PF_PK.  So, we need to
      pass the VMA down and fill in siginfo->si_pkey.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210211.AD3B36A3@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7b2d0dba
    • D
      x86/mm/pkeys: Add arch-specific VMA protection bits · 8f62c883
      Dave Hansen 提交于
      Lots of things seem to do:
      
              vma->vm_page_prot = vm_get_page_prot(flags);
      
      and the ptes get created right from things we pull out
      of ->vm_page_prot.  So it is very convenient if we can
      store the protection key in flags and vm_page_prot, just
      like the existing permission bits (_PAGE_RW/PRESENT).  It
      greatly reduces the amount of plumbing and arch-specific
      hacking we have to do in generic code.
      
      This also takes the new PROT_PKEY{0,1,2,3} flags and
      turns *those* in to VM_ flags for vma->vm_flags.
      
      The protection key values are stored in 4 places:
      	1. "prot" argument to system calls
      	2. vma->vm_flags, filled from the mmap "prot"
      	3. vma->vm_page prot, filled from vma->vm_flags
      	4. the PTE itself.
      
      The pseudocode for these for steps are as follows:
      
      	mmap(PROT_PKEY*)
      	vma->vm_flags 	  = ... | arch_calc_vm_prot_bits(mmap_prot);
      	vma->vm_page_prot = ... | arch_vm_get_page_prot(vma->vm_flags);
      	pte = pfn | vma->vm_page_prot
      
      Note that this provides a new definitions for x86:
      
      	arch_vm_get_page_prot()
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210210.FE483A42@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8f62c883
    • D
      mm/core, x86/mm/pkeys: Store protection bits in high VMA flags · 63c17fb8
      Dave Hansen 提交于
      vma->vm_flags is an 'unsigned long', so has space for 32 flags
      on 32-bit architectures.  The high 32 bits are unused on 64-bit
      platforms.  We've steered away from using the unused high VMA
      bits for things because we would have difficulty supporting it
      on 32-bit.
      
      Protection Keys are not available in 32-bit mode, so there is
      no concern about supporting this feature in 32-bit mode or on
      32-bit CPUs.
      
      This patch carves out 4 bits from the high half of
      vma->vm_flags and allows architectures to set config option
      to make them available.
      
      Sparse complains about these constants unless we explicitly
      call them "UL".
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Valentin Rothberg <valentinrothberg@gmail.com>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Xie XiuQi <xiexiuqi@huawei.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210208.81AF00D5@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      63c17fb8
    • D
      x86/mm/pkeys: Add new 'PF_PK' page fault error code bit · b3ecd515
      Dave Hansen 提交于
      Note: "PK" is how the Intel SDM refers to this bit, so we also
      use that nomenclature.
      
      This only defines the bit, it does not plumb it anywhere to be
      handled.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210207.DA7B43E6@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b3ecd515
    • D
      x86/mm/pkeys: Add PTE bits for storing protection key · 5c1d90f5
      Dave Hansen 提交于
      Previous documentation has referred to these 4 bits as "ignored".
      That means that software could have made use of them.  But, as
      far as I know, the kernel never used them.
      
      They are still ignored when protection keys is not enabled, so
      they could theoretically still get used for software purposes.
      
      We also implement "empty" versions so that code that references
      to them can be optimized away by the compiler when the config
      option is not enabled.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210205.81E33ED6@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      5c1d90f5
  3. 16 2月, 2016 10 次提交
    • D
      x86/fpu, x86/mm/pkeys: Add PKRU xsave fields and data structures · c8df4009
      Dave Hansen 提交于
      The protection keys register (PKRU) is saved and restored using
      xsave.  Define the data structure that we will use to access it
      inside the xsave buffer.
      
      Note that we also have to widen the printk of the xsave feature
      masks since this is feature 0x200 and we only did two characters
      before.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210204.56DF8F7B@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c8df4009
    • D
      x86/cpu, x86/mm/pkeys: Define new CR4 bit · f28b49d2
      Dave Hansen 提交于
      There is a new bit in CR4 for enabling protection keys.  We
      will actually enable it later in the series.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210202.3CFC3DB2@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f28b49d2
    • D
      x86/cpufeature, x86/mm/pkeys: Add protection keys related CPUID definitions · dfb4a70f
      Dave Hansen 提交于
      There are two CPUID bits for protection keys.  One is for whether
      the CPU contains the feature, and the other will appear set once
      the OS enables protection keys.  Specifically:
      
      	Bit 04: OSPKE. If 1, OS has set CR4.PKE to enable
      	Protection keys (and the RDPKRU/WRPKRU instructions)
      
      This is because userspace can not see CR4 contents, but it can
      see CPUID contents.
      
      X86_FEATURE_PKU is referred to as "PKU" in the hardware documentation:
      
      	CPUID.(EAX=07H,ECX=0H):ECX.PKU [bit 3]
      
      X86_FEATURE_OSPKE is "OSPKU":
      
      	CPUID.(EAX=07H,ECX=0H):ECX.OSPKE [bit 4]
      
      These are the first CPU features which need to look at the
      ECX word in CPUID leaf 0x7, so this patch also includes
      fetching that word in to the cpuinfo->x86_capability[] array.
      
      Add it to the disabled-features mask when its config option is
      off.  Even though we are not using it here, we also extend the
      REQUIRED_MASK_BIT_SET() macro to keep it mirroring the
      DISABLED_MASK_BIT_SET() version.
      
      This means that in almost all code, you should use:
      
      	cpu_has(c, X86_FEATURE_PKU)
      
      and *not* the CONFIG option.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210201.7714C250@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      dfb4a70f
    • D
      x86/mm/pkeys: Add Kconfig option · 35e97790
      Dave Hansen 提交于
      I don't have a strong opinion on whether we need a Kconfig prompt
      or not.  Protection Keys has relatively little code associated
      with it, and it is not a heavyweight feature to keep enabled.
      However, I can imagine that folks would still appreciate being
      able to disable it.
      
      Note that, with disabled-features.h, the checks in the code
      for protection keys are always the same:
      
      	cpu_has(c, X86_FEATURE_PKU)
      
      With the config option disabled, this essentially turns into an
      
      We will hide the prompt for now.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210200.DB7055E8@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      35e97790
    • D
      x86/fpu: Add placeholder for 'Processor Trace' XSAVE state · 1f96b1ef
      Dave Hansen 提交于
      There is an XSAVE state component for Intel Processor Trace (PT).
      But, we do not currently use it.
      
      We add a placeholder in the code for it so it is not a mystery and
      also so we do not need an explicit enum initialization for Protection
      Keys in a moment.
      
      Why don't we use it?
      
      We might end up using this at _some_ point in the future.  But,
      this is a "system" state which requires using the currently
      unsupported XSAVES feature.  Unlike all the other XSAVE states,
      PT state is also not directly tied to a thread.  You might
      context-switch between threads, but not want to change any of the
      PT state.  Or, you might switch between threads, and *do* want to
      change PT state, all depending on what is being traced.
      
      We currently just manually set some MSRs to do this PT context
      switching, and it is unclear whether replacing our direct MSR use
      with XSAVE will be a net win or loss, both in code complexity and
      performance.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: fenghua.yu@intel.com
      Cc: linux-mm@kvack.org
      Cc: yu-cheng.yu@intel.com
      Link: http://lkml.kernel.org/r/20160212210158.5E4BCAE2@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1f96b1ef
    • D
      mm/gup: Switch all callers of get_user_pages() to not pass tsk/mm · d4edcf0d
      Dave Hansen 提交于
      We will soon modify the vanilla get_user_pages() so it can no
      longer be used on mm/tasks other than 'current/current->mm',
      which is by far the most common way it is called.  For now,
      we allow the old-style calls, but warn when they are used.
      (implemented in previous patch)
      
      This patch switches all callers of:
      
      	get_user_pages()
      	get_user_pages_unlocked()
      	get_user_pages_locked()
      
      to stop passing tsk/mm so they will no longer see the warnings.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: jack@suse.cz
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210156.113E9407@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d4edcf0d
    • D
      mm/gup: Overload get_user_pages() functions · cde70140
      Dave Hansen 提交于
      The concept here was a suggestion from Ingo.  The implementation
      horrors are all mine.
      
      This allows get_user_pages(), get_user_pages_unlocked(), and
      get_user_pages_locked() to be called with or without the
      leading tsk/mm arguments.  We will give a compile-time warning
      about the old style being __deprecated and we will also
      WARN_ON() if the non-remote version is used for a remote-style
      access.
      
      Doing this, folks will get nice warnings and will not break the
      build.  This should be nice for -next and will hopefully let
      developers fix up their own code instead of maintainers needing
      to do it at merge time.
      
      The way we do this is hideous.  It uses the __VA_ARGS__ macro
      functionality to call different functions based on the number
      of arguments passed to the macro.
      
      There's an additional hack to ensure that our EXPORT_SYMBOL()
      of the deprecated symbols doesn't trigger a warning.
      
      We should be able to remove this mess as soon as -rc1 hits in
      the release after this is merged.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Cc: Geliang Tang <geliangtang@163.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Leon Romanovsky <leon@leon.nu>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Mateusz Guzik <mguzik@redhat.com>
      Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Xie XiuQi <xiexiuqi@huawei.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210155.73222EE1@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      cde70140
    • D
      mm/gup: Introduce get_user_pages_remote() · 1e987790
      Dave Hansen 提交于
      For protection keys, we need to understand whether protections
      should be enforced in software or not.  In general, we enforce
      protections when working on our own task, but not when on others.
      We call these "current" and "remote" operations.
      
      This patch introduces a new get_user_pages() variant:
      
              get_user_pages_remote()
      
      Which is a replacement for when get_user_pages() is called on
      non-current tsk/mm.
      
      We also introduce a new gup flag: FOLL_REMOTE which can be used
      for the "__" gup variants to get this new behavior.
      
      The uprobes is_trap_at_addr() location holds mmap_sem and
      calls get_user_pages(current->mm) on an instruction address.  This
      makes it a pretty unique gup caller.  Being an instruction access
      and also really originating from the kernel (vs. the app), I opted
      to consider this a 'remote' access where protection keys will not
      be enforced.
      
      Without protection keys, this patch should not change any behavior.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: jack@suse.cz
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210154.3F0E51EA@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1e987790
    • I
      Merge branches 'x86/fpu', 'x86/mm' and 'x86/asm' into x86/pkeys · 1fe3f29e
      Ingo Molnar 提交于
      Provide a stable basis for the pkeys patches, which touches various
      x86 details.
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      1fe3f29e
    • B
      x86/cpufeature: Speed up cpu_feature_enabled() · f2cc8e07
      Borislav Petkov 提交于
      When GCC cannot do constant folding for this macro, it falls back to
      cpu_has(). But static_cpu_has() is optimal and it works at all times
      now. So use it and speedup the fallback case.
      
      Before we had this:
      
        mov    0x99d674(%rip),%rdx        # ffffffff81b0d9f4 <boot_cpu_data+0x34>
        shr    $0x2e,%rdx
        and    $0x1,%edx
        jne    ffffffff811704e9 <do_munmap+0x3f9>
      
      After alternatives patching, it turns into:
      
      		  jmp    0xffffffff81170390
      		  nopl   (%rax)
      		  ...
      		  callq  ffffffff81056e00 <mpx_notify_unmap>
      ffffffff81170390: mov    0x170(%r12),%rdi
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1455578358-28347-1-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f2cc8e07
  4. 15 2月, 2016 8 次提交