1. 19 2月, 2016 2 次提交
    • D
      mm/core: Do not enforce PKEY permissions on remote mm access · 1b2ee126
      Dave Hansen 提交于
      We try to enforce protection keys in software the same way that we
      do in hardware.  (See long example below).
      
      But, we only want to do this when accessing our *own* process's
      memory.  If GDB set PKRU[6].AD=1 (disable access to PKEY 6), then
      tried to PTRACE_POKE a target process which just happened to have
      some mprotect_pkey(pkey=6) memory, we do *not* want to deny the
      debugger access to that memory.  PKRU is fundamentally a
      thread-local structure and we do not want to enforce it on access
      to _another_ thread's data.
      
      This gets especially tricky when we have workqueues or other
      delayed-work mechanisms that might run in a random process's context.
      We can check that we only enforce pkeys when operating on our *own* mm,
      but delayed work gets performed when a random user context is active.
      We might end up with a situation where a delayed-work gup fails when
      running randomly under its "own" task but succeeds when running under
      another process.  We want to avoid that.
      
      To avoid that, we use the new GUP flag: FOLL_REMOTE and add a
      fault flag: FAULT_FLAG_REMOTE.  They indicate that we are
      walking an mm which is not guranteed to be the same as
      current->mm and should not be subject to protection key
      enforcement.
      
      Thanks to Jerome Glisse for pointing out this scenario.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Boaz Harrosh <boaz@plexistor.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Cc: Dominik Vogt <vogt@linux.vnet.ibm.com>
      Cc: Eric B Munson <emunson@akamai.com>
      Cc: Geliang Tang <geliangtang@163.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Shachar Raindel <raindel@mellanox.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Xie XiuQi <xiexiuqi@huawei.com>
      Cc: iommu@lists.linux-foundation.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: linux-s390@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      1b2ee126
    • D
      um, pkeys: Add UML arch_*_access_permitted() methods · 9d95b175
      Dave Hansen 提交于
      UML has a special mmu_context.h and needs updates whenever the generic one
      is updated.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: user-mode-linux-devel@lists.sourceforge.net
      Cc: user-mode-linux-user@lists.sourceforge.net
      Link: http://lkml.kernel.org/r/20160218183557.AE1DB383@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9d95b175
  2. 18 2月, 2016 10 次提交
    • D
      mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys · 33a709b2
      Dave Hansen 提交于
      Today, for normal faults and page table walks, we check the VMA
      and/or PTE to ensure that it is compatible with the action.  For
      instance, if we get a write fault on a non-writeable VMA, we
      SIGSEGV.
      
      We try to do the same thing for protection keys.  Basically, we
      try to make sure that if a user does this:
      
      	mprotect(ptr, size, PROT_NONE);
      	*ptr = foo;
      
      they see the same effects with protection keys when they do this:
      
      	mprotect(ptr, size, PROT_READ|PROT_WRITE);
      	set_pkey(ptr, size, 4);
      	wrpkru(0xffffff3f); // access disable pkey 4
      	*ptr = foo;
      
      The state to do that checking is in the VMA, but we also
      sometimes have to do it on the page tables only, like when doing
      a get_user_pages_fast() where we have no VMA.
      
      We add two functions and expose them to generic code:
      
      	arch_pte_access_permitted(pte_flags, write)
      	arch_vma_access_permitted(vma, write)
      
      These are, of course, backed up in x86 arch code with checks
      against the PTE or VMA's protection key.
      
      But, there are also cases where we do not want to respect
      protection keys.  When we ptrace(), for instance, we do not want
      to apply the tracer's PKRU permissions to the PTEs from the
      process being traced.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Boaz Harrosh <boaz@plexistor.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Cc: Dominik Vogt <vogt@linux.vnet.ibm.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Shachar Raindel <raindel@mellanox.com>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: linux-s390@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/20160212210219.14D5D715@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      33a709b2
    • D
      x86/mm/gup: Simplify get_user_pages() PTE bit handling · 1874f689
      Dave Hansen 提交于
      The current get_user_pages() code is a wee bit more complicated
      than it needs to be for pte bit checking.  Currently, it establishes
      a mask of required pte _PAGE_* bits and ensures that the pte it
      goes after has all those bits.
      
      This consolidates the three identical copies of this code.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210218.3A2D4045@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1874f689
    • D
      x86/mm/pkeys: Add functions to fetch PKRU · a927cb83
      Dave Hansen 提交于
      This adds the raw instruction to access PKRU as well as some
      accessor functions that correctly handle when the CPU does not
      support the instruction.  We don't use it here, but we will use
      read_pkru() in the next patch.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210215.15238D34@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a927cb83
    • D
      x86/mm/pkeys: Fill in pkey field in siginfo · 019132ff
      Dave Hansen 提交于
      This fills in the new siginfo field: si_pkey to indicate to
      userspace which protection key was set on the PTE that we faulted
      on.
      
      Note though that *ALL* protection key faults have to be generated
      by a valid, present PTE at some point.  But this code does no PTE
      lookups which seeds odd.  The reason is that we take advantage of
      the way we generate PTEs from VMAs.  All PTEs under a VMA share
      some attributes.  For instance, they are _all_ either PROT_READ
      *OR* PROT_NONE.  They also always share a protection key, so we
      never have to walk the page tables; we just use the VMA.
      
      Note that _pkey is a 64-bit value.  The current hardware only
      supports 4-bit protection keys.  We do this because there is
      _plenty_ of space in _sigfault and it is possible that future
      processors would support more than 4 bits of protection keys.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210213.ABC488FA@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      019132ff
    • D
      signals, ia64, mips: Update arch-specific siginfos with pkeys field · b376cd02
      Dave Hansen 提交于
      ia64 and mips have separate definitions for siginfo from the
      generic one.  Patch them to have the pkey fields.
      
      Note that this is exactly what we did for MPX as well.
      
      [ This fixes a compile error that Ingo was hitting with MIPS when the
        x86 pkeys patch set is applied. ]
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Malat <oss@malat.biz>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: linux-ia64@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160217181703.E99B6656@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b376cd02
    • D
      x86/mm/pkeys: Pass VMA down in to fault signal generation code · 7b2d0dba
      Dave Hansen 提交于
      During a page fault, we look up the VMA to ensure that the fault
      is in a region with a valid mapping.  But, in the top-level page
      fault code we don't need the VMA for much else.  Once we have
      decided that an access is bad, we are going to send a signal no
      matter what and do not need the VMA any more.  So we do not pass
      it down in to the signal generation code.
      
      But, for protection keys, we need the VMA.  It tells us *which*
      protection key we violated if we get a PF_PK.  So, we need to
      pass the VMA down and fill in siginfo->si_pkey.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210211.AD3B36A3@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7b2d0dba
    • D
      x86/mm/pkeys: Add arch-specific VMA protection bits · 8f62c883
      Dave Hansen 提交于
      Lots of things seem to do:
      
              vma->vm_page_prot = vm_get_page_prot(flags);
      
      and the ptes get created right from things we pull out
      of ->vm_page_prot.  So it is very convenient if we can
      store the protection key in flags and vm_page_prot, just
      like the existing permission bits (_PAGE_RW/PRESENT).  It
      greatly reduces the amount of plumbing and arch-specific
      hacking we have to do in generic code.
      
      This also takes the new PROT_PKEY{0,1,2,3} flags and
      turns *those* in to VM_ flags for vma->vm_flags.
      
      The protection key values are stored in 4 places:
      	1. "prot" argument to system calls
      	2. vma->vm_flags, filled from the mmap "prot"
      	3. vma->vm_page prot, filled from vma->vm_flags
      	4. the PTE itself.
      
      The pseudocode for these for steps are as follows:
      
      	mmap(PROT_PKEY*)
      	vma->vm_flags 	  = ... | arch_calc_vm_prot_bits(mmap_prot);
      	vma->vm_page_prot = ... | arch_vm_get_page_prot(vma->vm_flags);
      	pte = pfn | vma->vm_page_prot
      
      Note that this provides a new definitions for x86:
      
      	arch_vm_get_page_prot()
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210210.FE483A42@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8f62c883
    • D
      mm/core, x86/mm/pkeys: Store protection bits in high VMA flags · 63c17fb8
      Dave Hansen 提交于
      vma->vm_flags is an 'unsigned long', so has space for 32 flags
      on 32-bit architectures.  The high 32 bits are unused on 64-bit
      platforms.  We've steered away from using the unused high VMA
      bits for things because we would have difficulty supporting it
      on 32-bit.
      
      Protection Keys are not available in 32-bit mode, so there is
      no concern about supporting this feature in 32-bit mode or on
      32-bit CPUs.
      
      This patch carves out 4 bits from the high half of
      vma->vm_flags and allows architectures to set config option
      to make them available.
      
      Sparse complains about these constants unless we explicitly
      call them "UL".
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Valentin Rothberg <valentinrothberg@gmail.com>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Xie XiuQi <xiexiuqi@huawei.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210208.81AF00D5@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      63c17fb8
    • D
      x86/mm/pkeys: Add new 'PF_PK' page fault error code bit · b3ecd515
      Dave Hansen 提交于
      Note: "PK" is how the Intel SDM refers to this bit, so we also
      use that nomenclature.
      
      This only defines the bit, it does not plumb it anywhere to be
      handled.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210207.DA7B43E6@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b3ecd515
    • D
      x86/mm/pkeys: Add PTE bits for storing protection key · 5c1d90f5
      Dave Hansen 提交于
      Previous documentation has referred to these 4 bits as "ignored".
      That means that software could have made use of them.  But, as
      far as I know, the kernel never used them.
      
      They are still ignored when protection keys is not enabled, so
      they could theoretically still get used for software purposes.
      
      We also implement "empty" versions so that code that references
      to them can be optimized away by the compiler when the config
      option is not enabled.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210205.81E33ED6@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      5c1d90f5
  3. 16 2月, 2016 7 次提交
    • D
      x86/fpu, x86/mm/pkeys: Add PKRU xsave fields and data structures · c8df4009
      Dave Hansen 提交于
      The protection keys register (PKRU) is saved and restored using
      xsave.  Define the data structure that we will use to access it
      inside the xsave buffer.
      
      Note that we also have to widen the printk of the xsave feature
      masks since this is feature 0x200 and we only did two characters
      before.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210204.56DF8F7B@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c8df4009
    • D
      x86/cpu, x86/mm/pkeys: Define new CR4 bit · f28b49d2
      Dave Hansen 提交于
      There is a new bit in CR4 for enabling protection keys.  We
      will actually enable it later in the series.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210202.3CFC3DB2@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f28b49d2
    • D
      x86/cpufeature, x86/mm/pkeys: Add protection keys related CPUID definitions · dfb4a70f
      Dave Hansen 提交于
      There are two CPUID bits for protection keys.  One is for whether
      the CPU contains the feature, and the other will appear set once
      the OS enables protection keys.  Specifically:
      
      	Bit 04: OSPKE. If 1, OS has set CR4.PKE to enable
      	Protection keys (and the RDPKRU/WRPKRU instructions)
      
      This is because userspace can not see CR4 contents, but it can
      see CPUID contents.
      
      X86_FEATURE_PKU is referred to as "PKU" in the hardware documentation:
      
      	CPUID.(EAX=07H,ECX=0H):ECX.PKU [bit 3]
      
      X86_FEATURE_OSPKE is "OSPKU":
      
      	CPUID.(EAX=07H,ECX=0H):ECX.OSPKE [bit 4]
      
      These are the first CPU features which need to look at the
      ECX word in CPUID leaf 0x7, so this patch also includes
      fetching that word in to the cpuinfo->x86_capability[] array.
      
      Add it to the disabled-features mask when its config option is
      off.  Even though we are not using it here, we also extend the
      REQUIRED_MASK_BIT_SET() macro to keep it mirroring the
      DISABLED_MASK_BIT_SET() version.
      
      This means that in almost all code, you should use:
      
      	cpu_has(c, X86_FEATURE_PKU)
      
      and *not* the CONFIG option.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210201.7714C250@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      dfb4a70f
    • D
      x86/mm/pkeys: Add Kconfig option · 35e97790
      Dave Hansen 提交于
      I don't have a strong opinion on whether we need a Kconfig prompt
      or not.  Protection Keys has relatively little code associated
      with it, and it is not a heavyweight feature to keep enabled.
      However, I can imagine that folks would still appreciate being
      able to disable it.
      
      Note that, with disabled-features.h, the checks in the code
      for protection keys are always the same:
      
      	cpu_has(c, X86_FEATURE_PKU)
      
      With the config option disabled, this essentially turns into an
      
      We will hide the prompt for now.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210200.DB7055E8@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      35e97790
    • D
      x86/fpu: Add placeholder for 'Processor Trace' XSAVE state · 1f96b1ef
      Dave Hansen 提交于
      There is an XSAVE state component for Intel Processor Trace (PT).
      But, we do not currently use it.
      
      We add a placeholder in the code for it so it is not a mystery and
      also so we do not need an explicit enum initialization for Protection
      Keys in a moment.
      
      Why don't we use it?
      
      We might end up using this at _some_ point in the future.  But,
      this is a "system" state which requires using the currently
      unsupported XSAVES feature.  Unlike all the other XSAVE states,
      PT state is also not directly tied to a thread.  You might
      context-switch between threads, but not want to change any of the
      PT state.  Or, you might switch between threads, and *do* want to
      change PT state, all depending on what is being traced.
      
      We currently just manually set some MSRs to do this PT context
      switching, and it is unclear whether replacing our direct MSR use
      with XSAVE will be a net win or loss, both in code complexity and
      performance.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: fenghua.yu@intel.com
      Cc: linux-mm@kvack.org
      Cc: yu-cheng.yu@intel.com
      Link: http://lkml.kernel.org/r/20160212210158.5E4BCAE2@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1f96b1ef
    • D
      mm/gup: Switch all callers of get_user_pages() to not pass tsk/mm · d4edcf0d
      Dave Hansen 提交于
      We will soon modify the vanilla get_user_pages() so it can no
      longer be used on mm/tasks other than 'current/current->mm',
      which is by far the most common way it is called.  For now,
      we allow the old-style calls, but warn when they are used.
      (implemented in previous patch)
      
      This patch switches all callers of:
      
      	get_user_pages()
      	get_user_pages_unlocked()
      	get_user_pages_locked()
      
      to stop passing tsk/mm so they will no longer see the warnings.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: jack@suse.cz
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210156.113E9407@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d4edcf0d
    • B
      x86/cpufeature: Speed up cpu_feature_enabled() · f2cc8e07
      Borislav Petkov 提交于
      When GCC cannot do constant folding for this macro, it falls back to
      cpu_has(). But static_cpu_has() is optimal and it works at all times
      now. So use it and speedup the fallback case.
      
      Before we had this:
      
        mov    0x99d674(%rip),%rdx        # ffffffff81b0d9f4 <boot_cpu_data+0x34>
        shr    $0x2e,%rdx
        and    $0x1,%edx
        jne    ffffffff811704e9 <do_munmap+0x3f9>
      
      After alternatives patching, it turns into:
      
      		  jmp    0xffffffff81170390
      		  nopl   (%rax)
      		  ...
      		  callq  ffffffff81056e00 <mpx_notify_unmap>
      ffffffff81170390: mov    0x170(%r12),%rdi
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1455578358-28347-1-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f2cc8e07
  4. 14 2月, 2016 1 次提交
    • B
      x86/mm: Fix INVPCID asm constraint · e2c7698c
      Borislav Petkov 提交于
      So we want to specify the dependency on both @pcid and @addr so that the
      compiler doesn't reorder accesses to them *before* the TLB flush. But
      for that to work, we need to express this properly in the inline asm and
      deref the whole desc array, not the pointer to it. See clwb() for an
      example.
      
      This fixes the build error on 32-bit:
      
        arch/x86/include/asm/tlbflush.h: In function ‘__invpcid’:
        arch/x86/include/asm/tlbflush.h:26:18: error: memory input 0 is not directly addressable
      
      which gcc4.7 caught but 5.x didn't. Which is strange. :-\
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Michael Matz <matz@suse.de>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: linux-mm@kvack.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      e2c7698c
  5. 12 2月, 2016 2 次提交
  6. 11 2月, 2016 3 次提交
  7. 10 2月, 2016 4 次提交
    • Z
      MIPS: Octeon: Update OCTEON_FEATURE_PCIE for Octeon III · b96d6a80
      Zubair Lutfullah Kakakhel 提交于
      Currently the driver tries to probe the pci driver and oops.
      
      Add CN7XXX to case so that driver probes the pcie driver.
      Signed-off-by: NZubair Lutfullah Kakakhel <Zubair.Kakakhel@imgtec.com>
      Cc: david.daney@cavium.com
      Cc: matt.redfearn@imgtec.com
      Cc: linux-mips@linux-mips.org
      Cc: linux-kernel@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/12530/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      b96d6a80
    • W
      MIPS: pci-mt7620: Fix return value check in mt7620_pci_probe() · aaa0bf22
      Wei Yongjun 提交于
      In case of error, the function devm_ioremap_resource() returns
      ERR_PTR() and never returns NULL. The NULL test in the return
      value check should be replaced with IS_ERR().
      Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn>
      Acked-by: NJohn Crispin <blogic@openwrt.org>
      Cc: Matthias Brugger <matthias.bgg@gmail.com>
      Cc: linux-mips@linux-mips.org
      Cc: linux-mediatek@lists.infradead.org
      Cc: linux-kernel@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/12451/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      aaa0bf22
    • V
      ARCv2: intc: Allow interruption by lowest priority interrupt · dec2b284
      Vineet Gupta 提交于
      ARC HS Cores support configurable multiple interrupt priorities of upto
      16 levels.
      
      There is processor "interrupt preemption threshhold" in STATUS32.E[4:1]
      And several places need to set this up:
      1. seed value as kernel is booting
      2. seed value for user space programs
      3. Arg to SLEEP instruction in idle task (what interrupt prio can wake)
      4. Per-IRQ line prioirty (i.e. what is the priority of interrupt
         raised by a peripheral or timer or perf counter...
      
      Currently above sites use the highest priority 0. This can be potential
      problem when multiple priorities are supported. e.g. user space could
      only be interrupted by P0 interrupt, not others...
      So turn this over and instead make default interruption level to be
      the lowest priority possible 15. This should be fine even if there are
      fewer priority levels configured (say two: P0 HIGH, P1 LOW)
      
      This feature also effectively disables FIRQ feature if present in
      hardware config. With old code, a P0 interrupt would be FIRQ, needing
      special handling (ISR or Register Banks) which is NOT supported yet.
      Now it not be P0 (P15 or whatever is lowest prio) so FIRQ is not
      triggered.
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      dec2b284
    • P
      MIPS: Fix early CM probing · 3af5a67c
      Paul Burton 提交于
      Commit c014d164 ("MIPS: Add platform callback before initializing
      the L2 cache") added a platform_early_l2_init function in order to allow
      platforms to probe for the CM before L2 initialisation is performed, so
      that CM GCRs are available to mips_sc_probe.
      
      That commit actually fails to do anything useful, since it checks
      mips_cm_revision to determine whether it should call mips_cm_probe but
      the result of mips_cm_revision will always be 0 until mips_cm_probe has
      been called. Thus the "early" mips_cm_probe call never occurs.
      
      Fix this & drop the useless weak platform_early_l2_init function by
      simply calling mips_cm_probe from setup_arch. For platforms that don't
      select CONFIG_MIPS_CM this will be a no-op, and for those that do it
      removes the requirement for them to call mips_cm_probe manually
      (although doing so isn't harmful for now).
      Signed-off-by: NPaul Burton <paul.burton@imgtec.com>
      Reviewed-by: NAlexander Sverdlin <alexander.sverdlin@nokia.com>
      Cc: Andrzej Hajda <a.hajda@samsung.com>
      Cc: Aaro Koskinen <aaro.koskinen@nokia.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
      Cc: Jaedon Shin <jaedon.shin@gmail.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Jonas Gorski <jogo@openwrt.org>
      Cc: Markos Chandras <markos.chandras@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Cc: linux-kernel@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/12475/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      3af5a67c
  8. 09 2月, 2016 11 次提交
    • A
      x86/fpu: Default eagerfpu=on on all CPUs · 58122bf1
      Andy Lutomirski 提交于
      We have eager and lazy FPU modes, introduced in:
      
        304bceda ("x86, fpu: use non-lazy fpu restore for processors supporting xsave")
      
      The result is rather messy.  There are two code paths in almost all
      of the FPU code, and only one of them (the eager case) is tested
      frequently, since most kernel developers have new enough hardware
      that we use eagerfpu.
      
      It seems that, on any remotely recent hardware, eagerfpu is a win:
      glibc uses SSE2, so laziness is probably overoptimistic, and, in any
      case, manipulating TS is far slower that saving and restoring the
      full state.  (Stores to CR0.TS are serializing and are poorly
      optimized.)
      
      To try to shake out any latent issues on old hardware, this changes
      the default to eager on all CPUs.  If no performance or functionality
      problems show up, a subsequent patch could remove lazy mode entirely.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: yu-cheng yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/ac290de61bf08d9cfc2664a4f5080257ffc1075a.1453675014.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      58122bf1
    • A
      x86/fpu: Speed up lazy FPU restores slightly · c6ab109f
      Andy Lutomirski 提交于
      If we have an FPU, there's no need to check CR0 for FPU emulation.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: yu-cheng yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/980004297e233c27066d54e71382c44cdd36ef7c.1453675014.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c6ab109f
    • A
      x86/fpu: Fold fpu_copy() into fpu__copy() · a20d7297
      Andy Lutomirski 提交于
      Splitting it into two functions needlessly obfuscated the code.
      While we're at it, improve the comment slightly.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: yu-cheng yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/3eb5a63a9c5c84077b2677a7dfe684eef96fe59e.1453675014.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a20d7297
    • A
      x86/fpu: Fix FNSAVE usage in eagerfpu mode · 5ed73f40
      Andy Lutomirski 提交于
      In eager fpu mode, having deactivated FPU without immediately
      reloading some other context is illegal.  Therefore, to recover from
      FNSAVE, we can't just deactivate the state -- we need to reload it
      if we're not actively context switching.
      
      We had this wrong in fpu__save() and fpu__copy().  Fix both.
      __kernel_fpu_begin() was fine -- add a comment.
      
      This fixes a warning triggerable with nofxsr eagerfpu=on.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: yu-cheng yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/60662444e13c76f06e23c15c5dcdba31b4ac3d67.1453675014.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      5ed73f40
    • A
      x86/fpu: Fix math emulation in eager fpu mode · 4ecd16ec
      Andy Lutomirski 提交于
      Systems without an FPU are generally old and therefore use lazy FPU
      switching. Unsurprisingly, math emulation in eager FPU mode is a
      bit buggy. Fix it.
      
      There were two bugs involving kernel code trying to use the FPU
      registers in eager mode even if they didn't exist and one BUG_ON()
      that was incorrect.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: yu-cheng yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/b4b8d112436bd6fab866e1b4011131507e8d7fbe.1453675014.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      4ecd16ec
    • M
      x86/mm: Honour passed pgprot in track_pfn_insert() and track_pfn_remap() · dd7b6847
      Matthew Wilcox 提交于
      track_pfn_insert() overwrites the pgprot that is passed in with a value
      based on the VMA's page_prot.  This is a problem for people trying to
      do clever things with the new vm_insert_pfn_prot() as it will simply
      overwrite the passed protection flags.  If we use the current value of
      the pgprot as the base, then it will behave as people are expecting.
      
      Also fix track_pfn_remap() in the same way.
      Signed-off-by: NMatthew Wilcox <willy@linux.intel.com>
      Acked-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/1453742717-10326-2-git-send-email-matthew.r.wilcox@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      dd7b6847
    • A
      x86/dmi: Switch dmi_remap() from ioremap() [uncached] to ioremap_cache() · ce1143aa
      Andy Lutomirski 提交于
      DMI cacheability is very confused on x86.
      
      dmi_early_remap() uses early_ioremap(), which uses FIXMAP_PAGE_IO,
      which is __PAGE_KERNEL_IO, which is __PAGE_KERNEL, which is cached.
      
      Don't ask me why this makes any sense.
      
      dmi_remap() uses ioremap(), which requests an uncached mapping.
      
      However, on non-EFI systems, the DMI data generally lives between
      0xf0000 and 0x100000, which is in the legacy ISA range, which
      triggers a special case in the PAT code that overrides the cache
      mode requested by ioremap() and forces a WB mapping.
      
      On a UEFI boot, however, the DMI table can live at any physical
      address.  On my laptop, it's around 0x77dd0000.  That's nowhere near
      the legacy ISA range, so the ioremap() implicit uncached type is
      honored and we end up with a UC- mapping.
      
      UC- is a very, very slow way to read from main memory, so dmi_walk()
      is likely to take much longer than necessary.
      
      Given that, even on UEFI, we do early cached DMI reads, it seems
      safe to just ask for cached access.  Switch to ioremap_cache().
      
      I haven't tried to benchmark this, but I'd guess it saves several
      milliseconds of boot time.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jean Delvare <jdelvare@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Link: http://lkml.kernel.org/r/3147c38e51f439f3c8911db34c7d4ab22d854915.1453791969.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ce1143aa
    • A
      x86/mm: If INVPCID is available, use it to flush global mappings · d8bced79
      Andy Lutomirski 提交于
      On my Skylake laptop, INVPCID function 2 (flush absolutely
      everything) takes about 376ns, whereas saving flags, twiddling
      CR4.PGE to flush global mappings, and restoring flags takes about
      539ns.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/ed0ef62581c0ea9c99b9bf6df726015e96d44743.1454096309.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d8bced79
    • A
      x86/mm: Add a 'noinvpcid' boot option to turn off INVPCID · d12a72b8
      Andy Lutomirski 提交于
      This adds a chicken bit to turn off INVPCID in case something goes
      wrong.  It's an early_param() because we do TLB flushes before we
      parse __setup() parameters.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/f586317ed1bc2b87aee652267e515b90051af385.1454096309.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d12a72b8
    • A
      x86/mm: Add INVPCID helpers · 060a402a
      Andy Lutomirski 提交于
      This adds helpers for each of the four currently-specified INVPCID
      modes.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/8a62b23ad686888cee01da134c91409e22064db9.1454096309.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      060a402a
    • A
      x86/kasan: Write protect kasan zero shadow · 063fb3e5
      Andrey Ryabinin 提交于
      After kasan_init() executed, no one is allowed to write to kasan_zero_page,
      so write protect it.
      Signed-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/1452516679-32040-3-git-send-email-aryabinin@virtuozzo.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      063fb3e5