1. 09 Sep, 2016 2 commits
    • x86/pkeys: Default to a restrictive init PKRU · acd547b2
      Dave Hansen committed
      PKRU is the register that lets you disallow writes or all access to a given
      protection key.
      
      The XSAVE hardware defines an "init state" of 0 for PKRU: its most
      permissive state, allowing access/writes to everything.  Since we start off
      all new processes with the init state, we start all processes off with the
      most permissive possible PKRU.
      
      This is unfortunate.  If a thread is clone()'d [1] before a program has
      time to set PKRU to a restrictive value, that thread will be able to write
      to all data, no matter what pkey is set on it.  This weakens any integrity
      guarantees that we want pkeys to provide.
      
      To fix this, we define a very restrictive PKRU to override the
      XSAVE-provided value when we create a new FPU context.  We choose a value
      that only allows access to pkey 0, which is as restrictive as we can
      practically make it.
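
To make "only allows access to pkey 0" concrete at the bit level, here is a minimal userspace sketch. It assumes the architectural PKRU layout (two bits per key, with the access-disable bit as the low bit of each pair). The helper name restrictive_init_pkru() is hypothetical and is not taken from the patch.

```c
#include <stdint.h>

/* Architectural PKRU layout: two bits per protection key. */
#define PKRU_AD_BIT        0x1u  /* access-disable */
#define PKRU_WD_BIT        0x2u  /* write-disable  */
#define PKRU_BITS_PER_PKEY 2
#define NR_PKEYS           16    /* 32-bit register / 2 bits per key */

/*
 * Hypothetical helper: build a PKRU value that denies all data access
 * through every key except pkey 0.
 */
uint32_t restrictive_init_pkru(void)
{
    uint32_t pkru = 0;  /* 0 is the XSAVE init state: everything allowed */

    for (int pkey = 1; pkey < NR_PKEYS; pkey++)
        pkru |= PKRU_AD_BIT << (pkey * PKRU_BITS_PER_PKEY);

    return pkru;
}
```

Under these assumptions the result is 0x55555554: the access-disable bit is set for keys 1 through 15, and both bits for key 0 stay clear.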
      
      This does not cause any practical problems with applications using
      protection keys because we require them to specify initial permissions for
      each key when it is allocated, which override the restrictive default.
      
      In the end, this ensures that threads which do not know how to manage their
      own pkey rights can not do damage to data which is pkey-protected.
      
      1. I would have thought this was a pretty contrived scenario, except that I
         heard a bug report from an MPX user who was creating threads in some very
         early code before main().  It may be crazy, but folks evidently _do_ it.
      
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: linux-arch@vger.kernel.org
      Cc: Dave Hansen <dave@sr71.net>
      Cc: mgorman@techsingularity.net
      Cc: arnd@arndb.de
      Cc: linux-api@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: luto@kernel.org
      Cc: akpm@linux-foundation.org
      Cc: torvalds@linux-foundation.org
      Link: http://lkml.kernel.org/r/20160729163021.F3C25D4A@viggo.jf.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86/pkeys: Allocation/free syscalls · e8c24d3a
      Dave Hansen committed
      This patch adds two new system calls:
      
      	int pkey_alloc(unsigned long flags, unsigned long init_access_rights)
      	int pkey_free(int pkey);
      
      These implement an "allocator" for the protection keys
      themselves, which can be thought of as analogous to the allocator
      that the kernel has for file descriptors.  The kernel tracks
      which numbers are in use, and only allows operations on keys that
      are valid.  A key which was not obtained by pkey_alloc() may not,
      for instance, be passed to pkey_mprotect().
      
      These system calls are also very important given the kernel's use
      of pkeys to implement execute-only support.  These help ensure
      that userspace can never assume that it has control of a key
      unless it first asks the kernel.  The kernel does not promise to
      preserve PKRU (rights register) contents except for allocated
      pkeys.
      
      The 'init_access_rights' argument to pkey_alloc() specifies the
      rights that will be established for the returned pkey.  For
      instance:
      
      	pkey = pkey_alloc(flags, PKEY_DENY_WRITE);
      
      will allocate 'pkey', but also sets the bits in PKRU[1] such that
      writing to 'pkey' is already denied.
      
      The kernel does not prevent pkey_free() from successfully freeing
      in-use pkeys (those still assigned to a memory range by
      pkey_mprotect()).  It would be expensive to implement the checks
      for this, so we instead say, "Just don't do it" since sane
      software will never do it anyway.
      
      Any piece of userspace calling pkey_alloc() needs to be prepared
      for it to fail.  Why?  pkey_alloc() returns the same error code
      (ENOSPC) when there are no pkeys and when pkeys are unsupported.
      They can be unsupported for a whole host of reasons, so apps must
      be prepared for this.  Also, libraries or LD_PRELOADs might steal
      keys before an application gets access to them.
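
The file-descriptor analogy and the ENOSPC behaviour can be modelled in a few lines of userspace C. This is a toy sketch, not the kernel implementation: the struct and the pkey_model_*() names are hypothetical, the numeric values of the PKEY_DENY_* macros are assumed to mirror the PKRU bit pair, and the "pkeys unsupported" case is not modelled.

```c
#include <errno.h>
#include <stdint.h>

#define NR_PKEYS          16
#define PKEY_DENY_ACCESS  0x1u   /* assumed value: mirrors PKRU's AD bit */
#define PKEY_DENY_WRITE   0x2u   /* assumed value: mirrors PKRU's WD bit */

/* Toy per-process state; the names are hypothetical. */
struct pkey_model {
    uint16_t alloc_map;  /* bit n set => pkey n is allocated */
    uint32_t pkru;       /* modelled rights register         */
};

void pkey_model_init(struct pkey_model *m)
{
    m->alloc_map = 0x1;  /* treat pkey 0 as the default key, always in use */
    m->pkru = 0;
}

/* Returns the lowest free key, or a negative errno value. */
int pkey_model_alloc(struct pkey_model *m, uint32_t init_rights)
{
    if (init_rights & ~(PKEY_DENY_ACCESS | PKEY_DENY_WRITE))
        return -EINVAL;

    for (int pkey = 1; pkey < NR_PKEYS; pkey++) {
        if (m->alloc_map & (1u << pkey))
            continue;
        m->alloc_map |= (uint16_t)(1u << pkey);
        m->pkru &= ~(0x3u << (pkey * 2));      /* clear stale rights     */
        m->pkru |= init_rights << (pkey * 2);  /* install initial rights */
        return pkey;
    }
    return -ENOSPC;  /* every key is taken */
}

/* Like the real call, this does not check whether the key is still in use. */
int pkey_model_free(struct pkey_model *m, int pkey)
{
    if (pkey <= 0 || pkey >= NR_PKEYS || !(m->alloc_map & (1u << pkey)))
        return -EINVAL;
    m->alloc_map &= (uint16_t)~(1u << pkey);
    return 0;
}
```

A caller of the model sees the same shape as the description above: allocation can fail once the keys run out, and a freed key number may be handed out again by a later allocation.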
      
      This allocation mechanism could be implemented in userspace.
      Even if we did it in userspace, we would still need additional
      user/kernel interfaces to tell userspace which keys are being
      used by the kernel internally (such as for execute-only
      mappings).  Having the kernel provide this facility completely
      removes the need for these additional interfaces, or having an
      implementation of this in userspace at all.
      
      Note that we have to make changes to all of the architectures
      that do not use mman-common.h because we use the new
      PKEY_DENY_ACCESS/WRITE macros in arch-independent code.
      
      1. PKRU is the Protection Key Rights User register.  It is a
         usermode-accessible register that controls whether writes
         and/or access to each individual pkey is allowed or denied.
      
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: linux-arch@vger.kernel.org
      Cc: Dave Hansen <dave@sr71.net>
      Cc: arnd@arndb.de
      Cc: linux-api@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: luto@kernel.org
      Cc: akpm@linux-foundation.org
      Cc: torvalds@linux-foundation.org
      Link: http://lkml.kernel.org/r/20160729163015.444FE75F@viggo.jf.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  2. 10 Aug, 2016 1 commit
    • x86/mm/pkeys: Fix compact mode by removing protection keys' XSAVE buffer manipulation · b79daf85
      Dave Hansen committed
      The Memory Protection Keys "rights register" (PKRU) is
      XSAVE-managed, and is saved/restored along with the FPU state.
      
      When kernel code accesses FPU registers, it does a delicate
      dance with preempt.  Otherwise, the context switching code can
      get confused as to whether the most up-to-date state is in the
      registers themselves or in the XSAVE buffer.
      
      But, PKRU is not a normal FPU register.  Using it does not
      generate the normal device-not-available (#NM) exceptions which
      means we can not manage it lazily, and the kernel completely
      disallows using lazy mode when it is enabled.
      
      The dance with preempt *only* occurs when managing the FPU
      lazily.  Since we never manage PKRU lazily, we do not have to do
      the dance with preempt; we can access it directly.  Doing it
      this way saves a ton of complicated code (and is faster too).
      
      Further, the XSAVES reenabling failed to patch a bit of code
      in fpu__xfeature_set_state() that checked for compacted buffers.
      That check caused fpu__xfeature_set_state() to silently refuse to
      work when the kernel is using compacted XSAVE buffers.  This
      broke execute-only and future pkey_mprotect() support when using
      compact XSAVE buffers.
      
      But, removing fpu__xfeature_set_state() gets rid of this issue,
      in addition to the nice cleanup and speedup.
      
      This fixes the same thing as a fix that Sai posted:
      
        https://lkml.org/lkml/2016/7/25/637
      
      The fix that he posted is much more obviously correct, but I
      think we should just do this instead.
      
      Reported-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Ravi Shankar <ravi.v.shankar@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-Cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20160727232040.7D060DAD@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  3. 04 Aug, 2016 1 commit
    • tree-wide: replace config_enabled() with IS_ENABLED() · 97f2645f
      Masahiro Yamada committed
      The use of config_enabled() against config options is ambiguous.  In
      practical terms, config_enabled() is equivalent to IS_BUILTIN(), but the
      author might have used it for the meaning of IS_ENABLED().  Using
      IS_ENABLED(), IS_BUILTIN(), IS_MODULE() etc.  makes the intention
      clearer.
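
The difference between the "built-in only" and "built-in or module" meanings can be reproduced outside the kernel. The sketch below is a userspace re-creation of the preprocessor trick, not a copy of the kernel header: the helper macro names are my own, and CONFIG_DEMO_BOOL and CONFIG_DEMO_TRISTATE are hypothetical options defined the way Kconfig would emit them for =y and =m.

```c
/* Expands to "0," only when pasted after a token that is defined to 1. */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val

/* Yields 1 if the option macro is defined to 1, and 0 if it is undefined. */
#define IS_DEFINED_3(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define IS_DEFINED_2(val)          IS_DEFINED_3(ARG_PLACEHOLDER_##val)
#define IS_DEFINED(x)              IS_DEFINED_2(x)

#define IS_BUILTIN(option) IS_DEFINED(option)
#define IS_MODULE(option)  IS_DEFINED(option##_MODULE)
#define IS_ENABLED(option) (IS_BUILTIN(option) || IS_MODULE(option))

/* Hypothetical options, as Kconfig would emit them. */
#define CONFIG_DEMO_BOOL 1             /* bool option set to y     */
#define CONFIG_DEMO_TRISTATE_MODULE 1  /* tristate option set to m */

/* A =m option looks "off" to IS_BUILTIN() but "on" to IS_ENABLED(). */
int tristate_seen_by_builtin(void) { return IS_BUILTIN(CONFIG_DEMO_TRISTATE); }
int tristate_seen_by_enabled(void) { return IS_ENABLED(CONFIG_DEMO_TRISTATE); }
```

For a bool option the two macros always agree, which is why the conversion is safe there. For a tristate option set to m they disagree, which is the ambiguity the commit message describes.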
      
      This commit replaces config_enabled() with IS_ENABLED() where possible.
      This commit is only touching bool config options.
      
      I noticed two cases where config_enabled() is used against a tristate
      option:
      
       - config_enabled(CONFIG_HWMON)
        [ drivers/net/wireless/ath/ath10k/thermal.c ]
      
       - config_enabled(CONFIG_BACKLIGHT_CLASS_DEVICE)
        [ drivers/gpu/drm/gma500/opregion.c ]
      
      I did not touch them because they should be converted to IS_BUILTIN()
      in order to keep the logic, but I was not sure it was the authors'
      intention.
      
      Link: http://lkml.kernel.org/r/1465215656-20569-1-git-send-email-yamada.masahiro@socionext.com
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Stas Sergeev <stsp@list.ru>
      Cc: Matt Redfearn <matt.redfearn@imgtec.com>
      Cc: Joshua Kinard <kumba@gentoo.org>
      Cc: Jiri Slaby <jslaby@suse.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Markos Chandras <markos.chandras@imgtec.com>
      Cc: "Dmitry V. Levin" <ldv@altlinux.org>
      Cc: yu-cheng yu <yu-cheng.yu@intel.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Will Drewry <wad@chromium.org>
      Cc: Nikolay Martynov <mar.kolya@gmail.com>
      Cc: Huacai Chen <chenhc@lemote.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
      Cc: Rafal Milecki <zajec5@gmail.com>
      Cc: James Cowgill <James.Cowgill@imgtec.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Alex Smith <alex.smith@imgtec.com>
      Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
      Cc: Qais Yousef <qais.yousef@imgtec.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Brian Norris <computersforpeace@gmail.com>
      Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: "Luis R. Rodriguez" <mcgrof@do-not-panic.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Roland McGrath <roland@hack.frob.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Kalle Valo <kvalo@qca.qualcomm.com>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Tony Wu <tung7970@gmail.com>
      Cc: Huaitong Han <huaitong.han@intel.com>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Jason Cooper <jason@lakedaemon.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrea Gelmini <andrea.gelmini@gelma.net>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Rabin Vincent <rabin@rab.in>
      Cc: "Maciej W. Rozycki" <macro@imgtec.com>
      Cc: David Daney <david.daney@cavium.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 22 Jul, 2016 1 commit
    • x86/fpu: Do not BUG_ON() in early FPU code · ec3ed4a2
      Dave Hansen committed
      I don't think it is really possible to have a system where CPUID
      enumerates support for XSAVE but which does not have FP/SSE
      (they are "legacy" features and always present).
      
      But, I did manage to hit this case in qemu when I enabled its
      somewhat shaky XSAVE support.  The bummer is that the FPU is set
      up before we parse the command-line or have *any* console support
      including earlyprintk.  That turned what should have been an easy
      thing to debug into a bit more of an odyssey.
      
      So a BUG() here is worthless.  All it does is guarantee that
      if/when we hit this case we have an empty console.  So, remove
      the BUG() and try to limp along by disabling XSAVE and trying to
      continue.  Add a comment on why we are doing this, and also add
      a common "out_disable" path for leaving fpu__init_system_xstate().
      
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160720194551.63BB2B58@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  5. 11 Jul, 2016 4 commits
  6. 10 Jul, 2016 5 commits
  7. 18 Jun, 2016 4 commits
  8. 08 Jun, 2016 1 commit
    • x86/fpu: Add tracepoints to dump FPU state at key points · d1898b73
      Dave Hansen committed
      I've been carrying this patch around for a bit and it's helped me
      solve at least a couple FPU-related bugs.  In addition to using
      it for debugging, I also dragged it out because using AVX (and
      AVX2/AVX-512) can have serious power consequences for a modern
      core.  It's very important to be able to figure out who is using
      it.
      
      It's also insanely useful to go out and see who is using a given
      feature, like MPX or Memory Protection Keys.  If you, for
      instance, want to find all processes using protection keys, you
      can do:
      
      	echo 'xfeatures & 0x200' > filter
      
      Since 0x200 is the protection keys feature bit.
      
      Note that this touches the KVM code.  KVM did a CREATE_TRACE_POINTS
      and then included a bunch of random headers.  If any one of
      those included other tracepoints, it would have defined the *OTHER*
      tracepoints.  That's bogus, so move it to the right place.
      
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160601174220.3CDFB90E@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  9. 13 Apr, 2016 8 commits
  10. 13 Mar, 2016 1 commit
    • x86/cpufeature: Enable new AVX-512 features · d0500494
      Fenghua Yu committed
      A few new AVX-512 instruction groups/features are added in cpufeatures.h
      for enumeration: AVX512DQ, AVX512BW, and AVX512VL.
      
      Clear the flags in fpu__xstate_clear_all_cpu_caps().
      
      The specification for latest AVX-512 including the features can be found at:
      
        https://software.intel.com/sites/default/files/managed/07/b7/319433-023.pdf
      
      Note, I didn't enable the flags in KVM. Hopefully the KVM guys can pick up
      the flags and enable them in KVM.
      
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Ravi V Shankar <ravi.v.shankar@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kvm@vger.kernel.org
      Link: http://lkml.kernel.org/r/1457667498-37357-1-git-send-email-fenghua.yu@intel.com
      [ Added more detailed feature descriptions. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  11. 12 Mar, 2016 1 commit
    • x86/fpu: Fix eager-FPU handling on legacy FPU machines · 6e686709
      Borislav Petkov committed
      i486 derived cores like Intel Quark support only the very old,
      legacy x87 FPU (FSAVE/FRSTOR, CPUID bit FXSR is not set), and
      our FPU code wasn't handling the saving and restoring there
      properly in the 'eagerfpu' case.
      
      So after we made eagerfpu the default for all CPU types:
      
        58122bf1 x86/fpu: Default eagerfpu=on on all CPUs
      
      these old FPU designs broke. First, Andy Shevchenko reported a splat:
      
        WARNING: CPU: 0 PID: 823 at arch/x86/include/asm/fpu/internal.h:163 fpu__clear+0x8c/0x160
      
      which was us trying to execute FXRSTOR on those machines even though
      they don't support it.
      
      After taking care of that, Bryan O'Donoghue reported that a simple FPU
      test still failed because we weren't initializing the FPU state properly
      on those machines.
      
      Take care of all that.
      
      Reported-and-tested-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
      Reported-by: Andy Shevchenko <andy.shevchenko@gmail.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20160311113206.GD4312@pd.tnic
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  12. 10 Mar, 2016 1 commit
    • x86/fpu: Revert ("x86/fpu: Disable AVX when eagerfpu is off") · a65050c6
      Yu-cheng Yu committed
      Leonid Shatz noticed that the SDM interpretation of the following
      recent commit:
      
        394db20c ("x86/fpu: Disable AVX when eagerfpu is off")
      
      ... is incorrect and that the original behavior of the FPU code was correct.
      
      Because AVX is not mentioned in the CR0 TS bit description, it was
      mistakenly believed not to support lazy context switching. This turns
      out to be false:
      
        Intel Software Developer's Manual Vol. 3A, Sec. 2.5 Control Registers:
      
         'TS Task Switched bit (bit 3 of CR0) -- Allows the saving of the x87 FPU/
          MMX/SSE/SSE2/SSE3/SSSE3/SSE4 context on a task switch to be delayed until
          an x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instruction is actually executed
          by the new task.'
      
        Intel Software Developer's Manual Vol. 2A, Sec. 2.4 Instruction Exception
        Specification:
      
         'AVX instructions refer to exceptions by classes that include #NM
          "Device Not Available" exception for lazy context switch.'
      
      So revert the commit.
      
      Reported-by: Leonid Shatz <leonid.shatz@ravellosystems.com>
      Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1457569734-3785-1-git-send-email-yu-cheng.yu@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  13. 09 Mar, 2016 1 commit
    • x86/fpu: Fix 'no387' regression · f363938c
      Andy Lutomirski committed
      After fixing FPU option parsing, we now parse the 'no387' boot option
      too early: no387 clears X86_FEATURE_FPU before it's even probed, so
      the boot CPU promptly re-enables it.
      
      I suspect it gets even more confused on SMP.
      
      Fix the probing code to leave X86_FEATURE_FPU off if it's been
      disabled by setup_clear_cpu_cap().
      
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: yu-cheng yu <yu-cheng.yu@intel.com>
      Fixes: 4f81cbaf ("x86/fpu: Fix early FPU command-line parsing")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  14. 24 Feb, 2016 1 commit
  15. 19 Feb, 2016 3 commits
    • mm/core, x86/mm/pkeys: Add execute-only protection keys support · 62b5f7d0
      Dave Hansen committed
      Protection keys provide new page-based protection in hardware.
      But, they have an interesting attribute: they only affect data
      accesses and never affect instruction fetches.  That means that
      if we set up some memory which is set as "access-disabled" via
      protection keys, we can still execute from it.
      
      This patch uses protection keys to set up mappings to do just that.
      If a user calls:
      
      	mmap(..., PROT_EXEC);
      or
      	mprotect(ptr, sz, PROT_EXEC);
      
      (note PROT_EXEC-only without PROT_READ/WRITE), the kernel will
      notice this, and set a special protection key on the memory.  It
      also sets the appropriate bits in the Protection Keys User Rights
      (PKRU) register so that the memory becomes unreadable and
      unwritable.
      
      I haven't found any userspace that does this today.  With this
      facility in place, we expect userspace to move to use it
      eventually.  Userspace _could_ start doing this today.  Any
      PROT_EXEC calls get converted to PROT_READ inside the kernel, and
      would transparently be upgraded to "true" PROT_EXEC with this
      code.  IOW, userspace never has to do any PROT_EXEC runtime
      detection.
      
      This feature provides enhanced protection against leaking
      executable memory contents.  This helps thwart attacks which are
      attempting to find ROP gadgets on the fly.
      
      But, the security provided by this approach is not comprehensive.
      The PKRU register which controls access permissions is a normal
      user register writable from unprivileged userspace.  An attacker
      who can execute the 'wrpkru' instruction can easily disable the
      protection provided by this feature.
      
      The protection key that is used for execute-only support is
      permanently dedicated at compile time.  This is fine for now
      because there is currently no API to set a protection key other
      than this one.
      
      Despite there being a constant PKRU value across the entire
      system, we do not set it unless this feature is in use in a
      process.  That is to preserve the PKRU XSAVE 'init state',
      which can lead to faster context switches.
      
      PKRU *is* a user register and the kernel is modifying it.  That
      means that code doing:
      
      	pkru = rdpkru()
      	pkru |= 0x100;
      	mmap(..., PROT_EXEC);
      	wrpkru(pkru);
      
      could lose the bits in PKRU that enforce execute-only
      permissions.  To avoid this, we suggest avoiding ever calling
      mmap() or mprotect() when the PKRU value is expected to be
      unstable.
      
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chen Gang <gang.chen.5i5j@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Piotr Kwapulinski <kwapulinski.piotr@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Vladimir Murzin <vladimir.murzin@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: keescook@google.com
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210240.CB4BB5CA@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mm/pkeys: Allow kernel to modify user pkey rights register · 84594296
      Dave Hansen committed
      The Protection Key Rights for User memory (PKRU) is a 32-bit
      user-accessible register.  It contains two bits for each
      protection key: one to write-disable (WD) access to memory
      covered by the key and another to access-disable (AD).
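
The two-bits-per-key layout can be decoded with a couple of shifts. The sketch below assumes the architectural ordering, with the AD bit as the low bit and the WD bit as the high bit of each pair. The function names are hypothetical, and the model covers only user-mode data accesses.

```c
#include <stdbool.h>
#include <stdint.h>

#define PKRU_AD_BIT        0x1u  /* access-disable: no reads or writes */
#define PKRU_WD_BIT        0x2u  /* write-disable: reads still allowed */
#define PKRU_BITS_PER_PKEY 2

/* Reads through 'pkey' are allowed unless its AD bit is set. */
bool pkru_allows_read(uint32_t pkru, int pkey)
{
    uint32_t bits = pkru >> (pkey * PKRU_BITS_PER_PKEY);

    return !(bits & PKRU_AD_BIT);
}

/* Writes are blocked by either bit: AD denies all data access. */
bool pkru_allows_write(uint32_t pkru, int pkey)
{
    uint32_t bits = pkru >> (pkey * PKRU_BITS_PER_PKEY);

    return !(bits & (PKRU_AD_BIT | PKRU_WD_BIT));
}
```

With 32 bits and two bits per key, this layout gives 16 keys, numbered 0 through 15.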
      
      Userspace can read/write the register with the RDPKRU and WRPKRU
      instructions.  But, the register is saved and restored with the
      XSAVE family of instructions, which means we have to treat it
      like a floating point register.
      
      The kernel needs to write to the register if it wants to
      implement execute-only memory or if it implements a system call
      to change PKRU.
      
      To do this, we need to create a 'pkru_state' buffer, read the old
      contents into it, modify it, and then tell the FPU code that
      there is modified data in there so it can (possibly) move the
      buffer back into the registers.
      
      This uses the fpu__xfeature_set_state() function that we defined
      in the previous patch.
      
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210236.0BE13217@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/fpu: Allow setting of XSAVE state · b8b9b6ba
      Dave Hansen committed
      We want to modify the Protection Key rights inside the kernel, so
      we need to change PKRU's contents.  But, if we do a plain
      'wrpkru', when we return to userspace we might do an XRSTOR and
      wipe out the kernel's 'wrpkru'.  So, we need to go after PKRU in
      the xsave buffer.
      
      We do this by:
      
        1. Ensuring that we have the XSAVE registers (fpregs) in the
           kernel FPU buffer (fpstate)
        2. Looking up the location of a given state in the buffer
        3. Filling in the state
        4. Ensuring that the hardware knows that state is present there
           (basically that the 'init optimization' is not in place).
        5. Copying the newly-modified state back to the registers if
           necessary.
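
The five steps above can be walked through in a toy userspace model. Everything here is invented for illustration: the struct, the toy_*() names, the buffer size and the offset table are not kernel interfaces, and real component offsets come from CPUID. The one architectural fact relied on is that PKRU is xfeature bit 9.

```c
#include <stdint.h>
#include <string.h>

#define TOY_XFEATURE_PKRU 9              /* PKRU's xfeature bit */
#define TOY_MASK(nr)      (1ull << (nr))

struct toy_fpu {
    uint32_t pkru_reg;   /* stands in for the live register          */
    int      regs_live;  /* nonzero: registers are newer than buffer */
    uint64_t xfeatures;  /* buffer header: components holding data   */
    uint8_t  area[64];   /* buffer payload; this layout is invented  */
};

/* Invented offset table; real offsets are enumerated by CPUID. */
static const unsigned toy_offset[16] = { [TOY_XFEATURE_PKRU] = 8 };

/* Models a save with the init optimization: an all-zero PKRU is not
 * written out, its header bit is simply cleared. */
static void toy_save(struct toy_fpu *fpu)
{
    if (fpu->pkru_reg == 0) {
        fpu->xfeatures &= ~TOY_MASK(TOY_XFEATURE_PKRU);
        return;
    }
    memcpy(&fpu->area[toy_offset[TOY_XFEATURE_PKRU]],
           &fpu->pkru_reg, sizeof(fpu->pkru_reg));
    fpu->xfeatures |= TOY_MASK(TOY_XFEATURE_PKRU);
}

/* Models a restore: a clear header bit means "load the init value". */
static void toy_restore(struct toy_fpu *fpu)
{
    if (!(fpu->xfeatures & TOY_MASK(TOY_XFEATURE_PKRU))) {
        fpu->pkru_reg = 0;
        return;
    }
    memcpy(&fpu->pkru_reg,
           &fpu->area[toy_offset[TOY_XFEATURE_PKRU]], sizeof(fpu->pkru_reg));
}

void toy_set_pkru(struct toy_fpu *fpu, uint32_t new_pkru)
{
    /* 1. Get the live registers into the buffer. */
    if (fpu->regs_live)
        toy_save(fpu);

    /* 2. Look up where the component lives in the buffer. */
    uint8_t *slot = &fpu->area[toy_offset[TOY_XFEATURE_PKRU]];

    /* 3. Fill in the new state. */
    memcpy(slot, &new_pkru, sizeof(new_pkru));

    /* 4. Mark the component present so a restore does not load init. */
    fpu->xfeatures |= TOY_MASK(TOY_XFEATURE_PKRU);

    /* 5. Copy the modified state back to the registers if needed. */
    if (fpu->regs_live)
        toy_restore(fpu);
}
```

Step 4 is the easy one to miss: in this model, if the header bit were left clear, toy_restore() would reset the register to 0 and the new value would be lost.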
      
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210235.5A3139BF@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  16. 16 Feb, 2016 2 commits
    • x86/fpu, x86/mm/pkeys: Add PKRU xsave fields and data structures · c8df4009
      Dave Hansen committed
      The protection keys register (PKRU) is saved and restored using
      xsave.  Define the data structure that we will use to access it
      inside the xsave buffer.
      
      Note that we also have to widen the printk of the xsave feature
      masks since this is feature 0x200 and we only did two characters
      before.
      
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210204.56DF8F7B@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c8df4009
    • D
      x86/fpu: Add placeholder for 'Processor Trace' XSAVE state · 1f96b1ef
      Dave Hansen committed
      There is an XSAVE state component for Intel Processor Trace (PT).
      But, we do not currently use it.
      
      We add a placeholder in the code for it so it is not a mystery and
      also so we do not need an explicit enum initialization for Protection
      Keys in a moment.
      
      Why don't we use it?
      
      We might end up using this at _some_ point in the future.  But,
      this is a "system" state which requires using the currently
      unsupported XSAVES feature.  Unlike all the other XSAVE states,
      PT state is also not directly tied to a thread.  You might
      context-switch between threads, but not want to change any of the
      PT state.  Or, you might switch between threads, and *do* want to
      change PT state, all depending on what is being traced.
      
      We currently just manually set some MSRs to do this PT context
      switching, and it is unclear whether replacing our direct MSR use
      with XSAVE will be a net win or loss, both in code complexity and
      performance.
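
      The enum placeholder mentioned above can be sketched like this. The
      `DEMO_` names are hypothetical; the bit positions follow the XSAVE
      component numbering, where Processor Trace is bit 8 and Protection Keys
      is bit 9.

```c
/* One enumerator per XSAVE state component, in hardware bit order. */
enum demo_xfeature {
    DEMO_XFEATURE_FP,             /* 0: x87 */
    DEMO_XFEATURE_SSE,            /* 1 */
    DEMO_XFEATURE_YMM,            /* 2 */
    DEMO_XFEATURE_BNDREGS,        /* 3 */
    DEMO_XFEATURE_BNDCSR,         /* 4 */
    DEMO_XFEATURE_OPMASK,         /* 5 */
    DEMO_XFEATURE_ZMM_HI256,      /* 6 */
    DEMO_XFEATURE_HI16_ZMM,       /* 7 */
    DEMO_XFEATURE_PT_PLACEHOLDER, /* 8: Processor Trace, not used */
    DEMO_XFEATURE_PKRU,           /* 9: lands here without "= 9" */
    DEMO_XFEATURE_MAX
};

/* Convert a feature number into its bit mask. */
static unsigned int demo_xfeature_mask(enum demo_xfeature nr)
{
    return 1u << nr;
}
```

      With the placeholder occupying slot 8, the next enumerator gets the value
      9 automatically, and its mask comes out as 0x200.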
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: fenghua.yu@intel.com
      Cc: linux-mm@kvack.org
      Cc: yu-cheng.yu@intel.com
      Link: http://lkml.kernel.org/r/20160212210158.5E4BCAE2@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      1f96b1ef
  17. 09 Feb, 2016 3 commits
    • A
      x86/fpu: Default eagerfpu=on on all CPUs · 58122bf1
      Andy Lutomirski committed
      We have eager and lazy FPU modes, introduced in:
      
        304bceda ("x86, fpu: use non-lazy fpu restore for processors supporting xsave")
      
      The result is rather messy.  There are two code paths in almost all
      of the FPU code, and only one of them (the eager case) is tested
      frequently, since most kernel developers have new enough hardware
      that we use eagerfpu.
      
      It seems that, on any remotely recent hardware, eagerfpu is a win:
      glibc uses SSE2, so laziness is probably overoptimistic, and, in any
      case, manipulating TS is far slower than saving and restoring the
      full state.  (Stores to CR0.TS are serializing and are poorly
      optimized.)
      
      To try to shake out any latent issues on old hardware, this changes
      the default to eager on all CPUs.  If no performance or functionality
      problems show up, a subsequent patch could remove lazy mode entirely.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: yu-cheng yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/ac290de61bf08d9cfc2664a4f5080257ffc1075a.1453675014.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      58122bf1
    • A
      x86/fpu: Fold fpu_copy() into fpu__copy() · a20d7297
      Andy Lutomirski committed
      Splitting it into two functions needlessly obfuscated the code.
      While we're at it, improve the comment slightly.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: yu-cheng yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/3eb5a63a9c5c84077b2677a7dfe684eef96fe59e.1453675014.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a20d7297
    • A
      x86/fpu: Fix FNSAVE usage in eagerfpu mode · 5ed73f40
      Andy Lutomirski committed
      In eager fpu mode, having deactivated FPU without immediately
      reloading some other context is illegal.  Therefore, to recover from
      FNSAVE, we can't just deactivate the state -- we need to reload it
      if we're not actively context switching.
      
      We had this wrong in fpu__save() and fpu__copy().  Fix both.
      __kernel_fpu_begin() was fine -- add a comment.
      
      This fixes a warning triggerable with nofxsr eagerfpu=on.
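
      The rule described above can be shown with a toy model. It relies on one
      hardware fact: FNSAVE stores the x87 state and then reinitializes the
      FPU, so the live registers no longer hold the task's values. Everything
      else is a simplification: `demo_fpu_regs`, `demo_fnsave()`,
      `demo_frstor()` and `demo_fpu_save()` are hypothetical names, the
      register file is reduced to two fields, and the reset zeroes `st0`
      where real hardware would mark the register stack empty.

```c
#include <stdint.h>

/* Toy register file: just a control word and one stack register. */
struct demo_fpu_regs {
    uint16_t control_word;
    double   st0;
};

#define DEMO_FPU_INIT_CW 0x037f   /* x87 control word after a reset */

/* Model of FNSAVE: store the state, then reset the live registers. */
static void demo_fnsave(struct demo_fpu_regs *hw, struct demo_fpu_regs *buf)
{
    *buf = *hw;
    hw->control_word = DEMO_FPU_INIT_CW;
    hw->st0 = 0.0;  /* simplification of "stack marked empty" */
}

/* Model of FRSTOR: load the saved state back into the registers. */
static void demo_frstor(struct demo_fpu_regs *hw,
                        const struct demo_fpu_regs *buf)
{
    *hw = *buf;
}

/* Eager-mode save: unless another context is about to be loaded,
 * reload what FNSAVE just wiped so the registers stay valid. */
static void demo_fpu_save(struct demo_fpu_regs *hw,
                          struct demo_fpu_regs *buf,
                          int context_switching)
{
    demo_fnsave(hw, buf);
    if (!context_switching)
        demo_frstor(hw, buf);
}
```

      When `context_switching` is nonzero, the reload is skipped because the
      incoming task's state is expected to replace the registers right away.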
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: yu-cheng yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/60662444e13c76f06e23c15c5dcdba31b4ac3d67.1453675014.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      5ed73f40