1. 23 Jan 2015 (4 commits)
    • x86, tls: Interpret an all-zero struct user_desc as "no segment" · 3669ef9f
      Andy Lutomirski committed
      The Witcher 2 did something like this to allocate a TLS segment index:
      
              struct user_desc u_info;
              bzero(&u_info, sizeof(u_info));
              u_info.entry_number = (uint32_t)-1;
      
              syscall(SYS_set_thread_area, &u_info);
      
      Strictly speaking, this code was never correct.  It should have set
      read_exec_only and seg_not_present to 1 to indicate that it wanted
      to find a free slot without putting anything there, or it should
      have put something sensible in the TLS slot if it wanted to allocate
      a TLS entry for real.  The actual effect of this code was to
      allocate a bogus segment that could be used to exploit espfix.
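
      For illustration, a minimal sketch of the strictly correct request
      described above, i.e. asking for a free slot without installing a real
      segment (userspace C; assumes <asm/ldt.h>, <strings.h>, <stdint.h>,
      <sys/syscall.h> and <unistd.h>; error handling omitted):

              struct user_desc u_info;
              bzero(&u_info, sizeof(u_info));
              u_info.entry_number = (uint32_t)-1;
              u_info.read_exec_only = 1;    /* mark the slot as...       */
              u_info.seg_not_present = 1;   /* ...deliberately empty     */

              syscall(SYS_set_thread_area, &u_info);
              /* On success, u_info.entry_number holds the allocated slot. */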
      
      The set_thread_area hardening patches changed the behavior, causing
      set_thread_area to return -EINVAL and crashing the game.
      
      This changes set_thread_area to interpret this as a request to find
      a free slot and to leave it empty, which isn't *quite* what the game
      expects but should be close enough to keep it working.  In
      particular, using the code above to allocate two segments will
      allocate the same segment both times.
      
      According to FrostbittenKing on GitHub, this fixes The Witcher 2.
      
      If this somehow still causes problems, we could instead allocate
      a limit==0 32-bit data segment, but that seems rather ugly to me.
      
      Fixes: 41bdc785 ("x86/tls: Validate TLS entries to protect espfix")
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Cc: stable@vger.kernel.org
      Cc: torvalds@linux-foundation.org
      Link: http://lkml.kernel.org/r/0cb251abe1ff0958b8e468a9a9a905b80ae3a746.1421954363.git.luto@amacapital.net
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86, tls, ldt: Stop checking lm in LDT_empty · e30ab185
      Andy Lutomirski committed
      32-bit programs don't have an lm bit in their ABI, so they can't
      reliably cause LDT_empty to return true without resorting to memset.
      They shouldn't need to do this.
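
      A hedged sketch of an lm-agnostic emptiness check, using the
      struct user_desc field names; the exact macro in the patch may differ:

              /* Sketch: "empty" deliberately does not look at info->lm. */
              #define LDT_empty(info)                                 \
                      ((info)->base_addr              == 0    &&      \
                       (info)->limit                  == 0    &&      \
                       (info)->contents               == 0    &&      \
                       (info)->read_exec_only         == 1    &&      \
                       (info)->seg_32bit              == 0    &&      \
                       (info)->limit_in_pages         == 0    &&      \
                       (info)->seg_not_present        == 1    &&      \
                       (info)->useable                == 0)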
      
      This should fix a longstanding, if minor, issue in all 64-bit kernels
      as well as a potential regression in the TLS hardening code.
      
      Fixes: 41bdc785 ("x86/tls: Validate TLS entries to protect espfix")
      Cc: stable@vger.kernel.org
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Cc: torvalds@linux-foundation.org
      Link: http://lkml.kernel.org/r/72a059de55e86ad5e2935c80aa91880ddf19d07c.1421954363.git.luto@amacapital.net
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86, mpx: Fix potential performance issue on unmaps · c922228e
      Dave Hansen committed
      The 3.19 merge window saw some TLB modifications merged which caused a
      performance regression. They were fixed in commit 045bbb9fa.
      
      Once that fix was applied, I also noticed that there was a small
      but intermittent regression still present.  It was not present
      consistently enough to bisect reliably, but I'm fairly confident
      that it came from (my own) MPX patches.  The source was reading
      a relatively unused field in the mm_struct via arch_unmap.
      
      I also noted that this code was in the main instruction flow of
      do_munmap() and probably had more icache impact than we want.
      
      This patch does two things:
      1. Adds a static (via Kconfig) and dynamic (via cpuid) check
         for MPX with cpu_feature_enabled().  This keeps us from
         reading that cacheline in the mm and trades it for a check
         of the global CPUID variables at least on CPUs without MPX.
      2. Adds an unlikely() to ensure that the MPX call ends up out
         of the main instruction flow in do_munmap().  I've added
         a detailed comment about why this was done and why we want
         it even on systems where MPX is present (see the sketch below).
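
      A hedged sketch of the shape of the resulting arch_unmap() hook; the
      helper name mpx_notify_unmap() comes from the MPX series and the
      details may differ:

              static inline void arch_unmap(struct mm_struct *mm,
                                            struct vm_area_struct *vma,
                                            unsigned long start, unsigned long end)
              {
                      /*
                       * cpu_feature_enabled() is compile-time false when MPX
                       * is not configured and otherwise checks the global
                       * CPUID data, so the mm_struct cacheline is not read
                       * on CPUs without MPX.  unlikely() moves the call out
                       * of the hot do_munmap() path.
                       */
                      if (unlikely(cpu_feature_enabled(X86_FEATURE_MPX)))
                              mpx_notify_unmap(mm, vma, start, end);
              }
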
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: luto@amacapital.net
      Cc: Dave Hansen <dave@sr71.net>
      Link: http://lkml.kernel.org/r/20150108223021.AEEAB987@viggo.jf.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86, mpx: Explicitly disable 32-bit MPX support on 64-bit kernels · 814564a0
      Dave Hansen committed
      We had originally planned on submitting MPX support in one patch
      set.  We eventually broke it up in to two pieces for easier
      review.  One of the features that didn't make the first round
      was supporting 32-bit binaries on 64-bit kernels.
      
      Once we split the set up, we never added code to restrict 32-bit
      binaries from _using_ MPX on 64-bit kernels.
      
      The 32-bit bounds tables are a different format than the 64-bit
      ones.  Without this patch, the kernel will try to read a 32-bit
      binary's tables as if they were the 64-bit version.  They will
      likely be noticed as being invalid rather quickly and the app
      will get killed, but that's kinda mean.
      
      This patch adds an explicit check, and will make a 64-bit kernel
      essentially behave as if it has no MPX support when called from
      a 32-bit binary.
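
      A hedged sketch of such a check; mpx_supported_for() is a hypothetical
      helper name used only for illustration:

              static inline bool mpx_supported_for(struct task_struct *tsk)
              {
                      /* 32-bit task on a 64-bit kernel: pretend MPX is absent. */
                      if (IS_ENABLED(CONFIG_X86_64) &&
                          test_tsk_thread_flag(tsk, TIF_IA32))
                              return false;
                      return true;
              }
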
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave@sr71.net>
      Link: http://lkml.kernel.org/r/20150108223020.9E9AA511@viggo.jf.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  2. 20 Jan 2015 (6 commits)
  3. 16 Jan 2015 (2 commits)
  4. 15 Jan 2015 (1 commit)
    • ftrace/jprobes/x86: Fix conflict between jprobes and function graph tracing · 237d28db
      Steven Rostedt (Red Hat) committed
      If the function graph tracer traces a jprobe callback, the system will
      crash. This can easily be demonstrated by compiling the jprobe
      sample module that is in the kernel tree, loading it and running the
      function graph tracer.
      
       # modprobe jprobe_example.ko
       # echo function_graph > /sys/kernel/debug/tracing/current_tracer
       # ls
      
      These commands end up in a nice crash after the first fork
      (do_fork has a jprobe attached to it, so "ls" just triggers that fork).
      
      The problem is caused by the jprobe_return() that all jprobe callbacks
      must end with. The way jprobes works is that the function a jprobe
      is attached to has a breakpoint placed at the start of it (or it uses
      ftrace if fentry is supported). The breakpoint handler (or ftrace callback)
      will copy the stack frame and change the ip address to return to the
      jprobe handler instead of the function. The jprobe handler must end
      with jprobe_return() which swaps the stack and does an int3 (breakpoint).
      This breakpoint handler will then put back the saved stack frame,
      simulate the instruction at the beginning of the function it added
      a breakpoint to, and then continue on.
      
      For function graph tracing to work, the tracer hijacks the return
      address in the stack frame and replaces it with a hook function that
      will trace the end of the call. This hook function later restores the
      original return address of the function call.
      
      If the function tracer traces the jprobe handler, the hook function
      for that handler will not be called, and its saved return address
      will be used for the next function. This will result in a kernel crash.
      
      To solve this, pause function tracing before the jprobe handler is called
      and unpause it before it returns back to the function it probed.
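
      A hedged sketch of that pairing, using the existing
      pause_graph_tracing()/unpause_graph_tracing() helpers; the surrounding
      jprobe code is heavily abridged:

              int setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
              {
                      struct jprobe *jp = container_of(p, struct jprobe, kp);

                      /* ... save the stack frame and registers ... */

                      /* Keep the graph tracer away from the jprobe handler. */
                      pause_graph_tracing();
                      regs->ip = (unsigned long)(jp->entry);
                      return 1;
              }

              int longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
              {
                      /* ... restore the saved stack frame ... */

                      /* Safe to trace again once we head back to the probed function. */
                      unpause_graph_tracing();
                      return 1;
              }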
      
      Some other updates:
      
      Used a variable "saved_sp" to hold kcb->jprobe_saved_sp. This makes the
      code look a bit cleaner and easier to understand (various tries to fix
      this bug required this change).
      
      Note, if fentry is being used, jprobes will change the ip address before
      the function graph tracer runs and it will not be able to trace the
      function that the jprobe is probing.
      
      Link: http://lkml.kernel.org/r/20150114154329.552437962@goodmis.org
      
      Cc: stable@vger.kernel.org # 2.6.30+
      Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  5. 13 Jan 2015 (1 commit)
    • x86/xen: properly retrieve NMI reason · f221b04f
      Jan Beulich committed
      Using the native code here can't work properly, as the hypervisor would
      normally have cleared the two reason bits by the time Dom0 gets to see
      the NMI (if passed to it at all). There's a shared info field for this,
      and there's an existing hook to use - just fit the two together. This
      is particularly relevant so that NMIs intended to be handled by APEI /
      GHES actually make it to the respective handler.
      
      Note that the hook can (and should) be used irrespective of whether
      being in Dom0, as accessing port 0x61 in a DomU would be even worse,
      while the shared info field would just hold zero all the time. Note
      further that hardware NMI handling for PVH doesn't currently work
      anyway due to missing code in the hypervisor (but it is expected to
      work the native rather than the PV way).
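
      A hedged sketch of wiring the existing hook to the shared info field;
      the bit and constant names follow the Xen and x86 headers and may not
      match the patch exactly:

              static unsigned char xen_get_nmi_reason(void)
              {
                      unsigned char reason = 0;

                      /* Construct a value resembling a read of port 0x61. */
                      if (test_bit(_XEN_NMIREASON_io_error,
                                   &HYPERVISOR_shared_info->arch.nmi_reason))
                              reason |= NMI_REASON_IOCHK;
                      if (test_bit(_XEN_NMIREASON_pci_serr,
                                   &HYPERVISOR_shared_info->arch.nmi_reason))
                              reason |= NMI_REASON_SERR;

                      return reason;
              }

              /* Install it via the existing hook. */
              x86_platform.get_nmi_reason = xen_get_nmi_reason;
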
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
  6. 12 Jan 2015 (4 commits)
  7. 09 Jan 2015 (4 commits)
  8. 08 Jan 2015 (4 commits)
  9. 06 Jan 2015 (1 commit)
  10. 05 Jan 2015 (2 commits)
    • crypto: sha-mb - Add avx2_supported check. · 0b8c960c
      Vinson Lee committed
      This patch fixes this allyesconfig target build error with older
      binutils.
      
        LD      arch/x86/crypto/built-in.o
      ld: arch/x86/crypto/sha-mb/built-in.o: No such file: No such file or directory
      
      Cc: stable@vger.kernel.org # 3.18+
      Signed-off-by: Vinson Lee <vlee@twitter.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: aesni - fix "by8" variant for 128 bit keys · 0b1e95b2
      Mathias Krause committed
      The "by8" counter mode optimization is broken for 128 bit keys with
      input data longer than 128 bytes. It uses the wrong key material for
      en- and decryption.
      
      The key registers xkey0, xkey4, xkey8 and xkey12 need to be preserved
      in case we're handling more than 128 bytes of input data -- they won't
      get reloaded after the initial load. They must therefore be (a) loaded
      on the first iteration and (b) be preserved for the later ones. The
      implementation for 128 bit keys satisfies neither (a) nor (b).
      
      Fix this by bringing the implementation back to its original source
      and correctly load the key registers and preserve their values by
      *not* re-using the registers for other purposes.
      
      Kudos to James for reporting the issue and providing a test case
      showing the discrepancies.
      Reported-by: James Yonan <james@openvpn.net>
      Cc: Chandramouli Narayanan <mouli@linux.intel.com>
      Cc: <stable@vger.kernel.org> # v3.18
      Signed-off-by: Mathias Krause <minipli@googlemail.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  11. 04 Jan 2015 (1 commit)
    • x86, um: actually mark system call tables readonly · b485342b
      Daniel Borkmann committed
      Commit a074335a ("x86, um: Mark system call tables readonly") was
      supposed to mark the sys_call_table in UML as read-only by adding
      const, but it doesn't have the desired effect: the table still ends up
      in the data section, because __cacheline_aligned forces sys_call_table
      into .data..cacheline_aligned. Use the ____cacheline_aligned variant
      instead to fix this.
      
      Before:
      
      $ nm -v arch/x86/um/sys_call_table_64.o | grep -1 "sys_call_table"
                       U sys_writev
      0000000000000000 D sys_call_table
      0000000000000000 D syscall_table_size
      
      After:
      
      $ nm -v arch/x86/um/sys_call_table_64.o | grep -1 "sys_call_table"
                       U sys_writev
      0000000000000000 R sys_call_table
      0000000000000000 D syscall_table_size
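
      A hedged sketch of the declaration change behind the section move shown
      above, assuming the sys_call_ptr_t element type used by the x86 syscall
      tables; the real file lists every entry:

              /* Before: const, yet __cacheline_aligned still lands the table
               * in .data..cacheline_aligned. */
              const sys_call_ptr_t sys_call_table[] __cacheline_aligned = {
                      /* ... */
              };

              /* After: ____cacheline_aligned lets the const table go to .rodata. */
              const sys_call_ptr_t sys_call_table[] ____cacheline_aligned = {
                      /* ... */
              };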
      
      Fixes: a074335a ("x86, um: Mark system call tables readonly")
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
  12. 28 Dec 2014 (2 commits)
  13. 24 Dec 2014 (1 commit)
    • x86, vdso: Use asm volatile in __getcpu · 1ddf0b1b
      Andy Lutomirski committed
      In Linux 3.18 and below, GCC hoists the lsl instructions in the
      pvclock code all the way to the beginning of __vdso_clock_gettime,
      slowing the non-paravirt case significantly.  For unknown reasons,
      presumably related to the removal of a branch, the performance issue
      is gone as of
      
      e76b027e x86,vdso: Use LSL unconditionally for vgetcpu
      
      but I don't trust GCC enough to expect the problem to stay fixed.
      
      There should be no correctness issue, because the __getcpu calls in
      __vdso_clock_gettime were never necessary in the first place.
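
      A hedged sketch of the change, approximated from the vdso headers:
      marking the inline asm volatile so GCC cannot hoist the LSL away from
      its call site:

              static inline unsigned int __getcpu(void)
              {
                      unsigned int p;

                      /*
                       * "volatile" keeps GCC from hoisting the LSL above the
                       * branch that decides whether the result is needed.
                       */
                      asm volatile ("lsl %1,%0" : "=r" (p) : "r" (__PER_CPU_SEG));

                      return p;
              }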
      
      Note to stable maintainers: In 3.18 and below, depending on
      configuration, gcc 4.9.2 generates code like this:
      
           9c3:       44 0f 03 e8             lsl    %ax,%r13d
           9c7:       45 89 eb                mov    %r13d,%r11d
           9ca:       0f 03 d8                lsl    %ax,%ebx
      
      This patch won't apply as is to any released kernel, but I'll send a
      trivial backported version if needed.
      
      Fixes: 51c19b4f ("x86: vdso: pvclock gettime support")
      Cc: stable@vger.kernel.org # 3.8+
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
  14. 23 Dec 2014 (4 commits)
  15. 21 Dec 2014 (1 commit)
    • x86_64, vdso: Fix the vdso address randomization algorithm · 394f56fe
      Andy Lutomirski committed
      The theory behind vdso randomization is that it's mapped at a random
      offset above the top of the stack.  To avoid wasting a page of
      memory for an extra page table, the vdso isn't supposed to extend
      past the lowest PMD into which it can fit.  Other than that, the
      address should be a uniformly distributed address that meets all of
      the alignment requirements.
      
      The current algorithm is buggy: the vdso has about a 50% probability
      of being at the very end of a PMD.  The current algorithm also has a
      decent chance of failing outright due to incorrect handling of the
      case where the top of the stack is near the top of its PMD.
      
      This fixes the implementation.  The paxtest estimate of vdso
      "randomisation" improves from 11 bits to 18 bits.  (Disclaimer: I
      don't know what the paxtest code is actually calculating.)
      
      It's worth noting that this algorithm is inherently biased: the vdso
      is more likely to end up near the end of its PMD than near the
      beginning.  Ideally we would either nix the PMD sharing requirement
      or jointly randomize the vdso and the stack to reduce the bias.
      
      In the mean time, this is a considerable improvement with basically
      no risk of compatibility issues, since the allowed outputs of the
      algorithm are unchanged.
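
      A hedged sketch of the overall shape of such an address picker; helper
      names like align_vdso_addr() are assumptions and the real patch differs
      in detail:

              static unsigned long vdso_addr(unsigned long start, unsigned int len)
              {
                      unsigned long addr, end;
                      unsigned int offset;

                      /*
                       * Highest end address that keeps the whole vdso inside
                       * the PMD holding the area just above the stack top.
                       */
                      end = (start + PMD_SIZE - 1) & PMD_MASK;
                      if (end >= TASK_SIZE_MAX)
                              end = TASK_SIZE_MAX;
                      end -= len;

                      if (end > start) {
                              /* Uniform choice over every page-aligned slot. */
                              offset = get_random_int() %
                                      (((end - start) >> PAGE_SHIFT) + 1);
                              addr = start + (offset << PAGE_SHIFT);
                      } else {
                              addr = start;
                      }

                      return align_vdso_addr(addr);   /* assumed helper */
              }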
      
      As an easy test, doing this:
      
      for i in `seq 10000`
        do grep -P vdso /proc/self/maps |cut -d- -f1
      done |sort |uniq -d
      
      used to produce lots of output (1445 lines on my most recent run).
      A tiny subset looks like this:
      
      7fffdfffe000
      7fffe01fe000
      7fffe05fe000
      7fffe07fe000
      7fffe09fe000
      7fffe0bfe000
      7fffe0dfe000
      
      Note the suspicious fe000 endings.  With the fix, I get a much more
      palatable 76 repeated addresses.
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
  16. 18 Dec 2014 (2 commits)