1. 18 6月, 2015 1 次提交
  2. 09 6月, 2015 11 次提交
    • D
      x86/mpx: Support 32-bit binaries on 64-bit kernels · 613fcb7d
      Dave Hansen 提交于
      Right now, the kernel can only switch between 64-bit and 32-bit
      binaries at compile time. This patch adds support for 32-bit
      binaries on 64-bit kernels when we support ia32 emulation.
      
      We essentially choose which set of table sizes to use when doing
      arithmetic for the bounds table calculations.
      
      This also uses a different approach for calculating the table
      indexes than before.  I think the new one makes it much more
      clear what is going on, and allows us to share more code between
      the 32-bit and 64-bit cases.
      Based-on-patch-by: NQiaowei Ren <qiaowei.ren@intel.com>
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20150607183705.E01F21E2@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      613fcb7d
    • D
      x86/mpx: Introduce new 'directory entry' to 'addr' helper function · 54587653
      Dave Hansen 提交于
      Currently, to get from a bounds directory entry to the virtual
      address of a bounds table, we simply mask off a few low bits.
      However, the set of bits we mask off is different for 32-bit and
      64-bit binaries.
      
      This breaks the operation out in to a helper function and also
      adds a temporary variable to store the result until we are
      sure we are returning one.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20150607183704.007686CE@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      54587653
    • D
      x86: Make is_64bit_mm() widely available · b0e9b09b
      Dave Hansen 提交于
      The uprobes code has a nice helper, is_64bit_mm(), that consults
      both the runtime and compile-time flags for 32-bit support.
      Instead of reinventing the wheel, pull it in to an x86 header so
      we can use it for MPX.
      
      I prefer passing the 'mm' around to test_thread_flag(TIF_IA32)
      because it makes it explicit where the context is coming from.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20150607183704.F0209999@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b0e9b09b
    • D
      x86/mpx: Trace allocation of new bounds tables · cd4996dc
      Dave Hansen 提交于
      Bounds tables are a significant consumer of memory.  It is
      important to know when they are being allocated.  Add a trace
      point to trace whenever an allocation occurs and also its
      virtual address.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20150607183704.EC23A93E@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      cd4996dc
    • D
      x86/mpx: Trace the attempts to find bounds tables · 2a1dcb1f
      Dave Hansen 提交于
      There are two different events being traced here.  They are
      doing similar things so share a trace "EVENT_CLASS" and are
      presented together.
      
      1. Trace when MPX is zapping pages "mpx_unmap_zap":
      
      	When MPX can not free an entire bounds table, it will
      	instead try to zap unused parts of a bounds table to free
      	the backing memory.  This decreases RSS (resident set
      	size) without decreasing the virtual space allocated
      	for bounds tables.
      
      2. Trace attempts to find bounds tables "mpx_unmap_search":
      
      	This event traces any time we go looking to unmap a
      	bounds table for a given virtual address range.  This is
      	useful to ensure that the kernel actually "tried" to free
      	a bounds table versus times it succeeded in finding one.
      
      	It might try and fail if it realized that a table was
      	shared with an adjacent VMA which is not being unmapped.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20150607183703.B9D2468B@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2a1dcb1f
    • D
      x86/mpx: Trace entry to bounds exception paths · 97efebf1
      Dave Hansen 提交于
      There are two basic things that can happen as the result of
      a bounds exception (#BR):
      
      	1. We allocate a new bounds table
      	2. We pass up a bounds exception to userspace.
      
      This patch adds a trace point for the case where we are
      passing the exception up to userspace with a signal.
      
      We are also explicit that we're printing out the inverse of
      the 'upper' that we encounter.  If you want to filter, for
      instance, you need to ~ the value first.  The reason we do
      this is because of how 'upper' is stored in the bounds table.
      
      If a pointer's range is:
      
      	0x1000 -> 0x2000
      
      it is stored in the bounds table as (32-bits here for brevity):
      
      	lower: 0x00001000
      	upper: 0xffffdfff
      
      That is so that an all 0's entry:
      
      	lower: 0x00000000
      	upper: 0x00000000
      
      corresponds to the "init" bounds which store a *range* of:
      
      	0x00000000 -> 0xffffffff
      
      That is, by far, the common case, and that lets us use the
      zero page, or deduplicate the memory, etc... The 'upper'
      stored in the table is gibberish to print by itself, so we
      print ~upper to get the *actual*, logical, human-readable
      value printed out.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20150607183703.027BB9B0@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      97efebf1
    • D
      x86/mpx: Trace #BR exceptions · e7126cf5
      Dave Hansen 提交于
      This is the first in a series of MPX tracing patches.
      I've found these extremely useful in the process of
      debugging applications and the kernel code itself.
      
      This exception hooks in to the bounds (#BR) exception
      very early and allows capturing the key registers which
      would influence how the exception is handled.
      
      Note that bndcfgu/bndstatus are technically still
      64-bit registers even in 32-bit mode.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20150607183703.5FE2619A@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e7126cf5
    • Q
      x86/mpx: Remove redundant MPX_BNDCFG_ADDR_MASK · 3c1d3230
      Qiaowei Ren 提交于
      MPX_BNDCFG_ADDR_MASK is defined two times, so this patch removes
      redundant one.
      Signed-off-by: NQiaowei Ren <qiaowei.ren@intel.com>
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20150607183702.5F129376@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      3c1d3230
    • D
      x86/mpx: Clean up the code by not passing a task pointer around when unnecessary · 46a6e0cf
      Dave Hansen 提交于
      The MPX code can only work on the current task.  You can not,
      for instance, enable MPX management in another process or
      thread. You can also not handle a fault for another process or
      thread.
      
      Despite this, we pass a task_struct around prolifically.  This
      patch removes all of the task struct passing for code paths
      where the code can not deal with another task (which turns out
      to be all of them).
      
      This has no functional changes.  It's just a cleanup.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: bp@alien8.de
      Link: http://lkml.kernel.org/r/20150607183702.6A81DA2C@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      46a6e0cf
    • D
      x86/mpx: Use the new get_xsave_field_ptr()API · a84eeaa9
      Dave Hansen 提交于
      The MPX registers (bndcsr/bndcfgu/bndstatus) are not directly
      accessible via normal instructions.  They essentially act as
      if they were floating point registers and are saved/restored
      along with those registers.
      
      There are two main paths in the MPX code where we care about
      the contents of these registers:
      
      	1. #BR (bounds) faults
      	2. the prctl() code where we are setting MPX up
      
      Both of those paths _might_ be called without the FPU having
      been used.  That means that 'tsk->thread.fpu.state' might
      never be allocated.
      
      Also, fpu_save_init() is not preempt-safe.  It was a bug to
      call it without disabling preemption.  The new
      get_xsave_addr() calls unlazy_fpu() instead and properly
      disables preemption.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Suresh Siddha <sbsiddha@gmail.com>
      Cc: bp@alien8.de
      Link: http://lkml.kernel.org/r/20150607183701.BC0D37CF@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a84eeaa9
    • D
      x86/fpu/xstate: Wrap get_xsave_addr() to make it safer · 04cd027b
      Dave Hansen 提交于
      The MPX code appears is calling a low-level FPU function
      (copy_fpregs_to_fpstate()).  This function is not able to
      be called in all contexts, although it is safe to call
      directly in some cases.
      
      Although probably correct, the current code is ugly and
      potentially error-prone.  So, add a wrapper that calls
      the (slightly) higher-level fpu__save() (which is preempt-
      safe) and also ensures that we even *have* an FPU context
      (in the case that this was called when in lazy FPU mode).
      
      Ingo had this to say about the details about when we need
      preemption disabled:
      
      > it's indeed generally unsafe to access/copy FPU registers with preemption enabled,
      > for two reasons:
      >
      >   - on older systems that use FSAVE the instruction destroys FPU register
      >     contents, which has to be handled carefully
      >
      >   - even on newer systems if we copy to FPU registers (which this code doesn't)
      >     then we don't want a context switch to occur in the middle of it, because a
      >     context switch will write to the fpstate, potentially overwriting our new data
      >     with old FPU state.
      >
      > But it's safe to access FPU registers with preemption enabled in a couple of
      > special cases:
      >
      >   - potentially destructively saving FPU registers: the signal handling code does
      >     this in copy_fpstate_to_sigframe(), because it can rely on the signal restore
      >     side to restore the original FPU state.
      >
      >   - reading FPU registers on modern systems: we don't do this anywhere at the
      >     moment, mostly to keep symmetry with older systems where FSAVE is
      >     destructive.
      >
      >   - initializing FPU registers on modern systems: fpu__clear() does this. Here
      >     it's safe because we don't copy from the fpstate.
      >
      >   - directly writing FPU registers from user-space memory (!). We do this in
      >     fpu__restore_sig(), and it's safe because neither context switches nor
      >     irq-handler FPU use can corrupt the source context of the copy (which is
      >     user-space memory).
      >
      > Note that the MPX code's current use of copy_fpregs_to_fpstate() was safe I think,
      > because:
      >
      >  - MPX is predicated on eagerfpu, so the destructive F[N]SAVE instruction won't be
      >    used.
      >
      >  - the code was only reading FPU registers, and was doing it only in places that
      >    guaranteed that an FPU state was already active (i.e. didn't do it in
      >    kthreads)
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Suresh Siddha <sbsiddha@gmail.com>
      Cc: bp@alien8.de
      Link: http://lkml.kernel.org/r/20150607183700.AA881696@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      04cd027b
  3. 07 6月, 2015 2 次提交
  4. 02 6月, 2015 1 次提交
    • A
      x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlers · 425be567
      Andy Lutomirski 提交于
      The early_idt_handlers asm code generates an array of entry
      points spaced nine bytes apart.  It's not really clear from that
      code or from the places that reference it what's going on, and
      the code only works in the first place because GAS never
      generates two-byte JMP instructions when jumping to global
      labels.
      
      Clean up the code to generate the correct array stride (member size)
      explicitly. This should be considerably more robust against
      screw-ups, as GAS will warn if a .fill directive has a negative
      count.  Using '. =' to advance would have been even more robust
      (it would generate an actual error if it tried to move
      backwards), but it would pad with nulls, confusing anyone who
      tries to disassemble the code.  The new scheme should be much
      clearer to future readers.
      
      While we're at it, improve the comments and rename the array and
      common code.
      
      Binutils may start relaxing jumps to non-weak labels.  If so,
      this change will fix our build, and we may need to backport this
      change.
      
      Before, on x86_64:
      
        0000000000000000 <early_idt_handlers>:
           0:   6a 00                   pushq  $0x0
           2:   6a 00                   pushq  $0x0
           4:   e9 00 00 00 00          jmpq   9 <early_idt_handlers+0x9>
                                5: R_X86_64_PC32        early_idt_handler-0x4
        ...
          48:   66 90                   xchg   %ax,%ax
          4a:   6a 08                   pushq  $0x8
          4c:   e9 00 00 00 00          jmpq   51 <early_idt_handlers+0x51>
                                4d: R_X86_64_PC32       early_idt_handler-0x4
        ...
         117:   6a 00                   pushq  $0x0
         119:   6a 1f                   pushq  $0x1f
         11b:   e9 00 00 00 00          jmpq   120 <early_idt_handler>
                                11c: R_X86_64_PC32      early_idt_handler-0x4
      
      After:
      
        0000000000000000 <early_idt_handler_array>:
           0:   6a 00                   pushq  $0x0
           2:   6a 00                   pushq  $0x0
           4:   e9 14 01 00 00          jmpq   11d <early_idt_handler_common>
        ...
          48:   6a 08                   pushq  $0x8
          4a:   e9 d1 00 00 00          jmpq   120 <early_idt_handler_common>
          4f:   cc                      int3
          50:   cc                      int3
        ...
         117:   6a 00                   pushq  $0x0
         119:   6a 1f                   pushq  $0x1f
         11b:   eb 03                   jmp    120 <early_idt_handler_common>
         11d:   cc                      int3
         11e:   cc                      int3
         11f:   cc                      int3
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
      Cc: Binutils <binutils@sourceware.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H.J. Lu <hjl.tools@gmail.com>
      Cc: Jan Beulich <JBeulich@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/ac027962af343b0c599cbfcf50b945ad2ef3d7a8.1432336324.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      425be567
  5. 29 5月, 2015 1 次提交
  6. 28 5月, 2015 1 次提交
  7. 27 5月, 2015 11 次提交
    • B
      x86: Remove cpu_sibling_mask() and cpu_core_mask() · 960d447b
      Bartosz Golaszewski 提交于
      These functions are arch-specific and duplicate the
      functionality of macros defined in linux/include/topology.h.
      
      Remove them as all the callers in x86 have now switched to using
      the topology_**_cpumask() family.
      Signed-off-by: NBartosz Golaszewski <bgolaszewski@baylibre.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Benoit Cousson <bcousson@baylibre.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Jean Delvare <jdelvare@suse.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Drokin <oleg.drokin@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Link: http://lkml.kernel.org/r/1432645896-12588-10-git-send-email-bgolaszewski@baylibre.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      960d447b
    • B
      sched/topology: Rename topology_thread_cpumask() to topology_sibling_cpumask() · 06931e62
      Bartosz Golaszewski 提交于
      Rename topology_thread_cpumask() to topology_sibling_cpumask()
      for more consistency with scheduler code.
      Signed-off-by: NBartosz Golaszewski <bgolaszewski@baylibre.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
      Cc: Benoit Cousson <bcousson@baylibre.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Jean Delvare <jdelvare@suse.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Drokin <oleg.drokin@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Link: http://lkml.kernel.org/r/1432645896-12588-2-git-send-email-bgolaszewski@baylibre.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      06931e62
    • I
      x86/fpu: Make WARN_ON_FPU() more robust in the !CONFIG_X86_DEBUG_FPU case · 83242c51
      Ingo Molnar 提交于
      Make sure the WARN_ON_FPU() macro consumes the macro argument,
      to avoid 'unused variable' build warnings if the only use of
      a variable is in debugging code.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      83242c51
    • I
      x86/fpu: Simplify copy_kernel_to_xregs_booting() · d65fcd60
      Ingo Molnar 提交于
      copy_kernel_to_xregs_booting() has a second parameter that is the mask
      of xfeatures that should be copied - but this parameter is always -1.
      
      Simplify the call site of this function, this also makes it more
      similar to the function call signature of other copy_kernel_to*regs()
      functions.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      d65fcd60
    • I
      x86/fpu: Standardize the parameter type of copy_kernel_to_fpregs() · 003e2e8b
      Ingo Molnar 提交于
      Bring the __copy_fpstate_to_fpregs() and copy_fpstate_to_fpregs() functions
      in line with the parameter passing convention of other kernel-to-FPU-registers
      copying functions: pass around an in-memory FPU register state pointer,
      instead of struct fpu *.
      
      NOTE: This patch also changes the assembly constraint of the FXSAVE-leak
            workaround from 'fpu->fpregs_active' to 'fpstate' - but that is fine,
            as we only need a valid memory address there for the FILDL instruction.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      003e2e8b
    • I
      x86/fpu: Remove error return values from copy_kernel_to_*regs() functions · 9ccc27a5
      Ingo Molnar 提交于
      None of the copy_kernel_to_*regs() FPU register copying functions are
      supposed to fail, and all of them have debugging checks that enforce
      this.
      
      Remove their return values and simplify their call sites, which have
      redundant error checks and error handling code paths.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      9ccc27a5
    • I
      x86/fpu: Rename copy_fpstate_to_fpregs() to copy_kernel_to_fpregs() · 3e1bf47e
      Ingo Molnar 提交于
      Bring the __copy_fpstate_to_fpregs() and copy_fpstate_to_fpregs() functions
      in line with the naming of other kernel-to-FPU-registers copying functions.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      3e1bf47e
    • I
      x86/fpu: Add debugging checks to all copy_kernel_to_*() functions · 43b287b3
      Ingo Molnar 提交于
      Copying from in-kernel FPU context buffers to FPU registers are
      never supposed to fault.
      
      Add debugging checks to copy_kernel_to_fxregs() and copy_kernel_to_fregs()
      to double check this assumption.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      43b287b3
    • I
      x86/fpu: Rename fpu__activate_fpstate() to fpu__activate_fpstate_write() · 6a81d7eb
      Ingo Molnar 提交于
      Remaining users of fpu__activate_fpstate() are all places that want to modify
      FPU registers, rename the function to fpu__activate_fpstate_write() according
      to this usage.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      6a81d7eb
    • I
      x86/fpu: Split out the fpu__activate_fpstate_read() method · 05602812
      Ingo Molnar 提交于
      Currently fpu__activate_fpstate() is used for two distinct purposes:
      
        - read access by ptrace and core dumping, where in the core dumping
          case the current task's FPU state may be examined as well.
      
        - write access by ptrace, which modifies FPU registers and expects
          the modified registers to be reloaded on the next context switch.
      
      Split out the reading side into fpu__activate_fpstate_read().
      
      ( Note that this is just a pure duplication of fpu__activate_fpstate()
        for the time being, we'll optimize the new function in the next patch. )
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      05602812
    • I
      x86/fpu: Fix FPU register read access to the current task · 47f01e8c
      Ingo Molnar 提交于
      Bobby Powers reported the following FPU warning during ELF coredumping:
      
         WARNING: CPU: 0 PID: 27452 at arch/x86/kernel/fpu/core.c:324 fpu__activate_stopped+0x8a/0xa0()
      
      This warning unearthed an invalid assumption about fpu__activate_stopped()
      that I added in:
      
        67e97fc2 ("x86/fpu: Rename init_fpu() to fpu__unlazy_stopped() and add debugging check")
      
      the old init_fpu() function had an (intentional but obscure) side effect:
      when FPU registers are accessed for the current task, for reading, then
      it synchronized live in-register FPU state with the fpstate by saving it.
      
      So fix this bug by saving the FPU if we are the current task. We'll
      still warn in fpu__save() if this is called for not yet stopped
      child tasks, so the debugging check is still preserved.
      
      Also rename the function to fpu__activate_fpstate(), because it's not
      exclusively used for stopped tasks, but for the current task as well.
      
      ( Note that this bug calls for a cleaner separation of access-for-read
        and access-for-modification FPU methods, but we'll do that in separate
        patches. )
      Reported-by: NBobby Powers <bobbypowers@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      47f01e8c
  8. 25 5月, 2015 6 次提交
    • I
      x86/fpu: Micro-optimize the copy_xregs_to_kernel*() and copy_kernel_to_xregs*() functions · 8c05f05e
      Ingo Molnar 提交于
      The copy_xregs_to_kernel*() and copy_kernel_to_xregs*() functions are used
      to copy FPU registers to kernel memory and vice versa.
      
      They are never expected to fail, yet they have a return code, mostly because
      that way they can share the assembly macros with the copy*user*() functions.
      
      This error code is then silently ignored by the context switching
      and other code - which made the bug in:
      
        b8c1b8ea ("x86/fpu: Fix FPU state save area alignment bug")
      
      harder to fix than necessary.
      
      So remove the return values and check for no faults when FPU debugging
      is enabled in the .config.
      
      This improves the eagerfpu context switching fast path by a couple of
      instructions, when FPU debugging is disabled:
      
         ffffffff810407fa:      89 c2                   mov    %eax,%edx
         ffffffff810407fc:      48 0f ae 2f             xrstor64 (%rdi)
         ffffffff81040800:      31 c0                   xor    %eax,%eax
        -ffffffff81040802:      eb 0a                   jmp    ffffffff8104080e <__switch_to+0x321>
        +ffffffff81040802:      eb 16                   jmp    ffffffff8104081a <__switch_to+0x32d>
         ffffffff81040804:      31 c0                   xor    %eax,%eax
         ffffffff81040806:      48 0f ae 8b c0 05 00    fxrstor64 0x5c0(%rbx)
         ffffffff8104080d:      00
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      8c05f05e
    • I
      x86/fpu: Improve the initialization logic of 'err' around xstate_fault() constraints · 685c9616
      Ingo Molnar 提交于
      There's a confusing aspect of how xstate_fault() constraints are
      handled by the FPU register/memory copying functions in
      fpu/internal.h: they use "0" (0) to signal that the asm code
      will not always set 'err' to a valid value.
      
      But 'err' is already initialized to 0 in C code, which is duplicated
      by the asm() constraint. Should the initialization value ever be
      changed, it might become subtly inconsistent with the not too clear
      asm() constraint.
      
      Use 'err' as the value of the input variable instead, to clarify
      this all.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      685c9616
    • I
      x86/fpu: Improve xstate_fault() handling · 87b6559d
      Ingo Molnar 提交于
      There are two problems with xstate_fault handling:
      
       - The xstate_fault() macro takes an argument, but that's
         propagated into the assembly named label as well. This
         is technically correct currently but might result in
         failures if anytime a more complex argument is used.
         So use a separate '_err' name instead for the label.
      
       - All the xstate_fault() using functions have an error
         variable named 'err', which is an output variable to
         the asm() they are using. The problem is, it's not always
         set by the asm(), in which case the compiler might
         optimize out its initialization, so that the C variable
         'err' might become corrupted after the asm() - confusing
         anyone who tries to take advantage of this variable
         after the asm(). Mark it an input variable as well.
      
         This is a latent bug currently, but an upcoming debug
         patch will make use of 'err'.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      87b6559d
    • I
      x86/fpu: Rename xstate related 'fx' references to 'xstate' · 87dafd41
      Ingo Molnar 提交于
      So the xstate code was probably first copied from the fxregs code,
      hence it carried over the 'fx' naming for the state pointer variable.
      
      But this is slightly confusing, as we usually on call the (legacy)
      MMX/SSE state 'fx', both in data structures and in the functions
      build around FXSAVE/FXRSTOR.
      
      So rename it to 'xstate' to make it more apparent what it is related to.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      87dafd41
    • I
      x86/fpu: Move the xstate copying functions into fpu/internal.h · fd169b05
      Ingo Molnar 提交于
      All the other register<-> memory copying functions are defined
      in fpu/internal.h, so move the xstate variants there too.
      
      Beyond being more consistent, this also allows FPU debugging
      checks to be added to them. (Because they can now use the
      macros defined in fpu/internal.h.)
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      fd169b05
    • I
      x86/fpu: Fix FPU state save area alignment bug · b8c1b8ea
      Ingo Molnar 提交于
      On most configs task-struct is cache line aligned, which makes
      the XSAVE area's 64-byte required alignment work out fine.
      
      But on some .config's task_struct is aligned only to 16 bytes
      (enforced by ARCH_MIN_TASKALIGN), which makes things like
      fpu__copy() (that XSAVEOPT uses) not work so well.
      
      I broke this in:
      
        7366ed77 ("x86/fpu: Simplify FPU handling by embedding the fpstate in task_struct (again)")
      
      which embedded the fpstate in the task_struct.
      
      The alignment requirements of the FPU code were originally present
      in ARCH_MIN_TASKALIGN, which still has a value of 16, which was the
      alignment requirement of the FPU state area prior XSAVE. But this
      link was not documented (and not required) and the link got lost
      when the FPU state area was made dynamic years ago.
      
      With XSAVEOPT the minimum alignment requirment went up to 64 bytes,
      and the embedding of the FPU state area in task_struct exposed it
      again - and '16' was not increased to '64'.
      
      So fix this bug, but also try to address the underlying lost link
      of information that made it easier to happen:
      
        - document ARCH_MIN_TASKALIGN a bit better
      
        - use alignof() to recover the current alignment requirements.
          This would work in the future as well, should the alignment
          requirements go up to 128 bytes with things like AVX512.
      
      ( We should probably also use the vSMP alignment rules for all
        of x86, but that's for another patch. )
      Reported-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      b8c1b8ea
  9. 20 5月, 2015 3 次提交
  10. 19 5月, 2015 3 次提交
    • I
      x86/fpu: Reorganize fpu/internal.h · b1b64dc3
      Ingo Molnar 提交于
      fpu/internal.h has grown organically, with not much high level structure,
      which hurts its readability.
      
      Organize the various definitions into 5 sections:
      
       - high level FPU state functions
       - FPU/CPU feature flag helpers
       - fpstate handling functions
       - FPU context switching helpers
       - misc helper functions
      
      Other related changes:
      
       - Move MXCSR_DEFAULT to fpu/types.h.
       - drop the unused X87_FSW_ES define
      
      No change in functionality.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      b1b64dc3
    • I
      x86/fpu: Add CONFIG_X86_DEBUG_FPU=y FPU debugging code · e97131a8
      Ingo Molnar 提交于
      There are various internal FPU state debugging checks that never
      trigger in practice, but which are useful for FPU code development.
      
      Separate these out into CONFIG_X86_DEBUG_FPU=y, and also add a
      couple of new ones.
      
      The size difference is about 0.5K of code on defconfig:
      
         text        data     bss          filename
         15028906    2578816  1638400      vmlinux
         15029430    2578816  1638400      vmlinux
      
      ( Keep this enabled by default until the new FPU code is debugged. )
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      e97131a8
    • I
      x86/fpu: Pass 'struct fpu' to fpu__restore() · e1884d69
      Ingo Molnar 提交于
      This cleans up the call sites and the function a bit,
      and also makes it more symmetric with the other high
      level FPU state handling functions.
      
      It's still only valid for the current task, as we copy
      to the FPU registers of the current CPU.
      
      No change in functionality.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      e1884d69