1. 31 Jul, 2015 - 1 commit
  2. 21 Jul, 2015 - 3 commits
  3. 18 Jul, 2015 - 1 commit
    • x86/fpu, sched: Dynamically allocate 'struct fpu' · 0c8c0f03
      Committed by Dave Hansen
      The FPU rewrite removed the dynamic allocations of 'struct fpu'.
      But, this potentially wastes massive amounts of memory (2k per
      task on systems that do not have AVX-512 for instance).
      
      Instead of having a separate slab, this patch just appends the
      space that we need to the 'task_struct' which we dynamically
      allocate already.  This saves from doing an extra slab
      allocation at fork().
      
      The only real downside here is that we have to stick everything
      at the end of the task_struct.  But, I think the
      BUILD_BUG_ON()s I stuck in there should keep that from being too
      fragile.
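      
      ( A minimal userspace sketch of the idea, with hypothetical, simplified
        names - the real task_struct has many more fields and the save area
        size is probed at boot: )
      
      	#include <stddef.h>
      	#include <assert.h>
      
      	struct fpu_sketch {
      		unsigned int	last_cpu;
      		unsigned char	fpregs_state[512];	/* real size is CPU-dependent */
      	};
      
      	struct task_struct_sketch {
      		int		pid;			/* ...everything else comes first... */
      		struct fpu_sketch fpu;			/* must remain the last member */
      	};
      
      	/* Compile-time guard in the spirit of the BUILD_BUG_ON()s mentioned above. */
      	static_assert(offsetof(struct task_struct_sketch, fpu) +
      		      sizeof(struct fpu_sketch) == sizeof(struct task_struct_sketch),
      		      "fpu must stay at the end of task_struct");
      
      	int main(void) { return 0; }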
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1437128892-9831-2-git-send-email-mingo@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0c8c0f03
  4. 18 Jun, 2015 - 1 commit
  5. 09 Jun, 2015 - 1 commit
  6. 25 May, 2015 - 1 commit
    • x86/fpu: Fix FPU state save area alignment bug · b8c1b8ea
      Committed by Ingo Molnar
      On most configs task-struct is cache line aligned, which makes
      the XSAVE area's 64-byte required alignment work out fine.
      
      But on some .configs task_struct is aligned only to 16 bytes
      (enforced by ARCH_MIN_TASKALIGN), which makes things like
      fpu__copy() (that XSAVEOPT uses) not work so well.
      
      I broke this in:
      
        7366ed77 ("x86/fpu: Simplify FPU handling by embedding the fpstate in task_struct (again)")
      
      which embedded the fpstate in the task_struct.
      
      The alignment requirements of the FPU code were originally present
      in ARCH_MIN_TASKALIGN, which still has a value of 16, which was the
      alignment requirement of the FPU state area prior to XSAVE. But this
      link was not documented (and not required) and the link got lost
      when the FPU state area was made dynamic years ago.
      
      With XSAVEOPT the minimum alignment requirement went up to 64 bytes,
      and the embedding of the FPU state area in task_struct exposed it
      again - and '16' was not increased to '64'.
      
      So fix this bug, but also try to address the underlying lost link
      of information that made it easier to happen:
      
        - document ARCH_MIN_TASKALIGN a bit better
      
        - use alignof() to recover the current alignment requirements.
          This would work in the future as well, should the alignment
          requirements go up to 128 bytes with things like AVX512.
      
      ( We should probably also use the vSMP alignment rules for all
        of x86, but that's for another patch. )
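      
      ( For illustration, a sketch of the alignof()-based approach - names and
        sizes are simplified, not the actual kernel definitions: )
      
      	#include <stdio.h>
      	#include <stddef.h>
      
      	/* XSAVE*-style save area: the type itself carries the 64-byte requirement. */
      	union fpregs_state_sketch {
      		unsigned char xsave[1024];
      	} __attribute__((aligned(64)));
      
      	/* Derive the alignment from the type instead of hard-coding '16' or '64',
      	 * so a future 128-byte requirement would be picked up automatically. */
      	#define FPU_STATE_ALIGN  __alignof__(union fpregs_state_sketch)
      
      	int main(void)
      	{
      		printf("required FPU state alignment: %zu bytes\n",
      		       (size_t)FPU_STATE_ALIGN);
      		return 0;
      	}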
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b8c1b8ea
  7. 19 May, 2015 - 8 commits
    • x86/fpu: Remove the extra fpu__detect() layer · c66e3f28
      Committed by Ingo Molnar
      Now that fpu__detect() has become an empty layer around
      fpu__init_system(), eliminate it and make fpu__init_system()
      the main system initialization routine.
      Reviewed-by: Borislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c66e3f28
    • x86/fpu: Get rid of PF_USED_MATH usage, convert it to fpu->fpstate_active · c5bedc68
      Committed by Ingo Molnar
      Introduce a simple fpu->fpstate_active flag in the fpu context data structure
      and use that instead of PF_USED_MATH in task->flags.
      
      Testing for this flag byte should be slightly more efficient than
      testing a bit in a bitmask, but the main advantage is that most
      FPU functions can now be performed on a 'struct fpu' alone, they
      don't need access to 'struct task_struct' anymore.
      
      There's a slight linecount increase, mostly due to the 'fpu' local
      variables and due to extra comments. The local variables will go away
      once we move most of the FPU methods to pure 'struct fpu' parameters.
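      
      ( A minimal sketch of the new scheme, with hypothetical, trimmed-down
        names - the point is that the helper takes only a 'struct fpu *': )
      
      	struct fpu_sketch {
      		unsigned char	fpstate_active;	/* replaces the PF_USED_MATH task flag */
      		/* ... register save area, counters, etc. ... */
      	};
      
      	/* No 'struct task_struct' needed any more: */
      	static int fpu_sketch_active(const struct fpu_sketch *fpu)
      	{
      		return fpu->fpstate_active;
      	}
      
      	int main(void)
      	{
      		struct fpu_sketch fpu = { .fpstate_active = 1 };
      		return !fpu_sketch_active(&fpu);
      	}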
      Reviewed-by: Borislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c5bedc68
    • x86/fpu: Make task_xstate_cachep static · f55f88e2
      Committed by Ingo Molnar
      It's now local to fpu/core.c, make it static.
      Reviewed-by: Borislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f55f88e2
    • x86/fpu: Remove the free_thread_xstate() complication · 11ad1927
      Committed by Ingo Molnar
      Use fpstate_free() directly to manage FPU state.
      
      Only process.c was using this method, so this is a speedup as well,
      as it removes the extra function call and related clobbers.
      Reviewed-by: Borislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      11ad1927
    • x86/fpu: Move FPU data structures to asm/fpu_types.h · 14b9675a
      Committed by Ingo Molnar
      Move the FPU details to asm/fpu_types.h, to further factor out the
      FPU code.
      
      ( As an added bonus, the 'struct orig_ist' definition now moves
        next to its other data types - the FPU definitions were
        slapped in the middle of them for some mysterious reason. )
      
      No code changed.
      Reviewed-by: Borislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      14b9675a
    • x86/fpu: Improve the comment for the fpu::counter field · 12600999
      Committed by Ingo Molnar
      This was pretty hard to read, improve it.
      Reviewed-by: Borislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      12600999
    • x86/fpu: Move thread_info::fpu_counter into thread_info::fpu.counter · c0c2803d
      Committed by Ingo Molnar
      This field is kept separate from the main FPU state structure for
      no good reason.
      Reviewed-by: Borislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c0c2803d
    • x86/fpu: Rename fpu_detect() to fpu__detect() · 1a7dc0db
      Committed by Ingo Molnar
      Use the fpu__*() namespace to organize FPU ops better.
      
      Also document fpu__detect() a bit.
      Reviewed-by: Borislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      1a7dc0db
  8. 03 Apr, 2015 - 1 commit
  9. 17 Mar, 2015 - 7 commits
  10. 07 Mar, 2015 - 1 commit
  11. 06 Mar, 2015 - 3 commits
  12. 25 Feb, 2015 - 1 commit
    • x86: Add support for Intel Cache QoS Monitoring (CQM) detection · cbc82b17
      Committed by Peter P Waskiewicz Jr
      This patch adds support for the new Cache QoS Monitoring (CQM)
      feature found in future Intel Xeon processors.  It adds the
      new values for tracking CQM resources to the cpuinfo_x86 structure,
      plus the CPUID detection routines for CQM.
      
      CQM allows a process, or set of processes, to be tracked by the CPU
      to determine the cache usage of that task group.  Using this data
      from the CPU, software can be written to extract this data and
      report cache usage and occupancy for a particular process, or
      group of processes.
      
      More information about Cache QoS Monitoring can be found in the
      Intel (R) x86 Architecture Software Developer Manual, section 17.14.
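      
      ( A userspace sketch of the detection flow, assuming GCC's <cpuid.h>
        helpers; leaf and bit numbers follow the SDM section referenced above: )
      
      	#include <stdio.h>
      	#include <cpuid.h>
      
      	int main(void)
      	{
      		unsigned int eax, ebx, ecx, edx;
      
      		/* CPUID.(EAX=7,ECX=0):EBX[12] advertises Platform QoS Monitoring. */
      		if (!__get_cpuid_count(0x7, 0, &eax, &ebx, &ecx, &edx) ||
      		    !(ebx & (1u << 12))) {
      			puts("CQM not supported");
      			return 0;
      		}
      
      		/* Leaf 0xF, subleaf 0: maximum RMID across all resource types. */
      		__get_cpuid_count(0xF, 0, &eax, &ebx, &ecx, &edx);
      		printf("max RMID: %u\n", ebx);
      
      		/* Leaf 0xF, subleaf 1: L3 occupancy conversion factor (bytes). */
      		__get_cpuid_count(0xF, 1, &eax, &ebx, &ecx, &edx);
      		printf("occupancy scale: %u bytes per counter unit\n", ebx);
      		return 0;
      	}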
      Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Chris Webb <chris@arachsys.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Jacob Shin <jacob.w.shin@gmail.com>
      Cc: Jan Beulich <JBeulich@suse.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Steven Honeyman <stevenhoneyman@gmail.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
      Link: http://lkml.kernel.org/r/1422038748-21397-5-git-send-email-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      cbc82b17
  13. 23 Feb, 2015 - 1 commit
    • x86/asm: Cleanup prefetch primitives · a930dc45
      Committed by Borislav Petkov
      This is based on a patch originally by hpa.
      
      With the current improvements to the alternatives, we can simply use %P1
      as a mem8 operand constraint and rely on the toolchain to generate the
      proper instruction sizes. For example, on 32-bit, where we use an empty
      old instruction we get:
      
        apply_alternatives: feat: 6*32+8, old: (c104648b, len: 4), repl: (c195566c, len: 4)
        c104648b: alt_insn: 90 90 90 90
        c195566c: rpl_insn: 0f 0d 4b 5c
      
        ...
      
        apply_alternatives: feat: 6*32+8, old: (c18e09b4, len: 3), repl: (c1955948, len: 3)
        c18e09b4: alt_insn: 90 90 90
        c1955948: rpl_insn: 0f 0d 08
      
        ...
      
        apply_alternatives: feat: 6*32+8, old: (c1190cf9, len: 7), repl: (c1955a79, len: 7)
        c1190cf9: alt_insn: 90 90 90 90 90 90 90
        c1955a79: rpl_insn: 0f 0d 0d a0 d4 85 c1
      
      all with the proper padding done depending on the size of the
      replacement instruction the compiler generates.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      a930dc45
  14. 04 Feb, 2015 - 1 commit
  15. 18 Nov, 2014 - 3 commits
    • x86, mpx: On-demand kernel allocation of bounds tables · fe3d197f
      Committed by Dave Hansen
      This is really the meat of the MPX patch set.  If there is one patch to
      review in the entire series, this is the one.  There is a new ABI here
      and this kernel code also interacts with userspace memory in a
      relatively unusual manner.  (small FAQ below).
      
      Long Description:
      
      This patch adds two prctl() commands to enable or disable the
      management of bounds tables in the kernel, including on-demand kernel
      allocation (See the patch "on-demand kernel allocation of bounds tables")
      and cleanup (See the patch "cleanup unused bound tables"). Applications
      do not strictly need the kernel to manage bounds tables and we expect
      some applications to use MPX without taking advantage of this kernel
      support. This means the kernel can not simply infer whether an application
      needs bounds table management from the MPX registers.  The prctl() is an
      explicit signal from userspace.
      
      PR_MPX_ENABLE_MANAGEMENT is meant to be a signal from userspace to
      request the kernel's help in managing bounds tables.
      
      PR_MPX_DISABLE_MANAGEMENT is the opposite, meaning that userspace doesn't
      want the kernel's help any more. With PR_MPX_DISABLE_MANAGEMENT, the kernel
      won't allocate and free bounds tables even if the CPU supports MPX.
      
      PR_MPX_ENABLE_MANAGEMENT will fetch the base address of the bounds
      directory out of a userspace register (bndcfgu) and then cache it into
      a new field (->bd_addr) in the 'mm_struct'.  PR_MPX_DISABLE_MANAGEMENT
      will set "bd_addr" to an invalid address.  Using this scheme, we can
      use "bd_addr" to determine whether the management of bounds tables in
      the kernel is enabled.
      
      Also, the only way to access that bndcfgu register is via an xsaves,
      which can be expensive.  Caching "bd_addr" like this also helps reduce
      the cost of those xsaves when doing table cleanup at munmap() time.
      Unfortunately, we can not apply this optimization to #BR fault time
      because we need an xsave to get the value of BNDSTATUS.
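      
      ( A userspace sketch of the new ABI, assuming the PR_MPX_* constants from
        an MPX-era <linux/prctl.h>; the values are guarded in case the header
        lacks them: )
      
      	#include <stdio.h>
      	#include <sys/prctl.h>
      
      	#ifndef PR_MPX_ENABLE_MANAGEMENT
      	# define PR_MPX_ENABLE_MANAGEMENT	43
      	# define PR_MPX_DISABLE_MANAGEMENT	44
      	#endif
      
      	int main(void)
      	{
      		/* Ask the kernel to manage bounds tables for this process... */
      		if (prctl(PR_MPX_ENABLE_MANAGEMENT, 0, 0, 0, 0))
      			perror("PR_MPX_ENABLE_MANAGEMENT");
      
      		/* ...and later tell it to stop. */
      		if (prctl(PR_MPX_DISABLE_MANAGEMENT, 0, 0, 0, 0))
      			perror("PR_MPX_DISABLE_MANAGEMENT");
      		return 0;
      	}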
      
      ==== Why does the hardware even have these Bounds Tables? ====
      
      MPX only has 4 hardware registers for storing bounds information.
      If MPX-enabled code needs more than these 4 registers, it needs to
      spill them somewhere. It has two special instructions for this
      which allow the bounds to be moved between the bounds registers
      and some new "bounds tables".
      
      These table accesses can fault.  The resulting #BR exceptions are
      conceptually similar to a page fault and will be raised by the MPX
      hardware both during bounds violations and when the tables are not
      present. This patch handles those #BR exceptions for
      not-present tables by carving the space out of the normal process's
      address space (essentially calling the new mmap() interface introduced
      earlier in this patch set) and then pointing the bounds-directory
      over to it.
      
      The tables *need* to be accessed and controlled by userspace because
      the instructions for moving bounds in and out of them are extremely
      frequent. They potentially happen every time a register pointing to
      memory is dereferenced. Any direct kernel involvement (like a syscall)
      to access the tables would obviously destroy performance.
      
      ==== Why not do this in userspace? ====
      
      This patch is obviously doing this allocation in the kernel.
      However, MPX does not strictly *require* anything in the kernel.
      It can theoretically be done completely from userspace. Here are
      a few ways this *could* be done. I don't think any of them are
      practical in the real-world, but here they are.
      
      Q: Can virtual space simply be reserved for the bounds tables so
         that we never have to allocate them?
      A: As noted earlier, these tables are *HUGE*. An X-GB virtual
         area needs 4*X GB of virtual space, plus 2GB for the bounds
         directory. If we were to preallocate them for the 128TB of
         user virtual address space, we would need to reserve 512TB+2GB,
         which is larger than the entire virtual address space today.
         This means they can not be reserved ahead of time. Also, a
         single process's pre-populated bounds directory consumes 2GB
         of virtual *AND* physical memory. IOW, it's completely
         infeasible to prepopulate bounds directories.
      
      Q: Can we preallocate bounds table space at the same time memory
         is allocated which might contain pointers that might eventually
         need bounds tables?
      A: This would work if we could hook the site of each and every
         memory allocation syscall. This can be done for small,
         constrained applications. But, it isn't practical at a larger
         scale since a given app has no way of controlling how all the
         parts of the app might allocate memory (think libraries). The
         kernel is really the only place to intercept these calls.
      
      Q: Could a bounds fault be handed to userspace and the tables
         allocated there in a signal handler instead of in the kernel?
      A: (thanks to tglx) mmap() is not on the list of safe async
         handler functions and even if mmap() would work it still
         requires locking or nasty tricks to keep track of the
         allocation state there.
      
      Having ruled out all of the userspace-only approaches for managing
      bounds tables that we could think of, we create them on demand in
      the kernel.
      Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: linux-mm@kvack.org
      Cc: linux-mips@linux-mips.org
      Cc: Dave Hansen <dave@sr71.net>
      Link: http://lkml.kernel.org/r/20141114151829.AD4310DE@viggo.jf.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      fe3d197f
    • x86, mpx: Rename cfg_reg_u and status_reg · 62e7759b
      Committed by Dave Hansen
      According to the Intel SDM, the MPX configuration and status registers
      are named BNDCFGU and BNDSTATUS. This patch renames cfg_reg_u and
      status_reg to bndcfgu and bndstatus.
      
      [ tglx: Renamed 'struct bndscr_struct' to 'struct bndscr' ]
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: linux-mm@kvack.org
      Cc: linux-mips@linux-mips.org
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Qiaowei Ren <qiaowei.ren@intel.com>
      Link: http://lkml.kernel.org/r/20141114151817.031762AC@viggo.jf.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      62e7759b
    • x86: mpx: Give bndX registers actual names · c04e051c
      Committed by Dave Hansen
      Consider the bndX MPX registers.  There are 4 registers, each
      containing a 64-bit lower and a 64-bit upper bound.  That's 8*64
      bits and we declare it thusly:
      
      	struct bndregs_struct {
      		u64 bndregs[8];
      	}
          
      Let's say you want to read the upper bound from the MPX register
      bnd2 out of the xsave buf.  You do:
      
      	bndregno = 2;
      	upper_bound = xsave_buf->bndregs.bndregs[2*bndregno+1];
      
      That kinda sucks.  Every time you access it, you need to know:
      1. Each bndX register is two entries wide in "bndregs"
      2. The lower comes first followed by upper.  We do the +1 to get
         upper vs. lower.
      
      This replaces the old definition.  You can now access them
      indexed by the register number directly, and with a meaningful
      name for the lower and upper bound:
      
      	bndregno = 2;
      	xsave_buf->bndreg[bndregno].upper_bound;
      
      It's now *VERY* clear that there are 4 registers.  The programmer
      now doesn't have to care what order the lower and upper bounds
      are in, and it's harder to get it wrong.
      
      [ tglx: Changed ub/lb to upper_bound/lower_bound and renamed struct
      bndreg_struct to struct bndreg ]
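      
      ( With the tglx renames applied, the new layout is roughly as follows -
        types simplified for a standalone sketch: )
      
      	#include <stdint.h>
      	#include <assert.h>
      
      	struct bndreg {
      		uint64_t lower_bound;
      		uint64_t upper_bound;
      	};
      
      	struct bndregs_area {
      		struct bndreg bndreg[4];	/* bnd0..bnd3 */
      	};
      
      	static_assert(sizeof(struct bndregs_area) == 8 * sizeof(uint64_t),
      		      "4 registers x 2 bounds = 8 x 64 bits");
      
      	int main(void)
      	{
      		struct bndregs_area regs = { 0 };
      		int bndregno = 2;
      
      		/* The access pattern from the description above: */
      		return (int)regs.bndreg[bndregno].upper_bound;
      	}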
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: x86@kernel.org
      Cc: "H. Peter Anvin" <hpa@linux.intel.com>
      Cc: Qiaowei Ren <qiaowei.ren@intel.com>
      Cc: "Yu, Fenghua" <fenghua.yu@intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Link: http://lkml.kernel.org/r/20141031215820.5EA5E0EC@viggo.jf.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      c04e051c
  16. 10 Nov, 2014 - 1 commit
  17. 05 Nov, 2014 - 1 commit
  18. 31 Jul, 2014 - 1 commit
  19. 17 Jul, 2014 - 1 commit
    • arch, locking: Ciao arch_mutex_cpu_relax() · 3a6bfbc9
      Committed by Davidlohr Bueso
      The arch_mutex_cpu_relax() function, introduced by 34b133f8, is
      hacky and ugly. It was added a few years ago to address the fact
      that common cpu_relax() calls include yielding on s390, and thus
      impact the optimistic spinning functionality of mutexes. Nowadays
      we use this function well beyond mutexes: rwsem, qrwlock, mcs and
      lockref. Since the macro that defines the call is in the mutex header,
      any users must include mutex.h and the naming is misleading as well.
      
      This patch (i) renames the call to cpu_relax_lowlatency ("relax, but
      only if you can do it with very low latency") and (ii) defines it in
      each arch's asm/processor.h local header, just like for regular cpu_relax
      functions. On all archs, except s390, cpu_relax_lowlatency is simply cpu_relax,
      and thus we can take it out of mutex.h. While this can seem redundant,
      I believe it is a good choice as it allows us to move out arch specific
      logic from generic locking primitives and enables future(?) archs to
      transparently define it, similarly to System Z.
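      
      ( A condensed sketch of the resulting definitions; the real patch defines
        the macro per-arch in each asm/processor.h rather than via one #ifdef: )
      
      	#ifdef CONFIG_S390
      	/* s390: cpu_relax() may yield to the hypervisor, too heavy for spin loops. */
      	# define cpu_relax_lowlatency()	barrier()
      	#else
      	/* Everyone else: the regular relax (e.g. the PAUSE instruction on x86). */
      	# define cpu_relax_lowlatency()	cpu_relax()
      	#endif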
      Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Bharat Bhushan <r65777@freescale.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen Liqin <liqin.linux@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: David Howells <dhowells@redhat.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
      Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Joseph Myers <joseph@codesourcery.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Nicolas Pitre <nico@linaro.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Qais Yousef <qais.yousef@imgtec.com>
      Cc: Qiaowei Ren <qiaowei.ren@intel.com>
      Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Steven Miao <realmz6@gmail.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Stratos Karafotis <stratosk@semaphore.gr>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Kulikov <segoon@openwall.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>
      Cc: Waiman Long <Waiman.Long@hp.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Wolfram Sang <wsa@the-dreams.de>
      Cc: adi-buildroot-devel@lists.sourceforge.net
      Cc: linux390@de.ibm.com
      Cc: linux-alpha@vger.kernel.org
      Cc: linux-am33-list@redhat.com
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-c6x-dev@linux-c6x.org
      Cc: linux-cris-kernel@axis.com
      Cc: linux-hexagon@vger.kernel.org
      Cc: linux-ia64@vger.kernel.org
      Cc: linux@lists.openrisc.net
      Cc: linux-m32r-ja@ml.linux-m32r.org
      Cc: linux-m32r@ml.linux-m32r.org
      Cc: linux-m68k@lists.linux-m68k.org
      Cc: linux-metag@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Cc: linux-parisc@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: linux-s390@vger.kernel.org
      Cc: linux-sh@vger.kernel.org
      Cc: linux-xtensa@linux-xtensa.org
      Cc: sparclinux@vger.kernel.org
      Link: http://lkml.kernel.org/r/1404079773.2619.4.camel@buesod1.americas.hpqcorp.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      3a6bfbc9
  20. 30 May, 2014 - 1 commit
    • x86/xsaves: Change compacted format xsave area header · 0b29643a
      Committed by Fenghua Yu
      The XSAVE area header is changed to support both compacted format and
      standard format of xsave area.
      
      The XSAVE header of an xsave area comprises the 64 bytes starting at offset
      512 from the area base address:
      
      - Bytes 7:0 of the xsave header is a state-component bitmap called
        xstate_bv. It identifies the state components in the xsave area.
      
      - Bytes 15:8 of the xsave header is a state-component bitmap called
        xcomp_bv. It is used as follows:
        - xcomp_bv[63] indicates the format of the extended region of
          the xsave area. If it is clear, the standard format is used.
          If it is set, the compacted format is used.
        - xcomp_bv[62:0] indicate which features (starting at feature 2)
          have space allocated for them in the compacted format.
      
      - Bytes 63:16 of the xsave header are reserved.
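      
      ( A standalone sketch of that 64-byte header layout - field names assumed,
        not necessarily the kernel's: )
      
      	#include <stdint.h>
      	#include <assert.h>
      
      	struct xsave_hdr_sketch {
      		uint64_t xstate_bv;	/* bytes 7:0:   state-component bitmap        */
      		uint64_t xcomp_bv;	/* bytes 15:8:  bit 63 = compacted-format flag */
      		uint64_t reserved[6];	/* bytes 63:16: reserved                       */
      	};
      
      	static_assert(sizeof(struct xsave_hdr_sketch) == 64,
      		      "the XSAVE header is 64 bytes");
      
      	int main(void)
      	{
      		/* The header lives at offset 512 from the XSAVE area base. */
      		unsigned char xsave_area[1024] __attribute__((aligned(64))) = { 0 };
      		struct xsave_hdr_sketch *hdr =
      			(struct xsave_hdr_sketch *)(xsave_area + 512);
      
      		/* Compacted format is in use when bit 63 of xcomp_bv is set. */
      		return (int)((hdr->xcomp_bv >> 63) & 1);
      	}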
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      Link: http://lkml.kernel.org/r/1401387164-43416-6-git-send-email-fenghua.yu@intel.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      0b29643a
  21. 07 Mar, 2014 - 1 commit
    • x86: Keep thread_info on thread stack in x86_32 · 198d208d
      Committed by Steven Rostedt
      x86_64 uses a per_cpu variable kernel_stack to always point to
      the thread stack of current. This is where the thread_info is stored
      and is accessed from this location even when the irq or exception stack
      is in use. This removes the complexity of having to maintain the
      thread info on the stack when interrupts are running and having to
      copy the preempt_count and other fields to the interrupt stack.
      
      x86_32 uses the old method of copying the thread_info from the thread
      stack to the exception stack just before executing the exception.
      
      Having the two different methods requires #ifdefs, and the x86_32 way
      is also a bit of a pain to maintain. By converting x86_32 to the same
      method as x86_64, we can remove #ifdefs, clean up the x86_32 code
      a little, and remove the overhead of the copy.
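      
      ( Roughly what the shared scheme looks like - a simplified sketch of the
        x86_64-style lookup, not the literal kernel code: )
      
      	/* Per-CPU pointer kept in sync with the current task's thread stack. */
      	DECLARE_PER_CPU(unsigned long, kernel_stack);
      
      	static inline struct thread_info *current_thread_info(void)
      	{
      		/* thread_info sits at the bottom of the THREAD_SIZE-aligned stack. */
      		return (struct thread_info *)(this_cpu_read(kernel_stack) +
      					      KERNEL_STACK_OFFSET - THREAD_SIZE);
      	}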
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/20110806012354.263834829@goodmis.org
      Link: http://lkml.kernel.org/r/20140206144321.852942014@goodmis.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      198d208d