1. 07 12月, 2013 1 次提交
  2. 22 9月, 2012 1 次提交
  3. 19 9月, 2012 3 次提交
    • S
      x86, fpu: use non-lazy fpu restore for processors supporting xsave · 304bceda
      Suresh Siddha 提交于
      Fundamental model of the current Linux kernel is to lazily init and
      restore FPU instead of restoring the task state during context switch.
      This changes that fundamental lazy model to the non-lazy model for
      the processors supporting xsave feature.
      
      Reasons driving this model change are:
      
      i. Newer processors support optimized state save/restore using xsaveopt and
      xrstor by tracking the INIT state and MODIFIED state during context-switch.
      This is faster than modifying the cr0.TS bit which has serializing semantics.
      
      ii. Newer glibc versions use SSE for some of the optimized copy/clear routines.
      With certain workloads (like boot, kernel-compilation etc), application
      completes its work with in the first 5 task switches, thus taking upto 5 #DNA
      traps with the kernel not getting a chance to apply the above mentioned
      pre-load heuristic.
      
      iii. Some xstate features (like AMD's LWP feature) don't honor the cr0.TS bit
      and thus will not work correctly in the presence of lazy restore. Non-lazy
      state restore is needed for enabling such features.
      
      Some data on a two socket SNB system:
       * Saved 20K DNA exceptions during boot on a two socket SNB system.
       * Saved 50K DNA exceptions during kernel-compilation workload.
       * Improved throughput of the AVX based checksumming function inside the
         kernel by ~15% as xsave/xrstor is faster than the serializing clts/stts
         pair.
      
      Also now kernel_fpu_begin/end() relies on the patched
      alternative instructions. So move check_fpu() which uses the
      kernel_fpu_begin/end() after alternative_instructions().
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Link: http://lkml.kernel.org/r/1345842782-24175-7-git-send-email-suresh.b.siddha@intel.com
      Merge 32-bit boot fix from,
      Link: http://lkml.kernel.org/r/1347300665-6209-4-git-send-email-suresh.b.siddha@intel.com
      Cc: Jim Kukunas <james.t.kukunas@linux.intel.com>
      Cc: NeilBrown <neilb@suse.de>
      Cc: Avi Kivity <avi@redhat.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      304bceda
    • S
      x86, fpu: Unify signal handling code paths for x86 and x86_64 kernels · 72a671ce
      Suresh Siddha 提交于
      Currently for x86 and x86_32 binaries, fpstate in the user sigframe is copied
      to/from the fpstate in the task struct.
      
      And in the case of signal delivery for x86_64 binaries, if the fpstate is live
      in the CPU registers, then the live state is copied directly to the user
      sigframe. Otherwise  fpstate in the task struct is copied to the user sigframe.
      During restore, fpstate in the user sigframe is restored directly to the live
      CPU registers.
      
      Historically, different code paths led to different bugs. For example,
      x86_64 code path was not preemption safe till recently. Also there is lot
      of code duplication for support of new features like xsave etc.
      
      Unify signal handling code paths for x86 and x86_64 kernels.
      
      New strategy is as follows:
      
      Signal delivery: Both for 32/64-bit frames, align the core math frame area to
      64bytes as needed by xsave (this where the main fpu/extended state gets copied
      to and excludes the legacy compatibility fsave header for the 32-bit [f]xsave
      frames). If the state is live, copy the register state directly to the user
      frame. If not live, copy the state in the thread struct to the user frame. And
      for 32-bit [f]xsave frames, construct the fsave header separately before
      the actual [f]xsave area.
      
      Signal return: As the 32-bit frames with [f]xstate has an additional
      'fsave' header, copy everything back from the user sigframe to the
      fpstate in the task structure and reconstruct the fxstate from the 'fsave'
      header (Also user passed pointers may not be correctly aligned for
      any attempt to directly restore any partial state). At the next fpstate usage,
      everything will be restored to the live CPU registers.
      For all the 64-bit frames and the 32-bit fsave frame, restore the state from
      the user sigframe directly to the live CPU registers. 64-bit signals always
      restored the math frame directly, so we can expect the math frame pointer
      to be correctly aligned. For 32-bit fsave frames, there are no alignment
      requirements, so we can restore the state directly.
      
      "lat_sig catch" microbenchmark numbers (for x86, x86_64, x86_32 binaries) are
      with in the noise range with this change.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Link: http://lkml.kernel.org/r/1343171129-2747-4-git-send-email-suresh.b.siddha@intel.com
      [ Merged in compilation fix ]
      Link: http://lkml.kernel.org/r/1344544736.8326.17.camel@sbsiddha-desk.sc.intel.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      72a671ce
    • S
      x86, fpu: Consolidate inline asm routines for saving/restoring fpu state · 0ca5bd0d
      Suresh Siddha 提交于
      Consolidate x86, x86_64 inline asm routines saving/restoring fpu state
      using config_enabled().
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Link: http://lkml.kernel.org/r/1343171129-2747-3-git-send-email-suresh.b.siddha@intel.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      0ca5bd0d
  4. 21 4月, 2012 1 次提交
  5. 01 8月, 2010 2 次提交
  6. 22 7月, 2010 3 次提交
  7. 21 7月, 2010 1 次提交
  8. 20 7月, 2010 2 次提交
    • S
      x86, xsave: Use xsaveopt in context-switch path when supported · 6bad06b7
      Suresh Siddha 提交于
      xsaveopt is a more optimized form of xsave specifically designed
      for the context switch usage. xsaveopt doesn't save the state that's not
      modified from the prior xrstor. And if a specific feature state gets
      modified to the init state, then xsaveopt just updates the header bit
      in the xsave memory layout without updating the corresponding memory
      layout.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <20100719230205.604014179@sbs-t61.sc.intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      6bad06b7
    • S
      x86, xsave: Sync xsave memory layout with its header for user handling · 29104e10
      Suresh Siddha 提交于
      With xsaveopt, if a processor implementation discern that a processor state
      component is in its initialized state it may modify the corresponding bit in
      the xsave_hdr.xstate_bv as '0', with out modifying the corresponding memory
      layout. Hence wHile presenting the xstate information to the user, we always
      ensure that the memory layout of a feature will be in the init state if the
      corresponding header bit is zero. This ensures the consistency and avoids the
      condition of the user seeing some some stale state in the memory layout during
      signal handling, debugging etc.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <20100719230205.351459480@sbs-t61.sc.intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      29104e10
  9. 07 7月, 2010 1 次提交
    • S
      x86: Avoid unnecessary __clear_user() and xrstor in signal handling · 8e221b6d
      Suresh Siddha 提交于
      fxsave/xsave doesn't touch all the bytes in the memory layout used by
      these instructions. Specifically SW reserved (bytes 464..511) fields
      in the fxsave frame and the reserved fields in the xsave header.
      
      To present a clean context for the signal handling, just clear these fields
      instead of clearing the complete fxsave/xsave memory layout, when we dump these
      registers directly to the user signal frame.
      
      Also avoid the call to second xrstor (which inits the state not passed
      in the signal frame) in restore_user_xstate() if all the state has already
      been restored by the first xrstor.
      
      These changes improve the performance of signal handling(by ~3-5% as measured
      by the lat_sig).
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <1277249017.2847.85.camel@sbs-t61.sc.intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      8e221b6d
  10. 11 5月, 2010 1 次提交
  11. 12 2月, 2010 1 次提交
    • S
      x86, ptrace: regset extensions to support xstate · 5b3efd50
      Suresh Siddha 提交于
      Add the xstate regset support which helps extend the kernel ptrace and the
      core-dump interfaces to support AVX state etc.
      
      This regset interface is designed to support all the future state that gets
      supported using xsave/xrstor infrastructure.
      
      Looking at the memory layout saved by "xsave", one can't say which state
      is represented in the memory layout. This is because if a particular state is
      in init state, in the xsave hdr it can be represented by bit '0'. And hence
      we can't really say by the xsave header wether a state is in init state or
      the state is not saved in the memory layout.
      
      And hence the xsave memory layout available through this regset
      interface uses SW usable bytes [464..511] to convey what state is represented
      in the memory layout.
      
      First 8 bytes of the sw_usable_bytes[464..467] will be set to OS enabled xstate
      mask(which is same as the 64bit mask returned by the xgetbv's xCR0).
      
      The note NT_X86_XSTATE represents the extended state information in the
      core file, using the above mentioned memory layout.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <20100211195614.802495327@sbs-t61.sc.intel.com>
      Signed-off-by: NHongjiu Lu <hjl.tools@gmail.com>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      5b3efd50
  12. 12 4月, 2009 1 次提交
  13. 23 10月, 2008 1 次提交
  14. 31 7月, 2008 5 次提交