- 07 12月, 2013 1 次提交
-
-
由 Qiaowei Ren 提交于
Some features, like Intel MPX, work only if the kernel uses eagerfpu model. So we should force eagerfpu on unless the user has explicitly disabled it. Add definitions for Intel MPX and add it to the supported list. [ hpa: renamed XSTATE_FLEXIBLE to XSTATE_LAZY and added comments ] Signed-off-by: NQiaowei Ren <qiaowei.ren@intel.com> Link: http://lkml.kernel.org/r/9E0BE1322F2F2246BD820DA9FC397ADE014A6115@SHSMSX102.ccr.corp.intel.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 22 9月, 2012 1 次提交
-
-
由 H. Peter Anvin 提交于
When Supervisor Mode Access Prevention (SMAP) is enabled, access to userspace from the kernel is controlled by the AC flag. To make the performance of manipulating that flag acceptable, there are two new instructions, STAC and CLAC, to set and clear it. This patch adds those instructions, via alternative(), when the SMAP feature is enabled. It also adds X86_EFLAGS_AC unconditionally to the SYSCALL entry mask; there is simply no reason to make that one conditional. Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1348256595-29119-9-git-send-email-hpa@linux.intel.com
-
- 19 9月, 2012 3 次提交
-
-
由 Suresh Siddha 提交于
Fundamental model of the current Linux kernel is to lazily init and restore FPU instead of restoring the task state during context switch. This changes that fundamental lazy model to the non-lazy model for the processors supporting xsave feature. Reasons driving this model change are: i. Newer processors support optimized state save/restore using xsaveopt and xrstor by tracking the INIT state and MODIFIED state during context-switch. This is faster than modifying the cr0.TS bit which has serializing semantics. ii. Newer glibc versions use SSE for some of the optimized copy/clear routines. With certain workloads (like boot, kernel-compilation etc), application completes its work with in the first 5 task switches, thus taking upto 5 #DNA traps with the kernel not getting a chance to apply the above mentioned pre-load heuristic. iii. Some xstate features (like AMD's LWP feature) don't honor the cr0.TS bit and thus will not work correctly in the presence of lazy restore. Non-lazy state restore is needed for enabling such features. Some data on a two socket SNB system: * Saved 20K DNA exceptions during boot on a two socket SNB system. * Saved 50K DNA exceptions during kernel-compilation workload. * Improved throughput of the AVX based checksumming function inside the kernel by ~15% as xsave/xrstor is faster than the serializing clts/stts pair. Also now kernel_fpu_begin/end() relies on the patched alternative instructions. So move check_fpu() which uses the kernel_fpu_begin/end() after alternative_instructions(). Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1345842782-24175-7-git-send-email-suresh.b.siddha@intel.com Merge 32-bit boot fix from, Link: http://lkml.kernel.org/r/1347300665-6209-4-git-send-email-suresh.b.siddha@intel.com Cc: Jim Kukunas <james.t.kukunas@linux.intel.com> Cc: NeilBrown <neilb@suse.de> Cc: Avi Kivity <avi@redhat.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Suresh Siddha 提交于
Currently for x86 and x86_32 binaries, fpstate in the user sigframe is copied to/from the fpstate in the task struct. And in the case of signal delivery for x86_64 binaries, if the fpstate is live in the CPU registers, then the live state is copied directly to the user sigframe. Otherwise fpstate in the task struct is copied to the user sigframe. During restore, fpstate in the user sigframe is restored directly to the live CPU registers. Historically, different code paths led to different bugs. For example, x86_64 code path was not preemption safe till recently. Also there is lot of code duplication for support of new features like xsave etc. Unify signal handling code paths for x86 and x86_64 kernels. New strategy is as follows: Signal delivery: Both for 32/64-bit frames, align the core math frame area to 64bytes as needed by xsave (this where the main fpu/extended state gets copied to and excludes the legacy compatibility fsave header for the 32-bit [f]xsave frames). If the state is live, copy the register state directly to the user frame. If not live, copy the state in the thread struct to the user frame. And for 32-bit [f]xsave frames, construct the fsave header separately before the actual [f]xsave area. Signal return: As the 32-bit frames with [f]xstate has an additional 'fsave' header, copy everything back from the user sigframe to the fpstate in the task structure and reconstruct the fxstate from the 'fsave' header (Also user passed pointers may not be correctly aligned for any attempt to directly restore any partial state). At the next fpstate usage, everything will be restored to the live CPU registers. For all the 64-bit frames and the 32-bit fsave frame, restore the state from the user sigframe directly to the live CPU registers. 64-bit signals always restored the math frame directly, so we can expect the math frame pointer to be correctly aligned. For 32-bit fsave frames, there are no alignment requirements, so we can restore the state directly. "lat_sig catch" microbenchmark numbers (for x86, x86_64, x86_32 binaries) are with in the noise range with this change. Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1343171129-2747-4-git-send-email-suresh.b.siddha@intel.com [ Merged in compilation fix ] Link: http://lkml.kernel.org/r/1344544736.8326.17.camel@sbsiddha-desk.sc.intel.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Suresh Siddha 提交于
Consolidate x86, x86_64 inline asm routines saving/restoring fpu state using config_enabled(). Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1343171129-2747-3-git-send-email-suresh.b.siddha@intel.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 21 4月, 2012 1 次提交
-
-
由 H. Peter Anvin 提交于
Remove open-coded exception table entries in arch/x86/include/asm/xsave.h, and replace them with _ASM_EXTABLE() macros; this will allow us to change the format and type of the exception table entries. Signed-off-by: NH. Peter Anvin <hpa@zytor.com> Cc: David Daney <david.daney@cavium.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/CA%2B55aFyijf43qSu3N9nWHEBwaGbb7T2Oq9A=9EyR=Jtyqfq_cQ@mail.gmail.com
-
- 01 8月, 2010 2 次提交
-
-
由 Sheng Yang 提交于
This patch enable save/restore of xsave state. Signed-off-by: NSheng Yang <sheng@linux.intel.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Sheng Yang 提交于
Also add some constants. Signed-off-by: NSheng Yang <sheng@linux.intel.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
- 22 7月, 2010 3 次提交
-
-
由 Robert Richter 提交于
The pointer is only used in xsave.c. Making it static. Signed-off-by: NRobert Richter <robert.richter@amd.com> LKML-Reference: <1279731838-1522-5-git-send-email-robert.richter@amd.com> Acked-by: NSuresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Robert Richter 提交于
The patch introduces the XSTATE_CPUID macro and adds a check that tests if XSTATE_CPUID exists. Signed-off-by: NRobert Richter <robert.richter@amd.com> LKML-Reference: <1279731838-1522-4-git-send-email-robert.richter@amd.com> Acked-by: NSuresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Robert Richter 提交于
As xsave also supports other than fpu features, it should be initialized independently of the fpu. This patch moves this out of fpu initialization. There is also a lot of cross referencing between fpu and xsave code. This patch reduces this by making xsave_cntxt_init() and init_thread_xstate() static functions. The patch moves the cpu_has_xsave check at the beginning of xsave_init(). All other checks may removed then. Signed-off-by: NRobert Richter <robert.richter@amd.com> LKML-Reference: <1279731838-1522-2-git-send-email-robert.richter@amd.com> Acked-by: NSuresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 21 7月, 2010 1 次提交
-
-
由 Robert Richter 提交于
There are no dependencies to asm/i387.h. Instead, if including only xsave.h the following error occurs: .../arch/x86/include/asm/i387.h:110: error: ‘XSTATE_FP’ undeclared (first use in this function) .../arch/x86/include/asm/i387.h:110: error: (Each undeclared identifier is reported only once .../arch/x86/include/asm/i387.h:110: error: for each function it appears in.) This patch fixes this. Signed-off-by: NRobert Richter <robert.richter@amd.com> LKML-Reference: <1279651857-24639-2-git-send-email-robert.richter@amd.com> Acked-by: NSuresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 20 7月, 2010 2 次提交
-
-
由 Suresh Siddha 提交于
xsaveopt is a more optimized form of xsave specifically designed for the context switch usage. xsaveopt doesn't save the state that's not modified from the prior xrstor. And if a specific feature state gets modified to the init state, then xsaveopt just updates the header bit in the xsave memory layout without updating the corresponding memory layout. Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20100719230205.604014179@sbs-t61.sc.intel.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Suresh Siddha 提交于
With xsaveopt, if a processor implementation discern that a processor state component is in its initialized state it may modify the corresponding bit in the xsave_hdr.xstate_bv as '0', with out modifying the corresponding memory layout. Hence wHile presenting the xstate information to the user, we always ensure that the memory layout of a feature will be in the init state if the corresponding header bit is zero. This ensures the consistency and avoids the condition of the user seeing some some stale state in the memory layout during signal handling, debugging etc. Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20100719230205.351459480@sbs-t61.sc.intel.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 07 7月, 2010 1 次提交
-
-
由 Suresh Siddha 提交于
fxsave/xsave doesn't touch all the bytes in the memory layout used by these instructions. Specifically SW reserved (bytes 464..511) fields in the fxsave frame and the reserved fields in the xsave header. To present a clean context for the signal handling, just clear these fields instead of clearing the complete fxsave/xsave memory layout, when we dump these registers directly to the user signal frame. Also avoid the call to second xrstor (which inits the state not passed in the signal frame) in restore_user_xstate() if all the state has already been restored by the first xrstor. These changes improve the performance of signal handling(by ~3-5% as measured by the lat_sig). Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1277249017.2847.85.camel@sbs-t61.sc.intel.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 11 5月, 2010 1 次提交
-
-
由 Avi Kivity 提交于
Currently all fpu state access is through tsk->thread.xstate. Since we wish to generalize fpu access to non-task contexts, wrap the state in a new 'struct fpu' and convert existing access to use an fpu API. Signal frame handlers are not converted to the API since they will remain task context only things. Signed-off-by: NAvi Kivity <avi@redhat.com> Acked-by: NSuresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1273135546-29690-3-git-send-email-avi@redhat.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 12 2月, 2010 1 次提交
-
-
由 Suresh Siddha 提交于
Add the xstate regset support which helps extend the kernel ptrace and the core-dump interfaces to support AVX state etc. This regset interface is designed to support all the future state that gets supported using xsave/xrstor infrastructure. Looking at the memory layout saved by "xsave", one can't say which state is represented in the memory layout. This is because if a particular state is in init state, in the xsave hdr it can be represented by bit '0'. And hence we can't really say by the xsave header wether a state is in init state or the state is not saved in the memory layout. And hence the xsave memory layout available through this regset interface uses SW usable bytes [464..511] to convey what state is represented in the memory layout. First 8 bytes of the sw_usable_bytes[464..467] will be set to OS enabled xstate mask(which is same as the 64bit mask returned by the xgetbv's xCR0). The note NT_X86_XSTATE represents the extended state information in the core file, using the above mentioned memory layout. Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20100211195614.802495327@sbs-t61.sc.intel.com> Signed-off-by: NHongjiu Lu <hjl.tools@gmail.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 12 4月, 2009 1 次提交
-
-
由 Suresh Siddha 提交于
Impact: save/restore Intel-AVX state properly between tasks Intel Advanced Vector Extensions (AVX) introduce 256-bit vector processing capability. More about AVX at http://software.intel.com/sites/avx Add OS support for YMM state management using xsave/xrstor infrastructure to support AVX. Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1239402084.27006.8057.camel@localhost.localdomain> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 23 10月, 2008 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 31 7月, 2008 5 次提交
-
-
由 H. Peter Anvin 提交于
The XSAVE feature mask is a 64-bit number; keep it that way, in order to avoid the mistake done with rdmsr/wrmsr. Use the xsetbv() function provided in the previous patch. Signed-off-by: NH. Peter Anvin <hpa@zytor.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Suresh Siddha 提交于
On cpu's supporting xsave/xrstor, fpstate pointer in the sigcontext, will include the extended state information along with fpstate information. Presence of extended state information is indicated by the presence of FP_XSTATE_MAGIC1 at fpstate.sw_reserved.magic1 and FP_XSTATE_MAGIC2 at fpstate + (fpstate.sw_reserved.extended_size - FP_XSTATE_MAGIC2_SIZE). Extended feature bit mask that is saved in the memory layout is represented by the fpstate.sw_reserved.xstate_bv For RT signal frames, UC_FP_XSTATE in the uc_flags also indicate the presence of extended state information in the sigcontext's fpstate pointer. Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Suresh Siddha 提交于
Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Suresh Siddha 提交于
Uses xsave/xrstor (instead of traditional fxsave/fxrstor) in context switch when available. Introduces TS_XSAVE flag, which determine the need to use xsave/xrstor instructions during context switch instead of the legacy fxsave/fxrstor instructions. Thread-synchronous status word is already in L1 cache during this code patch and thus minimizes the performance penality compared to (cpu_has_xsave) checks. Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Suresh Siddha 提交于
Enables xsave/xrstor by turning on cr4.osxsave on cpu's which have the xsave support. For now, features that OS supports/enabled are FP and SSE. Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-