1. 17 Feb 2012, 4 commits
    • i387: do not preload FPU state at task switch time · b3b0870e
      Committed by Linus Torvalds
      Yes, taking the trap to re-load the FPU/MMX state is expensive, but so
      is spending several days looking for a bug in the state save/restore
      code.  And the preload code has some rather subtle interactions with
      both paravirtualization support and segment state restore, so it's not
      nearly as simple as it should be.
      
      Also, now that we no longer necessarily depend on a single bit (ie
      TS_USEDFPU) for keeping track of the state of the FPU, we might be able
      to do better.  If we are really switching between two processes that
      keep touching the FP state, save/restore is inevitable, but in the case
      of having one process that does most of the FPU usage, we may actually
      be able to do much better than the preloading.
      
      In particular, we may be able to keep track of which CPU the process ran
      on last, and also per CPU keep track of which process' FP state that CPU
      has.  For modern CPUs that don't destroy the FPU contents at save
      time, that would allow us to do a lazy restore by just re-enabling
      the existing FPU state - with no restore cost at all!
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
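
      A minimal sketch of the lazy-restore idea described above - not the
      patch itself, just the shape of the optimization; the fpu_last_cpu
      field and fpu_owner_task variable are hypothetical names:

          /* One owner per CPU: whose FP state do this CPU's registers hold? */
          static DEFINE_PER_CPU(struct task_struct *, fpu_owner_task);

          static bool fpu_lazy_restore(struct task_struct *next, unsigned int cpu)
          {
                  /*
                   * If 'next' last ran here and nobody else has used the FPU
                   * on this CPU since, its state is still live in the
                   * registers: just clear CR0.TS to re-enable it, with no
                   * restore cost at all.
                   */
                  return next->thread.fpu_last_cpu == cpu &&
                         this_cpu_read(fpu_owner_task) == next;
          }
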
    • i387: don't ever touch TS_USEDFPU directly, use helper functions · 6d59d7a9
      Committed by Linus Torvalds
      This creates three helper functions that do the TS_USEDFPU accesses, and
      makes everybody that used to do it by hand use those helpers instead.
      
      In addition, there are a couple of helper functions for the "change
      both CR0.TS and TS_USEDFPU at the same time" case, and the places
      that do that together have been changed to use those.  That means we
      have fewer random places that open-code this situation.
      
      The intent is partly to clarify the code without actually changing any
      semantics yet (since we clearly still have some hard-to-reproduce bug in
      this area), but also to make it much easier to use another approach
      entirely to caching the CR0.TS bit for software accesses.
      
      Right now we use a bit in the thread-info 'status' variable (this patch
      does not change that), but we might want to make it a full field of its
      own or even make it a per-cpu variable.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
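
      The shape of the helpers, sketched (the patch names them along these
      lines; treat the bodies as illustrative rather than the exact code):

          static inline bool __thread_has_fpu(struct task_struct *tsk)
          {
                  return task_thread_info(tsk)->status & TS_USEDFPU;
          }

          static inline void __thread_set_has_fpu(struct task_struct *tsk)
          {
                  task_thread_info(tsk)->status |= TS_USEDFPU;
          }

          static inline void __thread_clear_has_fpu(struct task_struct *tsk)
          {
                  task_thread_info(tsk)->status &= ~TS_USEDFPU;
          }

          /* The "change both CR0.TS and TS_USEDFPU together" helpers: */
          static inline void __thread_fpu_end(struct task_struct *tsk)
          {
                  __thread_clear_has_fpu(tsk);
                  stts();                         /* set CR0.TS */
          }

          static inline void __thread_fpu_begin(struct task_struct *tsk)
          {
                  clts();                         /* clear CR0.TS */
                  __thread_set_has_fpu(tsk);
          }
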
    • i387: move TS_USEDFPU clearing out of __save_init_fpu and into callers · b6c66418
      Committed by Linus Torvalds
      Touching TS_USEDFPU without touching CR0.TS is confusing, so don't do
      it.  By moving it into the callers, we always do the TS_USEDFPU next to
      the CR0.TS accesses in the source code, and it's much easier to see how
      the two go hand in hand.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
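
      An illustrative caller pattern after this change (a sketch, not the
      literal diff): the flag update now sits visibly next to the CR0.TS
      update:

          __save_init_fpu(tsk);           /* no longer clears TS_USEDFPU  */
          task_thread_info(tsk)->status &= ~TS_USEDFPU;   /* flag change  */
          stts();                         /* ...right beside CR0.TS change */
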
    • i387: fix x86-64 preemption-unsafe user stack save/restore · 15d8791c
      Committed by Linus Torvalds
      Commit 5b1cbac3 ("i387: make irq_fpu_usable() tests more robust")
      added a sanity check to the #NM handler to verify that we never cause
      the "Device Not Available" exception in kernel mode.
      
      However, that check actually pinpointed a (fundamental) race where we do
      cause that exception as part of the signal stack FPU state save/restore
      code.
      
      Because we use the floating point instructions themselves to save and
      restore state directly from user mode, we cannot do that atomically with
      testing the TS_USEDFPU bit: the user mode access itself may cause a page
      fault, which causes a task switch, which saves and restores the FP/MMX
      state from the kernel buffers.
      
      This kind of "recursive" FP state save is fine per se, but it means that
      when the signal stack save/restore gets restarted, it will now take the
      '#NM' exception we originally tried to avoid.  With preemption this can
      happen even without the page fault - but because of the user access, we
      cannot just disable preemption around the save/restore instruction.
      
      There are various ways to solve this, including using the
      "enable/disable_page_fault()" helpers to not allow page faults at
      all during the sequence, and falling back to copying things by hand
      without the use of the native FP state save/restore instructions.
      
      However, the simplest thing to do is to just allow the #NM from kernel
      space, but fix the race in setting and clearing CR0.TS that this all
      exposed: the TS bit changes and the TS_USEDFPU bit absolutely have to be
      atomic wrt scheduling, so while the actual state save/restore can be
      interrupted and restarted, the act of actually clearing/setting CR0.TS
      and the TS_USEDFPU bit together must not.
      
      Instead of just adding random "preempt_disable/enable()" calls to what
      is already excessively ugly code, this introduces some helper functions
      that mostly mirror the "kernel_fpu_begin/end()" functionality, just for
      the user state instead.
      
      Those helper functions should probably eventually replace the other
      ad-hoc CR0.TS and TS_USEDFPU tests too, but I'll need to think about it
      some more: the task switching functionality in particular needs to
      expose the difference between the 'prev' and 'next' threads, while the
      new helper functions intentionally were written to only work with
      'current'.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
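
      A sketch of the helper pair this introduces, mirroring
      kernel_fpu_begin/end() for the user-state case (details
      illustrative): the CR0.TS and TS_USEDFPU transitions are made atomic
      wrt scheduling by disabling preemption around just those updates,
      while the save/restore instructions themselves stay interruptible:

          static inline void user_fpu_begin(void)
          {
                  preempt_disable();
                  if (!__thread_has_fpu(current))
                          __thread_fpu_begin(current); /* clts() + set TS_USEDFPU */
                  preempt_enable();
          }

          static inline void user_fpu_end(void)
          {
                  preempt_disable();
                  __thread_fpu_end(current);   /* clear TS_USEDFPU + stts() */
                  preempt_enable();
          }
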
  2. 16 Feb 2012, 1 commit
    • i387: fix sense of sanity check · c38e2345
      Committed by Linus Torvalds
      The check for save_init_fpu() (introduced in commit 5b1cbac3: "i387:
      make irq_fpu_usable() tests more robust") was the wrong way around, but
      I hadn't noticed, because my "tests" were bogus: the FPU exceptions are
      disabled by default, so even doing a divide by zero never actually
      triggers this code at all unless you do extra work to enable them.
      
      So if anybody did enable them, they'd get one spurious warning.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
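
      For reference, the "extra work" needed to actually exercise this
      path from user space: x87/SSE exceptions are masked by default and
      must be unmasked explicitly, e.g. with glibc's feenableexcept()
      (userspace sketch):

          #define _GNU_SOURCE
          #include <fenv.h>

          int main(void)
          {
                  feenableexcept(FE_DIVBYZERO);   /* unmask divide-by-zero */
                  volatile double one = 1.0, zero = 0.0;
                  return (int)(one / zero);       /* now actually traps (SIGFPE) */
          }
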
  3. 14 Feb 2012, 2 commits
    • i387: make irq_fpu_usable() tests more robust · 5b1cbac3
      Committed by Linus Torvalds
      Some code - especially the crypto layer - wants to use the x86
      FP/MMX/AVX register set in what may be interrupt (typically softirq)
      context.
      
      That *can* be ok, but the tests for when it was ok were somewhat
      suspect.  We cannot touch the thread-specific status bits either, so
      we'd better check that we're not going to try to save FP state or
      anything like that.
      
      Now, it may be that the TS bit is always cleared *before* we set the
      USEDFPU bit (and only set when we had already cleared the USEDFPU bit
      before), so the TS bit test may actually have been sufficient, but it
      certainly was not obviously so.
      
      So this explicitly verifies that we will not touch the TS_USEDFPU
      bit, and adds a few related sanity-checks, because it seems that
      somehow AES-NI is corrupting user FP state.  The cause is not clear,
      and this patch doesn't fix it, but while debugging it I really
      wanted the code to
      be more obviously correct and robust.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
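
      The resulting test, approximately (a sketch of the hardened logic,
      with helper names in the style of the i387 code of that era):

          static inline bool interrupted_kernel_fpu_idle(void)
          {
                  /* The interrupted kernel code holds no live FPU state: */
                  return !__thread_has_fpu(current) &&
                          (read_cr0() & X86_CR0_TS);
          }

          static inline bool interrupted_user_mode(void)
          {
                  struct pt_regs *regs = get_irq_regs();
                  return regs && user_mode(regs);
          }

          static inline bool irq_fpu_usable(void)
          {
                  return !in_interrupt() ||
                          interrupted_user_mode() ||
                          interrupted_kernel_fpu_idle();
          }
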
    • i387: math_state_restore() isn't called from asm · be98c2cd
      Committed by Linus Torvalds
      It was marked asmlinkage for some really old and stale legacy reasons.
      Fix that and the equally stale comment.
      
      Noticed when debugging the irq_fpu_usable() bugs.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 06 Dec 2011, 1 commit
  5. 07 Apr 2011, 1 commit
    • x86-32, fpu: Fix FPU exception handling on non-SSE systems · f994d99c
      Committed by Hans Rosenfeld
      On 32-bit systems without SSE (that is, they use FSAVE/FRSTOR for FPU
      context switches), FPU exceptions in user mode cause Oopses, BUGs,
      recursive faults and other nasty things:
      
      fpu exception: 0000 [#1]
      last sysfs file: /sys/power/state
      Modules linked in: psmouse evdev pcspkr serio_raw [last unloaded: scsi_wait_scan]
      
      Pid: 1638, comm: fxsave-32-excep Not tainted 2.6.35-07798-g58a992b9-dirty #633 VP3-596B-DD/VT82C597
      EIP: 0060:[<c1003527>] EFLAGS: 00010202 CPU: 0
      EIP is at math_error+0x1b4/0x1c8
      EAX: 00000003 EBX: cf9be7e0 ECX: 00000000 EDX: cf9c5c00
      ESI: cf9d9fb4 EDI: c1372db3 EBP: 00000010 ESP: cf9d9f1c
      DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
      Process fxsave-32-excep (pid: 1638, ti=cf9d8000 task=cf9be7e0 task.ti=cf9d8000)
      Stack:
      00000000 00000301 00000004 00000000 00000000 cf9d3000 cf9da8f0 00000001
      <0> 00000004 cf9b6b60 c1019a6b c1019a79 00000020 00000242 000001b6 cf9c5380
      <0> cf806b40 cf791880 00000000 00000282 00000282 c108a213 00000020 cf9c5380
      Call Trace:
      [<c1019a6b>] ? need_resched+0x11/0x1a
      [<c1019a79>] ? should_resched+0x5/0x1f
      [<c108a213>] ? do_sys_open+0xbd/0xc7
      [<c108a213>] ? do_sys_open+0xbd/0xc7
      [<c100353b>] ? do_coprocessor_error+0x0/0x11
      [<c12d5965>] ? error_code+0x65/0x70
      Code: a8 20 74 30 c7 44 24 0c 06 00 03 00 8d 54 24 04 89 d9 b8 08 00 00 00 e8 9b 6d 02 00 eb 16 8b 93 5c 02 00 00 eb 05 e9 04 ff ff ff <9b> dd 32 9b e9 16 ff ff ff 81 c4 84 00 00 00 5b 5e 5f 5d c3 c6
      EIP: [<c1003527>] math_error+0x1b4/0x1c8 SS:ESP 0068:cf9d9f1c
      
      This usually continues in slight variations until the system is reset.
      
      This bug was introduced by commit 58a992b9:
      	x86-32, fpu: Rewrite fpu_save_init()
      Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
      Link: http://lkml.kernel.org/r/1302106003-366952-1-git-send-email-hans.rosenfeld@amd.com
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
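
      Background for the FSAVE path mentioned above (a sketch, close to
      the kernel's fpu_fsave of that era): unlike FXSAVE, FNSAVE also
      reinitializes the FPU as a side effect, so any pending exception
      state lives only in the saved image - which is why the exception
      handling assumptions differ on non-SSE parts:

          static inline void fpu_fsave(struct fpu *fpu)
          {
                  asm volatile("fnsave %[fx]; fwait"      /* save + reinit FPU */
                               : [fx] "=m" (fpu->state->fsave));
          }
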
  6. 23 Oct 2010, 1 commit
  7. 14 Oct 2010, 1 commit
  8. 10 Sep 2010, 8 commits
  9. 01 Aug 2010, 1 commit
  10. 22 Jul 2010, 1 commit
  11. 20 Jul 2010, 2 commits
    • x86, xsave: Use xsaveopt in context-switch path when supported · 6bad06b7
      Committed by Suresh Siddha
      xsaveopt is a more optimized form of xsave, specifically designed
      for context-switch usage.  xsaveopt doesn't save state that hasn't
      been modified since the prior xrstor.  And if a specific feature's
      state has been modified back to the init state, xsaveopt just
      updates the header bit in the xsave memory layout without writing
      out the corresponding state area.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <20100719230205.604014179@sbs-t61.sc.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
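
      The context-switch save path can then pick xsaveopt at boot via the
      alternatives mechanism, roughly like this (sketch; the .byte
      sequences are the xsave/xsaveopt opcodes):

          static inline void fpu_xsave(struct fpu *fpu)
          {
                  alternative_input(
                          ".byte " REX_PREFIX "0x0f,0xae,0x27",   /* xsave    */
                          ".byte " REX_PREFIX "0x0f,0xae,0x37",   /* xsaveopt */
                          X86_FEATURE_XSAVEOPT,
                          [fx] "D" (&fpu->state->xsave), "a" (-1), "d" (-1)
                          : "memory");
          }
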
    • x86, xsave: Sync xsave memory layout with its header for user handling · 29104e10
      Committed by Suresh Siddha
      With xsaveopt, if a processor implementation discerns that a state
      component is in its init state, it may clear the corresponding bit
      in xsave_hdr.xstate_bv without modifying the corresponding memory
      layout.  Hence, while presenting the xstate information to user
      space, we always ensure that the memory layout of a feature is in
      its init state if the corresponding header bit is zero.  This
      ensures consistency and avoids the user seeing stale state in the
      memory layout during signal handling, debugging, etc.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <20100719230205.351459480@sbs-t61.sc.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
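
      A sketch of the sync step described above (illustrative, not the
      exact kernel routine): any OS-enabled component whose header bit is
      clear gets its memory area rewritten to the init state before user
      space sees it:

          static void sanitize_xstate_for_user(struct xsave_struct *xs,
                                               u64 enabled_mask)
          {
                  u64 present = xs->xsave_hdr.xstate_bv;

                  if (!(present & XSTATE_FP)) {
                          /* x87 init state, sketched: default control word,
                           * everything else cleared. */
                          memset(&xs->i387, 0, 160);
                          xs->i387.cwd = 0x37f;
                  }
                  if (!(present & XSTATE_SSE))
                          memset(xs->i387.xmm_space, 0,
                                 sizeof(xs->i387.xmm_space));

                  /* Layout is now fully populated, so report the full mask: */
                  xs->xsave_hdr.xstate_bv = enabled_mask;
          }
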
  12. 07 Jul 2010, 1 commit
    • x86: Avoid unnecessary __clear_user() and xrstor in signal handling · 8e221b6d
      Committed by Suresh Siddha
      fxsave/xsave don't touch all the bytes in the memory layout used by
      these instructions - specifically, the SW-reserved fields (bytes
      464..511) in the fxsave frame and the reserved fields in the xsave
      header.
      
      To present a clean context for signal handling, just clear these
      fields instead of clearing the complete fxsave/xsave memory layout
      when we dump these registers directly to the user signal frame.
      
      Also avoid the call to second xrstor (which inits the state not passed
      in the signal frame) in restore_user_xstate() if all the state has already
      been restored by the first xrstor.
      
      These changes improve the performance of signal handling (by ~3-5%,
      as measured by lat_sig).
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <1277249017.2847.85.camel@sbs-t61.sc.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
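
      The gist of the two changes, sketched (field and variable names are
      illustrative; buf points at the user-space xsave frame):

          /* 1) Clear only what fxsave/xsave didn't write, instead of
           *    __clear_user() on the whole frame: */
          err  = __clear_user(&buf->i387.sw_reserved,
                              sizeof(buf->i387.sw_reserved));
          err |= __clear_user(&buf->xsave_hdr.reserved,
                              sizeof(buf->xsave_hdr.reserved));

          /* 2) Skip the second xrstor (which inits whatever the signal
           *    frame didn't contain) when the first xrstor already
           *    restored every OS-enabled state: */
          if (fx_sw_user.xstate_bv != pcntxt_mask)
                  xrstor_state(init_xstate_buf,
                               pcntxt_mask & ~fx_sw_user.xstate_bv);
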
  13. 12 May 2010, 1 commit
  14. 11 May 2010, 3 commits
    • x86, fpu: Use the proper asm constraint in use_xsave() · dce8bf4e
      Committed by H. Peter Anvin
      The proper constraint for receiving an 8-bit variable is "=qm", not
      "=g", which is equivalent to "=rim"; even though the "i" will never
      match, bugs can and do happen due to the difference between "q" and
      "r".
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <1273135546-29690-2-git-send-email-avi@redhat.com>
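
      A self-contained illustration of the "q" vs "r" difference on 32-bit
      x86 (a hypothetical example, not the patched kernel code): byte
      instructions like setz can only target registers that have an 8-bit
      subregister, which "q" guarantees and "r" does not:

          int is_zero(int x)
          {
                  _Bool b;

                  /* "=q": a/b/c/d only on i386, so the byte store always
                   * encodes; with "=r" the compiler may pick %esi or %edi
                   * and the build can fail or misbehave. */
                  asm("testl %1, %1; setz %0" : "=q" (b) : "r" (x));
                  return b;
          }
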
    • x86: Introduce 'struct fpu' and related API · 86603283
      Committed by Avi Kivity
      Currently all fpu state access is through tsk->thread.xstate.  Since we wish
      to generalize fpu access to non-task contexts, wrap the state in a new
      'struct fpu' and convert existing access to use an fpu API.
      
      Signal frame handlers are not converted to the API since they will
      remain task-context-only things.
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <1273135546-29690-3-git-send-email-avi@redhat.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
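
      The shape of the new wrapper and API, sketched (the real patch also
      converts all users; details here are illustrative):

          struct fpu {
                  union thread_xstate *state;
          };

          /* FPU-state operations now take the fpu object, not a task: */
          static inline void fpu_save_init(struct fpu *fpu);
          static inline int fpu_restore_checking(struct fpu *fpu);

          /* ...so non-task users can own a struct fpu of their own, while
           * task-based helpers become thin wrappers: */
          static inline int restore_fpu_checking(struct task_struct *tsk)
          {
                  return fpu_restore_checking(&tsk->thread.fpu);
          }
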
    • x86: Eliminate TS_XSAVE · c9ad4882
      Committed by Avi Kivity
      The fpu code currently uses current->thread_info->status & TS_XSAVE
      as a way to distinguish between XSAVE-capable processors and older
      processors.  The decision is not really task-specific; instead we
      use the task status to avoid a global memory reference - the value
      should be the same across all threads.
      
      Eliminate this tie-in into the task structure by using an alternative
      instruction keyed off the XSAVE cpu feature; this results in shorter and
      faster code, without introducing a global memory reference.
      
      [ hpa: in the future, this probably should use an asm jmp ]
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <1273135546-29690-2-git-send-email-avi@redhat.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
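
      The replacement test, approximately (shown with the corrected "=qm"
      constraint from the entry above): the branch is resolved at boot by
      patching the instruction, so there is no per-task bit and no global
      load:

          static __always_inline __pure bool use_xsave(void)
          {
                  bool has_xsave;

                  alternative_io("mov $0, %0",    /* default: no xsave      */
                                 "mov $1, %0",    /* patched in when XSAVE  */
                                 X86_FEATURE_XSAVE,
                                 "=qm" (has_xsave));
                  return has_xsave;
          }
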
  15. 12 Feb 2010, 1 commit
    • x86, ptrace: regset extensions to support xstate · 5b3efd50
      Committed by Suresh Siddha
      Add the xstate regset support which helps extend the kernel ptrace and the
      core-dump interfaces to support AVX state etc.
      
      This regset interface is designed to support all the future state
      that gets supported using the xsave/xrstor infrastructure.
      
      Looking at the memory layout saved by "xsave", one can't tell which
      state is represented in the layout: if a particular state component
      is in its init state, its bit in the xsave header may be '0'.  So
      from the xsave header alone we can't say whether a state is in its
      init state or simply was not saved in the memory layout.
      
      Hence the xsave memory layout available through this regset
      interface uses the SW-usable bytes [464..511] to convey what state
      is represented in the memory layout.

      The first 8 bytes of the SW-usable area [464..471] are set to the
      OS-enabled xstate mask (which is the same as the 64-bit mask
      returned by xgetbv for XCR0).
      
      The note NT_X86_XSTATE represents the extended state information in
      the core file, using the above-mentioned memory layout.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <20100211195614.802495327@sbs-t61.sc.intel.com>
      Signed-off-by: Hongjiu Lu <hjl.tools@gmail.com>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
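
      For reference, the xgetbv read that yields the OS-enabled mask
      stored in those bytes (the standard sequence; the .byte encoding is
      the xgetbv opcode, for assemblers that predate the mnemonic):

          static inline u64 xgetbv(u32 index)
          {
                  u32 eax, edx;

                  asm volatile(".byte 0x0f,0x01,0xd0"     /* xgetbv */
                               : "=a" (eax), "=d" (edx)
                               : "c" (index));
                  return eax + ((u64)edx << 32);
          }

          /* xgetbv(0) reads XCR0, the OS-enabled xstate feature mask. */
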
  16. 03 Nov 2009, 1 commit
  17. 02 Sep 2009, 1 commit
    • x86: Move kernel_fpu_using to irq_fpu_usable in asm/i387.h · ae4b688d
      Committed by Huang Ying
      This function tests whether the FPU/SSE state can be touched in
      interrupt context.  If the interrupted code is in user space or has
      no valid FPU/SSE context (CR0.TS == 1), FPU/SSE state can be used
      in IRQ or softirq context too.
      
      This is used by the AES-NI accelerated AES implementation and the
      PCLMULQDQ accelerated GHASH implementation.
      
      v3:
       - Renamed to irq_fpu_usable to reflect the purpose of the function.
      
      v2:
       - Renamed to irq_is_fpu_using to reflect the real situation.
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
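
      The test as introduced here, approximately (a sketch; compare the
      hardened 2012 version earlier in this log):

          static inline bool irq_fpu_usable(void)
          {
                  struct pt_regs *regs;

                  return !in_interrupt() ||
                         !(regs = get_irq_regs()) ||
                         user_mode(regs) ||          /* interrupted user space */
                         (read_cr0() & X86_CR0_TS);  /* no live FPU context */
          }
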
  18. 18 Jun 2009, 1 commit
    • x86: split out core __math_state_restore · e6e9cac8
      Committed by Jeremy Fitzhardinge
      Split the core fpu state restoration out into __math_state_restore, which
      assumes that cr0.TS is clear and that the fpu context has been initialized.
      
      This will be used during context switch.  There are two reasons this
      is desirable:

      - There's a small clarification.  When __switch_to() calls
        math_state_restore, it relies on the fact that tsk_used_math()
        returns true, and so will never do a blocking init_fpu().
        __math_state_restore() does not have (or need) that logic, so the
        question never arises.

      - It allows the clts() to be moved earlier in __switch_to() so it
        can be performed while cpu context updates are batched (will be
        done in a later patch).
      
      [ Impact: refactor code to make reuse cleaner; no functional change ]
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
  19. 09 Jun 2009, 1 commit
    • x86: Clear TS in irq_ts_save() when in an atomic section · 0b8c3d5a
      Committed by Chuck Ebbert
      The dynamic FPU context allocation changes caused the padlock driver
      to generate the below warning. Fix it by masking TS when doing padlock
      encryption operations in an atomic section.
      
      This solves:
      
      BUG: sleeping function called from invalid context at mm/slub.c:1602
      in_atomic(): 1, irqs_disabled(): 0, pid: 82, name: cryptomgr_test
      Pid: 82, comm: cryptomgr_test Not tainted 2.6.29.4-168.test7.fc11.x86_64 #1
      Call Trace:
      [<ffffffff8103ff16>] __might_sleep+0x10b/0x110
      [<ffffffff810cd3b2>] kmem_cache_alloc+0x37/0xf1
      [<ffffffff81018505>] init_fpu+0x49/0x8a
      [<ffffffff81012a83>] math_state_restore+0x3e/0xbc
      [<ffffffff813ac6d0>] do_device_not_available+0x9/0xb
      [<ffffffff810123ab>] device_not_available+0x1b/0x20
      [<ffffffffa001c066>] ? aes_crypt+0x66/0x74 [padlock_aes]
      [<ffffffff8119a51a>] ? blkcipher_walk_next+0x257/0x2e0
      [<ffffffff8119a731>] ? blkcipher_walk_first+0x18e/0x19d
      [<ffffffffa001c1fe>] aes_encrypt+0x9d/0xe5 [padlock_aes]
      [<ffffffffa0027253>] crypt+0x6b/0x114 [xts]
      [<ffffffffa001c161>] ? aes_encrypt+0x0/0xe5 [padlock_aes]
      [<ffffffffa001c161>] ? aes_encrypt+0x0/0xe5 [padlock_aes]
      [<ffffffffa0027390>] encrypt+0x49/0x4b [xts]
      [<ffffffff81199acc>] async_encrypt+0x3c/0x3e
      [<ffffffff8119dafc>] test_skcipher+0x1da/0x658
      [<ffffffff811979c3>] ? crypto_spawn_tfm+0x8e/0xb1
      [<ffffffff8119672d>] ? __crypto_alloc_tfm+0x11b/0x15f
      [<ffffffff811979c3>] ? crypto_spawn_tfm+0x8e/0xb1
      [<ffffffff81199dbe>] ? skcipher_geniv_init+0x2b/0x47
      [<ffffffff8119a905>] ? async_chainiv_init+0x5c/0x61
      [<ffffffff8119dfdd>] alg_test_skcipher+0x63/0x9b
      [<ffffffff8119e1bc>] alg_test+0x12d/0x175
      [<ffffffff8119c488>] cryptomgr_test+0x38/0x54
      [<ffffffff8119c450>] ? cryptomgr_test+0x0/0x54
      [<ffffffff8105c6c9>] kthread+0x4d/0x78
      [<ffffffff8101264a>] child_rip+0xa/0x20
      [<ffffffff81011f67>] ? restore_args+0x0/0x30
      [<ffffffff8105c67c>] ? kthread+0x0/0x78
      [<ffffffff81012640>] ? child_rip+0x0/0x20
      Signed-off-by: Chuck Ebbert <cebbert@redhat.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <20090609104050.50158cfe@dhcp-100-2-144.bos.redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
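
      The fix, approximately: widen irq_ts_save()'s test from interrupt
      context to any atomic context, since a spurious DNA fault there
      could end up sleeping in init_fpu() (sketch):

          static inline int irq_ts_save(void)
          {
                  /*
                   * If we are in process context and not atomic, we can
                   * take a spurious DNA fault safely, so don't touch
                   * CR0.TS at all.
                   */
                  if (!in_atomic())       /* was: if (!in_interrupt()) */
                          return 0;

                  if (read_cr0() & X86_CR0_TS) {
                          clts();
                          return 1;
                  }
                  return 0;
          }

          static inline void irq_ts_restore(int TS_state)
          {
                  if (TS_state)
                          stts();
          }
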
  20. 08 Apr 2009, 3 commits
  21. 05 Mar 2009, 1 commit
    • x86, math-emu: fix init_fpu for task != current · ab9e1858
      Committed by Daniel Glöckner
      Impact: fix math-emu related crash while using GDB/ptrace
      
      init_fpu() calls finit to initialize a task's xstate, while finit always
      works on the current task.  If we use PTRACE_GETFPREGS on another
      process and neither process has used floating point yet, we get a
      null pointer exception in finit.
      
      This patch creates a new function finit_task that takes a task_struct
      parameter. finit becomes a wrapper that simply calls finit_task with
      current. On the plus side this avoids many calls to get_current which
      would each resolve to an inline assembler mov instruction.
      
      An empty finit_task has been added to i387.h to avoid linker errors in
      case the compiler still emits the call in init_fpu when
      CONFIG_MATH_EMULATION is not defined.
      
      The declaration of finit in i387.h has been removed as the remaining
      code using this function gets its prototype from fpu_proto.h.
      Signed-off-by: Daniel Glöckner <dg@emlix.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Pallipadi Venkatesh" <venkatesh.pallipadi@intel.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Bill Metzenthen <billm@melbpc.org.au>
      LKML-Reference: <E1Lew31-0004il-Fg@mailer.emlix.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  22. 23 Oct 2008, 2 commits
  23. 13 Aug 2008, 1 commit
    • crypto: padlock - fix VIA PadLock instruction usage with irq_ts_save/restore() · e4914012
      Committed by Suresh Siddha
      Wolfgang Walter reported this oops on his VIA C3, using padlock for
      AES encryption:
      
      ##################################################################
      
      BUG: unable to handle kernel NULL pointer dereference at 000001f0
      IP: [<c01028c5>] __switch_to+0x30/0x117
      *pde = 00000000
      Oops: 0002 [#1] PREEMPT
      Modules linked in:
      
      Pid: 2071, comm: sleep Not tainted (2.6.26 #11)
      EIP: 0060:[<c01028c5>] EFLAGS: 00010002 CPU: 0
      EIP is at __switch_to+0x30/0x117
      EAX: 00000000 EBX: c0493300 ECX: dc48dd00 EDX: c0493300
      ESI: dc48dd00 EDI: c0493530 EBP: c04cff8c ESP: c04cff7c
       DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
      Process sleep (pid: 2071, ti=c04ce000 task=dc48dd00 task.ti=d2fe6000)
      Stack: dc48df30 c0493300 00000000 00000000 d2fe7f44 c03b5b43 c04cffc8 00000046
             c0131856 0000005a dc472d3c c0493300 c0493470 d983ae00 00002696 00000000
             c0239f54 00000000 c04c4000 c04cffd8 c01025fe c04f3740 00049800 c04cffe0
      Call Trace:
       [<c03b5b43>] ? schedule+0x285/0x2ff
       [<c0131856>] ? pm_qos_requirement+0x3c/0x53
       [<c0239f54>] ? acpi_processor_idle+0x0/0x434
       [<c01025fe>] ? cpu_idle+0x73/0x7f
       [<c03a4dcd>] ? rest_init+0x61/0x63
       =======================
      
      Wolfgang also found out that adding kernel_fpu_begin() and
      kernel_fpu_end() around the padlock instructions fixes the oops.
      
      Suresh wrote:
      
      These padlock instructions don't use/touch the SSE registers, but
      they behave similarly to other SSE instructions.  For example, they
      might cause DNA faults when cr0.ts is set.  While this is a spurious
      DNA trap, it might cause an oops with the recent fpu code changes.
      
      This is the code sequence that is probably causing this problem:
      
      a) new app is getting exec'd and it is somewhere in between
         start_thread() and flush_old_exec() in the load_xyz_binary()
      
      b) At point "a", the task's fpu state (like TS_USEDFPU, used_math()
         etc) is cleared.
      
      c) Now we get an interrupt/softirq which starts using these encrypt/decrypt
         routines in the network stack. This generates a math fault (as
         cr0.ts is '1') which sets TS_USEDFPU and restores the math that is
         in the task's xstate.
      
      d) Return to exec code path, which does start_thread() which does
         free_thread_xstate() and sets xstate pointer to NULL while
         the TS_USEDFPU is still set.
      
      e) At the next context switch from the newly exec'd task to another
         task, we have a scenario where TS_USEDFPU is set but the xstate
         pointer is null.  This can cause an oops during unlazy_fpu() in
         __switch_to().
      
      Now:
      
      1) This should happen with or without preemption.  Viro also
         encountered a similar problem without CONFIG_PREEMPT.
      
      2) kernel_fpu_begin() and kernel_fpu_end() will fix this problem,
         because kernel_fpu_begin() will manually do a clts() and won't
         run into the situation of setting TS_USEDFPU in step "c" above.
      
      3) This was working before the fpu changes, because it's a spurious
         math fault which doesn't corrupt any fpu/sse registers and the
         task's math state was always in an allocated state.
      
      Without the recent lazy fpu allocation changes, while we don't see
      the oops, a possible race is still present in older kernels (for
      example, while the kernel is using kernel_fpu_begin() in some
      optimized clear/copy page routine and an interrupt/softirq happens
      which uses these padlock instructions, generating a DNA fault).
      
      This is the failing scenario that existed even before the lazy fpu allocation
      changes:
      
      0. CPU's TS flag is set
      
      1. The kernel is using the FPU in some optimized copy routine, and
      while doing kernel_fpu_begin() takes an interrupt just before doing
      clts()
      
      2. Takes an interrupt and ipsec uses padlock instruction. And we
      take a DNA fault as TS flag is still set.
      
      3. We handle the DNA fault and set TS_USEDFPU and clear cr0.ts
      
      4. We complete the padlock routine
      
      5. Go back to step 1, which resumes the clts() in kernel_fpu_begin(),
      finishes the optimized copy routine and does kernel_fpu_end().  At
      this point, we have cr0.ts again set to '1' but the task's
      TS_USEDFPU is still set and not cleared.
      
      6. Now the kernel resumes its user operation.  And at the next
      context switch, the kernel sees it has to do an FP save as
      TS_USEDFPU is still set, and then does an unlazy_fpu() in
      __switch_to().  unlazy_fpu() will take a DNA fault, as cr0.ts is
      '1', and now, because we are in __switch_to(), math_state_restore()
      will get confused and will restore the next task's FP state and
      save it in the prev task's FP state.  Remember, in __switch_to() we
      are already on the stack of the next task but take a DNA fault for
      the prev task.
      
      This causes the fpu leakage.
      
      Fix the padlock instruction usage by calling them inside the
      context of the new routines irq_ts_save/restore(), which
      clear/restore cr0.ts manually in the interrupt context.  This will
      not generate a spurious DNA fault in the context of the interrupt,
      which fixes the oops encountered and the possible FPU leakage issue.
      Reported-and-bisected-by: Wolfgang Walter <wolfgang.walter@stwm.de>
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
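
      The resulting usage pattern in the padlock driver, sketched
      (function name illustrative; the .byte sequence is the REP
      XCRYPT-ECB PadLock opcode):

          static void padlock_aes_crypt(const u8 *in, u8 *out, void *key,
                                        void *control_word)
          {
                  int ts_state;

                  ts_state = irq_ts_save();       /* clear CR0.TS if atomic */

                  asm volatile(".byte 0xf3,0x0f,0xa7,0xc8"  /* rep xcryptecb */
                               : "+S" (in), "+D" (out)
                               : "d" (control_word), "b" (key), "c" (1));

                  irq_ts_restore(ts_state);       /* put CR0.TS back */
          }
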