1. 09 6月, 2015 1 次提交
  2. 08 6月, 2015 3 次提交
    • I
      x86/asm/entry/32: Clean up entry_32.S · a49976d1
      Ingo Molnar 提交于
      Make the 32-bit syscall entry code a bit more readable:
      
       - use consistent assembly coding style similar to entry_64.S
      
       - remove old comments that are not true anymore
      
       - eliminate whitespace noise
      
       - use consistent vertical spacing
      
       - fix various comments
      
      No code changed:
      
        # arch/x86/entry/entry_32.o:
      
         text    data     bss     dec     hex filename
         6025       0       0    6025    1789 entry_32.o.before
         6025       0       0    6025    1789 entry_32.o.after
      
      md5:
         f3fa16b2b0dca804f052deb6b30ba6cb  entry_32.o.before.asm
         f3fa16b2b0dca804f052deb6b30ba6cb  entry_32.o.after.asm
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      a49976d1
    • I
      x86/asm/entry: Untangle 'system_call' into two entry points: entry_SYSCALL_64 and entry_INT80_32 · b2502b41
      Ingo Molnar 提交于
      The 'system_call' entry points differ starkly between native 32-bit and 64-bit
      kernels: on 32-bit kernels it defines the INT 0x80 entry point, while on
      64-bit it's the SYSCALL entry point.
      
      This is pretty confusing when looking at generic code, and it also obscures
      the nature of the entry point at the assembly level.
      
      So unangle this by splitting the name into its two uses:
      
      	system_call (32) -> entry_INT80_32
      	system_call (64) -> entry_SYSCALL_64
      
      As per the generic naming scheme for x86 system call entry points:
      
      	entry_MNEMONIC_qualifier
      
      where 'qualifier' is one of _32, _64 or _compat.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      b2502b41
    • I
      x86/asm/entry: Untangle 'ia32_sysenter_target' into two entry points:... · 4c8cd0c5
      Ingo Molnar 提交于
      x86/asm/entry: Untangle 'ia32_sysenter_target' into two entry points: entry_SYSENTER_32 and entry_SYSENTER_compat
      
      So the SYSENTER instruction is pretty quirky and it has different behavior
      depending on bitness and CPU maker.
      
      Yet we create a false sense of coherency by naming it 'ia32_sysenter_target'
      in both of the cases.
      
      Split the name into its two uses:
      
      	ia32_sysenter_target (32)    -> entry_SYSENTER_32
      	ia32_sysenter_target (64)    -> entry_SYSENTER_compat
      
      As per the generic naming scheme for x86 system call entry points:
      
      	entry_MNEMONIC_qualifier
      
      where 'qualifier' is one of _32, _64 or _compat.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      4c8cd0c5
  3. 04 6月, 2015 1 次提交
    • I
      x86/asm/entry: Move entry_64.S and entry_32.S to arch/x86/entry/ · 905a36a2
      Ingo Molnar 提交于
      Create a new directory hierarchy for the low level x86 entry code:
      
          arch/x86/entry/*
      
      This will host all the low level glue that is currently scattered
      all across arch/x86/.
      
      Start with entry_64.S and entry_32.S.
      
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      905a36a2
  4. 02 6月, 2015 1 次提交
    • I
      x86/debug: Remove perpetually broken, unmaintainable dwarf annotations · 131484c8
      Ingo Molnar 提交于
      So the dwarf2 annotations in low level assembly code have
      become an increasing hindrance: unreadable, messy macros
      mixed into some of the most security sensitive code paths
      of the Linux kernel.
      
      These debug info annotations don't even buy the upstream
      kernel anything: dwarf driven stack unwinding has caused
      problems in the past so it's out of tree, and the upstream
      kernel only uses the much more robust framepointers based
      stack unwinding method.
      
      In addition to that there's a steady, slow bitrot going
      on with these annotations, requiring frequent fixups.
      There's no tooling and no functionality upstream that
      keeps it correct.
      
      So burn down the sick forest, allowing new, healthier growth:
      
         27 files changed, 350 insertions(+), 1101 deletions(-)
      
      Someone who has the willingness and time to do this
      properly can attempt to reintroduce dwarf debuginfo in x86
      assembly code plus dwarf unwinding from first principles,
      with the following conditions:
      
       - it should be maximally readable, and maximally low-key to
         'ordinary' code reading and maintenance.
      
       - find a build time method to insert dwarf annotations
         automatically in the most common cases, for pop/push
         instructions that manipulate the stack pointer. This could
         be done for example via a preprocessing step that just
         looks for common patterns - plus special annotations for
         the few cases where we want to depart from the default.
         We have hundreds of CFI annotations, so automating most of
         that makes sense.
      
       - it should come with build tooling checks that ensure that
         CFI annotations are sensible. We've seen such efforts from
         the framepointer side, and there's no reason it couldn't be
         done on the dwarf side.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jan Beulich <JBeulich@suse.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      131484c8
  5. 11 4月, 2015 1 次提交
  6. 08 4月, 2015 1 次提交
    • D
      x86/asm/entry/irq: Simplify interrupt dispatch table (IDT) layout · 3304c9c3
      Denys Vlasenko 提交于
      Interrupt entry points are handled with the following code,
      each 32-byte code block contains seven entry points:
      
      		...
      		[push][jump 22] // 4 bytes
      		[push][jump 18] // 4 bytes
      		[push][jump 14] // 4 bytes
      		[push][jump 10] // 4 bytes
      		[push][jump  6] // 4 bytes
      		[push][jump  2] // 4 bytes
      		[push][jump common_interrupt][padding] // 8 bytes
      
      		[push][jump]
      		[push][jump]
      		[push][jump]
      		[push][jump]
      		[push][jump]
      		[push][jump]
      		[push][jump common_interrupt][padding]
      
      		[padding_2]
      	common_interrupt:
      
      And there is a table which holds pointers to every entry point,
      IOW: to every push.
      
      In cold cache, two jumps are still costlier than one, even
      though we get the benefit of them residing in the same
      cacheline.
      
      This change replaces short jumps with near ones to
      'common_interrupt', and pads every push+jump pair to 8 bytes. This
      way, each interrupt takes only one jump.
      
      This change replaces ".p2align CONFIG_X86_L1_CACHE_SHIFT" before
      dispatch table with ".align 8" - we do not need anything
      stronger than that.
      
      The table of entry addresses (the interrupt[] array) is no
      longer necessary, the address of entries can be easily
      calculated as (irq_entries_start + i*8).
      
         text	   data	    bss	    dec	    hex	filename
        12546	      0	      0	  12546	   3102	entry_64.o.before
        11626	      0	      0	  11626	   2d6a	entry_64.o
      
      The size decrease is because 1656 bytes of .init.rodata are
      gone. That's initdata, though. The resident size does go up a
      bit.
      
      Run-tested (32 and 64 bits).
      Acked-and-Tested-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1428090553-7283-1-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      3304c9c3
  7. 03 4月, 2015 1 次提交
  8. 17 3月, 2015 1 次提交
  9. 05 3月, 2015 1 次提交
  10. 24 2月, 2015 1 次提交
    • D
      x86/xen: allow privcmd hypercalls to be preempted · fdfd811d
      David Vrabel 提交于
      Hypercalls submitted by user space tools via the privcmd driver can
      take a long time (potentially many 10s of seconds) if the hypercall
      has many sub-operations.
      
      A fully preemptible kernel may deschedule such as task in any upcall
      called from a hypercall continuation.
      
      However, in a kernel with voluntary or no preemption, hypercall
      continuations in Xen allow event handlers to be run but the task
      issuing the hypercall will not be descheduled until the hypercall is
      complete and the ioctl returns to user space.  These long running
      tasks may also trigger the kernel's soft lockup detection.
      
      Add xen_preemptible_hcall_begin() and xen_preemptible_hcall_end() to
      bracket hypercalls that may be preempted.  Use these in the privcmd
      driver.
      
      When returning from an upcall, call xen_maybe_preempt_hcall() which
      adds a schedule point if if the current task was within a preemptible
      hypercall.
      
      Since _cond_resched() can move the task to a different CPU, clear and
      set xen_in_preemptible_hcall around the call.
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      fdfd811d
  11. 23 2月, 2015 2 次提交
    • B
      x86/entry_32: Convert X86_INVD_BUG to ALTERNATIVE macro · 8e65f6e0
      Borislav Petkov 提交于
      Booting a 486 kernel on an AMD guest with this patch applied, says:
      
        apply_alternatives: feat: 0*32+25, old: (c160a475, len: 5), repl: (c19557d4, len: 5)
        c160a475: alt_insn: 68 10 35 00 c1
        c19557d4: rpl_insn: 68 80 39 00 c1
      
      which is:
      
        old insn VA: 0xc160a475, CPU feat: X86_FEATURE_XMM, size: 5
        simd_coprocessor_error:
                 c160a475:      68 10 35 00 c1          push $0xc1003510 <do_general_protection>
        repl insn: 0xc19557d4, size: 5
                 c160a475:      68 80 39 00 c1          push $0xc1003980 <do_simd_coprocessor_error>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      8e65f6e0
    • B
      x86/alternatives: Add instruction padding · 4332195c
      Borislav Petkov 提交于
      Up until now we have always paid attention to make sure the length of
      the new instruction replacing the old one is at least less or equal to
      the length of the old instruction. If the new instruction is longer, at
      the time it replaces the old instruction it will overwrite the beginning
      of the next instruction in the kernel image and cause your pants to
      catch fire.
      
      So instead of having to pay attention, teach the alternatives framework
      to pad shorter old instructions with NOPs at buildtime - but only in the
      case when
      
        len(old instruction(s)) < len(new instruction(s))
      
      and add nothing in the >= case. (In that case we do add_nops() when
      patching).
      
      This way the alternatives user shouldn't have to care about instruction
      sizes and simply use the macros.
      
      Add asm ALTERNATIVE* flavor macros too, while at it.
      
      Also, we need to save the pad length in a separate struct alt_instr
      member for NOP optimization and the way to do that reliably is to carry
      the pad length instead of trying to detect whether we're looking at
      single-byte NOPs or at pathological instruction offsets like e9 90 90 90
      90, for example, which is a valid instruction.
      
      Thanks to Michael Matz for the great help with toolchain questions.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      4332195c
  12. 16 12月, 2014 1 次提交
    • J
      x86: Avoid building unused IRQ entry stubs · 2414e021
      Jan Beulich 提交于
      When X86_LOCAL_APIC (i.e. unconditionally on x86-64),
      first_system_vector will never end up being higher than
      LOCAL_TIMER_VECTOR (0xef), and hence building stubs for vectors
      0xef...0xff is pointlessly reducing code density. Deal with this at
      build time already.
      
      Taking into consideration that X86_64 implies X86_LOCAL_APIC, also
      simplify (and hence make easier to read and more consistent with the
      change done here) some #if-s in arch/x86/kernel/irqinit.c.
      
      While we could further improve the packing of the IRQ entry stubs (the
      four ones now left in the last set could be fit into the four padding
      bytes each of the final four sets have) this doesn't seem to provide
      any real benefit: Both irq_entries_start and common_interrupt getting
      cache line aligned, eliminating the 30th set would just produce 32
      bytes of padding between the 29th and common_interrupt.
      
      [ tglx: Folded lguest fix from Dan Carpenter ]
      Signed-off-by: NJan Beulich <jbeulich@suse.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: lguest@lists.ozlabs.org
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Link: http://lkml.kernel.org/r/54574D5F0200007800044389@mail.emea.novell.com
      Link: http://lkml.kernel.org/r/20141115185718.GB6530@mwandaSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      2414e021
  13. 12 12月, 2014 1 次提交
  14. 25 10月, 2014 1 次提交
  15. 24 9月, 2014 1 次提交
    • R
      audit: x86: drop arch from __audit_syscall_entry() interface · b4f0d375
      Richard Guy Briggs 提交于
      Since the arch is found locally in __audit_syscall_entry(), there is no need to
      pass it in as a parameter.  Delete it from the parameter list.
      
      x86* was the only arch to call __audit_syscall_entry() directly and did so from
      assembly code.
      Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-audit@redhat.com
      Signed-off-by: NEric Paris <eparis@redhat.com>
      
      ---
      
      As this patch relies on changes in the audit tree, I think it
      appropriate to send it through my tree rather than the x86 tree.
      b4f0d375
  16. 16 8月, 2014 1 次提交
  17. 22 7月, 2014 1 次提交
    • S
      x86_32, entry: Store badsys error code in %eax · 8142b215
      Sven Wegener 提交于
      Commit 554086d8 ("x86_32, entry: Do syscall exit work on badsys
      (CVE-2014-4508)") introduced a regression in the x86_32 syscall entry
      code, resulting in syscall() not returning proper errors for undefined
      syscalls on CPUs supporting the sysenter feature.
      
      The following code:
      
      > int result = syscall(666);
      > printf("result=%d errno=%d error=%s\n", result, errno, strerror(errno));
      
      results in:
      
      > result=666 errno=0 error=Success
      
      Obviously, the syscall return value is the called syscall number, but it
      should have been an ENOSYS error. When run under ptrace it behaves
      correctly, which makes it hard to debug in the wild:
      
      > result=-1 errno=38 error=Function not implemented
      
      The %eax register is the return value register. For debugging via ptrace
      the syscall entry code stores the complete register context on the
      stack. The badsys handlers only store the ENOSYS error code in the
      ptrace register set and do not set %eax like a regular syscall handler
      would. The old resume_userspace call chain contains code that clobbers
      %eax and it restores %eax from the ptrace registers afterwards. The same
      goes for the ptrace-enabled call chain. When ptrace is not used, the
      syscall return value is the passed-in syscall number from the untouched
      %eax register.
      
      Use %eax as the return value register in syscall_badsys and
      sysenter_badsys, like a real syscall handler does, and have the caller
      push the value onto the stack for ptrace access.
      Signed-off-by: NSven Wegener <sven.wegener@stealer.net>
      Link: http://lkml.kernel.org/r/alpine.LNX.2.11.1407221022380.31021@titan.int.lan.stealer.netReviewed-and-tested-by: NAndy Lutomirski <luto@amacapital.net>
      Cc: <stable@vger.kernel.org> # If 554086d8 is backported
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      8142b215
  18. 19 7月, 2014 1 次提交
  19. 24 6月, 2014 1 次提交
  20. 05 5月, 2014 1 次提交
  21. 01 5月, 2014 1 次提交
  22. 24 4月, 2014 1 次提交
  23. 10 1月, 2014 1 次提交
  24. 09 11月, 2013 1 次提交
  25. 25 9月, 2013 1 次提交
  26. 05 9月, 2013 1 次提交
  27. 21 6月, 2013 1 次提交
    • S
      x86, trace: Add irq vector tracepoints · cf910e83
      Seiji Aguchi 提交于
      [Purpose of this patch]
      
      As Vaibhav explained in the thread below, tracepoints for irq vectors
      are useful.
      
      http://www.spinics.net/lists/mm-commits/msg85707.html
      
      <snip>
      The current interrupt traces from irq_handler_entry and irq_handler_exit
      provide when an interrupt is handled.  They provide good data about when
      the system has switched to kernel space and how it affects the currently
      running processes.
      
      There are some IRQ vectors which trigger the system into kernel space,
      which are not handled in generic IRQ handlers.  Tracing such events gives
      us the information about IRQ interaction with other system events.
      
      The trace also tells where the system is spending its time.  We want to
      know which cores are handling interrupts and how they are affecting other
      processes in the system.  Also, the trace provides information about when
      the cores are idle and which interrupts are changing that state.
      <snip>
      
      On the other hand, my usecase is tracing just local timer event and
      getting a value of instruction pointer.
      
      I suggested to add an argument local timer event to get instruction pointer before.
      But there is another way to get it with external module like systemtap.
      So, I don't need to add any argument to irq vector tracepoints now.
      
      [Patch Description]
      
      Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events.
      But there is an above use case to trace specific irq_vector rather than tracing all events.
      In this case, we are concerned about overhead due to unwanted events.
      
      So, add following tracepoints instead of introducing irq_vector_entry/exit.
      so that we can enable them independently.
         - local_timer_vector
         - reschedule_vector
         - call_function_vector
         - call_function_single_vector
         - irq_work_entry_vector
         - error_apic_vector
         - thermal_apic_vector
         - threshold_apic_vector
         - spurious_apic_vector
         - x86_platform_ipi_vector
      
      Also, introduce a logic switching IDT at enabling/disabling time so that a time penalty
      makes a zero when tracepoints are disabled. Detailed explanations are as follows.
       - Create trace irq handlers with entering_irq()/exiting_irq().
       - Create a new IDT, trace_idt_table, at boot time by adding a logic to
         _set_gate(). It is just a copy of original idt table.
       - Register the new handlers for tracpoints to the new IDT by introducing
         macros to alloc_intr_gate() called at registering time of irq_vector handlers.
       - Add checking, whether irq vector tracing is on/off, into load_current_idt().
         This has to be done below debug checking for these reasons.
         - Switching to debug IDT may be kicked while tracing is enabled.
         - On the other hands, switching to trace IDT is kicked only when debugging
           is disabled.
      
      In addition, the new IDT is created only when CONFIG_TRACING is enabled to avoid being
      used for other purposes.
      Signed-off-by: NSeiji Aguchi <seiji.aguchi@hds.com>
      Link: http://lkml.kernel.org/r/51C323ED.5050708@hds.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      cf910e83
  28. 13 2月, 2013 1 次提交
  29. 04 2月, 2013 3 次提交
  30. 17 1月, 2013 1 次提交
    • A
      xen: Fix stack corruption in xen_failsafe_callback for 32bit PVOPS guests. · 9174adbe
      Andrew Cooper 提交于
      This fixes CVE-2013-0190 / XSA-40
      
      There has been an error on the xen_failsafe_callback path for failed
      iret, which causes the stack pointer to be wrong when entering the
      iret_exc error path.  This can result in the kernel crashing.
      
      In the classic kernel case, the relevant code looked a little like:
      
              popl %eax      # Error code from hypervisor
              jz 5f
              addl $16,%esp
              jmp iret_exc   # Hypervisor said iret fault
      5:      addl $16,%esp
                             # Hypervisor said segment selector fault
      
      Here, there are two identical addls on either option of a branch which
      appears to have been optimised by hoisting it above the jz, and
      converting it to an lea, which leaves the flags register unaffected.
      
      In the PVOPS case, the code looks like:
      
              popl_cfi %eax         # Error from the hypervisor
              lea 16(%esp),%esp     # Add $16 before choosing fault path
              CFI_ADJUST_CFA_OFFSET -16
              jz 5f
              addl $16,%esp         # Incorrectly adjust %esp again
              jmp iret_exc
      
      It is possible unprivileged userspace applications to cause this
      behaviour, for example by loading an LDT code selector, then changing
      the code selector to be not-present.  At this point, there is a race
      condition where it is possible for the hypervisor to return back to
      userspace from an interrupt, fault on its own iret, and inject a
      failsafe_callback into the kernel.
      
      This bug has been present since the introduction of Xen PVOPS support
      in commit 5ead97c8 (xen: Core Xen implementation), in 2.6.23.
      Signed-off-by: NFrediano Ziglio <frediano.ziglio@citrix.com>
      Signed-off-by: NAndrew Cooper <andrew.cooper3@citrix.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      9174adbe
  31. 20 12月, 2012 1 次提交
  32. 29 11月, 2012 1 次提交
  33. 20 10月, 2012 1 次提交
    • D
      xen/x86: don't corrupt %eip when returning from a signal handler · a349e23d
      David Vrabel 提交于
      In 32 bit guests, if a userspace process has %eax == -ERESTARTSYS
      (-512) or -ERESTARTNOINTR (-513) when it is interrupted by an event
      /and/ the process has a pending signal then %eip (and %eax) are
      corrupted when returning to the main process after handling the
      signal.  The application may then crash with SIGSEGV or a SIGILL or it
      may have subtly incorrect behaviour (depending on what instruction it
      returned to).
      
      The occurs because handle_signal() is incorrectly thinking that there
      is a system call that needs to restarted so it adjusts %eip and %eax
      to re-execute the system call instruction (even though user space had
      not done a system call).
      
      If %eax == -514 (-ERESTARTNOHAND (-514) or -ERESTART_RESTARTBLOCK
      (-516) then handle_signal() only corrupted %eax (by setting it to
      -EINTR).  This may cause the application to crash or have incorrect
      behaviour.
      
      handle_signal() assumes that regs->orig_ax >= 0 means a system call so
      any kernel entry point that is not for a system call must push a
      negative value for orig_ax.  For example, for physical interrupts on
      bare metal the inverse of the vector is pushed and page_fault() sets
      regs->orig_ax to -1, overwriting the hardware provided error code.
      
      xen_hypervisor_callback() was incorrectly pushing 0 for orig_ax
      instead of -1.
      
      Classic Xen kernels pushed %eax which works as %eax cannot be both
      non-negative and -RESTARTSYS (etc.), but using -1 is consistent with
      other non-system call entry points and avoids some of the tests in
      handle_signal().
      
      There were similar bugs in xen_failsafe_callback() of both 32 and
      64-bit guests. If the fault was corrected and the normal return path
      was used then 0 was incorrectly pushed as the value for orig_ax.
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Acked-by: NJan Beulich <JBeulich@suse.com>
      Acked-by: NIan Campbell <ian.campbell@citrix.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      a349e23d
  34. 13 10月, 2012 1 次提交
  35. 01 10月, 2012 1 次提交