1. 23 Feb 2015 (1 commit)
    • x86/alternatives: Add instruction padding · 4332195c
      Committed by Borislav Petkov
      Up until now we have always had to make sure that the length of the
      new instruction replacing the old one is less than or equal to the
      length of the old instruction. If the new instruction is longer, at
      the time it replaces the old instruction it will overwrite the beginning
      of the next instruction in the kernel image and cause your pants to
      catch fire.
      
      So instead of having to pay attention, teach the alternatives framework
      to pad shorter old instructions with NOPs at buildtime - but only in the
      case when
      
        len(old instruction(s)) < len(new instruction(s))
      
      and add nothing in the >= case. (In that case we do add_nops() when
      patching).
      
      This way the alternatives user shouldn't have to care about instruction
      sizes and can simply use the macros.
      
      Add asm ALTERNATIVE* flavor macros too, while at it.
      
      Also, we need to save the pad length in a separate struct alt_instr
      member for NOP optimization and the way to do that reliably is to carry
      the pad length instead of trying to detect whether we're looking at
      single-byte NOPs or at pathological instruction offsets like e9 90 90 90
      90, for example, which is a valid instruction.
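
      A standalone C sketch of this bookkeeping follows; the struct fields and
      the helper are illustrative stand-ins (not the kernel's exact struct
      alt_instr or add_nops()), showing only the patch-time idea of NOP-filling
      whatever remains of the (possibly padded) original slot:

        #include <stdint.h>
        #include <string.h>

        /* Illustrative only: approximates the idea behind struct alt_instr. */
        struct alt_entry {
                uint8_t instrlen;        /* original site, incl. build-time NOP pad */
                uint8_t replacementlen;  /* replacement length */
                uint8_t padlen;          /* pad bytes added at build time */
        };

        /* Copy the replacement, then fill the leftover bytes with single-byte
         * NOPs (0x90) so the next instruction is never overwritten. */
        static void patch_site(uint8_t *site, const uint8_t *repl,
                               const struct alt_entry *a)
        {
                memcpy(site, repl, a->replacementlen);
                if (a->instrlen > a->replacementlen)
                        memset(site + a->replacementlen, 0x90,
                               a->instrlen - a->replacementlen);
        }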
      
      Thanks to Michael Matz for the great help with toolchain questions.
      Signed-off-by: Borislav Petkov <bp@suse.de>
  2. 16 Dec 2014 (1 commit)
    • x86: Avoid building unused IRQ entry stubs · 2414e021
      Committed by Jan Beulich
      When X86_LOCAL_APIC is enabled (i.e. unconditionally on x86-64),
      first_system_vector will never end up being higher than
      LOCAL_TIMER_VECTOR (0xef), and hence building stubs for vectors
      0xef...0xff is pointlessly reducing code density. Deal with this at
      build time already.
      
      Taking into consideration that X86_64 implies X86_LOCAL_APIC, also
      simplify (and hence make easier to read and more consistent with the
      change done here) some #if-s in arch/x86/kernel/irqinit.c.
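
      A hedged illustration of that kind of #if cleanup (the config symbols are
      real; the declaration they guard is made up, not the actual irqinit.c code):

        /* before: redundant test, since CONFIG_X86_64 selects CONFIG_X86_LOCAL_APIC
         *
         *   #if defined(CONFIG_X86_64) || defined(CONFIG_X86_LOCAL_APIC)
         *   void setup_apic_vectors(void);
         *   #endif
         */

        /* after: the single symbol covers both cases */
        #ifdef CONFIG_X86_LOCAL_APIC
        void setup_apic_vectors(void);
        #endif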
      
      While we could further improve the packing of the IRQ entry stubs (the
      four now left in the last set could fit into the four padding bytes
      that each of the final four sets have), this doesn't seem to provide
      any real benefit: with both irq_entries_start and common_interrupt
      being cache-line aligned, eliminating the 30th set would just produce
      32 bytes of padding between the 29th set and common_interrupt.
      
      [ tglx: Folded lguest fix from Dan Carpenter ]
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: lguest@lists.ozlabs.org
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Link: http://lkml.kernel.org/r/54574D5F0200007800044389@mail.emea.novell.com
      Link: http://lkml.kernel.org/r/20141115185718.GB6530@mwanda
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  3. 12 Dec 2014 (1 commit)
  4. 25 Oct 2014 (1 commit)
  5. 24 Sep 2014 (1 commit)
    • audit: x86: drop arch from __audit_syscall_entry() interface · b4f0d375
      Committed by Richard Guy Briggs
      Since the arch is found locally in __audit_syscall_entry(), there is no need to
      pass it in as a parameter.  Delete it from the parameter list.
      
      x86* was the only arch to call __audit_syscall_entry() directly and did so from
      assembly code.
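
      In rough terms the interface change looks like the sketch below; the
      parameter names are approximate, not copied from the kernel header:

        /* before: every caller passed the arch explicitly
         *
         *   void __audit_syscall_entry(int arch, int major,
         *                              unsigned long a0, unsigned long a1,
         *                              unsigned long a2, unsigned long a3);
         */

        /* after: the arch is determined inside the function itself */
        void __audit_syscall_entry(int major,
                                   unsigned long a0, unsigned long a1,
                                   unsigned long a2, unsigned long a3);
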
      Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-audit@redhat.com
      Signed-off-by: Eric Paris <eparis@redhat.com>
      
      ---
      
      As this patch relies on changes in the audit tree, I think it
      appropriate to send it through my tree rather than the x86 tree.
  6. 16 Aug 2014 (1 commit)
  7. 22 Jul 2014 (1 commit)
    • x86_32, entry: Store badsys error code in %eax · 8142b215
      Committed by Sven Wegener
      Commit 554086d8 ("x86_32, entry: Do syscall exit work on badsys
      (CVE-2014-4508)") introduced a regression in the x86_32 syscall entry
      code, resulting in syscall() not returning proper errors for undefined
      syscalls on CPUs supporting the sysenter feature.
      
      The following code:
      
      > int result = syscall(666);
      > printf("result=%d errno=%d error=%s\n", result, errno, strerror(errno));
      
      results in:
      
      > result=666 errno=0 error=Success
      
      Obviously, the syscall return value is the called syscall number, but it
      should have been an ENOSYS error. When run under ptrace it behaves
      correctly, which makes it hard to debug in the wild:
      
      > result=-1 errno=38 error=Function not implemented
      
      The %eax register is the return value register. For debugging via ptrace
      the syscall entry code stores the complete register context on the
      stack. The badsys handlers only store the ENOSYS error code in the
      ptrace register set and do not set %eax like a regular syscall handler
      would. The old resume_userspace call chain contains code that clobbers
      %eax and it restores %eax from the ptrace registers afterwards. The same
      goes for the ptrace-enabled call chain. When ptrace is not used, the
      syscall return value is the passed-in syscall number from the untouched
      %eax register.
      
      Use %eax as the return value register in syscall_badsys and
      sysenter_badsys, like a real syscall handler does, and have the caller
      push the value onto the stack for ptrace access.
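
      For reference, a complete version of the reproducer quoted above; with
      the fix in place, an undefined syscall number should yield -1 with errno
      set to ENOSYS:

        #define _GNU_SOURCE          /* for syscall() */
        #include <errno.h>
        #include <stdio.h>
        #include <string.h>
        #include <unistd.h>

        int main(void)
        {
                /* 666 is simply an unimplemented syscall number */
                long result = syscall(666);
                printf("result=%ld errno=%d error=%s\n",
                       result, errno, strerror(errno));
                return 0;
        }
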
      Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
      Link: http://lkml.kernel.org/r/alpine.LNX.2.11.1407221022380.31021@titan.int.lan.stealer.net
      Reviewed-and-tested-by: Andy Lutomirski <luto@amacapital.net>
      Cc: <stable@vger.kernel.org> # If 554086d8 is backported
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  8. 19 Jul 2014 (1 commit)
  9. 24 Jun 2014 (1 commit)
  10. 05 May 2014 (1 commit)
  11. 01 May 2014 (1 commit)
  12. 24 Apr 2014 (1 commit)
  13. 10 Jan 2014 (1 commit)
  14. 09 Nov 2013 (1 commit)
  15. 25 Sep 2013 (1 commit)
  16. 05 Sep 2013 (1 commit)
  17. 21 Jun 2013 (1 commit)
    • x86, trace: Add irq vector tracepoints · cf910e83
      Committed by Seiji Aguchi
      [Purpose of this patch]
      
      As Vaibhav explained in the thread below, tracepoints for irq vectors
      are useful.
      
      http://www.spinics.net/lists/mm-commits/msg85707.html
      
      <snip>
      The current interrupt traces from irq_handler_entry and irq_handler_exit
      provide when an interrupt is handled.  They provide good data about when
      the system has switched to kernel space and how it affects the currently
      running processes.
      
      There are some IRQ vectors which trigger the system into kernel space,
      which are not handled in generic IRQ handlers.  Tracing such events gives
      us the information about IRQ interaction with other system events.
      
      The trace also tells where the system is spending its time.  We want to
      know which cores are handling interrupts and how they are affecting other
      processes in the system.  Also, the trace provides information about when
      the cores are idle and which interrupts are changing that state.
      <snip>
      
      On the other hand, my use case is tracing just the local timer event
      and getting the value of the instruction pointer.

      I previously suggested adding an argument to the local timer event to
      get the instruction pointer, but it can also be obtained with an
      external module such as systemtap, so I don't need to add any argument
      to the irq vector tracepoints now.
      
      [Patch Description]
      
      Vaibhav's patch shared one tracepoint, irq_vector_entry/irq_vector_exit,
      across all events. But the use case above is to trace a specific irq
      vector rather than all events, and in that case we are concerned about
      the overhead of unwanted events.

      So, instead of introducing irq_vector_entry/exit, add the following
      tracepoints so that they can be enabled independently:
         - local_timer_vector
         - reschedule_vector
         - call_function_vector
         - call_function_single_vector
         - irq_work_entry_vector
         - error_apic_vector
         - thermal_apic_vector
         - threshold_apic_vector
         - spurious_apic_vector
         - x86_platform_ipi_vector
      
      Also, introduce logic that switches the IDT when tracing is enabled or
      disabled, so that there is zero time penalty while the tracepoints are
      disabled. The details are as follows:
       - Create trace irq handlers with entering_irq()/exiting_irq().
       - Create a new IDT, trace_idt_table, at boot time by adding logic to
         _set_gate(). It is just a copy of the original IDT.
       - Register the new tracepoint handlers in the new IDT by introducing
         macros around alloc_intr_gate(), called when the irq_vector handlers
         are registered.
       - Add a check for whether irq vector tracing is on or off into
         load_current_idt(). This has to be done after the debug check, because:
         - Switching to the debug IDT may be triggered while tracing is enabled.
         - Switching to the trace IDT, on the other hand, is triggered only
           when debugging is disabled.
      
      In addition, the new IDT is created only when CONFIG_TRACING is enabled,
      to avoid it being used for other purposes.
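
      The table-switching idea reduces to something like this standalone sketch
      (all names are illustrative, not kernel symbols): keep a plain handler
      table and a trace-wrapped one, and select the whole table when tracing is
      toggled, so the disabled case pays no per-interrupt cost.

        #include <stdbool.h>
        #include <stdio.h>

        typedef void (*handler_t)(void);

        static void timer_handler(void)       { puts("timer work"); }
        static void trace_timer_handler(void)
        {
                puts("trace: timer entry");   /* entering_irq() + tracepoint */
                timer_handler();
                puts("trace: timer exit");    /* tracepoint + exiting_irq() */
        }

        static handler_t plain_table[] = { timer_handler };
        static handler_t trace_table[] = { trace_timer_handler };
        static handler_t *current_table = plain_table;

        /* analogous to load_current_idt(): pick the whole table once,
         * instead of testing a flag on every interrupt */
        static void set_tracing(bool on)
        {
                current_table = on ? trace_table : plain_table;
        }

        int main(void)
        {
                current_table[0]();   /* untraced */
                set_tracing(true);
                current_table[0]();   /* traced */
                return 0;
        }
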
      Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
      Link: http://lkml.kernel.org/r/51C323ED.5050708@hds.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
  18. 13 Feb 2013 (1 commit)
  19. 04 Feb 2013 (3 commits)
  20. 17 Jan 2013 (1 commit)
    • xen: Fix stack corruption in xen_failsafe_callback for 32bit PVOPS guests. · 9174adbe
      Committed by Andrew Cooper
      This fixes CVE-2013-0190 / XSA-40
      
      There has been an error on the xen_failsafe_callback path for failed
      iret, which causes the stack pointer to be wrong when entering the
      iret_exc error path.  This can result in the kernel crashing.
      
      In the classic kernel case, the relevant code looked a little like:
      
              popl %eax      # Error code from hypervisor
              jz 5f
              addl $16,%esp
              jmp iret_exc   # Hypervisor said iret fault
      5:      addl $16,%esp
                             # Hypervisor said segment selector fault
      
      Here, there are two identical addls on either arm of a branch; this
      appears to have been optimised by hoisting the addl above the jz and
      converting it to an lea, which leaves the flags register unaffected.
      
      In the PVOPS case, the code looks like:
      
              popl_cfi %eax         # Error from the hypervisor
              lea 16(%esp),%esp     # Add $16 before choosing fault path
              CFI_ADJUST_CFA_OFFSET -16
              jz 5f
              addl $16,%esp         # Incorrectly adjust %esp again
              jmp iret_exc
      
      It is possible for unprivileged userspace applications to cause this
      behaviour, for example by loading an LDT code selector, then changing
      the code selector to be not-present.  At this point, there is a race
      condition where it is possible for the hypervisor to return back to
      userspace from an interrupt, fault on its own iret, and inject a
      failsafe_callback into the kernel.
      
      This bug has been present since the introduction of Xen PVOPS support
      in commit 5ead97c8 (xen: Core Xen implementation), in 2.6.23.
      Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
      Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  21. 20 Dec 2012 (1 commit)
  22. 29 Nov 2012 (1 commit)
  23. 20 Oct 2012 (1 commit)
    • xen/x86: don't corrupt %eip when returning from a signal handler · a349e23d
      Committed by David Vrabel
      In 32 bit guests, if a userspace process has %eax == -ERESTARTSYS
      (-512) or -ERESTARTNOINTR (-513) when it is interrupted by an event
      /and/ the process has a pending signal then %eip (and %eax) are
      corrupted when returning to the main process after handling the
      signal.  The application may then crash with SIGSEGV or a SIGILL or it
      may have subtly incorrect behaviour (depending on what instruction it
      returned to).
      
      This occurs because handle_signal() incorrectly thinks that there is a
      system call that needs to be restarted, so it adjusts %eip and %eax to
      re-execute the system call instruction (even though user space had not
      done a system call).
      
      If %eax == -514 (-ERESTARTNOHAND) or -516 (-ERESTART_RESTARTBLOCK),
      then handle_signal() only corrupted %eax (by setting it to -EINTR).
      This may cause the application to crash or have incorrect behaviour.
      
      handle_signal() assumes that regs->orig_ax >= 0 means a system call so
      any kernel entry point that is not for a system call must push a
      negative value for orig_ax.  For example, for physical interrupts on
      bare metal the inverse of the vector is pushed and page_fault() sets
      regs->orig_ax to -1, overwriting the hardware provided error code.
      
      xen_hypervisor_callback() was incorrectly pushing 0 for orig_ax
      instead of -1.
      
      Classic Xen kernels pushed %eax which works as %eax cannot be both
      non-negative and -ERESTARTSYS (etc.), but using -1 is consistent with
      other non-system call entry points and avoids some of the tests in
      handle_signal().
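
      The orig_ax convention boils down to the following sketch (the restart
      constants match the kernel-internal values; the struct and function are
      illustrative, not the actual handle_signal() code): restart handling only
      applies when orig_ax is non-negative, which is why every non-syscall
      entry point must push a negative value such as -1.

        /* kernel-internal restart codes */
        #define ERESTARTSYS    512
        #define ERESTARTNOINTR 513

        struct regs_sketch { long ax, orig_ax, ip; };

        static void maybe_restart_syscall(struct regs_sketch *regs)
        {
                if (regs->orig_ax < 0)
                        return;               /* not a syscall entry: leave ax/ip alone */

                switch (regs->ax) {
                case -ERESTARTSYS:
                case -ERESTARTNOINTR:
                        regs->ax = regs->orig_ax;  /* re-issue the same syscall */
                        regs->ip -= 2;             /* back over the 2-byte syscall insn */
                        break;
                }
        }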
      
      There were similar bugs in xen_failsafe_callback() of both 32 and
      64-bit guests. If the fault was corrected and the normal return path
      was used then 0 was incorrectly pushed as the value for orig_ax.
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Acked-by: Jan Beulich <JBeulich@suse.com>
      Acked-by: Ian Campbell <ian.campbell@citrix.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  24. 13 Oct 2012 (1 commit)
  25. 01 Oct 2012 (3 commits)
  26. 22 Sep 2012 (1 commit)
  27. 14 Sep 2012 (1 commit)
  28. 31 Jul 2012 (2 commits)
  29. 20 Jul 2012 (3 commits)
  30. 02 Jun 2012 (1 commit)
    • x86: get rid of calling do_notify_resume() when returning to kernel mode · 44fbbb3d
      Committed by Al Viro
      If we end up calling do_notify_resume() with !user_mode(regs), it
      does nothing (do_signal() explicitly bails out and we can't get there
      with TIF_NOTIFY_RESUME in such situations).  Then we jump to
      resume_userspace_sig, which rechecks the same thing and bails out
      to resume_kernel, thus breaking the loop.
      
      It's easier and cheaper to check *before* calling do_notify_resume()
      and bail out to resume_kernel immediately.  And kill the check in
      do_signal()...
      
      Note that on amd64 we can't get there with !user_mode() at all - asm
      glue takes care of that.
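
      A minimal sketch of the reordering (names are illustrative stand-ins for
      the entry-code labels, not the real asm): the cheap user_mode() check
      simply moves in front of the call.

        #include <stdbool.h>
        #include <stdio.h>

        struct regs { bool from_user; };

        static bool user_mode(const struct regs *r)  { return r->from_user; }
        static void do_notify_resume(struct regs *r) { (void)r; puts("signal/resume work"); }
        static void resume_kernel(void)              { puts("resume_kernel"); }

        static void exit_path(struct regs *r)
        {
                if (!user_mode(r)) {      /* new: bail out before the call */
                        resume_kernel();
                        return;
                }
                do_notify_resume(r);
        }

        int main(void)
        {
                struct regs from_kernel = { .from_user = false };
                struct regs from_user   = { .from_user = true  };
                exit_path(&from_kernel);
                exit_path(&from_user);
                return 0;
        }
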
      Acked-and-reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  31. 21 Apr 2012 (1 commit)
  32. 23 Mar 2012 (1 commit)
    • x86-32: Fix endless loop when processing signals for kernel tasks · 29a2e283
      Committed by Dmitry Adamushko
      The problem occurs on !CONFIG_VM86 kernels [1] when a kernel-mode task
      returns from a system call with a pending signal.
      
      A real-life scenario is a child of 'khelper' returning from a failed
      kernel_execve() in ____call_usermodehelper() [ kernel/kmod.c ].
      kernel_execve() fails due to a pending SIGKILL, which is the result of
      "kill -9 -1" (at least, busybox's init does it upon reboot).
      
      The loop is as follows:
      
      * syscall_exit_work:
       - work_pending:            // start_of_the_loop
       - work_notify_sig:
         - do_notify_resume()
           - do_signal()
             - if (!user_mode(regs)) return;
       - resume_userspace         // TIF_SIGPENDING is still set
       - work_pending             // so we call work_pending => goto
                                  // start_of_the_loop
      
      More information can be found in another LKML thread:
      http://www.serverphorums.com/read.php?12,457826
      
      [1] the problem was also seen on MIPS.
      Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
      Link: http://lkml.kernel.org/r/1332448765.2299.68.camel@dimm
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@hack.frob.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  33. 18 Jan 2012 (1 commit)