1. 21 11月, 2008 2 次提交
    • I
      x86: clean up after: move entry_64.S register saving out of the macros, fix · e8a0e276
      Ingo Molnar 提交于
      Impact: build fix
      
      The break builds with older binutils (2.16.1):
      
       arch/x86/kernel/entry_64.S: Assembler messages:
       arch/x86/kernel/entry_64.S:282: Error: too many positional arguments
       arch/x86/kernel/entry_64.S:283: Error: too many positional arguments
       arch/x86/kernel/entry_64.S:284: Error: too many positional arguments
       arch/x86/kernel/entry_64.S:285: Error: too many positional arguments
       arch/x86/kernel/entry_64.S:286: Error: too many positional arguments
       arch/x86/kernel/entry_64.S:287: Error: too many positional arguments
       arch/x86/kernel/entry_64.S:288: Error: too many positional arguments
       arch/x86/kernel/entry_64.S:289: Error: too many positional arguments
       arch/x86/kernel/entry_64.S:290: Error: too many positional arguments
      
      Took some time to figure out the detail that GAS chokes on: it's
      negative offsets. Rearrange the calculations to make sure we never
      go negative.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e8a0e276
    • A
      x86: clean up after: move entry_64.S register saving out of the macros · dcd072e2
      Alexander van Heukelum 提交于
      This add-on patch to x86: move entry_64.S register saving out
      of the macros visually cleans up the appearance of the code by
      introducing some basic helper macro's. It also adds some cfi
      annotations which were missing.
      Signed-off-by: NAlexander van Heukelum <heukelum@fastmail.fm>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      dcd072e2
  2. 20 11月, 2008 1 次提交
    • A
      x86: move entry_64.S register saving out of the macros · d99015b1
      Alexander van Heukelum 提交于
      Here is a combined patch that moves "save_args" out-of-line for
      the interrupt macro and moves "error_entry" mostly out-of-line
      for the zeroentry and errorentry macros.
      
      The save_args function becomes really straightforward and easy
      to understand, with the possible exception of the stack switch
      code, which now needs to copy the return address of to the
      calling function. Normal interrupts arrive with ((~vector)-0x80)
      on the stack, which gets adjusted in common_interrupt:
      
      <common_interrupt>:
      (5)  addq   $0xffffffffffffff80,(%rsp)		/* -> ~(vector) */
      (4)  sub    $0x50,%rsp				/* space for registers */
      (5)  callq  ffffffff80211290 <save_args>
      (5)  callq  ffffffff80214290 <do_IRQ>
      <ret_from_intr>:
           ...
      
      An apic interrupt stub now look like this:
      
      <thermal_interrupt>:
      (5)  pushq  $0xffffffffffffff05			/* ~(vector) */
      (4)  sub    $0x50,%rsp				/* space for registers */
      (5)  callq  ffffffff80211290 <save_args>
      (5)  callq  ffffffff80212b8f <smp_thermal_interrupt>
      (5)  jmpq   ffffffff80211f93 <ret_from_intr>
      
      Similarly the exception handler register saving function becomes
      simpler, without the need of any parameter shuffling. The stub
      for an exception without errorcode looks like this:
      
      <overflow>:
      (6)  callq  *0x1cad12(%rip)        # ffffffff803dd448 <pv_irq_ops+0x38>
      (2)  pushq  $0xffffffffffffffff			/* no syscall */
      (4)  sub    $0x78,%rsp				/* space for registers */
      (5)  callq  ffffffff8030e3b0 <error_entry>
      (3)  mov    %rsp,%rdi				/* pt_regs pointer */
      (2)  xor    %esi,%esi				/* no error code */
      (5)  callq  ffffffff80213446 <do_overflow>
      (5)  jmpq   ffffffff8030e460 <error_exit>
      
      And one for an exception with errorcode like this:
      
      <segment_not_present>:
      (6)  callq  *0x1cab92(%rip)        # ffffffff803dd448 <pv_irq_ops+0x38>
      (4)  sub    $0x78,%rsp				/* space for registers */
      (5)  callq  ffffffff8030e3b0 <error_entry>
      (3)  mov    %rsp,%rdi				/* pt_regs pointer */
      (5)  mov    0x78(%rsp),%rsi			/* load error code */
      (9)  movq   $0xffffffffffffffff,0x78(%rsp)	/* no syscall */
      (5)  callq  ffffffff80213209 <do_segment_not_present>
      (5)  jmpq   ffffffff8030e460 <error_exit>
      
      Unfortunately, this last type is more than 32 bytes. But the total space
      savings due to this patch is about 2500 bytes on an smp-configuration,
      and I think the code is clearer than it was before. The tested kernels
      were non-paravirt ones (i.e., without the indirect call at the top of
      the exception handlers).
      
      Anyhow, I tested this patch on top of a recent -tip. The machine
      was an 2x4-core Xeon at 2333MHz. Measured where the delays between
      (almost-)adjacent rdtsc instructions. The graphs show how much
      time is spent outside of the program as a function of the measured
      delay. The area under the graph represents the total time spent
      outside the program. Eight instances of the rdtsctest were
      started, each pinned to a single cpu. The histogams are added.
      For each kernel two measurements were done: one in mostly idle
      condition, the other while running "bonnie++ -f", bound to cpu 0.
      Each measurement took 40 minutes runtime. See the attached graphs
      for the results. The graphs overlap almost everywhere, but there
      are small differences.
      Signed-off-by: NAlexander van Heukelum <heukelum@fastmail.fm>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d99015b1
  3. 17 11月, 2008 1 次提交
  4. 14 11月, 2008 1 次提交
  5. 13 11月, 2008 1 次提交
  6. 12 11月, 2008 1 次提交
    • H
      x86: 64 bits: shrink and align IRQ stubs · 939b7871
      H. Peter Anvin 提交于
      Move the IRQ stub generation to assembly to simplify it and for
      consistency with 32 bits.  Doing it in a C file with asm() statements
      doesn't help clarity, and it prevents some optimizations.
      
      Shrink the IRQ stubs down to just over four bytes per (we fit seven
      into a 32-byte chunk.)  This shrinks the total icache consumption of
      the IRQ stubs down to an even kilobyte, if all of them are in active
      use.
      
      The downside is that we end up with a double jump, which could have a
      negative effect on some pipelines.  The double jump is always inside
      the same cacheline on any modern chips.
      
      To get the most effect, cache-align the IRQ stubs.
      
      This makes the 64-bit code match changes already done to the 32-bit
      code, and should open up irqinit*.c for unification.
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      939b7871
  7. 21 10月, 2008 1 次提交
  8. 14 10月, 2008 1 次提交
    • S
      ftrace: x86 mcount stub · 0a37605c
      Steven Rostedt 提交于
      x86 now sets up the mcount locations through the build and no longer
      needs to record the ip when the function is executed. This patch changes
      the initial mcount to simply return. There's no need to do any other work.
      If the ftrace start up test fails, the original mcount will be what everything
      will use, so having this as fast as possible is a good thing.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      0a37605c
  9. 13 10月, 2008 4 次提交
  10. 24 7月, 2008 4 次提交
    • A
      x86, 64-bit, dwarf2: push pushes 8 bytes and popf pops 8 · e0a5a5d9
      Alexander van Heukelum 提交于
      The CFI_ADJUST_CFA_OFFSET dwarf2 annotation of a push/popf
      pair in ret_from_fork wrongly used a value of 4. It should
      have been 8. Fix that.
      Signed-off-by: NAlexander van Heukelum <heukelum@fastmail.fm>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: heukelum@fastmail.fm
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e0a5a5d9
    • R
      x86_64 ia32 syscall audit fast-path · 5cbf1565
      Roland McGrath 提交于
      This adds fast paths for 32-bit syscall entry and exit when
      TIF_SYSCALL_AUDIT is set, but no other kind of syscall tracing.
      These paths does not need to save and restore all registers as
      the general case of tracing does.  Avoiding the iret return path
      when syscall audit is enabled helps performance a lot.
      Signed-off-by: NRoland McGrath <roland@redhat.com>
      5cbf1565
    • R
      x86_64 syscall audit fast-path · 86a1c34a
      Roland McGrath 提交于
      This adds a fast path for 64-bit syscall entry and exit when
      TIF_SYSCALL_AUDIT is set, but no other kind of syscall tracing.
      This path does not need to save and restore all registers as
      the general case of tracing does.  Avoiding the iret return path
      when syscall audit is enabled helps performance a lot.
      Signed-off-by: NRoland McGrath <roland@redhat.com>
      86a1c34a
    • R
      x86_64: remove bogus optimization in sysret_signal · 15e8f348
      Roland McGrath 提交于
      This short-circuit path in sysret_signal looks wrong to me.
      AFAICT, in practice the branch is never taken--and if it were,
      it would go wrong.  To wit, try loading a module whose init
      function does set_thread_flag(TIF_IRET), and see insmod crash
      (presumably with a wrong user stack pointer).
      
      This is because the FIXUP_TOP_OF_STACK work hasn't been done yet
      when we jump around the call to ptregscall_common and get to
      int_with_check--where it expects the user RSP,SS,CS and EFLAGS to
      have been stored by FIXUP_TOP_OF_STACK.
      
      I don't think it's normally possible to get to sysret_signal with no
      _TIF_DO_NOTIFY_MASK bits set anyway, so these two instructions are
      already superfluous.  If it ever did happen, it is harmless to call
      do_notify_resume with nothing for it to do.
      Signed-off-by: NRoland McGrath <roland@redhat.com>
      15e8f348
  11. 17 7月, 2008 1 次提交
    • R
      x86 ptrace: unify syscall tracing · d4d67150
      Roland McGrath 提交于
      This unifies and cleans up the syscall tracing code on i386 and x86_64.
      
      Using a single function for entry and exit tracing on 32-bit made the
      do_syscall_trace() into some terrible spaghetti.  The logic is clear and
      simple using separate syscall_trace_enter() and syscall_trace_leave()
      functions as on 64-bit.
      
      The unification adds PTRACE_SYSEMU and PTRACE_SYSEMU_SINGLESTEP support
      on x86_64, for 32-bit ptrace() callers and for 64-bit ptrace() callers
      tracing either 32-bit or 64-bit tasks.  It behaves just like 32-bit.
      
      Changing syscall_trace_enter() to return the syscall number shortens
      all the assembly paths, while adding the SYSEMU feature in a simple way.
      Signed-off-by: NRoland McGrath <roland@redhat.com>
      d4d67150
  12. 16 7月, 2008 3 次提交
  13. 12 7月, 2008 1 次提交
    • R
      x86_64: fix delayed signals · eca91e78
      Roland McGrath 提交于
      On three of the several paths in entry_64.S that call
      do_notify_resume() on the way back to user mode, we fail to properly
      check again for newly-arrived work that requires another call to
      do_notify_resume() before going to user mode.  These paths set the
      mask to check only _TIF_NEED_RESCHED, but this is wrong.  The other
      paths that lead to do_notify_resume() do this correctly already, and
      entry_32.S does it correctly in all cases.
      
      All paths back to user mode have to check all the _TIF_WORK_MASK
      flags at the last possible stage, with interrupts disabled.
      Otherwise, we miss any flags (TIF_SIGPENDING for example) that were
      set any time after we entered do_notify_resume().  More work flags
      can be set (or left set) synchronously inside do_notify_resume(), as
      TIF_SIGPENDING can be, or asynchronously by interrupts or other CPUs
      (which then send an asynchronous interrupt).
      
      There are many different scenarios that could hit this bug, most of
      them races.  The simplest one to demonstrate does not require any
      race: when one signal has done handler setup at the check before
      returning from a syscall, and there is another signal pending that
      should be handled.  The second signal's handler should interrupt the
      first signal handler before it actually starts (so the interrupted PC
      is still at the handler's entry point).  Instead, it runs away until
      the next kernel entry (next syscall, tick, etc).
      
      This test behaves correctly on 32-bit kernels, and fails on 64-bit
      (either 32-bit or 64-bit test binary).  With this fix, it works.
      
          #define _GNU_SOURCE
          #include <stdio.h>
          #include <signal.h>
          #include <string.h>
          #include <sys/ucontext.h>
      
          #ifndef REG_RIP
          #define REG_RIP REG_EIP
          #endif
      
          static sig_atomic_t hit1, hit2;
      
          static void
          handler (int sig, siginfo_t *info, void *ctx)
          {
            ucontext_t *uc = ctx;
      
            if ((void *) uc->uc_mcontext.gregs[REG_RIP] == &handler)
              {
                if (sig == SIGUSR1)
                  hit1 = 1;
                else
                  hit2 = 1;
              }
      
            printf ("%s at %#lx\n", strsignal (sig),
                    uc->uc_mcontext.gregs[REG_RIP]);
          }
      
          int
          main (void)
          {
            struct sigaction sa;
            sigset_t set;
      
            sigemptyset (&sa.sa_mask);
            sa.sa_flags = SA_SIGINFO;
            sa.sa_sigaction = &handler;
      
            if (sigaction (SIGUSR1, &sa, NULL)
                || sigaction (SIGUSR2, &sa, NULL))
              return 2;
      
            sigemptyset (&set);
            sigaddset (&set, SIGUSR1);
            sigaddset (&set, SIGUSR2);
            if (sigprocmask (SIG_BLOCK, &set, NULL))
              return 3;
      
            printf ("main at %p, handler at %p\n", &main, &handler);
      
            raise (SIGUSR1);
            raise (SIGUSR2);
      
            if (sigprocmask (SIG_UNBLOCK, &set, NULL))
              return 4;
      
            if (hit1 + hit2 == 1)
              {
                puts ("PASS");
                return 0;
              }
      
            puts ("FAIL");
            return 1;
          }
      Signed-off-by: NRoland McGrath <roland@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      eca91e78
  14. 09 7月, 2008 1 次提交
  15. 08 7月, 2008 7 次提交
  16. 27 6月, 2008 1 次提交
    • V
      x86: don't destroy %rbp on kernel-mode faults · 9d8ad5d6
      Vegard Nossum 提交于
      From the code:
      
          "B stepping K8s sometimes report an truncated RIP for IRET exceptions
          returning to compat mode. Check for these here too."
      
      The code then proceeds to truncate the upper 32 bits of %rbp. This means
      that when do_page_fault() is finally called, its prologue,
      
          do_page_fault:
              push %rbp
              movl %rsp, %rbp
      
      will put the truncated base pointer on the stack. This means that the
      stack tracer will not be able to follow the base-pointer changes and
      will see all subsequent stack frames as unreliable.
      
      This patch changes the code to use a different register (%rcx) for the
      checking and leaves %rbp untouched.
      Signed-off-by: NVegard Nossum <vegard.nossum@gmail.com>
      Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
      Acked-by: NArjan van de Ven <arjan@linux.intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9d8ad5d6
  17. 26 6月, 2008 1 次提交
  18. 24 6月, 2008 1 次提交
  19. 19 6月, 2008 1 次提交
  20. 25 5月, 2008 1 次提交
  21. 24 5月, 2008 2 次提交
    • S
      ftrace: use dynamic patching for updating mcount calls · d61f82d0
      Steven Rostedt 提交于
      This patch replaces the indirect call to the mcount function
      pointer with a direct call that will be patched by the
      dynamic ftrace routines.
      
      On boot up, the mcount function calls the ftace_stub function.
      When the dynamic ftrace code is initialized, the ftrace_stub
      is replaced with a call to the ftrace_record_ip, which records
      the instruction pointers of the locations that call it.
      
      Later, the ftraced daemon will call kstop_machine and patch all
      the locations to nops.
      
      When a ftrace is enabled, the original calls to mcount will now
      be set top call ftrace_caller, which will do a direct call
      to the registered ftrace function. This direct call is also patched
      when the function that should be called is updated.
      
      All patching is performed by a kstop_machine routine to prevent any
      type of race conditions that is associated with modifying code
      on the fly.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      d61f82d0
    • A
      ftrace: add basic support for gcc profiler instrumentation · 16444a8a
      Arnaldo Carvalho de Melo 提交于
      If CONFIG_FTRACE is selected and /proc/sys/kernel/ftrace_enabled is
      set to a non-zero value the ftrace routine will be called everytime
      we enter a kernel function that is not marked with the "notrace"
      attribute.
      
      The ftrace routine will then call a registered function if a function
      happens to be registered.
      
      [ This code has been highly hacked by Steven Rostedt and Ingo Molnar,
        so don't blame Arnaldo for all of this ;-) ]
      
      Update:
        It is now possible to register more than one ftrace function.
        If only one ftrace function is registered, that will be the
        function that ftrace calls directly. If more than one function
        is registered, then ftrace will call a function that will loop
        through the functions to call.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      16444a8a
  22. 17 4月, 2008 1 次提交
    • R
      x86: ptrace vs -ENOSYS · a31f8dd7
      Roland McGrath 提交于
      When we're stopped at syscall entry tracing, ptrace can change the %rax
      value from -ENOSYS to something else.  If no system call is actually made
      because the syscall number (now in orig_rax) is bad, then we now always
      reset %rax to -ENOSYS again.
      
      This changes it to leave the return value alone after entry tracing.
      That way, the %rax value set by ptrace is there to be seen in user mode
      (or in syscall exit tracing).  This is consistent with what the 32-bit
      kernel does.
      Signed-off-by: NRoland McGrath <roland@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a31f8dd7
  23. 26 2月, 2008 1 次提交
    • I
      x86: fix execve with -fstack-protect · 5d119b2c
      Ingo Molnar 提交于
      pointed out by pageexec@freemail.hu:
      
      > what happens here is that gcc treats the argument area as owned by the
      > callee, not the caller and is allowed to do certain tricks. for ssp it
      > will make a copy of the struct passed by value into the local variable
      > area and pass *its* address down, and it won't copy it back into the
      > original instance stored in the argument area.
      >
      > so once sys_execve returns, the pt_regs passed by value hasn't at all
      > changed and its default content will cause a nice double fault (FWIW,
      > this part took me the longest to debug, being down with cold didn't
      > help it either ;).
      
      To fix this we pass in pt_regs by pointer.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      5d119b2c
  24. 19 2月, 2008 1 次提交