1. 16 Jul, 2008 (2 commits)
  2. 12 Jul, 2008 (1 commit)
    • x86_64: fix delayed signals · eca91e78
      Committed by Roland McGrath
      On three of the several paths in entry_64.S that call
      do_notify_resume() on the way back to user mode, we fail to properly
      check again for newly-arrived work that requires another call to
      do_notify_resume() before going to user mode.  These paths set the
      mask to check only _TIF_NEED_RESCHED, but this is wrong.  The other
      paths that lead to do_notify_resume() do this correctly already, and
      entry_32.S does it correctly in all cases.
      
      All paths back to user mode have to check all the _TIF_WORK_MASK
      flags at the last possible stage, with interrupts disabled.
      Otherwise, we miss any flags (TIF_SIGPENDING for example) that were
      set any time after we entered do_notify_resume().  More work flags
      can be set (or left set) synchronously inside do_notify_resume(), as
      TIF_SIGPENDING can be, or asynchronously by interrupts or other CPUs
      (which then send an asynchronous interrupt).
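
      As a rough C-level sketch of what these paths must do (illustrative
      only; the real logic is assembly in entry_64.S, and the names here
      paraphrase the kernel's, they are not its exact code):

          /* Re-check the full work mask with interrupts disabled; only
           * drop back to user mode once no work bits remain set. */
          while (current_thread_info()->flags & _TIF_WORK_MASK) {
                  local_irq_enable();
                  if (current_thread_info()->flags & _TIF_NEED_RESCHED)
                          schedule();
                  else
                          do_notify_resume(regs, current_thread_info()->flags);
                  local_irq_disable();    /* close the race window */
          }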
      
      There are many different scenarios that could hit this bug, most of
      them races.  The simplest one to demonstrate does not require any
      race: when one signal has done handler setup at the check before
      returning from a syscall, and there is another signal pending that
      should be handled.  The second signal's handler should interrupt the
      first signal handler before it actually starts (so the interrupted PC
      is still at the handler's entry point).  Instead, it runs away until
      the next kernel entry (next syscall, tick, etc).
      
      This test behaves correctly on 32-bit kernels, and fails on 64-bit
      (either 32-bit or 64-bit test binary).  With this fix, it works.
      
          #define _GNU_SOURCE
          #include <stdio.h>
          #include <signal.h>
          #include <string.h>
          #include <sys/ucontext.h>
      
          #ifndef REG_RIP
          #define REG_RIP REG_EIP
          #endif
      
          static sig_atomic_t hit1, hit2;
      
          static void
          handler (int sig, siginfo_t *info, void *ctx)
          {
            ucontext_t *uc = ctx;
      
            if ((void *) uc->uc_mcontext.gregs[REG_RIP] == &handler)
              {
                if (sig == SIGUSR1)
                  hit1 = 1;
                else
                  hit2 = 1;
              }
      
            printf ("%s at %#lx\n", strsignal (sig),
                    uc->uc_mcontext.gregs[REG_RIP]);
          }
      
          int
          main (void)
          {
            struct sigaction sa;
            sigset_t set;
      
            sigemptyset (&sa.sa_mask);
            sa.sa_flags = SA_SIGINFO;
            sa.sa_sigaction = &handler;
      
            if (sigaction (SIGUSR1, &sa, NULL)
                || sigaction (SIGUSR2, &sa, NULL))
              return 2;
      
            sigemptyset (&set);
            sigaddset (&set, SIGUSR1);
            sigaddset (&set, SIGUSR2);
            if (sigprocmask (SIG_BLOCK, &set, NULL))
              return 3;
      
            printf ("main at %p, handler at %p\n", &main, &handler);
      
            raise (SIGUSR1);
            raise (SIGUSR2);
      
            if (sigprocmask (SIG_UNBLOCK, &set, NULL))
              return 4;
      
            if (hit1 + hit2 == 1)
              {
                puts ("PASS");
                return 0;
              }
      
            puts ("FAIL");
            return 1;
          }
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  3. 09 Jul, 2008 (1 commit)
  4. 08 Jul, 2008 (7 commits)
  5. 27 Jun, 2008 (1 commit)
    • x86: don't destroy %rbp on kernel-mode faults · 9d8ad5d6
      Committed by Vegard Nossum
      From the code:
      
          "B stepping K8s sometimes report an truncated RIP for IRET exceptions
          returning to compat mode. Check for these here too."
      
      The code then proceeds to truncate the upper 32 bits of %rbp. This means
      that when do_page_fault() is finally called, its prologue,
      
          do_page_fault:
              push %rbp
              movq %rsp, %rbp
      
      will put the truncated base pointer on the stack. This means that the
      stack tracer will not be able to follow the base-pointer changes and
      will see all subsequent stack frames as unreliable.
      
      This patch changes the code to use a different register (%rcx) for the
      checking and leaves %rbp untouched.
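
      The truncation itself comes from x86-64's zero-extension rule:
      writing a 32-bit subregister clears the upper half of the full
      register. A hedged C illustration (the pointer value is made up):

          #include <stdio.h>

          int main(void)
          {
                  /* "movl %ebp,%ebp" zero-extends, dropping the upper
                   * 32 bits of a kernel pointer (hypothetical value). */
                  unsigned long rbp = 0xffff880012345678UL;
                  unsigned long truncated = (unsigned int) rbp;
                  printf("%#lx -> %#lx\n", rbp, truncated);
                  return 0;
          }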
      Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Acked-by: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  6. 26 Jun, 2008 (1 commit)
  7. 24 Jun, 2008 (1 commit)
  8. 19 Jun, 2008 (1 commit)
  9. 25 May, 2008 (1 commit)
  10. 24 May, 2008 (2 commits)
    • ftrace: use dynamic patching for updating mcount calls · d61f82d0
      Committed by Steven Rostedt
      This patch replaces the indirect call to the mcount function
      pointer with a direct call that will be patched by the
      dynamic ftrace routines.
      
      On boot up, the mcount function calls the ftrace_stub function.
      When the dynamic ftrace code is initialized, ftrace_stub
      is replaced with a call to ftrace_record_ip, which records
      the instruction pointers of the locations that call it.
      
      Later, the ftraced daemon will call kstop_machine and patch all
      the locations to nops.
      
      When ftrace is enabled, the original calls to mcount will now
      be set to call ftrace_caller, which will do a direct call
      to the registered ftrace function. This direct call is also patched
      when the function that should be called is updated.
      
      All patching is performed by a kstop_machine routine to prevent the
      race conditions associated with modifying code on the fly.
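
      Conceptually, each mcount call site moves through the states below
      (an illustrative C enum paraphrasing the description above, not the
      kernel's actual code):

          enum callsite_state {
                  CALL_FTRACE_STUB,   /* boot: mcount calls the no-op stub */
                  CALL_RECORD_IP,     /* init: stub replaced by            */
                                      /* ftrace_record_ip, which logs the  */
                                      /* caller's address                  */
                  PATCHED_TO_NOP,     /* ftraced + kstop_machine: call     */
                                      /* site rewritten to a NOP           */
                  CALL_FTRACE_CALLER, /* tracing on: NOP becomes a direct  */
                                      /* call to ftrace_caller             */
          };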
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • ftrace: add basic support for gcc profiler instrumentation · 16444a8a
      Committed by Arnaldo Carvalho de Melo
      If CONFIG_FTRACE is selected and /proc/sys/kernel/ftrace_enabled is
      set to a non-zero value, the ftrace routine will be called every time
      we enter a kernel function that is not marked with the "notrace"
      attribute.
      
      The ftrace routine will then call a registered function if one
      happens to be registered.
      
      [ This code has been highly hacked by Steven Rostedt and Ingo Molnar,
        so don't blame Arnaldo for all of this ;-) ]
      
      Update:
        It is now possible to register more than one ftrace function.
        If only one ftrace function is registered, that will be the
        function that ftrace calls directly. If more than one function
        is registered, then ftrace will call a function that will loop
        through the functions to call.
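
      A hedged sketch of that multi-function dispatch (modeled on the
      ftrace_ops list of this era; treat the details as illustrative):

          struct ftrace_ops {
                  void (*func)(unsigned long ip, unsigned long parent_ip);
                  struct ftrace_ops *next;
          };

          static struct ftrace_ops *ftrace_list;

          /* Installed as the hook when more than one function is
           * registered: walk the list and call each in turn. */
          static void ftrace_list_func(unsigned long ip,
                                       unsigned long parent_ip)
          {
                  struct ftrace_ops *op;

                  for (op = ftrace_list; op != NULL; op = op->next)
                          op->func(ip, parent_ip);
          }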
      Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  11. 17 Apr, 2008 (1 commit)
    • x86: ptrace vs -ENOSYS · a31f8dd7
      Committed by Roland McGrath
      When we're stopped at syscall entry tracing, ptrace can change the %rax
      value from -ENOSYS to something else.  If no system call is actually made
      because the syscall number (now in orig_rax) is bad, then we now always
      reset %rax to -ENOSYS again.
      
      This changes it to leave the return value alone after entry tracing.
      That way, the %rax value set by ptrace is there to be seen in user mode
      (or in syscall exit tracing).  This is consistent with what the 32-bit
      kernel does.
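
      For instance, a tracer that depends on this behavior might do the
      following at a syscall-entry stop (a hedged sketch; the helper name
      and return value are illustrative, the ptrace calls are standard):

          #include <stddef.h>
          #include <sys/ptrace.h>
          #include <sys/types.h>
          #include <sys/user.h>

          /* Cancel the syscall by giving it an invalid number, then
           * choose the value user mode will see in %rax; with this
           * change the kernel no longer resets it to -ENOSYS. */
          static void cancel_syscall(pid_t pid, long retval)
          {
                  ptrace(PTRACE_POKEUSER, pid,
                         offsetof(struct user_regs_struct, orig_rax), -1L);
                  ptrace(PTRACE_POKEUSER, pid,
                         offsetof(struct user_regs_struct, rax), retval);
          }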
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  12. 26 Feb, 2008 (1 commit)
    • x86: fix execve with -fstack-protect · 5d119b2c
      Committed by Ingo Molnar
      pointed out by pageexec@freemail.hu:
      
      > what happens here is that gcc treats the argument area as owned by the
      > callee, not the caller and is allowed to do certain tricks. for ssp it
      > will make a copy of the struct passed by value into the local variable
      > area and pass *its* address down, and it won't copy it back into the
      > original instance stored in the argument area.
      >
      > so once sys_execve returns, the pt_regs passed by value hasn't at all
      > changed and its default content will cause a nice double fault (FWIW,
      > this part took me the longest to debug, being down with cold didn't
      > help it either ;).
      
      To fix this we pass in pt_regs by pointer.
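
      A small runnable C demo of the direction of the fix (hedged: in the
      kernel the breakage was subtler, since assembly relied on pt_regs
      sitting at a known stack location; names below are illustrative):

          #include <stdio.h>

          struct regs { long ax; };

          /* With pass-by-value, the callee's writes may land in a hidden
           * local copy (what ssp forces gcc to make), never reaching the
           * caller's instance. Pass-by-pointer updates it in place. */
          static void by_value(struct regs r)    { r.ax = 1; }
          static void by_pointer(struct regs *r) { r->ax = 1; }

          int main(void)
          {
                  struct regs r = { 0 };
                  by_value(r);
                  printf("by value:   ax = %ld\n", r.ax); /* still 0 */
                  by_pointer(&r);
                  printf("by pointer: ax = %ld\n", r.ax); /* now 1  */
                  return 0;
          }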
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  13. 19 Feb, 2008 (1 commit)
  14. 10 Feb, 2008 (1 commit)
  15. 07 Feb, 2008 (2 commits)
  16. 30 Jan, 2008 (1 commit)
  17. 26 Jan, 2008 (1 commit)
    • sched: high-res preemption tick · 8f4d37ec
      Committed by Peter Zijlstra
      Use HR-timers (when available) to deliver an accurate preemption tick.
      
      The regular scheduler tick that runs at 1/HZ can be too coarse when
      nice levels are used. The fairness system will still keep the cpu
      utilisation 'fair' by delaying a task that got an excessive amount
      of CPU time, but it tries to minimize this by delivering preemption
      points spot-on.
      
      The average interval of this extra interrupt is sched_latency /
      nr_latency. It need not be shorter than the regular 1/HZ tick; what
      matters is how the preemption points are distributed within the
      sched_latency period.
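
      To make the formula concrete (the numbers below are made up for
      illustration, not kernel defaults):

          #include <stdio.h>

          int main(void)
          {
                  double sched_latency = 0.020;  /* 20 ms window      */
                  int    nr_latency    = 5;      /* tasks sharing it  */
                  printf("avg preemption interval: %.1f ms\n",
                         1e3 * sched_latency / nr_latency); /* ~4 ms */
                  return 0;
          }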
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  18. 18 Oct, 2007 (1 commit)
  19. 12 Oct, 2007 (1 commit)
  20. 11 Oct, 2007 (2 commits)
  21. 01 Aug, 2007 (1 commit)
  22. 22 Jul, 2007 (1 commit)
    • x86_64: support poll() on /dev/mcelog · e02e68d3
      Committed by Tim Hockin
      Background:
       /dev/mcelog is typically polled manually.  This is less than optimal for
       situations where accurate accounting of MCEs is important.  Calling
       poll() on /dev/mcelog does not work.
      
      Description:
       This patch adds support for poll() to /dev/mcelog.  This results in
       immediate wakeup of user apps whenever the poller finds MCEs.  Because
       the exception handler cannot take any locks, it cannot call the wakeup
       itself.  Instead, it uses a thread_info flag (TIF_MCE_NOTIFY) which is
       caught at the next return from interrupt or exit from idle, calling the
       mce_user_notify() routine.  This patch also disables the "fake panic"
       path of mce_panic(), because it results in printk()s in the exception
       handler and crashy systems.
      
       This patch also does some small cleanup for essentially unused variables,
       and moves the user notification into the body of the poller, so it is
       only called once per poll, rather than once per CPU.
      
      Result:
       Applications can now poll() on /dev/mcelog.  When an error is logged
       (whether through the poller or through an exception) the applications are
       woken up promptly.  This should not affect any previous behaviors.  If no
       MCEs are being logged, there is no overhead.
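
       For example, a user application might wait for events like this
       (a minimal hedged sketch; error handling and record parsing are
       omitted):

           #include <fcntl.h>
           #include <poll.h>
           #include <stdio.h>
           #include <unistd.h>

           int main(void)
           {
                   int fd = open("/dev/mcelog", O_RDONLY);
                   if (fd < 0)
                           return 1;
                   struct pollfd pfd = { .fd = fd, .events = POLLIN };
                   /* Block until the kernel signals a logged MCE. */
                   while (poll(&pfd, 1, -1) >= 0) {
                           if (pfd.revents & POLLIN) {
                                   /* read(fd, ...) and decode records */
                                   puts("MCE record available");
                           }
                   }
                   return 0;
           }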
      
      Alternatives:
       I considered simply supporting poll() through the poller and not using
       TIF_MCE_NOTIFY at all.  However, the time between an uncorrectable error
       happening and the user application being notified is *the* most critical
       window for us.  Many uncorrectable errors can be logged to the network if
       given a chance.
      
       I also considered doing the MCE poll directly from the idle notifier, but
       decided that was overkill.
      
      Testing:
       I used an error-injecting DIMM to create lots of correctable DRAM errors
       and verified that my user app is woken up in sync with the polling interval.
       I also used the northbridge to inject uncorrectable ECC errors, and
       verified (printk() to the rescue) that the notify routine is called and the
       user app does wake up.  I built with PREEMPT on and off, and verified
       that my machine survives MCEs.
      
      [wli@holomorphy.com: build fix]
      Signed-off-by: Tim Hockin <thockin@google.com>
      Signed-off-by: William Irwin <bill.irwin@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  23. 23 Jun, 2007 (1 commit)
  24. 03 May, 2007 (1 commit)
  25. 27 Feb, 2007 (1 commit)
    • [PATCH] x86_64 irq: Safely cleanup an irq after moving it. · 61014292
      Committed by Eric W. Biederman
      The problem: after moving an interrupt, when is it safe to tear down
      the data structures for receiving the interrupt at the old location?
      
      With a normal pci device it is possible to issue a read to a device
      to flush all posted writes.  This does not work for the oldest ioapics
      because they are on a 3-wire apic bus which is a completely different
      data path.  For some more modern ioapics when everything is using
      front side bus delivery you can flush interrupts by simply issuing a
      read to the ioapic.  For other modern ioapics, empirical testing has
      shown that this does not work.
      
      So it appears the only reliable way to know that the last of the irqs
      sent by an ioapic before it was reprogrammed has been received is to
      receive the first irq from that ioapic after it was reprogrammed.
      
      Once we know the last irq message has been received from an ioapic
      into a local apic, we then need to know that the irq message has been
      processed through the local apics.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  26. 16 Dec, 2006 (1 commit)
    • Remove stack unwinder for now · d1526e2c
      Committed by Linus Torvalds
      It has caused more problems than it ever really solved, and is
      apparently not getting cleaned up and fixed.  We can put it back when
      it's stable and isn't likely to make warning or bug events worse.
      
      In the meantime, enable frame pointers for more readable stack traces.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  27. 07 Dec, 2006 (1 commit)
    • [PATCH] x86-64: miscellaneous entry.S adjustments · bcddc015
      Committed by Jan Beulich
      This patch:
      - makes ret_from_sys_call no longer global (all external users were
        previously switched to use int_ret_from_sys_call)
      - adjusts placement of a CFI_{REMEMBER,RESTORE}_STATE pair to better
        fit logic flow
      - eliminates an unnecessary pair of CFI_{REMEMBER,RESTORE}_STATE
      - glues together function- and unwinder-wise the previously separate
        system_call and int_ret_from_sys_call function fragments
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
  28. 22 Oct, 2006 (3 commits)