1. 17 Feb, 2010 (1 commit)
  2. 30 Jan, 2010 (2 commits)
    • perf, hw_breakpoint, kgdb: Do not take mutex for kernel debugger · 5352ae63
      Authored by Jason Wessel
      This patch fixes the regression in functionality where the
      kernel debugger and the perf API do not nicely share hw
      breakpoint reservations.
      
      The kernel debugger cannot use any mutex_lock() calls because it
      can start the kernel running from an invalid context.
      
      A mutex-free version of the reservation API needed to be created
      so that the kernel debugger can safely update hw breakpoint
      reservations.
      
      It is improbable that a breakpoint reservation will be
      concurrently processed at the moment kgdb interrupts the system.
      Should this corner case occur, the end user is warned, and the
      kernel debugger will prohibit updating the hardware breakpoint
      reservations.
      
      Any time the kernel debugger reserves a hardware breakpoint, it
      will be a system-wide reservation.
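      
      A minimal sketch of what such a mutex-free reservation path can
      look like, assuming the internal __reserve_bp_slot()/__release_bp_slot()
      helpers and the nr_bp_mutex from kernel/hw_breakpoint.c (treat the
      exact names as illustrative):
      
        int dbg_reserve_bp_slot(struct perf_event *bp)
        {
                /* a racing reservation holds the mutex: fail instead
                 * of sleeping, so the debugger can warn the user */
                if (mutex_is_locked(&nr_bp_mutex))
                        return -1;
      
                return __reserve_bp_slot(bp);
        }
      
        int dbg_release_bp_slot(struct perf_event *bp)
        {
                if (mutex_is_locked(&nr_bp_mutex))
                        return -1;
      
                __release_bp_slot(bp);
                return 0;
        }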
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: kgdb-bugreport@lists.sourceforge.net
      Cc: K.Prasad <prasad@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: torvalds@linux-foundation.org
      LKML-Reference: <1264719883-7285-3-git-send-email-jason.wessel@windriver.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, hw_breakpoints, kgdb: Fix kgdb to use hw_breakpoint API · cc096749
      Authored by Jason Wessel
      In the 2.6.33 kernel, the hw_breakpoint API is now used for the
      performance event counters.  The hw_breakpoint_handler() now
      consumes the hw breakpoints that were previously set by kgdb
      arch-specific code.  In order for kgdb to work in conjunction
      with this core API change, kgdb must use some of the low-level
      functions of the hw_breakpoint API to install, uninstall, and
      deal with hw breakpoint reservations.
      
      The kgdb core required a change to call kgdb_disable_hw_debug
      anytime a slave cpu enters kgdb_wait() in order to keep all the
      hw breakpoints in sync as well as to prevent hitting a hw
      breakpoint while kgdb is active.
      
      During the architecture specific initialization of kgdb, it will
      pre-allocate 4 disabled (struct perf_event **) structures.  Kgdb
      will use these to manage the capabilities for the 4 hw
      breakpoint registers, per cpu.  Right now the hw_breakpoint API
      does not have a way to ask how many breakpoints are available on
      each CPU, so it is possible that the install of a breakpoint
      might fail when kgdb restores the system to the run state.  The
      intent of this patch is to first get the basic functionality of
      hw breakpoints working and leave it to the person debugging the
      kernel to understand what hw breakpoints are in use and what
      restrictions have been imposed as a result.  Breakpoint
      constraints will be dealt with in a future patch.
      
      While atomic, the x86-specific kgdb code will call
      arch_uninstall_hw_breakpoint() and arch_install_hw_breakpoint()
      to manage the cpu-specific hw breakpoints.
      
      The net result of these changes is that kgdb can use the same
      pool of hw_breakpoints that is used by the perf event API, but
      neither knows about the other's future reservations for the
      available hw breakpoint slots.
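      
      A rough sketch of the per-cpu sync loop this implies, using the
      arch_install_hw_breakpoint()/arch_uninstall_hw_breakpoint() API;
      the breakinfo[] bookkeeping shown here is purely illustrative:
      
        /* called while atomic, on each cpu, to sync the 4 hw slots */
        static void kgdb_correct_hw_break(void)
        {
                int i, cpu = raw_smp_processor_id();
      
                for (i = 0; i < 4; i++) {
                        /* one pre-allocated perf event per slot, per cpu */
                        struct perf_event *bp =
                                *per_cpu_ptr(breakinfo[i].pev, cpu);
      
                        if (breakinfo[i].enabled)
                                arch_install_hw_breakpoint(bp);
                        else
                                arch_uninstall_hw_breakpoint(bp);
                }
        }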
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: kgdb-bugreport@lists.sourceforge.net
      Cc: K.Prasad <prasad@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: torvalds@linux-foundation.org
      LKML-Reference: <1264719883-7285-2-git-send-email-jason.wessel@windriver.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  3. 28 Jan, 2010 (3 commits)
  4. 27 Jan, 2010 (4 commits)
    • lockdep: Fix check_usage_backwards() error message · 48d50674
      Authored by Oleg Nesterov
      Lockdep has found the real bug, but the output doesn't look right to me:
      
      > =========================================================
      > [ INFO: possible irq lock inversion dependency detected ]
      > 2.6.33-rc5 #77
      > ---------------------------------------------------------
      > emacs/1609 just changed the state of lock:
      >  (&(&tty->ctrl_lock)->rlock){+.....}, at: [<ffffffff8127c648>] tty_fasync+0xe8/0x190
      > but this lock took another, HARDIRQ-unsafe lock in the past:
      >  (&(&sighand->siglock)->rlock){-.....}
      
      "HARDIRQ-unsafe" and "this lock took another" looks wrong, afaics.
      
      >   ... key      at: [<ffffffff81c054a4>] __key.46539+0x0/0x8
      >   ... acquired at:
      >    [<ffffffff81089af6>] __lock_acquire+0x1056/0x15a0
      >    [<ffffffff8108a0df>] lock_acquire+0x9f/0x120
      >    [<ffffffff81423012>] _raw_spin_lock_irqsave+0x52/0x90
      >    [<ffffffff8127c1be>] __proc_set_tty+0x3e/0x150
      >    [<ffffffff8127e01d>] tty_open+0x51d/0x5e0
      
      The stack-trace shows that this lock (ctrl_lock) was taken under
      ->siglock (which is hopefully irq-safe).
      
      This is a clear typo in check_usage_backwards(), where we tell the
      fancy print routine that we are going forwards.
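      
      The fix is, in essence, flipping the "forwards" flag that
      check_usage_backwards() passes to the printing routine (a sketch
      against 2.6.33-era kernel/lockdep.c; the exact signature may
      differ):
      
        -       return print_irq_inversion_bug(curr, &root, target_entry,
        -                                      this, 1, irqclass);
        +       return print_irq_inversion_bug(curr, &root, target_entry,
        +                                      this, 0, irqclass);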
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20100126181641.GA10460@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • tracing/documentation: Cover new frame pointer semantics · 03688970
      Authored by Mike Frysinger
      Update the graph tracer examples to cover the new frame pointer semantics
      (in terms of passing it along).  Move the HAVE_FUNCTION_GRAPH_FP_TEST docs
      out of the Kconfig, into the right place, and expand on the details.
      Signed-off-by: Mike Frysinger <vapier@gentoo.org>
      LKML-Reference: <1264165967-18938-1-git-send-email-vapier@gentoo.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: Check for end of page in iterator · 3c05d748
      Authored by Steven Rostedt
      If the iterator comes to an empty page for some reason, or if
      the page is emptied by a consuming read, the iterator code does
      not currently check whether the iterator is past the contents of
      the page, and may return a false entry.
      
      This patch adds a check to the ring buffer iterator to test if the
      current page has been completely read and sets the iterator to the
      next page if necessary.
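      
      A sketch of the kind of check this adds; the names follow the
      2.6.33-era kernel/trace/ring_buffer.c iterator code, but treat
      the details as illustrative rather than the exact patch:
      
        /* in rb_iter_peek(): if everything on this page has already
         * been read, step the iterator to the next page and retry */
        if (iter->head >= rb_page_size(iter->head_page)) {
                rb_inc_iter(iter);
                goto again;
        }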
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: Check if ring buffer iterator has stale data · 492a74f4
      Authored by Steven Rostedt
      Usually, reads of the ring buffer are performed by a single task.
      There are two types of reads from the ring buffer.
      
      One is a consuming read which will consume the entry that was read
      and the next read will be the entry that follows.
      
      The other is an iterator that will let the user read the contents of
      the ring buffer without modifying it. When an iterator is allocated,
      writes to the ring buffer are disabled to protect the iterator.
      
      The problem arises when consuming reads happen while an iterator
      is allocated: specifically, the kind of read that swaps out an
      entire page (used by splice) and replaces it with a fresh one. If
      the iterator is on the page that is swapped out, then the next
      read may read from this swapped-out page and return garbage.
      
      This patch adds a check when reading the iterator to make sure that
      the iterator contents are still valid. If a consuming read has taken
      place, the iterator is reset.
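      
      Conceptually, the iterator caches which reader page (and how many
      consuming reads) it last saw and resets itself if either changed.
      A sketch; the cache_read/cache_reader_page fields follow the
      2.6.33-era ring_buffer.c, but details are illustrative:
      
        /* in rb_iter_peek(): a consuming read may have swapped the
         * reader page out from under us since the iterator was set */
        if (unlikely(iter->cache_read != cpu_buffer->read ||
                     iter->cache_reader_page != cpu_buffer->reader_page))
                rb_iter_reset(iter);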
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  5. 26 Jan, 2010 (2 commits)
    • clocksource: Prevent potential kgdb dead lock · 7b7422a5
      Authored by Thomas Gleixner
      commit 0f8e8ef7 (clocksource: Simplify clocksource watchdog resume
      logic) introduced a potential kgdb deadlock: when the kernel is
      stopped by kgdb inside code which holds watchdog_lock, kgdb
      deadlocks in clocksource_resume_watchdog().
      
      clocksource_resume_watchdog() is called from kgdb via
      clocksource_touch_watchdog() to prevent the clocksource watchdog
      from marking the TSC unstable after the kernel has been stopped.
      
      Solve this by replacing the spin_lock with a spin_trylock and
      simply returning when the lock is already held. Not resetting the
      watchdog might result in the TSC being marked unstable, but that's
      an acceptable penalty for using kgdb.
      
      Timekeeping is easily screwed up by kgdb anyway, when the system
      uses either jiffies or a clock source which wraps in short
      intervals (e.g. pm_timer wraps about every 4.6s), so we really do
      not have to worry about the occasional TSC-marked-unstable side
      effect.
      
      The second caller of clocksource_resume_watchdog() is
      clocksource_resume(). The trylock is safe here as well because the
      system is UP at this point, interrupts are disabled and nothing
      else can hold watchdog_lock.
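      
      The resulting function is essentially the following (a sketch
      assuming the watchdog_lock and clocksource_reset_watchdog() from
      kernel/time/clocksource.c):
      
        static void clocksource_resume_watchdog(void)
        {
                unsigned long flags;
      
                /* may be called from kgdb/atomic context: never spin;
                 * if someone already holds watchdog_lock, skip the
                 * reset and just return */
                if (!spin_trylock_irqsave(&watchdog_lock, flags))
                        return;
                clocksource_reset_watchdog();
                spin_unlock_irqrestore(&watchdog_lock, flags);
        }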
      Reported-by: Jason Wessel <jason.wessel@windriver.com>
      LKML-Reference: <1264480000-6997-4-git-send-email-jason.wessel@windriver.com>
      Cc: kgdb-bugreport@lists.sourceforge.net
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • tracing: Prevent kernel oops with corrupted buffer · 74bf4076
      Authored by Steven Rostedt
      If the contents of the ftrace ring buffer get corrupted and the
      trace file is read, it can cause a kernel oops (usually just
      killing the user task thread). This is caused by the way the pid
      found in the buffer is checked: a negative pid is still used to
      index the cmdline cache array, which could point to an invalid
      address.
      
      The simple fix is to test for negative PIDs.
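      
      In the spirit of the fix, the cmdline lookup rejects out-of-range
      pids before using them as an array index (a sketch of
      trace_find_cmdline() in kernel/trace/trace.c; the exact guard in
      the patch may differ):
      
        /* corrupted buffers can hand us a bogus pid; never let it
         * index the saved-cmdlines map */
        if (pid < 0 || pid > PID_MAX_DEFAULT) {
                strcpy(comm, "<XXX>");
                return;
        }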
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  6. 22 Jan, 2010 (1 commit)
    • sched: Fix fork vs hotplug vs cpuset namespaces · fabf318e
      Authored by Peter Zijlstra
      There are a number of issues:
      
      1) TASK_WAKING vs cgroup_clone (cpusets)
      
      copy_process():
      
        sched_fork()
          child->state = TASK_WAKING; /* waiting for wake_up_new_task() */
        if (current->nsproxy != p->nsproxy)
           ns_cgroup_clone()
             cgroup_clone()
               mutex_lock(inode->i_mutex)
               mutex_lock(cgroup_mutex)
               cgroup_attach_task()
                 ss->can_attach()
                 ss->attach() [ -> cpuset_attach() ]
                   cpuset_attach_task()
                     set_cpus_allowed_ptr();
                       while (child->state == TASK_WAKING)
                         cpu_relax();
      will deadlock the system.
      
      
      2) cgroup_clone (cpusets) vs copy_process
      
      So even if the above would work we still have:
      
      copy_process():
      
        if (current->nsproxy != p->nsproxy)
           ns_cgroup_clone()
             cgroup_clone()
               mutex_lock(inode->i_mutex)
               mutex_lock(cgroup_mutex)
               cgroup_attach_task()
                 ss->can_attach()
                 ss->attach() [ -> cpuset_attach() ]
                   cpuset_attach_task()
                     set_cpus_allowed_ptr();
        ...
      
        p->cpus_allowed = current->cpus_allowed
      
      
      3) fork() vs hotplug
      
        if we unplug the child's cpu after the sanity check, when the
        child has been attached to the task_list but before
        wake_up_new_task(), shit will meet with fan.
      
      Solve all these issues by moving fork cpu selection into
      wake_up_new_task().
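      
      A rough sketch of the resulting shape of wake_up_new_task(); the
      names follow 2.6.33-era kernel/sched.c, but the body here is
      illustrative and heavily elided:
      
        void wake_up_new_task(struct task_struct *p, unsigned long clone_flags)
        {
                unsigned long flags;
                struct rq *rq;
                int cpu = get_cpu();
      
                /* fork balancing moved here from sched_fork(): by now
                 * cpusets and hotplug can no longer race with the cpu
                 * selection */
                p->state = TASK_WAKING;
                cpu = select_task_rq(p, SD_BALANCE_FORK, 0);
                set_task_cpu(p, cpu);
      
                rq = task_rq_lock(p, &flags);
                /* ... activate the task on cpu's runqueue ... */
                task_rq_unlock(rq, &flags);
                put_cpu();
        }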
      Reported-by: Serge E. Hallyn <serue@us.ibm.com>
      Tested-by: Serge E. Hallyn <serue@us.ibm.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1264106190.4283.1314.camel@laptop>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  7. 21 Jan, 2010 (4 commits)
  8. 18 Jan, 2010 (1 commit)
  9. 17 Jan, 2010 (5 commits)
  10. 15 Jan, 2010 (6 commits)
  11. 13 Jan, 2010 (1 commit)
    • futexes: Remove rw parameter from get_futex_key() · 7485d0d3
      Authored by KOSAKI Motohiro
      Currently, futexes have two problems:
      
      A) The current futex code doesn't handle private file mappings properly.
      
      get_futex_key() uses PageAnon() to distinguish file and
      anon pages, which can cause the following bad scenario:
      
        1) thread-A calls futex(private-mapping, FUTEX_WAIT); it
           sleeps on the file mapping object.
        2) thread-B writes to the variable, triggering a COW.
        3) thread-B calls futex(private-mapping, FUTEX_WAKE); it
           wakes up a blocked thread on the anonymous page (but
           nothing is waiting there).
      
      B) The current futex code doesn't handle the zero page properly.
      
      Read-mode get_user_pages() can return the zero page, but the
      current futex code doesn't handle it at all; the zero page then
      causes an infinite loop internally.
      
      The solution is to always use write-mode get_user_pages() for
      the page lookup. This prevents the lookup from finding either
      the file page of a private mapping or the zero page.
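      
      Roughly, the lookup in get_futex_key() becomes a forced write
      fault (a sketch; the surrounding retry/error handling is elided):
      
        /* write=1 forces the COW up front and guarantees the zero
         * page is never returned for the lookup */
        err = get_user_pages_fast(address, 1, 1 /* write */, &page);
        if (err < 0)
                return err;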
      
      Performance concerns:
      
      Probably very little, because glibc always initializes futex
      variables before calling futex(), so glibc users never see the
      overhead of this patch.
      
      Compatibility concerns:
      
      This patch has few compatibility issues. After this patch,
      FUTEX_WAIT requires writable access to futex variables (read-only
      mappings now return EFAULT). In practice this is not a problem:
      glibc always initializes futex variables explicitly, and nobody
      uses read-only mappings.
      Reported-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Acked-by: Darren Hart <dvhltc@us.ibm.com>
      Cc: <stable@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Ulrich Drepper <drepper@gmail.com>
      LKML-Reference: <20100105162633.45A2.A69D9226@jp.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  12. 12 Jan, 2010 (3 commits)
    • kernel/signal.c: fix kernel information leak with print-fatal-signals=1 · b45c6e76
      Authored by Andi Kleen
      When print-fatal-signals is enabled it's possible to dump any memory
      reachable by the kernel to the log by simply jumping to that address from
      user space.
      
      Or crash the system if there's some hardware with read side effects.
      
      The fatal signals handler will dump 16 bytes at the execution address,
      which is fully controlled by ring 3.
      
      In addition, when something jumps to an unmapped address there
      will be up to 16 additional useless page faults, which might be
      slow (and at the very least are not efficient).
      
      Fortunately this option is off by default and only exists on i386.
      
      Fix it by checking for kernel addresses and also by stopping when
      a page fault occurs.
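      
      The dump loop then looks roughly like this (a sketch of the code
      in kernel/signal.c's print_fatal_signal(); the user_mode() guard
      and the get_user() error check are the two points of the fix):
      
        if (user_mode(regs)) {          /* never dump kernel addresses */
                for (i = 0; i < 16; i++) {
                        unsigned char insn;
      
                        /* get_user() fails gracefully on a fault;
                         * stop at the first unreadable byte */
                        if (get_user(insn, (unsigned char *)(regs->ip + i)))
                                break;
                        printk("%02x ", insn);
                }
        }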
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cgroups: fix 2.6.32 regression causing BUG_ON() in cgroup_diput() · bd4f490a
      Authored by Dave Anderson
      The LTP cgroup test suite generates a "kernel BUG at kernel/cgroup.c:790!"
      here in cgroup_diput():
      
                       /*
                        * if we're getting rid of the cgroup, refcount should ensure
                        * that there are no pidlists left.
                        */
                       BUG_ON(!list_empty(&cgrp->pidlists));
      
      The cgroup pidlist rework in 2.6.32 generates the BUG_ON, which is caused
      when pidlist_array_load() calls cgroup_pidlist_find():
      
      (1) if a matching cgroup_pidlist is found, it down_write's the mutex of the
           pre-existing cgroup_pidlist, and increments its use_count.
      (2) if no matching cgroup_pidlist is found, then a new one is allocated, it
           down_write's its mutex, and the use_count is set to 0.
      (3) the matching, or new, cgroup_pidlist gets returned back to pidlist_array_load(),
           which increments its use_count -- regardless of whether it is new or
           pre-existing -- and up_write's the mutex.
      
      So if a matching list is ever encountered by cgroup_pidlist_find() during
      the life of a cgroup directory, it results in an inflated use_count value,
      preventing it from ever getting released by cgroup_release_pid_array().
      Then if the directory is subsequently removed, cgroup_diput() hits the
      BUG_ON() when it finds that the directory's cgroup is still populated with
      a pidlist.
      
      The patch simply removes the use_count increment when a matching pidlist
      is found by cgroup_pidlist_find(), because it gets bumped by the calling
      pidlist_array_load() function while still protected by the list's mutex.
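      
      The fix, in essence (a sketch of the matching branch in
      cgroup_pidlist_find(), 2.6.32-era kernel/cgroup.c):
      
        if (l->key.type == type && l->key.ns == ns) {
                /* make sure l doesn't vanish out from under us */
                down_write(&l->mutex);
                mutex_unlock(&cgrp->pidlist_mutex);
                /* no l->use_count++ here anymore: the caller,
                 * pidlist_array_load(), bumps it for both the new
                 * and the pre-existing case */
                return l;
        }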
      Signed-off-by: Dave Anderson <anderson@redhat.com>
      Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Ben Blum <bblum@andrew.cmu.edu>
      Cc: Paul Menage <menage@google.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kmod: fix resource leak in call_usermodehelper_pipe() · 8767ba27
      Authored by Masami Hiramatsu
      Fix a resource leak (the write-pipe file) in call_usermodehelper_pipe().
      
      When call_usermodehelper_exec() fails, the write-pipe file is
      already open and call_usermodehelper_pipe() just returns an error.
      Since it is hard for the caller to determine whether the error
      occurred while opening the pipe or while executing the helper, the
      caller cannot close the pipe itself.
      
      I found this resource leak while testing coredump.  You can watch
      the resource leak as follows:
      
      $ echo "|nocommand" > /proc/sys/kernel/core_pattern
      $ ulimit -c unlimited
      $ while [ 1 ]; do ./segv; done &> /dev/null &
      $ cat /proc/meminfo (<- repeat it)
      
      where segv.c is:
      //-----
      int main(void)
      {
              char *p = 0;
              *p = 1;         /* NULL dereference: force SIGSEGV and a coredump */
              return 0;
      }
      //-----
      
      This patch closes the write-pipe file if call_usermodehelper_exec() fails.
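      
      The fix looks roughly like this (a sketch of the tail of
      call_usermodehelper_pipe() in kernel/kmod.c):
      
        ret = call_usermodehelper_exec(sub_info, UMH_WAIT_EXEC);
        if (ret < 0)    /* helper failed to execute: close the pipe
                         * ourselves, since the caller cannot */
                filp_close(*filp, NULL);
      
        return ret;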
      Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 07 Jan, 2010 (2 commits)
    • ring-buffer: Add rb_list_head() wrapper around new reader page next field · 0e1ff5d7
      Authored by Steven Rostedt
      If the very unlikely case happens where the writer moves the head
      by one between where the head page is read and where the new
      reader page is assigned, _and_ the writer then writes and wraps
      the entire ring buffer so that the head page is back to what was
      originally read as the head page, then the page to be swapped
      will have a corrupted next pointer.
      
      The simple solution is to wrap the assignment of the next pointer
      with rb_list_head().
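      
      For reference, rb_list_head() just masks off the flag bits that
      the lockless ring buffer stores in the low bits of a next pointer
      (a sketch following 2.6.33-era kernel/trace/ring_buffer.c):
      
        /* list->next may carry RB_PAGE_HEAD/RB_PAGE_UPDATE flags in
         * its two low bits; strip them before using it as a pointer */
        static struct list_head *rb_list_head(struct list_head *list)
        {
                unsigned long val = (unsigned long)list;
      
                return (struct list_head *)(val & ~RB_FLAG_MASK);
        }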
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: Wrap a list.next reference with rb_list_head() · 5ded3dc6
      Authored by David Sharp
      This reference at the end of rb_get_reader_page() was causing off-by-one
      writes to the prev pointer of the page after the reader page when that
      page is the head page, and therefore the reader page has the RB_PAGE_HEAD
      flag in its list.next pointer. This eventually results in a GPF in a
      subsequent call to rb_set_head_page() (usually from rb_get_reader_page())
      when that prev pointer is dereferenced. The dereferenced register
      would characteristically have an address that appears shifted left
      by one byte (e.g., ffxxxxxxxxxxxxyy instead of ffffxxxxxxxxxxxx),
      due to being written at an address one byte too high.
      Signed-off-by: David Sharp <dhsharp@google.com>
      LKML-Reference: <1262826727-9090-1-git-send-email-dhsharp@google.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  14. 06 Jan, 2010 (1 commit)
  15. 05 Jan, 2010 (2 commits)
  16. 01 Jan, 2010 (1 commit)
  17. 31 Dec, 2009 (1 commit)