1. 13 Jan 2010 (2 commits)
    • rcu: Adjust force_quiescent_state() locking, step 2 · 559569ac
      Committed by Paul E. McKenney
      This patch releases rnp->lock after the end of
      force_quiescent_state()'s switch statement.  This is a second
      step towards prohibiting starting grace periods while
      force_quiescent_state() is executing, which will reduce the
      number and complexity of races that force_quiescent_state() is
      involved in.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12626465501994-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Adjust force_quiescent_state() locking, step 1 · f96e9232
      Committed by Paul E. McKenney
      This causes rnp->lock to be held on entry to
      force_quiescent_state()'s switch statement.  This is a first
      step towards prohibiting starting grace periods while
      force_quiescent_state() is executing, which will reduce the
      number and complexity of races that force_quiescent_state() is
      involved in.
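
      Taken together with step 2 above, the net effect is that the
      switch statement runs under rnp->lock.  A minimal sketch of the
      assumed shape (illustrative only, not the literal kernel code;
      individual cases may still drop and retake the lock internally):

      //-----
      spin_lock_irqsave(&rnp->lock, flags);
      switch (rsp->signaled) {
      case RCU_GP_IDLE:
      case RCU_GP_INIT:
              break;          /* no grace period to force */
      case RCU_SAVE_DYNTICK:
              /* snapshot dyntick-idle state, lock held on entry */
              break;
      case RCU_FORCE_QS:
              /* force quiescent states on holdout CPUs */
              break;
      }
      spin_unlock_irqrestore(&rnp->lock, flags);  /* released after the switch */
      //-----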
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12626465501455-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 12 Jan 2010 (3 commits)
    • kernel/signal.c: fix kernel information leak with print-fatal-signals=1 · b45c6e76
      Committed by Andi Kleen
      When print-fatal-signals is enabled it's possible to dump any memory
      reachable by the kernel to the log by simply jumping to that address from
      user space.
      
      Or crash the system if there's some hardware with read side effects.
      
      The fatal signals handler will dump 16 bytes at the execution address,
      which is fully controlled by ring 3.
      
      In addition, when something jumps to an unmapped address there will be up
      to 16 additional useless page faults, which can be slow (and at the very
      least is not efficient).
      
      Fortunately this option is off by default and only there on i386.
      
      Fix it by checking for kernel addresses and also by stopping when there's
      a page fault.
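
      A minimal sketch of the assumed fixed dump loop (variable names
      illustrative; get_user() rejects kernel addresses and fails on
      unmapped pages):

      //-----
      for (i = 0; i < 16; i++) {
              unsigned char insn;

              if (get_user(insn, (unsigned char __user *)(ip + i)))
                      break;  /* kernel address or page fault: stop */
              printk(KERN_CONT "%02x ", insn);
      }
      //-----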
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cgroups: fix 2.6.32 regression causing BUG_ON() in cgroup_diput() · bd4f490a
      Committed by Dave Anderson
      The LTP cgroup test suite generates a "kernel BUG at kernel/cgroup.c:790!"
      here in cgroup_diput():
      
                       /*
                        * if we're getting rid of the cgroup, refcount should ensure
                        * that there are no pidlists left.
                        */
                       BUG_ON(!list_empty(&cgrp->pidlists));
      
      The cgroup pidlist rework in 2.6.32 generates the BUG_ON, which is caused
      when pidlist_array_load() calls cgroup_pidlist_find():
      
      (1) if a matching cgroup_pidlist is found, it down_write's the mutex of the
           pre-existing cgroup_pidlist, and increments its use_count.
      (2) if no matching cgroup_pidlist is found, then a new one is allocated, it
           down_write's its mutex, and the use_count is set to 0.
      (3) the matching, or new, cgroup_pidlist gets returned back to pidlist_array_load(),
           which increments its use_count -- regardless of whether it is new or
           pre-existing -- and up_write's the mutex.
      
      So if a matching list is ever encountered by cgroup_pidlist_find() during
      the life of a cgroup directory, it results in an inflated use_count value,
      preventing it from ever getting released by cgroup_release_pid_array().
      Then if the directory is subsequently removed, cgroup_diput() hits the
      BUG_ON() when it finds that the directory's cgroup is still populated with
      a pidlist.
      
      The patch simply removes the use_count increment when a matching pidlist
      is found by cgroup_pidlist_find(), because it gets bumped by the calling
      pidlist_array_load() function while still protected by the list's mutex.
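
      A rough sketch of the assumed resulting protocol (names follow the
      description above):

      //-----
      /* cgroup_pidlist_find(): find or allocate the list and down_write
       * its mutex, but leave use_count alone -- the caller owns that. */
      l = cgroup_pidlist_find(cgrp, type);
      /* ... pidlist_array_load() fills or reuses the pid array ... */
      l->use_count++;         /* exactly one bump, new or pre-existing */
      up_write(&l->mutex);
      //-----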
      Signed-off-by: Dave Anderson <anderson@redhat.com>
      Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Ben Blum <bblum@andrew.cmu.edu>
      Cc: Paul Menage <menage@google.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kmod: fix resource leak in call_usermodehelper_pipe() · 8767ba27
      Committed by Masami Hiramatsu
      Fix resource (write-pipe file) leak in call_usermodehelper_pipe().
      
      When call_usermodehelper_exec() fails, the write-pipe file is left open
      and call_usermodehelper_pipe() just returns an error.  Since it is hard
      for the caller to determine whether the error occurred while opening the
      pipe or while executing the helper, the caller cannot close the pipe by
      itself.
      
      I found this resource leak while testing coredump.  You can watch the
      resource leak as below:
      
      $ echo "|nocommand" > /proc/sys/kernel/core_pattern
      $ ulimit -c unlimited
      $ while [ 1 ]; do ./segv; done &> /dev/null &
      $ cat /proc/meminfo (<- repeat it)
      
      where segv.c is;
      //-----
      /* Deliberately write through a NULL pointer to raise SIGSEGV and
       * trigger the "|nocommand" core_pattern helper. */
      int main(void) {
              char *p = 0;
              *p = 1;
      }
      //-----
      
      This patch closes the write-pipe file if call_usermodehelper_exec() fails.
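
      The assumed shape of the fix at the tail of call_usermodehelper_pipe():

      //-----
      ret = call_usermodehelper_exec(sub_info, UMH_WAIT_EXEC);
      if (ret < 0)                    /* helper failed to execute: */
              filp_close(f, NULL);    /* close the write pipe ourselves */
      else
              *filp = f;
      return ret;
      //-----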
      Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 06 Jan 2010 (1 commit)
  4. 31 Dec 2009 (1 commit)
  5. 30 Dec 2009 (6 commits)
  6. 28 Dec 2009 (2 commits)
  7. 24 Dec 2009 (1 commit)
    • SYSCTL: Print binary sysctl warnings (nearly) only once · 4440095c
      Committed by Andi Kleen
      When printing legacy sysctls, print the warning message
      for each of them only once.  This way there is a guarantee
      that the syslog won't be flooded by any sane program.
      
      The original attempt at this made the tables non const and stored
      the flag inline.
      
      Linus suggested using a separate hash table for this, this is based on a
      code snippet from him.
      
      The hash implies this is not exact and can sometimes fail to print a
      new sysctl due to a hash collision, but in practice this should not
      be a problem.
      
      I used an FNV32 hash over the binary string with a 32-byte bitmap.  This
      gives relatively few collisions when all the predefined binary sysctls
      are hashed:
      
      size 256
      bucket length    number of buckets
      0                25
      1                67
      2                88
      3                47
      4                22
      5                6
      6                1
      The worst case is a single bucket where 6 hash values collide.
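
      A condensed sketch of the assumed scheme (identifiers illustrative):
      an FNV-1 hash over the binary name into a 256-bit bitmap, with
      test_and_set_bit() making the warned-before check race-free:

      //-----
      #define WARN_ONCE_HASH_BITS 8   /* 256 bits == 32 bytes */
      static DECLARE_BITMAP(warn_once_bitmap, 1 << WARN_ONCE_HASH_BITS);

      static u32 warn_once_hash(const int *name, int nlen)
      {
              const unsigned char *p = (const unsigned char *)name;
              u32 hash = 2166136261U;         /* FNV-1 32-bit offset basis */
              size_t i;

              for (i = 0; i < nlen * sizeof(*name); i++)
                      hash = (hash * 16777619U) ^ p[i];
              return hash & ((1 << WARN_ONCE_HASH_BITS) - 1);
      }

      /* Returns true only the first time a given (hashed) name is seen. */
      static bool warn_once(const int *name, int nlen)
      {
              return !test_and_set_bit(warn_once_hash(name, nlen),
                                       warn_once_bitmap);
      }
      //-----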
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
  8. 23 Dec 2009 (10 commits)
  9. 22 Dec 2009 (2 commits)
  10. 21 Dec 2009 (2 commits)
  11. 20 Dec 2009 (2 commits)
    • fix more leaks in audit_tree.c tag_chunk() · b4c30aad
      Committed by Al Viro
      Several leaks in audit_tree didn't get caught by commit
      318b6d3d, including the leak on normal
      exit in case of multiple rules referring to the same chunk.
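
      The normal-exit leak is assumed to be the duplicate-rule path; a
      sketch of the corresponding check in tag_chunk() (approximate, not
      the literal patch):

      //-----
      /* If this tree already owns a slot in the existing chunk, drop the
       * replacement we speculatively allocated instead of leaking it. */
      for (n = 0; n < old->count; n++) {
              if (old->owners[n].owner == tree) {
                      spin_unlock(&hash_lock);
                      free_chunk(chunk);
                      return 0;
              }
      }
      //-----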
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fix braindamage in audit_tree.c untag_chunk() · 6f5d5114
      Committed by Al Viro
      ... aka "Al had badly fscked up when writing that thing and nobody
      noticed until Eric had fixed leaks that used to mask the breakage".
      
      The function essentially creates a copy of the old array sans one element
      and replaces the references to elements of the original (they are on
      cyclic lists) with those to the corresponding elements of the new one.
      After that the old one is fair game for freeing.
      
      First of all, there's a dumb braino: when we get to list_replace_init we
      use the indices for the wrong arrays - position in the new one with the
      old array and vice versa.
      
      Another bug is more subtle - the termination condition is wrong if the
      element to be excluded happens to be the last one.  We shouldn't go
      until we fill the new array, we should go until we've finished the old
      one.  Otherwise the element we are trying to kill will remain on the
      cyclic lists...
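
      A sketch of the assumed corrected copy loop (simplified; "victim" is
      the element being removed):

      //-----
      for (i = j = 0; j < old->count; j++) {
              if (&old->owners[j] == victim)
                      continue;       /* skip it; keep draining the old array */
              new->owners[i].owner = old->owners[j].owner;
              new->owners[i].index = old->owners[j].index;
              /* indices the right way around: old element out, new one in */
              list_replace_init(&old->owners[j].list, &new->owners[i].list);
              i++;
      }
      //-----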
      
      That crap used to be masked by several leaks, so it was not quite
      trivial to hit.  Eric had fixed some of those leaks a while ago and the
      shit had hit the fan...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 18 Dec 2009 (3 commits)
  13. 17 Dec 2009 (5 commits)
    • sched: Fix broken assertion · 077614ee
      Committed by Peter Zijlstra
      There's a preemption race in the set_task_cpu() debug check: when we get
      preempted after setting task->state, we are still on the rq proper but
      fail the test.
      
      Check for preempted tasks, since those are always on the RQ.
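
      The assumed fix widens the debug assertion in set_task_cpu() to
      tolerate PREEMPT_ACTIVE:

      //-----
      WARN_ON_ONCE(p->state != TASK_RUNNING && p->state != TASK_WAKING &&
                   !(task_thread_info(p)->preempt_count & PREEMPT_ACTIVE));
      //-----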
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20091217121830.137155561@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf events: Don't report side-band events on each cpu for per-task-per-cpu events · 5d27c23d
      Committed by Peter Zijlstra
      Acme noticed that his FORK/MMAP numbers were inflated by about
      the same factor as his cpu-count.
      
      This led to the discovery of a few more sites that need to
      respect the event->cpu filter.
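
      Each such site is assumed to gain a filter of this shape before
      emitting a side-band (FORK/MMAP/COMM) record:

      //-----
      /* Per-task-per-cpu events only want records from their own cpu. */
      if (event->cpu != -1 && event->cpu != smp_processor_id())
              return 0;
      //-----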
      Reported-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <20091217121830.215333434@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf events, x86/stacktrace: Make stack walking optional · 61c1917f
      Committed by Frederic Weisbecker
      The current print_context_stack helper that does the stack
      walking job is good for usual stacktraces, as it walks through
      the whole stack and reports even addresses that look unreliable,
      which is nice when we don't have frame pointers, for example.
      
      But we have users like perf that only require reliable
      stacktraces, and those may want a more adapted stack walker, so
      let's make this function a callback in stacktrace_ops that users
      can tune to their needs.
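
      The assumed shape: stacktrace_ops grows a walk_stack callback with
      print_context_stack's signature, so perf can install a walker that
      only follows reliable frames:

      //-----
      struct stacktrace_ops {
              /* ... existing address/stack/warning callbacks ... */
              unsigned long (*walk_stack)(struct thread_info *tinfo,
                                          unsigned long *stack,
                                          unsigned long bp,
                                          const struct stacktrace_ops *ops,
                                          void *data,
                                          unsigned long *end,
                                          int *graph);
      };
      //-----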
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1261024834-5336-1-git-send-regression-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Teach might_sleep() about preemptible RCU · 234da7bc
      Committed by Frederic Weisbecker
      In practice, it is harmless to voluntarily sleep in an
      rcu_read_lock() section if we are running under preemptible RCU,
      but it is illegal if we build a kernel running non-preemptible RCU.
      
      Currently, might_sleep() doesn't notice sleepable operations
      under rcu_read_lock() sections if we are running under
      preemptible RCU, because preempt_count() is left untouched after
      rcu_read_lock() in this case.  But we want developers who test
      their changes under such a config to notice the "sleeping while
      atomic" issues.
      
      So we add rcu_read_lock_nesting to preempt_count() in
      might_sleep() checks.
      
      [ v2: Handle rcu-tiny ]
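
      The assumed core of the change (simplified), with rcu_preempt_depth()
      returning rcu_read_lock_nesting under preemptible RCU and 0 for the
      non-preemptible flavours, including tiny:

      //-----
      static int preempt_count_equals(int preempt_offset)
      {
              int nested = (preempt_count() & ~PREEMPT_ACTIVE)
                           + rcu_preempt_depth();

              return nested == preempt_offset;
      }
      //-----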
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <1260991265-8451-1-git-send-regression-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • kprobe-tracer: Check new event/group name · 6f3cf440
      Committed by Masami Hiramatsu
      Check that a new event/group name uses the same syntax as a C symbol.
      In other words, the name is checked just like other tracepoint event
      names.
      
      This prevents users from creating an event with a useless name (e.g.
      foo|bar, foo*bar).
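
      A sketch of the assumed checker (hypothetical helper name):

      //-----
      /* Accept only [A-Za-z_][A-Za-z0-9_]*, i.e. a valid C symbol. */
      static int is_good_name(const char *name)
      {
              if (!isalpha(*name) && *name != '_')
                      return 0;
              while (*++name != '\0') {
                      if (!isalnum(*name) && *name != '_')
                              return 0;
              }
              return 1;
      }
      //-----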
      Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Jim Keniston <jkenisto@us.ibm.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: K.Prasad <prasad@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: systemtap <systemtap@sources.redhat.com>
      Cc: DLE <dle-develop@lists.sourceforge.net>
      LKML-Reference: <20091216222408.14459.68790.stgit@dhcp-100-2-132.bos.redhat.com>
      [ v2: minor cleanups ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>