1. 11 Oct, 2007 (27 commits)
  2. 09 Oct, 2007 (1 commit)
  3. 08 Oct, 2007 (1 commit)
    • Don't do load-average calculations at even 5-second intervals · 0c2043ab
      Linus Torvalds committed
      It turns out that there are a few other five-second timers in the
      kernel, and if the timers get in sync, the load-average can get
      artificially inflated by events that just happen to coincide.
      
      So just offset the load-average calculation by one timer tick.
      
      Noticed by Anders Boström, for whom the coincidence started triggering
      on one of his machines with the JBD jiffies rounding code (JBD is one of
      the subsystems that also ends up using a 5-second timer by default).
      Tested-by: Anders Boström <anders@bostrom.dyndns.org>
      Cc: Chuck Ebbert <cebbert@redhat.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0c2043ab
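      
      A minimal sketch of the idea behind the offset (the tick rate and the
      helper name here are illustrative, not the kernel's exact code):
      sampling every 5*HZ ticks stays phase-locked with the other
      five-second timers, while a period of 5*HZ + 1 drifts past them.
      
         #define HZ        250            /* illustrative tick rate */
         #define LOAD_FREQ (5 * HZ + 1)   /* 5 sec + 1 tick breaks the phase lock */
         
         static int count = LOAD_FREQ;
         
         /* Called once per timer tick; returns 1 when a load sample is due.
          * Because LOAD_FREQ is not an exact multiple of the other 5-second
          * periods, the sampling instant drifts relative to them. */
         static int load_sample_due(void)
         {
                 if (--count <= 0) {
                         count += LOAD_FREQ;
                         return 1;
                 }
                 return 0;
         }
      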
  4. 27 Sep, 2007 (1 commit)
  5. 21 Sep, 2007 (1 commit)
    • signalfd simplification · b8fceee1
      Davide Libenzi committed
      This simplifies the signalfd code by no longer keeping it attached to
      the sighand for its entire lifetime.
      
      This way, the signalfd remains attached to the sighand only during
      poll(2) (and select and epoll) and read(2).  This also allows removing
      all the custom "tsk == current" checks in kernel/signal.c, since
      dequeue_signal() will only be called by "current".
      
      I think this is also what Ben was suggesting some time ago.
      
      The externally visible effect is that a thread can extract only its own
      private signals and the group ones.  I think this is acceptable
      behaviour, in that those are the signals the thread would be able to
      fetch without signalfd.
      Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b8fceee1
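      
      For reference, a minimal user-space sketch of the contract described
      above (error handling trimmed): the thread blocks SIGUSR1 so it stays
      queued, and read(2) is the only point where the fd attaches to the
      sighand and dequeues the thread's private or group signals.
      
         #include <sys/signalfd.h>
         #include <signal.h>
         #include <stdio.h>
         #include <unistd.h>
         
         int main(void)
         {
                 sigset_t mask;
                 struct signalfd_siginfo fdsi;
                 int sfd;
         
                 sigemptyset(&mask);
                 sigaddset(&mask, SIGUSR1);
                 /* Block normal delivery so the signal stays queued. */
                 sigprocmask(SIG_BLOCK, &mask, NULL);
         
                 sfd = signalfd(-1, &mask, 0);
                 raise(SIGUSR1);         /* queue a private signal to ourselves */
         
                 /* read(2) dequeues it; the fd is attached to the sighand
                  * only for the duration of this call. */
                 if (read(sfd, &fdsi, sizeof(fdsi)) == sizeof(fdsi))
                         printf("got signal %u\n", fdsi.ssi_signo);
         
                 close(sfd);
                 return 0;
         }
      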
  6. 20 Sep, 2007 (4 commits)
    • sched: add /proc/sys/kernel/sched_compat_yield · 1799e35d
      Ingo Molnar committed
      Add /proc/sys/kernel/sched_compat_yield to make sys_sched_yield()
      more aggressive, by moving the yielding task to the last position
      in the rbtree.
      
      with sched_compat_yield=0:
      
         PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
        2539 mingo     20   0  1576  252  204 R   50  0.0   0:02.03 loop_yield
        2541 mingo     20   0  1576  244  196 R   50  0.0   0:02.05 loop
      
      with sched_compat_yield=1:
      
         PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
        2584 mingo     20   0  1576  248  196 R   99  0.0   0:52.45 loop
        2582 mingo     20   0  1576  256  204 R    0  0.0   0:00.00 loop_yield
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      1799e35d
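      
      A minimal sketch of how the "loop_yield" side of such a test might
      look (the sysctl path is from the commit; the helper and the loop are
      illustrative, and writing the knob needs root):
      
         #include <stdio.h>
         #include <sched.h>
         
         /* Opt in to the aggressive yield behaviour; the knob only
          * exists on kernels carrying this patch. */
         static int set_compat_yield(int on)
         {
                 FILE *f = fopen("/proc/sys/kernel/sched_compat_yield", "w");
                 if (!f)
                         return -1;
                 fprintf(f, "%d\n", on);
                 return fclose(f);
         }
         
         int main(void)
         {
                 set_compat_yield(1);
                 /* With the knob set, every yield requeues us behind all
                  * runnable tasks, so a busy peer gets almost all the CPU,
                  * as in the second top(1) snapshot above. */
                 for (;;)
                         sched_yield();
         }
      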
    • Fix NUMA Memory Policy Reference Counting · 480eccf9
      Lee Schermerhorn committed
      This patch proposes fixes to the reference counting of memory policy in the
      page allocation paths and in show_numa_map().  Extracted from my "Memory
      Policy Cleanups and Enhancements" series as stand-alone.
      
      Shared policy lookup [shmem] has always added a reference to the policy,
      but this was never unrefed after page allocation or after formatting the
      numa map data.
      
      Default system policy should not require additional ref counting, nor
      should the current task's task policy.  However, show_numa_map() calls
      get_vma_policy() to examine what may be [likely is] another task's policy.
      The latter case needs protection against freeing of the policy.
      
      This patch adds a reference count to a mempolicy returned by
      get_vma_policy() when the policy is a vma policy or another task's
      mempolicy.  Again, shared policy is already reference counted on lookup.  A
      matching "unref" [__mpol_free()] is performed in alloc_page_vma() for
      shared and vma policies, and in show_numa_map() for shared and another
      task's mempolicy.  We can call __mpol_free() directly, saving an admittedly
      inexpensive inline NULL test, because we know we have a non-NULL policy.
      
      Handling policy ref counts for hugepages is a bit trickier.
      huge_zonelist() returns a zone list that might come from a shared or vma
      'BIND' policy.  In this case, we should hold the reference until after the
      huge page allocation in dequeue_hugepage().  The patch modifies
      huge_zonelist() to return a pointer to the mempolicy if it needs to be
      unref'd after allocation.
      
      Kernel Build [16cpu, 32GB, ia64] - average of 10 runs:
      
                     w/o patch              w/ refcount patch
                    Avg   Std Dev            Avg   Std Dev
      Real:      100.59      0.38         100.63      0.43
      User:     1209.60      0.37        1209.91      0.31
      System:     81.52      0.42          81.64      0.34
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Acked-by: Andi Kleen <ak@suse.de>
      Cc: Christoph Lameter <clameter@sgi.com>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      480eccf9
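      
      The contract is easier to see outside the kernel.  A small user-space
      analogue of the refcounting pattern described above (the types and
      names are hypothetical stand-ins, not the kernel's mempolicy API):
      
         #include <stdlib.h>
         
         struct policy {
                 int refcnt;             /* the kernel uses an atomic_t */
         };
         
         /* A lookup may return another task's policy, so it takes a ref. */
         static struct policy *policy_get(struct policy *pol)
         {
                 pol->refcnt++;
                 return pol;
         }
         
         static void policy_put(struct policy *pol)
         {
                 if (--pol->refcnt == 0)
                         free(pol);
         }
         
         /* Allocation path: the matching unref happens only after the
          * allocation that consulted the policy has completed. */
         static void *alloc_page_like(struct policy *pol)
         {
                 void *page = malloc(4096);      /* stand-in allocator */
                 policy_put(pol);
                 return page;
         }
         
         int main(void)
         {
                 struct policy *pol = calloc(1, sizeof(*pol));
                 pol->refcnt = 1;                        /* creator's ref */
                 void *page = alloc_page_like(policy_get(pol));
                 policy_put(pol);                        /* drop creator's ref */
                 free(page);
                 return 0;
         }
      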
    • Fix user namespace exiting OOPs · 28f300d2
      Pavel Emelyanov committed
      It turns out that the user namespace is released in
      exit_task_namespaces() during do_exit(), while the struct user_struct is
      released only in put_task_struct(), i.e.  MUCH later.
      
      On debug kernels with poisoned slabs this causes an oops in
      uid_hash_remove(), because the head of the chain, which resides inside
      the struct user_namespace, has already been freed and poisoned.
      
      Since the uid hash itself is required only while someone can search it,
      i.e.  while the namespace is alive, we can safely unhash all the
      user_structs from it as the namespace exits.  The subsequent free_uid()
      then completes the user_struct destruction.
      
      For example, the following simple program
      
         #define _GNU_SOURCE   /* for clone(2) */
         #include <sched.h>
      
         char stack[2 * 1024 * 1024];
      
         int f(void *foo)
         {
         	return 0;
         }
      
         int main(void)
         {
         	/* 0x10000000 == CLONE_NEWUSER */
         	clone(f, stack + 1 * 1024 * 1024, 0x10000000, 0);
         	return 0;
         }
      
      run on a kernel with CONFIG_USER_NS enabled will oops the
      kernel immediately.
      
      This was spotted during OpenVZ kernel testing.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
      Acked-by: "Serge E. Hallyn" <serue@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      28f300d2
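      
      The ordering problem generalizes to any hash whose bucket heads live
      inside a shorter-lived container.  A small user-space analogue of the
      fix (hypothetical types; the real patch walks the kernel's uid hash
      and lets free_uid() finish the entries off later):
      
         #include <stdlib.h>
         
         struct entry { struct entry *next; };
         
         struct ns {
                 struct entry *uid_hash[16];     /* chain heads live here */
         };
         
         /* Unhash everything while the namespace (and thus the chain
          * heads) is still valid; the entries themselves are freed later
          * by their own refcounting and never touch the dead heads. */
         static void ns_exit(struct ns *ns)
         {
                 for (int i = 0; i < 16; i++)
                         ns->uid_hash[i] = NULL;
                 free(ns);
         }
         
         int main(void)
         {
                 struct ns *ns = calloc(1, sizeof(*ns));
                 ns_exit(ns);
                 return 0;
         }
      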
    • Convert uid hash to hlist · 735de223
      Pavel Emelyanov committed
      Surprisingly (as spotted by Alexey Dobriyan), the uid hash still uses
      list_heads, thus occupying twice as much space as it needs to.  Convert
      it to hlist_heads.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      735de223
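      
      The space saving is easy to demonstrate with minimal stand-ins for the
      two kernel types: a list_head bucket carries two pointers, an
      hlist_head bucket only one, so an N-bucket table shrinks from 2N to N
      pointers.
      
         #include <stdio.h>
         
         struct list_head  { struct list_head *next, *prev; };
         struct hlist_head { struct hlist_node *first; };
         struct hlist_node { struct hlist_node *next, **pprev; };
         
         int main(void)
         {
                 printf("list_head bucket:  %zu bytes\n", sizeof(struct list_head));
                 printf("hlist_head bucket: %zu bytes\n", sizeof(struct hlist_head));
                 return 0;
         }
      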
  7. 17 Sep, 2007 (2 commits)
  8. 12 Sep, 2007 (3 commits)