1. 17 Oct 2007, 7 commits
    • Shrink task_struct if CONFIG_FUTEX=n · 42b2dd0a
      Alexey Dobriyan committed
      robust_list, compat_robust_list, pi_state_list and pi_state_cache are
      only used if futexes are enabled, so they can be compiled out otherwise.
      Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Shrink struct task_struct::oomkilladj · 3befe7ce
      Alexey Dobriyan committed
      oomkilladj is an int, but the only values that can be assigned to it are
      -17 and [-16, 15], which fit into an s8.
      
      While the patch itself doesn't make task_struct smaller, because of the
      natural alignment of ->link_count, it makes the picture clearer with
      respect to further task_struct reduction patches.  My plan is to move
      ->fpu_counter and ->oomkilladj after ->ioprio, filling a hole on i386 and
      x86_64.  But that's for later, because bloated distro configs need
      looking at as well.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Add MMF_DUMP_ELF_HEADERS · 82df3973
      Roland McGrath committed
      This adds the MMF_DUMP_ELF_HEADERS option to /proc/pid/coredump_filter.
      This dumps the first page (only) of a private file mapping if it appears to
      be a mapping of an ELF file.  Including these pages in the core dump may
      give sufficient identifying information to associate the original DSO and
      executable file images and their debugging information with a core file in
      a generic way just from its contents (e.g.  when those binaries were built
      with ld --build-id).  I expect this to become the default behavior
      eventually.  Existing versions of gdb can be confused by the core dumps it
      creates, so it won't be enabled by default for some time to come.  Soon many
      people will have systems with a gdb that handles these dumps, so they can
      arrange to set the bit at boot and have it inherited system-wide.
      
      This also cleans up the checking of the MMF_DUMP_* flag bits, which did not
      need to be using atomic macros.
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • softlockup: add a /proc tuning parameter · c4f3b63f
      Ravikiran G Thirumalai committed
      Control the trigger limit for softlockup warnings.  This is useful for
      debugging softlockups, by lowering the softlockup_thresh to identify
      possible softlockups earlier.
      
      This patch:
      1. Adds a sysctl, softlockup_thresh, with valid values of 1-60s
         (use a higher value to avoid false positives)
      2. Changes the softlockup printk to print the cpu softlockup time
      
      [akpm@linux-foundation.org: Fix various warnings and add definition of "two"]
      Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
      Signed-off-by: Shai Fultheim <shai@scalex86.org>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: dirty balancing for tasks · 3e26c149
      Peter Zijlstra committed
      Based on ideas of Andrew:
        http://marc.info/?l=linux-kernel&m=102912915020543&w=2
      
      Scale the bdi dirty limit inversely with the task's dirty rate.
      This makes heavy writers have a lower dirty limit than the occasional writer.
      
      Andrea proposed something similar:
        http://lwn.net/Articles/152277/
      
      The main disadvantage of his patch is that he uses an unrelated quantity to
      measure time, which leaves him with a workload-dependent tunable.  Other than
      that, the two approaches appear quite similar.
      
      [akpm@linux-foundation.org: fix warning]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cpuset: remove sched domain hooks from cpusets · 607717a6
      Paul Jackson committed
      Remove the cpuset hooks that defined sched domains depending on the setting
      of the 'cpu_exclusive' flag.
      
      The cpu_exclusive flag can only be set on a child if it is set on the
      parent.
      
      This made that flag painfully unsuitable for use as a flag defining a
      partitioning of a system.
      
      It was entirely unobvious to a cpuset user what partitioning of sched
      domains they would be causing when they set that one cpu_exclusive bit on
      one cpuset, because it depended on what CPUs were in the remainder of that
      cpusets siblings and child cpusets, after subtracting out other
      cpu_exclusive cpusets.
      
      Furthermore, there was no way on production systems to query the
      result.
      
      Using the cpu_exclusive flag for this was simply wrong from the get go.
      
      Fortunately, it was sufficiently borked that, so far as I know, almost no
      successful use has been made of this.  One real-time group did use it to
      effectively isolate CPUs from any load balancing efforts.  They are willing
      to adapt to alternative mechanisms for this, such as some way to manipulate
      the list of isolated CPUs on a running system.  They can do without the
      present cpu_exclusive based mechanism while we develop an alternative.
      
      There is a real risk, to the best of my understanding, of users
      accidentally setting up partitioned scheduler domains, inhibiting desired
      load balancing across all their CPUs, due to the non-obvious (from the
      cpuset perspective) side effects of the cpu_exclusive flag.
      
      Furthermore, since there was no way on a running system to see what one was
      doing with sched domains, this change will be invisible to any code using
      it.  Unless they have real insight into the scheduler's load balancing
      choices, users will be unable to detect that this change has been made in
      the kernel's behaviour.
      
      Initial discussion on lkml of this patch has generated much comment.  My
      (probably controversial) take on that discussion is that it has reached a
      rough consensus that the current cpuset cpu_exclusive mechanism for
      defining sched domains is borked.  There is no consensus on the
      replacement.  But since we can remove this mechanism, and since its
      continued presence risks causing unwanted partitioning of the scheduler's
      load balancing, we should remove it while we can, as we proceed to work on
      the replacement scheduler domain mechanisms.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Cc: Dinakar Guniguntala <dino@in.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • move mm_struct and vm_area_struct · c92ff1bd
      Martin Schwidefsky committed
      Move the definitions of struct mm_struct and struct vm_area_struct to
      include/mm_types.h.  This allows more functions to be defined in
      asm/pgtable.h and friends with inline assemblies instead of macros.
      Compile-tested on i386, powerpc, powerpc64, s390-32, s390-64 and x86_64.
      
      [aurelien@aurel32.net: build fix]
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 15 Oct 2007, 27 commits
  3. 11 Oct 2007, 2 commits
  4. 08 Oct 2007, 1 commit
    • Don't do load-average calculations at even 5-second intervals · 0c2043ab
      Linus Torvalds committed
      It turns out that there are a few other five-second timers in the
      kernel, and if the timers get in sync, the load-average can get
      artificially inflated by events that just happen to coincide.
      
      So just offset the load-average calculation by a timer tick.
      
      Noticed by Anders Boström, for whom the coincidence started triggering
      on one of his machines with the JBD jiffies rounding code (JBD is one of
      the subsystems that also ends up using a 5-second timer by default).
      Tested-by: Anders Boström <anders@bostrom.dyndns.org>
      Cc: Chuck Ebbert <cebbert@redhat.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 21 Sep 2007, 1 commit
    • signalfd simplification · b8fceee1
      Davide Libenzi committed
      This simplifies the signalfd code by no longer keeping it attached to the
      sighand for its whole lifetime.
      
      In this way, the signalfd remains attached to the sighand only during
      poll(2) (and select and epoll) and read(2).  This also allows removing
      all the custom "tsk == current" checks in kernel/signal.c, since
      dequeue_signal() will only be called by "current".
      
      I think this is also what Ben was suggesting time ago.
      
      The external effect of this is that a thread can extract only its own
      private signals and the group ones.  I think this is acceptable
      behaviour, in that those are the signals the thread would be able to
      fetch without signalfd.
      Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 20 Sep 2007, 2 commits
    • sched: add /proc/sys/kernel/sched_compat_yield · 1799e35d
      Ingo Molnar committed
      Add /proc/sys/kernel/sched_compat_yield to make sys_sched_yield()
      more aggressive, by moving the yielding task to the last position
      in the rbtree.
      
      with sched_compat_yield=0:
      
         PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
        2539 mingo     20   0  1576  252  204 R   50  0.0   0:02.03 loop_yield
        2541 mingo     20   0  1576  244  196 R   50  0.0   0:02.05 loop
      
      with sched_compat_yield=1:
      
         PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
        2584 mingo     20   0  1576  248  196 R   99  0.0   0:52.45 loop
        2582 mingo     20   0  1576  256  204 R    0  0.0   0:00.00 loop_yield
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    • Fix user namespace exiting OOPs · 28f300d2
      Pavel Emelyanov committed
      It turned out that the user namespace is released during do_exit() in
      exit_task_namespaces(), but the struct user_struct is released only during
      put_task_struct(), i.e.  MUCH later.
      
      On debug kernels with poisoned slabs this will cause the oops in
      uid_hash_remove() because the head of the chain, which resides inside the
      struct user_namespace, will be already freed and poisoned.
      
      Since the uid hash itself is required only when someone can search it, i.e.
      when the namespace is alive, we can safely unhash all the user_structs from
      it while the namespace is exiting.  The subsequent free_uid() will complete
      the user_struct destruction.
      
      For example, this simple program,
      
         #define _GNU_SOURCE   /* needed for clone(2) */
         #include <sched.h>
      
         char stack[2 * 1024 * 1024];
      
         int f(void *foo)
         {
         	return 0;
         }
      
         int main(void)
         {
         	/* 0x10000000 is CLONE_NEWUSER */
         	clone(f, stack + 1 * 1024 * 1024, 0x10000000, 0);
         	return 0;
         }
      
      run on a kernel with CONFIG_USER_NS turned on, will oops the
      kernel immediately.
      
      This was spotted during OpenVZ kernel testing.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
      Acked-by: "Serge E. Hallyn" <serue@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>