1. 11 5月, 2011 1 次提交
  2. 27 4月, 2011 1 次提交
    • R
      init/Kconfig: fix EXPERT menu list · 6befe5f6
      Randy Dunlap 提交于
      The EXPERT menu list was recently broken by the insertion of a
      kconfig symbol (EMBEDDED) at the beginning of the EXPERT list of
      kconfig items.  Broken by:
      
        commit 6a108a14
        Author: David Rientjes <rientjes@google.com>
        Date:   Thu Jan 20 14:44:16 2011 -0800
          kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT
      
      Restore the EXPERT menu list -- don't inject a symbol (EMBEDDED)
      that does not depend on EXPERT into the list.
      Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Peter Foley <pefoley2@verizon.net>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6befe5f6
  3. 23 4月, 2011 1 次提交
    • J
      [PARISC] slub: fix panic with DISCONTIGMEM · 4a5fa359
      James Bottomley 提交于
      Slub makes assumptions about page_to_nid() which are violated by
      DISCONTIGMEM and !NUMA.  This violation results in a panic because
      page_to_nid() can be non-zero for pages in the discontiguous ranges and
      this leads to a null return by get_node().  The assertion by the
      maintainer is that DISCONTIGMEM should only be allowed when NUMA is also
      defined.  However, at least six architectures: alpha, ia64, m32r, m68k,
      mips, parisc violate this.  The panic is a regression against slab, so
      just mark slub broken in the problem configuration to prevent users
      reporting these panics.
      
      Cc: stable@kernel.org
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Acked-by: NPekka Enberg <penberg@kernel.org>
      Signed-off-by: NJames Bottomley <James.Bottomley@suse.de>
      4a5fa359
  4. 31 3月, 2011 1 次提交
  5. 24 3月, 2011 2 次提交
    • S
      userns: add a user_namespace as creator/owner of uts_namespace · 59607db3
      Serge E. Hallyn 提交于
      The expected course of development for user namespaces targeted
      capabilities is laid out at https://wiki.ubuntu.com/UserNamespace.
      
      Goals:
      
      - Make it safe for an unprivileged user to unshare namespaces.  They
        will be privileged with respect to the new namespace, but this should
        only include resources which the unprivileged user already owns.
      
      - Provide separate limits and accounting for userids in different
        namespaces.
      
      Status:
      
        Currently (as of 2.6.38) you can clone with the CLONE_NEWUSER flag to
        get a new user namespace if you have the CAP_SYS_ADMIN, CAP_SETUID, and
        CAP_SETGID capabilities.  What this gets you is a whole new set of
        userids, meaning that user 500 will have a different 'struct user' in
        your namespace than in other namespaces.  So any accounting information
        stored in struct user will be unique to your namespace.
      
        However, throughout the kernel there are checks which
      
        - simply check for a capability.  Since root in a child namespace
          has all capabilities, this means that a child namespace is not
          constrained.
      
        - simply compare uid1 == uid2.  Since these are the integer uids,
          uid 500 in namespace 1 will be said to be equal to uid 500 in
          namespace 2.
      
        As a result, the lxc implementation at lxc.sf.net does not use user
        namespaces.  This is actually helpful because it leaves us free to
        develop user namespaces in such a way that, for some time, user
        namespaces may be unuseful.
      
      Bugs aside, this patchset is supposed to not at all affect systems which
      are not actively using user namespaces, and only restrict what tasks in
      child user namespace can do.  They begin to limit privilege to a user
      namespace, so that root in a container cannot kill or ptrace tasks in the
      parent user namespace, and can only get world access rights to files.
      Since all files currently belong to the initila user namespace, that means
      that child user namespaces can only get world access rights to *all*
      files.  While this temporarily makes user namespaces bad for system
      containers, it starts to get useful for some sandboxing.
      
      I've run the 'runltplite.sh' with and without this patchset and found no
      difference.
      
      This patch:
      
      copy_process() handles CLONE_NEWUSER before the rest of the namespaces.
      So in the case of clone(CLONE_NEWUSER|CLONE_NEWUTS) the new uts namespace
      will have the new user namespace as its owner.  That is what we want,
      since we want root in that new userns to be able to have privilege over
      it.
      
      Changelog:
      	Feb 15: don't set uts_ns->user_ns if we didn't create
      		a new uts_ns.
      	Feb 23: Move extern init_user_ns declaration from
      		init/version.c to utsname.h.
      Signed-off-by: NSerge E. Hallyn <serge.hallyn@canonical.com>
      Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: NDaniel Lezcano <daniel.lezcano@free.fr>
      Acked-by: NDavid Howells <dhowells@redhat.com>
      Cc: James Morris <jmorris@namei.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      59607db3
    • E
      pid: remove the child_reaper special case in init/main.c · 45a68628
      Eric W. Biederman 提交于
      This patchset is a cleanup and a preparation to unshare the pid namespace.
      These prerequisites prepare for Eric's patchset to give a file descriptor
      to a namespace and join an existing namespace.
      
      This patch:
      
      It turns out that the existing assignment in copy_process of the
      child_reaper can handle the initial assignment of child_reaper we just
      need to generalize the test in kernel/fork.c
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDaniel Lezcano <daniel.lezcano@free.fr>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Acked-by: NSerge E. Hallyn <serge@hallyn.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      45a68628
  6. 23 3月, 2011 6 次提交
    • D
      init: return proper error code in do_mounts_rd() · ea611b26
      Davidlohr Bueso 提交于
      In do_mounts_rd() if memory cannot be allocated, return -ENOMEM.
      Signed-off-by: NDavidlohr Bueso <dave@gnu.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ea611b26
    • P
      calibrate: retry with wider bounds when converge seems to fail · b1b5f65e
      Phil Carmody 提交于
      Systems with unmaskable interrupts such as SMIs may massively
      underestimate loops_per_jiffy, and fail to converge anywhere near the real
      value.  A case seen on x86_64 was an initial estimate of 256<<12, which
      converged to 511<<12 where the real value should have been over 630<<12.
      This admitedly requires bypassing the TSC calibration (lpj_fine), and a
      failure to settle in the direct calibration too, but is physically
      possible.  This failure does not depend on my previous calibration
      optimisation, but by luck is easy to fix with the optimisation in place
      with a trivial retry loop.
      
      In the context of the optimised converging method, as we can no longer
      trust the starting estimate, enlarge the search bounds exponentially so
      that the number of retries is logarithmically bounded.
      
      [akpm@linux-foundation.org: mention x86_64 SMIs in comment]
      Signed-off-by: NPhil Carmody <ext-phil.2.carmody@nokia.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Tested-by: NStephen Boyd <sboyd@codeaurora.org>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b1b5f65e
    • P
      calibrate: home in on correct lpj value more quickly · 191e5688
      Phil Carmody 提交于
      Binary chop with a jiffy-resync on each step to find an upper bound is
      slow, so just race in a tight-ish loop to find an underestimate.
      
      If done with lots of individual steps, sometimes several hundreds of
      iterations would be required, which would impose a significant overhead,
      and make the initial estimate very low.  By taking slowly increasing steps
      there will be less overhead.
      
      E.g.  an x86_64 2.67GHz could have fitted in 613 individual small delays,
      but in reality should have been able to fit in a single delay 644 times
      longer, so underestimated by 31 steps.  To reach the equivalent of 644
      small delays with the accelerating scheme now requires about 130
      iterations, so has <1/4th of the overhead, and can therefore be expected
      to underestimate by only 7 steps.
      
      As now we have a better initial estimate we can binary chop over a smaller
      range.  With the loop overhead in the initial estimate kept low, and the
      step sizes moderate, we won't have under-estimated by much, so chose as
      tight a range as we can.
      Signed-off-by: NPhil Carmody <ext-phil.2.carmody@nokia.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Tested-by: NStephen Boyd <sboyd@codeaurora.org>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      191e5688
    • P
      calibrate: extract fall-back calculation into own helper · 71c696b1
      Phil Carmody 提交于
      The motivation for this patch series is that currently our OMAP calibrates
      itself using the trial-and-error binary chop fallback that some other
      architectures no longer need to perform.  This is a lengthy process,
      taking 0.2s in an environment where boot time is of great interest.
      
      Patch 2/4 has two optimisations.  Firstly, it replaces the initial
      repeated- doubling to find the relevant power of 2 with a tight loop that
      just does as much as it can in a jiffy.  Secondly, it doesn't binary chop
      over an entire power of 2 range, it choses a much smaller range based on
      how much it squeezed in, and failed to squeeze in, during the first stage.
       Both are significant optimisations, and bring our calibration down from
      23 jiffies to 5, and, in the process, often arrive at a more accurate lpj
      value.
      
      The 'bands' and 'sub-logarithmic' growth may look over-engineered, but
      they only cost a small level of inaccuracy in the initial guess (for all
      architectures) in order to avoid the very large inaccuracies that appeared
      during testing (on x86_64 architectures, and presumably others with less
      metronomic operation).  Note that due to the existence of the TSC and
      other timers, the x86_64 will not typically use this fallback routine, but
      I wanted to code defensively, able to cope with all kinds of processor
      behaviours and kernel command line options.
      
      Patch 3/4 is an additional trap for the nightmare scenario where the
      initial estimate is very inaccurate, possibly due to things like SMIs.
      It simply retries with a larger bound.
      
      Stephen said:
      
      I tried this patch set out on an MSM7630.
      :
      : Before:
      :
      : Calibrating delay loop... 681.57 BogoMIPS (lpj=3407872)
      :
      : After:
      :
      : Calibrating delay loop... 680.75 BogoMIPS (lpj=3403776)
      :
      : But the really good news is calibration time dropped from ~247ms to ~56ms.
      :  Sadly we won't be able to benefit from this should my udelay patches make
      : it into ARM because we would be using calibrate_delay_direct() instead (at
      : least on machines who choose to).  Can we somehow reapply the logic behind
      : this to calibrate_delay_direct()?  That would be even better, but this is
      : definitely a boot time improvement.
      :
      : Or maybe we could just replace calibrate_delay_direct() with this fallback
      : calculation?  If __delay() is a thin wrapper around read_current_timer()
      : it should work just as well (plus patch 3 makes it handle SMIs).  I'll try
      : that out.
      
      This patch:
      
      ... so that it can be modified more clinically.
      
      This is almost entirely cosmetic. The only change to the operation
      is that the global variable is only set once after the estimation is
      completed, rather than taking on all the intermediate values. However,
      there are no readers of that variable, so this change is unimportant.
      Signed-off-by: NPhil Carmody <ext-phil.2.carmody@nokia.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Tested-by: NStephen Boyd <sboyd@codeaurora.org>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      71c696b1
    • A
      smp: move smp setup functions to kernel/smp.c · 34db18a0
      Amerigo Wang 提交于
      Move setup_nr_cpu_ids(), smp_init() and some other SMP boot parameter
      setup functions from init/main.c to kenrel/smp.c, saves some #ifdef
      CONFIG_SMP.
      Signed-off-by: NWANG Cong <amwang@redhat.com>
      Cc: Rakib Mullick <rakib.mullick@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Akinobu Mita <akinobu.mita@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      34db18a0
    • M
      fs: use appropriate printk priority levels · 80cdc6da
      Mandeep Singh Baines 提交于
      printk()s without a priority level default to KERN_WARNING.  To reduce
      noise at KERN_WARNING, this patch set the priority level appriopriately
      for unleveled printks()s.  This should be useful to folks that look at
      dmesg warnings closely.
      Signed-off-by: NMandeep Singh Baines <msb@chromium.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      80cdc6da
  7. 15 3月, 2011 1 次提交
  8. 05 3月, 2011 1 次提交
    • A
      BKL: That's all, folks · 4ba8216c
      Arnd Bergmann 提交于
      This removes the implementation of the big kernel lock,
      at last. A lot of people have worked on this in the
      past, I so the credit for this patch should be with
      everyone who participated in the hunt.
      
      The names on the Cc list are the people that were the
      most active in this, according to the recorded git
      history, in alphabetical order.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NAlan Cox <alan@linux.intel.com>
      Cc: Alessio Igor Bogani <abogani@texware.it>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Hendry <andrew.hendry@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Hans Verkuil <hverkuil@xs4all.nl>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Cc: Jan Blunck <jblunck@infradead.org>
      Cc: John Kacur <jkacur@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Oliver Neukum <oliver@neukum.org>
      Cc: Paul Menage <menage@google.com>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      4ba8216c
  9. 04 3月, 2011 1 次提交
  10. 16 2月, 2011 1 次提交
    • S
      perf: Add cgroup support · e5d1367f
      Stephane Eranian 提交于
      This kernel patch adds the ability to filter monitoring based on
      container groups (cgroups). This is for use in per-cpu mode only.
      
      The cgroup to monitor is passed as a file descriptor in the pid
      argument to the syscall. The file descriptor must be opened to
      the cgroup name in the cgroup filesystem. For instance, if the
      cgroup name is foo and cgroupfs is mounted in /cgroup, then the
      file descriptor is opened to /cgroup/foo. Cgroup mode is
      activated by passing PERF_FLAG_PID_CGROUP in the flags argument
      to the syscall.
      
      For instance to measure in cgroup foo on CPU1 assuming
      cgroupfs is mounted under /cgroup:
      
      struct perf_event_attr attr;
      int cgroup_fd, fd;
      
      cgroup_fd = open("/cgroup/foo", O_RDONLY);
      fd = perf_event_open(&attr, cgroup_fd, 1, -1, PERF_FLAG_PID_CGROUP);
      close(cgroup_fd);
      Signed-off-by: NStephane Eranian <eranian@google.com>
      [ added perf_cgroup_{exit,attach} ]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4d590250.114ddf0a.689e.4482@mx.google.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e5d1367f
  11. 11 2月, 2011 1 次提交
  12. 04 2月, 2011 1 次提交
  13. 21 1月, 2011 1 次提交
    • D
      kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT · 6a108a14
      David Rientjes 提交于
      The meaning of CONFIG_EMBEDDED has long since been obsoleted; the option
      is used to configure any non-standard kernel with a much larger scope than
      only small devices.
      
      This patch renames the option to CONFIG_EXPERT in init/Kconfig and fixes
      references to the option throughout the kernel.  A new CONFIG_EMBEDDED
      option is added that automatically selects CONFIG_EXPERT when enabled and
      can be used in the future to isolate options that should only be
      considered for embedded systems (RISC architectures, SLOB, etc).
      
      Calling the option "EXPERT" more accurately represents its intention: only
      expert users who understand the impact of the configuration changes they
      are making should enable it.
      Reviewed-by: NIngo Molnar <mingo@elte.hu>
      Acked-by: NDavid Woodhouse <david.woodhouse@intel.com>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Cc: Greg KH <gregkh@suse.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Robin Holt <holt@sgi.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6a108a14
  14. 20 1月, 2011 1 次提交
    • T
      lockdep: Move early boot local IRQ enable/disable status to init/main.c · 2ce802f6
      Tejun Heo 提交于
      During early boot, local IRQ is disabled until IRQ subsystem is
      properly initialized.  During this time, no one should enable
      local IRQ and some operations which usually are not allowed with
      IRQ disabled, e.g. operations which might sleep or require
      communications with other processors, are allowed.
      
      lockdep tracked this with early_boot_irqs_off/on() callbacks.
      As other subsystems need this information too, move it to
      init/main.c and make it generally available.  While at it,
      toggle the boolean to early_boot_irqs_disabled instead of
      enabled so that it can be initialized with %false and %true
      indicates the exceptional condition.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NPekka Enberg <penberg@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <20110120110635.GB6036@htj.dyndns.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2ce802f6
  15. 17 1月, 2011 2 次提交
  16. 14 1月, 2011 2 次提交
    • P
      rcu: demote SRCU_SYNCHRONIZE_DELAY from kernel-parameter status · c072a388
      Paul E. McKenney 提交于
      Because the adaptive synchronize_srcu_expedited() approach has
      worked very well in testing, remove the kernel parameter and
      replace it by a C-preprocessor macro.  If someone finds problems
      with this approach, a more complex and aggressively adaptive
      approach might be required.
      
      Longer term, SRCU will be merged with the other RCU implementations,
      at which point synchronize_srcu_expedited() will be event driven,
      just as synchronize_sched_expedited() currently is.  At that point,
      there will be no need for this adaptive approach.
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      c072a388
    • L
      decompressors: add boot-time XZ support · 3ebe1243
      Lasse Collin 提交于
      This implements the API defined in <linux/decompress/generic.h> which is
      used for kernel, initramfs, and initrd decompression.  This patch together
      with the first patch is enough for XZ-compressed initramfs and initrd;
      XZ-compressed kernel will need arch-specific changes.
      
      The buffering requirements described in decompress_unxz.c are stricter
      than with gzip, so the relevant changes should be done to the
      arch-specific code when adding support for XZ-compressed kernel.
      Similarly, the heap size in arch-specific pre-boot code may need to be
      increased (30 KiB is enough).
      
      The XZ decompressor needs memmove(), memeq() (memcmp() == 0), and
      memzero() (memset(ptr, 0, size)), which aren't available in all
      arch-specific pre-boot environments.  I'm including simple versions in
      decompress_unxz.c, but a cleaner solution would naturally be nicer.
      Signed-off-by: NLasse Collin <lasse.collin@tukaani.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alain Knaff <alain@knaff.lu>
      Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com>
      Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3ebe1243
  17. 04 1月, 2011 1 次提交
  18. 24 12月, 2010 1 次提交
    • T
      init: don't call flush_scheduled_work() from do_initcalls() · ee4569a3
      Tejun Heo 提交于
      The call to flush_scheduled_work() in do_initcalls() is there to make
      sure all works queued to system_wq by initcalls finish before the init
      sections are dropped.
      
      However, the call doesn't make much sense at this point - there
      already are multiple different workqueues and different subsystems are
      free to create and use their own.  Ordering requirements are and
      should be expressed explicitly.
      
      Drop the call to prepare for the deprecation and removal of
      flush_scheduled_work().
      
      Andrew suggested adding sanity check where the workqueue code checks
      whether any pending or running work has the work function in the init
      text section.  However, checking this for running works requires the
      worker to keep track of the current function being executed, and
      checking only the pending works will miss most cases.  As a violation
      will almost always be caught by the usual page fault mechanism, I
      don't think it would be worthwhile to make the workqueue code track
      extra state just for this.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      ee4569a3
  19. 23 12月, 2010 1 次提交
  20. 16 12月, 2010 2 次提交
  21. 30 11月, 2010 4 次提交
    • M
      sched: Add 'autogroup' scheduling feature: automated per session task groups · 5091faa4
      Mike Galbraith 提交于
      A recurring complaint from CFS users is that parallel kbuild has
      a negative impact on desktop interactivity.  This patch
      implements an idea from Linus, to automatically create task
      groups.  Currently, only per session autogroups are implemented,
      but the patch leaves the way open for enhancement.
      
      Implementation: each task's signal struct contains an inherited
      pointer to a refcounted autogroup struct containing a task group
      pointer, the default for all tasks pointing to the
      init_task_group.  When a task calls setsid(), a new task group
      is created, the process is moved into the new task group, and a
      reference to the preveious task group is dropped.  Child
      processes inherit this task group thereafter, and increase it's
      refcount.  When the last thread of a process exits, the
      process's reference is dropped, such that when the last process
      referencing an autogroup exits, the autogroup is destroyed.
      
      At runqueue selection time, IFF a task has no cgroup assignment,
      its current autogroup is used.
      
      Autogroup bandwidth is controllable via setting it's nice level
      through the proc filesystem:
      
        cat /proc/<pid>/autogroup
      
      Displays the task's group and the group's nice level.
      
        echo <nice level> > /proc/<pid>/autogroup
      
      Sets the task group's shares to the weight of nice <level> task.
      Setting nice level is rate limited for !admin users due to the
      abuse risk of task group locking.
      
      The feature is enabled from boot by default if
      CONFIG_SCHED_AUTOGROUP=y is selected, but can be disabled via
      the boot option noautogroup, and can also be turned on/off on
      the fly via:
      
        echo [01] > /proc/sys/kernel/sched_autogroup_enabled
      
      ... which will automatically move tasks to/from the root task group.
      Signed-off-by: NMike Galbraith <efault@gmx.de>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      [ Removed the task_group_path() debug code, and fixed !EVENTFD build failure. ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      LKML-Reference: <1290281700.28711.9.camel@maggy.simson.net>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      5091faa4
    • P
      rcu: Make synchronize_srcu_expedited() fast if running readers · 46fdb093
      Paul E. McKenney 提交于
      The synchronize_srcu_expedited() function is currently quick if there
      are no active readers, but will delay a full jiffy if there are any.
      If these readers leave their SRCU read-side critical sections quickly,
      this is way too long to wait.  So this commit first waits ten microseconds,
      and only then falls back to jiffy-at-a-time waiting.
      Reported-by: NAvi Kivity <avi@redhat.com>
      Reported-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Tested-by: NTakuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      46fdb093
    • P
      rcu: add tracing for TINY_RCU and TINY_PREEMPT_RCU · 9e571a82
      Paul E. McKenney 提交于
      Add tracing for the tiny RCU implementations, including statistics on
      boosting in the case of TINY_PREEMPT_RCU and RCU_BOOST.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      9e571a82
    • P
      rcu: priority boosting for TINY_PREEMPT_RCU · 24278d14
      Paul E. McKenney 提交于
      Add priority boosting, but only for TINY_PREEMPT_RCU.  This is enabled
      by the default-off RCU_BOOST kernel parameter.  The priority to which to
      boost preempted RCU readers is controlled by the RCU_BOOST_PRIO kernel
      parameter (defaulting to real-time priority 1) and the time to wait
      before boosting the readers blocking a given grace period is controlled
      by the RCU_BOOST_DELAY kernel parameter (defaulting to 500 milliseconds).
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      24278d14
  22. 26 11月, 2010 1 次提交
  23. 25 11月, 2010 1 次提交
    • M
      cgroups: make swap accounting default behavior configurable · a42c390c
      Michal Hocko 提交于
      Swap accounting can be configured by CONFIG_CGROUP_MEM_RES_CTLR_SWAP
      configuration option and then it is turned on by default.  There is a boot
      option (noswapaccount) which can disable this feature.
      
      This makes it hard for distributors to enable the configuration option as
      this feature leads to a bigger memory consumption and this is a no-go for
      general purpose distribution kernel.  On the other hand swap accounting
      may be very usuful for some workloads.
      
      This patch adds a new configuration option which controls the default
      behavior (CGROUP_MEM_RES_CTLR_SWAP_ENABLED).  If the option is selected
      then the feature is turned on by default.
      
      It also adds a new boot parameter swapaccount[=1|0] which enhances the
      original noswapaccount parameter semantic by means of enable/disable logic
      (defaults to 1 if no value is provided to be still consistent with
      noswapaccount).
      
      The default behavior is unchanged (if CONFIG_CGROUP_MEM_RES_CTLR_SWAP is
      enabled then CONFIG_CGROUP_MEM_RES_CTLR_SWAP_ENABLED is enabled as well)
      Signed-off-by: NMichal Hocko <mhocko@suse.cz>
      Acked-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a42c390c
  24. 18 11月, 2010 1 次提交
  25. 28 10月, 2010 4 次提交