1. 29 11月, 2012 3 次提交
    • F
      cputime: Consolidate cputime adjustment code · d37f761d
      Frederic Weisbecker 提交于
      task_cputime_adjusted() and thread_group_cputime_adjusted()
      essentially share the same code. They just don't use the same
      source:
      
      * The first function uses the cputime in the task struct and the
      previous adjusted snapshot that ensures monotonicity.
      
      * The second adds the cputime of all tasks in the group and the
      previous adjusted snapshot of the whole group from the signal
      structure.
      
      Just consolidate the common code that does the adjustment. These
      functions just need to fetch the values from the appropriate
      source.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      d37f761d
    • F
      cputime: Rename thread_group_times to thread_group_cputime_adjusted · e80d0a1a
      Frederic Weisbecker 提交于
      We have thread_group_cputime() and thread_group_times(). The naming
      doesn't provide enough information about the difference between
      these two APIs.
      
      To lower the confusion, rename thread_group_times() to
      thread_group_cputime_adjusted(). This name better suggests that
      it's a version of thread_group_cputime() that does some stabilization
      on the raw cputime values. ie here: scale on top of CFS runtime
      stats and bound lower value for monotonicity.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      e80d0a1a
    • F
      cputime: Move thread_group_cputime() to sched code · a634f933
      Frederic Weisbecker 提交于
      thread_group_cputime() is a general cputime API that is not only
      used by posix cpu timer. Let's move this helper to sched code.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      a634f933
  2. 31 10月, 2012 1 次提交
    • R
      module: fix out-by-one error in kallsyms · 59ef28b1
      Rusty Russell 提交于
      Masaki found and patched a kallsyms issue: the last symbol in a
      module's symtab wasn't transferred.  This is because we manually copy
      the zero'th entry (which is always empty) then copy the rest in a loop
      starting at 1, though from src[0].  His fix was minimal, I prefer to
      rewrite the loops in more standard form.
      
      There are two loops: one to get the size, and one to copy.  Make these
      identical: always count entry 0 and any defined symbol in an allocated
      non-init section.
      
      This bug exists since the following commit was introduced.
         module: reduce symbol table for loaded modules (v2)
         commit: 4a496226
      
      LKML: http://lkml.org/lkml/2012/10/24/27Reported-by: NMasaki Kimura <masaki.kimura.kz@hitachi.com>
      Cc: stable@kernel.org
      59ef28b1
  3. 30 10月, 2012 4 次提交
    • M
      sched/autogroup: Fix crash on reboot when autogroup is disabled · 5258f386
      Mike Galbraith 提交于
      Due to these two commits:
      
        8323f26c sched: Fix race in task_group()
        800d4d30 sched, autogroup: Stop going ahead if autogroup is disabled
      
      ... autogroup scheduling's dynamic knobs are wrecked.
      
      With both patches applied, all you have to do to crash a box is
      disable autogroup during boot up, then reboot.. boom, NULL pointer
      dereference due to 800d4d30 not allowing autogroup to move things,
      and 8323f26c making that the only way to switch runqueues.
      
      Remove most of the (dysfunctional) knobs and turn the remaining
      sched_autogroup_enabled knob readonly.
      
      If the user fiddles with cgroups hereafter, once tasks
      are moved, autogroup won't mess with them again unless
      they call setsid().
      
      No knobs, no glitz, nada, just a cute little thing folks can
      turn on if they don't want to muck about with cgroups and/or
      systemd.
      Signed-off-by: NMike Galbraith <efault@gmx.de>
      Cc: Xiaotian Feng <xtfeng@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Xiaotian Feng <dannyfeng@tencent.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: <stable@vger.kernel.org> # v3.6
      Link: http://lkml.kernel.org/r/1351451963.4999.8.camel@maggy.simpson.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      5258f386
    • F
      cputime: Separate irqtime accounting from generic vtime · 3e1df4f5
      Frederic Weisbecker 提交于
      vtime_account() doesn't have the same role in
      CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_IRQ_TIME_ACCOUNTING.
      
      In the first case it handles time accounting in any context. In
      the second case it only handles irq time accounting.
      
      So when vtime_account() is called from outside vtime_account_irq_*()
      this call is pointless to CONFIG_IRQ_TIME_ACCOUNTING.
      
      To fix the confusion, change vtime_account() to irqtime_account_irq()
      in CONFIG_IRQ_TIME_ACCOUNTING. This way we ensure future account_vtime()
      calls won't waste useless cycles in the irqtime APIs.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      3e1df4f5
    • F
      cputime: Specialize irq vtime hooks · fa5058f3
      Frederic Weisbecker 提交于
      With CONFIG_VIRT_CPU_ACCOUNTING, when vtime_account()
      is called in irq entry/exit, we perform a check on the
      context: if we are interrupting the idle task we
      account the pending cputime to idle, otherwise account
      to system time or its sub-areas: tsk->stime, hardirq time,
      softirq time, ...
      
      However this check for idle only concerns the hardirq entry
      and softirq entry:
      
      * Hardirq may directly interrupt the idle task, in which case
      we need to flush the pending CPU time to idle.
      
      * The idle task may be directly interrupted by a softirq if
      it calls local_bh_enable(). There is probably no such call
      in any idle task but we need to cover every case. Ksoftirqd
      is not concerned because the idle time is flushed on context
      switch and softirq in the end of hardirq have the idle time
      already flushed from the hardirq entry.
      
      In the other cases we always account to system/irq time:
      
      * On hardirq exit we account the time to hardirq time.
      * On softirq exit we account the time to softirq time.
      
      To optimize this and avoid the indirect call to vtime_account()
      and the checks it performs, specialize the vtime irq APIs and
      only perform the check on irq entry. Irq exit can directly call
      vtime_account_system().
      
      CONFIG_IRQ_TIME_ACCOUNTING behaviour doesn't change and directly
      maps to its own vtime_account() implementation. One may want
      to take benefits from the new APIs to optimize irq time accounting
      as well in the future.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      fa5058f3
    • F
      vtime: Make vtime_account_system() irqsafe · 11113334
      Frederic Weisbecker 提交于
      vtime_account_system() currently has only one caller with
      vtime_account() which is irq safe.
      
      Now we are going to call it from other places like kvm where
      irqs are not always disabled by the time we account the cputime.
      
      So let's make it irqsafe. The arch implementation part is now
      prefixed with "__".
      
      vtime_account_idle() arch implementation is prefixed accordingly
      to stay consistent.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      11113334
  4. 26 10月, 2012 2 次提交
    • H
      Makefile: Documentation for external tool should be correct · 2008713c
      H. Peter Anvin 提交于
      If one includes documentation for an external tool, it should be
      correct.  This is not:
      
      1. Overriding the input to rngd should typically be neither
         necessary nor desired.  This is especially so since newer
         versions of rngd support a number of different *types* of sources.
      2. The default kernel-exported device is called /dev/hwrng not
         /dev/hwrandom nor /dev/hw_random (both of which were used in the
         past; however, kernel and udev seem to have converged on
         /dev/hwrng.)
      
      Overall it is better if the documentation for rngd is kept with rngd
      rather than in a kernel Makefile.
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Jeff Garzik <jgarzik@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2008713c
    • A
      pidns: limit the nesting depth of pid namespaces · f2302505
      Andrew Vagin 提交于
      'struct pid' is a "variable sized struct" - a header with an array of
      upids at the end.
      
      The size of the array depends on a level (depth) of pid namespaces.  Now a
      level of pidns is not limited, so 'struct pid' can be more than one page.
      
      Looks reasonable, that it should be less than a page.  MAX_PIS_NS_LEVEL is
      not calculated from PAGE_SIZE, because in this case it depends on
      architectures, config options and it will be reduced, if someone adds a
      new fields in struct pid or struct upid.
      
      I suggest to set MAX_PIS_NS_LEVEL = 32, because it saves ability to expand
      "struct pid" and it's more than enough for all known for me use-cases.
      When someone finds a reasonable use case, we can add a config option or a
      sysctl parameter.
      
      In addition it will reduce the effect of another problem, when we have
      many nested namespaces and the oldest one starts dying.
      zap_pid_ns_processe will be called for each namespace and find_vpid will
      be called for each process in a namespace.  find_vpid will be called
      minimum max_level^2 / 2 times.  The reason of that is that when we found a
      bit in pidmap, we can't determine this pidns is top for this process or it
      isn't.
      
      vpid is a heavy operation, so a fork bomb, which create many nested
      namespace, can make a system inaccessible for a long time.  For example my
      system becomes inaccessible for a few minutes with 4000 processes.
      
      [akpm@linux-foundation.org: return -EINVAL in response to excessive nesting, not -ENOMEM]
      Signed-off-by: NAndrew Vagin <avagin@openvz.org>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f2302505
  5. 25 10月, 2012 1 次提交
  6. 24 10月, 2012 16 次提交
  7. 22 10月, 2012 1 次提交
  8. 20 10月, 2012 6 次提交
  9. 17 10月, 2012 2 次提交
  10. 16 10月, 2012 1 次提交
  11. 13 10月, 2012 3 次提交