1. 05 Dec, 2007 (2 commits)
  2. 04 Dec, 2007 (1 commit)
  3. 03 Dec, 2007 (1 commit)
    • sched: cpu accounting controller (V2) · d842de87
      Authored by Srivatsa Vaddagiri
      Commit cfb52856 removed a useful feature: a cpu accounting resource
      controller.  This feature would be useful if someone wants to group
      tasks only for accounting purposes and doesn't really want to exercise
      any control over their cpu consumption.
      
      The patch below reintroduces the feature. It is based on Paul Menage's
      original patch (Commit 62d0df64), with
      these differences:
      
        - Removed load average information.  I felt it needs more thought
          (especially to deal with SMP and virtualized platforms) and can be
          added for 2.6.25 after more discussion.
        - Converted group cpu usage to be nanosecond accurate (as the rest
          of the cfs stats are), with cpuacct_charge() invoked from the
          respective scheduler classes.
        - Made accounting scalable on SMP systems by splitting the usage
          counter to be per-cpu (as sketched below).
        - Moved the code from kernel/cpu_acct.c to kernel/sched.c (the code
          is not big enough to warrant a new file, and it rightly needs to
          live inside the scheduler; things like accessing rq->lock while
          reading cpu usage also become easier if the code lives in
          kernel/sched.c).
      
      The patch also modifies the cpu controller so that it no longer
      provides the same accounting information.
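
      A minimal sketch of the per-cpu split (plain user-space C with
      illustrative names and a fixed NR_CPUS; the real code uses the
      kernel's per-cpu machinery in kernel/sched.c):

        #include <stdint.h>

        #define NR_CPUS 8

        struct cpuacct {
                uint64_t cpuusage[NR_CPUS];     /* nanoseconds used, per cpu */
        };

        /* called from the scheduler classes with the runtime just consumed */
        static void cpuacct_charge(struct cpuacct *ca, int cpu, uint64_t delta_ns)
        {
                ca->cpuusage[cpu] += delta_ns;  /* no shared counter to contend on */
        }

        /* reading cpuacct.usage sums the per-cpu counters */
        static uint64_t cpuacct_usage(const struct cpuacct *ca)
        {
                uint64_t sum = 0;
                for (int cpu = 0; cpu < NR_CPUS; cpu++)
                        sum += ca->cpuusage[cpu];
                return sum;
        }
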
      Tested-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      
       Tested the patches on top of 2.6.24-rc3. The patches work fine. Ran
       some simple tests like cpuspin (spin on the cpu), ran several tasks in
       the same group and timed them. Compared their time stamps with
       cpuacct.usage.
      Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
      Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  4. 30 Nov, 2007 (4 commits)
  5. 28 Nov, 2007 (5 commits)
  6. 27 Nov, 2007 (5 commits)
  7. 20 Nov, 2007 (5 commits)
  8. 19 Nov, 2007 (1 commit)
  9. 17 Nov, 2007 (2 commits)
    • ntp: fix typo that makes sync_cmos_clock erratic · fa6a1a55
      Authored by David P. Reed
      Fix a typo in ntp.c that has caused updating of the persistent (RTC)
      clock when synced to NTP to behave erratically.
      
      While debugging a freeze that arises on my AMD64 machines when I run
      the ntpd service, I added a number of printk's to monitor the
      sync_cmos_clock procedure.  I discovered that it was not syncing to
      the cmos RTC every 11 minutes as documented, but instead would keep
      trying every second for hours at a time.  The reason turned out to be
      a typo in sync_cmos_clock, where it attempts to ensure that
      update_persistent_clock is called very close to 500 msec after a
      1-second boundary (required by the PC RTC's spec).  That typo referred
      to "xtime" in one spot, rather than "now", which is derived from
      "xtime" but not equal to it.  This makes the test erratic, creating a
      "coin-flip" that decides when update_persistent_clock is called; when
      it is called, which is rarely, it may be at any time during the
      one-second period rather than close to 500 msec, so the value written
      is needlessly incorrect, too.
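
      The corrected window test, roughly (simplified from kernel/time/ntp.c;
      tick_nsec is the length of one tick in nanoseconds):

        now = current_kernel_time();
        /* write the RTC only within half a tick of 500 msec past the second;
         * the buggy version compared xtime.tv_nsec here instead of now.tv_nsec */
        if (abs(now.tv_nsec - (NSEC_PER_SEC / 2)) <= tick_nsec / 2)
                fail = update_persistent_clock(now);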
      
      Signed-off-by: David P. Reed
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86: ignore the sys_getcpu() tcache parameter · 4307d1e5
      Authored by Ingo Molnar
      Don't use the vgetcpu tcache - it's causing problems with migrating
      tasks: they'll see the old cache contents for up to a jiffy after the
      migration, further increasing the cost of the migration.

      In the worst case they see completely bogus information from the
      tcache, when a sys_getcpu() call has "invalidated" the cache info by
      incrementing both the jiffies and the cpuid info in the cache, and the
      following vdso_getcpu() call happens after vdso_jiffies has been
      incremented.
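
      An illustrative sketch of the staleness window (user-space C with
      made-up names; the real cache lives in the vDSO):

        struct cpu_cache {
                unsigned long stamp;   /* jiffies value when "cpu" was recorded */
                unsigned int  cpu;
        };

        static unsigned int cached_getcpu(struct cpu_cache *c,
                                          unsigned long jiffies_now,
                                          unsigned int real_cpu)
        {
                if (c->stamp == jiffies_now)
                        return c->cpu;      /* may be stale: the task can have
                                             * migrated within this jiffy */
                c->stamp = jiffies_now;     /* refresh at most once per jiffy */
                c->cpu = real_cpu;
                return real_cpu;
        }
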
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Ulrich Drepper <drepper@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  10. 16 Nov, 2007 (7 commits)
    • sched: reorder SCHED_FEAT_ bits · 9612633a
      Authored by Ingo Molnar
      Reorder the SCHED_FEAT_ bits so that the ones in use come first.
      This makes tuning instructions easier.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: make sched_nr_latency static · 518b22e9
      Authored by Adrian Bunk
      sched_nr_latency can now become static.
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: remove activate_idle_task() · 94bc9a7b
      Authored by Dmitry Adamushko
      The cpu_down() code is ok wrt sched_idle_next() placing the 'idle'
      task somewhere other than the beginning of the queue.

      So get rid of activate_idle_task() and use activate_task() instead;
      activate_idle_task() was the same as activate_task() except for a
      redundant update_rq_clock(rq) call.
      
      Code size goes down:
      
         text    data     bss     dec     hex filename
        47853    3934     336   52123    cb9b sched.o.before
        47828    3934     336   52098    cb82 sched.o.after
      Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: fix __set_task_cpu() SMP race · ce96b5ac
      Authored by Dmitry Adamushko
      Grant Wilson has reported rare SCHED_FAIR_USER crashes on his
      quad-core system, crashes that can only be explained by runqueue
      corruption.

      There is a narrow SMP race in __set_task_cpu(): after ->cpu is set to
      a new value, task_rq_lock(p, ...) can be successfully executed on
      another CPU.  We must ensure that updates of per-task data have been
      completed by that moment.

      This bug has been hiding in the Linux scheduler for an eternity (we
      never had any explicit barrier for task->cpu in set_task_cpu() - so
      the bug was introduced in 2.5.1), but only became visible via
      set_task_cfs_rq() being accidentally placed after the task->cpu
      update.  It also probably needs a sufficiently out-of-order CPU to
      trigger.
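
      A simplified sketch of the ordering the fix enforces:

        static inline void __set_task_cpu(struct task_struct *p, unsigned int cpu)
        {
                set_task_cfs_rq(p, cpu);        /* finish all per-task updates... */
        #ifdef CONFIG_SMP
                smp_wmb();                      /* ...and make them visible... */
                task_thread_info(p)->cpu = cpu; /* ...before publishing the new cpu */
        #endif
        }
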
      Reported-by: Grant Wilson <grant.wilson@zen.co.uk>
      Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: fix SCHED_FIFO tasks & FAIR_GROUP_SCHED · dae51f56
      Authored by Oleg Nesterov
      Suppose that the SCHED_FIFO task does
      
      	switch_uid(new_user);
      
      Now, p->se.cfs_rq and p->se.parent both point into the old
      user_struct->tg, because sched_move_task() doesn't call
      set_task_cfs_rq() for the !fair_sched_class case.
      
      Suppose that the old user_struct/task_group is freed/reused, and the
      task does
      
      	sched_setscheduler(SCHED_NORMAL);
      
      __setscheduler() sets fair_sched_class, but doesn't update
      ->se.cfs_rq/parent, which point to the freed memory.
      
      This means that check_preempt_wakeup() doing
      
      		while (!is_same_group(se, pse)) {
      			se = parent_entity(se);
      			pse = parent_entity(pse);
      		}
      
      may OOPS in a similar way if rq->curr or p did something like the
      above.
      
      Perhaps we need something like the patch below; note that
      __setscheduler() can't do set_task_cfs_rq().
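
      A sketch of the idea (not the literal patch): update the group
      pointers for every scheduling class, so nothing is left dangling:

        static void sched_move_task(struct task_struct *tsk)
        {
                /* ... runqueue locking and group reassignment ... */

                /* previously done only when tsk->sched_class == &fair_sched_class;
                 * doing it unconditionally keeps ->se.cfs_rq/->se.parent valid
                 * for a later sched_setscheduler(SCHED_NORMAL) */
                set_task_cfs_rq(tsk, task_cpu(tsk));

                /* ... */
        }
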
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: fix accounting of interrupts during guest execution on s390 · 9778385d
      Authored by Christian Borntraeger
      Currently the scheduler checks for PF_VCPU to decide if this timeslice
      has to be accounted as guest time.  On s390, host interrupts are not
      disabled during guest execution, which causes these interrupts to be
      accounted as guest time if CONFIG_VIRT_CPU_ACCOUNTING is set.  The
      solution is to check whether an interrupt triggered
      account_system_time.  As the tick is timer-interrupt based, we have to
      subtract hardirq_offset.
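
      The resulting check in account_system_time(), roughly (simplified):

        if ((p->flags & PF_VCPU) && (irq_count() - hardirq_offset == 0)) {
                /* really running guest code, not an interrupt that merely
                 * arrived while the guest was running */
                account_guest_time(p, cputime);
                return;
        }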
      
      I tested the patch on s390 with CONFIG_VIRT_CPU_ACCOUNTING and on
      x86_64. Seems to work.
      
      CC: Avi Kivity <avi@qumranet.com>
      CC: Laurent Vivier <Laurent.Vivier@bull.net>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • wait_task_stopped: Check p->exit_state instead of TASK_TRACED · a3474224
      Authored by Roland McGrath
      The original meaning of the old test (p->state > TASK_STOPPED) was
      "not dead", since it predates both TASK_TRACED and the
      state/exit_state split.  Commit 14bf01bb was a wrong correction in
      making this a test for TASK_TRACED.  The test should have been updated
      when TASK_TRACED was introduced and again when exit_state was
      introduced.
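
      A sketch of the corrected semantics ("has the task started dying?" is
      a question about exit_state, not about TASK_TRACED; simplified, not
      the literal diff):

        if (unlikely(p->exit_state))    /* dead or exiting: nothing to report */
                return 0;
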
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Alexey Dobriyan <adobriyan@sw.ru>
      Cc: Kees Cook <kees@ubuntu.com>
      Acked-by: Scott James Remnant <scott@ubuntu.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 15 Nov, 2007 (7 commits)
    • kernel/taskstats.c: fix bogus nlmsg_free() · f9615984
      Authored by Adrian Bunk
      We'd better not call nlmsg_free() on a pointer containing an undefined
      value (and without anything having been allocated).
      
      Spotted by the Coverity checker.
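
      The pattern being fixed, illustratively (names follow
      kernel/taskstats.c loosely; not the literal diff):

        struct sk_buff *rep_skb = NULL;
        int rc;

        rc = prepare_reply(info, TASKSTATS_CMD_NEW, &rep_skb, size);
        if (rc < 0)
                return rc;      /* bail out directly: rep_skb was never
                                 * allocated, so it must not be nlmsg_free()d */
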
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hibernate: fix lockdep report · 60a0d233
      Authored by Johannes Berg
      Lockdep reports a circular locking dependency in the hibernate code
      because
       - during system boot hibernate code (from an initcall) locks pm_mutex
         and then a sysfs buffer mutex via name_to_dev_t
       - during regular operation hibernate code locks pm_mutex under a
         sysfs buffer mutex because it's called from sysfs methods.
      
      The deadlock can never happen because during initcall invocation nothing
      can write to sysfs yet. This removes the lockdep report by marking the
      initcall locking as being in a different class.
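
      A sketch of the annotation (simplified): take pm_mutex in its own
      lockdep subclass on the boot path, where no sysfs writer can exist
      yet:

        /* boot-time path (initcall): sysfs cannot be written yet, so this
         * cannot actually deadlock against the sysfs-method path */
        mutex_lock_nested(&pm_mutex, SINGLE_DEPTH_NESTING);
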
      Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • __do_IRQ does not check IRQ_DISABLED when IRQ_PER_CPU is set · c642b839
      Authored by Russ Anderson
      In __do_IRQ(), the normal case is that IRQ_DISABLED is checked and if set
      the handler (handle_IRQ_event()) is not called.
      
      Earlier in __do_IRQ(), if IRQ_PER_CPU is set the code does not check
      IRQ_DISABLED and calls the handler even though IRQ_DISABLED is set.  This
      behavior seems unintentional.
      
      One user encountering this behavior is the CPE handler (in
      arch/ia64/kernel/mca.c).  When the CPE handler encounters too many CPEs
      (such as a solid single bit error), it sets up a polling timer and disables
      the CPE interrupt (to avoid excessive overhead logging the stream of single
      bit errors).  disable_irq_nosync() is called which sets IRQ_DISABLED.  The
      IRQ_PER_CPU flag was previously set (in ia64_mca_late_init()).  The net
      result is the CPE handler gets called even though it is marked disabled.
      
      If the behavior of not checking IRQ_DISABLED when IRQ_PER_CPU is set
      is intentional, it would be worth a comment describing the intended
      behavior.  disable_irq_nosync() does call chip->disable() to provide a
      chipset-specific interface for disabling the interrupt, which avoids
      this issue when used.
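
      A sketch of the missing check in __do_IRQ()'s per-cpu fast path
      (simplified; error/debug handling omitted):

        if (CHECK_IRQ_PER_CPU(desc->status)) {
                if (desc->chip->ack)
                        desc->chip->ack(irq);
                /* the missing check: honor IRQ_DISABLED on this path too */
                if (likely(!(desc->status & IRQ_DISABLED)))
                        handle_IRQ_event(irq, desc->action);
                desc->chip->end(irq);
                return 1;
        }
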
      Signed-off-by: Russ Anderson <rja@sgi.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • pidns: Place under CONFIG_EXPERIMENTAL · 57d5f66b
      Authored by Eric W. Biederman
      This is my trivial patch to swat innumerable little bugs with a single
      blow.
      
      After some intensive review (my apologies for not having gotten to
      this sooner), what we have looks like a good base to build on with the
      current pid namespace code, but it is not complete, and it is still
      much too easy to find issues where the kernel does the wrong thing
      outside of the initial pid namespace.

      Until the dust settles and we are certain we have the ABI and the
      implementation as correct as humanly possible, let's keep process ID
      namespaces behind CONFIG_EXPERIMENTAL.

      This allows us the option of fixing any ABI or other bugs we find, as
      long as they are minor, and it allows users of the kernel to avoid
      those bugs simply by ensuring their kernel does not have support for
      multiple pid namespaces.
      
      [akpm@linux-foundation.org: coding-style cleanups]
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Adrian Bunk <bunk@kernel.org>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Kir Kolyshkin <kir@swsoft.com>
      Cc: Kirill Korotaev <dev@sw.ru>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fix param_sysfs_builtin name length check · 22800a28
      Authored by Jan Kiszka
      Commit faf8c714 caused a regression:
      parameter names longer than MAX_KBUILD_MODNAME are now rejected,
      although only the module-name part needs to be that short.  This patch
      restores the old behaviour while still avoiding calls to memchr with a
      length parameter larger than the total string length.
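
      The restored logic, roughly (simplified; only the prefix before the
      '.' has to fit in MAX_KBUILD_MODNAME):

        size_t max_name_len = min_t(size_t, MAX_KBUILD_MODNAME,
                                    strlen(kp->name));
        char *dot = memchr(kp->name, '.', max_name_len);
        /* if there is no dot within the first max_name_len bytes, the name
         * has no module prefix to group by */
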
      Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
      Cc: Dave Young <hidave.darkstar@gmail.com>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Linux Kernel Markers: fix marker mutex not taken upon module load · 314de8a9
      Authored by Mathieu Desnoyers
      Upon module load, we must take the markers mutex.  This implies that
      the marker mutex must nest inside the module mutex, reversing the
      previous nesting order.  Make the necessary changes so that the two
      mutexes are taken in the new order.
      
      Includes some cleanup from Dave Hansen <haveblue@us.ibm.com>.
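
      The resulting lock ordering, as a sketch (mutex names as in
      kernel/module.c and kernel/marker.c):

        mutex_lock(&module_mutex);      /* outer: module load/unload */
        mutex_lock(&markers_mutex);     /* inner: marker registration state */
        /* ... update markers for the module being loaded ... */
        mutex_unlock(&markers_mutex);
        mutex_unlock(&module_mutex);
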
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • revert "Task Control Groups: example CPU accounting subsystem" · cfb52856
      Authored by Andrew Morton
      Revert 62d0df64.
      
      This was originally intended as a simple initial example of how to create a
      control groups subsystem; it wasn't intended for mainline, but I didn't make
      this clear enough to Andrew.
      
      The CFS cgroup subsystem now has better functionality for per-cgroup
      usage accounting (based directly on CFS stats) than the "usage" status
      file in this patch, and the "load" status file is rather simplistic:
      although a per-cgroup load average report would be a useful feature, I
      don't believe this patch actually provides it.  If it gets into the
      final 2.6.24 we'd probably have to support this interface forever.
      
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>