1. 03 Dec 2009, 1 commit
    • sched, cputime: Cleanups related to task_times() · d99ca3b9
      Hidetoshi Seto authored
      - Remove the if (ut)/if (st) NULL checks, because no caller
        passes NULL any more.
      - Use cputime_{add,sub}().
      - Wrap prev_{u,s}time in #ifndef/#endif, since they are used
        only when !VIRT_CPU_ACCOUNTING (see the sketch below).
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Spencer Candland <spencer@bluehost.com>
      Cc: Americo Wang <xiyou.wangcong@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      LKML-Reference: <4B1624C7.7040302@jp.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
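      A minimal sketch of the third point; the surrounding struct and
      field types follow this era's kernels, so treat it as
      illustrative rather than the exact hunk:

        /* in struct task_struct */
        #ifndef CONFIG_VIRT_CPU_ACCOUNTING
                /* snapshots used by task_times(); unneeded when the
                 * architecture does precise accounting itself */
                cputime_t prev_utime, prev_stime;
        #endif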
  2. 02 Dec 2009, 2 commits
    • sched: Fix isolcpus boot option · bdddd296
      Rusty Russell authored
      Anton Blanchard wrote:
      
      > We allocate and zero cpu_isolated_map after the isolcpus
      > __setup option has run. This means cpu_isolated_map always
      > ends up empty and if CPUMASK_OFFSTACK is enabled we write to a
      > cpumask that hasn't been allocated.
      
      I introduced this regression in 49557e62 (sched: Fix
      boot crash by zalloc()ing most of the cpu masks).
      
      Use the bootmem allocator if isolcpus= is set; otherwise
      allocate and zero as before (see the sketch below).
      Reported-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: peterz@infradead.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: <stable@kernel.org>
      LKML-Reference: <200912021409.17013.rusty@rustcorp.com.au>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Tested-by: Anton Blanchard <anton@samba.org>
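      A sketch of the shape of the fix, using the era's bootmem and
      cpumask APIs; not a verbatim copy of the patch:

        /* __setup() handlers run before the normal allocators are
         * ready, so take the mask from bootmem when isolcpus= is
         * actually passed on the command line. */
        static int __init isolated_cpu_setup(char *str)
        {
                alloc_bootmem_cpumask_var(&cpu_isolated_map);
                cpulist_parse(str, cpu_isolated_map);
                return 1;
        }
        __setup("isolcpus=", isolated_cpu_setup);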
    • sched: Revert 498657a4 · 8592e648
      Tejun Heo authored
      Commit 498657a4 incorrectly assumed that preempt wasn't
      disabled around context_switch(), and was thus fixing an
      imaginary problem.  It also broke KVM, because KVM depends on
      ->sched_in() being called with irqs enabled so that it can do
      SMP calls from there.
      
      Revert the incorrect commit and add a comment describing the
      different contexts under which the two callbacks are invoked
      (sketched below).

      Avi spotted a transposed in/out in the added comment.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Avi Kivity <avi@redhat.com>
      Cc: peterz@infradead.org
      Cc: efault@gmx.de
      Cc: rusty@rustcorp.com.au
      LKML-Reference: <1259726212-30259-2-git-send-email-tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
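      The callback signatures below are the preempt_ops pair the
      commit talks about; the context notes in the comments paraphrase
      the commit message, so read them as a sketch of the added
      documentation:

        struct preempt_ops {
                /* called as the task is scheduled back in; interrupts
                 * are enabled here, which KVM's SMP calls rely on */
                void (*sched_in)(struct preempt_notifier *notifier, int cpu);
                /* called from within context_switch(), with preemption
                 * disabled */
                void (*sched_out)(struct preempt_notifier *notifier,
                                  struct task_struct *next);
        };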
  3. 26 Nov 2009, 4 commits
    • sched, time: Define nsecs_to_jiffies() · b7b20df9
      Hidetoshi Seto authored
      Using msecs_to_jiffies() to implement nsecs_to_cputime() has
      some problems:
      
       - The type of msecs_to_jiffies()'s argument is unsigned int, so
         it cannot convert msecs greater than UINT_MAX = about 49.7 days.
      
       - msecs_to_jiffies() returns MAX_JIFFY_OFFSET if the MSB of the
         argument is set, assuming the input was a negative value.  So
         it cannot convert msecs greater than INT_MAX = about 24.8 days
         either.
      
      This patch defines a new function, nsecs_to_jiffies(), that can
      handle larger values and that treats all incoming values as
      unsigned (see the sketch below).
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Spencer Candland <spencer@bluehost.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Americo Wang <xiyou.wangcong@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Stultz <johnstul@linux.vnet.ibm.com>
      LKML-Reference: <4B0E16E7.5070307@jp.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
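      A simplified sketch of such a conversion for the common case
      where HZ divides NSEC_PER_SEC evenly; the mainline function also
      handles the other HZ cases:

        u64 nsecs_to_jiffies(u64 n)
        {
                /* full u64 math: no unsigned-int truncation at
                 * ~49.7 days and no MSB-as-sign surprise at ~24.8 days */
                return div_u64(n, NSEC_PER_SEC / HZ);
        }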
    • sched: Remove task_{u,s,g}time() · d5b7c78e
      Hidetoshi Seto authored
      Now all task_{u,s}time() pairs have been replaced by
      task_times(), and task_gtime() is too simple to be an inline
      function.

      Clean them all up.
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Spencer Candland <spencer@bluehost.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Americo Wang <xiyou.wangcong@gmail.com>
      LKML-Reference: <4B0E16D1.70902@jp.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Introduce task_times() to replace task_{u,s}time() pair · d180c5bc
      Hidetoshi Seto authored
      Functions task_{u,s}time() are called as a pair in almost all
      cases.  However, task_stime() is implemented to call task_utime()
      internally, so such paired calls run task_utime() twice.
      
      That means we do the heavy divisions (div_u64 + do_div) twice to
      get utime and stime, which could be obtained at the same time by
      one set of divisions.

      This patch introduces a function task_times(*tsk, *utime,
      *stime) to retrieve utime and stime at once, in a better,
      optimized way (sketched below).
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Spencer Candland <spencer@bluehost.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Americo Wang <xiyou.wangcong@gmail.com>
      LKML-Reference: <4B0E16AE.906@jp.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
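      A condensed sketch of the combined accessor: sum_exec_runtime is
      scaled once and split between utime and stime, so the divisions
      happen a single time per call (details assumed):

        void task_times(struct task_struct *p, cputime_t *ut, cputime_t *st)
        {
                cputime_t utime = p->utime;
                cputime_t total = cputime_add(utime, p->stime);
                cputime_t rtime = nsecs_to_cputime(p->se.sum_exec_runtime);

                if (total) {
                        /* the one set of divisions, shared by both results */
                        u64 temp = (u64)rtime * utime;
                        do_div(temp, total);
                        utime = (cputime_t)temp;
                } else {
                        utime = rtime;
                }

                *ut = utime;
                *st = cputime_sub(rtime, utime);
        }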
    • sched: Limit the number of scheduler debug messages · f6630114
      Mike Travis authored
      Remove the verbose scheduler debug messages unless the kernel
      parameter "sched_debug" is set.  /proc/sched_debug is unchanged
      (see the sketch below).
      Signed-off-by: Mike Travis <travis@sgi.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Roland Dreier <rdreier@cisco.com>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: Yinghai Lu <yhlu.kernel@gmail.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Jack Steiner <steiner@sgi.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20091118002221.489305000@alcatraz.americas.sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
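      A sketch of gating boot-time output on a kernel parameter; the
      flag and handler names here are illustrative, not necessarily
      the commit's exact ones:

        static int sched_debug_enabled;

        static int __init sched_debug_setup(char *str)
        {
                sched_debug_enabled = 1;
                return 0;
        }
        early_param("sched_debug", sched_debug_setup);

        /* the verbose domain dumps then become conditional:
         *      if (sched_debug_enabled)
         *              sched_domain_debug(sd, cpu);
         */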
  4. 25 Nov 2009, 1 commit
  5. 24 Nov 2009, 3 commits
    • sched: Optimize branch hint in context_switch() · 710390d9
      Tim Blechmann authored
      Branch hint profiling on my Nehalem machine showed over 90%
      incorrect branch hints:

        10420275 170645395  94 context_switch                 sched.c    3043
        10408421 171098521  94 context_switch                 sched.c    3050
      Signed-off-by: Tim Blechmann <tim@klingt.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <4B0BBB9F.6080304@klingt.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
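      The change itself is tiny; a sketch of dropping a hint of this
      kind in context_switch():

        /* before: the hint claims the no-mm (kernel thread) case is
         * rare, which the profile shows is wrong ~94% of the time */
        if (unlikely(!mm))
                enter_lazy_tlb(oldmm, next);

        /* after: no hint; let the hardware branch predictor decide */
        if (!mm)
                enter_lazy_tlb(oldmm, next);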
    • sched: Optimize branch hint in pick_next_task_fair() · 36ace27e
      Tim Blechmann authored
      Branch hint profiling on my Nehalem machine showed 90%
      incorrect branch hints:

        15728471 158903754  90 pick_next_task_fair            sched_fair.c    1555
      Signed-off-by: Tim Blechmann <tim@klingt.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <4B0BBBB1.2050100@klingt.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
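      Same pattern as the previous commit, sketched:

        /* before */
        if (unlikely(!cfs_rq->nr_running))
                return NULL;

        /* after: hint removed */
        if (!cfs_rq->nr_running)
                return NULL;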
    • sched_feat_write(): Update ppos instead of file->f_pos · 42994724
      Jan Blunck authored
      sched_feat_write() should update ppos instead of file->f_pos
      (see the sketch below).

      (This reduces some BKL dependencies of this code.)
      Signed-off-by: Jan Blunck <jblunck@suse.de>
      Cc: jkacur@redhat.com
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jamie Lokier <jamie@shareable.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      LKML-Reference: <1258735245-25826-8-git-send-email-jblunck@suse.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
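      The essence of the fix, sketched:

        static ssize_t
        sched_feat_write(struct file *filp, const char __user *ubuf,
                         size_t cnt, loff_t *ppos)
        {
                /* ... parse the feature string from ubuf ... */

                /* before: filp->f_pos += cnt;  (relies on the BKL) */
                *ppos += cnt;   /* after: use the offset the VFS passed in */

                return cnt;
        }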
  6. 16 Nov 2009, 1 commit
  7. 15 Nov 2009, 1 commit
  8. 13 Nov 2009, 2 commits
  9. 12 Nov 2009, 2 commits
    • sched: Fix granularity of task_u/stime() · 761b1d26
      Hidetoshi Seto authored
      Originally, task_s/utime() were designed to return clock_t, but
      they were later changed to return cputime_t by the following
      commit:
      
        commit efe567fc
        Author: Christian Borntraeger <borntraeger@de.ibm.com>
        Date:   Thu Aug 23 15:18:02 2007 +0200
      
      It only changed the type of the return value, not the
      implementation.  As a result, the granularity of task_s/utime()
      is still that of clock_t, not that of cputime_t.

      So using task_s/utime() in __exit_signal() causes the values
      accumulated in the signal struct to be rounded and coarse
      grained.
      
      This patch removes the casts to clock_t in task_u/stime(), to
      keep cputime_t granularity throughout the calculation (see the
      sketch below).

      v2:
        Use div_u64() to avoid the error "undefined reference to
        `__udivdi3`" on some 32-bit systems.
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: xiyou.wangcong@gmail.com
      Cc: Spencer Candland <spencer@bluehost.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      LKML-Reference: <4AFB9029.9000208@jp.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
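      A sketch of the reworked scaling in task_utime(), staying in
      cputime_t and using div_u64() as per v2; the surrounding
      prev_utime bookkeeping is omitted:

        /* no cputime_to_clock_t()/clock_t casts anywhere */
        cputime_t utime = p->utime;
        cputime_t total = cputime_add(utime, p->stime);
        u64 temp = (u64)nsecs_to_cputime(p->se.sum_exec_runtime);

        if (total) {
                temp *= utime;
                /* div_u64(), not a plain '/', so 32-bit builds link
                 * without __udivdi3 */
                temp = div_u64(temp, total);
        }
        return (cputime_t)temp;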
    • sched: Fix/add missing update_rq_clock() calls · 055a0086
      Mike Galbraith authored
      kthread_bind(), migrate_task() and sched_fork() were missing
      updates, and try_to_wake_up() was updating only after having
      already used the stale clock (see the sketch below).

      Aside from preventing potential latency hits, there's a side
      benefit in that early boot printk timestamps become monotonic.
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1258020464.6491.2.camel@marge.simson.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      LKML-Reference: <new-submission>
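      The pattern being enforced, sketched:

        rq = task_rq_lock(p, &flags);
        update_rq_clock(rq);    /* refresh rq->clock before reading it */
        /* ... code that consumes rq->clock ... */
        task_rq_unlock(rq, &flags);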
  10. 11 Nov 2009, 1 commit
  11. 10 Nov 2009, 1 commit
  12. 08 Nov 2009, 4 commits
    • sched, no_hz: Remove unused rq->last_tick_seen field · d8c80ce0
      Lai Jiangshan authored
      Commit 15934a37 added the field last_tick_seen to struct rq,
      but it is unused now.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Guillaume Chazarain <guichaz@yahoo.fr>
      LKML-Reference: <4AE6A513.6010100@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Use root_task_group_empty only with FAIR_GROUP_SCHED · e9036b36
      Cyrill Gorcunov authored
      root_task_group_empty() is used only with FAIR_GROUP_SCHED, so
      with other scheduler options we get:

        kernel/sched.c:314: warning: 'root_task_group_empty' defined but not used

      So move the CONFIG_FAIR_GROUP_SCHED block up so that it covers
      root_task_group_empty() (see the sketch below).
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <20091026192414.GB5321@lenovo>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
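      The shape of the fix, sketched:

        #ifdef CONFIG_FAIR_GROUP_SCHED  /* moved up to cover the helper */
        static inline int root_task_group_empty(void)
        {
                return list_empty(&root_task_group.children);
        }
        #endif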
    • sched: Fix kernel-doc function parameter name · 968c8645
      Randy Dunlap authored
      Fix a variable name in sched.c kernel-doc notation (see the
      corrected comment below).

      Fixes these DocBook warnings:

       Warning(kernel/sched.c:2008): No description found for parameter 'p'
       Warning(kernel/sched.c:2008): Excess function parameter 'k' description in 'kthread_bind'
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      LKML-Reference: <4AF4B1BC.8020604@oracle.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
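      The corrected kernel-doc, sketched (the description wording is
      assumed):

        /**
         * kthread_bind - bind a just-created kthread to a cpu.
         * @p: thread created by kthread_create().
         * @cpu: cpu (might not be online, must be possible) for @p to
         *       run on.
         */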
    • genirq: try_one_irq() must be called with irq disabled · e7e7e0c0
      Yong Zhang authored
      Prarit reported:
      =================================
      [ INFO: inconsistent lock state ]
      2.6.32-rc5 #1
      ---------------------------------
      inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
      swapper/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
       (&irq_desc_lock_class){?.-...}, at: [<ffffffff810c264e>] try_one_irq+0x32/0x138
      {IN-HARDIRQ-W} state was registered at:
       [<ffffffff81095160>] __lock_acquire+0x2fc/0xd5d
       [<ffffffff81095cb4>] lock_acquire+0xf3/0x12d
       [<ffffffff814cdadd>] _spin_lock+0x40/0x89
       [<ffffffff810c3389>] handle_level_irq+0x30/0x105
       [<ffffffff81014e0e>] handle_irq+0x95/0xb7
       [<ffffffff810141bd>] do_IRQ+0x6a/0xe0
       [<ffffffff81012813>] ret_from_intr+0x0/0x16
      irq event stamp: 195096
      hardirqs last  enabled at (195096): [<ffffffff814cd7f7>] _spin_unlock_irq+0x3a/0x5c
      hardirqs last disabled at (195095): [<ffffffff814cdbdd>] _spin_lock_irq+0x29/0x95
      softirqs last  enabled at (195088): [<ffffffff81068c92>] __do_softirq+0x1c1/0x1ef
      softirqs last disabled at (195093): [<ffffffff8101304c>] call_softirq+0x1c/0x30
      
      other info that might help us debug this:
      1 lock held by swapper/0:
       #0:  (kernel/irq/spurious.c:21){+.-...}, at: [<ffffffff81070cf2>]
      run_timer_softirq+0x1a9/0x315
      
      stack backtrace:
      Pid: 0, comm: swapper Not tainted 2.6.32-rc5 #1
      Call Trace:
       <IRQ>  [<ffffffff81093e94>] valid_state+0x187/0x1ae
       [<ffffffff81093fe4>] mark_lock+0x129/0x253
       [<ffffffff810951d4>] __lock_acquire+0x370/0xd5d
       [<ffffffff81095cb4>] lock_acquire+0xf3/0x12d
       [<ffffffff814cdadd>] _spin_lock+0x40/0x89
       [<ffffffff810c264e>] try_one_irq+0x32/0x138
       [<ffffffff810c2795>] poll_all_shared_irqs+0x41/0x6d
       [<ffffffff810c27dd>] poll_spurious_irqs+0x1c/0x49
       [<ffffffff81070d82>] run_timer_softirq+0x239/0x315
       [<ffffffff81068bd3>] __do_softirq+0x102/0x1ef
       [<ffffffff8101304c>] call_softirq+0x1c/0x30
       [<ffffffff81014b65>] do_softirq+0x59/0xca
       [<ffffffff810686ad>] irq_exit+0x58/0xae
       [<ffffffff81029b84>] smp_apic_timer_interrupt+0x94/0xba
       [<ffffffff81012a33>] apic_timer_interrupt+0x13/0x20
      
      The reason is that try_one_irq() is called both from hardirq
      context with interrupts disabled and from softirq context
      (poll_all_shared_irqs()) with interrupts enabled.

      Disable interrupts before calling it from poll_all_shared_irqs()
      (see the sketch below).
      Reported-and-tested-by: Prarit Bhargava <prarit@redhat.com>
      Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
      LKML-Reference: <1257563773-4620-1-git-send-email-yong.zhang0@gmail.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
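      A sketch of the fix in poll_all_shared_irqs():

        static void poll_all_shared_irqs(void)
        {
                struct irq_desc *desc;
                int i;

                for_each_irq_desc(i, desc) {
                        /* run with interrupts off, matching the
                         * hardirq-context caller, so the lock is taken
                         * in one consistent state */
                        local_irq_disable();
                        try_one_irq(i, desc);
                        local_irq_enable();
                }
        }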
  13. 05 Nov 2009, 3 commits
    • sched: Fix affinity logic in select_task_rq_fair() · fd210738
      Mike Galbraith authored
      Ingo Molnar reported:
      
      [   26.804000] BUG: using smp_processor_id() in preemptible [00000000] code: events/1/10
      [   26.808000] caller is vmstat_update+0x26/0x70
      [   26.812000] Pid: 10, comm: events/1 Not tainted 2.6.32-rc5 #6887
      [   26.816000] Call Trace:
      [   26.820000]  [<c1924a24>] ? printk+0x28/0x3c
      [   26.824000]  [<c13258a0>] debug_smp_processor_id+0xf0/0x110
      [   26.824000] mount used greatest stack depth: 1464 bytes left
      [   26.828000]  [<c111d086>] vmstat_update+0x26/0x70
      [   26.832000]  [<c1086418>] worker_thread+0x188/0x310
      [   26.836000]  [<c10863b7>] ? worker_thread+0x127/0x310
      [   26.840000]  [<c108d310>] ? autoremove_wake_function+0x0/0x60
      [   26.844000]  [<c1086290>] ? worker_thread+0x0/0x310
      [   26.848000]  [<c108cf0c>] kthread+0x7c/0x90
      [   26.852000]  [<c108ce90>] ? kthread+0x0/0x90
      [   26.856000]  [<c100c0a7>] kernel_thread_helper+0x7/0x10
      [   26.860000] BUG: using smp_processor_id() in preemptible [00000000] code: events/1/10
      [   26.864000] caller is vmstat_update+0x3c/0x70
      
      Because this commit:

        a1f84a3a: sched: Check for an idle shared cache in select_task_rq_fair()

      broke ->cpus_allowed (see the sketch below).
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: arjan@infradead.org
      Cc: <stable@kernel.org>
      LKML-Reference: <1257415066.12867.1.camel@marge.simson.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
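      A sketch of a ->cpus_allowed-respecting scan of the cache
      domain; the loop shape is assumed, the point is filtering by the
      task's allowed mask:

        for_each_cpu_and(i, sched_domain_span(sd), &p->cpus_allowed) {
                if (idle_cpu(i)) {
                        target = i;  /* idle sibling p may actually run on */
                        break;
                }
        }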
    • sched: Rate-limit newidle · 1b9508f6
      Mike Galbraith authored
      Rate-limit newidle balancing to migration_cost; it's a win for
      all stages of the sysbench oltp tests (see the sketch below).
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
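      A sketch of the gate, with the field names assumed:

        static void idle_balance(int this_cpu, struct rq *this_rq)
        {
                this_rq->idle_stamp = this_rq->clock;

                /* if recent idle periods were shorter than the cost of
                 * a migration, pulling work over cannot pay off */
                if (this_rq->avg_idle < sysctl_sched_migration_cost)
                        return;

                /* ... usual newidle balancing ... */
        }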
    • sched: Check for an idle shared cache in select_task_rq_fair() · a1f84a3a
      Mike Galbraith authored
      When waking affine, check for an idle shared cache, and if
      found, wake to that CPU/sibling instead of the waker's CPU.
      
      This improves pgsql+oltp ramp-up by roughly 8%.  Possibly more
      for other loads, depending on overlap.  The trade-off is a
      roughly 1% peak downturn if tasks are truly synchronous.
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: <stable@kernel.org>
      LKML-Reference: <1256654138.17752.7.camel@marge.simson.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  14. 04 Nov 2009, 6 commits
  15. 03 Nov 2009, 5 commits
    • Correct nr_processes() when CPUs have been unplugged · 1d510750
      Ian Campbell authored
      nr_processes() returns the sum of the per cpu counter process_counts for
      all online CPUs. This counter is incremented for the current CPU on
      fork() and decremented for the current CPU on exit(). Since a process
      does not necessarily fork and exit on the same CPU the process_count for
      an individual CPU can be either positive or negative and effectively has
      no meaning in isolation.
      
      Therefore calculating the sum of process_counts over only the online
      CPUs omits the processes which were started or stopped on any CPU which
      has since been unplugged. Only the sum of process_counts across all
      possible CPUs has meaning.
      
      The only caller of nr_processes() is proc_root_getattr() which
      calculates the number of links to /proc as
              stat->nlink = proc_root.nlink + nr_processes();
      
      You don't have to be all that unlucky for nr_processes() to
      return a negative value, leading to a negative number of links
      (or rather, an apparently enormous number of links).  If this
      happens then you can get failures where things like "ls /proc"
      start to fail because they get an -EOVERFLOW from some stat()
      call.
      
      Example with some debugging inserted to show what goes on:
              # ps haux|wc -l
              nr_processes: CPU0:     90
              nr_processes: CPU1:     1030
              nr_processes: CPU2:     -900
              nr_processes: CPU3:     -136
              nr_processes: TOTAL:    84
              proc_root_getattr. nlink 12 + nr_processes() 84 = 96
              84
              # echo 0 >/sys/devices/system/cpu/cpu1/online
              # ps haux|wc -l
              nr_processes: CPU0:     85
              nr_processes: CPU2:     -901
              nr_processes: CPU3:     -137
              nr_processes: TOTAL:    -953
              proc_root_getattr. nlink 12 + nr_processes() -953 = -941
              75
              # stat /proc/
              nr_processes: CPU0:     84
              nr_processes: CPU2:     -901
              nr_processes: CPU3:     -137
              nr_processes: TOTAL:    -954
              proc_root_getattr. nlink 12 + nr_processes() -954 = -942
                File: `/proc/'
                Size: 0               Blocks: 0          IO Block: 1024   directory
              Device: 3h/3d   Inode: 1           Links: 4294966354
              Access: (0555/dr-xr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
              Access: 2009-11-03 09:06:55.000000000 +0000
              Modify: 2009-11-03 09:06:55.000000000 +0000
              Change: 2009-11-03 09:06:55.000000000 +0000
      
      I'm not 100% convinced that the per_cpu regions remain valid for offline
      CPUs, although my testing suggests that they do. If not then I think the
      correct solution would be to aggregate the process_count for a given CPU
      into a global base value in cpu_down().
      
      This bug appears to pre-date the transition to git and it looks like it
      may even have been present in linux-2.6.0-test7-bk3 since it looks like
      the code Rusty patched in http://lwn.net/Articles/64773/ was already
      wrong.
      Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
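      The fix itself is a one-iterator change:

        int nr_processes(void)
        {
                int cpu;
                int total = 0;

                /* was for_each_online_cpu(): that misses counts parked
                 * on CPUs that have since been unplugged */
                for_each_possible_cpu(cpu)
                        total += per_cpu(process_counts, cpu);

                return total;
        }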
    • PM / Hibernate: Add newline to load_image() fail path · bf9fd67a
      Jiri Slaby authored
      Finish the line with \n when load_image() fails in the middle of
      loading.
      Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
    • PM / Hibernate: Fix error handling in save_image() · 4ff277f9
      Jiri Slaby authored
      There are too many retval variables in save_image().  Thus the
      error return value from snapshot_read_next() may be ignored, and
      only part of the snapshot (successfully) written.
      
      Remove the 'error' variable, invert the condition in the
      do-while loop, and convert the loop to use only the 'ret'
      variable (see the sketch below).

      Switch the rest of the function to consider only 'ret'.

      Also make sure we end the printed line with \n if an error
      occurs.
      Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
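      A sketch of the reworked loop shape with the single 'ret'; the
      helper signatures follow the 2.6.32-era swsusp code and are
      assumptions:

        do {
                ret = snapshot_read_next(&snapshot, PAGE_SIZE);
                if (ret <= 0)
                        break;
                ret = swap_write_page(handle, data_of(snapshot), &bio);
                if (ret)
                        break;
                nr_pages++;
        } while (1);

        if (ret < 0)
                printk(KERN_CONT "\n");  /* finish the progress line */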
    • PM / Hibernate: Fix blkdev refleaks · 76b57e61
      Jiri Slaby authored
      While cruising through the swsusp code I found a few blkdev
      reference leaks of resume_bdev.

      swsusp_read: remove blkdev_put altogether; some fail paths do
                   not do it.
      swsusp_check: make sure we always put a reference on fail paths
                    (see the sketch below).
      software_resume: all fail paths between swsusp_check and
                       swsusp_read omit swsusp_close.  Add it in those
                       cases.  And since swsusp_read doesn't drop the
                       reference anymore, do it here unconditionally.
      
      [rjw: Fixed a small coding style issue.]
      Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
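      The swsusp_check() part, sketched; check_signature() stands in
      for the real signature-verification code:

        int swsusp_check(void)
        {
                int error;

                resume_bdev = open_by_devnum(swsusp_resume_device, FMODE_READ);
                if (IS_ERR(resume_bdev))
                        return PTR_ERR(resume_bdev);

                error = check_signature(resume_bdev);
                if (error)
                        /* previously leaked on this path */
                        blkdev_put(resume_bdev, FMODE_READ);

                return error;
        }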
    • sched: Fix kthread_bind() by moving the body of kthread_bind() to sched.c · b84ff7d6
      Mike Galbraith authored
      Eric Paris reported that commit f685ceac causes boot-time
      PREEMPT_DEBUG complaints:
      
       [    4.590699] BUG: using smp_processor_id() in preemptible [00000000] code: rmmod/1314
       [    4.593043] caller is task_hot+0x86/0xd0
      
      Since kthread_bind() messes with scheduler internals, move the
      body to sched.c and lock the runqueue (see the sketch below).
      Reported-by: Eric Paris <eparis@redhat.com>
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Tested-by: Eric Paris <eparis@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1256813310.7574.3.camel@marge.simson.net>
      [ v2: fix !SMP build and clean up ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
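      A sketch of the relocated body, taking the target runqueue lock
      while the task is re-homed; field details follow the era's
      scheduler and are assumptions:

        void kthread_bind(struct task_struct *p, unsigned int cpu)
        {
                struct rq *rq = cpu_rq(cpu);
                unsigned long flags;

                /* the rq lock keeps task_hot() and friends away while
                 * the task's placement is rewritten */
                spin_lock_irqsave(&rq->lock, flags);
                set_task_cpu(p, cpu);
                p->cpus_allowed = cpumask_of_cpu(cpu);
                p->rt.nr_cpus_allowed = 1;
                p->flags |= PF_THREAD_BOUND;
                spin_unlock_irqrestore(&rq->lock, flags);
        }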
  16. 02 Nov 2009, 3 commits
    • rcu: Fix long-grace-period race between forcing and initialization · 83f5b01f
      Paul E. McKenney authored
      Very long RCU read-side critical sections (50 milliseconds or
      so) can cause a race between force_quiescent_state() and
      rcu_start_gp() as follows on kernel builds with multi-level
      rcu_node hierarchies:
      
      1.	CPU 0 calls force_quiescent_state(), sees that there is a
      	grace period in progress, and acquires ->fsqlock.
      
      2.	CPU 1 detects the end of the grace period, and so
      	cpu_quiet_msk_finish() sets rsp->completed to rsp->gpnum.
      	This operation is carried out under the root rnp->lock,
      	but CPU 0 has not yet acquired that lock.  Note that
      	rsp->signaled is still RCU_SAVE_DYNTICK from the last
      	grace period.
      
      3.	CPU 1 calls rcu_start_gp(), but no one wants a new grace
      	period, so it drops the root rnp->lock and returns.
      
      4.	CPU 0 acquires the root rnp->lock and picks up rsp->completed
      	and rsp->signaled, then drops rnp->lock.  It then enters the
      	RCU_SAVE_DYNTICK leg of the switch statement.
      
      5.	CPU 2 invokes call_rcu(), and now needs a new grace period.
      	It calls rcu_start_gp(), which acquires the root rnp->lock, sets
      	rsp->signaled to RCU_GP_INIT (too bad that CPU 0 is already in
      	the RCU_SAVE_DYNTICK leg of the switch statement!)  and starts
      	initializing the rcu_node hierarchy.  If there are multiple
      	levels to the hierarchy, it will drop the root rnp->lock and
      	initialize the lower levels of the hierarchy.
      
      6.	CPU 0 notes that rsp->completed has not changed, which permits
      	both CPU 2 and CPU 0 to try updating it concurrently.  If CPU 0's
      	update prevails, later calls to force_quiescent_state() can
      	count old quiescent states against the new grace period, which
      	can in turn result in premature ending of grace periods.
      
      	Not good.
      
      This patch adds an RCU_GP_IDLE state for rsp->signaled that is
      set initially at boot time and any time a grace period ends
      (see the sketch below).  This prevents CPU 0 from getting into
      the workings of force_quiescent_state() in step 4.  Additional
      locking and checks prevent the concurrent update of
      rsp->signaled in step 6.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1256742889199-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
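      The new state slots in alongside the existing rsp->signaled
      values; names per the era's rcutree code, sketched:

        /* values for rsp->signaled */
        #define RCU_GP_IDLE          0  /* new: no grace period in progress */
        #define RCU_GP_INIT          1  /* grace period being initialized */
        #define RCU_SAVE_DYNTICK     2  /* need to scan dyntick state */
        #define RCU_FORCE_QS         3  /* need to force quiescent states */

        /* force_quiescent_state() now bails out early when it sees
         * RCU_GP_IDLE instead of wandering into a stale state. */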
    • uids: Prevent tear down race · b00bc0b2
      Thomas Gleixner authored
      Ingo triggered the following warning:
      
      WARNING: at lib/debugobjects.c:255 debug_print_object+0x42/0x50()
      Hardware name: System Product Name
      ODEBUG: init active object type: timer_list
      Modules linked in:
      Pid: 2619, comm: dmesg Tainted: G        W  2.6.32-rc5-tip+ #5298
      Call Trace:
       [<81035443>] warn_slowpath_common+0x6a/0x81
       [<8120e483>] ? debug_print_object+0x42/0x50
       [<81035498>] warn_slowpath_fmt+0x29/0x2c
       [<8120e483>] debug_print_object+0x42/0x50
       [<8120ec2a>] __debug_object_init+0x279/0x2d7
       [<8120ecb3>] debug_object_init+0x13/0x18
       [<810409d2>] init_timer_key+0x17/0x6f
       [<81041526>] free_uid+0x50/0x6c
       [<8104ed2d>] put_cred_rcu+0x61/0x72
       [<81067fac>] rcu_do_batch+0x70/0x121
      
      debugobjects warns about an enqueued timer being initialized. If
      CONFIG_USER_SCHED=y the user management code uses delayed work to
      remove the user from the hash table and tear down the sysfs objects.
      
      free_uid() is called from RCU and initializes/schedules delayed
      work if the usage count of the user_struct is 0.  The
      init/schedule happens outside of the uidhash_lock protected
      region, which allows a concurrent caller of find_user() to
      reference the about-to-be-destroyed user_struct without
      preventing the work from being scheduled.  If the next
      free_uid() call happens before the work timer has expired, the
      active timer is initialized and the work scheduled again.

      The race was introduced in commit 5cb350ba (sched: group
      scheduling, sysfs tunables) and made more prominent by commit
      3959214f (sched: delayed cleanup of user_struct).
      
      Move the init/schedule_delayed_work inside of the uidhash_lock
      protected region to prevent the race (see the sketch below).
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@us.ibm.com>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Cc: stable@kernel.org
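      A sketch of the reordering; the work-function name is assumed,
      and the point is that the init/schedule now happens before
      uidhash_lock is dropped:

        local_irq_save(flags);
        if (atomic_dec_and_lock(&up->__count, &uidhash_lock)) {
                /* still under uidhash_lock: a racing find_user()
                 * cannot hand 'up' out again while its teardown is
                 * being queued */
                INIT_DELAYED_WORK(&up->work, cleanup_user_struct);
                schedule_delayed_work(&up->work, msecs_to_jiffies(1000));
                spin_unlock_irqrestore(&uidhash_lock, flags);
        } else {
                local_irq_restore(flags);
        }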
    • sched: Fix boot crash by zalloc()ing most of the cpu masks · 49557e62
      Rusty Russell authored
      I got a boot crash when forcing cpumasks offstack on 32-bit,
      because find_new_ilb() returned 3 on my UP system
      (nohz.cpu_mask wasn't zeroed).

      AFAICT the others need to be zeroed too: only
      nohz.ilb_grp_nohz_mask is initialized before use (see the
      sketch below).
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <200911022037.21282.rusty@rustcorp.com.au>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
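      The shape of the fix: the masks that are read before being fully
      written switch from alloc_cpumask_var() to zalloc_cpumask_var():

        /* before, alloc_cpumask_var() left off-stack masks holding
         * whatever the allocator returned, so find_new_ilb() could
         * see garbage bits set */
        zalloc_cpumask_var(&nohz.cpu_mask, GFP_NOWAIT);
        /* ilb_grp_nohz_mask may keep plain alloc_cpumask_var(): it is
         * fully initialized before it is read */
        alloc_cpumask_var(&nohz.ilb_grp_nohz_mask, GFP_NOWAIT);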