1. 22 Jul 2008, 1 commit
  2. 19 Jul 2008, 1 commit
    • genirq: enable polling for disabled screaming irqs · f84dbb91
      Eric W. Biederman authored
      When we disable a screaming irq we never see it again.  If the irq
      line is shared or if the driver half works this is a real pain.  So
      periodically poll the handlers for screaming interrupts.
      
      I use a timer instead of the classic irq poll technique of working off
      the timer interrupt, because when we use the local apic timers
      note_interrupt is never called (bug?).  Further, on a system with
      dynamic ticks the timer interrupt might not even fire unless there is
      a timer telling it it needs to.
      
      I forced this case on my test system with an e1000 nic and my ssh
      session remained responsive despite the interrupt handler only being
      called every 10th of a second.
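      The mechanism can be sketched as a small userspace model (the structure and function names here are illustrative stand-ins, not the kernel's real irq API): a timer callback periodically walks the irq descriptors and re-runs the handler of any line that was disabled as spurious.

```c
#include <assert.h>

/* Hypothetical, simplified model of polling screaming irqs: a real
 * implementation hangs this off a timer (e.g. every HZ/10), not the
 * timer interrupt, so it also runs with local APIC timers and NO_HZ. */
enum { IRQ_SPURIOUS_DISABLED = 1 << 0 };

struct irq_desc_sim {
    unsigned int status;                    /* descriptor flags */
    int handled;                            /* times the handler ran */
    void (*handler)(struct irq_desc_sim *); /* driver handler */
};

/* timer callback: give disabled screaming irqs another chance */
void poll_spurious_irqs(struct irq_desc_sim *descs, int n)
{
    for (int i = 0; i < n; i++) {
        if (!(descs[i].status & IRQ_SPURIOUS_DISABLED))
            continue;
        descs[i].handler(&descs[i]);
    }
}

void sim_handler(struct irq_desc_sim *d) { d->handled++; }
```

      A healthy irq is never touched by the poll; only lines in the spurious-disabled state get re-driven.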
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  3. 17 Jul 2008, 9 commits
  4. 16 Jul 2008, 1 commit
  5. 14 Jul 2008, 4 commits
    • lockdep: fix kernel/fork.c warning · d12c1a37
      Ingo Molnar authored
      fix:
      
      [    0.184011] ------------[ cut here ]------------
      [    0.188011] WARNING: at kernel/fork.c:918 copy_process+0x1c0/0x1084()
      [    0.192011] Pid: 0, comm: swapper Not tainted 2.6.26-tip-00351-g01d4a50-dirty #14521
      [    0.196011]  [<c0135d48>] warn_on_slowpath+0x3c/0x60
      [    0.200012]  [<c016f805>] ? __alloc_pages_internal+0x92/0x36b
      [    0.208012]  [<c033de5e>] ? __spin_lock_init+0x24/0x4a
      [    0.212012]  [<c01347e3>] copy_process+0x1c0/0x1084
      [    0.216013]  [<c013575f>] do_fork+0xb8/0x1ad
      [    0.220013]  [<c034f75e>] ? acpi_os_release_lock+0x8/0xa
      [    0.228013]  [<c034ff7a>] ? acpi_os_vprintf+0x20/0x24
      [    0.232014]  [<c01129ee>] kernel_thread+0x75/0x7d
      [    0.236014]  [<c0a491eb>] ? kernel_init+0x0/0x24a
      [    0.240014]  [<c0a491eb>] ? kernel_init+0x0/0x24a
      [    0.244014]  [<c01151b0>] ? kernel_thread_helper+0x0/0x10
      [    0.252015]  [<c06c6ac0>] rest_init+0x14/0x50
      [    0.256015]  [<c0a498ce>] start_kernel+0x2b9/0x2c0
      [    0.260015]  [<c0a4904f>] __init_begin+0x4f/0x57
      [    0.264016]  =======================
      [    0.268016] ---[ end trace 4eaa2a86a8e2da22 ]---
      [    0.272016] enabled ExtINT on CPU#0
      
      which occurs if CONFIG_TRACE_IRQFLAGS=y, CONFIG_DEBUG_LOCKDEP=y,
      but CONFIG_PROVE_LOCKING is disabled.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • lockdep: fix ftrace irq tracing false positive · 992860e9
      Ingo Molnar authored
      fix this false positive:
      
      [    0.020000] ------------[ cut here ]------------
      [    0.020000] WARNING: at kernel/lockdep.c:2718 check_flags+0x14a/0x170()
      [    0.020000] Modules linked in:
      [    0.020000] Pid: 0, comm: swapper Not tainted 2.6.26-tip-00343-gd7e5521-dirty #14486
      [    0.020000]  [<c01312e4>] warn_on_slowpath+0x54/0x80
      [    0.020000]  [<c067e451>] ? _spin_unlock_irqrestore+0x61/0x70
      [    0.020000]  [<c0131bb1>] ? release_console_sem+0x201/0x210
      [    0.020000]  [<c0143d65>] ? __kernel_text_address+0x35/0x40
      [    0.020000]  [<c010562e>] ? dump_trace+0x5e/0x140
      [    0.020000]  [<c01518b5>] ? __lock_acquire+0x245/0x820
      [    0.020000]  [<c015063a>] check_flags+0x14a/0x170
      [    0.020000]  [<c0151ed8>] ? lock_acquire+0x48/0xc0
      [    0.020000]  [<c0151ee1>] lock_acquire+0x51/0xc0
      [    0.020000]  [<c014a16c>] ? down+0x2c/0x40
      [    0.020000]  [<c010a609>] ? sched_clock+0x9/0x10
      [    0.020000]  [<c067e7b2>] _write_lock+0x32/0x60
      [    0.020000]  [<c013797f>] ? request_resource+0x1f/0xb0
      [    0.020000]  [<c013797f>] request_resource+0x1f/0xb0
      [    0.020000]  [<c02f89ad>] vgacon_startup+0x2bd/0x3e0
      [    0.020000]  [<c094d62a>] con_init+0x19/0x22f
      [    0.020000]  [<c0330c7c>] ? tty_register_ldisc+0x5c/0x70
      [    0.020000]  [<c094cf49>] console_init+0x20/0x2e
      [    0.020000]  [<c092a969>] start_kernel+0x20c/0x379
      [    0.020000]  [<c092a516>] ? unknown_bootoption+0x0/0x1f6
      [    0.020000]  [<c092a099>] __init_begin+0x99/0xa1
      [    0.020000]  =======================
      [    0.020000] ---[ end trace 4eaa2a86a8e2da22 ]---
      [    0.020000] possible reason: unannotated irqs-on.
      [    0.020000] irq event stamp: 0
      
      which occurs if CONFIG_TRACE_IRQFLAGS=y, CONFIG_DEBUG_LOCKDEP=y,
      but CONFIG_PROVE_LOCKING is disabled.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • Security: split proc ptrace checking into read vs. attach · 006ebb40
      Stephen Smalley authored
      Enable security modules to distinguish reading of process state via
      proc from full ptrace access by renaming ptrace_may_attach to
      ptrace_may_access and adding a mode argument indicating whether only
      read access or full attach access is requested.  This allows security
      modules to permit access to reading process state without granting
      full ptrace access.  The base DAC/capability checking remains unchanged.
      
      Read access to /proc/pid/mem continues to apply a full ptrace attach
      check since check_mem_permission() requires the current task to
      already be ptracing the target.  The other ptrace checks within
      proc for elements like environ, maps, and fds are changed to pass the
      read mode instead of attach.
      
      In the SELinux case, we model such reading of process state as a
      reading of a proc file labeled with the target process' label.  This
      enables SELinux policy to permit such reading of process state without
      permitting control or manipulation of the target process, as there are
      a number of cases where programs probe for such information via proc
      but do not need to be able to control the target (e.g. procps,
      lsof, PolicyKit, ConsoleKit).  At present we have to choose between
      allowing full ptrace in policy (more permissive than required/desired)
      or breaking functionality (or in some cases just silencing the denials
      via dontaudit rules but this can hide genuine attacks).
      
      This version of the patch incorporates comments from Casey Schaufler
      (change/replace existing ptrace_may_attach interface, pass access
      mode), and Chris Wright (provide greater consistency in the checking).
      
      Note that like their predecessors __ptrace_may_attach and
      ptrace_may_attach, the __ptrace_may_access and ptrace_may_access
      interfaces use different return value conventions from each other (0
      or -errno vs. 1 or 0).  I retained this difference to avoid any
      changes to the caller logic but made the difference clearer by
      changing the latter interface to return a bool rather than an int and
      by adding a comment about it to ptrace.h for any future callers.
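      A minimal sketch of the interface split and the two return-value conventions (the mode constants match the ones the patch introduces; the permissive-read "security module" and the _sim names are hypothetical stand-ins):

```c
#include <assert.h>
#include <stdbool.h>

#define PTRACE_MODE_READ   1  /* reading state via /proc */
#define PTRACE_MODE_ATTACH 2  /* full ptrace control */

/* stand-in for a security module that permits reading process state
 * (procps, lsof, ...) while still denying full ptrace attach */
static int security_ptrace_sim(int mode)
{
    return mode == PTRACE_MODE_READ ? 0 : -13; /* 0 or -EACCES */
}

/* mirrors __ptrace_may_access(): returns 0 or -errno */
int __ptrace_may_access_sim(int mode)
{
    /* the base DAC/capability checking would run here, unchanged */
    return security_ptrace_sim(mode);
}

/* mirrors ptrace_may_access(): bool convention, true if allowed */
bool ptrace_may_access_sim(int mode)
{
    return __ptrace_may_access_sim(mode) == 0;
}
```

      The bool wrapper keeps existing caller logic unchanged while making the convention mismatch visible in the type.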
      Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
      Acked-by: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: James Morris <jmorris@namei.org>
    • rcu classic: update qlen when cpu offline · 199a9528
      Lai Jiangshan authored
      When callbacks are moved from offline cpu to this cpu,
      the qlen field of this rdp should be updated.
      
      [ Paul E. McKenney: ]
      
      The effect of this bug would be for force_quiescent_state() to be invoked
      when it should not and vice versa -- wasting cycles in the first case
      and letting RCU callbacks remain piled up in the second case.  The bug
      is thus "benign" in that it does not result in premature grace-period
      termination, but should of course be fixed nonetheless.
      
      Preemption is disabled by the caller's get_cpu_var(), so we are guaranteed
      to remain on the same CPU, as required.  The local_irq_disable() is indeed
      needed, otherwise, an interrupt might invoke call_rcu() or call_rcu_bh(),
      which could cause that interrupt's increment of ->qlen to be lost.
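      The fix can be modeled in a few lines (field and function names are illustrative, not the actual rcu_data layout): when the offline CPU's callbacks are spliced over, ->qlen must be added as well.

```c
#include <assert.h>

/* toy model of per-CPU rcu data: a callback count standing in for
 * the callback list, plus the qlen bookkeeping field */
struct rcu_data_sim {
    int cbs;   /* callbacks queued (stand-in for the list) */
    int qlen;  /* length bookkeeping used by force_quiescent_state() */
};

/* splice callbacks from an offlined CPU into this CPU's rcu data;
 * in the real code interrupts are off here, so a concurrent
 * call_rcu() cannot lose its own ->qlen increment */
void rcu_move_callbacks_sim(struct rcu_data_sim *this_rdp,
                            struct rcu_data_sim *offline_rdp)
{
    this_rdp->cbs  += offline_rdp->cbs;
    this_rdp->qlen += offline_rdp->qlen; /* the previously missing update */
    offline_rdp->cbs  = 0;
    offline_rdp->qlen = 0;
}
```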
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  6. 13 Jul 2008, 1 commit
    • cpusets, hotplug, scheduler: fix scheduler domain breakage · 3e84050c
      Dmitry Adamushko authored
      Commit f18f982a ("sched: CPU hotplug events must not destroy scheduler
      domains created by the cpusets") introduced a hotplug-related problem as
      described below:
      
      Upon CPU_DOWN_PREPARE,
      
        update_sched_domains() -> detach_destroy_domains(&cpu_online_map)
      
      does the following:
      
      /*
       * Force a reinitialization of the sched domains hierarchy. The domains
       * and groups cannot be updated in place without racing with the balancing
       * code, so we temporarily attach all running cpus to the NULL domain
       * which will prevent rebalancing while the sched domains are recalculated.
       */
      
      The sched-domains should be rebuilt when a CPU_DOWN operation has
      completed: effectively either upon CPU_DEAD{_FROZEN} (on success) or
      CPU_DOWN_FAILED{_FROZEN} (on failure -- restoring things to their
      initial state).  That's what update_sched_domains() also does, but
      only for the !CPUSETS case.
      
      With f18f982a, sched-domains' reinitialization is delegated to
      CPUSETS code:
      
      cpuset_handle_cpuhp() -> common_cpu_mem_hotplug_unplug() ->
      rebuild_sched_domains()
      
      Being called for CPU_UP_PREPARE (and if its callback is called after
      update_sched_domains()), it just negates all the work done by
      update_sched_domains() -- i.e. a soon-to-be-offline cpu is included in
      the sched-domains, which makes it visible to the load-balancer
      while the CPU_DOWN operation is in progress.
      
      __migrate_live_tasks() moves the tasks off a 'dead' cpu (it's already
      "offline" when this function is called).
      
      try_to_wake_up() is called for one of these tasks from another CPU ->
      the load-balancer (wake_idle()) picks up a "dead" CPU and places the
      task on it. Then e.g. BUG_ON(rq->nr_running) detects this a bit later
      -> oops.
      Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
      Tested-by: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: miaox@cn.fujitsu.com
      Cc: rostedt@goodmis.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  7. 11 Jul 2008, 20 commits
    • ftrace: build fix for ftraced_suspend · b2613e37
      Ingo Molnar authored
      fix:
      
       kernel/trace/ftrace.c:1615: error: 'ftraced_suspend' undeclared (first use in this function)
       kernel/trace/ftrace.c:1615: error: (Each undeclared identifier is reported only once
       kernel/trace/ftrace.c:1615: error: for each function it appears in.)
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched_clock: add multiplier for TSC to gtod drift · c300ba25
      Steven Rostedt authored
      The sched_clock code currently tries to keep all CPU clocks of all CPUs
      somewhat in sync. At every clock tick it records the gtod clock and
      uses that, jiffies, and the TSC to calculate a CPU clock that tries to
      stay in sync with all the other CPUs.
      
      ftrace depends heavily on this timer and it detects when this timer
      "jumps".  One problem is that the TSC and the gtod also drift.
      When the TSC is 0.1% faster or slower than the gtod it is very noticeable
      in ftrace. To help compensate for this, I've added a multiplier that
      tries to keep the CPU clock updating at the same rate as the gtod.
      
      I've tried various ways to get it to be in sync and this ended up being
      the most reliable. At every scheduler tick we calculate the new multiplier:
      
        multi = delta_gtod / delta_TSC
      
      This means we perform a 64-bit divide at the tick (once per jiffy).
      A shift is used to preserve accuracy.
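      In fixed-point form the idea looks roughly like this (the shift width and the helper names are illustrative, not the commit's actual code):

```c
#include <assert.h>
#include <stdint.h>

#define MULTI_SHIFT 15  /* illustrative fixed-point precision */

/* multi = delta_gtod / delta_TSC, kept as a fixed-point ratio */
uint64_t calc_multi(uint64_t delta_gtod, uint64_t delta_tsc)
{
    return (delta_gtod << MULTI_SHIFT) / delta_tsc;
}

/* scale a raw TSC delta so the CPU clock advances at gtod rate */
uint64_t scale_delta(uint64_t delta_tsc, uint64_t multi)
{
    return (delta_tsc * multi) >> MULTI_SHIFT;
}
```

      With a TSC running 0.1% fast (1001 TSC units per 1000 gtod ns), scaling the next 1001-unit delta lands back at roughly 1000 ns.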
      
      Other methods that failed due to dynamic HZ are:
      
      (not used)  multi += (gtod - tsc) / delta_gtod
      (not used)  multi += (gtod - (last_tsc + delta_tsc)) / delta_gtod
      
      as well as other variants.
      
      This code still allows for a slight drift between TSC and gtod, but
      it keeps the damage down to a minimum.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: john stultz <johnstul@us.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched_clock: record TSC after gtod · a83bc47c
      Steven Rostedt authored
      To read the gtod we need to grab the xtime lock for read. Reading the gtod
      before the TSC can cause a bigger gap if the xtime lock is contended.
      
      This patch simply reverses the order to read the TSC after the gtod.
      The locking in the reading of the gtod handles any barriers one might
      think is needed.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: john stultz <johnstul@us.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched_clock: only update deltas with local reads. · c0c87734
      Steven Rostedt authored
      Reading the CPU clock should try to stay accurate within the CPU.
      Reading the CPU clock from another CPU and updating the deltas can
      cause unneeded jumps when reading from the local CPU.
      
      This patch changes the code to update the last read TSC only when read
      from the local CPU.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: john stultz <johnstul@us.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched_clock: fix calculation of other CPU · 2b8a0cf4
      Steven Rostedt authored
      The algorithm to calculate the 'now' of another CPU is not correct.
      At each scheduler tick, each CPU records the last sched_clock and
      gtod (tick_raw and tick_gtod respectively). If the TSC runs at roughly
      the same speed on the two CPUs, the relation would be:
      
        tick_gtod1 + (now1 - tick_raw1) = tick_gtod2 + (now2 - tick_raw2)
      
      To calculate now2 we would have:
      
        now2 = (tick_gtod1 - tick_gtod2) + (tick_raw2 - tick_raw1) + now1
      
      Currently the algorithm is:
      
        now2 = (tick_gtod1 - tick_gtod2) + (tick_raw1 - tick_raw2) + now1
      
      This solves most of the rest of the issues I've had with timestamps in
      ftrace.
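      The corrected formula is easy to check with a toy calculation (the helper name is mine; real code would also clamp the result against min/max bounds):

```c
#include <assert.h>
#include <stdint.h>

/* corrected cross-CPU 'now' from the commit message:
 *   now2 = (tick_gtod1 - tick_gtod2) + (tick_raw2 - tick_raw1) + now1 */
int64_t remote_now(int64_t now1, int64_t tick_raw1, int64_t tick_gtod1,
                   int64_t tick_raw2, int64_t tick_gtod2)
{
    return (tick_gtod1 - tick_gtod2) + (tick_raw2 - tick_raw1) + now1;
}
```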
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: john stultz <johnstul@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched_clock: stop maximum check on NO HZ · af52a90a
      Steven Rostedt authored
      Working with ftrace I would get large jumps of 11 millisecs or more with
      the clock tracer. This killed the latency timings of ftrace and also
      caused the irqoff self tests to fail.
      
      What was happening is that with NO_HZ, idle would stop the jiffy counter,
      and until the jiffy counter was updated again the sched_clock would have
      a bad delta-jiffies value to use in the maximum check against the gtod.
      
      The jiffies would stop and the last sched_tick would record the last gtod.
      On wakeup, the sched clock update would compare the gtod + delta jiffies
      (which would be zero) and compare it to the TSC. The TSC would have
      correctly (with a stable TSC) moved forward several jiffies. But because the
      jiffies has not been updated yet the clock would be prevented from moving
      forward because it would appear that the TSC jumped too far ahead.
      
      The clock would then virtually stop, until the jiffies are updated. Then
      the next sched clock update would see that the clock was very much behind
      since the delta jiffies is now correct. This would then jump the clock
      forward by several jiffies.
      
      This caused ftrace to report several milliseconds of interrupts off
      latency at every resume from NO_HZ idle.
      
      This patch adds hooks into the nohz code to disable the checking of the
      maximum clock update when nohz is in effect. It resumes the max check
      when nohz has updated the jiffies again.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched_clock: widen the max and min time · f7cce27f
      Steven Rostedt authored
      Keeping the max and min sched time within one jiffy of the gtod clock
      was too tight. Just before a scheduler tick the max could easily be hit,
      and just after a scheduler tick the min could be hit. This caused the
      clock to jump around by a jiffy.
      
      This patch widens the minimum to
         last gtod + (delta_jiffies ? delta_jiffies - 1 : 0) * TICK_NSECS
      
      and the maximum to
          last gtod + (2 + delta_jiffies) * TICK_NSECS
      
      This keeps the minimum at the last gtod (or one jiffy less than the delta
      jiffies) and the maximum 2 jiffies ahead of the gtod. This may make
      unstable TSCs a bit more sporadic, but it helps keep a clock with a
      stable TSC working well.
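      The widened window can be sketched directly from the two formulas (the TICK_NSEC value and the clamp helper are illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define TICK_NSEC 1000000ULL  /* illustrative: 1 ms per jiffy */

/* minimum: last gtod + (delta_jiffies ? delta_jiffies - 1 : 0) * TICK_NSEC */
uint64_t clock_min(uint64_t last_gtod, uint64_t delta_jiffies)
{
    return last_gtod + (delta_jiffies ? delta_jiffies - 1 : 0) * TICK_NSEC;
}

/* maximum: last gtod + (2 + delta_jiffies) * TICK_NSEC */
uint64_t clock_max(uint64_t last_gtod, uint64_t delta_jiffies)
{
    return last_gtod + (2 + delta_jiffies) * TICK_NSEC;
}

/* keep a candidate clock value inside the widened window */
uint64_t clamp_clock(uint64_t clock, uint64_t last_gtod,
                     uint64_t delta_jiffies)
{
    uint64_t lo = clock_min(last_gtod, delta_jiffies);
    uint64_t hi = clock_max(last_gtod, delta_jiffies);
    if (clock < lo)
        return lo;
    if (clock > hi)
        return hi;
    return clock;
}
```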
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched_clock: record from last tick · 62c43dd9
      Steven Rostedt authored
      The sched_clock code tries to keep within the gtod time by one tick (jiffy).
      The current code mistakenly keeps track of the delta jiffies between
      updates of the clock, where the delta is used to compare with the
      number of jiffies that have passed since an update of the gtod. The gtod is
      updated at each scheduler tick, not at each sched_clock update. After one
      jiffy passes the clock is updated fine. But the delta is taken from the
      last update, so if the next update happens before the next tick the delta
      jiffies used will be incorrect.
      
      This patch changes the code to check the delta of jiffies between ticks
      and not updates to match the comparison of the updates with the gtod.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • ftrace: separate out the function enabled variable · 60bc0800
      Steven Rostedt authored
      Currently the function tracer uses the global tracer_enabled variable,
      which is used to keep track of whether the tracer is enabled or not. The
      function tracing startup needs to be separated out, otherwise the
      internal happenings of the tracer startup are also recorded.
      
      This patch creates a ftrace_function_enabled variable to allow the
      starting of the function traces to happen after everything else has
      been started.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • ftrace: add ftrace_kill_atomic · a2bb6a3d
      Steven Rostedt authored
      It has been suggested that I add a way to disable the function tracer
      on an oops. This code adds a ftrace_kill_atomic. It is not meant to be
      used in normal situations. It will disable the ftrace tracer, but will
      not perform the nice shutdown that requires scheduling.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • ftrace: use current CPU for function startup · 26bc83f4
      Steven Rostedt authored
      This is more of a clean up. Currently the function tracer initializes the
      tracer with whichever CPU was last used for tracing. This value isn't
      really useful for function tracing, but at least it should be something
      other than a random number.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • ftrace: start wakeup tracing after setting function tracer · ad591240
      Steven Rostedt authored
      Enabling the wakeup tracer before enabling the function tracing causes
      some strange results due to the dynamic enabling of the functions.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • ftrace: check proper config for preempt type · b5c21b45
      Steven Rostedt authored
      There is no CONFIG_PREEMPT_DESKTOP. Use the proper entry CONFIG_PREEMPT.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • ftrace: trace schedule · 1e16c0a0
      Steven Rostedt authored
      After the sched_clock code has been removed from sched.c we can now trace
      the scheduler. The scheduler has a lot of functions that would be worth
      tracing.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • ftrace: define function trace nop · 001b6767
      Steven Rostedt authored
      When CONFIG_FTRACE is not enabled, tracing_start_function_trace
      and tracing_stop_function_trace should be nops.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • ftrace: move sched_switch enable after markers · 007c05d4
      Steven Rostedt authored
      We have two markers now that are enabled on sched_switch. One that records
      the context switching and the other that records task wake ups. Currently
      we enable the tracing first and then set the markers. This causes some
      confusing traces:
      
      # tracer: sched_switch
      #
      #           TASK-PID   CPU#    TIMESTAMP  FUNCTION
      #              | |      |          |         |
             trace-cmd-3973  [00]   115.834817:   3973:120:R   +     3:  0:S
             trace-cmd-3973  [01]   115.834910:   3973:120:R   +     6:  0:S
             trace-cmd-3973  [02]   115.834910:   3973:120:R   +     9:  0:S
             trace-cmd-3973  [03]   115.834910:   3973:120:R   +    12:  0:S
             trace-cmd-3973  [02]   115.834910:   3973:120:R   +     9:  0:S
                <idle>-0     [02]   115.834910:      0:140:R ==>  3973:120:R
      
      Here we see that trace-cmd with PID 3973 wakes up task 9 but the next line
      shows the idle task doing a context switch to task 3973.
      
      Enabling the tracing _after_ the markers are set creates a much saner
      output:
      
      # tracer: sched_switch
      #
      #           TASK-PID   CPU#    TIMESTAMP  FUNCTION
      #              | |      |          |         |
                <idle>-0     [02]  7922.634225:      0:140:R ==>  4790:120:R
             trace-cmd-4789  [03]  7922.634225:      0:140:R   +  4790:120:R
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • nohz: don't stop idle tick if softirqs are pending. · 857f3fd7
      Heiko Carstens authored
      In case a cpu goes idle but softirqs are pending, only an error message is
      printed to the console. It may take a very long time until the pending
      softirqs will finally be executed. Worst case would be a hanging system.
      
      With this patch the timer tick just continues and the softirqs will be
      executed after the next interrupt. Still a delay but better than a
      hanging system.
      
      Currently we have at least two device drivers on s390 which under certain
      circumstances schedule a tasklet from process context. This is a reason
      why we can end up with pending softirqs when going idle. Fixing these
      drivers seems to be non-trivial.
      However there is no question that the drivers should be fixed.
      This patch shouldn't be considered as a bug fix. It just is intended to
      keep a system running even if device drivers are buggy.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Jan Glauber <jan.glauber@de.ibm.com>
      Cc: Stefan Weinhuber <wein@de.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: fix cpu hotplug, cleanup · b1e38734
      Linus Torvalds authored
      Clean up __migrate_task() to just have separate "done" and "fail"
      cases, instead of that "out" case with random error behavior.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • Fix PREEMPT_RCU without HOTPLUG_CPU · 70ff0555
      Nick Piggin authored
      PREEMPT_RCU without HOTPLUG_CPU is broken.  The rcu_online_cpu is called
      to initially populate rcu_cpu_online_map with all online CPUs when the
      hotplug event handler is installed, and also to populate the map with
      CPUs as they come online.  The former case is meant to happen with and
      without HOTPLUG_CPU, but without HOTPLUG_CPU, the rcu_offline_cpu
      function is no-oped -- while it still gets called, it does not set the
      rcu CPU map.
      
      With a blank RCU CPU map, grace periods get to tick by completely
      oblivious to active RCU read side critical sections.  This results in
      free-before-grace bugs.
      
      Fix is obvious once the problem is known. (Also, change __devinit to
      __cpuinit so the function gets thrown away on !HOTPLUG_CPU kernels).
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Reported-and-tested-by: Alexey Dobriyan <adobriyan@gmail.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ Nick is my personal hero of the day - Linus ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 10 Jul 2008, 2 commits
    • sched: fix cpu hotplug · dc7fab8b
      Dmitry Adamushko authored
      I think we may have a race between try_to_wake_up() and
      migrate_live_tasks() -> move_task_off_dead_cpu(), where the latter
      may end up looping endlessly.
      
      Interrupts are enabled on other CPUs when migration_call(CPU_DEAD, ...) is
      called, so we may get a race between try_to_wake_up() and
      migrate_live_tasks() -> move_task_off_dead_cpu(). The former may push
      a task out of a dead CPU, causing the latter to loop endlessly.
      
      Heiko Carstens observed:
      
      | That's exactly what explains a dump I got yesterday. Thanks for fixing! :)
      Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
      Cc: miaox@cn.fujitsu.com
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Avi Kivity <avi@qumranet.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • genirq: remove extraneous checks in manage.c · 48627d8d
      Thomas Gleixner authored
      In http://bugzilla.kernel.org/show_bug.cgi?id=9580 it was pointed out
      that the desc->chip checks are extraneous. In fact these are leftovers
      from early development and can be removed safely.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  9. 09 Jul 2008, 1 commit