1. 28 Jan, 2014 1 commit
  2. 14 Jan, 2014 1 commit
  3. 13 Jan, 2014 1 commit
    • locking: Optimize lock_bh functions · 9ea4c380
      Peter Zijlstra committed
      Currently all _bh_ lock functions do two preempt_count operations:
      
        local_bh_disable();
        preempt_disable();
      
      and for the unlock:
      
        preempt_enable_no_resched();
        local_bh_enable();
      
      Since it's a waste of perfectly good cycles to modify the same variable
      twice when you can do it in one go; use the new
      __local_bh_{dis,en}able_ip() functions that allow us to provide a
      preempt_count value to add/sub.
      
      So define SOFTIRQ_LOCK_OFFSET as the offset a _bh_ lock needs to
      add/sub to be done in one go.
      
      As a bonus it gets rid of the preempt_enable_no_resched() usage.
      
      This reduces 1000 loops of:
      
        spin_lock_bh(&bh_lock);
        spin_unlock_bh(&bh_lock);
      
      from 53596 cycles to 51995 cycles. I didn't do enough measurements to
      say for absolute sure that the result is significant but the few
      runs I did for each suggest it is so.
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: jacob.jun.pan@linux.intel.com
      Cc: Mike Galbraith <bitbucket@online.de>
      Cc: hpa@zytor.com
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: lenb@kernel.org
      Cc: rjw@rjwysocki.net
      Cc: rui.zhang@intel.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20131119151338.GF3694@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9ea4c380
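The single-update idea from the commit message can be sketched in userspace C. This is a simplified model, not kernel code: the counter, the helper names and the offset values (one preempt level in the low bits, softirq count starting at bit 8) are assumptions made for illustration, and the real kernel goes through __local_bh_{dis,en}able_ip() rather than plain additions.

```c
/* Simplified userspace model of the preempt_count layout (assumed values). */
#define PREEMPT_OFFSET          1u          /* one preempt_disable() level */
#define SOFTIRQ_OFFSET          (1u << 8)   /* softirq count assumed at bit 8 */
#define SOFTIRQ_DISABLE_OFFSET  (2u * SOFTIRQ_OFFSET)
#define SOFTIRQ_LOCK_OFFSET     (SOFTIRQ_DISABLE_OFFSET + PREEMPT_OFFSET)

static unsigned int preempt_count;  /* stand-in for the real counter */
static unsigned int nr_updates;     /* number of writes to the counter */

static void count_add(unsigned int val) { preempt_count += val; nr_updates++; }
static void count_sub(unsigned int val) { preempt_count -= val; nr_updates++; }

/* Old scheme: two separate modifications on lock and two on unlock. */
static void bh_lock_old(void)
{
	count_add(SOFTIRQ_DISABLE_OFFSET);  /* models local_bh_disable() */
	count_add(PREEMPT_OFFSET);          /* models preempt_disable() */
}

static void bh_unlock_old(void)
{
	count_sub(PREEMPT_OFFSET);          /* models preempt_enable_no_resched() */
	count_sub(SOFTIRQ_DISABLE_OFFSET);  /* models local_bh_enable() */
}

/* New scheme: one combined modification on lock and one on unlock. */
static void bh_lock_new(void)   { count_add(SOFTIRQ_LOCK_OFFSET); }
static void bh_unlock_new(void) { count_sub(SOFTIRQ_LOCK_OFFSET); }
```

Both schemes leave the counter at the same value while the lock is held; the combined offset simply halves the number of writes to it.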
  4. 03 Dec, 2013 1 commit
    • nohz: Convert a few places to use local per cpu accesses · e8fcaa5c
      Frederic Weisbecker committed
      A few functions use remote per CPU access APIs when they
      deal with local values.
      
      Just do the right conversion to improve performance, code
      readability and debug checks.
      
      While at it, let's extend some of these function names with *_this_cpu()
      suffix in order to display their purpose more clearly.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      e8fcaa5c
  5. 27 Nov, 2013 1 commit
  6. 20 Nov, 2013 1 commit
    • lockdep: Correctly annotate hardirq context in irq_exit() · f1a83e65
      Peter Zijlstra committed
      There was a reported deadlock on -rt which lockdep didn't report.
      
      It turns out that in irq_exit() we tell lockdep that the hardirq
      context ends and then do all kinds of locking afterwards.
      
      To fix it, move trace_hardirq_exit() to the very end of irq_exit(); this
      ensures all locking in tick_irq_exit() and rcu_irq_exit() is properly
      recorded as happening from hardirq context.
      
      This however leads to the 'fun' little problem of running softirqs
      while in hardirq context. To cure this make the softirq code a little
      more complex (in the CONFIG_TRACE_IRQFLAGS case).
      
      Due to stack swizzling arch dependent trickery we cannot pass an
      argument to __do_softirq() to tell it if it was done from hardirq
      context or not; so use a side-band argument.
      
      When we do __do_softirq() from hardirq context, 'atomically' flip to
      softirq context and back, so that no locking goes without being in
      either hard- or soft-irq context.
      
      I didn't find any new problems in mainline using this patch, but it
      did show the -rt problem.
      Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-dgwc5cdksbn0jk09vbmcc9sa@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f1a83e65
  7. 15 Nov, 2013 1 commit
  8. 01 Oct, 2013 6 commits
    • irq: Optimize softirq stack selection in irq exit · cc1f0274
      Frederic Weisbecker committed
      If irq_exit() is called on the arch's specified irq stack,
      it should be safe to run softirqs inline under that same
      irq stack as it is near empty by the time we call irq_exit().
      
      For example if we use the same stack for both hard and soft irqs here,
      the worst case scenario is:
      hardirq -> softirq -> hardirq. But then the softirq supersedes the
      first hardirq as the stack user since irq_exit() is called in
      a mostly empty stack. So the stack merge in this case looks acceptable.
      
      Stack overruns still have a chance to happen if hardirqs have more
      opportunities to nest, but then it's another problem to solve.
      
      So let's adapt the irq exit's softirq stack on top of a new Kconfig symbol
      that can be defined when irq_exit() runs on the irq stack. That way
      we can spare some stack switch on irq processing and all the cache
      issues that come along.
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@au1.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Mackerras <paulus@au1.ibm.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      cc1f0274
    • irq: Justify the various softirq stack choices · 0bed698a
      Frederic Weisbecker committed
      For clarity, comment the various stack choices for softirqs
      processing, whether we execute them from ksoftirqd or
      local_irq_enable() calls.
      
      Their use on irq_exit() is already commented.
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@au1.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Mackerras <paulus@au1.ibm.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      0bed698a
    • irq: Improve a bit softirq debugging · 5d60d3e7
      Frederic Weisbecker committed
      do_softirq() has a debug check that verifies that it is not nesting
      on softirqs processing, nor miscounting the softirq part of the preempt
      count.
      
      But making sure that softirq processing doesn't nest is actually a more
      generic concern that applies to any caller of __do_softirq().
      
      So take it one step further and generalize that debug check to
      any softirq processing.
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@au1.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Mackerras <paulus@au1.ibm.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      5d60d3e7
    • irq: Optimize call to softirq on hardirq exit · be6e1016
      Frederic Weisbecker committed
      Before processing softirqs on hardirq exit, we already
      do the check for pending softirqs while hardirqs are
      guaranteed to be disabled.
      
      So we can take a shortcut and safely jump to the arch
      specific implementation directly.
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@au1.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Mackerras <paulus@au1.ibm.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      be6e1016
    • irq: Consolidate do_softirq() arch overriden implementations · 7d65f4a6
      Frederic Weisbecker committed
      All arch overridden implementations of do_softirq() share the following
      common code: disable irqs (to avoid races with the pending check),
      check if there are softirqs pending, then execute __do_softirq() on
      a specific stack.
      
      Consolidate the common parts such that archs only worry about the
      stack switch.
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@au1.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Mackerras <paulus@au1.ibm.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      7d65f4a6
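The split described in this commit (a shared pending check with irqs disabled, plus an arch-provided stack switch) can be modelled roughly as below. It is a userspace sketch under stated assumptions: every name carrying the model_ prefix is hypothetical, the irq state is a plain boolean, and no real stack switch happens.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; none of this is the real API. */
static bool irqs_enabled = true;
static unsigned int softirq_pending;   /* bitmask of raised softirqs */
static unsigned int handled;           /* softirqs processed so far */
static int stack_switches;             /* times the "arch" hook was entered */

/* Generic handler loop: consumes every pending softirq bit. */
static void model_do_softirq_core(void)
{
	while (softirq_pending) {
		softirq_pending &= softirq_pending - 1; /* clear lowest set bit */
		handled++;
	}
}

/* The only part an architecture would still provide: the stack switch. */
static void model_do_softirq_own_stack(void)
{
	stack_switches++;          /* pretend to switch to the softirq stack */
	model_do_softirq_core();
}

/* Common part shared by all architectures after the consolidation. */
static void model_do_softirq(void)
{
	bool saved = irqs_enabled;

	irqs_enabled = false;      /* avoid racing with the pending check */
	if (softirq_pending)
		model_do_softirq_own_stack();
	irqs_enabled = saved;      /* restore the caller's irq state */
}
```

With nothing pending, model_do_softirq() returns without ever touching the arch hook, which is the behaviour the shared pending check is meant to guarantee.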
    • irq: Force hardirq exit's softirq processing on its own stack · ded79754
      Frederic Weisbecker committed
      The commit facd8b80 ("irq: Sanitize invoke_softirq") converted irq exit
      calls of do_softirq() to __do_softirq() on all architectures,
      assuming it was only used there for its irq disablement
      properties.
      
      But as a side effect, the softirqs processed in the end
      of the hardirq are always called on the inline current
      stack that is used by irq_exit() instead of the softirq
      stack provided by the archs that override do_softirq().
      
      The result is mostly safe if the architecture runs irq_exit()
      on a separate irq stack because then softirqs are processed
      on that same stack that is near empty at this stage (assuming
      hardirq aren't nesting).
      
      Otherwise irq_exit() runs in the task stack and so does the softirq
      too. The interrupted call stack can be randomly deep already and
      the softirq can dig through it even further. To add insult to
      injury, this softirq can be interrupted by a new hardirq, maximizing
      the chances for a stack overrun as reported in powerpc for example:
      
      	do_IRQ: stack overflow: 1920
      	CPU: 0 PID: 1602 Comm: qemu-system-ppc Not tainted 3.10.4-300.1.fc19.ppc64p7 #1
      	Call Trace:
      	[c0000000050a8740] .show_stack+0x130/0x200 (unreliable)
      	[c0000000050a8810] .dump_stack+0x28/0x3c
      	[c0000000050a8880] .do_IRQ+0x2b8/0x2c0
      	[c0000000050a8930] hardware_interrupt_common+0x154/0x180
      	--- Exception: 501 at .cp_start_xmit+0x3a4/0x820 [8139cp]
      		LR = .cp_start_xmit+0x390/0x820 [8139cp]
      	[c0000000050a8d40] .dev_hard_start_xmit+0x394/0x640
      	[c0000000050a8e00] .sch_direct_xmit+0x110/0x260
      	[c0000000050a8ea0] .dev_queue_xmit+0x260/0x630
      	[c0000000050a8f40] .br_dev_queue_push_xmit+0xc4/0x130 [bridge]
      	[c0000000050a8fc0] .br_dev_xmit+0x198/0x270 [bridge]
      	[c0000000050a9070] .dev_hard_start_xmit+0x394/0x640
      	[c0000000050a9130] .dev_queue_xmit+0x428/0x630
      	[c0000000050a91d0] .ip_finish_output+0x2a4/0x550
      	[c0000000050a9290] .ip_local_out+0x50/0x70
      	[c0000000050a9310] .ip_queue_xmit+0x148/0x420
      	[c0000000050a93b0] .tcp_transmit_skb+0x4e4/0xaf0
      	[c0000000050a94a0] .__tcp_ack_snd_check+0x7c/0xf0
      	[c0000000050a9520] .tcp_rcv_established+0x1e8/0x930
      	[c0000000050a95f0] .tcp_v4_do_rcv+0x21c/0x570
      	[c0000000050a96c0] .tcp_v4_rcv+0x734/0x930
      	[c0000000050a97a0] .ip_local_deliver_finish+0x184/0x360
      	[c0000000050a9840] .ip_rcv_finish+0x148/0x400
      	[c0000000050a98d0] .__netif_receive_skb_core+0x4f8/0xb00
      	[c0000000050a99d0] .netif_receive_skb+0x44/0x110
      	[c0000000050a9a70] .br_handle_frame_finish+0x2bc/0x3f0 [bridge]
      	[c0000000050a9b20] .br_nf_pre_routing_finish+0x2ac/0x420 [bridge]
      	[c0000000050a9bd0] .br_nf_pre_routing+0x4dc/0x7d0 [bridge]
      	[c0000000050a9c70] .nf_iterate+0x114/0x130
      	[c0000000050a9d30] .nf_hook_slow+0xb4/0x1e0
      	[c0000000050a9e00] .br_handle_frame+0x290/0x330 [bridge]
      	[c0000000050a9ea0] .__netif_receive_skb_core+0x34c/0xb00
      	[c0000000050a9fa0] .netif_receive_skb+0x44/0x110
      	[c0000000050aa040] .napi_gro_receive+0xe8/0x120
      	[c0000000050aa0c0] .cp_rx_poll+0x31c/0x590 [8139cp]
      	[c0000000050aa1d0] .net_rx_action+0x1dc/0x310
      	[c0000000050aa2b0] .__do_softirq+0x158/0x330
      	[c0000000050aa3b0] .irq_exit+0xc8/0x110
      	[c0000000050aa430] .do_IRQ+0xdc/0x2c0
      	[c0000000050aa4e0] hardware_interrupt_common+0x154/0x180
      	 --- Exception: 501 at .bad_range+0x1c/0x110
      		 LR = .get_page_from_freelist+0x908/0xbb0
      	[c0000000050aa7d0] .list_del+0x18/0x50 (unreliable)
      	[c0000000050aa850] .get_page_from_freelist+0x908/0xbb0
      	[c0000000050aa9e0] .__alloc_pages_nodemask+0x21c/0xae0
      	[c0000000050aaba0] .alloc_pages_vma+0xd0/0x210
      	[c0000000050aac60] .handle_pte_fault+0x814/0xb70
      	[c0000000050aad50] .__get_user_pages+0x1a4/0x640
      	[c0000000050aae60] .get_user_pages_fast+0xec/0x160
      	[c0000000050aaf10] .__gfn_to_pfn_memslot+0x3b0/0x430 [kvm]
      	[c0000000050aafd0] .kvmppc_gfn_to_pfn+0x64/0x130 [kvm]
      	[c0000000050ab070] .kvmppc_mmu_map_page+0x94/0x530 [kvm]
      	[c0000000050ab190] .kvmppc_handle_pagefault+0x174/0x610 [kvm]
      	[c0000000050ab270] .kvmppc_handle_exit_pr+0x464/0x9b0 [kvm]
      	[c0000000050ab320]  kvm_start_lightweight+0x1ec/0x1fc [kvm]
      	[c0000000050ab4f0] .kvmppc_vcpu_run_pr+0x168/0x3b0 [kvm]
      	[c0000000050ab9c0] .kvmppc_vcpu_run+0xc8/0xf0 [kvm]
      	[c0000000050aba50] .kvm_arch_vcpu_ioctl_run+0x5c/0x1a0 [kvm]
      	[c0000000050abae0] .kvm_vcpu_ioctl+0x478/0x730 [kvm]
      	[c0000000050abc90] .do_vfs_ioctl+0x4ec/0x7c0
      	[c0000000050abd80] .SyS_ioctl+0xd4/0xf0
      	[c0000000050abe30] syscall_exit+0x0/0x98
      
      Since this is a regression, this patch proposes a minimalistic
      and low-risk solution by blindly forcing the hardirq exit processing of
      softirqs on the softirq stack. This way we should reduce significantly
      the opportunities for task stack overflow dug by softirqs.
      
      Longer term solutions may involve extending the hardirq stack coverage to
      irq_exit(), etc...
      Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: <stable@vger.kernel.org> # 3.9..
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@au1.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Mackerras <paulus@au1.ibm.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      ded79754
  9. 25 Sep, 2013 2 commits
  10. 13 Sep, 2013 1 commit
  11. 15 Jul, 2013 1 commit
    • kernel: delete __cpuinit usage from all core kernel files · 0db0628d
      Paul Gortmaker committed
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  For example, the fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      This removes all the uses of the __cpuinit macros from C files in
      the core kernel directories (kernel, init, lib, mm, and include)
      that don't really have a specific maintainer.
      
      [1] https://lkml.org/lkml/2013/5/20/589
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      0db0628d
  12. 28 Jun, 2013 1 commit
  13. 11 Jun, 2013 1 commit
    • Fix lockup related to stop_machine being stuck in __do_softirq. · 34376a50
      Ben Greear committed
      The stop machine logic can lock up if all but one of the migration
      threads make it through the disable-irq step and the one remaining
      thread gets stuck in __do_softirq.  The reason __do_softirq can hang is
      that it has a bail-out based on jiffies timeout, but in the lockup case,
      jiffies itself is not incremented.
      
      To work around this, re-add the max_restart counter in __do_softirq and
      stop processing softirqs after 10 restarts.
      
      Thanks to Tejun Heo and Rusty Russell and others for helping me track
      this down.
      
      This was introduced in 3.9 by commit c10d7367 ("softirq: reduce
      latencies").
      
      It may be worth looking into ath9k to see if it has issues with its irq
      handler at a later date.
      
      The hang stack traces look something like this:
      
          ------------[ cut here ]------------
          WARNING: at kernel/watchdog.c:245 watchdog_overflow_callback+0x9c/0xa7()
          Watchdog detected hard LOCKUP on cpu 2
          Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc]
          Pid: 23, comm: migration/2 Tainted: G         C   3.9.4+ #11
          Call Trace:
           <NMI>   warn_slowpath_common+0x85/0x9f
            warn_slowpath_fmt+0x46/0x48
            watchdog_overflow_callback+0x9c/0xa7
            __perf_event_overflow+0x137/0x1cb
            perf_event_overflow+0x14/0x16
            intel_pmu_handle_irq+0x2dc/0x359
            perf_event_nmi_handler+0x19/0x1b
            nmi_handle+0x7f/0xc2
            do_nmi+0xbc/0x304
            end_repeat_nmi+0x1e/0x2e
           <<EOE>>
            cpu_stopper_thread+0xae/0x162
            smpboot_thread_fn+0x258/0x260
            kthread+0xc7/0xcf
            ret_from_fork+0x7c/0xb0
          ---[ end trace 4947dfa9b0a4cec3 ]---
          BUG: soft lockup - CPU#1 stuck for 22s! [migration/1:17]
          Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc]
          irq event stamp: 835637905
          hardirqs last  enabled at (835637904): __do_softirq+0x9f/0x257
          hardirqs last disabled at (835637905): apic_timer_interrupt+0x6d/0x80
          softirqs last  enabled at (5654720): __do_softirq+0x1ff/0x257
          softirqs last disabled at (5654725): irq_exit+0x5f/0xbb
          CPU 1
          Pid: 17, comm: migration/1 Tainted: G        WC   3.9.4+ #11 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
          RIP: tasklet_hi_action+0xf0/0xf0
          Process migration/1
          Call Trace:
           <IRQ>
            __do_softirq+0x117/0x257
            irq_exit+0x5f/0xbb
            smp_apic_timer_interrupt+0x8a/0x98
            apic_timer_interrupt+0x72/0x80
           <EOI>
            printk+0x4d/0x4f
            stop_machine_cpu_stop+0x22c/0x274
            cpu_stopper_thread+0xae/0x162
            smpboot_thread_fn+0x258/0x260
            kthread+0xc7/0xcf
            ret_from_fork+0x7c/0xb0
      Signed-off-by: Ben Greear <greearb@candelatech.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Pekka Riikonen <priikone@iki.fi>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      34376a50
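The failure mode and the fix can be illustrated with a small userspace model of the restart loop. It is only a sketch: the clock is a simulated counter, the handler always re-raises itself, and all model_ names and the tick-based time budget are assumptions. Only the cap of 10 restarts comes from the commit message.

```c
#include <stdbool.h>

#define MODEL_MAX_RESTART  10   /* restart cap named in the commit message */
#define MODEL_TIME_LIMIT   2    /* time budget in simulated ticks (assumed) */

static unsigned long model_jiffies;   /* simulated clock */
static bool model_clock_runs;         /* false models the stuck-jiffies case */

/* Pretend handler pass: always leaves work pending, like a busy softirq. */
static bool model_run_handlers(void)
{
	if (model_clock_runs)
		model_jiffies++;
	return true;   /* softirqs are still pending */
}

/* Returns the number of passes made before deferring to ksoftirqd. */
static int model_softirq_loop(void)
{
	unsigned long end = model_jiffies + MODEL_TIME_LIMIT;
	int max_restart = MODEL_MAX_RESTART;
	int passes = 0;

	for (;;) {
		bool pending = model_run_handlers();

		passes++;
		if (!pending)
			break;
		/* Restart only while inside the time budget AND restarts remain. */
		if ((long)(model_jiffies - end) < 0 && --max_restart)
			continue;
		break;  /* the kernel would wake ksoftirqd at this point */
	}
	return passes;
}
```

With the clock frozen the time check alone would never fail, so the loop would spin forever; the restart counter ends it after 10 passes. With the clock advancing, the time budget ends it first.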
  14. 01 May, 2013 1 commit
  15. 23 Apr, 2013 1 commit
    • nohz: Disable the tick when irq resume in full dynticks CPU · 67826eae
      Frederic Weisbecker committed
      Eventually try to disable tick on irq exit, now that the
      fundamental infrastructure is in place.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      67826eae
  16. 03 Apr, 2013 1 commit
    • nohz: Rename CONFIG_NO_HZ to CONFIG_NO_HZ_COMMON · 3451d024
      Frederic Weisbecker committed
      We are planning to convert the dynticks Kconfig options layout
      into a choice menu. The user must be able to easily pick
      any of the following implementations: constant periodic tick,
      idle dynticks, full dynticks.
      
      As this implies a mutual exclusion, the two dynticks implementations
      need to converge on the selection of a common Kconfig option in order
      to ease the sharing of a common infrastructure.
      
      It would thus seem pretty natural to reuse CONFIG_NO_HZ to
      that end. It already implements all the idle dynticks code
      and the full dynticks depends on all that code for now.
      So ideally the choice menu would propose CONFIG_NO_HZ_IDLE and
      CONFIG_NO_HZ_EXTENDED then both would select CONFIG_NO_HZ.
      
      On the other hand we want to stay backward compatible: if
      CONFIG_NO_HZ is set in an older config file, we want to
      enable CONFIG_NO_HZ_IDLE by default.
      
      But we can't afford both at the same time or we run into
      a circular dependency:
      
      1) CONFIG_NO_HZ_IDLE and CONFIG_NO_HZ_EXTENDED both select
         CONFIG_NO_HZ
      2) If CONFIG_NO_HZ is set, we default to CONFIG_NO_HZ_IDLE
      
      We might be able to support that from Kconfig/Kbuild but it
      may not be wise to introduce such a confusing behaviour.
      
      So to solve this, create a new CONFIG_NO_HZ_COMMON option
      which gathers the common code between idle and full dynticks
      (that common code for now is simply the idle dynticks code)
      and select it from their referring Kconfig.
      
      Then we'll later create CONFIG_NO_HZ_IDLE and map CONFIG_NO_HZ
      to it for backward compatibility.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      3451d024
  17. 01 Mar, 2013 1 commit
    • irq: Don't re-enable interrupts at the end of irq_exit · 4cd5d111
      Frederic Weisbecker committed
      Commit 74eed016 ("irq: Ensure irq_exit() code runs with interrupts
      disabled") restores the interrupt flags at the end of irq_exit() for
      archs that don't define __ARCH_IRQ_EXIT_IRQS_DISABLED.
      
      However always returning from irq_exit() with interrupts
      disabled should not be a problem for these archs. Prior to
      this commit this was already happening anytime we processed
      pending softirqs anyway.
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      4cd5d111
  18. 22 Feb, 2013 3 commits
    • irq: Remove IRQ_EXIT_OFFSET workaround · 4d4c4e24
      Frederic Weisbecker committed
      The IRQ_EXIT_OFFSET trick was used to make sure the irq
      doesn't get preempted after we subtract the HARDIRQ_OFFSET
      until we are entirely done with any code in irq_exit().
      
      This workaround was necessary because some archs may call
      irq_exit() with irqs enabled and there is still some code
      at the end of this function that is not covered by the
      HARDIRQ_OFFSET but needs to stay non-preemptible.
      
      Now that irqs are always disabled in irq_exit(), the whole code
      is guaranteed not to be preempted. We can thus remove this hack.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      4d4c4e24
    • irq: Sanitize invoke_softirq · facd8b80
      Thomas Gleixner committed
      With the irq protection in irq_exit, we can remove the #ifdeffery and
      the bh_disable/enable dance in invoke_softirq()
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linuxfoundation.org>
      Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1302202155320.22263@ionos
      facd8b80
    • irq: Ensure irq_exit() code runs with interrupts disabled · 74eed016
      Thomas Gleixner committed
      We already had a few problems with code called from irq_exit() when
      interrupted from a nesting interrupt. This can happen on architectures
      which do not define __ARCH_IRQ_EXIT_IRQS_DISABLED.
      
      __ARCH_IRQ_EXIT_IRQS_DISABLED should go away and we want to make it
      mandatory to call irq_exit() with interrupts disabled.
      
      As a temporary protection disable interrupts for those architectures
      which do not define __ARCH_IRQ_EXIT_IRQS_DISABLED and add a WARN_ONCE
      when an architecture which defines __ARCH_IRQ_EXIT_IRQS_DISABLED calls
      irq_exit() with interrupts enabled.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linuxfoundation.org>
      Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1302202155320.22263@ionos
      74eed016
  19. 28 Jan, 2013 1 commit
    • cputime: Safely read cputime of full dynticks CPUs · 6a61671b
      Frederic Weisbecker committed
      While remotely reading the cputime of a task running in a
      full dynticks CPU, the values stored in utime/stime fields
      of struct task_struct may be stale. Its values may be those
      of the last kernel <-> user transition time snapshot and
      we need to add the tickless time spent since this snapshot.
      
      To fix this, flush the cputime of the dynticks CPUs on
      kernel <-> user transition and record the time / context
      where we did this. Then on top of this snapshot and the current
      time, perform the fixup on the reader side from task_times()
      accessors.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      [fixed kvm module related build errors]
      Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
      6a61671b
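The snapshot-plus-fixup scheme described in this commit can be sketched as follows. This is a single-threaded userspace model: the struct layout, field names and function names are hypothetical, time is an abstract counter, and the synchronization a real remote reader would need is left out.

```c
#include <stdint.h>

/* Hypothetical snapshot state; the field names are illustrative only. */
enum model_ctx { MODEL_CTX_USER, MODEL_CTX_KERNEL };

struct model_task {
	uint64_t utime;          /* user time flushed at the last transition */
	uint64_t stime;          /* system time flushed at the last transition */
	uint64_t snap_time;      /* time of the last kernel <-> user transition */
	enum model_ctx snap_ctx; /* context entered at that transition */
};

/* Writer side: flush the elapsed time into the context being left. */
static void model_transition(struct model_task *t, enum model_ctx to,
			     uint64_t now)
{
	uint64_t delta = now - t->snap_time;

	if (t->snap_ctx == MODEL_CTX_USER)
		t->utime += delta;
	else
		t->stime += delta;
	t->snap_time = now;      /* record when and where we flushed */
	t->snap_ctx = to;
}

/* Reader side: add the tickless time elapsed since the snapshot. */
static void model_task_times(const struct model_task *t, uint64_t now,
			     uint64_t *utime, uint64_t *stime)
{
	uint64_t delta = now - t->snap_time;

	*utime = t->utime;
	*stime = t->stime;
	if (t->snap_ctx == MODEL_CTX_USER)
		*utime += delta;
	else
		*stime += delta;
}
```

The reader never modifies the task; it only combines the stored values with the time elapsed since the snapshot, so the stored utime/stime may stay stale between transitions without the reported values being wrong.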
  20. 11 Jan, 2013 1 commit
    • softirq: reduce latencies · c10d7367
      Eric Dumazet committed
      In various network workloads, __do_softirq() latencies can be up
      to 20 ms if HZ=1000, and 200 ms if HZ=100.
      
      This is because we iterate 10 times in the softirq dispatcher,
      and some actions can consume a lot of cycles.
      
      This patch changes the fallback to ksoftirqd condition to :
      
      - A time limit of 2 ms.
      - need_resched() being set on current task
      
      When one of these conditions is met, we wake up ksoftirqd for further
      softirq processing if we still have pending softirqs.
      
      Using need_resched() as the only condition can trigger RCU stalls,
      as we can keep BH disabled for too long.
      
      I ran several benchmarks and got no significant difference in
      throughput, but a very significant reduction in latencies (one order
      of magnitude):
      
      In the following benchmark, 200 antagonist "netperf -t TCP_RR" instances
      are started in the background, using all available CPUs.
      
      Then we start one "netperf -t TCP_RR", bound to the CPU handling the NIC
      IRQ (hard+soft).
      
      Before patch :
      
      # netperf -H 7.7.7.84 -t TCP_RR -T2,2 -- -k
      RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
      MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
      to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
      RT_LATENCY=550110.424
      MIN_LATENCY=146858
      MAX_LATENCY=997109
      P50_LATENCY=305000
      P90_LATENCY=550000
      P99_LATENCY=710000
      MEAN_LATENCY=376989.12
      STDDEV_LATENCY=184046.92
      
      After patch :
      
      # netperf -H 7.7.7.84 -t TCP_RR -T2,2 -- -k
      RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
      MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
      to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
      RT_LATENCY=40545.492
      MIN_LATENCY=9834
      MAX_LATENCY=78366
      P50_LATENCY=33583
      P90_LATENCY=59000
      P99_LATENCY=69000
      MEAN_LATENCY=38364.67
      STDDEV_LATENCY=12865.26
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c10d7367
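The fallback condition described above (restart while softirqs are pending, unless the 2 ms budget is used up or a reschedule is requested) can be sketched as a small user-space simulation. Everything here is an assumption made for illustration: the `softirq_env` struct, the simulated millisecond clock and the `busy_handler` example are hypothetical, and the real dispatcher works on jiffies and per-CPU state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SOFTIRQ_TIME_LIMIT_MS 2 /* the 2 ms budget named in the commit message */

/* Hypothetical simulated environment; not a kernel structure. */
struct softirq_env {
    uint64_t now_ms;      /* simulated clock */
    uint32_t pending;     /* bitmask of pending softirqs */
    bool need_resched;    /* stands in for need_resched() */
    bool ksoftirqd_woken; /* set when the remaining work is deferred */
    void (*run_one)(struct softirq_env *env, int nr); /* handler dispatch */
};

static void do_softirq_sketch(struct softirq_env *env)
{
    uint64_t end = env->now_ms + SOFTIRQ_TIME_LIMIT_MS;

    assert(env->run_one);

    while (env->pending) {
        /* Take the current batch; handlers may raise new softirqs. */
        uint32_t batch = env->pending;

        env->pending = 0;
        for (int nr = 0; batch; nr++, batch >>= 1)
            if (batch & 1)
                env->run_one(env, nr);

        /* Defer to ksoftirqd once the budget is spent or a resched is due. */
        if (env->pending && (env->now_ms >= end || env->need_resched)) {
            env->ksoftirqd_woken = true;
            break;
        }
    }
}

/* Example handler: costs 1 ms and re-raises itself, like a busy network softirq. */
static void busy_handler(struct softirq_env *env, int nr)
{
    env->now_ms += 1;
    env->pending |= 1u << nr;
}
```

With `busy_handler`, the loop runs twice, reaches the 2 ms limit and leaves the softirq pending for ksoftirqd instead of restarting again.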
  21. 30 Oct, 2012 1 commit
    • F
      cputime: Specialize irq vtime hooks · fa5058f3
      Frederic Weisbecker committed
      With CONFIG_VIRT_CPU_ACCOUNTING, when vtime_account()
      is called in irq entry/exit, we perform a check on the
      context: if we are interrupting the idle task we
      account the pending cputime to idle, otherwise account
      to system time or its sub-areas: tsk->stime, hardirq time,
      softirq time, ...
      
      However this check for idle only concerns the hardirq entry
      and softirq entry:
      
      * Hardirq may directly interrupt the idle task, in which case
      we need to flush the pending CPU time to idle.
      
      * The idle task may be directly interrupted by a softirq if
      it calls local_bh_enable(). There is probably no such call
      in any idle task but we need to cover every case. Ksoftirqd
      is not concerned because the idle time is flushed on context
      switch, and softirqs at the end of a hardirq have the idle time
      already flushed by the hardirq entry.
      
      In the other cases we always account to system/irq time:
      
      * On hardirq exit we account the time to hardirq time.
      * On softirq exit we account the time to softirq time.
      
      To optimize this and avoid the indirect call to vtime_account()
      and the checks it performs, specialize the vtime irq APIs and
      only perform the check on irq entry. Irq exit can directly call
      vtime_account_system().
      
      CONFIG_IRQ_TIME_ACCOUNTING behaviour doesn't change and directly
      maps to its own vtime_account() implementation. One may want
      to take advantage of the new APIs to optimize irq time accounting
      as well in the future.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      fa5058f3
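The asymmetry this commit exploits (an idle check on irq entry, none on irq exit) can be illustrated with a simplified accounting model. The struct and function names below are hypothetical and the single `system_time` bucket stands in for stime, hardirq time and softirq time; this is a sketch of the idea, not the kernel's vtime code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified accounting state; field names are illustrative. */
struct vtime_sketch {
    uint64_t last;        /* timestamp of the last accounting point */
    uint64_t idle_time;   /* time charged to idle */
    uint64_t system_time; /* stands in for stime, hardirq and softirq time */
};

/* Return the time elapsed since the last accounting point and advance it. */
static uint64_t take_delta(struct vtime_sketch *v, uint64_t now)
{
    uint64_t delta;

    assert(now >= v->last);
    delta = now - v->last;
    v->last = now;
    return delta;
}

/* Irq entry: the only place where the idle check is needed. */
static void irq_enter_account(struct vtime_sketch *v, uint64_t now,
                              bool interrupted_idle)
{
    uint64_t delta = take_delta(v, now);

    if (interrupted_idle)
        v->idle_time += delta;
    else
        v->system_time += delta;
}

/* Irq exit: time spent in the handler is always system/irq time. */
static void irq_exit_account(struct vtime_sketch *v, uint64_t now)
{
    v->system_time += take_delta(v, now);
}
```

The exit path has no branch at all, which is the saving the commit message describes.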
  22. 25 Sep, 2012 1 commit
    • F
      cputime: Use a proper subsystem naming for vtime related APIs · bf9fae9f
      Frederic Weisbecker committed
      Use a naming based on vtime as a prefix for virtual based
      cputime accounting APIs:
      
      - account_system_vtime() -> vtime_account()
      - account_switch_vtime() -> vtime_task_switch()
      
      This makes it easier to add further variants such
      as vtime_account_system(), vtime_account_idle(), ... if we
      want to find out the context we account to from generic code.
      
      This also makes it clearer which subsystem these APIs
      belong to.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      bf9fae9f
  23. 13 Aug, 2012 1 commit
  24. 01 Aug, 2012 1 commit
    • M
      mm: allow PF_MEMALLOC from softirq context · 907aed48
      Mel Gorman committed
      This is needed to allow network softirq packet processing to make use of
      PF_MEMALLOC.
      
      Currently softirq context cannot use PF_MEMALLOC because it is not
      associated with a task and therefore has no task flags to fiddle with.
      Thus the gfp to alloc flag mapping ignores the task flags when in
      interrupt (hard or soft) context.
      
      Allowing softirqs to make use of PF_MEMALLOC therefore requires some
      trickery.  This patch borrows the task flags from whatever process happens
      to be preempted by the softirq.  It then modifies the gfp to alloc flags
      mapping to not exclude task flags in softirq context, and modifies the
      softirq code to save, clear and restore the PF_MEMALLOC flag.
      
      The save and clear ensure the preempted task's PF_MEMALLOC flag doesn't
      leak into the softirq.  The restore ensures a softirq's PF_MEMALLOC flag
      cannot leak back into the preempted process.  This should be safe for
      the following reasons:
      
      Softirqs can run on multiple CPUs, but the same task should not be
      	executing the same softirq code. Neither should the softirq
      	handler be preempted by any other softirq handler so the flags
      	should not leak to an unrelated softirq.
      
      Softirqs re-enable hardware interrupts in __do_softirq() and so can be
      	preempted by hardware interrupts, in which case PF_MEMALLOC is
      	inherited by the hard IRQ. However, this is similar to a process in
      	reclaim being preempted by a hardirq. While PF_MEMALLOC is
      	set, gfp_to_alloc_flags() distinguishes between hard and
      	soft irqs and avoids giving a hardirq the ALLOC_NO_WATERMARKS
      	flag.
      
      If the softirq is deferred to ksoftirqd then its flags may be used
      	instead of a normal task's, but as the softirq cannot be preempted,
      	the PF_MEMALLOC flag does not leak to other code by accident.
      
      [davem@davemloft.net: Document why PF_MEMALLOC is safe]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Cc: David Miller <davem@davemloft.net>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      907aed48
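The save, clear and restore sequence from this commit message can be sketched in plain C. The names `task_sketch`, `restore_flags` and `run_softirq_sketch` are hypothetical, and the bit value chosen for `PF_MEMALLOC_SKETCH` is only illustrative; the point is that the flag state of the preempted task and of the softirq handler never mix.

```c
#include <assert.h>

#define PF_MEMALLOC_SKETCH 0x00000800u /* illustrative bit value, an assumption */

/* Minimal stand-in for a task with a flags word. */
struct task_sketch {
    unsigned int flags;
};

/* Restore only the bits in mask from orig; other bits keep their current value. */
static void restore_flags(struct task_sketch *t, unsigned int orig,
                          unsigned int mask)
{
    t->flags &= ~mask;
    t->flags |= orig & mask;
}

/* Run a softirq handler on top of whatever task happens to be current. */
static void run_softirq_sketch(struct task_sketch *current_task,
                               void (*handler)(struct task_sketch *))
{
    unsigned int old_flags = current_task->flags; /* save */

    current_task->flags &= ~PF_MEMALLOC_SKETCH;   /* clear: no leak into the softirq */
    handler(current_task);
    restore_flags(current_task, old_flags, PF_MEMALLOC_SKETCH); /* restore: no leak back */
}

/* Example handler that sets the flag, as a network softirq might. */
static void handler_sets_memalloc(struct task_sketch *t)
{
    assert(!(t->flags & PF_MEMALLOC_SKETCH)); /* the handler always starts clean */
    t->flags |= PF_MEMALLOC_SKETCH;
}
```

Whether the interrupted task had the flag set or not, it sees exactly its original flag value once the handler returns.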
  25. 06 Mar, 2012 1 commit
  26. 01 Mar, 2012 2 commits
  27. 15 Feb, 2012 1 commit
    • F
      timer: Fix bad idle check on irq entry · 0a8a2e78
      Frederic Weisbecker committed
      idle_cpu() is called on irq entry to guess if we need to call
      tick_check_idle(). This way we can catch up with jiffies if the tick
      was stopped, stop accounting idle time during the interrupt and
      maintain the sched clock if it is unstable.
      
      But if we are going to exit the idle loop to schedule a new task (ie:
      if we have a task in the runqueue or a remotely enqueued ttwu to
      perform), the idle_cpu() check will return 0 such that we miss the
      call to tick_check_idle() for all interrupts happening before we
      schedule the new task.
      
      As a result these interrupts and the softirqs coming along may deal
      with stale jiffies values, bad sched clock values, and won't subtract
      their time from the idle time accounting.
      
      Fix this by using is_idle_task() instead, which strictly checks that
      we are running the idle task, without caring about the fact we are
      going to schedule a task soon.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Link: http://lkml.kernel.org/r/1327427984-23282-3-git-send-email-fweisbec@gmail.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      0a8a2e78
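The difference between the two checks can be shown with a toy runqueue model. The struct and both predicates below are simplified, hypothetical stand-ins (pid 0 represents the idle task); they only model the behaviour the commit message describes, not the kernel's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy runqueue state; field names are illustrative. */
struct rq_sketch {
    int curr_pid;         /* pid 0 stands for the idle task */
    int nr_running;       /* tasks already queued on this CPU */
    bool wakeups_pending; /* remotely enqueued wakeups not yet processed */
};

/* Old check: is the CPU idle and about to stay idle? */
static bool idle_cpu_sketch(const struct rq_sketch *rq)
{
    return rq->curr_pid == 0 && rq->nr_running == 0 && !rq->wakeups_pending;
}

/* New check: is the interrupted task the idle task, whatever comes next? */
static bool is_idle_task_sketch(const struct rq_sketch *rq)
{
    return rq->curr_pid == 0;
}
```

In the window where the idle task is still running but a task has already been queued, the first predicate is false and the second is true, which is exactly the case the old check missed.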
  28. 03 Feb, 2012 1 commit
  29. 12 Dec, 2011 2 commits
    • F
      rcu: Fix early call to rcu_idle_enter() · 416eb33c
      Frederic Weisbecker committed
      On the irq exit path, tick_nohz_irq_exit()
      may raise a softirq, which leads to the wakeup
      path and to select_task_rq_fair(), which uses RCU
      to iterate the domains.
      
      This is an illegal use of RCU because we may be in an RCU
      extended quiescent state if we interrupted an RCU-idle
      window in the idle loop:
      
      [  132.978883] ===============================
      [  132.978883] [ INFO: suspicious RCU usage. ]
      [  132.978883] -------------------------------
      [  132.978883] kernel/sched_fair.c:1707 suspicious rcu_dereference_check() usage!
      [  132.978883]
      [  132.978883] other info that might help us debug this:
      [  132.978883]
      [  132.978883]
      [  132.978883] rcu_scheduler_active = 1, debug_locks = 0
      [  132.978883] RCU used illegally from extended quiescent state!
      [  132.978883] 2 locks held by swapper/0:
      [  132.978883]  #0:  (&p->pi_lock){-.-.-.}, at: [<ffffffff8105a729>] try_to_wake_up+0x39/0x2f0
      [  132.978883]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff8105556a>] select_task_rq_fair+0x6a/0xec0
      [  132.978883]
      [  132.978883] stack backtrace:
      [  132.978883] Pid: 0, comm: swapper Tainted: G        W   3.0.0+ #178
      [  132.978883] Call Trace:
      [  132.978883]  <IRQ>  [<ffffffff810a01f6>] lockdep_rcu_suspicious+0xe6/0x100
      [  132.978883]  [<ffffffff81055c49>] select_task_rq_fair+0x749/0xec0
      [  132.978883]  [<ffffffff8105556a>] ? select_task_rq_fair+0x6a/0xec0
      [  132.978883]  [<ffffffff812fe494>] ? do_raw_spin_lock+0x54/0x150
      [  132.978883]  [<ffffffff810a1f2d>] ? trace_hardirqs_on+0xd/0x10
      [  132.978883]  [<ffffffff8105a7c3>] try_to_wake_up+0xd3/0x2f0
      [  132.978883]  [<ffffffff81094f98>] ? ktime_get+0x68/0xf0
      [  132.978883]  [<ffffffff8105aa35>] wake_up_process+0x15/0x20
      [  132.978883]  [<ffffffff81069dd5>] raise_softirq_irqoff+0x65/0x110
      [  132.978883]  [<ffffffff8108eb65>] __hrtimer_start_range_ns+0x415/0x5a0
      [  132.978883]  [<ffffffff812fe3ee>] ? do_raw_spin_unlock+0x5e/0xb0
      [  132.978883]  [<ffffffff8108ed08>] hrtimer_start+0x18/0x20
      [  132.978883]  [<ffffffff8109c9c3>] tick_nohz_stop_sched_tick+0x393/0x450
      [  132.978883]  [<ffffffff810694f2>] irq_exit+0xd2/0x100
      [  132.978883]  [<ffffffff81829e96>] do_IRQ+0x66/0xe0
      [  132.978883]  [<ffffffff81820d53>] common_interrupt+0x13/0x13
      [  132.978883]  <EOI>  [<ffffffff8103434b>] ? native_safe_halt+0xb/0x10
      [  132.978883]  [<ffffffff810a1f2d>] ? trace_hardirqs_on+0xd/0x10
      [  132.978883]  [<ffffffff810144ea>] default_idle+0xba/0x370
      [  132.978883]  [<ffffffff810147fe>] amd_e400_idle+0x5e/0x130
      [  132.978883]  [<ffffffff8100a9f6>] cpu_idle+0xb6/0x120
      [  132.978883]  [<ffffffff817f217f>] rest_init+0xef/0x150
      [  132.978883]  [<ffffffff817f20e2>] ? rest_init+0x52/0x150
      [  132.978883]  [<ffffffff81ed9cf3>] start_kernel+0x3da/0x3e5
      [  132.978883]  [<ffffffff81ed9346>] x86_64_start_reservations+0x131/0x135
      [  132.978883]  [<ffffffff81ed944d>] x86_64_start_kernel+0x103/0x112
      
      Fix this by calling rcu_idle_enter() after tick_nohz_irq_exit().
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      416eb33c
    • F
      nohz: Separate out irq exit and idle loop dyntick logic · 280f0677
      Frederic Weisbecker committed
      The tick_nohz_stop_sched_tick() function, which tries to delay
      the next timer tick as long as possible, can be called from two
      places:
      
      - From the idle loop to start the dyntick idle mode
      - From interrupt exit if we have interrupted the dyntick
      idle mode, so that we reprogram the next tick event in
      case the irq changed some internal state that requires this
      action.
      
      There are only a few minor differences between the two cases. They
      are handled by that function, driven by the ts->inidle
      cpu variable and the inidle parameter. Together these guarantee
      that we only update the dyntick mode on irq exit if we actually
      interrupted the dyntick idle mode, and that we enter the RCU extended
      quiescent state from idle loop entry only.
      
      Split this function into:
      
      - tick_nohz_idle_enter(), which sets ts->inidle to 1, enters
      dynticks idle mode unconditionally if it can, and enters into RCU
      extended quiescent state.
      
      - tick_nohz_irq_exit() which only updates the dynticks idle mode
      when ts->inidle is set (ie: if tick_nohz_idle_enter() has been called).
      
      To maintain symmetry, tick_nohz_restart_sched_tick() has been renamed
      into tick_nohz_idle_exit().
      
      This simplifies the code and micro-optimizes the irq exit path (no need
      for local_irq_save there). This also prepares for the split between
      the dynticks and RCU extended quiescent state logic. We'll need this split to
      further fix illegal uses of RCU in extended quiescent states in the idle
      loop.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: David Miller <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      280f0677
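The split described in this commit message can be modelled with three small functions sharing one flag. This is a sketch under stated assumptions: the struct, its fields and the `_sketch` function names are hypothetical, the tick is reduced to a boolean and a counter, and the RCU extended quiescent state handling is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the per-CPU tick state; field names are illustrative. */
struct tick_sched_sketch {
    bool inidle;       /* true between idle enter and idle exit */
    bool tick_stopped; /* true while the periodic tick is stopped */
    int reprogrammed;  /* how many times the next tick event was reprogrammed */
};

/* Shared helper: stop the tick or push the next event out. */
static void stop_or_reprogram_tick(struct tick_sched_sketch *ts)
{
    ts->tick_stopped = true;
    ts->reprogrammed++;
}

/* Idle loop entry: mark the CPU as in idle and enter dynticks idle mode. */
static void idle_enter_sketch(struct tick_sched_sketch *ts)
{
    assert(!ts->inidle);
    ts->inidle = true;
    stop_or_reprogram_tick(ts);
}

/* Irq exit: only touch the tick if we interrupted the dyntick idle mode. */
static void irq_exit_sketch(struct tick_sched_sketch *ts)
{
    if (!ts->inidle)
        return;
    stop_or_reprogram_tick(ts);
}

/* Idle loop exit: leave idle and restart the tick. */
static void idle_exit_sketch(struct tick_sched_sketch *ts)
{
    ts->inidle = false;
    ts->tick_stopped = false;
}
```

An interrupt that arrives outside the idle loop takes the early return in `irq_exit_sketch()`, so the irq exit path stays cheap in the common case.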
  30. 31 Oct, 2011 1 commit