1. 27 January 2012 (1 commit)
    • bugs, x86: Fix printk levels for panic, softlockups and stack dumps · b0f4c4b3
      Committed by Prarit Bhargava
      rsyslog will display KERN_EMERG messages on a connected
      terminal.  However, these messages are useless/undecipherable
      for a general user.
      
      For example, after a softlockup we get:
      
       Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ...
       kernel:Stack:
      
       Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ...
       kernel:Call Trace:
      
       Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ...
       kernel:Code: ff ff a8 08 75 25 31 d2 48 8d 86 38 e0 ff ff 48 89
       d1 0f 01 c8 0f ae f0 48 8b 86 38 e0 ff ff a8 08 75 08 b1 01 4c 89 e0 0f 01 c9 <e8> ea 69 dd ff 4c 29 e8 48 89 c7 e8 0f bc da ff 49 89 c4 49 89
      
      This happens because the printk levels for these messages are
      incorrect. Only an informational message should be displayed on
      a terminal.
      
      I modified the printk levels for various messages in the kernel
      and tested the output by using the drivers/misc/lkdtm.c kernel
      module (i.e., softlockups, panics, hard lockups, etc.) and
      confirmed that the console output was still the same and that
      the output to the terminals was correct.
      
      For example, in the case of a softlockup we now see the much
      more informative:
      
       Message from syslogd@intel-s3e37-04 at Jan 25 10:18:06 ...
       BUG: soft lockup - CPU4 stuck for 60s!
      
      instead of the above confusing messages.
      
      AFAICT, the messages no longer have to be KERN_EMERG. In the
      most important case, a panic, we set console_verbose(). As for
      the other, less severe cases, the correct data is still output
      to the console and /var/log/messages.
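
      As a rough sketch (the helper and message names here are
      illustrative, not this patch's exact hunks), the change boils
      down to picking a quieter loglevel for the stack-dump helpers:

      	/* Sketch, assuming a helper in the style of
      	 * arch/x86/kernel/dumpstack*.c. */
      	#include <linux/kernel.h>

      	static void show_stack_sketch(void)
      	{
      		/* Before: KERN_EMERG, which rsyslog broadcasts to
      		 * every logged-in terminal. */
      		/* printk(KERN_EMERG "Stack:\n"); */

      		/* After: the default loglevel still reaches the
      		 * console and /var/log/messages, but is no longer
      		 * sprayed onto user terminals. */
      		printk(KERN_DEFAULT "Stack:\n");
      		printk(KERN_DEFAULT "Call Trace:\n");
      	}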
      
      Successfully tested by me using the drivers/misc/lkdtm.c module.
      Signed-off-by: Prarit Bhargava <prarit@redhat.com>
      Cc: dzickus@redhat.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1327586134-11926-1-git-send-email-prarit@redhat.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 04 January 2012 (2 commits)
    • x86: Fix atomic64_xxx_cx8() functions · ceb7b40b
      Committed by Eric Dumazet
      It appears almost all functions in arch/x86/lib/atomic64_cx8_32.S
      are wrong in the case where cmpxchg8b must be restarted, because
      the LOCK_PREFIX macro defines a label "1" clashing with other
      local labels:
      
      1:
      	some_instructions
      	LOCK_PREFIX
      	cmpxchg8b (%ebp)
      	jne 1b  / jumps to beginning of LOCK_PREFIX !
      
      A possible fix is to use a magic label "672" in the LOCK_PREFIX
      asm definition, similar to the "671" one we defined in
      LOCK_PREFIX_HERE.
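
      A minimal sketch of that fix, assuming the assembler flavour of
      LOCK_PREFIX in arch/x86/include/asm/alternative-asm.h (the macro
      used from .S files); the layout is illustrative, not the exact
      hunk:

      	.macro LOCK_PREFIX
      672:	lock			/* was "1:", which callers' "jne 1b" could hit */
      	.section .smp_locks, "a"
      	.balign 4
      	.long 672b - .		/* was "1b" */
      	.previous
      	.endm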
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: Jan Beulich <JBeulich@suse.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1325608540.2320.103.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: Fix and improve cmpxchg_double{,_local}() · cdcd6298
      Committed by Jan Beulich
      Just like the per-CPU ones they had several
      problems/shortcomings:
      
      Only the first memory operand was mentioned in the asm()
      operands, and the 2x64-bit version didn't have a memory clobber
      while the 2x32-bit one did. The former allowed the compiler to
      not recognize the need to re-load the data in case it had it
      cached in some register, while the latter was overly
      destructive.
      
      The types of the local copies of the old and new values were
      incorrect (the types of the pointed-to variables should be used
      here, to make sure the respective old/new variable types are
      compatible).
      
      The __dummy/__junk variables were pointless, given that local
      copies of the inputs already existed (and can hence be used for
      discarded outputs).
      
      The 32-bit variant of cmpxchg_double_local() referenced
      cmpxchg16b_local().
      
      At once also:
      
       - change the return value type to what it really is: 'bool'
       - unify 32- and 64-bit variants
       - abstract out the common part of the 'normal' and 'local' variants
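
      A minimal sketch of the resulting shape described above (bool
      return, both memory words listed as in/out operands); this
      assumes x86-64, a 16-byte-aligned first word, and illustrative
      names, not the exact post-patch macro:

      	#include <stdbool.h>

      	static inline bool cmpxchg_double_sketch(unsigned long *p1, unsigned long *p2,
      						 unsigned long o1, unsigned long o2,
      						 unsigned long n1, unsigned long n2)
      	{
      		bool ret;

      		/* cmpxchg16b compares RDX:RAX with the 16 bytes at %1
      		 * and, on match, stores RCX:RBX there; ZF reports
      		 * success.  Listing *both* words as "+m" operands stops
      		 * the compiler from keeping stale register copies,
      		 * which was the first bug described above. */
      		asm volatile("lock; cmpxchg16b %1\n\t"
      			     "sete %0"
      			     : "=q" (ret), "+m" (*p1), "+m" (*p2),
      			       "+a" (o1), "+d" (o2)
      			     : "b" (n1), "c" (n2)
      			     : "cc");
      		return ret;
      	}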
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/4F01F12A020000780006A19B@nat28.tlf.novell.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  3. 26 December 2011 (1 commit)
    • KVM: Don't automatically expose the TSC deadline timer in cpuid · 4d25a066
      Committed by Jan Kiszka
      Unlike all of the other cpuid bits, the TSC deadline timer bit is set
      unconditionally, regardless of what userspace wants.
      
      This is broken in several ways:
       - if userspace doesn't use KVM_CREATE_IRQCHIP and doesn't emulate the TSC
         deadline timer feature, a guest that uses the feature will break
       - live migration to older host kernels that don't support the TSC deadline
         timer will pull the feature out from under the guest's feet, breaking it
       - guests that are broken with respect to the feature will fail.
      
      Fix by not enabling the feature automatically; instead report it to userspace.
      Because the feature depends on KVM_CREATE_IRQCHIP, which we cannot guarantee
      will be called, we expose it via KVM_CAP_TSC_DEADLINE_TIMER rather than
      KVM_GET_SUPPORTED_CPUID.
      
      Fixes the Illumos guest kernel, which uses the TSC deadline timer feature.
      
      [avi: add the KVM_CAP + documentation]
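
      A hedged userspace sketch of the resulting probe (error handling
      mostly elided; the constants come from <linux/kvm.h>):

      	#include <fcntl.h>
      	#include <unistd.h>
      	#include <sys/ioctl.h>
      	#include <linux/kvm.h>

      	/* Returns nonzero if the host can emulate the TSC deadline
      	 * timer; the VMM may then set CPUID.01H:ECX[24] for the
      	 * guest, provided it also uses KVM_CREATE_IRQCHIP. */
      	int tsc_deadline_supported(void)
      	{
      		int kvm = open("/dev/kvm", O_RDWR);
      		int has = kvm >= 0 ? ioctl(kvm, KVM_CHECK_EXTENSION,
      					   KVM_CAP_TSC_DEADLINE_TIMER) : 0;
      		if (kvm >= 0)
      			close(kvm);
      		return has > 0;
      	}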
      Reported-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
      Tested-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
  4. 25 December 2011 (1 commit)
    • KVM: x86: Prevent starting PIT timers in the absence of irqchip support · 0924ab2c
      Committed by Jan Kiszka
      User space may create the PIT and forget to set up the irqchips.
      In that case, firing PIT IRQs will crash the host:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000128
      IP: [<ffffffffa10f6280>] kvm_set_irq+0x30/0x170 [kvm]
      ...
      Call Trace:
       [<ffffffffa11228c1>] pit_do_work+0x51/0xd0 [kvm]
       [<ffffffff81071431>] process_one_work+0x111/0x4d0
       [<ffffffff81071bb2>] worker_thread+0x152/0x340
       [<ffffffff81075c8e>] kthread+0x7e/0x90
       [<ffffffff815a4474>] kernel_thread_helper+0x4/0x10
      
      Prevent this by checking the irqchip mode before starting a timer. We
      can't deny creating the PIT if the irqchips aren't set up yet, as
      current user land expects this order to work.
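
      A minimal sketch of the guard (assuming KVM's irqchip_in_kernel()
      helper; the actual hunk in arch/x86/kvm/i8254.c may differ in
      detail):

      	static void create_pit_timer_sketch(struct kvm *kvm)
      	{
      		/* Firing a PIT IRQ without an in-kernel irqchip ends in
      		 * kvm_set_irq() dereferencing a NULL routing table (the
      		 * oops above), so refuse to start the timer here. */
      		if (!irqchip_in_kernel(kvm))
      			return;

      		/* ... hrtimer setup as before ... */
      	}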
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
  5. 24 December 2011 (6 commits)
  6. 21 December 2011 (3 commits)
  7. 20 December 2011 (2 commits)
  8. 18 December 2011 (3 commits)
  9. 17 December 2011 (1 commit)
  10. 16 December 2011 (4 commits)
    • x86_64, asm: Optimise fls(), ffs() and fls64() · ca3d30cc
      Committed by David Howells
      fls(N), ffs(N) and fls64(N) can be optimised on x86_64.  Currently they use a
      CMOV instruction after the BSR/BSF to set the destination register to -1 if the
      value to be scanned was 0 (in which case BSR/BSF set the Z flag).
      
      Instead, according to the AMD64 specification, we can make use of the fact that
      BSR/BSF doesn't modify its output register if its input is 0.  By preloading
      the output with -1 and incrementing the result, we achieve the desired result
      without the need for a conditional check.
      
      The Intel x86_64 specification, however, says that the result of BSR/BSF in
      such a case is undefined.  That said, when queried, one of the Intel CPU
      architects said that the behaviour on all Intel CPUs is that:
      
       (1) with BSRQ/BSFQ, the 64-bit destination register is written with its
           original value if the source is 0, thus, in essence, giving the effect we
           want.  And,
      
       (2) with BSRL/BSFL, the lower half of the 64-bit destination register is
           written with its original value if the source is 0, and the upper half is
           cleared, thus giving us the effect we want (we return a 4-byte int).
      
      Further, it was indicated that they (Intel) are unlikely to get away with
      changing the behaviour.
      
      It might be possible to optimise the 32-bit versions of these functions, but
      there's a lot more variation, and so the effective non-destructive property of
      BSRL/BSFL cannot be relied on.
      
      [ hpa: specifically, some 486 chips are known to NOT have this property. ]
      
      I have benchmarked these functions on my Core2 Duo test machine using the
      following program:
      
      	#include <stdlib.h>
      	#include <stdio.h>
      
      	#ifndef __x86_64__
      	#error
      	#endif
      
      	#define PAGE_SHIFT 12
      
      	typedef unsigned long long __u64, u64;
      	typedef unsigned int __u32, u32;
      	#define noinline	__attribute__((noinline))
      
      	static __always_inline int fls64(__u64 x)
      	{
      		long bitpos = -1;

      		/* BSR leaves the destination unchanged when the source
      		 * is 0 (per the Intel statement above), so the
      		 * preloaded -1 makes fls64(0) return 0 after the +1. */
      		asm("bsrq %1,%0"
      		    : "+r" (bitpos)
      		    : "rm" (x));
      		return bitpos + 1;
      	}
      
      	static inline unsigned long __fls(unsigned long word)
      	{
      		asm("bsr %1,%0"
      		    : "=r" (word)
      		    : "rm" (word));
      		return word;
      	}
      	static __always_inline int old_fls64(__u64 x)
      	{
      		if (x == 0)
      			return 0;
      		return __fls(x) + 1;
      	}
      
      	static noinline // __attribute__((const))
      	int old_get_order(unsigned long size)
      	{
      		int order;
      
      		size = (size - 1) >> (PAGE_SHIFT - 1);
      		order = -1;
      		do {
      			size >>= 1;
      			order++;
      		} while (size);
      		return order;
      	}
      
      	static inline __attribute__((const))
      	int get_order_old_fls64(unsigned long size)
      	{
      		int order;
      		size--;
      		size >>= PAGE_SHIFT;
      		order = old_fls64(size);
      		return order;
      	}
      
      	static inline __attribute__((const))
      	int get_order(unsigned long size)
      	{
      		int order;
      		size--;
      		size >>= PAGE_SHIFT;
      		order = fls64(size);
      		return order;
      	}
      
      	unsigned long prevent_optimise_out;
      
      	static noinline unsigned long test_old_get_order(void)
      	{
      		unsigned long n, total = 0;
      		long rep, loop;
      
      		for (rep = 1000000; rep > 0; rep--) {
      			for (loop = 0; loop <= 16384; loop += 4) {
      				n = 1UL << loop;
      				total += old_get_order(n);
      			}
      		}
      		return total;
      	}
      
      	static noinline unsigned long test_get_order_old_fls64(void)
      	{
      		unsigned long n, total = 0;
      		long rep, loop;
      
      		for (rep = 1000000; rep > 0; rep--) {
      			for (loop = 0; loop <= 16384; loop += 4) {
      				n = 1UL << loop;
      				total += get_order_old_fls64(n);
      			}
      		}
      		return total;
      	}
      
      	static noinline unsigned long test_get_order(void)
      	{
      		unsigned long n, total = 0;
      		long rep, loop;
      
      		for (rep = 1000000; rep > 0; rep--) {
      			for (loop = 0; loop <= 16384; loop += 4) {
      				n = 1UL << loop;
      				total += get_order(n);
      			}
      		}
      		return total;
      	}
      
      	int main(int argc, char **argv)
      	{
      		unsigned long total;
      
      		switch (argc) {
      		case 1:  total = test_old_get_order();		break;
      		case 2:  total = test_get_order_old_fls64();	break;
      		default: total = test_get_order();		break;
      		}
      		prevent_optimise_out = total;
      		return 0;
      	}
      
      This allows me to test the use of the old fls64() implementation and the new
      fls64() implementation and also to contrast these to the out-of-line loop-based
      implementation of get_order().  The results were:
      
      	warthog>time ./get_order
      	real    1m37.191s
      	user    1m36.313s
      	sys     0m0.861s
      	warthog>time ./get_order x
      	real    0m16.892s
      	user    0m16.586s
      	sys     0m0.287s
      	warthog>time ./get_order x x
      	real    0m7.731s
      	user    0m7.727s
      	sys     0m0.002s
      
      Using the current upstream fls64() as a basis for an inlined get_order() [the
      second result above] is much faster than using the current out-of-line
      loop-based get_order() [the first result above].
      
      Using my optimised inline fls64()-based get_order() [the third result above]
      is even faster still.
      
      [ hpa: changed the selection of 32 vs 64 bits to use CONFIG_X86_64
        instead of comparing BITS_PER_LONG, updated comments, rebased manually
        on top of 83d99df7 x86, bitops: Move fls64.h inside __KERNEL__ ]
      Signed-off-by: David Howells <dhowells@redhat.com>
      Link: http://lkml.kernel.org/r/20111213145654.14362.39868.stgit@warthog.procyon.org.uk
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, bitops: Move fls64.h inside __KERNEL__ · 83d99df7
      Committed by H. Peter Anvin
      We would include <asm-generic/bitops/fls64.h> even without __KERNEL__,
      but that doesn't make sense, as:
      
      1. That file provides fls64(), but the corresponding function fls() is
         not exported to user space.
      2. The implementation of fls64.h uses kernel-only symbols.
      3. fls64.h is not exported to user space.
      
      This appears to have been a bug introduced in checkin:
      
      d57594c2 bitops: use __fls for fls64 on 64-bit archs
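
      A sketch of the intended layout in arch/x86/include/asm/bitops.h
      (surrounding context abbreviated):

      	#ifdef __KERNEL__
      	/* ... fls(), ffs() and friends, kernel-only ... */
      	#include <asm-generic/bitops/fls64.h>	/* now invisible to user space */
      	#endif /* __KERNEL__ */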
      
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Alexander van Heukelum <heukelum@mailshack.com>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Link: http://lkml.kernel.org/r/4EEA77E1.6050009@zytor.com
    • xen: only limit memory map to maximum reservation for domain 0. · d3db7281
      Committed by Ian Campbell
      d312ae87 "xen: use maximum reservation to limit amount of usable RAM"
      clamped the total amount of RAM to the current maximum reservation. This is
      correct for dom0 but is not correct for guest domains. In order to boot a guest
      "pre-ballooned" (e.g. with memory=1G but maxmem=2G) in order to allow for
      future memory expansion the guest must derive max_pfn from the e820 provided by
      the toolstack and not the current maximum reservation (which can reflect only
      the current maximum, not the guest lifetime max). The existing algorithm
      already behaves this correctly if we do not artificially limit the maximum
      number of pages for the guest case.
      
      For a guest booted with maxmem=512, memory=128 this results in:
       [    0.000000] BIOS-provided physical RAM map:
       [    0.000000]  Xen: 0000000000000000 - 00000000000a0000 (usable)
       [    0.000000]  Xen: 00000000000a0000 - 0000000000100000 (reserved)
      -[    0.000000]  Xen: 0000000000100000 - 0000000008100000 (usable)
      -[    0.000000]  Xen: 0000000008100000 - 0000000020800000 (unusable)
      +[    0.000000]  Xen: 0000000000100000 - 0000000020800000 (usable)
      ...
       [    0.000000] NX (Execute Disable) protection: active
       [    0.000000] DMI not present or invalid.
       [    0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
       [    0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
      -[    0.000000] last_pfn = 0x8100 max_arch_pfn = 0x1000000
      +[    0.000000] last_pfn = 0x20800 max_arch_pfn = 0x1000000
       [    0.000000] initial memory mapped : 0 - 027ff000
       [    0.000000] Base memory trampoline at [c009f000] 9f000 size 4096
      -[    0.000000] init_memory_mapping: 0000000000000000-0000000008100000
      -[    0.000000]  0000000000 - 0008100000 page 4k
      -[    0.000000] kernel direct mapping tables up to 8100000 @ 27bb000-27ff000
      +[    0.000000] init_memory_mapping: 0000000000000000-0000000020800000
      +[    0.000000]  0000000000 - 0020800000 page 4k
      +[    0.000000] kernel direct mapping tables up to 20800000 @ 26f8000-27ff000
       [    0.000000] xen: setting RW the range 27e8000 - 27ff000
       [    0.000000] 0MB HIGHMEM available.
      -[    0.000000] 129MB LOWMEM available.
      -[    0.000000]   mapped low ram: 0 - 08100000
      -[    0.000000]   low ram: 0 - 08100000
      +[    0.000000] 520MB LOWMEM available.
      +[    0.000000]   mapped low ram: 0 - 20800000
      +[    0.000000]   low ram: 0 - 20800000
      
      With this change "xl mem-set <domain> 512M" will successfully increase the
      guest RAM (by reducing the balloon).
      
      There is no change for dom0.
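
      A minimal sketch of the described behaviour (not the exact hunk
      in arch/x86/xen/setup.c): the maximum-reservation clamp is
      applied only when running as dom0.

      	static unsigned long xen_get_max_pages_sketch(unsigned long e820_pages)
      	{
      		unsigned long max_pages = e820_pages;

      		if (xen_initial_domain()) {
      			/* dom0: the hypervisor's current maximum
      			 * reservation is authoritative, so clamp. */
      			domid_t domid = DOMID_SELF;
      			long ret = HYPERVISOR_memory_op(
      					XENMEM_maximum_reservation, &domid);
      			if (ret > 0)
      				max_pages = ret;
      		}

      		/* Guests: trust the toolstack-provided e820 unmodified. */
      		return min(max_pages, e820_pages);
      	}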
      Reported-and-Tested-by: George Shuklin <george.shuklin@gmail.com>
      Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
      Cc: stable@kernel.org
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • x86, centaur: Enable cx8 for VIA Eden too · cb3f718d
      Committed by Timo Teräs
      My box with the following cpuinfo still needs cx8 enabling:
      
      vendor_id	: CentaurHauls
      cpu family	: 6
      model		: 13
      model name	: VIA Eden Processor 1200MHz
      stepping	: 0
      cpu MHz		: 1199.940
      cache size	: 128 KB
      
      This fixes valgrind on my box (it requires and checks for
      cx8 in cpuinfo).
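
      A hedged sketch of the fix (the exact model checks in
      arch/x86/kernel/cpu/centaur.c may differ; the point is that
      family 6 parts such as this model-13 Eden also get the flag):

      	static void init_centaur_sketch(struct cpuinfo_x86 *c)
      	{
      		if (c->x86 == 5 || c->x86 == 6)
      			set_cpu_cap(c, X86_FEATURE_CX8);
      	}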
      Signed-off-by: Timo Teräs <timo.teras@iki.fi>
      Link: http://lkml.kernel.org/r/1323961888-10223-1-git-send-email-timo.teras@iki.fi
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  11. 15 December 2011 (3 commits)
    • x86: Fix and improve percpu_cmpxchg{8,16}b_double() · cebef5be
      Committed by Jan Beulich
      They had several problems/shortcomings:
      
      Only the first memory operand was mentioned in the 2x32-bit asm()
      operands, and the 2x64-bit version had a memory clobber. The first
      allowed the compiler to not recognize the need to re-load the
      data in case it had it cached in some register, and the second
      was overly destructive.
      
      The memory operand in the 2x32-bit asm() was declared to only be
      an output.
      
      The types of the local copies of the old and new values were
      incorrect (as in other per-CPU ops, the types of the per-CPU
      variables accessed should be used here, to make sure the
      respective types are compatible).
      
      The __dummy variable was pointless (and needlessly initialized
      in the 2x32-bit case), given that local copies of the inputs
      already exist.
      
      The 2x64-bit variant forced the address of the first object into
      %rsi, even though this is needed only for the call to the
      emulation function. The real cmpxchg16b can operate on any memory
      location.
      
      At once also change the return value type to what it really is -
      'bool'.
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/4EE86D6502000078000679FE@nat28.tlf.novell.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: Report cpb and eff_freq_ro flags correctly · 969df4b8
      Committed by Joerg Roedel
      Add the flags to get rid of the [9] and [10] feature names
      in cpuinfo's 'power management' fields and replace them with
      meaningful names.
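
      A hedged sketch, assuming the names live in the x86_power_flags[]
      table that the 'power management' line is printed from
      (arch/x86/kernel/cpu/powerflags.c); the indices are the bit
      positions in CPUID 0x80000007:EDX:

      	const char *const x86_power_flags_sketch[32] = {
      		"ts",			/* temperature sensor */
      		/* ... bits 1-8 ... */
      		[9]  = "cpb",		/* core performance boost; was "[9]" */
      		[10] = "eff_freq_ro",	/* readonly APERF/MPERF; was "[10]" */
      	};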
      Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
      Link: http://lkml.kernel.org/r/1323875574-17881-1-git-send-email-joerg.roedel@amd.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, mce, therm_throt: Don't report power limit and package level thermal throttle events in mcelog · 29e9bf18
      Committed by Fenghua Yu
      Thermal throttle and power limit events are not defined as MCE errors in x86
      architecture and should not generate MCE errors in mcelog.
      
      Current kernel generates fake software defined MCE errors for these events.
      This may confuse users because they may think the machine has real MCE errors
      while actually only thermal throttle or power limit events happen.
      
      To make it worse, buggy firmware on some platforms may falsely generate
      the events. Therefore, kernel reports MCE errors which users think as real
      hardware errors. Although the firmware bugs should be fixed, on the other hand,
      kernel should not report MCE errors either.
      
      So mcelog is not a good mechanism to report these events. Instead, the
      events are counted in respective counters (core_power_limit_count,
      package_power_limit_count, core_throttle_count, and package_throttle_count)
      in /sys/devices/system/cpu/cpu#/thermal_throttle/. Users can check the
      counters for each event on each CPU. Please note that all CPUs on one
      package report duplicate counters. It is the user application's
      responsibility to retrieve a package-level counter for one package.

      This patch stops reporting package-level power limit, core-level power
      limit, and package-level thermal throttle events in mcelog. When the events
      happen, they are reported only in the respective sysfs counters.
      
      Since core-level thermal throttling has been reported by legacy code in the
      kernel for a while, and users have accepted it as an MCE error in mcelog,
      core-level thermal throttling is still reported in mcelog. At the same
      time, the event is counted in a sysfs counter as well.
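
      With the events out of mcelog, a monitoring tool reads the sysfs
      counters instead; a minimal userspace sketch (path as named
      above):

      	#include <stdio.h>

      	static long read_core_throttle_count(int cpu)
      	{
      		char path[128];
      		long count = -1;
      		FILE *f;

      		snprintf(path, sizeof(path),
      			 "/sys/devices/system/cpu/cpu%d/thermal_throttle/core_throttle_count",
      			 cpu);
      		f = fopen(path, "r");
      		if (f) {
      			if (fscanf(f, "%ld", &count) != 1)
      				count = -1;
      			fclose(f);
      		}
      		return count;
      	}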
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      Acked-by: Borislav Petkov <bp@amd64.org>
      Acked-by: Tony Luck <tony.luck@intel.com>
      Link: http://lkml.kernel.org/r/20111215001945.GA21009@linux-os.sc.intel.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
  12. 14 December 2011 (8 commits)
  13. 13 December 2011 (2 commits)
  14. 12 December 2011 (3 commits)
    • nohz: Remove tick_nohz_idle_enter_norcu() / tick_nohz_idle_exit_norcu() · 1268fbc7
      Committed by Frederic Weisbecker
      Those two APIs were provided to optimize the calls of
      tick_nohz_idle_enter() and rcu_idle_enter() into a single
      irq-disabled section, so that no interrupt happening in between
      would needlessly process any RCU job.

      But this is an optimization whose benefits have yet to be
      measured. Let's start simple and completely decouple the idle RCU
      and dyntick-idle logics to simplify things.
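
      After this change the two steps are written out explicitly at the
      call sites; a sketch in the style of an arch cpu_idle() loop
      (details vary per architecture):

      	static void cpu_idle_sketch(void)
      	{
      		tick_nohz_idle_enter();	/* stop the periodic tick */
      		rcu_idle_enter();	/* RCU extended quiescent state */

      		/* ... sleep until an interrupt arrives ... */

      		rcu_idle_exit();
      		tick_nohz_idle_exit();
      	}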
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • x86: Call idle notifier after irq_enter() · 98ad1cc1
      Committed by Frederic Weisbecker
      Interrupts notify the idle exit state before calling irq_enter().
      But the notifier code calls rcu_read_lock(), which is not allowed
      while RCU is in an extended quiescent state. We need to wait for
      irq_enter() -> rcu_idle_exit() to be called before doing so,
      otherwise the result is a grumpy RCU:
      
      [    0.099991] WARNING: at include/linux/rcupdate.h:194 __atomic_notifier_call_chain+0xd2/0x110()
      [    0.099991] Hardware name: AMD690VM-FMH
      [    0.099991] Modules linked in:
      [    0.099991] Pid: 0, comm: swapper Not tainted 3.0.0-rc6+ #255
      [    0.099991] Call Trace:
      [    0.099991]  <IRQ>  [<ffffffff81051c8a>] warn_slowpath_common+0x7a/0xb0
      [    0.099991]  [<ffffffff81051cd5>] warn_slowpath_null+0x15/0x20
      [    0.099991]  [<ffffffff817d6fa2>] __atomic_notifier_call_chain+0xd2/0x110
      [    0.099991]  [<ffffffff817d6ff1>] atomic_notifier_call_chain+0x11/0x20
      [    0.099991]  [<ffffffff81001873>] exit_idle+0x43/0x50
      [    0.099991]  [<ffffffff81020439>] smp_apic_timer_interrupt+0x39/0xa0
      [    0.099991]  [<ffffffff817da253>] apic_timer_interrupt+0x13/0x20
      [    0.099991]  <EOI>  [<ffffffff8100ae67>] ? default_idle+0xa7/0x350
      [    0.099991]  [<ffffffff8100ae65>] ? default_idle+0xa5/0x350
      [    0.099991]  [<ffffffff8100b19b>] amd_e400_idle+0x8b/0x110
      [    0.099991]  [<ffffffff810cb01f>] ? rcu_enter_nohz+0x8f/0x160
      [    0.099991]  [<ffffffff810019a0>] cpu_idle+0xb0/0x110
      [    0.099991]  [<ffffffff817a7505>] rest_init+0xe5/0x140
      [    0.099991]  [<ffffffff817a7468>] ? rest_init+0x48/0x140
      [    0.099991]  [<ffffffff81cc5ca3>] start_kernel+0x3d1/0x3dc
      [    0.099991]  [<ffffffff81cc5321>] x86_64_start_reservations+0x131/0x135
      [    0.099991]  [<ffffffff81cc5412>] x86_64_start_kernel+0xed/0xf4
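
      A hedged sketch of the corrected ordering on an x86 interrupt
      path such as smp_apic_timer_interrupt() in the trace above:

      	void timer_interrupt_sketch(void)
      	{
      		irq_enter();	/* leaves the RCU-idle window first */
      		exit_idle();	/* notifier may now take rcu_read_lock() */

      		/* ... handle the interrupt ... */

      		irq_exit();
      	}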
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Andy Henroid <andrew.d.henroid@intel.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
    • x86: Enter rcu extended qs after idle notifier call · e37e112d
      Committed by Frederic Weisbecker
      The idle notifier, called by enter_idle(), enters an RCU read-side
      critical section, but at that time we have already switched into
      the RCU-idle window (rcu_idle_enter() has been called), and it is
      illegal to use rcu_read_lock() in that state.

      This results in RCU reporting its bad mood:
      
      [    1.275635] WARNING: at include/linux/rcupdate.h:194 __atomic_notifier_call_chain+0xd2/0x110()
      [    1.275635] Hardware name: AMD690VM-FMH
      [    1.275635] Modules linked in:
      [    1.275635] Pid: 0, comm: swapper Not tainted 3.0.0-rc6+ #252
      [    1.275635] Call Trace:
      [    1.275635]  [<ffffffff81051c8a>] warn_slowpath_common+0x7a/0xb0
      [    1.275635]  [<ffffffff81051cd5>] warn_slowpath_null+0x15/0x20
      [    1.275635]  [<ffffffff817d6f22>] __atomic_notifier_call_chain+0xd2/0x110
      [    1.275635]  [<ffffffff817d6f71>] atomic_notifier_call_chain+0x11/0x20
      [    1.275635]  [<ffffffff810018a0>] enter_idle+0x20/0x30
      [    1.275635]  [<ffffffff81001995>] cpu_idle+0xa5/0x110
      [    1.275635]  [<ffffffff817a7465>] rest_init+0xe5/0x140
      [    1.275635]  [<ffffffff817a73c8>] ? rest_init+0x48/0x140
      [    1.275635]  [<ffffffff81cc5ca3>] start_kernel+0x3d1/0x3dc
      [    1.275635]  [<ffffffff81cc5321>] x86_64_start_reservations+0x131/0x135
      [    1.275635]  [<ffffffff81cc5412>] x86_64_start_kernel+0xed/0xf4
      [    1.275635] ---[ end trace a22d306b065d4a66 ]---
      
      Fix this by entering rcu extended quiescent state later, just before
      the CPU goes to sleep.
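
      A hedged sketch of the reordered x86-64 idle loop:

      	static void cpu_idle_loop_sketch(void)
      	{
      		enter_idle();	/* notifier runs while RCU still watches */

      		rcu_idle_enter();	/* RCU-idle window opens only now */
      		safe_halt();		/* sleep; the waking IRQ path calls exit_idle() */
      		rcu_idle_exit();
      	}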
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>