1. 08 Feb, 2009 3 commits
    • ftrace: change function graph tracer to use new in_nmi · 9a5fd902
      Steven Rostedt committed
      The function graph tracer piggy backed onto the dynamic ftracer
      to use the in_nmi custom code for dynamic tracing. The problem
      was (as Andrew Morton pointed out) it really only wanted to bail
      out if the context of the current CPU was in NMI context. But the
      dynamic ftrace in_nmi custom code was true if _any_ CPU happened
      to be in NMI context.
      
      Now that we have a generic in_nmi interface, this patch changes
      the function graph code to use it instead of the dynamic ftrace
      custom code.
      Reported-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      9a5fd902
    • ftrace, x86: rename in_nmi variable · 4e6ea144
      Steven Rostedt committed
      Impact: clean up
      
      The in_nmi variable in x86 arch ftrace.c is a misnomer.
      Andrew Morton pointed out that the in_nmi variable is incremented
      by all CPUs. It can be set when another CPU is running an NMI.
      
      Since this is actually intentional, the fix is to rename it to
      what it really is: "nmi_running"
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      4e6ea144
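The distinction drawn in the two commits above, between a counter that is nonzero when any CPU is in NMI context and a flag that is true only for the current CPU, can be sketched in userspace C. This is a single-threaded illustration and not the kernel code: `SKETCH_NR_CPUS`, the two variables, and every `sketch_*` function are hypothetical names, and the CPU number is passed in explicitly instead of being read from the hardware.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS 4   /* hypothetical CPU count for the illustration */

/* Global counter: nonzero while ANY CPU is running an NMI. */
static int sketch_nmi_running;

/* Per-CPU flag: true only while THAT CPU is running an NMI. */
static bool sketch_cpu_in_nmi[SKETCH_NR_CPUS];

static void sketch_nmi_enter(int cpu)
{
    sketch_nmi_running++;
    sketch_cpu_in_nmi[cpu] = true;
}

static void sketch_nmi_exit(int cpu)
{
    sketch_cpu_in_nmi[cpu] = false;
    sketch_nmi_running--;
}

/* What the old check effectively answered. */
static bool sketch_any_cpu_in_nmi(void)
{
    return sketch_nmi_running > 0;
}

/* What a tracer that only cares about its own context wants to know. */
static bool sketch_this_cpu_in_nmi(int cpu)
{
    return sketch_cpu_in_nmi[cpu];
}
```

With CPU 1 inside an NMI, `sketch_any_cpu_in_nmi()` is true everywhere, while `sketch_this_cpu_in_nmi(0)` stays false, so a tracer on CPU 0 using the per-CPU check would not bail out needlessly.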
    • ring-buffer: add NMI protection for spinlocks · 78d904b4
      Steven Rostedt committed
      Impact: prevent deadlock in NMI
      
      The ring buffers are not yet totally lockless with writing to
      the buffer. When a writer crosses a page, it grabs a per cpu spinlock
      to protect against a reader. The spinlocks taken by a writer are not
      to protect against other writers, since a writer can only write to
      its own per cpu buffer. The spinlocks protect against readers that
      can touch any cpu buffer. The writers are made to be reentrant
      with the spinlocks disabling interrupts.
      
      The problem arises when an NMI writes to the buffer, and that write
      crosses a page boundary. If it grabs a spinlock, it can be racing
      with another writer (since disabling interrupts does not protect
      against NMIs) or with a reader on the same CPU. Luckily, most of the
      users are not reentrant, which protects against this issue. But if a
      user of the ring buffer becomes reentrant (which is what the ring
      buffers do allow), if the NMI also writes to the ring buffer then
      we risk the chance of a deadlock.
      
      This patch moves the ftrace_nmi_enter called by nmi_enter() to the
      ring buffer code. It replaces the current ftrace_nmi_enter that is
      used by arch specific code to arch_ftrace_nmi_enter and updates
      the Kconfig to handle it.
      
      When an NMI is called, it will set a per cpu variable in the ring buffer
      code and will clear it when the NMI exits. If a write to the ring buffer
      crosses page boundaries inside an NMI, a trylock is used on the spin
      lock instead. If the spinlock fails to be acquired, then the entry
      is discarded.
      
      This bug appeared in the ftrace work in the RT tree, where event tracing
      is reentrant. This workaround solved the deadlocks that appeared there.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      78d904b4
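The trylock-or-discard behaviour described in the commit above can be sketched as follows. This is a minimal userspace approximation under stated assumptions, not the ring buffer implementation: a C11 `atomic_flag` stands in for the per-CPU spinlock, the `in_nmi` field stands in for the per-CPU variable set on NMI entry, and all `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_cpu_buffer {
    atomic_flag lock;         /* stand-in for the per-CPU spinlock */
    bool in_nmi;              /* set on NMI entry, cleared on NMI exit */
    unsigned long committed;  /* page-crossing writes that succeeded */
    unsigned long dropped;    /* entries discarded inside an NMI */
};

static void sketch_buffer_init(struct sketch_cpu_buffer *b)
{
    atomic_flag_clear(&b->lock);
    b->in_nmi = false;
    b->committed = 0;
    b->dropped = 0;
}

static bool sketch_trylock(atomic_flag *l)
{
    /* test_and_set returns the previous value: false means we got the lock */
    return !atomic_flag_test_and_set_explicit(l, memory_order_acquire);
}

static void sketch_lock(atomic_flag *l)
{
    while (!sketch_trylock(l))
        ;  /* spin until the holder releases the lock */
}

static void sketch_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Returns true if the page-crossing write went through, false if discarded. */
static bool sketch_cross_page_write(struct sketch_cpu_buffer *b)
{
    if (b->in_nmi) {
        /* Spinning here could deadlock against the context we interrupted. */
        if (!sketch_trylock(&b->lock)) {
            b->dropped++;
            return false;
        }
    } else {
        sketch_lock(&b->lock);
    }
    b->committed++;
    sketch_unlock(&b->lock);
    return true;
}
```

The design choice is to lose one trace entry instead of spinning: an NMI that waits on a lock held by the code it interrupted can never make progress, so dropping the entry is the only safe option in this sketch.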
  2. 31 Jan, 2009 9 commits
  3. 30 Jan, 2009 2 commits
  4. 29 Jan, 2009 1 commit
  5. 28 Jan, 2009 1 commit
  6. 27 Jan, 2009 3 commits
  7. 26 Jan, 2009 3 commits
    • x86: fix section mismatch warning · 659d2618
      Rakib Mullick committed
      Here function vmi_activate calls an init function activate_vmi, which
      causes the following section mismatch warnings:
      
        LD      arch/x86/kernel/built-in.o
      WARNING: arch/x86/kernel/built-in.o(.text+0x13ba9): Section mismatch
      in reference from the function vmi_activate() to the function
      .init.text:vmi_time_init()
      The function vmi_activate() references
      the function __init vmi_time_init().
      This is often because vmi_activate lacks a __init
      annotation or the annotation of vmi_time_init is wrong.
      
      WARNING: arch/x86/kernel/built-in.o(.text+0x13bd1): Section mismatch
      in reference from the function vmi_activate() to the function
      .devinit.text:vmi_time_bsp_init()
      The function vmi_activate() references
      the function __devinit vmi_time_bsp_init().
      This is often because vmi_activate lacks a __devinit
      annotation or the annotation of vmi_time_bsp_init is wrong.
      
      WARNING: arch/x86/kernel/built-in.o(.text+0x13bdb): Section mismatch
      in reference from the function vmi_activate() to the function
      .devinit.text:vmi_time_ap_init()
      The function vmi_activate() references
      the function __devinit vmi_time_ap_init().
      This is often because vmi_activate lacks a __devinit
      annotation or the annotation of vmi_time_ap_init is wrong.
      
      Fix it by marking vmi_activate() as __init too.
      Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      659d2618
    • x86: unmask CPUID levels on Intel CPUs, fix · 99fb4d34
      Ingo Molnar committed
      Impact: fix boot hang on pre-model-15 Intel CPUs
      
      rdmsrl_safe() does not work in very early bootup code yet, because we
      don't have the pagefault handler installed yet, so the exception section
      does not get parsed. rdmsr_safe() will just crash and hang the bootup.
      
      So limit the MSR_IA32_MISC_ENABLE MSR read to those CPU types that
      support it.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      99fb4d34
    • x86: work around PAGE_KERNEL_WC not getting WC in iomap_atomic_prot_pfn. · ef5fa0ab
      Eric Anholt committed
      In the absence of PAT, PAGE_KERNEL_WC ends up mapping to a memory type that
      gets UC behavior even in the presence of a WC MTRR covering the area in
      question.  By swapping to PAGE_KERNEL_UC_MINUS, we can get the actual
      behavior the caller wanted (WC if you can manage it, UC otherwise).
      
      This recovers the 40% performance improvement of using WC in the DRM
      to upload vertex data.
      Signed-off-by: Eric Anholt <eric@anholt.net>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      ef5fa0ab
  8. 25 Jan, 2009 1 commit
    • x86: use standard PIT frequency · e1b4d114
      Ingo Molnar committed
      The RDC and ELAN platforms use slightly different PIT clocks, resulting in
      a timex.h hack that changes PIT_TICK_RATE during build time. But if a
      tester enables any of these platform support .config options, the PIT
      will be miscalibrated on standard PC platforms.
      
      So use one frequency - in a subsequent patch we'll add a quirk to allow
      x86 platforms to define different PIT frequencies.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      e1b4d114
  9. 24 Jan, 2009 1 commit
    • x86, mm: fix pte_free() · 42ef73fe
      Peter Zijlstra committed
      On -rt we were seeing spurious bad page states like:
      
      Bad page state in process 'firefox'
      page:c1bc2380 flags:0x40000000 mapping:c1bc2390 mapcount:0 count:0
      Trying to fix it up, but a reboot is needed
      Backtrace:
      Pid: 503, comm: firefox Not tainted 2.6.26.8-rt13 #3
      [<c043d0f3>] ? printk+0x14/0x19
      [<c0272d4e>] bad_page+0x4e/0x79
      [<c0273831>] free_hot_cold_page+0x5b/0x1d3
      [<c02739f6>] free_hot_page+0xf/0x11
      [<c0273a18>] __free_pages+0x20/0x2b
      [<c027d170>] __pte_alloc+0x87/0x91
      [<c027d25e>] handle_mm_fault+0xe4/0x733
      [<c043f680>] ? rt_mutex_down_read_trylock+0x57/0x63
      [<c043f680>] ? rt_mutex_down_read_trylock+0x57/0x63
      [<c0218875>] do_page_fault+0x36f/0x88a
      
      This is the case where a concurrent fault already installed the PTE and
      we get to free the newly allocated one.
      
      This is due to pgtable_page_ctor() doing the spin_lock_init(&page->ptl)
      which is overlaid with the {private, mapping} struct.
      
      union {
          struct {
              unsigned long private;
              struct address_space *mapping;
          };
          spinlock_t ptl;
          struct kmem_cache *slab;
          struct page *first_page;
      };
      
      Normally the spinlock is small enough to not stomp on page->mapping, but
      PREEMPT_RT=y has huge 'spin'locks.
      
      But lockdep kernels should also be able to trigger this splat, as the
      lock tracking code grows the spinlock to cover page->mapping.
      
      The obvious fix is calling pgtable_page_dtor() like the regular pte free
      path __pte_free_tlb() does.
      
      It seems all architectures except x86 and mn10300 already do this, and
      mn10300 doesn't seem to use pgtable_page_ctor(), which suggests it
      doesn't do SMP or simply doesn't do MMU at all or something.
      Signed-off-by: Peter Zijlstra <a.p.zijlsta@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: <stable@kernel.org>
      42ef73fe
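The overlay problem described in the commit above can be reproduced in miniature. The sketch below is an assumption-laden userspace model, not the kernel's `struct page`: `struct sketch_big_lock` is a hypothetical oversized lock standing in for a PREEMPT_RT or lockdep spinlock, the constructor just scribbles over the lock storage, and the destructor is assumed to do nothing more than reset `mapping`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lock that is larger than one unsigned long. */
struct sketch_big_lock {
    unsigned long words[4];
};

struct sketch_page {
    union {
        struct {
            unsigned long private;
            void *mapping;
        };
        struct sketch_big_lock ptl;  /* overlays private AND mapping */
    };
};

/* Stand-in for the constructor: initialising the lock writes its storage. */
static void sketch_ctor(struct sketch_page *p)
{
    memset(&p->ptl, 0xff, sizeof(p->ptl));
}

/* Stand-in for the destructor: assumed to reset the overlaid mapping field. */
static void sketch_dtor(struct sketch_page *p)
{
    p->mapping = NULL;
}

/* Stand-in for the free-path sanity check that reports a bad page state. */
static int sketch_page_looks_bad(const struct sketch_page *p)
{
    return p->mapping != NULL;
}
```

Freeing a page after `sketch_ctor()` without calling `sketch_dtor()` leaves a non-NULL `mapping` behind, which is the shape of the "bad page state" report quoted in the commit message.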
  10. 22 Jan, 2009 5 commits
  11. 21 Jan, 2009 4 commits
    • x86: mtrr fix debug boot parameter · 731f1872
      Thomas Renninger committed
      While looking at:
      
        http://bugzilla.kernel.org/show_bug.cgi?id=11541
      
      I realized that the mtrr.show param cannot work, because
      the code is processed much too early.
      
      This patch:
       - Declares mtrr.show as early_param
       - Stays consistent with the previous param (which I doubt
         ever worked), so mtrr.show=1 would still work
       - Declares mtrr_show as initdata
      Signed-off-by: Thomas Renninger <trenn@suse.de>
      Acked-by: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      731f1872
    • x86: fix page attribute corruption with cpa() · a1e46212
      Suresh Siddha committed
      Impact: fix sporadic slowdowns and warning messages
      
      This patch fixes a performance issue reported by Linus on his
      Nehalem system. While Linus reverted the PAT patch (commit
      58dab916) which exposed the issue,
      existing cpa() code can potentially still cause wrong behavior
      (page attribute corruption).
      
      This patch also fixes the "WARNING: at arch/x86/mm/pageattr.c:560" that
      various people reported.
      
      In 64bit kernel, kernel identity mapping might have holes depending
      on the available memory and how e820 reports the address range
      covering the RAM, ACPI, PCI reserved regions. If there is a 2MB/1GB hole
      in the address range that is not listed by e820 entries, kernel identity
      mapping will have a corresponding hole in its 1-1 identity mapping.
      
      If cpa() happens on the kernel identity mapping which falls into these holes,
      existing code fails like this:
      
      	__change_page_attr_set_clr()
      		__change_page_attr()
      			returns 0 because of if (!kpte). But doesn't
      			set cpa->numpages and cpa->pfn.
      		cpa_process_alias()
      			uses uninitialized cpa->pfn (random value)
      			which can potentially lead to changing the page
      			attribute of kernel text/data, kernel identity
      			mapping of RAM pages etc. oops!
      
      This bug was easily exposed by another PAT patch which was doing
      cpa() more often on kernel identity mapping holes (physical range between
      max_low_pfn_mapped and 4GB), where it was setting the
      cache disable attribute (PCD) for kernel identity mappings as well.
      
      Fix cpa() to handle the kernel identity mapping holes. Retain
      the WARN() for cpa() calls to other not present address ranges
      (kernel-text/data, ioremap() addresses).
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      a1e46212
    • Revert "x86: signal: change type of paramter for sys_rt_sigreturn()" · 552b8aa4
      Ingo Molnar committed
      This reverts commit 4217458d.
      
      Justin Madru bisected this commit, it was causing weird Firefox
      crashes.
      
      The reason is that GCC mis-optimizes (re-uses) the on-stack parameters of
      the calling frame, which corrupts the syscall return pt_regs state and
      thus corrupts user-space register state.
      
      So we go back to the slightly less clean but more optimization-safe
      method of getting to pt_regs. Also add a comment to explain this.
      
      Resolves: http://bugzilla.kernel.org/show_bug.cgi?id=12505
      Reported-and-bisected-by: Justin Madru <jdm64@gawab.com>
      Tested-by: Justin Madru <jdm64@gawab.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      552b8aa4
    • x86: use early clobbers in usercopy*.c · e0a96129
      Andi Kleen committed
      Impact: fix rare (but currently harmless) miscompile with certain configs and gcc versions
      
      Hugh Dickins noticed that strncpy_from_user() was miscompiled
      in some circumstances with gcc 4.3.
      
      Thanks to Hugh's excellent analysis it was easy to track down.
      
      Hugh writes:
      
      > Try building an x86_64 defconfig 2.6.29-rc1 kernel tree,
      > except not quite defconfig, switch CONFIG_PREEMPT_NONE=y
      > and CONFIG_PREEMPT_VOLUNTARY off (because it expands a
      > might_fault() there, which hides the issue): using a
      > gcc 4.3.2 (I've checked both openSUSE 11.1 and Fedora 10).
      >
      > It generates the following:
      >
      > 0000000000000000 <__strncpy_from_user>:
      >    0:   48 89 d1                mov    %rdx,%rcx
      >    3:   48 85 c9                test   %rcx,%rcx
      >    6:   74 0e                   je     16 <__strncpy_from_user+0x16>
      >    8:   ac                      lods   %ds:(%rsi),%al
      >    9:   aa                      stos   %al,%es:(%rdi)
      >    a:   84 c0                   test   %al,%al
      >    c:   74 05                   je     13 <__strncpy_from_user+0x13>
      >    e:   48 ff c9                dec    %rcx
      >   11:   75 f5                   jne    8 <__strncpy_from_user+0x8>
      >   13:   48 29 c9                sub    %rcx,%rcx
      >   16:   48 89 c8                mov    %rcx,%rax
      >   19:   c3                      retq
      >
      > Observe that "sub %rcx,%rcx; mov %rcx,%rax", whereas gcc 4.2.1
      > (and many other configs) say "sub %rcx,%rdx; mov %rdx,%rax".
      > Isn't it returning 0 when it ought to be returning strlen?
      
      The asm constraints for the strncpy_from_user() result were missing an
      early clobber, which tells gcc that the last output arguments
      are written before all input arguments are read.
      
      Also add more early clobbers in the rest of the file and fix 32-bit
      usercopy.c in the same way.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      [ since this API is rarely used and no in-kernel user relies on a 'len'
        return value (they only rely on negative return values) this miscompile
        was never noticed in the field. But it's worth fixing it nevertheless. ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      e0a96129
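The early-clobber constraint mentioned in the commit above can be illustrated with a much smaller inline asm statement than the usercopy routines. This is a hedged sketch, not the patched kernel code: `sketch_sub` is a hypothetical function, the asm is only compiled on x86 targets, and other targets fall back to plain C.

```c
#include <assert.h>

/* Returns a - b. */
static long sketch_sub(long a, long b)
{
#if defined(__x86_64__) || defined(__i386__)
    long res;
    /*
     * The first instruction writes res (%0) before the second one reads
     * b (%2). Without the '&' in "=&r" the compiler is free to put res in
     * the same register as b, and the mov would then destroy b before it
     * is used. The early clobber forbids that register sharing.
     */
    __asm__("mov %1, %0\n\t"
            "sub %2, %0"
            : "=&r" (res)
            : "r" (a), "r" (b)
            : "cc");
    return res;
#else
    return a - b;  /* portable fallback for non-x86 builds */
#endif
}
```

For example, `sketch_sub(10, 3)` returns 7. Whether the version without `&` actually miscompiles depends on the register allocator's choices, which is why the problem in the commit only showed up with certain configs and gcc versions.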
  12. 20 Jan, 2009 5 commits
  13. 19 Jan, 2009 2 commits
    • x86: fix section mismatch warnings in kernel/setup_percpu.c · c7f8562a
      Leonardo Potenza committed
      The function setup_cpu_local_masks() has been marked __init, in
      order to remove the following section mismatch messages:
      
      WARNING: vmlinux.o(.text+0x3c2c7): Section mismatch in reference from the function setup_cpu_local_masks() to the function .init.text:alloc_bootmem_cpumask_var()
      The function setup_cpu_local_masks() references
      the function __init alloc_bootmem_cpumask_var().
      This is often because setup_cpu_local_masks lacks a __init
      annotation or the annotation of alloc_bootmem_cpumask_var is wrong.
      
      WARNING: vmlinux.o(.text+0x3c2d3): Section mismatch in reference from the function setup_cpu_local_masks() to the function .init.text:alloc_bootmem_cpumask_var()
      The function setup_cpu_local_masks() references
      the function __init alloc_bootmem_cpumask_var().
      This is often because setup_cpu_local_masks lacks a __init
      annotation or the annotation of alloc_bootmem_cpumask_var is wrong.
      
      WARNING: vmlinux.o(.text+0x3c2df): Section mismatch in reference from the function setup_cpu_local_masks() to the function .init.text:alloc_bootmem_cpumask_var()
      The function setup_cpu_local_masks() references
      the function __init alloc_bootmem_cpumask_var().
      This is often because setup_cpu_local_masks lacks a __init
      annotation or the annotation of alloc_bootmem_cpumask_var is wrong.
      
      WARNING: vmlinux.o(.text+0x3c2eb): Section mismatch in reference from the function setup_cpu_local_masks() to the function .init.text:alloc_bootmem_cpumask_var()
      The function setup_cpu_local_masks() references
      the function __init alloc_bootmem_cpumask_var().
      This is often because setup_cpu_local_masks lacks a __init
      annotation or the annotation of alloc_bootmem_cpumask_var is wrong.
      Signed-off-by: Leonardo Potenza <lpotenza@inwind.it>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      c7f8562a
    • x86: put trigger in to detect mismatched apic versions · b2b815d8
      Mike Travis committed
      Impact: add debug warning
      
      Fire off one message if two APICs are discovered with different
      APIC versions. (This code is only called during CPU init.)
      
      The goal of this is to pave the way for the removal of the apic_version[]
      array. We don't expect any APIC version incompatibilities in the x86
      landscape of systems [if so, we don't handle them very well and probably
      never will handle deep APIC version asymmetries well], but it's prudent
      to have a debug check for one kernel cycle nevertheless.
      Signed-off-by: Mike Travis <travis@sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      b2b815d8