1. 28 May, 2008 1 commit
    • xen: fix early bootup crash on native hardware · b20aeccd
      Ingo Molnar committed
      -tip tree auto-testing found the following early bootup hang:
      
      -------------->
      get_memcfg_from_srat: assigning address to rsdp
      RSD PTR  v0 [Nvidia]
      BUG: Int 14: CR2 ffd00040
           EDI 8092fbfe  ESI ffd00040  EBP 80b0aee8  ESP 80b0aed0
           EBX 000f76f0  EDX 0000000e  ECX 00000003  EAX ffd00040
           err 00000000  EIP 802c055a   CS 00000060  flg 00010006
      Stack: ffd00040 80bc78d0 80b0af6c 80b1dbfe 8093d8ba 00000008 80b42810 80b4ddb4
             80b42842 00000000 80b0af1c 801079c8 808e724e 00000000 80b42871 802c0531
             00000100 00000000 0003fff0 80b0af40 80129999 00040100 00040100 00000000
      Pid: 0, comm: swapper Not tainted 2.6.26-rc4-sched-devel.git #570
       [<802c055a>] ? strncmp+0x11/0x25
       [<80b1dbfe>] ? get_memcfg_from_srat+0xb4/0x568
       [<801079c8>] ? mcount_call+0x5/0x9
       [<802c0531>] ? strcmp+0xa/0x22
       [<80129999>] ? printk+0x38/0x3a
       [<80129999>] ? printk+0x38/0x3a
       [<8011b122>] ? memory_present+0x66/0x6f
       [<80b216b4>] ? setup_memory+0x13/0x40c
       [<80b16b47>] ? propagate_e820_map+0x80/0x97
       [<80b1622a>] ? setup_arch+0x248/0x477
       [<80129999>] ? printk+0x38/0x3a
       [<80b11759>] ? start_kernel+0x6e/0x2eb
       [<80b110fc>] ? i386_start_kernel+0xeb/0xf2
       =======================
      <------
      
      with this config:
      
         http://redhat.com/~mingo/misc/config-Wed_May_28_01_33_33_CEST_2008.bad
      
      The thing is, the crash makes little sense at first sight. We crash on a
      benign-looking printk. The code around it got changed in -tip but
      checking those topic branches individually did not reproduce the bug.
      
      Bisection led to this commit:
      
      |   d5edbc1f is first bad commit
      |   commit d5edbc1f
      |   Author: Jeremy Fitzhardinge <jeremy@goop.org>
      |   Date:   Mon May 26 23:31:22 2008 +0100
      |
      |   xen: add p2m mfn_list_list
      
      Which is somewhat surprising, as on native hardware Xen client side
      should have little to no side-effects.
      
      After some head scratching, it turns out the following happened:
      randconfig enabled the following Xen options:
      
        CONFIG_XEN=y
        CONFIG_XEN_MAX_DOMAIN_MEMORY=8
        # CONFIG_XEN_BLKDEV_FRONTEND is not set
        # CONFIG_XEN_NETDEV_FRONTEND is not set
        CONFIG_HVC_XEN=y
        # CONFIG_XEN_BALLOON is not set
      
      which activated this piece of code in arch/x86/xen/mmu.c:
      
      > @@ -69,6 +69,13 @@
      >  	__attribute__((section(".data.page_aligned"))) =
      >  		{ [ 0 ... TOP_ENTRIES - 1] = &p2m_missing[0] };
      >
      > +/* Arrays of p2m arrays expressed in mfns used for save/restore */
      > +static unsigned long p2m_top_mfn[TOP_ENTRIES]
      > +	__attribute__((section(".bss.page_aligned")));
      > +
      > +static unsigned long p2m_top_mfn_list[TOP_ENTRIES / P2M_ENTRIES_PER_PAGE]
      > +	__attribute__((section(".bss.page_aligned")));
      
      The problem is, you must only put variables into .bss.page_aligned that
      have a _size_ that is _exactly_ page aligned. In this case the size of
      p2m_top_mfn_list is not page aligned:
      
       80b8d000 b p2m_top_mfn
       80b8f000 b p2m_top_mfn_list
       80b8f008 b softirq_stack
       80b97008 b hardirq_stack
       80b9f008 b bm_pte
      
      So all subsequent variables get unaligned which, depending on luck,
      breaks the kernel in various funny ways. In this case what killed the
      kernel first was the misaligned bootmap pte page, resulting in that
      creative crash above.
      
      Anyway, this was a fun bug to track down :-)
      
      I think the moral is that .bss.page_aligned is a dangerous construct in
      its current form, and the symptoms of breakage are very non-trivial, so
      i think we need build-time checks to make sure all symbols in
      .bss.page_aligned are truly page aligned.
      
      The Xen fix below gets the kernel booting again.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 27 May, 2008 16 commits
  3. 23 May, 2008 1 commit
  4. 20 May, 2008 1 commit
  5. 18 May, 2008 6 commits
  6. 17 May, 2008 1 commit
  7. 14 May, 2008 7 commits
    • x86: user_regset_view table fix for ia32 on 64-bit · 1f465f4e
      Roland McGrath committed
      The user_regset_view table for the 32-bit regsets on the 64-bit build had
      the wrong sizes for the FP regsets.  This bug had no user-visible effect
      (just on kernel modules using the user_regset interfaces and the like).
      But the fix is trivial and risk-free.
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: arch/x86/mm/pat.c - fix warning · afc85343
      Pranith Kumar committed
      fix this warning:
      
       arch/x86/mm/pat.c: In function `phys_mem_access_prot_allowed':
       arch/x86/mm/pat.c:558: warning: long long unsigned int format, long
       unsigned int arg (arg 6)
       arch/x86/mm/pat.c: In function `map_devmem':
       arch/x86/mm/pat.c:580: warning: long long unsigned int format, long
       unsigned int arg (arg 6)
      Signed-off-by: D Pranith Kumar <bobby.prani@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: fix csum_partial() export · 89804c02
      Ingo Molnar committed
      Fix this symbol export problem:
      
          Building modules, stage 2.
          MODPOST 193 modules
          ERROR: "csum_partial" [fs/reiserfs/reiserfs.ko] undefined!
          make[1]: *** [__modpost] Error 1
          make: *** [modules] Error 2
      
      This is due to a known weakness of symbol exports: if a symbol's
      only in-core user is an EXPORT_SYMBOL from a lib-y section, the
      symbol is not linked in.
      
      The solution is to move the export to x8664_ksyms_64.c - but the real
      solution would be to fix kbuild.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: early_init_centaur(): use set_cpu_cap() · 8c45a4e4
      Andrew Morton committed
      arch/x86/kernel/setup_64.c:954: warning: passing argument 2 of 'set_bit' from incompatible pointer type
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: fix app crashes after SMP resume · 61165d7a
      Hugh Dickins committed
      After resume on a 2cpu laptop, kernel builds collapse with a sed hang,
      sh or make segfault (often on 20295564), real-time signal to cc1 etc.
      
      Several hurdles to jump, but a manually-assisted bisect led to -rc1's
      d2bcbad5 x86: do not zap_low_mappings
      in __smp_prepare_cpus.  Though the low mappings were removed at bootup,
      they were left behind (with Global flags helping to keep them in TLB)
      after resume or cpu online, causing the crashes seen.
      
      Reinstate zap_low_mappings (with local __flush_tlb_all) for each cpu_up
      on x86_32.  This used to be serialized by smp_commenced_mask: that's now
      gone, but a low_mappings flag will do.  No need for native_smp_cpus_done
      to repeat the zap: let mem_init zap BSP's low mappings just like on UP.
      
      (In passing, fix error code from native_cpu_up: do_boot_cpu returns a
      variety of diagnostic values, Dprintk what it says but convert to -EIO.
      And save_pg_dir separately before zap_low_mappings: doesn't matter now,
      but zapping twice in succession wiped out resume's swsusp_pg_dir.)
      
      That worked well on the duo and one quad, but wouldn't boot 3rd or 4th
      cpu on P4 Xeon, oopsing just after unlock_ipi_call_lock.  The TLB flush
      IPI now being sent reveals a long-standing bug: the booting cpu has its
      APIC readied in smp_callin at the top of start_secondary, but isn't put
      into the cpu_online_map until just before that unlock_ipi_call_lock.
      
      So native_smp_call_function_mask to online cpus would send_IPI_allbutself,
      including the cpu just coming up, though it has been excluded from the
      count to wait for: by the time it handles the IPI, the call data on
      native_smp_call_function_mask's stack may well have been overwritten.
      
      So fall back to send_IPI_mask while cpu_online_map does not match
      cpu_callout_map: perhaps there's a better APICological fix to be
      made at the start_secondary end, but I wouldn't know that.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86/PCI: X86_PAT & mprotect · 77db9885
      Venki Pallipadi committed
      Some versions of X used the mprotect workaround to change caching type from UC
      to WB, so that it can then use mtrr to program WC for that region [1].  Change
      the mmap of pci space through /sys or /proc interfaces from UC to UC_MINUS.
      With this change, X will not need to use mprotect workaround to get WC type
      since the MTRR mapping type will be honored.
      
      The bug in mprotect that clobbers PAT bits is fixed in a follow on patch. So,
      this X workaround will stop working as well.
      Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    • x86/PCI: fix broken ISA DMA · 4a367f3a
      Takashi Iwai committed
      Rene Herman reported:
      
      > commit 8779f2fc
      >
      > "x86: don't try to allocate from DMA zone at first"
      >
      > breaks all of ISA DMA. Or all of ALSA ISA DMA at least. All
      > ISA soundcards are silent following that commit -- no error
      > messages, everything appears fine, just silence.
      
      That patch is buggy. We had an implicit assumption that
      dev = NULL for ISA devices that require 24bit DMA.
      
      The recent work on x86 dma_alloc_coherent() breaks the ISA DMA buffer
      allocation, which is represented by "dev = NULL" and requires 24bit
      DMA implicitly.
      Bisected-by: Rene Herman <rene.herman@keyaccess.nl>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
  8. 13 May, 2008 3 commits
  9. 11 May, 2008 4 commits