1. 15 7月, 2011 11 次提交
  2. 14 7月, 2011 2 次提交
  3. 13 7月, 2011 2 次提交
    • T
      x86, numa: Implement pfn -> nid mapping granularity check · 1e01979c
      Tejun Heo 提交于
      SPARSEMEM w/o VMEMMAP and DISCONTIGMEM, both used only on 32bit, use
      sections array to map pfn to nid which is limited in granularity.  If
      NUMA nodes are laid out such that the mapping cannot be accurate, boot
      will fail triggering BUG_ON() in mminit_verify_page_links().
      
      On 32bit, it's 512MiB w/ PAE and SPARSEMEM.  This seems to have been
      granular enough until commit 2706a0bf (x86, NUMA: Enable
      CONFIG_AMD_NUMA on 32bit too).  Apparently, there is a machine which
      aligns NUMA nodes to 128MiB and has only AMD NUMA but not SRAT.  This
      led to the following BUG_ON().
      
       On node 0 totalpages: 2096615
         DMA zone: 32 pages used for memmap
         DMA zone: 0 pages reserved
         DMA zone: 3927 pages, LIFO batch:0
         Normal zone: 1740 pages used for memmap
         Normal zone: 220978 pages, LIFO batch:31
         HighMem zone: 16405 pages used for memmap
         HighMem zone: 1853533 pages, LIFO batch:31
       BUG: Int 6: CR2   (null)
            EDI   (null)  ESI 00000002  EBP 00000002  ESP c1543ecc
            EBX f2400000  EDX 00000006  ECX   (null)  EAX 00000001
            err   (null)  EIP c16209aa   CS 00000060  flg 00010002
       Stack: f2400000 00220000 f7200800 c1620613 00220000 01000000 04400000 00238000
                (null) f7200000 00000002 f7200b58 f7200800 c1620929 000375fe   (null)
              f7200b80 c16395f0 00200a02 f7200a80   (null) 000375fe 00000002   (null)
       Pid: 0, comm: swapper Not tainted 2.6.39-rc5-00181-g2706a0bf #17
       Call Trace:
        [<c136b1e5>] ? early_fault+0x2e/0x2e
        [<c16209aa>] ? mminit_verify_page_links+0x12/0x42
        [<c1620613>] ? memmap_init_zone+0xaf/0x10c
        [<c1620929>] ? free_area_init_node+0x2b9/0x2e3
        [<c1607e99>] ? free_area_init_nodes+0x3f2/0x451
        [<c1601d80>] ? paging_init+0x112/0x118
        [<c15f578d>] ? setup_arch+0x791/0x82f
        [<c15f43d9>] ? start_kernel+0x6a/0x257
      
      This patch implements node_map_pfn_alignment() which determines
      maximum internode alignment and update numa_register_memblks() to
      reject NUMA configuration if alignment exceeds the pfn -> nid mapping
      granularity of the memory model as determined by PAGES_PER_SECTION.
      
      This makes the problematic machine boot w/ flatmem by rejecting the
      NUMA config and provides protection against crazy NUMA configurations.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Link: http://lkml.kernel.org/r/20110712074534.GB2872@htj.dyndns.org
      LKML-Reference: <20110628174613.GP478@escobedo.osrc.amd.com>
      Reported-and-Tested-by: NHans Rosenfeld <hans.rosenfeld@amd.com>
      Cc: Conny Seidel <conny.seidel@amd.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      1e01979c
    • T
      x86, mm: s/PAGES_PER_ELEMENT/PAGES_PER_SECTION/ · d0ead157
      Tejun Heo 提交于
      DISCONTIGMEM on x86-32 implements pfn -> nid mapping similarly to
      SPARSEMEM; however, it calls each mapping unit ELEMENT instead of
      SECTION.  This patch renames it to SECTION so that PAGES_PER_SECTION
      is valid for both DISCONTIGMEM and SPARSEMEM.  This will be used by
      the next patch to implement mapping granularity check.
      
      This patch is trivial constant rename.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Link: http://lkml.kernel.org/r/20110712074422.GA2872@htj.dyndns.org
      Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      d0ead157
  4. 08 7月, 2011 1 次提交
  5. 07 7月, 2011 2 次提交
    • M
      x86: Don't use the EFI reboot method by default · f70e957c
      Matthew Garrett 提交于
      Testing suggests that at least some Lenovos and some Intels will
      fail to reboot via EFI, attempting to jump to an unmapped
      physical address. In the long run we could handle this by
      providing a page table with a 1:1 mapping of physical addresses,
      but for now it's probably just easier to assume that ACPI or
      legacy methods will be present and reboot via those.
      Signed-off-by: NMatthew Garrett <mjg@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Alan Cox <alan@linux.intel.com>
      Link: http://lkml.kernel.org/r/1309985557-15350-1-git-send-email-mjg@redhat.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
      f70e957c
    • K
      x86, suspend: Restore MISC_ENABLE MSR in realmode wakeup · 7a313666
      Kees Cook 提交于
      Some BIOSes will reset the Intel MISC_ENABLE MSR (specifically the
      XD_DISABLE bit) when resuming from S3, which can interact poorly with
      ebba638a. In 32bit PAE mode, this can
      lead to a fault when EFER is restored by the kernel wakeup routines,
      due to it setting the NX bit for a CPU that (thanks to the BIOS reset)
      now incorrectly thinks it lacks the NX feature. (64bit is not affected
      because it uses a common CPU bring-up that specifically handles the
      XD_DISABLE bit.)
      
      The need for MISC_ENABLE being restored so early is specific to the S3
      resume path. Normally, MISC_ENABLE is saved in save_processor_state(),
      but this happens after the resume header is created, so just reproduce
      the logic here. (acpi_suspend_lowlevel() creates the header, calls
      do_suspend_lowlevel, which calls save_processor_state(), so the saved
      processor context isn't available during resume header creation.)
      
      [ hpa: Consider for stable if OK in mainline ]
      Signed-off-by: NKees Cook <kees.cook@canonical.com>
      Link: http://lkml.kernel.org/r/20110707011034.GA8523@outflux.netSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: <stable@kernel.org> 2.6.38+
      7a313666
  6. 06 7月, 2011 1 次提交
  7. 01 7月, 2011 1 次提交
    • T
      x86-32, NUMA: Fix boot regression caused by NUMA init unification on highmem machines · a26474e8
      Tejun Heo 提交于
      During 32/64 NUMA init unification, commit 797390d8 ("x86-32,
      NUMA: use sparse_memory_present_with_active_regions()") made
      32bit mm init call memory_present() automatically from
      active_regions instead of leaving it to each NUMA init path.
      
      This commit description is inaccurate - memory_present() calls
      aren't the same for flat and numaq.  After the commit,
      memory_present() is only called for the intersection of e820 and
      NUMA layout.  Before, on flatmem, memory_present() would be
      called from 0 to max_pfn.  After, it would be called only on the
      areas that e820 indicates to be populated.
      
      This is how x86_64 works and should be okay as memmap is allowed
      to contain holes; however, x86_32 DISCONTIGMEM is missing
      early_pfn_valid(), which makes memmap_init_zone() assume that
      memmap doesn't contain any hole.  This leads to the following
      oops if e820 map contains holes as it often does on machine with
      near or more 4GiB of memory by calling pfn_to_page() on a pfn
      which isn't mapped to a NUMA node, a reported by Conny Seidel:
      
        BUG: unable to handle kernel paging request at 000012b0
        IP: [<c1aa13ce>] memmap_init_zone+0x6c/0xf2
        *pdpt =3D 0000000000000000 *pde =3D f000eef3f000ee00
        Oops: 0000 [#1] SMP
        last sysfs file:
        Modules linked in:
      
        Pid: 0, comm: swapper Not tainted 2.6.39-rc5-00164-g797390d8 #1 To Be Filled By O.E.M. To Be Filled By O.E.M./E350M1
        EIP: 0060:[<c1aa13ce>] EFLAGS: 00010012 CPU: 0
        EIP is at memmap_init_zone+0x6c/0xf2
        EAX: 00000000 EBX: 000a8000 ECX: 000a7fff EDX: f2c00b80
        ESI: 000a8000 EDI: f2c00800 EBP: c19ffe54 ESP: c19ffe34
         DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
        Process swapper (pid: 0, ti=3Dc19fe000 task=3Dc1a07f60 task.ti=3Dc19fe000)
        Stack:
         00000002 00000000 0023f000 00000000 10000000 00000a00 f2c00000 f2c00b58
         c19ffeb0 c1a80f24 000375fe 00000000 f2c00800 00000800 00000100 00000030
         c1abb768 0000003c 00000000 00000000 00000004 00207a02 f2c00800 000375fe
        Call Trace:
         [<c1a80f24>] free_area_init_node+0x358/0x385
         [<c1a81384>] free_area_init_nodes+0x420/0x487
         [<c1a79326>] paging_init+0x114/0x11b
         [<c1a6cb13>] setup_arch+0xb37/0xc0a
         [<c1a69554>] start_kernel+0x76/0x316
         [<c1a690a8>] i386_start_kernel+0xa8/0xb0
      
      This patch fixes the bug by defining early_pfn_valid() to be the
      same as pfn_valid() when DISCONTIGMEM.
      Reported-bisected-and-tested-by: NConny Seidel <conny.seidel@amd.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: hans.rosenfeld@amd.com
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Conny Seidel <conny.seidel@amd.com>
      Link: http://lkml.kernel.org/r/20110628094107.GB3386@htj.dyndns.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
      a26474e8
  8. 30 6月, 2011 2 次提交
    • K
      xen/pci: Use the INT_SRC_OVR IRQ (instead of GSI) to preset the ACPI SCI IRQ. · 155a16f2
      Konrad Rzeszutek Wilk 提交于
      In the past we would use the GSI value to preset the ACPI SCI
      IRQ which worked great as GSI == IRQ:
      
      ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
      
      While that is most often seen, there are some oddities:
      
      ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 20 low level)
      
      which means that GSI 20 (or pin 20) is to be overriden for IRQ 9.
      Our code that presets the interrupt for ACPI SCI however would
      use the GSI 20 instead of IRQ 9 ending up with:
      
      xen: sci override: global_irq=20 trigger=0 polarity=1
      xen: registering gsi 20 triggering 0 polarity 1
      xen: --> pirq=20 -> irq=20
      xen: acpi sci 20
      .. snip..
      calling  acpi_init+0x0/0xbc @ 1
      ACPI: SCI (IRQ9) allocation failed
      ACPI Exception: AE_NOT_ACQUIRED, Unable to install System Control Interrupt handler (20110413/evevent-119)
      ACPI: Unable to start the ACPI Interpreter
      
      as the ACPI interpreter made a call to 'acpi_gsi_to_irq' which got nine.
      It used that value to request an IRQ (request_irq) and since that was not
      present it failed.
      
      The fix is to recognize that for interrupts that are overriden (in our
      case we only care about the ACPI SCI) we should use the IRQ number
      to present the IRQ instead of the using GSI. End result is that we get:
      
      xen: sci override: global_irq=20 trigger=0 polarity=1
      xen: registering gsi 20 triggering 0 polarity 1
      xen: --> pirq=20 -> irq=9 (gsi=9)
      xen: acpi sci 9
      
      which fixes the ACPI interpreter failing on startup.
      
      CC: stable@kernel.org
      Reported-by: NLiwei <xieliwei@gmail.com>
      Tested-by: NLiwei <xieliwei@gmail.com>
      [http://lists.xensource.com/archives/html/xen-devel/2011-06/msg01727.html]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      155a16f2
    • K
      xen/mmu: Fix for linker errors when CONFIG_SMP is not defined. · 32dd1194
      Konrad Rzeszutek Wilk 提交于
      Simple enough - we use an extern defined symbol which is not
      defined when CONFIG_SMP is not defined. This fixes the linker
      dying.
      
      CC: stable@kernel.org
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      32dd1194
  9. 29 6月, 2011 1 次提交
  10. 28 6月, 2011 2 次提交
    • J
      watchdog: Intel SCU Watchdog: Fix build and remove duplicate code · e376fd66
      Jesper Juhl 提交于
      Trying to build the Intel SCU Watchdog fails for me with gcc 4.6.0 -
      $ gcc --version | head -n 1
      gcc (GCC) 4.6.0 20110513 (prerelease)
      
      like this :
        CC      drivers/watchdog/intel_scu_watchdog.o
      In file included from drivers/watchdog/intel_scu_watchdog.c:49:0:
      /home/jj/src/linux-2.6/arch/x86/include/asm/apb_timer.h: In function ‘apbt_time_init’:
      /home/jj/src/linux-2.6/arch/x86/include/asm/apb_timer.h:65:42: warning: ‘return’ with a value, in function returning void [enabled by default]
      drivers/watchdog/intel_scu_watchdog.c: In function ‘intel_scu_watchdog_init’:
      drivers/watchdog/intel_scu_watchdog.c:468:2: error: implicit declaration of function ‘sfi_get_mtmr’ [-Werror=implicit-function-declaration]
      drivers/watchdog/intel_scu_watchdog.c:468:32: warning: assignment makes pointer from integer without a cast [enabled by default]
      cc1: some warnings being treated as errors
      
      make[1]: *** [drivers/watchdog/intel_scu_watchdog.o] Error 1
      make: *** [drivers/watchdog/intel_scu_watchdog.o] Error 2
      
      Additionally, linux/types.h is needlessly being included twice in 
      drivers/watchdog/intel_scu_watchdog.c
      Signed-off-by: NJesper Juhl <jj@chaosbits.net>
      Signed-off-by: NWim Van Sebroeck <wim@iguana.be>
      e376fd66
    • K
      Fix node_start/end_pfn() definition for mm/page_cgroup.c · c6830c22
      KAMEZAWA Hiroyuki 提交于
      commit 21a3c964 uses node_start/end_pfn(nid) for detection start/end
      of nodes. But, it's not defined in linux/mmzone.h but defined in
      /arch/???/include/mmzone.h which is included only under
      CONFIG_NEED_MULTIPLE_NODES=y.
      
      Then, we see
        mm/page_cgroup.c: In function 'page_cgroup_init':
        mm/page_cgroup.c:308: error: implicit declaration of function 'node_start_pfn'
        mm/page_cgroup.c:309: error: implicit declaration of function 'node_end_pfn'
      
      So, fixiing page_cgroup.c is an idea...
      
      But node_start_pfn()/node_end_pfn() is a very generic macro and
      should be implemented in the same manner for all archs.
      (m32r has different implementation...)
      
      This patch removes definitions of node_start/end_pfn() in each archs
      and defines a unified one in linux/mmzone.h. It's not under
      CONFIG_NEED_MULTIPLE_NODES, now.
      
      A result of macro expansion is here (mm/page_cgroup.c)
      
      for !NUMA
       start_pfn = ((&contig_page_data)->node_start_pfn);
        end_pfn = ({ pg_data_t *__pgdat = (&contig_page_data); __pgdat->node_start_pfn + __pgdat->node_spanned_pages;});
      
      for NUMA (x86-64)
        start_pfn = ((node_data[nid])->node_start_pfn);
        end_pfn = ({ pg_data_t *__pgdat = (node_data[nid]); __pgdat->node_start_pfn + __pgdat->node_spanned_pages;});
      
      Changelog:
       - fixed to avoid using "nid" twice in node_end_pfn() macro.
      Reported-and-acked-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Reported-and-tested-by: NIngo Molnar <mingo@elte.hu>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c6830c22
  11. 20 6月, 2011 4 次提交
  12. 19 6月, 2011 1 次提交
  13. 17 6月, 2011 1 次提交
    • K
      xen/setup: Fix for incorrect xen_extra_mem_start. · acd049c6
      Konrad Rzeszutek Wilk 提交于
      The earlier attempts (24bdb0b6)
      at fixing this problem caused other problems to surface (PV guests
      with no PCI passthrough would have SWIOTLB turned on - which meant
      64MB of precious contingous DMA32 memory being eaten up per guest).
      The problem was: "on xen we add an extra memory region at the end of
      the e820, and on this particular machine this extra memory region
      would start below 4g and cross over the 4g boundary:
      
      [0xfee01000-0x192655000)
      
      Unfortunately e820_end_of_low_ram_pfn does not expect an
      e820 layout like that so it returns 4g, therefore initial_memory_mapping
      will map [0 - 0x100000000), that is a memory range that includes some
      reserved memory regions."
      
      The memory range was the IOAPIC regions, and with the 1-1 mapping
      turned on, it would map them as RAM, not as MMIO regions. This caused
      the hypervisor to complain. Fortunately this is experienced only under
      the initial domain so we guard for it.
      Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      acd049c6
  14. 16 6月, 2011 2 次提交
  15. 15 6月, 2011 2 次提交
  16. 14 6月, 2011 1 次提交
  17. 10 6月, 2011 1 次提交
    • M
      exec: delay address limit change until point of no return · dac853ae
      Mathias Krause 提交于
      Unconditionally changing the address limit to USER_DS and not restoring
      it to its old value in the error path is wrong because it prevents us
      using kernel memory on repeated calls to this function.  This, in fact,
      breaks the fallback of hard coded paths to the init program from being
      ever successful if the first candidate fails to load.
      
      With this patch applied switching to USER_DS is delayed until the point
      of no return is reached which makes it possible to have a multi-arch
      rootfs with one arch specific init binary for each of the (hard coded)
      probed paths.
      
      Since the address limit is already set to USER_DS when start_thread()
      will be invoked, this redundancy can be safely removed.
      Signed-off-by: NMathias Krause <minipli@googlemail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dac853ae
  18. 09 6月, 2011 2 次提交
  19. 08 6月, 2011 1 次提交
    • T
      x86: cpu-hotplug: Prevent softirq wakeup on wrong CPU · fd8a7de1
      Thomas Gleixner 提交于
      After a newly plugged CPU sets the cpu_online bit it enables
      interrupts and goes idle. The cpu which brought up the new cpu waits
      for the cpu_online bit and when it observes it, it sets the cpu_active
      bit for this cpu. The cpu_active bit is the relevant one for the
      scheduler to consider the cpu as a viable target.
      
      With forced threaded interrupt handlers which imply forced threaded
      softirqs we observed the following race:
      
      cpu 0                         cpu 1
      
      bringup(cpu1);
                                    set_cpu_online(smp_processor_id(), true);
      		              local_irq_enable();
      while (!cpu_online(cpu1));
                                    timer_interrupt()
                                      -> wake_up(softirq_thread_cpu1);
                                           -> enqueue_on(softirq_thread_cpu1, cpu0);
      
                                                                              ^^^^
      
      cpu_notify(CPU_ONLINE, cpu1);
        -> sched_cpu_active(cpu1)
           -> set_cpu_active((cpu1, true);
      
      When an interrupt happens before the cpu_active bit is set by the cpu
      which brought up the newly onlined cpu, then the scheduler refuses to
      enqueue the woken thread which is bound to that newly onlined cpu on
      that newly onlined cpu due to the not yet set cpu_active bit and
      selects a fallback runqueue. Not really an expected and desirable
      behaviour.
      
      So far this has only been observed with forced hard/softirq threading,
      but in theory this could happen without forced threaded hard/softirqs
      as well. It's probably unobservable as it would take a massive
      interrupt storm on the newly onlined cpu which causes the softirq loop
      to wake up the softirq thread and an even longer delay of the cpu
      which waits for the cpu_online bit.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: stable@kernel.org # 2.6.39
      fd8a7de1