1. 12 March 2009 (2 commits)
  2. 09 March 2009 (1 commit)
    • x86 mmiotrace: fix remove_kmmio_fault_pages() · d0fc63f7
      Committed by Stuart Bennett
      Impact: fix race+crash in mmiotrace
      
      The list manipulation in remove_kmmio_fault_pages() was broken. If more
      than one consecutive kmmio_fault_page was re-added during the grace
      period between unregister_kmmio_probe() and remove_kmmio_fault_pages(),
      the list manipulation failed to remove pages from the release list.
      
      After a second grace period the pages reach rcu_free_kmmio_fault_pages()
      and trigger a BUG_ON(), crashing the kernel.
      
      The list manipulation is fixed to properly remove pages from the release
      list.
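
      For context: a removal pass over a singly linked release list must advance
      the previous-link pointer only for nodes that stay on the list. A minimal
      sketch of that pattern, with field names modeled on the kmmio.c of that
      era (illustrative, not a quote of the patch):

      	/* Walk the release list; re-armed pages (count != 0) must be unlinked. */
      	struct kmmio_fault_page *f = dr->release_list;
      	struct kmmio_fault_page **prevp = &dr->release_list;

      	while (f) {
      		if (!f->count) {
      			/* still unused: keep it queued for freeing */
      			prevp = &f->release_next;
      		} else {
      			/* re-added during the grace period: unlink it */
      			*prevp = f->release_next;
      		}
      		f = f->release_next;
      	}

      The pointer-to-pointer form handles several consecutive re-added pages
      correctly, which is exactly the case the broken version missed.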
      
      This bug has been present from the very beginning of mmiotrace in the
      mainline kernel. It was introduced in 0fd0e3da ("x86: mmiotrace full
      patch, preview 1").
      
      An urgent fix for Linus. Tested by Stuart (on 32-bit) and Pekka
      (on AMD and Intel 64-bit systems, with nouveau and the NVIDIA proprietary
      driver).
      Signed-off-by: Stuart Bennett <stuart@freedesktop.org>
      Signed-off-by: Pekka Paalanen <pq@iki.fi>
      LKML-Reference: <20090308202135.34933feb@daedalus.pq.iki.fi>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      d0fc63f7
  3. 03 March 2009 (1 commit)
  4. 02 March 2009 (7 commits)
    • x86 mmiotrace: fix race with release_kmmio_fault_page() · 340430c5
      Committed by Pekka Paalanen
      There was a theoretical possibility of a race between arming a page in
      post_kmmio_handler() and disarming the page in
      release_kmmio_fault_page():
      
      cpu0                             cpu1
      ------------------------------------------------------------------
      mmiotrace shutdown
      enter release_kmmio_fault_page
                                       fault on the page
                                       disarm the page
      disarm the page
                                       handle the MMIO access
                                       re-arm the page
      put the page on release list
      remove_kmmio_fault_pages()
                                       fault on the page
                                       page not known to mmiotrace
                                       fall back to do_page_fault()
                                       *KABOOM*
      
      (This scenario also shows the double disarm case which is allowed.)
      
      Fixed by acquiring kmmio_lock in post_kmmio_handler() and checking
      if the page is being released from mmiotrace.
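
      A minimal sketch of the re-arm path under the new locking (names modeled
      on kmmio.c; the exact release check is hedged):

      	/* post_kmmio_handler(): re-arm only while holding kmmio_lock */
      	spin_lock(&kmmio_lock);
      	if (ctx->fpage->count)	/* not currently being released */
      		arm_kmmio_fault_page(ctx->fpage);
      	spin_unlock(&kmmio_lock);

      Serializing the re-arm against release_kmmio_fault_page() means a page on
      its way out can no longer be re-armed behind the releaser's back.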
      Signed-off-by: Pekka Paalanen <pq@iki.fi>
      Cc: Stuart Bennett <stuart@freedesktop.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      340430c5
    • x86 mmiotrace: improve handling of secondary faults · 3e39aa15
      Committed by Stuart Bennett
      Upgrade some kmmio.c debug messages to warnings.
      Allow secondary faults on probed pages to fall through, and only log
      secondary faults that are not due to non-present pages.
      
      Patch edited by Pekka Paalanen.
      Signed-off-by: Stuart Bennett <stuart@freedesktop.org>
      Signed-off-by: Pekka Paalanen <pq@iki.fi>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      3e39aa15
    • x86 mmiotrace: split set_page_presence() · 0b700a6a
      Committed by Pekka Paalanen
      
      Split set_page_presence() in kmmio.c into two new functions,
      set_pmd_presence() and set_pte_presence(). Purely code reorganization,
      no functional changes.
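
      A sketch of the resulting shape (signatures hedged; the split itself
      changes no behavior):

      	static void set_pmd_presence(pmd_t *pmd, int present)
      	{
      		pmdval_t v = pmd_val(*pmd);

      		if (present)
      			set_pmd(pmd, __pmd(v | _PAGE_PRESENT));
      		else
      			set_pmd(pmd, __pmd(v & ~_PAGE_PRESENT));
      	}

      	static void set_pte_presence(pte_t *pte, int present)
      	{
      		pteval_t v = pte_val(*pte);

      		if (present)
      			set_pte_atomic(pte, __pte(v | _PAGE_PRESENT));
      		else
      			set_pte_atomic(pte, __pte(v & ~_PAGE_PRESENT));
      	}

      set_page_presence() is left to look up the pte with lookup_address() and
      dispatch on the page-table level (PG_LEVEL_2M vs. PG_LEVEL_4K).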
      Signed-off-by: Pekka Paalanen <pq@iki.fi>
      Cc: Stuart Bennett <stuart@freedesktop.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      0b700a6a
    • x86 mmiotrace: fix save/restore page table state · 5359b585
      Committed by Pekka Paalanen
      
      Blindly setting _PAGE_PRESENT in disarm_kmmio_fault_page() overlooks the
      possibility that the page was not present when it was armed.
      
      Make arm_kmmio_fault_page() store the previous page presence in struct
      kmmio_fault_page and use it on disarm.
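
      A sketch of the idea (field and helper names hedged to match the
      surrounding kmmio.c code):

      	struct kmmio_fault_page {
      		struct list_head list;
      		unsigned long page;	/* address of the armed page */
      		int count;
      		bool old_presence;	/* page presence prior to arming */
      		bool armed;
      	};

      	static int arm_kmmio_fault_page(struct kmmio_fault_page *f)
      	{
      		/* record the current presence, then hide the page */
      		return set_page_presence(f->page, false, &f->old_presence);
      	}

      	static void disarm_kmmio_fault_page(struct kmmio_fault_page *f)
      	{
      		bool tmp;

      		/* restore the recorded state, don't force _PAGE_PRESENT */
      		set_page_presence(f->page, f->old_presence, &tmp);
      	}

      Disarming now restores whatever the pte looked like before arming, so a
      page that was legitimately non-present stays non-present.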
      
      This patch was originally written by Stuart Bennett, but Pekka Paalanen
      rewrote it slightly differently.
      Signed-off-by: Pekka Paalanen <pq@iki.fi>
      Cc: Stuart Bennett <stuart@freedesktop.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      5359b585
    • x86 mmiotrace: WARN_ONCE if dis/arming a page fails · e9d54cae
      Committed by Stuart Bennett
      Print a full warning once, if arming or disarming a page fails.
      
      Also, if initial arming fails, do not handle the page further. This
      avoids the possibility of a page failing to arm and then later claiming
      to have handled any fault on that page.
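
      A hedged sketch of the registration-side check (the error path shown here
      is illustrative):

      	int ret = arm_kmmio_fault_page(f);

      	WARN_ONCE(ret < 0, "kmmio arming at 0x%08lx failed\n", f->page);
      	if (ret < 0)
      		goto out_unregister;	/* hypothetical: drop the page again */

      Once a page that failed to arm is dropped, it can no longer claim to have
      handled faults it never actually trapped.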
      
      WARN_ONCE added by Pekka Paalanen.
      Signed-off-by: Stuart Bennett <stuart@freedesktop.org>
      Signed-off-by: Pekka Paalanen <pq@iki.fi>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      e9d54cae
    • x86: add far read test to testmmiotrace · 5ff93697
      Committed by Pekka Paalanen
      Apparently pages far into an ioremapped region might not actually be
      mapped during ioremap(). Add an optional read test to try to trigger a
      multiply faulting MMIO access. Also add more messages to the kernel log
      to help debugging.
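
      A minimal sketch in the spirit of testmmiotrace.ko (the offset and names
      are assumptions, not the module's actual parameters):

      	/* read one dword far past the start of the ioremapped window */
      	static void do_read_far_test(void __iomem *p)
      	{
      		unsigned int v;

      		pr_info("testmmiotrace: read far test.\n");
      		v = ioread32(p + 0x400100);	/* hypothetical far offset */
      		pr_info("testmmiotrace: far read returned 0x%x\n", v);
      	}

      If the far page was never actually mapped by ioremap(), this access faults
      in a way the near-offset tests never exercise.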
      
      This patch is based on a patch suggested by
      Stuart Bennett <stuart@freedesktop.org>
      who discovered bugs in mmiotrace related to normal kernel space faults.
      Signed-off-by: Pekka Paalanen <pq@iki.fi>
      Cc: Stuart Bennett <stuart@freedesktop.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      5ff93697
    • x86: count errors in testmmiotrace.ko · fab852aa
      Committed by Pekka Paalanen
      Check the read values against the written values in the MMIO read/write
      test. This test shows if the given MMIO test area really works as
      memory, which is a prerequisite for a successful mmiotrace test.
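
      A sketch of the read-back check (sizes and names illustrative):

      	static unsigned int errors;

      	static void do_write_test(void __iomem *p)
      	{
      		unsigned int i;

      		for (i = 0; i < 256; i++)
      			iowrite32(i, p + i * 4);
      	}

      	static void do_read_test(void __iomem *p)
      	{
      		unsigned int i;

      		for (i = 0; i < 256; i++)
      			if (ioread32(p + i * 4) != i)
      				errors++;
      	}

      A non-zero error count means the test area does not behave as memory, so
      any mmiotrace result taken over it would be meaningless.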
      Signed-off-by: Pekka Paalanen <pq@iki.fi>
      Cc: Stuart Bennett <stuart@freedesktop.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      fab852aa
  5. 28 February 2009 (1 commit)
  6. 25 February 2009 (1 commit)
  7. 20 February 2009 (1 commit)
    • x86: use the right protections for split-up pagetables · 07a66d7c
      Committed by Ingo Molnar
      Steven Rostedt found a bug where, in his modified kernel, ftrace
      was unable to modify the kernel text, due to the PMD itself having
      been marked read-only as well in split_large_page().
      
      The fix, suggested by Linus, is to not try to 'clone' the
      reference protection of a huge-page, but to use the standard
      (and permissive) page protection bits of KERNPG_TABLE.
      
      The 'cloning' makes sense for the ptes but it's a confused and
      incorrect concept at the page table level, because the page table
      entry covers a whole set of ptes and hence cannot 'clone' any
      single protection attribute: the ptes can be any mixture of
      protections.
      
      With the permissive KERNPG_TABLE, even if the pte protections
      get changed after this point (due to ftrace doing code-patching
      or other similar activities like kprobes), the resulting combined
      protections will still be correct and the pte's restrictive
      (or permissive) protections will control it.
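
      In pseudo-diff form the change amounts to this (hedged; names follow the
      pageattr.c of that era):

      	/* split_large_page(): install the new pte page under the old entry */
      	-	__set_pmd_pte(kpte, address, mk_pte(base, ref_prot));
      	+	__set_pmd_pte(kpte, address, mk_pte(base, __pgprot(_KERNPG_TABLE)));

      The ptes inside the new page still carry ref_prot; only the table-level
      entry becomes permissive.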
      
      Also update the comment.
      
      This bug was there for a long time but has not caused visible
      problems before as it needs a rather large read-only area to
      trigger. Steve possibly hacked his kernel with some really
      large arrays or so. Anyway, the bug is definitely worth fixing.
      
      [ Huang Ying also experienced problems in this area when writing
        the EFI code, but the real bug in split_large_page() was not
        realized back then. ]
      Reported-by: Steven Rostedt <rostedt@goodmis.org>
      Reported-by: Huang Ying <ying.huang@intel.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      07a66d7c
  8. 19 February 2009 (1 commit)
    • mm: clean up for early_pfn_to_nid() · f2dbcfa7
      Committed by KAMEZAWA Hiroyuki
      What's happening is that the assertion in mm/page_alloc.c:move_freepages()
      is triggering:
      
      	BUG_ON(page_zone(start_page) != page_zone(end_page));
      
      Once I knew this was what was happening, I added some annotations:
      
      	if (unlikely(page_zone(start_page) != page_zone(end_page))) {
      		printk(KERN_ERR "move_freepages: Bogus zones: "
      		       "start_page[%p] end_page[%p] zone[%p]\n",
      		       start_page, end_page, zone);
      		printk(KERN_ERR "move_freepages: "
      		       "start_zone[%p] end_zone[%p]\n",
      		       page_zone(start_page), page_zone(end_page));
      		printk(KERN_ERR "move_freepages: "
      		       "start_pfn[0x%lx] end_pfn[0x%lx]\n",
      		       page_to_pfn(start_page), page_to_pfn(end_page));
      		printk(KERN_ERR "move_freepages: "
      		       "start_nid[%d] end_nid[%d]\n",
      		       page_to_nid(start_page), page_to_nid(end_page));
       ...
      
      And here's what I got:
      
      	move_freepages: Bogus zones: start_page[2207d0000] end_page[2207dffc0] zone[fffff8103effcb00]
      	move_freepages: start_zone[fffff8103effcb00] end_zone[fffff8003fffeb00]
      	move_freepages: start_pfn[0x81f600] end_pfn[0x81f7ff]
      	move_freepages: start_nid[1] end_nid[0]
      
      My memory layout on this box is:
      
      [    0.000000] Zone PFN ranges:
      [    0.000000]   Normal   0x00000000 -> 0x0081ff5d
      [    0.000000] Movable zone start PFN for each node
      [    0.000000] early_node_map[8] active PFN ranges
      [    0.000000]     0: 0x00000000 -> 0x00020000
      [    0.000000]     1: 0x00800000 -> 0x0081f7ff
      [    0.000000]     1: 0x0081f800 -> 0x0081fe50
      [    0.000000]     1: 0x0081fed1 -> 0x0081fed8
      [    0.000000]     1: 0x0081feda -> 0x0081fedb
      [    0.000000]     1: 0x0081fedd -> 0x0081fee5
      [    0.000000]     1: 0x0081fee7 -> 0x0081ff51
      [    0.000000]     1: 0x0081ff59 -> 0x0081ff5d
      
      So it's a block move in that 0x81f600-->0x81f7ff region which triggers
      the problem.
      
      This patch:
      
      Declarations of early_pfn_to_nid() are scattered over per-arch include
      files, and it is hard to know which declaration is actually in use; that
      makes the memmap-init fix awkward.
      
      This patch moves all declarations to include/linux/mm.h.
      
      After this:
        if !CONFIG_ARCH_POPULATES_NODE_MAP && !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
           -> use the static definition in include/linux/mm.h
        else if !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
           -> use the generic definition in mm/page_alloc.c
        else
           -> the per-arch back end function will be called.
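
      A sketch of the resulting dispatch in include/linux/mm.h (condensed; the
      config symbols are as above):

      	#if !defined(CONFIG_ARCH_POPULATES_NODE_MAP) && \
      	    !defined(CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID)
      	static inline int early_pfn_to_nid(unsigned long pfn)
      	{
      		return 0;	/* no node map: everything is node 0 */
      	}
      	#else
      	/* generic version in mm/page_alloc.c, or a per-arch back end */
      	extern int early_pfn_to_nid(unsigned long pfn);
      	#endif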
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reported-by: David Miller <davem@davemlloft.net>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f2dbcfa7
  9. 13 February 2009 (1 commit)
  10. 12 February 2009 (2 commits)
  11. 06 February 2009 (1 commit)
  12. 26 January 2009 (1 commit)
  13. 22 January 2009 (1 commit)
  14. 21 January 2009 (1 commit)
    • x86: fix page attribute corruption with cpa() · a1e46212
      Committed by Suresh Siddha
      Impact: fix sporadic slowdowns and warning messages
      
      This patch fixes a performance issue reported by Linus on his
      Nehalem system. While Linus reverted the PAT patch (commit
      58dab916) which exposed the issue, the existing cpa() code can
      potentially still cause wrong behavior (page attribute corruption).
      
      This patch also fixes the "WARNING: at arch/x86/mm/pageattr.c:560" that
      various people reported.
      
      In the 64-bit kernel, the kernel identity mapping might have holes
      depending on the available memory and on how e820 reports the address
      ranges covering RAM, ACPI, and PCI reserved regions. If there is a
      2MB/1GB hole in an address range that is not listed by the e820 entries,
      the kernel identity mapping will have a corresponding hole in its 1-1
      mapping.
      
      If cpa() happens on the kernel identity mapping which falls into these holes,
      existing code fails like this:
      
      	__change_page_attr_set_clr()
      		__change_page_attr()
      			returns 0 because of if (!kpte). But doesn't
      			set cpa->numpages and cpa->pfn.
      		cpa_process_alias()
      			uses uninitialized cpa->pfn (random value)
      			which can potentially lead to changing the page
      			attribute of kernel text/data, kernel identity
      			mapping of RAM pages etc. oops!
      
      This bug was easily exposed by another PAT patch, which was doing
      cpa() more often on kernel identity mapping holes (the physical range
      between max_low_pfn_mapped and 4GB), where it was setting the
      cache disable attribute (PCD) for kernel identity mappings as well.
      
      Fix cpa() to handle the kernel identity mapping holes. Retain
      the WARN() for cpa() calls to other non-present address ranges
      (kernel text/data, ioremap() addresses).
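
      Roughly, the fix routes the !kpte case through a helper instead of
      returning silently (a condensed sketch of the approach, not the verbatim
      patch):

      	static int __cpa_process_fault(struct cpa_data *cpa,
      				       unsigned long vaddr, int primary)
      	{
      		if (!primary)
      			return 0;
      		/*
      		 * Identity-mapping holes (e820 gaps) legitimately have
      		 * no kpte: consume one page and report success.
      		 */
      		if (within(vaddr, PAGE_OFFSET,
      			   PAGE_OFFSET + (max_pfn_mapped << PAGE_SHIFT))) {
      			cpa->numpages = 1;
      			cpa->pfn = __pa(vaddr) >> PAGE_SHIFT;
      			return 0;
      		}
      		WARN(1, KERN_WARNING "CPA: called for zero pte. "
      		     "vaddr = %lx\n", vaddr);
      		return -EFAULT;
      	}

      Crucially, cpa->numpages and cpa->pfn are now always set before
      cpa_process_alias() runs, so the uninitialized-cpa->pfn path can no
      longer corrupt unrelated mappings.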
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      a1e46212
  15. 20 January 2009 (1 commit)
    • x86: remove kernel_physical_mapping_init() from init section · f5495506
      Committed by Gary Hade
      Impact: fix crash with memory hotplug enabled
      
      kernel_physical_mapping_init() is called during memory hotplug
      so it does not belong in the init section.
      
      If the kernel is built with CONFIG_DEBUG_SECTION_MISMATCH=y on
      the make command line, arch/x86/mm/init_64.c is compiled with
      the -fno-inline-functions-called-once gcc option defeating
      inlining of kernel_physical_mapping_init() within init_memory_mapping().
      
      When kernel_physical_mapping_init() is not inlined it is placed
      in the .init.text section according to the __init in its current
      declaration.  A later call to kernel_physical_mapping_init() during
      a memory hotplug operation encounters an int3 trap because the
      .init.text section memory has been freed.
      
      This patch eliminates the crash caused by the int3 trap by moving the
      non-inlined kernel_physical_mapping_init() from .init.text to .meminit.text.
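
      The fix is essentially a one-word annotation change (sketch; the exact
      signature in arch/x86/mm/init_64.c is hedged):

      	-static unsigned long __init
      	+static unsigned long __meminit
      	 kernel_physical_mapping_init(unsigned long start, unsigned long end,
      				      unsigned long page_size_mask)

      __meminit places the function in .meminit.text, which is retained when
      memory hotplug is configured in, instead of .init.text, which is freed
      after boot.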
      Signed-off-by: Gary Hade <garyhade@us.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      f5495506
  16. 16 January 2009 (2 commits)
    • x86: fix assumed to be contiguous leaf page tables for kmap_atomic region (take 2) · a3c6018e
      Committed by Jan Beulich
      Debugging and original patch from Nick Piggin <npiggin@suse.de>
      
      The early fixmap pmd entry inserted at the very top of the KVA is causing the
      subsequent fixmap mapping code to not provide physically linear pte pages over
      the kmap atomic portion of the fixmap (which relies on said property to
      calculate pte addresses).
      
      This has caused weird boot failures in kmap_atomic much later in the boot
      process (initial userspace faults) on a 32-bit PAE system with a larger number
      of CPUs (smaller CPU counts tend not to run over into the next page so don't
      show up the problem).
      
      Solve this by attempting to clear out the page table and copy any of its
      entries to the new one. Also, add a BUG() if a nonlinear condition is
      encountered that can't be resolved, which might save some hours of
      debugging if this fragile scheme ever breaks again...
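
      A sketch of the entry-migration loop (variable names are hypothetical):

      	/* move any ptes already set in the old leaf table to the new one */
      	for (i = 0; i < PTRS_PER_PTE; i++) {
      		if (pte_none(old_pte[i]))
      			continue;
      		/* a conflicting, different entry cannot be resolved */
      		BUG_ON(!pte_none(new_pte[i]) &&
      		       !pte_same(old_pte[i], new_pte[i]));
      		set_pte(&new_pte[i], old_pte[i]);
      	}

      This keeps the kmap_atomic portion of the fixmap backed by physically
      linear pte pages, which the pte-address arithmetic there depends on.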
      
      Once we have such logic, we can also use it to eliminate the early ioremap
      trickery around the page table setup for the fixmap area. This also fixes
      potential issues with FIX_* entries sharing the leaf page table with the early
      ioremap ones getting discarded by early_ioremap_clear() and not restored by
      early_ioremap_reset(). It at once eliminates the temporary (and
      configuration-dependent, namely on NR_CPUS) unavailability of early fixed
      mappings during the time the fixmap area page tables get constructed.
      
      Finally, also replace the hard coded calculation of the initial table space
      needed for the fixmap area with a proper one, allowing kernels configured for
      large CPU counts to actually boot.
      
      Based-on: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      a3c6018e
    • Revert "x86 PAT: remove CPA WARN_ON for zero pte" · b5db0e38
      Committed by Linus Torvalds
      This reverts commit 58dab916, which
      makes my Nehalem come to a nasty crawling almost-halt.  It looks like it
      turns off caching of regular kernel RAM, with the understandable
      slowdown of a few orders of magnitude as a result.
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b5db0e38
  17. 15 January 2009 (1 commit)
  18. 14 January 2009 (3 commits)
    • x86 PAT: remove CPA WARN_ON for zero pte · 58dab916
      Committed by venkatesh.pallipadi@intel.com
      Impact: reduce scope of debug check - avoid warnings
      
      The logic that determines whether an identity map exists, using
      high_memory or max_low_pfn_mapped/max_pfn_mapped, is not complete,
      as the memory within the range may not be mapped if there is an
      unusable hole in e820.
      
      Specifically, on my test system I started seeing these warnings with
      tools like hwinfo and acpidump trying to map the ACPI region.
      
      [   27.400018] ------------[ cut here ]------------
      [   27.400344] WARNING: at /home/venkip/src/linus/linux-2.6/arch/x86/mm/pageattr.c:560 __change_page_attr_set_clr+0xf3/0x8b8()
      [   27.400821] Hardware name: X7DB8
      [   27.401070] CPA: called for zero pte. vaddr = ffff8800cff6a000 cpa->vaddr = ffff8800cff6a000
      [   27.401569] Modules linked in:
      [   27.401882] Pid: 4913, comm: dmidecode Not tainted 2.6.28-05716-gfe0bdec6 #586
      [   27.402141] Call Trace:
      [   27.402488]  [<ffffffff80237c21>] warn_slowpath+0xd3/0x10f
      [   27.402749]  [<ffffffff80274ade>] ? find_get_page+0xb3/0xc9
      [   27.403028]  [<ffffffff80274a2b>] ? find_get_page+0x0/0xc9
      [   27.403333]  [<ffffffff80226425>] __change_page_attr_set_clr+0xf3/0x8b8
      [   27.403628]  [<ffffffff8028ec99>] ? __purge_vmap_area_lazy+0x192/0x1a1
      [   27.403883]  [<ffffffff8028eb52>] ? __purge_vmap_area_lazy+0x4b/0x1a1
      [   27.404172]  [<ffffffff80290268>] ? vm_unmap_aliases+0x1ab/0x1bb
      [   27.404512]  [<ffffffff80290105>] ? vm_unmap_aliases+0x48/0x1bb
      [   27.404766]  [<ffffffff80226d28>] change_page_attr_set_clr+0x13e/0x2e6
      [   27.405026]  [<ffffffff80698fa7>] ? _spin_unlock+0x26/0x2a
      [   27.405292]  [<ffffffff80227e6a>] ? reserve_memtype+0x19b/0x4e3
      [   27.405590]  [<ffffffff80226ffd>] _set_memory_wb+0x22/0x24
      [   27.405844]  [<ffffffff80225d28>] ioremap_change_attr+0x26/0x28
      [   27.406097]  [<ffffffff80228355>] reserve_pfn_range+0x1a3/0x235
      [   27.406427]  [<ffffffff80228430>] track_pfn_vma_new+0x49/0xb3
      [   27.406686]  [<ffffffff80286c46>] remap_pfn_range+0x94/0x32c
      [   27.406940]  [<ffffffff8022878d>] ? phys_mem_access_prot_allowed+0xb5/0x1a8
      [   27.407209]  [<ffffffff803e9bf4>] mmap_mem+0x75/0x9d
      [   27.407523]  [<ffffffff8028b3b4>] mmap_region+0x2cf/0x53e
      [   27.407776]  [<ffffffff8028b8cc>] do_mmap_pgoff+0x2a9/0x30d
      [   27.408034]  [<ffffffff8020f4a4>] sys_mmap+0x92/0xce
      [   27.408339]  [<ffffffff8020b65b>] system_call_fastpath+0x16/0x1b
      [   27.408614] ---[ end trace 4b16ad70c09a602d ]---
      [   27.408871] dmidecode:4913 reserve_pfn_range ioremap_change_attr failed write-back for cff6a000-cff6b000
      
      This is with track_pfn_vma_new trying to keep the identity map in sync.
      The address cff6a000 is the ACPI region according to e820.
      
      [    0.000000] BIOS-provided physical RAM map:
      [    0.000000]  BIOS-e820: 0000000000000000 - 000000000009c000 (usable)
      [    0.000000]  BIOS-e820: 000000000009c000 - 00000000000a0000 (reserved)
      [    0.000000]  BIOS-e820: 00000000000cc000 - 00000000000d0000 (reserved)
      [    0.000000]  BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
      [    0.000000]  BIOS-e820: 0000000000100000 - 00000000cff60000 (usable)
      [    0.000000]  BIOS-e820: 00000000cff60000 - 00000000cff69000 (ACPI data)
      [    0.000000]  BIOS-e820: 00000000cff69000 - 00000000cff80000 (ACPI NVS)
      [    0.000000]  BIOS-e820: 00000000cff80000 - 00000000d0000000 (reserved)
      [    0.000000]  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
      [    0.000000]  BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
      [    0.000000]  BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
      [    0.000000]  BIOS-e820: 00000000ff000000 - 0000000100000000 (reserved)
      [    0.000000]  BIOS-e820: 0000000100000000 - 0000000230000000 (usable)
      
      And it is not mapped, per init_memory_mapping:
      
      [    0.000000] init_memory_mapping: 0000000000000000-00000000cff60000
      [    0.000000] init_memory_mapping: 0000000100000000-0000000230000000
      
      We could add logic to check for this, but there can also be other holes
      in the identity map when we have 1GB of aligned reserved space in e820.
      
      This patch handles it by removing the WARN_ON and returning a specific
      error value (-EFAULT) to indicate that the address does not have any
      identity mapping.
      
      The code that tries to keep the identity map in sync can ignore
      this error, while other callers of cpa() still see the error.
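
      In sketch form, the two halves of the change (names hedged, modeled on
      the pageattr.c/pat.c of that era):

      	/* __change_page_attr(): no pte means no identity mapping here */
      	kpte = lookup_address(address, &level);
      	if (!kpte)
      		return -EFAULT;

      	/* identity-map sync (sketch): treat -EFAULT as 'nothing to do' */
      	ret = _set_memory_wb(vaddr, numpages);
      	if (ret == -EFAULT)
      		ret = 0;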
      Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      58dab916
    • x86 PAT: return compatible mapping to remap_pfn_range callers · cdecff68
      Committed by venkatesh.pallipadi@intel.com
      Impact: avoid warning message, potentially solve 3D performance regression
      
      Change the x86 PAT code to return a compatible memtype if the exact
      memtype that was requested in remap_pfn_range and friends is not
      available due to some conflict.
      
      This is done by returning the compatible type in the pgprot parameter of
      track_pfn_vma_new(), and the caller uses that memtype for the page table.
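
      A sketch of the caller side after this change (signature per the
      companion patch e4b866ed; surrounding details hedged):

      	pgprot_t prot = vma->vm_page_prot;

      	/* may rewrite prot to a compatible memtype on a soft conflict */
      	if (track_pfn_vma_new(vma, &prot, paddr, size))
      		return -EINVAL;		/* hard conflict: no compatible type */

      	vma->vm_page_prot = prot;	/* use the possibly-adjusted memtype */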
      
      Note that track_pfn_vma_copy(), which is basically called during fork,
      gets the prot from the existing page table and should not have any
      conflict. Hence we use a strict memtype check there and do not allow
      compatible memtypes.
      
      This patch fixes the bug reported here:
      
        http://marc.info/?l=linux-kernel&m=123108883716357&w=2
      
      Specifically the error message:
      
        X:5010 map pfn expected mapping type write-back for d0000000-d0101000,
        got write-combining
      
      Should go away.
      Reported-and-bisected-by: Kevin Winchester <kjwinchester@gmail.com>
      Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      cdecff68
    • x86 PAT: change track_pfn_vma_new to take pgprot_t pointer param · e4b866ed
      Committed by venkatesh.pallipadi@intel.com
      Impact: cleanup
      
      Change the protection parameter of track_pfn_vma_new() into a pgprot_t
      pointer. A subsequent patch changes the x86 PAT handling to return a
      compatible memtype in the pgprot_t, if what was requested cannot be
      allowed due to conflicts. No functionality change in this patch.
      Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      e4b866ed
  19. 13 January 2009 (1 commit)
    • x86: avoid theoretical vmalloc fault loop · f313e123
      Committed by Andi Kleen
      Ajith Kumar noticed:
      
       I was going through the vmalloc fault handling for x86_64 and am unclear
       about the following lines in the vmalloc_fault() function.
      
       pgd = pgd_offset(current->mm ?: &init_mm, address);
       pgd_ref = pgd_offset_k(address);
      
       Here the intention is to get the pgd corresponding to the current process
       and sync it up with the pgd in init_mm (obtained from pgd_offset_k).
       However, for kernel threads current->mm is NULL and hence pgd =
       pgd_offset(init_mm, address) = pgd_ref which means the fault handler
       returns without setting the pgd entry in the MM structure in the context
       of which the kernel thread has faulted.  This could lead to never-ending
       faults and busy looping of kernel threads like pdflush.  So, shouldn't the
       pgd = pgd_offset(current->mm ?: &init_mm, address); be pgd =
       pgd_offset(current->active_mm ?: &init_mm, address);
      
      We can use active_mm unconditionally because it is always set.
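
      The fix, in sketch form:

      	/* vmalloc_fault(): active_mm is valid even for kernel threads */
      	pgd_t *pgd = pgd_offset(current->active_mm, address);
      	pgd_t *pgd_ref = pgd_offset_k(address);

      Kernel threads borrow the previous task's mm as active_mm, so the
      faulting context's real page tables get the missing pgd entry synced
      into them.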
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      f313e123
  20. 08 January 2009 (2 commits)
  21. 07 January 2009 (2 commits)
    • mm: show node to memory section relationship with symlinks in sysfs · c04fc586
      Committed by Gary Hade
      Show node to memory section relationship with symlinks in sysfs
      
      Add /sys/devices/system/node/nodeX/memoryY symlinks for all
      the memory sections located on nodeX.  For example:
      /sys/devices/system/node/node1/memory135 -> ../../memory/memory135
      indicates that memory section 135 resides on node1.
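
      A sketch of how such a link is created at section-registration time (the
      function name matches drivers/base/node.c of that era; details hedged):

      	/* create nodeX/memoryY -> ../../memory/memoryY */
      	static int register_mem_sect_under_node(struct memory_block *mem_blk,
      						int nid)
      	{
      		if (!node_online(nid))
      			return 0;
      		return sysfs_create_link(&node_devices[nid].sysdev.kobj,
      					 &mem_blk->sysdev.kobj,
      					 kobject_name(&mem_blk->sysdev.kobj));
      	}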
      
      Also revises documentation to cover this change as well as updating
      Documentation/ABI/testing/sysfs-devices-memory to include descriptions
      of memory hotremove files 'phys_device', 'phys_index', and 'state'
      that were previously not described there.
      
      In addition to it always being a good policy to provide users with
      the maximum possible amount of physical location information for
      resources that can be hot-added and/or hot-removed, the following
      are some (but likely not all) of the user benefits provided by
      this change.
      Immediate:
        - Provides information needed to determine the specific node
          on which a defective DIMM is located.  This will reduce system
          downtime when the node or defective DIMM is swapped out.
        - Prevents unintended onlining of a memory section that was
          previously offlined due to a defective DIMM.  This could happen
          during node hot-add when the user or node hot-add assist script
          onlines _all_ offlined sections due to user or script inability
          to identify the specific memory sections located on the hot-added
          node.  The consequences of reintroducing the defective memory
          could be ugly.
        - Provides information needed to vary the amount and distribution
          of memory on specific nodes for testing or debugging purposes.
      Future:
        - Will provide information needed to identify the memory
          sections that need to be offlined prior to physical removal
          of a specific node.
      
      Symlink creation during boot was tested on 2-node x86_64, 2-node
      ppc64, and 2-node ia64 systems.  Symlink creation during physical
      memory hot-add tested on a 2-node x86_64 system.
      Signed-off-by: Gary Hade <garyhade@us.ibm.com>
      Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c04fc586
    • mm: invoke oom-killer from page fault · 1c0fe6e3
      Committed by Nick Piggin
      Rather than have the pagefault handler kill a process directly if it gets
      a VM_FAULT_OOM, have it call into the OOM killer.
      
      With increasingly sophisticated oom behaviour (cpusets, memory cgroups,
      oom killing throttling, oom priority adjustment or selective disabling,
      panic on oom, etc), it's silly to unconditionally kill the faulting
      process at page fault time.  Create a hook for pagefault oom path to call
      into instead.
      
      Only converted x86 and uml so far.
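
      On the arch side the conversion is small; a sketch for x86's
      do_page_fault() (surrounding code hedged):

      	if (fault & VM_FAULT_OOM) {
      		up_read(&mm->mmap_sem);
      		/* defer the kill decision to the core VM's oom killer */
      		pagefault_out_of_memory();
      		return;
      	}

      pagefault_out_of_memory() is the new hook, living next to the rest of
      the oom-killer policy (cpusets, memory cgroups, panic_on_oom) in the
      core VM.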
      
      [akpm@linux-foundation.org: make __out_of_memory() static]
      [akpm@linux-foundation.org: fix comment]
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Jeff Dike <jdike@addtoit.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1c0fe6e3
  22. 06 January 2009 (1 commit)
  23. 03 January 2009 (1 commit)
  24. 02 January 2009 (1 commit)
    • x86: convert permanent_kmaps_init() from macro to inline · a9067d53
      Committed by Ingo Brueckl
      Impact: cleanup
      
      This compiler warning:
      
        arch/x86/mm/init_32.c:515: warning: unused variable 'pgd_base'
      
      triggers because permanent_kmaps_init() is a CPP macro in the
      !CONFIG_HIGHMEM case, that does not tell the compiler that the
      'pgd_base' parameter is used.
      
      Convert permanent_kmaps_init() (and set_highmem_pages_init()) to
      C inline functions - which gives the parameter a proper type and
      which gets rid of the compiler warning as well.
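
      A sketch of the converted stubs for the !CONFIG_HIGHMEM case:

      	static inline void permanent_kmaps_init(pgd_t *pgd_base)
      	{
      		/* nothing to do without HIGHMEM; pgd_base is typed and 'used' */
      	}

      	static inline void set_highmem_pages_init(void)
      	{
      	}

      Unlike an empty macro, the inline function gives pgd_base a real type and
      consumes it, so the compiler no longer flags the variable as unused.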
      Signed-off-by: Ingo Brueckl <ib@wupperonline.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      a9067d53
  25. 24 December 2008 (1 commit)
    • x86: PAT: fix address types in track_pfn_vma_new() · c1c15b65
      Committed by H. Peter Anvin
      Impact: cleanup, fix warning
      
      This warning:
      
       arch/x86/mm/pat.c: In function track_pfn_vma_copy:
       arch/x86/mm/pat.c:701: warning: passing argument 5 of follow_phys from incompatible pointer type
      
      Triggers because physical addresses are resource_size_t, not u64.
      
      This really matters when calling an interface like follow_phys(), which
      takes a pointer to a physical address; although on x86, being
      little-endian, it would generally work anyway as long as the memory
      region wasn't completely uninitialized.
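
      The fix, in sketch form (the call site is in arch/x86/mm/pat.c; locals
      hedged):

      	resource_size_t paddr;	/* was: u64 paddr */
      	unsigned long prot;

      	if (follow_phys(vma, vma->vm_start, 0, &prot, &paddr))
      		return -EINVAL;

      With paddr declared as resource_size_t, the pointer now matches the type
      follow_phys() expects for its physical-address output parameter on both
      32-bit and 64-bit kernels.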
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      c1c15b65
  26. 20 December 2008 (1 commit)
  27. 19 December 2008 (1 commit)