1. 24 1月, 2009 1 次提交
    • H
      x86: handle PAT more like other CPU features · 75a04811
      H. Peter Anvin 提交于
      Impact: Cleanup
      
      When PAT was originally introduced, it was handled specially for a few
      reasons:
      
      - PAT bugs are hard to track down, so we wanted to maintain a
        whitelist of CPUs.
      - The i386 and x86-64 CPUID code was not yet unified.
      
      Both of these are now obsolete, so handle PAT like any other features,
      including ordinary feature blacklisting due to known bugs.
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      75a04811
  2. 16 1月, 2009 1 次提交
  3. 14 1月, 2009 3 次提交
    • V
      x86 PAT: remove CPA WARN_ON for zero pte · 58dab916
      venkatesh.pallipadi@intel.com 提交于
      Impact: reduce scope of debug check - avoid warnings
      
      The logic to find whether identity map exists or not using
      high_memory or max_low_pfn_mapped/max_pfn_mapped are not complete
      as the memory withing the range may not be mapped if there is a
      unusable hole in e820.
      
      Specifically, on my test system I started seeing these warnings with
      tools like hwinfo, acpidump trying to map ACPI region.
      
      [   27.400018] ------------[ cut here ]------------
      [   27.400344] WARNING: at /home/venkip/src/linus/linux-2.6/arch/x86/mm/pageattr.c:560 __change_page_attr_set_clr+0xf3/0x8b8()
      [   27.400821] Hardware name: X7DB8
      [   27.401070] CPA: called for zero pte. vaddr = ffff8800cff6a000 cpa->vaddr = ffff8800cff6a000
      [   27.401569] Modules linked in:
      [   27.401882] Pid: 4913, comm: dmidecode Not tainted 2.6.28-05716-gfe0bdec6 #586
      [   27.402141] Call Trace:
      [   27.402488]  [<ffffffff80237c21>] warn_slowpath+0xd3/0x10f
      [   27.402749]  [<ffffffff80274ade>] ? find_get_page+0xb3/0xc9
      [   27.403028]  [<ffffffff80274a2b>] ? find_get_page+0x0/0xc9
      [   27.403333]  [<ffffffff80226425>] __change_page_attr_set_clr+0xf3/0x8b8
      [   27.403628]  [<ffffffff8028ec99>] ? __purge_vmap_area_lazy+0x192/0x1a1
      [   27.403883]  [<ffffffff8028eb52>] ? __purge_vmap_area_lazy+0x4b/0x1a1
      [   27.404172]  [<ffffffff80290268>] ? vm_unmap_aliases+0x1ab/0x1bb
      [   27.404512]  [<ffffffff80290105>] ? vm_unmap_aliases+0x48/0x1bb
      [   27.404766]  [<ffffffff80226d28>] change_page_attr_set_clr+0x13e/0x2e6
      [   27.405026]  [<ffffffff80698fa7>] ? _spin_unlock+0x26/0x2a
      [   27.405292]  [<ffffffff80227e6a>] ? reserve_memtype+0x19b/0x4e3
      [   27.405590]  [<ffffffff80226ffd>] _set_memory_wb+0x22/0x24
      [   27.405844]  [<ffffffff80225d28>] ioremap_change_attr+0x26/0x28
      [   27.406097]  [<ffffffff80228355>] reserve_pfn_range+0x1a3/0x235
      [   27.406427]  [<ffffffff80228430>] track_pfn_vma_new+0x49/0xb3
      [   27.406686]  [<ffffffff80286c46>] remap_pfn_range+0x94/0x32c
      [   27.406940]  [<ffffffff8022878d>] ? phys_mem_access_prot_allowed+0xb5/0x1a8
      [   27.407209]  [<ffffffff803e9bf4>] mmap_mem+0x75/0x9d
      [   27.407523]  [<ffffffff8028b3b4>] mmap_region+0x2cf/0x53e
      [   27.407776]  [<ffffffff8028b8cc>] do_mmap_pgoff+0x2a9/0x30d
      [   27.408034]  [<ffffffff8020f4a4>] sys_mmap+0x92/0xce
      [   27.408339]  [<ffffffff8020b65b>] system_call_fastpath+0x16/0x1b
      [   27.408614] ---[ end trace 4b16ad70c09a602d ]---
      [   27.408871] dmidecode:4913 reserve_pfn_range ioremap_change_attr failed write-back for cff6a000-cff6b000
      
      This is wih track_pfn_vma_new trying to keep identity map in sync.
      The address cff6a000 is the ACPI region according to e820.
      
      [    0.000000] BIOS-provided physical RAM map:
      [    0.000000]  BIOS-e820: 0000000000000000 - 000000000009c000 (usable)
      [    0.000000]  BIOS-e820: 000000000009c000 - 00000000000a0000 (reserved)
      [    0.000000]  BIOS-e820: 00000000000cc000 - 00000000000d0000 (reserved)
      [    0.000000]  BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
      [    0.000000]  BIOS-e820: 0000000000100000 - 00000000cff60000 (usable)
      [    0.000000]  BIOS-e820: 00000000cff60000 - 00000000cff69000 (ACPI data)
      [    0.000000]  BIOS-e820: 00000000cff69000 - 00000000cff80000 (ACPI NVS)
      [    0.000000]  BIOS-e820: 00000000cff80000 - 00000000d0000000 (reserved)
      [    0.000000]  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
      [    0.000000]  BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
      [    0.000000]  BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
      [    0.000000]  BIOS-e820: 00000000ff000000 - 0000000100000000 (reserved)
      [    0.000000]  BIOS-e820: 0000000100000000 - 0000000230000000 (usable)
      
      And is not mapped as per init_memory_mapping.
      
      [    0.000000] init_memory_mapping: 0000000000000000-00000000cff60000
      [    0.000000] init_memory_mapping: 0000000100000000-0000000230000000
      
      We can add logic to check for this. But, there can also be other holes in
      identity map when we have 1GB of aligned reserved space in e820.
      
      This patch handles it by removing the WARN_ON and returning a specific
      error value (EFAULT) to indicate that the address does not have any
      identity mapping.
      
      The code that tries to keep identity map in sync can ignore
      this error, with other callers of cpa still getting error here.
      Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      58dab916
    • V
      x86 PAT: return compatible mapping to remap_pfn_range callers · cdecff68
      venkatesh.pallipadi@intel.com 提交于
      Impact: avoid warning message, potentially solve 3D performance regression
      
      Change x86 PAT code to return compatible memtype if the exact memtype that
      was requested in remap_pfn_rage and friends is not available due to some
      conflict.
      
      This is done by returning the compatible type in pgprot parameter of
      track_pfn_vma_new(), and the caller uses that memtype for page table.
      
      Note that track_pfn_vma_copy() which is basically called during fork gets the
      prot from existing page table and should not have any conflict. Hence we use
      strict memtype check there and do not allow compatible memtypes.
      
      This patch fixes the bug reported here:
      
        http://marc.info/?l=linux-kernel&m=123108883716357&w=2
      
      Specifically the error message:
      
        X:5010 map pfn expected mapping type write-back for d0000000-d0101000,
        got write-combining
      
      Should go away.
      Reported-and-bisected-by: NKevin Winchester <kjwinchester@gmail.com>
      Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cdecff68
    • V
      x86 PAT: change track_pfn_vma_new to take pgprot_t pointer param · e4b866ed
      venkatesh.pallipadi@intel.com 提交于
      Impact: cleanup
      
      Change the protection parameter for track_pfn_vma_new() into a pgprot_t pointer.
      Subsequent patch changes the x86 PAT handling to return a compatible
      memtype in pgprot_t, if what was requested cannot be allowed due to conflicts.
      No fuctionality change in this patch.
      Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e4b866ed
  4. 13 1月, 2009 1 次提交
    • A
      x86: avoid theoretical vmalloc fault loop · f313e123
      Andi Kleen 提交于
      Ajith Kumar noticed:
      
       I was going through the vmalloc fault handling for x86_64 and am unclear
       about the following lines in the vmalloc_fault() function.
      
       pgd = pgd_offset(current->mm ?: &init_mm, address);
       pgd_ref = pgd_offset_k(address);
      
       Here the intention is to get the pgd corresponding to the current process
       and sync it up with the pgd in init_mm(obtained from pgd_offset_k).
       However, for kernel threads current->mm is NULL and hence pgd =
       pgd_offset(init_mm, address) = pgd_ref which means the fault handler
       returns without setting the pgd entry in the MM structure in the context
       of which the kernel thread has faulted.  This could lead to never-ending
       faults and busy looping of kernel threads like pdflush.  So, shouldn't the
       pgd = pgd_offset(current->mm ?: &init_mm, address); be pgd =
       pgd_offset(current->active_mm ?: &init_mm, address);
      
      We can use active_mm unconditionally because it should be always set.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f313e123
  5. 08 1月, 2009 2 次提交
  6. 07 1月, 2009 2 次提交
    • G
      mm: show node to memory section relationship with symlinks in sysfs · c04fc586
      Gary Hade 提交于
      Show node to memory section relationship with symlinks in sysfs
      
      Add /sys/devices/system/node/nodeX/memoryY symlinks for all
      the memory sections located on nodeX.  For example:
      /sys/devices/system/node/node1/memory135 -> ../../memory/memory135
      indicates that memory section 135 resides on node1.
      
      Also revises documentation to cover this change as well as updating
      Documentation/ABI/testing/sysfs-devices-memory to include descriptions
      of memory hotremove files 'phys_device', 'phys_index', and 'state'
      that were previously not described there.
      
      In addition to it always being a good policy to provide users with
      the maximum possible amount of physical location information for
      resources that can be hot-added and/or hot-removed, the following
      are some (but likely not all) of the user benefits provided by
      this change.
      Immediate:
        - Provides information needed to determine the specific node
          on which a defective DIMM is located.  This will reduce system
          downtime when the node or defective DIMM is swapped out.
        - Prevents unintended onlining of a memory section that was
          previously offlined due to a defective DIMM.  This could happen
          during node hot-add when the user or node hot-add assist script
          onlines _all_ offlined sections due to user or script inability
          to identify the specific memory sections located on the hot-added
          node.  The consequences of reintroducing the defective memory
          could be ugly.
        - Provides information needed to vary the amount and distribution
          of memory on specific nodes for testing or debugging purposes.
      Future:
        - Will provide information needed to identify the memory
          sections that need to be offlined prior to physical removal
          of a specific node.
      
      Symlink creation during boot was tested on 2-node x86_64, 2-node
      ppc64, and 2-node ia64 systems.  Symlink creation during physical
      memory hot-add tested on a 2-node x86_64 system.
      Signed-off-by: NGary Hade <garyhade@us.ibm.com>
      Signed-off-by: NBadari Pulavarty <pbadari@us.ibm.com>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c04fc586
    • N
      mm: invoke oom-killer from page fault · 1c0fe6e3
      Nick Piggin 提交于
      Rather than have the pagefault handler kill a process directly if it gets
      a VM_FAULT_OOM, have it call into the OOM killer.
      
      With increasingly sophisticated oom behaviour (cpusets, memory cgroups,
      oom killing throttling, oom priority adjustment or selective disabling,
      panic on oom, etc), it's silly to unconditionally kill the faulting
      process at page fault time.  Create a hook for pagefault oom path to call
      into instead.
      
      Only converted x86 and uml so far.
      
      [akpm@linux-foundation.org: make __out_of_memory() static]
      [akpm@linux-foundation.org: fix comment]
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: Jeff Dike <jdike@addtoit.com>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1c0fe6e3
  7. 06 1月, 2009 1 次提交
  8. 03 1月, 2009 1 次提交
  9. 02 1月, 2009 1 次提交
    • I
      x86: convert permanent_kmaps_init() from macro to inline · a9067d53
      Ingo Brueckl 提交于
      Impact: cleanup
      
      This compiler warning:
      
        arch/x86/mm/init_32.c:515: warning: unused variable 'pgd_base'
      
      triggers because permanent_kmaps_init() is a CPP macro in the
      !CONFIG_HIGHMEM case, that does not tell the compiler that the
      'pgd_base' parameter is used.
      
      Convert permanent_kmaps_init() (and set_highmem_pages_init()) to
      C inline functions - which gives the parameter a proper type and
      which gets rid of the compiler warning as well.
      Signed-off-by: NIngo Brueckl <ib@wupperonline.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a9067d53
  10. 24 12月, 2008 1 次提交
    • H
      x86: PAT: fix address types in track_pfn_vma_new() · c1c15b65
      H. Peter Anvin 提交于
      Impact: cleanup, fix warning
      
      This warning:
      
       arch/x86/mm/pat.c: In function track_pfn_vma_copy:
       arch/x86/mm/pat.c:701: warning: passing argument 5 of follow_phys from incompatible pointer type
      
      Triggers because physical addresses are resource_size_t, not u64.
      
      This really matters when calling an interface like follow_phys() which
      takes a pointer to a physical address -- although on x86, being
      littleendian, it would generally work anyway as long as the memory region
      wasn't completely uninitialized.
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c1c15b65
  11. 20 12月, 2008 1 次提交
  12. 19 12月, 2008 2 次提交
  13. 18 12月, 2008 1 次提交
  14. 17 12月, 2008 3 次提交
  15. 12 12月, 2008 1 次提交
  16. 14 11月, 2008 1 次提交
  17. 13 11月, 2008 1 次提交
    • R
      x86, hibernate: fix breakage on x86_32 with CONFIG_NUMA set · 97a70e54
      Rafael J. Wysocki 提交于
      Impact: fix crash during hibernation on 32-bit NUMA
      
      The NUMA code on x86_32 creates special memory mapping that allows
      each node's pgdat to be located in this node's memory.  For this
      purpose it allocates a memory area at the end of each node's memory
      and maps this area so that it is accessible with virtual addresses
      belonging to low memory.  As a result, if there is high memory,
      these NUMA-allocated areas are physically located in high memory,
      although they are mapped to low memory addresses.
      
      Our hibernation code does not take that into account and for this
      reason hibernation fails on all x86_32 systems with CONFIG_NUMA=y and
      with high memory present.  Fix this by adding a special mapping for
      the NUMA-allocated memory areas to the temporary page tables created
      during the last phase of resume.
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      97a70e54
  18. 06 11月, 2008 1 次提交
  19. 31 10月, 2008 3 次提交
  20. 29 10月, 2008 4 次提交
    • G
      x86: remove debug code from arch_add_memory() · fe8b868e
      Gary Hade 提交于
      Impact: remove incorrect WARN_ON(1)
      
      Gets rid of dmesg spam created during physical memory hot-add which
      will very likely confuse users.  The change removes what appears to
      be debugging code which I assume was unintentionally included in:
      
        x86: arch/x86/mm/init_64.c printk fixes
        commit 10f22ddeSigned-off-by: NGary Hade <garyhade@us.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      fe8b868e
    • H
      x86: start annotating early ioremap pointers with __iomem · 1d6cf1fe
      Harvey Harrison 提交于
      Impact: some new sparse warnings in e820.c etc, but no functional change.
      
      As with regular ioremap, iounmap etc, annotate with __iomem.
      
      Fixes the following sparse warnings, will produce some new ones
      elsewhere in arch/x86 that will get worked out over time.
      
      arch/x86/mm/ioremap.c:402:9: warning: cast removes address space of expression
      arch/x86/mm/ioremap.c:406:10: warning: cast adds address space to expression (<asn:2>)
      arch/x86/mm/ioremap.c:782:19: warning: Using plain integer as NULL pointer
      Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1d6cf1fe
    • H
      x86: two trivial sparse annotations · 9352f569
      Harvey Harrison 提交于
      Impact: fewer sparse warnings, no functional changes
      
      arch/x86/kernel/vsmp_64.c:87:14: warning: incorrect type in argument 1 (different address spaces)
      arch/x86/kernel/vsmp_64.c:87:14:    expected void const volatile [noderef] <asn:2>*addr
      arch/x86/kernel/vsmp_64.c:87:14:    got void *[assigned] address
      arch/x86/kernel/vsmp_64.c:88:22: warning: incorrect type in argument 1 (different address spaces)
      arch/x86/kernel/vsmp_64.c:88:22:    expected void const volatile [noderef] <asn:2>*addr
      arch/x86/kernel/vsmp_64.c:88:22:    got void *
      arch/x86/kernel/vsmp_64.c:100:23: warning: incorrect type in argument 2 (different address spaces)
      arch/x86/kernel/vsmp_64.c:100:23:    expected void volatile [noderef] <asn:2>*addr
      arch/x86/kernel/vsmp_64.c:100:23:    got void *
      arch/x86/kernel/vsmp_64.c:101:23: warning: incorrect type in argument 1 (different address spaces)
      arch/x86/kernel/vsmp_64.c:101:23:    expected void const volatile [noderef] <asn:2>*addr
      arch/x86/kernel/vsmp_64.c:101:23:    got void *
      arch/x86/mm/gup.c:235:6: warning: incorrect type in argument 1 (different base types)
      arch/x86/mm/gup.c:235:6:    expected void const volatile [noderef] <asn:1>*<noident>
      arch/x86/mm/gup.c:235:6:    got unsigned long [unsigned] [assigned] start
      Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9352f569
    • Y
      x86: fix init_memory_mapping for [dc000000 - e0000000) - v2 · f96f57d9
      Yinghai Lu 提交于
      Impact: change over-mapping to precise mapping, fix /proc/meminfo output
      
      v2: fix less than 1G ram system handling
      
      when gart aperture is 0xdc000000 - 0xe0000000
      it return 0xc0000000 - 0xe0000000
      
      that is not right.
      
      this patch fix that will get exact mapping
      
      on 256g sytem with that aperture after patch
      LBSuse:~ # cat /proc/meminfo
      MemTotal:       264742432 kB
      MemFree:        263920628 kB
      Buffers:            1416 kB
      Cached:            24468 kB
      ...
      DirectMap4k:      5760 kB
      DirectMap2M:   3205120 kB
      DirectMap1G:  265289728 kB
      
      it is consistent to
      LBSuse:~ # cat /sys/kernel/debug/kernel_page_tables
      ..
      ---[ Low Kernel Mapping ]---
      0xffff880000000000-0xffff880000200000           2M     RW             GLB x  pte
      0xffff880000200000-0xffff880040000000        1022M     RW         PSE GLB x  pmd
      0xffff880040000000-0xffff8800c0000000           2G     RW         PSE GLB NX pud
      0xffff8800c0000000-0xffff8800d7e00000         382M     RW         PSE GLB NX pmd
      0xffff8800d7e00000-0xffff8800d7fa0000        1664K     RW             GLB NX pte
      0xffff8800d7fa0000-0xffff8800d8000000         384K                           pte
      0xffff8800d8000000-0xffff8800dc000000          64M                           pmd
      0xffff8800dc000000-0xffff8800e0000000          64M     RW         PSE GLB NX pmd
      0xffff8800e0000000-0xffff880100000000         512M                           pmd
      0xffff880100000000-0xffff880800000000          28G     RW         PSE GLB NX pud
      0xffff880800000000-0xffff880824600000         582M     RW         PSE GLB NX pmd
      0xffff880824600000-0xffff8808247f0000        1984K     RW             GLB NX pte
      0xffff8808247f0000-0xffff880824800000          64K     RW     PCD     GLB NX pte
      0xffff880824800000-0xffff880840000000         440M     RW         PSE GLB NX pmd
      0xffff880840000000-0xffff884000000000         223G     RW         PSE GLB NX pud
      0xffff884000000000-0xffff884028000000         640M     RW         PSE GLB NX pmd
      0xffff884028000000-0xffff884040000000         384M                           pmd
      0xffff884040000000-0xffff888000000000         255G                           pud
      0xffff888000000000-0xffffc20000000000       58880G                           pgd
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f96f57d9
  21. 28 10月, 2008 4 次提交
  22. 27 10月, 2008 1 次提交
  23. 23 10月, 2008 1 次提交
  24. 22 10月, 2008 2 次提交