1. 07 Jan 2009, 1 commit
    • mm: show node to memory section relationship with symlinks in sysfs · c04fc586
      Authored by Gary Hade
      Show node to memory section relationship with symlinks in sysfs
      
      Add /sys/devices/system/node/nodeX/memoryY symlinks for all
      the memory sections located on nodeX.  For example:
      /sys/devices/system/node/node1/memory135 -> ../../memory/memory135
      indicates that memory section 135 resides on node1.
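
      A user-space tool can follow these links directly; the snippet below is a
      purely illustrative sketch, assuming the node1/memory135 example above
      exists on the running system.

          #include <limits.h>
          #include <stdio.h>
          #include <unistd.h>

          int main(void)
          {
                  /* Hypothetical example path from the text; adjust it to a node
                   * and section that exist on the local machine. */
                  const char *link = "/sys/devices/system/node/node1/memory135";
                  char target[PATH_MAX];
                  ssize_t len = readlink(link, target, sizeof(target) - 1);

                  if (len < 0) {
                          perror("readlink");
                          return 1;
                  }
                  target[len] = '\0';                 /* readlink() does not terminate */
                  printf("%s -> %s\n", link, target); /* e.g. ../../memory/memory135 */
                  return 0;
          }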
      
      Also revises the documentation to cover this change and updates
      Documentation/ABI/testing/sysfs-devices-memory to include descriptions
      of the memory hotremove files 'phys_device', 'phys_index', and 'state',
      which were not previously described there.
      
      Beyond it always being good policy to give users as much physical
      location information as possible for resources that can be hot-added
      and/or hot-removed, this change provides at least the following user
      benefits.
      Immediate:
        - Provides information needed to determine the specific node
          on which a defective DIMM is located.  This will reduce system
          downtime when the node or defective DIMM is swapped out.
        - Prevents unintended onlining of a memory section that was
          previously offlined due to a defective DIMM.  This could happen
          during node hot-add when the user or node hot-add assist script
          onlines _all_ offlined sections due to user or script inability
          to identify the specific memory sections located on the hot-added
          node.  The consequences of reintroducing the defective memory
          could be ugly.
        - Provides information needed to vary the amount and distribution
          of memory on specific nodes for testing or debugging purposes.
      Future:
        - Will provide information needed to identify the memory
          sections that need to be offlined prior to physical removal
          of a specific node.
      
      Symlink creation during boot was tested on 2-node x86_64, 2-node
      ppc64, and 2-node ia64 systems.  Symlink creation during physical
      memory hot-add tested on a 2-node x86_64 system.
      Signed-off-by: Gary Hade <garyhade@us.ibm.com>
      Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c04fc586
  2. 20 Oct 2008, 1 commit
    • mm: cleanup to make remove_memory() arch-neutral · 71088785
      Authored by Badari Pulavarty
      There is nothing architecture-specific about remove_memory(); it is
      common to all architectures that support memory hot-remove.  Instead of
      duplicating it in every architecture, collapse the copies into a single
      arch-neutral function.
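
      As a rough sketch of what the arch-neutral version could look like,
      assuming it is just a thin wrapper around the generic offline_pages()
      interface (signatures here are from memory and may not match the tree
      exactly):

          /* Sketch only: offline the whole range, with a timeout, and let the
           * generic hotplug code do the real work; no arch-specific logic left. */
          int remove_memory(u64 start, u64 size)
          {
                  unsigned long start_pfn = start >> PAGE_SHIFT;
                  unsigned long end_pfn   = start_pfn + (size >> PAGE_SHIFT);

                  return offline_pages(start_pfn, end_pfn, 120 * HZ);
          }
          EXPORT_SYMBOL_GPL(remove_memory);   /* the export referred to below */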
      
      [akpm@linux-foundation.org: fix the export]
      Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Gary Hade <garyhade@us.ibm.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      71088785
  3. 07 Sep 2008, 1 commit
  4. 16 May 2008, 1 commit
    • [IA64] fix personality(PER_LINUX32) performance issue · 839052d2
      Authored by Huang, Xiaolan
      The patch aims to fix a performance issue for the syscall
      personality(PER_LINUX32).
      
      On an IA-64 box, the personality(PER_LINUX32) syscall performs poorly
      because it fails to find the Linux/x86 execution domain.  It then tries
      to load a kernel module, which always fails, so the default execution
      domain PER_LINUX is used instead.  Requesting kernel modules is very
      expensive, and that is what causes the performance problem (see
      lookup_exec_domain() in kernel/exec_domain.c).
      
      To resolve the issue, the Linux/x86 execution domain is now registered
      at initialization time on the IA-64 architecture, as sketched below.
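
      A sketch of that registration, using the generic exec-domain API; the
      field assignments and the initcall name are illustrative rather than the
      exact patch:

          static struct exec_domain ia32_exec_domain;     /* hypothetical name */

          static int __init ia32_exec_domain_init(void)
          {
                  /* Register "Linux/x86" at boot so personality(PER_LINUX32)
                   * finds it immediately instead of trying, and failing, to
                   * load a kernel module. */
                  ia32_exec_domain.name          = "Linux/x86";
                  ia32_exec_domain.handler       = NULL;
                  ia32_exec_domain.pers_low      = PER_LINUX32;
                  ia32_exec_domain.pers_high     = PER_LINUX32;
                  ia32_exec_domain.signal_map    = default_exec_domain.signal_map;
                  ia32_exec_domain.signal_invmap = default_exec_domain.signal_invmap;
                  return register_exec_domain(&ia32_exec_domain);
          }
          __initcall(ia32_exec_domain_init);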
      Signed-off-by: Xiaolan Huang <xiaolan.huang@intel.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      839052d2
  5. 28 Apr 2008, 1 commit
  6. 12 Apr 2008, 1 commit
    • [IA64] Fix NUMA configuration issue · 98075d24
      Authored by Zoltan Menyhart
      There is a NUMA memory configuration issue in 2.6.24:
      
      A 2-node machine of ours has got the following memory layout:
      
      Node 0:	0 - 2 Gbytes
      Node 0:	4 - 8 Gbytes
      Node 1:	8 - 16 Gbytes
      Node 0:	16 - 18 Gbytes
      
      "efi_memmap_init()" merges the three last ranges into one.
      
      "register_active_ranges()" is called as follows:
      
      efi_memmap_walk(register_active_ranges, NULL);
      
      i.e. once for the whole 4 - 18 Gbyte range.  It picks up the node
      number from the start address and registers all of that memory for
      node #0.
      
      "register_active_ranges()" should be called as follows to
      make sure there is no merged address range at its entry:
      
      efi_memmap_walk(filter_memory, register_active_ranges);
      
      "filter_memory()" is similar to "filter_rsvd_memory()",
      but the reserved memory ranges are not filtered out.
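
      In terms of the two calls quoted above, the effect of the change is,
      schematically:

          /* Before: the walker hands the merged 4 - 18 Gbyte range to the
           * callback, which reads the node from the start address and credits
           * all 14 Gbytes to node #0. */
          efi_memmap_walk(register_active_ranges, NULL);

          /* After: filter_memory() sits in front of the callback, so
           * register_active_ranges() never sees a merged range spanning
           * more than one node. */
          efi_memmap_walk(filter_memory, register_active_ranges);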
      Signed-off-by: Zoltan Menyhart <Zoltan.Menyhart@bull.net>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      98075d24
  7. 10 Apr 2008, 1 commit
  8. 07 Mar 2008, 1 commit
  9. 30 Oct 2007, 1 commit
    • [IA64] ia64/mm/init.c: fix section mismatches · 18b8befd
      Authored by Adrian Bunk
      This patch fixes the following section mismatches:
      
      <--  snip  -->
      
      ...
      WARNING: vmlinux.o(.text+0x5b5c2): Section mismatch: reference to .init.text:memmap_init_zone (between 'memmap_init' and 'virtual_memmap_init')
      WARNING: vmlinux.o(.text+0x5b842): Section mismatch: reference to .init.text:memmap_init_zone (between 'virtual_memmap_init' and 'ia64_mmu_init')
      ...
      
      <--  snip  -->
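
      The usual cure for this class of warning is to give the referencing
      functions a section annotation that is compatible with the .init.text
      callee; a hypothetical sketch (the annotations actually chosen by the
      patch may differ):

          /* Marking the callers __meminit makes their references to the
           * init-time memmap_init_zone() legitimate; plain .text code must
           * not call into sections that are discarded after boot. */
          static int __meminit
          virtual_memmap_init(u64 start, u64 end, void *arg)
          {
                  /* ... eventually calls memmap_init_zone() ... */
                  return 0;
          }

          void __meminit
          memmap_init(unsigned long size, int nid, unsigned long zone,
                      unsigned long start_pfn)
          {
                  /* ... calls memmap_init_zone() directly or via
                   * virtual_memmap_init() ... */
          }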
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      18b8befd
  10. 20 Oct 2007, 1 commit
  11. 17 Oct 2007, 3 commits
  12. 12 May 2007, 1 commit
  13. 08 May 2007, 1 commit
    • Make page->private usable in compound pages · d85f3385
      Authored by Christoph Lameter
      If we add a new flag so that we can distinguish between the first page
      and the tail pages, then we can avoid using page->private in the first
      page.  page->private == page for the first page, so there is no real
      information in there.
      
      Freeing up page->private makes the use of compound pages more
      transparent: they behave more like ordinary pages.  Right now we have to
      be careful, for example, when going beyond PAGE_SIZE allocations in the
      slab on i386, because we can then no longer use the private field.  This
      is one of the issues that prevents us from supporting debugging for
      page-size slabs in SLAB.
      
      Having page->private available for SLUB would allow more meta information in
      the page struct.  I can probably avoid the 16 bit ints that I have in there
      right now.
      
      Also, if page->private is available, then a compound page may be
      equipped with buffer heads.  This may open the way for filesystems to
      support blocks larger than the page size.
      
      We add PageTail as an alias of PageReclaim.  Compound pages cannot currently
      be reclaimed.  Because of the alias one needs to check PageCompound first.
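
      A minimal illustration of the required ordering (the helper name is made
      up; the page-flag tests are the ones named above):

          /* PageTail aliases PageReclaim, so its result is only meaningful
           * once the page is known to be part of a compound page. */
          static inline int page_is_compound_tail(struct page *page)
          {
                  return PageCompound(page) && PageTail(page);
          }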
      
      The RFC for this approach was discussed at
      http://marc.info/?t=117574302800001&r=1&w=2
      
      [nacc@us.ibm.com: fix hugetlbfs]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d85f3385
  14. 30 Mar 2007, 1 commit
    • [IA64] bugfix stack layout upside-down · 83d2cd3d
      Authored by KAMEZAWA Hiroyuki
      ia64 expects the following vm layout:
      
      == low memory
      [register-stack grows up]
      [memory-stack grows down]
      == high memory
      
      But the code assigns the base of the register stack at the
      maximum stack size offset from the fixed address where the
      stack *might* start.  Stack randomization will result in the
      memory stack starting at a lower address than this, and if the
      user has set a low stack limit with "ulimit -s", then you can
      end up with the register stack above the memory stack (or if
      you were very unlucky right on top of it!).
      
      Fix: Calculate the base address for the register stack starting
      from the actual address of the memory stack.
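
      Schematically, with hypothetical names rather than the exact code:

          /* Before (buggy): anchor the register backing store at a fixed guess
           * of where the memory stack might start, minus the maximum stack
           * size.  Randomization plus a small "ulimit -s" can leave this above
           * the actual memory stack. */
          rbs_base = POSSIBLE_STACK_TOP - MAX_USER_STACK_SIZE;

          /* After: anchor it at the address where the memory stack really
           * starts. */
          rbs_base = PAGE_ALIGN(actual_stack_start - stack_size);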
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      83d2cd3d
  15. 21 Mar 2007, 1 commit
    • [IA64] min_low_pfn and max_low_pfn calculation fix · a3f5c338
      Authored by Zou Nan hai
      We have seen bad_pte_print when testing crashdump on an SN machine with
      a recent 2.6.20 kernel.  There are tons of bad pte (pfn < max_low_pfn)
      reports when the crash kernel boots up, and all of the reported bad
      pages are inside the initmem range.  That happens when the crash kernel
      code and data sit at the beginning of the first node: build_node_maps()
      in discontig.c bypasses reserved regions via filter_rsvd_memory(), and
      since min_low_pfn is calculated in build_node_maps(), min_low_pfn ends
      up greater than the pfns of the kernel code and data.
      
      Because pages inside initmem are freed and reused later, we saw
      pfn_valid check fail on those pages.
      
      I think this can theoretically happen on a normal kernel as well.  When
      I checked the min_low_pfn and max_low_pfn calculations in contig.c and
      discontig.c, I found more issues than this one.
      
      1. The min_low_pfn and max_low_pfn calculations are inconsistent between
      contig.c and discontig.c.  In contig.c, min_low_pfn is calculated as the
      first page number of the boot memmap (why? this may work most of the
      time, but I don't think it is the right logic), while in discontig.c it
      is the lowest physical page number with reserved regions bypassed.
      max_low_pfn is calculated including reserved regions in contig.c but
      excluding them in discontig.c.
      
      2. If the kernel code and data region happens to be at the beginning or
      the end of physical memory, and the min_low_pfn and max_low_pfn
      calculation bypasses it, pages in initmem will be reported as bad.
      
      3. The initrd is also in a reserved region; if it is at the beginning or
      the end of physical memory, the kernel will refuse to reuse that memory
      because of the virt_addr_valid() check in free_initrd_mem().
      
      So it is better to fix and clean up those issues.
      Calculate min_low_pfn and max_low_pfn in a consistent way.
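
      One consistent way to compute both values is to walk every known memory
      range once and keep the extremes; below is a sketch of such a walker
      callback, assumed to be driven by efi_memmap_walk() (the helper in the
      actual patch may differ):

          /* Sketch: track the lowest and highest page frame over all walked
           * ranges, with no special-casing of kernel, initrd or other
           * reserved regions. */
          static int __init
          find_max_min_low_pfn(unsigned long start, unsigned long end, void *arg)
          {
                  unsigned long pfn_start = __pa(start) >> PAGE_SHIFT;
                  unsigned long pfn_end   = __pa(end - 1) >> PAGE_SHIFT;

                  min_low_pfn = min(min_low_pfn, pfn_start);
                  max_low_pfn = max(max_low_pfn, pfn_end);
                  return 0;
          }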
      Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
      Acked-by: Jay Lan <jlan@sgi.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      a3f5c338
  16. 12 Feb 2007, 2 commits
  17. 07 Feb 2007, 1 commit
    • [IA64] relax per-cpu TLB requirement to DTC · 00b65985
      Authored by Chen, Kenneth W
      Instead of pinning the per-cpu TLB entry into a DTR, use the DTC.  This
      frees up one TLB entry for the application, or even for the kernel if
      the access pattern to the per-cpu data area has high temporal locality.
      
      Since the per-cpu area is mapped at the top of the region 7 address
      space, we just need to add a special case in alt_dtlb_miss.  The
      physical address of the per-cpu data is already conveniently stored in
      IA64_KR(PER_CPU_DATA).  Latency for alt_dtlb_miss is not affected, as we
      can hide all of the added work: the alt_dtlb_miss handler was measured
      at 23 cycles both before and after the patch.
      
      The performance effect is massive for applications that put a lot of TLB
      pressure on the CPU.  Workloads such as database online transaction
      processing, or applications that use terabytes of memory, benefit the
      most.  Measurement with an industry-standard database benchmark showed
      upwards of a 1.6% gain, while smaller workloads such as cpu and java
      also showed small improvements.
      Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      00b65985
  18. 06 Feb 2007, 2 commits
    • [IA64] swiotlb bug fixes · cde14bbf
      Authored by Jan Beulich
      This patch fixes:
      - marking the I-cache clean for pages DMAed to is now only done on IA64
      - broken multiple inclusion in include/asm-x86_64/swiotlb.h
      - missing call to mark_clean in swiotlb_sync_sg()
      - a (perhaps only theoretical) issue in swiotlb_dma_supported() when
        io_tlb_end is exactly at the end of memory
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      cde14bbf
    • [IA64] register memory ranges in a consistent manner · 139b8304
      Authored by Bob Picco
      While pursuing an unrelated issue with 64Mb granules I noticed a problem
      related to inconsistent use of add_active_range.  There doesn't appear
      to me to be any reason why FLATMEM versus DISCONTIGMEM should register
      memory with add_active_range using different code, so I've changed the
      code into a common implementation.
      
      The other subtle issue fixed by this patch is that add_active_range was
      called in count_node_pages before granule aligning is performed.  We
      were lucky with 16MB granules but not so with 64MB granules.
      count_node_pages has reserved regions filtered out, and as a consequence
      the linked kernel text and data aren't covered by calls to
      count_node_pages, so the linked kernel regions weren't reported to
      add_active_range.  This resulted in free_initmem causing numerous
      bad_page reports.  This won't occur with this patch because now all
      known memory regions are reported by register_active_ranges.
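
      A sketch of what the single shared registration path might look like;
      the signature and the paddr_to_nid() node lookup are assumptions, not
      the literal patch:

          /* Every memory range known at boot, FLATMEM or DISCONTIGMEM, is
           * funnelled through this one callback and handed to the generic
           * zone-sizing code via add_active_range(). */
          static int __init
          register_active_ranges(u64 start, u64 end, void *arg)
          {
                  int nid = paddr_to_nid(__pa(start));

                  if (nid < 0)
                          nid = 0;
                  add_active_range(nid, __pa(start) >> PAGE_SHIFT,
                                   __pa(end) >> PAGE_SHIFT);
                  return 0;
          }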
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Bob Picco <bob.picco@hp.com>
      Acked-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      139b8304
  19. 12 Jan 2007, 1 commit
  20. 13 Dec 2006, 1 commit
  21. 08 Dec 2006, 1 commit
  22. 27 Sep 2006, 1 commit
  23. 04 Aug 2006, 1 commit
    • [IA64] fix show_mem for VIRTUAL_MEM_MAP+FLATMEM · e44e41d0
      Authored by Bob Picco
      contig.c (FLATMEM) requires the same optimization for show_mem as
      discontig.c when VIRTUAL_MEM_MAP is in use; otherwise FLATMEM hits
      softlockup timeouts.  This was boot tested for the following memory
      configurations: SPARSEMEM, DISCONTIG+VIRTUAL_MEM_MAP, FLATMEM,
      FLATMEM+VIRTUAL_MEM_MAP, and FLATMEM+VIRTUAL_MEM_MAP with the largest
      memory gap less than LARGE_GAP, by using the boot parameter "mem=".
      
      This was also boot tested, and the "echo m >/proc/sysrq-trigger" output
      evaluated, for: FLATMEM, FLATMEM+VIRTUAL_MEM_MAP,
      DISCONTIGMEM+VIRTUAL_MEM_MAP, and SPARSEMEM.
      Signed-off-by: Bob Picco <bob.picco@hp.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      e44e41d0
  24. 01 Jul 2006, 1 commit
  25. 28 Jun 2006, 1 commit
  26. 15 May 2006, 1 commit
  27. 28 Mar 2006, 2 commits
  28. 23 Mar 2006, 3 commits
  29. 22 Mar 2006, 1 commit
  30. 17 Jan 2006, 1 commit
  31. 30 Oct 2005, 2 commits
    • [PATCH] mm: init_mm without ptlock · 872fec16
      Authored by Hugh Dickins
      First step in pushing down the page_table_lock.  init_mm.page_table_lock has
      been used throughout the architectures (usually for ioremap): not to serialize
      kernel address space allocation (that's usually vmlist_lock), but because
      pud_alloc,pmd_alloc,pte_alloc_kernel expect caller holds it.
      
      Reverse that: don't lock or unlock init_mm.page_table_lock in any of the
      architectures; instead rely on pud_alloc,pmd_alloc,pte_alloc_kernel to take
      and drop it when allocating a new one, to check lest a racing task already
      did.  Similarly no page_table_lock in vmalloc's map_vm_area.
      
      Some temporary ugliness in __pud_alloc and __pmd_alloc: since they also handle
      user mms, which are converted only by a later patch, for now they have to lock
      differently according to whether or not it's init_mm.
      
      If sources get muddled, there's a danger that an arch source taking
      init_mm.page_table_lock will be mixed with common source also taking it (or
      neither take it).  So break the rules and make another change, which should
      break the build for such a mismatch: remove the redundant mm arg from
      pte_alloc_kernel (ppc64 scrapped its distinct ioremap_mm in 2.6.13).
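
      For a caller such as an ioremap-style helper, the visible change is,
      schematically:

          /* Old interface: the caller locked init_mm and passed it explicitly. */
          spin_lock(&init_mm.page_table_lock);
          pte = pte_alloc_kernel(&init_mm, pmd, addr);
          spin_unlock(&init_mm.page_table_lock);

          /* New interface: no mm argument and no caller locking; the allocator
           * takes and drops the lock only if it has to allocate a new page
           * table, checking for a racing allocation itself. */
          pte = pte_alloc_kernel(pmd, addr);
          if (!pte)
                  return -ENOMEM;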
      
      Exceptions: arm26 used pte_alloc_kernel on user mm, now pte_alloc_map; ia64
      used pte_alloc_map on init_mm, now pte_alloc_kernel; parisc had bad args to
      pmd_alloc and pte_alloc_kernel in unused USE_HPPA_IOREMAP code; ppc64
      map_io_page forgot to unlock on failure; ppc mmu_mapin_ram and ppc64 im_free
      took page_table_lock for no good reason.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      872fec16
    • [PATCH] mm: ia64 use expand_upwards · 46dea3d0
      Authored by Hugh Dickins
      ia64 has expand_backing_store function for growing its Register Backing Store
      vma upwards.  But more complete code for this purpose is found in the
      CONFIG_STACK_GROWSUP part of mm/mmap.c.  Uglify its #ifdefs further to provide
      expand_upwards for ia64 as well as expand_stack for parisc.
      
      The Register Backing Store vma should be marked VM_ACCOUNT.  Implement the
      intention of growing it only a page at a time, instead of passing an address
      outside of the vma to handle_mm_fault, with unknown consequences.
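
      A heavily simplified sketch of the fault-path usage this implies
      (hypothetical, not the exact handler):

          /* Grow the register backing store vma to cover the faulting address,
           * which in practice extends it one page at a time, instead of handing
           * handle_mm_fault() an address that lies outside any vma. */
          if (address > vma->vm_end) {
                  if (expand_upwards(vma, address))
                          goto bad_area;
          }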
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      46dea3d0
  32. 05 Oct 2005, 1 commit