1. 12 Apr, 2008 1 commit
    • Z
      [IA64] Fix NUMA configuration issue · 98075d24
      Zoltan Menyhart committed
      There is a NUMA memory configuration issue in 2.6.24:
      
      A 2-node machine of ours has got the following memory layout:
      
      Node 0:	0 - 2 Gbytes
      Node 0:	4 - 8 Gbytes
      Node 1:	8 - 16 Gbytes
      Node 0:	16 - 18 Gbytes
      
      "efi_memmap_init()" merges the three last ranges into one.
      
      "register_active_ranges()" is called as follows:
      
      efi_memmap_walk(register_active_ranges, NULL);
      
      i.e. once for the 4 - 18 Gbytes range. It picks up the node
      number from the start address, and registers all the memory for
      the node #0.
      
      "register_active_ranges()" should be called as follows to
      make sure there is no merged address range at its entry:
      
      efi_memmap_walk(filter_memory, register_active_ranges);
      
      "filter_memory()" is similar to "filter_rsvd_memory()",
      but the reserved memory ranges are not filtered out.
      Signed-off-by: Zoltan Menyhart <Zoltan.Menyhart@bull.net>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
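A minimal sketch of the misattribution described in the NUMA commit above. It models the effect only, not the kernel code: the `node_range` type, the `layout` table and both helper functions are hypothetical names invented for illustration, and sizes are in Gbytes as in the commit message.

```c
#include <stddef.h>

/* Hypothetical type for illustration; not a kernel data structure. */
struct node_range {
    unsigned long start_gb;   /* inclusive */
    unsigned long end_gb;     /* exclusive */
    int node;
};

/* Memory layout quoted in the commit message, in Gbytes. */
static const struct node_range layout[] = {
    { 0,  2,  0 },
    { 4,  8,  0 },
    { 8,  16, 1 },
    { 16, 18, 0 },
};
#define NR_RANGES (sizeof(layout) / sizeof(layout[0]))
#define MAX_NODES 2

/* Look up the node that owns a given address (in Gbytes), or -1. */
static int node_of(unsigned long addr_gb)
{
    for (size_t i = 0; i < NR_RANGES; i++)
        if (addr_gb >= layout[i].start_gb && addr_gb < layout[i].end_gb)
            return layout[i].node;
    return -1;
}

/* Problem case: the whole range is credited to the node of its start address. */
static void register_by_start(unsigned long start, unsigned long end,
                              unsigned long per_node[MAX_NODES])
{
    int nid = node_of(start);

    if (nid >= 0)
        per_node[nid] += end - start;
}

/* Intended result: the range is split wherever the owning node changes. */
static void register_split(unsigned long start, unsigned long end,
                           unsigned long per_node[MAX_NODES])
{
    for (size_t i = 0; i < NR_RANGES; i++) {
        unsigned long s = start > layout[i].start_gb ? start : layout[i].start_gb;
        unsigned long e = end < layout[i].end_gb ? end : layout[i].end_gb;

        if (s < e)
            per_node[layout[i].node] += e - s;
    }
}
```

Calling `register_by_start(4, 18, ...)` on the merged 4 - 18 Gbytes range credits all 14 Gbytes to node 0, while `register_split(4, 18, ...)` credits 6 Gbytes to node 0 and 8 Gbytes to node 1.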
  2. 10 Apr, 2008 1 commit
  3. 07 Mar, 2008 1 commit
  4. 30 Oct, 2007 1 commit
    • A
      [IA64] ia64/mm/init.c: fix section mismatches · 18b8befd
      Adrian Bunk committed
      This patch fixes the following section mismatches:
      
      <--  snip  -->
      
      ...
      WARNING: vmlinux.o(.text+0x5b5c2): Section mismatch: reference to .init.text:memmap_init_zone (between 'memmap_init' and 'virtual_memmap_init')
      WARNING: vmlinux.o(.text+0x5b842): Section mismatch: reference to .init.text:memmap_init_zone (between 'virtual_memmap_init' and 'ia64_mmu_init')
      ...
      
      <--  snip  -->
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
  5. 20 Oct, 2007 1 commit
  6. 17 Oct, 2007 3 commits
  7. 12 May, 2007 1 commit
  8. 08 May, 2007 1 commit
    • C
      Make page->private usable in compound pages · d85f3385
      Christoph Lameter committed
      If we add a new flag so that we can distinguish between the first page and the
      tail pages then we can avoid using page->private in the first page.
      page->private == page for the first page, so there is no real information in
      there.
      
      Freeing up page->private makes the use of compound pages more transparent.
      They become more usable like real pages.  Right now we have to be careful, for
      example, if we are going beyond PAGE_SIZE allocations in the slab on i386 because we
      can then no longer use the private field.  This is one of the issues that
      cause us not to support debugging for page size slabs in SLAB.
      
      Having page->private available for SLUB would allow more meta information in
      the page struct.  I can probably avoid the 16 bit ints that I have in there
      right now.
      
      Also if page->private is available then a compound page may be equipped with
      buffer heads.  This may free up the way for filesystems to support larger
      blocks than page size.
      
      We add PageTail as an alias of PageReclaim.  Compound pages cannot currently
      be reclaimed.  Because of the alias one needs to check PageCompound first.
      
      The RFC for this approach was discussed at
      http://marc.info/?t=117574302800001&r=1&w=2
      
      [nacc@us.ibm.com: fix hugetlbfs]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
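A small sketch of why the compound-page commit above says PageCompound must be checked first when PageTail is an alias of PageReclaim. The bit values and the lowercase helper names are hypothetical and chosen only for illustration; they are not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical bit values; the kernel's real flag numbers differ. */
#define PG_reclaim   (1UL << 0)
#define PG_compound  (1UL << 1)
#define PG_tail      PG_reclaim   /* the alias described in the commit message */

struct page {
    unsigned long flags;
};

static bool page_compound(const struct page *p)
{
    return (p->flags & PG_compound) != 0;
}

/*
 * Because PG_tail shares a bit with PG_reclaim, the bit alone is ambiguous.
 * It only means "tail page" when the page is also marked compound.
 */
static bool page_tail(const struct page *p)
{
    return page_compound(p) && (p->flags & PG_tail) != 0;
}

/* Testing the shared bit without the compound check cannot tell the cases apart. */
static bool shared_bit_set(const struct page *p)
{
    return (p->flags & PG_tail) != 0;
}
```

In this sketch an ordinary page under reclaim and a tail page both have the shared bit set, so only `page_tail()`, which tests the compound flag first, gives the intended answer.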
  9. 30 Mar, 2007 1 commit
    • K
      [IA64] bugfix stack layout upside-down · 83d2cd3d
      KAMEZAWA Hiroyuki committed
      ia64 expects the following vm layout:
      
      == low memory
      [register-stack grows up]
      [memory-stack grows down]
      == high memory
      
      But the code assigns the base of the register stack at the
      maximum stack size offset from the fixed address where the
      stack *might* start.  Stack randomization will result in the
      memory stack starting at a lower address than this, and if the
      user has set a low stack limit with "ulimit -s", then you can
      end up with the register stack above the memory stack (or if
      you were very unlucky right on top of it!).
      
      Fix: Calculate the base address for the register stack starting
      from the actual address of the memory stack.
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
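A sketch of the arithmetic behind the stack layout commit above, under stated assumptions: the constants `STACK_TOP_FIXED` and `MAX_STACK_SIZE`, the 16 KB page size, and both function names are hypothetical values picked for illustration, not the kernel's.

```c
/* Hypothetical constants for illustration only. */
#define PAGE_SIZE        0x4000UL        /* 16 KB */
#define PAGE_ALIGN(a)    (((a) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))
#define STACK_TOP_FIXED  0x80000000UL    /* where the stack *might* start */
#define MAX_STACK_SIZE   0x10000000UL

/* Effective stack size: the rlimit, capped at the maximum. */
static unsigned long stack_size(unsigned long rlimit_stack)
{
    return rlimit_stack < MAX_STACK_SIZE ? rlimit_stack : MAX_STACK_SIZE;
}

/* Old behaviour: offset taken from the fixed address. */
static unsigned long rbs_base_old(unsigned long rlimit_stack)
{
    return PAGE_ALIGN(STACK_TOP_FIXED - stack_size(rlimit_stack));
}

/* Fixed behaviour: offset taken from the actual memory stack address. */
static unsigned long rbs_base_new(unsigned long start_stack,
                                  unsigned long rlimit_stack)
{
    return PAGE_ALIGN(start_stack - stack_size(rlimit_stack));
}
```

With a 64 KB stack limit and a memory stack randomized 1 MB below the fixed address, `rbs_base_old()` lands above the memory stack (the upside-down layout), while `rbs_base_new()` stays below it.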
  10. 21 Mar, 2007 1 commit
    • Z
      [IA64] min_low_pfn and max_low_pfn calculation fix · a3f5c338
      Zou Nan hai committed
      We have seen bad_pte_print when testing crashdump on an SN machine with a
      recent 2.6.20 kernel.  There are tons of bad pte print (pfn < max_low_pfn)
      reports when the crash kernel boots up, and all of the reported bad pages
      are inside the initmem range.  This happens when the crash kernel code and
      data are at the beginning of the 1st node: build_node_maps in
      discontig.c bypasses reserved regions with filter_rsvd_memory, and since
      min_low_pfn is calculated in build_node_maps, min_low_pfn ends up
      greater than the pfns of the kernel code and data.
      
      Because pages inside initmem are freed and reused later, we saw
      pfn_valid check fail on those pages.
      
      I think this can theoretically happen on a normal kernel too.  When I checked
      the min_low_pfn and max_low_pfn calculation in contig.c and discontig.c,
      I found more issues than this.
      
      1. The min_low_pfn and max_low_pfn calculation is inconsistent between
      contig.c and discontig.c.
      In contig.c, min_low_pfn is calculated as the first page number of the boot
      memmap (Why? Though this may work most of the time, I don't
      think it is the right logic).  In discontig.c, it is calculated as the lowest
      physical memory page number, bypassing reserved regions.
      max_low_pfn is calculated including reserved regions in contig.c, but
      excluding reserved regions in discontig.c.
      
      2. If the kernel code and data region happens to be at the beginning or the
      end of physical memory, and the min_low_pfn and max_low_pfn calculation
      bypasses kernel code and data, pages in initmem will be reported as bad.
      
      3. initrd is also in the reserved regions.  If it is at the beginning or the
      end of physical memory, the kernel will refuse to reuse the memory because
      of the virt_addr_valid check in free_initrd_mem.
      
      So it is better to fix and clean up those issues.
      Calculate min_low_pfn and max_low_pfn in a consistent way.
      Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
      Acked-by: Jay Lan <jlan@sgi.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
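A sketch of how skipping reserved regions shifts min_low_pfn, as the commit above describes. The `mem_range` and `pfn_span` types, the `find_pfn_span()` helper and the 16 KB page size are assumptions made for illustration; the actual patch may compute the bounds differently.

```c
#include <stddef.h>
#include <limits.h>

#define PAGE_SHIFT 14   /* assumed 16 KB pages */

/* Hypothetical description of a physical memory range. */
struct mem_range {
    unsigned long start;   /* inclusive, bytes */
    unsigned long end;     /* exclusive, bytes */
    int reserved;          /* e.g. kernel code/data or initrd */
};

struct pfn_span {
    unsigned long min_low_pfn;
    unsigned long max_low_pfn;
};

/* Walk all ranges and track the lowest and highest page frame numbers. */
static struct pfn_span find_pfn_span(const struct mem_range *r, size_t n,
                                     int skip_reserved)
{
    struct pfn_span s = { ULONG_MAX, 0 };

    for (size_t i = 0; i < n; i++) {
        unsigned long first, last;

        if (skip_reserved && r[i].reserved)
            continue;
        first = r[i].start >> PAGE_SHIFT;
        last = (r[i].end - 1) >> PAGE_SHIFT;
        if (first < s.min_low_pfn)
            s.min_low_pfn = first;
        if (last > s.max_low_pfn)
            s.max_low_pfn = last;
    }
    return s;
}
```

When a reserved kernel region sits at the start of memory, calling the helper with `skip_reserved` set yields a min_low_pfn above the kernel's own pages, which is the situation in which freed initmem pages fail the validity check.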
  11. 12 Feb, 2007 2 commits
  12. 07 Feb, 2007 1 commit
    • C
      [IA64] relax per-cpu TLB requirement to DTC · 00b65985
      Chen, Kenneth W committed
      Instead of pinning per-cpu TLB into a DTR, use DTC.  This will free up
      one TLB entry for application, or even kernel if access pattern to
      per-cpu data area has high temporal locality.
      
      Since per-cpu is mapped at the top of region 7 address, we just need to
      add special case in alt_dtlb_miss.  The physical address of per-cpu data
      is already conveniently stored in IA64_KR(PER_CPU_DATA).  Latency for
      alt_dtlb_miss is not affected as we can hide all the latency.  It was
      measured that alt_dtlb_miss handler has 23 cycles latency before and
      after the patch.
      
      The performance effect is massive for applications that put lots of tlb
      pressure on the CPU.  Workload environments like database online transaction
      processing, or applications that use terabytes of memory, would benefit the most.
      Measurement with an industry standard database benchmark showed a gain of
      upward of 1.6%.  Smaller workloads like cpu and java also show a small
      improvement.
      Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
  13. 06 Feb, 2007 2 commits
    • J
      [IA64] swiotlb bug fixes · cde14bbf
      Jan Beulich committed
      This patch fixes
      - marking the I-cache clean for pages DMAed to, which is now only done for IA64
      - broken multiple inclusion in include/asm-x86_64/swiotlb.h
      - missing call to mark_clean in swiotlb_sync_sg()
      - a (perhaps only theoretical) issue in swiotlb_dma_supported() when
      io_tlb_end is exactly at the end of memory
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
    • B
      [IA64] register memory ranges in a consistent manner · 139b8304
      Bob Picco committed
      While pursuing an unrelated issue with 64MB granules I noticed a problem
      related to inconsistent use of add_active_range.  There doesn't appear any
      reason to me why FLATMEM versus DISCONTIG_MEM should register memory to
      add_active_range with different code.  So I've changed the code into a
      common implementation.
      
      The other subtle issue fixed by this patch was calling add_active_range in
      count_node_pages before granule aligning is performed.  We were lucky with
      16MB granules but not so with 64MB granules.  count_node_pages has reserved
      regions filtered out and as a consequence linked kernel text and data
      aren't covered by calls to count_node_pages.  So linked kernel regions
      weren't reported to add_active_range.  This resulted in free_initmem
      causing numerous bad_page reports.  This won't occur with this patch
      because now all known memory regions are reported by
      register_active_ranges.
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Bob Picco <bob.picco@hp.com>
      Acked-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
  14. 12 Jan, 2007 1 commit
  15. 13 Dec, 2006 1 commit
  16. 08 Dec, 2006 1 commit
  17. 27 Sep, 2006 1 commit
  18. 04 Aug, 2006 1 commit
    • B
      [IA64] fix show_mem for VIRTUAL_MEM_MAP+FLATMEM · e44e41d0
      Bob Picco committed
      contig.c (FLATMEM) requires the same optimization as in discontig.c for show_mem
      when VIRTUAL_MEM_MAP is in use. Otherwise FLATMEM has softlockup timeouts.
      This was boot tested for memory configuration: SPARSEMEM,
      DISCONTIG+VIRTUAL_MEM_MAP, FLATMEM, FLATMEM+VIRTUAL_MEM_MAP and
      FLATMEM+VIRTUAL_MEM_MAP with largest memory gap less than LARGE_GAP by
      using boot parameter "mem=".
      
      This was boot tested and "echo m >/proc/sysrq-trigger" output evaluated for:
      FLATMEM, FLATMEM+VIRTUAL_MEM_MAP, DISCONTIGMEM+VIRTUAL_MEM_MAP and
      SPARSEMEM.
      Signed-off-by: Bob Picco <bob.picco@hp.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
  19. 01 Jul, 2006 1 commit
  20. 28 Jun, 2006 1 commit
  21. 15 May, 2006 1 commit
  22. 28 Mar, 2006 2 commits
  23. 23 Mar, 2006 3 commits
  24. 22 Mar, 2006 1 commit
  25. 17 Jan, 2006 1 commit
  26. 30 Oct, 2005 2 commits
    • H
      [PATCH] mm: init_mm without ptlock · 872fec16
      Hugh Dickins committed
      First step in pushing down the page_table_lock.  init_mm.page_table_lock has
      been used throughout the architectures (usually for ioremap): not to serialize
      kernel address space allocation (that's usually vmlist_lock), but because
      pud_alloc,pmd_alloc,pte_alloc_kernel expect caller holds it.
      
      Reverse that: don't lock or unlock init_mm.page_table_lock in any of the
      architectures; instead rely on pud_alloc,pmd_alloc,pte_alloc_kernel to take
      and drop it when allocating a new one, to check lest a racing task already
      did.  Similarly no page_table_lock in vmalloc's map_vm_area.
      
      Some temporary ugliness in __pud_alloc and __pmd_alloc: since they also handle
      user mms, which are converted only by a later patch, for now they have to lock
      differently according to whether or not it's init_mm.
      
      If sources get muddled, there's a danger that an arch source taking
      init_mm.page_table_lock will be mixed with common source also taking it (or
      neither take it).  So break the rules and make another change, which should
      break the build for such a mismatch: remove the redundant mm arg from
      pte_alloc_kernel (ppc64 scrapped its distinct ioremap_mm in 2.6.13).
      
      Exceptions: arm26 used pte_alloc_kernel on user mm, now pte_alloc_map; ia64
      used pte_alloc_map on init_mm, now pte_alloc_kernel; parisc had bad args to
      pmd_alloc and pte_alloc_kernel in unused USE_HPPA_IOREMAP code; ppc64
      map_io_page forgot to unlock on failure; ppc mmu_mapin_ram and ppc64 im_free
      took page_table_lock for no good reason.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
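A user-space sketch of the pattern the init_mm commit above describes: the allocator itself takes and drops the lock, and checks whether a racing task already installed a table. Everything here is a stand-in chosen for illustration (the `toy_mm` structure, the spinlock built on `atomic_flag`, the single `pmd` slot); it is not the kernel's pud_alloc/pmd_alloc code.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mm with one upper-level page table slot. */
struct toy_mm {
    atomic_flag page_table_lock;
    void *pmd;
};

static void toy_lock(atomic_flag *lock)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ;
}

static void toy_unlock(atomic_flag *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}

/*
 * Allocate outside the lock, then take the lock only to install the new
 * table. If another task won the race, drop our copy instead.
 * Returns 0 on success, -1 if the allocation failed.
 */
static int toy_pmd_alloc(struct toy_mm *mm)
{
    void *new_table = calloc(1, 4096);

    if (!new_table)
        return -1;

    toy_lock(&mm->page_table_lock);
    if (mm->pmd)
        free(new_table);        /* a racing task already populated the slot */
    else
        mm->pmd = new_table;
    toy_unlock(&mm->page_table_lock);
    return 0;
}
```

The caller never touches the lock, which is the inversion the commit message describes: callers no longer hold init_mm.page_table_lock around the allocation helpers.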
    • H
      [PATCH] mm: ia64 use expand_upwards · 46dea3d0
      Hugh Dickins committed
      ia64 has expand_backing_store function for growing its Register Backing Store
      vma upwards.  But more complete code for this purpose is found in the
      CONFIG_STACK_GROWSUP part of mm/mmap.c.  Uglify its #ifdefs further to provide
      expand_upwards for ia64 as well as expand_stack for parisc.
      
      The Register Backing Store vma should be marked VM_ACCOUNT.  Implement the
      intention of growing it only a page at a time, instead of passing an address
      outside of the vma to handle_mm_fault, with unknown consequences.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  27. 05 Oct, 2005 1 commit
  28. 31 Aug, 2005 1 commit
    • P
      [IA64] Fix nasty VMLPT problem... · 6cf07a8c
      Peter Chubb committed
      I've solved the problem I was having with the simulator and not
      booting Debian.
      
      The problem is that the number of bits for the virtual linear array
      short-format VHPT (Virtually mapped linear page table, VMLPT for
      short) is being tested incorrectly. 
      
      There are two problems:
            1. The PAL call that should tell the kernel the size of the
            virtual address space isn't implemented for the simulator, so
            the kernel uses the default 50.  This is addressed separately
            in dc90e95f
      
            2.  In arch/ia64/mm/init.c there's code to calculate the size
            of the VMLPT based on the number of implemented virtual address
            bits and the page size.  It checks to see if the VMLPT base
            address overlaps the top of the mapped region, but this check
            doesn't allow for the address space hole, and in fact will
            never trigger.
      
      Here's an alternative test and panic, that I think is more accurate.
      Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
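A sketch of the kind of size check the VMLPT commit above talks about, under explicit assumptions: a three-level page table, 8-byte PTEs, and a hole in the middle of each region that leaves only `impl_va_bits - 1` bits mappable. The function name and the exact conditions are illustrative and may not match the test that was committed to init.c.

```c
#include <stdbool.h>

#define PTE_BITS 3   /* assumed 8-byte page table entries */

/*
 * Returns true if a virtually mapped linear page table can cover the
 * mapped address space for the given page size and implemented VA bits.
 */
static bool vmlpt_fits(unsigned page_shift, unsigned impl_va_bits)
{
    /* Address bits covered by three page table levels plus the page offset. */
    unsigned mapped_space_bits = 3 * (page_shift - PTE_BITS) + page_shift;
    /* Size of the linear table needed to map the implemented space. */
    unsigned vmlpt_bits = impl_va_bits - page_shift + PTE_BITS;

    /* The PTEs for the mapped space must fit into the linear table. */
    if (mapped_space_bits - page_shift > vmlpt_bits - PTE_BITS)
        return false;
    /* The mapped space must not reach into the address space hole. */
    if (mapped_space_bits > impl_va_bits - 1)
        return false;
    return true;
}
```

Under these assumptions, 16 KB pages fit with either 50 or 51 implemented bits, while 64 KB pages do not, which is the kind of case where a panic suggesting a smaller page size would be appropriate.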
  29. 07 Jul, 2005 1 commit
    • B
      [IA64] memory-less-nodes repost · 564601a5
      bob.picco committed
      I reworked how nodes with only CPUs are treated.  The patch below seems
      simpler to me and has eliminated the complicated routine
      reassign_cpu_only_nodes.  There is no longer any requirement
      to modify ACPI NUMA information, which was in large part the
      complexity introduced in reassign_cpu_only_nodes.
      
      This patch will produce a different number of nodes. For example,
      reassign_cpu_only_nodes would reduce a configuration of two CPU-only nodes and
      one memory node to a single memory+CPUs node configuration.  This patch
      doesn't change the number of nodes which means the user will see three.  Two
      nodes without memory and one node with all the memory.
      
      While doing this patch, I noticed that early_nr_phys_cpus_node isn't serving
      any useful purpose.  It is called once in find_pernode_space but the value
      isn't used to compute pernode space.
      Signed-off-by: bob.picco <bob.picco@hp.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
  30. 09 Jun, 2005 1 commit
  31. 26 Apr, 2005 2 commits
    • T
      [IA64] MAX_PGT_FREES_PER_PASS must be 'L' to avoid warning · e96c9b47
      Tony Luck committed
      'min' is very picky about types of arguments, make it happy
      Signed-off-by: Tony Luck <tony.luck@intel.com>
    • R
      [IA64] Percpu quicklist for combined allocator for pgd/pmd/pte. · fde740e4
      Robin Holt committed
      This patch introduces using the quicklists for pgd, pmd, and pte levels
      by combining the alloc and free functions into a common set of routines.
      This greatly simplifies the reading of this header file.
      
      This patch is simple but necessary for large numa configurations.
      It simply ensures that only pages from the local node are added to a
      cpu's quicklist.  This prevents the trapping of pages on a remote node's
      quicklist by starting a process, touching a large number of pages to
      fill pmd and pte entries, migrating to another node, and then unmapping
      or exiting.  With those conditions, the pages get trapped and if the
      machine has more than 100 nodes of the same size, the calculation of
      the pgtable high water mark will be larger than any single node so page
      table cache flushing will never occur.
      
      I ran lmbench lat_proc fork and lat_proc exec on a zx1 with and without
      this patch and did not notice any change.
      
      On an sn2 machine, there was a slight improvement which is possibly
      due to pages from other nodes trapped on the test node before starting
      the run.  I did not investigate further.
      
      This patch shrinks the quicklist based upon free memory on the node
      instead of the high/low water marks.  I have written it to enable
      preemption periodically and recalculate the amount to shrink every time
      we have freed enough pages that the quicklist size should have grown.
      I rescan the node's zones each pass because other processes may be
      draining node memory at the same time as we are adding.
      Signed-off-by: Robin Holt <holt@sgi.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
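A sketch of the two policies in the quicklist commit above: only local-node pages are cached, and the amount to shrink is derived from the node's free memory and capped per pass. The constant values, the fraction of node memory and the function names are assumptions for illustration; only the name MAX_PGT_FREES_PER_PASS and its `long` type come from the commit titles in this log.

```c
/* Illustrative values; the kernel's actual tunables may differ. */
#define MAX_PGT_FREES_PER_PASS    16L   /* 'L' so it matches the long operands */
#define PGT_FRACTION_OF_NODE_MEM  16L
#define MIN_PGT_PAGES             25L

/* Only pages that belong to the CPU's own node may go on its quicklist. */
static int may_cache_page(int page_node, int cpu_node)
{
    return page_node == cpu_node;
}

/* Upper bound on cached page table pages, based on free memory on the node. */
static long max_pgt_pages(long node_free_pages)
{
    long limit = node_free_pages / PGT_FRACTION_OF_NODE_MEM;

    return limit > MIN_PGT_PAGES ? limit : MIN_PGT_PAGES;
}

/* How many pages to release in one pass before re-evaluating. */
static long pages_to_free(long quicklist_size, long node_free_pages)
{
    long excess = quicklist_size - max_pgt_pages(node_free_pages);

    if (excess <= 0)
        return 0;
    return excess < MAX_PGT_FREES_PER_PASS ? excess : MAX_PGT_FREES_PER_PASS;
}
```

Capping each pass and recomputing the limit afterwards mirrors the commit's description of periodically enabling preemption and rescanning, since free memory on the node can change while the quicklist is being drained.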