1. 22 6月, 2005 1 次提交
    • C
      [PATCH] node local per-cpu-pages · e7c8d5c9
      Christoph Lameter 提交于
      This patch modifies the way pagesets in struct zone are managed.
      
      Each zone has a per-cpu array of pagesets.  So any particular CPU has some
      memory in each zone structure which belongs to itself.  Even if that CPU is
      not local to that zone.
      
      So the patch relocates the pagesets for each cpu to the node that is nearest
      to the cpu instead of allocating the pagesets in the (possibly remote) target
      zone.  This means that the operations to manage pages on remote zone can be
      done with information available locally.
      
      We play a macro trick so that non-NUMA pmachines avoid the additional
      pointer chase on the page allocator fastpath.
      
      AIM7 benchmark on a 32 CPU SGI Altix
      
      w/o patches:
      Tasks    jobs/min  jti  jobs/min/task      real       cpu
          1      484.68  100       484.6769     12.01      1.97   Fri Mar 25 11:01:42 2005
        100    27140.46   89       271.4046     21.44    148.71   Fri Mar 25 11:02:04 2005
        200    30792.02   82       153.9601     37.80    296.72   Fri Mar 25 11:02:42 2005
        300    32209.27   81       107.3642     54.21    451.34   Fri Mar 25 11:03:37 2005
        400    34962.83   78        87.4071     66.59    588.97   Fri Mar 25 11:04:44 2005
        500    31676.92   75        63.3538     91.87    742.71   Fri Mar 25 11:06:16 2005
        600    36032.69   73        60.0545     96.91    885.44   Fri Mar 25 11:07:54 2005
        700    35540.43   77        50.7720    114.63   1024.28   Fri Mar 25 11:09:49 2005
        800    33906.70   74        42.3834    137.32   1181.65   Fri Mar 25 11:12:06 2005
        900    34120.67   73        37.9119    153.51   1325.26   Fri Mar 25 11:14:41 2005
       1000    34802.37   74        34.8024    167.23   1465.26   Fri Mar 25 11:17:28 2005
      
      with slab API changes and pageset patch:
      
      Tasks    jobs/min  jti  jobs/min/task      real       cpu
          1      485.00  100       485.0000     12.00      1.96   Fri Mar 25 11:46:18 2005
        100    28000.96   89       280.0096     20.79    150.45   Fri Mar 25 11:46:39 2005
        200    32285.80   79       161.4290     36.05    293.37   Fri Mar 25 11:47:16 2005
        300    40424.15   84       134.7472     43.19    438.42   Fri Mar 25 11:47:59 2005
        400    39155.01   79        97.8875     59.46    590.05   Fri Mar 25 11:48:59 2005
        500    37881.25   82        75.7625     76.82    730.19   Fri Mar 25 11:50:16 2005
        600    39083.14   78        65.1386     89.35    872.79   Fri Mar 25 11:51:46 2005
        700    38627.83   77        55.1826    105.47   1022.46   Fri Mar 25 11:53:32 2005
        800    39631.94   78        49.5399    117.48   1169.94   Fri Mar 25 11:55:30 2005
        900    36903.70   79        41.0041    141.94   1310.78   Fri Mar 25 11:57:53 2005
       1000    36201.23   77        36.2012    160.77   1458.31   Fri Mar 25 12:00:34 2005
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NShobhit Dayal <shobhit@calsoftinc.com>
      Signed-off-by: NShai Fultheim <Shai@Scalex86.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      e7c8d5c9
  2. 06 5月, 2005 1 次提交
  3. 01 5月, 2005 1 次提交
  4. 20 4月, 2005 3 次提交
    • H
      [PATCH] freepgt: hugetlb_free_pgd_range · 3bf5ee95
      Hugh Dickins 提交于
      ia64 and ppc64 had hugetlb_free_pgtables functions which were no longer being
      called, and it wasn't obvious what to do about them.
      
      The ppc64 case turns out to be easy: the associated tables are noted elsewhere
      and freed later, safe to either skip its hugetlb areas or go through the
      motions of freeing nothing.  Since ia64 does need a special case, restore to
      ppc64 the special case of skipping them.
      
      The ia64 hugetlb case has been broken since pgd_addr_end went in, though it
      probably appeared to work okay if you just had one such area; in fact it's
      been broken much longer if you consider a long munmap spanning from another
      region into the hugetlb region.
      
      In the ia64 hugetlb region, more virtual address bits are available than in
      the other regions, yet the page tables are structured the same way: the page
      at the bottom is larger.  Here we need to scale down each addr before passing
      it to the standard free_pgd_range.  Was about to write a hugely_scaled_down
      macro, but found htlbpage_to_page already exists for just this purpose.  Fixed
      off-by-one in ia64 is_hugepage_only_range.
      
      Uninline free_pgd_range to make it available to ia64.  Make sure the
      vma-gathering loop in free_pgtables cannot join a hugepage_only_range to any
      other (safe to join huges?  probably but don't bother).
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      3bf5ee95
    • H
      [PATCH] freepgt: remove MM_VM_SIZE(mm) · ee39b37b
      Hugh Dickins 提交于
      There's only one usage of MM_VM_SIZE(mm) left, and it's a troublesome macro
      because mm doesn't contain the (32-bit emulation?) info needed.  But it too is
      only needed because we ignore the end from the vma list.
      
      We could make flush_pgtables return that end, or unmap_vmas.  Choose the
      latter, since it's a natural fit with unmap_mapping_range_vma needing to know
      its restart addr.  This does make more than minimal change, but if unmap_vmas
      had returned the end before, this is how we'd have done it, rather than
      storing the break_addr in zap_details.
      
      unmap_vmas used to return count of vmas scanned, but that's just debug which
      hasn't been useful in a while; and if we want the map_count 0 on exit check
      back, it can easily come from the final remove_vm_struct loop.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ee39b37b
    • H
      [PATCH] freepgt: free_pgtables use vma list · e0da382c
      Hugh Dickins 提交于
      Recent woes with some arches needing their own pgd_addr_end macro; and 4-level
      clear_page_range regression since 2.6.10's clear_page_tables; and its
      long-standing well-known inefficiency in searching throughout the higher-level
      page tables for those few entries to clear and free: all can be blamed on
      ignoring the list of vmas when we free page tables.
      
      Replace exit_mmap's clear_page_range of the total user address space by
      free_pgtables operating on the mm's vma list; unmap_region use it in the same
      way, giving floor and ceiling beyond which it may not free tables.  This
      brings lmbench fork/exec/sh numbers back to 2.6.10 (unless preempt is enabled,
      in which case latency fixes spoil unmap_vmas throughput).
      
      Beware: the do_mmap_pgoff driver failure case must now use unmap_region
      instead of zap_page_range, since a page table might have been allocated, and
      can only be freed while it is touched by some vma.
      
      Move free_pgtables from mmap.c to memory.c, where its lower levels are adapted
      from the clear_page_range levels.  (Most of free_pgtables' old code was
      actually for a non-existent case, prev not properly set up, dating from before
      hch gave us split_vma.) Pass mmu_gather** in the public interfaces, since we
      might want to add latency lockdrops later; but no attempt to do so yet, going
      by vma should itself reduce latency.
      
      But what if is_hugepage_only_range?  Those ia64 and ppc64 cases need careful
      examination: put that off until a later patch of the series.
      
      What of x86_64's 32bit vdso page __map_syscall32 maps outside any vma?
      
      And the range to sparc64's flush_tlb_pgtables?  It's less clear to me now that
      we need to do more than is done here - every PMD_SIZE ever occupied will be
      flushed, do we really have to flush every PGDIR_SIZE ever partially occupied? 
      A shame to complicate it unnecessarily.
      
      Special thanks to David Miller for time spent repairing my ceilings.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      e0da382c
  5. 17 4月, 2005 2 次提交