1. 22 3月, 2006 11 次提交
  2. 17 3月, 2006 3 次提交
  3. 15 3月, 2006 2 次提交
  4. 10 3月, 2006 4 次提交
    • C
      [PATCH] slab: Node rotor for freeing alien caches and remote per cpu pages. · 8fce4d8e
      Christoph Lameter 提交于
      The cache reaper currently tries to free all alien caches and all remote
      per cpu pages in each pass of cache_reap.  For a machines with large number
      of nodes (such as Altix) this may lead to sporadic delays of around ~10ms.
      Interrupts are disabled while reclaiming creating unacceptable delays.
      
      This patch changes that behavior by adding a per cpu reap_node variable.
      Instead of attempting to free all caches, we free only one alien cache and
      the per cpu pages from one remote node.  That reduces the time spend in
      cache_reap.  However, doing so will lengthen the time it takes to
      completely drain all remote per cpu pagesets and all alien caches.  The
      time needed will grow with the number of nodes in the system.  All caches
      are drained when they overflow their respective capacity.  So the drawback
      here is only that a bit of memory may be wasted for awhile longer.
      
      Details:
      
      1. Rename drain_remote_pages to drain_node_pages to allow the specification
         of the node to drain of pcp pages.
      
      2. Add additional functions init_reap_node, next_reap_node for NUMA
         that manage a per cpu reap_node counter.
      
      3. Add a reap_alien function that reaps only from the current reap_node.
      
      For us this seems to be a critical issue.  Holdoffs of an average of ~7ms
      cause some HPC benchmarks to slow down significantly.  F.e.  NAS parallel
      slows down dramatically.  NAS parallel has a 12-16 seconds runtime w/o rotor
      compared to 5.8 secs with the rotor patches.  It gets down to 5.05 secs with
      the additional interrupt holdoff reductions.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      8fce4d8e
    • Y
      [PATCH] memory hotadd: pgdat->node_present_pages fix · f2937be5
      Yasunori Goto 提交于
      When pages are onlined, not only zone->present_pages but also
      pgdat->node_present_pages should be refreshed.
      
      This parameter is used to show information at
      /sys/device/system/node/nodeX/meminfo via si_meminfo_node().
      
      So, it shows strange value for MemUsed which is calculated
      (node_present_pages - all zones free pages).
      Signed-off-by: NYasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      f2937be5
    • C
      [PATCH] vmscan: no zone_reclaim if PF_MALLOC is set · a6bf5270
      Christoph Lameter 提交于
      If the process has already set PF_MALLOC and is already using
      current->reclaim_state then do not try to reclaim memory from the zone.
      This is set by kswapd and/or synchrononous global reclaim which will not
      take it lightly if we zap the reclaim_state.
      Signed-off-by: NChristoph Lameter <clameter@sig.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      a6bf5270
    • H
      [PATCH] page_add_file_rmap(): remove BUG_ON()s · 85a6cd03
      Hugh Dickins 提交于
      Remove two early-development BUG_ONs from page_add_file_rmap.
      
      The pfn_valid test (originally useful for checking that nobody passed an
      artificial struct page) comes too late, since we already have the struct
      page.
      
      The PageAnon test (useful when anon was first distinguished from file rmap)
      prevents ->nopage implementations from reusing ->mapping, which would
      otherwise be available.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      85a6cd03
  5. 09 3月, 2006 4 次提交
  6. 07 3月, 2006 3 次提交
    • C
      [PATCH] numa_maps update · 397874df
      Christoph Lameter 提交于
      Change the format of numa_maps to be more compact and contain additional
      information that is useful for managing and troubleshooting memory on a
      NUMA system.  Numa_maps can now also support huge pages.
      
      Fixes:
      
      1. More compact format. Only display fields if they contain additional
      	information.
      
      2. Always display information for all vmas. The old numa_maps did not display
      	vma with no mapped entries. This was a bit confusing because page
      	migration removes ptes for file backed vmas. After page migration
      	a part of the vmas vanished.
      
      3. Rename maxref to maxmap. This is the maximum mapcount of all the pages
      	in a vma and may be used as an indicator as to how many processes
      	may be using a certain vma.
      
      4. Include the ability to scan over huge page vmas.
      
      New items shown:
      
      dirty
      	Number of pages in a vma that have either the dirty bit set in the
      	page_struct or in the pte.
      
      file=<filename>
      	The file backing the pages if any
      
      stack
      	Stack area
      
      heap
      	Heap area
      
      huge
      	Huge page area. The number of pages shows is the number of huge
      	pages not the regular sized pages.
      
      swapcache
      	Number of pages with swap references. Must be >0 in order to
      	be shown.
      
      active
      	Number of active pages. Only displayed if different from the number
      	of pages mapped.
      
      writeback
      	Number of pages under writeback. Only displayed if >0.
      
      Sample ouput of a process using huge pages:
      
      00000000 default
      2000000000000000 default file=/lib/ld-2.3.90.so mapped=13 mapmax=30 N0=13
      2000000000044000 default file=/lib/ld-2.3.90.so anon=2 dirty=2 swapcache=2 N2=2
      2000000000064000 default file=/lib/librt-2.3.90.so mapped=2 active=1 N1=1 N3=1
      2000000000074000 default file=/lib/librt-2.3.90.so
      2000000000080000 default file=/lib/librt-2.3.90.so anon=1 swapcache=1 N2=1
      2000000000084000 default
      2000000000088000 default file=/lib/libc-2.3.90.so mapped=52 mapmax=32 active=48 N0=52
      20000000002bc000 default file=/lib/libc-2.3.90.so
      20000000002c8000 default file=/lib/libc-2.3.90.so anon=3 dirty=2 swapcache=3 active=2 N1=1 N2=2
      20000000002d4000 default anon=1 swapcache=1 N1=1
      20000000002d8000 default file=/lib/libpthread-2.3.90.so mapped=8 mapmax=3 active=7 N2=2 N3=6
      20000000002fc000 default file=/lib/libpthread-2.3.90.so
      2000000000308000 default file=/lib/libpthread-2.3.90.so anon=1 dirty=1 swapcache=1 N1=1
      200000000030c000 default anon=1 dirty=1 swapcache=1 N1=1
      2000000000320000 default anon=1 dirty=1 N1=1
      200000000071c000 default
      2000000000720000 default anon=2 dirty=2 swapcache=1 N1=1 N2=1
      2000000000f1c000 default
      2000000000f20000 default anon=2 dirty=2 swapcache=1 active=1 N2=1 N3=1
      200000000171c000 default
      2000000001720000 default anon=1 dirty=1 swapcache=1 N1=1
      2000000001b20000 default
      2000000001b38000 default file=/lib/libgcc_s.so.1 mapped=2 N1=2
      2000000001b48000 default file=/lib/libgcc_s.so.1
      2000000001b54000 default file=/lib/libgcc_s.so.1 anon=1 dirty=1 active=0 N1=1
      2000000001b58000 default file=/lib/libunwind.so.7.0.0 mapped=2 active=1 N1=2
      2000000001b74000 default file=/lib/libunwind.so.7.0.0
      2000000001b80000 default file=/lib/libunwind.so.7.0.0
      2000000001b84000 default
      4000000000000000 default file=/media/huge/test9 mapped=1 N1=1
      6000000000000000 default file=/media/huge/test9 anon=1 dirty=1 active=0 N1=1
      6000000000004000 default heap
      607fffff7fffc000 default anon=1 dirty=1 swapcache=1 N2=1
      607fffffff06c000 default stack anon=1 dirty=1 active=0 N1=1
      8000000060000000 default file=/mnt/huge/test0 huge dirty=3 N1=3
      8000000090000000 default file=/mnt/huge/test1 huge dirty=3 N0=1 N2=2
      80000000c0000000 default file=/mnt/huge/test2 huge dirty=3 N1=1 N3=2
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      397874df
    • L
      slab: clarify and fix calculate_slab_order() · 9888e6fa
      Linus Torvalds 提交于
      If we triggered the 'offslab_limit' test, we would return with
      cachep->gfporder incremented once too many times.
      
      This clarifies the logic somewhat, and fixes that bug.
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      9888e6fa
    • L
      Fix "check_slabp" printout size calculation · 264132bc
      Linus Torvalds 提交于
      We want to use the "struct slab" size, not the size of the pointer to
      same.  As it is, we'd not print out the last <n> entry pointers in the
      slab (where <n> is ~10, depending on whether it's a 32-bit or 64-bit
      kernel).
      
      Gaah, that slab code was written by somebody who likes unreadable crud.
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      264132bc
  7. 03 3月, 2006 2 次提交
  8. 01 3月, 2006 4 次提交
  9. 25 2月, 2006 2 次提交
  10. 22 2月, 2006 1 次提交
    • H
      [PATCH] tmpfs: fix mount mpol nodelist parsing · b00dc3ad
      Hugh Dickins 提交于
      I've been dissatisfied with the mpol_nodelist mount option which was
      added to tmpfs earlier in -rc.  Replace it by mpol=policy:nodelist.
      
      And it was broken: a nodelist is a comma-separated list of numbers and
      ranges; the mount options are a comma-separated list of token=values.
      Whoops, blindly strsep'ing on commas doesn't work so well: since we've
      no numeric tokens, and unlikely to add them, use that to distinguish.
      
      Move the mpol= parsing to shmem_parse_mpol under CONFIG_NUMA, reject
      all its options as invalid if not NUMA.  /proc shows MPOL_PREFERRED
      as "prefer", so use that name for the policy instead of "preferred".
      
      Enforce that mpol=default has no nodelist; that mpol=prefer has one
      node only; that mpol=bind has a nodelist; but let mpol=interleave use
      node_online_map if no nodelist given.  Describe this in tmpfs.txt.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Acked-by: NRobin Holt <holt@sgi.com>
      Acked-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      b00dc3ad
  11. 21 2月, 2006 4 次提交