1. 12 1月, 2006 3 次提交
  2. 10 1月, 2006 1 次提交
  3. 09 1月, 2006 6 次提交
    • P
      [PATCH] cpuset: memory pressure meter · 3e0d98b9
      Paul Jackson 提交于
      Provide a simple per-cpuset metric of memory pressure, tracking the -rate-
      that the tasks in a cpuset call try_to_free_pages(), the synchronous
      (direct) memory reclaim code.
      
      This enables batch managers monitoring jobs running in dedicated cpusets to
      efficiently detect what level of memory pressure that job is causing.
      
      This is useful both on tightly managed systems running a wide mix of
      submitted jobs, which may choose to terminate or reprioritize jobs that are
      trying to use more memory than allowed on the nodes assigned them, and with
      tightly coupled, long running, massively parallel scientific computing jobs
      that will dramatically fail to meet required performance goals if they
      start to use more memory than allowed to them.
      
      This patch just provides a very economical way for the batch manager to
      monitor a cpuset for signs of memory pressure.  It's up to the batch
      manager or other user code to decide what to do about it and take action.
      
      ==> Unless this feature is enabled by writing "1" to the special file
          /dev/cpuset/memory_pressure_enabled, the hook in the rebalance
          code of __alloc_pages() for this metric reduces to simply noticing
          that the cpuset_memory_pressure_enabled flag is zero.  So only
          systems that enable this feature will compute the metric.
      
      Why a per-cpuset, running average:
      
          Because this meter is per-cpuset, rather than per-task or mm, the
          system load imposed by a batch scheduler monitoring this metric is
          sharply reduced on large systems, because a scan of the tasklist can be
          avoided on each set of queries.
      
          Because this meter is a running average, instead of an accumulating
          counter, a batch scheduler can detect memory pressure with a single
          read, instead of having to read and accumulate results for a period of
          time.
      
          Because this meter is per-cpuset rather than per-task or mm, the
          batch scheduler can obtain the key information, memory pressure in a
          cpuset, with a single read, rather than having to query and accumulate
          results over all the (dynamically changing) set of tasks in the cpuset.
      
      A per-cpuset simple digital filter (requires a spinlock and 3 words of data
      per-cpuset) is kept, and updated by any task attached to that cpuset, if it
      enters the synchronous (direct) page reclaim code.
      
      A per-cpuset file provides an integer number representing the recent
      (half-life of 10 seconds) rate of direct page reclaims caused by the tasks
      in the cpuset, in units of reclaims attempted per second, times 1000.
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      3e0d98b9
    • N
      [PATCH] mm: free_pages opt · 48db57f8
      Nick Piggin 提交于
      Try to streamline free_pages_bulk by ensuring callers don't pass in a
      'count' that exceeds the list size.
      
      Some cleanups:
      Rename __free_pages_bulk to __free_one_page.
      Put the page list manipulation from __free_pages_ok into free_one_page.
      Make __free_pages_ok static.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      48db57f8
    • N
      [PATCH] mm: cleanup zone_pcp · 23316bc8
      Nick Piggin 提交于
      Use zone_pcp everywhere even though NUMA code "knows" the internal details
      of the zone.  Stop other people trying to copy, and it looks nicer.
      
      Also, only print the pagesets of online cpus in zoneinfo.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: "Seth, Rohit" <rohit.seth@intel.com>
      Cc: Christoph Lameter <christoph@lameter.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      23316bc8
    • R
      [PATCH] Make high and batch sizes of per_cpu_pagelists configurable · 8ad4b1fb
      Rohit Seth 提交于
      As recently there has been lot of traffic on the right values for batch and
      high water marks for per_cpu_pagelists.  This patch makes these two
      variables configurable through /proc interface.
      
      A new tunable /proc/sys/vm/percpu_pagelist_fraction is added.  This entry
      controls the fraction of pages at most in each zone that are allocated for
      each per cpu page list.  The min value for this is 8.  It means that we
      don't allow more than 1/8th of pages in each zone to be allocated in any
      single per_cpu_pagelist.
      
      The batch value of each per cpu pagelist is also updated as a result.  It
      is set to pcp->high/4.  The upper limit of batch is (PAGE_SHIFT * 8)
      Signed-off-by: NRohit Seth <rohit.seth@intel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      8ad4b1fb
    • C
      [PATCH] slab: remove nested #ifdef CONFIG_NUMA · bec6b0c8
      Christoph Lameter 提交于
      For some reason there is an #ifdef CONFIG_NUMA within another #ifdef
      CONFIG_NUMA in the page allocator.  Remove innermost #ifdef CONFIG_NUMA
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      bec6b0c8
    • A
      [PATCH] revert "mm: page_state fixes" · 84c2008a
      Andrew Morton 提交于
      Hugh says:
      
      page_alloc_cpu_notify() specifically contains code to
      
       		/* Add dead cpu's page_states to our own. */
      
      which handles this more efficiently.
      
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      84c2008a
  4. 07 1月, 2006 18 次提交
  5. 16 12月, 2005 1 次提交
  6. 04 12月, 2005 1 次提交
  7. 29 11月, 2005 1 次提交
  8. 23 11月, 2005 2 次提交
    • H
      [PATCH] unpaged: PG_reserved bad_page · 689bcebf
      Hugh Dickins 提交于
      It used to be the case that PG_reserved pages were silently never freed, but
      in 2.6.15-rc1 they may be freed with a "Bad page state" message.  We should
      work through such cases as they appear, fixing the code; but for now it's
      safer to issue the message without freeing the page, leaving PG_reserved set.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      689bcebf
    • H
      [PATCH] unpaged: unifdefed PageCompound · 664beed0
      Hugh Dickins 提交于
      It looks like snd_xxx is not the only nopage to be using PageReserved as a way
      of holding a high-order page together: which no longer works, but is masked by
      our failure to free from VM_RESERVED areas.  We cannot fix that bug without
      first substituting another way to hold the high-order page together, while
      farming out the 0-order pages from within it.
      
      That's just what PageCompound is designed for, but it's been kept under
      CONFIG_HUGETLB_PAGE.  Remove the #ifdefs: which saves some space (out- of-line
      put_page), doesn't slow down what most needs to be fast (already using
      hugetlb), and unifies the way we handle high-order pages.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      664beed0
  9. 18 11月, 2005 1 次提交
  10. 15 11月, 2005 3 次提交
    • A
      [PATCH] x86_64: Remove obsolete ARCH_HAS_ATOMIC_UNSIGNED and page_flags_t · 07808b74
      Andi Kleen 提交于
      Has been introduced for x86-64 at some point to save memory
      in struct page, but has been obsolete for some time. Just
      remove it.
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      07808b74
    • A
      b0d41693
    • A
      [PATCH] x86_64: Add 4GB DMA32 zone · a2f1b424
      Andi Kleen 提交于
      Add a new 4GB GFP_DMA32 zone between the GFP_DMA and GFP_NORMAL zones.
      
      As a bit of historical background: when the x86-64 port
      was originally designed we had some discussion if we should
      use a 16MB DMA zone like i386 or a 4GB DMA zone like IA64 or
      both. Both was ruled out at this point because it was in early
      2.4 when VM is still quite shakey and had bad troubles even
      dealing with one DMA zone.  We settled on the 16MB DMA zone mainly
      because we worried about older soundcards and the floppy.
      
      But this has always caused problems since then because
      device drivers had trouble getting enough DMA able memory. These days
      the VM works much better and the wide use of NUMA has proven
      it can deal with many zones successfully.
      
      So this patch adds both zones.
      
      This helps drivers who need a lot of memory below 4GB because
      their hardware is not accessing more (graphic drivers - proprietary
      and free ones, video frame buffer drivers, sound drivers etc.).
      Previously they could only use IOMMU+16MB GFP_DMA, which
      was not enough memory.
      
      Another common problem is that hardware who has full memory
      addressing for >4GB misses it for some control structures in memory
      (like transmit rings or other metadata).  They tended to allocate memory
      in the 16MB GFP_DMA or the IOMMU/swiotlb then using pci_alloc_consistent,
      but that can tie up a lot of precious 16MB GFPDMA/IOMMU/swiotlb memory
      (even on AMD systems the IOMMU tends to be quite small) especially if you have
      many devices.  With the new zone pci_alloc_consistent can just put
      this stuff into memory below 4GB which works better.
      
      One argument was still if the zone should be 4GB or 2GB. The main
      motivation for 2GB would be an unnamed not so unpopular hardware
      raid controller (mostly found in older machines from a particular four letter
      company) who has a strange 2GB restriction in firmware. But
      that one works ok with swiotlb/IOMMU anyways, so it doesn't really
      need GFP_DMA32. I chose 4GB to be compatible with IA64 and because
      it seems to be the most common restriction.
      
      The new zone is so far added only for x86-64.
      
      For other architectures who don't set up this
      new zone nothing changes. Architectures can set a compatibility
      define in Kconfig CONFIG_DMA_IS_DMA32 that will define GFP_DMA32
      as GFP_DMA. Otherwise it's a nop because on 32bit architectures
      it's normally not needed because GFP_NORMAL (=0) is DMA able
      enough.
      
      One problem is still that GFP_DMA means different things on different
      architectures. e.g. some drivers used to have #ifdef ia64  use GFP_DMA
      (trusting it to be 4GB) #elif __x86_64__ (use other hacks like
      the swiotlb because 16MB is not enough) ... . This was quite
      ugly and is now obsolete.
      
      These should be now converted to use GFP_DMA32 unconditionally. I haven't done
      this yet. Or best only use pci_alloc_consistent/dma_alloc_coherent
      which will use GFP_DMA32 transparently.
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      a2f1b424
  11. 14 11月, 2005 3 次提交