1. 18 July 2007 (3 commits)
    • Add a movablecore= parameter for sizing ZONE_MOVABLE · 7e63efef
      Mel Gorman authored
      This patch adds a new parameter for sizing ZONE_MOVABLE called
      movablecore=.  While kernelcore= is used to specify the minimum amount of
      memory that must be available for all allocation types, movablecore= is
      used to specify the minimum amount of memory that is used for migratable
      allocations.  The amount of memory available for migratable allocations
      determines, for example, how large the huge page pool can be dynamically
      resized to at runtime.
      
      Internally, movablecore= is handled by calculating the total number of
      pages in the system and setting kernelcore to
      
      kernelcore == totalpages - movablecore
      
      Both kernelcore= and movablecore= can be safely specified at the same time.
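      
      As a rough illustration of that derivation (a stand-alone user-space
      model, not the kernel code itself; the function and variable names here
      are assumptions):
      
        #include <stdio.h>
        
        /* Model of how movablecore= is folded into a kernelcore value: whatever
         * is not reserved for movable allocations must serve all allocation types. */
        static unsigned long derive_kernelcore(unsigned long totalpages,
                                               unsigned long movablecore)
        {
            if (movablecore > totalpages)
                movablecore = totalpages;   /* cannot reserve more than exists */
            return totalpages - movablecore;
        }
        
        int main(void)
        {
            unsigned long totalpages = 262144;  /* e.g. 1GB of 4kB pages */
            unsigned long movablecore = 65536;  /* ask for 256MB of ZONE_MOVABLE */
        
            printf("kernelcore = %lu pages\n",
                   derive_kernelcore(totalpages, movablecore));
            return 0;
        }
      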
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Acked-by: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7e63efef
    • handle kernelcore=: generic · ed7ed365
      Mel Gorman authored
      This patch adds the kernelcore= parameter for x86.
      
      Once all patches are applied, a new command-line parameter and a new
      sysctl exist.  This patch adds the necessary documentation.
      
      From: Yasunori Goto <y-goto@jp.fujitsu.com>
      
        When "kernelcore" boot option is specified, kernel can't boot up on ia64
        because of an infinite loop.  In addition, the parsing code can be handled
        in an architecture-independent manner.
      
        This patch uses common code to handle the kernelcore= parameter.  It is
        only available to architectures that support arch-independent zone-sizing
        (i.e.  define CONFIG_ARCH_POPULATES_NODE_MAP).  Other architectures will
        ignore the boot parameter.
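      
      Purely for illustration, the kind of size parsing involved in accepting a
      value like kernelcore=nn[KMG] can be modelled in user space as below; the
      helper name is an assumption, not the kernel's actual parser:
      
        #include <stdio.h>
        #include <stdlib.h>
        
        /* Toy model: parse "nn[KMG]" into a byte count. */
        static unsigned long long parse_size(const char *s)
        {
            char *end;
            unsigned long long val = strtoull(s, &end, 0);
        
            switch (*end) {
            case 'G': case 'g': val <<= 10;  /* fall through */
            case 'M': case 'm': val <<= 10;  /* fall through */
            case 'K': case 'k': val <<= 10;  break;
            default: break;
            }
            return val;
        }
        
        int main(void)
        {
            printf("kernelcore=512M -> %llu bytes\n", parse_size("512M"));
            return 0;
        }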
      
      [bunk@stusta.de: make cmdline_parse_kernelcore() static]
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
      Acked-by: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ed7ed365
    • Create the ZONE_MOVABLE zone · 2a1e274a
      Mel Gorman authored
      The following 8 patches against 2.6.20-mm2 create a zone called ZONE_MOVABLE
      that is only usable by allocations that specify both __GFP_HIGHMEM and
      __GFP_MOVABLE.  This has the effect of keeping all non-movable pages within a
      single memory partition while allowing movable allocations to be satisfied
      from either partition.  The patches may be applied with the list-based
      anti-fragmentation patches that group pages together based on mobility.
      
      The size of the zone is determined by a kernelcore= parameter specified at
      boot-time.  This specifies how much memory is usable by non-movable
      allocations and the remainder is used for ZONE_MOVABLE.  Any range of pages
      within ZONE_MOVABLE can be released by migrating the pages or by reclaiming.
      
      When selecting a zone to take pages from for ZONE_MOVABLE, there are two
      things to consider.  First, only memory from the highest populated zone is
      used for ZONE_MOVABLE.  On the x86, this is probably going to be ZONE_HIGHMEM
      but it would be ZONE_DMA on ppc64 or possibly ZONE_DMA32 on x86_64.  Second,
      the amount of memory usable by the kernel will be spread evenly throughout
      NUMA nodes where possible.  If the nodes are not of equal size, the amount of
      memory usable by the kernel on some nodes may be greater than others.
      
      By default, the zone is not as useful for hugetlb allocations because they are
      pinned and non-migratable (currently at least).  A sysctl is provided that
      allows huge pages to be allocated from that zone.  This means that the huge
      page pool can be resized to the size of ZONE_MOVABLE during the lifetime of
      the system assuming that pages are not mlocked.  Despite huge pages being
      non-movable, we do not introduce additional external fragmentation of note as
      huge pages are always the largest contiguous block we care about.
      
      Credit goes to Andy Whitcroft for catching a large variety of problems during
      review of the patches.
      
      This patch creates an additional zone, ZONE_MOVABLE.  This zone is only usable
      by allocations which specify both __GFP_HIGHMEM and __GFP_MOVABLE.  Hot-added
      memory continues to be placed in its existing destination as there is no
      mechanism to redirect it to a specific zone.
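      
      A minimal sketch of the gating rule described above (a stand-alone model;
      the helper name and flag values are illustrative, not the kernel's
      definitions):
      
        #include <stdbool.h>
        #include <stdio.h>
        
        /* Illustrative flag bits; the real __GFP_* values differ. */
        #define GFP_HIGHMEM_BIT 0x1
        #define GFP_MOVABLE_BIT 0x2
        
        /* ZONE_MOVABLE may only satisfy allocations that are both highmem-capable
         * and movable; everything else stays in the ordinary zones. */
        static bool may_use_zone_movable(unsigned int gfp_flags)
        {
            return (gfp_flags & GFP_HIGHMEM_BIT) && (gfp_flags & GFP_MOVABLE_BIT);
        }
        
        int main(void)
        {
            printf("HIGHMEM only      -> %d\n",
                   may_use_zone_movable(GFP_HIGHMEM_BIT));
            printf("HIGHMEM + MOVABLE -> %d\n",
                   may_use_zone_movable(GFP_HIGHMEM_BIT | GFP_MOVABLE_BIT));
            return 0;
        }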
      
      [y-goto@jp.fujitsu.com: Fix section mismatch of memory hotplug related code]
      [akpm@linux-foundation.org: various fixes]
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2a1e274a
  2. 17 July 2007 (6 commits)
    • fault-injection: add min-order parameter to fail_page_alloc · 54114994
      Akinobu Mita authored
      Limiting the injection of failures to larger allocations helps to find
      realistic bugs, because higher-order allocations are likely to fail in
      practice while zero-order allocations are not.
      
      This patch adds a min-order parameter to fail_page_alloc.  It specifies the
      minimum page allocation order for which failures will be injected.
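      
      Schematically, the filter reduces to an order comparison; a user-space
      model (the names here are assumptions, not the fault-injection code
      itself):
      
        #include <stdbool.h>
        #include <stdio.h>
        
        /* Failures are only injected into allocations whose order is at least
         * min_order, so cheap order-0 allocations are left alone. */
        static bool may_inject_failure(unsigned int order, unsigned int min_order)
        {
            return order >= min_order;
        }
        
        int main(void)
        {
            unsigned int min_order = 1;   /* e.g. set via the min-order knob */
        
            for (unsigned int order = 0; order <= 3; order++)
                printf("order %u: %s\n", order,
                       may_inject_failure(order, min_order) ? "eligible" : "skipped");
            return 0;
        }
      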
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      54114994
    • b49ad484
    • mm: more __meminit annotations · 6ea6e688
      Paul Mundt authored
      Currently zone_spanned_pages_in_node() and zone_absent_pages_in_node() are
      non-static for ARCH_POPULATES_NODE_MAP and static otherwise.  However, only
      the non-static versions are __meminit annotated, despite only being called
      from __meminit functions in either case.
      
      zone_init_free_lists() is currently non-static and not __meminit annotated
      either, despite only being called once in the entire tree by
      init_currently_empty_zone(), which too is __meminit.  So make it static and
      properly annotated.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6ea6e688
    • mm: fix improper .init-type section references · 98011f56
      Jan Beulich authored
      .. which modpost started warning about.
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      98011f56
    • MM: alloc_large_system_hash() can free some memory for non power-of-two bucketsize · 1037b83b
      Eric Dumazet authored
      alloc_large_system_hash() is called at boot time to allocate space for
      several large hash tables.
      
      Lately, the TCP hash table was changed and its bucketsize is no longer a
      power of two.
      
      On most setups, alloc_large_system_hash() allocates one big block of pages
      (order > 0) with __get_free_pages(GFP_ATOMIC, order).  This single
      high-order block has a power-of-two size, bigger than the needed size.
      
      We can free all the pages that won't be used by the hash table.
      
      On a 1GB i386 machine, this patch saves 128 KB of LOWMEM memory.
      
      TCP established hash table entries: 32768 (order: 6, 393216 bytes)
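      
      As a back-of-the-envelope model of the saving (user-space arithmetic only;
      the 12-byte bucket size is inferred from the 32768-entry / 393216-byte
      figures quoted above):
      
        #include <stdio.h>
        
        #define PAGE_SZ 4096UL
        
        /* Smallest order whose 2^order pages cover 'pages' pages. */
        static unsigned int order_for(unsigned long pages)
        {
            unsigned int order = 0;
        
            while ((1UL << order) < pages)
                order++;
            return order;
        }
        
        int main(void)
        {
            unsigned long entries = 32768, bucketsize = 12;      /* the TCP example */
            unsigned long needed = entries * bucketsize;         /* 393216 bytes */
            unsigned long pages_needed = (needed + PAGE_SZ - 1) / PAGE_SZ;
            unsigned long pages_alloc = 1UL << order_for(pages_needed);
            unsigned long freed = (pages_alloc - pages_needed) * PAGE_SZ;
        
            printf("allocated %lu pages, table needs %lu, %lu KB can be freed\n",
                   pages_alloc, pages_needed, freed >> 10);
            return 0;
        }
      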
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1037b83b
    • change zonelist order: zonelist order selection logic · f0c0b2b8
      KAMEZAWA Hiroyuki authored
      Make zonelist creation policy selectable from sysctl/boot option v6.
      
      This patch makes the order of NUMA's zonelist (of pgdat) selectable.
      The available orders are Default (automatic), Node-based and Zone-based.
      
      [Default Order]
      The kernel selects Node-based or Zone-based order automatically.
      
      [Node-based Order]
      This policy treats the locality of memory as the most important parameter.
      The zonelist order is created from each zone's locality.  This means lower
      zones (e.g. ZONE_DMA) can be used before higher zones (e.g. ZONE_NORMAL)
      are exhausted.  In other words, ZONE_DMA will be in the middle of the
      zonelist.  The current 2.6.21 kernel uses this order.
      
      Pros.
       * A user can expect local memory as much as possible.
      Cons.
       * A lower zone will be exhausted before a higher zone.  This may cause OOM_KILL.
      
      This order may be suitable if ZONE_DMA is relatively big, you never see
      OOM_KILL because of ZONE_DMA exhaustion, and you need the best locality.
      
      (example)
      assume 2 node NUMA. node(0) has ZONE_DMA/ZONE_NORMAL, node(1) has ZONE_NORMAL.
      
      *node(0)'s memory allocation order:
      
       node(0)'s NORMAL -> node(0)'s DMA -> node(1)'s NORMAL.
      
      *node(1)'s memory allocation order:
      
       node(1)'s NORMAL -> node(0)'s NORMAL -> node(0)'s DMA.
      
      [Zone-based order]
      This policy treats the zone type as the most important parameter.
      The zonelist order is created from the zone-type order.  This means a lower
      zone is never used before the higher zones are exhausted.  In other words,
      ZONE_DMA will always be at the tail of the zonelist.
      
      Pros.
       * OOM_KILL (because of a lower zone) occurs only if all zones are exhausted.
      Cons.
       * Memory locality may not be the best.
      
      (example)
      assume 2 node NUMA. node(0) has ZONE_DMA/ZONE_NORMAL, node(1) has ZONE_NORMAL.
      
      *node(0)'s memory allocation order:
      
       node(0)'s NORMAL -> node(1)'s NORMAL -> node(0)'s DMA.
      
      *node(1)'s memory allocation order:
      
       node(1)'s NORMAL -> node(0)'s NORMAL -> node(0)'s DMA.
      
      The boot option "numa_zonelist_order=" and a proc/sysctl interface are supported.
      
      command:
      %echo N > /proc/sys/vm/numa_zonelist_order
      
      Will rebuild zonelist in Node-based order.
      
      command:
      %echo Z > /proc/sys/vm/numa_zonelist_order
      
      Will rebuild zonelist in Zone-based order.
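      
      Purely as a model of the two orderings (user-space, using the two-node
      layout from the examples above; none of this is the kernel's zonelist
      code):
      
        #include <stdio.h>
        
        enum zone_type { ZONE_DMA = 0, ZONE_NORMAL = 1 };
        
        struct zone { int node; enum zone_type type; };
        
        /* Node 0 has DMA + NORMAL, node 1 has NORMAL only. */
        static const struct zone zones[] = {
            { 0, ZONE_DMA }, { 0, ZONE_NORMAL }, { 1, ZONE_NORMAL },
        };
        #define NZONES (sizeof(zones) / sizeof(zones[0]))
        
        static void print_zone(const struct zone *z)
        {
            printf(" node%d/%s", z->node, z->type == ZONE_DMA ? "DMA" : "NORMAL");
        }
        
        int main(void)
        {
            int local = 0;   /* build node 0's zonelist */
        
            /* Node-based: local node's zones first (highest type first),
             * then remote nodes' zones. */
            printf("Node-based:");
            for (int pass = 0; pass < 2; pass++)        /* pass 0 = local node */
                for (int t = ZONE_NORMAL; t >= ZONE_DMA; t--)
                    for (unsigned int i = 0; i < NZONES; i++)
                        if ((zones[i].node == local) == (pass == 0) &&
                            zones[i].type == (enum zone_type)t)
                            print_zone(&zones[i]);
            printf("\n");
        
            /* Zone-based: all NORMAL zones (local first), then all DMA zones. */
            printf("Zone-based:");
            for (int t = ZONE_NORMAL; t >= ZONE_DMA; t--)
                for (int pass = 0; pass < 2; pass++)
                    for (unsigned int i = 0; i < NZONES; i++)
                        if ((zones[i].node == local) == (pass == 0) &&
                            zones[i].type == (enum zone_type)t)
                            print_zone(&zones[i]);
            printf("\n");
            return 0;
        }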
      
      Thanks to Lee Schermerhorn, who gave me much help and code.
      
      [Lee.Schermerhorn@hp.com: add check_highest_zone to build_zonelists_in_zone_order]
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: "jesse.barnes@intel.com" <jesse.barnes@intel.com>
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f0c0b2b8
  3. 16 June 2007 (1 commit)
    • mm: Fix memory/cpu hotplug section mismatch and oops. · d09c6b80
      Paul Mundt authored
      When building with memory hotplug enabled and cpu hotplug disabled, we
      end up with the following section mismatch:
      
      WARNING: mm/built-in.o(.text+0x4e58): Section mismatch: reference to
      .init.text: (between 'free_area_init_node' and '__build_all_zonelists')
      
      This happens as a result of:
      
              -> free_area_init_node()
                -> free_area_init_core()
                  -> zone_pcp_init() <-- all __meminit up to this point
                    -> zone_batchsize() <-- marked as __cpuinit
      
      This happens because CONFIG_HOTPLUG_CPU=n sets __cpuinit to __init, but
      CONFIG_MEMORY_HOTPLUG=y unsets __meminit.
      
      Changing zone_batchsize() to __devinit fixes this.
      
      __devinit is the only thing that is common between CONFIG_HOTPLUG_CPU=y and
      CONFIG_MEMORY_HOTPLUG=y. In the long run, perhaps this should be moved to
      another section identifier completely. Without this, memory hot-add
      of offline nodes (via hotadd_new_pgdat()) will oops if CPU hotplug is
      not also enabled.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      
      --
      
       mm/page_alloc.c |    2 +-
       1 file changed, 1 insertion(+), 1 deletion(-)
      d09c6b80
  4. 31 May 2007 (1 commit)
  5. 24 May 2007 (1 commit)
  6. 19 May 2007 (1 commit)
    • mm: fix section mismatch warnings · 577a32f6
      Sam Ravnborg authored
      modpost had two cases hardcoded for mm/.
      Shift over to __init_refok and kill the
      hardcoded function names in modpost.
      
      This has the drawback that the functions
      will always be kept no matter the configuration.
      With the previous code the functions were placed
      in the init section if the configuration allowed it.
      Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
      577a32f6
  7. 11 May 2007 (1 commit)
  8. 10 May 2007 (2 commits)
    • Move remote node draining out of slab allocators · 4037d452
      Christoph Lameter authored
      Currently the slab allocators contain callbacks into the page allocator to
      perform the draining of pagesets on remote nodes.  This requires SLUB to have
      a whole subsystem in order to be compatible with SLAB.  Moving node draining
      out of the slab allocators avoids a section of code in SLUB.
      
      Move the node draining so that it is done when the vm statistics are updated.
      At that point we are already touching all the cachelines with the pagesets of
      a processor.
      
      Add an expire counter there.  If we have to update per-zone or global vm
      statistics then assume that the pageset will require subsequent draining.
      
      The expire counter will be decremented on each vm stats update pass until it
      reaches zero.  Then we will drain one batch from the pageset.  The draining
      will cause vm counter updates which will then cause another expiration until
      the pcp is empty.  So we will drain a batch every 3 seconds.
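      
      A very loose user-space model of that deferred drain (the names and the
      re-arm policy below are assumptions made for illustration; the real logic
      lives in the vm statistics update path):
      
        #include <stdio.h>
        
        /* Model: a remote pageset is drained one batch at a time from the
         * periodic statistics update, instead of from the slab allocators. */
        int main(void)
        {
            int pcp_count = 100;   /* pages sitting in a remote pcp */
            int batch = 31;
            int expire = 3;        /* drain only after a few idle stat passes */
        
            for (int pass = 1; pcp_count > 0; pass++) {
                if (--expire > 0) {
                    printf("pass %d: expire=%d, nothing drained\n", pass, expire);
                    continue;
                }
                if (pcp_count < batch)
                    batch = pcp_count;
                pcp_count -= batch;
                expire = 1;        /* the drain touched counters: drain again next pass */
                printf("pass %d: drained %d pages, %d left\n", pass, batch, pcp_count);
            }
            return 0;
        }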
      
      Note that remote node draining is a somewhat esoteric feature that is required
      on large NUMA systems because otherwise significant portions of system memory
      can become trapped in pcp queues.  The number of pcps is determined by the
      number of processors and nodes in a system.  A system with 4 processors and 2
      nodes has 8 pcps, which is okay.  But a system with 1024 processors and 512
      nodes has 512k pcps, with a high potential for large amounts of memory being
      caught in them.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4037d452
    • Add suspend-related notifications for CPU hotplug · 8bb78442
      Rafael J. Wysocki authored
      Since nonboot CPUs are now disabled after tasks and devices have been
      frozen and the CPU hotplug infrastructure is used for this purpose, we need
      special CPU hotplug notifications that will help the CPU-hotplug-aware
      subsystems distinguish normal CPU hotplug events from CPU hotplug events
      related to a system-wide suspend or resume operation in progress.  This
      patch introduces such notifications and causes them to be used during
      suspend and resume transitions.  It also changes all of the
      CPU-hotplug-aware subsystems to take these notifications into consideration
      (for now they are handled in the same way as the corresponding "normal"
      ones).
      
      [oleg@tv-sign.ru: cleanups]
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8bb78442
  9. 09 May 2007 (2 commits)
  10. 08 May 2007 (6 commits)
  11. 02 March 2007 (1 commit)
  12. 21 February 2007 (2 commits)
  13. 12 February 2007 (8 commits)
    • [PATCH] optional ZONE_DMA: optional ZONE_DMA in the VM · 4b51d669
      Christoph Lameter authored
      Make ZONE_DMA optional in core code.
      
      - ifdef all code for ZONE_DMA and related definitions following the example
        for ZONE_DMA32 and ZONE_HIGHMEM.
      
      - Without ZONE_DMA, ZONE_HIGHMEM and ZONE_DMA32 we get to a ZONES_SHIFT of
        0.
      
      - Modify the VM statistics to work correctly without a DMA zone.
      
      - Modify slab to not create DMA slabs if there is no ZONE_DMA.
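      
      For illustration of the ZONES_SHIFT point above, the relationship between
      the number of configured zones and the page-flag bits needed to encode a
      zone index (a stand-alone model, not the kernel's macro):
      
        #include <stdio.h>
        
        /* Bits needed to encode a zone index: 1 zone -> 0, 2 -> 1, 3-4 -> 2. */
        static int zones_shift(int nr_zones)
        {
            int shift = 0;
        
            while ((1 << shift) < nr_zones)
                shift++;
            return shift;
        }
        
        int main(void)
        {
            printf("ZONE_NORMAL only               -> shift %d\n", zones_shift(1));
            printf("ZONE_DMA + ZONE_NORMAL         -> shift %d\n", zones_shift(2));
            printf("DMA + DMA32 + NORMAL + HIGHMEM -> shift %d\n", zones_shift(4));
            return 0;
        }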
      
      [akpm@osdl.org: cleanup]
      [jdike@addtoit.com: build fix]
      [apw@shadowen.org: Simplify calculation of the number of bits we need for ZONES_SHIFT]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Matthew Wilcox <willy@debian.org>
      Cc: James Bottomley <James.Bottomley@steeleye.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4b51d669
    • [PATCH] optional ZONE_DMA: deal with cases of ZONE_DMA meaning the first zone · 6267276f
      Christoph Lameter authored
      This patchset follows up on the earlier work in Andrew's tree to reduce the
      number of zones.  The patches allow going down to a minimum of 2 zones.  This
      one also makes ZONE_DMA optional, so the number of zones can be reduced
      to one.
      
      ZONE_DMA is usually used for ISA DMA devices.  There are a number of reasons
      why we would not want to have ZONE_DMA:
      
      1. Some arches do not need ZONE_DMA at all.
      
      2. With the advent of IOMMUs DMA zones are no longer needed.
         The necessity of DMA zones may drastically be reduced
         in the future. This patchset allows a compilation of
         a kernel without that overhead.
      
      3. Devices that require ISA DMA are getting rare these days.  None
         of my systems has any need for ISA DMA.
      
      4. The presence of an additional zone unnecessarily complicates
         VM operations because it must be scanned and balancing
         logic must operate on it.
      
      5. With only ZONE_NORMAL one can reach the situation where
         we have only one zone.  This will allow the unrolling of many
         loops in the VM and allows the optimization of various
         code paths in the VM.
      
      6. Having only a single zone in a NUMA system results in a
         1-1 correspondence between nodes and zones. Various additional
         optimizations to critical VM paths become possible.
      
      Many systems today can operate just fine with a single zone.  If you look at
      what is in ZONE_DMA then you usually see that nothing uses it.  The DMA slabs
      are empty (some arches use ZONE_DMA instead of ZONE_NORMAL; then ZONE_NORMAL
      will be empty instead).
      
      On all of my systems (i386, x86_64, ia64) ZONE_DMA is completely empty.  Why
      constantly look at an empty zone in /proc/zoneinfo and empty slab in
      /proc/slabinfo?  Non-i386 arches also frequently have no need for ZONE_DMA
      and the zones stay empty.
      
      The patchset was tested on i386 (UP / SMP), x86_64 (UP, NUMA) and ia64 (NUMA).
      
      The RFC posted earlier (see
      http://marc.theaimsgroup.com/?l=linux-kernel&m=115231723513008&w=2) had lots
      of #ifdefs in it.  An effort has been made to minimize the number of #ifdefs
      and make this as compact as possible.  The job was made much easier by the
      ongoing efforts of others to extract common arch specific functionality.
      
      I have been running this for a while now on my desktop and finally Linux is
      using all my available RAM instead of leaving the 16MB in ZONE_DMA untouched:
      
      christoph@pentium940:~$ cat /proc/zoneinfo
      Node 0, zone   Normal
        pages free     4435
              min      1448
              low      1810
              high     2172
              active   241786
              inactive 210170
              scanned  0 (a: 0 i: 0)
              spanned  524224
              present  524224
          nr_anon_pages 61680
          nr_mapped    14271
          nr_file_pages 390264
          nr_slab_reclaimable 27564
          nr_slab_unreclaimable 1793
          nr_page_table_pages 449
          nr_dirty     39
          nr_writeback 0
          nr_unstable  0
          nr_bounce    0
          cpu: 0 pcp: 0
                    count: 156
                    high:  186
                    batch: 31
          cpu: 0 pcp: 1
                    count: 9
                    high:  62
                    batch: 15
        vm stats threshold: 20
          cpu: 1 pcp: 0
                    count: 177
                    high:  186
                    batch: 31
          cpu: 1 pcp: 1
                    count: 12
                    high:  62
                    batch: 15
        vm stats threshold: 20
        all_unreclaimable: 0
        prev_priority:     12
        temp_priority:     12
        start_pfn:         0
      
      This patch:
      
      In two places in the VM we use ZONE_DMA to refer to the first zone.  If
      ZONE_DMA is optional then other zones may be first.  So simply replace
      ZONE_DMA with zone 0.
      
      This also fixes ZONETABLE_PGSHIFT.  If we have only a single zone then
      ZONES_PGSHIFT may become 0 because there is no need anymore to encode the zone
      number related to a pgdat.  However, we still need a zonetable to index all
      the zones for each node if this is a NUMA system.  Therefore define
      ZONETABLE_SHIFT unconditionally as the offset of the ZONE field in page flags.
      
      [apw@shadowen.org: fix mismerge]
      Acked-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Matthew Wilcox <willy@debian.org>
      Cc: James Bottomley <James.Bottomley@steeleye.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6267276f
    • [PATCH] Drop get_zone_counts() · 65e458d4
      Christoph Lameter authored
      Values are available via ZVC sums.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      65e458d4
    • [PATCH] Drop nr_free_pages_pgdat() · 9195481d
      Christoph Lameter authored
      Function is unnecessary now.  We can use the summing features of the ZVCs to
      get the values we need.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9195481d
    • [PATCH] Drop free_pages() · 96177299
      Christoph Lameter authored
      nr_free_pages is now a simple access to a global variable.  Make it a macro
      instead of a function.
      
      nr_free_pages now requires vmstat.h to be included.  There is one
      occurrence in power management where we need to add the include.  Directly
      refer to global_page_state() there to clarify why the #include was added.
      
      [akpm@osdl.org: arm build fix]
      [akpm@osdl.org: sparc64 build fix]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      96177299
    • [PATCH] Use ZVC for free_pages · d23ad423
      Christoph Lameter authored
      This again simplifies some of the VM counter calculations through the use
      of the consolidated ZVC counters.
      
      [michal.k.k.piotrowski@gmail.com: build fix]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d23ad423
    • [PATCH] Use ZVC for inactive and active counts · c8785385
      Christoph Lameter authored
      The dirty ratio used to determine writeback behavior is currently based on
      the total number of pages in the system.
      
      However, not all pages in the system may be dirtied.  Thus the ratio is always
      too low and can never reach 100%.  The ratio may be particularly skewed if
      large hugepage allocations, slab allocations or device driver buffers make
      large sections of memory unavailable.  In that case we may get into
      a situation in which, for example, the background writeback ratio of 40% cannot
      be reached anymore, which leads to undesired writeback behavior.
      
      This patchset fixes that issue by determining the ratio based on the actual
      pages that may potentially be dirty.  These are the pages on the active and
      the inactive list plus free pages.
      
      The problem with those counts has so far been that it is expensive to
      calculate these because counts from multiple nodes and multiple zones will
      have to be summed up.  This patchset makes these counters ZVC counters.  This
      means that a current sum per zone, per node and for the whole system is always
      available via global variables and not expensive anymore to calculate.
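      
      Roughly, the new basis for the ratio can be modelled like this (user-space
      arithmetic with hypothetical numbers, not the kernel's calculation):
      
        #include <stdio.h>
        
        int main(void)
        {
            /* Hypothetical ZVC-style global counters, in pages. */
            unsigned long nr_free = 20000, nr_active = 150000, nr_inactive = 80000;
            unsigned long total_pages = 524288;        /* includes slab, hugepages, ... */
            unsigned long dirty_background_ratio = 40; /* percent */
        
            /* Old basis: all pages.  New basis: only pages that can become dirty. */
            unsigned long dirtyable = nr_free + nr_active + nr_inactive;
        
            printf("threshold (old basis): %lu pages\n",
                   total_pages * dirty_background_ratio / 100);
            printf("threshold (new basis): %lu pages\n",
                   dirtyable * dirty_background_ratio / 100);
            return 0;
        }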
      
      The patchset results in some other good side effects:
      
      - Removal of the various functions that sum up free, active and inactive
        page counts
      
      - Cleanup of the functions that display information via the proc filesystem.
      
      This patch:
      
      The use of a ZVC for nr_inactive and nr_active allows a simplification of some
      counter operations.  More ZVC functionality is used for sums etc in the
      following patches.
      
      [akpm@osdl.org: UP build fix]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c8785385
    • [PATCH] Avoid excessive sorting of early_node_map[] · a6af2bc3
      Mel Gorman authored
      find_min_pfn_for_node() and find_min_pfn_with_active_regions() sort
      early_node_map[] on every call.  This is an excessive amount of sorting,
      which can be avoided.  This patch always searches the whole early_node_map[]
      in find_min_pfn_for_node() instead of returning the first value found.  The
      map is then only sorted once when required.  Successfully boot tested on a
      number of machines.
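      
      Conceptually, the change replaces "sort, then take the first entry" with a
      plain minimum scan; a stand-alone model (the struct layout here is an
      assumption):
      
        #include <stdio.h>
        
        struct active_range { unsigned long start_pfn, end_pfn; };
        
        /* Return the lowest start_pfn by scanning the whole map; no sort needed. */
        static unsigned long find_min_pfn(const struct active_range *map, int n)
        {
            unsigned long min_pfn = ~0UL;
        
            for (int i = 0; i < n; i++)
                if (map[i].start_pfn < min_pfn)
                    min_pfn = map[i].start_pfn;
            return min_pfn;
        }
        
        int main(void)
        {
            struct active_range map[] = {
                { 4096, 8192 }, { 256, 1024 }, { 16384, 32768 },
            };
        
            printf("min pfn = %lu\n", find_min_pfn(map, 3));
            return 0;
        }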
      
      [akpm@osdl.org: cleanup]
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a6af2bc3
  14. 10 February 2007 (1 commit)
  15. 01 February 2007 (1 commit)
  16. 12 January 2007 (1 commit)
  17. 06 January 2007 (2 commits)
    • [PATCH] Check for populated zone in __drain_pages · f2e12bb2
      Christoph Lameter authored
      Both process_zones() and drain_node_pages() check for populated zones
      before touching pagesets.  However, __drain_pages() does not do so.
      
      This may result in a NULL pointer dereference for pagesets in unpopulated
      zones if a NUMA setup is combined with cpu hotplug.
      
      Initially the unpopulated zone has the pcp pointers pointing to the boot
      pagesets.  Since the zone is not populated the boot pageset pointers will
      not be changed during page allocator and slab bootstrap.
      
      If a cpu is later brought down (first call to __drain_pages()) then the pcp
      pointers for cpus in unpopulated zones are set to NULL since __drain_pages
      does not first check for an unpopulated zone.
      
      If the cpu is then brought up again then we call process_zones() which will
      ignore the unpopulated zone.  So the pageset pointers will still be NULL.
      
      If the cpu is then again brought down then __drain_pages will attempt to
      drain pages by following the NULL pageset pointer for unpopulated zones.
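      
      The fix amounts to checking zone population before touching pagesets; a
      stand-alone model of the guard (the names and layout here are
      illustrative):
      
        #include <stdio.h>
        
        struct zone {
            const char *name;
            unsigned long present_pages;  /* 0 for an unpopulated zone */
            int *pageset;                 /* NULL until the zone is populated */
        };
        
        static void drain_zone(struct zone *z)
        {
            /* The added guard: skip zones that were never populated, whose
             * pageset pointers may be NULL or still point at boot pagesets. */
            if (z->present_pages == 0)
                return;
            *z->pageset = 0;              /* "drain" the per-cpu pages */
        }
        
        int main(void)
        {
            int pages = 42;
            struct zone populated = { "Normal", 1000, &pages };
            struct zone empty = { "DMA32", 0, NULL };
        
            drain_zone(&populated);
            drain_zone(&empty);           /* without the check this would crash */
            printf("drained without a NULL dereference\n");
            return 0;
        }
      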
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      f2e12bb2
    • [PATCH] Sanely size hash tables when using large base pages · 9ab37b8f
      Paul Mundt authored
      At the moment the inode/dentry cache hash tables (common by way of
      alloc_large_system_hash()) are incorrectly sized by their respective
      detection logic when we attempt to use large base pages on systems with
      little memory.
      
      This results in odd behaviour when using a 64kB PAGE_SIZE, such as:
      
      Dentry cache hash table entries: 8192 (order: -1, 32768 bytes)
      Inode-cache hash table entries: 4096 (order: -2, 16384 bytes)
      
      The mount cache hash table is seemingly the only one that gets this right
      by directly taking PAGE_SIZE into account.
      
      The following patch attempts to catch the bogus values and round them up to
      at least 0-order.
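      
      A small model of the sizing bug and the rounding (user-space only; the
      numbers match the dentry-cache example above):
      
        #include <stdio.h>
        
        /* With a 64kB base page a 32kB hash table does not even fill one page,
         * so a naive integer log2(bytes / PAGE_SIZE) ends up negative. */
        static long naive_order(unsigned long bytes, unsigned long page_size)
        {
            long order = -1;
        
            while (bytes >= page_size) {
                bytes >>= 1;
                order++;
            }
            return order;
        }
        
        int main(void)
        {
            unsigned long page_size = 65536;    /* 64kB PAGE_SIZE */
            unsigned long table_bytes = 32768;  /* dentry hash from the example */
            long order = naive_order(table_bytes, page_size);
        
            if (order < 0)
                order = 0;                      /* the rounding the patch adds */
            printf("order %ld -> %lu bytes allocated\n", order, page_size << order);
            return 0;
        }
      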
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      9ab37b8f