1. 12 Feb 2007, 21 commits
    • [PATCH] Export invalidate_mapping_pages() to modules · 54bc4855
      Authored by Anton Altaparmakov
      It makes no sense to me to export invalidate_inode_pages() and not
      invalidate_mapping_pages() and I actually need invalidate_mapping_pages()
      because of its range specification ability...
      
      akpm: also remove the export of invalidate_inode_pages() by making it an
      inlined wrapper.
      Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      54bc4855
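      A minimal sketch of the inlined wrapper idea (exact header placement and
      annotations are assumptions): the old whole-file call simply forwards to
      the ranged interface.

          static inline unsigned long invalidate_inode_pages(
                          struct address_space *mapping)
          {
                  /* whole-file invalidation == ranged invalidation over 0..~0UL */
                  return invalidate_mapping_pages(mapping, 0, ~0UL);
          }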
    • [PATCH] lockdep: also check for freed locks in kmem_cache_free() · 898552c9
      Authored by Ingo Molnar
      kmem_cache_free() was missing the check for freeing held locks.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      898552c9
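      A sketch of the added check (surrounding slab internals such as obj_size()
      and __cache_free() are assumed from that era): lockdep's
      debug_check_no_locks_freed() complains if a held lock lives inside the
      memory being freed.

          void kmem_cache_free(struct kmem_cache *cachep, void *objp)
          {
                  unsigned long flags;

                  local_irq_save(flags);
                  /* warn if a held lock is embedded in the object being freed */
                  debug_check_no_locks_freed(objp, obj_size(cachep));
                  __cache_free(cachep, objp);
                  local_irq_restore(flags);
          }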
    • [PATCH] do not disturb page referenced state when unmapping memory range · daa88c8d
      Authored by Ken Chen
      When the kernel unmaps an address range, it needs to transfer PTE state
      into the page struct.  Currently, the kernel transfers the access bit via
      mark_page_accessed().  The call to mark_page_accessed() in the unmap path
      doesn't look logically correct.

      At unmap time, calling mark_page_accessed() causes the page's LRU state to
      be bumped one step closer to the most-recently-used state.  That causes
      quite a bit of headache in a scenario where a process creates a shmem
      segment, touches a whole bunch of pages, then unmaps it.  The unmapping
      takes a long time because mark_page_accessed() starts moving pages from
      the inactive to the active list.

      I'm not too concerned with moving the page from one LRU list to another;
      sooner or later it might be moved anyway because of multiple mappings from
      various processes.  But it just doesn't look logical: when a user asks for
      a range to be unmapped, the intention is that the process is no longer
      interested in these pages.  Moving those pages to the active list (or
      bumping their state towards more active) is an overreaction.  It also
      prolongs unmapping latency, which is the core issue I'm trying to solve.

      As suggested by Peter, we should still preserve the pte-young information
      on such pages, but nothing more (see the sketch after this entry).
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Ken Chen <kenchen@google.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      daa88c8d
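      A sketch of the idea in the unmap path (zap_pte_range(); variable names
      assumed): keep the referenced information without promoting the page on
      the LRU.

          if (pte_dirty(ptent))
                  set_page_dirty(page);
          if (pte_young(ptent))
                  SetPageReferenced(page);        /* was: mark_page_accessed(page) */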
    • [PATCH] simplify shmem_aops.set_page_dirty() method · 76719325
      Authored by Ken Chen
      A shmem-backed file has no page writeback, nor does it participate in the
      backing device's dirty or writeback accounting.  So using the generic
      __set_page_dirty_nobuffers() for its .set_page_dirty aops method is
      overkill.  It unnecessarily prolongs shm unmap latency.

      For example, on a densely populated large shm segment (several GBs), the
      unmapping operation becomes painfully long, because at unmap time the
      kernel transfers the dirty bit from the PTE into the page struct and into
      the radix tree tag.  Tagging the radix tree is particularly expensive
      because it has to traverse the tree from the root to the leaf node for
      every dirty page.  What's bothersome is that the radix tree tag exists to
      drive page writeback, yet shmem is memory backed and has no page writeback
      at all.  In the end, we spend all that time tagging the radix tree and none
      of that fancy tagging is ever used.  So simplify it by introducing a new
      aop, __set_page_dirty_no_writeback, which speeds up shm unmap (a sketch
      follows this entry).
      Signed-off-by: Ken Chen <kenchen@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      76719325
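      A sketch of the new aop (the exact return-value convention is an
      assumption): set the dirty flag on the page and skip the radix-tree
      tagging and bdi accounting entirely.

          /* for address_spaces that never write pages back (e.g. shmem) */
          static int __set_page_dirty_no_writeback(struct page *page)
          {
                  return !TestSetPageDirty(page);
          }

          static const struct address_space_operations shmem_aops = {
                  .set_page_dirty = __set_page_dirty_no_writeback,
                  /* ... other methods unchanged ... */
          };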
    • [PATCH] Set CONFIG_ZONE_DMA for arches with GENERIC_ISA_DMA · 5ac6da66
      Authored by Christoph Lameter
      As Andi pointed out: CONFIG_GENERIC_ISA_DMA only disables the ISA DMA
      channel management.  Other functionality may still expect GFP_DMA to
      provide memory below 16M.  So we need to make sure that CONFIG_ZONE_DMA is
      set independently of CONFIG_GENERIC_ISA_DMA.  Undo the modifications to
      mm/Kconfig where we made ZONE_DMA dependent on GENERIC_ISA_DMA, and set it
      explicitly in each arch's Kconfig instead.

      Reviews must occur for each arch in order to determine whether ZONE_DMA
      can be switched off.  It can only be switched off if we know that all
      devices supported by a platform are capable of performing DMA transfers to
      all of memory (some arches already support this: uml, avr32, sh, sh64,
      parisc and IA64/Altix).

      In order to switch ZONE_DMA off conditionally, one would have to establish
      a scheme by which one can assure that no drivers are enabled that are only
      capable of doing I/O to a part of memory, or one needs to provide an
      alternate means of performing an allocation from a specific range of
      memory (like that provided by alloc_pages_range()) and ensure that all
      drivers use that call.  In that case each arch's dma_alloc_coherent() may
      need to be modified to call alloc_pages_range() instead of relying on
      GFP_DMA.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5ac6da66
    • [PATCH] optional ZONE_DMA: optional ZONE_DMA in the VM · 4b51d669
      Authored by Christoph Lameter
      Make ZONE_DMA optional in core code.
      
      - ifdef all code for ZONE_DMA and related definitions following the example
        for ZONE_DMA32 and ZONE_HIGHMEM.
      
      - Without ZONE_DMA, ZONE_HIGHMEM and ZONE_DMA32 we get to a ZONES_SHIFT of
        0.
      
      - Modify the VM statistics to work correctly without a DMA zone.
      
      - Modify slab to not create DMA slabs if there is no ZONE_DMA.
      
      [akpm@osdl.org: cleanup]
      [jdike@addtoit.com: build fix]
      [apw@shadowen.org: Simplify calculation of the number of bits we need for ZONES_SHIFT]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Matthew Wilcox <willy@debian.org>
      Cc: James Bottomley <James.Bottomley@steeleye.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4b51d669
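      The ifdef pattern, illustrated with the zone enumeration (a simplified
      sketch of include/linux/mmzone.h; the ZONES_SHIFT plumbing is omitted):

          enum zone_type {
          #ifdef CONFIG_ZONE_DMA
                  ZONE_DMA,               /* only present when configured */
          #endif
          #ifdef CONFIG_ZONE_DMA32
                  ZONE_DMA32,
          #endif
                  ZONE_NORMAL,            /* always present */
          #ifdef CONFIG_HIGHMEM
                  ZONE_HIGHMEM,
          #endif
                  MAX_NR_ZONES
          };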
    • [PATCH] optional ZONE_DMA: introduce CONFIG_ZONE_DMA · 66701b14
      Authored by Christoph Lameter
      This patch simply defines CONFIG_ZONE_DMA for all arches.  We later do special
      things with CONFIG_ZONE_DMA after the VM and an arch are prepared to work
      without ZONE_DMA.
      
      CONFIG_ZONE_DMA can be defined in two ways depending on how an architecture
      handles ISA DMA.
      
      First if CONFIG_GENERIC_ISA_DMA is set by the arch then we know that the arch
      needs ZONE_DMA because ISA DMA devices are supported.  We can catch this in
      mm/Kconfig and do not need to modify arch code.
      
      Second, arches may use ZONE_DMA in some other, unknown way.  We set
      CONFIG_ZONE_DMA for all arches that do not set CONFIG_GENERIC_ISA_DMA in
      order to ensure backward compatibility.  Such an arch may later undefine
      ZONE_DMA once its code has been verified not to depend on ZONE_DMA.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Matthew Wilcox <willy@debian.org>
      Cc: James Bottomley <James.Bottomley@steeleye.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      66701b14
    • [PATCH] optional ZONE_DMA: deal with cases of ZONE_DMA meaning the first zone · 6267276f
      Authored by Christoph Lameter
      This patchset follows up on the earlier work in Andrew's tree to reduce
      the number of zones.  The patches allow going down to a minimum of 2
      zones.  This one also allows making ZONE_DMA optional, so the number of
      zones can be reduced to one.

      ZONE_DMA is usually used for ISA DMA devices.  There are a number of
      reasons why we would not want to have ZONE_DMA:
      
      1. Some arches do not need ZONE_DMA at all.
      
      2. With the advent of IOMMUs, DMA zones are often no longer needed.
         The necessity of DMA zones may be drastically reduced
         in the future.  This patchset allows compiling a
         kernel without that overhead.
      
      3. Devices that require ISA DMA are getting rare these days.  None
         of my systems has any need for ISA DMA.

      4. The presence of an additional zone unnecessarily complicates
         VM operations because it must be scanned and the balancing
         logic must operate on it.

      5. With only ZONE_NORMAL one can reach the situation where
         we have only one zone.  This allows the unrolling of many
         loops in the VM and the optimization of various
         code paths in the VM.
      
      6. Having only a single zone in a NUMA system results in a
         1-1 correspondence between nodes and zones. Various additional
         optimizations to critical VM paths become possible.
      
      Many systems today can operate just fine with a single zone.  If you look at
      what is in ZONE_DMA then one usually sees that nothing uses it.  The DMA slabs
      are empty (Some arches use ZONE_DMA instead of ZONE_NORMAL, then ZONE_NORMAL
      will be empty instead).
      
      On all of my systems (i386, x86_64, ia64) ZONE_DMA is completely empty.
      Why constantly look at an empty zone in /proc/zoneinfo and an empty slab
      in /proc/slabinfo?  Non-i386 systems also frequently have no need for
      ZONE_DMA, and their DMA zones stay empty.
      
      The patchset was tested on i386 (UP / SMP), x86_64 (UP, NUMA) and ia64 (NUMA).
      
      The RFC posted earlier (see
      http://marc.theaimsgroup.com/?l=linux-kernel&m=115231723513008&w=2) had
      lots of #ifdefs in it.  An effort has been made to minimize the number of
      #ifdefs and make this as compact as possible.  The job was made much
      easier by the ongoing efforts of others to extract common arch-specific
      functionality.

      I have been running this for a while now on my desktop and finally Linux
      is using all my available RAM instead of leaving the 16MB in ZONE_DMA
      untouched:
      
      christoph@pentium940:~$ cat /proc/zoneinfo
      Node 0, zone   Normal
        pages free     4435
              min      1448
              low      1810
              high     2172
              active   241786
              inactive 210170
              scanned  0 (a: 0 i: 0)
              spanned  524224
              present  524224
          nr_anon_pages 61680
          nr_mapped    14271
          nr_file_pages 390264
          nr_slab_reclaimable 27564
          nr_slab_unreclaimable 1793
          nr_page_table_pages 449
          nr_dirty     39
          nr_writeback 0
          nr_unstable  0
          nr_bounce    0
          cpu: 0 pcp: 0
                    count: 156
                    high:  186
                    batch: 31
          cpu: 0 pcp: 1
                    count: 9
                    high:  62
                    batch: 15
        vm stats threshold: 20
          cpu: 1 pcp: 0
                    count: 177
                    high:  186
                    batch: 31
          cpu: 1 pcp: 1
                    count: 12
                    high:  62
                    batch: 15
        vm stats threshold: 20
        all_unreclaimable: 0
        prev_priority:     12
        temp_priority:     12
        start_pfn:         0
      
      This patch:
      
      In two places in the VM we use ZONE_DMA to refer to the first zone.  If
      ZONE_DMA is optional then other zones may be first.  So simply replace
      ZONE_DMA with zone 0.
      
      This also fixes ZONETABLE_PGSHIFT.  If we have only a single zone then
      ZONES_PGSHIFT may become 0 because there is no longer any need to encode
      the zone number relative to a pgdat.  However, we still need a zonetable
      to index all the zones for each node if this is a NUMA system.  Therefore
      define ZONETABLE_SHIFT unconditionally as the offset of the ZONE field in
      page flags.
      
      [apw@shadowen.org: fix mismerge]
      Acked-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Matthew Wilcox <willy@debian.org>
      Cc: James Bottomley <James.Bottomley@steeleye.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6267276f
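      An illustrative before/after for the "first zone" part of this patch
      (simplified, not the literal diff): code that really means "the first zone
      of this node" should not spell it ZONE_DMA.

          /* before: assumes ZONE_DMA is always zone 0 of the node */
          struct zone *first = pgdat->node_zones + ZONE_DMA;

          /* after: index the first zone directly; valid even without ZONE_DMA */
          struct zone *first = pgdat->node_zones;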
    • [PATCH] Drop get_zone_counts() · 65e458d4
      Authored by Christoph Lameter
      Values are available via ZVC sums.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      65e458d4
    • [PATCH] Drop __get_zone_counts() · 05a0416b
      Authored by Christoph Lameter
      Values are readily available via ZVC per node and global sums.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      05a0416b
    • [PATCH] Drop nr_free_pages_pgdat() · 9195481d
      Authored by Christoph Lameter
      Function is unnecessary now.  We can use the summing features of the ZVCs to
      get the values we need.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9195481d
    • [PATCH] Drop free_pages() · 96177299
      Authored by Christoph Lameter
      nr_free_pages() is now a simple access to a global variable.  Make it a
      macro instead of a function (see the sketch after this entry).

      nr_free_pages() now requires vmstat.h to be included.  There is one
      occurrence in power management where we need to add the include.  Directly
      refer to global_page_state() there to clarify why the #include was added.
      
      [akpm@osdl.org: arm build fix]
      [akpm@osdl.org: sparc64 build fix]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      96177299
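      A sketch of the resulting definition (header placement assumed): with the
      free-page count kept as a ZVC, nr_free_pages() reduces to a read of the
      consolidated counter.

          #define nr_free_pages() global_page_state(NR_FREE_PAGES)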
    • [PATCH] Reorder ZVCs according to cacheline · 51ed4491
      Authored by Christoph Lameter
      The global and per-zone counter sums are kept in arrays of longs.
      Reorder the ZVCs so that the most frequently used ones land in the same
      cacheline.  That way calculations of the global, node and per-zone vm
      state touch only a single cacheline.  This is mostly important for 64-bit
      systems, where a cacheline holds only a handful of longs (a 64-byte line
      holds just eight).
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      51ed4491
    • [PATCH] Use ZVC for free_pages · d23ad423
      Authored by Christoph Lameter
      This again simplifies some of the VM counter calculations through the use
      of the consolidated ZVC counters.
      
      [michal.k.k.piotrowski@gmail.com: build fix]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d23ad423
    • [PATCH] Use ZVC for inactive and active counts · c8785385
      Authored by Christoph Lameter
      The dirty ratio used to steer writeback behavior is currently based on
      the total number of pages in the system.

      However, not all pages in the system can be dirtied.  Thus the ratio is
      always too low and can never reach 100%.  The ratio may be particularly
      skewed if large hugepage allocations, slab allocations or device driver
      buffers make large sections of memory unavailable.  In that case we may
      get into a situation in which, e.g., the background writeback ratio of 40%
      can no longer be reached, which leads to undesired writeback behavior.
      
      This patchset fixes that issue by determining the ratio based on the actual
      pages that may potentially be dirty.  These are the pages on the active and
      the inactive list plus free pages.
      
      The problem with those counts has so far been that it is expensive to
      calculate these because counts from multiple nodes and multiple zones will
      have to be summed up.  This patchset makes these counters ZVC counters.  This
      means that a current sum per zone, per node and for the whole system is always
      available via global variables and not expensive anymore to calculate.
      
      The patchset results in some other good side effects:
      
      - Removal of the various functions that sum up free, active and inactive
        page counts
      
      - Cleanup of the functions that display information via the proc filesystem.
      
      This patch:
      
      The use of a ZVC for nr_inactive and nr_active allows a simplification of some
      counter operations.  More ZVC functionality is used for sums etc in the
      following patches.
      
      [akpm@osdl.org: UP build fix]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c8785385
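      A sketch of how the series uses the new counters (NR_ACTIVE, NR_INACTIVE
      and NR_FREE_PAGES are the ZVC items of that era; the helper name is made
      up for illustration): the dirty thresholds can now be based on pages that
      can actually become dirty, summed cheaply from the ZVC globals.

          static unsigned long dirtyable_pages(void)
          {
                  return global_page_state(NR_FREE_PAGES) +
                         global_page_state(NR_ACTIVE) +
                         global_page_state(NR_INACTIVE);
          }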
    • [PATCH] page_mkwrite caller race fix · c3704ceb
      Authored by Hugh Dickins
      After do_wp_page has tested page_mkwrite, it must release old_page after
      acquiring page table lock, not before: at some stage that ordering got
      reversed, leaving a (very unlikely) window in which old_page might be
      truncated, freed, and reused in the same position.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Acked-by: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c3704ceb
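      A sketch of the required ordering in do_wp_page() (2.6.20-era names,
      heavily simplified): the reference to old_page may be dropped only after
      the page table lock has been re-taken and the pte revalidated.

          if (vma->vm_ops && vma->vm_ops->page_mkwrite)
                  vma->vm_ops->page_mkwrite(vma, old_page);

          page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
          /* ... re-check that the pte still points at old_page ... */

          page_cache_release(old_page);   /* safe only after the lock is held */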
    • [PATCH] /proc/zoneinfo: fix vm stats display · 5a88a13d
      Authored by Andrew Morton
      This early break prevents us from displaying info for the vm stats thresholds
      if the zone doesn't have any pages in its per-cpu pagesets.
      
      So my 800MB i386 box says:
      
      Node 0, zone      DMA
        pages free     2365
              min      16
              low      20
              high     24
              active   0
              inactive 0
              scanned  0 (a: 0 i: 0)
              spanned  4096
              present  4044
          nr_anon_pages 0
          nr_mapped    1
          nr_file_pages 0
          nr_slab_reclaimable 0
          nr_slab_unreclaimable 0
          nr_page_table_pages 0
          nr_dirty     0
          nr_writeback 0
          nr_unstable  0
          nr_bounce    0
          nr_vmscan_write 0
              protection: (0, 868, 868)
        pagesets
        all_unreclaimable: 0
        prev_priority:     12
        start_pfn:         0
      Node 0, zone   Normal
        pages free     199713
              min      934
              low      1167
              high     1401
              active   10215
              inactive 4507
              scanned  0 (a: 0 i: 0)
              spanned  225280
              present  222420
          nr_anon_pages 2685
          nr_mapped    1110
          nr_file_pages 12055
          nr_slab_reclaimable 2216
          nr_slab_unreclaimable 1527
          nr_page_table_pages 213
          nr_dirty     0
          nr_writeback 0
          nr_unstable  0
          nr_bounce    0
          nr_vmscan_write 0
              protection: (0, 0, 0)
        pagesets
          cpu: 0 pcp: 0
                    count: 152
                    high:  186
                    batch: 31
          cpu: 0 pcp: 1
                    count: 13
                    high:  62
                    batch: 15
        vm stats threshold: 16
          cpu: 1 pcp: 0
                    count: 34
                    high:  186
                    batch: 31
          cpu: 1 pcp: 1
                    count: 10
                    high:  62
                    batch: 15
        vm stats threshold: 16
        all_unreclaimable: 0
        prev_priority:     12
        start_pfn:         4096
      
      Just nuke all that search-for-the-first-non-empty-pageset code.  Dunno why it
      was there in the first place..
      
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5a88a13d
    • [PATCH] Avoid excessive sorting of early_node_map[] · a6af2bc3
      Authored by Mel Gorman
      find_min_pfn_for_node() and find_min_pfn_with_active_regions() sort
      early_node_map[] on every call.  This is an excessive amount of sorting,
      and it can be avoided.  This patch always searches the whole
      early_node_map[] in find_min_pfn_for_node() instead of returning the first
      value found, so the map only needs to be sorted once, when required (see
      the sketch after this entry).  Successfully boot tested on a number of
      machines.
      
      [akpm@osdl.org: cleanup]
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a6af2bc3
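      A sketch of the reworked lookup (field and variable names follow the
      early_node_map[] code of that era and should be treated as assumptions):
      scan the whole map for the minimum start PFN instead of relying on sort
      order.

          unsigned long __init find_min_pfn_for_node(unsigned long nid)
          {
                  unsigned long min_pfn = ULONG_MAX;
                  int i;

                  /* examine every registered region, not just the first match */
                  for (i = 0; i < nr_nodemap_entries; i++)
                          if (nid == MAX_NUMNODES || early_node_map[i].nid == nid)
                                  min_pfn = min(min_pfn, early_node_map[i].start_pfn);

                  return min_pfn;
          }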
    • [PATCH] slab: use parameter passed to cache_reap to determine pointer to work structure · 7c5cae36
      Authored by Christoph Lameter
      Use the pointer passed to cache_reap to determine the work pointer and
      consolidate exit paths.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7c5cae36
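      A sketch of the change (REAP_TIMEOUT_CPUC is slab.c's internal re-arm
      interval; the reaping itself is elided): derive the delayed_work from the
      pointer the workqueue already hands us, and re-arm it on the single exit
      path.

          static void cache_reap(struct work_struct *w)
          {
                  struct delayed_work *work =
                          container_of(w, struct delayed_work, work);

                  /* ... drain and shrink the per-CPU caches ... */

                  /* single exit path: re-arm using the same work structure */
                  schedule_delayed_work(work, REAP_TIMEOUT_CPUC);
          }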
    • [PATCH] slab: cache alloc cleanups · 8c8cc2c1
      Authored by Pekka Enberg
      Clean up __cache_alloc and __cache_alloc_node functions a bit.  We no
      longer need to do NUMA_BUILD tricks and the UMA allocation path is much
      simpler.  No functional changes in this patch.
      
      Note: saves a few kernel text bytes on x86 NUMA builds due to using gotos
      in __cache_alloc_node() and moving the __GFP_THISNODE check into
      fallback_alloc().
      
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Acked-by: Christoph Lameter <christoph@lameter.com>
      Cc: Paul Jackson <pj@sgi.com>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8c8cc2c1
    • [PATCH] slab: remove broken PageSlab check from kfree_debugcheck · 6e40e730
      Authored by Pekka Enberg
      The PageSlab debug check in kfree_debugcheck() is broken for compound
      pages.  It is also redundant as we already do BUG_ON for non-slab pages in
      page_get_cache() and page_get_slab() which are always called before we free
      any actual objects.
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6e40e730
  2. 10 Feb 2007, 4 commits
  3. 01 Feb 2007, 1 commit
  4. 31 Jan 2007, 2 commits
    • [PATCH] Don't allow the stack to grow into hugetlb reserved regions · 0d59a01b
      Authored by Adam Litke
      When expanding the stack, we don't currently check if the VMA will cross
      into an area of the address space that is reserved for hugetlb pages.
      Subsequent faults on the expanded portion of such a VMA will confuse the
      low-level MMU code, resulting in an OOPS.  Check for this.
      Signed-off-by: Adam Litke <agl@us.ibm.com>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0d59a01b
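      A sketch of the added check (variable names assumed; is_hugepage_only_range()
      is the existing helper): refuse to grow a stack VMA into an address range
      reserved for hugetlb pages.

          /* in the stack-growth accounting path */
          if (is_hugepage_only_range(vma->vm_mm, new_start, size))
                  return -EFAULT;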
    • [PATCH] mm: mremap correct rmap accounting · 701dfbc1
      Authored by Hugh Dickins
      Nick Piggin points out that page accounting on MIPS's multiple ZERO_PAGEs
      is not maintained by its move_pte, which could lead to freeing a
      ZERO_PAGE.

      Instead of complicating that move_pte, just forget the minor optimization
      when mremapping, and change the one thing which needed it for correctness:
      have filemap_xip use ZERO_PAGE(0) throughout instead of choosing the zero
      page by address.
      
      [ "There is no block device driver one could use for XIP on mips
         platforms" - Carsten Otte ]
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Andrew Morton <akpm@osdl.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      701dfbc1
  5. 30 Jan 2007, 1 commit
    • Fix balance_dirty_page() calculations with CONFIG_HIGHMEM · dc6e29da
      Authored by Linus Torvalds
      This makes balance_dirty_pages() always base its calculations on the
      amount of non-highmem memory in the machine, rather than trying to base
      them on total memory and then falling back to non-highmem memory if the
      mapping being written wasn't highmem capable.
      
      This not only fixes a situation where two different writers can have
      wildly different notions about what is a "balanced" dirty state, but it
      also means that people with highmem machines don't run into an OOM
      situation when regular memory fills up with dirty pages.
      
      We used to try to handle the latter case by scaling down the dirty_ratio
      if the machine had a lot of highmem pages in page_writeback_init(), but
      it wasn't aggressive enough for some situations, and since basing the
      dirty ratio on highmem memory was broken in the first place, let's just
      stop doing so.
      
      (A variation of this theme fixed Justin Piszcz's OOM problem when
      copying an 18GB file on a RAID setup).
      Acked-by: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Justin Piszcz <jpiszcz@lucidpixels.com>
      Cc: Andrew Morton <akpm@osdl.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dc6e29da
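      A conceptual sketch (not the literal diff; totalram_pages, totalhigh_pages,
      dirty_background_ratio and vm_dirty_ratio are the real globals): compute
      both dirty thresholds against lowmem only, so every writer sees the same
      notion of "balanced".

          unsigned long dirtyable = totalram_pages;

          #ifdef CONFIG_HIGHMEM
          /* highmem may hold dirty pages, but never count it as dirtyable */
          dirtyable -= totalhigh_pages;
          #endif

          background_thresh = (dirty_background_ratio * dirtyable) / 100;
          dirty_thresh      = (vm_dirty_ratio * dirtyable) / 100;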
  6. 27 Jan 2007, 4 commits
  7. 23 Jan 2007, 1 commit
  8. 13 Jan 2007, 1 commit
  9. 12 Jan 2007, 2 commits
    • [PATCH] NFS: Fix race in nfs_release_page() · e3db7691
      Authored by Trond Myklebust
          NFS: Fix race in nfs_release_page()
      
          invalidate_inode_pages2() may find the dirty bit has been set on a page
          owing to the fact that the page may still be mapped after it was locked.
          Only after the call to unmap_mapping_range() are we sure that the page
          can no longer be dirtied.
          In order to fix this, NFS has hooked the releasepage() method and
          tries to write the page out between the call to unmap_mapping_range()
          and the call to remove_mapping().  This, however, leads to deadlocks
          in the page reclaim code, where the page may be locked without holding
          a reference to the inode or dentry.

          The fix is to add a new address_space_operation, launder_page(), which
          will attempt to write out a dirty page without releasing the page lock
          (a sketch follows this entry).
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      
          Also, the bare SetPageDirty() can skew all sorts of accounting,
          leading to other nasties.
      
      [akpm@osdl.org: cleanup]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      e3db7691
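      A sketch of the new hook and a caller (the helper name do_launder_page()
      is an assumption): the aop writes a dirty page back while the caller keeps
      holding the page lock.

          struct address_space_operations {
                  /* ... existing methods ... */
                  int (*launder_page)(struct page *page);
          };

          static int do_launder_page(struct address_space *mapping,
                                     struct page *page)
          {
                  if (PageDirty(page) && mapping->a_ops->launder_page)
                          return mapping->a_ops->launder_page(page);
                  return 0;       /* nothing to write, or no hook provided */
          }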
    • [PATCH] Fix sparsemem on Cell · a2f3aa02
      Authored by Dave Hansen
      Fix an oops experienced on the Cell architecture when init-time
      functions, early_*(), are called at runtime.  It alters the call paths to
      make sure that callers explicitly say whether the call is being made on
      behalf of a hotplug event or is happening at boot time (see the sketch
      after this entry).
      
      It has been compile tested on ppc64, ia64, s390, i386 and x86_64.
      Acked-by: Arnd Bergmann <arndb@de.ibm.com>
      Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Acked-by: Andy Whitcroft <apw@shadowen.org>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      a2f3aa02
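      A sketch of the interface change (enum and prototype reconstructed from
      the description above; treat the exact names as assumptions): callers now
      state whether memmap initialisation happens at boot or for hotplug.

          enum memmap_context {
                  MEMMAP_EARLY,           /* boot-time initialisation */
                  MEMMAP_HOTPLUG,         /* memory hotplug event */
          };

          void memmap_init_zone(unsigned long size, int nid, unsigned long zone,
                                unsigned long start_pfn,
                                enum memmap_context context);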
  10. 09 Jan 2007, 1 commit
    • [ARM] pass vma for flush_anon_page() · a6f36be3
      Authored by Russell King
      Since get_user_pages() may be used with processes other than the
      current process and calls flush_anon_page(), flush_anon_page() has to
      cope in some way with non-current processes.
      
      It may not be appropriate, or even desirable to flush a region of
      virtual memory cache in the current process when that is different to
      the process that we want the flush to occur for.
      
      Therefore, pass the vma into flush_anon_page() so that the architecture
      can work out whether the 'vmaddr' is for the current process or not.
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      a6f36be3
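      The prototype change, sketched: the vma is passed so each architecture can
      tell whether vmaddr belongs to the current process's address space.

          /* before */
          void flush_anon_page(struct page *page, unsigned long vmaddr);

          /* after: get_user_pages() and friends pass the vma as well */
          void flush_anon_page(struct vm_area_struct *vma,
                               struct page *page, unsigned long vmaddr);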
  11. 06 Jan 2007, 2 commits