1. 10 Oct 2007, 1 commit
  2. 20 Jul 2007, 1 commit
  3. 19 Jul 2007, 1 commit
    • [FS] Implement block_page_mkwrite. · 54171690
      Authored by David Chinner
      Many filesystems need a ->page_mkwrite callout to correctly
      set up pages that have been written to by mmap. This is especially
      important when mmap is writing into holes, as it allows filesystems
      to correctly account for and allocate space before the mmap
      write is allowed to proceed.
      
      Protection against truncate races is provided by locking the page
      and checking whether the page mapping is correct and whether
      it is beyond EOF, so we don't end up allowing allocations beyond
      the current EOF or changing EOF as a result of an mmap write.
      
      SGI-PV: 940392
      SGI-Modid: 2.6.x-xfs-melb:linux:29146a
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
      54171690
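As a rough illustration of the truncate-race check described above, here is a small userspace model. The struct layout, `PAGE_SIZE` handling, and `page_mkwrite_check()` are all hypothetical stand-ins, not the kernel API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of the truncate-race check:
 * names are illustrative, not kernel code. */
struct mapping { unsigned long long size; };      /* stands in for i_size */
struct page    { struct mapping *mapping; unsigned long index; };

#define PAGE_SIZE 4096ULL

/* Returns the number of valid bytes in the page, or 0 if the fault must
 * fail: the page was truncated away (mapping changed) or lies wholly
 * beyond EOF.  With the page "locked", these checks stay stable. */
static unsigned long long
page_mkwrite_check(struct page *page, struct mapping *inode_mapping)
{
    if (page->mapping != inode_mapping)
        return 0;                               /* page was truncated */
    unsigned long long off = (unsigned long long)page->index * PAGE_SIZE;
    if (off >= inode_mapping->size)
        return 0;                               /* wholly beyond EOF */
    unsigned long long end = inode_mapping->size - off;
    return end > PAGE_SIZE ? PAGE_SIZE : end;   /* partial last page */
}
```

Only the page within (or straddling) EOF is allowed to proceed; anything beyond it would otherwise let an mmap write extend or allocate past the file's current size.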
  4. 18 Jul 2007, 3 commits
    • fs: introduce some page/buffer invariants · 787d2214
      Authored by Nick Piggin
      It is a bug to set a page dirty if it is not uptodate unless it has
      buffers.  If the page has buffers, then the page may be dirty (some buffers
      dirty) but not uptodate (some buffers not uptodate).  The exception to this
      rule is if the set_page_dirty caller is racing with truncate or invalidate.
      
      A buffer cannot be set dirty if it is not uptodate.
      
      If either of these situations occurs, it indicates a possible data
      loss problem.  Some of these warnings may be harmless cases where the
      page or buffer is set uptodate immediately after it is dirtied; however,
      we should fix those up and enforce this ordering.
      
      Bring the order of operations for truncate into line with those of
      invalidate.  This will prevent a page from being able to go !uptodate while
      we're holding the tree_lock, which is probably a good thing anyway.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      787d2214
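The two invariants above can be sketched as a tiny userspace model (struct and function names are illustrative, not kernel code):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model of the invariants; flag names mimic page flags
 * but this is not kernel code. */
struct model_page {
    bool uptodate;
    bool dirty;
    bool has_buffers;
};

/* Rule 1: a dirty page must be uptodate unless it has buffers (some of
 * which may individually be dirty while others are not yet uptodate). */
static bool page_dirty_invariant_ok(const struct model_page *p)
{
    if (p->dirty && !p->uptodate && !p->has_buffers)
        return false;
    return true;
}

/* Rule 2: a buffer may never be dirty while not uptodate. */
static bool buffer_dirty_invariant_ok(bool buf_dirty, bool buf_uptodate)
{
    return !(buf_dirty && !buf_uptodate);
}
```

A warning from either check would flag the ordering bugs the commit describes, modulo the truncate/invalidate race it exempts.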
    • Lumpy Reclaim V4 · 5ad333eb
      Authored by Andy Whitcroft
      When we are out of memory of a suitable size we enter reclaim.  The current
      reclaim algorithm targets pages in LRU order, which is great for fairness at
      order-0 but highly unsuitable if you desire pages at higher orders.  To get
      pages of higher order we must shoot down a very high proportion of memory;
      >95% in a lot of cases.
      
      This patch set adds a lumpy reclaim algorithm to the allocator.  It targets
      groups of pages at the specified order anchored at the end of the active and
      inactive lists.  This encourages groups of pages at the requested orders to
      move from active to inactive, and active to free lists.  This behaviour is
      only triggered out of direct reclaim when higher order pages have been
      requested.
      
      This patch set is particularly effective when utilised with an
      anti-fragmentation scheme which groups pages of similar reclaimability
      together.
      
      This patch set is based on Peter Zijlstra's lumpy reclaim V2 patch which forms
      the foundation.  Credit to Mel Gorman for sanity checking.
      
      Mel said:
      
        The patches have an application with hugepage pool resizing.
      
        When lumpy-reclaim is used with ZONE_MOVABLE, the hugepage pool can
        be resized with greater reliability.  Testing on a desktop machine with 2GB
        of RAM showed that growing the hugepage pool with ZONE_MOVABLE on its own
        was very slow, as the success rate was quite low.  Without lumpy-reclaim,
        each attempt to grow the pool by 100 pages would yield 1 or 2 hugepages.
        With lumpy-reclaim, getting 40 to 70 hugepages on each attempt was typical.
      
      [akpm@osdl.org: ia64 pfn_to_nid fixes and loop cleanup]
      [bunk@stusta.de: static declarations for internal functions]
      [a.p.zijlstra@chello.nl: initial lumpy V2 implementation]
      Signed-off-by: Andy Whitcroft <apw@shadowen.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Bob Picco <bob.picco@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5ad333eb
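A minimal sketch of the "lumpy" idea, assuming a toy frame array instead of the real LRU lists (the function name and layout are invented for illustration):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: starting from one candidate page frame, consider the whole
 * naturally aligned 2^order block it belongs to, and reclaim it only if
 * every frame in the block is reclaimable.  Purely illustrative; the
 * real scanner lives in mm/vmscan.c. */
static bool try_reclaim_block(const bool reclaimable[], int nframes,
                              int pfn, int order)
{
    int size = 1 << order;
    int base = pfn & ~(size - 1);       /* align down to block start */
    if (base + size > nframes)
        return false;
    for (int i = base; i < base + size; i++)
        if (!reclaimable[i])
            return false;               /* one pinned frame spoils the block */
    return true;
}
```

This is what makes the approach effective together with anti-fragmentation grouping: when pages of similar reclaimability sit in the same aligned block, far fewer blocks are spoiled by a single unreclaimable frame.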
    • Add __GFP_MOVABLE for callers to flag allocations from high memory that may be migrated · 769848c0
      Authored by Mel Gorman
      It is often known at allocation time whether a page may be migrated or not.
      This patch adds a flag called __GFP_MOVABLE and a new mask called
      GFP_HIGH_MOVABLE.  Allocations using __GFP_MOVABLE can be either migrated
      using the page migration mechanism or reclaimed by syncing with backing
      storage and discarding.
      
      An API function very similar to alloc_zeroed_user_highpage() is added for
      __GFP_MOVABLE allocations called alloc_zeroed_user_highpage_movable().  The
      flags used by alloc_zeroed_user_highpage() are not changed because it would
      change the semantics of an existing API.  After this patch is applied there
      are no in-kernel users of alloc_zeroed_user_highpage() so it probably should
      be marked deprecated if this patch is merged.
      
      Note that this patch includes a minor cleanup to the use of __GFP_ZERO in
      shmem.c to keep all flag modifications to inode->mapping in the
      shmem_dir_alloc() helper function.  This clean-up suggestion is courtesy of
      Hugh Dickins.
      
      Additional credit goes to Christoph Lameter and Linus Torvalds for shaping the
      concept.  Credit to Hugh Dickins for catching issues with the shmem swap
      vector and ramfs allocations.
      
      [akpm@linux-foundation.org: build fix]
      [hugh@veritas.com: __GFP_ZERO cleanup]
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      769848c0
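The flag arithmetic can be illustrated with a hedged sketch; the numeric flag values below are made up, and only the set/test pattern mirrors the real GFP flags:

```c
#include <assert.h>
#include <stdbool.h>

/* Invented values for illustration only; the real definitions live in
 * include/linux/gfp.h. */
#define __GFP_HIGHMEM  0x02u
#define __GFP_MOVABLE  0x08u                                  /* new flag */
#define GFP_HIGHUSER        (__GFP_HIGHMEM)
#define GFP_HIGH_MOVABLE    (__GFP_HIGHMEM | __GFP_MOVABLE)   /* new mask */

/* A movable allocation may be migrated, or reclaimed by syncing with its
 * backing store, so the allocator can segregate it from pinned pages. */
static bool alloc_is_movable(unsigned int gfp_mask)
{
    return (gfp_mask & __GFP_MOVABLE) != 0;
}
```

A caller that knows its page can move simply ORs in the flag at allocation time; nothing about the allocation call sequence otherwise changes.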
  5. 17 Jul 2007, 1 commit
  6. 22 May 2007, 1 commit
  7. 17 May 2007, 2 commits
    • Fix page allocation flags in grow_dev_page() · ea125892
      Authored by Christoph Lameter
      grow_dev_page() simply passes GFP_NOFS to find_or_create_page.  This means
      both the radix tree node allocation and the new page allocation are done
      with plain GFP_NOFS.
      
      The mapping has a flags field that contains the necessary allocation flags
      for the page cache allocation.  These need to be consulted in order to get
      DMA and HIGHMEM allocations etc. right.  And yes, a blockdev could be
      allowing highmem allocations if it's a ramdisk.
      
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ea125892
    • Remove SLAB_CTOR_CONSTRUCTOR · a35afb83
      Authored by Christoph Lameter
      SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Michael Halcrow <mhalcrow@us.ibm.com>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Dave Kleikamp <shaggy@austin.ibm.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Anton Altaparmakov <aia21@cantab.net>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jan Kara <jack@ucw.cz>
      Cc: David Chinner <dgc@sgi.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a35afb83
  8. 10 May 2007, 2 commits
    • Add suspend-related notifications for CPU hotplug · 8bb78442
      Authored by Rafael J. Wysocki
      Since nonboot CPUs are now disabled after tasks and devices have been
      frozen and the CPU hotplug infrastructure is used for this purpose, we need
      special CPU hotplug notifications that will help the CPU-hotplug-aware
      subsystems distinguish normal CPU hotplug events from CPU hotplug events
      related to a system-wide suspend or resume operation in progress.  This
      patch introduces such notifications and causes them to be used during
      suspend and resume transitions.  It also changes all of the
      CPU-hotplug-aware subsystems to take these notifications into consideration
      (for now they are handled in the same way as the corresponding "normal"
      ones).
      
      [oleg@tv-sign.ru: cleanups]
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8bb78442
    • fs: convert core functions to zero_user_page · 01f2705d
      Authored by Nate Diller
      It's very common for file systems to need to zero part or all of a page;
      the simplest way is just to use kmap_atomic() and memset().  There's
      actually a library function in include/linux/highmem.h that does exactly
      that, but it's confusingly named memclear_highpage_flush(), which is
      descriptive of *how* it does the work rather than what the *purpose* is.
      So this patchset renames the function to zero_user_page(), and calls it
      from the various places that currently open code it.
      
      This first patch introduces the new function call, and converts all the
      core kernel callsites, both the open-coded ones and the old
      memclear_highpage_flush() ones.  Following this patch is a series of
      conversions for each file system individually, per AKPM, and finally a
      patch deprecating the old call.  The diffstat below shows the entire
      patchset.
      
      [akpm@linux-foundation.org: fix a few things]
      Signed-off-by: Nate Diller <nate.diller@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      01f2705d
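A userspace analogue of the zero_user_page() idea (the kernel version additionally kmaps the page and flushes the dcache; this sketch only shows the offset/size zeroing, and the function name is an invented stand-in):

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Userspace analogue: zero `size` bytes starting at `offset` within one
 * page-sized buffer.  The point of the renamed helper is that callers
 * state the *purpose* (zero part of a page) rather than open-coding the
 * kmap_atomic()/memset() mechanism. */
static void zero_user_page_model(unsigned char *page,
                                 unsigned int offset, unsigned int size)
{
    assert(offset + size <= PAGE_SIZE);   /* never cross the page */
    memset(page + offset, 0, size);
}
```

Each open-coded callsite in the series collapses to one such call, which is why the conversion touches so many filesystems but changes no behaviour.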
  9. 09 May 2007, 2 commits
  10. 08 May 2007, 4 commits
    • slab allocators: Remove SLAB_DEBUG_INITIAL flag · 50953fe9
      Authored by Christoph Lameter
      I have never seen a use of SLAB_DEBUG_INITIAL.  It is only supported by
      SLAB.
      
      I think its purpose was to have a callback after an object has been freed
      to verify that the state is the constructor state again?  The callback is
      performed before each freeing of an object.
      
      I would think that it is much easier to check the object state manually
      before the free.  That also places the check near the code that
      manipulates the object.
      
      Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
      compiled with SLAB debugging on.  If there were code in a constructor
      handling SLAB_DEBUG_INITIAL then it would have to be conditional on
      SLAB_DEBUG, otherwise it would just be dead code.  But there is no such code
      in the kernel.  I think SLAB_DEBUG_INITIAL is too problematic to make real
      use of, difficult to understand, and there are easier ways to accomplish the
      same effect (i.e. add debug code before kfree).
      
      There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
      clear in fs inode caches.  Remove the pointless checks (they would even be
      pointless without removal of SLAB_DEBUG_INITIAL) from the fs constructors.
      
      This is the last slab flag that SLUB did not support.  Remove the check for
      unimplemented flags from SLUB.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      50953fe9
    • mm: optimize kill_bdev() · f9a14399
      Authored by Peter Zijlstra
      Remove duplicate work in kill_bdev().
      
      It currently invalidates and then truncates the bdev's mapping.
      invalidate_mapping_pages() will opportunistically remove pages from the
      mapping.  And truncate_inode_pages() will forcefully remove all pages.
      
      The only thing truncate doesn't do is flush the bh lrus.  So do that
      explicitly.  This avoids very unlikely, but possible, invalid lookup
      results if the same bdev is quickly re-issued.
      
      It also will prevent extreme kernel latencies which are observed when
      blockdevs which have a large amount of pagecache are unmounted, by avoiding
      invalidate_mapping_pages() on that path.  invalidate_mapping_pages() has no
      cond_resched (it can be called under spinlock), whereas truncate_inode_pages()
      has one.
      
      [akpm@linux-foundation.org: restore nrpages==0 optimisation]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f9a14399
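The redundancy being removed can be modeled crudely: invalidate only drops pages it can drop opportunistically, truncate drops everything, so invalidate-then-truncate does double work. A toy model (all names invented):

```c
#include <assert.h>

/* Toy model of the duplicate work removed by this patch. */
enum { CLEAN = 0, DIRTY = 1 };

/* invalidate_mapping_pages() is opportunistic: it skips pages it cannot
 * cheaply drop (modeled here as dirty pages).  Returns pages remaining. */
static int model_invalidate(int pages[], int n)
{
    int left = 0;
    for (int i = 0; i < n; i++)
        if (pages[i] == DIRTY)
            pages[left++] = DIRTY;   /* dirty pages survive invalidate */
    return left;
}

/* truncate_inode_pages() is forceful: it removes all pages regardless,
 * so running invalidate first buys nothing. */
static int model_truncate(int pages[], int n)
{
    (void)pages; (void)n;
    return 0;
}
```

Since truncate alone reaches the same end state, the patch keeps only the truncate (plus an explicit bh-lru flush), which also sidesteps the latency problem: the invalidate pass has no cond_resched, while truncate does.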
    • mm: remove destroy_dirty_buffers from invalidate_bdev() · f98393a6
      Authored by Peter Zijlstra
      Remove the destroy_dirty_buffers argument from invalidate_bdev(), it hasn't
      been used in 6 years (so akpm says).
      
      find * -name \*.[ch] | xargs grep -l invalidate_bdev |
      while read file; do
      	quilt add $file;
      	sed -ie 's/invalidate_bdev(\([^,]*\),[^)]*)/invalidate_bdev(\1)/g' $file;
      done
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f98393a6
    • fs: buffer don't PageUptodate without page locked · 3d67f2d7
      Authored by Nick Piggin
      __block_write_full_page is calling SetPageUptodate without the page locked.
      This is unusual, but not incorrect, as PG_writeback is still set.
      
      However the next patch will require that SetPageUptodate always be called with
      the page locked.  Simply don't bother setting the page uptodate in this case
      (it is unusual that the write path does such a thing anyway).  Instead just
      leave it to the read side to bring the page uptodate when it notices that all
      buffers are uptodate.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3d67f2d7
  11. 07 Mar 2007, 1 commit
  12. 21 Feb 2007, 2 commits
  13. 13 Feb 2007, 2 commits
  14. 12 Feb 2007, 2 commits
    • [PATCH] buffer: memorder fix · 72ed3d03
      Authored by Nick Piggin
      unlock_buffer(), like unlock_page(), must not clear the lock without
      ensuring that the critical section is closed.
      
      Mingming later sent the same patch, saying:
      
        We are running the SDET benchmark and saw a double free issue for the ext3
        extended attributes block, which complains that the same xattr block is
        already being freed (in ext3_xattr_release_block()).  The problem can also
        be triggered by multiple threads looping untar/rm of a kernel tree.
      
        The race is caused by a missing memory barrier at unlock_buffer() before
        the lock bit is cleared, resulting in a possible concurrent h_refcount
        update.  That causes a reference counter leak, which later leads to the
        double free that we have seen.
      
        Inside unlock_buffer(), a memory barrier is placed *after* the lock bit
        is cleared; however, there is no memory barrier *before* the bit is
        cleared.  On some architectures the h_refcount update instruction and the
        clear bit instruction could be reordered, thus leaving the critical
        section re-entered.
      
        The race is like this: For example, if the h_refcount is initialized as 1,
      
        cpu 0:                                   cpu1
        --------------------------------------   -----------------------------------
        lock_buffer() /* test_and_set_bit */
        clear_buffer_locked(bh);
                                                lock_buffer() /* test_and_set_bit */
        h_refcount = h_refcount+1; /* = 2*/     h_refcount = h_refcount + 1; /*= 2 */
                                                clear_buffer_locked(bh);
        ....                                    ......
      
        We lost an h_refcount increment here.  We need a memory barrier before
        the buffer head lock bit is cleared to force the order of the two writes.
        Please apply.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      72ed3d03
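In C11-atomics terms, the fix gives the lock-clearing store release semantics, so writes inside the critical section (like the h_refcount update) cannot be reordered past it. A hedged userspace model, not the kernel's bit-ops API:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Model of the fixed unlock path: the store clearing the lock bit uses
 * release ordering, pairing with acquire in the lock path.  In the kernel
 * the same effect comes from a barrier *before* clearing the bit. */
struct model_bh {
    atomic_bool locked;
    int h_refcount;           /* protected by the lock bit */
};

static bool model_trylock_buffer(struct model_bh *bh)
{
    /* acquire pairs with the release in unlock below */
    return !atomic_exchange_explicit(&bh->locked, true,
                                     memory_order_acquire);
}

static void model_unlock_buffer(struct model_bh *bh)
{
    /* release: all writes before this (e.g. h_refcount++) become visible
     * to the next locker before it can see the lock as free */
    atomic_store_explicit(&bh->locked, false, memory_order_release);
}
```

Without the release ordering, the refcount store and the lock-clearing store may swap on weakly ordered architectures, exactly reproducing the lost-increment race in the table above.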
    • [PATCH] remove invalidate_inode_pages() · fc0ecff6
      Authored by Andrew Morton
      Convert all calls to invalidate_inode_pages() into open-coded calls to
      invalidate_mapping_pages().
      
      Leave the invalidate_inode_pages() wrapper in place for now, marked as
      deprecated.
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fc0ecff6
  15. 30 Jan 2007, 1 commit
  16. 27 Jan 2007, 1 commit
    • Resurrect 'try_to_free_buffers()' VM hackery · ecdfc978
      Authored by Linus Torvalds
      It's not pretty, but it appears that ext3 with data=journal will clean
      pages without ever actually telling the VM that they are clean.  This,
      in turn, results in the VM (and balance_dirty_pages() in particular)
      never realizing that the pages got cleaned, and waiting forever for an
      event that already happened.
      
      Technically, this seems to be a problem with ext3 itself, but it used to
      be hidden by 'try_to_free_buffers()' noticing this situation on its own,
      and just working around the filesystem problem.
      
      This commit re-instates that hack, in order to avoid a regression for
      the 2.6.20 release. This fixes bugzilla 7844:
      
      	http://bugzilla.kernel.org/show_bug.cgi?id=7844
      
      Peter Zijlstra points out that we should probably retain the debugging
      code that this removes from cancel_dirty_page(), and I agree, but for
      the imminent release we might as well just silence the warning too
      (since it's not a new bug: anything that triggers that warning has been
      around forever).
      Acked-by: Randy Dunlap <rdunlap@xenotime.net>
      Acked-by: Jens Axboe <jens.axboe@oracle.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ecdfc978
  17. 12 Jan 2007, 1 commit
  18. 22 Dec 2006, 1 commit
  19. 11 Dec 2006, 3 commits
    • [PATCH] io-accounting: write-cancel accounting · e08748ce
      Authored by Andrew Morton
      Account for the number of bytes of writes that this process caused not to
      happen after all.
      
      Cc: Jay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Cc: David Wright <daw@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      e08748ce
    • [PATCH] io-accounting: write accounting · 55e829af
      Authored by Andrew Morton
      Accounting writes is fairly simple: whenever a process flips a page from clean
      to dirty, we accuse it of having caused a write to underlying storage of
      PAGE_CACHE_SIZE bytes.
      
      This may overestimate the amount of writing: the page-dirtying may cause only
      one buffer_head's worth of writeout.  Fixing that is possible, but probably a
      bit messy and isn't obviously important.
      
      Cc: Jay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Cc: David Wright <daw@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      55e829af
    • [PATCH] clean up __set_page_dirty_nobuffers() · 8c08540f
      Authored by Andrew Morton
      Save a tabstop in __set_page_dirty_nobuffers() and __set_page_dirty_buffers()
      and a few other places.  No functional changes.
      
      Cc: Jay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Cc: David Wright <daw@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      8c08540f
  20. 08 Dec 2006, 2 commits
  21. 17 Oct 2006, 1 commit
    • [PATCH] Fix IO error reporting on fsync() · 58ff407b
      Authored by Jan Kara
      When an IO error happens on a metadata buffer, the buffer is freed from
      memory, and when fsync() is later called, filesystems like ext2 fail to
      report EIO.  We solve the problem by introducing a pointer to the
      associated address space into the buffer_head.  When a buffer is removed
      from the list of metadata buffers associated with an address space, the
      IO error is transferred from the buffer to the address space, so that
      fsync can later report it.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      58ff407b
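The error hand-off can be sketched in a few lines; the struct and function names here are invented stand-ins for the buffer_head/address_space fields the patch adds:

```c
#include <assert.h>
#include <stdbool.h>

#define EIO 5

/* Model of the fix: when a buffer that saw an IO error is removed from
 * the mapping's metadata list, its error is latched in the address space
 * so a later fsync() can still report it even after the buffer is gone. */
struct model_mapping { int error; };
struct model_buffer  { bool io_error; struct model_mapping *assoc_map; };

static void model_remove_from_metadata_list(struct model_buffer *bh)
{
    if (bh->io_error && bh->assoc_map)
        bh->assoc_map->error = -EIO;   /* transfer error before freeing */
    bh->assoc_map = 0;
}

static int model_fsync(struct model_mapping *m)
{
    int err = m->error;
    m->error = 0;                      /* report once, then clear */
    return err;
}
```

Without the hand-off, freeing the buffer discards the only record of the failure, which is exactly the lost-EIO symptom the commit describes.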
  22. 12 Oct 2006, 2 commits
  23. 10 Oct 2006, 1 commit
    • [PATCH] mm: bug in set_page_dirty_buffers · ebf7a227
      Authored by Nick Piggin
      This was triggered by, but is not the fault of, the dirty page accounting
      patches.  Suitable for -stable as well, after it goes upstream.
      
        Unable to handle kernel NULL pointer dereference at virtual address 0000004c
        EIP is at _spin_lock+0x12/0x66
        Call Trace:
         [<401766e7>] __set_page_dirty_buffers+0x15/0xc0
         [<401401e7>] set_page_dirty+0x2c/0x51
         [<40140db2>] set_page_dirty_balance+0xb/0x3b
         [<40145d29>] __do_fault+0x1d8/0x279
         [<40147059>] __handle_mm_fault+0x125/0x951
         [<401133f1>] do_page_fault+0x440/0x59f
         [<4034d0c1>] error_code+0x39/0x40
         [<08048a33>] 0x8048a33
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      ebf7a227
  24. 01 Oct 2006, 1 commit
    • [PATCH] BLOCK: Move functions out of buffer code [try #6] · cf9a2ae8
      Authored by David Howells
      Move some functions out of the buffering code that aren't strictly buffering
      specific.  This is a precursor to being able to disable the block layer.
      
       (*) Moved some stuff out of fs/buffer.c:
      
           (*) The file sync and general sync stuff moved to fs/sync.c.
      
           (*) The superblock sync stuff moved to fs/super.c.
      
           (*) do_invalidatepage() moved to mm/truncate.c.
      
           (*) try_to_release_page() moved to mm/filemap.c.
      
       (*) Moved some related declarations between header files:
      
           (*) declarations for do_invalidatepage() and try_to_release_page() moved
           	 to linux/mm.h.
      
           (*) __set_page_dirty_buffers() moved to linux/buffer_head.h.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      cf9a2ae8
  25. 26 Sep 2006, 1 commit
    • [PATCH] mm: tracking shared dirty pages · d08b3851
      Authored by Peter Zijlstra
      Tracking of dirty pages in shared writeable mmap()s.
      
      The idea is simple: write protect clean shared writeable pages, catch the
      write-fault, make writeable and set dirty.  On page write-back clean all the
      PTE dirty bits and write protect them once again.
      
      The implementation is a tad harder, mainly because the default
      backing_dev_info capabilities were too loosely maintained.  Hence it is not
      enough to test the backing_dev_info for cap_account_dirty.
      
      The current heuristic is as follows; a VMA is eligible when:
       - it is shared writeable
          (vm_flags & (VM_WRITE|VM_SHARED)) == (VM_WRITE|VM_SHARED)
       - it is not a 'special' mapping
          (vm_flags & (VM_PFNMAP|VM_INSERTPAGE)) == 0
       - the backing_dev_info is cap_account_dirty
          mapping_cap_account_dirty(vma->vm_file->f_mapping)
       - f_op->mmap() didn't change the default page protection
      
      Pages from remap_pfn_range() are explicitly excluded because their COW
      semantics are already horrid enough (see vm_normal_page() in do_wp_page())
      and because they don't have a backing store anyway.
      
      mprotect() is taught about the new behaviour as well.  However it overrides
      the last condition.
      
      Cleaning the pages on write-back is done with page_mkclean(), a new rmap
      call.  It can be called on any page, but is currently only implemented for
      mapped pages; if the page is found to belong to a VMA that accounts dirty
      pages, it will also wrprotect the PTE.
      
      Finally, in fs/buffer.c:try_to_free_buffers(), remove clear_page_dirty() from
      under ->private_lock.  This seems to be safe, since ->private_lock is used to
      serialize access to the buffers, not the page itself.  This is needed because
      clear_page_dirty() will call into page_mkclean() and would thereby violate
      locking order.
      
      [dhowells@redhat.com: Provide a page_mkclean() implementation for NOMMU]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      d08b3851
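The eligibility heuristic listed above maps to a straightforward flag test; a hedged sketch with invented flag values (only the test pattern mirrors the commit's conditions):

```c
#include <assert.h>
#include <stdbool.h>

/* Invented values for illustration; the real VM_* flags are defined in
 * include/linux/mm.h. */
#define VM_WRITE       0x0002u
#define VM_SHARED      0x0008u
#define VM_PFNMAP      0x0400u
#define VM_INSERTPAGE  0x0800u

struct model_vma {
    unsigned int vm_flags;
    bool cap_account_dirty;   /* stand-in for mapping_cap_account_dirty() */
};

/* A VMA is eligible for shared-dirty tracking when it is shared writeable,
 * not a 'special' mapping, and its backing_dev_info accounts dirty pages. */
static bool vma_wants_dirty_tracking(const struct model_vma *vma)
{
    if ((vma->vm_flags & (VM_WRITE | VM_SHARED)) != (VM_WRITE | VM_SHARED))
        return false;
    if (vma->vm_flags & (VM_PFNMAP | VM_INSERTPAGE))
        return false;
    return vma->cap_account_dirty;
}
```

Only pages in eligible VMAs are write-protected when clean, so the write fault can intercept the first store and mark the page dirty.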