1. 17 10月, 2008 1 次提交
  2. 03 9月, 2008 1 次提交
    • H
      VFS: fix dio write returning EIO when try_to_release_page fails · 6ccfa806
      Hisashi Hifumi 提交于
      Dio write returns EIO when try_to_release_page fails because bh is
      still referenced.
      
      The patch
      
          commit 3f31fddf
          Author: Mingming Cao <cmm@us.ibm.com>
          Date:   Fri Jul 25 01:46:22 2008 -0700
      
              jbd: fix race between free buffer and commit transaction
      
      was merged into 2.6.27-rc1, but I noticed that this patch is not enough
      to fix the race.
      
      I did fsstress test heavily to 2.6.27-rc1, and found that dio write still
      sometimes got EIO through this test.
      
      The patch above fixed race between freeing buffer(dio) and committing
      transaction(jbd) but I discovered that there is another race, freeing
      buffer(dio) and ext3/4_ordered_writepage.
      
      : background_writeout()
           ->write_cache_pages()
             ->ext3_ordered_writepage()
           	   walk_page_buffers() -> take a bh ref
       	   block_write_full_page() -> unlock_page
      		: <- end_page_writeback
                      : <- race! (dio write->try_to_release_page fails)
            	   walk_page_buffers() ->release a bh ref
      
      ext3_ordered_writepage holds bh ref and does unlock_page remaining
      taking a bh ref, so this causes the race and failure of
      try_to_release_page.
      
      To fix this race, I used the approach of falling back to buffered
      writes if try_to_release_page() fails on a page.
      
      [akpm@linux-foundation.org: cleanups]
      Signed-off-by: NHisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Mingming Cao <cmm@us.ibm.com>
      Cc: Zach Brown <zach.brown@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6ccfa806
  3. 05 8月, 2008 1 次提交
  4. 03 8月, 2008 1 次提交
    • M
      mm: dont clear PG_uptodate on truncate/invalidate · 84209e02
      Miklos Szeredi 提交于
      Brian Wang reported that a FUSE filesystem exported through NFS could
      return I/O errors on read.  This was traced to splice_direct_to_actor()
      returning a short or zero count when racing with page invalidation.
      
      However this is not FUSE or NFSD specific, other filesystems (notably
      NFS) also call invalidate_inode_pages2() to purge stale data from the
      cache.
      
      If this happens while such pages are sitting in a pipe buffer, then
      splice(2) from the pipe can return zero, and read(2) from the pipe can
      return ENODATA.
      
      The zero return is especially bad, since it implies end-of-file or
      disconnected pipe/socket, and is documented as such for splice.  But
      returning an error for read() is also nasty, when in fact there was no
      error (data becoming stale is not an error).
      
      The same problems can be triggered by "hole punching" with
      madvise(MADV_REMOVE).
      
      Fix this by not clearing the PG_uptodate flag on truncation and
      invalidation.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Acked-by: NNick Piggin <nickpiggin@yahoo.com.au>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      84209e02
  5. 27 7月, 2008 1 次提交
  6. 28 4月, 2008 1 次提交
    • H
      fix invalidate_inode_pages2_range() to not clear ret · 0dd1334f
      Hisashi Hifumi 提交于
      DIO invalidates page cache through invalidate_inode_pages2_range().
      invalidate_inode_pages2_range() sets ret=-EIO when
      invalidate_complete_page2() fails, but this ret is cleared if
      do_launder_page() succeed on a page of next index.
      
      In this case, dio is carried out even if invalidate_complete_page2() fails
      on some pages.
      
      This can cause inconsistency between memory and blocks on HDD because the
      page cache still exists.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NHisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Ken Chen <kenchen@google.com>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Chuck Lever <cel@citi.umich.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0dd1334f
  7. 04 3月, 2008 1 次提交
  8. 06 2月, 2008 3 次提交
    • S
      page migraton: handle orphaned pages · 62e1c553
      Shaohua Li 提交于
      Orphaned page might have fs-private metadata, the page is truncated.  As
      the page hasn't mapping, page migration refuse to migrate the page.  It
      appears the page is only freed in page reclaim and if zone watermark is
      low, the page is never freed, as a result migration always fail.  I thought
      we could free the metadata so such page can be freed in migration and make
      migration more reliable.
      
      [akpm@linux-foundation.org: go direct to try_to_free_buffers()]
      Signed-off-by: NShaohua Li <shaohua.li@intel.com>
      Acked-by: NNick Piggin <npiggin@suse.de>
      Acked-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      62e1c553
    • B
      Fix dirty page accounting leak with ext3 data=journal · a2b34564
      Bjorn Steinbrink 提交于
      In 46d2277c ("Clean up and make
      try_to_free_buffers() not race with dirty pages"), try_to_free_buffers
      was changed to bail out if the page was dirty.
      
      That in turn caused truncate_complete_page to leak massive amounts of
      memory, because the dirty bit was only cleared after the call to
      try_to_free_buffers.
      
      So the call to cancel_dirty_page was moved up to have the dirty bit
      cleared early in 3e67c098 ("truncate:
      clear page dirtiness before running try_to_free_buffers()").
      
      The problem with that fix is, that the page can be redirtied after
      cancel_dirty_page was called, eg. like this:
      
      truncate_complete_page()
        cancel_dirty_page() // PG_dirty cleared, decr. dirty pages
        do_invalidatepage()
          ext3_invalidatepage()
            journal_invalidatepage()
              journal_unmap_buffer()
                __dispose_buffer()
                  __journal_unfile_buffer()
                    __journal_temp_unlink_buffer()
                      mark_buffer_dirty(); // PG_dirty set, incr. dirty pages
      
      And then we end up with dirty pages being wrongly accounted.
      
      As a result, in ecdfc978 ("Resurrect
      'try_to_free_buffers()' VM hackery") the changes to try_to_free_buffers
      were reverted, so the original reason for the massive memory leak is
      gone, and we can also revert the move of the call to cancel_dirty_page
      from truncate_complete_page and get the accounting right again.
      
      I'm not sure if it matters, but opposed to the final check in
      __remove_from_page_cache, this one also cares about the task io
      accounting, so maybe we want to use this instead, although it's not
      quite the clean fix either.
      Signed-off-by: NBjörn Steinbrink <B.Steinbrink@gmx.de>
      Tested-by: NKrzysztof Piotr Oledzki <ole@ans.pl>
      Cc: Jan Kara <jack@ucw.cz>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Osterried <osterried@jesse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a2b34564
    • C
      Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user · eebd2aa3
      Christoph Lameter 提交于
      Simplify page cache zeroing of segments of pages through 3 functions
      
      zero_user_segments(page, start1, end1, start2, end2)
      
              Zeros two segments of the page. It takes the position where to
              start and end the zeroing which avoids length calculations and
      	makes code clearer.
      
      zero_user_segment(page, start, end)
      
              Same for a single segment.
      
      zero_user(page, start, length)
      
              Length variant for the case where we know the length.
      
      We remove the zero_user_page macro. Issues:
      
      1. Its a macro. Inline functions are preferable.
      
      2. The KM_USER0 macro is only defined for HIGHMEM.
      
         Having to treat this special case everywhere makes the
         code needlessly complex. The parameter for zeroing is always
         KM_USER0 except in one single case that we open code.
      
      Avoiding KM_USER0 makes a lot of code not having to be dealing
      with the special casing for HIGHMEM anymore. Dealing with
      kmap is only necessary for HIGHMEM configurations. In those
      configurations we use KM_USER0 like we do for a series of other
      functions defined in highmem.h.
      
      Since KM_USER0 is depends on HIGHMEM the existing zero_user_page
      function could not be a macro. zero_user_* functions introduced
      here can be be inline because that constant is not used when these
      functions are called.
      
      Also extract the flushing of the caches to be outside of the kmap.
      
      [akpm@linux-foundation.org: fix nfs and ntfs build]
      [akpm@linux-foundation.org: fix ntfs build some more]
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Michael Halcrow <mhalcrow@us.ibm.com>
      Cc: <linux-ext4@vger.kernel.org>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Anton Altaparmakov <aia21@cantab.net>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: David Chinner <dgc@sgi.com>
      Cc: Michael Halcrow <mhalcrow@us.ibm.com>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      eebd2aa3
  9. 04 2月, 2008 1 次提交
  10. 17 10月, 2007 2 次提交
  11. 20 7月, 2007 2 次提交
    • N
      mm: merge populate and nopage into fault (fixes nonlinear) · 54cb8821
      Nick Piggin 提交于
      Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes
      the virtual address -> file offset differently from linear mappings.
      
      ->populate is a layering violation because the filesystem/pagecache code
      should need to know anything about the virtual memory mapping.  The hitch here
      is that the ->nopage handler didn't pass down enough information (ie.  pgoff).
       But it is more logical to pass pgoff rather than have the ->nopage function
      calculate it itself anyway (because that's a similar layering violation).
      
      Having the populate handler install the pte itself is likewise a nasty thing
      to be doing.
      
      This patch introduces a new fault handler that replaces ->nopage and
      ->populate and (later) ->nopfn.  Most of the old mechanism is still in place
      so there is a lot of duplication and nice cleanups that can be removed if
      everyone switches over.
      
      The rationale for doing this in the first place is that nonlinear mappings are
      subject to the pagefault vs invalidate/truncate race too, and it seemed stupid
      to duplicate the synchronisation logic rather than just consolidate the two.
      
      After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in
      pagecache.  Seems like a fringe functionality anyway.
      
      NOPAGE_REFAULT is removed.  This should be implemented with ->fault, and no
      users have hit mainline yet.
      
      [akpm@linux-foundation.org: cleanup]
      [randy.dunlap@oracle.com: doc. fixes for readahead]
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      54cb8821
    • N
      mm: fix fault vs invalidate race for linear mappings · d00806b1
      Nick Piggin 提交于
      Fix the race between invalidate_inode_pages and do_no_page.
      
      Andrea Arcangeli identified a subtle race between invalidation of pages from
      pagecache with userspace mappings, and do_no_page.
      
      The issue is that invalidation has to shoot down all mappings to the page,
      before it can be discarded from the pagecache.  Between shooting down ptes to
      a particular page, and actually dropping the struct page from the pagecache,
      do_no_page from any process might fault on that page and establish a new
      mapping to the page just before it gets discarded from the pagecache.
      
      The most common case where such invalidation is used is in file truncation.
      This case was catered for by doing a sort of open-coded seqlock between the
      file's i_size, and its truncate_count.
      
      Truncation will decrease i_size, then increment truncate_count before
      unmapping userspace pages; do_no_page will read truncate_count, then find the
      page if it is within i_size, and then check truncate_count under the page
      table lock and back out and retry if it had subsequently been changed (ptl
      will serialise against unmapping, and ensure a potentially updated
      truncate_count is actually visible).
      
      Complexity and documentation issues aside, the locking protocol fails in the
      case where we would like to invalidate pagecache inside i_size.  do_no_page
      can come in anytime and filemap_nopage is not aware of the invalidation in
      progress (as it is when it is outside i_size).  The end result is that
      dangling (->mapping == NULL) pages that appear to be from a particular file
      may be mapped into userspace with nonsense data.  Valid mappings to the same
      place will see a different page.
      
      Andrea implemented two working fixes, one using a real seqlock, another using
      a page->flags bit.  He also proposed using the page lock in do_no_page, but
      that was initially considered too heavyweight.  However, it is not a global or
      per-file lock, and the page cacheline is modified in do_no_page to increment
      _count and _mapcount anyway, so a further modification should not be a large
      performance hit.  Scalability is not an issue.
      
      This patch implements this latter approach.  ->nopage implementations return
      with the page locked if it is possible for their underlying file to be
      invalidated (in that case, they must set a special vm_flags bit to indicate
      so).  do_no_page only unlocks the page after setting up the mapping
      completely.  invalidation is excluded because it holds the page lock during
      invalidation of each page (and ensures that the page is not mapped while
      holding the lock).
      
      This also allows significant simplifications in do_no_page, because we have
      the page locked in the right place in the pagecache from the start.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d00806b1
  12. 18 7月, 2007 1 次提交
    • N
      fs: introduce some page/buffer invariants · 787d2214
      Nick Piggin 提交于
      It is a bug to set a page dirty if it is not uptodate unless it has
      buffers.  If the page has buffers, then the page may be dirty (some buffers
      dirty) but not uptodate (some buffers not uptodate).  The exception to this
      rule is if the set_page_dirty caller is racing with truncate or invalidate.
      
      A buffer can not be set dirty if it is not uptodate.
      
      If either of these situations occurs, it indicates there could be some data
      loss problem.  Some of these warnings could be a harmless one where the
      page or buffer is set uptodate immediately after it is dirtied, however we
      should fix those up, and enforce this ordering.
      
      Bring the order of operations for truncate into line with those of
      invalidate.  This will prevent a page from being able to go !uptodate while
      we're holding the tree_lock, which is probably a good thing anyway.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      787d2214
  13. 17 7月, 2007 2 次提交
  14. 10 5月, 2007 1 次提交
    • N
      fs: convert core functions to zero_user_page · 01f2705d
      Nate Diller 提交于
      It's very common for file systems to need to zero part or all of a page,
      the simplist way is just to use kmap_atomic() and memset().  There's
      actually a library function in include/linux/highmem.h that does exactly
      that, but it's confusingly named memclear_highpage_flush(), which is
      descriptive of *how* it does the work rather than what the *purpose* is.
      So this patchset renames the function to zero_user_page(), and calls it
      from the various places that currently open code it.
      
      This first patch introduces the new function call, and converts all the
      core kernel callsites, both the open-coded ones and the old
      memclear_highpage_flush() ones.  Following this patch is a series of
      conversions for each file system individually, per AKPM, and finally a
      patch deprecating the old call.  The diffstat below shows the entire
      patchset.
      
      [akpm@linux-foundation.org: fix a few things]
      Signed-off-by: NNate Diller <nate.diller@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      01f2705d
  15. 02 3月, 2007 1 次提交
  16. 12 2月, 2007 2 次提交
  17. 27 1月, 2007 2 次提交
  18. 12 1月, 2007 1 次提交
    • T
      [PATCH] NFS: Fix race in nfs_release_page() · e3db7691
      Trond Myklebust 提交于
          NFS: Fix race in nfs_release_page()
      
          invalidate_inode_pages2() may find the dirty bit has been set on a page
          owing to the fact that the page may still be mapped after it was locked.
          Only after the call to unmap_mapping_range() are we sure that the page
          can no longer be dirtied.
          In order to fix this, NFS has hooked the releasepage() method and tries
          to write the page out between the call to unmap_mapping_range() and the
          call to remove_mapping(). This, however leads to deadlocks in the page
          reclaim code, where the page may be locked without holding a reference
          to the inode or dentry.
      
          Fix is to add a new address_space_operation, launder_page(), which will
          attempt to write out a dirty page without releasing the page lock.
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      
          Also, the bare SetPageDirty() can skew all sort of accounting leading to
          other nasties.
      
      [akpm@osdl.org: cleanup]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      e3db7691
  19. 24 12月, 2006 1 次提交
  20. 23 12月, 2006 1 次提交
  21. 22 12月, 2006 2 次提交
    • A
      [PATCH] truncate: clear page dirtiness before running try_to_free_buffers() · 3e67c098
      Andrew Morton 提交于
      truncate presently invalidates the dirty page's buffer_heads then shoots down
      the page.  But try_to_free_buffers() will now bale out because the page is
      dirty.
      
      Net effect: the LRU gets filled with dirty pages which have invalidated
      buffer_heads attached.  They have no ->mapping and hence cannot be cleaned.
      The machine leaks memory at an enormous rate.
      
      Fix this by cleaning the page before running try_to_free_buffers(), so
      try_to_free_buffers() can do its work.
      
      Also, remember to do dirty-page-acoounting in cancel_dirty_page() so the
      machine won't wedge up trying to write non-existent dirty pages.
      
      Probably still wrong, but now less so.
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      3e67c098
    • L
      VM: Remove "clear_page_dirty()" and "test_clear_page_dirty()" functions · fba2591b
      Linus Torvalds 提交于
      They were horribly easy to mis-use because of their tempting naming, and
      they also did way more than any users of them generally wanted them to
      do.
      
      A dirty page can become clean under two circumstances:
      
       (a) when we write it out.  We have "clear_page_dirty_for_io()" for
           this, and that function remains unchanged.
      
           In the "for IO" case it is not sufficient to just clear the dirty
           bit, you also have to mark the page as being under writeback etc.
      
       (b) when we actually remove a page due to it becoming inaccessible to
           users, notably because it was truncate()'d away or the file (or
           metadata) no longer exists, and we thus want to cancel any
           outstanding dirty state.
      
      For the (b) case, we now introduce "cancel_dirty_page()", which only
      touches the page state itself, and verifies that the page is not mapped
      (since cancelling writes on a mapped page would be actively wrong as it
      is still accessible to users).
      
      Some filesystems need to be fixed up for this: CIFS, FUSE, JFS,
      ReiserFS, XFS all use the old confusing functions, and will be fixed
      separately in subsequent commits (with some of them just removing the
      offending logic, and others using clear_page_dirty_for_io()).
      
      This was confirmed by Martin Michlmayr to fix the apt database
      corruption on ARM.
      
      Cc: Martin Michlmayr <tbm@cyrius.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Andrei Popa <andrei.popa@i-neo.ro>
      Cc: Andrew Morton <akpm@osdl.org>
      Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
      Cc: Gordon Farquharson <gordonfarquharson@gmail.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      fba2591b
  22. 11 12月, 2006 1 次提交
  23. 17 10月, 2006 1 次提交
  24. 12 10月, 2006 2 次提交
  25. 01 10月, 2006 3 次提交
    • A
      [PATCH] invalidate_inode_pages2(): ignore page refcounts · bd4c8ce4
      Andrew Morton 提交于
      The recent fix to invalidate_inode_pages() (git commit 016eb4a0) managed to
      unfix invalidate_inode_pages2().
      
      The problem is that various bits of code in the kernel can take transient refs
      on pages: the page scanner will do this when inspecting a batch of pages, and
      the lru_cache_add() batching pagevecs also hold a ref.
      
      Net result is transient failures in invalidate_inode_pages2().  This affects
      NFS directory invalidation (observed) and presumably also block-backed
      direct-io (not yet reported).
      
      Fix it by reverting invalidate_inode_pages2() back to the old version which
      ignores the page refcounts.
      
      We may come up with something more clever later, but for now we need a 2.6.18
      fix for NFS.
      
      Cc: Chuck Lever <cel@citi.umich.edu>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      bd4c8ce4
    • D
      [PATCH] BLOCK: Make it possible to disable the block layer [try #6] · 9361401e
      David Howells 提交于
      Make it possible to disable the block layer.  Not all embedded devices require
      it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require
      the block layer to be present.
      
      This patch does the following:
      
       (*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
           support.
      
       (*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
           an item that uses the block layer.  This includes:
      
           (*) Block I/O tracing.
      
           (*) Disk partition code.
      
           (*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.
      
           (*) The SCSI layer.  As far as I can tell, even SCSI chardevs use the
           	 block layer to do scheduling.  Some drivers that use SCSI facilities -
           	 such as USB storage - end up disabled indirectly from this.
      
           (*) Various block-based device drivers, such as IDE and the old CDROM
           	 drivers.
      
           (*) MTD blockdev handling and FTL.
      
           (*) JFFS - which uses set_bdev_super(), something it could avoid doing by
           	 taking a leaf out of JFFS2's book.
      
       (*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
           linux/elevator.h contingent on CONFIG_BLOCK being set.  sector_div() is,
           however, still used in places, and so is still available.
      
       (*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
           parts of linux/fs.h.
      
       (*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.
      
       (*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.
      
       (*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
           is not enabled.
      
       (*) fs/no-block.c is created to hold out-of-line stubs and things that are
           required when CONFIG_BLOCK is not set:
      
           (*) Default blockdev file operations (to give error ENODEV on opening).
      
       (*) Makes some /proc changes:
      
           (*) /proc/devices does not list any blockdevs.
      
           (*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.
      
       (*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.
      
       (*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
           given command other than Q_SYNC or if a special device is specified.
      
       (*) In init/do_mounts.c, no reference is made to the blockdev routines if
           CONFIG_BLOCK is not defined.  This does not prohibit NFS roots or JFFS2.
      
       (*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
           error ENOSYS by way of cond_syscall if so).
      
       (*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
           CONFIG_BLOCK is not set, since they can't then happen.
      Signed-Off-By: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      9361401e
    • D
      [PATCH] BLOCK: Move functions out of buffer code [try #6] · cf9a2ae8
      David Howells 提交于
      Move some functions out of the buffering code that aren't strictly buffering
      specific.  This is a precursor to being able to disable the block layer.
      
       (*) Moved some stuff out of fs/buffer.c:
      
           (*) The file sync and general sync stuff moved to fs/sync.c.
      
           (*) The superblock sync stuff moved to fs/super.c.
      
           (*) do_invalidatepage() moved to mm/truncate.c.
      
           (*) try_to_release_page() moved to mm/filemap.c.
      
       (*) Moved some related declarations between header files:
      
           (*) declarations for do_invalidatepage() and try_to_release_page() moved
           	 to linux/mm.h.
      
           (*) __set_page_dirty_buffers() moved to linux/buffer_head.h.
      Signed-Off-By: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      cf9a2ae8
  26. 27 9月, 2006 1 次提交
  27. 09 9月, 2006 1 次提交
    • A
      [PATCH] invalidate_complete_page() race fix · 016eb4a0
      Andrew Morton 提交于
      If a CPU faults this page into pagetables after invalidate_mapping_pages()
      checked page_mapped(), invalidate_complete_page() will still proceed to remove
      the page from pagecache.  This leaves the page-faulting process with a
      detached page.  If it was MAP_SHARED then file data loss will ensue.
      
      Fix that up by checking the page's refcount after taking tree_lock.
      
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      016eb4a0
  28. 23 6月, 2006 1 次提交
    • N
      [PATCH] Remove semi-softlockup from invalidate_mapping_pages · e0f23603
      NeilBrown 提交于
      If invalidate_mapping_pages is called to invalidate a very large mapping
      (e.g.  a very large block device) and if the only active page in that
      device is near the end (or at least, at a very large index), such as, say,
      the superblock of an md array, and if that page happens to be locked when
      invalidate_mapping_pages is called, then
      
        pagevec_lookup will return this page and
        as it is locked, 'next' will be incremented and pagevec_lookup
        will be called again. and again. and again.
        while we count from 0 upto a very large number.
      
      We should really always set 'next' to 'page->index+1' before going around
      the loop again, not just if the page isn't locked.
      
      Cc: "Steinar H. Gunderson" <sgunderson@bigfoot.com>
      Signed-off-by: NNeil Brown <neilb@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      e0f23603
  29. 10 1月, 2006 1 次提交