1. 08 Jun, 2011 1 commit
    • writeback: introduce .tagged_writepages for the WB_SYNC_NONE sync stage · 6e6938b6
      Committed by Wu Fengguang
      sync(2) is performed in two stages: the WB_SYNC_NONE sync and the
      WB_SYNC_ALL sync. Identify the first stage with .tagged_writepages and
      do livelock prevention for it, too.
      
      Jan's commit f446daae ("mm: implement writeback livelock avoidance
      using page tagging") is a partial fix in that it only fixed the
      WB_SYNC_ALL phase livelock.
      
      Although ext4 is tested to no longer livelock with commit f446daae,
      this may be due to some "redirty_tail() after pages_skipped" effect,
      which is by no means a guarantee for _all_ file systems.
      
      Note that writeback_inodes_sb() is called not only by sync(); all callers
      are treated the same because the other callers also need livelock prevention.
      
      Impact:  It changes the order in which pages/inodes are synced to disk.
      Now in the WB_SYNC_NONE stage, writeback won't proceed to the next inode
      until it has finished with the current inode.
      Acked-by: Jan Kara <jack@suse.cz>
      CC: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
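The tagging idea behind this livelock prevention can be illustrated with a minimal user-space sketch. It is not the kernel implementation: `struct model_page`, `tag_dirty_pages()` and `write_tagged_pages()` are hypothetical names invented for this illustration. The point it shows is that a pass first snapshots which pages are dirty and then writes only those, so pages dirtied during the pass cannot extend it indefinitely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a page; not a kernel structure. */
struct model_page {
    bool dirty;    /* page holds data that has not been written yet */
    bool towrite;  /* page was dirty when the current pass started */
};

/* Snapshot step: tag every page that is dirty right now. */
static size_t tag_dirty_pages(struct model_page *pages, size_t n)
{
    size_t tagged = 0;

    for (size_t i = 0; i < n; i++) {
        if (pages[i].dirty) {
            pages[i].towrite = true;
            tagged++;
        }
    }
    return tagged;
}

/* Write step: clean only tagged pages and skip pages dirtied after the snapshot. */
static size_t write_tagged_pages(struct model_page *pages, size_t n)
{
    size_t written = 0;

    for (size_t i = 0; i < n; i++) {
        if (!pages[i].towrite)
            continue;
        pages[i].towrite = false;
        pages[i].dirty = false;   /* stands in for the actual I/O */
        written++;
    }
    return written;
}
```

In this model, a page that becomes dirty between the two calls stays dirty and is left for a later pass, so the amount of work per pass is bounded by the snapshot.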
  2. 27 May, 2011 1 commit
    • fs: pass exact type of data dirties to ->dirty_inode · aa385729
      Committed by Christoph Hellwig
      Tell the filesystem whether we just updated a timestamp (I_DIRTY_SYNC)
      or anything else, so that the filesystem can track internally whether
      it needs to push out a transaction for fdatasync.
      
      This is just the prototype change with no user for it yet.  I plan
      to push large XFS changes for the next merge window, and getting
      this trivial infrastructure in this window would help a lot to avoid
      tree interdependencies.
      
      Also remove incorrect comments that ->dirty_inode can't block.  That
      changed a long time ago, and many implementations rely on it.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  3. 26 May, 2011 1 commit
  4. 25 May, 2011 3 commits
    • ext4: enable "punch hole" functionality · a4bb6b64
      Committed by Allison Henderson
      This patch adds new routines: "ext4_punch_hole", "ext4_ext_punch_hole",
      and "ext4_ext_check_cache".
      
      fallocate has been modified to call ext4_punch_hole when the punch hole
      flag is passed.  At the moment, we only support punching holes in
      extents, so this routine is pretty much a wrapper for the ext4_ext_punch_hole
      routine.
      
      The ext4_ext_punch_hole routine first completes all outstanding writes
      with the associated pages, and then releases them.  The data that is
      not block-aligned is zeroed, and all blocks in between are punched out.
      
      The ext4_ext_check_cache routine is very similar to ext4_ext_in_cache
      except it accepts an ext4_ext_cache parameter instead of an ext4_extent
      parameter.  This routine is used by ext4_ext_punch_hole to check
      whether a block lies in a hole that has been cached.  The ext4_ext_cache
      parameter is necessary because the members of the ext4_extent structure
      are not large enough to hold a 32 bit value.  The existing
      ext4_ext_in_cache routine has become a wrapper to this new function.
      
      [ext4 punch hole patch series 5/5 v7] 
      Signed-off-by: Allison Henderson <achender@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Mingming Cao <cmm@us.ibm.com>
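The split between data that is zeroed and blocks that are punched out can be sketched as plain arithmetic. This is only an illustration under assumptions, not the ext4 code: `struct hole_plan` and `plan_hole()` are hypothetical names, and the sketch assumes a byte range `[offset, offset + len)` and a fixed block size.

```c
#include <stdint.h>

/* Hypothetical description of how a byte range splits into zeroed and released parts. */
struct hole_plan {
    uint64_t first_block;    /* index of the first whole block to release */
    uint64_t nr_blocks;      /* number of whole blocks to release */
    uint64_t head_zero_len;  /* bytes to zero before the first whole block */
    uint64_t tail_zero_len;  /* bytes to zero after the last whole block */
};

static struct hole_plan plan_hole(uint64_t offset, uint64_t len, uint64_t block_size)
{
    struct hole_plan p = {0, 0, 0, 0};
    uint64_t end = offset + len;
    /* Round the start up and the end down to block boundaries. */
    uint64_t first_aligned = (offset + block_size - 1) / block_size * block_size;
    uint64_t last_aligned = end / block_size * block_size;

    p.first_block = first_aligned / block_size;
    if (first_aligned > last_aligned) {
        /* The range sits inside a single block: everything is zeroed in place. */
        p.head_zero_len = len;
        return p;
    }
    p.head_zero_len = first_aligned - offset;
    p.tail_zero_len = end - last_aligned;
    p.nr_blocks = (last_aligned - first_aligned) / block_size;
    return p;
}
```

For example, with 4096-byte blocks a request at offset 1000 of length 10000 zeroes 3096 bytes at the head, releases one whole block (block 1), and zeroes 2808 bytes at the tail.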
    • ext4: add new function ext4_block_zero_page_range() · 30848851
      Committed by Allison Henderson
      This patch modifies the existing ext4_block_truncate_page() function,
      which was used by the truncate code path and which zeroes out
      block-unaligned data, by adding a new length parameter, and renames it
      to ext4_block_zero_page_range().  This function can now be used to zero
      out the head of a block, the tail of a block, or the middle of a block.
      
      The ext4_block_truncate_page() function is now a wrapper to
      ext4_block_zero_page_range().
      
      [ext4 punch hole patch series 2/5 v7] 
      Signed-off-by: Allison Henderson <achender@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Mingming Cao <cmm@us.ibm.com>
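How a single routine with a start and a length can cover the head, the tail, or the middle of a block can be shown with a small user-space sketch. It operates on an in-memory buffer rather than a page cache page, and `zero_block_range()` is a hypothetical helper, not the ext4 function; clamping the length at the end of the block is an assumption of this sketch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Zero `length` bytes of one block starting at byte `from`.
 * The range is clamped so it never crosses the end of the block.
 * Returns the number of bytes actually zeroed.
 */
static size_t zero_block_range(unsigned char *block, size_t block_size,
                               size_t from, size_t length)
{
    if (from >= block_size)
        return 0;                       /* start lies outside the block */
    if (length > block_size - from)
        length = block_size - from;     /* clamp to the end of the block */
    memset(block + from, 0, length);
    return length;
}
```

With this shape, `from == 0` zeroes the head of a block, a range reaching the block end zeroes the tail (the truncate case), and anything else zeroes the middle.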
    • ext4: add flag to ext4_has_free_blocks · 55f020db
      Committed by Allison Henderson
      This patch adds an allocation request flag to the ext4_has_free_blocks
      function which enables the use of reserved blocks.  This will allow a
      punch hole to proceed even if the disk is full.  Punching a hole may
      require additional blocks to first split the extents.
      
      Because ext4_has_free_blocks is a low level function, the flag needs
      to be passed down through several functions listed below:
      
      ext4_ext_insert_extent
      ext4_ext_create_new_leaf
      ext4_ext_grow_indepth
      ext4_ext_split
      ext4_ext_new_meta_block
      ext4_mb_new_blocks
      ext4_claim_free_blocks
      ext4_has_free_blocks
      
      [ext4 punch hole patch series 1/5 v7]
      Signed-off-by: Allison Henderson <achender@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Mingming Cao <cmm@us.ibm.com>
  5. 24 May, 2011 1 commit
    • ext4: use truncate_setsize() unconditionally · 072bd7ea
      Committed by Theodore Ts'o
      In commit c8d46e41 (ext4: Add flag to files with blocks intentionally
      past EOF), if the EOFBLOCKS_FL flag is set, we call ext4_truncate()
      before calling vmtruncate().  This caused any allocated but unwritten
      blocks created by calling fallocate() with the FALLOC_FL_KEEP_SIZE
      flag to be dropped.  This was done to make sure that
      EOFBLOCKS_FL would not be cleared while still leaving blocks past
      i_size allocated.  This was not necessary, since ext4_truncate()
      guarantees that blocks past i_size will be dropped, even in the case
      where truncate() has increased i_size before calling ext4_truncate().
      
      So fix this by removing the EOFBLOCKS_FL special case treatment in
      ext4_setattr().  In addition, use truncate_setsize() followed by a
      call to ext4_truncate() instead of using vmtruncate().  This is more
      efficient since it skips the call to inode_newsize_ok(), which has
      been checked already by inode_change_ok().  This is also a win in
      the case where EOFBLOCKS_FL is set since it avoids calling
      ext4_truncate() twice.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  6. 19 May, 2011 2 commits
  7. 09 May, 2011 1 commit
  8. 11 Apr, 2011 2 commits
  9. 05 Apr, 2011 1 commit
  10. 31 Mar, 2011 1 commit
  11. 24 Mar, 2011 1 commit
  12. 22 Mar, 2011 1 commit
  13. 21 Mar, 2011 1 commit
  14. 10 Mar, 2011 1 commit
  15. 28 Feb, 2011 1 commit
  16. 27 Feb, 2011 8 commits
    • ext4: move setup of the mpd structure to write_cache_pages_da() · 168fc022
      Committed by Theodore Ts'o
      Move the initialization of all of the fields of the mpd structure to
      write_cache_pages_da().  This simplifies the code considerably.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: don't lock the next page in write_cache_pages if not needed · 78aaced3
      Committed by Theodore Ts'o
      If we have accumulated a contiguous region of memory to be written
      out, and the next page can be added to this region, don't bother locking
      (and then unlocking) the page before writing out the memory.  In the
      unlikely event that the next page was being written back by some other
      CPU, we can also skip waiting for that page to finish writeback.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: remove page_skipped hackery in ext4_da_writepages() · ee6ecbcc
      Committed by Theodore Ts'o
      Because the ext4 page writeback codepath had been prematurely calling
      clear_page_dirty_for_io(), if it turned out that a particular page
      couldn't be written out during a particular pass of
      write_cache_pages_da(), the page would have to get redirtied by
      calling redirty_page_for_writepage().  Not only was this wasted work,
      but redirty_page_for_writepage() would increment wbc->pages_skipped to
      signal to writeback_sb_inodes() that buffers were locked, and that it
      should skip this inode until later.
      
      Since this signal was incorrect in ext4's case --- which was caused by
      ext4's historically incorrect use of write_cache_pages() ---
      ext4_da_writepages() saved and restored wbc->pages_skipped to avoid
      confusing writeback_sb_inodes().
      
      Now that we've fixed ext4 to call clear_page_dirty_for_io() right
      before initiating the page I/O, we can nuke the page_skipped
      save/restore hackery, and breathe a sigh of relief.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: clear the dirty bit for a page in writeback at the last minute · 97498956
      Committed by Theodore Ts'o
      Move when we call clear_page_dirty_for_io() to just before we actually
      write the page.  This simplifies the code somewhat, and avoids marking
      pages as clean and then needing to remark them as dirty later.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: simple cleanups to write_cache_pages_da() · 4f01b02c
      Committed by Theodore Ts'o
      Eliminate duplicate code, unneeded variables, etc., to make it easier
      to understand the code.  No behavioral changes were made in this patch.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: fold __mpage_da_writepage() into write_cache_pages_da() · 8eb9e5ce
      Committed by Theodore Ts'o
      Fold the __mpage_da_writepage() function into write_cache_pages_da().
      This will give us opportunities to clean up and simplify the resulting
      code.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: fix ext4_da_block_invalidatepages() to handle page range properly · c7f5938a
      Committed by Curt Wohlgemuth
      If ext4_da_block_invalidatepages() is called because of a
      failure from ext4_map_blocks() in mpage_da_map_and_submit(),
      it's supposed to clean up -- including unlock -- all the
      pages in the mpd structure.  But these values may not match
      up, even on a system in which block size == page size:
      
         mpd->b_blocknr != mpd->first_page
         mpd->b_size != (mpd->next_page - mpd->first_page)
      
      ext4_da_block_invalidatepages() has been using b_blocknr and
      b_size; this patch changes it to use first_page and
      next_page.
      
      Tested:  I injected a small number (5%) of failures in
      ext4_map_blocks() in the case that the flags contain
      EXT4_GET_BLOCKS_DELALLOC_RESERVE, and ran fsstress on this
      kernel.  Without this patch, I got hung tasks every time.
      With this patch, I see no hangs in many runs of fsstress.
      Signed-off-by: Curt Wohlgemuth <curtw@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: mark multi-page IO complete on mapping failure · e0fd9b90
      Committed by Curt Wohlgemuth
      In mpage_da_map_and_submit(), if we have a delayed block
      allocation failure from ext4_map_blocks(), we need to mark
      the IO as complete, by setting
      
            mpd->io_done = 1;
      
      Otherwise, we could end up submitting the pages in an outer
      loop; since they are unlocked on mapping failure in
      ext4_da_block_invalidatepages(), this will cause a bug check
      in mpage_da_submit_io().
      
      I tested this by injecting failures into ext4_map_blocks().
      Without this patch, a simple fsstress run will bug check;
      with the patch, it works fine.
      Signed-off-by: Curt Wohlgemuth <curtw@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  17. 22 Feb, 2011 1 commit
  18. 14 Jan, 2011 1 commit
  19. 11 Jan, 2011 6 commits
  20. 17 Dec, 2010 2 commits
  21. 15 Dec, 2010 1 commit
    • ext4: Turn off multiple page-io submission by default · 1449032b
      Committed by Theodore Ts'o
      Jon Nelson has found a test case which causes postgresql to fail with
      the error:
      
      psql:t.sql:4: ERROR: invalid page header in block 38269 of relation base/16384/16581
      
      Under memory pressure, it looks like part of a file can end up getting
      replaced by zeros.  Until we can figure out the cause, we'll roll
      back the change and use block_write_full_page() instead of
      ext4_bio_write_page().  The new, more efficient writing function can
      be used via the mount option mblk_io_submit, so we can test and fix
      the new page I/O code.
      
      To reproduce the problem, install postgres 8.4 or 9.0, and pin enough
      memory such that the system is just at the point of triggering writeback
      before running the following sql script:
      
      begin;
      create temporary table foo as select x as a, ARRAY[x] as b FROM
      generate_series(1, 10000000 ) AS x;
      create index foo_a_idx on foo (a);
      create index foo_b_idx on foo USING GIN (b);
      rollback;
      
      If the temporary table is created on a hard drive partition which is
      encrypted using dm_crypt, then under memory pressure, approximately
      30-40% of the time, pgsql will issue the above failure.
      
      This patch should fix this problem, and the problem will come back if
      the file system is mounted with the mblk_io_submit mount option.
      Reported-by: Jon Nelson <jnelson@jamponi.net>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  22. 15 Nov, 2010 1 commit
  23. 09 Nov, 2010 1 commit