1. 28 10月, 2010 9 次提交
    • T
      ext4: move mpage_put_bnr_to_bhs()'s functionality to mpage_da_submit_io() · 1de3e3df
      Theodore Ts'o 提交于
      This massively simplifies the ext4_da_writepages() code path by
      completely removing mpage_put_bnr_bhs(), which is almost 100 lines of
      code iterating over a set of pages using pagevec_lookup(), and folds
      that functionality into mpage_da_submit_io()'s existing
      pagevec_lookup() loop.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      1de3e3df
    • T
      ext4: inline walk_page_buffers() into mpage_da_submit_io · 3ecdb3a1
      Theodore Ts'o 提交于
      Expand the call:
      
        if (walk_page_buffers(NULL, page_bufs, 0, len, NULL,
                              ext4_bh_delay_or_unwritten))
      	goto redirty_page
      
      into mpage_da_submit_io().
      
      This will allow us to merge in mpage_put_bnr_to_bhs() in the next
      patch.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      3ecdb3a1
    • T
      ext4: inline ext4_writepage() into mpage_da_submit_io() · cb20d518
      Theodore Ts'o 提交于
      As a prepratory step to switching to bio_submit, inline
      ext4_writepage() into mpage_da_submit() and then simplify things a
      bit.  This makes it clearer what mpage_da_submit needs to do.
      
      Also, move the ClearPageChecked(page) call into
      __ext4_journalled_writepage(), as a minor bit of cleanup refactoring.
      
      This also allows us to pull i_size_read() and
      ext4_should_journal_data() out of the loop, which should be a very
      minor CPU savings.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      cb20d518
    • T
      ext4: simplify ext4_writepage() · a42afc5f
      Theodore Ts'o 提交于
      The actual code in ext4_writepage() is unnecessarily convoluted.
      Simplify it so it is easier to understand, but otherwise logically
      equivalent.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      a42afc5f
    • T
      ext4: call mpage_da_submit_io() from mpage_da_map_blocks() · 5a87b7a5
      Theodore Ts'o 提交于
      Eventually we need to completely reorganize the ext4 writepage
      callpath, but for now, we simplify things a little by calling
      mpage_da_submit_io() from mpage_da_map_blocks(), since all of the
      places where we call mpage_da_map_blocks() it is followed up by a call
      to mpage_da_submit_io().
      
      We're also a wee bit better with respect to error handling, but there
      are still a number of issues where it's not clear what the right thing
      is to do with ext4 functions deep in the writeback codepath fails.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      5a87b7a5
    • E
      ext4: queue conversion after adding to inode's completed IO list · c999af2b
      Eric Sandeen 提交于
      By queuing the io end on the unwritten workqueue before adding it
      to our inode's list of completed IOs, I think we run the risk
      of the work getting completed, and the IO freed, before we try
      to add it to the inode's i_completed_io_list.
      
      It should be safe to add it to the inode's list of completed
      IOs, and -then- queue it for completion, I think.
      
      Thanks to Dave Chinner for pointing out the race.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NJiaying Zhang <jiayingz@google.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      c999af2b
    • T
      ext4: fix potential infinite loop in ext4_da_writepages() · 0c9169cc
      Toshiyuki Okajima 提交于
      On linux-2.6.36-rc2, if we execute the following script, we can hang
      the system when the /bin/sync command is executed:
      
      ========================================================================
      #!/bin/sh
      
      echo -n "HANG UP TEST: "
      /bin/dd if=/dev/zero of=/tmp/img bs=1k count=1 seek=1M 2> /dev/null
      /sbin/mkfs.ext4 -Fq /tmp/img
      /bin/mount -o loop -t ext4 /tmp/img /mnt
      /bin/dd if=/dev/zero of=/mnt/file bs=1 count=1 \
      seek=$((16*1024*1024*1024*1024-4096)) 2> /dev/null
      /bin/sync
      /bin/umount /mnt
      echo "DONE"
      exit 0
      ========================================================================
      
      We can see the following backtrace if we get the kdump when this
      hangup occurs:
      
      ======================================================================
      kthread()
      => bdi_writeback_thread()
         => wb_do_writeback()
            => wb_writeback()
               => writeback_inodes_wb()
                  => writeback_sb_inodes()
                     => writeback_single_inode()
                        => ext4_da_writepages()  ---+ 
                                      ^ infinite    |
                                      |   loop      |
                                      +-------------+
      ======================================================================
      
      The reason why this hangup happens is described as follows:
      1) We write the last extent block of the file whose size is the filesystem 
         maximum size.
      2) "BH_Delay" flag is set on the buffer_head of its block.
      3) - the member, "m_lblk" of struct mpage_da_data is 4294967295 (UINT_MAX)
         - the member, "m_len" of struct mpage_da_data is 1
        mpage_put_bnr_to_bhs() which is called via ext4_da_writepages()
        cannot clear "BH_Delay" flag of the buffer_head because the type of
        m_lblk is ext4_lblk_t and then m_lblk + m_len is overflow.
      
        Therefore an infinite loop occurs because ext4_da_writepages()
        cannot write the page (which corresponds to the block) since
        "BH_Delay" flag isn't cleared.
      ----------------------------------------------------------------------
      static void mpage_put_bnr_to_bhs(struct mpage_da_data *mpd,
      				struct ext4_map_blocks *map)
      {
      ...
      	int blocks = map->m_len;
      ...
      		do {
      			// cur_logical = 4294967295
      			// map->m_lblk = 4294967295
      			// blocks = 1
      			// *** map->m_lblk + blocks == 0 (OVERFLOW!) ***
      			// (cur_logical >= map->m_lblk + blocks) => true
      			if (cur_logical >= map->m_lblk + blocks)
      				break;
      ----------------------------------------------------------------------
      
      NOTE: Mounting with the nodelalloc option will avoid this codepath,
      and thus, avoid this hang
      Signed-off-by: NToshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      0c9169cc
    • E
      ext4: don't bump up LONG_MAX nr_to_write by a factor of 8 · b443e733
      Eric Sandeen 提交于
      I'm uneasy with lots of stuff going on in ext4_da_writepages(),
      but bumping nr_to_write from LLONG_MAX to -8 clearly isn't
      making anything better, so avoid the multiplier in that case.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      b443e733
    • E
      ext4: stop looping in ext4_num_dirty_pages when max_pages reached · 659c6009
      Eric Sandeen 提交于
      Today we simply break out of the inner loop when we have accumulated
      max_pages; this keeps scanning forwad and doing pagevec_lookup_tag()
      in the while (!done) loop, this does potentially a lot of work
      with no net effect.
      
      When we have accumulated max_pages, just clean up and return.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      659c6009
  2. 10 8月, 2010 4 次提交
    • A
      convert ext4 to ->evict_inode() · 0930fcc1
      Al Viro 提交于
      pretty much brute-force...
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      0930fcc1
    • C
      remove inode_setattr · 1025774c
      Christoph Hellwig 提交于
      Replace inode_setattr with opencoded variants of it in all callers.  This
      moves the remaining call to vmtruncate into the filesystem methods where it
      can be replaced with the proper truncate sequence.
      
      In a few cases it was obvious that we would never end up calling vmtruncate
      so it was left out in the opencoded variant:
      
       spufs: explicitly checks for ATTR_SIZE earlier
       btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
       ufs: contains an opencoded simple_seattr + truncate that sets the filesize just above
      
      In addition to that ncpfs called inode_setattr with handcrafted iattrs,
      which allowed to trim down the opencoded variant.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      1025774c
    • C
      introduce __block_write_begin · 6e1db88d
      Christoph Hellwig 提交于
      Split up the block_write_begin implementation - __block_write_begin is a new
      trivial wrapper for block_prepare_write that always takes an already
      allocated page and can be either called from block_write_begin or filesystem
      code that already has a page allocated.  Remove the handling of already
      allocated pages from block_write_begin after switching all callers that
      do it to __block_write_begin.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      6e1db88d
    • C
      sort out blockdev_direct_IO variants · eafdc7d1
      Christoph Hellwig 提交于
      Move the call to vmtruncate to get rid of accessive blocks to the callers
      in prepearation of the new truncate calling sequence.  This was only done
      for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant
      was not needed anyway.  Get rid of blockdev_direct_IO_no_locking and
      its _newtrunc variant while at it as just opencoding the two additional
      paramters is shorted than the name suffix.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      eafdc7d1
  3. 06 8月, 2010 1 次提交
    • J
      ext4: Fix dirtying of journalled buffers in data=journal mode · 56d35a4c
      Jan Kara 提交于
      In data=journal mode, we still use block_write_begin() to prepare
      page for writing. This function can occasionally mark buffer dirty
      which violates journalling assumptions - when a buffer is part of
      a transaction, it should be dirty and a buffer can be already part
      of a forget list of some transaction when block_write_begin()
      gets called. This violation of journalling assumptions then results
      in "JBD: Spotted dirty metadata buffer..." warnings.
      
      In fact, temporary dirtying the buffer while the page is still locked
      does not really cause problems to the journalling because we won't write
      the buffer until the page gets unlocked. So we just have to make sure
      to clear dirty bits before unlocking the page.
      Signed-off-by: NJan Kara <jack@suse.cz>
      56d35a4c
  4. 04 8月, 2010 1 次提交
  5. 30 7月, 2010 1 次提交
  6. 27 7月, 2010 9 次提交
  7. 30 6月, 2010 1 次提交
  8. 15 6月, 2010 2 次提交
  9. 14 6月, 2010 1 次提交
  10. 05 6月, 2010 1 次提交
  11. 22 5月, 2010 1 次提交
  12. 17 5月, 2010 8 次提交
  13. 16 5月, 2010 1 次提交
    • E
      ext4: don't use quota reservation for speculative metadata · 72b8ab9d
      Eric Sandeen 提交于
      Because we can badly over-reserve metadata when we
      calculate worst-case, it complicates things for quota, since
      we must reserve and then claim later, retry on EDQUOT, etc.
      Quota is also a generally smaller pool than fs free blocks,
      so this over-reservation hurts more, and more often.
      
      I'm of the opinion that it's not the worst thing to allow
      metadata to push a user slightly over quota.  This simplifies
      the code and avoids the false quota rejections that result
      from worst-case speculation.
      
      This patch stops the speculative quota-charging for
      worst-case metadata requirements, and just charges quota
      when the blocks are allocated at writeout.  It also is
      able to remove the try-again loop on EDQUOT.
      
      This patch has been tested indirectly by running the xfstests
      suite with a hack to mount & enable quota prior to the test.
      
      I also did a more specific test of fragmenting freespace
      and then doing a large delalloc write under quota; quota
      stopped me at the right amount of file IO, and then the
      writeout generated enough metadata (due to the fragmentation)
      that it put me slightly over quota, as expected.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      72b8ab9d