1. 28 10月, 2010 6 次提交
    • T
      ext4: simplify ext4_writepage() · a42afc5f
      Theodore Ts'o 提交于
      The actual code in ext4_writepage() is unnecessarily convoluted.
      Simplify it so it is easier to understand, but otherwise logically
      equivalent.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      a42afc5f
    • T
      ext4: call mpage_da_submit_io() from mpage_da_map_blocks() · 5a87b7a5
      Theodore Ts'o 提交于
      Eventually we need to completely reorganize the ext4 writepage
      callpath, but for now, we simplify things a little by calling
      mpage_da_submit_io() from mpage_da_map_blocks(), since all of the
      places where we call mpage_da_map_blocks() it is followed up by a call
      to mpage_da_submit_io().
      
      We're also a wee bit better with respect to error handling, but there
      are still a number of issues where it's not clear what the right thing
      is to do with ext4 functions deep in the writeback codepath fails.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      5a87b7a5
    • E
      ext4: queue conversion after adding to inode's completed IO list · c999af2b
      Eric Sandeen 提交于
      By queuing the io end on the unwritten workqueue before adding it
      to our inode's list of completed IOs, I think we run the risk
      of the work getting completed, and the IO freed, before we try
      to add it to the inode's i_completed_io_list.
      
      It should be safe to add it to the inode's list of completed
      IOs, and -then- queue it for completion, I think.
      
      Thanks to Dave Chinner for pointing out the race.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NJiaying Zhang <jiayingz@google.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      c999af2b
    • T
      ext4: fix potential infinite loop in ext4_da_writepages() · 0c9169cc
      Toshiyuki Okajima 提交于
      On linux-2.6.36-rc2, if we execute the following script, we can hang
      the system when the /bin/sync command is executed:
      
      ========================================================================
      #!/bin/sh
      
      echo -n "HANG UP TEST: "
      /bin/dd if=/dev/zero of=/tmp/img bs=1k count=1 seek=1M 2> /dev/null
      /sbin/mkfs.ext4 -Fq /tmp/img
      /bin/mount -o loop -t ext4 /tmp/img /mnt
      /bin/dd if=/dev/zero of=/mnt/file bs=1 count=1 \
      seek=$((16*1024*1024*1024*1024-4096)) 2> /dev/null
      /bin/sync
      /bin/umount /mnt
      echo "DONE"
      exit 0
      ========================================================================
      
      We can see the following backtrace if we get the kdump when this
      hangup occurs:
      
      ======================================================================
      kthread()
      => bdi_writeback_thread()
         => wb_do_writeback()
            => wb_writeback()
               => writeback_inodes_wb()
                  => writeback_sb_inodes()
                     => writeback_single_inode()
                        => ext4_da_writepages()  ---+ 
                                      ^ infinite    |
                                      |   loop      |
                                      +-------------+
      ======================================================================
      
      The reason why this hangup happens is described as follows:
      1) We write the last extent block of the file whose size is the filesystem 
         maximum size.
      2) "BH_Delay" flag is set on the buffer_head of its block.
      3) - the member, "m_lblk" of struct mpage_da_data is 4294967295 (UINT_MAX)
         - the member, "m_len" of struct mpage_da_data is 1
        mpage_put_bnr_to_bhs() which is called via ext4_da_writepages()
        cannot clear "BH_Delay" flag of the buffer_head because the type of
        m_lblk is ext4_lblk_t and then m_lblk + m_len is overflow.
      
        Therefore an infinite loop occurs because ext4_da_writepages()
        cannot write the page (which corresponds to the block) since
        "BH_Delay" flag isn't cleared.
      ----------------------------------------------------------------------
      static void mpage_put_bnr_to_bhs(struct mpage_da_data *mpd,
      				struct ext4_map_blocks *map)
      {
      ...
      	int blocks = map->m_len;
      ...
      		do {
      			// cur_logical = 4294967295
      			// map->m_lblk = 4294967295
      			// blocks = 1
      			// *** map->m_lblk + blocks == 0 (OVERFLOW!) ***
      			// (cur_logical >= map->m_lblk + blocks) => true
      			if (cur_logical >= map->m_lblk + blocks)
      				break;
      ----------------------------------------------------------------------
      
      NOTE: Mounting with the nodelalloc option will avoid this codepath,
      and thus, avoid this hang
      Signed-off-by: NToshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      0c9169cc
    • E
      ext4: don't bump up LONG_MAX nr_to_write by a factor of 8 · b443e733
      Eric Sandeen 提交于
      I'm uneasy with lots of stuff going on in ext4_da_writepages(),
      but bumping nr_to_write from LLONG_MAX to -8 clearly isn't
      making anything better, so avoid the multiplier in that case.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      b443e733
    • E
      ext4: stop looping in ext4_num_dirty_pages when max_pages reached · 659c6009
      Eric Sandeen 提交于
      Today we simply break out of the inner loop when we have accumulated
      max_pages; this keeps scanning forwad and doing pagevec_lookup_tag()
      in the while (!done) loop, this does potentially a lot of work
      with no net effect.
      
      When we have accumulated max_pages, just clean up and return.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      659c6009
  2. 10 8月, 2010 4 次提交
    • A
      convert ext4 to ->evict_inode() · 0930fcc1
      Al Viro 提交于
      pretty much brute-force...
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      0930fcc1
    • C
      remove inode_setattr · 1025774c
      Christoph Hellwig 提交于
      Replace inode_setattr with opencoded variants of it in all callers.  This
      moves the remaining call to vmtruncate into the filesystem methods where it
      can be replaced with the proper truncate sequence.
      
      In a few cases it was obvious that we would never end up calling vmtruncate
      so it was left out in the opencoded variant:
      
       spufs: explicitly checks for ATTR_SIZE earlier
       btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
       ufs: contains an opencoded simple_seattr + truncate that sets the filesize just above
      
      In addition to that ncpfs called inode_setattr with handcrafted iattrs,
      which allowed to trim down the opencoded variant.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      1025774c
    • C
      introduce __block_write_begin · 6e1db88d
      Christoph Hellwig 提交于
      Split up the block_write_begin implementation - __block_write_begin is a new
      trivial wrapper for block_prepare_write that always takes an already
      allocated page and can be either called from block_write_begin or filesystem
      code that already has a page allocated.  Remove the handling of already
      allocated pages from block_write_begin after switching all callers that
      do it to __block_write_begin.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      6e1db88d
    • C
      sort out blockdev_direct_IO variants · eafdc7d1
      Christoph Hellwig 提交于
      Move the call to vmtruncate to get rid of accessive blocks to the callers
      in prepearation of the new truncate calling sequence.  This was only done
      for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant
      was not needed anyway.  Get rid of blockdev_direct_IO_no_locking and
      its _newtrunc variant while at it as just opencoding the two additional
      paramters is shorted than the name suffix.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      eafdc7d1
  3. 06 8月, 2010 1 次提交
    • J
      ext4: Fix dirtying of journalled buffers in data=journal mode · 56d35a4c
      Jan Kara 提交于
      In data=journal mode, we still use block_write_begin() to prepare
      page for writing. This function can occasionally mark buffer dirty
      which violates journalling assumptions - when a buffer is part of
      a transaction, it should be dirty and a buffer can be already part
      of a forget list of some transaction when block_write_begin()
      gets called. This violation of journalling assumptions then results
      in "JBD: Spotted dirty metadata buffer..." warnings.
      
      In fact, temporary dirtying the buffer while the page is still locked
      does not really cause problems to the journalling because we won't write
      the buffer until the page gets unlocked. So we just have to make sure
      to clear dirty bits before unlocking the page.
      Signed-off-by: NJan Kara <jack@suse.cz>
      56d35a4c
  4. 04 8月, 2010 1 次提交
  5. 30 7月, 2010 1 次提交
  6. 27 7月, 2010 9 次提交
  7. 30 6月, 2010 1 次提交
  8. 15 6月, 2010 2 次提交
  9. 14 6月, 2010 1 次提交
  10. 05 6月, 2010 1 次提交
  11. 22 5月, 2010 1 次提交
  12. 17 5月, 2010 8 次提交
  13. 16 5月, 2010 3 次提交
    • E
      ext4: don't use quota reservation for speculative metadata · 72b8ab9d
      Eric Sandeen 提交于
      Because we can badly over-reserve metadata when we
      calculate worst-case, it complicates things for quota, since
      we must reserve and then claim later, retry on EDQUOT, etc.
      Quota is also a generally smaller pool than fs free blocks,
      so this over-reservation hurts more, and more often.
      
      I'm of the opinion that it's not the worst thing to allow
      metadata to push a user slightly over quota.  This simplifies
      the code and avoids the false quota rejections that result
      from worst-case speculation.
      
      This patch stops the speculative quota-charging for
      worst-case metadata requirements, and just charges quota
      when the blocks are allocated at writeout.  It also is
      able to remove the try-again loop on EDQUOT.
      
      This patch has been tested indirectly by running the xfstests
      suite with a hack to mount & enable quota prior to the test.
      
      I also did a more specific test of fragmenting freespace
      and then doing a large delalloc write under quota; quota
      stopped me at the right amount of file IO, and then the
      writeout generated enough metadata (due to the fragmentation)
      that it put me slightly over quota, as expected.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      72b8ab9d
    • E
      ext4: don't scan/accumulate more pages than mballoc will allocate · c445e3e0
      Eric Sandeen 提交于
      There was a bug reported on RHEL5 that a 10G dd on a 12G box
      had a very, very slow sync after that.
      
      At issue was the loop in write_cache_pages scanning all the way
      to the end of the 10G file, even though the subsequent call
      to mpage_da_submit_io would only actually write a smallish amt; then
      we went back to the write_cache_pages loop ... wasting tons of time
      in calling __mpage_da_writepage for thousands of pages we would
      just revisit (many times) later.
      
      Upstream it's not such a big issue for sys_sync because we get
      to the loop with a much smaller nr_to_write, which limits the loop.
      
      However, talking with Aneesh he realized that fsync upstream still
      gets here with a very large nr_to_write and we face the same problem.
      
      This patch makes mpage_add_bh_to_extent stop the loop after we've
      accumulated 2048 pages, by setting mpd->io_done = 1; which ultimately
      causes the write_cache_pages loop to break.
      
      Repeating the test with a dirty_ratio of 80 (to leave something for
      fsync to do), I don't see huge IO performance gains, but the reduction
      in cpu usage is striking: 80% usage with stock, and 2% with the
      below patch.  Instrumenting the loop in write_cache_pages clearly
      shows that we are wasting time here.
      
      Eventually we need to change mpage_da_map_pages() also submit its I/O
      to the block layer, subsuming mpage_da_submit_io(), and then change it
      call ext4_get_blocks() multiple times.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      c445e3e0
    • D
      ext4: fix quota accounting in case of fallocate · 35121c98
      Dmitry Monakhov 提交于
      allocated_meta_data is already included in 'used' variable.
      Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      35121c98
  14. 04 4月, 2010 1 次提交