1. 30 Apr, 2012 9 commits
  2. 17 Apr, 2012 1 commit
  3. 20 Mar, 2012 1 commit
  4. 19 Mar, 2012 1 commit
  5. 05 Mar, 2012 3 commits
    • ext4: add comments to definition of ext4_io_end_t · 4188188b
      Curt Wohlgemuth committed
      This should make it more clear what this structure is used
      for, and how some of the (mutually exclusive) fields are
      used to keep page cache references.
      Signed-off-by: Curt Wohlgemuth <curtw@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      4188188b
    • ext4: fix race between sync and completed io work · 491caa43
      Jeff Moyer committed
      The following command line will leave the aio-stress process unkillable
      on an ext4 file system (in my case, mounted on /mnt/test):
      
      aio-stress -t 20 -s 10 -O -S -o 2 -I 1000 /mnt/test/aiostress.3561.4 /mnt/test/aiostress.3561.4.20 /mnt/test/aiostress.3561.4.19 /mnt/test/aiostress.3561.4.18 /mnt/test/aiostress.3561.4.17 /mnt/test/aiostress.3561.4.16 /mnt/test/aiostress.3561.4.15 /mnt/test/aiostress.3561.4.14 /mnt/test/aiostress.3561.4.13 /mnt/test/aiostress.3561.4.12 /mnt/test/aiostress.3561.4.11 /mnt/test/aiostress.3561.4.10 /mnt/test/aiostress.3561.4.9 /mnt/test/aiostress.3561.4.8 /mnt/test/aiostress.3561.4.7 /mnt/test/aiostress.3561.4.6 /mnt/test/aiostress.3561.4.5 /mnt/test/aiostress.3561.4.4 /mnt/test/aiostress.3561.4.3 /mnt/test/aiostress.3561.4.2
      
      This is using the aio-stress program from the xfstests test suite.
      That particular command line tells aio-stress to do random writes to
      20 files from 20 threads (one thread per file).  The files are NOT
      preallocated, so you will get writes to random offsets within the
      file, thus creating holes and extending i_size.  It also opens the
      file with O_DIRECT and O_SYNC.
      
      On to the problem.  When an I/O requires unwritten extent conversion,
      it is queued onto the completed_io_list for the ext4 inode.  Two code
      paths will pull work items from this list.  The first is the
      ext4_end_io_work routine, and the second is ext4_flush_completed_IO,
      which is called via the fsync path (and O_SYNC handling, as well).
      There are two issues I've found in these code paths.  First, if the
      fsync path beats the work routine to a particular I/O, the work
      routine will free the io_end structure!  It does not take into account
      the fact that the io_end may still be in use by the fsync path.  I've
      fixed this issue by adding yet another IO_END flag, indicating that
      the io_end is being processed by the fsync path.
      
      The second problem is that the work routine will make an assignment to
      io->flag outside of the lock.  I have witnessed this result in a hang
      at umount.  Moving the flag setting inside the lock resolved that
      problem.
      
      The problem was introduced by commit b82e384c ("ext4: optimize
      locking for end_io extent conversion"), which first appeared in 3.2.
      As such, the fix should be backported to that release (probably along
      with the unwritten extent conversion race fix).
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      CC: stable@kernel.org
      491caa43
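The two fixes described in this commit message (a flag marking an io_end as owned by the fsync path, and changing the flag only while the lock is held) can be illustrated with a small userspace sketch. This is not the kernel code: the struct, the `SKETCH_IO_END_*` flag names, and the pthread mutex standing in for the kernel lock are all assumptions made for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits; the real IO_END flag names are not given above. */
#define SKETCH_IO_END_QUEUED   0x1u /* still on the completed-I/O list */
#define SKETCH_IO_END_IN_FSYNC 0x2u /* claimed by the fsync path */

struct sketch_io_end {
    pthread_mutex_t *lock; /* stands in for the lock guarding the list */
    unsigned int flag;
};

/* fsync path: claim the item so the worker knows it is still in use. */
void sketch_fsync_claim(struct sketch_io_end *io)
{
    assert(io != NULL);
    pthread_mutex_lock(io->lock);
    io->flag |= SKETCH_IO_END_IN_FSYNC;
    io->flag &= ~SKETCH_IO_END_QUEUED;
    pthread_mutex_unlock(io->lock);
}

/*
 * Worker path: dequeue the item and report whether the worker may free it.
 * Both the ownership test and the flag update happen under the lock.
 */
bool sketch_worker_may_free(struct sketch_io_end *io)
{
    bool may_free;

    assert(io != NULL);
    pthread_mutex_lock(io->lock);
    may_free = !(io->flag & SKETCH_IO_END_IN_FSYNC);
    io->flag &= ~SKETCH_IO_END_QUEUED;
    pthread_mutex_unlock(io->lock);
    return may_free;
}
```

If the fsync path claims an item first, `sketch_worker_may_free()` returns false and the worker leaves the structure alone; otherwise the worker is the sole owner and may free it.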
    • ext4: make ext4_show_options() be table-driven · 5a916be1
      Theodore Ts'o committed
      Consistently show mount options which are non-default, so that
      /proc/mounts accurately shows the mount options that would be
      necessary to mount the file system in its current mode of operation.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      5a916be1
  6. 04 Mar, 2012 1 commit
  7. 03 Mar, 2012 1 commit
  8. 21 Feb, 2012 3 commits
    • ext4: fix race between unwritten extent conversion and truncate · 266991b1
      Jeff Moyer committed
      The following comment in ext4_end_io_dio caught my attention:
      
              /* XXX: probably should move into the real I/O completion handler */
              inode_dio_done(inode);
      
      The truncate code takes i_mutex, then calls inode_dio_wait.  Because the
      ext4 code path above will end up dropping the mutex before it is
      reacquired by the worker thread that does the extent conversion, it
      seems to me that the truncate can happen out of order.  Jan Kara
      mentioned that this might result in error messages in the system logs,
      but that should be the extent of the "damage."
      
      The fix is pretty straightforward: don't call inode_dio_done until the
      extent conversion is complete.
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      266991b1
    • ext4: fix INCOMPAT feature codepoint reservation for INLINEDATA · 856cbcf9
      Theodore Ts'o committed
      In commit 9b90e5e0 I incorrectly reserved the wrong bit for
      EXT4_FEATURE_INCOMPAT_INLINEDATA per the discussion on the linux-ext4
      list on December 7, 2011.  The codepoint 0x2000 should be used for
      EXT4_FEATURE_INCOMPAT_USE_META_CSUM, so INLINEDATA will be assigned
      the value 0x8000.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      856cbcf9
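The codepoint assignment in this commit can be written out as bit masks. The sketch below uses the two values stated in the message; the helper `sketch_has_incompat()` is a hypothetical name introduced here to show how such a bit would be tested against a feature word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Codepoints as stated in the commit message above. */
#define EXT4_FEATURE_INCOMPAT_USE_META_CSUM 0x2000u
#define EXT4_FEATURE_INCOMPAT_INLINEDATA    0x8000u

/* Hypothetical helper: true if every bit in mask is set in the feature word. */
bool sketch_has_incompat(uint32_t s_feature_incompat, uint32_t mask)
{
    assert(mask != 0);
    return (s_feature_incompat & mask) == mask;
}
```

Because the two features now occupy distinct bits, a feature word carrying only 0x2000 no longer reads as INLINEDATA.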
    • ext4: fix race when setting bitmap_uptodate flag · 813e5727
      Theodore Ts'o committed
      In ext4_read_{inode,block}_bitmap() we were setting bitmap_uptodate()
      before submitting the buffer for read.  This is bad, since we check
      bitmap_uptodate() without locking the buffer, and so if another
      process is racing with us, it's possible that they will think the
      bitmap is uptodate even though the read has not completed yet,
      resulting in inodes and blocks potentially getting allocated more than
      once if we get really unlucky.
      
      Addresses-Google-Bug: 2828254
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      813e5727
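The ordering problem described above, in which the flag becomes visible before the data does, can be shown with a single-threaded model. This is only a sketch: the struct, the simulated read callbacks, and every `sketch_` name are assumptions, and real kernel buffers involve locking and asynchronous completion that this model leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_bitmap_buf {
    bool uptodate;  /* stands in for the bitmap_uptodate flag */
    bool read_done; /* set by the simulated read when data has arrived */
};

/* Hypothetical read callbacks: return 0 on success, -1 on failure. */
int sketch_read_ok(struct sketch_bitmap_buf *b)
{
    b->read_done = true;
    return 0;
}

int sketch_read_fail(struct sketch_bitmap_buf *b)
{
    (void)b;
    return -1;
}

/* Mark the bitmap up to date only after the read has completed. */
int sketch_read_bitmap(struct sketch_bitmap_buf *b,
                       int (*do_read)(struct sketch_bitmap_buf *))
{
    assert(b != NULL && do_read != NULL);
    if (b->uptodate)
        return 0;       /* already valid, nothing to do */
    if (do_read(b) != 0)
        return -1;      /* flag stays clear on failure */
    b->uptodate = true; /* set last, once the data is really there */
    return 0;
}
```

With this ordering, an observer that sees `uptodate` set can rely on `read_done` also being set, which is the invariant the racing allocator needs.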
  9. 05 Jan, 2012 3 commits
  10. 04 Jan, 2012 2 commits
  11. 29 Dec, 2011 2 commits
  12. 01 Nov, 2011 2 commits
  13. 29 Oct, 2011 1 commit
  14. 25 Oct, 2011 1 commit
    • ext4: update EOFBLOCKS flag on fallocate properly · a4e5d88b
      Dmitry Monakhov committed
      EOFBLOCK_FL should be updated if called w/o FALLOCATE_FL_KEEP_SIZE.
      Currently this happens only if a new extent was allocated.
      
      TESTCASE:
      fallocate test_file -n -l4096
      fallocate test_file -l4096
      The last fallocate command updated the size but kept EOFBLOCK_FL set,
      and fsck will complain about that.
      
      Also remove the ping-pong in ext4_fallocate() in the case of new extents,
      where ext4_ext_map_blocks() clears the EOFBLOCKS bit and
      ext4_falloc_update_inode() later restores it again.
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      a4e5d88b
  15. 09 Oct, 2011 2 commits
  16. 10 Sep, 2011 7 commits
    • ext4: attempt to fix race in bigalloc code path · 5356f261
      Aditya Kali committed
      Currently, there exists a race between delayed allocated writes and
      the writeback when the bigalloc feature is in use. The race was because we
      wanted to determine what blocks in a cluster are under delayed
      allocation and we were using buffer_delayed(bh) check for it. But, the
      writeback codepath clears this bit without any synchronization which
      resulted in a race and an ext4 warning similar to:
      
      EXT4-fs (ram1): ext4_da_update_reserve_space: ino 13, used 1 with only 0
      		reserved data blocks
      
      The race existed in two places.
      (1) between ext4_find_delalloc_range() and ext4_map_blocks() when called from
          writeback code path.
      (2) between ext4_find_delalloc_range() and ext4_da_get_block_prep() (where
          buffer_delayed(bh) is set).
      
      To fix (1), this patch introduces a new buffer_head state bit -
      BH_Da_Mapped.  This bit is set under the protection of
      EXT4_I(inode)->i_data_sem when we have actually mapped the delayed
      allocated blocks during the writeout time. We can now reliably check
      for this bit inside ext4_find_delalloc_range() to determine whether
      the reservation for the blocks has already been claimed or not.
      
      To fix (2), it was necessary to set buffer_delay(bh) under the
      protection of i_data_sem.  So, I extracted the very beginning of
      ext4_map_blocks into a new function - ext4_da_map_blocks() - and
      performed the required setting of bh_delay bit and the quota
      reservation under the protection of i_data_sem.  These two fixes make
      the checking of buffer_delay(bh) and buffer_da_mapped(bh) consistent,
      thus removing the race.
      
      Tested: I was able to reproduce the problem by running 'dd' and
      'fsync' in parallel. Also, xfstests sometimes used to reproduce this
      race. After the fix both my test and xfstests were successful and no
      race (warning message) was observed.
      
      Google-Bug-Id: 4997027
      Signed-off-by: Aditya Kali <adityakali@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      5356f261
    • ext4: add some tracepoints in ext4/extents.c · d8990240
      Aditya Kali committed
      This patch adds some tracepoints in ext4/extents.c and updates a tracepoint in
      ext4/inode.c.
      
      Tested: Built and ran the kernel and verified that these tracepoints work.
      Also ran xfstests.
      Signed-off-by: Aditya Kali <adityakali@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
          
      d8990240
    • ext4: rename ext4_has_free_blocks() to ext4_has_free_clusters() · df55c99d
      Theodore Ts'o committed
      Rename the function so it is more clear what is going on.  Also rename
      the various variables so it's clearer what's happening.
      
      Also fix a missing blocks to cluster conversion when reading the
      number of reserved blocks for root.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      df55c99d
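The missing conversion mentioned above matters because a count kept in blocks cannot be compared directly with a count kept in clusters. A minimal sketch of the arithmetic follows, assuming a cluster holds a power-of-two number of blocks and that a block count is rounded up to whole clusters; the struct and function names are hypothetical and are not the kernel's own helpers.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_sb_info {
    unsigned int cluster_bits; /* log2(blocks per cluster), assumed */
};

/* Convert a block count to clusters, rounding up to whole clusters. */
uint64_t sketch_blocks_to_clusters(const struct sketch_sb_info *sbi,
                                   uint64_t blocks)
{
    uint64_t ratio = (uint64_t)1 << sbi->cluster_bits;

    return (blocks + ratio - 1) >> sbi->cluster_bits;
}

/* Convert a cluster count back to blocks. */
uint64_t sketch_clusters_to_blocks(const struct sketch_sb_info *sbi,
                                   uint64_t clusters)
{
    return clusters << sbi->cluster_bits;
}
```

With 16 blocks per cluster (`cluster_bits` of 4), a reservation of 17 blocks occupies 2 clusters, so comparing the raw value 17 against a free-cluster counter would overstate the reservation.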
    • ext4: rename ext4_claim_free_blocks() to ext4_claim_free_clusters() · e7d5f315
      Theodore Ts'o committed
      This function really claims a number of free clusters, not blocks, so
      rename it so it's clearer what's going on.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      e7d5f315
    • ext4: rename ext4_free_blocks_after_init() to ext4_free_clusters_after_init() · cff1dfd7
      Theodore Ts'o committed
      This function really returns the number of free clusters after an
      uninitialized block bitmap has been initialized.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      cff1dfd7
    • ext4: rename ext4_count_free_blocks() to ext4_count_free_clusters() · 5dee5437
      Theodore Ts'o committed
      This function really counts the free clusters reported in the block
      group descriptors, so rename it to reduce confusion.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      5dee5437
    • ext4: Rename ext4_free_blks_{count,set}() to refer to clusters · 021b65bb
      Theodore Ts'o committed
      The field bg_free_blocks_count_{lo,high} in the block group
      descriptor has been repurposed to hold the number of free clusters for
      bigalloc functions.  So rename the functions to make it easier to
      read and audit the block allocation and block freeing code.
      
      Note: at this point in bigalloc development we don't support
      online resize, so this also makes it really obvious all of the places
      we need to fix up to add support for online resize.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      021b65bb
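A count/set accessor pair over a split low/high field can be sketched as follows. This is an illustration under assumptions: the struct and function names are hypothetical, both halves are taken to be 16 bits wide, and on-disk endianness and descriptor-size checks are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two halves of the free-cluster count. */
struct sketch_group_desc {
    uint16_t free_count_lo; /* low 16 bits of the count */
    uint16_t free_count_hi; /* high 16 bits of the count */
};

/* Read the free-cluster count by combining both halves. */
uint32_t sketch_free_group_clusters(const struct sketch_group_desc *gd)
{
    assert(gd != 0);
    return (uint32_t)gd->free_count_lo |
           ((uint32_t)gd->free_count_hi << 16);
}

/* Store a free-cluster count by splitting it into both halves. */
void sketch_free_group_clusters_set(struct sketch_group_desc *gd,
                                    uint32_t count)
{
    assert(gd != 0);
    gd->free_count_lo = (uint16_t)(count & 0xFFFFu);
    gd->free_count_hi = (uint16_t)(count >> 16);
}
```

Naming the accessors after clusters rather than blocks keeps the unit visible at every call site, which is the auditing benefit the commit message describes.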