1. 04 Sep, 2011 1 commit
    • T
      ext4: improve handling of conflicting mount options · 56889787
      Committed by Theodore Ts'o
      If the user explicitly specifies conflicting mount options for
      delalloc or dioread_nolock and data=journal, fail the mount, instead
      of printing a warning and continuing (since many users won't look at
      dmesg and notice the warning).
      
      Also, print a single warning that data=journal implies that delayed
      allocation is not on by default (since it's not supported), and
      furthermore that O_DIRECT is not supported.  Improve the text in
      Documentation/filesystems/ext4.txt so this is clear there as well.
      
      Similarly, if the dioread_nolock mount option is specified when the
      file system block size != PAGE_SIZE, fail the mount instead of
      printing a warning message and ignoring the mount option.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      56889787
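The rule described in this commit can be sketched as a small validation function. This is only an illustration: the flag names and the function `mount_opts_ok` are hypothetical and do not match the real ext4 mount flag definitions or parsing code.

```c
#include <stdbool.h>

/* Hypothetical flag bits, assumed for this sketch only. */
enum {
    OPT_DATA_JOURNAL   = 1u << 0,
    OPT_DELALLOC       = 1u << 1,
    OPT_DIOREAD_NOLOCK = 1u << 2,
};

/*
 * explicit_opts holds only the options the user passed explicitly.
 * Returns true if the mount may proceed, false if it must fail.
 */
bool mount_opts_ok(unsigned explicit_opts,
                   unsigned long block_size, unsigned long page_size)
{
    /* data=journal conflicts with an explicit delalloc or dioread_nolock. */
    if ((explicit_opts & OPT_DATA_JOURNAL) &&
        (explicit_opts & (OPT_DELALLOC | OPT_DIOREAD_NOLOCK)))
        return false;

    /* dioread_nolock requires block size == page size. */
    if ((explicit_opts & OPT_DIOREAD_NOLOCK) && block_size != page_size)
        return false;

    return true;
}
```

The point of the sketch is that a conflict produces a hard failure rather than a warning that the user may never read.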
  2. 14 Aug, 2011 1 commit
    • J
      ext4: call ext4_ioend_wait and ext4_flush_completed_IO in ext4_evict_inode · 2581fdc8
      Committed by Jiaying Zhang
      Flush inode's i_completed_io_list before calling ext4_ioend_wait to
      prevent the following deadlock scenario: A page fault happens while
      some process is writing inode A. During page fault,
      shrink_icache_memory is called that in turn evicts another inode
      B. Inode B has some pending io_end work so it calls ext4_ioend_wait()
      that waits for inode B's i_ioend_count to become zero. However, inode
      B's ioend work was queued behind some of inode A's ioend work on the
      same cpu's ext4-dio-unwritten workqueue. As the ext4-dio-unwritten
      thread on that cpu is processing inode A's ioend work, it tries to
      grab inode A's i_mutex lock. Since the i_mutex lock of inode A was
      taken before the page fault and is still held, we enter a deadlock.
      
      Also move ext4_flush_completed_IO and ext4_ioend_wait from
      ext4_destroy_inode() to ext4_evict_inode(). During inode deletion,
      ext4_evict_inode() is called before ext4_destroy_inode() and in
      ext4_evict_inode(), we may call ext4_truncate() without holding
      i_mutex lock. As a result, there is a race between flush_completed_IO
      that is called from ext4_ext_truncate() and ext4_end_io_work, which
      may cause corruption on an io_end structure. This change moves
      ext4_flush_completed_IO and ext4_ioend_wait from ext4_destroy_inode()
      to ext4_evict_inode() to resolve the race between ext4_truncate() and
      ext4_end_io_work during inode deletion.
      Signed-off-by: Jiaying Zhang <jiayingz@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      2581fdc8
  3. 04 Aug, 2011 1 commit
  4. 01 Aug, 2011 2 commits
  5. 27 Jul, 2011 1 commit
  6. 18 Jul, 2011 1 commit
  7. 11 Jul, 2011 1 commit
  8. 06 Jun, 2011 1 commit
    • L
      ext4: Fix max file size and logical block counting of extent format file · f17722f9
      Committed by Lukas Czerner
      Kazuya Mio reported that he was able to hit BUG_ON(next == lblock)
      in ext4_ext_put_gap_in_cache() while creating a sparse file in extent
      format and filling the tail of the file up to its end. We hit the BUG_ON
      when we write the last block (2^32-1) into the sparse file.
      
      The root cause of the problem is that we specifically set
      s_maxbytes so that the block at s_maxbytes fits into the on-disk extent
      format, which is 32 bits long. However, we are not storing start and end
      block numbers, but rather the start block number and the length in blocks.
      This means that in order to cover an extent from 0 to EXT_MAX_BLOCK we need
      EXT_MAX_BLOCK+1 to fit into len (because we are counting block 0 as
      well), and it does not.
      
      The only way to fix it without changing the meaning of the struct
      ext4_extent members is, as Kazuya Mio suggested, to lower s_maxbytes
      by one fs block so we can cover the whole extent we can get by the
      on-disk extent format.
      
      Also, in many places EXT_MAX_BLOCK is used as a length instead of the
      maximum logical block number that the name suggests, which is a bit messy.
      So this commit renames it to EXT_MAX_BLOCKS and changes its usage in some
      places to actually be the maximum number of blocks in the extent.
      
      The bug which this commit fixes can be reproduced as follows:
      
       dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-2))
       sync
       dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-1))
      Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com>
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      f17722f9
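The off-by-one described in this commit can be checked with plain arithmetic. The sketch below is illustrative only: `MAX_BLOCKS_SKETCH`, `blocks_covering` and `max_bytes_sketch` are hypothetical names, and any other limits the kernel applies to the maximum file size are ignored.

```c
#include <stdint.h>

/* Largest block count that fits in a 32-bit length (assumed for this sketch). */
#define MAX_BLOCKS_SKETCH 0xffffffffULL

/* Number of blocks needed to cover logical blocks first..last, inclusive. */
uint64_t blocks_covering(uint64_t first, uint64_t last)
{
    return last - first + 1;
}

/*
 * Largest file size in bytes when at most MAX_BLOCKS_SKETCH blocks are
 * addressable, i.e. the limit lowered by one filesystem block.
 * blkbits is log2 of the block size (12 for 4096-byte blocks).
 */
uint64_t max_bytes_sketch(unsigned blkbits)
{
    return MAX_BLOCKS_SKETCH << blkbits;
}
```

Covering blocks 0 through 2^32-1 needs 2^32 blocks, which does not fit in 32 bits, whereas covering 0 through 2^32-2 needs exactly 2^32-1 blocks, which does.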
  9. 27 May, 2011 1 commit
    • D
      ext4: add cleancache support · 7abc52c2
      Committed by Dan Magenheimer
      This seventh patch of eight in this cleancache series "opts-in"
      cleancache for ext4.  Filesystems must explicitly enable cleancache
      by calling cleancache_init_fs anytime an instance of the filesystem
      is mounted. For ext4, all other cleancache hooks are in
      the VFS layer including the matching cleancache_flush_fs
      hook which must be called on unmount.
      
      Details and a FAQ can be found in Documentation/vm/cleancache.txt
      
      [v6-v8: no changes]
      [v5: jeremy@goop.org: simplify init hook and any future fs init changes]
      Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
      Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: Andreas Dilger <adilger@sun.com>
      Cc: Ted Ts'o <tytso@mit.edu>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Rik Van Riel <riel@redhat.com>
      Cc: Jan Beulich <JBeulich@novell.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <joel.becker@oracle.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      7abc52c2
  10. 25 May, 2011 2 commits
    • J
      ext4: add support for multiple mount protection · c5e06d10
      Committed by Johann Lombardi
      Prevent an ext4 filesystem from being mounted multiple times.
      A sequence number is stored on disk and is periodically updated (every 5
      seconds by default) by a mounted filesystem.
      At mount time, we now wait for s_mmp_update_interval seconds to make sure
      that the MMP sequence does not change.
      In case of failure, the nodename, bdevname and the time at which the MMP
      block was last updated are displayed.
      Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
      Signed-off-by: Johann Lombardi <johann@whamcloud.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      c5e06d10
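The mount-time check in the multiple mount protection commit boils down to sampling the on-disk sequence number over the wait interval and refusing to mount if it moves. A minimal sketch follows, assuming the samples have already been collected into an array; `mmp_seq_changed` is a hypothetical helper, not the kernel's function, and a real implementation would read the MMP block from the device.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * samples[] holds the sequence number as read at successive times during
 * the mount-time wait (hypothetical input for this sketch).
 * Returns true if the sequence changed, which suggests another node has
 * the filesystem mounted and is still updating it.
 */
bool mmp_seq_changed(const uint32_t *samples, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (samples[i] != samples[0])
            return true;
    }
    return false;
}
```

A caller would fail the mount when this returns true and proceed otherwise.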
    • K
      ext4: ensure f_bfree returned by ext4_statfs() is non-negative · d02a9391
      Committed by Kazuya Mio
      I found an issue where the number of free blocks went negative:
      # stat -f /mnt/mp1/
        File: "/mnt/mp1/"
          ID: e175ccb83a872efe Namelen: 255     Type: ext2/ext3
      Block size: 4096       Fundamental block size: 4096
      Blocks: Total: 258022     Free: -15        Available: -13122
      Inodes: Total: 65536      Free: 63029
      
      f_bfree in struct statfs will go negative when the filesystem has
      few free blocks, because the number of dirty blocks is bigger than
      the number of free blocks in the following two cases.
      
      CASE 1:
      ext4_da_writepages
        mpage_da_map_and_submit
          ext4_map_blocks
            ext4_ext_map_blocks
              ext4_mb_new_blocks
                ext4_mb_diskspace_used
                  percpu_counter_sub(&sbi->s_freeblocks_counter, ac->ac_b_ex.fe_len);
              <--- interrupt statfs systemcall --->
              ext4_da_update_reserve_space
                  percpu_counter_sub(&sbi->s_dirtyblocks_counter,
                                  used + ei->i_allocated_meta_blocks);
      
      CASE 2:
      ext4_write_begin
        __block_write_begin
          ext4_map_blocks
            ext4_ext_map_blocks
              ext4_mb_new_blocks
                ext4_mb_diskspace_used
                  percpu_counter_sub(&sbi->s_freeblocks_counter, ac->ac_b_ex.fe_len);
                  <--- interrupt statfs systemcall --->
                  percpu_counter_sub(&sbi->s_dirtyblocks_counter, reserv_blks);
      
      To avoid the issue, this patch ensures that f_bfree is non-negative.
      Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com>
      d02a9391
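The idea behind this fix can be sketched as a clamp: if a statfs call lands between the two counter updates shown in the traces, the difference between free and dirty blocks can dip below zero, so the reported value is floored at zero. The function name `reported_free_blocks` is hypothetical, and plain integers stand in for the per-CPU counters.

```c
#include <stdint.h>

/*
 * Free blocks to report: free minus dirty, clamped at zero.
 * The inputs are signed because a racing update can make the
 * difference transiently negative.
 */
uint64_t reported_free_blocks(int64_t free_blocks, int64_t dirty_blocks)
{
    int64_t bfree = free_blocks - dirty_blocks;
    return bfree > 0 ? (uint64_t)bfree : 0;
}
```

With the numbers from the stat output in this commit message, a difference of -15 would be reported as 0 instead.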
  11. 23 May, 2011 2 commits
  12. 21 May, 2011 4 commits
  13. 19 May, 2011 1 commit
  14. 16 May, 2011 1 commit
  15. 09 May, 2011 2 commits
  16. 19 Apr, 2011 1 commit
    • T
      ext4: check for ext[23] file system features when mounting as ext[23] · 2035e776
      Committed by Theodore Ts'o
      Provide better emulation for ext[23] mode by enforcing that the file
      system does not have any unsupported file system features as defined
      by ext[23] when emulating the ext[23] file system driver when
      CONFIG_EXT4_USE_FOR_EXT23 is defined.
      
      This causes the file system type information in /proc/mounts to be
      correct for the automatically mounted root file system.  This also
      means that "mount -t ext2 /dev/sda /mnt" will fail if /dev/sda
      contains an ext3 or ext4 file system, just as one would expect if the
      original ext2 file system driver were in use.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      2035e776
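The check this commit describes is, in essence, a bitmask test: the mount fails if the filesystem carries any feature bit outside the set that the emulated driver supports. The sketch below uses made-up feature bits and a hypothetical `features_supported` helper; the real ext2/ext3/ext4 feature masks are split into several classes and are not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits, assumed for this sketch only. */
#define FEAT_DIR_INDEX 0x1u
#define FEAT_JOURNAL   0x2u
#define FEAT_EXTENTS   0x4u

/* Hypothetical per-mode support masks. */
#define EXT2_MODE_SUPP (FEAT_DIR_INDEX)
#define EXT3_MODE_SUPP (FEAT_DIR_INDEX | FEAT_JOURNAL)

/* True if every feature set on the filesystem is in the supported mask. */
bool features_supported(uint32_t fs_features, uint32_t supported)
{
    return (fs_features & ~supported) == 0;
}
```

Under these assumed masks, a filesystem with a journal is rejected in ext2 mode and accepted in ext3 mode, while one with extents is rejected in both.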
  17. 11 Apr, 2011 1 commit
  18. 06 Apr, 2011 1 commit
  19. 05 Apr, 2011 2 commits
  20. 31 Mar, 2011 1 commit
  21. 22 Mar, 2011 1 commit
  22. 15 Mar, 2011 1 commit
  23. 06 Mar, 2011 1 commit
  24. 28 Feb, 2011 2 commits
  25. 27 Feb, 2011 1 commit
  26. 24 Feb, 2011 2 commits
  27. 22 Feb, 2011 2 commits
  28. 12 Feb, 2011 1 commit
    • E
      ext4: serialize unaligned asynchronous DIO · e9e3bcec
      Committed by Eric Sandeen
      ext4 has a data corruption case when doing non-block-aligned
      asynchronous direct IO into a sparse file, as demonstrated
      by xfstest 240.
      
      The root cause is that while ext4 preallocates space in the
      hole, mappings of that space still look "new" and 
      dio_zero_block() will zero out the unwritten portions.  When
      more than one AIO thread is going, they both find this "new"
      block and race to zero out their portion; this is uncoordinated
      and causes data corruption.
      
      Dave Chinner fixed this for xfs by simply serializing all
      unaligned asynchronous direct IO.  I've done the same here.
      The difference is that we only wait on conversions, not all IO.
      This is a very big hammer, and I'm not very pleased with
      stuffing this into ext4_file_write().  But since ext4 is
      DIO_LOCKING, we need to serialize it at this high level.
      
      I tried to move this into ext4_ext_direct_IO, but by then
      we have the i_mutex already, and we will wait on the
      work queue to do conversions - which must also take the
      i_mutex.  So that won't work.
      
      This was originally exposed by qemu-kvm installing to
      a raw disk image with a normal sector-63 alignment.  I've
      tested a backport of this patch with qemu, and it does
      avoid the corruption.  It is also quite a lot slower
      (14 min for package installs, vs. 8 min for well-aligned)
      but I'll take slow correctness over fast corruption any day.
      
      Mingming suggested that we can track outstanding
      conversions, and wait on those so that non-sparse
      files won't be affected, and I've implemented that here;
      unaligned AIO to nonsparse files won't take a perf hit.
      
      [tytso@mit.edu: Keep the mutex as a hashed array instead
       of bloating the ext4 inode]
      
      [tytso@mit.edu: Fix up namespace issues so that global
       variables are protected with an "ext4_" prefix.]
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      e9e3bcec
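Deciding which requests need the serialization described in this commit starts with an alignment test on the request. The sketch below shows only that test, under the assumption that the block size is a power of two; `io_is_unaligned` is a hypothetical name and the sketch leaves out any special cases the actual ext4 code may handle.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if the request does not start or does not end on a block boundary.
 * block_size must be a power of two.
 */
bool io_is_unaligned(uint64_t offset, uint64_t length, uint32_t block_size)
{
    uint64_t mask = (uint64_t)block_size - 1;

    /* Any low bits set in the start or end offset mean a partial block. */
    return ((offset | (offset + length)) & mask) != 0;
}
```

For example, a 4096-byte write starting at sector 63 (byte offset 63 * 512 = 32256) is unaligned with 4096-byte blocks, which matches the qemu-kvm case mentioned in the commit message.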
  29. 04 Feb, 2011 1 commit