1. 04 Jul, 2015 2 commits
    • ext4: correctly migrate a file with a hole at the beginning · 8974fec7
      Committed by Eryu Guan
      Currently ext4_ind_migrate() doesn't correctly handle a file which
      contains a hole at the beginning of the file.  This caused the migration
      to be done incorrectly, and a subsequent delayed allocation write to
      the "hole" would then claim the same data blocks again, resulting in
      fs corruption.
      
        # assuming 4k block size ext4, with delalloc enabled
        # skip the first block and write to the second block
        xfs_io -fc "pwrite 4k 4k" -c "fsync" /mnt/ext4/testfile
      
        # converting to indirect-mapped file, which would move the data blocks
        # to the beginning of the file, but extent status cache still marks
        # that region as a hole
        chattr -e /mnt/ext4/testfile
      
        # delayed allocation writes to the "hole", reclaim the same data block
        # again, results in i_blocks corruption
        xfs_io -c "pwrite 0 4k" /mnt/ext4/testfile
        umount /mnt/ext4
        e2fsck -nf /dev/sda6
        ...
        Inode 53, i_blocks is 16, should be 8.  Fix? no
        ...
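The failure mode described above can be modelled in a few lines. This is only a sketch under stated assumptions, not the kernel code: the function names, the 12-slot `i_data` list and the physical block number 9000 are hypothetical. It shows how filling the direct slots from index 0 moves the data into the hole, while filling from the extent's logical start keeps the hole in place. Note that i_blocks counts 512-byte units, so one 4k block is 8 and a doubly counted block is 16.

```python
NDIR_BLOCKS = 12  # direct block slots in an indirect-mapped inode


def migrate_buggy(start, length, first_pblk):
    """Fill direct slots from index 0, ignoring the extent's logical start."""
    i_data = [0] * NDIR_BLOCKS
    for i in range(length):
        i_data[i] = first_pblk + i
    return i_data


def migrate_fixed(start, length, first_pblk):
    """Fill direct slots at the extent's own logical offset."""
    i_data = [0] * NDIR_BLOCKS
    for i in range(start, start + length):
        i_data[i] = first_pblk + (i - start)
    return i_data


# The reproducer's file: a hole at logical block 0, data at logical block 1.
# 9000 is a made-up physical block number used only for illustration.
print(migrate_buggy(start=1, length=1, first_pblk=9000)[:2])
print(migrate_fixed(start=1, length=1, first_pblk=9000)[:2])
```

In the buggy variant slot 0 holds the data block, so logical block 0 is mapped on disk while the extent status cache still treats it as a hole.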
      Signed-off-by: Eryu Guan <guaneryu@gmail.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
    • ext4: be more strict when migrating to non-extent based file · d6f123a9
      Committed by Eryu Guan
      Currently the checks that ext4_ind_migrate() performs before doing the
      real conversion are not sufficient:
      
      a) delayed allocated extents could bypass the check on eh->eh_entries
         and eh->eh_depth
      
      This can be demonstrated by this script
      
        xfs_io -fc "pwrite 0 4k" -c "pwrite 8k 4k" /mnt/ext4/testfile
        chattr -e /mnt/ext4/testfile
      
      where testfile has two extents but is still converted to the non-extent
      based file format.
      
      b) only the extent length is checked but not the offset, which would
         result in data loss (delalloc) or fs corruption (nodelalloc), because
         a non-extent based file only supports at most
         (12 + 2^10 + 2^20 + 2^30) blocks
      
      This can be demonstrated by
      
        xfs_io -fc "pwrite 5T 4k" /mnt/ext4/testfile
        chattr -e /mnt/ext4/testfile
        sync
      
      If delalloc is enabled, dmesg prints
        EXT4-fs warning (device dm-4): ext4_block_to_path:105: block 1342177280 > max in inode 53
        EXT4-fs (dm-4): Delayed block allocation failed for inode 53 at logical offset 1342177280 with max blocks 1 with error 5
        EXT4-fs (dm-4): This should not happen!! Data will be lost
      
      If delalloc is disabled, e2fsck -nf shows corruption
        Inode 53, i_size is 5497558142976, should be 4096.  Fix? no
      
      Fix the two issues by
      
      a) forcing all delayed allocation blocks to be allocated before checking
         eh->eh_depth and eh->eh_entries
      b) ensuring the last logical block of the extent is within the direct map
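The numbers in case b) can be checked with some quick arithmetic. The sketch below assumes 4k blocks and 32-bit block pointers (so 1024 pointers per indirect block); the helper names are hypothetical and the `fits_direct_map` check only illustrates the rule stated in fix b), not the exact kernel condition.

```python
NDIR_BLOCKS = 12  # direct block slots in the inode


def max_indirect_blocks(block_size):
    """Blocks addressable via direct, single, double and triple indirection."""
    addrs = block_size // 4  # 32-bit block pointers per indirect block
    return NDIR_BLOCKS + addrs + addrs ** 2 + addrs ** 3


def fits_direct_map(start, length):
    """True if the extent's last logical block lies within the direct slots."""
    return start + length - 1 <= NDIR_BLOCKS - 1


block_size = 4096
lblk = (5 * 2 ** 40) // block_size  # logical block of "pwrite 5T 4k"

print(max_indirect_blocks(block_size))  # 12 + 2^10 + 2^20 + 2^30
print(lblk)                             # the block number seen in dmesg
print(fits_direct_map(lblk, 1))
```

The logical block 1342177280 exceeds the 1074791436-block limit, which matches the ext4_block_to_path warning quoted above.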
      Signed-off-by: Eryu Guan <guaneryu@gmail.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
  2. 16 Apr, 2015 1 commit
  3. 26 Nov, 2014 1 commit
  4. 02 Sep, 2014 3 commits
  5. 28 Jul, 2014 1 commit
  6. 13 May, 2014 1 commit
  7. 29 Aug, 2013 1 commit
  8. 17 Aug, 2013 1 commit
  9. 11 Apr, 2013 2 commits
  10. 10 Feb, 2013 1 commit
  11. 09 Feb, 2013 1 commit
    • ext4: pass context information to jbd2__journal_start() · 9924a92a
      Committed by Theodore Ts'o
      So we can better understand what bits of ext4 are responsible for
      long-running jbd2 handles, use jbd2__journal_start() so we can pass
      context information for logging purposes.
      
      The recommended way for finding the longer-running handles is:
      
         T=/sys/kernel/debug/tracing
         EVENT=$T/events/jbd2/jbd2_handle_stats
         echo "interval > 5" > $EVENT/filter
         echo 1 > $EVENT/enable
      
         ./run-my-fs-benchmark
      
         cat $T/trace > /tmp/problem-handles
      
      This will list handles that were active for longer than 20ms.  Having
      longer-running handles is bad, because a commit started at the wrong
      time could stall for those 20+ milliseconds, which could delay an
      fsync() or an O_SYNC operation.  Here is an example line from the
      trace file describing a handle which lived on for 311 jiffies, or over
      1.2 seconds:
      
      postmark-2917  [000] ....   196.435786: jbd2_handle_stats: dev 254,32 
         tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
         dirtied_blocks 0
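The interval field is in jiffies, so the figures above imply a tick rate. A minimal conversion sketch, assuming HZ=250 (inferred from the message equating the "interval > 5" filter with 20ms, not stated explicitly); `jiffies_to_ms` here is a hypothetical Python helper, not the kernel function of a similar name.

```python
def jiffies_to_ms(jiffies, hz=250):
    """Convert a jiffies count to milliseconds for a given tick rate."""
    return jiffies * 1000 / hz


print(jiffies_to_ms(5))    # the filter threshold
print(jiffies_to_ms(311))  # the handle in the example trace line
```

With this assumption, 5 jiffies is 20 ms and 311 jiffies is 1244 ms, consistent with "over 1.2 seconds".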
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  12. 29 Nov, 2012 1 commit
    • ext4: rationalize ext4_extents.h inclusion · 4a092d73
      Committed by Theodore Ts'o
      Previously, ext4_extents.h was being included at the end of ext4.h,
      which was bad for a number of reasons: (a) it was not being included
      in the expected place, and (b) it caused the header to be included
      multiple times.  There were #ifdef's to prevent this from causing any
      problems, but it still was unnecessary.
      
      By moving the function declarations that were in ext4_extents.h to
      ext4.h, which is standard practice for where the function declarations
      for the rest of ext4.h can be found, we can remove ext4_extents.h from
      being included in ext4.h at all, and then we can only include
      ext4_extents.h where it is needed in ext4's source files.
      
      It should be possible to move a few more things into ext4.h, and
      further reduce the number of source files that need to #include
      ext4_extents.h, but that's a cleanup for another day.
      Reported-by: Sachin Kamat <sachin.kamat@linaro.org>
      Reported-by: Wei Yongjun <weiyj.lk@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  13. 16 May, 2012 1 commit
  14. 21 Feb, 2012 1 commit
  15. 09 Jan, 2012 1 commit
  16. 02 Nov, 2011 1 commit
  17. 29 Oct, 2011 2 commits
  18. 10 Sep, 2011 1 commit
  19. 03 May, 2011 1 commit
  20. 31 Mar, 2011 1 commit
  21. 22 Feb, 2011 1 commit
  22. 11 Jan, 2011 1 commit
  23. 28 Oct, 2010 1 commit
  24. 14 Jun, 2010 1 commit
  25. 17 May, 2010 1 commit
  26. 30 Mar, 2010 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Committed by Tejun Heo
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include
      those headers directly instead of assuming availability.  As this
      conversion needs to touch a large number of source files, the
      following script is used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
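The first rule in the list above (slab.h if slab is used, otherwise gfp.h if only gfp is used) can be sketched as follows. This is not the slabh-sweep.py script itself; the regular expressions and the `needed_include` helper are hypothetical and cover only a handful of well-known identifiers.

```python
import re

# Hypothetical, deliberately incomplete patterns for slab and gfp usage.
SLAB_RE = re.compile(r"\b(?:kmalloc|kzalloc|kcalloc|kfree|kmem_cache_\w+)\s*\(")
GFP_RE = re.compile(r"\b(?:__)?GFP_[A-Z_]+\b")


def needed_include(source):
    """Return the single header a file needs, or None if it needs neither."""
    if SLAB_RE.search(source):
        return "linux/slab.h"  # slab users get slab.h
    if GFP_RE.search(source):
        return "linux/gfp.h"   # gfp-only users get gfp.h
    return None


print(needed_include("p = kmalloc(n, GFP_KERNEL);"))
print(needed_include("page = alloc_page(GFP_KERNEL);"))
print(needed_include("int x = 1;"))
```

A real sweep also has to choose where to place the include and report files with no suitable include block, as the remaining bullets describe.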
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  27. 02 Mar, 2010 1 commit
  28. 25 Jan, 2010 1 commit
    • ext4: Use bitops to read/modify EXT4_I(inode)->i_state · 19f5fb7a
      Committed by Theodore Ts'o
      At several places we modify EXT4_I(inode)->i_state without holding
      i_mutex (ext4_release_file, ext4_bmap, ext4_journalled_writepage,
      ext4_do_update_inode, ...). These modifications are racy and we can
      lose updates to i_state. So convert handling of i_state to use bitops
      which are atomic.
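The lost update described here can be shown with a deterministic simulation of one bad interleaving. This is a sketch only: the bit values are hypothetical, and the second function merely models the effect of an indivisible read-modify-write rather than using real atomic instructions.

```python
def interleaved_rmw(state, bit_a, bit_b):
    """Two writers load the same value before either one stores."""
    seen_a = state            # writer A loads i_state
    seen_b = state            # writer B loads the same, soon stale, value
    state = seen_a | bit_a    # A stores its update
    state = seen_b | bit_b    # B stores and overwrites A's update
    return state


def atomic_set_bits(state, bit_a, bit_b):
    """Each set is one indivisible read-modify-write, so nothing is lost."""
    for bit in (bit_a, bit_b):
        state |= bit
    return state


print(hex(interleaved_rmw(0, 0x1, 0x2)))
print(hex(atomic_set_bits(0, 0x1, 0x2)))
```

In the interleaved case only the second writer's bit survives; with atomic bit operations both bits end up set.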
      
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  29. 09 Dec, 2009 1 commit
  30. 23 Nov, 2009 1 commit
    • ext4: call ext4_forget() from ext4_free_blocks() · e6362609
      Committed by Theodore Ts'o
      Add the facility for ext4_forget() to be called from
      ext4_free_blocks().  This simplifies the code in a large number of
      places, and centralizes most of the work of calling ext4_forget() into
      a single place.
      
      Also fix a bug in the extents migration code; it wasn't calling
      ext4_forget() when releasing the indirect blocks during the
      conversion.  As a result, if the system crashed during or shortly after
      the extents migration, and the released indirect blocks get reused as
      data blocks, the journal replay would corrupt the data blocks.  With
      this new patch, fixing this bug was as simple as adding the
      EXT4_FREE_BLOCKS_FORGET flag to the call to ext4_free_blocks().
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
  31. 29 Sep, 2009 1 commit
    • ext4: Split uninitialized extents for direct I/O · 0031462b
      Committed by Mingming Cao
      When writing into an uninitialized extent via direct I/O, and the direct
      I/O doesn't exactly cover the uninitialized extent, split the extent
      into uninitialized and initialized extents before submitting the I/O.
      This avoids needing to deal with an ENOSPC error in the end_io
      callback that gets used for direct I/O.
      
      When the IO is complete, the written extent will be marked as initialized.
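The split can be pictured as cutting one extent into at most three pieces around the write range. The sketch below is a simplified model with hypothetical names and labels, working in logical blocks; the piece labelled "to-write" is the one that would be marked initialized once the I/O completes.

```python
def split_for_write(ext_start, ext_len, wr_start, wr_len):
    """Split an uninitialized extent around a write that lies inside it."""
    ext_end = ext_start + ext_len
    wr_end = wr_start + wr_len
    if not (ext_start <= wr_start and wr_end <= ext_end):
        raise ValueError("write range must lie inside the extent")

    pieces = []
    if wr_start > ext_start:
        # Untouched head of the extent stays uninitialized.
        pieces.append((ext_start, wr_start - ext_start, "uninit"))
    # The range covered by the direct I/O.
    pieces.append((wr_start, wr_len, "to-write"))
    if wr_end < ext_end:
        # Untouched tail of the extent stays uninitialized.
        pieces.append((wr_end, ext_end - wr_end, "uninit"))
    return pieces


print(split_for_write(0, 10, 3, 4))
print(split_for_write(0, 10, 0, 10))
```

Doing the split before submitting the I/O means any allocation the split needs happens up front rather than in the end_io callback, which is the ENOSPC concern the message raises.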
      
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  32. 17 Sep, 2009 1 commit
    • ext4: store EXT4_EXT_MIGRATE in i_state instead of i_flags · 1b9c12f4
      Committed by Theodore Ts'o
      EXT4_EXT_MIGRATE is only intended to be used for an in-memory flag,
      and the hex value assigned to it collides with FS_DIRECTIO_FL (which
      is also stored in i_flags).  There's no reason for the
      EXT4_EXT_MIGRATE bit to be stored in i_flags, so we switch it to use
      i_state instead.
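A collision of this kind is easy to see with a bit test. In the sketch below both constants are set to 0x00100000; the message only says the values collide, so treat the specific number as an assumption used for illustration.

```python
# Assumed values: the commit message says the two flags collide but does
# not quote the hex value, so 0x00100000 is illustrative only.
FS_DIRECTIO_FL = 0x00100000
EXT4_EXT_MIGRATE = 0x00100000


def flags_collide(a, b):
    """True if two flag masks share at least one bit."""
    return (a & b) != 0


# Setting the in-memory migrate flag in i_flags...
i_flags = 0 | EXT4_EXT_MIGRATE
# ...makes the inode look as if the direct I/O flag had been set.
print(flags_collide(FS_DIRECTIO_FL, EXT4_EXT_MIGRATE))
print(bool(i_flags & FS_DIRECTIO_FL))
```

Keeping the in-memory flag in a separate word such as i_state removes the ambiguity, since the two masks are then never tested against the same field.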
      
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  33. 26 Aug, 2009 1 commit
    • ext4: Add missing unlock_new_inode() call in extent migration code · a8526e84
      Committed by Aneesh Kumar K.V
      We need to unlock the new inode before iput.  This patch fixes the
      following warning when calling chattr +e to migrate a file to use
      extents.  It also fixes problems when e4defrag attempts to
      defragment an inode.
      
      [  470.400044] ------------[ cut here ]------------
      [  470.400065] WARNING: at fs/inode.c:1210 generic_delete_inode+0x65/0x16a()
      [  470.400072] Hardware name: N/A
      .....
      ...
      [  470.400353] Pid: 4451, comm: chattr Not tainted 2.6.31-rc7-red-debug #4
      [  470.400359] Call Trace:
      [  470.400372]  [<ffffffff81037771>] warn_slowpath_common+0x77/0x8f
      [  470.400385]  [<ffffffff81037798>] warn_slowpath_null+0xf/0x11
      [  470.400395]  [<ffffffff810b7f28>] generic_delete_inode+0x65/0x16a
      [  470.400405]  [<ffffffff810b8044>] generic_drop_inode+0x17/0x1bd
      [  470.400413]  [<ffffffff810b7083>] iput+0x61/0x65
      [  470.400455]  [<ffffffffa003b229>] ext4_ext_migrate+0x5eb/0x66a [ext4]
      [  470.400492]  [<ffffffffa002b1f8>] ext4_ioctl+0x340/0x756 [ext4]
      [  470.400507]  [<ffffffff810b1a91>] vfs_ioctl+0x1d/0x82
      [  470.400517]  [<ffffffff810b1ff0>] do_vfs_ioctl+0x483/0x4c9
      [  470.400527]  [<ffffffff81059c30>] ? trace_hardirqs_on+0xd/0xf
      [  470.400537]  [<ffffffff810b2087>] sys_ioctl+0x51/0x74
      [  470.400549]  [<ffffffff8100ba6b>] system_call_fastpath+0x16/0x1b
      [  470.400557] ---[ end trace ab85723542352dac ]---
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  34. 13 Jun, 2009 2 commits
    • ext4: teach the inode allocator to use a goal inode number · 11013911
      Committed by Andreas Dilger
      Enhance the inode allocator to take a goal inode number as a
      parameter; if it is specified, it takes precedence over Orlov or
      parent directory inode allocation algorithms.
      
      The extents migration function uses the goal inode number so that the
      extent trees allocated by the migration function use the correct flex_bg.
      In the future, the goal inode functionality will also be used to
      allocate an adjacent inode for the extended attributes.
      
      Also, for testing purposes the goal inode number can be specified via
      /sys/fs/{dev}/inode_goal.  This can be useful for testing inode
      allocation beyond 2^32 blocks on very large filesystems.
      Signed-off-by: Andreas Dilger <adilger@sun.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: Use a hash of the topdir directory name for the Orlov parent group · f157a4aa
      Committed by Theodore Ts'o
      Instead of using a random number to determine the goal parent group for
      the Orlov top directories, use a hash of the directory name.  This
      allows for repeatable results when trying to benchmark filesystem
      layout algorithms.
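The idea can be sketched as mapping a name hash onto the number of block groups. This is not ext4's hash function: `zlib.crc32` is a stand-in chosen only because it is deterministic across runs, and `topdir_parent_group` is a hypothetical helper.

```python
import zlib


def topdir_parent_group(name, ngroups):
    """Pick a goal group from a stable hash of the directory name."""
    # crc32 is a stand-in for the filesystem's own directory hash.
    return zlib.crc32(name.encode("utf-8")) % ngroups


# The same name always maps to the same group, unlike a random choice.
first = topdir_parent_group("home", 128)
second = topdir_parent_group("home", 128)
print(first == second)
```

Because the result depends only on the name and the group count, repeated benchmark runs that create the same top-level directories start from the same layout.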
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>