1. 28 7月, 2011 1 次提交
  2. 24 7月, 2011 3 次提交
  3. 18 7月, 2011 4 次提交
  4. 12 7月, 2011 1 次提交
  5. 11 7月, 2011 3 次提交
    • R
      ext4: remove redundant goto in ext4_ext_insert_extent() · ffb505ff
      Robin Dong 提交于
      If eh->eh_entries is smaller than eh->eh_max, the routine will
      go to the "repeat" and then go to "has_space" directlly ,
      since argument "depth" and "eh" are not even changed.
      
      Therefore, goto "has_space" directly and remove redundant "repeat" tag.
      Signed-off-by: NRobin Dong <sanbai@taobao.com>
      ffb505ff
    • J
      ext4: free allocated and pre-allocated blocks when check_eofblocks_fl fails · 575a1d4b
      Jiaying Zhang 提交于
      Upon corrupted inode or disk failures, we may fail after we already
      allocate some blocks from the inode or take some blocks from the
      inode's preallocation list, but before we successfully insert the
      corresponding extent to the extent tree. In this case, we should free
      any allocated blocks and discard the inode's preallocated blocks
      because the entries in the inode's preallocation list may be in an
      inconsistent state.
      Signed-off-by: NJiaying Zhang <jiayingz@google.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      575a1d4b
    • M
      ext4: fix i_blocks/quota accounting when extent insertion fails · 7132de74
      Maxim Patlasov 提交于
      The current implementation of ext4_free_blocks() always calls
      dquot_free_block This looks quite sensible in the most cases: blocks
      to be freed are associated with inode and were accounted in quota and
      i_blocks some time ago.
      
      However, there is a case when blocks to free were not accounted by the
      time calling ext4_free_blocks() yet:
      
      1. delalloc is on, write_begin pre-allocated some space in quota
      2. write-back happens, ext4 allocates some blocks in ext4_ext_map_blocks()
      3. then ext4_ext_map_blocks() gets an error (e.g.  ENOSPC) from
         ext4_ext_insert_extent() and calls ext4_free_blocks().
      
      In this scenario, ext4_free_blocks() calls dquot_free_block() who, in
      turn, decrements i_blocks for blocks which were not accounted yet (due
      to delalloc) After clean umount, e2fsck reports something like:
      
      > Inode 21, i_blocks is 5080, should be 5128.  Fix<y>?
      because i_blocks was erroneously decremented as explained above.
      
      The patch fixes the problem by passing the new flag
      EXT4_FREE_BLOCKS_NO_QUOT_UPDATE to ext4_free_blocks(), to request
      that the dquot_free_block() call be skipped.
      Signed-off-by: NMaxim Patlasov <maxim.patlasov@gmail.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      7132de74
  6. 28 6月, 2011 3 次提交
  7. 06 6月, 2011 2 次提交
    • L
      ext4: use FIEMAP_EXTENT_LAST flag for last extent in fiemap · c03f8aa9
      Lukas Czerner 提交于
      Currently we are not marking the extent as the last one
      (FIEMAP_EXTENT_LAST) if there is a hole at the end of the file. This is
      because we just do not check for it right now and continue searching for
      next extent. But at the point we hit the hole at the end of the file, it
      is too late.
      
      This commit adds check for the allocated block in subsequent extent and
      if there is no more extents (block = EXT_MAX_BLOCKS) just flag the
      current one as the last one.
      
      This behaviour has been spotted unintentionally by 252 xfstest, when the
      test hangs out, because of wrong loop condition. However on other
      filesystems (like xfs) it will exit anyway, because we notice the last
      extent flag and exit.
      
      With this patch xfstest 252 does not hang anymore, ext4 fiemap
      implementation still reports bad extent type in some cases, however
      this seems to be different issue.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      c03f8aa9
    • L
      ext4: Fix max file size and logical block counting of extent format file · f17722f9
      Lukas Czerner 提交于
      Kazuya Mio reported that he was able to hit BUG_ON(next == lblock)
      in ext4_ext_put_gap_in_cache() while creating a sparse file in extent
      format and fill the tail of file up to its end. We will hit the BUG_ON
      when we write the last block (2^32-1) into the sparse file.
      
      The root cause of the problem lies in the fact that we specifically set
      s_maxbytes so that block at s_maxbytes fit into on-disk extent format,
      which is 32 bit long. However, we are not storing start and end block
      number, but rather start block number and length in blocks. It means
      that in order to cover extent from 0 to EXT_MAX_BLOCK we need
      EXT_MAX_BLOCK+1 to fit into len (because we counting block 0 as well) -
      and it does not.
      
      The only way to fix it without changing the meaning of the struct
      ext4_extent members is, as Kazuya Mio suggested, to lower s_maxbytes
      by one fs block so we can cover the whole extent we can get by the
      on-disk extent format.
      
      Also in many places EXT_MAX_BLOCK is used as length instead of maximum
      logical block number as the name suggests, it is all a bit messy. So
      this commit renames it to EXT_MAX_BLOCKS and change its usage in some
      places to actually be maximum number of blocks in the extent.
      
      The bug which this commit fixes can be reproduced as follows:
      
       dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-2))
       sync
       dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-1))
      Reported-by: NKazuya Mio <k-mio@sx.jp.nec.com>
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      f17722f9
  8. 26 5月, 2011 1 次提交
  9. 25 5月, 2011 5 次提交
    • V
      ext4: do not normalize block requests from fallocate() · 556b27ab
      Vivek Haldar 提交于
      Currently, an fallocate request of size slightly larger than a power of
      2 is turned into two block requests, each a power of 2, with the extra
      blocks pre-allocated for future use. When an application calls
      fallocate, it already has an idea about how large the file may grow so
      there is usually little benefit to reserve extra blocks on the
      preallocation list. This reduces disk fragmentation.
      
      Tested: fsstress. Also verified manually that fallocat'ed files are
      contiguously laid out with this change (whereas without it they begin at
      power-of-2 boundaries, leaving blocks in between). CPU usage of
      fallocate is not appreciably higher.  In a tight fallocate loop, CPU
      usage hovers between 5%-8% with this change, and 5%-7% without it.
      
      Using a simulated file system aging program which the file system to
      70%, the percentage of free extents larger than 8MB (as measured by
      e2freefrag) increased from 38.8% without this change, to 69.4% with
      this change.
      Signed-off-by: NVivek Haldar <haldar@google.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      556b27ab
    • A
      ext4: enable "punch hole" functionality · a4bb6b64
      Allison Henderson 提交于
      This patch adds new routines: "ext4_punch_hole" "ext4_ext_punch_hole"
      and "ext4_ext_check_cache"
      
      fallocate has been modified to call ext4_punch_hole when the punch hole
      flag is passed.  At the moment, we only support punching holes in
      extents, so this routine is pretty much a wrapper for the ext4_ext_punch_hole
      routine.
      
      The ext4_ext_punch_hole routine first completes all outstanding writes
      with the associated pages, and then releases them.  The unblock
      aligned data is zeroed, and all blocks in between are punched out.
      
      The ext4_ext_check_cache routine is very similar to ext4_ext_in_cache
      except it accepts a ext4_ext_cache parameter instead of a ext4_extent
      parameter.  This routine is used by ext4_ext_punch_hole to check and
      see if a block in a hole that has been cached.  The ext4_ext_cache
      parameter is necessary because the members ext4_extent structure are
      not large enough to hold a 32 bit value.  The existing
      ext4_ext_in_cache routine has become a wrapper to this new function.
      
      [ext4 punch hole patch series 5/5 v7] 
      Signed-off-by: NAllison Henderson <achender@us.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NMingming Cao <cmm@us.ibm.com>
      a4bb6b64
    • A
      ext4: add "punch hole" flag to ext4_map_blocks() · e861304b
      Allison Henderson 提交于
      This patch adds a new flag to ext4_map_blocks() that specifies the
      given range of blocks should be punched out.  Extents are first
      converted to uninitialized extents before they are punched
      out. Because punching a hole may require that the extent be split, it
      is possible that the splitting may need more blocks than are
      available.  To deal with this, use of reserved blocks are enabled to
      allow the split to proceed.
      
      The routine then returns the number of blocks successfully
      punched out.
      
      [ext4 punch hole patch series 4/5 v7]
      Signed-off-by: NAllison Henderson <achender@us.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NMingming Cao <cmm@us.ibm.com>
      e861304b
    • A
      ext4: punch out extents · d583fb87
      Allison Henderson 提交于
      This patch modifies the truncate routines to support hole punching
      Below is a brief summary of the patches changes:
      
      - Added end param to ext_ext4_rm_leaf
              This function has been modified to accept an end parameter
              which enables it to punch holes in leafs instead of just
              truncating them.
      
      - Implemented the "remove head" case in the ext_remove_blocks routine
              This routine is used by ext_ext4_rm_leaf to remove the tail
              of an extent during a truncate.  The new ext_ext4_rm_leaf
              routine will now also use it to remove the head of an extent in the
              case that the hole covers a region of blocks at the beginning
              of an extent.
      
      - Added "end" param to ext4_ext_remove_space routine
              This function has been modified to accept a stop parameter, which
              is passed through to ext4_ext_rm_leaf.
      
      [ext4 punch hole patch series 3/5 v6] 
      Signed-off-by: NAllison Henderson <achender@us.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      d583fb87
    • A
      ext4: add flag to ext4_has_free_blocks · 55f020db
      Allison Henderson 提交于
      This patch adds an allocation request flag to the ext4_has_free_blocks
      function which enables the use of reserved blocks.  This will allow a
      punch hole to proceed even if the disk is full.  Punching a hole may
      require additional blocks to first split the extents.
      
      Because ext4_has_free_blocks is a low level function, the flag needs
      to be passed down through several functions listed below:
      
      ext4_ext_insert_extent
      ext4_ext_create_new_leaf
      ext4_ext_grow_indepth
      ext4_ext_split
      ext4_ext_new_meta_block
      ext4_mb_new_blocks
      ext4_claim_free_blocks
      ext4_has_free_blocks
      
      [ext4 punch hole patch series 1/5 v7]
      Signed-off-by: NAllison Henderson <achender@us.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NMingming Cao <cmm@us.ibm.com>
      55f020db
  10. 24 5月, 2011 1 次提交
  11. 23 5月, 2011 3 次提交
  12. 16 5月, 2011 1 次提交
  13. 04 5月, 2011 2 次提交
  14. 03 5月, 2011 1 次提交
  15. 31 3月, 2011 1 次提交
  16. 24 3月, 2011 1 次提交
  17. 22 3月, 2011 1 次提交
  18. 28 2月, 2011 1 次提交
    • Y
      ext4: make FIEMAP and delayed allocation play well together · 6d9c85eb
      Yongqiang Yang 提交于
      Fix the FIEMAP ioctl so that it returns all of the page ranges which
      are still subject to delayed allocation.  We were missing some cases
      if the file was sparse.
      
      Reported by Chris Mason <chris.mason@oracle.com>:
      >We've had reports on btrfs that cp is giving us files full of zeros
      >instead of actually copying them.  It was tracked down to a bug with
      >the btrfs fiemap implementation where it was returning holes for
      >delalloc ranges.
      >
      >Newer versions of cp are trusting fiemap to tell it where the holes
      >are, which does seem like a pretty neat trick.
      >
      >I decided to give xfs and ext4 a shot with a few tests cases too, xfs
      >passed with all the ones btrfs was getting wrong, and ext4 got the basic
      >delalloc case right.
      >$ mkfs.ext4 /dev/xxx
      >$ mount /dev/xxx /mnt
      >$ dd if=/dev/zero of=/mnt/foo bs=1M count=1
      >$ fiemap-test foo
      >ext:   0 logical: [       0..     255] phys:        0..     255
      >flags: 0x007 tot: 256
      >
      >Horray!  But once we throw a hole in, things go bad:
      >$ mkfs.ext4 /dev/xxx
      >$ mount /dev/xxx /mnt
      >$ dd if=/dev/zero of=/mnt/foo bs=1M count=1 seek=1
      >$ fiemap-test foo
      >< no output >
      >
      >We've got a delalloc extent after the hole and ext4 fiemap didn't find
      >it.  If I run sync to kick the delalloc out:
      >$sync
      >$ fiemap-test foo
      >ext:   0 logical: [     256..     511] phys:    34048..   34303
      >flags: 0x001 tot: 256
      >
      >fiemap-test is sitting in my /usr/local/bin, and I have no idea how it
      >got there.  It's full of pretty comments so I know it isn't mine, but
      >you can grab it here:
      >
      >http://oss.oracle.com/~mason/fiemap-test.c
      >
      >xfsqa has a fiemap program too.
      
      After Fix, test results are as follows:
      ext:   0 logical: [     256..     511] phys:        0..     255
      flags: 0x007 tot: 256
      ext:   0 logical: [     256..     511] phys:    33280..   33535
      flags: 0x001 tot: 256
      
      $ mkfs.ext4 /dev/xxx
      $ mount /dev/xxx /mnt
      $ dd if=/dev/zero of=/mnt/foo bs=1M count=1 seek=1
      $ sync
      $ dd if=/dev/zero of=/mnt/foo bs=1M count=1 seek=3
      $ dd if=/dev/zero of=/mnt/foo bs=1M count=1 seek=5
      $ fiemap-test foo
      ext:   0 logical: [     256..     511] phys:    33280..   33535
      flags: 0x000 tot: 256
      ext:   1 logical: [     768..    1023] phys:        0..     255
      flags: 0x006 tot: 256
      ext:   2 logical: [    1280..    1535] phys:        0..     255
      flags: 0x007 tot: 256
      Tested-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
      Signed-off-by: NYongqiang Yang <xiaoqiangnk@gmail.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      6d9c85eb
  19. 22 2月, 2011 1 次提交
  20. 15 2月, 2011 1 次提交
  21. 12 2月, 2011 1 次提交
    • E
      ext4: serialize unaligned asynchronous DIO · e9e3bcec
      Eric Sandeen 提交于
      ext4 has a data corruption case when doing non-block-aligned
      asynchronous direct IO into a sparse file, as demonstrated
      by xfstest 240.
      
      The root cause is that while ext4 preallocates space in the
      hole, mappings of that space still look "new" and 
      dio_zero_block() will zero out the unwritten portions.  When
      more than one AIO thread is going, they both find this "new"
      block and race to zero out their portion; this is uncoordinated
      and causes data corruption.
      
      Dave Chinner fixed this for xfs by simply serializing all
      unaligned asynchronous direct IO.  I've done the same here.
      The difference is that we only wait on conversions, not all IO.
      This is a very big hammer, and I'm not very pleased with
      stuffing this into ext4_file_write().  But since ext4 is
      DIO_LOCKING, we need to serialize it at this high level.
      
      I tried to move this into ext4_ext_direct_IO, but by then
      we have the i_mutex already, and we will wait on the
      work queue to do conversions - which must also take the
      i_mutex.  So that won't work.
      
      This was originally exposed by qemu-kvm installing to
      a raw disk image with a normal sector-63 alignment.  I've
      tested a backport of this patch with qemu, and it does
      avoid the corruption.  It is also quite a lot slower
      (14 min for package installs, vs. 8 min for well-aligned)
      but I'll take slow correctness over fast corruption any day.
      
      Mingming suggested that we can track outstanding
      conversions, and wait on those so that non-sparse
      files won't be affected, and I've implemented that here;
      unaligned AIO to nonsparse files won't take a perf hit.
      
      [tytso@mit.edu: Keep the mutex as a hashed array instead
       of bloating the ext4 inode]
      
      [tytso@mit.edu: Fix up namespace issues so that global
       variables are protected with an "ext4_" prefix.]
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      e9e3bcec
  22. 21 1月, 2011 1 次提交
  23. 17 1月, 2011 1 次提交
    • C
      fallocate should be a file operation · 2fe17c10
      Christoph Hellwig 提交于
      Currently all filesystems except XFS implement fallocate asynchronously,
      while XFS forced a commit.  Both of these are suboptimal - in case of O_SYNC
      I/O we really want our allocation on disk, especially for the !KEEP_SIZE
      case where we actually grow the file with user-visible zeroes.  On the
      other hand always commiting the transaction is a bad idea for fast-path
      uses of fallocate like for example in recent Samba versions.   Given
      that block allocation is a data plane operation anyway change it from
      an inode operation to a file operation so that we have the file structure
      available that lets us check for O_SYNC.
      
      This also includes moving the code around for a few of the filesystems,
      and remove the already unnedded S_ISDIR checks given that we only wire
      up fallocate for regular files.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      2fe17c10