1. 10 Sep, 2011 5 commits
    • A
      ext4: attempt to fix race in bigalloc code path · 5356f261
      Aditya Kali committed
      Currently, there is a race between delayed-allocation writes and
      writeback when the bigalloc feature is in use. The race arose because we
      wanted to determine which blocks in a cluster are under delayed
      allocation and were using the buffer_delay(bh) check for it. But the
      writeback code path clears this bit without any synchronization, which
      resulted in a race and an ext4 warning similar to:
      
      EXT4-fs (ram1): ext4_da_update_reserve_space: ino 13, used 1 with only 0
      		reserved data blocks
      
      The race existed in two places.
      (1) between ext4_find_delalloc_range() and ext4_map_blocks() when called from
          writeback code path.
      (2) between ext4_find_delalloc_range() and ext4_da_get_block_prep() (where
          buffer_delay(bh) is set).
      
      To fix (1), this patch introduces a new buffer_head state bit -
      BH_Da_Mapped.  This bit is set under the protection of
      EXT4_I(inode)->i_data_sem when we have actually mapped the delayed
      allocated blocks during the writeout time. We can now reliably check
      for this bit inside ext4_find_delalloc_range() to determine whether
      the reservation for the blocks has already been claimed or not.
      
      To fix (2), it was necessary to set buffer_delay(bh) under the
      protection of i_data_sem.  So, I extracted the very beginning of
      ext4_map_blocks into a new function - ext4_da_map_blocks() - and
      performed the required setting of the BH_Delay bit and the quota
      reservation under the protection of i_data_sem.  These two fixes make
      the checking of buffer_delay(bh) and buffer_da_mapped(bh) consistent,
      thus removing the race.
      
      Tested: I was able to reproduce the problem by running 'dd' and
      'fsync' in parallel. Also, xfstests sometimes used to reproduce this
      race. After the fix both my test and xfstests were successful and no
      race (warning message) was observed.
      
      Google-Bug-Id: 4997027
      Signed-off-by: Aditya Kali <adityakali@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • A
      ext4: add some tracepoints in ext4/extents.c · d8990240
      Aditya Kali committed
      This patch adds some tracepoints in ext4/extents.c and updates a tracepoint in
      ext4/inode.c.
      
      Tested: Built and ran the kernel and verified that these tracepoints work.
      Also ran xfstests.
      Signed-off-by: Aditya Kali <adityakali@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • A
      ext4: Fix bigalloc quota accounting and i_blocks value · 7b415bf6
      Aditya Kali committed
      With bigalloc changes, the i_blocks value was not correctly set (it was still
      set to the number of blocks being used, but in the case of bigalloc, we want i_blocks
      to represent the number of clusters being used). Since the quota subsystem sets
      the i_blocks value, this patch fixes the quota accounting and makes sure that
      the i_blocks value is set correctly.
      Signed-off-by: Aditya Kali <adityakali@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • T
      ext4: teach ext4_ext_truncate() about the bigalloc feature · 0aa06000
      Theodore Ts'o committed
      When we are truncating (as opposed to unlinking) a file, we need to worry
      about partial truncates of a file, especially in light of sparse
      files.  The changes here make sure that arbitrary truncates of sparse
      files work correctly.  Yeah, it's messy.
      
      Note that these functions will need to be revisited when the punch
      ioctl is integrated --- in fact this commit will probably have merge
      conflicts with the punch changes which Allison Henderson and the IBM LTC
      have been working on.  I will need to fix this up when either patch
      hits mainline.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • T
      ext4: teach ext4_ext_map_blocks() about the bigalloc feature · 4d33b1ef
      Theodore Ts'o committed
      If we need to allocate a new block in ext4_ext_map_blocks(), the
      function needs to see if the cluster has already been allocated.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
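To illustrate the check described in the ext4_ext_map_blocks() commit: with bigalloc, several logical blocks share one allocation cluster, so before allocating for a new block the code has to ask whether a neighbouring block already lives in the same cluster. The following is only a minimal sketch of that idea. The names `toy_cluster_of` and `toy_same_cluster` and the `cluster_bits` parameter are hypothetical and are not the kernel's helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: index of the cluster that holds a logical block,
 * assuming 2^cluster_bits blocks per cluster. */
uint32_t toy_cluster_of(uint32_t lblk, unsigned int cluster_bits)
{
    return lblk >> cluster_bits;
}

/* Two logical blocks share an allocation unit when their cluster
 * indexes match; in that case no new cluster needs to be allocated. */
bool toy_same_cluster(uint32_t a, uint32_t b, unsigned int cluster_bits)
{
    return toy_cluster_of(a, cluster_bits) == toy_cluster_of(b, cluster_bits);
}
```

With 16 blocks per cluster (`cluster_bits = 4`), blocks 17 and 30 fall in the same cluster, while blocks 15 and 16 straddle a cluster boundary.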
  2. 07 Sep, 2011 1 commit
    • A
      ext4: fix fsx truncate failure · 189e868f
      Allison Henderson committed
      While running extended fsx tests to verify the first
      two patches, a similar bug was also found in the
      truncate operation.
      
      This bug happens because the truncate routine only zeros
      the non-block-aligned portion of the last page.  This means
      that the block-aligned portions of the page appearing after
      i_size are left unzeroed, and the buffer heads are still mapped.
      
      This bug is corrected by using ext4_discard_partial_page_buffers
      in the truncate routine to zero the partial page and unmap
      the buffer heads.
      Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  3. 04 Sep, 2011 1 commit
    • T
      jbd2: add debugging information to jbd2_journal_dirty_metadata() · 9ea7a0df
      Theodore Ts'o committed
      Add debugging information in case jbd2_journal_dirty_metadata() is
      called with a buffer_head which didn't have
      jbd2_journal_get_write_access() called on it, or if the journal_head
      has the wrong transaction in it.  In addition, return an error code.
      This won't change anything for ocfs2, which will BUG_ON() the non-zero
      exit code.
      
      For ext4, the caller of this function is ext4_handle_dirty_metadata(),
      and on seeing a non-zero return code, will call __ext4_journal_stop(),
      which will print the function and line number of the (buggy) calling
      function and abort the journal.  This will allow us to recover instead
      of bug halting, which is better from a robustness and reliability
      point of view.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  4. 03 Sep, 2011 2 commits
    • A
      ext4: fix 2nd xfstests 127 punch hole failure · 2be4751b
      Allison Henderson committed
      This patch fixes a second punch hole bug found by xfstests 127.
      
      This bug happens because punch hole needs to flush the pages
      of the hole to avoid race conditions.  But if the end of the
      hole is in the same page as i_size, the buffer heads beyond
      i_size need to be unmapped and the page needs to be zeroed
      after it is flushed.
      
      To correct this, the new ext4_discard_partial_page_buffers
      routine is used to zero and unmap the partial page
      beyond i_size if the end of the hole appears in the same
      page as i_size.
      
      The code has also been optimized to set the end of the hole
      to the page after i_size if the specified hole exceeds i_size,
      and the code that flushes the pages has been simplified.
      Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com>
    • A
      ext4: fix xfstests 75, 112, 127 punch hole failure · ba06208a
      Allison Henderson committed
      This patch addresses a bug found by xfstests 75, 112, 127
      when blocksize = 1k.
      
      This bug happens because the punch hole code only zeros
      out non-block-aligned regions of the page.  This means that if the
      blocks are smaller than a page, then the block-aligned regions of
      the page inside the hole are left unzeroed, and their buffer heads
      are still mapped.  This bug is corrected by using
      ext4_discard_partial_page_buffers to properly zero the partial page
      at the head and tail of the hole, and unmap the corresponding buffer
      heads.
      
      This patch also addresses a bug reported by Lukas while working on a
      new patch to add discard support for loop devices using punch hole.
      The bug happened because the first and last block numbers
      needed to be cast to a larger data type before calculating the
      byte offset, but since now we only need the byte offsets of the
      pages, we no longer even need to be calculating the byte offsets
      of the blocks.  The code to do the block offset calculations is
      removed in this patch.
      Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com>
  5. 28 Jul, 2011 2 commits
    • U
      ext4: Fix overflow caused by missing cast in ext4_fallocate() · 29ae07b7
      Utako Kusaka committed
      The logical block number in map.l_blk is a __u32, and so before we
      shift it left by the block size, we need to cast it to a 64-bit type.
      
      Otherwise i_size can be corrupted on an ENOSPC.
      
      # df -T /mnt/mp1
      Filesystem    Type   1K-blocks      Used Available Use% Mounted on
      /dev/sda6     ext4     9843276    153056   9190200   2% /mnt/mp1
      # fallocate -o 0 -l 2199023251456 /mnt/mp1/testfile
      fallocate: /mnt/mp1/testfile: fallocate failed: No space left on device
      # stat /mnt/mp1/testfile
        File: `/mnt/mp1/testfile'
        Size: 4293656576	Blocks: 19380440   IO Block: 4096   regular file
      Device: 806h/2054d	Inode: 12          Links: 1
      Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
      Access: 2011-07-25 13:01:31.414490496 +0900
      Modify: 2011-07-25 13:01:31.414490496 +0900
      Change: 2011-07-25 13:01:31.454490495 +0900
      Signed-off-by: Utako Kusaka <u-kusaka@wm.jp.nec.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      --
       fs/ext4/extents.c |    2 +-
       1 files changed, 1 insertions(+), 1 deletions(-)
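The overflow described in the ext4_fallocate() commit can be reproduced in ordinary user-space C. This is a minimal sketch, not the kernel code: the function names are hypothetical, and `blkbits` stands for the log2 of the block size. It contrasts a shift evaluated in 32 bits with one where the block number is widened first.

```c
#include <stdint.h>

/* Buggy form: the shift result is kept in 32 bits, so any bits that
 * move past bit 31 are silently dropped. */
uint64_t toy_byte_offset_no_cast(uint32_t lblk, unsigned int blkbits)
{
    uint32_t truncated = lblk << blkbits;
    return truncated;
}

/* Fixed form: widen to 64 bits before shifting, so the full byte
 * offset is preserved. */
uint64_t toy_byte_offset_cast(uint32_t lblk, unsigned int blkbits)
{
    return (uint64_t)lblk << blkbits;
}
```

For logical block 2^20 with 4 KiB blocks (`blkbits = 12`), the true byte offset is 2^32; the uncast version wraps to 0, which is the kind of bogus value that can end up in i_size.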
    • R
      ext4: add action of moving index in ext4_ext_rm_idx for Punch Hole · 0e1147b0
      Robin Dong committed
      The old function ext4_ext_rm_idx is used only for the truncate case
      because it just removes the last index in the extent index block. When
      punching a hole, we usually need to remove a "middle" index, so we must
      move the indexes that come after it forward.
      
      (I created a file with a depth-1 extent tree and punched a hole in the
      middle of it; the last index in the index block was strangely gone,
      which is how I found this bug.)
      Signed-off-by: Robin Dong <sanbai@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
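The "move the following indexes forward" step from the ext4_ext_rm_idx commit can be sketched on a plain array. This is only an illustration under stated assumptions: `struct toy_idx` and `toy_rm_idx` are hypothetical stand-ins and do not reflect the kernel's on-disk index layout or its header bookkeeping.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy index entry; not the kernel's struct ext4_extent_idx. */
struct toy_idx {
    uint32_t first_block;
};

/* Remove entry `pos` from idx[0 .. *count-1]. Entries after `pos` are
 * shifted one slot forward so the array stays dense, which is what a
 * "middle" removal needs and a tail-only removal does not. */
void toy_rm_idx(struct toy_idx *idx, size_t *count, size_t pos)
{
    size_t tail = *count - pos - 1; /* entries after the removed one */

    if (tail > 0)
        memmove(&idx[pos], &idx[pos + 1], tail * sizeof(*idx));
    (*count)--;
}
```

Removing the last entry leaves `tail` at zero, so the truncate-style case reduces to just decrementing the count.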
  6. 24 Jul, 2011 3 commits
  7. 18 Jul, 2011 4 commits
  8. 12 Jul, 2011 1 commit
  9. 11 Jul, 2011 3 commits
    • R
      ext4: remove redundant goto in ext4_ext_insert_extent() · ffb505ff
      Robin Dong committed
      If eh->eh_entries is smaller than eh->eh_max, the routine will
      go to "repeat" and then go to "has_space" directly,
      since the arguments "depth" and "eh" have not changed.
      
      Therefore, go to "has_space" directly and remove the redundant "repeat" label.
      Signed-off-by: Robin Dong <sanbai@taobao.com>
    • J
      ext4: free allocated and pre-allocated blocks when check_eofblocks_fl fails · 575a1d4b
      Jiaying Zhang committed
      Upon corrupted inode or disk failures, we may fail after we already
      allocate some blocks from the inode or take some blocks from the
      inode's preallocation list, but before we successfully insert the
      corresponding extent to the extent tree. In this case, we should free
      any allocated blocks and discard the inode's preallocated blocks
      because the entries in the inode's preallocation list may be in an
      inconsistent state.
      Signed-off-by: Jiaying Zhang <jiayingz@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
    • M
      ext4: fix i_blocks/quota accounting when extent insertion fails · 7132de74
      Maxim Patlasov committed
      The current implementation of ext4_free_blocks() always calls
      dquot_free_block().  This looks quite sensible in most cases: blocks
      to be freed are associated with the inode and were accounted in quota and
      i_blocks some time ago.
      
      However, there is a case when the blocks to free have not yet been
      accounted by the time ext4_free_blocks() is called:
      
      1. delalloc is on, write_begin pre-allocated some space in quota
      2. write-back happens, ext4 allocates some blocks in ext4_ext_map_blocks()
      3. then ext4_ext_map_blocks() gets an error (e.g.  ENOSPC) from
         ext4_ext_insert_extent() and calls ext4_free_blocks().
      
      In this scenario, ext4_free_blocks() calls dquot_free_block() which, in
      turn, decrements i_blocks for blocks which were not accounted yet (due
      to delalloc).  After a clean umount, e2fsck reports something like:
      
      > Inode 21, i_blocks is 5080, should be 5128.  Fix<y>?
      because i_blocks was erroneously decremented as explained above.
      
      The patch fixes the problem by passing the new flag
      EXT4_FREE_BLOCKS_NO_QUOT_UPDATE to ext4_free_blocks(), to request
      that the dquot_free_block() call be skipped.
      Signed-off-by: Maxim Patlasov <maxim.patlasov@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
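The accounting problem in the i_blocks/quota commit can be modelled with a toy counter. This is a sketch under stated assumptions: `struct toy_inode`, `toy_free_blocks`, and the flag value are hypothetical, and the counter uses a single unit throughout, unlike the real i_blocks and quota code. Only the idea of a caller-supplied "skip the quota update" flag comes from the commit message.

```c
/* Hypothetical flag value modelling EXT4_FREE_BLOCKS_NO_QUOT_UPDATE. */
#define TOY_FREE_BLOCKS_NO_QUOT_UPDATE 0x1u

/* Toy inode that tracks only the accounted block count. */
struct toy_inode {
    long i_blocks;
};

/* Free `count` blocks. The accounting is decremented only when the
 * blocks had been accounted before, which the caller signals by NOT
 * passing the no-quota-update flag. */
void toy_free_blocks(struct toy_inode *inode, long count, unsigned int flags)
{
    if (!(flags & TOY_FREE_BLOCKS_NO_QUOT_UPDATE))
        inode->i_blocks -= count;
}
```

Using the numbers from the e2fsck report, an inode at 5128 that frees 48 never-accounted blocks stays at 5128 with the flag and drops to the incorrect 5080 without it.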
  10. 28 Jun, 2011 3 commits
  11. 06 Jun, 2011 2 commits
    • L
      ext4: use FIEMAP_EXTENT_LAST flag for last extent in fiemap · c03f8aa9
      Lukas Czerner committed
      Currently we are not marking the extent as the last one
      (FIEMAP_EXTENT_LAST) if there is a hole at the end of the file. This is
      because we just do not check for it right now and continue searching for
      next extent. But at the point we hit the hole at the end of the file, it
      is too late.
      
      This commit adds a check for the allocated block in the subsequent extent
      and, if there are no more extents (block = EXT_MAX_BLOCKS), just flags the
      current one as the last one.
      
      This behaviour was spotted unintentionally by xfstest 252, when the
      test hung because of a wrong loop condition. However, on other
      filesystems (like xfs) it will exit anyway, because we notice the last
      extent flag and exit.
      
      With this patch xfstest 252 does not hang anymore. The ext4 fiemap
      implementation still reports a bad extent type in some cases; however,
      this seems to be a different issue.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
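The look-ahead described in the FIEMAP_EXTENT_LAST commit can be sketched over a sorted array of extents. This is a toy model with hypothetical names (`struct toy_extent`, `toy_next_allocated`, `toy_fiemap_flags`) and locally defined constants; it is not the kernel's fiemap code. It only shows the rule "if no allocated block follows, flag the current extent as last".

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_EXT_MAX_BLOCKS     0xffffffffu /* sentinel: no further extent */
#define TOY_FIEMAP_EXTENT_LAST 0x1u        /* toy copy of the "last" flag */

struct toy_extent {
    uint32_t start; /* first logical block */
    uint32_t len;   /* length in blocks */
};

/* Start of the extent after index i, or the sentinel if none exists. */
uint32_t toy_next_allocated(const struct toy_extent *ex, size_t n, size_t i)
{
    return (i + 1 < n) ? ex[i + 1].start : TOY_EXT_MAX_BLOCKS;
}

/* Flag extent i as last when nothing is allocated after it, even if a
 * hole follows it up to the end of the file. */
uint32_t toy_fiemap_flags(const struct toy_extent *ex, size_t n, size_t i)
{
    if (toy_next_allocated(ex, n, i) == TOY_EXT_MAX_BLOCKS)
        return TOY_FIEMAP_EXTENT_LAST;
    return 0;
}
```

A caller that loops until it sees the last flag terminates on the final extent instead of continuing into the trailing hole.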
    • L
      ext4: Fix max file size and logical block counting of extent format file · f17722f9
      Lukas Czerner committed
      Kazuya Mio reported that he was able to hit BUG_ON(next == lblock)
      in ext4_ext_put_gap_in_cache() while creating a sparse file in extent
      format and filling the tail of the file up to its end. We will hit the BUG_ON
      when we write the last block (2^32-1) into the sparse file.
      
      The root cause of the problem lies in the fact that we specifically set
      s_maxbytes so that the block at s_maxbytes fits into the on-disk extent format,
      which is 32 bits long. However, we are not storing start and end block
      numbers, but rather start block number and length in blocks. It means
      that in order to cover the extent from 0 to EXT_MAX_BLOCK we need
      EXT_MAX_BLOCK+1 to fit into len (because we are counting block 0 as well) -
      and it does not.
      
      The only way to fix it without changing the meaning of the struct
      ext4_extent members is, as Kazuya Mio suggested, to lower s_maxbytes
      by one fs block so we can cover the whole extent we can get by the
      on-disk extent format.
      
      Also, in many places EXT_MAX_BLOCK is used as a length instead of the maximum
      logical block number, as the name suggests; it is all a bit messy. So
      this commit renames it to EXT_MAX_BLOCKS and changes its usage in some
      places to actually be the maximum number of blocks in the extent.
      
      The bug which this commit fixes can be reproduced as follows:
      
       dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-2))
       sync
       dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-1))
      Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com>
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
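The off-by-one at the heart of the s_maxbytes commit is plain arithmetic: a range stored as (start, length) needs a length one larger than its last block number. This is a minimal sketch with hypothetical function names. It treats the length as a 32-bit count, as the commit message describes, and does not model the actual struct ext4_extent layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of blocks needed to cover logical blocks first..last
 * inclusive. Assumes last >= first. */
uint64_t toy_blocks_to_cover(uint32_t first, uint32_t last)
{
    return (uint64_t)last - first + 1;
}

/* True if that count can be stored in a 32-bit length field. */
bool toy_length_fits_32bit(uint32_t first, uint32_t last)
{
    return toy_blocks_to_cover(first, last) <= UINT32_MAX;
}
```

Covering blocks 0 through 2^32-1 needs a length of 2^32, which does not fit; stopping one block earlier does, which is why lowering s_maxbytes by one block resolves the problem.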
  12. 26 May, 2011 1 commit
  13. 25 May, 2011 5 commits
    • V
      ext4: do not normalize block requests from fallocate() · 556b27ab
      Vivek Haldar committed
      Currently, an fallocate request of size slightly larger than a power of
      2 is turned into two block requests, each a power of 2, with the extra
      blocks pre-allocated for future use. When an application calls
      fallocate, it already has an idea about how large the file may grow so
      there is usually little benefit to reserve extra blocks on the
      preallocation list. This reduces disk fragmentation.
      
      Tested: fsstress. Also verified manually that fallocate'd files are
      contiguously laid out with this change (whereas without it they begin at
      power-of-2 boundaries, leaving blocks in between). CPU usage of
      fallocate is not appreciably higher.  In a tight fallocate loop, CPU
      usage hovers between 5%-8% with this change, and 5%-7% without it.
      
      Using a simulated file system aging program which fills the file system to
      70%, the percentage of free extents larger than 8MB (as measured by
      e2freefrag) increased from 38.8% without this change, to 69.4% with
      this change.
      Signed-off-by: Vivek Haldar <haldar@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • A
      ext4: enable "punch hole" functionality · a4bb6b64
      Allison Henderson committed
      This patch adds new routines: "ext4_punch_hole", "ext4_ext_punch_hole",
      and "ext4_ext_check_cache".
      
      fallocate has been modified to call ext4_punch_hole when the punch hole
      flag is passed.  At the moment, we only support punching holes in
      extents, so this routine is pretty much a wrapper for the ext4_ext_punch_hole
      routine.
      
      The ext4_ext_punch_hole routine first completes all outstanding writes
      with the associated pages, and then releases them.  The non-block-aligned
      data is zeroed, and all blocks in between are punched out.
      
      The ext4_ext_check_cache routine is very similar to ext4_ext_in_cache
      except it accepts an ext4_ext_cache parameter instead of an ext4_extent
      parameter.  This routine is used by ext4_ext_punch_hole to check
      whether a block in a hole has been cached.  The ext4_ext_cache
      parameter is necessary because the members of the ext4_extent structure are
      not large enough to hold a 32 bit value.  The existing
      ext4_ext_in_cache routine has become a wrapper to this new function.
      
      [ext4 punch hole patch series 5/5 v7] 
      Signed-off-by: Allison Henderson <achender@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Mingming Cao <cmm@us.ibm.com>
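The split between "zero the unaligned edges" and "punch out the whole blocks in between" from the punch hole commit comes down to rounding the byte range to block boundaries. This is a sketch of that rounding only, with hypothetical names (`struct toy_hole_split`, `toy_split_hole`); `blkbits` is assumed to be the log2 of the block size, and none of this is the kernel routine itself.

```c
#include <stdint.h>

struct toy_hole_split {
    uint64_t first_full_block; /* first block entirely inside the hole */
    uint64_t end_full_block;   /* one past the last such block */
};

/* Split the byte range [offset, offset + length) into whole blocks.
 * Bytes before first_full_block and at or after end_full_block that
 * still lie in the range would be zeroed rather than punched. If
 * end_full_block <= first_full_block, no whole block is covered. */
struct toy_hole_split toy_split_hole(uint64_t offset, uint64_t length,
                                     unsigned int blkbits)
{
    uint64_t blocksize = (uint64_t)1 << blkbits;
    struct toy_hole_split s;

    s.first_full_block = (offset + blocksize - 1) >> blkbits; /* round up */
    s.end_full_block = (offset + length) >> blkbits;          /* round down */
    return s;
}
```

For a hole at byte offset 1000 of length 10000 with 4 KiB blocks, only block 1 is wholly covered; the bytes on either side of it inside the range would be zeroed in place.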
    • A
      ext4: add "punch hole" flag to ext4_map_blocks() · e861304b
      Allison Henderson committed
      This patch adds a new flag to ext4_map_blocks() that specifies the
      given range of blocks should be punched out.  Extents are first
      converted to uninitialized extents before they are punched
      out. Because punching a hole may require that the extent be split, it
      is possible that the splitting may need more blocks than are
      available.  To deal with this, use of reserved blocks is enabled to
      allow the split to proceed.
      
      The routine then returns the number of blocks successfully
      punched out.
      
      [ext4 punch hole patch series 4/5 v7]
      Signed-off-by: Allison Henderson <achender@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Mingming Cao <cmm@us.ibm.com>
    • A
      ext4: punch out extents · d583fb87
      Allison Henderson committed
      This patch modifies the truncate routines to support hole punching.
      Below is a brief summary of the patch's changes:
      
      - Added end param to ext4_ext_rm_leaf
              This function has been modified to accept an end parameter
              which enables it to punch holes in leaves instead of just
              truncating them.
      
      - Implemented the "remove head" case in the ext_remove_blocks routine
              This routine is used by ext4_ext_rm_leaf to remove the tail
              of an extent during a truncate.  The new ext4_ext_rm_leaf
              routine will now also use it to remove the head of an extent in the
              case that the hole covers a region of blocks at the beginning
              of an extent.
      
      - Added "end" param to ext4_ext_remove_space routine
              This function has been modified to accept an end parameter, which
              is passed through to ext4_ext_rm_leaf.
      
      [ext4 punch hole patch series 3/5 v6] 
      Signed-off-by: Allison Henderson <achender@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • A
      ext4: add flag to ext4_has_free_blocks · 55f020db
      Allison Henderson committed
      This patch adds an allocation request flag to the ext4_has_free_blocks
      function which enables the use of reserved blocks.  This will allow a
      punch hole to proceed even if the disk is full.  Punching a hole may
      require additional blocks to first split the extents.
      
      Because ext4_has_free_blocks is a low level function, the flag needs
      to be passed down through several functions listed below:
      
      ext4_ext_insert_extent
      ext4_ext_create_new_leaf
      ext4_ext_grow_indepth
      ext4_ext_split
      ext4_ext_new_meta_block
      ext4_mb_new_blocks
      ext4_claim_free_blocks
      ext4_has_free_blocks
      
      [ext4 punch hole patch series 1/5 v7]
      Signed-off-by: Allison Henderson <achender@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Mingming Cao <cmm@us.ibm.com>
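The effect of the flag added in the ext4_has_free_blocks commit can be sketched as a free-space check that normally holds back a reserve and ignores it on request. This is a toy model only: `TOY_USE_RESERVED`, `toy_has_free_blocks`, and its parameters are hypothetical, and the real function's counters and flag name are not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical request flag: allow dipping into the reserved pool. */
#define TOY_USE_RESERVED 0x1u

/* Return true if `needed` blocks may be allocated. Without the flag,
 * the reserved blocks are off limits; with it, every free block is
 * usable, so an extent split during a punch can proceed on a full disk. */
bool toy_has_free_blocks(uint64_t free_blocks, uint64_t reserved,
                         uint64_t needed, unsigned int flags)
{
    if (flags & TOY_USE_RESERVED)
        return free_blocks >= needed;
    return free_blocks >= reserved && free_blocks - reserved >= needed;
}
```

With 10 free blocks that are all reserved, a request for 2 blocks is refused by default and granted when the flag is set.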
  14. 24 May, 2011 1 commit
  15. 23 May, 2011 3 commits
  16. 16 May, 2011 1 commit
  17. 04 May, 2011 2 commits