1. 26 May 2011, 1 commit
  2. 25 May 2011, 13 commits
    • ext4: do not normalize block requests from fallocate() · 556b27ab
      Authored by Vivek Haldar
      Currently, an fallocate request of size slightly larger than a power of
      2 is turned into two block requests, each a power of 2, with the extra
      blocks pre-allocated for future use. When an application calls
      fallocate, it already has an idea of how large the file may grow, so
      there is usually little benefit to reserving extra blocks on the
      preallocation list. Dropping the normalization therefore reduces disk
      fragmentation.
      
      Tested: fsstress. Also verified manually that fallocat'ed files are
      contiguously laid out with this change (whereas without it they begin at
      power-of-2 boundaries, leaving blocks in between). CPU usage of
      fallocate is not appreciably higher.  In a tight fallocate loop, CPU
      usage hovers between 5%-8% with this change, and 5%-7% without it.
      
      Using a simulated file system aging program which fills the file system
      to 70%, the percentage of free extents larger than 8MB (as measured by
      e2freefrag) increased from 38.8% without this change to 69.4% with it.
      A short userspace example of such an fallocate call follows this entry.
      Signed-off-by: Vivek Haldar <haldar@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
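      As a hedged illustration of the allocation pattern discussed above (not part of the
      patch; the file name and sizes are arbitrary), the sketch below preallocates a file
      slightly larger than a power of two with fallocate(2). With this change the request
      is allocated as asked for rather than being rounded into power-of-2 chunks; the
      resulting layout can be inspected with filefrag(8).

          /* Illustration only: preallocate ~power-of-2 + a bit, then inspect
           * the on-disk layout with `filefrag testfile`. */
          #define _GNU_SOURCE
          #include <fcntl.h>
          #include <stdio.h>
          #include <stdlib.h>
          #include <unistd.h>

          int main(void)
          {
              /* 8 MiB + 4 KiB: slightly larger than a power of two. */
              off_t len = (8L << 20) + 4096;
              int fd = open("testfile", O_CREAT | O_WRONLY | O_TRUNC, 0644);

              if (fd < 0) {
                  perror("open");
                  return EXIT_FAILURE;
              }
              /* Plain preallocation: blocks are allocated and i_size is extended. */
              if (fallocate(fd, 0, 0, len) < 0) {
                  perror("fallocate");
                  close(fd);
                  return EXIT_FAILURE;
              }
              close(fd);
              return EXIT_SUCCESS;
          }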
    • ext4: enable "punch hole" functionality · a4bb6b64
      Authored by Allison Henderson
      This patch adds new routines: "ext4_punch_hole", "ext4_ext_punch_hole"
      and "ext4_ext_check_cache".
      
      fallocate has been modified to call ext4_punch_hole when the punch hole
      flag is passed.  At the moment, we only support punching holes in
      extents, so this routine is pretty much a wrapper for the ext4_ext_punch_hole
      routine.
      
      The ext4_ext_punch_hole routine first flushes all outstanding writes for
      the associated pages, and then releases them.  The non-block-aligned
      data at the edges of the hole is zeroed, and all blocks in between are
      punched out.
      
      The ext4_ext_check_cache routine is very similar to ext4_ext_in_cache
      except that it accepts an ext4_ext_cache parameter instead of an
      ext4_extent parameter.  This routine is used by ext4_ext_punch_hole to
      check whether a block lies in a hole that has been cached.  The
      ext4_ext_cache parameter is necessary because the members of the
      ext4_extent structure are not large enough to hold a 32 bit value.  The
      existing ext4_ext_in_cache routine has become a wrapper to this new
      function.  A userspace example of the punch-hole fallocate call follows
      this entry.
      
      [ext4 punch hole patch series 5/5 v7] 
      Signed-off-by: Allison Henderson <achender@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Mingming Cao <cmm@us.ibm.com>
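      For reference, the user-visible interface this series enables is fallocate(2) with
      FALLOC_FL_PUNCH_HOLE; the sketch below is a hedged illustration (file name and
      offsets are arbitrary), not code from the patch. Note that FALLOC_FL_PUNCH_HOLE
      must be combined with FALLOC_FL_KEEP_SIZE.

          /* Illustration: punch a 1 MiB hole in the middle of an existing file. */
          #define _GNU_SOURCE
          #include <fcntl.h>
          #include <linux/falloc.h>
          #include <stdio.h>
          #include <stdlib.h>
          #include <unistd.h>

          int main(void)
          {
              int fd = open("testfile", O_RDWR);

              if (fd < 0) {
                  perror("open");
                  return EXIT_FAILURE;
              }
              /* Deallocate the byte range [4 MiB, 5 MiB); reads of that range
               * return zeroes afterwards, and i_size is left untouched. */
              if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                            4L << 20, 1L << 20) < 0) {
                  perror("fallocate(FALLOC_FL_PUNCH_HOLE)");
                  close(fd);
                  return EXIT_FAILURE;
              }
              close(fd);
              return EXIT_SUCCESS;
          }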
    • ext4: add "punch hole" flag to ext4_map_blocks() · e861304b
      Authored by Allison Henderson
      This patch adds a new flag to ext4_map_blocks() that specifies that the
      given range of blocks should be punched out.  Extents are first
      converted to uninitialized extents before they are punched out.
      Because punching a hole may require that an extent be split, it is
      possible that the split may need more blocks than are available.  To
      deal with this, the use of reserved blocks is enabled to allow the
      split to proceed.
      
      The routine then returns the number of blocks successfully
      punched out.
      
      [ext4 punch hole patch series 4/5 v7]
      Signed-off-by: Allison Henderson <achender@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Mingming Cao <cmm@us.ibm.com>
    • ext4: punch out extents · d583fb87
      Authored by Allison Henderson
      This patch modifies the truncate routines to support hole punching.
      Below is a brief summary of the patch's changes:
      
      - Added an end param to ext4_ext_rm_leaf
              This function has been modified to accept an end parameter,
              which enables it to punch holes in leaves instead of just
              truncating them.
      
      - Implemented the "remove head" case in the ext4_remove_blocks routine
              This routine is used by ext4_ext_rm_leaf to remove the tail
              of an extent during a truncate.  The new ext4_ext_rm_leaf
              routine will now also use it to remove the head of an extent in
              the case that the hole covers a region of blocks at the
              beginning of an extent.
      
      - Added "end" param to ext4_ext_remove_space routine
              This function has been modified to accept a stop parameter, which
              is passed through to ext4_ext_rm_leaf.
      
      [ext4 punch hole patch series 3/5 v6] 
      Signed-off-by: Allison Henderson <achender@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
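      The head/tail cases described above can be illustrated with a small self-contained
      model in plain C. This is not the ext4 code: the struct and function names are
      invented for the illustration. Given one extent and a hole range, removal either
      trims the tail, trims the head, splits the extent in two, or drops it entirely.

          /* Simplified model of punching the hole [from, to] out of one extent. */
          #include <stdio.h>

          struct extent { unsigned int start; unsigned int len; };

          /* Returns the number of surviving pieces (0, 1 or 2) written to out[]. */
          static int punch_extent(struct extent e, unsigned int from, unsigned int to,
                                  struct extent out[2])
          {
              unsigned int end = e.start + e.len - 1;   /* last block of the extent */
              int n = 0;

              if (to < e.start || from > end) {         /* hole misses the extent */
                  out[n++] = e;
              } else {
                  if (from > e.start)                   /* keep the head, trim the tail */
                      out[n++] = (struct extent){ e.start, from - e.start };
                  if (to < end)                         /* keep the tail, trim the head */
                      out[n++] = (struct extent){ to + 1, end - to };
                  /* n == 2: the extent had to be split around the hole;
                   * n == 0: the hole covered the whole extent. */
              }
              return n;
          }

          int main(void)
          {
              struct extent out[2];
              int n = punch_extent((struct extent){ 100, 50 }, 110, 119, out);

              for (int i = 0; i < n; i++)
                  printf("surviving extent: start=%u len=%u\n",
                         out[i].start, out[i].len);
              return 0;
          }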
    • ext4: add new function ext4_block_zero_page_range() · 30848851
      Authored by Allison Henderson
      This patch takes the existing ext4_block_truncate_page() function, which
      is used by the truncate code path to zero out block-unaligned data, adds
      a new length parameter, and renames it to ext4_block_zero_page_range().
      This function can now be used to zero out the head of a block, the tail
      of a block, or the middle of a block (a small sketch of these cases
      follows this entry).
      
      The ext4_block_truncate_page() function is now a wrapper to
      ext4_block_zero_page_range().
      
      [ext4 punch hole patch series 2/5 v7] 
      Signed-off-by: Allison Henderson <achender@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Mingming Cao <cmm@us.ibm.com>
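      A hedged sketch of the zeroing behaviour described above, using an in-memory
      buffer to stand in for the page; the names are invented and this is not the
      kernel routine. The same primitive can zero the head, the tail, or the middle
      of a block.

          /* Illustrative stand-in for "zero an arbitrary sub-range of a block". */
          #include <stdio.h>
          #include <string.h>

          #define BLOCKSIZE 4096

          static void zero_block_range(unsigned char *block, size_t from, size_t length)
          {
              if (from >= BLOCKSIZE)
                  return;
              if (from + length > BLOCKSIZE)        /* clamp to the block boundary */
                  length = BLOCKSIZE - from;
              memset(block + from, 0, length);
          }

          int main(void)
          {
              unsigned char block[BLOCKSIZE];

              memset(block, 0xff, sizeof(block));
              zero_block_range(block, 0, 512);          /* head of the block   */
              zero_block_range(block, 4096 - 512, 512); /* tail of the block   */
              zero_block_range(block, 1024, 2048);      /* middle of the block */

              /* block[600] is outside all three ranges and stays 255. */
              printf("block[0]=%d block[600]=%d block[2048]=%d block[4095]=%d\n",
                     block[0], block[600], block[2048], block[4095]);
              return 0;
          }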
    • ext4: add flag to ext4_has_free_blocks · 55f020db
      Authored by Allison Henderson
      This patch adds an allocation request flag to the ext4_has_free_blocks
      function which enables the use of reserved blocks.  This allows a hole
      punch to proceed even when the disk is full, since punching a hole may
      require additional blocks to first split the extents (a simplified
      model of the check follows this entry).
      
      Because ext4_has_free_blocks is a low level function, the flag needs
      to be passed down through several functions listed below:
      
      ext4_ext_insert_extent
      ext4_ext_create_new_leaf
      ext4_ext_grow_indepth
      ext4_ext_split
      ext4_ext_new_meta_block
      ext4_mb_new_blocks
      ext4_claim_free_blocks
      ext4_has_free_blocks
      
      [ext4 punch hole patch series 1/5 v7]
      Signed-off-by: Allison Henderson <achender@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Mingming Cao <cmm@us.ibm.com>
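      The effect of the new flag can be modelled in a few lines of plain C. This is a
      sketch with invented names (USE_RESERVED, has_free_blocks signature), not the
      ext4 function: normally an allocation must leave the root-reserved blocks
      untouched, but a caller performing a hole punch may be allowed to dip into them
      so the extent split can proceed.

          /* Simplified model of "has free blocks, optionally using the reserve". */
          #include <stdbool.h>
          #include <stdio.h>

          #define USE_RESERVED 0x1   /* invented stand-in for the allocation flag */

          static bool has_free_blocks(long long free_blocks, long long dirty_blocks,
                                      long long root_blocks, long long nblocks,
                                      unsigned int flags)
          {
              /* Blocks already promised to delayed allocation are not really free. */
              long long available = free_blocks - dirty_blocks;

              if (flags & USE_RESERVED)          /* hole punch: may use the reserve */
                  return available >= nblocks;
              return available - root_blocks >= nblocks;
          }

          int main(void)
          {
              /* 100 free, 20 dirty, 50 reserved for root, request 40 blocks. */
              printf("normal allocation: %d\n",
                     has_free_blocks(100, 20, 50, 40, 0));            /* 0 */
              printf("punch-hole split:  %d\n",
                     has_free_blocks(100, 20, 50, 40, USE_RESERVED)); /* 1 */
              return 0;
          }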
    • ext4: reserve inodes and feature code for 'quota' feature · ae812306
      Authored by Aditya Kali
      I am working on a patch to add quota as a built-in feature of the ext4
      filesystem. The implementation is based on the design given at
      https://ext4.wiki.kernel.org/index.php/Design_For_1st_Class_Quota_in_Ext4.
      This patch reserves inode numbers 3 and 4 for quota purposes and also
      reserves the EXT4_FEATURE_RO_COMPAT_QUOTA feature code.
      Signed-off-by: Aditya Kali <adityakali@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: add support for multiple mount protection · c5e06d10
      Authored by Johann Lombardi
      Prevent an ext4 filesystem from being mounted more than once at the same
      time.  A sequence number is stored on disk and is periodically updated
      (every 5 seconds by default) by a mounted filesystem.  At mount time, we
      now wait for s_mmp_update_interval seconds to make sure that the MMP
      sequence does not change.  In case of failure, the nodename, bdevname
      and the time at which the MMP block was last updated are displayed.  A
      simplified model of the mount-time check follows this entry.
      Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
      Signed-off-by: Johann Lombardi <johann@whamcloud.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
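      The protocol sketched above can be expressed as a hedged model in plain C.
      read_mmp_seq()/write_mmp_seq() are invented stand-ins for reading and writing
      the MMP block on disk, and the interval is in seconds: the mounting node samples
      the on-disk sequence, waits one update interval, and refuses to mount if the
      sequence has moved, since that would mean another node is alive and updating it.

          /* Illustrative model of the multiple-mount-protection check at mount time. */
          #include <stdbool.h>
          #include <stdio.h>
          #include <unistd.h>

          static unsigned int fake_on_disk_seq;            /* stands in for the MMP block */
          static unsigned int read_mmp_seq(void)    { return fake_on_disk_seq; }
          static void write_mmp_seq(unsigned int s) { fake_on_disk_seq = s; }

          static bool mmp_mount_check(unsigned int update_interval)
          {
              unsigned int seq = read_mmp_seq();

              /* Wait long enough that a live mounted node would have bumped the seq. */
              sleep(update_interval);
              if (read_mmp_seq() != seq) {
                  fprintf(stderr, "MMP: sequence changed, device appears to be in use\n");
                  return false;
              }
              /* Safe to mount; from now on, keep bumping the sequence periodically. */
              write_mmp_seq(seq + 1);
              return true;
          }

          int main(void)
          {
              /* No other "node" is writing in this model, so the mount is allowed. */
              printf("mount %s\n", mmp_mount_check(1) ? "allowed" : "refused");
              return 0;
          }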
    • ext4: ensure f_bfree returned by ext4_statfs() is non-negative · d02a9391
      Authored by Kazuya Mio
      I found an issue where the number of free blocks went negative:
      # stat -f /mnt/mp1/
        File: "/mnt/mp1/"
          ID: e175ccb83a872efe Namelen: 255     Type: ext2/ext3
      Block size: 4096       Fundamental block size: 4096
      Blocks: Total: 258022     Free: -15        Available: -13122
      Inodes: Total: 65536      Free: 63029
      
      f_bfree in struct statfs can go negative when the filesystem has few
      free blocks, because the number of dirty blocks can be bigger than the
      number of free blocks in the following two cases.
      
      CASE 1:
      ext4_da_writepages
        mpage_da_map_and_submit
          ext4_map_blocks
            ext4_ext_map_blocks
              ext4_mb_new_blocks
                ext4_mb_diskspace_used
                  percpu_counter_sub(&sbi->s_freeblocks_counter, ac->ac_b_ex.fe_len);
              <--- a statfs system call can run here --->
              ext4_da_update_reserve_space
                  percpu_counter_sub(&sbi->s_dirtyblocks_counter,
                                  used + ei->i_allocated_meta_blocks);
      
      CASE 2:
      ext4_write_begin
        __block_write_begin
          ext4_map_blocks
            ext4_ext_map_blocks
              ext4_mb_new_blocks
                ext4_mb_diskspace_used
                  percpu_counter_sub(&sbi->s_freeblocks_counter, ac->ac_b_ex.fe_len);
                  <--- a statfs system call can run here --->
                  percpu_counter_sub(&sbi->s_dirtyblocks_counter, reserv_blks);
      
      To avoid the issue, this patch ensures that the f_bfree value returned
      is non-negative (a minimal model of the clamp follows this entry).
      Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com>
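      A hedged model of the fix in plain C (the variable names are invented): free
      minus dirty can transiently go negative when the counters are read between the
      two updates shown in the call chains above, so the reported value is clamped
      at zero.

          #include <stdio.h>

          /* Never let a racy counter snapshot turn into a negative f_bfree. */
          static long long clamped_bfree(long long free_blocks, long long dirty_blocks)
          {
              long long bfree = free_blocks - dirty_blocks;

              return bfree < 0 ? 0 : bfree;
          }

          int main(void)
          {
              /* Racy window: dirty counter not yet decremented after the allocation. */
              printf("f_bfree = %lld\n", clamped_bfree(100, 115));    /* 0, not -15 */
              printf("f_bfree = %lld\n", clamped_bfree(258022, 1000));
              return 0;
          }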
    • ext4: protect bb_first_free in ext4_trim_all_free() with group lock · 28739eea
      Authored by Lukas Czerner
      We should protect reading bd_info->bb_first_free with the group lock
      because otherwise we might miss some free blocks. This is not a big deal
      at all, but the change to do the right thing is really simple, so let's
      do it.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: only load buddy bitmap in ext4_trim_fs() when it is needed · 78944086
      Authored by Lukas Czerner
      Currently we call ext4_mb_load_buddy() for every block group we walk
      through in ext4_trim_fs(), in many cases just to find out that there is
      not enough free space to be bothered with. As Amir Goldstein suggested,
      we can use the bb_free information directly from ext4_group_info.

      This commit removes ext4_mb_load_buddy() from ext4_trim_fs(); instead we
      get the ext4_group_info via ext4_get_group_info() and use its bb_free
      field directly. This avoids an unnecessary buddy load when the group
      does not have enough free space to trim. Loading the buddy is now done
      in ext4_trim_all_free(). A sketch of the new flow follows this entry.

      Tested by me with xfstests 251.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
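      A hedged sketch of the new control flow in plain C, with invented types and a
      stand-in trim_group() function; this is not the kernel source. The idea is to
      consult the per-group free-block summary first and only pay for loading the
      buddy bitmap when the group could actually contain something worth trimming.

          #include <stdio.h>

          struct group_info { unsigned int bb_free; };   /* invented, mirrors bb_free */

          /* Stand-in for ext4_mb_load_buddy() + ext4_trim_all_free(). */
          static void trim_group(unsigned int group)
          {
              printf("loading buddy and trimming group %u\n", group);
          }

          int main(void)
          {
              struct group_info groups[] = { { 0 }, { 4 }, { 4096 } };
              unsigned int minlen = 64;   /* smallest extent worth discarding */

              for (unsigned int g = 0; g < 3; g++) {
                  if (groups[g].bb_free < minlen)   /* cheap check, no buddy load */
                      continue;
                  trim_group(g);                    /* expensive path only when needed */
              }
              return 0;
          }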
    • jbd2: Fix comment to match the code in jbd2__journal_start() · c867516d
      Authored by Eryu Guan
      jbd2__journal_start() returns an ERR_PTR() value rather than NULL on
      failure, so callers must test the result with IS_ERR() instead of
      comparing it against NULL; a small model of this convention follows
      this entry.
      Signed-off-by: Eryu Guan <guaneryu@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
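      The sketch below is a self-contained model of the ERR_PTR/IS_ERR convention the
      comment refers to; the macro definitions are simplified copies of the kernel
      pattern and start_handle() is an invented stand-in for jbd2__journal_start().

          #include <errno.h>
          #include <stdio.h>

          #define MAX_ERRNO 4095
          #define ERR_PTR(err)  ((void *)(long)(err))
          #define PTR_ERR(ptr)  ((long)(ptr))
          #define IS_ERR(ptr)   ((unsigned long)(ptr) >= (unsigned long)-MAX_ERRNO)

          struct handle { int dummy; };

          static struct handle *start_handle(int fail)
          {
              static struct handle h;

              /* On failure, return an encoded errno, never NULL. */
              return fail ? ERR_PTR(-ENOSPC) : &h;
          }

          int main(void)
          {
              struct handle *h = start_handle(1);

              if (IS_ERR(h)) {                       /* correct check */
                  printf("start failed: %ld\n", PTR_ERR(h));
                  return 1;
              }
              /* `if (h == NULL)` would wrongly treat the error pointer as success. */
              return 0;
          }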
    • ext4: fix waiting and sending of a barrier in ext4_sync_file() · 93628ffb
      Authored by Jan Kara
      jbd2_log_start_commit() returns 1 only when we really start a
      transaction.  But we also need to wait for a transaction when the
      commit is already running.  Fix this problem by waiting for the
      transaction commit unconditionally (which amounts to only a quick check
      when the transaction has already committed).
      
      Also we have to be more careful with sending a barrier, because when a
      transaction is being committed in parallel with ext4_sync_file(), we
      cannot be sure that the barrier the journalling code sends happens after
      we wrote all the data for the fsync (note that not every data writeout
      needs to trigger metadata changes, so a commit of some metadata changes
      can be running while other data is still being written out). So use the
      jbd2_will_send_data_barrier() helper to detect the common cases when we
      can be sure the barrier will be issued by the commit code, and issue the
      barrier ourselves in the remaining cases.
      Reported-by: Edward Goggin <egoggin@vmware.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  3. 24 May 2011, 4 commits
    • jbd2: Add function jbd2_trans_will_send_data_barrier() · bbd2be36
      Authored by Jan Kara
      Provide a function which returns whether a transaction with a given tid
      will send a flush to the filesystem device.  The function will be used
      by ext4 to detect whether fsync needs to send a separate flush or not.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • jbd2: fix sending of data flush on journal commit · 81be12c8
      Authored by Jan Kara
      
      In data=ordered mode, it's theoretically possible (however rare) that an
      inode is filed on a transaction's t_inode_list, a flusher thread writes
      out all of its data, and the inode is reclaimed before the transaction
      starts to commit.  In such a case, we could erroneously omit sending a
      flush to the file system device when it is different from the journal
      device (because the data could still be sitting only in the disk cache).

      Fix the problem by setting a flag in a transaction when some inode is
      added to it, and then sending a disk flush in the commit code when the
      flag is set.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: fix ext4_ext_fiemap_cb() to handle blocks before request range correctly · b221349f
      Authored by Yongqiang Yang
      To get delayed-extent information, ext4_ext_fiemap_cb() looks up the
      page cache, so it collects information starting from a page's head
      block.
      
      If blocksize < pagesize, the beginning blocks of a page may lie before
      the requested range. So ext4_ext_fiemap_cb() should skip them, because
      they have been handled before. If no mapped buffer in the range is found
      in the 1st page, we need to look up the 2nd page; otherwise delayed
      extents after a hole will be ignored.

      Without this patch, xfstests 225 will hang on ext4 with a 1K block size.
      A userspace FIEMAP query example follows this entry.
      Reported-by: Amir Goldstein <amir73il@users.sourceforge.net>
      Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
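      The extent information that fiemap reports can be queried from userspace with the
      FIEMAP ioctl; the sketch below is a hedged example of such a query (the default
      file name is arbitrary). Delayed extents are reported with FIEMAP_EXTENT_DELALLOC
      in the flags, which makes this handy for observing them before and after writeback.

          /* List the extents of a file via the FIEMAP ioctl. */
          #include <fcntl.h>
          #include <linux/fiemap.h>
          #include <linux/fs.h>
          #include <stdio.h>
          #include <stdlib.h>
          #include <sys/ioctl.h>
          #include <unistd.h>

          #define MAX_EXTENTS 32

          int main(int argc, char **argv)
          {
              const char *path = argc > 1 ? argv[1] : "testfile";
              int fd = open(path, O_RDONLY);

              if (fd < 0) {
                  perror("open");
                  return EXIT_FAILURE;
              }

              size_t sz = sizeof(struct fiemap) +
                          MAX_EXTENTS * sizeof(struct fiemap_extent);
              struct fiemap *fm = calloc(1, sz);

              fm->fm_start = 0;
              fm->fm_length = ~0ULL;             /* whole file */
              fm->fm_extent_count = MAX_EXTENTS;

              if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
                  perror("FS_IOC_FIEMAP");
                  return EXIT_FAILURE;
              }
              for (unsigned int i = 0; i < fm->fm_mapped_extents; i++)
                  printf("extent %u: logical=%llu physical=%llu length=%llu flags=0x%x\n",
                         i,
                         (unsigned long long)fm->fm_extents[i].fe_logical,
                         (unsigned long long)fm->fm_extents[i].fe_physical,
                         (unsigned long long)fm->fm_extents[i].fe_length,
                         fm->fm_extents[i].fe_flags);
              free(fm);
              close(fd);
              return EXIT_SUCCESS;
          }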
    • ext4: use truncate_setsize() unconditionally · 072bd7ea
      Authored by Theodore Ts'o
      In commit c8d46e41 (ext4: Add flag to files with blocks intentionally
      past EOF), if the EOFBLOCKS_FL flag is set, we call ext4_truncate()
      before calling vmtruncate().  This caused any allocated but unwritten
      blocks created by calling fallocate() with the FALLOC_FL_KEEP_SIZE
      flag to be dropped.  This was done to make sure that
      EOFBLOCKS_FL would not be cleared while still leaving blocks past
      i_size allocated.  This was not necessary, since ext4_truncate()
      guarantees that blocks past i_size will be dropped, even in the case
      where truncate() has increased i_size before calling ext4_truncate().
      
      So fix this by removing the EOFBLOCKS_FL special case treatment in
      ext4_setattr().  In addition, use truncate_setsize() followed by a
      call to ext4_truncate() instead of using vmtruncate().  This is more
      efficient since it skips the call to inode_newsize_ok(), which has
      been checked already by inode_change_ok().  This is also a win in the
      case where EOFBLOCKS_FL is set, since it avoids calling ext4_truncate()
      twice.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
  4. 23 May 2011, 5 commits
  5. 21 May 2011, 4 commits
  6. 19 May 2011, 3 commits
  7. 16 May 2011, 2 commits
  8. 15 May 2011, 1 commit
  9. 10 May 2011, 4 commits
  10. 09 May 2011, 3 commits