1. 05 Jun, 2013 6 commits
  2. 04 Jun, 2013 1 commit
    • ext4: use io_end for multiple bios · 97a851ed
      Committed by Jan Kara
      Change writeback path to create just one io_end structure for the
      extent to which we submit IO and share it among bios writing that
      extent. This prevents needless splitting and joining of unwritten
      extents when they cannot be submitted as a single bio.
      
      Bugs in ENOMEM handling found by Linux File System Verification project
      (linuxtesting.org) and fixed by Alexey Khoroshilov
      <khoroshilov@ispras.ru>.
      
      CC: Alexey Khoroshilov <khoroshilov@ispras.ru>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  3. 01 Jun, 2013 4 commits
  4. 28 May, 2013 10 commits
    • ext4: suppress ext4 orphan messages on mount · 566370a2
      Committed by Paul Taysom
      Suppress the messages relating to processing the ext4 orphan list
      ("truncating inode" and "deleting unreferenced inode") unless the
      debug option is on, since otherwise they end up taking up space in the
      log that could be used for more useful information.
      
      Tested by opening several files, unlinking them, then
      crashing the system, rebooting the system and examining
      /var/log/messages.
      
      Addresses the problem described in http://crbug.com/220976
      Signed-off-by: Paul Taysom <taysom@chromium.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: make punch hole code path work with bigalloc · d23142c6
      Committed by Lukas Czerner
      Currently punch hole is disabled in file systems with bigalloc
      feature enabled. However the recent changes in punch hole patch should
      make it easier to support punching holes on bigalloc enabled file
      systems.
      
      This commit changes partial_cluster handling in ext4_remove_blocks(),
      ext4_ext_rm_leaf() and ext4_ext_remove_space(). Currently
      partial_cluster is of type unsigned long long, and it makes sure that we
      free the partial cluster once all extents have been released from that
      cluster. However, it was designed specifically for truncate.
      
      With punch hole we can be freeing just some extents in the cluster,
      leaving the rest untouched. So we have to make sure that we notice a
      cluster which still has some extents. To do this I've changed
      partial_cluster to signed long long. The only scenario where
      this could be a problem is when cluster_size == block size; however, in
      that case there would not be any partial clusters, so we're safe. For
      bigger clusters the signed type is enough. Now we use a negative value
      in partial_cluster to mark such a cluster as used, hence we know that we
      must not free it even if all other extents have been freed from it.
      
      This scenario can be described in simple diagram:
      
      |FFF...FF..FF.UUU|
       ^----------^
        punch hole
      
      . - free space
      | - cluster boundary
      F - freed extent
      U - used extent
      
      Also update respective tracepoints to use signed long long type for
      partial_cluster.
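      The sign convention described above can be sketched roughly as follows.
      This is an illustration only, not the kernel code: the helper names are
      hypothetical, and it assumes cluster numbers are nonzero so that 0 can
      mean "no partial cluster pending".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the signed partial_cluster value:
 *   > 0  cluster number that may be freed once nothing else uses it
 *   < 0  cluster number (negated) that is still in use and must be kept
 *   == 0 no partial cluster recorded (assumes real cluster numbers != 0)
 */
typedef long long partial_cluster_t;

static partial_cluster_t mark_pending_free(unsigned long long cluster)
{
    assert(cluster != 0);
    return (partial_cluster_t)cluster;
}

static partial_cluster_t mark_still_used(unsigned long long cluster)
{
    assert(cluster != 0);
    return -(partial_cluster_t)cluster;
}

/* Only a positive value naming this exact cluster allows freeing it. */
static bool may_free_cluster(partial_cluster_t pc, unsigned long long cluster)
{
    return pc > 0 && (unsigned long long)pc == cluster;
}
```

      In the diagram above, the cluster would be marked as still used because
      of the trailing "UUU" extent, so it survives even though every "F"
      extent is freed.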
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: update ext4_ext_remove_space trace point · 61801325
      Committed by Lukas Czerner
      Add "end" variable.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: remove unused code from ext4_remove_blocks() · 78fb9cdf
      Committed by Lukas Czerner
      The "head removal" branch in the condition is never used in any code
      path in ext4, since the function's only caller, ext4_ext_rm_leaf(), makes
      sure that the extent is properly split before removing blocks. Note that
      there is a bug in this branch anyway.
      
      This commit removes the unused code completely and uses
      ext4_error() instead of printk if a dubious range is provided.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: remove unused discard_partial_page_buffers · c121ffd0
      Committed by Lukas Czerner
      The discard_partial_page_buffers is no longer used anywhere so we can
      simply remove it including the *_no_lock variant and
      EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED define.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: use ext4_zero_partial_blocks in punch_hole · a87dd18c
      Committed by Lukas Czerner
      We're going to get rid of ext4_discard_partial_page_buffers() since it
      duplicates some code and also partially duplicates the work of
      truncate_pagecache_range(); moreover, the old implementation was much
      clearer.
      
      Now when the truncate_inode_pages_range() can handle truncating non page
      aligned regions we can use this to invalidate and zero out block aligned
      region of the punched out range and then use ext4_block_truncate_page()
      to zero the unaligned blocks on the start and end of the range. This
      will greatly simplify the punch hole code. Moreover after this commit we
      can get rid of the ext4_discard_partial_page_buffers() completely.
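      As a rough illustration of the split described above, the sketch below
      divides a punched byte range into a block-aligned interior plus
      unaligned head and tail pieces. The struct and function names are
      hypothetical and this is plain user-space arithmetic, not the ext4
      implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: the block-aligned interior of a punched range.
 * Bytes in [offset, aligned_start) and [aligned_end, offset + length) are
 * the unaligned edges that would need partial-block zeroing. */
struct punch_plan {
    uint64_t aligned_start; /* first byte of the first fully covered block */
    uint64_t aligned_end;   /* one past the last fully covered block */
};

static struct punch_plan plan_punch(uint64_t offset, uint64_t length,
                                    uint64_t blocksize)
{
    struct punch_plan p;
    uint64_t end = offset + length;

    assert(blocksize != 0);
    /* Round the start up and the end down to block boundaries. */
    p.aligned_start = (offset + blocksize - 1) / blocksize * blocksize;
    p.aligned_end = end / blocksize * blocksize;
    /* A hole that covers no whole block has an empty interior. */
    if (p.aligned_end < p.aligned_start)
        p.aligned_end = p.aligned_start;
    return p;
}
```

      With a 1k block size, punching 3000 bytes at offset 1000 would leave
      the interior [1024, 3072) to be invalidated as whole blocks, with the
      head [1000, 1024) and tail [3072, 4000) zeroed separately.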
      
      We also introduce the function ext4_prepare_punch_hole() to do some common
      operations before we attempt the actual punch hole on an
      indirect or extent file, which saves us some code duplication.
      
      This has been tested on ppc64 with 1k block size with fsx and xfstests
      without any problems.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: truncate_inode_pages() in orphan cleanup path · 55f252c9
      Committed by Lukas Czerner
      Currently we do not tell mm to zero out the tail of the page before truncate
      in orphan_cleanup(). This is OK, because the page should not be
      uptodate; however, this may eventually change and it might cause problems.
      
      Call truncate_inode_pages() as precautionary measure. Thanks Jan Kara
      for pointing this out.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • Revert "ext4: fix fsx truncate failure" · eb3544c6
      Committed by Lukas Czerner
      This reverts commit 189e868f.
      
      This commit reintroduces the use of ext4_block_truncate_page() in ext4
      truncate operation instead of ext4_discard_partial_page_buffers().
      
      The statement in the commit description that the truncate operation only
      zeroes the block-unaligned portion of the last page is not exactly right,
      since truncate_pagecache_range() also zeroes and invalidates the unaligned
      portion of the page. So there is no need to zero and unmap it once more,
      and ext4_block_truncate_page() was doing the right job, although we
      still need to update the buffer head containing the last block, which is
      exactly what ext4_block_truncate_page() is doing.
      
      Moreover the problem described in the commit is fixed more properly with
      commit
      
      15291164
      	jbd2: clear BH_Delay & BH_Unwritten in journal_unmap_buffer
      
      This was tested on ppc64 machine with block size of 1024 bytes without
      any problems.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: Call ext4_jbd2_file_inode() after zeroing block · 0713ed0c
      Committed by Lukas Czerner
      In data=ordered mode we should call ext4_jbd2_file_inode() so that a crash
      after the truncate transaction has committed does not expose stale data
      in the tail of the block.
      
      Thanks Jan Kara for pointing that out.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • Revert "ext4: remove no longer used functions in inode.c" · d863dc36
      Committed by Lukas Czerner
      This reverts commit ccb4d7af.
      
      This commit reintroduces the functions ext4_block_truncate_page() and
      ext4_block_zero_page_range(), which were previously removed in favour
      of ext4_discard_partial_page_buffers().
      
      In future commits we want to use those functions again and remove
      ext4_discard_partial_page_buffers() since it duplicates some code
      and also partially duplicates the work of truncate_pagecache_range();
      moreover, the old implementation was much clearer.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  5. 22 May, 2013 3 commits
    • ext4: use ->invalidatepage() length argument · ca99fdd2
      Committed by Lukas Czerner
      ->invalidatepage() aop now accepts range to invalidate so we can make
      use of it in all ext4 invalidatepage routines.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
    • jbd2: change jbd2_journal_invalidatepage to accept length · 259709b0
      Committed by Lukas Czerner
      invalidatepage now accepts a range to invalidate, and there are two file
      systems using jbd2 that also implement the punch hole feature and can
      benefit from this. We need to implement the same thing for the jbd2 layer
      in order to allow those file systems to take advantage of this functionality.
      
      This commit adds length argument to the jbd2_journal_invalidatepage()
      and updates all instances in ext4 and ocfs2.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
    • mm: change invalidatepage prototype to accept length · d47992f8
      Committed by Lukas Czerner
      Currently there is no way to truncate partial page where the end
      truncate point is not at the end of the page. This is because it was not
      needed and the functionality was enough for file system truncate
      operation to work properly. However more file systems now support punch
      hole feature and it can benefit from mm supporting truncating page just
      up to the certain point.
      
      Specifically, with this functionality truncate_inode_pages_range() can
      be changed so it supports truncating partial page at the end of the
      range (currently it will BUG_ON() if 'end' is not at the end of the
      page).
      
      This commit changes the invalidatepage() address space operation
      prototype to accept range to be invalidated and update all the instances
      for it.
      
      We also change the block_invalidatepage() in the same way and actually
      make a use of the new length argument implementing range invalidation.
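      A minimal user-space sketch of what range invalidation over one page's
      blocks could look like is given below. It only models the idea of
      discarding blocks that lie entirely inside [offset, offset + length);
      the function name and the "discarded" flag array are assumptions made
      for illustration, not the kernel's buffer-head API.

```c
#include <assert.h>

/* Mark every block of a page that lies entirely within
 * [offset, offset + length) as discarded and return how many were marked.
 * Blocks only partly covered by the range are left alone. */
static unsigned invalidate_block_range(unsigned char *discarded,
                                       unsigned nblocks, unsigned blocksize,
                                       unsigned offset, unsigned length)
{
    unsigned stop = offset + length;
    unsigned curr = 0;
    unsigned count = 0;

    assert(stop <= nblocks * blocksize);
    for (unsigned i = 0; i < nblocks; i++, curr += blocksize) {
        unsigned next = curr + blocksize;

        if (next > stop)
            break;              /* this block extends past the range */
        if (offset <= curr) {   /* block starts inside the range */
            discarded[i] = 1;
            count++;
        }
    }
    return count;
}
```

      With only an offset argument, the range always ran to the end of the
      page; the extra length is what lets a punch hole stop in the middle.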
      
      Actual file system implementations will follow, except for the file systems
      where the changes are really simple and should not change the behaviour
      in any way. An implementation of truncate_page_range() which will be able
      to accept page-unaligned ranges will follow as well.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Hugh Dickins <hughd@google.com>
  6. 12 May, 2013 1 commit
  7. 08 May, 2013 1 commit
  8. 07 May, 2013 1 commit
  9. 06 May, 2013 1 commit
    • ext4: limit group search loop for non-extent files · e6155736
      Committed by Lachlan McIlroy
      In the case where we are allocating for a non-extent file,
      we must limit the groups we allocate from to those below
      2^32 blocks, and ext4_mb_regular_allocator() attempts to
      do this initially by putting a cap on ngroups for the
      subsequent search loop.
      
      However, the initial target group comes in from the 
      allocation context (ac), and it may already be beyond
      the artificially limited ngroups.  In this case,
      the limit
      
      	if (group == ngroups)
      		group = 0;
      
      at the top of the loop is never true, and the loop will
      run away.
      
      Catch this case inside the loop and reset the search to
      start at group 0.
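      The wrap-around described above can be modelled in a few lines. This is
      a simplified sketch, not the allocator itself: the function name and the
      "order" output array are hypothetical, and the point is only that a
      ">=" comparison also catches a starting group that is already beyond
      the capped ngroups, which an "==" comparison would miss.

```c
#include <assert.h>

/* Visit exactly ngroups groups, starting at "start" and wrapping to 0.
 * The visited group numbers are written to order[0..ngroups-1]. */
static unsigned scan_groups(unsigned start, unsigned ngroups, unsigned *order)
{
    unsigned group = start;
    unsigned i;

    for (i = 0; i < ngroups; group++, i++) {
        /* start may already exceed the artificially limited ngroups,
         * so compare with >= rather than == before using the group. */
        if (group >= ngroups)
            group = 0;
        order[i] = group;
    }
    return i;
}
```

      For example, with ngroups capped at 4 and a starting group of 7, the
      scan is pulled back to group 0 on the first iteration and never touches
      a group above the cap.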
      
      [sandeen@redhat.com: add commit msg & comments]
      Signed-off-by: Lachlan McIlroy <lmcilroy@redhat.com>
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
  10. 03 May, 2013 1 commit
    • ext4: fix fio regression · e30b5dca
      Committed by Yan, Zheng
      We (Linux Kernel Performance project) found a regression introduced
      by commit:
      
        f7fec032 ext4: track all extent status in extent status tree
      
      The commit causes about 20% performance decrease in fio random write
      test. Profiler shows that rb_next() uses a lot of CPU time. The call
      stack is:
      
        rb_next
        ext4_es_find_delayed_extent
        ext4_map_blocks
        _ext4_get_block
        ext4_get_block_write
        __blockdev_direct_IO
        ext4_direct_IO
        generic_file_direct_write
        __generic_file_aio_write
        ext4_file_write
        aio_rw_vect_retry
        aio_run_iocb
        do_io_submit
        sys_io_submit
        system_call_fastpath
        io_submit
        td_io_getevents
        io_u_queued_complete
        thread_main
        main
        __libc_start_main
      
      The cause is that ext4_es_find_delayed_extent() doesn't have an
      upper bound; it keeps searching until a delayed extent is found.
      When there are a lot of non-delayed entries in the extent status
      tree, ext4_es_find_delayed_extent() may use a lot of CPU time.
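      The message does not spell out the fix. One plausible shape, assuming
      the search is simply given an end block, is sketched below. A sorted
      array stands in for the rb-tree, and the struct and function names are
      hypothetical rather than the ext4 ones.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical extent record; entries are sorted by start, len > 0. */
struct es_entry {
    unsigned start;
    unsigned len;
    int delayed;
};

/* Return the index of the first delayed extent overlapping [lblk, end],
 * or -1 if there is none. The walk stops as soon as an entry starts
 * beyond "end", so non-delayed entries past the bound are never visited. */
static int find_delayed_in_range(const struct es_entry *es, size_t n,
                                 unsigned lblk, unsigned end)
{
    assert(lblk <= end);
    for (size_t i = 0; i < n; i++) {
        unsigned last = es[i].start + es[i].len - 1;

        if (last < lblk)
            continue;       /* entirely before the range */
        if (es[i].start > end)
            break;          /* sorted input: nothing further can match */
        if (es[i].delayed)
            return (int)i;
    }
    return -1;
}
```

      The cost of a lookup is then tied to the size of the queried range
      instead of the distance to the next delayed extent.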
      Reported-by: LKP project <lkp@linux.intel.com>
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
  11. 23 Apr, 2013 1 commit
  12. 22 Apr, 2013 4 commits
  13. 21 Apr, 2013 1 commit
  14. 20 Apr, 2013 4 commits
    • ext4: fix readdir error in case inline_data+^dir_index. · c4d8b023
      Committed by Tao Ma
      Zach reported a problem that if inline data is enabled, we don't
      tell the difference between the offset of '.' and '..', and a
      getdents will fail if the user only wants to get '.'. What's
      worse, we may get duplicate dir entries, as the offsets
      for an inline dir and a non-inline one are quite different.
      
      This patch only tries to resolve this problem when dir_index
      is disabled. In this case, f_pos is the real offset within
      the dir block, so for an inline dir we just pretend to be
      a dir block and return the offset like a normal
      dir block does.
      Reported-by: Zach Brown <zab@redhat.com>
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: fix readdir error in the case of inline_data+dir_index · 8af0f082
      Committed by Tao Ma
      Zach reported a problem that if inline data is enabled, we don't
      tell the difference between the offset of '.' and '..', and a
      getdents will fail if the user only wants to get '.'. What's worse,
      if a conversion happens while the user is calling getdents
      many times, he/she may get the same entry twice.
      
      In theory, a dir block would also fail if it is converted to a
      hashed-index based dir since f_pos will become a hash value, not the
      real one, but it doesn't happen.  A deeper investigation shows that
      we use a hash based solution even for a normal dir if the dir_index
      feature is enabled.
      
      So this patch just adds a new htree_inlinedir_to_tree for inline dirs,
      and if we find that the hash index is supported, we do the same as
      we do for a dir block.
      Reported-by: Zach Brown <zab@redhat.com>
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: move quota initialization out of inode allocation transaction · eb9cc7e1
      Committed by Jan Kara
      Inode allocation transaction is pretty heavy (246 credits with quotas
      and extents before previous patch, still around 200 after it).  This is
      mostly due to credits required for allocation of quota structures
      (credits there are heavily overestimated but it's difficult to make
      better estimates if we don't want to wire non-trivial assumptions about
      quota format into filesystem).
      
      So move quota initialization out of the allocation transaction. That way
      a transaction for quota structure allocation will be started only if we
      need to look up the quota structure on disk (rare), and furthermore it will
      be started for each quota type separately, not for all of them at once.
      This reduces the maximum transaction size to 34 in most cases and to 73 in
      the worst case.
      
      [ Modified by tytso to clean up the cleanup paths for error handling.
        Also use a separate call to ext4_std_error() for each failure so it
        is easier for someone who is debugging a problem in this function to
        determine which function call failed. ]
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  15. 19 Apr, 2013 1 commit