1. 05 6月, 2013 11 次提交
    • J
      ext4: defer clearing of PageWriteback after extent conversion · b0857d30
      Jan Kara 提交于
      Currently PageWriteback bit gets cleared from put_io_page() called
      from ext4_end_bio().  This is somewhat inconvenient as extent tree is
      not fully updated at that time (unwritten extents are not marked as
      written) so we cannot read the data back yet.  This design was
      dictated by lock ordering as we cannot start a transaction while
      PageWriteback bit is set (we could easily deadlock with
      ext4_da_writepages()).  But now that we use transaction reservation
      for extent conversion, locking issues are solved and we can move
      PageWriteback bit clearing after extent conversion is done.  As a
      result we can remove wait for unwritten extent conversion from
      ext4_sync_file() because it already implicitely happens through
      wait_on_page_writeback().
      
      We implement deferring of PageWriteback clearing by queueing completed
      bios to appropriate io_end and processing all the pages when io_end is
      going to be freed instead of at the moment ext4_io_end() is called.
      Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      b0857d30
    • J
      ext4: split extent conversion lists to reserved & unreserved parts · 2e8fa54e
      Jan Kara 提交于
      Now that we have extent conversions with reserved transaction, we have
      to prevent extent conversions without reserved transaction (from DIO
      code) to block these (as that would effectively void any transaction
      reservation we did).  So split lists, work items, and work queues to
      reserved and unreserved parts.
      Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      2e8fa54e
    • J
      ext4: use transaction reservation for extent conversion in ext4_end_io · 6b523df4
      Jan Kara 提交于
      Later we would like to clear PageWriteback bit only after extent
      conversion from unwritten to written extents is performed.  However it
      is not possible to start a transaction after PageWriteback is set
      because that violates lock ordering (and is easy to deadlock).  So we
      have to reserve a transaction before locking pages and sending them
      for IO and later we use the transaction for extent conversion from
      ext4_end_io().
      Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      6b523df4
    • J
      ext4: remove buffer_uninit handling · 3613d228
      Jan Kara 提交于
      There isn't any need for setting BH_Uninit on buffers anymore.  It was
      only used to signal we need to mark io_end as needing extent
      conversion in add_bh_to_extent() but now we can mark the io_end
      directly when mapping extent.
      Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      3613d228
    • J
      ext4: restructure writeback path · 4e7ea81d
      Jan Kara 提交于
      There are two issues with current writeback path in ext4.  For one we
      don't necessarily map complete pages when blocksize < pagesize and
      thus needn't do any writeback in one iteration.  We always map some
      blocks though so we will eventually finish mapping the page.  Just if
      writeback races with other operations on the file, forward progress is
      not really guaranteed. The second problem is that current code
      structure makes it hard to associate all the bios to some range of
      pages with one io_end structure so that unwritten extents can be
      converted after all the bios are finished.  This will be especially
      difficult later when io_end will be associated with reserved
      transaction handle.
      
      We restructure the writeback path to a relatively simple loop which
      first prepares extent of pages, then maps one or more extents so that
      no page is partially mapped, and once page is fully mapped it is
      submitted for IO. We keep all the mapping and IO submission
      information in mpage_da_data structure to somewhat reduce stack usage.
      Resulting code is somewhat shorter than the old one and hopefully also
      easier to read.
      Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      4e7ea81d
    • J
      ext4: better estimate credits needed for ext4_da_writepages() · fffb2739
      Jan Kara 提交于
      We limit the number of blocks written in a single loop of
      ext4_da_writepages() to 64 when inode uses indirect blocks.  That is
      unnecessary as credit estimates for mapping logically continguous run
      of blocks is rather low even for inode with indirect blocks.  So just
      lift this limitation and properly calculate the number of necessary
      credits.
      
      This better credit estimate will also later allow us to always write
      at least a single page in one iteration.
      Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      fffb2739
    • J
      ext4: improve writepage credit estimate for files with indirect blocks · fa55a0ed
      Jan Kara 提交于
      ext4_ind_trans_blocks() wrongly used 'chunk' argument to decide whether
      blocks mapped are logically contiguous. That is wrong since the argument
      informs whether the blocks are physically contiguous. As the blocks
      mapped are always logically contiguous and that's all
      ext4_ind_trans_blocks() cares about, just remove the 'chunk' argument.
      Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      fa55a0ed
    • J
      ext4: deprecate max_writeback_mb_bump sysfs attribute · f2d50a65
      Jan Kara 提交于
      This attribute is now unused so deprecate it.  We still show the old
      default value to keep some compatibility but we don't allow writing to
      that attribute anymore.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      f2d50a65
    • J
      ext4: stop messing with nr_to_write in ext4_da_writepages() · 39bba40b
      Jan Kara 提交于
      Writeback code got better in how it submits IO and now the number of
      pages requested to be written is usually higher than original 1024.
      The number is now dynamically computed based on observed throughput
      and is set to be about 0.5 s worth of writeback.  E.g. on ordinary
      SATA drive this ends up somewhere around 10000 as my testing shows.
      So remove the unnecessary smarts from ext4_da_writepages().
      Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      39bba40b
    • J
      ext4: provide wrappers for transaction reservation calls · 5fe2fe89
      Jan Kara 提交于
      Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      5fe2fe89
    • J
      jbd2: transaction reservation support · 8f7d89f3
      Jan Kara 提交于
      In some cases we cannot start a transaction because of locking
      constraints and passing started transaction into those places is not
      handy either because we could block transaction commit for too long.
      Transaction reservation is designed to solve these issues.  It
      reserves a handle with given number of credits in the journal and the
      handle can be later attached to the running transaction without
      blocking on commit or checkpointing.  Reserved handles do not block
      transaction commit in any way, they only reduce maximum size of the
      running transaction (because we have to always be prepared to
      accomodate request for attaching reserved handle).
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      8f7d89f3
  2. 04 6月, 2013 1 次提交
    • J
      ext4: use io_end for multiple bios · 97a851ed
      Jan Kara 提交于
      Change writeback path to create just one io_end structure for the
      extent to which we submit IO and share it among bios writing that
      extent. This prevents needless splitting and joining of unwritten
      extents when they cannot be submitted as a single bio.
      
      Bugs in ENOMEM handling found by Linux File System Verification project
      (linuxtesting.org) and fixed by Alexey Khoroshilov
      <khoroshilov@ispras.ru>.
      
      CC: Alexey Khoroshilov <khoroshilov@ispras.ru>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      97a851ed
  3. 01 6月, 2013 4 次提交
  4. 28 5月, 2013 10 次提交
    • P
      ext4: suppress ext4 orphan messages on mount · 566370a2
      Paul Taysom 提交于
      Suppress the messages releating to processing the ext4 orphan list
      ("truncating inode" and "deleting unreferenced inode") unless the
      debug option is on, since otherwise they end up taking up space in the
      log that could be used for more useful information.
      
      Tested by opening several files, unlinking them, then
      crashing the system, rebooting the system and examining
      /var/log/messages.
      
      Addresses the problem described in http://crbug.com/220976Signed-off-by: NPaul Taysom <taysom@chromium.org>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      566370a2
    • L
      ext4: make punch hole code path work with bigalloc · d23142c6
      Lukas Czerner 提交于
      Currently punch hole is disabled in file systems with bigalloc
      feature enabled. However the recent changes in punch hole patch should
      make it easier to support punching holes on bigalloc enabled file
      systems.
      
      This commit changes partial_cluster handling in ext4_remove_blocks(),
      ext4_ext_rm_leaf() and ext4_ext_remove_space(). Currently
      partial_cluster is unsigned long long type and it makes sure that we
      will free the partial cluster if all extents has been released from that
      cluster. However it has been specifically designed only for truncate.
      
      With punch hole we can be freeing just some extents in the cluster
      leaving the rest untouched. So we have to make sure that we will notice
      cluster which still has some extents. To do this I've changed
      partial_cluster to be signed long long type. The only scenario where
      this could be a problem is when cluster_size == block size, however in
      that case there would not be any partial clusters so we're safe. For
      bigger clusters the signed type is enough. Now we use the negative value
      in partial_cluster to mark such cluster used, hence we know that we must
      not free it even if all other extents has been freed from such cluster.
      
      This scenario can be described in simple diagram:
      
      |FFF...FF..FF.UUU|
       ^----------^
        punch hole
      
      . - free space
      | - cluster boundary
      F - freed extent
      U - used extent
      
      Also update respective tracepoints to use signed long long type for
      partial_cluster.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      d23142c6
    • L
      ext4: update ext4_ext_remove_space trace point · 61801325
      Lukas Czerner 提交于
      Add "end" variable.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      61801325
    • L
      ext4: remove unused code from ext4_remove_blocks() · 78fb9cdf
      Lukas Czerner 提交于
      The "head removal" branch in the condition is never used in any code
      path in ext4 since the function only caller ext4_ext_rm_leaf() will make
      sure that the extent is properly split before removing blocks. Note that
      there is a bug in this branch anyway.
      
      This commit removes the unused code completely and makes use of
      ext4_error() instead of printk if dubious range is provided.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      78fb9cdf
    • L
      ext4: remove unused discard_partial_page_buffers · c121ffd0
      Lukas Czerner 提交于
      The discard_partial_page_buffers is no longer used anywhere so we can
      simply remove it including the *_no_lock variant and
      EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED define.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      c121ffd0
    • L
      ext4: use ext4_zero_partial_blocks in punch_hole · a87dd18c
      Lukas Czerner 提交于
      We're doing to get rid of ext4_discard_partial_page_buffers() since it is
      duplicating some code and also partially duplicating work of
      truncate_pagecache_range(), moreover the old implementation was much
      clearer.
      
      Now when the truncate_inode_pages_range() can handle truncating non page
      aligned regions we can use this to invalidate and zero out block aligned
      region of the punched out range and then use ext4_block_truncate_page()
      to zero the unaligned blocks on the start and end of the range. This
      will greatly simplify the punch hole code. Moreover after this commit we
      can get rid of the ext4_discard_partial_page_buffers() completely.
      
      We also introduce function ext4_prepare_punch_hole() to do come common
      operations before we attempt to do the actual punch hole on
      indirect or extent file which saves us some code duplication.
      
      This has been tested on ppc64 with 1k block size with fsx and xfstests
      without any problems.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      a87dd18c
    • L
      ext4: truncate_inode_pages() in orphan cleanup path · 55f252c9
      Lukas Czerner 提交于
      Currently we do not tell mm to zero out tail of the page before truncate
      in orphan_cleanup(). This is ok, because the page should not be
      uptodate, however this may eventually change and I might cause problems.
      
      Call truncate_inode_pages() as precautionary measure. Thanks Jan Kara
      for pointing this out.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      55f252c9
    • L
      Revert "ext4: fix fsx truncate failure" · eb3544c6
      Lukas Czerner 提交于
      This reverts commit 189e868f.
      
      This commit reintroduces the use of ext4_block_truncate_page() in ext4
      truncate operation instead of ext4_discard_partial_page_buffers().
      
      The statement in the commit description that the truncate operation only
      zero block unaligned portion of the last page is not exactly right,
      since truncate_pagecache_range() also zeroes and invalidate the unaligned
      portion of the page. Then there is no need to zero and unmap it once more
      and ext4_block_truncate_page() was doing the right job, although we
      still need to update the buffer head containing the last block, which is
      exactly what ext4_block_truncate_page() is doing.
      
      Moreover the problem described in the commit is fixed more properly with
      commit
      
      15291164
      	jbd2: clear BH_Delay & BH_Unwritten in journal_unmap_buffer
      
      This was tested on ppc64 machine with block size of 1024 bytes without
      any problems.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      eb3544c6
    • L
      ext4: Call ext4_jbd2_file_inode() after zeroing block · 0713ed0c
      Lukas Czerner 提交于
      In data=ordered mode we should call ext4_jbd2_file_inode() so that crash
      after the truncate transaction has committed does not expose stall data
      in the tail of the block.
      
      Thanks Jan Kara for pointing that out.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      0713ed0c
    • L
      Revert "ext4: remove no longer used functions in inode.c" · d863dc36
      Lukas Czerner 提交于
      This reverts commit ccb4d7af.
      
      This commit reintroduces functions ext4_block_truncate_page() and
      ext4_block_zero_page_range() which has been previously removed in favour
      of ext4_discard_partial_page_buffers().
      
      In future commits we want to reintroduce those function and remove
      ext4_discard_partial_page_buffers() since it is duplicating some code
      and also partially duplicating work of truncate_pagecache_range(),
      moreover the old implementation was much clearer.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      d863dc36
  5. 22 5月, 2013 3 次提交
    • L
      ext4: use ->invalidatepage() length argument · ca99fdd2
      Lukas Czerner 提交于
      ->invalidatepage() aop now accepts range to invalidate so we can make
      use of it in all ext4 invalidatepage routines.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      ca99fdd2
    • L
      jbd2: change jbd2_journal_invalidatepage to accept length · 259709b0
      Lukas Czerner 提交于
      invalidatepage now accepts range to invalidate and there are two file
      system using jbd2 also implementing punch hole feature which can benefit
      from this. We need to implement the same thing for jbd2 layer in order to
      allow those file system take benefit of this functionality.
      
      This commit adds length argument to the jbd2_journal_invalidatepage()
      and updates all instances in ext4 and ocfs2.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      259709b0
    • L
      mm: change invalidatepage prototype to accept length · d47992f8
      Lukas Czerner 提交于
      Currently there is no way to truncate partial page where the end
      truncate point is not at the end of the page. This is because it was not
      needed and the functionality was enough for file system truncate
      operation to work properly. However more file systems now support punch
      hole feature and it can benefit from mm supporting truncating page just
      up to the certain point.
      
      Specifically, with this functionality truncate_inode_pages_range() can
      be changed so it supports truncating partial page at the end of the
      range (currently it will BUG_ON() if 'end' is not at the end of the
      page).
      
      This commit changes the invalidatepage() address space operation
      prototype to accept range to be invalidated and update all the instances
      for it.
      
      We also change the block_invalidatepage() in the same way and actually
      make a use of the new length argument implementing range invalidation.
      
      Actual file system implementations will follow except the file systems
      where the changes are really simple and should not change the behaviour
      in any way .Implementation for truncate_page_range() which will be able
      to accept page unaligned ranges will follow as well.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      d47992f8
  6. 12 5月, 2013 1 次提交
  7. 08 5月, 2013 1 次提交
  8. 07 5月, 2013 1 次提交
  9. 06 5月, 2013 1 次提交
    • L
      ext4: limit group search loop for non-extent files · e6155736
      Lachlan McIlroy 提交于
      In the case where we are allocating for a non-extent file,
      we must limit the groups we allocate from to those below
      2^32 blocks, and ext4_mb_regular_allocator() attempts to
      do this initially by putting a cap on ngroups for the
      subsequent search loop.
      
      However, the initial target group comes in from the 
      allocation context (ac), and it may already be beyond
      the artificially limited ngroups.  In this case,
      the limit
      
      	if (group == ngroups)
      		group = 0;
      
      at the top of the loop is never true, and the loop will
      run away.
      
      Catch this case inside the loop and reset the search to
      start at group 0.
      
      [sandeen@redhat.com: add commit msg & comments]
      Signed-off-by: NLachlan McIlroy <lmcilroy@redhat.com>
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      e6155736
  10. 03 5月, 2013 1 次提交
    • Y
      ext4: fix fio regression · e30b5dca
      Yan, Zheng 提交于
      We (Linux Kernel Performance project) found a regression introduced
      by commit:
      
        f7fec032 ext4: track all extent status in extent status tree
      
      The commit causes about 20% performance decrease in fio random write
      test. Profiler shows that rb_next() uses a lot of CPU time. The call
      stack is:
      
        rb_next
        ext4_es_find_delayed_extent
        ext4_map_blocks
        _ext4_get_block
        ext4_get_block_write
        __blockdev_direct_IO
        ext4_direct_IO
        generic_file_direct_write
        __generic_file_aio_write
        ext4_file_write
        aio_rw_vect_retry
        aio_run_iocb
        do_io_submit
        sys_io_submit
        system_call_fastpath
        io_submit
        td_io_getevents
        io_u_queued_complete
        thread_main
        main
        __libc_start_main
      
      The cause is that ext4_es_find_delayed_extent() doesn't have an
      upper bound, it keeps searching until a delayed extent is found.
      When there are a lots of non-delayed entries in the extent state
      tree, ext4_es_find_delayed_extent() may uses a lot of CPU time.
      Reported-by: NLKP project <lkp@linux.intel.com>
      Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
      Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      e30b5dca
  11. 23 4月, 2013 1 次提交
  12. 22 4月, 2013 4 次提交
  13. 21 4月, 2013 1 次提交