1. 11 Mar, 2013 1 commit
    • D
      ext4: add self-testing infrastructure to do a sanity check · 921f266b
      Dmitry Monakhov committed
      This commit adds a self-testing infrastructure, as the extent tree
      already has, to do a sanity check for the extent status tree.  Now that
      the status tree serves as an extent cache, we had better make sure that
      it caches the right result.
      
      After this commit is applied, running xfstests produces a lot of
      messages like the ones below.
      
      ...
      kernel: ES len assertation failed for inode: 230 retval 1 != map->m_len
      3 in ext4_map_blocks (allocation)
      ...
      kernel: ES cache assertation failed for inode: 230 es_cached ex
      [974/2/4781/20] != found ex [974/1/4781/1000]
      ...
      kernel: ES insert assertation failed for inode: 635 ex_status
      [0/45/21388/w] != es_status [44/1/21432/u]
      ...
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  2. 18 Feb, 2013 4 commits
    • Z
      ext4: lookup block mapping in extent status tree · d100eef2
      Zheng Liu committed
      After tracking all extent status, we already have an extent cache in
      memory.  Every time we want to look up a block mapping, we can first
      try to look it up in the extent status tree to avoid a potential disk
      I/O.
      
      A new function called ext4_es_lookup_extent is defined to do this
      work.  When we try to look up a block mapping, we always call
      ext4_map_blocks and/or ext4_da_map_blocks.  So in these functions we
      first try to look up the block mapping in the extent status tree.
      
      A new flag EXT4_GET_BLOCKS_NO_PUT_HOLE is used in ext4_da_map_blocks
      in order not to put a hole into the extent status tree, because this
      hole will be converted to a delayed extent in the tree immediately.
      Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Jan Kara <jack@suse.cz>
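      The cache-first lookup described in this commit can be sketched as
      follows.  This is a minimal illustration, not the kernel code: a sorted
      array with binary search stands in for the in-memory tree, the extents
      are assumed not to overlap, and struct demo_extent and
      demo_cache_lookup() are hypothetical names.  On a miss the caller would
      fall back to the on-disk extent tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cached extent: logical start, length, physical start. */
struct demo_extent {
	uint32_t lblk;
	uint32_t len;
	uint64_t pblk;
};

/*
 * Look up logical block `lblk` in an array of cached extents sorted by
 * lblk, using binary search.  Returns true and fills *pblk on a hit.
 */
static bool demo_cache_lookup(const struct demo_extent *cache, size_t n,
			      uint32_t lblk, uint64_t *pblk)
{
	size_t lo = 0, hi = n;

	assert(cache != NULL && pblk != NULL);

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		const struct demo_extent *e = &cache[mid];

		if (lblk < e->lblk) {
			hi = mid;		/* target is left of this extent */
		} else if (lblk - e->lblk >= e->len) {
			lo = mid + 1;		/* target is right of this extent */
		} else {
			*pblk = e->pblk + (lblk - e->lblk);
			return true;		/* cache hit, no disk I/O needed */
		}
	}
	return false;	/* miss: caller falls back to the on-disk lookup */
}
```

      A caller would try demo_cache_lookup() first and only read the extent
      tree from disk when it returns false.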
    • Z
      ext4: track all extent status in extent status tree · f7fec032
      Zheng Liu committed
      By recording the physical block and status, the extent status tree is
      able to track the status of every extent.  When we call _map_blocks
      functions to look up an extent or create a new written/unwritten/delayed
      extent, this extent will be inserted into the extent status tree.
      
      We don't load all extents from disk in alloc_inode() because it costs
      too much memory, and if a file is opened and closed frequently it will
      take too much time to load all extent information.  So currently when
      we create/look up an extent, this extent will be inserted into the
      extent status tree.  Hence, the extent status tree may not
      comprehensively contain all of the extents found in the file.
      
      One case we need to take care of is that an extent might have
      unwritten and delayed status simultaneously, because an extent can be
      delayed allocated and then allocated by fallocate.  In this case we
      need to keep the delayed status because later we need it to update the
      delayed reservation space.
      Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Jan Kara <jack@suse.cz>
    • Z
      ext4: let ext4_ext_map_blocks return EXT4_MAP_UNWRITTEN flag · a25a4e1a
      Zheng Liu committed
      This commit lets ext4_ext_map_blocks return the EXT4_MAP_UNWRITTEN flag
      because in a later commit ext4_map_blocks needs to use this flag to
      determine the extent status.
      Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
    • Z
      ext4: add physical block and status member into extent status tree · fdc0212e
      Zheng Liu committed
      This commit adds two members to the extent_status structure to let it
      record the physical block and extent status.  Here es_pblk is used to
      record both of them because the physical block number only has 48 bits,
      so the extent status can be stashed into the same field to save some
      memory.  Now written, unwritten, delayed and hole are defined as
      statuses.
      
      Since a new member is added to the extent status tree, all interfaces
      need to be adjusted.
      Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
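      Stashing the status next to a 48-bit physical block number in one
      64-bit field could look roughly like the sketch below.  The DEMO_*
      macros, the bit positions and the helper names are assumptions made
      for illustration; they are not taken from the kernel source.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical layout, for illustration only:
 * low 48 bits hold the physical block number, the top 4 bits hold status.
 */
#define DEMO_PBLK_BITS		48
#define DEMO_PBLK_MASK		((UINT64_C(1) << DEMO_PBLK_BITS) - 1)
#define DEMO_ES_WRITTEN		(UINT64_C(1) << 63)
#define DEMO_ES_UNWRITTEN	(UINT64_C(1) << 62)
#define DEMO_ES_DELAYED		(UINT64_C(1) << 61)
#define DEMO_ES_HOLE		(UINT64_C(1) << 60)
#define DEMO_ES_MASK		(DEMO_ES_WRITTEN | DEMO_ES_UNWRITTEN | \
				 DEMO_ES_DELAYED | DEMO_ES_HOLE)

/* Pack a physical block number and status flags into one 64-bit word. */
static uint64_t demo_es_pack(uint64_t pblk, uint64_t status)
{
	assert((pblk & ~DEMO_PBLK_MASK) == 0);	/* block must fit in 48 bits */
	assert((status & ~DEMO_ES_MASK) == 0);	/* only known status bits */
	return pblk | status;
}

/* Extract the physical block number from a packed word. */
static uint64_t demo_es_pblk(uint64_t packed)
{
	return packed & DEMO_PBLK_MASK;
}

/* Extract the status flags from a packed word. */
static uint64_t demo_es_status(uint64_t packed)
{
	return packed & DEMO_ES_MASK;
}
```

      Using one bit per status lets a single extent carry two statuses at
      once, such as unwritten plus delayed.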
  3. 15 Feb, 2013 3 commits
  4. 09 Feb, 2013 2 commits
    • T
      ext4: grab page before starting transaction handle in write_begin() · 47564bfb
      Theodore Ts'o committed
      The grab_cache_page_write_begin() function can potentially sleep for a
      long time, since it may need to do memory allocation which can block
      if the system is under significant memory pressure, and because it may
      be blocked on page writeback.  If it does take a long time to grab the
      page, it's better that we not hold an active jbd2 handle.
      
      So grab a handle on the page first, and _then_ start the transaction
      handle.
      
      This commit fixes the following long transaction handle hold time:
      
      postmark-2917  [000] ....   196.435786: jbd2_handle_stats: dev 254,32
         tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
         dirtied_blocks 0
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
    • T
      ext4: pass context information to jbd2__journal_start() · 9924a92a
      Theodore Ts'o committed
      So we can better understand what bits of ext4 are responsible for
      long-running jbd2 handles, use jbd2__journal_start() so we can pass
      context information for logging purposes.
      
      The recommended way for finding the longer-running handles is:
      
         T=/sys/kernel/debug/tracing
         EVENT=$T/events/jbd2/jbd2_handle_stats
         echo "interval > 5" > $EVENT/filter
         echo 1 > $EVENT/enable
      
         ./run-my-fs-benchmark
      
         cat $T/trace > /tmp/problem-handles
      
      This will list handles that were active for longer than 20ms.  Having
      longer-running handles is bad, because a commit started at the wrong
      time could stall for those 20+ milliseconds, which could delay an
      fsync() or an O_SYNC operation.  Here is an example line from the
      trace file describing a handle which lived on for 311 jiffies, or over
      1.2 seconds:
      
      postmark-2917  [000] ....   196.435786: jbd2_handle_stats: dev 254,32 
         tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
         dirtied_blocks 0
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
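      The interval values in the trace are in jiffies.  The figures quoted
      above (a filter of 5 corresponding to 20ms, and 311 jiffies being over
      1.2 seconds) are consistent with a tick rate of 250 jiffies per second;
      that rate is inferred here and is not stated in the commit message.
      Under that assumption, the conversion can be sketched as:

```c
#include <assert.h>
#include <limits.h>

/*
 * Assumed tick rate: 250 jiffies per second.  This is inferred from the
 * numbers in the commit message, not stated in it.
 */
#define DEMO_HZ 250

/* Convert a jiffies interval to milliseconds for the assumed tick rate. */
static unsigned long demo_jiffies_to_ms(unsigned long jiffies)
{
	assert(jiffies <= ULONG_MAX / 1000UL);	/* avoid overflow below */
	return jiffies * 1000UL / DEMO_HZ;
}
```

      With a different tick rate the same filter value would select a
      different threshold in milliseconds.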
  5. 30 Jan, 2013 1 commit
  6. 29 Jan, 2013 3 commits
    • J
      ext4: fix ext4_writepage() to achieve data=ordered guarantees · fe386132
      Jan Kara committed
      So far ext4_writepage() skipped writing pages that had any delayed or
      unwritten buffers attached. When blocksize < pagesize this breaks
      data=ordered mode guarantees as we can have a page with one freshly
      allocated buffer whose allocation is part of the committing
      transaction and another buffer in the page which is delayed or
      unwritten. So fix this problem by calling ext4_bio_writepage()
      anyway. It will submit mapped buffers and leave others alone.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • J
      ext4: simplify mpage_add_bh_to_extent() · b6a8e62f
      Jan Kara committed
      The argument b_size of mpage_add_bh_to_extent() was bogus since it was
      always == blocksize (which we can easily derive from inode->i_blkbits).
      Also, the second branch of the condition:
      	if (nrblocks >= EXT4_MAX_TRANS_DATA) {
      	} else if ((nrblocks + (b_size >> mpd->inode->i_blkbits)) >
      						EXT4_MAX_TRANS_DATA) {
      	}
      was never taken because (b_size >> mpd->inode->i_blkbits) == 1.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
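      Why the second branch is dead: it is only reached when nrblocks is
      below the limit, and adding exactly one block can then never exceed the
      limit.  The sketch below checks this exhaustively over a small range.
      DEMO_MAX_TRANS_DATA is an illustrative value (the real
      EXT4_MAX_TRANS_DATA is not given in the commit message) and the
      function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative limit only; not the real EXT4_MAX_TRANS_DATA value. */
#define DEMO_MAX_TRANS_DATA 64

/*
 * Returns true if the second ("else if") branch of the old condition
 * would be taken for a buffer that covers exactly one block.
 */
static bool demo_second_branch_taken(unsigned int nrblocks,
				     unsigned int blkbits)
{
	unsigned int b_size;

	assert(blkbits < 32);
	b_size = 1u << blkbits;		/* b_size was always == blocksize */

	if (nrblocks >= DEMO_MAX_TRANS_DATA)
		return false;		/* first branch wins */
	else if ((nrblocks + (b_size >> blkbits)) > DEMO_MAX_TRANS_DATA)
		return true;
	return false;
}

/* Exhaustively check a range of block counts and block-size shifts. */
static bool demo_second_branch_ever_taken(void)
{
	unsigned int blkbits, n;

	for (blkbits = 10; blkbits <= 16; blkbits++)
		for (n = 0; n <= 2 * DEMO_MAX_TRANS_DATA; n++)
			if (demo_second_branch_taken(n, blkbits))
				return true;
	return false;
}
```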
    • J
      ext4: dirty page has always buffers attached · f8bec370
      Jan Kara committed
      ext4_writepage(), write_cache_pages_da(), and mpage_da_submit_io()
      don't have to deal with the case when a page doesn't have buffers. We
      attach buffers to a page in ->write_begin() and ->page_mkwrite(), which
      covers all places where a page can become dirty.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  7. 28 Jan, 2013 3 commits
  8. 17 Jan, 2013 1 commit
  9. 13 Jan, 2013 2 commits
  10. 26 Dec, 2012 2 commits
    • J
      ext4: fix deadlock in journal_unmap_buffer() · 53e87268
      Jan Kara committed
      We cannot wait for transaction commit in journal_unmap_buffer()
      because we hold the page lock, which ranks below transaction start.  We
      solve the issue by bailing out of journal_unmap_buffer() and
      jbd2_journal_invalidatepage() with -EBUSY.  The caller is then
      responsible for waiting for the transaction commit to finish and
      trying invalidation again. Since the issue can happen only for a page
      straddling i_size, it is simple enough to manually call
      jbd2_journal_invalidatepage() for such a page from ext4_setattr(),
      check the return value and wait if necessary.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
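      The bail-out-and-retry pattern described in this commit can be sketched
      roughly as below.  Everything here is a hypothetical stand-in:
      struct demo_state, demo_invalidate() and demo_wait_for_commit() only
      simulate the behaviour and are not the jbd2 or ext4 functions.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical simulation state, for illustration only. */
struct demo_state {
	int commits_pending;	/* times invalidation will report -EBUSY */
	int waits;		/* times the caller waited for a commit */
};

/* Bails out with -EBUSY instead of waiting while the page lock is held. */
static int demo_invalidate(const struct demo_state *s)
{
	return s->commits_pending > 0 ? -EBUSY : 0;
}

/* Runs without the page lock, so waiting here cannot deadlock. */
static void demo_wait_for_commit(struct demo_state *s)
{
	assert(s->commits_pending > 0);
	s->commits_pending--;
	s->waits++;
}

/* Caller side: retry invalidation until it no longer reports -EBUSY. */
static int demo_invalidate_with_retry(struct demo_state *s)
{
	int ret;

	while ((ret = demo_invalidate(s)) == -EBUSY)
		demo_wait_for_commit(s);
	return ret;
}
```

      The point of the design is that the wait moves to the caller, which can
      drop the page lock before waiting.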
    • J
      ext4: split off ext4_journalled_invalidatepage() · 4520fb3c
      Jan Kara committed
      In data=journal mode we don't need delalloc or DIO handling in invalidatepage
      and similarly in other modes we don't need the journal handling. So split
      invalidatepage implementations.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  11. 11 Dec, 2012 6 commits
  12. 03 Dec, 2012 1 commit
  13. 30 Nov, 2012 1 commit
  14. 16 Nov, 2012 1 commit
    • T
      ext4: remove calls to ext4_jbd2_file_inode() from delalloc write path · f3b59291
      Theodore Ts'o committed
      The calls to ext4_jbd2_file_inode() are needed to guarantee that we do
      not expose stale data in the data=ordered mode.  However, they are not
      necessary here because in all of the cases where we have newly
      allocated blocks in the delayed allocation write path, we immediately
      submit the dirty pages for I/O.  Hence, we can avoid the overhead of
      adding the inode to the list of inodes whose data pages will need to be
      flushed out to disk completely during the next commit operation.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  15. 15 Nov, 2012 1 commit
  16. 09 Nov, 2012 3 commits
  17. 01 Oct, 2012 1 commit
    • T
      ext4: fix mtime update in nodelalloc mode · 041bbb6d
      Theodore Ts'o committed
      Commits 5e8830dc and 41c4d25f introduced a regression into
      v3.6-rc1 for ext4 in nodelalloc mode, such that mtime updates would not
      take place for files modified via mmap if the page was already in the
      page cache.  This would also affect ext3 file systems mounted using
      the ext4 file system driver.
      
      The problem was that ext4_page_mkwrite() had a shortcut which would
      avoid calling __block_page_mkwrite() under some circumstances, and the
      above two commits transferred the responsibility of calling
      file_update_time() to __block_page_mkwrite(), which wouldn't get
      called in some circumstances.
      
      Since __block_page_mkwrite() only has three callers,
      block_page_mkwrite(), ext4_page_mkwrite(), and nilfs_page_mkwrite(),
      the best way to solve this is to move the responsibility for calling
      file_update_time() to its callers.
      
      This problem was found via xfstests #215 with a file system mounted
      with -o nodelalloc.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: stable@vger.kernel.org
  18. 29 Sep, 2012 4 commits