1. 10 Feb, 2013 1 commit
    • ext4: fix the number of credits needed for acl ops with inline data · 95eaefbd
      Committed by Theodore Ts'o
      Operations which modify extended attributes may need extra journal
      credits if inline data is used, since there is a chance that some
      extended attributes may need to get pushed to an external attribute
      block.
      
      Changes to reflect this were made in xattr.c, but they were missed in
      fs/ext4/acl.c.  To fix this, abstract the calculation of the number of
      credits needed for xattr operations to an inline function defined in
      ext4_jbd2.h, and use it in acl.c and xattr.c.
      
      Also move the function declarations used in inline.c out of xattr.h
      (where they were non-obviously hidden, and caused problems since
      ext4_jbd2.h needs to use the function ext4_has_inline_data) and into
      ext4.h.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Tao Ma <boyu.mt@taobao.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
  2. 09 Feb, 2013 1 commit
    • ext4: pass context information to jbd2__journal_start() · 9924a92a
      Committed by Theodore Ts'o
      So we can better understand what bits of ext4 are responsible for
      long-running jbd2 handles, use jbd2__journal_start() so we can pass
      context information for logging purposes.
      
      The recommended way for finding the longer-running handles is:
      
         T=/sys/kernel/debug/tracing
         EVENT=$T/events/jbd2/jbd2_handle_stats
         echo "interval > 5" > $EVENT/filter
         echo 1 > $EVENT/enable
      
         ./run-my-fs-benchmark
      
         cat $T/trace > /tmp/problem-handles
      
      This will list handles that were active for longer than 20ms.  Having
      longer-running handles is bad, because a commit started at the wrong
      time could stall for those 20+ milliseconds, which could delay an
      fsync() or an O_SYNC operation.  Here is an example line from the
      trace file describing a handle which lived on for 311 jiffies, or over
      1.2 seconds:
      
      postmark-2917  [000] ....   196.435786: jbd2_handle_stats: dev 254,32 
         tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
         dirtied_blocks 0
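
The arithmetic in the message (5 jiffies being 20 ms, 311 jiffies being over 1.2 seconds) can be checked with a small sketch. It assumes HZ=250, which is what those figures imply but which the message does not state; the helper names `parse_handle_stats` and `jiffies_to_ms` are hypothetical and exist only for this illustration.

```python
# Sketch only: parse one jbd2_handle_stats trace line and convert the
# interval from jiffies to milliseconds.
HZ = 250  # assumption: inferred from "interval > 5" meaning 20 ms


def parse_handle_stats(line):
    """Return the key/value fields that follow 'jbd2_handle_stats:'."""
    _, _, payload = line.partition("jbd2_handle_stats:")
    tokens = payload.split()
    # Fields come as alternating "name value" tokens.
    return dict(zip(tokens[0::2], tokens[1::2]))


def jiffies_to_ms(jiffies, hz=HZ):
    """Convert a jiffies count to milliseconds for a given tick rate."""
    return jiffies * 1000 / hz


# The example line from the message, with its wrapped parts joined.
line = ("postmark-2917  [000] ....   196.435786: jbd2_handle_stats: dev 254,32 "
        "tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1 "
        "dirtied_blocks 0")

stats = parse_handle_stats(line)
interval_ms = jiffies_to_ms(int(stats["interval"]))
print(f"line_no {stats['line_no']}: handle lived {interval_ms:.0f} ms")
```

On a kernel built with a different HZ, the same filter value corresponds to a different duration, so the threshold would need adjusting.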
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  3. 13 Jan, 2013 2 commits
  4. 11 Dec, 2012 2 commits
  5. 05 Dec, 2012 1 commit
  6. 09 Nov, 2012 1 commit
  7. 10 Jul, 2012 1 commit
    • ext4: use s_csum_seed instead of i_csum_seed for xattr block · 41eb70dd
      Committed by Tao Ma
      In xattr block operations, we use h_refcount to indicate whether the
      xattr block is shared among many inodes. The xattr block csum uses
      s_csum_seed if the block is shared and i_csum_seed if it belongs to
      one inode. But this has a problem. Consider a block that is first
      shared between inodes A and B; B then updates some xattr and CoWs
      the xattr block. When it updates the *old* xattr block (because
      of the h_refcount change) and calls ext4_xattr_release_block, we
      have no idea that inode A is the real owner of the *old* xattr
      block, so we can't use the i_csum_seed of inode A in the xattr
      block csum calculation. And I don't think we have an easy way to
      find inode A.
      
      So this patch just removes the tricky i_csum_seed, and we now use
      s_csum_seed every time for the xattr block csum. The corresponding
      change for e2fsprogs will be sent in another patch.
      
      This was spotted by xfstests 117.
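
The seed problem described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the ext4 code: `zlib.crc32` stands in for the checksum ext4 actually uses, and the seed values, the `block_csum` helper and the block contents are all made up for the example.

```python
import zlib

# Hypothetical seeds: one per filesystem, one per inode.
S_CSUM_SEED = 0x1234ABCD
I_CSUM_SEED = {"A": 0x0A0A0A0A, "B": 0x0B0B0B0B}


def block_csum(seed, data):
    """Toy checksum: CRC32 of the block contents, started from a seed."""
    return zlib.crc32(data, seed)


old_block = b"user.comment=shared"

# Inode B releases the old block. The block now belongs to A alone, but the
# code runs in B's context and only has B's per-inode seed at hand.
written_by_b = block_csum(I_CSUM_SEED["B"], old_block)
a_verifies_per_inode = written_by_b == block_csum(I_CSUM_SEED["A"], old_block)

# With a filesystem-wide seed, the writer's identity no longer matters.
written_fs_wide = block_csum(S_CSUM_SEED, old_block)
a_verifies_fs_wide = written_fs_wide == block_csum(S_CSUM_SEED, old_block)

print("per-inode seed verifies for A:", a_verifies_per_inode)
print("filesystem seed verifies for A:", a_verifies_fs_wide)
```

The per-inode check fails because the writer and the reader disagree on the seed, which is the situation the patch avoids by always using s_csum_seed.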
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Acked-by: Darrick J. Wong <djwong@us.ibm.com>
  8. 30 Apr, 2012 1 commit
  9. 20 Mar, 2012 1 commit
  10. 21 Feb, 2012 2 commits
    • ext4: avoid deadlock on sync-mounted FS w/o journal · c1bb05a6
      Committed by Eric Sandeen
      Processes hang forever on a sync-mounted ext2 file system that
      is mounted with the ext4 module (default in Fedora 16).
      
      I can reproduce this reliably by mounting an ext2 partition with
      "-o sync" and opening a new file on that partition with vim. vim
      will hang in "D" state forever.  The same happens on ext4 without
      a journal.
      
      I am attaching a small patch here that solves this issue for me.
      In the sync mounted case without a journal,
      ext4_handle_dirty_metadata() may call sync_dirty_buffer(), which
      can't be called with buffer lock held.
      
      Also move mb_cache_entry_release inside lock to avoid race
      fixed previously by 8a2bfdcb ext[34]: EA block reference count racing fix
      Note too that ext2 fixed this same problem in 2006 with
      b2f49033 [PATCH] fix deadlock in ext2
      
      Signed-off-by: Martin.Wilck@ts.fujitsu.com
      [sandeen@redhat.com: move mb_cache_entry_release before unlock, edit commit msg]
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: remove unneeded variable in ext4_xattr_check_block() · f1b3a2a7
      Committed by Zheng Liu
      We could return directly from ext4_xattr_check_block(). Thus, we
      shouldn't need to define an 'error' variable.
      Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  11. 29 Oct, 2011 1 commit
    • ext4: fix race in xattr block allocation path · 6d6a4351
      Committed by Eric Sandeen
      Ceph users reported that when using Ceph on ext4, the filesystem
      would often become corrupted, containing inodes with incorrect
      i_blocks counters.
      
      I managed to reproduce this with a very hacked-up "streamtest"
      binary from the Ceph tree.
      
      Ceph is doing a lot of xattr writes, to out-of-inode blocks.
      There is also another thread which does sync_file_range and close,
      of the same files.  The problem appears to happen due to this race:
      
      sync/flush thread               xattr-set thread
      -----------------               ----------------
      
      do_writepages                   ext4_xattr_set
      ext4_da_writepages              ext4_xattr_set_handle
      mpage_da_map_blocks             ext4_xattr_block_set
              set DELALLOC_RESERVE
                                      ext4_new_meta_blocks
                                              ext4_mb_new_blocks
                                                      if (!i_delalloc_reserved_flag)
                                                              vfs_dq_alloc_block
      ext4_get_blocks
      	down_write(i_data_sem)
              set i_delalloc_reserved_flag
      	...
      	up_write(i_data_sem)
                                              if (i_delalloc_reserved_flag)
                                                      vfs_dq_alloc_block_nofail
      
      
      In other words, the sync/flush thread pops in and sets
      i_delalloc_reserved_flag on the inode, which makes the xattr thread
      think that it's in a delalloc path in ext4_new_meta_blocks(),
      and add the block for a second time, after already having added
      it once in the !i_delalloc_reserved_flag case in ext4_mb_new_blocks.
      
      The real problem is that we shouldn't be using the DELALLOC_RESERVED
      inode state flag at all; instead we should be passing
      EXT4_GET_BLOCKS_DELALLOC_RESERVE down to ext4_map_blocks().  We'll
      fix this for now by using i_data_sem to prevent this race, but this
      is really not the right way to fix things.
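
The double accounting in the diagram can be replayed deterministically. This is a simplified single-threaded model, not kernel code: the `Inode` class, the `interleave` hook and the function names are hypothetical, and holding i_data_sem is modelled simply as not letting the flush step run between the two checks.

```python
class Inode:
    """Toy inode with the two pieces of state involved in the race."""

    def __init__(self):
        self.delalloc_reserved = False  # models i_delalloc_reserved_flag
        self.quota_blocks = 0           # blocks charged to this inode


def flush_thread(inode):
    """Models the sync/flush thread setting the flag."""
    inode.delalloc_reserved = True


def allocate_xattr_block(inode, interleave=None):
    """Models the two separate flag checks made for a single new block."""
    if not inode.delalloc_reserved:   # check in ext4_mb_new_blocks
        inode.quota_blocks += 1
    if interleave is not None:
        interleave(inode)             # the other thread runs in the window
    if inode.delalloc_reserved:       # check in ext4_new_meta_blocks
        inode.quota_blocks += 1


# Racy interleaving: the flag flips between the two checks.
racy = Inode()
allocate_xattr_block(racy, interleave=flush_thread)

# Serialised: the flush thread only runs after the allocation finished.
serialised = Inode()
allocate_xattr_block(serialised)
flush_thread(serialised)

print("racy:", racy.quota_blocks, "serialised:", serialised.quota_blocks)
```

One allocated block ends up charged twice in the racy run, which matches the incorrect i_blocks counters reported by the Ceph users.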
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
  12. 26 Oct, 2011 1 commit
    • ext4: use ext4_reserve_inode_write in ext4_xattr_set_handle · 66543617
      Committed by Eric Sandeen
      ext4_mark_iloc_dirty() says:
      
       * The caller must have previously called ext4_reserve_inode_write().
       * Give this, we know that the caller already has write access to iloc->bh.
      
      ext4_xattr_set_handle, however, just open-codes it.  May as well use
      the helper function for consistency.
      
      No bug here, just tidiness.
      
      (Note: on cleanup path, ext4_reserve_inode_write sets
      the bh to NULL if it returns an error, and brelse() of 
      a null bh is handled gracefully).
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  13. 25 May, 2011 1 commit
    • ext4: add flag to ext4_has_free_blocks · 55f020db
      Committed by Allison Henderson
      This patch adds an allocation request flag to the ext4_has_free_blocks
      function which enables the use of reserved blocks.  This will allow a
      punch hole to proceed even if the disk is full.  Punching a hole may
      require additional blocks to first split the extents.
      
      Because ext4_has_free_blocks is a low level function, the flag needs
      to be passed down through several functions listed below:
      
      ext4_ext_insert_extent
      ext4_ext_create_new_leaf
      ext4_ext_grow_indepth
      ext4_ext_split
      ext4_ext_new_meta_block
      ext4_mb_new_blocks
      ext4_claim_free_blocks
      ext4_has_free_blocks
      
      [ext4 punch hole patch series 1/5 v7]
      Signed-off-by: Allison Henderson <achender@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: Mingming Cao <cmm@us.ibm.com>
  14. 21 Mar, 2011 1 commit
  15. 22 Feb, 2011 1 commit
  16. 11 Jan, 2011 2 commits
  17. 28 Oct, 2010 1 commit
  18. 10 Aug, 2010 1 commit
  19. 12 Jun, 2010 1 commit
    • ext4: Clean up s_dirt handling · a0375156
      Committed by Theodore Ts'o
      We don't need to set s_dirt in most of the ext4 code when journaling
      is enabled.  In ext3/4 some of the summary statistics for # of free
      inodes, blocks, and directories are calculated from the per-block
      group statistics when the file system is mounted or unmounted.  As a
      result the superblock doesn't have to be updated, either via the
      journal or by setting s_dirt.  There are a few exceptions, most
      notably when resizing the file system, where the superblock needs to
      be modified --- and in that case it should be done as a journalled
      operation if possible, and s_dirt set only in no-journal mode.
      
      This patch will optimize out some unneeded disk writes when using ext4
      with a journal.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  20. 22 May, 2010 1 commit
  21. 17 May, 2010 2 commits
  22. 05 Mar, 2010 1 commit
    • dquot: cleanup space allocation / freeing routines · 5dd4056d
      Committed by Christoph Hellwig
      Get rid of the alloc_space, free_space, reserve_space, claim_space and
      release_rsv dquot operations - they are always called from the filesystem
      and if a filesystem really needs its own (which none currently does)
      it can just call into its own routine directly.
      
      Move shared logic into the common __dquot_alloc_space,
      dquot_claim_space_nodirty and __dquot_free_space low-level methods,
      and rationalize the wrappers around them to move as much code as
      possible into the common block for CONFIG_QUOTA vs not.  Also rename
      all these helpers to be named dquot_* instead of vfs_dq_*.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
  23. 16 Feb, 2010 2 commits
  24. 25 Jan, 2010 1 commit
    • ext4: Use bitops to read/modify EXT4_I(inode)->i_state · 19f5fb7a
      Committed by Theodore Ts'o
      At several places we modify EXT4_I(inode)->i_state without holding
      i_mutex (ext4_release_file, ext4_bmap, ext4_journalled_writepage,
      ext4_do_update_inode, ...). These modifications are racy and we can
      lose updates to i_state. So convert handling of i_state to use bitops
      which are atomic.
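
The lost update that motivates the change can be shown with a small deterministic model. It is only a sketch: the two flag constants are hypothetical names, and the "racy" function replays one bad interleaving of plain read-modify-write sequences rather than running real threads.

```python
# Hypothetical state bits, for illustration only.
STATE_A = 1 << 0
STATE_B = 1 << 1


def racy_set(bits):
    """Each writer reads i_state before any writer has stored its result."""
    state = 0
    snapshots = [state for _ in bits]      # all reads happen first
    for snapshot, bit in zip(snapshots, bits):
        state = snapshot | bit             # stale value overwrites earlier store
    return state


def atomic_set(bits):
    """Each read-modify-write completes before the next one starts."""
    state = 0
    for bit in bits:
        state |= bit
    return state


lost = racy_set([STATE_A, STATE_B])
kept = atomic_set([STATE_A, STATE_B])
print(f"racy: {lost:#04b}, atomic: {kept:#04b}")
```

In the racy replay the second store is based on a stale snapshot, so the first writer's bit disappears; atomic bit operations rule that interleaving out.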
      
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  25. 23 Dec, 2009 1 commit
    • ext4: Eliminate potential double free on error path · d3533d72
      Committed by Julia Lawall
      b_entry_name and buffer are initially NULL, are initialized within a loop
      to the result of calling kmalloc, and are freed at the bottom of this loop.
      The loop contains gotos to cleanup, which also frees b_entry_name and
      buffer.  Some of these gotos are before the reinitializations of
      b_entry_name and buffer.  To maintain the invariant that b_entry_name and
      buffer are NULL at the top of the loop, and thus acceptable arguments to
      kfree, these variables are now set to NULL after the kfrees.
      
      This seems to be the simplest solution.  A more complicated solution
      would be to introduce more labels in the error handling code at the end of
      the function.
      
      A simplified version of the semantic match that finds this problem is as
      follows: (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @r@
      identifier E;
      expression E1;
      iterator I;
      statement S;
      @@
      
      *kfree(E);
      ... when != E = E1
          when != I(E,...) S
          when != &E
      *kfree(E);
      // </smpl>
      Signed-off-by: Julia Lawall <julia@diku.dk>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  26. 17 Dec, 2009 1 commit
    • sanitize xattr handler prototypes · 431547b3
      Committed by Christoph Hellwig
      Add a flags argument to struct xattr_handler and pass it to all xattr
      handler methods.  This allows using the same methods for multiple
      handlers, e.g. for the ACL methods which perform exactly the same action
      for the access and default ACLs, just using a different underlying
      attribute.  With a little more groundwork it'll also allow sharing the
      methods for the regular user/trusted/secure handlers in extN, ocfs2 and
      jffs2 like it's already done for xfs in this patch.
      
      Also change the inode argument to the handlers to a dentry to allow
      using the handlers mechanism for filesystems that require it later,
      e.g. cifs.
      
      [with GFS2 bits updated by Steven Whitehouse <swhiteho@redhat.com>]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: James Morris <jmorris@namei.org>
      Acked-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  27. 23 Nov, 2009 1 commit
    • ext4: call ext4_forget() from ext4_free_blocks() · e6362609
      Committed by Theodore Ts'o
      Add the facility for ext4_forget() to be called from
      ext4_free_blocks().  This simplifies the code in a large number of
      places, and centralizes most of the work of calling ext4_forget() into
      a single place.
      
      Also fix a bug in the extents migration code; it wasn't calling
      ext4_forget() when releasing the indirect blocks during the
      conversion.  As a result, if the system crashed during or shortly after
      the extents migration, and the released indirect blocks got reused as
      data blocks, the journal replay would corrupt the data blocks.  With
      this new patch, fixing this bug was as simple as adding the
      EXT4_FREE_BLOCKS_FORGET flag to the call to ext4_free_blocks().
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
  28. 16 Nov, 2009 1 commit
  29. 17 Sep, 2009 1 commit
    • ext4: limit block allocations for indirect-block files to < 2^32 · fb0a387d
      Committed by Eric Sandeen
      Today, the ext4 allocator will happily allocate blocks past
      2^32 for indirect-block files, which results in the block
      numbers getting truncated, and corruption ensues.
      
      This patch limits such allocations to < 2^32, and adds
      BUG_ONs if we do get blocks larger than that.
      
      This should address RH Bug 519471, ext4 bitmap allocator 
      must limit blocks to < 2^32
      
      * ext4_find_goal() is modified to choose a goal < UINT_MAX,
        so that our starting point is in an acceptable range.
      
      * ext4_xattr_block_set() is modified such that the goal block
        is < UINT_MAX, as above.
      
      * ext4_mb_regular_allocator() is modified so that the group
        search does not continue into groups which are too high
      
      * ext4_mb_use_preallocated() has a check that we don't use
        preallocated space which is too far out
      
      * ext4_alloc_blocks() and ext4_xattr_block_set() add some BUG_ONs
      
      No attempt has been made to limit inode locations to < 2^32,
      so we may wind up with blocks far from their inodes.  Doing
      this much already will lead to some odd ENOSPC issues when the
      "lower 32" gets full, and further restricting inodes could
      make that even weirder.
      
      For high inodes, choosing a goal of the original, % UINT_MAX,
      may be a bit odd, but then we're in an odd situation anyway,
      and I don't know of a better heuristic.
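
The truncation that causes the corruption, and the "% UINT_MAX" goal heuristic mentioned above, can be sketched with plain integer arithmetic. The helper names are hypothetical and the masking merely models storing a block number in a 32-bit field; this is not the kernel code.

```python
UINT_MAX = 0xFFFFFFFF  # largest value a 32-bit block number can hold


def store_in_32bit_slot(block):
    """Model writing a block number into a 32-bit indirect-block entry."""
    return block & UINT_MAX


def clamp_goal(goal):
    """Model the heuristic from the message: goal of the original, % UINT_MAX."""
    return goal % UINT_MAX


block = (1 << 32) + 5            # a block just past the 2^32 boundary
stored = store_in_32bit_slot(block)
goal = clamp_goal(block)

print(f"allocated {block}, stored as {stored}, clamped goal {goal}")
```

The stored value silently points at a different, low-numbered block, which is why the patch keeps allocations for indirect-block files below 2^32 in the first place.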
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  30. 26 Mar, 2009 1 commit
  31. 07 Jan, 2009 1 commit
    • ext4: Allow ext4 to run without a journal · 0390131b
      Committed by Frank Mayhar
      A few weeks ago I posted a patch for discussion that allowed ext4 to run
      without a journal.  Since that time I've integrated the excellent
      comments from Andreas and fixed several serious bugs.  We're currently
      running with this patch and generating some performance numbers against
      both ext2 (with backported reservations code) and ext4 with and without
      a journal.  It just so happens that running without a journal is
      slightly faster for most everything.
      
      We did
      	iozone -T -t 4 -s 2g -r 256k -T -I -i0 -i1 -i2
      
      which creates 4 threads, each of which creates a 2G file and does reads
      and writes on it, with a buffer size of 256K, using O_DIRECT for all
      file opens to bypass the page cache.  Results:
      
                           ext2        ext4, default   ext4, no journal
        initial writes   13.0 MB/s        15.4 MB/s          15.7 MB/s
        rewrites         13.1 MB/s        15.6 MB/s          15.9 MB/s
        reads            15.2 MB/s        16.9 MB/s          17.2 MB/s
        re-reads         15.3 MB/s        16.9 MB/s          17.2 MB/s
        random readers    5.6 MB/s         5.6 MB/s           5.7 MB/s
        random writers    5.1 MB/s         5.3 MB/s           5.4 MB/s 
      
      So it seems that, so far, this was a useful exercise.
      Signed-off-by: Frank Mayhar <fmayhar@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  32. 13 Dec, 2008 1 commit
    • ext4: remove do_blk_alloc() · 97df5d15
      Committed by Theodore Ts'o
      The convenience function do_blk_alloc() is a static function with only
      one caller, so fold it into ext4_new_meta_blocks() to simplify the
      code and to make it easier to understand.
      
      To save more stack space, if count is a null pointer in
      ext4_new_meta_blocks() assume that caller wanted a single block (and
      if there is an error, no blocks were allocated).
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  33. 08 Dec, 2008 1 commit
    • ext4: remove ext4_new_meta_block() · cfe82c85
      Committed by Theodore Ts'o
      There were only two callers of the function ext4_new_meta_block(),
      which is just a very simple wrapper function around
      ext4_new_meta_blocks().  Change those two functions to call
      ext4_new_meta_blocks() directly, to save code and stack space usage.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  34. 11 Oct, 2008 1 commit