1. 21 2月, 2012 1 次提交
  2. 29 10月, 2011 1 次提交
    • E
      ext4: fix race in xattr block allocation path · 6d6a4351
      Eric Sandeen 提交于
      Ceph users reported that when using Ceph on ext4, the filesystem
      would often become corrupted, containing inodes with incorrect
      i_blocks counters.
      
      I managed to reproduce this with a very hacked-up "streamtest"
      binary from the Ceph tree.
      
      Ceph is doing a lot of xattr writes, to out-of-inode blocks.
      There is also another thread which does sync_file_range and close,
      of the same files.  The problem appears to happen due to this race:
      
      sync/flush thread               xattr-set thread
      -----------------               ----------------
      
      do_writepages                   ext4_xattr_set
      ext4_da_writepages              ext4_xattr_set_handle
      mpage_da_map_blocks             ext4_xattr_block_set
              set DELALLOC_RESERVE
                                      ext4_new_meta_blocks
                                              ext4_mb_new_blocks
                                                      if (!i_delalloc_reserved_flag)
                                                              vfs_dq_alloc_block
      ext4_get_blocks
      	down_write(i_data_sem)
              set i_delalloc_reserved_flag
      	...
      	up_write(i_data_sem)
                                              if (i_delalloc_reserved_flag)
                                                      vfs_dq_alloc_block_nofail
      
      
      In other words, the sync/flush thread pops in and sets
      i_delalloc_reserved_flag on the inode, which makes the xattr thread
      think that it's in a delalloc path in ext4_new_meta_blocks(),
      and add the block for a second time, after already having added
      it once in the !i_delalloc_reserved_flag case in ext4_mb_new_blocks
      
      The real problem is that we shouldn't be using the DELALLOC_RESERVED
      state flag, and instead we should be passing
      EXT4_GET_BLOCKS_DELALLOC_RESERVE down to ext4_map_blocks() instead of
      using an inode state flag.  We'll fix this for now with using
      i_data_sem to prevent this race, but this is really not the right way
      to fix things.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      6d6a4351
  3. 26 10月, 2011 1 次提交
    • E
      ext4: use ext4_reserve_inode_write in ext4_xattr_set_handle · 66543617
      Eric Sandeen 提交于
      ext4_mark_iloc_dirty() says:
      
       * The caller must have previously called ext4_reserve_inode_write().
       * Give this, we know that the caller already has write access to iloc->bh.
      
      ext4_xattr_set_handle, however, just open-codes it.  May as well use
      the helper function for consistency.
      
      No bug here, just tidiness.
      
      (Note: on cleanup path, ext4_reserve_inode_write sets
      the bh to NULL if it returns an error, and brelse() of 
      a null bh is handled gracefully).
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      66543617
  4. 25 5月, 2011 1 次提交
    • A
      ext4: add flag to ext4_has_free_blocks · 55f020db
      Allison Henderson 提交于
      This patch adds an allocation request flag to the ext4_has_free_blocks
      function which enables the use of reserved blocks.  This will allow a
      punch hole to proceed even if the disk is full.  Punching a hole may
      require additional blocks to first split the extents.
      
      Because ext4_has_free_blocks is a low level function, the flag needs
      to be passed down through several functions listed below:
      
      ext4_ext_insert_extent
      ext4_ext_create_new_leaf
      ext4_ext_grow_indepth
      ext4_ext_split
      ext4_ext_new_meta_block
      ext4_mb_new_blocks
      ext4_claim_free_blocks
      ext4_has_free_blocks
      
      [ext4 punch hole patch series 1/5 v7]
      Signed-off-by: NAllison Henderson <achender@us.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NMingming Cao <cmm@us.ibm.com>
      55f020db
  5. 21 3月, 2011 1 次提交
  6. 22 2月, 2011 1 次提交
  7. 11 1月, 2011 2 次提交
  8. 28 10月, 2010 1 次提交
  9. 10 8月, 2010 1 次提交
  10. 12 6月, 2010 1 次提交
    • T
      ext4: Clean up s_dirt handling · a0375156
      Theodore Ts'o 提交于
      We don't need to set s_dirt in most of the ext4 code when journaling
      is enabled.  In ext3/4 some of the summary statistics for # of free
      inodes, blocks, and directories are calculated from the per-block
      group statistics when the file system is mounted or unmounted.  As a
      result the superblock doesn't have to be updated, either via the
      journal or by setting s_dirt.  There are a few exceptions, most
      notably when resizing the file system, where the superblock needs to
      be modified --- and in that case it should be done as a journalled
      operation if possible, and s_dirt set only in no-journal mode.
      
      This patch will optimize out some unneeded disk writes when using ext4
      with a journal.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      a0375156
  11. 22 5月, 2010 1 次提交
  12. 17 5月, 2010 2 次提交
  13. 05 3月, 2010 1 次提交
    • C
      dquot: cleanup space allocation / freeing routines · 5dd4056d
      Christoph Hellwig 提交于
      Get rid of the alloc_space, free_space, reserve_space, claim_space and
      release_rsv dquot operations - they are always called from the filesystem
      and if a filesystem really needs their own (which none currently does)
      it can just call into it's own routine directly.
      
      Move shared logic into the common __dquot_alloc_space,
      dquot_claim_space_nodirty and __dquot_free_space low-level methods,
      and rationalize the wrappers around it to move as much as possible
      code into the common block for CONFIG_QUOTA vs not.  Also rename
      all these helpers to be named dquot_* instead of vfs_dq_*.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJan Kara <jack@suse.cz>
      5dd4056d
  14. 16 2月, 2010 2 次提交
  15. 25 1月, 2010 1 次提交
    • T
      ext4: Use bitops to read/modify EXT4_I(inode)->i_state · 19f5fb7a
      Theodore Ts'o 提交于
      At several places we modify EXT4_I(inode)->i_state without holding
      i_mutex (ext4_release_file, ext4_bmap, ext4_journalled_writepage,
      ext4_do_update_inode, ...). These modifications are racy and we can
      lose updates to i_state. So convert handling of i_state to use bitops
      which are atomic.
      
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      19f5fb7a
  16. 23 12月, 2009 1 次提交
    • J
      ext4: Eliminate potential double free on error path · d3533d72
      Julia Lawall 提交于
      b_entry_name and buffer are initially NULL, are initialized within a loop
      to the result of calling kmalloc, and are freed at the bottom of this loop.
      The loop contains gotos to cleanup, which also frees b_entry_name and
      buffer.  Some of these gotos are before the reinitializations of
      b_entry_name and buffer.  To maintain the invariant that b_entry_name and
      buffer are NULL at the top of the loop, and thus acceptable arguments to
      kfree, these variables are now set to NULL after the kfrees.
      
      This seems to be the simplest solution.  A more complicated solution
      would be to introduce more labels in the error handling code at the end of
      the function.
      
      A simplified version of the semantic match that finds this problem is as
      follows: (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @r@
      identifier E;
      expression E1;
      iterator I;
      statement S;
      @@
      
      *kfree(E);
      ... when != E = E1
          when != I(E,...) S
          when != &E
      *kfree(E);
      // </smpl>
      Signed-off-by: NJulia Lawall <julia@diku.dk>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      d3533d72
  17. 17 12月, 2009 1 次提交
    • C
      sanitize xattr handler prototypes · 431547b3
      Christoph Hellwig 提交于
      Add a flags argument to struct xattr_handler and pass it to all xattr
      handler methods.  This allows using the same methods for multiple
      handlers, e.g. for the ACL methods which perform exactly the same action
      for the access and default ACLs, just using a different underlying
      attribute.  With a little more groundwork it'll also allow sharing the
      methods for the regular user/trusted/secure handlers in extN, ocfs2 and
      jffs2 like it's already done for xfs in this patch.
      
      Also change the inode argument to the handlers to a dentry to allow
      using the handlers mechnism for filesystems that require it later,
      e.g. cifs.
      
      [with GFS2 bits updated by Steven Whitehouse <swhiteho@redhat.com>]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NJames Morris <jmorris@namei.org>
      Acked-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      431547b3
  18. 23 11月, 2009 1 次提交
    • T
      ext4: call ext4_forget() from ext4_free_blocks() · e6362609
      Theodore Ts'o 提交于
      Add the facility for ext4_forget() to be called from
      ext4_free_blocks().  This simplifies the code in a large number of
      places, and centralizes most of the work of calling ext4_forget() into
      a single place.
      
      Also fix a bug in the extents migration code; it wasn't calling
      ext4_forget() when releasing the indirect blocks during the
      conversion.  As a result, if the system cashed during or shortly after
      the extents migration, and the released indirect blocks get reused as
      data blocks, the journal replay would corrupt the data blocks.  With
      this new patch, fixing this bug was as simple as adding the
      EXT4_FREE_BLOCKS_FORGET flags to the call to ext4_free_blocks().
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      e6362609
  19. 16 11月, 2009 1 次提交
  20. 17 9月, 2009 1 次提交
    • E
      ext4: limit block allocations for indirect-block files to < 2^32 · fb0a387d
      Eric Sandeen 提交于
      Today, the ext4 allocator will happily allocate blocks past
      2^32 for indirect-block files, which results in the block
      numbers getting truncated, and corruption ensues.
      
      This patch limits such allocations to < 2^32, and adds
      BUG_ONs if we do get blocks larger than that.
      
      This should address RH Bug 519471, ext4 bitmap allocator 
      must limit blocks to < 2^32
      
      * ext4_find_goal() is modified to choose a goal < UINT_MAX,
        so that our starting point is in an acceptable range.
      
      * ext4_xattr_block_set() is modified such that the goal block
        is < UINT_MAX, as above.
      
      * ext4_mb_regular_allocator() is modified so that the group
        search does not continue into groups which are too high
      
      * ext4_mb_use_preallocated() has a check that we don't use
        preallocated space which is too far out
      
      * ext4_alloc_blocks() and ext4_xattr_block_set() add some BUG_ONs
      
      No attempt has been made to limit inode locations to < 2^32,
      so we may wind up with blocks far from their inodes.  Doing
      this much already will lead to some odd ENOSPC issues when the
      "lower 32" gets full, and further restricting inodes could
      make that even weirder.
      
      For high inodes, choosing a goal of the original, % UINT_MAX,
      may be a bit odd, but then we're in an odd situation anyway,
      and I don't know of a better heuristic.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      fb0a387d
  21. 26 3月, 2009 1 次提交
  22. 07 1月, 2009 1 次提交
    • F
      ext4: Allow ext4 to run without a journal · 0390131b
      Frank Mayhar 提交于
      A few weeks ago I posted a patch for discussion that allowed ext4 to run
      without a journal.  Since that time I've integrated the excellent
      comments from Andreas and fixed several serious bugs.  We're currently
      running with this patch and generating some performance numbers against
      both ext2 (with backported reservations code) and ext4 with and without
      a journal.  It just so happens that running without a journal is
      slightly faster for most everything.
      
      We did
      	iozone -T -t 4 s 2g -r 256k -T -I -i0 -i1 -i2
      
      which creates 4 threads, each of which create and do reads and writes on
      a 2G file, with a buffer size of 256K, using O_DIRECT for all file opens
      to bypass the page cache.  Results:
      
                           ext2        ext4, default   ext4, no journal
        initial writes   13.0 MB/s        15.4 MB/s          15.7 MB/s
        rewrites         13.1 MB/s        15.6 MB/s          15.9 MB/s
        reads            15.2 MB/s        16.9 MB/s          17.2 MB/s
        re-reads         15.3 MB/s        16.9 MB/s          17.2 MB/s
        random readers    5.6 MB/s         5.6 MB/s           5.7 MB/s
        random writers    5.1 MB/s         5.3 MB/s           5.4 MB/s 
      
      So it seems that, so far, this was a useful exercise.
      Signed-off-by: NFrank Mayhar <fmayhar@google.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      0390131b
  23. 13 12月, 2008 1 次提交
    • T
      ext4: remove do_blk_alloc() · 97df5d15
      Theodore Ts'o 提交于
      The convenience function do_blk_alloc() is a static function with only
      one caller, so fold it into ext4_new_meta_blocks() to simplify the
      code and to make it easier to understand.
      
      To save more stack space, if count is a null pointer in
      ext4_new_meta_blocks() assume that caller wanted a single block (and
      if there is an error, no blocks were allocated).
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      97df5d15
  24. 08 12月, 2008 1 次提交
    • T
      ext4: remove ext4_new_meta_block() · cfe82c85
      Theodore Ts'o 提交于
      There were only two one callers of the function ext4_new_meta_block(),
      which just a very simpler wrapper function around
      ext4_new_meta_blocks().  Change those two functions to call
      ext4_new_meta_blocks() directly, to save code and stack space usage.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      cfe82c85
  25. 11 10月, 2008 1 次提交
  26. 09 10月, 2008 1 次提交
    • K
      ext4: fix xattr deadlock · 4d20c685
      Kalpak Shah 提交于
      ext4_xattr_set_handle() eventually ends up calling
      ext4_mark_inode_dirty() which tries to expand the inode by shifting
      the EAs.  This leads to the xattr_sem being downed again and leading
      to a deadlock.
      
      This patch makes sure that if ext4_xattr_set_handle() is in the
      call-chain, ext4_mark_inode_dirty() will not expand the inode.
      Signed-off-by: NKalpak Shah <kalpak.shah@sun.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      4d20c685
  27. 27 7月, 2008 1 次提交
  28. 12 7月, 2008 1 次提交
    • A
      ext4: Use inode preallocation with -o noextents · 7061eba7
      Aneesh Kumar K.V 提交于
      When mballoc is enabled, block allocation for old block-based
      files are allocated using mballoc allocator instead of old
      block-based allocator. The old ext3 block reservation is turned
      off when mballoc is turned on.
      
      However, the in-core preallocation is not enabled for block-based/
      non-extent based file block allocation. This result in performance
      regression, as now we don't have "reservation" ore in-core preallocation
      to prevent interleaved fragmentation in multiple writes workload.
      
      This patch fix this by enable per inode in-core preallocation
      for non extent files when mballoc is used.
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NMingming Cao <cmm@us.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      7061eba7
  29. 15 5月, 2008 1 次提交
  30. 30 4月, 2008 2 次提交
  31. 17 4月, 2008 4 次提交
  32. 16 4月, 2008 1 次提交
  33. 29 1月, 2008 1 次提交