1. 16 2月, 2010 2 次提交
  2. 25 1月, 2010 1 次提交
    • T
      ext4: Use bitops to read/modify EXT4_I(inode)->i_state · 19f5fb7a
      Theodore Ts'o 提交于
      At several places we modify EXT4_I(inode)->i_state without holding
      i_mutex (ext4_release_file, ext4_bmap, ext4_journalled_writepage,
      ext4_do_update_inode, ...). These modifications are racy and we can
      lose updates to i_state. So convert handling of i_state to use bitops
      which are atomic.
      
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      19f5fb7a
  3. 23 12月, 2009 1 次提交
    • J
      ext4: Eliminate potential double free on error path · d3533d72
      Julia Lawall 提交于
      b_entry_name and buffer are initially NULL, are initialized within a loop
      to the result of calling kmalloc, and are freed at the bottom of this loop.
      The loop contains gotos to cleanup, which also frees b_entry_name and
      buffer.  Some of these gotos are before the reinitializations of
      b_entry_name and buffer.  To maintain the invariant that b_entry_name and
      buffer are NULL at the top of the loop, and thus acceptable arguments to
      kfree, these variables are now set to NULL after the kfrees.
      
      This seems to be the simplest solution.  A more complicated solution
      would be to introduce more labels in the error handling code at the end of
      the function.
      
      A simplified version of the semantic match that finds this problem is as
      follows: (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @r@
      identifier E;
      expression E1;
      iterator I;
      statement S;
      @@
      
      *kfree(E);
      ... when != E = E1
          when != I(E,...) S
          when != &E
      *kfree(E);
      // </smpl>
      Signed-off-by: NJulia Lawall <julia@diku.dk>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      d3533d72
  4. 17 12月, 2009 1 次提交
    • C
      sanitize xattr handler prototypes · 431547b3
      Christoph Hellwig 提交于
      Add a flags argument to struct xattr_handler and pass it to all xattr
      handler methods.  This allows using the same methods for multiple
      handlers, e.g. for the ACL methods which perform exactly the same action
      for the access and default ACLs, just using a different underlying
      attribute.  With a little more groundwork it'll also allow sharing the
      methods for the regular user/trusted/secure handlers in extN, ocfs2 and
      jffs2 like it's already done for xfs in this patch.
      
      Also change the inode argument to the handlers to a dentry to allow
      using the handlers mechnism for filesystems that require it later,
      e.g. cifs.
      
      [with GFS2 bits updated by Steven Whitehouse <swhiteho@redhat.com>]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NJames Morris <jmorris@namei.org>
      Acked-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      431547b3
  5. 23 11月, 2009 1 次提交
    • T
      ext4: call ext4_forget() from ext4_free_blocks() · e6362609
      Theodore Ts'o 提交于
      Add the facility for ext4_forget() to be called from
      ext4_free_blocks().  This simplifies the code in a large number of
      places, and centralizes most of the work of calling ext4_forget() into
      a single place.
      
      Also fix a bug in the extents migration code; it wasn't calling
      ext4_forget() when releasing the indirect blocks during the
      conversion.  As a result, if the system cashed during or shortly after
      the extents migration, and the released indirect blocks get reused as
      data blocks, the journal replay would corrupt the data blocks.  With
      this new patch, fixing this bug was as simple as adding the
      EXT4_FREE_BLOCKS_FORGET flags to the call to ext4_free_blocks().
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      e6362609
  6. 16 11月, 2009 1 次提交
  7. 17 9月, 2009 1 次提交
    • E
      ext4: limit block allocations for indirect-block files to < 2^32 · fb0a387d
      Eric Sandeen 提交于
      Today, the ext4 allocator will happily allocate blocks past
      2^32 for indirect-block files, which results in the block
      numbers getting truncated, and corruption ensues.
      
      This patch limits such allocations to < 2^32, and adds
      BUG_ONs if we do get blocks larger than that.
      
      This should address RH Bug 519471, ext4 bitmap allocator 
      must limit blocks to < 2^32
      
      * ext4_find_goal() is modified to choose a goal < UINT_MAX,
        so that our starting point is in an acceptable range.
      
      * ext4_xattr_block_set() is modified such that the goal block
        is < UINT_MAX, as above.
      
      * ext4_mb_regular_allocator() is modified so that the group
        search does not continue into groups which are too high
      
      * ext4_mb_use_preallocated() has a check that we don't use
        preallocated space which is too far out
      
      * ext4_alloc_blocks() and ext4_xattr_block_set() add some BUG_ONs
      
      No attempt has been made to limit inode locations to < 2^32,
      so we may wind up with blocks far from their inodes.  Doing
      this much already will lead to some odd ENOSPC issues when the
      "lower 32" gets full, and further restricting inodes could
      make that even weirder.
      
      For high inodes, choosing a goal of the original, % UINT_MAX,
      may be a bit odd, but then we're in an odd situation anyway,
      and I don't know of a better heuristic.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      fb0a387d
  8. 26 3月, 2009 1 次提交
  9. 07 1月, 2009 1 次提交
    • F
      ext4: Allow ext4 to run without a journal · 0390131b
      Frank Mayhar 提交于
      A few weeks ago I posted a patch for discussion that allowed ext4 to run
      without a journal.  Since that time I've integrated the excellent
      comments from Andreas and fixed several serious bugs.  We're currently
      running with this patch and generating some performance numbers against
      both ext2 (with backported reservations code) and ext4 with and without
      a journal.  It just so happens that running without a journal is
      slightly faster for most everything.
      
      We did
      	iozone -T -t 4 s 2g -r 256k -T -I -i0 -i1 -i2
      
      which creates 4 threads, each of which create and do reads and writes on
      a 2G file, with a buffer size of 256K, using O_DIRECT for all file opens
      to bypass the page cache.  Results:
      
                           ext2        ext4, default   ext4, no journal
        initial writes   13.0 MB/s        15.4 MB/s          15.7 MB/s
        rewrites         13.1 MB/s        15.6 MB/s          15.9 MB/s
        reads            15.2 MB/s        16.9 MB/s          17.2 MB/s
        re-reads         15.3 MB/s        16.9 MB/s          17.2 MB/s
        random readers    5.6 MB/s         5.6 MB/s           5.7 MB/s
        random writers    5.1 MB/s         5.3 MB/s           5.4 MB/s 
      
      So it seems that, so far, this was a useful exercise.
      Signed-off-by: NFrank Mayhar <fmayhar@google.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      0390131b
  10. 13 12月, 2008 1 次提交
    • T
      ext4: remove do_blk_alloc() · 97df5d15
      Theodore Ts'o 提交于
      The convenience function do_blk_alloc() is a static function with only
      one caller, so fold it into ext4_new_meta_blocks() to simplify the
      code and to make it easier to understand.
      
      To save more stack space, if count is a null pointer in
      ext4_new_meta_blocks() assume that caller wanted a single block (and
      if there is an error, no blocks were allocated).
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      97df5d15
  11. 08 12月, 2008 1 次提交
    • T
      ext4: remove ext4_new_meta_block() · cfe82c85
      Theodore Ts'o 提交于
      There were only two one callers of the function ext4_new_meta_block(),
      which just a very simpler wrapper function around
      ext4_new_meta_blocks().  Change those two functions to call
      ext4_new_meta_blocks() directly, to save code and stack space usage.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      cfe82c85
  12. 11 10月, 2008 1 次提交
  13. 09 10月, 2008 1 次提交
    • K
      ext4: fix xattr deadlock · 4d20c685
      Kalpak Shah 提交于
      ext4_xattr_set_handle() eventually ends up calling
      ext4_mark_inode_dirty() which tries to expand the inode by shifting
      the EAs.  This leads to the xattr_sem being downed again and leading
      to a deadlock.
      
      This patch makes sure that if ext4_xattr_set_handle() is in the
      call-chain, ext4_mark_inode_dirty() will not expand the inode.
      Signed-off-by: NKalpak Shah <kalpak.shah@sun.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      4d20c685
  14. 27 7月, 2008 1 次提交
  15. 12 7月, 2008 1 次提交
    • A
      ext4: Use inode preallocation with -o noextents · 7061eba7
      Aneesh Kumar K.V 提交于
      When mballoc is enabled, block allocation for old block-based
      files are allocated using mballoc allocator instead of old
      block-based allocator. The old ext3 block reservation is turned
      off when mballoc is turned on.
      
      However, the in-core preallocation is not enabled for block-based/
      non-extent based file block allocation. This result in performance
      regression, as now we don't have "reservation" ore in-core preallocation
      to prevent interleaved fragmentation in multiple writes workload.
      
      This patch fix this by enable per inode in-core preallocation
      for non extent files when mballoc is used.
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NMingming Cao <cmm@us.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      7061eba7
  16. 15 5月, 2008 1 次提交
  17. 30 4月, 2008 2 次提交
  18. 17 4月, 2008 4 次提交
  19. 16 4月, 2008 1 次提交
  20. 29 1月, 2008 1 次提交
  21. 18 10月, 2007 2 次提交
  22. 18 7月, 2007 2 次提交
    • K
      ext4: Expand extra_inodes space per the s_{want,min}_extra_isize fields · 6dd4ee7c
      Kalpak Shah 提交于
      We need to make sure that existing ext3 filesystems can also avail the
      new fields that have been added to the ext4 inode. We use
      s_want_extra_isize and s_min_extra_isize to decide by how much we should
      expand the inode. If EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE feature is set
      then we expand the inode by max(s_want_extra_isize, s_min_extra_isize ,
      sizeof(ext4_inode) - EXT4_GOOD_OLD_INODE_SIZE) bytes. Actually it is
      still an open question about whether users should be able to set
      s_*_extra_isize smaller than the known fields or not.
      
      This patch also adds the functionality to expand inodes to include the
      newly added fields. We start by trying to expand by s_want_extra_isize
      bytes and if its fails we try to expand by s_min_extra_isize bytes. This
      is done by changing the i_extra_isize if enough space is available in
      the inode and no EAs are present. If EAs are present and there is enough
      space in the inode then the EAs in the inode are shifted to make space.
      If enough space is not available in the inode due to the EAs then 1 or
      more EAs are shifted to the external EA block. In the worst case when
      even the external EA block does not have enough space we inform the user
      that some EA would need to be deleted or s_min_extra_isize would have to
      be reduced.
      Signed-off-by: NAndreas Dilger <adilger@clusterfs.com>
      Signed-off-by: NKalpak Shah <kalpak@clusterfs.com>
      Signed-off-by: NMingming Cao <cmm@us.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      6dd4ee7c
    • K
      ext4: Add nanosecond timestamps · ef7f3835
      Kalpak Shah 提交于
      This patch adds nanosecond timestamps for ext4. This involves adding
      *time_extra fields to the ext4_inode to extend the timestamps to
      64-bits.  Creation time is also added by this patch.
      
      These extended fields will fit into an inode if the filesystem was
      formatted with large inodes (-I 256 or larger) and there are currently
      no EAs consuming all of the available space. For new inodes we always
      reserve enough space for the kernel's known extended fields, but for
      inodes created with an old kernel this might not have been the case. So
      this patch also adds the EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE feature
      flag(ro-compat so that older kernels can't create inodes with a smaller
      extra_isize). which indicates if the fields fitting inside
      s_min_extra_isize are available or not.  If the expansion of inodes if
      unsuccessful then this feature will be disabled.  This feature is only
      enabled if requested by the sysadmin.
      
      None of the extended inode fields is critical for correct filesystem
      operation.
      Signed-off-by: NAndreas Dilger <adilger@clusterfs.com>
      Signed-off-by: NKalpak Shah <kalpak@clusterfs.com>
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NDave Kleikamp <shaggy@linux.vnet.ibm.com>
      Signed-off-by: NMingming Cao <cmm@us.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      ef7f3835
  23. 02 3月, 2007 1 次提交
    • M
      [PATCH] ext[34]: EA block reference count racing fix · 8a2bfdcb
      Mingming Cao 提交于
      There are race issues around ext[34] xattr block release code.
      
      ext[34]_xattr_release_block() checks the reference count of xattr block
      (h_refcount) and frees that xattr block if it is the last one reference it.
       Unlike ext2, the check of this counter is unprotected by any lock.
      ext[34]_xattr_release_block() will free the mb_cache entry before freeing
      that xattr block.  There is a small window between the check for the re
      h_refcount ==1 and the call to mb_cache_entry_free().  During this small
      window another inode might find this xattr block from the mbcache and reuse
      it, racing a refcount updates.  The xattr block will later be freed by the
      first inode without notice other inode is still use it.  Later if that
      block is reallocated as a datablock for other file, then more serious
      problem might happen.
      
      We need put a lock around places checking the refount as well to avoid
      racing issue.  Another place need this kind of protection is in
      ext3_xattr_block_set(), where it will modify the xattr block content in-
      the-fly if the refcount is 1 (means it's the only inode reference it).
      
      This will also fix another issue: the xattr block may not get freed at all
      if no lock is to protect the refcount check at the release time.  It is
      possible that the last two inodes could release the shared xattr block at
      the same time.  But both of them think they are not the last one so only
      decreased the h_refcount without freeing xattr block at all.
      
      We need to call lock_buffer() after ext3_journal_get_write_access() to
      avoid deadlock (because the later will call lock_buffer()/unlock_buffer
      () as well).
      Signed-off-by: NMingming Cao <cmm@us.ibm.com>
      Cc: Andreas Gruenbacher <agruen@suse.de>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8a2bfdcb
  24. 08 12月, 2006 1 次提交
  25. 12 10月, 2006 4 次提交
  26. 27 9月, 2006 1 次提交
  27. 26 6月, 2006 2 次提交
    • M
      [PATCH] ext3_fsblk_t: the rest of in-kernel filesystem blocks conversion · 43d23f90
      Mingming Cao 提交于
      Convert the ext3 in-kernel filesystem blocks to ext3_fsblk_t.  Convert the
      rest of all unsigned long type in-kernel filesystem blocks to ext3_fsblk_t,
      and replace the printk format string respondingly.
      Signed-off-by: NMingming Cao <cmm@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      43d23f90
    • M
      [PATCH] ext3_fsblk_t: filesystem, group blocks and bug fixes · 1c2bf374
      Mingming Cao 提交于
      Some of the in-kernel ext3 block variable type are treated as signed 4 bytes
      int type, thus limited ext3 filesystem to 8TB (4kblock size based).  While
      trying to fix them, it seems quite confusing in the ext3 code where some
      blocks are filesystem-wide blocks, some are group relative offsets that need
      to be signed value (as -1 has special meaning).  So it seem saner to define
      two types of physical blocks: one is filesystem wide blocks, another is
      group-relative blocks.  The following patches clarify these two types of
      blocks in the ext3 code, and fix the type bugs which limit current 32 bit ext3
      filesystem limit to 8TB.
      
      With this series of patches and the percpu counter data type changes in the mm
      tree, we are able to extend exts filesystem limit to 16TB.
      
      This work is also a pre-request for the recent >32 bit ext3 work, and makes
      the kernel to able to address 48 bit ext3 block a lot easier: Simply redefine
      ext3_fsblk_t from unsigned long to sector_t and redefine the format string for
      ext3 filesystem block corresponding.
      
      Two RFC with a series patches have been posted to ext2-devel list and have
      been reviewed and discussed:
      http://marc.theaimsgroup.com/?l=ext2-devel&m=114722190816690&w=2
      
      http://marc.theaimsgroup.com/?l=ext2-devel&m=114784919525942&w=2
      
      Patches are tested on both 32 bit machine and 64 bit machine, <8TB ext3 and
      >8TB ext3 filesystem(with the latest to be released e2fsprogs-1.39).  Tests
      includes overnight fsx, tiobench, dbench and fsstress.
      
      This patch:
      
      Defines ext3_fsblk_t and ext3_grpblk_t, and the printk format string for
      filesystem wide blocks.
      
      This patch classifies all block group relative blocks, and ext3_fsblk_t blocks
      occurs in the same function where used to be confusing before.  Also include
      kernel bug fixes for filesystem wide in-kernel block variables.  There are
      some fileystem wide blocks are treated as int/unsigned int type in the kernel
      currently, especially in ext3 block allocation and reservation code.  This
      patch fixed those bugs by converting those variables to ext3_fsblk_t(unsigned
      long) type.
      Signed-off-by: NMingming Cao <cmm@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      1c2bf374
  28. 11 1月, 2006 1 次提交
  29. 10 1月, 2006 1 次提交