1. 07 Jan, 2009 1 commit
    • ext4: Allow ext4 to run without a journal · 0390131b
      Committed by Frank Mayhar
      A few weeks ago I posted a patch for discussion that allowed ext4 to run
      without a journal.  Since that time I've integrated the excellent
      comments from Andreas and fixed several serious bugs.  We're currently
      running with this patch and generating some performance numbers against
      both ext2 (with backported reservations code) and ext4 with and without
      a journal.  It just so happens that running without a journal is
      slightly faster for most everything.
      
      We did
      	iozone -T -t 4 -s 2g -r 256k -T -I -i0 -i1 -i2
      
      which creates 4 threads, each of which creates a 2G file and does reads
      and writes on it with a buffer size of 256K, using O_DIRECT for all file
      opens to bypass the page cache.  Results:
      
                           ext2        ext4, default   ext4, no journal
        initial writes   13.0 MB/s        15.4 MB/s          15.7 MB/s
        rewrites         13.1 MB/s        15.6 MB/s          15.9 MB/s
        reads            15.2 MB/s        16.9 MB/s          17.2 MB/s
        re-reads         15.3 MB/s        16.9 MB/s          17.2 MB/s
        random readers    5.6 MB/s         5.6 MB/s           5.7 MB/s
        random writers    5.1 MB/s         5.3 MB/s           5.4 MB/s 
      
      So it seems that, so far, this was a useful exercise.
      Signed-off-by: Frank Mayhar <fmayhar@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
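To put "slightly faster" into numbers, the table in the commit message above can be turned into relative gains. The snippet below is only an illustrative calculation on the figures quoted in that message; the names `RESULTS`, `percent_gain` and `gains` are made up for this sketch and are not part of the patch.

```python
# Throughput figures (MB/s) copied from the table in the commit message:
# (ext2, ext4 default, ext4 no journal)
RESULTS = {
    "initial writes": (13.0, 15.4, 15.7),
    "rewrites": (13.1, 15.6, 15.9),
    "reads": (15.2, 16.9, 17.2),
    "re-reads": (15.3, 16.9, 17.2),
    "random readers": (5.6, 5.6, 5.7),
    "random writers": (5.1, 5.3, 5.4),
}


def percent_gain(new, old):
    """Relative change of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


def gains():
    """Map each workload to (gain vs. ext4 default, gain vs. ext2)."""
    return {
        name: (percent_gain(nojournal, default), percent_gain(nojournal, ext2))
        for name, (ext2, default, nojournal) in RESULTS.items()
    }


if __name__ == "__main__":
    for name, (vs_default, vs_ext2) in gains().items():
        print(f"{name:15s} {vs_default:5.1f}% vs ext4 default, "
              f"{vs_ext2:5.1f}% vs ext2")
```

On these figures the no-journal configuration comes out a little under 2% ahead of default ext4 in every row, which matches the "slightly faster" wording of the message.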
  2. 07 Dec, 2008 1 commit
  3. 29 Oct, 2008 1 commit
  4. 05 Jan, 2009 1 commit
    • fs: symlink write_begin allocation context fix · 54566b2c
      Committed by Nick Piggin
      With the write_begin/write_end aops, page_symlink was broken because it
      could no longer pass a GFP_NOFS type mask into the point where the
      allocations happened.  They are done in write_begin, which would always
      assume that the filesystem can be entered from reclaim.  This bug could
      cause filesystem deadlocks.
      
      The funny thing with having a gfp_t mask there is that it doesn't really
      allow the caller to arbitrarily tinker with the context in which it can be
      called.  It couldn't ever be GFP_ATOMIC, for example, because it needs to
      take the page lock.  The only thing any callers care about is __GFP_FS
      anyway, so turn that into a single flag.
      
      Add a new flag for write_begin, AOP_FLAG_NOFS.  Filesystems can now act on
      this flag in their write_begin function.  Change __grab_cache_page to
      accept a nofs argument as well, to honour that flag (while we're there,
      change the name to grab_cache_page_write_begin which is more instructive
      and does away with random leading underscores).
      
      This is really a more flexible way to go in the end anyway -- if a
      filesystem happens to want any extra allocations aside from the pagecache
      ones in its write_begin function, it may now use GFP_KERNEL (rather than
      GFP_NOFS) for common case allocations (e.g. ocfs2_alloc_write_ctxt, for a
      random example).
      
      [kosaki.motohiro@jp.fujitsu.com: fix ubifs]
      [kosaki.motohiro@jp.fujitsu.com: fix fuse]
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: <stable@kernel.org>		[2.6.28.x]
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      [ Cleaned up the calling convention: just pass in the AOP flags
        untouched to the grab_cache_page_write_begin() function.  That
        just simplifies everybody, and may even allow future expansion of the
        logic.   - Linus ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
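The flag handling described in this commit message can be modelled in a few lines. This is a minimal sketch, not the kernel code: the bit values below are illustrative placeholders rather than the real kernel constants, and `page_alloc_mask` is a hypothetical helper name standing in for the mask computation done when grab_cache_page_write_begin allocates a page.

```python
# Illustrative bit values only; the real kernel constants may differ.
AOP_FLAG_NOFS = 0x4
GFP_WAIT = 0x10
GFP_IO = 0x40
GFP_FS = 0x80
GFP_KERNEL = GFP_WAIT | GFP_IO | GFP_FS
GFP_NOFS = GFP_WAIT | GFP_IO


def page_alloc_mask(mapping_gfp_mask, aop_flags):
    """Mask for the pagecache allocation done on behalf of write_begin.

    When the caller passes AOP_FLAG_NOFS, the filesystem bit is cleared
    so that reclaim triggered by the allocation cannot re-enter the
    filesystem. Otherwise the mapping's own mask is used unchanged.
    """
    if aop_flags & AOP_FLAG_NOFS:
        return mapping_gfp_mask & ~GFP_FS
    return mapping_gfp_mask
```

With this model, a symlink write that passes `AOP_FLAG_NOFS` gets the equivalent of `GFP_NOFS`, while an ordinary write keeps `GFP_KERNEL`.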
  5. 01 Jan, 2009 1 commit
  6. 24 Oct, 2008 1 commit
  7. 23 Oct, 2008 1 commit
  8. 11 Oct, 2008 1 commit
  9. 23 Sep, 2008 1 commit
  10. 09 Sep, 2008 2 commits
  11. 12 Jul, 2008 3 commits
  12. 30 Apr, 2008 2 commits
  13. 17 Apr, 2008 2 commits
  14. 29 Apr, 2008 1 commit
  15. 26 Feb, 2008 1 commit
  16. 16 Feb, 2008 1 commit
  17. 22 Feb, 2008 1 commit
  18. 08 Feb, 2008 1 commit
  19. 05 Feb, 2008 1 commit
  20. 29 Jan, 2008 3 commits
  21. 18 Oct, 2007 1 commit
  22. 20 Sep, 2007 2 commits
  23. 18 Jul, 2007 2 commits
    • ext4: Remove 65000 subdirectory limit · f8628a14
      Committed by Andreas Dilger
      This patch adds support to ext4 for allowing more than 65000
      subdirectories. Currently the maximum number of subdirectories is capped
      at 32000.
      
      If an htree directory exceeds 65000 subdirectories, the patch sets the
      inode link count to 1 and no longer counts subdirectories.  The
      directory link count is not actually used when determining if a
      directory is empty, as that only counts subdirectories and not regular
      files that might be in there. 
      
      An EXT4_FEATURE_RO_COMPAT_DIR_NLINK flag has been added and it is set if
      the subdir count for any directory crosses 65000. A later fsck will clear
      EXT4_FEATURE_RO_COMPAT_DIR_NLINK if there is no longer any directory
      with >65000 subdirs.
      Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
      Signed-off-by: Kalpak Shah <kalpak@clusterfs.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      
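The link-count behaviour described in this commit message can be sketched as a small model. It is a simplification under stated assumptions: `DirInode` and its fields are hypothetical names, the feature flag is really filesystem-wide rather than per directory, and the exact comparison against the limit is an assumption made for the sketch rather than taken from the patch.

```python
LINK_MAX = 65000  # limit named in the commit message


class DirInode:
    """Toy model of a directory inode's link count."""

    def __init__(self, link_max=LINK_MAX):
        self.link_max = link_max
        self.nlink = 2                  # "." plus the entry in the parent
        self.dir_nlink_feature = False  # stands in for the RO_COMPAT flag

    def add_subdir(self):
        if self.nlink == 1:
            # Already past the limit: subdirectories are no longer counted.
            return
        self.nlink += 1
        if self.nlink >= self.link_max:
            # Stop counting and record that the feature is in use.
            self.nlink = 1
            self.dir_nlink_feature = True
```

A link count of 1 acts as a marker meaning "count unknown", which is safe because, as the message notes, the link count is not what decides whether a directory is empty.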
    • ext4: Add nanosecond timestamps · ef7f3835
      Committed by Kalpak Shah
      This patch adds nanosecond timestamps for ext4. This involves adding
      *time_extra fields to the ext4_inode to extend the timestamps to
      64-bits.  Creation time is also added by this patch.
      
      These extended fields will fit into an inode if the filesystem was
      formatted with large inodes (-I 256 or larger) and there are currently
      no EAs consuming all of the available space. For new inodes we always
      reserve enough space for the kernel's known extended fields, but for
      inodes created with an old kernel this might not have been the case. So
      this patch also adds the EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE feature
      flag, which indicates whether the fields fitting inside
      s_min_extra_isize are available (it is ro-compat so that older kernels
      can't create inodes with a smaller extra_isize).  If the expansion of
      inodes is unsuccessful then this feature will be disabled.  This feature
      is only enabled if requested by the sysadmin.
      
      None of the extended inode fields is critical for correct filesystem
      operation.
      Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
      Signed-off-by: Kalpak Shah <kalpak@clusterfs.com>
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
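As a rough illustration of how a 32-bit "extra" field can extend a 32-bit seconds field, the sketch below packs two epoch-extension bits and a 30-bit nanosecond value into one word. This follows the layout as understood from current ext4 documentation; the 2007 patch itself may differ in detail, and the function names here are hypothetical.

```python
EPOCH_BITS = 2
EPOCH_MASK = (1 << EPOCH_BITS) - 1


def to_s32(value):
    """Interpret the low 32 bits of `value` as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def encode(seconds, nanoseconds):
    """Split a timestamp into (32-bit seconds field, 32-bit extra field)."""
    low = to_s32(seconds)
    # How many multiples of 2**32 the signed low field fails to represent.
    epoch = ((seconds - low) >> 32) & EPOCH_MASK
    extra = (nanoseconds << EPOCH_BITS) | epoch
    return low & 0xFFFFFFFF, extra


def decode(low_field, extra):
    """Inverse of encode(): return (seconds, nanoseconds)."""
    seconds = to_s32(low_field) + ((extra & EPOCH_MASK) << 32)
    return seconds, extra >> EPOCH_BITS
```

For example, `encode(2**31, 5)` yields `(0x80000000, 21)`: the seconds value no longer fits in a signed 32-bit field, so one epoch bit is set alongside the nanoseconds shifted left by two.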
  24. 17 Jul, 2007 1 commit
    • ext3/ext4: orphan list corruption due bad inode · a6c15c2b
      Committed by Vasily Averin
      After the ext3 orphan list check was added to ext3_destroy_inode()
      (please see my previous patch), the following situation was detected:
      
       EXT3-fs warning (device sda6): ext3_unlink: Deleting nonexistent file (37901290), 0
       Inode 00000101a15b7840: orphan list check failed!
       00000773 6f665f00 74616d72 00000573 65725f00 06737270 66000000 616d726f
      ...
       Call Trace: [<ffffffff80211ea9>] ext3_destroy_inode+0x79/0x90
        [<ffffffff801a2b16>] sys_unlink+0x126/0x1a0
        [<ffffffff80111479>] error_exit+0x0/0x81
        [<ffffffff80110aba>] system_call+0x7e/0x83
      
      The first message says that the unlinked inode has i_nlink=0;
      ext3_unlink() then adds this inode to the orphan list.
      
      The second message means that this inode has not been removed from the
      orphan list.  The inode dump showed that i_fop = &bad_file_ops, which can
      be set only in make_bad_inode().  I then found that ext3_read_inode() can
      call make_bad_inode() without any error/warning messages, for example in
      the following case:
      
      ...
              if (inode->i_nlink == 0) {
                      if (inode->i_mode == 0 ||
                          !(EXT3_SB(inode->i_sb)->s_mount_state & EXT3_ORPHAN_FS)) {
                              /* this inode is deleted */
                              brelse (bh);
                              goto bad_inode;
      ...
      
      A bad inode can live for some time, and ext3_unlink() can add it to the
      orphan list, but ext3_delete_inode() does not delete it from the orphan
      list.  As a result we can have orphan list corruption detected in
      ext3_destroy_inode().
      
      However, it is not clear to me how to fix this issue correctly.
      
      As far as I can see, is_bad_inode() is called after iget() in all places
      except ext3_lookup() and ext3_get_parent().  I believe it makes sense to
      add a bad inode check to these functions too and call iput() if a bad
      inode is detected.
      Signed-off-by: Vasily Averin <vvs@sw.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
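The check proposed at the end of this commit message can be sketched as a toy model. Everything here is hypothetical scaffolding: the `Inode` class, the dictionary used as an inode table, and the choice of `ENOENT` as the returned error are assumptions for illustration, not the kernel's data structures or the patch's actual return value.

```python
import errno


class Inode:
    """Toy inode with a reference count and a 'bad' marker."""

    def __init__(self, ino, bad=False):
        self.ino = ino
        self.bad = bad
        self.refcount = 0


def iget(table, ino):
    """Take a reference on the inode with number `ino`."""
    inode = table[ino]
    inode.refcount += 1
    return inode


def iput(inode):
    """Drop a reference taken by iget()."""
    inode.refcount -= 1


def is_bad_inode(inode):
    return inode.bad


def lookup(table, ino):
    """Return (inode, 0) on success, or (None, -errno) for a bad inode.

    The bad inode is released immediately, so it can never be handed to
    unlink and end up on the orphan list.
    """
    inode = iget(table, ino)
    if is_bad_inode(inode):
        iput(inode)
        return None, -errno.ENOENT
    return inode, 0
```

The point of the design is that the bad inode is filtered at lookup time, before any caller can operate on it, rather than being cleaned up later in the delete path.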
  25. 01 Jun, 2007 1 commit
  26. 09 May, 2007 2 commits
  27. 13 Feb, 2007 1 commit
  28. 12 Feb, 2007 2 commits
  29. 09 Dec, 2006 1 commit