Commits · 05bf9e839d9de4e8a094274a0a2fd07beb47eaf1 · gsplhtlxg / clone-Linux

22 Feb, 2009 1 commit

ext4: Add fallback for find_group_flex · 05bf9e83

Committed by Theodore Ts'o on Feb 21, 2009

This is a workaround for find_group_flex() which badly needs to be
replaced.  One of its problems (besides ignoring the Orlov algorithm)
is that it is a bit hyperactive about returning failure under
suspicious circumstances.  This can lead to spurious ENOSPC failures
even when there are inodes still available.

Work around this for now by retrying the search using
find_group_other() if find_group_flex() returns -1.  If
find_group_other() succeeds when find_group_flex() has failed, log a
warning message.

A better block/inode allocator that will fix this problem for real has
been queued up for the next merge window.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

05bf9e83
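
A minimal user-space sketch of the retry pattern this commit describes: try a picky search first, and if it returns -1, retry with a simple linear search and warn when the retry succeeds. The `demo_*` names, the array of free-inode counts, and the `DEMO_PICKY_MIN_FREE` threshold are all hypothetical stand-ins; this is not the kernel's find_group_flex() or find_group_other().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical threshold: the picky search ignores groups with fewer
 * free inodes than this.  It only models a search that gives up too
 * early; it is not a value taken from the kernel. */
#define DEMO_PICKY_MIN_FREE 10u

/* Picky search: returns 0 and stores the group on success, -1 on failure. */
static int demo_find_group_picky(const unsigned int *free_inodes,
				 unsigned int ngroups, unsigned int *group)
{
	for (unsigned int i = 0; i < ngroups; i++) {
		if (free_inodes[i] >= DEMO_PICKY_MIN_FREE) {
			*group = i;
			return 0;
		}
	}
	return -1;
}

/* Linear search: accepts any group with at least one free inode. */
static int demo_find_group_linear(const unsigned int *free_inodes,
				  unsigned int ngroups, unsigned int *group)
{
	for (unsigned int i = 0; i < ngroups; i++) {
		if (free_inodes[i] > 0) {
			*group = i;
			return 0;
		}
	}
	return -1;
}

/* Try the picky search; on -1 retry with the linear search and warn if
 * the retry succeeds.  *used_fallback reports which path found the group. */
static int demo_pick_group(const unsigned int *free_inodes,
			   unsigned int ngroups, unsigned int *group,
			   int *used_fallback)
{
	assert(free_inodes != NULL && group != NULL && used_fallback != NULL);

	*used_fallback = 0;
	if (demo_find_group_picky(free_inodes, ngroups, group) == 0)
		return 0;

	if (demo_find_group_linear(free_inodes, ngroups, group) == 0) {
		*used_fallback = 1;
		fprintf(stderr, "warning: picky search failed, "
			"fallback found group %u\n", *group);
		return 0;
	}
	return -1;	/* no group has a free inode */
}
```

With free-inode counts of {0, 3, 0}, the picky search fails, the linear search finds group 1, and a warning is printed instead of the caller seeing a spurious failure.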

07 Jan, 2009 1 commit

ext4: Remove "extents" mount option · 83982b6f

Committed by Theodore Ts'o on Jan 06, 2009

This mount option is largely superfluous, and in fact the way it was
implemented was buggy; if a filesystem which did not have the extents
feature flag was mounted -o extents, the filesystem would attempt to
create and use extents-based files even though the extents feature flag
was not enabled. The simplest thing to do is to nuke the mount option
entirely. It's not all that useful to force the non-creation of new
extent-based files if the filesystem can support it.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

83982b6f

04 Jan, 2009 1 commit

ext4: Add markers for better debuggability · ba80b101

Committed by Theodore Ts'o on Jan 03, 2009

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ba80b101

06 Jan, 2009 3 commits

ext4: mark the blocks/inode bitmap beyond end of group as used · 648f5879

Committed by Aneesh Kumar K.V on Jan 05, 2009

We need to mark the block/inode bitmap beyond the end of the group
with '1'.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

648f5879

ext4: Use new buffer_head flag to check uninit group bitmaps initialization · 2ccb5fb9

Committed by Aneesh Kumar K.V on Jan 05, 2009

For uninit block group, the on-disk bitmap is not initialized. That
implies we cannot depend on the uptodate flag on the bitmap
buffer_head to find bitmap validity.  Use a new buffer_head flag which
would be set after we properly initialize the bitmap.  This also
prevents (re-)initializing the uninit group bitmap every time we call 
ext4_read_block_bitmap().
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

2ccb5fb9

ext4: Fix the race between read_inode_bitmap() and ext4_new_inode() · 39341867

Committed by Aneesh Kumar K.V on Jan 05, 2009

We need to make sure we update the inode bitmap and clear
EXT4_BG_INODE_UNINIT flag with sb_bgl_lock held, since
ext4_read_inode_bitmap() looks at EXT4_BG_INODE_UNINIT to decide
whether to initialize the inode bitmap each time it is called.
(introduced by commit c806e68f.)

ext4_read_inode_bitmap() does:

	spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
	if (desc->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
		ext4_init_inode_bitmap(sb, bh, block_group, desc);

and ext4_new_inode() does:

	if (!ext4_set_bit_atomic(sb_bgl_lock(sbi, group),
				 ino, inode_bitmap_bh->b_data))
		......
		...
	spin_lock(sb_bgl_lock(sbi, group));

	gdp->bg_flags &= cpu_to_le16(~EXT4_BG_INODE_UNINIT);

i.e., on allocation we update the bitmap, then we take the sb_bgl_lock
and clear the EXT4_BG_INODE_UNINIT flag.  What can happen is that a
parallel ext4_read_inode_bitmap() can zero out the bitmap in between
the above ext4_set_bit_atomic() and spin_lock(sb_bgl_lock(...)).

The race results in the following user-visible errors:
EXT4-fs error (device sdb1): ext4_free_inode: bit already cleared for inode 168449
EXT4-fs warning (device sdb1): ext4_unlink: Deleting nonexistent file ...
EXT4-fs warning (device sdb1): ext4_rmdir: empty directory has too many links ...
# ls -al /mnt/tmp/f/p369/d3/d6/d39/db2/dee/d10f/d3f/l71
ls: /mnt/tmp/f/p369/d3/d6/d39/db2/dee/d10f/d3f/l71: Stale NFS file handle
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

39341867
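
A minimal user-space sketch of the locking rule this commit states: the bitmap update and the clearing of the "uninitialized" flag must happen inside one critical section, so a concurrent reader can never see a set bit together with a flag that still tells it to zero the bitmap. It assumes a pthread mutex in place of sb_bgl_lock and a plain int in place of EXT4_BG_INODE_UNINIT; `struct demo_group` and the `demo_*` functions are hypothetical and do not reproduce the kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define DEMO_INODES 64u

struct demo_group {
	pthread_mutex_t lock;		/* stands in for sb_bgl_lock */
	int uninit;			/* stands in for EXT4_BG_INODE_UNINIT */
	unsigned char bitmap[DEMO_INODES / 8];
};

/* Reader: (re)initialize the bitmap for as long as the flag is set. */
static void demo_read_bitmap(struct demo_group *g)
{
	pthread_mutex_lock(&g->lock);
	if (g->uninit)
		memset(g->bitmap, 0, sizeof(g->bitmap));
	pthread_mutex_unlock(&g->lock);
}

/* Allocator: set the bit and clear the flag under the same lock.
 * Returns 0 on success, -1 if the inode was already allocated. */
static int demo_alloc_inode(struct demo_group *g, unsigned int ino)
{
	int was_set;

	assert(ino < DEMO_INODES);

	pthread_mutex_lock(&g->lock);
	was_set = (g->bitmap[ino / 8] >> (ino % 8)) & 1;
	if (!was_set) {
		g->bitmap[ino / 8] |= (unsigned char)(1u << (ino % 8));
		g->uninit = 0;	/* cleared before the lock is dropped */
	}
	pthread_mutex_unlock(&g->lock);

	return was_set ? -1 : 0;
}
```

Because the flag is cleared before the lock is released, a later demo_read_bitmap() call leaves the freshly set bit alone. Splitting the two updates across separate lock acquisitions would reopen the window described above.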

04 Jan, 2009 1 commit

ext4: code cleanup · 3300beda

Committed by Aneesh Kumar K.V on Jan 03, 2009

Rename some variables.  As part of the cleanup, we also release locks in
the reverse order of acquisition.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

3300beda

06 Jan, 2009 1 commit

ext4: Use high 16 bits of the block group descriptor's free counts fields · 560671a0

Committed by Aneesh Kumar K.V on Jan 05, 2009

Rename the lower bits with suffix _lo and add helper
to access the values. Also rename bg_itable_unused_hi
to bg_pad as in e2fsprogs.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

560671a0
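
A minimal sketch of what a lo/hi accessor pair can look like, assuming a 32-bit count split into two 16-bit halves. The struct and the `demo_*` helper names are hypothetical, and the sketch ignores on-disk byte order and any check of the descriptor size that real code would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor fragment: one counter split into two halves. */
struct demo_group_desc {
	uint16_t free_blocks_count_lo;	/* low 16 bits */
	uint16_t free_blocks_count_hi;	/* high 16 bits */
};

/* Combine the two halves into one 32-bit value. */
static uint32_t demo_free_blocks_count(const struct demo_group_desc *gd)
{
	assert(gd != NULL);
	return (uint32_t)gd->free_blocks_count_lo |
	       ((uint32_t)gd->free_blocks_count_hi << 16);
}

/* Split a 32-bit value back into the two halves. */
static void demo_free_blocks_set(struct demo_group_desc *gd, uint32_t count)
{
	assert(gd != NULL);
	gd->free_blocks_count_lo = (uint16_t)(count & 0xffffu);
	gd->free_blocks_count_hi = (uint16_t)(count >> 16);
}
```

Routing every access through helpers like these keeps callers from reading only the low half by accident.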

01 Jan, 2009 1 commit

nfsd race fixes: ext4 · 6b38e842

Committed by Al Viro on Dec 30, 2008

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6b38e842

14 Nov, 2008 1 commit

CRED: Wrap task credential accesses in the Ext4 filesystem · 4c9c544e

Committed by David Howells on Nov 14, 2008

Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: adilger@sun.com
Cc: linux-ext4@vger.kernel.org
Signed-off-by: James Morris <jmorris@namei.org>

4c9c544e

07 Nov, 2008 1 commit

ext4: add checksum calculation when clearing UNINIT flag in ext4_new_inode · 23712a9c

Committed by Frederic Bohe on Nov 07, 2008

When initializing an uninitialized block group in ext4_new_inode(),
its block group checksum must be re-calculated.  This fixes a race
when several threads try to allocate a new inode in an UNINIT'd group.

There is some question whether we need to be initializing the block
bitmap in ext4_new_inode() at all, but for now, if we are going to
init the block group, let's eliminate the race.
Signed-off-by: Frederic Bohe <frederic.bohe@bull.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

23712a9c

06 Jan, 2009 2 commits

ext4: Make ext4_group_t be an unsigned int · a9df9a49

Committed by Theodore Ts'o on Jan 05, 2009

Nearly all places in the ext3/4 code which use "unsigned long" are
probably bugs, since on 32-bit systems a ulong is 32 bits, which means
we are wasting stack space on 64-bit systems.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

a9df9a49

ext4: remove extraneous newlines from calls to ext4_error() and ext4_warning() · fde4d95a

Committed by Theodore Ts'o on Jan 05, 2009

This removes annoying blank syslog entries emitted by ext4_error() or
ext4_warning(), since these functions add their own newline.
Signed-off-by: Nick Warne <nick@ukfsn.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

fde4d95a

07 Jan, 2009 1 commit

ext4: Allow ext4 to run without a journal · 0390131b

Committed by Frank Mayhar on Jan 07, 2009

A few weeks ago I posted a patch for discussion that allowed ext4 to run
without a journal.  Since that time I've integrated the excellent
comments from Andreas and fixed several serious bugs.  We're currently
running with this patch and generating some performance numbers against
both ext2 (with backported reservations code) and ext4 with and without
a journal.  It just so happens that running without a journal is
slightly faster for most everything.

We did
	iozone -T -t 4 -s 2g -r 256k -T -I -i0 -i1 -i2

which creates 4 threads, each of which creates a 2G file and does reads
and writes on it, with a buffer size of 256K, using O_DIRECT for all
file opens to bypass the page cache.  Results:

                     ext2        ext4, default   ext4, no journal
  initial writes   13.0 MB/s        15.4 MB/s          15.7 MB/s
  rewrites         13.1 MB/s        15.6 MB/s          15.9 MB/s
  reads            15.2 MB/s        16.9 MB/s          17.2 MB/s
  re-reads         15.3 MB/s        16.9 MB/s          17.2 MB/s
  random readers    5.6 MB/s         5.6 MB/s           5.7 MB/s
  random writers    5.1 MB/s         5.3 MB/s           5.4 MB/s 

So it seems that, so far, this was a useful exercise.
Signed-off-by: Frank Mayhar <fmayhar@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

0390131b

10 Oct, 2008 2 commits

ext4: fix initialization of UNINIT bitmap blocks · c806e68f

Committed by Frederic Bohe on Oct 10, 2008

This fixes a bug which caused on-line resizing of filesystems with a
1k blocksize to fail.  The root cause of this bug was the fact that if
an uninitialized bitmap block gets read in by userspace (which
e2fsprogs does try to avoid, but can happen when the blocksize is less
than the pagesize and an adjacent block is read into memory)
ext4_read_block_bitmap() was erroneously depending on the buffer
uptodate flag to decide whether it needed to initialize the bitmap
block in memory --- i.e., to set the standard set of blocks in use by
a block group (superblock, bitmaps, inode table, etc.).  Essentially,
ext4_read_block_bitmap() assumed it was the only routine that might
try to read a block containing a block bitmap, which is simply not
true.

To fix this, ext4_read_block_bitmap() and ext4_read_inode_bitmap()
must always initialize uninitialized bitmap blocks.  Once a block or
inode is allocated out of that bitmap, it will be marked as
initialized in the block group descriptor, so in general this won't
result in any unnecessary extra work.
Signed-off-by: Frederic Bohe <frederic.bohe@bull.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

c806e68f

ext4: Remove old legacy block allocator · c2ea3fde

Committed by Theodore Ts'o on Oct 10, 2008

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

c2ea3fde

09 Sep, 2008 2 commits

ext4: Fix whitespace checkpatch warnings/errors · af5bc92d

Committed by Theodore Ts'o on Sep 08, 2008

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

af5bc92d

ext4: Add printk priority levels to clean up checkpatch warnings · 4776004f

Committed by Theodore Ts'o on Sep 08, 2008

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

4776004f

20 Aug, 2008 1 commit

ext4: Fix bug where we return ENOSPC even though we have plenty of inodes · c001077f

Committed by Eric Sandeen on Aug 19, 2008

The find_group_flex() function starts with best_flex as the
parent_fbg_group, which happens to have 0 inodes free.  Some of the
flex groups searched have free blocks and free inodes, but the
flex_freeb_ratio is < 10, so they're skipped.  Then when a group is
compared to the current "best" flex group, it does not have more free
blocks than "best", so it is skipped as well.

This continues until no flex group with free inodes is found which has
a proper ratio or which has more free blocks than the "best" group,
and we're left with a "best" group that has 0 inodes free, and we
return -ENOSPC.

We fix this by changing the logic so that if the current "best" flex
group has no inodes free, and the current one does have room, it is
promoted to the next "best."
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

c001077f
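
A simplified sketch of the selection rule this commit describes, assuming each flex group is summarized by its free inode and free block counts. The search starts with the parent group as "best"; the fix is the branch that promotes any candidate with free inodes when the current "best" has none. `struct demo_flex` and `demo_pick_flex()` are hypothetical, and the free-block ratio test mentioned in the commit message is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-flex-group summary. */
struct demo_flex {
	unsigned int free_inodes;
	unsigned int free_blocks;
};

/* Return the index of the chosen flex group, or -1 if none has a free inode. */
static int demo_pick_flex(const struct demo_flex *g, int n, int parent)
{
	int best;

	assert(g != NULL && n > 0 && parent >= 0 && parent < n);

	best = parent;
	for (int i = 0; i < n; i++) {
		if (g[i].free_inodes == 0)
			continue;	/* candidate has no room at all */

		/* The fix: a "best" with no free inodes is replaced by
		 * any candidate that does have room. */
		if (g[best].free_inodes == 0) {
			best = i;
			continue;
		}

		/* Otherwise prefer the group with more free blocks. */
		if (g[i].free_blocks > g[best].free_blocks)
			best = i;
	}

	return g[best].free_inodes ? best : -1;
}
```

Without the promotion branch, a parent group with 0 free inodes but many free blocks would stay "best" forever, and the function would report failure even though other groups have room.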

03 Aug, 2008 2 commits

ext4: lock block groups when initializing · b5f10eed

Committed by Eric Sandeen on Aug 02, 2008

I noticed when filling a 1T filesystem with 4 threads using the
fs_mark benchmark:

fs_mark -d /mnt/test -D 256 -n 100000 -t 4 -s 20480 -F -S 0

that I occasionally got checksum mismatch errors:

EXT4-fs error (device sdb): ext4_init_inode_bitmap: Checksum bad for group 6935

etc.  I'd reliably get 4-5 of them during the run.

It appears that the problem is likely a race to init the bg's
when the uninit_bg feature is enabled.

With the patch below, which adds sb_bgl_locking around initialization,
I was able to complete several runs with no errors or warnings.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

b5f10eed

ext4: sync up block and inode bitmap reading functions · e29d1cde

Committed by Eric Sandeen on Aug 02, 2008

ext4_read_block_bitmap and read_inode_bitmap do essentially
the same thing, and yet they are structured quite differently.
I came across this difference while looking at doing bg locking
during bg initialization.

This patch:

* removes unnecessary casts in the error messages
* renames read_inode_bitmap to ext4_read_inode_bitmap
* and more substantially, restructures the inode bitmap
  reading function to be more like the block bitmap counterpart.

The change to the inode bitmap reader simplifies the locking
to be applied in the next patch.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

e29d1cde

12 Jul, 2008 4 commits

ext4: do not set extents feature from the kernel · e4079a11

Committed by Eric Sandeen on Jul 11, 2008

We've talked for a while about getting rid of any feature-
setting from the kernel; this gets rid of the code which would
set the INCOMPAT_EXTENTS flag on the first file write when mounted
as ext4[dev].

With this patch, if the extents feature is not already set on disk,
then mounting as ext4 will fall back to noextents with a warning,
and if -o extents is explicitly requested, the mount will fail,
also with warning.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

e4079a11

ext4: New inode allocation for FLEX_BG meta-data groups. · 772cb7c8

Committed by Jose R. Santos on Jul 11, 2008

This patch mostly controls the way inodes are allocated in order to
make ialloc aware of flex_bg block group grouping.  It achieves this
by bypassing the Orlov allocator when block group meta-data are packed
together through mke2fs.  Since the impact on the block allocator is
minimal, this patch should have little or no effect on other block
allocation algorithms. By controlling the inode allocation, it can
basically control where the initial search for new blocks begins and
thus indirectly manipulate the block allocator.

This allocator favors data and meta-data locality so the disk will
gradually be filled from block group zero upward.  This helps improve
performance by reducing seek time.  Since the group of inode tables
within one flex_bg are treated as one giant inode table, uninitialized
block groups would not need to partially initialize as many inode
tables as with Orlov, which would help fsck time as the filesystem usage
goes up.
Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
Signed-off-by: Valerie Clement <valerie.clement@bull.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

772cb7c8

ext4: Rename read_block_bitmap() to ext4_read_block_bitmap() · 574ca174

Committed by Theodore Ts'o on Jul 11, 2008

Since this is a non-static function, make it ext4-specific to avoid
potential conflicts with other filesystems.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

574ca174

ext4: handle corrupted orphan list at mount · 91ef4caf

Committed by Duane Griffin on Jul 11, 2008

If the orphan node list includes valid, untruncatable nodes with nlink > 0
the ext4_orphan_cleanup loop which attempts to delete them will not do so,
causing it to loop forever. Fix by checking for such nodes in the
ext4_orphan_get function.

This patch fixes the second case (image hdb.20000009.softlockup.gz)
reported in http://bugzilla.kernel.org/show_bug.cgi?id=10882.
Signed-off-by: Duane Griffin <duaneg@dghda.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

91ef4caf

30 Apr, 2008 2 commits

ext4: mark inode dirty after initializing the extent tree · 8753e88f

Committed by Aneesh Kumar K.V on Apr 29, 2008

We should mark the inode dirty only after initializing the extent
tree.  Also if we fail during extent initialization we need
to call DQUOT_FREE_INODE.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

8753e88f

ext4: move headers out of include/linux · 3dcf5451

Committed by Christoph Hellwig on Apr 29, 2008

Move ext4 headers out of include/linux.  This is just the trivial move;
there are some more things that could be done later.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

3dcf5451

22 Apr, 2008 1 commit

ext*: spelling fix prefered -> preferred · 1cc8dcf5

Committed by Benoit Boissinot on Apr 21, 2008

Spelling fix: prefered -> preferred
Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org>
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>

1cc8dcf5

17 Apr, 2008 2 commits

ext4: replace remaining __FUNCTION__ occurrences · 46e665e9

Committed by Harvey Harrison on Apr 17, 2008

__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

46e665e9

ext4: le*_add_cpu conversion · e8546d06

Committed by Marcin Slusarz on Apr 17, 2008

replace all:
little_endian_variable = cpu_to_leX(leX_to_cpu(little_endian_variable) +
					expression_in_cpu_byteorder);
with:
	leX_add_cpu(&little_endian_variable, expression_in_cpu_byteorder);
generated with semantic patch
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org
Cc: sct@redhat.com
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: adilger@clusterfs.com
Cc: Mingming Cao <cmm@us.ibm.com>

e8546d06
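
A minimal user-space sketch of what a helper of the leX_add_cpu() kind does: convert the stored little-endian value to CPU order, add, and convert back, all in one call. To stay independent of the host byte order, the sketch stores the value as two explicit bytes; the `demo_le16` type and the `demo_*` functions are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical little-endian 16-bit value: b[0] is the low byte. */
typedef struct { uint8_t b[2]; } demo_le16;

static uint16_t demo_le16_to_cpu(demo_le16 v)
{
	return (uint16_t)(v.b[0] | (v.b[1] << 8));
}

static demo_le16 demo_cpu_to_le16(uint16_t x)
{
	demo_le16 v = { { (uint8_t)(x & 0xff), (uint8_t)(x >> 8) } };
	return v;
}

/* The long form would be:
 *   *var = demo_cpu_to_le16(demo_le16_to_cpu(*var) + val);
 * The helper wraps that round trip; the sum wraps modulo 2^16. */
static void demo_le16_add_cpu(demo_le16 *var, uint16_t val)
{
	assert(var != NULL);
	*var = demo_cpu_to_le16((uint16_t)(demo_le16_to_cpu(*var) + val));
}
```

Adding 1 to a stored 0x01ff leaves the bytes 0x00, 0x02 in memory, which reads back as 0x0200 in CPU order.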

29 Apr, 2008 1 commit

ext4: Enable extent format for symlinks. · e65187e6

Committed by Aneesh Kumar K.V on Apr 29, 2008

This patch enables extent-formatted normal symlinks.  Using extents
format allows a symlink to refer to a block number larger than 2^32
on large filesystems.  We still don't enable extent format for fast
symlinks, which are contained in the inode itself.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

e65187e6

26 Feb, 2008 1 commit

ext4: set EXT4_EXTENTS_FL only for directory and regular files · 42bf0383

Committed by Aneesh Kumar K.V on Feb 25, 2008

In addition, don't inherit EXT4_EXTENTS_FL from parent directory.
If we have a directory with extent flag set and later mount the file
system with -o noextents, the files created in that directory will also
have extent flag set but we would not have called ext4_ext_tree_init for
them.  This will cause an error later when we verify the extent header.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

42bf0383

08 Feb, 2008 1 commit

iget: stop EXT4 from using iget() and read_inode() · 1d1fe1ee

Committed by David Howells on Feb 07, 2008

Stop the EXT4 filesystem from using iget() and read_inode().  Replace
ext4_read_inode() with ext4_iget(), and call that instead of iget().
ext4_iget() then uses iget_locked() directly and returns a proper error code
instead of an inode in the event of an error.

ext4_fill_super() returns any error incurred when getting the root inode
instead of EINVAL.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

1d1fe1ee

29 Jan, 2008 5 commits

ext4: fix up EXT4FS_DEBUG builds · c549a95d

Committed by Eric Sandeen on Jan 28, 2008

Builds with EXT4FS_DEBUG defined (to enable ext4_debug()) fail
without these changes.  Clean up some format warnings too.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>

c549a95d

ext4: Rename i_dir_acl to i_size_high · a48380f7

Committed by Aneesh Kumar K.V on Jan 28, 2008

Rename ext4_inode.i_dir_acl to i_size_high.
Drop ext4_inode_info.i_dir_acl, as it is not used.
Rename ext4_inode.i_size to ext4_inode.i_size_lo.
Add a helper function for accessing the combined ext4_inode i_size.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

a48380f7

ext4: Introduce ext4_update_*_feature · 99e6f829

Committed by Aneesh Kumar K.V on Jan 28, 2008

Introduce ext4_update_*_feature and use them instead
of open-coding.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

99e6f829

ext4: fixes block group number being set to a negative value · 2aa9fc4c

Committed by Avantika Mathur on Jan 28, 2008

This patch fixes various places where the group number is set to a negative
value.
Signed-off-by: Avantika Mathur <mathur@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

2aa9fc4c

ext4: add ext4_group_t, and change all group variables to this type. · fd2d4291

Committed by Avantika Mathur on Jan 28, 2008

In many places variables for block group are of type int, which limits the
maximum number of block groups to 2^31. Each block group can have up to
2^15 blocks, with a 4K block size, and the max filesystem size is limited to
2^31 * (2^15 * 2^12) = 2^58 -- or 256 PB

This patch introduces a new type ext4_group_t, of type unsigned long, to
represent block group numbers in ext4.
All occurrences of block group variables are converted to type ext4_group_t.
Signed-off-by: Avantika Mathur <mathur@us.ibm.com>

fd2d4291
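
The size limit quoted in this commit is simple power-of-two arithmetic, which the following sketch reproduces. The function name and its three parameters are hypothetical; each parameter is the base-2 logarithm of one factor (number of groups, blocks per group, bytes per block).

```c
#include <assert.h>
#include <stdint.h>

/* Maximum filesystem size in bytes: groups * blocks/group * bytes/block,
 * with every factor given as a power of two. */
static uint64_t demo_max_fs_bytes(unsigned int group_bits,
				  unsigned int blocks_per_group_bits,
				  unsigned int block_size_bits)
{
	unsigned int total = group_bits + blocks_per_group_bits +
			     block_size_bits;

	assert(total < 64);	/* the result must fit in 64 bits */
	return UINT64_C(1) << total;
}
```

Calling demo_max_fs_bytes(31, 15, 12) yields 2^58 bytes, and shifting that right by 50 gives 256, matching the 256 PB figure above when a petabyte is counted as 2^50 bytes.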

18 Oct, 2007 2 commits

Ext4: Uninitialized Block Groups · 717d50e4

Committed by Andreas Dilger on Oct 16, 2007

In pass1 of e2fsck, every inode table in the filesystem is scanned and checked,
regardless of whether it is in use.  This is the most time consuming part
of the filesystem check.  The uninitialized block group feature can greatly
reduce e2fsck time by eliminating checking of uninitialized inodes.

With this feature, there is a high water mark of used inodes for each block
group.  Block and inode bitmaps can be uninitialized on disk via a flag in the
group descriptor to avoid reading or scanning them at e2fsck time.  A checksum
of each group descriptor is used to ensure that corruption in the group
descriptor's bit flags does not cause incorrect operation.

The feature is enabled through a mkfs option

	mke2fs /dev/ -O uninit_groups

A patch adding support for uninitialized block groups to e2fsprogs tools has
been posted to the linux-ext4 mailing list.

The patches have been stress tested with fsstress and fsx.  In performance
tests testing e2fsck time, we have seen that e2fsck time on ext3 grows
linearly with the total number of inodes in the filesystem.  In ext4 with the
uninitialized block groups feature, the e2fsck time is constant, based
solely on the number of used inodes rather than the total inode count.
Since typical ext4 filesystems only use 1-10% of their inodes, this feature can
greatly reduce e2fsck time for users, with a performance improvement of 2-20
times, depending on how full the filesystem is.

The attached graph shows the major improvements in e2fsck times in filesystems
with a large total inode count, but few inodes in use.

In each group descriptor if we have

EXT4_BG_INODE_UNINIT set in bg_flags:
        Inode table is not initialized/used in this group. So we can skip
        the consistency check during fsck.
EXT4_BG_BLOCK_UNINIT set in bg_flags:
        No block in the group is used. So we can skip the block bitmap
        verification for this group.

We also add two new fields to group descriptor as a part of
uninitialized group patch.

        __le16  bg_itable_unused;       /* Unused inodes count */
        __le16  bg_checksum;            /* crc16(sb_uuid+group+desc) */

bg_itable_unused:

If we have EXT4_BG_INODE_UNINIT not set in bg_flags
then bg_itable_unused will give the offset within
the inode table till the inodes are used. This can be
used by fsck to skip list of inodes that are marked unused.

bg_checksum:
Now that we depend on bg_flags and bg_itable_unused to determine
the block and inode usage, we need to make sure group descriptor
is not corrupt. We add checksum to group descriptor to
detect corruption. If the descriptor is found to be corrupt, we
mark all the blocks and inodes in the group used.
Signed-off-by: Avantika Mathur <mathur@us.ibm.com>
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

717d50e4
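
A minimal sketch of a descriptor checksum along the lines of the "crc16(sb_uuid+group+desc)" comment above. It assumes a bitwise CRC-16 with the reflected polynomial 0xA001, a seed of 0xFFFF, and the group number fed in as four little-endian bytes; those three choices, the `demo_*` names, and the flat byte-array view of the descriptor are assumptions made for illustration. A real implementation would also have to leave the checksum field itself out of the bytes it covers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16, reflected polynomial 0xA001, caller-supplied seed. */
static uint16_t demo_crc16(uint16_t crc, const uint8_t *buf, size_t len)
{
	while (len--) {
		crc = (uint16_t)(crc ^ *buf++);
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 1) ? (uint16_t)((crc >> 1) ^ 0xA001)
					: (uint16_t)(crc >> 1);
	}
	return crc;
}

/* Chain the CRC over the UUID, the group number, then the descriptor
 * bytes (assumed order and seed, see the note above the code). */
static uint16_t demo_group_desc_csum(const uint8_t uuid[16], uint32_t group,
				     const uint8_t *desc, size_t desc_len)
{
	uint8_t le_group[4] = {
		(uint8_t)(group & 0xff),
		(uint8_t)((group >> 8) & 0xff),
		(uint8_t)((group >> 16) & 0xff),
		(uint8_t)((group >> 24) & 0xff),
	};
	uint16_t crc;

	assert(uuid != NULL && desc != NULL);

	crc = demo_crc16(0xFFFF, uuid, 16);
	crc = demo_crc16(crc, le_group, sizeof(le_group));
	return demo_crc16(crc, desc, desc_len);
}
```

Mixing the filesystem UUID and the group number into the checksum means a descriptor copied from another group or another filesystem fails verification, and any single flipped bit in the covered bytes changes the result.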

ext4: Remove (partial, never completed) fragment support · f077d0d7

Committed by Coly Li on Oct 16, 2007

Fragment support in ext2/3/4 was never implemented, and it probably will
never be implemented.  So remove it from ext4.
Signed-off-by: Coly Li <coyli@suse.de>
Acked-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

f077d0d7