Commits · c7d206b3379f7d6462e778b74f475c470ee3dcaf · openeuler / Kernel

12 Jul, 2008 · 6 commits

ext4: Use page_mkwrite vma_operations to get mmap write notification. · 2e9ee850

Committed by Aneesh Kumar K.V on Jul 11, 2008

We would like to get notified when we are doing a write on an mmap section.
This is needed with respect to the preallocated area. We split the preallocated
area into an initialized extent and an uninitialized extent in the callback. This
lets us handle ENOSPC better. Otherwise we get ENOSPC in writepage and
that would result in data loss. The changes are also needed to handle ENOSPC
when writing to an mmap section of files with holes.
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: fix online resize with mballoc · 5f21b0e6

Committed by Frederic Bohe on Jul 11, 2008

Update group infos when updating a group's descriptor.
Add group infos when adding a group's descriptor.
Refresh cache pages used by mb_alloc when changes occur.
This will probably need modifications when META_BG resizing is allowed.
Signed-off-by: Frederic Bohe <frederic.bohe@bull.net>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>

ext4: mballoc avoid use root reserved blocks for non root allocation · 07031431

Committed by Mingming Cao on Jul 11, 2008

mballoc allocation missed the check for blocks reserved for root users. Add
an ext4_has_free_blocks() check before allocation. Also modify
ext4_has_free_blocks() to support multiple-block allocation requests.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: cleanup block allocator · 654b4908

Committed by Aneesh Kumar K.V on Jul 11, 2008

Move the code related to block allocation to a single function and add helper
functions to differentiate allocation for data and metadata blocks.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: Use inode preallocation with -o noextents · 7061eba7

Committed by Aneesh Kumar K.V on Jul 11, 2008

When mballoc is enabled, blocks for old block-based
files are allocated using the mballoc allocator instead of the old
block-based allocator. The old ext3 block reservation is turned
off when mballoc is turned on.

However, the in-core preallocation is not enabled for block-based/
non-extent based file block allocation. This results in a performance
regression, as we now have neither "reservation" nor in-core preallocation
to prevent interleaved fragmentation in multiple-writer workloads.

This patch fixes this by enabling per-inode in-core preallocation
for non-extent files when mballoc is used.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: New inode allocation for FLEX_BG meta-data groups. · 772cb7c8

Committed by Jose R. Santos on Jul 11, 2008

This patch mostly controls the way inodes are allocated in order to
make ialloc aware of flex_bg block group grouping.  It achieves this
by bypassing the Orlov allocator when block group meta-data are packed
together through mke2fs.  Since the impact on the block allocator is
minimal, this patch should have little or no effect on other block
allocation algorithms. By controlling the inode allocation, it can
basically control where the initial search for new blocks begins and
thus indirectly manipulate the block allocator.

This allocator favors data and meta-data locality so the disk will
gradually be filled from block group zero upward.  This helps improve
performance by reducing seek time.  Since the group of inode tables
within one flex_bg are treated as one giant inode table, uninitialized
block groups would not need to partially initialize as many inode
tables as with Orlov, which would help fsck time as the filesystem usage
goes up.
Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
Signed-off-by: Valerie Clement <valerie.clement@bull.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

14 Jul, 2008 · 1 commit

ext4: replace __FUNCTION__ occurrences · 4db9c54a

Committed by Stoyan Gaydarov on Jul 13, 2008

__FUNCTION__ is gcc-specific, use __func__ instead
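
As a small illustration of the replacement described here, the sketch below uses the standard C99 identifier `__func__` in place of the GCC spelling `__FUNCTION__`. The function names `current_function_name` and `log_entry` are hypothetical and are not taken from the ext4 sources.

```c
#include <stdio.h>

/* Hypothetical helper: returns the name of the enclosing function using the
 * standard C99 identifier __func__ rather than the GCC-specific __FUNCTION__. */
const char *current_function_name(void)
{
    return __func__;
}

/* Hypothetical logging helper in the style of a debug message that is
 * prefixed with the calling function's name. */
void log_entry(void)
{
    printf("%s: entered\n", __func__);
}
```

Because `__func__` behaves like a local `static const char[]`, the returned pointer remains valid after the function returns.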
Signed-off-by: Stoyan Gaydarov <stoyboyker@gmail.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

12 Jul, 2008 · 2 commits

ext4: fix comments to say "ext4" · 8a35694e

Committed by Shen Feng on Jul 11, 2008

Change second/third to fourth.
Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: handle corrupted orphan list at mount · 91ef4caf

Committed by Duane Griffin on Jul 11, 2008

If the orphan node list includes valid, untruncatable nodes with nlink > 0,
the ext4_orphan_cleanup loop which attempts to delete them will not do so,
causing it to loop forever. Fix by checking for such nodes in the
ext4_orphan_get function.

This patch fixes the second case (image hdb.20000009.softlockup.gz)
reported in http://bugzilla.kernel.org/show_bug.cgi?id=10882.
Signed-off-by: Duane Griffin <duaneg@dghda.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

30 Apr, 2008 · 1 commit

ext4: move headers out of include/linux · 3dcf5451

Committed by Christoph Hellwig on Apr 29, 2008

Move ext4 headers out of include/linux.  This is just the trivial move;
there are more things that could be done later.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

29 Apr, 2008 · 1 commit

ext4: remove duplicate include of ext4_fs_i.h header file · 418f6e9e

Committed by Joe Perches on Apr 29, 2008

include/linux/ext4_fs_i.h is included in include/linux/ext_fs.h twice
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

30 Apr, 2008 · 1 commit

Convert ext4 to use unlocked_ioctl · 5cdd7b2d

Committed by Andi Kleen on Apr 29, 2008

I checked ext4_ioctl and it looked largely safe to use
without the BKL.  So convert it over to unlocked_ioctl.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

29 Apr, 2008 · 1 commit

ext4: Fix race between migration and mmap write · 267e4db9

Committed by Aneesh Kumar K.V on Apr 29, 2008

Fail migrate if we allocated new blocks via mmap write.

If we write to holes in the file via mmap, we end up allocating
new blocks. This block allocation happens without taking inode->i_mutex.
Since migrate is protected by i_mutex and migrate expects that no
new blocks get allocated during migrate, fail migrate if new blocks
get allocated.

We can't take inode->i_mutex in the mmap write path because that
would result in a locking order violation between i_mutex and mmap_sem.
Also, adding a separate rw_semaphore for protection is really high overhead
for a rare operation such as migrate.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

10 Feb, 2008 · 1 commit

ext4: Add new "development flag" to the ext4 filesystem · 469108ff

Committed by Theodore Tso on Feb 10, 2008

This flag is simply a generic "this is a crash/burn test filesystem"
marker.  If it is set, then filesystem code which is "in development"
will be allowed to mount the filesystem.  Filesystem code which is not
considered ready for prime-time will check for this flag, and if it is
not set, it will refuse to touch the filesystem.

As we start rolling ext4 out to distros like Fedora, et al., this makes
it less likely that a user might accidentally start using ext4 on a
production filesystem; a bad thing, since that will essentially make it
unfsckable until e2fsprogs catches up.
Signed-off-by: Theodore Tso <tytso@MIT.EDU>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>

08 Feb, 2008 · 1 commit

iget: stop EXT4 from using iget() and read_inode() · 1d1fe1ee

Committed by David Howells on Feb 07, 2008

Stop the EXT4 filesystem from using iget() and read_inode().  Replace
ext4_read_inode() with ext4_iget(), and call that instead of iget().
ext4_iget() then uses iget_locked() directly and returns a proper error code
instead of an inode in the event of an error.

ext4_fill_super() returns any error incurred when getting the root inode
instead of EINVAL.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

29 Jan, 2008 · 18 commits

ext4: Add multi block allocator for ext4 · c9de560d

Committed by Alex Tomas on Jan 29, 2008

Signed-off-by: Alex Tomas <alex@clusterfs.com>
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: Add ext4_find_next_bit() · aa02ad67

Committed by Aneesh Kumar K.V on Jan 28, 2008

This function is used by the ext4 multi block allocator patches.

Also add generic_find_next_le_bit
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

ext4: Add EXT4_IOC_MIGRATE ioctl · c14c6fd5

Committed by Aneesh Kumar K.V on Jan 28, 2008

This patch adds an ioctl for migrating an ext3 indirect-block-mapped inode
to an ext4 extent-mapped inode.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

ext4: Add inode version support in ext4 · 25ec56b5

Committed by Jean Noel Cordenner on Jan 28, 2008

This patch adds 64-bit inode version support to ext4. The lower 32 bits
are stored in the osd1.linux1.l_i_version field while the high 32 bits
are stored in the i_version_hi field newly created in the ext4_inode.
This field is incremented in case the ext4_inode is large enough. An
i_version mount option has been added to enable the feature.
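
The lo/hi split described in this message can be sketched as follows. This is a minimal illustration, not the kernel code: `struct version_halves`, `split_version` and `join_version` are hypothetical names, the struct is not the real `ext4_inode` layout, and on-disk little-endian conversion is left out.

```c
#include <stdint.h>

/* Illustrative stand-in for the two 32-bit halves named in the message.
 * Assumption: this is not the real struct ext4_inode layout. */
struct version_halves {
    uint32_t lo; /* would live in osd1.linux1.l_i_version */
    uint32_t hi; /* would live in i_version_hi */
};

/* Split a 64-bit inode version into its low and high 32-bit halves. */
struct version_halves split_version(uint64_t version)
{
    struct version_halves h;
    h.lo = (uint32_t)(version & 0xffffffffu);
    h.hi = (uint32_t)(version >> 32);
    return h;
}

/* Recombine the two halves into the 64-bit version. */
uint64_t join_version(struct version_halves h)
{
    return ((uint64_t)h.hi << 32) | h.lo;
}
```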
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Kalpak Shah <kalpak@clusterfs.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Jean Noel Cordenner <jean-noel.cordenner@bull.net>

ext4: Add the journal checksum feature · 818d276c

Committed by Girish Shilamkar on Jan 28, 2008

The journal checksum feature adds two new flags, i.e.
JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT and JBD2_FEATURE_COMPAT_CHECKSUM.

The JBD2_FEATURE_CHECKSUM flag indicates that the commit block contains the
checksum for the blocks described by the descriptor blocks.
Due to checksums, writing of the commit record no longer needs to be
synchronous. Now the commit record can be sent to disk without waiting for
descriptor blocks to be written to disk. This behavior is controlled
using the JBD2_FEATURE_ASYNC_COMMIT flag. Older kernels/e2fsck should not be
able to recover the journal with _ASYNC_COMMIT, hence it is made
incompat.
The commit header has been extended to hold the checksum along with the
type of the checksum.

During the scan pass of recovery, checksums are verified to ensure the sanity
and completeness (in case of _ASYNC_COMMIT) of every transaction.
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Girish Shilamkar <girish@clusterfs.com>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>

ext4: Convert truncate_mutex to read write semaphore. · 0e855ac8

Committed by Aneesh Kumar K.V on Jan 28, 2008

We are currently taking the truncate_mutex for every read. This would have
a performance impact on configurations with many CPUs. Convert the lock to a
read-write semaphore and take the read lock when we are trying to read the file.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

ext4: Make ext4_get_blocks_wrap take the truncate_mutex early. · c278bfec

Committed by Aneesh Kumar K.V on Jan 28, 2008

When doing a migrate from an ext3 to an ext4 inode we need to make sure the test
for inode type and the walk of inode data happen inside the lock. To make this
happen, take truncate_mutex early, before checking the i_flags.

This actually should enable us to remove the verify_chain().
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

ext4: sync up block group descriptor with e2fsprogs. · 91b51a01

Committed by Coly Li on Jan 28, 2008

This patch extends bg_itable_unused of the ext4 group descriptor
from 16 bits to 32 bits. In order to add bg_itable_unused_hi to
struct ext4_group_desc, some extra fields which were already introduced in
e2fsprogs are also added for consistency.
Signed-off-by: Coly Li <coyli@suse.de>
Cc: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>

ext4: Support large files · 8180a562

Committed by Aneesh Kumar K.V on Jan 28, 2008

This patch converts ext4_inode i_blocks to represent total
blocks occupied by the inode in file system block size.
Earlier the variable represented this in 512-byte
block units. This actually limited the total size of the file.

The feature is enabled transparently when we write an inode
whose i_blocks cannot be represented as 512-byte units in a
48-bit variable.

inode flag  EXT4_HUGE_FILE_FL
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

ext4: Add support for 48 bit inode i_blocks. · 0fc1b451

Committed by Aneesh Kumar K.V on Jan 28, 2008

Use the __le16 l_i_reserved1 field of the linux2 struct of ext4_inode
to represent the higher 16 bits of i_blocks. With this change the max file
size becomes (2**48 - 1) * 512 bytes.

We add a RO_COMPAT feature to the super block to indicate that inodes
have i_blocks represented as a split 48 bits. A super block with this
feature set cannot be mounted read-write on a kernel with CONFIG_LSF
disabled.

Super block flag EXT4_FEATURE_RO_COMPAT_HUGE_FILE
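
The split representation and the size limit quoted in this message can be checked with a short sketch. The function names are hypothetical, and endianness conversion of the on-disk fields is omitted.

```c
#include <stdint.h>

/* Combine a 32-bit low part and a 16-bit high part into a 48-bit count of
 * 512-byte units. Hypothetical helper, not the kernel's accessor. */
uint64_t iblocks_combine(uint32_t lo, uint16_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Largest size in bytes expressible when i_blocks holds 48 bits of
 * 512-byte units: (2**48 - 1) * 512, which equals 2**57 - 512. */
uint64_t max_bytes_48bit_512(void)
{
    return ((UINT64_C(1) << 48) - 1) * 512;
}
```

The result fits comfortably in a 64-bit integer, which is why the sketch uses `uint64_t` throughout.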
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

ext4: Rename i_dir_acl to i_size_high · a48380f7

Committed by Aneesh Kumar K.V on Jan 28, 2008

Rename ext4_inode.i_dir_acl to i_size_high.
Drop ext4_inode_info.i_dir_acl as it is not used.
Rename ext4_inode.i_size to ext4_inode.i_size_lo.
Add a helper function for accessing the ext4_inode combined i_size.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

ext4: Rename i_file_acl to i_file_acl_lo · 7973c0c1

Committed by Aneesh Kumar K.V on Jan 28, 2008

Rename i_file_acl to i_file_acl_lo. This helps
in finding bugs where we use i_file_acl instead
of the combined i_file_acl_lo and i_file_acl_high
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

ext4: Fix sparse warnings. · 1d03ec98

Committed by Aneesh Kumar K.V on Jan 28, 2008

Fix sparse warnings related to static functions
and local variables.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

ext4: Introduce ext4_update_*_feature · 99e6f829

Committed by Aneesh Kumar K.V on Jan 28, 2008

Introduce ext4_update_*_feature and use them instead
of open-coding.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

ext4: add ext4_group_t, and change all group variables to this type. · fd2d4291

Committed by Avantika Mathur on Jan 28, 2008

In many places variables for block group are of type int, which limits the
maximum number of block groups to 2^31. Each block group can have up to
2^15 blocks, with a 4K block size, and the max filesystem size is limited to
2^31 * (2^15 * 2^12) = 2^58 -- or 256 PB

This patch introduces a new type ext4_group_t, of type unsigned long, to
represent block group numbers in ext4.
All occurrences of block group variables are converted to type ext4_group_t.
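
The size limit worked out in this message (2^31 groups of 2^15 blocks of 2^12 bytes) can be reproduced with a short calculation. The function name `max_fs_bytes` and its parameters are hypothetical and chosen only for this sketch.

```c
#include <stdint.h>

/* Maximum filesystem size in bytes, given the number of bits available for
 * the group number, the log2 of blocks per group, and the log2 of the block
 * size. Assumes the three exponents sum to less than 64. */
uint64_t max_fs_bytes(unsigned group_bits,
                      unsigned blocks_per_group_bits,
                      unsigned block_size_bits)
{
    return UINT64_C(1) << (group_bits + blocks_per_group_bits + block_size_bits);
}
```

With a signed 32-bit group number (31 usable bits), 2^15 blocks per group and 4K blocks, the exponents add up to 58, and shifting the result right by 50 gives 256, which matches the 256 PB figure in the message when PB is read as 2^50 bytes.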
Signed-off-by: Avantika Mathur <mathur@us.ibm.com>

ext4: Introduce ext4_lblk_t · 725d26d3

Committed by Aneesh Kumar K.V on Jan 28, 2008

This patch adds a new data type ext4_lblk_t to represent
the logical file blocks.

This is the preparatory patch to support large files in ext4.
The follow-up patch will convert the ext4_inode i_blocks to
represent the number of blocks in file system block size. This
change makes it possible to have a block number 2**32 - 1, which
will result in overflow if the block number is represented by a
signed long. This patch converts all the block numbers to type
ext4_lblk_t, which is a typedef of __u32.

Also remove dead code ext4_ext_walk_space
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>

ext4: Avoid rec_len overflow with 64KB block size · a72d7f83

Committed by Jan Kara on Jan 28, 2008

With 64KB blocksize, a directory entry can have size 64KB which does not fit
into 16 bits we have for entry length. So we store 0xffff instead and convert
value when read from / written to disk. The patch also converts some places
to use ext4_next_entry() when we are changing them anyway.
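
The conversion described in this message can be sketched as a pair of helpers. This is a minimal illustration under stated assumptions: the names `rec_len_to_disk` and `rec_len_from_disk` and the two macros are hypothetical, and little-endian conversion of the on-disk field is omitted.

```c
#include <stdint.h>

#define REC_LEN_64K_ON_DISK 0xffffu    /* sentinel value named in the message */
#define REC_LEN_64K         (1u << 16) /* a 64KB record length, needs 17 bits */

/* Encode an in-memory record length into the 16-bit on-disk field.
 * A 64KB length cannot be stored directly, so it maps to the sentinel. */
uint16_t rec_len_to_disk(uint32_t len)
{
    if (len == REC_LEN_64K)
        return REC_LEN_64K_ON_DISK;
    return (uint16_t)len;
}

/* Decode the 16-bit on-disk field back into the in-memory record length. */
uint32_t rec_len_from_disk(uint16_t dlen)
{
    if (dlen == REC_LEN_64K_ON_DISK)
        return REC_LEN_64K;
    return dlen;
}
```

The sentinel is usable because directory record lengths are multiples of 4, so 0xffff can never occur as a genuine length.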
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>

ext4: Support large blocksize up to PAGESIZE · afc7cbca

Committed by Takashi Sato on Jan 28, 2008

This patch set supports large block sizes (>4k, <=64k) in ext4,
just enlarging the block size limit. But it is NOT possible to have a 64kB
blocksize on ext4 without some changes to the directory handling
code.  The reason is that an empty 64kB directory block would have a
rec_len == (__u16)2^16 == 0, and this would cause an error to be hit in
the filesystem.  The proposed solution is to represent a 64k rec_len
with an impossible value like rec_len = 0xffff to handle this.

The patch set consists of the following 2 patches.
  [1/2]  ext4: enlarge blocksize
         - Allow blocksize up to pagesize

  [2/2]  ext4: fix rec_len overflow
         - prevent rec_len from overflow with 64KB blocksize

Now on a ppc64 box with 64k pages running this patch set, we can create a 64k
block size ext4dev filesystem and handle empty directory blocks.
Signed-off-by: Takashi Sato <sho@tnes.nec.co.jp>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>

18 Oct, 2007 · 7 commits

ext4: Convert s_r_blocks_count and s_free_blocks_count · 308ba3ec

Committed by Aneesh Kumar K.V on Oct 16, 2007

Convert s_r_blocks_count and s_free_blocks_count to
s_r_blocks_count_lo and s_free_blocks_count_lo

This helps in finding BUGs due to direct partial access of
these split 64 bit values

Also fix direct partial access in ext4 code
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: Convert s_blocks_count to s_blocks_count_lo · 6bc9feff

Committed by Aneesh Kumar K.V on Oct 16, 2007

Convert s_blocks_count to s_blocks_count_lo
This helps in finding BUGs due to direct partial access of
these split 64 bit values

Also fix direct partial access in ext4 code
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

ext4: Convert bg_inode_bitmap and bg_inode_table · 5272f837

Committed by Aneesh Kumar K.V on Oct 16, 2007

Convert bg_inode_bitmap and bg_inode_table to bg_inode_bitmap_lo
and bg_inode_table_lo.  This helps in finding BUGs due to
direct partial access of these split 64 bit values

Also fix one direct partial access
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

ext4: Convert bg_block_bitmap to bg_block_bitmap_lo · 3a14589c

Committed by Aneesh Kumar K.V on Oct 16, 2007

Convert bg_block_bitmap to bg_block_bitmap_lo
This helps in catching some BUGS due to direct
partial access of these split fields.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

ext4: FLEX_BG Kernel support v2. · ce421581

Committed by Jose R. Santos on Oct 16, 2007

This feature relaxes check restrictions on where each block group's meta
data is located within the storage media.  This allows for the allocation
of bitmaps or inode tables outside the block group boundaries in cases
where bad blocks force us to look for new blocks which the owning block
group cannot satisfy.  This will also allow for new meta-data allocation
schemes to improve performance and scalability.
Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

ext4: Fix sparse warnings · c1bddad9

Committed by Aneesh Kumar K.V on Oct 16, 2007

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Ext4: Uninitialized Block Groups · 717d50e4

Committed by Andreas Dilger on Oct 16, 2007

In pass1 of e2fsck, every inode table in the filesystem is scanned and checked,
regardless of whether it is in use.  This is the most time consuming part
of the filesystem check.  The uninitialized block group feature can greatly
reduce e2fsck time by eliminating checking of uninitialized inodes.

With this feature, there is a high water mark of used inodes for each block
group.  Block and inode bitmaps can be uninitialized on disk via a flag in the
group descriptor to avoid reading or scanning them at e2fsck time.  A checksum
of each group descriptor is used to ensure that corruption in the group
descriptor's bit flags does not cause incorrect operation.

The feature is enabled through a mkfs option

	mke2fs /dev/ -O uninit_groups

A patch adding support for uninitialized block groups to e2fsprogs tools has
been posted to the linux-ext4 mailing list.

The patches have been stress tested with fsstress and fsx.  In performance
tests testing e2fsck time, we have seen that e2fsck time on ext3 grows
linearly with the total number of inodes in the filesystem.  In ext4 with the
uninitialized block groups feature, the e2fsck time is constant, based
solely on the number of used inodes rather than the total inode count.
Since typical ext4 filesystems only use 1-10% of their inodes, this feature can
greatly reduce e2fsck time for users, with a performance improvement of 2-20
times, depending on how full the filesystem is.

The attached graph shows the major improvements in e2fsck times in filesystems
with a large total inode count, but few inodes in use.

In each group descriptor if we have

EXT4_BG_INODE_UNINIT set in bg_flags:
        Inode table is not initialized/used in this group. So we can skip
        the consistency check during fsck.
EXT4_BG_BLOCK_UNINIT set in bg_flags:
        No block in the group is used. So we can skip the block bitmap
        verification for this group.

We also add two new fields to group descriptor as a part of
uninitialized group patch.

        __le16  bg_itable_unused;       /* Unused inodes count */
        __le16  bg_checksum;            /* crc16(sb_uuid+group+desc) */

bg_itable_unused:

If we have EXT4_BG_INODE_UNINIT not set in bg_flags
then bg_itable_unused will give the offset within
the inode table till the inodes are used. This can be
used by fsck to skip list of inodes that are marked unused.

bg_checksum:
Now that we depend on bg_flags and bg_itable_unused to determine
the block and inode usage, we need to make sure group descriptor
is not corrupt. We add checksum to group descriptor to
detect corruption. If the descriptor is found to be corrupt, we
mark all the blocks and inodes in the group used.
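
How a checker might use these fields can be sketched as below. This is only an illustration under several assumptions: the struct and function names are hypothetical, the flag values are placeholders for this sketch, `bg_itable_unused` is read as a count of unused inodes at the end of the group's inode table (following the field comment above), and the checksum verification itself is represented by a caller-supplied boolean rather than implemented.

```c
#include <stdint.h>

/* Placeholder flag values for this sketch only. */
#define SKETCH_BG_INODE_UNINIT 0x0001u
#define SKETCH_BG_BLOCK_UNINIT 0x0002u

/* Illustrative subset of a group descriptor; not the real on-disk layout. */
struct group_desc_sketch {
    uint16_t bg_flags;
    uint16_t bg_itable_unused; /* unused inodes count */
};

/* Number of inodes a checker would still have to scan in one group. */
uint32_t inodes_to_scan(const struct group_desc_sketch *gd,
                        uint32_t inodes_per_group, int checksum_ok)
{
    /* A corrupt descriptor cannot be trusted: treat every inode as used. */
    if (!checksum_ok)
        return inodes_per_group;
    /* Inode table never initialized: nothing to scan. */
    if (gd->bg_flags & SKETCH_BG_INODE_UNINIT)
        return 0;
    /* Defensive: an implausible count falls back to a full scan. */
    if (gd->bg_itable_unused > inodes_per_group)
        return inodes_per_group;
    return inodes_per_group - gd->bg_itable_unused;
}

/* Whether the block bitmap of the group needs to be verified. */
int must_check_block_bitmap(const struct group_desc_sketch *gd, int checksum_ok)
{
    return !checksum_ok || !(gd->bg_flags & SKETCH_BG_BLOCK_UNINIT);
}
```

Falling back to a full scan whenever the checksum fails mirrors the rule in the message that a corrupt descriptor causes all blocks and inodes in the group to be treated as used.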
Signed-off-by: Avantika Mathur <mathur@us.ibm.com>
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
