提交 · 7be2baaa0322c59ba888aa5260a8c130666acd41 · openeuler / raspberrypi-kernel

10 2月, 2009 1 次提交

ext4: Fix to read empty directory blocks correctly in 64k · 7be2baaa

由 Wei Yongjun 提交于 2月 10, 2009

The rec_len field in the directory entry is 16 bits, so there was a
problem representing rec_len for filesystems with a 64k block size in
the case where the directory entry takes the entire 64k block.
Unfortunately, there were two schemes that were proposed; one where
all zeros meant 65536 and one where all ones (65535) meant 65536.
E2fsprogs used 0, whereas the kernel used 65535.  Oops.  Fortunately
this case happens extremely rarely, with the most common case being
the lost+found directory, created by mke2fs.

So we will be liberal in what we accept, and accept both encodings,
but we will continue to encode 65536 as 65535.  This will require a
change in e2fsprogs, but with fortunately ext4 filesystems normally
have the dir_index feature enabled, which precludes having a
completely empty directory block.
Signed-off-by: NWei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

7be2baaa

11 2月, 2009 1 次提交

jbd2: Avoid possible NULL dereference in jbd2_journal_begin_ordered_truncate() · 7f5aa215

由 Jan Kara 提交于 2月 10, 2009

If we race with commit code setting i_transaction to NULL, we could
possibly dereference it.  Proper locking requires the journal pointer
(to access journal->j_list_lock), which we don't have.  So we have to
change the prototype of the function so that filesystem passes us the
journal pointer.  Also add a more detailed comment about why the
function jbd2_journal_begin_ordered_truncate() does what it does and
how it should be used.

Thanks to Dan Carpenter <error27@gmail.com> for pointing to the
suspitious code.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Acked-by: NJoel Becker <joel.becker@oracle.com>
CC: linux-ext4@vger.kernel.org
CC: ocfs2-devel@oss.oracle.com
CC: mfasheh@suse.de
CC: Dan Carpenter <error27@gmail.com>

7f5aa215

10 2月, 2009 1 次提交

Revert "ext4: wait on all pending commits in ext4_sync_fs()" · 9eddacf9

由 Jan Kara 提交于 2月 10, 2009

This undoes commit 14ce0cb4.

Since jbd2_journal_start_commit() is now fixed to return 1 when we
started a transaction commit, there's some transaction waiting to be
committed or there's a transaction already committing, we don't
need to call ext4_force_commit() in ext4_sync_fs(). Furthermore
ext4_force_commit() can unnecessarily create sync transaction which is
expensive so it's worthwhile to remove it when we can.

http://bugzilla.kernel.org/show_bug.cgi?id=12224Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: Eric Sandeen <sandeen@redhat.com>
Cc: linux-ext4@vger.kernel.org

9eddacf9

30 1月, 2009 1 次提交

ext4: Remove bogus BUG() check in ext4_bmap() · b9ec63f7

由 Theodore Ts'o 提交于 1月 30, 2009

The code to support journal-less ext4 operation added a BUG to
ext4_bmap() which fired if there was no journal and the
EXT4_STATE_JDATA bit was set in the i_state field.  This caused
running the filefrag program (which uses the FIMBAP ioctl) to trigger
a BUG().

The EXT4_STATE_JDATA bit is only used for ext4_bmap(), and it's
harmless for the bit to be set.  We could add a check in
__ext4_journalled_writepage() and ext4_journalled_write_end() to only
set the EXT4_STATE_JDATA bit if the journal is present, but that adds
an extra test and jump instruction.  It's easier to simply remove the
BUG check.

http://bugzilla.kernel.org/show_bug.cgi?id=12568Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

b9ec63f7

27 1月, 2009 2 次提交

ext4: Fix building with EXT4FS_DEBUG · 9fd9784c

由 Thadeu Lima de Souza Cascardo 提交于 1月 26, 2009

When bg_free_blocks_count was renamed to bg_free_blocks_count_lo in
560671a0, its uses under EXT4FS_DEBUG were not changed to the helper
ext4_free_blks_count.

Another commit, 498e5f24, also did not change everything needed under
EXT4FS_DEBUG, thus making it spill some warnings related to printing
format.

This commit fixes both issues and makes ext4 build again when
EXT4FS_DEBUG is enabled.
Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

9fd9784c

ext4: Initialize the new group descriptor when resizing the filesystem · fdff73f0

由 Theodore Ts'o 提交于 1月 26, 2009

Make sure all of the fields of the group descriptor are properly
initialized.  Previously, we allowed bg_flags field to be contain
random garbage, which could trigger non-deterministic behavior,
including a kernel OOPS.

http://bugzilla.kernel.org/show_bug.cgi?id=12433Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

fdff73f0

20 1月, 2009 1 次提交

ext4: Fix ext4_free_blocks() w/o a journal when files have indirect blocks · e7f07968

由 Theodore Ts'o 提交于 1月 20, 2009

When trying to unlink a file with indirect blocks on a filesystem
without a journal, the "circular indirect block" sanity test was
getting falsely triggered.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

e7f07968

17 1月, 2009 1 次提交

ext4: Add sanity check to make_indexed_dir · e6b8bc09

由 Theodore Ts'o 提交于 1月 16, 2009

Make sure the rec_len field in the '..' entry is sane, lest we overrun
the directory block and cause a kernel oops on a purposefully
corrupted filesystem.

Thanks to Sami Liedes for reporting this bug.

http://bugzilla.kernel.org/show_bug.cgi?id=12430Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

e6b8bc09

18 1月, 2009 1 次提交

ext4: only use i_size_high for regular files · 06a279d6

由 Theodore Ts'o 提交于 1月 17, 2009

Directories are not allowed to be bigger than 2GB, so don't use
i_size_high for anything other than regular files.  E2fsck should
complain about these inodes, but the simplest thing to do for the
kernel is to only use i_size_high for regular files.

This prevents an intentially corrupted filesystem from causing the
kernel to burn a huge amount of CPU and issuing error messages such
as:

EXT4-fs warning (device loop0): ext4_block_to_path: block 135090028 > max

Thanks to David Maciejak from Fortinet's FortiGuard Global Security
Research Team for reporting this issue.

http://bugzilla.kernel.org/show_bug.cgi?id=12375Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

06a279d6

10 1月, 2009 1 次提交

filesystem freeze: add error handling of write_super_lockfs/unlockfs · c4be0c1d

由 Takashi Sato 提交于 1月 09, 2009

Currently, ext3 in mainline Linux doesn't have the freeze feature which
suspends write requests.  So, we cannot take a backup which keeps the
filesystem's consistency with the storage device's features (snapshot and
replication) while it is mounted.

In many case, a commercial filesystem (e.g.  VxFS) has the freeze feature
and it would be used to get the consistent backup.

If Linux's standard filesystem ext3 has the freeze feature, we can do it
without a commercial filesystem.

So I have implemented the ioctls of the freeze feature.
I think we can take the consistent backup with the following steps.
1. Freeze the filesystem with the freeze ioctl.
2. Separate the replication volume or create the snapshot
   with the storage device's feature.
3. Unfreeze the filesystem with the unfreeze ioctl.
4. Take the backup from the separated replication volume
   or the snapshot.

This patch:

VFS:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that they can return an error.
Rename write_super_lockfs and unlockfs of the super block operation
freeze_fs and unfreeze_fs to avoid a confusion.

ext3, ext4, xfs, gfs2, jfs:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that write_super_lockfs returns an error if needed,
and unlockfs always returns 0.

reiserfs:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that they always return 0 (success) to keep a current behavior.
Signed-off-by: NTakashi Sato <t-sato@yk.jp.nec.com>
Signed-off-by: NMasayuki Hamaguchi <m-hamaguchi@ys.jp.nec.com>
Cc: <xfs-masters@oss.sgi.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c4be0c1d

09 1月, 2009 2 次提交

fix similar typos to successfull · 73ac36ea

由 Coly Li 提交于 1月 07, 2009

When I review ocfs2 code, find there are 2 typos to "successfull".  After
doing grep "successfull " in kernel tree, 22 typos found totally -- great
minds always think alike :)

This patch fixes all the similar typos. Thanks for Randy's ack and comments.
Signed-off-by: NColy Li <coyli@suse.de>
Acked-by: NRandy Dunlap <randy.dunlap@oracle.com>
Acked-by: NRoland Dreier <rolandd@cisco.com>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

73ac36ea

generic swap(): ext4: remove local swap() macro · 97e133b4

由 Wu Fengguang 提交于 1月 07, 2009

Use the new generic implementation.
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

97e133b4

07 1月, 2009 2 次提交

percpu_counter: FBC_BATCH should be a variable · 179f7ebf

由 Eric Dumazet 提交于 1月 06, 2009

For NR_CPUS >= 16 values, FBC_BATCH is 2*NR_CPUS

Considering more and more distros are using high NR_CPUS values, it makes
sense to use a more sensible value for FBC_BATCH, and get rid of NR_CPUS.

A sensible value is 2*num_online_cpus(), with a minimum value of 32 (This
minimum value helps branch prediction in __percpu_counter_add())

We already have a hotcpu notifier, so we can adjust FBC_BATCH dynamically.

We rename FBC_BATCH to percpu_counter_batch since its not a constant
anymore.
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

179f7ebf

ext4: Remove "extents" mount option · 83982b6f

由 Theodore Ts'o 提交于 1月 06, 2009

This mount option is largely superfluous, and in fact the way it was
implemented was buggy; if a filesystem which did not have the extents
feature flag was mounted -o extents, the filesystem would attempt to
create and use extents-based file even though the extents feature flag
was not eabled. The simplest thing to do is to nuke the mount option
entirely. It's not all that useful to force the non-creation of new
extent-based files if the filesystem can support it.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

83982b6f

06 1月, 2009 1 次提交

ext4: Make printk's consistently prefixed with "EXT4-fs: " · abda1418

由 Theodore Ts'o 提交于 1月 06, 2009

Previously, some were "ext4: ", and some were "EXT4: "; change them to
be consistent with most ext4 printk's, which is to use "EXT4-fs: ".
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

abda1418

07 1月, 2009 1 次提交

ext4: Add sanity checks for the superblock before mounting the filesystem · 4ec11028

由 Theodore Ts'o 提交于 1月 06, 2009

This avoids insane superblock configurations that could lead to kernel
oops due to null pointer derefences.

http://bugzilla.kernel.org/show_bug.cgi?id=12371

Thanks to David Maciejak at Fortinet's FortiGuard Global Security
Research Team who discovered this bug independently (but at
approximately the same time) as Thiemo Nagel, who submitted the patch.
Signed-off-by: NThiemo Nagel <thiemo.nagel@ph.tum.de>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

4ec11028

06 1月, 2009 3 次提交
- T
  ext4: Add mount option to set kjournald's I/O priority · b3881f74
  由 Theodore Ts'o 提交于 1月 05, 2009
```
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: Jens Axboe <jens.axboe@oracle.com>
```
  b3881f74
- J
  ext4: Add default allocation routines for quota structures · a5b5ee32
  由 Jan Kara 提交于 11月 25, 2008
```
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>
```
  a5b5ee32
- J
  ext4: Use sb_any_quota_loaded() instead of sb_any_quota_enabled() · 17bd13b3
  由 Jan Kara 提交于 8月 20, 2008
```
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>
```
  17bd13b3
05 1月, 2009 2 次提交

fs: symlink write_begin allocation context fix · 54566b2c

由 Nick Piggin 提交于 1月 04, 2009

With the write_begin/write_end aops, page_symlink was broken because it
could no longer pass a GFP_NOFS type mask into the point where the
allocations happened.  They are done in write_begin, which would always
assume that the filesystem can be entered from reclaim.  This bug could
cause filesystem deadlocks.

The funny thing with having a gfp_t mask there is that it doesn't really
allow the caller to arbitrarily tinker with the context in which it can be
called.  It couldn't ever be GFP_ATOMIC, for example, because it needs to
take the page lock.  The only thing any callers care about is __GFP_FS
anyway, so turn that into a single flag.

Add a new flag for write_begin, AOP_FLAG_NOFS.  Filesystems can now act on
this flag in their write_begin function.  Change __grab_cache_page to
accept a nofs argument as well, to honour that flag (while we're there,
change the name to grab_cache_page_write_begin which is more instructive
and does away with random leading underscores).

This is really a more flexible way to go in the end anyway -- if a
filesystem happens to want any extra allocations aside from the pagecache
ones in ints write_begin function, it may now use GFP_KERNEL (rather than
GFP_NOFS) for common case allocations (eg.  ocfs2_alloc_write_ctxt, for a
random example).

[kosaki.motohiro@jp.fujitsu.com: fix ubifs]
[kosaki.motohiro@jp.fujitsu.com: fix fuse]
Signed-off-by: NNick Piggin <npiggin@suse.de>
Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: <stable@kernel.org>		[2.6.28.x]
Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
[ Cleaned up the calling convention: just pass in the AOP flags
  untouched to the grab_cache_page_write_begin() function.  That
  just simplifies everybody, and may even allow future expansion of the
  logic.   - Linus ]
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

54566b2c

fs: introduce bgl_lock_ptr() · c644f0e4

由 Pekka Enberg 提交于 1月 04, 2009

As suggested by Andreas Dilger, introduce a bgl_lock_ptr() helper in
<linux/blockgroup_lock.h> and add separate sb_bgl_lock() helpers to
filesystem specific header files to break the hidden dependency to
struct ext[234]_sb_info.

Also, while at it, convert the macros to static inlines to try make up
for all the times I broke Andrew Morton's tree.
Acked-by: NAndreas Dilger <adilger@sun.com>
Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c644f0e4

04 1月, 2009 1 次提交
- T
  ext4: Add markers for better debuggability · ba80b101
  由 Theodore Ts'o 提交于 1月 03, 2009
```
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
```
  ba80b101
07 1月, 2009 1 次提交

ext4: Remove code to create the journal inode · c3191067

由 Theodore Ts'o 提交于 1月 06, 2009

This code has been obsolete in quite some time, since the supported
method for adding a journal inode is to use tune2fs (or to creating
new filesystem with a journal via mke2fs or mkfs.ext4).
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

c3191067

06 1月, 2009 7 次提交

ext4: provide function to release metadata pages under memory pressure · c39a7f84

由 Toshiyuki Okajima 提交于 1月 05, 2009

Pages in the page cache belonging to ext4 data files are released via
the ext4_releasepage() function specified in the ext4 inode's
address_space_ops.  However, metadata blocks (such as indirect blocks,
directory blocks, etc) are managed via the block device
address_space_ops, and they can not be released by
try_to_free_buffers() if they have a journal head attached to them.

To address this, we supply a release_metadata function which calls
jbd2_journal_try_to_free_buffers() function to free the metadata, and
which is called by the block device's blkdev_releasepage() function.
Signed-off-by: NToshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: linux-fsdevel@vger.kernel.org

c39a7f84

ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc · 0087d9fb

由 Aneesh Kumar K.V 提交于 1月 05, 2009

With nodelalloc option we need to update the dirty block counter on
block allocation failure. This is needed because we increment the
dirty block counter early in the block allocation phase. Without
the patch s_dirty_blocks_counter goes wrong so that filesystem's
free blocks decreases incorrectly.
Tested-by: NAkira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

0087d9fb

ext4: Init the complete page while building buddy cache · 29eaf024

由 Aneesh Kumar K.V 提交于 1月 05, 2009

We need to init the complete page during buddy cache init
by setting the contents to '1'.  Otherwise we can see the
following errors after doing an online resize of the
filesystem:

EXT4-fs error (device sdb1): ext4_mb_mark_diskspace_used:
	Allocating block 1040385 in system zone of 127 group
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

29eaf024

ext4: Don't allow new groups to be added during block allocation · 8556e8f3

由 Aneesh Kumar K.V 提交于 1月 05, 2009

After we mark the blocks in the buddy cache as allocated,
we need to ensure that we don't reinit the buddy cache until
the block bitmap is updated.  This commit achieves this by holding
the group_info alloc_semaphore till ext4_mb_release_context
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

8556e8f3

ext4: mark the blocks/inode bitmap beyond end of group as used · 648f5879

由 Aneesh Kumar K.V 提交于 1月 05, 2009

We need to mark the block/inode bitmap beyond the end of the group
with '1'.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

648f5879

ext4: Use new buffer_head flag to check uninit group bitmaps initialization · 2ccb5fb9

由 Aneesh Kumar K.V 提交于 1月 05, 2009

For uninit block group, the on-disk bitmap is not initialized. That
implies we cannot depend on the uptodate flag on the bitmap
buffer_head to find bitmap validity.  Use a new buffer_head flag which
would be set after we properly initialize the bitmap.  This also
prevents (re-)initializing the uninit group bitmap every time we call 
ext4_read_block_bitmap().
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

2ccb5fb9

ext4: Fix the race between read_inode_bitmap() and ext4_new_inode() · 39341867

由 Aneesh Kumar K.V 提交于 1月 05, 2009

We need to make sure we update the inode bitmap and clear
EXT4_BG_INODE_UNINIT flag with sb_bgl_lock held, since
ext4_read_inode_bitmap() looks at EXT4_BG_INODE_UNINIT to decide
whether to initialize the inode bitmap each time it is called.
(introduced by commit c806e68f.)

ext4_read_inode_bitmap does:

spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
if (desc->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
	ext4_init_inode_bitmap(sb, bh, block_group, desc);

and ext4_new_inode does
if (!ext4_set_bit_atomic(sb_bgl_lock(sbi, group),
                   ino, inode_bitmap_bh->b_data))
		   ......
		   ...
spin_lock(sb_bgl_lock(sbi, group));

gdp->bg_flags &= cpu_to_le16(~EXT4_BG_INODE_UNINIT);
i.e., on allocation we update the bitmap then we take the sb_bgl_lock
and clear the EXT4_BG_INODE_UNINIT flag. What can happen is a
parallel ext4_read_inode_bitmap can zero out the bitmap in between
the above ext4_set_bit_atomic and spin_lock(sb_bg_lock..)

The race results in below user visible errors
EXT4-fs error (device sdb1): ext4_free_inode: bit already cleared for inode 168449
EXT4-fs warning (device sdb1): ext4_unlink: Deleting nonexistent file ...
EXT4-fs warning (device sdb1): ext4_rmdir: empty directory has too many links ...
# ls -al /mnt/tmp/f/p369/d3/d6/d39/db2/dee/d10f/d3f/l71
ls: /mnt/tmp/f/p369/d3/d6/d39/db2/dee/d10f/d3f/l71: Stale NFS file handle
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

39341867

04 1月, 2009 1 次提交

ext4: code cleanup · 3300beda

由 Aneesh Kumar K.V 提交于 1月 03, 2009

Rename some variables.  We also unlock locks in the reverse order we
acquired as a part of cleanup.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

3300beda

06 1月, 2009 3 次提交

ext4: Use high 16 bits of the block group descriptor's free counts fields · 560671a0

由 Aneesh Kumar K.V 提交于 1月 05, 2009

Rename the lower bits with suffix _lo and add helper
to access the values. Also rename bg_itable_unused_hi
to bg_pad as in e2fsprogs.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

560671a0

ext4: Fix race between read_block_bitmap() and mark_diskspace_used() · e8134b27

由 Aneesh Kumar K.V 提交于 1月 05, 2009

We need to make sure we update the block bitmap and clear
EXT4_BG_BLOCK_UNINIT flag with sb_bgl_lock held, since
ext4_read_block_bitmap() looks at EXT4_BG_BLOCK_UNINIT to decide
whether to initialize the block bitmap each time it is called
(introduced by commit c806e68f), and this can race with block
allocations in ext4_mb_mark_diskspace_used().

ext4_read_block_bitmap does:

spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
	ext4_init_block_bitmap(sb, bh, block_group, desc);

Now on the block allocation side we do

mb_set_bits(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group), bitmap_bh->b_data,
			ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len);
....
spin_lock(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group));
if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
	gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);

ie on allocation we update the bitmap then we take the sb_bgl_lock
and clear the EXT4_BG_BLOCK_UNINIT flag. What can happen is a
parallel ext4_read_block_bitmap can zero out the bitmap in between
the above mb_set_bits and spin_lock(sb_bg_lock..)

The race results in below user visible errors
EXT4-fs error (device sdb1): ext4_mb_release_inode_pa: free 100, pa_free 105
EXT4-fs error (device sdb1): mb_free_blocks: double-free of inode 0's block ..
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

e8134b27

ext4: fix BUG when calling ext4_error with locked block group · 5d1b1b3f

由 Aneesh Kumar K.V 提交于 1月 05, 2009

The mballoc code likes to call ext4_error while it is holding locked
block groups. This can causes a scheduling in atomic context BUG. We
can't just unlock the block group and relock it after/if ext4_error
returns since that might result in race conditions in the case where
the filesystem is set to continue after finding errors.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

5d1b1b3f

01 1月, 2009 2 次提交

A
nfsd race fixes: ext4 · 6b38e842
由 Al Viro 提交于 12月 30, 2008
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
6b38e842

ext4: ensure fast symlinks are NUL-terminated · e83c1397

由 Duane Griffin 提交于 12月 19, 2008

Ensure fast symlink targets are NUL-terminated, even if corrupted
on-disk.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: adilger@sun.com
Cc: linux-ext4@vger.kernel.org
Signed-off-by: NDuane Griffin <duaneg@dghda.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

e83c1397

29 12月, 2008 1 次提交

Get rid of CONFIG_LSF · b3a6ffe1

由 Jens Axboe 提交于 12月 12, 2008

We have two seperate config entries for large devices/files. One
is CONFIG_LBD that guards just the devices, the other is CONFIG_LSF
that handles large files. This doesn't make a lot of sense, you typically
want both or none. So get rid of CONFIG_LSF and change CONFIG_LBD wording
to indicate that it covers both.
Acked-by: NJean Delvare <khali@linux-fr.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

b3a6ffe1

11 12月, 2008 2 次提交

revert "percpu_counter: new function percpu_counter_sum_and_set" · 02d21168

由 Andrew Morton 提交于 12月 09, 2008

Revert

    commit e8ced39d
    Author: Mingming Cao <cmm@us.ibm.com>
    Date:   Fri Jul 11 19:27:31 2008 -0400

        percpu_counter: new function percpu_counter_sum_and_set

As described in

	revert "percpu counter: clean up percpu_counter_sum_and_set()"

the new percpu_counter_sum_and_set() is racy against updates to the
cpu-local accumulators on other CPUs.  Revert that change.

This means that ext4 will be slow again.  But correct.
Reported-by: NEric Dumazet <dada1@cosmosbay.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mingming Cao <cmm@us.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: <stable@kernel.org>		[2.6.27.x]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

02d21168

revert "percpu counter: clean up percpu_counter_sum_and_set()" · 71c5576f

由 Andrew Morton 提交于 12月 09, 2008

Revert

    commit 1f7c14c6
    Author: Mingming Cao <cmm@us.ibm.com>
    Date:   Thu Oct 9 12:50:59 2008 -0400

        percpu counter: clean up percpu_counter_sum_and_set()

Before this patch we had the following:

percpu_counter_sum(): return the percpu_counter's value

percpu_counter_sum_and_set(): return the percpu_counter's value, copying
that value into the central value and zeroing the per-cpu counters before
returning.

After this patch, percpu_counter_sum_and_set() has gone, and
percpu_counter_sum() gets the old percpu_counter_sum_and_set()
functionality.

Problem is, as Eric points out, the old percpu_counter_sum_and_set()
functionality was racy and wrong.  It zeroes out counters on "other" cpus,
without holding any locks which will prevent races agaist updates from
those other CPUS.

This patch reverts 1f7c14c6.  This means
that percpu_counter_sum_and_set() still has the race, but
percpu_counter_sum() does not.

Note that this is not a simple revert - ext4 has since started using
percpu_counter_sum() for its dirty_blocks counter as well.

Note that this revert patch changes percpu_counter_sum() semantics.

Before the patch, a call to percpu_counter_sum() will bring the counter's
central counter mostly up-to-date, so a following percpu_counter_read()
will return a close value.

After this patch, a call to percpu_counter_sum() will leave the counter's
central accumulator unaltered, so a subsequent call to
percpu_counter_read() can now return a significantly inaccurate result.

If there is any code in the tree which was introduced after
e8ced39d was merged, and which depends
upon the new percpu_counter_sum() semantics, that code will break.
Reported-by: NEric Dumazet <dada1@cosmosbay.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mingming Cao <cmm@us.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

71c5576f

24 11月, 2008 1 次提交

ext4: Fix lockdep recursive locking warning · b7be019e

由 Aneesh Kumar K.V 提交于 11月 23, 2008

In ext4_mb_init_group(), if the filesystem block size is less than
PAGE_SIZE/2, the code tries to grab alloc_sem for multiple block
groups in a loop.  We need to allow for this by using
down_write_nested() and passing in the loop index as a lock subclass
number.  This works because no other code path needs to take multiple
alloc_sem's.  Note that lockdep will fail for filesystem blocksize
smaller than to PAGE_SIZE/16k.  (e.g., a 1k filesystem blocksize with
a 32k page size, or a 2k filesystem blocksize with a 64k blocksize,
etc.)
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

b7be019e