Commits · bb3d132a24cd8bf5e7773b2d9f9baa58b07a7dae · openeuler / raspberrypi-kernel

29 May, 2012 1 commit

ext4: fix potential NULL dereference in ext4_free_inodes_counts() · bb3d132a

Authored by Dan Carpenter on May 28, 2012

The ext4_get_group_desc() function returns NULL on error, and the
ext4_free_inodes_count() function dereferences it without checking.
There is a check on the next line, but it's too late.
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
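
The ordering problem described in this message (dereference first, NULL check afterwards) can be illustrated with a minimal user-space sketch. It is not the ext4 code: `struct group_desc`, `get_group_desc()` and `count_free_inodes()` are hypothetical stand-ins, and the only point is that the pointer is tested before its first use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a block group descriptor. */
struct group_desc {
    unsigned int free_inodes;
};

/* Hypothetical lookup that returns NULL for an out-of-range group. */
static struct group_desc *get_group_desc(struct group_desc *table,
                                         size_t ngroups, size_t group)
{
    if (group >= ngroups)
        return NULL;
    return &table[group];
}

/* Sum the free inodes of groups [0, scan), skipping failed lookups. */
static unsigned long count_free_inodes(struct group_desc *table,
                                       size_t ngroups, size_t scan)
{
    unsigned long total = 0;

    assert(table != NULL);
    for (size_t g = 0; g < scan; g++) {
        struct group_desc *gdp = get_group_desc(table, ngroups, g);

        if (!gdp)                   /* check first ... */
            continue;
        total += gdp->free_inodes;  /* ... dereference second */
    }
    return total;
}
```

Scanning more groups than exist simply skips the failed lookups instead of dereferencing a NULL pointer.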


27 May, 2012 8 commits

ext4/jbd2: add metadata checksumming to the list of supported features · e93376c2

Authored by Darrick J. Wong on May 27, 2012

Activate the metadata checksumming feature by adding it to ext4 and
jbd2's lists of supported features.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

jbd2: checksum data blocks that are stored in the journal · c3900875

Authored by Darrick J. Wong on May 27, 2012

Calculate and verify checksums of each data block being stored in the journal.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

jbd2: checksum commit blocks · 1f56c589

Authored by Darrick J. Wong on May 27, 2012

Calculate and verify the checksum of commit blocks.  In checksum v2,
deprecate most of the checksum v1 commit block checksum fields, since
each block has its own checksum.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

jbd2: checksum descriptor blocks · 3caa487f

Authored by Darrick J. Wong on May 27, 2012

Calculate and verify a checksum of each descriptor block.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

jbd2: checksum revocation blocks · 42a7106d

Authored by Darrick J. Wong on May 27, 2012

Compute and verify checksums of revoke blocks inside the journal.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

jbd2: checksum journal superblock · 4fd5ea43

Authored by Darrick J. Wong on May 27, 2012

Calculate and verify a checksum covering the journal superblock.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

jbd2: Grab a reference to the crc32c driver if necessary · 01b5adce

Authored by Darrick J. Wong on May 27, 2012

Obtain a reference to the crc32c driver if needed for the v2 checksum.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

jbd2: enable journal clients to enable v2 checksumming · 25ed6e8a

Authored by Darrick J. Wong on May 27, 2012

Add in the necessary code so that journal clients can enable the new
journal checksumming features.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

23 May, 2012 1 commit

jbd2: change disk layout for metadata checksumming · 8f888ef8

Authored by Darrick J. Wong on May 22, 2012

Define flags and allocate space in on-disk journal structures to support
checksumming of journal metadata.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

21 May, 2012 1 commit

ext4: enable the 64-bit jbd2 feature based on the 64-bit ext4 feature · f32aaf2d

Authored by Theodore Ts'o on May 21, 2012

Previously we were only enabling the 64-bit jbd2 feature if the number
of blocks in the file system was greater than 2**32-1. The problem with
this is that it makes it harder to test the 64-bit journal code paths
with small file systems, since a small test file system with the
64-bit ext4 feature enabled would use 64-bit file system on-disk data
structures, but a 32-bit journal.

This would also cause problems when trying to do an online resize to
grow the filesystem above the 2**32-1 boundary. Fortunately the patch
to support online resize for 64-bit file systems hasn't been merged
yet, so this problem hasn't arisen in practice.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

30 Apr, 2012 17 commits

ext4: remove unnecessary check in add_dirent_to_buf() · b09de7fa

Authored by Theodore Ts'o on Apr 30, 2012

None of this function's callers ever pass in a NULL inode pointer, so
this check is unnecessary, and the else clause is dead code. (This
change should make the code coverage people a little happier. :-)
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: add checksums to the MMP block · 5c359a47

Authored by Darrick J. Wong on Apr 29, 2012

Compute and verify a checksum for the MMP block.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: make block group checksums use metadata_csum algorithm · feb0ab32

Authored by Darrick J. Wong on Apr 29, 2012

metadata_csum supersedes uninit_bg.  Convert the ROCOMPAT uninit_bg
flag check to a helper function that covers both, and make the
checksum calculation algorithm use either crc16 or the metadata_csum
chosen algorithm depending on which flag is set.  Print a warning if
we try to mount a filesystem with both feature flags set.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
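
The helper-plus-dispatch idea in this message might look roughly like the sketch below. The flag names and bit values are hypothetical and are not the real ext4 on-disk constants. Preferring metadata_csum when both flags are set is an assumption made here for illustration: the message only says that a warning is printed in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits, NOT the real ext4 on-disk values. */
#define FEAT_UNINIT_BG     (1u << 0)
#define FEAT_METADATA_CSUM (1u << 1)

enum csum_algo { CSUM_NONE, CSUM_CRC16, CSUM_METADATA };

/* One helper covering both flags, replacing a bare uninit_bg test. */
static bool has_group_desc_csum(uint32_t ro_compat)
{
    return (ro_compat & (FEAT_UNINIT_BG | FEAT_METADATA_CSUM)) != 0;
}

/* The case the message says deserves a warning at mount time. */
static bool both_csum_flags_set(uint32_t ro_compat)
{
    uint32_t both = FEAT_UNINIT_BG | FEAT_METADATA_CSUM;

    return (ro_compat & both) == both;
}

/* Pick the algorithm from the feature flags. */
static enum csum_algo group_desc_csum_algo(uint32_t ro_compat)
{
    if (ro_compat & FEAT_METADATA_CSUM)
        return CSUM_METADATA;   /* assumed to win: it supersedes uninit_bg */
    if (ro_compat & FEAT_UNINIT_BG)
        return CSUM_CRC16;
    return CSUM_NONE;
}
```

Callers then ask `has_group_desc_csum()` whether any group descriptor checksum applies, without caring which feature enabled it.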


ext4: Calculate and verify checksums of extended attribute blocks · cc8e94fd

Authored by Darrick J. Wong on Apr 29, 2012

Calculate and verify the checksums of extended attribute blocks.  This
only applies to separate EA blocks that are pointed to by
inode->i_file_acl (i.e. external EA blocks); the checksum lives in
the EA header.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: calculate and verify checksums of directory leaf blocks · b0336e8d

Authored by Darrick J. Wong on Apr 29, 2012

Calculate and verify the checksums for directory leaf blocks
(i.e. blocks that only contain actual directory entries).  The
checksum lives in what looks to be an unused directory entry with a 0
name_len at the end of the block.  This scheme is not used for
internal htree nodes because the mechanism in place there only costs
one dx_entry, whereas the "empty" directory entry would cost two
dx_entries.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: Calculate and verify checksums for htree nodes · dbe89444

Authored by Darrick J. Wong on Apr 29, 2012

Calculate and verify the checksum for directory index tree (htree)
node blocks.  The checksum is stored in the last 4 bytes of the htree
block and requires the dx_entry array to stop 1 dx_entry short of the
end of the block.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: verify and calculate checksums for extent tree blocks · 7ac5990d

Authored by Darrick J. Wong on Apr 29, 2012

Calculate and verify the checksum for each extent tree block.  The
checksum is located in the space immediately after the last possible
ext4_extent in the block.  The space is typically the last 4-8
bytes in the block.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: calculate and verify block bitmap checksum · fa77dcfa

Authored by Darrick J. Wong on Apr 29, 2012

Compute and verify the checksum of the block bitmap; this checksum is
stored in the block group descriptor.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: calculate and verify checksums for inode bitmaps · 41a246d1

Authored by Darrick J. Wong on Apr 29, 2012

Compute and verify the checksum of the inode bitmap; the checksum is
stored in the block group descriptor.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: calculate and verify inode checksums · 814525f4

Authored by Darrick J. Wong on Apr 29, 2012

This patch introduces to ext4 the ability to calculate and verify
inode checksums.  This requires the use of a new ro compatibility flag
and some accompanying e2fsprogs patches to provide the relevant
features in tune2fs and e2fsck.  The inode generation changes have
been integrated into this patch.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: calculate and verify superblock checksum · a9c47317

Authored by Darrick J. Wong on Apr 29, 2012

Calculate and verify the superblock checksum.  Since the UUID and
block group number are embedded in each copy of the superblock, we
need only checksum the entire block.  Refactor some of the code to
eliminate open-coding of the checksum update call.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: load the crc32c driver if necessary · 0441984a

Authored by Darrick J. Wong on Apr 29, 2012

Obtain a reference to the cryptoapi and crc32c if we mount a
filesystem with metadata checksumming enabled.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: record the checksum algorithm in use in the superblock · d25425f8

Authored by Darrick J. Wong on Apr 29, 2012

Record the type of checksum algorithm we're using for metadata in the
superblock, in case we ever want/need to change the algorithm.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: change on-disk layout to support extended metadata checksumming · e6153918

Authored by Darrick J. Wong on Apr 29, 2012

Define flags and change structure definitions to allow checksumming of
ext4 metadata.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: create a new BH_Verified flag to avoid unnecessary metadata validation · f8489128

Authored by Darrick J. Wong on Apr 29, 2012

Create a new BH_Verified flag to indicate that we've verified all the
data in a buffer_head for correctness.  This allows us to bypass
expensive verification steps when they are not necessary without
missing them when they are.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
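
The "verify once, remember the result" pattern behind such a flag can be sketched in user space as follows. This is not the buffer_head code: `struct buf`, `BUF_VERIFIED`, the toy additive checksum, and clearing the flag on modification are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BUF_VERIFIED (1u << 0)      /* hypothetical flag bit */

struct buf {
    unsigned int   flags;
    const uint8_t *data;
    size_t         len;
    uint8_t        expected_sum;    /* toy checksum, for illustration */
};

static unsigned int verify_calls;   /* counts the expensive passes */

/* Stand-in for an expensive validation step. */
static bool expensive_verify(const struct buf *b)
{
    uint8_t sum = 0;

    verify_calls++;
    for (size_t i = 0; i < b->len; i++)
        sum = (uint8_t)(sum + b->data[i]);
    return sum == b->expected_sum;
}

/* Verify only if the buffer has not already been verified. */
static bool buf_check(struct buf *b)
{
    if (b->flags & BUF_VERIFIED)
        return true;                /* skip the expensive path */
    if (!expensive_verify(b))
        return false;
    b->flags |= BUF_VERIFIED;
    return true;
}

/* Assumed rule: any change to the contents invalidates the flag. */
static void buf_modified(struct buf *b)
{
    b->flags &= ~BUF_VERIFIED;
}
```

A second `buf_check()` on an unchanged buffer returns immediately, while a call after `buf_modified()` runs the full verification again.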


autofs: make the autofsv5 packet file descriptor use a packetized pipe · 64f371bc

Authored by Linus Torvalds on Apr 29, 2012

The autofs packet size has had a very unfortunate size problem on x86:
because the alignment of 'u64' differs in 32-bit and 64-bit modes, and
because the packet data was not 8-byte aligned, the size of the autofsv5
packet structure differed between 32-bit and 64-bit modes despite
looking otherwise identical (300 vs 304 bytes respectively).

We first fixed that up by making the 64-bit compat mode know about this
problem in commit a32744d4 ("autofs: work around unhappy compat
problem on x86-64"), and that made a 32-bit 'systemd' work happily on a
64-bit kernel because everything then worked the same way as on a 32-bit
kernel.

But it turned out that 'automount' had actually known and worked around
this problem in user space, so fixing the kernel to do the proper 32-bit
compatibility handling actually *broke* 32-bit automount on a 64-bit
kernel, because it knew that the packet sizes were wrong and expected
those incorrect sizes.

As a result, we ended up reverting that compatibility mode fix, and
thus breaking systemd again, in commit fcbf94b9.

With both automount and systemd doing a single read() system call, and
verifying that they get *exactly* the size they expect but using
different sizes, it seemed that fixing one of them would inevitably
break the other.  At one point, a patch I seriously considered applying
from Michael Tokarev did a "strcmp()" to see if it was automount that
was doing the operation.  Ugly, ugly.

However, a prettier solution exists now thanks to the packetized pipe
mode.  By marking the communication pipe as being packetized (by simply
setting the O_DIRECT flag), we can always just write the bigger packet
size, and if user-space does a smaller read, it will just get that
partial end result and the extra alignment padding will simply be thrown
away.

This makes both automount and systemd happy, since they now get the size
they asked for, and the kernel side of autofs simply no longer needs to
care - it could pad out the packet arbitrarily.

Of course, if there is some *other* user of autofs (please, please,
please tell me it ain't so - and we haven't heard of any) that tries to
read the packets with multiple reads, that other user will now be
broken - the whole point of the packetized mode is that one system call
gets exactly one packet, and you cannot read a packet in pieces.
Tested-by: Michael Tokarev <mjt@tls.msk.ru>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: David Miller <davem@davemloft.net>
Cc: Ian Kent <raven@themaw.net>
Cc: Thomas Meyer <thomas@m3y3r.de>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
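
The 300 versus 304 byte difference comes purely from the alignment of the 64-bit member. The sketch below reproduces the effect with a hypothetical structure whose field names are illustrative and whose layout is chosen so that the sizes match the figures quoted in this message. It relies on the GCC/Clang `aligned` attribute to model a 'u64' with 4-byte alignment, and the 304-byte result assumes a host ABI that aligns `uint64_t` to 8 bytes, such as x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit integer with 4-byte alignment, as on 32-bit x86
 * (GCC/Clang extension). */
typedef uint64_t u64_align4 __attribute__((aligned(4)));

/* Hypothetical packet as seen with the host's natural u64 alignment. */
struct packet_native {
    uint32_t hdr[2];
    uint32_t token;
    uint32_t dev;
    uint64_t ino;
    uint32_t ids[4];
    uint32_t len;
    char     name[256];
};

/* Same fields, but the 64-bit member only needs 4-byte alignment. */
struct packet_compat {
    uint32_t   hdr[2];
    uint32_t   token;
    uint32_t   dev;
    u64_align4 ino;
    uint32_t   ids[4];
    uint32_t   len;
    char       name[256];
};

/* 44 bytes of fixed fields plus a 256-byte name: no tail padding. */
_Static_assert(sizeof(struct packet_compat) == 300,
               "4-byte aligned layout has no tail padding");

/* Bytes of tail padding added by the stricter alignment (4 on x86-64). */
static size_t tail_padding(void)
{
    return sizeof(struct packet_native) - sizeof(struct packet_compat);
}
```

Every field sits at the same offset in both layouts. Only the padding after `name` differs, which is why a reader that insists on an exact read() size breaks while the payload itself is identical.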


pipes: add a "packetized pipe" mode for writing · 9883035a

Authored by Linus Torvalds on Apr 29, 2012

The actual internal pipe implementation is already really about
individual packets (called "pipe buffers"), and this simply exposes that
as a special packetized mode.

When we are in the packetized mode (marked by O_DIRECT as suggested by
Alan Cox), a write() on a pipe will not merge the new data with previous
writes, so each write will get a pipe buffer of its own.  The pipe
buffer is then marked with the PIPE_BUF_FLAG_PACKET flag, which in turn
will tell the reader side to break the read at that boundary (and throw
away any partial packet contents that do not fit in the read buffer).

End result: as long as you do writes less than PIPE_BUF in size (so that
the pipe doesn't have to split them up), you can now treat the pipe as a
packet interface, where each read() system call will read one packet at
a time.  You can just use a sufficiently big read buffer (PIPE_BUF is
sufficient, since bigger than that doesn't guarantee atomicity anyway),
and the return value of the read() will naturally give you the size of
the packet.

NOTE! We do not support zero-sized packets, and zero-sized reads and
writes to a pipe continue to be no-ops.  Also note that big packets will
currently be split at write time, but that the size at which that
happens is not really specified (except that it's bigger than PIPE_BUF).
Currently that limit is the system page size, but we might want to
explicitly support bigger packets some day.

The main user for this is going to be the autofs packet interface,
allowing us to stop having to care so deeply about exact packet sizes
(which have had bugs with 32/64-bit compatibility modes).  But user
space can create packetized pipes with "pipe2(fd, O_DIRECT)", which will
fail with an EINVAL on kernels that do not support this interface.
Tested-by: Michael Tokarev <mjt@tls.msk.ru>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: David Miller <davem@davemloft.net>
Cc: Ian Kent <raven@themaw.net>
Cc: Thomas Meyer <thomas@m3y3r.de>
Cc: stable@kernel.org  # needed for systemd/autofs interaction fix
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
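
From user space, the behaviour described in this message can be tried with a short Linux-only sketch. The helper name `packet_pipe_demo()` and the payloads are made up for illustration. On a kernel with packetized pipes, the first read should return 4 bytes and discard the rest of the 6-byte packet, and the second read should return the 2-byte packet on its own.

```c
#define _GNU_SOURCE             /* for pipe2() and O_DIRECT */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Write two packets, then read each with a 4-byte buffer.
 * Returns 0 and stores both read() results, or -1 on failure.
 */
static int packet_pipe_demo(ssize_t *first, ssize_t *second)
{
    int fd[2];
    char buf[4];                /* deliberately smaller than packet one */
    int ok;

    assert(first && second);
    if (pipe2(fd, O_DIRECT) < 0)
        return -1;              /* e.g. EINVAL: no packet mode here */

    ok = write(fd[1], "hello!", 6) == 6 && write(fd[1], "xy", 2) == 2;
    if (ok) {
        *first = read(fd[0], buf, sizeof(buf));   /* truncated packet */
        *second = read(fd[0], buf, sizeof(buf));  /* next packet only */
    }
    close(fd[0]);
    close(fd[1]);
    return ok ? 0 : -1;
}
```

With an ordinary pipe the two writes would be merged into one byte stream, so the second read would instead return the leftover "o!" followed by "xy".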


28 Apr, 2012 8 commits

Revert "autofs: work around unhappy compat problem on x86-64" · fcbf94b9

Authored by Linus Torvalds on Apr 28, 2012

This reverts commit a32744d4.

While that commit was technically the right thing to do, and made the
x86-64 compat mode work identically to native 32-bit mode (and thus
fixing the problem with a 32-bit systemd install on a 64-bit kernel), it
turns out that the automount binaries had workarounds for this compat
problem.

Now, the workarounds are disgusting: doing a "uname()" to find out the
architecture of the kernel, and then comparing it for the 64-bit cases
and fixing up the size of the read() in automount for those.  And they
were confused: it's not actually a generic 64-bit issue at all, it's
very much tied to just x86-64, which has different alignment for a
'u64' in 64-bit mode than in 32-bit mode.

But the end result is that fixing the compat layer actually breaks the
case of a 32-bit automount on an x86-64 kernel.

There are various approaches to fix this (including just doing a
"strcmp()" on current->comm and comparing it to "automount"), but I
think that I will do the one that teaches pipes about a special "packet
mode", which will allow user space to not have to care too deeply about
the padding at the end of the autofs packet.

That change will make the compat workaround unnecessary, so let's revert
it first, and get automount working again in compat mode.  The
packetized pipes will then fix autofs for systemd.
Reported-and-requested-by: Michael Tokarev <mjt@tls.msk.ru>
Cc: Ian Kent <raven@themaw.net>
Cc: stable@kernel.org # for 3.3
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Btrfs: reduce lock contention during extent insertion · dc7fdde3

Authored by Chris Mason on Apr 27, 2012

We're spending huge amounts of time on lock contention during
end_io processing because we unconditionally assume we are overwriting
an existing extent in the file for each IO.

This checks to see if we are outside i_size, and if so, it uses a
less expensive readonly search of the btree to look for existing
extents.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: avoid deadlocks from GFP_KERNEL allocations during btrfs_real_readdir · fede766f

Authored by Chris Mason on Apr 27, 2012

Btrfs has an optimization where it will preallocate dentries during
readdir to fill in enough information to open the inode without an extra
lookup.

But, we're calling d_alloc, which is doing GFP_KERNEL allocations, and
that leads to deadlocks because our readdir code has tree locks held.

For now, disable this optimization.  We'll fix the gfp mask in the next
merge window.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Fix space checking during fs resize · 7654b724

Authored by Daniel J Blueman on Apr 27, 2012

Fix out-of-space checking, addressing a warning and potential resource
leak when resizing the filesystem down while allocating blocks.
Signed-off-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix block_rsv and space_info lock ordering · 1f699d38

Authored by Stefan Behrens on Apr 27, 2012

may_commit_transaction() calls
        spin_lock(&space_info->lock);
        spin_lock(&delayed_rsv->lock);
and update_global_block_rsv() calls
        spin_lock(&block_rsv->lock);
        spin_lock(&sinfo->lock);

Lockdep complains about this at run time.
Everywhere except in update_global_block_rsv(), the space_info lock is
the outer lock, therefore the locking order in update_global_block_rsv()
is changed.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
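
The rule applied by this fix, always taking the space_info lock as the outer lock, can be sketched in user space with pthread mutexes. The structures and the `rsv_add()` and `rsv_release()` helpers are simplified, hypothetical stand-ins for the btrfs code. What matters is that every path acquires the two locks in the same order.

```c
#include <assert.h>
#include <pthread.h>

/* Simplified stand-ins for the two lock-protected objects. */
struct space_info {
    pthread_mutex_t lock;
    long bytes_reserved;
};

struct block_rsv {
    pthread_mutex_t lock;
    long reserved;
    struct space_info *sinfo;
};

/* Reserve n bytes: space_info lock outside, block_rsv lock inside. */
static void rsv_add(struct block_rsv *rsv, long n)
{
    pthread_mutex_lock(&rsv->sinfo->lock);      /* outer lock */
    pthread_mutex_lock(&rsv->lock);             /* inner lock */
    rsv->reserved += n;
    rsv->sinfo->bytes_reserved += n;
    pthread_mutex_unlock(&rsv->lock);
    pthread_mutex_unlock(&rsv->sinfo->lock);
}

/* Release n bytes using the SAME lock order as rsv_add(). */
static void rsv_release(struct block_rsv *rsv, long n)
{
    pthread_mutex_lock(&rsv->sinfo->lock);      /* outer lock */
    pthread_mutex_lock(&rsv->lock);             /* inner lock */
    assert(n <= rsv->reserved);
    rsv->reserved -= n;
    rsv->sinfo->bytes_reserved -= n;
    pthread_mutex_unlock(&rsv->lock);
    pthread_mutex_unlock(&rsv->sinfo->lock);
}
```

If one of the two functions took the locks in the opposite order, two threads could each hold one lock while waiting for the other, which is the kind of inversion lockdep reports.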


Btrfs: Prevent root_list corruption · 1daf3540

Authored by Daniel J Blueman on Apr 27, 2012

I was seeing root_list corruption on unmount during fs resize in 3.4-rc4; add
correct locking to address this.
Signed-off-by: Daniel J Blueman <daniel@quora.org>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix repair code for RAID10 · 3e74317a

Authored by Jan Schmidt on Apr 27, 2012

btrfs_map_block sets mirror_num, so that the repair code knows eventually
which device gave us the read error. For RAID10, mirror_num must be 1 or 2.
Before this fix mirror_num was incorrectly related to our stripe index.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: do not start delalloc inodes during sync · 996d282c

Authored by Josef Bacik on Apr 23, 2012

btrfs_start_delalloc_inodes will just walk the list of delalloc inodes and
start writing them out, but it doesn't splice the list or anything, so as
long as somebody is doing work on the box you could end up in this section
_forever_.  So just remove it; it's not needed since sync will start
writeback on all inodes anyway.  All we need to do is wait for ordered
extents and then we can commit the transaction.  In my horrible torture test
sync goes from taking 4 minutes to about 1.5 minutes.
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

26 Apr, 2012 4 commits

revert "proc: clear_refs: do not clear reserved pages" · 63f61a6f

Authored by Will Deacon on Apr 25, 2012

Revert commit 85e72aa5 ("proc: clear_refs: do not clear reserved
pages"), which was a quick fix suitable for -stable until ARM had been
moved over to the gate_vma mechanism:

https://lkml.org/lkml/2012/1/14/55

With commit f9d4861f ("ARM: 7294/1: vectors: use gate_vma for vectors user
mapping"), ARM does now use the gate_vma, so the PageReserved check can be
removed from the proc code.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Nicolas Pitre <nico@linaro.org>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

hugetlbfs: lockdep annotate root inode properly · 65ed7601

Authored by Aneesh Kumar K.V on Apr 25, 2012

This fixes the below reported false lockdep warning.  e096d0c7
("lockdep: Add helper function for dir vs file i_mutex annotation") added
a similar annotation for every other inode in hugetlbfs but missed the
root inode because it was allocated by a separate function.

For HugeTLB fs we allow taking i_mutex in mmap.  HugeTLB fs doesn't
support file write and its file read callback is modified in a05b0855
("hugetlbfs: avoid taking i_mutex from hugetlbfs_read()") to not take
i_mutex.  Hence for HugeTLB fs with regular files we really don't take
i_mutex with mmap_sem held.

 ======================================================
 [ INFO: possible circular locking dependency detected ]
 3.4.0-rc1+ #322 Not tainted
 -------------------------------------------------------
 bash/1572 is trying to acquire lock:
  (&mm->mmap_sem){++++++}, at: [<ffffffff810f1618>] might_fault+0x40/0x90

 but task is already holding lock:
  (&sb->s_type->i_mutex_key#12){+.+.+.}, at: [<ffffffff81125f88>] vfs_readdir+0x56/0xa8

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (&sb->s_type->i_mutex_key#12){+.+.+.}:
        [<ffffffff810a09e5>] lock_acquire+0xd5/0xfa
        [<ffffffff816a2f5e>] __mutex_lock_common+0x48/0x350
        [<ffffffff816a3325>] mutex_lock_nested+0x2a/0x31
        [<ffffffff811fb8e1>] hugetlbfs_file_mmap+0x7d/0x104
        [<ffffffff810f859a>] mmap_region+0x272/0x47d
        [<ffffffff810f8a39>] do_mmap_pgoff+0x294/0x2ee
        [<ffffffff810f8b65>] sys_mmap_pgoff+0xd2/0x10e
        [<ffffffff8103d19e>] sys_mmap+0x1d/0x1f
        [<ffffffff816a5922>] system_call_fastpath+0x16/0x1b

 -> #0 (&mm->mmap_sem){++++++}:
        [<ffffffff810a0256>] __lock_acquire+0xa81/0xd75
        [<ffffffff810a09e5>] lock_acquire+0xd5/0xfa
        [<ffffffff810f1645>] might_fault+0x6d/0x90
        [<ffffffff81125d62>] filldir+0x6a/0xc2
        [<ffffffff81133a83>] dcache_readdir+0x5c/0x222
        [<ffffffff81125fa8>] vfs_readdir+0x76/0xa8
        [<ffffffff811260b6>] sys_getdents+0x79/0xc9
        [<ffffffff816a5922>] system_call_fastpath+0x16/0x1b

 other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&sb->s_type->i_mutex_key#12);
                                lock(&mm->mmap_sem);
                                lock(&sb->s_type->i_mutex_key#12);
   lock(&mm->mmap_sem);

  *** DEADLOCK ***

 1 lock held by bash/1572:
  #0:  (&sb->s_type->i_mutex_key#12){+.+.+.}, at: [<ffffffff81125f88>] vfs_readdir+0x56/0xa8

 stack backtrace:
 Pid: 1572, comm: bash Not tainted 3.4.0-rc1+ #322
 Call Trace:
  [<ffffffff81699a3c>] print_circular_bug+0x1f8/0x209
  [<ffffffff810a0256>] __lock_acquire+0xa81/0xd75
  [<ffffffff810f38aa>] ? handle_pte_fault+0x5ff/0x614
  [<ffffffff8109e622>] ? mark_lock+0x2d/0x258
  [<ffffffff810f1618>] ? might_fault+0x40/0x90
  [<ffffffff810a09e5>] lock_acquire+0xd5/0xfa
  [<ffffffff810f1618>] ? might_fault+0x40/0x90
  [<ffffffff816a3249>] ? __mutex_lock_common+0x333/0x350
  [<ffffffff810f1645>] might_fault+0x6d/0x90
  [<ffffffff810f1618>] ? might_fault+0x40/0x90
  [<ffffffff81125d62>] filldir+0x6a/0xc2
  [<ffffffff81133a83>] dcache_readdir+0x5c/0x222
  [<ffffffff81125cf8>] ? sys_ioctl+0x74/0x74
  [<ffffffff81125cf8>] ? sys_ioctl+0x74/0x74
  [<ffffffff81125cf8>] ? sys_ioctl+0x74/0x74
  [<ffffffff81125fa8>] vfs_readdir+0x76/0xa8
  [<ffffffff811260b6>] sys_getdents+0x79/0xc9
  [<ffffffff816a5922>] system_call_fastpath+0x16/0x1b
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs/buffer.c: remove BUG() in possible but rare condition · 61065a30

Authored by Glauber Costa on Apr 25, 2012

While stressing the kernel with failing allocations today, I hit the
following chain of events:

alloc_page_buffers():

        bh = alloc_buffer_head(GFP_NOFS);
        if (!bh)
                goto no_grow; <= path taken

grow_dev_page():

        bh = alloc_page_buffers(page, size, 0);
        if (!bh)
                goto failed;  <= taken, consequence of the above

and then the failed path BUG()s the kernel.

The failure is inserted a little bit artificially, but even then, I see no
reason why it should be deemed impossible in a real box.

Even though this is not a condition that we expect to see around every
time, failed allocations are expected to be handled, and BUG() sounds just
too much.  As a matter of fact, grow_dev_page() can return NULL just fine
in other circumstances, so I propose we just remove it, then.
Signed-off-by: Glauber Costa <glommer@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
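
The policy argued for in this message, reporting a failed allocation to the caller instead of crashing, can be sketched in plain C. Nothing below is taken from fs/buffer.c: `struct buffer`, `grow_buffer()` and the injectable `alloc_fn` hook are hypothetical, with the hook standing in for the artificial failure injection mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct buffer {
    size_t size;
    void  *data;
};

/* Allocator hook so that a failure can be injected in tests. */
typedef void *(*alloc_fn)(size_t);

/* An allocator that always fails, mimicking fault injection. */
static void *failing_alloc(size_t n)
{
    (void)n;
    return NULL;
}

/* Returns a new buffer, or NULL if any allocation fails (no abort). */
static struct buffer *grow_buffer(size_t size, alloc_fn alloc)
{
    struct buffer *b;

    assert(alloc != NULL);
    b = alloc(sizeof(*b));
    if (!b)
        return NULL;            /* propagate the failure upward */

    b->data = alloc(size);
    if (!b->data) {
        free(b);                /* undo the partial allocation */
        return NULL;
    }
    b->size = size;
    return b;
}
```

The caller then decides how to react to NULL, for example by retrying or returning an error, which is not possible once the process has been killed by an assertion.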


epoll: clear the tfile_check_list on -ELOOP · 13d51807

Authored by Jason Baron on Apr 25, 2012

An epoll_ctl(,EPOLL_CTL_ADD,,) operation can return '-ELOOP' to prevent
circular epoll dependencies from being created.  However, in that case we
do not properly clear the 'tfile_check_list'.  Thus, add a call to
clear_tfile_check_list() for the -ELOOP case.
Signed-off-by: Jason Baron <jbaron@redhat.com>
Reported-by: Yurij M. Plotnikov <Yurij.Plotnikov@oktetlabs.ru>
Cc: Nelson Elhage <nelhage@nelhage.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Tested-by: Alexandra N. Kossovsky <Alexandra.Kossovsky@oktetlabs.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
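
The user-visible side of this, an EPOLL_CTL_ADD that fails with ELOOP, can be reproduced with a small Linux-only sketch. The helper name `epoll_cycle_errno()` is made up, and the code shows only the error seen by the caller, not the kernel-internal 'tfile_check_list' cleanup that the patch adds.

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Make epoll instance a watch b, then try to make b watch a.
 * Returns the errno of the second add (ELOOP is expected), 0 if
 * the add unexpectedly succeeded, or -1 if the setup itself failed.
 */
static int epoll_cycle_errno(void)
{
    struct epoll_event ev = { .events = EPOLLIN };
    int a = epoll_create1(0);
    int b = epoll_create1(0);
    int err = -1;

    if (a >= 0 && b >= 0) {
        ev.data.fd = b;
        if (epoll_ctl(a, EPOLL_CTL_ADD, b, &ev) == 0) {  /* a watches b */
            ev.data.fd = a;
            /* b watching a would close the loop */
            if (epoll_ctl(b, EPOLL_CTL_ADD, a, &ev) < 0)
                err = errno;
            else
                err = 0;
        }
    }
    if (a >= 0)
        close(a);
    if (b >= 0)
        close(b);
    return err;
}
```

A caller that sees ELOOP can keep using both epoll instances afterwards, since only the add that would have closed the loop is refused.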
