Commits · a561be7100cd610bd2e082f3211c1dfb45835817 · openeuler / Kernel

04 Jan, 2012 3 commits

switch a bunch of places to mnt_want_write_file() · a561be71

Committed by Al Viro on Nov 23, 2011

it's both faster (in the case when the file has been opened for write) and cleaner.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

trim fs/internal.h · f47ec3f2

Committed by Al Viro on Nov 21, 2011

some stuff in there can actually become static; some belongs to pnode.h
as it's a private interface between namespace.c and pnode.c...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

pull manipulations of rpc_cred inside alloc_nfs_open_context() · 5ede7b1c

Committed by Al Viro on Oct 23, 2011

No need to duplicate them in both callers; make it return
ERR_PTR(-ENOMEM) on allocation failure instead of NULL and
it'll be able to report rpc_lookup_cred() failures just
fine.  Callers are much happier that way...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
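
The error-pointer convention mentioned in this commit can be sketched in userspace as follows. This is only an illustration: `err_ptr()`, `is_err()`, `ptr_err()`, `struct open_ctx` and `alloc_open_ctx()` are hypothetical stand-ins written for this example, not the kernel's definitions or the real `alloc_nfs_open_context()`.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() idea:
 * small negative errno values are encoded in the top of the address range. */
#define MAX_ERRNO 4095

static inline void *err_ptr(long error) { return (void *)(intptr_t)error; }
static inline long ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical context type; not the real struct nfs_open_context. */
struct open_ctx {
    int cred_id;
};

/*
 * Hypothetical allocator. A negative cred_id models a failed credential
 * lookup and is passed straight through as the error code, so callers can
 * tell "lookup failed" apart from "out of memory".
 */
struct open_ctx *alloc_open_ctx(int cred_id)
{
    struct open_ctx *ctx;

    if (cred_id < 0)
        return err_ptr(cred_id);

    ctx = malloc(sizeof(*ctx));
    if (!ctx)
        return err_ptr(-ENOMEM);

    ctx->cred_id = cred_id;
    return ctx;
}
```

A caller then checks `is_err(ctx)` once and propagates `ptr_err(ctx)`, instead of mapping a bare NULL to a single guessed error code.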

30 Dec, 2011 2 commits

procfs: do not confuse jiffies with cputime64_t · 34845636

Committed by Andreas Schwab on Dec 28, 2011

Commit 2a95ea6c ("procfs: do not overflow get_{idle,iowait}_time
for nohz") did not take into account that on some architectures jiffies
and cputime use different units.

This causes get_idle_time() to return numbers in the wrong units, making
the idle time fields in /proc/stat wrong.

Instead of converting the usec value returned by
get_cpu_{idle,iowait}_time_us to units of jiffies, use the new function
usecs_to_cputime64 to convert it to the correct unit of cputime64_t.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Artem S. Tashkinov" <t.artem@mailcity.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
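
To see why the unit mismatch matters, here is a small sketch of the two conversions. The tick rate and the cputime scale are assumptions chosen for illustration, and both function names are hypothetical; the real `usecs_to_cputime64` is architecture specific.

```c
#include <stdint.h>

#define HZ_SKETCH           100ULL      /* assumed tick rate (jiffies per second) */
#define USEC_PER_SEC_SKETCH 1000000ULL
#define CPUTIME_PER_USEC    4096ULL     /* hypothetical cputime units per microsecond */

/* Microseconds expressed in jiffies: only correct if cputime is jiffies-based. */
uint64_t usecs_to_jiffies_sketch(uint64_t usecs)
{
    return usecs * HZ_SKETCH / USEC_PER_SEC_SKETCH;
}

/* Microseconds expressed in the (assumed) native cputime unit. */
uint64_t usecs_to_cputime64_sketch(uint64_t usecs)
{
    return usecs * CPUTIME_PER_USEC;
}
```

With these assumed constants, two seconds of idle time is 200 in jiffies but 8192000000 in cputime units, so reporting the first value where the second is expected makes the idle field far too small.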

ceph: disable use of dcache for readdir etc. · a4d46363

Committed by Sage Weil on Dec 29, 2011

Ceph attempts to use the dcache to satisfy negative lookups and readdir
when the entire directory contents are in cache.  Disable this behavior
until lingering bugs in this code are shaken out; we'll re-enable these
hooks once things are fully stable.
Signed-off-by: Sage Weil <sage@newdream.net>

27 Dec, 2011 1 commit

vfs: fix handling of lock allocation failure in lease-break case · 6d4b9e38

Committed by Linus Torvalds on Dec 26, 2011

Bruce Fields notes that commit 778fc546 ("locks: fix tracking of
inprogress lease breaks") introduced a possible error pointer
dereference on failure to allocate memory.  locks_conflict() will
dereference the passed-in new lease lock structure that may be an error pointer.

This means an open (without O_NONBLOCK set) on a file with a lease
applied (generally only done when Samba or nfsd (with v4) is running)
could crash if a kmalloc() fails.

So instead of playing games with IS_ERR() all over the place, just
check the allocation failure early.  That makes the code more
straightforward, and avoids this possible bad pointer dereference.
Based-on-patch-by: J. Bruce Fields <bfields@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

24 Dec, 2011 2 commits

xfs: log all dirty inodes in xfs_fs_sync_fs · be4f1ac8

Committed by Christoph Hellwig on Dec 20, 2011

Since Linux 2.6.36 the writeback code has introduced various measures for
live lock prevention during sync().  Unfortunately some of these are
actively harmful for the XFS model, where the inode gets marked dirty for
metadata from the data I/O handler.

The older_than_this checks are now enforced more strictly since

    writeback: avoid livelocking WB_SYNC_ALL writeback

because sync now only calls into __writeback_inodes_sb and thus samples the
current cut off time only once.  But on slow enough devices the previous
asynchronous sync pass might not have fully completed yet, and thus XFS
might mark metadata dirty only after that sampling of the cut off time for
the blocking pass already happened.  I have not reproduced this
myself on a real system, but by introducing artificial delay into the
XFS I/O completion workqueues it can be reproduced easily.

Fix this by iterating over all XFS inodes in ->sync_fs and logging all that
are dirty.  This might log inodes that only got redirtied after the
previous pass, but given how cheap delayed logging of inodes is it
isn't a major concern for performance.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

xfs: log the inode in ->write_inode calls for kupdate · 0b8fd303

Committed by Christoph Hellwig on Dec 18, 2011

If the writeback code writes back an inode because it has expired we currently
use the non-blocking ->write_inode path.  This means any inode that is pinned
is skipped.  With delayed logging and a workload that has very little log
traffic otherwise it is very likely that an inode that gets constantly
written to is always pinned, and thus we keep refusing to write it.  The VM
writeback code at that point redirties it and doesn't try to write it again
for another 30 seconds.  This means under certain scenarios time based
metadata writeback never happens.

Fix this by calling into xfs_log_inode for kupdate in addition to data
integrity syncs, and thus transfer the inode to the log ASAP.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

23 Dec, 2011 2 commits

Btrfs: call d_instantiate after all ops are setup · 08c422c2

Committed by Al Viro on Dec 23, 2011

This closes races where btrfs is calling d_instantiate too soon during
inode creation.  All of the callers of btrfs_add_nondir are updated to
instantiate after the inode is fully set up in memory.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix worker lock misuse in find_worker · 8d532b2a

Committed by Chris Mason on Dec 23, 2011

Dan Carpenter noticed that we were doing a double unlock on the worker
lock, and sometimes picking a worker thread without the lock held.

This fixes both errors.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

21 Dec, 2011 2 commits

nilfs2: potential integer overflow in nilfs_ioctl_clean_segments() · 481fe17e

Committed by Haogang Chen on Dec 19, 2011

There is a potential integer overflow in nilfs_ioctl_clean_segments().
When a large argv[n].v_nmembs is passed from userspace, the subsequent
call to vmalloc() will allocate a buffer smaller than expected, which
leads to out-of-bounds access in nilfs_ioctl_move_blocks() and
nilfs_clean_segments().

The following check does not prevent the overflow because nsegs is also
controlled by userspace and could be very large.

		if (argv[n].v_nmembs > nsegs * nilfs->ns_blocks_per_segment)
			goto out_free;

This patch clamps argv[n].v_nmembs to UINT_MAX / argv[n].v_size, and
returns -EINVAL on overflow.
Signed-off-by: Haogang Chen <haogangchen@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
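
The clamp described in this commit can be sketched as a standalone check. `checked_buffer_size()` is a hypothetical helper written for this example, not a nilfs2 function, and treating a zero element size as invalid is an extra assumption made here to avoid dividing by zero.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: compute nmembs * size only when the product fits in
 * an unsigned int, mirroring the "nmembs > UINT_MAX / size" test above.
 * Returns 0 and stores the byte count in *out, or -EINVAL on overflow.
 */
int checked_buffer_size(unsigned int nmembs, unsigned int size, size_t *out)
{
    if (size == 0)
        return -EINVAL;          /* assumption: reject empty elements */

    if (nmembs > UINT_MAX / size)
        return -EINVAL;          /* the multiplication would wrap around */

    *out = (size_t)nmembs * size;
    return 0;
}
```

Dividing the limit by one operand keeps the test itself free of overflow, which is why it is preferred over multiplying first and comparing afterwards.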

nilfs2: unbreak compat ioctl · 695c60f2

Committed by Thomas Meyer on Dec 19, 2011

Commit 828b1c50 ("nilfs2: add compat ioctl") incidentally broke all
other NILFS compat ioctls.  Make them work again.
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org> [3.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

18 Dec, 2011 1 commit

writeback: show writeback reason with __print_symbolic · b3bba872

Committed by Wu Fengguang on Dec 08, 2011

This makes the binary trace understandable by trace-cmd.

CC: Dave Chinner <david@fromorbit.com>
CC: Curt Wohlgemuth <curtw@google.com>
CC: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>

17 Dec, 2011 1 commit

btrfs: lower the dirty balance poll interval · 142349f5

Committed by Wu Fengguang on Dec 16, 2011

Tests show that the original large intervals can easily cause the dirty
limit to be exceeded with 100 concurrent dd's. So make the interval only as
large as the next check point selected by the dirty throttling algorithm.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

16 Dec, 2011 9 commits

NFS: Fix a regression in nfs_file_llseek() · 6c529617

Committed by Trond Myklebust on Dec 15, 2011

After commit 06222e49 (fs: handle
SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek)
the behaviour of llseek() was changed so that it always revalidates
the file size. The bug appears to be due to a logic error in the
aforementioned commit, which always evaluates to 'true'.
Reported-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>=3.1]
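
The commit message does not quote the faulty expression, so the following is only an assumed illustration of one common shape of an always-true test: two inequalities against different constants joined with `||`. Both function names are hypothetical and are not taken from the NFS code.

```c
#include <stdbool.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */

/*
 * Buggy shape: no value can equal both SEEK_SET and SEEK_CUR at once, so at
 * least one of the two inequalities always holds and the result is always true.
 */
bool needs_revalidate_buggy(int origin)
{
    return origin != SEEK_SET || origin != SEEK_CUR;
}

/*
 * Intended shape: true only for origins other than SEEK_SET and SEEK_CUR,
 * for example SEEK_END, which depends on the current file size.
 */
bool needs_revalidate_fixed(int origin)
{
    return origin != SEEK_SET && origin != SEEK_CUR;
}
```

Under this assumption the buggy predicate would trigger a size revalidation on every seek, which matches the regression described above.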

Btrfs: unplug every once and a while · d85c8a6f

Committed by Chris Mason on Dec 15, 2011

The btrfs io submission threads can build up massive plug lists.  This
keeps things more reasonable so we don't hand over huge dumps of IO at
once.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: deal with NULL srv_rsv in the delalloc inode reservation code · e755d9ab

Committed by Chris Mason on Dec 15, 2011

btrfs_update_inode is sometimes called with a null reservation.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: only set cache_generation if we setup the block group · e65cbb94

Committed by Josef Bacik on Dec 13, 2011

A user reported a problem booting into a new kernel with the old format inodes.
He was panicking in cow_file_range while writing out the inode cache. This is
because if the block group is not cached we'll just skip writing out the cache;
however, if it gets dirtied again in the same transaction and it finished caching
we'd go ahead and write it out, but since we set cache_generation to the transid
we think we've already truncated it and will just carry on, running into
cow_file_range and blowing up. We need to make sure we only set
cache_generation if we've done the truncate. The user tested this patch and
verified that the panic no longer occurred. Thanks,
Reported-and-Tested-by: Klaus Bitto <klaus.bitto@gmail.com>
Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: don't panic if orphan item already exists · ee4d89f0

Committed by Josef Bacik on Dec 13, 2011

I've been hitting this BUG_ON() in btrfs_orphan_add when running xfstest 269 in
a loop. This is because we will add an orphan item, do the truncate, the
truncate will fail for whatever reason (*cough*ENOSPC*cough*) and then we're
left with an orphan item still in the fs. Then we come back later to do another
truncate and it blows up because we already have an orphan item. This is ok so
just fix the BUG_ON() to only BUG() if ret is not EEXIST. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: fix leaked space in truncate · 7041ee97

Committed by Josef Bacik on Dec 09, 2011

We were occasionally leaking space when running xfstest 269. This is because if
we failed to start the transaction in the truncate loop we'd just goto out, but
we need to break so that the inode is removed from the orphan list and the space
is properly freed. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: fix how we do delalloc reservations and how we free reservations on error · 660d3f6c

Committed by Josef Bacik on Dec 09, 2011

Running xfstests 269 with some tracing, my scripts kept spitting out errors about
releasing bytes that we didn't actually have reserved. This took me down a huge
rabbit hole and it turns out the way we deal with reserved_extents is wrong:
we need to only be setting it if the reservation succeeds, otherwise the free()
method will come in and unreserve space that isn't actually reserved yet, which
can lead to other warnings and such. The math was all working out right in the
end, but it caused all sorts of other issues in addition to making my scripts
yell and scream and generally make it impossible for me to track down the
original issue I was looking for. The other problem is with our error handling
in the reservation code. There are two cases that we need to deal with:

1) We raced with free. In this case free won't free anything because csum_bytes
is modified before we drop the lock in our reservation path, so free rightly
doesn't release any space because the reservation code may be depending on that
reservation. However if we fail, we need the reservation side to do the free at
that point since that space is no longer in use. So as it stands the code was
doing this fine and it worked out, except in case #2

2) We don't race with free. Nobody comes in and changes anything, and our
reservation fails. In this case we didn't reserve anything anyway and we just
need to clean up csum_bytes but not free anything. So we keep track of
csum_bytes before we drop the lock and if it hasn't changed we know we can just
decrement csum_bytes and carry on.

Because of the case where we can race with free()'s since we have to drop our
spin_lock to do the reservation, I'm going to serialize all reservations with
the i_mutex. We already get this for free in the heavy use paths, truncate and
file write all hold the i_mutex, just needed to add it to page_mkwrite and
various ioctl/balance things. With this patch my space leak scripts no longer
scream bloody murder. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: deal with enospc from dirtying inodes properly · 22c44fe6

Committed by Josef Bacik on Nov 30, 2011

Now that we're properly keeping track of delayed inode space we've been getting
a lot of warnings out of btrfs_dirty_inode() when running xfstest 83. This is
because a bunch of people call mark_inode_dirty, which is void so we can't
return ENOSPC. This needs to be fixed in a few areas

1) file_update_time - this updates the mtime and such when writing to a file,
which will call mark_inode_dirty. So copy file_update_time into btrfs so we can
call btrfs_dirty_inode directly and return an error if we get one appropriately.

2) fix symlinks to use btrfs_setattr for ->setattr. For some reason we weren't
setting ->setattr for symlinks, even though we should have been. This catches
one of the cases where we were getting errors in mark_inode_dirty.

3) Fix btrfs_setattr and btrfs_setsize to call btrfs_dirty_inode directly
instead of mark_inode_dirty. This lets us return errors properly for truncate
and chown/anything related to setattr.

4) Add a new btrfs_fs_dirty_inode which will just call btrfs_dirty_inode and
print an error if we have one. The only remaining user we can't control for
this is touch_atime(), but we don't really want to keep people from walking
down the tree if we don't have space to save the atime update, so just complain
but don't worry about it.

With this patch xfstests 83 complains a handful of times instead of hundreds of
times. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: fix num_workers_starting bug and other bugs in async thread · 0dc3b84a

Committed by Josef Bacik on Nov 18, 2011

Al pointed out we have some random problems with the way we account for
num_workers_starting in the async thread stuff.  First of all we need to make
sure to decrement num_workers_starting if we fail to start the worker, so make
__btrfs_start_workers do this.  Also fix __btrfs_start_workers so that it
doesn't call btrfs_stop_workers(); there is no point in stopping everybody if we
failed to create a worker.  Also check_pending_worker_creates needs to call
__btrfs_start_work in its work function since it already increments
num_workers_starting.

People only start one worker at a time, so get rid of the num_workers argument
everywhere, and make btrfs_queue_worker a void since it will always succeed.
Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

15 Dec, 2011 7 commits

BTRFS: Establish i_ops before calling d_instantiate · ad19db71

Committed by Casey Schaufler on Dec 15, 2011

The Smack LSM hook for security_d_instantiate checks
the inode's i_op->getxattr value to determine if the
containing filesystem supports extended attributes.
The BTRFS filesystem sets the inode's i_op value only
after it has instantiated the inode. This results in
Smack incorrectly giving new BTRFS inodes attributes
from the filesystem defaults on the assumption that
values can't be stored on the filesystem. This patch
moves the assignment of inode operation vectors ahead
of the calls to d_instantiate, letting Smack know that
the filesystem supports extended attributes. There
should be no impact on the performance or behavior of
BTRFS.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: add a cond_resched() into the worker loop · 8f3b65a3

Committed by Chris Mason on Dec 15, 2011

If we have a constant stream of end_io completions or crc work,
we can hit softlockup messages from the async helper threads.  This
adds a cond_resched() into the loop to avoid them.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix ctime update of on-disk inode · 306424cc

Committed by Li Zefan on Dec 14, 2011

To reproduce the bug:

    # touch /mnt/tmp
    # stat /mnt/tmp | grep Change
    Change: 2011-12-09 09:32:23.412105981 +0800
    # chattr +i /mnt/tmp
    # stat /mnt/tmp | grep Change
    Change: 2011-12-09 09:32:43.198105295 +0800
    # umount /mnt
    # mount /dev/loop1 /mnt
    # stat /mnt/tmp | grep Change
    Change: 2011-12-09 09:32:23.412105981 +0800

We should update the ctime of the in-memory inode before calling
btrfs_update_inode().
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

btrfs: keep orphans for subvolume deletion · f8e9e0b0

Committed by Arne Jansen on Dec 14, 2011

Since we have the free space caches, btrfs_orphan_cleanup also runs for
the tree_root. Unfortunately this also cleans up the orphans used to mark
subvol deletions in progress.

Currently if a subvol deletion gets interrupted twice by umount/mount, the
deletion will not be continued and the space is permanently lost, though it
would be possible to write a tool to recover those lost subvol deletions.
This patch checks if the orphan belongs to a subvol (dead root) and skips
the deletion.
Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix inaccurate available space on raid0 profile · 39fb26c3

Committed by Miao Xie on Dec 14, 2011

When we use raid0 as the data profile, the df command may show a very
inaccurate value for the available space, which may be much less than the
real one. This can puzzle users. Fix it by changing the calculation
of the available space so that it more closely resembles a fake chunk
allocation.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix wrong disk space information of the files · 3642320e

Committed by Miao Xie on Dec 14, 2011

Btrfsck reports errors after the 83rd case of xfstests is run. The error
number is 400, which means the used disk space of the file is wrong.

The reason for this bug is as follows:
The file truncation may fail when there is not enough space in the file system,
leaving behind some file extents whose offsets are beyond the end of the file.
When we want to expand those files, we drop those file extents and
put in dummy file extents, and then we should update the i-node. But btrfs
forgets to do it.

This patch adds the forgotten i-node update.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix wrong i_size when truncating a file to a larger size · f4a2f4c5

Committed by Miao Xie on Dec 14, 2011

Btrfsck reports error 100 after the 83rd case of xfstests is run, which means
the i_size of the file is wrong.

The reason for this bug is as follows:
Btrfs increased the i_size of the file at the beginning, but it failed to expand
the file, and then failed to restore the i_size to the old size because there was
not enough space in the file system, so we found a wrong i_size.

This patch fixes this bug by updating the i_size only after the file
expansion succeeds and we have enough space to update the i-node.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

14 Dec, 2011 10 commits

fs/ncpfs: fix error paths and goto statements in ncp_fill_super() · 759c361e

Committed by Djalal Harouni on Dec 13, 2011

The label 'out_bdi' should be followed by bdi_destroy() instead of
fput() which should be after the 'out_fput' label.

If bdi_setup_and_register() fails then jump to the 'out_fput' label
instead of the 'out_bdi' one.

If fget(data.info_fd) fails then jump to the previously fixed 'out_bdi'
label to call bdi_destroy() otherwise the bdi object will not be
destroyed.

Compile tested only.
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
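
The label ordering described in this commit follows the usual goto-unwind pattern: each label undoes one resource and falls through to the labels for resources acquired earlier. Below is a minimal sketch, assuming a hypothetical `struct res_state` and `fill_super_sketch()` that only track which resources are held; it is not the ncpfs code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical tracker; the real code holds a struct file and a bdi object. */
struct res_state {
    bool file_held;
    bool bdi_held;
};

/*
 * fail_at selects the simulated failure:
 *   0 = success, 1 = bdi setup fails, 2 = a later step fails.
 */
int fill_super_sketch(struct res_state *st, int fail_at)
{
    int error;

    st->file_held = true;        /* models a successful fget() */

    if (fail_at == 1) {          /* models bdi setup failing */
        error = -ENOMEM;
        goto out_fput;           /* no bdi exists yet: release only the file */
    }
    st->bdi_held = true;

    if (fail_at == 2) {          /* models a later step failing */
        error = -EBADF;
        goto out_bdi;            /* undo the bdi, then fall through to the file */
    }
    return 0;

out_bdi:
    st->bdi_held = false;        /* models bdi_destroy() */
out_fput:
    st->file_held = false;       /* models fput() */
    return error;
}
```

Keeping the labels in reverse order of acquisition means every failure point jumps to the label named after the last resource it successfully obtained.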

ext4: handle EOF correctly in ext4_bio_write_page() · 5a0dc736

Committed by Yongqiang Yang on Dec 13, 2011

We need to zero out the part of a page that lies beyond EOF before setting
uptodate; otherwise, mapread or write will see non-zero data beyond EOF.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
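
Zeroing the tail of the last page can be sketched in plain C as below. The page size constant and `zero_page_past_eof()` are assumptions made for this illustration; the kernel works on struct page and uses its own zeroing helpers.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE_SKETCH 4096u   /* assumed page size for this example */

/*
 * Zero every byte of the page at index page_index that lies at or beyond
 * file_size, so stale data past EOF is never exposed to readers.
 */
void zero_page_past_eof(unsigned char *page,
                        unsigned long long page_index,
                        unsigned long long file_size)
{
    unsigned long long page_start = page_index * PAGE_SIZE_SKETCH;
    size_t valid;

    if (file_size <= page_start)
        valid = 0;                                   /* whole page is past EOF */
    else if (file_size - page_start >= PAGE_SIZE_SKETCH)
        valid = PAGE_SIZE_SKETCH;                    /* whole page is inside the file */
    else
        valid = (size_t)(file_size - page_start);    /* EOF falls inside this page */

    memset(page + valid, 0, PAGE_SIZE_SKETCH - valid);
}
```

For a file of 4196 bytes, the second page keeps its first 100 bytes and has the remaining 3996 bytes cleared.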

ext4: remove a wrong BUG_ON in ext4_ext_convert_to_initialized · 5b5ffa49

Committed by Yongqiang Yang on Dec 13, 2011

If a file is fallocated on a hole, map->m_lblk + map->m_len may be greater
than ee_block + ee_len.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

ext4: correctly handle pages w/o buffers in ext4_discard_partial_buffers() · 093e6e36

Committed by Yongqiang Yang on Dec 13, 2011

If a page has been read into memory and never been written, it has no
buffers, but we should still handle the page in truncate or punch hole.

The VFS code for write operations already handles holes correctly, so this
patch removes the hole-handling code from the write operations.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

ext4: avoid potential hang in mpage_submit_io() when blocksize < pagesize · 13a79a47

Committed by Yongqiang Yang on Dec 13, 2011

If there is an unwritten but clean buffer in a page and there is a
dirty buffer after the buffer, then mpage_submit_io does not write the
dirty buffer out.  As a result, da_writepages loops forever.

This patch fixes the problem by checking the dirty flag.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

ext4: avoid hangs in ext4_da_should_update_i_disksize() · ea51d132

Committed by Andrea Arcangeli on Dec 13, 2011

If the pte mapping in generic_perform_write() is unmapped between
iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the
"copied" parameter to ->end_write can be zero. ext4 couldn't cope with
it with delayed allocations enabled. This skips the i_disksize
enlargement logic if copied is zero and no new data was appended to
the inode.

 gdb> bt
 #0  0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\
 08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467
 #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
 xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
 #2  0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value o\
 ptimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440
 #3  generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, p\
 os=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482
 #4  0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\
 xffff88001e26be40) at mm/filemap.c:2600
 #5  0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimi\
 zed out>, pos=<value optimized out>) at mm/filemap.c:2632
 #6  0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\
 t fs/ext4/file.c:136
 #7  0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, \
 ppos=0xffff88001e26bf48) at fs/read_write.c:406
 #8  0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4\
 000, pos=0xffff88001e26bf48) at fs/read_write.c:435
 #9  0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x\
 4000) at fs/read_write.c:487
 #10 <signal handler called>
 #11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ ()
 #12 0x0000000000000000 in ?? ()
 gdb> print offset
 $22 = 0xffffffffffffffff
 gdb> print idx
 $23 = 0xffffffff
 gdb> print inode->i_blkbits
 $24 = 0xc
 gdb> up
 #1  ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\
 xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512
 2512                    if (ext4_da_should_update_i_disksize(page, end)) {
 gdb> print start
 $25 = 0x0
 gdb> print end
 $26 = 0xffffffffffffffff
 gdb> print pos
 $27 = 0x108000
 gdb> print new_i_size
 $28 = 0x108000
 gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize
 $29 = 0xd9000
 gdb> down
 2467            for (i = 0; i < idx; i++)
 gdb> print i
 $30 = 0xd44acbee

This is 100% reproducible with some autonuma development code tuned in
a very aggressive manner (not normal way even for knumad) which does
"exotic" changes to the ptes. It wouldn't normally trigger but I don't
see why it can't happen normally if the page is added to swap cache in
between the two faults leading to "copied" being zero (which then
hangs in ext4). So it should be fixed. Especially possible with lumpy
reclaim (albeit disabled if compaction is enabled) as that would
ignore the young bits in the ptes.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

ceph: add missing spin_unlock at ceph_mdsc_build_path() · 9d5a09e6

Committed by Yehuda Sadeh on Dec 13, 2011

one of the paths was missing spin_unlock
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>

configfs: register_filesystem() called too early · 7c6455e3

Committed by Al Viro on Dec 13, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fuse: register_filesystem() called too early · 988f0325

Committed by Al Viro on Dec 13, 2011

same story as with ubifs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ubifs: too early register_filesystem() · 5cc361e3

Committed by Al Viro on Dec 12, 2011

doing that before you are ready to handle mount() is a Bad Idea(tm)...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

openeuler / Kernel · synced successfully about 1 year ago