Commits · dc89e9824464e91fa0b06267864ceabe3186fd8b · bug2833 / cloud-kernel

18 Mar, 2011 5 commits

Btrfs: use a slab for the free space entries · dc89e982

Authored by Josef Bacik on Jan 28, 2011

Since we alloc/free free space entries a whole lot, let's use a slab to keep
track of them. This makes some of my tests slightly faster. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: change reserved_extents to an atomic_t · 57a45ced

Authored by Josef Bacik on Jan 25, 2011

We track delayed allocation per inode via 2 counters: outstanding_extents
and reserved_extents. Outstanding_extents is already an
atomic_t, but reserved_extents is not and is protected by a spinlock. So
convert this to an atomic_t and instead of using a spinlock, use atomic_cmpxchg
when releasing delalloc bytes. This makes our inode 72 bytes smaller, and
reduces locking overhead (albeit it was minimal to begin with). Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
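
The lock-free release described above can be illustrated in user space with C11 atomics. This is only a sketch of the compare-and-exchange retry pattern, not the btrfs code: the helper name drop_reserved and its clamping behaviour are assumptions made for illustration, and C11 atomic_compare_exchange_weak stands in for the kernel's atomic_cmpxchg.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical helper: drop up to `want` reservations from an atomic
 * counter without taking a lock, never letting the counter go below zero.
 * `want` is assumed to be non-negative. Returns how many were dropped.
 */
static int drop_reserved(atomic_int *reserved, int want)
{
	int old = atomic_load(reserved);
	int take;

	do {
		/* Recompute from the freshest value seen so far. */
		take = old < want ? old : want;
		if (take == 0)
			return 0;
		/* On failure, `old` is refreshed with the current value and we retry. */
	} while (!atomic_compare_exchange_weak(reserved, &old, old - take));

	return take;
}
```

The update only lands if nobody else changed the counter between the read and the write, which is what makes the spinlock unnecessary for this one field.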

Btrfs: fix how we deal with the pages array in the write path · 4a64001f

Authored by Josef Bacik on Jan 25, 2011

Really we don't need to memset the pages array at all, since we know how many
pages we're going to use in the array and pass that around. So don't memset,
just trust we're not idiots and we pass num_pages around properly.
Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: simplify our write path · d0215f3e

Authored by Josef Bacik on Jan 25, 2011

Our aio_write function is huge and kind of hard to follow at times. So this
patch fixes this by breaking the buffered and direct write paths out into
separate functions so it's a little clearer what's going on. I've also fixed
some wrong typing that we had and added the ability to handle getting an error
back from btrfs_set_extent_delalloc. Tested this with xfstests and everything
came out fine. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: fix formatting in file.c · 9f570b8d

Authored by Josef Bacik on Jan 25, 2011

Sorry, but these were bugging me.  Just clean up some of the formatting in
file.c.
Signed-off-by: Josef Bacik <josef@redhat.com>

15 Mar, 2011 1 commit

Fix corrupted OSF partition table parsing · 1eafbfeb

Authored by Timo Warns on Mar 14, 2011

The kernel automatically evaluates partition tables of storage devices.
The code for evaluating OSF partitions contains a bug that leaks data
from kernel heap memory to userspace for certain corrupted OSF
partitions.

In more detail:

  for (i = 0 ; i < le16_to_cpu(label->d_npartitions); i++, partition++) {

iterates from 0 to d_npartitions - 1, where d_npartitions is read from
the partition table without validation and partition is a pointer to an
array of at most 8 d_partitions.

Add the proper and obvious validation.
Signed-off-by: Timo Warns <warns@pre-sense.de>
Cc: stable@kernel.org
[ Changed the patch trivially to not repeat the whole le16_to_cpu()
  thing, and to use an explicit constant for the magic value '8' ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
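
The validation described above can be sketched in plain C. This is an illustration under stated assumptions, not the actual patch: the struct layouts are simplified, the names MAX_OSF_PARTITIONS and count_osf_partitions are chosen here for the example, the endianness conversion is omitted, and rejecting an oversized label outright is one possible policy.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_OSF_PARTITIONS 8	/* capacity of the on-disk d_partitions array */

struct osf_partition {		/* simplified for illustration */
	uint32_t p_size;
	uint32_t p_offset;
};

struct osf_label {		/* simplified for illustration */
	uint16_t d_npartitions;	/* untrusted: comes straight from the disk */
	struct osf_partition d_partitions[MAX_OSF_PARTITIONS];
};

/*
 * Count non-empty partitions. Returns -1 if the label claims more
 * entries than the array can hold, so the loop never reads past it.
 */
static int count_osf_partitions(const struct osf_label *label)
{
	unsigned int npartitions = label->d_npartitions;
	int used = 0;

	if (npartitions > MAX_OSF_PARTITIONS)
		return -1;

	for (unsigned int i = 0; i < npartitions; i++)
		if (label->d_partitions[i].p_size != 0)
			used++;

	return used;
}
```

Reading the count once into a local and comparing it against a named constant mirrors the two changes mentioned in the bracketed note.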

14 Mar, 2011 1 commit

compat breakage in preadv() and pwritev() · c44ed965

Authored by Al Viro on Mar 13, 2011

Fix for a dumb preadv()/pwritev() compat bug - unlike the native
variants, the compat_...  ones forget to check FMODE_P{READ,WRITE}, so
e.g.  on pipe the native preadv() will fail with -ESPIPE and compat one
will act as readv() and succeed.

Not critical, but it's a clear bug with trivial fix, so IMO it's OK for
-final.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
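
The native behaviour mentioned above can be observed from user space. A minimal sketch, assuming a Linux system where preadv() is available; the helper name preadv_errno_on_pipe is made up for this example, and it exercises only the native syscall, not the compat path that the patch fixes.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Call preadv() on the read end of a fresh pipe and report the errno it
 * fails with. Returns 0 if the call unexpectedly succeeds and -1 if the
 * pipe could not be set up.
 */
static int preadv_errno_on_pipe(void)
{
	int fds[2];
	char buf[8];
	struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
	int err = 0;

	if (pipe(fds) != 0)
		return -1;

	/* Put one byte in the pipe so a plain readv() would have data. */
	if (write(fds[1], "x", 1) != 1) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	/* A pipe is not seekable, so a positional read should be refused. */
	if (preadv(fds[0], &iov, 1, 0) < 0)
		err = errno;

	close(fds[0]);
	close(fds[1]);
	return err;
}
```

On a kernel with the native check in place, the helper is expected to return ESPIPE.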

12 Mar, 2011 7 commits

Btrfs: break out of shrink_delalloc earlier · 36e39c40

Authored by Chris Mason on Mar 12, 2011

Josef had changed shrink_delalloc to exit after three shrink
attempts, which wasn't quite enough because new writers could
race in and steal free space.

But it also fixed deadlocks and stalls as we tried to recover
delalloc reservations.  The code was tweaked to loop 1024
times, and would reset the counter any time a small amount
of progress was made.  This was too drastic, and with a
lot of writers we can end up stuck in shrink_delalloc forever.

The shrink_delalloc loop is fairly complex because the caller is looping
too, and the caller will go ahead and force a transaction commit to make
sure we reclaim space.

This reworks things to exit shrink_delalloc when we've forced some
writeback and the delalloc reservations have gone down.  This means
the writeback has not just started but has also finished at
least some of the metadata changes required to reclaim delalloc
space.

If we've got this wrong, we're returning ENOSPC too early, which
is a big improvement over the current behavior of hanging the machine.

Test 224 in xfstests hammers on this nicely, and with 1000 writers
trying to fill a 1GB drive we get our first ENOSPC at 93% full.  The
other writers are able to continue until we get 100%.

This is a worst case test for btrfs because the 1000 writers are doing
small IO, and the small FS size means we don't have a lot of room
for metadata chunks.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

NFS: NFSROOT should default to "proto=udp" · 53d47375

Authored by Chuck Lever on Mar 11, 2011

There have been a number of recent reports that NFSROOT is no longer
working with default mount options, but fails only with certain NICs.

Brian Downing <bdowning@lavos.net> bisected to commit 56463e50 "NFS:
Use super.c for NFSROOT mount option parsing".  Among other things,
this commit changes the default mount options for NFSROOT to use TCP
instead of UDP as the underlying transport.

TCP seems less able to deal with NICs that are slow to initialize.
The system logs that have accompanied reports of problems all show
that NFSROOT attempts to establish a TCP connection before the NIC is
fully initialized, and thus the TCP connection attempt fails.

When a TCP connection attempt fails during a mount operation, the
NFS stack needs to fail the operation.  Usually user space knows how
and when to retry it.  The network layer does not report a distinct
error code for this particular failure mode.  Thus, there isn't a
clean way for the RPC client to see that it needs to retry in this
case, but not in others.

Because NFSROOT is used in some environments where it is not possible
to update the kernel command line to specify "udp", the proper thing
to do is change NFSROOT to use UDP by default, as it did before commit
56463e50.

To make it easier to see how to change default mount options for
NFSROOT and to distinguish default settings from mandatory settings,
I've adjusted a couple of areas to document the specifics.

root_nfs_cat() is also modified to deal with commas properly when
concatenating strings containing mount option lists.  This keeps
root_nfs_cat() call sites simpler, now that we may be concatenating
multiple mount option strings.
Tested-by: Brian Downing <bdowning@lavos.net>
Tested-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: <stable@kernel.org> # 2.6.37
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

nfs4: remove duplicated #include · 57df216b

Authored by Huang Weiyi on Mar 08, 2011

Remove duplicated #include('s) in
  fs/nfs/nfs4proc.c
Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4: nfs4_state_mark_reclaim_nograce() should be static · f9feab1e

Authored by Trond Myklebust on Mar 09, 2011

There are no more external users of nfs4_state_mark_reclaim_nograce() or
nfs4_state_mark_reclaim_reboot(), so mark them as static.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4: Fix the setlk error handler · ecac799a

Authored by Trond Myklebust on Mar 09, 2011

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4.1: Fix the handling of the SEQUENCE status bits · b4410c2f

Authored by Trond Myklebust on Mar 09, 2011

We want SEQUENCE status bits to be handled by the state manager in order
to avoid threading issues.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4/4.1: Fix nfs4_schedule_state_recovery abuses · 0400a6b0

Authored by Trond Myklebust on Mar 09, 2011

nfs4_schedule_state_recovery() should only be used when we need to force
the state manager to check the lease. If we just want to start the
state manager in order to handle a state recovery situation, we should be
using nfs4_schedule_state_manager().

This patch fixes the abuses of nfs4_schedule_state_recovery() by replacing
its use with a set of helper functions that do the right thing.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

11 Mar, 2011 10 commits

NFSv4.1 reclaim complete must wait for completion · c34c32ea

Authored by Andy Adamson on Mar 09, 2011

Signed-off-by: Andy Adamson <andros@netapp.com>
[Trond: fix whitespace errors]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4: remove duplicate clientid in struct nfs_client · 114f64b5

Authored by Andy Adamson on Mar 09, 2011

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4.1: Retry CREATE_SESSION on NFS4ERR_DELAY · 7d6d63d6

Authored by Ricardo Labiaga on Mar 09, 2011

Fix bug where we currently retry the EXCHANGEID call again, even though
we already have a valid clientid.  Instead, delay and retry the CREATE_SESSION
call.
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

(try3-resend) Fix nfs_compat_user_ino64 so it doesn't cause problems if bit 31... · 3fa0b4e2

Authored by Frank Filz on Dec 02, 2010

(try3-resend) Fix nfs_compat_user_ino64 so it doesn't cause problems if bit 31 or 63 are set in fileid

The problem was use of an int32, which when converted to a uint64
is sign extended resulting in a fileid that doesn't fit in 32 bits
even though the intent of the function is to fit the fileid into
32 bits.
Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
[Trond: Added an include for compat.h]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
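
The sign-extension problem described above is easy to reproduce in isolation. The sketch below is not the kernel function: fold_signed and fold_unsigned are hypothetical helpers that fold a 64-bit fileid into 32 bits by XOR-ing its halves, once through a signed intermediate and once through an unsigned one. It assumes the usual two's-complement wrap when converting to int32_t (implementation-defined in C, but what GCC and Clang do).

```c
#include <assert.h>
#include <stdint.h>

/* Buggy variant: the 32-bit intermediate is signed. */
static uint64_t fold_signed(uint64_t fileid)
{
	int32_t ino = (int32_t)(uint32_t)fileid;	/* low half, as signed */

	ino ^= (int32_t)(uint32_t)(fileid >> 32);	/* mix in the high half */
	return (uint64_t)ino;	/* sign-extends when bit 31 is set */
}

/* Fixed variant: an unsigned intermediate is zero-extended instead. */
static uint64_t fold_unsigned(uint64_t fileid)
{
	uint32_t ino = (uint32_t)fileid;

	ino ^= (uint32_t)(fileid >> 32);
	return ino;
}
```

For a fileid of 0x80000000 the signed variant yields 0xFFFFFFFF80000000, which no longer fits in 32 bits, while the unsigned variant yields 0x80000000.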

nfs: fix compilation warning · 43b7c3f0

Authored by Jovi Zhang on Mar 02, 2011

This commit fixes the following compilation warning:
linux-2.6/fs/nfs/nfs4proc.c:3265: warning: comparison of distinct pointer types lacks a cast
Signed-off-by: Jovi Zhang <bookjovi@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

nfs: add kmalloc return value check in decode_and_add_ds · b9f81057

Authored by Stanislav Fomichev on Feb 05, 2011

add kmalloc return value check in decode_and_add_ds
Signed-off-by: Stanislav Fomichev <kernel@fomichev.me>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

nfs: close NFSv4 COMMIT vs. CLOSE race · d2224e7a

Authored by Jeff Layton on Mar 06, 2011

I've been adding in more artificial delays in the NFSv4 commit and close
codepaths to uncover races. The kernel I'm testing has the patch to
close the race in __rpc_wait_for_completion_task that's in Trond's
cthon2011 branch. The reproducer I've been using does this in a loop:

	mkdir("DIR");
	fd = open("DIR/FILE", O_WRONLY|O_CREAT|O_EXCL, 0644);
	write(fd, "abcdefg", 7);
	close(fd);
	unlink("DIR/FILE");
	rmdir("DIR");

The above reproducer shouldn't result in any silly-renaming. However,
when I add a "msleep(100)" just after the nfs_commit_clear_lock call in
nfs_commit_release, I can almost always force one to occur. If I can
force it to occur with that, then it can happen without that delay
given the right timing.

nfs_commit_inode waits for the NFS_INO_COMMIT bit to clear when called
with FLUSH_SYNC set. nfs_commit_rpcsetup on the other hand does not wait
for the task to complete before putting its reference to it, so the last
reference gets put in rpc_release task and gets queued to a workqueue.

In this situation, the last open context reference may be put by the
COMMIT release instead of the close() syscall. The close() syscall
returns too quickly and the unlink runs while the d_count is still
high since the COMMIT release hasn't put its dentry reference yet.

Fix this by having nfs_commit_rpcsetup wait for the RPC call to complete
before putting the task reference when FLUSH_SYNC is set. With this, the
last reference is put by the process that's initiating the FLUSH_SYNC
commit and the race is closed.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Close a race in __rpc_wait_for_completion_task() · bf294b41

Authored by Trond Myklebust on Feb 21, 2011

Although they run as rpciod background tasks, under normal operation
(i.e. no SIGKILL), functions like nfs_sillyrename(), nfs4_proc_unlck()
and nfs4_do_close() want to be fully synchronous. This means that when we
exit, we want all references to the rpc_task to be gone, and we want
any dentry references etc. held by that task to be released.

For this reason these functions call __rpc_wait_for_completion_task(),
followed by rpc_put_task() in the expectation that the latter will be
releasing the last reference to the rpc_task, and thus ensuring that the
callback_ops->rpc_release() has been called synchronously.

This patch fixes a race which exists due to the fact that
rpciod calls rpc_complete_task() (in order to wake up the callers of
__rpc_wait_for_completion_task()) and then subsequently calls
rpc_put_task() without ensuring that these two steps are done atomically.

In order to avoid adding new spin locks, the patch uses the existing
waitqueue spin lock to order the rpc_task reference count releases between
the waiting process and rpciod.
The common case where nobody is waiting for completion is optimised for by
checking if the RPC_TASK_ASYNC flag is cleared and/or if the rpc_task
reference count is 1: in those cases we drop trying to grab the spin lock,
and immediately free up the rpc_task.

Those few processes that need to put the rpc_task from inside an
asynchronous context and that do not care about ordering are given a new
helper: rpc_put_task_async().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

btrfs: fix not enough reserved space · 7e6b6465

Authored by Miao Xie on Feb 18, 2011

btrfs_link() will insert 3 items (inode ref, dir name item and dir index item)
into the b+ tree and update 2 items (its inode, and parent's inode) in the b+
tree. So we should reserve space for these 5 items, not 3 items.
Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

btrfs: fix dip leak · b4966b77

Authored by Daniel J Blueman on Mar 09, 2011

The btrfs DIO code leaks dip structs when dip->csums allocation
fails; bio->bi_end_io isn't set at the point where the free_ordered
branch is consequently taken, thus bio_endio doesn't call the function
which would free it in the normal case. Fix.
Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com>
Acked-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

10 Mar, 2011 10 commits

fs/dcache: allow d_obtain_alias() to return unhashed dentries · d891eedb

Authored by J. Bruce Fields on Jan 18, 2011

Without this patch, inodes are not promptly freed on last close of an
unlinked file by an nfs client:

	client$ mount -tnfs4 server:/export/ /mnt/
	client$ tail -f /mnt/FOO
	...
	server$ df -i /export
	server$ rm /export/FOO
	(^C the tail -f)
	server$ df -i /export
	server$ echo 2 >/proc/sys/vm/drop_caches
	server$ df -i /export

the df's will show that the inode is not freed on the filesystem until
the last step, when it could have been freed after killing the client's
tail -f. On-disk data won't be deallocated either, leading to possible
spurious ENOSPC.

This occurs because when the client does the close, it arrives in a
compound with a putfh and a close, processed like:

	- putfh: look up the filehandle.  The only alias found for the
	  inode will be the DCACHE_UNHASHED alias referenced by the filp;
	  d_obtain_alias() will not return this, so it creates a new
	  DCACHE_DISCONECTED dentry and returns that instead.
	- close: closes the existing filp, which is destroyed
	  immediately by dput() since it's DCACHE_UNHASHED.
	- end of the compound: release the reference
	  to the current filehandle, and dput() the new
	  DCACHE_DISCONECTED dentry, which gets put on the
	  unused list instead of being destroyed immediately.

Nick Piggin suggested fixing this by allowing d_obtain_alias to return
the unhashed dentry that is referenced by the filp, instead of making it
create a new dentry.

Leave __d_find_alias() alone to avoid changing behavior of other
callers.

Also nfsd doesn't need all the checks of __d_find_alias(); any dentry,
hashed or unhashed, disconnected or not, should work.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Check for immutable/append flag in fallocate path · 1ca551c6

Authored by Marco Stornelli on Mar 05, 2011

In the fallocate path the kernel doesn't check for the immutable/append
flag. It's possible to have a race condition in this scenario: an
application opens a file in read/write mode and does something; meanwhile
root sets the immutable flag on the file, and the application at that point
can still call fallocate with success. In addition, on an append-only file
we allow only the reserve operation, not any unreserve operation.
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fat: fix d_revalidate oopsen on NFS exports · 9177ada9

Authored by Al Viro on Mar 10, 2011

can't blindly check nd->flags in ->d_revalidate()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

jfs: fix d_revalidate oopsen on NFS exports · 8ce84eeb

Authored by Al Viro on Mar 10, 2011

can't blindly check nd->flags in ->d_revalidate()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ocfs2: fix d_revalidate oopsen on NFS exports · 4714e637

Authored by Al Viro on Mar 10, 2011

can't blindly check nd->flags in ->d_revalidate()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

gfs2: fix d_revalidate oopsen on NFS exports · 53fe9241

Authored by Al Viro on Mar 10, 2011

can't blindly check nd->flags in ->d_revalidate()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fuse: fix d_revalidate oopsen on NFS exports · 529c5f95

Authored by Al Viro on Mar 10, 2011

can't blindly check nd->flags in ->d_revalidate()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ceph: fix d_revalidate oopsen on NFS exports · 0eb980e3

Authored by Al Viro on Mar 10, 2011

can't blindly check nd->flags in ->d_revalidate()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

reiserfs xattr ->d_revalidate() shouldn't care about RCU · c78f4cc5

Authored by Al Viro on Feb 16, 2011

... it returns an error unconditionally
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

/proc/self is never going to be invalidated... · ae50adcb

Authored by Al Viro on Feb 16, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

09 Mar, 2011 3 commits

nd->inode is not set on the second attempt in path_walk() · b306419a

Authored by Al Viro on Mar 08, 2011

We leave it at whatever it had been pointing to after the
first link_path_walk() had failed with -ESTALE.  Things
do not work well after that...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

nfsd: wrong index used in inner loop · 3ec07aa9

Authored by roel on Mar 08, 2011

Index i was already used in the outer loop

Cc: stable@kernel.org
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

Btrfs: make sure not to return overlapping extents to fiemap · ea8efc74

Authored by Chris Mason on Mar 08, 2011

The btrfs fiemap code was incorrectly returning duplicate or overlapping
extents in some cases.  cp was blindly trusting this result and we would
end up with a destination file that was bigger than the original because
some bytes were copied twice.

The fix here adjusts our offsets to make sure we're always moving
forward in the fiemap results.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
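
The idea of always moving forward through the results can be illustrated independently of btrfs. The sketch below is a swapped-in, simplified technique rather than the actual fix: struct extent and trim_overlaps are hypothetical, and the input is assumed to be ordered by start offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct extent {		/* hypothetical: a byte range [start, start + len) */
	uint64_t start;
	uint64_t len;
};

/*
 * Trim the array in place so that no byte is reported twice.
 * Returns the number of extents kept.
 */
static size_t trim_overlaps(struct extent *ext, size_t n)
{
	uint64_t next = 0;	/* first byte not yet reported */
	size_t out = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t start = ext[i].start;
		uint64_t end = start + ext[i].len;

		if (end <= next)
			continue;	/* fully covered already: a duplicate */
		if (start < next)
			start = next;	/* partial overlap: skip the covered part */

		ext[out].start = start;
		ext[out].len = end - start;
		out++;
		next = end;
	}
	return out;
}
```

A consumer that copies each reported range, as cp did, then copies every byte at most once.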

08 Mar, 2011 3 commits

unfuck proc_sysctl ->d_compare() · dfef6dcd

Authored by Al Viro on Mar 08, 2011

a) struct inode is not going to be freed under ->d_compare();
however, the thing PROC_I(inode)->sysctl points to just might.
Fortunately, it's enough to make freeing that sucker delayed,
provided that we don't step on its ->unregistering, clear
the pointer to it in PROC_I(inode) before dropping the reference
and check if it's NULL in ->d_compare().

b) I'm not sure that we *can* walk into NULL inode here (we recheck
dentry->seq between verifying that it's still hashed / fetching
dentry->d_inode and passing it to ->d_compare() and there's no
negative hashed dentries in /proc/sys/*), but if we can walk into
that, we really should not have ->d_compare() return 0 on it!
Said that, I really suspect that this check can be simply killed.
Nick?
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

nfsd4: fix bad pointer on failure to find delegation · 32b007b4

Authored by J. Bruce Fields on Mar 06, 2011

In case of a nonempty list, the return on error here is obviously bogus;
it ends up being a pointer to the list head instead of to any valid
delegation on the list.

In particular, if nfsd4_delegreturn() hits this case, and you're quite unlucky,
then renew_client may oops, and it may take an embarassingly long time to
figure out why.  Facepalm.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000090
IP: [<ffffffff81292965>] nfsd4_delegreturn+0x125/0x200
...

Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
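
The pitfall described above comes from circular intrusive lists: when a search loop runs off the end, the cursor points at the list head, which is not embedded in any real entry. A minimal user-space sketch of a lookup that keeps the two cases apart follows; struct deleg, deleg_of, list_append and find_deleg are hypothetical stand-ins, not the nfsd code.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

struct deleg {			/* hypothetical stand-in for a delegation */
	int id;
	struct list_head link;
};

/* Recover the containing entry from its embedded list node. */
#define deleg_of(p) \
	((struct deleg *)((char *)(p) - offsetof(struct deleg, link)))

/* Insert `node` just before `head`, i.e. at the tail of the list. */
static void list_append(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/*
 * Return the entry with the given id, or NULL. The cursor is only
 * converted to an entry while it is known not to be the head, so the
 * "not found" case can never leak a pointer derived from the head.
 */
static struct deleg *find_deleg(struct list_head *head, int id)
{
	for (struct list_head *pos = head->next; pos != head; pos = pos->next) {
		struct deleg *dp = deleg_of(pos);

		if (dp->id == id)
			return dp;
	}
	return NULL;
}
```

Returning the loop cursor itself after the loop ends is what produces the bogus pointer on a nonempty list.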

Btrfs: deal with short returns from copy_from_user · 31339acd

Authored by Chris Mason on Mar 07, 2011

When copy_from_user is only able to copy some of the bytes we requested,
we may end up creating a partially up to date page.  To avoid garbage in
the page, we need to treat a partial copy as a zero length copy.

This makes the rest of the file_write code drop the page and
retry the whole copy instead of marking the partially up to
date page as dirty.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
cc: stable@kernel.org
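
The all-or-nothing rule described above can be sketched in user space. This is an illustration only: copy_some is a hypothetical stand-in for copy_from_user() (it returns the number of bytes it could not copy), and copy_into_page is a made-up wrapper showing how a short copy is reported as zero bytes so that the caller retries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for copy_from_user(): only the first `readable`
 * bytes of `src` can be read. Returns the number of bytes NOT copied.
 */
static size_t copy_some(char *dst, const char *src, size_t len,
			size_t readable)
{
	size_t n = len < readable ? len : readable;

	memcpy(dst, src, n);
	return len - n;
}

/*
 * Returns how many bytes the caller may treat as written to the page:
 * either the full length or nothing at all.
 */
static size_t copy_into_page(char *page, const char *src, size_t len,
			     size_t readable)
{
	size_t left = copy_some(page, src, len, readable);

	/* A partial copy leaves the page half-filled, so count it as zero. */
	return left ? 0 : len;
}
```

A zero return tells the caller that nothing in the page can be trusted yet, so it can drop the page and retry instead of marking it dirty.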
