Commits · d6fb79d433d0a34c36bdf74eaf90857193a6261f · openeuler / raspberrypi-kernel

12 Mar, 2011 16 commits

NFSv4.1: new flag for lease time check · d6fb79d4

Authored by Andy Adamson on Mar 01, 2011

Data servers cannot send nfs4_proc_get_lease_time, but still need to set up
state renewal. Add the NFS_CS_CHECK_LEASE_TIME bit to indicate if the lease
time can be checked.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4.1: new flag for state renewal check · d3b4c9d7

Authored by Andy Adamson on Mar 01, 2011

Data servers not sharing a session with the mount MDS always have an empty
cl_superblocks list.
Replace the empty-list check on cl_superblocks, which decides whether it is
time to shut down renewd, with the NFS_CS_STOP_RENEW bit, which is not set by
such a data server.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4.1: send zero stateid seqid on v4.1 i/o · 89d1ea65

Authored by Andy Adamson on Mar 01, 2011

Data servers require a zero stateid seqid, and there is no advantage to not
doing the same for all NFSv4.1.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS move nfs_client initialization into nfs_get_client · 45a52a02

Authored by Andy Adamson on Mar 01, 2011

Now nfs_get_client returns an nfs_client ready to be used no matter if it was
found or created.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4.1: put_layout_hdr can remove nfsi->layout · bf9c1387

Authored by Andy Adamson on Mar 01, 2011

Prevents an Oops triggered by a CB_LAYOUTRECALL and LAYOUTGET race on a
pnfs_layout_hdr's first pnfs_layout_segment.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: change nfs_writeback_done to return void · 13602896

Authored by Fred Isaman on Feb 11, 2011

The return values are not used by any callers.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: remove pointless if statement in nfs_direct_write_result · 83762c56

Authored by Fred Isaman on Feb 11, 2011

The code was doing nothing more in either branch of the if.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

pnfs: fix pnfs lock inversion of i_lock and cl_lock · f49f9baa

Authored by Fred Isaman on Feb 03, 2011

Throughout, the pnfs code was using the lock order i_lock, cl_lock.
This conflicts with the nfs delegation code. Rework the pnfs code
to avoid taking both locks simultaneously.

Currently the code takes the double lock to add/remove the layout to a
nfs_client list, while atomically checking that the list of lsegs is
empty. To avoid this, we rely on existing serializations. When a
layout is initialized with lseg count equal to zero, LAYOUTGET's
openstateid serialization is in effect, making it safe to assume it
stays zero unless we change it. And once a layout's lseg count drops
to zero, it is set as DESTROYED and so will stay at zero.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

pnfs: do not need to clear NFS_LAYOUT_BULK_RECALL flag · 9f52c252

Authored by Fred Isaman on Feb 03, 2011

We do not need to clear the NFS_LAYOUT_BULK_RECALL flag, as setting it
guarantees that NFS_LAYOUT_DESTROYED will be set once any outstanding
io is finished.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

pnfs: avoid incorrect use of layout stateid · 38511722

Authored by Fred Isaman on Feb 03, 2011

The code could violate the following from RFC5661, section 12.5.3:
"Once a client has no more layouts on a file, the layout stateid is no
longer valid and MUST NOT be used."

This can occur when a layout already has a lseg, starts another
non-overlapping LAYOUTGET, and a CB_LAYOUTRECALL for the existing lseg
is processed before we hit pnfs_layout_process().

Solve by setting, each time the client has no more lsegs for a file, a
flag which blocks further use of the layout and triggers its removal.

This also fixes a second bug which occurs in the same instance as
above.  If we actually use pnfs_layout_process, we add the new lseg to
the layout, but the layout has been removed from the nfs_client list
by the intervening CB_LAYOUTRECALL and will not be added back.  Thus
the newly acquired lseg will not be properly returned in the event of
a subsequent CB_LAYOUTRECALL.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: NFSROOT should default to "proto=udp" · 53d47375

Authored by Chuck Lever on Mar 11, 2011

There have been a number of recent reports that NFSROOT is no longer
working with default mount options, but fails only with certain NICs.

Brian Downing <bdowning@lavos.net> bisected to commit 56463e50 "NFS:
Use super.c for NFSROOT mount option parsing".  Among other things,
this commit changes the default mount options for NFSROOT to use TCP
instead of UDP as the underlying transport.

TCP seems less able to deal with NICs that are slow to initialize.
The system logs that have accompanied reports of problems all show
that NFSROOT attempts to establish a TCP connection before the NIC is
fully initialized, and thus the TCP connection attempt fails.

When a TCP connection attempt fails during a mount operation, the
NFS stack needs to fail the operation.  Usually user space knows how
and when to retry it.  The network layer does not report a distinct
error code for this particular failure mode.  Thus, there isn't a
clean way for the RPC client to see that it needs to retry in this
case, but not in others.

Because NFSROOT is used in some environments where it is not possible
to update the kernel command line to specify "udp", the proper thing
to do is change NFSROOT to use UDP by default, as it did before commit
56463e50.

To make it easier to see how to change default mount options for
NFSROOT and to distinguish default settings from mandatory settings,
I've adjusted a couple of areas to document the specifics.

root_nfs_cat() is also modified to deal with commas properly when
concatenating strings containing mount option lists.  This keeps
root_nfs_cat() call sites simpler, now that we may be concatenating
multiple mount option strings.
Tested-by: Brian Downing <bdowning@lavos.net>
Tested-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: <stable@kernel.org> # 2.6.37
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
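
The comma handling described for root_nfs_cat() can be illustrated with a small userspace sketch. This is not the kernel function: the name `opts_cat`, its signature, and its return convention are assumptions made for illustration. It appends one mount option list to another and inserts a separating comma only when the destination is non-empty and does not already end in one.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace helper (not the kernel's root_nfs_cat()).
 * Appends src to the option list in dest, adding a ',' separator when
 * dest is non-empty and does not already end with one.
 * Returns 0 on success, -1 if the result would not fit in destlen bytes
 * (in which case dest is left unchanged).
 */
int opts_cat(char *dest, const char *src, size_t destlen)
{
	size_t len = strlen(dest);
	size_t srclen = strlen(src);
	size_t sep = (len > 0 && dest[len - 1] != ',') ? 1 : 0;

	if (srclen == 0)
		return 0;	/* nothing to append, so no separator either */

	/* existing text + optional comma + new text + terminating NUL */
	if (len + sep + srclen + 1 > destlen)
		return -1;

	if (sep)
		dest[len++] = ',';
	memcpy(dest + len, src, srclen + 1);	/* copies the NUL too */
	return 0;
}
```

With a helper of this shape, a caller can append a default string such as "udp" and then a user-supplied list without tracking whether a comma is needed at each call site.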

nfs4: remove duplicated #include · 57df216b

Authored by Huang Weiyi on Mar 08, 2011

Remove duplicated #include('s) in
  fs/nfs/nfs4proc.c
Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4: nfs4_state_mark_reclaim_nograce() should be static · f9feab1e

Authored by Trond Myklebust on Mar 09, 2011

There are no more external users of nfs4_state_mark_reclaim_nograce() or
nfs4_state_mark_reclaim_reboot(), so mark them as static.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4: Fix the setlk error handler · ecac799a

Authored by Trond Myklebust on Mar 09, 2011

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4.1: Fix the handling of the SEQUENCE status bits · b4410c2f

Authored by Trond Myklebust on Mar 09, 2011

We want SEQUENCE status bits to be handled by the state manager in order
to avoid threading issues.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4/4.1: Fix nfs4_schedule_state_recovery abuses · 0400a6b0

Authored by Trond Myklebust on Mar 09, 2011

nfs4_schedule_state_recovery() should only be used when we need to force
the state manager to check the lease. If we just want to start the
state manager in order to handle a state recovery situation, we should be
using nfs4_schedule_state_manager().

This patch fixes the abuses of nfs4_schedule_state_recovery() by replacing
its use with a set of helper functions that do the right thing.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

11 Mar, 2011 8 commits

NFSv4.1 reclaim complete must wait for completion · c34c32ea

Authored by Andy Adamson on Mar 09, 2011

Signed-off-by: Andy Adamson <andros@netapp.com>
[Trond: fix whitespace errors]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4: remove duplicate clientid in struct nfs_client · 114f64b5

Authored by Andy Adamson on Mar 09, 2011

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4.1: Retry CREATE_SESSION on NFS4ERR_DELAY · 7d6d63d6

Authored by Ricardo Labiaga on Mar 09, 2011

Fix bug where we currently retry the EXCHANGEID call again, even though
we already have a valid clientid.  Instead, delay and retry the CREATE_SESSION
call.
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

(try3-resend) Fix nfs_compat_user_ino64 so it doesn't cause problems if bit 31... · 3fa0b4e2

Authored by Frank Filz on Dec 02, 2010

(try3-resend) Fix nfs_compat_user_ino64 so it doesn't cause problems if bit 31 or 63 are set in fileid

The problem was use of an int32, which when converted to a uint64
is sign extended resulting in a fileid that doesn't fit in 32 bits
even though the intent of the function is to fit the fileid into
32 bits.
Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
[Trond: Added an include for compat.h]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
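
The sign-extension problem described above can be reproduced in userspace. The sketch below is not the kernel's nfs_compat_user_ino64(); the function names are hypothetical, and the XOR fold of the upper and lower halves is an assumption about how a 64-bit fileid might be squeezed into 32 bits. It only shows why a signed 32-bit intermediate breaks the "fits in 32 bits" guarantee when bit 31 of the folded value is set.

```c
#include <stdint.h>

/*
 * Buggy variant (illustrative): the folded value is held in a signed
 * 32-bit integer. Converting a negative int32_t to uint64_t sign-extends,
 * so the upper 32 bits of the result become all ones.
 * The narrowing conversion to int32_t is implementation-defined in C;
 * mainstream compilers wrap modulo 2^32, which is assumed here.
 */
uint64_t fold_ino_signed(uint64_t fileid)
{
	int32_t ino = (int32_t)(uint32_t)(fileid ^ (fileid >> 32));

	return (uint64_t)ino;
}

/*
 * Fixed variant (illustrative): an unsigned 32-bit intermediate is
 * zero-extended, so the result always fits in 32 bits.
 */
uint64_t fold_ino_unsigned(uint64_t fileid)
{
	uint32_t ino = (uint32_t)fileid;

	ino ^= (uint32_t)(fileid >> 32);
	return ino;
}
```

For a fileid of 0x80000000 the signed variant returns 0xFFFFFFFF80000000, while the unsigned variant returns 0x80000000.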

nfs: fix compilation warning · 43b7c3f0

Authored by Jovi Zhang on Mar 02, 2011

This commit fixes the following compilation warning:
linux-2.6/fs/nfs/nfs4proc.c:3265: warning: comparison of distinct pointer types lacks a cast
Signed-off-by: Jovi Zhang <bookjovi@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

nfs: add kmalloc return value check in decode_and_add_ds · b9f81057

Authored by Stanislav Fomichev on Feb 05, 2011

Add a kmalloc return value check in decode_and_add_ds.
Signed-off-by: Stanislav Fomichev <kernel@fomichev.me>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

nfs: close NFSv4 COMMIT vs. CLOSE race · d2224e7a

Authored by Jeff Layton on Mar 06, 2011

I've been adding in more artificial delays in the NFSv4 commit and close
codepaths to uncover races. The kernel I'm testing has the patch to
close the race in __rpc_wait_for_completion_task that's in Trond's
cthon2011 branch. The reproducer I've been using does this in a loop:

	mkdir("DIR");
	fd = open("DIR/FILE", O_WRONLY|O_CREAT|O_EXCL, 0644);
	write(fd, "abcdefg", 7);
	close(fd);
	unlink("DIR/FILE");
	rmdir("DIR");

The above reproducer shouldn't result in any silly-renaming. However,
when I add a "msleep(100)" just after the nfs_commit_clear_lock call in
nfs_commit_release, I can almost always force one to occur. If I can
force it to occur with that, then it can happen without that delay
given the right timing.

nfs_commit_inode waits for the NFS_INO_COMMIT bit to clear when called
with FLUSH_SYNC set. nfs_commit_rpcsetup on the other hand does not wait
for the task to complete before putting its reference to it, so the last
reference gets put in rpc_release task and gets queued to a workqueue.

In this situation, the last open context reference may be put by the
COMMIT release instead of the close() syscall. The close() syscall
returns too quickly and the unlink runs while the d_count is still
high since the COMMIT release hasn't put its dentry reference yet.

Fix this by having nfs_commit_rpcsetup wait for the RPC call to complete
before putting the task reference when FLUSH_SYNC is set. With this, the
last reference is put by the process that's initiating the FLUSH_SYNC
commit and the race is closed.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
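
The reproducer loop quoted in the message above can be written out as a compilable function. This is a sketch rather than the author's actual test program: the function name `run_reproducer`, the 0755 directory mode, and the error handling are assumptions. It must run with the current directory on an NFSv4 mount to exercise the race; on a local filesystem it simply succeeds.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical wrapper around the syscall sequence from the commit
 * message. Returns 0 if every iteration completes, -1 on the first
 * failing call. If a silly-rename happened, a leftover file in DIR
 * would be expected to make rmdir() fail.
 */
int run_reproducer(int iterations)
{
	for (int i = 0; i < iterations; i++) {
		int fd;

		if (mkdir("DIR", 0755) != 0)	/* mode is an assumption */
			return -1;
		fd = open("DIR/FILE", O_WRONLY | O_CREAT | O_EXCL, 0644);
		if (fd < 0)
			return -1;
		if (write(fd, "abcdefg", 7) != 7) {
			close(fd);
			return -1;
		}
		if (close(fd) != 0)
			return -1;
		if (unlink("DIR/FILE") != 0)
			return -1;
		if (rmdir("DIR") != 0)
			return -1;
	}
	return 0;
}
```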

SUNRPC: Close a race in __rpc_wait_for_completion_task() · bf294b41

Authored by Trond Myklebust on Feb 21, 2011

Although they run as rpciod background tasks, under normal operation
(i.e. no SIGKILL), functions like nfs_sillyrename(), nfs4_proc_unlck()
and nfs4_do_close() want to be fully synchronous. This means that when we
exit, we want all references to the rpc_task to be gone, and we want
any dentry references etc. held by that task to be released.

For this reason these functions call __rpc_wait_for_completion_task(),
followed by rpc_put_task() in the expectation that the latter will be
releasing the last reference to the rpc_task, and thus ensuring that the
callback_ops->rpc_release() has been called synchronously.

This patch fixes a race which exists due to the fact that
rpciod calls rpc_complete_task() (in order to wake up the callers of
__rpc_wait_for_completion_task()) and then subsequently calls
rpc_put_task() without ensuring that these two steps are done atomically.

In order to avoid adding new spin locks, the patch uses the existing
waitqueue spin lock to order the rpc_task reference count releases between
the waiting process and rpciod.
The common case where nobody is waiting for completion is optimised for by
checking if the RPC_TASK_ASYNC flag is cleared and/or if the rpc_task
reference count is 1: in those cases we drop trying to grab the spin lock,
and immediately free up the rpc_task.

Those few processes that need to put the rpc_task from inside an
asynchronous context and that do not care about ordering are given a new
helper: rpc_put_task_async().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

05 Mar, 2011 2 commits

nfs4: Ensure that ACL pages sent over NFS were not allocated from the slab (v3) · e9e3d724

Authored by Neil Horman on Mar 04, 2011

The "bad_page()" page allocator sanity check was reported recently (call
chain as follows):

  bad_page+0x69/0x91
  free_hot_cold_page+0x81/0x144
  skb_release_data+0x5f/0x98
  __kfree_skb+0x11/0x1a
  tcp_ack+0x6a3/0x1868
  tcp_rcv_established+0x7a6/0x8b9
  tcp_v4_do_rcv+0x2a/0x2fa
  tcp_v4_rcv+0x9a2/0x9f6
  do_timer+0x2df/0x52c
  ip_local_deliver+0x19d/0x263
  ip_rcv+0x539/0x57c
  netif_receive_skb+0x470/0x49f
  :virtio_net:virtnet_poll+0x46b/0x5c5
  net_rx_action+0xac/0x1b3
  __do_softirq+0x89/0x133
  call_softirq+0x1c/0x28
  do_softirq+0x2c/0x7d
  do_IRQ+0xec/0xf5
  default_idle+0x0/0x50
  ret_from_intr+0x0/0xa
  default_idle+0x29/0x50
  cpu_idle+0x95/0xb8
  start_kernel+0x220/0x225
  _sinittext+0x22f/0x236

It occurs because an skb with a fraglist was freed from the tcp
retransmit queue when it was acked, but a page on that fraglist had
PG_Slab set (indicating it was allocated from the Slab allocator), which
means the free path above can't safely free it via put_page.

We tracked this back to an nfsv4 setacl operation, in which the nfs code
attempted to convert the passed-in buffer to an array of pages in
__nfs4_proc_set_acl, which gets used by the skb->frags list in
xs_sendpages.  __nfs4_proc_set_acl just converts each page in the buffer
to a page struct via virt_to_page, but the vfs allocates the buffer via
kmalloc, meaning the PG_slab bit is set.  We can't create a buffer with
kmalloc and free it later in the tcp ack path with put_page, so we need
to either:

1) ensure that when we create the list of pages, no page struct has
   PG_Slab set

 or

2) not use a page list to send this data

Given that these buffers can be multiple pages and arbitrarily sized, I
think (1) is the right way to go.  I've written the below patch to
allocate a page from the buddy allocator directly and copy the data over
to it.  This ensures that we have a put_page free-able page for every
entry that winds up on an skb frag list, so it can be safely freed when
the frame is acked.  We do a put_page on each entry after the
rpc_call_sync call so as to drop our own reference count to the page,
leaving only the ref count taken by tcp_sendpages.  This way the data
will be properly freed when the ack comes in.

Successfully tested by myself to solve the above oops.

Note, as this is the result of a setacl operation that exceeded a page
of data, I think this amounts to a local DoS triggerable by an
unprivileged user, so I'm CCing security on this as well.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
CC: security@kernel.org
CC: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ceph: no .snap inside of snapped namespace · 455cec0a

Authored by Sage Weil on Mar 03, 2011

Otherwise you can do things like

# mkdir .snap/foo
# cd .snap/foo/.snap
# ls
<badness>
Signed-off-by: Sage Weil <sage@newdream.net>

04 Mar, 2011 3 commits

ceph: do not clear I_COMPLETE from d_release · 16a8b70a

Authored by Sage Weil on Feb 28, 2011

First, this was racy anyway: d_release isn't called until well after the
dentry is unhashed.  Second, this runs afoul of the recent dcache change
that clears d_parent prior to calling d_release (949854d0), causing a NULL
pointer dereference.
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: do not set I_COMPLETE · b545cc15

Authored by Sage Weil on Feb 28, 2011

Do not set the I_COMPLETE flag on directories until we resolve races with
dcache pruning.
Signed-off-by: Sage Weil <sage@newdream.net>

Revert "ceph: keep reference to parent inode on ceph_dentry" · 9bde178d

Authored by Sage Weil on Feb 28, 2011

This reverts commit 97d79b40.

This fails to account for d_parent changes due to rename or disconnected
dentries due to submounts or NFS reexports.
Signed-off-by: Sage Weil <sage@newdream.net>

03 Mar, 2011 9 commits

hfs: fix rename() over non-empty directory · 69102e9b

Authored by Al Viro on Mar 02, 2011

Merge hfs_unlink() and hfs_rmdir(), while we are at it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

udf: fix i_nlink limit · 810c1b2e

Authored by Al Viro on Mar 02, 2011

(256 << sizeof(x)) - 1 is not the maximal possible value of x...
In reality, the maximal allowed value for UDF FileLinkCount is
65535.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
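
The arithmetic behind this remark can be checked with a short sketch. The function names are hypothetical, and the 2-byte width used in the note below is an assumption consistent with the 65535 limit quoted above. The point is that `sizeof` yields a byte count, so shifting 256 by it is not the same as shifting 1 by the number of bits.

```c
#include <stddef.h>
#include <stdint.h>

/* The expression criticized in the commit message: shift by a byte count. */
unsigned long long shifted_by_bytes(size_t size)
{
	return (256ULL << size) - 1;
}

/*
 * Largest value an unsigned field of `size` bytes can hold.
 * Only valid for size < 8, so the shift stays below 64 bits.
 */
unsigned long long max_for_width(size_t size)
{
	return (1ULL << (8 * size)) - 1;
}
```

For a 2-byte field, `shifted_by_bytes(sizeof(uint16_t))` evaluates to 1023, whereas `max_for_width(sizeof(uint16_t))` evaluates to 65535.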

fix reiserfs mkdir() breakage · 99890a3b

Authored by Al Viro on Mar 02, 2011

If a directory has so many subdirectories that its link count is set
to 1 (i.e. "can't tell accurately") and reiserfs_new_inode() fails,
we shouldn't decrement the parent's link count in the cleanup path;
that's what DEC_DIR_INODE_NLINK() is for.  As it is, we end up
with the parent suddenly getting zero i_nlink, with very unpleasant
effects.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

exofs: i_nlink races in rename() · babfe560

Authored by Al Viro on Mar 02, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

nilfs2: i_nlink races in rename() · 30eb43d3

Authored by Al Viro on Mar 02, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

minix: i_nlink races in rename() · 6f88049c

Authored by Al Viro on Mar 02, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ufs: i_nlink races in rename() · 37750cdd

Authored by Al Viro on Mar 02, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

sysv: i_nlink races in rename() · 4787d45f

Authored by Al Viro on Mar 02, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

of/flattree: Drop an uninteresting message to pr_debug level · 8aaccf7f

Authored by Paul Bolle on Feb 14, 2011

This message looks like an error (which it isn't) when booting with a
flattened device tree.  Remove the message from normal kernel builds.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>

02 Mar, 2011 2 commits

ext2: Fix link count corruption under heavy link+rename load · e8a80c6f

Authored by Josh Hunt on Feb 24, 2011

vfs_rename_other() does not lock renamed inode with i_mutex. Thus changing
i_nlink in a non-atomic manner (which happens in ext2_rename()) can corrupt
it as reported and analyzed by Josh.

In fact, there is no good reason to mess with i_nlink of the moved file.
We did it presumably to simulate linking into the new directory and unlinking
from an old one. But the practical effect of this is disputable because fsck
can possibly treat file as being properly linked into both directories without
writing any error which is confusing. So we just stop increment-decrement
games with i_nlink which also fixes the corruption.

CC: stable@kernel.org
CC: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: Jan Kara <jack@suse.cz>

xfs: zero proper structure size for geometry calls · af24ee9e

Authored by Alex Elder on Mar 01, 2011

Commit 493f3358 added this call to
xfs_fs_geometry() in order to avoid passing kernel stack data back
to user space:

+       memset(geo, 0, sizeof(*geo));

Unfortunately, one of the callers of that function passes the
address of a smaller data type, cast to fit the type that
xfs_fs_geometry() requires.  As a result, this can happen:

Kernel panic - not syncing: stack-protector: Kernel stack is corrupted
in: f87aca93

Pid: 262, comm: xfs_fsr Not tainted 2.6.38-rc6-493f3358+ #1
Call Trace:

[<c12991ac>] ? panic+0x50/0x150
[<c102ed71>] ? __stack_chk_fail+0x10/0x18
[<f87aca93>] ? xfs_ioc_fsgeometry_v1+0x56/0x5d [xfs]

Fix this by fixing that one caller to pass the right type and then
copy out the subset it is interested in.

Note: This patch is an alternative to one originally proposed by
Eric Sandeen.
Reported-by: Jeffrey Hundstad <jeffrey.hundstad@mnsu.edu>
Signed-off-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Tested-by: Jeffrey Hundstad <jeffrey.hundstad@mnsu.edu>
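
The pattern described in the fix (pass the full-size structure, then copy out only the subset the caller needs) can be sketched in userspace. The struct layouts, field names, values, and function names below are hypothetical stand-ins, not the real XFS definitions. The sketch shows why zeroing `sizeof(*geo)` bytes is only safe when the caller really provides a full-size object.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical smaller, older layout requested by a v1 caller. */
struct geom_v1 {
	uint32_t blocksize;
	uint32_t agcount;
};

/* Hypothetical full layout that the fill routine expects. */
struct geom {
	uint32_t blocksize;
	uint32_t agcount;
	uint32_t logsunit;
	uint32_t dirblksize;
};

/* Zeroes the whole structure first so no stale stack data leaks out. */
void fill_geometry(struct geom *geo)
{
	memset(geo, 0, sizeof(*geo));
	geo->blocksize = 4096;	/* illustrative values only */
	geo->agcount = 4;
	geo->logsunit = 1;
	geo->dirblksize = 4096;
}

/*
 * Safe v1 wrapper: hand fill_geometry() a full-size object, then copy
 * out only the fields the smaller structure has. Casting a
 * struct geom_v1 * to struct geom * instead would let the memset()
 * above write past the end of the smaller object.
 */
void get_geometry_v1(struct geom_v1 *out)
{
	struct geom full;

	fill_geometry(&full);
	out->blocksize = full.blocksize;
	out->agcount = full.agcount;
}
```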