Commits · 0601f793921157603831d00a9541d92e8f5763f6 · openeuler / Kernel

08 Apr, 2011: 1 commit

SUNRPC: requeue tcp socket less frequently · 0601f793

Committed by Trond Myklebust on May 18, 2009

Don't requeue the socket in some cases where we know it's unnecessary.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

27 Mar, 2011: 1 commit

NFS: Ensure that rpc_release_resources_task() can be called twice. · a271c5a0

Committed by OGAWA Hirofumi on Mar 27, 2011

BUG: atomic_dec_and_test(): -1: atomic counter underflow at:
Pid: 2827, comm: mount.nfs Not tainted 2.6.38 #1
Call Trace:
 [<ffffffffa02223a0>] ? put_rpccred+0x44/0x14e [sunrpc]
 [<ffffffffa021bbe9>] ? rpc_ping+0x4e/0x58 [sunrpc]
 [<ffffffffa021c4a5>] ? rpc_create+0x481/0x4fc [sunrpc]
 [<ffffffffa022298a>] ? rpcauth_lookup_credcache+0xab/0x22d [sunrpc]
 [<ffffffffa028be8c>] ? nfs_create_rpc_client+0xa6/0xeb [nfs]
 [<ffffffffa028c660>] ? nfs4_set_client+0xc2/0x1f9 [nfs]
 [<ffffffffa028cd3c>] ? nfs4_create_server+0xf2/0x2a6 [nfs]
 [<ffffffffa0295d07>] ? nfs4_remote_mount+0x4e/0x14a [nfs]
 [<ffffffff810dd570>] ? vfs_kern_mount+0x6e/0x133
 [<ffffffffa029605a>] ? nfs_do_root_mount+0x76/0x95 [nfs]
 [<ffffffffa029643d>] ? nfs4_try_mount+0x56/0xaf [nfs]
 [<ffffffffa0297434>] ? nfs_get_sb+0x435/0x73c [nfs]
 [<ffffffff810dd59b>] ? vfs_kern_mount+0x99/0x133
 [<ffffffff810dd693>] ? do_kern_mount+0x48/0xd8
 [<ffffffff810f5b75>] ? do_mount+0x6da/0x741
 [<ffffffff810f5c5f>] ? sys_mount+0x83/0xc0
 [<ffffffff8100293b>] ? system_call_fastpath+0x16/0x1b

Well, I think this is a real bug somewhere in the NFS code. On review,
the code

rpc_call_sync()
    rpc_run_task
        rpc_execute()
            __rpc_execute()
                rpc_release_task()
                    rpc_release_resources_task()
                        put_rpccred()                <= release cred
    rpc_put_task
        rpc_do_put_task()
            rpc_release_resources_task()
                put_rpccred()                        <= release cred again

appears to release the cred twice, unintentionally.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

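A minimal userspace sketch of one way to make a release function safe to call twice: drop the reference and then clear the pointer, so a second call finds nothing to release instead of underflowing the counter. `struct cred_sketch`, `struct task_sketch`, `put_cred_sketch()` and `release_resources_sketch()` are hypothetical stand-ins, not the SUNRPC types or the actual patch.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reference-counted credential (stand-in for struct rpc_cred). */
struct cred_sketch {
	int refcount;
};

/* Hypothetical task holding one reference (stand-in for struct rpc_task). */
struct task_sketch {
	struct cred_sketch *cred;
};

/* Drop one reference; free the object when the last one goes away. */
static void put_cred_sketch(struct cred_sketch *cred)
{
	if (--cred->refcount == 0)
		free(cred);
}

/*
 * Release the task's resources. Clearing task->cred after the put makes
 * the function idempotent: a second call sees NULL and does nothing.
 */
static void release_resources_sketch(struct task_sketch *task)
{
	if (task->cred != NULL) {
		put_cred_sketch(task->cred);
		task->cred = NULL;
	}
}
```

With this shape, the second call in the rpc_execute()/rpc_put_task() path shown in the trace above becomes a no-op rather than a second put_rpccred().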

25 Mar, 2011: 2 commits

NFS: Determine initial mount security · 8f70e95f

Committed by Bryan Schumaker on Mar 24, 2011

When sec=<something> is not given as a mount option,
we should attempt to determine which security flavor the
server is using.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: use secinfo when crossing mountpoints · 7ebb9315

Committed by Bryan Schumaker on Mar 24, 2011

A submount may use different security than the parent
mount does.  We should figure out what sec flavor the
submount uses at mount time.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

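The two security-flavor commits above (8f70e95f and 7ebb9315) have the client discover the server's flavor instead of assuming one. A minimal sketch of one plausible selection rule, assuming the server returns a list of flavors (as an NFSv4 SECINFO reply does), assumed to be in the server's order of preference, and the client takes the first one it also supports. `pick_flavor()` and the supported-flavor table are hypothetical; the numeric values are the standard ONC RPC flavor numbers for AUTH_NONE, AUTH_SYS and RPCSEC_GSS.

```c
#include <stddef.h>

/* Standard ONC RPC auth flavor numbers (RFC 5531, RFC 2203). */
enum {
	FLAVOR_AUTH_NONE  = 0,
	FLAVOR_AUTH_SYS   = 1,
	FLAVOR_RPCSEC_GSS = 6,
};

/* Hypothetical: the flavors this client is able to use. */
static const unsigned int client_flavors[] = {
	FLAVOR_RPCSEC_GSS, FLAVOR_AUTH_SYS, FLAVOR_AUTH_NONE,
};

static int client_supports(unsigned int flavor)
{
	for (size_t i = 0; i < sizeof(client_flavors) / sizeof(client_flavors[0]); i++)
		if (client_flavors[i] == flavor)
			return 1;
	return 0;
}

/*
 * Walk the server's list in its order and return the first flavor the
 * client also supports, or -1 if there is no overlap.
 */
static int pick_flavor(const unsigned int *server_flavors, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (client_supports(server_flavors[i]))
			return (int)server_flavors[i];
	return -1;
}
```

For a submount, the same selection would simply be rerun against the flavor list for the new directory rather than inheriting the parent mount's flavor.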

23 Mar, 2011: 1 commit

SUNRPC: Never reuse the socket port after an xs_close() · 246408dc

Committed by Trond Myklebust on Mar 22, 2011

If we call xs_close(), we're in one of two situations:
 - Autoclose, which means we don't expect to resend a request
 - bind+connect failed, which probably means the port is in use
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org

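A minimal sketch of the rule in this commit: keep the source port across ordinary reconnects, but forget it when the transport is explicitly closed, so the next connect binds a fresh port. `struct xprt_sketch`, `xprt_connect_port_sketch()` and `xprt_close_sketch()` are hypothetical, and "pick a new port" is reduced to a counter; the real change is in `xs_close()`.

```c
/* Hypothetical transport state; srcport == 0 means "no port chosen yet". */
struct xprt_sketch {
	unsigned short srcport;
	unsigned short next_free_port; /* stand-in for the real port allocator */
};

/* Return the port to bind for the next connection attempt. */
static unsigned short xprt_connect_port_sketch(struct xprt_sketch *x)
{
	if (x->srcport == 0)
		x->srcport = x->next_free_port++;
	return x->srcport; /* a plain reconnect reuses the same port */
}

/*
 * Explicit close: either an autoclose (no resend expected) or a failed
 * bind+connect (the port is probably in use). In both cases drop the port.
 */
static void xprt_close_sketch(struct xprt_sketch *x)
{
	x->srcport = 0;
}
```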

18 Mar, 2011: 5 commits

SUNRPC: Remove resource leak in svc_rdma_send_error() · 4be34b9d

Committed by Jesper Juhl on Jan 22, 2011

We leak the memory allocated to 'ctxt' when we return after
'ib_dma_mapping_error()' returns !=0.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

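This is the classic early-return leak. A minimal sketch of the corrected shape, in which the error path after the allocation releases the context before returning. `struct send_ctxt_sketch`, `get_ctxt_sketch()`, `put_ctxt_sketch()`, `map_sketch()` and `send_error_sketch()` are hypothetical stand-ins for the svc_rdma context handling and `ib_dma_mapping_error()`; a counter of live contexts makes the leak observable.

```c
#include <stdlib.h>

struct send_ctxt_sketch {
	int mapped;
};

/* Number of contexts currently allocated; lets a caller detect leaks. */
static int live_ctxts;

static struct send_ctxt_sketch *get_ctxt_sketch(void)
{
	struct send_ctxt_sketch *c = calloc(1, sizeof(*c));
	if (c)
		live_ctxts++;
	return c;
}

static void put_ctxt_sketch(struct send_ctxt_sketch *c)
{
	free(c);
	live_ctxts--;
}

/* Stand-in for the DMA mapping step; returns nonzero on failure. */
static int map_sketch(struct send_ctxt_sketch *c, int fail)
{
	if (fail)
		return -1;
	c->mapped = 1;
	return 0;
}

static int send_error_sketch(int mapping_fails)
{
	struct send_ctxt_sketch *ctxt = get_ctxt_sketch();

	if (!ctxt)
		return -1;
	if (map_sketch(ctxt, mapping_fails) != 0) {
		/* The leak was a bare return here; release the context first. */
		put_ctxt_sketch(ctxt);
		return -1;
	}
	/* In this sketch the "send" completes at once, so release it here. */
	put_ctxt_sketch(ctxt);
	return 0;
}
```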

RPC: killing RPC tasks races fixed · 8e26de23

Committed by Stanislav Kinsbursky on Mar 17, 2011

rpc_killall_tasks() must check whether the task's RPC_TASK_QUEUED bit is set
before trying to wake the task up, because task->tk_waitqueue may not have
been set yet (i.e. it may still be NULL).
Also, as Trond Myklebust mentioned, this approach (instead of checking
tk_waitqueue for NULL) allows us to "optimise away the call to
rpc_wake_up_queued_task() altogether for those
tasks that aren't queued".

Here is an example of tk_waitqueue being dereferenced while it is still NULL:

CPU 0               	CPU 1				CPU 2
--------------------	---------------------	--------------------------
nfs4_run_open_task
rpc_run_task
rpc_execute
rpc_set_active
rpc_make_runnable
(waiting)
			rpc_async_schedule
			nfs4_open_prepare
			nfs_wait_on_sequence
						nfs_umount_begin
						rpc_killall_tasks
						rpc_wake_up_task
						rpc_wake_up_queued_task
						spin_lock(tk_waitqueue == NULL)
						BUG()
			rpc_sleep_on
			spin_lock(&q->lock)
			__rpc_sleep_on
			task->tk_waitqueue = q
Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org>
Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

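A minimal single-threaded sketch of the fixed check: killing a task only touches its wait queue if the task's "queued" bit is set, and that bit is assumed to be set only after the wait queue pointer has been assigned. `struct task_sketch`, `TASK_QUEUED_SKETCH` and `killall_sketch()` are hypothetical; in the real code the test is on the RPC_TASK_QUEUED bit and the wake-up goes through `rpc_wake_up_queued_task()`.

```c
#include <stddef.h>

#define TASK_QUEUED_SKETCH 0x1u
#define TASK_KILLED_SKETCH 0x2u

struct waitqueue_sketch {
	int wakeups; /* counts wake-ups delivered through this queue */
};

struct task_sketch {
	unsigned int flags;
	struct waitqueue_sketch *waitqueue; /* NULL until the task sleeps */
};

static void wake_up_queued_sketch(struct waitqueue_sketch *q, struct task_sketch *t)
{
	q->wakeups++; /* the real code takes q->lock here, so q must not be NULL */
	t->flags &= ~TASK_QUEUED_SKETCH;
}

/* Kill every task, but only touch the wait queue of tasks that are queued. */
static void killall_sketch(struct task_sketch *tasks, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		tasks[i].flags |= TASK_KILLED_SKETCH;
		if (tasks[i].flags & TASK_QUEUED_SKETCH)
			wake_up_queued_sketch(tasks[i].waitqueue, &tasks[i]);
	}
}
```

A task that is merely running (like the one on CPU 1 before rpc_sleep_on) is marked killed but never has its NULL wait queue dereferenced.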

xprt: remove redundant check · ba3c578d

Committed by j223yang@asset.uwaterloo.ca on Mar 16, 2011

remove redundant check.
Signed-off-by: Jinqiu Yang <crindy646@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Convert struct rpc_xprt to use atomic_t counters · a8de240a

Committed by Trond Myklebust on Mar 15, 2011

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Ensure we always run the tk_callback before tk_action · e020c680

Committed by Trond Myklebust on Mar 15, 2011

This fixes a race in which the task->tk_callback() puts the rpc_task
to sleep, setting a new callback. Under certain circumstances, the current
code may end up executing the task->tk_action before it gets round to the
callback.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org

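A minimal single-threaded sketch of an execute loop with the ordering this commit enforces: each pass first consumes any pending callback (which may install another callback) and only advances the main action when no callback is pending. `struct task_sketch`, `execute_sketch()` and the example steps `cb1`, `cb2` and `act` are hypothetical simplifications, not the real `__rpc_execute()`.

```c
#include <stddef.h>

struct task_sketch;
typedef void (*step_fn)(struct task_sketch *);

struct task_sketch {
	step_fn callback; /* one-shot; must run before the next action */
	step_fn action;   /* next state-machine step; NULL when finished */
	char log[16];     /* records the order in which steps ran */
	size_t nlog;
};

static void record(struct task_sketch *t, char c)
{
	if (t->nlog + 1 < sizeof(t->log)) {
		t->log[t->nlog++] = c;
		t->log[t->nlog] = '\0';
	}
}

static void execute_sketch(struct task_sketch *t)
{
	for (;;) {
		/* Always consume a pending callback first ... */
		if (t->callback != NULL) {
			step_fn cb = t->callback;
			t->callback = NULL;
			cb(t);
			continue; /* the callback may have installed a new one */
		}
		/* ... and only then advance the state machine. */
		if (t->action == NULL)
			break;
		t->action(t);
	}
}

/* Example steps: cb1 installs cb2; act finishes the task. */
static void cb2(struct task_sketch *t) { record(t, '2'); }
static void cb1(struct task_sketch *t) { record(t, '1'); t->callback = cb2; }
static void act(struct task_sketch *t) { record(t, 'A'); t->action = NULL; }
```

With `cb1` and `act` installed, the steps run in the order 1, 2, A: the callback that `cb1` sets is honored before the action.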

16 Mar, 2011: 2 commits

sunrpc: fix printk format warning · 986d4abb

Committed by Randy Dunlap on Mar 15, 2011

Fix printk format build warning:

net/sunrpc/xprtrdma/verbs.c:1463: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'dma_addr_t'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

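`dma_addr_t` is 32 or 64 bits wide depending on the configuration, so it does not always match `%llx`. A common portable fix is an explicit cast to `unsigned long long`. A userspace sketch of that idea, with a hypothetical `dma_addr_sketch_t` standing in for `dma_addr_t` and `snprintf()` standing in for `printk()`:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in: a 32-bit DMA address type, as on some configs. */
typedef uint32_t dma_addr_sketch_t;

/* Format a DMA address as hex; the cast makes the argument match %llx. */
static int format_dma_addr(char *buf, size_t len, dma_addr_sketch_t addr)
{
	return snprintf(buf, len, "0x%llx", (unsigned long long)addr);
}
```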

xprt: remove redundant null check · 4d4a76f3

Committed by j223yang@asset.uwaterloo.ca on Mar 10, 2011

'req' is dereferenced before it is checked for NULL.
The patch simply removes the check.

Signed-off-by: Jinqiu Yang <crindy646@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

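A compact illustration of why such a check is redundant: once a pointer has been dereferenced, a later NULL test cannot catch anything, because a NULL pointer would already have faulted, and compilers are allowed to delete the test. `struct req_sketch` and `req_len_sketch()` are hypothetical; this is not the function the patch touched.

```c
#include <stddef.h>

struct req_sketch {
	size_t len;
};

/*
 * Before: 'req' is dereferenced first, so the later NULL check is dead code.
 *
 *	size_t len = req->len;
 *	if (req == NULL)
 *		return 0;
 *
 * After: the check is simply dropped; callers guarantee req != NULL.
 */
static size_t req_len_sketch(const struct req_sketch *req)
{
	return req->len;
}
```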

12 Mar, 2011: 6 commits

gss:krb5 only include enctype numbers in gm_upcall_enctypes · f8628220

Committed by Kevin Coffman on Mar 03, 2011

Make the value in gm_upcall_enctypes just the enctype values.
This allows the values to be used more easily elsewhere.
Signed-off-by: Kevin Coffman <kwc@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

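A sketch of what storing "just the enctype values" allows: the mechanism keeps a bare list such as `"18,17,16,23,3,1,2"`, and the code that builds the upcall adds its own `enctypes=` prefix. The prefix and the list are assumptions for illustration; `struct mech_sketch` and `format_upcall_sketch()` are hypothetical.

```c
#include <stdio.h>

/* Hypothetical mechanism descriptor holding only the bare enctype list. */
struct mech_sketch {
	const char *upcall_enctypes; /* e.g. "18,17,16,23,3,1,2", or NULL */
};

/* Write "enctypes=<list> " into buf if the mechanism has a list. */
static int format_upcall_sketch(char *buf, size_t len, const struct mech_sketch *m)
{
	if (m->upcall_enctypes == NULL) {
		if (len > 0)
			buf[0] = '\0';
		return 0;
	}
	return snprintf(buf, len, "enctypes=%s ", m->upcall_enctypes);
}
```

Because the stored value is now a plain list, other code can parse or reuse it without first stripping a protocol-specific prefix.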

RPCRDMA: Fix FRMR registration/invalidate handling. · 5c635e09

Committed by Tom Tucker on Feb 09, 2011

When the rpc_memreg_strategy is 5, FRMR are used to map RPC data.
This mode uses an FRMR to map the RPC data, then invalidates
(i.e. unregisters) the data in xprt_rdma_free. These FRMR are used
across connections on the same mount, i.e. if the connection goes
away on an idle timeout and reconnects later, the FRMR are not
destroyed and recreated.

This creates a problem for transport errors because the WR that
invalidates an FRMR may be flushed (i.e. fail), leaving the
FRMR valid. When the FRMR is later used to map an RPC it will fail,
tearing down the transport and starting over. Over time, more and
more of the FRMR pool end up in the wrong state resulting in
seemingly random disconnects.

This fix keeps track of the FRMR state explicitly by setting its
state based on the successful completion of a reg/inv WR. If the FRMR
is ever used and found to be in the wrong state, an invalidate WR
is prepended, re-syncing the FRMR state and avoiding the connection loss.
Signed-off-by: Tom Tucker <tom@ogc.us>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

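A minimal sketch of the state tracking described above: the FRMR records whether it is currently valid, that state changes only when a register or invalidate WR completes successfully, and mapping with an FRMR still marked valid prepends an invalidate. `struct frmr_sketch`, `struct wr_list` and both helpers are hypothetical; work requests are modelled as entries in a small list rather than posted to a real queue pair.

```c
#include <stddef.h>

enum frmr_state_sketch { FRMR_IS_INVALID, FRMR_IS_VALID };

enum wr_kind { WR_INVALIDATE, WR_REGISTER };

struct frmr_sketch {
	enum frmr_state_sketch state;
};

/* A tiny "send queue": records the WRs a map call would post. */
struct wr_list {
	enum wr_kind wr[4];
	size_t n;
};

/* Build the WRs needed to map data with this FRMR. */
static void frmr_map_sketch(const struct frmr_sketch *f, struct wr_list *out)
{
	out->n = 0;
	if (f->state == FRMR_IS_VALID)
		out->wr[out->n++] = WR_INVALIDATE; /* re-sync a stale FRMR first */
	out->wr[out->n++] = WR_REGISTER;
}

/* Update the state only when a WR completes successfully. */
static void frmr_completion_sketch(struct frmr_sketch *f, enum wr_kind kind, int success)
{
	if (!success)
		return; /* flushed WR: the FRMR keeps its real state */
	f->state = (kind == WR_REGISTER) ? FRMR_IS_VALID : FRMR_IS_INVALID;
}
```

A flushed invalidate therefore leaves the FRMR marked valid, and the next map prepends an invalidate instead of failing and tearing down the connection.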

RPCRDMA: Fix to XDR page base interpretation in marshalling logic. · bd7ea31b

Committed by Tom Tucker on Feb 09, 2011

The RPCRDMA marshalling logic assumed that xdr->page_base was an
offset into the first page of xdr->page_list. It is in fact an
offset into the xdr->page_list itself, that is, it selects the
first page in the page_list and the offset into that page.

The symptom depended in part on the rpc_memreg_strategy. If it was
FRMR, or some other one-shot mapping mode, the connection would be
torn down on a base-and-bounds error. When the badly marshalled RPC
was retransmitted, it would reconnect, hit the error again, and tear down
the connection again, in an endless loop. This resulted in a hung mount. For
the other modes, it would result in silent data corruption. This bug is
most easily reproduced by writing more data than the filesystem
has space for.

This fix corrects the page_base assumption and otherwise simplifies
the iov mapping logic.
Signed-off-by: Tom Tucker <tom@ogc.us>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

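The corrected interpretation is a small piece of arithmetic: `page_base` splits into a page index and an offset within that page. A minimal sketch, assuming 4 KiB pages; `SKETCH_PAGE_SHIFT` and `split_page_base()` are hypothetical names (the kernel uses `PAGE_SHIFT` and `PAGE_MASK`).

```c
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12u                       /* assume 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1u << SKETCH_PAGE_SHIFT)

/*
 * page_base is an offset from the start of the page list, not from the
 * start of the first page: the high bits select the page, the low bits
 * give the offset within it.
 */
static void split_page_base(size_t page_base, size_t *page_index, size_t *offset)
{
	*page_index = page_base >> SKETCH_PAGE_SHIFT;
	*offset = page_base & (SKETCH_PAGE_SIZE - 1);
}
```

For example, a `page_base` of 5000 means page 1, offset 904. Under the old assumption it would have been treated as offset 5000 into page 0, past the end of that page, which fits the base-and-bounds errors described above.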

NFSv4.1: filelayout async error handler · cbdabc7f

Committed by Andy Adamson on Mar 01, 2011

Use our own async error handler.
Mark the layout as failed and retry i/o through the MDS on specified errors.

Update the mds_offset in nfs_readpage_retry so that a failed short-read retry
to a DS gets correctly resent through the MDS.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

RPC: clarify rpc_run_task error handling · eabf5baa

Committed by Fred Isaman on Feb 11, 2011

rpc_run_task can only fail if it is not passed a preallocated task.
However, that is not at all clear from the current code, so
remove several failure checks for conditions that cannot occur.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

RPC: remove check for impossible condition in rpc_make_runnable · cee6a537

Committed by Fred Isaman on Feb 11, 2011

queue_work() only returns 0 or 1, never a negative value.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

11 Mar, 2011: 3 commits

sunrpc: Propagate errors from xs_bind() through xs_create_sock() · 4cea288a

Committed by Ben Hutchings on Feb 22, 2011

xs_create_sock() is supposed to return a pointer or an ERR_PTR-encoded
error, but it currently returns 0 if xs_bind() fails.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Cc: stable@kernel.org [v2.6.37]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

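A userspace sketch of the ERR_PTR convention this fix relies on: small negative errno values are encoded at the very top of the pointer range, so a function can return either a valid pointer or an error, while returning 0 (NULL) on failure signals neither. The `*_sketch` helpers mimic `ERR_PTR()`, `IS_ERR()` and `PTR_ERR()` from include/linux/err.h; `create_sock_sketch()` and `bind_sketch()` are hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO_SKETCH 4095

static void *err_ptr_sketch(long error)
{
	return (void *)(intptr_t)error;
}

static long ptr_err_sketch(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if ptr encodes an errno in the range -4095..-1. */
static int is_err_sketch(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

struct sock_sketch {
	int port;
};

/* Hypothetical bind step: fails with -EADDRINUSE when asked to. */
static int bind_sketch(struct sock_sketch *s, int fail)
{
	if (fail)
		return -EADDRINUSE;
	s->port = 1234;
	return 0;
}

static struct sock_sketch *create_sock_sketch(int bind_fails)
{
	struct sock_sketch *s = malloc(sizeof(*s));
	int err;

	if (s == NULL)
		return err_ptr_sketch(-ENOMEM);
	err = bind_sketch(s, bind_fails);
	if (err < 0) {
		free(s);
		/* The bug was returning 0 here; propagate the error instead. */
		return err_ptr_sketch(err);
	}
	return s;
}
```

A caller written as `if (IS_ERR(sock)) return PTR_ERR(sock);` would miss a NULL return entirely, which is why the error must be encoded rather than dropped.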

SUNRPC: Remove resource leak in svc_rdma_send_error() · a5e50268

Committed by Jesper Juhl on Jan 22, 2011

We leak the memory allocated to 'ctxt' when we return after
'ib_dma_mapping_error()' returns !=0.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Close a race in __rpc_wait_for_completion_task() · bf294b41

Committed by Trond Myklebust on Feb 21, 2011

Although they run as rpciod background tasks, under normal operation
(i.e. no SIGKILL), functions like nfs_sillyrename(), nfs4_proc_unlck()
and nfs4_do_close() want to be fully synchronous. This means that when we
exit, we want all references to the rpc_task to be gone, and we want
any dentry references etc. held by that task to be released.

For this reason these functions call __rpc_wait_for_completion_task(),
followed by rpc_put_task() in the expectation that the latter will be
releasing the last reference to the rpc_task, and thus ensuring that the
callback_ops->rpc_release() has been called synchronously.

This patch fixes a race which exists due to the fact that
rpciod calls rpc_complete_task() (in order to wake up the callers of
__rpc_wait_for_completion_task()) and then subsequently calls
rpc_put_task() without ensuring that these two steps are done atomically.

In order to avoid adding new spin locks, the patch uses the existing
waitqueue spin lock to order the rpc_task reference count releases between
the waiting process and rpciod.
The common case, where nobody is waiting for completion, is optimised by
checking whether the RPC_TASK_ASYNC flag is cleared and/or whether the rpc_task
reference count is 1: in those cases we skip taking the spin lock
and immediately free up the rpc_task.

Those few processes that need to put the rpc_task from inside an
asynchronous context and that do not care about ordering are given a new
helper: rpc_put_task_async().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

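A minimal userspace sketch of the ordering the patch establishes, using a pthread mutex and condition variable in place of the wait-queue spin lock (compile with `-pthread`). The background side clears the "active" flag and drops its own reference in one critical section before waking waiters, so when nobody else holds a reference, the waiter's subsequent put is the last one and the release runs synchronously in the waiter's context. All names are hypothetical, and the fast path for tasks nobody waits on is omitted.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct task_sketch {
	pthread_mutex_t lock; /* plays the role of the wait-queue spin lock */
	pthread_cond_t done;
	bool active;
	int count;            /* reference count, protected by lock here */
	bool released;        /* set when the last reference is dropped */
};

static void task_init_sketch(struct task_sketch *t, int refs)
{
	pthread_mutex_init(&t->lock, NULL);
	pthread_cond_init(&t->done, NULL);
	t->active = true;
	t->count = refs;
	t->released = false;
}

/*
 * Background (rpciod) side: mark the task complete AND drop its own
 * reference in the same critical section, then wake waiters. A waiter
 * can therefore never see completion while this reference is still held.
 * Returns true if this was the last reference.
 */
static bool complete_task_sketch(struct task_sketch *t)
{
	bool last;

	pthread_mutex_lock(&t->lock);
	t->active = false;
	last = (--t->count == 0);
	pthread_cond_broadcast(&t->done);
	pthread_mutex_unlock(&t->lock);
	return last;
}

/* Waiter side: sleep until the task is no longer active. */
static void wait_for_completion_sketch(struct task_sketch *t)
{
	pthread_mutex_lock(&t->lock);
	while (t->active)
		pthread_cond_wait(&t->done, &t->lock);
	pthread_mutex_unlock(&t->lock);
}

/* Drop a reference; whoever drops the last one runs the release. */
static void put_task_sketch(struct task_sketch *t)
{
	bool last;

	pthread_mutex_lock(&t->lock);
	last = (--t->count == 0);
	pthread_mutex_unlock(&t->lock);
	if (last)
		t->released = true; /* stands in for callback_ops->rpc_release() */
}
```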

10 Mar, 2011: 1 commit

svcrpc: fix bad argument in unix_domain_find · 352b5d13

Committed by J. Bruce Fields on Mar 09, 2011

"After merging the nfsd tree, today's linux-next build (powerpc
ppc64_defconfig) produced this warning:

net/sunrpc/svcauth_unix.c: In function 'unix_domain_find':
net/sunrpc/svcauth_unix.c:58: warning: passing argument 1 of 'svcauth_unix_domain_release' from incompatible pointer type
net/sunrpc/svcauth_unix.c:41: note: expected 'struct auth_domain *' but argument is of type 'struct unix_domain *'

Introduced by commit 8b3e07ac ("svcrpc: fix rare race on unix_domain
creation")."
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

09 Mar, 2011: 1 commit

svcrpc: fix rare race on unix_domain creation · 8b3e07ac

Committed by J. Bruce Fields on Mar 07, 2011

Note that "new" here is not yet fully initialized; auth_domain_put
should be called only on auth_domains that have actually been added to
the hash.

Before this fix, two attempts to add the same domain at once could
cause the hlist_del in auth_domain_put to fail.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

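A minimal single-threaded sketch of the lookup-or-insert pattern behind this fix: a freshly allocated object that was never added to the hash, and that loses to an existing entry, is freed directly rather than going through the put path that would try to unhash it. The release helper takes the embedded generic header (`&new->h`) and recovers the containing structure with the container_of idiom, which is also the type fix made in 352b5d13. `struct auth_domain_sketch`, `struct unix_domain_sketch`, the one-bucket list and both functions are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Generic header embedded in every domain (stand-in for struct auth_domain). */
struct auth_domain_sketch {
	struct auth_domain_sketch *next; /* hash chain; only valid once hashed */
	const char *name;
	int ref;
};

/* Specific domain type embedding the generic header as member 'h'. */
struct unix_domain_sketch {
	struct auth_domain_sketch h;
	int addr_changes;
};

static struct auth_domain_sketch *hash_head; /* one-bucket "hash table" */

/* Free a domain that was never hashed: no unlinking, just free it. */
static void domain_release_sketch(struct auth_domain_sketch *dom)
{
	struct unix_domain_sketch *ud = (struct unix_domain_sketch *)
		((char *)dom - offsetof(struct unix_domain_sketch, h));
	free(ud);
}

/* Return the existing domain with the same name, or hash 'new' and return it. */
static struct auth_domain_sketch *domain_add_sketch(struct unix_domain_sketch *new)
{
	for (struct auth_domain_sketch *d = hash_head; d != NULL; d = d->next) {
		if (strcmp(d->name, new->h.name) == 0) {
			d->ref++;
			domain_release_sketch(&new->h); /* pass the header, not 'new' */
			return d;
		}
	}
	new->h.next = hash_head;
	hash_head = &new->h;
	return &new->h;
}
```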

08 Mar, 2011: 1 commit

gss:krb5 only include enctype numbers in gm_upcall_enctypes · 540c8cb6

Committed by Kevin Coffman on Mar 02, 2011

Make the value in gm_upcall_enctypes just the enctype values.
This allows the values to be used more easily elsewhere.
Signed-off-by: Kevin Coffman <kwc@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

23 Feb, 2011: 1 commit

net: add __rcu annotations to sk_wq and wq · eaefd110

Committed by Eric Dumazet on Feb 18, 2011

Add proper RCU annotations/verbs to sk_wq and wq members

Fix __sctp_write_space() sk_sleep() abuse (and sock->wq access)

Fix sunrpc sk_sleep() abuse too
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

26 Jan, 2011: 1 commit

NFS do not find client in NFSv4 pg_authenticate · 778be232

Committed by Andy Adamson on Jan 25, 2011

The information required to find the nfs_client corresponding to the incoming
back channel request is contained in the NFS layer. Perform minimal checking
in the RPC layer pg_authenticate method, and push more detailed checking into
the NFS layer where the nfs_client can be found.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

25 Jan, 2011: 1 commit

workqueue: use WQ_MEM_RECLAIM instead of WQ_RESCUER · ada609ee

Committed by Tejun Heo on Jan 25, 2011

WQ_RESCUER is now an internal flag and should only be used in the
workqueue implementation proper.  Use WQ_MEM_RECLAIM instead.

This doesn't introduce any functional difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: dm-devel@redhat.com
Cc: Neil Brown <neilb@suse.de>

12 Jan, 2011: 3 commits

rpc: allow xprt_class->setup to return a preexisting xprt · f0418aa4

Committed by J. Bruce Fields on Dec 08, 2010

This allows us to reuse the xprt associated with a server connection if
one has already been set up.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

rpc: keep backchannel xprt as long as server connection · 99de8ea9

Committed by J. Bruce Fields on Dec 08, 2010

Multiple backchannels can share the same TCP connection; from RFC 5661 section
2.10.3.1:

	A connection's association with a session is not exclusive.  A
	connection associated with the channel(s) of one session may be
	simultaneously associated with the channel(s) of other sessions
	including sessions associated with other client IDs.

However, if multiple backchannels share a connection, they must all share
the same xid stream (hence the same rpc_xprt); the only way we have to
match replies with calls at the rpc layer is using the xid.

So, keep the rpc_xprt around as long as the connection lasts, in case
we're asked to use the connection as a backchannel again.

Requests to create new backchannel clients over a given server
connection should result in creating new clients that reuse the
existing rpc_xprt.

But to start, just reject attempts to associate multiple rpc_xprt's with
the same underlying bc_xprt.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

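A minimal sketch of why all backchannels on a connection must share one xid stream: if every session draws XIDs from a single per-connection counter and records outstanding calls in one table, a reply arriving on the connection can be matched to its call by XID alone. `struct conn_sketch`, `send_call_sketch()` and `match_reply_sketch()` are hypothetical, and a fixed-size table stands in for the real request list.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING_SKETCH 8

/* Per-connection state shared by every session using the connection. */
struct conn_sketch {
	uint32_t next_xid;
	struct {
		uint32_t xid;
		int session; /* which session's backchannel sent the call */
		int in_use;
	} pending[MAX_PENDING_SKETCH];
};

/* Allocate an XID from the shared stream and remember the outstanding call. */
static int send_call_sketch(struct conn_sketch *c, int session, uint32_t *xid_out)
{
	for (size_t i = 0; i < MAX_PENDING_SKETCH; i++) {
		if (!c->pending[i].in_use) {
			c->pending[i].xid = c->next_xid++;
			c->pending[i].session = session;
			c->pending[i].in_use = 1;
			*xid_out = c->pending[i].xid;
			return 0;
		}
	}
	return -1; /* table full */
}

/* Match a reply to its call by XID; returns the session, or -1 if unknown. */
static int match_reply_sketch(struct conn_sketch *c, uint32_t xid)
{
	for (size_t i = 0; i < MAX_PENDING_SKETCH; i++) {
		if (c->pending[i].in_use && c->pending[i].xid == xid) {
			c->pending[i].in_use = 0;
			return c->pending[i].session;
		}
	}
	return -1;
}
```

With separate counters per session, two sessions could hand out the same XID on one connection and a reply would become ambiguous.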

rpc: move sk_bc_xprt to svc_xprt · d75faea3

Committed by J. Bruce Fields on Nov 30, 2010

This seems obviously transport-level information even if it's currently
used only by the server socket code.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

11 Jan, 2011: 1 commit

NFS: Don't use vm_map_ram() in readdir · 6650239a

Committed by Trond Myklebust on Jan 08, 2011

vm_map_ram() is not available on NOMMU platforms, and causes trouble
on incoherent architectures such as ARM when we access the page data
through both the direct and the virtual mapping.

The alternative is to use the direct mapping to access page data
for the case when we are not crossing a page boundary, but to copy
the data into a linear scratch buffer when we are accessing data
that spans page boundaries.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: stable@kernel.org  [2.6.37]

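A minimal userspace sketch of the access strategy described above: read an item in place when it lies within one page, and copy it into a linear scratch buffer only when it straddles a page boundary. Pages are modelled as an array of separately allocated buffers with a deliberately tiny page size; `SKETCH_PAGE_SIZE` and `get_bytes_sketch()` are hypothetical, while the kernel code works on real pages.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 16u /* tiny so page boundaries are easy to hit */

/*
 * Return a pointer to 'len' bytes starting at byte offset 'off' of the
 * page array. If the range fits in one page, point straight into it;
 * otherwise copy the pieces into 'scratch' (at least 'len' bytes long).
 */
static const unsigned char *get_bytes_sketch(unsigned char *const *pages, size_t off,
                                             size_t len, unsigned char *scratch)
{
	size_t idx = off / SKETCH_PAGE_SIZE;
	size_t pgoff = off % SKETCH_PAGE_SIZE;

	if (pgoff + len <= SKETCH_PAGE_SIZE)
		return pages[idx] + pgoff; /* direct access, no copy */

	for (size_t done = 0; done < len; ) {
		size_t chunk = SKETCH_PAGE_SIZE - pgoff;

		if (chunk > len - done)
			chunk = len - done;
		memcpy(scratch + done, pages[idx] + pgoff, chunk);
		done += chunk;
		idx++;
		pgoff = 0;
	}
	return scratch;
}
```

Only items that cross a boundary pay for a copy, and no second virtual mapping of the pages is needed.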

07 Jan, 2011: 9 commits

fs: dcache reduce branches in lookup path · fb045adb

Committed by Nick Piggin on Jan 07, 2011

Reduce some branches and memory accesses in dcache lookup by adding dentry
flags to indicate common d_ops are set, rather than having to check them.
This saves a pointer memory access (dentry->d_op) in common path lookup
situations, and saves another pointer load and branch in cases where we
have d_op but not the particular operation.

Patched with:

git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i
Signed-off-by: Nick Piggin <npiggin@kernel.dk>

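A minimal sketch of the technique: when the operations table is installed, record which optional operations it provides as flag bits on the object itself, so the hot path tests a flag in memory it has already touched instead of loading `d_op` and then the function pointer. `struct dentry_sketch`, the `*_SKETCH` flags, `set_d_op_sketch()` and `lookup_hash_sketch()` are hypothetical simplifications of the `d_set_d_op()` approach used in this commit.

```c
#include <stddef.h>

struct dentry_sketch;

struct dentry_ops_sketch {
	unsigned int (*d_hash)(const struct dentry_sketch *); /* optional */
	int (*d_compare)(const struct dentry_sketch *);       /* optional */
};

#define DCACHE_OP_HASH_SKETCH    0x1u
#define DCACHE_OP_COMPARE_SKETCH 0x2u

struct dentry_sketch {
	unsigned int d_flags;
	const struct dentry_ops_sketch *d_op;
};

/* Install the ops table and cache which optional ops are present. */
static void set_d_op_sketch(struct dentry_sketch *d, const struct dentry_ops_sketch *op)
{
	d->d_op = op;
	d->d_flags &= ~(DCACHE_OP_HASH_SKETCH | DCACHE_OP_COMPARE_SKETCH);
	if (op == NULL)
		return;
	if (op->d_hash)
		d->d_flags |= DCACHE_OP_HASH_SKETCH;
	if (op->d_compare)
		d->d_flags |= DCACHE_OP_COMPARE_SKETCH;
}

/* Hot path: one flag test replaces "d_op != NULL && d_op->d_hash != NULL". */
static unsigned int lookup_hash_sketch(const struct dentry_sketch *d, unsigned int dflt)
{
	if (d->d_flags & DCACHE_OP_HASH_SKETCH)
		return d->d_op->d_hash(d);
	return dflt;
}

/* Example op used for demonstration. */
static unsigned int const_hash(const struct dentry_sketch *d)
{
	(void)d;
	return 42;
}
```

The invariant is that `d_op` is only ever assigned through the setter, which is what the sed conversion to `d_set_d_op()` enforces.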

fs: icache RCU free inodes · fa0d7e3d

Committed by Nick Piggin on Jan 07, 2011

RCU free the struct inode. This will allow:

- Subsequent store-free path walking patch. The inode must be consulted for
  permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
  to take i_lock no longer need to take sb_inode_list_lock to walk the list in
  the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the
  page lock to follow page->mapping.

The downside of this is the performance cost of using RCU. In a simple
creat/unlink microbenchmark, performance drops by about 10% due to inability to
reuse cache-hot slab objects. As iterations increase and RCU freeing starts
kicking over, this increases to about 20%.

In cases where inode lifetimes are longer (i.e. many inodes may be allocated
during the average life span of a single inode), a lot of this cache reuse is
not applicable, so the regression caused by this patch is smaller.

The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
however this adds some complexity to list walking and store-free path walking,
so I prefer to implement this at a later date, if it is shown to be a win in
real situations. I haven't found a regression in any non-micro benchmark so I
doubt it will be a problem.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>

fs: change d_delete semantics · fe15ce44

Committed by Nick Piggin on Jan 07, 2011

Change d_delete from a dentry deletion notification to a dentry caching
advice, more like ->drop_inode. Require it to be constant and idempotent,
and not take d_lock. This is how all existing filesystems use the callback
anyway.

This makes fine grained dentry locking of dput and dentry lru scanning
much simpler.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>

NFS rename client back channel transport field · 4a19de0f