- 12 March 2011, 4 commits
-
-
Committed by Tom Tucker

The RPCRDMA marshalling logic assumed that xdr->page_base was an offset into the first page of xdr->page_list. It is in fact an offset into the xdr->page_list itself; that is, it selects the first page in the page_list and the offset into that page.

The symptom depended in part on the rpc_memreg_strategy: if it was FRMR, or some other one-shot mapping mode, the connection would get torn down on a base-and-bounds error. When the badly marshalled RPC was retransmitted it would reconnect, hit the error, and tear down the connection again in a loop forever, resulting in a hung mount. For the other modes, it would result in silent data corruption. The bug is most easily reproduced by writing more data than the filesystem has space for.

This fix corrects the page_base assumption and otherwise simplifies the iov mapping logic.

Signed-off-by: Tom Tucker <tom@ogc.us>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
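To illustrate the corrected interpretation, here is a minimal sketch (not the actual patch; the helper is hypothetical, though the fields are those of the kernel's struct xdr_buf): page_base is split into a page index and an in-page offset rather than being applied to pages[0] alone.

    /* Sketch only: decompose xdr->page_base into (page, offset). */
    static void xdr_first_segment(struct xdr_buf *xdr, struct page **page,
                                  size_t *offset, size_t *len)
    {
            unsigned int i = xdr->page_base >> PAGE_SHIFT;  /* which page */
            size_t off = xdr->page_base & ~PAGE_MASK;       /* offset within it */

            *page = xdr->pages[i];
            *offset = off;
            *len = min_t(size_t, PAGE_SIZE - off, xdr->page_len);
    }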
-
Committed by Andy Adamson

Use our own async error handler. Mark the layout as failed and retry I/O through the MDS on the specified errors. Update the mds_offset in nfs_readpage_retry so that a failed short-read retry to a DS gets correctly resent through the MDS.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
-
Committed by Fred Isaman

rpc_run_task() can only fail if it is not passed a preallocated task. However, that is not at all clear from the current code, so remove several failure checks that can never trigger.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
-
Committed by Fred Isaman

queue_work() only returns 0 or 1, never a negative value.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
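A minimal sketch of the pattern this implies (the workqueue and work item here are hypothetical, not taken from the patch): treat queue_work() as a boolean rather than as a source of negative errnos.

    static void kick_worker(struct workqueue_struct *wq, struct work_struct *work)
    {
            /* queue_work() returns 1 if it queued the work and 0 if the
             * work was already pending; it never returns a negative
             * errno, so an "if (ret < 0)" branch is dead code. */
            if (!queue_work(wq, work))
                    pr_debug("work already queued\n");
    }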
-
- 11 March 2011, 3 commits
-
-
Committed by Ben Hutchings

xs_create_sock() is supposed to return a pointer or an ERR_PTR-encoded error, but it currently returns 0 if xs_bind() fails.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Cc: stable@kernel.org [v2.6.37]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
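A sketch of the convention the fix restores (variable names illustrative, not the exact xs_create_sock() code): propagate the negative errno from xs_bind() as an ERR_PTR.

    err = xs_bind(transport, sock);
    if (err) {
            sock_release(sock);
            /* Not "return 0": callers test IS_ERR(), so a bare 0 (NULL)
             * would be mistaken for a successfully created socket. */
            return ERR_PTR(err);
    }
    return sock;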
-
Committed by Jesper Juhl

We leak the memory allocated to 'ctxt' when we return after 'ib_dma_mapping_error()' returns != 0.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
-
Committed by Trond Myklebust

Although they run as rpciod background tasks, under normal operation (i.e. no SIGKILL) functions like nfs_sillyrename(), nfs4_proc_unlck() and nfs4_do_close() want to be fully synchronous. This means that when we exit, we want all references to the rpc_task to be gone, and we want any dentry references etc. held by that task to be released.

For this reason these functions call __rpc_wait_for_completion_task(), followed by rpc_put_task(), in the expectation that the latter will release the last reference to the rpc_task and thus ensure that callback_ops->rpc_release() has been called synchronously.

This patch fixes a race that exists because rpciod calls rpc_complete_task() (in order to wake up the callers of __rpc_wait_for_completion_task()) and then subsequently calls rpc_put_task(), without ensuring that these two steps are done atomically.

In order to avoid adding new spin locks, the patch uses the existing waitqueue spin lock to order the rpc_task reference count releases between the waiting process and rpciod. The common case where nobody is waiting for completion is optimised by checking whether the RPC_TASK_ASYNC flag is cleared and/or whether the rpc_task reference count is 1: in those cases we skip taking the spin lock and immediately free the rpc_task.

Those few processes that need to put the rpc_task from inside an asynchronous context and that do not care about ordering are given a new helper: rpc_put_task_async().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
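A sketch of the ordering idea, heavily simplified relative to the real code in net/sunrpc/sched.c (the tk_completed flag and the helper are hypothetical; tk_count is the task's real reference count): performing the wake-up and rpciod's own reference drop under the waitqueue's spinlock means a waiter, which must take that same lock to finish its wait, cannot free the task in between.

    static void rpc_complete_and_put(struct rpc_task *task, wait_queue_head_t *wq)
    {
            unsigned long flags;
            bool last;

            spin_lock_irqsave(&wq->lock, flags);
            task->tk_completed = 1;                 /* hypothetical flag */
            __wake_up_locked_key(wq, TASK_NORMAL, NULL);
            last = atomic_dec_and_test(&task->tk_count);
            spin_unlock_irqrestore(&wq->lock, flags);

            if (last)
                    rpc_free_task(task);            /* we held the last reference */
    }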
-
- 26 January 2011, 1 commit
-
-
Committed by Andy Adamson

The information required to find the nfs_client corresponding to the incoming backchannel request is contained in the NFS layer. Perform minimal checking in the RPC layer pg_authenticate method, and push the more detailed checking into the NFS layer, where the nfs_client can be found.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
-
- 12 January 2011, 3 commits
-
-
Committed by J. Bruce Fields

This allows us to reuse the xprt associated with a server connection if one has already been set up.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by J. Bruce Fields

Multiple backchannels can share the same TCP connection; from RFC 5661, section 2.10.3.1:

    A connection's association with a session is not exclusive. A
    connection associated with the channel(s) of one session may be
    simultaneously associated with the channel(s) of other sessions
    including sessions associated with other client IDs.

However, when multiple backchannels share a connection, they must all share the same xid stream (hence the same rpc_xprt); the only way we have to match replies with calls at the RPC layer is the xid. So, keep the rpc_xprt around as long as the connection lasts, in case we're asked to use the connection as a backchannel again.

Requests to create new backchannel clients over a given server connection should result in new clients that reuse the existing rpc_xprt. But to start, just reject attempts to associate multiple rpc_xprts with the same underlying bc_xprt.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by J. Bruce Fields

This seems obviously transport-level information, even if it's currently used only by the server socket code.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
- 11 January 2011, 1 commit
-
-
Committed by Trond Myklebust

vm_map_ram() is not available on NOMMU platforms, and causes trouble on incoherent architectures such as ARM when we access the page data through both the direct and the virtual mapping. The alternative is to use the direct mapping to access page data when we are not crossing a page boundary, but to copy the data into a linear scratch buffer when we are accessing data that spans page boundaries.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: stable@kernel.org [2.6.37]
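A minimal sketch of the access pattern described above (the helper and scratch buffer are hypothetical; the real logic lives in the xdr_stream code): read directly from a page when the range fits inside it, and fall back to copying into a linear scratch buffer only when the range crosses a page boundary.

    /* Assumes len <= PAGE_SIZE (so at most two pages are touched) and
     * that the pages are kernel-addressable. */
    static void *xdr_peek_bytes(struct page **pages, size_t base, size_t len,
                                char *scratch)
    {
            unsigned int pgnr = base >> PAGE_SHIFT;
            size_t off = base & ~PAGE_MASK;

            if (off + len <= PAGE_SIZE)
                    return page_address(pages[pgnr]) + off;   /* direct mapping */

            /* Spans a page boundary: assemble into the scratch buffer. */
            memcpy(scratch, page_address(pages[pgnr]) + off, PAGE_SIZE - off);
            memcpy(scratch + PAGE_SIZE - off, page_address(pages[pgnr + 1]),
                   len - (PAGE_SIZE - off));
            return scratch;
    }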
-
- 07 January 2011, 9 commits
-
-
Committed by Nick Piggin

Reduce branches and memory accesses in dcache lookup by adding dentry flags to indicate that common d_ops are set, rather than having to check the d_op pointers themselves. This saves a pointer memory access (dentry->d_op) in common path lookup situations, and saves another pointer load and branch in cases where we have a d_op but not the particular operation.

Patched with:

git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
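A sketch of what a d_set_d_op()-style helper does (the flag names follow the DCACHE_OP_* convention this series introduced; treat the exact set as illustrative): record which operations exist in d_flags so the hot lookup path can test a flag word instead of chasing dentry->d_op.

    static void d_set_d_op(struct dentry *dentry,
                           const struct dentry_operations *op)
    {
            dentry->d_op = op;
            if (!op)
                    return;
            if (op->d_hash)
                    dentry->d_flags |= DCACHE_OP_HASH;
            if (op->d_compare)
                    dentry->d_flags |= DCACHE_OP_COMPARE;
            if (op->d_revalidate)
                    dentry->d_flags |= DCACHE_OP_REVALIDATE;
            if (op->d_delete)
                    dentry->d_flags |= DCACHE_OP_DELETE;
    }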
-
Committed by Nick Piggin

RCU-free the struct inode. This will allow:

- The subsequent store-free path walking patch. The inode must be consulted for permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock, because sb list walkers who want to take i_lock no longer need to take sb_inode_list_lock to walk the list in the first place. This will simplify and optimize locking.
- Removing some nested trylock loops in dcache code.
- Potentially simplifying things a bit in VM land: we would not need to take the page lock to follow page->mapping.

The downside of this is the performance cost of using RCU. In a simple creat/unlink microbenchmark, performance drops by about 10% due to the inability to reuse cache-hot slab objects. As iterations increase and RCU freeing starts kicking over, this increases to about 20%. In cases where inode lifetimes are longer (i.e. many inodes may be allocated during the average life span of a single inode), much of this cache reuse does not apply, so the regression caused by this patch is smaller.

The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU, however this adds some complexity to list walking and store-free path walking, so I prefer to implement it at a later date, if it is shown to be a win in real situations. I haven't found a regression in any non-micro benchmark, so I doubt it will be a problem.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
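The core mechanism is small; a minimal sketch of RCU-deferred inode freeing (the slab cache name is illustrative, and filesystems hook this through their own destroy_inode paths):

    /* Free the inode only after an RCU grace period, so lock-free
     * walkers still holding an RCU reference never see freed memory. */
    static void inode_free_rcu(struct rcu_head *head)
    {
            struct inode *inode = container_of(head, struct inode, i_rcu);

            kmem_cache_free(inode_cachep, inode);   /* 'inode_cachep' assumed */
    }

    static void example_destroy_inode(struct inode *inode)
    {
            call_rcu(&inode->i_rcu, inode_free_rcu);
    }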
-
Committed by Nick Piggin

Change d_delete from a dentry deletion notification to a dentry caching advice, more like ->drop_inode. Require it to be constant and idempotent, and not to take d_lock. This is how all existing filesystems use the callback anyway. This makes fine-grained dentry locking of dput and dentry LRU scanning much simpler.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
-
Committed by Andy Adamson

Differentiate from the server backchannel.

Signed-off-by: Andy Adamson <andros@netapp.com>
Acked-by: Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
-
Committed by Andy Adamson

The session-based callback service is started prior to the CREATE_SESSION call so that it can handle CB_NULL requests, which can be sent before the CREATE_SESSION call returns and the session ID is known. Set the callback sessionid after a successful CREATE_SESSION.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
-
Committed by Andy Adamson

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
-
Committed by Andy Adamson

Move the current sock create and destroy routines into the new transport ops. The back channel socket will be destroyed by the svc_close_all call in svc_destroy. Added check: only TCP is supported on the shared back channel.

Signed-off-by: Andy Adamson <andros@netapp.com>
Acked-by: Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
-
Committed by Andy Adamson

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
-
Committed by Andy Adamson

The NFSv4.1 shared back channel does not need to call svc_drop because the callback service never outlives the single connection it services, and it reuses its buffers and keeps the transport.

Signed-off-by: Andy Adamson <andros@netapp.com>
Acked-by: Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
-
- 05 January 2011, 7 commits
-
-
Committed by J. Bruce Fields

Suppose cache_check runs simultaneously with an update on a different CPU:

    cache_check                        task doing update
    ^^^^^^^^^^^                        ^^^^^^^^^^^^^^^^^
    1. test for CACHE_VALID            1'. set entry->data
       & !CACHE_NEGATIVE
    2. use entry->data                 2'. set CACHE_VALID

If the two memory writes performed in steps 1' and 2' appear misordered with respect to the reads in steps 1 and 2, then the caller could get stale data at step 2 even though it saw CACHE_VALID set on the cache entry. Add memory barriers to prevent this.

Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
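A sketch of the barrier pairing this implies (function names simplified; the real code is in net/sunrpc/cache.c): the updater publishes the data with smp_wmb() before setting CACHE_VALID, and the reader pairs that with smp_rmb() after observing CACHE_VALID.

    /* Writer: make the payload visible before the VALID bit. */
    static void cache_publish(struct cache_head *h)
    {
            /* ... entry->data fields filled in here ... */
            smp_wmb();              /* order data stores before the flag */
            set_bit(CACHE_VALID, &h->flags);
    }

    /* Reader: only trust the data once reordering is ruled out. */
    static bool cache_entry_usable(struct cache_head *h)
    {
            if (!test_bit(CACHE_VALID, &h->flags))
                    return false;
            smp_rmb();              /* pairs with the writer's smp_wmb() */
            return !test_bit(CACHE_NEGATIVE, &h->flags);
    }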
-
Committed by J. Bruce Fields

We attempt to turn a cache entry negative in place. But that entry may already have been filled in by some other task since we last checked whether it was valid, so we could be modifying an already-valid entry. If nothing else, there's a likely leak in such a case when the entry is eventually put() and its contents are not freed because it has CACHE_NEGATIVE set. So, take the cache_lock, just as sunrpc_cache_update() does.

Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by J. Bruce Fields

Currently we use -EAGAIN returns to determine when to drop a deferred request. On its own that is error-prone, as it makes us treat -EAGAIN returns from other functions specially to prevent inadvertent dropping. So, use a flag on the request instead. Returning an error on request deferral is still required, to prevent further processing, but we no longer need to worry that an error return on its own could result in a drop.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by J. Bruce Fields

Commit d29068c4 "sunrpc: Simplify cache_defer_req and related functions." asserted that cache_check() could determine success or failure of cache_defer_req() by checking the CACHE_PENDING bit. This isn't quite right. We need to know whether cache_defer_req() created a deferred request, in which case sending an rpc reply has become the responsibility of the deferred request, and it is important that we not send our own reply as well, resulting in two different replies to the same request. And the CACHE_PENDING bit doesn't tell us that; we could have successfully created a deferred request at the same time as another thread cleared the CACHE_PENDING bit.

So, partially revert that commit, to ensure that cache_check() returns -EAGAIN if and only if a deferred request has been created.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: NeilBrown <neilb@suse.de>
-
Committed by J. Bruce Fields

Signed-off-by: NeilBrown <neilb@suse.de>
[bfields@redhat.com: moved svcauth_unix_purge outside ifdefs.]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by J. Bruce Fields

Once a sunrpc cache entry is VALID, we should be replacing it (and allowing any concurrent users to destroy it on last put) instead of trying to update it in place. Otherwise someone referencing the ip_map we're modifying here could try to use the m_client just as we're putting the last reference. The bug should only be seen by users of the legacy nfsd interfaces. (Thanks to Neil for the suggestion to use sunrpc_invalidate.)

Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Trond Myklebust

On Tue, 2010-12-14 at 16:58 +0800, Mi Jinlong wrote:

> Hi,
>
> When testing NFSv4 at RHEL6 with kernel 2.6.32, I got a kernel panic
> at the NFS client's __rpc_create_common function.
>
> The panic place is:
>   rpc_mkpipe
>     __rpc_lookup_create()        <=== find pipefile *idmap*
>     __rpc_mkpipe()               <=== pipefile is *idmap*
>       __rpc_create_common()
>         ****** BUG_ON(!d_unhashed(dentry)); ******  *panic*
>
> It means that the dentry's d_flags has been set DCACHE_UNHASHED,
> but it should not be set here.
>
> Does anyone know this bug, or can you give me some idea?
>
> A reproduce script is appended, but it can't reproduce the bug every time.
> The export is: "/nfsroot *(rw,no_root_squash,fsid=0,insecure)"
>
> The panic message is also appended.
>
> ============================================================================
> #!/bin/sh
>
> LOOPTOTAL=768
> LOOPCOUNT=0
> ret=0
>
> while [ $LOOPCOUNT -ne $LOOPTOTAL ]
> do
>   ((LOOPCOUNT += 1))
>   service nfs restart
>   /usr/sbin/rpc.idmapd
>   mount -t nfs4 127.0.0.1:/ /mnt || return 1;
>   ls -l /var/lib/nfs/rpc_pipefs/nfs/*/
>   umount /mnt
>   echo $LOOPCOUNT
> done
>
> ===============================================================================
> Code: af 60 01 00 00 89 fa 89 f0 e8 64 cf 89 f0 e8 5c 7c 64 cf 31 c0 8b 5c 24 10 8b
> 74 24 14 8b 7c 24 18 8b 6c 24 1c 83 c4 20 c3 <0f> 0b eb fc 8b 46 28 c7 44 24 08 20
> de ee f0 c7 44 24 04 56 ea
> EIP: [<f0ee92ea>] __rpc_create_common+0x8a/0xc0 [sunrpc] SS:ESP 0068:eccb5d28
> ---[ end trace 8f5606cd08928ed2 ]---
> Kernel panic - not syncing: Fatal exception
> Pid: 7131, comm: mount.nfs4 Tainted: G D -------------------- 2.6.32 #1
> Call Trace:
> [<c080ad18>] ? panic+0x42/0xed
> [<c080e42c>] ? oops_end+0xbc/0xd0
> [<c040b090>] ? do_invalid_op+0x0/0x90
> [<c040b10f>] ? do_invalid_op+0x7f/0x90
> [<f0ee92ea>] ? __rpc_create_common+0x8a/0xc0 [sunrpc]
> [<f0edc433>] ? rpc_free_task+0x33/0x70 [sunrpc]
> [<f0ed6508>] ? rpc_call_sync+0x48/0x60 [sunrpc]
> [<f0ed656e>] ? rpc_ping+0x4e/0x60 [sunrpc]
> [<f0ed6eaf>] ? rpc_create+0x38f/0x4f0 [sunrpc]
> [<c080d80b>] ? error_code+0x73/0x78
> [<f0ee92ea>] ? __rpc_create_common+0x8a/0xc0 [sunrpc]
> [<c0532bda>] ? d_lookup+0x2a/0x40
> [<f0ee94b1>] ? rpc_mkpipe+0x111/0x1b0 [sunrpc]
> [<f10a59f4>] ? nfs_create_rpc_client+0xb4/0xf0 [nfs]
> [<f10d6c6d>] ? nfs_fscache_get_client_cookie+0x1d/0x50 [nfs]
> [<f10d3fcb>] ? nfs_idmap_new+0x7b/0x140 [nfs]
> [<c05e76aa>] ? strlcpy+0x3a/0x60
> [<f10a60ca>] ? nfs4_set_client+0xea/0x2b0 [nfs]
> [<f10a6d0c>] ? nfs4_create_server+0xac/0x1b0 [nfs]
> [<c04f1400>] ? krealloc+0x40/0x50
> [<f10b0e8b>] ? nfs4_remote_get_sb+0x6b/0x250 [nfs]
> [<c04f14ec>] ? kstrdup+0x3c/0x60
> [<c0520739>] ? vfs_kern_mount+0x69/0x170
> [<f10b1a3c>] ? nfs_do_root_mount+0x6c/0xa0 [nfs]
> [<f10b1b47>] ? nfs4_try_mount+0x37/0xa0 [nfs]
> [<f10afe6d>] ? nfs4_validate_text_mount_data+0x7d/0xf0 [nfs]
> [<f10b1c42>] ? nfs4_get_sb+0x92/0x2f0
> [<c0520739>] ? vfs_kern_mount+0x69/0x170
> [<c05366d2>] ? get_fs_type+0x32/0xb0
> [<c052089f>] ? do_kern_mount+0x3f/0xe0
> [<c053954f>] ? do_mount+0x2ef/0x740
> [<c0537740>] ? copy_mount_options+0xb0/0x120
> [<c0539a0e>] ? sys_mount+0x6e/0xa0

Hi,

Does the following patch fix the problem?

Cheers,
  Trond

--------------------------
SUNRPC: Fix a BUG in __rpc_create_common

From: Trond Myklebust <Trond.Myklebust@netapp.com>

Mi Jinlong reports: When testing NFSv4 at RHEL6 with kernel 2.6.32, I got a kernel panic at the NFS client's __rpc_create_common function.

The panic place is:
  rpc_mkpipe
    __rpc_lookup_create()        <=== find pipefile *idmap*
    __rpc_mkpipe()               <=== pipefile is *idmap*
      __rpc_create_common()
        ****** BUG_ON(!d_unhashed(dentry)); ******  *panic*

The test is wrong: we can find ourselves with a hashed negative dentry here if the idmapper tried to look up the file before we got round to creating it. Just replace the BUG_ON() with a d_drop(dentry).

Reported-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
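The fix itself is small; a hedged sketch of the relevant hunk in __rpc_create_common() (surrounding code omitted):

    /* Previously: BUG_ON(!d_unhashed(dentry));
     * A hashed negative dentry can legitimately exist here if the
     * idmapper looked the name up before we got around to creating it,
     * so just unhash it instead of crashing. */
    d_drop(dentry);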
-
- 22 December 2010, 1 commit
-
-
Committed by Joe Perches

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
-
- 18 December 2010, 4 commits
-
-
Committed by Joe Perches

Also remove an unnecessary double semicolon. No effect on the code, as the test is != 0.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Shan Wei

These macros have not been used for several years.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by NeilBrown

Currently svc_sock_names calls svc_close_xprt on a svc_sock to which it does not own a reference. As soon as svc_close_xprt sets XPT_CLOSE, the socket could be freed by a separate thread (though this is a very unlikely race). It is safer to hold a reference while calling svc_close_xprt.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
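A sketch of the safer pattern (the svsk variable is illustrative of the svc_sock in question): pin the transport before asking it to close, so a concurrent thread acting on XPT_CLOSE cannot free it underneath us.

    svc_xprt_get(&svsk->sk_xprt);
    svc_close_xprt(&svsk->sk_xprt);
    svc_xprt_put(&svsk->sk_xprt);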
-
Committed by NeilBrown

The xpt_pool field is only used for reporting BUGs, and it isn't used correctly. In particular, when it is cleared in svc_xprt_received before XPT_BUSY is cleared, there is no guarantee that either the compiler or the CPU won't re-order the two assignments, effectively setting xpt_pool to NULL after XPT_BUSY is cleared. If a different CPU were running svc_xprt_enqueue at this moment, it might see XPT_BUSY clear and then xpt_pool non-NULL, and so BUG.

This could be fixed by calling smp_mb__before_clear_bit() before the clear_bit. However, as xpt_pool isn't really used, it seems safest to simply remove it. Another alternative would be to change the clear_bit to clear_bit_unlock, and the test_and_set_bit to test_and_set_bit_lock.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
- 17 December 2010, 4 commits
-
-
Committed by Chuck Lever

Now that all client-side XDR decoder routines use xdr_streams, there should be no need to support the legacy calling sequence [rpc_rqst *, __be32 *, RPC res *] anywhere. We can construct an xdr_stream in the generic RPC code, instead of in each decoder function. This is a refactoring change; it should not cause different behavior.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
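A sketch of the call-site shape this describes (simplified; xdr_init_decode() is shown in its three-argument form of that era, and the dispatch helper is hypothetical): the generic layer builds the xdr_stream once and hands it to the per-procedure decoder.

    static int rpc_decode_reply(struct rpc_rqst *req, kxdrdproc_t decode, void *res)
    {
            struct xdr_stream xdr;

            xdr_init_decode(&xdr, &req->rq_rcv_buf,
                            (__be32 *)req->rq_rcv_buf.head[0].iov_base);
            /* Every decoder now sees an xdr_stream, not a raw __be32 *p. */
            return decode(req, &xdr, res);
    }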
-
Committed by Chuck Lever

Now that all client-side XDR encoder routines use xdr_streams, there should be no need to support the legacy calling sequence [rpc_rqst *, __be32 *, RPC arg *] anywhere. We can construct an xdr_stream in the generic RPC code, instead of in each encoder function. Also, all the client-side encoder functions now return 0, making a return value superfluous; take this opportunity to convert them to return void instead. This is a refactoring change; it should not cause different behavior.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
-
Committed by Chuck Lever

Clean up. We just fixed a panic where the nrprocs field in a different upper layer client was set by hand, incorrectly. Use the compiler-generated method used by the other upper layer protocols.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
-
Committed by Chuck Lever

Clean up. The trend in the other XDR encoder functions is to BUG() when encoding problems occur, since a problem here is always due to a local coding error; then, instead of a status, zero is unconditionally returned. Update the rpcbind XDR encoders to behave this way. To finish the update, use the new-style be32_to_cpup() and cpu_to_be32() macros, and compute the buffer sizes using raw integers instead of sizeof(). This matches the conventions used in other XDR functions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
-
- 15 December 2010, 1 commit
-
-
Committed by Tejun Heo

cancel_rearming_delayed_work[queue]() has been superseded by cancel_delayed_work_sync() for quite some time. Convert all the in-kernel users; the conversions are completely equivalent and trivial.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: "David S. Miller" <davem@davemloft.net>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: netdev@vger.kernel.org
Cc: Anton Vorontsov <cbou@mail.ru>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Alex Elder <aelder@sgi.com>
Cc: xfs-masters@oss.sgi.com
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: netfilter-devel@vger.kernel.org
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: linux-nfs@vger.kernel.org
-
- 08 December 2010, 1 commit
-
-
Committed by NeilBrown

When an xprt is created, it has a refcount of 1, and XPT_BUSY is set. The refcount is *not* owned by the thread that created the xprt (as is clear from the fact that creators never put the reference); rather, it is owned by the absence of XPT_DEAD. Once XPT_DEAD is set (and XPT_BUSY is clear), that initial reference is dropped and the xprt can be freed.

So when a creator clears XPT_BUSY, it is dropping its only reference and must not touch the xprt again.

However svc_recv, after calling ->xpo_accept (and so getting an XPT_BUSY reference on a new xprt), calls svc_xprt_received. This clears XPT_BUSY and then calls svc_xprt_enqueue, the latter without owning a reference. This is dangerous and has been seen to leave svc_xprt_enqueue working with an xprt containing garbage.

So we need to hold an extra counted reference over that call to svc_xprt_received. For safety, any time we clear XPT_BUSY and then use the xprt again, we first get a reference and then put it again afterwards.

Note that svc_close_all does not need this extra protection, as there are no threads running and the final free can only be called asynchronously from such a thread.

Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
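A sketch of the protection described above (simplified from svc_recv; newxpt comes from ->xpo_accept): hold a counted reference across svc_xprt_received(), because clearing XPT_BUSY may hand our only reference to another thread.

    newxpt = xprt->xpt_ops->xpo_accept(xprt);
    if (newxpt) {
            svc_xprt_get(newxpt);
            svc_xprt_received(newxpt);
            svc_xprt_put(newxpt);
    }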
-
- 23 November 2010, 1 commit
-
-
Committed by Trond Myklebust

If the rpcauth_refreshcred() call returns an error other than EACCES, ENOMEM or ETIMEDOUT, we currently end up looping forever between call_refresh and call_refreshresult. The correct thing to do here is to exit on all errors except EAGAIN and ETIMEDOUT, in which case we retry 3 times and then return EACCES.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
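A hedged sketch of the resulting policy (structure only; the real logic lives in call_refreshresult() in net/sunrpc/clnt.c and keeps the retry count in the rpc_task):

    static void call_refreshresult(struct rpc_task *task)
    {
            int status = task->tk_status;

            task->tk_status = 0;
            switch (status) {
            case 0:
                    break;                          /* credential refreshed */
            case -ETIMEDOUT:
            case -EAGAIN:
                    if (task->tk_cred_retry-- > 0) {
                            task->tk_action = call_refresh;
                            return;                 /* retry, up to 3 times */
                    }
                    status = -EACCES;               /* give up with EACCES */
                    /* fall through */
            default:
                    rpc_exit(task, status);         /* exit on all other errors */
                    return;
            }
            task->tk_action = call_allocate;        /* continue the call */
    }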
-