Commits · 05f4c350ee02e9461c6ae3a880ea326a06835e37 · openanolis / cloud-kernel

02 Oct, 2012 26 commits

NFS: Discover NFSv4 server trunking when mounting · 05f4c350

Authored by Chuck Lever on Sep 14, 2012

"Server trunking" is a fancy name for a multi-homed NFS server.
Trunking might occur if a client sends NFS requests for a single
workload to multiple network interfaces on the same server.  There
are some implications for NFSv4 state management that make it useful
for a client to know if a single NFSv4 server instance is
multi-homed.  (Note this is only a consideration for NFSv4, not for
legacy versions of NFS, which are stateless).

If a client cares about server trunking, no NFSv4 operations can
proceed until that client determines who it is talking to.  Thus
server IP trunking discovery must be done when the client first
encounters an unfamiliar server IP address.

The nfs_get_client() function walks the nfs_client_list and matches
on server IP address.  The outcome of that walk tells us immediately
if we have an unfamiliar server IP address.  It invokes
nfs_init_client() in this case.  Thus, nfs4_init_client() is a good
spot to perform trunking discovery.

Discovery requires a client to establish a fresh client ID, so our
client will now send SETCLIENTID or EXCHANGE_ID as the first NFS
operation after a successful ping, rather than waiting for an
application to perform an operation that requires NFSv4 state.

The exact process for detecting trunking is different for NFSv4.0 and
NFSv4.1, so a minorversion-specific init_client callout method is
introduced.

CLID_INUSE recovery is important for the trunking discovery process.
CLID_INUSE is a sign the server recognizes the client's nfs_client_id4
id string, but the client is using the wrong principal this time for
the SETCLIENTID operation.  The SETCLIENTID must be retried with a
series of different principals until one works, and then the rest of
trunking discovery can proceed.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

05f4c350
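
The CLID_INUSE recovery described in 05f4c350 amounts to a retry loop over candidate principals. The following is a minimal userspace sketch of that idea, not the kernel code: `discover_principal`, `setclientid_fn`, `mock_server`, and the principal names are hypothetical, and the only value taken from the protocol is 10017, the NFS4ERR_CLID_INUSE status in RFC 3530.

```c
#include <stddef.h>
#include <string.h>

/* Status codes for this sketch; 10017 is NFS4ERR_CLID_INUSE in RFC 3530. */
enum { SKETCH_OK = 0, SKETCH_CLID_INUSE = 10017 };

/* Hypothetical callback: attempt SETCLIENTID using one principal. */
typedef int (*setclientid_fn)(const char *principal, void *ctx);

/*
 * Try each candidate principal in order until the server stops answering
 * CLID_INUSE.  Returns the index of the principal that worked, or -1 if
 * every candidate was rejected or an unrelated error occurred.
 */
static int discover_principal(const char *const *principals, size_t n,
                              setclientid_fn attempt, void *ctx)
{
    for (size_t i = 0; i < n; i++) {
        int status = attempt(principals[i], ctx);

        if (status == SKETCH_OK)
            return (int)i;
        if (status != SKETCH_CLID_INUSE)
            return -1;          /* unrelated failure: stop retrying */
    }
    return -1;
}

/* Mock server (assumption): accepts only the principal named by ctx. */
static int mock_server(const char *principal, void *ctx)
{
    return strcmp(principal, (const char *)ctx) == 0 ? SKETCH_OK
                                                     : SKETCH_CLID_INUSE;
}
```

Only once a principal is accepted can the rest of trunking discovery proceed, which is why the loop returns the index of the first success rather than collecting results.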

NFS: Use the same nfs_client_id4 for every server · e984a55a

Authored by Chuck Lever on Sep 14, 2012

Currently, when identifying itself to NFS servers, the Linux NFS
client uses a unique nfs_client_id4.id string for each server IP
address it talks with.  For example, when client A talks to server X,
the client identifies itself using a string like "AX".  The
requirements for these strings are specified in detail by RFC 3530
(and bis).

This form of client identification presents a problem for Transparent
State Migration.  When client A's state on server X is migrated to
server Y, it continues to be associated with string "AX."  But,
according to the rules of client string construction above, client
A will present string "AY" when communicating with server Y.

Server Y thus has no way to know that client A should be associated
with the state migrated from server X.  "AX" is all but abandoned,
interfering with establishing fresh state for client A on server Y.

To support transparent state migration, then, NFSv4.0 clients must
instead use the same nfs_client_id4.id string to identify themselves
to every NFS server; something like "A".

Now a client identifies itself as "A" to server X.  When a file
system on server X transitions to server Y, and client A identifies
itself as "A" to server Y, Y will know immediately that the state
associated with "A," whether it is native or migrated, is owned by
the client, and can merge both into a single lease.

As a pre-requisite to adding support for NFSv4 migration to the Linux
NFS client, this patch changes the way Linux identifies itself to NFS
servers via the SETCLIENTID (NFSv4 minor version 0) and EXCHANGE_ID
(NFSv4 minor version 1) operations.

In addition to removing the server's IP address from nfs_client_id4,
the Linux NFS client will also no longer use its own source IP address
as part of the nfs_client_id4 string.  On multi-homed clients, the
value of this address depends on the address family and network
routing used to contact the server, thus it can be different for each
server.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

e984a55a
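
The difference between the per-server strings ("AX", "AY") and the uniform string ("A") in e984a55a can be illustrated with a small sketch. This is only an illustration of the naming scheme in the commit message: `build_client_id` and its parameters are hypothetical, and the real nfs_client_id4 string contains more than a client and server name.

```c
#include <stdio.h>

/*
 * Build an illustrative client identifier string.
 * Non-uniform: the identifier embeds the server, e.g. "A" + "X" -> "AX".
 * Uniform:     the identifier depends only on the client, e.g. "A".
 * Returns the snprintf() result (length that was or would be written).
 */
static int build_client_id(char *buf, size_t len, const char *client,
                           const char *server, int uniform)
{
    if (uniform)
        return snprintf(buf, len, "%s", client);
    return snprintf(buf, len, "%s%s", client, server);
}
```

With `uniform` set, the string presented to server X and to server Y is identical, which is what lets server Y associate migrated state with the same client.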

NFS: Introduce "migration" mount option · 89652617

Authored by Chuck Lever on Sep 14, 2012

Currently, the Linux client uses a unique nfs_client_id4.id string
when identifying itself to distinct NFS servers.

To support transparent state migration, the Linux client will have to
use the same nfs_client_id4 string for all servers it communicates
with (also known as the "uniform client string" approach).  Otherwise
NFS servers can not recognize that open and lock state need to be
merged after a file system transition.

Unfortunately, there are some NFSv4.0 servers currently in the field
that do not tolerate the uniform client string approach.

Thus, by default, our NFSv4.0 mounts will continue to use the current
approach, and we introduce a mount option that switches them to use
the uniform model.  Client administrators must identify which servers
can be mounted with this option.  Eventually most NFSv4.0 servers will
be able to handle the uniform approach, and we can change the default.

The first mount of a server controls the behavior for all subsequent
mounts for the lifetime of that set of mounts of that server.  After
the last mount of that server is gone, the client erases the data
structure that tracks the lease.  A subsequent lease may then honor
a different "migration" setting.

This patch adds only the infrastructure for parsing the new mount
option.  Support for uniform client strings is added in a subsequent
patch.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

89652617

SUNRPC: Introduce rpc_clone_client_set_auth() · ba9b584c

Authored by Chuck Lever on Sep 14, 2012

A ULP is supposed to be able to replace a GSS rpc_auth object with
another GSS rpc_auth object using rpcauth_create().  However,
rpcauth_create() in 3.5 reliably fails with -EEXIST in this case.
This is because when gss_create() attempts to create the upcall pipes,
sometimes they are already there, for example if a pipe FS mount
event occurs, or a previous GSS flavor was in use for this rpc_clnt.

It turns out that's not the only problem here.  While working on a
fix for the above problem, we noticed that replacing an rpc_clnt's
rpc_auth is not safe, since dereferencing the cl_auth field is not
protected in any way.

So we're deprecating the ability of rpcauth_create() to switch an
rpc_clnt's security flavor during normal operation.  Instead, let's
add a fresh API that clones an rpc_clnt and gives the clone a new
flavor before it's used.

This makes immediate use of the new __rpc_clone_client() helper.

This can be used in a similar fashion to rpcauth_create() when a
client is hunting for the correct security flavor.  Instead of
replacing an rpc_clnt's security flavor in a loop, the ULP replaces
the whole rpc_clnt.

To fix the -EEXIST problem, any ULP logic that relies on replacing
an rpc_clnt's rpc_auth with rpcauth_create() must be changed to use
this API instead.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

ba9b584c

SUNRPC: Refactor rpc_clone_client() · 1b63a751

Authored by Chuck Lever on Sep 14, 2012

rpc_clone_client() does most of the same tasks as rpc_new_client(),
so there is an opportunity for code re-use.  Create a generic helper
that makes it easy to clone an RPC client while replacing any of the
clnt's parameters.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

1b63a751

SUNRPC: Use __func__ in dprintk() in auth_gss.c · 632f0d05

Authored by Chuck Lever on Sep 14, 2012

Clean up: Some function names have changed, but debugging messages
were never updated.  Automate the construction of the function name
in debugging messages.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

632f0d05
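
The cleanup in 632f0d05 relies on the standard C99 identifier `__func__`, which always expands to the name of the enclosing function. The sketch below shows the general pattern in userspace; `sketch_log`, `sketch_dprintk`, and `gss_sketch_upcall` are hypothetical names, and the kernel's own dprintk() macro is defined differently.

```c
#include <stdarg.h>
#include <stdio.h>

/* Format "<function>: <message>" into buf; returns the formatted length. */
static int sketch_log(char *buf, size_t len, const char *func,
                      const char *fmt, ...)
{
    va_list ap;
    int n, m;

    n = snprintf(buf, len, "%s: ", func);
    if (n < 0 || (size_t)n >= len)
        return n;               /* error, or prefix alone filled the buffer */
    va_start(ap, fmt);
    m = vsnprintf(buf + n, len - (size_t)n, fmt, ap);
    va_end(ap);
    return m < 0 ? m : n + m;
}

/* The caller never spells out its own name, so a rename cannot go stale. */
#define sketch_dprintk(buf, len, ...) \
    sketch_log((buf), (len), __func__, __VA_ARGS__)

/* Hypothetical function; its name is picked up automatically. */
static int gss_sketch_upcall(char *buf, size_t len, int uid)
{
    return sketch_dprintk(buf, len, "uid %d\n", uid);
}
```

Renaming `gss_sketch_upcall` changes the prefix in its messages without touching the format string.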

SUNRPC: Clean up dprintk messages in rpc_pipe.c · d8af9bc1

Authored by Chuck Lever on Sep 14, 2012

Clean up: The blank space in front of the message must be spaces.
Tabs show up on the console as a graphical character.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

d8af9bc1

NFS: Slow down state manager after an unhandled error · ffe5a830

Authored by Chuck Lever on Sep 14, 2012

If the state manager thread is not actually able to fully recover from
some situation, it wakes up waiters, who kick off a new state manager
thread.  Quite often the fresh invocation of the state manager is just
as successful.

This results in a livelock as the client dumps thousands of NFS
requests a second on the network in a vain attempt to recover.  Not
very friendly.

To mitigate this situation, add a delay in the state manager after
an unhandled error, so that the client sends just a few requests
every second in this case.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

ffe5a830

NFS: nfs_parsed_mount_options can use unsigned int · 8cb7f74e

Authored by Chuck Lever on Sep 14, 2012

fs/nfs/super.c: In function ‘nfs_compare_remount_data’:
fs/nfs/super.c:2042:18: warning: comparison between signed and
    unsigned integer expressions [-Wsign-compare]
fs/nfs/super.c:2043:18: warning: comparison between signed and
    unsigned integer expressions [-Wsign-compare]
fs/nfs/super.c:2044:20: warning: comparison between signed and
    unsigned integer expressions [-Wsign-compare]
fs/nfs/super.c:2046:21: warning: comparison between signed and
    unsigned integer expressions [-Wsign-compare]
fs/nfs/super.c:2047:21: warning: comparison between signed and
    unsigned integer expressions [-Wsign-compare]
fs/nfs/super.c:2048:21: warning: comparison between signed and
    unsigned integer expressions [-Wsign-compare]
fs/nfs/super.c:2049:21: warning: comparison between signed and
    unsigned integer expressions [-Wsign-compare]
fs/nfs/super.c:2050:18: warning: comparison between signed and
    unsigned integer expressions [-Wsign-compare]

Seen with gcc (GCC) 4.6.3 20120306 (Red Hat 4.6.3-2).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

8cb7f74e
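
The -Wsign-compare warnings quoted in 8cb7f74e exist because C converts the signed operand to unsigned before a mixed comparison. The sketch below writes that conversion out explicitly so the effect is visible; the function names are hypothetical and are not taken from fs/nfs/super.c.

```c
/*
 * What "int < unsigned int" means in C: the int operand is converted to
 * unsigned int first, so a negative value becomes a very large one.
 * The cast here spells out the conversion the compiler applies implicitly.
 */
static int less_after_conversion(int a, unsigned int b)
{
    return (unsigned int)a < b;
}

/* With both operands unsigned there is no conversion and no warning. */
static int less_unsigned(unsigned int a, unsigned int b)
{
    return a < b;
}
```

For example, `less_after_conversion(-1, 1u)` yields 0, because -1 converts to UINT_MAX. Declaring both sides of a comparison with the same signedness, as the commit title suggests, removes the ambiguity.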

lockd: create and use per-net NSM RPC clients on MON/UNMON requests · cb7323ff

Authored by Stanislav Kinsbursky on Sep 18, 2012

An NSM RPC client can be required on NFSv3 umount, when the child reaper is dying
(and destroying its mount namespace). This means that the current nsproxy is
already set to NULL, but creating an RPC client requires the UTS namespace to
obtain the hostname string.

This patch creates a reference-counted per-net NSM client on the first monitor
request and destroys it after the last unmonitor request.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

cb7323ff
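
The create-on-first-monitor, destroy-on-last-unmonitor scheme in cb7323ff is a plain reference-counting pattern. Below is a minimal single-threaded userspace sketch of it; `struct sketch_net`, `sketch_nsm_get`, and `sketch_nsm_put` are hypothetical names, and real kernel code would also need locking, which is omitted here.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an RPC client. */
struct sketch_client { int id; };

/* Hypothetical per-network-namespace state. */
struct sketch_net {
    struct sketch_client *nsm_clnt;  /* NULL while nothing is monitored */
    unsigned int nsm_users;          /* number of active monitor requests */
};

/* First monitor request creates the client; later ones take a reference. */
static struct sketch_client *sketch_nsm_get(struct sketch_net *net)
{
    if (net->nsm_users == 0) {
        net->nsm_clnt = calloc(1, sizeof(*net->nsm_clnt));
        if (net->nsm_clnt == NULL)
            return NULL;
    }
    net->nsm_users++;
    return net->nsm_clnt;
}

/* Last unmonitor request destroys the client. */
static void sketch_nsm_put(struct sketch_net *net)
{
    if (net->nsm_users == 0)
        return;                      /* unbalanced put: ignored in this sketch */
    if (--net->nsm_users == 0) {
        free(net->nsm_clnt);
        net->nsm_clnt = NULL;
    }
}
```

Because the client already exists while anything is monitored, the unmonitor path never has to create one, and so never needs the UTS namespace at umount time.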

lockd: use rpc client's cl_nodename for id encoding · 303a7ce9

Authored by Stanislav Kinsbursky on Sep 18, 2012

Taking the hostname from the UTS namespace is not safe, because this could be
performed during a umount operation on child reaper death. In this case
current->nsproxy is already NULL.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

303a7ce9

lockd: per-net NSM client creation and destruction helpers introduced · e9406db2

Authored by Stanislav Kinsbursky on Sep 18, 2012

An NSM RPC client can be required on NFSv3 umount, when the child reaper is dying (and
destroying its mount namespace). This means that the current nsproxy is already set to
NULL, but creating an RPC client requires the UTS namespace to obtain the
hostname string.
This patch introduces creation and destruction helpers for reference-counted
NSM RPC clients (similar to RPCBIND RPC clients).
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

e9406db2

NFS: add debug messages to callback down function · 1dc42e04

Authored by Stanislav Kinsbursky on Aug 20, 2012

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

1dc42e04

NFS: callback per-net usage counting introduced · b3d19c51

Authored by Stanislav Kinsbursky on Aug 20, 2012

This patch also introduces a refcount-aware nfs_callback_down_net() wrapper for
svc_shutdown_net().
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

b3d19c51

NFS: make nfs_callback_tcpport6 per network context · 29dcc16a

Authored by Stanislav Kinsbursky on Aug 20, 2012

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

29dcc16a

NFS: make nfs_callback_tcpport per network context · bbe0a3aa

Authored by Stanislav Kinsbursky on Aug 20, 2012

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

bbe0a3aa

NFS: callback up - users counting cleanup · 23c20ecd

Authored by Stanislav Kinsbursky on Aug 20, 2012

The usage counter is now increased only if the service was started successfully.
Even if the service is already running, the goto is no longer required, because
service creation and start will be skipped.
With this patch the code looks clearer.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

23c20ecd

NFS: callback service start function introduced · 8e246144

Authored by Stanislav Kinsbursky on Aug 20, 2012

This is just a code move, which from my POV makes the code look better.
I.e. now on start we have 3 different stages:
1) Service creation.
2) Service per-net data allocation.
3) Service start.

The patch also renames the goto label "out_err:" to "err_start:" to reflect the
new changes.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

8e246144

NFS: callback up - transport backchannel cleanup · 691c457a

Authored by Stanislav Kinsbursky on Aug 20, 2012

There is no need to assign the transport's backchannel server explicitly in
nfs41_callback_up(); the nfs_callback_bc_serv() function exists for this.
By using it, nfs4_callback_up() and nfs41_callback_up() can be called without
the transport argument.

Note: the service has to be passed to nfs_callback_bc_serv() instead of the
callback, since the callback link can be uninitialized.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

691c457a

NFS: move per-net callback thread initialization to nfs_callback_up_net() · c946556b

Authored by Stanislav Kinsbursky on Aug 20, 2012

v4:
1) Callback transport creation routine selection by version simplified.

This new function is now called before nfs_minorversion_callback_svc_setup().

Also a few small changes:
1) The current network namespace in nfs_callback_up() was replaced by the transport net.
2) svc_shutdown_net() was moved prior to the callback usage counter decrement
(because in case of a per-net data allocation failure svc_shutdown_net() has to
be skipped).
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

c946556b

NFS: callback service creation function introduced · dd018428

Authored by Stanislav Kinsbursky on Aug 20, 2012

This function creates the service if it does not exist, or increases the usage
counter of the existing one, and returns a pointer to it.
The usage counter will be dropped by svc_destroy() later in nfs_callback_up().
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

dd018428

NFS: pass net to nfs_callback_down() · c8ceb412

Authored by Stanislav Kinsbursky on Aug 20, 2012

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

c8ceb412

NFSv4: Add ACCESS operation to OPEN compound · 6168f62c

Authored by Weston Andros Adamson on Sep 10, 2012

The OPEN operation has no way to differentiate an open for read and an
open for execution - both look like read to the server. This allowed
users to read files that didn't have READ access but did have EXEC access,
which is obviously wrong.

This patch adds an ACCESS call to the OPEN compound to handle the
difference between OPENs for reading and execution. Since we're going
through the trouble of calling ACCESS, we check all possible access bits
and cache the results, hopefully avoiding an ACCESS call in the future.
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

6168f62c
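
The check that 6168f62c adds can be pictured as comparing the intent of the open against the access bits the server returned. The sketch below is an illustration under stated assumptions: the ACCESS4 bit values are those defined in RFC 3530, while `enum sketch_intent` and `open_permitted` are hypothetical and do not mirror the client's actual functions.

```c
/* ACCESS4 bit values as defined in RFC 3530. */
#define ACCESS4_READ    0x00000001u
#define ACCESS4_EXECUTE 0x00000020u

/* Hypothetical open intents for this sketch. */
enum sketch_intent { OPEN_FOR_READ, OPEN_FOR_EXEC };

/*
 * Returns nonzero if the access mask granted by the server permits the
 * open.  An open for execution needs EXECUTE; a plain read needs READ.
 */
static int open_permitted(enum sketch_intent intent, unsigned int granted)
{
    unsigned int needed =
        (intent == OPEN_FOR_EXEC) ? ACCESS4_EXECUTE : ACCESS4_READ;

    return (granted & needed) == needed;
}
```

For a file that grants only EXECUTE, `open_permitted(OPEN_FOR_EXEC, ACCESS4_EXECUTE)` is nonzero while `open_permitted(OPEN_FOR_READ, ACCESS4_EXECUTE)` is zero, which is the distinction the OPEN operation alone cannot make.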

NFS: Use kzalloc() instead of kmalloc() in the idmapper · 57a51048

Authored by Bryan Schumaker on Aug 09, 2012

This will allocate memory that has already been zeroed, allowing us to
remove the memset later on.
Signed-off-by: Bryan Schumaker <bjchuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

57a51048
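
The change in 57a51048 has a direct userspace analogue: calloc() returns zeroed memory, so a separate memset() after malloc() becomes unnecessary. The sketch below uses the userspace functions as stand-ins for kmalloc() and kzalloc(); `struct sketch_request` and both helper names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical request structure, for illustration only. */
struct sketch_request {
    int type;
    char name[32];
};

/* Before: allocate, then clear in a second step. */
static struct sketch_request *alloc_then_clear(void)
{
    struct sketch_request *req = malloc(sizeof(*req));

    if (req != NULL)
        memset(req, 0, sizeof(*req));
    return req;
}

/* After: allocate memory that is already zeroed. */
static struct sketch_request *alloc_zeroed(void)
{
    return calloc(1, sizeof(struct sketch_request));
}
```

Both helpers return an object whose fields are all zero; the second does so in one call and cannot forget the clearing step.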

NFS: Remove bad delegations during open recovery · 6938867e

Authored by Bryan Schumaker on Sep 26, 2012

I put the client into an open recovery loop by:
	Client: Open file
		read half
	Server: Expire client (echo 0 > /sys/kernel/debug/nfsd/forget_clients)
	Client: Drop vm cache (echo 3 > /proc/sys/vm/drop_caches)
		finish reading file

This causes a loop because the client never updates the nfs4_state after
discovering that the delegation is invalid.  This means it will keep
trying to read using the bad delegation rather than attempting to re-open
the file.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
CC: stable@vger.kernel.org [3.4+]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

6938867e

NFS: Always use the open stateid when checking for expired opens · fcb6d9c6

Authored by Bryan Schumaker on Sep 26, 2012

If we are reading through a delegation, and the delegation is OK then
state->stateid will still point to a delegation stateid and not an open
stateid.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

fcb6d9c6

29 Sep, 2012 14 commits

SUNRPC: Limit the rpciod workqueue concurrency · 9b96ce71

Authored by Trond Myklebust on Sep 28, 2012

We shouldn't need more than 1 worker thread per cpu, since rpciod
is designed to run without sleeping in most cases.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

9b96ce71

NFSv4.1: nfs4_proc_layoutreturn must always drop the plh_block_lgets count · 849b286f

Authored by Trond Myklebust on Sep 24, 2012

Currently it does not do so if the RPC call failed to start. Fix is to
move the decrement of plh_block_lgets into nfs4_layoutreturn_release.

Also remove a redundant test of task->tk_status in nfs4_layoutreturn_done:
if lrp->res.lrs_present is set, then obviously the RPC call succeeded.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

849b286f

NFSv4.1: _pnfs_return_layout() shouldn't invalidate the layout on failure · 65857d57

Authored by Trond Myklebust on Sep 24, 2012

Failure of the layoutreturn allocation is not a good reason to
mark the pnfs_layout_hdr as having failed a layoutget or i/o. Just
exit cleanly.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

65857d57

NFSv4.1: Remove the NFS_LAYOUT_RETURNED state · e5929f3c

Authored by Trond Myklebust on Sep 21, 2012

It serves no purpose that the test for whether or not we have valid
layout segments doesn't already serve.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

e5929f3c

NFSv4.1: Clear NFS_LAYOUT_BULK_RECALL when the layout segments are freed · 173f77e9

Authored by Trond Myklebust on Sep 21, 2012

Once all the affected layout segments have been freed up, clear the
NFS_LAYOUT_BULK_RECALL flag so that we can reuse the pnfs_layout_hdr.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

173f77e9

NFSv4.1: Get rid of the NFS_LAYOUT_DESTROYED state · 8006bfba

Authored by Trond Myklebust on Sep 21, 2012

We already have a mechanism for blocking LAYOUTGET by means of the
plh_block_lgets counter. The only "service" that NFS_LAYOUT_DESTROYED
provides at this point is to block layoutget once the layout segment
list is empty, which basically means that you have to wait until
the pnfs_layout_hdr is destroyed before you can do pNFS on that file
again.

This patch enables the reuse of the pnfs_layout_hdr if the layout
segment list is empty.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

8006bfba

NFSv4.1: Remove unused 'default allocation' for pnfs_alloc_layout_hdr() · 57934278

Authored by Trond Myklebust on Sep 20, 2012

...and ditto for pnfs_free_layout_hdr()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

57934278

NFSv4.1: Get rid of pNFS spin lock debugging asserts... · a9136d49

Authored by Trond Myklebust on Sep 20, 2012

These are all in static declared functions that are called only once.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

a9136d49

NFSv4.1: Balance pnfs_layout_hdr refcount in pnfs_layout_(insert|remove)_lseg · 8f0d27dc

Authored by Trond Myklebust on Sep 20, 2012

Ensure that the reference count for pnfs_layout_hdr reverts to the
original value after a call to pnfs_layout_remove_lseg().

Note that the caller is expected to hold a reference to the struct
pnfs_layout_hdr.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

8f0d27dc

NFSv4.1: Clean up pnfs_put_lseg() · 905ca191

Authored by Trond Myklebust on Sep 20, 2012

There is no longer a need to use pnfs_free_lseg_list(). Just call
pnfs_free_lseg() directly.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

905ca191

NFSv4.1: Clean up the removal of pnfs_layout_hdr from the server list · 9c626381

Authored by Trond Myklebust on Sep 20, 2012

Move the code into pnfs_free_layout_hdr(), and add checks to
get_layout_by_fh_locked to ensure that they don't reference a layout
that is being freed.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

9c626381

NFSv4.1: Free the pnfs_layout_hdr outside the inode->i_lock · 6622c3ea

Authored by Trond Myklebust on Sep 20, 2012

None of the existing pNFS layout drivers seem to require the inode
to be locked while they free the layout header.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

6622c3ea

NFSv4.1: Remove redundant reference to the pnfs_layout_hdr · 01d39ce8

Authored by Trond Myklebust on Sep 20, 2012

Each layout segment already holds a reference to the pnfs_layout_hdr,
so there is no need to hold an extra reference that is released once
the last layout segment is freed.

Ensure that pnfs_find_alloc_layout() always returns a reference
to the pnfs_layout_hdr, which will be matched by the final call to
pnfs_put_layout_hdr() in pnfs_update_layout().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

01d39ce8

NFSv4.1: Rename the pnfs_put_lseg_common to pnfs_layout_remove_lseg · 57036a37

Authored by Trond Myklebust on Sep 20, 2012

The latter name is more descriptive of the actual function.
Also rename pnfs_insert_layout to pnfs_layout_insert_lseg.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

57036a37
