Commits · fc28decdc93633a65d54e42498e9e819d466329c · openanolis / cloud-kernel

29 Mar, 2009 11 commits

SUNRPC: Use IPv4 loopback for registering AF_INET6 kernel RPC services · fc28decd

Committed by Chuck Lever on Mar 18, 2009

The kernel uses an IPv6 loopback address when registering its AF_INET6
RPC services so that it can tell whether the local portmapper is
actually IPv6-enabled.

Since the legacy portmapper doesn't listen on IPv6, however, this
causes a long timeout on older systems if the kernel happens to try
creating and registering an AF_INET6 RPC service.  Originally I wanted
to use a connected transport (either TCP or connected UDP) so that the
upcall would fail immediately if the portmapper wasn't listening on
IPv6, but we never agreed on what transport to use.

In the end, it's of little consequence to the kernel whether the local
portmapper is listening on IPv6.  It's only important whether the
portmapper supports rpcbind v4.  And the kernel can't tell that at all
if it is sending requests via IPv6 -- the portmapper will just ignore
them.

So, send both rpcbind v2 and v4 SET/UNSET requests via IPv4 loopback
to maintain better backwards compatibility between new kernels and
legacy user space, and prevent multi-second hangs in some cases when
the kernel attempts to register RPC services.

This patch is part of a series that addresses

   http://bugzilla.kernel.org/show_bug.cgi?id=12256

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
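
As a rough userspace illustration of the idea in the commit message above (this is not the kernel code), the sketch below builds the IPv4 loopback address that a registration request would be sent to, whatever the family of the service being registered. The helper name `rpcb_loopback_addr` is hypothetical; port 111 is the well-known portmapper/rpcbind port.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

#define RPCBIND_PORT 111 /* well-known portmapper/rpcbind port */

/* Hypothetical helper: fill in 127.0.0.1:111 as the registration target.
 * The same IPv4 target is used for AF_INET and AF_INET6 services alike. */
static void rpcb_loopback_addr(struct sockaddr_in *sin)
{
    assert(sin != NULL);
    memset(sin, 0, sizeof(*sin));
    sin->sin_family = AF_INET;
    sin->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin->sin_port = htons(RPCBIND_PORT);
}
```

Because the target is always IPv4, a legacy portmapper that never listens on IPv6 can still answer, which is the backwards-compatibility point the message makes.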

SUNRPC: Set IPV6ONLY flag on PF_INET6 RPC listener sockets · 7d21c0f9

Committed by Chuck Lever on Mar 18, 2009

We are about to convert to using separate RPC listener sockets for
PF_INET and PF_INET6. This echoes the way IPv6 is handled in user
space by TI-RPC, and eliminates the need for ULPs to worry about
mapped IPv4 AF_INET6 addresses when doing address comparisons.

Start by setting the IPV6ONLY flag on PF_INET6 RPC listener sockets.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
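
For readers more familiar with user space, the closest analogue to the flag described above is the `IPV6_V6ONLY` socket option. The sketch below is a userspace approximation only, with hypothetical helper names; the kernel patch sets the flag on its own listener sockets through in-kernel interfaces that are not shown here.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket in the PF_INET6 family that accepts only IPv6 traffic.
 * Returns the descriptor, or -1 if the socket or the option is unavailable. */
static int make_v6only_socket(void)
{
    int on = 1;
    int fd = socket(PF_INET6, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    /* Without this option, a PF_INET6 listener may also receive IPv4
     * connections presented as IPv4-mapped addresses (::ffff:a.b.c.d). */
    if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &on, sizeof(on)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Read the flag back: 1 if set, 0 if clear, -1 on error. */
static int socket_is_v6only(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    assert(fd >= 0);
    if (getsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &val, &len) < 0)
        return -1;
    return val != 0;
}
```

With the option set, address comparisons on such a socket never have to consider mapped IPv4 addresses, which is the simplification the message refers to.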

NFS: Revert creation of IPv6 listeners for lockd and NFSv4 callbacks · 26298caa

Committed by Chuck Lever on Mar 18, 2009

We're about to convert over to using separate PF_INET and PF_INET6
listeners, instead of a single PF_INET6 listener that also receives
AF_INET requests and maps them to AF_INET6.

Clear the way by removing the logic in lockd and the NFSv4 callback
server that creates an AF_INET6 service listener.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Remove @family argument from svc_create() and svc_create_pooled() · 49a9072f

Committed by Chuck Lever on Mar 18, 2009

Since an RPC service listener's protocol family is specified now via
svc_create_xprt(), it no longer needs to be passed to svc_create() or
svc_create_pooled(). Remove that argument from the synopsis of those
functions, and remove the sv_family field from the svc_serv struct.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Change svc_create_xprt() to take a @family argument · 9652ada3

Committed by Chuck Lever on Mar 18, 2009

The sv_family field is going away.  Pass a protocol family argument to
svc_create_xprt() instead of extracting the family from the passed-in
svc_serv struct.

Again, as this is a listener socket and not an address, we make this
new argument an "int" protocol family, instead of an "sa_family_t."

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: svc_setup_socket() gets protocol family from socket · baf01caf

Committed by Chuck Lever on Mar 18, 2009

Since the sv_family field is going away, modify svc_setup_socket() to
extract the protocol family from the passed-in socket instead of from
the passed-in svc_serv struct.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Pass a family argument to svc_register() · 4b62e58c

Committed by Chuck Lever on Mar 18, 2009

The sv_family field is going away. Instead of using sv_family, have
the svc_register() function take a protocol family argument.

Since this argument represents a protocol family, and not an address
family, this argument takes an int, as this is what is passed to
sock_create_kern(). Also make sure svc_register's helpers are
checking for PF_FOO instead of AF_FOO. The values of [AP]F_FOO are
equivalent; this is simply a symbolic change to reflect the semantics
of the value stored in that variable.

sock_create_kern() should return EPFNOSUPPORT if the passed-in
protocol family isn't supported, but it uses EAFNOSUPPORT for this
case. We will stick with that tradition here, as svc_register()
is called by the RPC server in the same path as sock_create_kern().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
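
The convention described in this message can be sketched in user space as follows. The helper `check_register_family` is hypothetical and only mirrors the two points made above: the argument is a protocol family passed as a plain `int`, and an unsupported family is reported as `-EAFNOSUPPORT`, following the `sock_create_kern()` tradition.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

/* Hypothetical helper: accept only the protocol families this sketch
 * knows about, and report anything else as -EAFNOSUPPORT. */
static int check_register_family(int family)
{
    /* The PF_* and AF_* constants carry the same values, so switching
     * from AF_FOO to PF_FOO is purely a symbolic change. */
    assert(PF_INET == AF_INET && PF_INET6 == AF_INET6);

    switch (family) {
    case PF_INET:
    case PF_INET6:
        return 0;
    default:
        return -EAFNOSUPPORT;
    }
}
```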

SUNRPC: Clean up svc_find_xprt() calling sequence · 156e6209

Committed by Chuck Lever on Mar 18, 2009

Clean up: add a documenting comment and use appropriate data types for
svc_find_xprt()'s arguments.

This also eliminates a mixed sign comparison: @port was an int, while
the return value of svc_xprt_local_port() is an unsigned short.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSD: If port value written to /proc/fs/nfsd/portlist is invalid, return EINVAL · adbbe929

Committed by Chuck Lever on Mar 18, 2009

Make sure port value read from user space by write_ports is valid before
passing it to svc_find_xprt().  If it wasn't, the writer would get ENOENT
instead of EINVAL.

Noticed-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
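
A minimal userspace sketch of the kind of check this message describes: validate the port before using it for a lookup, so that a bad value yields `-EINVAL` rather than a misleading "not found" error. The function name `parse_port` and the exact bounds (1 to `USHRT_MAX`) are assumptions for illustration, not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal port from a user-supplied string.
 * Returns 0 and stores the port on success, or -EINVAL if the text is not
 * a number or the value falls outside 1..USHRT_MAX. */
static int parse_port(const char *buf, unsigned short *port)
{
    char *end;
    long val;

    assert(buf != NULL && port != NULL);
    errno = 0;
    val = strtol(buf, &end, 10);
    if (errno != 0 || end == buf)
        return -EINVAL;
    if (val < 1 || val > USHRT_MAX)
        return -EINVAL;
    *port = (unsigned short)val;
    return 0;
}
```

Storing the result in an `unsigned short` only after the range check also keeps later comparisons between values of the same type.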

SUNRPC: Clean up static inline functions in svc_xprt.h · efb3288b

Committed by Chuck Lever on Mar 18, 2009

Clean up: Enable the use of const arguments in higher level svc_ APIs
by adding const to the arguments of the helper functions in svc_xprt.h.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Don't flag empty RPCB_GETADDR reply as bogus · 776bd5c7

Committed by Chuck Lever on Mar 18, 2009

In 2007, commit e65fe397 added
additional sanity checking to rpcb_decode_getaddr() to make sure we
were getting a reply that was long enough to be an actual universal
address.  If the uaddr string isn't long enough, the XDR decoder
returns EIO.

However, an empty string is a valid RPCB_GETADDR response if the
requested service isn't registered.  Moreover, "::.n.m" is also a
valid RPCB_GETADDR response for IPv6 addresses that is shorter
than rpcb_decode_getaddr()'s lower limit of 11.  So this sanity
check introduced a regression for rpcbind requests against IPv6
remotes.

So revert the lower bound check added by commit
e65fe397, and add an explicit check
for an empty uaddr string, similar to libtirpc's rpcb_getaddr(3).

Pointed-out-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
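
The decision described in this message can be sketched as below. This is a simplified userspace illustration with a hypothetical function name, not the XDR decoder itself: an empty reply string means "service not registered", and no minimum length is imposed on a non-empty one. The sample `"::.8.1"` is assumed here to denote the IPv6 unspecified address with port 8 * 256 + 1 = 2049, following the usual universal address convention of two trailing port octets.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical classification of an RPCB_GETADDR reply string:
 *   0 -> empty string, the requested service is not registered
 *   1 -> non-empty universal address, hand it on for parsing
 * No lower bound on the length is enforced, so short IPv6 forms such
 * as "::.8.1" are not rejected. */
static int rpcb_uaddr_present(const char *uaddr)
{
    assert(uaddr != NULL);
    return strlen(uaddr) != 0;
}
```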

20 Mar, 2009 8 commits

NFS: Optimise NFS close() · 7fe5c398

Committed by Trond Myklebust on Mar 19, 2009

Close-to-open cache consistency rules really only require us to flush out
writes on calls to close(), and require us to revalidate attributes on the
very last close of the file.

Currently we appear to be doing a lot of extra attribute revalidation
and cache flushes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: Fix the notifications when renaming onto an existing file · b1e4adf4

Committed by Trond Myklebust on Mar 19, 2009

NFS appears to be returning an unnecessary "delete" notification when
we're doing an atomic rename. See

  http://bugzilla.gnome.org/show_bug.cgi?id=575684

The fix is to get rid of the redundant call to d_delete().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: Fix up a mismerged patch · 47c62564

Committed by Trond Myklebust on Mar 16, 2009

Move the definition of nfs_need_commit() into the #ifdef CONFIG_NFS_V3
section as originally intended in the patch "NFS: cleanup - remove
struct nfs_inode->ncommit".

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SVCRDMA: fix recent printk format warnings. · 2e3c230b

Committed by Tom Talpey on Mar 12, 2009

printk formats in prior commit were reversed/incorrect.
Compiled without warning on x86 and x86_64, but detected on ppc.

Signed-off-by: Tom Talpey <tmtalpey@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Ensure we close the socket on EPIPE errors too... · 55420c24

Committed by Trond Myklebust on Mar 11, 2009

As long as one task is holding the socket lock, then calls to
xprt_force_disconnect(xprt) will not succeed in shutting down the socket.
In particular, this would mean that a server initiated shutdown will not
succeed until the lock is relinquished.
In order to avoid the deadlock, we should ensure that xs_tcp_send_request()
closes the socket on EPIPE errors too.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: xs_tcp_connect_worker{4,6}: merge common code · b61d59ff

Committed by Trond Myklebust on Mar 11, 2009

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Add a sysctl to control the duration of the socket linger timeout · 25fe6142

Committed by Trond Myklebust on Mar 11, 2009

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Add the equivalent of the linger and linger2 timeouts to RPC sockets · 7d1e8255

Committed by Trond Myklebust on Mar 11, 2009

This fixes a regression against FreeBSD servers as reported by Tomas
Kasparek. Apparently when using RPC over a TCP socket, the FreeBSD servers
don't ever react to the client closing the socket, and so commit
e06799f9 (SUNRPC: Use shutdown() instead of
close() when disconnecting a TCP socket) causes the setup to hang forever
whenever the client attempts to close and then reconnect.

We break the deadlock by adding a 'linger2' style timeout to the socket,
after which, the client will abort the connection using a TCP 'RST'.

The default timeout is set to 15 seconds. A subsequent patch will put it
under user control by means of a sysctl.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
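
For comparison, user space has two socket options that correspond loosely to the behaviour described in this message. The sketch below is a Linux userspace analogue only, with hypothetical helper names; the kernel RPC client implements its own timeout internally and does not necessarily use these options.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

#define LINGER2_SECS 15 /* same value as the default quoted above */

/* Limit how long an orphaned socket may sit in FIN_WAIT2 waiting for
 * the peer to close its side (Linux-specific TCP_LINGER2 option). */
static int set_linger2(int fd, int secs)
{
    assert(fd >= 0);
    return setsockopt(fd, IPPROTO_TCP, TCP_LINGER2, &secs, sizeof(secs));
}

/* Arrange for close() to abort the connection with a RST instead of
 * performing the normal FIN handshake. */
static int set_abort_on_close(int fd)
{
    struct linger lg = { .l_onoff = 1, .l_linger = 0 };

    assert(fd >= 0);
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```

The first helper bounds the wait for a peer that never reacts to the shutdown; the second shows the usual way a userspace client forces a reset once it gives up waiting.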

12 Mar, 2009 21 commits

SUNRPC: Ensure that xs_nospace return values are propagated · 5e3771ce

Committed by Trond Myklebust on Mar 11, 2009

If xs_nospace() finds that the socket has disconnected, it attempts to
return ENOTCONN, however that value is then squashed by the callers.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Delay, then retry on connection errors. · 8a2cec29

Committed by Trond Myklebust on Mar 11, 2009

Enforce the comment in xs_tcp_connect_worker4/xs_tcp_connect_worker6 that
we should delay, then retry on certain connection errors.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Return EAGAIN instead of ENOTCONN when waking up xprt->pending · 2a491991

Committed by Trond Myklebust on Mar 11, 2009

While we should definitely return socket errors to the task that is
currently trying to send data, there is no need to propagate the same error
to all the other tasks on xprt->pending. Doing so actually slows down
recovery, since it causes more than one task to attempt socket recovery.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Handle socket errors correctly · 482f32e6

Committed by Trond Myklebust on Mar 11, 2009

Ensure that we pick up and handle socket errors as they occur.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Handle ECONNREFUSED correctly in xprt_transmit() · c8485e4d

Committed by Trond Myklebust on Mar 11, 2009

If we get an ECONNREFUSED error, we currently go to sleep on the
'xprt->sending' wait queue. The problem is that no timeout is set there,
and there is nothing else that will wake the task up later.

We should deal with ECONNREFUSED in call_status, given that is where we
also deal with -EHOSTDOWN, and friends.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Don't disconnect if a connection is still in progress. · 40d2549d

Committed by Trond Myklebust on Mar 11, 2009

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Ensure we set XPRT_CLOSING only after we've sent a tcp FIN... · 670f9457

Committed by Trond Myklebust on Mar 11, 2009

...so that we can distinguish between when we need to shutdown and when we
don't. Also remove the call to xs_tcp_shutdown() from xs_tcp_connect(),
since xprt_connect() makes the same test.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Avoid an unnecessary task reschedule on ENOTCONN · 15f081ca

Committed by Trond Myklebust on Mar 11, 2009

If the socket is unconnected, and xprt_transmit() returns ENOTCONN, we
currently give up the lock on the transport channel. Doing so means that
the lock automatically gets assigned to the next task in the xprt->sending
queue, and so that task needs to be woken up to do the actual connect.

The following patch aims to avoid that unnecessary task switch.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: load the rpc/rdma transport module automatically · a67d18f8

Committed by Tom Talpey on Mar 11, 2009

When mounting an NFS/RDMA server with the "-o proto=rdma" or
"-o rdma" options, attempt to dynamically load the necessary
"xprtrdma" client transport module. Doing so improves usability,
while avoiding a static module dependency and any unnecessary
resources.

Signed-off-by: Tom Talpey <tmtalpey@gmail.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: dynamically load RPC transport modules on-demand · 441e3e24

Committed by Tom Talpey on Mar 11, 2009

Provide an API to attempt to load any necessary kernel RPC
client transport module automatically. By convention, the
desired module name is "xprt"+"transport name". For example,
when NFS mounting with "-o proto=rdma", attempt to load the
"xprtrdma" module.

Signed-off-by: Tom Talpey <tmtalpey@gmail.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
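
The naming convention stated in this message ("xprt" followed by the transport name) can be illustrated with a small userspace sketch. The helper `xprt_module_name` is hypothetical and only builds the string; how the kernel then asks for the module to be loaded is not shown here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "xprt" + transport name into buf.
 * Returns 0 on success, or -1 if the transport name is empty or the
 * result does not fit in len bytes (including the terminating NUL). */
static int xprt_module_name(char *buf, size_t len, const char *transport)
{
    int n;

    assert(buf != NULL && transport != NULL);
    if (strlen(transport) == 0)
        return -1;
    n = snprintf(buf, len, "xprt%s", transport);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For a mount with "-o proto=rdma", calling the helper with `"rdma"` produces `"xprtrdma"`, the module named in the message.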

XPRTRDMA: correct an rpc/rdma inline send marshaling error · b38ab40a

Committed by Tom Talpey on Mar 11, 2009

Certain client RPCs which contain both lengthy page-contained
metadata and a non-empty xdr_tail buffer require careful handling
to avoid overlapped memory copying. Rearranging of existing rpcrdma
marshaling code avoids it; this fixes an NFSv4 symlink creation error
detected with connectathon basic/test8 to multiple servers.

Signed-off-by: Tom Talpey <tmtalpey@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SVCRDMA: remove faulty assertions in rpc/rdma chunk validation. · b1e1e158

Committed by Tom Talpey on Mar 11, 2009

Certain client-provided RPCRDMA chunk alignments result in an
additional scatter/gather entry, which triggered nfs/rdma server
assertions incorrectly. OpenSolaris nfs/rdma client connectathon
testing was blocked by these in the special/locking section.

Signed-off-by: Tom Talpey <tmtalpey@gmail.com>
Cc: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: Kill the "defined but not used" compile error on nommu machines · e1ebfd33

Committed by Trond Myklebust on Mar 11, 2009

Bryan Wu reports that when compiling NFS on nommu machines he gets a
"defined but not used" error on nfs_file_mmap().

The easiest fix is simply to get rid of the special casing in NFS, and
just always call generic_file_mmap() to set up the file.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: Throttle page dirtying while we're flushing to disk · 72cb77f4

Committed by Trond Myklebust on Mar 11, 2009

The following patch is a combination of a patch by myself and Peter
Staubach.

Trond: If we allow other processes to dirty pages while a process is doing
a consistency sync to disk, we can end up never making progress.

Peter: Attached is a patch which addresses a continuing problem with
the NFS client generating out of order WRITE requests.  While
this is compliant with all of the current protocol
specifications, there are servers in the market which can not
handle out of order WRITE requests very well.  Also, this may
lead to sub-optimal block allocations in the underlying file
system on the server.  This may cause the read throughputs to
be reduced when reading the file from the server.

Peter: There has been a lot of work recently done to address out of
order issues on a systemic level.  However, the NFS client is
still susceptible to the problem.  Out of order WRITE
requests can occur when pdflush is in the middle of writing
out pages while the process dirtying the pages calls
generic_file_buffered_write which calls
generic_perform_write which calls
balance_dirty_pages_rate_limited which ends up calling
writeback_inodes which ends up calling back into the NFS
client to write out dirty pages for the same file that
pdflush happens to be working with.

Signed-off-by: Peter Staubach <staubach@redhat.com>
[modification by Trond to merge the two similar patches]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: cleanup - remove struct nfs_inode->ncommit · fb8a1f11

Committed by Trond Myklebust on Mar 11, 2009

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4: Simplify some cache consistency post-op GETATTRs · a65318bf

Committed by Trond Myklebust on Mar 11, 2009

Certain asynchronous operations such as write() do not expect
(or care) that other metadata such as the file owner, mode, acls, ...
change. All they want to do is update and/or check the change attribute,
ctime, and mtime.
By skipping the file owner and group update, we also avoid having to do a
potential idmapper upcall for these asynchronous RPC calls.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4: A referral is assumed to always point to a directory. · 69aaaae1

Committed by Trond Myklebust on Mar 11, 2009

Fix a bug whereby we would fail to create a mount point for a referral.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4: Make decode_getfattr() set fattr->valid to reflect what was decoded · 409924e4

Committed by Trond Myklebust on Mar 11, 2009

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4: Clean up decode_getfattr() · f26c7a78

Committed by Trond Myklebust on Mar 11, 2009

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: Fix the type of struct nfs_fattr->mode · bca79478

Committed by Trond Myklebust on Mar 11, 2009

There is no point in using anything other than umode_t, since we copy the
content pretty much directly into inode->i_mode.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: Shrink the struct nfs_fattr · 1ca277d8

Committed by Trond Myklebust on Mar 11, 2009

We don't need the bitmap[] field anymore, since the 'valid' field tells us
all we need to know about which attributes were filled in...
Also move the pre-op attributes in order to improve the structure packing.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

openanolis / cloud-kernel synced successfully about 1 year ago
