提交 · 3be7988674ab33565700a37b210f502563d932e6 · openanolis / cloud-kernel

13 9月, 2016 1 次提交

svcauth_gss: Revert ("Remove unnecessary allocation") · bf2c4b6f

由 Chuck Lever 提交于 9月 01, 2016

rsc_lookup steals the passed-in memory to avoid doing an allocation of
its own, so we can't just pass in a pointer to memory that someone else
is using.

If we really want to avoid allocation there then maybe we should
preallocate somwhere, or reference count these handles.

For now we should revert.

On occasion I see this on my server:

kernel: kernel BUG at /home/cel/src/linux/linux-2.6/mm/slub.c:3851!
kernel: invalid opcode: 0000 [#1] SMP
kernel: Modules linked in: cts rpcsec_gss_krb5 sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd btrfs xor iTCO_wdt iTCO_vendor_support raid6_pq pcspkr i2c_i801 i2c_smbus lpc_ich mfd_core mei_me sg mei shpchp wmi ioatdma ipmi_si ipmi_msghandler acpi_pad acpi_power_meter rpcrdma ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables xfs libcrc32c mlx4_ib mlx4_en ib_core sr_mod cdrom sd_mod ast drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel igb mlx4_core ahci libahci libata ptp pps_core dca i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
kernel: CPU: 7 PID: 145 Comm: kworker/7:2 Not tainted 4.8.0-rc4-00006-g9d06b0b #15
kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
kernel: Workqueue: events do_cache_clean [sunrpc]
kernel: task: ffff8808541d8000 task.stack: ffff880854344000
kernel: RIP: 0010:[<ffffffff811e7075>] [<ffffffff811e7075>] kfree+0x155/0x180
kernel: RSP: 0018:ffff880854347d70 EFLAGS: 00010246
kernel: RAX: ffffea0020fe7660 RBX: ffff88083f9db064 RCX: 146ff0f9d5ec5600
kernel: RDX: 000077ff80000000 RSI: ffff880853f01500 RDI: ffff88083f9db064
kernel: RBP: ffff880854347d88 R08: ffff8808594ee000 R09: ffff88087fdd8780
kernel: R10: 0000000000000000 R11: ffffea0020fe76c0 R12: ffff880853f01500
kernel: R13: ffffffffa013cf76 R14: ffffffffa013cff0 R15: ffffffffa04253a0
kernel: FS: 0000000000000000(0000) GS:ffff88087fdc0000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007fed60b020c3 CR3: 0000000001c06000 CR4: 00000000001406e0
kernel: Stack:
kernel: ffff8808589f2f00 ffff880853f01500 0000000000000001 ffff880854347da0
kernel: ffffffffa013cf76 ffff8808589f2f00 ffff880854347db8 ffffffffa013d006
kernel: ffff8808589f2f20 ffff880854347e00 ffffffffa0406f60 0000000057c7044f
kernel: Call Trace:
kernel: [<ffffffffa013cf76>] rsc_free+0x16/0x90 [auth_rpcgss]
kernel: [<ffffffffa013d006>] rsc_put+0x16/0x30 [auth_rpcgss]
kernel: [<ffffffffa0406f60>] cache_clean+0x2e0/0x300 [sunrpc]
kernel: [<ffffffffa04073ee>] do_cache_clean+0xe/0x70 [sunrpc]
kernel: [<ffffffff8109a70f>] process_one_work+0x1ff/0x3b0
kernel: [<ffffffff8109b15c>] worker_thread+0x2bc/0x4a0
kernel: [<ffffffff8109aea0>] ? rescuer_thread+0x3a0/0x3a0
kernel: [<ffffffff810a0ba4>] kthread+0xe4/0xf0
kernel: [<ffffffff8169c47f>] ret_from_fork+0x1f/0x40
kernel: [<ffffffff810a0ac0>] ? kthread_stop+0x110/0x110
kernel: Code: f7 ff ff eb 3b 65 8b 05 da 30 e2 7e 89 c0 48 0f a3 05 a0 38 b8 00 0f 92 c0 84 c0 0f 85 d1 fe ff ff 0f 1f 44 00 00 e9 f5 fe ff ff <0f> 0b 49 8b 03 31 f6 f6 c4 40 0f 85 62 ff ff ff e9 61 ff ff ff
kernel: RIP [<ffffffff811e7075>] kfree+0x155/0x180
kernel: RSP <ffff880854347d70>
kernel: ---[ end trace 3fdec044969def26 ]---

It seems to be most common after a server reboot where a client has been
using a Kerberos mount, and reconnects to continue its workload.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

bf2c4b6f

07 9月, 2016 2 次提交

xprtrdma: Fix receive buffer accounting · 05c97466

由 Chuck Lever 提交于 9月 06, 2016

An RPC can terminate before its reply arrives, if a credential
problem or a soft timeout occurs. After this happens, xprtrdma
reports it is out of Receive buffers.

A Receive buffer is posted before each RPC is sent, and returned to
the buffer pool when a reply is received. If no reply is received
for an RPC, that Receive buffer remains posted. But xprtrdma tries
to post another when the next RPC is sent.

If this happens a few dozen times, there are no receive buffers left
to be posted at send time. I don't see a way for a transport
connection to recover at that point, and it will spit warnings and
unnecessarily delay RPCs on occasion for its remaining lifetime.

Commit 1e465fd4 ("xprtrdma: Replace send and receive arrays")
removed a little bit of logic to detect this case and not provide
a Receive buffer so no more buffers are posted, and then transport
operation continues correctly. We didn't understand what that logic
did, and it wasn't commented, so it was removed as part of the
overhaul to support backchannel requests.

Restore it, but be wary of the need to keep extra Receives posted
to deal with backchannel requests.

Fixes: 1e465fd4 ("xprtrdma: Replace send and receive arrays")
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

05c97466

xprtrdma: Revert ("xprtrdma: Reply buffer exhaustion...") · 78d506e1

由 Chuck Lever 提交于 9月 06, 2016

Receive buffer exhaustion, if it were to actually occur, would be
catastrophic. However, when there are no reply buffers to post, that
means all of them have already been posted and are waiting for
incoming replies. By design, there can never be more RPCs in flight
than there are available receive buffers.

A receive buffer can be left posted after an RPC exits without a
received reply; say, due to a credential problem or a soft timeout.
This does not result in fewer posted receive buffers than there are
pending RPCs, and there is already logic in xprtrdma to deal
appropriately with this case.

It also looks like the "+ 2" that was removed was accidentally
accommodating the number of extra receive buffers needed for
receiving backchannel requests. That will need to be addressed by
another patch.

Fixes: 3d4cf35b ("xprtrdma: Reply buffer exhaustion can be...")
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

78d506e1

03 9月, 2016 1 次提交

sunrpc: fix UDP memory accounting · a41bd25a

由 Paolo Abeni 提交于 8月 25, 2016

The commit f9b2ee71 ("SUNRPC: Move UDP receive data path
into a workqueue context"), as a side effect, moved the
skb_free_datagram() call outside the scope of the related socket
lock, but UDP sockets require such lock to be held for proper
memory accounting.
Fix it by replacing skb_free_datagram() with
skb_free_datagram_locked().

Fixes: f9b2ee71 ("SUNRPC: Move UDP receive data path into a workqueue context")
Reported-and-tested-by: NJan Stancek <jstancek@redhat.com>
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Cc: stable@vger.kernel.org # 4.4+
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

a41bd25a

25 8月, 2016 1 次提交

SUNRPC: Silence WARN_ON when NFSv4.1 over RDMA is in use · 16590a22

由 Chuck Lever 提交于 8月 22, 2016

Using NFSv4.1 on RDMA should be safe, so broaden the new checks in
rpc_create().

WARN_ON_ONCE is used, matching most other WARN call sites in clnt.c.

Fixes: 39a9beab ("rpc: share one xps between all backchannels")
Fixes: d50039ea ("nfsd4/rpc: move backchannel create logic...")
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NJ. Bruce Fields <bfields@fieldses.org>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

16590a22

06 8月, 2016 3 次提交

NFSv4: Cap the transport reconnection timer at 1/2 lease period · 8d480326

由 Trond Myklebust 提交于 8月 05, 2016

We don't want to miss a lease period renewal due to the TCP connection
failing to reconnect in a timely fashion. To ensure this doesn't happen,
cap the reconnection timer so that we retry the connection attempt
at least every 1/2 lease period.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

8d480326

SUNRPC: Limit the reconnect backoff timer to the max RPC message timeout · 3851f1cd

由 Trond Myklebust 提交于 8月 04, 2016

...and ensure that we propagate it to new transports on the same
client.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

3851f1cd

SUNRPC: Fix reconnection timeouts · 02910177

由 Trond Myklebust 提交于 8月 04, 2016

When the connect attempt fails and backs off, we should start the clock
at the last connection attempt, not time at which we queue up the
reconnect job.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

02910177

05 8月, 2016 2 次提交

SUNRPC: disable the use of IPv6 temporary addresses. · d88e4d82

由 NeilBrown 提交于 8月 04, 2016

If the net.ipv6.conf.*.use_temp_addr sysctl is set to '2',
then TCP connections over IPv6 will prefer a 'private' source
address.
These eventually expire and become invalid, typically after a week,
but the time is configurable.

When the local address becomes invalid the client will not be able to
receive replies from the server.  Eventually the connection will timeout
or break and a new connection will be established, but this can take
half an hour (typically TCP connection break time).

RFC 4941, which describes private IPv6 addresses, acknowledges that some
applications might not work well with them and that the application may
explicitly a request non-temporary (i.e. "public") address.

I believe this is correct for SUNRPC clients.  Without this change, a
client will occasionally experience a long delay if private addresses
have been enabled.

The privacy offered by private addresses is of little value for an NFS
server which requires client authentication.

For NFSv3 this will often not be a problem because idle connections are
closed after 5 minutes.  For NFSv4 connections never go idle due to the
period RENEW (or equivalent) request.
Signed-off-by: NNeilBrown <neilb@suse.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

d88e4d82

SUNRPC: allow for upcalls for same uid but different gss service · 9130b8db

由 Olga Kornievskaia 提交于 8月 03, 2016

It's possible to have simultaneous upcalls for the same UIDs but
different GSS service. In that case, we need to allow for the
upcall to gssd to proceed so that not the same context is used
by two different GSS services. Some servers lock the use of context
to the GSS service.
Signed-off-by: NOlga Kornievskaia <kolga@netapp.com>
Cc: stable@vger.kernel.org # v3.9+
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

9130b8db

03 8月, 2016 1 次提交

SUNRPC: Fix up socket autodisconnect · ad3331ac

由 Trond Myklebust 提交于 8月 02, 2016

Ensure that we don't forget to set up the disconnection timer for the
case when a connect request is fulfilled after the RPC request that
initiated it has timed out or been interrupted.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

ad3331ac

02 8月, 2016 3 次提交

SUNRPC: Detect immediate closure of accepted sockets · c7995f8a

由 Trond Myklebust 提交于 7月 26, 2016

This modification is useful for debugging issues that happen while
the socket is being initialised.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

c7995f8a

SUNRPC: accept() may return sockets that are still in SYN_RECV · b2f21f7d

由 Trond Myklebust 提交于 7月 26, 2016

We're seeing traces of the following form:

 [10952.396347] svc: transport ffff88042ba4a 000 dequeued, inuse=2
 [10952.396351] svc: tcp_accept ffff88042ba4 a000 sock ffff88042a6e4c80
 [10952.396362] nfsd: connect from 10.2.6.1, port=187
 [10952.396364] svc: svc_setup_socket ffff8800b99bcf00
 [10952.396368] setting up TCP socket for reading
 [10952.396370] svc: svc_setup_socket created ffff8803eb10a000 (inet ffff88042b75b800)
 [10952.396373] svc: transport ffff8803eb10a000 put into queue
 [10952.396375] svc: transport ffff88042ba4a000 put into queue
 [10952.396377] svc: server ffff8800bb0ec000 waiting for data (to = 3600000)
 [10952.396380] svc: transport ffff8803eb10a000 dequeued, inuse=2
 [10952.396381] svc_recv: found XPT_CLOSE
 [10952.396397] svc: svc_delete_xprt(ffff8803eb10a000)
 [10952.396398] svc: svc_tcp_sock_detach(ffff8803eb10a000)
 [10952.396399] svc: svc_sock_detach(ffff8803eb10a000)
 [10952.396412] svc: svc_sock_free(ffff8803eb10a000)

i.e. an immediate close of the socket after initialisation.

The culprit appears to be the test at the end of svc_tcp_init, which
checks if the newly created socket is in the TCP_ESTABLISHED state,
and immediately closes it if not. The evidence appears to suggest that
the socket might still be in the SYN_RECV state at this time.

The fix is to check for both states, and then to add a check in
svc_tcp_state_change() to ensure we don't close the socket when
it transitions into TCP_ESTABLISHED.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

b2f21f7d

SUNRPC: Handle EADDRNOTAVAIL on connection failures · 1f4c17a0

由 Trond Myklebust 提交于 8月 01, 2016

If the connect attempt immediately fails with an EADDRNOTAVAIL error, then
that means our choice of source port number was bad.
This error is expected when we set the SO_REUSEPORT socket option and we
have 2 sockets sharing the same source and destination address and port
combinations.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Fixes: 402e23b4 ("SUNRPC: Fix stupid typo in xs_sock_set_reuseport")
Cc: stable@vger.kernel.org # v4.0+

1f4c17a0

25 7月, 2016 1 次提交

SUNRPC: Fix a compiler warning in fs/nfs/clnt.c · ce272302

由 Trond Myklebust 提交于 7月 24, 2016

Fix the report:

net/sunrpc/clnt.c:2580:1: warning: ‘static’ is not at beginning of declaration [-Wold-style-declaration]
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

ce272302

20 7月, 2016 6 次提交

xprtrdma: fix semicolon.cocci warnings · 53d78523

由 kbuild test robot 提交于 7月 16, 2016

net/sunrpc/xprtrdma/verbs.c:798:2-3: Unneeded semicolon

 Remove unneeded semicolon.

Generated by: scripts/coccinelle/misc/semicolon.cocci

CC: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>
Reviewed-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

53d78523

sunrpc: Prevent resvport min/max inversion via sysfs and module parameter · ffb6ca33

由 Frank Sorenson 提交于 7月 08, 2016

The current min/max resvport settings are independently limited
by the entire range of allowed ports, so max_resvport can be
set to a port lower than min_resvport.

Prevent inversion of min/max values when set through sysfs and
module parameter by setting the limits dependent on each other.
Signed-off-by: NFrank Sorenson <sorenson@redhat.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

ffb6ca33

sunrpc: Prevent resvport min/max inversion via sysctl · e08ea3a9

由 Frank Sorenson 提交于 7月 08, 2016

The current min/max resvport settings are independently limited
by the entire range of allowed ports, so max_resvport can be
set to a port lower than min_resvport.

Prevent inversion of min/max values when set through sysctl by
setting the limits dependent on each other.
Signed-off-by: NFrank Sorenson <sorenson@redhat.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

e08ea3a9

sunrpc: Fix reserved port range calculation · 5d71899a

由 Frank Sorenson 提交于 7月 08, 2016

The range calculation for choosing the random reserved port will panic
with divide-by-zero when min_resvport == max_resvport, a range of one
port, not zero.

Fix the reserved port range calculation by adding one to the difference.
Signed-off-by: NFrank Sorenson <sorenson@redhat.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

5d71899a

sunrpc: Fix bit count when setting hashtable size to power-of-two · 34ae685c

由 Frank Sorenson 提交于 6月 27, 2016

Author: Frank Sorenson <sorenson@redhat.com>
Date:   2016-06-27 13:55:48 -0500

    sunrpc: Fix bit count when setting hashtable size to power-of-two

    The hashtable size is incorrectly calculated as the next higher
    power-of-two when being set to a power-of-two.  fls() returns the
    bit number of the most significant set bit, with the least
    significant bit being numbered '1'.  For a power-of-two, fls()
    will return a bit number which is one higher than the number of bits
    required, leading to a hashtable which is twice the requested size.

    In addition, the value of (1 << nbits) will always be at least num,
    so the test will never be true.

    Fix the hash table size calculation to correctly set hashtable
    size, and eliminate the unnecessary check.
Signed-off-by: NFrank Sorenson <sorenson@redhat.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

34ae685c

sunrpc: move NO_CRKEY_TIMEOUT to the auth->au_flags · ce52914e

由 Scott Mayhew 提交于 6月 07, 2016

A generic_cred can be used to look up a unx_cred or a gss_cred, so it's
not really safe to use the the generic_cred->acred->ac_flags to store
the NO_CRKEY_TIMEOUT flag.  A lookup for a unx_cred triggered while the
KEY_EXPIRE_SOON flag is already set will cause both NO_CRKEY_TIMEOUT and
KEY_EXPIRE_SOON to be set in the ac_flags, leaving the user associated
with the auth_cred to be in a state where they're perpetually doing 4K
NFS_FILE_SYNC writes.

This can be reproduced as follows:

1. Mount two NFS filesystems, one with sec=krb5 and one with sec=sys.
They do not need to be the same export, nor do they even need to be from
the same NFS server.  Also, v3 is fine.
$ sudo mount -o v3,sec=krb5 server1:/export /mnt/krb5
$ sudo mount -o v3,sec=sys server2:/export /mnt/sys

2. As the normal user, before accessing the kerberized mount, kinit with
a short lifetime (but not so short that renewing the ticket would leave
you within the 4-minute window again by the time the original ticket
expires), e.g.
$ kinit -l 10m -r 60m

3. Do some I/O to the kerberized mount and verify that the writes are
wsize, UNSTABLE:
$ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1

4. Wait until you're within 4 minutes of key expiry, then do some more
I/O to the kerberized mount to ensure that RPC_CRED_KEY_EXPIRE_SOON gets
set.  Verify that the writes are 4K, FILE_SYNC:
$ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1

5. Now do some I/O to the sec=sys mount.  This will cause
RPC_CRED_NO_CRKEY_TIMEOUT to be set:
$ dd if=/dev/zero of=/mnt/sys/file bs=1M count=1

6. Writes for that user will now be permanently 4K, FILE_SYNC for that
user, regardless of which mount is being written to, until you reboot
the client.  Renewing the kerberos ticket (assuming it hasn't already
expired) will have no effect.  Grabbing a new kerberos ticket at this
point will have no effect either.

Move the flag to the auth->au_flags field (which is currently unused)
and rename it slightly to reflect that it's no longer associated with
the auth_cred->ac_flags.  Add the rpc_auth to the arg list of
rpcauth_cred_key_to_expire and check the au_flags there too.  Finally,
add the inode to the arg list of nfs_ctx_key_to_expire so we can
determine the rpc_auth to pass to rpcauth_cred_key_to_expire.
Signed-off-by: NScott Mayhew <smayhew@redhat.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

ce52914e

16 7月, 2016 1 次提交

SUNRPC: Fix infinite looping in rpc_clnt_iterate_for_each_xprt · bdc54d8e

由 Trond Myklebust 提交于 7月 16, 2016

If there were less than 2 entries in the multipath list, then
xprt_iter_next_entry_multiple() would never advance beyond the
first entry, which is correct for round robin behaviour, but not
for the list iteration.

The end result would be infinite looping in rpc_clnt_iterate_for_each_xprt()
as we would never see the xprt == NULL condition fulfilled.
Reported-by: NOleg Drokin <green@linuxhacker.ru>
Fixes: 80b14d5e ("SUNRPC: Add a structure to track multiple transports")
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

bdc54d8e

14 7月, 2016 10 次提交

SUNRPC: Remove unused callback xpo_adjust_wspace() · f4a4906e

由 Trond Myklebust 提交于 6月 24, 2016

Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

f4a4906e

SUNRPC: Change TCP socket space reservation · 637600f3

由 Trond Myklebust 提交于 6月 24, 2016

The current server rpc tcp code attempts to predict how much writeable
socket space will be available to a given RPC call before accepting it
for processing.  On a 40GigE network, we've found this throttles
individual clients long before the network or disk is saturated.  The
server may handle more clients easily, but the bandwidth of individual
clients is still artificially limited.

Instead of trying (and failing) to predict how much writeable socket space
will be available to the RPC call, just fall back to the simple model of
deferring processing until the socket is uncongested.

This may increase the risk of fast clients starving slower clients; in
such cases, the previous patch allows setting a hard per-connection
limit.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

637600f3

SUNRPC: Add a server side per-connection limit · ff3ac5c3

由 Trond Myklebust 提交于 6月 24, 2016

Allow the user to limit the number of requests serviced through a single
connection, to help prevent faster clients from starving slower clients.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

ff3ac5c3

SUNRPC: Micro optimisation for svc_data_ready · 4720b070

由 Trond Myklebust 提交于 6月 24, 2016

Don't call svc_xprt_enqueue() if the XPT_DATA flag is already set.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

4720b070

SUNRPC: Call the default socket callbacks instead of open coding · fa9251af

由 Trond Myklebust 提交于 6月 24, 2016

Rather than code up our own versions of the socket callbacks, just
call the defaults.
This also allows us to merge svc_udp_data_ready() and svc_tcp_data_ready().
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

fa9251af

SUNRPC: lock the socket while detaching it · 069c225b

由 Trond Myklebust 提交于 6月 24, 2016

Prevent callbacks from triggering while we're detaching the socket.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

069c225b

SUNRPC: Add tracepoints for dropped and deferred requests · 104f6351

由 Trond Myklebust 提交于 6月 24, 2016

Dropping and/or deferring requests has an impact on performance. Let's
make sure we can trace those events.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

104f6351

SUNRPC: Add a tracepoint for server socket out-of-space conditions · 82ea2d76

由 Trond Myklebust 提交于 6月 24, 2016

Add a tracepoint to track when the processing of incoming RPC data gets
deferred due to out-of-space issues on the outgoing transport.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

82ea2d76

sunrpc: add gss minor status to svcauth_gss_proxy_init · 04d70eda

由 Scott Mayhew 提交于 6月 15, 2016

GSS-Proxy doesn't produce very much debug logging at all.  Printing out
the gss minor status will aid in troubleshooting if the
GSS_Accept_sec_context upcall fails.
Signed-off-by: NScott Mayhew <smayhew@redhat.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

04d70eda

sunrpc: remove 'inuse' flag from struct cache_detail. · d8d29138

由 NeilBrown 提交于 6月 02, 2016

This field is not currently in use.
Signed-off-by: NNeilBrown <neilb@suse.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

d8d29138

12 7月, 2016 8 次提交

NFS: Don't drop CB requests with invalid principals · a4e187d8

由 Chuck Lever 提交于 6月 29, 2016

Before commit 778be232 ("NFS do not find client in NFSv4
pg_authenticate"), the Linux callback server replied with
RPC_AUTH_ERROR / RPC_AUTH_BADCRED, instead of dropping the CB
request. Let's restore that behavior so the server has a chance to
do something useful about it, and provide a warning that helps
admins correct the problem.

Fixes: 778be232 ("NFS do not find client in NFSv4 ...")
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

a4e187d8

svc: Avoid garbage replies when pc_func() returns rpc_drop_reply · 0533b130

由 Chuck Lever 提交于 6月 29, 2016

If an RPC program does not set vs_dispatch and pc_func() returns
rpc_drop_reply, the server sends a reply anyway containing a single
word containing the value RPC_DROP_REPLY (in network byte-order, of
course). This is a nonsense RPC message.

Fixes: 9e701c61 ("svcrpc: simpler request dropping")
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

0533b130

xprtrdma: No direct data placement with krb5i and krb5p · 65b80179

由 Chuck Lever 提交于 6月 29, 2016

Direct data placement is not allowed when using flavors that
guarantee integrity or privacy. When such security flavors are in
effect, don't allow the use of Read and Write chunks for moving
individual data items. All messages larger than the inline threshold
are sent via Long Call or Long Reply.

On my systems (CX-3 Pro on FDR), for small I/O operations, the use
of Long messages adds only around 5 usecs of latency in each
direction.

Note that when integrity or encryption is used, the host CPU touches
every byte in these messages. Even if it could be used, data
movement offload doesn't buy much in this case.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

65b80179

xprtrdma: Clean up fixup_copy_count accounting · 64695bde

由 Chuck Lever 提交于 6月 29, 2016

fixup_copy_count should count only the number of bytes copied to the
page list. The head and tail are now always handled without a data
copy.

And the debugging at the end of rpcrdma_inline_fixup() is also no
longer necessary, since copy_len will be non-zero when there is reply
data in the tail (a normal and valid case).
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

64695bde

xprtrdma: Update only specific fields in private receive buffer · cfabe2c6

由 Chuck Lever 提交于 6月 29, 2016

Now that rpcrdma_inline_fixup() updates only two fields in
rq_rcv_buf, a full memcpy of that structure to rq_private_buf is
unwarranted. Updating rq_private_buf fields only where needed also
better documents what is going on.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

cfabe2c6

xprtrdma: Do not update {head, tail}.iov_len in rpcrdma_inline_fixup() · cb0ae1fb

由 Chuck Lever 提交于 6月 29, 2016

While trying NFSv4.0/RDMA with sec=krb5p, I noticed small NFS READ
operations failed. After the client unwrapped the NFS READ reply
message, the NFS READ XDR decoder was not able to decode the reply.
The message was "Server cheating in reply", with the reported
number of received payload bytes being zero. Applications reported
a read(2) that returned -1/EIO.

The problem is rpcrdma_inline_fixup() sets the tail.iov_len to zero
when the incoming reply fits entirely in the head iovec. The zero
tail.iov_len confused xdr_buf_trim(), which then mangled the actual
reply data instead of simply removing the trailing GSS checksum.

As near as I can tell, RPC transports are not supposed to update the
head.iov_len, page_len, or tail.iov_len fields in the receive XDR
buffer when handling an incoming RPC reply message. These fields
contain the length of each component of the XDR buffer, and hence
the maximum number of bytes of reply data that can be stored in each
XDR buffer component. I've concluded this because:

- This is how xdr_partial_copy_from_skb() appears to behave
- rpcrdma_inline_fixup() already does not alter page_len
- call_decode() compares rq_private_buf and rq_rcv_buf and WARNs
   if they are not exactly the same

Unfortunately, as soon as I tried the simple fix to just remove the
line that sets tail.iov_len to zero, I saw that the logic that
appends the implicit Write chunk pad inline depends on inline_fixup
setting tail.iov_len to zero.

To address this, re-organize the tail iovec handling logic to use
the same approach as with the head iovec: simply point tail.iov_base
to the correct bytes in the receive buffer.

While I remember all this, write down the conclusion in documenting
comments.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

cb0ae1fb

xprtrdma: rpcrdma_inline_fixup() overruns the receive page list · 80414abc

由 Chuck Lever 提交于 6月 29, 2016

When the remaining length of an incoming reply is longer than the
XDR buf's page_len, switch over to the tail iovec instead of
copying more than page_len bytes into the page list.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

80414abc

xprtrdma: Chunk list encoders no longer share one rl_segments array · 5ab81428

由 Chuck Lever 提交于 6月 29, 2016

Currently, all three chunk list encoders each use a portion of the
one rl_segments array in rpcrdma_req. This is because the MWs for
each chunk list were preserved in rl_segments so that ro_unmap could
find and invalidate them after the RPC was complete.

However, now that MWs are placed on a per-req linked list as they
are registered, there is no longer any information in rpcrdma_mr_seg
that is shared between ro_map and ro_unmap_{sync,safe}, and thus
nothing in rl_segments needs to be preserved after
rpcrdma_marshal_req is complete.

Thus the rl_segments array can be used now just for the needs of
each rpcrdma_convert_iovs call. Once each chunk list is encoded, the
next chunk list encoder is free to re-use all of rl_segments.

This means all three chunk lists in one RPC request can now each
encode a full size data payload with no increase in the size of
rl_segments.

This is a key requirement for Kerberos support, since both the Call
and Reply for a single RPC transaction are conveyed via Long
messages (RDMA Read/Write). Both can be large.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

5ab81428

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功