- 04 6月, 2014 18 次提交
-
-
由 Shirley Ma 提交于
GETACL relies on transport layer to alloc memory for reply buffer. However xprtrdma assumes that the reply buffer (pagelist) has been pre-allocated in upper layer. This problem was reported by IOL OFA lab test on PPC. Signed-off-by: NShirley Ma <shirley.ma@oracle.com> Reviewed-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NEdward Mossman <emossman@iol.unh.edu> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Clean up. Remove HCA-specific clutter in xprtrdma, which is supposed to be device-independent. Hal Rosenstock <hal@dev.mellanox.co.il> observes: > Note that there is OpenSM option (enable_quirks) to return 1K MTU > in SA PathRecord responses for Tavor so that can be used for this. > The default setting for enable_quirks is FALSE so that would need > changing. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Devesh Sharma <Devesh.Sharma@Emulex.Com> reports that after a disconnect, his HCA is failing to create a fresh QP, leaving ia_ri->ri_id->qp set to NULL. But xprtrdma still allows RPCs to wake up and post LOCAL_INV as they exit, causing an oops. rpcrdma_ep_connect() is allowing the wake-up by leaking the QP creation error code (-EPERM in this case) to the RPC client's generic layer. xprt_connect_status() does not recognize -EPERM, so it kills pending RPC tasks immediately rather than retrying the connect. Re-arrange the QP creation logic so that when it fails on reconnect, it leaves ->qp with the old QP rather than NULL. If pending RPC tasks wake and exit, LOCAL_INV work requests will flush rather than oops. On initial connect, leaving ->qp == NULL is OK, since there are no pending RPCs that might use ->qp. But be sure not to try to destroy a NULL QP when rpcrdma_ep_connect() is retried. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
While marshaling an RPC/RDMA request, the inline_{rsize,wsize} settings determine whether an inline request is used, or whether read or write chunks lists are built. The current default value of these settings is 1024. Any RPC request smaller than 1024 bytes is sent to the NFS server completely inline. rpcrdma_buffer_create() allocates and pre-registers a set of RPC buffers for each transport instance, also based on the inline rsize and wsize settings. RPC/RDMA requests and replies are built in these buffers. However, if an RPC/RDMA request is expected to be larger than 1024, a buffer has to be allocated and registered for that RPC, and deregistered and released when the RPC is complete. This is known has a "hardway allocation." Since the introduction of NFSv4, the size of RPC requests has become larger, and hardway allocations are thus more frequent. Hardway allocations are significant overhead, and they waste the existing RPC buffers pre-allocated by rpcrdma_buffer_create(). We'd like fewer hardway allocations. Increasing the size of the pre-registered buffers is the most direct way to do this. However, a blanket increase of the inline thresholds has interoperability consequences. On my 64-bit system, rpcrdma_buffer_create() requests roughly 7000 bytes for each RPC request buffer, using kmalloc(). Due to internal fragmentation, this wastes nearly 1200 bytes because kmalloc() already returns an 8192-byte piece of memory for a 7000-byte allocation request, though the extra space remains unused. So let's round up the size of the pre-allocated buffers, and make use of the unused space in the kmalloc'd memory. This change reduces the amount of hardway allocated memory for an NFSv4 general connectathon run from 1322092 to 9472 bytes (99%). Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Sagi Grimberg <sagig@dev.mellanox.co.il> points out that a steady stream of CQ events could starve other work because of the boundless loop pooling in rpcrdma_{send,recv}_poll(). Instead of a (potentially infinite) while loop, return after collecting a budgeted number of completions. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Acked-by: NSagi Grimberg <sagig@dev.mellanox.co.il> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Change the completion handlers to grab up to 16 items per ib_poll_cq() call. No extra ib_poll_cq() is needed if fewer than 16 items are returned. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Skip the ib_poll_cq() after re-arming, if the provider knows there are no additional items waiting. (Have a look at commit ed23a727 for more details). Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
The current CQ handler uses the ib_wc.opcode field to distinguish between event types. However, the contents of that field are not reliable if the completion status is not IB_WC_SUCCESS. When an error completion occurs on a send event, the CQ handler schedules a tasklet with something that is not a struct rpcrdma_rep. This is never correct behavior, and sometimes it results in a panic. To resolve this issue, split the completion queue into a send CQ and a receive CQ. The send CQ handler now handles only struct rpcrdma_mw wr_id's, and the receive CQ handler now handles only struct rpcrdma_rep wr_id's. Fix suggested by Shirley Ma <shirley.ma@oracle.com> Reported-by: NRafael Reiter <rafael.reiter@ims.co.at> Fixes: 5c635e09 BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=73211Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NKlemens Senn <klemens.senn@ims.co.at> Tested-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Clean up: rpcrdma_ep_destroy() returns a value that is used only to print a debugging message. rpcrdma_ep_destroy() already prints debugging messages in all error cases. Make rpcrdma_ep_destroy() return void instead. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Clean up: All remaining callers of rpcrdma_deregister_external() pass NULL as the last argument, so remove that argument. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
If the selected memory registration mode is not supported by the underlying provider/HCA, the NFS mount command reports that there was an invalid mount option, and fails. This is misleading. Reporting a problem allocating memory is a lot closer to the truth. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
An audit of in-kernel RDMA providers that do not support the FRMR memory registration shows that several of them support MTHCAFMR. Prefer MTHCAFMR when FRMR is not supported. If MTHCAFMR is not supported, only then choose ALLPHYSICAL. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
All kernel RDMA providers except amso1100 support either MTHCAFMR or FRMR, both of which are faster than REGISTER. amso1100 can continue to use ALLPHYSICAL. The only other ULP consumer in the kernel that uses the reg_phys_mr verb is Lustre. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
The MEMWINDOWS and MEMWINDOWS_ASYNC memory registration modes were intended as stop-gap modes before the introduction of FRMR. They are now considered obsolete. MEMWINDOWS_ASYNC is also considered unsafe because it can leave client memory registered and exposed for an indeterminant time after each I/O. At this point, the MEMWINDOWS modes add needless complexity, so remove them. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Clean up: This memory registration mode is slow and was never meant for use in production environments. Remove it to reduce implementation complexity. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
An IB provider can invoke rpcrdma_conn_func() in an IRQ context, thus rpcrdma_conn_func() cannot be allowed to directly invoke generic RPC functions like xprt_wake_pending_tasks(). Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Allen Andrews 提交于
Two memory region leaks were found during testing: 1. rpcrdma_buffer_create: While allocating RPCRDMA_FRMR's ib_alloc_fast_reg_mr is called and then ib_alloc_fast_reg_page_list is called. If ib_alloc_fast_reg_page_list returns an error it bails out of the routine dropping the last ib_alloc_fast_reg_mr frmr region creating a memory leak. Added code to dereg the last frmr if ib_alloc_fast_reg_page_list fails. 2. rpcrdma_buffer_destroy: While cleaning up, the routine will only free the MR's on the rb_mws list if there are rb_send_bufs present. However, in rpcrdma_buffer_create while the rb_mws list is being built if one of the MR allocation requests fail after some MR's have been allocated on the rb_mws list the routine never gets to create any rb_send_bufs but instead jumps to the rpcrdma_buffer_destroy routine which will never free the MR's on rb_mws list because the rb_send_bufs were never created. This leaks all the MR's on the rb_mws list that were created prior to one of the MR allocations failing. Issue(2) was seen during testing. Our adapter had a finite number of MR's available and we created enough connections to where we saw an MR allocation failure on our Nth NFS connection request. After the kernel cleaned up the resources it had allocated for the Nth connection we noticed that FMR's had been leaked due to the coding error described above. Issue(1) was seen during a code review while debugging issue(2). Signed-off-by: NAllen Andrews <allen.andrews@emulex.com> Reviewed-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Steve Wise 提交于
Some rdma devices don't support a fast register page list depth of at least RPCRDMA_MAX_DATA_SEGS. So xprtrdma needs to chunk its fast register regions according to the minimum of the device max supported depth or RPCRDMA_MAX_DATA_SEGS. Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Reviewed-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
- 29 5月, 2014 1 次提交
-
-
由 David Rientjes 提交于
rpc_malloc() allocates with GFP_NOWAIT without making any attempt at reclaim so it easily fails when low on memory. This ends up spamming the kernel log: SLAB: Unable to allocate memory on node 0 (gfp=0x4000) cache: kmalloc-8192, object size: 8192, order: 1 node 0: slabs: 207/207, objs: 207/207, free: 0 rekonq: page allocation failure: order:1, mode:0x204000 CPU: 2 PID: 14321 Comm: rekonq Tainted: G O 3.15.0-rc3-12.gfc9498b-desktop+ #6 Hardware name: System manufacturer System Product Name/M4A785TD-V EVO, BIOS 2105 07/23/2010 0000000000000000 ffff880010ff17d0 ffffffff815e693c 0000000000204000 ffff880010ff1858 ffffffff81137bd2 0000000000000000 0000001000000000 ffff88011ffebc38 0000000000000001 0000000000204000 ffff88011ffea000 Call Trace: [<ffffffff815e693c>] dump_stack+0x4d/0x6f [<ffffffff81137bd2>] warn_alloc_failed+0xd2/0x140 [<ffffffff8113be19>] __alloc_pages_nodemask+0x7e9/0xa30 [<ffffffff811824a8>] kmem_getpages+0x58/0x140 [<ffffffff81183de6>] fallback_alloc+0x1d6/0x210 [<ffffffff81183be3>] ____cache_alloc_node+0x123/0x150 [<ffffffff81185953>] __kmalloc+0x203/0x490 [<ffffffffa06b0ee2>] rpc_malloc+0x32/0xa0 [sunrpc] [<ffffffffa06a6999>] call_allocate+0xb9/0x170 [sunrpc] [<ffffffffa06b19d8>] __rpc_execute+0x88/0x460 [sunrpc] [<ffffffffa06b2da9>] rpc_execute+0x59/0xc0 [sunrpc] [<ffffffffa06a932b>] rpc_run_task+0x6b/0x90 [sunrpc] [<ffffffffa077b5c1>] nfs4_call_sync_sequence+0x51/0x80 [nfsv4] [<ffffffffa077d45d>] _nfs4_do_setattr+0x1ed/0x280 [nfsv4] [<ffffffffa0782a72>] nfs4_do_setattr+0x72/0x180 [nfsv4] [<ffffffffa078334c>] nfs4_proc_setattr+0xbc/0x140 [nfsv4] [<ffffffffa074a7e8>] nfs_setattr+0xd8/0x240 [nfs] [<ffffffff811baa71>] notify_change+0x231/0x380 [<ffffffff8119cf5c>] chmod_common+0xfc/0x120 [<ffffffff8119df80>] SyS_chmod+0x40/0x90 [<ffffffff815f4cfd>] system_call_fastpath+0x1a/0x1f ... If the allocation fails, simply return NULL and avoid spamming the kernel log. Reported-by: NMarc Dietrich <marvin24@gmx.de> Signed-off-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 19 5月, 2014 1 次提交
-
-
由 Trond Myklebust 提交于
We're not taking a reference in the case where _gss_mech_get_by_pseudoflavor loops without finding the correct rpcsec_gss flavour, so why are we releasing it? Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 12 4月, 2014 1 次提交
-
-
由 David S. Miller 提交于
Several spots in the kernel perform a sequence like: skb_queue_tail(&sk->s_receive_queue, skb); sk->sk_data_ready(sk, skb->len); But at the moment we place the SKB onto the socket receive queue it can be consumed and freed up. So this skb->len access is potentially to freed up memory. Furthermore, the skb->len can be modified by the consumer so it is possible that the value isn't accurate. And finally, no actual implementation of this callback actually uses the length argument. And since nobody actually cared about it's value, lots of call sites pass arbitrary values in such as '0' and even '1'. So just remove the length argument from the callback, that way there is no confusion whatsoever and all of these use-after-free cases get fixed as a side effect. Based upon a patch by Eric Dumazet and his suggestion to audit this issue tree-wide. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 4月, 2014 1 次提交
-
-
由 Stanislav Kinsbursky 提交于
There could be a case, when NFSd file system is mounted in network, different to socket's one, like below: "ip netns exec" creates new network and mount namespace, which duplicates NFSd mount point, created in init_net context. And thus NFS server stop in nested network context leads to RPCBIND client destruction in init_net. Then, on NFSd start in nested network context, rpc.nfsd process creates socket in nested net and passes it into "write_ports", which leads to RPCBIND sockets creation in init_net context because of the same reason (NFSd monut point was created in init_net context). An attempt to register passed socket in nested net leads to panic, because no RPCBIND client present in nexted network namespace. This patch add check that passed socket's net matches NFSd superblock's one. And returns -EINVAL error to user psace otherwise. v2: Put socket on exit. Reported-by: NWeng Meiling <wengmeiling.weng@huawei.com> Signed-off-by: NStanislav Kinsbursky <skinsbursky@parallels.com> Cc: stable@vger.kernel.org Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 30 3月, 2014 4 次提交
-
-
由 Kinglong Mee 提交于
Don't move the assign of args->bc_xprt->xpt_bc_xprt out of xs_setup_bc_tcp, because rpc_ping (which is in rpc_create) will using it. Signed-off-by: NKinglong Mee <kinglongmee@gmail.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Kinglong Mee 提交于
Besides checking rpc_xprt out of xs_setup_bc_tcp, increase it's reference (it's important). Signed-off-by: NKinglong Mee <kinglongmee@gmail.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Kinglong Mee 提交于
Signed-off-by: NKinglong Mee <kinglongmee@gmail.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Kinglong Mee 提交于
Backchannel xprt isn't freed right now. Free it in bc_destroy, and put the reference of THIS_MODULE. Signed-off-by: NKinglong Mee <kinglongmee@gmail.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 29 3月, 2014 5 次提交
-
-
由 J. Bruce Fields 提交于
Allow xdr_buf_subsegment(&buf, &buf, base, len) to modify an xdr_buf in-place. Also, none of the callers need the iov_base of head or tail to be zeroed out. Also add documentation. (As it turns out, I'm not really using this new guarantee, but it seems a simple way to make this function a bit more robust.) Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Kinglong Mee 提交于
Creating xprt failed after xs_format_peer_addresses, sunrpc must free those memory of peer addresses in xprt. Signed-off-by: NKinglong Mee <kinglongmee@gmail.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Jeff Layton 提交于
The xdr_off value in dma_map_xdr gets passed to ib_dma_map_page as the offset into the page to be mapped. This calculation does not correctly take into account the case where the data starts at some offset into the page. Increment the xdr_off by the page_base to ensure that it is respected. Cc: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Jeff Layton 提交于
There are two entirely separate modules under xprtrdma/ and there's no reason that enabling one should automatically enable the other. Add config options for each one so they can be enabled/disabled separately. Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Tom Tucker 提交于
The server regression was caused by the addition of rq_next_page (afc59400). There were a few places that were missed with the update of the rq_respages array. Signed-off-by: NTom Tucker <tom@ogc.us> Tested-by: NSteve Wise <swise@ogc.us> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 28 3月, 2014 2 次提交
-
-
由 Rashika Kheria 提交于
Mark functions as static in net/sunrpc/svc_xprt.c because they are not used outside this file. This eliminates the following warning in net/sunrpc/svc_xprt.c: net/sunrpc/svc_xprt.c:574:5: warning: no previous prototype for ‘svc_alloc_arg’ [-Wmissing-prototypes] net/sunrpc/svc_xprt.c:615:18: warning: no previous prototype for ‘svc_get_next_xprt’ [-Wmissing-prototypes] net/sunrpc/svc_xprt.c:694:6: warning: no previous prototype for ‘svc_add_new_temp_xprt’ [-Wmissing-prototypes] Reviewed-by: NJosh Triplett <josh@joshtriplett.org> Signed-off-by: NRashika Kheria <rashika.kheria@gmail.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Jeff Layton 提交于
It retries in 1s, not 1000 jiffies. Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 21 3月, 2014 2 次提交
-
-
由 Trond Myklebust 提交于
When restarting an rpc call, we should not be carrying over data from the previous call. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 20 3月, 2014 2 次提交
-
-
由 Steve Dickson 提交于
Don't schedule an rpc_delay before checking to see if the task is a SOFTCONN because the tk_callback from the delay (__rpc_atrun) clears the task status before the rpc_exit_task can be run. Signed-off-by: NSteve Dickson <steved@redhat.com> Fixes: 561ec160 (SUNRPC: call_connect_status should recheck...) Link: http://lkml.kernel.org/r/5329CF7C.7090308@RedHat.comSigned-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 18 3月, 2014 3 次提交
-
-
由 Chuck Lever 提交于
The use of KERN_INFO causes garbage characters to appear when debugging is enabled. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Chuck Lever 提交于
After commit a11a2bf4, "SUNRPC: Optimise away unnecessary data moves in xdr_align_pages", Thu Aug 2 13:21:43 2012, READs larger than a few hundred bytes via NFS/RDMA no longer work. This commit exposed a long-standing bug in rpcrdma_inline_fixup(). I reproduce this with an rsize=4096 mount using the cthon04 basic tests. Test 5 fails with an EIO error. For my reproducer, kernel log shows: NFS: server cheating in read reply: count 4096 > recvd 0 rpcrdma_inline_fixup() is zeroing the xdr_stream::page_len field, and xdr_align_pages() is now returning that value to the READ XDR decoder function. That field is set up by xdr_inline_pages() by the READ XDR encoder function. As far as I can tell, it is supposed to be left alone after that, as it describes the dimensions of the reply xdr_stream, not the contents of that stream. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=68391Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
If the rpcbind server is unavailable, we still want the RPC client to respect the timeout. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-