- 25 November 2015 (2 commits)
-
-
Committed by J. Bruce Fields

The principal name on a gss cred is used to set up the NFSv4.0 callback, which has to have a client principal name to authenticate to. That code wants the name to be in the form servicetype@hostname. rpc.svcgssd passes down such names (and passes down no principal name at all in the case the principal isn't a service principal).

gss-proxy always passes down the principal name, and passes it down in the form servicetype/hostname@REALM. So we've been munging the name gss-proxy passes down into the format the NFSv4.0 callback code expects, or throwing away the name if we can't.

Since the introduction of the MACH_CRED enforcement in NFSv4.1, we've also been using the principal name to verify that certain operations are done as the same principal as was used on the original EXCHANGE_ID call. For that application, the original name passed down by gss-proxy is also useful. Lack of that name in some cases was causing some kerberized NFSv4.1 mount failures in an Active Directory environment.

This fix only works in the gss-proxy case. The fix for legacy rpc.svcgssd would be more involved, and rpc.svcgssd already has other problems in the AD case.

Reported-and-tested-by: James Ralston <ralston@pobox.com>
Acked-by: Simo Sorce <simo@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by J. Bruce Fields

Minor cleanup, no change in behavior.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
- 24 November 2015 (1 commit)
-
-
Committed by Chuck Lever

Minor optimization: when dealing with write chunk XDR roundup, do not post a Write WR for the zero bytes in the pad. Simply update the write segment in the RPC-over-RDMA header to reflect the extra pad bytes.

The Reply chunk is also a write chunk, but the server does not use send_write_chunks() to send the Reply chunk. That's OK in this case: the server Upper Layer typically marshals the Reply chunk contents in a single contiguous buffer, without a separate tail for the XDR pad.

The comments and the variable naming refer to "chunks" but what is really meant is "segments." The existing code sends only one xdr_write_chunk per RPC reply. The fix assumes this as well. When the XDR pad in the first write chunk is reached, the assumption is that the Write list is complete and send_write_chunks() returns. That will remain a valid assumption until the server Upper Layer can support multiple bulk payload results per RPC.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
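For context, XDR rounds opaque data up to the next 4-byte boundary, so the pad is at most three bytes. A minimal sketch of the arithmetic (a hypothetical helper, equivalent in spirit to the kernel's xdr_padsize()):

```c
/* Bytes of zero padding XDR requires after "len" bytes of data (0-3). */
static inline size_t xdr_pad_bytes(size_t len)
{
	return (4 - (len & 3)) & 3;
}
```

The optimization simply adds this value to the length recorded in the write segment rather than posting a separate Write WR that would carry only zeroes.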
-
- 11 November 2015 (1 commit)
-
-
Committed by J. Bruce Fields

We're missing memory barriers in net/sunrpc/svcsock.c in some spots where we'd expect them. But it doesn't appear they're necessary in our case, and this is likely a hot path, so for now just document the odd behavior.

Kosuke Tatsukawa found this issue while looking through the Linux source code for places calling waitqueue_active() before wake_up*() without preceding memory barriers, after sending a patch to fix a similar issue in drivers/tty/n_tty.c. (Details about the original issue can be found here: https://lkml.org/lkml/2015/9/28/849)

Reported-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
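For readers unfamiliar with the idiom under discussion, the usual pattern requires a full barrier between updating the condition and testing for waiters. A minimal kernel-style sketch (illustrative names, not the actual svcsock.c code):

```c
#include <linux/types.h>
#include <linux/wait.h>

/* Waker side: publish the condition before checking for sleepers. */
static void sketch_wake(wait_queue_head_t *wq, bool *condition)
{
	*condition = true;
	smp_mb();	/* pairs with the barrier in prepare_to_wait()/set_current_state() */
	if (waitqueue_active(wq))
		wake_up(wq);
}
```

Without the smp_mb(), the waitqueue_active() check can be reordered before the store to the condition and a sleeper can be missed; the commit documents why svcsock.c gets away without it.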
-
- 10 November 2015 (1 commit)
-
-
Committed by Stefan Hajnoczi

The svc_setup_socket() function does set the send and receive buffer sizes, so the comment is out of date.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
- 04 November 2015 (2 commits)
-
-
Committed by Kinglong Mee

The sunrpc debug sysctl files only accept decimal numbers right now, but all the XXXDBUG_XXX macros are defined as hexadecimal, so it is not easy to set or check a separate flag. This patch lets those files accept hexadecimal numbers as well (decimal is still supported), and displays the value as hexadecimal.

v2: Remove the duplicate parsing of '0x...' and just use simple_strtol(tmpbuf, &s, 0); fix a bug in the isspace() check after parsing.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
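As a rough illustration of the v2 approach (hypothetical helper name, not the actual sysctl handler): passing base 0 to simple_strtol() lets one call accept both notations.

```c
#include <linux/ctype.h>
#include <linux/errno.h>
#include <linux/kernel.h>

/* Accepts "0x1f", "017" or "31"; rejects strings with trailing junk. */
static int sketch_parse_flags(const char *tmpbuf, long *value)
{
	char *s;

	*value = simple_strtol(tmpbuf, &s, 0);	/* base 0: a "0x" prefix selects hex */
	if (s == tmpbuf || (*s != '\0' && !isspace(*s)))
		return -EINVAL;
	return 0;
}
```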
-
Committed by Andrzej Hajda

Due to an incorrect len type, bc_send_request() always returned zero.

The problem has been detected using the proposed semantic patch scripts/coccinelle/tests/assign_signed_to_unsigned.cocci [1].

[1]: http://permalink.gmane.org/gmane.linux.kernel/2046107

Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
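A minimal illustration of the bug class (hypothetical names, not the actual xprtsock.c code): a negative error code stored in an unsigned variable can never fail a "less than zero" test, so the error is silently dropped.

```c
#include <linux/errno.h>
#include <linux/types.h>

static int sketch_sendto(void)
{
	return -EAGAIN;			/* stands in for a failing send */
}

static int sketch_bc_send_request(void)
{
	size_t len;			/* unsigned: the bug */

	len = sketch_sendto();
	if (len < 0)			/* always false for an unsigned type */
		return len;
	return 0;			/* caller always sees success */
}
```

The coccinelle script referenced above flags exactly this shape: a signed return value assigned to an unsigned variable that is later compared against zero.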
-
- 03 November 2015 (17 commits)
-
-
Committed by Chuck Lever

Forechannel transports get their own "bc_up" method to create an endpoint for the backchannel service.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
[Anna Schumaker: Add forward declaration of struct net to xprt.h]
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Committed by Chuck Lever

Allow the use of other transport classes when handling a backward direction RPC call.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Committed by Chuck Lever

On NFSv4.1 mount points, the Linux NFS client uses this transport endpoint to receive backward direction calls and route replies back to the NFSv4.1 server.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Committed by Chuck Lever

Introduce a code path in rpcrdma_reply_handler() to catch incoming backward direction RPC calls and route them to the ULP's backchannel server.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Committed by Chuck Lever

Backward direction RPC replies are sent via the client transport's send_request method, the same way forward direction RPC calls are sent.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Committed by Chuck Lever

Pre-allocate extra send and receive Work Requests needed to handle backchannel receives and sends.

The transport doesn't know how many extra WRs to pre-allocate until the xprt_setup_backchannel() call, but that's long after the WRs are allocated during forechannel setup. So, use a fixed value for now.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Committed by Chuck Lever

xprtrdma's backward direction send and receive buffers are the same size as the forechannel's inline threshold, and must be pre-registered.

The consumer has no control over which receive buffer the adapter chooses to catch an incoming backwards-direction call. Any receive buffer can be used for either a forward reply or a backward call. Thus both types of RPC message must be the same size.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Committed by Chuck Lever

xprt_{setup,destroy}_backchannel() won't be adequate for bi-directional RPC/RDMA. In particular, receive buffers have to be pre-registered and posted in order to receive incoming backchannel requests. Add a virtual function call to allow the insertion of appropriate backchannel setup and destruction methods for each transport.

In addition, freeing a backchannel request is a little different for RPC/RDMA. Introduce an rpc_xprt_op to handle the difference.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Committed by Chuck Lever

Now that RPC replies are processed in a workqueue, there's no need to disable IRQs when managing send and receive buffers. This saves noticeable overhead per RPC.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Committed by Chuck Lever

Clean up: the reply tasklet is no longer used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Committed by Chuck Lever

The reply tasklet is fast, but it's single threaded. After reply traffic saturates a single CPU, there's no more reply processing capacity.

Replace the tasklet with a workqueue to spread reply handling across all CPUs. This also moves RPC/RDMA reply handling out of the soft IRQ context and into a context that allows sleeps.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Committed by Chuck Lever

The rb_send_bufs and rb_recv_bufs arrays are used to implement a pair of stacks for keeping track of free rpcrdma_req and rpcrdma_rep structs. Replace those arrays with free lists.

To allow more than 512 RPCs in flight at once, each of these arrays would be larger than a page (assuming 8-byte addresses and 4KB pages). Allowing up to 64K in-flight RPCs (as TCP now does), each buffer array would have to be 128 pages. That's an order-6 allocation. (Not that we're going there.)

A list is easier to expand dynamically. Instead of allocating a larger array of pointers and copying the existing pointers to the new array, simply append more buffers to each list.

This also makes it simpler to manage receive buffers that might catch backwards-direction calls, or to post receive buffers in bulk to amortize the overhead of ib_post_recv.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
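A rough sketch of the stack-to-free-list conversion described here (field and function names are illustrative, not the exact xprtrdma ones; the real code also holds the buffer lock around these operations):

```c
#include <linux/list.h>

struct sketch_req {
	struct list_head free;		/* linkage on the free list */
	/* ... rest of the request state ... */
};

static void sketch_put_req(struct list_head *free_list, struct sketch_req *req)
{
	list_add(&req->free, free_list);
}

static struct sketch_req *sketch_get_req(struct list_head *free_list)
{
	struct sketch_req *req;

	if (list_empty(free_list))
		return NULL;
	req = list_first_entry(free_list, struct sketch_req, free);
	list_del(&req->free);
	return req;
}
```

Growing the pool is then just a matter of allocating more entries and list_add()-ing them, with no reallocation or copying of a pointer array.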
-
Committed by Chuck Lever

Clean up: the error cases in rpcrdma_reply_handler() almost never execute. Ensure the compiler places them out of the hot path.

No behavior change expected.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
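The usual way to express this is the unlikely() branch-prediction hint; a generic illustration (not the actual reply-handler code):

```c
#include <linux/compiler.h>

static int sketch_handle_reply(int status)
{
	if (unlikely(status < 0)) {
		/* cold error path: the compiler moves this out of line */
		return status;
	}
	/* common case stays on the straight-line hot path */
	return 0;
}
```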
-
Committed by Chuck Lever

Commit 8301a2c0 ("xprtrdma: Limit work done by completion handler") was supposed to prevent xprtrdma's upcall handlers from starving other softIRQ work by letting them return to the provider before all CQEs have been polled.

The logic assumes the provider will call the upcall handler again immediately if the CQ is re-armed while there are still queued CQEs. This assumption is invalid. The IBTA spec says that after a CQ is armed, the hardware must interrupt only when a new CQE is inserted. xprtrdma can't rely on the provider calling again, even though some providers do.

Therefore, leaving CQEs on queue makes sense only when there is another mechanism that ensures all remaining CQEs are consumed in a timely fashion. xprtrdma does not have such a mechanism. If a CQE remains queued, the transport can wait forever to send the next RPC.

Finally, move the wcs array back onto the stack to ensure that the poll array is always local to the CPU where the completion upcall is running.

Fixes: 8301a2c0 ("xprtrdma: Limit work done by completion ...")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Committed by Chuck Lever

ib_req_notify_cq(IB_CQ_REPORT_MISSED_EVENTS) returns a positive value if WCs were added to a CQ after the last completion upcall but before the CQ has been re-armed.

Commit 7f23f6f6 ("xprtrmda: Reduce lock contention in completion handlers") assumed that when ib_req_notify_cq() returned a positive RC, the CQ had also been successfully re-armed, making it safe to return control to the provider without losing any completion signals. That is an invalid assumption.

Change both completion handlers to continue polling while ib_req_notify_cq() returns a positive value.

Fixes: 7f23f6f6 ("xprtrmda: Reduce lock contention in ...")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
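A minimal sketch of the resulting polling discipline (an illustrative handler, not the actual xprtrdma code, which batches several WCs per ib_poll_cq() call):

```c
#include <rdma/ib_verbs.h>

static void sketch_cq_comp_handler(struct ib_cq *cq, void *cq_context)
{
	struct ib_wc wc;

	do {
		while (ib_poll_cq(cq, 1, &wc) > 0) {
			/* process the completion here */
		}
	} while (ib_req_notify_cq(cq, IB_CQ_NEXT_COMP |
				  IB_CQ_REPORT_MISSED_EVENTS) > 0);
}
```

A positive return from ib_req_notify_cq() means completions slipped in before the re-arm took effect, so the handler loops and drains them instead of returning to the provider.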
-
Committed by Chuck Lever

After adding a swapfile on an NFS/RDMA mount and removing the normal swap partition, I was able to push the NFS client well into swap without any issue.

I forgot to swapoff the NFS file before rebooting. This pinned the NFS mount and the IB core and provider, causing shutdown to hang. I think this is expected and safe behavior. Probably shutdown scripts should "swapoff -a" before unmounting any filesystems.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Committed by Steve Wise

Unsignaled send WRs can get flushed as part of normal unmount, so don't log them as warnings.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
- 29 October 2015 (3 commits)
-
-
Committed by Sagi Grimberg

Instead of maintaining a fastreg page list, keep an sg table and convert an array of pages to a sg list. Then call ib_map_mr_sg() and construct an ib_reg_wr.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Selvin Xavier <selvin.xavier@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Sagi Grimberg

Instead of maintaining a fastreg page list, keep an sg table and convert an array of pages to a sg list. Then call ib_map_mr_sg() and construct an ib_reg_wr.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Selvin Xavier <selvin.xavier@avagotech.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
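A rough sketch of the registration flow these two commits adopt (a hypothetical helper; the ib_map_mr_sg() call shown matches the 4.4-era API and the function has since gained an sg_offset argument):

```c
#include <linux/errno.h>
#include <rdma/ib_verbs.h>

static int sketch_map_and_register(struct ib_qp *qp, struct ib_mr *mr,
				   struct scatterlist *sgl, int nents)
{
	struct ib_reg_wr reg_wr = { };
	struct ib_send_wr *bad_wr;
	int n;

	n = ib_map_mr_sg(mr, sgl, nents, PAGE_SIZE);
	if (n < nents)
		return -EIO;			/* could not map every entry */

	reg_wr.wr.opcode = IB_WR_REG_MR;
	reg_wr.mr = mr;
	reg_wr.key = mr->rkey;
	reg_wr.access = IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE;

	return ib_post_send(qp, &reg_wr.wr, &bad_wr);
}
```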
-
Committed by Guy Shapiro

Add support for network namespaces in the ib_cma module. This is accomplished by:

1. Adding a network namespace parameter to rdma_create_id. This parameter is used to populate the network namespace field in rdma_id_private. rdma_create_id keeps a reference on the network namespace.
2. Using the network namespace from the rdma_id instead of init_net inside of ib_cma, when listening on an ID and when looking for an ID for an incoming request.
3. Decrementing the reference count for the appropriate network namespace when calling rdma_destroy_id.

In order to preserve the current behavior, init_net is passed when calling from other modules.

Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
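For callers, the visible change is the extra first argument; a minimal sketch of an in-kernel user keeping the old behaviour by passing init_net (the handler and port space here are arbitrary examples):

```c
#include <net/net_namespace.h>
#include <rdma/rdma_cm.h>

static struct rdma_cm_id *sketch_create_id(rdma_cm_event_handler handler,
					   void *context)
{
	/* init_net preserves the pre-namespace-aware behaviour */
	return rdma_create_id(&init_net, handler, context,
			      RDMA_PS_TCP, IB_QPT_RC);
}
```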
-
- 24 October 2015 (3 commits)
-
-
Committed by Neil Brown

The caches used to store sunrpc authentication information can be flushed by writing a timestamp to a file in /proc. This timestamp has a one-second resolution and any entry in the cache that was last_refreshed *before* that time is treated as expired.

This is problematic as it is not possible to reliably flush the cache without interrupting NFS service. If the current time is written to the "flush" file, any entry that was added since the current second started will still be treated as valid. If one second beyond the current time is written to the file, then no entries can be valid until the second ticks over. This will mean that no NFS request will be handled for up to 1 second.

To resolve this issue we make two changes:

1/ Treat an entry as expired if the timestamp when it was last_refreshed is before *or the same as* the expiry time. This means that current code which writes out the current time will now flush the cache reliably.

2/ When a new entry is added to the cache, set the last_refresh timestamp to 1 second *beyond* the current flush time, when that is not in the past. This ensures that newly added entries will always be valid.

Now that we have a very reliable way to flush the cache, and also since we are using "since-boot" timestamps which are monotonic, change cache_purge() to set the smallest future flush_time which will work, and leave it there: don't revert to '1'.

Also disable the setting of the 'flush_time' far into the future. That has never been useful and is now awkward as it would cause last_refresh times to be strange.

Finally: if a request is made to set the 'flush_time' to the current second, assume the intent is to flush the cache and advance it, if necessary, to 1 second beyond the current 'flush_time' so that all active entries will be deemed to be expired.

As part of this we need to add a 'cache_detail' arg to cache_init() and cache_fresh_locked() so they can find the current ->flush_time.

Signed-off-by: NeilBrown <neilb@suse.com>
Reported-by: Olaf Kirch <okir@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
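The key boundary-condition change in point 1/ is just switching a strict comparison to an inclusive one; a tiny illustration (a hypothetical helper, not the actual cache.c predicate, which also checks the entry's own expiry_time):

```c
#include <linux/types.h>

/* An entry is now stale if it was last refreshed at or before the flush time. */
static bool sketch_entry_flushed(time_t last_refresh, time_t flush_time)
{
	return last_refresh <= flush_time;	/* previously a strict '<' */
}
```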
-
Committed by Arnd Bergmann

The gss_key_timeout() function causes a harmless warning in some configurations, e.g. ARM imx_v6_v7_defconfig with gcc-5.2, if the compiler cannot figure out the state of the 'expire' variable across an rcu_read_unlock():

  net/sunrpc/auth_gss/auth_gss.c: In function 'gss_key_timeout':
  net/sunrpc/auth_gss/auth_gss.c:1422:211: warning: 'expire' may be used uninitialized in this function [-Wmaybe-uninitialized]

To avoid this warning without adding a bogus initialization, this rewrites the function so the comparison is done inside the critical section. As a side effect, it also becomes slightly easier to understand because the implementation now more closely resembles the comment above it.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: c5e6aecd ("sunrpc: fix RCU handling of gc_ctx field")
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Trond Myklebust

If we're sending more pages via kernel_sendpage(), then set MSG_SENDPAGE_NOTLAST.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
- 12 October 2015 (1 commit)
-
-
Committed by Chuck Lever

Now that the NFS server advertises a maximum payload size of 1MB for RPC/RDMA again, it crashes in svc_process_common() when an NFS client sends a 1MB NFS WRITE on an NFS/RDMA mount.

The server has set up a 259-element array of struct page pointers in rq_pages[] for each incoming request. The last element of the array is NULL.

When an incoming request has been completely received, rdma_read_complete() attempts to set the starting page of the incoming page vector:

  rqstp->rq_arg.pages = &rqstp->rq_pages[head->hdr_count];

and the page to use for the reply:

  rqstp->rq_respages = &rqstp->rq_arg.pages[page_no];

But the value of page_no has already accounted for head->hdr_count. Thus rq_respages now points past the end of the incoming pages.

For NFS WRITE operations smaller than the maximum, this is harmless. But when the NFS WRITE operation is as large as the server's max payload size, rq_respages now points at the last entry in rq_pages, which is NULL.

Fixes: cc9a903d ('svcrdma: Change maximum server payload . . .')
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=270
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Shirley Ma <shirley.ma@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
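Since rq_arg.pages already starts head->hdr_count entries into rq_pages, the second assignment effectively computes &rq_pages[head->hdr_count + page_no], double-counting the header pages. A hedged sketch of the corrected indexing (an illustration of the idea, not a verbatim quote of the fix):

```c
	/* page_no already includes head->hdr_count, so index rq_pages directly */
	rqstp->rq_respages = &rqstp->rq_pages[page_no];
```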
-
- 08 October 2015 (6 commits)
-
-
Committed by Trond Myklebust

If we're sending more than one page via kernel_sendpage(), then set MSG_SENDPAGE_NOTLAST between the pages so that we don't send suboptimal frames (see commit 2f533844 and commit 35f9c09f).

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
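A minimal sketch of the intent (a hypothetical helper, not the actual xprtsock.c loop): every page except the last is flagged so the network stack knows more data follows and can build full-sized frames.

```c
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/net.h>
#include <linux/socket.h>

static int sketch_send_pages(struct socket *sock, struct page **pages,
			     int npages, size_t remaining)
{
	int i, ret;

	for (i = 0; i < npages; i++) {
		size_t len = min_t(size_t, remaining, PAGE_SIZE);
		int flags = (i + 1 < npages) ?
			    MSG_MORE | MSG_SENDPAGE_NOTLAST : 0;

		ret = kernel_sendpage(sock, pages[i], 0, len, flags);
		if (ret < 0)
			return ret;
		remaining -= len;
	}
	return 0;
}
```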
-
Committed by Trond Myklebust

Now that we've done it for TCP and UDP, let's convert AF_LOCAL as well.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust

Now that we've done it for TCP, let's convert UDP as well.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust

Stream protocols such as TCP can often build up a backlog of data to be read due to ordering. Combine this with the fact that some workloads, such as NFS read()-intensive workloads, need to receive a lot of data per RPC call, and it turns out that receiving the data from inside a softirq context can cause starvation.

The following patch moves the TCP data receive into a workqueue context. We still end up calling tcp_read_sock(), but we do so from a process context, meaning that softirqs are enabled for most of the time.

With this patch, I see a doubling of read bandwidth when running a multi-threaded iozone workload between a virtual client and server setup.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
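The general shape of the change (illustrative names and a hypothetical workqueue; the real code hangs the work item off the sunrpc transport and uses its own workqueue): the socket's data-ready callback only schedules work, and the tcp_read_sock() loop runs later in process context.

```c
#include <linux/workqueue.h>
#include <net/sock.h>

struct sketch_transport {
	struct sock		*sk;
	struct work_struct	recv_worker;	/* work function calls tcp_read_sock() */
};

static struct workqueue_struct *sketch_wq;	/* assumed: allocated at module init */

static void sketch_data_ready(struct sock *sk)
{
	struct sketch_transport *transport = sk->sk_user_data;

	if (transport)
		queue_work(sketch_wq, &transport->recv_worker);	/* defer out of softirq */
}
```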
-
Committed by Trond Myklebust

Move the TCP data receive loop out of xs_tcp_data_ready(). Doing so will allow us to move the data receive out of the softirq context in a set of follow-up patches.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Christoph Hellwig

This patch splits up struct ib_send_wr so that all non-trivial verbs use their own structure which embeds struct ib_send_wr. This dramatically shrinks the size of a WR for the most common operations:

  sizeof(struct ib_send_wr) (old):    96
  sizeof(struct ib_send_wr):          48
  sizeof(struct ib_rdma_wr):          64
  sizeof(struct ib_atomic_wr):        96
  sizeof(struct ib_ud_wr):            88
  sizeof(struct ib_fast_reg_wr):      88
  sizeof(struct ib_bind_mw_wr):       96
  sizeof(struct ib_sig_handover_wr):  80

And with Sagi's pending MR rework the fast registration WR will also be down to a reasonable size:

  sizeof(struct ib_fastreg_wr): 64

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> [srp, srpt]
Reviewed-by: Chuck Lever <chuck.lever@oracle.com> [sunrpc]
Tested-by: Haggai Eran <haggaie@mellanox.com>
Tested-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
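The pattern is plain struct embedding (the authoritative definitions live in include/rdma/ib_verbs.h); roughly, a consumer now fills in the verb-specific WR type and posts its embedded common header:

```c
#include <rdma/ib_verbs.h>

static int sketch_post_rdma_write(struct ib_qp *qp, struct ib_sge *sge,
				  u64 remote_addr, u32 rkey)
{
	struct ib_rdma_wr rdma_wr = { };
	struct ib_send_wr *bad_wr;

	rdma_wr.wr.opcode = IB_WR_RDMA_WRITE;	/* common fields in the embedded header */
	rdma_wr.wr.sg_list = sge;
	rdma_wr.wr.num_sge = 1;
	rdma_wr.remote_addr = remote_addr;	/* verb-specific fields live outside it */
	rdma_wr.rkey = rkey;

	return ib_post_send(qp, &rdma_wr.wr, &bad_wr);
}
```

Only the verbs that need extra fields pay for them, which is where the size reductions in the table above come from.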
-
- 07 October 2015 (1 commit)
-
-
Committed by Sagi Grimberg

There is no need to require LOCAL_DMA_LKEY support, as the PD allocation makes sure that there is a local_dma_lkey. Also correctly set a return value in the error path.

This caused a NULL pointer dereference in mlx5, which removed support for LOCAL_DMA_LKEY.

Fixes: bb6c96d7 ("xprtrdma: Replace global lkey with lkey local to PD")
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
- 30 September 2015 (1 commit)
-
-
Committed by Steve Wise

The server rdma_read_chunk_lcl() and rdma_read_chunk_frmr() functions were not taking into account the initial page_offset when determining the rdma read length. This resulted in a read whose starting address and length exceeded the base/bounds of the frmr.

The server gets an async error from the rdma device and kills the connection, and the client then reconnects and resends. This repeats indefinitely, and the application hangs.

Most workloads don't tickle this bug apparently, but one test hit it every time: building the Linux kernel on a 16-core node with 'make -j 16 O=/mnt/0' where /mnt/0 is a ramdisk mounted via NFSRDMA.

This bug seems to only be tripped with devices having small fastreg page list depths. I didn't see it with mlx4, for instance.

Fixes: 0bf48289 ('svcrdma: refactor marshalling logic')
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
- 28 September 2015 (1 commit)
-
-
Committed by Steve Wise

Otherwise a FRMR completion can cause a touch-after-free crash.

In xprt_rdma_destroy(), call rpcrdma_buffer_destroy() only after calling rpcrdma_ep_destroy().

In rpcrdma_ep_destroy(), disconnect the cm_id first, which should flush the qp, then drain the cqs, then destroy the qp, and finally destroy the cqs.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-