- 14 August 2008, 1 commit
-
By Tom Tucker
RDMA_READ completions are kept on a separate queue from the general I/O request queue. Since a separate lock is used to protect the RDMA_READ completion queue, a race exists between the dto_tasklet and the svc_rdma_recvfrom thread: the dto_tasklet sets the XPT_DATA bit and adds I/O to the read-completion queue, while the recvfrom thread concurrently checks the generic queue, finds it empty, and resets the XPT_DATA bit. A subsequent svc_xprt_enqueue will then fail to enqueue the transport for I/O, causing the transport to "stall". The fix is to protect both lists with the same lock and to set the XPT_DATA bit with this lock held.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
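The fix is mostly a matter of where the bit is set relative to the lock. A minimal sketch of the pattern, assuming the svcxprt_rdma field names mentioned in this log (sc_rq_dto_lock, sc_read_complete_q) and simplified context handling; this is not the literal patch:

```c
/* Sketch: one lock now covers both completion lists, and XPT_DATA is set
 * while that lock is held, so recvfrom cannot observe an empty generic
 * queue and clear the bit between the two steps. */
static void rdma_read_complete_add(struct svcxprt_rdma *xprt,
                                   struct svc_rdma_op_ctxt *ctxt)
{
        spin_lock_bh(&xprt->sc_rq_dto_lock);
        list_add_tail(&ctxt->dto_q, &xprt->sc_read_complete_q);
        set_bit(XPT_DATA, &xprt->sc_xprt.xpt_flags);    /* under the lock */
        spin_unlock_bh(&xprt->sc_rq_dto_lock);
        svc_xprt_enqueue(&xprt->sc_xprt);
}
```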
-
- 03 July 2008, 7 commits
-
By Tom Tucker
Change the WR context pool to be shared across mount points. This significantly reduces the RDMA transport's memory footprint, since idle mounts no longer consume WR context memory.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
-
By Tom Tucker
When adapters have differing IRD limits, the RDMA transport will fail to connect properly. The RDMA transport should use the client's advertised inbound read limit when computing its outbound read limit. For iWARP transports there is currently no standard for exchanging IRD/ORD during connection establishment, so the 'responder_resources' field in the connect event is the local device's limit. The RDMA transport can be configured to use a smaller ORD by writing the desired number to the /proc/sys/sunrpc/svc_rdma/max_outbound_read_requests file.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
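As a rough illustration of the limit computation (the helper name is hypothetical; svcrdma_ord stands for the value written to /proc/sys/sunrpc/svc_rdma/max_outbound_read_requests):

```c
/* Sketch only: clamp the server's outbound read depth to the smaller of
 * the configured maximum and the client's advertised inbound limit. */
static unsigned int svc_rdma_compute_ord(unsigned int svcrdma_ord,
                                         unsigned int client_ird)
{
        return client_ird < svcrdma_ord ? client_ird : svcrdma_ord;
}
```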
-
By Tom Tucker
At the time __svc_rdma_free is called, we are guaranteed that all references to this transport are gone. There is, therefore, no need to protect the resource lists with a spin lock.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
-
By Tom Tucker
Add a DMA map count in order to verify that all DMA mapping resources have been freed when the transport is closed.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
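The accounting amounts to a single atomic counter; a sketch of the idea (the sc_dma_used field name is an assumption here):

```c
/* Bump after every successful ib_dma_map_*() ... */
atomic_inc(&xprt->sc_dma_used);

/* ... drop after every matching ib_dma_unmap_*() ... */
atomic_dec(&xprt->sc_dma_used);

/* ... and verify at transport teardown that every mapping was released. */
WARN_ON(atomic_read(&xprt->sc_dma_used) != 0);
```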
-
By Tom Tucker
Separate DMA unmapping from context destruction and perform the DMA unmapping in the SQ/RQ CQ reap functions. This is necessary to support software-based RDMA implementations that actually copy the data in their ib_dma_unmap callback functions, and architectures that do not have cache-coherent I/O buses.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
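A sketch of what "unmap in the CQ reap path" means in practice; the context fields (count, sge), the DMA direction, and the helper name are illustrative, not the exact patch:

```c
/* For each reaped send completion, undo the DMA mappings recorded in the
 * WR's context before the context itself is recycled. */
static void svc_rdma_reap_send_ctxt(struct svcxprt_rdma *xprt,
                                    struct svc_rdma_op_ctxt *ctxt)
{
        int i;

        for (i = 0; i < ctxt->count; i++)
                ib_dma_unmap_page(xprt->sc_cm_id->device,
                                  ctxt->sge[i].addr,
                                  ctxt->sge[i].length,
                                  DMA_TO_DEVICE);
        svc_rdma_put_context(ctxt, 0);
}
```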
-
By Tom Tucker
Use the new svc_rdma_req_map data type for mapping the client-side memory to the server-side memory. Move the DMA mapping to the context pointed to by each WR individually so that it is unmapped after the WR completes.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
-
By Tom Tucker
Create a new data structure to hold the mapping from the remote client's address space to the local server's address space.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
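The log does not show the structure's layout, so the following is only an illustrative guess at the shape of such a mapping object (name and fields hypothetical): it pairs the client-advertised chunks with the local buffers they land in.

```c
/* Hypothetical sketch of a client-to-server address mapping object. */
struct svc_rdma_req_map_sketch {
        unsigned long count;                    /* number of valid entries  */
        struct kvec sge[RPCSVC_MAXPAGES];       /* local base/len per chunk */
};
```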
-
- 19 May 2008, 17 commits
-
By Tom Tucker
The svc_rdma_send_error function is called when an RPCRDMA protocol error is detected. This function attempts to post an error reply message. Since an error while posting to a transport that is already in error is ignored, change the return type to void.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
-
By Tom Tucker
This race was found by inspection. Messages can be received from the peer immediately following the rdma_accept call; however, the CQs have not yet been armed and the transport address has not yet been set. Set the transport address in the connect request handler and arm the CQs prior to calling rdma_accept.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
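The ordering matters more than the specific calls; a sketch of the connect-request handler with the fix applied (the field names such as sc_sq_cq/sc_rq_cq follow the svcrdma naming style but are assumptions here, and the IPv4 address length is only for illustration):

```c
/* Set the peer address and arm both CQs *before* accepting, so traffic
 * that arrives immediately after rdma_accept() cannot be missed. */
svc_xprt_set_remote(&newxprt->sc_xprt,
                    (struct sockaddr *)&newxprt->sc_cm_id->route.addr.dst_addr,
                    sizeof(struct sockaddr_in));
ib_req_notify_cq(newxprt->sc_sq_cq, IB_CQ_NEXT_COMP);
ib_req_notify_cq(newxprt->sc_rq_cq, IB_CQ_NEXT_COMP);
ret = rdma_accept(newxprt->sc_cm_id, &conn_param);
```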
-
By Tom Tucker
Use the ib_verbs version of the dma_unmap service in the svc_rdma_put_context function. This should support providers using software RDMA.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
-
By Tom Tucker
When the transport is closing, the DTO tasklet may queue data that never gets processed. Clean up the resources associated with this I/O.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
-
By Tom Tucker
Move the destruction of the QP and CM_ID to the free path so that the QP cleanup code does not race with the dto_tasklet handling flushed WRs. The QP reference is not needed because we now have a reference for every WR. Also add a guard in the SQ and RQ completion handlers to ignore calls generated by some providers when the QP is destroyed.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
-
By Tom Tucker
Add a reference on the transport for every outstanding WR.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
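The pattern is symmetric: take a reference before a WR is posted and drop it when the WR's completion is reaped, or immediately if the post fails. A sketch under that assumption (the helper name is hypothetical):

```c
static int svc_rdma_post_with_ref(struct svcxprt_rdma *xprt,
                                  struct ib_send_wr *wr)
{
        struct ib_send_wr *bad_wr;
        int ret;

        svc_xprt_get(&xprt->sc_xprt);           /* pin the transport per WR  */
        ret = ib_post_send(xprt->sc_qp, wr, &bad_wr);
        if (ret)
                svc_xprt_put(&xprt->sc_xprt);   /* post failed, undo the pin */
        return ret;
        /* The matching svc_xprt_put() for a successful post happens when the
         * WR's completion (success or flush) is reaped from the CQ. */
}
```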
-
By Tom Tucker
Some providers may wait while destroying adapter resources. Since it is possible that the last reference is put in the dto_tasklet, the actual destroy must be scheduled as a work item.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
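A sketch of the deferral in the style of the svcrdma code; the sc_work field and the exact teardown calls are assumptions. The point is that the sleeping teardown always runs from a workqueue, never from the tasklet that may drop the last reference:

```c
static void __svc_rdma_free(struct work_struct *work)
{
        struct svcxprt_rdma *rdma =
                container_of(work, struct svcxprt_rdma, sc_work);

        /* Process context: providers that block during teardown are safe. */
        if (rdma->sc_qp && !IS_ERR(rdma->sc_qp))
                ib_destroy_qp(rdma->sc_qp);
        if (rdma->sc_cm_id)
                rdma_destroy_id(rdma->sc_cm_id);
        kfree(rdma);
}

static void svc_rdma_free(struct svc_xprt *xprt)
{
        struct svcxprt_rdma *rdma =
                container_of(xprt, struct svcxprt_rdma, sc_xprt);

        /* May be reached from the dto_tasklet; hand off to a workqueue. */
        INIT_WORK(&rdma->sc_work, __svc_rdma_free);
        schedule_work(&rdma->sc_work);
}
```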
-
By Tom Tucker
The rq_cq_reap function is only called from the dto_tasklet. The only resource shared with other threads is the sc_rq_dto_q. Move the spin lock so that it protects only this list.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
-
By Tom Tucker
Replace the one-off linked-list implementation used for the context cache with the standard Linux list_head lists. Add a context counter to catch resource leaks; a WARN_ON will be added later to ensure that all contexts have been freed.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
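A sketch of what the list_head-based cache looks like, with the counter that the later WARN_ON will check; field names are assumed and the allocate-on-empty path is omitted:

```c
static struct svc_rdma_op_ctxt *svc_rdma_get_context(struct svcxprt_rdma *xprt)
{
        struct svc_rdma_op_ctxt *ctxt = NULL;

        spin_lock_bh(&xprt->sc_ctxt_lock);
        if (!list_empty(&xprt->sc_ctxt_free)) {
                ctxt = list_entry(xprt->sc_ctxt_free.next,
                                  struct svc_rdma_op_ctxt, free_list);
                list_del_init(&ctxt->free_list);
                xprt->sc_ctxt_used++;   /* leak counter checked at free time */
        }
        spin_unlock_bh(&xprt->sc_ctxt_lock);
        return ctxt;
}
```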
-
By Tom Tucker
An NFS_WRITE requires a set of RDMA_READ requests to fetch the write data from the client. There are two principal pieces of data that need to be tracked: the list of pages that comprise the completed RPC, and the SGE of DMA-mapped pages that refers to this list of pages. Previously this was managed as a linked list of contexts, with the context containing the page list buried in that list. This patch simplifies the processing by keeping, instead of a linked list, only a pointer from the last submitted RDMA_READ's context to the context that maps the set of pages describing the RPC. This significantly simplifies the code path. SGE contexts are now cleaned up inline in the DTO path instead of at read-completion time.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
-
By Tom Tucker
The RDMACTXT_F_READ_DONE bit is no longer used. Remove it.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
-
By Tom Tucker
A listening endpoint isn't known to the generic transport switch until the svc_create_xprt function returns without error. Calling svc_xprt_put within the xpo_create function therefore causes the module reference count to be erroneously decremented.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
-
By Tom Tucker
If there is an error posting the recv WR to the RQ, free the context associated with the WR. Previously, a context would be leaked when asynchronous errors occurred on the transport while concurrent threads were processing their RPCs.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
-
By Tom Tucker
The svcrdma transport takes a reference when it gets the ESTABLISHED event from the provider. This reference is supposed to be removed when the DISCONNECT event is received; however, the call to svc_xprt_put was missing from the switch statement. This results in the memory associated with the transport never being freed.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
-
By Tom Tucker
Fix the return value on close to -ENOTCONN so the caller knows to free the context. Also, if a thread is waiting for free SQ space, check for close when waking to avoid posting a WR to a closing transport.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
-
By Tom Tucker
The svc_rdma_send function will attempt to reap SQ WRs to make room for a new request if it finds the SQ full. This function races with the dto_tasklet, which also reaps SQ WRs. To avoid polling and arming the CQ unnecessarily, move the test_and_clear_bit of the RDMAXPRT_SQ_PENDING flag and the arming of the CQ into the sq_cq_reap function. Also refactor the rq_cq_reap function to match sq_cq_reap so that the code is easier to follow.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
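A sketch of the resulting sq_cq_reap shape; RDMAXPRT_SQ_PENDING and the field names come from the commit text, while the body here is simplified:

```c
static void sq_cq_reap(struct svcxprt_rdma *xprt)
{
        struct ib_wc wc;

        /* Only one caller (svc_rdma_send or the dto_tasklet) wins the flag. */
        if (!test_and_clear_bit(RDMAXPRT_SQ_PENDING, &xprt->sc_flags))
                return;

        /* Re-arm before polling so no completion slips through unnotified. */
        ib_req_notify_cq(xprt->sc_sq_cq, IB_CQ_NEXT_COMP);

        while (ib_poll_cq(xprt->sc_sq_cq, 1, &wc) > 0) {
                /* unmap, drop the per-WR transport reference, wake SQ waiters */
        }
}
```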
-
By Tom Tucker
The svcrdma transport provider currently allocates receive buffers for the RQ through the xpo_release_rqst method. This approach is overly complicated, since it means that the rqstp's rq_xprt_ctxt has to be selectively set based on whether the RPC is going to be processed immediately or deferred. Instead, just post the receive buffer in the send_reply function, when we are certain that we are replying.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
-
- 24 April 2008, 1 commit
-
By Tom Tucker
SVCRDMA: Add check for XPT_CLOSE in svc_rdma_send
The svcrdma transport can crash if a send is waiting for an empty SQ slot and the connection is closed due to an asynchronous error. The crash is caused when svc_rdma_send attempts to send on a deleted QP.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
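The guard is a short check before touching the QP; a fragment-style sketch (field names follow the svcrdma style, and the -ENOTCONN return matches the convention noted in the May 2008 commit above):

```c
/* In svc_rdma_send(), before posting: the transport may have been closed
 * by an asynchronous error while this thread waited for an SQ slot. */
if (test_bit(XPT_CLOSE, &xprt->sc_xprt.xpt_flags))
        return -ENOTCONN;
ret = ib_post_send(xprt->sc_qp, wr, &bad_wr);
```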
-
- 13 March 2008, 1 commit
-
By Tom Tucker
RDMA connection shutdown on an SMP machine can cause a kernel crash due to the transport close path racing with the I/O tasklet. Additional transport references were added as follows:
- A reference while the transport is on the DTO queue, to avoid having it deleted while queued for I/O.
- A reference while there is a QP able to generate events.
- A reference until the DISCONNECTED event is received on the CM ID.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 02 February 2008, 1 commit
-
By Tom Tucker
This file implements the core transport data management and I/O path. The I/O path for RDMA involves receiving callbacks in interrupt context. Since all the svc transport locks are _bh locks, we enqueue the transport on a list and schedule a tasklet to dequeue data indications from the RDMA completion queue. The tasklet in turn takes the _bh locks and enqueues the receive data indications on a list for the transport. The svc_rdma_recvfrom transport function then dequeues data from this list in an NFSD thread context.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
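A condensed sketch of that interrupt-to-thread handoff, using the 2008-era tasklet API; the list and field names (dto_xprt_q, sc_dto_q) are illustrative rather than a copy of the file:

```c
static DEFINE_SPINLOCK(dto_lock);       /* protects the global dto list */
static LIST_HEAD(dto_xprt_q);           /* transports with pending CQEs */

/* Softirq context: drain dto_xprt_q; for each transport, move receive
 * completions onto its own list under _bh locks, set XPT_DATA, and call
 * svc_xprt_enqueue() so an nfsd thread runs svc_rdma_recvfrom(). */
static void dto_tasklet_func(unsigned long data)
{
}

static DECLARE_TASKLET(dto_tasklet, dto_tasklet_func, 0UL);

/* Interrupt context: do no real work, just remember the transport and
 * kick the tasklet. */
static void rq_comp_handler(struct ib_cq *cq, void *cq_context)
{
        struct svcxprt_rdma *xprt = cq_context;
        unsigned long flags;

        spin_lock_irqsave(&dto_lock, flags);
        if (list_empty(&xprt->sc_dto_q)) {
                svc_xprt_get(&xprt->sc_xprt);   /* pin while queued */
                list_add_tail(&xprt->sc_dto_q, &dto_xprt_q);
        }
        spin_unlock_irqrestore(&dto_lock, flags);
        tasklet_schedule(&dto_tasklet);
}
```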
-