提交 · 3be7f32878e742cf3c17b435c90e198862457706 · openanolis / cloud-kernel

12 10月, 2015 1 次提交

svcrdma: Fix NFS server crash triggered by 1MB NFS WRITE · 3be7f328

由 Chuck Lever 提交于 10月 12, 2015

Now that the NFS server advertises a maximum payload size of 1MB
for RPC/RDMA again, it crashes in svc_process_common() when NFS
client sends a 1MB NFS WRITE on an NFS/RDMA mount.

The server has set up a 259 element array of struct page pointers
in rq_pages[] for each incoming request. The last element of the
array is NULL.

When an incoming request has been completely received,
rdma_read_complete() attempts to set the starting page of the
incoming page vector:

  rqstp->rq_arg.pages = &rqstp->rq_pages[head->hdr_count];

and the page to use for the reply:

  rqstp->rq_respages = &rqstp->rq_arg.pages[page_no];

But the value of page_no has already accounted for head->hdr_count.
Thus rq_respages now points past the end of the incoming pages.

For NFS WRITE operations smaller than the maximum, this is harmless.
But when the NFS WRITE operation is as large as the server's max
payload size, rq_respages now points at the last entry in rq_pages,
which is NULL.

Fixes: cc9a903d ('svcrdma: Change maximum server payload . . .')
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=270Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSagi Grimberg <sagig@dev.mellanox.co.il>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Reviewed-by: NShirley Ma <shirley.ma@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

3be7f328

30 9月, 2015 1 次提交

svcrdma: handle rdma read with a non-zero initial page offset · c91aed98

由 Steve Wise 提交于 9月 28, 2015

The server rdma_read_chunk_lcl() and rdma_read_chunk_frmr() functions
were not taking into account the initial page_offset when determining
the rdma read length.  This resulted in a read who's starting address
and length exceeded the base/bounds of the frmr.

The server gets an async error from the rdma device and kills the
connection, and the client then reconnects and resends.  This repeats
indefinitely, and the application hangs.

Most work loads don't tickle this bug apparently, but one test hit it
every time: building the linux kernel on a 16 core node with 'make -j
16 O=/mnt/0' where /mnt/0 is a ramdisk mounted via NFSRDMA.

This bug seems to only be tripped with devices having small fastreg page
list depths.  I didn't see it with mlx4, for instance.

Fixes: 0bf48289 ('svcrdma: refactor marshalling logic')
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Tested-by: NChuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

c91aed98

29 8月, 2015 1 次提交

svcrdma: Use max_sge_rd for destination read depths · bc3fe2e3

由 Steve Wise 提交于 7月 27, 2015

Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

bc3fe2e3

05 6月, 2015 1 次提交

svcrdma: Keep rpcrdma_msg fields in network byte-order · 30b7e246

由 Chuck Lever 提交于 6月 04, 2015

Fields in struct rpcrdma_msg are __be32. Don't byte-swap these
fields when decoding RPC calls and then swap them back for the
reply. For the most part, they can be left alone.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

30b7e246

19 5月, 2015 2 次提交

IB/Verbs: Use management helper rdma_cap_read_multi_sge() · bc0f1d71

由 Michael Wang 提交于 5月 05, 2015

Introduce helper rdma_cap_read_multi_sge() to help us check if the port of an
IB device support RDMA Read Multiple Scatter-Gather Entries.
Signed-off-by: NMichael Wang <yun.wang@profitbricks.com>
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Tested-by: NIra Weiny <ira.weiny@intel.com>
Reviewed-by: NSean Hefty <sean.hefty@intel.com>
Reviewed-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: NDoug Ledford <dledford@redhat.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

bc0f1d71

IB/Verbs: Reform IB-ulp xprtrdma · 3de2c31c

由 Michael Wang 提交于 5月 05, 2015

Use raw management helpers to reform IB-ulp xprtrdma.
Signed-off-by: NMichael Wang <yun.wang@profitbricks.com>
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Tested-by: NIra Weiny <ira.weiny@intel.com>
Reviewed-by: NSean Hefty <sean.hefty@intel.com>
Reviewed-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: NDoug Ledford <dledford@redhat.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

3de2c31c

16 1月, 2015 8 次提交

svcrdma: Handle additional inline content · a97c331f

由 Chuck Lever 提交于 1月 13, 2015

Most NFS RPCs place their large payload argument at the end of the
RPC header (eg, NFSv3 WRITE). For NFSv3 WRITE and SYMLINK, RPC/RDMA
sends the complete RPC header inline, and the payload argument in
the read list. Data in the read list is the last part of the XDR
stream.

One important case is not like this, however. NFSv4 COMPOUND is a
counted array of operations. A WRITE operation, with its large data
payload, can appear in the middle of the compound's operations
array. Thus NFSv4 WRITE compounds can have header content after the
WRITE payload.

The Linux client, for example, performs an NFSv4 WRITE like this:

  { PUTFH, WRITE, GETATTR }

Though RFC 5667 is not precise about this, the proper way to convey
this compound is to place the GETATTR inline, _after_ the front of
the RPC header. The receiver inserts the read list payload into the
XDR stream after the initial WRITE arguments, and before the GETATTR
operation, thanks to the value of the read list "position" field.

The Linux client currently sends the GETATTR at the end of the
RPC/RDMA read list, which is incorrect. It will be corrected in the
future.

The Linux server currently rejects NFSv4 compounds with inline
content after the read list. For the above NFSv4 WRITE compound, the
NFS compound header indicates there are three operations, but the
server finds nonsense when it looks in the XDR stream for the third
operation, and the compound fails with OP_ILLEGAL.

Move trailing inline content to the end of the XDR buffer's page
list. This presents incoming NFSv4 WRITE compounds to NFSD in the
same way the socket transport does.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

a97c331f

svcrdma: Move read list XDR round-up logic · fcbeced5

由 Chuck Lever 提交于 1月 13, 2015

This is a pre-requisite for a subsequent patch.

Read list XDR round-up needs to be done _before_ additional inline
content is copied to the end of the XDR buffer's page list. Move
the logic added by commit e560e3b5 ("svcrdma: Add zero padding
if the client doesn't send it").
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

fcbeced5

svcrdma: Support RDMA_NOMSG requests · 0b056c22

由 Chuck Lever 提交于 1月 13, 2015

Currently the Linux server can not decode RDMA_NOMSG type requests.
Operations whose length exceeds the fixed size of RDMA SEND buffers,
like large NFSv4 CREATE(NF4LNK) operations, must be conveyed via
RDMA_NOMSG.

For an RDMA_MSG type request, the client sends the RPC/RDMA, RPC
headers, and some or all of the NFS arguments via RDMA SEND.

For an RDMA_NOMSG type request, the client sends just the RPC/RDMA
header via RDMA SEND. The request's read list contains elements for
the entire RPC message, including the RPC header.

NFSD expects the RPC/RMDA header and RPC header to be contiguous in
page zero of the XDR buffer. Add logic in the RDMA READ path to make
the read list contents land where the server prefers, when the
incoming message is a type RDMA_NOMSG message.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

0b056c22

svcrdma: rc_position sanity checking · 61edbcb7

由 Chuck Lever 提交于 1月 13, 2015

An RPC/RDMA client may send large RPC arguments via a read
list. This is a list of scatter/gather elements which convey
RPC call arguments too large to fit in a small RDMA SEND.

Each entry in the read list has a "position" field, whose value is
the byte offset in the XDR stream where the data in that entry is to
be inserted. Entries which share the same "position" value make up
the same RPC argument. The receiver inserts entries with the same
position field value in list order into the XDR stream.

Currently the Linux NFS/RDMA server cannot handle receiving read
chunks in more than one position, mostly because no current client
sends read lists with elements in more than one position. As a
sanity check, ensure that all received chunks have the same
"rc_position."
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

61edbcb7

svcrdma: Plant reader function in struct svcxprt_rdma · e5452411

由 Chuck Lever 提交于 1月 13, 2015

The RDMA reader function doesn't change once an svcxprt_rdma is
instantiated. Instead of checking sc_devcap during every incoming
RPC, set the reader function once when the connection is accepted.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

e5452411

svcrdma: Scrub BUG_ON() and WARN_ON() call sites · 3fe04ee9

由 Chuck Lever 提交于 1月 13, 2015

Current convention is to avoid using BUG_ON() in places where an
oops could cause complete system failure.

Replace BUG_ON() call sites in svcrdma with an assertion error
message and allow execution to continue safely.

Some BUG_ON() calls are removed because they have never fired in
production (that we are aware of).

Some WARN_ON() calls are also replaced where a back trace is not
helpful; e.g., in a workqueue task.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

3fe04ee9

svcrdma: Clean up read chunk counting · 2397aa8b

由 Chuck Lever 提交于 1月 13, 2015

The byte_count argument is not used, and the function is called
only from one place.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

2397aa8b

svcrdma: Clean up dprintk · 597561bf

由 Chuck Lever 提交于 1月 13, 2015

Nit: Fix inconsistent white space in dprintk messages.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

597561bf

23 7月, 2014 1 次提交

svcrdma: Add zero padding if the client doesn't send it · e560e3b5

由 Chuck Lever 提交于 7月 22, 2014

See RFC 5666 section 3.7: clients don't have to send zero XDR
padding.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=246Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

e560e3b5

07 6月, 2014 2 次提交

svcrdma: Fence LOCAL_INV work requests · 83710fc7

由 Steve Wise 提交于 6月 05, 2014

Fencing forces the invalidate to only happen after all prior send
work requests have been completed.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Reported by : Devesh Sharma <Devesh.Sharma@Emulex.Com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

83710fc7

svcrdma: refactor marshalling logic · 0bf48289

由 Steve Wise 提交于 5月 28, 2014

This patch refactors the NFSRDMA server marshalling logic to
remove the intermediary map structures.  It also fixes an existing bug
where the NFSRDMA server was not minding the device fast register page
list length limitations.
Signed-off-by: NTom Tucker <tom@opengridcomputing.com>
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>

0bf48289

29 3月, 2014 1 次提交

Fix regression in NFSRDMA server · 7e4359e2

由 Tom Tucker 提交于 3月 25, 2014

The server regression was caused by the addition of rq_next_page
(afc59400). There were a few places that
were missed with the update of the rq_respages array.
Signed-off-by: NTom Tucker <tom@ogc.us>
Tested-by: NSteve Wise <swise@ogc.us>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

7e4359e2

18 12月, 2012 1 次提交

nfsd4: cleanup: replace rq_resused count by rq_next_page pointer · afc59400

由 J. Bruce Fields 提交于 12月 10, 2012

It may be a matter of personal taste, but I find this makes the code
clearer.
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

afc59400

18 2月, 2012 1 次提交

svcrdma: Cleanup sparse warnings in the svcrdma module · cec56c8f

由 Tom Tucker 提交于 2月 15, 2012

The svcrdma transport was un-marshalling requests in-place. This resulted
in sparse warnings due to __beXX data containing both NBO and HBO data.

The code has been restructured to do byte-swapping as the header is
parsed instead of when the header is validated immediately after receipt.

Also moved extern declarations for the workqueue and memory pools to the
private header file.
Signed-off-by: NTom Tucker <tom@ogc.us>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

cec56c8f

19 10月, 2010 2 次提交

svcrdma: Cleanup DMA unmapping in error paths. · 4a84386f

由 Tom Tucker 提交于 10月 12, 2010

There are several error paths in the code that do not unmap DMA. This
patch adds calls to svc_rdma_unmap_dma to free these DMA contexts.
Signed-off-by: NTom Tucker <tom@opengridcomputing.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

4a84386f

svcrdma: Change DMA mapping logic to avoid the page_address kernel API · b432e6b3

由 Tom Tucker 提交于 10月 12, 2010

There was logic in the send path that assumed that a page containing data
to send to the client has a KVA. This is not always the case and can result
in data corruption when page_address returns zero and we end up DMA mapping
zero.

This patch changes the bus mapping logic to avoid page_address() where
necessary and converts all calls from ib_dma_map_single to ib_dma_map_page
in order to keep the map/unmap calls symmetric.
Signed-off-by: NTom Tucker <tom@ogc.us>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

b432e6b3

03 5月, 2010 1 次提交

sunrpc: centralise most calls to svc_xprt_received · b48fa6b9

由 Neil Brown 提交于 3月 01, 2010

svc_xprt_received must be called when ->xpo_recvfrom has finished
receiving a message, so that the XPT_BUSY flag will be cleared and
if necessary, requeued for further work.

This call is currently made in each ->xpo_recvfrom function, often
from multiple different points.  In each case it is the earliest point
on a particular path where it is known that the protection provided by
XPT_BUSY is no longer needed.

However there are (still) some error paths which do not call
svc_xprt_received, and requiring each ->xpo_recvfrom to make the call
does not encourage robustness.

So: move the svc_xprt_received call to be made just after the
call to ->xpo_recvfrom(), and move it of the various ->xpo_recvfrom
methods.

This means that it may not be called at the earliest possible instant,
but this is unlikely to be a measurable performance issue.

Note that there are still other calls to svc_xprt_received as it is
also needed when an xprt is newly created.
Signed-off-by: NNeilBrown <neilb@suse.de>
Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>

b48fa6b9

30 11月, 2009 1 次提交

net: Move && and || to end of previous line · f64f9e71

由 Joe Perches 提交于 11月 29, 2009

Not including net/atm/

Compiled tested x86 allyesconfig only
Added a > 80 column line or two, which I ignored.
Existing checkpatch plaints willfully, cheerfully ignored.
Signed-off-by: NJoe Perches <joe@perches.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f64f9e71

16 6月, 2009 1 次提交

sunrpc: potential memory leak in function rdma_read_xdr · 59fb3066

由 Christian Engelmayer 提交于 6月 14, 2009

In case the check on ch_count fails the cleanup path is skipped and the
previously allocated memory 'rpl_map', 'chl_map' is not freed.

Reported by Coverity.
Signed-off-by: NChristian Engelmayer <christian.engelmayer@frequentis.com>
Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>

59fb3066

26 4月, 2009 1 次提交

svcrdma: Fix dma map direction for rdma read targets · d0687be7

由 Steve Wise 提交于 4月 03, 2009

The nfs server rdma transport was mapping rdma read target pages for
TO_DEVICE instead of FROM_DEVICE.  This causes data corruption on non
cache-coherent systems if frmrs are used.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>

d0687be7

15 12月, 2008 1 次提交

rpc/rdma: goto instead of copypaste · b1721d2b

由 Ilpo Järvinen 提交于 12月 14, 2008

Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b1721d2b

07 10月, 2008 1 次提交

svcrdma: Modify the RPC recv path to use FRMR when available · 146b6df6

由 Tom Tucker 提交于 8月 12, 2008

RPCRDMA requests that specify a read-list are fetched with RDMA_READ. Using
an FRMR to map the data sink improves NFSRDMA security on transports that
place the RDMA_READ data sink LKEY on the wire because the valid lifetime
of the MR is only the duration of the RDMA_READ. The LKEY is invalidated
when the last RDMA_READ WR completes.

Mapping the data sink also allows for very large amounts to data to be
fetched with a single WR, so if the client is also using FRMR, the entire
RPC read-list can be fetched with a single WR.
Signed-off-by: NTom Tucker <tom@opengridcomputing.com>

146b6df6

14 8月, 2008 1 次提交

svcrdma: Fix race between svc_rdma_recvfrom thread and the dto_tasklet · 24b8b447

由 Tom Tucker 提交于 8月 13, 2008

RDMA_READ completions are kept on a separate queue from the general
I/O request queue. Since a separate lock is used to protect the RDMA_READ
completion queue, a race exists between the dto_tasklet and the
svc_rdma_recvfrom thread where the dto_tasklet sets the XPT_DATA
bit and adds I/O to the read-completion queue. Concurrently, the
recvfrom thread checks the generic queue, finds it empty and resets
the XPT_DATA bit. A subsequent svc_xprt_enqueue will fail to enqueue
the transport for I/O and cause the transport to "stall".

The fix is to protect both lists with the same lock and set the XPT_DATA
bit with this lock held.
Signed-off-by: NTom Tucker <tom@opengridcomputing.com>
Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>

24b8b447

03 7月, 2008 2 次提交

svcrdma: Add dma map count and WARN_ON · 87295b6c

由 Tom Tucker 提交于 5月 28, 2008

Add a dma map count in order to verify that all DMA mapping resources
have been freed when the transport is closed.
Signed-off-by: NTom Tucker <tom@opengridcomputing.com>

87295b6c

svcrdma: Use reply and chunk map for RDMA_READ processing · f820c57e

由 Tom Tucker 提交于 5月 27, 2008

Modify the RDMA_READ processing to use the reply and chunk list mapping data
types. Also add a special purpose 'hdr_count' field in in the context to hold
the header page count instead of overloading the SGE length field and
corrupting the DMA map length.
Signed-off-by: NTom Tucker <tom@opengridcomputing.com>

f820c57e

19 5月, 2008 7 次提交

svcrdma: Verify read-list fits within RPCSVC_MAXPAGES · a6f911c0

由 Tom Tucker 提交于 5月 13, 2008

A RDMA read-list cannot contain more elements than RPCSVC_MAXPAGES or
it will overflow the DTO context. Verify this when processing the
protocol header.
Signed-off-by: NTom Tucker <tom@opengridcomputing.com>

a6f911c0

svcrdma: Change svc_rdma_send_error return type to void · 008fdbc5

由 Tom Tucker 提交于 5月 07, 2008

The svc_rdma_send_error function is called when an RPCRDMA protocol
error is detected. This function attempts to post an error reply message.
Since an error posting to a transport in error is ignored, change
the return type to void.
Signed-off-by: NTom Tucker <tom@opengridcomputing.com>

008fdbc5

svcrdma: Set rqstp transport address in rdma_read_complete function · 69500c43

由 Tom Tucker 提交于 5月 07, 2008

The rdma_read_complete function needs to copy the rqstp transport address
from the transport. Failure to do so can result in using the wrong
authentication method for the RPC or bug checking if the rqstp address
is not valid.
Signed-off-by: NTom Tucker <tom@opengridcomputing.com>

69500c43

svcrdma: Simplify RDMA_READ deferral buffer management · 02e7452d

由 Tom Tucker 提交于 4月 30, 2008

An NFS_WRITE requires a set of RDMA_READ requests to fetch the write
data from the client. There are two principal pieces of data that
need to be tracked: the list of pages that comprise the completed RPC
and the SGE of dma mapped pages to refer to this list of pages. Previously
this whole bit was managed as a linked list of contexts with the
context containing the page list buried in this list. This patch
simplifies this processing by not keeping a linked list, but rather only
a pionter from the last submitted RDMA_READ's context to the context
that maps the set of pages that describe the RPC. This significantly
simplifies this code path. SGE contexts are cleaned up inline in the DTO
path instead of at read completion time.
Signed-off-by: NTom Tucker <tom@opengridcomputing.com>

02e7452d

svcrdma: Remove unused READ_DONE context flags bit · 10a38c33

由 Tom Tucker 提交于 4月 30, 2008

The RDMACTXT_F_READ_DONE bit is not longer used. Remove it.
Signed-off-by: NTom Tucker <tom@opengridcomputing.com>

10a38c33

svcrdma: Return error from rdma_read_xdr so caller knows to free context · d16d4009

由 Tom Tucker 提交于 5月 06, 2008

The rdma_read_xdr function did not discriminate between no read-list and
an error posting the read-list. This results in a leak of a page if there
is an error posting the read-list.
Signed-off-by: NTom Tucker <tom@opengridcomputing.com>

d16d4009

svcrdma: Simplify receive buffer posting · 0e7f011a

由 Tom Tucker 提交于 4月 23, 2008

The svcrdma transport provider currently allocates receive buffers
to the RQ through the xpo_release_rqst method. This approach is overly
complicated since it means that the rqstp rq_xprt_ctxt has to be
selectively set based on whether the RPC is going to be processed
immediately or deferred. Instead, just post the receive buffer when
we are certain that we are replying in the send_reply function.
Signed-off-by: NTom Tucker <tom@opengridcomputing.com>

0e7f011a

27 3月, 2008 1 次提交

SVCRDMA: Check num_sge when setting LAST_CTXT bit · c8237a5f

由 Tom Tucker 提交于 3月 25, 2008

The RDMACTXT_F_LAST_CTXT bit was getting set incorrectly
when the last chunk in the read-list spanned multiple pages. This
resulted in a kernel panic when the wrong context was used to
build the RPC iovec page list.

RDMA_READ is used to fetch RPC data from the client for
NFS_WRITE requests. A scatter-gather is used to map the
advertised client side buffer to the server-side iovec and
associated page list.

WR contexts are used to convey which scatter-gather entries are
handled by each WR. When the write data is large, a single RPC may
require multiple RDMA_READ requests so the contexts for a single RPC
are chained together in a linked list. The last context in this list
is marked with a bit RDMACTXT_F_LAST_CTXT so that when this WR completes,
the CQ handler code can enqueue the RPC for processing.

The code in rdma_read_xdr was setting this bit on the last two
contexts on this list when the last read-list chunk spanned multiple
pages. This caused the svc_rdma_recvfrom logic to incorrectly build
the RPC and caused the kernel to crash because the second-to-last
context doesn't contain the iovec page list.

Modified the condition that sets this bit so that it correctly detects
the last context for the RPC.
Signed-off-by: NTom Tucker <tom@opengridcomputing.com>
Tested-by: NRoland Dreier <rolandd@cisco.com>
Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c8237a5f

25 3月, 2008 1 次提交

SVCRDMA: Use only 1 RDMA read scatter entry for iWARP adapters · d3073779

由 Roland Dreier 提交于 3月 24, 2008

The iWARP protocol limits RDMA read requests to a single scatter
entry.  NFS/RDMA has code in rdma_read_max_sge() that is supposed to
limit the sge_count for RDMA read requests to 1, but the code to do
that is inside an #ifdef RDMA_TRANSPORT_IWARP block.  In the mainline
kernel at least, RDMA_TRANSPORT_IWARP is an enum and not a
preprocessor #define, so the #ifdef'ed code is never compiled.

In my test of a kernel build with -j8 on an NFS/RDMA mount, this
problem eventually leads to trouble starting with:

    svcrdma: Error posting send = -22
    svcrdma : RDMA_READ error = -22

and things go downhill from there.

The trivial fix is to delete the #ifdef guard.  The check seems to be
a remnant of when the NFS/RDMA code was not merged and needed to
compile against multiple kernel versions, although I don't think it
ever worked as intended.  In any case now that the code is upstream
there's no need to test whether the RDMA_TRANSPORT_IWARP constant is
defined or not.

Without this patch, my kernel build on an NFS/RDMA mount using NetEffect
adapters quickly and 100% reproducibly failed with an error like:

    ld: final link failed: Software caused connection abort

With the patch applied I was able to complete a kernel build on the
same setup.

(Tom Tucker says this is "actually an _ancient_ remnant when it had to
compile against iWARP vs. non-iWARP enabled OFA trees.")
Signed-off-by: NRoland Dreier <rolandd@cisco.com>
Acked-by: NTom Tucker <tom@opengridcomputing.com>
Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d3073779

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功