Commits · 15ae1375ea91ae2dee6f12d71a79d8c0a10a30bf · openeuler / Kernel

17 Jun, 2021 1 commit

RDMA/rxe: Fix qp reference counting for atomic ops · 15ae1375

Committed by Bob Pearson on Jun 04, 2021

Currently the rdma_rxe driver attempts to protect atomic responder
resources by taking a reference to the qp which is only freed when the
resource is recycled for a new read or atomic operation. This means that
in normal circumstances there is almost always an extra qp reference once
an atomic operation has been executed which prevents cleaning up the qp
and associated pd and cqs when the qp is destroyed.

This patch removes the call to rxe_add_ref() in send_atomic_ack() and the
call to rxe_drop_ref() in free_rd_atomic_resource(). If the qp is
destroyed while a peer is retrying an atomic op it will cause the
operation to fail which is acceptable.

Link: https://lore.kernel.org/r/20210604230558.4812-1-rpearsonhpe@gmail.com
Reported-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Fixes: 86af6176 ("IB/rxe: remove unnecessary skb_clone")
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

15ae1375

04 Jun, 2021 1 commit

RDMA/rxe: Protext kernel index from user space · 5bcf5a59

Committed by Bob Pearson on May 27, 2021

In order to prevent user space from modifying the index that belongs to
the kernel for shared queues, let the kernel use a local copy of the index
and copy any new values of that index to the shared rxe_queue_buf struct.

This adds more switch statements which decreases the performance of the
queue API. Move the type into the parameter list for these functions so
that the compiler can optimize out the switch statements when the explicit
type is known. Modify all the calls in the driver on performance paths to
pass in the explicit queue type.
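
The idea can be illustrated with a minimal user-space sketch. The struct and function names below (`shared_buf`, `queue`, `advance_producer`) are hypothetical simplifications chosen for illustration and are not the actual rxe definitions; only the three queue-type names mirror the concept described above.

```c
#include <stdint.h>

/* Hypothetical, simplified model of a shared ring; not the rxe structs. */
enum queue_type { QUEUE_TYPE_KERNEL, QUEUE_TYPE_TO_USER, QUEUE_TYPE_FROM_USER };

struct shared_buf {             /* stands in for memory mapped into user space */
	uint32_t producer_index;
	uint32_t consumer_index;
};

struct queue {
	struct shared_buf *buf;
	uint32_t index;             /* kernel-private copy of the kernel-owned index */
	uint32_t index_mask;        /* ring size minus one; ring size is a power of two */
};

/*
 * Advance the producer index. The queue type is a parameter, so when a
 * caller passes a constant the compiler can drop the untaken branches.
 */
static inline void advance_producer(struct queue *q, enum queue_type type)
{
	switch (type) {
	case QUEUE_TYPE_TO_USER:
		/* Kernel produces: trust only the private copy, then publish it. */
		q->index = (q->index + 1) & q->index_mask;
		q->buf->producer_index = q->index;
		break;
	case QUEUE_TYPE_FROM_USER:
		/* User space produces; the kernel does not advance this index. */
		break;
	case QUEUE_TYPE_KERNEL:
		/* Not shared, so the index in the buffer can be used directly. */
		q->buf->producer_index =
			(q->buf->producer_index + 1) & q->index_mask;
		break;
	}
}
```

In this sketch, a value written into `buf->producer_index` by user space is simply overwritten on the next advance, because the kernel never reads its own index back from the shared area.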

Link: https://lore.kernel.org/r/20210527194748.662636-4-rpearsonhpe@gmail.com
Link: https://lore.kernel.org/linux-rdma/20210526165239.GP1002214@@nvidia.com/
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

5bcf5a59

09 Apr, 2021 1 commit

RDMA/rxe: Fix missing acks from responder · ea492251

Committed by Bob Pearson on Apr 01, 2021

All responder errors from request packets that do not consume a receive
WQE fail to generate acks for RC QPs. This patch corrects this behavior
by making the flow follow the same path as request packets that do consume
a WQE after the completion.

Link: https://lore.kernel.org/r/20210402001016.3210-1-rpearson@hpe.com
Link: https://lore.kernel.org/linux-rdma/1a7286ac-bcea-40fb-2267-480134dd301b@gmail.com/
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

ea492251

31 Mar, 2021 1 commit

RDMA/rxe: Split MEM into MR and MW · 364e282c

Committed by Bob Pearson on Mar 25, 2021

In the original rxe implementation it was intended to use a common object
to represent MRs and MWs but they are different enough to separate these
into two objects.

This allows replacing the mem name with mr for MRs which is more
consistent with the style for the other objects and less likely to be
confusing. This is a long patch that mostly changes mem to mr where it
makes sense and adds a new rxe_mw struct.

Link: https://lore.kernel.org/r/20210325212425.2792-1-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

364e282c

17 Feb, 2021 1 commit

RDMA/rxe: Remove unused pkt->offset · bf139b58

Committed by Bob Pearson on Feb 11, 2021

The pkt->offset field is never used except to assign it to 0. But it adds
lots of unneeded code. This patch removes the field and related code. This
causes a measurable improvement in performance.

Link: https://lore.kernel.org/r/20210211210455.3274-1-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

bf139b58

09 Feb, 2021 1 commit

RDMA/rxe: Fix FIXME in rxe_udp_encap_recv() · 899aba89

Committed by Bob Pearson on Jan 28, 2021

rxe_udp_encap_recv() drops the reference to rxe->ib_dev taken by
rxe_get_dev_from_net() which should be held until each received skb is
freed. This patch moves the calls to ib_device_put() to each place a
received skb is freed. It also takes references to the ib_device for each
cloned skb created to process received multicast packets.

Fixes: 4c173f59 ("RDMA/rxe: Use ib_device_get_by_netdev() instead of open coding")
Link: https://lore.kernel.org/r/20210128233318.2591-1-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

899aba89

21 Jan, 2021 1 commit

Revert "RDMA/rxe: Remove VLAN code leftovers from RXE" · f1b0a8ea

Committed by Martin Wilck on Jan 20, 2021

This reverts commit b2d24404.

It's true that creating rxe on top of 802.1q interfaces doesn't work.
Thus, commit fd49ddaf ("RDMA/rxe: prevent rxe creation on top of vlan
interface") was absolutely correct.

But b2d24404 incorrectly assumed that, with this change, RDMA and
VLAN don't work together at all. It just has to be set up
differently. Rather than creating rxe on top of the VLAN interface, rxe
must be created on top of the physical interface.  RDMA then works just
fine through VLAN interfaces on top of that physical interface, via the
"upper device" logic.

This is hard to see in the rxe logic because it never talks about vlan,
but instead rxe carefully selects upper vlan netdevices when working with
packets which in turn imply certain vlan tagging. This is all done
correctly and interacts with the gid table with VLAN support the same as
real HW does.

b2d24404 broke this setup deliberately and should thus be
reverted. Also, b2d24404 removed rxe_dma_device(), so adapt the revert
to discard that hunk.

Fixes: b2d24404 ("RDMA/rxe: Remove VLAN code leftovers from RXE")
Link: https://lore.kernel.org/r/20210120161913.7347-1-mwilck@suse.com
Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

f1b0a8ea

12 Nov, 2020 1 commit

RDMA/rxe: Remove VLAN code leftovers from RXE · b2d24404

Committed by Zhu Yanjun on Nov 02, 2020

Since commit fd49ddaf ("RDMA/rxe: prevent rxe creation on top of
vlan interface") does not permit rxe on top of a vlan device, all the
vlan-related code should be removed.

Fixes: fd49ddaf ("RDMA/rxe: prevent rxe creation on top of vlan interface")
Link: https://lore.kernel.org/r/1604326422-18625-1-git-send-email-yanjunz@nvidia.com
Signed-off-by: Zhu Yanjun <yanjunz@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

b2d24404

31 Aug, 2020 1 commit

RDMA/rxe: Add SPDX hdrs to rxe source files · 63fa15db

Committed by Bob Pearson on Aug 27, 2020

Add SPDX headers to all rxe .c and .h files.

Link: https://lore.kernel.org/r/20200827145439.2273-1-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

63fa15db

10 Dec, 2019 1 commit

rxe: correctly calculate iCRC for unaligned payloads · 2030abdd

Committed by Steve Wise on Dec 02, 2019

If RoCE PDUs being sent or received contain pad bytes, then the iCRC
is miscalculated, resulting in PDUs being emitted by RXE with an incorrect
iCRC, as well as ingress PDUs being dropped due to erroneously detecting
a bad iCRC in the PDU. The fix is to include the pad bytes, if any,
in iCRC computations.

Note: This bug has caused broken on-the-wire compatibility with actual
hardware RoCE devices since the soft-RoCE driver was first put into the
mainstream kernel. Fixing it will create an incompatibility with the
original soft-RoCE devices, but is necessary to be compatible with real
hardware devices.
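
A minimal sketch of the pad handling, assuming zero-filled pad bytes and a plain bitwise CRC-32. The helper names are hypothetical, and the real iCRC also covers (partly masked) packet headers, which this example leaves out; it only shows that the checksum must run over the payload and then over the bytes that pad it to a 4-byte boundary.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 update (reflected polynomial 0xEDB88320), no final XOR. */
static uint32_t crc32_update(uint32_t crc, const uint8_t *data, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		crc ^= data[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Number of pad bytes needed to round the payload up to a multiple of 4. */
static size_t pad_count(size_t payload_len)
{
	return (0 - payload_len) & 3;
}

/* Hypothetical helper: checksum the payload and then its pad bytes. */
static uint32_t payload_crc_with_pad(const uint8_t *payload, size_t len)
{
	static const uint8_t pad[3] = { 0, 0, 0 };  /* assumed zero-filled */
	uint32_t crc = crc32_update(0xFFFFFFFFu, payload, len);

	crc = crc32_update(crc, pad, pad_count(len));
	return ~crc;
}
```

For a payload whose length is already a multiple of four, `pad_count()` returns 0 and the result is the same as a checksum over the payload alone, which is why only unaligned payloads were affected.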

Fixes: 8700e3e7 ("Soft RoCE driver")
Signed-off-by: Steve Wise <larrystevenwise@gmail.com>
Link: https://lore.kernel.org/r/20191203020319.15036-2-larrystevenwise@gmail.com
Signed-off-by: Doug Ledford <dledford@redhat.com>

2030abdd

09 Jul, 2019 1 commit

RDMA/rxe: Fill in wc byte_len with IB_WC_RECV_RDMA_WITH_IMM · bdce1290

Committed by Konstantin Taranov on Jun 27, 2019

Calculate the correct byte_len on the receiving side when a work
completion is generated with IB_WC_RECV_RDMA_WITH_IMM opcode.

According to the IBA byte_len must indicate the number of written bytes,
whereas it was always equal to zero for the IB_WC_RECV_RDMA_WITH_IMM
opcode, even though data was transferred.

Fixes: 8700e3e7 ("Soft RoCE driver")
Signed-off-by: Konstantin Taranov <konstantin.taranov@inf.ethz.ch>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

bdce1290

22 Jan, 2019 1 commit

IB/rxe: Remove unnecessary rxe variable · 9802c335

Committed by Zhu Yanjun on Jan 20, 2019

The variable rxe in the function is not used, so it is removed.

Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

9802c335

09 Nov, 2018 2 commits

IB/rxe: move the variable into the function that uses it · a854b1e8

Committed by Zhu Yanjun on Nov 03, 2018

The variable rxe is only used in the function rxe_xmit_packet, and the
caller functions do not use it. So move this variable into the function
rxe_xmit_packet.

Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

a854b1e8

RDMA/rxe: Add link_down, rdma_sends, rdma_recvs stats counters · 6e5559b2

Committed by Andrew Boyer on Nov 01, 2018

link_down is self-explanatory.

rdma_sends and rdma_recvs count the number of RDMA Send and RDMA Receive
operations completed successfully. This is different from the existing
sent_pkts and rcvd_pkts counters because the existing counters measure
packets, not RDMA operations.

ack_deffered is renamed to ack_deferred to fix the spelling.

out_of_sequence is renamed to out_of_seq_request to make clear that it is
counting only requests and not other packets which can be out of sequence.

Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

6e5559b2

07 Nov, 2018 2 commits

rxe: fix error completion wr_id and qp_num · e48d8ed9

Committed by Sagi Grimberg on Oct 25, 2018

Error completions must still contain a valid wr_id and
qp_num that the consumer can rely on. Correctly
fill these fields in receive error completions.

Reported-by: Walker Benjamin <benjamin.walker@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Tested-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

e48d8ed9

IB/rxe: clean skb queue directly · 4e588c8d

Committed by Zhu Yanjun on Oct 19, 2018

When resp is in error state, the queued SKBs will not be handled.
The function get_req cleans up the skb queue directly.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

4e588c8d

31 Aug, 2018 2 commits

IB/rxe: fix for duplicate request processing and ack psns · b97db585

Committed by Vijay Immanuel on Jun 12, 2018

Don't reset the resp opcode for a replayed read response.
The resp opcode could be in the middle of a write or send
sequence, when the duplicate read request was received.
An example sequence is as follows:
- Receive read request for 12KB PSN 20. Transmit read response
  first, middle and last with PSNs 20,21,22.
- Receive write first PSN 23.
  At this point the resp psn is 24 and resp opcode is write first.
- The sender notices that PSN 20 is dropped and retransmits.
  Receive read request for 12KB PSN 20. Transmit read response
  first, middle and last with PSNs 20,21,22. The resp opcode is
  set to -1, the resp psn remains 24.
- Receive write first PSN 23. This is processed by duplicate_request().
  The resp opcode remains -1 and resp psn remains 24.
- Receive write middle PSN 24. check_op_seq() reports a missing
  first error since the resp opcode is -1.

When sending an ack for a duplicate send or write request,
use the psn of the previous ack sent. Do not use the psn
of a read response for the ack.
An example sequence is as follows:
- Receive write PSN 30. Transmit ACK for PSN 30.
- Receive read request 4KB PSN 31. Transmit read response with
  PSN 31. The resp psn is now 32.
- The sender notices that PSN 30 is dropped and retransmits.
  Receive write PSN 30. duplicate_request() sends an ACK with
  PSN 31. That is incorrect since PSN 31 was a read request.
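
The second example sequence can be modelled with a small sketch that tracks the PSN of the last ACK separately from the next expected PSN. The struct and function names here are hypothetical and only illustrate the bookkeeping; they are not the driver's code.

```c
#include <stdint.h>

#define PSN_MASK 0x00FFFFFFu    /* PSNs are 24-bit values */

/* Hypothetical responder state for this illustration only. */
struct resp_state {
	uint32_t psn;       /* next expected request PSN */
	uint32_t ack_psn;   /* PSN carried by the last ACK that was sent */
};

/* New write/send request: ACK it and remember which PSN was ACKed. */
static uint32_t on_write(struct resp_state *r, uint32_t psn)
{
	r->ack_psn = psn & PSN_MASK;
	r->psn = (psn + 1) & PSN_MASK;
	return r->ack_psn;
}

/* New read request answered with npkts response packets; no ACK is sent. */
static void on_read(struct resp_state *r, uint32_t psn, uint32_t npkts)
{
	r->psn = (psn + npkts) & PSN_MASK;  /* ack_psn is left untouched */
}

/* Duplicate write/send: re-ACK with the PSN of the previous ACK. */
static uint32_t on_duplicate_write(const struct resp_state *r)
{
	return r->ack_psn;
}
```

Replaying the sequence (write PSN 30, read PSN 31, duplicate write PSN 30) through these helpers yields an ACK for PSN 30 on the duplicate, while the next expected PSN stays at 32.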

Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

b97db585

IB/rxe: Simplify rxe_find_route() to avoid GID query for netdev · 3db2bceb

Committed by Parav Pandit on Aug 28, 2018

rxe_prepare() is called on an skb which has ndev already initialized by
rxe_init_packet(). Therefore avoid querying the GID attribute again and
use the available netdevice from skb->dev.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Tested-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

3db2bceb

19 Jun, 2018 1 commit

IB/rxe: support for 802.1q VLAN on the listener · 92cf36ee

Committed by Vijay Immanuel on Jun 12, 2018

Set the vlan flag and vlan_id field in the wc for rdma_listen()
to work over VLAN. This is required by ib_init_ah_attr_from_wc()
which is called by the CM REQ handler.

Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com>
Reviewed-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

92cf36ee

28 Apr, 2018 2 commits

IB/rxe: avoid double kfree_skb · 9fd4350b

Committed by Zhu Yanjun on Apr 26, 2018

When an skb is sent, it passes through the following functions in Soft RoCE:

rxe_send [rdma_rxe]
    ip_local_out
        __ip_local_out
        ip_output
            ip_finish_output
                ip_finish_output2
                    dev_queue_xmit
                        __dev_queue_xmit
                            dev_hard_start_xmit

If an error occurs in any of the above functions, or if iptables rules
drop the skb after ip_local_out, kfree_skb will be called. So the Soft RoCE
module must not call kfree_skb again; otherwise a crash will occur.

The steps to reproduce:

     server                       client
    ---------                    ---------
    |1.1.1.1|<----rxe-channel--->|1.1.1.2|
    ---------                    ---------

On server: rping -s -a 1.1.1.1 -v -C 10000 -S 512
On client: rping -c -a 1.1.1.1 -v -C 10000 -S 512

The kernel configs CONFIG_DEBUG_KMEMLEAK and
CONFIG_DEBUG_OBJECTS are enabled on both server and client.

While rping is running, run the following command on the server:

iptables -I OUTPUT -p udp  --dport 4791 -j DROP

Without this patch, a crash will occur.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

9fd4350b

IB/rxe: remove unused function variable · e12ee8ce

Committed by Zhu Yanjun on Apr 23, 2018

In the functions rxe_mem_init_dma, rxe_mem_init_user, rxe_mem_init_fast
and copy_data, the function variable rxe is not used. So this function
variable rxe is removed.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

e12ee8ce

20 Apr, 2018 2 commits

IB/rxe: replace refcount_inc with skb_get · fe896ceb

Committed by Zhu Yanjun on Apr 10, 2018

Following the advice from Bart, replace the function refcount_inc with
skb_get in the code introduced by commit 99dae690 ("IB/rxe: optimize mcast
recv process") and commit 86af6176 ("IB/rxe: remove unnecessary skb_clone").

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Suggested-by: Bart Van Assche <Bart.VanAssche@wdc.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

fe896ceb

IB/rxe: optimize the function duplicate_request · 2e473507

Committed by Zhu Yanjun on Apr 10, 2018

In the function duplicate_request, the reference count of the skb can be
increased instead of calling skb_clone.

This improves rxe performance and saves memory.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

2e473507

08 Mar, 2018 1 commit

IB/rxe: remove unnecessary skb_clone · 86af6176

Committed by Zhu Yanjun on Feb 27, 2018

In send_atomic_ack function, it is not necessary to make a
skb_clone. To gain better performance (high throughput and
low latency), this skb_clone is removed.

The following tests are made.

 server                       client
---------                    ---------
|1.1.1.1|<----rxe-channel--->|1.1.1.2|
---------                    ---------

On server: rping -s -a 1.1.1.1 -v -C 1000 -S 512
On client: rping -c -a 1.1.1.1 -v -C 1000 -S 512

The kernel config CONFIG_DEBUG_KMEMLEAK is enabled on both server
and client.

This test runs for several hours. There is no memory leak and the whole
system can work well.

Based on the above network, the following tests are made.

Server: ibv_rc_pingpong -d rxe0 -g 1
Client: ibv_rc_pingpong -d rxe0 -g 1 1.1.1.1

The test results on Server(10 tests are made).
Before:
Throughput is 137.07 Mbit/sec
Latency is 517.76 usec/iter

After:
Throughput is 148.85 Mbit/sec
Latency is 476.64 usec/iter

The throughput is enhanced and the latency is reduced.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

86af6176

19 Jan, 2018 1 commit

RDMA/rxe: Fix a race condition in rxe_requester() · 65567e41

Committed by Bart Van Assche on Jan 12, 2018

The rxe driver works as follows:
* The send queue, receive queue and completion queues are implemented as
  circular buffers.
* ib_post_send() and ib_post_recv() calls are serialized through a spinlock.
* Removing elements from various queues happens from tasklet
  context. Tasklets are guaranteed to run on at most one CPU. This serializes
  access to these queues. See also rxe_completer(), rxe_requester() and
  rxe_responder().
* rxe_completer() processes the skbs queued onto qp->resp_pkts.
* rxe_requester() handles the send queue (qp->sq.queue).
* rxe_responder() processes the skbs queued onto qp->req_pkts.

Since rxe_drain_req_pkts() processes qp->req_pkts, calling
rxe_drain_req_pkts() from rxe_requester() is racy. Hence this patch.

Reported-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: stable@vger.kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>

65567e41

16 Jan, 2018 1 commit

RDMA: Mark imm_data as be32 in the verbs uapi header · c966ea12

Committed by Jason Gunthorpe on Jan 11, 2018

This matches what the userspace copy of this header has been doing
for a while. imm_data is an opaque 4 byte array carried over the network,
and invalidate_rkey is in CPU byte order.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

c966ea12

29 Aug, 2017 1 commit

IB/rxe: Fix up the responder's find_resources() function · d45d2956

Committed by Andrew Boyer on Aug 28, 2017

The resource array is sized by max_dest_rd_atomic, not max_rd_atomic.
Iterating over max_rd_atomic entries of qp->resp.resources[] will cause
incorrect behavior when the two attributes are different (or even
crash if max_rd_atomic is larger).
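
A sketch of the corrected loop bound, using hypothetical trimmed-down structs rather than the driver's own. It ignores 24-bit PSN wraparound, which a real comparison would have to handle; the only point illustrated is that the search is bounded by the attribute the array was sized with.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-ins for the responder state. */
struct resp_res {
	int      in_use;
	uint32_t first_psn;
	uint32_t last_psn;
};

struct responder {
	struct resp_res *resources;       /* max_dest_rd_atomic entries */
	unsigned int max_dest_rd_atomic;  /* size of the resources array */
	unsigned int max_rd_atomic;       /* initiator depth; unrelated to the array */
};

/* Find the resource whose PSN range covers psn, or NULL if none does. */
static struct resp_res *find_resource(struct responder *r, uint32_t psn)
{
	/* Bound the loop by the size the array was allocated with. */
	for (unsigned int i = 0; i < r->max_dest_rd_atomic; i++) {
		struct resp_res *res = &r->resources[i];

		if (res->in_use && psn >= res->first_psn && psn <= res->last_psn)
			return res;
	}
	return NULL;
}
```

With `max_rd_atomic` larger than `max_dest_rd_atomic`, a loop bounded by the former would read past the end of `resources`, which matches the crash case described above.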

Fixes: 8700e3e7 ("Soft RoCE driver")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

d45d2956

20 Jul, 2017 1 commit

rxe: fix broken receive queue draining · 12171971

Committed by Vijay Immanuel on Jun 27, 2017

If we modified the qp to ERROR state, and
drained the receive queue, post_recv must
trigger the responder task to complete
the drain work request.

Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

12171971

13 Jul, 2017 1 commit

IB/rxe: do not copy extra stack memory to skb · 4c93496f

Committed by Kees Cook on Jul 12, 2017

This fixes an over-read condition detected by FORTIFY_SOURCE for this
line:

	memcpy(SKB_TO_PKT(skb), &ack_pkt, sizeof(skb->cb));

The error was:

  In file included from ./include/linux/bitmap.h:8:0,
                   from ./include/linux/cpumask.h:11,
                   from ./include/linux/mm_types_task.h:13,
                   from ./include/linux/mm_types.h:4,
                   from ./include/linux/kmemcheck.h:4,
                   from ./include/linux/skbuff.h:18,
                   from drivers/infiniband/sw/rxe/rxe_resp.c:34:
  In function 'memcpy',
      inlined from 'send_atomic_ack.constprop' at drivers/infiniband/sw/rxe/rxe_resp.c:998:2,
      inlined from 'acknowledge' at drivers/infiniband/sw/rxe/rxe_resp.c:1026:3,
      inlined from 'rxe_responder' at drivers/infiniband/sw/rxe/rxe_resp.c:1286:10:
  ./include/linux/string.h:309:4: error: call to '__read_overflow2' declared with attribute error: detected read beyond size of object passed as 2nd parameter
      __read_overflow2();

Daniel Micay noted that struct rxe_pkt_info is 32 bytes on 32-bit
architectures, but skb->cb is still 64.  The memcpy() over-reads 32
bytes.  This fixes it by zeroing the unused bytes in skb->cb.
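
The pattern can be sketched in user space as follows. `fake_skb`, `pkt_info` and `store_pkt_info` are hypothetical stand-ins, and the sizes are illustrative rather than the real ones; the point is to copy only `sizeof` the source object and to zero whatever part of the destination buffer it does not cover.

```c
#include <stdint.h>
#include <string.h>

#define CB_SIZE 64              /* illustrative control-buffer size */

/* Hypothetical stand-ins for the skb control buffer and the packet info. */
struct fake_skb {
	unsigned char cb[CB_SIZE];
};

struct pkt_info {
	uint32_t psn;
	uint32_t opcode;
};

_Static_assert(sizeof(struct pkt_info) <= CB_SIZE,
	       "packet info must fit in the control buffer");

/* Copy pkt into the control buffer without reading past the end of *pkt. */
static void store_pkt_info(struct fake_skb *skb, const struct pkt_info *pkt)
{
	/* Zero the whole buffer so bytes beyond the packet info are defined. */
	memset(skb->cb, 0, sizeof(skb->cb));
	/* Copy the size of the source, not the size of the destination. */
	memcpy(skb->cb, pkt, sizeof(*pkt));
}
```

Copying `sizeof(skb->cb)` bytes from a smaller stack object, as in the flagged line, would read unrelated stack memory into the buffer.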

Link: http://lkml.kernel.org/r/1497903987-21002-5-git-send-email-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Moni Shoua <monis@mellanox.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

4c93496f

02 May, 2017 1 commit

IB/rxe: Don't clamp residual length to mtu · d5241850

Committed by Johannes Thumshirn on Apr 06, 2017

When reading a RDMA WRITE FIRST packet we copy the DMA length from the RDMA
header into the qp->resp.resid variable for later use. Later in check_rkey()
we clamp it to the MTU if the packet is an RDMA WRITE packet and has a
residual length bigger than the MTU. Later in write_data_in() we subtract the
payload of the packet from the residual length. If the packet happens to have a
payload of exactly the MTU size we end up with a residual length of 0 despite
the packet not being the last in the conversation. When the next packet in the
conversation arrives, we don't have any residual length left and thus set the QP
into an error state.
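
The arithmetic can be replayed with a small model. `write_completes` is a hypothetical helper written for this illustration, not driver code; its `clamp` flag reproduces the old behaviour of capping the residual length at the MTU.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: deliver one RDMA WRITE of dma_len bytes as a series of
 * packets carrying at most mtu bytes each, and report whether the running
 * residual length stays consistent until the last packet.
 */
static bool write_completes(uint32_t dma_len, uint32_t mtu, bool clamp)
{
	uint32_t resid = dma_len;       /* taken from the first packet's header */

	while (dma_len > 0) {
		uint32_t payload = dma_len < mtu ? dma_len : mtu;

		if (clamp && resid > mtu)
			resid = mtu;            /* the bug: length information is lost */
		if (payload > resid)
			return false;           /* responder would put the QP in error */
		resid -= payload;
		dma_len -= payload;
	}
	return resid == 0;
}
```

For a 12288-byte write with a 4096-byte MTU, the clamped variant reaches a residual of 0 after the first packet and fails on the second, while the unclamped variant counts 12288, 8192, 4096, 0 and completes.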

This broke NVMe over Fabrics functionality over rdma_rxe.ko

The patch was verified using the following test.

 # echo eth0 > /sys/module/rdma_rxe/parameters/add
 # nvme connect -t rdma -a 192.168.155.101 -s 1023 -n nvmf-test
 # mkfs.xfs -fK /dev/nvme0n1
 meta-data=/dev/nvme0n1           isize=256    agcount=4, agsize=65536 blks
          =                       sectsz=4096  attr=2, projid32bit=1
          =                       crc=0        finobt=0, sparse=0
 data     =                       bsize=4096   blocks=262144, imaxpct=25
          =                       sunit=0      swidth=0 blks
 naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
 log      =internal log           bsize=4096   blocks=2560, version=2
          =                       sectsz=4096  sunit=1 blks, lazy-count=1
 realtime =none                   extsz=4096   blocks=0, rtextents=0
 # mount /dev/nvme0n1 /tmp/
 [  148.923263] XFS (nvme0n1): Mounting V4 Filesystem
 [  148.961196] XFS (nvme0n1): Ending clean mount
 # dd if=/dev/urandom of=test.bin bs=1M count=128
 128+0 records in
 128+0 records out
 134217728 bytes (134 MB, 128 MiB) copied, 0.437991 s, 306 MB/s
 # sha256sum test.bin
 cde42941f045efa8c4f0f157ab6f29741753cdd8d1cff93a6b03649d83c4129a  test.bin
 # cp test.bin /tmp/
 # sha256sum /tmp/test.bin
 cde42941f045efa8c4f0f157ab6f29741753cdd8d1cff93a6b03649d83c4129a  /tmp/test.bin

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Max Gurtovoy <maxg@mellanox.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

d5241850

21 Apr, 2017 1 commit

IB/rxe: Add port protocol stats · 0b1e5b99

Committed by Yonatan Cohen on Mar 10, 2017

Expose new counters using the get_hw_stats callback.
We expose the following counters:

+---------------------+----------------------------------------+
|      Name           |           Description                  |
|---------------------+----------------------------------------|
|sent_pkts            | number of sent pkts                    |
|---------------------+----------------------------------------|
|rcvd_pkts            | number of received packets             |
|---------------------+----------------------------------------|
|out_of_sequence      | number of errors due to packet         |
|                     | transport sequence number              |
|---------------------+----------------------------------------|
|duplicate_request    | number of received duplicated packets. |
|                     | A request that previously executed is  |
|                     | named duplicated.                      |
|---------------------+----------------------------------------|
|rcvd_rnr_err         | number of received RNR by completer    |
|---------------------+----------------------------------------|
|send_rnr_err         | number of sent RNR by responder        |
|---------------------+----------------------------------------|
|rcvd_seq_err         | number of out of sequence packets      |
|                     | received                               |
|---------------------+----------------------------------------|
|ack_deffered         | number of deferred handling of ack     |
|                     | packets.                               |
|---------------------+----------------------------------------|
|retry_exceeded_err   | number of times retry exceeded         |
|---------------------+----------------------------------------|
|completer_retry_err  | number of times completer decided to   |
|                     | retry                                  |
|---------------------+----------------------------------------|
|send_err             | number of failed send packet           |
+---------------------+----------------------------------------+

Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

0b1e5b99

25 Mar, 2017 1 commit

IB/rxe: increment msn only when completing a request · 9fcd67d1

Committed by David Marchand on Feb 24, 2017

According to C9-147, MSN should only be incremented when the last packet of
a multi packet request has been received.

"Logically, the requester associates a sequential Send Sequence Number
(SSN) with each WQE posted to the send queue. The SSN bears a one-to-one
relationship to the MSN returned by the responder in each response
packet. Therefore, when the requester receives a response, it interprets
the MSN as representing the SSN of the most recent request completed by
the responder to determine which send WQE(s) can be completed."
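
A minimal sketch of the rule, with hypothetical names (`pkt_pos`, `msn_state`, `on_request_packet`) that are not taken from the driver: the MSN advances only when the LAST or ONLY packet of a request has been received, so FIRST and MIDDLE packets leave it unchanged.

```c
#include <stdint.h>

/* Hypothetical classification of a request packet within its message. */
enum pkt_pos { PKT_FIRST, PKT_MIDDLE, PKT_LAST, PKT_ONLY };

struct msn_state {
	uint32_t msn;   /* message sequence number reported in responses */
};

/* Account for one received request packet and return the MSN to report. */
static uint32_t on_request_packet(struct msn_state *r, enum pkt_pos pos)
{
	/* A request is complete only once its LAST (or ONLY) packet arrives. */
	if (pos == PKT_LAST || pos == PKT_ONLY)
		r->msn = (r->msn + 1) & 0x00FFFFFFu;   /* MSN is a 24-bit field */
	return r->msn;
}
```

A three-packet request followed by a single-packet request therefore moves the MSN from 0 to 2, not to 4.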

Fixes: 8700e3e7 ("Soft RoCE driver")
Signed-off-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

9fcd67d1

09 Feb, 2017 1 commit

IB/rxe: Fix resid update · 628f07d3

Committed by Eyal Itkin on Feb 07, 2017

Update the response's resid field when larger than MTU, instead of only
updating the local resid variable.

Fixes: 8700e3e7 ("Soft RoCE driver")
Signed-off-by: Eyal Itkin <eyal.itkin@gmail.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

628f07d3

11 Jan, 2017 6 commits

IB/rxe: Remove a pointless indirection layer · 839f5ac0

Committed by Bart Van Assche on Jan 10, 2017

Neither rxe->ifc_ops nor any of the function pointers in struct
rxe_ifc_ops ever change. Hence remove the rxe->ifc_ops
indirection mechanism.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Andrew Boyer <andrew.boyer@dell.com>
Cc: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

839f5ac0

IB/rxe: Fix reference leaks in memory key invalidation code · ab176544

Committed by Bart Van Assche on Jan 10, 2017

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Andrew Boyer <andrew.boyer@dell.com>
Cc: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

ab176544

IB/rxe: Fix a MR reference leak in check_rkey() · b3a45996

Committed by Bart Van Assche on Jan 10, 2017

Prevent an MR reference leak when check_rkey() is called with
mem->state == RXE_MEM_STATE_FREE.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Andrew Boyer <andrew.boyer@dell.com>
Cc: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

b3a45996

IB/rxe: Generate a completion for all failed work requests · 18d3451c

Committed by Bart Van Assche on Jan 10, 2017

Change do_complete() such that an error completion is generated not
only if a QP is in the error state but also if a work request
failed.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Andrew Boyer <andrew.boyer@dell.com>
Cc: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

18d3451c

IB/rxe: Introduce functions for queue draining · 723ec9ae

Committed by Bart Van Assche on Jan 10, 2017

This change makes the code easier to read and avoids code
duplication.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Andrew Boyer <andrew.boyer@dell.com>
Cc: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

723ec9ae

IB/rxe: Issue warnings once · 43553b47

Committed by Bart Van Assche on Jan 10, 2017

It is strongly recommended to report kernel warnings once instead
of every time a condition is hit. Hence change WARN_ON() into
WARN_ON_ONCE() / BUILD_BUG_ON() as appropriate.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Andrew Boyer <andrew.boyer@dell.com>
Cc: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

43553b47

23 Dec, 2016 1 commit

IB/rxe: Use BTH_PSN_MASK when ACKing duplicate sends · 37b36193

Committed by Andrew Boyer on Dec 22, 2016

Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

37b36193
