1. 17 Jun, 2021 1 commit
  2. 04 Jun, 2021 1 commit
  3. 09 Apr, 2021 1 commit
  4. 31 Mar, 2021 1 commit
  5. 17 Feb, 2021 1 commit
  6. 09 Feb, 2021 1 commit
  7. 21 Jan, 2021 1 commit
    • Revert "RDMA/rxe: Remove VLAN code leftovers from RXE" · f1b0a8ea
      Martin Wilck committed
      This reverts commit b2d24404.
      
      It's true that creating rxe on top of 802.1q interfaces doesn't work.
      Thus, commit fd49ddaf ("RDMA/rxe: prevent rxe creation on top of vlan
      interface") was absolutely correct.
      
      But b2d24404 incorrectly assumed that, with this change, RDMA and
      VLAN don't work together at all. It just has to be set up
      differently. Rather than creating rxe on top of the VLAN interface, rxe
      must be created on top of the physical interface.  RDMA then works just
      fine through VLAN interfaces on top of that physical interface, via the
      "upper device" logic.
      
      This is hard to see in the rxe logic because it never talks about vlan,
      but instead rxe carefully selects upper vlan netdevices when working with
      packets which in turn imply certain vlan tagging. This is all done
      correctly and interacts with the gid table with VLAN support the same as
      real HW does.
      
      b2d24404 broke this setup deliberately and should thus be
      reverted. Also, b2d24404 removed rxe_dma_device(), so adapt the revert
      to discard that hunk.
      
      Fixes: b2d24404 ("RDMA/rxe: Remove VLAN code leftovers from RXE")
      Link: https://lore.kernel.org/r/20210120161913.7347-1-mwilck@suse.com
      Signed-off-by: Martin Wilck <mwilck@suse.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
  8. 12 Nov, 2020 1 commit
  9. 31 Aug, 2020 1 commit
  10. 10 Dec, 2019 1 commit
  11. 09 Jul, 2019 1 commit
  12. 22 Jan, 2019 1 commit
  13. 09 Nov, 2018 2 commits
  14. 07 Nov, 2018 2 commits
  15. 31 Aug, 2018 2 commits
    • IB/rxe: fix for duplicate request processing and ack psns · b97db585
      Vijay Immanuel committed
      Don't reset the resp opcode for a replayed read response.
      The resp opcode could be in the middle of a write or send
      sequence, when the duplicate read request was received.
      An example sequence is as follows:
      - Receive read request for 12KB PSN 20. Transmit read response
        first, middle and last with PSNs 20,21,22.
      - Receive write first PSN 23.
        At this point the resp psn is 24 and resp opcode is write first.
      - The sender notices that PSN 20 is dropped and retransmits.
        Receive read request for 12KB PSN 20. Transmit read response
        first, middle and last with PSNs 20,21,22. The resp opcode is
        set to -1, the resp psn remains 24.
      - Receive write first PSN 23. This is processed by duplicate_request().
        The resp opcode remains -1 and resp psn remains 24.
      - Receive write middle PSN 24. check_op_seq() reports a missing
        first error since the resp opcode is -1.
      
      When sending an ack for a duplicate send or write request,
      use the psn of the previous ack sent. Do not use the psn
      of a read response for the ack.
      An example sequence is as follows:
      - Receive write PSN 30. Transmit ACK for PSN 30.
      - Receive read request 4KB PSN 31. Transmit read response with
        PSN 31. The resp psn is now 32.
      - The sender notices that PSN 30 is dropped and retransmits.
        Receive write PSN 30. duplicate_request() sends an ACK with
        PSN 31. That is incorrect since PSN 31 was a read request.
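The two example sequences above can be replayed with a small stand-alone model. This is only a sketch under stated assumptions: `struct resp_state`, the `OP_*` values and every function name below are hypothetical and are not the driver's own code. The `fixed` flag switches between the behaviour the message describes as broken and the behaviour it asks for.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcodes; OP_NONE models the "-1" mentioned in the message. */
enum op { OP_NONE = -1, OP_WRITE_FIRST, OP_WRITE_MIDDLE, OP_WRITE_ONLY, OP_READ_REQ };

/* Hypothetical responder state, loosely modelled on the text above. */
struct resp_state {
	uint32_t psn;      /* next expected PSN */
	uint32_t ack_psn;  /* PSN carried by the last ACK for a send/write */
	int opcode;        /* opcode of the last in-sequence request */
};

/* Process an in-sequence request that consumes npkts PSNs. */
static void handle_new(struct resp_state *r, int opcode, uint32_t npkts, int acked)
{
	if (acked)
		r->ack_psn = r->psn + npkts - 1;
	r->psn += npkts;
	r->opcode = opcode;
}

/* Replay a duplicate read request. The old behaviour reset the opcode. */
static void handle_duplicate_read(struct resp_state *r, int fixed)
{
	if (!fixed)
		r->opcode = OP_NONE;
	/* fixed: leave both psn and opcode untouched */
}

/* A "write middle" is only valid after a "write first" or "write middle". */
static int middle_allowed(const struct resp_state *r)
{
	return r->opcode == OP_WRITE_FIRST || r->opcode == OP_WRITE_MIDDLE;
}

/* PSN to put in the ACK for a duplicate send/write. */
static uint32_t duplicate_ack_psn(const struct resp_state *r, int fixed)
{
	return fixed ? r->ack_psn : r->psn - 1;
}
```

Starting at PSN 20, a 3-packet read followed by a write first leaves `psn == 24`. A replayed read with `fixed = 1` keeps `middle_allowed()` true, while `fixed = 0` makes it false, which is the "missing first" error. In the second sequence, `duplicate_ack_psn()` yields 30 with the fix and 31 without it.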
      Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/rxe: Simplify rxe_find_route() to avoid GID query for netdev · 3db2bceb
      Parav Pandit committed
      rxe_prepare() is called on an skb which has ndev already initialized by
      rxe_init_packet().
      Therefore avoid querying the GID attribute again and use the available
      netdevice from the skb->dev.
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
      Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
      Tested-by: Yuval Shaia <yuval.shaia@oracle.com>
      Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  16. 19 Jun, 2018 1 commit
  17. 28 Apr, 2018 2 commits
    • IB/rxe: avoid double kfree_skb · 9fd4350b
      Zhu Yanjun committed
      When an skb is sent, it passes through the following functions in soft roce.
      
      rxe_send [rdma_rxe]
          ip_local_out
              __ip_local_out
              ip_output
                  ip_finish_output
                      ip_finish_output2
                          dev_queue_xmit
                              __dev_queue_xmit
                                  dev_hard_start_xmit
      
      If an error occurs in any of the functions above, or if iptables rules
      drop the skb after ip_local_out, kfree_skb will be called. So the soft
      roce module must not call kfree_skb again; otherwise a crash will
      occur.
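The ownership rule described here can be illustrated outside the kernel. The sketch below uses a hypothetical `struct buf` with a free counter in place of an skb, and a mock `xmit()` that always consumes its buffer, as the message says the transmit path does; none of these names come from the driver.

```c
#include <assert.h>

/* Hypothetical stand-in for an skb; "freed" counts frees to expose a double free. */
struct buf {
	int freed;
};

static void buf_free(struct buf *b)
{
	b->freed++;
}

/* Mock transmit path: it consumes the buffer whether it sends or drops. */
static int xmit(struct buf *b, int drop)
{
	buf_free(b);
	return drop ? -1 : 0;
}

/* Buggy caller: frees again on error, although xmit() already consumed it. */
static int send_buggy(struct buf *b, int drop)
{
	int err = xmit(b, drop);

	if (err)
		buf_free(b);
	return err;
}

/* Fixed caller: reports the error and leaves the buffer alone. */
static int send_fixed(struct buf *b, int drop)
{
	return xmit(b, drop);
}
```

With `drop` set, `send_buggy()` leaves the counter at 2 and `send_fixed()` leaves it at 1. In a real kernel the second free touches memory that has already been released, which is the crash the reproduction steps trigger.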
      
      The steps to reproduce:
      
           server                       client
          ---------                    ---------
          |1.1.1.1|<----rxe-channel--->|1.1.1.2|
          ---------                    ---------
      
      On server: rping -s -a 1.1.1.1 -v -C 10000 -S 512
      On client: rping -c -a 1.1.1.1 -v -C 10000 -S 512
      
      The kernel configs CONFIG_DEBUG_KMEMLEAK and
      CONFIG_DEBUG_OBJECTS are enabled on both server and client.
      
      While rping is running, run the following command on the server:
      
      iptables -I OUTPUT -p udp  --dport 4791 -j DROP
      
      Without this patch, a crash will occur.
      
      CC: Srinivas Eeda <srinivas.eeda@oracle.com>
      CC: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
      Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/rxe: remove unused function variable · e12ee8ce
      Zhu Yanjun committed
      In the functions rxe_mem_init_dma, rxe_mem_init_user, rxe_mem_init_fast
      and copy_data, the function variable rxe is not used, so it is
      removed.
      
      CC: Srinivas Eeda <srinivas.eeda@oracle.com>
      CC: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
      Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  18. 20 Apr, 2018 2 commits
  19. 08 Mar, 2018 1 commit
    • IB/rxe: remove unnecessary skb_clone · 86af6176
      Zhu Yanjun committed
      In the send_atomic_ack function, the skb_clone call is not
      necessary. It is removed to gain better performance (higher
      throughput and lower latency).

      The following tests were run.
      
       server                       client
      ---------                    ---------
      |1.1.1.1|<----rxe-channel--->|1.1.1.2|
      ---------                    ---------
      
      On server: rping -s -a 1.1.1.1 -v -C 1000 -S 512
      On client: rping -c -a 1.1.1.1 -v -C 1000 -S 512
      
      The kernel config CONFIG_DEBUG_KMEMLEAK is enabled on both server
      and client.
      
      This test ran for several hours. There was no memory leak and the whole
      system kept working normally.

      On the same network, the following tests were run.

      Server: ibv_rc_pingpong -d rxe0 -g 1
      Client: ibv_rc_pingpong -d rxe0 -g 1 1.1.1.1

      Test results on the server (10 tests were run):
      Before:
      Throughput is 137.07 Mbit/sec
      Latency is 517.76 usec/iter
      
      After:
      Throughput is 148.85 Mbit/sec
      Latency is 476.64 usec/iter
      
      The throughput is enhanced and the latency is reduced.
      
      CC: Srinivas Eeda <srinivas.eeda@oracle.com>
      CC: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  20. 19 Jan, 2018 1 commit
    • RDMA/rxe: Fix a race condition in rxe_requester() · 65567e41
      Bart Van Assche committed
      The rxe driver works as follows:
      * The send queue, receive queue and completion queues are implemented as
        circular buffers.
      * ib_post_send() and ib_post_recv() calls are serialized through a spinlock.
      * Removing elements from various queues happens from tasklet
        context. Tasklets are guaranteed to run on at most one CPU. This serializes
        access to these queues. See also rxe_completer(), rxe_requester() and
        rxe_responder().
      * rxe_completer() processes the skbs queued onto qp->resp_pkts.
      * rxe_requester() handles the send queue (qp->sq.queue).
      * rxe_responder() processes the skbs queued onto qp->req_pkts.
      
      Since rxe_drain_req_pkts() processes qp->req_pkts, calling
      rxe_drain_req_pkts() from rxe_requester() is racy. Hence this patch.
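To see why each queue should have exactly one draining context, consider a minimal circular buffer in which the consumer index is advanced by only one side. This is a single-threaded illustration under stated assumptions: `struct ring`, `RING_SLOTS` and the function names are hypothetical, and a real driver would also need memory barriers, which are omitted here.

```c
#include <assert.h>

#define RING_SLOTS 8                 /* hypothetical size, power of two */
#define RING_MASK  (RING_SLOTS - 1)

struct ring {
	unsigned int producer;   /* advanced only by the posting side */
	unsigned int consumer;   /* advanced only by the one draining context */
	int slot[RING_SLOTS];
};

/* One slot is kept empty so that full and empty can be told apart. */
static int ring_push(struct ring *q, int v)
{
	if (((q->producer + 1) & RING_MASK) == q->consumer)
		return -1;               /* full */
	q->slot[q->producer] = v;
	q->producer = (q->producer + 1) & RING_MASK;
	return 0;
}

static int ring_pop(struct ring *q, int *v)
{
	if (q->producer == q->consumer)
		return -1;               /* empty */
	*v = q->slot[q->consumer];
	q->consumer = (q->consumer + 1) & RING_MASK;
	return 0;
}
```

`ring_pop()` reads a slot and then advances `consumer` without any lock. That is only safe while a single context calls it. A second context draining the same queue could read the same slot or skip one, which is the kind of race the message describes.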
      Reported-by: Moni Shoua <monis@mellanox.com>
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  21. 16 Jan, 2018 1 commit
  22. 29 Aug, 2017 1 commit
  23. 20 Jul, 2017 1 commit
  24. 13 Jul, 2017 1 commit
    • IB/rxe: do not copy extra stack memory to skb · 4c93496f
      Kees Cook committed
      This fixes an over-read condition detected by FORTIFY_SOURCE for this
      line:
      
      	memcpy(SKB_TO_PKT(skb), &ack_pkt, sizeof(skb->cb));
      
      The error was:
      
        In file included from ./include/linux/bitmap.h:8:0,
                         from ./include/linux/cpumask.h:11,
                         from ./include/linux/mm_types_task.h:13,
                         from ./include/linux/mm_types.h:4,
                         from ./include/linux/kmemcheck.h:4,
                         from ./include/linux/skbuff.h:18,
                         from drivers/infiniband/sw/rxe/rxe_resp.c:34:
        In function 'memcpy',
            inlined from 'send_atomic_ack.constprop' at drivers/infiniband/sw/rxe/rxe_resp.c:998:2,
            inlined from 'acknowledge' at drivers/infiniband/sw/rxe/rxe_resp.c:1026:3,
            inlined from 'rxe_responder' at drivers/infiniband/sw/rxe/rxe_resp.c:1286:10:
        ./include/linux/string.h:309:4: error: call to '__read_overflow2' declared with attribute error: detected read beyond size of object passed as 2nd parameter
            __read_overflow2();
      
      Daniel Micay noted that struct rxe_pkt_info is 32 bytes on 32-bit
      architectures, but skb->cb is still 64.  The memcpy() over-reads 32
      bytes.  This fixes it by zeroing the unused bytes in skb->cb.
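A user-space sketch of the idea: copy only as many bytes as the source object holds, and zero the rest of the destination. The 64-byte and 32-byte sizes are taken from the message above. `struct pkt_info`, `CB_SIZE` and `store_pkt_info()` are hypothetical stand-ins, and zeroing the whole buffer first is just one way to get the same result, not necessarily how the patch does it.

```c
#include <assert.h>
#include <string.h>

#define CB_SIZE 64   /* size of skb->cb according to the message above */

/* Hypothetical 32-byte stand-in for struct rxe_pkt_info on a 32-bit build. */
struct pkt_info {
	unsigned char bytes[32];
};

_Static_assert(sizeof(struct pkt_info) <= CB_SIZE, "pkt_info must fit in cb");

/* Copy pkt into cb without reading past the end of *pkt. */
static void store_pkt_info(unsigned char *cb, const struct pkt_info *pkt)
{
	memset(cb, 0, CB_SIZE);              /* clear the unused tail */
	memcpy(cb, pkt, sizeof(*pkt));       /* read exactly sizeof(*pkt) bytes */
}
```

The original call used `sizeof(skb->cb)` as the length, so it read 64 bytes from a 32-byte object. Here the length passed to `memcpy()` comes from the source type, so the read stays in bounds.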
      
      Link: http://lkml.kernel.org/r/1497903987-21002-5-git-send-email-keescook@chromium.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Moni Shoua <monis@mellanox.com>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Sean Hefty <sean.hefty@intel.com>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  25. 02 May, 2017 1 commit
    • IB/rxe: Don't clamp residual length to mtu · d5241850
      Johannes Thumshirn committed
      When reading a RDMA WRITE FIRST packet we copy the DMA length from the RDMA
      header into the qp->resp.resid variable for later use. Later in check_rkey()
      we clamp it to the MTU if the packet is an RDMA WRITE packet and has a
      residual length bigger than the MTU. Later in write_data_in() we subtract the
      payload of the packet from the residual length. If the packet happens to have a
      payload of exactly the MTU size we end up with a residual length of 0 despite
      the packet not being the last in the conversation. When the next packet in the
      conversation arrives, we don't have any residual length left and thus set the QP
      into an error state.
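The arithmetic can be checked with a short model. The function below is a hypothetical sketch, not the driver's code: it counts how many packets the responder would accept before its residual length reaches zero, with and without the clamp. It assumes `mtu` is non-zero and that every packet except the last carries a full MTU of payload.

```c
#include <assert.h>

/*
 * Count the packets accepted before the residual length hits zero.
 * clamp != 0 models the old behaviour of limiting resid to the MTU.
 */
static unsigned int packets_until_done(unsigned int dma_len,
				       unsigned int mtu, int clamp)
{
	unsigned int resid = dma_len;
	unsigned int n = 0;

	if (clamp && resid > mtu)
		resid = mtu;

	while (resid > 0) {
		unsigned int payload = resid < mtu ? resid : mtu;

		resid -= payload;   /* what write_data_in() is described as doing */
		n++;
	}
	return n;
}
```

For a 12288-byte write with a 4096-byte MTU, the clamped variant returns 1: the residual length is already 0 after the first packet, so the second packet finds nothing left. The unclamped variant returns 3, matching the three packets actually on the wire.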
      
      This broke NVMe over Fabrics functionality over rdma_rxe.ko
      
      The patch was verified using the following test.
      
       # echo eth0 > /sys/module/rdma_rxe/parameters/add
       # nvme connect -t rdma -a 192.168.155.101 -s 1023 -n nvmf-test
       # mkfs.xfs -fK /dev/nvme0n1
       meta-data=/dev/nvme0n1           isize=256    agcount=4, agsize=65536 blks
                =                       sectsz=4096  attr=2, projid32bit=1
                =                       crc=0        finobt=0, sparse=0
       data     =                       bsize=4096   blocks=262144, imaxpct=25
                =                       sunit=0      swidth=0 blks
       naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
       log      =internal log           bsize=4096   blocks=2560, version=2
                =                       sectsz=4096  sunit=1 blks, lazy-count=1
       realtime =none                   extsz=4096   blocks=0, rtextents=0
       # mount /dev/nvme0n1 /tmp/
       [  148.923263] XFS (nvme0n1): Mounting V4 Filesystem
       [  148.961196] XFS (nvme0n1): Ending clean mount
       # dd if=/dev/urandom of=test.bin bs=1M count=128
       128+0 records in
       128+0 records out
       134217728 bytes (134 MB, 128 MiB) copied, 0.437991 s, 306 MB/s
       # sha256sum test.bin
       cde42941f045efa8c4f0f157ab6f29741753cdd8d1cff93a6b03649d83c4129a  test.bin
       # cp test.bin /tmp/
       # sha256sum /tmp/test.bin
       cde42941f045efa8c4f0f157ab6f29741753cdd8d1cff93a6b03649d83c4129a  /tmp/test.bin
      Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: Max Gurtovoy <maxg@mellanox.com>
      Acked-by: Moni Shoua <monis@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  26. 21 Apr, 2017 1 commit
    • IB/rxe: Add port protocol stats · 0b1e5b99
      Yonatan Cohen committed
      Expose new counters using the get_hw_stats callback.
      We expose the following counters:
      
      +---------------------+----------------------------------------+
      |      Name           |           Description                  |
      |---------------------+----------------------------------------|
      |sent_pkts            | number of sent pkts                    |
      |---------------------+----------------------------------------|
      |rcvd_pkts            | number of received packets             |
      |---------------------+----------------------------------------|
      |out_of_sequence      | number of errors due to packet         |
      |                     | transport sequence number              |
      |---------------------+----------------------------------------|
      |duplicate_request    | number of received duplicated packets. |
      |                     | A request that previously executed is  |
      |                     | named duplicated.                      |
      |---------------------+----------------------------------------|
      |rcvd_rnr_err         | number of received RNR by completer    |
      |---------------------+----------------------------------------|
      |send_rnr_err         | number of sent RNR by responder        |
      |---------------------+----------------------------------------|
      |rcvd_seq_err         | number of out of sequence packets      |
      |                     | received                               |
      |---------------------+----------------------------------------|
      |ack_deffered         | number of deferred handling of ack     |
      |                     | packets.                               |
      |---------------------+----------------------------------------|
      |retry_exceeded_err   | number of times retry exceeded         |
      |---------------------+----------------------------------------|
      |completer_retry_err  | number of times completer decided to   |
      |                     | retry                                  |
      |---------------------+----------------------------------------|
      |send_err             | number of failed send packet           |
      +---------------------+----------------------------------------+
      Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
      Reviewed-by: Moni Shoua <monis@mellanox.com>
      Reviewed-by: Andrew Boyer <andrew.boyer@dell.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  27. 25 Mar, 2017 1 commit
    • IB/rxe: increment msn only when completing a request · 9fcd67d1
      David Marchand committed
      According to C9-147, MSN should only be incremented when the last packet of
      a multi packet request has been received.
      
      "Logically, the requester associates a sequential Send Sequence Number
      (SSN) with each WQE posted to the send queue. The SSN bears a one-
      to-one relationship to the MSN returned by the responder in each re-
      sponse packet. Therefore, when the requester receives a response, it in-
      terprets the MSN as representing the SSN of the most recent request
      completed by the responder to determine which send WQE(s) can be
      completed."
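The rule can be sketched as a counter that moves only on the last packet of a request. `struct responder` and `packet_received()` are hypothetical names used for illustration. The 24-bit wrap reflects the width of the MSN field in the ACK extended transport header.

```c
#include <assert.h>
#include <stdint.h>

#define MSN_MASK 0xffffffu   /* the MSN is a 24-bit field */

/* Hypothetical responder state holding only the message sequence number. */
struct responder {
	uint32_t msn;
};

/*
 * Account for one received request packet.
 * is_last is nonzero for the last (or only) packet of a request.
 */
static void packet_received(struct responder *r, int is_last)
{
	if (is_last)
		r->msn = (r->msn + 1) & MSN_MASK;
}
```

A three-packet request (first, middle, last) therefore advances the MSN by one, not three, so the requester can map the returned MSN to the SSN of the request that just completed.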
      
      Fixes: 8700e3e7 ("Soft RoCE driver")
      Signed-off-by: David Marchand <david.marchand@6wind.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  28. 09 Feb, 2017 1 commit
  29. 11 Jan, 2017 6 commits
  30. 23 Dec, 2016 1 commit