1. 23 8月, 2016 14 次提交
  2. 04 8月, 2016 14 次提交
    • M
      Soft RoCE driver · 8700e3e7
      Moni Shoua 提交于
      Soft RoCE (RXE) - The software RoCE driver
      
      ib_rxe implements the RDMA transport and registers to the RDMA core
      device as a kernel verbs provider. It also implements the packet IO
      layer. On the other hand ib_rxe registers to the Linux netdev stack
      as a udp encapsulating protocol, in that case RDMA, for sending and
      receiving packets over any Ethernet device.  This yields a RDMA
      transport over the UDP/Ethernet network layer forming a RoCEv2
      compatible device.
      
      The configuration procedure of the Soft RoCE drivers requires
      binding to any existing Ethernet network device. This is done with
      /sys interface.
      
      A userspace Soft RoCE library (librxe) provides user applications
      the ability to run with Soft RoCE devices.  The use of rxe verbs ins
      user space requires the inclusion of librxe as a device specifics
      plug-in to libibverbs. librxe is packaged separately.
      
      Architecture:
      
           +-----------------------------------------------------------+
           |                          Application                      |
           +-----------------------------------------------------------+
                                  +-----------------------------------+
                                  |             libibverbs            |
      User                        +-----------------------------------+
                                  +----------------+ +----------------+
                                  | librxe         | | HW RoCE lib    |
                                  +----------------+ +----------------+
      +---------------------------------------------------------------+
           +--------------+                           +------------+
           | Sockets      |                           | RDMA ULP   |
           +--------------+                           +------------+
           +--------------+                  +---------------------+
           | TCP/IP       |                  | ib_core             |
           +--------------+                  +---------------------+
                                   +------------+ +----------------+
      Kernel                       | ib_rxe     | | HW RoCE driver |
                                   +------------+ +----------------+
           +------------------------------------+
           | NIC driver                         |
           +------------------------------------+
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           +-----------------------------------------------------------+
           |                          Application                      |
           +-----------------------------------------------------------+
                                  +-----------------------------------+
                                  |             libibverbs            |
      User                        +-----------------------------------+
                                  +----------------+ +----------------+
                                  | librxe         | | HW RoCE lib    |
                                  +----------------+ +----------------+
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           +--------------+                           +------------+
           | Sockets      |                           | RDMA ULP   |
           +--------------+                           +------------+
           +--------------+                  +---------------------+
           | TCP/IP       |                  | ib_core             |
           +--------------+                  +---------------------+
                                   +------------+ +----------------+
      Kernel                       | ib_rxe     | | HW RoCE driver |
                                   +------------+ +----------------+
           +------------------------------------+
           | NIC driver                         |
           +------------------------------------+
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      Soft RoCE resources:
      
      [1[ https://github.com/SoftRoCE/librxe-dev librxe - source code in
      Github
      [2] https://github.com/SoftRoCE/rxe-dev/wiki/rxe-dev:-Home - Soft RoCE
      Wiki page
      [3] https://github.com/SoftRoCE/librxe-dev - Soft RoCE userspace library
      Signed-off-by: NKamal Heib <kamalh@mellanox.com>
      Signed-off-by: NAmir Vadai <amirv@mellanox.com>
      Signed-off-by: NMoni Shoua <monis@mellanox.com>
      Reviewed-by: NHaggai Eran <haggaie@mellanox.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      8700e3e7
    • K
      dma-mapping: use unsigned long for dma_attrs · 00085f1e
      Krzysztof Kozlowski 提交于
      The dma-mapping core and the implementations do not change the DMA
      attributes passed by pointer.  Thus the pointer can point to const data.
      However the attributes do not have to be a bitfield.  Instead unsigned
      long will do fine:
      
      1. This is just simpler.  Both in terms of reading the code and setting
         attributes.  Instead of initializing local attributes on the stack
         and passing pointer to it to dma_set_attr(), just set the bits.
      
      2. It brings safeness and checking for const correctness because the
         attributes are passed by value.
      
      Semantic patches for this change (at least most of them):
      
          virtual patch
          virtual context
      
          @r@
          identifier f, attrs;
      
          @@
          f(...,
          - struct dma_attrs *attrs
          + unsigned long attrs
          , ...)
          {
          ...
          }
      
          @@
          identifier r.f;
          @@
          f(...,
          - NULL
          + 0
           )
      
      and
      
          // Options: --all-includes
          virtual patch
          virtual context
      
          @r@
          identifier f, attrs;
          type t;
      
          @@
          t f(..., struct dma_attrs *attrs);
      
          @@
          identifier r.f;
          @@
          f(...,
          - NULL
          + 0
           )
      
      Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.comSigned-off-by: NKrzysztof Kozlowski <k.kozlowski@samsung.com>
      Acked-by: NVineet Gupta <vgupta@synopsys.com>
      Acked-by: NRobin Murphy <robin.murphy@arm.com>
      Acked-by: NHans-Christian Noren Egtvedt <egtvedt@samfundet.no>
      Acked-by: Mark Salter <msalter@redhat.com> [c6x]
      Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris]
      Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm]
      Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com>
      Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
      Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp]
      Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core]
      Acked-by: David Vrabel <david.vrabel@citrix.com> [xen]
      Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb]
      Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
      Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon]
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
      Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
      Acked-by: NBjorn Andersson <bjorn.andersson@linaro.org>
      Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32]
      Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc]
      Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      00085f1e
    • A
      IB/core: Support for CMA multicast join flags · ab15c95a
      Alex Vesker 提交于
      Added UCMA and CMA support for multicast join flags. Flags are
      passed using UCMA CM join command previously reserved fields.
      Currently supporting two join flags indicating two different
      multicast JoinStates:
      
      1. Full Member:
         The initiator creates the Multicast group(MCG) if it wasn't
         previously created, can send Multicast messages to the group
         and receive messages from the MCG.
      
      2. Send Only Full Member:
         The initiator creates the Multicast group(MCG) if it wasn't
         previously created, can send Multicast messages to the group
         but doesn't receive any messages from the MCG.
      
         IB: Send Only Full Member requires a query of ClassPortInfo
             to determine if SM/SA supports this option. If SM/SA
             doesn't support Send-Only there will be no join request
             sent and an error will be returned.
      
         ETH: When Send Only Full Member is requested no IGMP join
      	will be sent.
      Signed-off-by: NAlex Vesker <valex@mellanox.com>
      Reviewed by: Hal Rosenstock <hal@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      ab15c95a
    • A
      IB/sa: Add cached attribute containing SM information to SA port · 3d3fd742
      Alex Vesker 提交于
      Added a new SA port attribute containing SM ClassPortInfo fields,
      (ClassPortInfo fields: Table 126 IB Spec 1.3.). This is useful for
      checking SM support for specific features. The attribute is cached
      to avoid resending queries, caching is done when a successful
      ClassPortInfo reply is received on the port. Invalidation of the
      attribute is done on SM change events, SM re-registration events,
      and SM LID change events. The fields in ClassPortInfo should not
      change during SM runtime without an event.
      Signed-off-by: NAlex Vesker <valex@mellanox.com>
      Reviewed by: Hal Rosenstock <hal@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      3d3fd742
    • J
      IB/uverbs: Fix race between uverbs_close and remove_one · d1e09f30
      Jason Gunthorpe 提交于
      Fixes an oops that might happen if uverbs_close races with
      remove_one.
      
      Both contexts may run ib_uverbs_cleanup_ucontext, it depends
      on the flow.
      
      Currently, there is no protection for a case that remove_one
      didn't make the cleanup it runs to its end, the underlying
      ib_device was freed then uverbs_close will call
      ib_uverbs_cleanup_ucontext and OOPs.
      
      Above might happen if uverbs_close deleted the file from the list
      then remove_one didn't find it and runs to its end.
      
      Fixes to protect against that case by a new cleanup lock so that
      ib_uverbs_cleanup_ucontext will be called always before that
      remove_one is ended.
      
      Fixes: 35d4a0b6 ("IB/uverbs: Fix race between ib_uverbs_open and remove_one")
      Reported-by: NDevesh Sharma <devesh.sharma@broadcom.com>
      Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: NYishai Hadas <yishaih@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      d1e09f30
    • M
      IB/mthca: Clean up error unwind flow in mthca_reset() · 380bae5b
      Markus Elfring 提交于
      The kfree() function was called in a few cases by the mthca_reset()
      function during error handling even if the passed variables "bridge_header"
      and "hca_header" contained a null pointer.
      
      Adjust jump targets according to the Linux coding style convention.
      Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
      Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      380bae5b
    • M
      IB/mthca: NULL arg to pci_dev_put is OK · 3491ab63
      Markus Elfring 提交于
      The pci_dev_put() function tests whether its argument is NULL and then
      returns immediately. Thus the test around the call is not needed.
      
      This issue was detected by using the Coccinelle software.
      Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
      Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      3491ab63
    • M
      IB/hfi1: NULL arg to sc_return_credits is OK · f7ca535b
      Markus Elfring 提交于
      The sc_return_credits() function tests whether its argument is NULL
      and then returns immediately. Thus the test around the call is not needed.
      
      This issue was detected by using the Coccinelle software.
      Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      f7ca535b
    • M
      IB/mlx4: Add diagnostic hardware counters · 3f85f2aa
      Mark Bloch 提交于
      Expose IB diagnostic hardware counters.
      The counters count IB events and are applicable for IB and RoCE.
      
      The counters can be divided into two groups, per device and per port.
      Device counters are always exposed.
      Port counters are exposed only if the firmware supports per port counters.
      
      rq_num_dup and sq_num_to are only exposed if we have firmware support
      for them, if we do, we expose them per device and per port.
      rq_num_udsdprd and num_cqovf are device only counters.
      
      rq - denotes responder.
      sq - denotes requester.
      
      |-----------------------|---------------------------------------|
      |	Name		|	Description			|
      |-----------------------|---------------------------------------|
      |rq_num_lle		| Number of local length errors		|
      |-----------------------|---------------------------------------|
      |sq_num_lle		| number of local length errors		|
      |-----------------------|---------------------------------------|
      |rq_num_lqpoe		| Number of local QP operation errors	|
      |-----------------------|---------------------------------------|
      |sq_num_lqpoe		| Number of local QP operation errors	|
      |-----------------------|---------------------------------------|
      |rq_num_lpe		| Number of local protection errors	|
      |-----------------------|---------------------------------------|
      |sq_num_lpe		| Number of local protection errors	|
      |-----------------------|---------------------------------------|
      |rq_num_wrfe		| Number of CQEs with error		|
      |-----------------------|---------------------------------------|
      |sq_num_wrfe		| Number of CQEs with error		|
      |-----------------------|---------------------------------------|
      |sq_num_mwbe		| Number of Memory Window bind errors	|
      |-----------------------|---------------------------------------|
      |sq_num_bre		| Number of bad response errors		|
      |-----------------------|---------------------------------------|
      |sq_num_rire		| Number of Remote Invalid request	|
      |			| errors				|
      |-----------------------|---------------------------------------|
      |rq_num_rire		| Number of Remote Invalid request	|
      |			| errors				|
      |-----------------------|---------------------------------------|
      |sq_num_rae		| Number of remote access errors	|
      |-----------------------|---------------------------------------|
      |rq_num_rae		| Number of remote access errors	|
      |-----------------------|---------------------------------------|
      |sq_num_roe		| Number of remote operation errors	|
      |-----------------------|---------------------------------------|
      |sq_num_tree		| Number of transport retries exceeded	|
      |			| errors				|
      |-----------------------|---------------------------------------|
      |sq_num_rree		| Number of RNR NAK retries exceeded	|
      |			| errors				|
      |-----------------------|---------------------------------------|
      |rq_num_rnr		| Number of RNR NAKs sent		|
      |-----------------------|---------------------------------------|
      |sq_num_rnr		| Number of RNR NAKs received		|
      |-----------------------|---------------------------------------|
      |rq_num_oos		| Number of Out of Sequence requests	|
      |			| received				|
      |-----------------------|---------------------------------------|
      |sq_num_oos		| Number of Out of Sequence NAKs	|
      |			| received				|
      |-----------------------|---------------------------------------|
      |rq_num_udsdprd		| Number of UD packets silently		|
      |			| discarded on the Receive Queue due to	|
      |			| lack of receive descriptor		|
      |-----------------------|---------------------------------------|
      |rq_num_dup		| Number of duplicate requests received	|
      |-----------------------|---------------------------------------|
      |sq_num_to		| Number of time out received		|
      |-----------------------|---------------------------------------|
      |num_cqovf		| Number of CQ overflows		|
      |-----------------------|---------------------------------------|
      Signed-off-by: NMark Bloch <markb@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      3f85f2aa
    • M
      Use smaller 512 byte messages for portmapper messages · e972dabf
      Mustafa Ismail 提交于
      Portmapper messages are short and do not occupy more than 512 bytes.
      Lower portmapper message size to 512 bytes. This change significantly
      reduces the amount of memory needed when trying to establish a large
      number of connections simultaneously. The old value is based on page
      size.
      Signed-off-by: NFaisal Latif <faisal.latif@intel.com>
      Signed-off-by: NMustafa Ismail <mustafa.ismail@intel.com>
      Signed-off-by: NShiraz Saleem <shiraz.saleem@intel.com>
      Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      e972dabf
    • Y
      IB/ipoib: Report SG feature regardless of HW UD CSUM capability · 5faba546
      Yuval Shaia 提交于
      Decouple SG support from HW ability to do UD checksum.
      This coupling is for historical reasons and removed with 'commit
      ec5f0615 ("net: Kill link between CSUM and SG features.")'
      
      During driver load it is assumed that device does not supports SG. The
      final decision is taken after creating UD QP based on device capability.
      Signed-off-by: NYuval Shaia <yuval.shaia@oracle.com>
      Reviewed-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      5faba546
    • R
      IB/mlx4: Don't use GFP_ATOMIC for CQ resize struct · 0c87b672
      Roland Dreier 提交于
      We allocate a small tracking structure as part of mlx4_ib_resize_cq().
      However, we don't need to use GFP_ATOMIC -- immediately after the
      allocation, we call mlx4_cq_resize(), which allocates a command
      mailbox with GFP_KERNEL and then sleeps on a firmware command, so we
      better not be in an atomic context.
      
      This actually has a real impact, because when this GFP_ATOMIC
      allocation fails (and GFP_ATOMIC does fail in practice) then a
      userspace consumer resizing a CQ will get a spurious failure that we
      can easily avoid.
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      0c87b672
    • B
      IB/hfi1: Disable by default · a154a8cd
      Bart Van Assche 提交于
      There is a strict policy in the Linux kernel that new drivers must be
      disabled by default. Hence leave out the "default m" line from Kconfig.
      
      Fixes: f48ad614 ("IB/hfi1: Move driver out of staging")
      Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com>
      Cc: Jubin John <jubin.john@intel.com>
      Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Cc: <stable@vger.kernel.org> # v4.7+
      Acked-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      a154a8cd
    • B
      IB/rdmavt: Disable by default · 378fc320
      Bart Van Assche 提交于
      There is a strict policy in the Linux kernel that new drivers must be
      disabled by default. Hence leave out the "default m" line from Kconfig.
      
      Fixes: 0194621b ("IB/rdmavt: Create module framework and handle driver registration")
      Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com>
      Cc: Jubin John <jubin.john@intel.com>
      Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Cc: <stable@vger.kernel.org> # v4.6+
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      378fc320
  3. 03 8月, 2016 12 次提交