1. 14 Dec 2016, 1 commit
  2. 08 Oct 2016, 2 commits
    • IB/mlx4: Fix possible vl/sl field mismatch in LRH header in QP1 packets · fd10ed8e
      Authored by Jack Morgenstein
      In MLX QP packets, the LRH (built by the driver) has both a VL field
      and an SL field. When building a QP1 packet, the VL field should
      reflect the SLtoVL mapping and not arbitrarily contain zero (as is
      done now). This bug causes credit problems in IB switches at
      high rates of QP1 packets.
      
      The fix is to cache the SL to VL mapping in the driver, and look up
      the VL mapped to the SL provided in the send request when sending
      QP1 packets.
      
      For FW versions which support generating a port_management_config_change
      event with subtype sl-to-vl-table-change, the driver uses that event
      to update its sl-to-vl mapping cache.  Otherwise, the driver snoops
      incoming SMP MADs to update the cache.
      
      There remains the case where the FW is running in secure-host mode
      (so no QP0 packets are delivered to the driver) and does not
      generate the sl2vl mapping change event. To support this case, when
      running in secure-host mode the driver updates its sl2vl mapping cache
      (by querying the FW) when it receives either a Port Up event
      or a client-reregister event (where the port is still up, but there
      may have been an OpenSM failover).
      OpenSM modifies the sl2vl mapping before Port Up and Client-reregister
      events occur, so if there is a mapping change, the driver's cache will
      be properly updated.
      
      Fixes: 225c7b1f ("IB/mlx4: Add a driver for Mellanox ConnectX InfiniBand adapters")
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
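      
      A rough sketch of the caching scheme described above (illustrative only, not the driver's
      code; sl2vl_cache, cache_update_sl2vl and qp1_lookup_vl are hypothetical names):
      
        #include <stdint.h>
        #include <stdatomic.h>
        
        /* Hypothetical per-port cache: 16 SLs, 4 bits of VL each, packed into 64
         * bits so the whole table can be swapped atomically without locking. */
        static _Atomic uint64_t sl2vl_cache;
        
        /* Called from the sl-to-vl-table-change event handler, after snooping an
         * incoming SMP MAD, or after querying the FW in secure-host mode. */
        void cache_update_sl2vl(const uint8_t new_vl[16])
        {
            uint64_t packed = 0;
        
            for (int sl = 0; sl < 16; sl++)
                packed |= (uint64_t)(new_vl[sl] & 0xf) << (4 * sl);
        
            atomic_store(&sl2vl_cache, packed);
        }
        
        /* Called when building the LRH of a QP1 packet: map the SL from the send
         * request to a VL instead of hard-coding VL = 0. */
        uint8_t qp1_lookup_vl(uint8_t sl)
        {
            uint64_t packed = atomic_load(&sl2vl_cache);
        
            return (packed >> (4 * (sl & 0xf))) & 0xf;
        }
      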
    • IB/mlx4: Move user vendor structures · 9ce28a20
      Authored by Leon Romanovsky
      This patch moves the mlx4 vendor-specific structures to the
      common UAPI folder, where they are visible to all consumers.
      
      These structures are used by the user-space library driver
      (libmlx4) and are currently copied into that library manually.
      
      This move will allow cross-compilation against these files and
      simplify the introduction of vendor-specific data.
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
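      
      For context on why such structures belong in a shared UAPI header, a hedged example of
      the kind of ABI-stable layout these headers use (hypothetical struct, not the actual
      mlx4 definition):
      
        #include <stdint.h>
        
        /* UAPI-style structures use fixed-width types and explicit padding so the
         * kernel and libmlx4 agree on the layout on every architecture, instead of
         * keeping two hand-copied definitions in sync. */
        struct example_create_cq_resp {
            uint32_t cqn;
            uint32_t reserved;   /* explicit padding keeps sizeof() stable */
        };
      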
  3. 17 Sep 2016, 2 commits
    • IB/mlx4: Use correct subnet-prefix in QP1 mads under SR-IOV · 8ec07bf8
      Authored by Jack Morgenstein
      When sending QP1 MAD packets which use a GRH, the source GID
      (which consists of the 64-bit subnet prefix and the 64-bit port GUID)
      must be included in the packet GRH.
      
      For SR-IOV, a GID cache is used, since the source GID needs to be the
      slave's source GID and not the Hypervisor's GID. This cache also
      includes a subnet_prefix field. Unfortunately, that field was never
      initialized (to the default subnet prefix 0xfe80::0).
      As a result, this field remained all zeroes.  Therefore, when SR-IOV
      was active, all QP1 packets which included a GRH had a source GID
      subnet prefix of all-zeroes.
      
      However, the subnet-prefix should initially be 0xfe80::0 (the default
      subnet prefix). In addition, if OpenSM modifies a port's subnet prefix,
      the new subnet prefix must be used in the GRH when sending QP1 packets.
      To fix this, we now initialize the subnet prefix in the SR-IOV GID cache
      to the default subnet prefix. We update the cached value if/when OpenSM
      modifies the port's subnet prefix, and we use this cached value when
      sending QP1 packets while SR-IOV is active.
      
      Note that the value is stored as an atomic64. This eliminates any need
      for locking when the subnet prefix is being updated.
      
      Note also that we depend on the FW generating the "port management change"
      event for tracking subnet-prefix changes performed by OpenSM. If running
      early FW (before 2.9.4630), subnet prefix changes will not be tracked (but
      the default subnet prefix will still be stored in the cache; therefore
      users who do not modify the subnet prefix will not have a problem).
      If there is a need for such tracking also for early FW, we will add that
      capability in a subsequent patch.
      
      Fixes: 1ffeb2eb ("IB/mlx4: SR-IOV IB context objects and proxy/tunnel SQP support")
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
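      
      A minimal sketch of the lock-free caching described above (illustrative only;
      cached_subnet_prefix, update_cached_subnet_prefix and build_grh_sgid are hypothetical
      names):
      
        #include <stdint.h>
        #include <stdatomic.h>
        
        #define DEFAULT_SUBNET_PREFIX 0xfe80000000000000ULL   /* fe80::0 */
        
        /* Hypothetical per-port cache entry. Stored atomically so the event handler
         * can update it while the QP1 send path reads it, without any locking. */
        static _Atomic uint64_t cached_subnet_prefix = DEFAULT_SUBNET_PREFIX;
        
        /* Called from the "port management change" event handler when OpenSM
         * reprograms the port's subnet prefix. */
        void update_cached_subnet_prefix(uint64_t new_prefix)
        {
            atomic_store(&cached_subnet_prefix, new_prefix);
        }
        
        /* Called when building the GRH of a QP1 packet under SR-IOV: the source GID
         * is the cached subnet prefix combined with the slave's port GUID. */
        void build_grh_sgid(uint64_t slave_port_guid, uint64_t sgid_out[2])
        {
            sgid_out[0] = atomic_load(&cached_subnet_prefix);  /* upper 64 bits */
            sgid_out[1] = slave_port_guid;                     /* lower 64 bits */
        }
      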
    • IB/mlx4: Fix code indentation in QP1 MAD flow · baa0be70
      Authored by Jack Morgenstein
      The indentation in the QP1 GRH flow in procedure build_mlx_header is
      really confusing. Fix it, in preparation for a commit which touches
      this code.
      
      Fixes: 1ffeb2eb ("IB/mlx4: SR-IOV IB context objects and proxy/tunnel SQP support")
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  4. 20 Jul 2016, 1 commit
  5. 23 Jun 2016, 2 commits
  6. 06 May 2016, 1 commit
    • net/mlx4: Avoid wrong virtual mappings · 73898db0
      Authored by Haggai Abramovsky
      The dma_alloc_coherent() function returns a virtual address which can
      be used for coherent access to the underlying memory.  On some
      architectures, like arm64, undefined behavior results if this memory is
      also accessed via virtual mappings that are not coherent.  Because of
      their undefined nature, operations like virt_to_page() return garbage
      when passed virtual addresses obtained from dma_alloc_coherent().  Any
      subsequent mappings via vmap() of the garbage page values are unusable
      and result in bad things like bus errors (synchronous aborts in ARM64
      speak).
      
      The mlx4 driver contains code that does the equivalent of
      vmap(virt_to_page(dma_alloc_coherent())), which results in an oops when
      the device is opened.
      
      Prevent the Ethernet driver from running this problematic code by forcing
      it to allocate contiguous memory. As for the InfiniBand driver, we first
      try to allocate contiguous memory, but in case of failure we fall back to
      working with fragmented memory.
      Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
      Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
      Reported-by: David Daney <david.daney@cavium.com>
      Tested-by: Sinan Kaya <okaya@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
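      
      A simplified kernel-style sketch of the "contiguous first, fragmented fallback" idea
      described above (not the actual mlx4 buffer-allocation code; struct simple_buf and its
      fields are assumptions):
      
        #include <linux/kernel.h>
        #include <linux/dma-mapping.h>
        #include <linux/mm.h>
        #include <linux/slab.h>
        
        struct simple_buf {
            void        *direct;     /* non-NULL when one contiguous chunk was obtained */
            dma_addr_t   direct_map;
            void       **frags;      /* otherwise, an array of page-sized fragments */
            dma_addr_t  *frag_maps;
            int          nfrags;
        };
        
        /* Try a single coherent allocation; on failure, fall back to page-sized
         * fragments.  The fragmented buffer is never fed to virt_to_page()/vmap(),
         * which is exactly the pattern this commit removes. */
        static int simple_buf_alloc(struct device *dev, size_t size,
                                    struct simple_buf *buf)
        {
            int i;
        
            buf->direct = dma_alloc_coherent(dev, size, &buf->direct_map, GFP_KERNEL);
            if (buf->direct)
                return 0;
        
            buf->nfrags    = DIV_ROUND_UP(size, PAGE_SIZE);
            buf->frags     = kcalloc(buf->nfrags, sizeof(*buf->frags), GFP_KERNEL);
            buf->frag_maps = kcalloc(buf->nfrags, sizeof(*buf->frag_maps), GFP_KERNEL);
            if (!buf->frags || !buf->frag_maps)
                return -ENOMEM;
        
            for (i = 0; i < buf->nfrags; i++) {
                buf->frags[i] = dma_alloc_coherent(dev, PAGE_SIZE,
                                                   &buf->frag_maps[i], GFP_KERNEL);
                if (!buf->frags[i])
                    return -ENOMEM;  /* caller frees via a matching free helper */
            }
            return 0;
        }
      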
  7. 17 Feb 2016, 1 commit
    • net/mlx4_core: Set UAR page size to 4KB regardless of system page size · 85743f1e
      Authored by Huy Nguyen
      Problem description:
      
      The current code sets the UAR page size equal to the system page size.
      The ConnectX-3 and ConnectX-3 Pro HWs require a minimum of 128 UAR pages;
      the mlx4 kernel drivers are not loaded if there are fewer than 128 UAR pages.
      
      Solution:
      
      Always set the UAR page size to 4KB. This allows more UAR pages if the OS
      has a PAGE_SIZE larger than 4KB. For example, the PowerPC kernel uses a
      64KB system page size; with a 4MB UAR region, there are 4MB/2/64KB = 32
      UARs (half for UAR, half for blueflame), which does not meet the minimum
      requirement of 128 UAR pages. With a 4KB UAR page, there are 4MB/2/4KB = 512
      UARs, which meets the minimum requirement.
      
      Note that only the code in mlx4_core that deals with the firmware knows that
      the UAR page size is 4KB. Code that deals with the user page in the CQ and
      QP contexts (mlx4_ib, mlx4_en and part of mlx4_core) still assumes that the
      UAR page size equals the system page size.
      
      Note that with this implementation, on a kernel with a 64KB system page
      size, there are 16 UARs per system page but only one UAR is used. The other
      15 UARs are ignored because of the above assumption.
      
      Regarding SR-IOV, mlx4_core in the hypervisor sets the UAR page size to 4KB,
      and the mlx4_core code in the virtual OS obtains the UAR page size from the
      firmware.
      
      Regarding backward compatibility in SR-IOV: if the hypervisor has this new
      code, the virtual OS must be updated. If the hypervisor has the old code and
      the virtual OS has this new code, the new code is backward compatible with
      the old code. If the UAR region is big enough, the new code in the VF
      continues to work with a 64KB UAR page size (on a PowerPC kernel). If the
      UAR region does not meet the 128-UAR requirement, the new code is not loaded
      in the VF and prints the same error message as the old code in the hypervisor.
      Signed-off-by: Huy Nguyen <huyn@mellanox.com>
      Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
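      
      The arithmetic above, worked as a small standalone example (the 4MB UAR region and the
      even split between UARs and blueflame follow the commit text):
      
        #include <stdio.h>
        
        int main(void)
        {
            const unsigned long uar_region = 4UL * 1024 * 1024;    /* 4MB UAR BAR region */
        
            /* Half of the region is used for UARs, half for blueflame. */
            unsigned long uars_64k = uar_region / 2 / (64 * 1024); /* 64KB UAR pages */
            unsigned long uars_4k  = uar_region / 2 / (4 * 1024);  /*  4KB UAR pages */
        
            printf("64KB UAR page: %lu UARs (< 128, driver refuses to load)\n", uars_64k);
            printf(" 4KB UAR page: %lu UARs (>= 128, requirement met)\n", uars_4k);
            return 0;
        }
      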
  8. 20 Jan 2016, 4 commits
  9. 24 Dec 2015, 3 commits
  10. 23 Dec 2015, 1 commit
  11. 09 Dec 2015, 1 commit
  12. 29 Oct 2015, 2 commits
  13. 22 Oct 2015, 5 commits
  14. 08 Oct 2015, 1 commit
    • IB: split struct ib_send_wr · e622f2f4
      Authored by Christoph Hellwig
      This patch splits up struct ib_send_wr so that all non-trivial verbs
      use their own structure, which embeds struct ib_send_wr.  This dramatically
      shrinks the size of a WR for the most common operations:
      
      sizeof(struct ib_send_wr) (old):	96
      
      sizeof(struct ib_send_wr):		48
      sizeof(struct ib_rdma_wr):		64
      sizeof(struct ib_atomic_wr):		96
      sizeof(struct ib_ud_wr):		88
      sizeof(struct ib_fast_reg_wr):		88
      sizeof(struct ib_bind_mw_wr):		96
      sizeof(struct ib_sig_handover_wr):	80
      
      And with Sagi's pending MR rework the fast registration WR will also be
      down to a reasonable size:
      
      sizeof(struct ib_fastreg_wr):		64
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> [srp, srpt]
      Reviewed-by: Chuck Lever <chuck.lever@oracle.com> [sunrpc]
      Tested-by: Haggai Eran <haggaie@mellanox.com>
      Tested-by: Sagi Grimberg <sagig@mellanox.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
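      
      A simplified illustration of the embedding pattern this patch introduces (field lists
      are abbreviated and the rdma_wr_of() helper name is hypothetical; see
      include/rdma/ib_verbs.h for the real definitions):
      
        #include <stddef.h>
        #include <stdint.h>
        
        /* Abbreviated stand-ins for the real structures. */
        struct ib_send_wr {
            struct ib_send_wr *next;
            uint64_t wr_id;
            int opcode;
            /* ... common fields only ... */
        };
        
        /* Verb-specific WRs embed the common header instead of unioning everything
         * into one oversized struct ib_send_wr. */
        struct ib_rdma_wr {
            struct ib_send_wr wr;
            uint64_t remote_addr;
            uint32_t rkey;
        };
        
        /* Drivers receive a struct ib_send_wr * on the post_send path and recover
         * the containing verb-specific WR (a container_of-style conversion). */
        static inline struct ib_rdma_wr *rdma_wr_of(struct ib_send_wr *wr)
        {
            return (struct ib_rdma_wr *)((char *)wr - offsetof(struct ib_rdma_wr, wr));
        }
      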
  15. 31 Aug 2015, 1 commit
  16. 16 Jun 2015, 2 commits
  17. 16 Apr 2015, 2 commits
  18. 18 Feb 2015, 1 commit
  19. 10 Feb 2015, 1 commit
    • IB/mlx4: Reset flow support for IB kernel ULPs · 35f05dab
      Authored by Yishai Hadas
      The driver exposes interfaces that directly relate to HW state. Upon fatal
      error, consumers of these interfaces (ULPs) that rely on completion of
      all their posted work requests could hang, thereby introducing dependencies
      on shutdown order.  To prevent this from happening, we manage the
      relevant resources (CQs, QPs) that are used by the device. Upon a fatal
      error, we now generate simulated completions for outstanding WQEs that were
      not completed at the time the HW was reset.
      
      This includes invoking the completion event handler for all involved CQs so
      that the ULPs will poll those CQs. When polled, we return simulated CQEs
      with an IB_WC_WR_FLUSH_ERR return code, enabling ULPs to clean up their
      resources and not wait forever for completions upon receiving remove_one.
      
      The above change requires an extra check in the data path to make sure that
      when the device is in an error state, the simulated CQEs will be returned
      and no further WQEs will be posted.
      Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
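      
      A hedged sketch of the poll-path behavior described above (the structures and helper
      are simplified stand-ins; only the IB_WC_WR_FLUSH_ERR status name comes from the
      commit):
      
        #include <stdbool.h>
        #include <stdint.h>
        
        enum wc_status { WC_SUCCESS, IB_WC_WR_FLUSH_ERR };   /* values illustrative */
        
        struct wc { uint64_t wr_id; enum wc_status status; };
        
        struct fake_cq {
            bool      dev_in_error_state;  /* set on catastrophic error / HW reset */
            uint64_t *pending_wr_ids;      /* WQEs posted but never completed      */
            int       num_pending;
        };
        
        /* Once the device hits a fatal error, hand back one simulated completion per
         * outstanding WQE so ULPs can release their resources instead of hanging. */
        int poll_cq_sketch(struct fake_cq *cq, int nwc, struct wc *wcs)
        {
            int polled = 0;
        
            if (!cq->dev_in_error_state)
                return 0;   /* real completions would be polled from HW here */
        
            while (polled < nwc && cq->num_pending > 0) {
                wcs[polled].wr_id  = cq->pending_wr_ids[--cq->num_pending];
                wcs[polled].status = IB_WC_WR_FLUSH_ERR;
                polled++;
            }
            return polled;
        }
      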
  20. 05 Feb 2015, 2 commits
  21. 12 Dec 2014, 2 commits
    • net/mlx4: Add A0 hybrid steering · d57febe1
      Authored by Matan Barak
      A0 hybrid steering is a form of high-performance flow steering.
      In this mode, mlx4 cards use a fast, limited, table-based steering
      mechanism in order to steer unicast packets to a QP quickly.
      
      In order to implement A0 hybrid steering we allocate resources
      from different zones:
      (1) General range
      (2) Special MAC-assigned QPs [RSS, Raw-Ethernet], each with its own region.
      
      When we create an RSS QP or a raw Ethernet (A0-steerable and BF-ready) QP,
      we try hard to allocate the QP from range (2). Otherwise, we try hard not
      to allocate from this range. However, when the system is pushed to its
      limits and one needs every resource, the allocator uses every region it can.
      
      Meaning, when we run out of raw-eth QPs, the allocator allocates from the
      general range (and the special A0 area is no longer active). If we run out
      of RSS QPs, the mechanism tries to allocate from the raw-eth QP zone. If
      that is also exhausted, the allocator will allocate from the general range
      (and the A0 region is no longer active).
      
      Note that if a raw-eth QP is allocated from the general range, the allocator
      attempts to pick a QP number in which bits 6 and 7 (the blueflame bits)
      are not set.
      
      When the feature is used in SR-IOV, the VF has to notify the PF what
      kind of QP attributes it needs. In order to do that, along with the
      "Eth QP blueflame" bit, we reserve a new "A0 steerable QP" bit. Based
      on the combination of these bits, the PF tries to allocate a suitable QP.
      
      In order to maintain backward compatibility (with older PFs), the PF
      notifies which QP attributes it supports via the QUERY_FUNC_CAP command.
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
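      
      A toy sketch of the fallback order described above (zone names, sizes, and helpers are
      hypothetical, not the mlx4 zone allocator):
      
        enum qp_zone { ZONE_GENERAL, ZONE_RSS, ZONE_RAW_ETH, ZONE_COUNT };
        
        /* Toy per-zone allocator: each zone is just [base, base + cap) handed out
         * sequentially; -1 means the zone is exhausted. */
        static int zone_base[ZONE_COUNT] = { 0x200, 0x40, 0x80 };
        static int zone_cap[ZONE_COUNT]  = { 256, 16, 16 };
        static int zone_used[ZONE_COUNT];
        
        static int zone_alloc_qpn(enum qp_zone z)
        {
            if (zone_used[z] >= zone_cap[z])
                return -1;
            return zone_base[z] + zone_used[z]++;
        }
        
        /* Fallback order from the commit text: an RSS QP tries its own zone, then
         * the raw-eth zone, then the general range; a raw-eth QP tries its own
         * zone and then the general range. */
        int alloc_a0_qpn(int is_rss)
        {
            int qpn;
        
            if (is_rss && (qpn = zone_alloc_qpn(ZONE_RSS)) >= 0)
                return qpn;
            if ((qpn = zone_alloc_qpn(ZONE_RAW_ETH)) >= 0)
                return qpn;
            return zone_alloc_qpn(ZONE_GENERAL);
        }
      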
    • net/mlx4: Change QP allocation scheme · ddae0349
      Authored by Eugenia Emantayev
      When using BF (Blue-Flame), the QPN overrides the VLAN, CV, and SV fields
      in the WQE. Thus, BF may only be used for QPNs with bits 6,7 unset.
      
      The current Ethernet driver code reserves a Tx QP range with 256b alignment.
      
      This is wrong because if there are more than 64 Tx QPs in use,
      QPNs >= base + 65 will have bits 6/7 set.
      
      This problem is not specific to the Ethernet driver; any entity that
      tries to reserve more than 64 BF-enabled QPs should fail. Also, using
      ranges is not necessary here and is wasteful.
      
      The new mechanism introduced here will support reservation for
      "Eth QPs eligible for BF" for all drivers: bare-metal, multi-PF, and VFs
      (when hypervisors support WC in VMs). The flow we use is:
      
      1. In mlx4_en, allocate Tx QPs one by one instead of a range allocation,
         and request "BF enabled QPs" if BF is supported for the function
      
      2. In the ALLOC_RES FW command, change param1 to:
      a. param1[23:0]  - number of QPs
      b. param1[31:24] - flags controlling QP reservation
      
      Bit 31 refers to Eth blueflame supported QPs. Those QPs must have
      bits 6 and 7 unset in order to be used in Ethernet.
      
      Bits 24-30 of the flags are currently reserved.
      
      When a function tries to allocate a QP, it states the required attributes
      for this QP. Those attributes are considered "best-effort". If an attribute,
      such as Ethernet BF enabled QP, is a must-have attribute, the function has
      to check that the attribute is supported before trying to do the allocation.
      
      In a lower layer of the code, mlx4_qp_reserve_range masks out the bits
      which are unsupported. If SRIOV is used, the PF validates those attributes
      and masks out unsupported attributes as well. In order to notify VFs which
      attributes are supported, the VF uses the QUERY_FUNC_CAP command. This command's
      mailbox is filled by the PF, which notifies which QP allocation attributes
      it supports.
      Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
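      
      A tiny worked example of the bits 6/7 constraint discussed above (illustrative helper,
      not driver code):
      
        #include <stdio.h>
        
        /* A QPN is eligible for blueflame only when bits 6 and 7 are clear, since a
         * BF send overrides the VLAN/CV/SV fields with the QPN. */
        static int qpn_is_bf_eligible(unsigned int qpn)
        {
            return (qpn & 0xc0) == 0;
        }
        
        int main(void)
        {
            /* In a 256-aligned range, base + 0..63 are BF-eligible but
             * base + 64..255 are not, which is why per-QP allocation with a
             * "BF enabled" attribute replaces the old range reservation. */
            unsigned int base = 0x400;
        
            printf("base+63: %s\n", qpn_is_bf_eligible(base + 63) ? "BF ok" : "no BF");
            printf("base+65: %s\n", qpn_is_bf_eligible(base + 65) ? "BF ok" : "no BF");
            return 0;
        }
      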
  22. 23 Sep 2014, 2 commits