1. 10 May 2011 (1 commit)
  2. 27 Apr 2011 (1 commit)
  3. 31 Mar 2011 (1 commit)
  4. 25 Mar 2011 (1 commit)
  5. 24 Mar 2011 (2 commits)
  6. 23 Mar 2011 (1 commit)
    • IB: Increase DMA max_segment_size on Mellanox hardware · 7f9e5c48
      David Dillow committed
      By default, each device is assumed to be able to handle only 64 KB
      chunks during DMA. By giving the segment size a larger value, the
      block layer will coalesce more S/G entries together for SRP,
      allowing larger requests with the same sg_tablesize setting.  The
      block layer is the only direct user of it, though a few IOMMU
      drivers reference it as well for their *_map_sg coalescing code:
      pci-gart_64 on x86, and a smattering on sparc, powerpc, and ia64.
      
      Since other IB protocols could potentially see larger segments with
      this, let's check those:
      
       - iSER is fine, because it limits its maximum request size to 512
         KB, so we'll never overrun the page vector in struct iser_page_vec
         (128 entries currently). It is independent of the DMA segment size,
         and handles multi-page segments already.
      
       - IPoIB is fine, as it maps each page individually, and doesn't use
         ib_dma_map_sg().
      
       - RDS appears to do the right thing and has no dependencies on DMA
         segment size, but I don't claim to have done a complete audit.
      
       - NFSoRDMA and 9p are OK -- they do not use ib_dma_map_sg(), so
         they don't care about the coalescing.
      
       - Lustre's ko2iblnd does not care about coalescing -- it properly
         walks the returned sg list.
      
      This patch ups the value on Mellanox hardware to 1 GB, which matches
      reported firmware limits on mlx4.
      Signed-off-by: David Dillow <dillowda@ornl.gov>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
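      The knob itself is a one-line, per-device hint. A minimal sketch of
      a probe path raising it, assuming pdev is the PCI device being set
      up (1 GB matches the mlx4 firmware limit cited above; the helper
      name raise_dma_seg_limit is illustrative):

          #include <linux/pci.h>
          #include <linux/dma-mapping.h>

          static void raise_dma_seg_limit(struct pci_dev *pdev)
          {
                  /* default is 64 KB; advertising 1 GB lets the block
                   * layer merge far more S/G entries per segment */
                  dma_set_max_seg_size(&pdev->dev, 1024 * 1024 * 1024);
          }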
  7. 19 Mar 2011 (1 commit)
  8. 18 Mar 2011 (3 commits)
  9. 16 Mar 2011 (10 commits)
    • IB/srp: try to use larger FMR sizes to cover our mappings · be8b9814
      David Dillow committed
      Now that we can get larger SG lists, we can take advantage of HCAs that
      allow us to use larger FMR sizes. In many cases, we can use up to 512
      entries, so start there and work our way down.
      Signed-off-by: David Dillow <dillowda@ornl.gov>
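      A minimal sketch of that step-down, assuming param is an otherwise
      initialized struct ib_fmr_pool_param (the helper name is
      illustrative; field and function names follow the FMR pool API of
      this era):

          #include <rdma/ib_fmr_pool.h>

          static struct ib_fmr_pool *
          create_widest_fmr_pool(struct ib_pd *pd,
                                 struct ib_fmr_pool_param *param)
          {
                  struct ib_fmr_pool *pool = ERR_PTR(-ENOMEM);

                  /* start at 512 pages per FMR, halving until the
                   * HCA accepts the pool */
                  for (param->max_pages_per_fmr = 512;
                       param->max_pages_per_fmr >= 2;
                       param->max_pages_per_fmr /= 2) {
                          pool = ib_create_fmr_pool(pd, param);
                          if (!IS_ERR(pool))
                                  break;
                  }
                  return pool;
          }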
    • IB/srp: add support for indirect tables that don't fit in SRP_CMD · c07d424d
      David Dillow committed
      This allows us to guarantee the ability to submit up to 8 MB requests
      based on the current value of SCSI_MAX_SG_CHAIN_SEGMENTS. While FMR will
      usually condense the requests into 8 SG entries, it is imperative that
      the target support external tables in case the FMR mapping fails or is
      not supported.
      
      We add a safety valve to allow targets without the needed support to
      reap the benefits of the large tables, but fail in a manner that lets
      the user know that the data didn't make it to the device. The user must
      add "allow_ext_sg=1" to the target parameters to indicate that the
      target has the needed support.
      
      If indirect_sg_entries is not specified in the module options, then
      the sg_tablesize for the target will default to cmd_sg_entries unless
      overridden by the target options.
      Signed-off-by: David Dillow <dillowda@ornl.gov>
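      As a usage sketch, the flag rides along in the add_target string
      (every value below is a placeholder):

          echo "id_ext=200100a0b8130611,ioc_guid=00a0b80200402bd9,\
          dgid=fe800000000000000002c90200402bd5,pkey=ffff,\
          service_id=200100a0b8130611,allow_ext_sg=1,sg_tablesize=2048" \
              > /sys/class/infiniband_srp/srp-mlx4_0-1/add_target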
    • IB/srp: rework mapping engine to use multiple FMR entries · 8f26c9ff
      David Dillow committed
      Instead of forcing all of the S/G entries to fit in one FMR, and falling
      back to indirect descriptors if that fails, allow the use of as many
      FMRs as needed to map the request. This lays the groundwork for allowing
      indirect descriptor tables that are larger than can fit in the command
      IU, but should marginally improve performance now by reducing the number
      of indirect descriptors needed.
      
      We increase the minimum page size for the FMR pool to 4K, as larger
      pages help increase the coverage of each FMR, and it is rare that the
      kernel would send down a request with scattered 512 byte fragments.
      
      This patch also moves some of the target initialization code after
      the parsing of options, to keep it together with the new code that
      needs to allocate memory based on the options given.
      Signed-off-by: David Dillow <dillowda@ornl.gov>
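      Conceptually, the reworked engine walks the S/G list and packs
      pages into the current FMR, flushing to a fresh one whenever it
      fills up or a discontinuity appears. A simplified sketch, with the
      hypothetical helpers srp_map_fmr() and srp_add_page() standing in
      for the real state machine:

          for_each_sg(scat, sg, count, i) {
                  u64 addr = ib_sg_dma_address(ibdev, sg);
                  unsigned int len = ib_sg_dma_len(ibdev, sg);

                  while (len) {
                          if (state->npages == max_pages_per_fmr)
                                  srp_map_fmr(state);  /* flush full FMR */
                          srp_add_page(state, addr);   /* one 4K page */
                          addr += fmr_page_size;
                          len  -= min_t(unsigned int, len, fmr_page_size);
                  }
          }
          srp_map_fmr(state);  /* flush the final, partial FMR */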
    • IB/srp: allow sg_tablesize to be set for each target · 49248644
      David Dillow committed
      Different configurations of target software allow differing max sizes of
      the command IU. Allowing this to be changed per-target allows all
      targets on an initiator to get an optimal setting.
      
      We deprecate srp_sg_tablesize and replace it with cmd_sg_entries in
      preparation for allowing more indirect descriptors than can fit in the
      IU.
      Signed-off-by: David Dillow <dillowda@ornl.gov>
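      A usage sketch (values are illustrative): the module option sets
      the per-initiator default, and the per-target string overrides it:

          # default for all targets created by this initiator
          modprobe ib_srp cmd_sg_entries=64

          # per-target override in the add_target string
          echo "...,cmd_sg_entries=255,..." \
              > /sys/class/infiniband_srp/srp-mlx4_0-1/add_target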
    • IB/srp: move IB CM setup completion into its own function · 961e0be8
      David Dillow committed
      This is to clean up prior to further changes.
      Signed-off-by: David Dillow <dillowda@ornl.gov>
    • IB/srp: always avoid non-zero offsets into an FMR · 8c4037b5
      David Dillow committed
      It is unclear exactly how this code works around Mellanox SRP
      targets, or whether the problem is on the target side or in the
      HCA itself. Out of an abundance of caution, we should always
      enable the workaround.
      Signed-off-by: David Dillow <dillowda@ornl.gov>
    • RDMA/cma: Replace global lock in rdma_destroy_id() with id-specific one · a396d43a
      Sean Hefty committed
      rdma_destroy_id currently uses the global rdma cm 'lock' to test if an
      rdma_cm_id has been bound to a device.  This prevents an active
      address resolution callback handler from assigning a device to the
      rdma_cm_id after rdma_destroy_id checks for one.
      
      Instead, we can replace the use of the global lock around the check to
      the rdma_cm_id device pointer by setting the id state to destroying,
      then flushing all active callbacks.  The latter is accomplished by
      acquiring and releasing the handler_mutex.  Any active handler will
      complete first, and any newly scheduled handlers will find the
      rdma_cm_id in an invalid state.
      
      In addition to optimizing the current locking scheme, the use of the
      rdma_cm_id mutex is a more intuitive synchronization mechanism than
      that of the global lock.  These changes are based on feedback from
      Doug Ledford <dledford@redhat.com> while he was trying to debug a
      crash in the rdma cm destroy path.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
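      A minimal sketch of the destroy-side flush, assuming the
      rdma_id_private layout of this era (the function name is
      illustrative); the empty lock/unlock pair is the whole trick:

          static void cma_wait_for_handlers(struct rdma_id_private *id_priv)
          {
                  /* a running handler holds handler_mutex, so taking it
                   * here waits that handler out; handlers that start
                   * later see the destroying state and bail */
                  mutex_lock(&id_priv->handler_mutex);
                  mutex_unlock(&id_priv->handler_mutex);
          }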
    • IB/cm: Cancel pending LAP message when exiting IB_CM_ESTABLISH state · 8d8ac865
      Sean Hefty committed
      This problem was reported by Moni Shoua <monis@mellanox.com> and Amir
      Vadai <amirv@mellanox.com>:
      
      	When destroying a cm_id from the context of a work queue, and
      	if the lap_state of this cm_id is IB_CM_LAP_SENT, we need to
      	release the reference on this id that was taken upon the send
      	of the LAP message.  Otherwise, if the expected APR message
      	gets lost, it is only after a long time that the reference
      	will be released, and during that time the work handler
      	thread is not available to process other things.
      
      It turns out that we need to cancel any pending LAP messages whenever
      we transition out of the IB_CM_ESTABLISH state.  This occurs when
      disconnecting - either sending or receiving a DREQ.  It can also
      happen in a corner case where we receive a REJ message after sending
      an RTU, followed by a LAP.  Add checks and cancel any outstanding LAP
      messages in these three cases.
      
      Canceling the LAP when sending a DREQ fixes the destroy problem
      reported by Moni.  When a cm_id is destroyed in the IB_CM_ESTABLISHED
      state, it sends a DREQ to the remote side to notify the peer that the
      connection is going away.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
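      A hedged sketch of the cancellation at one such exit point,
      assuming cm_id_priv->msg still references the LAP's MAD send
      buffer:

          /* leaving IB_CM_ESTABLISHED: a LAP still in flight would pin
           * the cm_id until the APR timeout, so cancel it now */
          if (cm_id_priv->id.lap_state == IB_CM_LAP_SENT)
                  ib_cancel_mad(cm_id_priv->av.port->mad_agent,
                                cm_id_priv->msg);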
    • IB/cm: Bump reference count on cm_id before invoking callback · 29963437
      Sean Hefty committed
      When processing a SIDR REQ, the ib_cm allocates a new cm_id.  The
      refcount of the cm_id is initialized to 1.  However, cm_process_work
      will decrement the refcount after invoking all callbacks.  The result
      is that the cm_id will end up with refcount set to 0 by the end of the
      sidr req handler.
      
      If a user tries to destroy the cm_id, the destruction will proceed,
      under the incorrect assumption that no other threads are referencing
      the cm_id.  This can lead to a crash when the cm callback thread tries
      to access the cm_id.
      
      This problem was noticed as part of a larger investigation of
      kernel crashes in the rdma_cm when running on a real-time OS.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Acked-by: Doug Ledford <dledford@redhat.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
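      The shape of the fix, in sketch form: since cm_process_work()
      drops one reference after the callbacks run, the handler pins the
      id first:

          atomic_inc(&cm_id_priv->refcount); /* pin across callbacks */
          cm_process_work(cm_id_priv, work); /* drops one ref when done */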
    • RDMA/cma: Fix crash in request handlers · 25ae21a1
      Sean Hefty committed
      Doug Ledford and Red Hat reported a crash when running the rdma_cm on
      a real-time OS.  The crash has the following call trace:
      
          cm_process_work
             cma_req_handler
                cma_disable_callback
                rdma_create_id
                   kzalloc
                   init_completion
                cma_get_net_info
                cma_save_net_info
                cma_any_addr
                   cma_zero_addr
                rdma_translate_ip
                   rdma_copy_addr
                cma_acquire_dev
                   rdma_addr_get_sgid
                   ib_find_cached_gid
                   cma_attach_to_dev
                ucma_event_handler
                   kzalloc
                   ib_copy_ah_attr_to_user
                cma_comp
      
      [ preempted ]
      
          cma_write
              copy_from_user
              ucma_destroy_id
                 copy_from_user
                 _ucma_find_context
                 ucma_put_ctx
                 ucma_free_ctx
                    rdma_destroy_id
                       cma_exch
                       cma_cancel_operation
                       rdma_node_get_transport
      
              rt_mutex_slowunlock
              bad_area_nosemaphore
              oops_enter
      
      They were able to reproduce the crash multiple times with the
      following details:
      
          Crash seems to always happen on the:
                  mutex_unlock(&conn_id->handler_mutex);
          as conn_id looks to have been freed during this code path.
      
      An examination of the code shows that a race exists in the request
      handlers.  When a new connection request is received, the rdma_cm
      allocates a new connection identifier.  This identifier has a single
      reference count on it.  If a user calls rdma_destroy_id() from another
      thread after receiving a callback, rdma_destroy_id will proceed to
      destroy the id and free the associated memory.  However, the request
      handlers may still be in the process of running.  When control returns
      to the request handlers, they can attempt to access the newly created
      identifiers.
      
      Fix this by holding a reference on the newly created rdma_cm_id until
      the request handler is through accessing it.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Acked-by: Doug Ledford <dledford@redhat.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
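      The same pin-while-calling-up pattern, sketched for the request
      handler (names follow cma.c of this era):

          /* keep the new id alive for the duration of the upcall, so a
           * concurrent rdma_destroy_id() can free it only after the
           * handler lets go */
          atomic_inc(&conn_id->refcount);
          ret = conn_id->id.event_handler(&conn_id->id, &event);
          cma_deref_id(conn_id);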
  10. 15 Mar 2011 (10 commits)
  11. 13 Mar 2011 (3 commits)
  12. 03 Mar 2011 (1 commit)
  13. 02 Mar 2011 (2 commits)
  14. 25 Feb 2011 (1 commit)
  15. 23 Feb 2011 (1 commit)
  16. 18 Feb 2011 (1 commit)
    • IB/qib: Prevent double completions after a timeout or RNR error · c0af2c05
      Mike Marciniszyn committed
      There is a double completion associated with error handling for RC QPs.
      
      The sequence is:
      
       - The do_rc_ack() routine fields an RNR nack and there are 0
         rnr_retries configured on the QP.
       - qib_error_qp() stops the pending timer
       - qib_rc_send_complete() is called from sdma_complete()
       - qib_rc_send_complete() starts the timer because the msb of the psn
         just completed says an ack is needed.
       - a bunch of flushes occur as ipoib posts WQEs to an error'ed QP
       - rc_timeout() calls qib_restart_rc()
       - qib_restart_rc() calls qib_send_complete() with an
         IB_WC_RETRY_EXC_ERR on a wqe that has already been completed in
         the past
      
      The fix avoids starting the timer since another packet will never
      arrive.
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
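      A hedged sketch of the guard, assuming qib's state-ops conventions
      (ib_qib_state_ops, QIB_PROCESS_RECV_OK): arm the retry timer only
      while the QP can still receive, since an error'ed QP will never
      see the awaited ACK:

          if (ib_qib_state_ops[qp->state] & QIB_PROCESS_RECV_OK)
                  start_timer(qp); /* otherwise a later rc_timeout()
                                    * would complete the WQE twice */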