1. 06 Sep 2009, 3 commits
  2. 23 Jun 2009, 1 commit
  3. 14 Jun 2009, 1 commit
  4. 28 May 2009, 1 commit
  5. 21 Apr 2009, 1 commit
  6. 07 Apr 2009, 2 commits
  7. 29 Jan 2009, 1 commit
    • IB/mthca: Fix dispatch of IB_EVENT_LID_CHANGE event · 270b8b85
      Committed by Moni Shoua
      When snooping a PortInfo MAD, its client_reregister bit is checked.
      If the bit is ON then a CLIENT_REREGISTER event is dispatched,
      otherwise a LID_CHANGE event is dispatched.  This decision logic
      ignores the cases where the MAD changes the LID along with an
      instruction to reregister (so a necessary LID_CHANGE event won't be
      dispatched) or the MAD is neither of these (and an unnecessary
      LID_CHANGE event will be dispatched).
      
      This causes problems at least with IPoIB, which will do a "light"
      flush on reregister, rather than the "heavy" flush required due to a
      LID change.
      
      Fix this by dispatching a CLIENT_REREGISTER event if the
      client_reregister bit is set, but also compare the LID in the MAD to
      the current LID.  If and only if they are not identical then a
      LID_CHANGE event is dispatched.
      Signed-off-by: Moni Shoua <monis@voltaire.com>
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      270b8b85
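      A minimal sketch of the dispatch rule described above (the struct and
      helper names are illustrative, not the actual mthca code; only the
      IB_EVENT_* values are real):

          /* Dispatch CLIENT_REREGISTER when the bit is set, and LID_CHANGE
           * only when the LID in the MAD actually differs from the current one. */
          static void snoop_port_info(struct example_port *port, u16 mad_lid,
                                      int client_reregister)
          {
                  if (client_reregister)
                          dispatch_event(port, IB_EVENT_CLIENT_REREGISTER);

                  if (mad_lid != port->lid) {
                          port->lid = mad_lid;
                          dispatch_event(port, IB_EVENT_LID_CHANGE);
                  }
          }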
  8. 30 Oct 2008, 1 commit
  9. 29 Oct 2008, 1 commit
  10. 30 Sep 2008, 1 commit
    • IB/mthca: Use pci_request_regions() · 208dde28
      Committed by Roland Dreier
      Back in prehistoric (pre-git!) days, the kernel's MSI-X support did
      request_mem_region() on a device's MSI-X tables, which meant that a
      driver that enabled MSI-X couldn't use pci_request_regions() (since
      that would clash with the PCI layer's MSI-X request).
      
      However, that was removed (by me!) years ago, so mthca can just use
      pci_request_regions() and pci_release_regions() instead of its own
      much more complicated code that avoids requesting the MSI-X tables.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      208dde28
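      A hedged sketch of the probe/remove pattern this enables (the function
      names and resource string are illustrative, not the exact mthca code):

          static int example_probe(struct pci_dev *pdev,
                                   const struct pci_device_id *id)
          {
                  int err;

                  err = pci_enable_device(pdev);
                  if (err)
                          return err;

                  /* Claim every BAR at once; no need to skip the MSI-X tables. */
                  err = pci_request_regions(pdev, "example_hca");
                  if (err)
                          goto err_disable;

                  /* ... map BARs, initialize the HCA ... */
                  return 0;

          err_disable:
                  pci_disable_device(pdev);
                  return err;
          }

          static void example_remove(struct pci_dev *pdev)
          {
                  /* ... tear down the HCA ... */
                  pci_release_regions(pdev);
                  pci_disable_device(pdev);
          }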
  11. 27 Jul 2008, 1 commit
    • dma-mapping: add the device argument to dma_mapping_error() · 8d8bb39b
      Committed by FUJITA Tomonori
      Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER
      architecture does:
      
      This enables us to cleanly fix the Calgary IOMMU issue that some devices
      are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423).
      
      I think that per-device dma_mapping_ops support would be also helpful for
      KVM people to support PCI passthrough but Andi thinks that this makes it
      difficult to support the PCI passthrough (see the above thread).  So I
      CC'ed this to KVM camp.  Comments are appreciated.
      
      A pointer to dma_mapping_ops to struct dev_archdata is added.  If the
      pointer is non NULL, DMA operations in asm/dma-mapping.h use it.  If it's
      NULL, the system-wide dma_ops pointer is used as before.
      
      If it's useful for KVM people, I plan to implement a mechanism to register
      a hook called when a new pci (or dma capable) device is created (it works
      with hot plugging).  It enables IOMMUs to set up an appropriate
      dma_mapping_ops per device.
      
      The major obstacle is that dma_mapping_error doesn't take a pointer to the
      device unlike other DMA operations.  So x86 can't have dma_mapping_ops per
      device.  Note all the POWER IOMMUs use the same dma_mapping_error function
      so this is not a problem for POWER but x86 IOMMUs use different
      dma_mapping_error functions.
      
      The first patch adds the device argument to dma_mapping_error.  The patch
      is trivial but large since it touches lots of drivers and dma-mapping.h in
      all the architectures.
      
      This patch:
      
      dma_mapping_error() doesn't take a pointer to the device unlike other DMA
      operations.  So we can't have dma_mapping_ops per device.
      
      Note that POWER already has dma_mapping_ops per device but all the POWER
      IOMMUs use the same dma_mapping_error function.  x86 IOMMUs use device
      argument.
      
      [akpm@linux-foundation.org: fix sge]
      [akpm@linux-foundation.org: fix svc_rdma]
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: fix bnx2x]
      [akpm@linux-foundation.org: fix s2io]
      [akpm@linux-foundation.org: fix pasemi_mac]
      [akpm@linux-foundation.org: fix sdhci]
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: fix sparc]
      [akpm@linux-foundation.org: fix ibmvscsi]
      Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Muli Ben-Yehuda <muli@il.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Avi Kivity <avi@qumranet.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8d8bb39b
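      A short sketch of the calling convention this patch introduces (the
      buffer and length names are illustrative):

          dma_addr_t addr = dma_map_single(&pdev->dev, buf, len, DMA_TO_DEVICE);

          /* dma_mapping_error() now takes the device, so a per-device
           * dma_mapping_ops implementation can supply its own check;
           * the old form was dma_mapping_error(addr). */
          if (dma_mapping_error(&pdev->dev, addr))
                  return -ENOMEM;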
  12. 23 Jul 2008, 1 commit
    • IB/mthca: Keep free count for MTT buddy allocator · e8bb4beb
      Committed by Roland Dreier
      MTT entries are allocated with a buddy allocator, which just keeps
      bitmaps for each level of the buddy table.  However, all free space
      starts out at the highest order, and small allocations start scanning
      from the lowest order.  When the lowest order tables have no free
      space, this can lead to scanning potentially millions of bits before
      finding a free entry at a higher order.
      
      We can avoid this by just keeping a count of how many free entries
      each order has, and skipping the bitmap scan when an order is
      completely empty.  This provides a nice performance boost for a
      negligible increase in memory usage.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      e8bb4beb
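      A sketch of the shortcut described above, assuming a buddy structure
      with per-order bitmaps plus the new per-order free counts (names are
      illustrative, not the exact mthca allocator):

          /* Find a free segment of at least the requested order, skipping
           * the bitmap scan entirely for any order with nothing free. */
          for (o = order; o <= buddy->max_order; ++o) {
                  if (!buddy->num_free[o])
                          continue;

                  m = 1 << (buddy->max_order - o);
                  seg = find_first_bit(buddy->bits[o], m);
                  if (seg < m)
                          goto found;
          }
          return -1;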
  13. 15 Jul 2008, 6 commits
    • IB/mthca: Fix check of max_send_sge for special QPs · aed01227
      Committed by Roland Dreier
      The MLX transport requires two extra gather entries for sends (one for
      the header and one for the checksum at the end, as the comment says).
      However the code checked that max_recv_sge was not too big, instead of
      checking max_send_sge as it should have.  Fix the code to check the
      correct condition.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      aed01227
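      A sketch of the corrected check (field names approximate): the two
      extra gather entries must be charged against the send side, not the
      receive side.

          /* MLX transport: header + checksum use two extra gather entries. */
          if (qp->sq.max_gs + 2 > dev->limits.max_sg)
                  return -EINVAL;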
    • IB/mthca: Use round_jiffies() for catastrophic error polling timer · c036925a
      Committed by Roland Dreier
      Exactly when the catastrophic error polling timer function runs is not
      important, so use round_jiffies() to save unnecessary wakeups.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      c036925a
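      A one-line sketch of the idea (the timer and interval names are
      illustrative): rounding the expiry to a whole-second jiffy boundary
      lets unrelated timers batch their wakeups.

          mod_timer(&catas_timer,
                    round_jiffies(jiffies + CATAS_POLL_INTERVAL));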
    • IB/mthca: Remove "stop" flag for catastrophic error polling timer · 4522e08c
      Committed by Roland Dreier
      Since we use del_timer_sync() anyway, there's no need for an
      additional flag to tell the timer not to rearm.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      4522e08c
    • IB/mthca: Remove extra code for RESET->ERR QP state transition · d3809ad0
      Committed by Roland Dreier
      Commit b18aad71 ("IB/mthca: Fix RESET to ERROR transition") added some
      extra code to handle a QP state transition from RESET to ERROR.
      However, the latest 1.2.1 version of the IB spec has clarified that
      this transition is actually not allowed, so we can remove this extra
      code again.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      d3809ad0
    • RDMA/core: Add memory management extensions support · 00f7ec36
      Committed by Steve Wise
      This patch adds support for the IB "base memory management extension"
      (BMME) and the equivalent iWARP operations (which the iWARP verbs
      spec requires all devices to implement).  The new operations are:
      
       - Allocate an ib_mr for use in fast register work requests.
      
       - Allocate/free physical buffer lists for use in fast register work
         requests.  This allows device drivers to allocate this memory as
         needed for use in posting send requests (eg via dma_alloc_coherent).
      
       - New send queue work requests:
         * send with remote invalidate
         * fast register memory region
         * local invalidate memory region
         * RDMA read with invalidate local memory region (iWARP only)
      
      Consumer interface details:
      
       - A new device capability flag IB_DEVICE_MEM_MGT_EXTENSIONS is added
         to indicate device support for these features.
      
       - New send work request opcodes IB_WR_FAST_REG_MR, IB_WR_LOCAL_INV,
         IB_WR_RDMA_READ_WITH_INV are added.
      
       - A new consumer API function, ib_alloc_mr() is added to allocate
         fast register memory regions.
      
       - New consumer API functions, ib_alloc_fast_reg_page_list() and
         ib_free_fast_reg_page_list() are added to allocate and free
         device-specific memory for fast registration page lists.
      
       - A new consumer API function, ib_update_fast_reg_key(), is added to
         allow the key portion of the R_Key and L_Key of a fast registration
         MR to be updated.  Consumers call this if desired before posting
         an IB_WR_FAST_REG_MR work request.
      
      Consumers can use this as follows:
      
       - MR is allocated with ib_alloc_mr().
      
       - Page list memory is allocated with ib_alloc_fast_reg_page_list().
      
       - MR R_Key/L_Key "key" field is updated with ib_update_fast_reg_key().
      
       - MR made VALID and bound to a specific page list via
         ib_post_send(IB_WR_FAST_REG_MR)
      
       - MR made INVALID via ib_post_send(IB_WR_LOCAL_INV),
         ib_post_send(IB_WR_RDMA_READ_WITH_INV) or an incoming send with
         invalidate operation.
      
       - MR is deallocated with ib_dereg_mr()
      
       - Page lists are deallocated via ib_free_fast_reg_page_list().
      
      Applications can allocate a fast register MR once, and then can
      repeatedly bind the MR to different physical block lists (PBLs) via
      posting work requests to a send queue (SQ).  For each outstanding
      MR-to-PBL binding in the SQ pipe, a fast_reg_page_list needs to be
      allocated (the fast_reg_page_list is owned by the low-level driver
      from the consumer posting a work request until the request completes).
      Thus pipelining can be achieved while still allowing device-specific
      page_list processing.
      
      The 32-bit fast register memory key/STag is composed of a 24-bit index
      and an 8-bit key.  The application can change the key each time it
      fast registers thus allowing more control over the peer's use of the
      key/STag (ie it can effectively be changed each time the rkey is
      rebound to a page list).
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      00f7ec36
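      A condensed sketch of the consumer flow listed above.  The function
      names follow the commit message; the argument lists and work request
      fields are guesses, not the exact in-tree signatures, and all error
      handling is omitted:

          mr        = ib_alloc_mr(pd, max_pages);
          page_list = ib_alloc_fast_reg_page_list(device, max_pages);

          ib_update_fast_reg_key(mr, next_key);       /* rotate the 8-bit key */

          fr_wr.opcode = IB_WR_FAST_REG_MR;           /* bind MR to page_list */
          ib_post_send(qp, &fr_wr, &bad_wr);

          /* ... peer performs RDMA using mr->rkey ... */

          inv_wr.opcode = IB_WR_LOCAL_INV;            /* or RDMA_READ_WITH_INV,
                                                         or an incoming send
                                                         with invalidate */
          ib_post_send(qp, &inv_wr, &bad_wr);

          ib_free_fast_reg_page_list(page_list);
          ib_dereg_mr(mr);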
    • RDMA: Remove subversion $Id tags · f3781d2e
      Committed by Roland Dreier
      They don't get updated by git and so they're worse than useless.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      f3781d2e
  14. 24 Jun 2008, 1 commit
  15. 17 May 2008, 1 commit
    • IB/mthca: Fix max_sge value returned by query_device · 12103dca
      Committed by Roland Dreier
      The mthca driver returns the maximum number of scatter/gather entries
      returned by the firmware as the max_sge value when device properties
      are queried.  However, the firmware also reports a limit on the
      maximum descriptor size allowed, and because mthca takes into account
      the worst case send request overhead when checking whether to allow a
      QP to be created, the largest number of scatter/gather entries that
      can be used with mthca may be limited by the maximum descriptor size
      rather than just by the actual s/g entry limit.
      
      This means that applications cannot actually create QPs with
      max_send_sge equal to the limit returned by ib_query_device().  Fix
      this by checking if the maximum descriptor size imposes a lower limit
      and if so returning that lower limit.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      12103dca
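      A sketch of the capping described above (variable names are
      illustrative): the advertised limit becomes the smaller of the
      firmware s/g limit and what still fits in a maximum-size descriptor
      after the worst-case send WQE overhead.

          props->max_sge = min_t(int, fw_max_sg,
                                 (fw_max_desc_sz - worst_case_send_overhead) /
                                 sizeof(struct mthca_data_seg));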
  16. 30 Apr 2008, 2 commits
    • IB/mthca: Avoid changing userspace ABI to handle DMA write barrier attribute · baaad380
      Committed by Roland Dreier
      Commit cb9fbc5c ("IB: expand ib_umem_get() prototype") changed the
      mthca userspace ABI to provide a way for userspace to indicate which
      memory regions need the DMA write barrier attribute.  However, it is
      possible to handle this without breaking existing userspace, by having
      the mthca kernel driver recognize whether it is talking to old or new
      userspace, depending on the size of the register MR structure passed in.
      
      The only potential drawback of this is that it allows old userspace
      (which has a bug with DMA ordering on large SGI Altix systems) to
      continue to run on new kernels, but the advantage of allowing old
      userspace to continue to work on unaffected systems seems to outweigh
      this, and we can print a warning to push people to upgrade their
      userspace.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      baaad380
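      A hedged sketch of the detection trick (the struct and field names are
      illustrative, not the actual mthca ABI structures):

          /* Old userspace passes a smaller register-MR command; detect it by
           * size, warn once, and fall back to the pre-existing behaviour. */
          if (udata->inlen < sizeof(struct example_reg_mr_cmd)) {
                  pr_warn_once("old userspace detected, please update libmthca\n");
                  cmd.mr_attrs = 0;          /* no DMA write barrier requested */
          } else if (ib_copy_from_udata(&cmd, udata, sizeof(cmd))) {
                  return ERR_PTR(-EFAULT);
          }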
    • IB/mthca: Avoid recycling old FMR R_Keys too soon · 0bfe151c
      Committed by Olaf Kirch
      When a FMR is unmapped, mthca resets the map count to 0, and clears
      the upper part of the R_Key which is used as the sequence counter.
      
      This poses a problem for RDS, which uses ib_fmr_unmap as a fence
      operation.  RDS assumes that after issuing an unmap, the old R_Keys
      will be invalid for a "reasonable" period of time. For instance,
      Oracle processes use shared memory buffers allocated from a pool of
      buffers.  When a process dies, we want to reclaim these buffers -- but
      we must make sure there are no pending RDMA operations to/from those
      buffers.  The only way to achieve that is by using unmap and sync the
      TPT.
      
      However, when the sequence count is reset on unmap, there is a high
      likelihood that a new mapping will be given the same R_Key that was
      issued a few milliseconds ago.
      
      To prevent this, don't reset the sequence count when unmapping a FMR.
      Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      0bfe151c
  17. 29 Apr 2008, 1 commit
    • IB: expand ib_umem_get() prototype · cb9fbc5c
      Committed by Arthur Kepner
      Add a new parameter, dmasync, to the ib_umem_get() prototype.  Use dmasync = 1
      when mapping user-allocated CQs with ib_umem_get().
      Signed-off-by: Arthur Kepner <akepner@sgi.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
      Cc: Jes Sorensen <jes@sgi.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Roland Dreier <rdreier@cisco.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Grant Grundler <grundler@parisc-linux.org>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cb9fbc5c
  18. 20 Apr 2008, 1 commit
  19. 19 Apr 2008, 1 commit
  20. 17 Apr 2008, 8 commits
    • IB/mthca: Update module version and release date · 940801b2
      Committed by Jack Morgenstein
      The ib_mthca driver has been stable for a while, so bump the version
      number to 1.0 to indicate this.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      940801b2
    • IB/mthca: Update QP state if query QP succeeds · 5121df3a
      Committed by Dotan Barak
      If the QP was moved to another state (such as SQE) by the hardware,
      then after this change the user won't have to set the IBV_QP_CUR_STATE
      mask when issuing a modify QP to recover from that state.
      Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      5121df3a
    • IB/core: Add support for "send with invalidate" work requests · 0f39cf3d
      Committed by Roland Dreier
      Add a new IB_WR_SEND_WITH_INV send opcode that can be used to mark a
      "send with invalidate" work request as defined in the iWARP verbs and
      the InfiniBand base memory management extensions.  Also put "imm_data"
      and a new "invalidate_rkey" member in a new "ex" union in struct
      ib_send_wr. The invalidate_rkey member can be used to pass in an
      R_Key/STag to be invalidated.  Add this new union to struct
      ib_uverbs_send_wr.  Add code to copy the invalidate_rkey field in
      ib_uverbs_post_send().
      
      Fix up low-level drivers to deal with the change to struct ib_send_wr,
      and just remove the imm_data initialization from net/sunrpc/xprtrdma/,
      since that code never does any send with immediate operations.
      
      Also, move the existing IB_DEVICE_SEND_W_INV flag to a new bit, since
      the iWARP drivers currently in the tree set the bit.  The amso1100
      driver at least will silently fail to honor the IB_SEND_INVALIDATE bit
      if passed in as part of userspace send requests (since it does not
      implement kernel bypass work request queueing).  Remove the flag from
      all existing drivers that set it until we know which ones are OK.
      
      The value chosen for the new flag is not consecutive, to avoid clashing
      with flags defined in the XRC patches, which are not merged yet but
      which are already in use and are likely to be merged soon.
      
      This resurrects a patch sent long ago by Mikkel Hagen <mhagen@iol.unh.edu>.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      0f39cf3d
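      A short sketch of posting such a request through the new "ex" union
      (the QP, sge, and rkey values are illustrative):

          struct ib_send_wr wr = {
                  .opcode             = IB_WR_SEND_WITH_INV,
                  .send_flags         = IB_SEND_SIGNALED,
                  .sg_list            = &sge,
                  .num_sge            = 1,
                  .ex.invalidate_rkey = remote_rkey,  /* R_Key/STag to invalidate */
          };
          struct ib_send_wr *bad_wr;

          ret = ib_post_send(qp, &wr, &bad_wr);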
    • IB/core: Add creation flags to struct ib_qp_init_attr · b846f25a
      Committed by Eli Cohen
      Add a create_flags member to struct ib_qp_init_attr that will allow a
      kernel verbs consumer to pass special flags when creating a QP.
      Add a flag value for telling low-level drivers that a QP will be used
      for IPoIB UD LSO.  The create_flags member will also be useful for XRC
      and ehca low-latency QP support.
      
      Since no create_flags handling is implemented yet, add code to all
      low-level drivers to return -EINVAL if create_flags is non-zero.
      Signed-off-by: Eli Cohen <eli@mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      b846f25a
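      A minimal sketch of the interim check that low-level drivers gain (the
      driver function name is illustrative):

          struct ib_qp *example_create_qp(struct ib_pd *pd,
                                          struct ib_qp_init_attr *init_attr,
                                          struct ib_udata *udata)
          {
                  /* No create_flags are handled yet, so reject any that are set. */
                  if (init_attr->create_flags)
                          return ERR_PTR(-EINVAL);

                  /* ... normal QP creation ... */
          }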
    • IB/mthca: Avoid integer overflow when allocating huge ICM table · c263ff65
      Committed by Roland Dreier
      In mthca_alloc_icm_table(), the number of entries to allocate for the
      table->icm array is computed by calculating obj_size * nobj and then
      dividing by MTHCA_TABLE_CHUNK_SIZE.  If nobj is really large, then
      obj_size * nobj may overflow and the division may get the wrong value
      (even a negative value).  Fix this by calculating the number of
      objects per chunk and then dividing nobj by this value instead.
      
      This patch allows crazy configurations such as loading ib_mthca with
      the module parameter num_mtt=33554432 to work properly.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      c263ff65
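      A sketch of the overflow-safe arithmetic described above (variable
      names are illustrative):

          /* Old form: obj_size * nobj can overflow for huge nobj. */
          num_icm = DIV_ROUND_UP(obj_size * nobj, MTHCA_TABLE_CHUNK_SIZE);

          /* New form: divide first, so no huge intermediate product is needed. */
          obj_per_chunk = MTHCA_TABLE_CHUNK_SIZE / obj_size;
          num_icm       = DIV_ROUND_UP(nobj, obj_per_chunk);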
    • IB/mthca: Avoid integer overflow when dealing with profile size · 19773539
      Committed by Roland Dreier
      mthca_make_profile() returns the size in bytes of the HCA context
      layout it creates, or a negative value if an error occurs.  However,
      the return value is declared as u64 and the memfree initialization
      path casts this value to int to test if it is negative.  This makes it
      think incorrectly that an error has occurred if the context size
      happens to be bigger than 2GB, since this turns into a negative int.
      
      Fix this by having mthca_make_profile() return an s64 and testing
      for an error by checking whether this 64-bit value itself is negative.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      19773539
    • IB/mthca: Add IPoIB checksum offload support · 680b575f
      Committed by Eli Cohen
      Arbel and Sinai devices support checksum generation and verification
      of TCP and UDP packets for UD IPoIB messages.  This patch checks if
      the HCA supports this and sets the IB_DEVICE_UD_IP_CSUM capability
      flag if it does.  It implements support for handling the IB_SEND_IP_CSUM
      send flag and setting the csum_ok field in receive work completions.
      Signed-off-by: Eli Cohen <eli@mellnaox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      680b575f
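      A sketch of the consumer side of this feature, with field names as of
      this era of the verbs API (csum_ok was later replaced by a wc_flags
      bit):

          /* Sender: ask the HCA to generate the TCP/UDP checksum. */
          if (dev_attr.device_cap_flags & IB_DEVICE_UD_IP_CSUM)
                  wr.send_flags |= IB_SEND_IP_CSUM;

          /* Receiver: trust the HCA's verification result. */
          if (wc.csum_ok)
                  skb->ip_summed = CHECKSUM_UNNECESSARY;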
    • IB/mthca: Formatting cleanups · b3999393
      Committed by Roland Dreier
      Fix a few whitespace and other coding style problems.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      b3999393
  21. 20 Feb 2008, 1 commit
  22. 13 Feb 2008, 2 commits
  23. 05 Feb 2008, 1 commit