1. 10 Dec, 2009 1 commit
  2. 20 Nov, 2009 3 commits
    • RDMA/cm: fix loopback address support · 6f8372b6
      Sean Hefty committed
      The RDMA CM is intended to support the use of a loopback address
      when establishing a connection; however, the behavior of the CM
      when loopback addresses are used is confusing and does not always
      work, depending on whether loopback was specified by the server,
      the client, or both.
      
      The defined behavior of rdma_bind_addr is to associate an RDMA
      device with an rdma_cm_id, as long as the user specified a
      non-zero address (i.e., they weren't just trying to reserve a port).
      Currently, if the loopback address is passed to rdma_bind_addr,
      no device is associated with the rdma_cm_id.  Fix this.
      
      If a loopback address is specified by the client as the destination
      address for a connection, it will fail to establish a connection.
      This is true even if the server is listening across all addresses or
      on the loopback address itself.  The issue is that the server tries
      to translate the IP address carried in the REQ message to a local
      net_device address, which fails.  The translation is not needed in
      this case, since the REQ carries the actual HW address that should
      be used.
      
      Finally, clean up loopback support to be more transport neutral.
      Replace separate calls to get/set the sgid and dgid from the
      device address to a single call that behaves correctly depending
      on the format of the device address.  And support both IPv4 and
      IPv6 address formats.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      
      [ Fixed RDS build by s/ib_addr_get/rdma_addr_get/  - Roland ]
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
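The address checks that 6f8372b6 relies on (zero address versus loopback address, for both IPv4 and IPv6) can be illustrated in user space. This is a minimal sketch, not the kernel's code: the `sketch_*` helpers are hypothetical names, and the only rule modelled is the one stated in the commit message, namely that a bind to any non-zero address, loopback included, should end up with a device associated.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* True for 0.0.0.0 or ::, i.e. the caller only wants to reserve a port. */
static inline bool sketch_zero_addr(const struct sockaddr *sa)
{
    if (sa->sa_family == AF_INET)
        return ((const struct sockaddr_in *)sa)->sin_addr.s_addr ==
               htonl(INADDR_ANY);
    if (sa->sa_family == AF_INET6)
        return IN6_IS_ADDR_UNSPECIFIED(
            &((const struct sockaddr_in6 *)sa)->sin6_addr);
    return false;
}

/* True for 127.0.0.0/8 or ::1. */
static inline bool sketch_loopback_addr(const struct sockaddr *sa)
{
    if (sa->sa_family == AF_INET)
        return (ntohl(((const struct sockaddr_in *)sa)->sin_addr.s_addr)
                >> 24) == 127;
    if (sa->sa_family == AF_INET6)
        return IN6_IS_ADDR_LOOPBACK(
            &((const struct sockaddr_in6 *)sa)->sin6_addr);
    return false;
}

/* Rule from the commit message: every non-zero address, including
 * loopback, should result in a device being associated on bind. */
static inline bool sketch_bind_needs_device(const struct sockaddr *sa)
{
    return !sketch_zero_addr(sa);
}
```

With these helpers, a bind to 127.0.0.1 or ::1 is classified as loopback and still reports that a device is needed, while a bind to 0.0.0.0 does not.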
    • IB/addr: Store net_device type instead of translating to RDMA transport · c4315d85
      Sean Hefty committed
      The struct rdma_dev_addr stores net_device address information:
      the source device address, destination hardware address, and
      broadcast address.  For consistency, store the net_device type
      rather than converting it to the rdma_node_type.
      
      The type indicates the format of the various hardware addresses,
      which is what we're concerned with, and not the RDMA node type
      that the address may map to.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • RDMA/cma: Replace net_device pointer with index · 6266ed6e
      Sean Hefty committed
      Provide the device interface when resolving route information to
      ensure that the correct outbound device is used.  This will also
      simplify processing of sin6_scope_id for IPv6 support.
      
      Based on work from:
      David Wilder <dwilder@us.ibm.com>
      Jason Gunthorpe <jgunthrope@obsidianresearch.com>
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  3. 17 Nov, 2009 1 commit
    • RDMA/ucma: Add option to manually set IB path · a7ca1f00
      Sean Hefty committed
      Export rdma_set_ib_paths to user space to allow applications to
      manually set the IB path used for connections.  This allows
      alternative ways for a user space application or library to obtain
      path record information, including retrieving path information
      from cached data, avoiding direct interaction with the IB SA.
      The IB SA is a single, centralized entity that can limit scaling
      on large clusters running MPI applications.
      
      Future changes to the rdma cm can expand on this framework to
      support the full range of features allowed by the IB CM, such as
      separate forward and reverse paths and APM.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  4. 21 Sep, 2009 1 commit
  5. 28 Feb, 2009 1 commit
    • IB/mad: Fix RMPP header RRespTime manipulation · 7020cb0f
      Ramachandra K committed
      Fix ib_set_rmpp_flags() to use the correct bit mask for RRespTime.  In
      the 8-bit field of the RMPP header, the first 5 bits are RRespTime and
      next 3 bits are RMPPFlags. Hence to retain the first 5 bits, the mask
      should be 0xF8 instead of 0xF1.
      
      ack_recv()-->format_ack() calls ib_set_rmpp_flags() and due to the
      incorrect ANDing with 0xF1, RRespTime got changed incorrectly and RMPP
      Acks sent back always had a RRespTime of 0x1E (30) which caused the
      other end to consider the time outs to be approximately 4297 seconds
      (i.e. in the order of 4*2^30) instead of the usual ~4 seconds (order
      of 4*2^20).
      Signed-off-by: Ramachandra K <ramachandra.kuchimanchi@qlogic.com>
      Acked-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
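The mask arithmetic behind 7020cb0f can be checked with a few lines of C. This is a user-space sketch: the `rmpp_*` helpers are hypothetical names that operate on a bare byte rather than on the kernel's RMPP header structure, and the starting RRespTime of 0x1F is an assumption chosen to reproduce the 0x1E value mentioned in the commit message.

```c
#include <stdint.h>

#define RMPP_FLAGS_MASK    0x07u  /* low 3 bits: RMPPFlags */
#define RMPP_RESPTIME_MASK 0xF8u  /* high 5 bits: RRespTime */

/* Extract the 5-bit RRespTime from the combined 8-bit field. */
static inline uint8_t rmpp_get_resptime(uint8_t rtime_flags)
{
    return (uint8_t)(rtime_flags >> 3);
}

/* Extract the 3-bit RMPPFlags from the combined 8-bit field. */
static inline uint8_t rmpp_get_flags(uint8_t rtime_flags)
{
    return (uint8_t)(rtime_flags & RMPP_FLAGS_MASK);
}

/* Corrected update: keep all five RRespTime bits, replace the flags. */
static inline uint8_t rmpp_set_flags(uint8_t rtime_flags, uint8_t flags)
{
    return (uint8_t)((rtime_flags & RMPP_RESPTIME_MASK) |
                     (flags & RMPP_FLAGS_MASK));
}

/* Pre-fix behaviour for comparison: 0xF1 drops the lowest RRespTime
 * bit (bit 3) and wrongly preserves the old flag bit 0. */
static inline uint8_t rmpp_set_flags_buggy(uint8_t rtime_flags, uint8_t flags)
{
    return (uint8_t)((rtime_flags & 0xF1u) | (flags & RMPP_FLAGS_MASK));
}
```

Starting from a field of 0xF8 (RRespTime 0x1F, no flags) and setting flag 0x1, the corrected helper leaves RRespTime at 0x1F, while the 0xF1 variant turns it into 0x1E.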
  6. 15 Feb, 2009 1 commit
  7. 18 Jan, 2009 1 commit
  8. 05 Aug, 2008 1 commit
    • RDMA/cma: Remove padding arrays by using struct sockaddr_storage · 3f446754
      Roland Dreier committed
      There are a few places where the RDMA CM code handles IPv6 by doing
      
      	struct sockaddr		addr;
      	u8			pad[sizeof(struct sockaddr_in6) -
      				    sizeof(struct sockaddr)];
      
      This is fragile and ugly; handle this in a better way with just
      
      	struct sockaddr_storage	addr;
      
      [ Also roll in patch from Aleksey Senin <alekseys@voltaire.com> to
        switch to struct sockaddr_storage and get rid of padding arrays in
        struct rdma_addr. ]
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
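The reason struct sockaddr_storage can replace the hand-written padding is that it is defined to be large enough for any supported address family. The following user-space sketch illustrates this; `struct example_addr` and `example_set_src()` are hypothetical names for illustration and are not taken from the RDMA CM code.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical container that must hold either an IPv4 or IPv6 address. */
struct example_addr {
    struct sockaddr_storage src_addr;  /* no manual pad[] needed */
};

/* Copy an IPv4 or IPv6 socket address into the storage.
 * Returns the number of bytes copied, or 0 for an unsupported family. */
static inline size_t example_set_src(struct example_addr *a,
                                     const struct sockaddr *src)
{
    size_t len;

    switch (src->sa_family) {
    case AF_INET:
        len = sizeof(struct sockaddr_in);
        break;
    case AF_INET6:
        len = sizeof(struct sockaddr_in6);
        break;
    default:
        return 0;
    }

    memset(&a->src_addr, 0, sizeof(a->src_addr));
    memcpy(&a->src_addr, src, len);
    return len;
}
```

Callers cast `&a->src_addr` back to `struct sockaddr_in *` or `struct sockaddr_in6 *` after checking `ss_family`, which is the usual sockets idiom.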
  9. 27 Jul, 2008 1 commit
    • dma-mapping: add the device argument to dma_mapping_error() · 8d8bb39b
      FUJITA Tomonori committed
      Add per-device dma_mapping_ops support for CONFIG_X86_64, as the POWER
      architecture does.
      
      This enables us to cleanly fix the Calgary IOMMU issue that some devices
      are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423).
      
      I think that per-device dma_mapping_ops support would also be helpful for
      KVM people to support PCI passthrough, but Andi thinks that this makes it
      difficult to support PCI passthrough (see the above thread).  So I
      CC'ed this to the KVM camp.  Comments are appreciated.
      
      A pointer to dma_mapping_ops is added to struct dev_archdata.  If the
      pointer is non-NULL, DMA operations in asm/dma-mapping.h use it.  If it's
      NULL, the system-wide dma_ops pointer is used as before.
      
      If it's useful for KVM people, I plan to implement a mechanism to register
      a hook called when a new pci (or dma capable) device is created (it works
      with hot plugging).  It enables IOMMUs to set up an appropriate
      dma_mapping_ops per device.
      
      The major obstacle is that dma_mapping_error doesn't take a pointer to the
      device unlike other DMA operations.  So x86 can't have dma_mapping_ops per
      device.  Note all the POWER IOMMUs use the same dma_mapping_error function
      so this is not a problem for POWER but x86 IOMMUs use different
      dma_mapping_error functions.
      
      The first patch adds the device argument to dma_mapping_error.  The patch
      is trivial but large since it touches lots of drivers and dma-mapping.h in
      all the architectures.
      
      This patch:
      
      dma_mapping_error() doesn't take a pointer to the device unlike other DMA
      operations.  So we can't have dma_mapping_ops per device.
      
      Note that POWER already has dma_mapping_ops per device but all the POWER
      IOMMUs use the same dma_mapping_error function.  x86 IOMMUs use the device
      argument.
      
      [akpm@linux-foundation.org: fix sge]
      [akpm@linux-foundation.org: fix svc_rdma]
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: fix bnx2x]
      [akpm@linux-foundation.org: fix s2io]
      [akpm@linux-foundation.org: fix pasemi_mac]
      [akpm@linux-foundation.org: fix sdhci]
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: fix sparc]
      [akpm@linux-foundation.org: fix ibmvscsi]
      Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Muli Ben-Yehuda <muli@il.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Avi Kivity <avi@qumranet.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
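The lookup rule described in 8d8bb39b (use the per-device ops pointer when it is non-NULL, otherwise fall back to the system-wide one) is why dma_mapping_error() needs the device. The sketch below models that in user space. All `sketch_*` names are hypothetical, the ops table is reduced to a single callback, and the two "bad address" sentinels are invented for illustration only.

```c
#include <stddef.h>

struct sketch_device;

/* Trimmed-down ops table: only the mapping-error check is modelled. */
struct sketch_dma_ops {
    int (*mapping_error)(struct sketch_device *dev, unsigned long dma_addr);
};

struct sketch_device {
    const struct sketch_dma_ops *dma_ops;  /* NULL: use the system-wide ops */
};

/* Two implementations with different (made-up) error sentinels. */
static int sketch_nommu_error(struct sketch_device *dev, unsigned long addr)
{
    (void)dev;
    return addr == 0;
}

static int sketch_iommu_error(struct sketch_device *dev, unsigned long addr)
{
    (void)dev;
    return addr == ~0UL;
}

static const struct sketch_dma_ops sketch_nommu_ops = { sketch_nommu_error };
static const struct sketch_dma_ops sketch_iommu_ops = { sketch_iommu_error };

/* System-wide default, used when a device has no ops of its own. */
static const struct sketch_dma_ops *sketch_system_ops = &sketch_nommu_ops;

static inline const struct sketch_dma_ops *
sketch_get_dma_ops(struct sketch_device *dev)
{
    if (dev && dev->dma_ops)
        return dev->dma_ops;
    return sketch_system_ops;
}

/* Without the dev argument there would be no way to pick the right
 * implementation when devices use different ops. */
static inline int sketch_dma_mapping_error(struct sketch_device *dev,
                                           unsigned long dma_addr)
{
    return sketch_get_dma_ops(dev)->mapping_error(dev, dma_addr);
}
```

A device behind the hypothetical IOMMU and a device using the default ops can then disagree about whether the same address value signals an error, which is the situation the commit message describes for x86.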
  10. 23 Jul, 2008 2 commits
  11. 15 Jul, 2008 7 commits
    • RDMA/addr: Keep pointer to netdevice in struct rdma_dev_addr · 64c5e613
      Or Gerlitz committed
      Keep a pointer to the local (src) netdevice in struct rdma_dev_addr,
      and copy it in as part of rdma_copy_addr().  Use rdma_translate_ip()
      in cma_new_conn_id() to reduce some code duplication and also make
      sure the src_dev member gets set.
      
      In a high-availability configuration the netdevice pointer can be used
      by the RDMA CM to align RDMA sessions to use the same links as the IP
      stack does under fail-over and route change cases.
      Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • RDMA/core: Add local DMA L_Key support · 96f15c03
      Steve Wise committed
      - Change the IB_DEVICE_ZERO_STAG flag to the transport-neutral name
        IB_DEVICE_LOCAL_DMA_LKEY, which is used by iWARP RNICs to indicate 0
        STag support and IB HCAs to indicate reserved L_Key support.
      
      - Add a u32 local_dma_lkey member to struct ib_device.  Drivers fill
        this in with the appropriate local DMA L_Key (if they support it).
      
      - Fix up the drivers using this flag.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • IB/core: Add support for multicast loopback blocking · 47ee1b9f
      Ron Livne committed
      This patch also adds a creation flag for QPs,
      IB_QP_CREATE_MULTICAST_BLOCK_LOOPBACK, which when set means that
      multicast sends from the QP to a group that the QP is attached to will
      not be looped back to the QP's receive queue.  This can be used to
      save receive resources when a consumer does not want a local copy of
      multicast traffic; for example IPoIB must waste CPU time throwing away
      such local copies of multicast traffic.
      
      This patch also adds a device capability flag that shows whether a
      device supports this feature or not.
      Signed-off-by: Ron Livne <ronli@voltaire.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • RDMA/core: Add iWARP protocol statistics attributes in sysfs · 7f624d02
      Steve Wise committed
      This patch adds a sysfs attribute group called "proto_stats" under
      /sys/class/infiniband/$device/ and populates this group with protocol
      statistics if they exist for a given device.  Currently, only iWARP
      stats are defined, but the code is designed to allow InfiniBand
      protocol stats if they become available.  These stats are per-device
      and more importantly -not- per port.
      
      Details:
      
      - Add union rdma_protocol_stats in ib_verbs.h.  This union allows
        defining transport-specific stats.  Currently only iwarp stats are
        defined.
      
      - Add struct iw_protocol_stats to define the current set of iwarp
        protocol stats.
      
      - Add new ib_device method called get_proto_stats() to return protocol
        statistics.
      
      - Add logic in core/sysfs.c to create iwarp protocol stats attributes
        if the device is an RNIC and has a get_proto_stats() method.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • RDMA/core: Add memory management extensions support · 00f7ec36
      Steve Wise committed
      This patch adds support for the IB "base memory management extension"
      (BMME) and the equivalent iWARP operations (which the iWARP verbs
      mandate that all devices implement).  The new operations are:
      
       - Allocate an ib_mr for use in fast register work requests.
      
       - Allocate/free physical buffer lists for use in fast register work
         requests.  This allows device drivers to allocate this memory as
         needed for use in posting send requests (e.g., via dma_alloc_coherent).
      
       - New send queue work requests:
         * send with remote invalidate
         * fast register memory region
         * local invalidate memory region
         * RDMA read with invalidate local memory region (iWARP only)
      
      Consumer interface details:
      
       - A new device capability flag IB_DEVICE_MEM_MGT_EXTENSIONS is added
         to indicate device support for these features.
      
       - New send work request opcodes IB_WR_FAST_REG_MR, IB_WR_LOCAL_INV,
         IB_WR_RDMA_READ_WITH_INV are added.
      
       - A new consumer API function, ib_alloc_mr() is added to allocate
         fast register memory regions.
      
       - New consumer API functions, ib_alloc_fast_reg_page_list() and
         ib_free_fast_reg_page_list() are added to allocate and free
         device-specific memory for fast registration page lists.
      
       - A new consumer API function, ib_update_fast_reg_key(), is added to
         allow the key portion of the R_Key and L_Key of a fast registration
         MR to be updated.  Consumers call this if desired before posting
         an IB_WR_FAST_REG_MR work request.
      
      Consumers can use this as follows:
      
       - MR is allocated with ib_alloc_mr().
      
       - Page list memory is allocated with ib_alloc_fast_reg_page_list().
      
       - MR R_Key/L_Key "key" field is updated with ib_update_fast_reg_key().
      
       - MR made VALID and bound to a specific page list via
         ib_post_send(IB_WR_FAST_REG_MR)
      
       - MR made INVALID via ib_post_send(IB_WR_LOCAL_INV),
         ib_post_send(IB_WR_RDMA_READ_WITH_INV) or an incoming send with
         invalidate operation.
      
       - MR is deallocated with ib_dereg_mr()
      
       - Page lists are deallocated via ib_free_fast_reg_page_list().
      
      Applications can allocate a fast register MR once, and then can
      repeatedly bind the MR to different physical block lists (PBLs) via
      posting work requests to a send queue (SQ).  For each outstanding
      MR-to-PBL binding in the SQ pipe, a fast_reg_page_list needs to be
      allocated (the fast_reg_page_list is owned by the low-level driver
      from the consumer posting a work request until the request completes).
      Thus pipelining can be achieved while still allowing device-specific
      page_list processing.
      
      The 32-bit fast register memory key/STag is composed of a 24-bit index
      and an 8-bit key.  The application can change the key each time it
      fast registers thus allowing more control over the peer's use of the
      key/STag (i.e., it can effectively be changed each time the rkey is
      rebound to a page list).
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
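The 24-bit index plus 8-bit key layout described in 00f7ec36 can be sketched in user space. This is a minimal sketch under one assumption that the commit message does not spell out: the 8-bit key occupies the low byte of the 32-bit value. `struct sketch_mr` and `sketch_update_fast_reg_key()` are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stdint.h>

/* Stand-in for the two keys carried by a fast register memory region. */
struct sketch_mr {
    uint32_t lkey;
    uint32_t rkey;
};

#define SKETCH_KEY_MASK   0x000000ffu  /* assumed: low 8 bits are the key */
#define SKETCH_INDEX_MASK 0xffffff00u  /* remaining 24 bits are the index */

/* Replace only the 8-bit key portion, leaving the 24-bit index intact. */
static inline void sketch_update_fast_reg_key(struct sketch_mr *mr,
                                              uint8_t newkey)
{
    mr->lkey = (mr->lkey & SKETCH_INDEX_MASK) | newkey;
    mr->rkey = (mr->rkey & SKETCH_INDEX_MASK) | newkey;
}
```

A consumer would bump the key before each fast register work request, so that a peer still holding the previous rkey value no longer matches the region after it is rebound.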
    • RDMA: Improve include file coding style · 4deccd6d
      Dotan Barak committed
      Remove subversion $Id lines and improve readability by fixing other
      coding style problems pointed out by checkpatch.pl.
      Signed-off-by: Dotan Barak <dotanba@gmail.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • RDMA: Fix license text · a9474917
      Sean Hefty committed
      The license text for several files references a third software license
      that was inadvertently copied in.  Update the license to what was
      intended.  This update was based on a request from HP.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  12. 10 Jun, 2008 1 commit
    • IB/core: Remove IB_DEVICE_SEND_W_INV capability flag · 4c0283fc
      Roland Dreier committed
      In 2.6.26, we added some support for send with invalidate work
      requests, including a device capability flag to indicate whether a
      device supports such requests.  However, the support was incomplete:
      the completion structure was not extended with a field for the key
      contained in incoming send with invalidate requests.
      
      Full support for memory management extensions (send with invalidate,
      local invalidate, fast register through a send queue, etc) is planned
      for 2.6.27.  Since send with invalidate is not very useful by itself,
      just remove the IB_DEVICE_SEND_W_INV bit before the 2.6.26 final
      release; we will add an IB_DEVICE_MEM_MGT_EXTENSIONS bit in 2.6.27,
      which makes things simpler for applications, since they will not have
      quite as confusing an array of fine-grained bits to check.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  13. 29 Apr, 2008 1 commit
    • IB: expand ib_umem_get() prototype · cb9fbc5c
      Arthur Kepner committed
      Add a new parameter, dmasync, to the ib_umem_get() prototype.  Use dmasync = 1
      when mapping user-allocated CQs with ib_umem_get().
      Signed-off-by: Arthur Kepner <akepner@sgi.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
      Cc: Jes Sorensen <jes@sgi.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Roland Dreier <rdreier@cisco.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Grant Grundler <grundler@parisc-linux.org>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  14. 20 Apr, 2008 1 commit
  15. 17 Apr, 2008 5 commits
    • IB/core: Add support for modify CQ · 2dd57162
      Eli Cohen committed
      Add support for modifying CQ parameters for controlling event
      generation moderation.
      Signed-off-by: Eli Cohen <eli@mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • IB/core: Add support for "send with invalidate" work requests · 0f39cf3d
      Roland Dreier committed
      Add a new IB_WR_SEND_WITH_INV send opcode that can be used to mark a
      "send with invalidate" work request as defined in the iWARP verbs and
      the InfiniBand base memory management extensions.  Also put "imm_data"
      and a new "invalidate_rkey" member in a new "ex" union in struct
      ib_send_wr. The invalidate_rkey member can be used to pass in an
      R_Key/STag to be invalidated.  Add this new union to struct
      ib_uverbs_send_wr.  Add code to copy the invalidate_rkey field in
      ib_uverbs_post_send().
      
      Fix up low-level drivers to deal with the change to struct ib_send_wr,
      and just remove the imm_data initialization from net/sunrpc/xprtrdma/,
      since that code never does any send with immediate operations.
      
      Also, move the existing IB_DEVICE_SEND_W_INV flag to a new bit, since
      the iWARP drivers currently in the tree set the bit.  The amso1100
      driver at least will silently fail to honor the IB_SEND_INVALIDATE bit
      if passed in as part of userspace send requests (since it does not
      implement kernel bypass work request queueing).  Remove the flag from
      all existing drivers that set it until we know which ones are OK.
      
      The value chosen for the new flag is not consecutive to avoid clashing
      with flags defined in the XRC patches, which are not merged yet but
      which are already in use and are likely to be merged soon.
      
      This resurrects a patch sent long ago by Mikkel Hagen <mhagen@iol.unh.edu>.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • IB/core: Add IPoIB UD LSO support · c93570f2
      Eli Cohen committed
      LSO (large send offload) allows the networking stack to pass SKBs with
      data size larger than the MTU to the IPoIB driver and have the HCA HW
      fragment the data to multiple MSS-sized packets.  Add a device
      capability flag IB_DEVICE_UD_TSO for devices that can perform TCP
      segmentation offload, a new send work request opcode IB_WR_LSO,
      header, hlen and mss fields for the work request structure, and a new
      IB_WC_LSO completion type.
      Signed-off-by: Eli Cohen <eli@mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
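The segmentation arithmetic implied by c93570f2 (one large send split into MSS-sized packets) can be written out as follows. This is a rough sketch: the `sketch_lso_*` helpers are hypothetical, and the assumption that every generated packet carries its own copy of an `hlen`-byte header is mine, not a statement from the commit message.

```c
#include <stddef.h>

/* Number of packets needed to carry payload_len bytes when each packet
 * holds at most mss payload bytes (ceiling division). */
static inline size_t sketch_lso_segments(size_t payload_len, size_t mss)
{
    if (mss == 0)
        return 0;  /* no valid segment size, nothing can be sent */
    return (payload_len + mss - 1) / mss;
}

/* Total bytes on the wire, assuming each segment repeats an hlen-byte
 * header in front of its share of the payload. */
static inline size_t sketch_lso_wire_bytes(size_t payload_len, size_t mss,
                                           size_t hlen)
{
    return payload_len + sketch_lso_segments(payload_len, mss) * hlen;
}
```

For example, a 3001-byte payload with an MSS of 1000 needs four packets, the last one carrying a single payload byte.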
    • IB/core: Add creation flags to struct ib_qp_init_attr · b846f25a
      Eli Cohen committed
      Add a create_flags member to struct ib_qp_init_attr that will allow a
      kernel verbs consumer to pass special flags when creating a QP.
      Add a flag value for telling low-level drivers that a QP will be used
      for IPoIB UD LSO.  The create_flags member will also be useful for XRC
      and ehca low-latency QP support.
      
      Since no create_flags handling is implemented yet, add code to all
      low-level drivers to return -EINVAL if create_flags is non-zero.
      Signed-off-by: Eli Cohen <eli@mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • IB: Make struct ib_uobject.id a signed int · b3d636b0
      Roland Dreier committed
      IDR IDs are signed, so struct ib_uobject.id should be signed.  This
      avoids some sparse pointer signedness warnings.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  16. 09 Feb, 2008 2 commits
  17. 26 Jan, 2008 2 commits
    • RDMA/cma: add support for rdma_migrate_id() · 88314e4d
      Sean Hefty committed
      This is based on user feedback from Doug Ledford at RedHat:
      
      Events that occur on an rdma_cm_id are reported to userspace through an
      event channel.  Connection request events are reported on the event
      channel associated with the listen.  When the connection is accepted, a
      new rdma_cm_id is created and automatically uses the listen event
      channel.  This is suboptimal where the user only wants listen events on
      that channel.
      
      Additionally, it may be desirable to have events related to connection
      establishment use a different event channel than those related to
      already established connections.
      
      Allow the user to migrate an rdma_cm_id between event channels. All
      pending events associated with the rdma_cm_id are moved to the new event
      channel.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • IB/mad: Report number of times a mad was retried · 4fc8cd49
      Sean Hefty committed
      To allow ULPs to tune timeout values and capture retry statistics,
      report the number of times that a mad send operation was retried.
      
      For RMPP mads, report the total number of times that any portion
      (send window) of the send operation was retried.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  18. 25 Jan, 2008 1 commit
  19. 02 Nov, 2007 1 commit
  20. 10 Oct, 2007 6 commits
    • IB/cm: Modify interface to send MRAs in response to duplicate messages · de98b693
      Sean Hefty committed
      The IB CM provides a message received acknowledged (MRA) message that
      can be sent to indicate that a REQ or REP message has been received, but
      will require more time to process than the timeout specified by those
      messages.  In many cases, the application may not know how long it will
      take to respond to a CM message, but the majority of the time, it will
      usually respond before a retry has been sent.  Rather than sending an
      MRA in response to all messages just to handle the case where a longer
      timeout is needed, it is more efficient to queue the MRA for sending in
      case a duplicate message is received.
      
      This avoids sending an MRA when it is not needed, but limits the number
      of times that a REQ or REP will be resent.  It also provides for a
      simpler implementation than generating the MRA based on a timer event.
      (That is, trying to send the MRA after receiving the first REQ or REP if
      a response has not been generated, so that it is received at the remote
      side before a duplicate REQ or REP has been received)
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • IB/umad: Fix bit ordering and 32-on-64 problems on big endian systems · a394f83b
      Roland Dreier committed
      The declaration of struct ib_user_mad_reg_req.method_mask[] exported
      to userspace was an array of __u32, but the kernel internally treated
      it as a bitmap made up of longs.  This makes a difference for 64-bit
      big-endian kernels, where numbering the bits in an array of __u32 gives:
      
          |31.....0|63....32|95....64|127...96|
      
      while numbering the bits in an array of longs gives:
      
          |63..............0|127............64|
      
      64-bit userspace can handle this by just treating method_mask[] as an
      array of longs, but 32-bit userspace is really stuck: the meaning of
      the bits in method_mask[] depends on whether the kernel is 32-bit or
      64-bit, and there's no sane way for userspace to know that.
      
      Fix this by updating <rdma/ib_user_mad.h> to make it clear that
      method_mask[] is an array of longs, and using a compat_ioctl method to
      convert to an array of 64-bit longs to handle the 32-on-64 problem.
      This fixes the interface description to match existing behavior (so
      working binaries continue to work) in almost all situations, and gives
      consistent semantics in the case of 32-bit userspace that can run on
      either a 32-bit or 64-bit kernel, so that the same binary can work for
      both 32-on-32 and 32-on-64 systems.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
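The bit-numbering mismatch described in a394f83b can be reproduced without big-endian hardware by modelling memory as bytes. In the sketch below, all helper names are hypothetical: one function sets bit n the way a bitmap of 64-bit big-endian longs would, and another reads the same memory back as big-endian 32-bit words.

```c
#include <stdint.h>

#define MASK_BITS 128

/* Set bit n as a 64-bit big-endian bitmap of longs would: choose 64-bit
 * word n/64, set bit n%64, with the most significant byte stored first. */
static inline void set_bit_be64_longs(uint8_t mem[MASK_BITS / 8], unsigned n)
{
    unsigned word = n / 64;
    unsigned bit = n % 64;
    unsigned byte_in_word = 7 - bit / 8;  /* MSB lives at the lowest address */

    mem[word * 8 + byte_in_word] |= (uint8_t)(1u << (bit % 8));
}

/* Read 32-bit word i of the same memory as a big-endian __u32 value. */
static inline uint32_t read_be32_word(const uint8_t mem[MASK_BITS / 8],
                                      unsigned i)
{
    const uint8_t *p = mem + i * 4;

    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Index of the 32-bit word that ends up holding bit n: each pair of
 * 32-bit words is swapped relative to the declared __u32 numbering. */
static inline unsigned u32_word_holding_bit(unsigned n)
{
    return (n / 32) ^ 1u;
}
```

Setting bit 0 through the long-based view makes it appear in 32-bit word 1 rather than word 0, which matches the two diagrams in the commit message.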
    • IB/umad: Add P_Key index support · 2be8e3ee
      Roland Dreier committed
      Add support for setting the P_Key index of sent MADs and getting the
      P_Key index of received MADs.  This requires a change to the layout of
      the ABI structure struct ib_user_mad_hdr, so to avoid breaking
      compatibility, we default to the old (unchanged) ABI and add a new
      ioctl IB_USER_MAD_ENABLE_PKEY that allows applications that are aware
      of the new ABI to opt into using it.
      
      We plan on switching to the new ABI by default in a year or so, and
      this patch adds a warning that is printed when an application uses the
      old ABI, to push people towards converting to the new ABI.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      Reviewed-by: Sean Hefty <sean.hefty@intel.com>
      Reviewed-by: Hal Rosenstock <hal@xsigo.com>
    • IB/umem: Add hugetlb flag to struct ib_umem · c8d8beea
      Joachim Fenkes committed
      During ib_umem_get(), determine whether all pages from the memory
      region are hugetlb pages and report this in the "hugetlb" member.
      Low-level drivers can use this information if they need it.
      Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • RDMA/ucma: Allow user space to set service type · 7ce86409
      Sean Hefty committed
      Export the ability to set the type of service to user space.  Model
      the interface after setsockopt.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • RDMA/cma: Add ability to specify type of service · a81c994d
      Sean Hefty committed
      Provide support to specify a type of service for a communication
      identifier.  A new function call is used when dealing with IPv4
      addresses.  For IPv6 addresses, the ToS is specified through the
      traffic class field in the sockaddr_in6 structure.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      
      [ The comments Eitan Zahavi and myself have made over the v1 post at 
        <http://lists.openfabrics.org/pipermail/general/2007-August/039247.html>
        were fully addressed. ]
       
      Reviewed-by: Or Gerlitz <ogerlitz@voltaire.com> 
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
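For the IPv6 case mentioned in a81c994d, the traffic class travels inside the sockaddr_in6 structure. The sketch below assumes it is carried in `sin6_flowinfo`, in network byte order, with the 8-bit traffic class directly above the 20-bit flow label; the `sketch_*` helper names are hypothetical and this is not the RDMA CM's own code.

```c
#include <stdint.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Build a sin6_flowinfo value (network byte order) from a traffic class
 * and a 20-bit flow label. */
static inline uint32_t sketch_flowinfo_from_tclass(uint8_t tclass,
                                                   uint32_t flow_label)
{
    return htonl(((uint32_t)tclass << 20) | (flow_label & 0xfffffu));
}

/* Recover the 8-bit traffic class from a sin6_flowinfo value. */
static inline uint8_t sketch_tclass_from_flowinfo(uint32_t flowinfo_be)
{
    return (uint8_t)((ntohl(flowinfo_be) >> 20) & 0xffu);
}
```

An application would store the result of `sketch_flowinfo_from_tclass()` in `sin6_flowinfo` before resolving the address, while the IPv4 path uses the separate function call that the commit message mentions.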