1. 17 April 2008, 1 commit
  2. 09 February 2008, 2 commits
  3. 26 January 2008, 2 commits
    • RDMA/cma: add support for rdma_migrate_id() · 88314e4d
      Committed by Sean Hefty
      This is based on user feedback from Doug Ledford at RedHat:
      
      Events that occur on an rdma_cm_id are reported to userspace through an
      event channel.  Connection request events are reported on the event
      channel associated with the listen.  When the connection is accepted, a
      new rdma_cm_id is created and automatically uses the listen event
      channel.  This is suboptimal when the user wants only listen events on
      that channel.
      
      Additionally, it may be desirable to have events related to connection
      establishment use a different event channel than those related to
      already established connections.
      
      Allow the user to migrate an rdma_cm_id between event channels. All
      pending events associated with the rdma_cm_id are moved to the new event
      channel.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
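
      A minimal userspace sketch of the new call, assuming the librdmacm
      prototype int rdma_migrate_id(struct rdma_cm_id *id, struct
      rdma_event_channel *channel), where `event` is a connection-request
      event taken from the listen channel and conn_channel is illustrative:

          struct rdma_event_channel *conn_channel;

          /* keep established-connection events off the listen channel */
          conn_channel = rdma_create_event_channel();
          if (!conn_channel)
                  return -1;

          /* event->id is the child rdma_cm_id created for the new connection;
           * any events already queued for it move to conn_channel */
          if (rdma_migrate_id(event->id, conn_channel))
                  return -1;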
    • IB/mad: Report number of times a MAD was retried · 4fc8cd49
      Committed by Sean Hefty
      To allow ULPs to tune timeout values and capture retry statistics,
      report the number of times that a mad send operation was retried.
      
      For RMPP MADs, report the total number of times that any portion
      (send window) of the send operation was retried.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
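
      A hedged sketch of how a ULP might consume this, assuming the retry
      count is surfaced as a retries member of struct ib_mad_send_buf and is
      read from the send completion handler:

          static void my_send_handler(struct ib_mad_agent *agent,
                                      struct ib_mad_send_wc *mad_send_wc)
          {
                  /* assumed field: number of times this MAD (or any RMPP
                   * send window of it) was retried before completing */
                  if (mad_send_wc->send_buf->retries)
                          pr_debug("MAD completed after %d retries\n",
                                   mad_send_wc->send_buf->retries);
                  ib_free_send_mad(mad_send_wc->send_buf);
          }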
  4. 25 January 2008, 1 commit
  5. 02 November 2007, 1 commit
  6. 10 October 2007, 7 commits
    • IB/cm: Modify interface to send MRAs in response to duplicate messages · de98b693
      Committed by Sean Hefty
      The IB CM provides a message receipt acknowledgment (MRA) message that
      can be sent to indicate that a REQ or REP message has been received, but
      will require more time to process than the timeout specified by those
      messages.  In many cases, the application may not know how long it will
      take to respond to a CM message, but the majority of the time, it will
      usually respond before a retry has been sent.  Rather than sending an
      MRA in response to all messages just to handle the case where a longer
      timeout is needed, it is more efficient to queue the MRA for sending in
      case a duplicate message is received.
      
      This avoids sending an MRA when it is not needed, but limits the number
      of times that a REQ or REP will be resent.  It also provides for a
      simpler implementation than generating the MRA based on a timer event.
      (That is, trying to send the MRA after receiving the first REQ or REP if
      a response has not been generated, so that it is received at the remote
      side before a duplicate REQ or REP has been received.)
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
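
      A sketch of the modified interface from a consumer's point of view,
      assuming the queued-MRA behavior is requested by OR-ing
      IB_CM_MRA_FLAG_DELAY into the service_timeout argument of
      ib_send_cm_mra(); the timeout value 24 is an arbitrary example:

          /* In the REQ/REP handler: queue an MRA that is transmitted only
           * if a duplicate message arrives, instead of sending it now. */
          ret = ib_send_cm_mra(cm_id, IB_CM_MRA_FLAG_DELAY | 24, NULL, 0);
          if (ret)
                  return ret;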
    • IB/umad: Fix bit ordering and 32-on-64 problems on big endian systems · a394f83b
      Committed by Roland Dreier
      The declaration of struct ib_user_mad_reg_req.method_mask[] exported
      to userspace was an array of __u32, but the kernel internally treated
      it as a bitmap made up of longs.  This makes a difference for 64-bit
      big-endian kernels, where numbering the bits in an array of __u32 gives:
      
          |31.....0|63....32|95....64|127...96|
      
      while numbering the bits in an array of longs gives:
      
          |63..............0|127............64|
      
      64-bit userspace can handle this by just treating method_mask[] as an
      array of longs, but 32-bit userspace is really stuck: the meaning of
      the bits in method_mask[] depends on whether the kernel is 32-bit or
      64-bit, and there's no sane way for userspace to know that.
      
      Fix this by updating <rdma/ib_user_mad.h> to make it clear that
      method_mask[] is an array of longs, and using a compat_ioctl method to
      convert to an array of 64-bit longs to handle the 32-on-64 problem.
      This fixes the interface description to match existing behavior (so
      working binaries continue to work) in almost all situations, and gives
      consistent semantics in the case of 32-bit userspace that can run on
      either a 32-bit or 64-bit kernel, so that the same binary can work for
      both 32-on-32 and 32-on-64 systems.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
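
      A small userspace sketch, assuming struct ib_user_mad_reg_req is filled
      in directly and that method 0x01 (Get) is the bit being requested; the
      helper simply mirrors the kernel's long-based bit numbering:

          static void set_method_bit(unsigned long *mask, unsigned int method)
          {
                  mask[method / (8 * sizeof(long))] |=
                          1UL << (method % (8 * sizeof(long)));
          }

          /* usage: set_method_bit((unsigned long *) req.method_mask, 0x01); */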
    • IB/umad: Add P_Key index support · 2be8e3ee
      Committed by Roland Dreier
      Add support for setting the P_Key index of sent MADs and getting the
      P_Key index of received MADs.  This requires a change to the layout of
      the ABI structure struct ib_user_mad_hdr, so to avoid breaking
      compatibility, we default to the old (unchanged) ABI and add a new
      ioctl IB_USER_MAD_ENABLE_PKEY that allows applications that are aware
      of the new ABI to opt into using it.
      
      We plan on switching to the new ABI by default in a year or so, and
      this patch adds a warning that is printed when an application uses the
      old ABI, to push people towards converting to the new ABI.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      Reviewed-by: Sean Hefty <sean.hefty@intel.com>
      Reviewed-by: Hal Rosenstock <hal@xsigo.com>
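
      A hedged userspace sketch of opting into the new ABI, assuming the
      ioctl takes no argument and that the expanded header then carries a
      pkey_index field:

          /* Opt in to the P_Key-aware ABI before registering any agents;
           * failure likely means an older kernel, so keep the old layout. */
          if (ioctl(umad_fd, IB_USER_MAD_ENABLE_PKEY) < 0)
                  use_old_abi = 1;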
    • IB/umem: Add hugetlb flag to struct ib_umem · c8d8beea
      Committed by Joachim Fenkes
      During ib_umem_get(), determine whether all pages from the memory
      region are hugetlb pages and report this in the "hugetlb" member.
      Low-level drivers can use this information if they need it.
      Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
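
      A hedged sketch of a low-level driver consuming the flag; using
      HPAGE_SHIFT as the mapping granularity is an illustrative assumption:

          umem = ib_umem_get(context, start, length, access_flags);
          if (IS_ERR(umem))
                  return PTR_ERR(umem);

          /* hugetlb is set only if every page of the region is a hugetlb page */
          page_shift = umem->hugetlb ? HPAGE_SHIFT : PAGE_SHIFT;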
    • RDMA/ucma: Allow user space to set service type · 7ce86409
      Committed by Sean Hefty
      Export the ability to set the type of service to user space.  Model
      the interface after setsockopt.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
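
      A userspace sketch modeled on setsockopt as described, assuming the
      option is exposed through librdmacm as RDMA_OPTION_ID /
      RDMA_OPTION_ID_TOS and that 0x20 is just an example ToS value:

          uint8_t tos = 0x20;

          if (rdma_set_option(id, RDMA_OPTION_ID, RDMA_OPTION_ID_TOS,
                              &tos, sizeof tos))
                  perror("rdma_set_option(TOS)");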
    • RDMA/cma: Add ability to specify type of service · a81c994d
      Committed by Sean Hefty
      Provide support to specify a type of service for a communication
      identifier.  A new function call is used when dealing with IPv4
      addresses.  For IPv6 addresses, the ToS is specified through the
      traffic class field in the sockaddr_in6 structure.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      
      [ The comments Eitan Zahavi and myself have made over the v1 post at 
        <http://lists.openfabrics.org/pipermail/general/2007-August/039247.html>
        were fully addressed. ]
       
      Reviewed-by: Or Gerlitz <ogerlitz@voltaire.com> 
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
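
      A kernel-side sketch for the IPv4 case, assuming the new call is
      void rdma_set_service_type(struct rdma_cm_id *id, int tos) and is made
      before resolving the route; the ToS value is an arbitrary example:

          rdma_set_service_type(cm_id, 0x20);
          ret = rdma_resolve_route(cm_id, 2000 /* ms */);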
    • IB/sa: Add new QoS fields to path record · 733d65fe
      Committed by Sean Hefty
      The QoS annex defines new fields for path records.  Add them to the
      ib_sa for consumers that want to use them.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Reviewed-by: Or Gerlitz <ogerlitz@voltaire.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  7. 04 August 2007, 3 commits
  8. 11 July 2007, 2 commits
  9. 22 May 2007, 1 commit
    • Detach sched.h from mm.h · e8edc6e0
      Committed by Alexey Dobriyan
      The first thing mm.h does is include sched.h, solely for the can_do_mlock()
      inline function, which dereferences "current".  By dealing with
      can_do_mlock(), mm.h can be detached from sched.h, which is good; see below.
      
      This patch
      a) removes unconditional inclusion of sched.h from mm.h
      b) makes can_do_mlock() a normal function in mm/mlock.c
      c) exports can_do_mlock() to not break compilation
      d) adds sched.h inclusions back to files that were getting it indirectly.
      e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
         getting them indirectly
      
      Net result is:
      a) mm.h users would get less code to open, read, preprocess, parse, ... if
         they don't need sched.h
      b) sched.h stops being dependency for significant number of files:
         on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
         after patch it's only 3744 (-8.3%).
      
      Cross-compile tested on
      
      	all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
      	alpha alpha-up
      	arm
      	i386 i386-up i386-defconfig i386-allnoconfig
      	ia64 ia64-up
      	m68k
      	mips
      	parisc parisc-up
      	powerpc powerpc-up
      	s390 s390-up
      	sparc sparc-up
      	sparc64 sparc64-up
      	um-x86_64
      	x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig
      
      as well as my two usual configs.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 19 May 2007, 1 commit
  11. 09 May 2007, 2 commits
    • IB: Put rlimit accounting struct in struct ib_umem · 1bf66a30
      Committed by Roland Dreier
      When memory pinned with ib_umem_get() is released, ib_umem_release()
      needs to subtract the amount of memory being unpinned from
      mm->locked_vm.  However, ib_umem_release() may be called with
      mm->mmap_sem already held for writing if the memory is being released
      as part of an munmap() call, so it is sometimes necessary to defer
      this accounting into a workqueue.
      
      However, the work struct used to defer this accounting is dynamically
      allocated before it is queued, so there is the possibility of failing
      that allocation.  If the allocation fails, then ib_umem_release has no
      choice except to bail out and leave the process with a permanently
      elevated locked_vm.
      
      Fix this by allocating the structure to defer accounting as part of
      the original struct ib_umem, so there's no possibility of failing a
      later allocation if creating the struct ib_umem and pinning memory
      succeeds.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
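
      A rough sketch of the resulting layout, assuming the deferred-accounting
      state is embedded directly in struct ib_umem (member names illustrative):

          struct ib_umem {
                  /* ... existing pinning state ... */
                  struct work_struct      work;   /* queued if mmap_sem is held */
                  struct mm_struct       *mm;     /* mm to charge on release */
                  unsigned long           diff;   /* pages to drop from locked_vm */
          };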
    • IB/uverbs: Export ib_umem_get()/ib_umem_release() to modules · f7c6a7b5
      Committed by Roland Dreier
      Export ib_umem_get()/ib_umem_release() and put low-level drivers in
      control of when to call ib_umem_get() to pin and DMA map userspace memory,
      rather than always calling it in ib_uverbs_reg_mr() before calling the
      low-level driver's reg_user_mr method.
      
      Also move these functions to be in the ib_core module instead of
      ib_uverbs, so that driver modules using them do not depend on
      ib_uverbs.
      
      This has a number of advantages:
       - It is better design from the standpoint of making generic code a
         library that can be used or overridden by device-specific code as
         the details of specific devices dictate.
       - Drivers that do not need to pin userspace memory regions do not
      need to take the performance hit of calling ib_umem_get().  For
         example, although I have not tried to implement it in this patch,
         the ipath driver should be able to avoid pinning memory and just
         use copy_{to,from}_user() to access userspace memory regions.
       - Buffers that need special mapping treatment can be identified by
         the low-level driver.  For example, it may be possible to solve
         some Altix-specific memory ordering issues with mthca CQs in
         userspace by mapping CQ buffers with extra flags.
       - Drivers that need to pin and DMA map userspace memory for things
         other than memory regions can use ib_umem_get() directly, instead
         of hacks using extra parameters to their reg_phys_mr method.  For
         example, the mlx4 driver that is pending being merged needs to pin
         and DMA map QP and CQ buffers, but it does not need to create a
         memory key for these buffers.  So the cleanest solution is for mlx4
         to call ib_umem_get() in the create_qp and create_cq methods.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
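
      A hedged sketch of the new driver-side flow inside a reg_user_mr method,
      assuming the exported prototype ib_umem_get(struct ib_ucontext *,
      unsigned long addr, size_t size, int access):

          struct ib_umem *umem;

          umem = ib_umem_get(pd->uobject->context, start, length, access_flags);
          if (IS_ERR(umem))
                  return ERR_PTR(PTR_ERR(umem));

          /* ... program the HCA translation tables from the pinned pages ... */

          /* and in the dereg_mr path: */
          ib_umem_release(umem);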
  12. 07 May 2007, 2 commits
    • IB: Return "maybe missed event" hint from ib_req_notify_cq() · ed23a727
      Committed by Roland Dreier
      The semantics defined by the InfiniBand specification say that
      completion events are only generated when a completion is added to a
      completion queue (CQ) after completion notification is requested.  In
      other words, this means that the following race is possible:
      
      	while (CQ is not empty)
      		ib_poll_cq(CQ);
      	// new completion is added after while loop is exited
      	ib_req_notify_cq(CQ);
      	// no event is generated for the existing completion
      
      To close this race, the IB spec recommends doing another poll of the
      CQ after requesting notification.
      
      However, it is not always possible to arrange code this way (for
      example, we have found that NAPI for IPoIB cannot poll after
      requesting notification).  Also, some hardware (eg Mellanox HCAs)
      actually will generate an event for completions added before the call
      to ib_req_notify_cq() -- which is allowed by the spec, since there's
      no way for any upper-layer consumer to know exactly when a completion
      was really added -- so the extra poll of the CQ is just a waste.
      
      Motivated by this, we add a new flag "IB_CQ_REPORT_MISSED_EVENTS" for
      ib_req_notify_cq() so that it can return a hint about whether a
      completion may have been added before the request for notification.
      The return value of ib_req_notify_cq() is extended so:
      
      	 < 0	means an error occurred while requesting notification
      	== 0	means notification was requested successfully, and if
      		IB_CQ_REPORT_MISSED_EVENTS was passed in, then no
      		events were missed and it is safe to wait for another
      		event.
      	 > 0	is only returned if IB_CQ_REPORT_MISSED_EVENTS was
      		passed in.  It means that the consumer must poll the
      		CQ again to make sure it is empty to avoid the race
      		described above.
      
      We add a flag to enable this behavior rather than turning it on
      unconditionally, because checking for missed events may incur
      significant overhead for some low-level drivers, and consumers that
      don't care about the results of this test shouldn't be forced to pay
      for the test.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
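
      A consumer sketch of the resulting poll/re-arm idiom;
      handle_completion() is a placeholder for the ULP's completion
      processing:

          repoll:
                  while (ib_poll_cq(cq, 1, &wc) > 0)
                          handle_completion(&wc);

                  /* > 0 means a completion may have slipped in: poll again */
                  if (ib_req_notify_cq(cq, IB_CQ_NEXT_COMP |
                                           IB_CQ_REPORT_MISSED_EVENTS) > 0)
                          goto repoll;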
    • IB: Add CQ comp_vector support · f4fd0b22
      Committed by Michael S. Tsirkin
      Add a num_comp_vectors member to struct ib_device and extend
      ib_create_cq() to pass in a comp_vector parameter -- this parallels
      the userspace libibverbs API.  Update all hardware drivers to set
      num_comp_vectors to 1 and have all ULPs pass 0 for the comp_vector
      value.  Pass the value of num_comp_vectors to userspace rather than
      hard-coding a value of 1.
      
      We want multiple CQ event vector support (via MSI-X or similar for
      adapters that can generate multiple interrupts), but it's not clear
      how many vectors we want, or how we want to deal with policy issues
      such as how to decide which vector to use or how to set up interrupt
      affinity.  This patch is useful for experimenting, since no core
      changes will be necessary when updating a driver to support multiple
      vectors, and we know that we want to make at least these changes
      anyway.
      Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
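
      A sketch of the extended call as a ULP would use it, passing 0 for
      comp_vector as described; my_comp_handler and my_context are
      placeholders:

          cq = ib_create_cq(device, my_comp_handler, NULL, my_context,
                            cqe, 0 /* comp_vector */);
          if (IS_ERR(cq))
                  return PTR_ERR(cq);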
  13. 03 May 2007, 1 commit
    • PCI: Cleanup the includes of <linux/pci.h> · 6473d160
      Committed by Jean Delvare
      I noticed that many source files include <linux/pci.h> while they do
      not appear to need it. Here is an attempt to clean it all up.
      
      In order to find all possibly affected files, I searched for all
      files including <linux/pci.h> but without any other occurrence of "pci"
      or "PCI". I removed the include statement from all of these, then I
      compiled an allmodconfig kernel on both i386 and x86_64 and fixed the
      false positives manually.
      
      My tests covered 66% of the affected files, so there could be false
      positives remaining. Untested files are:
      
      arch/alpha/kernel/err_common.c
      arch/alpha/kernel/err_ev6.c
      arch/alpha/kernel/err_ev7.c
      arch/ia64/sn/kernel/huberror.c
      arch/ia64/sn/kernel/xpnet.c
      arch/m68knommu/kernel/dma.c
      arch/mips/lib/iomap.c
      arch/powerpc/platforms/pseries/ras.c
      arch/ppc/8260_io/enet.c
      arch/ppc/8260_io/fcc_enet.c
      arch/ppc/8xx_io/enet.c
      arch/ppc/syslib/ppc4xx_sgdma.c
      arch/sh64/mach-cayman/iomap.c
      arch/xtensa/kernel/xtensa_ksyms.c
      arch/xtensa/platform-iss/setup.c
      drivers/i2c/busses/i2c-at91.c
      drivers/i2c/busses/i2c-mpc.c
      drivers/media/video/saa711x.c
      drivers/misc/hdpuftrs/hdpu_cpustate.c
      drivers/misc/hdpuftrs/hdpu_nexus.c
      drivers/net/au1000_eth.c
      drivers/net/fec_8xx/fec_main.c
      drivers/net/fec_8xx/fec_mii.c
      drivers/net/fs_enet/fs_enet-main.c
      drivers/net/fs_enet/mac-fcc.c
      drivers/net/fs_enet/mac-fec.c
      drivers/net/fs_enet/mac-scc.c
      drivers/net/fs_enet/mii-bitbang.c
      drivers/net/fs_enet/mii-fec.c
      drivers/net/ibm_emac/ibm_emac_core.c
      drivers/net/lasi_82596.c
      drivers/parisc/hppb.c
      drivers/sbus/sbus.c
      drivers/video/g364fb.c
      drivers/video/platinumfb.c
      drivers/video/stifb.c
      drivers/video/valkyriefb.c
      include/asm-arm/arch-ixp4xx/dma.h
      sound/oss/au1550_ac97.c
      
      I would welcome test reports for these files. I am fine with removing
      the untested files from the patch if the general opinion is that these
      changes aren't safe. The tested part would still be nice to have.
      
      Note that this patch depends on another header fixup patch I submitted
      to LKML yesterday:
        [PATCH] scatterlist.h needs types.h
        http://lkml.org/lkml/2007/3/01/141
      Signed-off-by: Jean Delvare <khali@linux-fr.org>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
  14. 17 February 2007, 2 commits
    • RDMA/cma: Add multicast communication support · c8f6a362
      Committed by Sean Hefty
      Extend rdma_cm to support multicast communication.  Multicast support
      is added to the existing RDMA_PS_UDP port space, as well as a new
      RDMA_PS_IPOIB port space.  The latter port space allows joining the
      multicast groups used by IPoIB, which enables offloading IPoIB traffic
      to a separate QP.  The port space determines the signature used in the
      MGID when joining the group.  The newly added RDMA_PS_IPOIB also
      allows for unicast operations, similar to RDMA_PS_UDP.
      
      Supporting the RDMA_PS_IPOIB requires changing how UD QPs are initialized,
      since we can no longer assume that the qkey is constant.  This requires
      saving the Q_Key to use when attaching to a device, so that it is
      available when creating the QP.  The Q_Key information is exported to
      the user through the existing rdma_init_qp_attr() interface.
      
      Multicast support is also exported to userspace through the rdma_ucm.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
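
      A kernel-consumer sketch, assuming the interface is exposed as
      rdma_join_multicast()/rdma_leave_multicast() and that the join result
      arrives as an RDMA_CM_EVENT_MULTICAST_JOIN event on the id's handler:

          ret = rdma_join_multicast(cm_id, (struct sockaddr *) &mcast_addr,
                                    my_context);
          /* ... wait for RDMA_CM_EVENT_MULTICAST_JOIN, then use the QP ... */
          rdma_leave_multicast(cm_id, (struct sockaddr *) &mcast_addr);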
    • IB/sa: Track multicast join/leave requests · faec2f7b
      Committed by Sean Hefty
      The IB SA tracks multicast join/leave requests on a per port basis and
      does not do any reference counting: if two users of the same port join
      the same group, and one leaves that group, then the SA will remove the
      port from the group even though one user who wants to remain a member
      is left.  Therefore, in order to support multiple users of the
      same multicast group from the same port, we need to perform reference
      counting locally.
      
      To do this, add a multicast submodule to ib_sa to perform reference
      counting of multicast join/leave operations.  Modify ib_ipoib (the
      only in-kernel user of multicast) to use the new interface.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
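
      A hedged sketch of the reference-counted join interface, assuming the
      submodule exposes ib_sa_join_multicast()/ib_sa_free_multicast() with a
      status callback:

          mcast = ib_sa_join_multicast(&my_sa_client, device, port_num, &rec,
                                       comp_mask, GFP_KERNEL,
                                       my_join_done, my_context);
          if (IS_ERR(mcast))
                  return PTR_ERR(mcast);
          /* ... later, drop this user's reference: */
          ib_sa_free_multicast(mcast);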
  15. 05 February 2007, 3 commits
  16. 16 December 2006, 1 commit
    • IB: Fix ib_dma_alloc_coherent() wrapper · c59a3da1
      Committed by Roland Dreier
      The ib_dma_alloc_coherent() wrapper uses a u64* for the dma_handle
      parameter, unlike dma_alloc_coherent, which uses dma_addr_t*.  This
      means that we need a temporary variable to handle the case when
      ib_dma_alloc_coherent() just falls through directly to
      dma_alloc_coherent() on architectures where sizeof u64 != sizeof
      dma_addr_t.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
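
      The rough shape of the fix for the fallthrough path (a simplified
      sketch that omits the dma_ops indirection):

          static inline void *ib_dma_alloc_coherent(struct ib_device *dev,
                                                    size_t size, u64 *dma_handle,
                                                    gfp_t flag)
          {
                  dma_addr_t handle;
                  void *ret;

                  ret = dma_alloc_coherent(dev->dma_device, size, &handle, flag);
                  *dma_handle = handle;  /* widen dma_addr_t to u64 explicitly */
                  return ret;
          }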
  17. 14 December 2006, 1 commit
  18. 13 December 2006, 6 commits
  19. 30 November 2006, 1 commit
    • IB/cm: Fix automatic path migration support · e1444b5a
      Committed by Sean Hefty
      The ib_cm_establish() function is replaced with a more generic
      ib_cm_notify().  This routine is used to notify the CM that failover
      has occurred, so that future CM messages (LAP, DREQ) reach the remote
      CM.  (Currently, we continue to use the original path)  This bumps the
      userspace CM ABI.
      
      New alternate path information is captured when a LAP message is sent
      or received.  This allows QP attributes to be initialized for the user
      when a new path is loaded after failover occurs.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
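
      A consumer sketch, assuming the new routine is
      int ib_cm_notify(struct ib_cm_id *cm_id, enum ib_event_type event) and
      is called from the QP event handler when hardware path migration
      completes; recovering the cm_id from the context pointer is illustrative:

          static void my_qp_event_handler(struct ib_event *event, void *context)
          {
                  struct ib_cm_id *cm_id = context;

                  if (event->event == IB_EVENT_PATH_MIG)
                          ib_cm_notify(cm_id, event->event);
          }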