1. 09 Apr 2009, 1 commit
    • RDMA/cma: Create cm id even when IB port is down · d2ca39f2
      Authored by Yossi Etigin
      When doing rdma_resolve_addr(), if the relevant IB port is down, the
      function fails and the cm_id is not bound to the correct device.
      Therefore, the application does not have a device handle and cannot wait
      for the port to become active.  The function fails because the
      underlying IPoIB interface is not joined to the broadcast group and
      therefore the SA does not have a multicast record to take a Q_Key
      from.
      
      The fix is to use lazy Q_Key resolution - cma_set_qkey() will set
      id_priv->qkey if it was not set, and will be called just before the
      Q_Key is really required.
      Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
      Acked-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      d2ca39f2
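      A minimal sketch of the lazy-resolution idea described above, assuming
      cma.c internals of that era (the structure layout and the SA lookup are
      simplified stand-ins, not the literal patch):

              static int cma_set_qkey(struct rdma_id_private *id_priv)
              {
                      struct ib_sa_mcmember_rec rec;
                      int ret = 0;

                      if (id_priv->qkey)      /* already resolved earlier */
                              return 0;

                      switch (id_priv->id.ps) {
                      case RDMA_PS_UDP:
                              id_priv->qkey = RDMA_UDP_QKEY;
                              break;
                      case RDMA_PS_IPOIB:
                              /* only now, when the Q_Key is actually needed,
                               * ask the SA for the broadcast group record */
                              ib_addr_get_mgid(&id_priv->id.route.addr.dev_addr,
                                               &rec.mgid);
                              ret = ib_sa_get_mcmember_rec(id_priv->id.device,
                                                           id_priv->id.port_num,
                                                           &rec.mgid, &rec);
                              if (!ret)
                                      id_priv->qkey = be32_to_cpu(rec.qkey);
                              break;
                      default:
                              break;
                      }
                      return ret;
              }

      Callers in the UD and multicast paths would then invoke cma_set_qkey()
      right before the Q_Key is consumed, instead of during rdma_resolve_addr().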
  2. 02 Apr 2009, 1 commit
  3. 05 Mar 2009, 1 commit
  4. 04 Mar 2009, 2 commits
    • IB/sa_query: Fix AH leak due to update_sm_ah() race · 6b708b3d
      Authored by Jack Morgenstein
      Our testing uncovered a race condition in ib_sa_event():
      
      	spin_lock_irqsave(&port->ah_lock, flags);
      	if (port->sm_ah)
      		kref_put(&port->sm_ah->ref, free_sm_ah);
      	port->sm_ah = NULL;
      	spin_unlock_irqrestore(&port->ah_lock, flags);
      
      	schedule_work(&sa_dev->port[event->element.port_num -
      				    sa_dev->start_port].update_task);
      
      If two events occur back-to-back (e.g., client-reregister and LID
      change), both may pass the spinlock-protected code above before the
      scheduled work updates the port->sm_ah handle.  If the scheduled work
      then ends up running twice, the second run will find a non-NULL
      port->sm_ah and simply overwrite it in update_sm_ah(), resulting in an
      AH leak.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      6b708b3d
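      One way to close that leak, roughly the shape of this fix (a simplified
      sketch; the rest of update_sm_ah() is omitted):

              /* in update_sm_ah(), once the new AH has been created: */
              spin_lock_irq(&port->ah_lock);
              if (port->sm_ah)        /* a doubled run may find one installed */
                      kref_put(&port->sm_ah->ref, free_sm_ah);
              port->sm_ah = new_ah;
              spin_unlock_irq(&port->ah_lock);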
    • IB/mad: Fix ib_post_send_mad() returning 0 with no generate send comp · 4780c195
      Authored by Ralph Campbell
      If ib_post_send_mad() returns 0, the API guarantees that there will be
      a callback to send_buf->mad_agent->send_handler() so that the sender
      can call ib_free_send_mad().  If that callback never arrives, the
      ib_mad_send_buf is leaked, the mad_agent reference count never reaches
      zero, and the IB device module cannot be unloaded.  Without this patch,
      that is exactly what happens when process_mad() returns
      (IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_CONSUMED).
      
      If process_mad() returns IB_MAD_RESULT_SUCCESS and no agent is
      registered to receive the MAD being sent, handle_outgoing_dr_smp()
      returns zero, which causes a MAD packet at the end of the directed
      route to be sent on the wire incorrectly; it does not cause a hang,
      though, since the HCA still generates a send completion.
      Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      4780c195
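      For reference, the caller-side contract the fix restores looks roughly
      like this (a hedged consumer sketch, not code from the patch; send_buf
      is assumed to have been built with ib_create_send_mad()):

              ret = ib_post_send_mad(send_buf, NULL);
              if (ret) {
                      /* no completion callback will ever arrive for this
                       * buffer, so the sender must free it itself */
                      ib_free_send_mad(send_buf);
                      return ret;
              }
              /* ret == 0: the buffer is freed later, from
               * send_buf->mad_agent->send_handler() */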
  5. 28 Feb 2009, 2 commits
    • IB/mad: initialize mad_agent_priv before putting on lists · d9620a4c
      Authored by Ralph Campbell
      There is a potential race in ib_register_mad_agent() where the struct
      ib_mad_agent_private is not fully initialized before it is added to
      the list of agents per IB port. This means the ib_mad_agent_private
      could be seen before the refcount, spin locks, and linked lists are
      initialized.  The fix is to initialize the structure earlier.
      Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      d9620a4c
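      The general pattern of the fix, sketched with illustrative member names
      (the real ib_mad_agent_private fields may differ slightly):

              /* initialize everything another context might touch ... */
              atomic_set(&mad_agent_priv->refcount, 1);
              init_completion(&mad_agent_priv->comp);
              spin_lock_init(&mad_agent_priv->lock);
              INIT_LIST_HEAD(&mad_agent_priv->send_list);
              INIT_LIST_HEAD(&mad_agent_priv->wait_list);
              INIT_LIST_HEAD(&mad_agent_priv->local_list);

              /* ... and only then publish the agent on the per-port list */
              spin_lock_irqsave(&port_priv->reg_lock, flags);
              list_add_tail(&mad_agent_priv->agent_list, &port_priv->agent_list);
              spin_unlock_irqrestore(&port_priv->reg_lock, flags);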
    • IB/mad: Fix null pointer dereference in local_completions() · 1d9bc6d6
      Authored by Ralph Campbell
      handle_outgoing_dr_smp() can queue a struct ib_mad_local_private
      *local on the mad_agent_priv->local_work work queue with
      local->mad_priv == NULL if device->process_mad() returns
      IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY and
      (!ib_response_mad(&mad_priv->mad.mad) ||
      !mad_agent_priv->agent.recv_handler).
      
      In this case, local_completions() will be called with local->mad_priv
      == NULL.  The code does check for this case and skips calling
      recv_mad_agent->agent.recv_handler(), but since recv == 0,
      kmem_cache_free() ends up being called with a NULL pointer.
      
      Also, since recv isn't reinitialized on each pass through the loop, a
      stale value can cause a memory leak on a later iteration where recv
      should have been zero.
      Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
      1d9bc6d6
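      A simplified sketch of the corrected loop shape; the list name and the
      deliver_local_reply() helper are hypothetical, and the real
      local_completions() is more involved:

              int free_mad;

              list_for_each_entry(local, &mad_agent_priv->local_list, completion_list) {
                      free_mad = 0;   /* reset on every iteration, not once */

                      if (local->mad_priv && local->recv_mad_agent) {
                              /* the receive handler consumes (and later
                               * frees) the reply MAD */
                              deliver_local_reply(local);
                      } else if (local->mad_priv) {
                              free_mad = 1;   /* nobody consumed it */
                      }
                      /* local->mad_priv == NULL: nothing to deliver or free */

                      if (free_mad)
                              kmem_cache_free(ib_mad_cache, local->mad_priv);
              }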
  6. 26 Feb 2009, 1 commit
    • IB: Remove sysfs files before unregistering device · 9206dff1
      Authored by Roland Dreier
      Move the ib_device_unregister_sysfs() call from ib_dealloc_device() to
      ib_unregister_device().  The old code allows device unregister to
      proceed even if some sysfs files are open.  This leaves a window where
      userspace can open a file before a device is removed but end up reading
      it after the device is gone, leading to various kernel crashes, either
      because the device data structure has been freed or because the
      low-level driver code is gone after module removal.
      
      By not returning from ib_unregister_device() until after all sysfs
      entries are removed, we make sure that data structures and/or module
      code is not freed until after all sysfs access is done.
      Reported-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      9206dff1
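      The resulting teardown order, seen from a low-level driver's removal
      path (a sketch assuming the post-patch semantics described above):

              /* removes the sysfs files and does not return until any
               * in-flight sysfs access to them has finished */
              ib_unregister_device(ibdev);

              /* only now is it safe to free the device structure and to
               * let the low-level driver module go away */
              ib_dealloc_device(ibdev);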
  7. 18 Jan 2009, 1 commit
  8. 07 Jan 2009, 1 commit
  9. 30 Dec 2008, 1 commit
    • RDMA/addr: Fix build breakage when IPv6 is disabled · 2c4ab624
      Authored by Roland Dreier
      Commit 38617c64 ("RDMA/addr: Add support for translating IPv6
      addresses") broke the build when CONFIG_IPV6=n, because the ib_addr
      module unconditionally attempted to call ipv6_chk_addr() and other
      IPv6 functions that are not defined when IPv6 is disabled.  Fix this
      by building the IPv6 support only when CONFIG_IPV6 is turned on, and
      add a Kconfig dependency to prevent the ib_addr code from being built
      into the kernel when IPv6 is built as a module.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      2c4ab624
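      The compile-time side is the usual guard; a sketch (the stub name
      addr6_resolve() is illustrative, not necessarily what the patch uses):

              #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
              /* real IPv6 path: may call ipv6_chk_addr(), ip6_route_output(), ... */
              #else
              static int addr6_resolve(struct sockaddr_in6 *src_in,
                                       struct sockaddr_in6 *dst_in,
                                       struct rdma_dev_addr *addr)
              {
                      return -EADDRNOTAVAIL;  /* IPv6 not configured */
              }
              #endif

      On the Kconfig side, the usual way to express "not built in while IPv6
      is modular" is a dependency of the (IPV6 || IPV6=n) style.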
  10. 25 Dec 2008, 2 commits
  11. 02 Nov 2008, 1 commit
    • saner FASYNC handling on file close · 233e70f4
      Authored by Al Viro
      As it is, all instances of ->release() for files that have ->fasync()
      need to remember to evict the file from the fasync lists; forgetting
      that creates a hole, and we actually have a bunch that *does* forget.
      
      So let's keep our lives simple - let __fput() check FASYNC in
      file->f_flags and call ->fasync() there if it's been set.  And lose that
      crap in ->release() instances - leaving it there is still valid, but we
      don't have to bother anymore.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      233e70f4
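      The check __fput() gains is essentially this (sketch, modulo its exact
      placement in the teardown sequence):

              if (file->f_flags & FASYNC) {
                      /* kick the file off any fasync list it is still on */
                      if (file->f_op && file->f_op->fasync)
                              file->f_op->fasync(-1, file, 0);
              }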
  12. 30 Oct 2008, 1 commit
  13. 29 Oct 2008, 1 commit
  14. 20 Oct 2008, 1 commit
    • x86: sysfs: kill owner field from attribute · 01e8ef11
      Authored by Parag Warudkar
      Tejun's commit 7b595756 made sysfs
      attribute->owner unnecessary.  But the field was left in the structure to
      ease the merge.  It's been over a year since that change and it is now
      time to start killing attribute->owner along with its users - one arch at
      a time!
      
      This patch is attempt #1 to get rid of attribute->owner only for
      CONFIG_X86_64 or CONFIG_X86_32.  We will deal with other arches later on
      as and when possible - avr32 will be the next since that is something I
      can test.  Compile (make allyesconfig / make allmodconfig / custom config)
      and boot tested.
      
      akpm: the idea is that we put the declaration of attribute.owner inside
      `#ifndef CONFIG_X86'.  But that proved to be too ambitious for now because
      new usages kept on turning up in subsystem trees.
      
      [akpm: remove the ifdef for now]
      Signed-off-by: Parag Warudkar <parag.lkml@gmail.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Tejun Heo <htejun@gmail.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Jean Delvare <khali@linux-fr.org>
      Cc: Roland Dreier <rolandd@cisco.com>
      Cc: David Brownell <david-b@pacbell.net>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      01e8ef11
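      In practice this just means code stops setting the field; a
      hypothetical sysfs attribute as an example:

              static struct attribute foo_attr = {
                      .name = "foo",
                      .mode = S_IRUGO,
                      /* .owner = THIS_MODULE, -- no longer set; the field is
                       * on its way out */
              };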
  15. 17 Oct 2008, 1 commit
  16. 15 Oct 2008, 1 commit
  17. 11 Oct 2008, 1 commit
  18. 01 Oct 2008, 1 commit
  19. 21 Sep 2008, 1 commit
  20. 08 Aug 2008, 1 commit
  21. 05 Aug 2008, 1 commit
    • RDMA/cma: Remove padding arrays by using struct sockaddr_storage · 3f446754
      Authored by Roland Dreier
      There are a few places where the RDMA CM code handles IPv6 by doing
      
      	struct sockaddr		addr;
      	u8			pad[sizeof(struct sockaddr_in6) -
      				    sizeof(struct sockaddr)];
      
      This is fragile and ugly; handle this in a better way with just
      
      	struct sockaddr_storage	addr;
      
      [ Also roll in patch from Aleksey Senin <alekseys@voltaire.com> to
        switch to struct sockaddr_storage and get rid of padding arrays in
        struct rdma_addr. ]
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      3f446754
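      struct sockaddr_storage is defined to be large enough (and suitably
      aligned) for any address family, so callers just cast based on
      sa_family; a small usage sketch:

              struct sockaddr_storage addr;
              __be16 port;

              if (((struct sockaddr *) &addr)->sa_family == AF_INET6)
                      port = ((struct sockaddr_in6 *) &addr)->sin6_port;
              else
                      port = ((struct sockaddr_in *) &addr)->sin_port;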
  22. 25 Jul 2008, 2 commits
  23. 23 Jul 2008, 4 commits
  24. 22 Jul 2008, 2 commits
  25. 15 Jul 2008, 8 commits
    • RDMA/cma: Simplify locking needed for serialization of callbacks · de910bd9
      Authored by Or Gerlitz
      The RDMA CM has some logic in place to make sure that callbacks on a
      given CM ID are delivered to the consumer in a serialized manner.
      Specifically it has code to protect against a device removal racing
      with a running callback function.
      
      This patch simplifies this logic by using a mutex per ID instead of a
      wait queue and atomic variable.  As a result, cma_disable_remove() is
      renamed to the more accurate cma_disable_callback(), and
      cma_enable_remove() can be removed entirely, because it would become
      just a trivial wrapper around mutex_unlock().
      Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      de910bd9
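      Roughly, the per-ID serialization then looks like this (simplified
      sketch of the renamed helper):

              static int cma_disable_callback(struct rdma_id_private *id_priv,
                                              enum cma_state state)
              {
                      mutex_lock(&id_priv->handler_mutex);
                      if (id_priv->state != state) {
                              mutex_unlock(&id_priv->handler_mutex);
                              return -EINVAL;
                      }
                      return 0;       /* caller runs its callback, then unlocks */
              }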
    • RDMA/addr: Keep pointer to netdevice in struct rdma_dev_addr · 64c5e613
      Authored by Or Gerlitz
      Keep a pointer to the local (src) netdevice in struct rdma_dev_addr,
      and copy it in as part of rdma_copy_addr().  Use rdma_translate_ip()
      in cma_new_conn_id() to reduce some code duplication and also make
      sure the src_dev member gets set.
      
      In a high-availability configuration, the netdevice pointer can be used
      by the RDMA CM to keep RDMA sessions on the same links that the IP
      stack uses in fail-over and route-change cases.
      Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      64c5e613
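      A sketch of the tail of rdma_copy_addr() with the new assignment (the
      surrounding field handling is simplified; src_dev is the point here):

              memcpy(dev_addr->src_dev_addr, dev->dev_addr, dev->addr_len);
              memcpy(dev_addr->broadcast, dev->broadcast, dev->addr_len);
              if (dst_dev_addr)
                      memcpy(dev_addr->dst_dev_addr, dst_dev_addr, dev->addr_len);
              dev_addr->src_dev = dev;        /* new: remember the netdevice */
              return 0;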
    • RDMA/core: Add iWARP protocol statistics attributes in sysfs · 7f624d02
      Authored by Steve Wise
      This patch adds a sysfs attribute group called "proto_stats" under
      /sys/class/infiniband/$device/ and populates this group with protocol
      statistics if they exist for a given device.  Currently, only iWARP
      stats are defined, but the code is designed to allow InfiniBand
      protocol stats if they become available.  These stats are per-device
      and more importantly -not- per port.
      
      Details:
      
      - Add union rdma_protocol_stats in ib_verbs.h.  This union allows
        defining transport-specific stats.  Currently only iwarp stats are
        defined.
      
      - Add struct iw_protocol_stats to define the current set of iwarp
        protocol stats.
      
      - Add new ib_device method called get_proto_stats() to return protocol
        statistics.
      
      - Add logic in core/sysfs.c to create iwarp protocol stats attributes
        if the device is an RNIC and has a get_proto_stats() method.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      7f624d02
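      The shape of the interface, sketched with the names used in the text
      above (the individual counter fields are illustrative):

              struct iw_protocol_stats {
                      u64 tcpInSegs;
                      u64 tcpOutSegs;
                      u64 tcpRetransSegs;
                      /* further MIB-style counters */
              };

              union rdma_protocol_stats {
                      struct ib_protocol_stats ib;
                      struct iw_protocol_stats iw;
              };

              /* new optional ib_device method */
              int (*get_proto_stats)(struct ib_device *device,
                                     union rdma_protocol_stats *stats);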
    • 468f2239
    • IB/core: Reset to error QP state transition is not allowed · e5a5e7d5
      Authored by Ralph Campbell
      I was reviewing the QP state transition diagram in the IB 1.2.1 spec
      and the code for qp_state_table[], and noticed that the code allows a
      QP to be modified from IB_QPS_RESET to IB_QPS_ERR whereas the notes
      for figure 124 (pg 457) specifically say that this transition isn't
      allowed.  This is a clarification from earlier versions of the IB
      spec, which were ambiguous in this area and suggested that the RESET
      to ERR transition was allowed.
      
      Fix up the qp_state_table[] to make RESET->ERR not allowed.
      Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      e5a5e7d5
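      Conceptually the change is one entry in the transition table; a
      simplified sketch (the real table also carries the required/optional
      attribute masks per transition):

              static const struct {
                      int valid;
                      /* required/optional ib_qp_attr_mask fields omitted */
              } qp_state_table[IB_QPS_ERR + 1][IB_QPS_ERR + 1] = {
                      [IB_QPS_RESET] = {
                              [IB_QPS_RESET] = { .valid = 1 },
                              [IB_QPS_INIT]  = { .valid = 1 },
                              /* [IB_QPS_ERR] entry removed: RESET->ERR is no
                               * longer accepted */
                      },
                      /* remaining rows unchanged */
              };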
    • RDMA/core: Add memory management extensions support · 00f7ec36
      Authored by Steve Wise
      This patch adds support for the IB "base memory management extension"
      (BMME) and the equivalent iWARP operations (which the iWARP verbs
      specification mandates all devices implement).  The new operations are:
      
       - Allocate an ib_mr for use in fast register work requests.
      
       - Allocate/free physical buffer lists for use in fast register work
         requests.  This allows device drivers to allocate this memory as
         needed for use in posting send requests (e.g. via dma_alloc_coherent).
      
       - New send queue work requests:
         * send with remote invalidate
         * fast register memory region
         * local invalidate memory region
         * RDMA read with invalidate local memory region (iWARP only)
      
      Consumer interface details:
      
       - A new device capability flag IB_DEVICE_MEM_MGT_EXTENSIONS is added
         to indicate device support for these features.
      
       - New send work request opcodes IB_WR_FAST_REG_MR, IB_WR_LOCAL_INV,
         IB_WR_RDMA_READ_WITH_INV are added.
      
       - A new consumer API function, ib_alloc_mr() is added to allocate
         fast register memory regions.
      
       - New consumer API functions, ib_alloc_fast_reg_page_list() and
         ib_free_fast_reg_page_list() are added to allocate and free
         device-specific memory for fast registration page lists.
      
       - A new consumer API function, ib_update_fast_reg_key(), is added to
         allow the key portion of the R_Key and L_Key of a fast registration
         MR to be updated.  Consumers call this if desired before posting
         a IB_WR_FAST_REG_MR work request.
      
      Consumers can use this as follows:
      
       - MR is allocated with ib_alloc_mr().
      
       - Page list memory is allocated with ib_alloc_fast_reg_page_list().
      
       - MR R_Key/L_Key "key" field is updated with ib_update_fast_reg_key().
      
       - MR made VALID and bound to a specific page list via
         ib_post_send(IB_WR_FAST_REG_MR)
      
       - MR made INVALID via ib_post_send(IB_WR_LOCAL_INV),
         ib_post_send(IB_WR_RDMA_READ_WITH_INV) or an incoming send with
         invalidate operation.
      
       - MR is deallocated with ib_dereg_mr()
      
       - page lists dealloced via ib_free_fast_reg_page_list().
      
      Applications can allocate a fast register MR once, and then can
      repeatedly bind the MR to different physical block lists (PBLs) via
      posting work requests to a send queue (SQ).  For each outstanding
      MR-to-PBL binding in the SQ pipe, a fast_reg_page_list needs to be
      allocated (the fast_reg_page_list is owned by the low-level driver
      from the consumer posting a work request until the request completes).
      Thus pipelining can be achieved while still allowing device-specific
      page_list processing.
      
      The 32-bit fast register memory key/STag is composed of a 24-bit index
      and an 8-bit key.  The application can change the key each time it
      fast registers, thus allowing more control over the peer's use of the
      key/STag (i.e. it can effectively be changed each time the rkey is
      rebound to a page list).
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      00f7ec36
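      A consumer-side sketch following the steps above, using the API names
      given in this commit message; pd, qp and max_page_list_len are assumed
      to exist, and the exact signatures and ib_send_wr fast_reg fields shown
      are assumptions rather than verified declarations:

              struct ib_mr *mr;
              struct ib_fast_reg_page_list *pl;
              struct ib_send_wr wr, *bad_wr;
              u8 key = 0;

              mr = ib_alloc_mr(pd, max_page_list_len);
              pl = ib_alloc_fast_reg_page_list(pd->device, max_page_list_len);
              /* error handling omitted */

              ib_update_fast_reg_key(mr, ++key);      /* fresh 8-bit key per rebind */

              memset(&wr, 0, sizeof wr);
              wr.opcode = IB_WR_FAST_REG_MR;
              wr.wr.fast_reg.rkey      = mr->rkey;
              wr.wr.fast_reg.page_list = pl;
              /* iova_start, length, page_shift and access flags also go here */

              ib_post_send(qp, &wr, &bad_wr);         /* MR becomes VALID */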
    • RDMA: Remove subversion $Id tags · f3781d2e
      Authored by Roland Dreier
      They don't get updated by git and so they're worse than useless.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      f3781d2e
    • IB/sa: Fail requests made while creating new SM AH · 164ba089
      Authored by Moni Shoua
      This patch solves a race that follows an event which causes the SA
      query module to flush its SM address handle (AH).  When the SM AH
      becomes invalid and needs an update, the update is handled by the
      global workqueue.  However, the same event is also handled in the IPoIB
      driver by queuing work on the ipoib_workqueue that does multicast
      joins.  Although the queuing happens in the right order, it goes to two
      different workqueues, so there is no guarantee that the first work
      queued is the first to be executed.
      
      This causes a problem because IPoIB may end up sending a request to
      the old SM, which will take a long time to time out (since the old SM
      is gone); this leads to a much longer than necessary interruption in
      multicast traffic.
      
      The patch sets the SA query module's SM AH to NULL when the event
      occurs, and until update_sm_ah() is done, any request that needs sm_ah
      fails with -EAGAIN return status.
      
      For consumers, the patch doesn't make things worse.  Before the patch,
      MADs are sent to the wrong SM, so the request gets lost.  Consumers can
      be improved to examine the return code and respond to -EAGAIN properly,
      but even without that improvement the situation does not get any worse.
      Signed-off-by: Moni Levy <monil@voltaire.com>
      Signed-off-by: Moni Shoua <monis@voltaire.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      164ba089
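      The send path then refuses to use a missing AH; roughly (a simplified
      sketch of the check, with port and query from the surrounding code):

              spin_lock_irqsave(&port->ah_lock, flags);
              if (!port->sm_ah) {
                      /* flushed by the event; update_sm_ah() not finished */
                      spin_unlock_irqrestore(&port->ah_lock, flags);
                      return -EAGAIN;
              }
              kref_get(&port->sm_ah->ref);
              query->sm_ah = port->sm_ah;
              spin_unlock_irqrestore(&port->ah_lock, flags);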