1. 20 Sep, 2010 (1 commit)
  2. 09 Sep, 2010 (13 commits)
    • RDS/IB: print string constants in more places · 59f740a6
      Committed by Zach Brown
      This prints the constant identifier for work completion status and rdma
      cm event types, like we already do for IB event types.
      
      A core string array helper is added that each string type uses.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
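The string-table pattern this commit describes can be sketched in user-space C as follows. This is only an illustration: every `demo_` name and the three event codes are hypothetical, and the real RDS helper and IB enums are not shown in this log.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical event codes; the real enums come from the IB headers. */
enum demo_event {
	DEMO_EVENT_CQ_ERR,
	DEMO_EVENT_QP_FATAL,
	DEMO_EVENT_PORT_ACTIVE,
};

/* Designated initializers keep each string next to its enum value. */
static const char *const demo_event_strings[] = {
	[DEMO_EVENT_CQ_ERR]      = "DEMO_EVENT_CQ_ERR",
	[DEMO_EVENT_QP_FATAL]    = "DEMO_EVENT_QP_FATAL",
	[DEMO_EVENT_PORT_ACTIVE] = "DEMO_EVENT_PORT_ACTIVE",
};

/* Shared helper: bounds-check the index and fall back to "unknown". */
const char *demo_str_array(const char *const *array, size_t elements,
			   size_t index)
{
	assert(array != NULL);
	if (index < elements && array[index])
		return array[index];
	return "unknown";
}

/* Per-type wrapper; each string table would get one of these. */
const char *demo_event_str(unsigned int event)
{
	return demo_str_array(demo_event_strings,
			      DEMO_ARRAY_SIZE(demo_event_strings), event);
}
```

A log line can then print both forms, for example `printf("event %u (%s)\n", ev, demo_event_str(ev));`, so an out-of-range value still produces readable output.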
    • RDS/IB: print IB event strings as well as their number · 1bde04a6
      Committed by Zach Brown
      It's nice to not have to go digging in the code to see which event
      occurred.  It's easy to throw together a quick array that maps the ib
      event enums to their strings.  I didn't see anything in the stack that
      does this translation for us, but I also didn't look very hard.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
    • RDS/IB: track signaled sends · f046011c
      Committed by Zach Brown
      We're seeing bugs today where IB connection shutdown clears the send
      ring while the tasklet is processing completed sends.  Implementation
      details cause this to dereference a null pointer.  Shutdown needs to
      wait for send completion to stop before tearing down the connection.  We
      can't simply wait for the ring to empty because it may contain
      unsignaled sends that will never be processed.
      
      This patch tracks the number of signaled sends that we've posted and
      waits for them to complete.  It also makes sure that the tasklet has
      finished executing.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
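The counting idea in this commit message can be sketched with C11 atomics. This is a user-space illustration under assumptions: the `demo_` names are hypothetical, and the actual patch additionally waits on a wait queue and for the tasklet to finish, which is not modeled here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct demo_conn {
	atomic_int signaled_sends;	/* signaled sends posted, not yet completed */
};

void demo_conn_init(struct demo_conn *conn)
{
	atomic_init(&conn->signaled_sends, 0);
}

/* Called when posting a send; only signaled sends produce a completion. */
void demo_post_send(struct demo_conn *conn, bool signaled)
{
	if (signaled)
		atomic_fetch_add(&conn->signaled_sends, 1);
}

/*
 * Called from the completion path with the number of signaled sends reaped.
 * Returns true when none remain, which is when a waiter should be woken.
 */
bool demo_complete_sends(struct demo_conn *conn, int nr)
{
	int before = atomic_fetch_sub(&conn->signaled_sends, nr);

	assert(before >= nr);
	return before == nr;
}

/* Shutdown may tear the ring down only once this returns true. */
bool demo_sends_drained(struct demo_conn *conn)
{
	return atomic_load(&conn->signaled_sends) == 0;
}
```

Unsignaled sends never touch the counter, so shutdown does not wait for completions that will never arrive.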
    • RDS/IB: Add caching of frags and incs · 33244125
      Committed by Chris Mason
      This patch is based heavily on an initial patch by Chris Mason.
      Instead of freeing slab memory and pages, it keeps them, and
      funnels them back to be reused.
      
      The lock minimization strategy uses xchg and cmpxchg atomic ops
      for manipulation of pointers to list heads. We anchor the lists with a
      pointer to a list_head struct instead of a static list_head struct.
      We just have to carefully use the existing primitives with
      the difference between a pointer and a static head struct.
      
      For example, 'list_empty()' means that our anchor pointer points to a list with
      a single item instead of meaning that our static head element doesn't point to
      any list items.
      
      Original patch by Chris, with significant mods and fixes by Andy and Zach.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      Signed-off-by: Andy Grover <andy.grover@oracle.com>
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
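A simplified sketch of a pointer-anchored cache manipulated with compare-and-exchange and exchange operations, using C11 atomics in user space. It deliberately swaps the kernel's doubly linked `list_head` for a singly linked list, and all `demo_` names are hypothetical, so it shows the general technique rather than the patch's actual data structure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct demo_item {
	struct demo_item *next;
	int id;
};

struct demo_cache {
	/* The anchor is a pointer to the first item, NULL when empty. */
	_Atomic(struct demo_item *) head;
};

void demo_cache_init(struct demo_cache *cache)
{
	atomic_init(&cache->head, (struct demo_item *)NULL);
}

/* Return an item to the cache: retry the cmpxchg until the anchor swings. */
void demo_cache_put(struct demo_cache *cache, struct demo_item *item)
{
	struct demo_item *old = atomic_load(&cache->head);

	assert(item != NULL);
	do {
		item->next = old;	/* old is refreshed on each failed attempt */
	} while (!atomic_compare_exchange_weak(&cache->head, &old, item));
}

/* Take the whole cached list in one xchg, leaving the anchor empty. */
struct demo_item *demo_cache_take_all(struct demo_cache *cache)
{
	return atomic_exchange(&cache->head, (struct demo_item *)NULL);
}
```

Detaching the entire list with a single exchange avoids popping individual items concurrently, which is where lock-free lists usually run into ABA problems.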
    • RDS/IB: add refcount tracking to struct rds_ib_device · 3e0249f9
      Committed by Zach Brown
      The RDS IB client .remove callback used to free the rds_ibdev for the given
      device unconditionally.  This could race other users of the struct.  This patch
      adds refcounting so that we only free the rds_ibdev once all of its users are
      done.
      
      Many rds_ibdev users are tied to connections.  We give the connection a
      reference and change these users to reference the device in the connection
      instead of looking it up in the IB client data.  The only user of the IB client
      data remaining is the first lookup of the device as connections are built up.
      
      Incrementing the reference count of a device found in the IB client data could
      race with final freeing so we use an RCU grace period to make sure that freeing
      won't happen until those lookups are done.
      
      MRs need the rds_ibdev to get at the pool that they're freed into.  They exist
      outside a connection and many MRs can reference different devices from one
      socket, so it was natural to have each MR hold a reference.  MR refs can be
      dropped from interrupt handlers and final device teardown can block so we push
      it off to a work struct.  Pool teardown had to be fixed to cancel its pending
      work instead of deadlocking waiting for all queued work, including itself, to
      finish.
      
      MRs get their reference from the global device list, which gets a reference.
      It is left unprotected by locks and remains racy.  A simple global lock would
      be a significant bottleneck.  More scalable (complicated) locking should be
      done carefully in a later patch.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
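The core get/put discipline described above can be sketched in user-space C. The `demo_` names are hypothetical, and the RCU grace period and deferred teardown via a work struct mentioned in the message are not modeled; this only shows freeing on the final put.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct demo_dev {
	atomic_int refcount;
};

/* Allocate a device holding one reference on behalf of the creator. */
struct demo_dev *demo_dev_alloc(void)
{
	struct demo_dev *dev = malloc(sizeof(*dev));

	if (dev)
		atomic_init(&dev->refcount, 1);
	return dev;
}

/* Each long-lived user (a connection, an MR) takes its own reference. */
void demo_dev_get(struct demo_dev *dev)
{
	atomic_fetch_add(&dev->refcount, 1);
}

/* Drop a reference; returns true if this was the last one and dev was freed. */
bool demo_dev_put(struct demo_dev *dev)
{
	int before = atomic_fetch_sub(&dev->refcount, 1);

	assert(before > 0);
	if (before != 1)
		return false;
	free(dev);
	return true;
}
```

With this shape, a remove callback only drops its own reference, and the struct survives until the last connection or MR lets go.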
    • RDS/IB: rds_ib_cm_handle_connect() forgot to unlock c_cm_lock · a46ca94e
      Committed by Zach Brown
      rds_ib_cm_handle_connect() could return without unlocking the c_cm_lock if
      rds_setup_qp() failed.  Rather than adding another imbalanced mutex_unlock() to
      this error path we only unlock the mutex once as we exit the function, reducing
      the likelihood of making this same mistake in the future.  We remove the
      previous multiple return sites, leaving one unambiguous return path.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
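The single-exit pattern described here can be sketched as follows. The lock is a hypothetical counter-based stand-in so the example runs in user space, and `demo_handle_connect()` with its `setup_result` parameter is illustrative rather than the real function's signature.

```c
#include <assert.h>

/* Hypothetical stand-in for a mutex; held is 1 while the lock is taken. */
struct demo_lock {
	int held;
};

static void demo_lock_acquire(struct demo_lock *lock)
{
	assert(lock->held == 0);
	lock->held = 1;
}

static void demo_lock_release(struct demo_lock *lock)
{
	assert(lock->held == 1);
	lock->held = 0;
}

/*
 * setup_result simulates the outcome of a setup step that can fail.
 * Every path, success or failure, leaves through the single "out" label,
 * so the unlock cannot be forgotten on a newly added error branch.
 */
int demo_handle_connect(struct demo_lock *lock, int setup_result)
{
	int err = 0;

	demo_lock_acquire(lock);

	if (setup_result != 0) {
		err = setup_result;
		goto out;
	}

	/* ... remaining connection setup would go here ... */

out:
	demo_lock_release(lock);
	return err;
}
```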
    • RDS/IB: add _to_node() macros for numa and use {k,v}malloc_node() · e4c52c98
      Committed by Andy Grover
      Allocate send/recv rings in memory that is node-local to the HCA.
      This significantly helps performance.
      Signed-off-by: Andy Grover <andy.grover@oracle.com>
    • RDS: Refill recv ring directly from tasklet · f17a1a55
      Committed by Andy Grover
      Performance is better if we use allocations that don't block
      to refill the receive ring. Since the whole reason we were
      kicking out to the worker thread was so we could do blocking
      allocs, we no longer need to do this.
      
      Remove gfp params from rds_ib_recv_refill(); we always use
      GFP_NOWAIT.
      Signed-off-by: Andy Grover <andy.grover@oracle.com>
    • RDS/IB: Do not wait for send ring to be empty on conn shutdown · e32b4a70
      Committed by Andy Grover
      Now that we are signaling send completions much less, we are likely
      to have dirty entries in the send queue when the connection is
      shut down (on rmmod, for example). These are cleaned up a little
      further down in conn_shutdown, but if we wait on the ring_empty_wait
      for them, it'll never happen, and we hang on unload.
      Signed-off-by: Andy Grover <andy.grover@oracle.com>
    • RDS: Perform unmapping ops in stages · ff3d7d36
      Committed by Andy Grover
      Previously, RDS would wait until the final send WR had completed
      and then handle cleanup. With silent ops, we do not know
      if an atomic, rdma, or data op will be last. This patch
      handles any of these cases by keeping a pointer to the last
      op in the message in m_last_op.
      
      When the TX completion event fires, rds dispatches to per-op-type
      cleanup functions, and then does whole-message cleanup, if the
      last op equalled m_last_op.
      
      This patch also moves towards having op-specific functions take
      the op struct, instead of the overall rm struct.
      
      rds_ib_connection has a pointer to keep track of a partially-completed
      data send operation. This patch changes it from an
      rds_message pointer to the narrower rm_data_op pointer, and
      modifies places that use this pointer as needed.
      Signed-off-by: Andy Grover <andy.grover@oracle.com>
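The staged cleanup described in this message can be sketched as below. The struct layout and all `demo_` names are hypothetical; only the idea of comparing the completed op against a stored last-op pointer (the role `m_last_op` plays in the message) is taken from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_op_type { DEMO_OP_ATOMIC, DEMO_OP_RDMA, DEMO_OP_DATA };

struct demo_op {
	enum demo_op_type type;
	bool cleaned;		/* per-op cleanup has run */
};

struct demo_message {
	struct demo_op atomic;
	struct demo_op rdma;
	struct demo_op data;
	struct demo_op *last_op;	/* whichever op is posted last */
	bool cleaned;			/* whole-message cleanup has run */
};

/*
 * Called once per completed op. Op-specific cleanup always runs and takes
 * the op struct, not the whole message; message-level cleanup runs only
 * when the completed op is the one recorded as last.
 */
void demo_op_complete(struct demo_message *msg, struct demo_op *op)
{
	assert(!op->cleaned);
	op->cleaned = true;	/* a real version would dispatch on op->type */

	if (op == msg->last_op)
		msg->cleaned = true;
}
```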
    • RDS/IB: Disallow connections less than RDS 3.1 · f147dd9e
      Committed by Andy Grover
      RDS 3.0 connections (in OFED 1.3 and earlier) put the
      header at the end. 3.1 connections put it at the head.
      The code has significant added complexity in order to
      handle both configurations. In OFED 1.6 we can
      drop this and simplify the code by only supporting
      "header-first" configuration.
      
      This patch checks the protocol version, and if prior
      to 3.1, does not complete the connection.
      Signed-off-by: Andy Grover <andy.grover@oracle.com>
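A version gate of this kind can be sketched by packing major and minor into one comparable integer. The `DEMO_` macros and the packing layout are assumptions made for illustration; the log does not show how the patch encodes versions.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed encoding: major in the high byte, minor in the low byte. */
#define DEMO_PROTOCOL(maj, min)	(((maj) << 8) | (min))
#define DEMO_PROTOCOL_3_1	DEMO_PROTOCOL(3, 1)

/* Accept only peers speaking 3.1 or later ("header-first" layout). */
bool demo_version_supported(unsigned int major, unsigned int minor)
{
	assert(major <= 0xff && minor <= 0xff);
	return DEMO_PROTOCOL(major, minor) >= DEMO_PROTOCOL_3_1;
}
```

A connect handler would call this on the peer's advertised version and refuse to complete the connection when it returns false.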
    • RDS: Base init_depth and responder_resources on hw values · 40589e74
      Committed by Andy Grover
      Instead of using a constant for initiator_depth and
      responder_resources, read the per-QP values when the
      device is enumerated, and then use these values when creating
      the connection.
      Signed-off-by: Andy Grover <andy.grover@oracle.com>
    • RDS: cleanup: remove "== NULL"s and "!= NULL"s in ptr comparisons · 8690bfa1
      Committed by Andy Grover
      Favor "if (foo)" style over "if (foo != NULL)".
      Signed-off-by: Andy Grover <andy.grover@oracle.com>
  3. 29 May, 2010 (1 commit)
  4. 30 Mar, 2010 (1 commit)
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Committed by Tejun Heo
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  5. 17 Mar, 2010 (1 commit)
  6. 30 Nov, 2009 (1 commit)
  7. 31 Oct, 2009 (1 commit)
  8. 20 Jul, 2009 (6 commits)
  9. 02 Apr, 2009 (2 commits)
  10. 27 Feb, 2009 (1 commit)