1. 15 Sep 2010 · 2 commits
  2. 14 Sep 2010 · 1 commit
  3. 11 Sep 2010 · 2 commits
  4. 10 Sep 2010 · 5 commits
  5. 09 Sep 2010 · 30 commits
    • udp: add rehash on connect() · 719f8358
      Committed by Eric Dumazet
      commit 30fff923 introduced in linux-2.6.33 (udp: bind() optimisation)
      added a secondary hash on UDP, hashed on (local addr, local port).
      
      The problem is that the following sequence:
      
      fd = socket(...)
      connect(fd, &remote, ...)
      
      not only selects remote end point (address and port), but also sets
      local address, while UDP stack stored in secondary hash table the socket
      while its local address was INADDR_ANY (or ipv6 equivalent)
      
      The sequence is:
       - autobind() : choose a random local port, insert socket in hash tables
                    [while local address is INADDR_ANY]
       - connect() : set remote address and port, change local address to IP
                    given by a route lookup.
      
      When an incoming UDP frame arrives, if more than 10 sockets are found in
      primary hash table, we switch to secondary table, and fail to find
      socket because its local address changed.
      
      One solution to this problem is to rehash datagram socket if needed.
      
      We add a new rehash(struct socket *) method in "struct proto", and
      implement this method for UDP v4 & v6, using a common helper.
      
      This rehashing only takes care of secondary hash table, since primary
      hash (based on local port only) is not changed.
      Reported-by: Krzysztof Piotr Oledzki <ole@ans.pl>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl>
      Signed-off-by: David S. Miller <davem@davemloft.net>
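The failure described in this commit can be reproduced in a small user-space toy model. The sketch below is not kernel code: `toy_sock`, `hash2_fn()`, `toy_connect()` and the bucket count are hypothetical names and values chosen for illustration. It only shows why a table keyed on (local addr, local port) loses a socket whose local address changes after insertion, and how rehashing restores the lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS     8u
#define TOY_ADDR_ANY 0u   /* stands in for INADDR_ANY */

/* Toy socket: only the fields the secondary hash depends on. */
struct toy_sock {
    uint32_t local_addr;
    uint16_t local_port;
    struct toy_sock *next;   /* chain inside one bucket */
};

static struct toy_sock *hash2[NBUCKETS];

/* Hypothetical secondary hash on (local addr, local port). */
static unsigned int hash2_fn(uint32_t addr, uint16_t port)
{
    return ((addr * 2654435761u) ^ port) % NBUCKETS;
}

static void hash2_insert(struct toy_sock *sk)
{
    assert(sk != NULL);
    unsigned int b = hash2_fn(sk->local_addr, sk->local_port);
    sk->next = hash2[b];
    hash2[b] = sk;
}

/* Unlink sk from the bucket it was originally inserted into. */
static void hash2_remove(struct toy_sock *sk, unsigned int bucket)
{
    struct toy_sock **pp = &hash2[bucket];
    while (*pp && *pp != sk)
        pp = &(*pp)->next;
    if (*pp)
        *pp = sk->next;
    sk->next = NULL;
}

static struct toy_sock *hash2_lookup(uint32_t addr, uint16_t port)
{
    struct toy_sock *sk = hash2[hash2_fn(addr, port)];
    for (; sk; sk = sk->next)
        if (sk->local_addr == addr && sk->local_port == port)
            return sk;
    return NULL;
}

/*
 * connect() analogue: the route lookup picks a concrete local address.
 * Without the rehash step the socket stays chained in the bucket that
 * was computed while its address was still TOY_ADDR_ANY.
 */
static void toy_connect(struct toy_sock *sk, uint32_t new_local, bool rehash)
{
    unsigned int old_bucket = hash2_fn(sk->local_addr, sk->local_port);

    sk->local_addr = new_local;
    if (rehash) {
        hash2_remove(sk, old_bucket);
        hash2_insert(sk);
    }
}
```

Calling `toy_connect()` with `rehash` set to `false` leaves the socket unreachable through `hash2_lookup()` on its new address; with `rehash` set to `true` the lookup succeeds again.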
    • net: inet_add_protocol() can use cmpxchg() · e0386005
      Committed by Eric Dumazet
      Use cmpxchg() to get rid of spinlocks in inet_add_protocol() and
      friends.
      
      inet_protos[] & inet6_protos[] are moved to read_mostly section
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
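The idea of replacing a spinlock with a compare-and-exchange on a table slot can be sketched in user space with C11 atomics. This is an illustration under assumptions, not the kernel implementation: `toy_protocol`, `toy_protos[]`, `toy_add_protocol()` and `toy_del_protocol()` are hypothetical stand-ins, and the sketch leaves out whatever waiting for concurrent readers the real code needs after removal.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_PROTOS 256

struct toy_protocol {
    const char *name;
};

/* One atomic slot per protocol number; NULL means "unregistered". */
static _Atomic(const struct toy_protocol *) toy_protos[MAX_PROTOS];

/* Claim the slot only if it is currently empty. Returns 0 or -1. */
static int toy_add_protocol(const struct toy_protocol *prot, unsigned char num)
{
    const struct toy_protocol *expected = NULL;

    assert(prot != NULL);
    return atomic_compare_exchange_strong(&toy_protos[num], &expected, prot)
               ? 0 : -1;
}

/* Clear the slot only if it still holds this handler. Returns 0 or -1. */
static int toy_del_protocol(const struct toy_protocol *prot, unsigned char num)
{
    const struct toy_protocol *expected = prot;
    const struct toy_protocol *empty = NULL;

    return atomic_compare_exchange_strong(&toy_protos[num], &expected, empty)
               ? 0 : -1;
}
```

A single compare-and-exchange both checks the slot and updates it, so two racing registrations for the same protocol number cannot both succeed and no lock is required around the table.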
    • RDS: Implement masked atomic operations · 20c72bd5
      Committed by Andy Grover
      Add two CMSGs for masked versions of cswp and fadd. The args
      struct is modified to use a union for the arguments of the
      different atomic op types. IB is changed to do masked atomic ops.
      The atomic op type in rds_message is similarly turned into a union.
      Signed-off-by: Andy Grover <andy.grover@oracle.com>
    • RDS/IB: print string constants in more places · 59f740a6
      Committed by Zach Brown
      This prints the constant identifier for work completion status and rdma
      cm event types, like we already do for IB event types.
      
      A core string array helper is added that each string type uses.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
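A shared string-array helper of the kind this commit mentions might look like the sketch below. The names (`toy_str_array()`, `toy_event`, `toy_event_strings`) and the "unknown" fallback are assumptions made for illustration, not the identifiers used in RDS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event codes; the gap at value 2 is deliberate. */
enum toy_event {
    TOY_EV_CONNECTED    = 0,
    TOY_EV_DISCONNECTED = 1,
    TOY_EV_ERROR        = 3
};

static const char toy_unknown[] = "unknown";

/* Core helper: bounds-checked lookup with a fallback for holes. */
static const char *toy_str_array(const char *const *array, size_t elements,
                                 size_t index)
{
    assert(array != NULL);
    if (index < elements && array[index])
        return array[index];
    return toy_unknown;
}

static const char *const toy_event_strings[] = {
    [TOY_EV_CONNECTED]    = "TOY_EV_CONNECTED",
    [TOY_EV_DISCONNECTED] = "TOY_EV_DISCONNECTED",
    [TOY_EV_ERROR]        = "TOY_EV_ERROR",
};

/* Per-type wrapper: each string table gets a thin function like this. */
static const char *toy_event_str(enum toy_event ev)
{
    return toy_str_array(toy_event_strings,
                         sizeof(toy_event_strings) / sizeof(toy_event_strings[0]),
                         (size_t)ev);
}
```

Keeping the bounds and NULL checks in one helper means each new string table only needs a designated-initializer array and a one-line wrapper.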
    • RDS: cancel connection work structs as we shut down · 4518071a
      Committed by Zach Brown
      Nothing was canceling the send and receive work that might have been
      queued as a conn was being destroyed.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
    • RDS: don't call rds_conn_shutdown() from rds_conn_destroy() · ffcec0e1
      Committed by Zach Brown
      rds_conn_shutdown() can return before the connection is shut down when
      it encounters an existing state that it doesn't understand.  This lets
      rds_conn_destroy() then start tearing down the conn from under paths
      that are still using it.
      
      It's more reliable to queue the shutdown work and wait for krdsd to
      complete the shutdown callback.  This stopped some hangs I was seeing
      where krdsd was trying to shut down a freed conn.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
    • RDS: have sockets get transport module references · 5adb5bc6
      Committed by Zach Brown
      Right now there's nothing to stop the various paths that use
      rs->rs_transport from racing with rmmod and executing freed transport
      code.  The simple fix is to have binding to a transport also hold a
      reference to the transport's module, removing this class of races.
      
      We already had an unused t_owner field which was set for the modular
      transports and which wasn't set for the built-in loop transport.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
    • RDS: remove old rs_transport comment · 77510481
      Committed by Zach Brown
      rs_transport is now also used by the rdma paths once the socket is
      bound.  We don't need this stale comment to tell us what cscope can.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
    • RDS: lock rds_conn_count decrement in rds_conn_destroy() · fe8ff6b5
      Committed by Zach Brown
      rds_conn_destroy() can race with all other modifications of the
      rds_conn_count but it was modifying the count without locking.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
    • RDS/IB: protect the list of IB devices · ea819867
      Committed by Zach Brown
      The RDS IB device list wasn't protected by any locking.  Traversal in
      both the get_mr and FMR flushing paths could race with addition and
      removal.
      
      List manipulation is done with RCU primitives and is protected by the
      write side of a rwsem.  The list traversal in the get_mr fast path is
      protected by an RCU read critical section.  The FMR list traversal is
      more problematic because it can block while traversing the list.  We
      protect this with the read side of the rwsem.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
    • RDS/IB: print IB event strings as well as their number · 1bde04a6
      Committed by Zach Brown
      It's nice to not have to go digging in the code to see which event
      occurred.  It's easy to throw together a quick array that maps the ib
      event enums to their strings.  I didn't see anything in the stack that
      does this translation for us, but I also didn't look very hard.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
    • RDS: flush fmrs before allocating new ones · 8576f374
      Committed by Chris Mason
      Flushing FMRs is somewhat expensive, and is currently kicked off when
      the interrupt handler notices that we are getting low.  The result of
      this is that FMR flushing only happens from the interrupt cpus.
      
      This spreads the load more effectively by triggering flushes just before
      we allocate a new FMR.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • RDS: properly use sg_init_table · b4e1da3c
      Committed by Chris Mason
      This is only needed to keep debugging code from bugging.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • RDS/IB: track signaled sends · f046011c
      Committed by Zach Brown
      We're seeing bugs today where IB connection shutdown clears the send
      ring while the tasklet is processing completed sends.  Implementation
      details cause this to dereference a null pointer.  Shutdown needs to
      wait for send completion to stop before tearing down the connection.  We
      can't simply wait for the ring to empty because it may contain
      unsignaled sends that will never be processed.
      
      This patch tracks the number of signaled sends that we've posted and
      waits for them to complete.  It also makes sure that the tasklet has
      finished executing.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
    • RDS: remove __init and __exit annotation · ef87b7ea
      Committed by Zach Brown
      The trivial amount of memory saved isn't worth the cost of dealing with section
      mismatches.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
    • RDS/IB: Use SLAB_HWCACHE_ALIGN flag for kmem_cache_create() · c20f5b96
      Committed by Andy Grover
      We are *definitely* counting cycles as closely as DaveM, so
      ensure hwcache alignment for our recv ring control structs.
      Signed-off-by: Andy Grover <andy.grover@oracle.com>
    • RDS/IB: always process recv completions · d455ab64
      Committed by Zach Brown
      The recv refill path was leaking fragments because the recv event handler had
      marked a ring element as free without freeing its frag.  This was happening
      because it wasn't processing receives when the conn wasn't marked up or
      connecting, as can be the case if it races with rmmod.
      
      Two observations support always processing receives in the callback.
      
      First, buildup should only post receives, thus triggering recv event handler
      calls, once it has built up all the state to handle them.  Teardown should
      destroy the CQ and drain the ring before tearing down the state needed to
      process recvs.  Both appear to be true today.
      
      Second, this test was fundamentally racy.  There is nothing to stop rmmod and
      connection destruction from swooping in the moment after the conn state was
      sampled but before real receive processing starts.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
    • RDS: return to a single-threaded krdsd · 80c51be5
      Committed by Zach Brown
      We were seeing very nasty bugs due to fundamental assumptions the current code
      makes about concurrent work struct processing.  The code simply isn't able to
      handle concurrent connection shutdown work function execution today, for
      example, which is very much possible once a multi-threaded krdsd was
      introduced.  The problem compounds as additional work structs are added to the
      mix.
      
      krdsd is no longer performance critical now that send and receive posting and
      FMR flushing are done elsewhere, so the safest fix is to move back to the
      single threaded krdsd that the current code was built around.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
    • RDS/IB: create a work queue for FMR flushing · 515e079d
      Committed by Zach Brown
      This patch moves the FMR flushing work into its own multi-threaded work queue.
      This is to maintain performance in preparation for returning the main krdsd
      work queue back to a single threaded work queue to avoid deep-rooted
      concurrency bugs.
      
      This is also good because it further separates FMRs, which might be removed
      some day, from the rest of the code base.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
    • RDS/IB: destroy connections on rmmod · 8aeb1ba6
      Committed by Zach Brown
      IB connections were not being destroyed during rmmod.
      
      First, the IB device removal callback was recently changed to disconnect
      connections that used the device being removed rather than destroying them.  So
      connections with devices during rmmod were not being destroyed.
      
      Second, rds_ib_destroy_nodev_conns() was being called before connections were
      disassociated from devices.  It would almost never find connections in the
      nodev list.
      
      We first get rid of rds_ib_destroy_conns(), which is no longer called, and
      refactor the existing caller into the main body of the function and get rid of
      the list and lock wrappers.
      
      Then we call rds_ib_destroy_nodev_conns() *after* ib_unregister_client() has
      removed the IB device from all the conns and put the conns on the nodev list.
      
      The result is that IB connections are destroyed by rmmod.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
    • RDS/IB: wait for IB dev freeing work to finish during rmmod · 24fa163a
      Committed by Zach Brown
      The RDS IB client removal callback can queue work to drop the final reference
      to an IB device.  We have to make sure that this function has returned before
      we complete rmmod or the work threads can try to execute freed code.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
    • RDS/IB: Make ib_recv_refill return void · b6fb0df1
      Committed by Andy Grover
      Signed-off-by: Andy Grover <andy.grover@oracle.com>
    • RDS: Remove unused XLIST_PTR_TAIL and xlist_protect() · fbf4d7e3
      Committed by Andy Grover
      Not used.
      Signed-off-by: Andy Grover <andy.grover@oracle.com>
    • RDS: whitespace · c9455d99
      Committed by Andy Grover
    • RDS: use delayed work for the FMR flushes · 7a0ff5db
      Committed by Chris Mason
      Using a delayed work queue helps us make sure a healthy number of FMRs
      have queued up over the limit.  It makes for a large improvement in RDMA
      iops.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • rds: more FMRs are faster · eabb7322
      Committed by Chris Mason
      When we add more FMRs, we flush them less often and so we go faster.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • rds: recycle FMRs through lockless lists · 6fa70da6
      Committed by Chris Mason
      FMR allocation and recycling is performance critical and fairly lock
      intensive.  The current code has a per connection lock that all
      processes bang on and it becomes a major bottleneck on large systems.
      
      This changes things to use a number of cmpxchg based lists instead,
      allowing us to go through the whole FMR lifecycle without locking inside
      RDS.
      
      Zach Brown pointed out that our usage of cmpxchg for xlist removal is
      racy if someone manages to remove and add back an FMR struct into the list
      while another CPU can see the FMR's address at the head of the list.
      
      The second CPU might assume the list hasn't changed when in fact any
      number of operations might have happened in between the deletion and
      reinsertion.
      
      This commit maintains a per-CPU count of CPUs that are currently
      in xlist removal, and establishes a grace period to make sure that
      nobody can see an entry we have just removed from the list.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
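A cmpxchg-based list of the general kind described here can be sketched with C11 atomics. The sketch is an assumption-laden illustration: `toy_node`, `toy_list`, `toy_list_push()` and `toy_list_take_all()` are hypothetical names, not the xlist API. It also does not implement the per-CPU grace period from the commit. Instead it sidesteps the remove-and-reinsert race by only ever detaching the whole list with one atomic exchange, which is a different and simpler technique.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct toy_node {
    struct toy_node *next;
    int id;
};

struct toy_list {
    _Atomic(struct toy_node *) head;
};

/* Lock-free push: retry the compare-and-exchange until the head is ours. */
static void toy_list_push(struct toy_list *l, struct toy_node *n)
{
    struct toy_node *old = atomic_load(&l->head);

    assert(n != NULL);
    do {
        n->next = old;   /* 'old' is refreshed on every failed attempt */
    } while (!atomic_compare_exchange_weak(&l->head, &old, n));
}

/*
 * Detach every entry at once. Because no single-entry pop compares
 * against a possibly recycled head pointer, the race described in the
 * commit message cannot occur in this simplified variant.
 */
static struct toy_node *toy_list_take_all(struct toy_list *l)
{
    return atomic_exchange(&l->head, (struct toy_node *)NULL);
}
```

The caller of `toy_list_take_all()` owns the returned chain exclusively and can walk it without further synchronization; entries come back in last-in, first-out order.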
    • rds: fix rds_send_xmit() serialization · 0f4b1c7e
      Committed by Zach Brown
      rds_send_xmit() was changed to hold an interrupt masking spinlock instead of a
      mutex so that it could be called from the IB receive tasklet path.  This broke
      the TCP transport because its xmit method can block and masks and unmasks
      interrupts.
      
      This patch serializes callers to rds_send_xmit() with a simple bit instead of
      the current spinlock or previous mutex.  This enables rds_send_xmit() to be
      called from any context and to call functions which block.  Getting rid of the
      c_send_lock exposes the bare c_lock acquisitions which are changed to block
      interrupts.
      
      A waitqueue is added so that rds_conn_shutdown() can wait for callers to leave
      rds_send_xmit() before tearing down partial send state.  This lets us get rid
      of c_senders.
      
      rds_send_xmit() is changed to check the conn state after acquiring the
      RDS_IN_XMIT bit to resolve races with the shutdown path.  Previously both
      worked with the conn state and then the lock in the same order, allowing them
      to race and execute the paths concurrently.
      
      rds_send_reset() isn't racing with rds_send_xmit() now that rds_conn_shutdown()
      properly ensures that rds_send_xmit() can't start once the conn state has been
      changed.  We can remove its previous use of the spinlock.
      
      Finally, c_send_generation is redundant.  Callers can race to test the c_flags
      bit by simply retrying instead of racing to test the c_send_generation atomic.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
    • rds: block ints when acquiring c_lock in rds_conn_message_info() · 501dcccd
      Committed by Zach Brown
      conn->c_lock is acquired in interrupt context.  rds_conn_message_info() is
      called from user context and was acquiring c_lock without blocking interrupts,
      leading to possible deadlocks.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
    • rds: remove unused rds_send_acked_before() · 671202f3
      Committed by Zach Brown
      rds_send_acked_before() wasn't blocking interrupts when acquiring c_lock from
      user context but nothing calls it.  Rather than fix its use of c_lock we just
      remove the function.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>