1. 15 Jun, 2016 4 commits
  2. 08 Jun, 2016 1 commit
    • RDS: TCP: fix race windows in send-path quiescence by rds_tcp_accept_one() · 9c79440e
      Sowmini Varadhan committed
      The send path needs to be quiesced before resetting callbacks from
      rds_tcp_accept_one(), and commit eb192840 ("RDS:TCP: Synchronize
      rds_tcp_accept_one with rds_send_xmit when resetting t_sock") achieves
      this using the c_state and RDS_IN_XMIT bit following the pattern
      used by rds_conn_shutdown(). However, this leaves a possible race
      window, as shown in the sequence below:
          take t_conn_lock in rds_tcp_conn_connect
          send outgoing syn to peer
          drop t_conn_lock in rds_tcp_conn_connect
          incoming from peer triggers rds_tcp_accept_one, conn is
            marked CONNECTING
          wait for RDS_IN_XMIT to quiesce any rds_send_xmit threads
          call rds_tcp_reset_callbacks
          [.. race-window where incoming syn-ack can cause the conn
            to be marked UP from rds_tcp_state_change ..]
          lock_sock called from rds_tcp_reset_callbacks, and we set
            t_sock to null
      As soon as the conn is marked UP in the race-window above, rds_send_xmit()
      threads will proceed to rds_tcp_xmit and may encounter a null-pointer
      deref on the t_sock.
      
      Given that rds_tcp_state_change() is invoked in softirq context, whereas
      rds_tcp_reset_callbacks() is in workq context, and testing for RDS_IN_XMIT
      after lock_sock could result in a deadlock with tcp_sendmsg, this
      commit fixes the race by using a new c_state, RDS_TCP_RESETTING, which
      will prevent a transition to RDS_CONN_UP from rds_tcp_state_change().
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
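The effect of the extra state can be sketched in userspace C11. This is a minimal illustration and not the kernel code: the enum values, `conn_transition()`, `accept_begin_reset()`, `accept_finish()` and `on_tcp_established()` are hypothetical names, and the sketch assumes that state changes are made with a compare-and-exchange so that a transition only succeeds from the expected state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the connection states named in the commit. */
enum conn_state { CONN_DOWN, CONN_CONNECTING, CONN_RESETTING, CONN_UP };

struct conn {
    _Atomic int state;
};

/* Move from `from` to `to` only if the state is still `from`. */
static bool conn_transition(struct conn *c, int from, int to)
{
    assert(c != NULL);
    int expected = from;
    return atomic_compare_exchange_strong(&c->state, &expected, to);
}

/* Accept path: park the conn in RESETTING before the socket is swapped. */
static bool accept_begin_reset(struct conn *c)
{
    return conn_transition(c, CONN_CONNECTING, CONN_RESETTING);
}

/* Accept path: the new socket is in place, so the accept path itself
 * brings the conn up (an assumed completion step for this sketch). */
static bool accept_finish(struct conn *c)
{
    return conn_transition(c, CONN_RESETTING, CONN_UP);
}

/* State-change callback: only a CONNECTING conn may be marked UP, so a
 * syn-ack arriving while the conn is RESETTING cannot open the race. */
static bool on_tcp_established(struct conn *c)
{
    return conn_transition(c, CONN_CONNECTING, CONN_UP);
}
```

While the conn is in the resetting state, the callback's transition fails and senders that wait for the UP state stay out of the transmit path until the accept path has finished.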
  3. 03 Mar, 2016 1 commit
  4. 03 Nov, 2015 1 commit
  5. 05 Oct, 2015 1 commit
    • RDS: Use a single TCP socket for both send and receive. · 3b20fc38
      Sowmini Varadhan committed
      Commit f711a6ae ("net/rds: RDS-TCP: Always create a new rds_sock
      for an incoming connection.") modified rds-tcp so that an incoming SYN
      would ignore an existing "client" TCP connection which had the local
      port set to the transient port.  The motivation for ignoring the existing
      "client" connection in f711a6ae was to avoid race conditions and an
      endless duel of reconnect attempts triggered by a restart/abort of one
      of the nodes in the TCP connection.
      
      However, having separate sockets for active and passive sides
      is avoidable, and the simpler model of a single TCP socket for
      both sends and receives of all RDS connections associated with
      that TCP socket makes for easier observability. We avoid the race
      conditions from f711a6ae by attempting reconnects in rds_conn_shutdown
      if, and only if, the (new) c_outgoing bit is set for RDS_TRANS_TCP.
      The c_outgoing bit is initialized in __rds_conn_create().
      
      A side-effect of re-using the client rds_connection for an incoming
      SYN is the potential of encountering duelling SYNs, i.e., we
      have an outgoing RDS_CONN_CONNECTING socket when we get the incoming
      SYN. The logic to arbitrate this criss-crossing SYN exchange in
      rds_tcp_accept_one() has been modified to emulate the BGP state
      machine: the smaller IP address should back off from the connection attempt.
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
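The tie-break for duelling SYNs can be sketched as a pure function. This is an illustration under assumptions, not the code in rds_tcp_accept_one(): `local_should_back_off()` and `assert_one_side_backs_off()` are hypothetical names, and the sketch assumes IPv4 addresses held in network byte order and compared numerically in host byte order.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Both arguments are IPv4 addresses in network byte order.
 * Returns true when the local node has the smaller address and should
 * therefore abandon its own outgoing connection attempt.
 */
static bool local_should_back_off(uint32_t local_be, uint32_t peer_be)
{
    return ntohl(local_be) < ntohl(peer_be);
}

/*
 * For two distinct addresses exactly one side backs off, so the
 * criss-crossing SYNs resolve to a single surviving connection.
 */
static void assert_one_side_backs_off(uint32_t a_be, uint32_t b_be)
{
    assert(a_be != b_be);
    assert(local_should_back_off(a_be, b_be) !=
           local_should_back_off(b_be, a_be));
}
```

Because both nodes evaluate the same comparison with the arguments swapped, they reach complementary decisions without exchanging any extra messages.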
  6. 01 Oct, 2015 1 commit
    • RDS: Use per-bucket rw lock for bind hash-table · 9b9acde7
      Santosh Shilimkar committed
      One global lock protecting a hash table with 1024 buckets isn't
      efficient, and it shows up on massive systems with truckloads
      of RDS sockets serving multiple databases. The
      perf data clearly highlights the contention on the rw
      lock in these massive workloads.
      
      When the contention gets worse, the code gets into a state where
      it decides to back off on the lock. So while it has interrupts disabled,
      it sits and backs off on acquiring this lock. This makes the system
      sluggish, and eventually all sorts of bad things happen.
      
      The simple fix is to move the lock into the hash bucket and
      use per-bucket lock to improve the scalability.
      Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
      Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
  7. 26 Aug, 2015 1 commit
  8. 08 Aug, 2015 1 commit
  9. 02 Jun, 2015 1 commit
    • rds: re-entry of rds_ib_xmit/rds_iw_xmit · d655a9fb
      Wengang Wang committed
      The BUG_ON at line 452/453 is triggered in function rds_send_xmit.
      
       441                         while (ret) {
       442                                 tmp = min_t(int, ret, sg->length -
       443                                                       conn->c_xmit_data_off);
       444                                 conn->c_xmit_data_off += tmp;
       445                                 ret -= tmp;
       446                                 if (conn->c_xmit_data_off == sg->length) {
       447                                         conn->c_xmit_data_off = 0;
       448                                         sg++;
       449                                         conn->c_xmit_sg++;
       450                                         if (ret != 0 && conn->c_xmit_sg == rm->data.op_nents)
       451                                                 printk(KERN_ERR "conn %p rm %p sg %p ret %d\n", conn, rm, sg, ret);
       452                                         BUG_ON(ret != 0 &&
       453                                                conn->c_xmit_sg == rm->data.op_nents);
       454                                 }
       455                         }
      
      It complains that the total sent length is bigger than what we want to send.

      rds_ib_xmit() returns a wrong value on the second entry for the same
      rds_message.

      The sg and off passed by rds_send_xmit to rds_ib_xmit are based on
      scatterlist.offset/length, but rds_ib_xmit acts on
      scatterlist.dma_address/dma_length. When dma_length is larger than length,
      there is a problem: for the 2nd and later entries of rds_ib_xmit for the same
      rds_message, at least one of the following two is wrong:

      1) the scatterlist to start with; the chosen one can be far beyond the
         correct one.
      2) the offset to start with within the scatterlist.

      Fix:
      Add op_dmasg and op_dmaoff to the rm_data_op structure, indicating the
      scatterlist and the offset within it to start with for rds_ib_xmit,
      respectively. op_dmasg and op_dmaoff are initialized to zero when doing
      DMA mapping the first time the message is seen, and are changed when
      filling send slots.

      The same applies to rds_iw_xmit.
      Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
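The fix amounts to keeping a separate resume cursor in DMA space. The sketch below is a simplified userspace model and not the driver code: `struct data_op`, `op_map()` and `op_fill()` are hypothetical, and the fields `dmasg` and `dmaoff` only mirror the roles of op_dmasg and op_dmaoff described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of a DMA-mapped data op. */
struct data_op {
    const size_t *dma_len;  /* DMA length of each mapped segment */
    size_t nents;           /* number of mapped segments */
    size_t dmasg;           /* segment index to resume from */
    size_t dmaoff;          /* byte offset within that segment */
};

/* Mapping resets the cursor, as the commit does on first mapping. */
static void op_map(struct data_op *op, const size_t *dma_len, size_t nents)
{
    op->dma_len = dma_len;
    op->nents = nents;
    op->dmasg = 0;
    op->dmaoff = 0;
}

/*
 * Consume up to `budget` bytes starting at the saved cursor and leave the
 * cursor at the resume point. Returns the number of bytes consumed.
 */
static size_t op_fill(struct data_op *op, size_t budget)
{
    size_t sent = 0;

    while (budget > 0 && op->dmasg < op->nents) {
        assert(op->dmaoff <= op->dma_len[op->dmasg]);
        size_t left = op->dma_len[op->dmasg] - op->dmaoff;
        size_t n = left < budget ? left : budget;

        sent += n;
        budget -= n;
        op->dmaoff += n;
        if (op->dmaoff == op->dma_len[op->dmasg]) {
            op->dmasg++;        /* segment exhausted, move to the next one */
            op->dmaoff = 0;
        }
    }
    return sent;
}
```

A second call for the same message continues from the stored position in the DMA lengths, rather than from a position derived from the unmapped scatterlist lengths.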
  10. 01 Jun, 2015 2 commits
  11. 19 May, 2015 1 commit
  12. 09 Apr, 2015 1 commit
  13. 03 Mar, 2015 1 commit
  14. 24 Nov, 2014 2 commits
  15. 20 Oct, 2013 1 commit
  16. 20 Mar, 2012 1 commit
  17. 01 Nov, 2011 1 commit
  18. 20 Jan, 2011 1 commit
  19. 21 Oct, 2010 1 commit
  20. 09 Sep, 2010 16 commits
    • RDS: Implement masked atomic operations · 20c72bd5
      Andy Grover committed
      Add two CMSGs for masked versions of cswp and fadd. The args
      struct is modified to use a union for the different atomic op types'
      arguments. Change IB to do masked atomic ops. The atomic op type
      in rds_message is similarly unionized.
      Signed-off-by: Andy Grover <andy.grover@oracle.com>
    • RDS/IB: print string constants in more places · 59f740a6
      Zach Brown committed
      This prints the constant identifier for work completion status and rdma
      cm event types, like we already do for IB event types.
      
      A core string array helper is added that each string type uses.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
    • RDS: have sockets get transport module references · 5adb5bc6
      Zach Brown committed
      Right now there's nothing to stop the various paths that use
      rs->rs_transport from racing with rmmod and executing freed transport
      code.  The simple fix is to have binding to a transport also hold a
      reference to the transport's module, removing this class of races.
      
      We already had an unused t_owner field which was set for the modular
      transports and which wasn't set for the built-in loop transport.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
    • RDS: remove old rs_transport comment · 77510481
      Zach Brown committed
      rs_transport is now also used by the rdma paths once the socket is
      bound.  We don't need this stale comment to tell us what cscope can.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
    • RDS: remove __init and __exit annotation · ef87b7ea
      Zach Brown committed
      The trivial amount of memory saved isn't worth the cost of dealing with section
      mismatches.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
    • rds: fix rds_send_xmit() serialization · 0f4b1c7e
      Zach Brown committed
      rds_send_xmit() was changed to hold an interrupt masking spinlock instead of a
      mutex so that it could be called from the IB receive tasklet path.  This broke
      the TCP transport because its xmit method can block and masks and unmasks
      interrupts.
      
      This patch serializes callers to rds_send_xmit() with a simple bit instead of
      the current spinlock or previous mutex.  This enables rds_send_xmit() to be
      called from any context and to call functions which block.  Getting rid of the
      c_send_lock exposes the bare c_lock acquisitions which are changed to block
      interrupts.
      
      A waitqueue is added so that rds_conn_shutdown() can wait for callers to leave
      rds_send_xmit() before tearing down partial send state.  This lets us get rid
      of c_senders.
      
      rds_send_xmit() is changed to check the conn state after acquiring the
      RDS_IN_XMIT bit to resolve races with the shutdown path.  Previously both
      worked with the conn state and then the lock in the same order, allowing them
      to race and execute the paths concurrently.
      
      rds_send_reset() isn't racing with rds_send_xmit() now that rds_conn_shutdown()
      properly ensures that rds_send_xmit() can't start once the conn state has been
      changed.  We can remove its previous use of the spinlock.
      
      Finally, c_send_generation is redundant.  Callers can race to test the c_flags
      bit by simply retrying instead of racing to test the c_send_generation atomic.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
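Serializing callers with a single bit, and checking the connection state only after the bit is held, can be sketched with C11 atomics. This is an illustration under assumptions, not the kernel implementation: `struct conn`, `conn_init()`, `acquire_in_xmit()`, `release_in_xmit()` and `send_xmit()` are hypothetical, and the waitqueue that the shutdown path uses is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct conn {
    atomic_flag in_xmit;  /* stand-in for the RDS_IN_XMIT bit */
    _Atomic bool up;      /* stand-in for "conn state is UP" */
    int sent;             /* only touched while in_xmit is held */
};

static void conn_init(struct conn *c)
{
    assert(c != NULL);
    atomic_flag_clear(&c->in_xmit);
    atomic_init(&c->up, true);
    c->sent = 0;
}

/* Returns true if the caller now owns the transmit path. Never blocks. */
static bool acquire_in_xmit(struct conn *c)
{
    return !atomic_flag_test_and_set_explicit(&c->in_xmit,
                                              memory_order_acquire);
}

static void release_in_xmit(struct conn *c)
{
    atomic_flag_clear_explicit(&c->in_xmit, memory_order_release);
}

/* Returns true if this caller did send work. */
static bool send_xmit(struct conn *c)
{
    bool did_work = false;

    if (!acquire_in_xmit(c))
        return false;           /* another caller is already transmitting */

    /*
     * The state is checked after taking the bit. A shutdown path that
     * changes the state first and then waits for the bit to clear cannot
     * miss a sender that is still inside this function.
     */
    if (atomic_load(&c->up)) {
        c->sent++;
        did_work = true;
    }

    release_in_xmit(c);
    return did_work;
}
```

Since the bit is neither a spinlock nor a mutex, holding it places no restriction on the calling context or on whether the transport's transmit routine blocks.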
    • rds: remove unused rds_send_acked_before() · 671202f3
      Zach Brown committed
      rds_send_acked_before() wasn't blocking interrupts when acquiring c_lock from
      user context but nothing calls it.  Rather than fix its use of c_lock we just
      remove the function.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
    • RDS: introduce rds_conn_connect_if_down() · f3c6808d
      Zach Brown committed
      A few paths had the same block of code to queue a connection's connect work if
      it was in the right state.  Let's move this into a helper function.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
    • rds: don't let RDS shutdown a connection while senders are present · 7e3f2952
      Chris Mason committed
      This is the first in a long line of patches that tries to fix races
      between RDS connection shutdown and RDS traffic.
      
      Here we are maintaining a count of active senders to make sure
      the connection doesn't go away while they are using it.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • rds: Use RCU for the bind lookup searches · 38a4e5e6
      Chris Mason committed
      The RDS bind lookups are somewhat expensive in terms of CPU
      time and locking overhead.  This commit changes them into a
      faster RCU based hash tree instead of the rbtrees they were using
      before.
      
      On large NUMA systems it is a significant improvement.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • rds: per-rm flush_wait waitq · c83188dc
      Chris Mason committed
      This removes a global waitqueue used to wait for rds messages
      and replaces it with a waitqueue inside the rds_message struct.
      
      The global waitqueue turns into a global lock and significantly
      bottlenecks operations on large machines.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • RDS: Use a generation counter to avoid rds_send_xmit loop · 9e29db0e
      Chris Mason committed
      rds_send_xmit is required to loop around after it releases the lock
      because someone else could have done a trylock, found someone working
      on the list and backed off.
      
      But, once we drop our lock, it is possible that someone else does come
      in and make progress on the list.  We should detect this and not loop
      around if another process is actually working on the list.
      
      This patch adds a generation counter that is bumped every time we
      get the lock and do some send work.  If the retry notices someone else
      has bumped the generation counter, it does not need to loop around and
      continue working.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      Signed-off-by: Andy Grover <andy.grover@oracle.com>
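The retry decision can be sketched with C11 atomics. This is a simplified model and not the kernel function: `struct conn`, `conn_init()` and `send_xmit()` are hypothetical, the send queue is reduced to a counter, and the lock is modelled as a flag with trylock semantics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct conn {
    atomic_flag lock;         /* stand-in for the send lock, trylock only */
    atomic_ulong generation;  /* bumped each time a holder does send work */
    atomic_int queued;        /* number of pending messages */
};

static void conn_init(struct conn *c, int queued)
{
    assert(c != NULL && queued >= 0);
    atomic_flag_clear(&c->lock);
    atomic_init(&c->generation, 0);
    atomic_init(&c->queued, queued);
}

/* Returns the number of messages this caller sent. */
static int send_xmit(struct conn *c)
{
    int sent = 0;

    for (;;) {
        if (atomic_flag_test_and_set(&c->lock))
            return sent;        /* trylock failed, back off */

        /* Remember the generation that belongs to this pass. */
        unsigned long gen = atomic_fetch_add(&c->generation, 1) + 1;

        while (atomic_load(&c->queued) > 0) {
            atomic_fetch_sub(&c->queued, 1);
            sent++;
        }
        atomic_flag_clear(&c->lock);

        /* Nothing was queued while we held the lock: done. */
        if (atomic_load(&c->queued) == 0)
            return sent;

        /* Work is pending, but another holder has taken over since. */
        if (atomic_load(&c->generation) != gen)
            return sent;

        /* Work is pending and nobody else picked it up: loop around. */
    }
}
```

The caller loops only when work remains and the counter still holds its own value, which means no other thread has taken the lock in the meantime.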
    • 51e2cba8
    • RDS: Change send lock from a mutex to a spinlock · 049ee3f5
      Andy Grover committed
      This change allows us to call rds_send_xmit() from a tasklet,
      which is crucial to our new operating model.
      
      * Change c_send_lock to a spinlock
      * Update stats fields "sem_" to "_lock"
      * Remove unneeded rds_conn_is_sending()
      
      About locking between shutdown and send -- send checks if the
      connection is up. Shutdown puts the connection into
      DISCONNECTING. After this, all threads entering send will exit
      immediately. However, a thread could be *in* send_xmit(), so
      shutdown acquires the c_send_lock to ensure everyone is out
      before proceeding with connection shutdown.
      Signed-off-by: Andy Grover <andy.grover@oracle.com>
    • RDS: Stop supporting old cong map sending method · 77dd550e
      Andy Grover committed
      We now ask the transport to give us a rm for the congestion
      map, and then we handle it normally. Previously, the
      transport defined a function that we would call to send
      a congestion map.
      
      Convert TCP and loop transports to new cong map method.
      Signed-off-by: Andy Grover <andy.grover@oracle.com>
    • RDS: Perform unmapping ops in stages · ff3d7d36
      Andy Grover committed
      Previously, RDS would wait until the final send WR had completed
      and then handle cleanup. With silent ops, we do not know
      if an atomic, rdma, or data op will be last. This patch
      handles any of these cases by keeping a pointer to the last
      op in the message in m_last_op.
      
      When the TX completion event fires, rds dispatches to per-op-type
      cleanup functions, and then does whole-message cleanup, if the
      last op equalled m_last_op.
      
      This patch also moves towards having op-specific functions take
      the op struct, instead of the overall rm struct.
      
      rds_ib_connection has a pointer to keep track of a
      partially-completed data send operation. This patch changes it from an
      rds_message pointer to the narrower rm_data_op pointer, and
      modifies places that use this pointer as needed.
      Signed-off-by: Andy Grover <andy.grover@oracle.com>
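The staged cleanup can be sketched as a completion handler that dispatches per op type and finishes the message only when the completed op is the recorded last op. This is a hypothetical userspace model: `struct op`, `struct message`, `message_add_op()` and `tx_complete()` are illustrative names, and only the idea of a last-op pointer (m_last_op in the commit) is taken from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum op_type { OP_ATOMIC, OP_RDMA, OP_DATA };

struct op {
    enum op_type type;
    bool active;   /* op is part of the message */
    bool cleaned;  /* per-op cleanup has run */
};

struct message {
    struct op atomic, rdma, data;
    struct op *last_op;  /* plays the role of m_last_op */
    bool done;           /* whole-message cleanup has run */
};

static void message_init(struct message *rm)
{
    rm->atomic = (struct op){ OP_ATOMIC, false, false };
    rm->rdma = (struct op){ OP_RDMA, false, false };
    rm->data = (struct op){ OP_DATA, false, false };
    rm->last_op = NULL;
    rm->done = false;
}

/* Ops are added in send order; the most recent one becomes the last op. */
static void message_add_op(struct message *rm, struct op *op)
{
    op->active = true;
    rm->last_op = op;
}

/* Per-op-type cleanup functions take the op, not the whole message. */
static void atomic_op_cleanup(struct op *op) { op->cleaned = true; }
static void rdma_op_cleanup(struct op *op) { op->cleaned = true; }
static void data_op_cleanup(struct op *op) { op->cleaned = true; }

/* Called once per completed op. */
static void tx_complete(struct message *rm, struct op *op)
{
    assert(op->active && !op->cleaned);

    switch (op->type) {
    case OP_ATOMIC: atomic_op_cleanup(op); break;
    case OP_RDMA:   rdma_op_cleanup(op);   break;
    case OP_DATA:   data_op_cleanup(op);   break;
    }

    /* Whole-message cleanup runs only for the op recorded as last. */
    if (op == rm->last_op)
        rm->done = true;
}
```

A message that carries only an atomic op completes on that op, while a message with an rdma op followed by a data op completes on the data op.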